(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and bastardization of classic Unix |
Note: The following information is provided in part by the
Extreme Science and Engineering Discovery Environment (
The formats of Windows and Unix text files differ. In Windows, each line ends with a carriage return followed by a line feed (CR LF), whereas Unix uses only a line feed (LF).
Several utilities convert text files between the Unix/Linux and DOS formats, but it helps to know how to do the conversion by hand. On Unix and Linux, each line of a text file ends with the newline character "\n", also known as the line feed (LF, ASCII code 0A hex). A DOS text file ends each line with a carriage return "\r" (CR, ASCII code 0D hex) followed by a line feed, that is, with CRLF or "\r\n". To convert DOS text to Unix or Linux format, delete the "\r" at the end of each line; with GNU sed you can also write the character by its ASCII code.
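To see the difference at the byte level, here is a minimal sketch. It assumes a POSIX shell with printf, od and tr; the helper name show_eol_bytes and the sample text are made up for illustration.

```shell
# Hypothetical helper: print stdin as one run of hex bytes (no offsets, no spaces).
show_eol_bytes() {
    od -An -v -tx1 | tr -d ' \n'
    echo
}

# Unix format: each line ends with LF (0a).
printf 'one\ntwo\n' | show_eol_bytes        # prints 6f6e650a74776f0a

# DOS/Windows format: each line ends with CR LF (0d 0a).
printf 'one\r\ntwo\r\n' | show_eol_bytes    # prints 6f6e650d0a74776f0d0a
```

The only difference between the two dumps is the extra 0d before each 0a.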
As a consequence, some Windows applications will not show the line breaks in Unix-format files. Likewise, Unix programs may display the carriage returns in Windows text files as Ctrl-m (^M) characters at the end of each line.
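Before converting, it can be useful to check whether a file contains such carriage returns at all. The following is a sketch only, assuming a POSIX shell and a grep that accepts a literal CR in its pattern (GNU grep does); count_cr_lines is a hypothetical name.

```shell
# Hypothetical helper: print how many lines on stdin end with a carriage return.
count_cr_lines() {
    cr=$(printf '\r')            # literal CR; command substitution strips only trailing newlines
    grep -c "${cr}\$" || true    # grep exits non-zero when the count is 0, so mask that
}

printf 'one\r\ntwo\r\nthree\n' | count_cr_lines   # prints 2
printf 'one\ntwo\n' | count_cr_lines              # prints 0
```

A count of zero suggests the file is already in Unix format; a count lower than the number of lines suggests mixed line endings.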
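In this case you can use Perl one-liners along the following lines. The first converts Windows (CRLF) line endings to Unix (LF); the second converts bare carriage returns, as used by classic Mac OS, to line feeds. Run on its own against a Windows file, the second command would turn every CRLF into two line feeds.
perl -pi -e 's/\r\n/\n/g' input.file
perl -pi -e 's/\r/\n/g' input.file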
For simple conversion you can use FTP, screen capture, unix2dos and dos2unix, tr, awk, Perl, and vi. You can also use Cygwin.
When using an FTP program to move a text file between Unix and Windows, be sure the file is transferred in ASCII format, so the document is transformed into a text format appropriate for the host. Some FTP programs, especially graphical applications (e.g., Hummingbird FTP), do this automatically. If you are using command line FTP, before you begin the transfer, enter:
ascii
Note: You need to use a client that supports secure FTP to transfer files to and from Indiana University's central systems. For more, see At IU, what SSH/SFTP clients are supported and where can I get them?
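The utilities dos2unix and unix2dos are available for converting files from the Unix command line.
To convert a Windows file to a Unix file, enter:
dos2unix winfile.txt unixfile.txt
To convert a Unix file to Windows, enter:
unix2dos unixfile.txt winfile.txt
Note: The two-argument form above is the syntax of the Solaris-style utilities. The dos2unix package shipped with most current Linux distributions treats every argument as a file to convert in place; there, use dos2unix -n winfile.txt unixfile.txt (and likewise unix2dos -n) to write to a new file. Check your local man page.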
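You can use tr to remove all carriage returns and Ctrl-z (^Z) characters from a Windows file:
tr -d '\15\32' < winfile.txt > unixfile.txt
However, you cannot use tr to convert a document from Unix format to Windows, because tr only deletes characters or maps them one-to-one and cannot insert a carriage return before each line feed.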
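The tr approach can be wrapped in a small function and checked on sample input. This is a sketch under the assumption of a POSIX shell; dos_to_unix_tr is a hypothetical name. Note that tr deletes every carriage return, including any that appear in the middle of a line.

```shell
# Hypothetical wrapper around the tr command: strip CR (octal 015)
# and Ctrl-Z (octal 032) from stdin.
dos_to_unix_tr() {
    tr -d '\015\032'
}

# Sample DOS text with a trailing Ctrl-Z end-of-file marker; od -c shows what is left.
printf 'one\r\ntwo\r\n\032' | dos_to_unix_tr | od -c
```

The three-digit octal forms \015 and \032 are the same characters as \15 and \32 but cannot be misread when another digit follows.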
To use awk to convert a Windows file to Unix, enter:
awk '{ sub("\r$", ""); print }' winfile.txt > unixfile.txt
To convert a Unix file to Windows, enter:
awk 'sub("$", "\r")' unixfile.txt > winfile.txt
Older versions of awk do not include the sub function. In such cases, use the same command, but replace awk with gawk or nawk.
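One caveat with the Unix-to-Windows awk command is that it appends a carriage return even when a line already ends with one. A possible variant, shown here as a sketch and assuming an awk that provides sub, strips any existing CR before adding its own; unix_to_dos_awk is a hypothetical name.

```shell
# Hypothetical helper: convert stdin to DOS line endings without
# doubling a CR that is already present.
unix_to_dos_awk() {
    awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }'
}

# Mixed input: the first line is Unix style, the second already DOS style.
printf 'one\ntwo\r\n' | unix_to_dos_awk | od -c
```

Both output lines end in exactly one CR LF pair, so the function can be applied to files of unknown or mixed format.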
To convert a Windows text file to a Unix text file using Perl, enter:
perl -p -e 's/\r$//' < winfile.txt > unixfile.txt
To convert from a Unix text file to a Windows text file, enter:
perl -p -e 's/\n/\r\n/' < unixfile.txt > winfile.txt
You must use single quotation marks in either command line. This prevents your shell from trying to evaluate anything inside.
In vi, you can remove carriage return (^M) characters with the following command:
:1,$s/^M//g
Note: To input the ^M character, press Ctrl-v, and then press Enter or Return.
In vim, use :set ff=unix to convert to Unix; use :set ff=dos to convert to Windows. The new format is applied when you write the file.
October 26, 2010 | Ubuntu Genius
Most people don't realise that when they hit the Enter key to create a new paragraph in a text file, something very different is going on behind the scenes in the three major operating systems: Windows, Macintosh and Linux. The "end-of-line delimiter" (often expressed as "End-Of-Line", "End of Line", or just "EOL") – which some of you know as the "line break" or "newline" – is a special character used to designate the end of a line within a text file.
UNIX-based operating systems (like all Linux distros and BSD derivatives) use the line feed character (\n or <LF>), "classic" Mac OS uses a carriage return (\r or <CR>), while DOS/Windows uses a carriage return followed by a line feed (\r\n or <CR><LF>). Mac OS X is built on a BSD-derived Unix core, so it follows the UNIX convention.
Now, the reason most people don't know about all this is because nobody really should have to. But while users of Linux distros and Mac OS can open Windows text files in basically any available editor and not even know the difference, the same can't be said for Windows users opening files created in one of the other operating systems.
If you type up a simple text file in Ubuntu and save it in the default "Unix/Linux" format, in Windows it will appear as one continuous paragraph, with black squares where the line breaks (or new paragraphs) should be. While you can open the file in a more advanced text editor (or proper word processor) to view it as it should look, others you've sent it to are likely to just double-click it and let it open in Notepad (which, before Windows 10 version 1809, could only handle MS-DOS EOL).
Occasionally, the reverse is the issue, but you can convert Windows text files to UNIX easily with Gedit, as well as convert them via the terminal, so hopefully the following guide will be of use.
For more detailed info on End-Of-Line, go to the Wikipedia page.
Or if you're wanting to do the reverse, check out how to convert to Windows format via the terminal and with Save As… in Gedit.
Converting Windows EOL to Linux via the Terminal
If you find the text editor you're using to display Windows files in Ubuntu shows ^M instead of a line break (not very likely with even the most lightweight text editors, but something you'll probably come across if you display the text in a terminal), don't worry – just convert them to Unix/Linux format.
While you can actually open them in Gedit and use Save As… to save over them (or to create copies) in the correct format, for more than a couple of files this would be the long, complicated solution.
By far the quickest and easiest approach is to convert the offending files via the command-line. This way, you could batch-convert hundreds of such files at once, not have to do them individually.
There are actually quite a few ways to do this, but we'll look at a couple of tiny packages you can install, and the related commands to use. The first, the tofrodos package, is undoubtedly the most widely used, so we'll look at it in detail, especially since many of the guides out there are outdated now that the commands it contains have been renamed.
The second is a little package called flip, and since it's tiny and won't cause any issues, it's worth installing as a backup (just in case. I found it useful after trying to get tofrodos going on a new system, before I found out the commands were changed).
There is no actual command tofrodos, as it is just the package that contains the commands todos and fromdos. Currently, the vast majority of online guides will list the commands as unix2dos and dos2unix, but as the developer states:
With this release the symlinks "unix2dos" and "dos2unix" are dropped from the package. This will allow the introduction of the original dos2unix package, which also supports conversion to MacOS style files.
So now you can choose between todos (to convert to Windows) and fromdos (to convert to Linux), or just use fromdos with options (fromdos -u to convert to DOS, and fromdos -d to convert to UNIX; the -d option isn't really needed, as it is the default behaviour for the fromdos command).
We'll use fromdos, as it is easier to remember, and show how to alter a single file, or all text files in a given folder. When you're ready to proceed, open a terminal in the folder containing the text file(s) and use one of the following commands (note that for the purpose of illustration, the .txt suffix is used, but you can specify any other extension for your text files).
To Convert to UNIX/Linux via Terminal:
Single file (remember to replace filename.txt with the actual name of the file)
fromdos filename.txt
All text files in a folder (if the extension differs to .txt, simply replace it in the command)
fromdos *.txt
Similarly, flip is easy to use:
flip -u filename.txt (or flip -u *.txt for multiple files)
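If neither package is installed, the same batch conversion can be approximated with standard tools. The following is a rough sketch, assuming a POSIX-style shell plus mktemp (common but not required by POSIX); convert_dir_to_unix is a hypothetical name, and the .txt extension is only the example used above.

```shell
# Hypothetical helper: convert every *.txt file in the given folder
# from DOS to Unix line endings.
convert_dir_to_unix() {
    dir=$1
    for f in "$dir"/*.txt; do
        [ -f "$f" ] || continue          # skip if the pattern matched nothing
        tmp=$(mktemp) || return 1
        # Write the converted text to a scratch file first, then copy it back
        # over the original so the file keeps its permissions.
        if tr -d '\r' < "$f" > "$tmp" && cat "$tmp" > "$f"; then
            rm -f "$tmp"
        else
            rm -f "$tmp"
            return 1
        fi
    done
}

# Usage (folder path is a placeholder):
# convert_dir_to_unix /path/to/folder
```

Unlike fromdos, this sketch removes every carriage return in the file, not only those at the end of a line, so it is best kept for plain text.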
Task: Convert DOS to Unix using the tr command
Type the following command:
tr -d '\r' < input.file > output.file
Task: Convert DOS to Unix using a Perl one-liner
Type the following command:
perl -pi -e 's/\r\n/\n/g' input.file
Task: Convert UNIX to DOS format using the sed command
Type the following command:
$ sed 's/$'"/`echo \\\r`/" input.txt > output.txt
Note: This may not work with the sed version on every UNIX/Linux variant; refer to your local sed man page for more info.
Task: Convert DOS newlines (CR/LF) to Unix format using the sed command
If you are using the BASH shell, type the following command (press Ctrl-V then Ctrl-M to enter the ^M symbol):
$ sed 's/^M$//' input.txt > output.txt
Note: This may not work with the sed version on every UNIX/Linux variant; refer to your local sed man page for more info.
Converting DOS-UNIX and vice-versa
Text files under Unix end each line with the symbol "\n" (called Line Feed and noted LF, ASCII code 0A). Text files under DOS end each line with the symbol "\r" (called Carriage Return and noted CR, ASCII code 0D) followed by "\n".
Thus, every line in a DOS file ends with a CRLF sequence, or \r\n.
Conversion from DOS to UNIX
Simply delete the "\r" (carriage return) at the end of the line.
The "\r" is symbolically represented by "^M", which is obtained by the following sequence of keys: CTRL-V + CTRL-M.
sed 's/^M$//' file
Note: With the GNU sed (gsed 3.02.80) version, we can use the ASCII notation:
sed 's/\x0D$//' file
Conversion from UNIX to DOS
Just do the opposite of the previous command, namely (the "^M" being entered in the same way, CTRL-V + CTRL-M):
sed 's/$/^M/' file
Note: With the GNU sed (gsed 3.02.80) version, we can use the symbolic notation "\r":
sed 's/$/\r/' file
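When the sed at hand understands neither "\r" nor "\x0D", and typing CTRL-V + CTRL-M is inconvenient (in a script, for example), the carriage return can be built with printf and passed to sed through a shell variable. This is a sketch only, assuming a POSIX shell; the names cr, dos_to_unix_sed and unix_to_dos_sed are made up for illustration.

```shell
cr=$(printf '\r')    # literal carriage return, built without typing Ctrl-V Ctrl-M

# Hypothetical helpers; both read stdin and write stdout.
# DOS to Unix: delete a CR at the end of each line.
dos_to_unix_sed() { sed "s/${cr}\$//"; }

# Unix to DOS: drop any CR already there, then append exactly one.
unix_to_dos_sed() { sed "s/${cr}\$//; s/\$/${cr}/"; }

printf 'one\r\ntwo\r\n' | dos_to_unix_sed | od -c
printf 'one\ntwo\n' | unix_to_dos_sed | od -c
```

Stripping before appending means unix_to_dos_sed does not produce doubled carriage returns when the input is already in DOS format.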
HowTo UNIX - Linux Convert DOS Newlines CR-LF to Unix-Linux Format
Converting DOS-UNIX and vice-versa
Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.
This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links as it develops like a living tree...
Disclaimer:
The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the Softpanorama society. We do not warrant the correctness of the information provided or its fitness for any purpose. The site uses AdSense, so you need to be aware of the Google privacy policy. If you do not want to be tracked by Google, please disable Javascript for this site. This site is perfectly usable without Javascript.
Last modified: March 12, 2019