Softpanorama


Conversion of files from Windows to Unix format


 Note: The following information is provided in part by the Extreme Science and Engineering Discovery Environment (XSEDE), a National Science Foundation (NSF) project that provides researchers with advanced digital resources and services that facilitate scientific discovery. For more, see the XSEDE web site.


The format of Windows and Unix text files differs. In Windows, each line ends with a carriage return followed by a line feed (CRLF), whereas Unix uses only a line feed (LF).

Several utilities convert text files between Unix/Linux and DOS/Windows formats, but it helps to know how to do the conversion manually. In Unix and Linux, each line of a text file ends with a newline, "\n", also known as a line feed (ASCII code 0A hex). In a DOS text file, each line ends with a carriage return, "\r" (ASCII code 0D hex), followed by a line feed, that is, with CRLF or "\r\n". To convert DOS text to Unix or Linux format, remove the "\r"; with GNU sed you can also specify the character by its ASCII code.

As a consequence, some Windows applications will not show the line breaks in Unix-format files. Likewise, Unix programs may display the carriage returns in Windows text files as Ctrl-M (^M) characters at the end of each line.

Notes:

  1. Sometimes, after a file has been edited on both Windows and Unix, it contains some fragments in "Unix style" and some in Windows style. dos2unix may not convert such files cleanly. Sometimes there are even fragments that end with only ^M (a bare carriage return). See How do I convert between Unix an

    In this case you can use Perl one-liners along the following lines, in this order:

    perl -pi -e 's/\r\n/\n/g' input.file
    perl -pi -e 's/\r/\n/g' input.file
  2. To see which problems you have, you can use hexdump or Midnight Commander on Unix, or FAR on Windows. They show which characters are used in a particular file.
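
For a quick overview before converting, the line endings can also be counted with standard tools. This is a minimal sketch, assuming a POSIX shell with tr, wc, and awk; the function name eol_report is made up for illustration.

```shell
# eol_report FILE: hypothetical helper that counts each kind of line ending.
eol_report() {
    # Total number of CR and LF bytes in the file.
    cr=$(tr -cd '\r' < "$1" | wc -c)
    lf=$(tr -cd '\n' < "$1" | wc -c)
    # Records (split on LF) that end in CR are CRLF line endings.
    # Caveat: a final line ending in a bare CR with no LF is counted here too.
    crlf=$(awk '/\r$/ { n++ } END { print n + 0 }' "$1")
    echo "CRLF=$((crlf)) LF-only=$((lf - crlf)) CR-only=$((cr - crlf))"
}
```

If more than one of the three counts is non-zero, the file has mixed line endings. Running od -c input.file shows the raw \r and \n bytes.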

For simple conversions you can use FTP, screen capture, unix2dos and dos2unix, tr, awk, Perl, or vi. You can also use Cygwin.

FTP

When using an FTP program to move a text file between Unix and Windows, be sure the file is transferred in ASCII format, so the document is transformed into a text format appropriate for the host. Some FTP programs, especially graphical applications (e.g., Hummingbird FTP), do this automatically. If you are using command line FTP, before you begin the transfer, enter:

ascii

Note: You need to use a client that supports secure FTP to transfer files to and from Indiana University's central systems. For more, see At IU, what SSH/SFTP clients are supported and where can I get them?

dos2unix and unix2dos

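The utilities dos2unix and unix2dos are available for converting files from the Unix command line. The commands below use the -n (new file) option of the dos2unix package commonly shipped with Linux; without it, that version converts every file named on the command line in place. Some older implementations (for example, on Solaris) take the input and output file names without any option.

To convert a Windows file to a Unix file, enter:

  dos2unix -n winfile.txt unixfile.txt

To convert a Unix file to Windows, enter:

  unix2dos -n unixfile.txt winfile.txt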

tr

You can use tr to remove all carriage returns and Ctrl-z (^Z) characters from a Windows file:

  tr -d '\15\32' < winfile.txt > unixfile.txt

However, you cannot use tr to convert a document from Unix format to Windows, because tr only deletes or maps single characters and cannot insert the extra carriage return.

awk

To use awk to convert a Windows file to Unix, enter:

  awk '{ sub("\r$", ""); print }' winfile.txt > unixfile.txt

To convert a Unix file to Windows, enter:

  awk 'sub("$", "\r")' unixfile.txt > winfile.txt

Older versions of awk do not include the sub function. In such cases, use the same command, but replace awk with gawk or nawk.
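
For files with mixed line endings, as described in the notes above, a single awk pass can handle both CRLF and bare carriage returns. This is a sketch only; the name to_unix is illustrative, and it relies on sub and gsub, so older systems may need nawk or gawk.

```shell
# to_unix: hypothetical filter; reads stdin, writes LF-only text to stdout.
to_unix() {
    awk '{
        sub(/\r$/, "")      # CRLF: drop the CR that precedes the LF
        gsub(/\r/, "\n")    # remaining bare CRs become line feeds
        print
    }'
}
```

Usage: to_unix < winfile.txt > unixfile.txt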

Perl

To convert a Windows text file to a Unix text file using Perl, enter:

  perl -p -e 's/\r$//' < winfile.txt > unixfile.txt

To convert from a Unix text file to a Windows text file, enter:

  perl -p -e 's/\n/\r\n/' < unixfile.txt > winfile.txt

You must use single quotation marks in either command line. This prevents your shell from trying to evaluate anything inside.
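
The Unix-to-Windows commands above add a carriage return to every line, so running them on a file that already has CRLF endings produces \r\r\n. Below is a hedged sketch of a variant that can be run repeatedly without that effect. The name to_dos is illustrative, and awk is used here rather than Perl.

```shell
# to_dos: hypothetical filter; reads stdin, writes CRLF text to stdout.
to_dos() {
    awk '{
        sub(/\r$/, "")          # remove an existing CR, if any
        printf "%s\r\n", $0     # then emit exactly one CRLF
    }'
}
```

Usage: to_dos < unixfile.txt > winfile.txt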

vi

In vi, you can remove carriage return ( ^M ) characters with the following command:

  :1,$s/^M//g

Note: To input the ^M character, press Ctrl-v, and then press Enter or return.

In vim, use :set ff=unix and then write the file (:w) to convert to Unix; use :set ff=dos and then write the file to convert to Windows.

This document was developed with support from National Science Foundation (NSF) grant OCI-1053575. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.

Old News ;-)

Ubuntu Genius's Blog

October 26, 2010 | Ubuntu Genius

Most people don't realise that when they hit the Enter key to create a new paragraph in a text file, something very different is going on behind the scenes in the three major operating systems: Windows, Macintosh and Linux. The "end-of-line delimiter" (often expressed as "End-Of-Line", "End of Line", or just "EOL") – which some of you know as the "line break" or "newline" – is a special character used to designate the end of a line within a text file.

UNIX-based operating systems (like all Linux distros and BSD derivatives) use the line feed character (\n or <LF>), "classic" Mac OS uses a carriage return (\r or <CR>), while DOS/Windows uses a carriage return followed by a line feed (\r\n or <CR><LF>). Mac OS X, being built on a BSD-derived Unix base, follows the UNIX convention.

Now, the reason most people don't know about all this is because nobody really should have to. But while users of Linux distros and Mac OS can open Windows text files in basically any available editor and not even know the difference, the same can't be said for Windows users opening files created in one of the other operating systems.

If you type up a simple text file in Ubuntu and save it in the default "Unix/Linux" format, in Windows it will appear as one continuous paragraph, with black squares where the line breaks (or new paragraphs) should be. While you can open the file in a more advanced text editor (or proper word processor) to view it as it should look, others you've sent it to are just likely to double-click it and let it open in Notepad (which can only handle MS-DOS EOL).

Occasionally, the reverse is the issue, but you can convert Windows text files to UNIX easily with Gedit, as well as convert them via the terminal, so hopefully the following guide will be of use.

For more detailed info on End-Of-Line, go to the Wikipedia page.

Or if you're wanting to do the reverse, check out how to convert to Windows format via the terminal and with Save As… in Gedit.

Converting Windows EOL to Linux via the Terminal

If you find the text editor you're using to display Windows files in Ubuntu shows ^M instead of a line break (not very likely with even the most lightweight text editors, but something you'll probably come across if you display the text in a terminal), don't worry – just convert them to Unix/Linux format.

While you can actually open them in Gedit and use Save As… to save over them (or to create copies) in the correct format, for more than a couple of files this would be the long, complicated solution.

By far the quickest and easiest approach is to convert the offending files via the command line. This way, you can batch-convert hundreds of such files at once rather than doing them individually.
There are actually quite a few ways to do this, but we'll look at a couple of tiny packages you can install, and the related commands to use.

The first – the tofrodos package – is undoubtedly the most widely-used, so we'll look at that in detail – especially since many of the guides out there are outdated, since the commands it contains have been renamed.

The second is a little package called flip, and since it's tiny and won't cause any issues, it's worth installing as a backup, just in case. (I found it useful after trying to get tofrodos going on a new system, before I found out the commands had been changed.)

There is no actual command tofrodos, as it is just the package that contains the commands todos and fromdos. Currently, the vast majority of online guides will list the commands as unix2dos and dos2unix, but as the developer states:

With this release the symlinks "unix2dos" and "dos2unix" are dropped from the package. This will allow the introduction of the original dos2unix package, which also supports conversion to MacOS style files.

So now you can choose to use either todos (to convert to Windows) or fromdos (to convert to Linux), or just fromdos with options (fromdos -u to convert to DOS, and fromdos -d to convert to UNIX, though obviously the -d option really isn't needed, as it is the default behaviour for the fromdos command).

We'll use fromdos, as it is easier to remember, and show how to alter a single file, or all text files in a given folder. When you're ready to proceed, open a terminal in the folder containing the text file(s) and use one of the following commands (note that for the purpose of illustration, the .txt suffix is used, but you can specify any other extension for your text files).

To Convert to UNIX/Linux via Terminal:

Single file (remember to replace filename.txt with the actual name of the file)

fromdos filename.txt

All text files in a folder (if the extension differs from .txt, simply replace it in the command)

fromdos *.txt

Similarly, flip is easy to use:

flip -u filename.txt (or flip -u *.txt for multiple files)
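
If neither tofrodos nor flip is installed, a similar batch conversion can be sketched with find and tr. The function name fromdos_tree and its arguments are illustrative. It removes every carriage return from the matching files, so it is only suitable for plain text files, and it also descends into subfolders.

```shell
# fromdos_tree DIR PATTERN: hypothetical helper, e.g. fromdos_tree . '*.txt'
fromdos_tree() {
    cr=$(printf '\r')
    # Note: file names containing newlines are not handled by this loop.
    find "$1" -type f -name "$2" | while IFS= read -r f; do
        # Skip files without any CR so they are left untouched.
        grep -q "$cr" "$f" || continue
        tmp=$(mktemp) || break
        # Write back with cat so the original file keeps its permissions.
        tr -d '\r' < "$f" > "$tmp" && cat "$tmp" > "$f"
        rm -f "$tmp"
    done
}
```

Quote the pattern when calling the function so that the shell passes it to find unexpanded.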

HowTo UNIX - Linux Convert DOS Newlines CR-LF to Unix-Linux Format

Task: Convert Dos TO Unix Using tr Command

Type the following command:

tr -d '\r' < input.file > output.file

Task: Convert Dos TO Unix Using Perl One Liner

Type the following command:

perl -pi -e 's/\r\n/\n/g' input.file

Task: Convert UNIX to DOS format using sed command

Type the following command:

$ sed 's/$/\r/' input.txt > output.txt
Note: The \r escape in the replacement is a GNU sed feature and may not work with other UNIX/Linux sed variants; refer to your local sed man page for more info.

Task: Convert DOS newlines (CR/LF) to Unix format using sed command

If you are using the Bash shell, type the following command (press Ctrl-V then Ctrl-M to enter the ^M character):

$ sed 's/^M$//' input.txt > output.txt
Note: This sed command may not work under every UNIX/Linux variant; refer to your local sed man page for more info.

Converting DOS-UNIX and vice-versa

Text files under Unix end each line with the symbol "\n" (called Line Feed and noted LF, ASCII code 0A).

Text files under DOS end each line with the symbol "\r" (called Carriage Return and noted CR, ASCII code 0D) followed by "\n".
Thus, every line in a DOS file ends with a CRLF sequence, or \r\n.

Conversion from DOS to UNIX


Simply delete the "\r" (carriage return) at the end of the line.
The "\r" is symbolically represented by "^M", which is obtained by the following sequence of keys: "CTRL-V" + "CTRL-M".

sed 's/^M$//' file

Note:

With the GNU sed (gsed 3.02.80) version, we can use the ASCII notation:

sed 's/\x0D$//' file

Conversion from UNIX to DOS


Just do the opposite of the previous command (the "^M" is entered in the same way, with CTRL-V + CTRL-M):

sed 's/$/^M/' file

Note:

With the GNU sed (gsed 3.02.80) version, we can use the symbolic notation "\r":

sed 's/$/\r/' file
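
Typing a literal ^M is awkward in scripts, and the "\r" and "\x0D" notations depend on the sed implementation. One workaround, sketched here under the assumption of a POSIX shell, is to let printf produce the carriage return and pass it to sed through a variable. The variable and function names are illustrative.

```shell
CR=$(printf '\r')   # a literal carriage return character

# DOS to Unix: delete a CR at the end of each line.
sed_dos2unix() { sed "s/$CR\$//"; }

# Unix to DOS: append a CR at the end of each line.
sed_unix2dos() { sed "s/\$/$CR/"; }
```

Usage: sed_dos2unix < dosfile > unixfile, or sed_unix2dos < unixfile > dosfile. The backslash before the $ keeps the shell from treating it as the start of a variable name inside the double quotes.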


Recommended Links

HowTo UNIX - Linux Convert DOS Newlines CR-LF to Unix-Linux Format

Converting DOS-UNIX and vice-versa

Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

Last modified: March 12, 2019