Softpanorama


dos2unix - DOS to UNIX text converter


Introduction

dos2unix converts plain text files in DOS/Mac format to Unix format. By default the file is converted in place (old file mode, -o). The -k (--keepdate) option preserves the date stamp of the file.

The Dos2unix package includes utilities "dos2unix" and "unix2dos" to convert plain text files in DOS or Mac format to Unix format and vice versa.

In DOS/Windows text files a line break, also known as a newline, is a combination of two characters: a Carriage Return (CR) followed by a Line Feed (LF). In Unix text files a line break is a single character: the Line Feed (LF). In Mac text files, prior to Mac OS X, a line break was a single Carriage Return (CR) character. Nowadays Mac OS uses Unix style (LF) line breaks.
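To check which convention a given file follows, it is enough to count its CR and LF bytes. The sketch below is only a heuristic written for illustration (the function name eol_type is made up, and this is not how dos2unix itself classifies files); a file with mixed line breaks is simply reported as "dos".

```shell
# Hypothetical helper: guess the line-break convention of a file by
# counting CR (\r) and LF (\n) bytes. Heuristic only.
eol_type() {
    # $(( ... )) strips the padding some wc implementations add
    cr=$(( $(tr -cd '\r' < "$1" | wc -c) ))
    lf=$(( $(tr -cd '\n' < "$1" | wc -c) ))
    if [ "$cr" -eq 0 ] && [ "$lf" -eq 0 ]; then
        echo "none"     # no line breaks at all
    elif [ "$cr" -eq 0 ]; then
        echo "unix"     # LF only
    elif [ "$lf" -eq 0 ]; then
        echo "mac"      # CR only (Mac OS before OS X)
    else
        echo "dos"      # both CR and LF present
    fi
}
```

For example, eol_type a.txt prints one of none, unix, mac or dos.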

Binary files are automatically skipped, unless conversion is forced.

Non-regular files, such as directories and FIFOs, are automatically skipped.

Symbolic links and their targets are by default kept untouched. Symbolic links can optionally be replaced, or the output can be written to the symbolic link target. Symbolic links on Windows are not supported: Windows symbolic links are always replaced, keeping the targets unchanged.

Dos2unix was modeled after dos2unix under SunOS/Solaris and has similar conversion modes.

Important options are:

-k, --keepdate
Keep the date stamp of the output file the same as that of the input file.
-f, --force
Force conversion of binary files (files containing non-printable characters are considered binary by dos2unix).

Syntax and options

dos2unix [options] [-c convmode] [-o file ...] [-n infile outfile ...] 

Options:

[-hkqV] [--help] [--keepdate] [--quiet] [--version]

If a single file argument is given, that file is converted in place and keeps the same name.

The following options are available:

Conversion modes

MAC MODE
In normal mode line breaks are converted from DOS to Unix and vice
versa. Mac line breaks are not converted.

In Mac mode line breaks are converted from Mac to Unix and vice versa.
DOS line breaks are not changed.

To run in Mac mode use the command-line option "-c mac" or use the
commands "mac2unix" or "unix2mac".

Conversion modes ascii, 7bit, and iso are similar to those of dos2unix/unix2dos under SunOS/Solaris.

ascii
In mode "ascii" only line breaks are converted. This is the default
conversion mode.

Although the name of this mode is ASCII, which is a 7 bit standard,
the actual mode is 8 bit. Always use this mode when converting
Unicode UTF-8 files.

7bit
In this mode all 8 bit non-ASCII characters (with values from 128
to 255) are converted to a 7 bit space.

iso
Characters are converted between a DOS character set (code page)
and ISO character set ISO-8859-1 (Latin-1) on Unix. DOS characters
without ISO-8859-1 equivalent, for which conversion is not
possible, are converted to a dot. The same applies to ISO-8859-1
characters without DOS counterpart.

When only option "-iso" is used dos2unix will try to determine the
active code page. When this is not possible dos2unix will use
default code page CP437, which is mainly used in the USA. To force
a specific code page use options "-437" (US), "-850" (Western
European), "-860" (Portuguese), "-863" (French Canadian), or "-865"
(Nordic). Windows code page CP1252 (Western European) is also
supported with option "-1252". For other code pages use dos2unix in
combination with iconv(1). Iconv can convert between a long list
of character encodings.

Never use ISO conversion on Unicode text files. It will corrupt
UTF-8 encoded files.
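The effect of the 7bit mode described above can be approximated with two tr calls, which may help when dos2unix is not installed. This is a rough stand-in under stated assumptions, not what dos2unix runs internally: the function name to_unix_7bit is made up, and every CR is dropped, not only those that precede an LF. As with the real 7bit mode, it should not be applied to UTF-8 files, since each byte of a multi-byte character would become a separate space.

```shell
# Rough stand-in for "dos2unix -c 7bit", reading stdin and writing stdout.
to_unix_7bit() {
    # 1. delete every CR byte
    # 2. map bytes 0x80-0xFF (octal 200-377) to a space; '[ *]' repeats
    #    the space as often as needed to match the length of the first set
    LC_ALL=C tr -d '\r' | LC_ALL=C tr '\200-\377' '[ *]'
}
```

Usage: to_unix_7bit < in.txt > out.txt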

Examples

Get input from stdin and write output to stdout.

dos2unix

Convert and replace a.txt. Convert and replace b.txt.

dos2unix a.txt b.txt
dos2unix -o a.txt b.txt

Convert and replace a.txt in ASCII conversion mode. Convert and replace b.txt in ISO conversion mode.

dos2unix a.txt -c iso b.txt
dos2unix -c ascii a.txt -c iso b.txt

Convert c.txt and b.txt from Mac to Unix ascii format, each in place.

dos2unix -c mac c.txt b.txt

Convert and replace a.txt while keeping original date stamp.

dos2unix -k a.txt
dos2unix -k -o a.txt

Convert a.txt and write to e.txt.

dos2unix -n a.txt e.txt

Convert a.txt and write to e.txt, keep date stamp of e.txt same as a.txt.

dos2unix -k -n a.txt e.txt

Convert and replace a.txt. Convert b.txt and write to e.txt.

dos2unix a.txt -n b.txt e.txt
dos2unix -o a.txt -n b.txt e.txt

Convert c.txt and write to e.txt. Convert and replace a.txt. Convert and replace b.txt. Convert d.txt and write to f.txt.

dos2unix -n c.txt e.txt -o a.txt b.txt -n d.txt f.txt

Convert all files in a tree

find /srv/www/Public_html -type f -name "*.shtml" ! -path "*/_*" -print0 | xargs -0 dos2unix -k -f

Convert from DOS default code page to Unix Latin-1

dos2unix -iso -n in.txt out.txt

Convert from DOS CP850 to Unix Latin-1

dos2unix -850 -n in.txt out.txt

Convert from Windows CP1252 to Unix Latin-1

dos2unix -1252 -n in.txt out.txt

Convert from Windows CP1252 to Unix UTF-8 (Unicode)

iconv -f CP1252 -t UTF-8 in.txt | dos2unix > out.txt

Convert from Unix Latin-1 to DOS default code page.

unix2dos -iso -n in.txt out.txt

Convert from Unix Latin-1 to DOS CP850

unix2dos -850 -n in.txt out.txt

Convert from Unix Latin-1 to Windows CP1252

unix2dos -1252 -n in.txt out.txt

Convert from Unix UTF-8 (Unicode) to Windows CP1252

unix2dos < in.txt | iconv -f UTF-8 -t CP1252 > out.txt

See also <http://czyborra.com/charsets/codepages.html> and
<http://czyborra.com/charsets/iso8859.html>.

UNICODE
Encodings
There exist different Unicode encodings. On Unix and Linux Unicode
files are typically encoded in UTF-8 encoding. On Windows Unicode text
files can be encoded in UTF-8, UTF-16, or UTF-16 big endian, but are
mostly encoded in UTF-16 format.

Conversion
Unicode text files can have DOS, Unix or Mac line breaks, like regular
text files.

All versions of dos2unix and unix2dos can convert UTF-8 encoded files,
because UTF-8 was designed for backward compatibility with ASCII.

Dos2unix and unix2dos with Unicode UTF-16 support can read little and
big endian UTF-16 encoded text files. To see if dos2unix was built with
UTF-16 support type "dos2unix -V".

The Windows versions of dos2unix and unix2dos convert UTF-16 encoded
files always to UTF-8 encoded files. Unix versions of dos2unix/unix2dos
convert UTF-16 encoded files to the locale character encoding when it
is set to UTF-8. Use the locale(1) command to find out what the locale
character encoding is.

Because UTF-8 formatted text files are well supported on both Windows
and Unix, dos2unix and unix2dos have no option to write UTF-16 files.
All UTF-16 characters can be encoded in UTF-8. Conversion from UTF-16
to UTF-8 is without loss. UTF-16 files will be skipped on Unix when the
locale character encoding is not UTF-8, to prevent accidental loss of
text. When a UTF-16 to UTF-8 conversion error occurs, for instance
when the UTF-16 input file contains an error, the file will be skipped.

ISO and 7-bit mode conversion do not work on UTF-16 files.

Byte Order Mark
On Windows Unicode text files typically have a Byte Order Mark (BOM),
because many Windows programs (including Notepad) add BOMs by default.
See also <http://en.wikipedia.org/wiki/Byte_order_mark>.

On Unix Unicode files typically don't have a BOM. It is assumed that
text files are encoded in the locale character encoding.

Dos2unix can only detect if a file is in UTF-16 format if the file has
a BOM. When a UTF-16 file doesn't have a BOM, dos2unix will see the
file as a binary file.

Use dos2unix in combination with iconv(1) to convert a UTF-16 file
without BOM.

Dos2unix never writes a BOM in the output file, unless you use option
"-m".

Unix2dos writes a BOM in the output file when the input file has a BOM,
or when option "-m" is used.
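Since UTF-16 detection depends on the BOM, it can be useful to look at the first bytes of a file before converting it. The following is a minimal sketch using only dd, od and tr; the function name bom_type is made up for illustration, and the check is independent of dos2unix. It only distinguishes the UTF-8 and UTF-16 signatures (a file starting with FF FE could also be UTF-32 little endian, which this sketch does not try to tell apart).

```shell
# Hypothetical helper: report which BOM, if any, starts a file.
bom_type() {
    # first 3 bytes as a hex string without spaces, e.g. "efbbbf"
    sig=$(dd if="$1" bs=1 count=3 2>/dev/null | od -An -tx1 | tr -d ' \n')
    case "$sig" in
        efbbbf) echo "utf-8" ;;
        fffe*)  echo "utf-16le" ;;
        feff*)  echo "utf-16be" ;;
        *)      echo "none" ;;
    esac
}
```

A result of "none" for a file known to be UTF-16 means it has to go through iconv(1) first, as described above.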

Unicode examples
Convert from Windows UTF-16 (with BOM) to Unix UTF-8

dos2unix -n in.txt out.txt

Convert from Windows UTF-16 (without BOM) to Unix UTF-8

iconv -f UTF-16 -t UTF-8 in.txt | dos2unix > out.txt

Convert from Unix UTF-8 to Windows UTF-8 with BOM

unix2dos -m -n in.txt out.txt

Convert from Unix UTF-8 to Windows UTF-16

unix2dos < in.txt | iconv -f UTF-8 -t UTF-16 > out.txt

EXAMPLES
Read input from 'stdin' and write output to 'stdout'.

dos2unix
dos2unix -l -c mac

Convert and replace a.txt in ascii conversion mode.

dos2unix a.txt

Convert and replace a.txt in ascii conversion mode. Convert and
replace b.txt in 7bit conversion mode.

dos2unix a.txt -c 7bit b.txt
dos2unix -c ascii a.txt -c 7bit b.txt
dos2unix -ascii a.txt -7 b.txt

Convert a.txt from Mac to Unix format.

dos2unix -c mac a.txt
mac2unix a.txt

Convert a.txt from Unix to Mac format.

unix2dos -c mac a.txt
unix2mac a.txt

RECURSIVE CONVERSION
Use dos2unix in combination with the find(1) and xargs(1) commands to
recursively convert text files in a directory tree structure. For
instance to convert all .txt files in the directory tree under the
current directory type:

find . -name '*.txt' -print0 | xargs -0 dos2unix
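Before converting a whole tree it may be worth previewing which files contain a CR byte at all. The sketch below lists them with find and grep; the function name list_cr_files is made up for illustration, and dos2unix is not involved.

```shell
# Hypothetical helper: list files under directory $1 whose names match
# the pattern $2 and which contain at least one CR byte.
list_cr_files() {
    cr=$(printf '\r')   # a literal CR; $(...) only strips trailing newlines
    find "$1" -type f -name "$2" -exec grep -l "$cr" {} +
}
```

Usage: list_cr_files . '*.txt'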

LOCALIZATION
LANG
The primary language is selected with the environment variable
LANG. The LANG variable consists of several parts. The first
part is the language code in lowercase letters. The second is optional
and is the country code in capital letters, preceded by an
underscore. There is also an optional third part: the character
encoding, preceded by a dot. A few examples for POSIX standard
type shells:

export LANG=nl               Dutch
export LANG=nl_NL            Dutch, The Netherlands
export LANG=nl_BE            Dutch, Belgium
export LANG=es_ES            Spanish, Spain
export LANG=es_MX            Spanish, Mexico
export LANG=en_US.iso88591   English, USA, Latin-1 encoding
export LANG=en_GB.UTF-8      English, UK, UTF-8 encoding

For a complete list of language and country codes see the gettext
manual:
<http://www.gnu.org/software/gettext/manual/gettext.html#Language-Codes>

On Unix systems you can use the command locale(1) to get locale
specific information.

LANGUAGE
With the LANGUAGE environment variable you can specify a priority
list of languages, separated by colons. Dos2unix gives preference
to LANGUAGE over LANG. For instance, first Dutch and then German:
"LANGUAGE=nl:de". You have to first enable localization, by setting
LANG (or LC_ALL) to a value other than "C", before you can use a
language priority list through the LANGUAGE variable. See also the
gettext manual:
<http://www.gnu.org/software/gettext/manual/gettext.html#The-LANGUAGE-variable>

If you select a language which is not available you will get the
standard English messages.

DOS2UNIX_LOCALEDIR
With the environment variable DOS2UNIX_LOCALEDIR the LOCALEDIR set
during compilation can be overruled. LOCALEDIR is used to find the
language files. The GNU default value is "/usr/local/share/locale".
Option --version will display the LOCALEDIR that is used.

Example (POSIX shell):

export DOS2UNIX_LOCALEDIR=$HOME/share/locale

RETURN VALUE
On success, zero is returned. When a system error occurs the last
system error will be returned. For other errors 1 is returned.

The return value is always zero in quiet mode, except when wrong
command-line options are used.
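In a script, the return value can be used to stop on the first problem or to collect failures. The wrapper below is a sketch under stated assumptions: the function name convert_all and the DOS2UNIX variable are made up for illustration (dos2unix itself does not read such a variable); the variable only makes it possible to substitute another command for testing.

```shell
# Hypothetical wrapper: convert each named file in place, report
# failures on stderr, and return non-zero if any conversion failed.
convert_all() {
    failed=0
    for f in "$@"; do
        # Deliberately no -q: in quiet mode the return value is always zero
        if ! "${DOS2UNIX:-dos2unix}" -k "$f"; then
            echo "conversion failed: $f" >&2
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}
```

Usage: convert_all a.txt b.txt || echo "some files were not converted"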

STANDARDS
<http://en.wikipedia.org/wiki/Text_file>

<http://en.wikipedia.org/wiki/Carriage_return>

<http://en.wikipedia.org/wiki/Newline>

<http://en.wikipedia.org/wiki/Unicode>

AUTHORS
Benjamin Lin, Bernd Johannes Wuebben (mac2unix mode),
Christian Wurll (add extra newline), Erwin Waterlander (Maintainer)

Project page: <http://waterlan.home.xs4all.nl/dos2unix.html>

SourceForge page: <http://sourceforge.net/projects/dos2unix/>

Freecode: <http://freecode.com/projects/dos2unix>

SEE ALSO
file(1) find(1) iconv(1) locale(1) xargs(1)

dos2unix 2012-09-15 dos2unix(1)


Old News ;-)

How do I convert between Unix and DOS text files?

The Unix and DOS operating systems (which includes Microsoft Windows) differ in the format in which they store text files. DOS places both a line feed and a carriage return character at the end of each line of a text file, but Unix uses only a line feed character. Some DOS applications need to see carriage return characters at the ends of lines, and may treat Unix-format files as giant single lines. Some Unix applications won't recognize the carriage returns added by DOS, and will display Ctrl-m ( ^M ) characters at the end of each line.

There are many ways to solve this problem. This document provides instructions for using FTP, screen capture, unix2dos and dos2unix, tr, awk, Perl, and Emacs to do the conversion. Before you use these utilities, the files you are converting must first be on a Unix computer.

FTP

When using an FTP program to move a text file between Unix and DOS, be sure the file is transferred in ASCII format. This will ensure that the document is transformed into a text format appropriate for the host. Some FTP programs, especially graphical applications like Rapid Filer, do this automatically. If you are using FTP from the Unix or DOS prompt, however, before you begin the file transfer, be sure to enter at the FTP prompt:

  ascii

Screen Capture

You can also convert files from Unix to DOS format when transferring them to a PC with a communications program by selecting ASCII text download. Select this option with your communications program to capture all the text subsequently displayed to your screen, and then enter at the Unix prompt:

  cat unixfile.txt

Replace unixfile.txt with the name of the Unix text file you are transferring. Most communications programs will add carriage returns to the stream of text as they save it to your computer's hard drive. Once the file has finished displaying, abort the text download.

Note: This method may be slow for large text files. Also, no error checking is performed on the file as it is transferred. Line noise may corrupt its contents, especially if you are using a terminal connect program such as HyperTerminal.

dos2unix and unix2dos

On systems using SunOS, the utilities dos2unix and unix2dos are available. These utilities provide a straightforward method for converting files from the Unix command line.

To use either command, simply type the command followed by the name of the file you wish to convert, and the name of a file which will contain the converted results. Thus, to convert a DOS file to a Unix file, at the Unix prompt, enter:

  dos2unix dosfile.txt unixfile.txt

To convert a Unix file to DOS, enter:

  unix2dos unixfile.txt dosfile.txt

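Note: This two-argument form is that of the SunOS utilities. The dos2unix package described earlier on this page treats both names as files to convert in place; there, use the -n option (dos2unix -n dosfile.txt unixfile.txt) to write to a new file. To determine what variety of Unix is running on your computer, see the Knowledge Base document In Unix, how can I display information about the operating system?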

tr

You can use tr to remove all carriage returns and Ctrl-z ( ^Z ) characters from a DOS file by entering:

  tr -d '\15\32' < dosfile.txt > unixfile.txt

You cannot use tr to convert a document from Unix format to DOS.

awk

To use awk to convert a DOS file to Unix, at the Unix prompt, enter:

  awk '{ sub("\r$", ""); print }' dosfile.txt > unixfile.txt

To convert a Unix file to DOS using awk, at the command line, enter:

  awk 'sub("$", "\r")' unixfile.txt > dosfile.txt

On some systems, the version of awk may be old and not include the function sub. If so, try the same command, but with gawk or nawk replacing awk.

Perl

To convert a DOS text file to a Unix text file using Perl, at the Unix shell prompt, enter:

  perl -p -e 's/\r$//' < dosfile.txt > unixfile.txt

To convert from a Unix text file to a DOS text file with Perl, at the Unix shell prompt, enter:

  perl -p -e 's/$/\r/' < unixfile.txt > dosfile.txt

You must use single quotation marks in either command line. This prevents your shell from trying to evaluate anything inside.
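The one-liners above can be wrapped in small filter functions. The sketch below uses awk; the names to_unix and to_dos are made up for illustration. One deliberate difference from the commands above: to_dos first strips an existing trailing CR, so running it on a file that already has DOS line breaks does not produce doubled carriage returns.

```shell
# Hypothetical filters, reading stdin and writing stdout.

# DOS -> Unix: remove a CR at the end of each line
to_unix() {
    awk '{ sub(/\r$/, ""); print }'
}

# Unix -> DOS: end every line with exactly one CR LF pair
to_dos() {
    awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }'
}
```

Usage: to_unix < dosfile.txt > unixfile.txt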

Emacs

You can also convert a DOS file named dosfile.txt to a Unix text file using Emacs, a Unix text editor. Enter at the Unix shell prompt:

  emacs dosfile.txt

This will open the file in the Emacs text editor. To remove all the ^M characters, enter:

  M-% C-q C-m RET RET !

Note: For Emacs, keystrokes are presented in a special format. See the Knowledge Base document In Emacs, how are keystrokes denoted?

It may also be necessary to remove a Ctrl-z at the end of the document. To quickly get to the end of the document in Emacs, type:

  M->

If you see a Ctrl-z at the end of the document, delete it.

To convert a Unix file named unixfile.txt to a DOS text file, first open it in Emacs. At the Unix shell prompt, enter:

  emacs unixfile.txt

This will open the file in the Emacs text editor. To add carriage returns, which will appear as ^M in Emacs, type:

  M-% C-q C-j RET C-q C-m C-q C-j RET !

It may also be necessary to add a Ctrl-z at the end of the document. At the very end of the document, type:

  C-q C-z

This is document acux in domain all from the Knowledge Base.
Last updated on October 16, 2000




Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

Last modified: March, 12, 2019