Unix tr command copies the standard input to the standard output with substitution or deletion of selected characters. In addition it can squeeze repeating characters into a singe character (with option -s). This makes tr a great preprocessing tool for the cut command which in many cases is way too primitive to be useful without this functionality.
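For instance, a minimal sketch of this preprocessing pattern (the file name people.txt and the field numbers are hypothetical): squeezing runs of blanks lets cut treat a single space as the field delimiter:
# collapse runs of spaces so that cut sees single-space-delimited fields
tr -s ' ' < people.txt | cut -d ' ' -f 1,3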
NOTE: Perl also has a tr function, which can be used instead of the command if you need additional flexibility in preprocessing. The semantics are basically the same.
The utility performs a classic alphabet1-to-alphabet2 type of translation, sometimes called 1:1 transliteration, and as such is suitable for implementing the Caesar cipher. Unix inherited tr from Multics as a derivative of the PL/1 TRANSLATE built-in function, which in turn was a generalization of the TR instruction in the System/360 architecture (see IBM System-360 Green Card).
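As a minimal sketch of such a transliteration, here is a Caesar cipher with a shift of three (the shift amount and the sample string are arbitrary; swapping the two sets decodes the text):
# encode: shift every letter three positions forward in the alphabet
echo 'Attack at dawn' | tr 'A-Za-z' 'D-ZA-Cd-za-c'
# decode: reverse the two sets
echo 'Dwwdfn dw gdzq' | tr 'D-ZA-Cd-za-c' 'A-Za-z'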
|
The format of the tr command is somewhat strange -- this is one of the few Unix commands that accepts input only from standard input.
tr [ options ] [ set1 [ set2 ] ]
For example, to convert a file from the regular "\n" line delimiter to the "\0" delimiter required by such utilities as xargs -0 and du --files0-from, you can pipe the file into tr:
cat file | tr '\n' '\0' | du --files0-from -
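A minimal sketch of the same idea feeding xargs (the file filelist.txt, containing one path per line, is hypothetical):
# turn a newline-delimited list of paths into a NUL-delimited one for xargs -0
tr '\n' '\0' < filelist.txt | xargs -0 du -sh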
Input characters in the string set1 are mapped to the corresponding characters in the string set2. Logically set1 and set2 should have equal length. If this is not the case, no error is generated; instead, the sets are made equal: if set2 is shorter, GNU tr pads it by repeating its last character (or, with the -t option, truncates set1 to the length of set2), and if set2 is longer, its excess characters are ignored. Both behaviors are demonstrated just below.
The set1 can be specified as the complement of the listed characters if option -c is given (see below).
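A small demonstration of these unequal-length rules, assuming GNU tr (the sample string is arbitrary):
$ echo 'abcde' | tr 'abcde' 'xy'     # set2 is padded with its last character
xyyyy
$ echo 'abcde' | tr -t 'abcde' 'xy'  # -t truncates set1 to the length of set2
xycde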
Both sets can be specified using individual characters as well as ranges, for example:
tr '{}' '()' < infile > outfile
tr 'A-Z' 'a-z' < infile > outfile
For example, a more correct implementation of changing case from upper to lower (or vice versa) can be specified as follows:
cat names | tr '[:upper:]' '[:lower:]' > lc_names
Classes can be combined to form a more complex set, for example '[:lower:][:upper:]'
The tr utility accepts three additional options which substantially increase its power:
Most Unix administrators are unaware of these options, which are quite useful and greatly extend the usability of this generally very simple command.
-c -- use the complement of set1, i.e. every character not listed in set1
-d -- delete the characters listed in set1 instead of translating them
-s -- squeeze repeated characters in the output into a single character. This makes tr a great preprocessing tool for the cut command
Here is a fuller description of those options:
To replace every nonprinting character, other than valid control characters, with a ? (question mark), enter:
tr -c '[:print:][:cntrl:]' '?' < textfile > newfile
Here is a more complex and rather elegant example in which the goal is to create a list of the words in a file (option -s means "squeeze repeating symbols", see below):
tr -cs '[:lower:][:upper:]' '[\n*]' < text > words
This translates each sequence of characters other than lowercase or uppercase letters into a single newline character. The * (asterisk) causes the tr command to repeat the newline character enough times to make the second string as long as the first string.
Extract digits from a string:
echo "Abc123d56E" | tr -cd '[[:digit:]]'
Output:
12356
For example, to delete common punctuation and special characters:
tr --delete '=;:`"<>,./?!@#$%^&(){}[]'
tr can be used to change the carriage returns at the end of each line into the newline UNIX expects. tr allows you to specify characters as octal values by preceding the value with a backslash, so the command:
tr -d '\015' < pc.file > unix.file
OR
tr -d '\r' < pc.file > unix.file
will remove the carriage return from the carriage return/newline pair used by Microsoft OSes as a line terminator. Please note that this can also be done by the dos2unix utility.
Delete NUL characters:
tr -d '\0' < textfile > newfile
Translate each run of whitespace into a single colon:
tr -s '[:space:]' '[\:*]' < in_file
Replace every sequence of one or more newlines with a single newline:
tr -s '\n' < textfile > newfile
OR
tr -s '\012' < textfile > newfile
Combined with sort and uniq, tr can also produce a word frequency list:
cat infile | tr -cs "[:alnum:]" "\n" | sort | uniq -c | sort -rn
Sets set1 and set2 are specified as strings of characters. Most characters represent themselves. Interpreted sequences are:
- \nnn -- character with octal value nnn
- \xnn -- character with hexadecimal value nn
- \\ -- backslash
- \a -- alert
- \b -- backspace
- \f -- form feed
- \n -- new line
- \r -- carriage return
- \t -- horizontal tab
- \v -- vertical tab
- \E -- escape
- c1-c2 -- all characters from c1 to c2 in ascending order. The character specified by c1 must collate before the character specified by c2.
- [c1-c2] -- same as c1-c2 if both sets use this form
- [c*] -- set2 extended to the length of set1 with the symbol c. In other words fills out the set2 with the character specified by c. This option can be used only at the end of the set2. Any characters specified after the * (asterisk) are ignored.
- [c*N] -- N copies of symbol c. N is considered a decimal integer unless the first digit is a 0; then it is considered an octal integer.
- [:alnum:] -- all letters and digits
- [:alpha:] -- all letters
- [:blank:] -- all horizontal whitespace
- [:cntrl:] -- all control characters
- [:digit:] -- all digits
- [:graph:] -- all printable characters, not including space
- [:lower:] -- all lower case letters
- [:print:] -- all printable characters, including space
- [:punct:] -- all punctuation characters
- [:space:] -- all horizontal or vertical whitespace
- [:upper:] -- all upper case letters
- [:xdigit:] -- all hexadecimal digits
- [=c=] -- all of the characters with the same equivalence class as the character specified by c
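A brief illustration of the fill notation described above, assuming GNU tr (the sample strings are arbitrary):
# [c*] pads set2 with repeated copies of c, here mapping a, b, c and d all to '#'
$ echo 'a1b2c3d4' | tr 'abcd' '[#*]'
#1#2#3#4
# [c*N] gives exactly N copies of c inside set2
$ echo 'abcdef' | tr 'abcdef' '[x*3]yzw'
xxxyzw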
Notes:
(Some examples were adapted from the AIX man page.)
$ echo "aaabbbccc" | tr -s 'abc'
abc
To replace every sequence of one or more new lines with a single new line, enter:
tr -s '\n' < textfile > newfile
OR
tr -s '\012' < textfile > newfile
Replace each sequence of whitespace characters with a single #:
tr -s '[:space:]' '#'
Squeeze runs of blanks (spaces and tabs) into a single space:
tr -s '[:blank:]' ' ' < input.txt > output.txt
$ printf "%s\n" "The cow jumped over the moon" | tr -c 'aeiou' '?' ??e??o???u??e??o?e????e??oo??
The following example creates a list of all the words in `file1' one per line in `file2', where a word is taken to be a maximal string of alphabetic characters. The second string is quoted to protect `\' from the shell. 012 is the ASCII code for newline.
tr -cs A-Za-z '\012' <file1 >file2
Note: you can use a more modern variant:
tr -cs "[:alpha:]" "\n" < file1 > file2
Or:
tr -cs '[:lower:][:upper:]' '\n' < file1 > file2
tr -c '[:print:][:cntrl:]' '?' < textfile > newfile
This example scans a file created in a different locale to find characters that are not printable characters in the current locale and replaces them with a ? sign:
tr -c "[:print:]" '?' < myfile.txt
tr '{}' '()' < textfile > newfile
This translates each { (left brace) to ( (left parenthesis) and each
} (right brace) to ) (right parenthesis). All other characters remain
unchanged.
Translate lowercase characters to uppercase:
tr 'a-z' 'A-Z' < textfile > newfile
Keep only printable characters:
tr -cd "[:print:]" < myfile.txt
Delete NUL characters:
tr -d '\0' < textfile > newfile
tr supports character equivalence. To translate any e-like characters in a variable named FOREIGN_STRING to a plain e, for example, you use
$ printf "$FOREIGN_STRING" | tr "[=e=]" "e"
$ tr -d '\r' < dos.txt > linux.txt
Apple text files have carriage returns instead of line feeds. tr can take care of that as well by replacing the carriage returns.
$ tr '\r' '\n' < apple.txt > linux.txt
The full set of escape sequences recognized by tr is listed above (see also the reference section below).
Like many other simple Unix utilities, tr can be emulated using Perl. See Perl tr function for details. The Perl tr function has the same semantics and supports the same options as the tr command.
For example, here is how to squeeze sequences of spaces and tabs into one single space:
perl -pe 'tr/ \t/ /s; $_ = substr($_,1,-1)'
Here the invocation of the function is "tr/ \t/ /s". As you can see, Perl requires that the sets be delimited with some character (a slash "/" in this example) and that options be specified at the end. Other than that, this is a direct analog of
tr -s ' \t' ' '
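A quick check of the plain shell version (the sample input is arbitrary):
$ printf 'a \t   b\n' | tr -s ' \t' ' '
a b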
Additional examples are available from Perl One Liners
Difference Between tr and sed Command Linux.com
10 Jul 13 - In Linux everything is a file, and sometimes we need to edit files to make changes. There are many command line utilities, such as vim, vi and nano, that allow us to open a file, find a particular word and replace it with the correct one. If we want to modify a large file without opening it, there are also command line utilities such as echo, sed and tr. Sometimes the same modification can be made with either sed or tr, as in the example below. Where both will do, tr is usually preferred because it is faster. Of course, in many practical cases the speed difference is too small to notice.
Suppose we have the string "This+is+test+for+tr+and+sed" and we want to replace '+' with a space ' '. This type of replacement can be done with both tr and sed, as shown below:
[user@test ~]$ echo This+is+test+for+tr+and+sed | tr '+' ' '
This is test for tr and sed
[user@test ~]$ echo This+is+test+for+tr+and+sed | sed 's/\+/ /g'
This is test for tr and sed
We can use both sed and tr as editors for basic text transformations, but there are differences in how the two commands are used.
Difference between tr and sed
The tr command translates, squeezes and deletes characters from standard input, writing to standard output; sed, on the other hand, is a stream editor used to perform basic text transformations on an input stream.
tr performs character-based transformations, while sed performs string-based transformations.
For example
[user@test ~]$ echo I am a good boy | tr 'good' 'test'
I am a tsst bsy
tr has performed a character-based transformation: the mapping is g=t, o=e, o=s, d=t, and because o appears twice in the first set the later rule wins (o=s), so "good" becomes "tsst" and "boy" becomes "bsy", as shown above.
[user@test ~]$ echo I am a good boy | sed 's/good/best/g'
I am a best boy
But sed performs a string-based transformation: every occurrence of the string 'good', however many times it appears, is replaced with 'best'.
In other cases the tr command is more useful and easier. Suppose we have entered braces '{}' by mistake instead of parentheses '()' in a file test.txt; we can translate the braces into parentheses with tr:
[user@test ~]$ tr '{}' '()' < test.txt > newtest.txt
This replaces '{}' with '()' in test.txt and saves the output in newtest.txt.
Mar 05, 2020 | www.networkworld.com
There are many ways to change text on the Linux command line from lowercase to uppercase and vice versa. In fact, you have an impressive set of commands to choose from. This post examines some of the best commands for the job and how you can get them to do just what you want.
Using tr
The tr (translate) command is one of the easiest to use on the command line or within a script. If you have a string that you want to be sure is in uppercase, you just pass it through a tr command like this:
$ echo Hello There | tr [:lower:] [:upper:]
HELLO THERE
Below is an example of using this kind of command in a script when you want to be sure that all of the text that is added to a file is in uppercase for consistency:
#!/bin/bash echo -n "Enter department name: " read dept echo $dept | tr [:lower:] [:upper:] >> deptsSwitching the order to [:upper:] [:lower:] would have the opposite effect, putting all the department names in lowercase:
... ... ...
Oct 30, 2018 | www.tecmint.com
8. Here is an example of breaking a single line of words (sentence) into multiple lines, where each word appears in a separate line.
$ echo "My UID is $UID" My UID is 1000 $ echo "My UID is $UID" | tr " " "\n" My UID is 10009. Related to the previous example, you can also translate multiple lines of words into a single sentence as shown.
$ cat uid.txt
My UID is 1000
$ tr "\n" " " < uid.txt
My UID is 1000
10. It is also possible to translate just a single character, for instance a space into a ":" character, as follows.
$ echo "Tecmint.com =>Linux-HowTos,Guides,Tutorials" | tr " " ":"
Tecmint.com:=>Linux-HowTos,Guides,Tutorials
There are several sequence characters you can use with tr; for more information, see the tr man page.
... ... ...
Oct 22, 2017 | seismo.berkeley.edu
A tr script to remove all non-printing characters from a file is below. Non-printing characters may be invisible, but they cause problems with printing or sending the file via electronic mail. You run it from the Unix command prompt, everything on one line:
> tr -d '\001'-'\011''\013''\014''\016'-'\037''\200'-'\377' < filein > fileout
The meaning of this tr script is that it deletes all characters with octal values from 001 to 011, characters 013 and 014, characters from 016 to 037, and characters from 200 to 377. The other characters are copied over from filein to fileout, and these are printable. Please remember that you cannot fold a line containing the tr command; everything must be on one line, however long it may be. In practice, this script solves some mysterious Unix printing problems.
Type in a text file named "f127.TR" containing the tr line above. Print the file on screen with the cat f127.TR command, replace "filein" and "fileout" with your file names (not the same file), then copy and paste the line and run (execute) it. Please remember this does not solve the Unix end-of-file problem, that is, the character '\000' (also known as a 'null') in the file. Nor does it handle the binary file problem, that is, a file starting with two zero characters ('\060' and '\060').
Sometimes there are invisible characters causing havoc. This tr command line converts tab characters into hashes (#) and formfeed characters into stars (*):
> tr '\011\014' '#*' < filein > fileout
The numeric value of tab is 9, hex 09, octal 011; in C notation it is \t or \011. Formfeed is 12, hex 0C, octal 014; in C notation it is \f or \014. Please note, tr replaces each character from the first (leftmost) set with the corresponding character in the second set. Characters in octal format, like \014, count as one character each.
sed -e 's/\"ฎ\"/ /g' -e 's/\"\"/ /g' < file
lexjansen.com
Non-printable & special characters in clinical trial data create potential problems in producing quality deliverables. There could be major issues, such as incorrect statistics / counts in the deliverables, or minor ones, such as incorrect line breaks, page breaks or the appearance of strange symbols in the reports. Identifying and deleting these characters can pose challenges. When faced with this issue in the pharmaceutical & biotech industry, it is imperative to clean them up. We need to understand the underlying cause and use various techniques to identify and handle them.
strings filename
tr -dc '[:print:]' < oldfile > newfile
Wikipedia
Most versions of tr, including GNU tr and classic Unix tr, operate on single-byte characters and are not Unicode compliant. An exception is the Heirloom Toolchest implementation, which provides basic Unicode support.
Ruby and Perl also have an internal tr operator, which operates analogously. Tcl's string map command is more general in that it maps strings to strings while tr maps characters to characters.
Stack Overflow
This is the command I'm using on a standard web page I wget from online.
tr '<' '\n<' < index.html
however it is giving me newlines, but not adding the '<' back in again. e.g.
echo "<hello><world>" | tr '<' '\n<'
returns
(blank line which is fine)
hello>
world>
instead of
(blank line or not)
<hello>
<world>
Thanks
That's because tr only does character-for-character substitution (or deletion). Try sed instead.
echo '<hello><world>' | sed -e 's/</\n&/g'
Or awk.
echo '<hello><world>' | awk '{gsub(/</,"\n<",$0)}1'
Or perl.
echo '<hello><world>' | perl -pe 's/</\n</g'
Or ruby.
echo '<hello><world>' | ruby -pe '$_.gsub!(/</,"\n<")'
Or python.
echo '<hello><world>' \
  | python -c 'for l in __import__("fileinput").input():print l.replace("<","\n<")'
===
I tried that but I get n<hello>n<world>. I don't know what the sed newline character is Kamran224 Dec 1 '11 at 23:26
====
@Kamran224 This works for me but try: echo -e '<hello><world>' | sed -e 's/</\n&/g' user649198 Dec 1 '11 at 23:29
====
@Kamran224 \n is a GNU sed extension. What system are you on? ephemient Dec 1 '11 at 23:36
====
@ephemient SunOS (afs system on my campus) Kamran224 Dec 1 '11 at 23:43
====
@Jaypal A string of 8 spaces does not equal a tab; you need a literal tab character. The 8-space thing is about tab stops, not tabs. Michael J. Barber Dec 4 '11 at 7:27
====
Does this work for you?
awk -F"><" -v OFS=">\n<" '{print $1,$2}' [jaypal:~/Temp] echo "<hello><world>" | awk -F"><" -v OFS=">\n<" '{$1=$1}1'; <hello> <world>
You can put a regex / / (lines you want this to happen for) in front of the awk {} action.
====
'{$1=$1}1' is shorter and will work if there is more than one >< on a line. ephemient Dec 2 '11 at 0:10
====
Thanks @ephemient I agree, Have updated my answer. jaypal Dec 2 '11 at 0:16
====
This would replace fewer of the < characters than in the question. Michael J. Barber Dec 4 '11 at 7:29
====
If you have GNU grep, this may work for you:
grep -Po '<.*?>[^<]*' index.html
which should pass through all of the HTML, but each tag should start at the beginning of the line with possible non-tag text following on the same line.
If you want nothing but tags:
grep -Po '<.*?>' index.html
You should know, however, that it's not a good idea to parse HTML with regexes.
Aug 9, 2013
For a variety of reasons you can end up with text files on your Unix filesystem that have binary characters in them. In fact, I showed you how to do this to yourself in my blog post about the Unix script command. (There's nothing wrong with this approach; it's just a by-product of using the script command.)
To fix this problem and get the binary characters out of your files, there are several approaches you can take. Probably the easiest solution involves using the Unix tr command. Here's all you have to do to remove non-printable binary characters (garbage) from a Unix text file:
tr -cd '\11\12\15\40-\176' < file-with-binary-chars > clean-file
This command uses the -c and -d arguments to the tr command to remove all the characters from the input stream other than the ASCII octal values that are shown between the single quotes. This command specifically allows the following characters to pass through this Unix filter:
octal 11: tab
octal 12: linefeed
octal 15: carriage return
octal 40 through octal 176: all the "good" keyboard characters
All the other binary characters -- the "garbage" characters in your file -- are stripped out during this translation process.
The following example creates a list of all the words in `file1' one per line in `file2', where a word is taken to be a maximal string of alphabetics. The second string is quoted to protect `\' from the Shell. 012 is the ASCII code for newline.
tr -cs A-Za-z '\012' <file1 >file2
Examples
- To translate braces into parentheses, type:
tr '{}' '()' < textfile > newfile
- To translate braces into brackets, type:
tr '{}' '\[]' < textfile > newfile
This translates each { (left brace) to [ (left bracket) and each } (right brace) to ] (right bracket). The left bracket must be entered with a \ (backslash) escape character.
- To translate lowercase characters to uppercase, type:
tr 'a-z' 'A-Z' < textfile > newfile
- To create a list of words in a file, type:
tr -cs '[:lower:][:upper:]' '[\n*]' < textfile > newfile
This translates each sequence of characters other than lowercase or uppercase letters into a single newline character. The * (asterisk) causes the tr command to repeat the newline character enough times to make the second string as long as the first string.
- To delete all NULL characters from a file, type:
tr -d '\0' < textfile > newfile
- To replace every sequence of one or more newlines with a single newline, type:
tr -s '\n' < textfile > newfile
OR
tr -s '\012' < textfile > newfile
- To replace every nonprinting character, other than valid control characters, with a ? (question mark), type:
tr -c '[:print:][:cntrl:]' '[?*]' < textfile > newfile
This scans a file created in a different locale to find characters that are not printable characters in the current locale.
- To replace every sequence of characters in the <space> character class with a single # character, type:
tr -s '[:space:]' '[#*]'
Mar 3, 2009 | UNIX BASH scripting
Just to introduce a good use of Linux tr command; if you need to concatenate the digits from a string, here is a way:
$ echo "Abc123d56E" | tr -cd '[[:digit:]]'
Output:
12356
From the tr man pages:
tr [OPTION]... SET1 [SET2]
-c, -C, --complement: first complement SET1
-d, --delete : delete characters in SET1, do not translate
Similarly:
$ echo "Abc123d56E" | tr -d '[[:digit:]]'
Output:
AbcdE
Create a list of the words in /path/to/file, one per line, enter:
$ tr -cs "[:alpha:]" "\n" < /path/to/file
Where,
- -c : Complement the set of characters in string1
- -s : Replace each input sequence of a repeated character that is listed in SET1 with a single occurrence of that character
Aug 01, 2006 | developerWorks
Translating text
Now that you know at least five different ways of generating some text, let's look at doing some simple translations on it.
The tr command lets you translate characters in one set to the corresponding characters in a second set. Let's take a look at a few examples (Listing 4) to see how it works.
Listing 4. Using tr to translate characters
echo "a test" | tr t p
echo "a test" | tr aest 1234
echo "a test" | tr -d t
echo "a test" | tr '[:lower:]' '[:upper:]'
Looking at the output of these commands (see Listing 5) gives you a clue about how tr works (here's a hint: it's a direct replacement of characters in the first set with the corresponding characters from the second set).
chrish@dhcp3 [199]$ echo "a test" | tr t p
a pesp
chrish@dhcp3 [200]$ echo "a test" | tr aest 1234
1 4234
chrish@dhcp3 [201]$ echo "a test" | tr -d t
a es
chrish@dhcp3 [202]$ echo "a test" | tr '[:lower:]' '[:upper:]'
A TEST
The first and second examples are simple enough, replacing one character for another. The third example, with the -d option (delete), removes the specified characters completely from the output. This is often used to remove carriage returns from DOS text files to turn them into UNIX text files. Finally, the last example uses character classes (those names inside of [: :]) to convert all lower-case letters into upper-case letters. Portable Operating System Interface-standard (POSIX-standard) character classes include:
- alnum: alphanumeric characters
- alpha: alphabetic characters
- cntrl: control (non-printing) characters
- digit: numeric characters
- graph: graphic characters
- lower: lower-case alphabetic characters
- print: printable characters
- punct: punctuation characters
- space: whitespace characters
- upper: upper-case characters
- xdigit: hexadecimal characters
Listing 6. Converting DOS text files into UNIX text files
tr -d '\r' < input_dos_file.txt > output_unix_file.txt
Although the tr command respects C locale environment variables (try man locale for more information about these), don't expect it to do anything sensible with UTF-8 documents, such as being able to replace lower-case accented characters with appropriate upper-case characters. The tr command works best with ASCII and the other standard C locales.
The following example is a complete awk program, which prints the number of occurrences of each word in its input. It illustrates the associative nature of awk arrays by using strings as subscripts. It also demonstrates the `for x in array' construction. Finally, it shows how awk can be used in conjunction with other utility programs to do a useful task of some complexity with a minimum of effort. Some explanations follow the program listing.
awk '
# Print list of word frequencies
{
    for (i = 1; i <= NF; i++)
        freq[$i]++
}
END {
    for (word in freq)
        printf "%s\t%d\n", word, freq[word]
}'
The first thing to notice about this program is that it has two rules. The first rule, because it has an empty pattern, is executed on every line of the input. It uses awk's field-accessing mechanism (see section Examining Fields) to pick out the individual words from the line, and the built-in variable NF (see section Built-in Variables) to know how many fields are available.
For each input word, an element of the array freq is incremented to reflect that the word has been seen an additional time.
The second rule, because it has the pattern END, is not executed until the input has been exhausted. It prints out the contents of the freq table that has been built up inside the first action.
Note that this program has several problems that would prevent it from being useful by itself on real text files:
- Words are detected using the awk convention that fields are separated by whitespace and that other characters in the input (except newlines) don't have any special meaning to awk. This means that punctuation characters count as part of words.
- The awk language considers upper and lower case characters to be distinct. Therefore, `foo' and `Foo' are not treated by this program as the same word. This is undesirable since in normal text, words are capitalized if they begin sentences, and a frequency analyzer should not be sensitive to that.
- The output does not come out in any useful order. You're more likely to be interested in which words occur most frequently, or having an alphabetized table of how frequently each word occurs.
The way to solve these problems is to use other system utilities to process the input and output of the awk script. Suppose the script shown above is saved in the file `frequency.awk'. Then the shell command:
tr A-Z a-z < file1 | tr -cd 'a-z\012' \
    | awk -f frequency.awk \
    | sort +1 -nr
produces a table of the words appearing in `file1' in order of decreasing frequency.
The first tr command in this pipeline translates all the upper case characters in `file1' to lower case. The second tr command deletes all the characters in the input except lower case characters and newlines. The second argument to the second tr is quoted to protect the backslash in it from being interpreted by the shell. The awk program reads this suitably massaged data and produces a word frequency table, which is not ordered.
The awk script's output is now sorted by the sort command and printed on the terminal. The options given to sort in this example specify to sort by the second field of each input line (skipping one field), that the sort keys should be treated as numeric quantities (otherwise `15' would come before `5'), and that the sorting should be done in descending (reverse) order.
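Note that the old `sort +1 -nr` key syntax is obsolete in modern sort implementations; a rough sketch of the equivalent pipeline using the POSIX -k option would be:
# same pipeline with the modern key syntax (-k 2 means the key starts at field 2)
tr A-Z a-z < file1 | tr -cd 'a-z\012' \
    | awk -f frequency.awk \
    | sort -nr -k 2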
See the general operating system documentation for more information on how to use the tr and sort commands.
Shell scripting example
In the following example you will get confirmation before deleting a file. If the user responds in lower case, the tr command does nothing, but if the user responds in upper case, the characters are changed to lower case. This ensures that even if the user responds with YES, YeS, YEs, etc., the script will still remove the file:
#!/bin/bash echo -n "Enter file name : " read myfile echo -n "Are you sure ( yes or no ) ? " read confirmation confirmation="$(echo ${confirmation} | tr 'A-Z' 'a-z')" if [ "$confirmation" == "yes" ]; then [ -f $myfile ] && /bin/rm $myfile || echo "Error - file $myfile not found" else : # do nothing fiRemove all non-printable characters from myfile.txt
$ tr -cd "[:print:]" < myfile.txtRemove all two more successive blank spaces from a copy of the text in a file called input.txt and save output to a new file called output.txt
tr -s ' ' ' ' < input.txt > output.txt
The -d option is used to delete every character listed in set1. Note that tr works on characters, not strings, so the following removes every occurrence of the letters n, a, m, e, s, r and v (not just the word nameserver) from a copy of the text in /etc/resolv.conf and writes the output to a file called ns.ipaddress.txt:
tr -d 'nameserver' < /etc/resolv.conf > ns.ipaddress.txt
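A quick demonstration of this character-wise behavior (the sample line is made up):
# every character that appears in the set is removed, wherever it occurs
$ echo 'nameserver 192.168.1.1' | tr -d 'nameserver'
 192.168.1.1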
In the shell program we use to remove all non-printable ASCII characters from a text file, we tell the tr command to delete every character in the translation process except for the specific characters we specify. In essence, we filter out the undesirable characters. The tr command we use in our program is shown below:
tr -cd '\11\12\40-\176' < $INPUT_FILE > $OUTPUT_FILE
In this command, the variable INPUT_FILE must contain the name of the Solaris file you'll be reading from, and OUTPUT_FILE must contain the name of the output file you'll be writing to. When the -c and -d options of the tr command are used in combination like this, the only characters tr writes to the standard output stream are the characters we've specified on the command line.
Although it may not look very attractive, we're using octal characters in our tr command to make our programming job easier and more efficient. Our command tells tr to retain only the octal characters 11, 12, and 40 through 176 when writing to standard output. Octal character 11 corresponds to the [TAB] character, and octal 12 corresponds to the [LINEFEED] character. The octal characters 40 through 176 correspond to the standard visible keyboard characters, beginning with the [Space] character (octal 40) through the ~ character (octal 176). These are the only characters retained by tr -- the rest are filtered out, leaving us with a clean ASCII file.
Example 1: Change lowercase to uppercase in a file:
D:\temp>more score.txt
john 81 91
mark 82 93
tina 88 92
D:\temp>tr '[a-z]' '[A-Z]' < score.txt > score1.txt
D:\temp>more score1.txt
JOHN 81 91
MARK 82 93
TINA 88 92
LinuxPlanet
LP: Would you talk a little more about the tr utility?
Ah, tr. Well, first thing that comes to mind is that it is the answer to the trivia question, "Name a Linux utility that accepts input only from standard input and never from a file named as an argument on the command line." It is an odd beast that is useful only sometimes--but when it is useful it is very useful. Here is an excerpt that talks about tr:
"The tr utility reads standard input and, for each input character, maps it to an alternate character, deletes the character, or leaves the character alone. This utility reads from standard input and writes to standard output.
"The tr utility is typically used with two arguments, string1 and string2. The position of each character in the two strings is important: Each time tr finds a character from string1 in its input, it replaces that character with the corresponding character from string2.
"With one argument, string1, and the --delete option, tr deletes the characters specified in string1. The option --squeeze-repeats replaces multiple sequential occurrences of characters in string1 with single occurrences (for example, abbc becomes abc).
"You can use a hyphen to represent a range of characters instring1 or string2. The two command lines in the following example produce the same result:
$ echo abcdef | tr 'abcdef' 'xyzabc' xyzabc $ echo abcdef | tr 'a-f' 'x-za-c' xyzabc"The next example demonstrates a popular method for disguising text, often called ROT13 (rotate 13) because it replaces the first letter of the alphabet with the thirteenth, the second with the fourteenth, and so forth.
$ echo The punchline of the joke is ... | > tr 'A-M N-Z a-m n-z' 'N-Z A-M n-z a-m' Gur chapuyvar bs gur wbxr vf ..."To make the text intelligible again, reverse the order of the arguments to tr:
$ echo Gur chapuyvar bs gur wbxr vf ... | > tr 'N-Z A-M n-z a-m' 'A-M N-Z a-m n-z' The punchline of the joke is ..."The --delete option causes tr to delete selected characters:
$ echo If you can read this, you can spot the missing vowels! | > tr --delete 'aeiou' If y cn rd ths, y cn spt th mssng vwls!"In the following example, tr replaces characters and reduces pairs of identical characters to single characters:
$ echo tennessee | tr --squeeze-repeats 'tnse' 'srne' serene"The next example replaces each sequence of nonalphabetic characters (the complement of all the alphabetic characters as specified by the character class alpha) in the file draft1 with a single NEWLINE character. The output is a list of words, one per line.
$ tr --complement --squeeze-repeats '[:alpha:]' '\n' < draft1"The final example uses character classes to upshift the string hi there:
$ echo hi there | tr '[:lower:]' '[:upper:]' HI THERE
Linux Journal
Luckily, we can also use ranges of characters to specify the characters more efficiently:
tr a-z A-Z
Ever had those horrible upper case DOS file names? Here's a Bourne script to take care of them:
for f in *; do
    mv $f `echo $f | tr A-Z a-z`
done
Many UNIX editors allow some text to be processed by the shell. For example, to replace all upper case characters of the next paragraph with lower case while in vi, type:
tr A-Z a-z
As another example, the command:
tr a-z A-Z
capitalizes the current and next line (the character after the ! is a movement character). If you read the International Obfuscated C Code Contest (ftp://ftp.uu.net./pub/ioccc/), you frequently see that part of the hints are coded by a method called rot13. rot13 is a Caesar cypher, i.e., a cypher in which all letters are shifted some number of places. For example, a becomes b, b becomes c, ..., y becomes z, and z becomes a. In rot13 each letter is shifted 13 places. It is a weak cypher, and to decipher it, you can use rot13 again. You can also use tr to read the text in this way:
tr a-zA-Z n-za-mN-ZA-M
Another interesting way to use tr is to change files from Macintosh format to UNIX format. For returns, the Macintosh uses \r while UNIX uses \n. GNU tr allows you to use the C special characters, so type:
tr '\r' '\n'
If you don't have GNU's version of tr, you can always use the corresponding octal numbers as shown here:
tr '\015' '\012'
You might wonder what would happen if the second string is shorter than the first string. POSIX says this is not allowed. System V says that only that portion of the first string is used that has a matching character in the second string. BSD and GNU pad the second string with its final character in order to match the length of the first string. The reason this last method is handy becomes clearer when we take complements into account. Assume you wish to make a list of all words and keywords in your listing. When you use -c, tr complements the first string. In C, all identifiers and keywords consist of a-zA-Z0-9_, so those are the characters we want to keep. Thus, we can do the following:
tr -c 'a-zA-Z0-9_' '\n'
If we pipe the tr output through sort -u, we get our desired list. If we follow POSIX, the second string would have to describe 193 newline characters (described as \n*193 or \n*). If we use System V, only the zero byte is translated to a newline, since the complement of a-zA-Z0-9_ starts with the zero byte.
The second important use of tr is to remove characters. For this option, you use the flag -d with one string as an argument. To fix up those nasty MS-DOS text files with a ^M at the end of the line and a trailing ^Z, specify tr in this way:
tr -d '\015\032'
Many people have written a program in C to do this same operation. Well, a C program isn't necessary--you only need to know the right program, tr, with the right flags. The -d flag isn't used often, but is nice to have when needed. You can combine it with the -c flag to delete everything except characters from the string you supplied as an argument. Repeated characters can be squeezed into a single one using the -s option with one string as an argument. It can also be used to squeeze white space. To remove empty lines, type:
tr -s '\n'
The -s option can be used with two strings as arguments. In that case, tr first translates the text as if -s were not given and then tries to squeeze the characters in the second string. For instance, we can squeeze all standard white space to a single space by specifying:
tr -s '\n' '[ *]'
The -d flag can also be used with two strings: the characters in the first string will be removed and the characters in the second string will be squeezed. tr may not be a great program; however, it gets the job done. It is particularly useful in scripts using pipes and command substitutions (i.e., inside the back quotes). If you use tr often, you'll learn to appreciate its capabilities. Small is beautiful.
Linux Journal
tr is a simple pattern translator. Its practical application overlaps a bit with other, more complex tools, such as sed and awk [with larger binary footprints]. tr is quite useful for simple textual replacements, deletions and additions. Its behavior is dictated by "from" and "to" character sets provided as the first and second argument. The general usage syntax of tr is as follows:
# (12) tr usage
tr [options] "set1" ["set2"] < input > output
Note that tr does not accept file arguments; it reads from standard input and writes to standard output. When two character sets are provided, tr operates on the characters contained in "set1" and performs some amount of substitution based on "set2". Listing 1 demonstrates some of the more common tasks performed with tr.
# (13) Transform lower case alphas to their
#      equivalent upper case.
$ echo "Hello World." | tr "[a-z]" "[A-Z]"
HELLO WORLD.
# (14) Same lower to upper transformation -
#      uses character class names :lower:
#      and :upper:. (tr recognizes 12
#      character class names).
$ tr "[:lower:]" "[:upper:]" < README > UPPER_README
# (15) Make $PATH a bit more readable/searchable -
#      substitute ':' with a line feed
$ echo $PATH | tr ":" "\n"
/usr/bin
/bin
/usr/local/bin
.....
$ echo $PATH | tr ":" "\n" | grep -i "local"
/usr/local/bin
/usr/home/curly/Local_bin
# (16) Remove all white space from a file.
$ tr -d "[:space:]" < README > NO_WHITE_SPACE
# (17) Substitute each single ';' or sequence of ';'
#      with a single ':'
$ echo ";;;;This;;is;a;;;;simple;;;example." \
  | tr -s ";" ":"
:This:is:a:simple:example.
This example takes an echo response of '12345678 9247' and pipes it through tr, replacing the appropriate digits with letters. In this example it would return "computer hope".
echo "12345678 9247" | tr 123456789 computerh
This example takes the file myfile1, strips all non-printable characters, and writes the result to myfile2.
tr -cd '\11\12\40-\176' < myfile1 > myfile2
tr - translate or delete characters
Synopsis
tr [OPTION]... SET1 [SET2]
Description
Translate, squeeze, and/or delete characters from standard input, writing to standard output.
- -c, -C, --complement
- use the complement of SET1
- -d, --delete
- delete characters in SET1, do not translate
- -s, --squeeze-repeats
- replace each input sequence of a repeated character that is listed in SET1 with a single occurrence of that character
- -t, --truncate-set1
- first truncate SET1 to length of SET2
- --help
- display this help and exit
- --version
- output version information and exit
SETs are specified as strings of characters. Most represent themselves. Interpreted sequences are:
- \NNN
- character with octal value NNN (1 to 3 octal digits)
- \\
- backslash
- \a
- audible BEL
- \b
- backspace
- \f
- form feed
- \n
- new line
- \r
- return
- \t
- horizontal tab
- \v
- vertical tab
- CHAR1-CHAR2
- all characters from CHAR1 to CHAR2 in ascending order
- [CHAR*]
- in SET2, copies of CHAR until length of SET1
- [CHAR*REPEAT]
- REPEAT copies of CHAR, REPEAT octal if starting with 0
- [:alnum:]
- all letters and digits
- [:alpha:]
- all letters
- [:blank:]
- all horizontal whitespace
- [:cntrl:]
- all control characters
- [:digit:]
- all digits
- [:graph:]
- all printable characters, not including space
- [:lower:]
- all lower case letters
- [:print:]
- all printable characters, including space
- [:punct:]
- all punctuation characters
- [:space:]
- all horizontal or vertical whitespace
- [:upper:]
- all upper case letters
- [:xdigit:]
- all hexadecimal digits
- [=CHAR=]
- all characters which are equivalent to CHAR
Translation occurs if -d is not given and both SET1 and SET2 appear. -t may be used only when translating. SET2 is extended to length of SET1 by repeating its last character as necessary. Excess characters of SET2 are ignored. Only [:lower:] and [:upper:] are guaranteed to expand in ascending order; used in SET2 while translating, they may only be used in pairs to specify case conversion. -s uses SET1 if not translating nor deleting; else squeezing uses SET2 and occurs after translation or deletion.
Any combination of the options -c, -d, or -s may be used.
The following example creates a list of all the words in filename1, one per line, in filename2, where a word is taken to be a maximal string of alphabetics. The second string is quoted to protect `\' from the shell. 012 is the ASCII code for NEWLINE.
example% tr -cs A-Za-z '\012' <filename1 >filename2
-A | Performs all operations on a byte-by-byte basis using the ASCII collation order for ranges and character classes, instead of the collation order for the current locale. |
-c | Specifies that the value of String1 be replaced by the complement of the string specified by String1. The complement of String1 is all of the characters in the character set of the current locale, except the characters specified by String1. If the -A and -c flags are both specified, characters are complemented with respect to the set of all 8-bit character codes. If the -c and -s flags are both specified, the -s flag applies to characters in the complement of String1. |
-d | Deletes each character from standard input that is contained in the string specified by String1. |
-s | Removes all but the first in a sequence of a repeated characters. Character sequences specified by String1 are removed from standard input before translation, and character sequences specified by String2 are removed from standard output. |
String1 | Specifies a string of characters. |
String2 | Specifies a string of characters. |
Cat-ting our file (columns.txt) and then piping the output of the cat command to the input of the translate command causing all lowercase names to be translated to uppercase names.
cat columns.txt | tr '[a-z]' '[A-Z]'
Remember we have not modified the file columns.txt, so how do we save the output? Simple, by redirecting the output of the translate command with '>' to a file called UpCaseColumns.txt with:
cat columns.txt | tr '[a-z]' '[A-Z]' > UpCaseColumns.txt
Since the tr command does not take a filename like sed did, we could have changed the above example to:
tr '[a-z]' '[A-Z]' < columns.txt > UpCaseColumns.txt
As you can see, the input to the translate command now comes not from stdin but rather from columns.txt. So either way we do it, we can achieve what we've set out to do, using tr as part of a stream, or taking the input from stdin ('<').
We can also use translate in another way: to distinguish between spaces and tabs. Spaces and tabs can be a pain when using scripts to compile system reports. What we need is a way of translating these characters. Now, there are many ways to skin a cat in Linux and shell scripting. I'm going to show you one way, although I'm sure you could now write a sed expression to do the same thing.
Assume that I have a file with a number of columns in it, but I am not sure about the number of spaces or tabs between the different columns; I would need some way of changing these runs of spaces into a single space. Why? Because having a space (one or more) or a tab (one or more) between the columns will produce significantly different output if we extract information from the file with a shell script. How do we convert many spaces or tabs into a single space? Well, translate is our right-hand man (or woman) for this particular task. In order not to waste our time modifying columns.txt, let's work on the free command, which shows you free memory on your system. Type:
free
If you look at the output you will see that there are lots of spaces between each one of these fields. How do we reduce multiple spaces between fields to a single space? We can use tr to squeeze characters (you can squeeze any characters, but in this case we want to squeeze a space):
free | tr -s ' '
The -s switch tells the translate command to squeeze. (Read the info page on tr to find out all the other switches of tr.)
We could squeeze zeroes with:
free | tr -s '0'
Which would obviously make zero sense!
Going back to our previous command of squeezing spaces, you'll see immediately that our memory usage table (which is what the free command produces) becomes much more usable because we've removed superfluous spaces.
Perhaps, we want some fields from the output. We could redirect the output of this into a file with:
free | tr -s ' ' > file.txt
Traditional systems would have you use a Text editor to cut and paste the fields you are interested in, into a new file. Do we want to do that? Absolutely not! We're lazy, we want to find a better way of doing this.
What I'm interested in, is the line that contains 'Mem'. As part of your project, you should be building a set of scripts to monitor your system. Memory sounds like a good one that you may want to save. Instead of just redirecting the tr command to a file, let's first pass it through sed where we extract only the lines beginning with the word "Mem":
free | tr -s ' ' | sed '/^Mem/!d'
This returns only the line that we're interested in. We could run this over and over again, to ensure that the values change.
Let's take this one step further. We're only interested in the second, third and fourth fields of the line (representing total memory, used memory and free memory respectively). How do we retrieve only these fields?
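One way to finish the job (a sketch; the field numbers assume the usual layout of free's "Mem:" line) is to hand the squeezed line to cut, which is exactly the tr-as-preprocessor-for-cut pattern mentioned at the top of this page:
# fields 2, 3 and 4 of the squeezed Mem: line are total, used and free memory
free | tr -s ' ' | sed '/^Mem/!d' | cut -d ' ' -f 2-4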