Those notes are partially based on lecture notes by Professor Nikolai Bezroukov at FDU.
String operators allow you to manipulate the contents of a variable without resorting to AWK or Perl. Modern shells such as bash 3.x or ksh93 support most of the standard string manipulation functions, but in a rather idiosyncratic way. Still, standard functions like length, index, and substr are available in one form or another. Strings can be concatenated by juxtaposition or by using double-quoted strings. You can ensure that variables exist (i.e., are defined and have non-null values), set default values for variables, and catch errors that result from variables not being set. You can also perform basic pattern matching. There are several basic string operations available in bash, ksh93 and similar shells:
|
Shell string processing capabilities were weak, but in bash 4.x they were improved and brought closer to ksh88 capabilities. Most "classic" string handling functions such as index, substr, concatenation, trimming, case conversion, translation of one set of symbols into another, etc., are available directly or indirectly. Regular expressions can now be used for matching strings, using Perl-compatible regex patterns. So if you use bash 4.1+, life is good. Almost ;-)
One interesting idiosyncrasy of Unix shells, including bash, is that many string operators use a curly-bracket syntax unique among programming languages. In shell, any variable can be written as ${name_of_the_variable} instead of $name_of_the_variable. This notation was initially introduced to protect a variable name from merging with the string that follows it, but was later extended to allow string operations on variables. Here is an example in which curly brackets are used to separate the variable $var from the string "_string":
export var='test'
echo ${var}_string   # var is a variable that uses syntax ${var} with the value test
echo $var_string     # var_string is a variable that doesn't exist; echo prints an empty line
In the Korn 88 shell this notation was extended to allow expressions inside the curly brackets. For example:
${var:=moo}
Each operation is encoded using a special symbol or a two-symbol digraph (for example :-, :=, etc.). An argument that the operator may need is positioned after the symbol of the operation. This notation was later extended in ksh93 and adopted by bash and other shells. More recently it was also extended to case conversion using the ^^ and ,, digraphs.
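As a quick illustration (a minimal sketch; the variable name is arbitrary), here is how the two most common digraphs behave:

```shell
unset var
echo "${var:-fallback}"   # :- substitutes a default but leaves var untouched
echo "${var:=moo}"        # := substitutes the default AND assigns it to var
echo "$var"               # var now holds "moo"
```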
This "ksh-originated" group of operators is the most popular and probably the most widely used group of string-handling operators so it makes sense to learn them, if only in order to be able to modify old scripts.
Bash 3.2 introduced the =~ operator with "normal" Perl-style regular expressions, which can be used instead in many cases and is definitely preferable in new scripts that you might write. For more about Perl-compatible regular expressions see Text processing using regex. They are now the de-facto standard and are used in Perl, Python, Java, JavaScript and other modern languages, so good knowledge of them is necessary for any sysadmin.
Let's say we need to establish whether variable $ip appears to be a valid IP address:
for ip in "255.255.255.255" "10.10.10.10" "400.0.0.0" ; do
   echo "=== testing $ip ==="
   if [[ $ip =~ ^[0-2][0-9]{0,2}\.[0-2][0-9]{0,2}\.[0-2][0-9]{0,2}\.[0-2][0-9]{0,2} ]] ; then
      echo "$ip looks like a valid IP"
   else
      echo "$ip is an invalid IP"
   fi
done

Running this fragment produces:

=== testing 255.255.255.255 ===
255.255.255.255 looks like a valid IP
=== testing 10.10.10.10 ===
10.10.10.10 looks like a valid IP
=== testing 400.0.0.0 ===
400.0.0.0 is an invalid IP
In bash-3.1, a string append operator (+=) was added:
PATH+=":~/bin"
echo "$PATH"
In bash 4.1 negative length specifications in the ${var:offset:length} expansion, previously errors, are now treated as offsets from the end of the variable.
Bash 4.1 also extended the printf built-in, which now has a new %(fmt)T specifier that allows time values to be formatted with strftime-like formats.
printf -v can now assign values to array indices.
The read built-in now has a new `-N nchars' option, which reads exactly NCHARS characters, ignoring delimiters like newline.
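A small sketch of the -N option (requires bash 4.1+; the here-string is just sample input):

```shell
# Read exactly 4 characters, ignoring delimiters such as newline
read -N 4 chunk <<< "abcdefgh"
echo "$chunk"   # abcd
```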
If the pattern is a plain string, then "matching pattern substitution" is the combination of the two functions index and substr: only if the index function succeeds is the substr function applied. If a regular expression is used, this is equivalent to the $var =~ s/regex/string/ operation in Perl. See also Search and Replace. Unlike Perl, only basic regular expressions are allowed.
This notation was introduced in ksh88 and still remains very idiosyncratic. We have four operations: #, ##, % and %%. Two perform search/matching from the left of the string, two from the right.
In examples below we will assume that the variable var has value "this is a test" (as produced by execution of statement export var="this is a test")
var="this is a test"
echo ${var#t*is}     # is a test  -- shortest match "this" deleted from the left
echo ${var##t*is}    # a test     -- longest match "this is" deleted from the left
echo ${var%t*st}     # this is a  -- shortest match "test" deleted from the right
echo ${var%%t*st}    # (empty)    -- the whole string matches, so everything is deleted
Here is a test Perl program for the example above that illustrates how # operates:
#!/usr/bin/perl
$var="this is a test";
$pattern='this';
if ( ($k=index($var,$pattern)) > -1 ) {
   substr($var,$k,length($pattern))='';
}
print "var=$var\n";

If executed, it will print:

var= is a test

Note the space after the equal sign: the pattern matches up to the final 's' of "this", so the space is the first unmatched symbol that gets into the result.
Despite shell deficiencies in this area and idiosyncrasies preserved from the 1970s, most classic string operations can be implemented in shell. You can define functions that behave almost exactly like their counterparts in Perl or other "more normal" scripting languages. In case shell facilities are not enough, you can always add functions written in AWK or Perl. It's actually sad that AWK was never integrated into the shell.
There are several ways to get the length of a string:

expr length $string

or

expr "$string" : '.*'

or the parameter expansion ${#string}:

stringZ=abcABC123ABCabc
echo ${#stringZ}               # 15
echo `expr length $stringZ`    # 15
echo `expr "$stringZ" : '.*'`  # 15
A more complex example. Here is a function for validating that a string is no longer than a given maximum length. It requires two parameters: the actual string and the maximum length the string should be.
check_length()   # to call: check_length string max_length_of_string
{
   # check we have the right params
   if (( $# != 2 )) ; then
      echo "check_length needs two parameters: a string and max_length"
      return 1
   fi
   if (( ${#1} > $2 )) ; then
      return 1
   fi
   return 0
}
You could call the function check_length like this:
#!/usr/bin/bash
# test_name
while :
do
   echo -n "Enter customer name: "
   read NAME
   check_length "$NAME" 10 && break
   echo "The string $NAME is longer than 10 characters"
done

echo "$NAME"
expr "$string" : "$substring"   # or, equivalently: expr match "$string" "$substring"
where:
The arguments are converted to strings and the second is considered to be a (basic, the same as used by GNU grep ) regular expression, with a `^' implicitly prepended. The first argument is then matched against this regular expression.
If the match succeeds and REGEX uses `\(' and `\)', the `:' expression returns the part of STRING that matched the sub-expression; otherwise, it returns the number of characters matched.
If the match fails, the `:' operator returns the null string if `\(' and `\)' are used in REGEX, otherwise 0
Only the first `\( ... \)' pair is relevant to the return value; additional pairs are meaningful only for grouping the regular expression operators.
In the regular expression, `\+', `\?', and `\|' are operators which respectively match one or more, zero or one, or separate alternatives. SunOS and other `expr''s treat these as regular characters. (POSIX allows either behavior.)
For example
my_string=abcABC123ABCabc
#          |------|
echo `expr "$my_string" : 'abc[A-Z]*.2'`   # 8

See "shell script - OR in `expr match`" on Unix & Linux Stack Exchange for some additional info.
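To illustrate the \(...\) behavior described above (a sketch; the string is arbitrary), compare the captured text with the character count:

```shell
my_string=abcABC123ABCabc
expr "$my_string" : 'abc[A-Z]*'        # prints 6 -- number of characters matched
expr "$my_string" : 'abc\([A-Z]*\)'    # prints ABC -- the captured sub-expression
```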
There is no such function. In cases when you use it often and need the exact position of the substring in the source string, you will probably be better off switching to Perl or any other scripting language that you know well.
If you just need to check that the substring occurs in the string, you can also use a regular expression with the =~ operator.

You can use pattern matching to determine whether a particular substring is present in a string (the most typical usage of the index function), but you cannot determine its position:

string="abba"
[[ $string =~ bb ]] && echo "bb is contained in string $string"
Note that you do not need to quote variables inside [[...]]. Also, if you need a space in a pattern, you can escape it with a backslash.
A case statement also works and in many cases can be used to emulate the index function:
case "$string" in
   *bb*)
      # Do stuff
      ;;
esac
In the discussion String contains in Bash - Stack Overflow, the following solution was ranked highest; it solves the "space inside the string" problem which haunted previous approaches:
You can use Marcus's answer (* wildcards) outside a case statement, too, if you use double brackets:

string='My long string'
if [[ $string == *"My long"* ]]; then
   echo "It's there!"
fi
The other half-decent way to check whether a string contains a substring is to use grep and the <<< redirection operator. The option -b prints the byte offset of the match, which you can extract from the result, but this is a perversion. If you need the numeric position, switch to Perl or AWK.
if grep -q "$substr" <<< "$string" ; then
   echo "Substring $substr occurs in string"
fi
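If you do need the 1-based position of a substring, delegating to AWK's index function is a simple way out (a sketch; the variable names are arbitrary):

```shell
string="this is a test"
substr="test"
pos=$(awk -v s="$string" -v t="$substr" 'BEGIN { print index(s, t) }')
echo "$pos"   # 11; index() returns 0 when the substring is absent
```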
NOTE: The index function that exists in expr is not what you want -- its second operand is a set of characters (as in tr), not a string.
In other words, the expr index command searches through your first string looking for the first occurrence of any character from your second string. For example, in the command below it recognizes that 'd' is the first character of the alphabet string that belongs to the set "xyzd", and returns its position, 4:

echo `expr index "abcdefghijklmnopqrstuvwxyz" xyzd`   # 4

In general, expr index $string $character_set returns the numerical position in $string of the first character that matches the set of characters defined by $character_set.
stringZ=abcABC123ABCabc
echo `expr index "$stringZ" C12`   # 6  -- position of 'C'
echo `expr index "$stringZ" c`     # 3  -- 'c' is in position 3
This is the close equivalent of strchr() in C. Moreover:
Bash provides two implementations of the substr function, which are not identical:

I recommend using the second one, as its notation is more compact and it does not involve the external command expr. It also looks more modern, as if inspired by Python, although its origin has nothing to do with Python. It is actually a more compact notation than Perl's.
The classic substr function is available via the expr command:
- expr substr $string $position $length
- Extracts $length characters from $string starting at $position. The first character has index one.
stringZ=abcABC123ABCabc
#        123456789......   # 1-based indexing
echo `expr substr $stringZ 1 2`   # ab
echo `expr substr $stringZ 4 3`   # ABC
Notes:
An idiosyncratic but pretty slick implementation of the substring function is also available in bash 3.x as part of the pattern-matching operators, in the form:
${param:offset[:length]}

Extracts a substring of $length characters from $param starting at $offset. The offset is counted from zero, as in Perl.
NOTES:
${string:position:length}

If $length is not given, the rest of the string starting from position $position is extracted:
stringZ=abcABC123ABCabc
#        0123456789.....   # 0-based indexing
echo ${stringZ:0}     # abcABC123ABCabc
echo ${stringZ:1}     # bcABC123ABCabc
echo ${stringZ:7}     # 23ABCabc
echo ${stringZ:7:3}   # 23A  -- three characters of substring
NOTES:
A naive attempt to use a negative offset looks like this:

a=12345678
echo ${a:-4}

intending to print the last four characters of $a. The problem is that ${param:-word} already has a special meaning in shell: return the value after the minus sign if the variable param is undefined or null.
To use negative offsets that begin with a minus sign, separate the minus sign and the colon with a space
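With that caveat, either a space or parentheses make negative offsets work (a sketch; requires a bash version that supports negative offsets):

```shell
a=12345678
echo "${a: -4}"    # 5678 -- note the space before the minus sign
echo "${a:(-4)}"   # 5678 -- parentheses work as well
echo "${a: -6:2}"  # 34   -- a length can still follow a negative offset
```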
If the parameter is "*" or "@", then this is not a substr function. Instead it slices the array of positional parameters, extracting a maximum of $length positional parameters starting at $position.
echo ${*:2}     # Echoes second and following positional parameters.
echo ${@:2}     # Same as above.
echo ${*:2:3}   # Echoes three positional parameters, starting at second.
You can search and replace substring in a variable using ksh syntax:
alpha='This is a test string in which the word "test" is replaced.' beta="${alpha/test/replace}"
The variable beta now contains an edited version of the original string in which the first occurrence of the word "test" has been replaced by "replace". To replace all occurrences, not just the first, use this syntax:
beta="${alpha//test/replace}"
Note the double "//" symbol.
Here is an example in which we replace one string with another in a multi-line block of text:
list="cricket frog cat dog"
poem="I wanna be a x\n\
A x is what I'd love to be\n\
If I became a x\n\
How happy I would be.\n"

for critter in $list; do
   echo -e ${poem//x/$critter}
done

There are several additional capabilities:
${var:pos[:len]}       # extract substring from pos (0-based) for len characters
${var/substr/repl}     # replace first match
${var//substr/repl}    # replace all matches
${var/#substr/repl}    # replace only if the match is at the beginning
${var/%substr/repl}    # replace only if the match is at the end
${#var}                # returns length of $var
${!var}                # indirect expansion
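The anchored forms are easy to confuse with the # and % deletion operators; a short sketch of how they differ from plain replacement:

```shell
var="foo.bar.foo"
echo "${var/foo/X}"     # X.bar.foo  -- first match anywhere
echo "${var//foo/X}"    # X.bar.X    -- every match
echo "${var/#foo/X}"    # X.bar.foo  -- only when the match is at the beginning
echo "${var/%foo/X}"    # foo.bar.X  -- only when the match is at the end
```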
In bash-3.1 a string append operator (+=) was added; it is now the preferred solution:
PATH+=":~/bin"
echo "$PATH"
Traditionally, strings in shell were concatenated by juxtaposition and by using double-quoted strings. For example:
PATH="$PATH:/usr/games"
A double-quoted string in shell is almost identical to a double-quoted string in Perl and performs macro expansion of all variables in it. The minor differences are the treatment of escaped characters and the newline character. If you want the standard escape sequences interpreted, you can use $'string':
#!/bin/bash
# String expansion, introduced with version 2 of Bash.
# Strings of the form $'xxx' have the standard escaped characters interpreted.

echo $'Ringing bell 3 times \a \a \a'   # May only ring once with certain terminals.
echo $'Three form feeds \f \f \f'
echo $'10 newlines \n\n\n\n\n\n\n\n\n\n'
echo $'\102\141\163\150'                # "Bash" -- octal equivalent of characters.

exit 0
Using the wildcard character (?), you can imitate the Perl chop function (which cuts the last character of the string and returns the rest) quite easily:
test="~/bin/"
trimmed_last=${test%?}
trimmed_first=${test#?}
echo "original='$test', trimmed_first='$trimmed_first', trimmed_last='$trimmed_last'"
The first character of a string can also be obtained with printf:
printf -v char "%c" "$source"

Conditional chopping like in the Perl chomp function or the REXX function trim can be done using a while loop, for example:
function trim
{
   target=$1
   while :   # this is an infinite loop
   do
      case $target in
         ' '*) target=${target#?} ;;   ## if $target begins with a space, remove it
         *' ') target=${target%?} ;;   ## if $target ends with a space, remove it
         *) break ;;                   ## no more leading or trailing spaces, so exit the loop
      esac
   done
   echo "$target"   # shell functions cannot return strings; write the result to stdout
}
A more Perl-style method to trim trailing blanks would be
spaces=${source_var##*[! ]}          ## get the trailing blanks of $source_var into $spaces
trimmed_var=${source_var%"$spaces"}  ## strip them from the right

The same trick can be used for removing leading spaces.
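Putting the two steps together (a sketch; source_var is an arbitrary name):

```shell
source_var="   some text   "
spaces=${source_var##*[! ]}          # the run of trailing blanks
trimmed_var=${source_var%"$spaces"}  # strip exactly that run from the right
echo "[$trimmed_var]"                # [   some text]
```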
Operator: ${var:-bar} is useful for providing a default value for a variable.
It works the following way: if $var exists and is not null, it returns $var. If it doesn't exist or is null, it returns bar. This operator does not change the variable $var.
Example:
$ export var=""
$ echo ${var:-one}
one
$ echo $var
More complex example:
sort -nr $1 | head -${2:-10}
A typical usage includes situations when you need to check whether arguments were passed to the script and, if not, assign default values:
#!/bin/bash
export FROM=${1:-"~root/.profile"}
export TO=${2:-"~my/.profile"}
cp -p $FROM $TO
The ${var:=bar} operator works as follows: if $var exists and is not null, it returns $var. If it doesn't exist or is null, it sets $var to bar and returns bar.
Example:
$ export var=""
$ echo ${var:=one}
one
$ echo $var
one
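A side-by-side sketch of :- versus := makes the difference visible:

```shell
unset var
echo "${var:-one}"   # one -- default substituted...
echo "[$var]"        # []  -- ...but var is still empty
echo "${var:=two}"   # two -- default substituted and assigned
echo "[$var]"        # [two]
```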
The [[ ]] construct was introduced in ksh88 as a way to compensate for multiple shortcomings and limitations of the [ ] (test) solution. Essentially it makes the [ ] construct obsolete except for running a program to get a return code. In turn, the double round brackets ((..)) construct made the [[ ]] construct obsolete for integer comparisons.
There are two types of operators that can be used inside double square bracket construct:
Paradoxically, integer comparison operators are represented as strings (-eq, -ne, -gt, etc.) while string comparison operators are represented as symbols ("=", "==", "!=", "<", ">", etc.).
The [[ ]] construct expects an expression. Within [[ and ]] word splitting and wildcard expansion aren't done, making quoting less necessary; variable substitution is still performed.
It can act as an independent statement, as it produces a return code. So constructs like
[[ $string =~ [aeiou] ]] && exit;
are legitimate and are actually a pretty compact way to write an if statement without an else clause that contains a single statement in the then block.
One of the [[ ]] construct's warts is that it redefined == as a pattern-matching operation, which anybody who has programmed in C/C++/Java strongly resents. Later bash versions corrected that by also allowing the Perl-style =~ operator (I think ksh93 allows that too):
string="abba"
[[ $string =~ [aeiou] ]]
echo $?
0
[[ $string =~ h[sdfghjkl] ]]
echo $?
1
Like the [ ] construct, the [[ ]] construct can be used as a separate statement that returns an exit status depending on whether the condition is true or not. With the && and || constructs discussed above, this provides an alternative syntax for if-then and if-else constructs:
if [[ -d $HOME/$user ]] ; then echo " Home for user $user exists..."; fi
can be written simpler as
[[ -d $HOME/$user ]] && echo " Home for user $user exists..."
There are several types of expressions that can be used inside [[ ... ]] construct:
One unpleasant reality (and probably the most common gotcha) of using the legacy [ ... ] integer comparison constructs is that if one of the variables is not initialized, it produces a syntax error. Various tricks are used to avoid this nasty side effect of macro substitution, a legacy of the extremely weak implementation of comparisons in the Bourne shell (there are way too many crazy things in the Bourne shell anyway ;-).
There are two classic tricks to deal with this gotcha in the old [ ... ] construct, and you will be dealing with scripts infested with those old constructs pretty often. They can be, and often are, used simultaneously:
Generally, it is better to initialize most variables explicitly. I know it is difficult as old habits die slowly, but this can be done. Here are the most common "legacy integer comparison operators":
- -eq
- is equal to
if [[ "$a" -eq "$b" ]]
- -ne
- is not equal to
if [[ "$a" -ne "$b" ]]
- -gt
- is greater than
if [[ "$a" -gt "$b" ]]
- -ge
- is greater than or equal to
if [[ "$a" -ge "$b" ]]
- -lt
- is less than
if [[ "$a" -lt "$b" ]]
- -le
- is less than or equal to
if [[ "$a" -le "$b" ]]
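For comparison, the arithmetic ((..)) construct uses conventional symbols and treats an unset variable as 0 instead of raising a syntax error (a sketch):

```shell
a=5 b=3
if (( a > b )); then
   echo "a is greater"
fi

unset c
(( c == 0 )) && echo "an unset variable reads as 0 inside (( ))"
```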
String comparisons are all that remains useful in this construct: for integer comparisons the ((..)) construct is better, and for file tests the older [...] construct is its equal. This topic is discussed at greater length at String Operations in Shell.
Notes:
Operator | True if... |
---|---|
str = pat, str == pat | str matches pat. Note that in case of "==" that's not what you logically expect if you have some experience with C/C++/Java programming! |
str != pat | str does not match pat. |
str1 < str2 | str1 is less than str2 in the collation order used |
str1 > str2 | str1 is greater than str2. |
-n str | str is not null (has length greater than 0). |
-z str | str is null (has length 0). |
file1 -ef file2 | file1 is another name for file2 (hard or symbolic link) |
While we're cleaning up code we wrote in the last chapter, let's fix up the error handling in the highest script. The code for that script is:
filename=${1:?"filename missing."}
howmany=${2:-10}
sort -nr $filename | head -$howmany
Recall that if you omit the first argument (the filename), the shell prints the message highest: 1: filename missing. We can make this better by substituting a more standard "usage" message:
if [[ -z $1 ]]; then
   print 'usage: howmany filename [-N]'
else
   filename=$1
   howmany=${2:-10}
   sort -nr $filename | head -$howmany
fi
It is considered better programming style to enclose all of the code in the if-then-else, but such code can get confusing if you are writing a long script in which you need to check for errors and bail out at several points along the way. Therefore, a more usual style for shell programming is this:
if [[ -z $1 ]]; then
   print 'usage: howmany filename [-N]'
   return 1
fi
filename=$1
howmany=${2:-10}
sort -nr $filename | head -$howmany
There are two types of pattern matching in shell:
Unless you need to modify old scripts it does not make sense to use old ksh-style regex in bash.
(partially borrowed from Bash Regular Expressions | Linux Journal)
Since version 3 of bash (released in 2004), bash implements extended regular expressions, which are mostly compatible with Perl regexes. They are also called POSIX regular expressions, as they are defined in IEEE POSIX 1003.2 (which you should read and understand to use their full power). Extended regular expressions are also used in egrep, so they are well known to system administrators. Please note that Perl regular expressions are equivalent to extended regular expressions with a few additional features.

Extended regular expressions support a set of predefined character classes. When used between brackets, these define commonly used sets of characters. The POSIX character classes implemented in extended regular expressions include:
NOTE: I have problems with GNU bash, version 3.2.25(1)-release (x86_64-redhat-linux-gnu) using those extended classes. It does accept them, but it does not match correctly.
Modifiers are similar to Perl
Extended regex | Perl regex |
a+ | a+ |
a? | a? |
a|b | a|b |
(expression1) | (expression1) |
{m,n} | {m,n} |
{,n} | {,n} |
{m,} | {m,} |
{m} | {m} |
The =~ operator returns 0 (success) if the regular expression matches the string; otherwise it returns 1 (failure).
In addition to doing simple matching, bash regular expressions support sub-patterns surrounded by parenthesis for capturing parts of the match. The matches are assigned to an array variable BASH_REMATCH. The entire match is assigned to BASH_REMATCH[0], the first sub-pattern is assigned to BASH_REMATCH[1], etc..
The following example script takes a regular expression as its first argument and one or more strings to match against. It then cycles through the strings and outputs the results of the match process:
#!/bin/bash

if [[ $# -lt 2 ]]; then
    echo "Usage: $0 PATTERN STRINGS..."
    exit 1
fi

regex=$1
shift
echo "regex: $regex"
echo

while [[ $1 ]]
do
    if [[ $1 =~ $regex ]]; then
        echo "$1 matches"
        i=1
        n=${#BASH_REMATCH[*]}
        while [[ $i -lt $n ]]
        do
            echo "  capture[$i]: ${BASH_REMATCH[$i]}"
            let i++
        done
    else
        echo "$1 does not match"
    fi
    shift
done
Assuming the script is saved in "bashre.sh", the following sample shows its output:
# sh bashre.sh 'aa(b{2,3}[xyz])cc' aabbxcc aabbcc
regex: aa(b{2,3}[xyz])cc

aabbxcc matches
  capture[1]: bbx
aabbcc does not match
Pattern-matching operators were introduced in ksh88 in a very idiosyncratic way. The notation is different from that used by Perl or utilities such as grep, but it has some internal logic to it and is usable for those who program in bash on a regular basis. Everybody else needs to consult the reference each time they use these capabilities (stack overflow ;-). Life is not perfect.
NOTE: While they are hard to remember, there is a handy mnemonic tip:
There are two kinds of pattern matching available: matching from the left and matching from the right.
The operators, with their functions and an example, are shown in the following table (note that on keyboard symbol "#" is to the left and symbol "%" is to the right of dollar sign; that might help to memorize them):
Operator | Meaning | Example |
${var#t*is} | Deletes the shortest possible match from the left: if the pattern matches the beginning of the variable's value, delete the shortest part that matches and return the rest. | var="this is a test"; echo ${var#t*is} → is a test |
${var##t*is} | Deletes the longest possible match from the left: if the pattern matches the beginning of the variable's value, delete the longest part that matches and return the rest. | var="this is a test"; echo ${var##t*is} → a test |
${var%t*st} | Deletes the shortest possible match from the right: if the pattern matches the end of the variable's value, delete the shortest part that matches and return the rest. | var="this is a test"; echo ${var%t*st} → this is a |
${var%%t*st} | Deletes the longest possible match from the right: if the pattern matches the end of the variable's value, delete the longest part that matches and return the rest. | var="this is a test"; echo ${var%%t*st} → (empty string) |
These operators can be used to cut a string both from the right and from the left and extract the necessary part. In the example below this is done with the output of uptime:
cores=`grep processor /proc/cpuinfo | wc -l`
cpuload=`uptime`
cpuload=${cpuload#*average: }
cpuload=${cpuload%%.*}
if (( cpuload > cores + 1 )) ; then
   echo "Server $HOSTNAME overloaded: $cpuload on $cores cores"
fi
For example, the following script changes the extension of all .html files to .htm.
#!/bin/bash
# quickly convert html filenames for use on a DOS system
# only handles file extensions, not filenames

for i in *.html; do
   if [ -f ${i%l} ]; then
      echo "${i%l} already exists"
   else
      mv $i ${i%l}
   fi
done
The classic use for pattern-matching operators is stripping off components of pathnames, such as directory prefixes and filename suffixes. With that in mind, here is an example that shows how all of the operators work. Assume that the variable path has the value /home/billr/mem/long.file.name; then:
Expression          Result
${path##/*/}        long.file.name
${path#/*/}         billr/mem/long.file.name
$path               /home/billr/mem/long.file.name
${path%.*}          /home/billr/mem/long.file
${path%%.*}         /home/billr/mem/long
Example:
$ export var="this is a test"
$ echo ${var#t*is}
is a test
Example:
$ export var="this is a test"
$ echo ${var##t*is}
a test
Example:
$ export var="this is a test"
$ echo ${var%t*st}
this is a
for i in *.htm*; do
   if [ -f ${i%l} ]; then
      echo "${i%l} already exists"
   else
      mv $i ${i%l}
   fi
done
Example:
$ export var="this is a test"
$ echo ${var%%t*st}
KSH-style regular expressions are now obsolete. Please use Perl-style regular expressions, [[ $variable =~ regex ]], instead (available since bash 3.2).
They use an idiosyncratic prefix-based notation that is difficult to learn after you have gotten used to regular suffix-based notation. In other words, each "quantity metasymbol" (quantifier) is used as a prefix to the expression, not as a suffix as everywhere else in Unix (as in ls resolv*, not ls *(resolv)).
This was a huge blunder committed by David Korn, and it was never corrected. In any case, attempts to use it look pretty perverse, and it should be abandoned for good. The notes below are just for those unfortunate people who need to understand somebody else's scripts which use this notation. The information below was extracted from Learning the Korn Shell, 2nd Edition, 4.5. String Operators. I never verified its correctness.
Each such operator has the form x(exp), where x is the particular operator and exp is any regular expression (often simply a regular string). The operator determines how many occurrences of exp a string that matches the pattern can contain.
Operator | Meaning |
---|---|
*(exp) | 0 or more occurrences of exp |
+(exp) | 1 or more occurrences of exp |
?(exp) | 0 or 1 occurrences of exp |
@(exp1|exp2|...) | exp1 or exp2 or... |
!(exp) | Anything that doesn't match exp |
Expression | Matches |
---|---|
x | x |
*(x) | Null string, x, xx, xxx, ... |
+(x) | x, xx, xxx, ... |
?(x) | Null string, x |
!(x) | Any string except x |
@(x) | x (see below) |
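In bash these ksh-style operators are available only after enabling the extglob option; a sketch (the filename tested is arbitrary):

```shell
shopt -s extglob   # must be set before bash parses the pattern

case "resolv.conf" in
   +([a-z]).conf) echo "one or more lowercase letters followed by .conf" ;;
   *)             echo "no match" ;;
esac
```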
The following section compares Korn shell regular expressions to analogous features in awk and egrep. If you aren't familiar with these, skip to the section entitled "Pattern-matching Operators."
Shell | egrep/awk | Meaning |
---|---|---|
*(exp) | exp* | 0 or more occurrences of exp |
+(exp) | exp+ | 1 or more occurrences of exp |
?(exp) | exp? | 0 or 1 occurrences of exp |
@(exp1|exp2|...) | exp1|exp2|... | exp1 or exp2 or... |
!(exp) | (none) | Anything that doesn't match exp |
These equivalents are close but not quite exact. Actually, an exp within any of the Korn shell operators can be a series of exp1|exp2|... alternates. But because the shell would interpret an expression like dave|fred|bob as a pipeline of commands, you must use @(dave|fred|bob) for alternates
It is worth re-emphasizing that shell regular expressions can still contain standard shell wildcards. Thus, the shell wildcard ? (match any single character) is the equivalent of . in egrep or awk, and the shell's character set operator [...] is the same as in those utilities. For example, the expression +([0-9]) matches a number, i.e., one or more digits. The shell wildcard character * is equivalent to the shell regular expression *(?).
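For example, with extglob enabled, +([0-9]) can serve as a quick integer test (a sketch):

```shell
shopt -s extglob
x=12345
[[ $x == +([0-9]) ]] && echo "$x is a number"
[[ "12a" == +([0-9]) ]] || echo "12a is not"
```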
A few egrep and awk regex operators do not have equivalents in the Korn shell. These include:
The first two pairs are hardly necessary, since the Korn shell doesn't normally operate on text files and does parse strings into words itself.
Conversion of strings to lower and upper case was a weak point of the shell for a long time. Before the recent enhancements described below, the "standard" way of converting a string was to use the tr utility:
a='Hi all'
a=$(tr '[:upper:]' '[:lower:]' <<< "$a")
echo "$a"   # hi all
Please note that this is a better solution than
a=$(tr '[A-Z]' '[a-z]' <<< "$a")
Using A-Z assumes that the text is ASCII, which may or may not be the case. Note that tr '[A-Z]' '[a-z]' is incorrect in almost all locales. For example, in the en-US locale, A-Z is actually the interval AaBbCcDdEeFfGgHh...XxYyZ.
In more complex cases Perl or AWK should be used instead.
If you need to work with very old versions of bash (1.x) and do not have access to tr, sed, awk or Perl (poor you ;-), you can create two (or more) functions for this purpose. See Converting string to lower case in Bash - Stack Overflow for inspiration. Here is an example from this thread:
lcs="abcdefghijklmnopqrstuvwxyz"
ucs="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
input="Change Me To All Capitals"
for (( i=0; i<"${#input}"; i++ )) ; do :
for (( j=0; j<"${#lcs}"; j++ )) ; do :
if [[ "${input:$i:1}" == "${lcs:$j:1}" ]] ; then
input="${input/${input:$i:1}/${ucs:$j:1}}"
fi
done
done
The typeset keyword (declare is an alias of typeset) is usually used for specifying an integer type or creating local variables in shell. But recently its functionality was extended, and it now allows performing two really useful string operations: forcing a variable to lower case (-l) and to upper case (-u).
From the moment you set this option, the string will be cast to the specified case on each assignment. It will work until you explicitly turn it off by typing typeset +o, where o is the option you turned on before.
In Korn shell there are two additional useful options (-L and -R) that allow also trimming string to fixed length and remove leading blanks. An obvious application for the -those options is one in which you need fixed-width output.
Here is a simple example taken from Learning the Korn Shell, Chapter 6 (6.3 Arrays).
Assume that the variable alpha is assigned the letters of the alphabet, in alternating case, surrounded by three blanks on each side:
alpha="   aBcDeFgHiJkLmNoPqRsTuVwXyZ   "

Table 6.6 shows some typeset statements and their resulting values (assuming that each of the statements is run independently).

Table 6.6: Examples of typeset String Formatting Options

Statement                   Value of v
typeset -L v=$alpha         "aBcDeFgHiJkLmNoPqRsTuVwXyZ   "
typeset -L10 v=$alpha       "aBcDeFgHiJ"
typeset -R v=$alpha         "   aBcDeFgHiJkLmNoPqRsTuVwXyZ"
typeset -R16 v=$alpha       "kLmNoPqRsTuVwXyZ"
typeset -l v=$alpha         "   abcdefghijklmnopqrstuvwxyz"
typeset -uR5 v=$alpha       "VWXYZ"
typeset -Z8 v="123.50"      "00123.50"
More examples
$ a="A Few Words"
$ declare -l a="$a"
$ echo "$a"
a few words

$ a="A Few Words"
$ declare -u a="$a"
$ echo "$a"
A FEW WORDS
The declaration will work for subsequent assignments too: every string will be forcefully converted to the specified case on assignment. While very convenient in many cases, this can sometimes mangle your strings if you forget that the declaration stays in force until explicitly removed. If such behaviour is not what you want, you need to remove the flag from the variable with declare +l or declare +u.
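A minimal sketch of this attribute lifecycle (the variable name is illustrative; requires bash 4.x):

```shell
declare -l name            # from now on, assignments to name are lowercased
name="MiXeD CaSe"
echo "$name"               # mixed case

declare +l name            # drop the attribute; later assignments are untouched
name="MiXeD CaSe"
echo "$name"               # MiXeD CaSe
```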
Another way to convert a string to lower case in bash 4.x, which does not have such a side effect, is the ,, operator:
$ a=${a,,}
Similarly, to convert a string to upper case you can use the ^^ operator:
a=${a^^}
You can also toggle the case of the first character using the ~ operator (probably inspired by vi):

${a~}
There are also several other options.
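For instance, the case-modification operators accept an optional pattern that restricts which characters are converted. A short sketch (bash 4.x):

```shell
a="hello world"
echo "${a^^}"          # HELLO WORLD  (all characters)
echo "${a^}"           # Hello world  (first character only)
echo "${a^^[aeiou]}"   # hEllO wOrld  (only characters matching the pattern)
```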
A more sophisticated version, which works only with ASCII characters, uses the fact that the distance between corresponding lower- and upper-case characters is the same for all letters of the alphabet (but not for _ or other special characters). In this case you can work with the numeric representation of each letter, which allows implementing more complex variants of conversion (for example, partial transliteration of symbols):
97 - 65 = 32
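You can verify this offset from the shell itself: printf %d with a leading single quote prints a character's code, and %o converts a code back through an octal escape. A sketch:

```shell
printf '%d\n' "'A"               # 65  (leading quote yields the character code)
printf '%d\n' "'a"               # 97
# 65 + 32 = 97, so adding 32 maps an ASCII upper-case letter to lower case
printf "\\$(printf '%o' 97)\n"   # a
```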
And this is the working version with examples. Please note the comments in the code, as they explain a lot of stuff:

#!/bin/bash
# lowerupper.sh

# Prints the lowercase version of a char
lowercaseChar(){
    case "$1" in
        [A-Z])
            n=$(printf "%d" "'$1")
            n=$((n+32))
            printf \\$(printf "%o" "$n")
            ;;
        *)
            printf "%s" "$1"
            ;;
    esac
}

# Prints the lowercase version of a sequence of strings
lowercase() {
    word="$@"
    for((i=0;i<${#word};i++)); do
        ch="${word:$i:1}"
        lowercaseChar "$ch"
    done
}

# Prints the uppercase version of a char
uppercaseChar(){
    case "$1" in
        [a-z])
            n=$(printf "%d" "'$1")
            n=$((n-32))
            printf \\$(printf "%o" "$n")
            ;;
        *)
            printf "%s" "$1"
            ;;
    esac
}

# Prints the uppercase version of a sequence of strings
uppercase() {
    word="$@"
    for((i=0;i<${#word};i++)); do
        ch="${word:$i:1}"
        uppercaseChar "$ch"
    done
}

# The functions will not add a new line, so use echo or
# append it if you want a new line after printing

# Printing stuff directly
lowercase "I AM the Walrus!"$'\n'
uppercase "I AM the Walrus!"$'\n'
echo "----------"

# Printing a var
str="A StRing WITH mixed sTUFF!"
lowercase "$str"$'\n'
uppercase "$str"$'\n'
echo "----------"

# Not quoting the var should also work,
# since we use "$@" inside the functions
lowercase $str$'\n'
uppercase $str$'\n'
echo "----------"

# Assigning to a var
myLowerVar="$(lowercase $str)"
myUpperVar="$(uppercase $str)"
echo "myLowerVar: $myLowerVar"
echo "myUpperVar: $myUpperVar"
echo "----------"

# You can even do stuff like
if [[ 'option 2' = "$(lowercase 'OPTION 2')" ]]; then
    echo "Fine! All the same!"
else
    echo "Ops! Not the same!"
fi

exit 0
And the results after running this:
$ ./lowerupper.sh
i am the walrus!
I AM THE WALRUS!
----------
a string with mixed stuff!
A STRING WITH MIXED STUFF!
----------
a string with mixed stuff!
A STRING WITH MIXED STUFF!
----------
myLowerVar: a string with mixed stuff!
myUpperVar: A STRING WITH MIXED STUFF!
----------
Fine! All the same!
But this is mostly "art for the sake of art". Moreover, it will not work in the Bourne shell, as the Bourne shell does not support the ${word:$i:1} substring syntax. It is, however, portable between all versions of bash in use (the C-style for loop was introduced in bash 2.0, I think).
Here documents and here strings are special forms of shell input redirection (from Wikipedia):
In the following example, text is passed to the tr command (transliterating lower to upper case) using a here document. This could be in a shell file, or entered interactively at a prompt.

$ tr a-z A-Z << END_TEXT
> one two three
> four five six
> END_TEXT
ONE TWO THREE
FOUR FIVE SIX

END_TEXT was used as the delimiting identifier. It specified the start and end of the here document. The redirect and the delimiting identifier do not need to be separated by a space: <<END_TEXT and << END_TEXT both work equally well.

Appending a minus sign to the << has the effect that leading tabs are ignored. This allows indenting here documents in shell scripts (primarily for alignment with existing indentation) without changing their value:
$ tr a-z A-Z <<- END_TEXT
> one two three
> four five six
> END_TEXT
ONE TWO THREE
FOUR FIVE SIX

This yields the same output, notably not indented.
By default, behavior is largely identical to the contents of double quotes: variables are interpolated, commands in backticks are evaluated, etc.

$ cat << EOF
> \$ Working dir "$PWD" `pwd`
> EOF
$ Working dir "/home/user" /home/user

This can be disabled by quoting any part of the label, which is then ended by the unquoted value; the behavior is essentially identical to that if the contents were enclosed in single quotes. Thus for example by setting it in single quotes:

$ cat << 'EOF'
> \$ Working dir "$PWD" `pwd`
> EOF
\$ Working dir "$PWD" `pwd`

Double quotes may also be used, but this is subject to confusion, because expansion does occur in a double-quoted string, but does not occur in a here document with a double-quoted delimiter. Single- and double-quoted delimiters are distinguished in some other languages, notably Perl, where behavior parallels the corresponding string quoting.
Here strings

A here string (available in bash, ksh, and zsh) is syntactically similar, consisting of <<<, and effects input redirection from a word (a sequence treated as a unit by the shell, in this context generally a string literal). Here the usual shell syntax is used for the word, with the only special syntax being the redirection: a here string is an ordinary string used for input redirection, not a special kind of string. The <<< operator was introduced in bash 2.05b.
A single word need not be quoted:
$ tr a-z A-Z <<< one
ONE

In case of a string with spaces, it must be quoted:

$ tr a-z A-Z <<< 'one two three'
ONE TWO THREE

This could also be written as:

$ FOO='one two three'
$ tr a-z A-Z <<< $FOO
ONE TWO THREE

Multiline strings are acceptable, yielding:

$ tr a-z A-Z <<< 'one
> two three'
ONE
TWO THREE

Note that leading and trailing newlines, if present, are included:

$ tr a-z A-Z <<< '
> one
> two three
> '

ONE
TWO THREE

$

The key difference from here documents is that, in here documents, the delimiters are on separate lines and the leading and trailing newlines are stripped; here, the terminating delimiter can be specified.
Here strings are particularly useful for commands that often take short input, such as the calculator bc:
$ bc <<< 2^10
1024

Note that here string behavior can also be accomplished (reversing the order) via piping and the echo command, as in:

$ echo 'one two three' | tr a-z A-Z
ONE TWO THREE

However, here strings are particularly useful when the last command needs to run in the current process, as is the case with the read builtin:

$ echo 'one two three' | read a b c
$ echo $a $b $c

yields nothing, while

$ read a b c <<< 'one two three'
$ echo $a $b $c
one two three

This happens because in the first example piping causes read to run in a subprocess, and as such it cannot affect the environment of the parent process.
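If you are on bash 4.2 or newer, the lastpipe shell option offers another way around this limitation: it runs the last command of a pipeline in the current shell. Note that it only takes effect when job control is off, i.e., in non-interactive scripts. A sketch:

```shell
#!/bin/bash
shopt -s lastpipe            # bash >= 4.2; ignored while job control is on
echo 'one two three' | read a b c
echo "$a $b $c"              # one two three
```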
May 23, 2021 | sookocheff.com
The Bash String Operators Posted on December 11, 2014 | 3 minutes | Kevin Sookocheff
A common task in bash programming is to manipulate portions of a string and return the result. bash provides rich support for these manipulations via string operators. The syntax is not always intuitive so I wanted to use this blog post to serve as a permanent reminder of the operators.
The string operators are signified with the ${} notation. The operations can be grouped into a few classes; each heading in this article describes one class of operation.

Substring Extraction
Extract from a position
${string:position}

Extraction returns a substring of string starting at position and ending at the end of string. string is treated as an array of characters starting at 0.

> string="hello world"
> echo ${string:1}
ello world
> echo ${string:6}
world
Extract from a position with a length

${string:position:length}

Adding a length returns a substring only as long as the length parameter.

> string="hello world"
> echo ${string:1:2}
el
> echo ${string:6:3}
wor

Substring Removal
Remove shortest starting match

${variable#pattern}

If variable starts with pattern, delete the shortest part that matches the pattern.

> string="hello world, hello jim"
> echo ${string#*hello}
 world, hello jim
Remove longest starting match

${variable##pattern}

If variable starts with pattern, delete the longest match from variable and return the rest.

> string="hello world, hello jim"
> echo ${string##*hello}
 jim
Remove shortest ending match

${variable%pattern}

If variable ends with pattern, delete the shortest match from the end of variable and return the rest.

> string="hello world, hello jim"
> echo ${string%hello*}
hello world, 
Remove longest ending match

${variable%%pattern}

If variable ends with pattern, delete the longest match from the end of variable and return the rest.

> string="hello world, hello jim"
> echo ${string%%hello*}

(The output is empty: everything from the first "hello" onward is removed.)

Substring Replacement
Replace first occurrence of word

${variable/pattern/string}

Find the first occurrence of pattern in variable and replace it with string. If string is null, pattern is deleted from variable. If pattern starts with #, the match must occur at the beginning of variable. If pattern starts with %, the match must occur at the end of the variable.

> string="hello world, hello jim"
> echo ${string/hello/goodbye}
goodbye world, hello jim
Replace all occurrences of word

${variable//pattern/string}

Same as above, but this finds all occurrences of pattern in variable and replaces them with string. If string is null, pattern is deleted from variable.

> string="hello world, hello jim"
> echo ${string//hello/goodbye}
goodbye world, goodbye jim
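A practical combination of the removal operators above is splitting a path into its parts, roughly what dirname and basename do (a sketch with an illustrative path):

```shell
path="/home/user/archive.tar.gz"

echo "${path##*/}"    # archive.tar.gz  (strip longest */ prefix, like basename)
echo "${path%/*}"     # /home/user      (strip shortest /* suffix, like dirname)

file="${path##*/}"
echo "${file%%.*}"    # archive         (strip longest .* suffix: bare name)
echo "${file#*.}"     # tar.gz          (strip shortest *. prefix: full extension)
```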
May 10, 2021 | www.xmodulo.com
When you need to split a string in bash, you can use bash's built-in read command. This command reads a single line of string from stdin, and splits the string on a delimiter. The split elements are then stored in either an array or separate variables supplied with the read command. The default delimiters are whitespace characters (' ', '\t', '\r', '\n'). If you want to split a string on a custom delimiter, you can specify the delimiter in the IFS variable before calling read.

# strings to split
var1="Harry Samantha Bart Amy"
var2="green:orange:black:purple"

# split a string by one or more whitespaces, and store the result in an array
read -a my_array <<< $var1

# iterate the array to access individual split words
for elem in "${my_array[@]}"; do
    echo $elem
done

echo "----------"

# split a string by a custom delimiter
IFS=':' read -a my_array2 <<< $var2
for elem in "${my_array2[@]}"; do
    echo $elem
done

The output:

Harry
Samantha
Bart
Amy
----------
green
orange
black
purple
May 10, 2021 | www.xmodulo.com
Remove a Trailing Newline Character from a String in Bash
If you want to remove a trailing newline or carriage return character from a string, you can use bash's parameter expansion in the following form:

${string%$var}

This expression implies that if the "string" contains a trailing character stored in "var", the result of the expression will be the "string" without that character. For example:

# input string with a trailing newline character
input_line=$'This is my example line\n'
# define a trailing character. For carriage return, replace it with $'\r'
character=$'\n'

echo -e "($input_line)"

# remove a trailing newline character
input_line=${input_line%$character}
echo -e "($input_line)"

The output:

(This is my example line
)
(This is my example line)

Trim Leading/Trailing Whitespaces from a String in Bash

If you want to remove whitespaces at the beginning or at the end of a string (also known as leading/trailing whitespaces), you can use the sed command.

my_str="   This is my example string    "

# original string with leading/trailing whitespaces
echo -e "($my_str)"

# trim leading whitespaces in a string
my_str=$(echo "$my_str" | sed -e "s/^[[:space:]]*//")
echo -e "($my_str)"

# trim trailing whitespaces in a string
my_str=$(echo "$my_str" | sed -e "s/[[:space:]]*$//")
echo -e "($my_str)"

The output:

(   This is my example string    )
(This is my example string    )    ← leading whitespaces removed
(This is my example string)        ← trailing whitespaces removed

If you want to stick with bash's built-in mechanisms, the following bash function can get the job done.

trim() {
    local var="$*"
    # remove leading whitespace characters
    var="${var#"${var%%[![:space:]]*}"}"
    # remove trailing whitespace characters
    var="${var%"${var##*[![:space:]]}"}"
    echo "$var"
}

my_str="   This is my example string    "
echo "($my_str)"
my_str=$(trim "$my_str")
echo "($my_str)"
May 10, 2021 | www.oreilly.com
Table 4-1. Substitution Operators
${varname:-word}
If varname exists and isn't null, return its value; otherwise return word.
Purpose: Returning a default value if the variable is undefined.
Example: ${count:-0} evaluates to 0 if count is undefined.

${varname:=word}
If varname exists and isn't null, return its value; otherwise set it to word and then return its value. Positional and special parameters cannot be assigned this way.
Purpose: Setting a variable to a default value if it is undefined.
Example: ${count:=0} sets count to 0 if it is undefined.

${varname:?message}
If varname exists and isn't null, return its value; otherwise print "varname:" followed by message, and abort the current command or script (non-interactive shells only). Omitting message produces the default message "parameter null or not set".
Purpose: Catching errors that result from variables being undefined.
Example: ${count:?"undefined!"} prints "count: undefined!" and exits if count is undefined.

${varname:+word}
If varname exists and isn't null, return word; otherwise return null.
Purpose: Testing for the existence of a variable.
Example: ${count:+1} returns 1 (which could mean "true") if count is defined.

${varname:offset} and ${varname:offset:length}
Performs substring expansion: returns the substring of $varname starting at offset and up to length characters. The first character in $varname is position 0. If length is omitted, the substring starts at offset and continues to the end of $varname. If offset is less than 0, the position is taken from the end of $varname. If varname is @, the length is the number of positional parameters starting at parameter offset.
Purpose: Returning parts of a string (substrings or slices).
Example: If count is set to frogfootman, ${count:4} returns footman, and ${count:4:4} returns foot.
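One syntactic pitfall with a negative offset is worth a sketch: the minus sign must be separated from the colon by a space (or wrapped in parentheses), otherwise the expression parses as the :- default-value operator:

```shell
count=frogfootman
echo "${count:4}"      # footman
echo "${count:4:4}"    # foot
echo "${count: -3}"    # man            (note the space before the minus)
echo "${count:-3}"     # frogfootman    (no space: this is the :- operator)
```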
Table 4-2. Pattern-Matching Operators
${variable#pattern}
If the pattern matches the beginning of the variable's value, delete the shortest part that matches and return the rest.

${variable##pattern}
If the pattern matches the beginning of the variable's value, delete the longest part that matches and return the rest.

${variable%pattern}
If the pattern matches the end of the variable's value, delete the shortest part that matches and return the rest.

${variable%%pattern}
If the pattern matches the end of the variable's value, delete the longest part that matches and return the rest.

${variable/pattern/string} and ${variable//pattern/string}
The longest match to pattern in variable is replaced by string. In the first form, only the first match is replaced. In the second form, all matches are replaced. If the pattern begins with a #, it must match at the start of the variable. If it begins with a %, it must match at the end of the variable. If string is null, the matches are deleted. If variable is @ or *, the operation is applied to each positional parameter in turn and the expansion is the resultant list.
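The last case, applying an operator to each positional parameter when variable is @ or *, is handy inside functions; a minimal sketch (the function name is illustrative):

```shell
strip_ext() {
    # ${@%.*} applies "remove shortest ending .* match" to every argument
    echo "${@%.*}"
}

strip_ext main.c util.h notes.txt    # prints: main util notes
```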
May 10, 2021 | linuxize.com
Another way of concatenating strings in bash is by appending variables or literal strings to a variable using the += operator:

VAR1="Hello, "
VAR1+=" World"
echo "$VAR1"

Hello, World

The following example (languages.sh) uses the += operator to concatenate strings in a bash for loop:

VAR=""
for ELEMENT in 'Hydrogen' 'Helium' 'Lithium' 'Beryllium'; do
    VAR+="${ELEMENT} "
done
echo "$VAR"

Hydrogen Helium Lithium Beryllium
May 10, 2021 | sites.google.com
The curly-bracket syntax allows for the shell's string operators . String operators allow you to manipulate values of variables in various useful ways without having to write full-blown programs or resort to external UNIX utilities. You can do a lot with string-handling operators even if you haven't yet mastered the programming features we'll see in later chapters.
In particular, string operators let you do the following:
- Ensure that variables exist (i.e., are defined and have non-null values)
- Set default values for variables
- Catch errors that result from variables not being set
- Remove portions of variables' values that match patterns

4.3.1 Syntax of String Operators
The basic idea behind the syntax of string operators is that special characters that denote operations are inserted between the variable's name and the right curly brackets. Any argument that the operator may need is inserted to the operator's right.
The first group of string-handling operators tests for the existence of variables and allows substitutions of default values under certain conditions. These are listed in Table 4.1 . [6]
[6] The colon (
:
) in each of these operators is actually optional. If the colon is omitted, then change "exists and isn't null" to "exists" in each definition, i.e., the operator tests for existence only.
Table 4.1: Substitution Operators

${varname:-word}
If varname exists and isn't null, return its value; otherwise return word.
Purpose: Returning a default value if the variable is undefined.
Example: ${count:-0} evaluates to 0 if count is undefined.

${varname:=word}
If varname exists and isn't null, return its value; otherwise set it to word and then return its value. [7]
Purpose: Setting a variable to a default value if it is undefined.
Example: ${count:=0} sets count to 0 if it is undefined.

${varname:?message}
If varname exists and isn't null, return its value; otherwise print "varname:" followed by message, and abort the current command or script. Omitting message produces the default message "parameter null or not set".
Purpose: Catching errors that result from variables being undefined.
Example: ${count:?"undefined!"} prints "count: undefined!" and exits if count is undefined.

${varname:+word}
If varname exists and isn't null, return word; otherwise return null.
Purpose: Testing for the existence of a variable.
Example: ${count:+1} returns 1 (which could mean "true") if count is defined.

[7] Pascal, Modula, and Ada programmers may find it helpful to recognize the similarity of this to the assignment operators in those languages.
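The note above about the optional colon is easy to demonstrate with a variable that is set but empty; only the colon form treats set-but-null as missing. A sketch:

```shell
empty=""
unset notset

echo "${empty:-default}"    # default  (with : set-but-null triggers the default)
echo "${empty-default}"     #          (prints the empty value: variable is set)
echo "${notset-default}"    # default  (unset triggers it either way)
```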
The first two of these operators are ideal for setting defaults for command-line arguments in case the user omits them. We'll use the first one in our first programming task.
Task 4.1

You have a large album collection, and you want to write some software to keep track of it. Assume that you have a file of data on how many albums you have by each artist. Lines in the file look like this:

14 Bach, J.S.
1 Balachander, S.
21 Beatles
6 Blakey, Art

Write a program that prints the N highest lines, i.e., the N artists by whom you have the most albums. The default for N should be 10. The program should take one argument for the name of the input file and an optional second argument for how many lines to print.
By far the best approach to this type of script is to use built-in UNIX utilities, combining them with I/O redirectors and pipes. This is the classic "building-block" philosophy of UNIX that is another reason for its great popularity with programmers. The building-block technique lets us write a first version of the script that is only one line long:
sort -nr $1 | head -${2:-10}

Here is how this works: the sort(1) program sorts the data in the file whose name is given as the first argument ($1). The -n option tells sort to interpret the first word on each line as a number (instead of as a character string); the -r tells it to reverse the comparisons, so as to sort in descending order.
The output of sort is piped into the head (1) utility, which, when given the argument - N , prints the first N lines of its input on the standard output. The expression -${2:-10} evaluates to a dash ( - ) followed by the second argument if it is given, or to -10 if it's not; notice that the variable in this expression is 2 , which is the second positional parameter.
Assume the script we want to write is called highest . Then if the user types highest myfile , the line that actually runs is:
sort -nr myfile | head -10

Or if the user types highest myfile 22, the line that runs is:

sort -nr myfile | head -22

Make sure you understand how the :- string operator provides a default value.
This is a perfectly good, runnable script-but it has a few problems. First, its one line is a bit cryptic. While this isn't much of a problem for such a tiny script, it's not wise to write long, elaborate scripts in this manner. A few minor changes will make the code more readable.
First, we can add comments to the code; anything between # and the end of a line is a comment. At a minimum, the script should start with a few comment lines that indicate what the script does and what arguments it accepts. Second, we can improve the variable names by assigning the values of the positional parameters to regular variables with mnemonic names. Finally, we can add blank lines to space things out; blank lines, like comments, are ignored. Here is a more readable version:
#
# highest filename [howmany]
#
# Print howmany highest-numbered lines in file filename.
# The input file is assumed to have lines that start with
# numbers. Default for howmany is 10.
#

filename=$1
howmany=${2:-10}
sort -nr $filename | head -$howmany

The square brackets around howmany in the comments adhere to the convention in UNIX documentation that square brackets denote optional arguments.
The changes we just made improve the code's readability but not how it runs. What if the user were to invoke the script without any arguments? Remember that positional parameters default to null if they aren't defined. If there are no arguments, then $1 and $2 are both null. The variable howmany ( $2 ) is set up to default to 10, but there is no default for filename ( $1 ). The result would be that this command runs:
sort -nr | head -10As it happens, if sort is called without a filename argument, it expects input to come from standard input, e.g., a pipe (|) or a user's terminal. Since it doesn't have the pipe, it will expect the terminal. This means that the script will appear to hang! Although you could always type [CTRL-D] or [CTRL-C] to get out of the script, a naive user might not know this.
Therefore we need to make sure that the user supplies at least one argument. There are a few ways of doing this; one of them involves another string operator. We'll replace the line:
filename=$1

with:

filename=${1:?"filename missing."}

This will cause two things to happen if a user invokes the script without any arguments: first the shell will print the somewhat unfortunate message:

highest: 1: filename missing.

to the standard error output. Second, the script will exit without running the remaining code.
With a somewhat "kludgy" modification, we can get a slightly better error message. Consider this code:
filename=$1
filename=${filename:?"missing."}

This results in the message:

highest: filename: missing.

(Make sure you understand why.) Of course, there are ways of printing whatever message is desired; we'll find out how in Chapter 5.
Before we move on, we'll look more closely at the two remaining operators in Table 4.1 and see how we can incorporate them into our task solution. The := operator does roughly the same thing as :- , except that it has the "side effect" of setting the value of the variable to the given word if the variable doesn't exist.
Therefore we would like to use := in our script in place of :- , but we can't; we'd be trying to set the value of a positional parameter, which is not allowed. But if we replaced:
howmany=${2:-10}

with just:

howmany=$2

and moved the substitution down to the actual command line (as we did at the start), then we could use the := operator:

sort -nr $filename | head -${howmany:=10}

Using := has the added benefit of setting the value of howmany to 10 in case we need it afterwards in later versions of the script.
The final substitution operator is :+ . Here is how we can use it in our example: Let's say we want to give the user the option of adding a header line to the script's output. If he or she types the option -h , then the output will be preceded by the line:
ALBUMS  ARTIST

Assume further that this option ends up in the variable header, i.e., $header is -h if the option is set or null if not. (Later we will see how to do this without disturbing the other positional parameters.)
The expression:
${header:+"ALBUMS  ARTIST\n"}

yields null if the variable header is null, or "ALBUMS  ARTIST\n" if it is non-null. This means that we can put the line:

print -n ${header:+"ALBUMS  ARTIST\n"}

right before the command line that does the actual work. The -n option to print causes it not to print a LINEFEED after printing its arguments. Therefore this print statement will print nothing, not even a blank line, if header is null; otherwise it will print the header line and a LINEFEED (\n).
4.3.2 Patterns and Regular Expressions

We'll continue refining our solution to Task 4-1 later in this chapter. The next type of string operator is used to match portions of a variable's string value against patterns. Patterns, as we saw in Chapter 1, are strings that can contain wildcard characters (*, ?, and [] for character sets and ranges).

Wildcards have been standard features of all UNIX shells going back (at least) to the Version 6 Bourne shell. But the Korn shell is the first shell to add to their capabilities. It adds a set of operators, called regular expression (or regexp for short) operators, that give it much of the string-matching power of advanced UNIX utilities like awk(1), egrep(1) (extended grep(1)) and the emacs editor, albeit with a different syntax. These capabilities go beyond those that you may be used to in other UNIX utilities like grep, sed(1) and vi(1).
Advanced UNIX users will find the Korn shell's regular expression capabilities occasionally useful for script writing, although they border on overkill. (Part of the problem is the inevitable syntactic clash with the shell's myriad other special characters.) Therefore we won't go into great detail about regular expressions here. For more comprehensive information, the "last word" on practical regular expressions in UNIX is sed & awk , an O'Reilly Nutshell Handbook by Dale Dougherty. If you are already comfortable with awk or egrep , you may want to skip the following introductory section and go to "Korn Shell Versus awk/egrep Regular Expressions" below, where we explain the shell's regular expression mechanism by comparing it with the syntax used in those two utilities. Otherwise, read on.
4.3.2.1 Regular expression basics

Think of regular expressions as strings that match patterns more powerfully than the standard shell wildcard schema. Regular expressions began as an idea in theoretical computer science, but they have found their way into many nooks and crannies of everyday, practical computing. The syntax used to represent them may vary, but the concepts are very much the same.
A shell regular expression can contain regular characters, standard wildcard characters, and additional operators that are more powerful than wildcards. Each such operator has the form x ( exp ) , where x is the particular operator and exp is any regular expression (often simply a regular string). The operator determines how many occurrences of exp a string that matches the pattern can contain. See Table 4.2 and Table 4.3 .
Table 4.2: Regular Expression Operators

Operator            Meaning
*(exp)              0 or more occurrences of exp
+(exp)              1 or more occurrences of exp
?(exp)              0 or 1 occurrences of exp
@(exp1|exp2|...)    exp1 or exp2 or ...
!(exp)              Anything that doesn't match exp [8]

[8] Actually, !(exp) is not a regular expression operator by the standard technical definition, though it is a handy extension.

Table 4.3: Regular Expression Operator Examples

Expression    Matches
x             x
*(x)          Null string, x, xx, xxx, ...
+(x)          x, xx, xxx, ...
?(x)          Null string, x
!(x)          Any string except x
@(x)          x (see below)

Regular expressions are extremely useful when dealing with arbitrary text, as you already know if you have used grep or the regular-expression capabilities of any UNIX editor. They aren't nearly as useful for matching filenames and other simple types of information with which shell users typically work. Furthermore, most things you can do with the shell's regular expression operators can also be done (though possibly with more keystrokes and less efficiency) by piping the output of a shell command through grep or egrep.
Nevertheless, here are a few examples of how shell regular expressions can solve filename-listing problems. Some of these will come in handy in later chapters as pieces of solutions to larger tasks.
- The emacs editor supports customization files whose names end in .el (for Emacs LISP) or .elc (for Emacs LISP Compiled). List all emacs customization files in the current directory.
- In a directory of C source code, list all files that are not necessary. Assume that "necessary" files end in .c or .h , or are named Makefile or README .
- Filenames in the VAX/VMS operating system end in a semicolon followed by a version number, e.g., fred.bob;23 . List all VAX/VMS-style filenames in the current directory.
Here are the solutions:
- In the first of these, we are looking for files that end in .el with an optional c. The expression that matches this is *.el?(c).
- The second example depends on the four standard subexpressions *.c, *.h, Makefile, and README. The entire expression is !(*.c|*.h|Makefile|README), which matches anything that does not match any of the four possibilities.
- The solution to the third example starts with *\; : the shell wildcard * followed by a backslash-escaped semicolon. Then, we could use the regular expression +([0-9]), which matches one or more characters in the range [0-9], i.e., one or more digits. This is almost correct (and probably close enough), but it doesn't take into account that the first digit cannot be 0. Therefore the correct expression is *\;[1-9]*([0-9]), which matches anything that ends with a semicolon, a digit from 1 to 9, and zero or more digits from 0 to 9.

Regular expression operators are an interesting addition to the Korn shell's features, but you can get along well without them, even if you intend to do a substantial amount of shell programming.
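All three patterns can be tried directly in bash as well, which supports the same ksh-style operators once extglob is enabled. A minimal sketch in a throwaway directory (the filenames are invented for the demo):

```shell
#!/usr/bin/env bash
# The three filename-listing patterns above, tried in bash (invented filenames).
shopt -s extglob        # enables the ksh-style ?( ) *( ) !( ) operators in bash
cd "$(mktemp -d)"
touch site.el site.elc main.c util.h Makefile README notes.txt 'fred.bob;23'

echo *.el?(c)                     # site.el site.elc
echo !(*.c|*.h|Makefile|README)   # everything except the "necessary" files
echo *\;[1-9]*([0-9])             # fred.bob;23
```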
In our opinion, the shell's authors missed an opportunity to build into the wildcard mechanism the ability to match files by type (regular, directory, executable, etc., as in some of the conditional tests we will see in Chapter 5 ) as well as by name component. We feel that shell programmers would have found this more useful than arcane regular expression operators.
The following section compares Korn shell regular expressions to analogous features in awk and egrep . If you aren't familiar with these, skip to the section entitled "Pattern-matching Operators."
4.3.2.2 Korn shell versus awk/egrep regular expressions

Table 4.4 is an expansion of Table 4.2: the middle column shows the equivalents in awk/egrep of the shell's regular expression operators.
Table 4.4: Shell Versus egrep/awk Regular Expression Operators

Korn Shell            egrep/awk         Meaning
*(exp)                exp*              0 or more occurrences of exp
+(exp)                exp+              1 or more occurrences of exp
?(exp)                exp?              0 or 1 occurrences of exp
@(exp1|exp2|...)      exp1|exp2|...     exp1 or exp2 or ...
!(exp)                (none)            Anything that doesn't match exp

These equivalents are close but not quite exact. Actually, an exp within any of the Korn shell operators can be a series of exp1|exp2|... alternates. But because the shell would interpret an expression like dave|fred|bob as a pipeline of commands, you must use @(dave|fred|bob) for alternates by themselves.
For example:
- @(dave|fred|bob) matches dave, fred, or bob.
- *(dave|fred|bob) means "0 or more occurrences of dave, fred, or bob". This expression matches strings like the null string, dave, davedave, fred, bobfred, bobbobdavefredbobfred, etc.
- +(dave|fred|bob) matches any of the above except the null string.
- ?(dave|fred|bob) matches the null string, dave, fred, or bob.
- !(dave|fred|bob) matches anything except dave, fred, or bob.
It is worth re-emphasizing that shell regular expressions can still contain standard shell wildcards. Thus, the shell wildcard ? (match any single character) is the equivalent of . in egrep or awk, and the shell's character set operator [...] is the same as in those utilities. [9] For example, the expression +([0-9]) matches a number, i.e., one or more digits. The shell wildcard character * is equivalent to the shell regular expression *(?).

[9] And, for that matter, the same as in grep, sed, ed, vi, etc.
A few egrep and awk regexp operators do not have equivalents in the Korn shell. These include:
- The beginning- and end-of-line operators ^ and $ .
- The beginning- and end-of-word operators \< and \> .
- Repeat factors like \{ N \} and \{ M , N \} .
The first two pairs are hardly necessary, since the Korn shell doesn't normally operate on text files and does parse strings into words itself.
4.3.3 Pattern-matching Operators

Table 4.5 lists the Korn shell's pattern-matching operators.
Table 4.5: Pattern-matching Operators

Operator                  Meaning
${variable#pattern}       If the pattern matches the beginning of the variable's value, delete the shortest part that matches and return the rest.
${variable##pattern}      If the pattern matches the beginning of the variable's value, delete the longest part that matches and return the rest.
${variable%pattern}       If the pattern matches the end of the variable's value, delete the shortest part that matches and return the rest.
${variable%%pattern}      If the pattern matches the end of the variable's value, delete the longest part that matches and return the rest.

These can be hard to remember, so here's a handy mnemonic device: # matches the front because number signs precede numbers; % matches the rear because percent signs follow numbers.
The classic use for pattern-matching operators is in stripping off components of pathnames, such as directory prefixes and filename suffixes. With that in mind, here is an example that shows how all of the operators work. Assume that the variable path has the value /home/billr/mem/long.file.name; then:

Expression        Result
${path##/*/}      long.file.name
${path#/*/}       billr/mem/long.file.name
$path             /home/billr/mem/long.file.name
${path%.*}        /home/billr/mem/long.file
${path%%.*}       /home/billr/mem/long

The two patterns used here are /*/, which matches anything between two slashes, and .*, which matches a dot followed by anything.

We will incorporate one of these operators into our next programming task.
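These results are easy to check directly; the same operators behave identically in bash and ksh:

```shell
# The table above, reproduced directly (works the same in bash and ksh).
path=/home/billr/mem/long.file.name
echo "${path##/*/}"   # long.file.name
echo "${path#/*/}"    # billr/mem/long.file.name
echo "${path%.*}"     # /home/billr/mem/long.file
echo "${path%%.*}"    # /home/billr/mem/long
```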
Task 4.2

You are writing a C compiler, and you want to use the Korn shell for your front-end. [10]
[10] Don't laugh-many UNIX compilers have shell scripts as front-ends.
Think of a C compiler as a pipeline of data processing components. C source code is input to the beginning of the pipeline, and object code comes out of the end; there are several steps in between. The shell script's task, among many other things, is to control the flow of data through the components and to designate output files.
You need to write the part of the script that takes the name of the input C source file and creates from it the name of the output object code file. That is, you must take a filename ending in .c and create a filename that is similar except that it ends in .o .
The task at hand is to strip the .c off the filename and append .o . A single shell statement will do it:
objname=${filename%.c}.o

This tells the shell to look at the end of filename for .c. If there is a match, return $filename with the match deleted. So if filename had the value fred.c, the expression ${filename%.c} would return fred. The .o is appended to make the desired fred.o, which is stored in the variable objname.
If filename had an inappropriate value (without .c) such as fred.a, the above expression would evaluate to fred.a.o: since there was no match, nothing is deleted from the value of filename, and .o is appended anyway. And, if filename contained more than one dot, e.g., if it were the y.tab.c that is so infamous among compiler writers, the expression would still produce the desired y.tab.o. Notice that this would not be true if we used %% in the expression instead of %. The former operator uses the longest match instead of the shortest, so it would match .tab.c and evaluate to y.o rather than y.tab.o. So the single % is correct in this case.
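The behavior described above can be checked with a quick loop over the three example names from the text:

```shell
# Quick check of the .c -> .o logic from Task 4-2 (sample names from the text).
for filename in fred.c y.tab.c fred.a; do
  objname=${filename%.c}.o
  echo "$filename -> $objname"
done
# fred.c -> fred.o
# y.tab.c -> y.tab.o
# fred.a -> fred.a.o
```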
A longest-match deletion would be preferable, however, in the following task.
Task 4.3

You are implementing a filter that prepares a text file for printer output. You want to put the file's name, without any directory prefix, on the "banner" page. Assume that, in your script, you have the pathname of the file to be printed stored in the variable pathname.
Clearly the objective is to remove the directory prefix from the pathname. The following line will do it:
bannername=${pathname##*/}

This solution is similar to the first line in the examples shown before. If pathname were just a filename, the pattern */ (anything followed by a slash) would not match and the value of the expression would be pathname untouched. If pathname were something like fred/bob, the prefix fred/ would match the pattern and be deleted, leaving just bob as the expression's value. The same thing would happen if pathname were something like /dave/pete/fred/bob: since the ## deletes the longest match, it deletes the entire /dave/pete/fred/.

If we used #*/ instead of ##*/, the expression would have the incorrect value dave/pete/fred/bob, because the shortest instance of "anything followed by a slash" at the beginning of the string is just a slash (/).

The construct ${variable##*/} is actually equivalent to the UNIX utility basename(1). basename takes a pathname as argument and returns the filename only; it is meant to be used with the shell's command substitution mechanism (see below). basename is less efficient than ${variable##*/} because it runs in its own separate process rather than within the shell. Another utility, dirname(1), does essentially the opposite of basename: it returns the directory prefix only. It is equivalent to the Korn shell expression ${variable%/*} and is less efficient for the same reason.

4.3.4 Length Operator

There are two remaining operators on variables. One is ${#varname}, which returns the length of the value of the variable as a character string. (In Chapter 6 we will see how to treat this and similar values as actual numbers so they can be used in arithmetic expressions.) For example, if filename has the value fred.c, then ${#filename} would have the value 6. The other operator (${#array[*]}) has to do with array variables, which are also discussed in Chapter 6.
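For pathnames containing at least one slash, the basename/dirname equivalences and the length operator can be demonstrated directly (a sketch; basename and dirname treat some edge cases, such as trailing slashes, differently from the expansions):

```shell
# Demonstrating the basename/dirname equivalences and the length operator.
# Note: for paths without a slash, ${path%/*} and dirname differ.
path=/home/billr/mem/long.file.name
echo "${path##*/}"    # long.file.name
basename "$path"      # long.file.name
echo "${path%/*}"     # /home/billr/mem
dirname "$path"       # /home/billr/mem

filename=fred.c
echo "${#filename}"   # 6
```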
Sep 11, 2019 | stackoverflow.com
Jeff ,May 8 at 18:30
Given a filename in the form someletters_12345_moreleters.ext, I want to extract the 5 digits and put them into a variable.

So to emphasize the point, I have a filename with x number of characters then a five digit sequence surrounded by a single underscore on either side then another set of x number of characters. I want to take the 5 digit number and put that into a variable.
I am very interested in the number of different ways that this can be accomplished.
Berek Bryan ,Jan 24, 2017 at 9:30
Use cut:

echo 'someletters_12345_moreleters.ext' | cut -d'_' -f 2

More generic:

INPUT='someletters_12345_moreleters.ext'
SUBSTRING=$(echo $INPUT | cut -d'_' -f 2)
echo $SUBSTRING

JB., Jan 6, 2015 at 10:13
If x is constant, the following parameter expansion performs substring extraction:

b=${a:12:5}

where 12 is the offset (zero-based) and 5 is the length.
If the underscores around the digits are the only ones in the input, you can strip off the prefix and suffix (respectively) in two steps:
tmp=${a#*_}   # remove prefix ending in "_"
b=${tmp%_*}   # remove suffix starting with "_"

If there are other underscores, it's probably feasible anyway, albeit more tricky. If anyone knows how to perform both expansions in a single expression, I'd like to know too.
Both solutions presented are pure bash, with no process spawning involved, hence very fast.
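A quick sanity check that the two pure-bash approaches agree on the sample filename:

```shell
# Both extraction methods from the answer above, side by side (sample input).
a=someletters_12345_moreleters.ext
b=${a:12:5}    # fixed offset/length
tmp=${a#*_}    # strip shortest prefix ending in "_"
c=${tmp%_*}    # strip suffix starting at the last "_"
echo "$b $c"   # 12345 12345
```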
A Sahra ,Mar 16, 2017 at 6:27
Generic solution where the number can be anywhere in the filename, using the first of such sequences:

number=$(echo $filename | egrep -o '[[:digit:]]{5}' | head -n1)

Another solution to extract exactly a part of a variable:

number=${filename:offset:length}

If your filename always has the format stuff_digits_... you can use awk:

number=$(echo $filename | awk -F _ '{ print $2 }')

Yet another solution to remove everything except digits:

number=$(echo $filename | tr -cd '[[:digit:]]')

sshow, Jul 27, 2017 at 17:22
In case someone wants more rigorous information, you can also search it in man bash like this:

$ man bash [press return key]
/substring [press return key]
[press "n" key]
[press "n" key]
[press "n" key]
[press "n" key]

Result:

${parameter:offset}
${parameter:offset:length}
    Substring Expansion. Expands to up to length characters of parameter starting at the character specified by offset. If length is omitted, expands to the substring of parameter starting at the character specified by offset. length and offset are arithmetic expressions (see ARITHMETIC EVALUATION below). If offset evaluates to a number less than zero, the value is used as an offset from the end of the value of parameter. Arithmetic expressions starting with a - must be separated by whitespace from the preceding : to be distinguished from the Use Default Values expansion. If length evaluates to a number less than zero, and parameter is not @ and not an indexed or associative array, it is interpreted as an offset from the end of the value of parameter rather than a number of characters, and the expansion is the characters between the two offsets. If parameter is @, the result is length positional parameters beginning at offset. If parameter is an indexed array name subscripted by @ or *, the result is the length members of the array beginning with ${parameter[offset]}. A negative offset is taken relative to one greater than the maximum index of the specified array. Substring expansion applied to an associative array produces undefined results. Note that a negative offset must be separated from the colon by at least one space to avoid being confused with the :- expansion. Substring indexing is zero-based unless the positional parameters are used, in which case the indexing starts at 1 by default. If offset is 0, and the positional parameters are used, $0 is prefixed to the list.

Aleksandr Levchuk, Aug 29, 2011 at 5:51
Building on jor's answer (which doesn't work for me):

substring=$(expr "$filename" : '.*_\([^_]*\)_.*')

kayn, Oct 5, 2015 at 8:48
I'm surprised this pure bash solution didn't come up:

a="someletters_12345_moreleters.ext"
IFS="_"
set $a
echo $2   # prints 12345

You probably want to reset IFS to what value it was before, or unset IFS afterwards!

zebediah49, Jun 4 at 17:31
Here's how I'd do it:

FN=someletters_12345_moreleters.ext
[[ ${FN} =~ _([[:digit:]]{5})_ ]] && NUM=${BASH_REMATCH[1]}

Note: the above is a regular expression and is restricted to your specific scenario of five digits surrounded by underscores. Change the regular expression if you need different matching.
TranslucentCloud ,Jun 16, 2014 at 13:27
Following the requirements:

    I have a filename with x number of characters then a five digit sequence surrounded by a single underscore on either side then another set of x number of characters. I want to take the 5 digit number and put that into a variable.

I found some grep ways that may be useful:

$ echo "someletters_12345_moreleters.ext" | grep -Eo "[[:digit:]]+"
12345

or better:

$ echo "someletters_12345_moreleters.ext" | grep -Eo "[[:digit:]]{5}"
12345

And then with -Po syntax:

$ echo "someletters_12345_moreleters.ext" | grep -Po '(?<=_)\d+'
12345

Or if you want to make it fit exactly 5 characters:

$ echo "someletters_12345_moreleters.ext" | grep -Po '(?<=_)\d{5}'
12345

Finally, to store it in a variable you just need to use the var=$(command) syntax.

Darron, Jan 9, 2009 at 16:13
Without any sub-processes you can:

shopt -s extglob
front=${input%%_+([a-zA-Z]).*}
digits=${front##+([a-zA-Z])_}

A very small variant of this will also work in ksh93.
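Tracing the two expansions above on the sample input (extglob must be on; the pattern +([a-zA-Z]) matches one or more letters):

```shell
# Tracing the extglob expansions on the sample filename (assumed input).
shopt -s extglob
input=someletters_12345_moreleters.ext
front=${input%%_+([a-zA-Z]).*}   # -> someletters_12345
digits=${front##+([a-zA-Z])_}    # -> 12345
echo "$digits"
```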
user2350426, Aug 5, 2014 at 8:11
If we focus on the concept of "a run of (one or several) digits", we could use several external tools to extract the numbers.

We could quite easily erase all other characters, with either sed or tr:

name='someletters_12345_moreleters.ext'
echo $name | sed 's/[^0-9]*//g'    # 12345
echo $name | tr -c -d 0-9          # 12345

But if $name contains several runs of numbers, the above will fail. If name=someletters_12345_moreleters_323_end.ext, then:

echo $name | sed 's/[^0-9]*//g'    # 12345323
echo $name | tr -c -d 0-9          # 12345323

We need to use regular expressions (regex). To select only the first run (12345, not 323) in sed and perl:

echo $name | sed 's/[^0-9]*\([0-9]\{1,\}\).*$/\1/'
perl -e 'my $name='$name';my ($num)=$name=~/(\d+)/;print "$num\n";'

But we could as well do it directly in bash (1):

regex='[^0-9]*([0-9]{1,}).*$'
[[ $name =~ $regex ]] && echo ${BASH_REMATCH[1]}

This allows us to extract the FIRST run of digits of any length, surrounded by any other text/characters.

Note: regex='[^0-9]*([0-9]{5,5}).*$' will match only exactly 5-digit runs. :-)

(1): faster than calling an external tool for each short text. Not faster than doing all the processing inside sed or awk for large files.
codist ,May 6, 2011 at 12:50
Here's a prefix-suffix solution (similar to the solutions given by JB and Darron) that matches the first block of digits and does not depend on the surrounding underscores:

str='someletters_12345_morele34ters.ext'
s1="${str#"${str%%[[:digit:]]*}"}"   # strip off non-digit prefix from str
s2="${s1%%[^[:digit:]]*}"            # strip off non-digit suffix from s1
echo "$s2"                           # 12345

Campa, Oct 21, 2016 at 8:12
I love sed's capability to deal with regex groups:

> var="someletters_12345_moreletters.ext"
> digits=$( echo $var | sed "s/.*_\([0-9]\+\).*/\1/p" -n )
> echo $digits
12345

A slightly more general option would be not to assume that you have an underscore _ marking the start of your digits sequence, hence for instance stripping off all non-numbers you get before your sequence: s/[^0-9]\+\([0-9]\+\).*/\1/p .

> man sed | grep s/regexp/replacement -A 2
s/regexp/replacement/
    Attempt to match regexp against the pattern space. If successful, replace that portion matched with replacement. The replacement may contain the special character & to refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.

More on this, in case you're not too confident with regexps:

- s is for _s_ubstitute
- [0-9]+ matches 1+ digits
- \1 links to the group n.1 of the regex output (group 0 is the whole match, group 1 is the match within parentheses in this case)
- the p flag is for _p_rinting

All escapes \ are there to make sed's regexp processing work.

Dan Dascalescu, May 8 at 18:28
Given test.txt is a file containing "ABCDEFGHIJKLMNOPQRSTUVWXYZ":

cut -b19-20 test.txt > test1.txt   # this will extract chars 19 & 20, "ST"
while read -r; do
    x=$REPLY
done < test1.txt
echo $x
ST

Alex Raj Kaliamoorthy, Jul 29, 2016 at 7:41
My answer will have more control on what you want out of your string. Here is the code on how you can extract 12345 out of your string:

str="someletters_12345_moreleters.ext"
str=${str#*_}
str=${str%_more*}
echo $str

This will be more efficient if you want to extract something that has any chars like abc or any special characters like _ or - . For example, if your string is like this and you want everything that is after someletters_ and before _moreleters.ext:

str="someletters_123-45-24a&13b-1_moreleters.ext"

With my code you can specify what exactly you want. Explanation:

#*   removes the preceding string including the matching key. Here the key we mentioned is _
%    removes the following string including the matching key. Here the key we mentioned is '_more*'

Do some experiments yourself and you will find this interesting.

Dan Dascalescu, May 8 at 18:27
Dan Dascalescu ,May 8 at 18:27
Similar to substr('abcdefg', 2-1, 3) in PHP:

echo 'abcdefg' | tail -c +2 | head -c 3

olibre, Nov 25, 2015 at 14:50
Ok, here goes pure Parameter Substitution with an empty string. Caveat is that I have defined someletters and moreletters as only characters. If they are alphanumeric, this will not work as it is.

shopt -s extglob   # the @( ) and +( ) patterns below require extglob in bash
filename=someletters_12345_moreletters.ext
substring=${filename//@(+([a-z])_|_+([a-z]).*)}
echo $substring
12345

gniourf_gniourf, Jun 4 at 17:33
There's also the expr command (an external utility, not a bash builtin):

INPUT="someletters_12345_moreleters.ext"
SUBSTRING=`expr match "$INPUT" '.*_\([[:digit:]]*\)_.*'`
echo $SUBSTRING

russell, Aug 1, 2013 at 8:12
A little late, but I just ran across this problem and found the following:

host:/tmp$ asd=someletters_12345_moreleters.ext
host:/tmp$ echo `expr $asd : '.*_\(.*\)_'`
12345
host:/tmp$

I used it to get millisecond resolution on an embedded system that does not have %N for date:

set `grep "now at" /proc/timer_list`
nano=$3
fraction=`expr $nano : '.*\(...\)......'`
$debug nano is $nano, fraction is $fraction

Aug 5, 2018 at 17:13
A bash solution:

IFS="_" read -r x digs x <<<'someletters_12345_moreleters.ext'

This will clobber a variable called x. The var x could be changed to the var _ :

input='someletters_12345_moreleters.ext'
IFS="_" read -r _ digs _ <<<"$input"
Sep 08, 2019 | stackoverflow.com
Mark Byers ,Apr 25, 2010 at 19:20
Can anyone recommend a safe solution to recursively replace spaces with underscores in file and directory names starting from a given root directory? For example:

$ tree
.
|-- a dir
|   `-- file with spaces.txt
`-- b dir
    |-- another file with spaces.txt
    `-- yet another file with spaces.pdf

becomes:

$ tree
.
|-- a_dir
|   `-- file_with_spaces.txt
`-- b_dir
    |-- another_file_with_spaces.txt
    `-- yet_another_file_with_spaces.pdf

Jürgen Hötzel, Nov 4, 2015 at 3:03
Use rename (aka prename), which is a Perl script that may be on your system already. Do it in two steps:

find -name "* *" -type d | rename 's/ /_/g'    # do the directories first
find -name "* *" -type f | rename 's/ /_/g'

Based on Jürgen's answer and able to handle multiple layers of files and directories in a single bound using the "Revision 1.5 1998/12/18 16:16:31 rmb1" version of /usr/bin/rename (a Perl script):

find /tmp/ -depth -name "* *" -execdir rename 's/ /_/g' "{}" \;

oevna, Jan 1, 2016 at 8:25
I use:

for f in *\ *; do mv "$f" "${f// /_}"; done

Though it's not recursive, it's quite fast and simple. I'm sure someone here could update it to be recursive.
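The replacement expansion driving this loop can be seen in isolation (a minimal sketch):

```shell
# The ${parameter/pattern/string} expansion used in the loop above, in isolation.
f='file with spaces.txt'
echo "${f// /_}"   # file_with_spaces.txt  (// replaces every match)
echo "${f/ /_}"    # file_with spaces.txt  (single / replaces only the first)
```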
The ${f// /_} part utilizes bash's parameter expansion mechanism to replace a pattern within a parameter with a supplied string. The relevant syntax is ${parameter/pattern/string}. See: https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html or http://wiki.bash-hackers.org/syntax/pe .

armandino, Dec 3, 2013 at 20:51
find . -depth -name '* *' \
  | while IFS= read -r f ; do mv -i "$f" "$(dirname "$f")/$(basename "$f"|tr ' ' _)" ; done

failed to get it right at first, because I didn't think of directories.
Edmund Elmer ,Jul 3 at 7:12
You can use detox by Doug Harple:

detox -r <folder>

Dennis Williamson, Mar 22, 2012 at 20:33
A find/rename solution. rename is part of util-linux.

You need to descend depth first, because a whitespace filename can be part of a whitespace directory:

find /tmp/ -depth -name "* *" -execdir rename " " "_" "{}" ";"

armandino, Apr 26, 2010 at 11:49
bash 4.0:

#!/bin/bash
shopt -s globstar
for file in **/*\ *
do
    mv "$file" "${file// /_}"
done

Itamar, Jan 31, 2013 at 21:27
You can use this:

find . -name '* *' | while read fname
do
    new_fname=`echo $fname | tr " " "_"`
    if [ -e $new_fname ]
    then
        echo "File $new_fname already exists. Not replacing $fname"
    else
        echo "Creating new file $new_fname to replace $fname"
        mv "$fname" $new_fname
    fi
done

yabt, Apr 26, 2010 at 14:54
Here's a (quite verbose) find -exec solution which writes "file already exists" warnings to stderr:

function trspace() {
    declare dir name bname dname newname replace_char
    [ $# -lt 1 -o $# -gt 2 ] && { echo "usage: trspace dir char"; return 1; }
    dir="${1}"
    replace_char="${2:-_}"
    find "${dir}" -xdev -depth -name $'*[ \t\r\n\v\f]*' -exec bash -c '
        for ((i=1; i<=$#; i++)); do
            name="${@:i:1}"
            dname="${name%/*}"
            bname="${name##*/}"
            newname="${dname}/${bname//[[:space:]]/${0}}"
            if [[ -e "${newname}" ]]; then
                echo "Warning: file already exists: ${newname}" 1>&2
            else
                mv "${name}" "${newname}"
            fi
        done
    ' "${replace_char}" '{}' +
}

trspace rootdir _

degi, Aug 8, 2011 at 9:10
This one does a little bit more. I use it to rename my downloaded torrents (no special characters (non-ASCII), spaces, multiple dots, etc.).

#!/usr/bin/perl
&rena(`find . -type d`);
&rena(`find . -type f`);
sub rena {
    ($elems) = @_;
    @t = split /\n/, $elems;
    for $e (@t) {
        $_ = $e;
        # remove ./ of find
        s/^\.\///;
        # non ascii transliterate
        tr [\200-\377][_];
        tr [\000-\40][_];
        # special characters we do not want in paths
        s/[ \-\,\;\?\+\'\"\!\[\]\(\)\@\#]/_/g;
        # multiple dots except for extension
        while (/\..*\./) {
            s/\./_/;
        }
        # only one _ consecutive
        s/_+/_/g;
        next if ($_ eq $e) or ("./$_" eq $e);
        print "$e -> $_\n";
        rename($e, $_);
    }
}

Junyeop Lee, Apr 10, 2018 at 9:44
Recursive version of Naidim's answer:

find . -name "* *" | awk '{ print length, $0 }' | sort -nr -s | cut -d" " -f2- | while read f; do
    base=$(basename "$f")
    newbase="${base// /_}"
    mv "$(dirname "$f")/$(basename "$f")" "$(dirname "$f")/$newbase"
done

ghoti, Dec 5, 2016 at 21:16
I found around this script, it may be interesting :)

IFS=$'\n'
for f in `find .`; do
    file=$(echo $f | tr [:blank:] '_')
    [ -e $f ] && [ ! -e $file ] && mv "$f" $file
done
unset IFS

ghoti, Dec 5, 2016 at 21:17
Here's a reasonably sized bash script solution:

#!/bin/bash
(
IFS=$'\n'
for y in $(ls $1)
do
    mv $1/`echo $y | sed 's/ /\\ /g'` $1/`echo "$y" | sed 's/ /_/g'`
done
)

user1060059, Nov 22, 2011 at 15:15
This only finds files inside the current directory and renames them. I have this aliased.

find ./ -name "* *" -type f -d 1 | perl -ple '$file = $_; $file =~ s/\s+/_/g; rename($_, $file);'
Hongtao ,Sep 26, 2014 at 19:30
I just made one for my own purpose. You may use it as reference.

#!/bin/bash
cd /vzwhome/c0cheh1/dev_source/UB_14_8
for file in *
do
    echo $file
    cd "/vzwhome/c0cheh1/dev_source/UB_14_8/$file/Configuration/$file"
    echo "==> `pwd`"
    for subfile in *\ *; do
        [ -d "$subfile" ] && ( mv "$subfile" "$(echo $subfile | sed -e 's/ /_/g')" )
    done
    ls
    cd /vzwhome/c0cheh1/dev_source/UB_14_8
done

Marcos Jean Sampaio, Dec 5, 2016 at 20:56
For files in a folder named /files:

for i in `IFS="";find /files -name *\ *`
do
    echo $i
done > /tmp/list

while read line
do
    mv "$line" `echo $line | sed 's/ /_/g'`
done < /tmp/list

rm /tmp/list

Muhammad Annaqeeb, Sep 4, 2017 at 11:03
For those struggling through this using macOS, first install all the tools:

brew install tree findutils rename

Then when needed to rename, make an alias for GNU find (gfind) as find. Then run the code of @Michel Krelin:

alias find=gfind
find . -depth -name '* *' \
  | while IFS= read -r f ; do mv -i "$f" "$(dirname "$f")/$(basename "$f"|tr ' ' _)" ; done
Mar 25, 2019 | linuxize.com
Concatenating Strings with the += Operator

Another way of concatenating strings in bash is by appending variables or literal strings to a variable using the += operator:

VAR1="Hello, "
VAR1+=" World"
echo "$VAR1"
Hello, World

The following example uses the += operator to concatenate strings in a bash for loop (languages.sh):

VAR=""
for ELEMENT in 'Hydrogen' 'Helium' 'Lithium' 'Beryllium'; do
  VAR+="${ELEMENT} "
done
echo "$VAR"
May 14, 2012 | stackoverflow.com
Lgn ,May 14, 2012 at 15:15
In a Bash script I would like to split a line into pieces and store them in an array.

The line:

Paris, France, Europe

I would like to have them in an array like this:

array[0] = Paris
array[1] = France
array[2] = Europe

I would like to use simple code, the command's speed doesn't matter. How can I do it?
antak ,Jun 18, 2018 at 9:22
This is #1 Google hit but there's controversy in the answer because the question unfortunately asks about delimiting on ", " (comma-space) and not a single character such as comma. If you're only interested in the latter, answers here are easier to follow: stackoverflow.com/questions/918886/ – antak Jun 18 '18 at 9:22

Dennis Williamson, May 14, 2012 at 15:16
IFS=', ' read -r -a array <<< "$string"

Note that the characters in $IFS are treated individually as separators so that in this case fields may be separated by either a comma or a space rather than the sequence of the two characters. Interestingly though, empty fields aren't created when comma-space appears in the input because the space is treated specially.

To access an individual element:

echo "${array[0]}"

To iterate over the elements:

for element in "${array[@]}"
do
    echo "$element"
done

To get both the index and the value:

for index in "${!array[@]}"
do
    echo "$index ${array[index]}"
done

The last example is useful because Bash arrays are sparse. In other words, you can delete an element or add an element and then the indices are not contiguous.

unset "array[1]"
array[42]=Earth

To get the number of elements in an array:

echo "${#array[@]}"

As mentioned above, arrays can be sparse so you shouldn't use the length to get the last element. Here's how you can in Bash 4.2 and later:

echo "${array[-1]}"

and in any version of Bash (from somewhere after 2.05b):

echo "${array[@]: -1:1}"

Larger negative offsets select farther from the end of the array. Note the space before the minus sign in the older form. It is required.
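Putting the pieces above together on the question's original input (a quick sketch; the ${array[@]: -1:1} form is used for the last element since it works in older bash too):

```shell
# The split-into-array recipe above, end to end on the question's input.
string="Paris, France, Europe"
IFS=', ' read -r -a array <<< "$string"
echo "${array[0]}"         # Paris
echo "${#array[@]}"        # 3
echo "${array[@]: -1:1}"   # Europe
```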
l0b0 ,May 14, 2012 at 15:24
Just use IFS=', ', then you don't have to remove the spaces separately. Test:

IFS=', ' read -a array <<< "Paris, France, Europe"; echo "${array[@]}"

– l0b0 May 14 '12 at 15:24
@l0b0: Thanks. I don't know what I was thinking. I like to use declare -p array for test output, by the way. – Dennis Williamson May 14 '12 at 16:33
@Dennis Williamson - Awesome, thorough answer. – Nathan Hyde Mar 16 '13 at 21:09dsummersl ,Aug 9, 2013 at 14:06
MUCH better than multiple cut -f calls! – dsummersl Aug 9 '13 at 14:06
Warning: the IFS variable means split by one of these characters, so it's not a sequence of chars to split by. IFS=', ' read -a array <<< "a,d r s,w" => ${array[*]} == a d r s w – caesarsol Oct 29 '15 at 14:45
Here is a way without setting IFS:

string="1:2:3:4:5"
set -f                      # avoid globbing (expansion of *)
array=(${string//:/ })
for i in "${!array[@]}"
do
    echo "$i=>${array[i]}"
done

The idea is using string replacement:

${string//substring/replacement}

to replace all matches of $substring with white space and then using the substituted string to initialize an array:

(element1 element2 ... elementN)

Note: this answer makes use of the split+glob operator. Thus, to prevent expansion of some characters (such as *) it is a good idea to pause globbing for this script.

Werner Lehmann, May 4, 2013 at 22:32
Used this approach... until I came across a long string to split. 100% CPU for more than a minute (then I killed it). It's a pity because this method allows to split by a string, not some character in IFS. – Werner Lehmann May 4 '13 at 22:32Dieter Gribnitz ,Sep 2, 2014 at 15:46
WARNING: Just ran into a problem with this approach. If you have an element named * you will get all the elements of your cwd as well. thus string="1:2:3:4:*" will give some unexpected and possibly dangerous results depending on your implementation. Did not get the same error with (IFS=', ' read -a array <<< "$string") and this one seems safe to use. – Dieter Gribnitz Sep 2 '14 at 15:46akostadinov ,Nov 6, 2014 at 14:31
not reliable for many kinds of values, use with care – akostadinov Nov 6 '14 at 14:31Andrew White ,Jun 1, 2016 at 11:44
Quoting "${string//:/ }" prevents shell expansion – Andrew White Jun 1 '16 at 11:44

I had to use the following on OSX: array=(${string//:/ }) – Mark Thomson Jun 5 '16 at 20:44
All of the answers to this question are wrong in one way or another.
IFS=', ' read -r -a array <<< "$string"1: This is a misuse of
$IFS
. The value of the$IFS
variable is not taken as a single variable-length string separator, rather it is taken as a set of single-character string separators, where each field thatread
splits off from the input line can be terminated by any character in the set (comma or space, in this example).Actually, for the real sticklers out there, the full meaning of
$IFS
is slightly more involved. From the bash manual :The shell treats each character of IFS as a delimiter, and splits the results of the other expansions into words using these characters as field terminators. If IFS is unset, or its value is exactly <space><tab><newline> , the default, then sequences of <space> , <tab> , and <newline> at the beginning and end of the results of the previous expansions are ignored, and any sequence of IFS characters not at the beginning or end serves to delimit words. If IFS has a value other than the default, then sequences of the whitespace characters <space> , <tab> , and <newline> are ignored at the beginning and end of the word, as long as the whitespace character is in the value of IFS (an IFS whitespace character). Any character in IFS that is not IFS whitespace, along with any adjacent IFS whitespace characters, delimits a field. A sequence of IFS whitespace characters is also treated as a delimiter. If the value of IFS is null, no word splitting occurs.
Basically, for non-default non-null values of $IFS, fields can be separated with either (1) a sequence of one or more characters that are all from the set of "IFS whitespace characters" (that is, whichever of <space>, <tab>, and <newline> ("newline" meaning line feed (LF)) are present anywhere in $IFS), or (2) any non-"IFS whitespace character" that's present in $IFS along with whatever "IFS whitespace characters" surround it in the input line.

For the OP, it's possible that the second separation mode I described in the previous paragraph is exactly what he wants for his input string, but we can be pretty confident that the first separation mode I described is not correct at all. For example, what if his input string was 'Los Angeles, United States, North America'?

IFS=', ' read -ra a <<<'Los Angeles, United States, North America'; declare -p a;
## declare -a a=([0]="Los" [1]="Angeles" [2]="United" [3]="States" [4]="North" [5]="America")

2: Even if you were to use this solution with a single-character separator (such as a comma by itself, that is, with no following space or other baggage), if the value of the $string variable happens to contain any LFs, then read will stop processing once it encounters the first LF. The read builtin only processes one line per invocation. This is true even if you are piping or redirecting input only to the read statement, as we are doing in this example with the here-string mechanism, and thus unprocessed input is guaranteed to be lost. The code that powers the read builtin has no knowledge of the data flow within its containing command structure.

You could argue that this is unlikely to cause a problem, but still, it's a subtle hazard that should be avoided if possible. It is caused by the fact that the read builtin actually does two levels of input splitting: first into lines, then into fields. Since the OP only wants one level of splitting, this usage of the read builtin is not appropriate, and we should avoid it.

3: A non-obvious potential issue with this solution is that read always drops the trailing field if it is empty, although it preserves empty fields otherwise. Here's a demo:

string=', , a, , b, c, , , '; IFS=', ' read -ra a <<<"$string"; declare -p a;
## declare -a a=([0]="" [1]="" [2]="a" [3]="" [4]="b" [5]="c" [6]="" [7]="")

Maybe the OP wouldn't care about this, but it's still a limitation worth knowing about. It reduces the robustness and generality of the solution.

This problem can be solved by appending a dummy trailing delimiter to the input string just prior to feeding it to read, as I will demonstrate later.
string="1:2:3:4:5"
set -f                   # avoid globbing (expansion of *).
array=(${string//:/ })

t="one,two,three"
a=($(echo $t | tr ',' "\n"))

(Note: I added the missing parentheses around the command substitution which the answerer seems to have omitted.)

string="1,2,3,4"
array=(`echo $string | sed 's/,/\n/g'`)

These solutions leverage word splitting in an array assignment to split the string into fields. Funnily enough, just like read, general word splitting also uses the $IFS special variable, although in this case it is implied that it is set to its default value of <space><tab><newline>, and therefore any sequence of one or more IFS characters (which are all whitespace characters now) is considered to be a field delimiter.

This solves the problem of two levels of splitting committed by read, since word splitting by itself constitutes only one level of splitting. But just as before, the problem here is that the individual fields in the input string can already contain $IFS characters, and thus they would be improperly split during the word splitting operation. This happens to not be the case for any of the sample input strings provided by these answerers (how convenient...), but of course that doesn't change the fact that any code base that used this idiom would then run the risk of blowing up if this assumption were ever violated at some point down the line. Once again, consider my counterexample of 'Los Angeles, United States, North America' (or 'Los Angeles:United States:North America').

Also, word splitting is normally followed by filename expansion (aka pathname expansion, aka globbing), which, if done, would potentially corrupt words containing the characters *, ?, or [ followed by ] (and, if extglob is set, parenthesized fragments preceded by ?, *, +, @, or !) by matching them against file system objects and expanding the words ("globs") accordingly. The first of these three answerers has cleverly undercut this problem by running set -f beforehand to disable globbing. Technically this works (although you should probably add set +f afterward to reenable globbing for subsequent code which may depend on it), but it's undesirable to have to mess with global shell settings in order to hack a basic string-to-array parsing operation in local code.

Another issue with this answer is that all empty fields will be lost. This may or may not be a problem, depending on the application.
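For what it's worth, here is a sketch (not from the original answer) of the set -f / set +f wrapping being described, using the hazardous * element from the comment at the top of this thread:

```shell
# Split a colon-delimited string via word splitting, with globbing
# disabled so the literal * element is not expanded against the cwd.
string="1:2:3:4:*"
set -f                      # disable filename expansion
array=(${string//:/ })      # replace colons with spaces, then word-split
set +f                      # re-enable it for subsequent code
declare -p array            # the * survives as a literal element
```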
Note: If you're going to use this solution, it's better to use the ${string//:/ } "pattern substitution" form of parameter expansion, rather than going to the trouble of invoking a command substitution (which forks the shell), starting up a pipeline, and running an external executable (tr or sed), since parameter expansion is purely a shell-internal operation. (Also, for the tr and sed solutions, the input variable should be double-quoted inside the command substitution; otherwise word splitting would take effect in the echo command and potentially mess with the field values. Also, the $(...) form of command substitution is preferable to the old `...` form since it simplifies nesting of command substitutions and allows for better syntax highlighting by text editors.)
str="a, b, c, d"  # assuming there is a space after ',' as in Q
arr=(${str//,/})  # delete all occurrences of ','

This answer is almost the same as #2. The difference is that the answerer has made the assumption that the fields are delimited by two characters, one of which being represented in the default $IFS, and the other not. He has solved this rather specific case by removing the non-IFS-represented character using a pattern substitution expansion and then using word splitting to split the fields on the surviving IFS-represented delimiter character.

This is not a very generic solution. Furthermore, it can be argued that the comma is really the "primary" delimiter character here, and that stripping it and then depending on the space character for field splitting is simply wrong. Once again, consider my counterexample: 'Los Angeles, United States, North America'.

Also, again, filename expansion could corrupt the expanded words, but this can be prevented by temporarily disabling globbing for the assignment with set -f and then set +f.

Also, again, all empty fields will be lost, which may or may not be a problem depending on the application.
string='first line
second line
third line'

oldIFS="$IFS"
IFS='
'
IFS=${IFS:0:1} # this is useful to format your code with tabs
lines=( $string )
IFS="$oldIFS"

This is similar to #2 and #3 in that it uses word splitting to get the job done, only now the code explicitly sets $IFS to contain only the single-character field delimiter present in the input string. It should be repeated that this cannot work for multicharacter field delimiters such as the OP's comma-space delimiter. But for a single-character delimiter like the LF used in this example, it actually comes close to being perfect. The fields cannot be unintentionally split in the middle as we saw with previous wrong answers, and there is only one level of splitting, as required.

One problem is that filename expansion will corrupt affected words as described earlier, although once again this can be solved by wrapping the critical statement in set -f and set +f.

Another potential problem is that, since LF qualifies as an "IFS whitespace character" as defined earlier, all empty fields will be lost, just as in #2 and #3. This would of course not be a problem if the delimiter happens to be a non-"IFS whitespace character", and depending on the application it may not matter anyway, but it does vitiate the generality of the solution.

So, to sum up, assuming you have a one-character delimiter, and it is either a non-"IFS whitespace character" or you don't care about empty fields, and you wrap the critical statement in set -f and set +f, then this solution works, but otherwise not.

(Also, for information's sake, assigning a LF to a variable in bash can be done more easily with the $'...' syntax, e.g. IFS=$'\n';.)
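Putting those recommendations together, here is a hedged sketch (my own, not from the original answer) of this solution with the $'\n' syntax and the set -f guard applied; the input string is illustrative:

```shell
# Solution #4 tightened up: LF delimiter written with $'...' syntax,
# globbing disabled around the unquoted expansion, both settings restored.
string=$'first line\nsecond line\nthird line'
oldIFS="$IFS"
IFS=$'\n'          # split on newlines only
set -f             # guard against filename expansion
lines=( $string )
set +f
IFS="$oldIFS"
declare -p lines
```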
countries='Paris, France, Europe'
OIFS="$IFS"
IFS=', '
array=($countries)
IFS="$OIFS"

IFS=', ' eval 'array=($string)'

This solution is effectively a cross between #1 (in that it sets $IFS to comma-space) and #2-4 (in that it uses word splitting to split the string into fields). Because of this, it suffers from most of the problems that afflict all of the above wrong answers, sort of like the worst of all worlds.

Also, regarding the second variant, it may seem like the eval call is completely unnecessary, since its argument is a single-quoted string literal, and therefore is statically known. But there's actually a very non-obvious benefit to using eval in this way. Normally, when you run a simple command which consists of a variable assignment only, meaning without an actual command word following it, the assignment takes effect in the shell environment:

IFS=', '; ## changes $IFS in the shell environment

This is true even if the simple command involves multiple variable assignments; again, as long as there's no command word, all variable assignments affect the shell environment:

IFS=', ' array=($countries); ## changes both $IFS and $array in the shell environment

But, if the variable assignment is attached to a command name (I like to call this a "prefix assignment") then it does not affect the shell environment, and instead only affects the environment of the executed command, regardless of whether it is a builtin or external:

IFS=', ' :; ## : is a builtin command, the $IFS assignment does not outlive it
IFS=', ' env; ## env is an external command, the $IFS assignment does not outlive it

Relevant quote from the bash manual:

If no command name results, the variable assignments affect the current shell environment. Otherwise, the variables are added to the environment of the executed command and do not affect the current shell environment.

It is possible to exploit this feature of variable assignment to change $IFS only temporarily, which allows us to avoid the whole save-and-restore gambit like that which is being done with the $OIFS variable in the first variant. But the challenge we face here is that the command we need to run is itself a mere variable assignment, and hence it would not involve a command word to make the $IFS assignment temporary. You might think to yourself, well why not just add a no-op command word to the statement like the : builtin to make the $IFS assignment temporary? This does not work because it would then make the $array assignment temporary as well:

IFS=', ' array=($countries) :; ## fails; new $array value never escapes the : command

So, we're effectively at an impasse, a bit of a catch-22. But, when eval runs its code, it runs it in the shell environment, as if it was normal, static source code, and therefore we can run the $array assignment inside the eval argument to have it take effect in the shell environment, while the $IFS prefix assignment that is prefixed to the eval command will not outlive the eval command. This is exactly the trick that is being used in the second variant of this solution:

IFS=', ' eval 'array=($string)'; ## $IFS does not outlive the eval command, but $array does

So, as you can see, it's actually quite a clever trick, and accomplishes exactly what is required (at least with respect to assignment effectation) in a rather non-obvious way. I'm actually not against this trick in general, despite the involvement of eval; just be careful to single-quote the argument string to guard against security threats.

But again, because of the "worst of all worlds" agglomeration of problems, this is still a wrong answer to the OP's requirement.
IFS=', '; array=(Paris, France, Europe)

IFS=' '; declare -a array=(Paris France Europe)

Um... what? The OP has a string variable that needs to be parsed into an array. This "answer" starts with the verbatim contents of the input string pasted into an array literal. I guess that's one way to do it.

It looks like the answerer may have assumed that the $IFS variable affects all bash parsing in all contexts, which is not true. From the bash manual:

IFS: The Internal Field Separator that is used for word splitting after expansion and to split lines into words with the read builtin command. The default value is <space><tab><newline>.

So the $IFS special variable is actually only used in two contexts: (1) word splitting that is performed after expansion (meaning not when parsing bash source code) and (2) for splitting input lines into words by the read builtin.

Let me try to make this clearer. I think it might be good to draw a distinction between parsing and execution. Bash must first parse the source code, which obviously is a parsing event, and then later it executes the code, which is when expansion comes into the picture. Expansion is really an execution event. Furthermore, I take issue with the description of the $IFS variable that I just quoted above; rather than saying that word splitting is performed after expansion, I would say that word splitting is performed during expansion, or, perhaps even more precisely, word splitting is part of the expansion process. The phrase "word splitting" refers only to this step of expansion; it should never be used to refer to the parsing of bash source code, although unfortunately the docs do seem to throw around the words "split" and "words" a lot. Here's a relevant excerpt from the linux.die.net version of the bash manual:

Expansion is performed on the command line after it has been split into words. There are seven kinds of expansion performed: brace expansion, tilde expansion, parameter and variable expansion, command substitution, arithmetic expansion, word splitting, and pathname expansion.

The order of expansions is: brace expansion; tilde expansion, parameter and variable expansion, arithmetic expansion, and command substitution (done in a left-to-right fashion); word splitting; and pathname expansion.

You could argue the GNU version of the manual does slightly better, since it opts for the word "tokens" instead of "words" in the first sentence of the Expansion section:

Expansion is performed on the command line after it has been split into tokens.

The important point is, $IFS does not change the way bash parses source code. Parsing of bash source code is actually a very complex process that involves recognition of the various elements of shell grammar, such as command sequences, command lists, pipelines, parameter expansions, arithmetic substitutions, and command substitutions. For the most part, the bash parsing process cannot be altered by user-level actions like variable assignments (actually, there are some minor exceptions to this rule; for example, see the various compatxx shell settings, which can change certain aspects of parsing behavior on-the-fly). The upstream "words"/"tokens" that result from this complex parsing process are then expanded according to the general process of "expansion" as broken down in the above documentation excerpts, where word splitting of the expanded (expanding?) text into downstream words is simply one step of that process. Word splitting only touches text that has been spit out of a preceding expansion step; it does not affect literal text that was parsed right off the source bytestream.
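A minimal sketch of that distinction (my own illustration; the variable names are arbitrary): $IFS has no effect on a literal array assignment, which is resolved at parse time, but it does govern word splitting of an expansion:

```shell
# $IFS affects word splitting of expansions, not parsing of source code.
IFS=','
a=(x,y,z)      # literal text: parsed as ONE word; IFS is irrelevant here
s='x,y,z'
b=($s)         # expansion of $s: word-split on comma via IFS -> 3 words
unset IFS      # restore default splitting behavior
echo "${#a[@]} ${#b[@]}"    # prints "1 3"
```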
string='first line
second line
third line'

while read -r line; do lines+=("$line"); done <<<"$string"

This is one of the best solutions. Notice that we're back to using read. Didn't I say earlier that read is inappropriate because it performs two levels of splitting, when we only need one? The trick here is that you can call read in such a way that it effectively only does one level of splitting, specifically by splitting off only one field per invocation, which necessitates the cost of having to call it repeatedly in a loop. It's a bit of a sleight of hand, but it works.

But there are problems. First: When you provide at least one NAME argument to read, it automatically ignores leading and trailing whitespace in each field that is split off from the input string. This occurs whether $IFS is set to its default value or not, as described earlier in this post. Now, the OP may not care about this for his specific use-case, and in fact, it may be a desirable feature of the parsing behavior. But not everyone who wants to parse a string into fields will want this. There is a solution, however: A somewhat non-obvious usage of read is to pass zero NAME arguments. In this case, read will store the entire input line that it gets from the input stream in a variable named $REPLY, and, as a bonus, it does not strip leading and trailing whitespace from the value. This is a very robust usage of read which I've exploited frequently in my shell programming career. Here's a demonstration of the difference in behavior:

string=$' a b \n c d \n e f '; ## input string

a=(); while read -r line; do a+=("$line"); done <<<"$string"; declare -p a;
## declare -a a=([0]="a b" [1]="c d" [2]="e f") ## read trimmed surrounding whitespace

a=(); while read -r; do a+=("$REPLY"); done <<<"$string"; declare -p a;
## declare -a a=([0]=" a b " [1]=" c d " [2]=" e f ") ## no trimming

The second issue with this solution is that it does not actually address the case of a custom field separator, such as the OP's comma-space. As before, multicharacter separators are not supported, which is an unfortunate limitation of this solution. We could try to at least split on comma by specifying the separator to the -d option, but look what happens:

string='Paris, France, Europe';
a=(); while read -rd,; do a+=("$REPLY"); done <<<"$string"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France")

Predictably, the unaccounted-for surrounding whitespace got pulled into the field values, and hence this would have to be corrected subsequently through trimming operations (this could also be done directly in the while-loop). But there's another obvious error: Europe is missing! What happened to it? The answer is that read returns a failing return code if it hits end-of-file (in this case we can call it end-of-string) without encountering a final field terminator on the final field. This causes the while-loop to break prematurely and we lose the final field.

Technically this same error afflicted the previous examples as well; the difference there is that the field separator was taken to be LF, which is the default when you don't specify the -d option, and the <<< ("here-string") mechanism automatically appends a LF to the string just before it feeds it as input to the command. Hence, in those cases, we sort of accidentally solved the problem of a dropped final field by unwittingly appending an additional dummy terminator to the input. Let's call this solution the "dummy-terminator" solution. We can apply the dummy-terminator solution manually for any custom delimiter by concatenating it against the input string ourselves when instantiating it in the here-string:

a=(); while read -rd,; do a+=("$REPLY"); done <<<"$string,"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=" Europe")

There, problem solved. Another solution is to only break the while-loop if both (1) read returned failure and (2) $REPLY is empty, meaning read was not able to read any characters prior to hitting end-of-file. Demo:

a=(); while read -rd, || [[ -n "$REPLY" ]]; do a+=("$REPLY"); done <<<"$string"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=$' Europe\n')

This approach also reveals the secretive LF that automatically gets appended to the here-string by the <<< redirection operator. It could of course be stripped off separately through an explicit trimming operation as described a moment ago, but obviously the manual dummy-terminator approach solves it directly, so we could just go with that. The manual dummy-terminator solution is actually quite convenient in that it solves both of these two problems (the dropped-final-field problem and the appended-LF problem) in one go.

So, overall, this is quite a powerful solution. Its only remaining weakness is a lack of support for multicharacter delimiters, which I will address later.
string='first line
second line
third line'

readarray -t lines <<<"$string"

(This is actually from the same post as #7; the answerer provided two solutions in the same post.)

The readarray builtin, which is a synonym for mapfile, is ideal. It's a builtin command which parses a bytestream into an array variable in one shot; no messing with loops, conditionals, substitutions, or anything else. And it doesn't surreptitiously strip any whitespace from the input string. And (if -O is not given) it conveniently clears the target array before assigning to it. But it's still not perfect, hence my criticism of it as a "wrong answer".

First, just to get this out of the way, note that, just like the behavior of read when doing field-parsing, readarray drops the trailing field if it is empty. Again, this is probably not a concern for the OP, but it could be for some use-cases. I'll come back to this in a moment.

Second, as before, it does not support multicharacter delimiters. I'll give a fix for this in a moment as well.
Third, the solution as written does not parse the OP's input string, and in fact, it cannot be used as-is to parse it. I'll expand on this momentarily as well.
For the above reasons, I still consider this to be a "wrong answer" to the OP's question. Below I'll give what I consider to be the right answer.
Right answer
Here's a naïve attempt to make #8 work by just specifying the -d option:

string='Paris, France, Europe';
readarray -td, a <<<"$string"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=$' Europe\n')

We see the result is identical to the result we got from the double-conditional approach of the looping read solution discussed in #7. We can almost solve this with the manual dummy-terminator trick:

readarray -td, a <<<"$string,"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=" Europe" [3]=$'\n')

The problem here is that readarray preserved the trailing field, since the <<< redirection operator appended the LF to the input string, and therefore the trailing field was not empty (otherwise it would've been dropped). We can take care of this by explicitly unsetting the final array element after-the-fact:

readarray -td, a <<<"$string,"; unset 'a[-1]'; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=" Europe")

The only two problems that remain, which are actually related, are (1) the extraneous whitespace that needs to be trimmed, and (2) the lack of support for multicharacter delimiters.
The whitespace could of course be trimmed afterward (for example, see How to trim whitespace from a Bash variable? ). But if we can hack a multicharacter delimiter, then that would solve both problems in one shot.
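As a sketch of the trim-afterward option (my own illustration, assuming a bash with extglob available), the whole array can be trimmed with two pattern-removal expansions applied to every element at once:

```shell
# Trim surrounding whitespace from every array element after the split.
shopt -s extglob                  # enable +(...) extended glob patterns
a=("Paris" " France" " Europe ")  # sample post-split array with stray spaces
a=("${a[@]##+([[:space:]])}")     # strip leading whitespace from each element
a=("${a[@]%%+([[:space:]])}")     # strip trailing whitespace from each element
declare -p a
```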
Unfortunately, there's no direct way to get a multicharacter delimiter to work. The best solution I've thought of is to preprocess the input string to replace the multicharacter delimiter with a single-character delimiter that will be guaranteed not to collide with the contents of the input string. The only character that has this guarantee is the NUL byte . This is because, in bash (though not in zsh, incidentally), variables cannot contain the NUL byte. This preprocessing step can be done inline in a process substitution. Here's how to do it using awk :
readarray -td '' a < <(awk '{ gsub(/, /,"\0"); print; }' <<<"$string, "); unset 'a[-1]'; declare -p a;
## declare -a a=([0]="Paris" [1]="France" [2]="Europe")

There, finally! This solution will not erroneously split fields in the middle, will not cut out prematurely, will not drop empty fields, will not corrupt itself on filename expansions, will not automatically strip leading and trailing whitespace, will not leave a stowaway LF on the end, does not require loops, and does not settle for a single-character delimiter.
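An aside (a sketch of mine, not from the original answer): if you can additionally assume that the fields themselves never contain newlines, a pure-bash preprocessing step can stand in for awk by rewriting the two-character delimiter as a LF and letting readarray split on lines (requires bash 4.0+ for readarray):

```shell
# Pure-bash multicharacter-delimiter split, ASSUMING fields contain no LFs:
# rewrite every ", " as a newline, then let readarray split on lines.
string='Paris, France, Europe'
readarray -t a <<<"${string//, /$'\n'}"   # -t strips each trailing LF
declare -p a
```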
Trimming solution
Lastly, I wanted to demonstrate my own fairly intricate trimming solution using the obscure -C callback option of readarray. Unfortunately, I've run out of room against Stack Overflow's draconian 30,000 character post limit, so I won't be able to explain it. I'll leave that as an exercise for the reader.

function mfcb { local val="$4"; "$1"; eval "$2[$3]=\$val;"; };
function val_ltrim { if [[ "$val" =~ ^[[:space:]]+ ]]; then val="${val:${#BASH_REMATCH[0]}}"; fi; };
function val_rtrim { if [[ "$val" =~ [[:space:]]+$ ]]; then val="${val:0:${#val}-${#BASH_REMATCH[0]}}"; fi; };
function val_trim { val_ltrim; val_rtrim; };
readarray -c1 -C 'mfcb val_trim a' -td, <<<"$string,"; unset 'a[-1]'; declare -p a;
## declare -a a=([0]="Paris" [1]="France" [2]="Europe")

fbicknel, Aug 18, 2017 at 15:57

It may also be helpful to note (though understandably you had no room to do so) that the -d option to readarray first appears in Bash 4.4. – fbicknel Aug 18 '17 at 15:57

Cyril Duchon-Doris, Nov 3, 2017 at 9:16

You should add a "TL;DR: scroll 3 pages to see the right solution at the end of my answer" – Cyril Duchon-Doris Nov 3 '17 at 9:16

dawg, Nov 26, 2017 at 22:28
Great answer (+1). If you change your awk to awk '{ gsub(/,[ ]+|$/,"\0"); print }' and eliminate that concatenation of the final ", " then you don't have to go through the gymnastics of eliminating the final record. So: readarray -td '' a < <(awk '{ gsub(/,[ ]+/,"\0"); print; }' <<<"$string") on Bash that supports readarray. Note your method is Bash 4.4+ I think because of the -d in readarray. – dawg Nov 26 '17 at 22:28

datUser, Feb 22, 2018 at 14:54

Looks like readarray is not an available builtin on OSX. – datUser Feb 22 '18 at 14:54

bgoldst, Feb 23, 2018 at 3:37

@datUser That's unfortunate. Your version of bash must be too old for readarray. In this case, you can use the second-best solution built on read. I'm referring to this: a=(); while read -rd,; do a+=("$REPLY"); done <<<"$string,"; (with the awk substitution if you need multicharacter delimiter support). Let me know if you run into any problems; I'm pretty sure this solution should work on fairly old versions of bash, back to version 2-something, released like two decades ago. – bgoldst Feb 23 '18 at 3:37

Jmoney38, Jul 14, 2015 at 11:54
t="one,two,three"
a=($(echo "$t" | tr ',' '\n'))
echo "${a[2]}"

Prints three.

shrimpwagon, Oct 16, 2015 at 20:04

I actually prefer this approach. Simple. – shrimpwagon Oct 16 '15 at 20:04

Ben, Oct 31, 2015 at 3:11

I copied and pasted this and it did not work with echo, but did work when I used it in a for loop. – Ben Oct 31 '15 at 3:11

Pinaki Mukherjee, Nov 9, 2015 at 20:22

This is the simplest approach. Thanks. – Pinaki Mukherjee Nov 9 '15 at 20:22

abalter, Aug 30, 2016 at 5:13

This does not work as stated. @Jmoney38 or shrimpwagon, if you can paste this in a terminal and get the desired output, please paste the result here. – abalter Aug 30 '16 at 5:13

leaf, Jul 17, 2017 at 16:28

@abalter Works for me with a=($(echo $t | tr ',' "\n")). Same result with a=($(echo $t | tr ',' ' ')). – leaf Jul 17 '17 at 16:28

Luca Borrione, Nov 2, 2012 at 13:44
Sometimes it happened to me that the method described in the accepted answer didn't work, especially if the separator is a carriage return.
In those cases I solved in this way:

string='first line
second line
third line'

oldIFS="$IFS"
IFS='
'
IFS=${IFS:0:1} # this is useful to format your code with tabs
lines=( $string )
IFS="$oldIFS"

for line in "${lines[@]}"
do
    echo "--> $line"
done

Stefan van den Akker, Feb 9, 2015 at 16:52

+1 This completely worked for me. I needed to put multiple strings, divided by a newline, into an array, and read -a arr <<< "$strings" did not work with IFS=$'\n'. – Stefan van den Akker Feb 9 '15 at 16:52

Stefan van den Akker, Feb 10, 2015 at 13:49
Here is the answer to make the accepted answer work when the delimiter is a newline . – Stefan van den Akker Feb 10 '15 at 13:49,Jul 24, 2015 at 21:24
The accepted answer works for values in one line. If the variable has several lines:

string='first line
 second line
 third line'

We need a very different command to get all lines:

while read -r line; do lines+=("$line"); done <<<"$string"

Or the much simpler bash readarray:

readarray -t lines <<<"$string"

Printing all lines is very easy taking advantage of a printf feature:

printf ">[%s]\n" "${lines[@]}"
>[first line]
>[ second line]
>[ third line]
While not every solution works for every situation, your mention of readarray... replaced my last two hours with 5 minutes... you got my vote – Mayhem Dec 31 '15 at 3:13Derek 朕會功夫 ,Mar 23, 2018 at 19:14
readarray is the right answer. – Derek 朕會功夫 Mar 23 '18 at 19:14

ssanch, Jun 3, 2016 at 15:24
This is similar to the approach by Jmoney38, but using sed:

string="1,2,3,4"
array=(`echo $string | sed 's/,/\n/g'`)
echo ${array[0]}

Prints 1.
dawg ,Nov 26, 2017 at 19:59
The key to splitting your string into an array is the multi-character delimiter of ", ". Any solution using IFS for multi-character delimiters is inherently wrong, since IFS is a set of those characters, not a string.

If you assign IFS=", " then the string will break on EITHER "," OR " " or any combination of them, which is not an accurate representation of the two-character delimiter of ", ".

You can use awk or sed to split the string, with process substitution:

#!/bin/bash
str="Paris, France, Europe"
array=()
while read -r -d $'\0' each; do   # use a NUL terminated field separator
    array+=("$each")
done < <(printf "%s" "$str" | awk '{ gsub(/,[ ]+|$/,"\0"); print }')
declare -p array
# declare -a array=([0]="Paris" [1]="France" [2]="Europe") output

It is more efficient to use a regex directly in Bash:

#!/bin/bash
str="Paris, France, Europe"
array=()
while [[ $str =~ ([^,]+)(,[ ]+|$) ]]; do
    array+=("${BASH_REMATCH[1]}")   # capture the field
    i=${#BASH_REMATCH}              # length of field + delimiter
    str=${str:i}                    # advance the string by that length
done                                # the loop deletes $str, so make a copy if needed
declare -p array
# declare -a array=([0]="Paris" [1]="France" [2]="Europe") output

With the second form, there is no subshell and it will be inherently faster.
Edit by bgoldst: Here are some benchmarks comparing my readarray solution to dawg's regex solution, and I also included the read solution for the heck of it (note: I slightly modified the regex solution for greater harmony with my solution) (also see my comments below the post):

## competitors
function c_readarray { readarray -td '' a < <(awk '{ gsub(/, /,"\0"); print; };' <<<"$1, "); unset 'a[-1]'; };
function c_read { a=(); local REPLY=''; while read -r -d ''; do a+=("$REPLY"); done < <(awk '{ gsub(/, /,"\0"); print; };' <<<"$1, "); };
function c_regex { a=(); local s="$1, "; while [[ $s =~ ([^,]+),\  ]]; do a+=("${BASH_REMATCH[1]}"); s=${s:${#BASH_REMATCH}}; done; };

## helper functions
function rep {
    local -i i=-1;
    for ((i = 0; i<$1; ++i)); do printf %s "$2"; done;
}; ## end rep()

function testAll {
    local funcs=();
    local args=();
    local func='';
    local -i rc=-1;
    while [[ "$1" != ':' ]]; do
        func="$1";
        if [[ ! "$func" =~ ^[_a-zA-Z][_a-zA-Z0-9]*$ ]]; then echo "bad function name: $func" >&2; return 2; fi;
        funcs+=("$func");
        shift;
    done;
    shift;
    args=("$@");
    for func in "${funcs[@]}"; do
        echo -n "$func ";
        { time $func "${args[@]}" >/dev/null 2>&1; } 2>&1| tr '\n' '/';
        rc=${PIPESTATUS[0]}; if [[ $rc -ne 0 ]]; then echo "[$rc]"; else echo; fi;
    done| column -ts/;
}; ## end testAll()

function makeStringToSplit {
    local -i n=$1; ## number of fields
    if [[ $n -lt 0 ]]; then echo "bad field count: $n" >&2; return 2; fi;
    if [[ $n -eq 0 ]]; then
        echo;
    elif [[ $n -eq 1 ]]; then
        echo 'first field';
    elif [[ "$n" -eq 2 ]]; then
        echo 'first field, last field';
    else
        echo "first field, $(rep $[$1-2] 'mid field, ')last field";
    fi;
}; ## end makeStringToSplit()

function testAll_splitIntoArray {
    local -i n=$1; ## number of fields in input string
    local s='';
    echo "===== $n field$(if [[ $n -ne 1 ]]; then echo 's'; fi;) =====";
    s="$(makeStringToSplit "$n")";
    testAll c_readarray c_read c_regex : "$s";
}; ## end testAll_splitIntoArray()

## results
testAll_splitIntoArray 1;
## ===== 1 field =====
## c_readarray   real 0m0.067s   user 0m0.000s   sys 0m0.000s
## c_read        real 0m0.064s   user 0m0.000s   sys 0m0.000s
## c_regex       real 0m0.000s   user 0m0.000s   sys 0m0.000s

testAll_splitIntoArray 10;
## ===== 10 fields =====
## c_readarray   real 0m0.067s   user 0m0.000s   sys 0m0.000s
## c_read        real 0m0.064s   user 0m0.000s   sys 0m0.000s
## c_regex       real 0m0.001s   user 0m0.000s   sys 0m0.000s

testAll_splitIntoArray 100;
## ===== 100 fields =====
## c_readarray   real 0m0.069s   user 0m0.000s   sys 0m0.062s
## c_read        real 0m0.065s   user 0m0.000s   sys 0m0.046s
## c_regex       real 0m0.005s   user 0m0.000s   sys 0m0.000s

testAll_splitIntoArray 1000;
## ===== 1000 fields =====
## c_readarray   real 0m0.084s   user 0m0.031s   sys 0m0.077s
## c_read        real 0m0.092s   user 0m0.031s   sys 0m0.046s
## c_regex       real 0m0.125s   user 0m0.125s   sys 0m0.000s

testAll_splitIntoArray 10000;
## ===== 10000 fields =====
## c_readarray   real 0m0.209s   user 0m0.093s   sys 0m0.108s
## c_read        real 0m0.333s   user 0m0.234s   sys 0m0.109s
## c_regex       real 0m9.095s   user 0m9.078s   sys 0m0.000s

testAll_splitIntoArray 100000;
## ===== 100000 fields =====
## c_readarray   real 0m1.460s    user 0m0.326s    sys 0m1.124s
## c_read        real 0m2.780s    user 0m1.686s    sys 0m1.092s
## c_regex       real 17m38.208s  user 15m16.359s  sys 2m19.375s
Very cool solution! I never thought of using a loop on a regex match, nifty use of $BASH_REMATCH. It works, and does indeed avoid spawning subshells. +1 from me. However, by way of criticism, the regex itself is a little non-ideal, in that it appears you were forced to duplicate part of the delimiter token (specifically the comma) so as to work around the lack of support for non-greedy multipliers (also lookarounds) in ERE (the "extended" regex flavor built into bash). This makes it a little less generic and robust. – bgoldst Nov 27 '17 at 4:28

bgoldst, Nov 27, 2017 at 4:28
Secondly, I did some benchmarking, and although the performance is better than the other solutions for smallish strings, it worsens exponentially due to the repeated string-rebuilding, becoming catastrophic for very large strings. See my edit to your answer. – bgoldst Nov 27 '17 at 4:28

dawg, Nov 27, 2017 at 4:46
@bgoldst: What a cool benchmark! In defense of the regex, for 10's or 100's of thousands of fields (what the regex is splitting) there would probably be some form of record (like \n-delimited text lines) comprising those fields, so the catastrophic slow-down would likely not occur. If you have a string with 100,000 fields -- maybe Bash is not ideal ;-) Thanks for the benchmark. I learned a thing or two. – dawg Nov 27 '17 at 4:46

Geoff Lee, Mar 4, 2016 at 6:02
Try this

IFS=', '
array=(Paris, France, Europe)
for item in ${array[@]}; do echo $item; done

It's simple. If you want, you can also add a declare (and also remove the commas):

IFS=' '
declare -a array=(Paris France Europe)

The IFS is added to undo the above, but it works without it in a fresh bash instance.
MrPotatoHead ,Nov 13, 2018 at 13:19
Pure bash multi-character delimiter solution.

As others have pointed out in this thread, the OP's question gave an example of a comma-delimited string to be parsed into an array, but did not indicate if he/she was only interested in comma delimiters, single-character delimiters, or multi-character delimiters.
Since Google tends to rank this answer at or near the top of search results, I wanted to provide readers with a strong answer to the question of multiple character delimiters, since that is also mentioned in at least one response.
If you're in search of a solution to a multi-character delimiter problem, I suggest reviewing Mallikarjun M's post, in particular the response from gniourf_gniourf, who provides this elegant pure BASH solution using parameter expansion:
#!/bin/bash
str="LearnABCtoABCSplitABCaABCString"
delimiter=ABC
s=$str$delimiter
array=()
while [[ $s ]]; do
    array+=( "${s%%"$delimiter"*}" )
    s=${s#*"$delimiter"}
done
declare -p array

Link to cited comment/referenced post
Link to cited question: Howto split a string on a multi-character delimiter in bash?
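The same parameter-expansion technique can be wrapped in a reusable function; a minimal sketch (the function name split_by and the result array arr are my own naming, not from the cited post):

```shell
#!/bin/bash
# Pure-bash multi-character split, following the cited technique:
# append one trailing delimiter as a sentinel, then repeatedly trim
# the first field off the front of the string.
split_by() {
    local s="$1$2" delimiter="$2"
    arr=()
    while [[ $s ]]; do
        arr+=( "${s%%"$delimiter"*}" )  # everything before the first delimiter
        s=${s#*"$delimiter"}            # drop that field and its delimiter
    done
}

split_by "LearnABCtoABCSplitABCaABCString" "ABC"
declare -p arr   # arr=([0]="Learn" [1]="to" [2]="Split" [3]="a" [4]="String")
```

Because the delimiter is quoted inside the expansions, it is matched literally even if it contains glob characters.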
Eduardo Cuomo ,Dec 19, 2016 at 15:27
Use this:

countries='Paris, France, Europe'
OIFS="$IFS"
IFS=', '
array=($countries)
IFS="$OIFS"

# ${array[0]} == Paris
# ${array[1]} == France
# ${array[2]} == Europe

gniourf_gniourf, Dec 19, 2016 at 17:22
Bad: subject to word splitting and pathname expansion. Please don't revive old questions with good answers to give bad answers. – gniourf_gniourf Dec 19 '16 at 17:22

Scott Weldon, Dec 19, 2016 at 18:12
This may be a bad answer, but it is still a valid answer. Flaggers / reviewers: For incorrect answers such as this one, downvote, don't delete! – Scott Weldon Dec 19 '16 at 18:12

George Sovetov, Dec 26, 2016 at 17:31
@gniourf_gniourf Could you please explain why it is a bad answer? I really don't understand when it fails. – George Sovetov Dec 26 '16 at 17:31

gniourf_gniourf, Dec 26, 2016 at 18:07
@GeorgeSovetov: As I said, it's subject to word splitting and pathname expansion. More generally, splitting a string into an array as array=( $string ) is a (sadly very common) antipattern: word splitting occurs: string='Prague, Czech Republic, Europe'; pathname expansion occurs: string='foo[abcd],bar[efgh]' will fail if you have a file named, e.g., food or barf in your directory. The only valid usage of such a construct is when string is a glob. – gniourf_gniourf Dec 26 '16 at 18:07
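Both failure modes described in this comment are easy to reproduce; a small sketch (the scratch directory and the file name food are illustrative):

```shell
#!/bin/bash
# Word splitting: the space inside "Czech Republic" also splits.
IFS=', '
string='Prague, Czech Republic, Europe'
arr=($string)
declare -p arr   # four elements: Prague / Czech / Republic / Europe

# Pathname expansion: a glob pattern in the data matches local files.
cd "$(mktemp -d)" && touch food
string='foo[abcd],bar[efgh]'
arr=($string)
declare -p arr   # first element became "food" because foo[abcd] matched it
unset IFS
```

The second declare -p shows two elements, "food" and the literal "bar[efgh]" (which matched no file and so was left alone).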
user1009908, Jun 9, 2015 at 23:28
UPDATE: Don't do this, due to problems with eval. With slightly less ceremony:

IFS=', ' eval 'array=($string)'

e.g.

string="foo, bar,baz"
IFS=', ' eval 'array=($string)'
echo ${array[1]} # -> bar

caesarsol, Oct 29, 2015 at 14:42
eval is evil! don't do this. – caesarsol Oct 29 '15 at 14:42

user1009908, Oct 30, 2015 at 4:05
Pfft. No. If you're writing scripts large enough for this to matter, you're doing it wrong. In application code, eval is evil. In shell scripting, it's common, necessary, and inconsequential. – user1009908 Oct 30 '15 at 4:05

caesarsol, Nov 2, 2015 at 18:19
put a $ in your variable and you'll see... I write many scripts and I never ever had to use a single eval – caesarsol Nov 2 '15 at 18:19

Dennis Williamson, Dec 2, 2015 at 17:00
Eval command and security issues – Dennis Williamson Dec 2 '15 at 17:00

user1009908, Dec 22, 2015 at 23:04
You're right, this is only usable when the input is known to be clean. Not a robust solution. – user1009908 Dec 22 '15 at 23:04

Eduardo Lucio, Jan 31, 2018 at 20:45
Here's my hack!

Splitting strings by strings is a pretty boring thing to do using bash. What happens is that we have limited approaches that only work in a few cases (split by ";", "/", "." and so on) or we have a variety of side effects in the outputs.

The approach below has required a number of maneuvers, but I believe it will work for most of our needs!

#!/bin/bash

# --------------------------------------
# SPLIT FUNCTION
# ----------------

F_SPLIT_R=()
f_split() {
    : 'It does a "split" into a given string and returns an array.

    Args:
        TARGET_P (str): Target string to "split".
        DELIMITER_P (Optional[str]): Delimiter used to "split". If not
    informed the split will be done by spaces.

    Returns:
        F_SPLIT_R (array): Array with the provided string separated by the
    informed delimiter.
    '

    F_SPLIT_R=()
    TARGET_P=$1
    DELIMITER_P=$2
    if [ -z "$DELIMITER_P" ] ; then
        DELIMITER_P=" "
    fi

    REMOVE_N=1
    if [ "$DELIMITER_P" == "\n" ] ; then
        REMOVE_N=0
    fi

    # NOTE: This was the only parameter that has been a problem so far!
    # By Questor
    # [Ref.: https://unix.stackexchange.com/a/390732/61742]
    if [ "$DELIMITER_P" == "./" ] ; then
        DELIMITER_P="[.]/"
    fi

    if [ ${REMOVE_N} -eq 1 ] ; then
        # NOTE: Due to bash limitations we have some problems getting the
        # output of a split by awk inside an array and so we need to use
        # "line break" (\n) to succeed. Seen this, we remove the line breaks
        # momentarily afterwards we reintegrate them. The problem is that if
        # there is a line break in the "string" informed, this line break will
        # be lost, that is, it is erroneously removed in the output!
        # By Questor
        TARGET_P=$(awk 'BEGIN {RS="dn"} {gsub("\n", "3F2C417D448C46918289218B7337FCAF"); printf $0}' <<< "${TARGET_P}")
    fi

    # NOTE: The replace of "\n" by "3F2C417D448C46918289218B7337FCAF" results
    # in more occurrences of "3F2C417D448C46918289218B7337FCAF" than the
    # amount of "\n" that there was originally in the string (one more
    # occurrence at the end of the string)! We can not explain the reason for
    # this side effect. The line below corrects this problem!
    # By Questor
    TARGET_P=${TARGET_P%????????????????????????????????}

    SPLIT_NOW=$(awk -F"$DELIMITER_P" '{for(i=1; i<=NF; i++){printf "%s\n", $i}}' <<< "${TARGET_P}")

    while IFS= read -r LINE_NOW ; do
        if [ ${REMOVE_N} -eq 1 ] ; then
            # NOTE: We use "'" to prevent blank lines with no other characters
            # in the sequence being erroneously removed! We do not know the
            # reason for this side effect! By Questor
            LN_NOW_WITH_N=$(awk 'BEGIN {RS="dn"} {gsub("3F2C417D448C46918289218B7337FCAF", "\n"); printf $0}' <<< "'${LINE_NOW}'")

            # NOTE: We use the commands below to revert the intervention made
            # immediately above! By Questor
            LN_NOW_WITH_N=${LN_NOW_WITH_N%?}
            LN_NOW_WITH_N=${LN_NOW_WITH_N#?}

            F_SPLIT_R+=("$LN_NOW_WITH_N")
        else
            F_SPLIT_R+=("$LINE_NOW")
        fi
    done <<< "$SPLIT_NOW"
}

# --------------------------------------
# HOW TO USE
# ----------------

STRING_TO_SPLIT="
* How do I list all databases and tables using psql?

\"
sudo -u postgres /usr/pgsql-9.4/bin/psql -c \"\l\"
sudo -u postgres /usr/pgsql-9.4/bin/psql <DB_NAME> -c \"\dt\"
\"

\"
\list or \l: list all databases
\dt: list all tables in the current database
\"

[Ref.: https://dba.stackexchange.com/questions/1285/how-do-i-list-all-databases-and-tables-using-psql]
"

f_split "$STRING_TO_SPLIT" "bin/psql -c"

# --------------------------------------
# OUTPUT AND TEST
# ----------------

ARR_LENGTH=${#F_SPLIT_R[*]}
for (( i=0; i<=$(( $ARR_LENGTH -1 )); i++ )) ; do
    echo " > -----------------------------------------"
    echo "${F_SPLIT_R[$i]}"
    echo " < -----------------------------------------"
done

if [ "$STRING_TO_SPLIT" == "${F_SPLIT_R[0]}bin/psql -c${F_SPLIT_R[1]}" ] ; then
    echo " > -----------------------------------------"
    echo "The strings are the same!"
    echo " < -----------------------------------------"
fi

sel-en-ium, May 31, 2018 at 5:56
Another way to do it without modifying IFS:

read -r -a myarray <<< "${string//, /$IFS}"

Rather than changing IFS to match our desired delimiter, we can replace all occurrences of our desired delimiter ", " with contents of $IFS via "${string//, /$IFS}".

Maybe this will be slow for very large strings though?

This is based on Dennis Williamson's answer.
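A cautious variant of the substitution above: the default $IFS is space, tab, and newline, and splicing a newline into the string makes read stop at the first field, so substituting a single space appears more reliable for this example (fields containing spaces will still break):

```shell
#!/bin/bash
# Replace the delimiter ", " with a plain space instead of the full $IFS
# (the default $IFS also contains a newline, which would stop read early).
string="Paris, France, Europe"
read -r -a myarray <<< "${string//, / }"
declare -p myarray   # myarray=([0]="Paris" [1]="France" [2]="Europe")
```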
rsjethani ,Sep 13, 2016 at 16:21
Another approach can be:

str="a, b, c, d"   # assuming there is a space after ',' as in Q
arr=(${str//,/})   # delete all occurrences of ','

After this 'arr' is an array with four strings. This doesn't require dealing with IFS or read or any other special stuff, hence much simpler and direct.
gniourf_gniourf ,Dec 26, 2016 at 18:12
Same (sadly common) antipattern as other answers: subject to word splitting and filename expansion. – gniourf_gniourf Dec 26 '16 at 18:12

Safter Arslan, Aug 9, 2017 at 3:21
Another way would be:

string="Paris, France, Europe"
IFS=', ' arr=(${string})

Now your elements are stored in "arr" array. To iterate through the elements:

for i in ${arr[@]}; do echo $i; done

bgoldst, Aug 13, 2017 at 22:38
I cover this idea in my answer; see Wrong answer #5 (you might be especially interested in my discussion of the eval trick). Your solution leaves $IFS set to the comma-space value after-the-fact. – bgoldst Aug 13 '17 at 22:38
Nov 08, 2018 | stackoverflow.com
Rob I, May 9, 2012 at 19:22
For your second question, see @mkb's comment to my answer below - that's definitely the way to go! – Rob I May 9 '12 at 19:22

Dennis Williamson, Jul 4, 2012 at 16:14
See my edited answer for one way to read individual characters into an array. – Dennis Williamson Jul 4 '12 at 16:14

Nick Weedon, Dec 31, 2015 at 11:04
Here is the same thing in a more concise form: var1=$(cut -f1 -d- <<<$STR) – Nick Weedon Dec 31 '15 at 11:04

Rob I, May 9, 2012 at 17:00
If your solution doesn't have to be general, i.e. only needs to work for strings like your example, you could do:

var1=$(echo $STR | cut -f1 -d-)
var2=$(echo $STR | cut -f2 -d-)

I chose cut here because you could simply extend the code for a few more variables...

crunchybutternut, May 9, 2012 at 17:40
Can you look at my post again and see if you have a solution for the followup question? thanks! – crunchybutternut May 9 '12 at 17:40

mkb, May 9, 2012 at 17:59
You can use cut to cut characters too! cut -c1 for example. – mkb May 9 '12 at 17:59

FSp, Nov 27, 2012 at 10:26
Although this is very simple to read and write, it is a very slow solution because it forces you to read the same data ($STR) twice... if you care about your script performance, the @anubhava solution is much better – FSp Nov 27 '12 at 10:26

tripleee, Jan 25, 2016 at 6:47
Apart from being an ugly last-resort solution, this has a bug: You should absolutely use double quotes in echo "$STR" unless you specifically want the shell to expand any wildcards in the string as a side effect. See also stackoverflow.com/questions/10067266/ – tripleee Jan 25 '16 at 6:47

Rob I, Feb 10, 2016 at 13:57
You're right about double quotes of course, though I did point out this solution wasn't general. However I think your assessment is a bit unfair - for some people this solution may be more readable (and hence extensible etc) than some others, and doesn't completely rely on arcane bash features that wouldn't translate to other shells. I suspect that's why my solution, though less elegant, continues to get votes periodically... – Rob I Feb 10 '16 at 13:57

Dennis Williamson, May 10, 2012 at 3:14
read with IFS is perfect for this:

$ IFS=- read var1 var2 <<< ABCDE-123456
$ echo "$var1"
ABCDE
$ echo "$var2"
123456

Edit:

Here is how you can read each individual character into array elements:

$ read -a foo <<<"$(echo "ABCDE-123456" | sed 's/./& /g')"

Dump the array:

$ declare -p foo
declare -a foo='([0]="A" [1]="B" [2]="C" [3]="D" [4]="E" [5]="-" [6]="1" [7]="2" [8]="3" [9]="4" [10]="5" [11]="6")'

If there are spaces in the string:

$ IFS=$'\v' read -a foo <<<"$(echo "ABCDE 123456" | sed 's/./&\v/g')"
$ declare -p foo
declare -a foo='([0]="A" [1]="B" [2]="C" [3]="D" [4]="E" [5]=" " [6]="1" [7]="2" [8]="3" [9]="4" [10]="5" [11]="6")'

insecure, Apr 30, 2014 at 7:51
Great, the elegant bash-only way, without unnecessary forks. – insecure Apr 30 '14 at 7:51

Martin Serrano, Jan 11 at 4:34
this solution also has the benefit that if the delimiter is not present, var2 will be empty – Martin Serrano Jan 11 at 4:34

mkb, May 9, 2012 at 17:02
If you know it's going to be just two fields, you can skip the extra subprocesses like this:

var1=${STR%-*}
var2=${STR#*-}

What does this do? ${STR%-*} deletes the shortest substring of $STR that matches the pattern -* starting from the end of the string. ${STR#*-} does the same, but with the *- pattern and starting from the beginning of the string. They each have counterparts %% and ## which find the longest anchored pattern match. If anyone has a helpful mnemonic to remember which does which, let me know! I always have to try both to remember.

Jens, Jan 30, 2015 at 15:17
Plus 1 For knowing your POSIX shell features, avoiding expensive forks and pipes, and the absence of bashisms. – Jens Jan 30 '15 at 15:17

Steven Lu, May 1, 2015 at 20:19
Dunno about "absence of bashisms" considering that this is already moderately cryptic .... if your delimiter is a newline instead of a hyphen, then it becomes even more cryptic. On the other hand, it works with newlines, so there's that. – Steven Lu May 1 '15 at 20:19

mkb, Mar 9, 2016 at 17:30
@KErlandsson: done – mkb Mar 9 '16 at 17:30

mombip, Aug 9, 2016 at 15:58
I've finally found documentation for it: Shell-Parameter-Expansion – mombip Aug 9 '16 at 15:58

DS., Jan 13, 2017 at 19:56
Mnemonic: "#" is to the left of "%" on a standard keyboard, so "#" removes a prefix (on the left), and "%" removes a suffix (on the right). – DS. Jan 13 '17 at 19:56
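The four operators can be seen side by side with a string containing two hyphens, so that shortest and longest match actually differ (the example value is mine):

```shell
#!/bin/bash
STR="ABCDE-123-456"
echo "${STR%-*}"    # ABCDE-123 : '%'  trims the shortest suffix matching -*
echo "${STR%%-*}"   # ABCDE     : '%%' trims the longest suffix matching -*
echo "${STR#*-}"    # 123-456   : '#'  trims the shortest prefix matching *-
echo "${STR##*-}"   # 456       : '##' trims the longest prefix matching *-
```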
tripleee, May 9, 2012 at 17:57
Sounds like a job for set with a custom IFS.

IFS=-
set $STR
var1=$1
var2=$2

(You will want to do this in a function with a local IFS so you don't mess up other parts of your script where you require IFS to be what you expect.)

Rob I, May 9, 2012 at 19:20
Nice - I knew about $IFS but hadn't seen how it could be used. – Rob I May 9 '12 at 19:20

Sigg3.net, Jun 19, 2013 at 8:08
I used triplee's example and it worked exactly as advertised! Just change last two lines to myvar1=`echo $1` && myvar2=`echo $2` if you need to store them throughout a script with several "thrown" variables. – Sigg3.net Jun 19 '13 at 8:08

tripleee, Jun 19, 2013 at 13:25
No, don't use a useless echo in backticks. – tripleee Jun 19 '13 at 13:25

Daniel Andersson, Mar 27, 2015 at 6:46
This is a really sweet solution if we need to write something that is not Bash specific. To handle IFS troubles, one can add OLDIFS=$IFS at the beginning before overwriting it, and then add IFS=$OLDIFS just after the set line. – Daniel Andersson Mar 27 '15 at 6:46

tripleee, Mar 27, 2015 at 6:58
FWIW the link above is broken. I was lazy and careless. The canonical location still works; iki.fi/era/unix/award.html#echo – tripleee Mar 27 '15 at 6:58

anubhava, May 9, 2012 at 17:09
Using bash regex capabilities:

re="^([^-]+)-(.*)$"
[[ "ABCDE-123456" =~ $re ]] && var1="${BASH_REMATCH[1]}" && var2="${BASH_REMATCH[2]}"
echo $var1
echo $var2

OUTPUT:

ABCDE
123456

Cometsong, Oct 21, 2016 at 13:29
Love pre-defining the re for later use(s)! – Cometsong Oct 21 '16 at 13:29
Archibald, Nov 12, 2012 at 11:03

string="ABCDE-123456"
IFS=-   # use "local IFS=-" inside the function
set $string
echo $1 # >>> ABCDE
echo $2 # >>> 123456

tripleee, Mar 27, 2015 at 7:02
Hmmm, isn't this just a restatement of my answer? – tripleee Mar 27 '15 at 7:02

Archibald, Sep 18, 2015 at 12:36
Actually yes. I just clarified it a bit. – Archibald Sep 18 '15 at 12:36
Nov 08, 2018 | stackoverflow.com
cd1 , Jul 1, 2010 at 23:29
Suppose I have the string 1:2:3:4:5 and I want to get its last field (5 in this case). How do I do that using Bash? I tried cut, but I don't know how to specify the last field with -f.

Stephen, Jul 2, 2010 at 0:05
You can use string operators:

$ foo=1:2:3:4:5
$ echo ${foo##*:}
5

This trims everything from the front until a ':', greedily.

${foo  <-- from variable foo
  ##   <-- greedy front trim
  *    <-- matches anything
  :    <-- until the last ':'
}

eckes, Jan 23, 2013 at 15:23
While this is working for the given problem, the answer of William below (stackoverflow.com/a/3163857/520162) also returns 5 if the string is 1:2:3:4:5: (while using the string operators yields an empty result). This is especially handy when parsing paths that could contain (or not) a finishing / character. – eckes Jan 23 '13 at 15:23

Dobz, Jun 25, 2014 at 11:44
How would you then do the opposite of this? To echo out '1:2:3:4:'? – Dobz Jun 25 '14 at 11:44

Mihai Danila, Jul 9, 2014 at 14:07
And how does one keep the part before the last separator? Apparently by using ${foo%:*}. # - from beginning; % - from end. #, % - shortest match; ##, %% - longest match. – Mihai Danila Jul 9 '14 at 14:07
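The trailing-separator caveat eckes raises is easy to check by combining the operators from this comment (the values are illustrative):

```shell
#!/bin/bash
foo=1:2:3:4:5
echo "${foo##*:}"   # 5        : the last field
echo "${foo%:*}"    # 1:2:3:4  : everything before the last ':'

bar=1:2:3:4:5:      # same string, but with a trailing separator
echo "${bar##*:}"   # prints nothing: the "field" after the final ':' is empty
```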
Putnik, Feb 11, 2016 at 22:33
If i want to get the last element from path, how should I use it? echo ${pwd##*/} does not work. – Putnik Feb 11 '16 at 22:33

Stan Strum, Dec 17, 2017 at 4:22
@Putnik that command sees pwd as a variable. Try dir=$(pwd); echo ${dir##*/}. Works for me! – Stan Strum Dec 17 '17 at 4:22

a3nm, Feb 3, 2012 at 8:39
Another way is to reverse before and after cut:

$ echo ab:cd:ef | rev | cut -d: -f1 | rev
ef

This makes it very easy to get the last but one field, or any range of fields numbered from the end.
Dannid , Jan 14, 2013 at 20:50
This answer is nice because it uses 'cut', with which the author is (presumably) already familiar. Plus, I like this answer because I am using 'cut' and had this exact question, hence finding this thread via search. – Dannid Jan 14 '13 at 20:50

funroll, Aug 12, 2013 at 19:51
Some cut-and-paste fodder for people using spaces as delimiters: echo "1 2 3 4" | rev | cut -d " " -f1 | rev – funroll Aug 12 '13 at 19:51

EdgeCaseBerg, Sep 8, 2013 at 5:01
the rev | cut -d -f1 | rev is so clever! Thanks! Helped me a bunch (my use case was rev | cut -d ' ' -f 2- | rev) – EdgeCaseBerg Sep 8 '13 at 5:01

Anarcho-Chossid, Sep 16, 2015 at 15:54
Wow. Beautiful and dark magic. – Anarcho-Chossid Sep 16 '15 at 15:54

shearn89, Aug 17, 2017 at 9:27
I always forget about rev, was just what I needed! cut -b20- | rev | cut -b10- | rev – shearn89 Aug 17 '17 at 9:27

William Pursell, Jul 2, 2010 at 7:09
It's difficult to get the last field using cut, but here are (one set of) solutions in awk and perl:

$ echo 1:2:3:4:5 | awk -F: '{print $NF}'
5
$ echo 1:2:3:4:5 | perl -F: -wane 'print $F[-1]'
5

eckes, Jan 23, 2013 at 15:20
great advantage of this solution over the accepted answer: it also matches paths that contain or do not contain a finishing / character: /a/b/c/d and /a/b/c/d/ yield the same result (d) when processing pwd | awk -F/ '{print $NF}'. The accepted answer results in an empty result in the case of /a/b/c/d/ – eckes Jan 23 '13 at 15:20

stamster, May 21 at 11:52
@eckes In case of AWK solution, on GNU bash, version 4.3.48(1)-release that's not true, as it matters whether you have a trailing slash or not. Simply put, AWK will use / as delimiter, and if your path is /my/path/dir/ it will use the value after the last delimiter, which is simply an empty string. So it's best to avoid a trailing slash if you need to do such a thing like I do. – stamster May 21 at 11:52

Nicholas M T Elliott, Jul 1, 2010 at 23:39
Assuming fairly simple usage (no escaping of the delimiter, for example), you can use grep:

$ echo "1:2:3:4:5" | grep -oE "[^:]+$"
5

Breakdown - find all the characters not the delimiter ([^:]) at the end of the line ($). -o only prints the matching part.
Dennis Williamson , Jul 2, 2010 at 0:05
One way:

var1="1:2:3:4:5"
var2=${var1##*:}

Another, using an array:

var1="1:2:3:4:5"
saveIFS=$IFS
IFS=":"
var2=($var1)
IFS=$saveIFS
var2=${var2[@]: -1}

Yet another with an array:

var1="1:2:3:4:5"
saveIFS=$IFS
IFS=":"
var2=($var1)
IFS=$saveIFS
count=${#var2[@]}
var2=${var2[$count-1]}

Using Bash (version >= 3.2) regular expressions:

var1="1:2:3:4:5"
[[ $var1 =~ :([^:]*)$ ]]
var2=${BASH_REMATCH[1]}
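All four variants can be checked against each other; a small harness (the scratch variable names a-d and parts are mine):

```shell
#!/bin/bash
var1="1:2:3:4:5"

a=${var1##*:}                          # parameter expansion

saveIFS=$IFS; IFS=":"
parts=($var1)
IFS=$saveIFS
b=${parts[@]: -1}                      # negative-offset array slice

count=${#parts[@]}
c=${parts[$count-1]}                   # explicit last index

[[ $var1 =~ :([^:]*)$ ]]
d=${BASH_REMATCH[1]}                   # regex capture

echo "$a $b $c $d"   # 5 5 5 5
```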
liuyang1, Mar 24, 2015 at 6:02
Thanks so much for array style, as I need this feature, but not have cut, awk these utils. – liuyang1 Mar 24 '15 at 6:02

user3133260, Dec 24, 2013 at 19:04
$ echo "a b c d e" | tr ' ' '\n' | tail -1
e

Simply translate the delimiter into a newline and choose the last entry with tail -1.

Yajo, Jul 30, 2014 at 10:13
It will fail if the last item contains a \n, but for most cases is the most readable solution. – Yajo Jul 30 '14 at 10:13

Rafael, Nov 10, 2016 at 10:09
Using sed:

$ echo '1:2:3:4:5' | sed 's/.*://' # => 5
$ echo '' | sed 's/.*://'          # => (empty)
$ echo ':' | sed 's/.*://'         # => (empty)
$ echo ':b' | sed 's/.*://'        # => b
$ echo '::c' | sed 's/.*://'       # => c
$ echo 'a' | sed 's/.*://'         # => a
$ echo 'a:' | sed 's/.*://'        # => (empty)
$ echo 'a:b' | sed 's/.*://'       # => b
$ echo 'a::c' | sed 's/.*://'      # => c

Ab Irato, Nov 13, 2013 at 16:10
If your last field is a single character, you could do this:

a="1:2:3:4:5"
echo ${a: -1}
echo ${a:(-1)}

Check string manipulation in bash.
gniourf_gniourf , Nov 13, 2013 at 16:15
This doesn't work: it gives the last character of a, not the last field. – gniourf_gniourf Nov 13 '13 at 16:15

Ab Irato, Nov 25, 2013 at 13:25
True, that's the idea, if you know the length of the last field it's good. If not you have to use something else... – Ab Irato Nov 25 '13 at 13:25

sphakka, Jan 25, 2016 at 16:24
Interesting, I didn't know of these particular Bash string manipulations. It also resembles Python's string/array slicing. – sphakka Jan 25 '16 at 16:24

ghostdog74, Jul 2, 2010 at 1:16
Using Bash.

$ var1="1:2:3:4:0"
$ IFS=":"
$ set -- $var1
$ eval echo \$${#}
0

Sopalajo de Arrierez, Dec 24, 2014 at 5:04
I would buy some details about this method, please :-). – Sopalajo de Arrierez Dec 24 '14 at 5:04

Rafa, Apr 27, 2017 at 22:10
Could have used echo ${!#} instead of eval echo \$${#}. – Rafa Apr 27 '17 at 22:10

Crytis, Dec 7, 2016 at 6:51
echo "a:b:c:d:e" | xargs -d : -n1 | tail -1

First use xargs to split it using ":"; -n1 means every line only has one part. Then print the last part.
BDL , Dec 7, 2016 at 13:47
Although this might solve the problem, one should always add an explanation to it. – BDL Dec 7 '16 at 13:47

Crytis, Jun 7, 2017 at 9:13
already added.. – Crytis Jun 7 '17 at 9:13

021, Apr 26, 2016 at 11:33
There are many good answers here, but still I want to share this one using basename:

basename $(echo "a:b:c:d:e" | tr ':' '/')

However it will fail if there are already some '/' in your string. If slash / is your delimiter then you just have to (and should) use basename.
It's not the best answer but it just shows how you can be creative using bash commands.
Nahid Akbar , Jun 22, 2012 at 2:55
for x in `echo $str | tr ";" "\n"`; do echo $x; done

chepner, Jun 22, 2012 at 12:58
This runs into problems if there is whitespace in any of the fields. Also, it does not directly address the question of retrieving the last field. – chepner Jun 22 '12 at 12:58

Christoph Böddeker, Feb 19 at 15:50
For those who are comfortable with Python, https://github.com/Russell91/pythonpy is a nice choice to solve this problem.

$ echo "a:b:c:d:e" | py -x 'x.split(":")[-1]'

From the pythonpy help: -x treat each row of stdin as x.

With that tool, it is easy to write python code that gets applied to the input.
baz , Nov 24, 2017 at 19:27
a solution using the read builtin:

IFS=':' read -a field <<< "1:2:3:4:5"
echo ${field[4]}
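A variation on this answer, assuming bash 4.3+ for the negative subscript, so the last index need not be hard-coded:

```shell
#!/bin/bash
IFS=':' read -r -a field <<< "1:2:3:4:5"
echo "${field[4]}"       # 5 (fixed index, as in the answer above)
echo "${field[-1]}"      # 5 (bash 4.3+: negative subscript means "from the end")
echo "${field[@]: -1}"   # 5 (negative-offset slice, works on older bash too)
```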
Nov 08, 2018 | stackoverflow.com
stefanB , May 28, 2009 at 2:03
I have this string stored in a variable:

IN="[email protected];[email protected]"

Now I would like to split the strings by ; delimiter so that I have:

ADDR1="[email protected]"
ADDR2="[email protected]"

I don't necessarily need the ADDR1 and ADDR2 variables. If they are elements of an array that's even better.

After suggestions from the answers below, I ended up with the following which is what I was after:

#!/usr/bin/env bash
IN="[email protected];[email protected]"
mails=$(echo $IN | tr ";" "\n")
for addr in $mails
do
    echo "> [$addr]"
done

Output:

> [[email protected]]
> [[email protected]]

There was a solution involving setting Internal_field_separator (IFS) to ;. I am not sure what happened with that answer, how do you reset IFS back to default?

RE: IFS solution, I tried this and it works, I keep the old IFS and then restore it:

IN="[email protected];[email protected]"
OIFS=$IFS
IFS=';'
mails2=$IN
for x in $mails2
do
    echo "> [$x]"
done
IFS=$OIFS

BTW, when I tried

mails2=($IN)

I only got the first string when printing it in loop, without brackets around $IN it works.

Brooks Moses, May 1, 2012 at 1:26
With regards to your "Edit2": You can simply "unset IFS" and it will return to the default state. There's no need to save and restore it explicitly unless you have some reason to expect that it's already been set to a non-default value. Moreover, if you're doing this inside a function (and, if you aren't, why not?), you can set IFS as a local variable and it will return to its previous value once you exit the function. – Brooks Moses May 1 '12 at 1:26

dubiousjim, May 31, 2012 at 5:21
@BrooksMoses: (a) +1 for using local IFS=... where possible; (b) -1 for unset IFS, this doesn't exactly reset IFS to its default value, though I believe an unset IFS behaves the same as the default value of IFS ($' \t\n'), however it seems bad practice to be assuming blindly that your code will never be invoked with IFS set to a custom value; (c) another idea is to invoke a subshell: (IFS=$custom; ...) when the subshell exits IFS will return to whatever it was originally. – dubiousjim May 31 '12 at 5:21

nicooga, Mar 7, 2016 at 15:32
I just want to have a quick look at the paths to decide where to throw an executable, so I resorted to running ruby -e "puts ENV.fetch('PATH').split(':')". If you want to stay pure bash it won't help, but using any scripting language that has a built-in split is easier. – nicooga Mar 7 '16 at 15:32

Jeff, Apr 22 at 17:51
This is kind of a drive-by comment, but since the OP used email addresses as the example, has anyone bothered to answer it in a way that is fully RFC 5322 compliant, namely that any quoted string can appear before the @ which means you're going to need regular expressions or some other kind of parser instead of naive use of IFS or other simplistic splitter functions. – Jeff Apr 22 at 17:51

user2037659, Apr 26 at 20:15
for x in $(IFS=';';echo $IN); do echo "> [$x]"; done
– user2037659 Apr 26 at 20:15

Johannes Schaub - litb, May 28, 2009 at 2:23
You can set the internal field separator (IFS) variable, and then let it parse into an array. When this happens in a command, then the assignment to IFS only takes place in that single command's environment (to read). It then parses the input according to the IFS variable value into an array, which we can then iterate over.

IFS=';' read -ra ADDR <<< "$IN"
for i in "${ADDR[@]}"; do
    # process "$i"
done

It will parse one line of items separated by ;, pushing it into an array. Stuff for processing the whole of $IN, each time one line of input separated by ;:

while IFS=';' read -ra ADDR; do
    for i in "${ADDR[@]}"; do
        # process "$i"
    done
done <<< "$IN"
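The accepted idiom end to end, with placeholder values (the example addresses from the question are redacted in this archive), showing that IFS itself is untouched afterwards:

```shell
#!/bin/bash
IN="one;two;three"
IFS=';' read -ra ADDR <<< "$IN"
for i in "${ADDR[@]}"; do
    echo "> [$i]"
done
# > [one]
# > [two]
# > [three]

# The prefix assignment scoped IFS to the read command only:
[[ $IFS == ';' ]] || echo "IFS was only changed for the read itself"
```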
Chris Lutz, May 28, 2009 at 2:25
This is probably the best way. How long will IFS persist in its current value, can it mess up my code by being set when it shouldn't be, and how can I reset it when I'm done with it? – Chris Lutz May 28 '09 at 2:25

Johannes Schaub - litb, May 28, 2009 at 3:04
now after the fix applied, only within the duration of the read command :) – Johannes Schaub - litb May 28 '09 at 3:04

lhunath, May 28, 2009 at 6:14
You can read everything at once without using a while loop: read -r -d '' -a addr <<< "$in" # The -d '' is key here, it tells read not to stop at the first newline (which is the default -d) but to continue until EOF or a NULL byte (which only occur in binary data). – lhunath May 28 '09 at 6:14

Charles Duffy, Jul 6, 2013 at 14:39
@LucaBorrione Setting IFS on the same line as the read with no semicolon or other separator, as opposed to in a separate command, scopes it to that command -- so it's always "restored"; you don't need to do anything manually. – Charles Duffy Jul 6 '13 at 14:39

chepner, Oct 2, 2014 at 3:50
@imagineerThis There is a bug involving herestrings and local changes to IFS that requires $IN to be quoted. The bug is fixed in bash 4.3. – chepner Oct 2 '14 at 3:50

palindrom, Mar 10, 2011 at 9:00
Taken from Bash shell script split array:

IN="[email protected];[email protected]"
arrIN=(${IN//;/ })

Explanation:

This construction replaces all occurrences of ';' (the initial // means global replace) in the string IN with ' ' (a single space), then interprets the space-delimited string as an array (that's what the surrounding parentheses do).

The syntax used inside of the curly braces to replace each ';' character with a ' ' character is called Parameter Expansion.

There are some common gotchas: if the original string contains spaces, word splitting will break elements apart, and if it contains glob characters such as *, pathname expansion may replace them with filenames (see the comments below).
Oz123 , Mar 21, 2011 at 18:50
I just want to add: this is the simplest of all, you can access array elements with ${arrIN[1]} (starting from zeros of course) – Oz123 Mar 21 '11 at 18:50

KomodoDave, Jan 5, 2012 at 15:13
Found it: the technique of modifying a variable within a ${} is known as 'parameter expansion'. – KomodoDave Jan 5 '12 at 15:13

qbolec, Feb 25, 2013 at 9:12
Does it work when the original string contains spaces? – qbolec Feb 25 '13 at 9:12

Ethan, Apr 12, 2013 at 22:47
No, I don't think this works when there are also spaces present... it's converting the ',' to ' ' and then building a space-separated array. – Ethan Apr 12 '13 at 22:47

Charles Duffy, Jul 6, 2013 at 14:39
This is a bad approach for other reasons: For instance, if your string contains ;*;, then the * will be expanded to a list of filenames in the current directory. -1 – Charles Duffy Jul 6 '13 at 14:39

Chris Lutz, May 28, 2009 at 2:09
If you don't mind processing them immediately, I like to do this:for i in $(echo $IN | tr ";" "\n") do # process doneYou could use this kind of loop to initialize an array, but there's probably an easier way to do it. Hope this helps, though.
Chris Lutz , May 28, 2009 at 2:42
You should have kept the IFS answer. It taught me something I didn't know, and it definitely made an array, whereas this just makes a cheap substitute. – Chris Lutz May 28 '09 at 2:42Johannes Schaub - litb , May 28, 2009 at 2:59
I see. Yeah i find doing these silly experiments, i'm going to learn new things each time i'm trying to answer things. I've edited stuff based on #bash IRC feedback and undeleted :) – Johannes Schaub - litb May 28 '09 at 2:59lhunath , May 28, 2009 at 6:12
-1, you're obviously not aware of wordsplitting, because it's introducing two bugs in your code. one is when you don't quote $IN and the other is when you pretend a newline is the only delimiter used in wordsplitting. You are iterating over every WORD in IN, not every line, and DEFINITELY not every element delimited by a semicolon, though it may appear to have the side-effect of looking like it works. – lhunath May 28 '09 at 6:12Johannes Schaub - litb , May 28, 2009 at 17:00
You could change it to echo "$IN" | tr ';' '\n' | while read -r ADDY; do # process "$ADDY"; done to make him lucky, i think :) Note that this will fork, and you can't change outer variables from within the loop (that's why i used the <<< "$IN" syntax) then – Johannes Schaub - litb May 28 '09 at 17:00mklement0 , Apr 24, 2013 at 14:13
To summarize the debate in the comments: Caveats for general use : the shell applies word splitting and expansions to the string, which may be undesired; just try it with IN="[email protected];[email protected];*;broken apart" . In short: this approach will break if your tokens contain embedded spaces and/or chars such as * that happen to make a token match filenames in the current folder. – mklement0 Apr 24 '13 at 14:13F. Hauri , Apr 13, 2013 at 14:20
Compatible answer
To this SO question, there are already a lot of different ways to do this in bash . But bash has many special features, so-called bashisms , that work well but won't work in any other shell .
In particular, arrays , associative arrays , and pattern substitution are pure bashisms and may not work under other shells .
On my Debian GNU/Linux , there is a standard shell called dash , but I know many people who like to use ksh .
Finally, in very small environments, there is a special tool called busybox with its own shell interpreter ( ash ).
Requested string
The string sample in the SO question is:
IN="[email protected];[email protected]"
As whitespace could modify the result of the routine, I prefer to use this sample string:
IN="[email protected];[email protected];Full Name <[email protected]>"
Split string based on delimiter in bash (version >=4.2)
Under pure bash, we may use arrays and IFS :
var="[email protected];[email protected];Full Name <[email protected]>"
oIFS="$IFS"; IFS=";"; declare -a fields=($var); IFS="$oIFS"; unset oIFS

or, as a single command:

IFS=\; read -a fields <<<"$var"

Using this syntax under recent bash doesn't change $IFS for the current session, but only for the current command:

set | grep ^IFS=
IFS=$' \t\n'

Now the string var is split and stored into an array (named fields ):

set | grep ^fields=\\\|^var=
fields=([0]="[email protected]" [1]="[email protected]" [2]="Full Name <[email protected]>")
var='[email protected];[email protected];Full Name <[email protected]>'

We could request the variable content with declare -p :

declare -p var fields
declare -- var="[email protected];[email protected];Full Name <[email protected]>"
declare -a fields=([0]="[email protected]" [1]="[email protected]" [2]="Full Name <[email protected]>")

read is the quickest way to do the split, because there are no forks and no external resources called. From there, you could use the syntax you already know for processing each field:
for x in "${fields[@]}";do echo "> [$x]" done > [[email protected]] > [[email protected]] > [Full Name <[email protected]>]or drop each field after processing (I like this shifting approach):
while [ "$fields" ] ;do echo "> [$fields]" fields=("${fields[@]:1}") done > [[email protected]] > [[email protected]] > [Full Name <[email protected]>]or even for simple printout (shorter syntax):
printf "> [%s]\n" "${fields[@]}"
> [[email protected]]
> [[email protected]]
> [Full Name <[email protected]>]
Split string based on delimiter in shell
But if you want to write something usable under many shells, you must avoid bashisms .
There is a syntax, used in many shells, for splitting a string across first or last occurrence of a substring:
${var#*SubStr} # will drop begin of string up to first occur of `SubStr` ${var##*SubStr} # will drop begin of string up to last occur of `SubStr` ${var%SubStr*} # will drop part of string from last occur of `SubStr` to the end ${var%%SubStr*} # will drop part of string from first occur of `SubStr` to the end(The absence of this from the other answers is the main reason I posted this one ;)
As pointed out by Score_Under : # and % delete the shortest possible matching string, and ## and %% delete the longest possible.
This little sample script works well under bash , dash , ksh , busybox and was tested under macOS's bash too:
var="[email protected];[email protected];Full Name <[email protected]>" while [ "$var" ] ;do iter=${var%%;*} echo "> [$iter]" [ "$var" = "$iter" ] && \ var='' || \ var="${var#*;}" done > [[email protected]] > [[email protected]] > [Full Name <[email protected]>]Have fun!
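The four operators can be compared side by side on one string; a minimal sketch:

```shell
# Shortest vs. longest match for prefix (#, ##) and suffix (%, %%) deletion.
var="a;b;c"

echo "${var#*;}"    # shortest prefix match dropped -> b;c
echo "${var##*;}"   # longest prefix match dropped  -> c
echo "${var%;*}"    # shortest suffix match dropped -> a;b
echo "${var%%;*}"   # longest suffix match dropped  -> a
```

This is plain POSIX parameter expansion, so it behaves the same in dash, ksh and busybox ash.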
Score_Under , Apr 28, 2015 at 16:58
The # , ## , % , and %% substitutions have what is IMO an easier explanation to remember (for how much they delete): # and % delete the shortest possible matching string, and ## and %% delete the longest possible. – Score_Under Apr 28 '15 at 16:58sorontar , Oct 26, 2016 at 4:36
The IFS=\; read -a fields <<<"$var" approach fails on newlines and adds a trailing newline. The other solution removes a trailing empty field. – sorontar Oct 26 '16 at 4:36Eric Chen , Aug 30, 2017 at 17:50
The shell delimiter is the most elegant answer, period. – Eric Chen Aug 30 '17 at 17:50sancho.s , Oct 4 at 3:42
Could the last alternative be used with a list of field separators set somewhere else? For instance, I mean to use this as a shell script, and pass a list of field separators as a positional parameter. – sancho.s Oct 4 at 3:42F. Hauri , Oct 4 at 7:47
Yes, in a loop:for sep in "#" "ł" "@" ; do ... var="${var#*$sep}" ...
– F. Hauri Oct 4 at 7:47DougW , Apr 27, 2015 at 18:20
I've seen a couple of answers referencing the cut command, but they've all been deleted. It's a little odd that nobody has elaborated on that, because I think it's one of the more useful commands for doing this type of thing, especially for parsing delimited log files. In the case of splitting this specific example into a bash script array, tr is probably more efficient, but cut can be used, and is more effective if you want to pull specific fields from the middle.
Example:
$ echo "[email protected];[email protected]" | cut -d ";" -f 1 [email protected] $ echo "[email protected];[email protected]" | cut -d ";" -f 2 [email protected]You can obviously put that into a loop, and iterate the -f parameter to pull each field independently.
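Such a loop can be sketched as follows, assuming the field count is derived by counting delimiters (the addresses are hypothetical):

```shell
# Iterate cut's -f parameter to collect each ';'-separated field.
IN="alice@example.com;bob@example.com"

n=$(($(printf '%s' "$IN" | tr -cd ';' | wc -c) + 1))  # delimiters + 1 = fields
fields=()
for i in $(seq 1 "$n"); do
    fields+=("$(printf '%s\n' "$IN" | cut -d ';' -f "$i")")
done
printf '> [%s]\n' "${fields[@]}"
```

Note that this runs cut once per field, which is one reason the comments call the approach inefficient for large inputs.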
This gets more useful when you have a delimited log file with rows like this:
2015-04-27|12345|some action|an attribute|meta data
cut is very handy to be able to cat this file and select a particular field for further processing.
MisterMiyagi , Nov 2, 2016 at 8:42
Kudos for using cut , it's the right tool for the job! Much cleaner than any of those shell hacks. – MisterMiyagi Nov 2 '16 at 8:42uli42 , Sep 14, 2017 at 8:30
This approach will only work if you know the number of elements in advance; you'd need to program some more logic around it. It also runs an external tool for every element. – uli42 Sep 14 '17 at 8:30Louis Loudog Trottier , May 10 at 4:20
Exactly what I was looking for, trying to avoid empty strings in a csv. Now I can point at the exact 'column' value as well. Works with IFS already used in a loop. Better than expected for my situation. – Louis Loudog Trottier May 10 at 4:20, May 28, 2009 at 10:31
How about this approach:IN="[email protected];[email protected]" set -- "$IN" IFS=";"; declare -a Array=($*) echo "${Array[@]}" echo "${Array[0]}" echo "${Array[1]}"Yzmir Ramirez , Sep 5, 2011 at 1:06
+1 ... but I wouldn't name the variable "Array" ... pet peev I guess. Good solution. – Yzmir Ramirez Sep 5 '11 at 1:06ata , Nov 3, 2011 at 22:33
+1 ... but the "set" and declare -a are unnecessary. You could as well have used just IFS=";" && Array=($IN) – ata Nov 3 '11 at 22:33Luca Borrione , Sep 3, 2012 at 9:26
+1 Only a side note: shouldn't it be recommendable to keep the old IFS and then restore it? (as shown by stefanB in his edit3) people landing here (sometimes just copying and pasting a solution) might not think about this – Luca Borrione Sep 3 '12 at 9:26Charles Duffy , Jul 6, 2013 at 14:44
-1: First, @ata is right that most of the commands in this do nothing. Second, it uses word-splitting to form the array, and doesn't do anything to inhibit glob-expansion when doing so (so if you have glob characters in any of the array elements, those elements are replaced with matching filenames). – Charles Duffy Jul 6 '13 at 14:44John_West , Jan 8, 2016 at 12:29
Suggest to use $'...' : IN=$'[email protected];[email protected];bet <d@\ns* kl.com>' . Then echo "${Array[2]}" will print a string with newline. set -- "$IN" is also necessary in this case. Yes, to prevent glob expansion, the solution should include set -f . – John_West Jan 8 '16 at 12:29Steven Lizarazo , Aug 11, 2016 at 20:45
This worked for me:string="1;2" echo $string | cut -d';' -f1 # output is 1 echo $string | cut -d';' -f2 # output is 2Pardeep Sharma , Oct 10, 2017 at 7:29
this is short and sweet :) – Pardeep Sharma Oct 10 '17 at 7:29space earth , Oct 17, 2017 at 7:23
Thanks...Helped a lot – space earth Oct 17 '17 at 7:23mojjj , Jan 8 at 8:57
cut works only with a single char as delimiter. – mojjj Jan 8 at 8:57lothar , May 28, 2009 at 2:12
echo "[email protected];[email protected]" | sed -e 's/;/\n/g' [email protected] [email protected]Luca Borrione , Sep 3, 2012 at 10:08
-1 what if the string contains spaces? for exampleIN="this is first line; this is second line" arrIN=( $( echo "$IN" | sed -e 's/;/\n/g' ) )
will produce an array of 8 elements in this case (an element for each word space separated), rather than 2 (an element for each line semi colon separated) – Luca Borrione Sep 3 '12 at 10:08lothar , Sep 3, 2012 at 17:33
@Luca No the sed script creates exactly two lines. What creates the multiple entries for you is when you put it into a bash array (which splits on white space by default) – lothar Sep 3 '12 at 17:33Luca Borrione , Sep 4, 2012 at 7:09
That's exactly the point: the OP needs to store entries into an array to loop over it, as you can see in his edits. I think your (good) answer missed to mention to usearrIN=( $( echo "$IN" | sed -e 's/;/\n/g' ) )
to achieve that, and to advice to change IFS toIFS=$'\n'
for those who land here in the future and needs to split a string containing spaces. (and to restore it back afterwards). :) – Luca Borrione Sep 4 '12 at 7:09lothar , Sep 4, 2012 at 16:55
@Luca Good point. However the array assignment was not in the initial question when I wrote up that answer. – lothar Sep 4 '12 at 16:55Ashok , Sep 8, 2012 at 5:01
This also works:IN="[email protected];[email protected]" echo ADD1=`echo $IN | cut -d \; -f 1` echo ADD2=`echo $IN | cut -d \; -f 2`Be careful, this solution is not always correct. In case you pass "[email protected]" only, it will assign it to both ADD1 and ADD2.
fersarr , Mar 3, 2016 at 17:17
You can use -s to avoid the mentioned problem: superuser.com/questions/896800/ "-f, --fields=LIST select only these fields; also print any line that contains no delimiter character, unless the -s option is specified" – fersarr Mar 3 '16 at 17:17Tony , Jan 14, 2013 at 6:33
I think AWK is the best and most efficient command to resolve your problem. AWK is available by default in almost every Linux distribution.
echo "[email protected];[email protected]" | awk -F';' '{print $1,$2}'
will give
[email protected] [email protected]
Of course you can store each email address by redefining the awk print field.
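For instance, each address can be printed on its own line so the shell can read them back; a minimal sketch with hypothetical addresses:

```shell
# awk -F';' splits the record; printing one field per line makes the
# result easy to consume with read.
out=$(echo "alice@example.com;bob@example.com" | awk -F';' '{print $1; print $2}')

# Read the two lines back into shell variables.
{ read -r addr1; read -r addr2; } <<< "$out"
echo "$addr1 / $addr2"
```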
Jaro , Jan 7, 2014 at 21:30
Or even simpler: echo "[email protected];[email protected]" | awk 'BEGIN{RS=";"} {print}' – Jaro Jan 7 '14 at 21:30Aquarelle , May 6, 2014 at 21:58
@Jaro This worked perfectly for me when I had a string with commas and needed to reformat it into lines. Thanks. – Aquarelle May 6 '14 at 21:58Eduardo Lucio , Aug 5, 2015 at 12:59
It worked in this scenario -> "echo "$SPLIT_0" | awk -F' inode=' '{print $1}'"! I had problems when trying to use strings (" inode=") instead of characters (";"). $1, $2, $3, $4 are set as positions in an array! If there is a way of setting an array... better! Thanks! – Eduardo Lucio Aug 5 '15 at 12:59Tony , Aug 6, 2015 at 2:42
@EduardoLucio, what I'm thinking about is maybe you can first replace your delimiter inode= with ; , for example by sed -i 's/inode\=/\;/g' your_file_to_process , then define -F';' when applying awk , hope that can help you. – Tony Aug 6 '15 at 2:42nickjb , Jul 5, 2011 at 13:41
A different take on Darron's answer , this is how I do it:IN="[email protected];[email protected]" read ADDR1 ADDR2 <<<$(IFS=";"; echo $IN)ColinM , Sep 10, 2011 at 0:31
This doesn't work. – ColinM Sep 10 '11 at 0:31nickjb , Oct 6, 2011 at 15:33
I think it does! Run the commands above and then "echo $ADDR1 ... $ADDR2" and i get "[email protected] ... [email protected]" output – nickjb Oct 6 '11 at 15:33Nick , Oct 28, 2011 at 14:36
This worked REALLY well for me... I used it to itterate over an array of strings which contained comma separated DB,SERVER,PORT data to use mysqldump. – Nick Oct 28 '11 at 14:36dubiousjim , May 31, 2012 at 5:28
Diagnosis: the IFS=";" assignment exists only in the $(...; echo $IN) subshell; this is why some readers (including me) initially think it won't work. I assumed that all of $IN was getting slurped up by ADDR1. But nickjb is correct; it does work. The reason is that the echo $IN command parses its arguments using the current value of $IFS, but then echoes them to stdout using a space delimiter, regardless of the setting of $IFS. So the net effect is as though one had called read ADDR1 ADDR2 <<< "[email protected] [email protected]" (note the input is space-separated not ;-separated). – dubiousjim May 31 '12 at 5:28sorontar , Oct 26, 2016 at 4:43
This fails on spaces and newlines, and also expands wildcards * in the echo $IN with an unquoted variable expansion. – sorontar Oct 26 '16 at 4:43gniourf_gniourf , Jun 26, 2014 at 9:11
In Bash, a bulletproof way that will work even if your variable contains newlines:
IFS=';' read -d '' -ra array < <(printf '%s;\0' "$in")
Look:
$ in=$'one;two three;*;there is\na newline\nin this field'
$ IFS=';' read -d '' -ra array < <(printf '%s;\0' "$in")
$ declare -p array
declare -a array='([0]="one" [1]="two three" [2]="*" [3]="there is a newline in this field")'
The trick for this to work is to use the -d option of read (delimiter) with an empty delimiter, so that read is forced to read everything it's fed. And we feed read with exactly the content of the variable in , with no trailing newline thanks to printf . Note that we're also putting the delimiter in printf to ensure that the string passed to read has a trailing delimiter. Without it, read would trim potential trailing empty fields:
$ in='one;two;three;' # there's an empty field
$ IFS=';' read -d '' -ra array < <(printf '%s;\0' "$in")
$ declare -p array
declare -a array='([0]="one" [1]="two" [2]="three" [3]="")'
the trailing empty field is preserved.
Update for Bash ≥ 4.4
Since Bash 4.4, the builtin mapfile (aka readarray ) supports the -d option to specify a delimiter. Hence another canonical way is:
mapfile -d ';' -t array < <(printf '%s;' "$in")
John_West , Jan 8, 2016 at 12:10
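A sketch of that mapfile variant in full (requires bash >= 4.4; the sample data is hypothetical):

```shell
# mapfile -d ';' reads up to each ';'; -t strips the delimiter from
# each element. printf appends one trailing ';' so the last field is
# terminated too.
in="alice@example.com;bob@example.com;Full Name <fn@example.com>"

mapfile -d ';' -t array < <(printf '%s;' "$in")
printf '> [%s]\n' "${array[@]}"
```

Like the read -d '' approach, this never touches IFS and survives spaces and glob characters inside fields.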
I found it to be the rare solution on that list that works correctly with \n , spaces and * simultaneously. Also, no loops; the array variable is accessible in the shell after execution (contrary to the highest upvoted answer). Note in=$'...' ; it does not work with double quotes. I think it needs more upvotes. – John_West Jan 8 '16 at 12:10Darron , Sep 13, 2010 at 20:10
How about this one liner, if you're not using arrays:IFS=';' read ADDR1 ADDR2 <<<$INdubiousjim , May 31, 2012 at 5:36
Consider usingread -r ...
to ensure that, for example, the two characters "\t" in the input end up as the same two characters in your variables (instead of a single tab char). – dubiousjim May 31 '12 at 5:36Luca Borrione , Sep 3, 2012 at 10:07
-1 This is not working here (ubuntu 12.04). Addingecho "ADDR1 $ADDR1"\n echo "ADDR2 $ADDR2"
to your snippet will outputADDR1 [email protected] [email protected]\nADDR2
(\n is newline) – Luca Borrione Sep 3 '12 at 10:07chepner , Sep 19, 2015 at 13:59
This is probably due to a bug involvingIFS
and here strings that was fixed inbash
4.3. Quoting$IN
should fix it. (In theory,$IN
is not subject to word splitting or globbing after it expands, meaning the quotes should be unnecessary. Even in 4.3, though, there's at least one bug remaining--reported and scheduled to be fixed--so quoting remains a good idea.) – chepner Sep 19 '15 at 13:59sorontar , Oct 26, 2016 at 4:55
This breaks if $in contain newlines even if $IN is quoted. And adds a trailing newline. – sorontar Oct 26 '16 at 4:55kenorb , Sep 11, 2015 at 20:54
Here is a clean 3-liner:
in="foo@bar;bizz@buzz;fizz@buzz;buzz@woof"
IFS=';' list=($in)
for item in "${list[@]}"; do echo $item; done
where IFS delimits words based on the separator and () is used to create an array . Then [@] is used to return each item as a separate word. If you have any code after that, you also need to restore $IFS , e.g. unset IFS .
The use of$in
unquoted allows wildcards to be expanded. – sorontar Oct 26 '16 at 5:03user2720864 , Sep 24 at 13:46
+ for the unset command – user2720864 Sep 24 at 13:46Emilien Brigand , Aug 1, 2016 at 13:15
Without setting the IFS
If you just have one colon you can do this:
a="foo:bar"
b=${a%:*}
c=${a##*:}
and you will get:
b = foo
c = bar
Victor Choy , Sep 16, 2015 at 3:34
There is a simple and smart way like this:
echo "add:sfff" | xargs -d: -i echo {}
But you must use GNU xargs; BSD xargs doesn't support -d delim. If you use an Apple Mac like me, you can install GNU xargs:
brew install findutils
then
echo "add:sfff" | gxargs -d: -i echo {}
The following Bash/zsh function splits its first argument on the delimiter given by the second argument:split() { local string="$1" local delimiter="$2" if [ -n "$string" ]; then local part while read -d "$delimiter" part; do echo $part done <<< "$string" echo $part fi }For instance, the command
$ split 'a;b;c' ';'yields
a b cThis output may, for instance, be piped to other commands. Example:
$ split 'a;b;c' ';' | cat -n 1 a 2 b 3 cCompared to the other solutions given, this one has the following advantages:
- IFS is not overridden: due to dynamic scoping of even local variables, overriding IFS over a loop causes the new value to leak into function calls performed from within the loop.
- Arrays are not used: reading a string into an array using read requires the flag -a in Bash and -A in zsh.
If desired, the function may be put into a script as follows:
#!/usr/bin/env bash split() { # ... } split "$@"sandeepkunkunuru , Oct 23, 2017 at 16:10
works and neatly modularized. – sandeepkunkunuru Oct 23 '17 at 16:10Prospero , Sep 25, 2011 at 1:09
This is the simplest way to do it.spo='one;two;three' OIFS=$IFS IFS=';' spo_array=($spo) IFS=$OIFS echo ${spo_array[*]}rashok , Oct 25, 2016 at 12:41
IN="[email protected];[email protected]" IFS=';' read -a IN_arr <<< "${IN}" for entry in "${IN_arr[@]}" do echo $entry doneOutput
[email protected] [email protected]System : Ubuntu 12.04.1
codeforester , Jan 2, 2017 at 5:37
IFS is not getting set in the specific context of read here and hence it can upset the rest of the code, if any. – codeforester Jan 2 '17 at 5:37shuaihanhungry , Jan 20 at 15:54
you can apply awk to many situationsecho "[email protected];[email protected]"|awk -F';' '{printf "%s\n%s\n", $1, $2}'also you can use this
echo "[email protected];[email protected]"|awk -F';' '{print $1,$2}' OFS="\n"ghost , Apr 24, 2013 at 13:13
If there are no spaces, why not this?
IN="[email protected];[email protected]"
arr=(`echo $IN | tr ';' ' '`)
echo ${arr[0]}
echo ${arr[1]}
There are some cool answers here (errator esp.), but for something analogous to split in other languages -- which is what I took the original question to mean -- I settled on this:IN="[email protected];[email protected]" declare -a a="(${IN/;/ })";Now
${a[0]}
,${a[1]}
, etc, are as you would expect. Use${#a[*]}
for number of terms. Or to iterate, of course:for i in ${a[*]}; do echo $i; doneIMPORTANT NOTE:
This works in cases where there are no spaces to worry about, which solved my problem, but may not solve yours. Go with the
$IFS
solution(s) in that case.olibre , Oct 7, 2013 at 13:33
Does not work whenIN
contains more than two e-mail addresses. Please refer to same idea (but fixed) at palindrom's answer – olibre Oct 7 '13 at 13:33sorontar , Oct 26, 2016 at 5:14
Better use${IN//;/ }
(double slash) to make it also work with more than two values. Beware that any wildcard (*?[
) will be expanded. And a trailing empty field will be discarded. – sorontar Oct 26 '16 at 5:14jeberle , Apr 30, 2013 at 3:10
Use theset
built-in to load up the$@
array:IN="[email protected];[email protected]" IFS=';'; set $IN; IFS=$' \t\n'Then, let the party begin:
echo $# for a; do echo $a; done ADDR1=$1 ADDR2=$2sorontar , Oct 26, 2016 at 5:17
Better useset -- $IN
to avoid some issues with "$IN" starting with dash. Still, the unquoted expansion of$IN
will expand wildcards (*?[
). – sorontar Oct 26 '16 at 5:17NevilleDNZ , Sep 2, 2013 at 6:30
Two bourne-ish alternatives, neither of which requires bash arrays:
Case 1 : Keep it nice and simple: Use a NewLine as the Record-Separator... eg.
IN="[email protected] [email protected]" while read i; do # process "$i" ... eg. echo "[email:$i]" done <<< "$IN"Note: in this first case no sub-process is forked to assist with list manipulation.
Idea: Maybe it is worth using NL extensively internally , and only converting to a different RS when generating the final result externally .
Case 2 : Using a ";" as a record separator... eg.
NL="
"
IRS=";" ORS=";"
conv_IRS() {
  exec tr "$1" "$NL"
}
conv_ORS() {
  exec tr "$NL" "$1"
}
IN="[email protected];[email protected]"
IN="$(conv_IRS ";" <<< "$IN")"
while read i; do
  # process "$i" ... eg.
  echo -n "[email:$i]$ORS"
done <<< "$IN"
In both cases a sub-list can be composed within the loop and is persistent after the loop has completed. This is useful when manipulating lists in memory, instead of storing lists in files. {p.s. keep calm and carry on B-) }
fedorqui , Jan 8, 2015 at 10:21
Apart from the fantastic answers that were already provided, if it is just a matter of printing out the data you may consider using awk :
awk -F";" '{for (i=1;i<=NF;i++) printf("> [%s]\n", $i)}' <<< "$IN"
This sets the field separator to ; , so that it can loop through the fields with a for loop and print accordingly. Test:
$ IN="[email protected];[email protected]"
$ awk -F";" '{for (i=1;i<=NF;i++) printf("> [%s]\n", $i)}' <<< "$IN"
> [[email protected]]
> [[email protected]]
$ awk -F";" '{for (i=1;i<=NF;i++) printf("> [%s]\n", $i)}' <<< "a;b;c d;e_;f" > [a] > [b] > [c d] > [e_] > [f]18446744073709551615 , Feb 20, 2015 at 10:49
In Android shell, most of the proposed methods just do not work:$ IFS=':' read -ra ADDR <<<"$PATH" /system/bin/sh: can't create temporary file /sqlite_stmt_journals/mksh.EbNoR10629: No such file or directoryWhat does work is:
$ for i in ${PATH//:/ }; do echo $i; done /sbin /vendor/bin /system/sbin /system/bin /system/xbinwhere
//
means global replacement.sorontar , Oct 26, 2016 at 5:08
Fails if any part of $PATH contains spaces (or newlines). Also expands wildcards (asterisk *, question mark ? and braces [ ]). – sorontar Oct 26 '16 at 5:08Eduardo Lucio , Apr 4, 2016 at 19:54
Okay guys!Here's my answer!
DELIMITER_VAL='=' read -d '' F_ABOUT_DISTRO_R <<"EOF" DISTRIB_ID=Ubuntu DISTRIB_RELEASE=14.04 DISTRIB_CODENAME=trusty DISTRIB_DESCRIPTION="Ubuntu 14.04.4 LTS" NAME="Ubuntu" VERSION="14.04.4 LTS, Trusty Tahr" ID=ubuntu ID_LIKE=debian PRETTY_NAME="Ubuntu 14.04.4 LTS" VERSION_ID="14.04" HOME_URL="http://www.ubuntu.com/" SUPPORT_URL="http://help.ubuntu.com/" BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/" EOF SPLIT_NOW=$(awk -F$DELIMITER_VAL '{for(i=1;i<=NF;i++){printf "%s\n", $i}}' <<<"${F_ABOUT_DISTRO_R}") while read -r line; do SPLIT+=("$line") done <<< "$SPLIT_NOW" for i in "${SPLIT[@]}"; do echo "$i" doneWhy this approach is "the best" for me?
Because of two reasons:
- You do not need to escape the delimiter;
- You will not have problem with blank spaces . The value will be properly separated in the array!
[]'s
gniourf_gniourf , Jan 30, 2017 at 8:26
FYI, /etc/os-release and /etc/lsb-release are meant to be sourced, and not parsed. So your method is really wrong. Moreover, you're not quite answering the question about splitting a string on a delimiter. – gniourf_gniourf Jan 30 '17 at 8:26Michael Hale , Jun 14, 2012 at 17:38
A one-liner to split a string separated by ';' into an array is:IN="[email protected];[email protected]" ADDRS=( $(IFS=";" echo "$IN") ) echo ${ADDRS[0]} echo ${ADDRS[1]}This only sets IFS in a subshell, so you don't have to worry about saving and restoring its value.
Luca Borrione , Sep 3, 2012 at 10:04
-1 this doesn't work here (ubuntu 12.04). it prints only the first echo with all $IN value in it, while the second is empty. you can see it if you put echo "0: "${ADDRS[0]}\n echo "1: "${ADDRS[1]} the output is0: [email protected];[email protected]\n 1:
(\n is new line) – Luca Borrione Sep 3 '12 at 10:04Luca Borrione , Sep 3, 2012 at 10:05
please refer to nickjb's answer at for a working alternative to this idea stackoverflow.com/a/6583589/1032370 – Luca Borrione Sep 3 '12 at 10:05Score_Under , Apr 28, 2015 at 17:09
-1, 1. IFS isn't being set in that subshell (it's being passed to the environment of "echo", which is a builtin, so nothing is happening anyway). 2.$IN
is quoted so it isn't subject to IFS splitting. 3. The process substitution is split by whitespace, but this may corrupt the original data. – Score_Under Apr 28 '15 at 17:09ajaaskel , Oct 10, 2014 at 11:33
IN='[email protected];[email protected];Charlie Brown <[email protected];!"#$%&/()[]{}*? are no problem;simple is beautiful :-)' set -f oldifs="$IFS" IFS=';'; arrayIN=($IN) IFS="$oldifs" for i in "${arrayIN[@]}"; do echo "$i" done set +fOutput:
[email protected] [email protected] Charlie Brown <[email protected] !"#$%&/()[]{}*? are no problem simple is beautiful :-)Explanation: Simple assignment using parenthesis () converts semicolon separated list into an array provided you have correct IFS while doing that. Standard FOR loop handles individual items in that array as usual. Notice that the list given for IN variable must be "hard" quoted, that is, with single ticks.
IFS must be saved and restored since Bash does not treat an assignment the same way as a command. An alternate workaround is to wrap the assignment inside a function and call that function with a modified IFS. In that case separate saving/restoring of IFS is not needed. Thanks to "Bize" for pointing that out.
gniourf_gniourf , Feb 20, 2015 at 16:45
!"#$%&/()[]{}*? are no problem -- well... not quite: []*? are glob characters. So what about creating this directory and file: mkdir '!"#$%&'; touch '!"#$%&/()[]{} got you hahahaha - are no problem' and running your command? simple may be beautiful, but when it's broken, it's broken. – gniourf_gniourf Feb 20 '15 at 16:45
@gniourf_gniourf The string is stored in a variable. Please see the original question. – ajaaskel Feb 25 '15 at 7:20gniourf_gniourf , Feb 25, 2015 at 7:26
@ajaaskel you didn't fully understand my comment. Go in a scratch directory and issue these commands:mkdir '!"#$%&'; touch '!"#$%&/()[]{} got you hahahaha - are no problem'
. They will only create a directory and a file, with weird looking names, I must admit. Then run your commands with the exactIN
you gave:IN='[email protected];[email protected];Charlie Brown <[email protected];!"#$%&/()[]{}*? are no problem;simple is beautiful :-)'
. You'll see that you won't get the output you expect. Because you're using a method subject to pathname expansions to split your string. – gniourf_gniourf Feb 25 '15 at 7:26gniourf_gniourf , Feb 25, 2015 at 7:29
This is to demonstrate that the characters*
,?
,[...]
and even, ifextglob
is set,!(...)
,@(...)
,?(...)
,+(...)
are problems with this method! – gniourf_gniourf Feb 25 '15 at 7:29ajaaskel , Feb 26, 2015 at 15:26
@gniourf_gniourf Thanks for detailed comments on globbing. I adjusted the code to have globbing off. My point was however just to show that rather simple assignment can do the splitting job. – ajaaskel Feb 26 '15 at 15:26> , Dec 19, 2013 at 21:39
Maybe not the most elegant solution, but it works with * and spaces:
IN="bla@so me.com;*;[email protected]"
for i in `delims=${IN//[^;]}; seq 1 $((${#delims} + 1))`
do
   echo "> [`echo $IN | cut -d';' -f$i`]"
done
Outputs
> [bla@so me.com] > [*] > [[email protected]]Other example (delimiters at beginning and end):
IN=";bla@so me.com;*;[email protected];"
> []
> [bla@so me.com]
> [*]
> [[email protected]]
> []
Basically it removes every character other than ; , making delims , e.g. ;;; . Then it does a for loop from 1 to number-of-delimiters as counted by ${#delims} . The final step is to safely get the $i th part using cut .
Oct 20, 2017 | stackoverflow.com
Amit , Jun 7, 2011 at 19:18
I have a couple of variables and I want to check the following condition (written out in words, then my failed attempt at bash scripting):if varA EQUALS 1 AND ( varB EQUALS "t1" OR varB EQUALS "t2" ) then do something done.And in my failed attempt, I came up with:
if (($varA == 1)) && ( (($varB == "t1")) || (($varC == "t2")) ); then
    scale=0.05
fi
Best answer Gilles
What you've written actually almost works (it would work if all the variables were numbers), but it's not an idiomatic way at all.
( )
parentheses indicate a subshell . What's inside them isn't an expression like in many other languages. It's a list of commands (just like outside parentheses). These commands are executed in a separate subprocess, so any redirection, assignment, etc. performed inside the parentheses has no effect outside the parentheses.
- With a leading dollar sign,
$( )
is a command substitution : there is a command inside the parentheses, and the output from the command is used as part of the command line (after extra expansions unless the substitution is between double quotes, but that's another story ).
{ }
braces are like parentheses in that they group commands, but they do not create a subshell; they only influence parsing. The program x=2; { x=4; }; echo $x prints 4, whereas x=2; (x=4); echo $x prints 2. (Also braces require spaces around them and a semicolon before closing, whereas parentheses don't. That's just a syntax quirk.)
- With a leading dollar sign,
${VAR}
is a parameter expansion , expanding to the value of a variable, with possible extra transformations.
(( ))
double parentheses surround an arithmetic instruction , that is, a computation on integers, with a syntax resembling other programming languages. This syntax is mostly used for assignments and in conditionals.
- The same syntax is used in arithmetic expressions
$(( ))
, which expand to the integer value of the expression.
[[ ]]
double brackets surround conditional expressions . Conditional expressions are mostly built on operators such as -n $variable to test if a variable is non-empty and -e $file to test if a file exists. There are also string equality operators: "$string1" = "$string2"
(beware that the right-hand side is a pattern, e.g. [[ $foo = a* ]] tests if $foo starts with a, while [[ $foo = "a*" ]] tests if $foo is exactly a*), and the familiar !, && and || operators for negation, conjunction and disjunction, as well as parentheses for grouping.
- Note that you need a space around each operator (e.g. [[ "$x" = "$y" ]], not [[ "$x"="$y" ]]), and a space or a character like ; both inside and outside the brackets (e.g. [[ -n $foo ]], not [[-n $foo]]).
[ ]
single brackets are an alternate form of conditional expressions with more quirks (but older and more portable). Don't write any for now; start worrying about them when you find scripts that contain them.

This is the idiomatic way to write your test in bash:

if [[ $varA = 1 && ($varB = "t1" || $varC = "t2") ]]; then

If you need portability to other shells, this would be the way (note the additional quoting and the separate sets of brackets around each individual test):

if [ "$varA" = 1 ] && { [ "$varB" = "t1" ] || [ "$varC" = "t2" ]; }; then

Will Sheppard, Jun 19, 2014 at 11:07
It's better to use == to differentiate the comparison from assigning a variable (which is also =) – Will Sheppard Jun 19 '14 at 11:07

Cbhihe, Apr 3, 2016 at 8:05
+1 @WillSheppard for yr reminder of proper style. Gilles, don't you need a semicolon after yr closing curly bracket and before "then"? I always thought if, then, else and fi could not be on the same line... As in:

if [ "$varA" = 1 ] && { [ "$varB" = "t1" ] || [ "$varC" = "t2" ]; }; then
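Both forms can be checked quickly with a throwaway script (variable values chosen arbitrarily for illustration):

```shell
varA=1 varB=t1 varC=other

# idiomatic bash form
if [[ $varA = 1 && ($varB = "t1" || $varC = "t2") ]]; then
    scale=0.05
fi

# portable form, same result
if [ "$varA" = 1 ] && { [ "$varB" = "t1" ] || [ "$varC" = "t2" ]; }; then
    scale2=0.05
fi

echo "$scale $scale2"    # 0.05 0.05 with these values
```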
Rockallite , Jan 19 at 2:41
Backquotes (` `) are the old-style form of command substitution, with some differences: in this form, backslash retains its literal meaning except when followed by $, `, or \, and the first backquote not preceded by a backslash terminates the command substitution; whereas in the $( ) form, all characters between the parentheses make up the command, none are treated specially.

Peter A. Schneider, Aug 28 at 13:16
You could emphasize that single brackets have completely different semantics inside and outside of double brackets. (Because you start with explicitly pointing out the subshell semantics but then only as an aside mention the grouping semantics as part of conditional expressions. Was confusing to me for a second when I looked at your idiomatic example.) – Peter A. Schneider Aug 28 at 13:16matchew , Jun 7, 2011 at 19:29
very close:

if (( $varA == 1 )) && [[ $varB == 't1' || $varC == 't2' ]]; then scale=0.05; fi

should work.

breaking it down:

(( $varA == 1 ))

is an integer comparison, whereas

$varB == 't1'

is a string comparison. Otherwise, I am just grouping the comparisons correctly.
Double square brackets delimit a Conditional Expression. And, I find the following to be a good reading on the subject: "(IBM) Demystify test, [, [[, ((, and if-then-else"
Peter A. Schneider , Aug 28 at 13:21
Just to be sure: The quoting in 't1' is unnecessary, right? Because as opposed to arithmetic instructions in double parentheses, where t1 would be a variable, t1 in a conditional expression in double brackets is just a literal string. I.e., [[ $varB == 't1' ]] is exactly the same as [[ $varB == t1 ]], right? – Peter A. Schneider Aug 28 at 13:21
Oct 20, 2017 | unix.stackexchange.com
OR in `expr match`
stracktracer , Dec 14, 2015 at 13:54
I'm confused as to why this does not match:
expr match Unauthenticated123 '^(Unauthenticated|Authenticated).*'
it outputs 0.
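For orientation, a quick sketch of what the answers below arrive at, assuming GNU expr (both `match` and `\|` are GNU extensions):

```shell
# number of characters matched (no \( \) group):
expr match Unauthenticated123 'Unauthenticated\|Authenticated'       # prints 15

# the matched part of the string (\( \) group present):
expr match Unauthenticated123 '\(Unauthenticated\|Authenticated\)'   # prints Unauthenticated
```

The unescaped `|` in the question is treated as a literal character in expr's basic regular expressions, which is why the original attempt printed 0.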
Charles Duffy , Dec 14, 2015 at 18:22
As an aside, if you were using bash for this, the preferred alternative would be the =~ operator in [[ ]], i.e. [[ Unauthenticated123 =~ ^(Unauthenticated|Authenticated) ]] – Charles Duffy Dec 14 '15 at 18:22

Charles Duffy, Dec 14, 2015 at 18:25

...and if you weren't targeting a known/fixed operating system, using case rather than a regex match is very much the better practice, since the accepted answer depends on behavior POSIX doesn't define. – Charles Duffy Dec 14 '15 at 18:25

Gilles, Dec 14, 2015 at 23:43

See Why does my regular expression work in X but not in Y? – Gilles Dec 14 '15 at 23:43

Lambert, Dec 14, 2015 at 14:04
Your command should be:

expr match Unauthenticated123 'Unauthenticated\|Authenticated'

if you want the number of characters matched.

To have the part of the string (Unauthenticated) returned use:

expr match Unauthenticated123 '\(Unauthenticated\|Authenticated\)'

From info coreutils 'expr invocation':

STRING : REGEX
    Perform pattern matching. The arguments are converted to strings and the second is considered to be a (basic, a la GNU grep) regular expression, with a `^' implicitly prepended. The first argument is then matched against this regular expression.

    If the match succeeds and REGEX uses `\(' and `\)', the `:' expression returns the part of STRING that matched the subexpression; otherwise, it returns the number of characters matched. If the match fails, the `:' operator returns the null string if `\(' and `\)' are used in REGEX, otherwise 0. Only the first `\( ... \)' pair is relevant to the return value; additional pairs are meaningful only for grouping the regular expression operators.

    In the regular expression, `\+', `\?', and `\|' are operators which respectively match one or more, zero or one, or separate alternatives. SunOS and other `expr's treat these as regular characters. (POSIX allows either behavior.) *Note Regular Expression Library: (regex)Top, for details of regular expression syntax. Some examples are in *note Examples of expr::.

stracktracer, Dec 14, 2015 at 14:18
Thanks escaping the | worked. Weird, normally I'd expect it if I wanted to match the literal |... – stracktracer Dec 14 '15 at 14:18reinierpost , Dec 14, 2015 at 15:34
Regular expression syntax, including the use of backquoting, is different for different tools. Always look it up. – reinierpost Dec 14 '15 at 15:34Stéphane Chazelas , Dec 14, 2015 at 14:49
Note that both match and \| are GNU extensions (and the behaviour for : (the match standard equivalent) when the pattern starts with ^ varies with implementations). Standardly, you'd do:

expr " $string" : " Authenticated" '|' " $string" : " Unauthenticated"

The leading space is to avoid problems with values of $string that start with - or are expr operators, but that means it adds one to the number of characters being matched.

With GNU expr, you'd write it:

expr + "$string" : 'Authenticated\|Unauthenticated'

The + forces $string to be taken as a string even if it happens to be an expr operator. expr regular expressions are basic regular expressions, which don't have an alternation operator (and where | is not special). The GNU implementation has it as \| though, as an extension.

If all you want is to check whether $string starts with Authenticated or Unauthenticated, you'd better use:

case $string in
    (Authenticated* | Unauthenticated*) do-something
esac

netmonk, Dec 14, 2015 at 14:06
$ expr match "Unauthenticated123" '^\(Unauthenticated\|Authenticated\).*'

You have to escape the parentheses and the pipe with \.

mikeserv, Dec 14, 2015 at 14:18

and the ^ may not mean what some would think depending on the expr. It is implied anyway. – mikeserv Dec 14 '15 at 14:18

Stéphane Chazelas, Dec 14, 2015 at 14:34
@mikeserv, match and \| are GNU extensions anyway. This Q&A seems to be about GNU expr anyway (where ^ is guaranteed to mean match at the beginning of the string). – Stéphane Chazelas Dec 14 '15 at 14:34

mikeserv, Dec 14, 2015 at 14:49
@StéphaneChazelas - I didn't know they were strictly GNU. I think I remember them being explicitly officially unspecified - but I don't use expr too often anyway and didn't know that. Thank you. – mikeserv Dec 14 '15 at 14:49

Random832, Dec 14, 2015 at 16:13
It's not "strictly GNU" - it's present in a number of historical implementations (even System V had it, undocumented, though it didn't have the others like substr/length/index), which is why it's explicitly unspecified. I can't find anything about \| being an extension. – Random832 Dec 14 '15 at 16:13
Feb 15, 2010 | stackoverflow.com
assassin , Feb 15, 2010 at 7:02
Is there a way in bash to convert a string into a lower case string?

For example, if I have:

a="Hi all"

I want to convert it to:

"hi all"

ghostdog74, Feb 15, 2010 at 7:43

There are various ways:

tr
$ echo "$a" | tr '[:upper:]' '[:lower:]'
hi all

AWK
$ echo "$a" | awk '{print tolower($0)}'
hi all

Bash 4.0
$ echo "${a,,}"
hi all

Perl
$ echo "$a" | perl -ne 'print lc'
hi all

Bash
lc(){
    case "$1" in
        [A-Z])
            n=$(printf "%d" "'$1")
            n=$((n+32))
            printf \\$(printf "%o" "$n")
            ;;
        *)
            printf "%s" "$1"
            ;;
    esac
}
word="I Love Bash"
for((i=0;i<${#word};i++))
do
    ch="${word:$i:1}"
    lc "$ch"
done

jangosteve, Jan 14, 2012 at 21:58
Am I missing something, or does your last example (in Bash) actually do something completely different? It works for "ABX", but if you instead make word="Hi All" like the other examples, it returns ha, not hi all. It only works for the capitalized letters and skips the already-lowercased letters. – jangosteve Jan 14 '12 at 21:58

Richard Hansen, Feb 3, 2012 at 18:55
Note that only the tr and awk examples are specified in the POSIX standard. – Richard Hansen Feb 3 '12 at 18:55

Richard Hansen, Feb 3, 2012 at 18:58
tr '[:upper:]' '[:lower:]' will use the current locale to determine uppercase/lowercase equivalents, so it'll work with locales that use letters with diacritical marks. – Richard Hansen Feb 3 '12 at 18:58

Adam Parkin, Sep 25, 2012 at 18:01
How does one get the output into a new variable? Ie say I want the lowercased string into a new variable? – Adam Parkin Sep 25 '12 at 18:01Tino , Nov 14, 2012 at 15:39
@Adam: b="$(echo $a | tr '[A-Z]' '[a-z]')" – Tino Nov 14 '12 at 15:39

Dennis Williamson, Feb 15, 2010 at 10:31
In Bash 4:

To lowercase

$ string="A FEW WORDS"
$ echo "${string,}"
a FEW WORDS
$ echo "${string,,}"
a few words
$ echo "${string,,[AEIUO]}"
a FeW WoRDS
$ string="A Few Words"
$ declare -l string
$ string=$string; echo "$string"
a few words

To uppercase

$ string="a few words"
$ echo "${string^}"
A few words
$ echo "${string^^}"
A FEW WORDS
$ echo "${string^^[aeiou]}"
A fEw wOrds
$ string="A Few Words"
$ declare -u string
$ string=$string; echo "$string"
A FEW WORDS

Toggle (undocumented, but optionally configurable at compile time)

$ string="A Few Words"
$ echo "${string~~}"
a fEW wORDS
$ string="A FEW WORDS"
$ echo "${string~}"
a FEW WORDS
$ string="a few words"
$ echo "${string~}"
A few words

Capitalize (undocumented, but optionally configurable at compile time)

$ string="a few words"
$ declare -c string
$ string=$string
$ echo "$string"
A few words

Title case:

$ string="a few words"
$ string=($string)
$ string="${string[@]^}"
$ echo "$string"
A Few Words
$ declare -c string
$ string=(a few words)
$ echo "${string[@]}"
A Few Words
$ string="a FeW WOrdS"
$ string=${string,,}
$ string=${string~}
$ echo "$string"

To turn off a declare attribute, use +. For example, declare +c string. This affects subsequent assignments and not the current value.

The declare options change the attribute of the variable, but not the contents. The reassignments in my examples update the contents to show the changes.

Edit: Added "toggle first character by word" (${var~}) as suggested by ghostdog74.

Edit: Corrected tilde behavior to match Bash 4.3.
ghostdog74 , Feb 15, 2010 at 10:52
there's also ${string~} – ghostdog74 Feb 15 '10 at 10:52

Hubert Kario, Jul 12, 2012 at 16:48

Quite bizarre: the "^^" and ",," operators don't work on non-ASCII characters but "~~" does... So string="łódź"; echo ${string~~} will return "ŁÓDŹ", but echo ${string^^} returns "łóDź". Even in LC_ALL=pl_PL.utf-8. That's using bash 4.2.24. – Hubert Kario Jul 12 '12 at 16:48

Dennis Williamson, Jul 12, 2012 at 18:20
@HubertKario: That's weird. It's the same for me in Bash 4.0.33 with the same string in en_US.UTF-8. It's a bug and I've reported it. – Dennis Williamson Jul 12 '12 at 18:20

Dennis Williamson, Jul 13, 2012 at 0:44

@HubertKario: Try echo "$string" | tr '[:lower:]' '[:upper:]'. It will probably exhibit the same failure. So the problem is at least partly not Bash's. – Dennis Williamson Jul 13 '12 at 0:44

Dennis Williamson, Jul 14, 2012 at 14:27
@HubertKario: The Bash maintainer has acknowledged the bug and stated that it will be fixed in the next release. – Dennis Williamson Jul 14 '12 at 14:27shuvalov , Feb 15, 2010 at 7:13
echo "Hi All" | tr "[:upper:]" "[:lower:]"Richard Hansen , Feb 3, 2012 at 19:00
+1 for not assuming english – Richard Hansen Feb 3 '12 at 19:00Hubert Kario , Jul 12, 2012 at 16:56
@RichardHansen: tr doesn't work for me for non-ASCII characters. I do have the correct locale set and locale files generated. Any idea what I could be doing wrong? – Hubert Kario Jul 12 '12 at 16:56

wasatchwizard, Oct 23, 2014 at 16:42
FYI: This worked on Windows/Msys. Some of the other suggestions did not. – wasatchwizard Oct 23 '14 at 16:42Ignacio Vazquez-Abrams , Feb 15, 2010 at 7:03
tr:

a="$(tr [A-Z] [a-z] <<< "$a")"

AWK:

{ print tolower($0) }

sed:

y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/

Sandeepan Nath, Feb 2, 2011 at 11:12

+1 a="$(tr [A-Z] [a-z] <<< "$a")" looks easiest to me. I am still a beginner... – Sandeepan Nath Feb 2 '11 at 11:12

Haravikk, Oct 19, 2013 at 12:54
I strongly recommend the sed solution; I've been working in an environment that for some reason doesn't have tr, but I've yet to find a system without sed. Plus, a lot of the time when I want to do this I've just done something else in sed anyway, so I can chain the commands together into a single (long) statement. – Haravikk Oct 19 '13 at 12:54

Dennis, Nov 6, 2013 at 19:49
The bracket expressions should be quoted. In tr [A-Z] [a-z] A, the shell may perform filename expansion if there are filenames consisting of a single letter or nullglob is set. tr "[A-Z]" "[a-z]" A will behave properly. – Dennis Nov 6 '13 at 19:49

Haravikk, Jun 15, 2014 at 10:51
@CamiloMartin it's a BusyBox system where I'm having that problem, specifically Synology NASes, but I've encountered it on a few other systems too. I've been doing a lot of cross-platform shell scripting lately, and with the requirement that nothing extra be installed it makes things very tricky! However, I've yet to encounter a system without sed – Haravikk Jun 15 '14 at 10:51

fuz, Jan 31, 2016 at 14:54
Note that tr [A-Z] [a-z] is incorrect in almost all locales. For example, in the en-US locale, A-Z is actually the interval AaBbCcDdEeFfGgHh...XxYyZ. – fuz Jan 31 '16 at 14:54

nettux443, May 14, 2014 at 9:36
I know this is an oldish post but I made this answer for another site so I thought I'd post it up here:

UPPER -> lower: use python:

b=`echo "print '$a'.lower()" | python`

Or Ruby:

b=`echo "print '$a'.downcase" | ruby`

Or Perl (probably my favorite):

b=`perl -e "print lc('$a');"`

Or PHP:

b=`php -r "print strtolower('$a');"`

Or Awk:

b=`echo "$a" | awk '{ print tolower($1) }'`

Or Sed:

b=`echo "$a" | sed 's/./\L&/g'`

Or Bash 4:

b=${a,,}

Or NodeJS if you have it (and are a bit nuts...):

b=`echo "console.log('$a'.toLowerCase());" | node`

You could also use dd (but I wouldn't!):

b=`echo "$a" | dd conv=lcase 2> /dev/null`

lower -> UPPER: use python:

b=`echo "print '$a'.upper()" | python`

Or Ruby:

b=`echo "print '$a'.upcase" | ruby`

Or Perl (probably my favorite):

b=`perl -e "print uc('$a');"`

Or PHP:

b=`php -r "print strtoupper('$a');"`

Or Awk:

b=`echo "$a" | awk '{ print toupper($1) }'`

Or Sed:

b=`echo "$a" | sed 's/./\U&/g'`

Or Bash 4:

b=${a^^}

Or NodeJS if you have it (and are a bit nuts...):

b=`echo "console.log('$a'.toUpperCase());" | node`

You could also use dd (but I wouldn't!):

b=`echo "$a" | dd conv=ucase 2> /dev/null`

Also when you say 'shell' I'm assuming you mean bash, but if you can use zsh it's as easy as

b=$a:l

for lower case and

b=$a:u

for upper case.
JESii , May 28, 2015 at 21:42
Neither the sed command nor the bash command worked for me. – JESii May 28 '15 at 21:42

nettux443, Nov 20, 2015 at 14:33

@JESii both work for me, upper -> lower and lower -> upper. I'm using sed 4.2.2 and Bash 4.3.42(1) on 64bit Debian Stretch. – nettux443 Nov 20 '15 at 14:33

JESii, Nov 21, 2015 at 17:34

Hi, @nettux443... I just tried the bash operation again and it still fails for me with the error message "bad substitution". I'm on OSX using homebrew's bash: GNU bash, version 4.3.42(1)-release (x86_64-apple-darwin14.5.0) – JESii Nov 21 '15 at 17:34

tripleee, Jan 16, 2016 at 11:45

Do not use! All of the examples which generate a script are extremely brittle; if the value of a contains a single quote, you have not only broken behavior, but a serious security problem. – tripleee Jan 16 '16 at 11:45

Scott Smedley, Jan 27, 2011 at 5:37
In zsh:

echo $a:u

Gotta love zsh!

Scott Smedley, Jan 27, 2011 at 5:39

or $a:l for lower case conversion – Scott Smedley Jan 27 '11 at 5:39

biocyberman, Jul 24, 2015 at 23:26

Add one more case: echo ${(C)a} #Upcase the first char only – biocyberman Jul 24 '15 at 23:26

devnull, Sep 26, 2013 at 15:45
Using GNU sed:

sed 's/.*/\L&/'

Example:

$ foo="Some STRIng";
$ foo=$(echo "$foo" | sed 's/.*/\L&/')
$ echo "$foo"
some string
For a standard shell (without bashisms) using only builtins:

uppers=ABCDEFGHIJKLMNOPQRSTUVWXYZ
lowers=abcdefghijklmnopqrstuvwxyz

lc(){ #usage: lc "SOME STRING" -> "some string"
    i=0
    while ([ $i -lt ${#1} ])
    do
        CUR=${1:$i:1}
        case $uppers in
            *$CUR*) CUR=${uppers%$CUR*}; OUTPUT="${OUTPUT}${lowers:${#CUR}:1}";;
            *) OUTPUT="${OUTPUT}$CUR";;
        esac
        i=$((i+1))
    done
    echo "${OUTPUT}"
}

And for upper case:

uc(){ #usage: uc "some string" -> "SOME STRING"
    i=0
    while ([ $i -lt ${#1} ])
    do
        CUR=${1:$i:1}
        case $lowers in
            *$CUR*) CUR=${lowers%$CUR*}; OUTPUT="${OUTPUT}${uppers:${#CUR}:1}";;
            *) OUTPUT="${OUTPUT}$CUR";;
        esac
        i=$((i+1))
    done
    echo "${OUTPUT}"
}

Dereckson, Nov 23, 2014 at 19:52

I wonder if you didn't let some bashism in this script, as it's not portable on FreeBSD sh: ${1:$...}: Bad substitution – Dereckson Nov 23 '14 at 19:52

tripleee, Apr 14, 2015 at 7:09

Indeed; substrings with ${var:1:1} are a Bashism. – tripleee Apr 14 '15 at 7:09

Derek Shaw, Jan 24, 2011 at 13:53
Regular expression

I would like to take credit for the command I wish to share but the truth is I obtained it for my own use from http://commandlinefu.com. It has the advantage that if you cd to any directory within your own home folder it will change all files and folders to lower case recursively; please use with caution. It is a brilliant command line fix and especially useful for those multitudes of albums you have stored on your drive.

find . -depth -exec rename 's/(.*)\/([^\/]*)/$1\/\L$2/' {} \;

You can specify a directory in place of the dot (.) after the find, which denotes the current directory, or a full path.

I hope this solution proves useful; the one thing this command does not do is replace spaces with underscores - oh well, another time perhaps.
Wadih M. , Nov 29, 2011 at 1:31
thanks for commandlinefu.com – Wadih M. Nov 29 '11 at 1:31John Rix , Jun 26, 2013 at 15:58
This didn't work for me for whatever reason, though it looks fine. I did get this to work as an alternative though: find . -exec /bin/bash -c 'mv {} `tr [A-Z] [a-z] <<< {}`' \; – John Rix Jun 26 '13 at 15:58Tino , Dec 11, 2015 at 16:27
This needs prename from perl: dpkg -S "$(readlink -e /usr/bin/rename)" gives perl: /usr/bin/prename – Tino Dec 11 '15 at 16:27

c4f4t0r, Aug 21, 2013 at 10:21
In bash 4 you can use typeset.

Example:

A="HELLO WORLD"
typeset -l A=$A

community wiki, Jan 16, 2016 at 12:26
Pre Bash 4.0:

Bash: lower the case of a string and assign to a variable

VARIABLE=$(echo "$VARIABLE" | tr '[:upper:]' '[:lower:]')
echo "$VARIABLE"

Tino, Dec 11, 2015 at 16:23
No need for echo and pipes: use $(tr '[:upper:]' '[:lower:]' <<<"$VARIABLE") – Tino Dec 11 '15 at 16:23

tripleee, Jan 16, 2016 at 12:28
@Tino The here string is also not portable back to really old versions of Bash; I believe it was introduced in v3. – tripleee Jan 16 '16 at 12:28Tino , Jan 17, 2016 at 14:28
@tripleee You are right, it was introduced in bash-2.05b - however that's the oldest bash I was able to find on my systems – Tino Jan 17 '16 at 14:28Bikesh M Annur , Mar 23 at 6:48
You can try this:

s="Hello World!"
echo $s   # Hello World!
a=${s,,}
echo $a   # hello world!
b=${s^^}
echo $b   # HELLO WORLD!

ref: http://wiki.workassis.com/shell-script-convert-text-to-lowercase-and-uppercase/
Orwellophile , Mar 24, 2013 at 13:43
For Bash versions earlier than 4.0, this version should be fastest (as it doesn't fork/exec any commands):

function string.monolithic.tolower
{
    local __word=$1
    local __len=${#__word}
    local __char
    local __octal
    local __decimal
    local __result

    for (( i=0; i<__len; i++ ))
    do
        __char=${__word:$i:1}
        case "$__char" in
            [A-Z] )
                printf -v __decimal '%d' "'$__char"
                printf -v __octal '%03o' $(( $__decimal ^ 0x20 ))
                printf -v __char \\$__octal
                ;;
        esac
        __result+="$__char"
    done
    REPLY="$__result"
}

technosaurus's answer had potential too, although it did run properly for me.
Stephen M. Harris , Mar 22, 2013 at 22:42
If using v4, this is baked-in. If not, here is a simple, widely applicable solution. Other answers (and comments) on this thread were quite helpful in creating the code below.

# Like echo, but converts to lowercase
echolcase () {
    tr [:upper:] [:lower:] <<< "${*}"
}

# Takes one arg by reference (var name) and makes it lowercase
lcase () {
    eval "${1}"=\'$(echo ${!1//\'/"'\''"} | tr [:upper:] [:lower:] )\'
}

Notes:

- Doing: a="Hi All" and then: lcase a will do the same thing as: a=$( echolcase "Hi All" )
- In the lcase function, using ${!1//\'/"'\''"} instead of ${!1} allows this to work even when the string has quotes.

JaredTS486, Dec 23, 2015 at 17:37
In spite of how old this question is and similar to this answer by technosaurus . I had a hard time finding a solution that was portable across most platforms (That I Use) as well as older versions of bash. I have also been frustrated with arrays, functions and use of prints, echos and temporary files to retrieve trivial variables. This works very well for me so far I thought I would share. My main testing environments are:
- GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
- GNU bash, version 3.2.57(1)-release (sparc-sun-solaris2.10)
lcs="abcdefghijklmnopqrstuvwxyz"
ucs="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
input="Change Me To All Capitals"
for (( i=0; i<"${#input}"; i++ )) ; do
    for (( j=0; j<"${#lcs}"; j++ )) ; do
        if [[ "${input:$i:1}" == "${lcs:$j:1}" ]] ; then
            input="${input/${input:$i:1}/${ucs:$j:1}}"
        fi
    done
done

A simple C-style for loop iterates through the strings. For the line below, if you have not seen anything like this before, this is where I learned it. In this case the line checks if the char ${input:$i:1} (lower case) exists in input and if so replaces it with the given char ${ucs:$j:1} (upper case) and stores it back into input.

input="${input/${input:$i:1}/${ucs:$j:1}}"

Gus Neves, May 16 at 10:04
Many answers use external programs, which is not really using Bash.

If you know you will have Bash 4 available you should really just use the ${VAR,,} notation (it is easy and cool). For Bash before 4 (my Mac still uses Bash 3.2, for example) I used the corrected version of @ghostdog74's answer to create a more portable version, one you can call as lowercase 'my STRING' to get a lowercase version. I read comments about setting the result to a var, but that is not really portable in Bash, since we can't return strings. Printing it is the best solution; it is easy to capture with something like var="$(lowercase $str)".

How this works
The way this works is by getting the ASCII integer representation of each char with printf and then adding 32 if upper-to->lower, or subtracting 32 if lower-to->upper. Then use printf again to convert the number back to a char. From 'A' -to-> 'a' we have a difference of 32 chars.

Using printf to explain:

$ printf "%d\n" "'a"
97
$ printf "%d\n" "'A"
65

97 - 65 = 32
And this is the working version with examples. Please note the comments in the code, as they explain a lot of stuff:

#!/bin/bash
# lowerupper.sh

# Prints the lowercase version of a char
lowercaseChar(){
    case "$1" in
        [A-Z])
            n=$(printf "%d" "'$1")
            n=$((n+32))
            printf \\$(printf "%o" "$n")
            ;;
        *)
            printf "%s" "$1"
            ;;
    esac
}

# Prints the lowercase version of a sequence of strings
lowercase() {
    word="$@"
    for((i=0;i<${#word};i++)); do
        ch="${word:$i:1}"
        lowercaseChar "$ch"
    done
}

# Prints the uppercase version of a char
uppercaseChar(){
    case "$1" in
        [a-z])
            n=$(printf "%d" "'$1")
            n=$((n-32))
            printf \\$(printf "%o" "$n")
            ;;
        *)
            printf "%s" "$1"
            ;;
    esac
}

# Prints the uppercase version of a sequence of strings
uppercase() {
    word="$@"
    for((i=0;i<${#word};i++)); do
        ch="${word:$i:1}"
        uppercaseChar "$ch"
    done
}

# The functions will not add a new line, so use echo or
# append it if you want a new line after printing

# Printing stuff directly
lowercase "I AM the Walrus!"$'\n'
uppercase "I AM the Walrus!"$'\n'
echo "----------"

# Printing a var
str="A StRing WITH mixed sTUFF!"
lowercase "$str"$'\n'
uppercase "$str"$'\n'
echo "----------"

# Not quoting the var should also work,
# since we use "$@" inside the functions
lowercase $str$'\n'
uppercase $str$'\n'
echo "----------"

# Assigning to a var
myLowerVar="$(lowercase $str)"
myUpperVar="$(uppercase $str)"
echo "myLowerVar: $myLowerVar"
echo "myUpperVar: $myUpperVar"
echo "----------"

# You can even do stuff like
if [[ 'option 2' = "$(lowercase 'OPTION 2')" ]]; then
    echo "Fine! All the same!"
else
    echo "Ops! Not the same!"
fi

exit 0
And the results after running this:

$ ./lowerupper.sh
i am the walrus!
I AM THE WALRUS!
----------
a string with mixed stuff!
A STRING WITH MIXED STUFF!
----------
a string with mixed stuff!
A STRING WITH MIXED STUFF!
----------
myLowerVar: a string with mixed stuff!
myUpperVar: A STRING WITH MIXED STUFF!
----------
Fine! All the same!

This should only work for ASCII characters though. For me it is fine, since I know I will only pass ASCII chars to it. I am using this for some case-insensitive CLI options, for example.
To store the transformed string into a variable, the following worked for me - $SOURCE_NAME to $TARGET_NAME:

TARGET_NAME="`echo $SOURCE_NAME | tr '[:upper:]' '[:lower:]'`"
Jun 18, 2017 | opensource.com
About conditional, substring, and substitution parameter expansion operators

Conditional parameter expansion

Conditional parameter expansion allows branching on whether the parameter is unset, empty, or has content. Based on these conditions, the parameter can be expanded to its value, a default value, or an alternate value; throw a customizable error; or reassign the parameter to a default value. The following table shows the conditional parameter expansions - each row shows a parameter expansion using an operator to potentially modify the expansion, with the columns showing the result of that expansion given the parameter's status as indicated in the column headers. Operators with the ':' prefix treat parameters with empty values as if they were unset.
parameter expansion   unset var   var=""      var="gnu"
${var-default}        default     -           gnu
${var:-default}       default     default     gnu
${var+alternate}      -           alternate   alternate
${var:+alternate}     -           -           alternate
${var?error}          error       -           gnu
${var:?error}         error       error       gnu

The = and := operators in the table function identically to - and :-, respectively, except that the = variants rebind the variable to the result of the expansion.
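The first rows of the table can be reproduced directly; a short sketch:

```shell
unset var
echo "${var-default}"     # default  (unset: both - and :- substitute)
var=""
echo "x${var-default}x"   # xx       (empty: plain - keeps the empty value)
echo "${var:-default}"    # default  (':' treats empty as unset)
echo "${var:=default}"    # default  (':=' also rebinds var)
echo "$var"               # default
```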
As an example, let's try opening a user's editor on a file specified by the OUT_FILE variable. If either the EDITOR environment variable or our OUT_FILE variable is not specified, we will have a problem. Using a conditional expansion, we can ensure that when the EDITOR variable is expanded, we get the specified value or at least a sane default:
$ echo ${EDITOR}
/usr/bin/vi
$ echo ${EDITOR:-$(which nano)}
/usr/bin/vi
$ unset EDITOR
$ echo ${EDITOR:-$(which nano)}
/usr/bin/nano

Building on the above, we can run the editor command and abort with a helpful error at runtime if there's no filename specified:

$ ${EDITOR:-$(which nano)} ${OUT_FILE:?Missing filename}
bash: OUT_FILE: Missing filename

Substring parameter expansion

Parameters can be expanded to just part of their contents, either by offset or by removing content matching a pattern. When specifying a substring offset, a length may optionally be specified. If running Bash version 4.2 or greater, negative numbers may be used as offsets from the end of the string. Note the parentheses used around the negative offset, which ensure that Bash does not parse the expansion as having the conditional default expansion operator from above:
$ location="CA 90095"
$ echo "Zip Code: ${location:3}"
Zip Code: 90095
$ echo "Zip Code: ${location:(-5)}"
Zip Code: 90095
$ echo "State: ${location:0:2}"
State: CA

Another way to take a substring is to remove characters from the string matching a pattern, either from the left edge with the # and ## operators or from the right edge with the % and %% operators. A useful mnemonic is that # appears left of a comment and % appears right of a number. When the operator is doubled, it matches greedily, as opposed to the single version, which removes the most minimal set of characters matching the pattern.
var="open source"

parameter expansion                                result
${var:offset}        (offset of 5)                 source
${var:offset:length} (offset of 5, length of 4)    sour
${var#pattern}       (pattern of *o?)              en source
${var##pattern}      (pattern of *o?)              rce
${var%pattern}       (pattern of ?e*)              open sour
${var%%pattern}      (pattern of ?e*)              o

The pattern-matching used is the same as with filename globbing: * matches zero or more of any character, ? matches exactly one of any character, and [...] brackets introduce a character class match against a single character, supporting negation ( ^ ), as well as the posix character classes, e.g. . By excising characters from our string in this manner, we can take a substring without first knowing the offset of the data we need:
$ echo $PATH
/usr/local/bin:/usr/bin:/bin
$ echo "Lowest priority in PATH: ${PATH##*:}"
Lowest priority in PATH: /bin
$ echo "Everything except lowest priority: ${PATH%:*}"
Everything except lowest priority: /usr/local/bin:/usr/bin
$ echo "Highest priority in PATH: ${PATH%%:*}"
Highest priority in PATH: /usr/local/bin

Substitution in parameter expansion

The same types of patterns are used for substitution in parameter expansion. Substitution is introduced with the / or // operators, followed by two arguments separated by another / representing the pattern and the string to substitute. The pattern matching is always greedy, so the doubled version of the operator, in this case, causes all matches of the pattern to be replaced in the variable's expansion, while the singleton version replaces only the leftmost.
var="free and open"

parameter expansion                                       result
${var/pattern/string}  (pattern of " ", string of _)      free_and open
${var//pattern/string} (pattern of " ", string of _)      free_and_open

The wealth of parameter expansion modifiers transforms Bash variables and other parameters into powerful tools beyond simple value stores. At the very least, it is important to understand how parameter expansion works when reading Bash scripts, but I suspect that, not unlike myself, many of you will enjoy the conciseness and expressiveness that these expansion modifiers bring to your scripts as well as your interactive sessions.
Nov 04, 2016 | github.com
Relax-and-Recover is written in Bash (at least bash version 3 is needed), a language that can be used in many styles. We want to make it easier for everybody to understand the Relax-and-Recover code and subsequently to contribute fixes and enhancements.
Here is a collection of coding hints that should help to get a more consistent code base.
Don't be afraid to contribute to Relax-and-Recover even if your contribution does not fully match all these coding hints. Currently large parts of the Relax-and-Recover code are not yet in compliance with these coding hints. This is an ongoing step-by-step process. Nevertheless, try to understand the idea behind these coding hints so that you know how to break them properly (i.e. "learn the rules so you know how to break them properly").
The overall idea behind these coding hints is:
Make yourself understood

Make yourself understood to enable others to fix and enhance your code properly as needed.
From this overall idea the following coding hints are derived.
For the fun of it, here is an extreme example of what coding style should be avoided:
#!/bin/bash for i in `seq 1 2 $((2*$1-1))`;do echo $((j+=i));done
Try to find out what that code is about - it does a useful thing.
Code must be easy to read

Code should be easy to understand.

- Variables and functions must have names that explain what they do, even if it makes them longer. Avoid overly short names; in particular do not use one-letter names (like a variable named "i" - just try to 'grep' for it over the whole code to find code that is related to "i"). In general names should consist of two parts, a generic part plus a specific part, to make them meaningful. For example "dev" is basically meaningless because there are so many different kinds of device-like thingies. Use names like "boot_dev", or even better "boot_partition" versus "bootloader_install_device", to make it unambiguous what that thingy actually is about. Use different names for different things so that others can 'grep' over the whole code and get a correct overview of what actually belongs to a particular name.
- Introduce intermediate variables with meaningful names to tell what is going on.
For example, instead of running commands with obfuscated arguments like

    rm -f $( ls ... | sed ... | grep ... | awk ... )

which looks scary (what the heck gets deleted here?), better use

    foo_dirs="..."
    foo_files=$( ls $foo_dirs | sed ... | grep ... )
    obsolete_foo_files=$( echo $foo_files | awk ... )
    rm -f $obsolete_foo_files

which tells the intent behind it (regardless of whether or not that code is the best way to do it - but now others can easily improve it).

- Use functions to structure longer programs into code blocks that can be understood independently.
- Don't use || and && one-liners; write proper if-then-else-fi blocks. Exceptions are simple do-or-die statements like

    COMMAND || Error "meaningful error message"

  and only if it aids readability compared to a full if-then-else clause.
- Use $( COMMAND ) instead of backticks `COMMAND`.
- Use spaces when possible to aid readability, like

    output=( $( COMMAND1 OPTION1 | COMMAND2 OPTION2 ) )

  instead of

    output=($(COMMAND1 OPTION1|COMMAND2 OPTION2))
Do not only tell what the code does (i.e. the implementation details) but also explain the intent behind it (i.e. why), to make the code maintainable.

- Provide meaningful comments that tell what the computer should do and also explain why it should do it, so that others understand the intent and can properly fix issues or adapt and enhance the code as needed.
- If there is a GitHub issue or another URL available for a particular piece of code, provide a comment with the GitHub issue or any other URL that tells about the reasoning behind the current implementation details.
Here is the initial example again, so that one can understand what it is about:
#!/bin/bash
# output the first N square numbers
# by summing up the first N odd numbers 1 3 ... 2*N-1
# where each nth partial sum is the nth square number
# see https://en.wikipedia.org/wiki/Square_number#Properties
# this way it is a little bit faster for big N compared to
# calculating each square number on its own via multiplication
N=$1
if ! [[ $N =~ ^[0-9]+$ ]] ; then
    echo "Input must be non-negative integer." 1>&2
    exit 1
fi
square_number=0
for odd_number in $( seq 1 2 $(( 2 * N - 1 )) ) ; do
    (( square_number += odd_number )) && echo $square_number
done
Now the intent is clear, and others can easily decide if that code is really the best way to do it and easily improve it if needed.
Try to care about possible errors

By default bash proceeds with the next command when something failed. Do not let your code blindly proceed in case of errors, because that could make it hard to find the root cause of a failure when the script errors out somewhere later at an unrelated place with a weird error message, which could lead to false fixes that cure only a particular symptom but not the root cause.
- In case of errors, better abort than blindly proceed.
- At least test mandatory conditions before proceeding. If a mandatory condition is not fulfilled, abort with Error "meaningful error message" (see 'Relax-and-Recover functions' below).
- Preferably in new scripts use set -ue to die from unset variables and unhandled errors, and use set -o pipefail to better notice failures in a pipeline. When leaving the script, restore the Relax-and-Recover default bash flags and options with apply_bash_flags_and_options_commands "$DEFAULT_BASH_FLAGS_AND_OPTIONS_COMMANDS" (see usr/sbin/rear).
- TODO: Use set -ue and set -o pipefail also in existing scripts, see "make rear working with 'set -ue -o pipefail'".

Maintain backward compatibility

Implement adaptions and enhancements in a backward compatible way so that your changes do not cause regressions for others.
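A minimal generic sketch (not ReaR code) of what the strict-mode flags recommended above actually change:

```shell
#!/bin/bash
# Generic sketch of the strict-mode flags recommended above (not ReaR code).
# -u       : expanding an unset variable is a fatal error
# -e       : an unhandled non-zero exit status aborts the script
# pipefail : a pipeline fails if ANY stage fails, not only the last one

# Run a strict-mode snippet in a subshell and capture its exit status.
( set -ue -o pipefail
  false | true            # exits 1 under pipefail, aborts under -e
  echo "never printed"
)
strict_status=$?

( false | true )          # without pipefail: status of last stage = 0
default_status=$?

echo "strict=$strict_status default=$default_status"
```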
- One and the same Relax-and-Recover code must work on various different systems: on older systems as well as on the newest systems, and on various different Linux distributions.
- Preferably use simple generic functionality that works on any Linux system. Better very simple code than oversophisticated (possibly fragile) constructs. In particular avoid special bash version 4 features (Relax-and-Recover code should also work with bash version 3).
- When there are incompatible differences on different systems, a distinction of cases with separated code is needed, because it is more important that the Relax-and-Recover code works everywhere than having generic code that sometimes fails.

Dirty hacks welcome

When there are special issues on particular systems it is more important that the Relax-and-Recover code works than having nice-looking clean code that sometimes fails. In such special cases any dirty hacks that intend to make it work everywhere are welcome. But for dirty hacks the above listed coding hints become mandatory rules:
- Provide explanatory comments that tell what a dirty hack does, together with a GitHub issue or any other URL that tells about the reasoning behind the dirty hack, to enable others to properly adapt or clean up the dirty hack at any time later when the reason for it has changed or gone away.
- Try as best you can to foresee possible errors or failures of a dirty hack, and error out with meaningful error messages if things go wrong, to enable others to understand the reason behind a failure.
- Implement the dirty hack in a way so that it does not cause regressions for others.
For example a dirty hack like the following is perfectly acceptable:
# FIXME: Dirty hack to make it work
# on "FUBAR Linux version 666"
# where COMMAND sometimes inexplicably fails
# but always works after at most 3 attempts
# see http://example.org/issue12345
# Retries should have no bad effect on other systems
# where the first run of COMMAND works.
COMMAND || COMMAND || COMMAND || Error "COMMAND failed."
Character encoding

Use only traditional (7-bit) ASCII characters. In particular do not use UTF-8 encoded multi-byte characters.

- Non-ASCII characters in scripts may cause arbitrary unexpected failures on systems that do not support locales other than POSIX/C. During "rear recover" only the POSIX/C locale works (the ReaR rescue/recovery system has no support for non-ASCII locales) and /usr/sbin/rear sets the C locale, so non-ASCII characters are invalid in scripts. Keep in mind that basically all files in ReaR are scripts; e.g. /usr/share/rear/conf/default.conf and /etc/rear/local.conf are also sourced (and executed) as scripts.
- English documentation texts do not need non-ASCII characters. Using non-ASCII characters in documentation texts makes it needlessly hard to display the documentation correctly for any user on any system. When non-ASCII characters are used but the user does not have exactly the right matching locale set, arbitrary nonsense can happen, cf. https://en.opensuse.org/SDB:Plain_Text_versus_Locale

Text layout

- Indentation with 4 blanks, not tabs.
- Block level statements on the same line: if CONDITION ; then

Variables

- Curly braces only where really needed: $FOO instead of ${FOO}, but ${FOO:-default_foo}.
- All variables that are used in more than a single script must be all-caps: $FOO instead of $foo or $Foo.
- Variables that are used only locally should be lowercase and should be marked with local, like: local foo="default_value"

Functions

- Use the function keyword to define a function.
- Function names are lower case, words separated by underscore (_).

Relax-and-Recover functions

Use the available Relax-and-Recover functions when possible instead of re-implementing basic functionality again and again. The Relax-and-Recover functions are implemented in the various lib/*-functions.sh files.

- is_true and is_false: see lib/global-functions.sh for how to use them. For example, instead of

    if [[ ! "$FOO" =~ ^[yY1] ]] ; then

  use

    if ! is_true "$FOO" ; then

test, [, [[, ((

- Use [[ where it is required (e.g. for pattern matching or complex conditionals) and [ or test everywhere else.
- (( is the preferred way for numeric comparison; variables don't need to be prefixed with $ there.

Paired parenthesis

- Use paired parenthesis for case patterns, as in

    case WORD in (PATTERN) COMMANDS ;; esac

  so that editor commands (like '%' in 'vi') that check for matching opening and closing parenthesis work everywhere in the code.
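To tie the last two hints together, here is a hypothetical, simplified stand-in for an is_true-style predicate (the real ReaR implementation lives in lib/global-functions.sh), written with paired parentheses in the case patterns:

```shell
#!/bin/bash
# Simplified sketch of an is_true-style predicate (NOT the real ReaR
# implementation in lib/global-functions.sh), using paired parenthesis
# in the case patterns so '%' matching works in editors.
function is_true () {
    case "$1" in
        ([yY]|[yY][eE][sS]|1|[tT][rR][uU][eE]) return 0 ;;
        (*) return 1 ;;
    esac
}

is_true "Yes" && echo "Yes is true"
is_true "0"   || echo "0 is not true"
```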
I want to see if a string is inside a portion of another string, e.g.:

'ab' in 'abc' -> true
'ab' in 'bcd' -> false
How can I do this in a conditional of a bash script?
A: You can use the form ${VAR/subs}, where VAR contains the bigger string and subs is the substring you are trying to find:

my_string=abc
substring=ab
if [ "${my_string/$substring}" = "$my_string" ] ; then
   echo "${substring} is not in ${my_string}"
else
   echo "${substring} was found in ${my_string}"
fi
This works because ${VAR/subs} is equal to $VAR but with the first occurrence of the string subs removed; in particular, if $VAR does not contain the word subs it won't be modified.

I think that you should change the sequence of the echo statements, because I get "ab is not in abc" – Mmm.. No, the script is wrong. Like that I get "ab was found in abc", but if I use substring=z I get "z was found in abc" – Lucio May 25 '13 at 0:08

===
Sorry, again I forgot the $ in substring. – edwin May 25 '13 at 0:10

Now I get "ab is not in abc", but "z was found in abc". This is funny :D – Lucio May 25 '13 at 0:11

===
[[ "abc" =~ "ab" ]]
[[ "bcd" =~ "ab" ]]

The brackets are for the test, and as they are double brackets, they can do some extra tests like =~.

So you could use this form, something like:

var1="ab"
var2="bcd"
if [[ "$var2" =~ "$var1" ]]; then
    echo "pass"
else
    echo "fail"
fi

Edit: corrected "=~", had it flipped.
I get "fail" with these parameters: var2="abcd" – Lucio May 25 '13 at 0:02

===

@Lucio The correct form is [[ $string =~ $substring ]]. I updated the answer. – Eric Carvalho May 25 '13 at 0:38

===

@EricCarvalho oops, thanks for correcting it. – demure May 25 '13 at 0:49
===

Using bash filename patterns (aka "glob" patterns)

substr=ab
[[ abc == *"$substr"* ]] && echo yes || echo no    # yes
[[ bcd == *"$substr"* ]] && echo yes || echo no    # no
The following two approaches will work on any POSIX-compatible environment, not just in bash:
substr=ab
for s in abc bcd; do
    if case ${s} in (*"${substr}"*) true;; (*) false;; esac; then
        printf %s\\n "'${s}' contains '${substr}'"
    else
        printf %s\\n "'${s}' does not contain '${substr}'"
    fi
done
substr=ab
for s in abc bcd; do
    if printf %s\\n "${s}" | grep -qF "${substr}"; then
        printf %s\\n "'${s}' contains '${substr}'"
    else
        printf %s\\n "'${s}' does not contain '${substr}'"
    fi
done
Both of the above output:
'abc' contains 'ab' 'bcd' does not contain 'ab'
The former has the advantage of not spawning a separate grep process.

Note that I use printf %s\\n "${foo}" instead of echo "${foo}", because echo might mangle ${foo} if it contains backslashes.

===

Mind the [[ and the quotes:

[[ $a == z* ]]     # True if $a starts with a "z" (pattern matching).
[[ $a == "z*" ]]   # True if $a is equal to z* (literal matching).
[ $a == z* ]       # File globbing and word splitting take place.
[ "$a" == "z*" ]   # True if $a is equal to z* (literal matching).
So as @glenn_jackman said, but mind that if you wrap the whole second term in double quotes, it will switch the test to literal matching.
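The quoting pitfall above can be sketched as a runnable example (values are illustrative):

```shell
#!/bin/bash
# Quoting the right-hand side of == or =~ inside [[ ]] switches
# from pattern/regex matching to literal matching.

a="zebra"
[[ $a == z* ]]   && glob_unquoted=yes || glob_unquoted=no   # pattern: matches
[[ $a == "z*" ]] && glob_quoted=yes   || glob_quoted=no     # literal "z*": no match

s="abc"
[[ $s =~ ^ab ]]   && re_unquoted=yes || re_unquoted=no      # regex: matches
[[ $s =~ "^ab" ]] && re_quoted=yes   || re_quoted=no        # literal "^ab": no match

echo "$glob_unquoted $glob_quoted $re_unquoted $re_quoted"
```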
Here's the actual formal definition from the bash man pages:

${parameter/pattern/string}
${parameter//pattern/string}

The pattern is expanded to produce a pattern just as in pathname expansion. Parameter is expanded and the longest match of pattern against its value is replaced with string. In the first form, only the first match is replaced. The second form causes all matches of pattern to be replaced with string. If pattern begins with #, it must match at the beginning of the expanded value of parameter. If pattern begins with %, it must match at the end of the expanded value of parameter. If string is null, matches of pattern are deleted and the / following pattern may be omitted. If parameter is @ or *, the substitution operation is applied to each positional parameter in turn, and the expansion is the resultant list. If parameter is an array variable subscripted with @ or *, the substitution operation is applied to each member of the array in turn, and the expansion is the resultant list.
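A short sketch of the anchored and doubled forms from the definition above (sample values are illustrative):

```shell
#!/bin/bash
# Anchored substitution: a pattern starting with # must match at the
# beginning, one starting with % must match at the end.

f="data.tar.data"

start=${f/#data/X}    # anchored at beginning -> X.tar.data
end=${f/%data/X}      # anchored at end       -> data.tar.X
all=${f//data/X}      # every match           -> X.tar.X
del=${f/.tar}         # null string: first match of ".tar" is deleted -> data.data

echo "$start $end $all $del"
```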
Regular expressions and globbing

Globbing is the use of * as a wildcard to expand a list of file names. Use of wildcards is not a regular expression.

The following examples should also work inside bash scripts. These may or may not be compatible with sh. These are "interesting" regex or globbing examples. I say "interesting" because they don't seem to follow the path of "true" regular expressions used by Perl.
[mst3k@zeus ~]$ echo ${HOME/\/home\//}
mst3k
[mst3k@zeus ~]$ echo ${HOME##home}
/home/mst3k
[mst3k@zeus ~]$ echo ${HOME##/home}
/mst3k
[mst3k@zeus ~]$ echo ${HOME##/home/}
mst3k
[mst3k@zeus ~]$ echo ${HOME##*}

[mst3k@zeus ~]$ echo ${HOME##*/}
mst3k
I'm having trouble with the string replacement function in bash. The problem is that I want to replace a printing character, in this case &, by a non-printing character, either new line or null in this case. I don't see how to specify the non-printing character in the string replacement function ${variable//a/b}.
I have a long, URL-encoded-like file name that I would like to parse with grep. I have used & as a delimiter between variables within the long file name. I would like to use the string replacement function in bash to search for all instances of & and replace each one with either the null character or the new line character since grep can recognize either one.
How do I specify a non-printing character in the bash string replacement function ?
Thank you.
Special Syntax
Submitted by Mitch Frazier on Fri, 04/02/2010 - 11:43.
Use the $'\xNN' syntax for the non-printing character. Note though that a NULL character does not work:

$ cat j.sh
v="hello=yes&world=no"

v2=${v/&/$'\x0a'}
#        ^^^^^^^ change to newline
echo -n ">>$v2<<" | hexdump -C

v2=${v/&/$'\x00'}
#        ^^^^^^^ change to null (doesn't work)
echo -n ">>$v2<<" | hexdump -C
If you run this you can see that the substitution works for a newline but not for a NULL:

$ sh j.sh
00000000 3e 3e 68 65 6c 6c 6f 3d 79 65 73 0a 77 6f 72 6c |>>hello=yes.worl|
00000010 64 3d 6e 6f 3c 3c |d=no<<|
00000016
00000000 3e 3e 68 65 6c 6c 6f 3d 79 65 73 77 6f 72 6c 64 |>>hello=yesworld|
00000010 3d 6e 6f 3c 3c |=no<<|
00000015
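A compact variant of the technique above, splitting an &-delimited string onto separate lines so grep can work per field (names and data are illustrative):

```shell
#!/bin/bash
# Replace every '&' with a newline using ANSI-C quoting ($'\n'),
# so each key=value field lands on its own line for grep.

record="name=joe&age=42&city=nyc"
fields=${record//&/$'\n'}     # // replaces ALL occurrences

printf '%s\n' "$fields" | grep '^age='
```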
Mitch Frazier is an Associate Editor for Linux Journal.

Multiple operations?
Submitted by Anonymous on Wed, 03/24/2010 - 10:56.
Very interesting article.
A question: is it possible to use in the same expression many operators, as:
${var#t*is%t*st}, which uses both '#t*is' and '%t*st', and which would give 'is a' in the example?

I tried some forms but it doesn't work... Has someone an idea?

Doesn't Work
Submitted by Mitch Frazier on Wed, 03/24/2010 - 14:59.
You can't do multiple operations in one expression.

Mitch Frazier is an Associate Editor for Linux Journal.
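Since the operators cannot be combined in one expression, the usual workaround is an intermediate variable; a sketch using the sample string from the article:

```shell
#!/bin/bash
# You cannot write ${var#t*is%t*st}, but you can chain two expansions
# through an intermediate variable.

var="this is a test"
tmp=${var#t*is}      # trim shortest 't*is' from the left  -> " is a test"
result=${tmp%t*st}   # trim shortest 't*st' from the right -> " is a "

echo "[$result]"
```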
Variable
Submitted by First question (not verified) on Tue, 09/01/2009 - 07:07.
I want to do something like this using linux bash script:
a1="Chris Alonso"
i="1"
echo $a$i    # I'm only trying to write: echo $a1, using the variable i

Someone can help me, please?
Eval
Submitted by Mitch Frazier on Tue, 09/01/2009 - 13:41.
Eval will do this for you, but you may decide you really don't want to do this after seeing it:

eval echo \$$(echo a$i)
or
eval echo \$`echo a$i`
A slightly less complicated sequence would be something like:

v=a$i
eval echo \$$v

It looks like what you're trying to do here is simulate arrays. If that's the case then you'd be better off using bash's built-in arrays.

Mitch Frazier is an Associate Editor for Linux Journal.
How about v=a$i echo ${!v}
Submitted by Anonymous (not verified) on Fri, 09/25/2009 - 15:09.
How about

v=a$i
echo ${!v}

simplification of indirect reference
Submitted by Anonymous on Fri, 03/12/2010 - 23:16.
Is there any way to rid the statements of the variable assignment? As in, make it so that:
echo ${!a$i}

works? I'm thinking that there has to be a way to escape the "a$i" inside the indirect reference construct. I have a case where I'm trying to do this with the result of a regex match, and am not able to figure out the right syntax:
SRC_FOLDER=/var/website
needleA_FOLDER=/var/www
needleB_FOLDER=/var/htdocs

for item in ${ARRAY[@]}; do
[[ "$item" =~ hay(needle)stack ]] &&
DIR=${!${BASH_REMATCH[1]}_FOLDER};
cp -R $SRC_FOLDER/* $DIR;
done;But the seventh line (with the indirect reference) chokes with a "bad substitution" error. I should be able to do this on one line, without using eval with the right syntax, no?
Sincerely,
Tyler

Yes
Submitted by Mitch Frazier on Fri, 09/25/2009 - 15:36.
That works and is simpler than my solution.

Mitch Frazier is an Associate Editor for Linux Journal.
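The ${!v} indirection from this thread can be sketched end-to-end (variable names are illustrative):

```shell
#!/bin/bash
# Indirect expansion: ${!v} expands to the value of the variable
# whose NAME is stored in v.

a1="Chris Alonso"
i=1

v=a$i           # v holds the string "a1"
echo "${!v}"    # expands $a1

# For a computed name (cf. the ${!...} question above), build the
# name in a variable first; ${!name} cannot contain an expression:
needleA_FOLDER=/var/www
match=needleA
name=${match}_FOLDER
echo "${!name}"
```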
: or not to :
Submitted by Ash (not verified) on Mon, 01/08/2007 - 07:47.
Interesting, ":" can be omitted for "numeric" variables (script/function arguments):

baz=${var:-bar}

vs.

baz=${1-bar}
First time I thought it was a typo, but it is not.

interesting, this will save a few seds and greps!
Submitted by mangoo (not verified) on Tue, 01/29/2008 - 06:24.
Interesting article, this will save me a few seds, greps and awks!

what if we are to operate on
Submitted by MgBaMa req (not verified) on Mon, 09/03/2007 - 22:44.
what if we are to operate on the param $1 $2, ...?
i mean is it feasible to see a result of ${4%/*}
to get a valule as from
$ export $1="this is a test/none"
$ echo ${$1/*}
> this is a test
$ echo ${$2#*/}
> none
?
thanks

variable contents confusing
Submitted by Paul Archerr (not verified) on Wed, 03/29/2006 - 09:39.
Minor typos notwithstanding, I had a bit of a problem with the values of the variables used. Assigning the value 'bar' to the variable bar makes it confusing to quickly figure out which is which. (Is that 'bar' another variable? Or a value?)
I would suggest making the simple change of putting your values in uppercase. They would stand out and make the article more readable.
For example:
$ export var=var
$ echo ${var}bar # var exists so this works as expected
varbar
$ echo $varbar # varbar doesn't exist, so this doesn't
$

becomes:
$ export var=VAR
$ echo ${var}bar # var exists so this works as expected
VARbar
$ echo $varbar # varbar doesn't exist, so this doesn't
$

You can see how the 'VARbar' on the third line becomes differentiated from the 'varbar' on the fourth line.
var, bar, +varbar: worst things for programming since Microsoft.
Submitted by Anonymous (not verified) on Mon, 09/10/2007 - 14:35.
Using var and bar to try to inform someone is pretty much uniformly bad everywhere it's done, as var and bar explicitly indicate things that don't have any meaning whatsoever. Varbar is even worse, since it is visibly only different from "var bar" because of a single " " (space).

If you're trying to confuse the reader, use var, bar, and especially varbar.
If you're trying to be informative, please, give your damn variables a short but logically useful name.
Part II?
Submitted by Anonymous (not verified) on Fri, 03/24/2006 - 08:58.
OK but there's a lot more to it than just this. How about some of the following?

${var:pos[:len]}     # extract substr from pos (0-based) for len
${var/substr/repl}   # replace first match
${var//substr/repl}  # replace all matches
${var/#substr/repl}  # replace only if var begins with substr
${var/%substr/repl}  # replace only if var ends with substr
${#var}              # returns length of $var
${!var}              # indirect expansion

...Sorry, those round parens
Submitted by Anonymous (not verified) on Fri, 03/24/2006 - 09:07.
...Sorry, those round parens should be curlies.

Interesting
Submitted by Stephanie (not verified) on Sun, 03/12/2006 - 21:51.
I think the author did a great job explaining the article, and I am glad that I was able to learn from it and finally found something interesting to read online!

Examples in Table 1 are rubbish
Submitted by Anonymous (not verified) on Sun, 03/12/2006 - 20:56.
In addition to the incorrect $var= (should be var=), the last two examples don't illustrate the use of the construct they are supposed to. Pity the author did not proof-read the first table.

Examples using same operator yet differing results?!
Submitted by really-txtedmacs (not verified) on Fri, 03/10/2006 - 19:46.
${var#t*is} deletes the shortest possible match from the left:

export $var="this is a test"
echo ${var#t*is}
is a test

Fine, but the next in line is supposed to remove the maximum from the left, yet uses the same exact operator - how does it get the correct result?

${var##t*is} deletes the longest possible match from the left:

export $var="this is a test"
echo ${var#t*is}
a test
It gets worse when going from the right: the original operation from the right is employed. Moreover, on my system, an Ubuntu 05.10 desktop, this gave:
txtedmacs@phpserver:~$ export $var="this is a test"
bash: export: `=this is a test': not a valid identifier

Take out the $var, and it works fine.
Much easier to catch someone else's errors than one's own - I hate looking at my articles or emails.
Errors in article
Submitted by Anonymous (not verified) on Mon, 03/13/2006 - 22:49.
As really-txtedmacs tried to politely point out, there are errors in the Pattern Matching table - Example column, as of when he looked at it and as of now. Each instance of "export $var" should be "export var" in bash and most similar shells. Also, the operator in the echo command needs to match exactly the operator in the first column. Interestingly, some but not all of these errors still exist in the original article at http://linuxgazette.net/issue57/eyler.html, which is in issue 57, not 67.
Otherwise, a very good article. I will save the info in my bag of tricks.

You're channelling Larry Wall, dude!
Submitted by Jim Dennis (not verified) on Sat, 03/11/2006 - 18:30.
In Bourne shell and its ilk (like Korn shell and bash) the assignment syntax is:
var=... You only prefix a variable's name with $ when you're "dereferencing" it (expanding it into its value).So the shell was parsing
export $var="this is a test" as:export ???="this is a test" (where ??? is whatever "var" was set to before this statement ... probably the empty string if the variable was previously unset).
I know this is confusing because Perl does it completely differently. In Perl the $ is a "sigil" which, on an "lvalue" (a variable name or other assignable token), tells the interpreter what "type" of assignment is occurring. Thus a Perl statement like:

$var="this is a test";

(note the required semicolon, too) is a "scalar" assignment. This also sets the context of the assignment. In Perl a scalar value in scalar context is conceptually the closest to a normal shell variable assignment. However, a list value in a scalar assignment context is a different beast entirely. So a line of Perl like

perl -e '@bar=(1,2,3); $var=@bar; print $var;'

will set $var to the number of items in the bar array. (Of course we could use @var for the array name since they are different namespaces in Perl. But I wanted my example to be clear.) So an array/list value in scalar context returns an integer (a type of scalar) which represents the number of elements in the list.

Anyway, just remember that the shell $ is more like the C programming * operator ... it dereferences the variable into its value.
JimD
The Linux Gazette "Answer Guy"

USA <> World :-)
Submitted by peter.green on Fri, 03/10/2006 - 14:57.
Although the # and % identifiers may not seem obvious, they have a convenient mnemonic. The # key is on the left side of the $ key and operates from the left.
In the USA, perhaps, but my UK keyboard has the # key nestling up against the Enter and right-Shift keys. Not to mention layouts such as Dvorak...!

Other (non-USA-specific?!) Mnemonics
Submitted by Anonymous (not verified) on Sat, 03/18/2006 - 16:29.
Another way to keep track is that we say "#1" and "1%", not "1#" and "%1". That is, unless you're using "#" to mean "pounds", in which case "1#" is correct, but it's antiquated at best in the USA, and presumably a nonissue for other countries that use metric...

C programmers are used to using "#" at the start of lines (#define, #include). LaTeX authors are used to "%" at the end of lines when writing macro definitions, as a comment to keep extraneous whitespace from creeping in - but "%" is comment to end-of-line, so it's also likely to show up at the start of a line too...
Mnemonics
Submitted by Island Joe (not verified) on Tue, 03/21/2006 - 04:25.
Thanks for sharing those mnemonic insights, it's most helpful.
The bash shell has many features that are sufficiently obscure that you almost never see them used. One of the problems is that the man page offers no examples.

Here I'm going to use some of these features to do the sorts of simple string manipulations that are commonly needed on file and path names.
In traditional Bourne shell programming you might see references to the basename and dirname commands. These perform simple string manipulations on their arguments. You'll also see many uses of sed and awk or perl -e to perform simple string manipulations.
Often these machinations are necessary to perform on lists of filenames and paths. There are many specialized programs that are conventionally included with Unix to perform these sorts of utility functions: tr, cut, paste, and join. Given a filename like /home/myplace/a.data.directory/a.filename.txt, which we'll call $f, you could use commands like:
dirname $f
basename $f
basename $f .txt

... to see output like:

/home/myplace/a.data.directory
a.filename.txt
a.filename

Notice that the GNU version of basename takes an optional parameter. This is handy for specifying a filename "extension" like .tar.gz which will be stripped off of the output. Note that basename and dirname don't verify that these parameters are valid filenames or paths. They simply perform simple string operations on a single argument. You shouldn't use wild cards with them -- since dirname takes exactly one argument (and complains if given more) and basename takes one argument plus an optional one which is not a filename.
Despite their simplicity these two commands are used frequently in shell programming because most shells don't have any built-in string handling functions -- and we frequently need to refer to just the directory or just the file name parts of a given full file specification.
Usually these commands are used within the "back tick" shell operators, like TARGETDIR=`dirname $1`. The "back tick" operators are equivalent to the $(...) construct. This latter construct is valid in Korn shell and bash -- and I find it easier to read (since I don't have to squint at my screen wondering which direction the "tick" is slanted).
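For example, the two command-substitution syntaxes behave identically, but $( ) nests more readably (the sample path is illustrative):

```shell
#!/bin/bash
# Backticks and $( ) are equivalent; $( ) nests cleanly.

f=/home/myplace/a.data.directory/a.filename.txt

d1=`dirname $f`
d2=$(dirname $f)
[ "$d1" = "$d2" ] && echo "same: $d2"

# Nesting is where $( ) pays off:
parent=$(basename $(dirname $f))
echo "$parent"
```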
Although the basename and dirname commands embody the "small is beautiful" spirit of Unix -- they may push the envelope towards the "too simple to be worth a separate program" end of simplicity.
Naturally you can call on sed, awk, TCL or perl for more flexible and complete string handling. However this can be overkill -- and a little ungainly.
So, bash (which long ago abandoned the "small is beautiful" principle and went the way of emacs) has some built-in syntactical candy for doing these operations. Since bash is the default shell on Linux systems, there is no reason not to use these features when writing scripts for Linux.
If you're concerned about portability to other shells and systems, you may want to stick with dirname, basename, and sed.

The bash man page is huge. It contains a complete reference to the "readline" libraries and how to write a .inputrc file (which I think should all go in a separate man page) -- and a rundown of all the csh "history" or bang! operators (which I think should be replaced with a simple statement like: "Most of the csh history operators work the same way in bash"). However, buried in there is a section on Parameter Substitution which tells us that $var is really a shorthand for ${var}, which is really the simplest case of several ${var:operators} and similar constructs.
Are you confused, yet?
Here's where a few examples would have helped. To understand the man page I simply experimented with the echo command and several shell variables. This is what it all means:
Given:
var=/tmp/my.dir/filename.tar.gz
We can use these expressions:
- path = ${var%/*}
- To get: /tmp/my.dir (like dirname)
- file = ${var##*/}
- To get: filename.tar.gz (like basename)
- base = ${file%%.*}
- To get: filename
- ext = ${file#*.}
- To get: tar.gz
Note that the last two depend on the assignment made in the second one
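The four expressions above can be checked in one self-contained snippet:

```shell
#!/bin/bash
# dirname/basename-style splitting with parameter expansion alone,
# using the article's sample value.

var=/tmp/my.dir/filename.tar.gz

path=${var%/*}     # like dirname  -> /tmp/my.dir
file=${var##*/}    # like basename -> filename.tar.gz
base=${file%%.*}   # trim longest '.*' suffix           -> filename
ext=${file#*.}     # trim shortest prefix up to first . -> tar.gz

echo "$path $file $base $ext"
```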
Here we notice two different "operators" being used inside the parameters (curly braces). Those are the # and the % operators. We also see them used as single characters and in pairs. This gives us four combinations for trimming patterns off the beginning or end of a string:
- ${variable%pattern}
- Trim the shortest match from the end
- ${variable##pattern}
- Trim the longest match from the beginning
- ${variable%%pattern}
- Trim the longest match from the end
- ${variable#pattern}
- Trim the shortest match from the beginning
It's important to understand that these use shell "globbing" rather than "regular expressions" to match these patterns. Naturally a simple string like "txt" will match sequences of exactly those three characters in that sequence -- so the difference between "shortest" and "longest" only applies if you are using a shell wild card in your pattern.
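A runnable illustration of the shortest-versus-longest distinction (it only shows up once a wildcard is involved):

```shell
#!/bin/bash
# "Shortest" vs "longest" match only differ when the pattern
# contains a shell wildcard.

var="this is a test"

short=${var#t*is}     # shortest prefix matching 't*is' is "this"
long=${var##t*is}     # longest prefix matching 't*is' is "this is"

echo "[$short] [$long]"
```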
A simple example of using these operators comes in the common question of copying or renaming all the *.txt to change the .txt to .bak (in MS-DOS' COMMAND.COM that would be REN *.TXT *.BAK).
This is complicated in Unix/Linux because of a fundamental difference in the programming API's. In most Unix shells the expansion of a wild card pattern into a list of filenames (called "globbing") is done by the shell -- before the command is executed. Thus the command normally sees a list of filenames (like "var.txt bar.txt etc.txt") where DOS (COMMAND.COM) hands external programs a pattern like *.TXT.
Under Unix shells, if a pattern doesn't match any filenames the parameter is usually left on the command line literally. Under bash this is a user-settable option. In fact, under bash you can disable shell "globbing" if you like -- there's a simple option to do this. It's almost never used -- because commands like mv and cp won't work properly if their arguments are passed to them in this manner.
However here's a way to accomplish a similar result:
for i in *.txt; do cp "$i" "${i%.txt}.bak"; done... obviously this is more typing. If you try to create a shell function or alias for it, you have to figure out how to pass the parameters. Certainly the following seems simple enough:
function cp-pattern { for i in $1; do cp $i ${i%$1}$2; done; }... but that doesn't work the way most Unix users would expect. You'd have to pass this command a pair of specially chosen, quoted arguments like:
cp-pattern '*.txt' .bak... note how the second pattern has no wildcards and how the first is quoted to prevent any shell globbing. That's fine for something you might just use yourself -- if you remember to quote it right. It's easy enough to add a check for the number of arguments and to ensure that at least one file matching the $1 pattern exists. However it becomes much harder to make this command reasonably safe and robust. Inevitably it becomes less "unix-like" and thus more difficult to use with other Unix tools.
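Here is a sketch of what a somewhat more careful version might look like; the name cp_suffix and its interface (an old suffix and a new one, instead of a glob pattern) are my own invention, not a standard command:

```shell
# A slightly more careful variant of the naive cp-pattern idea:
# takes an old suffix and a new suffix, checks its arguments,
# and quotes everything. cp_suffix is a hypothetical name,
# not a standard command.
cp_suffix () {
    if [ $# -ne 2 ]; then
        echo "usage: cp_suffix OLD_SUFFIX NEW_SUFFIX" >&2
        return 1
    fi
    for f in *"$1"; do
        [ -e "$f" ] || continue          # glob matched nothing
        cp -- "$f" "${f%"$1"}$2"         # foo.txt -> foo.bak
    done
}

# Usage:
#   cp_suffix .txt .bak
```

Quoting "$1" inside the ${f%...} expansion makes the suffix literal, so a user-supplied suffix containing wildcard characters can't strip more than intended.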
I generally just take a whole different approach. Rather than trying to use cp to make a backup of each file under a slightly changed name I might just make a directory (usually using the date and my login ID as a template) and use a simple cp command to copy all my target files into the new directory.
Another interesting thing we can do with these "parameter expansion" features is to iterate over a list of components in a single variable.
For example, you might want to do something to traverse over every directory listed in your path -- perhaps to verify that everything listed therein is really a directory and is accessible to you.
Here's a command that will echo each directory named on your path on its own line:
p=$PATH; until [ "$p" = "$d" ]; do d=${p%%:*}; p=${p#*:}; echo $d; done... obviously you can replace the echo $d part of this command with anything you like.
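Written out long-hand, with the accessibility check mentioned above, the same loop might look like this (the sample value of p is made up so the example is self-contained; in real use you would start from p=$PATH):

```shell
# Walk each ':'-separated component of a PATH-like value and report
# entries that are not accessible directories.
# p is given a made-up sample value here; use p=$PATH in real life.
p="/bin:/no/such/dir"
d=
until [ "$p" = "$d" ]; do
    d=${p%%:*}          # first remaining component
    p=${p#*:}           # drop it from the front
    if [ -d "$d" ]; then
        echo "ok:      $d"
    else
        echo "missing: $d"
    fi
done
```

The loop terminates when ${p#*:} stops changing p, i.e. when no colon remains.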
Another case might be where you'd want to traverse a list of directories that were all part of a path. Here's a command pair that echos each directory from the root down to the "current working directory":
p=$(pwd); until [ "$p" = "$d" ]; do p=${p#*/}; d=${p%%/*}; echo $d; done... here we've reversed the assignments to p and d so that we skip the root directory itself, which must be "special cased" since it appears to be a "null" entry if we do it the other way. The same problem would have occurred in the previous example if the value assigned to $PATH had started with a ":" character.
Of course, it's important to realize that this is not the only, or necessarily the best, method to parse a line or value into separate fields. Here's an example that uses the old IFS variable (the "inter-field separator" in the Bourne and Korn shells as well as bash) to parse each line of /etc/passwd and extract just two fields:
cat /etc/passwd | ( IFS=: ; while read lognam pw id gp fname home sh; do echo $home \"$fname\"; done )
Here we see the parentheses used to isolate the contents in a subshell, such that the assignment to IFS doesn't affect our current shell. Setting IFS to a colon tells the shell to treat that character as the separator between "words" instead of the usual whitespace. For this particular function it's very important that IFS consist solely of that character; usually it is set to "space," "tab," and "newline."
After that we see a typical while read loop, where we read values from each line of input (from /etc/passwd) into seven variables per line. This allows us to use any of these fields that we need from within the loop. Here we are just using the echo command, as we have in the other examples.
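The same field-splitting idea can be sketched in a self-contained form, reading sample lines from a here-document instead of /etc/passwd (the records below are made up); setting IFS only on the read command itself keeps the current shell's IFS untouched without needing a subshell:

```shell
# Parse colon-separated, passwd-style records.
# Setting IFS only on the read command leaves the shell's normal
# word splitting untouched everywhere else.
# The records below are made-up samples, not a real /etc/passwd.
while IFS=: read lognam pw id gp fname home sh; do
    echo "$home \"$fname\""
done <<'EOF'
root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000:Alice Example:/home/alice:/bin/sh
EOF
```

This prints /root "root" and /home/alice "Alice Example", one per line.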
My point here has been to show how we can do quite a bit of string parsing and manipulation directly within bash -- which will allow our shell scripts to run faster with less overhead and may be easier than some of the more complex sorts of pipes and command substitutions one might have to employ to pass data to the various external commands and return the results.
Many people might ask: why not simply do it all in Perl? I won't dignify that with a response. Part of the beauty of Unix is that each user has many options about how they choose to program something. Well-written scripts and programs interoperate regardless of what particular scripting or programming facility was used to create them. Issue the command file /usr/bin/* on your system and you may be surprised at how many Bourne and C shell scripts there are in there.
In conclusion I'll just provide a sampler of some other bash parameter expansions:
- ${parameter:-word}
- Provide a default if parameter is unset or null.
- Example:
echo ${1:-"default"}
- Note: this would have to be used from within a function or shell script; the point is to show that some of the parameter substitutions can be used with shell numbered arguments. In this case the string "default" would be returned if the function or script was called with no $1 (or if all of the arguments had been shifted out of existence).
- ${parameter:=word}
- Assign a value to parameter if it was previously unset or null.
- Example:
echo ${HOME:="/home/.nohome"}
- ${parameter:?word}
- Generate an error if parameter is unset or null by printing word to stderr.
- Example:
: ${TMP:?"Error: Must have a valid Temp Variable Set"}
This one just uses the shell "null command" (the : command) to evaluate the expression. If the variable doesn't exist or has a null value -- this will print the string to the standard error file handle and exit the script with a return code of one.
While it is easy to redirect the standard error of child processes under bash, people often overlook that a script can also write to its own stderr directly and portably with the redirection >&2, as in echo "message" >&2. Another, Linux-specific, way is to use the /proc/ filesystem (process table) like so:
function error { echo "$*" > /proc/self/fd/2; }... self is always a set of entries that refers to the current process, and self/fd/ is a directory full of the currently open file descriptors. Under Unix and DOS every process is given the following pre-opened file descriptors: stdin, stdout, and stderr. The portable spelling of the same function is simply function error { echo "$*" >&2; }.
- ${parameter:+word}
- Alternative value. ${TMP:+"/mnt/tmp"}
use /mnt/tmp instead of $TMP, but do nothing if TMP was unset. This is a weird one that I can't ever see myself using, but it is a logical complement to the ${var:-value} we saw above.
- ${#variable}
- Return the length of the variable in characters.
Example:
echo The length of your PATH is ${#PATH}
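The substitutions sampled above can be exercised together in one short sketch (the variable name maybe is made up for the example):

```shell
# The ':' expansions plus ${#...} on one made-up variable.
unset maybe
echo "${maybe:-fallback}"   # fallback - default value, maybe stays unset
echo "${maybe:=assigned}"   # assigned - default value, and it sticks
echo "$maybe"               # assigned
echo "${maybe:+override}"   # override - only because maybe is now set
echo "${#maybe}"            # 8        - length of "assigned" in characters
```

Note the one behavioral difference between :- and :=: the first leaves the variable untouched, the second assigns the default to it.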
Bash supports a number of string manipulation operations. Unfortunately, these tools lack a unified focus. Some are a subset of parameter substitution, and others fall under the functionality of the UNIX expr command. This results in inconsistent command syntax and overlap of functionality, not to mention confusion.
Substring Extraction
- expr match "$string" '\($substring\)'
- Extracts $substring at beginning of $string, where $substring is a regular expression.
- expr "$string" : '\($substring\)'
- Extracts $substring at beginning of $string, where $substring is a regular expression.
stringZ=abcABC123ABCabc
#       =======
echo `expr match "$stringZ" '\(.[b-c]*[A-Z]..[0-9]\)'`   # abcABC1
echo `expr "$stringZ" : '\(.[b-c]*[A-Z]..[0-9]\)'`       # abcABC1
echo `expr "$stringZ" : '\(.......\)'`                   # abcABC1
# All of the above forms give an identical result.
- expr match "$string" '.*\($substring\)'
- Extracts $substring at end of $string, where $substring is a regular expression.
- expr "$string" : '.*\($substring\)'
- Extracts $substring at end of $string, where $substring is a regular expression.
stringZ=abcABC123ABCabc
#                ======
echo `expr match "$stringZ" '.*\([A-C][A-C][A-C][a-c]*\)'`   # ABCabc
echo `expr "$stringZ" : '.*\(......\)'`                      # ABCabc
Substring Removal
- ${string#substring}
- Strips shortest match of $substring from front of $string.
- ${string##substring}
- Strips longest match of $substring from front of $string.
stringZ=abcABC123ABCabc
#       |----|
#       |----------|
echo ${stringZ#a*C}    # 123ABCabc
# Strip out shortest match between 'a' and 'C'.
echo ${stringZ##a*C}   # abc
# Strip out longest match between 'a' and 'C'.
- ${string%substring}
- Strips shortest match of $substring from back of $string.
- ${string%%substring}
- Strips longest match of $substring from back of $string.
stringZ=abcABC123ABCabc
#                    ||
#        |------------|
echo ${stringZ%b*c}    # abcABC123ABCa
# Strip out shortest match between 'b' and 'c', from back of $stringZ.
echo ${stringZ%%b*c}   # a
# Strip out longest match between 'b' and 'c', from back of $stringZ.
Example 9-10. Converting graphic file formats, with filename change
#!/bin/bash
# cvt.sh:
# Converts all the MacPaint image files in a directory to "pbm" format.
# Uses the "macptopbm" binary from the "netpbm" package,
#+ which is maintained by Brian Henderson ([email protected]).
# Netpbm is a standard part of most Linux distros.

OPERATION=macptopbm
SUFFIX=pbm        # New filename suffix.

if [ -n "$1" ]
then
  directory=$1    # If directory name given as a script argument...
else
  directory=$PWD  # Otherwise use current working directory.
fi

# Assumes all files in the target directory are MacPaint image files,
#+ with a ".mac" suffix.

for file in $directory/*    # Filename globbing.
do
  filename=${file%.*c}      # Strip ".mac" suffix off filename
                            #+ ('.*c' matches everything
                            #+ between '.' and 'c', inclusive).
  $OPERATION $file > $filename.$SUFFIX
                            # Redirect conversion to new filename.
  rm -f $file               # Delete original files after converting.
  echo "$filename.$SUFFIX"  # Log what is happening to stdout.
done

exit 0
Substring Replacement
- ${string/substring/replacement}
- Replace first match of $substring with $replacement.
- ${string//substring/replacement}
- Replace all matches of $substring with $replacement.
stringZ=abcABC123ABCabc
echo ${stringZ/abc/xyz}    # xyzABC123ABCabc
# Replaces first match of 'abc' with 'xyz'.
echo ${stringZ//abc/xyz}   # xyzABC123ABCxyz
# Replaces all matches of 'abc' with 'xyz'.
- ${string/#substring/replacement}
- If $substring matches front end of $string, substitute $replacement for $substring.
- ${string/%substring/replacement}
- If $substring matches back end of $string, substitute $replacement for $substring.
stringZ=abcABC123ABCabc
echo ${stringZ/#abc/XYZ}   # XYZABC123ABCabc
# Replaces front-end match of 'abc' with 'XYZ'.
echo ${stringZ/%abc/XYZ}   # abcABC123ABCXYZ
# Replaces back-end match of 'abc' with 'XYZ'.
Manipulating strings using awk
A Bash script may invoke the string manipulation facilities of awk as an alternative to using its built-in operations.
Example 9-11. Alternate ways of extracting substrings

#!/bin/bash
# substring-extraction.sh

String=23skidoo1
#      012345678    Bash
#      123456789    awk
# Note different string indexing system:
# Bash numbers first character of string as '0'.
# Awk numbers first character of string as '1'.

echo ${String:2:4}   # position 3 (0-1-2), 4 characters long
                     # skid

# The awk equivalent of ${string:pos:length} is substr(string,pos,length).
echo | awk '
{ print substr("'"${String}"'",3,4)   # skid
}
'
# Piping an empty "echo" to awk gives it dummy input,
#+ and thus makes it unnecessary to supply a filename.

exit 0

For more on string manipulation in scripts, refer to Section 9.3 and the relevant section of the expr command listing. For script examples, see:
# More to the point, thanks to the way ksh works, you can do this:
# make an array, words, local to the current function
typeset -A words
# read a full line
read line
# split the line into words
echo "$line" | read -A words
# Now you can access the line either word-wise or string-wise - useful if you want to, say, check for a command as the Nth parameter,
# but also keep formatting of the other parameters...
Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.
This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links as it develops like a living tree...
Disclaimer:
The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the Softpanorama society. We do not warrant the correctness of the information provided or its fitness for any purpose. The site uses AdSense, so you need to be aware of Google's privacy policy. If you do not want to be tracked by Google, please disable Javascript for this site. This site is perfectly usable without Javascript.
Last modified: May 10, 2021