Softpanorama


String Operations in Shell

Version 2.0 (October 20, 2017)


These notes are partially based on lecture notes by Professor Nikolai Bezroukov at FDU.

String operators allow you to manipulate the contents of a variable without resorting to AWK or Perl. Modern shells such as bash 3.x or ksh93 support most of the standard string manipulation functions, but in a very perverse, idiosyncratic way. Standard functions such as length and substr are available directly, and index can be emulated. Strings can be concatenated by juxtaposition and by using double-quoted strings. You can ensure that variables exist (i.e., are defined and have non-null values), set default values for variables, and catch errors that result from variables not being set. You can also perform basic pattern matching. The basic string operations available in bash, ksh93 and similar shells are covered section by section below.

Introduction

Shell string processing capabilities used to be weak, but in bash 4.x they were improved and are now half-decent. Most "classic" string handling functions such as index, substr, concatenation, trimming, case conversion, translation of one set of symbols into another, etc., are available, directly or indirectly. Regular expressions can now be used for matching a string with the =~ operator, using extended regular expression syntax that is close to what Perl users expect. So if you use bash 4.1+, life is good. Almost ;-)

One interesting idiosyncrasy of Unix shells, including bash, is that many string operators use a curly-bracket syntax that is unique among programming languages. In shell any variable can be written as ${name_of_the_variable} instead of $name_of_the_variable. This notation was initially introduced to protect a variable name from merging with the string that comes after it, but it was later extended to allow string operations on variables. Here is an example in which curly brackets separate the variable $var from the string "_string":

$ export var='test' 
$ echo ${var}_string # var is a variable that uses syntax ${var} with the value test
$ echo $var_string # var_string is a variable that doesn't exist, echo doesn't print anything

In Korn 88 shell this notation was extended to allow expressions inside curly brackets. For example:

${var:=moo}

Each operation is encoded using a special symbol or two symbols (a "digraph", for example :- , := , etc). An argument that the operator may need is positioned after the symbol of the operation. Later this notation was extended in ksh93 and adopted by bash and other shells. Recently it was also extended to case conversion using the ^^ and ,, digraphs.
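As a minimal sketch of the case-conversion operators mentioned above (this assumes bash 4.0 or later; the variable names are made up for illustration):

```shell
#!/bin/bash
# Case conversion inside ${...} (bash 4.0+). All names here are illustrative.
name='john smith'
upper=${name^^}        # every character to upper case
capital=${name^}       # only the first character to upper case
shout='HELLO'
lower=${shout,,}       # every character to lower case
first_low=${shout,}    # only the first character to lower case
echo "$upper | $capital | $lower | $first_low"
```

The doubled symbol converts the whole value, the single symbol only the first character, which mirrors the # / ## and % / %% convention.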

This "ksh-originated" group of operators is the most popular and probably the most widely used group of string-handling operators so it makes sense to learn them, if only in order to be able to modify old scripts.

Recent developments

Bash 3.0 introduced the =~ operator with "normal" extended regular expressions, close to Perl style, which can be used instead in many cases and are definitely preferable in new scripts that you might write. For more about regular expressions see Text processing using regex. They are now a de facto standard and are used in Perl, Python, Java, JavaScript and other modern languages, so good knowledge of them is necessary for any sysadmin.

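Let's say we need to establish whether the variable $ip appears to be a valid IP address. The check is deliberately loose: it only requires four dot-separated groups of one to three digits, each starting with 0, 1 or 2, so a value such as 299.0.0.1 would still pass.

for ip in "255.255.255.255" "10.10.10.10" "400.0.0.0" ; do
   echo "=== testing $ip ==="
   # the trailing $ anchors the match so that extra characters after the fourth group are rejected
   if [[ $ip =~ ^[0-2][0-9]{0,2}\.[0-2][0-9]{0,2}\.[0-2][0-9]{0,2}\.[0-2][0-9]{0,2}$ ]] ; then
       echo "$ip Looks like a valid IP"
   else
      echo "$ip is an invalid IP"
   fi
done

Running this fragment produces:

=== testing 255.255.255.255 ===
255.255.255.255 Looks like a valid IP
=== testing 10.10.10.10 ===
10.10.10.10 Looks like a valid IP
=== testing 400.0.0.0 ===
400.0.0.0 is an invalid IP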

In bash 3.1 a string append operator (+=) was added:

PATH+=":$HOME/bin"   # use $HOME: a tilde inside double quotes is not expanded
echo "$PATH"

In bash 4.2, negative length specifications in the ${var:offset:length} expansion, previously errors, are treated as offsets from the end of the variable.

Bash 4.2 also extended the printf built-in, which now has a %(fmt)T specifier that allows time values to use strftime-like formatting.

Since bash 4.1, printf -v can assign values to array indices.

Since bash 4.1, the read built-in has a `-N nchars' option, which reads exactly NCHARS characters, ignoring delimiters like newline.

Since bash 4.1, the < and > operators of the [[ conditional command do string comparison according to the current locale if the compatibility level is greater than 40.
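The following sketch exercises three of these additions. It assumes bash 4.2 or later (4.1 is enough for the read and array lines); the variable names are invented for the example.

```shell
#!/bin/bash
# printf %(fmt)T: strftime-like formatting; the argument -1 means "current time"
printf -v year '%(%Y)T' -1

# printf -v can target a single array element
declare -a parts
printf -v 'parts[1]' '%03d' 7

# read -N 3 takes exactly three characters and does not stop at the newline
read -r -N 3 chunk <<< $'ab\ncd'

echo "year=$year parts[1]=${parts[1]} chunk has ${#chunk} characters"
```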

Matching pattern substitution

If the pattern is a plain string, then "matching pattern substitution" is a combination of two functions, index and substr: only if the index function succeeds is the substr function applied. If wildcards are used, this is roughly equivalent to the $var =~ s/regex/string/ operation in Perl. See also Search and Replace. Unlike in Perl, the patterns are shell wildcard (glob) patterns rather than full regular expressions.

This notation was introduced in ksh88 and still remains very idiosyncratic. We have four operations: #, ##, % and %%. Two perform search/matching from the left of the string, two from the right.

In examples below we will assume that the variable var has value "this is a test" (as produced by execution of statement export var="this is a test")

The # and % operators have a convenient mnemonic if you use the US keyboard. The # key is on the left side of the $ key and operates from the left, while % is to the right of the $ key and operates from the right.

Here is a test Perl program that illustrates how # operates on this value:

#!/usr/bin/perl
   $var="this is a test";
   $pattern='this';
   if ( ($k=index($var,$pattern))>-1 ) {
      substr($var,$k,length($pattern))='';
   }
   print "var=$var\n";
If executed, it will print
var= is a test
Note the space after the equal sign: the pattern matches up to the letter 's' of "this", so the space is the first unmatched symbol that gets into the result.

Implementation of classic string operations in shell

Despite shell deficiencies in this area and idiosyncrasies preserved from the 1970s, most classic string operations can be implemented in shell. You can define functions that behave almost exactly like those in Perl or other "more normal" languages. In case shell facilities are not enough, you can use AWK or Perl. It's actually sad that AWK was not integrated into shell.

Length Operator

There are several ways to get length of the string.
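A short sketch of three common ways, assuming a POSIX-like system with expr and wc installed (variable names are illustrative). Only the first is a pure shell operation.

```shell
#!/bin/bash
var="this is a test"

len_builtin=${#var}                     # parameter expansion, no external process
len_expr=$(expr "$var" : '.*')          # expr reports how many characters the regex matched
len_wc=$(printf '%s' "$var" | wc -c)    # wc -c counts bytes, not characters
len_wc=$((len_wc))                      # drop the padding some wc versions print

echo "$len_builtin $len_expr $len_wc"
```

For plain ASCII text all three agree; with multibyte characters the byte count from wc -c can be larger than the character count.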

A more complex example: here is a function for validating that a string is within a given maximum length. It requires two parameters, the actual string and the maximum length the string should be.

check_length() 
# check_length 
# to call: check_length string max_length_of_string 
{ 
	# check we have the right params 
	if (( $# != 2 )) ; then 
	   echo "check_length need two parameters: a string and max_length" 
	   return 1 
	fi 
	if (( ${#1} > $2 )) ; then 
	   return 1 
	fi 
	return 0 
} 

You could call the function check_length like this:

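#!/bin/bash
# test_name (assumes check_length is defined above or sourced)
while :
do
  echo -n "Enter customer name :"
  read -r NAME
  # call the function directly; wrapping it in [ ] would test a string, not run it
  check_length "$NAME" 10 && break
  echo "The string $NAME is longer than 10 characters"
done
echo "$NAME"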

Determining the Length of Matching Substring at Beginning of String

This is a rarely used and rather obscure capability of the expr command, called match. Sometimes it can be used to emulate the index function, as bash (shame on the developers) does not have an index function yet, only a substr function, which was implemented relatively recently (see below). The two equivalent forms are:
expr match "$string" "$substring"
expr "$string" : "$substring"

where:

The arguments are converted to strings and the second is considered to be a (basic, the same as used by GNU grep ) regular expression, with a `^' implicitly prepended. The first argument is then matched against this regular expression.

If the match succeeds and REGEX uses `\(' and `\)', the `:' expression returns the part of STRING that matched the sub-expression; otherwise, it returns the number of characters matched.

If the match fails, the `:' operator returns the null string if `\(' and `\)' are used in REGEX, otherwise 0

Only the first `\( ... \)' pair is relevant to the return value; additional pairs are meaningful only for grouping the regular expression operators.

In the regular expression, `\+', `\?', and `\|' are operators which respectively match one or more, zero or one, or separate alternatives. SunOS and other `expr''s treat these as regular characters. (POSIX allows either behavior.)

For example

my_string=abcABC123ABCabc
#         |------|

echo `expr "$my_string" : 'abc[A-Z]*.2'`       # 8
See shell script - OR in `expr match` - Unix & Linux Stack Exchange for some additional info.
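To illustrate the difference between the two return modes described above, here is a small sketch. It assumes an expr that follows the POSIX `:' semantics; the variable names are invented.

```shell
#!/bin/bash
my_string=abcABC123ABCabc

# No \( \) in the regex: expr prints the number of characters matched
count=$(expr "$my_string" : 'abc[A-Z]*')

# With \( \): expr prints the text captured by the first group
digits=$(expr "$my_string" : '[a-zA-Z]*\([0-9]*\)')

echo "count=$count digits=$digits"
```

Because the regex is implicitly anchored at the start of the string, the second call has to consume the leading letters before the group can capture the digits.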

Index

There is no such function. In cases when you use it often and need the exact position of the substring in the source string, you will probably be better off switching to Perl or any other scripting language that you know well.

If you just need to check that the substring occurs in the string, you can also use a regular expression with the =~ operator.

You can use pattern matching to determine whether a particular substring is present in a string (the most typical usage of the index function), but it does not give you its position:

string="abba"
[[ $string == *bb* ]] && echo "bb is contained in string $string"   # wildcard pattern
[[ $string =~ bb ]]   && echo "bb is contained in string $string"   # regular expression

Note that you do not need to quote the variable on the left-hand side inside [[...]]. If the pattern needs a space, you can escape it with a backslash or quote that part of the pattern.

Case statement also will work:

case "$string" in 
  *bb*)
    # Do stuff
    ;;
esac

In the discussion "String contains in Bash" on Stack Overflow, the following solution was ranked highest. It solves the "space inside the string" problem that haunts the previous approaches:

You can use Marcus's answer (* wildcards) outside a case statement, too, if you use double brackets:
string='My long string'
if [[ $string == *"My long"* ]]; then
  echo "It's there!"
fi

The other half-decent way to check whether a string contains a substring is to use grep and the <<< redirection operator. The -b option prints the byte offset of the matching line (with -o, of the match itself), which you can extract from the result, but this is a perversion. If you need the numeric position, you had better switch to Perl or AWK.

# grep is a command, so it is called directly rather than inside [[ ]]
if grep -q -F -- "$substr" <<< "$string" ; then
    echo "Substring $substr occurs in string"
fi

NOTE: The index function that exists in expr is not what you want: its second operand is a set of characters (as in tr), not a string.

The expr index command searches through your first string looking for the first occurrence of any character from your second string. For example, when "Info.out..." is searched for the characters of 'Mozilla', the result is 4, because the 'o' from 'Mozilla' matches the 4th character of "Info.out...".

Here is a test to see what happens. It returns 4, the position of 'd', which is the first character of the string that belongs to the set:

echo `expr index "abcdefghijklmnopqrstuvwxyz" xyzd`
In other words
expr index $string $character_set
returns the numerical position in $string of first character in set of characters defined by $character_set that matches.
stringZ=abcABC123ABCabc
echo `expr index "$stringZ" C12`             # 6
                                             # C position.

echo `expr index "$stringZ" c`              # 3
# 'c' (in #3 position) 

With a one-character set this is a close equivalent of strchr() in C; with a longer set it behaves like strpbrk().
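If the position of a whole substring is needed without leaving the shell, it can be derived from the % family of operators. The function below is a hypothetical helper (str_index is not a bash built-in or a standard utility), offered as a sketch: it strips everything from the first occurrence onward and measures what is left.

```shell
#!/bin/bash
# str_index STRING SUBSTRING   (hypothetical helper name)
# Prints the 0-based offset of the first occurrence, or -1 if there is none.
str_index() {
    local str=$1 sub=$2 prefix
    if [[ $str != *"$sub"* ]]; then
        echo -1
        return 1
    fi
    prefix=${str%%"$sub"*}   # remove the longest tail that begins with the substring
    echo "${#prefix}"        # length of the remaining head is the offset
}

str_index "abcABC123ABCabc" "ABC"
str_index "abcABC123ABCabc" "xyz" || echo "not found"
```

Quoting "$sub" inside the expansion makes the substring literal, so characters such as * or ? in it are not treated as wildcards.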

substr function

Bash provides two implementations of the substr function, which are not identical:

  1. Via the expr command
  2. As a part of the parameter expansion operators, in the form ${param:offset[:length]}.

I recommend using the second one, as this notation is more compact and does not involve calling the external command expr. It also looks more modern, as if inspired by Python, although its origin has nothing to do with Python. It is actually more compact than in Perl.

Implementation via expr function

Classic substr function is available via expr function:

expr substr "$string" "$position" "$length"
Extracts $length characters from $string starting at $position. The first character has index one.
stringZ=abcABC123ABCabc
#       123456789......
#       1-based indexing.

echo `expr substr $stringZ 1 2`              # ab
echo `expr substr $stringZ 4 3`              # ABC


Implementation of substr function using :: notation in bash 4.1+

An idiosyncratic but pretty slick implementation of the substring function is also available in bash 3.x as a part of the parameter expansion operators, in the form

${param:offset[:length]}
Extracts a substring of length characters from the value of param, starting at offset. If length is omitted, everything up to the end of the value is extracted.

NOTE: Starting with bash 4.2, a negative length specification in the ${var:offset:length} expansion, previously an error, is treated as an offset from the end of the variable, like in Perl.

If the parameter is "*" or "@", then this extracts the positional parameters, starting at offset.

In the examples below the same form is written as ${string:position:length}, with 0-based positions:
stringZ=abcABC123ABCabc
#       0123456789.....
#       0-based indexing.

echo ${stringZ:0}                            # abcABC123ABCabc
echo ${stringZ:1}                            # bcABC123ABCabc
echo ${stringZ:7}                            # 23ABCabc

echo ${stringZ:7:3}                          # 23A
                                             # Three characters of substring.
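Negative values deserve a separate sketch, because a negative offset needs extra care in the syntax. The negative length on the third assignment assumes bash 4.2 or later; variable names are illustrative.

```shell
#!/bin/bash
stringZ=abcABC123ABCabc

last3=${stringZ: -3}     # the space is required: ${stringZ:-3} would mean "default value 3"
alt=${stringZ:(-3)}      # parentheses are an alternative to the space
middle=${stringZ:3:-3}   # from offset 3 up to 3 characters before the end

echo "$last3 $alt $middle"
```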

You can also emulate the ancient ksh substr built-in, which trims a pattern from the string rather than extracting by position:
#
# substr -- a function to emulate the ancient ksh built-in
#

#
# -l == shortest from left
# -L == longest from left
# -r == shortest from right (the default)
# -R == longest from right

substr()
{
	local flag pat str
	local usage="usage: substr -lLrR pat string or substr string pat"

	case "$1" in
	-l | -L | -r | -R)
		flag="$1"
		pat="$2"
		shift 2
		;;
	-*)
		echo "substr: unknown option: $1"
		echo "$usage"
		return 1
		;;
	*)
		flag="-r"
		pat="$2"
		;;
	esac

	if [ "$#" -eq 0 ] || [ "$#" -gt 2 ] ; then
		echo "substr: bad argument count"
		return 2
	fi

	str="$1"

	#
	# We don't want -f, but we don't want to turn it back on if
	# we didn't have it already
	#
	case "$-" in
	*f*)	# the pattern must be unquoted, otherwise it never matches
		;;
	*)
		fng=1
		set -f
		;;
	esac

	case "$flag" in
	-l)
		str="${str#$pat}"		# substr -l pat string
		;;
	-L)
		str="${str##$pat}"		# substr -L pat string
		;;
	-r)
		str="${str%$pat}"		# substr -r pat string
		;;
	-R)
		str="${str%%$pat}"		# substr -R pat string
		;;
	*)
		str="${str%$2}"			# substr string pat
		;;
	esac

	echo "$str"

	#
	# If we had file name generation when we started, re-enable it
	#
	if [ "$fng" = "1" ] ; then
		set +f
	fi
}

Somewhat strange nuance: taking a slice of the array of positional parameters using :: notation

If the parameter is "*" or "@", then this is not a substr function. Instead it slices the array of positional parameters, extracting a maximum of $length parameters starting at $position.

echo ${*:2}          # Echoes second and following positional parameters.
echo ${@:2}          # Same as above.

echo ${*:2:3}        # Echoes three positional parameters, starting at second.
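A runnable sketch of the same slicing, wrapped in a function so that it has its own positional parameters. The function name show_slices is invented for the example.

```shell
#!/bin/bash
# show_slices is a hypothetical demo function; its arguments become $1, $2, ...
show_slices() {
    local rest=("${@:2}")       # second parameter and everything after it
    local window=("${@:2:3}")   # at most three parameters, starting at the second
    echo "from 2: ${rest[*]}"
    echo "3 from 2: ${window[*]}"
}

show_slices one two three four five
```

Using the quoted "${@:...}" form keeps each parameter intact even if it contains spaces.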

Search and Replace

You can search and replace substring in a variable using ksh syntax:

alpha='This is a test string in which the word "test" is replaced.' 
beta="${alpha/test/replace}"

The string "beta" now contains an edited version of the original string in which the first case of the word "test" has been replaced by "replace". To replace all cases, not just the first, use this syntax:

beta="${alpha//test/replace}"

Note the double "//" symbol.

Here is an example in which we replace one string with another in a multi-line block of text:

list="cricket frog cat dog"
poem="I wanna be a x\n\
A x is what I'd love to be\n\
If I became a x\n\
How happy I would be.\n"
for critter in $list; do
   echo -e "${poem//x/$critter}"
done
There are several additional capabilities:
${var:pos[:len]}     # extract substring from pos (0-based) for len characters
${var/pat/repl}      # replace the first match of pat
${var//pat/repl}     # replace all matches of pat
${var/#pat/repl}     # replace pat only if it matches at the beginning of the value
${var/%pat/repl}     # replace pat only if it matches at the end of the value
${#var}              # returns length of $var
${!var}              # indirect expansion: value of the variable whose name is stored in var
There are no separate greedy and non-greedy forms of the anchored replacements: pat is a shell pattern, and the longest match is the one replaced.
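A short sketch of the replacement forms on one value, assuming bash; the file name and variable names are made up for the illustration.

```shell
#!/bin/bash
file="test.test.txt"

first=${file/test/prod}       # only the first occurrence is replaced
all=${file//test/prod}        # every occurrence is replaced
at_start=${file/#test/prod}   # replaced because the value starts with "test"
at_end=${file/%txt/bak}       # replaced because the value ends with "txt"
no_match=${file/#txt/bak}     # unchanged: the value does not start with "txt"

echo "$first $all $at_start $at_end $no_match"
```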

Concatenation

In bash 3.1 a string append operator (+=) was added. It is now the preferred solution:

PATH+=":$HOME/bin"   # use $HOME: a tilde inside double quotes is not expanded
echo "$PATH"

Traditionally in shell strings were concatenated by juxtaposition and by using double-quoted strings. For example

PATH="$PATH:/usr/games"

A double-quoted string in shell is almost identical to a double-quoted string in Perl and performs macro expansion of all variables in it. The minor differences are the treatment of escaped characters and of the newline character. If you want backslash escapes such as \n interpreted the way C or Perl does, you can use $'string':

#!/bin/bash

# String expansion. Introduced with version 2 of Bash.

#  Strings of the form $'xxx' have the standard escaped characters interpreted. 

echo $'Ringing bell 3 times \a \a \a'
     # May only ring once with certain terminals.
echo $'Three form feeds \f \f \f'
echo $'10 newlines \n\n\n\n\n\n\n\n\n\n'
echo $'\102\141\163\150'   # Bash
                           # Octal equivalent of characters.

exit 0

Trimming from left and right

Using the wildcard character (?), you can imitate the Perl chop function (which cuts off the last character of the string) quite easily:

test="~/bin/"
trimmed_last=${test%?}
trimmed_first=${test#?}
echo "original='$test', trimmed_first='$trimmed_first', trimmed_last='$trimmed_last'"

The first character of a string can also be obtained with printf:

printf -v char "%c" "$source"
Conditional chopping, like the Perl chomp function or the REXX function trim, can be done using a while loop, for example:
function trim
{
   target=$1
   while : # this is an infinite loop
   do
   case $target in
      ' '*) target=${target#?} ;; ## if $target begins with a space remove it
      *' ') target=${target%?} ;; ## if $target ends with a space remove it
      *) break ;; # no more leading or trailing spaces, so exit the loop
   esac
   done
   printf '%s\n' "$target" # "return" accepts only a numeric status, so print the result
}

A more Perl-style method to trim trailing blanks would be

spaces=${source_var##*[! ]} ## get the trailing blanks in var $spaces
trimmed_var=${source_var%"$spaces"} ## remove them from the end, hence % rather than #
The same trick can be used for removing leading spaces.
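For leading spaces the operators are mirrored: %% isolates the blanks at the front and # removes them. A minimal sketch with invented variable names:

```shell
#!/bin/bash
source_var="   padded value   "

# Strip the longest tail that starts with a non-space: only the leading blanks remain
leading=${source_var%%[! ]*}
no_lead=${source_var#"$leading"}

# Strip the longest head that ends with a non-space: only the trailing blanks remain
trailing=${no_lead##*[! ]}
trimmed=${no_lead%"$trailing"}

echo "[$trimmed]"
```

Inner spaces are untouched, because each step removes only what sits before the first or after the last non-space character.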

Assignment of default value for undefined variables

The operator ${var:-bar} is useful for supplying a default value for a variable.

It works the following way: if $var exists and is not null, it returns $var. If it doesn't exist or is null, it returns bar. This operator does not change the variable $var.

Example:

$ export var=""
$ echo ${var:-one}
one
$ echo $var

More complex example:

sort -nr $1 | head -${2:-10}

A typical usage includes situations when you need to check whether arguments were passed to the script and, if not, assign default values:

#!/bin/bash
# The tilde is left unquoted so that the shell expands it to the user's home directory
export FROM=${1:-~root/.profile}
export TO=${2:-~my/.profile}
cp -p "$FROM" "$TO"

Set a variable if it is not defined with the operator ${var:=bar}

It works as follows: if $var exists and is not null, it returns $var. If it doesn't exist or is null, it sets $var to bar and returns bar.

Example:

$ export var=""
$ echo ${var:=one}
one
$ echo $var
one
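The two operators are easy to confuse, so here is a side-by-side sketch. It also uses the related :+ and :? forms, which are standard shell operators; the variable names are made up.

```shell
#!/bin/bash
unset color missing

a=${color:-red}      # substitutes red, but color itself stays unset
b=${color:+set}      # :+ substitutes only when the variable is non-empty, so b is empty
c=${color:=blue}     # substitutes blue AND assigns it to color
d=${color:+set}      # color is non-empty now, so d is "set"

echo "a=$a b=$b c=$c d=$d color=$color"

# ${var:?message} aborts a non-interactive script if var is unset or empty,
# so it is tried inside a subshell here
( : "${missing:?must be supplied}" ) 2>/dev/null || echo "missing is not set"
```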

String comparison via double square bracket conditionals

The [[ ]] construct was introduced in ksh88 as a way to compensate for multiple shortcomings and limitations of the [ ] (test) solution. Essentially it makes the [ ] construct obsolete except for running a program to get a return code. In turn, the double round brackets ((..)) construct made the [[ ]] construct obsolete for integer comparisons.

There are two types of operators that can be used inside double square bracket construct:

Paradoxically integer comparison operators are represented as strings ( -eq, -ne, -gt, etc) while string comparison operators as delimiters ("=", "==", "!=", "<", ">", etc).

The [[ ]] construct expects an expression. Word splitting and wildcard (filename) expansion aren't done on variables within [[ and ]], which makes quoting less necessary; variable substitution itself is still performed. The right-hand side of == and =~ is the exception in bash: quote it when you want it taken literally rather than as a pattern :-).

It can act as independent operator as it produces return code. So constructs like

[[ $string =~ [aeiou] ]] && exit;

are legitimate and actually pretty compact way to write if statements without else clause that contain a single statement in then block.

One of the warts of the [[ ]] construct is that it redefined == as a pattern matching operation, which anybody who has programmed in C/C++/Java strongly resents. Later bash versions added the Perl-style =~ operator for regular expression matching (I think ksh93 allows that too):

string="abba"
[[ $string =~ [aeiou] ]]
echo $?
0

[[ $string =~ h[sdfghjkl] ]]
echo $?
1

Like the [ ] construct, the [[ ]] construct can be used as a separate statement that returns an exit status depending upon whether the condition is true or not. With the && and || constructs discussed above, this provides an alternative syntax for if-then and if-else constructs:

if [[ -d $HOME/$user ]] ; then echo " Home for user $user exists..."; fi

can be written simpler as

[[ -d $HOME/$user ]] && echo " Home for user $user exists..."

There are several types of expressions that can be used inside [[ ... ]] construct:

One unpleasant reality (and probably the most common gotcha) of using the legacy [...] integer comparison constructs is that if one of the variables is not initialized, it produces a syntax error. Various tricks are used to avoid this nasty macro substitution side effect, a legacy of the extremely weak implementation of comparisons in the Bourne shell (there are way too many crazy things implemented in the Bourne shell, anyway ;-).

There are two classic tricks to deal with this gotcha in the old [..] construct, as you will be dealing with scripts infested with those old constructs pretty often. They can be, and often are, used simultaneously.

Generally, it is better to initialize most variables explicitly. I know it is difficult as old habits die slowly, but this can be done. Here are the most common "legacy integer comparison operators":

-eq
is equal to

if [[ "$a" -eq "$b" ]]

-ne
is not equal to

if [[ "$a" -ne "$b" ]]

-gt
is greater than

if [[ "$a" -gt "$b" ]]

-ge
is greater than or equal to

if [[ "$a" -ge "$b" ]]

-lt
is less than

if [[ "$a" -lt "$b" ]]

-le
is less than or equal to

if [[ "$a" -le "$b" ]]

String comparisons

String comparison is about all that is left useful in this construct: for integer comparisons the ((..)) construct is better, and for file tests the older [...] construct is just as good.

Operator            True if...
str = pat           str matches pat.
str == pat          str matches pat. Note that in the case of "==" that's not what you logically expect if you have some experience with C/C++/Java programming!
str != pat          str does not match pat.
str1 < str2         str1 sorts before str2 in the collation order used.
str1 > str2         str1 sorts after str2 in the collation order used.
-n str              str is not null (has length greater than 0).
-z str              str is null (has length 0).
file1 -ef file2     file1 is another name for file2 (hard or symbolic link).
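The pattern-matching behaviour of == is the row most likely to surprise, so here is a small sketch of it, together with the < and emptiness tests. It assumes bash; the r1..r4 result variables are invented for the example.

```shell
#!/bin/bash
str="file.txt"

[[ $str == *.txt ]]    && r1=match || r1=nomatch   # unquoted right side: wildcard pattern
[[ $str == "*.txt" ]]  && r2=match || r2=nomatch   # quoted right side: literal comparison
[[ apple < banana ]]   && r3=less  || r3=notless   # inside [[ ]] the < is not a redirection
[[ -z "" && -n $str ]] && r4=ok    || r4=bad       # empty and non-empty string tests

echo "$r1 $r2 $r3 $r4"
```

Quoting the right-hand side is therefore the way to get the plain equality test that C or Java programmers expect.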

While we're cleaning up code we wrote in the last chapter, let's fix up the error handling in the highest script. The code for that script is:

filename=${1:?"filename missing."}
howmany=${2:-10}
sort -nr $filename | head -$howmany

Recall that if you omit the first argument (the filename), the shell prints the message highest: 1: filename missing. We can make this better by substituting a more standard "usage" message (print is the ksh built-in; in bash use echo):

if [[ -z $1 ]]; then
    print 'usage: highest filename [-N]'
else
    filename=$1
    howmany=${2:-10}
    sort -nr $filename | head -$howmany
fi

It is considered better programming style to enclose all of the code in the if-then-else, but such code can get confusing if you are writing a long script in which you need to check for errors and bail out at several points along the way. Therefore, a more usual style for shell programming is this:

if [[ -z $1 ]]; then
    print 'usage: highest filename [-N]'
    exit 1    # in a script use exit; return is valid only inside a function
fi
filename=$1
howmany=${2:-10}
sort -nr $filename | head -$howmany

Pattern Matching

There are two types of pattern matching in shell: Perl-style regular expressions, used with the =~ operator, and the older ksh-style patterns.

Unless you need to modify old scripts, it does not make sense to use old ksh-style regex in bash.

Perl-style regular expressions

(partially borrowed from Bash Regular Expressions | Linux Journal)

Since version 3 of bash (released in 2004), bash implements extended regular expressions, whose syntax largely overlaps with Perl regex. They are also called POSIX regular expressions, as they are defined in IEEE POSIX 1003.2 (which you should read and understand to use the full power provided). Extended regular expressions are also used in egrep, so they are well known to system administrators. Please note that Perl regular expressions are roughly extended regular expressions with a number of additional features, which are not available in bash.

Predefined Character Classes

Extended regular expressions support a set of predefined character classes. When used between brackets, these define commonly used sets of characters. The POSIX character classes implemented in extended regular expressions include [:alnum:], [:alpha:], [:digit:], [:lower:], [:upper:], [:space:], [:punct:] and [:xdigit:].

NOTE: I have problems with GNU bash, version 3.2.25(1)-release (x86_64-redhat-linux-gnu) using those extended classes. It does accept them, but it does not match correctly.
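A common source of trouble with these classes is forgetting that the class name must itself sit inside a bracket expression, i.e. [[:digit:]] rather than [:digit:]. The sketch below assumes bash 3.0 or later with a POSIX-conforming regex library; classify is a hypothetical helper name.

```shell
#!/bin/bash
# classify is an invented demo function: it reports which class a string falls into
classify() {
    local s=$1
    if   [[ $s =~ ^[[:digit:]]+$ ]];  then echo digits
    elif [[ $s =~ ^[[:alpha:]]+$ ]];  then echo letters
    elif [[ $s =~ ^[[:alnum:]_]+$ ]]; then echo identifier
    elif [[ $s =~ [[:space:]] ]];     then echo has-whitespace
    else echo other
    fi
}

for s in 2017 bash var_1 "two words" '!?'; do
    printf '%-10s %s\n' "$s" "$(classify "$s")"
done
```

The outer brackets form the bracket expression and the inner [:name:] selects the class, so other characters (such as the underscore above) can be added next to it.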

Modifiers are similar to Perl

Extended regex    Perl regex
a+                a+
a?                a?
a|b               a|b
(expression1)     (expression1)
{m,n}             {m,n}
{,n}              {,n}
{m,}              {m,}
{m}               {m}

The =~ operator returns 0 (success) if the regular expression matches the string; otherwise it returns 1 (failure).

In addition to doing simple matching, bash regular expressions support sub-patterns surrounded by parenthesis for capturing parts of the match. The matches are assigned to an array variable BASH_REMATCH. The entire match is assigned to BASH_REMATCH[0], the first sub-pattern is assigned to BASH_REMATCH[1], etc..

The following example script takes a regular expression as its first argument and one or more strings to match against. It then cycles through the strings and outputs the results of the match process:

#!/bin/bash

if [[ $# -lt 2 ]]; then
    echo "Usage: $0 PATTERN STRINGS..."
    exit 1
fi
regex=$1
shift
echo "regex: $regex"
echo

while [[ $1 ]]
do
    if [[ $1 =~ $regex ]]; then
        echo "$1 matches"
        i=1
        n=${#BASH_REMATCH[*]}
        while [[ $i -lt $n ]]
        do
            echo "  capture[$i]: ${BASH_REMATCH[$i]}"
            let i++
        done
    else
        echo "$1 does not match"
    fi
    shift
done

Assuming the script is saved in "bashre.sh", the following sample shows its output:

  # bash bashre.sh 'aa(b{2,3}[xyz])cc' aabbxcc aabbcc
  regex: aa(b{2,3}[xyz])cc

  aabbxcc matches
    capture[1]: bbx
  aabbcc does not match

Old KSH Pattern-matching Operators

Pattern-matching operators were introduced in ksh88 in a very idiosyncratic way. The notation is different from used by Perl or utilities such as grep. That's a shame, but that's how it is. Life is not perfect. They are hard to remember, but there is a handy mnemonic tip: # matches the front because number signs precede numbers; % matches the rear because percent signs follow numbers.

There are two kinds of pattern matching available: matching from the left and matching from the right.

The operators, with their functions and an example, are shown in the following table (note that on keyboard symbol "#" is to the left and symbol "%" is to the right of dollar sign; that might help to memorize them):

In each case the pattern must match at the beginning (# and ##) or at the end (% and %%) of the variable's value; the matching part is deleted and the rest is returned. The examples assume export var="this is a test".

Operator        Deletes                                     Result of echo "${var...}"
${var#t*is}     the shortest possible match from the left   " is a test"
${var##t*is}    the longest possible match from the left    " a test"
${var%t*st}     the shortest possible match from the right  "this is a "
${var%%t*st}    the longest possible match from the right   "" (empty string)

NOTE: While the # and % identifiers may not seem obvious, they have a convenient mnemonic. The # key is on the left side of the $ key on the keyboard and operates from the left. The % key is on the right of the $ key and operates from the right.

These operators can be used to cut a string from both the right and the left and extract the necessary part. In the example below this is done with the output of uptime:

cores=`grep processor /proc/cpuinfo | wc -l`
cpuload=`uptime`

cpuload=${cpuload#*average: }
cpuload=${cpuload%%.*}
if (( cpuload > cores + 1  )) ;  then 
   echo "Server $HOSTNAME overloaded: $cpuload on $cores cores"
fi    

For example, the following script changes the extension of all .html files to .htm.

#!/bin/bash
# quickly convert html filenames for use on a dossy system
# only handles file extensions, not filenames

for i in *.html; do
  # quotes keep file names with spaces in one piece
  if [ -f "${i%l}" ]; then
    echo "${i%l} already exists"
  else
    mv "$i" "${i%l}"
  fi
done

The classic use for pattern-matching operators is stripping off components of pathnames, such as directory prefixes and filename suffixes. With that in mind, here is an example that shows how all of the operators work. Assume that the variable path has the value /home/billr/mem/long.file.name; then:

Expression         	  Result
${path##/*/}                       long.file.name
${path#/*/}              billr/mem/long.file.name
$path              /home/billr/mem/long.file.name
${path%.*}         /home/billr/mem/long.file
${path%%.*}        /home/billr/mem/long

Operator #: ${var#t*is} deletes the shortest possible match from the left

Example:

$ export var="this is a test"
$ echo ${var#t*is}
is a test

Operator ##: ${var##t*is} deletes the longest possible match from the left

Example:

$ export var="this is a test"
$ echo ${var##t*is}
a test

Operator %: ${var%t*st} deletes the shortest possible match from the right

Example:

$ export var="this is a test" 
$ echo ${var%t*st} 
this is a

Operator %%: ${var%%t*st} deletes the longest possible match from the right

Example:

$ export var="this is a test" 
$ echo ${var%%t*st}

(An empty line is printed, because the pattern matches the whole string.)

Ksh-style regular expressions

KSH regular expressions are now obsolete. Please use regular expressions with [[ variable =~ regex ]] instead (available since bash 3.0).

They use an idiosyncratic prefix-based notation that is difficult to learn after you have got used to the regular suffix-based notation. In other words, each "quantity metasymbol" or quantifier is used as a prefix of the string, not as a suffix as everywhere else in Unix (as in ls resolv*, not ls *(resolv)).

This was a huge blunder committed by David Korn, and it was never corrected. In any case, attempts to use this notation look pretty perverse and it should be abandoned for good. The notes below are just for those unfortunate people who need to understand somebody else's scripts that use this notation. The information below was extracted from Learning the Korn Shell, 2nd Edition, 4.5. String Operators. I never verified its correctness.

Each such operator has the form x(exp), where x is the particular operator and exp is any regular expression (often simply a regular string). The operator determines how many occurrences of exp a string that matches the pattern can contain.

Operator            Meaning
*(exp)              0 or more occurrences of exp
+(exp)              1 or more occurrences of exp
?(exp)              0 or 1 occurrences of exp
@(exp1|exp2|...)    exp1 or exp2 or...
!(exp)              Anything that doesn't match exp

Expression          Matches
x                   x
*(x)                Null string, x, xx, xxx, ...
+(x)                x, xx, xxx, ...
?(x)                Null string, x
!(x)                Any string except x
@(x)                x (see below)

The following section compares Korn shell regular expressions to analogous features in awk and egrep. If you aren't familiar with these, skip to the section entitled "Pattern-matching Operators."

shell basic regex vs. awk/egrep regular expressions

Shell               egrep/awk           Meaning
*(exp)              exp*                0 or more occurrences of exp
+(exp)              exp+                1 or more occurrences of exp
?(exp)              exp?                0 or 1 occurrences of exp
@(exp1|exp2|...)    exp1|exp2|...       exp1 or exp2 or...
!(exp)              (none)              Anything that doesn't match exp

These equivalents are close but not quite exact. Actually, an exp within any of the Korn shell operators can be a series of exp1|exp2|... alternates. But because the shell would interpret an expression like dave|fred|bob as a pipeline of commands, you must use @(dave|fred|bob) for alternates.

For example, @(dave|fred|bob) matches dave, fred, or bob.

It is worth re-emphasizing that shell regular expressions can still contain standard shell wildcards. Thus, the shell wildcard ? (match any single character) is the equivalent to . in egrep or awk, and the shell's character set operator [...] is the same as in those utilities. For example, the expression +([0-9]) matches a number, i.e., one or more digits. The shell wildcard character * is equivalent to the shell regular expression *(?).
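
For readers who meet this notation in bash rather than ksh, here is a minimal sketch. It assumes bash with the extglob option turned on explicitly; the helper names is_number and is_known_user are hypothetical and exist only for this illustration.

```shell
#!/bin/bash
shopt -s extglob   # enable ksh-style pattern operators in bash

# Hypothetical helper: succeeds if $1 consists of one or more digits
is_number() {
    [[ $1 == +([0-9]) ]]
}

# Hypothetical helper: alternation has to be wrapped in @(...)
is_known_user() {
    [[ $1 == @(dave|fred|bob) ]]
}

is_number 12345 && echo "12345 is a number"
is_number 12a45 || echo "12a45 is not a number"
is_known_user fred && echo "fred is a known user"
```

The pattern must stay unquoted on the right-hand side of ==; a quoted pattern is compared as a literal string.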

A few egrep and awk regex operators do not have equivalents in the Korn shell. These include:

The first two pairs are hardly necessary, since the Korn shell doesn't normally operate on text files and does parse strings into words itself.

Conversion of strings to lower or upper case

Conversion of strings to lower and upper case was a weak point of the shell for a long time. Before the recent enhancements described below, the "standard" way to convert a string was to use the tr utility:

$ a='Hi all'
$ a=$(tr '[:upper:]' '[:lower:]' <<< "$a")
$ echo "$a"
hi all

Please note that this is a better solution than

a=$(tr '[A-Z]' '[a-z]' <<< "$a")

Using A-Z assumes that the text is ASCII, which may or may not be the case. Note that tr '[A-Z]' '[a-z]' is not guaranteed to work outside the C/POSIX locale. For example, in the en_US locale, A-Z may actually be the interval AaBbCcDdEeFfGgHh...XxYyZ.

In more complex cases Perl or AWK should be used instead.

If you need to work with very old versions of bash and do not have access to tr, sed, awk or Perl (poor you ;-) you can create two (or more) functions for this purpose. See Converting string to lower case in Bash - Stack Overflow for inspiration. Here is an example from this thread (it relies on substring expansion and the C-style for loop, so it will not run on bash 1.x):

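lcs="abcdefghijklmnopqrstuvwxyz"
ucs="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
input="Change Me To All Capitals"
# For every character of the input, look it up in the lower-case alphabet
for (( i=0; i<${#input}; i++ )); do
    for (( j=0; j<${#lcs}; j++ )); do
        if [[ "${input:$i:1}" == "${lcs:$j:1}" ]]; then
            # Replace the first remaining occurrence with its upper-case twin
            input="${input/${input:$i:1}/${ucs:$j:1}}"
        fi
    done
done
echo "$input"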

Using typeset/declare statement for converting (or more correctly casting) string to lower or upper case

The typeset keyword (in bash, declare and typeset are synonyms) is usually used for specifying the integer type or creating local variables in shell. But its functionality was extended in bash 4.x, and now it allows you to perform two really useful string operations: typeset -l converts the value to lower case on assignment, and typeset -u converts it to upper case on assignment.

From the moment you set one of these options, the string is cast to the specified case on every assignment. This stays in effect until you explicitly turn it off. You can turn off a typeset option by typing typeset +o var, where o is the option you turned on before.

In Korn shell there are two additional useful options (-L and -R) that also allow trimming a string to a fixed length and removing leading blanks. An obvious application for those options is one in which you need fixed-width output.

Here is a simple example taken from Learning the Korn Shell, [Chapter 6] 6.3 Arrays.

Assume that the variable alpha is assigned the letters of the alphabet, in alternating case, surrounded by three blanks on each side:

alpha="   aBcDeFgHiJkLmNoPqRsTuVwXyZ   "

Table 6.6 shows some typeset statements and their resulting values (assuming that each of the statements is run "independently").

Table 6.6: Examples of typeset String Formatting Options

Statement                   Value of v
typeset -L v=$alpha         "aBcDeFgHiJkLmNoPqRsTuVwXyZ "
typeset -L10 v=$alpha       "aBcDeFgHiJ"
typeset -R v=$alpha         " aBcDeFgHiJkLmNoPqRsTuVwXyZ"
typeset -R16 v=$alpha       "kLmNoPqRsTuVwXyZ"
typeset -l v=$alpha         " abcdefghijklmnopqrstuvwxyz"
typeset -uR5 v=$alpha       "VWXYZ"
typeset -Z8 v="123.50"      "00123.50"
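
Bash's declare has no -L or -R options, so fixed-width output there has to be built by hand. The following is only a rough sketch of two rows of the table, assuming bash 3.1 or later (for printf -v); the names ltrim, rtrim, left10 and right16 are made up for this example.

```shell
#!/bin/bash
alpha="   aBcDeFgHiJkLmNoPqRsTuVwXyZ   "

# Roughly typeset -L10: drop leading blanks, then pad or truncate to 10 columns
ltrim=${alpha#"${alpha%%[! ]*}"}
printf -v left10 '%-10.10s' "$ltrim"

# Roughly typeset -R16: drop trailing blanks, then keep the last 16 characters
# (unlike ksh, no padding is added here if the value is shorter)
rtrim=${alpha%"${alpha##*[! ]}"}
right16=${rtrim: -16}

echo "[$left10] [$right16]"
```

The space before -16 is required; without it bash reads the expression as the ${var:-default} operator.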

More examples

$ a="A Few Words"
$ declare -l a="$a"
$ echo "$a"
a few words
$ a="A Few Words"
$ declare -u a="$a"
$ echo "$a"
A FEW WORDS

The declaration will work for subsequent assignments too: every string will be forcefully converted to the specified case on assignment. While in many cases very convenient, it can sometimes mangle your strings if you forget that the declaration stays in force until explicitly removed. If such behaviour is not what you want, remove the flag from the variable by using declare with +l or +u.
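
A small sketch of that side effect and of how the attribute is removed, assuming bash 4.0 or later; the variable names are arbitrary.

```shell
#!/bin/bash
declare -l name        # every later assignment is folded to lower case
name="Mixed Case"
first=$name            # holds "mixed case"

declare +l name        # drop the attribute; the stored value is not changed
name="Mixed Case"
second=$name           # holds "Mixed Case"

echo "$first / $second"
```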

Converting string to lower case or upper case using ^^ and ,, operators

Another way to convert a string to lower case in bash 4.x, which does not have such a side effect, is the ,, operator:

$ a=${a,,}

Similarly, to convert a string to upper case you can use ^^:

$ a=${a^^}

You can also toggle the case of the first character using ~ (probably inspired by vi):

$ echo ${a~}

There are also several other options.
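
Two of those other forms are the single ^ and single , operators, which change only the first character. A minimal sketch, assuming bash 4.0 or later; the variable names are chosen only for this example.

```shell
#!/bin/bash
a="a Few WORDS"

lower=${a,,}      # all characters to lower case
upper=${a^^}      # all characters to upper case
capital=${a^}     # first character only to upper case
uncap=${upper,}   # first character only to lower case

printf '%s\n' "$lower" "$upper" "$capital" "$uncap"
```

Because these are expansions rather than attributes, the variable a itself is left unchanged.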

Using distance between characters in ASCII character set for conversion

A more sophisticated version, which works only with ASCII characters, uses the fact that the distance between lower-case and upper-case characters is the same for all letters of the alphabet (but not for _ or any other special character). In this case you can work with the numeric code of each letter, which allows you to implement more complex variants of conversion (for example, with partial transliteration of symbols):

97 - 65 = 32

And this is the working version with examples.
Please note the comments in the code, as they explain a lot of stuff:

#!/bin/bash

# lowerupper.sh

# Prints the lowercase version of a char
lowercaseChar(){
    case "$1" in
        [A-Z])
            n=$(printf "%d" "'$1")
            n=$((n+32))
            printf \\$(printf "%o" "$n")
            ;;
        *)
            printf "%s" "$1"
            ;;
    esac
}

# Prints the lowercase version of a sequence of strings
lowercase() {
    word="$@"
    for((i=0;i<${#word};i++)); do
        ch="${word:$i:1}"
        lowercaseChar "$ch"
    done
}

# Prints the uppercase version of a char
uppercaseChar(){
    case "$1" in
        [a-z])
            n=$(printf "%d" "'$1")
            n=$((n-32))
            printf \\$(printf "%o" "$n")
            ;;
        *)
            printf "%s" "$1"
            ;;
    esac
}

# Prints the uppercase version of a sequence of strings
uppercase() {
    word="$@"
    for((i=0;i<${#word};i++)); do
        ch="${word:$i:1}"
        uppercaseChar "$ch"
    done
}

# The functions will not add a new line, so use echo or
# append it if you want a new line after printing

# Printing stuff directly
lowercase "I AM the Walrus!"$'\n'
uppercase "I AM the Walrus!"$'\n'

echo "----------"

# Printing a var
str="A StRing WITH mixed sTUFF!"
lowercase "$str"$'\n'
uppercase "$str"$'\n'

echo "----------"

# Not quoting the var should also work, 
# since we use "$@" inside the functions
lowercase $str$'\n'
uppercase $str$'\n'

echo "----------"

# Assigning to a var
myLowerVar="$(lowercase $str)"
myUpperVar="$(uppercase $str)"
echo "myLowerVar: $myLowerVar"
echo "myUpperVar: $myUpperVar"

echo "----------"

# You can even do stuff like
if [[ 'option 2' = "$(lowercase 'OPTION 2')" ]]; then
    echo "Fine! All the same!"
else
    echo "Ops! Not the same!"
fi

exit 0

And the results after running this:

$ ./lowerupper.sh 
i am the walrus!
I AM THE WALRUS!
----------
a string with mixed stuff!
A STRING WITH MIXED STUFF!
----------
a string with mixed stuff!
A STRING WITH MIXED STUFF!
----------
myLowerVar: a string with mixed stuff!
myUpperVar: A STRING WITH MIXED STUFF!
----------
Fine! All the same!

But this is mostly "art for the sake of art". Moreover, it will not work in the Bourne shell, as the Bourne shell does not support ${1:$i:1}. But this solution is portable between all versions of bash in use (the C-style for loop was introduced in bash 2.04, I think).

Here strings

Here strings are a special case of here documents, which are a form of input redirection (from Wikipedia):

In the following example, text is passed to the tr command (transliterating lower to upper-case) using a here document. This could be in a shell file, or entered interactively at a prompt.
$ tr a-z A-Z << END_TEXT
> one two three
> four five six
> END_TEXT
ONE TWO THREE
FOUR FIVE SIX

END_TEXT was used as the delimiting identifier. It specified the start and end of the here document. The redirect and the delimiting identifier do not need to be separated by a space: <<END_TEXT or << END_TEXT both work equally well.

Appending a minus sign to the << has the effect that leading tabs are ignored. This allows indenting here documents in shell scripts (primarily for alignment with existing indentation) without changing their value:

$ tr a-z A-Z <<- END_TEXT
>       one two three
>       four five six
>       END_TEXT
ONE TWO THREE
FOUR FIVE SIX

This yields the same output, notably not indented.

By default, behavior is largely identical to the contents of double quotes: variables are interpolated, commands in backticks are evaluated, etc.

$ cat << EOF
> \$ Working dir "$PWD" `pwd`
> EOF
$ Working dir "/home/user" /home/user

This can be disabled by quoting any part of the label, which is then ended by the unquoted value; the behavior is essentially identical to that if the contents were enclosed in single quotes. Thus for example by setting it in single quotes:

$ cat << 'EOF'
> \$ Working dir "$PWD" `pwd`
> EOF
\$ Working dir "$PWD" `pwd`

Double quotes may also be used, but this is subject to confusion, because expansion does occur in a double-quoted string, but does not occur in a here document with double-quoted delimiter. Single- and double-quoted delimiters are distinguished in some other languages, notably Perl, where behavior parallels the corresponding string quoting.

Here strings

A here string (available in Bash, ksh, or zsh) is syntactically similar, consisting of <<<, and effects input redirection from a word (a sequence treated as a unit by the shell, in this context generally a string literal). In this case the usual shell syntax is used for the word ("here string syntax"), with the only syntax being the redirection: a here string is an ordinary string used for input redirection, not a special kind of string.

The <<< operator was introduced in bash 2.05b.

A single word need not be quoted:

$ tr a-z A-Z <<< one
ONE

In case of a string with spaces, it must be quoted:

$ tr a-z A-Z <<< 'one two three'
ONE TWO THREE

This could also be written as:

$ FOO='one two three'
$ tr a-z A-Z <<< $FOO
ONE TWO THREE

Multiline strings are acceptable, yielding:

$ tr a-z A-Z <<< 'one
> two three'
ONE
TWO THREE

Note that leading and trailing newlines, if present, are included:

$ tr a-z A-Z <<< '
> one
> two three
> '

ONE
TWO THREE

$

The key difference from here documents is that, in here documents, the delimiters are on separate lines; the leading and trailing newlines are stripped. Here, the terminating delimiter can be specified.

Here strings are particularly useful for commands that often take short input, such as the calculator bc:

$ bc <<< 2^10
1024

Note that here string behavior can also be accomplished (reversing the order) via piping and the echo command, as in:

$ echo 'one two three' | tr a-z A-Z
ONE TWO THREE

however here strings are particularly useful when the last command needs to run in the current process, as is the case with the read builtin:

$ echo 'one two three' | read a b c
$ echo $a $b $c

yields nothing, while

$ read a b c <<< 'one two three'
$ echo $a $b $c
one two three

This happens because in the previous example piping causes read to run in a subprocess, and as such can not affect the environment of the parent process.


Old News :-)

Please visit Heiner Steven's SHELLdorado, the best shell scripting site on the Internet

[Nov 08, 2018] How to split one string into multiple variables in bash shell? [duplicate]

Nov 08, 2018 | stackoverflow.com
Rob I , May 9, 2012 at 19:22

For your second question, see @mkb's comment to my answer below - that's definitely the way to go! – Rob I May 9 '12 at 19:22

Dennis Williamson , Jul 4, 2012 at 16:14

See my edited answer for one way to read individual characters into an array. – Dennis Williamson Jul 4 '12 at 16:14

Nick Weedon , Dec 31, 2015 at 11:04

Here is the same thing in a more concise form: var1=$(cut -f1 -d- <<<$STR) – Nick Weedon Dec 31 '15 at 11:04

Rob I , May 9, 2012 at 17:00

If your solution doesn't have to be general, i.e. only needs to work for strings like your example, you could do:
var1=$(echo $STR | cut -f1 -d-)
var2=$(echo $STR | cut -f2 -d-)

I chose cut here because you could simply extend the code for a few more variables...

crunchybutternut , May 9, 2012 at 17:40

Can you look at my post again and see if you have a solution for the followup question? thanks! – crunchybutternut May 9 '12 at 17:40

mkb , May 9, 2012 at 17:59

You can use cut to cut characters too! cut -c1 for example. – mkb May 9 '12 at 17:59

FSp , Nov 27, 2012 at 10:26

Although this is very simple to read and write, is a very slow solution because forces you to read twice the same data ($STR) ... if you care of your script performace, the @anubhava solution is much better – FSp Nov 27 '12 at 10:26

tripleee , Jan 25, 2016 at 6:47

Apart from being an ugly last-resort solution, this has a bug: You should absolutely use double quotes in echo "$STR" unless you specifically want the shell to expand any wildcards in the string as a side effect. See also stackoverflow.com/questions/10067266/ – tripleee Jan 25 '16 at 6:47

Rob I , Feb 10, 2016 at 13:57

You're right about double quotes of course, though I did point out this solution wasn't general. However I think your assessment is a bit unfair - for some people this solution may be more readable (and hence extensible etc) than some others, and doesn't completely rely on arcane bash feature that wouldn't translate to other shells. I suspect that's why my solution, though less elegant, continues to get votes periodically... – Rob I Feb 10 '16 at 13:57

Dennis Williamson , May 10, 2012 at 3:14

read with IFS are perfect for this:
$ IFS=- read var1 var2 <<< ABCDE-123456
$ echo "$var1"
ABCDE
$ echo "$var2"
123456

Edit:

Here is how you can read each individual character into array elements:

$ read -a foo <<<"$(echo "ABCDE-123456" | sed 's/./& /g')"

Dump the array:

$ declare -p foo
declare -a foo='([0]="A" [1]="B" [2]="C" [3]="D" [4]="E" [5]="-" [6]="1" [7]="2" [8]="3" [9]="4" [10]="5" [11]="6")'

If there are spaces in the string:

$ IFS=$'\v' read -a foo <<<"$(echo "ABCDE 123456" | sed 's/./&\v/g')"
$ declare -p foo
declare -a foo='([0]="A" [1]="B" [2]="C" [3]="D" [4]="E" [5]=" " [6]="1" [7]="2" [8]="3" [9]="4" [10]="5" [11]="6")'

insecure , Apr 30, 2014 at 7:51

Great, the elegant bash-only way, without unnecessary forks. – insecure Apr 30 '14 at 7:51

Martin Serrano , Jan 11 at 4:34

this solution also has the benefit that if delimiter is not present, the var2 will be empty – Martin Serrano Jan 11 at 4:34

mkb , May 9, 2012 at 17:02

If you know it's going to be just two fields, you can skip the extra subprocesses like this:
var1=${STR%-*}
var2=${STR#*-}

What does this do? ${STR%-*} deletes the shortest substring of $STR that matches the pattern -* starting from the end of the string. ${STR#*-} does the same, but with the *- pattern and starting from the beginning of the string. They each have counterparts %% and ## which find the longest anchored pattern match. If anyone has a helpful mnemonic to remember which does which, let me know! I always have to try both to remember.

Jens , Jan 30, 2015 at 15:17

Plus 1 For knowing your POSIX shell features, avoiding expensive forks and pipes, and the absence of bashisms. – Jens Jan 30 '15 at 15:17

Steven Lu , May 1, 2015 at 20:19

Dunno about "absence of bashisms" considering that this is already moderately cryptic .... if your delimiter is a newline instead of a hyphen, then it becomes even more cryptic. On the other hand, it works with newlines , so there's that. – Steven Lu May 1 '15 at 20:19

mkb , Mar 9, 2016 at 17:30

@KErlandsson: done – mkb Mar 9 '16 at 17:30

mombip , Aug 9, 2016 at 15:58

I've finally found documentation for it: Shell-Parameter-Expansion – mombip Aug 9 '16 at 15:58

DS. , Jan 13, 2017 at 19:56

Mnemonic: "#" is to the left of "%" on a standard keyboard, so "#" removes a prefix (on the left), and "%" removes a suffix (on the right). – DS. Jan 13 '17 at 19:56

tripleee , May 9, 2012 at 17:57

Sounds like a job for set with a custom IFS .
IFS=-
set $STR
var1=$1
var2=$2

(You will want to do this in a function with a local IFS so you don't mess up other parts of your script where you require IFS to be what you expect.)
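
The advice in the parenthesis above can be sketched like this. It assumes bash (for local), and the function name split_on_dash and the choice to return the fields through the global variables var1 and var2 are hypothetical, made up for this illustration.

```shell
#!/bin/bash
# Hypothetical helper: split "$1" on "-" into the globals var1 and var2
split_on_dash() {
    local IFS=-        # the change is confined to this function
    set -f             # disable globbing so * or ? in the data is not expanded
    set -- $1          # unquoted on purpose: word splitting does the work
    set +f
    var1=$1
    var2=$2
}

split_on_dash "ABCDE-123456"
echo "$var1 $var2"
```

After the call, IFS in the rest of the script still has whatever value it had before, so no save-and-restore step is needed.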

Rob I , May 9, 2012 at 19:20

Nice - I knew about $IFS but hadn't seen how it could be used. – Rob I May 9 '12 at 19:20

Sigg3.net , Jun 19, 2013 at 8:08

I used triplee's example and it worked exactly as advertised! Just change last two lines to myvar1=`echo $1` && myvar2=`echo $2` if you need to store them throughout a script with several "thrown" variables. – Sigg3.net Jun 19 '13 at 8:08

tripleee , Jun 19, 2013 at 13:25

No, don't use a useless echo in backticks . – tripleee Jun 19 '13 at 13:25

Daniel Andersson , Mar 27, 2015 at 6:46

This is a really sweet solution if we need to write something that is not Bash specific. To handle IFS troubles, one can add OLDIFS=$IFS at the beginning before overwriting it, and then add IFS=$OLDIFS just after the set line. – Daniel Andersson Mar 27 '15 at 6:46

tripleee , Mar 27, 2015 at 6:58

FWIW the link above is broken. I was lazy and careless. The canonical location still works; iki.fi/era/unix/award.html#echo – tripleee Mar 27 '15 at 6:58

anubhava , May 9, 2012 at 17:09

Using bash regex capabilities:
re="^([^-]+)-(.*)$"
[[ "ABCDE-123456" =~ $re ]] && var1="${BASH_REMATCH[1]}" && var2="${BASH_REMATCH[2]}"
echo $var1
echo $var2

OUTPUT

ABCDE
123456

Cometsong , Oct 21, 2016 at 13:29

Love pre-defining the re for later use(s)! – Cometsong Oct 21 '16 at 13:29

Archibald , Nov 12, 2012 at 11:03

string="ABCDE-123456"
IFS=- # use "local IFS=-" inside the function
set $string
echo $1 # >>> ABCDE
echo $2 # >>> 123456

tripleee , Mar 27, 2015 at 7:02

Hmmm, isn't this just a restatement of my answer ? – tripleee Mar 27 '15 at 7:02

Archibald , Sep 18, 2015 at 12:36

Actually yes. I just clarified it a bit. – Archibald Sep 18 '15 at 12:36

[Nov 08, 2018] How to split a string in shell and get the last field

Nov 08, 2018 | stackoverflow.com

cd1 , Jul 1, 2010 at 23:29

Suppose I have the string 1:2:3:4:5 and I want to get its last field ( 5 in this case). How do I do that using Bash? I tried cut , but I don't know how to specify the last field with -f .

Stephen , Jul 2, 2010 at 0:05

You can use string operators :
$ foo=1:2:3:4:5
$ echo ${foo##*:}
5

This trims everything from the front until a ':', greedily.

${foo  <-- from variable foo
  ##   <-- greedy front trim
  *    <-- matches anything
  :    <-- until the last ':'
 }
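
The same idea extends to the other three trimming operators. A short sketch, assuming the sample string from the question; the variable names last, first, init and tail are introduced only for this example.

```shell
#!/bin/bash
foo=1:2:3:4:5

last=${foo##*:}     # remove the longest prefix ending in ':'    -> 5
first=${foo%%:*}    # remove the longest suffix starting at ':'  -> 1
init=${foo%:*}      # remove the shortest suffix starting at ':' -> 1:2:3:4
tail=${foo#*:}      # remove the shortest prefix ending in ':'   -> 2:3:4:5

printf '%s\n' "$last" "$first" "$init" "$tail"
```

If the string ends with a delimiter (1:2:3:4:5:), ${foo##*:} yields an empty string, so a trailing delimiter has to be stripped first, for example with ${foo%:}.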

eckes , Jan 23, 2013 at 15:23

While this is working for the given problem, the answer of William below ( stackoverflow.com/a/3163857/520162 ) also returns 5 if the string is 1:2:3:4:5: (while using the string operators yields an empty result). This is especially handy when parsing paths that could contain (or not) a finishing / character. – eckes Jan 23 '13 at 15:23

Dobz , Jun 25, 2014 at 11:44

How would you then do the opposite of this? to echo out '1:2:3:4:'? – Dobz Jun 25 '14 at 11:44

Mihai Danila , Jul 9, 2014 at 14:07

And how does one keep the part before the last separator? Apparently by using ${foo%:*} . # - from beginning; % - from end. # , % - shortest match; ## , %% - longest match. – Mihai Danila Jul 9 '14 at 14:07

Putnik , Feb 11, 2016 at 22:33

If i want to get the last element from path, how should I use it? echo ${pwd##*/} does not work. – Putnik Feb 11 '16 at 22:33

Stan Strum , Dec 17, 2017 at 4:22

@Putnik that command sees pwd as a variable. Try dir=$(pwd); echo ${dir##*/} . Works for me! – Stan Strum Dec 17 '17 at 4:22

a3nm , Feb 3, 2012 at 8:39

Another way is to reverse before and after cut :
$ echo ab:cd:ef | rev | cut -d: -f1 | rev
ef

This makes it very easy to get the last but one field, or any range of fields numbered from the end.

Dannid , Jan 14, 2013 at 20:50

This answer is nice because it uses 'cut', which the author is (presumably) already familiar. Plus, I like this answer because I am using 'cut' and had this exact question, hence finding this thread via search. – Dannid Jan 14 '13 at 20:50

funroll , Aug 12, 2013 at 19:51

Some cut-and-paste fodder for people using spaces as delimiters: echo "1 2 3 4" | rev | cut -d " " -f1 | rev – funroll Aug 12 '13 at 19:51

EdgeCaseBerg , Sep 8, 2013 at 5:01

the rev | cut -d -f1 | rev is so clever! Thanks! Helped me a bunch (my use case was rev | -d ' ' -f 2- | rev – EdgeCaseBerg Sep 8 '13 at 5:01

Anarcho-Chossid , Sep 16, 2015 at 15:54

Wow. Beautiful and dark magic. – Anarcho-Chossid Sep 16 '15 at 15:54

shearn89 , Aug 17, 2017 at 9:27

I always forget about rev , was just what I needed! cut -b20- | rev | cut -b10- | rev – shearn89 Aug 17 '17 at 9:27

William Pursell , Jul 2, 2010 at 7:09

It's difficult to get the last field using cut, but here's (one set of) solutions in awk and perl
$ echo 1:2:3:4:5 | awk -F: '{print $NF}'
5
$ echo 1:2:3:4:5 | perl -F: -wane 'print $F[-1]'
5

eckes , Jan 23, 2013 at 15:20

great advantage of this solution over the accepted answer: it also matches paths that contain or do not contain a finishing / character: /a/b/c/d and /a/b/c/d/ yield the same result ( d ) when processing pwd | awk -F/ '{print $NF}' . The accepted answer results in an empty result in the case of /a/b/c/d/ – eckes Jan 23 '13 at 15:20

stamster , May 21 at 11:52

@eckes In case of AWK solution, on GNU bash, version 4.3.48(1)-release that's not true, as it matters whenever you have trailing slash or not. Simply put AWK will use / as delimiter, and if your path is /my/path/dir/ it will use value after last delimiter, which is simply an empty string. So it's best to avoid trailing slash if you need to do such a thing like I do. – stamster May 21 at 11:52

Nicholas M T Elliott , Jul 1, 2010 at 23:39

Assuming fairly simple usage (no escaping of the delimiter, for example), you can use grep:
$ echo "1:2:3:4:5" | grep -oE "[^:]+$"
5

Breakdown - find all the characters not the delimiter ([^:]) at the end of the line ($). -o only prints the matching part.

Dennis Williamson , Jul 2, 2010 at 0:05

One way:
var1="1:2:3:4:5"
var2=${var1##*:}

Another, using an array:

var1="1:2:3:4:5"
saveIFS=$IFS
IFS=":"
var2=($var1)
IFS=$saveIFS
var2=${var2[@]: -1}

Yet another with an array:

var1="1:2:3:4:5"
saveIFS=$IFS
IFS=":"
var2=($var1)
IFS=$saveIFS
count=${#var2[@]}
var2=${var2[$count-1]}

Using Bash (version >= 3.2) regular expressions:

var1="1:2:3:4:5"
[[ $var1 =~ :([^:]*)$ ]]
var2=${BASH_REMATCH[1]}

liuyang1 , Mar 24, 2015 at 6:02

Thanks so much for array style, as I need this feature, but not have cut, awk these utils. – liuyang1 Mar 24 '15 at 6:02

user3133260 , Dec 24, 2013 at 19:04

$ echo "a b c d e" | tr ' ' '\n' | tail -1
e

Simply translate the delimiter into a newline and choose the last entry with tail -1 .

Yajo , Jul 30, 2014 at 10:13

It will fail if the last item contains a \n , but for most cases is the most readable solution. – Yajo Jul 30 '14 at 10:13

Rafael , Nov 10, 2016 at 10:09

Using sed :
$ echo '1:2:3:4:5' | sed 's/.*://' # => 5

$ echo '' | sed 's/.*://' # => (empty)

$ echo ':' | sed 's/.*://' # => (empty)
$ echo ':b' | sed 's/.*://' # => b
$ echo '::c' | sed 's/.*://' # => c

$ echo 'a' | sed 's/.*://' # => a
$ echo 'a:' | sed 's/.*://' # => (empty)
$ echo 'a:b' | sed 's/.*://' # => b
$ echo 'a::c' | sed 's/.*://' # => c

Ab Irato , Nov 13, 2013 at 16:10

If your last field is a single character, you could do this:
a="1:2:3:4:5"

echo ${a: -1}
echo ${a:(-1)}

Check string manipulation in bash .

gniourf_gniourf , Nov 13, 2013 at 16:15

This doesn't work: it gives the last character of a , not the last field . – gniourf_gniourf Nov 13 '13 at 16:15

Ab Irato , Nov 25, 2013 at 13:25

True, that's the idea, if you know the length of the last field it's good. If not you have to use something else... – Ab Irato Nov 25 '13 at 13:25

sphakka , Jan 25, 2016 at 16:24

Interesting, I didn't know of these particular Bash string manipulations. It also resembles to Python's string/array slicing . – sphakka Jan 25 '16 at 16:24

ghostdog74 , Jul 2, 2010 at 1:16

Using Bash.
$ var1="1:2:3:4:0"
$ IFS=":"
$ set -- $var1
$ eval echo  \$${#}
0

Sopalajo de Arrierez , Dec 24, 2014 at 5:04

I would buy some details about this method, please :-) . – Sopalajo de Arrierez Dec 24 '14 at 5:04

Rafa , Apr 27, 2017 at 22:10

Could have used echo ${!#} instead of eval echo \$${#} . – Rafa Apr 27 '17 at 22:10

Crytis , Dec 7, 2016 at 6:51

echo "a:b:c:d:e"|xargs -d : -n1|tail -1

First use xargs to split it on ":"; -n1 means every line has only one part. Then print the last part.

BDL , Dec 7, 2016 at 13:47

Although this might solve the problem, one should always add an explanation to it. – BDL Dec 7 '16 at 13:47

Crytis , Jun 7, 2017 at 9:13

already added.. – Crytis Jun 7 '17 at 9:13

021 , Apr 26, 2016 at 11:33

There are many good answers here, but still I want to share this one using basename :
 basename $(echo "a:b:c:d:e" | tr ':' '/')

However it will fail if there are already some '/' in your string . If slash / is your delimiter then you just have to (and should) use basename.

It's not the best answer but it just shows how you can be creative using bash commands.

Nahid Akbar , Jun 22, 2012 at 2:55

for x in `echo $str | tr ";" "\n"`; do echo $x; done

chepner , Jun 22, 2012 at 12:58

This runs into problems if there is whitespace in any of the fields. Also, it does not directly address the question of retrieving the last field. – chepner Jun 22 '12 at 12:58

Christoph Böddeker , Feb 19 at 15:50

For those that comfortable with Python, https://github.com/Russell91/pythonpy is a nice choice to solve this problem.
$ echo "a:b:c:d:e" | py -x 'x.split(":")[-1]'

From the pythonpy help: -x treat each row of stdin as x .

With that tool, it is easy to write python code that gets applied to the input.

baz , Nov 24, 2017 at 19:27

a solution using the read builtin
IFS=':' read -a field <<< "1:2:3:4:5"
echo ${field[4]}

[Nov 08, 2018] How do I split a string on a delimiter in Bash?

Nov 08, 2018 | stackoverflow.com

stefanB , May 28, 2009 at 2:03

I have this string stored in a variable:
IN="bla@some.com;john@home.com"

Now I would like to split the strings by ; delimiter so that I have:

ADDR1="bla@some.com"
ADDR2="john@home.com"

I don't necessarily need the ADDR1 and ADDR2 variables. If they are elements of an array that's even better.


After suggestions from the answers below, I ended up with the following which is what I was after:

#!/usr/bin/env bash

IN="bla@some.com;john@home.com"

mails=$(echo $IN | tr ";" "\n")

for addr in $mails
do
    echo "> [$addr]"
done

Output:

> [bla@some.com]
> [john@home.com]

There was a solution involving setting Internal_field_separator (IFS) to ; . I am not sure what happened with that answer, how do you reset IFS back to default?

RE: IFS solution, I tried this and it works, I keep the old IFS and then restore it:

IN="bla@some.com;john@home.com"

OIFS=$IFS
IFS=';'
mails2=$IN
for x in $mails2
do
    echo "> [$x]"
done

IFS=$OIFS

BTW, when I tried

mails2=($IN)

I only got the first string when printing it in loop, without brackets around $IN it works.

Brooks Moses , May 1, 2012 at 1:26

With regards to your "Edit2": You can simply "unset IFS" and it will return to the default state. There's no need to save and restore it explicitly unless you have some reason to expect that it's already been set to a non-default value. Moreover, if you're doing this inside a function (and, if you aren't, why not?), you can set IFS as a local variable and it will return to its previous value once you exit the function. – Brooks Moses May 1 '12 at 1:26

dubiousjim , May 31, 2012 at 5:21

@BrooksMoses: (a) +1 for using local IFS=... where possible; (b) -1 for unset IFS , this doesn't exactly reset IFS to its default value, though I believe an unset IFS behaves the same as the default value of IFS ($' \t\n'), however it seems bad practice to be assuming blindly that your code will never be invoked with IFS set to a custom value; (c) another idea is to invoke a subshell: (IFS=$custom; ...) when the subshell exits IFS will return to whatever it was originally. – dubiousjim May 31 '12 at 5:21

nicooga , Mar 7, 2016 at 15:32

I just want to have a quick look at the paths to decide where to throw an executable, so I resorted to run ruby -e "puts ENV.fetch('PATH').split(':')" . If you want to stay pure bash won't help but using any scripting language that has a built-in split is easier. – nicooga Mar 7 '16 at 15:32

Jeff , Apr 22 at 17:51

This is kind of a drive-by comment, but since the OP used email addresses as the example, has anyone bothered to answer it in a way that is fully RFC 5322 compliant, namely that any quoted string can appear before the @ which means you're going to need regular expressions or some other kind of parser instead of naive use of IFS or other simplistic splitter functions. – Jeff Apr 22 at 17:51

user2037659 , Apr 26 at 20:15

for x in $(IFS=';';echo $IN); do echo "> [$x]"; done – user2037659 Apr 26 at 20:15

Johannes Schaub - litb , May 28, 2009 at 2:23

You can set the internal field separator (IFS) variable, and then let it parse into an array. When this happens in a command, then the assignment to IFS only takes place to that single command's environment (to read ). It then parses the input according to the IFS variable value into an array, which we can then iterate over.
IFS=';' read -ra ADDR <<< "$IN"
for i in "${ADDR[@]}"; do
    # process "$i"
done

It will parse one line of items separated by ; , pushing it into an array. Stuff for processing whole of $IN , each time one line of input separated by ; :

 while IFS=';' read -ra ADDR; do
      for i in "${ADDR[@]}"; do
          # process "$i"
      done
 done <<< "$IN"

Chris Lutz , May 28, 2009 at 2:25

This is probably the best way. How long will IFS persist in it's current value, can it mess up my code by being set when it shouldn't be, and how can I reset it when I'm done with it? – Chris Lutz May 28 '09 at 2:25

Johannes Schaub - litb , May 28, 2009 at 3:04

now after the fix applied, only within the duration of the read command :) – Johannes Schaub - litb May 28 '09 at 3:04

lhunath , May 28, 2009 at 6:14

You can read everything at once without using a while loop: read -r -d '' -a addr <<< "$in" # The -d '' is key here, it tells read not to stop at the first newline (which is the default -d) but to continue until EOF or a NULL byte (which only occur in binary data). – lhunath May 28 '09 at 6:14

Charles Duffy , Jul 6, 2013 at 14:39

@LucaBorrione Setting IFS on the same line as the read with no semicolon or other separator, as opposed to in a separate command, scopes it to that command -- so it's always "restored"; you don't need to do anything manually. – Charles Duffy Jul 6 '13 at 14:39

chepner , Oct 2, 2014 at 3:50

@imagineerThis There is a bug involving herestrings and local changes to IFS that requires $IN to be quoted. The bug is fixed in bash 4.3. – chepner Oct 2 '14 at 3:50

palindrom , Mar 10, 2011 at 9:00

Taken from Bash shell script split array :
IN="bla@some.com;john@home.com"
arrIN=(${IN//;/ })

Explanation:

This construction replaces all occurrences of ';' (the initial // means global replace) in the string IN with ' ' (a single space), then interprets the space-delimited string as an array (that's what the surrounding parentheses do).

The syntax used inside of the curly braces to replace each ';' character with a ' ' character is called Parameter Expansion .

There are some common gotchas:

  1. If the original string has spaces, you will need to use IFS :
    • IFS=';'; arrIN=($IN); unset IFS;
  2. If the original string has spaces and the delimiter is a new line, you can set IFS with:
    • IFS=$'\n'; arrIN=($IN); unset IFS;

Oz123 , Mar 21, 2011 at 18:50

I just want to add: this is the simplest of all, you can access array elements with ${arrIN[1]} (starting from zeros of course) – Oz123 Mar 21 '11 at 18:50

KomodoDave , Jan 5, 2012 at 15:13

Found it: the technique of modifying a variable within a ${} is known as 'parameter expansion'. – KomodoDave Jan 5 '12 at 15:13

qbolec , Feb 25, 2013 at 9:12

Does it work when the original string contains spaces? – qbolec Feb 25 '13 at 9:12

Ethan , Apr 12, 2013 at 22:47

No, I don't think this works when there are also spaces present... it's converting the ',' to ' ' and then building a space-separated array. – Ethan Apr 12 '13 at 22:47

Charles Duffy , Jul 6, 2013 at 14:39

This is a bad approach for other reasons: For instance, if your string contains ;*; , then the * will be expanded to a list of filenames in the current directory. -1 – Charles Duffy Jul 6 '13 at 14:39

Chris Lutz , May 28, 2009 at 2:09

If you don't mind processing them immediately, I like to do this:
for i in $(echo $IN | tr ";" "\n")
do
  # process
done

You could use this kind of loop to initialize an array, but there's probably an easier way to do it. Hope this helps, though.

Chris Lutz , May 28, 2009 at 2:42

You should have kept the IFS answer. It taught me something I didn't know, and it definitely made an array, whereas this just makes a cheap substitute. – Chris Lutz May 28 '09 at 2:42

Johannes Schaub - litb , May 28, 2009 at 2:59

I see. Yeah i find doing these silly experiments, i'm going to learn new things each time i'm trying to answer things. I've edited stuff based on #bash IRC feedback and undeleted :) – Johannes Schaub - litb May 28 '09 at 2:59

lhunath , May 28, 2009 at 6:12

-1, you're obviously not aware of wordsplitting, because it's introducing two bugs in your code. one is when you don't quote $IN and the other is when you pretend a newline is the only delimiter used in wordsplitting. You are iterating over every WORD in IN, not every line, and DEFINITELY not every element delimited by a semicolon, though it may appear to have the side-effect of looking like it works. – lhunath May 28 '09 at 6:12

Johannes Schaub - litb , May 28, 2009 at 17:00

You could change it to echo "$IN" | tr ';' '\n' | while read -r ADDY; do # process "$ADDY"; done to make him lucky, i think :) Note that this will fork, and you can't change outer variables from within the loop (that's why i used the <<< "$IN" syntax) then – Johannes Schaub - litb May 28 '09 at 17:00

mklement0 , Apr 24, 2013 at 14:13

To summarize the debate in the comments: Caveats for general use : the shell applies word splitting and expansions to the string, which may be undesired; just try it with IN="bla@some.com;john@home.com;*;broken apart" . In short: this approach will break if your tokens contain embedded spaces and/or chars. such as * that happen to make a token match filenames in the current folder. – mklement0 Apr 24 '13 at 14:13
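
The caveat can be checked with a short sketch. The counter variables are assumptions added for illustration, and the exact number of tokens produced by the unsafe loop depends on the files in the current directory, so it is only printed, not predicted:

```bash
IN="bla@some.com;john@home.com;*;broken apart"

# Unsafe: the unquoted command substitution is word-split and glob-expanded,
# so "broken apart" becomes two words and * may turn into file names.
unsafe_count=0
for i in $(echo "$IN" | tr ';' '\n'); do
    unsafe_count=$((unsafe_count + 1))
done

# Safer: read one line at a time. Process substitution (instead of a pipe)
# keeps the loop in the current shell, so the counter survives the loop.
safe_count=0
while IFS= read -r addr; do
    printf '> [%s]\n' "$addr"
    safe_count=$((safe_count + 1))
done < <(printf '%s\n' "$IN" | tr ';' '\n')

echo "unsafe: $unsafe_count tokens, safe: $safe_count tokens"
```

The safe loop sees exactly four tokens; the unsafe loop sees at least five.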

F. Hauri , Apr 13, 2013 at 14:20

Compatible answer

There are already a lot of different ways to do this in bash among the answers to this SO question. But bash has many special features, so-called bashisms, that work well but won't work in any other shell.

In particular, arrays, associative arrays, and pattern substitution are pure bashisms and may not work under other shells.

On my Debian GNU/Linux, there is a standard shell called dash, but I know many people who like to use ksh.

Finally, in very small environments, there is a special tool called busybox with its own shell interpreter (ash).

Requested string

The sample string in the SO question is:

IN="bla@some.com;john@home.com"

Since real input may contain whitespace, and whitespace can change the result of the routine, I prefer to use this sample string:

IN="bla@some.com;john@home.com;Full Name <fulnam@other.org>"

Split string based on delimiter in bash (version >=4.2)

Under pure bash, we may use arrays and IFS :

var="bla@some.com;john@home.com;Full Name <fulnam@other.org>"
oIFS="$IFS"
IFS=";"
declare -a fields=($var)
IFS="$oIFS"
unset oIFS

IFS=\; read -a fields <<<"$var"

Using this latter syntax under recent bash doesn't change $IFS for the current session, but only for the current command:

set | grep ^IFS=
IFS=$' \t\n'

Now the string var is split and stored into an array (named fields ):

set | grep ^fields=\\\|^var=
fields=([0]="bla@some.com" [1]="john@home.com" [2]="Full Name <fulnam@other.org>")
var='bla@some.com;john@home.com;Full Name <fulnam@other.org>'

We can inspect the variable contents with declare -p :

declare -p var fields
declare -- var="bla@some.com;john@home.com;Full Name <fulnam@other.org>"
declare -a fields=([0]="bla@some.com" [1]="john@home.com" [2]="Full Name <fulnam@other.org>")

read is the quickest way to do the split, because no forks and no external resources are involved.

From there, you could use the syntax you already know for processing each field:

for x in "${fields[@]}";do
    echo "> [$x]"
    done
> [bla@some.com]
> [john@home.com]
> [Full Name <fulnam@other.org>]

or drop each field after processing (I like this shifting approach):

while [ "$fields" ] ;do
    echo "> [$fields]"
    fields=("${fields[@]:1}")
    done
> [bla@some.com]
> [john@home.com]
> [Full Name <fulnam@other.org>]

or even for simple printout (shorter syntax):

printf "> [%s]\n" "${fields[@]}"
> [bla@some.com]
> [john@home.com]
> [Full Name <fulnam@other.org>]

Split string based on delimiter in shell

But if you want to write something usable under many shells, you have to avoid bashisms .

There is a syntax, used in many shells, for splitting a string across first or last occurrence of a substring:

${var#*SubStr}  # drops the beginning of the string up to the first occurrence of `SubStr`
${var##*SubStr} # drops the beginning of the string up to the last occurrence of `SubStr`
${var%SubStr*}  # drops the part of the string from the last occurrence of `SubStr` to the end
${var%%SubStr*} # drops the part of the string from the first occurrence of `SubStr` to the end

(The absence of this from the other answers is the main reason I posted mine ;)

As pointed out by Score_Under :

# and % delete the shortest possible matching string, and

## and %% delete the longest possible.
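
As a concrete illustration of the shortest/longest rule (the value of var is an assumption chosen to keep the results easy to read), all four operators applied to the same string:

```bash
var="a;b;c"

echo "${var#*;}"    # shortest prefix match removed -> b;c
echo "${var##*;}"   # longest prefix match removed  -> c
echo "${var%;*}"    # shortest suffix match removed -> a;b
echo "${var%%;*}"   # longest suffix match removed  -> a
```

These expansions are part of the POSIX shell language, so the snippet should behave the same under bash, dash, ksh and busybox ash.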

This little sample script works well under bash , dash , ksh , busybox and was tested under Mac-OS's bash too:

var="bla@some.com;john@home.com;Full Name <fulnam@other.org>"
while [ "$var" ] ;do
    iter=${var%%;*}
    echo "> [$iter]"
    [ "$var" = "$iter" ] && \
        var='' || \
        var="${var#*;}"
  done
> [bla@some.com]
> [john@home.com]
> [Full Name <fulnam@other.org>]

Have fun!

Score_Under , Apr 28, 2015 at 16:58

The # , ## , % , and %% substitutions have what is IMO an easier explanation to remember (for how much they delete): # and % delete the shortest possible matching string, and ## and %% delete the longest possible. – Score_Under Apr 28 '15 at 16:58

sorontar , Oct 26, 2016 at 4:36

The IFS=\; read -a fields <<<"$var" fails on newlines and adds a trailing newline. The other solution removes a trailing empty field. – sorontar Oct 26 '16 at 4:36

Eric Chen , Aug 30, 2017 at 17:50

The shell delimiter is the most elegant answer, period. – Eric Chen Aug 30 '17 at 17:50

sancho.s , Oct 4 at 3:42

Could the last alternative be used with a list of field separators set somewhere else? For instance, I mean to use this as a shell script, and pass a list of field separators as a positional parameter. – sancho.s Oct 4 at 3:42

F. Hauri , Oct 4 at 7:47

Yes, in a loop: for sep in "#" "ł" "@" ; do ... var="${var#*$sep}" ... – F. Hauri Oct 4 at 7:47
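
One way to take the separator from a positional parameter, as asked above, is to wrap the portable loop in a function. This is only a sketch: split_on is a hypothetical helper name, and its variables are global because plain POSIX sh has no local keyword. A caller could invoke it once per separator to handle a list of them.

```bash
# split_on SEP STRING: print each SEP-separated field of STRING on its own
# line (hypothetical helper; uses only POSIX parameter expansion).
split_on() {
    sep=$1
    rest=$2
    while [ -n "$rest" ]; do
        field=${rest%%"$sep"*}          # text before the first separator
        printf '> [%s]\n' "$field"
        if [ "$rest" = "$field" ]; then
            rest=''                     # no separator left: last field
        else
            rest=${rest#*"$sep"}        # drop the field and one separator
        fi
    done
}

split_on ';' 'bla@some.com;john@home.com;Full Name <fulnam@other.org>'
split_on '--' 'one--two--three'
```

Quoting "$sep" inside the pattern makes the separator match literally, so multi-character separators and characters such as * are not treated as glob patterns. Like the original loop, this drops a trailing empty field.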

DougW , Apr 27, 2015 at 18:20

I've seen a couple of answers referencing the cut command, but they've all been deleted. It's a little odd that nobody has elaborated on that, because I think it's one of the more useful commands for doing this type of thing, especially for parsing delimited log files.

In the case of splitting this specific example into a bash script array, tr is probably more efficient, but cut can be used, and is more effective if you want to pull specific fields from the middle.

Example:

$ echo "bla@some.com;john@home.com" | cut -d ";" -f 1
bla@some.com
$ echo "bla@some.com;john@home.com" | cut -d ";" -f 2
john@home.com

You can obviously put that into a loop, and iterate the -f parameter to pull each field independently.

This gets more useful when you have a delimited log file with rows like this:

2015-04-27|12345|some action|an attribute|meta data

cut is very handy to be able to cat this file and select a particular field for further processing.
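
A short sketch of that use, applied to the sample row above (the variable names are assumptions for illustration). With a real log file, the file name would be passed to cut in place of the printf pipeline:

```bash
line='2015-04-27|12345|some action|an attribute|meta data'

# Pull a single field from the middle of the record.
action=$(printf '%s\n' "$line" | cut -d '|' -f 3)
echo "$action"                                    # some action

# Pull several fields at once; cut keeps the delimiter between them.
date_and_action=$(printf '%s\n' "$line" | cut -d '|' -f 1,3)
echo "$date_and_action"                           # 2015-04-27|some action
```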

MisterMiyagi , Nov 2, 2016 at 8:42

Kudos for using cut , it's the right tool for the job! Much clearer than any of those shell hacks. – MisterMiyagi Nov 2 '16 at 8:42

uli42 , Sep 14, 2017 at 8:30

This approach will only work if you know the number of elements in advance; you'd need to program some more logic around it. It also runs an external tool for every element. – uli42 Sep 14 '17 at 8:30

Louis Loudog Trottier , May 10 at 4:20

Exactly what I was looking for, trying to avoid empty strings in a CSV. Now I can point to the exact 'column' value as well. Works with IFS already used in a loop. Better than expected for my situation. – Louis Loudog Trottier May 10 at 4:20

, May 28, 2009 at 10:31

How about this approach:
IN="bla@some.com;john@home.com" 
set -- "$IN" 
IFS=";"; declare -a Array=($*) 
echo "${Array[@]}" 
echo "${Array[0]}" 
echo "${Array[1]}"

Source

Yzmir Ramirez , Sep 5, 2011 at 1:06

+1 ... but I wouldn't name the variable "Array" ... pet peeve I guess. Good solution. – Yzmir Ramirez Sep 5 '11 at 1:06

ata , Nov 3, 2011 at 22:33

+1 ... but the "set" and declare -a are unnecessary. You could as well have used just IFS=";" && Array=($IN) – ata Nov 3 '11 at 22:33

Luca Borrione , Sep 3, 2012 at 9:26

+1 Only a side note: shouldn't it be recommendable to keep the old IFS and then restore it? (as shown by stefanB in his edit3) people landing here (sometimes just copying and pasting a solution) might not think about this – Luca Borrione Sep 3 '12 at 9:26

Charles Duffy , Jul 6, 2013 at 14:44

-1: First, @ata is right that most of the commands in this do nothing. Second, it uses word-splitting to form the array, and doesn't do anything to inhibit glob-expansion when doing so (so if you have glob characters in any of the array elements, those elements are replaced with matching filenames). – Charles Duffy Jul 6 '13 at 14:44

John_West , Jan 8, 2016 at 12:29

Suggest to use $'...' : IN=$'bla@some.com;john@home.com;bet <d@\ns* kl.com>' . Then echo "${Array[2]}" will print a string with newline. set -- "$IN" is also necessary in this case. Yes, to prevent glob expansion, the solution should include set -f . – John_West Jan 8 '16 at 12:29
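
A sketch of the set -f suggestion applied to the word-splitting approach. The sample string and the old_ifs name are assumptions; the closing set +f also assumes that pathname expansion was enabled to begin with:

```bash
IN='bla@some.com;*;john@home.com'

old_ifs=$IFS
set -f                # disable pathname expansion so * stays literal
IFS=';'
Array=($IN)           # unquoted on purpose: split on ; only
IFS=$old_ifs
set +f                # re-enable pathname expansion

printf '> [%s]\n' "${Array[@]}"
```

With globbing disabled, the middle element stays a literal * instead of being replaced by the names of files in the current directory.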

Steven Lizarazo , Aug 11, 2016 at 20:45

This worked for me:
string="1;2"
echo $string | cut -d';' -f1 # output is 1
echo $string | cut -d';' -f2 # output is 2

Pardeep Sharma , Oct 10, 2017 at 7:29

this is short and sweet :) – Pardeep Sharma Oct 10 '17 at 7:29

space earth , Oct 17, 2017 at 7:23

Thanks...Helped a lot – space earth Oct 17 '17 at 7:23

mojjj , Jan 8 at 8:57

cut works only with a single char as delimiter. – mojjj Jan 8 at 8:57

lothar , May 28, 2009 at 2:12

echo "bla@some.com;john@home.com" | sed -e 's/;/\n/g'
bla@some.com
john@home.com

Luca Borrione , Sep 3, 2012 at 10:08

-1 what if the string contains spaces? for example IN="this is first line; this is second line" arrIN=( $( echo "$IN" | sed -e 's/;/\n/g' ) ) will produce an array of 8 elements in this case (an element for each word space separated), rather than 2 (an element for each line semi colon separated) – Luca Borrione Sep 3 '12 at 10:08

lothar , Sep 3, 2012 at 17:33

@Luca No the sed script creates exactly two lines. What creates the multiple entries for you is when you put it into a bash array (which splits on white space by default) – lothar Sep 3 '12 at 17:33

Luca Borrione , Sep 4, 2012 at 7:09

That's exactly the point: the OP needs to store entries into an array to loop over it, as you can see in his edits. I think your (good) answer failed to mention using arrIN=( $( echo "$IN" | sed -e 's/;/\n/g' ) ) to achieve that, and to advise changing IFS to IFS=$'\n' for those who land here in the future and need to split a string containing spaces (and restoring it afterwards). :) – Luca Borrione Sep 4 '12 at 7:09

lothar , Sep 4, 2012 at 16:55

@Luca Good point. However the array assignment was not in the initial question when I wrote up that answer. – lothar Sep 4 '12 at 16:55

Ashok , Sep 8, 2012 at 5:01

This also works:
IN="bla@some.com;john@home.com"
echo ADD1=`echo $IN | cut -d \; -f 1`
echo ADD2=`echo $IN | cut -d \; -f 2`

Be careful, this solution is not always correct. In case you pass "bla@some.com" only, it will assign it to both ADD1 and ADD2.

fersarr , Mar 3, 2016 at 17:17

You can use -s to avoid the mentioned problem: superuser.com/questions/896800/ "-f, --fields=LIST select only these fields; also print any line that contains no delimiter character, unless the -s option is specified" – fersarr Mar 3 '16 at 17:17
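
The difference is easy to see with an input that has no delimiter at all. This is a small sketch; the variable names are assumptions for illustration:

```bash
one='bla@some.com'

# Without -s, a line containing no delimiter is printed whole for any field.
without_s=$(printf '%s\n' "$one" | cut -d ';' -f 2)

# With -s, lines without the delimiter are suppressed entirely.
with_s=$(printf '%s\n' "$one" | cut -s -d ';' -f 2)

echo "without -s: [$without_s]"   # [bla@some.com]
echo "with -s:    [$with_s]"      # []
```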

Tony , Jan 14, 2013 at 6:33

I think AWK is the best and most efficient command to resolve your problem. AWK is installed by default on almost every Linux distribution.
echo "bla@some.com;john@home.com" | awk -F';' '{print $1,$2}'

will give

bla@some.com john@home.com

Of course you can store each email address by redefining the awk print field.

Jaro , Jan 7, 2014 at 21:30

Or even simpler: echo "bla@some.com;john@home.com" | awk 'BEGIN{RS=";"} {print}' – Jaro Jan 7 '14 at 21:30

Aquarelle , May 6, 2014 at 21:58

@Jaro This worked perfectly for me when I had a string with commas and needed to reformat it into lines. Thanks. – Aquarelle May 6 '14 at 21:58

Eduardo Lucio , Aug 5, 2015 at 12:59

It worked in this scenario -> "echo "$SPLIT_0" | awk -F' inode=' '{print $1}'"! I had problems when trying to use strings (" inode=") instead of characters (";"). $1, $2, $3, $4 are set as positions in an array! If there is a way of setting an array... better! Thanks! – Eduardo Lucio Aug 5 '15 at 12:59

Tony , Aug 6, 2015 at 2:42

@EduardoLucio, what I'm thinking about is maybe you can first replace your delimiter inode= into ; for example by sed -i 's/inode\=/\;/g' your_file_to_process , then define -F';' when apply awk , hope that can help you. – Tony Aug 6 '15 at 2:42
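
For what it is worth, awk can also split on the multi-character separator directly, without the intermediate sed step. A sketch, where the sample record is an assumption made up for illustration:

```bash
# Sample record (assumed for illustration) using " inode=" as the separator.
SPLIT_0='/var/log/syslog inode=131074'

# -F accepts a multi-character separator; awk treats it as a regular
# expression, which is harmless here since it has no special characters.
name=$(printf '%s\n' "$SPLIT_0" | awk -F' inode=' '{print $1}')
inode=$(printf '%s\n' "$SPLIT_0" | awk -F' inode=' '{print $2}')

echo "name=$name inode=$inode"
```

If the separator contained regex metacharacters such as . or | they would need escaping in the -F argument.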

nickjb , Jul 5, 2011 at 13:41

A different take on Darron's answer , this is how I do it:
IN="bla@some.com;john@home.com"
read ADDR1 ADDR2 <<<$(IFS=";"; echo $IN)

ColinM , Sep 10, 2011 at 0:31

This doesn't work. – ColinM Sep 10 '11 at 0:31

nickjb , Oct 6, 2011 at 15:33

I think it does! Run the commands above and then "echo $ADDR1 ... $ADDR2" and i get "bla@some.com ... john@home.com" output – nickjb Oct 6 '11 at 15:33

Nick , Oct 28, 2011 at 14:36

This worked REALLY well for me... I used it to iterate over an array of strings which contained comma separated DB,SERVER,PORT data to use mysqldump. – Nick Oct 28 '11 at 14:36

dubiousjim , May 31, 2012 at 5:28

Diagnosis: the IFS=";" assignment exists only in the $(...; echo $IN) subshell; this is why some readers (including me) initially think it won't work. I assumed that all of $IN was getting slurped up by ADDR1. But nickjb is correct; it does work. The reason is that echo $IN command parses its arguments using the current value of $IFS, but then echoes them to stdout using a space delimiter, regardless of the setting of $IFS. So the net effect is as though one had called read ADDR1 ADDR2 <<< "bla@some.com john@home.com" (note the input is space-separated not ;-separated). – dubiousjim May 31 '12 at 5:28
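
The two steps of that diagnosis can be made visible by capturing the intermediate text. A sketch; the joined variable is an assumption added for illustration:

```bash
IN="bla@some.com;john@home.com"

# Inside the subshell, the unquoted $IN is split on ; into two arguments,
# and echo then joins its arguments with a single space.
joined=$(IFS=';'; echo $IN)
echo "$joined"                    # bla@some.com john@home.com

# read now splits that text on the default IFS (whitespace).
read -r ADDR1 ADDR2 <<< "$joined"
echo "ADDR1=$ADDR1 ADDR2=$ADDR2"
```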

sorontar , Oct 26, 2016 at 4:43

This fails on spaces and newlines, and also expands wildcards * in the echo $IN with an unquoted variable expansion. – sorontar Oct 26 '16 at 4:43

gniourf_gniourf , Jun 26, 2014 at 9:11

In Bash, a bullet proof way, that will work even if your variable contains newlines:
IFS=';' read -d '' -ra array < <(printf '%s;\0' "$in")

Look:

$ in=$'one;two three;*;there is\na newline\nin this field'
$ IFS=';' read -d '' -ra array < <(printf '%s;\0' "$in")
$ declare -p array
declare -a array='([0]="one" [1]="two three" [2]="*" [3]="there is
a newline
in this field")'

The trick for this to work is to use the -d option of read (delimiter) with an empty delimiter, so that read is forced to read everything it's fed. And we feed read with exactly the content of the variable in , with no trailing newline thanks to printf . Note that we're also putting the delimiter in printf to ensure that the string passed to read has a trailing delimiter. Without it, read would trim potential trailing empty fields:

$ in='one;two;three;'    # there's an empty field
$ IFS=';' read -d '' -ra array < <(printf '%s;\0' "$in")
$ declare -p array
declare -a array='([0]="one" [1]="two" [2]="three" [3]="")'

the trailing empty field is preserved.


Update for Bash≥4.4

Since Bash 4.4, the builtin mapfile (aka readarray ) supports the -d option to specify a delimiter. Hence another canonical way is:

mapfile -d ';' -t array < <(printf '%s;' "$in")
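
A sketch that picks between the two forms at run time. The version test on BASH_VERSINFO and the sample input are assumptions added for illustration; both branches are intended to produce the same array:

```bash
in=$'one;two three;*;there is\na newline'

if (( BASH_VERSINFO[0] > 4 || (BASH_VERSINFO[0] == 4 && BASH_VERSINFO[1] >= 4) )); then
    # Bash 4.4 or later: -d sets the delimiter, -t strips it from each item.
    mapfile -d ';' -t array < <(printf '%s;' "$in")
else
    # Older Bash: read everything up to the NUL byte and split on ; .
    IFS=';' read -d '' -ra array < <(printf '%s;\0' "$in")
fi

echo "${#array[@]} fields"
printf '> [%s]\n' "${array[@]}"
```

The glob character and the embedded newline both survive, because nothing here is subject to pathname expansion or line-based reading.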

John_West , Jan 8, 2016 at 12:10

I found it as the rare solution on that list that works correctly with \n , spaces and * simultaneously. Also, no loops; array variable is accessible in the shell after execution (contrary to the highest upvoted answer). Note, in=$'...' , it does not work with double quotes. I think, it needs more upvotes. – John_West Jan 8 '16 at 12:10

Darron , Sep 13, 2010 at 20:10

How about this one liner, if you're not using arrays:
IFS=';' read ADDR1 ADDR2 <<<$IN

dubiousjim , May 31, 2012 at 5:36

Consider using read -r ... to ensure that, for example, the two characters "\t" in the input end up as the same two characters in your variables (instead of a single tab char). – dubiousjim May 31 '12 at 5:36

Luca Borrione , Sep 3, 2012 at 10:07

-1 This is not working here (ubuntu 12.04). Adding echo "ADDR1 $ADDR1"\n echo "ADDR2 $ADDR2" to your snippet will output ADDR1 bla@some.com john@home.com\nADDR2 (\n is newline) – Luca Borrione Sep 3 '12 at 10:07

chepner , Sep 19, 2015 at 13:59

This is probably due to a bug involving IFS and here strings that was fixed in bash 4.3. Quoting $IN should fix it. (In theory, $IN is not subject to word splitting or globbing after it expands, meaning the quotes should be unnecessary. Even in 4.3, though, there's at least one bug remaining--reported and scheduled to be fixed--so quoting remains a good idea.) – chepner Sep 19 '15 at 13:59

sorontar , Oct 26, 2016 at 4:55

This breaks if $in contains newlines even if $IN is quoted. And adds a trailing newline. – sorontar Oct 26 '16 at 4:55

kenorb , Sep 11, 2015 at 20:54

Here is a clean 3-liner:
in="foo@bar;bizz@buzz;fizz@buzz;buzz@woof"
IFS=';' list=($in)
for item in "${list[@]}"; do echo $item; done

where IFS delimits words based on the separator and () is used to create an array . Then [@] is used to return each item as a separate word.

If you've any code after that, you also need to restore $IFS , e.g. unset IFS .

sorontar , Oct 26, 2016 at 5:03

The use of $in unquoted allows wildcards to be expanded. – sorontar Oct 26 '16 at 5:03

user2720864 , Sep 24 at 13:46

+ for the unset command – user2720864 Sep 24 at 13:46

Emilien Brigand , Aug 1, 2016 at 13:15

Without setting the IFS

If you just have one colon you can do that:

a="foo:bar"
b=${a%:*}
c=${a##*:}

you will get:

b = foo
c = bar
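
With more than one colon the two operators above no longer give "first" and "second". A brief sketch of what each variant keeps (the three-field value is an assumption for illustration):

```bash
a="foo:bar:baz"

first=${a%%:*}    # longest suffix removed: keep text before the first colon
last=${a##*:}     # longest prefix removed: keep text after the last colon
head=${a%:*}      # shortest suffix removed: drop only the last field

echo "first=$first last=$last head=$head"   # first=foo last=baz head=foo:bar
```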

Victor Choy , Sep 16, 2015 at 3:34

There is a simple and smart way like this:
echo "add:sfff" | xargs -d: -i  echo {}

But you must use GNU xargs; BSD xargs does not support -d delim. If you use an Apple Mac like me, you can install GNU xargs:

brew install findutils

then

echo "add:sfff" | gxargs -d: -i  echo {}

Halle Knast , May 24, 2017 at 8:42

The following Bash/zsh function splits its first argument on the delimiter given by the second argument:
split() {
    local string="$1"
    local delimiter="$2"
    if [ -n "$string" ]; then
        local part
        while read -d "$delimiter" part; do
            echo "$part"
        done <<< "$string"
        echo "$part"
    fi
}

For instance, the command

$ split 'a;b;c' ';'

yields

a
b
c

This output may, for instance, be piped to other commands. Example:

$ split 'a;b;c' ';' | cat -n
1   a
2   b
3   c

Compared to the other solutions given, this one has the following advantages:

If desired, the function may be put into a script as follows:

#!/usr/bin/env bash

split() {
    # ...
}

split "$@"
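
If the parts are wanted in an array rather than on stdout, the output of such a function can be collected with mapfile (Bash 4.0 or later). The following is a self-contained sketch, not the author's exact function: this variant quotes its expansions, adds -r and IFS= to read, and feeds the string with printf so that no trailing newline is appended. The parts array name is an assumption.

```bash
# Variant of a split function with quoted expansions; -r and IFS= keep
# backslashes and surrounding whitespace intact.
split() {
    local string=$1 delimiter=$2 part
    [ -n "$string" ] || return 0
    while IFS= read -r -d "$delimiter" part; do
        printf '%s\n' "$part"
    done < <(printf '%s' "$string")
    printf '%s\n' "$part"            # text after the last delimiter
}

# Collect one array element per output line.
mapfile -t parts < <(split 'a;b c;*' ';')
printf '> [%s]\n' "${parts[@]}"
```

Because the function emits one part per line, parts that themselves contain newlines cannot be represented this way.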

sandeepkunkunuru , Oct 23, 2017 at 16:10

works and neatly modularized. – sandeepkunkunuru Oct 23 '17 at 16:10

Prospero , Sep 25, 2011 at 1:09

This is the simplest way to do it.
spo='one;two;three'
OIFS=$IFS
IFS=';'
spo_array=($spo)
IFS=$OIFS
echo ${spo_array[*]}

rashok , Oct 25, 2016 at 12:41

IN="bla@some.com;john@home.com"
IFS=';'
read -a IN_arr <<< "${IN}"
for entry in "${IN_arr[@]}"
do
    echo $entry
done

Output

bla@some.com
john@home.com

System : Ubuntu 12.04.1

codeforester , Jan 2, 2017 at 5:37

IFS is not getting set in the specific context of read here and hence it can upset rest of the code, if any. – codeforester Jan 2 '17 at 5:37
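
The difference between the two placements of the assignment can be checked directly. A sketch; the array names leaky and scoped are assumptions for illustration:

```bash
IN="bla@some.com;john@home.com"
default_ifs=$IFS

# Assignment on its own line: IFS stays changed for the rest of the script.
IFS=';'
read -ra leaky <<< "$IN"
[ "$IFS" = ';' ] && echo "IFS leaked"
IFS=$default_ifs                      # manual restore is required

# Assignment as a prefix of read: scoped to that single command.
IFS=';' read -ra scoped <<< "$IN"
[ "$IFS" = "$default_ifs" ] && echo "IFS untouched"
```

Both forms split the string the same way; only the prefix form leaves later unquoted expansions in the script unaffected.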

shuaihanhungry , Jan 20 at 15:54

you can apply awk to many situations
echo "bla@some.com;john@home.com"|awk -F';' '{printf "%s\n%s\n", $1, $2}'

also you can use this

echo "bla@some.com;john@home.com"|awk -F';' '{print $1,$2}' OFS="\n"

ghost , Apr 24, 2013 at 13:13

If no space, Why not this?
IN="bla@some.com;john@home.com"
arr=(`echo $IN | tr ';' ' '`)

echo ${arr[0]}
echo ${arr[1]}

eukras , Oct 22, 2012 at 7:10

There are some cool answers here (errator esp.), but for something analogous to split in other languages -- which is what I took the original question to mean -- I settled on this:
IN="bla@some.com;john@home.com"
declare -a a="(${IN/;/ })";

Now ${a[0]} , ${a[1]} , etc, are as you would expect. Use ${#a[*]} for number of terms. Or to iterate, of course:

for i in ${a[*]}; do echo $i; done

IMPORTANT NOTE:

This works in cases where there are no spaces to worry about, which solved my problem, but may not solve yours. Go with the $IFS solution(s) in that case.

olibre , Oct 7, 2013 at 13:33

Does not work when IN contains more than two e-mail addresses. Please refer to same idea (but fixed) at palindrom's answer – olibre Oct 7 '13 at 13:33

sorontar , Oct 26, 2016 at 5:14

Better use ${IN//;/ } (double slash) to make it also work with more than two values. Beware that any wildcard ( *?[ ) will be expanded. And a trailing empty field will be discarded. – sorontar Oct 26 '16 at 5:14

jeberle , Apr 30, 2013 at 3:10

Use the set built-in to load up the $@ array:
IN="bla@some.com;john@home.com"
IFS=';'; set $IN; IFS=$' \t\n'

Then, let the party begin:

echo $#
for a; do echo $a; done
ADDR1=$1 ADDR2=$2

sorontar , Oct 26, 2016 at 5:17

Better use set -- $IN to avoid some issues with "$IN" starting with dash. Still, the unquoted expansion of $IN will expand wildcards ( *?[ ). – sorontar Oct 26 '16 at 5:17
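
Both remarks can be folded into a small function, which also keeps the caller's own positional parameters intact. This is a sketch under assumptions: first_two is a hypothetical helper name, local is a bash extension, and the closing set +f assumes globbing was enabled beforehand.

```bash
# first_two STRING: split STRING on ; using the positional parameters.
first_two() {
    local IFS=';'     # local: the change ends when the function returns
    set -f            # no pathname expansion while splitting
    set -- $1         # -- protects against fields that begin with a dash
    set +f
    echo "count=$# first=$1 second=$2"
}

first_two '-n;*;john@home.com'    # count=3 first=-n second=*
```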

NevilleDNZ , Sep 2, 2013 at 6:30

Two bourne-ish alternatives, neither of which requires bash arrays:

Case 1 : Keep it nice and simple: Use a NewLine as the Record-Separator... eg.

IN="bla@some.com
john@home.com"

while read i; do
  # process "$i" ... eg.
    echo "[email:$i]"
done <<< "$IN"

Note: in this first case no sub-process is forked to assist with list manipulation.

Idea: Maybe it is worth using NL extensively internally , and only converting to a different RS when generating the final result externally .

Case 2 : Using a ";" as a record separator... eg.

NL="
" IRS=";" ORS=";"

conv_IRS() {
  exec tr "$1" "$NL"
}

conv_ORS() {
  exec tr "$NL" "$1"
}

IN="bla@some.com;john@home.com"
IN="$(conv_IRS ";" <<< "$IN")"

while read i; do
  # process "$i" ... eg.
    echo -n "[email:$i]$ORS"
done <<< "$IN"

In both cases a sub-list composed within the loop persists after the loop has completed. This is useful when manipulating lists in memory, instead of storing lists in files. {p.s. keep calm and carry on B-) }

fedorqui , Jan 8, 2015 at 10:21

Apart from the fantastic answers that were already provided, if it is just a matter of printing out the data you may consider using awk :
awk -F";" '{for (i=1;i<=NF;i++) printf("> [%s]\n", $i)}' <<< "$IN"

This sets the field separator to ; , so that it can loop through the fields with a for loop and print accordingly.

Test
$ IN="bla@some.com;john@home.com"
$ awk -F";" '{for (i=1;i<=NF;i++) printf("> [%s]\n", $i)}' <<< "$IN"
> [bla@some.com]
> [john@home.com]

With another input:

$ awk -F";" '{for (i=1;i<=NF;i++) printf("> [%s]\n", $i)}' <<< "a;b;c   d;e_;f"
> [a]
> [b]
> [c   d]
> [e_]
> [f]

18446744073709551615 , Feb 20, 2015 at 10:49

In Android shell, most of the proposed methods just do not work:
$ IFS=':' read -ra ADDR <<<"$PATH"                             
/system/bin/sh: can't create temporary file /sqlite_stmt_journals/mksh.EbNoR10629: No such file or directory

What does work is:

$ for i in ${PATH//:/ }; do echo $i; done
/sbin
/vendor/bin
/system/sbin
/system/bin
/system/xbin

where // means global replacement.

sorontar , Oct 26, 2016 at 5:08

Fails if any part of $PATH contains spaces (or newlines). Also expands wildcards (asterisk *, question mark ? and brackets [ ]). – sorontar Oct 26 '16 at 5:08

Eduardo Lucio , Apr 4, 2016 at 19:54

Okay guys!

Here's my answer!

DELIMITER_VAL='='

read -d '' F_ABOUT_DISTRO_R <<"EOF"
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.4 LTS"
NAME="Ubuntu"
VERSION="14.04.4 LTS, Trusty Tahr"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 14.04.4 LTS"
VERSION_ID="14.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
EOF

SPLIT_NOW=$(awk -F$DELIMITER_VAL '{for(i=1;i<=NF;i++){printf "%s\n", $i}}' <<<"${F_ABOUT_DISTRO_R}")
while read -r line; do
   SPLIT+=("$line")
done <<< "$SPLIT_NOW"
for i in "${SPLIT[@]}"; do
    echo "$i"
done

Why this approach is "the best" for me?

Because of two reasons:

  1. You do not need to escape the delimiter;
  2. You will not have problem with blank spaces . The value will be properly separated in the array!

[]'s

gniourf_gniourf , Jan 30, 2017 at 8:26

FYI, /etc/os-release and /etc/lsb-release are meant to be sourced, and not parsed. So your method is really wrong. Moreover, you're not quite answering the question about splitting a string on a delimiter. – gniourf_gniourf Jan 30 '17 at 8:26

Michael Hale , Jun 14, 2012 at 17:38

A one-liner to split a string separated by ';' into an array is:
IN="bla@some.com;john@home.com"
ADDRS=( $(IFS=";" echo "$IN") )
echo ${ADDRS[0]}
echo ${ADDRS[1]}

This only sets IFS in a subshell, so you don't have to worry about saving and restoring its value.

Luca Borrione , Sep 3, 2012 at 10:04

-1 this doesn't work here (ubuntu 12.04). it prints only the first echo with all $IN value in it, while the second is empty. you can see it if you put echo "0: "${ADDRS[0]}\n echo "1: "${ADDRS[1]} the output is 0: bla@some.com;john@home.com\n 1: (\n is new line) – Luca Borrione Sep 3 '12 at 10:04

Luca Borrione , Sep 3, 2012 at 10:05

please refer to nickjb's answer at stackoverflow.com/a/6583589/1032370 for a working alternative to this idea – Luca Borrione Sep 3 '12 at 10:05

Score_Under , Apr 28, 2015 at 17:09

-1, 1. IFS isn't being set in that subshell (it's being passed to the environment of "echo", which is a builtin, so nothing is happening anyway). 2. $IN is quoted so it isn't subject to IFS splitting. 3. The command substitution is split by whitespace, but this may corrupt the original data. – Score_Under Apr 28 '15 at 17:09
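
The three points can be confirmed by counting the resulting elements. A sketch; the FIXED array and its read-based split are assumptions added for comparison:

```bash
IN="bla@some.com;john@home.com"

# IFS=";" here only decorates echo's environment; "$IN" is quoted, so it is
# never split, and the result is split on default whitespace afterwards.
ADDRS=( $(IFS=";" echo "$IN") )
echo "${#ADDRS[@]} element(s): ${ADDRS[0]}"

# Scoping IFS to read does split on ; as intended.
IFS=';' read -ra FIXED <<< "$IN"
echo "${#FIXED[@]} element(s): ${FIXED[0]} | ${FIXED[1]}"
```

The first array ends up with a single element holding the whole string, matching the output reported in the earlier comment.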

ajaaskel , Oct 10, 2014 at 11:33

IN='bla@some.com;john@home.com;Charlie Brown <cbrown@acme.com;!"#$%&/()[]{}*? are no problem;simple is beautiful :-)'
set -f
oldifs="$IFS"
IFS=';'; arrayIN=($IN)
IFS="$oldifs"
for i in "${arrayIN[@]}"; do
echo "$i"
done
set +f

Output:

bla@some.com
john@home.com
Charlie Brown <cbrown@acme.com
!"#$%&/()[]{}*? are no problem
simple is beautiful :-)

Explanation: Simple assignment using parenthesis () converts semicolon separated list into an array provided you have correct IFS while doing that. Standard FOR loop handles individual items in that array as usual. Notice that the list given for IN variable must be "hard" quoted, that is, with single ticks.

IFS must be saved and restored since Bash does not treat an assignment the same way as a command. An alternate workaround is to wrap the assignment inside a function and call that function with a modified IFS. In that case separate saving/restoring of IFS is not needed. Thanks to "Bize" for pointing that out.
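
A sketch of that function-based workaround, assuming bash in its default (non-POSIX) mode, where an assignment placed before a function call lasts only for the duration of the call. split_into_arrayIN is a hypothetical helper name, and the sample string is shortened for illustration:

```bash
IN='bla@some.com;john@home.com;Charlie Brown <cbrown@acme.com>'

# Hypothetical helper: splits its first argument using whatever IFS is in
# effect for the duration of the call, storing the result in arrayIN.
split_into_arrayIN() {
    set -f                 # keep glob characters literal while splitting
    arrayIN=($1)
    set +f
}

before=$IFS
IFS=';' split_into_arrayIN "$IN"   # IFS=';' lasts only for this call

printf '> [%s]\n' "${arrayIN[@]}"
[ "$IFS" = "$before" ] && echo "IFS restored automatically"
```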

gniourf_gniourf , Feb 20, 2015 at 16:45

!"#$%&/()[]{}*? are no problem well... not quite: []*? are glob characters. So what about creating this directory and file: `mkdir '!"#$%&'; touch '!"#$%&/()[]{} got you hahahaha - are no problem' and running your command? simple may be beautiful, but when it's broken, it's broken. – gniourf_gniourf Feb 20 '15 at 16:45

ajaaskel , Feb 25, 2015 at 7:20

@gniourf_gniourf The string is stored in a variable. Please see the original question. – ajaaskel Feb 25 '15 at 7:20

gniourf_gniourf , Feb 25, 2015 at 7:26

@ajaaskel you didn't fully understand my comment. Go in a scratch directory and issue these commands: mkdir '!"#$%&'; touch '!"#$%&/()[]{} got you hahahaha - are no problem' . They will only create a directory and a file, with weird looking names, I must admit. Then run your commands with the exact IN you gave: IN='bla@some.com;john@home.com;Charlie Brown <cbrown@acme.com;!"#$%&/()[]{}*? are no problem;simple is beautiful :-)' . You'll see that you won't get the output you expect. Because you're using a method subject to pathname expansions to split your string. – gniourf_gniourf Feb 25 '15 at 7:26

gniourf_gniourf , Feb 25, 2015 at 7:29

This is to demonstrate that the characters * , ? , [...] and even, if extglob is set, !(...) , @(...) , ?(...) , +(...) are problems with this method! – gniourf_gniourf Feb 25 '15 at 7:29

ajaaskel , Feb 26, 2015 at 15:26

@gniourf_gniourf Thanks for detailed comments on globbing. I adjusted the code to have globbing off. My point was however just to show that rather simple assignment can do the splitting job. – ajaaskel Feb 26 '15 at 15:26

> , Dec 19, 2013 at 21:39

Maybe not the most elegant solution, but works with * and spaces:
IN="bla@so me.com;*;john@home.com"
for i in `delims=${IN//[^;]}; seq 1 $((${#delims} + 1))`
do
   echo "> [`echo $IN | cut -d';' -f$i`]"
done

Outputs

> [bla@so me.com]
> [*]
> [john@home.com]

Other example (delimiters at beginning and end):

IN=";bla@so me.com;*;john@home.com;"
> []
> [bla@so me.com]
> [*]
> [john@home.com]
> []

Basically it removes every character other than ; making delims eg. ;;; . Then it does for loop from 1 to number-of-delimiters as counted by ${#delims} . The final step is to safely get the $i th part using cut .
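The same fields can be produced without cut, seq, or backquotes. The following is a minimal sketch, assuming bash 3.1 or later; the function name split_fields and the global array fields are illustrative names, not part of the original answer. It relies only on quoted parameter expansion, so no field is subject to word splitting or pathname expansion, and empty leading and trailing fields are kept.

```shell
# Split a string on a delimiter using only parameter expansion.
# Result is left in the global array "fields" (hypothetical name).
split_fields() {
    local rest=$1 delim=$2 head
    fields=()
    while [[ $rest == *"$delim"* ]]; do
        head=${rest%%"$delim"*}     # text before the first delimiter
        fields+=("$head")
        rest=${rest#*"$delim"}      # drop that text and the delimiter
    done
    fields+=("$rest")               # whatever follows the last delimiter
}

split_fields ';bla@so me.com;*;john@home.com;' ';'
for f in "${fields[@]}"; do
    printf '> [%s]\n' "$f"
done
```

The loop prints one bracketed line per field, including the empty first and last fields.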

[Oct 20, 2017] Simple logical operators in Bash - Stack Overflow

Notable quotes:
"... Backquotes ( ` ` ) are old-style form of command substitution, with some differences: in this form, backslash retains its literal meaning except when followed by $ , ` , or \ , and the first backquote not preceded by a backslash terminates the command substitution; whereas in the $( ) form, all characters between the parentheses make up the command, none are treated specially. ..."
"... Double square brackets delimit a Conditional Expression. And, I find the following to be a good reading on the subject: "(IBM) Demystify test, [, [[, ((, and if-then-else" ..."
Oct 20, 2017 | stackoverflow.com

Amit , Jun 7, 2011 at 19:18

I have a couple of variables and I want to check the following condition (written out in words, then my failed attempt at bash scripting):
if varA EQUALS 1 AND ( varB EQUALS "t1" OR varB EQUALS "t2" ) then 

do something

done.

And in my failed attempt, I came up with:

if (($varA == 1)) && ( (($varB == "t1")) || (($varC == "t2")) ); 
  then
    scale=0.05
  fi

Best answer Gilles

What you've written actually almost works (it would work if all the variables were numbers), but it's not an idiomatic way at all.

This is the idiomatic way to write your test in bash:

if [[ $varA = 1 && ($varB = "t1" || $varC = "t2") ]]; then

If you need portability to other shells, this would be the way (note the additional quoting and the separate sets of brackets around each individual test):

if [ "$varA" = 1 ] && { [ "$varB" = "t1" ] || [ "$varC" = "t2" ]; }; then
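To compare the two forms side by side, here is a small sketch that wraps each test in a function. The function names check_bash and check_posix are made up for this illustration; the conditions are the ones from the answer above.

```shell
# Bash-only form: &&, || and grouping parentheses live inside one [[ ]].
check_bash() {
    local varA=$1 varB=$2 varC=$3
    if [[ $varA = 1 && ($varB = "t1" || $varC = "t2") ]]; then
        echo yes
    else
        echo no
    fi
}

# Portable form: one [ ] per test, grouped with { ...; } (no subshell).
check_posix() {
    varA=$1 varB=$2 varC=$3
    if [ "$varA" = 1 ] && { [ "$varB" = "t1" ] || [ "$varC" = "t2" ]; }; then
        echo yes
    else
        echo no
    fi
}

check_bash 1 t1 x      # first and second condition hold
check_posix 1 x t2     # first and third condition hold
check_bash 2 t1 t2     # varA is not 1
```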

Will Sheppard , Jun 19, 2014 at 11:07

It's better to use == to differentiate the comparison from assigning a variable (which is also = ) – Will Sheppard Jun 19 '14 at 11:07

Cbhihe , Apr 3, 2016 at 8:05

+1 @WillSheppard for yr reminder of proper style. Gilles, don't you need a semicolon after yr closing curly bracket and before "then" ? I always thought if , then , else and fi could not be on the same line... As in:

if [ "$varA" = 1 ] && { [ "$varB" = "t1" ] || [ "$varC" = "t2" ]; }; then

– Cbhihe Apr 3 '16 at 8:05

Rockallite , Jan 19 at 2:41

Backquotes ( ` ` ) are old-style form of command substitution, with some differences: in this form, backslash retains its literal meaning except when followed by $ , ` , or \ , and the first backquote not preceded by a backslash terminates the command substitution; whereas in the $( ) form, all characters between the parentheses make up the command, none are treated specially.

– Rockallite Jan 19 at 2:41

Peter A. Schneider , Aug 28 at 13:16

You could emphasize that single brackets have completely different semantics inside and outside of double brackets. (Because you start with explicitly pointing out the subshell semantics but then only as an aside mention the grouping semantics as part of conditional expressions. Was confusing to me for a second when I looked at your idiomatic example.) – Peter A. Schneider Aug 28 at 13:16

matchew , Jun 7, 2011 at 19:29

very close
if (( $varA == 1 )) && [[ $varB == 't1' || $varC == 't2' ]]; 
  then 
    scale=0.05
  fi

should work.

breaking it down

(( $varA == 1 ))

is an integer comparison, whereas

$varB == 't1'

is a string comparison. otherwise, I am just grouping the comparisons correctly.

Double square brackets delimit a Conditional Expression. And, I find the following to be a good reading on the subject: "(IBM) Demystify test, [, [[, ((, and if-then-else"

Peter A. Schneider , Aug 28 at 13:21

Just to be sure: The quoting in 't1' is unnecessary, right? Because as opposed to arithmetic instructions in double parentheses, where t1 would be a variable, t1 in a conditional expression in double brackets is just a literal string.

I.e., [[ $varB == 't1' ]] is exactly the same as [[ $varB == t1 ]] , right? – Peter A. Schneider Aug 28 at 13:21
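A quick way to check this is to run both forms. The sketch below (variable names are illustrative) also shows where quoting does start to matter inside [[ ]]: an unquoted right-hand side of == is treated as a pattern, so quoting changes the result as soon as the string contains glob characters.

```shell
varB=t1

# For a plain word, quoted and unquoted right-hand sides behave the same.
if [[ $varB == 't1' ]]; then quoted=match;   else quoted=differ;   fi
if [[ $varB == t1 ]];   then unquoted=match; else unquoted=differ; fi

# With a glob character the two forms diverge:
if [[ $varB == t* ]];   then glob=match;    else glob=differ;    fi  # pattern match
if [[ $varB == 't*' ]]; then literal=match; else literal=differ; fi  # literal "t*"

echo "$quoted $unquoted $glob $literal"
```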

[Oct 20, 2017] shell script - OR in `expr match`

Notable quotes:
"... ...and if you weren't targeting a known/fixed operating system, using case rather than a regex match is very much the better practice, since the accepted answer depends on behavior POSIX doesn't define. ..."
"... Regular expression syntax, including the use of backquoting, is different for different tools. Always look it up. ..."
Oct 20, 2017 | unix.stackexchange.com

stracktracer , Dec 14, 2015 at 13:54

I'm confused as to why this does not match:

expr match Unauthenticated123 '^(Unauthenticated|Authenticated).*'

it outputs 0.

Charles Duffy , Dec 14, 2015 at 18:22

As an aside, if you were using bash for this, the preferred alternative would be the =~ operator in [[ ]] , i.e. [[ Unauthenticated123 =~ ^(Unauthenticated|Authenticated) ]] – Charles Duffy Dec 14 '15 at 18:22
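As a sketch of that suggestion (bash 3.2 or later assumed; the variable names string, re and prefix are illustrative), the regex can be kept in a variable and the matched alternative read back from BASH_REMATCH:

```shell
string=Unauthenticated123
# Keeping the regex in a variable avoids quoting surprises inside [[ ]].
re='^(Unauthenticated|Authenticated)'

if [[ $string =~ $re ]]; then
    prefix=${BASH_REMATCH[1]}    # text matched by the first parenthesized group
else
    prefix=
fi
printf '%s\n' "$prefix"
```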

Charles Duffy , Dec 14, 2015 at 18:25

...and if you weren't targeting a known/fixed operating system, using case rather than a regex match is very much the better practice, since the accepted answer depends on behavior POSIX doesn't define. – Charles Duffy Dec 14 '15 at 18:25

Gilles , Dec 14, 2015 at 23:43

See "Why does my regular expression work in X but not in Y?" – Gilles Dec 14 '15 at 23:43

Lambert , Dec 14, 2015 at 14:04

Your command should be:
expr match Unauthenticated123 'Unauthenticated\|Authenticated'

If you want the number of characters matched.

To have the part of the string (Unauthenticated) returned use:

expr match Unauthenticated123 '\(Unauthenticated\|Authenticated\)'

From info coreutils 'expr invocation' :

`STRING : REGEX'
 Perform pattern matching. The arguments are converted to strings and the second is considered to be a (basic, a la GNU `grep') regular expression, with a `^' implicitly prepended. The first argument is then matched against this regular expression.

 If the match succeeds and REGEX uses `\(' and `\)', the `:'
 expression returns the part of STRING that matched the
 subexpression; otherwise, it returns the number of characters
 matched.

 If the match fails, the `:' operator returns the null string if
 `\(' and `\)' are used in REGEX, otherwise 0.

 Only the first `\( ... \)' pair is relevant to the return value;
 additional pairs are meaningful only for grouping the regular
 expression operators.

 In the regular expression, `\+', `\?', and `\|' are operators
 which respectively match one or more, zero or one, or separate
 alternatives.  SunOS and other `expr''s treat these as regular
 characters.  (POSIX allows either behavior.)  *Note Regular
 Expression Library: (regex)Top, for details of regular expression
 syntax.  Some examples are in *note Examples of expr::.
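The two return modes described above can be seen with the standard `:` operator alone. This is a small sketch that deliberately avoids the GNU-only `match` keyword and `\|`; the variable names count, part and miss are illustrative.

```shell
# No \( \) in the regex: expr prints the number of characters matched.
count=$(expr "Unauthenticated123" : 'Unauthenticated')

# With \( \): expr prints the text matched by the group instead.
part=$(expr "Unauthenticated123" : '\(Un[[:lower:]]*\)')

# A failed match without \( \) prints 0 and expr exits with status 1.
miss=$(expr "Guest123" : 'Unauthenticated') || true

echo "$count $part $miss"
```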

stracktracer , Dec 14, 2015 at 14:18

Thanks escaping the | worked. Weird, normally I'd expect it if I wanted to match the literal |... – stracktracer Dec 14 '15 at 14:18

reinierpost , Dec 14, 2015 at 15:34

Regular expression syntax, including the use of backquoting, is different for different tools. Always look it up. – reinierpost Dec 14 '15 at 15:34

Stéphane Chazelas , Dec 14, 2015 at 14:49

Note that both match and \| are GNU extensions (and the behaviour for : (the match standard equivalent) when the pattern starts with ^ varies with implementations). Standardly, you'd do:
expr " $string" : " Authenticated" '|' " $string" : " Unauthenticated"

The leading space is to avoid problems with values of $string that start with - or are expr operators, but that means it adds one to the number of characters being matched.

With GNU expr , you'd write it:

expr + "$string" : 'Authenticated\|Unauthenticated'

The + forces $string to be taken as a string even if it happens to be a expr operator. expr regular expressions are basic regular expressions which don't have an alternation operator (and where | is not special). The GNU implementation has it as \| though as an extension.

If all you want is to check whether $string starts with Authenticated or Unauthenticated , you'd better use:

case $string in
  (Authenticated* | Unauthenticated*) do-something
esac

netmonk , Dec 14, 2015 at 14:06

$ expr match "Unauthenticated123" '^\(Unauthenticated\|Authenticated\).*' you have to escape with \ the parenthesis and the pipe.

mikeserv , Dec 14, 2015 at 14:18

and the ^ may not mean what some would think depending on the expr . it is implied anyway. – mikeserv Dec 14 '15 at 14:18

Stéphane Chazelas , Dec 14, 2015 at 14:34

@mikeserv, match and \| are GNU extensions anyway. This Q&A seems to be about GNU expr anyway (where ^ is guaranteed to mean match at the beginning of the string ). – Stéphane Chazelas Dec 14 '15 at 14:34

mikeserv , Dec 14, 2015 at 14:49

@StéphaneChazelas - i didn't know they were strictly GNU. i think i remember them being explicitly officially unspecified - but i don't use expr too often anyway and didn't know that. thank you. – mikeserv Dec 14 '15 at 14:49

Random832 , Dec 14, 2015 at 16:13

It's not "strictly GNU" - it's present in a number of historical implementations (even System V had it, undocumented, though it didn't have the others like substr/length/index), which is why it's explicitly unspecified. I can't find anything about \| being an extension. – Random832 Dec 14 '15 at 16:13

[Oct 17, 2017] Converting string to lower case in Bash - Stack Overflow

Feb 15, 2010 | stackoverflow.com

assassin , Feb 15, 2010 at 7:02

Is there a way in bash to convert a string into a lower case string?

For example, if I have:

a="Hi all"

I want to convert it to:

"hi all"

ghostdog74 , Feb 15, 2010 at 7:43

There are various ways:

tr
$ echo "$a" | tr '[:upper:]' '[:lower:]'
hi all
AWK
$ echo "$a" | awk '{print tolower($0)}'
hi all
Bash 4.0
$ echo "${a,,}"
hi all
Perl
$ echo "$a" | perl -ne 'print lc'
hi all
Bash
lc(){
    case "$1" in
        [A-Z])
        n=$(printf "%d" "'$1")
        n=$((n+32))
        printf \\$(printf "%o" "$n")
        ;;
        *)
        printf "%s" "$1"
        ;;
    esac
}
word="I Love Bash"
for((i=0;i<${#word};i++))
do
    ch="${word:$i:1}"
    lc "$ch"
done

jangosteve , Jan 14, 2012 at 21:58

Am I missing something, or does your last example (in Bash) actually do something completely different? It works for "ABX", but if you instead make word="Hi All" like the other examples, it returns ha , not hi all . It only works for the capitalized letters and skips the already-lowercased letters. – jangosteve Jan 14 '12 at 21:58

Richard Hansen , Feb 3, 2012 at 18:55

Note that only the tr and awk examples are specified in the POSIX standard. – Richard Hansen Feb 3 '12 at 18:55

Richard Hansen , Feb 3, 2012 at 18:58

tr '[:upper:]' '[:lower:]' will use the current locale to determine uppercase/lowercase equivalents, so it'll work with locales that use letters with diacritical marks. – Richard Hansen Feb 3 '12 at 18:58

Adam Parkin , Sep 25, 2012 at 18:01

How does one get the output into a new variable? Ie say I want the lowercased string into a new variable? – Adam Parkin Sep 25 '12 at 18:01

Tino , Nov 14, 2012 at 15:39

@Adam: b="$(echo $a | tr '[A-Z]' '[a-z]')" – Tino Nov 14 '12 at 15:39
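A slightly more defensive sketch of the same idea: quoting "$a" keeps its whitespace intact, printf avoids echo's handling of backslashes and leading dashes, and the POSIX character classes follow the current locale instead of assuming A-Z.

```shell
a="Hi   All"

# Command substitution captures the pipeline's output in a new variable.
b=$(printf '%s' "$a" | tr '[:upper:]' '[:lower:]')

printf '%s\n' "$b"
```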

Dennis Williamson , Feb 15, 2010 at 10:31

In Bash 4:

To lowercase

$ string="A FEW WORDS"
$ echo "${string,}"
a FEW WORDS
$ echo "${string,,}"
a few words
$ echo "${string,,[AEIUO]}"
a FeW WoRDS

$ string="A Few Words"
$ declare -l string
$ string=$string; echo "$string"
a few words

To uppercase

$ string="a few words"
$ echo "${string^}"
A few words
$ echo "${string^^}"
A FEW WORDS
$ echo "${string^^[aeiou]}"
A fEw wOrds

$ string="A Few Words"
$ declare -u string
$ string=$string; echo "$string"
A FEW WORDS

Toggle (undocumented, but optionally configurable at compile time)

$ string="A Few Words"
$ echo "${string~~}"
a fEW wORDS
$ string="A FEW WORDS"
$ echo "${string~}"
a FEW WORDS
$ string="a few words"
$ echo "${string~}"
A few words

Capitalize (undocumented, but optionally configurable at compile time)

$ string="a few words"
$ declare -c string
$ string=$string
$ echo "$string"
A few words

Title case:

$ string="a few words"
$ string=($string)
$ string="${string[@]^}"
$ echo "$string"
A Few Words

$ declare -c string
$ string=(a few words)
$ echo "${string[@]}"
A Few Words

$ string="a FeW WOrdS"
$ string=${string,,}
$ string=${string~}
$ echo "$string"
A few words

To turn off a declare attribute, use + . For example, declare +c string . This affects subsequent assignments and not the current value.

The declare options change the attribute of the variable, but not the contents. The reassignments in my examples update the contents to show the changes.
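The point about attributes versus contents can be checked with a short sketch (bash 4 or later assumed; the variable names are illustrative). The -l attribute converts values at assignment time, and removing it with +l leaves the stored value alone.

```shell
declare -l name            # from now on, assignments to name are lowercased
name="MiXed Case"
with_attr=$name            # value was converted when it was assigned

declare +l name            # drop the attribute; the stored value is unchanged
after_drop=$name

name="MiXed Case"          # new assignment is no longer converted
without_attr=$name

printf '%s\n' "$with_attr" "$after_drop" "$without_attr"
```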

Edit:

Added "toggle first character by word" ( ${var~} ) as suggested by ghostdog74

Edit: Corrected tilde behavior to match Bash 4.3.

ghostdog74 , Feb 15, 2010 at 10:52

there's also ${string~} – ghostdog74 Feb 15 '10 at 10:52

Hubert Kario , Jul 12, 2012 at 16:48

Quite bizarre, "^^" and ",," operators don't work on non-ASCII characters but "~~" does... So string="łσdź"; echo ${string~~} will return "ŁΣDŹ", but echo ${string^^} returns "łσDź". Even in LC_ALL=pl_PL.utf-8 . That's using bash 4.2.24. – Hubert Kario Jul 12 '12 at 16:48

Dennis Williamson , Jul 12, 2012 at 18:20

@HubertKario: That's weird. It's the same for me in Bash 4.0.33 with the same string in en_US.UTF-8 . It's a bug and I've reported it. – Dennis Williamson Jul 12 '12 at 18:20

Dennis Williamson , Jul 13, 2012 at 0:44

@HubertKario: Try echo "$string" | tr '[:lower:]' '[:upper:]' . It will probably exhibit the same failure. So the problem is at least partly not Bash's. – Dennis Williamson Jul 13 '12 at 0:44

Dennis Williamson , Jul 14, 2012 at 14:27

@HubertKario: The Bash maintainer has acknowledged the bug and stated that it will be fixed in the next release. – Dennis Williamson Jul 14 '12 at 14:27

shuvalov , Feb 15, 2010 at 7:13

echo "Hi All" | tr "[:upper:]" "[:lower:]"

Richard Hansen , Feb 3, 2012 at 19:00

+1 for not assuming english – Richard Hansen Feb 3 '12 at 19:00

Hubert Kario , Jul 12, 2012 at 16:56

@RichardHansen: tr doesn't work for me for non-ASCII characters. I do have correct locale set and locale files generated. Have any idea what I could be doing wrong? – Hubert Kario Jul 12 '12 at 16:56

wasatchwizard , Oct 23, 2014 at 16:42

FYI: This worked on Windows/Msys. Some of the other suggestions did not. – wasatchwizard Oct 23 '14 at 16:42

Ignacio Vazquez-Abrams , Feb 15, 2010 at 7:03

tr :
a="$(tr [A-Z] [a-z] <<< "$a")"
AWK :
{ print tolower($0) }
sed :
y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/

Sandeepan Nath , Feb 2, 2011 at 11:12

+1 a="$(tr [A-Z] [a-z] <<< "$a")" looks easiest to me. I am still a beginner... – Sandeepan Nath Feb 2 '11 at 11:12

Haravikk , Oct 19, 2013 at 12:54

I strongly recommend the sed solution; I've been working in an environment that for some reason doesn't have tr but I've yet to find a system without sed , plus a lot of the time I want to do this I've just done something else in sed anyway so can chain the commands together into a single (long) statement. – Haravikk Oct 19 '13 at 12:54

Dennis , Nov 6, 2013 at 19:49

The bracket expressions should be quoted. In tr [A-Z] [a-z] A , the shell may perform filename expansion if there are filenames consisting of a single letter or nullgob is set. tr "[A-Z]" "[a-z]" A will behave properly. – Dennis Nov 6 '13 at 19:49

Haravikk , Jun 15, 2014 at 10:51

@CamiloMartin it's a BusyBox system where I'm having that problem, specifically Synology NASes, but I've encountered it on a few other systems too. I've been doing a lot of cross-platform shell scripting lately, and with the requirement that nothing extra be installed it makes things very tricky! However I've yet to encounter a system without sed – Haravikk Jun 15 '14 at 10:51

fuz , Jan 31, 2016 at 14:54

Note that tr [A-Z] [a-z] is incorrect in almost all locales. for example, in the en-US locale, A-Z is actually the interval AaBbCcDdEeFfGgHh...XxYyZ . – fuz Jan 31 '16 at 14:54
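The globbing hazard is easy to reproduce in a scratch directory. The sketch below assumes default shell options (no nullglob or failglob) and uses mktemp to avoid touching real files; the variable names are illustrative.

```shell
demo_dir=$(mktemp -d)

# A file named "a" makes the unquoted [a-z] expand to that file name,
# so tr no longer receives the sets the author intended.
unquoted=$(cd "$demo_dir" && touch a && echo "ABC" | tr [A-Z] [a-z])

# Quoted POSIX classes are immune to both globbing and locale ranges.
quoted=$(echo "ABC" | tr '[:upper:]' '[:lower:]')

rm -rf "$demo_dir"
printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"
```

The unquoted result depends on the locale and on which files happen to exist, which is exactly the problem; the quoted form gives abc.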

nettux443 , May 14, 2014 at 9:36

I know this is an oldish post but I made this answer for another site so I thought I'd post it up here:

UPPER -> lower : use python:

b=`echo "print '$a'.lower()" | python`

Or Ruby:

b=`echo "print '$a'.downcase" | ruby`

Or Perl (probably my favorite):

b=`perl -e "print lc('$a');"`

Or PHP:

b=`php -r "print strtolower('$a');"`

Or Awk:

b=`echo "$a" | awk '{ print tolower($1) }'`

Or Sed:

b=`echo "$a" | sed 's/./\L&/g'`

Or Bash 4:

b=${a,,}

Or NodeJS if you have it (and are a bit nuts...):

b=`echo "console.log('$a'.toLowerCase());" | node`

You could also use dd (but I wouldn't!):

b=`echo "$a" | dd  conv=lcase 2> /dev/null`

lower -> UPPER

use python:

b=`echo "print '$a'.upper()" | python`

Or Ruby:

b=`echo "print '$a'.upcase" | ruby`

Or Perl (probably my favorite):

b=`perl -e "print uc('$a');"`

Or PHP:

b=`php -r "print strtoupper('$a');"`

Or Awk:

b=`echo "$a" | awk '{ print toupper($1) }'`

Or Sed:

b=`echo "$a" | sed 's/./\U&/g'`

Or Bash 4:

b=${a^^}

Or NodeJS if you have it (and are a bit nuts...):

b=`echo "console.log('$a'.toUpperCase());" | node`

You could also use dd (but I wouldn't!):

b=`echo "$a" | dd  conv=ucase 2> /dev/null`

Also when you say 'shell' I'm assuming you mean bash but if you can use zsh it's as easy as

b=$a:l

for lower case and

b=$a:u

for upper case.

JESii , May 28, 2015 at 21:42

Neither the sed command nor the bash command worked for me. – JESii May 28 '15 at 21:42

nettux443 , Nov 20, 2015 at 14:33

@JESii both work for me upper -> lower and lower-> upper. I'm using sed 4.2.2 and Bash 4.3.42(1) on 64bit Debian Stretch. – nettux443 Nov 20 '15 at 14:33

JESii , Nov 21, 2015 at 17:34

Hi, @nettux443... I just tried the bash operation again and it still fails for me with the error message "bad substitution". I'm on OSX using homebrew's bash: GNU bash, version 4.3.42(1)-release (x86_64-apple-darwin14.5.0) – JESii Nov 21 '15 at 17:34

tripleee , Jan 16, 2016 at 11:45

Do not use! All of the examples which generate a script are extremely brittle; if the value of a contains a single quote, you have not only broken behavior, but a serious security problem. – tripleee Jan 16 '16 at 11:45
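One way to avoid the problem is to pass the value as data rather than pasting it into generated program text. The following sketch feeds the string to awk on standard input, so quotes in the value are never parsed as code; the sample value is made up for the illustration.

```shell
a="It's A 'Quoted' Value"

# The awk program is a fixed single-quoted string; $a only travels as input.
b=$(printf '%s\n' "$a" | awk '{ print tolower($0) }')

printf '%s\n' "$b"
```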

Scott Smedley , Jan 27, 2011 at 5:37

In zsh:
echo $a:u

Gotta love zsh!

Scott Smedley , Jan 27, 2011 at 5:39

or $a:l for lower case conversion – Scott Smedley Jan 27 '11 at 5:39

biocyberman , Jul 24, 2015 at 23:26

Add one more case: echo ${(C)a} #Upcase the first char only – biocyberman Jul 24 '15 at 23:26

devnull , Sep 26, 2013 at 15:45

Using GNU sed :
sed 's/.*/\L&/'

Example:

$ foo="Some STRIng";
$ foo=$(echo "$foo" | sed 's/.*/\L&/')
$ echo "$foo"
some string

technosaurus , Jan 21, 2012 at 10:27

For a standard shell (without bashisms) using only builtins:
uppers=ABCDEFGHIJKLMNOPQRSTUVWXYZ
lowers=abcdefghijklmnopqrstuvwxyz

lc(){ #usage: lc "SOME STRING" -> "some string"
    i=0
    while ([ $i -lt ${#1} ]) do
        CUR=${1:$i:1}
        case $uppers in
            *$CUR*)CUR=${uppers%$CUR*};OUTPUT="${OUTPUT}${lowers:${#CUR}:1}";;
            *)OUTPUT="${OUTPUT}$CUR";;
        esac
        i=$((i+1))
    done
    echo "${OUTPUT}"
}

And for upper case:

uc(){ #usage: uc "some string" -> "SOME STRING"
    i=0
    while ([ $i -lt ${#1} ]) do
        CUR=${1:$i:1}
        case $lowers in
            *$CUR*)CUR=${lowers%$CUR*};OUTPUT="${OUTPUT}${uppers:${#CUR}:1}";;
            *)OUTPUT="${OUTPUT}$CUR";;
        esac
        i=$((i+1))
    done
    echo "${OUTPUT}"
}

Dereckson , Nov 23, 2014 at 19:52

I wonder if you didn't let some bashism in this script, as it's not portable on FreeBSD sh: ${1:$...}: Bad substitution – Dereckson Nov 23 '14 at 19:52

tripleee , Apr 14, 2015 at 7:09

Indeed; substrings with ${var:1:1} are a Bashism. – tripleee Apr 14 '15 at 7:09
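For comparison, here is a sketch that stays within POSIX parameter expansion (no ${var:offset:length}). It handles ASCII letters only, and the names lc_posix, uppers and map are illustrative. The first character is peeled off with ${rest#?}, and an interleaved alphabet supplies the lower-case twin of each upper-case letter.

```shell
lc_posix() {
    uppers=ABCDEFGHIJKLMNOPQRSTUVWXYZ
    map=AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz
    rest=$1
    out=
    while [ -n "$rest" ]; do
        tail=${rest#?}             # everything after the first character
        ch=${rest%"$tail"}         # the first character itself
        case $uppers in
            *"$ch"*)
                after=${map#*"$ch"}         # text following the letter in map
                ch=${after%"${after#?}"}    # its first character: the lower-case twin
                ;;
        esac
        out=$out$ch
        rest=$tail
    done
    printf '%s\n' "$out"
}

lc_posix "Hi All *?"
```

In practice tr '[:upper:]' '[:lower:]' is simpler; the loop is only of interest when no external command may be used.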

Derek Shaw , Jan 24, 2011 at 13:53

Regular expression

I would like to take credit for the command I wish to share, but the truth is I obtained it for my own use from http://commandlinefu.com . If you cd to any directory within your own home folder and run it, it will change all file and folder names to lower case recursively, so please use it with caution. It is a brilliant command line fix and especially useful for those multitudes of albums you have stored on your drive.

find . -depth -exec rename 's/(.*)\/([^\/]*)/$1\/\L$2/' {} \;

You can specify a directory in place of the dot(.) after the find which denotes current directory or full path.

I hope this solution proves useful. The one thing this command does not do is replace spaces with underscores; oh well, another time perhaps.

Wadih M. , Nov 29, 2011 at 1:31

thanks for commandlinefu.com – Wadih M. Nov 29 '11 at 1:31

John Rix , Jun 26, 2013 at 15:58

This didn't work for me for whatever reason, though it looks fine. I did get this to work as an alternative though: find . -exec /bin/bash -c 'mv {} `tr [A-Z] [a-z] <<< {}`' \; – John Rix Jun 26 '13 at 15:58

Tino , Dec 11, 2015 at 16:27

This needs prename from perl : dpkg -S "$(readlink -e /usr/bin/rename)" gives perl: /usr/bin/prename – Tino Dec 11 '15 at 16:27

c4f4t0r , Aug 21, 2013 at 10:21

In bash 4 you can use typeset

Example:

A="HELLO WORLD"
typeset -l A=$A

community wiki, Jan 16, 2016 at 12:26

Pre Bash 4.0

Bash Lower the Case of a string and assign to variable

VARIABLE=$(echo "$VARIABLE" | tr '[:upper:]' '[:lower:]') 

echo "$VARIABLE"

Tino , Dec 11, 2015 at 16:23

No need for echo and pipes: use $(tr '[:upper:]' '[:lower:]' <<<"$VARIABLE") – Tino Dec 11 '15 at 16:23

tripleee , Jan 16, 2016 at 12:28

@Tino The here string is also not portable back to really old versions of Bash; I believe it was introduced in v3. – tripleee Jan 16 '16 at 12:28

Tino , Jan 17, 2016 at 14:28

@tripleee You are right, it was introduced in bash-2.05b - however that's the oldest bash I was able to find on my systems – Tino Jan 17 '16 at 14:28

Bikesh M Annur , Mar 23 at 6:48

You can try this
s="Hello World!" 

echo $s  # Hello World!

a=${s,,}
echo $a  # hello world!

b=${s^^}
echo $b  # HELLO WORLD!

ref : http://wiki.workassis.com/shell-script-convert-text-to-lowercase-and-uppercase/

Orwellophile , Mar 24, 2013 at 13:43

For Bash versions earlier than 4.0, this version should be fastest (as it doesn't fork/exec any commands):
function string.monolithic.tolower
{
   local __word=$1
   local __len=${#__word}
   local __char
   local __octal
   local __decimal
   local __result

   for (( i=0; i<__len; i++ ))
   do
      __char=${__word:$i:1}
      case "$__char" in
         [A-Z] )
            printf -v __decimal '%d' "'$__char"
            printf -v __octal '%03o' $(( $__decimal ^ 0x20 ))
            printf -v __char \\$__octal
            ;;
      esac
      __result+="$__char"
   done
   REPLY="$__result"
}

technosaurus's answer had potential too, although it did run properly for me.

Stephen M. Harris , Mar 22, 2013 at 22:42

If using v4, this is baked-in . If not, here is a simple, widely applicable solution. Other answers (and comments) on this thread were quite helpful in creating the code below.
# Like echo, but converts to lowercase
echolcase () {
    tr [:upper:] [:lower:] <<< "${*}"
}

# Takes one arg by reference (var name) and makes it lowercase
lcase () { 
    eval "${1}"=\'$(echo ${!1//\'/"'\''"} | tr [:upper:] [:lower:] )\'
}
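If eval is a concern, a by-reference variant can be sketched with indirect expansion and printf -v (bash 3.1 or later assumed). The function name lcase_var and its internal variable names are hypothetical and not part of the answer above.

```shell
# Takes a variable NAME and lowercases that variable's value in place.
lcase_var() {
    local __name=$1 __value
    # ${!__name} reads the variable whose name is stored in __name.
    __value=$(printf '%s' "${!__name}" | tr '[:upper:]' '[:lower:]')
    # printf -v assigns to the named variable without eval.
    printf -v "$__name" '%s' "$__value"
}

title="It's MIXED Case"
lcase_var title
printf '%s\n' "$title"
```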

Notes:

JaredTS486 , Dec 23, 2015 at 17:37

In spite of how old this question is, and similar to this answer by technosaurus , I had a hard time finding a solution that was portable across most platforms (that I use) as well as older versions of bash. I have also been frustrated with arrays, functions and use of prints, echos and temporary files to retrieve trivial variables. This works very well for me so far, so I thought I would share. My main testing environments are:
  1. GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
  2. GNU bash, version 3.2.57(1)-release (sparc-sun-solaris2.10)
lcs="abcdefghijklmnopqrstuvwxyz"
ucs="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
input="Change Me To All Capitals"
for (( i=0; i<"${#input}"; i++ )) ; do :
    for (( j=0; j<"${#lcs}"; j++ )) ; do :
        if [[ "${input:$i:1}" == "${lcs:$j:1}" ]] ; then
            input="${input/${input:$i:1}/${ucs:$j:1}}" 
        fi
    done
done

Simple C-style for loop to iterate through the strings. For the line below if you have not seen anything like this before this is where I learned this . In this case the line checks if the char ${input:$i:1} (lower case) exists in input and if so replaces it with the given char ${ucs:$j:1} (upper case) and stores it back into input.

input="${input/${input:$i:1}/${ucs:$j:1}}"

Gus Neves , May 16 at 10:04

Many answers using external programs, which is not really using Bash .

If you know you will have Bash4 available you should really just use the ${VAR,,} notation (it is easy and cool). For Bash before 4 (My Mac still uses Bash 3.2 for example). I used the corrected version of @ghostdog74 's answer to create a more portable version.

It is one where you can call lowercase 'my STRING' and get a lowercase version. I read comments about setting the result to a var, but that is not really portable in Bash , since we can't return strings. Printing it is the best solution. It is easy to capture with something like var="$(lowercase $str)" .

How this works

The way this works is by getting the ASCII integer representation of each char with printf and then adding 32 if upper-to->lower , or subtracting 32 if lower-to->upper . Then use printf again to convert the number back to a char. From 'A' -to-> 'a' we have a difference of 32 chars.

Using printf to explain:

$ printf "%d\n" "'a"
97
$ printf "%d\n" "'A"
65

97 - 65 = 32
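The round trip for a single character can be sketched in three steps (ASCII assumed; the variable names code, octal and lower are illustrative): character to number, add the offset, number back to character via an octal escape.

```shell
code=$(printf '%d' "'A")               # numeric value of the character A
octal=$(printf '%o' $((code + 32)))    # shifted value, written in octal
lower=$(printf "\\$octal")             # printf turns \NNN back into a character

echo "$code -> $lower"
```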

And this is the working version with examples.
Please note the comments in the code, as they explain a lot of stuff:

#!/bin/bash

# lowerupper.sh

# Prints the lowercase version of a char
lowercaseChar(){
    case "$1" in
        [A-Z])
            n=$(printf "%d" "'$1")
            n=$((n+32))
            printf \\$(printf "%o" "$n")
            ;;
        *)
            printf "%s" "$1"
            ;;
    esac
}

# Prints the lowercase version of a sequence of strings
lowercase() {
    word="$@"
    for((i=0;i<${#word};i++)); do
        ch="${word:$i:1}"
        lowercaseChar "$ch"
    done
}

# Prints the uppercase version of a char
uppercaseChar(){
    case "$1" in
        [a-z])
            n=$(printf "%d" "'$1")
            n=$((n-32))
            printf \\$(printf "%o" "$n")
            ;;
        *)
            printf "%s" "$1"
            ;;
    esac
}

# Prints the uppercase version of a sequence of strings
uppercase() {
    word="$@"
    for((i=0;i<${#word};i++)); do
        ch="${word:$i:1}"
        uppercaseChar "$ch"
    done
}

# The functions will not add a new line, so use echo or
# append it if you want a new line after printing

# Printing stuff directly
lowercase "I AM the Walrus!"$'\n'
uppercase "I AM the Walrus!"$'\n'

echo "----------"

# Printing a var
str="A StRing WITH mixed sTUFF!"
lowercase "$str"$'\n'
uppercase "$str"$'\n'

echo "----------"

# Not quoting the var should also work, 
# since we use "$@" inside the functions
lowercase $str$'\n'
uppercase $str$'\n'

echo "----------"

# Assigning to a var
myLowerVar="$(lowercase $str)"
myUpperVar="$(uppercase $str)"
echo "myLowerVar: $myLowerVar"
echo "myUpperVar: $myUpperVar"

echo "----------"

# You can even do stuff like
if [[ 'option 2' = "$(lowercase 'OPTION 2')" ]]; then
    echo "Fine! All the same!"
else
    echo "Ops! Not the same!"
fi

exit 0

And the results after running this:

$ ./lowerupper.sh 
i am the walrus!
I AM THE WALRUS!
----------
a string with mixed stuff!
A STRING WITH MIXED STUFF!
----------
a string with mixed stuff!
A STRING WITH MIXED STUFF!
----------
myLowerVar: a string with mixed stuff!
myUpperVar: A STRING WITH MIXED STUFF!
----------
Fine! All the same!

This should only work for ASCII characters though.

For me it is fine, since I know I will only pass ASCII chars to it.
I am using this for some case-insensitive CLI options, for example.

nitinr708 , Jul 8, 2016 at 9:20

To store the transformed string into a variable. Following worked for me - $SOURCE_NAME to $TARGET_NAME
TARGET_NAME="`echo $SOURCE_NAME | tr '[:upper:]' '[:lower:]'`"

[Jun 18, 2017] An introduction to parameter expansion in Bash by James Pannacciulli

Jun 18, 2017 | opensource.com
About conditional, substring, and substitution parameter expansion operators

Conditional parameter expansion

Conditional parameter expansion allows branching on whether the parameter is unset, empty, or has content. Based on these conditions, the parameter can be expanded to its value, a default value, or an alternate value; throw a customizable error; or reassign the parameter to a default value. The following table shows the conditional parameter expansions-each row shows a parameter expansion using an operator to potentially modify the expansion, with the columns showing the result of that expansion given the parameter's status as indicated in the column headers. Operators with the ':' prefix treat parameters with empty values as if they were unset.

parameter expansion    unset var    var=""       var="gnu"
${var-default}         default      -            gnu
${var:-default}        default      default      gnu
${var+alternate}       -            alternate    alternate
${var:+alternate}      -            -            alternate
${var?error}           error        -            gnu
${var:?error}          error        error        gnu

The = and := operators function identically to - and :- , respectively, except that the = variants also assign the result of the expansion to the variable.
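The rows of the table can be reproduced directly. In this sketch the result variable names are illustrative; the comments note what each expansion yields.

```shell
unset var
unset_plain=${var-default}            # unset: expands to "default"

var=""
empty_plain=${var-default}            # set but empty: expands to ""
empty_colon=${var:-default}           # ':' treats empty like unset: "default"
empty_plus=${var+alternate}           # set (even if empty): "alternate"
empty_colon_plus=${var:+alternate}    # empty counts as unset: ""

: "${var:=gnu}"                       # empty, so var itself is assigned "gnu"
set_colon=${var:-default}             # var now has content: "gnu"

printf '[%s] [%s] [%s] [%s] [%s] [%s]\n' \
    "$unset_plain" "$empty_plain" "$empty_colon" \
    "$empty_plus" "$empty_colon_plus" "$set_colon"
```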

As an example, let's try opening a user's editor on a file specified by the OUT_FILE variable. If either the EDITOR environment variable or our OUT_FILE variable is not specified, we will have a problem. Using a conditional expansion, we can ensure that when the EDITOR variable is expanded, we get the specified value or at least a sane default:

$ echo ${EDITOR}
/usr/bin/vi
$ echo ${EDITOR:-$(which nano)}
/usr/bin/vi
$ unset EDITOR
$ echo ${EDITOR:-$(which nano)}
/usr/bin/nano

Building on the above, we can run the editor command and abort with a helpful error at runtime if there's no filename specified:

$ ${EDITOR:-$(which nano)} ${OUT_FILE:?Missing filename}
bash: OUT_FILE: Missing filename
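In a script, a failing :? expansion makes a non-interactive shell exit, so it is often used as a guard at the top of a script. The sketch below runs the check in a subshell so that the message and exit status can be inspected without ending the calling shell; the variable names msg and status are illustrative.

```shell
unset OUT_FILE

# The subshell exits when the expansion fails; stderr is captured for inspection.
if msg=$( ( : "${OUT_FILE:?Missing filename}" ) 2>&1 ); then
    status=0
else
    status=$?
fi

echo "status=$status"
echo "message=$msg"
```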
Substring parameter expansion

Parameters can be expanded to just part of their contents, either by offset or by removing content matching a pattern. When specifying a substring offset, a length may optionally be specified. If running Bash version 4.2 or greater, negative numbers may be used as offsets from the end of the string. Note the parentheses used around the negative offset, which ensure that Bash does not parse the expansion as having the conditional default expansion operator from above:

$ location="CA 90095"
$ echo "Zip Code: ${location:3}"
Zip Code: 90095
$ echo "Zip Code: ${location:(-5)}"
Zip Code: 90095
$ echo "State: ${location:0:2}"
State: CA

Another way to take a substring is to remove characters from the string matching a pattern, either from the left edge with the # and ## operators or from the right edge with the % and %% operators. A useful mnemonic is that # appears left of a comment and % appears right of a number. When the operator is doubled, it matches greedily, as opposed to the single version, which removes the most minimal set of characters matching the pattern.

var="open source"

parameter expansion     offset of 5, length of 4
${var:offset}           source
${var:offset:length}    sour

parameter expansion     pattern of *o?
${var#pattern}          en source
${var##pattern}         rce

parameter expansion     pattern of ?e*
${var%pattern}          open sour
${var%%pattern}         o
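The table entries can be checked by writing the offsets and patterns inline. This is a small sketch; the result variable names are illustrative.

```shell
var="open source"

by_offset=${var:5}          # from offset 5 to the end
by_length=${var:5:4}        # four characters starting at offset 5
left_short=${var#*o?}       # shortest left match of *o? removed
left_long=${var##*o?}       # longest left match of *o? removed
right_short=${var%?e*}      # shortest right match of ?e* removed
right_long=${var%%?e*}      # longest right match of ?e* removed

printf '[%s]\n' "$by_offset" "$by_length" "$left_short" \
    "$left_long" "$right_short" "$right_long"
```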

The pattern-matching used is the same as with filename globbing: * matches zero or more of any character, ? matches exactly one of any character, and [...] brackets introduce a character class match against a single character, supporting negation ( ^ ) as well as the POSIX character classes. By excising characters from our string in this manner, we can take a substring without first knowing the offset of the data we need:

$ echo $PATH
/usr/local/bin:/usr/bin:/bin
$ echo "Lowest priority in PATH: ${PATH##*:}"
Lowest priority in PATH: /bin
$ echo "Everything except lowest priority: ${PATH%:*}"
Everything except lowest priority: /usr/local/bin:/usr/bin
$ echo "Highest priority in PATH: ${PATH%%:*}"
Highest priority in PATH: /usr/local/bin

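
The same four operators can take apart other delimited strings. The sketch below splits a made-up user@host:path value; the input string and all variable names are assumptions chosen for the example.

```shell
#!/usr/bin/env bash
# Hypothetical input, chosen only for illustration.
entry="backup@example.org:/srv/data/archive.tar.gz"

user=${entry%%@*}    # longest "@*" removed from the right  -> backup
rest=${entry#*@}     # shortest "*@" removed from the left  -> example.org:/srv/data/archive.tar.gz
host=${rest%%:*}     # example.org
path=${rest#*:}      # /srv/data/archive.tar.gz
file=${path##*/}     # archive.tar.gz
ext=${file#*.}       # tar.gz (shortest "*." from the left)

printf '%s\n' "$user" "$host" "$path" "$file" "$ext"
```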
Substitution in parameter expansion

The same types of patterns are used for substitution in parameter expansion. Substitution is introduced with the / or // operators, followed by two arguments separated by another / representing the pattern and the string to substitute. The pattern matching is always greedy, so the doubled version of the operator, in this case, causes all matches of the pattern to be replaced in the variable's expansion, while the singleton version replaces only the leftmost.

var="free and open"

Parameter expansion      pattern of " " (a single space), string of _
${var/pattern/string}    free_and open
${var//pattern/string}   free_and_open
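
Written out as runnable lines, the table above looks roughly like this. The third expansion, with an empty replacement, is an extra case added here for illustration.

```shell
#!/usr/bin/env bash
var="free and open"

first=${var/ /_}     # replace only the leftmost space
all=${var// /_}      # replace every space
gone=${var// /}      # an empty replacement deletes the matches

echo "$first"   # free_and open
echo "$all"     # free_and_open
echo "$gone"    # freeandopen
```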

The wealth of parameter expansion modifiers transforms Bash variables and other parameters into powerful tools beyond simple value stores. At the very least, it is important to understand how parameter expansion works when reading Bash scripts, but I suspect that not unlike myself, many of you will enjoy the conciseness and expressiveness that these expansion modifiers bring to your scripts as well as your interactive sessions.

[Nov 04, 2016] Coding Style rear-rear Wiki

Reading rear sources is an interesting exercise. It really demonstrates an attempt to use a "reasonable" style of shell programming, and you can learn a lot from it.
Nov 04, 2016 | github.com

Relax-and-Recover is written in Bash (at least bash version 3 is needed), a language that can be used in many styles. We want to make it easier for everybody to understand the Relax-and-Recover code and subsequently to contribute fixes and enhancements.

Here is a collection of coding hints that should help to get a more consistent code base.

Don't be afraid to contribute to Relax-and-Recover even if your contribution does not fully match all these coding hints. Currently large parts of the Relax-and-Recover code are not yet in compliance with these coding hints. This is an ongoing step by step process. Nevertheless try to understand the idea behind these coding hints so that you know how to break them properly (i.e. "learn the rules so you know how to break them properly").

The overall idea behind these coding hints is:

Make yourself understood

Make yourself understood to enable others to fix and enhance your code properly as needed.

From this overall idea the following coding hints are derived.

For the fun of it, an extreme example of what coding style should be avoided:

#!/bin/bash
for i in `seq 1 2 $((2*$1-1))`;do echo $((j+=i));done



   

Try to find out what that code is about - it does a useful thing.

Code must be easy to read

Code should be easy to understand

Do not only tell what the code does (i.e. the implementation details) but also explain what the intent behind it is (i.e. why) to make the code maintainable.

Here is the initial example again, written so that one can understand what it is about:

#!/bin/bash
# output the first N square numbers
# by summing up the first N odd numbers 1 3 ... 2*N-1
# where each nth partial sum is the nth square number
# see https://en.wikipedia.org/wiki/Square_number#Properties
# this way it is a little bit faster for big N compared to
# calculating each square number on its own via multiplication
N=$1
if ! [[ $N =~ ^[0-9]+$ ]] ; then
    echo "Input must be non-negative integer." 1>&2
    exit 1
fi
square_number=0
for odd_number in $( seq 1 2 $(( 2 * N - 1 )) ) ; do
    (( square_number += odd_number )) && echo $square_number
done

Now the intent behind it is clear, and others can easily decide if that code is really the best way to do it and easily improve it if needed.

Try to care about possible errors

By default bash proceeds with the next command when something failed. Do not let your code blindly proceed in case of errors because that could make it hard to find the root cause of a failure when it errors out somewhere later at an unrelated place with a weird error message which could lead to false fixes that cure only a particular symptom but not the root cause.
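
A minimal sketch of what "do not blindly proceed" can look like in plain bash. The helper name die is an assumption for this example and is not one of the Relax-and-Recover functions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the problem on stderr and stop.
die() {
    echo "ERROR: $*" >&2
    exit 1
}

# Each step that must succeed gets its own check, so a failure is
# reported where it happens and not somewhere later.
workdir=$(mktemp -d) || die "cannot create a temporary directory"
cd "$workdir" || die "cannot change into $workdir"
echo "working in $workdir"
```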

Maintain Backward Compatibility

Implement adaptions and enhancements in a backward compatible way so that your changes do not cause regressions for others.

Dirty hacks welcome

When there are special issues on particular systems it is more important that the Relax-and-Recover code works than having nice looking clean code that sometimes fails. In such special cases any dirty hacks that intend to make it work everywhere are welcome. But for dirty hacks the above listed coding hints become mandatory rules:

For example a dirty hack like the following is perfectly acceptable:

# FIXME: Dirty hack to make it work
# on "FUBAR Linux version 666"
# where COMMAND sometimes inexplicably fails
# but always works after at most 3 attempts
# see http://example.org/issue12345
# Retries should have no bad effect on other systems
# where the first run of COMMAND works.
COMMAND || COMMAND || COMMAND || Error "COMMAND failed."

Character Encoding

Use only traditional (7-bit) ASCII characters. In particular do not use UTF-8 encoded multi-byte characters.

Text Layout

Variables

Functions

Relax-and-Recover functions

Use the available Relax-and-Recover functions when possible instead of re-implementing basic functionality again and again. The Relax-and-Recover functions are implemented in various lib/*-functions.sh files.

test, [, [[, ((

Paired parenthesis

See also

[Nov 01, 2014] How to determine if a string is a substring of another in bash?

I want to see if a string is inside a portion of another string, e.g.:
'ab' in 'abc' -> true
'ab' in 'bcd' -> false

How can I do this in a conditional of a bash script?

A: You can use the form ${VAR/subs} where VAR contains the bigger string and subs is the substring you are trying to find:

my_string=abc
substring=ab
if [ "${my_string/$substring}" = "$my_string" ] ; then
  echo "${substring} is not in ${my_string}"
else
  echo "${substring} was found in ${my_string}"
fi

This works because ${VAR/subs} is equal to $VAR but with the first occurrence of the string subs removed; in particular, if $VAR does not contain the word subs it won't be modified.

I think that you should change the sequence of the echo statements. Because I get ab is not in abc –

Mmm.. No, the script is wrong. Like that I get ab was found in abc, but if I use substring=z I get z was found in abc – Lucio May 25 '13 at 0:08

===

Sorry again I forgot the $ in substring. – edwin May 25 '13 at 0:10

Now I get ab is not in abc. But z was found in abc. This is funny :D – Lucio May 25 '13 at 0:11

===

[[ "bcd" =~ "ab" ]]
[[ "abc" =~ "ab" ]]

the brackets are for the test, and as it is double brackets, it can do some extra tests like =~.

So you could use this form something like

var1="ab"
var2="bcd"
if [[ "$var2" =~ "$var1" ]]; then
    echo "pass"
else
    echo "fail"
fi

Edit: corrected "=~", had flipped.

I get fail with this parameters: var2="abcd" – Lucio May 25 '13 at 0:02

===

@Lucio The correct is [[ $string =~ $substring ]]. I updated the answer. – Eric Carvalho May 25 '13 at 0:38

===

@EricCarvalho opps, thanks for correcting it. – demure May 25 '13 at 0:49

===

Using bash filename patterns (aka "glob" patterns)

substr=ab
[[ abc == *"$substr"* ]] && echo yes || echo no    # yes
[[ bcd == *"$substr"* ]] && echo yes || echo no    # no
===

The following two approaches will work on any POSIX-compatible environment, not just in bash:

substr=ab
for s in abc bcd; do
    if case ${s} in *"${substr}"*) true;; *) false;; esac; then
        printf %s\\n "'${s}' contains '${substr}'"
    else
        printf %s\\n "'${s}' does not contain '${substr}'"
    fi
done

substr=ab
for s in abc bcd; do
    if printf %s\\n "${s}" | grep -qF "${substr}"; then
        printf %s\\n "'${s}' contains '${substr}'"
    else
        printf %s\\n "'${s}' does not contain '${substr}'"
    fi
done

Both of the above output:

'abc' contains 'ab'
'bcd' does not contain 'ab'

The former has the advantage of not spawning a separate grep process.

Note that I use printf %s\\n "${foo}" instead of echo "${foo}" because echo might mangle ${foo} if it contains backslashes.

===

Mind the [[ and ":

[[ $a == z* ]]   # True if $a starts with a "z" (pattern matching).
[[ $a == "z*" ]] # True if $a is equal to z* (literal matching).

[ $a == z* ]     # File globbing and word splitting take place.
[ "$a" == "z*" ] # True if $a is equal to z* (literal matching).

So as @glenn_jackman said, but mind that if you wrap the whole second term in double quotes, it will switch the test to literal matching.

Source: http://tldp.org/LDP/abs/html/comparison-ops.html

[Mar 16, 2011] Bash pattern substitution

Here's the actual formal definition from the bash man pages:
 ${parameter/pattern/string}
 ${parameter//pattern/string}

The pattern is expanded to produce a pattern just as in pathname expansion. Parameter is expanded and the longest match of pattern against its value is replaced with string. In the first form, only the first match is replaced. The second form causes all matches of pattern to be replaced with string. If pattern begins with #, it must match at the beginning of the expanded value of parameter. If pattern begins with %, it must match at the end of the expanded value of parameter. If string is null, matches of pattern are deleted and the / following pattern may be omitted. If parameter is @ or *, the substitution operation is applied to each positional parameter in turn, and the expansion is the resultant list. If parameter is an array variable subscripted with @ or *, the substitution operation is applied to each member of the array in turn, and the expansion is the resultant list.
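
The man page gives no examples, so here is a short sketch of the anchored forms and the array form. The file names and variable names are made up for the illustration.

```shell
#!/usr/bin/env bash
file="report.txt.txt"

echo "${file/.txt/.bak}"      # first match only      -> report.bak.txt
echo "${file//.txt/.bak}"     # all matches           -> report.bak.bak
echo "${file/%.txt/.bak}"     # only at the end       -> report.txt.bak
echo "${file/#report/draft}"  # only at the beginning -> draft.txt.txt

# With [@] the substitution is applied to every array element in turn.
names=(a.txt b.txt)
renamed=("${names[@]/%.txt/.md}")
echo "${renamed[@]}"          # a.md b.md
```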

[Mar 16, 2011] Bash info, scripting examples, regex parameter substitution, interactive shell, and more

Regular expressions and globbing

Globbing is the use of wildcards such as * to match a list of file names. A wildcard pattern is not a regular expression.

These following examples should also work inside bash scripts. These may or may not be compatible with sh. These are "interesting" regex or globbing examples. I say "interesting" because they don't seem to follow the path of "true" regular expressions used by Perl.

[mst3k@zeus ~]$ echo ${HOME/\/home\//}
mst3k
[mst3k@zeus ~]$ echo ${HOME##home}
/home/mst3k
[mst3k@zeus ~]$ echo ${HOME##/home}
/mst3k
[mst3k@zeus ~]$ echo ${HOME##/home/}
mst3k
[mst3k@zeus ~]$ echo ${HOME##*}

[mst3k@zeus ~]$ echo ${HOME##*/}
mst3k

Trouble with the string replacement function in bash

I'm having trouble with the string replacement function in bash. The problem is that I want to replace a printing character, in this case &, by a non-printing character, either new line or null in this case. I don't see how to specify the non-printing character in the string replacement function ${variable//a/b}.

I have a long, URL-encoded-like file name that I would like to parse with grep. I have used & as a delimiter between variables within the long file name. I would like to use the string replacement function in bash to search for all instances of & and replace each one with either the null character or the new line character since grep can recognize either one.

How do I specify a non-printing character in the bash string replacement function ?

Thank you.

Special Syntax
Submitted by Mitch Frazier on Fri, 04/02/2010 - 11:43.
Use the $'\xNN' syntax for the non-printing character. Note though that a NULL character does not work:

$ cat j.sh

v="hello=yes&world=no"

v2=${v/&/$'\x0a'}
# ^^^^^^^ change to newline
echo -n ">>$v2<<" | hexdump -C

v2=${v/&/$'\x00'}
# ^^^^^^^ change to null (doesn't work)
echo -n ">>$v2<<" | hexdump -C
If you run this you can see that the substitution works for a newline but not for a NULL:

$ sh j.sh
00000000 3e 3e 68 65 6c 6c 6f 3d 79 65 73 0a 77 6f 72 6c |>>hello=yes.worl|
00000010 64 3d 6e 6f 3c 3c |d=no<<|
00000016
00000000 3e 3e 68 65 6c 6c 6f 3d 79 65 73 77 6f 72 6c 64 |>>hello=yesworld|
00000010 3d 6e 6f 3c 3c |=no<<|
00000015
Mitch Frazier is an Associate Editor for Linux Journal.
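
Since a bash variable cannot hold a NUL byte, the newline is the practical choice here. A small sketch of the original poster's case follows: every & is replaced, not just the first, and the result is read back field by field. The key and value names are assumptions for the example.

```shell
#!/usr/bin/env bash
v="hello=yes&world=no"

# $'\n' is the ANSI-C quoted newline; // replaces every "&", not just the first.
lines=${v//&/$'\n'}

# Read the result back one "key=value" pair per line.
while IFS='=' read -r key value; do
    echo "key=$key value=$value"
done <<< "$lines"
```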

Multiple operations?
Submitted by Anonymous on Wed, 03/24/2010 - 10:56.
Very interesting article.
A question: is it possible to use in the same expression many operators, as:
${var#t*is%t*st} which uses both '#t*is' and '%t*st' which gives 'is a' in the example?
I tried some forms but it doesn't work... Has someone an idea?

Doesn't Work
Submitted by Mitch Frazier on Wed, 03/24/2010 - 14:59.
You can't do multiple operations in one expression.

Mitch Frazier is an Associate Editor for Linux Journal.
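
The usual workaround is to apply the operators one after another through a temporary variable. A sketch using the string from the question (the name tmp is arbitrary):

```shell
#!/usr/bin/env bash
var="this is a test"

tmp=${var#t*is}      # shortest "t*is" from the left  -> " is a test"
tmp=${tmp%t*st}      # shortest "t*st" from the right -> " is a "
echo "[$tmp]"        # [ is a ]
```

Note that the surrounding spaces survive, so the result is " is a " and not "is a".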

Variable
Submitted by First question (not verified) on Tue, 09/01/2009 - 07:07.
I want to do something like this using linux bash script:
a1="Chris Alonso"
i="1"
echo $a$i #I only trying to write: echo $a1 using the variable i

Someone can help me, please?

Eval
Submitted by Mitch Frazier on Tue, 09/01/2009 - 13:41.
Eval will do this for you but you may decide you really don't want to do this after seeing it:

eval echo \$$(echo a$i)
or
eval echo \$`echo a$i`
A slightly less complicated sequence would be something like:

v=a$i
eval echo \$$v
It looks like what you're trying to do here is simulate arrays. If that's the case then you'd be better off using bash's built-in arrays.

Mitch Frazier is an Associate Editor for Linux Journal.

How about v=a$i echo ${!v}
Submitted by Anonymous (not verified) on Fri, 09/25/2009 - 15:09.
How about

v=a$i
echo ${!v}

simplification of indirect reference
Submitted by Anonymous on Fri, 03/12/2010 - 23:16.
Is there any way to rid the statements of the variable assignment? As in, make it so that:


echo ${!a$i}

works? I'm thinking that there has to be a way to escape the "a$i" inside the indirect reference construct. I have a case where I'm trying to do this with the result of a regex match, and am not able to figure out the right syntax:


SRC_FOLDER=/var/website
needleA_FOLDER=/var/www
needleB_FOLDER=/var/htdocs

for item in ${ARRAY[@]}; do
[[ "$item" =~ hay(needle)stack ]] &&
DIR=${!${BASH_REMATCH[1]}_FOLDER};
cp -R $SRC_FOLDER/* $DIR;
done;

But the seventh line (with the indirect reference) chokes with a "bad substitution" error. I should be able to do this on one line, without using eval with the right syntax, no?

Sincerely,
Tyler
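
As far as I can tell, bash does not accept another expansion nested inside ${!...}, so the variable name has to be built in a separate assignment first. A sketch of that for the regex case above; the item strings and the slightly widened regex are assumptions made up for this example.

```shell
#!/usr/bin/env bash
needleA_FOLDER=/var/www
needleB_FOLDER=/var/htdocs

for item in hayneedleAstack hayneedleBstack; do
    if [[ $item =~ hay(needle[AB])stack ]]; then
        name=${BASH_REMATCH[1]}_FOLDER   # build the variable name first ...
        dir=${!name}                     # ... then expand it indirectly
        echo "$item -> $dir"
    fi
done
```

This costs one extra line per lookup but avoids eval entirely.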

Yes
Submitted by Mitch Frazier on Fri, 09/25/2009 - 15:36.
That works and is simpler than my solution.

Mitch Frazier is an Associate Editor for Linux Journal.

: or not to :
Submitted by Ash (not verified) on Mon, 01/08/2007 - 07:47.
Interesting, ":" can be omitted for "numeric" variables (script/function arguments).

baz=${var:-bar}
vs.

baz=${1-bar}
First time I thought it is a typo, but it is not.
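
The difference is not tied to numbered arguments. According to the bash manual, leaving out the colon makes the operator test only for an unset parameter, while the colon form also treats an empty value as missing. A short sketch with made-up variable names:

```shell
#!/usr/bin/env bash
unset missing
empty=""

echo "[${missing-bar}]"    # unset          -> [bar]
echo "[${missing:-bar}]"   # unset          -> [bar]
echo "[${empty-bar}]"      # set but empty  -> []
echo "[${empty:-bar}]"     # set but empty  -> [bar]
```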

interesting, this will save a few seds and greps!
Submitted by mangoo (not verified) on Tue, 01/29/2008 - 06:24.
Interesting article, this will save me a few seds, greps and awks!

what if we are to operate on
Submitted by MgBaMa req (not verified) on Mon, 09/03/2007 - 22:44.
what if we are to operate on the param $1 $2, ...?
i mean is it feasible to see a result of ${4%/*}
to get a valule as from
$ export $1="this is a test/none"
$ echo ${$1/*}
> this is a test
$ echo ${$2#*/}
> none
?
thanks
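
The trimming operators do work on numbered parameters. A sketch, assuming the value is placed in $1 with set -- (positional parameters cannot be assigned with export); the names before and after are arbitrary.

```shell
#!/usr/bin/env bash
# "set --" assigns the positional parameters.
set -- "this is a test/none"

before=${1%/*}    # everything before the last slash
after=${1#*/}     # everything after the first slash

echo "$before"    # this is a test
echo "$after"     # none
```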

variable contents confusing
Submitted by Paul Archerr (not verified) on Wed, 03/29/2006 - 09:39.
Minor typos notwithstanding, I had a bit of a problem with the values of the variables used. Assigning the value 'bar' to the variable bar makes it confusing to quickly figure out which is which. (Is that 'bar' another variable? Or a value?)
I would suggest making the simple change of putting your values in uppercase. They would stand out and make the article more readable.
For example:
$ export var=var
$ echo ${var}bar # var exists so this works as expected
varbar
$ echo $varbar # varbar doesn't exist, so this doesn't
$

becomes

$ export var=VAR
$ echo ${var}bar # var exists so this works as expected
VARbar
$ echo $varbar # varbar doesn't exist, so this doesn't
$

You can see how the 'VARbar' on the third line becomes differentiated from the 'varbar' on the fourth line.

var, bar, +varbar: worst things for programming since Microsoft.
Submitted by Anonymous (not verified) on Mon, 09/10/2007 - 14:35.
Using var and bar to try to inform someone is pretty much uniformly bad everywhere it's done, as var and bar explicitly indicate things that don't have any meaning whatsoever. Varbar is even worse, since it is visibly only different from var bar because of a single " " (space).

If you're trying to confuse the reader, use var, bar, and especially varbar.

If you're trying to be informative, please, give your damn variables a short but logically useful name.

Part II?
Submitted by Anonymous (not verified) on Fri, 03/24/2006 - 08:58.
OK but there's a lot more to it than just this. How about some of the following?

${var:pos[:len]} # extract substr from pos (0-based) for len

${var/substr/repl} # replace first match
${var//substr/repl} # replace all matches
${var/#substr/repl} # replace if matches at beginning
${var/%substr/repl} # replace if matches at end

${#var} # returns length of $var
${!var} # indirect expansion

...Sorry, those round parens


Submitted by Anonymous (not verified) on Fri, 03/24/2006 - 09:07.
...Sorry, those round parens should be curlies.

Interesting
Submitted by Stephanie (not verified) on Sun, 03/12/2006 - 21:51.
I think the author did a great job explaining the article and am glad that I was able to learn from it and finally found something interesting to read online!

Examples in Table 1 are rubbish
Submitted by Anonymous (not verified) on Sun, 03/12/2006 - 20:56.
In addition to the incorrect $var= (should be var=), the last two examples don't illustrate the use of the construct they are supposed to. Pity the author did not proof-read the first table.

Examples using same operator yet differing results?!
Submitted by really-txtedmacs (not verified) on Fri, 03/10/2006 - 19:46.
${var#t*is} deletes the shortest possible match from the left
export $var="this is a test"
echo ${var#t*is}
is a test

fine, but next in line is supposed to remove the maximum from the left, but uses the same exact operator, how does it get the correct result?

${var##t*is} deletes the longest possible match from the left
export $var="this is a test"
echo ${var#t*is}
a test

Gets worse when going from the right: the original operation from the right is employed. Moreover, on my system, an Ubuntu 05.10 desktop, this gave:

txtedmacs@phpserver:~$ export $var="this is a test"
bash: export: `=this is a test': not a valid identifier

Take out the $var, and it works fine.

Much easier to catch someone else's errors than one's own - I hate looking at my articles or emails.

Errors in article
Submitted by Anonymous (not verified) on Mon, 03/13/2006 - 22:49.
As really-txtedmacs tried to politely point out, there are errors in the Pattern Matching table - Example column, as of when he looked at it and as of now. Each instance of "export $var" should be "export var" in bash and most similar shells. Also, the operator in the echo command needs to match exactly the operator in the first column. Interestingly, some but not all of these errors still exist in the original article at http://linuxgazette.net/issue57/eyler.html, which is in issue 57, not 67.
Otherwise, a very good article. I will save the info in my bag of tricks.

You're channelling Larry Wall, dude!
Submitted by Jim Dennis (not verified) on Sat, 03/11/2006 - 18:30.
In Bourne shell and its ilk (like Korn shell and bash) the assignment syntax is:
var=... You only prefix a variable's name with $ when you're "dereferencing" it (expanding it into its value).

So the shell was parsing
export $var="this is a test" as:

export ???="this is a test" (where ??? is whatever "var" was set to before this statement ... probably the empty string if the variable was previously unset).

I know this is confusing because Perl does it completely differently. In Perl the $ is a "sigil" which, on an "lvalue" (a variable name or other assignable token) tells the interpreter what "type" of assignment is occurring. Thus a Perl statement like:
$var="this is a test"; (note the required semicolon, too) is a "scalar" assignment. This also sets the context of the assignment. In Perl a scalar value in scalar context is conceptually the closest to a normal shell variable assignment. However, a list value in a scalar assignment context is a different beast entirely. So a line of Perl like
perl -e '@bar=(1,2,3); $var=@bar; print $var ;' will set $var to the number of items in the bar array. (Of course we could use @var for the array name since they are different namespaces in Perl. But I wanted my example to be clear). So an array/list value in scalar context returns an integer (a type of scalar) which represents the number of elements in the list.

Anyway, just remember that the shell $ is more like the C programming * operator ... it dereferences the variable into its value.

JimD
The Linux Gazette "Answer Guy"

USA <> World :-)
Submitted by peter.green on Fri, 03/10/2006 - 14:57.
Although the # and % identifiers may not seem obvious, they have a convenient mnemonic. The # key is on the left side of the $ key and operates from the left.
In the USA, perhaps, but my UK keyboard has the # key nestling up against the Enter and right-Shift keys. Not to mention layouts such as Dvorak...!

Other (non-USA-specific?!) Mnemonics
Submitted by Anonymous (not verified) on Sat, 03/18/2006 - 16:29.
Another way to keep track is that we say "#1" and "1%", not "1#" and "%1". That is, unless you're using "#" to mean "pounds", in which case "1#" is correct, but it's antiquated at best in the USA, and presumably a nonissue for other countries that use metric...

C programmers are used to using "#" at the start of lines (#define, #include). LaTeX authors are used to "%" at the end of lines when writing macro definitions, as a comment to keep extraneous whitespace from creeping in--but "%" is comment to end-of-line so it's also likely to show up at the start of a line too...

Mnemonics
Submitted by Island Joe (not verified) on Tue, 03/21/2006 - 04:25.
Thanks for sharing those mnemonic insights, it's most helpful.

bash String Manipulations By Jim Dennis, jimd@starshine.org

The bash shell has many features that are sufficiently obscure you almost never see them used. One of the problems is that the man page offers no examples.

Here I'm going t... some of these features to do the sorts of simple string manipulations that are commonly needed on file and path names.

In traditional Bourne shell programming you might see references to the basename and dirname commands. These perform simple string manipulations on their arguments. You'll also see many uses of sed and awk or perl -e to perform simple string manipulations.

Often these machinations are necessary to perform on lists of filenames and paths. There are many specialized programs that are conventionally included with Unix to perform these sorts of utility functions: tr, cut, paste, and join. Given a filename like /home/myplace/a.data.directory/a.filename.txt which we'll call $f you could use commands like:

dirname $f
basename $f
basename $f .txt

... to see output like:

/home/myplace/a.data.directory
a.filename.txt
a.filename 

Notice that the GNU version of basename takes an optional parameter. This is handy for specifying a filename "extension" like .tar.gz which will be stripped off of the output. Note that basename and dirname don't verify that these parameters are valid filenames or paths. They simply perform simple string operations on a single argument. You shouldn't use wild cards with them -- since dirname takes exactly one argument (and complains if given more) and basename takes one argument and an optional one which is not a filename.

Despite their simplicity these two commands are used frequently in shell programming because most shells don't have any built-in string handling functions -- and we frequently need to refer to just the directory or just the file name parts of a given full file specification.

Usually these commands are used within the "back tick" shell operators like TARGETDIR=`dirname $1`. The "back tick" operators are equivalent to the $(...) construct. This latter construct is valid in Korn shell and bash -- and I find it easier to read (since I don't have to squint at my screen wondering which direction the "tick" is slanted).

Although the basename and dirname commands embody the "small is beautiful" spirit of Unix -- they may push the envelope towards the "too simple to be worth a separate program" end of simplicity.

Naturally you can call on sed, awk, TCL or perl for more flexible and complete string handling. However this can be overkill -- and a little ungainly.

So, bash (which long ago abandoned the "small is beautiful" principle and went the way of emacs) has some built in syntactical candy for doing these operations. Since bash is the default shell on Linux systems then there is no reason not to use these features when writing scripts for Linux.

The bash man page is huge. It contains a complete reference to the "readline" libraries and how to write a .inputrc file (which I think should all go in a separate man page) -- and a run down of all the csh "history" or bang! operators (which I think should be replaced with a simple statement like: "Most of the csh work the same way in bash").

However, buried in there is a section on Parameter Substitution which tells us that $var is really a shorthand for ${var} which is really the simplest case of several ${var:operators} and similar constructs.

Are you confused, yet?

Here's where a few examples would have helped. To understand the man page I simply experimented with the echo command and several shell variables. This is what it all means:

Here we notice two different "operators" being used inside the parameters (curly braces). Those are the # and the % operators. We also see them used as single characters and in pairs. This gives us four combinations for trimming patterns off the beginning or end of a string:

${variable%pattern}
Trim the shortest match from the end
${variable##pattern}
Trim the longest match from the beginning
${variable%%pattern}
Trim the longest match from the end
${variable#pattern}
Trim the shortest match from the beginning
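
Applied to the filename used earlier in the article, the four combinations can be sketched like this. The choice of patterns is mine, for illustration only.

```shell
#!/usr/bin/env bash
f=/home/myplace/a.data.directory/a.filename.txt

echo "${f%/*}"     # like dirname:   /home/myplace/a.data.directory
echo "${f##*/}"    # like basename:  a.filename.txt
echo "${f%.*}"     # drop the last extension:  /home/myplace/a.data.directory/a.filename
echo "${f%%.*}"    # cut at the first dot:     /home/myplace/a
echo "${f#*.}"     # after the first dot:      data.directory/a.filename.txt
echo "${f##*.}"    # after the last dot:       txt
```

The %% and # lines show why "shortest" versus "longest" matters as soon as the path contains more than one dot.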

It's important to understand that these use shell "globbing" rather than "regular expressions" to match these patterns. Naturally a simple string like "txt" will match sequences of exactly those three characters in that sequence -- so the difference between "shortest" and "longest" only applies if you are using a shell wild card in your pattern.

A simple example of using these operators comes in the common question of copying or renaming all the *.txt to change the .txt to .bak (in MS-DOS' COMMAND.COM that would be REN *.TXT *.BAK).

This is complicated in Unix/Linux because of a fundamental difference in the programming API's. In most Unix shells the expansion of a wild card pattern into a list of filenames (called "globbing") is done by the shell -- before the command is executed. Thus the command normally sees a list of filenames (like "var.txt bar.txt etc.txt") where DOS (COMMAND.COM) hands external programs a pattern like *.TXT.

Under Unix shells, if a pattern doesn't match any filenames the parameter is usually left on the command line literally. Under bash this is a user-settable option. In fact, under bash you can disable shell "globbing" if you like -- there's a simple option to do this. It's almost never used -- because commands like mv, and cp won't work properly if their arguments are passed to them in this manner.

However here's a way to accomplish a similar result:

for i in *.txt; do cp "$i" "${i%.txt}.bak"; done

... obviously this is more typing. If you tried to create a shell function or alias for it -- you have to figure out how to pass these parameters. Certainly the following seems simple enough:

function cp-pattern { for i in $1; do cp "$i" "${i%$1}$2"; done; }

... but that doesn't work like most Unix users would expect. You'd have to pass this command a pair of specially chosen, and quoted arguments like:

cp-pattern '*.txt' .bak 

... note how the second pattern has no wild cards and how the first is quoted to prevent any shell globbing. That's fine for something you might just use yourself -- if you remember to quote it right. It's easy enough to add a check for the number of arguments and to ensure that there is at least one file that exists in the $1 pattern. However it becomes much harder to make this command reasonably safe and robust. Inevitably it becomes less "unix-like" and thus more difficult to use with other Unix tools.
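
One way to sidestep the quoting problem is to pass two plain suffixes instead of a wild card pattern. The following is only a sketch under that assumption; the function name copy_with_suffix and its usage message are invented for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy every file ending in OLD to the same name ending in NEW.
copy_with_suffix() {
    [ $# -eq 2 ] || { echo "usage: copy_with_suffix OLD NEW" >&2; return 2; }
    local old=$1 new=$2 f
    for f in *"$old"; do
        [ -e "$f" ] || continue            # the pattern matched nothing
        cp -- "$f" "${f%"$old"}$new"       # quoting "$old" makes it a literal suffix
    done
}
```

It would be called as copy_with_suffix .txt .bak, with nothing for the caller to quote, and file names containing spaces are handled because every expansion is quoted.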

I generally just take a whole different approach. Rather than trying to use cp to make a backup of each file under a slightly changed name I might just make a directory (usually using the date and my login ID as a template) and use a simple cp command to copy all my target files into the new directory.

Another interesting thing we can do with these "parameter expansion" features is to iterate over a list of components in a single variable.

For example, you might want to do something to traverse over every directory listed in your path -- perhaps to verify that everything listed therein is really a directory and is accessible to you.

Here's a command that will echo each directory named on your path on its own line:

p=$PATH; d=; until [ "$p" = "$d" ]; do d=${p%%:*}; p=${p#*:}; echo "$d"; done

... obviously you can replace the echo $d part of this command with anything you like.
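
In current versions of bash the same traversal can also be sketched by splitting the list into an array with read -a. The variable names below are assumptions, and a fixed string stands in for $PATH so the result is predictable.

```shell
#!/usr/bin/env bash
# Split a colon-separated list into an array without changing the global IFS.
path_list="/usr/local/bin:/usr/bin:/bin"   # stand-in for "$PATH"
IFS=: read -r -a dirs <<< "$path_list"

for d in "${dirs[@]}"; do
    echo "$d"
done
```

Unlike the until loop, this version does not stop early when two neighbouring entries happen to be identical.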

Another case might be where you'd want to traverse a list of directories that were all part of a path. Here's a command pair that echos each directory from the root down to the "current working directory":

p=$(pwd); d=; until [ "$p" = "$d" ]; do p=${p#*/}; d=${p%%/*}; echo "$d"; done

... here we've reversed the assignments to p and d so that we skip the root directory itself -- which must be "special cased" since it appears to be a "null" entry if we do it the other way. The same problem would have occurred in the previous example -- if the value assigned to $PATH had started with a ":" character.

Of course, it's important to realize that this is not the only, or necessarily the best, method to parse a line or value into separate fields. Here's an example that uses the old IFS variable (the "internal field separator" in the Bourne and Korn shells as well as bash) to parse each line of /etc/passwd and extract just two fields:

cat /etc/passwd | ( \
   IFS=: ;
   while read lognam pw id gp fname home sh; \
   do echo $home \"$fname\"; done \
)

Here we see the parentheses used to isolate the contents in a subshell -- such that the assignment to IFS doesn't affect our current shell. Setting the IFS to a "colon" tells the shell to treat that character as the separator between "words" -- instead of the usual "whitespace" that's assigned to it. For this particular function it's very important that IFS consist solely of that character -- usually it is set to "space," "tab," and "newline."

After that we see a typical while read loop -- where we read values from each line of input (from /etc/passwd) into seven variables per line. This allows us to use any of these fields that we need from within the loop. Here we are just using the echo command -- as we have in the other examples.
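A variation on the loop above, offered as a sketch rather than a replacement: putting IFS=: directly in front of read limits the change to that single command, so neither the subshell nor the cat is needed. The function name print_homes is invented for this example, and the variable names simply mirror the ones used above.

```shell
# Hypothetical helper: read passwd-style records from standard input and
# print the home directory followed by the quoted full-name field.
print_homes() {
    local lognam pw id gp fname home sh
    # IFS=: applies only to this read; -r keeps backslashes literal.
    while IFS=: read -r lognam pw id gp fname home sh; do
        printf '%s "%s"\n' "$home" "$fname"
    done
}
```

It would be run as print_homes < /etc/passwd; afterwards IFS in the calling shell still has its usual value.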

My point here has been to show how we can do quite a bit of string parsing and manipulation directly within bash -- which will allow our shell scripts to run faster with less overhead and may be easier than some of the more complex sorts of pipes and command substitutions one might have to employ to pass data to the various external commands and return the results.

Many people might ask: Why not simply do it all in perl? I won't dignify that with a response. Part of the beauty of Unix is that each user has many options about how they choose to program something. Well-written scripts and programs interoperate regardless of what particular scripting or programming facility was used to create them. Issue the command file /usr/bin/* on your system and you may be surprised at how many Bourne and C shell scripts there are in there.

In conclusion I'll just provide a sampler of some other bash parameter expansions:

${parameter:-word}
Provide a default if parameter is unset or null.
Example:
echo ${1:-"default"}
Note: this would have to be used from within a function or shell script -- the point is to show that some of the parameter substitutions can be used with the shell's numbered arguments. In this case the string "default" would be returned if the function or script was called with no $1 (or if all of the arguments had been shifted out of existence).
${parameter:=word}
Assign a value to parameter if it was previously unset or null.
Example:
echo ${HOME:="/home/.nohome"}
${parameter:?word}
Generate an error if parameter is unset or null by printing word to stderr.
Example:
: ${HOME:="/home/.nohome"}
: ${TMP:?"Error: Must have a valid Temp Variable Set"}

This one just uses the shell "null command" (the : command) to evaluate the expression. If the variable doesn't exist or has a null value -- this will print the string to the standard error file handle and exit the script with a return code of one.

Oddly enough -- while it is easy to redirect the standard error of processes under bash -- it is less obvious how to explicitly generate a message on stderr. The portable way is to append >&2 to the command that prints the message; on Linux you can also use the /proc/ filesystem (process table) like so:

function error { echo "$*" > /proc/self/fd/2; }

... self is always a set of entries that refers to the current process -- and self/fd/ is a directory full of the currently open file descriptors. Under Unix and DOS every process is given the following pre-opened file descriptors: stdin, stdout, and stderr.

${parameter:+word}
Alternative value.
Example:
${TMP:+"/mnt/tmp"}
This expands to /mnt/tmp if TMP is set and non-null, and to nothing otherwise. This is a weird one that I can't ever see myself using. But it is a logical complement to the ${var:-value} we saw above.
${#variable}
Return the length of the variable in characters.
Example:

    echo The length of your PATH is ${#PATH}
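The sampler above can be tried in one go. The following is only an illustrative sketch: the function name demo_expansions and the variables color and nosuchvar are invented here, and the :? case is run in a subshell because it would otherwise abort a non-interactive shell.

```shell
demo_expansions() {
    local color                       # declared, but still unset
    echo "1: ${color:-red}"           # default is used; color itself stays unset
    echo "2: [${color:+alt}]"         # unset, so the expansion is empty
    : "${color:=blue}"                # null command; assigns blue to color
    echo "3: $color"
    echo "4: [${color:+alt}]"         # now set and non-null, so alt is used
    echo "5: ${#color}"               # length of the value, in characters
    # The error message goes to stderr, which is discarded here.
    ( unset nosuchvar; : "${nosuchvar:?must be set}" ) 2>/dev/null \
        || echo "6: subshell aborted"
}

demo_expansions
```

Lines 2 and 4 show the :+ form before and after the variable has a value, which is the easiest way to see what it is good for.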

Manipulating Strings

From Advanced Bash-Scripting Guide: Chapter 10. Manipulating Variables

Bash supports a number of string manipulation operations. Unfortunately, these tools lack a unified focus. Some are a subset of parameter substitution, and others fall under the functionality of the UNIX expr command. This results in inconsistent command syntax and overlap of functionality, not to mention confusion.

expr match "$string" '\($substring\)'
Extracts $substring at beginning of $string, where $substring is a regular expression.
expr "$string" : '\($substring\)'
Extracts $substring at beginning of $string, where $substring is a regular expression.
stringZ=abcABC123ABCabc
#     =======
echo `expr match "$stringZ" '\(.[b-c]*[A-Z]..[0-9]\)'` # abcABC1
echo `expr "$stringZ" : '\(.[b-c]*[A-Z]..[0-9]\)'`     # abcABC1
echo `expr "$stringZ" : '\(.......\)'`                 # abcABC1
# All of the above forms give an identical result.
expr match "$string" '.*\($substring\)'
Extracts $substring at end of $string, where $substring is a regular expression.
expr "$string" : '.*\($substring\)'
Extracts $substring at end of $string, where $substring is a regular expression.

stringZ=abcABC123ABCabc
#              ======

echo `expr match "$stringZ" '.*\([A-C][A-C][A-C][a-c]*\)'`  # ABCabc
echo `expr "$stringZ" : '.*\(......\)'`                     # ABCabc
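For comparison, bash can do the same extraction without starting expr, using the [[ string =~ regex ]] test and the BASH_REMATCH array. This is a sketch under two assumptions: the helper name regex_extract is invented, and the patterns are written as extended regular expressions (plain parentheses, explicit ^ and $ anchors) rather than the basic syntax expr uses.

```shell
# Hypothetical helper: print the first parenthesized group of the match.
regex_extract() {
    local string=$1 regex=$2
    # Keeping the regex in an unquoted variable avoids quoting surprises.
    [[ $string =~ $regex ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

stringZ=abcABC123ABCabc
regex_extract "$stringZ" '^(.[b-c]*[A-Z]..[0-9])'    # abcABC1
regex_extract "$stringZ" '(......)$'                 # ABCabc
```

The function returns a non-zero status and prints nothing when the pattern does not match.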
Substring Removal
${string#substring}
Strips shortest match of $substring from front of $string.
${string##substring}
Strips longest match of $substring from front of $string.
stringZ=abcABC123ABCabc
#     |----|
#     |----------|

echo ${stringZ#a*C}    # 123ABCabc
# Strip out shortest match between 'a' and 'C'.

echo ${stringZ##a*C}   # abc
# Strip out longest match between 'a' and 'C'.

${string%substring}
Strips shortest match of $substring from back of $string.
${string%%substring}
Strips longest match of $substring from back of $string.

stringZ=abcABC123ABCabc
#                  ||
#      |------------|

echo ${stringZ%b*c}    # abcABC123ABCa
# Strip out shortest match between 'b' and 'c', from back of $stringZ.

echo ${stringZ%%b*c}   # a
# Strip out longest match between 'b' and 'c', from back of $stringZ.
Example 9-10. Converting graphic file formats, with filename change
#!/bin/bash
#  cvt.sh:
#  Converts all the MacPaint image files in a directory to "pbm" format.

#  Uses the "macptopbm" binary from the "netpbm" package,
#+ which is maintained by Bryan Henderson (bryanh@giraffe-data.com).
#  Netpbm is a standard part of most Linux distros.

OPERATION=macptopbm
SUFFIX=pbm        # New filename suffix.

if [ -n "$1" ]
then
  directory=$1    # If directory name given as a script argument...
else
  directory=$PWD  # Otherwise use current working directory.
fi 
 
#  Assumes all files in the target directory are MacPaint image files,
#+ with a ".mac" suffix.

for file in $directory/*  # Filename globbing.
do
  filename=${file%.*c}    #  Strip ".mac" suffix off filename
                          #+ ('.*c' matches everything
			  #+ between '.' and 'c', inclusive).
  $OPERATION $file > $filename.$SUFFIX
                          # Redirect conversion to new filename.
  rm -f $file             # Delete original files after converting.
  echo "$filename.$SUFFIX"  # Log what is happening to stdout.
done

exit 0
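A common use of the four removal operators is taking a path apart. The sketch below is not from the guide; the function name split_path and the sample path are made up for illustration.

```shell
# Hypothetical helper: split a path into directory, file name, stem and extension.
split_path() {
    local file=$1
    local dir="${file%/*}"       # shortest "/*" from the back: drop the file name
    local name="${file##*/}"     # longest "*/" from the front: drop the directories
    local stem="${name%%.*}"     # longest ".*" from the back: drop every extension
    local ext="${name##*.}"      # longest "*." from the front: keep the last extension
    printf 'dir=%s name=%s stem=%s ext=%s\n' "$dir" "$name" "$stem" "$ext"
}

split_path /home/user/archive.tar.gz
# dir=/home/user name=archive.tar.gz stem=archive ext=gz
```

Swapping %% for % in the stem line would keep "archive.tar" instead, which is the shortest-versus-longest distinction described above.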
Substring Replacement
${string/substring/replacement}
Replace first match of $substring with $replacement.
${string//substring/replacement}
Replace all matches of $substring with $replacement.
stringZ=abcABC123ABCabc

echo ${stringZ/abc/xyz}         # xyzABC123ABCabc
                                # Replaces first match of 'abc' with 'xyz'.

echo ${stringZ//abc/xyz}        # xyzABC123ABCxyz
                                # Replaces all matches of 'abc' with 'xyz'.

${string/#substring/replacement}
If $substring matches front end of $string, substitute $replacement for $substring.
${string/%substring/replacement}
If $substring matches back end of $string, substitute $replacement for $substring.
stringZ=abcABC123ABCabc

echo ${stringZ/#abc/XYZ}        # XYZABC123ABCabc
                                # Replaces front-end match of 'abc' with 'XYZ'.

echo ${stringZ/%abc/XYZ}        # abcABC123ABCXYZ
                                # Replaces back-end match of 'abc' with 'XYZ'.
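A small, hypothetical illustration of the replacement forms on a file name (the function name demo_replace and the sample name are invented):

```shell
demo_replace() {
    local name='my report final.txt'
    echo "${name// /_}"          # every space becomes an underscore
    echo "${name/%.txt/.bak}"    # suffix replaced only if the name ends in .txt
    echo "${name/#my/our}"       # prefix replaced only if the name starts with my
}

demo_replace
# my_report_final.txt
# my report final.bak
# our report final.txt
```

The substring is a shell pattern, not a regular expression, so the dot in .txt is a literal dot.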

Manipulating strings using awk

A Bash script may invoke the string manipulation facilities of awk as an alternative to using its built-in operations.

Example 9-11. Alternate ways of extracting substrings
#!/bin/bash
# substring-extraction.sh

String=23skidoo1
#    012345678  Bash
#    123456789  awk
# Note different string indexing system:
# Bash numbers first character of string as '0'.
# Awk  numbers first character of string as '1'.

echo ${String:2:4} # position 3 (0-1-2), 4 characters long
                                       # skid

# The awk equivalent of ${string:pos:length} is substr(string,pos,length).
echo | awk '
{ print substr("'"${String}"'",3,4)    # skid
}
'
#  Piping an empty "echo" to awk gives it dummy input,
#+ and thus makes it unnecessary to supply a filename.

exit 0
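As a possible alternative to splicing the shell variable into the awk program text, the value can be handed over with awk's -v option and the work done in a BEGIN block, which needs no input at all. The wrapper name awk_substr is an assumption made for this sketch; note that awk interprets backslash escapes in a value passed with -v.

```shell
# Hypothetical wrapper: awk_substr STRING POS LENGTH, with POS counted from 1.
awk_substr() {
    awk -v s="$1" -v pos="$2" -v len="$3" 'BEGIN { print substr(s, pos, len) }'
}

String=23skidoo1
awk_substr "$String" 3 4     # skid
echo "${String:2:4}"         # skid (bash counts from 0)
```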

For more on string manipulation in scripts, refer to Section 9.3 of the Advanced Bash-Scripting Guide and the relevant section of its expr command listing. For script examples, see the following examples in the same guide:

  1. Example 12-6
  2. Example 9-14
  3. Example 9-15
  4. Example 9-16
  5. Example 9-18

David Korn Tells All

# More to the point, thanks to the way ksh works, you can do this:
# make an array, words, local to the current function

typeset -A words

# read a full line

read line

# split the line into words

echo "$line" | read -A words

# Now you can access the line either word-wise or string-wise - useful if you want to, say, check for a command as the Nth parameter,
# but also keep formatting of the other parameters...
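The fragment above relies on ksh behavior. A rough bash counterpart, given here only as a sketch, uses read -a (lowercase) and a here-string; in bash, piping echo into read would normally set the array in a subshell, where it is lost. The function name split_words is invented for the example.

```shell
# Hypothetical helper: split a line into words while keeping the original line.
split_words() {
    local line=$1
    local -a words
    read -r -a words <<< "$line"     # split on IFS (whitespace by default)
    printf 'count=%s first=%s line=%s\n' "${#words[@]}" "${words[0]}" "$line"
}

split_words 'cp   -r  src dest'
# count=4 first=cp line=cp   -r  src dest
```

The array gives word-wise access, while $line still carries the original spacing.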


Copyright © 1996-2018 by Dr. Nikolai Bezroukov. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) in the author free time and without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.


Last modified: December 26, 2017