Comparison operators in shell
In general, shell arithmetic expressions and conditional operators are among the most
obscure areas of shell, implemented considerably worse (or, if you wish, in a dirtier fashion) than in any
other important programming language in existence. One reason for this is that they were bolted on
as an afterthought by people trying to correct grave flaws in the design of the Bourne shell. That will
always be a dark spot in the biography of
Steve Bourne. The Bourne
shell was the default shell of Unix Version 7 (1977). The fact that before implementing the shell Steve
Bourne used
to work on an Algol-68 compiler makes the situation tragicomic: you would not expect a person who had
worked on Algol-68 to make such blunders in constructing the shell language. Especially when,
in another room of the same building, there were people creating the (much superior and cleaner) AWK,
which was also developed in 1977 and also first appeared in
Version 7 Unix.
Code between the two could certainly have been shared... but it never happened. What an irony...
Those grave flaws of the initial Bourne shell design (partially caused by an excessive desire to preserve
compatibility with the earlier shell -- the Thompson shell) cause a lot of pain to students, and I can do nothing but try to explain these constructs
the best I can. Unfortunately, I am not God, and the pain remains. You need to be very careful in your
first scripts not to run into some of the Gotchas. The best way is to take ready-made scripts from reputable
sources (and that means books: O'Reilly
books used to have downloadable examples)
as prototypes and modify them gradually. But you need to understand that there are good
examples and bad examples. A lot of scripts on the Web do not use constructs that are present in
bash 4.2 and later. Any script that still uses the old single bracket conditional expressions is
suspect in this sense. Modern scripts should use double square bracket and double round bracket
conditional expressions exclusively.
The key ideas of the Unix shell are more than 40 years old. The first Unix shells were created at roughly the same time as REXX (1979)
and Icon (1977). Again, if you look at AWK, which was created in the same year as the shell,
you can see the difference in the caliber of the people involved. Not only did
the Bourne shell adopt a strange,
counterintuitive notation for comparisons, different from every other language -- each flavor
of shell implements it differently. Even worse, they all have multiple
Gotchas. The absence
of a lexical scanner (and a lexical level) in the Bourne shell creates a real mess.
The notion that a person who participated in
Algol-68 compiler development would make the architectural decisions
that were made in the Bourne shell is a joke: as a product, the Bourne
shell is an embarrassment to any decent compiler writer. It is a completely
amateurish product. That is very strange, since Steve
Bourne ran a project at the Cambridge Computer Laboratory to implement a version
of Algol 68 called ALGOL68C, and
Algol-68 compiler designers were the elite of the compiler-writing community. The adoption of Algol-68
keywords (and "symmetrical" closing words, as in if-then-fi) for if statements and loops is
practically the only sign that Steve Bourne participated in the development of an Algol-68 compiler. Clearly
he did not understand classic compiler technology well enough to use it in Bourne shell development
(and David Gries's seminal book
Compiler Construction for Digital Computers, which provides detailed coverage of the state of the
art of compiler technology up to the late 1960s, was published in 1971). Yes, there were severe
memory constraints at the time of the Bourne shell design, but using an interpreter
like Forth
could have shrunk the
memory footprint dramatically. Forth was created by Charles Moore in 1968; throughout the 1970s he
implemented Forth on 18 different machine architectures, so it was a truly portable language with a
very small footprint, available by the mid-1970s.
Historically, shell notation for conditionals was developed in three stages:
- First there was the test program, which accepted a conditional expression as its arguments and was later
"beautified" into the single
square bracket construct [...] (single square
bracket conditionals, or the test construct). The single brackets are nothing but cosmetics created to hide
the ugliness of test.
- Later, in ksh88, the
[[...]] construct (double square bracket conditional
expressions) was
introduced to compensate for the weaknesses of the single bracket construct.
- Still later, the double round bracket
construct was introduced in ksh93 to compensate for the absurdly bad implementation of integer comparisons
in both the [..] and [[...]] constructs.
This ksh93 solution eventually migrated to bash (starting from version
2.05b) and other shells.
Shell is unique in one more respect: it casts operands into strings or numbers depending on the type of operation used (implicit type
conversion). In other words, the operator dictates the type into which the variable is
converted before the operation is performed. This is a completely different approach from
most programming languages, in which comparison operators are "polymorphic" -- they work for
all operand types, such as integers, floating point numbers and strings.
Not so in bash and
other shells. Here the operator used forces the conversion into a particular type. That is why shell uses two sets of operators: one for integer comparisons and
the other for string comparisons.
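The operator-driven conversion can be demonstrated with a short sketch. The values below are hypothetical, chosen only to expose the difference between the two operator sets:

```shell
#!/bin/bash
# Hypothetical values: as integers 10 > 9, but as strings "10" sorts before "9"
a="10"
b="9"

# Integer comparison: operands are converted to numbers
(( a > b )) && echo "numerically: a > b"

# String comparison: operands are compared lexically, character by character
[[ "$a" < "$b" ]] && echo "lexically: a < b"
```

Both lines print, because the character '1' sorts before '9' while the number 10 is larger than 9: the operator, not the data, decides the type.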
This greatly confuses novice programmers, but the idea is not as
crazy as it might seem at first sight. It has its value and is arguably one possible way
to design a language. But the only other major scripting language that adopted this unorthodox idea of
implicit type conversion driven by the operator used is Perl. And it creates a barrier to entry
for Perl too. This is one reason why Python, which in many respects is a more traditional
Algol-style language, overtook Perl.
This is the "Basic language effect" in action: a simpler language that can be
quickly learned by novices has a much better chance to survive and prosper, despite all the
problems with its design, than a more complex language with fewer design defects but, for
example, more convoluted syntax or semantic rules. We see the same effect with PHP.
So the second way to classify and study conditional expressions is by the type to which they convert their
operands.
I think this is the best way to learn the intricacies of shell conditional operators, and such an
approach provides a useful framework that helps to avoid most Gotchas. It is adopted below.
From this point of view, in modern shells such as bash we have:
- Integer comparison operators (there is the new-style
((...)) notation
and two old-style notations, [[...]] and [...], which allow integer comparisons)
- String comparison operators (the
double square bracket conditional expression)
- File test operators, which can be either unary, like
-f /etc/resolv.conf, or binary. They can be used in both double square bracket
(preferable) and single square bracket conditional expressions.
One note about bash: as bash is a derivative of the Bourne shell, it inherited all the mess emanating
from it, so in some respects it is even worse than ksh93. Its developers had neither the vision nor the
courage to create a better shell, and slid onto the easy road of re-implementing all the quirks of the Bourne
shell, plus some of their own. The following three rules (you can call them the Softpanorama rules ;-) can help to avoid at least some Gotchas:
- For string comparisons, use the double square bracket form of comparison
- For integer comparisons, use only the double round bracket form of comparison
- Generally, avoid the single square bracket form of comparison.
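A minimal sketch of the three rules in action (the variable names and values are made up for illustration):

```shell
#!/bin/bash
name="backup"   # a string value
count=3         # an integer value

# Rule 1: strings go into double square brackets
[[ $name == "backup" ]] && echo "string matched"

# Rule 2: integers go into double round brackets
(( count > 0 )) && echo "count is positive"

# Rule 3: avoid the legacy single bracket form below;
# it breaks if $name is empty or contains spaces
# [ $name = "backup" ] && echo "fragile"
```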
Among the most annoying Gotchas related to comparison operators one can mention the
macro substitution caveats and
quoting caveats. There is also the less frequent "implicit type conversion caveat", when
you expect an operand to be numeric but it is a string. The latter should generally be classified
as a programming mistake.
Tips:
- Replacing problematic commands with echo is a handy debugging trick that can help to
locate the problem
- Using the set -xv statement in the part of the script that has problems is another useful debugging trick that
might help to identify the nature of the problem
- The shell reads in an entire compound statement, such as an if statement or a loop (for, while, etc.), before it executes
any commands in it. So any errors in such statements are reported as belonging to the
initial line of the statement, or even earlier than that.
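As a sketch of the tracing tip, set -x can be switched on only around the suspect region (the variable and its value here are made up):

```shell
#!/bin/bash
a=5

set -x                      # trace each command (after expansion) to stderr
(( a > 3 )) && echo "a is large"
set +x                      # switch tracing off again

echo "done"
```

The trace output goes to stderr, so the normal output of the script stays readable.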
You can compare integers in shell in three different ways: using the new-style ((...)) notation,
or one of two old-style notations, [[...]] or [...]. Only the new-style ((...)) notation can be recommended for
this purpose.
Here is the full annotated list of available possibilities:
- Double round bracket conditionals. This is the newest
construct and the best of the three available. It is covered in detail in
Arithmetic Expressions in BASH
- Double square bracket conditionals. This
is an older construct that currently makes sense only for string comparisons.
- Single square bracket conditionals. This
is the oldest construct, and it no longer makes sense for integer comparisons.
In turn, double round bracket conditionals can
be classified into two types:
- Those which use a dollar sign in front of variables
- Those that do not use a dollar sign in front of variables
To preserve sanity, it is better to use only the double round bracket comparison without the dollar
sign prefix before variables. This flavor of the double round bracket construct is supported
in bash and in all implementations of ksh93 on major Unixes, so it is perfectly portable (there is now
no Unix that does not supply bash as one of its shells). Here
is a short summary of the operators for double round bracket comparison:
- < is less than (within double parentheses). For example, (( a <
b ))
- <= is less than or equal to (within double parentheses) (( a <= b ))
- > is greater than (within double parentheses) (( a > b ))
- >= is greater than or equal to (within double parentheses) (( a >= b ))
In addition, omitting the "$" sign before variables used in this construct helps to avoid some stupid legacy behaviors
that plague shell due to its way too long historical development path (macro substitution of
variable values before the syntax analysis phase).
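A small sketch of why the dollar-sign-free form is safer; the variable name is made up and deliberately left unset:

```shell
#!/bin/bash
unset maybe        # deliberately uninitialized

# Bare name: arithmetic evaluation treats an unset variable as 0,
# so the test is well-formed and simply compares 0 with 0
(( maybe == 0 )) && echo "unset variable treated as zero"

# With a dollar sign the line would expand to (( == 0 )) -- a syntax error:
# (( $maybe == 0 ))
```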
Tips:
- Never use the old-style comparison constructs (the single square
bracket comparison construct and the double square bracket construct) for integers unless absolutely necessary. The
double round bracket construct is much safer -- it contains far fewer Gotchas and is portable between
bash and ksh88/ksh93.
- Like any other comparison, it can be used as a statement, which allows one to imitate an if-then
construct using the && operator, for example:
(( no_files == 0 )) && echo "There are no files to process"
You often need to deal with old scripts that use either [[...]] for arithmetic comparisons
(clearly ((..)) should now be used in all such cases) or the even older [..] construct for integer
comparisons, which has additional Gotchas compared with [[ ]]. With the legacy [...]
construct the situation is pretty simple: you should convert it at least to [[..]] (which can be
done programmatically quite easily) and, in case of arithmetic comparisons, to ((...)), which can also be
done programmatically but requires more effort to program the conversion correctly. With [[..]]
performing operations on integers, the situation
is slightly more complex, and in large scripts it may be too much trouble to convert those constructs
into the more modern ((..)) construct if they use complex conditions. Unless you plan to modernize the
script, they can be left "as is" for now. As Talleyrand advised young diplomats, and this is
applicable to shell programmers too: "first and foremost, not too much zeal".
One unpleasant reality (and probably the most common Gotcha) of using legacy integer comparison constructs
is that if one of the variables is not initialized, the comparison produces a syntax error. Of
course
this is better than a run-time error that goes undetected (if the variable was uninitialized by
mistake), but it is still a nuisance.
If you encounter such a situation, then
you need to look deeper and determine what is wrong with the legacy script, because
typically this error is not the only one: they tend to cluster. Along with conversion to the (( ))
and [[ ]] constructs, two classic tricks can be used to
suppress this nasty macro substitution side effect without going into an in-depth analysis of the logic
(please note that the ((..)) arithmetic conditional expression does not have this effect if you do
not use the dollar sign in front of the variable names):
- Put variables in double quotes if they were "naked" variables (the "quoting trick"). For example:
if [[ "$a" -eq "$b" ]]... The double square bracket conditional
expression treats all operands as if they were enclosed in double quotes, and this is its
distinct
advantage over the single square bracket conditional expression.
- Prefix quoted variables with 0 for integer comparisons and with a letter such as Z for string comparisons. For example, use [ "0$a" -eq "0$b" ]
instead of [ "$a" -eq "$b" ]
- Try to initialize the offending variables explicitly, using the shell's capability to provide a
default value for a variable (the
${VARIABLE:-default} and
${VARIABLE:=default}
constructs)
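The default-value trick can be sketched as follows; the variable a is deliberately left unset to simulate the problem:

```shell
#!/bin/bash
unset a                      # simulate a forgotten initialization

# Supply a default only for this one expansion:
[[ ${a:-0} -eq 0 ]] && echo "default used for the test"

# Or assign the default permanently; ':' is a no-op command,
# evaluated only for the side effect of the := expansion
: "${a:=0}"
echo "a is now $a"
```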
Please note that in most cases you can simply change the legacy integer comparison operators to
the "normal" ((...)) arithmetic conditional expression. Here is the list for your reference:
- -eq is equal to: if [[ "$a" -eq "$b" ]]
can be changed to (( a == b ))
- -ne is not equal to: if [[ "$a" -ne "$b" ]]
can be changed to (( a != b ))
- -gt is greater than: if [[ "$a" -gt "$b" ]]
can be changed to (( a > b ))
- -ge is greater than or equal to: if [[ "$a" -ge "$b" ]]
can be changed to (( a >= b ))
- -lt is less than: if [[ "$a" -lt "$b" ]]
can be changed to (( a < b ))
- -le is less than or equal to: if [[ "$a" -le "$b" ]]
can be changed to (( a <= b ))
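For instance, one such conversion side by side (the variable name and threshold are made up):

```shell
#!/bin/bash
files=12

# Legacy integer comparison inside [[ ]]:
[[ "$files" -gt 10 ]] && echo "legacy form: more than 10 files"

# Modern arithmetic equivalent; no quotes or dollar sign needed:
(( files > 10 )) && echo "modern form: more than 10 files"
```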
My recommendation is this: in all questionable cases, where you suspect that a variable can
be undefined, convert the expression to the double round bracket notation ((...)),
unless the script is way too complex or your use of it is a one-time event.
You can perform simple integer math using shell constructs. Simply enclose the particular arithmetic
expression between a "$((" and a "))", and bash will evaluate the expression. Here are some examples:
$ echo $(( 100 / 3 ))
33
$ myvar="56"
$ echo $(( $myvar + 12 ))
68
$ echo $(( $myvar - $myvar ))
0
$ myvar=$(( $myvar + 1 ))
$ echo $myvar
57
If a variable is used with a leading dollar sign in the double round bracket comparison construct and it is
not initialized, it will be substituted as a null string, just as in the old constructs, and cause a syntax error.
That is why we recommend removing the dollar sign from all variables in the double round bracket comparison
construct. Here is a small test that illustrates the error (note that variable $a
is not
initialized):
# cat macrotest.sh
if (( $a == 4 )) ; then
echo this is what macro substitution is about
fi
Results of execution in bash:
bash -xv macrotest.sh
if (( $a == 4 )) ; then
echo this is what macro substitution is about
fi
+ (( == 4 ))
macrotest.sh: line 1: ((: == 4 : syntax error: operand expected (error token is "== 4 ")
If we remove the leading $ from variable $a, bash behaves quite reasonably:
# bash -xv macrotest.sh
if (( a == 4 )) ; then
echo this is what macro substitution is about
fi
+ (( a == 4 ))
#
If we replace the constant 4 with zero, the statement will be executed, as we would expect from
any decent scripting language:
# bash -xv macrotest.sh
if (( a == 0 )) ; then
echo this is what macro substitution is about
fi
+ (( a == 0 ))
+ echo this is what macro substitution is about
this is what macro substitution is about
#
You can avoid such behavior in the older constructs by initializing the variable beforehand. Two methods,
which we already mentioned above, can be used to detect an uninitialized variable:
-
A conditional with the -z operator (see below):
[[ -z "$a" ]] && a=0
- The ${foo:-bar} operator, either directly in the comparison or beforehand:
if [[ ${a:-0} -gt 0 ]] ...
or
a=${a:-0}
Here is a small test that illustrates the behavior of uninitialized variables in the double parenthesis
comparison:
cat double_round_paren_test.sh
if [[ -z "$a" ]] ; then
echo variable '$a' is not defined
fi
if (( a == 0 )) ; then
echo "this is how to avoid macro substitution pitfall"
fi
Execution in ksh produces the error "a == 0 : bad number", but bash is OK:
export PATH
+ [[ -z ]]
+ echo variable $a is not defined
variable $a is not defined
+ let a == 0
double_round_paren_test.sh[4]: a == 0 : bad number
String comparison operations in shell are discussed in greater detail in
Double square bracket conditionals. Here we will provide just a brief overview.
They suffer from the same macro substitution Gotchas as the integer operators, but the problem is far more common,
as it occurs not only when the string is not initialized but also when its length is zero. These macro substitution
Gotchas lead to run-time syntax errors. To avoid them, the "double quotes" trick is used: you need
to put variables inside double-quoted literals, which converts any type to a string. Here is a summary
of the most typical operators used:
- = is equal to. For example: if [[ "$a" = "$b" ]] ...
- == matches. A pattern is expected as the right-side operand and should be "naked"
-- without double or single quotes surrounding it (a quoted string is considered to be a literal and
the operator becomes a test for string equality). For example, in if [[ "$a" == a*b ]] ...
here a*b is a pattern, but in if [[ "$a" == "a*b" ]] ... it is
a string. Here are additional examples from the Advanced Bash guide:
[[ $a == z* ]] # true if $a starts with an "z" (pattern matching)
[[ "$a" == "z*" ]] # true if $a is equal to z*
[ "$a" == z* ] # file globbing and word splitting take place
[ "$a" == "z*" ] # true if $a is equal to z*
- != is not equal to: if [[ "$a" != "$b" ]].
This operator uses pattern matching
within a
[[ ... ]] construct.
- < is less than, in ASCII alphabetical order:
if [[ "$a" < "$b" ]]
if [ "$a" \< "$b" ]
This is an old Gotcha, and another argument that you should never use legacy single square
bracket conditionals in your scripts: the "<" needs to be escaped within a [ ] construct.
- > is greater than, in ASCII alphabetical order
if [[ "$a" > "$b" ]]
if [ "$a" \> "$b" ]
Note similar gotcha: the ">" needs to be escaped within a [ ] construct.
You can test a string for being null or uninitialized using the -z operator (the opposite operator
is -n: the string is not "null").
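A short sketch of -z and -n; the variable names and values are made up:

```shell
#!/bin/bash
empty=""
full="text"
unset missing

[[ -z $empty ]]   && echo "empty string is null"
[[ -n $full ]]    && echo "non-empty string is not null"

# An unset variable expands to nothing, so it also tests as null:
[[ -z $missing ]] && echo "unset variable is null too"
```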
Sometimes a particular comparison might work in some cases even if you used the wrong type of operator
(for example, string comparison instead of integer). In the
following two snippets of code, the snippet that uses string comparison functions identically to the
snippet that uses integer comparison, as long as the integer does not have leading zeros, but can give you an
error otherwise:
if [[ "$myvar" -eq 3 ]] ; then
echo "myvar equals 3"
fi
if [[ "$myvar" = "3" ]] ; then
echo "myvar equals 3"
fi
This tells you something about the dangers of shell :-). Note that the first snippet uses an arithmetic comparison
operator, while the second uses a string comparison operator.
The other kind of operator that can be used in conditional expressions checks whether a file has certain
properties. There are 21 such operators.
The Gotcha is that not all of them are present in all shells. So if you run the same script in both
bash and ksh, you can get into a very unpleasant situation. The typical offender is the -e test, which is
present in bash but not in the ksh installed by default on major commercial Unixes. Use -f instead: it is
the more portable way of achieving the same result in all shells.
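A sketch of the portable variant; a scratch file is created on the spot so the result does not depend on what exists on the system:

```shell
#!/bin/bash
# Create a scratch file so the test is self-contained
tmp=$(mktemp)

# -f is understood by Bourne shell, ksh88, ksh93 and bash alike
if [[ -f $tmp ]]; then
    echo "regular file exists"
fi

rm -f "$tmp"
```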
Here are some operators provided:
Unary
- -a, -e file file exists
- -d file file is a directory
- -f file file is a regular file (i.e., not a directory or other special type of
file)
- -r file (Readable by you) You have read permission on file
- -s file (Not empty) file exists and is not empty
- -w file (Writable by you) You have write permission on file
- -x file You have execute permission on file, or directory search permission if
it is a directory
- -O file You own file
- -G file Your group ID is the same as that of file
- -h file The file is actually a symbolic link (also -L)
- -N file Has been modified since last being read.
Binary
- file1 -nt file2 file1 is newer than file2
- file1 -ot file2 file1 is older than file2
- file1 -ef file2 file1 and file2
are hard links to the same file
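The binary operators can be sketched with two scratch files; the sleep guards against filesystems with one-second timestamp resolution:

```shell
#!/bin/bash
old=$(mktemp)
sleep 1
new=$(mktemp)

# -nt / -ot compare modification times
[[ $new -nt $old ]] && echo "new is newer than old"
[[ $old -ot $new ]] && echo "old is older than new"

rm -f "$old" "$new"
```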
More complete coverage is provided by the Advanced Bash-Scripting Guide, Chapter 7. Tests
(File test operators):
7.2. File test operators
Returns true if...
Returns true if...
- -e
- file exists
- -a
- file exists
This is identical in effect to -e. It has been "deprecated,"
[1] and its
use is discouraged.
- -f
- file is a regular file (not a directory or
device file)
- -s
- file is not zero size
- -d
- file is a directory
- -b
- file is a
block device
- -c
- file is a
character device
device0="/dev/sda2" # / (root directory)
if [ -b "$device0" ]
then
echo "$device0 is a block device."
fi
# /dev/sda2 is a block device.
device1="/dev/ttyS1" # PCMCIA modem card.
if [ -c "$device1" ]
then
echo "$device1 is a character device."
fi
# /dev/ttyS1 is a character device.
- -p
- file is a
pipe
function show_input_type()
{
[ -p /dev/fd/0 ] && echo PIPE || echo STDIN
}
show_input_type "Input" # STDIN
echo "Input" | show_input_type # PIPE
# This example courtesy of Carl Anderson.
- -h
- file is a
symbolic link
- -L
- file is a symbolic link
- -S
- file is a
socket
- -t
- file (descriptor)
is associated with a terminal device
This test option
may be used
to check whether the stdin [ -t 0 ] or stdout [ -t
1 ] in a given script is a terminal.
- -r
- file has read permission (for the user running the test)
- -w
- file has write permission (for the user running the test)
- -x
- file has execute permission (for the user running the test)
- -g
- set-group-id (sgid) flag set on file or directory
If a directory has the sgid
flag set, then a file created within that directory belongs to the group that owns the directory,
not necessarily to the group of the user who created the file. This may be useful for a directory
shared by a workgroup.
- -u
-
set-user-id (suid) flag set on file
A binary owned by root with set-user-id flag set runs with root
privileges, even when an ordinary user invokes it.
[2] This is
useful for executables (such as pppd and cdrecord) that need to access system hardware.
Lacking the suid flag, these binaries could not be invoked by a non-root user.
-rwsr-xr-t 1 root 178236 Oct 2 2000 /usr/sbin/pppd
A file with the suid flag set shows an s in its permissions.
- -k
- sticky bit set
Commonly known as the sticky bit, the save-text-mode
flag is a special type of file permission. If a file has this flag set, that file will be kept
in cache memory, for quicker access.
[3] If set
on a directory, it restricts write permission. Setting the sticky bit adds a t to the permissions
on the file or directory listing. This restricts altering or deleting specific files in that directory
to the owner of those files.
drwxrwxrwt 7 root 1024 May 19 21:26 tmp/
If a user does not own a directory that has the sticky bit set, but has write permission in
that directory, she can only delete those files that she owns in it. This keeps users from inadvertently
overwriting or deleting each other's files in a publicly accessible directory, such as /tmp.
(The owner of the directory or root can, of course, delete or rename files there.)
- -O
- you are owner of file
- -G
- group-id of file same as yours
- -N
- file modified since it was last read
- f1 -nt f2
- file f1 is newer than f2
- f1 -ot f2
- file f1 is older than f2
- f1 -ef f2
- files f1 and f2 are hard links to the same file
- !
- "not" -- reverses the sense of the tests above (returns true if condition absent).
Example 7-4. Testing for broken links
#!/bin/bash
# broken-link.sh
# Written by Lee bigelow <[email protected]>
# Used in ABS Guide with permission.
# A pure shell script to find dead symlinks and output them quoted
#+ so they can be fed to xargs and dealt with :)
#+ eg. sh broken-link.sh /somedir /someotherdir|xargs rm
#
# This, however, is a better method:
#
# find "somedir" -type l -print0|\
# xargs -r0 file|\
# grep "broken symbolic"|
# sed -e 's/^\|: *broken symbolic.*$/"/g'
#
#+ but that wouldn't be pure Bash, now would it.
# Caution: beware the /proc file system and any circular links!
################################################################
# If no args are passed to the script set directories-to-search
#+ to current directory. Otherwise set the directories-to-search
#+ to the args passed.
######################
[ $# -eq 0 ] && directorys=`pwd` || directorys=$@
# Setup the function linkchk to check the directory it is passed
#+ for files that are links and don't exist, then print them quoted.
# If one of the elements in the directory is a subdirectory then
#+ send that subdirectory to the linkcheck function.
##########
linkchk () {
for element in $1/*; do
[ -h "$element" -a ! -e "$element" ] && echo \"$element\"
[ -d "$element" ] && linkchk $element
# Of course, '-h' tests for symbolic link, '-d' for directory.
done
}
# Send each arg that was passed to the script to the linkchk() function
#+ if it is a valid directory. If not, then print the error message
#+ and usage info.
##################
for directory in $directorys; do
if [ -d $directory ]
then linkchk $directory
else
echo "$directory is not a directory"
echo "Usage: $0 dir1 dir2 ..."
fi
done
exit $?
Example 31-1, Example 11-8, Example 11-3, Example 31-3, and Example A-1 also illustrate uses of the file test operators.
Notes
[1] Per the 1913 edition of Webster's Dictionary:
Deprecate
...
To pray against, as an evil; to seek to avert by prayer; to desire the removal of; to seek deliverance from; to express deep regret for; to disapprove of strongly.
[2] Be aware that suid binaries may open security holes. The suid flag has no effect on shell scripts.
[3] On Linux systems, the sticky bit is no longer used for files, only on directories.
Most of the time you can omit the double quotes surrounding strings and string variables,
but it is not a good idea. Why? Because your code will work perfectly until an environment variable
happens to have a space or a tab in it, at which point the shell goes bananas. Here is an example
of this Gotcha:
if [ $mydate = "Sep 11, 2010" ] ; then
echo "yes, today is September 11 again"
fi
In the above example, if mydate equals "foo", the code will work as expected and will not print anything.
However, if mydate equals "Sep 11, 2010", the code will fail with the following error:
[: too many arguments
In this case, the spaces in "$mydate" (which equals "Sep 11, 2010") end up confusing bash,
as it performs the syntax analysis of this construct after substituting the value of the variable. After bash
expands $mydate using its stupid macro substitution mechanism (completely unnecessary in this case), it
ends up with the following comparison:
[ Sep 11, 2010 = "Sep 11, 2010" ]
That is why shell string comparisons typically surround the string arguments with double quotes.
Here is how the "Sep 11, 2010" comparison should have been written (of
course, we should use the [[...]] double square bracket notation instead):
if [[ "$mydate" = "Sep 11, 2010" ]] ; then
echo "yes, today is September 11 again"
fi
If you want your environment variables to be expanded, you must enclose them in double quotes,
rather than single quotes. Single quotes disable variable (as well as history) expansion.
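A two-line sketch of the difference between the two kinds of quotes:

```shell
#!/bin/bash
user="world"

echo "hello $user"    # double quotes: the variable is expanded
echo 'hello $user'    # single quotes: printed literally
```

This prints "hello world" followed by the literal text "hello $user".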
Compound conditionals are discussed in more detail in Compound conditionals.
Here we will provide a brief overview.
In single square bracket conditional expressions you cannot use && or || -- you should use -a and -o instead.
Also, in this type of conditional expression, even quoting the string variable might not suffice:
[ -n "$string" -o "$a" = "$b" ]
may cause an error with some versions of bash if $string
is empty. The only way to avoid this is to use the stupid trick of
appending an extra character inside the double quotes,
for example:
[ "x$string" != x -o "x$a" = "x$b" ]
In the case of integer comparisons, you can use a leading zero for the same purpose.
The legacy logical operations supported by shell in single square bracket expressions are as
follows:
- -a "logical and": exp1 -a exp2 returns true if
both exp1 and exp2 are true.
- -o "logical or": exp1 -o exp2 returns true
if either exp1 or exp2 is true.
Generally they should be replaced by double square bracket expressions with && and ||. You
can also use && and || between two conditional expressions. For
example, you can chain double square bracket conditions with arithmetic conditions:
[[ condition1 ]] && (( condition2 ))
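A sketch of such chaining; the file name and size are made up for illustration:

```shell
#!/bin/bash
name="report.txt"
size=2048

# A string pattern test chained with an arithmetic test via &&
if [[ $name == *.txt ]] && (( size > 1024 )); then
    echo "large text file"
fi
```

Each operand of && is a complete conditional in its own notation, so no quoting or escaping tricks are needed.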
The Bash String Operators, by Kevin Sookocheff (December 11, 2014, sookocheff.com)
A common task in bash programming is to manipulate portions of a string and return the result. Bash provides rich
support for these manipulations via string operators. The syntax is not always intuitive, so I wanted to use this blog post to serve
as a permanent reminder of the operators.
The string operators are signified with the ${}
notation. The operations can be grouped into a few classes. Each
heading in this article describes a class of operation.
Substring Extraction
Extract from a position
${string:position}
Extraction returns a substring of string starting at position and ending at the end of string. string is treated as an array of characters starting at 0.
> string="hello world"
> echo ${string:1}
ello world
> echo ${string:6}
world
Extract from a position with a length
${string:position:length}
Adding a length returns a substring only as long as the length parameter.
> string="hello world"
> echo ${string:1:2}
el
> echo ${string:6:3}
wor
Substring Removal
Remove shortest starting match
${variable#pattern}
If variable starts with pattern, delete the shortest part that matches the pattern.
> string="hello world, hello jim"
> echo ${string#*hello}
world, hello jim
Remove longest starting match
${variable##pattern}
If variable starts with pattern, delete the longest match from variable and return the rest.
> string="hello world, hello jim"
> echo ${string##*hello}
jim
Remove shortest ending match
${variable%pattern}
If variable ends with pattern, delete the shortest match from the end of variable and return the rest.
> string="hello world, hello jim"
> echo ${string%hello*}
hello world,
Remove longest ending match
${variable%%pattern}
If variable ends with pattern, delete the longest match from the end of variable and return the rest.
> string="hello world, hello jim"
> echo ${string%%hello*}

(The output is empty: the longest match of "hello*" from the end consumes the entire string.)
Substring Replacement
Replace first occurrence of word
${variable/pattern/string}
Find the first occurrence of pattern in variable and replace it with string. If string is null, pattern is deleted from variable. If pattern starts with #, the match must occur at the beginning of variable. If pattern starts with %, the match must occur at the end of variable.
> string="hello world, hello jim"
> echo ${string/hello/goodbye}
goodbye world, hello jim
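The # and % anchors just described restrict the replacement to the start or the end of the string; a short sketch of both forms:

```shell
string="hello world, hello jim"

# ${var/#pattern/string}: replace only a match at the beginning
at_start=${string/#hello/goodbye}    # "goodbye world, hello jim"

# ${var/%pattern/string}: replace only a match at the end
at_end=${string/%jim/joe}            # "hello world, hello joe"

echo "$at_start"
echo "$at_end"
```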
Replace all occurrences of word
${variable//pattern/string}
Same as above, but finds all occurrences of pattern in variable and replaces them with string. If string is null, pattern is deleted from variable.
> string="hello world, hello jim"
> echo ${string//hello/goodbye}
goodbye world, goodbye jim
When you need to split a string in bash, you can use bash's built-in read command. This command reads a single line of string from stdin and splits the string on a delimiter. The split elements are then stored in either an array or separate variables supplied with the read command. The default delimiters are whitespace characters (' ', '\t', '\r', '\n'). If you want to split a string on a custom delimiter, you can specify the delimiter in the IFS variable before calling read.
# strings to split
var1="Harry Samantha Bart Amy"
var2="green:orange:black:purple"
# split a string by one or more whitespace characters, and store the result in an array
read -r -a my_array <<< "$var1"
# iterate the array to access individual split words
for elem in "${my_array[@]}"; do
echo "$elem"
done
echo "----------"
# split a string by a custom delimiter
IFS=':' read -r -a my_array2 <<< "$var2"
for elem in "${my_array2[@]}"; do
echo "$elem"
done
Harry
Samantha
Bart
Amy
----------
green
orange
black
purple
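As noted above, read can also scatter the fields into separate named variables instead of an array; any leftover fields are collected into the last variable. A minimal sketch:

```shell
# Split a delimited record into named variables. Prefixing the
# IFS assignment to the command changes the delimiter for this
# read only, leaving the global IFS untouched.
line="green:orange:black:purple"
IFS=':' read -r first second rest <<< "$line"

echo "$first"    # green
echo "$second"   # orange
echo "$rest"     # black:purple (extra fields go to the last name)
```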
Remove a Trailing Newline Character from a String in Bash
If you want to remove a trailing newline or carriage return character from a string, you can use bash's parameter expansion in the following form.
${string%$character}
This expression means that if "string" ends with the trailing character stored in "character", the result of the expression will be "string" without that character. For example:
# input string with a trailing newline character
input_line=$'This is my example line\n'
# define a trailing character. For carriage return, replace it with $'\r'
character=$'\n'
echo -e "($input_line)"
# remove a trailing newline character
input_line=${input_line%$character}
echo -e "($input_line)"
(This is my example line
)
(This is my example line)
Trim Leading/Trailing Whitespaces from a String in Bash
If you want to remove whitespaces at the beginning or at the end of a string (also known as leading/trailing whitespaces), you can use the sed command.
my_str=" This is my example string "
# original string with leading/trailing whitespaces
echo -e "($my_str)"
# trim leading whitespaces in a string
my_str=$(echo "$my_str" | sed -e "s/^[[:space:]]*//")
echo -e "($my_str)"
# trim trailing whitespaces in a string
my_str=$(echo "$my_str" | sed -e "s/[[:space:]]*$//")
echo -e "($my_str)"
( This is my example string )
(This is my example string ) ← leading whitespaces removed
(This is my example string) ← trailing whitespaces removed
If you want to stick with bash's built-in mechanisms, the following bash function can get
the job done.
trim() {
local var="$*"
# remove leading whitespace characters
var="${var#"${var%%[![:space:]]*}"}"
# remove trailing whitespace characters
var="${var%"${var##*[![:space:]]}"}"
echo "$var"
}
my_str=" This is my example string "
echo "($my_str)"
my_str=$(trim "$my_str")
echo "($my_str)"
Table 4-1. Substitution Operators

${varname:-word}
    If varname exists and isn't null, return its value; otherwise return word.
    Purpose: Returning a default value if the variable is undefined.
    Example: ${count:-0} evaluates to 0 if count is undefined.

${varname:=word}
    If varname exists and isn't null, return its value; otherwise set it to word and then return its value. Positional and special parameters cannot be assigned this way.
    Purpose: Setting a variable to a default value if it is undefined.
    Example: ${count:=0} sets count to 0 if it is undefined.

${varname:?message}
    If varname exists and isn't null, return its value; otherwise print "varname: message" and abort the current command or script (non-interactive shells only). Omitting message produces the default message "parameter null or not set".
    Purpose: Catching errors that result from variables being undefined.
    Example: ${count:?"undefined!"} prints "count: undefined!" and exits if count is undefined.

${varname:+word}
    If varname exists and isn't null, return word; otherwise return null.
    Purpose: Testing for the existence of a variable.
    Example: ${count:+1} returns 1 (which could mean "true") if count is defined.

${varname:offset}
${varname:offset:length}
    Performs substring expansion. Returns the substring of $varname starting at offset and up to length characters. The first character in $varname is position 0. If length is omitted, the substring starts at offset and continues to the end of $varname. If offset is less than 0, the position is taken from the end of $varname. If varname is @, the length is the number of positional parameters starting at parameter offset.
    Purpose: Returning parts of a string (substrings or slices).
    Example: If count is set to frogfootman, ${count:4} returns footman and ${count:4:4} returns foot.
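The first four substitution operators in Table 4-1 are easy to exercise in a short bash session; a minimal sketch:

```shell
unset count

# ${count:-0} supplies a default without assigning it
echo "${count:-0}"       # prints 0; count remains unset

# ${count:=0} supplies the default AND assigns it
echo "${count:=0}"       # prints 0 and sets count=0
echo "$count"            # now 0

# ${count:+1} returns 1 only because count is now set and non-null
flag=${count:+1}
echo "$flag"             # 1

# ${count:?msg} would abort a script if count were unset or null:
# : "${count:?count is not set}"
```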
Table 4-2. Pattern-Matching Operators

${variable#pattern}
    If the pattern matches the beginning of the variable's value, delete the shortest part that matches and return the rest.
${variable##pattern}
    If the pattern matches the beginning of the variable's value, delete the longest part that matches and return the rest.
${variable%pattern}
    If the pattern matches the end of the variable's value, delete the shortest part that matches and return the rest.
${variable%%pattern}
    If the pattern matches the end of the variable's value, delete the longest part that matches and return the rest.
${variable/pattern/string}
${variable//pattern/string}
    The longest match to pattern in variable is replaced by string. In the first form, only the first match is replaced. In the second form, all matches are replaced. If the pattern begins with a #, it must match at the start of the variable. If it begins with a %, it must match at the end of the variable. If string is null, the matches are deleted. If variable is @ or *, the operation is applied to each positional parameter in turn and the expansion is the resultant list.
Another way of concatenating strings in bash is by appending variables or literal strings to a variable using the += operator:

VAR1="Hello, "
VAR1+="World"
echo "$VAR1"

Hello, World

The following example uses the += operator to concatenate strings in a bash for loop:

languages.sh
VAR=""
for ELEMENT in 'Hydrogen' 'Helium' 'Lithium' 'Beryllium'; do
  VAR+="${ELEMENT} "
done
echo "$VAR"

Hydrogen Helium Lithium Beryllium
4.3 String Operators
The curly-bracket syntax allows for the shell's string operators . String operators
allow you to manipulate values of variables in various useful ways without having to write
full-blown programs or resort to external UNIX utilities. You can do a lot with string-handling
operators even if you haven't yet mastered the programming features we'll see in later
chapters.
In particular, string operators let you do the following:
- Ensure that variables exist (i.e., are defined and have non-null values)
- Set default values for variables
- Catch errors that result from variables not being set
- Remove portions of variables' values that match patterns
4.3.1 Syntax of String Operators
The basic idea behind the syntax of string operators is that special characters that denote
operations are inserted between the variable's name and the right curly brackets. Any argument
that the operator may need is inserted to the operator's right.
The first group of string-handling operators tests for the existence of variables and allows
substitutions of default values under certain conditions. These are listed in Table
4.1 . [6]
[6] The colon ( :
) in each of these operators is actually optional. If the
colon is omitted, then change "exists and isn't null" to "exists" in each definition, i.e.,
the operator tests for existence only.
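The distinction footnote [6] draws between the colon and colon-less forms is easy to miss; a quick sketch:

```shell
empty=""          # set, but null
unset notset      # not set at all

# With the colon, "set but null" still triggers the default:
with_colon=${empty:-fallback}     # "fallback"

# Without the colon, only a truly unset variable triggers it:
without_colon=${empty-fallback}   # "" -- empty IS set
from_unset=${notset-fallback}     # "fallback"

echo "$with_colon / $without_colon / $from_unset"
```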
Table 4.1: Substitution Operators

${varname:-word}
    If varname exists and isn't null, return its value; otherwise return word.
    Purpose: Returning a default value if the variable is undefined.
    Example: ${count:-0} evaluates to 0 if count is undefined.

${varname:=word}
    If varname exists and isn't null, return its value; otherwise set it to word and then return its value.[7]
    Purpose: Setting a variable to a default value if it is undefined.
    Example: ${count:=0} sets count to 0 if it is undefined.

${varname:?message}
    If varname exists and isn't null, return its value; otherwise print "varname: message" and abort the current command or script. Omitting message produces the default message "parameter null or not set".
    Purpose: Catching errors that result from variables being undefined.
    Example: ${count:?"undefined!"} prints "count: undefined!" and exits if count is undefined.

${varname:+word}
    If varname exists and isn't null, return word; otherwise return null.
    Purpose: Testing for the existence of a variable.
    Example: ${count:+1} returns 1 (which could mean "true") if count is defined.
[7] Pascal, Modula, and Ada programmers may find it helpful to recognize the similarity of
this to the assignment operators in those languages.
The first two of these operators are ideal for setting defaults for command-line arguments
in case the user omits them. We'll use the first one in our first programming task.
Task 4.1
You have a large album collection, and you want to write some software to keep track of
it. Assume that you have a file of data on how many albums you have by each artist. Lines in
the file look like this:
14 Bach, J.S.
1 Balachander, S.
21 Beatles
6 Blakey, Art
Write a program that prints the N highest lines, i.e., the N artists by whom
you have the most albums. The default for N should be 10. The program should take one
argument for the name of the input file and an optional second argument for how many lines to
print.
By far the best approach to this type of script is to use built-in UNIX utilities, combining
them with I/O redirectors and pipes. This is the classic "building-block" philosophy of UNIX
that is another reason for its great popularity with programmers. The building-block technique
lets us write a first version of the script that is only one line long:
sort -nr $1 | head -${2:-10}
Here is how this works: the sort (1) program sorts the data in the file whose name is
given as the first argument ( $1 ). The -n option tells sort to interpret
the first word on each line as a number (instead of as a character string); the -r tells
it to reverse the comparisons, so as to sort in descending order.
The output of sort is piped into the head (1) utility, which, when given the
argument - N , prints the first N lines of its input on the standard
output. The expression -${2:-10} evaluates to a dash ( - ) followed by the second
argument if it is given, or to -10 if it's not; notice that the variable in this expression is
2 , which is the second positional parameter.
Assume the script we want to write is called highest . Then if the user types
highest myfile , the line that actually runs is:
sort -nr myfile | head -10
Or if the user types highest myfile 22 , the line that runs is:
sort -nr myfile | head -22
Make sure you understand how the :- string operator provides a default value.
This is a perfectly good, runnable script-but it has a few problems. First, its one line is
a bit cryptic. While this isn't much of a problem for such a tiny script, it's not wise to
write long, elaborate scripts in this manner. A few minor changes will make the code more
readable.
First, we can add comments to the code; anything between # and the end of a line is a
comment. At a minimum, the script should start with a few comment lines that indicate what the
script does and what arguments it accepts. Second, we can improve the variable names by
assigning the values of the positional parameters to regular variables with mnemonic names.
Finally, we can add blank lines to space things out; blank lines, like comments, are ignored.
Here is a more readable version:
#
# highest filename [howmany]
#
# Print howmany highest-numbered lines in file filename.
# The input file is assumed to have lines that start with
# numbers. Default for howmany is 10.
#
filename=$1
howmany=${2:-10}
sort -nr $filename | head -$howmany
The square brackets around howmany in the comments adhere to the convention in UNIX
documentation that square brackets denote optional arguments.
The changes we just made improve the code's readability but not how it runs. What if the
user were to invoke the script without any arguments? Remember that positional parameters
default to null if they aren't defined. If there are no arguments, then $1 and $2
are both null. The variable howmany ( $2 ) is set up to default to 10, but there
is no default for filename ( $1 ). The result would be that this command
runs:
sort -nr | head -10
As it happens, if sort is called without a filename argument, it expects input to
come from standard input, e.g., a pipe (|) or a user's terminal. Since it doesn't have the
pipe, it will expect the terminal. This means that the script will appear to hang! Although you
could always type [CTRL-D] or [CTRL-C] to get out of the script, a naive
user might not know this.
Therefore we need to make sure that the user supplies at least one argument. There are a few
ways of doing this; one of them involves another string operator. We'll replace the line:
filename=$1
with:
filename=${1:?"filename missing."}
This will cause two things to happen if a user invokes the script without any arguments:
first the shell will print the somewhat unfortunate message:
highest: 1: filename missing.
to the standard error output. Second, the script will exit without running the remaining
code.
With a somewhat "kludgy" modification, we can get a slightly better error message. Consider
this code:
filename=$1
filename=${filename:?"missing."}
This results in the message:
highest: filename: missing.
(Make sure you understand why.) Of course, there are ways of printing whatever message is
desired; we'll find out how in Chapter 5 .
Before we move on, we'll look more closely at the two remaining operators in Table
4.1 and see how we can incorporate them into our task solution. The := operator does
roughly the same thing as :- , except that it has the "side effect" of setting the value
of the variable to the given word if the variable doesn't exist.
Therefore we would like to use := in our script in place of :- , but we can't;
we'd be trying to set the value of a positional parameter, which is not allowed. But if we
replaced:
howmany=${2:-10}
with just:
howmany=$2
and moved the substitution down to the actual command line (as we did at the start), then we
could use the := operator:
sort -nr $filename | head -${howmany:=10}
Using := has the added benefit of setting the value of howmany to 10 in case
we need it afterwards in later versions of the script.
The final substitution operator is :+ . Here is how we can use it in our example:
Let's say we want to give the user the option of adding a header line to the script's output.
If he or she types the option -h , then the output will be preceded by the line:
ALBUMS ARTIST
Assume further that this option ends up in the variable header , i.e., $header
is -h if the option is set or null if not. (Later we will see how to do this without
disturbing the other positional parameters.)
The expression:
${header:+"ALBUMS ARTIST\n"}
yields null if the variable header is null, or ALBUMS ARTIST\n if
it is non-null. This means that we can put the line:
print -n ${header:+"ALBUMS ARTIST\n"}
right before the command line that does the actual work. The -n option to
print causes it not to print a LINEFEED after printing its arguments. Therefore
this print statement will print nothing-not even a blank line-if header is null;
otherwise it will print the header line and a LINEFEED (\n).
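print is a ksh builtin; in bash the same header trick can be sketched with printf (the variable names here are illustrative):

```shell
header="-h"    # as if the user passed the -h option
missing=""     # as if the option was omitted

with=${header:+"ALBUMS ARTIST"}
without=${missing:+"ALBUMS ARTIST"}

# Print the header line (with its newline) only when it is non-null;
# when it is null, print nothing at all, not even a blank line.
if [ -n "$with" ]; then
    printf '%s\n' "$with"
fi
```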
4.3.2 Patterns and Regular Expressions
We'll continue refining our solution to Task 4-1 later in this chapter. The next type of
string operator is used to match portions of a variable's string value against patterns.
Patterns, as we saw in Chapter 1, are strings that can contain wildcard characters
(*, ?, and [] for character sets and ranges).
Wildcards have been standard features of all UNIX shells going back (at least) to the
Version 6 shell. But the Korn shell is the first shell to add to their capabilities. It
adds a set of operators, called regular expression (or regexp for short)
operators, that give it much of the string-matching power of advanced UNIX utilities like
awk (1), egrep (1) (extended grep (1)) and the emacs editor, albeit
with a different syntax. These capabilities go beyond those that you may be used to in other
UNIX utilities like grep , sed (1) and vi (1).
Advanced UNIX users will find the Korn shell's regular expression capabilities occasionally
useful for script writing, although they border on overkill. (Part of the problem is the
inevitable syntactic clash with the shell's myriad other special characters.) Therefore we
won't go into great detail about regular expressions here. For more comprehensive information,
the "last word" on practical regular expressions in UNIX is sed & awk , an O'Reilly
Nutshell Handbook by Dale Dougherty. If you are already comfortable with awk or
egrep , you may want to skip the following introductory section and go to "Korn Shell
Versus awk/egrep Regular Expressions" below, where we explain the shell's regular expression
mechanism by comparing it with the syntax used in those two utilities. Otherwise, read
on.
4.3.2.1 Regular expression basics
Think of regular expressions as strings that match patterns more powerfully than the
standard shell wildcard schema. Regular expressions began as an idea in theoretical computer
science, but they have found their way into many nooks and crannies of everyday, practical
computing. The syntax used to represent them may vary, but the concepts are very much the
same.
A shell regular expression can contain regular characters, standard wildcard characters, and
additional operators that are more powerful than wildcards. Each such operator has the form
x ( exp ) , where x is the particular operator and exp is
any regular expression (often simply a regular string). The operator determines how many
occurrences of exp a string that matches the pattern can contain. See Table 4.2 and
Table 4.3 .
Table 4.2: Regular Expression Operators

*(exp)              0 or more occurrences of exp
+(exp)              1 or more occurrences of exp
?(exp)              0 or 1 occurrences of exp
@(exp1|exp2|...)    exp1 or exp2 or ...
!(exp)              Anything that doesn't match exp [8]

[8] Actually, !(exp) is not a regular expression operator by the standard technical definition, though it is a handy extension.
Regular expressions are extremely useful when dealing with arbitrary text, as you already
know if you have used grep or the regular-expression capabilities of any UNIX editor.
They aren't nearly as useful for matching filenames and other simple types of information with
which shell users typically work. Furthermore, most things you can do with the shell's regular
expression operators can also be done (though possibly with more keystrokes and less
efficiency) by piping the output of a shell command through grep or egrep .
Nevertheless, here are a few examples of how shell regular expressions can solve
filename-listing problems. Some of these will come in handy in later chapters as pieces of
solutions to larger tasks.
- The emacs editor supports customization files whose names end in .el (for
Emacs LISP) or .elc (for Emacs LISP Compiled). List all emacs customization
files in the current directory.
- In a directory of C source code, list all files that are not necessary. Assume that
"necessary" files end in .c or .h , or are named Makefile or
README .
- Filenames in the VAX/VMS operating system end in a semicolon followed by a version
number, e.g., fred.bob;23 . List all VAX/VMS-style filenames in the current
directory.
Here are the solutions:
- In the first of these, we are looking for files that end in .el with an optional c. The expression that matches this is *.el?(c).
- The second example depends on the four standard subexpressions *.c, *.h, Makefile, and README. The entire expression is !(*.c|*.h|Makefile|README), which matches anything that does not match any of the four possibilities.
- The solution to the third example starts with *\; : the shell wildcard * followed by a backslash-escaped semicolon. Then, we could use the regular expression +([0-9]), which matches one or more characters in the range [0-9], i.e., one or more digits. This is almost correct (and probably close enough), but it doesn't take into account that the first digit cannot be 0. Therefore the correct expression is *\;[1-9]*([0-9]), which matches anything that ends with a semicolon, a digit from 1 to 9, and zero or more digits from 0 to 9.
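These Korn shell operators survive in bash as "extended globs", but they are off by default: shopt -s extglob must run before any line that uses them is parsed. A sketch of the second solution, matching names rather than actual files:

```shell
shopt -s extglob   # bash needs this before !( ) patterns are parsed

# Echo "drop" when a name matches none of the "necessary" patterns,
# "keep" otherwise (mirrors !(*.c|*.h|Makefile|README) from the text).
check() {
    if [[ $1 == !(*.c|*.h|Makefile|README) ]]; then
        echo "drop"
    else
        echo "keep"
    fi
}

v1=$(check a.out)     # drop: matches none of the four patterns
v2=$(check main.c)    # keep: matches *.c
echo "a.out: $v1, main.c: $v2"
```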
Regular expression operators are an interesting addition to the Korn shell's features, but
you can get along well without them-even if you intend to do a substantial amount of shell
programming.
In our opinion, the shell's authors missed an opportunity to build into the wildcard
mechanism the ability to match files by type (regular, directory, executable, etc., as
in some of the conditional tests we will see in Chapter 5 ) as well as by name
component. We feel that shell programmers would have found this more useful than arcane regular
expression operators.
The following section compares Korn shell regular expressions to analogous features in
awk and egrep . If you aren't familiar with these, skip to the section entitled
"Pattern-matching Operators."
4.3.2.2 Korn shell versus awk/egrep regular expressions
Table 4.4 is an expansion of Table 4.2: the middle column shows the equivalents in awk/egrep of the shell's regular expression operators.
Table 4.4: Shell Versus egrep/awk Regular Expression Operators

Korn Shell          egrep/awk        Meaning
*(exp)              exp*             0 or more occurrences of exp
+(exp)              exp+             1 or more occurrences of exp
?(exp)              exp?             0 or 1 occurrences of exp
@(exp1|exp2|...)    exp1|exp2|...    exp1 or exp2 or ...
!(exp)              (none)           Anything that doesn't match exp
These equivalents are close but not quite exact. Actually, an exp within any of the
Korn shell operators can be a series of exp1 | exp2 |... alternates. But because
the shell would interpret an expression like dave|fred|bob as a pipeline of commands,
you must use @(dave|fred|bob) for alternates by themselves.
For example:
- @(dave|fred|bob) matches dave, fred, or bob.
- *(dave|fred|bob) means "0 or more occurrences of dave, fred, or bob". This expression matches strings like the null string, dave, davedave, fred, bobfred, bobbobdavefredbobfred, etc.
- +(dave|fred|bob) matches any of the above except the null string.
- ?(dave|fred|bob) matches the null string, dave, fred, or bob.
- !(dave|fred|bob) matches anything except dave, fred, or bob.
It is worth re-emphasizing that shell regular expressions can still contain standard shell
wildcards. Thus, the shell wildcard ? (match any single character) is the equivalent to
. in egrep or awk , and the shell's character set operator [ ...
] is the same as in those utilities. [9] For example, the expression +([0-9])
matches a number, i.e., one or more digits. The shell wildcard character * is
equivalent to the shell regular expression *(?).
[9] And, for that matter, the same as in grep , sed , ed , vi
, etc.
A few egrep and awk regexp operators do not have equivalents in the Korn
shell. These include:
- The beginning- and end-of-line operators ^ and $ .
- The beginning- and end-of-word operators \< and \> .
- Repeat factors like \{N\} and \{M,N\}.
The first two pairs are hardly necessary, since the Korn shell doesn't normally operate on
text files and does parse strings into words itself.
4.3.3 Pattern-matching Operators
Table 4.5 lists the
Korn shell's pattern-matching operators.
Table 4.5: Pattern-matching Operators

${variable#pattern}
    If the pattern matches the beginning of the variable's value, delete the shortest part that matches and return the rest.
${variable##pattern}
    If the pattern matches the beginning of the variable's value, delete the longest part that matches and return the rest.
${variable%pattern}
    If the pattern matches the end of the variable's value, delete the shortest part that matches and return the rest.
${variable%%pattern}
    If the pattern matches the end of the variable's value, delete the longest part that matches and return the rest.
These can be hard to remember, so here's a handy mnemonic device: # matches the front
because number signs precede numbers; % matches the rear because percent signs
follow numbers.
The classic use for pattern-matching operators is in stripping off components of pathnames,
such as directory prefixes and filename suffixes. With that in mind, here is an example that
shows how all of the operators work. Assume that the variable path has the value
/home/billr/mem/long.file.name ; then:
Expression Result
${path##/*/} long.file.name
${path#/*/} billr/mem/long.file.name
$path /home/billr/mem/long.file.name
${path%.*} /home/billr/mem/long.file
${path%%.*} /home/billr/mem/long
The two patterns used here are /*/ , which matches anything between two slashes, and .* , which matches a dot followed by anything.
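The table of expressions above can be verified directly in bash, which implements the same four operators:

```shell
path=/home/billr/mem/long.file.name

a=${path##/*/}   # longest "/.../" prefix removed  -> long.file.name
b=${path#/*/}    # shortest "/.../" prefix removed -> billr/mem/long.file.name
c=${path%.*}     # shortest ".xxx" suffix removed  -> /home/billr/mem/long.file
d=${path%%.*}    # longest ".xxx" suffix removed   -> /home/billr/mem/long

printf '%s\n' "$a" "$b" "$c" "$d"
```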
We will incorporate one of these operators into our next programming task.
Task 4.2
You are writing a C compiler, and you want to use the Korn shell for your
front-end.[10]
[10] Don't laugh-many UNIX compilers have shell scripts as front-ends.
Think of a C compiler as a pipeline of data processing components. C source code is input to
the beginning of the pipeline, and object code comes out of the end; there are several steps in
between. The shell script's task, among many other things, is to control the flow of data
through the components and to designate output files.
You need to write the part of the script that takes the name of the input C source file and
creates from it the name of the output object code file. That is, you must take a filename
ending in .c and create a filename that is similar except that it ends in .o
.
The task at hand is to strip the .c off the filename and append .o . A single
shell statement will do it:
objname=${filename%.c}.o
This tells the shell to look at the end of filename for .c . If there is a
match, return $filename with the match deleted. So if filename had the value
fred.c , the expression ${filename%.c} would return fred . The .o
is appended to make the desired fred.o , which is stored in the variable objname
.
If filename had an inappropriate value (without .c ) such as fred.a ,
the above expression would evaluate to fred.a.o : since there was no match, nothing is
deleted from the value of filename , and .o is appended anyway. And, if
filename contained more than one dot-e.g., if it were the y.tab.c that is so
infamous among compiler writers-the expression would still produce the desired y.tab.o .
Notice that this would not be true if we used %% in the expression instead of % .
The former operator uses the longest match instead of the shortest, so it would match
.tab.o and evaluate to y.o rather than y.tab.o . So the single % is
correct in this case.
A longest-match deletion would be preferable, however, in the following task.
Task 4.3
You are implementing a filter that prepares a text file for printer output. You want to
put the file's name-without any directory prefix-on the "banner" page. Assume that, in your
script, you have the pathname of the file to be printed stored in the variable
pathname .
Clearly the objective is to remove the directory prefix from the pathname. The following
line will do it:
bannername=${pathname##*/}
This solution is similar to the first line in the examples shown before. If pathname were just a filename, the pattern */ (anything followed by a slash) would not match and the value of the expression would be pathname untouched. If pathname were something like fred/bob, the prefix fred/ would match the pattern and be deleted, leaving just bob as the expression's value. The same thing would happen if pathname were something like /dave/pete/fred/bob: since the ## deletes the longest match, it deletes the entire /dave/pete/fred/.
If we used #*/ instead of ##*/, the expression would have the incorrect value dave/pete/fred/bob, because the shortest instance of "anything followed by a slash" at the beginning of the string is just a slash (/).
The construct ${variable##*/} is actually equivalent to the UNIX utility basename(1). basename takes a pathname as argument and returns the filename only; it is meant to be used with the shell's command substitution mechanism (see below). basename is less efficient than ${variable##*/} because it runs in its own separate process rather than within the shell. Another utility, dirname(1), does essentially the opposite of basename: it returns the directory prefix only. It is equivalent to the Korn shell expression ${variable%/*} and is less efficient for the same reason.
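Both expansions work unchanged in bash, so the equivalence can be sketched like this (one corner case: for a bare filename with no slash, ${pathname%/*} returns the name unchanged, while dirname would print "."):

```shell
pathname=/dave/pete/fred/bob

base=${pathname##*/}   # like basename(1): bob
dir=${pathname%/*}     # like dirname(1):  /dave/pete/fred

echo "$base"
echo "$dir"
```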
4.3.4 Length Operator
There are two remaining operators on variables. One is ${#varname}, which returns the length of the value of the variable as a character string. (In Chapter 6 we will see how to treat this and similar values as actual numbers so they can be used in arithmetic expressions.) For example, if filename has the value fred.c, then ${#filename} would have the value 6. The other operator (${#array[*]}) has to do with array variables, which are also discussed in Chapter 6.
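Both length operators also exist in bash; a quick sketch:

```shell
filename="fred.c"
len=${#filename}      # length of the string: 6

albums=(Bach Beatles Blakey)
count=${#albums[@]}   # number of array elements: 3
                      # (${#albums[*]} gives the same count)

echo "$len $count"
```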
http://docstore.mik.ua/orelly/unix2.1/ksh/ch04_03.htm
Jeff, May 8 at 18:30
Given a filename in the form someletters_12345_moreleters.ext, I want to extract the 5 digits and put them into a variable.
So to emphasize the point, I have a filename with x number of characters, then a five digit sequence surrounded by a single underscore on either side, then another set of x number of characters. I want to take the 5 digit number and put that into a variable.
I am very interested in the number of different ways that this can be accomplished.
Berek
Bryan
,Jan 24, 2017 at 9:30
Use
cut
:
echo 'someletters_12345_moreleters.ext' | cut -d'_' -f 2
More generic:
INPUT='someletters_12345_moreleters.ext'
SUBSTRING=$(echo "$INPUT" | cut -d'_' -f 2)
echo $SUBSTRING
JB.
,Jan 6, 2015 at 10:13
If
x
is constant, the following parameter expansion performs substring extraction:
b=${a:12:5}
where
12
is the offset (zero-based) and
5
is the length
If the underscores around the digits are the only ones in the input, you can strip off the
prefix and suffix (respectively) in two steps:
tmp=${a#*_} # remove prefix ending in "_"
b=${tmp%_*} # remove suffix starting with "_"
If there are other underscores, it's probably feasible anyway, albeit more tricky. If anyone
knows how to perform both expansions in a single expression, I'd like to know too.
Both solutions presented are pure bash, with no process spawning involved, hence very fast.
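Applied to the question's sample filename, the two-step version looks like this:

```shell
a=someletters_12345_moreleters.ext
tmp=${a#*_}    # remove prefix ending in "_"      -> 12345_moreleters.ext
b=${tmp%_*}    # remove suffix starting with "_"  -> 12345
echo "$b"      # 12345
```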
A
Sahra
,Mar 16, 2017 at 6:27
Generic solution where the number can be anywhere in the filename, using the first of such
sequences:
number=$(echo $filename | egrep -o '[[:digit:]]{5}' | head -n1)
Another solution to extract exactly a part of a variable:
number=${filename:offset:length}
If your filename always has the format
stuff_digits_...
you can use awk:
number=$(echo $filename | awk -F _ '{ print $2 }')
Yet another solution, to remove everything except digits:
number=$(echo $filename | tr -cd '[:digit:]')
sshow
,Jul 27, 2017 at 17:22
In case someone wants more rigorous information, you can also search it in man bash like this
$ man bash [press return key]
/substring [press return key]
[press "n" key]
[press "n" key]
[press "n" key]
[press "n" key]
Result:
${parameter:offset}
${parameter:offset:length}
Substring Expansion. Expands to up to length characters of
parameter starting at the character specified by offset. If
length is omitted, expands to the substring of parameter starting
at the character specified by offset. length and offset are
arithmetic expressions (see ARITHMETIC EVALUATION below). If
offset evaluates to a number less than zero, the value is used
as an offset from the end of the value of parameter. Arithmetic
expressions starting with a - must be separated by whitespace
from the preceding : to be distinguished from the Use Default
Values expansion. If length evaluates to a number less than
zero, and parameter is not @ and not an indexed or associative
array, it is interpreted as an offset from the end of the value
of parameter rather than a number of characters, and the expansion
is the characters between the two offsets. If parameter is
@, the result is length positional parameters beginning at offset.
If parameter is an indexed array name subscripted by @ or
*, the result is the length members of the array beginning with
${parameter[offset]}. A negative offset is taken relative to
one greater than the maximum index of the specified array. Substring
expansion applied to an associative array produces undefined
results. Note that a negative offset must be separated
from the colon by at least one space to avoid being confused
with the :- expansion. Substring indexing is zero-based unless
the positional parameters are used, in which case the indexing
starts at 1 by default. If offset is 0, and the positional
parameters are used, $0 is prefixed to the list.
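A few of these rules in action on the question's sample filename (the negative-length form needs bash 4.2 or later):

```shell
s=someletters_12345_moreleters.ext
echo "${s:12:5}"    # 12345 : offset 12, length 5
echo "${s: -4}"     # .ext  : space before the minus, as the manual warns
echo "${s:0:-4}"    # someletters_12345_moreleters : negative length, bash 4.2+
```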
Aleksandr
Levchuk
,Aug 29, 2011 at 5:51
Building on jor's answer (which doesn't work for me):
substring=$(expr "$filename" : '.*_\([^_]*\)_.*')
kayn
,Oct 5, 2015 at 8:48
I'm surprised this pure bash solution didn't come up:
a="someletters_12345_moreleters.ext"
IFS="_"
set $a
echo $2
# prints 12345
You probably want to reset IFS to what value it was before, or
unset IFS
afterwards!
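One way to avoid having to restore IFS at all is to confine both the assignment and the set to a subshell; a sketch:

```shell
a="someletters_12345_moreleters.ext"

# $( ) runs in a subshell, so the caller's IFS and positional
# parameters are left untouched.
second=$(IFS=_; set -- $a; echo "$2")
echo "$second"   # 12345
```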
zebediah49
,Jun 4 at 17:31
Here's how I'd do it:
FN=someletters_12345_moreleters.ext
[[ ${FN} =~ _([[:digit:]]{5})_ ]] && NUM=${BASH_REMATCH[1]}
Note: the above is a regular expression and is restricted to your specific scenario of five
digits surrounded by underscores. Change the regular expression if you need different matching.
TranslucentCloud
,Jun 16, 2014 at 13:27
Following the requirements
I have a filename with x number of characters then a five digit sequence surrounded by a
single underscore on either side then another set of x number of characters. I want to take
the 5 digit number and put that into a variable.
I found some
grep
ways that may be useful:
$ echo "someletters_12345_moreleters.ext" | grep -Eo "[[:digit:]]+"
12345
or better
$ echo "someletters_12345_moreleters.ext" | grep -Eo "[[:digit:]]{5}"
12345
And then with
-Po
syntax:
$ echo "someletters_12345_moreleters.ext" | grep -Po '(?<=_)\d+'
12345
Or if you want to make it fit exactly 5 characters:
$ echo "someletters_12345_moreleters.ext" | grep -Po '(?<=_)\d{5}'
12345
Finally, to store the result in a variable you just need to use the
var=$(command)
syntax.
Darron
,Jan 9, 2009 at 16:13
Without any sub-processes you can:
shopt -s extglob
front=${input%%_+([a-zA-Z]).*}
digits=${front##+([a-zA-Z])_}
A very small variant of this will also work in ksh93.
user2350426
,Aug 5, 2014 at 8:11
If we focus on the concept of:
"A run of (one or several) digits"
We could use several external tools to extract the numbers.
We could quite easily erase all other characters, either sed or tr:
name='someletters_12345_moreleters.ext'
echo $name | sed 's/[^0-9]*//g' # 12345
echo $name | tr -c -d 0-9 # 12345
But if $name contains several runs of numbers, the above will fail:
If "name=someletters_12345_moreleters_323_end.ext", then:
echo $name | sed 's/[^0-9]*//g' # 12345323
echo $name | tr -c -d 0-9 # 12345323
We need to use regular expressions (regex).
To select only the first run (12345 not 323) in sed and perl:
echo $name | sed 's/[^0-9]*\([0-9]\{1,\}\).*$/\1/'
perl -e 'my $name='$name';my ($num)=$name=~/(\d+)/;print "$num\n";'
But we could as well do it directly
in bash
(1)
:
regex=[^0-9]*([0-9]{1,}).*$; \
[[ $name =~ $regex ]] && echo ${BASH_REMATCH[1]}
This allows us to extract the FIRST run of digits of any length
surrounded by any other text/characters.
Note
:
regex=[^0-9]*([0-9]{5,5}).*$;
will match only exactly 5
digit runs. :-)
(1)
: faster than calling an external tool for short texts; not faster than
doing all the processing inside sed or awk for large files.
codist
,May 6, 2011 at 12:50
Here's a prefix-suffix solution (similar to the solutions given by JB and Darron) that matches
the first block of digits and does not depend on the surrounding underscores:
str='someletters_12345_morele34ters.ext'
s1="${str#"${str%%[[:digit:]]*}"}" # strip off non-digit prefix from str
s2="${s1%%[^[:digit:]]*}" # strip off non-digit suffix from s1
echo "$s2" # 12345
Campa
,Oct 21, 2016 at 8:12
I love
sed
's capability to deal with regex groups:
> var="someletters_12345_moreletters.ext"
> digits=$( echo $var | sed "s/.*_\([0-9]\+\).*/\1/p" -n )
> echo $digits
12345
A slightly more general option would be
not
to assume that you have an
underscore
_
marking the start of your digits sequence, hence for instance stripping
off all non-numbers you get before your sequence:
s/[^0-9]\+\([0-9]\+\).*/\1/p
.
> man sed | grep s/regexp/replacement -A 2
s/regexp/replacement/
Attempt to match regexp against the pattern space. If successful, replace that portion matched with replacement. The replacement may contain the special character & to
refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.
More on this, in case you're not too confident with regexps:
- s is for _s_ubstitute
- [0-9]+ matches 1+ digits
- \1 links to the group n.1 of the regex output (group 0 is the whole match,
group 1 is the match within parentheses in this case)
- p flag is for _p_rinting
All escapes
\
are there to make
sed
's regexp processing work.
Dan
Dascalescu
,May 8 at 18:28
Given test.txt is a file containing "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
cut -b19-20 test.txt > test1.txt # This will extract chars 19 & 20 "ST"
while read -r; do
> x=$REPLY
> done < test1.txt
echo $x
ST
Alex Raj
Kaliamoorthy
,Jul 29, 2016 at 7:41
My answer gives you more control over what you get out of your string. Here is how you
can extract
12345
out of your string
str="someletters_12345_moreleters.ext"
str=${str#*_}
str=${str%_more*}
echo $str
This approach is more useful if you want to extract something that contains arbitrary chars like
abc
or any special characters like
_
or
-
. For example: If your string is
like this and you want everything that is after
someletters_
and before
_moreleters.ext
:
str="someletters_123-45-24a&13b-1_moreleters.ext"
With my code you can specify exactly what you want. Explanation:
#* removes the shortest leading portion of the string up to and including the matching key. Here the key
we mentioned is _
% removes the shortest trailing portion of the string starting at the
matching key. Here the key we mentioned is '_more*'
Do some experiments yourself and you will find this interesting.
Dan
Dascalescu
,May 8 at 18:27
similar to substr('abcdefg', 2-1, 3) in php:
echo 'abcdefg'|tail -c +2|head -c 3
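The same slice can be taken without a pipeline using substring expansion; note that the pipeline version counts from byte 1 while ${var:offset} counts from 0:

```shell
s=abcdefg
echo "${s:1:3}"                      # bcd : zero-based offset 1, length 3
echo "$s" | tail -c +2 | head -c 3   # bcd : tail -c +2 starts at byte 2
echo                                 # head -c emits no trailing newline
```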
olibre
,Nov 25, 2015 at 14:50
Ok, here goes pure Parameter Substitution with an empty string. Caveat is that I have defined
someletters
and
moreletters
as only characters. If they are
alphanumeric, this will not work as it is.
shopt -s extglob # the @( ) and +( ) patterns below require extglob
filename=someletters_12345_moreletters.ext
substring=${filename//@(+([a-z])_|_+([a-z]).*)}
echo $substring
12345
gniourf_gniourf
,Jun 4 at 17:33
There's also the 'expr' command (an external utility, not a bash builtin):
INPUT="someletters_12345_moreleters.ext"
SUBSTRING=`expr match "$INPUT" '.*_\([[:digit:]]*\)_.*' `
echo $SUBSTRING
russell
,Aug 1, 2013 at 8:12
A little late, but I just ran across this problem and found the following:
host:/tmp$ asd=someletters_12345_moreleters.ext
host:/tmp$ echo `expr $asd : '.*_\(.*\)_'`
12345
host:/tmp$
I used it to get millisecond resolution on an embedded system that does not have %N for date:
set `grep "now at" /proc/timer_list`
nano=$3
fraction=`expr $nano : '.*\(...\)......'`
$debug nano is $nano, fraction is $fraction
>
,Aug 5, 2018 at 17:13
A bash solution:
IFS="_" read -r x digs x <<<'someletters_12345_moreleters.ext'
This will clobber a variable called
x
. The var
x
could be changed to
the var
_
.
input='someletters_12345_moreleters.ext'
IFS="_" read -r _ digs _ <<<"$input"
Mark
Byers ,Apr 25, 2010 at 19:20
Can anyone recommend a safe solution to recursively replace spaces with underscores in file
and directory names starting from a given root directory? For example:
$ tree
.
|-- a dir
| `-- file with spaces.txt
`-- b dir
|-- another file with spaces.txt
`-- yet another file with spaces.pdf
becomes:
$ tree
.
|-- a_dir
| `-- file_with_spaces.txt
`-- b_dir
|-- another_file_with_spaces.txt
`-- yet_another_file_with_spaces.pdf
Jürgen
Hötzel ,Nov 4, 2015 at 3:03
Use rename
(aka prename
) which is a Perl script
which may be on
your system already. Do it in two steps:
find -name "* *" -type d | rename 's/ /_/g' # do the directories first
find -name "* *" -type f | rename 's/ /_/g'
Based on Jürgen's answer and able to handle multiple layers of files and directories
in a single bound using the "Revision 1.5 1998/12/18 16:16:31 rmb1" version of
/usr/bin/rename
(a Perl script):
find /tmp/ -depth -name "* *" -execdir rename 's/ /_/g' "{}" \;
oevna ,Jan
1, 2016 at 8:25
I use:
for f in *\ *; do mv "$f" "${f// /_}"; done
Though it's not recursive, it's quite fast and simple. I'm sure someone here could update
it to be recursive.
The ${f// /_}
part utilizes bash's parameter expansion mechanism to replace a
pattern within a parameter with supplied string. The relevant syntax is
${parameter/pattern/string}
. See: https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html
or http://wiki.bash-hackers.org/syntax/pe .
armandino ,Dec 3, 2013 at 20:51
find . -depth -name '* *' \
| while IFS= read -r f ; do mv -i "$f" "$(dirname "$f")/$(basename "$f"|tr ' ' _)" ; done
I failed to get it right at first, because I didn't think of directories.
Edmund
Elmer ,Jul 3 at 7:12
You can use detox
by Doug Harple
detox -r <folder>
Dennis
Williamson ,Mar 22, 2012 at 20:33
A find/rename solution. rename is part of util-linux.
You need to descend depth first, because a whitespace filename can be part of a whitespace
directory:
find /tmp/ -depth -name "* *" -execdir rename " " "_" "{}" ";"
armandino ,Apr 26, 2010 at 11:49
bash 4.0
#!/bin/bash
shopt -s globstar
for file in **/*\ *
do
mv "$file" "${file// /_}"
done
Itamar
,Jan 31, 2013 at 21:27
you can use this:
find . -name '* *' | while read -r fname
do
new_fname=`echo "$fname" | tr " " "_"`
if [ -e "$new_fname" ]
then
echo "File $new_fname already exists. Not replacing $fname"
else
echo "Creating new file $new_fname to replace $fname"
mv "$fname" "$new_fname"
fi
done
yabt ,Apr
26, 2010 at 14:54
Here's a (quite verbose) find -exec solution which writes "file already exists" warnings to
stderr:
function trspace() {
declare dir name bname dname newname replace_char
[ $# -lt 1 -o $# -gt 2 ] && { echo "usage: trspace dir char"; return 1; }
dir="${1}"
replace_char="${2:-_}"
find "${dir}" -xdev -depth -name $'*[ \t\r\n\v\f]*' -exec bash -c '
for ((i=1; i<=$#; i++)); do
name="${@:i:1}"
dname="${name%/*}"
bname="${name##*/}"
newname="${dname}/${bname//[[:space:]]/${0}}"
if [[ -e "${newname}" ]]; then
echo "Warning: file already exists: ${newname}" 1>&2
else
mv "${name}" "${newname}"
fi
done
' "${replace_char}" '{}' +
}
trspace rootdir _
degi ,Aug
8, 2011 at 9:10
This one does a little bit more. I use it to rename my downloaded torrents (no special
characters (non-ASCII), spaces, multiple dots, etc.).
#!/usr/bin/perl
&rena(`find . -type d`);
&rena(`find . -type f`);
sub rena
{
($elems)=@_;
@t=split /\n/,$elems;
for $e (@t)
{
$_=$e;
# remove ./ of find
s/^\.\///;
# non ascii transliterate
tr [\200-\377][_];
tr [\000-\40][_];
# special characters we do not want in paths
s/[ \-\,\;\?\+\'\"\!\[\]\(\)\@\#]/_/g;
# multiple dots except for extension
while (/\..*\./)
{
s/\./_/;
}
# only one _ consecutive
s/_+/_/g;
next if ($_ eq $e ) or ("./$_" eq $e);
print "$e -> $_\n";
rename ($e,$_);
}
}
Junyeop
Lee ,Apr 10, 2018 at 9:44
Recursive version of Naidim's answer.
find . -name "* *" | awk '{ print length, $0 }' | sort -nr -s | cut -d" " -f2- | while read f; do base=$(basename "$f"); newbase="${base// /_}"; mv "$(dirname "$f")/$(basename "$f")" "$(dirname "$f")/$newbase"; done
ghoti
,Dec 5, 2016 at 21:16
I found this script somewhere; it may be interesting :)
IFS=$'\n';for f in `find .`; do file=$(echo $f | tr [:blank:] '_'); [ -e $f ] && [ ! -e $file ] && mv "$f" $file;done;unset IFS
ghoti
,Dec 5, 2016 at 21:17
Here's a reasonably sized bash script solution
#!/bin/bash
(
IFS=$'\n'
for y in $(ls $1)
do
mv "$1/$y" "$1/`echo "$y" | sed 's/ /_/g'`"
done
)
user1060059 ,Nov 22, 2011 at
15:15
This only finds files inside the current directory and renames them. I have this aliased.
find ./ -maxdepth 1 -name "* *" -type f | perl -ple '$file = $_; $file =~ s/\s+/_/g;
rename($_, $file);'
Hongtao
,Sep 26, 2014 at 19:30
I just made one for my own purpose. You may use it as a reference.
#!/bin/bash
cd /vzwhome/c0cheh1/dev_source/UB_14_8
for file in *
do
echo $file
cd "/vzwhome/c0cheh1/dev_source/UB_14_8/$file/Configuration/$file"
echo "==> `pwd`"
for subfile in *\ *; do [ -d "$subfile" ] && ( mv "$subfile" "$(echo $subfile | sed -e 's/ /_/g')" ); done
ls
cd /vzwhome/c0cheh1/dev_source/UB_14_8
done
Marcos Jean Sampaio ,Dec
5, 2016 at 20:56
For files in folder named /files
for i in `IFS=""; find /files -name "* *"`
do
echo $i
done > /tmp/list
while read line
do
mv "$line" `echo $line | sed 's/ /_/g'`
done < /tmp/list
rm /tmp/list
Muhammad Annaqeeb ,Sep 4,
2017 at 11:03
For those struggling through this using macOS, first install all the tools:
brew install tree findutils rename
Then when needed to rename, make an alias for GNU find (gfind) as find. Then run the code
of @Michel Krelin:
alias find=gfind
find . -depth -name '* *' \
| while IFS= read -r f ; do mv -i "$f" "$(dirname "$f")/$(basename "$f"|tr ' ' _)" ; done
Concatenating
Strings with the += Operator
Another way of concatenating strings in bash is by appending variables or literal strings to
a variable using the +=
operator:
VAR1="Hello,"
VAR1+=" World"
echo "$VAR1"
Hello, World
The following example uses the += operator to concatenate strings in a bash for loop:
languages.sh
VAR=""
for ELEMENT in 'Hydrogen' 'Helium' 'Lithium' 'Beryllium'; do
VAR+="${ELEMENT} "
done
echo "$VAR"
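One caveat worth knowing: += appends text only for ordinary variables. If the variable was declared with the integer attribute, += performs arithmetic addition instead:

```shell
STR="Hello,"
STR+=" World"
echo "$STR"      # Hello, World : plain string append

declare -i NUM=10
NUM+=5
echo "$NUM"      # 15 : arithmetic addition, not the string "105"
```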
Lgn ,May 14, 2012 at 15:15
In a Bash script I would like to split a line into pieces and store them in an array.
The line:
Paris, France, Europe
I would like to have them in an array like this:
array[0] = Paris
array[1] = France
array[2] = Europe
I would like to use simple code, the command's speed doesn't matter. How can I do it?
antak ,Jun 18, 2018 at 9:22
This is #1 Google hit but there's controversy in the answer because the question unfortunately asks about delimiting on
,
(comma-space) and not a single character such as comma. If you're only interested in the latter, answers here
are easier to follow:
stackoverflow.com/questions/918886/
– antak
Jun 18 '18 at 9:22
Dennis Williamson ,May 14, 2012 at
15:16
IFS=', ' read -r -a array <<< "$string"
Note that the characters in $IFS
are treated individually as separators so that in this case fields may be separated
by either a comma or a space rather than the sequence of the two characters. Interestingly though, empty fields aren't
created when comma-space appears in the input because the space is treated specially.
To access an individual element:
echo "${array[0]}"
To iterate over the elements:
for element in "${array[@]}"
do
echo "$element"
done
To get both the index and the value:
for index in "${!array[@]}"
do
echo "$index ${array[index]}"
done
The last example is useful because Bash arrays are sparse. In other words, you can delete an element or add an element and
then the indices are not contiguous.
unset "array[1]"
array[42]=Earth
To get the number of elements in an array:
echo "${#array[@]}"
As mentioned above, arrays can be sparse so you shouldn't use the length to get the last element. Here's how you can do it in Bash
4.2 and later:
echo "${array[-1]}"
and in any version of Bash (from somewhere after 2.05b):
echo "${array[@]: -1:1}"
Larger negative offsets select farther from the end of the array. Note the space before the minus sign in the older form. It
is required.
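Putting the two last-element forms side by side, including what happens after an element is deleted:

```shell
array=(Paris France Europe)
echo "${array[-1]}"        # Europe : bash 4.2 and later
echo "${array[@]: -1:1}"   # Europe : older form; the space before - is required

unset "array[1]"           # the array is now sparse (indices 0 and 2)
echo "${#array[@]}"        # 2
echo "${array[-1]}"        # Europe : still the last element
```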
l0b0 ,May 14, 2012 at 15:24
Just use IFS=', '
, then you don't have to remove the spaces separately. Test: IFS=', ' read -a array <<< "Paris,
France, Europe"; echo "${array[@]}"
– l0b0
May 14 '12 at 15:24
Dennis Williamson ,May 14, 2012 at
16:33
@l0b0: Thanks. I don't know what I was thinking. I like to use declare -p array
for test output, by the way. –
Dennis Williamson
May 14 '12 at 16:33
Nathan Hyde ,Mar 16, 2013 at 21:09
@Dennis Williamson - Awesome, thorough answer. –
Nathan Hyde
Mar 16 '13 at 21:09
dsummersl ,Aug 9, 2013 at 14:06
MUCH better than multiple cut -f
calls! –
dsummersl
Aug 9 '13 at 14:06
caesarsol ,Oct 29, 2015 at 14:45
Warning: the IFS variable means split by one of these characters , so it's not a sequence of chars to split by. IFS=',
' read -a array <<< "a,d r s,w"
=> ${array[*]} == a d r s w
–
caesarsol
Oct 29 '15 at 14:45
Jim Ho ,Mar 14, 2013 at 2:20
Here is a way without setting IFS:
string="1:2:3:4:5"
set -f # avoid globbing (expansion of *).
array=(${string//:/ })
for i in "${!array[@]}"
do
echo "$i=>${array[i]}"
done
The idea is using string replacement:
${string//substring/replacement}
to replace all matches of $substring with white space and then using the substituted string to initialize a array:
(element1 element2 ... elementN)
Note: this answer makes use of the split+glob operator
. Thus, to prevent expansion of some characters (such as *
) it is a good idea to pause globbing for this script.
Werner Lehmann ,May 4, 2013 at 22:32
Used this approach... until I came across a long string to split. 100% CPU for more than a minute (then I killed it). It's a pity
because this method allows to split by a string, not some character in IFS. –
Werner Lehmann
May 4 '13 at 22:32
Dieter Gribnitz ,Sep 2, 2014 at 15:46
WARNING: Just ran into a problem with this approach. If you have an element named * you will get all the elements of your cwd
as well. thus string="1:2:3:4:*" will give some unexpected and possibly dangerous results depending on your implementation. Did
not get the same error with (IFS=', ' read -a array <<< "$string") and this one seems safe to use. –
Dieter Gribnitz
Sep 2 '14 at 15:46
akostadinov ,Nov 6, 2014 at 14:31
not reliable for many kinds of values, use with care –
akostadinov
Nov 6 '14 at 14:31
Andrew White ,Jun 1, 2016 at 11:44
quoting ${string//:/ }
prevents shell expansion –
Andrew White
Jun 1 '16 at 11:44
Mark Thomson ,Jun 5, 2016 at 20:44
I had to use the following on OSX: array=(${string//:/ })
–
Mark Thomson
Jun 5 '16 at 20:44
bgoldst ,Jul 19, 2017 at 21:20
All of the answers to this question are wrong in one way or another.
Wrong answer #1
IFS=', ' read -r -a array <<< "$string"
1: This is a misuse of $IFS
. The value of the $IFS
variable is not taken as a single variable-length
string separator, rather it is taken as a set of single-character string separators, where each field that
read
splits off from the input line can be terminated by any character in the set (comma or space, in this example).
Actually, for the real sticklers out there, the full meaning of $IFS
is slightly more involved. From the
bash manual
:
The shell treats each character of IFS as a delimiter, and splits the results of the other expansions into words using these
characters as field terminators. If IFS is unset, or its value is exactly <space><tab><newline> , the default, then sequences
of <space> , <tab> , and <newline> at the beginning and end of the results of the previous expansions are ignored, and any
sequence of IFS characters not at the beginning or end serves to delimit words. If IFS has a value other than the default,
then sequences of the whitespace characters <space> , <tab> , and <newline> are ignored at the beginning and end of the word,
as long as the whitespace character is in the value of IFS (an IFS whitespace character). Any character in IFS that is not
IFS whitespace, along with any adjacent IFS whitespace characters, delimits a field. A sequence of IFS whitespace characters
is also treated as a delimiter. If the value of IFS is null, no word splitting occurs.
Basically, for non-default non-null values of $IFS
, fields can be separated with either (1) a sequence of one
or more characters that are all from the set of "IFS whitespace characters" (that is, whichever of <space> , <tab> , and <newline>
("newline" meaning line feed (LF) ) are present anywhere in
$IFS
), or (2) any non-"IFS whitespace character" that's present in $IFS
along with whatever "IFS whitespace
characters" surround it in the input line.
For the OP, it's possible that the second separation mode I described in the previous paragraph is exactly what he wants for
his input string, but we can be pretty confident that the first separation mode I described is not correct at all. For example,
what if his input string was 'Los Angeles, United States, North America'
?
IFS=', ' read -ra a <<<'Los Angeles, United States, North America'; declare -p a;
## declare -a a=([0]="Los" [1]="Angeles" [2]="United" [3]="States" [4]="North" [5]="America")
2: Even if you were to use this solution with a single-character separator (such as a comma by itself, that is, with no following
space or other baggage), if the value of the $string
variable happens to contain any LFs, then read
will stop processing once it encounters the first LF. The read
builtin only processes one line per invocation. This
is true even if you are piping or redirecting input only to the read
statement, as we are doing in this example
with the here-string
mechanism, and thus unprocessed input is guaranteed to be lost. The code that powers the read
builtin has no knowledge
of the data flow within its containing command structure.
You could argue that this is unlikely to cause a problem, but still, it's a subtle hazard that should be avoided if possible.
It is caused by the fact that the read
builtin actually does two levels of input splitting: first into lines, then
into fields. Since the OP only wants one level of splitting, this usage of the read
builtin is not appropriate, and
we should avoid it.
3: A non-obvious potential issue with this solution is that read
always drops the trailing field if it is empty,
although it preserves empty fields otherwise. Here's a demo:
string=', , a, , b, c, , , '; IFS=', ' read -ra a <<<"$string"; declare -p a;
## declare -a a=([0]="" [1]="" [2]="a" [3]="" [4]="b" [5]="c" [6]="" [7]="")
Maybe the OP wouldn't care about this, but it's still a limitation worth knowing about. It reduces the robustness and generality
of the solution.
This problem can be solved by appending a dummy trailing delimiter to the input string just prior to feeding it to read
, as I will demonstrate later.
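A sketch of that trailing-delimiter workaround, using a single-character delimiter for clarity:

```shell
string='a,b,c,'
IFS=',' read -ra parts <<<"$string"
echo "${#parts[@]}"      # 3 : the trailing empty field was dropped

# Append one dummy delimiter so the trailing empty field survives:
IFS=',' read -ra parts <<<"$string,"
echo "${#parts[@]}"      # 4 : the last element is the empty string
```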
Wrong answer #2
string="1:2:3:4:5"
set -f # avoid globbing (expansion of *).
array=(${string//:/ })
Similar idea:
t="one,two,three"
a=($(echo $t | tr ',' "\n"))
(Note: I added the missing parentheses around the command substitution which the answerer seems to have omitted.)
Similar idea:
string="1,2,3,4"
array=(`echo $string | sed 's/,/\n/g'`)
These solutions leverage word splitting in an array assignment to split the string into fields. Funnily enough, just like
read
, general word splitting also uses the $IFS
special variable, although in this case it is implied
that it is set to its default value of <space><tab><newline> , and therefore any sequence of one or more IFS characters (which
are all whitespace characters now) is considered to be a field delimiter.
This solves the problem of two levels of splitting committed by read
, since word splitting by itself constitutes
only one level of splitting. But just as before, the problem here is that the individual fields in the input string can already
contain $IFS
characters, and thus they would be improperly split during the word splitting operation. This happens
to not be the case for any of the sample input strings provided by these answerers (how convenient...), but of course that doesn't
change the fact that any code base that used this idiom would then run the risk of blowing up if this assumption were ever violated
at some point down the line. Once again, consider my counterexample of 'Los Angeles, United States, North America'
(or 'Los Angeles:United States:North America'
).
Also, word splitting is normally followed by
filename
expansion ( aka pathname expansion aka globbing), which, if done, would potentially corrupt words containing
the characters *
, ?
, or [
followed by ]
(and, if extglob
is
set, parenthesized fragments preceded by ?
, *
, +
, @
, or !
) by matching them against file system objects and expanding the words ("globs") accordingly. The first of these three answerers
has cleverly undercut this problem by running set -f
beforehand to disable globbing. Technically this works (although
you should probably add set +f
afterward to reenable globbing for subsequent code which may depend on it), but it's
undesirable to have to mess with global shell settings in order to hack a basic string-to-array parsing operation in local code.
Another issue with this answer is that all empty fields will be lost. This may or may not be a problem, depending on the application.
Note: If you're going to use this solution, it's better to use the ${string//:/ }
"pattern substitution" form
of
parameter expansion , rather than going to the trouble of invoking a command substitution (which forks the shell), starting
up a pipeline, and running an external executable ( tr
or sed
), since parameter expansion is purely
a shell-internal operation. (Also, for the tr
and sed
solutions, the input variable should be double-quoted
inside the command substitution; otherwise word splitting would take effect in the echo
command and potentially mess
with the field values. Also, the $(...)
form of command substitution is preferable to the old `...`
form since it simplifies nesting of command substitutions and allows for better syntax highlighting by text editors.)
Wrong answer #3
str="a, b, c, d" # assuming there is a space after ',' as in Q
arr=(${str//,/}) # delete all occurrences of ','
This answer is almost the same as #2 . The difference is that the answerer has made the assumption that the fields are delimited
by two characters, one of which being represented in the default $IFS
, and the other not. He has solved this rather
specific case by removing the non-IFS-represented character using a pattern substitution expansion and then using word splitting
to split the fields on the surviving IFS-represented delimiter character.
This is not a very generic solution. Furthermore, it can be argued that the comma is really the "primary" delimiter character
here, and that stripping it and then depending on the space character for field splitting is simply wrong. Once again, consider
my counterexample: 'Los Angeles, United States, North America'
.
Also, again, filename expansion could corrupt the expanded words, but this can be prevented by temporarily disabling globbing
for the assignment with set -f
and then set +f
.
Also, again, all empty fields will be lost, which may or may not be a problem depending on the application.
Wrong answer #4
string='first line
second line
third line'
oldIFS="$IFS"
IFS='
'
IFS=${IFS:0:1} # this is useful to format your code with tabs
lines=( $string )
IFS="$oldIFS"
This is similar to #2 and #3 in that it uses word splitting to get the job done, only now the code explicitly sets $IFS
to contain only the single-character field delimiter present in the input string. It should be repeated that this cannot work
for multicharacter field delimiters such as the OP's comma-space delimiter. But for a single-character delimiter like the LF used
in this example, it actually comes close to being perfect. The fields cannot be unintentionally split in the middle as we saw
with previous wrong answers, and there is only one level of splitting, as required.
One problem is that filename expansion will corrupt affected words as described earlier, although once again this can be solved
by wrapping the critical statement in set -f
and set +f
.
Another potential problem is that, since LF qualifies as an "IFS whitespace character" as defined earlier, all empty fields
will be lost, just as in #2 and #3 . This would of course not be a problem if the delimiter happens to be a non-"IFS whitespace
character", and depending on the application it may not matter anyway, but it does vitiate the generality of the solution.
So, to sum up, assuming you have a one-character delimiter, and it is either a non-"IFS whitespace character" or you don't
care about empty fields, and you wrap the critical statement in set -f
and set +f
, then this solution
works, but otherwise not.
(Also, for information's sake, assigning a LF to a variable in bash can be done more easily with the $'...'
syntax,
e.g. IFS=$'\n';
.)
Wrong answer #5
countries='Paris, France, Europe'
OIFS="$IFS"
IFS=', ' array=($countries)
IFS="$OIFS"
Similar idea:
IFS=', ' eval 'array=($string)'
This solution is effectively a cross between #1 (in that it sets $IFS
to comma-space) and #2-4 (in that it uses
word splitting to split the string into fields). Because of this, it suffers from most of the problems that afflict all of the
above wrong answers, sort of like the worst of all worlds.
Also, regarding the second variant, it may seem like the eval
call is completely unnecessary, since its argument
is a single-quoted string literal, and therefore is statically known. But there's actually a very non-obvious benefit to using
eval
in this way. Normally, when you run a simple command which consists of a variable assignment only , meaning
without an actual command word following it, the assignment takes effect in the shell environment:
IFS=', '; ## changes $IFS in the shell environment
This is true even if the simple command involves multiple variable assignments; again, as long as there's no command
word, all variable assignments affect the shell environment:
IFS=', ' array=($countries); ## changes both $IFS and $array in the shell environment
But, if the variable assignment is attached to a command name (I like to call this a "prefix assignment") then it does not
affect the shell environment, and instead only affects the environment of the executed command, regardless whether it is a builtin
or external:
IFS=', ' :; ## : is a builtin command, the $IFS assignment does not outlive it
IFS=', ' env; ## env is an external command, the $IFS assignment does not outlive it
Relevant quote from the
bash manual :
If no command name results, the variable assignments affect the current shell environment. Otherwise, the variables are
added to the environment of the executed command and do not affect the current shell environment.
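It is easy to verify the manual's rule directly; here true serves as an arbitrary command word (any command would do):

```shell
saved=$IFS

# Prefix assignment: attached to a command word, so it does not persist.
IFS=', ' true
[ "$IFS" = "$saved" ] && echo "prefix assignment did not persist"

# Bare assignment: no command word, so it changes the shell environment.
IFS=', '
[ "$IFS" = ', ' ] && echo "bare assignment persisted"

IFS=$saved   # restore the original value
```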
It is possible to exploit this feature of variable assignment to change $IFS
only temporarily, which allows us
to avoid the whole save-and-restore gambit like that which is being done with the $OIFS
variable in the first variant.
But the challenge we face here is that the command we need to run is itself a mere variable assignment, and hence it would not
involve a command word to make the $IFS
assignment temporary. You might think to yourself, well why not just add
a no-op command word to the statement like the
: builtin
to make the $IFS
assignment temporary? This does not work because it would then make the
$array
assignment temporary as well:
IFS=', ' array=($countries) :; ## fails; new $array value never escapes the : command
So, we're effectively at an impasse, a bit of a catch-22. But, when eval
runs its code, it runs it in the shell
environment, as if it was normal, static source code, and therefore we can run the $array
assignment inside the
eval
argument to have it take effect in the shell environment, while the $IFS
prefix assignment that
is prefixed to the eval
command will not outlive the eval
command. This is exactly the trick that is
being used in the second variant of this solution:
IFS=', ' eval 'array=($string)'; ## $IFS does not outlive the eval command, but $array does
So, as you can see, it's actually quite a clever trick, and accomplishes exactly what is required (at least with respect to
which assignments take effect, and where) in a rather non-obvious way. I'm actually not against this trick in general, despite the involvement of
eval
; just be careful to single-quote the argument string to guard against security threats.
But again, because of the "worst of all worlds" agglomeration of problems, this is still a wrong answer to the OP's requirement.
Wrong answer #6
IFS=', '; array=(Paris, France, Europe)
IFS=' ';declare -a array=(Paris France Europe)
Um... what? The OP has a string variable that needs to be parsed into an array. This "answer" starts with the verbatim contents
of the input string pasted into an array literal. I guess that's one way to do it.
It looks like the answerer may have assumed that the $IFS
variable affects all bash parsing in all contexts, which
is not true. From the bash manual:
IFS The Internal Field Separator that is used for word splitting after expansion and to split lines into words with the
read builtin command. The default value is <space><tab><newline> .
So the $IFS
special variable is actually only used in two contexts: (1) word splitting that is performed after
expansion (meaning not when parsing bash source code) and (2) for splitting input lines into words by the read
builtin.
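The read context is easy to demonstrate; a prefix assignment keeps the $IFS change local to the read call itself (the /etc/passwd-style sample line is invented for illustration):

```shell
# Split a colon-delimited record into named fields with read.
IFS=: read -r user pass uid gid <<< 'root:x:0:0'
echo "$user has uid $uid"
## root has uid 0
```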
Let me try to make this clearer. I think it might be good to draw a distinction between parsing and execution
. Bash must first parse the source code, which obviously is a parsing event, and then later it executes the
code, which is when expansion comes into the picture. Expansion is really an execution event. Furthermore, I take issue
with the description of the $IFS
variable that I just quoted above; rather than saying that word splitting is performed
after expansion , I would say that word splitting is performed during expansion, or, perhaps even more precisely,
word splitting is part of the expansion process. The phrase "word splitting" refers only to this step of expansion; it
should never be used to refer to the parsing of bash source code, although unfortunately the docs do seem to throw around the
words "split" and "words" a lot. Here's a relevant excerpt from the
linux.die.net version of the bash manual:
Expansion is performed on the command line after it has been split into words. There are seven kinds of expansion performed:
brace expansion , tilde expansion , parameter and variable expansion , command substitution ,
arithmetic expansion , word splitting , and pathname expansion .
The order of expansions is: brace expansion; tilde expansion, parameter and variable expansion, arithmetic expansion, and
command substitution (done in a left-to-right fashion); word splitting; and pathname expansion.
You could argue the
GNU version
of the manual does slightly better, since it opts for the word "tokens" instead of "words" in the first sentence of the Expansion
section:
Expansion is performed on the command line after it has been split into tokens.
The important point is, $IFS
does not change the way bash parses source code. Parsing of bash source code is actually
a very complex process that involves recognition of the various elements of shell grammar, such as command sequences, command
lists, pipelines, parameter expansions, arithmetic substitutions, and command substitutions. For the most part, the bash parsing
process cannot be altered by user-level actions like variable assignments (actually, there are some minor exceptions to this rule;
for example, see the various
compatxx
shell settings , which can change certain aspects of parsing behavior on-the-fly). The upstream "words"/"tokens" that result
from this complex parsing process are then expanded according to the general process of "expansion" as broken down in the above
documentation excerpts, where word splitting of the expanded (expanding?) text into downstream words is simply one step of that
process. Word splitting only touches text that has been spit out of a preceding expansion step; it does not affect literal text
that was parsed right off the source bytestream.
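A small experiment makes the distinction concrete: changing $IFS has no effect on how literal source text is parsed, but it does change how the result of an expansion is split:

```shell
IFS=x   # make "x" the only separator character

# Literal source text: parsing is unaffected, "axb" remains one word.
printf '<%s>\n' axb
## <axb>

# Expanded text: word splitting kicks in, and the same text splits on "x".
v=axb
printf '<%s>\n' $v
## <a>
## <b>

unset IFS   # back to default splitting behavior
```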
Wrong answer #7
string='first line
second line
third line'
while read -r line; do lines+=("$line"); done <<<"$string"
This is one of the best solutions. Notice that we're back to using read
. Didn't I say earlier that read
is inappropriate because it performs two levels of splitting, when we only need one? The trick here is that you can call
read
in such a way that it effectively only does one level of splitting, specifically by splitting off only one field per
invocation, which necessitates the cost of having to call it repeatedly in a loop. It's a bit of a sleight of hand, but it works.
But there are problems. First: When you provide at least one NAME argument to read
, it automatically ignores
leading and trailing whitespace in each field that is split off from the input string. This occurs whether $IFS
is
set to its default value or not, as described earlier in this post. Now, the OP may not care about this for his specific use-case,
and in fact, it may be a desirable feature of the parsing behavior. But not everyone who wants to parse a string into fields will
want this. There is a solution, however: A somewhat non-obvious usage of read
is to pass zero NAME arguments.
In this case, read
will store the entire input line that it gets from the input stream in a variable named
$REPLY
, and, as a bonus, it does not strip leading and trailing whitespace from the value. This is a very robust
usage of read
which I've exploited frequently in my shell programming career. Here's a demonstration of the difference
in behavior:
string=$' a b \n c d \n e f '; ## input string
a=(); while read -r line; do a+=("$line"); done <<<"$string"; declare -p a;
## declare -a a=([0]="a b" [1]="c d" [2]="e f") ## read trimmed surrounding whitespace
a=(); while read -r; do a+=("$REPLY"); done <<<"$string"; declare -p a;
## declare -a a=([0]=" a b " [1]=" c d " [2]=" e f ") ## no trimming
The second issue with this solution is that it does not actually address the case of a custom field separator, such as the
OP's comma-space. As before, multicharacter separators are not supported, which is an unfortunate limitation of this solution.
We could try to at least split on comma by specifying the separator to the -d
option, but look what happens:
string='Paris, France, Europe';
a=(); while read -rd,; do a+=("$REPLY"); done <<<"$string"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France")
Predictably, the unaccounted surrounding whitespace got pulled into the field values, and hence this would have to be corrected
subsequently through trimming operations (this could also be done directly in the while-loop). But there's another obvious error:
Europe is missing! What happened to it? The answer is that read
returns a failing return code if it hits end-of-file
(in this case we can call it end-of-string) without encountering a final field terminator on the final field. This causes the
while-loop to break prematurely and we lose the final field.
Technically this same error afflicted the previous examples as well; the difference there is that the field separator was taken
to be LF, which is the default when you don't specify the -d
option, and the <<<
("here-string") mechanism
automatically appends a LF to the string just before it feeds it as input to the command. Hence, in those cases, we sort of
accidentally solved the problem of a dropped final field by unwittingly appending an additional dummy terminator to the input.
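The appended LF is easy to observe by counting bytes; wc -c counts every byte that reaches it:

```shell
printf %s 'abc' | wc -c    # 3 bytes: the raw string has no trailing LF
wc -c <<< 'abc'            # 4 bytes: the here-string appended an LF
```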
Let's call this solution the "dummy-terminator" solution. We can apply the dummy-terminator solution manually for any custom delimiter
by concatenating it against the input string ourselves when instantiating it in the here-string:
a=(); while read -rd,; do a+=("$REPLY"); done <<<"$string,"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=" Europe")
There, problem solved. Another solution is to only break the while-loop if both (1) read
returned failure and
(2) $REPLY
is empty, meaning read
was not able to read any characters prior to hitting end-of-file.
Demo:
a=(); while read -rd,|| [[ -n "$REPLY" ]]; do a+=("$REPLY"); done <<<"$string"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=$' Europe\n')
This approach also reveals the secretive LF that automatically gets appended to the here-string by the <<<
redirection
operator. It could of course be stripped off separately through an explicit trimming operation as described a moment ago, but
obviously the manual dummy-terminator approach solves it directly, so we could just go with that. The manual dummy-terminator
solution is actually quite convenient in that it solves both of these two problems (the dropped-final-field problem and the appended-LF
problem) in one go.
So, overall, this is quite a powerful solution. Its only remaining weakness is a lack of support for multicharacter delimiters,
which I will address later.
Wrong answer #8
string='first line
second line
third line'
readarray -t lines <<<"$string"
(This is actually from the same post as #7 ; the answerer provided two solutions in the same post.)
The readarray
builtin, which is a synonym for mapfile
, is ideal. It's a builtin command which parses
a bytestream into an array variable in one shot; no messing with loops, conditionals, substitutions, or anything else. And it
doesn't surreptitiously strip any whitespace from the input string. And (if -O
is not given) it conveniently clears
the target array before assigning to it. But it's still not perfect, hence my criticism of it as a "wrong answer".
First, just to get this out of the way, note that, just like the behavior of read
when doing field-parsing,
readarray
drops the trailing field if it is empty. Again, this is probably not a concern for the OP, but it could
be for some use-cases. I'll come back to this in a moment.
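A quick illustration of that trailing-field behavior (this needs bash 4.4+ for readarray's -d option):

```shell
# A delimiter at the very end of the input terminates the last record;
# it does not create an extra empty field...
readarray -td, arr < <(printf 'a,b,')
declare -p arr
## declare -a arr=([0]="a" [1]="b")

# ...whereas an empty field in the middle (a doubled comma) is preserved:
readarray -td, arr2 < <(printf 'a,,b,')
declare -p arr2
## declare -a arr2=([0]="a" [1]="" [2]="b")
```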
Second, as before, it does not support multicharacter delimiters. I'll give a fix for this in a moment as well.
Third, the solution as written does not parse the OP's input string, and in fact, it cannot be used as-is to parse it. I'll
expand on this momentarily as well.
For the above reasons, I still consider this to be a "wrong answer" to the OP's question. Below I'll give what I consider to
be the right answer.
Right answer
Here's a naïve attempt to make #8 work by just specifying the -d
option:
string='Paris, France, Europe';
readarray -td, a <<<"$string"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=$' Europe\n')
We see the result is identical to the result we got from the double-conditional approach of the looping read
solution
discussed in #7 . We can almost solve this with the manual dummy-terminator trick:
readarray -td, a <<<"$string,"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=" Europe" [3]=$'\n')
The problem here is that readarray
preserved the trailing field, since the <<<
redirection operator
appended the LF to the input string, and therefore the trailing field was not empty (otherwise it would've been dropped).
We can take care of this by explicitly unsetting the final array element after-the-fact:
readarray -td, a <<<"$string,"; unset 'a[-1]'; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=" Europe")
The only two problems that remain, which are actually related, are (1) the extraneous whitespace that needs to be trimmed,
and (2) the lack of support for multicharacter delimiters.
The whitespace could of course be trimmed afterward (for example, see
How to trim whitespace
from a Bash variable? ). But if we can hack a multicharacter delimiter, then that would solve both problems in one shot.
Unfortunately, there's no direct way to get a multicharacter delimiter to work. The best solution I've thought of is
to preprocess the input string to replace the multicharacter delimiter with a single-character delimiter that will be guaranteed
not to collide with the contents of the input string. The only character that has this guarantee is the
NUL byte . This is because, in bash (though not in
zsh, incidentally), variables cannot contain the NUL byte. This preprocessing step can be done inline in a process substitution.
Here's how to do it using awk :
readarray -td '' a < <(awk '{ gsub(/, /,"\0"); print; }' <<<"$string, "); unset 'a[-1]';
declare -p a;
## declare -a a=([0]="Paris" [1]="France" [2]="Europe")
There, finally! This solution will not erroneously split fields in the middle, will not cut out prematurely, will not drop
empty fields, will not corrupt itself on filename expansions, will not automatically strip leading and trailing whitespace, will
not leave a stowaway LF on the end, does not require loops, and does not settle for a single-character delimiter.
Trimming solution
Lastly, I wanted to demonstrate my own fairly intricate trimming solution using the obscure -C callback
option
of readarray
. Unfortunately, I've run out of room against Stack Overflow's draconian 30,000 character post limit,
so I won't be able to explain it. I'll leave that as an exercise for the reader.
function mfcb { local val="$4"; "$1"; eval "$2[$3]=\$val;"; };
function val_ltrim { if [[ "$val" =~ ^[[:space:]]+ ]]; then val="${val:${#BASH_REMATCH[0]}}"; fi; };
function val_rtrim { if [[ "$val" =~ [[:space:]]+$ ]]; then val="${val:0:${#val}-${#BASH_REMATCH[0]}}"; fi; };
function val_trim { val_ltrim; val_rtrim; };
readarray -c1 -C 'mfcb val_trim a' -td, <<<"$string,"; unset 'a[-1]'; declare -p a;
## declare -a a=([0]="Paris" [1]="France" [2]="Europe")
fbicknel ,Aug 18, 2017 at 15:57
It may also be helpful to note (though understandably you had no room to do so) that the -d
option to readarray
first appears in Bash 4.4. – fbicknel
Aug 18 '17 at 15:57
Cyril Duchon-Doris ,Nov 3, 2017
at 9:16
You should add a "TL;DR : scroll 3 pages to see the right solution at the end of my answer" –
Cyril Duchon-Doris
Nov 3 '17 at 9:16
dawg ,Nov 26, 2017 at 22:28
Great answer (+1). If you change your awk to awk '{ gsub(/,[ ]+|$/,"\0"); print }'
and eliminate that concatenation
of the final ", "
then you don't have to go through the gymnastics on eliminating the final record. So: readarray
-td '' a < <(awk '{ gsub(/,[ ]+/,"\0"); print; }' <<<"$string")
on Bash that supports readarray
. Note your
method is Bash 4.4+ I think because of the -d
in readarray
–
dawg
Nov 26 '17 at 22:28
datUser ,Feb 22, 2018 at 14:54
Looks like readarray
is not an available builtin on OSX. –
datUser
Feb 22 '18 at 14:54
bgoldst ,Feb 23, 2018 at 3:37
@datUser That's unfortunate. Your version of bash must be too old for readarray
. In this case, you can use the second-best
solution built on read
. I'm referring to this: a=(); while read -rd,; do a+=("$REPLY"); done <<<"$string,";
(with the awk
substitution if you need multicharacter delimiter support). Let me know if you run into any problems;
I'm pretty sure this solution should work on fairly old versions of bash, back to version 2-something, released like two decades
ago. – bgoldst
Feb 23 '18 at 3:37
Jmoney38 ,Jul 14, 2015 at 11:54
t="one,two,three"
a=($(echo "$t" | tr ',' '\n'))
echo "${a[2]}"
Prints three
shrimpwagon ,Oct 16, 2015 at 20:04
I actually prefer this approach. Simple. – shrimpwagon
Oct 16 '15 at 20:04
Ben ,Oct 31, 2015 at 3:11
I copied and pasted this and it did not work with echo, but did work when I used it in a for loop. –
Ben
Oct 31 '15 at 3:11
Pinaki Mukherjee ,Nov 9, 2015 at
20:22
This is the simplest approach. thanks – Pinaki
Mukherjee
Nov 9 '15 at 20:22
abalter ,Aug 30, 2016 at 5:13
This does not work as stated. @Jmoney38 or shrimpwagon if you can paste this in a terminal and get the desired output, please
paste the result here. – abalter
Aug 30 '16 at 5:13
leaf ,Jul 17, 2017 at 16:28
@abalter Works for me with a=($(echo $t | tr ',' "\n"))
. Same result with a=($(echo $t | tr ',' ' '))
. – leaf
Jul 17 '17 at 16:28
Luca Borrione ,Nov 2, 2012 at 13:44
Sometimes the method described in the accepted answer didn't work for me, especially if the separator is a carriage
return.
In those cases I solved it this way:
string='first line
second line
third line'
oldIFS="$IFS"
IFS='
'
IFS=${IFS:0:1} # this is useful to format your code with tabs
lines=( $string )
IFS="$oldIFS"
for line in "${lines[@]}"
do
echo "--> $line"
done
Stefan van den Akker ,Feb 9,
2015 at 16:52
+1 This completely worked for me. I needed to put multiple strings, divided by a newline, into an array, and read -a arr
<<< "$strings"
did not work with IFS=$'\n'
. –
Stefan van den Akker
Feb 9 '15 at 16:52
Stefan van den Akker ,Feb 10,
2015 at 13:49
Here is the answer to make the accepted answer work when the delimiter is a newline . –
Stefan van den Akker
Feb 10 '15 at 13:49
,Jul 24, 2015 at 21:24
The accepted answer works for values in one line.
If the variable has several lines:
string='first line
second line
third line'
We need a very different command to get all lines:
while read -r line; do lines+=("$line"); done <<<"$string"
Or the much simpler bash readarray :
readarray -t lines <<<"$string"
Printing all lines is very easy taking advantage of a printf feature:
printf ">[%s]\n" "${lines[@]}"
>[first line]
>[ second line]
>[ third line]
Mayhem ,Dec 31, 2015 at 3:13
While not every solution works for every situation, your mention of readarray... replaced my last two hours with 5 minutes...
you got my vote – Mayhem
Dec 31 '15 at 3:13
Derek 朕會功夫 ,Mar 23, 2018 at 19:14
readarray
is the right answer. – Derek
朕會功夫
Mar 23 '18 at 19:14
ssanch ,Jun 3, 2016 at 15:24
This is similar to the approach by Jmoney38, but using sed:
string="1,2,3,4"
array=(`echo $string | sed 's/,/\n/g'`)
echo ${array[0]}
Prints 1
dawg ,Nov 26, 2017 at 19:59
The key to splitting your string into an array is the multi character delimiter of ", "
. Any solution using
IFS
for multi character delimiters is inherently wrong since IFS is a set of those characters, not a string.
If you assign IFS=", "
then the string will break on EITHER ","
OR " "
or any combination
of them which is not an accurate representation of the two character delimiter of ", "
.
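A quick demonstration of that failure mode: with IFS=', ', any space inside a field is also a split point (the city name here is chosen to make that visible):

```shell
string='New York, USA'
oldIFS=$IFS
IFS=', '
arr=($string)   # splits on comma AND space individually, not on ", "
IFS=$oldIFS
declare -p arr
## declare -a arr=([0]="New" [1]="York" [2]="USA")
```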
You can use awk
or sed
to split the string, with process substitution:
#!/bin/bash
str="Paris, France, Europe"
array=()
while read -r -d $'\0' each; do # use a NUL terminated field separator
array+=("$each")
done < <(printf "%s" "$str" | awk '{ gsub(/,[ ]+|$/,"\0"); print }')
declare -p array
# declare -a array=([0]="Paris" [1]="France" [2]="Europe") output
It is more efficient to use a regex directly in Bash:
#!/bin/bash
str="Paris, France, Europe"
array=()
while [[ $str =~ ([^,]+)(,[ ]+|$) ]]; do
array+=("${BASH_REMATCH[1]}") # capture the field
i=${#BASH_REMATCH} # length of field + delimiter
str=${str:i} # advance the string by that length
done # the loop deletes $str, so make a copy if needed
declare -p array
# declare -a array=([0]="Paris" [1]="France" [2]="Europe") output...
With the second form, there is no subshell and it will be inherently faster.
Edit by bgoldst: Here are some benchmarks comparing my readarray
solution to dawg's regex solution, and I also
included the read
solution for the heck of it (note: I slightly modified the regex solution for greater harmony with
my solution) (also see my comments below the post):
## competitors
function c_readarray { readarray -td '' a < <(awk '{ gsub(/, /,"\0"); print; };' <<<"$1, "); unset 'a[-1]'; };
function c_read { a=(); local REPLY=''; while read -r -d ''; do a+=("$REPLY"); done < <(awk '{ gsub(/, /,"\0"); print; };' <<<"$1, "); };
function c_regex { a=(); local s="$1, "; while [[ $s =~ ([^,]+),\ ]]; do a+=("${BASH_REMATCH[1]}"); s=${s:${#BASH_REMATCH}}; done; };
## helper functions
function rep {
local -i i=-1;
for ((i = 0; i<$1; ++i)); do
printf %s "$2";
done;
}; ## end rep()
function testAll {
local funcs=();
local args=();
local func='';
local -i rc=-1;
while [[ "$1" != ':' ]]; do
func="$1";
if [[ ! "$func" =~ ^[_a-zA-Z][_a-zA-Z0-9]*$ ]]; then
echo "bad function name: $func" >&2;
return 2;
fi;
funcs+=("$func");
shift;
done;
shift;
args=("$@");
for func in "${funcs[@]}"; do
echo -n "$func ";
{ time $func "${args[@]}" >/dev/null 2>&1; } 2>&1| tr '\n' '/';
rc=${PIPESTATUS[0]}; if [[ $rc -ne 0 ]]; then echo "[$rc]"; else echo; fi;
done| column -ts/;
}; ## end testAll()
function makeStringToSplit {
local -i n=$1; ## number of fields
if [[ $n -lt 0 ]]; then echo "bad field count: $n" >&2; return 2; fi;
if [[ $n -eq 0 ]]; then
echo;
elif [[ $n -eq 1 ]]; then
echo 'first field';
elif [[ "$n" -eq 2 ]]; then
echo 'first field, last field';
else
echo "first field, $(rep $[$1-2] 'mid field, ')last field";
fi;
}; ## end makeStringToSplit()
function testAll_splitIntoArray {
local -i n=$1; ## number of fields in input string
local s='';
echo "===== $n field$(if [[ $n -ne 1 ]]; then echo 's'; fi;) =====";
s="$(makeStringToSplit "$n")";
testAll c_readarray c_read c_regex : "$s";
}; ## end testAll_splitIntoArray()
## results
testAll_splitIntoArray 1;
## ===== 1 field =====
## c_readarray real 0m0.067s user 0m0.000s sys 0m0.000s
## c_read real 0m0.064s user 0m0.000s sys 0m0.000s
## c_regex real 0m0.000s user 0m0.000s sys 0m0.000s
##
testAll_splitIntoArray 10;
## ===== 10 fields =====
## c_readarray real 0m0.067s user 0m0.000s sys 0m0.000s
## c_read real 0m0.064s user 0m0.000s sys 0m0.000s
## c_regex real 0m0.001s user 0m0.000s sys 0m0.000s
##
testAll_splitIntoArray 100;
## ===== 100 fields =====
## c_readarray real 0m0.069s user 0m0.000s sys 0m0.062s
## c_read real 0m0.065s user 0m0.000s sys 0m0.046s
## c_regex real 0m0.005s user 0m0.000s sys 0m0.000s
##
testAll_splitIntoArray 1000;
## ===== 1000 fields =====
## c_readarray real 0m0.084s user 0m0.031s sys 0m0.077s
## c_read real 0m0.092s user 0m0.031s sys 0m0.046s
## c_regex real 0m0.125s user 0m0.125s sys 0m0.000s
##
testAll_splitIntoArray 10000;
## ===== 10000 fields =====
## c_readarray real 0m0.209s user 0m0.093s sys 0m0.108s
## c_read real 0m0.333s user 0m0.234s sys 0m0.109s
## c_regex real 0m9.095s user 0m9.078s sys 0m0.000s
##
testAll_splitIntoArray 100000;
## ===== 100000 fields =====
## c_readarray real 0m1.460s user 0m0.326s sys 0m1.124s
## c_read real 0m2.780s user 0m1.686s sys 0m1.092s
## c_regex real 17m38.208s user 15m16.359s sys 2m19.375s
##
bgoldst ,Nov 27, 2017 at 4:28
Very cool solution! I never thought of using a loop on a regex match, nifty use of $BASH_REMATCH
. It works, and
does indeed avoid spawning subshells. +1 from me. However, by way of criticism, the regex itself is a little non-ideal, in that
it appears you were forced to duplicate part of the delimiter token (specifically the comma) so as to work around the lack of
support for non-greedy multipliers (also lookarounds) in ERE ("extended" regex flavor built into bash). This makes it a little
less generic and robust. – bgoldst
Nov 27 '17 at 4:28
bgoldst ,Nov 27, 2017 at 4:28
Secondly, I did some benchmarking, and although the performance is better than the other solutions for smallish strings, it worsens
exponentially due to the repeated string-rebuilding, becoming catastrophic for very large strings. See my edit to your answer.
– bgoldst
Nov 27 '17 at 4:28
dawg ,Nov 27, 2017 at 4:46
@bgoldst: What a cool benchmark! In defense of the regex, for 10's or 100's of thousands of fields (what the regex is splitting)
there would probably be some form of record (like \n
delimited text lines) comprising those fields so the catastrophic
slow-down would likely not occur. If you have a string with 100,000 fields -- maybe Bash is not ideal ;-) Thanks for the benchmark.
I learned a thing or two. – dawg
Nov 27 '17 at 4:46
Geoff Lee ,Mar 4, 2016 at 6:02
Try this
IFS=', '; array=(Paris, France, Europe)
for item in ${array[@]}; do echo $item; done
It's simple. If you want, you can also add a declare (and also remove the commas):
IFS=' ';declare -a array=(Paris France Europe)
The IFS=' ' assignment is there to undo the earlier change, though a fresh bash instance works without it
MrPotatoHead ,Nov 13, 2018 at 13:19
Pure bash multi-character delimiter solution.
As others have pointed out in this thread, the OP's question gave an example of a comma delimited string to be parsed into
an array, but did not indicate if he/she was only interested in comma delimiters, single character delimiters, or multi-character
delimiters.
Since Google tends to rank this answer at or near the top of search results, I wanted to provide readers with a strong answer
to the question of multiple character delimiters, since that is also mentioned in at least one response.
If you're in search of a solution to a multi-character delimiter problem, I suggest reviewing
Mallikarjun M 's post, in particular the response
from gniourf_gniourf who provides this elegant
pure BASH solution using parameter expansion:
#!/bin/bash
str="LearnABCtoABCSplitABCaABCString"
delimiter=ABC
s=$str$delimiter
array=();
while [[ $s ]]; do
array+=( "${s%%"$delimiter"*}" );
s=${s#*"$delimiter"};
done;
declare -p array
Link to
cited comment/referenced post
Link to cited question:
Howto split a string on a multi-character delimiter in bash?
Eduardo Cuomo ,Dec 19, 2016 at 15:27
Use this:
countries='Paris, France, Europe'
OIFS="$IFS"
IFS=', ' array=($countries)
IFS="$OIFS"
#${array[1]} == Paris
#${array[2]} == France
#${array[3]} == Europe
gniourf_gniourf ,Dec 19, 2016 at
17:22
Bad: subject to word splitting and pathname expansion. Please don't revive old questions with good answers to give bad answers.
– gniourf_gniourf
Dec 19 '16 at 17:22
Scott Weldon ,Dec 19, 2016 at 18:12
This may be a bad answer, but it is still a valid answer. Flaggers / reviewers:
For incorrect answers such as this one, downvote, don't
delete! – Scott Weldon
Dec 19 '16 at 18:12
George Sovetov ,Dec 26, 2016 at 17:31
@gniourf_gniourf Could you please explain why it is a bad answer? I really don't understand when it fails. –
George Sovetov
Dec 26 '16 at 17:31
gniourf_gniourf ,Dec 26, 2016 at
18:07
@GeorgeSovetov: As I said, it's subject to word splitting and pathname expansion. More generally, splitting a string into an
array as array=( $string )
is a (sadly very common) antipattern: word splitting occurs: string='Prague,
Czech Republic, Europe'
; Pathname expansion occurs: string='foo[abcd],bar[efgh]'
will fail if you have a
file named, e.g., food
or barf
in your directory. The only valid usage of such a construct is when
string
is a glob. – gniourf_gniourf
Dec 26 '16 at 18:07
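Both failure modes described in that comment are easy to reproduce (the scratch directory and the food filename below are mine, purely for illustration):

```shell
# Word splitting: multiword fields fall apart.
string='Prague, Czech Republic, Europe'
oldIFS=$IFS; IFS=', '
array=($string)
IFS=$oldIFS
declare -p array
## declare -a array=([0]="Prague" [1]="Czech" [2]="Republic" [3]="Europe")

# Pathname expansion: a field that looks like a glob matches real files.
dir=$(mktemp -d); cd "$dir"; touch food
string='foo[abcd],bar[efgh]'
oldIFS=$IFS; IFS=','
array=($string)   # foo[abcd] matches the file "food"; bar[efgh] matches nothing
IFS=$oldIFS
declare -p array
## declare -a array=([0]="food" [1]="bar[efgh]")
```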
user1009908 ,Jun 9, 2015 at 23:28
UPDATE: Don't do this, due to problems with eval.
With slightly less ceremony:
IFS=', ' eval 'array=($string)'
e.g.
string="foo, bar,baz"
IFS=', ' eval 'array=($string)'
echo ${array[1]} # -> bar
caesarsol ,Oct 29, 2015 at 14:42
eval is evil! don't do this. – caesarsol
Oct 29 '15 at 14:42
user1009908 ,Oct 30, 2015 at 4:05
Pfft. No. If you're writing scripts large enough for this to matter, you're doing it wrong. In application code, eval is evil.
In shell scripting, it's common, necessary, and inconsequential. –
user1009908
Oct 30 '15 at 4:05
caesarsol ,Nov 2, 2015 at 18:19
put a $
in your variable and you'll see... I write many scripts and I never ever had to use a single eval
– caesarsol
Nov 2 '15 at 18:19
Dennis Williamson ,Dec 2, 2015 at
17:00
Eval command and security issues –
Dennis Williamson
Dec 2 '15 at 17:00
user1009908 ,Dec 22, 2015 at 23:04
You're right, this is only usable when the input is known to be clean. Not a robust solution. –
user1009908
Dec 22 '15 at 23:04
Eduardo Lucio ,Jan 31, 2018 at 20:45
Here's my hack!
Splitting strings by strings is a surprisingly awkward thing to do in bash. What happens is that we have limited approaches that
only work in a few cases (split by ";", "/", "." and so on) or we have a variety of side effects in the outputs.
The approach below has required a number of maneuvers, but I believe it will work for most of our needs!
#!/bin/bash
# --------------------------------------
# SPLIT FUNCTION
# ----------------
F_SPLIT_R=()
f_split() {
: 'It does a "split" into a given string and returns an array.
Args:
TARGET_P (str): Target string to "split".
DELIMITER_P (Optional[str]): Delimiter used to "split". If not
informed the split will be done by spaces.
Returns:
F_SPLIT_R (array): Array with the provided string separated by the
informed delimiter.
'
F_SPLIT_R=()
TARGET_P=$1
DELIMITER_P=$2
if [ -z "$DELIMITER_P" ] ; then
DELIMITER_P=" "
fi
REMOVE_N=1
if [ "$DELIMITER_P" == "\n" ] ; then
REMOVE_N=0
fi
# NOTE: This was the only parameter that has been a problem so far!
# By Questor
# [Ref.: https://unix.stackexchange.com/a/390732/61742]
if [ "$DELIMITER_P" == "./" ] ; then
DELIMITER_P="[.]/"
fi
if [ ${REMOVE_N} -eq 1 ] ; then
# NOTE: Due to bash limitations we have some problems getting the
# output of a split by awk inside an array and so we need to use
# "line break" (\n) to succeed. Seen this, we remove the line breaks
# momentarily afterwards we reintegrate them. The problem is that if
# there is a line break in the "string" informed, this line break will
# be lost, that is, it is erroneously removed in the output!
# By Questor
TARGET_P=$(awk 'BEGIN {RS="dn"} {gsub("\n", "3F2C417D448C46918289218B7337FCAF"); printf $0}' <<< "${TARGET_P}")
fi
# NOTE: The replace of "\n" by "3F2C417D448C46918289218B7337FCAF" results
# in more occurrences of "3F2C417D448C46918289218B7337FCAF" than the
# amount of "\n" that there was originally in the string (one more
# occurrence at the end of the string)! We can not explain the reason for
# this side effect. The line below corrects this problem! By Questor
TARGET_P=${TARGET_P%????????????????????????????????}
SPLIT_NOW=$(awk -F"$DELIMITER_P" '{for(i=1; i<=NF; i++){printf "%s\n", $i}}' <<< "${TARGET_P}")
while IFS= read -r LINE_NOW ; do
if [ ${REMOVE_N} -eq 1 ] ; then
# NOTE: We use "'" to prevent blank lines with no other characters
# in the sequence being erroneously removed! We do not know the
# reason for this side effect! By Questor
LN_NOW_WITH_N=$(awk 'BEGIN {RS="dn"} {gsub("3F2C417D448C46918289218B7337FCAF", "\n"); printf $0}' <<< "'${LINE_NOW}'")
# NOTE: We use the commands below to revert the intervention made
# immediately above! By Questor
LN_NOW_WITH_N=${LN_NOW_WITH_N%?}
LN_NOW_WITH_N=${LN_NOW_WITH_N#?}
F_SPLIT_R+=("$LN_NOW_WITH_N")
else
F_SPLIT_R+=("$LINE_NOW")
fi
done <<< "$SPLIT_NOW"
}
# --------------------------------------
# HOW TO USE
# ----------------
STRING_TO_SPLIT="
* How do I list all databases and tables using psql?
\"
sudo -u postgres /usr/pgsql-9.4/bin/psql -c \"\l\"
sudo -u postgres /usr/pgsql-9.4/bin/psql <DB_NAME> -c \"\dt\"
\"
\"
\list or \l: list all databases
\dt: list all tables in the current database
\"
[Ref.: https://dba.stackexchange.com/questions/1285/how-do-i-list-all-databases-and-tables-using-psql]
"
f_split "$STRING_TO_SPLIT" "bin/psql -c"
# --------------------------------------
# OUTPUT AND TEST
# ----------------
ARR_LENGTH=${#F_SPLIT_R[*]}
for (( i=0; i<=$(( $ARR_LENGTH -1 )); i++ )) ; do
echo " > -----------------------------------------"
echo "${F_SPLIT_R[$i]}"
echo " < -----------------------------------------"
done
if [ "$STRING_TO_SPLIT" == "${F_SPLIT_R[0]}bin/psql -c${F_SPLIT_R[1]}" ] ; then
echo " > -----------------------------------------"
echo "The strings are the same!"
echo " < -----------------------------------------"
fi
sel-en-ium ,May 31, 2018 at 5:56
Another way to do it without modifying IFS:
read -r -a myarray <<< "${string//, /$IFS}"
Rather than changing IFS to match our desired delimiter, we can replace all occurrences of our desired delimiter ", "
with contents of $IFS
via "${string//, /$IFS}"
.
Maybe this will be slow for very large strings though?
This is based on Dennis Williamson's answer.
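A minimal sketch of this substitution idea, but substituting a single space rather than the full $IFS value (the default IFS contains a newline, and read stops at the first newline, which can truncate the array):

```shell
# Sketch: replace the ", " delimiter with a plain space before read splits it.
# Substituting "$IFS" itself would inject a newline (default IFS is
# space/tab/newline), and read stops at the first newline, so a single
# space is safer here.
string="Paris, France, Europe"
read -r -a myarray <<< "${string//, / }"
echo "${myarray[1]}"   # → France
```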
rsjethani ,Sep 13, 2016 at 16:21
Another approach can be:
str="a, b, c, d" # assuming there is a space after ',' as in Q
arr=(${str//,/}) # delete all occurrences of ','
After this 'arr' is an array with four strings. This doesn't require dealing with IFS or read or any other special stuff, hence much
simpler and direct.
gniourf_gniourf ,Dec 26, 2016 at
18:12
Same (sadly common) antipattern as other answers: subject to word splitting and filename expansion. –
gniourf_gniourf
Dec 26 '16 at 18:12
Safter Arslan ,Aug 9, 2017 at 3:21
Another way would be:
string="Paris, France, Europe"
IFS=', ' arr=(${string})
Now your elements are stored in "arr" array. To iterate through the elements:
for i in ${arr[@]}; do echo $i; done
bgoldst ,Aug 13, 2017 at 22:38
I cover this idea in
my
answer ; see Wrong answer #5 (you might be especially interested in my discussion of the eval
trick).
Your solution leaves $IFS
set to the comma-space value after-the-fact. –
bgoldst
Aug 13 '17 at 22:38
Rob I ,
May 9, 2012 at 19:22
For your second question, see @mkb's comment to my answer below - that's definitely the way
to go! – Rob
I
May 9 '12 at 19:22
Dennis
Williamson , Jul 4, 2012 at 16:14
See my edited answer for one way to read individual characters into an array. –
Dennis
Williamson
Jul 4 '12 at 16:14
Nick
Weedon , Dec 31, 2015 at 11:04
Here is the same thing in a more concise form: var1=$(cut -f1 -d- <<<$STR) –
Nick Weedon
Dec 31 '15 at 11:04
Rob I ,
May 9, 2012 at 17:00
If your solution doesn't have to be general, i.e. only needs to work for strings
like your example, you could do:
var1=$(echo $STR | cut -f1 -d-)
var2=$(echo $STR | cut -f2 -d-)
I chose cut
here because you could simply extend the code for a few more
variables...
crunchybutternut , May 9,
2012 at 17:40
Can you look at my post again and see if you have a solution for the followup question?
thanks! – crunchybutternut
May 9 '12 at 17:40
mkb , May 9,
2012 at 17:59
You can use cut
to cut characters too! cut -c1
for example. –
mkb
May 9 '12 at 17:59
FSp , Nov
27, 2012 at 10:26
Although this is very simple to read and write, it is a very slow solution because it forces you to
read the same data ($STR) twice... if you care about your script's performance, the @anubhava
solution is much better – FSp
Nov 27 '12 at 10:26
tripleee , Jan 25, 2016 at 6:47
Apart from being an ugly last-resort solution, this has a bug: You should absolutely use
double quotes in echo "$STR"
unless you specifically want the shell to expand
any wildcards in the string as a side effect. See also stackoverflow.com/questions/10067266/
– tripleee
Jan 25 '16 at 6:47
Rob I ,
Feb 10, 2016 at 13:57
You're right about double quotes of course, though I did point out this solution wasn't
general. However I think your assessment is a bit unfair - for some people this solution may
be more readable (and hence extensible etc) than some others, and doesn't completely rely on
arcane bash features that wouldn't translate to other shells. I suspect that's why my
solution, though less elegant, continues to get votes periodically... – Rob I
Feb 10 '16 at 13:57
Dennis
Williamson , May 10, 2012 at 3:14
read
with IFS
are perfect for this:
$ IFS=- read var1 var2 <<< ABCDE-123456
$ echo "$var1"
ABCDE
$ echo "$var2"
123456
Edit:
Here is how you can read each individual character into array elements:
$ read -a foo <<<"$(echo "ABCDE-123456" | sed 's/./& /g')"
Dump the array:
$ declare -p foo
declare -a foo='([0]="A" [1]="B" [2]="C" [3]="D" [4]="E" [5]="-" [6]="1" [7]="2" [8]="3" [9]="4" [10]="5" [11]="6")'
If there are spaces in the string:
$ IFS=$'\v' read -a foo <<<"$(echo "ABCDE 123456" | sed 's/./&\v/g')"
$ declare -p foo
declare -a foo='([0]="A" [1]="B" [2]="C" [3]="D" [4]="E" [5]=" " [6]="1" [7]="2" [8]="3" [9]="4" [10]="5" [11]="6")'
insecure , Apr 30, 2014 at 7:51
Great, the elegant bash-only way, without unnecessary forks. – insecure
Apr 30 '14 at 7:51
Martin
Serrano , Jan 11 at 4:34
this solution also has the benefit that if delimiter is not present, the var2
will be empty – Martin Serrano
Jan 11 at 4:34
mkb , May 9,
2012 at 17:02
If you know it's going to be just two fields, you can skip the extra subprocesses like this:
var1=${STR%-*}
var2=${STR#*-}
What does this do? ${STR%-*}
deletes the shortest substring of
$STR
that matches the pattern -*
starting from the end of the
string. ${STR#*-}
does the same, but with the *-
pattern and
starting from the beginning of the string. They each have counterparts %%
and
##
which find the longest anchored pattern match. If anyone has a
helpful mnemonic to remember which does which, let me know! I always have to try both to
remember.
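A quick sketch contrasting the four operators on a hypothetical two-hyphen value may help the mnemonic stick:

```shell
# One string, two hyphens: shortest vs. longest anchored matches.
STR="A-B-C"
echo "${STR#*-}"    # → B-C  (# : shortest match trimmed from the front)
echo "${STR##*-}"   # → C    (##: longest match trimmed from the front)
echo "${STR%-*}"    # → A-B  (% : shortest match trimmed from the end)
echo "${STR%%-*}"   # → A    (%%: longest match trimmed from the end)
```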
Jens , Jan
30, 2015 at 15:17
Plus 1 For knowing your POSIX shell features, avoiding expensive forks and pipes, and the
absence of bashisms. – Jens
Jan 30 '15 at 15:17
Steven
Lu , May 1, 2015 at 20:19
Dunno about "absence of bashisms" considering that this is already moderately cryptic .... if
your delimiter is a newline instead of a hyphen, then it becomes even more cryptic. On the
other hand, it works with newlines , so there's that. – Steven Lu
May 1 '15 at 20:19
mkb , Mar 9,
2016 at 17:30
@KErlandsson: done – mkb
Mar 9 '16 at 17:30
mombip ,
Aug 9, 2016 at 15:58
I've finally found documentation for it: Shell-Parameter-Expansion
– mombip
Aug 9 '16 at 15:58
DS. , Jan 13,
2017 at 19:56
Mnemonic: "#" is to the left of "%" on a standard keyboard, so "#" removes a prefix (on the
left), and "%" removes a suffix (on the right). – DS.
Jan 13 '17 at 19:56
tripleee , May 9, 2012 at 17:57
Sounds like a job for set
with a custom IFS
.
IFS=-
set $STR
var1=$1
var2=$2
(You will want to do this in a function with a local IFS
so you don't mess up
other parts of your script where you require IFS
to be what you expect.)
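A minimal sketch of that function-local IFS idea (the function name and the sample value are illustrative):

```shell
# The IFS assignment is local to the function, so the caller's IFS is untouched.
split_pair() {
    local IFS=-
    set -- $1          # unquoted on purpose: word-splits on '-' (beware globs)
    var1=$1
    var2=$2
}
split_pair "ABCDE-123456"
echo "$var1"   # → ABCDE
echo "$var2"   # → 123456
```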
Rob I ,
May 9, 2012 at 19:20
Nice - I knew about $IFS
but hadn't seen how it could be used. –
Rob I
May 9 '12 at 19:20
Sigg3.net , Jun 19, 2013 at
8:08
I used triplee's example and it worked exactly as advertised! Just change last two lines to
myvar1=`echo $1` && myvar2=`echo $2`
if you need to store them throughout a script with several "thrown" variables. –
Sigg3.net
Jun 19 '13 at 8:08
tripleee , Jun 19, 2013 at 13:25
No, don't use a
useless echo
in backticks . – tripleee
Jun 19 '13 at 13:25
Daniel
Andersson , Mar 27, 2015 at 6:46
This is a really sweet solution if we need to write something that is not Bash specific. To
handle IFS
troubles, one can add OLDIFS=$IFS
at the beginning
before overwriting it, and then add IFS=$OLDIFS
just after the set
line. – Daniel Andersson
Mar 27 '15 at 6:46
tripleee , Mar 27, 2015 at 6:58
FWIW the link above is broken. I was lazy and careless. The canonical location still works;
iki.fi/era/unix/award.html#echo –
tripleee
Mar 27 '15 at 6:58
anubhava , May 9, 2012 at 17:09
Using bash regex capabilities:
re="^([^-]+)-(.*)$"
[[ "ABCDE-123456" =~ $re ]] && var1="${BASH_REMATCH[1]}" && var2="${BASH_REMATCH[2]}"
echo $var1
echo $var2
OUTPUT
ABCDE
123456
Cometsong , Oct 21, 2016 at
13:29
Love pre-defining the re
for later use(s)! – Cometsong
Oct 21 '16 at 13:29
Archibald , Nov 12, 2012 at
11:03
string="ABCDE-123456"
IFS=- # use "local IFS=-" inside the function
set $string
echo $1 # >>> ABCDE
echo $2 # >>> 123456
tripleee , Mar 27, 2015 at 7:02
Hmmm, isn't this just a restatement of my answer ? – tripleee
Mar 27 '15 at 7:02
Archibald , Sep 18, 2015 at
12:36
Actually yes. I just clarified it a bit. – Archibald
Sep 18 '15 at 12:36
cd1 , Jul 1,
2010 at 23:29
Suppose I have the string 1:2:3:4:5
and I want to get its last field (
5
in this case). How do I do that using Bash? I tried cut
, but I
don't know how to specify the last field with -f
.
Stephen
, Jul 2, 2010 at 0:05
You can use string
operators :
$ foo=1:2:3:4:5
$ echo ${foo##*:}
5
This trims everything from the front until a ':', greedily.
${foo <-- from variable foo
## <-- greedy front trim
* <-- matches anything
: <-- until the last ':'
}
eckes ,
Jan 23, 2013 at 15:23
While this is working for the given problem, the answer of William below ( stackoverflow.com/a/3163857/520162 )
also returns 5
if the string is 1:2:3:4:5:
(while using the string
operators yields an empty result). This is especially handy when parsing paths that could
contain (or not) a finishing /
character. – eckes
Jan 23 '13 at 15:23
Dobz , Jun
25, 2014 at 11:44
How would you then do the opposite of this? to echo out '1:2:3:4:'? – Dobz
Jun 25 '14 at 11:44
Mihai
Danila , Jul 9, 2014 at 14:07
And how does one keep the part before the last separator? Apparently by using
${foo%:*}
. #
- from beginning; %
- from end.
#
, %
- shortest match; ##
, %%
- longest
match. – Mihai Danila
Jul 9 '14 at 14:07
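Sketches of both directions, including the follow-up question above about keeping the trailing delimiter (the nested expansion is one way to do it):

```shell
foo=1:2:3:4:5
echo "${foo%:*}"              # → 1:2:3:4   (drop the last field)
echo "${foo%%:*}"             # → 1         (keep only the first field)
echo "${foo%"${foo##*:}"}"    # → 1:2:3:4:  (strip just the last field's text)
```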
Putnik ,
Feb 11, 2016 at 22:33
If i want to get the last element from path, how should I use it? echo
${pwd##*/}
does not work. – Putnik
Feb 11 '16 at 22:33
Stan
Strum , Dec 17, 2017 at 4:22
@Putnik that command sees pwd
as a variable. Try dir=$(pwd); echo
${dir##*/}
. Works for me! – Stan Strum
Dec 17 '17 at 4:22
a3nm , Feb
3, 2012 at 8:39
Another way is to reverse before and after cut
:
$ echo ab:cd:ef | rev | cut -d: -f1 | rev
ef
This makes it very easy to get the last but one field, or any range of fields numbered
from the end.
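For instance, a sketch of the last-but-one field and a trailing range (remember that after the first rev, field 1 counts from the end):

```shell
echo ab:cd:ef | rev | cut -d: -f2 | rev      # → cd     (last-but-one field)
echo ab:cd:ef | rev | cut -d: -f1-2 | rev    # → cd:ef  (last two fields)
```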
Dannid ,
Jan 14, 2013 at 20:50
This answer is nice because it uses 'cut', which the author is (presumably) already familiar with.
Plus, I like this answer because I am using 'cut' and had this exact question, hence
finding this thread via search. – Dannid
Jan 14 '13 at 20:50
funroll
, Aug 12, 2013 at 19:51
Some cut-and-paste fodder for people using spaces as delimiters: echo "1 2 3 4" | rev |
cut -d " " -f1 | rev
– funroll
Aug 12 '13 at 19:51
EdgeCaseBerg , Sep 8, 2013 at
5:01
the rev | cut -d -f1 | rev is so clever! Thanks! Helped me a bunch (my use case was rev | -d
' ' -f 2- | rev – EdgeCaseBerg
Sep 8 '13 at 5:01
Anarcho-Chossid , Sep 16,
2015 at 15:54
Wow. Beautiful and dark magic. – Anarcho-Chossid
Sep 16 '15 at 15:54
shearn89 , Aug 17, 2017 at 9:27
I always forget about rev
, was just what I needed! cut -b20- | rev | cut
-b10- | rev
– shearn89
Aug 17 '17 at 9:27
William
Pursell , Jul 2, 2010 at 7:09
It's difficult to get the last field using cut, but here's (one set of) solutions in awk and
perl
$ echo 1:2:3:4:5 | awk -F: '{print $NF}'
5
$ echo 1:2:3:4:5 | perl -F: -wane 'print $F[-1]'
5
eckes ,
Jan 23, 2013 at 15:20
great advantage of this solution over the accepted answer: it also matches paths that contain
or do not contain a finishing /
character: /a/b/c/d
and
/a/b/c/d/
yield the same result ( d
) when processing pwd |
awk -F/ '{print $NF}'
. The accepted answer results in an empty result in the case of
/a/b/c/d/
– eckes
Jan 23 '13 at 15:20
stamster , May 21 at 11:52
@eckes In case of AWK solution, on GNU bash, version 4.3.48(1)-release that's not true, as it
matters whether you have a trailing slash or not. Simply put, AWK will use /
as
delimiter, and if your path is /my/path/dir/
it will use value after last
delimiter, which is simply an empty string. So it's best to avoid trailing slash if you need
to do such a thing like I do. – stamster
May 21 at 11:52
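One workaround sketch for the trailing-slash ambiguity discussed above: strip the slash first with parameter expansion, so both forms of the path give the same last component:

```shell
p=/a/b/c/d/
p=${p%/}          # drop one trailing slash, if any → /a/b/c/d
echo "${p##*/}"   # → d
```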
Nicholas M T Elliott ,
Jul 1, 2010 at 23:39
Assuming fairly simple usage (no escaping of the delimiter, for example), you can use grep:
$ echo "1:2:3:4:5" | grep -oE "[^:]+$"
5
Breakdown - find all the characters not the delimiter ([^:]) at the end of the line ($).
-o only prints the matching part.
Dennis
Williamson , Jul 2, 2010 at 0:05
One way:
var1="1:2:3:4:5"
var2=${var1##*:}
Another, using an array:
var1="1:2:3:4:5"
saveIFS=$IFS
IFS=":"
var2=($var1)
IFS=$saveIFS
var2=${var2[@]: -1}
Yet another with an array:
var1="1:2:3:4:5"
saveIFS=$IFS
IFS=":"
var2=($var1)
IFS=$saveIFS
count=${#var2[@]}
var2=${var2[$count-1]}
Using Bash (version >= 3.2) regular expressions:
var1="1:2:3:4:5"
[[ $var1 =~ :([^:]*)$ ]]
var2=${BASH_REMATCH[1]}
liuyang1 , Mar 24, 2015 at 6:02
Thanks so much for the array style, as I need this feature but don't have cut, awk or similar utils.
– liuyang1
Mar 24 '15 at 6:02
user3133260 , Dec 24, 2013 at
19:04
$ echo "a b c d e" | tr ' ' '\n' | tail -1
e
Simply translate the delimiter into a newline and choose the last entry with tail
-1
.
Yajo , Jul
30, 2014 at 10:13
It will fail if the last item contains a \n
, but for most cases is the most
readable solution. – Yajo
Jul 30 '14 at 10:13
Rafael ,
Nov 10, 2016 at 10:09
Using sed
:
$ echo '1:2:3:4:5' | sed 's/.*://' # => 5
$ echo '' | sed 's/.*://' # => (empty)
$ echo ':' | sed 's/.*://' # => (empty)
$ echo ':b' | sed 's/.*://' # => b
$ echo '::c' | sed 's/.*://' # => c
$ echo 'a' | sed 's/.*://' # => a
$ echo 'a:' | sed 's/.*://' # => (empty)
$ echo 'a:b' | sed 's/.*://' # => b
$ echo 'a::c' | sed 's/.*://' # => c
Ab
Irato , Nov 13, 2013 at 16:10
If your last field is a single character, you could do this:
a="1:2:3:4:5"
echo ${a: -1}
echo ${a:(-1)}
Check string manipulation in bash .
gniourf_gniourf , Nov 13,
2013 at 16:15
This doesn't work: it gives the last character of a
, not the last
field . – gniourf_gniourf
Nov 13 '13 at 16:15
Ab
Irato , Nov 25, 2013 at 13:25
True, that's the idea, if you know the length of the last field it's good. If not you have to
use something else... – Ab Irato
Nov 25 '13 at 13:25
sphakka
, Jan 25, 2016 at 16:24
Interesting, I didn't know of these particular Bash string manipulations. It also resembles
to Python's
string/array slicing . – sphakka
Jan 25 '16 at 16:24
ghostdog74 , Jul 2, 2010 at
1:16
Using Bash.
$ var1="1:2:3:4:0"
$ IFS=":"
$ set -- $var1
$ eval echo \$${#}
0
Sopalajo de Arrierez ,
Dec 24, 2014 at 5:04
I would buy some details about this method, please :-) . – Sopalajo de Arrierez
Dec 24 '14 at 5:04
Rafa , Apr
27, 2017 at 22:10
Could have used echo ${!#}
instead of eval echo \$${#}
. –
Rafa
Apr 27 '17 at 22:10
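A sketch of that ${!#} variant: it is indirect expansion of the parameter whose name is the value of $#, i.e. the last positional parameter, so no eval is needed:

```shell
var1="1:2:3:4:0"
IFS=:            # split on ':' for the next unquoted expansion
set -- $var1
unset IFS
echo "${!#}"     # → 0
```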
Crytis ,
Dec 7, 2016 at 6:51
echo "a:b:c:d:e"|xargs -d : -n1|tail -1
First use xargs to split it using ":"; -n1 means every line has only one part. Then print the
last part.
BDL , Dec
7, 2016 at 13:47
Although this might solve the problem, one should always add an explanation to it. –
BDL
Dec 7 '16 at 13:47
Crytis ,
Jun 7, 2017 at 9:13
already added.. – Crytis
Jun 7 '17 at 9:13
021 , Apr
26, 2016 at 11:33
There are many good answers here, but still I want to share this one using basename :
basename $(echo "a:b:c:d:e" | tr ':' '/')
However it will fail if there are already some '/' in your string . If slash / is your
delimiter then you just have to (and should) use basename.
It's not the best answer but it just shows how you can be creative using bash
commands.
Nahid
Akbar , Jun 22, 2012 at 2:55
for x in `echo $str | tr ";" "\n"`; do echo $x; done
chepner
, Jun 22, 2012 at 12:58
This runs into problems if there is whitespace in any of the fields. Also, it does not
directly address the question of retrieving the last field. – chepner
Jun 22 '12 at 12:58
Christoph
Böddeker , Feb 19 at 15:50
For those that are comfortable with Python, https://github.com/Russell91/pythonpy is a nice
choice to solve this problem.
$ echo "a:b:c:d:e" | py -x 'x.split(":")[-1]'
From the pythonpy help: -x treat each row of stdin as x
.
With that tool, it is easy to write python code that gets applied to the input.
baz , Nov
24, 2017 at 19:27
a solution using the read builtin
IFS=':' read -a field <<< "1:2:3:4:5"
echo ${field[4]}
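On bash 4.3 and later, a negative subscript can fetch the last field without hard-coding its index (a sketch building on the read call above):

```shell
IFS=':' read -r -a field <<< "1:2:3:4:5"
echo "${field[-1]}"                 # → 5  (bash 4.3+ negative subscript)
echo "${field[${#field[@]}-1]}"     # → 5  (equivalent for older bash)
```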
stefanB ,
May 28, 2009 at 2:03
I have this string stored in a variable:
IN="[email protected];[email protected]"
Now I would like to split the strings by ;
delimiter so that I have:
ADDR1="[email protected]"
ADDR2="[email protected]"
I don't necessarily need the ADDR1
and ADDR2
variables. If they
are elements of an array that's even better.
After suggestions from the answers below, I ended up with the following which is what I
was after:
#!/usr/bin/env bash
IN="[email protected];[email protected]"
mails=$(echo $IN | tr ";" "\n")
for addr in $mails
do
echo "> [$addr]"
done
Output:
> [[email protected]]
> [[email protected]]
There was a solution involving setting Internal_field_separator (IFS) to
;
. I am not sure what happened with that answer, how do you reset
IFS
back to default?
RE: IFS
solution, I tried this and it works, I keep the old IFS
and then restore it:
IN="[email protected];[email protected]"
OIFS=$IFS
IFS=';'
mails2=$IN
for x in $mails2
do
echo "> [$x]"
done
IFS=$OIFS
BTW, when I tried
mails2=($IN)
I only got the first string when printing it in loop, without brackets around
$IN
it works.
Brooks
Moses , May 1, 2012 at 1:26
With regards to your "Edit2": You can simply "unset IFS" and it will return to the default
state. There's no need to save and restore it explicitly unless you have some reason to
expect that it's already been set to a non-default value. Moreover, if you're doing this
inside a function (and, if you aren't, why not?), you can set IFS as a local variable and it
will return to its previous value once you exit the function. – Brooks Moses
May 1 '12 at 1:26
dubiousjim , May 31, 2012 at
5:21
@BrooksMoses: (a) +1 for using local IFS=...
where possible; (b) -1 for
unset IFS
, this doesn't exactly reset IFS to its default value, though I
believe an unset IFS behaves the same as the default value of IFS ($' \t\n'), however it
seems bad practice to be assuming blindly that your code will never be invoked with IFS set
to a custom value; (c) another idea is to invoke a subshell: (IFS=$custom; ...)
when the subshell exits IFS will return to whatever it was originally. – dubiousjim
May 31 '12 at 5:21
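A sketch of the subshell idea from the comment above; note that variables set inside the parentheses do not survive back in the parent shell, so this suits immediate processing:

```shell
IN="one;two;three"
# IFS is modified only inside the ( ... ) subshell; the parent's IFS is untouched.
( IFS=';' ; set -- $IN ; printf '> [%s]\n' "$@" )
```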
nicooga
, Mar 7, 2016 at 15:32
I just want to have a quick look at the paths to decide where to throw an executable, so I
resorted to run ruby -e "puts ENV.fetch('PATH').split(':')"
. If you want to
stay pure bash won't help but using any scripting language that has a built-in split
is easier. – nicooga
Mar 7 '16 at 15:32
Jeff , Apr
22 at 17:51
This is kind of a drive-by comment, but since the OP used email addresses as the example, has
anyone bothered to answer it in a way that is fully RFC 5322 compliant, namely that any
quoted string can appear before the @ which means you're going to need regular expressions or
some other kind of parser instead of naive use of IFS or other simplistic splitter functions.
– Jeff
Apr 22 at 17:51
user2037659 , Apr 26 at 20:15
for x in $(IFS=';';echo $IN); do echo "> [$x]"; done
– user2037659
Apr 26 at 20:15
Johannes Schaub - litb ,
May 28, 2009 at 2:23
You can set the internal field separator (IFS)
variable, and then let it parse into an array. When this happens in a command, then the
assignment to IFS
only takes place to that single command's environment (to
read
). It then parses the input according to the IFS
variable
value into an array, which we can then iterate over.
IFS=';' read -ra ADDR <<< "$IN"
for i in "${ADDR[@]}"; do
# process "$i"
done
It will parse one line of items separated by ;
, pushing it into an array.
Stuff for processing whole of $IN
, each time one line of input separated by
;
:
while IFS=';' read -ra ADDR; do
for i in "${ADDR[@]}"; do
# process "$i"
done
done <<< "$IN"
Chris
Lutz , May 28, 2009 at 2:25
This is probably the best way. How long will IFS persist in its current value, can it mess
up my code by being set when it shouldn't be, and how can I reset it when I'm done with it?
– Chris
Lutz
May 28 '09 at 2:25
Johannes Schaub - litb ,
May 28, 2009 at 3:04
now after the fix applied, only within the duration of the read command :) – Johannes Schaub -
litb
May 28 '09 at 3:04
lhunath ,
May 28, 2009 at 6:14
You can read everything at once without using a while loop: read -r -d '' -a addr
<<< "$in" # The -d '' is key here, it tells read not to stop at the first newline
(which is the default -d) but to continue until EOF or a NULL byte (which only occur in
binary data). – lhunath
May 28 '09 at 6:14
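A self-contained sketch of that one-shot read; read returns non-zero when it hits EOF before finding the '' (NUL) delimiter, hence the || true:

```shell
in=$'one\ntwo\nthree'
# -d '' reads past newlines; default IFS then splits on whitespace incl. newlines.
read -r -d '' -a addr <<< "$in" || true
echo "${#addr[@]}"   # → 3
```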
Charles
Duffy , Jul 6, 2013 at 14:39
@LucaBorrione Setting IFS
on the same line as the read
with no
semicolon or other separator, as opposed to in a separate command, scopes it to that command
-- so it's always "restored"; you don't need to do anything manually. – Charles Duffy
Jul 6 '13 at 14:39
chepner
, Oct 2, 2014 at 3:50
@imagineerThis There is a bug involving herestrings and local changes to IFS that requires
$IN
to be quoted. The bug is fixed in bash
4.3. – chepner
Oct 2 '14 at 3:50
palindrom , Mar 10, 2011 at 9:00
Taken from
Bash shell script split array :
IN="[email protected];[email protected]"
arrIN=(${IN//;/ })
Explanation:
This construction replaces all occurrences of ';'
(the initial
//
means global replace) in the string IN
with ' '
(a
single space), then interprets the space-delimited string as an array (that's what the
surrounding parentheses do).
The syntax used inside of the curly braces to replace each ';'
character with
a ' '
character is called Parameter
Expansion .
There are some common gotchas:
- If the original string has spaces, you will need to use
IFS :
IFS=';'; arrIN=($IN); unset IFS;
- If the original string has spaces and the delimiter is a new line, you can set
IFS with:
IFS=$'\n'; arrIN=($IN); unset IFS;
Oz123 ,
Mar 21, 2011 at 18:50
I just want to add: this is the simplest of all, you can access array elements with
${arrIN[1]} (starting from zeros of course) – Oz123
Mar 21 '11 at 18:50
KomodoDave , Jan 5, 2012 at
15:13
Found it: the technique of modifying a variable within a ${} is known as 'parameter
expansion'. – KomodoDave
Jan 5 '12 at 15:13
qbolec ,
Feb 25, 2013 at 9:12
Does it work when the original string contains spaces? – qbolec
Feb 25 '13 at 9:12
Ethan ,
Apr 12, 2013 at 22:47
No, I don't think this works when there are also spaces present... it's converting the ',' to
' ' and then building a space-separated array. – Ethan
Apr 12 '13 at 22:47
Charles
Duffy , Jul 6, 2013 at 14:39
This is a bad approach for other reasons: For instance, if your string contains
;*;
, then the *
will be expanded to a list of filenames in the
current directory. -1 – Charles Duffy
Jul 6 '13 at 14:39
Chris
Lutz , May 28, 2009 at 2:09
If you don't mind processing them immediately, I like to do this:
for i in $(echo $IN | tr ";" "\n")
do
# process
done
You could use this kind of loop to initialize an array, but there's probably an easier way
to do it. Hope this helps, though.
Chris
Lutz , May 28, 2009 at 2:42
You should have kept the IFS answer. It taught me something I didn't know, and it definitely
made an array, whereas this just makes a cheap substitute. – Chris Lutz
May 28 '09 at 2:42
Johannes Schaub - litb ,
May 28, 2009 at 2:59
I see. Yeah i find doing these silly experiments, i'm going to learn new things each time i'm
trying to answer things. I've edited stuff based on #bash IRC feedback and undeleted :)
– Johannes Schaub - litb
May 28 '09 at 2:59
lhunath ,
May 28, 2009 at 6:12
-1, you're obviously not aware of wordsplitting, because it's introducing two bugs in your
code. one is when you don't quote $IN and the other is when you pretend a newline is the only
delimiter used in wordsplitting. You are iterating over every WORD in IN, not every line, and
DEFINITELY not every element delimited by a semicolon, though it may appear to have the
side-effect of looking like it works. – lhunath
May 28 '09 at 6:12
Johannes Schaub - litb ,
May 28, 2009 at 17:00
You could change it to echo "$IN" | tr ';' '\n' | while read -r ADDY; do # process "$ADDY";
done to make him lucky, i think :) Note that this will fork, and you can't change outer
variables from within the loop (that's why i used the <<< "$IN" syntax) then –
Johannes
Schaub - litb
May 28 '09 at 17:00
mklement0 , Apr 24, 2013 at 14:13
To summarize the debate in the comments: Caveats for general use : the shell applies
word splitting and expansions to the string, which may be undesired; just try
it with. IN="[email protected];[email protected];*;broken apart"
. In short: this
approach will break, if your tokens contain embedded spaces and/or chars. such as
*
that happen to make a token match filenames in the current folder. –
mklement0
Apr 24 '13 at 14:13
F.
Hauri , Apr 13, 2013 at 14:20
Compatible answer
For this SO question, there are already a lot of different ways to do this in bash . But bash has many
special features, so-called bashisms , that work well but won't work in
any other shell .
In particular, arrays , associative arrays , and pattern
substitution are pure bashisms and may not work under other shells.
On my Debian GNU/Linux , there is a standard shell called dash , but I know many
people who like to use ksh .
Finally, in very small environments, there is a special tool called busybox with its own shell
interpreter ( ash ).
Requested string
The string sample in SO question is:
IN="[email protected];[email protected]"
As this could be used with whitespace, and as whitespace could modify
the result of the routine, I prefer to use this sample string:
IN="[email protected];[email protected];Full Name <[email protected]>"
Split string based on delimiter in bash (version >=4.2)
Under pure bash, we may use arrays and IFS :
var="[email protected];[email protected];Full Name <[email protected]>"
oIFS="$IFS"
IFS=";"
declare -a fields=($var)
IFS="$oIFS"
unset oIFS
IFS=\; read -a fields <<<"$var"
Using this syntax under recent bash doesn't change $IFS
for current session,
but only for the current command:
set | grep ^IFS=
IFS=$' \t\n'
Now the string var
is split and stored into an array (named
fields
):
set | grep ^fields=\\\|^var=
fields=([0]="[email protected]" [1]="[email protected]" [2]="Full Name <[email protected]>")
var='[email protected];[email protected];Full Name <[email protected]>'
We could request for variable content with declare -p
:
declare -p var fields
declare -- var="[email protected];[email protected];Full Name <[email protected]>"
declare -a fields=([0]="[email protected]" [1]="[email protected]" [2]="Full Name <[email protected]>")
read
is the quickest way to do the split, because there are no
forks and no external resources called.
From there, you could use the syntax you already know for processing each field:
for x in "${fields[@]}";do
echo "> [$x]"
done
> [[email protected]]
> [[email protected]]
> [Full Name <[email protected]>]
or drop each field after processing (I like this shifting approach):
while [ "$fields" ] ;do
echo "> [$fields]"
fields=("${fields[@]:1}")
done
> [[email protected]]
> [[email protected]]
> [Full Name <[email protected]>]
or even for simple printout (shorter syntax):
printf "> [%s]\n" "${fields[@]}"
> [[email protected]]
> [[email protected]]
> [Full Name <[email protected]>]
Split string based on delimiter in shell
But if you would write something usable under many shells, you have to not use
bashisms .
There is a syntax, used in many shells, for splitting a string across first or
last occurrence of a substring:
${var#*SubStr} # will drop begin of string up to first occur of `SubStr`
${var##*SubStr} # will drop begin of string up to last occur of `SubStr`
${var%SubStr*} # will drop part of string from last occur of `SubStr` to the end
${var%%SubStr*} # will drop part of string from first occur of `SubStr` to the end
(The absence of this elsewhere is the main reason I published this answer ;)
As pointed out by Score_Under :
#
and %
delete the shortest possible matching string, and
##
and %%
delete the longest possible.
This little sample script works well under bash , dash , ksh , and busybox , and was tested under
macOS's bash too:
var="[email protected];[email protected];Full Name <[email protected]>"
while [ "$var" ] ;do
iter=${var%%;*}
echo "> [$iter]"
[ "$var" = "$iter" ] && \
var='' || \
var="${var#*;}"
done
> [[email protected]]
> [[email protected]]
> [Full Name <[email protected]>]
Have fun!
Score_Under , Apr 28, 2015 at
16:58
The #
, ##
, %
, and %%
substitutions
have what is IMO an easier explanation to remember (for how much they delete): #
and %
delete the shortest possible matching string, and ##
and
%%
delete the longest possible. – Score_Under
Apr 28 '15 at 16:58
sorontar , Oct 26, 2016 at 4:36
The IFS=\; read -a fields <<<"$var"
fails on newlines and adds a
trailing newline. The other solution removes a trailing empty field. – sorontar
Oct 26 '16 at 4:36
Eric
Chen , Aug 30, 2017 at 17:50
The shell delimiter is the most elegant answer, period. – Eric Chen
Aug 30 '17 at 17:50
sancho.s , Oct 4 at 3:42
Could the last alternative be used with a list of field separators set somewhere else? For
instance, I mean to use this as a shell script, and pass a list of field separators as a
positional parameter. – sancho.s
Oct 4 at 3:42
F.
Hauri , Oct 4 at 7:47
Yes, in a loop: for sep in "#" "ł" "@" ; do ... var="${var#*$sep}" ...
– F.
Hauri
Oct 4 at 7:47
DougW ,
Apr 27, 2015 at 18:20
I've seen a couple of answers referencing the cut
command, but they've all been
deleted. It's a little odd that nobody has elaborated on that, because I think it's one of
the more useful commands for doing this type of thing, especially for parsing delimited log
files.
In the case of splitting this specific example into a bash script array, tr
is probably more efficient, but cut
can be used, and is more effective if you
want to pull specific fields from the middle.
Example:
$ echo "[email protected];[email protected]" | cut -d ";" -f 1
[email protected]
$ echo "[email protected];[email protected]" | cut -d ";" -f 2
[email protected]
You can obviously put that into a loop, and iterate the -f parameter to pull each field
independently.
This gets more useful when you have a delimited log file with rows like this:
2015-04-27|12345|some action|an attribute|meta data
cut
is very handy to be able to cat
this file and select a
particular field for further processing.
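A sketch of that log-parsing use on the sample row above (cut's field numbers start at 1, and multiple fields can be selected at once):

```shell
line='2015-04-27|12345|some action|an attribute|meta data'
echo "$line" | cut -d '|' -f 3     # → some action
echo "$line" | cut -d '|' -f 2,4   # → 12345|an attribute
```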
MisterMiyagi , Nov 2, 2016 at
8:42
Kudos for using cut
, it's the right tool for the job! Much cleaner than any of
those shell hacks. – MisterMiyagi
Nov 2 '16 at 8:42
uli42 ,
Sep 14, 2017 at 8:30
This approach will only work if you know the number of elements in advance; you'd need to
program some more logic around it. It also runs an external tool for every element. –
uli42
Sep 14 '17 at 8:30
Louis Loudog Trottier ,
May 10 at 4:20
Exactly what I was looking for, trying to avoid empty strings in a CSV. Now I can point to
the exact 'column' value as well. Works with IFS already used in a loop. Better than expected
for my situation. – Louis Loudog Trottier
May 10 at 4:20
, May 28, 2009 at 10:31
How about this approach:
IN="[email protected];[email protected]"
set -- "$IN"
IFS=";"; declare -a Array=($*)
echo "${Array[@]}"
echo "${Array[0]}"
echo "${Array[1]}"
Source
Yzmir
Ramirez , Sep 5, 2011 at 1:06
+1 ... but I wouldn't name the variable "Array" ... pet peeve I guess. Good solution. –
Yzmir
Ramirez
Sep 5 '11 at 1:06
ata , Nov 3,
2011 at 22:33
+1 ... but the "set" and declare -a are unnecessary. You could as well have used just
IFS=";" && Array=($IN)
– ata
Nov 3 '11 at 22:33
Luca
Borrione , Sep 3, 2012 at 9:26
+1 Only a side note: shouldn't it be recommendable to keep the old IFS and then restore it?
(as shown by stefanB in his edit3) people landing here (sometimes just copying and pasting a
solution) might not think about this – Luca Borrione
Sep 3 '12 at 9:26
Charles
Duffy , Jul 6, 2013 at 14:44
-1: First, @ata is right that most of the commands in this do nothing. Second, it uses
word-splitting to form the array, and doesn't do anything to inhibit glob-expansion when
doing so (so if you have glob characters in any of the array elements, those elements are
replaced with matching filenames). – Charles Duffy
Jul 6 '13 at 14:44
John_West , Jan 8, 2016 at
12:29
Suggest to use $'...' : IN=$'[email protected];[email protected];bet <d@\ns*kl.com>' .
Then echo "${Array[2]}" will print a string with a newline.
set -- "$IN" is also necessary in this case. Yes, to prevent glob expansion,
the solution should include set -f . – John_West
Jan 8 '16 at 12:29
Steven
Lizarazo , Aug 11, 2016 at 20:45
This worked for me:
string="1;2"
echo $string | cut -d';' -f1 # output is 1
echo $string | cut -d';' -f2 # output is 2
Pardeep
Sharma , Oct 10, 2017 at 7:29
this is short and sweet :) – Pardeep Sharma
Oct 10 '17 at 7:29
space
earth , Oct 17, 2017 at 7:23
Thanks...Helped a lot – space earth
Oct 17 '17 at 7:23
mojjj ,
Jan 8 at 8:57
cut works only with a single char as delimiter. – mojjj
Jan 8 at 8:57
lothar ,
May 28, 2009 at 2:12
echo "[email protected];[email protected]" | sed -e 's/;/\n/g'
[email protected]
[email protected]
Luca
Borrione , Sep 3, 2012 at 10:08
-1 what if the string contains spaces? for example IN="this is first line; this
is second line" arrIN=( $( echo "$IN" | sed -e 's/;/\n/g' ) )
will produce an array of
8 elements in this case (an element for each word space separated), rather than 2 (an element
for each line semi colon separated) – Luca Borrione
Sep 3 '12 at 10:08
lothar ,
Sep 3, 2012 at 17:33
@Luca No the sed script creates exactly two lines. What creates the multiple entries for you
is when you put it into a bash array (which splits on white space by default) –
lothar
Sep 3 '12 at 17:33
Luca
Borrione , Sep 4, 2012 at 7:09
That's exactly the point: the OP needs to store entries into an array to loop over it, as you
can see in his edits. I think your (good) answer failed to mention using arrIN=( $(
echo "$IN" | sed -e 's/;/\n/g' ) )
to achieve that, and to advise changing IFS to
IFS=$'\n'
for those who land here in the future and need to split a string
containing spaces (and to restore it afterwards). :) – Luca Borrione
Sep 4 '12 at 7:09
lothar ,
Sep 4, 2012 at 16:55
@Luca Good point. However the array assignment was not in the initial question when I wrote
up that answer. – lothar
Sep 4 '12 at 16:55
Ashok ,
Sep 8, 2012 at 5:01
This also works:
IN="[email protected];[email protected]"
echo ADD1=`echo $IN | cut -d \; -f 1`
echo ADD2=`echo $IN | cut -d \; -f 2`
Be careful, this solution is not always correct. In case you pass "[email protected]" only, it
will assign it to both ADD1 and ADD2.
fersarr
, Mar 3, 2016 at 17:17
You can use -s to avoid the mentioned problem: superuser.com/questions/896800/
"-f, --fields=LIST select only these fields; also print any line that contains no delimiter
character, unless the -s option is specified" – fersarr
Mar 3 '16 at 17:17
Tony , Jan
14, 2013 at 6:33
I think AWK is the best and most
efficient command to resolve your problem. AWK is included by default in almost every
Linux distribution.
echo "[email protected];[email protected]" | awk -F';' '{print $1,$2}'
will give
[email protected] [email protected]
Of course you can store each email address by redefining the awk print field.
Jaro , Jan
7, 2014 at 21:30
Or even simpler: echo "[email protected];[email protected]" | awk 'BEGIN{RS=";"} {print}' –
Jaro
Jan 7 '14 at 21:30
Aquarelle , May 6, 2014 at
21:58
@Jaro This worked perfectly for me when I had a string with commas and needed to reformat it
into lines. Thanks. – Aquarelle
May 6 '14 at 21:58
Eduardo
Lucio , Aug 5, 2015 at 12:59
It worked in this scenario -> "echo "$SPLIT_0" | awk -F' inode=' '{print $1}'"! I had
problems when trying to use strings (" inode=") instead of characters (";"). $1, $2, $3,
$4 are set as positions in an array! If there is a way of setting an array... better! Thanks!
– Eduardo Lucio
Aug 5 '15 at 12:59
Tony , Aug
6, 2015 at 2:42
@EduardoLucio, what I'm thinking about is maybe you can first replace your delimiter
inode=
into ;
for example by sed -i 's/inode\=/\;/g'
your_file_to_process
, then define -F';'
when apply awk
,
hope that can help you. – Tony
Aug 6 '15 at 2:42
nickjb ,
Jul 5, 2011 at 13:41
A different take on
Darron's answer , this is how I do it:
IN="[email protected];[email protected]"
read ADDR1 ADDR2 <<<$(IFS=";"; echo $IN)
ColinM ,
Sep 10, 2011 at 0:31
This doesn't work. – ColinM
Sep 10 '11 at 0:31
nickjb ,
Oct 6, 2011 at 15:33
I think it does! Run the commands above and then "echo $ADDR1 ... $ADDR2" and I get
"[email protected] ... [email protected]" as output – nickjb
Oct 6 '11 at 15:33
Nick , Oct
28, 2011 at 14:36
This worked REALLY well for me... I used it to iterate over an array of strings which
contained comma separated DB,SERVER,PORT data to use mysqldump. – Nick
Oct 28 '11 at 14:36
dubiousjim , May 31, 2012 at
5:28
Diagnosis: the IFS=";"
assignment exists only in the $(...; echo
$IN)
subshell; this is why some readers (including me) initially think it won't work.
I assumed that all of $IN was getting slurped up by ADDR1. But nickjb is correct; it does
work. The reason is that echo $IN
command parses its arguments using the current
value of $IFS, but then echoes them to stdout using a space delimiter, regardless of the
setting of $IFS. So the net effect is as though one had called read ADDR1 ADDR2
<<< "[email protected] [email protected]"
(note the input is space-separated not
;-separated). – dubiousjim
May 31 '12 at 5:28
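The middle step of that diagnosis is easy to observe on its own: inside the command substitution, the unquoted $IN is split on ';' and echo re-joins the words with single spaces, and read then splits on the (unchanged) whitespace IFS:

```shell
IN="[email protected];[email protected]"
# IFS=';' applies only inside this subshell; echo word-splits the
# unquoted $IN on ';' and re-joins the words with single spaces.
joined=$(IFS=";"; echo $IN)
echo "$joined"   # [email protected] [email protected]
```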
sorontar , Oct 26, 2016 at 4:43
This fails on spaces and newlines, and also expand wildcards *
in the echo
$IN
with an unquoted variable expansion. – sorontar
Oct 26 '16 at 4:43
gniourf_gniourf , Jun 26,
2014 at 9:11
In Bash, a bullet proof way, that will work even if your variable contains newlines:
IFS=';' read -d '' -ra array < <(printf '%s;\0' "$in")
Look:
$ in=$'one;two three;*;there is\na newline\nin this field'
$ IFS=';' read -d '' -ra array < <(printf '%s;\0' "$in")
$ declare -p array
declare -a array='([0]="one" [1]="two three" [2]="*" [3]="there is
a newline
in this field")'
The trick for this to work is to use the -d
option of read
(delimiter) with an empty delimiter, so that read
is forced to read everything
it's fed. And we feed read
with exactly the content of the variable
in
, with no trailing newline thanks to printf
. Note that we're
also putting the delimiter in printf
to ensure that the string passed to
read
has a trailing delimiter. Without it, read
would trim
potential trailing empty fields:
$ in='one;two;three;' # there's an empty field
$ IFS=';' read -d '' -ra array < <(printf '%s;\0' "$in")
$ declare -p array
declare -a array='([0]="one" [1]="two" [2]="three" [3]="")'
the trailing empty field is preserved.
Update for Bash≥4.4
Since Bash 4.4, the builtin mapfile
(aka readarray
) supports
the -d
option to specify a delimiter. Hence another canonical way is:
mapfile -d ';' -t array < <(printf '%s;' "$in")
John_West , Jan 8, 2016 at
12:10
I found it to be the rare solution on that list that works correctly with \n
,
spaces and *
simultaneously. Also, no loops; array variable is accessible in the
shell after execution (contrary to the highest upvoted answer). Note, in=$'...'
, it does not work with double quotes. I think, it needs more upvotes. – John_West
Jan 8 '16 at 12:10
Darron ,
Sep 13, 2010 at 20:10
How about this one liner, if you're not using arrays:
IFS=';' read ADDR1 ADDR2 <<<$IN
dubiousjim , May 31, 2012 at
5:36
Consider using read -r ...
to ensure that, for example, the two characters "\t"
in the input end up as the same two characters in your variables (instead of a single tab
char). – dubiousjim
May 31 '12 at 5:36
Luca
Borrione , Sep 3, 2012 at 10:07
-1 This is not working here (ubuntu 12.04). Adding echo "ADDR1 $ADDR1"\n echo "ADDR2
$ADDR2"
to your snippet will output ADDR1 [email protected]
[email protected]\nADDR2
(\n is newline) – Luca Borrione
Sep 3 '12 at 10:07
chepner
, Sep 19, 2015 at 13:59
This is probably due to a bug involving IFS
and here strings that was fixed in
bash
4.3. Quoting $IN
should fix it. (In theory, $IN
is not subject to word splitting or globbing after it expands, meaning the quotes should be
unnecessary. Even in 4.3, though, there's at least one bug remaining--reported and scheduled
to be fixed--so quoting remains a good idea.) – chepner
Sep 19 '15 at 13:59
sorontar , Oct 26, 2016 at 4:55
This breaks if $IN contains newlines even if $IN is quoted. And adds a trailing newline.
– sorontar
Oct 26 '16 at 4:55
kenorb ,
Sep 11, 2015 at 20:54
Here is a clean 3-liner:
in="foo@bar;bizz@buzz;fizz@buzz;buzz@woof"
IFS=';' list=($in)
for item in "${list[@]}"; do echo $item; done
where IFS
delimit words based on the separator and ()
is used to
create an array
. Then [@]
is used to return each item as a separate word.
If you've any code after that, you also need to restore $IFS
, e.g.
unset IFS
.
sorontar , Oct 26, 2016 at 5:03
The use of $in
unquoted allows wildcards to be expanded. – sorontar
Oct 26 '16 at 5:03
user2720864 , Sep 24 at 13:46
+ for the unset command – user2720864
Sep 24 at 13:46
Emilien
Brigand , Aug 1, 2016 at 13:15
Without setting the IFS
If you just have one colon you can do that:
a="foo:bar"
b=${a%:*}
c=${a##*:}
you will get:
b = foo
c = bar
Victor
Choy , Sep 16, 2015 at 3:34
There is a simple and smart way like this:
echo "add:sfff" | xargs -d: -i echo {}
But you must use GNU xargs; BSD xargs doesn't support -d delim. If you use an Apple Mac
like me, you can install GNU xargs:
brew install findutils
then
echo "add:sfff" | gxargs -d: -i echo {}
Halle
Knast , May 24, 2017 at 8:42
The following Bash/zsh function splits its first argument on the delimiter given by the
second argument:
split() {
local string="$1"
local delimiter="$2"
if [ -n "$string" ]; then
local part
while read -d "$delimiter" part; do
echo "$part"
done <<< "$string"
echo "$part"
fi
}
For instance, the command
$ split 'a;b;c' ';'
yields
a
b
c
This output may, for instance, be piped to other commands. Example:
$ split 'a;b;c' ';' | cat -n
1 a
2 b
3 c
Compared to the other solutions given, this one has the following advantages:
- IFS is not overridden: due to dynamic scoping of even local variables,
overriding IFS over a loop causes the new value to leak into function calls
performed from within the loop.
- Arrays are not used: Reading a string into an array using
read
requires
the flag -a
in Bash and -A
in zsh.
If desired, the function may be put into a script as follows:
#!/usr/bin/env bash
split() {
# ...
}
split "$@"
sandeepkunkunuru , Oct 23,
2017 at 16:10
works and neatly modularized. – sandeepkunkunuru
Oct 23 '17 at 16:10
Prospero , Sep 25, 2011 at 1:09
This is the simplest way to do it.
spo='one;two;three'
OIFS=$IFS
IFS=';'
spo_array=($spo)
IFS=$OIFS
echo ${spo_array[*]}
rashok ,
Oct 25, 2016 at 12:41
IN="[email protected];[email protected]"
IFS=';'
read -a IN_arr <<< "${IN}"
for entry in "${IN_arr[@]}"
do
echo $entry
done
Output
[email protected]
[email protected]
System : Ubuntu 12.04.1
codeforester , Jan 2, 2017 at
5:37
IFS is not getting set in the specific context of read here and hence it can
upset the rest of the code, if any. – codeforester
Jan 2 '17 at 5:37
shuaihanhungry , Jan 20 at
15:54
you can apply awk to many situations
echo "[email protected];[email protected]"|awk -F';' '{printf "%s\n%s\n", $1, $2}'
also you can use this
echo "[email protected];[email protected]"|awk -F';' '{print $1,$2}' OFS="\n"
ghost ,
Apr 24, 2013 at 13:13
If there are no spaces, why not this?
IN="[email protected];[email protected]"
arr=(`echo $IN | tr ';' ' '`)
echo ${arr[0]}
echo ${arr[1]}
eukras ,
Oct 22, 2012 at 7:10
There are some cool answers here (errator esp.), but for something analogous to split in
other languages -- which is what I took the original question to mean -- I settled on this:
IN="[email protected];[email protected]"
declare -a a="(${IN/;/ })";
Now ${a[0]}
, ${a[1]}
, etc, are as you would expect. Use
${#a[*]}
for number of terms. Or to iterate, of course:
for i in ${a[*]}; do echo $i; done
IMPORTANT NOTE:
This works in cases where there are no spaces to worry about, which solved my problem, but
may not solve yours. Go with the $IFS
solution(s) in that case.
olibre ,
Oct 7, 2013 at 13:33
Does not work when IN
contains more than two e-mail addresses. Please refer to
same idea (but fixed) at palindrom's answer – olibre
Oct 7 '13 at 13:33
sorontar , Oct 26, 2016 at 5:14
Better use ${IN//;/ }
(double slash) to make it also work with more than two
values. Beware that any wildcard ( *?[
) will be expanded. And a trailing empty
field will be discarded. – sorontar
Oct 26 '16 at 5:14
jeberle
, Apr 30, 2013 at 3:10
Use the set
built-in to load up the $@
array:
IN="[email protected];[email protected]"
IFS=';'; set $IN; IFS=$' \t\n'
Then, let the party begin:
echo $#
for a; do echo $a; done
ADDR1=$1 ADDR2=$2
sorontar , Oct 26, 2016 at 5:17
Better use set -- $IN
to avoid some issues with "$IN" starting with dash. Still,
the unquoted expansion of $IN
will expand wildcards ( *?[
).
– sorontar
Oct 26 '16 at 5:17
NevilleDNZ , Sep 2, 2013 at 6:30
Two bourne-ish alternatives where neither require bash arrays:
Case 1 : Keep it nice and simple: Use a NewLine as the Record-Separator... eg.
IN="[email protected]
[email protected]"
while read i; do
# process "$i" ... eg.
echo "[email:$i]"
done <<< "$IN"
Note: in this first case no sub-process is forked to assist with list manipulation.
Idea: Maybe it is worth using NL extensively internally , and only converting to
a different RS when generating the final result externally .
Case 2 : Using a ";" as a record separator... eg.
NL="
" IRS=";" ORS=";"
conv_IRS() {
exec tr "$1" "$NL"
}
conv_ORS() {
exec tr "$NL" "$1"
}
IN="[email protected];[email protected]"
IN="$(conv_IRS ";" <<< "$IN")"
while read i; do
# process "$i" ... eg.
echo -n "[email:$i]$ORS"
done <<< "$IN"
In both cases a sub-list can be composed within the loop and is persistent after the loop has
completed. This is useful when manipulating lists in memory, instead of storing lists in files.
{p.s. keep calm and carry on B-) }
fedorqui , Jan 8, 2015 at 10:21
Apart from the fantastic answers that were already provided, if it is just a matter of
printing out the data you may consider using awk
:
awk -F";" '{for (i=1;i<=NF;i++) printf("> [%s]\n", $i)}' <<< "$IN"
This sets the field separator to ;
, so that it can loop through the fields
with a for
loop and print accordingly.
Test
$ IN="[email protected];[email protected]"
$ awk -F";" '{for (i=1;i<=NF;i++) printf("> [%s]\n", $i)}' <<< "$IN"
> [[email protected]]
> [[email protected]]
With another input:
$ awk -F";" '{for (i=1;i<=NF;i++) printf("> [%s]\n", $i)}' <<< "a;b;c d;e_;f"
> [a]
> [b]
> [c d]
> [e_]
> [f]
18446744073709551615 ,
Feb 20, 2015 at 10:49
In Android shell, most of the proposed methods just do not work:
$ IFS=':' read -ra ADDR <<<"$PATH"
/system/bin/sh: can't create temporary file /sqlite_stmt_journals/mksh.EbNoR10629: No such file or directory
What does work is:
$ for i in ${PATH//:/ }; do echo $i; done
/sbin
/vendor/bin
/system/sbin
/system/bin
/system/xbin
where //
means global replacement.
sorontar , Oct 26, 2016 at 5:08
Fails if any part of $PATH contains spaces (or newlines). Also expands wildcards (asterisk *,
question mark ? and braces [ ]). – sorontar
Oct 26 '16 at 5:08
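In shells where read -ra and here-strings are available, a variant that survives spaces in the components (the sample value is contrived):

```shell
p='/usr/local/my tools/bin:/usr/bin'
IFS=':' read -ra parts <<< "$p"   # the IFS change applies to this read only
for part in "${parts[@]}"; do
    printf '[%s]\n' "$part"
done
# [/usr/local/my tools/bin]
# [/usr/bin]
```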
Eduardo
Lucio , Apr 4, 2016 at 19:54
Okay guys!
Here's my answer!
DELIMITER_VAL='='
read -d '' F_ABOUT_DISTRO_R <<"EOF"
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.4 LTS"
NAME="Ubuntu"
VERSION="14.04.4 LTS, Trusty Tahr"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 14.04.4 LTS"
VERSION_ID="14.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
EOF
SPLIT_NOW=$(awk -F$DELIMITER_VAL '{for(i=1;i<=NF;i++){printf "%s\n", $i}}' <<<"${F_ABOUT_DISTRO_R}")
while read -r line; do
SPLIT+=("$line")
done <<< "$SPLIT_NOW"
for i in "${SPLIT[@]}"; do
echo "$i"
done
Why is this approach "the best" for me?
For two reasons:
- You do not need to escape the delimiter;
- You will not have problems with blank spaces. The value will be properly separated in
the array!
[]'s
gniourf_gniourf , Jan 30,
2017 at 8:26
FYI, /etc/os-release
and /etc/lsb-release
are meant to be sourced,
and not parsed. So your method is really wrong. Moreover, you're not quite answering the
question about splitting a string on a delimiter. – gniourf_gniourf
Jan 30 '17 at 8:26
Michael
Hale , Jun 14, 2012 at 17:38
A one-liner to split a string separated by ';' into an array is:
IN="[email protected];[email protected]"
ADDRS=( $(IFS=";" echo "$IN") )
echo ${ADDRS[0]}
echo ${ADDRS[1]}
This only sets IFS in a subshell, so you don't have to worry about saving and restoring
its value.
Luca
Borrione , Sep 3, 2012 at 10:04
-1 this doesn't work here (ubuntu 12.04). it prints only the first echo with all $IN value in
it, while the second is empty. you can see it if you put echo "0: "${ADDRS[0]}\n echo "1:
"${ADDRS[1]} the output is 0: [email protected];[email protected]\n 1:
(\n is new line)
– Luca
Borrione
Sep 3 '12 at 10:04
Luca
Borrione , Sep 3, 2012 at 10:05
please refer to nickjb's answer at for a working alternative to this idea
stackoverflow.com/a/6583589/1032370 – Luca Borrione
Sep 3 '12 at 10:05
Score_Under , Apr 28, 2015 at
17:09
-1, 1. IFS isn't being set in that subshell (it's being passed to the environment of "echo",
which is a builtin, so nothing is happening anyway). 2. $IN
is quoted so it
isn't subject to IFS splitting. 3. The process substitution is split by whitespace, but this
may corrupt the original data. – Score_Under
Apr 28 '15 at 17:09
ajaaskel , Oct 10, 2014 at 11:33
IN='[email protected];[email protected];Charlie Brown <[email protected];!"#$%&/()[]{}*? are no problem;simple is beautiful :-)'
set -f
oldifs="$IFS"
IFS=';'; arrayIN=($IN)
IFS="$oldifs"
for i in "${arrayIN[@]}"; do
echo "$i"
done
set +f
Output:
[email protected]
[email protected]
Charlie Brown <[email protected]
!"#$%&/()[]{}*? are no problem
simple is beautiful :-)
Explanation: Simple assignment using parenthesis () converts semicolon separated list into
an array provided you have correct IFS while doing that. Standard FOR loop handles individual
items in that array as usual. Notice that the list given for IN variable must be "hard"
quoted, that is, with single ticks.
IFS must be saved and restored since Bash does not treat an assignment the same way as a
command. An alternate workaround is to wrap the assignment inside a function and call that
function with a modified IFS. In that case separate saving/restoring of IFS is not needed.
Thanks for "Bize" for pointing that out.
gniourf_gniourf , Feb 20,
2015 at 16:45
!"#$%&/()[]{}*? are no problem
well... not quite: []*?
are glob
characters. So what about creating this directory and file: `mkdir '!"#$%&'; touch
'!"#$%&/()[]{} got you hahahaha - are no problem' and running your command? simple may be
beautiful, but when it's broken, it's broken. – gniourf_gniourf
Feb 20 '15 at 16:45
ajaaskel , Feb 25, 2015 at 7:20
@gniourf_gniourf The string is stored in a variable. Please see the original question.
– ajaaskel
Feb 25 '15 at 7:20
gniourf_gniourf , Feb 25,
2015 at 7:26
@ajaaskel you didn't fully understand my comment. Go in a scratch directory and issue these
commands: mkdir '!"#$%&'; touch '!"#$%&/()[]{} got you hahahaha - are no
problem'
. They will only create a directory and a file, with weird looking names, I
must admit. Then run your commands with the exact IN
you gave:
IN='[email protected];[email protected];Charlie Brown <[email protected];!"#$%&/()[]{}*?
are no problem;simple is beautiful :-)'
. You'll see that you won't get the output you
expect. Because you're using a method subject to pathname expansions to split your string.
– gniourf_gniourf
Feb 25 '15 at 7:26
gniourf_gniourf , Feb 25,
2015 at 7:29
This is to demonstrate that the characters *
, ?
,
[...]
and even, if extglob
is set, !(...)
,
@(...)
, ?(...)
, +(...)
are problems with this
method! – gniourf_gniourf
Feb 25 '15 at 7:29
ajaaskel , Feb 26, 2015 at 15:26
@gniourf_gniourf Thanks for detailed comments on globbing. I adjusted the code to have
globbing off. My point was however just to show that rather simple assignment can do the
splitting job. – ajaaskel
Feb 26 '15 at 15:26
> , Dec 19, 2013 at 21:39
Maybe not the most elegant solution, but works with *
and spaces:
IN="bla@so me.com;*;[email protected]"
for i in `delims=${IN//[^;]}; seq 1 $((${#delims} + 1))`
do
echo "> [`echo $IN | cut -d';' -f$i`]"
done
Outputs
> [bla@so me.com]
> [*]
> [[email protected]]
Other example (delimiters at beginning and end):
IN=";bla@so me.com;*;[email protected];"
> []
> [bla@so me.com]
> [*]
> [[email protected]]
> []
Basically it removes every character other than ;
making delims
eg. ;;;
. Then it does for
loop from 1
to
number-of-delimiters
as counted by ${#delims}
. The final step is
to safely get the $i
th part using cut
.
Notable quotes:
"... Backquotes ( ` ` ) are old-style form of command substitution, with some differences: in this form, backslash retains its literal meaning except when followed by $ , ` , or \ , and the first backquote not preceded by a backslash terminates the command substitution; whereas in the $( ) form, all characters between the parentheses make up the command, none are treated specially. ..."
"... Double square brackets delimit a Conditional Expression. And, I find the following to be a good reading on the subject: "(IBM) Demystify test, [, [[, ((, and if-then-else" ..."
Amit , Jun 7,
2011 at 19:18
I have a couple of variables and I want to check the following condition (written out in words,
then my failed attempt at bash scripting):
if varA EQUALS 1 AND ( varB EQUALS "t1" OR varB EQUALS "t2" ) then
do something
done.
And in my failed attempt, I came up with:
if (($varA == 1)) && ( (($varB == "t1")) || (($varC == "t2")) );
then
scale=0.05
fi
Best answer Gilles
What you've written actually almost works (it would work if all the variables were numbers), but
it's not an idiomatic way at all.
( )
parentheses indicate a
subshell . What's inside them isn't an expression like in many other languages. It's a
list of commands (just like outside parentheses). These commands are executed in a separate
subprocess, so any redirection, assignment, etc. performed inside the parentheses has no effect
outside the parentheses.
- With a leading dollar sign,
$( )
is a
command substitution : there is a command inside the parentheses, and the output from
the command is used as part of the command line (after extra expansions unless the substitution
is between double quotes, but that's
another story ).
{ }
braces are like parentheses in that they group commands, but they only
influence parsing and do not create a subshell. The program x=2; { x=4; }; echo $x
prints 4,
whereas x=2; (x=4); echo $x
prints 2. (Also braces require spaces around them
and a semicolon before closing, whereas parentheses don't. That's just a syntax quirk.)
- With a leading dollar sign,
${VAR}
is a
parameter expansion , expanding to the value of a variable, with possible extra transformations.
(( ))
double parentheses surround an
arithmetic instruction , that is, a computation on integers, with a syntax resembling other
programming languages. This syntax is mostly used for assignments and in conditionals.
- The same syntax is used in arithmetic expressions
$(( ))
, which expand
to the integer value of the expression.
[[ ]]
double brackets surround
conditional expressions . Conditional expressions are mostly built on
operators such as -n $variable
to test if a variable is non-empty and -e
$file
to test if a file exists. There are also string equality operators: "$string1"
= "$string2"
(beware that the right-hand side is a pattern, e.g. [[ $foo = a*
]]
tests if $foo
starts with a
while [[ $foo = "a*"
]]
tests if $foo
is exactly a*
), and the familiar !
, &&
and ||
operators for negation, conjunction and disjunction as
well as parentheses for grouping.
- Note that you need a space around each operator (e.g.
[[ "$x" = "$y" ]]
, not [[ "$x"="$y" ]]
), and a space or a character like ;
both inside and outside the brackets (e.g. [[ -n $foo ]]
, not [[-n
$foo]]
).
[ ]
single brackets are an alternate form of conditional expressions with
more quirks (but older and more portable). Don't write any for now; start worrying about them
when you find scripts that contain them.
This is the idiomatic way to write your test in bash:
if [[ $varA = 1 && ($varB = "t1" || $varC = "t2") ]]; then
If you need portability to other shells, this would be the way (note the additional quoting
and the separate sets of brackets around each individual test):
if [ "$varA" = 1 ] && { [ "$varB" = "t1" ] || [ "$varC" = "t2" ]; }; then
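The pattern-matching behaviour of = inside [[ ]] described above can be checked directly:

```shell
foo="abc"
[[ $foo = a* ]]   && echo "unquoted a* is a pattern: matches"
[[ $foo = "a*" ]] || echo "quoted a* is a literal string: no match"
```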
Will Sheppard
, Jun 19, 2014 at 11:07
It's better to use ==
to differentiate the comparison from assigning a variable (which
is also =
) –
Will Sheppard
Jun 19 '14 at 11:07
Cbhihe , Apr
3, 2016 at 8:05
+1 @WillSheppard for yr reminder of proper style. Gilles, don't you need a semicolon after yr
closing curly bracket and before "then" ? I always thought if
, then
, else
and fi
could not be on the same line... As in:
if [ "$varA" = 1 ] && { [ "$varB" = "t1" ] || [ "$varC" = "t2" ]; }; then
– Cbhihe
Apr 3 '16 at 8:05
Rockallite
, Jan 19 at 2:41
Backquotes ( ` `
) are old-style form of command substitution, with some differences:
in this form, backslash retains its literal meaning except when followed by $
,
`
, or \
, and the first backquote not preceded by a backslash terminates
the command substitution; whereas in the $( )
form, all characters between the parentheses
make up the command, none are treated specially. –
Rockallite
Jan 19 at 2:41
Peter A.
Schneider , Aug 28 at 13:16
You could emphasize that single brackets have completely different semantics inside and outside
of double brackets. (Because you start with explicitly pointing out the subshell semantics but
then only as an aside mention the grouping semantics as part of conditional expressions. Was confusing
to me for a second when I looked at your idiomatic example.) –
Peter A. Schneider
Aug 28 at 13:16
matchew ,
Jun 7, 2011 at 19:29
very close
if (( $varA == 1 )) && [[ $varB == 't1' || $varC == 't2' ]];
then
scale=0.05
fi
should work.
breaking it down
(( $varA == 1 ))
is an integer comparison whereas
$varB == 't1'
is a string comparison. otherwise, I am just grouping the comparisons correctly.
Double square brackets delimit a Conditional Expression. And, I find the following to be
a good reading on the subject:
"(IBM)
Demystify test, [, [[, ((, and if-then-else"
Peter A.
Schneider , Aug 28 at 13:21
Just to be sure: The quoting in 't1' is unnecessary, right? Because as opposed to arithmetic instructions
in double parentheses, where t1 would be a variable, t1 in a conditional expression in double
brackets is just a literal string.
I.e., [[ $varB == 't1' ]]
is exactly the same as [[ $varB == t1 ]]
, right? –
Peter A. Schneider
Aug 28 at 13:21
February 20, 2007 | www.ibm.com
This content is part of the series: Linux tip
The Bash shell is available on many Linux® and UNIX® systems today, and is a common default
shell on Linux. Bash includes powerful programming capabilities, including extensive functions
for testing file types and attributes, as well as the arithmetic and string comparisons available
in most programming languages. Understanding the various tests and knowing that the shell can
also interpret some operators as shell metacharacters is an important step to becoming a power
shell user. This article, excerpted from the developerWorks tutorial
LPI exam 102 prep: Shells, scripting, programming, and compiling, shows you how to understand
and use the test and comparison operations of the Bash shell.
This tip explains the shell test and comparison functions and shows you how to add programming
capability to the shell. You may have already seen simple shell logic using the && and ||
operators, which allow you to execute a command based on whether the previous command exits
normally or with an error. In this tip, you will see how to extend these basic techniques to more
complex shell programming.
Tests
In any programming language, after you learn how to assign values to variables and pass
parameters, you need to test those values and parameters. In shells, the tests set the return
status, which is the same thing that other commands do. In fact, test
is a builtin
command!
test and [
The test
builtin command returns 0 (True) or 1 (False), depending on the
evaluation of an expression, expr. You can also use square brackets: test expr
and [ expr ] are equivalent. You can examine the return value by displaying $?
;
you can use the return value with && and ||; or you can test it using the various conditional
constructs that are covered later in this tip.
Listing 1. Some simple tests
[ian@pinguino ~]$ test 3 -gt 4 && echo True || echo false
false
[ian@pinguino ~]$ [ "abc" != "def" ]; echo $?
0
[ian@pinguino ~]$ test -d "$HOME"; echo $?
0
In the first example in Listing 1, the -gt
operator performs an arithmetic
comparison between two literal values. In the second example, the alternate [ ]
form
compares two strings for inequality. In the final example, the value of the HOME variable is
tested to see if it is a directory using the -d
unary operator.
You can compare arithmetic values using one of -eq
, -ne
, -lt
,
-le
, -gt
, or -ge
, meaning equal, not equal, less than,
less than or equal, greater than, and greater than or equal, respectively.
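For example:

```shell
test 3 -lt 10 && echo "3 is less than 10"
[ "$((6*7))" -eq 42 ] && echo "an arithmetic result compares equal"
[ 10 -le 9 ] || echo "10 is not less than or equal to 9"
```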
You can compare strings for equality, inequality, or whether the first string sorts before or
after the second one using the operators =
, !=
, <
, and
>
, respectively. The unary operator -z
tests for a null string, while
-n
or no operator at all returns True if a string is not empty.
Note: the <
and >
operators are also used by the
shell for redirection, so you must escape them using \<
or \>
. Listing
2 shows more examples of string tests. Check that they are as you expect.
Listing 2. Some string tests
[ian@pinguino ~]$ test "abc" = "def"; echo $?
1
[ian@pinguino ~]$ [ "abc" != "def" ]; echo $?
0
[ian@pinguino ~]$ [ "abc" \< "def" ]; echo $?
0
[ian@pinguino ~]$ [ "abc" \> "def" ]; echo $?
1
[ian@pinguino ~]$ [ "abc" \< "abc" ]; echo $?
1
[ian@pinguino ~]$ [ "abc" \> "abc" ]; echo $?
1
Some of the more common file tests are shown in Table 1. The result is True if the file tested
is a file that exists and that has the specified characteristic.
Table 1. Some common file tests
Operator   Characteristic
-d         Directory
-e         Exists (also -a)
-f         Regular file
-h         Symbolic link (also -L)
-p         Named pipe
-r         Readable by you
-s         Not empty
-S         Socket
-w         Writable by you
-N         Has been modified since last being read
In addition to the unary tests above, you can compare two files with the binary operators
shown in Table 2.
Table 2. Testing pairs of files
Operator   True if
-nt        file1 is newer than file2 (the modification date is used for this and the next comparison)
-ot        file1 is older than file2
-ef        file1 is a hard link to file2
Several other tests allow you to check things such as the permissions of the file. See the man
pages for bash for more details, or use help test to see brief information on the test builtin.
You can use the help command for other builtins too.
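These unary file tests combine naturally in scripts. Here is a small sketch (the check_file helper name is an invention for illustration, not from the article) that reports which common tests succeed for a path:

```shell
#!/bin/bash
# check_file: report which common file tests succeed for a path.
# Hypothetical helper for illustration only.
check_file () {
    local target="$1"
    if [ ! -e "$target" ]; then
        echo "$target does not exist"
        return 1
    fi
    [ -f "$target" ] && echo "regular file"
    [ -d "$target" ] && echo "directory"
    [ -r "$target" ] && echo "readable"
    [ -w "$target" ] && echo "writable"
    return 0
}

demo=$(mktemp)      # a fresh temp file: regular, readable, writable
check_file "$demo"
rm -f "$demo"
```

Quoting "$target" matters here: an unquoted variable that is empty or contains spaces would change the number of arguments that [ receives and break the test.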
The -o operator allows you to test various shell options that may be set using set -o option,
returning True (0) if the option is set and False (1) otherwise, as shown in Listing 3.
Listing 3. Testing shell options
[ian@pinguino ~]$ set +o nounset
[ian@pinguino ~]$ [ -o nounset ];echo $?
1
[ian@pinguino ~]$ set -u
[ian@pinguino ~]$ test -o nounset; echo $?
0
Finally, the -a and -o options allow you to combine expressions with logical AND and OR,
respectively, while the unary ! operator inverts the sense of the test. You may use parentheses
to group expressions and override the default precedence. Remember that the shell will normally
run an expression between parentheses in a subshell, so you will need to escape the parentheses
using \( and \) or by enclosing them in single or double quotes. Listing 4 illustrates the
application of de Morgan's laws to an expression.
Listing 4. Combining and grouping tests
[ian@pinguino ~]$ test "a" != "$HOME" -a 3 -ge 4 ; echo $?
1
[ian@pinguino ~]$ [ ! \( "a" = "$HOME" -o 3 -lt 4 \) ]; echo $?
1
[ian@pinguino ~]$ [ ! \( "a" = "$HOME" -o '(' 3 -lt 4 ')' ")" ]; echo $?
1
(( and [[
The test command is very powerful, but somewhat unwieldy given its requirement for escaping
and given the difference between string and arithmetic comparisons. Fortunately, bash has two
other ways of testing that are somewhat more natural for people who are familiar with C, C++,
or Java® syntax.
The (( )) compound command evaluates an arithmetic expression and sets the exit status to 1
if the expression evaluates to 0, or to 0 if the expression evaluates to a non-zero value.
You do not need to escape operators between (( and )). Arithmetic is done on integers.
Division by 0 causes an error, but overflow does not. You may perform the usual C language
arithmetic, logical, and bitwise operations. The let command can also execute one or more
arithmetic expressions. It is usually used to assign values to arithmetic variables.
Listing 5. Assigning and testing arithmetic expressions
[ian@pinguino ~]$ let x=2 y=2**3 z=y*3;echo $? $x $y $z
0 2 8 24
[ian@pinguino ~]$ (( w=(y/x) + ( (~ ++x) & 0x0f ) )); echo $? $x $y $w
0 3 8 16
[ian@pinguino ~]$ (( w=(y/x) + ( (~ ++x) & 0x0f ) )); echo $? $x $y $w
0 4 8 13
As with (( )), the [[ ]] compound command allows you to use more natural syntax for filename
and string tests. You can combine tests that are allowed for the test command using parentheses
and logical operators.
Listing 6. Using the [[ compound
[ian@pinguino ~]$ [[ ( -d "$HOME" ) && ( -w "$HOME" ) ]] &&
> echo "home is a writable directory"
home is a writable directory
The [[ compound can also do pattern matching on strings when the = or != operators are used.
The match behaves as for wildcard globbing, as illustrated in Listing 7.
Listing 7. Wildcard tests with [[
[ian@pinguino ~]$ [[ "abc def .d,x--" == a[abc]*\ ?d* ]]; echo $?
0
[ian@pinguino ~]$ [[ "abc def c" == a[abc]*\ ?d* ]]; echo $?
1
[ian@pinguino ~]$ [[ "abc def d,x" == a[abc]*\ ?d* ]]; echo $?
1
You can even do arithmetic tests within [[ compounds, but be careful. Unless within a ((
compound, the < and > operators will compare the operands as strings and test their order in
the current collating sequence. Listing 8 illustrates this with some examples.
Listing 8. Including arithmetic tests with [[
[ian@pinguino ~]$ [[ "abc def d,x" == a[abc]*\ ?d* || (( 3 > 2 )) ]]; echo $?
0
[ian@pinguino ~]$ [[ "abc def d,x" == a[abc]*\ ?d* || 3 -gt 2 ]]; echo $?
0
[ian@pinguino ~]$ [[ "abc def d,x" == a[abc]*\ ?d* || 3 > 2 ]]; echo $?
0
[ian@pinguino ~]$ [[ "abc def d,x" == a[abc]*\ ?d* || a > 2 ]]; echo $?
0
[ian@pinguino ~]$ [[ "abc def d,x" == a[abc]*\ ?d* || a -gt 2 ]]; echo $?
-bash: a: unbound variable
Conditionals
While you could accomplish a huge amount of programming with the above tests and the && and ||
control operators, bash includes the more familiar "if, then, else" and case constructs. After
you learn about these, you will learn about looping constructs and your toolbox will really
expand.
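To see the relationship between the two styles, here is a minimal sketch (the count variable is illustrative) contrasting the && / || idiom with an equivalent if/then/else:

```shell
#!/bin/bash
count=7

# Terse style: note that the || branch would also run if the echo
# after && itself failed, so this is not a true two-way branch.
[ "$count" -gt 5 ] && echo "big" || echo "small"

# Explicit style: the branches are unambiguous.
if [ "$count" -gt 5 ]; then
    echo "big"
else
    echo "small"
fi
```

For a simple echo the two behave identically; the if form is preferable as soon as the "then" branch can itself fail.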
If, then, else statements
The bash if command is a compound command that tests the return value of a test or command
($?) and branches based on whether it is True (0) or False (not 0). Although the tests above
returned only 0 or 1 values, commands may return other values. Learn more about these in the
LPI exam 102 prep: Shells, scripting, programming, and compiling tutorial.
The if command in bash has a then clause containing a list of commands to be executed if the
test or command returns 0; one or more optional elif clauses, each with an additional test and
then clause with an associated list of commands; an optional final else clause with a list of
commands to be executed if neither the original test nor any of the tests used in the elif
clauses was true; and a terminal fi to mark the end of the construct.
Using what you have learned so far, you could now build a simple calculator to evaluate
arithmetic expressions as shown in Listing 9.
Listing 9. Evaluating expressions with if, then, else
[ian@pinguino ~]$ function mycalc ()
> {
> local x
> if [ $# -lt 1 ]; then
> echo "This function evaluates arithmetic for you if you give it some"
> elif (( $* )); then
> let x="$*"
> echo "$* = $x"
> else
> echo "$* = 0 or is not an arithmetic expression"
> fi
> }
[ian@pinguino ~]$ mycalc 3 + 4
3 + 4 = 7
[ian@pinguino ~]$ mycalc 3 + 4**3
3 + 4**3 = 67
[ian@pinguino ~]$ mycalc 3 + (4**3 /2)
-bash: syntax error near unexpected token `('
[ian@pinguino ~]$ mycalc 3 + "(4**3 /2)"
3 + (4**3 /2) = 35
[ian@pinguino ~]$ mycalc xyz
xyz = 0 or is not an arithmetic expression
[ian@pinguino ~]$ mycalc xyz + 3 + "(4**3 /2)" + abc
xyz + 3 + (4**3 /2) + abc = 35
The calculator makes use of the local statement to declare x as a local variable that is
available only within the scope of the mycalc function. The let command has several possible
options, as does the declare command to which it is closely related. Check the man pages for
bash, or use help let for more information.
As you saw in Listing 9, you need to make sure that your expressions are properly escaped if
they use shell metacharacters such as (, ), *, >, and <. Nevertheless, you have quite a handy
little calculator for evaluating arithmetic as the shell does it.
You may have noticed the else clause and the last two examples in Listing 9. As you can see,
it is not an error to pass xyz to mycalc, but it evaluates to 0. This function is not smart
enough to identify the character values in the final example of use and thus be able to warn
the user. You could use a string pattern matching test such as
[[ ! ( "$*" == *[a-zA-Z]* ) ]]
(or the appropriate form for your locale) to eliminate any expression containing alphabetic
characters, but that would prevent using hexadecimal notation in your input, since you might
use 0x0f to represent 15. In fact, the shell allows bases up to 64 (using base#value notation),
so you could legitimately use any alphabetic character, plus _ and @, in your input. Octal and
hexadecimal use the usual notation of a leading 0 for octal and a leading 0x or 0X for
hexadecimal. Listing 10 shows some examples.
Listing 10. Calculating with different bases
[ian@pinguino ~]$ mycalc 015
015 = 13
[ian@pinguino ~]$ mycalc 0xff
0xff = 255
[ian@pinguino ~]$ mycalc 29#37
29#37 = 94
[ian@pinguino ~]$ mycalc 64#1az
64#1az = 4771
[ian@pinguino ~]$ mycalc 64#1azA
64#1azA = 305380
[ian@pinguino ~]$ mycalc 64#1azA_@
64#1azA_@ = 1250840574
[ian@pinguino ~]$ mycalc 64#1az*64**3 + 64#A_@
64#1az*64**3 + 64#A_@ = 1250840574
Additional laundering of the input is beyond the scope of this tip, so use your calculator
with care.
The elif statement is very convenient. It helps you in writing scripts by allowing you to
simplify the indenting. You may be surprised to see the output of the type command for the
mycalc function, as shown in Listing 11.
Listing 11. Type mycalc
[ian@pinguino ~]$ type mycalc
mycalc is a function
mycalc ()
{
    local x;
    if [ $# -lt 1 ]; then
        echo "This function evaluates arithmetic for you if you give it some";
    else
        if (( $* )); then
            let x="$*";
            echo "$* = $x";
        else
            echo "$* = 0 or is not an arithmetic expression";
        fi;
    fi
}
Of course, you could just do shell arithmetic by using $(( expression )) with the echo command,
as shown in Listing 12. You wouldn't have learned anything about functions or tests that way,
but do note that the shell does not interpret metacharacters, such as *, in their normal role
when inside (( expression )) or [[ expression ]].
Listing 12. Direct calculation in the shell with echo and $(( ))
[ian@pinguino ~]$ echo $((3 + (4**3 /2)))
35
Learn more
If you'd like to know more about Bash scripting in Linux, read the tutorial "LPI
exam 102 prep: Shells, scripting, programming, and compiling," from which this article was
excerpted.
Related topics
Reading the ReaR sources is an interesting exercise. They demonstrate an attempt to use a
"reasonable" style of shell programming, and you can learn a lot from them.
Relax-and-Recover is written in
Bash (at least bash version
3 is needed), a language that can be used in many styles. We want to make it easier for everybody
to understand the Relax-and-Recover code and subsequently to contribute fixes and enhancements.
Here is a collection of coding hints that should help to get a more consistent code base.
Don't be afraid to contribute to Relax-and-Recover even if your contribution does not fully
match all these coding hints. Currently large parts of the Relax-and-Recover code are not yet
in compliance with them; bringing them into line is an ongoing, step-by-step process.
Nevertheless, try to understand the idea behind these coding hints so that you know how to
break them properly (i.e. "learn the rules so you know how to break them properly").
The overall idea behind these coding hints is:
Make yourself understood
Make yourself understood to enable others to fix and enhance your code properly as needed.
From this overall idea the following coding hints are derived.
For the fun of it, here is an extreme example of the coding style that should be avoided:
#!/bin/bash
for i in `seq 1 2 $((2*$1-1))`;do echo $((j+=i));done
Try to find out what that code is about - it does a useful thing.
Code must be easy to read
Code should be easy to understand
Do not only tell what the code does (i.e. the implementation details) but also explain the
intent behind it (i.e. why) to make the code maintainable.
- Provide meaningful comments that tell what the computer should do and also explain why it
should do it, so that others understand the intent and can properly fix issues or adapt and
enhance the code as needed.
- If there is a GitHub issue or another URL available for a particular piece of code, provide
a comment with that GitHub issue or URL that tells about the reasoning behind the current
implementation details.
Here is the initial example again, commented so that one can understand what it is about:
#!/bin/bash
# output the first N square numbers
# by summing up the first N odd numbers 1 3 ... 2*N-1
# where each nth partial sum is the nth square number
# see https://en.wikipedia.org/wiki/Square_number#Properties
# this way it is a little bit faster for big N compared to
# calculating each square number on its own via multiplication
N=$1
if ! [[ $N =~ ^[0-9]+$ ]] ; then
    echo "Input must be non-negative integer." 1>&2
    exit 1
fi
square_number=0
for odd_number in $( seq 1 2 $(( 2 * N - 1 )) ) ; do
    (( square_number += odd_number )) && echo $square_number
done
Now the intent is clear, and others can easily decide whether that code is really the best
way to do it and improve it if needed.
Try to care about possible errors
By default bash proceeds with the next command when something failed. Do not let your code blindly
proceed in case of errors because that could make it hard to find the root cause of a failure when
it errors out somewhere later at an unrelated place with a weird error message which could lead to
false fixes that cure only a particular symptom but not the root cause.
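A minimal sketch of that hint (the die helper below is an illustration — ReaR itself provides its own Error function): test each critical step and stop with a message that names what failed:

```shell
#!/bin/bash
# Stop at the first failing step with a message naming it, instead of
# blindly proceeding and failing later at an unrelated place.
die () {
    echo "ERROR: $*" 1>&2
    exit 1
}

workdir=$(mktemp -d) || die "could not create a work directory"
echo "hello" > "$workdir/input" || die "could not write $workdir/input"
tr 'a-z' 'A-Z' < "$workdir/input" > "$workdir/output" \
    || die "could not convert $workdir/input"
cat "$workdir/output"
rm -rf "$workdir"
```

The `command || die "message"` pattern keeps the happy path readable while still naming the exact step that failed.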
Maintain Backward Compatibility
Implement adaptations and enhancements in a backward-compatible way so that your changes do
not cause regressions for others.
- The same Relax-and-Recover code must work on various different systems: on older systems as
well as on the newest systems, and on various different Linux distributions.
- Preferably use simple generic functionality that works on any Linux system. Very simple code
is better than oversophisticated (and possibly fragile) constructs. In particular, avoid
special bash version 4 features (Relax-and-Recover code should also work with bash version 3).
- When there are incompatible differences between systems, a distinction of cases with
separated code is needed, because it is more important that the Relax-and-Recover code works
everywhere than having generic code that sometimes fails.
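A sketch of such a distinction of cases, assuming the common /etc/os-release convention (the pick_tool_style helper name is invented for illustration); the fallback branch is exactly what keeps it safe on systems where that assumption does not hold:

```shell
#!/bin/bash
# Branch per distribution family instead of relying on one "generic"
# command that silently fails on some systems. Bash 3 compatible.
pick_tool_style () {
    case "$1" in
        (ubuntu|debian)      echo "deb" ;;
        (rhel|fedora|centos) echo "rpm" ;;
        (*)                  echo "generic" ;;
    esac
}

distro=unknown
# ID is set by /etc/os-release on most current distributions;
# fall back gracefully when the file is absent.
[ -r /etc/os-release ] && distro=$( . /etc/os-release ; echo "${ID:-unknown}" )
echo "detected '$distro' -> $(pick_tool_style "$distro") tooling"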
Dirty hacks welcome
When there are special issues on particular systems, it is more important that the
Relax-and-Recover code works than having nice-looking clean code that sometimes fails. In such
special cases any dirty hacks that intend to make it work everywhere are welcome. But for
dirty hacks the above listed coding hints become mandatory rules:
- Provide explanatory comments that tell what a dirty hack does, together with a GitHub issue
or any other URL that tells about the reasoning behind the dirty hack, to enable others to
properly adapt or clean it up at any time later when the reason for it has changed or gone away.
- Try as well as you can to foresee possible errors or failures of a dirty hack, and error out
with meaningful error messages if things go wrong, to enable others to understand the reason
behind a failure.
- Implement the dirty hack in a way that does not cause regressions for others.
For example a dirty hack like the following is perfectly acceptable:
# FIXME: Dirty hack to make it work
# on "FUBAR Linux version 666"
# where COMMAND sometimes inexplicably fails
# but always works after at most 3 attempts
# see http://example.org/issue12345
# Retries should have no bad effect on other systems
# where the first run of COMMAND works.
COMMAND || COMMAND || COMMAND || Error "COMMAND failed."
Character Encoding
Use only traditional (7-bit) ASCII characters. In particular, do not use UTF-8 encoded
multi-byte characters.
- Non-ASCII characters in scripts may cause arbitrary unexpected failures on systems that do
not support other locales than POSIX/C. During "rear recover" only the POSIX/C locale works (the
ReaR rescue/recovery system has no support for non-ASCII locales) and /usr/sbin/rear sets the
C locale, so non-ASCII characters are invalid in scripts. Keep in mind that basically all
files in ReaR are scripts; e.g. /usr/share/rear/conf/default.conf and /etc/rear/local.conf
are also sourced (and executed) as scripts.
- English documentation texts do not need non-ASCII characters. Using non-ASCII characters in
documentation makes it needlessly hard to display the documentation correctly for any user on
any system. When non-ASCII characters are used but the user does not have exactly the right
matching locale set, arbitrary nonsense can happen; cf.
https://en.opensuse.org/SDB:Plain_Text_versus_Locale
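One way to follow this hint in practice — a sketch, not an official ReaR check (ascii_clean is an invented name) — is to scan a file for bytes outside the 7-bit range before committing it:

```shell
#!/bin/bash
# ascii_clean: succeed only when a file contains nothing but 7-bit
# ASCII bytes. tr deletes every byte in the octal range 0-177;
# whatever remains must be non-ASCII, so a count of 0 means clean.
ascii_clean () {
    [ "$(tr -d '\0-\177' < "$1" | wc -c)" -eq 0 ]
}

demo=$(mktemp)
printf 'plain ascii text\n' > "$demo"
ascii_clean "$demo" && echo "ASCII only"
rm -f "$demo"
```

Running this in a pre-commit hook over changed files would catch stray UTF-8 characters before they reach a system stuck in the POSIX/C locale.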
Text Layout
- Indentation with 4 blanks, not tabs.
- Block-level statements on the same line:
if CONDITION ; then
Variables
- Curly braces only where really needed: $FOO instead of ${FOO}, but ${FOO:-default_foo}.
- All variables that are used in more than a single script must be all-caps: $FOO instead of
$foo or $Foo.
- Variables that are used only locally should be lowercased and should be marked with local,
like:
local foo="default_value"
Functions
- Use the function keyword to define a function.
- Function names are lower case, words separated by underscores (_).
Relax-and-Recover functions
Use the available Relax-and-Recover functions when possible instead of re-implementing basic
functionality again and again. The Relax-and-Recover functions are implemented in various
lib/*-functions.sh files.
is_true and is_false: see lib/global-functions.sh for how to use them. For example, instead of
if [[ ! "$FOO" =~ ^[yY1] ]] ; then
use
if ! is_true "$FOO" ; then
test, [, [[, ((
- Use [[ where it is required (e.g. for pattern matching or complex conditionals) and [ or
test everywhere else.
- (( is the preferred way for numeric comparisons; variables don't need to be prefixed with
$ there.
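A quick sketch of why (( is preferred for numbers (the variable names are illustrative): [ with an escaped < compares lexicographically, while (( always compares numerically and needs no $ prefix:

```shell
#!/bin/bash
backup_count=3
max_backups=5

# Numeric comparison with (( -- no $ prefix, no -lt/-gt operators.
if (( backup_count < max_backups )); then
    echo "room for more backups"
fi

# With [ the escaped < is a STRING comparison: "10" sorts before "5".
[ "10" \< "5" ] && echo "string compare: 10 sorts before 5"
(( 10 < 5 )) || echo "numeric compare: 10 is not less than 5"
```

The string/number mismatch in the last two lines is a classic source of off-by-one-digit bugs, which is the reason behind this hint.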
Paired parentheses
- Use paired parentheses for case patterns, as in
case WORD in (PATTERN) COMMANDS ;; esac
so that editor commands (like '%' in 'vi') that check for matching opening and closing
parentheses work everywhere in the code.
20 Feb 2007
Are you confused by the plethora of testing and comparison options in the Bash shell? This tip
helps you demystify the various types of file, arithmetic, and string tests so you will always
know when to use test, [ ], [[ ]], (( )), or if-then-else constructs.
The Bash shell is available on many Linux® and UNIX® systems today, and is a common default shell
on Linux. Bash includes powerful programming capabilities, including extensive functions for testing
file types and attributes, as well as the arithmetic and string comparisons available in most programming
languages. Understanding the various tests and knowing that the shell can also interpret some operators
as shell metacharacters is an important step to becoming a power shell user. This article, excerpted
from the developerWorks tutorial
LPI exam 102 prep: Shells, scripting, programming, and compiling, shows you how to understand
and use the test and comparison operations of the Bash shell.
This tip explains the shell test and comparison functions and shows you how to add programming
capability to the shell. You may have already seen simple shell logic using the && and || operators,
which allow you to execute a command based on whether the previous command exits normally or with
an error. In this tip, you will see how to extend these basic techniques to more complex shell programming.
Tests
In any programming language, after you learn how to assign values to variables and pass parameters,
you need to test those values and parameters. In shells, the tests set the return status, which is
the same thing that other commands do. In fact, test
is a builtin command!
test and [
The test
builtin command returns 0 (True) or 1 (False), depending on the evaluation
of an expression, expr. You can also use square brackets: test expr
and
[ expr ] are equivalent. You can examine the return value by displaying $?
; you
can use the return value with && and ||; or you can test it using the various conditional constructs
that are covered later in this tip.
Listing 1. Some simple tests
[ian@pinguino ~]$ test 3 -gt 4 && echo True || echo false
false
[ian@pinguino ~]$ [ "abc" != "def" ];echo $?
0
[ian@pinguino ~]$ test -d "$HOME" ;echo $?
0
|
In the first example in Listing 1, the -gt
operator performs an arithmetic comparison
between two literal values. In the second example, the alternate [ ]
form compares two
strings for inequality. In the final example, the value of the HOME variable is tested to see if
it is a directory using the -d
unary operator.You can compare arithmetic values using
one of -eq
, -ne
, -lt
, -le
, -gt
,
or -ge
, meaning equal, not equal, less than, less than or equal, greater than, and greater
than or equal, respectively.
You can compare strings for equality, inequality, or whether the first string sorts before or
after the second one using the operators =
, !=
, <
, and
>
, respectively. The unary operator -z
tests for a null string, while
-n
or no operator at all returns True if a string is not empty.
Note: the <
and >
operators are also used by the shell for redirection,
so you must escape them using \<
or \>
. Listing 2 shows more examples of
string tests. Check that they are as you expect.
Listing 2. Some string tests
[ian@pinguino ~]$ test "abc" = "def" ;echo $?
1
[ian@pinguino ~]$ [ "abc" != "def" ];echo $?
0
[ian@pinguino ~]$ [ "abc" \< "def" ];echo $?
0
[ian@pinguino ~]$ [ "abc" \> "def" ];echo $?
1
[ian@pinguino ~]$ [ "abc" \<"abc" ];echo $?
1
[ian@pinguino ~]$ [ "abc" \> "abc" ];echo $?
1
|
Some of the more common file tests are shown in Table 1. The result is True if the file tested is
a file that exists and that has the specified characteristic.
Table 1. Some common file tests
Operator |
Characteristic |
-d |
Directory |
-e |
Exists (also -a) |
-f |
Regular file |
-h |
Symbolic link (also -L) |
-p |
Named pipe |
-r |
Readable by you |
-s |
Not empty |
-S |
Socket |
-w |
Writable by you |
-N |
Has been modified since last being read |
In addition to the unary tests above, you can compare two files with the binary operators shown
in Table 2.
Table 2. Testing pairs of files
Operator |
True if |
-nt |
Test if file1 is newer than file 2. The modification date is used for this and the next
comparison. |
-ot |
Test if file1 is older than file 2. |
-ef |
Test if file1 is a hard link to file2. |
Several other tests allow you to check things such as the permissions of the file. See the man
pages for bash for more details, or use help test to see brief information on the test builtin.
You can use the help command for other builtins too.
The -o operator allows you to test various shell options that may be set using set -o option,
returning True (0) if the option is set and False (1) otherwise, as shown in Listing 3.
Listing 3. Testing shell options
[ian@pinguino ~]$ set +o nounset
[ian@pinguino ~]$ [ -o nounset ];echo $?
1
[ian@pinguino ~]$ set -u
[ian@pinguino ~]$ test -o nounset; echo $?
0
Finally, the -a and -o options allow you to combine expressions with logical AND and OR,
respectively, while the unary ! operator inverts the sense of the test. You may use parentheses
to group expressions and override the default precedence. Remember that the shell would normally
run an expression between parentheses in a subshell, so you must escape the parentheses using
\( and \) or enclose these operators in single or double quotes. Listing 4 illustrates the
application of de Morgan's laws to an expression.
Listing 4. Combining and grouping tests
[ian@pinguino ~]$ test "a" != "$HOME" -a 3 -ge 4 ; echo $?
1
[ian@pinguino ~]$ [ ! \( "a" = "$HOME" -o 3 -lt 4 \) ]; echo $?
1
[ian@pinguino ~]$ [ ! \( "a" = "$HOME" -o '(' 3 -lt 4 ')' ")" ]; echo $?
1
(( and [[
The test command is very powerful, but somewhat unwieldy given its requirement for escaping and
given the difference between string and arithmetic comparisons. Fortunately, bash has two other
ways of testing that are somewhat more natural for people who are familiar with C, C++, or Java®
syntax.
The (( )) compound command evaluates an arithmetic expression and sets the exit status to 1 if
the expression evaluates to 0, or to 0 if the expression evaluates to a non-zero value. You do
not need to escape operators between (( and )). Arithmetic is done on integers. Division by 0
causes an error, but overflow does not. You may perform the usual C language arithmetic, logical,
and bitwise operations. The let command can also execute one or more arithmetic expressions. It
is usually used to assign values to arithmetic variables.
Listing 5. Assigning and testing arithmetic expressions
[ian@pinguino ~]$ let x=2 y=2**3 z=y*3;echo $? $x $y $z
0 2 8 24
[ian@pinguino ~]$ (( w=(y/x) + ( (~ ++x) & 0x0f ) )); echo $? $x $y $w
0 3 8 16
[ian@pinguino ~]$ (( w=(y/x) + ( (~ ++x) & 0x0f ) )); echo $? $x $y $w
0 4 8 13
As with (( )), the [[ ]] compound command allows you to use more natural syntax for filename and
string tests. You can combine tests that are allowed for the test command using parentheses and
logical operators.
Listing 6. Using the [[ compound
[ian@pinguino ~]$ [[ ( -d "$HOME" ) && ( -w "$HOME" ) ]] &&
> echo "home is a writable directory"
home is a writable directory
The [[ compound can also do pattern matching on strings when the = or != operators are used.
The match behaves as for wildcard globbing, as illustrated in Listing 7.
Listing 7. Wildcard tests with [[
[ian@pinguino ~]$ [[ "abc def .d,x--" == a[abc]*\ ?d* ]]; echo $?
0
[ian@pinguino ~]$ [[ "abc def c" == a[abc]*\ ?d* ]]; echo $?
1
[ian@pinguino ~]$ [[ "abc def d,x" == a[abc]*\ ?d* ]]; echo $?
1
You can even do arithmetic tests within [[ compounds, but be careful. Unless they are within a
(( compound, the < and > operators will compare the operands as strings and test their order in
the current collating sequence. Listing 8 illustrates this with some examples.
Listing 8. Including arithmetic tests with [[
[ian@pinguino ~]$ [[ "abc def d,x" == a[abc]*\ ?d* || (( 3 > 2 )) ]]; echo $?
0
[ian@pinguino ~]$ [[ "abc def d,x" == a[abc]*\ ?d* || 3 -gt 2 ]]; echo $?
0
[ian@pinguino ~]$ [[ "abc def d,x" == a[abc]*\ ?d* || 3 > 2 ]]; echo $?
0
[ian@pinguino ~]$ [[ "abc def d,x" == a[abc]*\ ?d* || a > 2 ]]; echo $?
0
[ian@pinguino ~]$ [[ "abc def d,x" == a[abc]*\ ?d* || a -gt 2 ]]; echo $?
-bash: a: unbound variable
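The pitfall is easy to demonstrate with numbers whose string order differs from their numeric order; this short sketch contrasts the three forms:

```shell
# Inside [[ ]], < and > compare strings: "10" sorts before "9"
# because only the leading characters are compared.
[[ 10 < 9 ]]   && echo "[[ 10 < 9 ]] is true  (string comparison)"
[[ 10 -lt 9 ]] || echo "[[ 10 -lt 9 ]] is false (numeric comparison)"
(( 10 < 9 ))   || echo "(( 10 < 9 )) is false (numeric comparison)"
```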
Conditionals
While you could accomplish a huge amount of programming with the above tests and the && and ||
control operators, bash includes the more familiar "if, then, else" and case constructs. After
you learn about these, you will learn about looping constructs and your toolbox will really
expand.
If, then, else statements
The bash if command is a compound command that tests the return value of a test or command ($?)
and branches based on whether it is True (0) or False (not 0). Although the tests above returned
only 0 or 1 values, commands may return other values. Learn more about these in the
LPI exam 102 prep: Shells, scripting, programming, and compiling tutorial.
The if command in bash has a then clause containing a list of commands to be executed if the
test or command returns 0, one or more optional elif clauses, each with an additional test and
then clause with an associated list of commands, an optional final else clause and list of
commands to be executed if neither the original test nor any of the tests used in the elif
clauses was true, and a terminal fi to mark the end of the construct.
Using what you have learned so far, you could now build a simple calculator to evaluate arithmetic
expressions as shown in Listing 9.
Listing 9. Evaluating expressions with if, then, else
[ian@pinguino ~]$ function mycalc ()
> {
> local x
> if [ $# -lt 1 ]; then
> echo "This function evaluates arithmetic for you if you give it some"
> elif (( $* )); then
> let x="$*"
> echo "$* = $x"
> else
> echo "$* = 0 or is not an arithmetic expression"
> fi
> }
[ian@pinguino ~]$ mycalc 3 + 4
3 + 4 = 7
[ian@pinguino ~]$ mycalc 3 + 4**3
3 + 4**3 = 67
[ian@pinguino ~]$ mycalc 3 + (4**3 /2)
-bash: syntax error near unexpected token `('
[ian@pinguino ~]$ mycalc 3 + "(4**3 /2)"
3 + (4**3 /2) = 35
[ian@pinguino ~]$ mycalc xyz
xyz = 0 or is not an arithmetic expression
[ian@pinguino ~]$ mycalc xyz + 3 + "(4**3 /2)" + abc
xyz + 3 + (4**3 /2) + abc = 35
The calculator makes use of the local statement to declare x as a local variable that is
available only within the scope of the mycalc function. The let builtin has several possible
options, as does the declare builtin to which it is closely related. Check the man pages for
bash, or use help let for more information.
As you saw in Listing 9, you need to make sure that your expressions are properly escaped if
they use shell metacharacters such as (, ), *, >, and <. Nevertheless, you have quite a handy
little calculator for evaluating arithmetic as the shell does it.
You may have noticed the else clause and the last two examples in Listing 9. As you can see, it
is not an error to pass xyz to mycalc, but it evaluates to 0. This function is not smart enough
to identify the character values in the final example of use and thus be able to warn the user.
You could use a string pattern matching test such as
[[ ! ("$*" == *[a-zA-Z]*) ]]
(or the appropriate form for your locale) to eliminate any expression containing alphabetic
characters, but that would prevent using hexadecimal notation in your input, since you might use
0x0f to represent 15 using hexadecimal notation. In fact, the shell allows bases up to 64 (using
base#value notation), so you could legitimately use any alphabetic character, plus _ and @, in
your input. Octal and hexadecimal use the usual notation of a leading 0 for octal and a leading
0x or 0X for hexadecimal. Listing 10 shows some examples.
Listing 10. Calculating with different bases
[ian@pinguino ~]$ mycalc 015
015 = 13
[ian@pinguino ~]$ mycalc 0xff
0xff = 255
[ian@pinguino ~]$ mycalc 29#37
29#37 = 94
[ian@pinguino ~]$ mycalc 64#1az
64#1az = 4771
[ian@pinguino ~]$ mycalc 64#1azA
64#1azA = 305380
[ian@pinguino ~]$ mycalc 64#1azA_@
64#1azA_@ = 1250840574
[ian@pinguino ~]$ mycalc 64#1az*64**3 + 64#A_@
64#1az*64**3 + 64#A_@ = 1250840574
Additional laundering of the input is beyond the scope of this tip, so use your calculator with
care.
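If you did want a first-pass check, one possible sketch is to strip every character you consider legal and reject the expression if anything is left over. The allowed character set below is an assumption, and it is deliberately stricter than shell arithmetic itself: base#value and 0x notation are rejected too.

```shell
# Hypothetical helper: reject an expression containing anything outside
# an assumed-safe character set (digits, spaces, + - * / % and parens).
checkexpr () {
    local s="$*"
    local leftover=${s//[0-9 +*\/%()-]/}   # strip every allowed character
    if [ -n "$leftover" ]; then
        echo "rejected: $s"
        return 1
    fi
    echo "accepted: $s"
}
checkexpr "3 + (4**3 /2)"
checkexpr "xyz + 3"
```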
The elif statement is very convenient. It helps you in writing scripts by allowing you to
simplify the indenting. You may be surprised to see the output of the type command for the
mycalc function as shown in Listing 11.
Listing 11. Type mycalc
[ian@pinguino ~]$ type mycalc
mycalc is a function
mycalc ()
{
local x;
if [ $# -lt 1 ]; then
echo "This function evaluates arithmetic for you if you give it some";
else
if (( $* )); then
let x="$*";
echo "$* = $x";
else
echo "$* = 0 or is not an arithmetic expression";
fi;
fi
}
Of course, you could just do shell arithmetic by using $(( expression )) with the echo command
as shown in Listing 12. You wouldn't have learned anything about functions or tests that way,
but do note that the shell does not interpret metacharacters, such as *, in their normal role
when inside (( expression )) or [[ expression ]].
Listing 12. Direct calculation in the shell with echo and $(( ))
[ian@pinguino ~]$ echo $((3 + (4**3 /2)))
35
The "and list" and "or list" constructs provide a means of processing a number of commands consecutively.
These can effectively replace complex nested if/then or even case statements. Note
that the exit status of an "and list" or an "or list" is the exit status of the last command executed.
and list
command-1 && command-2 && command-3 && ... command-n
Each command executes in turn provided that the previous command has given a return value of
true. At the first false return, the command chain terminates (the first command returning
false is the last one to execute).
Example 3-87. Using an "and list" to test for command-line arguments
#!/bin/bash

# "and list"

if [ ! -z $1 ] && echo "Argument #1 = $1" && [ ! -z $2 ] && echo "Argument #2 = $2"
then
  echo "At least 2 arguments to script."
  # All the chained commands return true.
else
  echo "Less than 2 arguments to script."
  # At least one of the chained commands returns false.
fi
# Note that "if [ ! -z $1 ]" works, but its supposed equivalent,
# "if [ -n $1 ]" does not. This is a bug, not a feature.


# This accomplishes the same thing, coded using "pure" if/then statements.
if [ ! -z $1 ]
then
  echo "Argument #1 = $1"
fi
if [ ! -z $2 ]
then
  echo "Argument #2 = $2"
  echo "At least 2 arguments to script."
else
  echo "Less than 2 arguments to script."
fi
# It's longer and less elegant than using an "and list".


exit 0
or list
command-1 || command-2 || command-3 || ... command-n
Each command executes in turn for as long as the previous command returns false. At the first
true return, the command chain terminates (the first command returning true is the last one to
execute). This is obviously the inverse of the "and list".
Example 3-88. Using "or lists" in combination with an "and list"
#!/bin/bash

# "Delete", not-so-cunning file deletion utility.
# Usage: delete filename

if [ -z "$1" ]
then
  file=nothing
else
  file=$1
fi
# Fetch file name (or "nothing") for deletion message.


[ ! -f "$1" ] && echo "$1 not found. Can't delete a nonexistent file."
# AND LIST, to give error message if file not present.

[ ! -f "$1" ] || ( rm -f "$1"; echo "$file deleted." )
# OR LIST, to delete file if present.
# ( command1 ; command2 ) is, in effect, an AND LIST variant.

# Note logic inversion above.
# AND LIST executes on true, OR LIST on false.

[ ! -z "$1" ] || echo "Usage: `basename $0` filename"
# OR LIST, to give error message if no command line arg (file name).

exit 0
Clever combinations of "and" and "or" lists are possible, but the logic may easily become convoluted
and require extensive debugging.
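One common source of that convolution deserves a hedged illustration: A && B || C looks like if/then/else, but C also runs when A succeeds and B then fails. The file name here is illustrative:

```shell
# Sketch: a combined "and list" / "or list" as a fallback pattern.
f=$(mktemp)
echo data > "$f"
[ -s "$f" ] && echo "file has data" || echo "file empty or missing"
rm -f "$f"
[ -s "$f" ] && echo "file has data" || echo "file empty or missing"
# Caution: A && B || C is not a true if/then/else --
# the || branch also runs if A succeeds but B then fails.
```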
Softpanorama Recommended: Please visit Heiner Steven's SHELLdorado -- the best shell scripting
site on the Internet.