Line-Based Records

Line-based records are those where each line in the file is a complete record. The record is usually divided into fields by a delimiting character, but sometimes the fields are defined by length: the first 20 characters are the name, the next 20 are the first line of the address, and so on.

When the files are large, the processing is usually done by an external utility such as sed or awk. Sometimes an external utility will be used to select a few records for the shell to process. This snippet searches the password file for users whose shell is bash and feeds the results to the shell to perform some (unspecified) checks:

grep 'bash$' /etc/passwd |
while IFS= read -r line
do
  : perform some checking here
done
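If the loop needs to leave a result behind, note that bash normally runs each part of a pipeline in a subshell, so variables set inside a piped while loop are gone when the loop ends. The following is a minimal sketch of one way around that, using process substitution; the sample data and the count variable are illustrative assumptions, not part of the original snippet.

```shell
## Hypothetical sample data standing in for /etc/passwd
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash'

count=0
while IFS= read -r line           ## IFS= and -r keep the line exactly as written
do
  count=$(( count + 1 ))          ## the loop runs in the current shell, so this survives
  printf 'Record %d: %s\n' "$count" "$line"
done < <(printf '%s\n' "$sample" | grep 'bash$')

printf 'Matched %d records\n' "$count"
```

Because the loop reads from a redirection rather than sitting at the end of a pipeline, count is still set after done.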

Delimiter-Separated Values

Most single-line records will have fields delimited by a certain character. In /etc/passwd, the delimiter is a colon. In other files, the delimiter may be a tab, tilde, or, very commonly, a comma. For these records to be useful, they must be split into their separate fields.

When records are received on an input stream, the easiest way to split them is to change IFS and read each field into its own variable:

grep 'bash$' /etc/passwd |
while IFS=: read -r user passwd uid gid name homedir shell
do
  printf "%16s: %s\n" \
      User       "$user" \
      Password   "$passwd" \
      "User ID"  "$uid" \
      "Group ID" "$gid" \
      Name       "$name" \
      "Home directory" "$homedir" \
      Shell      "$shell"

  read < /dev/tty    ## pause until Enter is pressed on the terminal
done

Sometimes it is not possible to split a record as it is read, for example, when the record is needed in its entirety as well as split into its constituent fields. In such cases, the entire line can be read into a single variable and then split later using any of several techniques. For all of these, the examples here will use the root entry from /etc/passwd:

record=root:x:0:0:root:/root:/bin/bash

The fields can be extracted one at a time using parameter expansion:

for var in user passwd uid gid name homedir shell
do
  eval "$var=\${record%%:*}"  ## extract the first field
  record=${record#*:}         ## and take it off the record
done
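The loop above consumes record as it goes and relies on eval for the assignment. As a sketch of an alternative, the same extraction can work on a copy and assign with printf -v, which bash provides for storing formatted output in a named variable. The name rest is a hypothetical helper variable introduced for this example.

```shell
record=root:x:0:0:root:/root:/bin/bash

rest=$record                            ## work on a copy so $record stays intact
for var in user passwd uid gid name homedir shell
do
  printf -v "$var" '%s' "${rest%%:*}"   ## assign the first field without eval
  rest=${rest#*:}                       ## drop that field and its delimiter
done

printf '%s\n' "$user" "$homedir" "$shell"
```

After the loop, record still holds the whole line, which suits the case where the record is needed both whole and split.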

As long as the delimiting character is not found within any field, records can be split by setting IFS to the delimiter. When doing this, file name expansion should be turned off (with set -f) to avoid expanding any wildcard characters. The fields can be stored in an array and variables can be set to reference them:

IFS=:
set -f
data=( $record )
user=0
passwd=1
uid=2
gid=3
name=4
homedir=5
shell=6

The variable names are the names of the fields that can then be used to retrieve values from the data array:

$ echo;printf "%16s: %s\n" \
      User       "${data[$user]}" \
      Password   "${data[$passwd]}" \
      "User ID"  "${data[$uid]}" \
      "Group ID" "${data[$gid]}" \
      Name       "${data[$name]}" \
      "Home directory" "${data[$homedir]}" \
      Shell      "${data[$shell]}"

            User: root
        Password: x
         User ID: 0
        Group ID: 0
            Name: root
  Home directory: /root
           Shell: /bin/bash
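Setting IFS and set -f at the top level changes the behavior of everything that follows in the script. One possible alternative, shown here as a sketch rather than as the method used in this chapter, is to let read fill the array: the IFS assignment then applies only to that one command, and because the record is never expanded unquoted, no filename expansion takes place.

```shell
record=root:x:0:0:root:/root:/bin/bash

## IFS=: applies only to this read; -a stores the fields in the array data
IFS=: read -r -a data <<< "$record"

printf '%s\n' "${#data[@]}" "${data[0]}" "${data[6]}"
```

The index variables (user=0, passwd=1, and so on) can be used with this array exactly as before.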

It is more usual to assign each field to a scalar variable. This function (Listing 13-16) takes a passwd record, splits it on colons, and assigns the fields to variables.

Listing 13-16. split_passwd, Split a Record from /etc/passwd into Fields and Assign to Variables

split_passwd() #@ USAGE: split_passwd RECORD
{
  local opts=$-    ## store current shell options
  local IFS=:
  local record=${1:?} array

  set -f                                  ## Turn off filename expansion
  array=( $record )                       ## Split record into array
  case $opts in *f*);; *) set +f;; esac   ## Restore expansion unless it was already off

  user=${array[0]}
  passwd=${array[1]}
  uid=${array[2]}
  gid=${array[3]}
  name=${array[4]}
  homedir=${array[5]}
  shell=${array[6]}
}
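To see why split_passwd turns off filename expansion before splitting, consider a record whose password field is an asterisk. The sketch below uses a hypothetical guest entry and a scratch directory containing two files (both are assumptions made for the demonstration) and splits the same record with and without set -f.

```shell
## Assumption: work in a scratch directory so the wildcard has files to match
tmpdir=$(mktemp -d) && cd "$tmpdir" || exit 1
touch file1 file2

record='guest:*:1001:1001:Guest:/home/guest:/bin/bash'  ## hypothetical entry

oldIFS=$IFS
IFS=:
unsafe=( $record )     ## the * field expands to the file names in the directory
set -f
safe=( $record )       ## the * field is kept literally
set +f
IFS=$oldIFS

printf '%s\n' "${#unsafe[@]}" "${#safe[@]}" "${safe[1]}"
cd / && rm -rf "$tmpdir"
```

The unsafe array ends up with one element too many because the single * field was replaced by two file names, which would shift every later field out of position.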

The same thing can be accomplished using a here document (Listing 13-17).

Listing 13-17. split_passwd, Split a Record from /etc/passwd into Fields and Assign to Variables

split_passwd()
{
  IFS=: read user passwd uid gid name homedir shell <<.
$1
.
}
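In bash, a here string (<<<) can stand in for the here document when the input is a single line. This is a sketch of that variation; the function name split_passwd_hs is hypothetical and chosen only to avoid clashing with split_passwd.

```shell
split_passwd_hs() #@ USAGE: split_passwd_hs RECORD  (hypothetical name)
{
  ## IFS=: affects only this read; the record arrives on standard input
  IFS=: read -r user passwd uid gid name homedir shell <<< "${1:?}"
}

split_passwd_hs root:x:0:0:root:/root:/bin/bash
printf '%s\n' "$user" "$uid" "$shell"
```

Here strings are a bash extension, so the here document form remains the more portable choice.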

More generally, any character-delimited record can be split into variables for each field with this function (Listing 13-18).

Listing 13-18. split_record, Split a Record by Reading Variables

split_record() #@ USAGE: split_record record delimiter var ...
{
  local record=${1:?} IFS=${2:?} ## record and delimiter must be provided
  : "${3:?}"                     ## at least one variable is required
  shift 2                        ## remove record and delimiter, leaving variables

  ## Read record into a list of variables using a 'here document'
  read "$@" <<.
$record
.
}

Using the record defined earlier, here’s the output (sa prints each of its arguments on a separate line, between colons):

$ split_record "$record" : user passwd uid gid name homedir shell
$ sa "$user" "$passwd" "$uid" "$gid" "$name" "$homedir" "$shell"
:root:
:x:
:0:
:0:
:root:
:/root:
:/bin/bash:
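The same function works with other delimiters. The sketch below repeats the logic of Listing 13-18 so that it runs on its own, then applies it to a comma-separated record and a tilde-separated one; the sample records and variable names are made up for illustration.

```shell
split_record()  ## same logic as Listing 13-18, repeated so the example runs alone
{
  local record=${1:?} IFS=${2:?}
  : "${3:?}"
  shift 2
  read -r "$@" <<.
$record
.
}

split_record "Toronto,Ontario,Canada" , city province country
printf ':%s:\n' "$city" "$province" "$country"

## With fewer variables than fields, the last variable receives the remainder
split_record "a~b~c~d" "~" first rest
printf ':%s:\n' "$first" "$rest"
```

The second call shows standard read behavior: when there are more fields than variables, the final variable gets everything that is left, delimiters included.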

Fixed-Length Fields

Less common than delimited fields are fixed-length fields. They aren’t used often, but when they are, they can be parsed by looping through a list of name=width strings, which is how many text editors import data from fixed-length field data files:

line="John           123 Fourth Street   Toronto     Canada                "
for nw in name=15 address=20 city=12 country=22
do
  var=${nw%%=*}                 ## variable name precedes the equals sign
  width=${nw#*=}                ## field width follows it
  eval "$var=\${line:0:width}"  ## extract field
  line=${line:width}            ## remove field from the record
done
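Fields extracted this way keep their trailing padding. If the padding is unwanted, one option is to strip it with parameter expansion before assigning, as in this sketch. The record is built with printf so that the field widths are explicit, and the variable field is a hypothetical helper; printf -v is used for the assignment in place of eval.

```shell
## Build the record with explicit widths: 15, 20, 12, and 22 characters
printf -v line '%-15s%-20s%-12s%-22s' John "123 Fourth Street" Toronto Canada

for nw in name=15 address=20 city=12 country=22
do
  var=${nw%%=*}                       ## variable name precedes the equals sign
  width=${nw#*=}                      ## field width follows it
  field=${line:0:width}               ## extract the padded field
  field=${field%"${field##*[! ]}"}    ## strip the trailing spaces
  printf -v "$var" '%s' "$field"      ## assign without eval
  line=${line:width}                  ## remove field from the record
done

printf ':%s:\n' "$name" "$address" "$city" "$country"
```

The inner expansion ${field##*[! ]} isolates the run of spaces at the end of the field, and the outer one removes exactly that suffix, so spaces inside a field such as the address are preserved.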