Table of Contents

Arrays

Purpose

An array is a parameter that holds mappings from keys to values. Arrays are used to store a collection of parameters into a parameter. Arrays (in any programming language) are a useful and common composite data structure, and one of the most important scripting features in Bash and other shells.

Here is an abstract representation of an array named NAMES. The indexes go from 0 to 3.

NAMES
 0: Peter
 1: Anna
 2: Greg
 3: Jan

Instead of using 4 separate variables, multiple related variables are grouped grouped together into elements of the array, accessible by their key. If you want the second name, ask for index 1 of the array NAMES.

Indexing

Bash supports two different types of ksh-like one-dimensional arrays. Multidimensional arrays are not implemented.

Syntax

Referencing

To accommodate referring to array variables and their individual elements, Bash extends the parameter naming scheme with a subscript suffix. Any valid ordinary scalar parameter name is also a valid array name: [[:alpha:]_][[:alnum:]_]*. The parameter name may be followed by an optional subscript enclosed in square brackets to refer to a member of the array.

The overall syntax is arrname[subscript] - where for indexed arrays, subscript is any valid arithmetic expression, and for associative arrays, any nonempty string. Subscripts are first processed for parameter and arithmetic expansions, and command and process substitutions. When used within parameter expansions or as an argument to the unset builtin, the special subscripts * and @ are also accepted which act upon arrays analogously to the way the @ and * special parameters act upon the positional parameters. In parsing the subscript, bash ignores any text that follows the closing bracket up to the end of the parameter name.

With few exceptions, names of this form may be used anywhere ordinary parameter names are valid, such as within arithmetic expressions, parameter expansions, and as arguments to builtins that accept parameter names. An array is a Bash parameter that has been given the -a (for indexed) or -A (for associative) attributes. However, any regular (non-special or positional) parameter may be validly referenced using a subscript, because in most contexts, referring to the zeroth element of an array is synonymous with referring to the array name without a subscript.

# "x" is an ordinary non-array parameter.
$ x=hi; printf '%s ' "$x" "${x[0]}"; echo "${_[0]}"
hi hi hi

The only exceptions to this rule are in a few cases where the array variable's name refers to the array as a whole. This is the case for the unset builtin (see destruction) and when declaring an array without assigning any values (see declaration).

Declaration

The following explicitly give variables array attributes, making them arrays:

Syntax Description
ARRAY=() Declares an indexed array ARRAY and initializes it to be empty. This can also be used to empty an existing array.
ARRAY[0]= Generally sets the first element of an indexed array. If no array ARRAY existed before, it is created.
declare -a ARRAY Declares an indexed array ARRAY. An existing array is not initialized.
declare -A ARRAY Declares an associative array ARRAY. This is the one and only way to create associative arrays.

As an example, and for use below, let's declare our NAMES array as described above:

  declare -a NAMES=('Peter' 'Anna' 'Greg' 'Jan')

Storing values

Storing values in arrays is quite as simple as storing values in normal variables.

Syntax Description
ARRAY[N]=VALUE Sets the element N of the indexed array ARRAY to VALUE. N can be any valid arithmetic expression.
ARRAY[STRING]=VALUE Sets the element indexed by STRING of the associative array ARRAY.
ARRAY=VALUE As above. If no index is given, as a default the zeroth element is set to VALUE. Careful, this is even true of associative arrays - there is no error if no key is specified, and the value is assigned to string index "0".
ARRAY=(E1 E2 …) Compound array assignment - sets the whole array ARRAY to the given list of elements indexed sequentially starting at zero. The array is unset before assignment unless the += operator is used. When the list is empty (ARRAY=()), the array will be set to an empty array. This method obviously does not use explicit indexes. An associative array can not be set like that! Clearing an associative array using ARRAY=() works.
ARRAY=([X]=E1 [Y]=E2 …) Compound assignment for indexed arrays with index-value pairs declared individually (here for example X and Y). X and Y are arithmetic expressions. This syntax can be combined with the above - elements declared without an explicitly specified index are assigned sequentially starting at either the last element with an explicit index, or zero.
ARRAY=([S1]=E1 [S2]=E2 …) Individual mass-setting for associative arrays. The named indexes (here: S1 and S2) are strings.
ARRAY+=(E1 E2 …) Append to ARRAY.
ARRAY=("${ANOTHER_ARRAY[@]}") Copy ANOTHER_ARRAY to ARRAY, copying each element.

As of now, arrays can't be exported.

Getting values

For completeness and details on several parameter expansion variants, see the article about parameter expansion and check the notes about arrays.
Syntax Description
${ARRAY[N]} Expands to the value of the index N in the indexed array ARRAY. If N is a negative number, it's treated as the offset from the maximum assigned index (can't be used for assignment) - 1
${ARRAY[S]} Expands to the value of the index S in the associative array ARRAY.
"${ARRAY[@]}"
${ARRAY[@]}
"${ARRAY[*]}"
${ARRAY[*]}
Similar to mass-expanding positional parameters, this expands to all elements. If unquoted, both subscripts * and @ expand to the same result, if quoted, @ expands to all elements individually quoted, * expands to all elements quoted as a whole.
"${ARRAY[@]:N:M}"
${ARRAY[@]:N:M}
"${ARRAY[*]:N:M}"
${ARRAY[*]:N:M}
Similar to what this syntax does for the characters of a single string when doing substring expansion, this expands to M elements starting with element N. This way you can mass-expand individual indexes. The rules for quoting and the subscripts * and @ are the same as above for the other mass-expansions.

For clarification: When you use the subscripts @ or * for mass-expanding, then the behaviour is exactly what it is for $@ and $* when mass-expanding the positional parameters. You should read this article to understand what's going on.

Metadata

Syntax Description
${#ARRAY[N]} Expands to the length of an individual array member at index N (stringlength)
${#ARRAY[STRING]} Expands to the length of an individual associative array member at index STRING (stringlength)
${#ARRAY[@]}
${#ARRAY[*]}
Expands to the number of elements in ARRAY
${!ARRAY[@]}
${!ARRAY[*]}
Expands to the indexes in ARRAY since BASH 3.0

Destruction

The unset builtin command is used to destroy (unset) arrays or individual elements of arrays.

Syntax Description
unset -v ARRAY
unset -v ARRAY[@]
unset -v ARRAY[*]
Destroys a complete array
unset -v ARRAY[N]Destroys the array element at index N
unset -v ARRAY[STRING]Destroys the array element of the associative array at index STRING

It is best to explicitly specify -v when unsetting variables with unset.

Specifying unquoted array elements as arguments to any command, such as with the syntax above may cause pathname expansion to occur due to the presence of glob characters.

Example: You are in a directory with a file named x1, and you want to destroy an array element x[1], with

unset x[1]
then pathname expansion will expand to the filename x1 and break your processing!

Even worse, if nullglob is set, your array/index will disappear.

To avoid this, always quote the array name and index:

unset -v 'x[1]'

This applies generally to all commands which take variable names as arguments. Single quotes preferred.

Usage

Numerical Index

Numerical indexed arrays are easy to understand and easy to use. The Purpose and Indexing chapters above more or less explain all the needed background theory.

Now, some examples and comments for you.

Let's say we have an array sentence which is initialized as follows:

sentence=(Be liberal in what you accept, and conservative in what you send)

Since no special code is there to prevent word splitting (no quotes), every word there will be assigned to an individual array element. When you count the words you see, you should get 12. Now let's see if Bash has the same opinion:

$ echo ${#sentence[@]}
12

Yes, 12. Fine. You can take this number to walk through the array. Just subtract 1 from the number of elements, and start your walk at 0 (zero):

((n_elements=${#sentence[@]}, max_index=n_elements - 1))

for ((i = 0; i <= max_index; i++)); do
  echo "Element $i: '${sentence[i]}'"
done

You always have to remember that, it seems newbies have problems sometimes. Please understand that numerical array indexing begins at 0 (zero)!

The method above, walking through an array by just knowing its number of elements, only works for arrays where all elements are set, of course. If one element in the middle is removed, then the calculation is nonsense, because the number of elements doesn't correspond to the highest used index anymore (we call them "sparse arrays").

Now, suppose that you want to replace your array sentence with the values in the previously-declared array NAMES . You might think you could just do

$ unset sentence ; declare -a sentence=NAMES
$ echo ${#sentence[@]}
1
# omit calculating max_index as above, and iterate as one-liner
$ for ((i = 0; i < ${#sentence[@]}; i++)); do  echo "Element $i: '${sentence[i]}'" ; done
Element 0: 'NAMES'

Obviously that's wrong. What about

$ unset sentence ; declare -a sentence=${NAMES}

? Again, wrong:

$ echo ${#sentence[*]}
1
$ for ((i = 0; i < ${#sentence[@]}; i++)); do  echo "Element $i: '${sentence[i]}'" ; done
Element 0: 'Peter'

So what's the right way? The (slightly ugly) answer is, reuse the enumeration syntax:

$ unset sentence ; declare -a sentence=("${NAMES[@]}")
$ echo ${#sentence[@]}
4
$ for ((i = 0; i < ${#sentence[@]}; i++)); do  echo "Element $i: '${sentence[i]}'" ; done
Element 0: 'Peter'
Element 1: 'Anna'
Element 2: 'Greg'
Element 3: 'Jan'

Associative (Bash 4)

Associative arrays (or hash tables) are not much more complicated than numerical indexed arrays. The numerical index value (in Bash a number starting at zero) just is replaced with an arbitrary string:

# declare -A, introduced with Bash 4 to declare an associative array
declare -A sentence

sentence[Begin]='Be liberal in what'
sentence[Middle]='you accept, and conservative'
sentence[End]='in what you send'
sentence['Very end']=...

Beware: don't rely on the fact that the elements are ordered in memory like they were declared, it could look like this:

# output from 'set' command
sentence=([End]="in what you send" [Middle]="you accept, and conservative " [Begin]="Be liberal in what " ["Very end"]="...")
This effectively means, you can get the data back with "${sentence[@]}", of course (just like with numerical indexing), but you can't rely on a specific order. If you want to store ordered data, or re-order data, go with numerical indexes. For associative arrays, you usually query known index values:
for element in Begin Middle End "Very end"; do
    printf "%s" "${sentence[$element]}"
done
printf "\n"

A nice code example: Checking for duplicate files using an associative array indexed with the SHA sum of the files:

# Thanks to Tramp in #bash for the idea and the code

unset flist; declare -A flist;
while read -r sum fname; do 
    if [[ ${flist[$sum]} ]]; then
        printf 'rm -- "%s" # Same as >%s<\n' "$fname" "${flist[$sum]}" 
    else
        flist[$sum]="$fname"
    fi
done <  <(find . -type f -exec sha256sum {} +)  >rmdups

Integer arrays

Any type attributes applied to an array apply to all elements of the array. If the integer attribute is set for either indexed or associative arrays, then values are considered as arithmetic for both compound and ordinary assignment, and the += operator is modified in the same way as for ordinary integer variables.

 ~ $ ( declare -ia 'a=(2+4 [2]=2+2 [a[2]]="a[2]")' 'a+=(42 [a[4]]+=3)'; declare -p a )
declare -ai a='([0]="6" [2]="4" [4]="7" [5]="42")'

a[0] is assigned to the result of 2+4. a[2] gets the result of 2+2. The last index in the first assignment is the result of a[2], which has already been assigned as 4, and its value is also given a[2].

This shows that even though any existing arrays named a in the current scope have already been unset by using = instead of += to the compound assignment, arithmetic variables within keys can self-reference any elements already assigned within the same compound-assignment. With integer arrays this also applies to expressions to the right of the =. (See evaluation order, the right side of an arithmetic assignment is typically evaluated first in Bash.)

The second compound assignment argument to declare uses +=, so it appends after the last element of the existing array rather than deleting it and creating a new array, so a[5] gets 42.

Lastly, the element whose index is the value of a[4] (4), gets 3 added to its existing value, making a[4] == 7. Note that having the integer attribute set this time causes += to add, rather than append a string, as it would for a non-integer array.

The single quotes force the assignments to be evaluated in the environment of declare. This is important because attributes are only applied to the assignment after assignment arguments are processed. Without them the += compound assignment would have been invalid, and strings would have been inserted into the integer array without evaluating the arithmetic. A special-case of this is shown in the next section.

Bash declaration commands are really keywords in disguise. They magically parse arguments to determine whether they are in the form of a valid assignment. If so, they are evaluated as assignments. If not, they are undergo normal argument expansion before being passed to the builtin which evaluates the resulting string as an assignment (somewhat like eval, but there are differences.) 'Todo:' Discuss this in detail.

Indirection

Arrays can be expanded indirectly using the indirect parameter expansion syntax. Parameters whose values are of the form: name[index], name[@], or name[*] when expanded indirectly produce the expected results. This is mainly useful for passing arrays (especially multiple arrays) by name to a function.

This example is an "isSubset"-like predicate which returns true if all key-value pairs of the array given as the first argument to isSubset correspond to a key-value of the array given as the second argument. It demonstrates both indirect array expansion and indirect key-passing without eval using the aforementioned special compound assignment expansion.

isSubset() {
    local -a 'xkeys=("${!'"$1"'[@]}")' 'ykeys=("${!'"$2"'[@]}")'
    set -- "${@/%/[key]}"

    (( ${#xkeys[@]} <= ${#ykeys[@]} )) || return 1

    local key
    for key in "${xkeys[@]}"; do
        [[ ${!2+_} && ${!1} == ${!2} ]] || return 1
    done
}

main() {
    # "a" is a subset of "b"
    local -a 'a=({0..5})' 'b=({0..10})'
    isSubset a b
    echo $? # true

    # "a" contains a key not in "b"
    local -a 'a=([5]=5 {6..11})' 'b=({0..10})'
    isSubset a b
    echo $? # false

    # "a" contains an element whose value != the corresponding member of "b"
    local -a 'a=([5]=5 6 8 9 10)' 'b=({0..10})'
    isSubset a b
    echo $? # false
}

main

This script is one way of implementing a crude multidimensional associative array by storing array definitions in an array and referencing them through indirection. The script takes two keys and dynamically calls a function whose name is resolved from the array.

callFuncs() {
    # Set up indirect references as positional parameters to minimize local name collisions.
    set -- "${@:1:3}" ${2+'a["$1"]' "$1"'["$2"]'}

    # The only way to test for set but null parameters is unfortunately to test each individually.
    local x
    for x; do
        [[ $x ]] || return 0
    done

    local -A a=(
        [foo]='([r]=f [s]=g [t]=h)'
        [bar]='([u]=i [v]=j [w]=k)'
        [baz]='([x]=l [y]=m [z]=n)'
        ) ${4+${a["$1"]+"${1}=${!3}"}} # For example, if "$1" is "bar" then define a new array: bar=([u]=i [v]=j [w]=k)

    ${4+${a["$1"]+"${!4-:}"}} # Now just lookup the new array. for inputs: "bar" "v", the function named "j" will be called, which prints "j" to stdout.
}

main() {
    # Define functions named {f..n} which just print their own names.
    local fun='() { echo "$FUNCNAME"; }' x

    for x in {f..n}; do
        eval "${x}${fun}"
    done

    callFuncs "$@"
}

main "$@"

Bugs and Portability Considerations

Bugs

Evaluation order

Here are some of the nasty details of array assignment evaluation order. You can use this testcase code to generate these results.

Each testcase prints evaluation order for indexed array assignment
contexts. Each context is tested for expansions (represented by digits) and
arithmetic (letters), ordered from left to right within the expression. The
output corresponds to the way evaluation is re-ordered for each shell:

a[ $1 a ]=${b[ $2 b ]:=${c[ $3 c ]}}               No attributes
a[ $1 a ]=${b[ $2 b ]:=c[ $3 c ]}                  typeset -ia a
a[ $1 a ]=${b[ $2 b ]:=c[ $3 c ]}                  typeset -ia b
a[ $1 a ]=${b[ $2 b ]:=c[ $3 c ]}                  typeset -ia a b
(( a[ $1 a ] = b[ $2 b ] ${c[ $3 c ]} ))           No attributes
(( a[ $1 a ] = ${b[ $2 b ]:=c[ $3 c ]} ))          typeset -ia b
a+=( [ $1 a ]=${b[ $2 b ]:=${c[ $3 c ]}} [ $4 d ]=$(( $5 e )) ) typeset -a a
a+=( [ $1 a ]=${b[ $2 b ]:=c[ $3 c ]} [ $4 d ]=${5}e ) typeset -ia a

bash: 4.2.42(1)-release
2 b 3 c 2 b 1 a
2 b 3 2 b 1 a c
2 b 3 2 b c 1 a
2 b 3 2 b c 1 a c
1 2 3 c b a
1 2 b 3 2 b c c a
1 2 b 3 c 2 b 4 5 e a d
1 2 b 3 2 b 4 5 a c d e

ksh93: Version AJM 93v- 2013-02-22
1 2 b b a
1 2 b b a
1 2 b b a
1 2 b b a
1 2 3 c b a
1 2 b b a
1 2 b b a 4 5 e d
1 2 b b a 4 5 d e

mksh: @(#)MIRBSD KSH R44 2013/02/24
2 b 3 c 1 a
2 b 3 1 a c
2 b 3 c 1 a
2 b 3 c 1 a
1 2 3 c a b
1 2 b 3 c a
1 2 b 3 c 4 5 e a d
1 2 b 3 4 5 a c d e

zsh: 5.0.2
2 b 3 c 2 b 1 a
2 b 3 2 b 1 a c
2 b 1 a
2 b 1 a
1 2 3 c b a
1 2 b a
1 2 b 3 c 2 b 4 5 e
1 2 b 3 2 b 4 5

See also

vim: set fenc=utf-8 ff=unix ts=4 sts=4 sw=4 ft=dokuwiki et wrap lbr: