expand and unexpand

These two commands help you convert tabs to spaces and vice versa. Both commands support options to customize the tab stop width and to choose which occurrences should be converted.

Default expand

The expand command converts tab characters to space characters. By default, tab stops are placed at every multiple of 8 columns, and columns are calculated in terms of bytes.

# sample stdin data
$ printf 'apple\tbanana\tcherry\na\tb\tc\n' | cat -T
apple^Ibanana^Icherry
a^Ib^Ic
# 'apple' = 5 bytes, \t converts to 3 spaces
# 'banana' = 6 bytes, \t converts to 2 spaces
# 'a' and 'b' = 1 byte, \t converts to 7 spaces
$ printf 'apple\tbanana\tcherry\na\tb\tc\n' | expand
apple   banana  cherry
a       b       c

# 'αλε' = 6 bytes, \t converts to 2 spaces
$ printf 'αλε\tπού\n' | expand
αλε  πού
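To check byte counts like the ones used above, you can use wc -c, which reports the number of bytes in its input. This quick check assumes the text is UTF-8 encoded, where each of these Greek letters takes two bytes:

```shell
# each Greek letter below is 2 bytes in UTF-8, so 'αλε' is 6 bytes
printf 'αλε' | wc -c
# 'apple' is plain ASCII, so it is 5 bytes
printf 'apple' | wc -c
```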

Here's an example with strings of size 7 and 8 bytes before the tab character:

$ printf 'deviate\treached\nbackdrop\toverhang\n' | expand
deviate reached
backdrop        overhang
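The number of spaces in these examples follows from the tab stop arithmetic. A tab that starts at column col (counted from 0) advances to the next multiple of 8, so it becomes 8 - col % 8 spaces. Here's a sketch of that arithmetic. The spaces_for helper is hypothetical and is not part of expand:

```shell
# hypothetical helper: spaces needed for a tab that starts at column $1
# (0-based), assuming the default tab stops at every multiple of 8
spaces_for() {
  echo $(( 8 - $1 % 8 ))
}

# prefix lengths from the examples above: a, apple, banana, deviate, backdrop
for len in 1 5 6 7 8; do
  echo "$len bytes before the tab -> $(spaces_for "$len") spaces"
done
```

For the 8 byte prefix, the tab is already at a stop, so it moves on to the next one and becomes a full 8 spaces.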

The expand command also considers backspace characters to determine the number of spaces needed.

# sample input with a backspace character
$ printf 'cart\bd\tbard\n' | cat -t
cart^Hd^Ibard

# 'card' = 4 bytes, \t converts to 4 spaces
$ printf 'cart\bd\tbard\n' | expand
card    bard
$ printf 'cart\bd\tbard\n' | expand | cat -t
cart^Hd    bard

Note: expand concatenates multiple files passed as input, so you don't need cat for such cases.
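Here's an example with two small sample files created in a temporary directory. The file names f1.txt and f2.txt are made up for this sketch:

```shell
# create two small sample files in a temporary directory
tmp=$(mktemp -d)
printf 'a\tb\n' > "$tmp/f1.txt"
printf 'c\td\n' > "$tmp/f2.txt"

# expand processes the files in the given order, like cat would
out=$(expand "$tmp/f1.txt" "$tmp/f2.txt")
printf '%s\n' "$out"

# clean up the sample files
rm -r "$tmp"
```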

Expand only the initial tabs

You can use the -i option to convert only the tab characters present at the start of a line. The first character that is neither a tab nor a space stops the expansion.

# 'a' present at the start of line is not a tab/space character
# so no tabs are expanded for this input
$ printf 'a\tb\tc\n' | expand -i | cat -T
a^Ib^Ic

# the first \t gets expanded here, 'a' stops further expansion
$ printf '\ta\tb\tc\n' | expand -i | cat -T
        a^Ib^Ic

# first two \t get expanded here, 'a' stops further expansion
# presence of space characters will not stop the expansion
$ printf '\t \ta\tb\tc\n' | expand -i | cat -T
                a^Ib^Ic
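One common use of -i is normalizing indentation while leaving any tabs in the rest of the line untouched. The sketch below assumes you want 4-space indentation, so it combines -i with the -t option described in the next section:

```shell
# the leading tab becomes 4 spaces, the tab inside the string literal stays
printf '\tmsg = "a\tb"\n' | expand -i -t 4 | cat -T
```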

Customize the tab stop width

You can use the -t option to control the expansion width. The default is 8, as seen in the previous examples.

This option accepts a few different forms. If you pass a single number, tab stops are placed at every multiple of that width:

$ cat -T code.py
def compute(x, y):
^Iif x > y:
^I^Iprint('hello')
^Ielse:
^I^Iprint('bye')

$ expand -t 2 code.py
def compute(x, y):
  if x > y:
    print('hello')
  else:
    print('bye')
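Because the stops are at multiples of the width, a tab can expand to fewer spaces than the width itself. In code.py, every tab starts at a tab stop, so each one becomes exactly 2 spaces. With a width of 4, a tab that follows two characters needs only 2 more spaces to reach the stop at column 4. This sketch uses the long option name --tabs:

```shell
# 'ab' occupies columns 0-1, so the tab only needs 2 spaces to reach column 4
printf 'ab\tc\n' | expand --tabs=4
# a tab at the start of the line needs all 4 spaces
printf '\tc\n' | expand --tabs=4
```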

You can provide multiple widths separated by commas. These values are stop locations measured as absolute positions from the start of the line, not the number of spaces a tab can expand to. Each tab advances to the next listed stop beyond the current position. Once the line goes past the last stop, each remaining tab character is expanded to a single space.

# first tab character can expand till the 3rd column
# second tab character can expand till the 7th column
# rest of the tab characters will be expanded to a single space
$ printf 'a\tb\tc\td\te\n' | expand -t 3,7
a  b   c d e

# here are two more examples with the same specification as above
# second tab expands to two spaces to end at the 7th column
$ printf 'a\tbb\tc\td\te\n' | expand -t 3,7
a  bb  c d e
# second tab expands to a single space since it goes beyond the 7th column
$ printf 'a\tbbbbbbbb\tc\td\te\n' | expand -t 3,7
a  bbbbbbbb c d e
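The comments above talk about the first and second tab characters. More precisely, each tab advances to the next listed stop that lies beyond the current position. If the text before the first tab already goes past the 3rd column, that tab skips the stop at 3 and expands till the 7th column:

```shell
# 'aaaa' already goes past the 3rd column, so the first tab uses the stop at 7
# the next tab has no stop left, so it becomes a single space
printf 'aaaa\tb\tc\n' | expand -t 3,7
```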

If you prefix a / character to the last width, the remaining tab characters will stop at multiples of that value instead of defaulting to a single space.

# first tab character can expand till the 3rd column
# remaining tab characters can expand till 7/14/21/etc
$ printf 'a\tb\tc\td\te\tf\tg\n' | expand -t 3,/7
a  b   c      d      e      f      g

# first tab character can expand till the 3rd column
# second tab character can expand till the 7th column
# remaining tab characters can expand till 10/15/20/etc
$ printf 'a\tb\tc\td\te\tf\tg\n' | expand -t 3,7,/5
a  b   c  d    e    f    g
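You can work out the positions for the / form with plain arithmetic. Once the listed stops are used up, a tab at column col (0-based) moves to col + N - col % N, which is the next multiple of N. Here's a sketch for the 3,/7 specification. The next_stop helper is hypothetical and hard-coded for that specification:

```shell
# hypothetical helper for the spec 3,/7: given the current column (0-based),
# print the column where the next character lands after a tab
next_stop() {
  if [ "$1" -lt 3 ]; then
    echo 3
  else
    echo $(( $1 + 7 - $1 % 7 ))
  fi
}

# columns reached by the tabs in 'a\tb\tc\td\te' (each letter is 1 column wide)
col=1
for i in 1 2 3 4; do
  col=$(next_stop "$col")
  printf '%s ' "$col"
  col=$((col + 1))
done
echo
```

The loop prints the columns 3, 7, 14 and 21, which match the stops described in the first example above.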

If you use + instead of / as the prefix for the last width, the multiples are calculated using the second-to-last width as an offset.

# first tab character can expand till the 3rd column
# 3+7=10, so remaining tab characters can expand till 10/17/24/etc
$ printf 'a\tb\tc\td\te\tf\tg\n' | expand -t 3,+7
a  b      c      d      e      f      g

# first tab character can expand till the 3rd column
# second tab character can expand till the 7th column
# 7+5=12, so remaining tab characters can expand till 12/17/22/etc
$ printf 'a\tb\tc\td\te\tf\tg\n' | expand -t 3,7,+5
a  b   c    d    e    f    g
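For the + form, the multiples are counted from the last listed stop instead of from column 0. A tab at column col (0-based) moves to col + N - (col - last) % N, where last is the last listed stop. Here's a sketch for the 3,+7 specification. The next_stop_plus helper is hypothetical:

```shell
# hypothetical helper for the spec 3,+7: the last listed stop is 3 and
# further stops are placed every 7 columns after it (10, 17, 24, ...)
next_stop_plus() {
  if [ "$1" -lt 3 ]; then
    echo 3
  else
    echo $(( $1 + 7 - ($1 - 3) % 7 ))
  fi
}

# columns reached by the tabs in 'a\tb\tc\td\te' (each letter is 1 column wide)
col=1
for i in 1 2 3 4; do
  col=$(next_stop_plus "$col")
  printf '%s ' "$col"
  col=$((col + 1))
done
echo
```

The loop prints the columns 3, 10, 17 and 24, which match the stops described in the 3,+7 example above.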

Default unexpand

By default, the unexpand command converts only the initial blank characters (space or tab) to tabs, and the first non-blank character stops the conversion. Every 8 columns' worth of blanks is converted to a tab.

# input is 8 spaces followed by 'a' and then more characters
# the initial 8 spaces are converted to a tab character
# 'a' stops any further conversion, since it is a non-blank character
$ printf '        a       b       c\n' | unexpand | cat -T
^Ia       b       c

# input is 9 spaces followed by 'a' and then more characters
# the initial 8 spaces are converted to a tab character
# remaining space is left as is
$ printf '         a       b       c\n' | unexpand | cat -T
^I a       b       c

# expand produces 16 initial spaces, which get converted to two tabs
$ printf '\t\ta\tb\tc\n' | expand | unexpand | cat -T
^I^Ia       b       c

# input has 4 spaces and a tab character (that expands till the 8th column)
# output will have a single tab character at the start
$ printf '    \ta b\n' | unexpand | cat -T
^Ia b
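Both commands use the same default tab stops. As a result, if the tabs appear only at the start of lines, expanding and then unexpanding the text gives back the original. Here's a quick check on indentation similar to code.py seen earlier:

```shell
# tabs only at the start of lines: expand turns them into 8 or 16 spaces,
# and default unexpand turns those leading spaces back into tabs
original=$(printf '\tif x > y:\n\t\tprint(1)')
roundtrip=$(printf '%s\n' "$original" | expand | unexpand)
if [ "$roundtrip" = "$original" ]; then
  echo "round trip preserved the input"
fi
```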

Note: The current locale determines which characters are considered blanks. Also, unexpand concatenates multiple files passed as input, so you don't need cat for such cases.

Unexpand all blanks

The -a option converts all sequences of two or more blanks at tab boundaries, not just the initial ones. Here are some examples:

# default unexpand stops at the first non-blank character
$ printf '        a       b       c\n' | unexpand | cat -T
^Ia       b       c
# -a option will convert all sequences of blanks at tab boundaries
$ printf '        a       b       c\n' | unexpand -a | cat -T
^Ia^Ib^Ic

# only two or more consecutive blanks are considered for conversion
$ printf 'riddled reached\n' | unexpand -a | cat -T
riddled reached
$ printf 'riddle  reached\n' | unexpand -a | cat -T
riddle^Ireached

# blanks at non-tab boundaries won't be converted
$ printf 'oh  hi  hello\n' | unexpand -a | cat -T
oh  hi^Ihello

The unexpand command also considers backspace characters to determine the tab boundary.

# 'card' = 4 bytes, so the 4 spaces get converted to a tab
$ printf 'cart\bd    bard\n' | unexpand -a | cat -T
card^Ibard
$ printf 'cart\bd    bard\n' | unexpand -a | cat -t
cart^Hd^Ibard

Change the tab stop width

The -t option supports the same features as seen with the expand command. Using this option also implies the -a option.

Here's an example of changing the tab stop width to 2:

$ printf '\ta\n\t\tb\n' | expand -t 2
  a
    b

$ printf '\ta\n\t\tb\n' | expand -t 2 | unexpand -t 2 | cat -T
^Ia
^I^Ib
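Because -t turns on -a, blanks after the first non-blank character are converted as well. If you want a custom width but only the leading blanks converted, GNU unexpand provides the --first-only option, which overrides the implied -a. For example, with a tab stop width of 4:

```shell
# -t 4 implies -a: both runs of blanks end at tab stops and become tabs
printf '    a   b\n' | unexpand -t 4 | cat -T
# --first-only restricts the conversion to the leading blanks
printf '    a   b\n' | unexpand -t 4 --first-only | cat -T
```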

Here are some examples with multiple tab widths:

$ printf 'a\tb\tc\td\te\n' | expand -t 3,7
a  b   c d e
$ printf 'a  b   c d e\n' | unexpand -t 3,7 | cat -T
a^Ib^Ic d e
$ printf 'a\tb\tc\td\te\n' | expand -t 3,7 | unexpand -t 3,7 | cat -T
a^Ib^Ic d e

$ printf 'a\tb\tc\td\te\tf\n' | expand -t 3,/7
a  b   c      d      e      f
$ printf 'a  b   c      d      e      f\n' | unexpand -t 3,/7 | cat -T
a^Ib^Ic^Id^Ie^If

$ printf 'a\tb\tc\td\te\tf\n' | expand -t 3,+7
a  b      c      d      e      f
$ printf 'a  b      c      d      e      f\n' | unexpand -t 3,+7 | cat -T
a^Ib^Ic^Id^Ie^If
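A round trip through expand and unexpand with the same specification restores only the tabs that were expanded to reach a listed or computed stop. With 3,7, the tabs beyond the last stop became single spaces, and unexpand can't distinguish those from ordinary spaces. Here's a quick comparison. The roundtrip_ok helper is hypothetical:

```shell
input=$(printf 'a\tb\tc\td\te\tf')

# hypothetical helper: succeeds if expand followed by unexpand, both using
# the tab stop specification $1, gives back the original input
roundtrip_ok() {
  out=$(printf '%s\n' "$input" | expand -t "$1" | unexpand -t "$1")
  [ "$out" = "$input" ]
}

for spec in 3,7 3,/7 3,+7; do
  if roundtrip_ok "$spec"; then
    echo "$spec: original input restored"
  else
    echo "$spec: output differs from the original input"
  fi
done
```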

Exercises

Note: The exercises directory has all the files used in this section.

1) The items.txt file has space-separated words. Convert the spaces so that the words are aligned at tab stops 10 columns apart, as shown below.

$ cat items.txt
1) fruits
apple 5
banana 10
2) colors
green
sky blue
3) magical beasts
dragon 3
unicorn 42

##### add your solution here
1)        fruits
apple     5
banana    10
2)        colors
green
sky       blue
3)        magical   beasts
dragon    3
unicorn   42

2) What does the expand -i option do?

3) Expand the first tab character to stop at the 10th column and the second one at the 16th column. The rest of the tabs should be converted to a single space character.

$ printf 'app\tfix\tjoy\tmap\ttap\n' | ##### add your solution here
app       fix   joy map tap

$ printf 'appleseed\tfig\tjoy\n' | ##### add your solution here
appleseed fig   joy

$ printf 'a\tb\tc\td\te\n' | ##### add your solution here
a         b     c d e

4) Will the following code give back the original input? If not, is there an option that can help?

$ printf 'a\tb\tc\n' | expand | unexpand

5) How do the + and / prefix modifiers affect the -t option?