expand and unexpand

These two commands help you convert tabs to spaces and vice versa. Both support options to customize the tab stop positions and to choose which occurrences are converted.

Default expand

The expand command converts tab characters to space characters. By default, tab stops are placed every 8 columns, and columns are counted in bytes rather than characters.

# sample stdin data
$ printf 'apple\tbanana\tcherry\na\tb\tc\n' | cat -T
apple^Ibanana^Icherry
a^Ib^Ic
# 'apple' = 5 bytes, \t converts to 3 spaces
# 'banana' = 6 bytes, \t converts to 2 spaces
# 'a' and 'b' = 1 byte, \t converts to 7 spaces
$ printf 'apple\tbanana\tcherry\na\tb\tc\n' | expand
apple   banana  cherry
a       b       c

# 'αλε' = 6 bytes, \t converts to 2 spaces
$ printf 'αλε\tπού\n' | expand
αλε  πού

Here's an example with strings of size 7 and 8 bytes before the tab character:

$ printf 'deviate\treached\nbackdrop\toverhang\n' | expand
deviate reached
backdrop        overhang
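The byte counts in the comments above follow one rule: a tab that starts at byte offset n advances to the next multiple of 8, so it becomes 8 - n % 8 spaces. Here is a minimal shell sketch of that arithmetic. It assumes the text before the tab starts at the beginning of the line and contains no other tabs or backspaces, and tab_spaces is a hypothetical helper name, not part of coreutils.

```shell
#!/bin/sh
# Hypothetical helper: how many spaces default expand (stops every 8 columns)
# produces for a tab that follows the string $1 at the start of a line.
# Assumes $1 contains no tabs or backspaces.
tab_spaces() {
  # wc -c counts bytes, matching how expand measures columns
  len=$(printf '%s' "$1" | wc -c)
  echo $(( 8 - len % 8 ))
}

tab_spaces apple      # 3
tab_spaces deviate    # 1
tab_spaces backdrop   # 8
tab_spaces 'αλε'      # 2 (6 bytes when the script is saved as UTF-8)
```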

The expand command also accounts for backspace characters: each backspace moves the column position back by one when calculating the number of spaces needed.

# sample input with a backspace character
$ printf 'cart\bd\tbard\n' | cat -t
cart^Hd^Ibard

# 'card' = 4 bytes, \t converts to 4 spaces
$ printf 'cart\bd\tbard\n' | expand
card    bard
$ printf 'cart\bd\tbard\n' | expand | cat -t
cart^Hd    bard
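To make the column bookkeeping concrete, here is a minimal awk sketch of the default behavior. It assumes tab stops every 8 columns, byte-based columns (forced with LC_ALL=C), and that only \t and \b affect the column. It is an illustration of the rule, not the coreutils implementation, and expand_sketch is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical sketch of default expand: 8-column tab stops, columns counted
# in bytes, and each backspace moving the column back by one.
expand_sketch() {
  LC_ALL=C awk '{
    out = ""
    col = 0
    for (i = 1; i <= length($0); i++) {
      ch = substr($0, i, 1)
      if (ch == "\t") {
        # emit spaces until the next multiple of 8 is reached
        do { out = out " "; col++ } while (col % 8 != 0)
      } else if (ch == "\b") {
        # keep the backspace, but step the column back (never below 0)
        out = out ch
        if (col > 0) col--
      } else {
        out = out ch
        col++
      }
    }
    print out
  }' "$@"
}

printf 'cart\bd\tbard\n' | expand_sketch | cat -t    # cart^Hd    bard
printf 'apple\tbanana\tcherry\n' | expand_sketch     # apple   banana  cherry
```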

Note: expand will concatenate multiple files passed as input source, so cat will not be needed for such cases.

Expand only initial tabs

You can use the -i option to convert only the tab characters present at the start of a line. The first occurrence of a character that is not a tab or space character will stop the expansion.

# 'a' present at the start of line is not a tab/space character
# so no tabs are expanded for this input
$ printf 'a\tb\tc\n' | expand -i | cat -T
a^Ib^Ic

# the first \t gets expanded here, 'a' stops further expansion
$ printf '\ta\tb\tc\n' | expand -i | cat -T
        a^Ib^Ic

# the first two \t get expanded here, 'a' stops further expansion
# space characters do not stop the expansion
$ printf '\t \ta\tb\tc\n' | expand -i | cat -T
                a^Ib^Ic
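Since -i leaves alone any tab that comes after the first non-blank character, it can be combined with the -t option (described in the next section) to re-indent code while keeping tabs that are used for alignment inside a line. A small sketch, assuming GNU expand:

```shell
#!/bin/sh
# Leading tabs become 4 spaces each; the tab before '= 1' / '= 2' is kept
printf '\tx\t= 1\n\t\ty\t= 2\n' | expand -i -t 4 | cat -T
# expected:
#     x^I= 1
#         y^I= 2
```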

Customize tab stop width

You can use the -t option to control the tab stop positions. The default is 8, as seen in the previous examples.

This option provides various features. With a single number N, tab stops are placed every N columns. Here's an example with tab stops every 2 columns:

$ cat -T code.py
def compute(x, y):
^Iif x > y:
^I^Iprint('hello')
^Ielse:
^I^Iprint('bye')

$ expand -t 2 code.py
def compute(x, y):
  if x > y:
    print('hello')
  else:
    print('bye')

You can provide multiple tab stops separated by a comma character. These values refer to absolute positions from the start of the line, not the number of spaces a tab can expand to. Each tab advances to the first stop beyond the current column, and once the line has gone past the last stop, the remaining tab characters are replaced by a single space character.

# first tab character can expand till 3rd column
# second tab character can expand till 7th column
# rest of the tab characters will be expanded to single space
$ printf 'a\tb\tc\td\te\n' | expand -t 3,7
a  b   c d e

# here are two more examples with the same specification as above
# second tab expands to two spaces to end at 7th column
$ printf 'a\tbb\tc\td\te\n' | expand -t 3,7
a  bb  c d e
# second tab expands to single space since it goes beyond 7th column
$ printf 'a\tbbbbbbbb\tc\td\te\n' | expand -t 3,7
a  bbbbbbbb c d e
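The rule behind these examples is that a tab moves to the first stop strictly greater than the current column, and becomes a single space when no such stop remains. A hypothetical shell helper computing that width could look like the sketch below. It assumes columns count the bytes already on the line (so 'a' leaves the column at 1) and that the stops are given as in -t; tab_width_for_list is an illustrative name only.

```shell
#!/bin/sh
# Hypothetical helper: spaces produced by a tab when $1 bytes are already on
# the line and $2 is a comma-separated list of tab stops, like -t 3,7
tab_width_for_list() {
  col=$1
  old_ifs=$IFS
  IFS=,
  for stop in $2; do
    # the tab moves to the first stop beyond the current column
    if [ "$stop" -gt "$col" ]; then
      IFS=$old_ifs
      echo $(( stop - col ))
      return
    fi
  done
  IFS=$old_ifs
  # past the last stop: the tab becomes a single space
  echo 1
}

tab_width_for_list 1 3,7     # 2  ('a' then tab)
tab_width_for_list 5 3,7     # 2  ('a  bb' then tab)
tab_width_for_list 11 3,7    # 1  ('a  bbbbbbbb' then tab)
```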

If you prefix the last value with a / character, the stops after the last explicit one are placed at multiples of that value, instead of the remaining tabs defaulting to a single space.

# first tab character can expand till 3rd column
# remaining tab characters can expand till 7/14/21/etc
$ printf 'a\tb\tc\td\te\tf\tg\n' | expand -t 3,/7
a  b   c      d      e      f      g

# first tab character can expand till 3rd column
# second tab character can expand till 7th column
# remaining tab characters can expand till 10/15/20/etc
$ printf 'a\tb\tc\td\te\tf\tg\n' | expand -t 3,7,/5
a  b   c  d    e    f    g

If you use + instead of / as the prefix for the last value, the repeated stops are counted from the previous stop (the second last value) instead of from the start of the line.

# first tab character can expand till 3rd column
# 3+7=10, so remaining tab characters can expand till 10/17/24/etc
$ printf 'a\tb\tc\td\te\tf\tg\n' | expand -t 3,+7
a  b      c      d      e      f      g

# first tab character can expand till 3rd column
# second tab character can expand till 7th column
# 7+5=12, so remaining tab characters can expand till 12/17/22/etc
$ printf 'a\tb\tc\td\te\tf\tg\n' | expand -t 3,7,+5
a  b   c    d    e    f    g
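One way to check the / and + interpretations is to spell the repeated stops out as an explicit list and compare the outputs. A sketch, assuming a GNU expand recent enough to accept the / and + prefixes; sample is a hypothetical helper, and the explicit lists are only long enough for its seven fields.

```shell
#!/bin/sh
# hypothetical helper producing the same input as the examples above
sample() { printf 'a\tb\tc\td\te\tf\tg\n'; }

# 3,/7 repeats at multiples of 7 after the stop at 3: 3,7,14,21,28,35
a=$(sample | expand -t 3,/7)
b=$(sample | expand -t 3,7,14,21,28,35)
[ "$a" = "$b" ] && echo '/7 matches the explicit list'

# 3,+7 repeats every 7 columns counted from 3: 3,10,17,24,31,38
a=$(sample | expand -t 3,+7)
b=$(sample | expand -t 3,10,17,24,31,38)
[ "$a" = "$b" ] && echo '+7 matches the explicit list'
```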

Default unexpand

By default, the unexpand command converts only the initial blank (space or tab) characters to tabs; the first non-blank character stops the conversion. Tab stops are 8 columns apart by default, so each run of blanks that reaches a multiple of 8 is converted to a tab.

# input is 8 spaces followed by 'a' and then more characters
# the initial 8 spaces are converted to a tab character
# 'a' stops any further conversion, since it is a non-blank character
$ printf '        a       b       c\n' | unexpand | cat -T
^Ia       b       c

# input is 9 spaces followed by 'a' and then more characters
# the initial 8 spaces are converted to a tab character
# remaining space is left as is
$ printf '         a       b       c\n' | unexpand | cat -T
^I a       b       c

# after expand, the input has 16 initial spaces, which get converted to two tabs
$ printf '\t\ta\tb\tc\n' | expand | unexpand | cat -T
^I^Ia       b       c

# input has 4 spaces and a tab character (that expands till 8th column)
# output will have a single tab character at the start
$ printf '    \ta b\n' | unexpand | cat -T
^Ia b

Note: The current locale determines which characters are considered as blanks. Also, unexpand will concatenate multiple files passed as input source, so cat will not be needed for such cases.

Unexpand all blanks

The -a option will allow you to convert all sequences of two or more blanks at tab boundaries. Here are some examples:

# default unexpand stops at first non-blank character
$ printf '        a       b       c\n' | unexpand | cat -T
^Ia       b       c
# -a option will convert all sequences of blanks at tab boundaries
$ printf '        a       b       c\n' | unexpand -a | cat -T
^Ia^Ib^Ic

# only two or more consecutive blanks are considered for conversion
$ printf 'riddled reached\n' | unexpand -a | cat -T
riddled reached
$ printf 'riddle  reached\n' | unexpand -a | cat -T
riddle^Ireached

# blanks at non-tab boundaries won't be converted
$ printf 'oh  hi  hello\n' | unexpand -a | cat -T
oh  hi^Ihello

The unexpand command also considers backspace characters to determine the tab boundary.

# 'card' = 4 bytes, so the 4 spaces get converted to a tab
$ printf 'cart\bd    bard\n' | unexpand -a | cat -T
card^Ibard
$ printf 'cart\bd    bard\n' | unexpand -a | cat -t
cart^Hd^Ibard

Change tab stop width

The -t option has the same features as seen with the expand command. The -a option is also implied when this option is used.

Here's an example of changing the tab stop width to 2:

$ printf '\ta\n\t\tb\n' | expand -t 2
  a
    b

$ printf '\ta\n\t\tb\n' | expand -t 2 | unexpand -t 2 | cat -T
^Ia
^I^Ib
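If you want custom tab stops without the implied -a, GNU unexpand also provides the --first-only option, which converts only leading blanks and overrides -a. A sketch comparing both, assuming GNU coreutils:

```shell
#!/bin/sh
# -t 4 implies -a, so the 3 blanks ending at column 8 also become a tab
printf '    a   b\n' | unexpand -t 4 | cat -T               # ^Ia^Ib
# --first-only restores the default of converting only leading blanks
printf '    a   b\n' | unexpand -t 4 --first-only | cat -T  # ^Ia   b
```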

Here are some examples with multiple tab stops:

$ printf 'a\tb\tc\td\te\n' | expand -t 3,7
a  b   c d e
$ printf 'a  b   c d e\n' | unexpand -t 3,7 | cat -T
a^Ib^Ic d e
$ printf 'a\tb\tc\td\te\n' | expand -t 3,7 | unexpand -t 3,7 | cat -T
a^Ib^Ic d e

$ printf 'a\tb\tc\td\te\tf\n' | expand -t 3,/7
a  b   c      d      e      f
$ printf 'a  b   c      d      e      f\n' | unexpand -t 3,/7 | cat -T
a^Ib^Ic^Id^Ie^If

$ printf 'a\tb\tc\td\te\tf\n' | expand -t 3,+7
a  b      c      d      e      f
$ printf 'a  b      c      d      e      f\n' | unexpand -t 3,+7 | cat -T
a^Ib^Ic^Id^Ie^If
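Piping expand into unexpand is not always a lossless round trip. A tab that expanded to a single space stays a space, because unexpand does not convert a lone blank (as seen with the riddled example earlier). A quick check, assuming GNU coreutils:

```shell
#!/bin/sh
# 'deviate' is 7 bytes, so the tab expands to a single space, which stays a space
printf 'deviate\treached\n' | expand | unexpand -a | cat -T   # deviate reached
# 'riddle' is 6 bytes, so the tab expands to 2 spaces and is restored
printf 'riddle\treached\n' | expand | unexpand -a | cat -T    # riddle^Ireached
```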