expand and unexpand
These two commands will help you convert tabs to spaces and vice versa. Both these commands support options to customize the width of tab stops and which occurrences should be converted.
Default expand
The expand command converts tab characters to space characters. The default expansion aligns at multiples of 8 columns (calculated in terms of bytes).
# sample stdin data
$ printf 'apple\tbanana\tcherry\na\tb\tc\n' | cat -T
apple^Ibanana^Icherry
a^Ib^Ic
# 'apple' = 5 bytes, \t converts to 3 spaces
# 'banana' = 6 bytes, \t converts to 2 spaces
# 'a' and 'b' = 1 byte, \t converts to 7 spaces
$ printf 'apple\tbanana\tcherry\na\tb\tc\n' | expand
apple   banana  cherry
a       b       c
# 'αλε' = 6 bytes, \t converts to 2 spaces
$ printf 'αλε\tπού\n' | expand
αλε  πού
Here's an example with strings of size 7 and 8 bytes before the tab character:
$ printf 'deviate\treached\nbackdrop\toverhang\n' | expand
deviate reached
backdrop        overhang
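The arithmetic in the comments above can be sketched as a small shell function. The name tab_fill is hypothetical (it is not part of coreutils), and the sketch assumes single-byte characters and evenly spaced tab stops:

```shell
# hypothetical helper, not part of coreutils
# $1 = number of bytes before the tab, $2 = tab stop width (defaults to 8)
tab_fill() {
    echo $(( ${2:-8} - $1 % ${2:-8} ))
}

tab_fill 5    # 'apple'    -> 3
tab_fill 6    # 'banana'   -> 2
tab_fill 7    # 'deviate'  -> 1
tab_fill 8    # 'backdrop' -> 8
```

A string that already ends on a tab stop, such as 'backdrop', gets a full width of spaces rather than none.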
The expand command also considers backspace characters to determine the number of spaces needed.
# sample input with a backspace character
$ printf 'cart\bd\tbard\n' | cat -t
cart^Hd^Ibard
# 'card' = 4 bytes, \t converts to 4 spaces
$ printf 'cart\bd\tbard\n' | expand
card    bard
$ printf 'cart\bd\tbard\n' | expand | cat -t
cart^Hd    bard
expand will concatenate multiple files passed as input source, so cat will not be needed for such cases.
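Here's a minimal sketch of passing more than one file. The file names and contents are made up for illustration and live in a temporary directory:

```shell
# temporary files used only for this illustration
tmpdir=$(mktemp -d)
printf 'a\tb\n' > "$tmpdir/f1.txt"
printf 'c\td\n' > "$tmpdir/f2.txt"

# both files are expanded and written to stdout one after the other
combined=$(expand "$tmpdir/f1.txt" "$tmpdir/f2.txt")
printf '%s\n' "$combined"

rm -r "$tmpdir"
```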
Expand only the initial tabs
You can use the -i option to convert only the tab characters present at the start of a line. The first occurrence of a character that is not a tab or a space will stop the expansion.
# 'a' present at the start of line is not a tab/space character
# so no tabs are expanded for this input
$ printf 'a\tb\tc\n' | expand -i | cat -T
a^Ib^Ic
# the first \t gets expanded here, 'a' stops further expansion
$ printf '\ta\tb\tc\n' | expand -i | cat -T
        a^Ib^Ic
# first two \t get expanded here, 'a' stops further expansion
# presence of space characters will not stop the expansion
$ printf '\t \ta\tb\tc\n' | expand -i | cat -T
                a^Ib^Ic
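One way to confirm that only the leading tabs were touched is to count the tab characters left in the output. This is a small sketch with made-up input; the variable names are arbitrary:

```shell
# the two leading tabs are expanded, the tab after 'x' is left alone
indented=$(printf '\t\tx\ty\n' | expand -i)
printf '%s\n' "$indented" | cat -T

# count the tab characters that survived the conversion
remaining=$(( $(printf '%s\n' "$indented" | tr -cd '\t' | wc -c) ))
echo "$remaining"    # 1
```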
Customize the tab stop width
You can use the -t option to control the expansion width. The default is 8, as seen in the previous examples.
This option provides various features. Here's an example where all the tab characters are converted equally to the given width:
$ cat -T code.py
def compute(x, y):
^Iif x > y:
^I^Iprint('hello')
^Ielse:
^I^Iprint('bye')
$ expand -t 2 code.py
def compute(x, y):
  if x > y:
    print('hello')
  else:
    print('bye')
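The same idea works with any width. Here's a sketch using a width of 4, with made-up inline input standing in for a file:

```shell
# made-up sample input with one and two leading tabs
four=$(printf 'if x:\n\tif y:\n\t\tpass\n' | expand -t 4)
printf '%s\n' "$four"
```

Each nesting level ends up indented by 4 spaces instead of a tab.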
You can provide multiple widths separated by a comma character. In such a case, the given widths determine the stop locations for that many tab characters. These stop values refer to absolute positions from the start of the line, not the number of spaces they can expand to. The rest of the tab characters will be expanded to a single space character each.
# first tab character can expand till the 3rd column
# second tab character can expand till the 7th column
# rest of the tab characters will be expanded to a single space
$ printf 'a\tb\tc\td\te\n' | expand -t 3,7
a  b   c d e
# here are two more examples with the same specification as above
# second tab expands to two spaces to end at the 7th column
$ printf 'a\tbb\tc\td\te\n' | expand -t 3,7
a  bb  c d e
# second tab expands to a single space since it goes beyond the 7th column
$ printf 'a\tbbbbbbbb\tc\td\te\n' | expand -t 3,7
a  bbbbbbbb c d e
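To see that the stops are absolute positions, you can print where each field starts. This sketch uses awk's index function, which counts from 1, so a field that follows a stop at column 3 starts at position 4:

```shell
# 1-based positions of 'b', 'c' and 'd' after expansion
positions=$(printf 'a\tb\tc\td\te\n' | expand -t 3,7 |
    awk '{print index($0, "b"), index($0, "c"), index($0, "d")}')
echo "$positions"    # 4 8 10
```

Here 'b' and 'c' start right after the stops at 3 and 7, while 'd' is just one space after 'c' since no stops are left.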
If you prefix a / character to the last width, the remaining tab characters will stop at multiples of this value instead of expanding to the single space default.
# first tab character can expand till the 3rd column
# remaining tab characters can expand till 7/14/21/etc
$ printf 'a\tb\tc\td\te\tf\tg\n' | expand -t 3,/7
a  b   c      d      e      f      g
# first tab character can expand till the 3rd column
# second tab character can expand till the 7th column
# remaining tab characters can expand till 10/15/20/etc
$ printf 'a\tb\tc\td\te\tf\tg\n' | expand -t 3,7,/5
a  b   c  d    e    f    g
If you use + instead of / as the prefix for the last width, the multiple calculation will use the second last width as an offset.
# first tab character can expand till the 3rd column
# 3+7=10, so remaining tab characters can expand till 10/17/24/etc
$ printf 'a\tb\tc\td\te\tf\tg\n' | expand -t 3,+7
a  b      c      d      e      f      g
# first tab character can expand till the 3rd column
# second tab character can expand till the 7th column
# 7+5=12, so remaining tab characters can expand till 12/17/22/etc
$ printf 'a\tb\tc\td\te\tf\tg\n' | expand -t 3,7,+5
a  b   c    d    e    f    g
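The difference between the two prefixes comes down to where the repeating stops begin. Here's a rough sketch of that arithmetic, matching the stop values given in the comments above. The function name extra_stops and its argument order are hypothetical:

```shell
# hypothetical helper, not part of coreutils
# usage: extra_stops LAST_EXPLICIT_STOP PREFIX N COUNT
# prints the first COUNT stops generated by a trailing /N or +N entry
extra_stops() {
    last=$1; prefix=$2; n=$3; count=$4
    if [ "$prefix" = "/" ]; then
        stop=$(( (last / n + 1) * n ))   # next multiple of N beyond the last stop
    else
        stop=$(( last + n ))             # N added to the last explicit stop
    fi
    out=
    while [ "$count" -gt 0 ]; do
        out="$out${out:+ }$stop"
        stop=$(( stop + n ))
        count=$(( count - 1 ))
    done
    echo "$out"
}

extra_stops 3 / 7 3    # 7 14 21
extra_stops 7 / 5 3    # 10 15 20
extra_stops 3 + 7 3    # 10 17 24
extra_stops 7 + 5 3    # 12 17 22
```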
Default unexpand
By default, the unexpand command converts initial blank characters (space or tab) to tabs. The first occurrence of a non-blank character will stop the conversion. By default, every 8 columns worth of blanks is converted to a tab.
# input is 8 spaces followed by 'a' and then more characters
# the initial 8 spaces are converted to a tab character
# 'a' stops any further conversion, since it is a non-blank character
$ printf '        a       b       c\n' | unexpand | cat -T
^Ia       b       c
# input is 9 spaces followed by 'a' and then more characters
# the initial 8 spaces are converted to a tab character
# remaining space is left as is
$ printf '         a       b       c\n' | unexpand | cat -T
^I a       b       c
# input has 16 initial spaces, gets converted to two tabs
$ printf '\t\ta\tb\tc\n' | expand | unexpand | cat -T
^I^Ia       b       c
# input has 4 spaces and a tab character (that expands till the 8th column)
# output will have a single tab character at the start
$ printf '    \ta b\n' | unexpand | cat -T
^Ia b
The current locale determines which characters are considered as blanks. Also, unexpand will concatenate multiple files passed as input source, so cat will not be needed for such cases.
Unexpand all blanks
The -a option will allow you to convert all sequences of two or more blanks at tab boundaries. Here are some examples:
# default unexpand stops at the first non-blank character
$ printf '        a       b       c\n' | unexpand | cat -T
^Ia       b       c
# -a option will convert all sequences of blanks at tab boundaries
$ printf '        a       b       c\n' | unexpand -a | cat -T
^Ia^Ib^Ic
# only two or more consecutive blanks are considered for conversion
$ printf 'riddled reached\n' | unexpand -a | cat -T
riddled reached
$ printf 'riddle  reached\n' | unexpand -a | cat -T
riddle^Ireached
# blanks at non-tab boundaries won't be converted
$ printf 'oh  hi  hello\n' | unexpand -a | cat -T
oh  hi^Ihello
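Comparing byte counts is a quick way to see how much each mode converts. This is a small sketch with arbitrary variable names, using 8 spaces before 'a' and 7 spaces before each of 'b' and 'c':

```shell
input='        a       b       c'

# byte counts of the original, default unexpand and unexpand -a outputs
before=$(( $(printf '%s\n' "$input" | wc -c) ))
leading=$(( $(printf '%s\n' "$input" | unexpand | wc -c) ))
all=$(( $(printf '%s\n' "$input" | unexpand -a | wc -c) ))

echo "$before $leading $all"    # 26 19 7
```

The default mode saves 7 bytes by replacing the 8 leading spaces with one tab, and -a replaces each of the other two runs as well.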
The unexpand command also considers backspace characters to determine the tab boundary.
# 'card' = 4 bytes, so the 4 spaces get converted to a tab
$ printf 'cart\bd    bard\n' | unexpand -a | cat -T
card^Ibard
$ printf 'cart\bd    bard\n' | unexpand -a | cat -t
cart^Hd^Ibard
Change the tab stop width
The -t option has the same features as seen with the expand command. The -a option is also implied when this option is used.
Here's an example of changing the tab stop width to 2:
$ printf '\ta\n\t\tb\n' | expand -t 2
  a
    b
$ printf '\ta\n\t\tb\n' | expand -t 2 | unexpand -t 2 | cat -T
^Ia
^I^Ib
Here are some examples with multiple tab widths:
$ printf 'a\tb\tc\td\te\n' | expand -t 3,7
a  b   c d e
$ printf 'a  b   c d e\n' | unexpand -t 3,7 | cat -T
a^Ib^Ic d e
$ printf 'a\tb\tc\td\te\n' | expand -t 3,7 | unexpand -t 3,7 | cat -T
a^Ib^Ic d e
$ printf 'a\tb\tc\td\te\tf\n' | expand -t 3,/7
a  b   c      d      e      f
$ printf 'a  b   c      d      e      f\n' | unexpand -t 3,/7 | cat -T
a^Ib^Ic^Id^Ie^If
$ printf 'a\tb\tc\td\te\tf\n' | expand -t 3,+7
a  b      c      d      e      f
$ printf 'a  b      c      d      e      f\n' | unexpand -t 3,+7 | cat -T
a^Ib^Ic^Id^Ie^If
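Since -t implies -a, a custom width would normally convert blanks throughout the line. GNU unexpand also provides a --first-only option that overrides -a. The sketch below assumes the GNU version and uses made-up input:

```shell
# -t 4 on its own implies -a, so blanks after 'a' are also considered
printf '    a    b\n' | unexpand -t 4 | cat -T

# --first-only restricts the conversion to the leading blanks again
first=$(printf '    a    b\n' | unexpand -t 4 --first-only)
printf '%s\n' "$first" | cat -T
```

With --first-only, the four spaces between 'a' and 'b' are left as is and only the leading blanks become a tab.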
Exercises
The exercises directory has all the files used in this section.
1) The items.txt file has space separated words. Convert the spaces to be aligned at 10 column widths as shown below.
$ cat items.txt
1) fruits
apple 5
banana 10
2) colors
green
sky blue
3) magical beasts
dragon 3
unicorn 42
##### add your solution here
1)        fruits
apple     5
banana    10
2)        colors
green
sky       blue
3)        magical   beasts
dragon    3
unicorn   42
2) What does the expand -i option do?
3) Expand the first tab character to stop at the 10th column and the second one at the 16th column. The rest of the tabs should be converted to a single space character each.
$ printf 'app\tfix\tjoy\tmap\ttap\n' | ##### add your solution here
app       fix   joy map tap
$ printf 'appleseed\tfig\tjoy\n' | ##### add your solution here
appleseed fig   joy
$ printf 'a\tb\tc\td\te\n' | ##### add your solution here
a         b     c d e
4) Will the following code give back the original input? If not, is there an option that can help?
$ printf 'a\tb\tc\n' | expand | unexpand
5) How do the + and / prefix modifiers affect the -t option?