split
The split command is useful to divide the input into smaller parts based on the number of lines, bytes, file size, etc. You can also execute another command on the divided parts before saving the results. An example use case is sending a large file as multiple parts as a workaround for online transfer size limits.
Since a lot of output files will be generated in this chapter (often with the same filenames), remove these files after every illustration.
Default split
By default, the split command divides the input 1000 lines at a time. The newline character is the default line separator. You can pass a single file or stdin data as the input. Use cat if you need to concatenate multiple input sources.
By default, the output files will be named xaa, xab, xac and so on (where x is the prefix). If the filenames are exhausted, two more letters will be appended and the pattern will continue as needed. If the number of input lines is not evenly divisible, the last file will contain fewer than 1000 lines.
# divide input 1000 lines at a time
$ seq 10000 | split
# output filenames
$ ls x*
xaa xab xac xad xae xaf xag xah xai xaj
# preview of some of the output files
$ head -n1 xaa xab xae xaj
==> xaa <==
1
==> xab <==
1001
==> xae <==
4001
==> xaj <==
9001
$ rm x*
As mentioned earlier, remove the output files after every illustration.
Change number of lines
You can use the -l option to change the number of lines to be saved in each output file.
# maximum of 3 lines at a time
$ split -l3 purchases.txt
$ head x*
==> xaa <==
coffee
tea
washing powder
==> xab <==
coffee
toothpaste
tea
==> xac <==
soap
tea
Split by byte count
The -b option allows you to split the input by the number of bytes. Similar to line-based splitting, you can always reconstruct the input by concatenating the output files. This option also accepts suffixes such as K for 1024 bytes, KB for 1000 bytes, M for 1024 * 1024 bytes and so on.
# maximum of 15 bytes at a time
$ split -b15 greeting.txt
$ head x*
==> xaa <==
Hi there
Have a
==> xab <==
nice day
# when you concatenate the output files, you'll get the original input
$ cat x*
Hi there
Have a nice day
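The reconstruction claim can also be checked mechanically instead of by eye. Here's a minimal sketch, assuming GNU split and a scratch directory created with mktemp. The names big.txt, joined.txt and the part_ prefix are made up for this illustration (passing a prefix after the input file is covered in the section on customizing filenames).

```shell
# work in a scratch directory so that stray output files are easy to remove
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# sample input (3893 bytes)
seq 1000 > big.txt

# parts of at most 1024 bytes each, with 'part_' as the filename prefix
split -b1K big.txt part_

# the shell expands part_* in sorted order, which matches the order of the parts
cat part_* > joined.txt

# cmp -s prints nothing and exits with status 0 when the files are identical
if cmp -s big.txt joined.txt; then
    roundtrip='identical'
else
    roundtrip='different'
fi
echo "$roundtrip"

parts=$(ls part_* | wc -l)
echo "number of parts: $parts"

cd / && rm -r "$tmpdir"
```

The same check applies to the large file transfer use case: the receiver concatenates the parts in filename order and compares the result (or a checksum of it) against the original.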
The -C option is similar to the -b option, but it will try to break on line boundaries if possible. Each output file will hold as many complete lines as can fit within the given byte limit. Here's an example where input lines do not exceed the given byte limit:
$ split -C20 purchases.txt
$ head x*
==> xaa <==
coffee
tea
==> xab <==
washing powder
==> xac <==
coffee
toothpaste
==> xad <==
tea
soap
tea
$ wc -c x*
11 xaa
15 xab
18 xac
13 xad
57 total
If a line exceeds the given limit, it will be broken down into multiple parts:
$ printf 'apple\nbanana\n' | split -C4
$ head x*
==> xaa <==
appl
==> xab <==
e
==> xac <==
bana
==> xad <==
na
$ cat x*
apple
banana
Divide based on file size
The -n option has several features. If you pass only a numeric argument N, the given input file will be divided into N chunks. The output files will be roughly the same size.
# divide the file into 2 parts
$ split -n2 purchases.txt
$ head x*
==> xaa <==
coffee
tea
washing powder
co
==> xab <==
ffee
toothpaste
tea
soap
tea
# the two output files are roughly the same size
$ wc x*
3 5 28 xaa
5 5 29 xab
8 10 57 total
Since the division is based on file size, stdin data cannot be used. Newer versions of the coreutils package support this use case by creating a temporary file before splitting.
$ seq 6 | split -n2
split: -: cannot determine file size
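On versions that report this error, one workaround is to save the data to a regular file yourself and split that file. Here's a minimal sketch, assuming GNU split. The filename numbers.txt is made up for this illustration.

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# save the stdin data to a regular file so that its size is known
seq 6 > numbers.txt

# the 12 byte file can now be divided into 2 chunks
split -n2 numbers.txt

first=$(cat xaa)
second=$(cat xab)
printf 'xaa:\n%s\nxab:\n%s\n' "$first" "$second"

cd / && rm -r "$tmpdir"
```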
By using K/N as the argument, you can view the Kth chunk of N parts on stdout. No output file will be created in this scenario.
# divide the input into 2 parts
# view only the 1st chunk on stdout
$ split -n1/2 greeting.txt
Hi there
Hav
To avoid splitting a line, use l/ as a prefix. Quoting from the manual:
For l mode, chunks are approximately input size / N. The input is partitioned into N equal sized portions, with the last assigned any excess. If a line starts within a partition it is written completely to the corresponding file. Since lines or records are not split even if they overlap a partition, the files written can be larger or smaller than the partition size, and even empty if a line/record is so long as to completely overlap the partition.
# divide input into 2 parts, but don't split lines
$ split -nl/2 purchases.txt
$ head x*
==> xaa <==
coffee
tea
washing powder
coffee
==> xab <==
toothpaste
tea
soap
tea
Here's an example to view the Kth chunk without splitting lines:
# 2nd chunk of 3 parts without splitting lines
$ split -nl/2/3 sample.txt
7) Believe it
8)
9) banana
10) papaya
11) mango
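Since the l/K/N form writes to stdout, a loop over K lets you process each chunk without creating any output files. Here's a minimal sketch, assuming GNU split. The filename list.txt and the variable names are made up for this illustration.

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

printf 'coffee\ntea\nwashing powder\ncoffee\ntoothpaste\ntea\nsoap\ntea\n' > list.txt
original=$(cat list.txt)

# view each of the 3 chunks in turn, no output files are created
n=3
for k in $(seq "$n"); do
    echo "--- chunk $k of $n ---"
    split -n "l/$k/$n" list.txt
done

# every line lands in exactly one chunk, so joining the chunks
# in order gives back the original content
joined=$(for k in $(seq "$n"); do split -n "l/$k/$n" list.txt; done)
[ "$joined" = "$original" ] && echo 'chunks cover the whole file'

cd / && rm -r "$tmpdir"
```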
Interleaved lines
The -n option will also help you create output files with interleaved lines. Since this is based on the line separator and not file size, stdin data can also be used. Use the r/ prefix to enable this feature.
# two parts, lines distributed in round robin fashion
$ seq 5 | split -nr/2
$ head x*
==> xaa <==
1
3
5
==> xab <==
2
4
Here's an example to view the Kth chunk:
$ split -nr/1/3 sample.txt
1) Hello World
4) How are you
7) Believe it
10) papaya
13) Much ado about nothing
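The r/K/N form works with stdin data as well. Here's a minimal sketch, assuming GNU split, that views one round robin part and then confirms that the three parts together account for every input line. The variable names are made up for this illustration.

```shell
# lines 2 and 5 belong to the 2nd of 3 round robin parts
second=$(seq 7 | split -n r/2/3)
echo "$second"

# collecting all the parts gives every line exactly once,
# though not in the original order
all=$(for k in 1 2 3; do seq 7 | split -n "r/$k/3"; done)
sorted=$(printf '%s\n' "$all" | sort -n)
echo "$sorted"
```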
Custom line separator
You can use the -t option to specify a single-byte character as the line separator. Use \0 to specify NUL as the separator. Depending on your shell, you can use ANSI-C quoting to use escapes like \t instead of a literal tab character.
$ printf 'apple\nbanana\n;mango\npapaya\n' | split -t';' -l1
$ head x*
==> xaa <==
apple
banana
;
==> xab <==
mango
papaya
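Here's a minimal sketch of the NUL separator case, assuming GNU split. The records are generated with printf, and tr is used only to make the NUL characters visible as newlines when displaying the results.

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# three NUL-terminated records, two records per output file
printf 'apple\0banana\0mango\0' | split -t '\0' -l2

# convert NUL to newline for display purposes only
first=$(tr '\0' '\n' < xaa)
second=$(tr '\0' '\n' < xab)
printf 'xaa:\n%s\nxab:\n%s\n' "$first" "$second"

cd / && rm -r "$tmpdir"
```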
Customize filenames
As seen earlier, x is the default prefix for output filenames. To change this prefix, pass an argument after the input source.
# choose prefix as 'op_' instead of 'x'
$ split -l1 greeting.txt op_
$ head op_*
==> op_aa <==
Hi there
==> op_ab <==
Have a nice day
The -a option controls the length of the suffix. You'll get an error if this length isn't enough to cover all the output files. In such a case, you'll still get output files that can fit within the given length.
$ seq 10 | split -l1 -a1
$ ls x*
xa xb xc xd xe xf xg xh xi xj
$ rm x*
$ seq 10 | split -l1 -a3
$ ls x*
xaaa xaab xaac xaad xaae xaaf xaag xaah xaai xaaj
$ rm x*
$ seq 100 | split -l1 -a1
split: output file suffixes exhausted
$ ls x*
xa xc xe xg xi xk xm xo xq xs xu xw xy
xb xd xf xh xj xl xn xp xr xt xv xx xz
$ rm x*
You can use the -d option to use numeric suffixes, starting from 00 (length can be changed using the -a option). You can use the long option --numeric-suffixes to specify a different starting number.
$ seq 10 | split -l1 -d
$ ls x*
x00 x01 x02 x03 x04 x05 x06 x07 x08 x09
$ rm x*
$ seq 10 | split -l2 --numeric-suffixes=10
$ ls x*
x10 x11 x12 x13 x14
Use the -x and --hex-suffixes options for hexadecimal numbering.
$ seq 10 | split -l1 --hex-suffixes=8
$ ls x*
x08 x09 x0a x0b x0c x0d x0e x0f x10 x11
You can use the --additional-suffix option to add a constant string at the end of filenames.
$ seq 10 | split -l2 -a1 --additional-suffix='.log'
$ ls x*
xa.log xb.log xc.log xd.log xe.log
$ rm x*
$ seq 10 | split -l2 -a1 -d --additional-suffix='.txt' - num_
$ ls num_*
num_0.txt num_1.txt num_2.txt num_3.txt num_4.txt
Exclude empty files
You can sometimes end up with empty files, for example, when trying to split into more parts than possible with the given criteria. In such cases, you can use the -e option to prevent empty files in the output. The split command will ensure that the filenames are sequential even if files in the middle are empty.
# 'xac' is empty in this example
$ split -nl/3 greeting.txt
$ head x*
==> xaa <==
Hi there
==> xab <==
Have a nice day
==> xac <==
$ rm x*
# prevent empty files
$ split -e -nl/3 greeting.txt
$ head x*
==> xaa <==
Hi there
==> xab <==
Have a nice day
Process parts through another command
The --filter option will allow you to apply another command on the intermediate split results before saving the output files. Use $FILE to refer to the output filename of the intermediate parts. Here's an example of compressing the results:
$ split -l1 --filter='gzip > $FILE.gz' greeting.txt
$ ls x*
xaa.gz xab.gz
$ zcat xaa.gz
Hi there
$ zcat xab.gz
Have a nice day
Here's an example of ignoring the first line of the results:
$ cat body_sep.txt
%=%=
apple
banana
%=%=
red
green
$ split -l3 --filter='tail -n +2 > $FILE' body_sep.txt
$ head x*
==> xaa <==
apple
banana
==> xab <==
red
green
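The filter command doesn't have to write to $FILE at all. Here's a minimal sketch, assuming GNU split, where the filter only reports the number of lines received for each would-be output file. The variable name report is made up for this illustration.

```shell
# single quotes stop the current shell from expanding $FILE,
# so the shell started by split for each part sees the variable instead
report=$(seq 5 | split -l2 --filter='echo "$FILE has $(wc -l) line(s)"')
echo "$report"
```

Since the filter in this sketch writes only to stdout, no output files are created. If the filter were wrapped in double quotes, the current shell would expand $FILE (most likely to an empty string) before split gets to run.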
Exercises
The exercises directory has all the files used in this section.
Remove the output files after every exercise.
1) Split the s1.txt file 3 lines at a time.
##### add your solution here
$ head xa?
==> xaa <==
apple
coffee
fig
==> xab <==
honey
mango
pasta
==> xac <==
sugar
tea
$ rm xa?
2) Use appropriate options to get the output shown below.
$ echo 'apple,banana,cherry,dates' | ##### add your solution here
$ head xa?
==> xaa <==
apple,
==> xab <==
banana,
==> xac <==
cherry,
==> xad <==
dates
$ rm xa?
3) What do the -b and -C options do?
4) Display the 2nd chunk of the ip.txt file after splitting it into 4 parts as shown below.
##### add your solution here
come back before the sky turns dark
There are so many delights to cherish
5) What does the r prefix do when used with the -n option?
6) Split the ip.txt file 2 lines at a time. Customize the output filenames as shown below.
##### add your solution here
$ head ip_*
==> ip_0.txt <==
it is a warm and cozy day
listen to what I say
==> ip_1.txt <==
go play in the park
come back before the sky turns dark
==> ip_2.txt <==
There are so many delights to cherish
==> ip_3.txt <==
Apple, Banana and Cherry
Bread, Butter and Jelly
==> ip_4.txt <==
Try them all before you perish
$ rm ip_*
7) Which option would you use to prevent empty files in the output?
8) Split the items.txt file 5 lines at a time. Additionally, remove lines starting with a digit character as shown below.
$ cat items.txt
1) fruits
apple 5
banana 10
2) colors
green
sky blue
3) magical beasts
dragon 3
unicorn 42
##### add your solution here
$ head xa?
==> xaa <==
apple 5
banana 10
green
==> xab <==
sky blue
dragon 3
unicorn 42
$ rm xa?