split command is useful to divide the input into smaller parts based on number of lines, bytes, file size, etc. You can also execute another command on the divided parts before saving the results. An example use case is sending a large file as multiple parts as a workaround for online transfer size limits.
Since a lot of output files will be generated in this chapter (often with same filenames), remove these files after every illustration.
By default, the split command divides the input 1000 lines at a time. Newline character is the default line separator. You can pass a single file or stdin data as the input. Use cat if you need to concatenate multiple input sources.
By default, the output files will be named xaa, xab, xac and so on (where x is the prefix). If the filenames are exhausted, two more letters will be appended and the pattern will continue as needed. If the number of input lines is not evenly divisible, the last file will contain less than 1000 lines.
# divide input 1000 lines at a time
$ seq 10000 | split

# output filenames
$ ls x*
xaa  xab  xac  xad  xae  xaf  xag  xah  xai  xaj

# preview of some of the output files
$ head -n1 xaa xab xae xaj
==> xaa <==
1

==> xab <==
1001

==> xae <==
4001

==> xaj <==
9001

$ rm x*
As mentioned earlier, remove the output files after every illustration.
You can use the -l option to change the number of lines to be saved in each output file.
# maximum of 3 lines at a time
$ split -l3 purchases.txt

$ head x*
==> xaa <==
coffee
tea
washing powder

==> xab <==
coffee
toothpaste
tea

==> xac <==
soap
tea
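The number of output files for line-based splitting is the line count divided by the -l value, rounded up. Here's a minimal sketch of that arithmetic, assuming GNU split; the scratch directory and the nums.txt file are made up for this illustration.

```shell
# scratch directory and sample file are assumptions for this sketch
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 10 > nums.txt

# ceiling division: 10 lines, at most 3 lines per file
lines=$(wc -l < nums.txt)
per_file=3
expected=$(( (lines + per_file - 1) / per_file ))

split -l"$per_file" nums.txt
actual=$(ls x* | wc -l)
echo "expected: $expected, actual: $actual"
```

The last file (xad here) holds the single leftover line.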
-b option allows you to split the input by number of bytes. Similar to line based splitting, you can always reconstruct the input by concatenating the output files. This option also accepts suffixes such as K for 1024 bytes, KB for 1000 bytes, M for 1024 * 1024 bytes and so on.
# maximum of 15 bytes at a time
$ split -b15 greeting.txt

$ head x*
==> xaa <==
Hi there
Have a
==> xab <==
 nice day

# when you concatenate the output files, you'll get the original input
$ cat x*
Hi there
Have a nice day
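For the transfer size limit use case, you'd typically want to confirm that the reassembled file matches the original. Here's a rough sketch using a size suffix and cmp, assuming GNU split; the big.txt and joined.txt filenames are placeholders chosen for this example.

```shell
# scratch directory and filenames are assumptions for this sketch
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 100000 > big.txt

# at most 100 * 1024 bytes per output file
split -b100K big.txt

# the shell expands x* in sorted order, which matches the split order
cat x* > joined.txt

if cmp -s big.txt joined.txt; then
    echo "round trip OK"
else
    echo "mismatch"
fi
```

The default suffixes sort correctly with shell globs, so the parts are concatenated in the order they were written.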
-C option is similar to the -b option, but it will try to break on line boundaries if possible, so that each output file holds as many complete lines as fit within the given byte limit. Here's an example where input lines do not exceed the given byte limit:
$ split -C20 purchases.txt

$ head x*
==> xaa <==
coffee
tea

==> xab <==
washing powder

==> xac <==
coffee
toothpaste

==> xad <==
tea
soap
tea

$ wc -c x*
11 xaa
15 xab
18 xac
13 xad
57 total
If a line exceeds the given limit, it will be broken down into multiple parts:
$ printf 'apple\nbanana\n' | split -C4

$ head x*
==> xaa <==
appl
==> xab <==
e

==> xac <==
bana
==> xad <==
na

$ cat x*
apple
banana
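If you want to confirm that no output file crosses the byte limit, you can check the sizes in a loop. This is only a sketch, assuming GNU split; the items.txt file and the variable names are invented for this illustration.

```shell
# scratch directory and sample file are assumptions for this sketch
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'coffee\ntea\nwashing powder\n' > items.txt

limit=20
split -C"$limit" items.txt

# count the parts that are larger than the limit
too_big=0
for f in x*; do
    size=$(wc -c < "$f")
    if [ "$size" -gt "$limit" ]; then
        too_big=$((too_big + 1))
    fi
done
echo "parts over $limit bytes: $too_big"
```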
-n option has several features. If you pass only a numeric argument N, the given input file will be divided into N chunks. The output files will be roughly the same size.
# divide the file into 2 parts
$ split -n2 purchases.txt
$ head x*
==> xaa <==
coffee
tea
washing powder
co
==> xab <==
ffee
toothpaste
tea
soap
tea

# the two output files are roughly the same size
$ wc x*
 3  5 28 xaa
 5  5 29 xab
 8 10 57 total
Since the division is based on file size, stdin data cannot be used.
$ seq 6 | split -n2
split: -: cannot determine file size
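If you see this error, one possible workaround is to save the stream to a regular file first, so that the size is known. Here's a minimal sketch, assuming GNU split; saved_input.txt is a name made up for this example.

```shell
# scratch directory and filename are assumptions for this sketch
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# save the stream to a regular file so that its size can be determined
seq 6 > saved_input.txt
split -n2 saved_input.txt

head x*
```

The input here is 12 bytes, so each of the two output files gets 6 bytes (three lines each).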
If you pass K/N as the argument, you can view the Kth chunk of N parts on stdout. No output file will be created in this scenario.
# divide the input into 2 parts
# view only the 1st chunk on stdout
$ split -n1/2 greeting.txt
Hi there
Hav
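Since the K/N form writes to stdout, you can process each chunk in a loop without creating any output files. Here's a rough sketch that reports the size of every chunk, assuming GNU split; msg.txt and the variable names are placeholders for this illustration.

```shell
# scratch directory and sample file are assumptions for this sketch
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'Hi there\nHave a nice day\n' > msg.txt

n=3
total=0
k=1
while [ "$k" -le "$n" ]; do
    # size of the Kth chunk, read from stdout
    size=$(split -n"$k/$n" msg.txt | wc -c)
    echo "chunk $k: $size bytes"
    total=$((total + size))
    k=$((k + 1))
done
echo "total: $total bytes"
```

The individual chunk sizes can vary by a byte or so, but the total always matches the input size (25 bytes here).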
To avoid splitting a line, use l/ as a prefix. Quoting from the manual:
For l mode, chunks are approximately input size / N. The input is partitioned into N equal sized portions, with the last assigned any excess. If a line starts within a partition it is written completely to the corresponding file. Since lines or records are not split even if they overlap a partition, the files written can be larger or smaller than the partition size, and even empty if a line/record is so long as to completely overlap the partition.
# divide input into 2 parts, don't split lines
$ split -nl/2 purchases.txt
$ head x*
==> xaa <==
coffee
tea
washing powder
coffee

==> xab <==
toothpaste
tea
soap
tea
Here's an example to view the Kth chunk without splitting lines:
# 2nd chunk of 3 parts, don't split lines
$ split -nl/2/3 sample.txt
7) Believe it
8)
9) banana
10) papaya
11) mango
-n option will also help you create output files with interleaved lines. Since this is based on the line separator and not file size, stdin data can also be used. Use r/ prefix to enable this feature.
# two parts, lines distributed in round robin fashion
$ seq 5 | split -nr/2
$ head x*
==> xaa <==
1
3
5

==> xab <==
2
4
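Unlike the other modes, concatenating the round robin output files won't restore the original line order. Here's a small sketch that interleaves the files again with paste, assuming GNU split and a line count that divides evenly among the output files (otherwise paste will emit empty lines for the shorter files).

```shell
# scratch directory is an assumption for this sketch
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 6 | split -nr/2

# cat gives the lines grouped by file: 1 3 5 followed by 2 4 6
cat x*

# paste with newline as the delimiter interleaves the lines again
paste -d'\n' xaa xab
```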
Here's an example to view the Kth chunk with lines distributed in round robin fashion:
$ split -nr/1/3 sample.txt
1) Hello World
4) How are you
7) Believe it
10) papaya
13) Much ado about nothing
You can use the -t option to specify a single byte character as the line separator. Use \0 to specify NUL as the separator. Depending on your shell you can use ANSI-C quoting to use escapes like \t instead of a literal tab character.
$ printf 'apple\nbanana\n;mango\npapaya\n' | split -t';' -l1

$ head x*
==> xaa <==
apple
banana
;
==> xab <==
mango
papaya
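Here's a minimal sketch of splitting NUL separated records, assuming GNU split. The sample records are made up, and tr is used only to make the NUL characters visible as newlines.

```shell
# scratch directory and sample records are assumptions for this sketch
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# three NUL terminated records, at most two records per output file
printf 'apple\0banana\0mango\0' | split -t '\0' -l2

# convert NUL to newline only for display purposes
for f in x*; do
    echo "== $f"
    tr '\0' '\n' < "$f"
done
```

The separator is retained at the end of each record, just like the newline character in the default case.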
As seen earlier, x is the default prefix for output filenames. To change this prefix, pass an argument after the input source.
# choose prefix as 'op_' instead of 'x'
$ split -l1 greeting.txt op_

$ head op_*
==> op_aa <==
Hi there

==> op_ab <==
Have a nice day
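The prefix can also include a directory path, which helps to keep the output files away from the input. Here's a sketch of that idea, assuming GNU split; the parts directory and msg.txt are names invented for this example.

```shell
# scratch directory and filenames are assumptions for this sketch
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'Hi there\nHave a nice day\n' > msg.txt

# the directory must already exist, split won't create it
mkdir parts
split -l1 msg.txt parts/op_

ls parts
```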
-a option controls the length of the suffix. You'll get an error if this length isn't enough to cover all the output files. In such a case, you'll still get output files that can fit within the given length.
$ seq 10 | split -l1 -a1
$ ls x*
xa  xb  xc  xd  xe  xf  xg  xh  xi  xj
$ rm x*

$ seq 10 | split -l1 -a3
$ ls x*
xaaa  xaab  xaac  xaad  xaae  xaaf  xaag  xaah  xaai  xaaj
$ rm x*

$ seq 100 | split -l1 -a1
split: output file suffixes exhausted
$ ls x*
xa  xc  xe  xg  xi  xk  xm  xo  xq  xs  xu  xw  xy
xb  xd  xf  xh  xj  xl  xn  xp  xr  xt  xv  xx  xz
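With alphabetic suffixes, a length of 1 covers 26 files, a length of 2 covers 26 * 26 files and so on. Here's a rough sketch that checks the exit status at this boundary, assuming GNU split; the variable names are made up for this illustration.

```shell
# scratch directory is an assumption for this sketch
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# 26 single letter suffixes are available with -a1
seq 26 | split -l1 -a1
status_ok=$?
count_ok=$(ls x* | wc -l)
rm x*

# the 27th file doesn't fit, so split reports an error
seq 27 | split -l1 -a1 2>/dev/null
status_fail=$?

echo "26 files: exit status $status_ok, files created: $count_ok"
echo "27 files: exit status $status_fail"
```

Checking the exit status like this is handy in scripts, where the error message might otherwise go unnoticed.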
You can use the -d option to use numeric suffixes, starting from 00 (length can be changed using the -a option). You can use the long option --numeric-suffixes to specify a different starting number.
$ seq 10 | split -l1 -d
$ ls x*
x00  x01  x02  x03  x04  x05  x06  x07  x08  x09
$ rm x*

$ seq 10 | split -l2 --numeric-suffixes=10
$ ls x*
x10  x11  x12  x13  x14
Use the -x and --hex-suffixes options for hexadecimal numbering.
$ seq 10 | split -l1 --hex-suffixes=8
$ ls x*
x08  x09  x0a  x0b  x0c  x0d  x0e  x0f  x10  x11
You can use the --additional-suffix option to add a constant string at the end of filenames.
$ seq 10 | split -l2 -a1 --additional-suffix='.log'
$ ls x*
xa.log  xb.log  xc.log  xd.log  xe.log
$ rm x*

$ seq 10 | split -l2 -a1 -d --additional-suffix='.txt' - num_
$ ls num_*
num_0.txt  num_1.txt  num_2.txt  num_3.txt  num_4.txt
You can sometimes end up with empty files, for example when you try to split into more parts than possible with the given criteria. In such cases, you can use the -e option to prevent empty files in the output. The split command will ensure that the filenames are sequential even if files in the middle are empty.
# 'xac' is empty in this example
$ split -nl/3 greeting.txt
$ head x*
==> xaa <==
Hi there

==> xab <==
Have a nice day

==> xac <==
$ rm x*

# prevent empty files
$ split -e -nl/3 greeting.txt
$ head x*
==> xaa <==
Hi there

==> xab <==
Have a nice day
--filter option will allow you to apply another command on the intermediate split results before saving the output files. Use $FILE to refer to the output filename of the intermediate parts. Here's an example of compressing the results:
$ split -l1 --filter='gzip > $FILE.gz' greeting.txt

$ ls x*
xaa.gz  xab.gz

$ zcat xaa.gz
Hi there
$ zcat xab.gz
Have a nice day
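The compressed parts can be decompressed and concatenated to get back the original input. Here's a minimal sketch of such a round trip, assuming GNU split and gzip are available; nums.txt is a sample file made up for this example.

```shell
# scratch directory and sample file are assumptions for this sketch
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 5000 > nums.txt

# single quotes ensure $FILE is expanded by the filter command, not by this shell
split -l1000 --filter='gzip > $FILE.gz' nums.txt

# decompress the parts in glob order and compare with the original
if gzip -dc x*.gz | cmp -s - nums.txt; then
    echo "round trip OK"
else
    echo "mismatch"
fi
```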
Here's an example of ignoring the first line of the results:
$ cat body_sep.txt
%=%=
apple
banana
%=%=
red
green

$ split -l3 --filter='tail -n +2 > $FILE' body_sep.txt

$ head x*
==> xaa <==
apple
banana

==> xab <==
red
green