This is a work-in-progress draft version.
hck
From the hck GitHub repository:
hck is a shortening of hack, a rougher form of cut. A close to drop in replacement for cut that can use a regex delimiter instead of a fixed string.
No single feature of hck on its own makes it stand out over awk, cut, xsv or other such tools. Where hck excels is making common things easy, such as reordering output fields, or splitting records on a weird delimiter. It is meant to be simple and easy to use while exploring datasets.
Installation
See the install section of the hck repository for installation details.
Field separators
By default, the input field separator option -d uses the regex \s+ to split the data. The default value for the output field separator option -D is the tab character.
$ printf 'apple ball\t \r\v\fcat dog' | hck -f2,4
ball dog
# output field order is the same as the order specified by -f
$ printf 'apple ball\t \r\v\fcat dog' | hck -f3,1
cat apple
Leading and trailing whitespace results in empty fields. If no particular selection is specified, all the fields are printed.
$ printf ' fig toy net ' | hck -D, -f2,1,3
fig,,toy
$ printf ' fig toy net ' | hck -D,
,fig,toy,net,
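For comparison, POSIX awk shows the same empty-field behavior, because awk also keeps empty leading and trailing fields when FS is a regex rather than a single space. This is a rough sketch, not how hck works internally. The helper name ws_split is hypothetical, and the bracket expression [ \t\r\v\f]+ is an assumed ASCII stand-in for hck's \s+.

```shell
# ws_split is a hypothetical helper, not part of hck
# usage: ws_split OUTPUT_SEPARATOR
ws_split() {
    # a multi-character FS is treated as a regex, so leading and trailing
    # whitespace produce empty first and last fields (unlike FS = " ");
    # assigning $1 = $1 rebuilds the record using OFS
    awk -v OFS="$1" 'BEGIN { FS = "[ \t\r\v\f]+" } { $1 = $1; print }'
}

printf ' fig toy net ' | ws_split ,                   # ,fig,toy,net,
printf 'apple ball\t \r\v\fcat dog\n' | ws_split ,    # apple,ball,cat,dog
```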
Here are some examples of using custom field separators.
$ echo 'load;err_msg--\ant,r2..not' | hck -d'\W+' -D,
load,err_msg,ant,r2,not
$ echo 'Sample123string42with777numbers' | hck -d'\d+' -f1,4 -D,
Sample,numbers
$ echo 'apple:-:ball:-:cat' | hck -d:-: -f1,3 -D' : '
apple : cat
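Roughly the same splits can be sketched with POSIX awk, which also accepts a regex as the field separator. POSIX extended regular expressions have no \W or \d shorthand, so the sketch assumes the ASCII-only bracket expressions [^A-Za-z0-9_]+ and [0-9]+ instead. The helper re_split is hypothetical.

```shell
# re_split is a hypothetical helper approximating hck -d'REGEX' -D'SEP'
# usage: re_split ERE OUTPUT_SEPARATOR
re_split() {
    # values are read from the environment because awk -v and -F
    # would interpret backslash escapes in them
    RE="$1" OSEP="$2" awk 'BEGIN { FS = ENVIRON["RE"]; OFS = ENVIRON["OSEP"] }
                          { $1 = $1; print }'
}

printf '%s\n' 'load;err_msg--\ant,r2..not' | re_split '[^A-Za-z0-9_]+' ,   # load,err_msg,ant,r2,not
printf '%s\n' 'apple:-:ball:-:cat' | re_split ':-:' ' : '                    # apple : ball : cat
# field selection done directly in awk, similar to hck -d'\d+' -f1,4 -D,
printf '%s\n' 'Sample123string42with777numbers' | awk -F'[0-9]+' -v OFS=, '{ print $1, $4 }'
```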
A particular field can only be displayed once in the output.
$ echo 'a,b,c,d,e' | hck -d, -f3,3,1,2,3,2,1 -D,
c,a,b
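The "each field is printed once, first request wins" rule can be modelled in a few lines of awk. This is a sketch of the observable behavior, not hck's code. The helper pick_once is hypothetical, and it skips field numbers beyond the end of a line; that choice is an assumption of the sketch.

```shell
# pick_once is a hypothetical helper that models the rule above
# usage: pick_once FIELD_LIST INPUT_SEPARATOR OUTPUT_SEPARATOR
pick_once() {
    LIST="$1" awk -F "$2" -v OFS="$3" '
        BEGIN {
            # keep only the first occurrence of each requested field number
            n = split(ENVIRON["LIST"], req, ",")
            for (i = 1; i <= n; i++) {
                k = req[i] + 0
                if (!(k in seen)) { seen[k] = 1; order[++m] = k }
            }
        }
        {
            # print the surviving fields in request order
            out = ""; sep = ""
            for (i = 1; i <= m; i++)
                if (order[i] <= NF) { out = out sep $(order[i]); sep = OFS }
            print out
        }'
}

printf 'a,b,c,d,e\n' | pick_once 3,3,1,2,3,2,1 , ,    # c,a,b
```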
Literal field separator
Add the -L option to treat the argument passed to the -d option as a fixed string instead of a regex. As per the documentation, this can also result in a significant speedup.
# same as: hck -d'\\' -f1
$ echo 'apple\ball' | hck -Ld'\' -f1
apple
$ echo '123)(%)*#^&(*@#.[](\\){1}\xyz' | hck -Ld')(%)*#^&(*@#.[](\\){1}\' -f2
xyz
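The difference between regex and fixed-string splitting can be sketched with awk's index() function, which searches for a plain substring, so characters like ( * [ \ have no special meaning. The helper literal_field is hypothetical and only extracts a single field. The delimiter is passed through the environment because awk -v and -F would interpret backslash escapes in it.

```shell
# literal_field is a hypothetical helper, roughly like hck -L -d'DELIM' -fN
# usage: literal_field DELIMITER FIELD_NUMBER
literal_field() {
    D="$1" N="$2" awk '
        BEGIN { d = ENVIRON["D"]; n = ENVIRON["N"] + 0 }
        {
            # skip the first n-1 fields by jumping past each delimiter
            rest = $0
            for (i = 1; i < n; i++) {
                p = index(rest, d)
                if (p == 0) { rest = ""; break }   # fewer fields than requested
                rest = substr(rest, p + length(d))
            }
            # the wanted field ends at the next delimiter, if any
            p = index(rest, d)
            print (p ? substr(rest, 1, p - 1) : rest)
        }'
}

printf '%s\n' 'apple\ball' | literal_field '\' 1    # apple
printf '%s\n' '123)(%)*#^&(*@#.[](\\){1}\xyz' | literal_field ')(%)*#^&(*@#.[](\\){1}\' 2    # xyz
```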
Field ranges
A range of fields can be specified by separating the starting and ending field numbers with a - character. You'll get an error if the range is in descending order.
$ printf '1 2 3 4 5\na b c d e\n' | hck -f1-3 -D,
1,2,3
a,b,c
# multiple ranges can be specified
# as mentioned before, a particular field can only be printed once
$ printf '1 2 3 4 5\na b c d e\n' | hck -f2-4,1,3-5 -D,
2,3,4,1,5
b,c,d,a,e
The beginning or ending field of a range can be omitted, in which case it defaults to the first or last field respectively.
# up to the first four fields
$ printf 'apple ball cat\na b c d e\n' | hck -f-4 -D,
apple,ball,cat
a,b,c,d
# all fields from the second field
$ printf 'apple ball cat\na b c d e\n' | hck -f2- -D,
ball,cat
b,c,d,e
$ printf 'apple ball cat\na b c d e\n' | hck -D,
apple,ball,cat
a,b,c,d,e
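To make the range rules concrete, here's an awk sketch that expands a spec such as 2-4,1,3-5. Open-ended ranges are filled in with the first or last field of each line, descending ranges are rejected, and each field is printed at most once. The helper pick_ranges is hypothetical and only approximates the behavior shown above. It also uses awk's default splitting, which, unlike \s+, ignores leading and trailing whitespace.

```shell
# pick_ranges is a hypothetical helper, not hck's implementation
# usage: pick_ranges SPEC OUTPUT_SEPARATOR   (SPEC like 2-4,1,3-5 or -4 or 2-)
pick_ranges() {
    SPEC="$1" awk -v OFS="$2" '
        {
            split("", seen)                    # portable way to clear an array
            out = ""; sep = ""
            n = split(ENVIRON["SPEC"], parts, ",")
            for (k = 1; k <= n; k++) {
                if (index(parts[k], "-")) {
                    split(parts[k], ends, "-")
                    lo = (ends[1] == "") ? 1 : ends[1] + 0    # open start: first field
                    hi = (ends[2] == "") ? NF : ends[2] + 0   # open end: last field
                    if (ends[1] != "" && ends[2] != "" && lo > hi) {
                        print "descending range: " parts[k] | "cat 1>&2"
                        exit 1
                    }
                } else {
                    lo = hi = parts[k] + 0
                }
                # each field is printed once, in first-requested order
                for (f = lo; f <= hi && f <= NF; f++)
                    if (!(f in seen)) { seen[f] = 1; out = out sep $f; sep = OFS }
            }
            print out
        }'
}

printf '1 2 3 4 5\na b c d e\n' | pick_ranges 2-4,1,3-5 ,
printf 'apple ball cat\na b c d e\n' | pick_ranges -4 ,
```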
Header based field selection
You can pass a literal header name to the -F option to select a column based on its name.
$ cat scores.csv
Name,Maths,Physics,Chemistry
Ith,100,100,100
Cy,97,98,95
Lin,78,83,80
# you can also use: hck -d, -FMaths scores.csv
$ hck -d, -F 'Maths' scores.csv
Maths
100
97
78
# order of -F usage determines the output order as well
$ hck -d, -D: -F 'Chemistry' -F 'Maths' scores.csv
Chemistry:Maths
100:100
95:97
80:78
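Header-based selection boils down to looking up column positions on the first line and reusing them for every line. Below is an awk sketch of that idea for exact, literal names. by_header is a hypothetical helper, and its error message is its own, not hck's.

```shell
# by_header is a hypothetical helper, not part of hck
# usage: by_header INPUT_SEPARATOR OUTPUT_SEPARATOR NAME...
by_header() {
    bh_fs="$1" bh_ofs="$2"; shift 2
    # one requested header name per line, so names may contain spaces
    NAMES=$(printf '%s\n' "$@") awk -F "$bh_fs" -v OFS="$bh_ofs" '
        NR == 1 {
            n = split(ENVIRON["NAMES"], want, "\n")
            for (k = 1; k <= n; k++) {
                col[k] = 0
                for (i = 1; i <= NF; i++)
                    if (($i "") == (want[k] "")) { col[k] = i; break }   # string comparison
                if (!col[k]) { print "header not found: " want[k] | "cat 1>&2"; exit 1 }
            }
        }
        {
            # argument order decides output order, like repeated -F options
            out = $(col[1])
            for (k = 2; k <= n; k++) out = out OFS $(col[k])
            print out
        }'
}

printf 'Name,Maths,Physics,Chemistry\nIth,100,100,100\nCy,97,98,95\nLin,78,83,80\n' |
    by_header , : Chemistry Maths
```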
You can add the -r option to select headers based on a regex.
$ hck -d, -D: -rF '^[NP]' scores.csv
Name:Physics
Ith:100
Cy:98
Lin:83
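In the output above, every header matching the regex is selected in input order. A comparable awk sketch using POSIX extended regular expressions follows. by_header_re is a hypothetical helper, and hck's own regex syntax may differ from POSIX ERE.

```shell
# by_header_re is a hypothetical helper, not part of hck
# usage: by_header_re INPUT_SEPARATOR OUTPUT_SEPARATOR ERE
by_header_re() {
    RE="$3" awk -F "$1" -v OFS="$2" '
        NR == 1 {
            # remember every column whose header matches, in input order
            for (i = 1; i <= NF; i++) if ($i ~ ENVIRON["RE"]) col[++n] = i
            if (!n) { print "no header matched: " ENVIRON["RE"] | "cat 1>&2"; exit 1 }
        }
        {
            out = $(col[1])
            for (k = 2; k <= n; k++) out = out OFS $(col[k])
            print out
        }'
}

printf 'Name,Maths,Physics,Chemistry\nIth,100,100,100\nCy,97,98,95\nLin,78,83,80\n' |
    by_header_re , : '^[NP]'
```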
If a given header selection doesn't match, you'll get an error.
$ hck -d, -F 'English' scores.csv
[2021-07-16T08:33:40Z ERROR hck] No headers matched
$ hck -d, -D: -rF '^[NP]z' -F 'at' scores.csv
[2021-07-16T08:32:38Z ERROR hck] Header not found: ^[NP]z
You can use both -f and -F options if you wish. As mentioned before, a particular field can only be printed once in the output.
$ hck -d, -F 'Name' -f3- -D: scores.csv
Name:Physics:Chemistry
Ith:100:100
Cy:98:95
Lin:83:80
Exclude fields
The -e and -E options can be used to exclude fields based on field numbers and header names respectively. You can continue to use the -f, -F, -L and -r options as needed.
# except second field
$ printf 'apple ball cat\n1 2 3 4 5' | hck -e2 -D:
apple:cat
1:3:4:5
# except first and third fields
$ printf 'apple ball cat\n1 2 3 4 5' | hck -e1,3 -D:
ball
2:4:5
# except first and third fields, but only among the fields specified by -f
$ printf 'apple ball cat\n1 2 3 4 5' | hck -e1,3 -D: -f2-4
ball
2:4
# except fields ending with 's' character
$ hck -d, -rE 's$' -D: scores.csv
Name:Chemistry
Ith:100
Cy:95
Lin:80
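Excluding by field number can be sketched in awk by marking the excluded positions and printing everything else. drop_fields is a hypothetical helper. It uses awk's default whitespace splitting, which, unlike \s+, ignores leading and trailing whitespace. It does not model combining exclusion with -f or the header-based -E.

```shell
# drop_fields is a hypothetical helper, roughly like hck -e LIST -D SEP
# usage: drop_fields LIST OUTPUT_SEPARATOR
drop_fields() {
    LIST="$1" awk -v OFS="$2" '
        BEGIN {
            # mark the excluded field numbers
            n = split(ENVIRON["LIST"], ex, ",")
            for (k = 1; k <= n; k++) skip[ex[k] + 0] = 1
        }
        {
            out = ""; sep = ""
            for (i = 1; i <= NF; i++)
                if (!(i in skip)) { out = out sep $i; sep = OFS }
            print out
        }'
}

printf 'apple ball cat\n1 2 3 4 5\n' | drop_fields 1,3 :
```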
Mixing -f and -F selections
You can use a mix of both -f and -F options for field selections. The -f option can be used only once, whereas -F can be used multiple times. Whichever of the -f and -F selections refers to the lower field number is displayed first in the output. Here are some examples to illustrate this priority:
# -f2 comes before Chemistry (4th field)
$ hck -d, -D, -f2 -F 'Chemistry' scores.csv
Maths,Chemistry
100,100
97,95
78,80
# Name (1st field) comes before -f3
$ hck -d, -D, -f3 -F 'Name' scores.csv
Name,Physics
Ith,100
Cy,98
Lin,83
# Name (1st field) comes before -f3
# Chemistry (4th field) comes after -f3
$ hck -d, -D, -f3 -F 'Name' -F 'Chemistry' scores.csv
Name,Physics,Chemistry
Ith,100,100
Cy,98,95
Lin,83,80
-f can have multiple fields, but only the first field passed to -f is considered for the comparison. Similarly, if there are multiple -F options, only the first -F is considered.
# Maths (2nd field) comes before -f3
$ hck -d, -D, -f3,1 -F 'Maths' scores.csv
Maths,Physics,Name
100,100,Ith
97,98,Cy
78,83,Lin
Processing compressed input
You can use the -z option to work with compressed input files. The decompression method is chosen based on the filename extension.
$ xz scores.csv
$ hck -d, -f2 -z scores.csv.xz
Maths
100
97
78
The -z option is especially useful if you have multiple input files, which can even be compressed differently. See the Decompression section of the hck documentation for the complete list of supported extensions and the command used to decompress each of them.
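To see what -z saves you from, here is a rough manual equivalent that picks a decompression command from the file extension. The extension-to-command mapping below is an assumption for illustration; hck's actual list is in its Decompression documentation. zcat_any is a hypothetical helper, and each branch needs the corresponding tool (gzip, bzip2, xz or zstd) to be installed.

```shell
# zcat_any is a hypothetical helper, not part of hck
# usage: zcat_any FILE   (writes the decompressed content to stdout)
zcat_any() {
    case "$1" in
        *.gz)  gzip -dc -- "$1" ;;
        *.bz2) bzip2 -dc -- "$1" ;;
        *.xz)  xz -dc -- "$1" ;;
        *.zst) zstd -dc -- "$1" ;;
        *)     cat -- "$1" ;;       # unknown extension: assume plain text
    esac
}

# for example, roughly what hck -d, -f2 -z scores.csv.xz does:
# zcat_any scores.csv.xz | hck -d, -f2
```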
Specifying output file
You can use the -o option to write the output to a file instead of stdout. Don't use the same name as the input file, since that will result in an empty output file.
$ hck -d, -f2 scores.csv -o op.txt
$ cat op.txt
Maths
100
97
78
The -o option will become more useful once saving compressed output based on the filename extension is implemented.
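If you do want to overwrite the input file, a common shell workaround (not an hck option) is to write to a temporary file and move it over the original only when the command succeeds. rewrite_in_place is a hypothetical helper name. It relies on mktemp, which is widely available though not part of POSIX.

```shell
# rewrite_in_place is a hypothetical helper, not part of hck
# usage: rewrite_in_place FILE COMMAND [ARGS...]   (COMMAND reads stdin)
rewrite_in_place() {
    rip_file="$1"; shift
    # temporary file next to the original, so mv stays on the same filesystem;
    # note that the replaced file ends up with mktemp's restrictive permissions
    rip_tmp=$(mktemp "${rip_file}.XXXXXX") || return 1
    if "$@" < "$rip_file" > "$rip_tmp"; then
        mv -- "$rip_tmp" "$rip_file"
    else
        rm -f -- "$rip_tmp"
        return 1
    fi
}

# for example, keep only the second column of scores.csv:
# rewrite_in_place scores.csv hck -d, -f2
```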
DOS style line endings
If your input has \r\n line endings, you can use the --crlf option. The output will also use DOS style line endings.
# since \n is the default line separator,
# last field retains the \r character
$ printf 'a,b,c\r\n1,2,3\r\n' | hck -d, -f3,2,1 -D, | cat -v
c^M,b,a
3^M,2,1
# with --crlf the last field no longer has \r
# output line ending will be \r\n
$ printf 'a,b,c\r\n1,2,3\r\n' | hck --crlf -d, -f3,2,1 -D, | cat -v
c,b,a^M
3,2,1^M
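Conceptually, the --crlf behavior above amounts to removing the trailing \r before splitting and adding it back when writing each line. The awk sketch below reproduces the second example's output under that assumption. It is an illustration, not hck's implementation, and crlf_pick is a hypothetical helper.

```shell
# crlf_pick is a hypothetical helper, not part of hck
# usage: crlf_pick INPUT_SEPARATOR OUTPUT_SEPARATOR FIELD_LIST
crlf_pick() {
    LIST="$3" awk -F "$1" -v OFS="$2" '
        BEGIN { n = split(ENVIRON["LIST"], f, ",") }
        {
            sub(/\r$/, "")          # drop the CR; changing $0 re-splits the fields
            out = $(f[1])
            for (k = 2; k <= n; k++) out = out OFS $(f[k])
            printf "%s\r\n", out    # DOS style line ending on output
        }'
}

printf 'a,b,c\r\n1,2,3\r\n' | crlf_pick , , 3,2,1 | cat -v
```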