This is a work-in-progress draft version.
frawk
From github: frawk:
frawk is a small programming language for writing short programs processing textual data. To a first approximation, it is an implementation of the AWK language; many common Awk programs produce equivalent output when passed to frawk. You might be interested in frawk if you want your scripts to handle escaped CSV/TSV like standard Awk fields, or if you want your scripts to execute faster.
The info subdirectory has more in-depth information on frawk:
Overview: what frawk is all about, how it differs from Awk.
Types: A quick gloss on frawk's approach to types and type inference.
Parallelism: An overview of frawk's parallelism support.
Benchmarks: A sense of the relative performance of frawk and other tools when processing large CSV or TSV files.
Builtin Functions Reference: A list of builtin functions implemented by frawk, including some that are new when compared with Awk.
Installation
See frawk: installation for details.
Common lines
The SCOWL-wl.txt file used below was created using app.aspell.net. words.txt is from /usr/share/dict/words.
$ wc -l words.txt SCOWL-wl.txt
102401 words.txt
662349 SCOWL-wl.txt
764750 total
Here's a timing comparison for finding common lines between two files using various tools. Within each case, the commands are ordered from slowest to fastest.
- Case 1: shorter file passed as the first argument
# finding common lines between two files
$ time mawk 'NR==FNR{a[$0]; next} $0 in a' words.txt SCOWL-wl.txt > t1
real 0m0.307s
$ time gawk 'NR==FNR{a[$0]; next} $0 in a' words.txt SCOWL-wl.txt > t2
real 0m0.212s
$ time perl -ne 'if(!$#ARGV){$h{$_}=1; next}
print if exists $h{$_}' words.txt SCOWL-wl.txt > t3
real 0m0.192s
$ time frawk 'NR==FNR{a[$0]; next} $0 in a' words.txt SCOWL-wl.txt > t4
real 0m0.091s
- Case 2: longer file passed as the first argument
$ time mawk 'NR==FNR{a[$0]; next} $0 in a' SCOWL-wl.txt words.txt > f1
real 0m0.541s
$ time gawk 'NR==FNR{a[$0]; next} $0 in a' SCOWL-wl.txt words.txt > f2
real 0m0.382s
$ time perl -ne 'if(!$#ARGV){$h{$_}=1; next}
print if exists $h{$_}' SCOWL-wl.txt words.txt > f3
real 0m0.350s
$ time frawk 'NR==FNR{a[$0]; next} $0 in a' SCOWL-wl.txt words.txt > f4
real 0m0.204s
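The awk program used in both cases can be seen at work on a pair of tiny files. This is a minimal sketch: the file names and contents are hypothetical, and it calls plain awk so that it runs without frawk installed (the timings above pass the same program text to frawk). NR==FNR is true only while the first file is being read, so its lines become array keys; lines of the second file are printed only if they are already keys.

```shell
# Hypothetical sample files, created only for this illustration
dir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$dir/small.txt"
printf '%s\n' banana cherry date elderberry > "$dir/big.txt"

# First file: NR==FNR holds, a[$0] stores the line as a key, 'next' skips the second rule
# Second file: '$0 in a' is true (and the line is printed) only for stored keys
common=$(awk 'NR==FNR{a[$0]; next} $0 in a' "$dir/small.txt" "$dir/big.txt")
printf '%s\n' "$common"

rm -r "$dir"
```

Whichever file is passed first is the one stored in the array. In Case 2 that is the longer file, which may account for some of the extra time, though that is a guess and not something the measurements above isolate.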
More to come