Warning: this is a work-in-progress draft version.



frawk

From the frawk repository on GitHub:

frawk is a small programming language for writing short programs processing textual data. To a first approximation, it is an implementation of the AWK language; many common Awk programs produce equivalent output when passed to frawk. You might be interested in frawk if you want your scripts to handle escaped CSV/TSV like standard Awk fields, or if you want your scripts to execute faster.

The info subdirectory has more in-depth information on frawk:

Overview: what frawk is all about, how it differs from Awk.

Types: A quick gloss on frawk's approach to types and type inference.

Parallelism: An overview of frawk's parallelism support.

Benchmarks: A sense of the relative performance of frawk and other tools when processing large CSV or TSV files.

Builtin Functions Reference: A list of builtin functions implemented by frawk, including some that are new when compared with Awk.

Installation

See the installation section of the frawk repository for details.

Common lines

The SCOWL-wl.txt file used below was created using app.aspell.net. words.txt is from /usr/share/dict/words.

$ wc -l words.txt SCOWL-wl.txt
 102401 words.txt
 662349 SCOWL-wl.txt
 764750 total

Here's a timing comparison for finding common lines between two files using various tools, ordered from slowest to fastest.

  • Case 1: shorter file passed as the first argument
# finding common lines between two files
$ time mawk 'NR==FNR{a[$0]; next} $0 in a' words.txt SCOWL-wl.txt > t1
real    0m0.307s

$ time gawk 'NR==FNR{a[$0]; next} $0 in a' words.txt SCOWL-wl.txt > t2
real    0m0.212s

$ time perl -ne 'if(!$#ARGV){$h{$_}=1; next}
                 print if exists $h{$_}' words.txt SCOWL-wl.txt > t3
real    0m0.192s

$ time frawk 'NR==FNR{a[$0]; next} $0 in a' words.txt SCOWL-wl.txt > t4
real    0m0.091s

  • Case 2: longer file passed as the first argument
$ time mawk 'NR==FNR{a[$0]; next} $0 in a' SCOWL-wl.txt words.txt > f1
real    0m0.541s

$ time gawk 'NR==FNR{a[$0]; next} $0 in a' SCOWL-wl.txt words.txt > f2
real    0m0.382s

$ time perl -ne 'if(!$#ARGV){$h{$_}=1; next}
                 print if exists $h{$_}' SCOWL-wl.txt words.txt > f3
real    0m0.350s

$ time frawk 'NR==FNR{a[$0]; next} $0 in a' SCOWL-wl.txt words.txt > f4
real    0m0.204s
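
The awk one-liner used in the timings above relies on the common two-file idiom: NR==FNR is true only while the first file is being read, so its lines are saved as array keys, and lines from the second file are printed only if they match a saved key. Here's a minimal sketch of that logic on tiny input. The small.txt and large.txt files are hypothetical samples created just for this illustration, and plain awk is used so that it runs without frawk installed.

```shell
# hypothetical sample files, created only for this illustration
tmpdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$tmpdir/small.txt"
printf '%s\n' banana date apple fig > "$tmpdir/large.txt"

# NR==FNR holds only for the first file: save each line as an array key
# and move on to the next record without printing anything
# for the second file, print the lines that exist as keys in the array
common=$(awk 'NR==FNR{a[$0]; next} $0 in a' "$tmpdir/small.txt" "$tmpdir/large.txt")
printf '%s\n' "$common"

# clean up the sample files
rm -r "$tmpdir"
```

The matching lines are printed in the order they appear in the second file. Swapping the file arguments, as in Case 2, changes that order and also means the lines of the longer file are the ones held in the array.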




More to come