Using modules

Many standard modules come bundled with a Perl installation, and plenty of third-party modules are available for a wide variety of use cases. This chapter discusses the -M command line option and shows some examples with standard and third-party modules. You'll also see how to convert one-liners to a full-fledged script file.

Standard modules

See perldoc: modules for the complete list of built-in modules. Quoting from perldoc: -m and -M options:

-Mmodule executes use module ; before executing your program. This loads the module and calls its import method, causing the module to have its default effect, typically importing subroutines or giving effect to a pragma. You can use quotes to add extra code after the module name, e.g., '-MMODULE qw(foo bar)'.

A little builtin syntactic sugar means you can also say -mMODULE=foo,bar or -MMODULE=foo,bar as a shortcut for '-MMODULE qw(foo bar)'. This avoids the need to use quotes when importing symbols. The actual code generated by -MMODULE=foo,bar is use module split(/,/,q{foo,bar}). Note that the = form removes the distinction between -m and -M; that is, -mMODULE=foo,bar is the same as -MMODULE=foo,bar

The List::Util module has handy functions for array processing. See perldoc: List::Util for documentation. Here are some examples with max, product and sum0.

$ # same as: perl -F, -anE 'BEGIN{use List::Util qw(max)} say max @F'
$ echo '34,17,6' | perl -MList::Util=max -F, -anE 'say max @F'
34

$ echo '34,17,6' | perl -MList::Util=product -F, -anE 'say product @F'
3468

$ # 'sum0' returns '0' even if array is empty, whereas 'sum' returns 'undef'
$ echo '3.14,17,6' | perl -MList::Util=sum0 -F, -anE 'say sum0 @F'
26.14

Here are some examples with shuffle, sample and uniq.

$ s='floor bat to dubious four'
$ echo "$s" | perl -MList::Util=shuffle -lanE 'say join ":", shuffle @F'
bat:four:dubious:floor:to
$ echo 'foobar' | perl -MList::Util=shuffle -F -lanE 'say shuffle @F'
afbroo

$ # similar to shuffle, but can specify number of elements needed
$ echo "$s" | perl -MList::Util=sample -lanE 'say join ":", sample 2, @F'
dubious:bat

$ s='3,b,a,3,c,d,1,d,c,2,2,2,3,1,b'
$ # note that the input order of elements is preserved
$ echo "$s" | perl -MList::Util=uniq -F, -lanE 'say join ",",uniq @F'
3,b,a,c,d,1,2

Here's an example of base64 encoding and decoding. See perldoc: MIME::Base64 for documentation.

$ echo 'hello world' | base64
aGVsbG8gd29ybGQK

$ echo 'hello world' | perl -MMIME::Base64 -ne 'print encode_base64 $_'
aGVsbG8gd29ybGQK
$ echo 'aGVsbG8gd29ybGQK' | perl -MMIME::Base64 -ne 'print decode_base64 $_'
hello world

Third party modules

The Comprehensive Perl Archive Network (https://www.cpan.org/) has a huge collection of modules for various use cases. Before installing a new module, check whether it is already installed:

$ # output modified here for presentation purposes
$ perl -MText::CSV -e ''
Can't locate Text/CSV.pm in @INC (you may need to install the Text::CSV module)
(@INC contains: <list of paths>).
BEGIN failed--compilation aborted.

If you are using the perl that came installed with your OS, check whether the module can be installed from your platform's package repository. Here's an example for Ubuntu:

$ # search for Text::CSV module
$ apt-cache search perl text-csv
libspreadsheet-read-perl - reader for common spreadsheet formats
libtext-csv-encoded-perl - encoding-aware comma-separated values manipulator
libtext-csv-perl - comma-separated values manipulator (using XS or PurePerl)
libtext-csv-xs-perl - Perl C/XS module to process Comma-Separated Value files

$ # install the module of your choice
$ sudo apt install libtext-csv-xs-perl

The above process won't work for a perl that you installed manually, or if a particular module isn't available from your platform repository. In such cases, you can install modules from CPAN directly, for example with the cpan client that comes with Perl.

CSV

To parse CSV files robustly, you can use the metacpan: Text::CSV or metacpan: Text::CSV_XS modules. The _XS suffix indicates a faster implementation, usually written in C. Text::CSV uses Text::CSV_XS if it is available, and falls back to Text::CSV_PP (a pure Perl implementation) otherwise.

Here's an example of parsing CSV input with embedded comma characters. ARGV is a special filehandle that iterates over the filenames passed as command line arguments (or reads stdin if there are none); see the Multiple file input chapter for more details.

$ s='eagle,"fox,42",bee,frog\n1,2,3,4'
$ # note that -n or -p option isn't used here
$ printf '%b' "$s" | perl -MText::CSV_XS -E '$csv = Text::CSV_XS->new;
                     say $row->[1] while $row = $csv->getline(*ARGV)'
fox,42
2

Here's an example with embedded newline characters. Quoting from the documentation:

Important Note: The default behavior is to accept only ASCII characters in the range from 0x20 (space) to 0x7E (tilde). This means that the fields can not contain newlines. If your data contains newlines embedded in fields, or characters above 0x7E (tilde), or binary data, you must set binary => 1 in the call to new.

$ cat newline.csv
apple,"1
2
3",good
guava,"32
54",nice

$ perl -MText::CSV_XS -E '
        $csv = Text::CSV_XS->new({binary => 1});
        while($row = $csv->getline(*ARGV)) {say "$row->[1]\n-----"}' newline.csv
1
2
3
-----
32
54
-----

You can change the field separator using the sep_char option.


$ perl -MText::CSV_XS -E '
        $csv = Text::CSV_XS->new({sep_char => "\t"});
        while($row = $csv->getline(*ARGV))
        {say join ",", @$row if $row->[0] eq "CSE"}' marks.txt
CSE,Surya,81
CSE,Amy,67
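Text::CSV_XS can also generate CSV. The sketch below uses the combine and string methods to convert tab separated input to CSV, letting the module add quotes where needed and double any embedded quote characters. The input is made up for illustration.

```shell
# tab separated input -> CSV with quoting handled by the module
printf 'a\tb,c\td"e\n' |
    perl -MText::CSV_XS -F'\t' -lanE 'BEGIN{ $csv = Text::CSV_XS->new }
                                       $csv->combine(@F); say $csv->string'
# output: a,"b,c","d""e"
```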

JSON

Perl 5.14 and newer versions come with the perldoc: JSON::PP module, which is a pure Perl implementation. Use metacpan: JSON::XS for faster results. There's also metacpan: Cpanel::JSON::XS, whose documentation gives the following reason for its existence:

While it seems there are many JSON modules, none of them correctly handle all corner cases, and in most cases their maintainers are unresponsive, gone missing, or not listening to bug reports for other reasons.

Here's a simple example of parsing JSON from a single line of input data.

$ s='{"greeting":"hi","marks":[78,62,93]}'

$ # <> is the same as <ARGV>; here it reads a single line of input
$ echo "$s" | perl -MCpanel::JSON::XS -E '$ip=decode_json <>; say $ip->{greeting}'
hi

$ echo "$s" | perl -MCpanel::JSON::XS -E '$ip=decode_json <>;
              say join ":", @{$ip->{marks}}'
78:62:93

For multiline input, use -0777 (or set $/ = undef) to pass the entire input content as a single string. You can also create a shell function as a shortcut for such one-liners.

$ # check if shortcut is available
$ type pj
bash: type: pj: not found

$ # add this to your ~/.bashrc (or the file you use for aliases/functions)
$ pj() { perl -MCpanel::JSON::XS -0777 -E '$ip=decode_json <>;'"$@" ; }

$ s='{"greeting":"hi","marks":[78,62,93]}'

$ echo "$s" | pj 'say $ip->{greeting}'
hi

Here's another example.

$ cat sample.json
{
    "fruit": "apple",
    "blue": ["toy", "flower", "sand stone"],
    "light blue": ["flower", "sky", "water"],
    "language": {
        "natural": ["english", "hindi", "spanish"],
        "programming": ["python", "kotlin", "ruby"]
    },
    "physics": 84
}

$ # order may differ from the input, as hashes don't maintain key order
$ # process top-level keys not containing 'e'
$ pj 'for (keys %$ip){say "$_:$ip->{$_}" if !/e/}' sample.json
physics:84
fruit:apple

$ # process keys within 'language' key that contain 't'
$ pj '$"=","; while(($k,$v) = each %{$ip->{language}})
      {say "$k:@{$v}" if $k=~/t/}' sample.json
natural:english,hindi,spanish

Here's an example of converting possibly minified JSON input to pretty-printed output using the cpanel_json_xs command that comes with Cpanel::JSON::XS. Similarly, you can use json_pp for JSON::PP and json_xs for JSON::XS.

$ s='{"greeting":"hi","marks":[78,62,93],"fruit":"apple"}'

$ # similar to: echo "$s" | perl -MCpanel::JSON::XS -e '
$ #   print Cpanel::JSON::XS->new->pretty->canonical->encode(decode_json <>)'
$ echo "$s" | cpanel_json_xs
{
   "fruit" : "apple",
   "greeting" : "hi",
   "marks" : [
      78,
      62,
      93
   ]
}

If you need to preserve order, see:

Convert one-liners to pretty formatted scripts

The O module, together with its Deparse backend (B::Deparse), can be used to convert one-liners to full-fledged programs. See perldoc: O for documentation. This is similar to the -o option of GNU awk.

Here's how the -n and -p options are implemented:

$ # note that input sources (stdin, filenames, etc) aren't needed here
$ perl -MO=Deparse -ne 'print if /at/'
LINE: while (defined($_ = readline ARGV)) {
    print $_ if /at/;
}
-e syntax OK

$ perl -MO=Deparse -pe 's/ /:/g'
LINE: while (defined($_ = readline ARGV)) {
    s/ /:/g;
}
continue {
    die "-p destination: $!\n" unless print $_;
}
-e syntax OK

info You can use -MO=-qq,Deparse if you don't want to see the -e syntax OK message.

The Deparse output is also very useful for debugging scripts that change the record separators.

$ perl -MO=Deparse -l -0072 -ne 'print if /a/'
BEGIN { $/ = ":"; $\ = "\n"; }
LINE: while (defined($_ = readline ARGV)) {
    chomp $_;
    print $_ if /a/;
}
-e syntax OK

$ perl -MO=Deparse -00 -ne 'print if /it/'
BEGIN { $/ = ""; $\ = undef; }
LINE: while (defined($_ = readline ARGV)) {
    print $_ if /it/;
}
-e syntax OK

When the -n option is used, here's an alternative to the END block for specifying code that should run after the while loop. This trick cannot be used with the -p option, because it would disrupt the continue block.

$ perl -MO=Deparse -ne 'print if /4/ }{ print "==> the end\n"'
LINE: while (defined($_ = readline ARGV)) {
    print $_ if /4/;
}
{
    print "==> the end\n";
}
-e syntax OK

Here's an example of saving the script to a file instead of displaying it on the terminal.

$ perl -MO=Deparse -ne 'print if /4/' > script.pl
-e syntax OK
$ cat script.pl
LINE: while (defined($_ = readline ARGV)) {
    print $_ if /4/;
}

$ perl script.pl table.txt
brown bread mat hair 42
yellow banana window shoes 3.14

info If you look at the Deparse output carefully, you'll see that the while loop has a LINE label. So, you can use next LINE to move on to the next input record even from inside other loops/blocks.

Modules to explore

Summary

This chapter showed how to load modules via the -M option, along with some examples of standard and third-party modules. You also saw how to convert cryptic one-liners to full-fledged Perl scripts using the O module.

Exercises

a) For the given space separated words, display the maximum word as determined by alphabetic order.

$ s='let in bat xml me lion'

$ echo "$s" | ##### add your solution here
xml

b) For the given space separated words, randomize the order of characters within each word.

$ s='this is a sample sentence'

$ # sample randomized output shown here, could be different for you
$ echo "$s" | ##### add your solution here
htis si a melasp ecnnsete

c) Use metacpan: XML::LibXML to get the content of all tags named blue in the input file sample.xml. See grantm: Perl XML::LibXML by example for a detailed book on the XML::LibXML module.

$ cat sample.xml
<doc>
    <greeting type="ask">Hi there. How are you?</greeting>
    <greeting type="reply">I am good.</greeting>
    <color>
        <blue>flower</blue>
        <blue>sand stone</blue>
        <light-blue>sky</light-blue>
        <light-blue>water</light-blue>
    </color>
</doc>

##### add your solution here
flower
sand stone

d) Display the current time in the format shown below.

$ # output will be different for you
##### add your solution here
29-Oct-2020 14:23:17

info See metacpan: DateTime for more comprehensive functions.