Using modules

Perl ships with plenty of standard modules, and many more third-party modules are available for a wide variety of use cases. This chapter discusses the -M command line option and shows some examples with standard and third-party modules. You'll also see how to convert one-liners to full-fledged script files.

The example_files directory has all the files used in the examples.

Standard modules

See perldoc: modules for a complete list of built-in modules. Quoting from perldoc: -m and -M options:

-Mmodule executes use module ; before executing your program. This loads the module and calls its import method, causing the module to have its default effect, typically importing subroutines or giving effect to a pragma. You can use quotes to add extra code after the module name, e.g., '-MMODULE qw(foo bar)'.
A little builtin syntactic sugar means you can also say -mMODULE=foo,bar or -MMODULE=foo,bar as a shortcut for '-MMODULE qw(foo bar)'. This avoids the need to use quotes when importing symbols. The actual code generated by -MMODULE=foo,bar is use module split(/,/,q{foo,bar}). Note that the = form removes the distinction between -m and -M; that is, -mMODULE=foo,bar is the same as -MMODULE=foo,bar

The List::Util module has handy functions for array processing. See perldoc: List::Util for documentation. Here are some examples with max, product and sum0.

# same as: perl -F, -anE 'BEGIN{use List::Util qw(max)} say max @F'
$ echo '34,17,6' | perl -MList::Util=max -F, -anE 'say max @F'
34

$ echo '34,17,6' | perl -MList::Util=product -F, -anE 'say product @F'
3468

# 'sum0' returns '0' even if the array is empty, whereas 'sum' returns 'undef'
$ echo '3.14,17,6' | perl -MList::Util=sum0 -F, -anE 'say sum0 @F'
26.14

Here are some examples for shuffle, sample and uniq.

$ s='floor bat to dubious four'
$ echo "$s" | perl -MList::Util=shuffle -lanE 'say join ":", shuffle @F'
bat:four:dubious:floor:to

$ echo 'dragon' | perl -MList::Util=shuffle -F -lanE 'say shuffle @F'
rogdan

# similar to shuffle, but can specify the number of elements needed
$ echo "$s" | perl -MList::Util=sample -lanE 'say join ":", sample 2, @F'
dubious:bat

$ s='3,b,a,3,c,d,1,d,c,2,2,2,3,1,b'
# note that the input order of elements is preserved
$ echo "$s" | perl -MList::Util=uniq -F, -lanE 'say join ",",uniq @F'
3,b,a,c,d,1,2

Here's an example for base64 encoding and decoding. See perldoc: MIME::Base64 for documentation.

$ echo 'hello world' | base64
aGVsbG8gd29ybGQK

$ echo 'hello world' | perl -MMIME::Base64 -ne 'print encode_base64 $_'
aGVsbG8gd29ybGQK
$ echo 'aGVsbG8gd29ybGQK' | perl -MMIME::Base64 -ne 'print decode_base64 $_'
hello world

Third party modules

The Comprehensive Perl Archive Network (https://www.cpan.org/) has a huge collection of modules for various use cases. Before installing a new module, first check if the module is already installed or not:

# output shown here is modified for presentation purposes
$ perl -MText::CSV -e ''
Can't locate Text/CSV.pm in @INC (you may need to install the Text::CSV module)
(@INC entries checked: <list of paths>).
BEGIN failed--compilation aborted.

If you are using the Perl version that came installed with your operating system, check if you can install a module from your platform repository. Here's an example for Ubuntu:

# search for the Text::CSV module
$ apt-cache search perl text-csv
libspreadsheet-read-perl - reader for common spreadsheet formats
libtext-csv-encoded-perl - encoding-aware comma-separated values manipulator
libtext-csv-perl - comma-separated values manipulator (using XS or PurePerl)
libtext-csv-xs-perl - Perl C/XS module to process Comma-Separated Value files

# install the module of your choice
$ sudo apt install libtext-csv-xs-perl

The above process may not work if you installed Perl manually, or if a particular module isn't available from your platform repository. Here are some options for such cases:

stackoverflow: easiest way to install a missing module shows how to use the cpan command and has details for the Windows platform too. You might need admin privileges.
metacpan: cpanm is also often recommended
metacpan: Carton is a Perl module dependency manager (aka Bundler for Perl)

CSV

For robustly parsing CSV files, you can use the metacpan: Text::CSV or metacpan: Text::CSV_XS modules. The _XS suffix indicates a faster implementation, usually written in the C language. The Text::CSV module uses Text::CSV_XS by default and falls back to Text::CSV_PP (a pure Perl implementation) if the _XS module isn't available.

Here's an example of parsing CSV input with embedded comma characters. ARGV is a special filehandle that iterates over the filenames passed as command line arguments, or reads from stdin when no filenames are given (see the Multiple file input chapter for more details).

$ s='eagle,"fox,42",bee,frog\n1,2,3,4'
# note that neither -n nor -p is used here
$ printf '%b' "$s" | perl -MText::CSV_XS -E 'say $row->[1]
                     while $row = Text::CSV_XS->new->getline(*ARGV)'
fox,42
2

Here's an example with embedded newline characters. Quoting from the documentation:

Important Note: The default behavior is to accept only ASCII characters in the range from 0x20 (space) to 0x7E (tilde). This means that the fields can not contain newlines. If your data contains newlines embedded in fields, or characters above 0x7E (tilde), or binary data, you must set binary => 1 in the call to new.

$ cat newline.csv
apple,"1
2
3",good
guava,"32
54",nice

$ perl -MText::CSV_XS -E '
        while($row = Text::CSV_XS->new({binary => 1})->getline(*ARGV))
        {say "$row->[1]\n-----"}' newline.csv
1
2
3
-----
32
54
-----

You can change the field separator using the sep_char option.

$ perl -MText::CSV_XS -E '
        while($row = Text::CSV_XS->new({sep_char => "\t"})->getline(*ARGV))
        {say join ",", @$row if $row->[0] eq "CSE"}' marks.txt
CSE,Surya,81
CSE,Amy,67

JSON

Newer versions of Perl come with the perldoc: JSON::PP module, which is a pure Perl implementation. Use metacpan: JSON::XS for faster results. There's also metacpan: Cpanel::JSON::XS, whose documentation gives the following motivation:

While it seems there are many JSON modules, none of them correctly handle all corner cases, and in most cases their maintainers are unresponsive, gone missing, or not listening to bug reports for other reasons.

Here's a simple example of parsing JSON from a single line of input data.

$ s='{"greeting":"hi","marks":[78,62,93]}'

# <> is the same as <ARGV>, here it reads a single line of input
$ echo "$s" | perl -MCpanel::JSON::XS -E '$ip=decode_json <>; say $ip->{greeting}'
hi

$ echo "$s" | perl -MCpanel::JSON::XS -E '$ip=decode_json <>;
              say join ":", @{$ip->{marks}}'
78:62:93

For multiline input, use -0777 (or set $/ = undef, or use -g with Perl v5.36 and newer) to pass the entire input content as a single string. You can also create a shortcut to make this easier to use in one-liners.

# check if a shortcut is available
$ type pj
bash: type: pj: not found

# add this to your ~/.bashrc (or the file you use for aliases/functions)
$ pj() { perl -MCpanel::JSON::XS -0777 -E '$ip=decode_json <>;'"$@" ; }

$ s='{"greeting":"hi","marks":[78,62,93]}'

$ echo "$s" | pj 'say $ip->{greeting}'
hi

Here's another example.

$ cat sample.json
{
    "fruit": "apple",
    "blue": ["toy", "flower", "sand stone"],
    "light blue": ["flower", "sky", "water"],
    "language": {
        "natural": ["english", "hindi", "spanish"],
        "programming": ["python", "kotlin", "ruby"]
    },
    "physics": 84
}

# order may differ from the input, as hashes don't maintain key order
# process top-level keys not containing 'e'
$ pj 'for (keys %$ip){say "$_:$ip->{$_}" if !/e/}' sample.json
physics:84
fruit:apple

# process keys within 'language' key that contain 't'
$ pj '$"=","; while(($k,$v) = each %{$ip->{language}})
      {say "$k:@{$v}" if $k=~/t/}' sample.json
natural:english,hindi,spanish

Here's an example of converting possibly minified JSON input to pretty-printed output. You can use json_pp for JSON::PP and json_xs for JSON::XS.

$ s='{"greeting":"hi","marks":[78,62,93],"fruit":"apple"}'

# same as: echo "$s" | perl -MCpanel::JSON::XS -e '
#          print Cpanel::JSON::XS->new->pretty->canonical->encode(decode_json <>)'
$ echo "$s" | cpanel_json_xs
{
   "fruit" : "apple",
   "greeting" : "hi",
   "marks" : [
      78,
      62,
      93
   ]
}

If you need to preserve order, see:

Convert one-liners to pretty formatted scripts

The O module, together with its Deparse backend (-MO=Deparse), can be used to convert one-liners to full-fledged programs. See perldoc: O for documentation. This is similar to the -o option provided by GNU awk.

Here's how the -n and -p options are implemented.

# note that input sources (stdin, filenames, etc) aren't needed here
$ perl -MO=Deparse -ne 'print if /at/'
LINE: while (defined($_ = readline ARGV)) {
    print $_ if /at/;
}
-e syntax OK

$ perl -MO=Deparse -pe 's/ /:/g'
LINE: while (defined($_ = readline ARGV)) {
    s/ /:/g;
}
continue {
    die "-p destination: $!\n" unless print $_;
}
-e syntax OK

You can use -MO=-qq,Deparse if you don't want to see the -e syntax OK message.

The Deparse output is very useful for debugging scripts that change the input or output record separators.

$ perl -MO=Deparse -l -0072 -ne 'print if /a/'
BEGIN { $/ = ":"; $\ = "\n"; }
LINE: while (defined($_ = readline ARGV)) {
    chomp $_;
    print $_ if /a/;
}
-e syntax OK

$ perl -MO=Deparse -00 -ne 'print if /it/'
BEGIN { $/ = ""; $\ = undef; }
LINE: while (defined($_ = readline ARGV)) {
    print $_ if /it/;
}
-e syntax OK

When the -n option is used, here's an alternative to the END block for executing code after the while loop. This cannot be used with the -p option, because it would disrupt the continue block.

$ perl -MO=Deparse -ne 'print if /4/ }{ print "==> the end\n"'
LINE: while (defined($_ = readline ARGV)) {
    print $_ if /4/;
}
{
    print "==> the end\n";
}
-e syntax OK

Here's an example of saving the script to a file instead of displaying it on the terminal. The -e syntax OK message is written to stderr, so it doesn't end up in the file.

$ perl -MO=Deparse -ne 'print if /4/' > script.pl
-e syntax OK
$ cat script.pl
LINE: while (defined($_ = readline ARGV)) {
    print $_ if /4/;
}

$ perl script.pl table.txt
brown bread mat hair 42
yellow banana window shoes 3.14

If you look carefully at the Deparse output, you'll see that the while loop has a LINE label. So, you can use next LINE to move on to the next input record even if you are inside other loops and blocks.

Modules to explore

Awesome Perl — curated list of awesome Perl5 frameworks, libraries and software
bioperl — practical descriptions of BioPerl modules
metacpan: XML::LibXML — xml/html parsing
metacpan: String::Approx — fuzzy matching
metacpan: Tie::IxHash — ordered associative arrays for Perl
unix.stackexchange: example for Algorithm::Combinatorics
unix.stackexchange: example for Text::ParseWords
unix.stackexchange: sort words by syllable count using Lingua::EN::Syllable
stackoverflow: regular expression modules

Summary

This chapter showed how to enable modules via the -M option, along with some examples for standard and third-party modules. You also saw how to convert cryptic one-liners to full-fledged Perl scripts using the O module.

Exercises

The exercises directory has all the files used in this section.

1) For the given space separated words, display the max word determined by alphabetic order.

$ s='let in bat xml me lion'

$ echo "$s" | ##### add your solution here
xml

2) For the given space separated words, randomize the order of characters for each word.

$ s='this is a sample sentence'

# sample randomized output shown here, could be different for you
$ echo "$s" | ##### add your solution here
htis si a melasp ecnnsete

3) Use the metacpan: XML::LibXML module to get the content of all tags named blue for the input file sample.xml. See grantm: Perl XML::LibXML by example for a detailed book on the XML::LibXML module.

$ cat sample.xml
<doc>
    <greeting type="ask">Hi there. How are you?</greeting>
    <greeting type="reply">I am good.</greeting>
    <color>
        <blue>flower</blue>
        <blue>sand stone</blue>
        <light-blue>sky</light-blue>
        <light-blue>water</light-blue>
    </color>
</doc>

##### add your solution here
flower
sand stone

4) Display the current time in the format shown below.

# output will be different for you
##### add your solution here
12-Sep-2023 11:01:14

See metacpan: DateTime for more comprehensive functions.

Perl One-Liners Guide