Using modules
By default, Perl provides plenty of standard modules, and many more third-party modules are available for a wide variety of use cases. This chapter discusses the -M command line option and shows some examples with standard and third-party modules. You'll also see how to convert one-liners to a full-fledged script file.
The example_files directory has all the files used in the examples.
Standard modules
See perldoc: modules for a complete list of built-in modules. Quoting from perldoc: -m and -M options:
-Mmodule executes use module ; before executing your program. This loads the module and calls its import method, causing the module to have its default effect, typically importing subroutines or giving effect to a pragma. You can use quotes to add extra code after the module name, e.g., '-MMODULE qw(foo bar)'.
A little builtin syntactic sugar means you can also say -mMODULE=foo,bar or -MMODULE=foo,bar as a shortcut for '-MMODULE qw(foo bar)'. This avoids the need to use quotes when importing symbols. The actual code generated by -MMODULE=foo,bar is use module split(/,/,q{foo,bar}). Note that the = form removes the distinction between -m and -M; that is, -mMODULE=foo,bar is the same as -MMODULE=foo,bar.
The List::Util module has handy functions for array processing. See perldoc: List::Util for documentation. Here are some examples with max, product and sum0.
# same as: perl -F, -anE 'BEGIN{use List::Util qw(max)} say max @F'
$ echo '34,17,6' | perl -MList::Util=max -F, -anE 'say max @F'
34
$ echo '34,17,6' | perl -MList::Util=product -F, -anE 'say product @F'
3468
# 'sum0' returns '0' even if the array is empty, whereas 'sum' returns 'undef'
$ echo '3.14,17,6' | perl -MList::Util=sum0 -F, -anE 'say sum0 @F'
26.14
Here are some examples for shuffle, sample and uniq.
$ s='floor bat to dubious four'
$ echo "$s" | perl -MList::Util=shuffle -lanE 'say join ":", shuffle @F'
bat:four:dubious:floor:to
$ echo 'dragon' | perl -MList::Util=shuffle -F -lanE 'say shuffle @F'
rogdan
# similar to shuffle, but can specify the number of elements needed
$ echo "$s" | perl -MList::Util=sample -lanE 'say join ":", sample 2, @F'
dubious:bat
$ s='3,b,a,3,c,d,1,d,c,2,2,2,3,1,b'
# note that the input order of elements is preserved
$ echo "$s" | perl -MList::Util=uniq -F, -lanE 'say join ",",uniq @F'
3,b,a,c,d,1,2
Here's an example for base64 encoding and decoding. See perldoc: MIME::Base64 for documentation.
$ echo 'hello world' | base64
aGVsbG8gd29ybGQK
$ echo 'hello world' | perl -MMIME::Base64 -ne 'print encode_base64 $_'
aGVsbG8gd29ybGQK
$ echo 'aGVsbG8gd29ybGQK' | perl -MMIME::Base64 -ne 'print decode_base64 $_'
hello world
Third party modules
The Comprehensive Perl Archive Network (https://www.cpan.org/) has a huge collection of modules for various use cases. Before installing a new module, check whether it is already installed:
# output shown here is modified for presentation purposes
$ perl -MText::CSV -e ''
Can't locate Text/CSV.pm in @INC (you may need to install the Text::CSV module)
(@INC entries checked: <list of paths>).
BEGIN failed--compilation aborted.
If you are using the Perl version that came installed with your operating system, check if you can install a module from your platform repository. Here's an example for Ubuntu:
# search for the Text::CSV module
$ apt-cache search perl text-csv
libspreadsheet-read-perl - reader for common spreadsheet formats
libtext-csv-encoded-perl - encoding-aware comma-separated values manipulator
libtext-csv-perl - comma-separated values manipulator (using XS or PurePerl)
libtext-csv-xs-perl - Perl C/XS module to process Comma-Separated Value files
# install the module of your choice
$ sudo apt install libtext-csv-xs-perl
The above process may fail to work with the Perl version that you manually installed or if a particular module isn't available from your platform repository. There are different options for such cases.
- stackoverflow: easiest way to install a missing module shows how to use the cpan command and has details for the Windows platform too. You might need admin privileges.
- metacpan: cpanm is also often recommended
- metacpan: Carton is a Perl module dependency manager (aka Bundler for Perl)
CSV
For robustly parsing CSV files, you can use the metacpan: Text::CSV or metacpan: Text::CSV_XS modules. _XS indicates a faster implementation, usually written in the C language. The Text::CSV module uses Text::CSV_XS by default and falls back to Text::CSV_PP (pure Perl implementation) if the _XS module isn't available.
Here's an example of parsing CSV input with embedded comma characters. ARGV is a special filehandle that iterates over filenames passed as command line arguments, or reads standard input when no filenames are given (see the Multiple file input chapter for more details).
$ s='eagle,"fox,42",bee,frog\n1,2,3,4'
# note that neither -n nor -p is used here
$ printf '%b' "$s" | perl -MText::CSV_XS -E 'say $row->[1]
while $row = Text::CSV_XS->new->getline(*ARGV)'
fox,42
2
Here's an example with embedded newline characters. Quoting from the documentation:
Important Note: The default behavior is to accept only ASCII characters in the range from 0x20 (space) to 0x7E (tilde). This means that the fields can not contain newlines. If your data contains newlines embedded in fields, or characters above 0x7E (tilde), or binary data, you must set binary => 1 in the call to new.
$ cat newline.csv
apple,"1
2
3",good
guava,"32
54",nice
$ perl -MText::CSV_XS -E '
while($row = Text::CSV_XS->new({binary => 1})->getline(*ARGV))
{say "$row->[1]\n-----"}' newline.csv
1
2
3
-----
32
54
-----
You can change the field separator using the sep_char option.
$ perl -MText::CSV_XS -E '
while($row = Text::CSV_XS->new({sep_char => "\t"})->getline(*ARGV))
{say join ",", @$row if $row->[0] eq "CSE"}' marks.txt
CSE,Surya,81
CSE,Amy,67
JSON
Newer versions of Perl come with the perldoc: JSON::PP module, which is a pure Perl implementation. Use metacpan: JSON::XS for faster results. There's also metacpan: Cpanel::JSON::XS, which mentions the following reason:
While it seems there are many JSON modules, none of them correctly handle all corner cases, and in most cases their maintainers are unresponsive, gone missing, or not listening to bug reports for other reasons.
Here's a simple example of parsing JSON from a single line of input data.
$ s='{"greeting":"hi","marks":[78,62,93]}'
# <> is same as <ARGV>, here it helps to get a line of input
$ echo "$s" | perl -MCpanel::JSON::XS -E '$ip=decode_json <>; say $ip->{greeting}'
hi
$ echo "$s" | perl -MCpanel::JSON::XS -E '$ip=decode_json <>;
say join ":", @{$ip->{marks}}'
78:62:93
For multiline input, use -0777 (or set $/ = undef, or use -g with newer Perl versions) to pass the entire input content as a single string. You can also create a shortcut to make it easier for one-liners.
# check if a shortcut is available
$ type pj
bash: type: pj: not found
# add this to your ~/.bashrc (or the file you use for aliases/functions)
$ pj() { perl -MCpanel::JSON::XS -0777 -E '$ip=decode_json <>;'"$@" ; }
$ s='{"greeting":"hi","marks":[78,62,93]}'
$ echo "$s" | pj 'say $ip->{greeting}'
hi
Here's another example.
$ cat sample.json
{
"fruit": "apple",
"blue": ["toy", "flower", "sand stone"],
"light blue": ["flower", "sky", "water"],
"language": {
"natural": ["english", "hindi", "spanish"],
"programming": ["python", "kotlin", "ruby"]
},
"physics": 84
}
# order may be different than input as hash doesn't maintain key order
# process top-level keys not containing 'e'
$ pj 'for (keys %$ip){say "$_:$ip->{$_}" if !/e/}' sample.json
physics:84
fruit:apple
# process keys within 'language' key that contain 't'
$ pj '$"=","; while(($k,$v) = each %{$ip->{language}})
{say "$k:@{$v}" if $k=~/t/}' sample.json
natural:english,hindi,spanish
Here's an example of converting possibly minified JSON input to pretty printed output. You can use json_pp for JSON::PP and json_xs for JSON::XS.
$ s='{"greeting":"hi","marks":[78,62,93],"fruit":"apple"}'
# same as: echo "$s" | perl -MCpanel::JSON::XS -e '
# print Cpanel::JSON::XS->new->pretty->encode(decode_json <>)'
$ echo "$s" | cpanel_json_xs
{
"fruit" : "apple",
"greeting" : "hi",
"marks" : [
78,
62,
93
]
}
If you need to preserve order, see:
- stackoverflow: Hash::Ordered versus Tie::IxHash with JSON::XS encode
- stackoverflow: decode and encode json preserving order
Convert one-liners to pretty formatted scripts
The O module can be used to convert one-liners to full-fledged programs. See perldoc: O for documentation. This is similar to the -o option provided by GNU awk.
Here's how the -n and -p options are implemented.
# note that input sources (stdin, filenames, etc) aren't needed here
$ perl -MO=Deparse -ne 'print if /at/'
LINE: while (defined($_ = readline ARGV)) {
print $_ if /at/;
}
-e syntax OK
$ perl -MO=Deparse -pe 's/ /:/g'
LINE: while (defined($_ = readline ARGV)) {
s/ /:/g;
}
continue {
die "-p destination: $!\n" unless print $_;
}
-e syntax OK
You can use -MO=-qq,Deparse if you don't want to see the -e syntax OK message.
The Deparse output is very useful for debugging scripts that change the record separators.
$ perl -MO=Deparse -l -0072 -ne 'print if /a/'
BEGIN { $/ = ":"; $\ = "\n"; }
LINE: while (defined($_ = readline ARGV)) {
chomp $_;
print $_ if /a/;
}
-e syntax OK
$ perl -MO=Deparse -00 -ne 'print if /it/'
BEGIN { $/ = ""; $\ = undef; }
LINE: while (defined($_ = readline ARGV)) {
print $_ if /it/;
}
-e syntax OK
Here's an alternate way to specify code to be executed after the while loop instead of using an END block, when the -n option is being used. This cannot be used with the -p option because it will disrupt the continue block.
$ perl -MO=Deparse -ne 'print if /4/ }{ print "==> the end\n"'
LINE: while (defined($_ = readline ARGV)) {
print $_ if /4/;
}
{
print "==> the end\n";
}
-e syntax OK
Here's an example of saving the script to a file instead of displaying on the terminal.
$ perl -MO=Deparse -ne 'print if /4/' > script.pl
-e syntax OK
$ cat script.pl
LINE: while (defined($_ = readline ARGV)) {
print $_ if /4/;
}
$ perl script.pl table.txt
brown bread mat hair 42
yellow banana window shoes 3.14
If you look at the Deparse output carefully, you'll see that the while loop has a LINE label. So, you can use next LINE to move on to the next input record even if you are inside other loops and blocks.
Modules to explore
- Awesome Perl — curated list of awesome Perl5 frameworks, libraries and software
- bioperl — practical descriptions of BioPerl modules
- metacpan: XML::LibXML — xml/html parsing
- metacpan: String::Approx — fuzzy matching
- metacpan: Tie::IxHash — ordered associative arrays for Perl
- unix.stackexchange: example for Algorithm::Combinatorics
- unix.stackexchange: example for Text::ParseWords
- unix.stackexchange: sort words by syllable count using Lingua::EN::Syllable
- stackoverflow: regular expression modules
Summary
This chapter showed how to enable modules via the -M option, with some examples for standard and third-party modules. You also saw how to convert cryptic one-liners to full-fledged Perl scripts using the O module.
Exercises
The exercises directory has all the files used in this section.
1) For the given space separated words, display the max word determined by alphabetic order.
$ s='let in bat xml me lion'
$ echo "$s" | ##### add your solution here
xml
2) For the given space separated words, randomize the order of characters for each word.
$ s='this is a sample sentence'
# sample randomized output shown here, could be different for you
$ echo "$s" | ##### add your solution here
htis si a melasp ecnnsete
3) Use the metacpan: XML::LibXML module to get the content of all tags named blue for the input file sample.xml. See grantm: Perl XML::LibXML by example for a detailed book on the XML::LibXML module.
$ cat sample.xml
<doc>
<greeting type="ask">Hi there. How are you?</greeting>
<greeting type="reply">I am good.</greeting>
<color>
<blue>flower</blue>
<blue>sand stone</blue>
<light-blue>sky</light-blue>
<light-blue>water</light-blue>
</color>
</doc>
##### add your solution here
flower
sand stone
4) Display the current time in the format shown below.
# output will be different for you
##### add your solution here
12-Sep-2023 11:01:14
See metacpan: DateTime for more comprehensive functions.