Multiple file input
You have already seen blocks like BEGIN and END, and statements like next and exit. This chapter discusses features that help you make decisions on a per-file basis when multiple files are passed as input.
The example_files directory has all the files used in the examples.
@ARGV, $ARGV and ARGV
From perldoc: @ARGV:
The array @ARGV contains the command-line arguments intended for the script. $#ARGV is generally the number of arguments minus one, because $ARGV[0] is the first argument, not the program's command name itself.
# note that only the -E option is used here
$ perl -E 'say join "\n", @ARGV' f[1-3].txt greeting.txt
f1.txt
f2.txt
f3.txt
greeting.txt
# each filename is removed (shifted off) from @ARGV when that file starts being processed
# f1.txt and f2.txt have 1 line each, table.txt has 3 lines
$ perl -nE 'say "$#ARGV: " . join ",", @ARGV' f[12].txt table.txt
1: f2.txt,table.txt
0: table.txt
-1:
-1:
-1:
See also stackoverflow: referencing filename passed as arguments for more details about @ARGV behavior when the -n or -p switches are active.
From perldoc: $ARGV:
Contains the name of the current file when reading from <>.
From perldoc: ARGV:
The special filehandle that iterates over command-line filenames in @ARGV. Usually written as the null filehandle in the angle operator <>. Note that currently ARGV only has its magical effect within the <> operator; elsewhere it is just a plain filehandle corresponding to the last file opened by <>.
By closing ARGV at the end of each input file, you can reset the $. variable (the current line number), so that line numbering restarts for every file.
# logic to do something at the start of each input file
# closing ARGV will reset $.
$ perl -ne 'print "--- $ARGV ---\n" if $. == 1;
print;
close ARGV if eof' greeting.txt table.txt
--- greeting.txt ---
Hi there
Have a nice day
Good bye
--- table.txt ---
brown bread mat hair 42
blue cake mug shirt -7
yellow banana window shoes 3.14
# do something at the end of a file
# same as: tail -q -n1 greeting.txt table.txt
$ perl -ne 'print if eof' greeting.txt table.txt
Good bye
yellow banana window shoes 3.14
Here are some more examples.
# same as: awk 'FNR==2{print; nextfile}' greeting.txt table.txt
$ perl -ne 'print and close ARGV if $.==2' greeting.txt table.txt
Have a nice day
blue cake mug shirt -7
# same as: head -q -n1 and awk 'FNR>1{nextfile} 1'
# can also use: perl -pe 'close ARGV'
$ perl -pe 'close ARGV if $.>=1' greeting.txt table.txt
Hi there
brown bread mat hair 42
In scalar context, <> will return the next input record and in list context, <> returns all the remaining input records. If you need a single character instead of a record, you can use the getc function. See perldoc: getc for documentation.
# note that only the -e option is used, same as: perl -e 'print scalar <>'
$ perl -e 'print scalar readline' greeting.txt
Hi there
$ perl -e '$line = <>; print "$line---\n"; print <>' greeting.txt
Hi there
---
Have a nice day
Good bye
# note that the default filehandle for getc is STDIN
$ perl -E 'say getc' <greeting.txt
H
STDIN
The STDIN filehandle is useful to distinguish between files passed as arguments and the stdin data. See the Comparing records section for more examples.
# with no file arguments, <> reads the stdin data
$ printf 'apple\nmango\n' | perl -e 'print <>'
apple
mango
# with file arguments, <> doesn't read the stdin data
$ printf 'apple\nmango\n' | perl -e 'print <>' greeting.txt
Hi there
Have a nice day
Good bye
$ printf 'apple\nmango\n' | perl -e 'print <STDIN>' greeting.txt
apple
mango
Skipping remaining contents per file
You have seen examples where the exit function was used to avoid processing unnecessary records for the current and any other files yet to be processed. Sometimes, you need to skip only the contents for the current file and move on to the next file for processing. The close ARGV example seen previously comes in handy for such cases.
# avoids unnecessary processing compared to perl -ne 'print if !(/\bba/ .. eof)'
# same as: awk '/\<ba/{nextfile} 1' ip.txt table.txt
$ perl -ne '/\bba/ ? close ARGV : print' ip.txt table.txt
it is a warm and cozy day
listen to what I say
go play in the park
brown bread mat hair 42
blue cake mug shirt -7
# print filename if it contains 'I' anywhere in the file
# same as: grep -l 'I' f[1-3].txt greeting.txt
# same as: perl -0777 -nE 'say $ARGV if /I/'
# but slurping is dependent on size of input files and available memory
$ perl -nE 'if(/I/){say $ARGV; close ARGV}' f[1-3].txt greeting.txt
f1.txt
f2.txt
# print filename if it contains a word ending with 'e'
# and 'bat' or 'mat' (irrespective of case) anywhere in the file
# same as: perl -0777 -nE 'say $ARGV if /(?=.*?e\b)(?i).*[bm]at/s'
$ perl -nE '$m1=1 if /e\b/; $m2=1 if /[bm]at/i;
if($m1 && $m2){say $ARGV; $m1=$m2=0; close ARGV; next};
$m1=$m2=0 if eof' f[1-3].txt greeting.txt
f3.txt
Summary
This chapter introduced features for processing multiple file inputs and constructing file level decisions. These will also show up in more examples in the coming chapters.
Exercises
The exercises directory has all the files used in this section.
1) Print the last field of the first two lines for the input files table.txt and ip.txt. Assume space as the field separator for these two files. To make the output more informative, print filenames and a separator as shown in the output below. Assume that the input files will have at least two lines.
# assume table.txt ip.txt are passed as file inputs
##### add your solution here
>table.txt<
42
-7
----------
>ip.txt<
World
you
----------
2) For the input files sample.txt, secrets.txt, ip.txt and table.txt, display only the names of files that contain at or fun in the third field. Assume space as the field separator.
##### add your solution here
secrets.txt
ip.txt
table.txt
3) Print the first two lines for each of the input files ip.txt, sample.txt and table.txt. Also, add a separator between the results as shown below (note that the separator isn't present at the end of the output). Assume that the input files will have at least two lines.
##### add your solution here
Hello World
How are you
---
Hello World
---
brown bread mat hair 42
blue cake mug shirt -7
4) Print only the second field of the third line, if any, from these input files: ip.txt, sample.txt and copyright.txt. Consider space as the field separator.
##### add your solution here
game
day
bla