Multiple file input

You have seen special blocks like BEGIN and END, and control structures like next and exit, that affect how the input as a whole is processed. This chapter discusses features that help you make decisions about individual files when multiple files are passed as input.

@ARGV, $ARGV and ARGV

From perldoc: @ARGV:

The array @ARGV contains the command-line arguments intended for the script. $#ARGV is generally the number of arguments minus one, because $ARGV[0] is the first argument, not the program's command name itself.

$ # note that only -E option is used here
$ perl -E 'say join "\n", @ARGV' f[1-3].txt greeting.txt
f1.txt
f2.txt
f3.txt
greeting.txt

$ # with -n, each filename is shifted off @ARGV when that file is opened
$ # f1.txt and f2.txt have 1 line each, table.txt has 3 lines
$ perl -nE 'say "$#ARGV: " . join ",", @ARGV' f[12].txt table.txt
1: f2.txt,table.txt
0: table.txt
-1: 
-1: 
-1: 

See also stackoverflow: referencing filename passed as arguments for more details about @ARGV behavior when the -n or -p switch is active.

From perldoc: $ARGV:

Contains the name of the current file when reading from <>.

From perldoc: ARGV:

The special filehandle that iterates over command-line filenames in @ARGV. Usually written as the null filehandle in the angle operator <>. Note that currently ARGV only has its magical effect within the <> operator; elsewhere it is just a plain filehandle corresponding to the last file opened by <>.

By closing ARGV at the end of each input file, you can reset the $. variable.

$ # logic to do something at the start of each input file
$ # closing ARGV will reset $.
$ perl -ne 'print "--- $ARGV ---\n" if $. == 1;
            print;
            close ARGV if eof' greeting.txt table.txt
--- greeting.txt ---
Hi there
Have a nice day
Good bye
--- table.txt ---
brown bread mat hair 42
blue cake mug shirt -7
yellow banana window shoes 3.14

$ # do something at the end of a file
$ # same as: tail -q -n1 greeting.txt table.txt
$ perl -ne 'print if eof' greeting.txt table.txt
Good bye
yellow banana window shoes 3.14

Here are some more examples.

$ # same as: awk 'FNR==2{print; nextfile}' greeting.txt table.txt
$ perl -ne 'print and close ARGV if $.==2' greeting.txt table.txt
Have a nice day
blue cake mug shirt -7

$ # same as: head -q -n1 and awk 'FNR>1{nextfile} 1'
$ # can also use: perl -pe 'close ARGV'
$ perl -pe 'close ARGV if $.>=1' greeting.txt table.txt
Hi there
brown bread mat hair 42

In scalar context, <> returns the next input record; in list context, it returns all the remaining input records. If you need a single character instead of a record, you can use the getc function. See perldoc: getc for documentation.

$ # note that only -e option is used, same as: perl -e 'print scalar <>'
$ perl -e 'print scalar readline' greeting.txt
Hi there
$ perl -e '$line = <>; print "$line---\n"; print <>' greeting.txt
Hi there
---
Have a nice day
Good bye

$ # note that default filehandle for getc is STDIN
$ perl -E 'say getc' <greeting.txt
H

STDIN

The STDIN filehandle is useful to distinguish between files passed as arguments and stdin data. See the Comparing records section for more examples.

$ # with no file arguments, <> reads stdin data
$ printf 'apple\nmango\n' | perl -e 'print <>'
apple
mango

$ # with file arguments, <> doesn't read stdin data
$ printf 'apple\nmango\n' | perl -e 'print <>' greeting.txt
Hi there
Have a nice day
Good bye

$ printf 'apple\nmango\n' | perl -e 'print <STDIN>' greeting.txt
apple
mango

Skipping remaining contents per file

You have seen examples where the exit function is used to avoid processing unnecessary records for the current file and any other files yet to be processed. Sometimes, you need to skip only the remaining contents of the current file and move on to the next file. The close ARGV technique seen previously comes in handy for such cases.

$ # print filename if it contains 'I' anywhere in the file
$ # same as: grep -l 'I' f[1-3].txt greeting.txt
$ # same as: perl -0777 -nE 'say $ARGV if /I/'
$ # but slurping is dependent on size of input files and available memory
$ perl -nE 'if(/I/){say $ARGV; close ARGV}' f[1-3].txt greeting.txt
f1.txt
f2.txt

$ # print filename if it contains a word ending with 'e'
$ # and 'bat' or 'mat' (irrespective of case) anywhere in the file
$ # same as: perl -0777 -nE 'say $ARGV if /(?=.*?e\b)(?i).*[bm]at/s'
$ perl -nE '$m1=1 if /e\b/; $m2=1 if /[bm]at/i;
            if($m1 && $m2){say $ARGV; $m1=$m2=0; close ARGV; next};
            $m1=$m2=0 if eof' f[1-3].txt greeting.txt
f3.txt

Summary

This chapter introduced features for processing multiple input files and making file-level decisions. These will show up in many more examples in the coming chapters.

Exercises

a) Print the last field of the first two lines for the input files passed as arguments to the perl script. Assume space as the field separator for these two files. To make the output more informative, print filenames and a separator as shown in the output below. Assume input files will have at least two lines.

$ # assume table.txt ip.txt are passed as file inputs
##### add your solution here
>table.txt<
42
-7
----------
>ip.txt<
World
you
----------

b) For the given list of input files, display all filenames that contain 'at' or 'fun' in the third field of any of the input lines. Assume space as the field separator.

$ # assume sample.txt secrets.txt ip.txt table.txt are passed as file inputs
##### add your solution here
secrets.txt
ip.txt
table.txt

c) Print the first two lines for each of the input files ip.txt, sample.txt and table.txt. Also, add a separator between the results as shown below (note that the separator isn't present at the end of the output). Assume input files will have at least two lines.

##### add your solution here
Hello World
How are you
---
Hello World

---
brown bread mat hair 42
blue cake mug shirt -7