Multiple file input

You have already seen blocks like BEGIN and END, and control flow features like next and exit. This chapter discusses features that help you make per-file decisions when multiple files are passed as input.

The example_files directory has all the files used in the examples.

@ARGV, $ARGV and ARGV

From perldoc: @ARGV:

The array @ARGV contains the command-line arguments intended for the script. $#ARGV is generally the number of arguments minus one, because $ARGV[0] is the first argument, not the program's command name itself.

# note that only the -E option is used here
$ perl -E 'say join "\n", @ARGV' f[1-3].txt greeting.txt
f1.txt
f2.txt
f3.txt
greeting.txt

# each filename is removed from @ARGV as soon as that file is opened for reading
# f1.txt and f2.txt have 1 line each, table.txt has 3 lines
$ perl -nE 'say "$#ARGV: " . join ",", @ARGV' f[12].txt table.txt
1: f2.txt,table.txt
0: table.txt
-1: 
-1: 
-1:

See also stackoverflow: referencing filename passed as arguments for more details about @ARGV behavior when the -n or -p switches are active.

From perldoc: $ARGV:

Contains the name of the current file when reading from <>.

From perldoc: ARGV:

The special filehandle that iterates over command-line filenames in @ARGV. Usually written as the null filehandle in the angle operator <>. Note that currently ARGV only has its magical effect within the <> operator; elsewhere it is just a plain filehandle corresponding to the last file opened by <>.

By closing ARGV at the end of each input file, you can reset the $. variable.

# logic to do something at the start of each input file
# closing ARGV will reset $.
$ perl -ne 'print "--- $ARGV ---\n" if $. == 1;
            print;
            close ARGV if eof' greeting.txt table.txt
--- greeting.txt ---
Hi there
Have a nice day
Good bye
--- table.txt ---
brown bread mat hair 42
blue cake mug shirt -7
yellow banana window shoes 3.14

# do something at the end of a file
# same as: tail -q -n1 greeting.txt table.txt
$ perl -ne 'print if eof' greeting.txt table.txt
Good bye
yellow banana window shoes 3.14

Here are some more examples.

# same as: awk 'FNR==2{print; nextfile}' greeting.txt table.txt
$ perl -ne 'print and close ARGV if $.==2' greeting.txt table.txt
Have a nice day
blue cake mug shirt -7

# same as: head -q -n1 and awk 'FNR>1{nextfile} 1'
# can also use: perl -pe 'close ARGV'
$ perl -pe 'close ARGV if $.>=1' greeting.txt table.txt
Hi there
brown bread mat hair 42

In scalar context, <> will return the next input record and in list context, <> returns all the remaining input records. If you need a single character instead of a record, you can use the getc function. See perldoc: getc for documentation.

# note that only the -e option is used, same as: perl -e 'print scalar <>'
$ perl -e 'print scalar readline' greeting.txt
Hi there
$ perl -e '$line = <>; print "$line---\n"; print <>' greeting.txt
Hi there
---
Have a nice day
Good bye

# note that the default filehandle for getc is STDIN
$ perl -E 'say getc' <greeting.txt
H

STDIN

The STDIN filehandle is useful to distinguish between files passed as arguments and the stdin data. See the Comparing records section for more examples.

# with no file arguments, <> reads the stdin data
$ printf 'apple\nmango\n' | perl -e 'print <>'
apple
mango

# with file arguments, <> doesn't read the stdin data
$ printf 'apple\nmango\n' | perl -e 'print <>' greeting.txt
Hi there
Have a nice day
Good bye

$ printf 'apple\nmango\n' | perl -e 'print <STDIN>' greeting.txt
apple
mango

Skipping remaining contents per file

You have seen examples where the exit function was used to avoid processing unnecessary records for the current and any other files yet to be processed. Sometimes, you need to skip only the contents for the current file and move on to the next file for processing. The close ARGV example seen previously comes in handy for such cases.

# avoids unnecessary processing compared to perl -ne 'print if !(/\bba/ .. eof)'
# same as: awk '/\<ba/{nextfile} 1' ip.txt table.txt
$ perl -ne '/\bba/ ? close ARGV : print' ip.txt table.txt
it is a warm and cozy day
listen to what I say
go play in the park
brown bread mat hair 42
blue cake mug shirt -7

# print filename if it contains 'I' anywhere in the file
# same as: grep -l 'I' f[1-3].txt greeting.txt
# same as: perl -0777 -nE 'say $ARGV if /I/'
# but slurping is dependent on size of input files and available memory
$ perl -nE 'if(/I/){say $ARGV; close ARGV}' f[1-3].txt greeting.txt
f1.txt
f2.txt

# print filename if it contains a word ending with 'e'
# and 'bat' or 'mat' (irrespective of case) anywhere in the file
# same as: perl -0777 -nE 'say $ARGV if /(?=.*?e\b)(?i).*[bm]at/s'
$ perl -nE '$m1=1 if /e\b/; $m2=1 if /[bm]at/i;
            if($m1 && $m2){say $ARGV; $m1=$m2=0; close ARGV; next};
            $m1=$m2=0 if eof' f[1-3].txt greeting.txt
f3.txt
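
The reset at eof in the last example matters because the flags would otherwise carry over from one file to the next. The sketch below uses two hypothetical files, x.txt and y.txt, each satisfying only one of the two conditions, to show the false match that appears when the reset is left out.

```shell
# hypothetical sample files: each one satisfies only one condition
tmp=$(mktemp -d)
printf 'fine\n' > "$tmp/x.txt"
printf 'Bat\n' > "$tmp/y.txt"

# without the reset, $m1 set while reading x.txt is still set for y.txt
no_reset=$(cd "$tmp" && perl -nE '$m1=1 if /e\b/; $m2=1 if /[bm]at/i;
    if($m1 && $m2){say $ARGV; $m1=$m2=0; close ARGV}' x.txt y.txt)
printf 'without reset: %s\n' "$no_reset"
# without reset: y.txt

# with the reset at eof, neither file is reported
with_reset=$(cd "$tmp" && perl -nE '$m1=1 if /e\b/; $m2=1 if /[bm]at/i;
    if($m1 && $m2){say $ARGV; $m1=$m2=0; close ARGV; next};
    $m1=$m2=0 if eof' x.txt y.txt)
printf 'with reset: %s\n' "$with_reset"
# with reset:
```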

Summary

This chapter introduced features for processing multiple input files and making file-level decisions. These will also show up in more examples in the coming chapters.

Exercises

The exercises directory has all the files used in this section.

1) Print the last field of the first two lines for the input files table.txt and ip.txt. Assume space as the field separator for these two files. To make the output more informative, print filenames and a separator as shown in the output below. Assume that the input files will have at least two lines.

# assume table.txt ip.txt are passed as file inputs
##### add your solution here
>table.txt<
42
-7
----------
>ip.txt<
World
you
----------

2) For the input files sample.txt, secrets.txt, ip.txt and table.txt, display only the names of files that contain at or fun in the third field. Assume space as the field separator.

##### add your solution here
secrets.txt
ip.txt
table.txt

3) Print the first two lines for each of the input files ip.txt, sample.txt and table.txt. Also, add a separator between the results as shown below (note that the separator isn't present at the end of the output). Assume that the input files will have at least two lines.

##### add your solution here
Hello World
How are you
---
Hello World

---
brown bread mat hair 42
blue cake mug shirt -7

4) Print only the second field of the third line, if any, from these input files: ip.txt, sample.txt and copyright.txt. Consider space as the field separator.

##### add your solution here
game
day
bla

Perl One-Liners Guide