Multiple file input

You have seen control structures like BEGIN and END, and methods like next and exit, that affect the entire input contents. This chapter discusses features that help you make decisions about individual files when multiple files are passed as input.

ARGV and ARGF

The ARGV array contains the list of files passed to the ruby script. As soon as an input file is opened for processing, its name is removed from this list. You can manipulate this array dynamically if you wish to change the flow of input being processed. The $* global variable can also be used instead of ARGV.

$ # note that only -e option is used here
$ # can also use 'puts $*' instead of 'puts ARGV'
$ ruby -e 'puts ARGV' f[1-3].txt greeting.txt
f1.txt
f2.txt
f3.txt
greeting.txt

$ # a filename is removed from ARGV as soon as that file is opened
$ # f1.txt and f2.txt have 1 line each, table.txt has 3 lines
$ ruby -ne 'puts "#{ARGV.size}: " + ARGV * ","' f[12].txt table.txt
2: f2.txt,table.txt
1: table.txt
0: 
0: 
0: 
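The same behavior can be observed from a standalone script. The sketch below is only an illustration: the files a.txt, b.txt and c.txt and their contents are made up and written to a temporary directory. Calling ARGV.replace before ARGF is first used queues those files, and deleting c.txt from ARGV while a.txt is still being read means ARGF never opens it.

```ruby
require 'tmpdir'

sizes = []   # ARGV.size seen while each line is processed
lines = []   # lines actually read through ARGF

Dir.mktmpdir do |dir|
  # hypothetical sample files
  a = File.join(dir, 'a.txt')
  b = File.join(dir, 'b.txt')
  c = File.join(dir, 'c.txt')
  File.write(a, "1\n2\n")
  File.write(b, "x\n")
  File.write(c, "skip\n")

  ARGV.replace([a, b, c])   # queue the files before ARGF is first used
  while (line = ARGF.gets)
    sizes << ARGV.size      # the current file has already been shifted off
    lines << line.chomp
    ARGV.delete(c)          # drop c.txt before ARGF reaches it (no-op afterwards)
  end
end

p sizes   # => [2, 1, 0]
p lines   # => ["1", "2", "x"]
```

Since c.txt was removed from ARGV before a.txt was exhausted, its "skip" line never shows up in lines.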

ARGF (or the $< global variable) is a stream that reads the files passed as arguments to the ruby script one after the other; methods like ARGF.filename and ARGF.eof refer to the file currently being processed. If ARGV is empty, ARGF reads stdin data instead. If you explicitly call the close method on ARGF, it skips to the next file and resets the $. variable. See ruby-doc: ARGF for documentation.

$ # logic to do something at the start of each input file
$ # can also use $<.eof instead of ARGF.eof
$ ruby -ne 'puts "--- #{ARGF.filename} ---" if $. == 1;
            print;
            ARGF.close if ARGF.eof' greeting.txt table.txt
--- greeting.txt ---
Hi there
Have a nice day
Good bye
--- table.txt ---
brown bread mat hair 42
blue cake mug shirt -7
yellow banana window shoes 3.14

$ # do something at the end of a file
$ # same as: tail -q -n1 greeting.txt table.txt
$ ruby -ne 'print if ARGF.eof' greeting.txt table.txt
Good bye
yellow banana window shoes 3.14
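If you prefer a script over the -n option, the loop below is a rough equivalent of the start-of-file example, assuming hypothetical files p.txt and q.txt created in a temporary directory. ARGF.gets is called explicitly instead of relying on -n, and calling ARGF.close when ARGF.eof? is true is what brings $. back to 1 for the next file.

```ruby
require 'tmpdir'

out = []

Dir.mktmpdir do |dir|
  # hypothetical sample files: p.txt has two lines, q.txt has one
  paths = { 'p.txt' => "a\nb\n", 'q.txt' => "c\n" }.map do |name, text|
    File.join(dir, name).tap { |path| File.write(path, text) }
  end

  ARGV.replace(paths)
  while (line = ARGF.gets)
    # $. is 1 for the first line of each file only because of ARGF.close below
    out << "--- #{File.basename(ARGF.filename)} ---" if $. == 1
    out << line.chomp
    ARGF.close if ARGF.eof?   # done with this file: reset $. and move on
  end
end

puts out
```

Without the ARGF.close call, $. would keep counting across files (3 for the line in q.txt), so only the first header would be added.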

Here are some more examples.

$ # same as: awk 'FNR==2{print; nextfile}' greeting.txt table.txt
$ ruby -ne '(print; ARGF.close) if $.==2' greeting.txt table.txt
Have a nice day
blue cake mug shirt -7

$ # same as: head -q -n1 and awk 'FNR>1{nextfile} 1'
$ # can also use: ruby -pe 'ARGF.close'
$ ruby -pe 'ARGF.close if $.>=1' greeting.txt table.txt
Hi there
brown bread mat hair 42
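The $.==2 approach assumes that every file has at least two lines: $. is reset only when ARGF.close is called, so a shorter file lets the count spill over into the next file. The sketch below shows one way to guard against that by also closing at the end of every file. The helper name nth_line_per_file and the sample files are hypothetical.

```ruby
require 'tmpdir'

# Hypothetical helper: returns line number n of every file in paths,
# like awk 'FNR==n{print; nextfile}'; files shorter than n lines contribute nothing.
def nth_line_per_file(paths, n)
  ARGV.replace(paths)
  picked = []
  while (line = ARGF.gets)
    if ARGF.lineno == n
      picked << line.chomp
      ARGF.close   # skip the rest of this file; resets the line count
    elsif ARGF.eof?
      ARGF.close   # short file: still reset the count before the next file
    end
  end
  picked
end

picked = nil
Dir.mktmpdir do |dir|
  files = { 'a.txt' => "1\n2\n3\n", 'b.txt' => "only\n", 'c.txt' => "x\ny\n" }
  paths = files.map { |name, text| File.join(dir, name).tap { |path| File.write(path, text) } }
  picked = nth_line_per_file(paths, 2)
end

p picked   # => ["2", "y"]
```

Without the elsif branch, the count would continue from b.txt into c.txt, and x would be picked instead of y.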

You can use methods like read, readline, readlines, gets, etc. to explicitly get data from a specific filehandle. Some of these, such as gets, readline and readlines, are also available as Kernel methods, and in that form they read from ARGF by default (see ruby-doc: Kernel for details).

$ # note that only -e option is used
$ # same as: ruby -e 'puts ARGF.gets' greeting.txt
$ ruby -e 'puts gets' greeting.txt
Hi there

$ ruby -e 'puts gets, "---", ARGF.read' greeting.txt
Hi there
---
Have a nice day
Good bye

$ ruby -e 'puts readlines' greeting.txt
Hi there
Have a nice day
Good bye

$ ruby -e 'puts ARGF.readchar' greeting.txt
H
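A script can mix these calls in the same way. The sketch below uses a temporary stand-in for greeting.txt: Kernel#gets and ARGF.read both consume the same ARGF stream, whereas an explicit File handle reads the file independently from its beginning.

```ruby
require 'tmpdir'

first = rest = explicit = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, 'greeting.txt')   # temporary stand-in for greeting.txt
  File.write(path, "Hi there\nHave a nice day\nGood bye\n")

  ARGV.replace([path])
  first = gets                            # Kernel#gets reads from ARGF by default
  rest = ARGF.read                        # whatever ARGF has not consumed yet
  explicit = File.open(path, &:readline)  # a separate handle starts at line 1
end

p first     # => "Hi there\n"
p rest      # => "Have a nice day\nGood bye\n"
p explicit  # => "Hi there\n"
```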

STDIN

The STDIN filehandle is useful to distinguish between files passed as arguments and stdin data, since it always reads stdin even when file arguments are present. See the Comparing records section for more examples.

$ # with no file arguments, readline works on stdin data
$ printf 'apple\nmango\n' | ruby -e 'puts readline'
apple

$ # with file arguments, readline reads from the files instead of stdin
$ printf 'apple\nmango\n' | ruby -e 'puts readline' greeting.txt
Hi there

$ # use STDIN to work on stdin data irrespective of file arguments
$ printf 'apple\nmango\n' | ruby -e 'puts STDIN.readline' greeting.txt
apple
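As a sketch of why this distinction matters, the snippet below runs a one-liner that first reads a list of lines from STDIN and then prints only those lines of the file arguments that appear in that list. It is launched through Open3.capture2 so that stdin can be supplied from Ruby; the file contents and stdin text are made up for illustration.

```ruby
require 'open3'
require 'rbconfig'
require 'tmpdir'

# One-liner under test: stdin supplies the wanted lines,
# the file arguments supply the lines to filter.
code = 'words = STDIN.readlines(chomp: true); ' \
       'ARGF.each_line { |line| print line if words.include?(line.chomp) }'

filtered = status = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, 'greeting.txt')   # temporary stand-in for greeting.txt
  File.write(path, "Hi there\nHave a nice day\nGood bye\n")
  # RbConfig.ruby is the path of the Ruby interpreter running this script
  filtered, status = Open3.capture2(RbConfig.ruby, '-e', code, path,
                                    stdin_data: "Good bye\nSee you\n")
end

p filtered   # => "Good bye\n"
```

Had the one-liner used readlines instead of STDIN.readlines, it would have read greeting.txt itself, because a file argument was passed.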

Skipping remaining contents per file

You have seen examples where the exit method is used to avoid processing unnecessary records for the current file as well as any other files yet to be processed. Sometimes, you need to skip only the remaining contents of the current file and move on to the next file. The close method seen previously comes in handy for such cases.

$ # print filename if it contains 'I' anywhere in the file
$ # same as: grep -l 'I' f[1-3].txt greeting.txt
$ # same as: ruby -0777 -ne 'puts ARGF.filename if /I/'
$ # but slurping is dependent on size of input files and available memory
$ ruby -ne '(puts ARGF.filename; ARGF.close) if /I/' f[1-3].txt greeting.txt
f1.txt
f2.txt
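Here is the same idea as a standalone loop, assuming three made-up files in a temporary directory. Once a line matches, the filename is recorded and ARGF.close moves straight on to the next file, so c.txt is reported only once even though two of its lines contain I.

```ruby
require 'tmpdir'

matched = []

Dir.mktmpdir do |dir|
  # hypothetical sample files
  files = { 'a.txt' => "Is it\nyes\n", 'b.txt' => "no match\n", 'c.txt' => "x\nI am\nI too\n" }
  ARGV.replace(files.map { |name, text| File.join(dir, name).tap { |path| File.write(path, text) } })

  while (line = ARGF.gets)
    next unless line.match?(/I/)
    matched << File.basename(ARGF.filename)
    ARGF.close   # skip the rest of this file and continue with the next one
  end
end

p matched   # => ["a.txt", "c.txt"]
```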

$ # print filename if it contains a word ending with 'e'
$ # and 'bat' or 'mat' (irrespective of case) anywhere in the file
$ # same as: ruby -0777 -ne 'puts ARGF.filename if /(?=.*?e\b)(?i).*[bm]at/m'
$ ruby -ne '$m1=true if /e\b/; $m2=true if /[bm]at/i;
            (puts ARGF.filename; $m1=$m2=false; ARGF.close; next) if $m1 && $m2;
            $m1=$m2=false if ARGF.eof' f[1-3].txt greeting.txt
f3.txt
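The same logic can be written with local variables in a script. The files below are hypothetical; q.txt is set up so that its bat/mat flag is still true when the file ends, which shows why both flags have to be reset at the end of every file.

```ruby
require 'tmpdir'

found = []

Dir.mktmpdir do |dir|
  files = {
    'p.txt' => "the cat sat\nbig Bat here\n",  # both conditions, on different lines
    'q.txt' => "a mat\nno word\n",             # 'mat' but no word ending with 'e'
    'r.txt' => "Nice one\n",                   # words ending with 'e' only
  }
  ARGV.replace(files.map { |name, text| File.join(dir, name).tap { |path| File.write(path, text) } })

  m1 = m2 = false
  while (line = ARGF.gets)
    m1 = true if line.match?(/e\b/)
    m2 = true if line.match?(/[bm]at/i)
    if m1 && m2
      found << File.basename(ARGF.filename)
      m1 = m2 = false
      ARGF.close   # no need to read the rest of this file
      next
    end
    m1 = m2 = false if ARGF.eof?   # don't carry flags over to the next file
  end
end

p found   # => ["p.txt"]
```

If the reset on ARGF.eof? were removed, r.txt would be reported as well, because its words ending with e would combine with the flag left over from q.txt.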

Summary

This chapter introduced features for processing multiple file inputs and making file-level decisions. These will show up in many more examples in the coming chapters as well.

Exercises

a) Print the last field of the first two lines of each input file passed as an argument to the ruby script. Assume space as the field separator for these two files. To make the output more informative, print the filename and a separator as shown in the output below. Assume input files will have at least two lines.

$ # assume table.txt ip.txt are passed as file inputs
##### add your solution here
>table.txt<
42
-7
----------
>ip.txt<
World
you
----------

b) For the given list of input files, display all filenames that contain 'at' or 'fun' in the third field of any of their lines. Assume space as the field separator and note that some lines may not have three fields.

$ # assume sample.txt secrets.txt ip.txt table.txt are passed as file inputs
##### add your solution here
secrets.txt
ip.txt
table.txt

c) Print the first two lines for each of the input files ip.txt, sample.txt and table.txt. Also, add a separator between the results as shown below (note that the separator isn't present at the end of the output). Assume input files will have at least two lines.

##### add your solution here
Hello World
How are you
---
Hello World

---
brown bread mat hair 42
blue cake mug shirt -7