Multiple file input

You have already seen blocks like BEGIN, END and methods like next and exit. This chapter will discuss features that are useful for making decisions about individual files when multiple files are passed as input.

The example_files directory has all the files used in the examples.

ARGV and ARGF

The ARGV array contains the list of files passed to the Ruby script. As soon as an input file is picked up for processing, it is removed from this list. You can dynamically manipulate this array if you wish to change the flow of input being processed. The $* global variable can also be used instead of ARGV.

# note that only the -e option is used here
# can also use 'puts $*' instead of 'puts ARGV'
$ ruby -e 'puts ARGV' f[1-3].txt greeting.txt
f1.txt
f2.txt
f3.txt
greeting.txt

# ARGV continuously ejects the filename being processed
# f1.txt and f2.txt have 1 line each, table.txt has 3 lines
$ ruby -ne 'puts "#{ARGV.size}: " + ARGV * ","' f[12].txt table.txt
2: f2.txt,table.txt
1: table.txt
0: 
0: 
0: 
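Changing ARGV while input is being read can be sketched as a standalone script. The file names and contents below are hypothetical and are created in a temporary directory only to keep the sketch self-contained. ARGV.replace stands in for arguments given on the command line.

```ruby
require 'tmpdir'

# Hypothetical sample files, created only so that the sketch is self-contained.
dir = Dir.mktmpdir
paths = %w[a.txt b.txt c.txt].map { |name| File.join(dir, name) }
paths.each_with_index { |path, i| File.write(path, "line from file #{i + 1}\n") }

# ARGF takes its input files from ARGV, so filling ARGV before the
# first read decides what gets processed.
ARGV.replace(paths)

seen = []
while (line = ARGF.gets)
  seen << "#{File.basename(ARGF.filename)}: #{line.chomp}"
  # Remove b.txt from the pending list (a no-op once it is gone).
  ARGV.delete(paths[1])
end

puts seen
```

Here b.txt is dropped from ARGV while a.txt is being read, so only a.txt and c.txt should contribute to the output.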

ARGF (or the $< global variable) represents the filehandle of the file currently being processed, among those passed as arguments to the Ruby script. If ARGV is empty, ARGF will process stdin data if available. If you explicitly call the close method on ARGF, the current file is closed, processing moves on to the next file and the $. variable is reset. See ruby-doc: ARGF for documentation.

# logic to do something at the start of each input file
# can also use $<.eof instead of ARGF.eof
$ ruby -ne 'puts "--- #{ARGF.filename} ---" if $. == 1;
            print;
            ARGF.close if ARGF.eof' greeting.txt table.txt
--- greeting.txt ---
Hi there
Have a nice day
Good bye
--- table.txt ---
brown bread mat hair 42
blue cake mug shirt -7
yellow banana window shoes 3.14

# do something at the end of a file
# same as: tail -q -n1 greeting.txt table.txt
$ ruby -ne 'print if ARGF.eof' greeting.txt table.txt
Good bye
yellow banana window shoes 3.14

Here are some more examples.

# same as: awk 'FNR==2{print; nextfile}' greeting.txt table.txt
$ ruby -ne '(print; ARGF.close) if $.==2' greeting.txt table.txt
Have a nice day
blue cake mug shirt -7

# same as: head -q -n1 and awk 'FNR>1{nextfile} 1'
# can also use: ruby -pe 'ARGF.close'
$ ruby -pe 'ARGF.close if $.>=1' greeting.txt table.txt
Hi there
brown bread mat hair 42
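The effect of ARGF.close on the $. variable can also be checked in a standalone script. This is a minimal sketch: the two files are recreated in a temporary directory with the contents shown in the outputs above, and ARGV.replace stands in for file arguments given on the command line.

```ruby
require 'tmpdir'

# Recreate the sample files in a temporary directory (contents as shown above).
dir = Dir.mktmpdir
greeting = File.join(dir, 'greeting.txt')
table = File.join(dir, 'table.txt')
File.write(greeting, "Hi there\nHave a nice day\nGood bye\n")
File.write(table, "brown bread mat hair 42\nblue cake mug shirt -7\n" \
                  "yellow banana window shoes 3.14\n")

ARGV.replace([greeting, table])

records = []
while (line = ARGF.gets)
  # $. holds the line number of the line that was just read
  records << [File.basename(ARGF.filename), $., line.chomp]
  # closing at the end of each file restarts the numbering for the next file
  ARGF.close if ARGF.eof?
end

records.each { |name, num, text| puts "#{name}:#{num}:#{text}" }
```

Without the ARGF.close line, the numbering would continue across files instead of starting again from 1 for table.txt.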

You can use methods like read, readline, readlines, gets, etc. to explicitly get data from a specific filehandle. When called without a receiver, some of these methods (readlines, gets, etc.) use ARGF as the default source, as they are part of Kernel (see ruby-doc: Kernel for details).

# note that only the -e option is used
# same as: ruby -e 'puts ARGF.gets' greeting.txt
$ ruby -e 'puts gets' greeting.txt
Hi there

$ ruby -e 'puts gets, "---", ARGF.read' greeting.txt
Hi there
---
Have a nice day
Good bye

$ ruby -e 'puts readlines' greeting.txt
Hi there
Have a nice day
Good bye

$ ruby -e 'puts ARGF.readchar' greeting.txt
H
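The difference between the Kernel methods and an explicit filehandle can be sketched in script form as well. This assumes a temporary copy of greeting.txt with the contents shown above. The File.open call is an addition for illustration and is not part of the one-liners in this chapter.

```ruby
require 'tmpdir'

# Temporary copy of greeting.txt (contents as shown above).
dir = Dir.mktmpdir
greeting = File.join(dir, 'greeting.txt')
File.write(greeting, "Hi there\nHave a nice day\nGood bye\n")
ARGV.replace([greeting])

first = gets             # Kernel#gets reads from ARGF
second = ARGF.readline   # continues from where gets stopped
rest = readlines         # Kernel#readlines returns the remaining lines

# A separate filehandle has its own position, independent of ARGF.
independent = File.open(greeting) { |f| f.gets }

puts first, second, rest, '---', independent
```

Since gets, ARGF.readline and readlines all share the ARGF position, each call picks up where the previous one stopped, whereas the File.open handle starts from the first line again.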

STDIN

The STDIN filehandle is useful to distinguish between files passed as arguments and the stdin data. See the Comparing records section for more examples.

# with no file arguments, readline uses the stdin data
$ printf 'apple\nmango\n' | ruby -e 'puts readline'
apple

# with file arguments, readline doesn't work on stdin data
$ printf 'apple\nmango\n' | ruby -e 'puts readline' greeting.txt
Hi there

# use STDIN to work on stdin data irrespective of file arguments
$ printf 'apple\nmango\n' | ruby -e 'puts STDIN.readline' greeting.txt
apple
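Both sources can be read within the same script too. The sketch below is one way to try it without a shell pipeline: it starts a child Ruby process through Open3.capture2, passing a temporary copy of greeting.txt as the file argument and supplying the stdin data through the stdin_data option. The child one-liner is only an illustration.

```ruby
require 'open3'
require 'rbconfig'
require 'tmpdir'

# Temporary copy of greeting.txt (contents as shown earlier).
dir = Dir.mktmpdir
greeting = File.join(dir, 'greeting.txt')
File.write(greeting, "Hi there\nHave a nice day\nGood bye\n")

# Child one-liner: readline gets data from the file argument (via ARGF),
# STDIN.readline gets data from the piped input.
code = 'puts readline; puts STDIN.readline'

# Run a child Ruby process with the file as an argument and data on its stdin.
out, status = Open3.capture2(RbConfig.ruby, '-e', code, greeting,
                             stdin_data: "apple\nmango\n")
puts out
```

The first line of the output should come from greeting.txt and the second from the stdin data.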

Skipping remaining contents per file

You have seen examples where the exit method was used to avoid processing unnecessary records for the current and any other files yet to be processed. Sometimes, you need to skip only the contents of the current file and move on to the next file for processing. The close method seen previously comes in handy for such cases.

# avoids unnecessary processing compared to ruby -ne 'print if !(/\bba/..$<.eof)'
# same as: awk '/\<ba/{nextfile} 1' ip.txt table.txt
$ ruby -ne '/\bba/ ? ARGF.close : print' ip.txt table.txt
it is a warm and cozy day
listen to what I say
go play in the park
brown bread mat hair 42
blue cake mug shirt -7

# print filename if it contains 'I' anywhere in the file
# same as: grep -l 'I' f[1-3].txt greeting.txt
# same as: ruby -0777 -ne 'puts ARGF.filename if /I/'
# but slurping is dependent on size of input files and available memory
$ ruby -ne '(puts ARGF.filename; ARGF.close) if /I/' f[1-3].txt greeting.txt
f1.txt
f2.txt

# print filename if it contains a word ending with 'e'
# and 'bat' or 'mat' (irrespective of case) anywhere in the file
# same as: ruby -0777 -ne 'puts ARGF.filename if /(?=.*?e\b)(?i).*[bm]at/m'
$ ruby -ne '$m1=true if /e\b/; $m2=true if /[bm]at/i;
            (puts ARGF.filename; $m1=$m2=false; ARGF.close; next) if $m1 && $m2;
            $m1=$m2=false if ARGF.eof' f[1-3].txt greeting.txt
f3.txt
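To see that ARGF.close really avoids reading the rest of a file, the grep -l style example can be written as a script that also counts the lines read. The file names and contents here are hypothetical and are not the files from the example_files directory.

```ruby
require 'tmpdir'

# Hypothetical sample files for illustration.
samples = {
  'one.txt'   => "alpha\nI am here\nnever read\n",
  'two.txt'   => "no match\nstill none\n",
  'three.txt' => "I start\nskipped\n"
}
dir = Dir.mktmpdir
paths = samples.map do |name, text|
  File.join(dir, name).tap { |path| File.write(path, text) }
end
ARGV.replace(paths)

matched = []
lines_read = 0
while (line = ARGF.gets)
  lines_read += 1
  next unless line.match?(/I/)
  matched << File.basename(ARGF.filename)
  ARGF.close   # skip the remaining lines of the current file
end

puts matched
puts "lines read: #{lines_read}"
```

The samples have 7 lines in total, but only 5 should be read, since the lines after the first match in one.txt and three.txt are skipped.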

Summary

This chapter introduced features for processing multiple file inputs and constructing file level decisions. These will also show up in more examples in the coming chapters.

Exercises

The exercises directory has all the files used in this section.

1) Print the last field of the first two lines for the input files table.txt and ip.txt. Assume space as the field separator for these two files. To make the output more informative, print filenames and a separator as shown in the output below. Assume that the input files will have at least two lines.

# table.txt ip.txt are passed as file inputs
##### add your solution here
>table.txt<
42
-7
----------
>ip.txt<
World
you
----------

2) For the input files sample.txt, secrets.txt, ip.txt and table.txt, display only the names of files that contain 'at' or 'fun' in the third field. Assume space as the field separator.

##### add your solution here
secrets.txt
ip.txt
table.txt

3) Print the first two lines from the input files ip.txt, sample.txt and table.txt. Also, add a separator between the results as shown below (note that the separator isn't present at the end of the output). Assume that the input files will have at least two lines.

##### add your solution here
Hello World
How are you
---
Hello World

---
brown bread mat hair 42
blue cake mug shirt -7

4) Print only the second field of the third line, if any, from these input files: ip.txt, sample.txt and copyright.txt. Consider space as the field separator.

##### add your solution here
game
day
bla