wc

The wc command counts the number of lines, words and characters in the given inputs.

Line, word and byte counts

By default, the wc command reports the number of lines, words and bytes (in that order). The byte count includes the newline characters, so you can use that as a measure of file size as well. Here's an example:

$ cat greeting.txt
Hi there
Have a nice day

$ wc greeting.txt
 2  6 25 greeting.txt

Wondering why there are leading spaces in the output? They help in aligning results for multiple files (discussed later).
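As a quick check of the claim that the byte count can serve as a measure of file size, here is a minimal sketch that recreates greeting.txt in a scratch directory (the temporary location is an assumption for illustration) and compares wc -c with the size reported by GNU stat.

```shell
# recreate the sample file in a scratch directory (hypothetical location)
tmpdir=$(mktemp -d)
printf 'Hi there\nHave a nice day\n' > "$tmpdir/greeting.txt"

# byte count from wc, read via stdin so that only the number is printed
wc_bytes=$(wc -c < "$tmpdir/greeting.txt")

# file size in bytes as reported by GNU stat
stat_bytes=$(stat -c %s "$tmpdir/greeting.txt")

echo "wc: $wc_bytes, stat: $stat_bytes"
# wc: 25, stat: 25
```

The stat -c %s form assumes the GNU version of stat, other implementations use different options.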

Individual counts

Instead of the three default values, you can use options to get only the particular counts you are interested in. These options are:

-l for line count
-w for word count
-c for byte count

$ wc -l greeting.txt
2 greeting.txt

$ wc -w greeting.txt
6 greeting.txt

$ wc -c greeting.txt
25 greeting.txt

$ wc -wc greeting.txt
 6 25 greeting.txt

With stdin data, you'll get only the count value (unless you use - for stdin). This is useful for assigning the output to shell variables.

$ printf 'hello' | wc -c
5
$ printf 'hello' | wc -c -
5 -

$ lines=$(wc -l <greeting.txt)
$ echo "$lines"
2
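Once the count is stored in a variable, it can be used like any other number in the shell. The sketch below is only an illustration: the scratch file and the threshold of 5 lines are made up for this example.

```shell
# recreate the sample file in a scratch directory (hypothetical location)
tmpdir=$(mktemp -d)
printf 'Hi there\nHave a nice day\n' > "$tmpdir/greeting.txt"

# stdin redirection gives just the number, without the filename
lines=$(wc -l < "$tmpdir/greeting.txt")

# 5 is an arbitrary threshold chosen for illustration
if [ "$lines" -lt 5 ]; then
    msg="short file: $lines lines"
else
    msg="long file: $lines lines"
fi
echo "$msg"
# short file: 2 lines
```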

Multiple files

If you pass multiple files to the wc command, the count values will be displayed separately for each file. You'll also get a summary at the end, which sums the respective counts of all the input files.

$ wc greeting.txt nums.txt purchases.txt
 2  6 25 greeting.txt
 3  3 13 nums.txt
 8  9 57 purchases.txt
13 18 95 total
$ wc greeting.txt nums.txt purchases.txt | tail -n1
13 18 95 total

$ wc *[ck]*.csv
  9   9 101 marks.csv
  4   4  70 scores.csv
 13  13 171 total
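If only the combined number is needed, without the per-file rows or the total label, one alternative is to concatenate the files and pass them to wc via stdin. Here is a sketch using stand-in files (three.txt is a hypothetical file, not part of the examples above).

```shell
# stand-in input files in a scratch directory (hypothetical location)
tmpdir=$(mktemp -d)
printf 'Hi there\nHave a nice day\n' > "$tmpdir/greeting.txt"
printf '1\n2\n3\n' > "$tmpdir/three.txt"

# per-file counts followed by a 'total' row
wc -l "$tmpdir/greeting.txt" "$tmpdir/three.txt"

# combined count only
total=$(cat "$tmpdir/greeting.txt" "$tmpdir/three.txt" | wc -l)
echo "$total"
# 5
```

This matches the summary row for line and byte counts. Word counts can differ if a file doesn't end with a newline, since the last word of one file would merge with the first word of the next. The -L option isn't affected by missing newlines in the same way, but it reports a maximum rather than a sum in both cases.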

If you have NUL-separated filenames (for example, output from find -print0, grep -lZ, etc.), you can use the --files0-from option. This option accepts a file containing the NUL-separated data (use - for stdin).

$ printf 'greeting.txt\0nums.txt' | wc --files0-from=-
2 6 25 greeting.txt
3 3 13 nums.txt
5 9 38 total
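Here is a sketch of the find -print0 use case mentioned above. The scratch directory and filenames are assumptions for illustration, one of them has spaces in its name to show why the NUL separator helps.

```shell
# stand-in input files in a scratch directory (hypothetical location)
tmpdir=$(mktemp -d)
printf 'Hi there\nHave a nice day\n' > "$tmpdir/greeting.txt"
printf 'a b\n' > "$tmpdir/file with spaces.txt"

# find emits NUL separated paths, wc reads that list from stdin
report=$(find "$tmpdir" -type f -name '*.txt' -print0 | wc -l --files0-from=-)
echo "$report"

# the last row is the summary for all the files found
total=$(echo "$report" | tail -n1)
echo "$total"
```

The order of the per-file rows depends on the order in which find visits the files, so only the summary row is predictable here (3 lines in total).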

Character count

Use the -m option instead of -c if the input has multibyte characters.

# byte count
$ printf 'αλεπού' | wc -c
12

# character count
$ printf 'αλεπού' | wc -m
6

Note that the current locale will affect the behavior of the -m option. For example, each byte is counted as a character in the C locale:

$ printf 'αλεπού' | LC_ALL=C wc -m
12

Longest line length

You can use the -L option to report the length of the longest line in the input (excluding the newline character of a line).

$ echo 'apple' | wc -L
5
# last line not ending with newline won't be a problem
$ printf 'apple\nbanana' | wc -L
6

$ wc -L sample.txt
26 sample.txt
$ wc -L <sample.txt
26

If multiple files are passed, the summary on the last line will show the maximum length among the given inputs (not the sum).

$ wc -L greeting.txt nums.txt purchases.txt
15 greeting.txt
 4 nums.txt
14 purchases.txt
15 total
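One possible use of the -L option is to check whether any line exceeds a width limit. The following is a minimal sketch, the 20 column limit and the sample text are made up for illustration.

```shell
limit=20   # hypothetical maximum width

# length of the longest line in the sample input
longest=$(printf 'short line\nthis line is rather long indeed\n' | wc -L)

if [ "$longest" -gt "$limit" ]; then
    verdict="too wide: $longest > $limit"
else
    verdict="ok: $longest <= $limit"
fi
echo "$verdict"
# too wide: 31 > 20
```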

Corner cases

Line count is based on the number of newline characters. So, if the last line of the input doesn't end with the newline character, it won't be counted.

$ printf 'good\nmorning\n' | wc -l
2

$ printf 'good\nmorning' | wc -l
1

$ printf '\n\n\n' | wc -l
3
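If you need the last line to be counted even when it lacks the newline character, a different tool is needed. The sketch below swaps in grep -c with an empty pattern, which matches every line. This is one possible workaround and not a feature of wc.

```shell
# wc -l counts newline characters only
wc_count=$(printf 'good\nmorning' | wc -l)

# grep -c '' counts every line, newline terminated or not
grep_count=$(printf 'good\nmorning' | grep -c '')

echo "wc: $wc_count, grep: $grep_count"
# wc: 1, grep: 2
```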

Word count is based on whitespace separation. You'll have to pre-process the input if you do not want certain non-whitespace characters to influence the results.

$ echo 'apple ; banana ; cherry' | wc -w
5

# remove characters other than letters and whitespace
$ echo 'apple ; banana ; cherry' | tr -cd 'a-zA-Z[:space:]'
apple  banana  cherry
$ echo 'apple ; banana ; cherry' | tr -cd 'a-zA-Z[:space:]' | wc -w
3

# allow numbers as well
$ echo '2 : apples ;' | tr -cd '[:alnum:][:space:]' | wc -w
2
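The whitespace based splitting also means that the kind and amount of whitespace doesn't matter. Here is a small sketch with made-up input, where runs of spaces, tabs and newlines (including leading and trailing ones) don't change the result.

```shell
# runs of spaces, tabs and newlines act as a single separator
count=$(printf '  apple \t banana\n\n cherry  ' | wc -w)
echo "$count"
# 3
```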

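The -L option reports display width, so non-printable characters aren't counted and a tab advances to the next tab stop (a multiple of 8 columns). Multibyte characters such as the Greek letters shown below will each be counted as 1 (depending on the locale, they might be treated as non-printable instead).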

# tab characters can occupy up to 8 columns
$ printf '\t' | wc -L
8
$ printf 'a\tb' | wc -L
9

# example for non-printable character
$ printf 'a\34b' | wc -L
2

# multibyte characters are counted as 1 each in supported locales
$ printf 'αλεπού' | wc -L
6
# non-supported locales can cause them to be treated as non-printable
$ printf 'αλεπού' | LC_ALL=C wc -L
0
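The width contributed by a tab depends on the column where it starts. The following sketch uses made-up ASCII inputs to show a tab advancing to the next multiple of 8 columns.

```shell
# tab starts at column 1 and advances to 8, then 'b' makes it 9
w1=$(printf 'a\tb' | wc -L)

# tab starts at column 7 and advances to 8, then 'b' makes it 9
w2=$(printf 'abcdefg\tb' | wc -L)

# tab starts at column 8 and advances to 16, then 'b' makes it 17
w3=$(printf 'abcdefgh\tb' | wc -L)

echo "$w1 $w2 $w3"
# 9 9 17
```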

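The -m and -L options treat grapheme clusters differently. In the example below, -m counts the combining character separately, whereas -L doesn't add any width for it.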

$ printf 'cag̈e' | wc -m
5

$ printf 'cag̈e' | wc -L
4

Exercises

The exercises directory has all the files used in this section.

1) Save the number of lines in the greeting.txt input file to the lines shell variable.

$ lines=##### add your solution here
$ echo "$lines"
2

2) What do you think will be the output of the following command?

$ echo 'dragons:2 ; unicorns:10' | wc -w

3) Use appropriate options and arguments to get the output as shown below. Also, why is the line count showing as 2 instead of 3 for the stdin data?

$ printf 'apple\nbanana\ncherry' | ##### add your solution here
      2      25 greeting.txt
      2      19 -
      4      44 total

4) Use appropriate options and arguments to get the output shown below.

$ printf 'greeting.txt\0scores.csv' | ##### add your solution here
2 6 25 greeting.txt
4 4 70 scores.csv
6 10 95 total

5) What is the difference between wc -c and wc -m options? And which option would you use to get the longest line length?

6) Calculate the number of comma-separated words in the scores.csv file.

$ cat scores.csv
Name,Maths,Physics,Chemistry
Ith,100,100,100
Cy,97,98,95
Lin,78,83,80

##### add your solution here
16

CLI text processing with GNU Coreutils