File Properties

In this chapter, you'll learn how to view file details like line and word counts, file and disk sizes, file types, extract parts of a file path, etc. You'll also learn how to change file properties like timestamps and permissions.

The example_files directory has the scripts and sample input files used in this chapter.

wc

The wc command is typically used to count the number of lines, words and characters for the given inputs. Here are some basic examples:

# change to the 'example_files/text_files' directory
$ cat greeting.txt
Hi there
Have a nice day

# by default, wc gives the newline/word/byte count (in that order)
$ wc greeting.txt
 2  6 25 greeting.txt

# get only the specified counts
$ wc -l greeting.txt
2 greeting.txt
$ wc -w greeting.txt
6 greeting.txt
$ wc -c greeting.txt
25 greeting.txt
$ wc -wc greeting.txt
 6 25 greeting.txt

The filename won't be printed when the input comes from stdin. This is helpful when you want to save the result in a variable for scripting purposes.

$ wc -l <greeting.txt
2
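
As a minimal sketch of the scripting use case, the snippet below stores a count in a variable. The temporary file and the `words` variable name are assumptions made for illustration only.

```shell
# create a throwaway sample file (contents mirror greeting.txt)
tmp=$(mktemp)
printf 'Hi there\nHave a nice day\n' > "$tmp"

# input redirection means wc sees only stdin, so no filename is printed
words=$(wc -w < "$tmp")
echo "word count: $words"

rm -- "$tmp"
```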

Word count is based on whitespace separation. You can pre-process the input to prevent certain non-whitespace characters from influencing the results. tr can be used to remove a particular set of characters (this command will be discussed in the Assorted Text Processing Tools chapter).

$ echo 'apple ; banana ; cherry' | wc -w
5

# remove characters other than alphabets and whitespace
# -d option is for deleting, -c option complements the given set
$ echo 'apple ; banana ; cherry' | tr -cd 'a-zA-Z[:space:]'
apple  banana  cherry
$ echo 'apple ; banana ; cherry' | tr -cd 'a-zA-Z[:space:]' | wc -w
3

If you pass multiple files to the wc command, the count values will be displayed separately for each file. You'll also get a summary at the end, which sums the respective count of all the input files.

$ wc greeting.txt fruits.txt sample.txt
  2   6  25 greeting.txt
  3   3  20 fruits.txt
 15  38 183 sample.txt
 20  47 228 total

You can use the -L option to report the length of the longest line in the input (excluding the newline character of a line). Note that -L won't count non-printable characters and tabs are converted to equivalent spaces. Multibyte characters and grapheme clusters will each be counted as 1 (depending on the locale, they might become non-printable too).

$ echo 'apple' | wc -L
5

$ echo 'αλεπού cag̈e' | wc -L
11

$ wc -L <greeting.txt
15

Use the -m option instead of -c if the input has multibyte characters.

$ printf 'αλεπού' | wc -c
12

$ printf 'αλεπού' | wc -m
6

du

The du command helps you estimate the size of files and directories.

By default, sizes are reported in units of 1024 bytes. All directories and sub-directories are reported recursively, but individual files are not listed. You can use the -a option if files should also be reported. du is one of the commands that require an explicit option (-L in this case) if you want symbolic links to be followed.

# change to the 'scripts' directory and source the 'du.sh' script
$ source du.sh

# n * 1024 bytes
$ du
28      ./projects/scripts
48      ./projects
8       ./todos
7536    .
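
To see the effect of the -a option, here's a small sketch using a throwaway directory (the directory layout and variable names are made up for illustration). Reported sizes depend on the file system, so only the number of reported entries is compared.

```shell
dir=$(mktemp -d)
mkdir -p "$dir/projects/scripts"
printf 'echo hi\n' > "$dir/projects/scripts/hi.sh"
printf 'buy milk\n' > "$dir/todo.txt"

# default: only directories are listed
dirs_only=$(cd "$dir" && du | wc -l)
# -a: files are listed as well
with_files=$(cd "$dir" && du -a | wc -l)

echo "entries without -a: $dirs_only"
echo "entries with -a: $with_files"

rm -r -- "$dir"
```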

Use the -s option to show only the total size of each argument, without listing sub-directories separately. Add the -c option to also show the grand total at the end.

$ du -s projects report.log
48      projects
7476    report.log

$ du -sc projects report.log
48      projects
7476    report.log
7524    total

Here are some examples to illustrate the size formatting options:

# number of bytes
$ du -b report.log
7654321 report.log

# n * 1024 bytes
$ du -k report.log
7476    report.log

# n * 1024 * 1024 bytes
$ du -m report.log
8       report.log
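
The -k and -m values above can be reproduced with shell arithmetic. This is only a sketch: it assumes the file system allocates space in 4096-byte blocks (an assumption, not something reported by du) and that du rounds up to the next whole unit.

```shell
bytes=7654321   # value reported by 'du -b'
block=4096      # assumed file system block size

# number of blocks needed, rounded up
blocks=$(( (bytes + block - 1) / block ))

# disk usage in units of 1024 bytes
kib=$(( blocks * block / 1024 ))

# disk usage in units of 1024 * 1024 bytes, rounded up
mib=$(( (kib + 1023) / 1024 ))

echo "$kib"   # 7476
echo "$mib"   # 8
```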

The -h option reports sizes in a human readable format (using powers of 1024). Use the --si option to get results in powers of 1000 instead. If you use du -h, you can pipe the output to sort -h for sorting purposes.

$ du -sh *
48K     projects
7.4M    report.log
8.0K    todos

$ du -sh * | sort -h
8.0K    todos
48K     projects
7.4M    report.log

df

The df command gives you the space usage of file systems. df without path arguments will give information about all the currently mounted file systems.

$ df .
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/sda1       98298500 58563816  34734748  63% /

Use the -h option for human readable sizes. The -B option allows you to scale sizes by the specified amount. Use --si for size in powers of 1000 instead of 1024.

$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        94G   56G   34G  63% /

Use the --output option to report only the specific fields of interest:

$ df -h --output=size,used,file / /media/learnbyexample/projs
 Size  Used File
  94G   56G /
  92G   35G /media/learnbyexample/projs

# 'awk' here excludes the first line and matches lines with first field >= 30
# adding 0 forces a numeric comparison, since values like '63%' are strings
$ df -h --output=pcent,fstype,target | awk 'NR>1 && $1+0>=30'
 63% ext3     /
 38% ext4     /media/learnbyexample/projs
 51% ext4     /media/learnbyexample/backups
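
A caution regarding numeric comparisons on the Use% column: a value like 63% isn't a number as far as awk is concerned, so comparing it directly with 30 results in a string comparison. The sketch below uses made-up input to show the difference, with $1+0 forcing a numeric comparison.

```shell
# sample 'pcent' values (made up for illustration)
sample=$(printf ' 5%%\n63%%\n')

# string comparison: '5%' sorts after '30', so both lines match
as_string=$(printf '%s\n' "$sample" | awk '$1 >= 30')

# numeric comparison: adding 0 converts '63%' to 63 and '5%' to 5
as_number=$(printf '%s\n' "$sample" | awk '$1+0 >= 30')

printf 'string comparison:\n%s\n' "$as_string"
printf 'numeric comparison:\n%s\n' "$as_number"
```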

stat

The stat command is useful to get details like file type, size, inode, permissions, last accessed and modified timestamps, etc. You'll get all of these details by default. The -c and --printf options can be used to display only the required details in a particular format.

# change to the 'scripts' directory and source the 'stat.sh' script
$ source stat.sh

# %x gives the last accessed timestamp
$ stat -c '%x' ip.txt
2022-06-01 13:25:18.693823117 +0530

# %y gives the last modified timestamp
$ stat -c '%y' ip.txt
2022-05-24 14:39:41.285714934 +0530

# %s gives the file size in bytes
# \n is used to insert a newline
# %i gives the inode value
# same as: stat --printf='%s\n%i\n' ip.txt
$ stat -c $'%s\n%i' ip.txt
10
787224

# %N gives quoted filenames
# if the input is a link, the path it points to is also displayed
$ stat -c '%N' words.txt
'words.txt' -> '/usr/share/dict/words'

You can also pass multiple file arguments:

# %s gives the file size in bytes
# %n gives filenames
$ stat -c '%s %n' ip.txt hi.sh
10 ip.txt
21 hi.sh

The stat command should be preferred over parsing the ls -l output for file details. See mywiki.wooledge: avoid parsing output of ls and unix.stackexchange: why not parse ls? for explanation and other alternatives.
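
Here's a small sketch of using stat in a script instead of parsing the ls -l output. The temporary file and variable names are assumptions for illustration, and the -c option shown is the GNU coreutils syntax used throughout this chapter.

```shell
f=$(mktemp)
printf 'hello\n' > "$f"

# fetch individual properties directly, no text parsing needed
size=$(stat -c '%s' "$f")
perm=$(stat -c '%a' "$f")

echo "size: $size bytes, permissions: $perm"

rm -- "$f"
```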

touch

As mentioned earlier, the touch command helps you change the timestamps of files. You can do so based on the current timestamp, passing an argument, copying the value from another file and so on.

By default, touch updates both the access and modification timestamps to the current time. You can use the -a option to change only the access timestamp and -m to change only the modification timestamp.

# change to the 'scripts' directory and source the 'touch.sh' script
$ source touch.sh

# last access and modification timestamps
$ stat -c $'%x\n%y' fruits.txt
2017-07-19 17:06:01.523308599 +0530
2017-07-13 13:54:03.576055933 +0530

# update the access and modification values to the current time
$ touch fruits.txt
$ stat -c $'%x\n%y' fruits.txt
2024-05-14 13:01:25.921205889 +0530
2024-05-14 13:01:25.921205889 +0530
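
The -a option can be sketched as follows. The dates and the temporary file are arbitrary choices for illustration; the -d option is used to set known values so that the change is easy to spot.

```shell
f=$(mktemp)

# set both timestamps to a known value
touch -d '2000-01-01 00:00:01' "$f"

# change only the access timestamp
touch -a -d '2010-01-01 00:00:01' "$f"

atime=$(stat -c '%x' "$f")
mtime=$(stat -c '%y' "$f")
echo "access:       $atime"
echo "modification: $mtime"

rm -- "$f"
```

Replacing -a with -m in the second touch command would change only the modification timestamp instead.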

You can use the -r option to copy timestamp information from one file to another. The -d and -t options will allow you to specify timestamps directly as part of the command.

$ stat -c '%y' hi.sh
2022-06-14 13:00:46.170416890 +0530

# copy the modified timestamp from 'ip.txt' to 'hi.sh'
$ touch -m -r ip.txt hi.sh
$ stat -c '%y' hi.sh
2022-05-24 14:39:41.285714934 +0530

# pass timestamp as an argument
$ touch -m -d '2000-01-01 00:00:01' hi.sh
$ stat -c '%y' hi.sh
2000-01-01 00:00:01.000000000 +0530

As seen in the Managing Files and Directories chapter, touch creates a new file if the target file doesn't exist yet. You can use the -c option to prevent this behavior.

$ ls report.txt
ls: cannot access 'report.txt': No such file or directory
$ touch report.txt
$ ls report.txt
report.txt

$ touch -c xyz.txt
$ ls xyz.txt
ls: cannot access 'xyz.txt': No such file or directory

file

The file command helps you identify the type of a file: text encoding (ASCII, UTF-8, etc.), whether the file is executable and so on.

Here are some examples to show how the file command behaves for different types:

# change to the 'scripts' directory and source the 'file.sh' script
$ source file.sh
$ ls -F
hi.sh*  ip.txt  moon.png  sunrise.jpg

$ file ip.txt hi.sh
ip.txt: ASCII text
hi.sh: Bourne-Again shell script, ASCII text executable

$ printf 'αλεπού\n' | file -
/dev/stdin: UTF-8 Unicode text

$ printf 'hi\r\n' | file -
/dev/stdin: ASCII text, with CRLF line terminators

Here's an example for image files:

# output of 'sunrise.jpg' wrapped for illustration purposes
$ file sunrise.jpg moon.png
sunrise.jpg: JPEG image data, JFIF standard 1.01, resolution (DPI), density
    96x96, segment length 16, baseline, precision 8, 76x76, components 3
moon.png:    PNG image data, 76 x 76, 8-bit colormap, non-interlaced

You can use the -b option to avoid filenames in the output:

$ file -b ip.txt
ASCII text

Here's how you can find files of a particular type, images for example.

# assuming filenames do not contain ':' or newline characters
# awk here helps to print the first field of lines containing 'image data'
$ find -type f -exec file {} + | awk -F: '/\<image data\>/{print $1}'
./sunrise.jpg
./moon.png

See also the identify command which "describes the format and characteristics of one or more image files".

basename

By default, the basename command will remove the leading directory components from the given path argument. Any trailing slashes will be removed before determining the portion to be extracted.

$ basename /home/learnbyexample/example_files/scores.csv
scores.csv

# quote the arguments as needed
$ basename 'path with spaces/report.log'
report.log

You can use the -s option to remove a suffix from the filename. This is usually used to remove the file extension.

$ basename -s'.csv' /home/learnbyexample/example_files/scores.csv
scores

# suffix will be removed only once
$ basename -s'.txt' purchases.txt.txt
purchases.txt

The basename command requires -a or -s (which implies -a) to work with multiple arguments.

$ basename -a /backups/jan_2021.tar.gz /home/learnbyexample/report.log
jan_2021.tar.gz
report.log

# -a is implied when -s is used
$ basename -s'.txt' logs/purchases.txt logs/report.txt
purchases
report

dirname

By default, the dirname command removes the trailing path component (after removing any trailing slashes).

$ dirname /home/learnbyexample/example_files/scores.csv
/home/learnbyexample/example_files

# one or more trailing slashes will not affect the output
$ dirname /home/learnbyexample/example_files/
/home/learnbyexample

# unlike basename, multiple arguments are accepted by default
$ dirname /home/learnbyexample/example_files/scores.csv ../report/backups/
/home/learnbyexample/example_files
../report

You can use shell features like command substitution to combine the effects of the basename and dirname commands.

# extract the second last path component
$ basename "$(dirname /home/learnbyexample/example_files/scores.csv)"
example_files
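
Putting the two commands together, here's a sketch that splits a path into its parts. The variable names are arbitrary.

```shell
p='/home/learnbyexample/example_files/scores.csv'

dir=$(dirname "$p")                     # everything except the last component
file=$(basename "$p")                   # last component
stem=$(basename -s .csv "$p")           # last component without the suffix
parent=$(basename "$(dirname "$p")")    # second last component

printf '%s\n' "$dir" "$file" "$stem" "$parent"
```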

chmod

You can use the chmod command to change permissions. Consider this example:

$ mkdir practice_chmod
$ cd practice_chmod
$ echo 'learnbyexample' > ip.txt

# this info can also be seen in the first column of the 'ls -l' output
$ stat -c '%A' ip.txt
-rw-rw-r--

In the above output, the 10 characters displayed in the last line are related to file type and permissions. The first character indicates the file type. The most common ones are shown below:

  • - regular file
  • d directory
  • l symbolic link

The other nine characters represent three sets of file permissions for user (u), group (g) and others (o), in that order.

  • user — file owner
  • group — users having file access as part of a group
  • others — everyone else

Only rwx file properties will be discussed in this section. For other types of properties, refer to the coreutils manual: File permissions.

Permission reference table for files:

Character   Meaning         Value
r           read            4
w           write           2
x           execute         1
-           no permission   0

Here's an example showing both rwx and numerical representations of a file's permissions:

$ stat -c '%A' ip.txt
-rw-rw-r--

# r(4) + w(2) + 0 = 6
# r(4) + 0 + 0 = 4
$ stat -c '%a' ip.txt
664
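
The arithmetic can be sketched as a couple of shell functions. The function names triplet_value and perm_to_octal are made up for this illustration. The input is the nine permission characters without the leading file type character, and only the rwx characters are handled.

```shell
# value of one rwx triplet, for example 'rw-' gives 6
triplet_value() {
    n=0
    case $1 in r??) n=$((n + 4)) ;; esac
    case $1 in ?w?) n=$((n + 2)) ;; esac
    case $1 in ??x) n=$((n + 1)) ;; esac
    echo "$n"
}

# convert nine permission characters to the numerical form
perm_to_octal() {
    u=$(printf '%s' "$1" | cut -c1-3)
    g=$(printf '%s' "$1" | cut -c4-6)
    o=$(printf '%s' "$1" | cut -c7-9)
    echo "$(triplet_value "$u")$(triplet_value "$g")$(triplet_value "$o")"
}

perm_to_octal 'rw-rw-r--'   # 664
perm_to_octal 'rwxr-xr-x'   # 755
```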

Note that the permissions are not straightforward to understand for directories. If a directory only has the x permission, you can cd into it but you cannot read the contents (using ls for example). If a directory only has the r permission, you cannot cd into it, but you'll be able to read the contents (along with "cannot access" errors). For this reason, the rx permissions are almost always enabled/disabled together. The w permission allows you to add or remove contents, provided x is active.

Changing permissions for all three categories

You can provide numbers for ugo (in that order) to change permissions. This is best understood with examples:

$ printf '#!/bin/bash\n\necho hi\n' > hi.sh
$ stat -c '%a %A' hi.sh
664 -rw-rw-r--

# r(4) + w(2) + x(1) = 7
# r(4) + 0 + x(1) = 5
$ chmod 755 hi.sh
$ stat -c '%a %A' hi.sh
755 -rwxr-xr-x

Here's an example for a directory:

$ mkdir dot_files
$ stat -c '%a %A' dot_files
775 drwxrwxr-x

$ chmod 700 dot_files
$ stat -c '%a %A' dot_files
700 drwx------

You can also use mkdir -m instead of the mkdir+chmod combination seen above. The argument to the -m option accepts the same syntax as chmod (including the format that'll be discussed next).

$ mkdir -m 750 backups
$ stat -c '%a %A' backups
750 drwxr-x---

You can use chmod -R to recursively change permissions. Use find+exec if you want to apply changes only for files filtered by some criteria.
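
As a sketch of the find+exec approach, the snippet below applies different permissions to directories and regular files. The directory layout and the permission values are arbitrary choices for illustration.

```shell
dir=$(mktemp -d)
mkdir -p "$dir/docs/notes"
touch "$dir/docs/a.txt" "$dir/docs/notes/b.txt"

# directories need the x permission to be usable
find "$dir/docs" -type d -exec chmod 750 {} +
# regular files don't need to be executable
find "$dir/docs" -type f -exec chmod 640 {} +

dperm=$(stat -c '%a' "$dir/docs/notes")
fperm=$(stat -c '%a' "$dir/docs/notes/b.txt")
echo "directory: $dperm, file: $fperm"

rm -r -- "$dir"
```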

Changing permissions for specific categories

You can assign (=), add (+) or remove (-) permissions by using those symbols followed by one or more rwx permissions. When no ugo prefix is given, the categories affected depend on the umask value:

$ umask
0002

A umask value of 0002 means:

  • read and execute permissions without a ugo prefix affect all three categories
  • write permission without a ugo prefix affects only the user and group categories

Here are some examples without ugo prefixes:

# remove execute permission for all three categories
$ chmod -x hi.sh

# add write permission only for 'user' and 'group'
$ chmod +w ip.txt

$ touch sample.txt
$ chmod 702 sample.txt
# give only read permission for all three categories
# write/execute permissions, if any, will be removed
$ chmod =r sample.txt
$ stat -c '%a %A' sample.txt
444 -r--r--r--

# give read and write permissions for 'user' and 'group'
# and read permission for 'others'
# execute permissions, if any, will be removed
$ chmod =rw hi.sh
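
To see how the umask value affects changes made without a ugo prefix, here's a sketch that repeats the same chmod +w under two different umask values. Command substitution runs in a subshell, which keeps the umask changes from leaking into the current shell. The temporary file is an assumption for illustration.

```shell
f=$(mktemp)

# umask 0002: write permission is added for 'user' and 'group'
chmod 444 "$f"
with_002=$(umask 0002; chmod +w "$f"; stat -c '%a' "$f")

# umask 0022: write permission is added only for 'user'
chmod 444 "$f"
with_022=$(umask 0022; chmod +w "$f"; stat -c '%a' "$f")

echo "umask 0002: $with_002"   # 664
echo "umask 0022: $with_022"   # 644

rm -- "$f"
```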

Here are some examples with ugo prefixes. You can use a to refer to all the three categories. For example, a+w is same as ugo+w.

# remove read and write permissions only for 'others'
$ chmod o-rw sample.txt

# add execute permission for 'group' and 'others'
$ chmod go+x hi.sh

# give read and write permissions for all three categories
# execute permissions, if any, will be removed
$ chmod a=rw hi.sh

You can use , to separate multiple permissions:

# remove execute permission for 'group' and 'others'
# remove write permission for 'others'
$ chmod go-x,o-w hi.sh

Exercises

Use the example_files/text_files directory for input files used in the following exercises, unless otherwise specified.

Create a temporary directory for exercises that may require you to create some files and directories. You can delete such practice directories afterwards.

1) Save the number of lines in the greeting.txt input file to the lines shell variable.

# ???
$ echo "$lines"
2

2) What do you think will be the output of the following command?

$ echo 'dragons:2 ; unicorns:10' | wc -w

3) Use appropriate options and arguments to get the output shown below.

$ printf 'apple\nbanana\ncherry' | wc # ???
     15     183 sample.txt
      2      19 -
     17     202 total

4) Go through the wc manual and use appropriate options and arguments to get the output shown below.

$ printf 'greeting.txt\0scores.csv' | wc # ???
2 6 25 greeting.txt
4 4 70 scores.csv
6 10 95 total

5) What is the difference between the wc -c and wc -m options? And which option would you use to get the longest line length?

6) Find filenames ending with .log and report their sizes in human readable format. Use the find+du combination for the first case and the ls command (with appropriate shell features) for the second case.

# change to the 'scripts' directory and source the 'du.sh' script
$ source du.sh

# ??? find+du
16K     ./projects/errors.log
7.4M    ./report.log

# ??? ls and shell features
 16K projects/errors.log
7.4M report.log

7) Report sizes of files/directories in the current path in powers of 1000 without descending into sub-directories. Also, show a total at the end.

# change to the 'scripts' directory and source the 'du.sh' script
$ source du.sh

# ???
50k     projects
7.7M    report.log
8.2k    todos
7.8M    total

8) What does the du --apparent-size option do?

9) When will you use the df command instead of du? Which df command option will help you to report only the specific fields of interest?

10) Display the size of scores.csv and timings.txt files in the format shown below.

$ stat # ???
scores.csv: 70
timings.txt: 49

11) Which touch option will help you prevent file creation if it doesn't exist yet?

12) Assume new_file.txt doesn't exist in the current working directory. What would be the output of the stat command shown below?

$ touch -t '202010052010.05' new_file.txt
$ stat -c '%y' new_file.txt
# ???

13) Is the following touch command valid? If so, what would be the output of the stat command that follows?

# change to the 'scripts' directory and source the 'touch.sh' script
$ source touch.sh

$ stat -c '%n: %y' fruits.txt
fruits.txt: 2017-07-13 13:54:03.576055933 +0530

$ touch -r fruits.txt f{1..3}.txt
$ stat -c '%n: %y' f*.txt
# ???

14) Use appropriate option(s) to get the output shown below.

$ printf 'αλεπού\n' | file -
/dev/stdin: UTF-8 Unicode text

$ printf 'αλεπού\n' | file # ???
UTF-8 Unicode text

15) Is the following command valid? If so, what would be the output?

$ basename -s.txt ~///test.txt///
# ???

16) Given the file path in the shell variable p, how'd you obtain the output shown below?

$ p='~/projects/square_tictactoe/python/game.py'
$ dirname # ???
~/projects/square_tictactoe

17) Explain what each of the characters mean in the following stat command's output.

$ stat -c '%A' ../scripts/
drwxrwxr-x

18) What would be the output of the second stat command shown below?

$ touch new_file.txt
$ stat -c '%a %A' new_file.txt
664 -rw-rw-r--

$ chmod 546 new_file.txt
$ stat -c '%a %A' new_file.txt
# ???

19) How would you specify directory permissions using the mkdir command?

# instead of this
$ mkdir back_up
$ chmod 750 back_up
$ stat -c '%a %A' back_up
750 drwxr-x---
$ rm -r back_up

# do this
$ mkdir # ???
$ stat -c '%a %A' back_up
750 drwxr-x---

20) Change the file permission of book_list.txt to match the output of the second stat command shown below. Don't use the number 220, specify the changes in terms of rwx characters.

$ touch book_list.txt
$ stat -c '%a %A' book_list.txt
664 -rw-rw-r--

# ???
$ stat -c '%a %A' book_list.txt
220 --w--w----

21) Change the permissions of test_dir to match the output of the second stat command shown below. Don't use the number 757, specify the changes in terms of rwx characters.

$ mkdir test_dir
$ stat -c '%a %A' test_dir
775 drwxrwxr-x

# ???
$ stat -c '%a %A' test_dir
757 drwxr-xrwx