File Properties
In this chapter, you'll learn how to view file details like line and word counts, file and disk sizes and file types, as well as how to extract parts of a file path. You'll also learn how to change file properties like timestamps and permissions.
The example_files directory has the scripts and sample input files used in this chapter.
wc
The wc command is typically used to count the number of lines, words and characters for the given inputs. Here are some basic examples:
# change to the 'example_files/text_files' directory
$ cat greeting.txt
Hi there
Have a nice day
# by default, wc gives the newline/word/byte count (in that order)
$ wc greeting.txt
2 6 25 greeting.txt
# get only the specified counts
$ wc -l greeting.txt
2 greeting.txt
$ wc -w greeting.txt
6 greeting.txt
$ wc -c greeting.txt
25 greeting.txt
$ wc -wc greeting.txt
6 25 greeting.txt
The filename won't be printed for stdin data. This is helpful when you want to save the result in a variable for scripting purposes.
$ wc -l <greeting.txt
2
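As a minimal sketch of the scripting use case mentioned above, the snippet below stores the line count in a shell variable. The temporary directory and the num_lines variable name are assumptions made for illustration; they aren't part of the example_files directory.

```shell
# create a throwaway copy of the sample input (illustrative setup)
tmpdir=$(mktemp -d)
printf 'Hi there\nHave a nice day\n' > "$tmpdir/greeting.txt"

# redirect the file to stdin so that only the number is captured
num_lines=$(wc -l < "$tmpdir/greeting.txt")
echo "greeting.txt has $num_lines lines"

# clean up
rm -r "$tmpdir"
```

Some wc implementations pad the number with spaces, so prefer numeric comparisons such as [ "$num_lines" -eq 2 ] over string comparisons.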
Word count is based on whitespace separation. You can pre-process the input to prevent certain non-whitespace characters from influencing the results. tr can be used to remove a particular set of characters (this command will be discussed in the Assorted Text Processing Tools chapter).
$ echo 'apple ; banana ; cherry' | wc -w
5
# remove characters other than letters and whitespace
# -d option is for deleting, -c option complements the given set
$ echo 'apple ; banana ; cherry' | tr -cd 'a-zA-Z[:space:]'
apple banana cherry
$ echo 'apple ; banana ; cherry' | tr -cd 'a-zA-Z[:space:]' | wc -w
3
If you pass multiple files to the wc command, the count values will be displayed separately for each file. You'll also get a summary at the end, which sums the respective count of all the input files.
$ wc greeting.txt fruits.txt sample.txt
2 6 25 greeting.txt
3 3 20 fruits.txt
15 38 183 sample.txt
20 47 228 total
You can use the -L option to report the length of the longest line in the input (excluding the newline character of a line). Note that -L won't count non-printable characters and tabs are converted to equivalent spaces. Multibyte characters and grapheme clusters will each be counted as 1 (depending on the locale, they might become non-printable too).
$ echo 'apple' | wc -L
5
$ echo 'αλεπού cag̈e' | wc -L
11
$ wc -L <greeting.txt
15
Use the -m option instead of -c if the input has multibyte characters.
$ printf 'αλεπού' | wc -c
12
$ printf 'αλεπού' | wc -m
6
du
The du command helps you estimate the size of files and directories.
By default, size is given in units of 1024 bytes. All directories and sub-directories are recursively reported, but files are ignored. You can use the -a option if files should also be reported. du is one of the commands that require an explicit option (-L in this case) if you want symbolic links to be followed.
# change to the 'scripts' directory and source the 'du.sh' script
$ source du.sh
# n * 1024 bytes
$ du
28 ./projects/scripts
48 ./projects
8 ./todos
7536 .
Use the -s option to show only the total size for each argument, without reporting the sub-directories separately. Add the -c option to also show the grand total at the end.
$ du -s projects report.log
48 projects
7476 report.log
$ du -sc projects report.log
48 projects
7476 report.log
7524 total
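Here's a small, self-contained sketch you can run anywhere to try the -s and -c options. The directory layout is made up for illustration, and the reported sizes will vary with the file system, so only the shape of the output is described.

```shell
# build a tiny directory tree to measure (illustrative names)
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/projects/scripts"
printf 'echo hi\n' > "$tmpdir/projects/scripts/hi.sh"
printf 'buy milk\n' > "$tmpdir/todo.txt"

# one line per argument, followed by a 'total' line
summary=$(cd "$tmpdir" && du -sc projects todo.txt)
echo "$summary"

# clean up
rm -r "$tmpdir"
```

The output has one line for projects (sub-directories are summed up, not listed), one line for todo.txt and a final line labelled total.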
Here are some examples to illustrate the size formatting options:
# number of bytes
$ du -b report.log
7654321 report.log
# n * 1024 bytes
$ du -k report.log
7476 report.log
# n * 1024 * 1024 bytes
$ du -m report.log
8 report.log
The -h option reports size in human readable format (using powers of 1024). Use the --si option to get results in powers of 1000 instead. If you use du -h, you can pipe the output to sort -h for sorting purposes.
$ du -sh *
48K projects
7.4M report.log
8.0K todos
$ du -sh * | sort -h
8.0K todos
48K projects
7.4M report.log
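To see why sort -h is needed here, the sketch below compares it with a plain numeric sort. The input imitates the du -sh output shown above instead of calling du, so that the result doesn't depend on your file system. It assumes a sort implementation that supports the -h option (GNU sort, for example).

```shell
# sample 'du -sh' style output (sizes copied from the example above)
sizes=$(printf '48K\tprojects\n7.4M\treport.log\n8.0K\ttodos\n')

# numeric sort ignores the K/M suffixes, so 7.4M is placed first
by_number=$(printf '%s\n' "$sizes" | sort -n)

# human-numeric sort takes the suffixes into account
by_size=$(printf '%s\n' "$sizes" | sort -h)

printf '%s\n' "$by_number" '---' "$by_size"
```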
df
The df command gives you the space usage of file systems. df without path arguments will give information about all the currently mounted file systems.
$ df .
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 98298500 58563816 34734748 63% /
Use the -h option for human readable sizes. The -B option allows you to scale sizes by the specified amount. Use --si for size in powers of 1000 instead of 1024.
$ df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 94G 56G 34G 63% /
Use the --output option to report only the specific fields of interest:
$ df -h --output=size,used,file / /media/learnbyexample/projs
Size Used File
94G 56G /
92G 35G /media/learnbyexample/projs
# 'awk' here excludes the first line and matches lines with usage >= 30%
# adding 0 converts values like '63%' to numbers for the comparison
$ df -h --output=pcent,fstype,target | awk 'NR>1 && $1+0>=30'
63% ext3 /
38% ext4 /media/learnbyexample/projs
51% ext4 /media/learnbyexample/backups
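For scripting, it often helps to turn the percentage into a plain number. The following is a minimal sketch assuming GNU df (for the --output option). The threshold of 90 and the variable name are arbitrary choices for illustration.

```shell
# usage percentage of the file system containing the current directory
# 'tail' drops the header line and 'tr' keeps only the digits
usage=$(df --output=pcent . | tail -n 1 | tr -cd '0-9')

# some pseudo file systems report '-' instead of a number
usage=${usage:-0}

if [ "$usage" -ge 90 ]; then
    echo "warning: file system is ${usage}% full"
else
    echo "ok: file system is ${usage}% full"
fi
```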
stat
The stat command is useful to get details like file type, size, inode, permissions, last accessed and modified timestamps, etc. You'll get all of these details by default. The -c and --printf options can be used to display only the required details in a particular format.
# change to the 'scripts' directory and source the 'stat.sh' script
$ source stat.sh
# %x gives the last accessed timestamp
$ stat -c '%x' ip.txt
2022-06-01 13:25:18.693823117 +0530
# %y gives the last modified timestamp
$ stat -c '%y' ip.txt
2022-05-24 14:39:41.285714934 +0530
# %s gives the file size in bytes
# \n is used to insert a newline
# %i gives the inode value
# same as: stat --printf='%s\n%i\n' ip.txt
$ stat -c $'%s\n%i' ip.txt
10
787224
# %N gives quoted filenames
# if the input is a link, the path it points to is also displayed
$ stat -c '%N' words.txt
'words.txt' -> '/usr/share/dict/words'
You can also pass multiple file arguments:
# %s gives the file size in bytes
# %n gives filenames
$ stat -c '%s %n' ip.txt hi.sh
10 ip.txt
21 hi.sh
The stat command should be preferred instead of parsing the ls -l output for file details. See mywiki.wooledge: avoid parsing output of ls and unix.stackexchange: why not parse ls? for explanation and other alternatives.
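For instance, here's a hedged sketch of how a script might fetch a file's size with stat rather than cutting fields out of a directory listing. It assumes GNU stat (for the -c option). The temporary file and the variable names are made up for illustration.

```shell
# create a 10 byte sample file (illustrative setup)
tmpdir=$(mktemp -d)
printf 'apple\nfig\n' > "$tmpdir/ip.txt"

# %s gives the size in bytes, no further parsing needed
size=$(stat -c '%s' "$tmpdir/ip.txt")

if [ "$size" -gt 0 ]; then
    echo "ip.txt has $size bytes"
else
    echo "ip.txt is empty"
fi

# clean up
rm -r "$tmpdir"
```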
touch
As mentioned earlier, the touch command helps you change the timestamps of files. You can do so based on the current timestamp, passing an argument, copying the value from another file and so on.
By default, touch updates both the access and modification timestamps to the current time. You can use the -a option to change only the access timestamp and -m to change only the modification timestamp.
# change to the 'scripts' directory and source the 'touch.sh' script
$ source touch.sh
# last access and modification timestamps
$ stat -c $'%x\n%y' fruits.txt
2017-07-19 17:06:01.523308599 +0530
2017-07-13 13:54:03.576055933 +0530
# update the access and modification values to the current time
$ touch fruits.txt
$ stat -c $'%x\n%y' fruits.txt
2024-05-14 13:01:25.921205889 +0530
2024-05-14 13:01:25.921205889 +0530
You can use the -r option to copy timestamp information from one file to another. The -d and -t options will allow you to specify timestamps directly as part of the command.
$ stat -c '%y' hi.sh
2022-06-14 13:00:46.170416890 +0530
# copy the modified timestamp from 'ip.txt' to 'hi.sh'
$ touch -m -r ip.txt hi.sh
$ stat -c '%y' hi.sh
2022-05-24 14:39:41.285714934 +0530
# pass timestamp as an argument
$ touch -m -d '2000-01-01 00:00:01' hi.sh
$ stat -c '%y' hi.sh
2000-01-01 00:00:01.000000000 +0530
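The sketch below combines the -d and -r options and checks the result from a script. It assumes GNU touch and GNU stat. The file names are hypothetical, TZ=UTC is set only to make the result independent of your time zone, and %Y (modification time as seconds since the Epoch) is used because plain numbers are easier to compare than formatted dates.

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/old.txt" "$tmpdir/new.txt"

# set the modification time explicitly, interpreting the date as UTC
TZ=UTC touch -m -d '2000-01-01 00:00:01' "$tmpdir/old.txt"

# copy the modification time from 'old.txt' to 'new.txt'
touch -m -r "$tmpdir/old.txt" "$tmpdir/new.txt"

old_mtime=$(stat -c '%Y' "$tmpdir/old.txt")
new_mtime=$(stat -c '%Y' "$tmpdir/new.txt")
echo "$old_mtime $new_mtime"

# clean up
rm -r "$tmpdir"
```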
As seen in the Managing Files and Directories chapter, touch creates a new file if the target file doesn't exist yet. You can use the -c option to prevent this behavior.
$ ls report.txt
ls: cannot access 'report.txt': No such file or directory
$ touch report.txt
$ ls report.txt
report.txt
$ touch -c xyz.txt
$ ls xyz.txt
ls: cannot access 'xyz.txt': No such file or directory
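In a script, you could verify the effect of the -c option with a file test instead of reading the ls error message. This is only a sketch; the cache.txt name and the created variable are made up for illustration.

```shell
tmpdir=$(mktemp -d)
target="$tmpdir/cache.txt"   # hypothetical file that doesn't exist yet

# -c leaves things as they are if the file is missing
touch -c "$target"

if [ -e "$target" ]; then
    created='yes'
else
    created='no'
fi
echo "file created: $created"

# clean up
rm -r "$tmpdir"
```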
file
The file command helps you identify text encoding (ASCII, UTF-8, etc.), whether the file is executable and so on.
Here are some examples to show how the file command behaves for different types:
# change to the 'scripts' directory and source the 'file.sh' script
$ source file.sh
$ ls -F
hi.sh* ip.txt moon.png sunrise.jpg
$ file ip.txt hi.sh
ip.txt: ASCII text
hi.sh: Bourne-Again shell script, ASCII text executable
$ printf 'αλεπού\n' | file -
/dev/stdin: UTF-8 Unicode text
$ printf 'hi\r\n' | file -
/dev/stdin: ASCII text, with CRLF line terminators
Here's an example for image files:
# output of 'sunrise.jpg' wrapped for illustration purposes
$ file sunrise.jpg moon.png
sunrise.jpg: JPEG image data, JFIF standard 1.01, resolution (DPI), density
96x96, segment length 16, baseline, precision 8, 76x76, components 3
moon.png: PNG image data, 76 x 76, 8-bit colormap, non-interlaced
You can use the -b option to avoid filenames in the output:
$ file -b ip.txt
ASCII text
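The -b output is convenient for making decisions in a script. Here's a minimal sketch that checks whether a file looks like text. The exact description can differ between versions of the file command, so the sketch matches on the word text instead of comparing the whole string. The file and variable names are assumptions for illustration.

```shell
tmpdir=$(mktemp -d)
printf 'Hi there\nHave a nice day\n' > "$tmpdir/greeting.txt"

# -b keeps only the description, which is easier to match against
kind=$(file -b "$tmpdir/greeting.txt")

case "$kind" in
    *text*) is_text='yes' ;;
    *)      is_text='no' ;;
esac
echo "$kind (text: $is_text)"

# clean up
rm -r "$tmpdir"
```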
Here's how you can find files of a particular type, images for example.
# assuming filenames do not contain ':' or newline characters
# awk here helps to print the first field of lines containing 'image data'
$ find -type f -exec file {} + | awk -F: '/\<image data\>/{print $1}'
./sunrise.jpg
./moon.png
See also the identify command which "describes the format and characteristics of one or more image files".
basename
By default, the basename command will remove the leading directory component from the given path argument. Any trailing slashes will be removed before determining the portion to be extracted.
$ basename /home/learnbyexample/example_files/scores.csv
scores.csv
# quote the arguments as needed
$ basename 'path with spaces/report.log'
report.log
You can use the -s option to remove a suffix from the filename. This is usually used to remove the file extension.
$ basename -s'.csv' /home/learnbyexample/example_files/scores.csv
scores
# suffix will be removed only once
$ basename -s'.txt' purchases.txt.txt
purchases.txt
The basename command requires -a or -s (which implies -a) to work with multiple arguments.
$ basename -a /backups/jan_2021.tar.gz /home/learnbyexample/report.log
jan_2021.tar.gz
report.log
# -a is implied when -s is used
$ basename -s'.txt' logs/purchases.txt logs/report.txt
purchases
report
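A common use of the -s option in scripts is to derive a related filename. The sketch below assumes a basename implementation that supports -s (GNU coreutils, for example). The _backup suffix and the variable names are just examples.

```shell
path='/home/learnbyexample/example_files/scores.csv'

# strip the directory and the extension in one go
name=$(basename -s .csv "$path")

# build a related filename (the '_backup' suffix is just an example)
backup="${name}_backup.csv"
echo "$backup"
```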
dirname
By default, the dirname command removes the trailing path component (after removing any trailing slashes).
$ dirname /home/learnbyexample/example_files/scores.csv
/home/learnbyexample/example_files
# one or more trailing slashes will not affect the output
$ dirname /home/learnbyexample/example_files/
/home/learnbyexample
# unlike basename, multiple arguments are accepted by default
$ dirname /home/learnbyexample/example_files/scores.csv ../report/backups/
/home/learnbyexample/example_files
../report
You can use shell features like command substitution to combine the effects of the basename and dirname commands.
# extract the second last path component
$ basename $(dirname /home/learnbyexample/example_files/scores.csv)
example_files
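If the path can contain spaces, both the variable and the command substitution should be double quoted. Here's a short sketch with a made-up path to illustrate:

```shell
path='/home/learnbyexample/my projects/report.log'

# without the double quotes, 'my projects' would be split into two arguments
parent=$(basename "$(dirname "$path")")
echo "$parent"
```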
chmod
You can use the chmod command to change permissions. Consider this example:
$ mkdir practice_chmod
$ cd practice_chmod
$ echo 'learnbyexample' > ip.txt
# this info can also be seen in the first column of the 'ls -l' output
$ stat -c '%A' ip.txt
-rw-rw-r--
In the above output, the 10 characters displayed in the last line are related to file type and permissions. The first character indicates the file type. The most common ones are shown below:
- '-' regular file
- 'd' directory
- 'l' symbolic link
The other nine characters represent three sets of file permissions for user (u), group (g) and others (o), in that order.
- user — file owner
- group — users having file access as part of a group
- others — everyone else
Only rwx file properties will be discussed in this section. For other types of properties, refer to the coreutils manual: File permissions.
Permission reference table for files:
Character | Meaning | Value |
---|---|---|
r | read | 4 |
w | write | 2 |
x | execute | 1 |
- | no permission | 0 |
Here's an example showing both rwx and numerical representations of a file's permissions:
$ stat -c '%A' ip.txt
-rw-rw-r--
# r(4) + w(2) + 0 = 6
# r(4) + 0 + 0 = 4
$ stat -c '%a' ip.txt
664
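The arithmetic in the comments above can be sketched as a small shell function. This is only an illustration of the r(4) + w(2) + x(1) rule, not how stat computes the value. The perm_to_octal name is hypothetical, and the function expects the nine permission characters without the leading file type character.

```shell
# convert nine rwx characters (for example, 'rw-rw-r--') to octal digits
# note: plain POSIX sh has no local variables, so these names are global
perm_to_octal() {
    perms=$1
    result=''
    # accept exactly nine characters
    case $perms in
        ?????????) ;;
        *) echo 'expected nine permission characters' >&2; return 1 ;;
    esac
    while [ -n "$perms" ]; do
        rest=${perms#???}            # everything after the first three characters
        triplet=${perms%"$rest"}     # the first three characters
        digit=0
        case $triplet in r??) digit=$((digit + 4)) ;; esac
        case $triplet in ?w?) digit=$((digit + 2)) ;; esac
        case $triplet in ??x) digit=$((digit + 1)) ;; esac
        result="$result$digit"
        perms=$rest
    done
    printf '%s\n' "$result"
}

perm_to_octal 'rw-rw-r--'    # prints 664
perm_to_octal 'rwxr-xr-x'    # prints 755
```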
Note that the permissions are not straightforward to understand for directories. If a directory only has the x permission, you can cd into it but you cannot read the contents (using ls for example). If a directory only has the r permission, you cannot cd into it, but you'll be able to read the contents (along with "cannot access" errors). For this reason, the rx permissions are almost always enabled/disabled together. The w permission allows you to add or remove contents, provided x is active.
Changing permissions for all three categories
You can provide numbers for ugo (in that order) to change permissions. This is best understood with examples:
$ printf '#!/bin/bash\n\necho hi\n' > hi.sh
$ stat -c '%a %A' hi.sh
664 -rw-rw-r--
# r(4) + w(2) + x(1) = 7
# r(4) + 0 + x(1) = 5
$ chmod 755 hi.sh
$ stat -c '%a %A' hi.sh
755 -rwxr-xr-x
Here's an example for a directory:
$ mkdir dot_files
$ stat -c '%a %A' dot_files
775 drwxrwxr-x
$ chmod 700 dot_files
$ stat -c '%a %A' dot_files
700 drwx------
You can also use mkdir -m instead of the mkdir+chmod combination seen above. The argument to the -m option accepts the same syntax as chmod (including the format that'll be discussed next).
$ mkdir -m 750 backups
$ stat -c '%a %A' backups
750 drwxr-x---
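As a quick sketch of both argument styles accepted by mkdir -m, the snippet below creates two directories with the same permissions, once with numbers and once with rwx characters. It assumes GNU stat for the check. The directory names are made up for illustration.

```shell
tmpdir=$(mktemp -d)

# numeric form
mkdir -m 750 "$tmpdir/backups"
# rwx form giving the same permissions
mkdir -m u=rwx,g=rx,o= "$tmpdir/archive"

numeric_mode=$(stat -c '%a' "$tmpdir/backups")
symbolic_mode=$(stat -c '%a' "$tmpdir/archive")
echo "$numeric_mode $symbolic_mode"

# clean up
rm -r "$tmpdir"
```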
You can use chmod -R to recursively change permissions. Use find+exec if you want to apply changes only for files filtered by some criteria.
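For example, a typical reason to filter is to give directories and regular files different permissions, which chmod -R with a single number cannot do. The following is a minimal sketch of that idea, assuming GNU stat for the check. The site directory and its contents are hypothetical.

```shell
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/site/css"
touch "$tmpdir/site/index.html" "$tmpdir/site/css/main.css"

# directories need 'x' to be entered, regular files usually don't
find "$tmpdir/site" -type d -exec chmod 755 {} +
find "$tmpdir/site" -type f -exec chmod 644 {} +

dir_mode=$(stat -c '%a' "$tmpdir/site/css")
file_mode=$(stat -c '%a' "$tmpdir/site/css/main.css")
echo "$dir_mode $file_mode"

# clean up
rm -r "$tmpdir"
```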
Changing permissions for specific categories
You can assign (=), add (+) or remove (-) permissions by using those symbols followed by one or more rwx permissions. When no ugo prefix is given, the result depends on the umask value:
$ umask
0002
A umask value of 0002 means:
- read and execute permissions without ugo prefix affect all the three categories
- write permissions without ugo prefix affect only the user and group categories
Here are some examples without ugo prefixes:
# remove execute permission for all three categories
$ chmod -x hi.sh
# add write permission only for 'user' and 'group'
$ chmod +w ip.txt
$ touch sample.txt
$ chmod 702 sample.txt
# give only read permission for all three categories
# write/execute permissions, if any, will be removed
$ chmod =r sample.txt
$ stat -c '%a %A' sample.txt
444 -r--r--r--
# give read and write permissions for 'user' and 'group'
# and read permission for 'others'
# execute permissions, if any, will be removed
$ chmod =rw hi.sh
Here are some examples with ugo prefixes. You can use a to refer to all the three categories. For example, a+w is the same as ugo+w.
# remove read and write permissions only for 'others'
$ chmod o-rw sample.txt
# add execute permission for 'group' and 'others'
$ chmod go+x hi.sh
# give read and write permissions for all three categories
# execute permissions, if any, will be removed
$ chmod a=rw hi.sh
You can use , to separate multiple permissions:
# remove execute permission for 'group' and 'others'
# remove write permission for 'others'
$ chmod go-x,o-w hi.sh
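To see the umask effect in isolation, here's a hedged sketch comparing +w with a+w. The umask is set inside a subshell so that your current shell settings aren't changed. It assumes GNU stat for the checks, and the file name is made up for illustration.

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/notes.txt"
chmod 444 "$tmpdir/notes.txt"

# without a prefix, the write bit masked by umask (others) is left alone
( umask 002; chmod +w "$tmpdir/notes.txt" )
masked=$(stat -c '%a' "$tmpdir/notes.txt")

# with the 'a' prefix, umask is not consulted
chmod a+w "$tmpdir/notes.txt"
unmasked=$(stat -c '%a' "$tmpdir/notes.txt")

echo "$masked $unmasked"

# clean up
rm -r "$tmpdir"
```

With a umask of 002, the first value is 664 and the second is 666.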
Further Reading
- Linux Permissions Primer
- unix.stackexchange: why chmod +w filename not giving write permission to other
Exercises
Use the example_files/text_files directory for input files used in the following exercises, unless otherwise specified.
Create a temporary directory for exercises that may require you to create some files and directories. You can delete such practice directories afterwards.
1) Save the number of lines in the greeting.txt input file to the lines shell variable.
# ???
$ echo "$lines"
2
2) What do you think will be the output of the following command?
$ echo 'dragons:2 ; unicorns:10' | wc -w
3) Use appropriate options and arguments to get the output shown below.
$ printf 'apple\nbanana\ncherry' | wc # ???
15 183 sample.txt
2 19 -
17 202 total
4) Go through the wc manual and use appropriate options and arguments to get the output shown below.
$ printf 'greeting.txt\0scores.csv' | wc # ???
2 6 25 greeting.txt
4 4 70 scores.csv
6 10 95 total
5) What is the difference between the wc -c and wc -m options? And which option would you use to get the longest line length?
6) Find filenames ending with .log and report their sizes in human readable format. Use the find+du combination for the first case and the ls command (with appropriate shell features) for the second case.
# change to the 'scripts' directory and source the 'du.sh' script
$ source du.sh
# ??? find+du
16K ./projects/errors.log
7.4M ./report.log
# ??? ls and shell features
16K projects/errors.log
7.4M report.log
7) Report sizes of files/directories in the current path in powers of 1000 without descending into sub-directories. Also, show a total at the end.
# change to the 'scripts' directory and source the 'du.sh' script
$ source du.sh
# ???
50k projects
7.7M report.log
8.2k todos
7.8M total
8) What does the du --apparent-size option do?
9) When will you use the df command instead of du? Which df command option will help you to report only the specific fields of interest?
10) Display the size of scores.csv and timings.txt files in the format shown below.
$ stat # ???
scores.csv: 70
timings.txt: 49
11) Which touch option will help you prevent file creation if the file doesn't exist yet?
12) Assume new_file.txt doesn't exist in the current working directory. What would be the output of the stat command shown below?
$ touch -t '202010052010.05' new_file.txt
$ stat -c '%y' new_file.txt
# ???
13) Is the following touch command valid? If so, what would be the output of the stat command that follows?
# change to the 'scripts' directory and source the 'touch.sh' script
$ source touch.sh
$ stat -c '%n: %y' fruits.txt
fruits.txt: 2017-07-13 13:54:03.576055933 +0530
$ touch -r fruits.txt f{1..3}.txt
$ stat -c '%n: %y' f*.txt
# ???
14) Use appropriate option(s) to get the output shown below.
$ printf 'αλεπού\n' | file -
/dev/stdin: UTF-8 Unicode text
$ printf 'αλεπού\n' | file # ???
UTF-8 Unicode text
15) Is the following command valid? If so, what would be the output?
$ basename -s.txt ~///test.txt///
# ???
16) Given the file path in the shell variable p, how'd you obtain the output shown below?
$ p='~/projects/square_tictactoe/python/game.py'
$ dirname # ???
~/projects/square_tictactoe
17) Explain what each of the characters means in the following stat command's output.
$ stat -c '%A' ../scripts/
drwxrwxr-x
18) What would be the output of the second stat command shown below?
$ touch new_file.txt
$ stat -c '%a %A' new_file.txt
664 -rw-rw-r--
$ chmod 546 new_file.txt
$ stat -c '%a %A' new_file.txt
# ???
19) How would you specify directory permissions using the mkdir command?
# instead of this
$ mkdir back_up
$ chmod 750 back_up
$ stat -c '%a %A' back_up
750 drwxr-x---
$ rm -r back_up
# do this
$ mkdir # ???
$ stat -c '%a %A' back_up
750 drwxr-x---
20) Change the file permission of book_list.txt to match the output of the second stat command shown below. Don't use the number 220, specify the changes in terms of rwx characters.
$ touch book_list.txt
$ stat -c '%a %A' book_list.txt
664 -rw-rw-r--
# ???
$ stat -c '%a %A' book_list.txt
220 --w--w----
21) Change the permissions of test_dir to match the output of the second stat command shown below. Don't use the number 757, specify the changes in terms of rwx characters.
$ mkdir test_dir
$ stat -c '%a %A' test_dir
775 drwxrwxr-x
# ???
$ stat -c '%a %A' test_dir
757 drwxr-xrwx