This chapter will discuss the
open() built-in function and introduce some of the built-in modules for file processing.
The open() built-in function is one of the ways to read and write files. The first argument to this function is the filename to be processed, given as a relative or absolute path to the location of the file. The rest are keyword arguments that you can configure. The output is a TextIOWrapper object (i.e. a filehandle), which you can use as an iterator. Here's an example:
# default mode is rt i.e. read text
>>> fh = open('ip.txt')
>>> fh
<_io.TextIOWrapper name='ip.txt' mode='r' encoding='UTF-8'>
>>> next(fh)
'hi there\n'
>>> next(fh)
'today is sunny\n'
>>> next(fh)
'have a nice day\n'
>>> next(fh)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

# check if the filehandle is active or closed
>>> fh.closed
False
# close the filehandle
>>> fh.close()
>>> fh.closed
True
The mode argument specifies what kind of processing you want. Only text mode, which is the default, will be covered in this chapter. You can combine options, for example, 'rb' opens a file for reading in binary mode. Here are the relevant details from the documentation:
'r'    open for reading (default)
'w'    open for writing, truncating the file first
'x'    open for exclusive creation, failing if the file already exists
'a'    open for writing, appending to the end of the file if it exists
'b'    binary mode
't'    text mode (default)
'+'    open for updating (reading and writing)
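To see how a few of these modes behave, here's a minimal sketch. The filename demo.txt and the temporary directory are assumptions made for illustration, so that no real files are touched.

```python
import os
import tempfile

# work in a throwaway directory so that no real files are touched
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.txt')  # hypothetical filename

    # 'w' creates the file (or truncates an existing one)
    with open(path, mode='w') as f:
        f.write('first\n')

    # 'a' keeps the existing content and adds to the end
    with open(path, mode='a') as f:
        f.write('second\n')

    # 'x' refuses to work on a file that already exists
    try:
        with open(path, mode='x'):
            pass
    except FileExistsError:
        exclusive_failed = True
    else:
        exclusive_failed = False

    # 'r+' opens an existing file for both reading and writing
    with open(path, mode='r+') as f:
        contents = f.read()

print(repr(contents))
print(exclusive_failed)
```

Here, contents holds both lines because 'a' appended to what 'w' had written, and the 'x' attempt fails since the file was already present.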
The encoding argument is meaningful only in text mode. You can check the default encoding for your environment using the locale module as shown below. See docs.python: standard encodings and docs.python HOWTOs: Unicode for more details.
>>> import locale
>>> locale.getpreferredencoding()
'UTF-8'
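The sketch below illustrates why the encoding matters: text written as UTF-8 reads back fine with the same encoding, but fails when decoded as ASCII. The sample string and the filename enc.txt are made up for this example.

```python
import os
import tempfile

text = 'naïve café\n'  # sample text with non-ASCII characters

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'enc.txt')  # hypothetical filename

    # being explicit avoids depending on the environment's default
    with open(path, mode='w', encoding='utf-8') as f:
        f.write(text)

    # reading with the same encoding gives back the original text
    with open(path, mode='r', encoding='utf-8') as f:
        same = f.read()

    # reading with an encoding that cannot represent the data fails
    try:
        with open(path, mode='r', encoding='ascii') as f:
            f.read()
    except UnicodeDecodeError:
        ascii_failed = True
    else:
        ascii_failed = False

print(same == text)
print(ascii_failed)
```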
Here's how Python handles line separation by default (see the documentation for more details):
On input, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller.
On output, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep.
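Here's a small sketch of the input side, using the newline argument of open(). The file is created in binary mode (not otherwise covered in this chapter) only to guarantee that it really contains '\r\n' line endings. The filename crlf.txt is an assumption for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'crlf.txt')  # hypothetical filename

    # write raw bytes so that the line endings are exactly '\r\n'
    with open(path, mode='wb') as f:
        f.write(b'hi there\r\nhave a nice day\r\n')

    # newline=None is the default: '\r\n' is translated to '\n'
    with open(path, mode='r', newline=None) as f:
        translated = f.readlines()

    # newline='' still splits on any line ending, but leaves it untranslated
    with open(path, mode='r', newline='') as f:
        untouched = f.readlines()

print(translated)
print(untouched)
```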
If the given filename doesn't exist, you'll get a FileNotFoundError exception.
>>> open('xyz.txt', mode='r', encoding='ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'xyz.txt'
Quoting from docs.python: Reading and Writing Files:
It is good practice to use the with keyword when dealing with file objects. The advantage is that the file is properly closed after its suite finishes, even if an exception is raised at some point. Using with is also much shorter than writing equivalent try-finally blocks:
# read_file.py
with open('ip.txt', mode='r', encoding='ascii') as f:
    for ip_line in f:
        op_line = ip_line.rstrip('\n').capitalize() + '.'
        print(op_line)
$ python3.9 read_file.py
Hi there.
Today is sunny.
Have a nice day.
See The Magic of Python Context Managers for more examples and details.
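For comparison, here's a rough sketch of what the with statement saves you from writing. The try-finally version is shown side by side with the with version. The sample file is created in a temporary directory so that the snippet is self-contained.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'ip.txt')
    with open(path, mode='w', encoding='ascii') as f:
        f.write('hi there\ntoday is sunny\n')

    # manual version: close() has to be placed in a finally clause
    # so that it runs even if the processing raises an exception
    fh = open(path, mode='r', encoding='ascii')
    try:
        manual_lines = [line.rstrip('\n') for line in fh]
    finally:
        fh.close()

    # with version: the file is closed automatically at the end of the block
    with open(path, mode='r', encoding='ascii') as f:
        with_lines = [line.rstrip('\n') for line in f]

    # both filehandles are closed at this point
    closed_flags = (fh.closed, f.closed)

print(manual_lines == with_lines)
print(closed_flags)
```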
The read() method gives you the entire remaining contents of the file as a single string. The readline() method gives the next line of text and readlines() gives all the remaining lines as a list of strings.
>>> open('ip.txt').read()
'hi there\ntoday is sunny\nhave a nice day\n'

>>> fh = open('ip.txt')
# readline() is similar to next()
# but returns empty string instead of StopIteration exception
>>> fh.readline()
'hi there\n'
>>> fh.readlines()
['today is sunny\n', 'have a nice day\n']
>>> fh.readline()
''
# write_file.py
with open('op.txt', mode='w', encoding='ascii') as f:
    f.write('this is a sample line of text\n')
    f.write('yet another line\n')
You can call the write() method on a filehandle to add contents to that file (provided the mode you have set supports writing). Unlike print(), the write() method doesn't automatically add newline characters.
$ python3.9 write_file.py
$ cat op.txt
this is a sample line of text
yet another line
$ file op.txt
op.txt: ASCII text
If the file already exists, the w mode will overwrite the contents (i.e. existing content will be lost).
You can also use the print() function for writing by passing the filehandle to the file argument. The fileinput module supports in-place editing and other features (see the In-place editing with fileinput section for examples).
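Here's a minimal sketch of writing with print(), next to a write() call for comparison. The output file lives in a temporary directory and the line contents are made up for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'op.txt')

    with open(path, mode='w', encoding='ascii') as f:
        # write() needs an explicit newline
        f.write('written with write()\n')
        # print() adds the newline for you
        print('written with print()', file=f)
        # other print() features such as sep work as usual
        print('a', 'b', 'c', sep=',', file=f)

    with open(path, mode='r', encoding='ascii') as f:
        result = f.read()

print(result, end='')
```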
This section gives introductory examples for some of the built-in modules that are handy for file processing. Quoting from docs.python: os:
This module provides a portable way of using operating system dependent functionality.
>>> import os

# current working directory
>>> os.getcwd()
'/home/learnbyexample/Python/programs/'

# value of an environment variable
>>> os.getenv('SHELL')
'/bin/bash'

# file size
>>> os.stat('ip.txt').st_size
40

# check if given path is a file
>>> os.path.isfile('ip.txt')
True
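The same kind of checks can be put into a script. The sketch below recreates a sample ip.txt in a temporary directory (an assumption, so that the snippet runs anywhere) and also uses os.path.join(), os.path.isdir(), os.listdir() and os.path.exists(), which weren't shown above.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # os.path.join() builds a path using the separator of the current OS
    path = os.path.join(tmp_dir, 'ip.txt')

    # newline='\n' keeps the byte count the same on every platform
    with open(path, mode='w', encoding='ascii', newline='\n') as f:
        f.write('hi there\ntoday is sunny\nhave a nice day\n')

    size = os.stat(path).st_size       # file size in bytes
    is_file = os.path.isfile(path)     # True for regular files
    is_dir = os.path.isdir(tmp_dir)    # True for directories
    names = os.listdir(tmp_dir)        # names of the entries in a directory

# the temporary directory and its contents are gone at this point
exists_after = os.path.exists(path)

print(size, is_file, is_dir, names, exists_after)
```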
Quoting from docs.python: glob:
The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell, although results are returned in arbitrary order. No tilde expansion is done, but *, ?, and character ranges expressed with [] will be correctly matched.
>>> import glob

# list of files (including directories) containing '_file' in their name
>>> glob.glob('*_file*')
['read_file.py', 'write_file.py']
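Here's a sketch that tries out the three kinds of wildcards mentioned in the quote. The empty files and the matching_names() helper are made up for this example. Results are sorted since glob returns them in arbitrary order.

```python
import glob
import os
import tempfile

def matching_names(directory, pattern):
    """Hypothetical helper: sorted names in 'directory' matching 'pattern'."""
    # glob.escape() protects any special characters in the directory name
    full_pattern = os.path.join(glob.escape(directory), pattern)
    return sorted(os.path.basename(p) for p in glob.glob(full_pattern))

with tempfile.TemporaryDirectory() as tmp_dir:
    # create a few empty files to match against
    for name in ('read_file.py', 'write_file.py', 'ip.txt', 'f1.txt', 'f2.txt'):
        open(os.path.join(tmp_dir, name), mode='w').close()

    # '*' matches any number of characters (including none)
    star = matching_names(tmp_dir, '*_file*')
    # '?' matches exactly one character
    question = matching_names(tmp_dir, 'f?.txt')
    # '[ir]' matches one character from the given set
    bracket = matching_names(tmp_dir, '[ir]*')

print(star)
print(question)
print(bracket)
```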
Quoting from docs.python: shutil:
The shutil module offers a number of high-level operations on files and collections of files. In particular, functions are provided which support file copying and removal.
>>> import shutil

>>> shutil.copy('ip.txt', 'ip_file.txt')
'ip_file.txt'
>>> glob.glob('*_file*')
['read_file.py', 'ip_file.txt', 'write_file.py']
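As a sketch of the removal side mentioned in the quote, the example below copies a file, moves the copy into a directory with shutil.move() and then deletes that directory with shutil.rmtree(). The backup directory name is an assumption for illustration, and everything happens inside a temporary directory.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    src = os.path.join(tmp_dir, 'ip.txt')
    with open(src, mode='w', encoding='ascii') as f:
        f.write('hi there\n')

    # copy() returns the path of the newly created file
    copy_path = shutil.copy(src, os.path.join(tmp_dir, 'ip_file.txt'))

    # move the copy into a directory (hypothetical name: backup)
    backup_dir = os.path.join(tmp_dir, 'backup')
    os.mkdir(backup_dir)
    moved_path = shutil.move(copy_path, backup_dir)
    after_move = sorted(os.listdir(tmp_dir))

    # rmtree() deletes a directory along with everything inside it
    shutil.rmtree(backup_dir)
    after_rmtree = sorted(os.listdir(tmp_dir))

print(os.path.basename(moved_path))
print(after_move)
print(after_rmtree)
```

Be careful with shutil.rmtree() in real programs, since the deleted content doesn't go to a trash folder.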
Quoting from docs.python: pathlib:
This module offers classes representing filesystem paths with semantics appropriate for different operating systems. Path classes are divided between pure paths, which provide purely computational operations without I/O, and concrete paths, which inherit from pure paths but also provide I/O operations.
>>> from pathlib import Path

# use 'rglob' instead of 'glob' if you want to match names recursively
>>> list(Path('programs').glob('*file.py'))
[PosixPath('programs/read_file.py'), PosixPath('programs/write_file.py')]
See pathlib module: taming the file system and stackoverflow: How can I iterate over files in a given directory? for more details and examples.
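Here's a short sketch contrasting glob() and rglob(), along with a few other Path conveniences. The directory layout (a nested folder inside programs) and the file contents are assumptions made up for this example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    programs = Path(tmp_dir) / 'programs'   # '/' joins path components
    (programs / 'nested').mkdir(parents=True)

    # write_text() opens, writes and closes the file in one step
    (programs / 'read_file.py').write_text('# placeholder\n')
    (programs / 'nested' / 'write_file.py').write_text('# placeholder\n')

    # glob() checks only the given directory
    top_level = sorted(p.name for p in programs.glob('*file.py'))
    # rglob() also descends into sub-directories
    recursive = sorted(p.name for p in programs.rglob('*file.py'))

    # handy attributes for picking a path apart
    script = programs / 'read_file.py'
    parts = (script.name, script.stem, script.suffix)
    text = script.read_text()

print(top_level)
print(recursive)
print(parts)
```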
There are specialized modules for structured data processing as well, for example:
Write a program that reads a known filename f1.txt which contains a single column of numbers in Python syntax. Your task is to display the sum of these numbers, which is 10485.14 for the given example.
$ cat f1.txt
8
53
3.14
84
73e2
100
2937
Read the documentation for glob.glob() and write a program to list all files ending with .txt in the current directory as well as sub-directories, recursively.