Handling text files in Python - an easy guide for beginners
All you need to know to efficiently handle text files using Python
When working on a large-scale web application or a project which involves working with a large amount of data, it is not practical to keep all the data in variables, as they live in memory and are lost when the program ends. We need something much more reliable and structured. This is when data files come into play. They provide an easier way to store, access, and manipulate data.
In Python, there are two types of data files:
- Text files
- Binary files
Text files are regular data files that we all are familiar with. We can open these files in a text editor and read the content inside.
Binary files, on the other hand, encode data in a specific format that can only be understood by a computer or a machine. Most of the files on our computers are stored in binary format.
In this article, I will cover the basic syntax for opening and closing files, along with the other tools Python provides to efficiently handle text files.
Opening a file
The most commonly used function while handling data files in Python is open(). It is used to open a file in one of the following modes -
- r (read mode) - to read the contents of a file
- w (write mode) - to write to a file. Note that this mode overwrites the previously stored data.
- a (append mode) - to append to an existing file. This mode writes data at the end of the file and no previously stored data is lost.
- x (create mode) - to create a new file. This mode raises a FileExistsError if the file already exists.
- r+ / +r (read and write mode) - to both read and write data to the same file.
- a+ / +a (read and append mode) - to both read and append data to the same file.
Syntax for opening a file
fileObject = open(filename, mode)
If you don't specify a mode, Python opens the file in 'r' mode as default.
So, f = open("file.txt") is the same as f = open("file.txt", 'r')
Here, 'f' is the file object through which we access the contents of the opened file.
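To make the difference between the 'w', 'a', and default 'r' modes concrete, here is a minimal sketch. It assumes a throwaway file inside a temporary directory (the name modes_demo.txt is only an illustration), so no real files are touched.

```python
import os
import tempfile

# Work inside a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "modes_demo.txt")  # hypothetical file name

    # 'w' creates the file (or empties an existing one) before writing.
    with open(path, "w") as f:
        f.write("first\n")

    # 'a' keeps the existing data and writes at the end.
    with open(path, "a") as f:
        f.write("second\n")

    # No mode given, so the file is opened in 'r' mode.
    with open(path) as f:
        after_append = f.read()

    # Opening in 'w' again discards everything written so far.
    with open(path, "w") as f:
        f.write("third\n")

    with open(path, "r") as f:
        after_overwrite = f.read()

print(repr(after_append))     # 'first\nsecond\n'
print(repr(after_overwrite))  # 'third\n'
```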
Opening files using 'with' clause
Another way of opening files in Python is by using the 'with' clause, which is often considered the safer and cleaner way of opening files.
The advantage of using the 'with' clause is that the opened file is closed automatically when the block ends, even if an error occurs inside it, so you don't need to close it manually.
Syntax
with open(filename, mode) as fileObject:
Example
with open("file.txt", 'r') as myFile:
    for text in myFile:
        print(text)
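The automatic closing can be checked directly. The sketch below assumes a temporary file with a made-up name and deliberately raises an error inside the 'with' block to show that the file is still closed afterwards.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "with_demo.txt")  # hypothetical file name
    with open(path, "w") as f:
        f.write("Hello world!\n")

    try:
        with open(path, "r") as my_file:
            first_line = my_file.readline()
            # Simulate something going wrong while the file is open.
            raise ValueError("simulated failure while processing the file")
    except ValueError:
        pass

    # The 'with' block has exited, so the file is closed despite the error.
    closed_after_error = my_file.closed

print(closed_after_error)  # True
```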
File Object Attributes
There are some file object attributes in Python that are used to access some more information about the opened file -
- <file.closed> - returns True if the file is closed and False otherwise
- <file.name> - returns the name of the opened file
- <file.mode> - returns the mode in which the file was opened
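A quick way to see these attributes in action is to inspect them inside and outside a 'with' block. This is only a sketch; the file name attrs_demo.txt and the temporary directory are assumptions made for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "attrs_demo.txt")  # hypothetical file name

    with open(path, "w") as my_file:
        name_matches = (my_file.name == path)  # name is the path passed to open()
        mode_inside = my_file.mode             # the mode string used to open the file
        closed_inside = my_file.closed         # still open inside the block

    closed_outside = my_file.closed            # closed once the block ends

print(name_matches, mode_inside, closed_inside, closed_outside)
# True w False True
```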
Reading a file
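To read a file in Python, we first need to open the file in r, r+, or a+ mode. Note that a+ places the cursor at the end of the file, so we need to move it back to the beginning before reading.
with open("file.txt", 'r') as myFile:
    # more code goes here...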
There are three ways to read the contents of a file -
1. The read() method
This method is used to read a specific number of characters from the file (or bytes, if the file is opened in binary mode).
Syntax
fileObject.read(n) # 'n' is the number of characters to read
If 'n' is not specified in the syntax or a negative number is specified, it reads the entire content of the file.
Let's understand this method with an example -
>>> # reading 8 characters from the file
>>> with open("file.txt", 'r') as myFile:
...     myFile.read(8)
...
'Hello wo'
>>> # reading all the content from the file
>>> with open("file.txt", 'r') as myFile:
...     myFile.read()
...
'Hello world! This is content of the file'
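One detail worth seeing in code is that consecutive read(n) calls continue from where the previous call stopped. The sketch below assumes a temporary file holding the same sentence as the example above.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "read_demo.txt")  # hypothetical file name
    with open(path, "w") as f:
        f.write("Hello world! This is content of the file")

    with open(path, "r") as my_file:
        first = my_file.read(5)    # the first 5 characters
        second = my_file.read(7)   # the next 7 characters
        rest = my_file.read()      # everything that is left
        at_end = my_file.read()    # nothing left, so an empty string

print(repr(first))   # 'Hello'
print(repr(second))  # ' world!'
print(repr(rest))    # ' This is content of the file'
print(repr(at_end))  # ''
```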
2. The readline() method
This method reads a single line from the file, starting at the current position. If a number of characters is specified, it reads at most that many characters, and never more than the rest of the current line.
Each line ends with a newline character '\n', which is counted as a single character.
Syntax
fileObject.readline(n) # 'n' is the maximum number of characters to read
If 'n' is not specified in the syntax or a negative number is specified, it reads the whole line. On a freshly opened file, that is the first line.
Example
>>> # reading 10 characters from the first line
>>> with open("file.txt", 'r') as myFile:
...     myFile.readline(10)
...
'Hello worl'
>>> # reading the entire first line
>>> with open("file.txt", 'r') as myFile:
...     myFile.readline()
...
'Hello world! This is the first line of the file\n'
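Calling readline() repeatedly walks through the file one line at a time, and it returns an empty string once the end of the file is reached. Here is a small sketch of that loop, assuming a three-line temporary file whose last line has no trailing newline.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "readline_demo.txt")  # hypothetical file name
    with open(path, "w") as f:
        f.write("first line\nsecond line\nthird line")

    lines = []
    with open(path, "r") as my_file:
        while True:
            line = my_file.readline()
            if line == "":  # an empty string signals the end of the file
                break
            lines.append(line)

print(lines)  # ['first line\n', 'second line\n', 'third line']
```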
3. The readlines() method
This method reads and returns all the lines from a text file, as members of a list. It is usually called without an argument, although it accepts an optional size hint that limits how much is read.
Syntax
fileObject.readlines()
Example
with open("file.txt", 'r') as myFile:
    data = myFile.readlines()
    print(data)
['Hello world!\n', 'Hello world!\n', 'Hello world!\n', 'Hello world!\n']
As we can see, each line in the file is returned as a member of the list with a newline character '\n' at the end.
If we want to return each line as a separate list, without the trailing newline character, we can use the splitlines() string method.
with open("file.txt", 'r') as myFile:
    lines = myFile.readlines()
    for line in lines:
        line_split = line.splitlines()
        print(line_split)
['Hello world!']
['Hello world!']
['Hello world!']
['Hello world!']
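If the goal is simply a flat list of lines without the newline characters, two common alternatives are iterating over the file object directly or calling splitlines() on the whole content. The sketch below assumes a temporary file with four identical lines.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "lines_demo.txt")  # hypothetical file name
    with open(path, "w") as f:
        f.write("Hello world!\n" * 4)

    # Option 1: iterate over the file and strip the newline from each line.
    with open(path, "r") as my_file:
        stripped = [line.rstrip("\n") for line in my_file]

    # Option 2: read everything and let splitlines() drop the newlines.
    with open(path, "r") as my_file:
        split = my_file.read().splitlines()

print(stripped)  # ['Hello world!', 'Hello world!', 'Hello world!', 'Hello world!']
print(split == stripped)  # True
```

Iterating over the file object reads one line at a time, which tends to be the gentler choice for large files.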
Creating a file
To create a file in Python, we use the open() function and pass the name and mode for the file as arguments.
Syntax
fileObject = open(filename, mode)
When a file is opened in write(w) mode, an empty file is created. If a file with the same name already exists in the system, all the previous data is erased and a new empty file is created.
When opened in append(a) mode, the previous data of the file remains and the new data is written at the end. However, if the file does not exist already, an empty file is created.
Create(x) mode creates a new file with the specified name and opens it for writing; it cannot be read unless 'x+' is used. If a file with the same name already exists, Python raises a FileExistsError.
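The behaviour of 'x' mode is easiest to see by trying to create the same file twice. In this sketch, create_once is a hypothetical helper written for the example, and the file lives in a temporary directory.

```python
import os
import tempfile


def create_once(path, text):
    """Hypothetical helper: create the file, or report that it already exists."""
    try:
        with open(path, "x") as new_file:
            new_file.write(text)
        return True
    except FileExistsError:
        # 'x' mode refuses to touch a file that is already there.
        return False


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "create_demo.txt")  # hypothetical file name

    first_attempt = create_once(path, "original data\n")
    second_attempt = create_once(path, "this would overwrite\n")

    with open(path, "r") as f:
        content = f.read()

print(first_attempt, second_attempt)  # True False
print(repr(content))                  # 'original data\n'
```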
Writing to a file
For writing to a file, we need to open the file in either 'write' or 'append' mode.
Let's understand the difference between the two -
Write(w) mode opens the file, or creates it if it doesn't exist already, and empties it. The offset starts at the beginning of the file, so the data written after opening replaces whatever the file held before.
Append(a) mode, on the other hand, sets the offset of the file at its end after opening, which means that the new data is written to the file after the previous data, instead of overwriting it.
After opening the file in either of these modes, there are two methods for writing data to the file -
1. The write() method
This method takes a string as an argument and returns the number of characters written to the file.
Numerical values need to be converted into strings before being passed as the argument.
Syntax
fileObject.write("This is some data")
Example
>>> myFile = open("file.txt", 'w')
>>> myFile.write("Hello World!")
12
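As a sketch of the point about numerical values, the example below writes a few numbers after converting them with str(), and shows that passing a number directly raises a TypeError. The file name and the list of values are made up for the illustration.

```python
import os
import tempfile

values = [3, 1.5, 42]  # example numbers, chosen arbitrarily
type_error_raised = False

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "write_demo.txt")  # hypothetical file name

    with open(path, "w") as my_file:
        # write() returns the number of characters written each time.
        counts = [my_file.write(str(value) + "\n") for value in values]

        try:
            my_file.write(42)  # not a string, so this fails
        except TypeError:
            type_error_raised = True

    with open(path, "r") as my_file:
        content = my_file.read()

print(counts)             # [2, 4, 3]
print(repr(content))      # '3\n1.5\n42\n'
print(type_error_raised)  # True
```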
2. The writelines() method
This method is used to write multiple lines to a file at the same time. It takes an iterable object like a tuple or a list, containing multiple lines, as the argument.
Syntax
fileObject.writelines(object)
Look at this example for a better understanding -
>>> myFile = open("file.txt", 'w')
>>> lines = ["line1\n", "line2\n", "line3\n"]
>>> myFile.writelines(lines)
Remember to put the newline character(\n) at the end of each line.
After running this code and closing the file, the file will look like this -
line1
line2
line3
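Since writelines() does not add line endings on its own, one convenient pattern is to append the newline while passing the items in. This is a sketch using a generator expression and a temporary file; the names are assumptions for the example.

```python
import os
import tempfile

names = ["line1", "line2", "line3"]  # items without newline characters

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "writelines_demo.txt")  # hypothetical file name

    with open(path, "w") as my_file:
        # Add the newline to each item on the fly.
        my_file.writelines(name + "\n" for name in names)

    with open(path, "r") as my_file:
        content = my_file.read()

print(repr(content))  # 'line1\nline2\nline3\n'
```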
Setting offsets in a file
When we discussed the differences between 'write' and 'append' modes earlier, I mentioned offsets being set at the beginning or end of a text file.
Put simply, the offset is the position of the cursor from where the data is to be read or written in the file.
All the functions I talked about till now read the file data sequentially from the beginning. If we want to manipulate data in a random manner, Python gives us two functions - seek() and tell()
tell() function
The tell() function returns the current position of the cursor or file handle of the file as an integer. This function takes no argument. When a file is opened in any mode other than 'append' mode, the initial value of the tell() function is zero.
Syntax
fileObject.tell()
seek() function
The seek() function allows us to position the file handle at a specific point in the file.
Syntax
fileObject.seek(offset, ref)
The function takes two arguments -
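- offset defines the number of positions to move from the point of reference
- ref defines the point of reference: 0 (the default) is the beginning of the file, 1 is the current position, and 2 is the end of the file. Files opened in text mode only support seeking relative to the beginning, apart from seek(0, 2), which jumps to the end.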
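The three reference points are easiest to try out in binary mode, where relative seeks are allowed. The sketch below assumes a small temporary file and uses the os.SEEK_CUR and os.SEEK_END constants, which stand for the ref values 1 and 2.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "seek_demo.txt")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"Hello world!")  # 12 bytes

    with open(path, "rb") as my_file:
        # ref 0 (default): move to an absolute position from the start.
        my_file.seek(0)
        from_start = my_file.read(5)

        # ref 1: skip one byte forward from the current position.
        my_file.seek(1, os.SEEK_CUR)
        from_current = my_file.read(5)

        # ref 2: move to 6 bytes before the end of the file.
        my_file.seek(-6, os.SEEK_END)
        from_end = my_file.read()

        # seek() returns the new absolute position.
        end_position = my_file.seek(0, os.SEEK_END)

print(from_start)    # b'Hello'
print(from_current)  # b'world'
print(from_end)      # b'world!'
print(end_position)  # 12
```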
Let's understand these two functions with an example -
First, we create a file and write some data.
# Creating and writing data to a file
myFile = open("file.txt", 'w')
myFile.write("Hello world!, this data is being written onto the file.")
myFile.close()
After creating the file, we open it again in 'read' mode and display the position of the file handle before and after reading the file. The offset is set to zero by default.
# reading the file and displaying the offset position before and after reading
myFile = open("file.txt", 'r')
print("default position of the cursor:", myFile.tell())
data = myFile.read()
offset = myFile.tell()
print("current position of the cursor:", offset)
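Output:
default position of the cursor: 0
current position of the cursor: 55
We can see that after reading 55 characters from the file, the offset is now 55. Since positions are counted from 0, the cursor sits just past the last character, at the end of the file.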
Now, to set the offset at a specific position within the file, we use the seek() function.
# positioning the offset at the 10th position
offset = myFile.seek(10)
print("new position of the cursor", offset)
Output:
new position of the cursor 10
Closing a file
After all the read/write operations are done, it is a good practice to close the file. Sometimes the written data is held in a buffer and isn't actually written to the file until it is closed. Closing a file makes sure that all the unwritten data is flushed (written) to the file before closing.
The syntax for closing a file in Python is
fileObject.close()
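Note that when we re-assign a file object variable to another file, the previous file is usually closed automatically once nothing else refers to it. This depends on the Python implementation, so it is safer to close files explicitly.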
Also, we discussed earlier that opening a file using the 'with' clause also closes the file automatically and we don't need to close it manually.
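When a file is opened without 'with', a try/finally block is one way to make sure close() always runs. The sketch below assumes a temporary file and also shows that a closed file can no longer be written to.

```python
import os
import tempfile

write_after_close_failed = False

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "close_demo.txt")  # hypothetical file name

    my_file = open(path, "w")
    try:
        my_file.write("buffered data")
    finally:
        # Runs even if write() fails, and flushes any buffered data.
        my_file.close()

    try:
        my_file.write("more data")  # the file is already closed
    except ValueError:
        write_after_close_failed = True

    with open(path, "r") as f:
        content = f.read()

print(my_file.closed)            # True
print(repr(content))             # 'buffered data'
print(write_after_close_failed)  # True
```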
Deleting a file
In order to delete a file from the system, we need to import Python's 'os' module.
import os
This library has a lot of useful functions, but the one we need here is os.remove(filename)
We pass the name of the file as an argument. If the file does not exist, this function raises a FileNotFoundError.
A safer way to delete a file in Python is to first check whether the file we want to delete exists. We do this by using os.path.exists(filename)
And the code looks like this -
# Deleting a file
import os
if os.path.exists("file.txt"):
    os.remove("file.txt")
else:
    print("This file does not exist!")
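An alternative to checking first is to attempt the deletion and handle the FileNotFoundError if it occurs. In this sketch, delete_file is a hypothetical helper written for the example, and the file lives in a temporary directory.

```python
import os
import tempfile


def delete_file(path):
    """Hypothetical helper: delete the file and report whether it existed."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "delete_demo.txt")  # hypothetical file name
    with open(path, "w") as f:
        f.write("temporary data")

    first_delete = delete_file(path)   # the file exists, so it is removed
    second_delete = delete_file(path)  # already gone, so nothing to remove
    still_exists = os.path.exists(path)

print(first_delete, second_delete, still_exists)  # True False False
```

This version avoids the small window between the existence check and the deletion in which another program could remove the file.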
Now that we have covered all the basic concepts for handling text files, it is time for you to practice them yourself and play around with these syntaxes. It might feel a bit overwhelming at first, but it only gets easier with practice and some experience.
If you have suggestions that would make this article more informative, feel free to share them.
For any queries, you can connect with me on Twitter @TusharS_23
Hope you found this article helpful. See you in the next one!