Today we’ll learn how to iterate over a list in Python using an example. You can download Python for Mac, Linux & Windows from here.
Your program will end up with a list of file handles that will need to be processed. Just as a for loop can iterate through the characters of an input string in Python, here we can use a for loop over the args.file inputs, which will be open file handles:
for fh in args.file: # read each file
You can give whatever name you like to the variable you use in your for loop, but I think it’s very important to give it a semantically meaningful name. Here the variable name fh reminds me that this is an open file handle. You saw in chapter 5 how to manually read() a file. Here fh is already open, so we can use it directly to read the contents.
There are many ways to read a file. The fh.read() method will give you the entire contents of the file in one go. If the file is large enough to exceed the available memory on your machine, your program will crash. I would recommend, instead, that you use another for loop on the fh. Python will understand this to mean that you wish to read each line of the file handle, one at a time.
for fh in args.file:  # ONE LOOP!
    for line in fh:  # TWO LOOPS!
        # process the line
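The two loops above can be tried without any real files. This is a minimal sketch, assuming io.StringIO objects as stand-ins for the open file handles that args.file would hold; the handles list, its contents, and the seen list are hypothetical names used only for illustration.

```python
import io

# Hypothetical stand-ins for open file handles; in the real program
# these would come from args.file.
handles = [
    io.StringIO("first line\nsecond line\n"),
    io.StringIO("only line\n"),
]

seen = []
for fh in handles:      # outer loop: one file handle at a time
    for line in fh:     # inner loop: one line of that handle at a time
        seen.append(line.rstrip("\n"))

print(seen)
```

Because each handle is read one line at a time, only the current line has to be held in memory, which is the reason for preferring this over fh.read() on large files.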
What you’re counting
The output for each file will be the number of lines, words, and bytes (like characters and whitespace), each of which is printed in a field eight characters wide, followed by a space and then the name of the file, which will be available to you via
$ wc fox.txt
       1       9      45 fox.txt
The fox.txt file is short enough that you could manually verify that it does in fact contain 1 line, 9 words, and 45 bytes, which includes all the characters, spaces, and the trailing newline (see figure 1.1).
Figure 1.1 The fox.txt file contains 1 line of text, 9 words, and a total of 45 bytes.
When given more than one file, wc also prints a line with the totals:
$ wc fox.txt sonnet-29.txt
       1       9      45 fox.txt
      17     118     669 sonnet-29.txt
      18     127     714 total
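The eight-character fields in this output can be reproduced with a format width in an f-string. The following is only a sketch of one way to do it; format_counts and its parameter names are hypothetical and not part of the program described here.

```python
def format_counts(num_lines, num_words, num_bytes, name):
    """Right-align each count in a field 8 characters wide,
    followed by a space and the file name."""
    return f"{num_lines:8}{num_words:8}{num_bytes:8} {name}"


# Counts taken from the wc output shown above.
print(format_counts(1, 9, 45, "fox.txt"))
print(format_counts(17, 118, 669, "sonnet-29.txt"))
print(format_counts(18, 127, 714, "total"))
```

Integers are right-aligned by default when a width is given, so no explicit alignment character is needed.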
We are going to emulate the behavior of this program. For each file, you will need to create variables to hold the numbers of lines, words, and bytes. For instance, if you use the loop over fh that I suggest, you will need to have a variable like num_lines to increment on each iteration.
That is, somewhere in your code you will need to set a variable to 0 and then, inside the for loop, make it go up by 1. The idiom in Python is to use the += operator to add some value on the right side to the variable on the left side (as shown in figure 1.2):
num_lines = 0
for line in fh:
    num_lines += 1
Figure 1.2 The += operator will add the value on the right to the variable on the left.
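To see the counter pattern run on its own, here is a small sketch that assumes an in-memory io.StringIO object in place of a real open file handle; the three-line sample text is made up for illustration.

```python
import io

# Hypothetical stand-in for an open file handle with three lines.
fh = io.StringIO("one\ntwo\nthree\n")

num_lines = 0           # start the counter at zero
for line in fh:
    num_lines += 1      # same as: num_lines = num_lines + 1

print(num_lines)
```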
To get the words, we’ll use the str.split() method to break each line on spaces. You can then use the length of the resulting list as the number of words. For the number of bytes, you can use the len() (length) function on the line and add that to a
Splitting the text on spaces doesn’t actually produce “words” because it won’t separate the punctuation, like commas and periods, from the letters, but it’s close enough for this program.
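Putting the three counters together might look like the sketch below. It assumes an io.StringIO stand-in for the open file handle, and the sample text is a guess at a fox.txt-like line chosen to match the counts reported above (1 line, 9 words, 45 bytes). The names num_words and num_bytes are illustrative. Note that len() on a str counts characters, which equals the byte count only for plain ASCII text such as this.

```python
import io

# Hypothetical stand-in for an open handle on a one-line text file.
fh = io.StringIO("The quick brown fox jumps over the lazy dog.\n")

num_lines, num_words, num_bytes = 0, 0, 0
for line in fh:
    num_lines += 1
    num_words += len(line.split())  # split() with no argument splits on any whitespace
    num_bytes += len(line)          # characters in the line, including the newline

print(num_lines, num_words, num_bytes)
```

Calling split() with no argument also discards the trailing newline and any repeated spaces, so empty strings never inflate the word count.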