From 2c80375338fc996042691e44782020a27ff090c3 Mon Sep 17 00:00:00 2001 From: khoivan88 <33493502+khoivan88@users.noreply.github.com> Date: Mon, 16 Mar 2020 12:20:02 -0400 Subject: [PATCH 1/2] Use context-manager to open and read file 1. Use context-manager to open and read file. 2. Add `break` for breaking out of the loop while reading a file to avoid wasting computer resources. --- _episodes/03-multiple_files.md | 82 ++++++++++++++++------------------ 1 file changed, 39 insertions(+), 43 deletions(-) diff --git a/_episodes/03-multiple_files.md b/_episodes/03-multiple_files.md index c2e52f1..2258d95 100644 --- a/_episodes/03-multiple_files.md +++ b/_episodes/03-multiple_files.md @@ -73,15 +73,14 @@ print(filenames) This will give us a list of all the files which end in `*.out` in the `outfiles` directory. Now if we want to parse every file we just read in, we will use a `for` loop to go through each file. ``` for f in filenames: - outfile = open(f,'r') - data = outfile.readlines() - outfile.close() - for line in data: - if 'Final Energy' in line: - energy_line = line - words = energy_line.split() - energy = float(words[3]) - print(energy) + with open(f,'r') as data: + for line in data: + if 'Final Energy' in line: + energy_line = line + words = energy_line.split() + energy = float(words[3]) + print(energy) + break ``` {: .language-python} @@ -101,6 +100,8 @@ for f in filenames: Notice that in this code we actually used two `for` loops, one nested inside the other. The outer `for` loop counts over the filenames we read in earlier. The inner `for` loop counts over the line in each file, just as we did in our previous file parsing lesson. +`break` was used after `print(energy)` to break out of the `for` loop for reading the current file. This will stop python from reading the rest of the file content after finding the line with 'Final Energy'. + The output our code currently generates is not that useful. It doesn't show us which file each energy value came from. 
We want to print the name of the molecule with the energy. We can use `os.path.basename`, which is another function in `os.path` to get just the name of the file. @@ -144,18 +145,15 @@ for f in filenames: split_filname = file_name.split('.') molecule_name = split_filename[0] - # Read the data - outfile = open(f,'r') - data = outfile.readlines() - outfile.close() - - # Loop through the data - for line in data: - if 'Final Energy' in line: - energy_line = line - words = energy_line.split() - energy = float(words[3]) - print(molecule_name, energy) + # Read the data and loop through the data: + with open(f,'r') as data: + for line in data: + if 'Final Energy' in line: + energy_line = line + words = energy_line.split() + energy = float(words[3]) + print(molecule_name, energy) + break ~~~ {: .language-python} @@ -188,37 +186,35 @@ Python can only write strings to files. Our current print statement is not a st To make the printing neater, we will separate the file name from the energy using a tab. To insert a tab, we use the special character `\t`. 
``` -datafile = open('energies.txt','w+') #This opens the file for writing -for f in filenames: - # Get the molecule name - file_name = os.path.basename(f) - split_filename = file_name.split('.') - molecule_name = split_filename[0] - - # Read the data - outfile = open(f,'r') - data = outfile.readlines() - outfile.close() +with open('energies.txt','w+') as datafile: #This opens the file for writing + for f in filenames: + # Get the molecule name + file_name = os.path.basename(f) + split_filename = file_name.split('.') + molecule_name = split_filename[0] - # Loop through the data - for line in data: - if 'Final Energy' in line: - energy_line = line - words = energy_line.split() - energy = float(words[3]) - datafile.write(F'{molecule_name} \t {energy} \n') -datafile.close() + # Read the data and loop through the data + with open(f,'r') as data: + for line in data: + if 'Final Energy' in line: + energy_line = line + words = energy_line.split() + energy = float(words[3]) + datafile.write(f'{molecule_name} \t {energy} \n') + break ``` {: .language-python} After you run this command, look in the directory where you ran your code and find the "energies.txt" file. Open it in a text editor and look at the file. -In the file writing line, notice the `\n` at the end of the line. This is the newline character. Without it, the text in our file would just be all smushed together on one line. Also, the `filehandle.close()` command is very important. Think about a computer as someone who has a very good memory, but is very slow at writing. Therefore, when you tell the computer to write a line, it remembers what you want it to write, but it doesn't actually write the new file until you tell it you are finished. The `datafile.close()` command tells the computer you are finished giving it lines to write and that it should go ahead and write the file now. If you are trying to write a file and the file keeps coming up empty, it is probably because you forgot to close the file. 
+In the file writing line, notice the `\n` at the end of the line. This is the newline character. Without it, the text in our file would just be all smushed together on one line. ~Also, the `filehandle.close()` command is very important. Think about a computer as someone who has a very good memory, but is very slow at writing. Therefore, when you tell the computer to write a line, it remembers what you want it to write, but it doesn't actually write the new file until you tell it you are finished. The `datafile.close()` command tells the computer you are finished giving it lines to write and that it should go ahead and write the file now. If you are trying to write a file and the file keeps coming up empty, it is probably because you forgot to close the file.~ None of this is necessary with a context manager (the `with open('energies.txt','w+') as datafile:` statement). The context manager closes the file automatically, whether or not an error occurs inside the block. ## A final note about string formatting -The F'string' notation that you can use with the print or the write command lets you format strings in many ways. You could include other words or whole sentences. For example, we could change the file writing line to +Also, notice that `f'{molecule_name} \t {energy} \n'` was used as the string format. This is called an f-string and was introduced in Python 3.6. An excellent tutorial is [here](https://realpython.com/python-f-strings/). + +The f-string notation that you can use with the print or the write command lets you format strings in many ways. You could include other words or whole sentences. For example, we could change the file writing line to ``` -datafile.write(F'For the file {molecule_name} the energy is {energy} in kcal/mol.') +datafile.write(f'For the file {molecule_name} the energy is {energy} in kcal/mol.') ``` {: .language-python} where anything in the braces is a python variable and it will print the value of that variable. 
From f998802e770500a77e29f920993991a8b87c506a Mon Sep 17 00:00:00 2001 From: khoivan88 <33493502+khoivan88@users.noreply.github.com> Date: Thu, 26 Mar 2020 13:42:56 -0400 Subject: [PATCH 2/2] Remove `break` from the for-loop to not introduce new concept --- _episodes/03-multiple_files.md | 5 ----- 1 file changed, 5 deletions(-) diff --git a/_episodes/03-multiple_files.md b/_episodes/03-multiple_files.md index 2258d95..0a1173c 100644 --- a/_episodes/03-multiple_files.md +++ b/_episodes/03-multiple_files.md @@ -80,7 +80,6 @@ for f in filenames: words = energy_line.split() energy = float(words[3]) print(energy) - break ``` {: .language-python} @@ -100,8 +99,6 @@ for f in filenames: Notice that in this code we actually used two `for` loops, one nested inside the other. The outer `for` loop counts over the filenames we read in earlier. The inner `for` loop counts over the line in each file, just as we did in our previous file parsing lesson. -`break` was used after `print(energy)` to break out of the `for` loop for reading the current file. This will stop python from reading the rest of the file content after finding the line with 'Final Energy'. - The output our code currently generates is not that useful. It doesn't show us which file each energy value came from. We want to print the name of the molecule with the energy. We can use `os.path.basename`, which is another function in `os.path` to get just the name of the file. @@ -153,7 +150,6 @@ for f in filenames: words = energy_line.split() energy = float(words[3]) print(molecule_name, energy) - break ~~~ {: .language-python} @@ -201,7 +197,6 @@ with open('energies.txt','w+') as datafile: #This opens the file for writing words = energy_line.split() energy = float(words[3]) datafile.write(f'{molecule_name} \t {energy} \n') - break ``` {: .language-python}
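The context-manager behavior that this series relies on can be checked with a minimal sketch. It is not part of either commit. The helper name `write_energies`, the temporary output path, and the sample molecule names and energies are all hypothetical, chosen only for illustration. The sketch writes tab-separated lines with an f-string inside a `with` block, then confirms that the file is already closed once the block ends, without an explicit `close()` call.

```python
import os
import tempfile


def write_energies(path, energies):
    """Write 'name<TAB>energy' lines to path using a context manager.

    Hypothetical helper for illustration; it is not part of the lesson.
    Returns True if the file was closed when the with block ended.
    """
    with open(path, 'w') as datafile:
        for molecule_name, energy in energies:
            datafile.write(f'{molecule_name}\t{energy}\n')
    # No explicit datafile.close() is needed: leaving the with block
    # has already closed the file.
    return datafile.closed


# Made-up sample values, not real results from the lesson's data files.
sample = [('ethanol', -12.5), ('methanol', -7.25)]

# Write into a fresh temporary directory so nothing existing is overwritten.
out_path = os.path.join(tempfile.mkdtemp(), 'energies.txt')
closed_after_with = write_energies(out_path, sample)

# Read the file back, again with a context manager.
with open(out_path, 'r') as data:
    lines = data.readlines()

print(closed_after_with)
print(lines)
```

Because the file is closed when the `with` block exits, the lines are on disk by the time the file is read back, which is the situation the removed `datafile.close()` paragraph was guarding against.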