Commit f280b89

updated Assignment 3 with FAQs
1 parent 60693be commit f280b89

2 files changed: +38 -27 lines changed


Assignments/ASSIGNMENT-3a.ipynb (+17 -24)
@@ -8,7 +8,7 @@
 "\n",
 "## Due: Friday the 27th of September 2019 23:59 p.m.\n",
 "\n",
-"* Please submit your assignment (notebooks of parts 3a and 3b + Python modules) as **a single .zip file** using [this google form](https://forms.gle/JiDmuLLKxbjgA8Sn8)\n",
+"* Please submit your assignment (notebooks of parts 3a and 3b + Python modules) as **a single .zip file** using [this google form](https://forms.gle/JiDmuLLKxbjgA8Sn8). Please put the notebooks for Assignment 3a and 3b as well as the Python modules (files ending with .py) in one folder, which you call ASSIGNMENT_3_FIRSTNAME_LASTNAME. Please zip this folder and upload it as your submission.\n",
 "\n",
 "* Please name your zip file with the following naming convention: ASSIGNMENT_3_FIRSTNAME_LASTNAME.zip\n",
 "\n",
@@ -23,7 +23,13 @@
 "* Chapter 15 - Off to analyzing text \n",
 "\n",
 "\n",
-"In this assignment, you will first complete a number of small exercises about each chapter to make sure you are familiar with the most important concepts. In the second part of the assignment, you will apply your newly acquired skills to write your very own text processing program (ASSIGNMENT-3b) :-). But don't worry, there will be instructions and hints along the way. "
+"In this assignment, you will first complete a number of small exercises about each chapter to make sure you are familiar with the most important concepts. In the second part of the assignment, you will apply your newly acquired skills to write your very own text processing program (ASSIGNMENT-3b) :-). But don't worry, there will be instructions and hints along the way. \n",
+"\n",
+"**Can I use external modules?**\n",
+"For now, please try to avoid it. All the exercises can be solved with what we have covered in blocks I, II, and III.\n",
+"\n",
+"\n",
+"\n"
 ]
 },
 {
@@ -47,7 +53,7 @@
 "\n",
 "* Hint 1: There is a specific python container which does not allow for duplicates and simply removes them. Use this one. \n",
 "* Hint 2: There is a function which sorts items in an iterable called 'sorted'. Look at the documentation to see how it is used. \n",
-"* Hint 3: Don't forget to write a docstring. "
+"* Hint 3: Don't forget to write a docstring. Please make sure that the docstring explains what the input is, what the function does, and what the function returns. If you want, you can use [reStructuredText](http://docutils.sourceforge.net/rst.html), but this is not needed to receive full points."
 ]
 },
 {
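The hints in this hunk (a duplicate-free container, the built-in `sorted`, and a docstring) combine naturally into one short function. The exercise text itself is not part of this diff, so the following is only a sketch under that assumption; the name `sort_unique` and its signature are illustrative, not taken from the notebook:

```python
def sort_unique(items):
    """
    Remove duplicates from an iterable and return its items in sorted order.

    :param items: an iterable, e.g. a list of numbers or strings
    :return: a sorted list without duplicates
    """
    # set() drops duplicates; sorted() turns the set back into an ordered list
    return sorted(set(items))


print(sort_unique([3, 1, 2, 3, 1]))  # [1, 2, 3]
```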
@@ -73,7 +79,7 @@
 "metadata": {},
 "source": [
 "### Exercise 2\n",
-"NLTK offers a way of using WordNet in Python. Do some research (using google, because quite frankly, that's what we do very often) and see if you can find out how to import it. WordNet is a computational lexicon which organizes words according to their senses (collected in synsets). See if you can print all the synset definitions (i.e. entries) of the word 'dog'.\n",
+"NLTK offers a way of using WordNet in Python. Do some research (using google, because quite frankly, that's what we do very often) and see if you can find out how to import it. WordNet is a computational lexicon which organizes words according to their senses (collected in synsets). See if you can print all the **synset definitions** of the lemma **dog**.\n",
 "\n",
 "Make sure you have run the following cell to make sure you have installed WordNet:"
 ]
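For Exercise 2, a minimal sketch of the WordNet lookup, assuming the WordNet corpus has already been downloaded (the notebook's installation cell mentioned above takes care of that):

```python
from nltk.corpus import wordnet as wn

# Each synset groups one sense of 'dog'; definition() returns its gloss.
for synset in wn.synsets('dog'):
    print(synset.name(), '->', synset.definition())
```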
@@ -114,8 +120,8 @@
 "### Exercise 3\n",
 "\n",
 "\n",
-"#### a.) Define a function called `count` which counts the words in a string. Do not use NLTK just yet. Find a way to test it. \n",
-"* Hint 1: Write a helper-function called `preprocess` which preprocesses the string (split it, remove the punctuation specified by the user, return it in a container that you think works best for the next steps). You call the function `preprocess` inside the `count` function.\n",
+"#### a.) Define a function called `count` which determines how often each word occurs in a string. Do not use NLTK just yet. Find a way to test it. \n",
+"* Hint 1: Write a helper-function called `preprocess` which preprocesses the string (remove the punctuation specified by the user, split it, return it in a container that you think works best for the next steps). You call the function `preprocess` inside the `count` function.\n",
 "\n",
 "* Hint 2: Remember that there are string methods which you can use to get rid of unwanted characters. Test the `preprocess` function using the string 'this is a (tricky) test'. No assert statements are needed.\n",
 "\n",
@@ -125,7 +131,9 @@
 "\n",
 "#### b.) Create a python script \n",
 "\n",
-"Use your editor to create a python script called `count_words.py`. Move your function call of the **count** function to this file. Move your helper function (**preprocess**) to a seperate script which you call `utils_3a.py`. Import your helper function into `count_words.py`. Test whether everything works as expected by calling the scipt `count_words.py` from the terminal. \n",
+"Use your editor to create a Python script called **count_words.py**. Place the function definition of the **count** function in **count_words.py**, together with a function call of **count** to test it. Place your helper function definition, i.e., **preprocess**, in a separate script called **utils_3a.py**. Import your helper function **preprocess** into count_words.py. Test whether everything works as expected by calling the script count_words.py from the terminal.\n",
+"\n",
+"The function **preprocess** preprocesses the text by removing characters that are unwanted by the user. The function **count** builds upon the output of the preprocess function and creates a dictionary in which each key is a word and each value is the frequency of that word.\n",
 "\n",
 "**Please submit these scripts together with the other notebooks**.\n",
 "\n",
@@ -162,6 +170,7 @@
 "a.) Write a function called `load_text` which opens and reads a file and returns the text in the file. It should take a filepath as an argument. Test it by loading this file: ../Data/lyrics/walrus.txt\n",
 "\n",
 "* Hint: remember it is best practice to use a context manager\n",
+"* Hint: **FileNotFoundError**: this means that the path you provide does not lead to an existing file on your computer. Please carefully study Chapter 14. Determine where the notebook or Python module that you are working with is located on your computer, and where Python is looking when you provide a path such as '../Data/lyrics/walrus.txt'. Try to go from your notebook to the location where Python is trying to find the file. One tip: if you did not store the notebooks for Assignments 3a and 3b in the folder 'Assignments', you will get this error.\n",
 "\n",
 "b.) Write a function called `replace_walrus` which takes lyrics as input and replaces every instance of 'walrus' by 'hippo' (make sure to account for upper and lower case - it is fine to transform everything to lower case). The function should write the new version of the song to a file called 'walrus_hippo.txt' and stored in ../Data/lyrics. \n",
 "\n",
@@ -257,17 +266,6 @@
 "[answer]"
 ]
 },
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {
-"collapsed": true
-},
-"outputs": [],
-"source": [
-"# your code here"
-]
-},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -281,11 +279,6 @@
 "source": [
 "[answer]"
 ]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": []
 }
 ],
 "metadata": {
@@ -304,7 +297,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.7.2"
+"version": "3.6.8"
 }
 },
 "nbformat": 4,

Assignments/ASSIGNMENT-3b.ipynb (+21 -3)
@@ -83,10 +83,26 @@
 " \n",
 "for name, url in book_urls.items():\n",
 "    text = download_book(url)\n",
-"    with open('../Data/books/'+name+'.txt', 'w') as outfile:\n",
+"    with open('../Data/books/'+name+'.txt', 'w', encoding='utf-8') as outfile:\n",
 "        outfile.write(text)"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"**Encoding issues with txt files**\n",
+"\n",
+"For Windows users, the file 'AnnaKarenina.txt' is otherwise opened with the default cp1252 encoding. \n",
+"In order to open the file correctly, you have to add **encoding='utf-8'**, i.e.,\n",
+"\n",
+"```python\n",
+"a_path = 'some path on your computer.txt'\n",
+"with open(a_path, mode='r', encoding='utf-8') as infile:\n",
+"    text = infile.read()  # process the file contents\n",
+"```"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -198,6 +214,8 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
+"Tip: **book2stats** is a dictionary mapping a book name (the key), e.g., 'AnnaKarenina', to a dictionary (the value), which is the output from get_basic_stats.\n",
+"\n",
 "Tip: please use the following code snippet to obtain the basename of a file path:"
 ]
 },
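The snippet the tip refers to is not included in this diff; a generic way to turn a file path into a book name suitable as a key for book2stats uses os.path, for example:

```python
import os

path = '../Data/books/AnnaKarenina.txt'
basename = os.path.basename(path)      # 'AnnaKarenina.txt'
book = os.path.splitext(basename)[0]   # 'AnnaKarenina'
print(book)
```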
@@ -266,7 +284,7 @@
 "and \n",
 "..\n",
 "```\n",
-"The following code snippet can help you with obtaining the top 20 occurring tokens."
+"The following code snippet can help you obtain the top 20 occurring tokens. The goal is to call the function you updated in Exercise 4a, i.e., get_basic_stats, in the file analyze.py. This also makes it possible to write the top 20 tokens to files."
 ]
 },
 {
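The notebook's own snippet is not shown in this diff. One common way to get the 20 most frequent tokens, assuming a token-to-frequency dictionary is available (the tiny dictionary below is made up purely for illustration), is collections.Counter.most_common:

```python
from collections import Counter

# Toy token->frequency dictionary; in the assignment this would come from get_basic_stats.
token2freq = {'the': 10, 'walrus': 4, 'hippo': 2}

top_20 = Counter(token2freq).most_common(20)
print(top_20)  # [('the', 10), ('walrus', 4), ('hippo', 2)]
```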
@@ -318,7 +336,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.7.2"
+"version": "3.6.8"
 }
 },
 "nbformat": 4,
