From d1396ad0ecbb4f04537cc4e9a0d681132b243170 Mon Sep 17 00:00:00 2001 From: wendlingd Date: Mon, 10 Sep 2018 11:03:06 -0400 Subject: [PATCH] Scripts first post --- 01_Text_wrangling.ipynb | 3274 ++++++++++++++++++++++ 02_Run_APIs.ipynb | 1077 +++++++ 02_Run_APIs.py | 983 +++++++ 03_Fuzzy_match.ipynb | 530 ++++ 04_Machine_learning_classification.ipynb | 906 ++++++ 05_Chart_the_trends.ipynb | 293 ++ 05b_Chart_the_trends-BiggestMovers.ipynb | 578 ++++ 06_Load_database.ipynb | 316 +++ 07_UI_building.ipynb | 135 + 08_Misc_fixes.ipynb | 306 ++ 10 files changed, 8398 insertions(+) create mode 100644 01_Text_wrangling.ipynb create mode 100644 02_Run_APIs.ipynb create mode 100644 02_Run_APIs.py create mode 100644 03_Fuzzy_match.ipynb create mode 100644 04_Machine_learning_classification.ipynb create mode 100644 05_Chart_the_trends.ipynb create mode 100644 05b_Chart_the_trends-BiggestMovers.ipynb create mode 100644 06_Load_database.ipynb create mode 100644 07_UI_building.ipynb create mode 100644 08_Misc_fixes.ipynb diff --git a/01_Text_wrangling.ipynb b/01_Text_wrangling.ipynb new file mode 100644 index 0000000..d6f2a9c --- /dev/null +++ b/01_Text_wrangling.ipynb @@ -0,0 +1,3274 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Part 1. Text wrangling\n", + "App to analyze web-site search logs (internal search)
\n", + "**This script:** Resolve text formatting/syntax problems and match against historical file
\n", + "Authors: dan.wendling@nih.gov,
\n",
    "Last modified: 2018-09-09\n",
    "\n",
    "\n",
    "## Script contents\n",
    "\n",
    "1. Start-up / What to put into place, where\n",
    "2. Unite search log data in single dataframe; globally update columns and rows\n",
    "3. Separate out the queries with non-English characters\n",
    "4. Run baseline dataset stats\n",
    "5. Clean up content to improve matching\n",
    "6. Make special-case assignments with F&R, RegEx: Bibliographic, Numeric, Named entities\n",
    "7. Create logAfterGoldStandard - Match to the \"gold standard\" file of historical matches\n",
    "8. Create 'uniques' dataframe/file for APIs\n",
    "\n",
    "\n",
    "## FIXMEs\n",
    "\n",
    "Things Dan wrote for Dan; modify as needed. There are more FIXMEs in context.\n",
    "* [ ] Update from 1:1 capture to 1:n capture\n",
    "* [ ] Add two more runs against the UMLS Metathesaurus API:\n",
    "    - Isolate non-English terms and remove them from percent-complete calcs.\n",
    "      Add separate statistics for non-English terms.\n",
    "    - Run remaining terms with \"word\" matching or \"approximate\" matching;\n",
    "      compare those suggestions to ML suggestions. Create a df with one \n",
    "      column for each suggestion source: Metathesaurus-Approximate, \n",
    "      LinearSVC, LogisticRegression...\n",
    "* [ ] Add summary visualizations / data quality dashboard\n",
    "* [ ] Update Cognos search log reports:\n",
    "** [ ] Change col names to one word: 'Search Timestamp': 'Timestamp', \n",
    "    'NLM IP Y/N':'StaffYN', 'IP':'SessionID'\n",
    "** [ ] Make it UTF-8-enough for Python \n",
    "** [ ] Remove 8 col of blank cells\n",
    "** [ ] Standardize Timestamp syntax b/w CSV and Excel formats\n",
    "** [ ] I could isolate acronyms more easily if queries with unaltered case were \n",
    "    available. I can lower-case things as needed. Reasons for receiving in lc? \n",
    "    Any reasons to keep it the way it is?\n",
    "* [ ] Continue changing processing order. <br/>
Perhaps avoid all the extra Semantic\n",
    "Network assignments until very end of scripts 1 and 2.\n",
    "\n",
    "\n",
    "## Cheat sheets for markdown text\n",
    "\n",
    "* https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html\n",
    "* https://medium.com/ibm-data-science-experience/markdown-for-jupyter-notebooks-cheatsheet-386c05aeebed\n",
    "* https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/\n",
    "\n",
    "\n",
    "## 1. Start-up / What to put into place, where\n",
    "\n",
    "Search log from internal site search. This script assumes an Excel file \n",
    "whose top two rows should be ignored, with these columns:\n",
    "\n",
    "| ID | IP | NLM IP Y/N | Referrer | Query | Search Timestamp |\n",
    "\n",
    "ID - Unique row ID\n",
    "IP - Unique, anonymized session ID\n",
    "NLM IP Y/N - Whether the query was from the NLM LAN, Y or N\n",
    "Referrer - Where the visitor was when the search was submitted\n",
    "Query - The query content\n",
    "Search Timestamp - When the query was run\n",
    "\n",
    "Required for this script: Referrer, Query, Search Timestamp. I use Excel \n",
    "because my source info system breaks CSV files when the query has commas."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "from matplotlib.pyplot import pie, axis, show\n",
    "import numpy as np\n",
    "import os\n",
    "import string\n",
    "\n",
    "# Set working directory\n",
    "os.chdir('/Users/wendlingd/_webDS')\n",
    "\n",
    "localDir = '01_Text_wrangling_files/'\n",
    "\n",
    "'''\n",
    "Before running script, copy the following new files to /00 SourceFiles/; \n",
    "adjust names below, as needed. <br/>
Make them THE SAME TIME PERIOD - one month,\n", + "one quarter, one year, whatever.\n", + "'''\n", + "\n", + "# What is your new log file named?\n", + "newSearchLogFile = '00_Source_files/week31.xlsx'\n", + "\n", + "# Bring in historical file of (somewhat edited) matches\n", + "GoldStandard = localDir + 'GoldStandard_Master.xlsx'\n", + "GoldStandard = pd.read_excel(GoldStandard)\n", + "\n", + "'''\n", + "SemanticNetworkReference - Used in progress charts\n", + "\n", + "It's a customized version of the list at \n", + "https://www.nlm.nih.gov/research/umls/META3_current_semantic_types.html, \n", + "to be used to put search terms into huge bins. Can be integrated into \n", + "GoldStandard and be available if we want to see the progress of assignments\n", + "through the process.\n", + "'''\n", + "SemanticNetworkReference = localDir + 'SemanticNetworkReference.xlsx'\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Unite search log data into single dataframe; globally update columns and rows\n", + "\n", + "If csv and Tab delimited, for example: pd.read_csv(filename, sep='\\t')\n", + "searchLog = pd.read_csv(newSearchLogFile)" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "RangeIndex: 20639 entries, 0 to 20638\n", + "Data columns (total 6 columns):\n", + "ID 20639 non-null object\n", + "IP 20638 non-null object\n", + "NLM IP Y/N 20639 non-null object\n", + "Referrer 20638 non-null object\n", + "Query 20639 non-null object\n", + "Search Timestamp 20638 non-null datetime64[ns]\n", + "dtypes: datetime64[ns](1), object(5)\n", + "memory usage: 967.5+ KB\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
SessionIDStaffYNReferrerQueryTimestampadjustedQueryCase
0FCB8C84AEDB5855CDDB2F29E38C8C8D1Nwww.nlm.nih.gov/lichen ruber mucosae2018-07-30 07:48:01.000lichen ruber mucosae
1D052BA917FD4489BD63014BE6568670ENwww.nlm.nih.gov/molecular identification of marine bacteria2018-07-30 01:14:26.000molecular identification of marine bacteria
2D052BA917FD4489BD63014BE6568670ENvsearch.nlm.nih.gov/vivisimo/cgi-bin/query-met...molecular identification of marine fishes bact...2018-07-30 01:23:06.000molecular identification of marine fishes bact...
347C9DEE89B48E22FB53E2BE2DB107763Nwww.nlm.nih.gov/bsd/serfile_addedinfo.htmlsecondaries brain prognostic factors2018-07-30 02:18:34.999secondaries brain prognostic factors
4993C3E958AB335FC500CB6BA0C03CBD8Nvsearch.nlm.nih.gov/vivisimo/cgi-bin/query-met...smoking&alzheimer's disease2018-07-30 02:26:16.999smoking&alzheimer's disease
\n", + "
" + ], + "text/plain": [ + " SessionID StaffYN \\\n", + "0 FCB8C84AEDB5855CDDB2F29E38C8C8D1 N \n", + "1 D052BA917FD4489BD63014BE6568670E N \n", + "2 D052BA917FD4489BD63014BE6568670E N \n", + "3 47C9DEE89B48E22FB53E2BE2DB107763 N \n", + "4 993C3E958AB335FC500CB6BA0C03CBD8 N \n", + "\n", + " Referrer \\\n", + "0 www.nlm.nih.gov/ \n", + "1 www.nlm.nih.gov/ \n", + "2 vsearch.nlm.nih.gov/vivisimo/cgi-bin/query-met... \n", + "3 www.nlm.nih.gov/bsd/serfile_addedinfo.html \n", + "4 vsearch.nlm.nih.gov/vivisimo/cgi-bin/query-met... \n", + "\n", + " Query Timestamp \\\n", + "0 lichen ruber mucosae 2018-07-30 07:48:01.000 \n", + "1 molecular identification of marine bacteria 2018-07-30 01:14:26.000 \n", + "2 molecular identification of marine fishes bact... 2018-07-30 01:23:06.000 \n", + "3 secondaries brain prognostic factors 2018-07-30 02:18:34.999 \n", + "4 smoking&alzheimer's disease 2018-07-30 02:26:16.999 \n", + "\n", + " adjustedQueryCase \n", + "0 lichen ruber mucosae \n", + "1 molecular identification of marine bacteria \n", + "2 molecular identification of marine fishes bact... \n", + "3 secondaries brain prognostic factors \n", + "4 smoking&alzheimer's disease " + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Search log from Excel; IBM Cognos starts a new worksheet every 65k rows;\n", + "# open the file to see how many worksheets you need to bring in. 
FY18 q3 had 6 worksheets\n",
    "searchLog = pd.read_excel(newSearchLogFile, skiprows=2)\n",
    "'''\n",
    "x1 = searchLog  # first worksheet, read above\n",
    "x2 = pd.read_excel(newSearchLogFile, 'Page1_2', skiprows=2)\n",
    "x3 = pd.read_excel(newSearchLogFile, 'Page1_3', skiprows=2)\n",
    "x4 = pd.read_excel(newSearchLogFile, 'Page1_4', skiprows=2)\n",
    "x5 = pd.read_excel(newSearchLogFile, 'Page1_5', skiprows=2)\n",
    "x6 = pd.read_excel(newSearchLogFile, 'Page1_6', skiprows=2)\n",
    "# x5 = pd.read_excel('00 SourceFiles/2018-06/Queries-2018-05.xlsx', 'Page1_2', skiprows=2)\n",
    "\n",
    "searchLog = pd.concat([x1, x2, x3, x4, x5, x6], ignore_index=True)\n",
    "'''\n",
    "\n",
    "searchLog.head(n=5)\n",
    "searchLog.shape\n",
    "searchLog.info()\n",
    "searchLog.columns\n",
    "\n",
    "# Drop ID column, not needed\n",
    "searchLog.drop(['ID'], axis=1, inplace=True)\n",
    "\n",
    "# Until the Cognos report is fixed (blank columns, multi-word col names),\n",
    "# update col names\n",
    "searchLog = searchLog.rename(columns={'Search Timestamp': 'Timestamp', \n",
    "                                      'NLM IP Y/N':'StaffYN',\n",
    "                                      'IP':'SessionID'})\n",
    "\n",
    "# Remove https:// to become joinable with traffic data\n",
    "searchLog['Referrer'] = searchLog['Referrer'].str.replace('https://', '')\n",
    "\n",
    "# Dupe off the Query column into a lower-cased 'adjustedQueryCase', which \n",
    "# will be the column you match against\n",
    "searchLog['adjustedQueryCase'] = searchLog['Query'].str.lower()\n",
    "\n",
    "# Remove incomplete rows, which can cause errors later\n",
    "searchLog = searchLog[~pd.isnull(searchLog['Referrer'])]\n",
    "searchLog = searchLog[~pd.isnull(searchLog['Query'])]\n",
    "searchLog.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\"\\n# Start new df so you can revert if needed\\nsearchLogClean = nonForeign\\n\\n\\n\\n# When restarting work or recovering from error later, use cleaned log from file\\n# <br/>
newSearchLogFile = '00 SourceFiles/2018-04/q2_2018-en-us.xlsx'\\n# searchLogClean = pd.read_excel(localDir + 'searchLogClean.xlsx')\\n\\n# Remove showForeign, nonForeign, searchLog\\n\"" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# *** STILL NEEDED?? - COMMENTED OUT TO AVOID DAMAGING IN AUTO-RUN ***\n", + "# Eyeball df and remove (some) foreign-language entries APIs can't match - non-Roman's (??)\n", + "\n", + "'''\n", + "showForeign = searchLog.sort_values(by='adjustedQueryCase', ascending=False)\n", + "showForeign = showForeign.reset_index()\n", + "showForeign.drop(['index'], axis=1, inplace=True)\n", + "\n", + "nonForeign = showForeign[330:] # Eyeball, update to remove down to the rows the APIs will be able to parse\n", + "\n", + "# Eyeball, sorting by adjustedQueryCase, remove specific useless rows as needed\n", + "nonForeign.drop(41402, inplace=True)\n", + "nonForeign.drop(41401, inplace=True)\n", + "nonForeign.drop(19657, inplace=True)\n", + "nonForeign.drop(19656, inplace=True)\n", + "nonForeign.drop(19655, inplace=True)\n", + "nonForeign.drop(19654, inplace=True)\n", + "nonForeign.drop(19646, inplace=True)\n", + "nonForeign.drop(19647, inplace=True)\n", + "\n", + "# Space clean-up as needed\n", + "nonForeign['adjustedQueryCase'] = nonForeign['adjustedQueryCase'].str.replace(' ', ' ') # two spaces to one\n", + "nonForeign['adjustedQueryCase'] = nonForeign['adjustedQueryCase'].str.strip() # remove leading and trailing spaces\n", + "nonForeign = nonForeign.loc[(nonForeign['adjustedQueryCase'] != \"\")]\n", + "'''\n", + "\n", + "'''\n", + "# Start new df so you can revert if needed\n", + "searchLogClean = nonForeign\n", + "\n", + "# Remove showForeign, nonForeign, searchLog\n", + "'''\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# When restarting work or recovering from error later, use cleaned log from file\n", + "# 
newSearchLogFile = '00 SourceFiles/2018-04/q2_2018-en-us.xlsx'\n",
    "# searchLogClean = pd.read_excel(localDir + 'searchLogClean.xlsx')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Run baseline dataset stats\n",
    "\n",
    "Not the purpose of this project, but other staff have use for... Before we\n",
    "cut down the log content, some quick calculations.\n",
    "\n",
    "Future: Overall percentage of hit-and-runs, 'one and done'\n",
    "Group and count by session ID - Create table of counts\n",
    "(ID was dropped above, so group on SessionID)\n",
    "\n",
    "numberOfSearches = searchLog.groupby(['SessionID']).size()\n",
    "numberOfSearches = numberOfSearches.reset_index()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Total searches in raw log file: 20638\n",
      "Total SEARCHES, on NLM LAN or not\n",
      "N    20432\n",
      "Y      206\n",
      "Name: StaffYN, dtype: int64\n",
      "Total SESSIONS, on NLM LAN or not\n",
      "StaffYN\n",
      "N    7993\n",
      "Y      41\n",
      "Name: SessionID, dtype: int64\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "\"\\n# How to set a date range\\nAprMay = logAfterUmlsApi1[(logAfterUmlsApi1['Timestamp'] > '2018-04-01 01:00:00') & (logAfterUmlsApi1['Timestamp'] < '2018-06-01 00:00:00')]\\n\""
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print(\"Total searches in raw log file: {}\".format(len(searchLog)))\n",
    "\n",
    "# tot\n",
    "print(\"Total SEARCHES, on NLM LAN or not\\n{}\".format(searchLog['StaffYN'].value_counts()))\n",
    "\n",
    "print(\"Total SESSIONS, on NLM LAN or not\\n{}\".format(searchLog.groupby('StaffYN')['SessionID'].nunique()))\n",
    "\n",
    "# If you see digits in text col, perhaps these are partial log entries - eyeball for removal\n",
    "# searchLog.drop(76080, inplace=True)\n",
    "\n",
    "\n",
    "# Total SEARCHES containing 'Non-English characters'\n",
    "# <br/>
print(\"Total SEARCHES with non-English characters\\n{}\".format(searchLog['preferredTerm'].value_counts()))\n", + "\n", + "# Total SESSIONS containing 'Non-English characters'\n", + "# Future\n", + "\n", + "'''\n", + "# How to set a date range\n", + "AprMay = logAfterUmlsApi1[(logAfterUmlsApi1['Timestamp'] > '2018-04-01 01:00:00') & (logAfterUmlsApi1['Timestamp'] < '2018-06-01 00:00:00')]\n", + "'''\n" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Top staff queries as enteredCount
012347
1nnlm4
2dennis benson4
3errata3
4medline2
5itas2
6lister hill auditorium2
7nlm logo2
8strategic plan2
9sheridan2
10staff library2
11nichsr2
12turning the pages2
13sis2
14https://www.nlm.nih.gov/services/nlmchat.html2
15digital collections2
16locatorplus2
17urgoclean: the evidence base1
18mesh1
19nlm service and hours1
\n", + "
" + ], + "text/plain": [ + " Top staff queries as entered Count\n", + "0 1234 7\n", + "1 nnlm 4\n", + "2 dennis benson 4\n", + "3 errata 3\n", + "4 medline 2\n", + "5 itas 2\n", + "6 lister hill auditorium 2\n", + "7 nlm logo 2\n", + "8 strategic plan 2\n", + "9 sheridan 2\n", + "10 staff library 2\n", + "11 nichsr 2\n", + "12 turning the pages 2\n", + "13 sis 2\n", + "14 https://www.nlm.nih.gov/services/nlmchat.html 2\n", + "15 digital collections 2\n", + "16 locatorplus 2\n", + "17 urgoclean: the evidence base 1\n", + "18 mesh 1\n", + "19 nlm service and hours 1" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Top queries from LAN (not normalized)\n", + "searchLogLanYes = searchLog.loc[searchLog['StaffYN'].str.contains('Y') == True]\n", + "searchLogLanYesQueryCounts = searchLogLanYes['Query'].value_counts()\n", + "searchLogLanYesQueryCounts = searchLogLanYesQueryCounts.reset_index()\n", + "searchLogLanYesQueryCounts = searchLogLanYesQueryCounts.rename(columns={'index': 'Top staff queries as entered', 'Query': 'Count'})\n", + "searchLogLanYesQueryCounts.head(n=20)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Top queries from NLM LAN, from Home, as enteredCount
012347
1dennis benson4
2errata3
3sis2
4lister hill auditorium2
5strategic plan2
6turning the pages2
7nichsr2
8sheridan2
9nlm logo2
10medline2
11digital collections2
12locatorplus2
13itas2
14urgoclean: the evidence base1
15news and events1
16nlm service and hours1
17indexing in medline1
18mesh1
19cords1
20dreger1
21drug information1
22oid1
23phd1
24hsrproj1
\n", + "
" + ], + "text/plain": [ + " Top queries from NLM LAN, from Home, as entered Count\n", + "0 1234 7\n", + "1 dennis benson 4\n", + "2 errata 3\n", + "3 sis 2\n", + "4 lister hill auditorium 2\n", + "5 strategic plan 2\n", + "6 turning the pages 2\n", + "7 nichsr 2\n", + "8 sheridan 2\n", + "9 nlm logo 2\n", + "10 medline 2\n", + "11 digital collections 2\n", + "12 locatorplus 2\n", + "13 itas 2\n", + "14 urgoclean: the evidence base 1\n", + "15 news and events 1\n", + "16 nlm service and hours 1\n", + "17 indexing in medline 1\n", + "18 mesh 1\n", + "19 cords 1\n", + "20 dreger 1\n", + "21 drug information 1\n", + "22 oid 1\n", + "23 phd 1\n", + "24 hsrproj 1" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Top queries from NLM LAN, from NLM Home (not normalized)\n", + "searchLogLanYesHmPg = searchLog.loc[searchLog['StaffYN'].str.contains('Y') == True]\n", + "searchfor = ['www.nlm.nih.gov$', 'www.nlm.nih.gov/$']\n", + "searchLogLanYesHmPg = searchLogLanYesHmPg[searchLogLanYesHmPg.Referrer.str.contains('|'.join(searchfor))]\n", + "searchLogLanYesHmPgQueryCounts = searchLogLanYesHmPg['Query'].value_counts()\n", + "searchLogLanYesHmPgQueryCounts = searchLogLanYesHmPgQueryCounts.reset_index()\n", + "searchLogLanYesHmPgQueryCounts = searchLogLanYesHmPgQueryCounts.rename(columns={'index': 'Top queries from NLM LAN, from Home, as entered', 'Query': 'Count'})\n", + "searchLogLanYesHmPgQueryCounts.head(n=25)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Top queries off of LAN, as enteredCount
0search49
1diabetes41
2endnote24
3index medicus24
4mesh23
5cancer22
6pubmed21
7calcium channel blockers20
8international journal of scientific research15
9metabolic syndrome14
10rotator cuff injuries13
11stroke13
12keywords12
13nursing12
14breast cancer11
15suicide11
16depression11
17tuberculosis10
18heart10
19rxnorm10
20egg10
21adhd9
22icdk9 c18 acn乙腈洗脱9
23vancouver9
24an attempt at isolation and characterization o...9
\n", + "
" + ], + "text/plain": [ + " Top queries off of LAN, as entered Count\n", + "0 search 49\n", + "1 diabetes 41\n", + "2 endnote 24\n", + "3 index medicus 24\n", + "4 mesh 23\n", + "5 cancer 22\n", + "6 pubmed 21\n", + "7 calcium channel blockers 20\n", + "8 international journal of scientific research 15\n", + "9 metabolic syndrome 14\n", + "10 rotator cuff injuries 13\n", + "11 stroke 13\n", + "12 keywords 12\n", + "13 nursing 12\n", + "14 breast cancer 11\n", + "15 suicide 11\n", + "16 depression 11\n", + "17 tuberculosis 10\n", + "18 heart 10\n", + "19 rxnorm 10\n", + "20 egg 10\n", + "21 adhd 9\n", + "22 icdk9 c18 acn乙腈洗脱 9\n", + "23 vancouver 9\n", + "24 an attempt at isolation and characterization o... 9" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Top queries outside NLM LAN (not normalized)\n", + "searchLogLanNo = searchLog.loc[searchLog['StaffYN'].str.contains('N') == True]\n", + "searchLogLanNoQueryCounts = searchLogLanNo['Query'].value_counts()\n", + "searchLogLanNoQueryCounts = searchLogLanNoQueryCounts.reset_index()\n", + "searchLogLanNoQueryCounts = searchLogLanNoQueryCounts.rename(columns={'index': 'Top queries off of LAN, as entered', 'Query': 'Count'})\n", + "searchLogLanNoQueryCounts.head(n=25)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Top queries off of LAN, from Home, as enteredCount
0calcium channel blockers20
1diabetes20
2index medicus16
3mesh10
4stevia8
5heart8
6mrsa7
7bubonic plague7
8xanax7
9stroke6
10sunscreen6
11rxnorm6
12tuberculosis6
13nutrition6
14data, tools, and statistics6
15immunocytochemical study of human lymphoid tis...6
16fibromyalgia6
17foreign matter enters the eye6
18hemohim5
19teicoplanin5
20keywords5
21prevention techniques5
22pillbox5
23testosterone5
24depression5
\n", + "
" + ], + "text/plain": [ + " Top queries off of LAN, from Home, as entered Count\n", + "0 calcium channel blockers 20\n", + "1 diabetes 20\n", + "2 index medicus 16\n", + "3 mesh 10\n", + "4 stevia 8\n", + "5 heart 8\n", + "6 mrsa 7\n", + "7 bubonic plague 7\n", + "8 xanax 7\n", + "9 stroke 6\n", + "10 sunscreen 6\n", + "11 rxnorm 6\n", + "12 tuberculosis 6\n", + "13 nutrition 6\n", + "14 data, tools, and statistics 6\n", + "15 immunocytochemical study of human lymphoid tis... 6\n", + "16 fibromyalgia 6\n", + "17 foreign matter enters the eye 6\n", + "18 hemohim 5\n", + "19 teicoplanin 5\n", + "20 keywords 5\n", + "21 prevention techniques 5\n", + "22 pillbox 5\n", + "23 testosterone 5\n", + "24 depression 5" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Top queries outside NLM LAN, from NLM Home (not normalized)\n", + "searchLogLanNoHmPg = searchLog.loc[searchLog['StaffYN'].str.contains('N') == True]\n", + "searchfor = ['www.nlm.nih.gov$', 'www.nlm.nih.gov/$']\n", + "searchLogLanNoHmPg = searchLogLanNoHmPg[searchLogLanNoHmPg.Referrer.str.contains('|'.join(searchfor))]\n", + "searchLogLanNoHmPgQueryCounts = searchLogLanNoHmPg['Query'].value_counts()\n", + "searchLogLanNoHmPgQueryCounts = searchLogLanNoHmPgQueryCounts.reset_index()\n", + "searchLogLanNoHmPgQueryCounts = searchLogLanNoHmPgQueryCounts.rename(columns={'index': 'Top queries off of LAN, from Home page, as entered', 'Query': 'Count'})\n", + "searchLogLanNoHmPgQueryCounts.head(n=25)" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Top home page queries, staff or public, as enteredCount
0diabetes20
1calcium channel blockers20
2index medicus16
3mesh11
4heart9
5stevia8
612347
7xanax7
8mrsa7
9tuberculosis7
10bubonic plague7
11immunocytochemical study of human lymphoid tis...6
12sunscreen6
13nutrition6
14foreign matter enters the eye6
15rxnorm6
16fibromyalgia6
17data, tools, and statistics6
18stroke6
19pillbox5
20hemohim5
21depression5
22magnet hospitals instiutions of excellence5
23keywords5
24journal manuscript guidelines5
\n", + "
" + ], + "text/plain": [ + " Top home page queries, staff or public, as entered Count\n", + "0 diabetes 20\n", + "1 calcium channel blockers 20\n", + "2 index medicus 16\n", + "3 mesh 11\n", + "4 heart 9\n", + "5 stevia 8\n", + "6 1234 7\n", + "7 xanax 7\n", + "8 mrsa 7\n", + "9 tuberculosis 7\n", + "10 bubonic plague 7\n", + "11 immunocytochemical study of human lymphoid tis... 6\n", + "12 sunscreen 6\n", + "13 nutrition 6\n", + "14 foreign matter enters the eye 6\n", + "15 rxnorm 6\n", + "16 fibromyalgia 6\n", + "17 data, tools, and statistics 6\n", + "18 stroke 6\n", + "19 pillbox 5\n", + "20 hemohim 5\n", + "21 depression 5\n", + "22 magnet hospitals instiutions of excellence 5\n", + "23 keywords 5\n", + "24 journal manuscript guidelines 5" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Top home page queries, staff or public\n", + "searchfor = ['www.nlm.nih.gov$', 'www.nlm.nih.gov/$']\n", + "searchLogAllHmPgQueryCounts = searchLog[searchLog.Referrer.str.contains('|'.join(searchfor))]\n", + "searchLogAllHmPgQueryCounts = searchLogAllHmPgQueryCounts['Query'].value_counts()\n", + "searchLogAllHmPgQueryCounts = searchLogAllHmPgQueryCounts.reset_index()\n", + "searchLogAllHmPgQueryCounts = searchLogAllHmPgQueryCounts.rename(columns={'index': 'Top home page queries, staff or public, as entered', 'Query': 'Count'})\n", + "searchLogAllHmPgQueryCounts.head(n=25)\n", + "\n", + "# Add table, Percentage of staff, public searches done within pages, within search results\n", + "\n", + "# Add table for Top queries with columns/counts On LAN, Off LAN, Total\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\"\\nRemove manually for now.\\nNot finding an equiv to R's rm; cf https://stackoverflow.com/questions/32247643/how-to-delete-multiple-pandas-python-dataframes-from-memory-to-save-ram?rq=1\\npd.x1(), pd.x2(), # pd.x3(), 
pd.x4(), pd.x5(), pd.x6(), pd.x7(), \n pd.searchLogLanYes(), pd.searchLogLanYesHmPg(), \n pd.searchLogLanNo(), pd.searchLogLanNoHmPg(),\n pd.searchLogAllHmPg()\n""
+ ]
+ },
+ "execution_count": 4,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Remove the searches run from within search results screens, vsearch.nlm.nih.gov/vivisimo/\n",
+ "# I'm not looking at these now; you might be.\n",
+ "searchLog = searchLog[searchLog.Referrer.str.startswith(\"www.nlm.nih.gov\") == True]\n",
+ "\n",
+ "# Not sure what these are, www.nlm.nih.gov/?_ga=2.95055260.1623044406.1513044719-1901803437.1513044719\n",
+ "searchLog = searchLog[searchLog.Referrer.str.startswith(\"www.nlm.nih.gov/?_ga=\") == False]\n",
+ "\n",
+ "# FIXME - VARIABLE EXPLORER: After saving the stats, remove unneeded 'Type=DataFrame' items\n",
+ "'''\n",
+ "Remove manually for now, or use del - Python's equivalent of R's rm - then gc.collect(); cf https://stackoverflow.com/questions/32247643/how-to-delete-multiple-pandas-python-dataframes-from-memory-to-save-ram?rq=1\n",
+ "del x1, x2, x3, x4, x5, x6, x7\n",
+ "del searchLogLanYes, searchLogLanYesHmPg\n",
+ "del searchLogLanNo, searchLogLanNoHmPg\n",
+ "del searchLogAllHmPg\n",
+ "'''"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
SessionIDStaffYNReferrerQueryTimestampadjustedQueryCase
0FCB8C84AEDB5855CDDB2F29E38C8C8D1Nwww.nlm.nih.gov/lichen ruber mucosae2018-07-30 07:48:01.000lichen ruber mucosae
1D052BA917FD4489BD63014BE6568670ENwww.nlm.nih.gov/molecular identification of marine bacteria2018-07-30 01:14:26.000molecular identification of marine bacteria
347C9DEE89B48E22FB53E2BE2DB107763Nwww.nlm.nih.gov/bsd/serfile_addedinfo.htmlsecondaries brain prognostic factors2018-07-30 02:18:34.999secondaries brain prognostic factors
60D3354A8E8C07196F17340B2C641487ENwww.nlm.nih.gov/nlmhome.htmlparasites2018-07-30 02:43:59.999parasites
85EC71AEB4FDE600004405F91FD0F0379Nwww.nlm.nih.gov/bsd/pmresources.htmlvojta2018-07-30 06:19:02.000vojta
\n", + "
" + ], + "text/plain": [ + " SessionID StaffYN \\\n", + "0 FCB8C84AEDB5855CDDB2F29E38C8C8D1 N \n", + "1 D052BA917FD4489BD63014BE6568670E N \n", + "3 47C9DEE89B48E22FB53E2BE2DB107763 N \n", + "6 0D3354A8E8C07196F17340B2C641487E N \n", + "8 5EC71AEB4FDE600004405F91FD0F0379 N \n", + "\n", + " Referrer \\\n", + "0 www.nlm.nih.gov/ \n", + "1 www.nlm.nih.gov/ \n", + "3 www.nlm.nih.gov/bsd/serfile_addedinfo.html \n", + "6 www.nlm.nih.gov/nlmhome.html \n", + "8 www.nlm.nih.gov/bsd/pmresources.html \n", + "\n", + " Query Timestamp \\\n", + "0 lichen ruber mucosae 2018-07-30 07:48:01.000 \n", + "1 molecular identification of marine bacteria 2018-07-30 01:14:26.000 \n", + "3 secondaries brain prognostic factors 2018-07-30 02:18:34.999 \n", + "6 parasites 2018-07-30 02:43:59.999 \n", + "8 vojta 2018-07-30 06:19:02.000 \n", + "\n", + " adjustedQueryCase \n", + "0 lichen ruber mucosae \n", + "1 molecular identification of marine bacteria \n", + "3 secondaries brain prognostic factors \n", + "6 parasites \n", + "8 vojta " + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# 5. Clean up content to improve matching\n", + "# ========================================\n", + "'''\n", + "NOTE: Do not limit to a-zA-Z0-9 re: non-English character sets.\n", + "'''\n", + "\n", + "# FIXME - Remove punctuation. 
Also must include a fix for punct at start WITH trailing space\n",
+ "\n",
+ "# NOTE: pandas treats one-character patterns here literally; longer patterns are regexes.\n",
+ "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('\"', '')\n",
+ "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace(\"'\", \"\")\n",
+ "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace(\"`\", \"\")\n",
+ "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('(', '')\n",
+ "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace(')', '')\n",
+ "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('.', '')\n",
+ "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace(',', '')\n",
+ "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('!', '')\n",
+ "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('#NAME\\?', '')  # Excel error artifact; the ? is escaped because multi-char patterns are regexes\n",
+ "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('*', '')\n",
+ "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('$', '')\n",
+ "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('+', '')\n",
+ "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('?', '')\n",
+ "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('#', '')\n",
+ "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('%', '')\n",
+ "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace(':', '')\n",
+ "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace(';', '')\n",
+ "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('{', '')\n",
+ "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('}', '')\n",
+ "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('|', '')\n",
+ "\n",
+ "# searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('-', '')\n",
+ "\n",
+ "# Backslashes ARE required here: these multi-character patterns go through the regex\n",
+ "# engine, so the metacharacters ^ [ ] must be escaped (escaping < and > is harmless).\n",
+ "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('\\^', '')\n",
+ "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('\\[', '')\n",
+ "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('\\]', '')\n",
+ "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('\\<', '')\n",
+ "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('\\>', '')\n",
+ "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('\\\\', '')\n",
+ "\n",
+ "\n",
+ "# First-character issues\n",
+ "# searchLog = searchLog[searchLog.adjustedQueryCase.str.contains(\"^[0-9]{4}\") == False] # char entities\n",
+ "searchLog = searchLog[searchLog.adjustedQueryCase.str.contains(\"^-\") == False] # char entities\n",
+ "searchLog = searchLog[searchLog.adjustedQueryCase.str.contains(\"^/\") == False] # char entities\n",
+ "searchLog = searchLog[searchLog.adjustedQueryCase.str.contains(\"^@\") == False] # char entities\n",
+ "searchLog = searchLog[searchLog.adjustedQueryCase.str.contains(\"^;\") == False] # char entities\n",
+ "searchLog = searchLog[searchLog.adjustedQueryCase.str.contains(\"^<\") == False] # char entities\n",
+ "searchLog = searchLog[searchLog.adjustedQueryCase.str.contains(\"^>\") == False] # char entities\n",
+ "\n",
+ "# If removing punct left leading spaces, remove them (str.strip() below catches any stragglers).\n",
+ "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('^ +', '')\n",
+ "\n",
+ "# Drop junk rows\n",
+ "searchLog = searchLog[searchLog.adjustedQueryCase.str.startswith(\"&#\") == False] # char entities\n",
+ "searchLog = 
searchLog[searchLog.adjustedQueryCase.str.contains(\"^&[0-9]{4}\") == False] # char entities\n", + "\n", + "# Remove modified entries that are now dupes or blank entries\n", + "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace(' ', ' ') # two spaces to one\n", + "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.strip() # remove leading and trailing spaces\n", + "searchLog = searchLog.loc[(searchLog['adjustedQueryCase'] != \"\")]\n", + "\n", + "\n", + "# Test - Does the following do anything, good or bad? Can't tell. Remove non-ASCII; https://www.quora.com/How-do-I-remove-non-ASCII-characters-e-g-%C3%90%C2%B1%C2%A7%E2%80%A2-%C2%B5%C2%B4%E2%80%A1%C5%BD%C2%AE%C2%BA%C3%8F%C6%92%C2%B6%C2%B9-from-texts-in-Panda%E2%80%99s-DataFrame-columns\n", + "# I think a previous operation converted these already, for example, دوشن\n", + "# def remove_non_ascii(Query):\n", + "# return ''.join(i for i in Query if ord(i)<128)\n", + "# testingOnly = uniqueSearchTerms['Query'] = uniqueSearchTerms['Query'].apply(remove_non_ascii)\n", + "# Also https://stackoverflow.com/questions/20078816/replace-non-ascii-characters-with-a-single-space?rq=1\n", + "\n", + "# Remove starting text that can complicate matching\n", + "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('^benefits of ', '')\n", + "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('^cause of ', '')\n", + "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('^cause for ', '')\n", + "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('^causes for ', '')\n", + "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('^causes of ', '')\n", + "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('^definition for ', '')\n", + "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('^definition of ', '')\n", + 
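"# A hypothetical one-pass alternative (commented out; verify it covers the same phrases as the line-by-line replaces in this section before swapping it in):\n",
+ "# searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace(\n",
+ "#     r'^(benefits of|causes? (of|for)|definition (of|for)|effects? of|etiology of|symptoms of|treating|treatments? (for|of)|what (are|causes|is( a)?)) ', '')\n",
+ 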
"searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('^effect of ', '')\n", + "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('^etiology of ', '')\n", + "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('^symptoms of ', '')\n", + "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('^treating ', '')\n", + "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('^treatment for ', '')\n", + "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('^treatments for ', '')\n", + "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('^treatment of ', '')\n", + "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('^what are ', '')\n", + "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('^what causes ', '')\n", + "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('^what is a ', '')\n", + "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('^what is ', '')\n", + "\n", + "# Is this one different than the above? Such as, pathology of the lung\n", + "searchLog['adjustedQueryCase'] = searchLog['adjustedQueryCase'].str.replace('^pathology of ', '')\n", + "\n", + "searchLog.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6. Make special-case assignments with F&R, RegEx: Bibliographic, Numeric, Named entities\n", + "\n", + "Later procedures can't match the below very well. For rows we know won't be matchable later, assign preferredTerm here - PubMed search strategies, whatever. Future: clean these up, add nuance.\n", + "\n", + "- Remove errant punctuation; create dataframe of unique terms with frequency.\n", + "- Run list of RegEx operations based on solutions to historical problems. 
\n", + " During the rest of the project, update RegEx list to improve future matching.\n", + " Designators such as PMIDs, DOIs, ISSNs, ISBNs, etc.\n", + " Tag PubMed search strategies - i.e., find entries with \\[DT\\], other \n", + " PubMed field tags, in entries that are over ~20 characters, indicating \n", + " the entry is a PubMed search strategy.\n", + "- Remove known errant punctuation with RegEx\n", + "- Use the \"gold standard\" history file - previously matched terms that have \n", + "been assigned from UMLS, plus vetted terms that were matched from \n", + "dictionaries (named entities), etc., added manually, AND vetted. Solve in \n", + "the new file everything that was solved in the past.\n", + "\n", + "We have one page on the web site that is a HUGE OUTLIER, where ~30% of searches are run from, and we\n", + "know that people are running PubMed searches in this site search blank. \n", + "Pre-assign these to avoid blasting the API matching resources unnecessarily?\n", + "\n", + "FIXME - The below preferredTerm entries are added before several cols are\n", + "available. 
Later on you will need to assign SemanticTypeName, etc., so these \n",
+ "rows will be picked up in the status charts.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Assignments to preferredTerm\n",
+ "Bibliographic Entity 3901\n",
+ "pmresources.html 1502\n",
+ "Numeric Entity 417\n",
+ "Name: preferredTerm, dtype: int64\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Start new df for protection - rollbacks if needed\n",
+ "searchLogClean = searchLog.copy()  # .copy() makes a real copy; plain assignment is only an alias, so a rollback would be impossible\n",
+ "\n",
+ "# --- pmresources.html ---\n",
+ "searchLogClean.loc[searchLogClean['Referrer'].str.contains('/bsd/pmresources.html'), 'preferredTerm'] = 'pmresources.html'\n",
+ "# ToTestThis = searchLogClean[searchLogClean.Referrer.str.contains(\"/bsd/pmresources.html\") == True]\n",
+ "\n",
+ "\n",
+ "# --- Bibliographic Entity ---\n",
+ "# Assign ALL queries over 20 char to 'Bibliographic Entity' (often citations, search strategies, pub titles...)\n",
+ "searchLogClean.loc[(searchLogClean['adjustedQueryCase'].str.len() > 20), 'preferredTerm'] = 'Bibliographic Entity'\n",
+ "\n",
+ "# searchLogClean.loc[(searchLogClean['adjustedQueryCase'].str.len() > 25) & (~searchLogClean['preferredTerm'].str.contains('pmresources.html', na=False)), 'preferredTerm'] = 'Bibliographic Entity'\n",
+ "\n",
+ "# Search strategies might also be in the form \"clinical trial\" and \"phase 0\"\n",
+ "searchLogClean.loc[searchLogClean['adjustedQueryCase'].str.contains('[a-z]{3,}\" and \"[a-z]{3,}', na=False), 'preferredTerm'] = 'Bibliographic Entity'\n",
+ "\n",
+ "# Queries about specific journal titles\n",
+ "searchLogClean.loc[searchLogClean['adjustedQueryCase'].str.contains('^journal 
of', na=False), 'preferredTerm'] = 'Bibliographic Entity'\n",
+ "searchLogClean.loc[searchLogClean['adjustedQueryCase'].str.contains('^international journal of', na=False), 'preferredTerm'] = 'Bibliographic Entity'\n",
+ "\n",
+ "\n",
+ "# --- Numeric Entity ---\n",
+ "# Assign entries starting with 3 digits\n",
+ "# FIXME - Clarify and grab the below, PMID, ISSN, ISBN, etc.\n",
+ "searchLogClean.loc[searchLogClean['adjustedQueryCase'].str.contains('^[0-9]{3,}', na=False), 'preferredTerm'] = 'Numeric Entity'\n",
+ "searchLogClean.loc[searchLogClean['adjustedQueryCase'].str.contains('[0-9]{5,}', na=False), 'preferredTerm'] = 'Numeric Entity'\n",
+ "\n",
+ "# Note: the ^ anchor in '^[0-9]{3,}' already restricts that match to entries STARTING\n",
+ "# with 3 digits; the second pattern, '[0-9]{5,}', matches 5+ digits anywhere in the entry.\n",
+ "# After this, might want to let loose of dates, from 201? to 202? or similar\n",
+ "\n",
+ "\n",
+ "'''\n",
+ "If trying to clean up later in the process\n",
+ "logAfterUmlsApi2.loc[logAfterUmlsApi2['adjustedQueryCase'].str.contains('^journal of', na=False), 'preferredTerm'] = 'Bibliographic Entity'\n",
+ "logAfterUmlsApi2.loc[logAfterUmlsApi2['adjustedQueryCase'].str.contains('^international journal of', na=False), 'preferredTerm'] = 'Bibliographic Entity'\n",
+ "logAfterUmlsApi2.loc[logAfterUmlsApi2['adjustedQueryCase'].str.contains('^[0-9]{3,}', na=False), 'preferredTerm'] = 'Numeric Entity'\n",
+ "\n",
+ "Assign '^pmid [0-9]'\n",
+ "Assign '^pmc [0-9]'\n",
+ "More - ISSNs are probably ####-####, etc. 
Match syntax to different types\n",
+ "of numbers\n",
+ "\n",
+ "# How different numbers might manifest\n",
+ "# MeSH unique IDs d009369 (d and 6 digits)\n",
+ "\n",
+ "# PMIDs pmid 23193287, pmid23193287\n",
+ "# PMC IDs pmc5419604, pmc/articles/pmc3221073\n",
+ "\n",
+ "nlm uid 8207799\n",
+ "nm_001096633\n",
+ "np_0105443\n",
+ "nr_1039342\n",
+ "pmc3183535\n",
+ "pmid 7187238\n",
+ "x95160\n",
+ "wp_012745022\n",
+ "\n",
+ "# Info about PMID, PMCID, Manuscript ID, DOI: https://www.ncbi.nlm.nih.gov/pmc/pmctopmid/#converter\n",
+ "'''\n",
+ "\n",
+ "print(\"Assignments to preferredTerm\\n{}\".format(searchLogClean['preferredTerm'].value_counts()))\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n",
+ "Total queries in searchLogClean: 11655\n",
+ "\n",
+ "Pre-processing assignments:\n",
+ "Bibliographic Entity 3901\n",
+ "pmresources.html 1502\n",
+ "Numeric Entity 417\n",
+ "Name: preferredTerm, dtype: int64\n",
+ "\n",
+ "Assigned: 5820\n",
+ "Unassigned: 5835\n",
+ "\n",
+ "Percent of queries to resolve: 50.0%\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Useful to write out the cleaned up version; if you do re-processing, you can skip a bunch of work.\n",
+ "writer = pd.ExcelWriter(localDir + 'searchLogClean.xlsx')\n",
+ "searchLogClean.to_excel(writer,'searchLogClean')\n",
+ "# df2.to_excel(writer,'Sheet2')\n",
+ "writer.save()\n",
+ "\n",
+ "\n",
+ "# -------------\n",
+ "# How we doin?\n",
+ "# -------------\n",
+ "\n",
+ "TotQueries = len(searchLogClean)\n",
+ "Assigned = searchLogClean['preferredTerm'].notnull().sum()\n",
+ "Unassigned = searchLogClean['preferredTerm'].isnull().sum()\n",
+ "PercentUnassigned = (Unassigned / TotQueries) * 100\n",
+ "\n",
+ "print(\"\\nTotal queries in searchLogClean: {}\".format(TotQueries))\n",
+ "print(\"\\nPre-processing assignments:\\n{}\".format(searchLogClean['preferredTerm'].value_counts()))\n",
+ 
"print(\"\\nAssigned: {}\".format(Assigned))\n", + "print(\"Unassigned: {}\".format(Unassigned))\n", + "print(\"\\nPercent of queries to resolve: {}%\".format(round(PercentUnassigned)))\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\"\"\"\n", + "To look further occasionally at problems that might remain, that might require more-manual intervention...\n", + "\n", + "print(searchLogClean['preferredTerm'].value_counts())\n", + "\n", + "unassignedNow = searchLogClean[pd.isnull(searchLogClean['preferredTerm'])]\n", + "\n", + "unassignedNowUnique = unassignedNow.groupby('adjustedQueryCase').size()\n", + "unassignedNowUnique = pd.DataFrame({'timesSearched':unassignedNowUnique})\n", + "unassignedNowUnique = unassignedNowUnique.reset_index()\n", + "unassignedNowUnique = unassignedNowUnique.sort_values(by='timesSearched', ascending=True)\n", + "\n", + "# Drop rows\n", + "unassignedNowUnique = unassignedNowUnique[unassignedNowUnique.adjustedQueryCase.str.startswith(\"&\") == False] # char entities\n", + "\n", + "# df.col1.str.contains('^[Cc]ountry')\n", + "\n", + "unassignedNowUnique2 = unassignedNowUnique[unassignedNowUnique.adjustedQueryCase.str.contains(\"^&[0-9]{4}\") == False] # char entities\n", + "\n", + "# PMIDs pmid 23193287, pmid23193287\n", + "searchLogClean.loc[searchLogClean['Query'].str.contains('pmid [0-9]{8}|pmid [0-9]{8}', na=False), 'preferredTerm'] = 'Numeric Entity-PubMed'\n", + "print(searchLogClean['preferredTerm'].value_counts())\n", + "\"\"\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 7. Create logAfterGoldStandard - Match to the \"gold standard\" file of historical matches\n", + "\n", + "Maintain a list of UMLS Semantic Network terms that you've already matched\n", + "and edited for accuracy. 
Over time, applying your history of vetted matches \n", + "before going out to the UMLS API should lighten your overall workload.\n", + "\n", + "9/9/2018: GoldStandard should be replaced after changing API to 1:n capture.\n", + "\n", + "Abandoned method\n", + "logWithNamedEntities = pd.read_excel(localDir + 'logWithNamedEntities.xlsx')\n", + "GoldStandard = localDir + 'GoldStandard.xlsx'\n", + "\n", + "GoldStandard = pd.read_excel(localDir + 'GoldStandard.xlsx')\n", + "GoldStandard['adjustedQueryCase'] = GoldStandard['Query'].str.lower()\n", + "GoldStandard.rename(columns={'Query': 'origQuery'}, inplace=True)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
SessionIDStaffYNReferrerQueryTimestampadjustedQueryCasepreferredTerm_xAddressBranchPositionCustomTreeNumberEntrySourceResourceTypeSemanticGroupSemanticGroupCodeSemanticTypeNamecontentStewardpreferredTerm_y
0FCB8C84AEDB5855CDDB2F29E38C8C8D1Nwww.nlm.nih.gov/lichen ruber mucosae2018-07-30 07:48:01.000lichen ruber mucosaeNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
1D052BA917FD4489BD63014BE6568670ENwww.nlm.nih.gov/molecular identification of marine bacteria2018-07-30 01:14:26.000molecular identification of marine bacteriaBibliographic EntityNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
247C9DEE89B48E22FB53E2BE2DB107763Nwww.nlm.nih.gov/bsd/serfile_addedinfo.htmlsecondaries brain prognostic factors2018-07-30 02:18:34.999secondaries brain prognostic factorsBibliographic EntityNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
30D3354A8E8C07196F17340B2C641487ENwww.nlm.nih.gov/nlmhome.htmlparasites2018-07-30 02:43:59.999parasitesNaNNaN4.01113.0NaNNaNLiving Beings9.0EukaryoteNaNParasites
40D3354A8E8C07196F17340B2C641487ENwww.nlm.nih.gov/nlmhome.htmlparasites2018-07-30 02:43:59.999parasitesNaNNaN4.01113.0NaNNaNLiving Beings9.0EukaryoteNaNParasites
\n", + "
" + ], + "text/plain": [ + " SessionID StaffYN \\\n", + "0 FCB8C84AEDB5855CDDB2F29E38C8C8D1 N \n", + "1 D052BA917FD4489BD63014BE6568670E N \n", + "2 47C9DEE89B48E22FB53E2BE2DB107763 N \n", + "3 0D3354A8E8C07196F17340B2C641487E N \n", + "4 0D3354A8E8C07196F17340B2C641487E N \n", + "\n", + " Referrer \\\n", + "0 www.nlm.nih.gov/ \n", + "1 www.nlm.nih.gov/ \n", + "2 www.nlm.nih.gov/bsd/serfile_addedinfo.html \n", + "3 www.nlm.nih.gov/nlmhome.html \n", + "4 www.nlm.nih.gov/nlmhome.html \n", + "\n", + " Query Timestamp \\\n", + "0 lichen ruber mucosae 2018-07-30 07:48:01.000 \n", + "1 molecular identification of marine bacteria 2018-07-30 01:14:26.000 \n", + "2 secondaries brain prognostic factors 2018-07-30 02:18:34.999 \n", + "3 parasites 2018-07-30 02:43:59.999 \n", + "4 parasites 2018-07-30 02:43:59.999 \n", + "\n", + " adjustedQueryCase preferredTerm_x Address \\\n", + "0 lichen ruber mucosae NaN NaN \n", + "1 molecular identification of marine bacteria Bibliographic Entity NaN \n", + "2 secondaries brain prognostic factors Bibliographic Entity NaN \n", + "3 parasites NaN NaN \n", + "4 parasites NaN NaN \n", + "\n", + " BranchPosition CustomTreeNumber EntrySource ResourceType SemanticGroup \\\n", + "0 NaN NaN NaN NaN NaN \n", + "1 NaN NaN NaN NaN NaN \n", + "2 NaN NaN NaN NaN NaN \n", + "3 4.0 1113.0 NaN NaN Living Beings \n", + "4 4.0 1113.0 NaN NaN Living Beings \n", + "\n", + " SemanticGroupCode SemanticTypeName contentSteward preferredTerm_y \n", + "0 NaN NaN NaN NaN \n", + "1 NaN NaN NaN NaN \n", + "2 NaN NaN NaN NaN \n", + "3 9.0 Eukaryote NaN Parasites \n", + "4 9.0 Eukaryote NaN Parasites " + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# FIXME - see notes below, problem here\n", + "logAfterGoldStandard = pd.merge(searchLogClean, GoldStandard, how='left', on='adjustedQueryCase')\n", + "\n", + "logAfterGoldStandard.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + 
"metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['SessionID', 'StaffYN', 'Referrer', 'Query', 'Timestamp',\n", + " 'adjustedQueryCase', 'preferredTerm_x', 'Address', 'BranchPosition',\n", + " 'CustomTreeNumber', 'EntrySource', 'ResourceType', 'SemanticGroup',\n", + " 'SemanticGroupCode', 'SemanticTypeName', 'contentSteward',\n", + " 'preferredTerm_y'],\n", + " dtype='object')" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# logAfterGoldStandard.columns" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "# Future: Look for a better way to do the above - MERGE WITH CONDITIONAL OVERWRITE. Temporary fix:\n", + "logAfterGoldStandard['preferredTerm2'] = logAfterGoldStandard['preferredTerm_x'].where(logAfterGoldStandard['preferredTerm_x'].notnull(), logAfterGoldStandard['preferredTerm_y'])\n", + "logAfterGoldStandard.drop(['preferredTerm_x', 'preferredTerm_y'], axis=1, inplace=True)\n", + "logAfterGoldStandard.rename(columns={'preferredTerm2': 'preferredTerm'}, inplace=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\"\\nIf trying to clean up later outside normal flow\\n\\nlogAfterUmlsApi2.loc[logAfterUmlsApi2['preferredTerm'].str.contains('Bibliographic Entity', na=False), 'SemanticGroup'] = 'Concepts and Ideas'\\nlogAfterUmlsApi2.loc[logAfterUmlsApi2['preferredTerm'].str.contains('Bibliographic Entity', na=False), 'SemanticTypeName'] = 'Intellectual Product'\\nlogAfterUmlsApi2.loc[logAfterUmlsApi2['preferredTerm'].str.contains('Bibliographic Entity', na=False), 'BranchPosition'] = 3\\nlogAfterUmlsApi2.loc[logAfterUmlsApi2['preferredTerm'].str.contains('Bibliographic Entity', na=False), 'CustomTreeNumber'] = 124\\n\\nlogAfterUmlsApi2.loc[logAfterUmlsApi2['preferredTerm'].str.contains('Numeric Entity', na=False), 'SemanticGroup'] = 
'Concepts and Ideas'\\nlogAfterUmlsApi2.loc[logAfterUmlsApi2['preferredTerm'].str.contains('Numeric Entity', na=False), 'SemanticTypeName'] = 'Intellectual Product'\\nlogAfterUmlsApi2.loc[logAfterUmlsApi2['preferredTerm'].str.contains('Numeric Entity', na=False), 'BranchPosition'] = 3\\nlogAfterUmlsApi2.loc[logAfterUmlsApi2['preferredTerm'].str.contains('Numeric Entity', na=False), 'CustomTreeNumber'] = 124\\n\""
+ ]
+ },
+ "execution_count": 10,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Now we can add the missing columns after what we started above in Bibliographic Entity, Numeric Entity rows\n",
+ "\n",
+ "# FIXME - New, change as needed\n",
+ "# Order matters: assign SemanticGroup/SemanticTypeName BEFORE renaming; after the\n",
+ "# rename below, no row starts with 'Bibliographic Entity', so these would match nothing.\n",
+ "logAfterGoldStandard.loc[logAfterGoldStandard['preferredTerm'].str.startswith('Bibliographic Entity', na=False), 'SemanticGroup'] = 'Unparsed'\n",
+ "logAfterGoldStandard.loc[logAfterGoldStandard['preferredTerm'].str.startswith('Bibliographic Entity', na=False), 'SemanticTypeName'] = 'Unparsed'\n",
+ "\n",
+ "logAfterGoldStandard['preferredTerm'] = logAfterGoldStandard['preferredTerm'].str.replace('Bibliographic Entity', 'PubMed strategy, citation, unclear, etc.')\n",
+ "\n",
+ "\n",
+ "'''\n",
+ "Old version\n",
+ "logAfterGoldStandard.loc[logAfterGoldStandard['preferredTerm'].str.contains('Bibliographic Entity', na=False), 'SemanticGroup'] = 'Concepts and Ideas'\n",
+ "logAfterGoldStandard.loc[logAfterGoldStandard['preferredTerm'].str.contains('Bibliographic Entity', na=False), 'SemanticTypeName'] = 'Intellectual Product'\n",
+ "logAfterGoldStandard.loc[logAfterGoldStandard['preferredTerm'].str.contains('Bibliographic Entity', na=False), 'BranchPosition'] = 3\n",
+ "logAfterGoldStandard.loc[logAfterGoldStandard['preferredTerm'].str.contains('Bibliographic Entity', na=False), 'CustomTreeNumber'] = 124\n",
+ "'''\n",
+ "\n",
+ "\n",
+ "logAfterGoldStandard.loc[logAfterGoldStandard['preferredTerm'].str.contains('Numeric Entity', na=False), 'SemanticGroup'] = 'Accession Number'\n",
+ 
"logAfterGoldStandard.loc[logAfterGoldStandard['preferredTerm'].str.contains('Numeric Entity', na=False), 'SemanticTypeName'] = 'Accession Number'\n", + "# logAfterGoldStandard.loc[logAfterGoldStandard['preferredTerm'].str.contains('Numeric Entity', na=False), 'BranchPosition'] = 3\n", + "# logAfterGoldStandard.loc[logAfterGoldStandard['preferredTerm'].str.contains('Numeric Entity', na=False), 'CustomTreeNumber'] = 124\n", + "\n", + "logAfterGoldStandard.loc[logAfterGoldStandard['preferredTerm'].str.contains('NON-ENGLISH CHARACTERS', na=False), 'SemanticGroup'] = 'Foreign language'\n", + "logAfterGoldStandard.loc[logAfterGoldStandard['preferredTerm'].str.contains('NON-ENGLISH CHARACTERS', na=False), 'SemanticTypeName'] = 'Foreign language'\n", + "logAfterGoldStandard.loc[logAfterGoldStandard['preferredTerm'].str.contains('NON-ENGLISH CHARACTERS', na=False), 'BranchPosition'] = 0\n", + "logAfterGoldStandard.loc[logAfterGoldStandard['preferredTerm'].str.contains('NON-ENGLISH CHARACTERS', na=False), 'CustomTreeNumber'] = 000\n", + "\n", + "\n", + "'''\n", + "If trying to clean up later outside normal flow\n", + "\n", + "logAfterUmlsApi2.loc[logAfterUmlsApi2['preferredTerm'].str.contains('Bibliographic Entity', na=False), 'SemanticGroup'] = 'Concepts and Ideas'\n", + "logAfterUmlsApi2.loc[logAfterUmlsApi2['preferredTerm'].str.contains('Bibliographic Entity', na=False), 'SemanticTypeName'] = 'Intellectual Product'\n", + "logAfterUmlsApi2.loc[logAfterUmlsApi2['preferredTerm'].str.contains('Bibliographic Entity', na=False), 'BranchPosition'] = 3\n", + "logAfterUmlsApi2.loc[logAfterUmlsApi2['preferredTerm'].str.contains('Bibliographic Entity', na=False), 'CustomTreeNumber'] = 124\n", + "\n", + "logAfterUmlsApi2.loc[logAfterUmlsApi2['preferredTerm'].str.contains('Numeric Entity', na=False), 'SemanticGroup'] = 'Concepts and Ideas'\n", + "logAfterUmlsApi2.loc[logAfterUmlsApi2['preferredTerm'].str.contains('Numeric Entity', na=False), 'SemanticTypeName'] = 'Intellectual 
Product'\n", + "logAfterUmlsApi2.loc[logAfterUmlsApi2['preferredTerm'].str.contains('Numeric Entity', na=False), 'BranchPosition'] = 3\n", + "logAfterUmlsApi2.loc[logAfterUmlsApi2['preferredTerm'].str.contains('Numeric Entity', na=False), 'CustomTreeNumber'] = 124\n", + "'''\n", + "\n", + "# Leaving pmresources.html rows to be updated " + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
SessionIDStaffYNReferrerQueryTimestampadjustedQueryCaseAddressBranchPositionCustomTreeNumberEntrySourceResourceTypeSemanticGroupSemanticGroupCodeSemanticTypeNamecontentStewardpreferredTerm
0FCB8C84AEDB5855CDDB2F29E38C8C8D1Nwww.nlm.nih.gov/lichen ruber mucosae2018-07-30 07:48:01.000lichen ruber mucosaeNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
1D052BA917FD4489BD63014BE6568670ENwww.nlm.nih.gov/molecular identification of marine bacteria2018-07-30 01:14:26.000molecular identification of marine bacteriaNaNNaNNaNNaNNaNNaNNaNNaNNaNPubMed strategy, citation, unclear, etc.
247C9DEE89B48E22FB53E2BE2DB107763Nwww.nlm.nih.gov/bsd/serfile_addedinfo.htmlsecondaries brain prognostic factors2018-07-30 02:18:34.999secondaries brain prognostic factorsNaNNaNNaNNaNNaNNaNNaNNaNNaNPubMed strategy, citation, unclear, etc.
30D3354A8E8C07196F17340B2C641487ENwww.nlm.nih.gov/nlmhome.htmlparasites2018-07-30 02:43:59.999parasitesNaN4.01113.0NaNNaNLiving Beings9.0EukaryoteNaNParasites
40D3354A8E8C07196F17340B2C641487ENwww.nlm.nih.gov/nlmhome.htmlparasites2018-07-30 02:43:59.999parasitesNaN4.01113.0NaNNaNLiving Beings9.0EukaryoteNaNParasites
50D3354A8E8C07196F17340B2C641487ENwww.nlm.nih.gov/nlmhome.htmlparasites2018-07-30 02:43:59.999parasitesNaN4.01113.0NaNNaNLiving Beings9.0EukaryoteNaNParasites
65EC71AEB4FDE600004405F91FD0F0379Nwww.nlm.nih.gov/bsd/pmresources.htmlvojta2018-07-30 06:19:02.000vojtaNaNNaNNaNNaNNaNNaNNaNNaNNaNpmresources.html
7D3DC089B24442B3796F18BD8581DDAE3Nwww.nlm.nih.gov/kaatsu2018-07-30 08:05:02.999kaatsuNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
847C9DEE89B48E22FB53E2BE2DB107763Nwww.nlm.nih.gov/htc2018-07-30 23:22:47.999htcNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
95BA856161F7799CF43DB5F82DB0FFE5DNwww.nlm.nih.gov/bsd/pmresources.htmlnedd82018-07-30 04:34:01.999nedd8NaNNaNNaNNaNNaNNaNNaNNaNNaNpmresources.html
109DE8BE123BD8A5C620F7F6E360C932F2Nwww.nlm.nih.gov/jour guilan uni med sci2018-07-30 10:07:09.999jour guilan uni med sciNaNNaNNaNNaNNaNNaNNaNNaNNaNPubMed strategy, citation, unclear, etc.
1161D85C39CF12C86918334F7DE47F0D5CNwww.nlm.nih.gov/and am willing to bet there are not many other...2018-07-30 16:46:01.999and am willing to bet there are not many other...NaNNaNNaNNaNNaNNaNNaNNaNNaNPubMed strategy, citation, unclear, etc.
123AAEAD5B07167F3E471BD2EC27A08AD2Nwww.nlm.nih.gov/james d massie2018-07-30 18:33:34.999james d massieNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
1347C9DEE89B48E22FB53E2BE2DB107763Nwww.nlm.nih.gov/bsd/pmresources.htmlwang be2018-07-30 00:53:34.000wang beNaNNaNNaNNaNNaNNaNNaNNaNNaNpmresources.html
1447C9DEE89B48E22FB53E2BE2DB107763Nwww.nlm.nih.gov/gerd2018-07-30 02:34:10.000gerdNaN6.0222121.0NaNNaNDisorders6.0Disease or SyndromeNaNGastroesophageal reflux disease
1547C9DEE89B48E22FB53E2BE2DB107763Nwww.nlm.nih.gov/gerd2018-07-30 02:34:10.000gerdNaN6.0222121.0NaNNaNDisorders6.0Disease or SyndromeNaNGastroesophageal reflux disease
163E9E6BC66C5F4149470C99ACDFF9F4C3Nwww.nlm.nih.gov/glu2018-07-30 10:52:28.999gluNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
1700B419C908ABCED089BF5D2D9A895F28Nwww.nlm.nih.gov/bsd/pmresources.htmladaptive design clinical trials2018-07-30 11:54:57.000adaptive design clinical trialsNaNNaNNaNNaNNaNNaNNaNNaNNaNPubMed strategy, citation, unclear, etc.
1847C9DEE89B48E22FB53E2BE2DB107763Nwww.nlm.nih.gov/bsd/pmresources.htmlresearch questionnaire for prevention of child...2018-07-30 10:53:47.000research questionnaire for prevention of child...NaNNaNNaNNaNNaNNaNNaNNaNNaNPubMed strategy, citation, unclear, etc.
196A98D34D328585DDB56B3A2D0021EA92Nwww.nlm.nih.gov/bsd/disted/nurses/intro_quiz.htmltranslation ebp guidelines2018-07-30 18:36:22.000translation ebp guidelinesNaNNaNNaNNaNNaNNaNNaNNaNNaNPubMed strategy, citation, unclear, etc.
2047C9DEE89B48E22FB53E2BE2DB107763Nwww.nlm.nih.gov/bsd/special_queries.htmlms and constipation2018-07-30 19:37:15.000ms and constipationNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
21452F3BCA978146BBD4F31DA81671E335Nwww.nlm.nih.gov/bsd/pmresources.htmldigest resistant2018-07-30 19:37:27.000digest resistantNaNNaNNaNNaNNaNNaNNaNNaNNaNpmresources.html
2247C9DEE89B48E22FB53E2BE2DB107763Nwww.nlm.nih.gov/bsd/pubmed.htmlcollie+eye+anomaly+sweden+rough2018-07-30 09:49:08.000collieeyeanomalyswedenroughNaNNaNNaNNaNNaNNaNNaNNaNNaNPubMed strategy, citation, unclear, etc.
234C4D9C88AC39838AF94C0E10513294FCNwww.nlm.nih.gov/index.htmlinternational journal of naval architecture an...2018-07-30 11:44:21.000international journal of naval architecture an...NaNNaNNaNNaNNaNNaNNaNNaNNaNPubMed strategy, citation, unclear, etc.
2447C9DEE89B48E22FB53E2BE2DB107763Nwww.nlm.nih.gov/news/genetics_tenth_anniversar...chd82018-07-30 14:59:07.999chd8NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
2547C9DEE89B48E22FB53E2BE2DB107763Nwww.nlm.nih.gov/bsd/pmresources.htmlafrican american men social determinants of he...2018-07-30 23:59:07.999african american men social determinants of he...NaNNaNNaNNaNNaNNaNNaNNaNNaNPubMed strategy, citation, unclear, etc.
26D5354A41531899B74AB388E8A0B269E8Nwww.nlm.nih.gov/bsd/medline.htmlvan gurp, maria2018-07-30 14:52:16.000van gurp mariaNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
2704D4C719A11BEE4018E1A44B2C4A3D71Nwww.nlm.nih.gov/alignx2018-07-30 20:20:37.999alignxNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
287598EE17908E487DECBDD2320CA9AA84Nwww.nlm.nih.gov/mesh/autohemoterapiy2018-07-30 17:20:12.000autohemoterapiyNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
290F8C16185894B53CC363E8CC446F8827Nwww.nlm.nih.gov/mesh/meshhome.htmlliver transaminases and cholangitis2018-07-30 09:02:01.000liver transaminases and cholangitisNaNNaNNaNNaNNaNNaNNaNNaNNaNPubMed strategy, citation, unclear, etc.
\n", + "
" + ], + "text/plain": [ + " SessionID StaffYN \\\n", + "0 FCB8C84AEDB5855CDDB2F29E38C8C8D1 N \n", + "1 D052BA917FD4489BD63014BE6568670E N \n", + "2 47C9DEE89B48E22FB53E2BE2DB107763 N \n", + "3 0D3354A8E8C07196F17340B2C641487E N \n", + "4 0D3354A8E8C07196F17340B2C641487E N \n", + "5 0D3354A8E8C07196F17340B2C641487E N \n", + "6 5EC71AEB4FDE600004405F91FD0F0379 N \n", + "7 D3DC089B24442B3796F18BD8581DDAE3 N \n", + "8 47C9DEE89B48E22FB53E2BE2DB107763 N \n", + "9 5BA856161F7799CF43DB5F82DB0FFE5D N \n", + "10 9DE8BE123BD8A5C620F7F6E360C932F2 N \n", + "11 61D85C39CF12C86918334F7DE47F0D5C N \n", + "12 3AAEAD5B07167F3E471BD2EC27A08AD2 N \n", + "13 47C9DEE89B48E22FB53E2BE2DB107763 N \n", + "14 47C9DEE89B48E22FB53E2BE2DB107763 N \n", + "15 47C9DEE89B48E22FB53E2BE2DB107763 N \n", + "16 3E9E6BC66C5F4149470C99ACDFF9F4C3 N \n", + "17 00B419C908ABCED089BF5D2D9A895F28 N \n", + "18 47C9DEE89B48E22FB53E2BE2DB107763 N \n", + "19 6A98D34D328585DDB56B3A2D0021EA92 N \n", + "20 47C9DEE89B48E22FB53E2BE2DB107763 N \n", + "21 452F3BCA978146BBD4F31DA81671E335 N \n", + "22 47C9DEE89B48E22FB53E2BE2DB107763 N \n", + "23 4C4D9C88AC39838AF94C0E10513294FC N \n", + "24 47C9DEE89B48E22FB53E2BE2DB107763 N \n", + "25 47C9DEE89B48E22FB53E2BE2DB107763 N \n", + "26 D5354A41531899B74AB388E8A0B269E8 N \n", + "27 04D4C719A11BEE4018E1A44B2C4A3D71 N \n", + "28 7598EE17908E487DECBDD2320CA9AA84 N \n", + "29 0F8C16185894B53CC363E8CC446F8827 N \n", + "\n", + " Referrer \\\n", + "0 www.nlm.nih.gov/ \n", + "1 www.nlm.nih.gov/ \n", + "2 www.nlm.nih.gov/bsd/serfile_addedinfo.html \n", + "3 www.nlm.nih.gov/nlmhome.html \n", + "4 www.nlm.nih.gov/nlmhome.html \n", + "5 www.nlm.nih.gov/nlmhome.html \n", + "6 www.nlm.nih.gov/bsd/pmresources.html \n", + "7 www.nlm.nih.gov/ \n", + "8 www.nlm.nih.gov/ \n", + "9 www.nlm.nih.gov/bsd/pmresources.html \n", + "10 www.nlm.nih.gov/ \n", + "11 www.nlm.nih.gov/ \n", + "12 www.nlm.nih.gov/ \n", + "13 www.nlm.nih.gov/bsd/pmresources.html \n", + "14 www.nlm.nih.gov/ \n", + "15 
www.nlm.nih.gov/ \n", + "16 www.nlm.nih.gov/ \n", + "17 www.nlm.nih.gov/bsd/pmresources.html \n", + "18 www.nlm.nih.gov/bsd/pmresources.html \n", + "19 www.nlm.nih.gov/bsd/disted/nurses/intro_quiz.html \n", + "20 www.nlm.nih.gov/bsd/special_queries.html \n", + "21 www.nlm.nih.gov/bsd/pmresources.html \n", + "22 www.nlm.nih.gov/bsd/pubmed.html \n", + "23 www.nlm.nih.gov/index.html \n", + "24 www.nlm.nih.gov/news/genetics_tenth_anniversar... \n", + "25 www.nlm.nih.gov/bsd/pmresources.html \n", + "26 www.nlm.nih.gov/bsd/medline.html \n", + "27 www.nlm.nih.gov/ \n", + "28 www.nlm.nih.gov/mesh/ \n", + "29 www.nlm.nih.gov/mesh/meshhome.html \n", + "\n", + " Query Timestamp \\\n", + "0 lichen ruber mucosae 2018-07-30 07:48:01.000 \n", + "1 molecular identification of marine bacteria 2018-07-30 01:14:26.000 \n", + "2 secondaries brain prognostic factors 2018-07-30 02:18:34.999 \n", + "3 parasites 2018-07-30 02:43:59.999 \n", + "4 parasites 2018-07-30 02:43:59.999 \n", + "5 parasites 2018-07-30 02:43:59.999 \n", + "6 vojta 2018-07-30 06:19:02.000 \n", + "7 kaatsu 2018-07-30 08:05:02.999 \n", + "8 htc 2018-07-30 23:22:47.999 \n", + "9 nedd8 2018-07-30 04:34:01.999 \n", + "10 jour guilan uni med sci 2018-07-30 10:07:09.999 \n", + "11 and am willing to bet there are not many other... 2018-07-30 16:46:01.999 \n", + "12 james d massie 2018-07-30 18:33:34.999 \n", + "13 wang be 2018-07-30 00:53:34.000 \n", + "14 gerd 2018-07-30 02:34:10.000 \n", + "15 gerd 2018-07-30 02:34:10.000 \n", + "16 glu 2018-07-30 10:52:28.999 \n", + "17 adaptive design clinical trials 2018-07-30 11:54:57.000 \n", + "18 research questionnaire for prevention of child... 2018-07-30 10:53:47.000 \n", + "19 translation ebp guidelines 2018-07-30 18:36:22.000 \n", + "20 ms and constipation 2018-07-30 19:37:15.000 \n", + "21 digest resistant 2018-07-30 19:37:27.000 \n", + "22 collie+eye+anomaly+sweden+rough 2018-07-30 09:49:08.000 \n", + "23 international journal of naval architecture an... 
2018-07-30 11:44:21.000 \n", + "24 chd8 2018-07-30 14:59:07.999 \n", + "25 african american men social determinants of he... 2018-07-30 23:59:07.999 \n", + "26 van gurp, maria 2018-07-30 14:52:16.000 \n", + "27 alignx 2018-07-30 20:20:37.999 \n", + "28 autohemoterapiy 2018-07-30 17:20:12.000 \n", + "29 liver transaminases and cholangitis 2018-07-30 09:02:01.000 \n", + "\n", + " adjustedQueryCase Address BranchPosition \\\n", + "0 lichen ruber mucosae NaN NaN \n", + "1 molecular identification of marine bacteria NaN NaN \n", + "2 secondaries brain prognostic factors NaN NaN \n", + "3 parasites NaN 4.0 \n", + "4 parasites NaN 4.0 \n", + "5 parasites NaN 4.0 \n", + "6 vojta NaN NaN \n", + "7 kaatsu NaN NaN \n", + "8 htc NaN NaN \n", + "9 nedd8 NaN NaN \n", + "10 jour guilan uni med sci NaN NaN \n", + "11 and am willing to bet there are not many other... NaN NaN \n", + "12 james d massie NaN NaN \n", + "13 wang be NaN NaN \n", + "14 gerd NaN 6.0 \n", + "15 gerd NaN 6.0 \n", + "16 glu NaN NaN \n", + "17 adaptive design clinical trials NaN NaN \n", + "18 research questionnaire for prevention of child... NaN NaN \n", + "19 translation ebp guidelines NaN NaN \n", + "20 ms and constipation NaN NaN \n", + "21 digest resistant NaN NaN \n", + "22 collieeyeanomalyswedenrough NaN NaN \n", + "23 international journal of naval architecture an... NaN NaN \n", + "24 chd8 NaN NaN \n", + "25 african american men social determinants of he... 
NaN NaN \n", + "26 van gurp maria NaN NaN \n", + "27 alignx NaN NaN \n", + "28 autohemoterapiy NaN NaN \n", + "29 liver transaminases and cholangitis NaN NaN \n", + "\n", + " CustomTreeNumber EntrySource ResourceType SemanticGroup \\\n", + "0 NaN NaN NaN NaN \n", + "1 NaN NaN NaN NaN \n", + "2 NaN NaN NaN NaN \n", + "3 1113.0 NaN NaN Living Beings \n", + "4 1113.0 NaN NaN Living Beings \n", + "5 1113.0 NaN NaN Living Beings \n", + "6 NaN NaN NaN NaN \n", + "7 NaN NaN NaN NaN \n", + "8 NaN NaN NaN NaN \n", + "9 NaN NaN NaN NaN \n", + "10 NaN NaN NaN NaN \n", + "11 NaN NaN NaN NaN \n", + "12 NaN NaN NaN NaN \n", + "13 NaN NaN NaN NaN \n", + "14 222121.0 NaN NaN Disorders \n", + "15 222121.0 NaN NaN Disorders \n", + "16 NaN NaN NaN NaN \n", + "17 NaN NaN NaN NaN \n", + "18 NaN NaN NaN NaN \n", + "19 NaN NaN NaN NaN \n", + "20 NaN NaN NaN NaN \n", + "21 NaN NaN NaN NaN \n", + "22 NaN NaN NaN NaN \n", + "23 NaN NaN NaN NaN \n", + "24 NaN NaN NaN NaN \n", + "25 NaN NaN NaN NaN \n", + "26 NaN NaN NaN NaN \n", + "27 NaN NaN NaN NaN \n", + "28 NaN NaN NaN NaN \n", + "29 NaN NaN NaN NaN \n", + "\n", + " SemanticGroupCode SemanticTypeName contentSteward \\\n", + "0 NaN NaN NaN \n", + "1 NaN NaN NaN \n", + "2 NaN NaN NaN \n", + "3 9.0 Eukaryote NaN \n", + "4 9.0 Eukaryote NaN \n", + "5 9.0 Eukaryote NaN \n", + "6 NaN NaN NaN \n", + "7 NaN NaN NaN \n", + "8 NaN NaN NaN \n", + "9 NaN NaN NaN \n", + "10 NaN NaN NaN \n", + "11 NaN NaN NaN \n", + "12 NaN NaN NaN \n", + "13 NaN NaN NaN \n", + "14 6.0 Disease or Syndrome NaN \n", + "15 6.0 Disease or Syndrome NaN \n", + "16 NaN NaN NaN \n", + "17 NaN NaN NaN \n", + "18 NaN NaN NaN \n", + "19 NaN NaN NaN \n", + "20 NaN NaN NaN \n", + "21 NaN NaN NaN \n", + "22 NaN NaN NaN \n", + "23 NaN NaN NaN \n", + "24 NaN NaN NaN \n", + "25 NaN NaN NaN \n", + "26 NaN NaN NaN \n", + "27 NaN NaN NaN \n", + "28 NaN NaN NaN \n", + "29 NaN NaN NaN \n", + "\n", + " preferredTerm \n", + "0 NaN \n", + "1 PubMed strategy, citation, unclear, etc. 
\n", + "2 PubMed strategy, citation, unclear, etc. \n", + "3 Parasites \n", + "4 Parasites \n", + "5 Parasites \n", + "6 pmresources.html \n", + "7 NaN \n", + "8 NaN \n", + "9 pmresources.html \n", + "10 PubMed strategy, citation, unclear, etc. \n", + "11 PubMed strategy, citation, unclear, etc. \n", + "12 NaN \n", + "13 pmresources.html \n", + "14 Gastroesophageal reflux disease \n", + "15 Gastroesophageal reflux disease \n", + "16 NaN \n", + "17 PubMed strategy, citation, unclear, etc. \n", + "18 PubMed strategy, citation, unclear, etc. \n", + "19 PubMed strategy, citation, unclear, etc. \n", + "20 NaN \n", + "21 pmresources.html \n", + "22 PubMed strategy, citation, unclear, etc. \n", + "23 PubMed strategy, citation, unclear, etc. \n", + "24 NaN \n", + "25 PubMed strategy, citation, unclear, etc. \n", + "26 NaN \n", + "27 NaN \n", + "28 NaN \n", + "29 PubMed strategy, citation, unclear, etc. " + ] + }, + "execution_count": 37, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "logAfterGoldStandard.head(30)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "# Save to file so you can open in future sessions, if needed\n", + "writer = pd.ExcelWriter(localDir + 'logAfterGoldStandard.xlsx')\n", + "logAfterGoldStandard.to_excel(writer,'logAfterGoldStandard')\n", + "# df2.to_excel(writer,'Sheet2')\n", + "writer.save()" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "# Decision point, whether to add semantic group (15 supercategories) data, to show process bars after GoldStandard??\n", + "# Or temp join\n", + "# Or move semantic assignments later to reduce processing load and here \n", + "# only show percent of preferredTerm assignments.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAWQAAAD7CAYAAABdXO4CAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAAIABJREFUeJztnXd4W+XZ/z+3vEdsk0WcRSAkTkwCSYBAgWBmS8AtUGgpq2UWOn4tvKUUuhxBR2j7lhYoLS9QRlo2tICLgRISs0kAhxBMnL294iFvW7ae3x/nOAhjS7Yj6cjS/bkuX5b0nPE9R0dfPbqf59y3GGNQFEVRnMfltABFURTFQg1ZURQlSlBDVhRFiRLUkBVFUaIENWRFUZQoQQ1ZURQlSlBDHuGIyPEislFEWkTkHKf1DAYRMSJy6ABtl4nIG5HW1B+h1iIiJ4nIrlBtLxoRkUUiUuG0jpFK3BqyiJwgIm+JiEdE6kXkTRE52m4b0gdRRKbZJpMYPsUDcgtwlzEm0xjzbxFZKSJXhXIH9jZP8ns+Q0QeE5FaEWmyvxDuFJHJodyvva+zRWSNvZ+9IrJcRKbZbUtE5B+h3mekEJFtvccSKxhjXjfG5DmtY6QSl4YsIllAMXAnMBqYBLiBTid1DZODgI9DtTERSQjSfijwLrAHmG+MyQKOBzYDJ4RKh9++HgZ+BGQDBwN3A75Q7iccRPLL2aGOgBIOjDFx9wccBTQO0DYb6AB6gJbe5YCzgDKgCdgJLPFbZwdg7OVbgC8AS4B/+C0zzV4m0X5+GbAFaAa2AhcPoGch8DbQCFQCdwHJdttmLHNqt/f7W1t3h/38Lnu5WcB/gXqgAvi63/YfBP4KvAC0Aqf1o2ElcJL9+B/A84M4x1cDm+x9PgdM9GszwKH24zF2exOwCrgVeMNuOx9YM8D2zwC6AK99rB/ar18OfGKf1y3ANX7rnATswjL4Gvt8Xu7XPqAWu/3P9nvfBLwPLPJrWwI8ZZ+fJuAqIM0+vw1AOfBjYJffOtuAaQMc30r7/VwFeIBngdF9rqUrsa691+zXv4L15dxorz/bb3tTgGeAWqCu99qw266wz1kD8BJwkP26ALfb58oDrAXm2G1n2sfUDOwGbvA/x32O8QZ7XQ/wOJDq136j/T7ssc/ZvmsjHv8cF+DIQUOWfVE+BCwGDujTfpn/B9F+7SRgLtavisOBauAcu633A5Lot/wSBjBkIMP+0ObZbbnAYQNoPRI41l5vmv3Buc6vfRt+Jmp/EK/ye56BZSKX29tYAOzt3R+WYXiwerku/w/LAHqqgMuCLHOKvY8FQArWL5HX/Nr9Dfkx4Alb5xz7w91ryIdgfbncDpwMZPbZz2fOsf3aWcB0LDMpANqABX7vYTdWmCcJy1Taet//QFrs9kuwTDsRy9Sres+XrcULnGOfxzRgKfA61q+wKcA6/MwqyDlcae9/jq3n6d5j9buWHrbb0oCZWF+op9vHdiPWF2IykAB8aJ/HDCAVOMHe1jn2crPt4/o58Jbd9iWsL54c+3zOBnLttkrsLyTggD7nuK8hrwIm2ufhE+Bau+0M+xweBqQDy1BDdl6EIwduXVwPYvWYurF6RgfabZfRx5D7Wf9PwO32494PyFAMuRE4D0gbou7rgH/5Pd9GYEO+AHi9zzbuAYrsxw8CDw9h/93AGX7Pv28fSwtwr/3a/cDv/JbJxDKrafZzAxxqG4UXmOW37G/4rAkei2WStVjm/CC2Mfc9xwPo/TfwQ/vxSVi/Jvzfpxp7H0G19LPtBuAIPy2v9Wnf0udcfZuhGfJSv+f5WL8IEvyupUP82n8BPOH33IVl6Cdh/WKr9T9uv+VKgCv7rNeGFQo7Bdhgnx9Xn/V2ANcAWX1eP4nPG/Ilfs9/B/zNfvx34Ld+bYcS54YclzFkAGPMJ8aYy4wxk7F6IROxTLZfROQYEVlhD2R5gGuBscPcdyuWUV4LVIrIf0Rk1gD
7nSkixSJSJSJNWCYxlP0eBBwjIo29f8DFwAS/ZXYOYXt1WD363mO5yxiTg3XukuyXJwLb/ZZpsdeb1Gdb47C+oPz3v91/AWPMO8aYrxtjxgGLgBOBnw0kTkQWi8g79kBtI1Yv2P981Rljuv2et2F9YQTVIiI/EpFP7IHgRqy4tv+2+57HiYG2Nwj6rpsUYH99z7nPbp+E1Tvf3ue4ezkI+LPftVGP1RueZIx5FStE9hegWkT+zx5/AaszcSawXURKReQLAY6jyu9x7/nu1ex/DEO5DmOSuDVkf4wx67F6XnN6X+pnsUewetFTjDHZwN+wLtyBlm/F+hnWi78BYox5yRhzOpa5rQfuHUDeX+32GcYaQPup3377PZw+z3cCpcaYHL+/TGPMdwKsE4jlwFeDLLMH64MOgIhkYP3U391nuVqsHvcUv9emDrRRY8xqrDhov++TiKRg/bT/A9avnRys2Hig8zUoLSKyCPgJ8HWsEEcOVqjHf9t9z2PlQNsbJH3X9WKFgvrbX99zLvb6u7GugakDDP7txIqz+18facaYtwCMMXcYY47ECivMxIqDY4xZbYw5GxiP9SvkiSEeG1jnx39mzpSBFowX4tKQRWSW3duZbD+fAlwIvGMvUg1MFpFkv9VGAfXGmA4RWQhc5NdWizW4dojfa2uAE0VkqohkAzf77f9AEfmKbVSdWD/3ewaQOwor3txi96K/M8ByvVT30VEMzBSRS0Ukyf47WkRmB9nOQCwBFonIH0Vkkn08Y7FCQL08AlwuIvNsk/wN8K4xZpv/howxPVgGu0RE0kUkH/hWb7s9NfFqERlvP5+FNXDl/z5NE5He6zgZK2ZdC3SLyGLgi4M5qGBasN6HbnvbiSLyS6yxiEA8AdwsIgfY19r/G4wWPy4RkXwRSceKez9l6xxoX2eJyKkikoQV4+4E3sKK4VYCS0UkQ0RSReR4e72/2RoPAxCRbBH5mv34aPuXYRJWB6MD6BGRZBG5WESyjTFerOtzIF2BeALrOpltH+Mvh7GNmCIuDRlrZPgY4F0RacX6gK/DuogBXsUara4Skd4eyXeBW0SkGevC2dcjMMa0Ab8G3rR/+h1rjPkv1ojyWqyBkWK//bvsfe3B+olYYG+/P27AMv9mrF7040GO7c/A+SLSICJ3GGOasUzpG/b+qoDbsIxryBhjemOKk4EP7fPxpr3tX9jLLLcfP41lBNPt/ffH97F+wlZh/Up5wK+tEcuAPxKRFuBF4F9YcUiAJ+3/dSLygX2sP8B6bxqwzttzQzi8QFpewoq3bsAKDXQQ/Ce22152K/Ay1qDVUFhm66jCGoj7wUALGmMqsAYd78TqRX8Z+LIxpss28S9jxWh3YI2bXGCv9y+s6+ExOyS2DmugG6wvnHuxzuV2rLDTH+y2S4Ft9jrX2vseEsaYEuAOYAXWwOLbdtNInH4aEsQOpiuKEkWIyEqsAcv7nNYSKexfbeuAlAHi3TFPvPaQFUWJAkTkXDsEcgBWT/35eDVjUENWFMVZrsGKy2/GikMHGyOJaTRkoSiKEiVoD1lRFCVKUENWFEWJEtSQFUVRogQ1ZEVRlChBDVlRFCVKUENWFEWJEtSQFUVRogQ1ZEVRlChBDVlRFCVKUENWFEWJEtSQFUVRogQ1ZEVRlChBDVlRFCVKUENWFEWJEvoreqgoIwKP291bPHUMVjXmVKxCoF6gy/7r+7gNqMkuKtK8s0rUofmQlajE43ZPwCqcOsv+m4BlumP8/qcOc/NdWHXltmPVmPP/vx3YkV1UFLd13RTnUENWog6P270DZ0vCG+ATrOK372AV3yzPLiryOahJiQPUkJWIsbTMm3rT/KSO/toW5+WlAV8ADr7jK19ZMj4zc3Jk1QWlCVjFpyb9TnZRUZ2zkpRYQ2PISlhYWuadDBwHLADm238vAt8cYJWZWPXV6qpbWuqj0JCzgNPsPwDjcbtXA08DT2cXFW12TJkSM6ghKyFhaZk3BTgROMMYc4a
I5Pez2NwAm6gBWoCa6ubm3XMnTDg8HDpDiAAL7b/bPG73h3xqzuWOKlNGLBqyUIbN0jLvTOAMLBMuEJH0IKt0Apk3zU/6XJn3xXl5ScA9wM5TDz30oKsXLvxW6BVHjPXAM8BT2UVFZU6LUUYOasjKkFha5p0DXGGMOUdEDh7GJvJvmp/0SX8Ni/PyfgWkjM/I8N1x9tk37pfQ6GEV8GfgyeyiIq/TYpToRkMWSlCWlnkzgQt9Pd3XuhISFwCIyHA3NxdrBkN/bAWOrGltrWnr6mpOT04eNdydRBELgX8Cv/e43XcD92QXFe11WJMSpeidesqALC3zHveb9zv+bny+auD/es14PwkUR94MpAHUt7fXhGBf0cRE4FfATo/bfZ/H7Z7jtCAl+tAesvIZlpZ5c4wxVxhfzzWuhMSZLldCqHcRaLCuGugBqGlpqZ6cnT091DuPAlKBK4ErPW73q8DvsouKXnJYkxIlqCErACwt82Z3e7t+7EpI+KHLlZApCWG7NAL1kKuxf7Xt8nhqFkyaFC4N0cIpwCket/u/wA3ZRUVrnRakOIuGLOKcpWXezFvebLi1p9u7KzEp+WcuV0JmmHc5zY5J90cD1m3NiRv37q0Os45o4nSgzON23+9xu3OdFqM4hxpynLK0zJtxy5sNRT1e7+7k9MyfJyQmhduIexGg3/hpSUWFwcopkfFRVVWtL76mALmAK4CNHrd7iZ04SYkz1JDjjKVl3rRb3mq8ucfbtTs5PXNJQlJSlgMygg3sZXR0d/c0dXTE463JGUARsMHjdl/hcbv1MxpH6JsdRxS9Vntxd1fnjuS0jN8kJCVnOyglkCFvxx7bqGtri6ewRV8mAvcD73vc7nlOi1EigxpyHHDzi1vzfrGialXaqJx/JCanjHVaD8EH9gxAVXNzrE19Gw7zgFUet7vI43brIHyMo4Ycw/z4ufWJN7+49fbMMRPWZeSMOdppPX4Ey2khADsaG+O5h+xPErAEy5gDnTtlhKOGHKNc/9SHx6dnj96cfeDk6xISE6OtZzVmaZl3Yn8NJRUVrYAHSFlfW6s95M8yH1jtcbuv97jdw75VUole1JBjjGsfKE368fMV946blvdaWtYBU53WE4BAPb1tQMaG2toGb09PV4T0jBRSgD8CL3jc7gOdFqOEFjXkGOLq//vvvPEH520eM/mQq1wJCdH+3gYy5E1AhgEa29trI6RnpHEGsNbjdp/utBAldET7h1YZBPkFhXLl3S9cN/XwY95Jzx7jZOmjoRDIkHdjx5FrW1s1jjww44ESj9v9XaeFKKFBDXmEk19QmHrylTc/degxp/4xKSUtxWk9QyDYwJ4PYE9TkxpyYBKAv3jc7j/pnOWRj76BI5iv3Hj7wWf96A8fTJm78Kvico20QZ7ZS8u8A2UuqsW6NmVrQ4MO7A2OHwLPetzuSN1xqYQBNeQRysW/e7Rw3pkXfTBmyvTZTmsZJqlYdfQ+R0lFhReoBNLXVVVpD3nwFAKve9zumM/KFKuoIY8w8gsKXVf+teRXswu+/Ex69ugcp/XsJ4HCFluAjOqWlvZ2r7clUoJigN4bSUKRu1qJMGrII4j8gsKMEy69/okZx572s8TklCSn9YSAYIacCnF/C/VwmAi85nG7v+y0EGVoqCGPEPILCnOPv+gHLxxy5InnOa0lhAQy5CrsW6hrWlo0jjx0MoCnPW73V5wWogweNeQRQH5B4ZRFl17/3PSFJ5/otJYQEyynhQDs8ni0hzw8koAnPG73F50WogwONeQoJ7+gcFrB5Tc+f/CRJx7ltJYwcLAmqw87KcC/PW73SU4LUYKjhhzFzDnlnENPvuqnxQcd8YUjnNYSJgQ4rL8G/2T1H1dX742zZPWhJg143uN2H+e0ECUwashRyuGnnz/7lKt/9p8pc47u17BiiEBFTzcDGW1eb3dzZ2c8JqsPJZlY+S+OdFqIMjBqyFHI4aefN/eUq3/6/MRZ8/qdpxtjBEtWnwRQ19q
qA3v7TzbwsqbwjF7UkKOM/ILCecdd+P1HJ8yYO91pLREi2C3UPQBVLS0aRw4No4FXPG53vFxfIwo15Cgiv6Bw7rwzL7p72vwTYj1M4U+wmRYJADv0FupQMh5roE8LqUYZashRQn5B4dTpC0+5bc5p5x3jtJYIM2ZpmTe3vwY7WX0jkPJJba32kEPLHKyafUoUoYYcBeQXFB4wYebhtx5z/rdPdblc8fieaLJ6Z7jA43b/yGkRyqfE44c/qsgvKEzNGj/p5hO/+T/nJianJDutxyE0Wb1z3OZxu09xWoRioYbsIPkFhQnJ6ZnfPvmqmy9Lzcwe5bQeBwk09W1P7wNNVh8WEoDHPW53NJf7ihvUkB0iv6BQgHNPuuIn12ePnzTOaT0OE2xgD4A9zc06sBcexmLlvUh1Wki8o4bsHCcc+ZVv3Tjh0DnTnBYSBQRLVi+AbKuv1x5y+DgKuNtpEfGOGrID5BcUzpwwY+4Nswu+rDlrLVKBGf01fCZZfXW19pDDy+Uet/tcp0XEM2rIESa/oDArKTX9/51wyXXHuxISBuoVxiOBwhZbgYyq5uY2TVYfdv7icbuznRYRr6ghR5D8gkIX8M0TLrnu1PTs0WOc1hNlBDLkzWiy+kiRC/zeaRHxihpyZFk0/eiTz54y5+iRWgcvnGiy+ujhKk3X6QxqyBEiv6AwNzUz+4qjzr1iodNaopRAU9/2mbAmq48IAtzrcbvTnBYSb6ghR4D8gsJE4MoTLr1uYUp6ZpbTeqKUYMnqu4GETXV12kOODIcCS5wWEW+oIUeGU6cvPOXEiXnzZjktJIoJlKzeh52sfl1VVa0mq48YP9Lq1ZFFDTnM5BcUTnIlJF6woPBSvbCDE+wW6kw7WX19pATFOQnA/R63W2cDRQg15DBiz6q4fP5ZF09Py8rRWRXBCWTIO4BEgDq9hTqSzAMudlpEvKCGHF7mp2Rm5c88/oxYLFAaDoLdQu0DTVbvAEUetzvRaRHxgBpymMgvKEwBLj7m/GumJ6WkpjutZ4Sgyeqjk0OAK5wWEQ+oIYePgpyJB02ZOnehxo4Hz9ggyeo9QMp6TVbvBD/3uN0pTouIddSQw0B+QWEWcN6xX7t2tishUX/qDY2gyeorrGT13gjpUSymANc4LSLWUUMOD2dNzj9qwrhpefFUGy9UBLuFOt0AjR0dGraIPDd73G4Nv4URNeQQk19QmAucftS5l88TEafljEQCGfIurPnK7G1tVUOOPBOA7zstIpZRQw4hdtL58w9esGhs1riJWoFheAQy5BrsnBa7m5o0juwMN3rc7niubhNW1JBDywzgqPxTzuk3t68yKPIDJKuvwbpmNVm9c4xBZ1yEDTXk0HLW6CnTXaMnTctzWsgIJhUrj8LnsJPVV6HJ6p3maqcFxCpqyCEiv6BwAnD4vMUXThdxafB4/wiU+W0LmqzeaQ7zuN3HOy0iFlFDDh0npaSPYsKMuTrveP8JNtMiDaC+rU17yc6hveQwoIYcAvILCjOAU44488LcxKRknTy//wS7Y68HoEZvoXaSr3vc7hynRcQaasih4RiQxGnzjtOcFaEhmCELwK6mJu0hO0cacInTImINNeT9JL+gMAEonLVocVZqZrZmdAsNhywt82YM0LYvWf3GvXu1h+wsGrYIMWrI+89hwAEzj//SEU4LiSE0Wf3I4HCP232M0yJiCTXk/cC+EeSsjAPGdWePn3yI03pijEBhiy1Ahiarjwq0lxxC1JD3j4nAzNknnjVeXC49l6El0NS3bUASQF1bm4YtnOVsj9ut136I0BO5fxwOmNy8ebOdFhKDBBvYMwBVzc06sOcsY4GjnRYRK6ghDxM7XLEodVROS/aBGq4IA8FyWgjAjsZG7SE7z5lOC4gV1JCHz4FA7qxFZ050JSRoEcjQM3ZpmXdCfw0lFRUtQBOQUlFbqz1k51FDDhFqyMPnMMBMmr0g32khMUygXvJ2IGN9TU29Jqt3nCM9bvd4p0XEAmr
Iw2dRUmp6S07u1H4T4SghIZAhb0KT1UcLApzhtIhYQA15GOQXFI4FDpq16MzxCYlJSU7riWECGfJuNFl9NKFhixCghjw88gEm5R85y2khMU6gqW/7Zlrs0WT10cAXPW73kMZSRGSaiKzr89oSEbkhtNI+t99bROS0MO/jMhG5a6jrqSEPjxOAppwJUw52WkiMM6hk9Vvr67WH7DwHAMc6LWIwGGN+aYx5xWkd/aGGPETyCwpHAYeOmXqoLzktI8tpPTFOsGT11UDax9XV2kOODkKWI1lEVorIbSKySkQ2iMgi+/VpIvK6iHxg/x1nv54rIq+JyBoRWScii0QkQUQetJ9/JCLX28s+KCLn24/PFJH1IvKGiNwhIsX260tE5O+2ji0i8gM/bZfYutaIyD0ikmC/frmttXS450INeegcBJgpc47RmnmRIVhu5MxKK1l9a6QEKQNyZIi3l2iMWQhcBxTZr9UApxtjFgAXAHfYr18EvGSMmQccAawB5gGTjDFzjDFzgQf8Ny4iqcA9wGJjzAnAuD77nwV8CVgIFIlIkojMtvd7vL2vHuBiEckF3FhGfDp2WHOoqCEPnUMBM27aTDXkyBAsp0UqQL3eQh0NDNWQB0oM1fv6M/b/94Fp9uMk4F4R+Qh4kk+NbzVwuYgsAeYaY5qxro9DROROETkDa+66P7OALcaYrfbzR/u0/8cY02mM2Yv1RXAgcCrWca4WkTX280OAY4CVxphaY0wX8PhgTkBf1JCHzlygKWv8pMlOC4kTAhlyFeADTVYfJUwfYtL6OqzYsz+jgb324077fw+QaD++HitUdQRwFJAMYIx5DTgRa/bNMhH5pjGmwV5uJfA94L4++wpWaq3T73GvBgEeMsbMs//yjDFL7GX2O/OgGvIQyC8oTAGmJaaktadl5ehE+MgwqFuoNVl91DDoEmbGmBagUkROBRCR0Vjzmd8IsFo2UGmM8QGXAr3x24OAGmPMvcD9wAIRGQu4jDFPA7/oR9t6rB70NPv5BYOQvRw4X0TG92q29/0ucJKIjBGRJOBrg9jW51BDHhqTACbnHznO5UrQcxcZAiWrr8dOVr9Jk9VHC/3msQ7AN4Gf2z//XwXcxpjNAZa/G/iWiLwDzAR6xw5OAtaISBlwHvBnrM/rSnvbDwI3+2/IGNMOfBd4UUTewOp5ewKJNcaUAz8HXhaRtcB/gVxjTCWwBHgbeAX4YDAH35fE4IsofuQCrvGHzD7QaSFxhAvrQ76qb0NJRYVvcV7eTmD0uurqWmOMERGt+O0sQzJk2+BO7uf1k/we78WOIRtjNvLZ+ek3268/BDzUzy4+12M3xlzm93SFMWaWfd38BXjPXmZJn3Xm+D1+nH5ixMaYB+gzcDhU1JCHxkygIyd3aq7TQuKMufRjyDabgSmtXV1NzZ2d9VmpqSO+jNbc229nVEoKLhESXS5WXnMNDW1tXP7UU+xobGRqTg4Pfu1r5KSl8Wx5Ob9dsYID0tL45ze+wej0dLbW13Pr8uX8/WvD+tW8vwy1h+w0V4vIt7Bi0WVYsy4cQw15aMwEWjJyxox1WkicESiOvI1Pk9XXxIIhAzz/rW8xJuPTSM3tb7xBwcEHc/2iRdz++uvc/sYbuE8/nb+89Rb/veoqnlm3jic/+ohrjjmGX736Kj875RSnpI+oZFvGmNuB253W0YvGQQdJfkFhKta0l7bk9Ey9ISSyBBvY8wFUNjfHbBz5hYoKLpw3D4AL583jP+vXA+ASoau7mzavlySXi7e2b+fAzEymj3Hse2m0x+3uO3NCGSRqyIPnAMAnLhdJqelqyJElWPUQF8ROsnoR4dxlyyi45x4efO89AGpaWpgwahQAE0aNorbVGsv6yUkn8dV//IOVW7Zw3ty5/OG117ixoMAx7TZ9b7BQBomGLAZPNkDOhKmZOsMi4oxbWuY98Kb5Sf0ZbivQAiTHSrL6l664gtysLGpbWjhn2TJmjB04Qnby9OmcPH06AI+sWcP
pM2awce9e7nzrLXLS0lh6xhmkJydHSnov44ENkd5pLKDGMniyAFfOhCnaO3aGfjO/lVRUGGArkFlRW9vQ7fON+GT1uVnWJTYuM5PCWbP4YPduxmdmUtXcDEBVczPjMj47E7Ctq4tH16zhqqOPxr18OXedfTbzcnN58qOPIq4f7SEPGzXkwTMaYNS43GynhcQpwZLVZ/iMMY3t7bWREhQOWru6aO7s3Pd4xebNzB4/nsV5eTy6Zg0Aj65Zw5l5eZ9Z789vvsm1xx5LUkICHV4vIoJLhDavI99PasjDREMWg2cC0JlxwLhJTguJU4IlqwegtrW1emxGxsQI6AkLtS0tXPy4NcW1x+fj/LlzOW3GDBZMmsRlTz7JsrIyJmdn85DflLbKpibW7NnDzSdb03m/f9xxnH7ffWSnpvLPb3zDicNQQx4masiD50CgIz17tIYsnCHYwJ4BqGxqqpk9fuTe1T5t9Gje/M53Pvf66PR0nvvWt/pdJzcri8cvvnjf83MOO4xzDnN0OvDIfQMcRkMWg2cs0Jmama0hC2fIX1rmHeh6rcXKaSFb6utjYqbFCEd7yMNEDXkQ5BcUurBiyJ3JaRmZTuuJU9IYOFl9F9Z8ZE1WHx2oIQ8TNeTBMcr+byQhYUh1w5SQEihssRXI0GT1UUFM3C3pBGrIgyMdO0Yp4tJz5hyBip5uwupFa7J659EET8NEzWVw7DtPIqLnzDmCDexZyepbW2PiBpERTLfTAkYqai6D41NDdmkP2UGCGbKVrN7j0R6ys6ghDxM1l8GxL26sPWRHOWRpmTd9gLZ9yeo31NZWRVCT8nnUkIeJmsvg8AtZaA/ZQVzAnP4aSioqfMAuIGNtVVWtz+fzRVSZ4k+P0wJGKmoug0NDFtHDvABtG4DMzu7unsaOjhF9C/UIR3vIw0TNZXB8ep40ZOE08wO0bcW++7SmpaUyMnKUflBDHiZqLoPDf5aFTulxlkA95CrsmRa7mpo0juwcasjDRA15cOw7T77u7i4nhSgcHuAW6iqs90o27d2rPWTnUEMeJmrIg0PsP7wd7XoXmLOkA3n9Ndi3UFcC6Wv27KkyxkRUmLKPeqcFjFTUkAdHJ/adet7OtjYnBNw09mN1AAAZbklEQVR21gz+9PX53PGNo7jr4mMBaPPUc/93FvOHs/O5/zuLaW9qAGDd8me4/fwjuOeKk2ltrAOgbudmHr3p4gG3P8IIFEfeBGQ2dnR0NXd2qjE4wy6nBYxU1JAHxz4T7mpvc6yHfPU9/+UHj73H9//5DgClD/yO6QtP5oZny5m+8GRWPvA7AF5f9ie++9AbzC+8hA9ffAyAl+8u4vTvLnFKeqgJZMgbgRSAva2tGkd2hp1OCxipqCEPjlbskEVnW3PUhCzKS59nQeGlACwovJTylc8BIC4X3V2deDvaSEhMYusHbzBqbC5jp85wUm4oCWTI+wb29jQ3qyE7gxryMFFDHhxt2Oeqs7XZkZCFiPD3753JnRcdw6qn7wOgpa6GrHG5AGSNy6Wl3pp6e+q3f87fv3cWm95dzhFfuoAV9/2WU6/+qROyw0WgmRaV2O/Vlro6HdhzBjXkYaIVQwZBeWmxN7+gsANI6GxpcqSHfO0DK8kaN5GW+hru/85ixk3rd1wLgBnHnsaMY08D4P3nHybvhDOo3baB15f9kbSsAyi84Y8kpw10B/KIYMzSMu+Um+Ynfe6DX1JR0bo4L68BSF1bVaU9ZGdQQx4m2kMePE1AcntzgyM95KxxVpm4zNHjOezks9n58Woyx4ynqdbqBDbVVpI5+rN5wbva2/jg+X9w7Neu5aW7fs55RfcyafYC1pQ8GnH9YSBQ2GIzkLnL42lp93pbIiVIAaApu6ioyWkRIxU15MHjAZJaG+si3kPuam+ls7V53+ON77zCgdMPY/aJX+aD4mUAfFC8jPyCL39mvdce+gPHX/R9EpKS8Ha0IyKIuPB2OPKdEmoCGfIGrOlxOrAXeXSGxX6gIYv
[base64-encoded PNG image data omitted]\n",
      "text/plain": [
       "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# -----------------\n", + "# Visualize results\n", + "# -----------------\n", + "''' CHANGE TO STACKED BAR; SHOW WHAT ASSIGNMENTS CAME FROM WHERE '''\n", + "\n", + "# Pie for percentage of rows assigned; https://pythonspot.com/matplotlib-pie-chart/\n", + "totCount = len(logAfterGoldStandard)\n", + "unassigned = logAfterGoldStandard['SemanticGroup'].isnull().sum()\n", + "assigned = totCount - unassigned\n", + "labels = ['Assigned', 'Unassigned']\n", + "sizes = [assigned, unassigned]\n", + "colors = ['lightskyblue', 'lightcoral']\n", + "explode = (0.1, 0) # explode 1st slice\n", + "plt.pie(sizes, explode=explode, labels=labels, colors=colors,\n", + " autopct='%1.f%%', shadow=True, startangle=100)\n", + "plt.axis('equal')\n", + "plt.title(\"Status after 'GoldStandard' processing\")\n", + "plt.show()\n" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": 
"iVBORw0KGgo[remainder of base64-encoded PNG image data omitted]
A54DeknZOsaLUZwugVUQ8CJwPdF8vB2hmZrYR8MialddM0lSgMVAC/BP4Y1o3AugIvKRs3vIDoGx+cAJwKzA+Ipbn6ffXwE2SXiEbqbsiIu6RNAS4PU2hQnYN2+fAfZKako2+DavdQzQzM9t4KBswMdt49ezZMyZPLvhvCDEzM1uNpCkR0bOqdp4GNTMzMytgLtbMzMzMCpiLNTMzM7MC5mLNzMzMrIC5WDMzMzMrYC7WzMzMzAqYizUzMzOzAuZizczMzKyAuVgzMzMzK2B+3JRt9Oa9tYSzB71UZbubRu+ZtZ83j3HjxlFaWsoOO+zAsccey5133sn8+fNp1qwZgwcPpnnz5lX0ZmZmtmG4WLNNSklJCePGjWPo0KE0bdoUgJkzZ7J8+XIuvPBCnnvuOR599FGOOeaYKnoyMzPbMKqcBpW0taR/SZojaYqkSZKO3RDJ1SZJfSTdX0E8JJ2eE+uRYhdV0ecoSSfUcp5PSqryOWGVbL9L6mOqpNckDa/N/DZ2c+bMYbPNNmPkyJFcd911vPnmm8yePZtu3boBsPvuuzN79uw6ztLMzOwrlRZrkgSMA56KiB0jYi9gILDdhkhuA5oBDMh5PRCYVke51IikhuVC1wPXRkT3iPgf4IY6SKtgffrpp8yfP5/TTjuNIUOGMHr0aL788kuKiooAaNasGUuWLKnjLM3MzL5S1cjaIcDyiLi5LBARcyPiBsgKBUlXS3pR0nRJP0jxPml0Z6yk1yWNToUfkvaSNDGN0j0iadsU/5GkV1M/Y8onIqmjpKclvZR+elVjX4en2DPAcZUc5zygaRpFFHA48FDOvrtLei7ldq+kNnnyq+i4dpb0b0nTUt47lR/lk3SjpCF5+vyLpMmSZkq6Iif+tqRfpuM6sdxm2wLzc96vGVW8V0r7f1XSA5IeLBstTPvZMi33lPRkWm4uaWTq62VJ/VN8iKR7JD0sabak3+fkfHg6/mmSHquin66SXkijg9Mlda7kvauR5s2bs+OOO9KsWTNat25NixYtWLly5aoCbenSpasKNzMzs0JQVbHWFajsyu3TgU8jYm9gb+D7kjqldT2A84EuwI5Ab0mNyUZ6TkijdCOB36T2lwI9ImJ34Mw8+1oEHBoRe5KNgl2fsy7fvpoCfwOOAg4EtqniWMeSFT690jEvy1l3K3BJym0GcFnuhlUc12jgpojYI/W9oIo8cv0sInoCuwPfkLR7zrriiDggIsoXttcCj0t6SNIwSa1TvKL36lhgF6Ab8P2UY5V5AY+nvg4GrpZUdkV+d7L3pxswQNL2krYiey+OT+fhxCr6ORP4U0R0B3qSU3yWkTQ0FbKTlxZ/Uo2UMx07dmTRokWUlpZSXFzM559/Tvfu3Zk5cyaQXb/WuXOt1YZmZmbrrEY3GEi6CTiAbLRtb6AvsLu+um6rFdAZWA68EBHz03ZTgY7AYmA34NE0+NWQr4qX6cBoSePIpl7LawzcKKk7UAp8PWddvn19AbwVEbNT/DZgaCWHdydwB7ArcDupaJHUCmgdERNTu1uAu8ptu0u+45LUEmgfEfcCRERx6rOSNFbzHUlDyd6nbcmK0elp3R35NoiIf0h6hGx0sD/wA0l7UPF7dRBwe0SUAu9JerwaefUFjtZX1/Q1BTqk5cci4tN0nK8COwBtyKbS30o5flxFP5OAn0naDrin7D0sd5zDgeEA7dp2iWrkDEBRURF9+vThuuuuo7S0lGOOOYauXbsyc+ZMrrnmGpo2bcrgwYOr252Zmdl6V1WxNhM4vuxFRJydpsUmp5CAcyPikdyNJPVh9ZGp0rQvATMjYv88+zqCrHA4GviFpK4RUZKzfhiwENiDbESwOGddvn0BVPsf8Yh4X9IK4FDgPKo3wlQm73FJ2ryC9iWsPqrZdI0Os1Gvi4C9I+ITSaPKtfuyomQi4j2y0b2Rkl4hKyQreq++TcXnKTfP3H2LbJTsjXJ97UvF73u+feTtB3hN0vNkn4lHJJ0REdU
pIqtl3333Zd99910tNnDgwNrq3szMrFZVNQ36ONm1XGflxHIv6HkEOCtNAyLp6znTYfm8AWwlaf/UvnG6PqkBsH1EPAFcDLQGWpTbthWwICJWAqeQjV5V5nWgk6Sd0uuTqmgP8Euy6c7SskAaJfpE0oEpdAowsdx2eY8rIj4D5ks6JsU3k1QEzAW6pNetgG/myWVzsoLsU0lbA/2qkX/ZtWFl78c2QFvgXSp+r54CBqZr2rYlm44s8zawV1o+Pif+CHCutOrawB5VpDWJbBq3U2q/RWX9SNoRmBMR1wPjyaaBzczMNkmVjqxFRKRC41pJFwMfkBUQl6QmI8imHF9K/+B+AFT4BVURsTxNw12fipRGwHXALOC2FBPZ3YyLy23+Z+BuSScCT1DJyFLaV3GaQnxA0ofAM2QjTJVt82wFqwYDN6dCaw5wWjWPayZZcfdXSb8CVgAnRsQcSXeSTWnOBl7Ok8s0SS+nPuYA/6ks9xx9gT9JKht5/HEaNazovbqX7EaSGWTvQ24hegXwd0k/BZ7PiV+Zjm966utt4MiKEoqID9J7cU8qzBeRjWBW1M8A4LtppPN94FfVPHYzM7N6RxHVnim0TUCabr0/IsbWdS7V1a5tlzjx8NuqbFf2BAMzM7NCIGlKupGwUn6CgW30OnQqciFmZmb1los1W01EDKnrHMzMzOwrVT5uyszMzMzqjos1MzMzswLmYs3MzMysgLlYMzMzMytgLtbMzMzMCpiLNTMzM7MC5mLNzMzMrIC5WDMzMzMrYP5SXNvozXtrCWcPeqnC9blPN1i4cCFXXnkl559/Pq1bt+bWW28lPUeeIUOG0KZNm/Wer5mZWU14ZG0jJ6lU0lRJr0i6S1KRpI6SXqnr3ArRQw89ROfOnQGYOHEivXr1YtiwYey33348+eSTdZucmZlZHi7WNn5LI6J7ROwGLAfOrOuECtXbb7/N5ptvvmr07Gtf+xpLliwBYMmSJbRs2bIu0zMzM8vLxVr98jSwc1puKOlvkmZKmiCpGYCknSQ9LGmKpKcl7ZrioyRdL+lZSXMknZDiknR1GrmbIWlAiveRNFHSnZJmSbpK0iBJL6R2O6V2W0m6W9KL6ad3il8uaaSkJ9P+flR2EJLGpfxmShpaWyfnoYceom/fvqte77LLLjzzzDP8+te/5umnn6Z37961tSszM7Na42KtnpDUCOgHzEihzsBNEdEVWAwcn+LDgXMjYi/gIuDPOd1sCxwAHAlclWLHAd2BPYBvAVdL2jat2wM4D+gGnAJ8PSL2AUYA56Y2fwKujYi9Uw4jcva3K3AYsA9wmaTGKf69lF9P4EeS2q7VSckxY8YMdthhB1q0aLEqNm7cOI466ih+/vOfc8QRR3Dfffet627MzMxqnW8w2Pg1kzQ1LT8N/B34GvBWRJTFpwAdJbUAegF3lV1UD2yW09e4iFgJvCpp6xQ7ALg9IkqBhZImAnsDnwEvRsQCAEn/BSakbWYAB6flbwFdcva3uaSy+cYHImIZsEzSImBrYD5ZgXZsarM9WeH5Ue5BpxG3oQAtirap8iTNnz+fWbNmMWfOHN59913ef/99GjduvKp4a9my5aopUTMzs0LiYm3jtzQiuucGUmG0LCdUCjQjG0ldXL59jtxtVO53Ve1X5rxeyVefrQbA/hGxtBo5NpLUh6zA2z8ilkh6EmhafscRMZxslJB2bbtEJTkC0K9fP/r16wfArbfeSq9evSgqKuL222+nQYMGlJaWcvLJJ1fVjZmZ2QbnYm0TEhGfSXpL0okRcZeyimn3iJhWyWZPAT+QdAuwBXAQ8GOyKczqmACcA1wNIKl7zohfPq2AT1KhtiuwXzX3U22nnnrqquULL7ywtrs3MzOrVb5mbdMzCDhd0jRgJtC/ivb3AtOBacDjwMUR8X4N9vcjoKek6ZJepeq7VR8mG2GbDlwJPFeDfZmZmdU7iqhyBsmsoLVr2yVOPPy2CtfnfimumZlZoZA0JSJ6VtX
O06C20evQqcgFmZmZ1VueBjUzMzMrYC7WzMzMzAqYizUzMzOzAuZizczMzKyAuVgzMzMzK2Au1szMzMwKmIs1MzMzswLmYs3MzMysgLlYMzMzMytgfoKBbfTmvbWEswe9lHfdTaP3ZOnSpdx44400atSI5cuX079/f3bccUduueUWvvjiC4qKijjllFMoKirawJmbmZlVzcWa1XubbbYZF1xwAQ0bNuTDDz9kxIgR7LPPPnTo0IHDDjuMyZMn8+ijj9K/f1XPtDczM9vwPA1aSyRtLelfkuZImiJpkqRj6zqv8iQNkXRjBeselNS6hv3dJ2lS7WS3fjRo0ICGDRsCsHTpUtq3b8/ChQvZYYcdAOjYsSOzZs2qyxTNzMwq5JG1WiBJwDjglog4OcV2AI5ez/ttGBGltdVfRHy7hvtvDewJfCGpU0S8ladNo4goqa0c19bixYv5+9//zsKFCznllFP45JNPmDlzJrvuuiszZ85kyZIldZ2imZlZXh5Zqx2HAMsj4uayQETMjYgbICuqJF0t6UVJ0yX9IMWV4q9ImiFpQIo3kPRnSTMl3Z9GvE5I696W9EtJzwAnSvp+6neapLslFaV2oyTdLOlpSbMkHZmT79ckPSxptqTflwVT31um5VNTrtMk/bOC4z4e+D9gDDAwp59Rkv4o6QngfyU1lzQy5fmypP6pXceU30vpp1eKbyvpKUlT07k5cF3eHIDWrVtz4YUXcskll3DHHXfQq1cvSkpKuPbaa1m8eDGtWrVa112YmZmtFx5Zqx1dgfxXuGdOBz6NiL0lbQb8R9IEslGp7sAewJbAi5KeAnoDHYFuQDvgNWBkTn/FEXEAgKS2EfG3tPzrtK8bUruOwDeAnYAnJO2c4t2BHsAy4A1JN0TEO2WdS+oK/AzoHREfStqiguM6CbgCWAiMBX6Xs+7rwLciolTSb4HHI+J7aTTuBUn/BhYBh0ZEsaTOwO1AT+Bk4JGI+I2khsAaV/5LGgoMBWhRtE0F6WVWrFhB48aNAWjatClNmzalUaNGDBgwAIBnnnmG1q1rNPtrZma2wbhYWw8k3QQcQDbatjfQF9i9bHQMaAV0Tm1uT1OZCyVNBPZO8bsiYiXwfhqhynVHzvJuqUhrDbQAHslZd2fqY7akOcCuKf5YRHyacn0V2AF4J2e7Q4CxEfEhQER8nOcYtwZ2Bp6JiJBUImm3iHglNbkrZ4q2L3C0pIvS66ZAB+A94EZJ3YFSsgIP4EVgpKTGwLiImFp+/xExHBgO0K5tlyi/PteCBQsYO3Yskli5ciUnnHACCxYsYMyYMTRo0ID27dtz7LEFd3mhmZkZ4GKttswkmxIEICLOTtOJk1NIwLkRkVtIIamia8RUxf6+zFkeBRwTEdMkDQH65KwrX8SUvV6WEytlzc+B8mxb3gCgDfBWdskem5NNhf48T44Cjo+IN1bbiXQ52ajcHmRT8sUAEfGUpIOAI4B/Sro6Im6tIp8KdejQgQsuuGCN+LBhw9a2SzMzsw3G16zVjseBppLOyonlTt09ApyVRoqQ9HVJzYGngAHpmratgIOAF4BngOPTtWtbs3oBVl5LYEHqe1C5dSemPnYCdgTeWGPr/B4DviOpbco33zToScDhEdExIjoCe5Fz3Vo5jwDnphsxkNQjxVsBC9Lo3ylAw7R+B2BRmt79O9l0sZmZ2SbJI2u1IE0DHgNcK+li4AOykaVLUpMRZNePvZQKlg+AY4B7gf2BaWQjWRdHxPuS7ga+CbwCzAKeBz6tYPe/SOvnAjPIircybwATga2BM9O1YdU5npmSfgNMlFQKvAwMKVsvqSPZNOZzOdu8JekzSfvm6fJK4Dpgejr+t4EjgT8Dd0s6EXiCr0bj+gA/lrQC+AI4tcqkzczM6ilFVDXbZXVBUouI+CKNbr1AdrH/+zXYfhRwf0SMXV85FoqePXvG5MmTq25oZmZWQCRNiYieVbXzyFrhuj/dOdkEuLImhZqZmZnVHy7WClR
E9FnH7YfUTiZmZmZWl3yDgZmZmVkBc7FmZmZmVsBcrJmZmZkVMBdrZmZmZgXMxZqZmZlZAXOxZmZmZlbAXKyZmZmZFTB/z5rVW0uXLuXGG2+kUaNGLF++nP79+9O2bVtGjBjBokWLOPvss9l5553rOk0zM7NK+XFTttFr17ZLnHj4bavFbhq9JytXriQiaNiwIR9++CEjRozgggsuYMWKFdx999306tXLxZqZmdWZ6j5uytOgBUrSdpLukzRb0n8l/UlSk/W8z56Srl+L7TpKOnld+6ltDRo0oGHDhkA2yta+fXuaNGlC8+bN6zgzMzOz6nOxVoAkCbgHGBcRnYGvAy2A35RrV6vT2BExOSJ+tBabdgRWFWvr0E+tW7x4Mddccw033HAD3bt3r+t0zMzMaszFWmE6BCiOiH8AREQpMAz4nqQfSrpL0v8BEyQ1kPRnSTMl3S/pQUknAEj6paQXJb0iaXgqApH0pKT/lfSCpFmSDkzxPpLuT8sPSpqafj6VNDiNoD0t6aX00yvlexVwYGo7rFw/W0gaJ2m6pOck7Z7il0samXKZI+lHKd5c0gOSpqW8B6zLiWzdujUXXnghl1xyCXfccce6dGVmZlYnXKwVpq7AlNxARHwGzCO7KWR/YHBEHAIcRzay1Q04I60rc2NE7B0RuwHNgCNz1jWKiH2A84HLyicQEd+OiO7A6cBcYBywCDg0IvYEBgBlU52XAk9HRPeIuLZcV1cAL0fE7sBPgVtz1u0KHAbsA1wmqTFwOPBeROyR8n640jNViRUrVqxabtq0KU2bNl3brszMzOqM7wYtTALy3flRFn80Ij5OsQOAuyJiJfC+pCdy2h8s6WKgCNgCmAn8X1p3T/o9hazYW3Nn0pbAP4HvRMSnkloBN0rqDpSSTc9W5QDgeICIeFxS29QPwAMRsQxYJmkRsDUwA/iDpP8F7o+IpyvIbSgwFKBF0TZ5d7xgwQLGjh2LJFauXMkJJ5zA0qVLLh1onwAACUJJREFUGT58OO+//z7vvfceu+22G0ceeWTe7c3MzAqBi7XCNJNU4JSRtDmwPVmR9GXuqnwdSGoK/BnoGRHvSLocyB1aWpZ+l5LncyCpITAG+FVEvJLCw4CFwB5ko7LF1TiWfPmVFaLLcmKlZKN9syTtBXwb+J2kCRHxqzU6iBgODIfsbtB8O+7QoQMXXHDBGvHzzjuvGmmbmZkVBk+DFqbHgCJJp8KqwukaYBSwpFzbZ4Dj07VrWwN9UrysMPtQUgvghBrmcBUwPSLG5MRaAQvSKN4pQMMU/xxoWUE/TwGD0nH0AT5MU7p5SfoasCQibgP+AOxZw7zNzMzqFRdrBSiyL787FjhR0mxgFtko1k/zNL8bmA+8AvwVeB74NCIWA38jm1YcB7xYwzQuAvrm3GRwNNlI3WBJz5FNgZaN8E0HStJNAcPK9XM50FPSdLICcHAV++0GvCBpKvAz4Nc1zNvMzKxe8Zfi1gOSWkTEF5LaAi8AvSPi/brOa0Op6EtxzczMCll1vxTX16zVD/dLag00Aa7clAo1gA6dilycmZlZveVirR6IiD51nYOZmZmtH75mzczMzKyAuVgzMzMzK2Au1szMzMwKmO8GtY2epM+BN+o6j3pmS+DDuk6iHvH5rF0+n7XP57R2Vfd87hARW1XVyDcYWH3wRnVufbbqkzTZ57T2+HzWLp/P2udzWrtq+3x6GtTMzMysgLlYMzMzMytgLtasPhhe1wnUQz6ntcvns3b5fNY+n9PaVavn0zcYmJmZmRUwj6yZmZmZFTAXa7ZRk3S4pDckvSnp0rrOZ2Mh6W1JMyRNlTQ5xbaQ9Kik2el3mxSXpOvTOZ4uyQ9iBSSNlLRI0is5sRqfQ0mDU/vZkgbXxbEUggrO5+WS3k2f06mSvp2z7ifpfL4h6bCcuP8mAJK2l/SEpNckzZR0Xor7M7oWKjmfG+YzGhH+8c9G+QM0BP4L7Ej2EPtpQJe6zmtj+AHeBrYsF/s
9cGlavhT437T8beAhQMB+wPN1nX8h/AAHAXsCr6ztOQS2AOak323Scpu6PrYCOp+XAxfladsl/fe+GdAp/R1o6L8Jq52jbYE903JLYFY6b/6M1u753CCfUY+s2cZsH+DNiJgTEcuBMUD/Os5pY9YfuCUt3wIckxO/NTLPAa0lbVsXCRaSiHgK+LhcuKbn8DDg0Yj4OCI+AR4FDl//2ReeCs5nRfoDYyJiWUS8BbxJ9vfAfxOSiFgQES+l5c+B14D2+DO6Vio5nxWp1c+oizXbmLUH3sl5PZ/K/+OxrwQwQdIUSUNTbOuIWADZHyagXYr7PFdfTc+hz23VzknTciPLpuzw+awRSR2BHsDz+DO6zsqdT9gAn1EXa7YxU56Yb2+unt4RsSfQDzhb0kGVtPV5XncVnUOf28r9BdgJ6A4sAK5JcZ/PapLUArgbOD8iPqusaZ6Yz2k5ec7nBvmMulizjdl8YPuc19sB79VRLhuViHgv/V4E3Es2NL+wbHoz/V6Umvs8V19Nz6HPbSUiYmFElEbESuBvZJ9T8PmsFkmNyQqL0RFxTwr7M7qW8p3PDfUZdbFmG7MXgc6SOklqAgwExtdxTgVPUnNJLcuWgb7AK2TnruxOr8HAfWl5PHBqultsP+DTsmkUW0NNz+EjQF9JbdL0Sd8UM1YVE2WOJfucQnY+B0raTFInoDPwAv6bsIokAX8HXouIP+as8md0LVR0PjfUZ9QPcreNVkSUSDqH7A9HQ2BkRMys47Q2BlsD92Z/e2gE/CsiHpb0InCnpNOBecCJqf2DZHeKvQksAU7b8CkXHkm3A32ALSXNBy4DrqIG5zAiPpZ0JdkfcIBfRUR1L7KvVyo4n30kdSebJnob+AFARMyUdCfwKlACnB0Rpakf/03I9AZOAWZImppiP8Wf0bVV0fk8aUN8Rv0EAzMzM7MC5mlQMzMzswLmYs3MzMysgLlYMzMzMytgLtbMzMzMCpiLNTMzM7MC5mLNzGwDkdRRUkjqnxN7sxb6Xec+Kum7maTHJT0hqcP62k8F+35S0nYbcp9mhcjFmpnZhvU68JP0JZt1SlLDajTrDrwTEQdHxLw6zMNsk+Vizcxsw3oXeAnonxuUdLmk76blAySNSsujJP1Z0kNpdOs7kiZImiLpaznb/07SREm3SWpQLjZJ0pE5+xklaTzwnXI5fF/S8+nne6mI+gvZN9jfX65tH0kvpJz+kWLdJP07jcTdKalZij+SRslekLR/vjwkHSzpP6ndtTm7Oicd72OSNkvbnivp6XRcZ6TYwJx8frcub5BZofETDMzMNrzfAmMl3Vdly8zrEfFDSTcDvSOir6TzgQHAtWR/y8dHxE8k/Q04WlIx0CYiviGpCJgk6YHU37KIODp3B5K2As4B9k6hF4H/A84HvhsRZ5TL6Tjg5xExoaw4BG5KbedJOg84HbgROC4ivpT0P6nNIbl5pFHG14BvRMTCciNtz0XEpZKGA4dK+i9wOHAQ2YDD05LuBU5O+56Vk49ZveBizcxsA4uI+ZKmAMfkhnOWy0+Rvpx+zycbmStb3iNn2xfS8vPALsBK4BuSnkzxzYC2afnZPGntCMyIiOUAkmYAnSo5jKuBSyQNBh4ne25iV+DWNMPbFPh3Gl37k6RdgFKgfU4fZXlsBXwUEQsByh7Lk0xJv+el/JsBXYAnUnxzsgdj/wS4SNnzbu/kq2demm30XKyZmdWN3wFjc15/DJRdTL9XubZRwbJyfvckK9T2Bh4GlgETIuI8AElNImJ5KqRyi6EybwG7p4dLA3RLsa4V5P9RRJyTRsVmSbqL7CHWJ6UHgJP6OgIojYgDJXVh9YdWl+XxAbCFpK0i4gNJDSJiZQXH+xpZ8Xp8RISkxhGxQlJRRAxNU6WzcbFm9YiLNTOzOpBG1yaTTelBNho0XtKBZEVSTZQAx0v6PdnI2/iIKJW0fxpZC7KRuFMqyWeRpD8Dz6TQjalwqmiTCyT1JZuKfDQiPpN0NjB
KUuPU5nfAJLIbKv4N/KeCfUfadrykZWTF2LAK2r6S+pooqRRYKulo4GpJ3YDGwF8rStpsY+QHuZuZmZkVMF+EaWZmZlbAXKyZmZmZFTAXa2ZmZmYFzMWamZmZWQFzsWZmZmZWwFysmZmZmRUwF2tmZmZmBczFmpmZmVkB+3+uVoQVFjlEBwAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Bar of SemanticGroup categories, horizontal\n", + "# Source: http://robertmitchellv.com/blog-bar-chart-annotations-pandas-mpl.html\n", + "ax = logAfterGoldStandard['SemanticGroup'].value_counts().plot(kind='barh', figsize=(10,6),\n", + " color=\"slateblue\", fontsize=10);\n", + "ax.set_alpha(0.8)\n", + "ax.set_title(\"Categories assigned after 'GoldStandard' processing\", fontsize=14)\n", + "ax.set_xlabel(\"Number of searches\", fontsize=9);\n", + "# set individual bar lables using above list\n", + "for i in ax.patches:\n", + " # get_width pulls left or right; get_y pushes up or down\n", + " ax.text(i.get_width()+.1, i.get_y()+.31, \\\n", + " str(round((i.get_width()), 2)), fontsize=9, color='dimgrey')\n", + "# invert for largest on top \n", + "ax.invert_yaxis()\n", + "plt.gcf().subplots_adjust(left=0.3)\n", + "\n", + "# Remove searchLogClean" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 8. 
Create 'uniques' dataframe/file for APIs\n", + "\n", + "\n", + "OPTIONS IF YOU DON'T WANT TO RUN THE ENTIRE LOG\n", + "\n", + "Eyeball the df and select everything with 2 or more queries\n", + "listOfUniqueUnassignedAfterGS = listOfUniqueUnassignedAfterGS.iloc[0:11335]\n", + "\n", + "Or, remove rows up to a given index, such as 186, based on looking at the content\n", + "\n", + "listOfUniqueUnassignedAfterGS = listOfUniqueUnassignedAfterGS.iloc[186:]\n", + "\n", + "listOfUniqueUnassignedAfterGS = listOfUniqueUnassignedAfterGS.reset_index()\n", + "\n", + "If you think the count is too high, you could reduce the allowed character count\n", + "\n", + "mask = (listOfUniqueUnassignedAfterGS['adjustedQueryCase'].str.len() <= 15)\n", + "\n", + "listOfUniqueUnassignedAfterGS = listOfUniqueUnassignedAfterGS.loc[mask]\n" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "# Re-starting?\n", + "# logAfterGoldStandard = pd.read_excel(localDir + 'logAfterGoldStandard.xlsx')\n", + "\n", + "# Unique unassigned terms and frequency of occurrence\n", + "listOfUniqueUnassignedAfterGS = logAfterGoldStandard[pd.isnull(logAfterGoldStandard['preferredTerm'])] # was SemanticGroup\n", + "listOfUniqueUnassignedAfterGS = listOfUniqueUnassignedAfterGS.groupby('adjustedQueryCase').size()\n", + "listOfUniqueUnassignedAfterGS = pd.DataFrame({'timesSearched':listOfUniqueUnassignedAfterGS})\n", + "listOfUniqueUnassignedAfterGS = listOfUniqueUnassignedAfterGS.sort_values(by='timesSearched', ascending=False)\n", + "listOfUniqueUnassignedAfterGS = listOfUniqueUnassignedAfterGS.reset_index()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# ---------------------------------------------------------------\n", + "# Eyeball for fixes - Don't give the API things it can't resolve\n", + "# ---------------------------------------------------------------\n", + "\n", + "'''\n", + "*** RUN SOME 
OF THIS EVERY TIME - COMMENTED OUT TO AVOID DAMAGE ****\n", + "\n", + "# Eyeball the data frame, sort by adjustedQueryCase; remove rows as appropriate\n", + "\n", + "# logAfterGoldStandard = logAfterGoldStandard.iloc[1595:] # remove before index...\n", + "\n", + "listToCheck4 = listToCheck4[listToCheck4.adjustedQueryCase.str.contains(\"^[0-9]{4}\") == False] # char entities\n", + "listToCheck4 = listToCheck4[listToCheck4.adjustedQueryCase.str.contains(\"^-\") == False] # leading punctuation\n", + "listToCheck4 = listToCheck4[listToCheck4.adjustedQueryCase.str.contains(\"^/\") == False] # leading punctuation\n", + "listToCheck4 = listToCheck4[listToCheck4.adjustedQueryCase.str.contains(\"^@\") == False] # leading punctuation\n", + "listToCheck4 = listToCheck4[listToCheck4.adjustedQueryCase.str.contains(\"^\\[\") == False] # leading punctuation\n", + "listToCheck4 = listToCheck4[listToCheck4.adjustedQueryCase.str.contains(\"^;\") == False] # leading punctuation\n", + "listToCheck4 = listToCheck4[listToCheck4.adjustedQueryCase.str.contains(\"^<\") == False] # leading punctuation\n", + "listToCheck4 = listToCheck4[listToCheck4.adjustedQueryCase.str.contains(\"^>\") == False] # leading punctuation\n", + "\n", + "listToCheck3.drop(58027, inplace=True)\n", + "'''" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "# Save to file so you can open in future sessions\n", + "writer = pd.ExcelWriter(localDir + 'listOfUniqueUnassignedAfterGS.xlsx')\n", + "listOfUniqueUnassignedAfterGS.to_excel(writer,'listOfUniqueUnassignedAfterGS')\n", + "# df2.to_excel(writer,'Sheet2')\n", + "writer.save()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": 
"text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/02_Run_APIs.ipynb b/02_Run_APIs.ipynb new file mode 100644 index 0000000..18f2c6b --- /dev/null +++ b/02_Run_APIs.ipynb @@ -0,0 +1,1077 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Part 2. Run APIs\n", + "App to analyze web-site search logs (internal search)
\n", + "**This script:** Match query entries against UMLS REST API
\n", + "Authors: dan.wendling@nih.gov,
\n", + "Last modified: 2018-09-09\n", + "\n", + "\n", + "## Script contents\n", + "\n", + "Rather than re-using the same code during similar-and-optional runs, I duplicated and modified it for special cases. Could be re-factored.\n", + "\n", + "1. Start-up\n", + "2. UmlsApi1 - Normalized string matching\n", + "3. Isolate entries updated by API, complete tagging, and match to the \n", + "   current version of the search log - logAfterUmlsApi\n", + "4. Create logAfterUmlsApi as an update to logAfterGoldStandard by appending \n", + "   newUmlsWithSemanticGroupData\n", + "5. Update GoldStandard\n", + "6. Create new 'uniques' dataframe/file for fuzzy matching\n", + "\n", + "7. UmlsApi2 - Tag non-English terms in Roman character sets\n", + "\n", + "8. UmlsApi3 - Word matching (relaxed prediction rules)\n", + "\n", + "9. RxNorm API\n", + "\n", + "10. UmlsApi4 - Re-run the first configuration - Create logAfterUmlsApi4 as an \n", + "    update to logAfterUmlsApi by appending newUmlsWithSemanticGroupData\n", + "\n", + "11. Create updated training file (GoldStandard) for ML script\n", + "\n", + "Note: the Google Translate API, https://cloud.google.com/translate/, is not free; \n", + "see https://stackoverflow.com/questions/37667671/is-it-possible-to-access-to-google-translate-api-for-free\n", + "\n", + "\n", + "## FIXMEs\n", + "\n", + "Things Dan wrote for Dan; modify as needed. There are more FIXMEs in context.\n", + "\n", + "* [ ] Improve/clarify processing flow\n", + "* [ ] Change SemanticNetworkReference.UniqueID to SemanticTypeCode\n", + "* [ ] Add SemanticNetworkReference.SemanticTypeCode to what goes into the logs, for ML.\n", + "\n", + "\n", + "## RESOURCES\n", + "\n", + "* Register at UMLS, get a UMLS-UTS API key, and add it below. 
This is the \n", + "primary source for Semantic Type classifications.\n", + "https://documentation.uts.nlm.nih.gov/rest/authentication.html\n", + "* UMLS quick start: \n", + "UMLS description of what Normalized String option is, \n", + "https://uts.nlm.nih.gov/doc/devGuide/webservices/metaops/find/find2.html\n" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "ename": "FileNotFoundError", + "evalue": "[Errno 2] No such file or directory: '01_Pre-processing_files/listOfUniqueUnassignedAfterGS.xlsx'", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mFileNotFoundError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 21\u001b[0m \u001b[0;31m# If you're starting a new session an this is not already open\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 22\u001b[0m \u001b[0mlistOfUniqueUnassignedAfterGS\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'01_Pre-processing_files/listOfUniqueUnassignedAfterGS.xlsx'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 23\u001b[0;31m \u001b[0mlistOfUniqueUnassignedAfterGS\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread_excel\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlistOfUniqueUnassignedAfterGS\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 24\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 25\u001b[0m \u001b[0;31m# Bring in historical file of (somewhat edited) matches\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m~/anaconda3/lib/python3.6/site-packages/pandas/util/_decorators.py\u001b[0m in \u001b[0;36mwrapper\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 176\u001b[0m 
\u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 177\u001b[0m \u001b[0mkwargs\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mnew_arg_name\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnew_arg_value\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 178\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mfunc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 179\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mwrapper\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 180\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0m_deprecate_kwarg\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m~/anaconda3/lib/python3.6/site-packages/pandas/util/_decorators.py\u001b[0m in \u001b[0;36mwrapper\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 176\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 177\u001b[0m \u001b[0mkwargs\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mnew_arg_name\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnew_arg_value\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 178\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mfunc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 179\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mwrapper\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 180\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0m_deprecate_kwarg\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m~/anaconda3/lib/python3.6/site-packages/pandas/io/excel.py\u001b[0m in \u001b[0;36mread_excel\u001b[0;34m(io, sheet_name, header, names, index_col, usecols, squeeze, dtype, engine, converters, 
true_values, false_values, skiprows, nrows, na_values, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, **kwds)\u001b[0m\n\u001b[1;32m 305\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 306\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mio\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mExcelFile\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 307\u001b[0;31m \u001b[0mio\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mExcelFile\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mio\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mengine\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mengine\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 308\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 309\u001b[0m return io.parse(\n", + "\u001b[0;32m~/anaconda3/lib/python3.6/site-packages/pandas/io/excel.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, io, **kwds)\u001b[0m\n\u001b[1;32m 392\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbook\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mxlrd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mopen_workbook\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfile_contents\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mdata\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 393\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_io\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcompat\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstring_types\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 394\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbook\u001b[0m \u001b[0;34m=\u001b[0m 
\u001b[0mxlrd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mopen_workbook\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_io\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 395\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 396\u001b[0m raise ValueError('Must explicitly set engine if not passing in'\n", + "\u001b[0;32m~/anaconda3/lib/python3.6/site-packages/xlrd/__init__.py\u001b[0m in \u001b[0;36mopen_workbook\u001b[0;34m(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows)\u001b[0m\n\u001b[1;32m 114\u001b[0m \u001b[0mpeek\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mfile_contents\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0mpeeksz\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 115\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 116\u001b[0;31m \u001b[0;32mwith\u001b[0m \u001b[0mopen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfilename\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"rb\"\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 117\u001b[0m \u001b[0mpeek\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpeeksz\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 118\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mpeek\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34mb\"PK\\x03\\x04\"\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;31m# a ZIP file\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mFileNotFoundError\u001b[0m: [Errno 2] No such file or directory: '01_Pre-processing_files/listOfUniqueUnassignedAfterGS.xlsx'" + ] + } + ], + "source": [ + "# 1. 
Start-up / What to put into place, where\n", + "# ============================================\n", + "\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from matplotlib.pyplot import pie, axis, show\n", + "import numpy as np\n", + "import requests\n", + "import json\n", + "import lxml.html as lh\n", + "from lxml.html import fromstring\n", + "import time\n", + "import os\n", + "\n", + "# Set working directory\n", + "os.chdir('/Users/wendlingd/webDS')\n", + "\n", + "\n", + "localDir = '02_Run_APIs_files/'\n", + "\n", + "# If you're starting a new session and this is not already open\n", + "listOfUniqueUnassignedAfterGS = '01_Pre-processing_files/listOfUniqueUnassignedAfterGS.xlsx'\n", + "listOfUniqueUnassignedAfterGS = pd.read_excel(listOfUniqueUnassignedAfterGS)\n", + "\n", + "# Bring in historical file of (somewhat edited) matches\n", + "GoldStandard = '01_Pre-processing_files/GoldStandard_master.xlsx'\n", + "GoldStandard = pd.read_excel(GoldStandard)\n", + "\n", + "\n", + "\n", + "# Get API key\n", + "def get_umls_api_key(filename=None):\n", + "    key = os.environ.get('UMLS_API_KEY', None)\n", + "    if key is not None:\n", + "        return key\n", + "    if filename is None:\n", + "        path = os.environ.get('HOME', None)\n", + "        if path is None:\n", + "            path = os.environ.get('USERPROFILE', None)\n", + "        if path is None:\n", + "            path = '.'\n", + "        filename = os.path.join(path, '.umls_api_key')\n", + "    with open(filename, 'r') as f:\n", + "        key = f.readline().strip()\n", + "    return key\n", + "\n", + "myUTSAPIkey = get_umls_api_key()\n", + "\n", + "\n", + "'''\n", + "GoldStandard.xlsx - Already-assigned term list, from UMLS and other sources, \n", + "    vetted.\n", + "'''\n", + "\n", + "\n", + "'''\n", + "SemanticNetworkReference - Customized version of the list at \n", + "https://www.nlm.nih.gov/research/umls/META3_current_semantic_types.html, \n", + "to be used to put search terms into huge bins. 
Should be integrated into \n", + "GoldStandard and be available at the end of the ML matching process.\n", + "'''\n", + "SemanticNetworkReference = '01_Pre-processing_files/SemanticNetworkReference.xlsx'\n", + "\n", + "\n", + "''' \n", + "- Run what remains against the UMLS API.\n", + "\n", + "Requires having your own license and API key; see https://www.nlm.nih.gov/research/umls/\n", + "Not shown here: \n", + "    - In huge files I sort by count and focus on terms searched by multiple\n", + "      or many people. The 'long tail' can be huge.\n", + "    - I have a database of terms already assigned. I match these before \n", + "      contacting UMLS; no need to check them again. Shortens processing time.\n", + "More options:\n", + "    https://documentation.uts.nlm.nih.gov/rest/home.html\n", + "    https://documentation.uts.nlm.nih.gov/rest/concept/\n", + "\n", + "'''\n", + "\n", + "\n", + "\n", + "# unassignedAfterUmls1 = pd.read_excel(localDir + 'unassignedAfterUmls1.xlsx')\n", + "\n", + "'''\n", + "Register at RxNorm, get an API key, and add it below. This is for drug misspellings.\n", + "'''\n", + "\n", + "# Generate a one-day Ticket-Granting-Ticket (TGT)\n", + "tgt = requests.post('https://utslogin.nlm.nih.gov/cas/v1/api-key', data = {'apikey':myUTSAPIkey})\n", + "# For API key get a license from https://www.nlm.nih.gov/research/umls/\n", + "# tgt.text\n", + "response = fromstring(tgt.text)\n", + "todaysTgt = response.xpath('//form/@action')[0]\n", + "\n", + "uiUri = \"https://uts-ws.nlm.nih.gov/rest/search/current?\"\n", + "semUri = \"https://uts-ws.nlm.nih.gov/rest/content/current/CUI/\"\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# 2. UmlsApi1 - Normalized string matching\n", + "# =========================================\n", + "'''\n", + "In this run the API calls use the Normalized String setting. 
Example: \n", + "for the input string Yellow leaves, normalizedString would return two strings, \n", + "leaf yellow and leave yellow. Each string would be matched exactly to the \n", + "strings in the normalized string index to return a result. \n", + "\n", + "Re-start:\n", + "# listOfUniqueUnassignedAfterGS = pd.read_excel('01_Pre-processing_files/listOfUniqueUnassignedAfterGS.xlsx')\n", + "\n", + "listToCheck6 = pd.read_excel(localDir + 'listToCheck6.xlsx')\n", + "listToCheck7 = pd.read_excel(localDir + 'listToCheck7.xlsx')\n", + "'''\n", + "\n", + "# ---------------------------------------\n", + "# Batch rows so you can do separate runs\n", + "# Batches of up to 6,000 rows per run\n", + "# ---------------------------------------\n", + "\n", + "# uniqueSearchTerms = search['adjustedQueryCase'].unique()\n", + "\n", + "# Reduce entry length, to focus on single concepts that the UTS API can match\n", + "listOfUniqueUnassignedAfterGS = listOfUniqueUnassignedAfterGS.loc[(listOfUniqueUnassignedAfterGS['adjustedQueryCase'].str.len() <= 20) == True]\n", + "\n", + "\n", + "# .iloc slice ends are exclusive, so consecutive batches share a boundary (no rows skipped)\n", + "# listToCheck1 = unassignedAfterGS.iloc[0:20]\n", + "listToCheck1 = listOfUniqueUnassignedAfterGS.iloc[0:6000]\n", + "listToCheck2 = listOfUniqueUnassignedAfterGS.iloc[6000:12000]\n", + "listToCheck3 = listOfUniqueUnassignedAfterGS.iloc[12000:18000]\n", + "listToCheck4 = listOfUniqueUnassignedAfterGS.iloc[18000:24000]\n", + "listToCheck5 = listOfUniqueUnassignedAfterGS.iloc[24000:30000]\n", + "listToCheck6 = listOfUniqueUnassignedAfterGS.iloc[30000:36000]\n", + "listToCheck7 = listOfUniqueUnassignedAfterGS.iloc[36000:39523]\n", + "\n", + "\n", + "\n", + "'''\n", + "listToCheck1 = unassignedToCheck.iloc[12497:20000]\n", + "listToCheck2 = unassignedToCheck.iloc[20001:26000]\n", + "listToCheck3 = unassignedToCheck.iloc[23225:28000]\n", + "listToCheck4 = unassignedToCheck.iloc[28001:31256]\n", + "\n", + "mask = (unassignedToCheck['adjustedQueryCase'].str.len() <= 15)\n", + "listToCheck3 = listToCheck3.loc[mask]\n", + 
"listToCheck4 = listToCheck4.loc[mask]\n", + "'''\n", + "\n", + "\n", + "# If multiple sessions required, saving to file might help\n", + "writer = pd.ExcelWriter(localDir + 'listToCheck7.xlsx')\n", + "listToCheck7.to_excel(writer,'listToCheck7')\n", + "# df2.to_excel(writer,'Sheet2')\n", + "writer.save()\n", + "\n", + "writer = pd.ExcelWriter(localDir + 'listToCheck2.xlsx')\n", + "listToCheck2.to_excel(writer,'listToCheck2')\n", + "# df2.to_excel(writer,'Sheet2')\n", + "writer.save()\n", + "\n", + "'''\n", + "OPTIONS\n", + "\n", + "# Bring in from file\n", + "listToCheck3 = pd.read_excel(localDir + 'listToCheck3.xlsx')\n", + "listToCheck4 = pd.read_excel(localDir + 'listToCheck4.xlsx')\n", + "\n", + "listToCheck1 = unassignedAfterGS\n", + "listToCheck2 = unassignedAfterGS.iloc[5001:10000]\n", + "listToCheck1 = unassignedAfterGS.iloc[10001:11335]\n", + "'''\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Run this block after changing listToCheck# top and bottom\n", + "# ----------------------------------------------------------\n", + "'''\n", + "Until you put this into a function, you need to change listToCheck# \n", + "and apiGetNormalizedString# counts every run!\n", + "Stay below 30 API requests per second. 
With 4 API requests per item\n", + "(2 .get and 2 .post requests)...\n", + "time.sleep commented out: 6,000 / 35 min = 171 per minute = 2.9 items per second / 11.4 requests per second\n", + "Computing differently, 6,000 items @ 4 Req per item = 24,000 Req, divided by 35 min+\n", + "686 Req/min = 11.4 Req/sec\n", + "time.sleep(.07): ~38 minutes to do 6,000; 158 per minute / 2.6 items per second\n", + "'''\n", + "\n", + "apiGetNormalizedString = pd.DataFrame()\n", + "apiGetNormalizedString['adjustedQueryCase'] = \"\"\n", + "apiGetNormalizedString['preferredTerm'] = \"\"\n", + "apiGetNormalizedString['SemanticTypeName'] = \"\"\n", + "\n", + "'''\n", + "For file 6, 7/5/18 1:05 p.m.: SSLError: HTTPSConnectionPool(host='utslogin.nlm.nih.gov', \n", + "port=443): Max retries exceeded with url: \n", + "/cas/v1/api-key/TGT-480224-qLwYAMKl5cTfa7Jwb7RWZ3kfexPUm479HfddD7yVUKt79lZ0Ta-cas \n", + "(Caused by SSLError(SSLError(\"bad handshake: SysCallError(60, 'ETIMEDOUT')\",),))\n", + "\n", + "Later, run 6 and 7\n", + "'''\n", + "\n", + "\n", + "for index, row in listToCheck7.iterrows():\n", + " currLogTerm = row['adjustedQueryCase']\n", + " # === Get 'preferred term' and its concept identifier (CUI/UI) =========\n", + " stTicket = requests.post(todaysTgt, data = {'service':'http://umlsks.nlm.nih.gov'}) # Get single-use Service Ticket (ST)\n", + " # Example: GET https://uts-ws.nlm.nih.gov/rest/search/current?string=tylenol&sabs=MSH&ticket=ST-681163-bDfgQz5vKe2DJXvI4Snm-cas\n", + " tQuery = {'string':currLogTerm, 'searchType':'normalizedString', 'ticket':stTicket.text} # removed 'sabs':'MSH', \n", + " getPrefTerm = requests.get(uiUri, params=tQuery)\n", + " getPrefTerm.encoding = 'utf-8'\n", + " tItems = json.loads(getPrefTerm.text)\n", + " tJson = tItems[\"result\"]\n", + " if tJson[\"results\"][0][\"ui\"] != \"NONE\": # Sub-loop to resolve \"NONE\"\n", + " currUi = tJson[\"results\"][0][\"ui\"]\n", + " currPrefTerm = tJson[\"results\"][0][\"name\"]\n", + " # === Get 'semantic 
type' =========\n", + " stTicket = requests.post(todaysTgt, data = {'service':'http://umlsks.nlm.nih.gov'}) # Get single-use Service Ticket (ST)\n", + " # Example: GET https://uts-ws.nlm.nih.gov/rest/content/current/CUI/C0699142?ticket=ST-512564-vUxzyI00ErMRm6tjefNP-cas\n", + " semQuery = {'ticket':stTicket.text}\n", + " getPrefTerm = requests.get(semUri+currUi, params=semQuery)\n", + " getPrefTerm.encoding = 'utf-8'\n", + " semItems = json.loads(getPrefTerm.text)\n", + " semJson = semItems[\"result\"]\n", + " currSemTypes = []\n", + " for name in semJson[\"semanticTypes\"]:\n", + " currSemTypes.append(name[\"name\"]) # + \" ; \"\n", + " # === Post to dataframe =========\n", + " apiGetNormalizedString = apiGetNormalizedString.append(pd.DataFrame({'adjustedQueryCase': currLogTerm, \n", + " 'preferredTerm': currPrefTerm, \n", + " 'SemanticTypeName': currSemTypes[0]}, index=[0]), ignore_index=True)\n", + " print('{} --> {}'.format(currLogTerm, currSemTypes[0])) # Write progress to console\n", + " # time.sleep(.06)\n", + " else:\n", + " # Post \"NONE\" to database and restart loop\n", + " apiGetNormalizedString = apiGetNormalizedString.append(pd.DataFrame({'adjustedQueryCase': currLogTerm, 'preferredTerm': \"NONE\"}, index=[0]), ignore_index=True)\n", + " print('{} --> NONE'.format(currLogTerm, )) # Write progress to console\n", + " # time.sleep(.06)\n", + "print (\"* Done *\")\n", + "\n", + "\n", + "writer = pd.ExcelWriter(localDir + 'apiGetNormalizedString7.xlsx')\n", + "apiGetNormalizedString.to_excel(writer,'apiGetNormalizedString')\n", + "# df2.to_excel(writer,'Sheet2')\n", + "writer.save()\n", + "\n", + "\n", + "# Free up memory: Remove listToCheck, listToCheck1, listToCheck2, listToCheck3, \n", + "# listToCheck4, nonForeign, searchLog, unassignedAfterGS\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# 3. 
Isolate entries updated by API, complete tagging, and match to \n", + "# the current version of the search log - logAfterUmlsApi\n", + "# ==================================================================\n", + "'''\n", + "To Do:\n", + "\n", + " Isolate new assignments and:\n", + " - merge them into the master version of the log\n", + " - add to GoldStandard for next time\n", + " \n", + " # Move unassigned entries into workflow for human identification\n", + " \n", + "To re-start\n", + "\n", + "unassignedAfterGS = pd.read_excel(localDir + 'unassignedAfterGS.xlsx')\n", + "logAfterGoldStandard = pd.read_excel(localDir + 'logAfterGoldStandard.xlsx')\n", + "\n", + "listFromApi = pd.read_excel('02_UMLS_API_files/listFromApi1-April-May.xlsx')\n", + "assignedByUmlsApi = pd.read_excel(localDir + 'assignedByUmlsApi.xlsx')\n", + "\n", + "# Fix temporary issue of nulls in SemanticTypeName, and wrong col name semTypeName\n", + " \n", + "listFromApi.drop(['SemanticTypeName'], axis=1, inplace=True)\n", + "listFromApi.rename(columns={'semTypeName': 'SemanticTypeName'}, inplace=True)\n", + "\n", + "# listFromApi = listFromApi.dropna(subset=['SemanticTypeName'])\n", + " '''\n", + " \n", + "\n", + "# If you stored output from UMLS API in files, re-open and unite\n", + "newAssignments1 = pd.read_excel(localDir + 'apiGetNormalizedString1.xlsx')\n", + "newAssignments2 = pd.read_excel(localDir + 'apiGetNormalizedString2.xlsx')\n", + "newAssignments3 = pd.read_excel(localDir + 'apiGetNormalizedString3.xlsx')\n", + "newAssignments4 = pd.read_excel(localDir + 'apiGetNormalizedString4.xlsx')\n", + "newAssignments5 = pd.read_excel(localDir + 'apiGetNormalizedString5.xlsx')\n", + "newAssignments6 = pd.read_excel(localDir + 'apiGetNormalizedString6.xlsx')\n", + "newAssignments7 = pd.read_excel(localDir + 'apiGetNormalizedString7.xlsx')\n", + "\n", + "\n", + "# Put dataframes together into one; df = df1.append([df2, df3])\n", + "afterUmlsApi1 = newAssignments1.append([newAssignments2, 
newAssignments3, newAssignments4, newAssignments5])\n",
+    "afterUmlsApi1 = afterUmlsApi1.append([newAssignments6, newAssignments7]) # extend the combined frame; don't overwrite it\n",
+    "\n",
+    "\n",
+    "'''\n",
+    "afterUmlsApi1 = afterUmlsApi1.append(newAssignments3)\n",
+    "afterUmlsApi1 = afterUmlsApi1.append(newAssignments4)\n",
+    "'''\n",
+    "\n",
+    "\n",
+    "# If you only used one df for listFromApi\n",
+    "# afterUMLSapi = listFromApi\n",
+    "# assignedByUmlsApi = listFromApi\n",
+    "\n",
+    "\n",
+    "# Reduce to a version that has only successful assignments\n",
+    "\n",
+    "# Remove various problem entries\n",
+    "assignedByUmlsApi1 = afterUmlsApi1.loc[(afterUmlsApi1['preferredTerm'] != \"NONE\")]\n",
+    "assignedByUmlsApi1 = assignedByUmlsApi1[~pd.isnull(assignedByUmlsApi1['preferredTerm'])]\n",
+    "assignedByUmlsApi1 = assignedByUmlsApi1.loc[(assignedByUmlsApi1['preferredTerm'] != \"Null Value\")]\n",
+    "assignedByUmlsApi1 = assignedByUmlsApi1[~pd.isnull(assignedByUmlsApi1['adjustedQueryCase'])]\n",
+    "\n",
+    "\n",
+    "# If you want to send to Excel\n",
+    "writer = pd.ExcelWriter(localDir + 'assignedByUmlsApi1.xlsx')\n",
+    "assignedByUmlsApi1.to_excel(writer,'assignedByUmlsApi1')\n",
+    "# df2.to_excel(writer,'Sheet2')\n",
+    "writer.save()\n",
+    "\n",
+    "\n",
+    "# Bring in subject category master file\n",
+    "# SemanticNetworkReference = pd.read_excel(localDir + 'SemanticNetworkReference.xlsx')\n",
+    "SemanticNetworkReference = pd.read_excel(SemanticNetworkReference)\n",
+    "\n",
+    "# Reduce to required cols\n",
+    "SemTypeData = SemanticNetworkReference[['SemanticTypeName', 'SemanticGroupCode', 'SemanticGroup', 'CustomTreeNumber', 'BranchPosition']]\n",
+    "# SemTypeData.rename(columns={'SemanticTypeName': 'semTypeName'}, inplace=True) # The join col\n",
+    "\n",
+    "# Add more semantic tagging to new UMLS API adds\n",
+    "newUmlsWithSemanticGroupData = pd.merge(assignedByUmlsApi1, SemTypeData, how='left', on='SemanticTypeName')\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+
"source": [
+    "# 4. Create logAfterUmlsApi as an update to logAfterGoldStandard by appending \n",
+    "# newUmlsWithSemanticGroupData\n",
+    "# ============================================================================\n",
+    "\n",
+    "'''\n",
+    "Depending on what you're processing, use either this section or the next one.\n",
+    "\n",
+    "The right choice depends on how you processed the earlier batches - for example, \n",
+    "whether you sent terms all the way down to one occurrence to the API in the \n",
+    "first batch, or not.\n",
+    "'''\n",
+    "\n",
+    "\n",
+    "logAfterGoldStandard = '01_Pre-processing_files/logAfterGoldStandard.xlsx'\n",
+    "logAfterGoldStandard = pd.read_excel(logAfterGoldStandard)\n",
+    "\n",
+    "\n",
+    "'''\n",
+    "# FIXME - Remove after this is fixed within the fixme above.\n",
+    "logAfterGoldStandard = logAfterGoldStandard.sort_values(by='adjustedQueryCase', ascending=True)\n",
+    "logAfterGoldStandard = logAfterGoldStandard.reset_index()\n",
+    "logAfterGoldStandard.drop(['index'], axis=1, inplace=True)\n",
+    "'''\n",
+    "\n",
+    "\n",
+    "# Eyeball. If you need to remove rows...\n",
+    "# logAfterGoldStandard = logAfterGoldStandard.iloc[760:] # remove before index...\n",
+    "\n",
+    "# Join new UMLS API adds to the current search log master\n",
+    "logAfterUmlsApi1 = pd.merge(logAfterGoldStandard, newUmlsWithSemanticGroupData, how='left', on='adjustedQueryCase')\n",
+    "\n",
+    "logAfterUmlsApi1.columns\n",
+    "\n",
+    "'''\n",
+    "['SessionID', 'StaffYN', 'Referrer', 'Query', 'Timestamp',\n",
+    " 'adjustedQueryCase', 'SemanticTypeName_x', 'SemanticGroup_x',\n",
+    " 'SemanticGroupCode_x', 'BranchPosition_x', 'CustomTreeNumber_x',\n",
+    " 'ResourceType', 'Address', 'EntrySource', 'contentSteward',\n",
+    " 'preferredTerm_x', 'SemanticTypeName_y', 'preferredTerm_y',\n",
+    " 'SemanticGroupCode_y', 'SemanticGroup_y', 'CustomTreeNumber_y',\n",
+    " 'BranchPosition_y']\n",
+    "\n",
+    "'''\n",
+    "\n",
+    "\n",
+    "# Future: Look for a better way to do the above - MERGE WITH CONDITIONAL OVERWRITE. 
Temporary fix:\n", + "logAfterUmlsApi1['preferredTerm2'] = logAfterUmlsApi1['preferredTerm_x'].where(logAfterUmlsApi1['preferredTerm_x'].notnull(), logAfterUmlsApi1['preferredTerm_y'])\n", + "logAfterUmlsApi1['SemanticTypeName2'] = logAfterUmlsApi1['SemanticTypeName_x'].where(logAfterUmlsApi1['SemanticTypeName_x'].notnull(), logAfterUmlsApi1['SemanticTypeName_y'])\n", + "logAfterUmlsApi1['SemanticGroup2'] = logAfterUmlsApi1['SemanticGroup_x'].where(logAfterUmlsApi1['SemanticGroup_x'].notnull(), logAfterUmlsApi1['SemanticGroup_y'])\n", + "logAfterUmlsApi1['SemanticGroupCode2'] = logAfterUmlsApi1['SemanticGroupCode_x'].where(logAfterUmlsApi1['SemanticGroupCode_x'].notnull(), logAfterUmlsApi1['SemanticGroupCode_y'])\n", + "logAfterUmlsApi1['BranchPosition2'] = logAfterUmlsApi1['BranchPosition_x'].where(logAfterUmlsApi1['BranchPosition_x'].notnull(), logAfterUmlsApi1['BranchPosition_y'])\n", + "logAfterUmlsApi1['CustomTreeNumber2'] = logAfterUmlsApi1['CustomTreeNumber_x'].where(logAfterUmlsApi1['CustomTreeNumber_x'].notnull(), logAfterUmlsApi1['CustomTreeNumber_y'])\n", + "logAfterUmlsApi1.drop(['preferredTerm_x', 'preferredTerm_y',\n", + " 'SemanticTypeName_x', 'SemanticTypeName_y',\n", + " 'SemanticGroup_x', 'SemanticGroup_y',\n", + " 'SemanticGroupCode_x', 'SemanticGroupCode_y',\n", + " 'BranchPosition_x', 'BranchPosition_y', \n", + " 'CustomTreeNumber_x', 'CustomTreeNumber_y'], axis=1, inplace=True)\n", + "logAfterUmlsApi1.rename(columns={'preferredTerm2': 'preferredTerm',\n", + " 'SemanticTypeName2': 'SemanticTypeName',\n", + " 'SemanticGroup2': 'SemanticGroup',\n", + " 'SemanticGroupCode2': 'SemanticGroupCode',\n", + " 'BranchPosition2': 'BranchPosition',\n", + " 'CustomTreeNumber2': 'CustomTreeNumber'\n", + " }, inplace=True)\n", + "\n", + "# Save to file so you can open in future sessions, if needed\n", + "writer = pd.ExcelWriter(localDir + 'logAfterUmlsApi1.xlsx')\n", + "logAfterUmlsApi1.to_excel(writer,'logAfterUmlsApi1')\n", + "# 
df2.to_excel(writer,'Sheet2')\n", + "writer.save()\n", + "\n", + "'''\n", + "To Do:\n", + " - Create list of unmatched terms with freq\n", + " - Cluster similar spellings together?\n", + " \n", + "- Look at \"Not currently matchable\" terms with \"high\" frequency counts. Eyeball to see if these were incorrectly matched in the past; assign historical term or update all to new term, save in gold standard file.\n", + "- Process entries from the PubMed product page.\n", + "- If you haven't done so, update RegEx list to improve future matching.\n", + "- Every several months, through Flask interface, interactively update the gold standard, manually.\n", + "\n", + "# Reduce logAfterUmlsApi to unique, unmatched entries, prep for ML\n", + "\n", + "To re-start:\n", + "logAfterUmlsApi = pd.read_excel(localDir + 'logAfterUmlsApi.xlsx')\n", + "'''\n", + "\n", + "\n", + "# ------------------------------------\n", + "# Visualize results - logAfterUmlsApi\n", + "# ------------------------------------\n", + " \n", + "# Pie for percentage of rows assigned; https://pythonspot.com/matplotlib-pie-chart/\n", + "totCount = len(logAfterUmlsApi1)\n", + "unassigned = logAfterUmlsApi1['SemanticGroup'].isnull().sum()\n", + "assigned = totCount - unassigned\n", + "labels = ['Assigned', 'Unassigned']\n", + "sizes = [assigned, unassigned]\n", + "colors = ['lightskyblue', 'lightcoral']\n", + "explode = (0.1, 0) # explode 1st slice\n", + "plt.pie(sizes, explode=explode, labels=labels, colors=colors,\n", + " autopct='%1.f%%', shadow=True, startangle=100)\n", + "plt.axis('equal')\n", + "plt.title(\"Status after 'UMLS API' processing\")\n", + "plt.show()\n", + "\n", + "# Bar of SemanticGroup categories, horizontal\n", + "# Source: http://robertmitchellv.com/blog-bar-chart-annotations-pandas-mpl.html\n", + "ax = logAfterUmlsApi1['SemanticGroup'].value_counts().plot(kind='barh', figsize=(10,6),\n", + " color=\"slateblue\", fontsize=10);\n", + "ax.set_alpha(0.8)\n", + "ax.set_title(\"Categories 
assigned after 'UMLS API' processing\", fontsize=14)\n", + "ax.set_xlabel(\"Number of searches\", fontsize=9);\n", + "# set individual bar lables using above list\n", + "for i in ax.patches:\n", + " # get_width pulls left or right; get_y pushes up or down\n", + " ax.text(i.get_width()+.1, i.get_y()+.31, \\\n", + " str(round((i.get_width()), 2)), fontsize=9, color='dimgrey')\n", + "# invert for largest on top \n", + "ax.invert_yaxis()\n", + "plt.gcf().subplots_adjust(left=0.3)\n", + "\n", + "# Remove listOfUniqueUnassignedAfterGS, listToCheck1, etc., logAfterGoldStandard, logAfterUmlsApi1, \n", + "# newAssignments1 etc.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# 5. Update GoldStandard\n", + "# =======================\n", + "\n", + "# Open GoldStandard if needed\n", + "GoldStandard = '01_Pre-processing_files/GoldStandard.xlsx'\n", + "GoldStandard = pd.read_excel(GoldStandard)\n", + "\n", + "# Append fully tagged UMLS API adds to GoldStandard\n", + "GoldStandard = GoldStandard.append(newUmlsWithSemanticGroupData, sort=False)\n", + "\n", + "# Reset index\n", + "GoldStandard = GoldStandard.reset_index()\n", + "GoldStandard.drop(['index'], axis=1, inplace=True)\n", + "# temp GoldStandard.drop(['adjustedQueryCase'], axis=1, inplace=True)\n", + "\n", + "'''\n", + "Eyeball top and bottom of cols, remove rows by Index, if needed\n", + "\n", + "GoldStandard.drop(58027, inplace=True)\n", + "'''\n", + "\n", + "\n", + "# Write out the updated GoldStandard\n", + "writer = pd.ExcelWriter('01_Pre-processing_files/GoldStandard.xlsx')\n", + "GoldStandard.to_excel(writer,'GoldStandard')\n", + "writer.save()\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# 6. 
Start new 'uniques' dataframe that gets new column for each of the below\n", + "# listOfUniqueUnassignedAfterUmls1\n", + "# ============================================================================\n", + "\n", + "'''\n", + "To Do:\n", + " - Create list of unmatched terms with freq\n", + " - Cluster similar spellings together?\n", + " \n", + "- Look at \"Not currently matchable\" terms with \"high\" frequency counts. Eyeball to see if these were incorrectly matched in the past; assign historical term or update all to new term, save in gold standard file.\n", + "- Process entries from the PubMed product page.\n", + "- If you haven't done so, update RegEx list to improve future matching.\n", + "- Every several months, through Flask interface, interactively update the gold standard, manually.\n", + "\n", + "# Reduce logAfterUmlsApi to unique, unmatched entries, prep for ML\n", + "\n", + "To re-start:\n", + "logAfterUmlsApi = pd.read_excel(localDir + 'logAfterUmlsApi.xlsx')\n", + "'''\n", + "\n", + "listOfUniqueUnassignedAfterUmls1 = logAfterUmlsApi1[pd.isnull(logAfterUmlsApi1['SemanticGroup'])]\n", + "listOfUniqueUnassignedAfterUmls1 = listOfUniqueUnassignedAfterUmls1.groupby('adjustedQueryCase').size()\n", + "listOfUniqueUnassignedAfterUmls1 = pd.DataFrame({'timesSearched':listOfUniqueUnassignedAfterUmls1})\n", + "listOfUniqueUnassignedAfterUmls1 = listOfUniqueUnassignedAfterUmls1.sort_values(by='timesSearched', ascending=False)\n", + "listOfUniqueUnassignedAfterUmls1 = listOfUniqueUnassignedAfterUmls1.reset_index()\n", + "\n", + "writer = pd.ExcelWriter(localDir + 'listOfUniqueUnassignedAfterUmls11.xlsx')\n", + "listOfUniqueUnassignedAfterUmls1.to_excel(writer,'unassignedToCheck')\n", + "writer.save()\n", + "\n", + "# FY 18 Q3: 57,287\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# 5. 
Google Translate API, https://cloud.google.com/translate/\n", + "# =============================================================\n", + "'''\n", + "But it's not free; https://stackoverflow.com/questions/37667671/is-it-possible-to-access-to-google-translate-api-for-free\n", + "'''\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# 5. UmlsApi2 - Tag non-English terms in Roman character sets\n", + "# ==========================================================================\n", + "'''\n", + "Some foreign terms can be matched. This run does not return a preferred term,\n", + "just returns what vocabulary the term is found in. \n", + "\n", + "Queries with words not in English are ignored by the first API run using\n", + "\"normalized string\" matching. Here, try flagging what you can and take them \n", + "out of the percent-complete calculation.\n", + "\n", + "The API apparently only supports U.S. English. RegEx could be used to convert\n", + "UTF-8 Roman characters that are not English... Non-Roman languages (Chinese, \n", + "Cyrillic, Arabic, Japanese, etc.) are not supported by the API; these should \n", + "be kept out of the API runs entirely.\n", + "\n", + "6/22/18, from David of UMLS support, TRACKING:000308010\n", + "\n", + "> Can the UMLS REST API tell me the term's language? \n", + "\n", + "One option would be to specify returnIdType=sourceUi for your search \n", + "request. For example: \n", + " \n", + "https://uts-ws.nlm.nih.gov/rest/search/current?string=Infarto de miocardio&returnIdType=sourceUi&ticket=\n", + "\n", + "This will give you a set of codes back where there is a match, but will \n", + "also return a vocabulary (rootSource). If you have that, you can get \n", + "the language (in this case, Spanish). The first result may be all you \n", + "need. 
If you have the rootSource, you can match it to the \"abbreviation\" \n", + "and look up the language here: https://uts-ws.nlm.nih.gov/rest/metadata/current/sources. \n", + " \n", + "It won't be perfect. I'm seeing some problems with accented characters. \n", + "For example, coração returns no results, so that's not great, but may \n", + "not matter. Some strings will appear in multiple languages, too. \n", + " \n", + "Let me know how that works for you. - David\n", + "'''\n", + "\n", + "\n", + "# ------------------------------------------------------\n", + "# Batch up your API runs. Re-starting, correcting, etc.\n", + "# ------------------------------------------------------\n", + "\n", + "# uniqueSearchTerms = search['adjustedQueryCase'].unique()\n", + "\n", + "# vocabCheck1 = unassignedAfterGS.iloc[0:20]\n", + "vocabCheck1 = listOfUniqueUnassignedAfterUmls1.iloc[0:5000]\n", + "# vocabCheck2 = listOfUniqueUnassignedAfterUmls1.iloc[5001:10678]\n", + "\n", + "\n", + "# If multiple sessions required, saving to file might help\n", + "writer = pd.ExcelWriter(localDir + 'vocabCheck1.xlsx')\n", + "vocabCheck1.to_excel(writer,'vocabCheck')\n", + "# df2.to_excel(writer,'Sheet2')\n", + "writer.save()\n", + "\n", + "\n", + "'''\n", + "writer = pd.ExcelWriter(localDir + 'listToCheck2.xlsx')\n", + "listToCheck2.to_excel(writer,'listToCheck2')\n", + "# df2.to_excel(writer,'Sheet2')\n", + "writer.save()\n", + "'''\n", + "\n", + "\n", + "'''\n", + "OPTIONS\n", + "\n", + "# Bring in from file\n", + "listToCheck3 = pd.read_excel('01 Pre-process/listToCheck3.xlsx')\n", + "listToCheck4 = pd.read_excel('01 Pre-process/listToCheck4.xlsx')\n", + "\n", + "listToCheck1 = unassignedAfterGS\n", + "listToCheck2 = unassignedAfterGS.iloc[5001:10000]\n", + "listToCheck1 = unassignedAfterGS.iloc[10001:11335]\n", + "'''\n", + "\n", + "\n", + "'''\n", + "Work with PostMan app to test/approve\n", + "\n", + "David from UMLS: One option would be to specify returnIdType=sourceUi for \n", + "your 
search request. For example, see the full support reply quoted in the cell above.\n",
+    "'''\n",
+    "\n",
+    "'''\n",
+    "FIXME - Unfinished.\n",
+    "\n",
+    "TGT-16294-ajZgfOTNGBxvzAXAvQslZtuL2U0HksFsED6tZ0ajoewNBNdSVz-cas\n",
+    "\n",
+    "\n",
+    "# THIS IS SOURCE VOCAB CODE\n",
+    "\n",
+    "https://uts-ws.nlm.nih.gov/rest/search/current?string=Infarto de miocardio&returnIdType=sourceUi&ticket=\n",
+    "'''\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Gather list of source vocabularies\n",
+    "# -----------------------------------\n",
+    "\n",
+    "uiUri = \"https://uts-ws.nlm.nih.gov/rest/search/current?returnIdType=sourceUi\"\n",
+    "\n",
+    "listOfSourceVocabularies = pd.DataFrame()\n",
+    "listOfSourceVocabularies['adjustedQueryCase'] = \"\"\n",
+    "listOfSourceVocabularies['sourceVocab'] = \"\"\n",
+    "\n",
+    "for index, row in listToCheck1.iterrows():\n",
+    "    currLogTerm = row['adjustedQueryCase']\n",
+    "    # === Get 'source vocab' =========\n",
+    "    stTicket = requests.post(todaysTgt, data = {'service':'http://umlsks.nlm.nih.gov'}) # Get single-use Service Ticket (ST)\n",
+    "    # Example: GET 
https://uts-ws.nlm.nih.gov/rest/search/current?string=tylenol&sabs=MSH&ticket=ST-681163-bDfgQz5vKe2DJXvI4Snm-cas\n", + " termQuery = {'string':currLogTerm, 'ticket':stTicket.text} # removed 'searchType':'word' (it's the default), 'sabs':'MSH', \n", + " getSourceVocab = requests.get(uiUri, params=termQuery)\n", + " getSourceVocab.encoding = 'utf-8'\n", + " tItems = json.loads(getSourceVocab.text)\n", + " tJson = tItems[\"result\"]\n", + " if tJson[\"results\"][0][\"ui\"] != \"NONE\": # Sub-loop to resolve \"NONE\"\n", + " currUi = tJson[\"results\"][0][\"rootSource\"]\n", + " sourceVocab = tJson[\"results\"][0][\"rootSource\"]\n", + " # === Post to dataframe =========\n", + " listOfSourceVocabularies = listOfSourceVocabularies.append(pd.DataFrame({'adjustedQueryCase': currLogTerm, \n", + " 'sourceVocab': sourceVocab}, index=[0]), ignore_index=True)\n", + " print('{} --> {}'.format(currLogTerm, sourceVocab)) # Write progress to console\n", + " time.sleep(.07)\n", + " else:\n", + " # Post \"NONE\" to database and restart loop\n", + " listOfSourceVocabularies = listOfSourceVocabularies.append(pd.DataFrame({'adjustedQueryCase': currLogTerm, 'sourceVocab': \"NONE\"}, index=[0]), ignore_index=True)\n", + " print('{} --> NONE'.format(currLogTerm, )) # Write progress to console\n", + " time.sleep(.07)\n", + "print (\"* Done *\")\n", + "\n", + "\n", + "writer = pd.ExcelWriter(localDir + 'listOfSourceVocabularies.xlsx')\n", + "listOfSourceVocabularies.to_excel(writer,'listOfSourceVocabularies')\n", + "# df2.to_excel(writer,'Sheet2')\n", + "writer.save()\n", + "\n", + "# Free up memory: Remove listToCheck, listToCheck1, listToCheck2, listToCheck3, \n", + "# listToCheck4, nonForeign, searchLog, unassignedAfterGS\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "# Load external reference file: SourceVocabsForeign.xlsx\n", + "\n", + "# F&R Foreign vocab names with the language name, \"Spanish,\" 
\"Swedish\"\n", + "\n", + "# Append to running list of updates\n", + "\n", + "\n", + "\n", + "# ------------------------------------------------------\n", + "# Match vocabCheck \n", + "# ------------------------------------------------------\n", + "'''\n", + "FIXME - RESULTING LIST NEEDS TO BE VETTED; START WITH HIGHEST-FREQUENCY USE.\n", + "\n", + "Update naming? this is the result from the API run for languages\n", + "\n", + "This custom list of vocabs does not include and English vocabs, therefore, \n", + "only foreign matches are returned, which is what we want.\n", + "\n", + "Re-start:\n", + "listOfSourceVocabularies = pd.read_excel(localDir + 'listOfSourceVocabularies.xlsx')\n", + "'''\n", + "\n", + "# Load list of Non-English vocabularies\n", + "# 7/5/2018, https://www.nlm.nih.gov/research/umls/sourcereleasedocs/index.html (English vocabs not included.)\n", + "UMLS_NonEnglish_Vocabularies = pd.read_excel(localDir + 'UMLS_Non-English_Vocabularies.xlsx')\n", + "\n", + "# Inner join\n", + "foreignButEnglishChar = pd.merge(listOfSourceVocabularies, UMLS_NonEnglish_Vocabularies, how='inner', left_on='sourceVocab', right_on='Vocabulary')\n", + "\n", + "\n", + "# Get frequency count, reduce cols for easier manual checking\n", + "PerhapsForeign = pd.merge(foreignButEnglishChar, listOfUniqueUnassignedAfterUmls1, how='inner', on='adjustedQueryCase')\n", + "\n", + "PerhapsForeign = PerhapsForeign.sort_values(by='timesSearched', ascending=False)\n", + "PerhapsForeign = PerhapsForeign.reset_index()\n", + "PerhapsForeign.drop(['index'], axis=1, inplace=True)\n", + "col = ['adjustedQueryCase', 'timesSearched', 'Language']\n", + "PerhapsForeign = PerhapsForeign[col]\n", + "PerhapsForeign.rename(columns={'Language': 'LanguageGuess'}, inplace=True)\n", + "\n", + "# Send out for manual checking\n", + "writer = pd.ExcelWriter(localDir + 'PerhapsForeign.xlsx')\n", + "PerhapsForeign.to_excel(writer,'PerhapsForeign')\n", + "# df2.to_excel(writer,'Sheet2')\n", + "writer.save()\n", 
+    "\n",
+    "'''\n",
+    "In Excel or Flask, delete rows with terms that we use in English; check that \n",
+    "the remaining rows contain terms that most English speakers would think are \n",
+    "foreign. \n",
+    "Supplement cols for the definite foreign terms, append to GoldStandard as \n",
+    "foreign terms.\n",
+    "'''\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "# Update GoldStandard with edits from PerhapsForeign result\n",
+    "\n",
+    "# Update current log file from PerhapsForeign result\n",
+    "\n",
+    "# Create new 'uniques' list for FuzzyWuzzy\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# 5. Second UMLS API clean-up - Create logAfterUmlsApi2 as an \n",
+    "# update to logAfterUmlsApi by appending newUmlsWithSemanticGroupData\n",
+    "# ===========================================================================\n",
+    "'''\n",
+    "Use this AFTER you do a SECOND run against the UMLS Metathesaurus API.\n",
+    "\n",
+    "Re-start: \n",
+    "logAfterUmlsApi2 = pd.read_excel(localDir + 'logAfterUmlsApi2.xlsx')\n",
+    "'''\n",
+    "\n",
+    "logAfterUmlsApi2 = pd.read_excel(localDir + 'logAfterUmlsApi1.xlsx')\n",
+    "\n",
+    "# FIXME - Remove after this is fixed within the fixme above.\n",
+    "logAfterUmlsApi2 = logAfterUmlsApi2.sort_values(by='adjustedQueryCase', ascending=False)\n",
+    "logAfterUmlsApi2 = logAfterUmlsApi2.reset_index()\n",
+    "logAfterUmlsApi2.drop(['index'], axis=1, inplace=True)\n",
+    "\n",
+    "\n",
+    "# Join new UMLS API adds to the current search log master\n",
+    "logAfterUmlsApi2 = pd.merge(logAfterUmlsApi2, newUmlsWithSemanticGroupData, how='left', on='adjustedQueryCase')\n",
+    "\n",
+    "# Future: Look for a better way to do the above - MERGE WITH CONDITIONAL OVERWRITE. 
Temporary fix:\n", + "logAfterUmlsApi2['preferredTerm2'] = logAfterUmlsApi2['preferredTerm_x'].where(logAfterUmlsApi2['preferredTerm_x'].notnull(), logAfterUmlsApi2['preferredTerm_y'])\n", + "logAfterUmlsApi2['SemanticTypeName2'] = logAfterUmlsApi2['SemanticTypeName_x'].where(logAfterUmlsApi2['SemanticTypeName_x'].notnull(), logAfterUmlsApi2['SemanticTypeName_y'])\n", + "logAfterUmlsApi2['SemanticGroupCode2'] = logAfterUmlsApi2['SemanticGroupCode_x'].where(logAfterUmlsApi2['SemanticGroupCode_x'].notnull(), logAfterUmlsApi2['SemanticGroupCode_y'])\n", + "logAfterUmlsApi2['SemanticGroup2'] = logAfterUmlsApi2['SemanticGroup_x'].where(logAfterUmlsApi2['SemanticGroup_x'].notnull(), logAfterUmlsApi2['SemanticGroup_y'])\n", + "logAfterUmlsApi2['BranchPosition2'] = logAfterUmlsApi2['BranchPosition_x'].where(logAfterUmlsApi2['BranchPosition_x'].notnull(), logAfterUmlsApi2['BranchPosition_y'])\n", + "logAfterUmlsApi2['CustomTreeNumber2'] = logAfterUmlsApi2['CustomTreeNumber_x'].where(logAfterUmlsApi2['CustomTreeNumber_x'].notnull(), logAfterUmlsApi2['CustomTreeNumber_y'])\n", + "logAfterUmlsApi2.drop(['preferredTerm_x', 'preferredTerm_y',\n", + " 'SemanticTypeName_x', 'SemanticTypeName_y',\n", + " 'SemanticGroup_x', 'SemanticGroup_y',\n", + " 'SemanticGroupCode_x', 'SemanticGroupCode_y',\n", + " 'BranchPosition_x', 'BranchPosition_y', \n", + " 'CustomTreeNumber_x', 'CustomTreeNumber_y'], axis=1, inplace=True)\n", + "logAfterUmlsApi2.rename(columns={'preferredTerm2': 'preferredTerm',\n", + " 'SemanticTypeName2': 'SemanticTypeName',\n", + " 'SemanticGroup2': 'SemanticGroup',\n", + " 'SemanticGroupCode2': 'SemanticGroupCode',\n", + " 'BranchPosition2': 'BranchPosition',\n", + " 'CustomTreeNumber2': 'CustomTreeNumber'\n", + " }, inplace=True)\n", + "\n", + "# Save to file so you can open in future sessions, if needed\n", + "writer = pd.ExcelWriter(localDir + 'logAfterUmlsApi2.xlsx')\n", + "logAfterUmlsApi2.to_excel(writer,'logAfterUmlsApi2')\n", + "# 
df2.to_excel(writer,'Sheet2')\n", + "writer.save()\n", + "\n", + "\n", + "\n", + "# -----------------------------------------------\n", + "# Create files to assign Semantic Types manually\n", + "# -----------------------------------------------\n", + "'''\n", + "If you want to add matches manually using two spreadsheet windows\n", + "To do in Python - cluster:\n", + " - Probable person names\n", + " - Probable NLM products, services, web pages\n", + " - Probable journal names\n", + "'''\n", + "\n", + "col = ['SemanticGroup', 'SemanticTypeName', 'Definition', 'Examples']\n", + "SemRef = SemanticNetworkReference[col]\n", + "\n", + "# Get class distributions if you want to bolster under-represented sem types\n", + "\n", + "currentSemTypeCount = GoldStandard['SemanticTypeName'].value_counts()\n", + "currentSemTypeCount = pd.DataFrame({'TypeCount':currentSemTypeCount})\n", + "currentSemTypeCount.sort_values(\"TypeCount\", ascending=True, inplace=True)\n", + "currentSemTypeCount = currentSemTypeCount.reset_index()\n", + "currentSemTypeCount = currentSemTypeCount.rename(columns={'index': 'SemanticTypeName'})\n", + "\n", + "\n", + "\n", + "# ------------------------------------\n", + "# Visualize results - logAfterUmlsApi2\n", + "# ------------------------------------\n", + " \n", + "# Pie for percentage of rows assigned; https://pythonspot.com/matplotlib-pie-chart/\n", + "totCount = len(logAfterUmlsApi2)\n", + "unassigned = logAfterUmlsApi2['SemanticGroup'].isnull().sum()\n", + "assigned = totCount - unassigned\n", + "labels = ['Assigned', 'Unassigned']\n", + "sizes = [assigned, unassigned]\n", + "colors = ['lightskyblue', 'lightcoral']\n", + "explode = (0.1, 0) # explode 1st slice\n", + "plt.pie(sizes, explode=explode, labels=labels, colors=colors,\n", + " autopct='%1.f%%', shadow=True, startangle=100)\n", + "plt.axis('equal')\n", + "plt.title(\"Status after 'UMLS API 2' processing\")\n", + "plt.show()\n", + "\n", + "# Bar of SemanticGroup categories, horizontal\n", + "# 
Source: http://robertmitchellv.com/blog-bar-chart-annotations-pandas-mpl.html\n",
+    "ax = logAfterUmlsApi2['SemanticGroup'].value_counts().plot(kind='barh', figsize=(10,6),\n",
+    "                                                 color=\"slateblue\", fontsize=10);\n",
+    "ax.set_alpha(0.8)\n",
+    "ax.set_title(\"Categories assigned after 'UMLS API 2' processing\", fontsize=14)\n",
+    "ax.set_xlabel(\"Number of searches\", fontsize=9);\n",
+    "# set individual bar labels using above list\n",
+    "for i in ax.patches:\n",
+    "    # get_width pulls left or right; get_y pushes up or down\n",
+    "    ax.text(i.get_width()+.1, i.get_y()+.31, \\\n",
+    "            str(round((i.get_width()), 2)), fontsize=9, color='dimgrey')\n",
+    "# invert for largest on top \n",
+    "ax.invert_yaxis()\n",
+    "plt.gcf().subplots_adjust(left=0.3)\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# 6. Create new 'uniques' dataframe/file for fuzzy matching\n",
+    "# ===========================================================================\n",
+    "'''\n",
+    "Re-start\n",
+    "\n",
+    "logAfterUmlsApi1 = pd.read_excel(localDir + 'logAfterUmlsApi1.xlsx')\n",
+    "\n",
+    "# Set a date range\n",
+    "AprMay = logAfterUmlsApi1[(logAfterUmlsApi1['Timestamp'] > '2018-04-01 01:00:00') & (logAfterUmlsApi1['Timestamp'] < '2018-06-01 00:00:00')]\n",
+    "\n",
+    "logAfterUmlsApi2 = AprMay\n",
+    "\n",
+    "# Restrict to NLM Home\n",
+    "searchfor = ['www.nlm.nih.gov$', 'www.nlm.nih.gov/$']\n",
+    "logAfterUmlsApi2 = logAfterUmlsApi2[logAfterUmlsApi2.Referrer.str.contains('|'.join(searchfor))]\n",
+    "'''\n",
+    "\n",
+    "\n",
+    "listOfUniqueUnassignedAfterUmls2 = logAfterUmlsApi2[pd.isnull(logAfterUmlsApi2['preferredTerm'])]\n",
+    "listOfUniqueUnassignedAfterUmls2 = 
listOfUniqueUnassignedAfterUmls2.groupby('adjustedQueryCase').size()\n", + "listOfUniqueUnassignedAfterUmls2 = pd.DataFrame({'timesSearched':listOfUniqueUnassignedAfterUmls2})\n", + "listOfUniqueUnassignedAfterUmls2 = listOfUniqueUnassignedAfterUmls2.sort_values(by='timesSearched', ascending=False)\n", + "listOfUniqueUnassignedAfterUmls2 = listOfUniqueUnassignedAfterUmls2.reset_index()\n", + "\n", + "writer = pd.ExcelWriter(localDir + 'unassignedToCheck2.xlsx')\n", + "listOfUniqueUnassignedAfterUmls2.to_excel(writer,'unassignedToCheck')\n", + "writer.save()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/02_Run_APIs.py b/02_Run_APIs.py new file mode 100644 index 0000000..211b5b3 --- /dev/null +++ b/02_Run_APIs.py @@ -0,0 +1,983 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +""" +Created on Thu Jun 28 15:33:33 2018 + +@author: dan.wendling@nih.gov + +Last modified: 2018-07-09 + + +---------------- +SCRIPT CONTENTS +---------------- + +1. Start-up +2. UmlsApi1 - Normalized string matching +3. Isolate entries updated by API, complete tagging, and match to the + current version of the search log - logAfterUmlsApi +4. Create logAfterUmlsApi as an update to logAfterGoldStandard by appending + newUmlsWithSemanticGroupData +5. Update GoldStandard +6. Create new 'uniques' dataframe/file for fuzzy matching + + +7. Google Translate API, https://cloud.google.com/translate/ +But it's not free; https://stackoverflow.com/questions/37667671/is-it-possible-to-access-to-google-translate-api-for-free + +8. UmlsApi2 - Tag non-English terms in Roman character sets + + + +7. 
UmlsApi3 - Word matching (relax prediction rules)

+8. RxNorm API
+
+9. UmlsApi4 - Re-run first config - Create logAfterUmlsApi4 as an
+   update to logAfterUmlsApi by appending newUmlsWithSemanticGroupData
+
+10. Create updated training file (GoldStandard) for ML script
+
+
+----------------------------------
+FIXME - DAN'S TO-DO ITEMS FOR DAN
+----------------------------------
+
+Change SemanticNetworkReference.UniqueID to SemanticTypeCode
+Add SemanticNetworkReference.SemanticTypeCode to what goes into the logs, for ML.
+
+
+----------
+RESOURCES
+----------
+
+Register at UMLS, get a UMLS-UTS API key, and add it below. This is the
+primary source for Semantic Type classifications.
+https://documentation.uts.nlm.nih.gov/rest/authentication.html
+UMLS quick start:
+UMLS description of what the Normalized String option is:
+https://uts.nlm.nih.gov/doc/devGuide/webservices/metaops/find/find2.html
+"""
+
+
+#%%
+# ============================================
+# 1. Start-up / What to put into place, where
+# ============================================
+
+import pandas as pd
+import matplotlib.pyplot as plt
+from matplotlib.pyplot import pie, axis, show
+import numpy as np
+import requests
+import json
+import lxml.html as lh
+from lxml.html import fromstring
+import time
+import os
+
+# Set working directory
+os.chdir('/Users/wendlingd/Projects/webDS/_util')
+
+
+localDir = '02_Run_APIs_files/'
+
+# If you're starting a new session and this file is not already open
+listOfUniqueUnassignedAfterGS = '01_Pre-processing_files/listOfUniqueUnassignedAfterGS.xlsx'
+listOfUniqueUnassignedAfterGS = pd.read_excel(listOfUniqueUnassignedAfterGS)
+
+# Bring in historical file of (somewhat edited) matches
+GoldStandard = '01_Pre-processing_files/GoldStandard.xlsx'
+GoldStandard = pd.read_excel(GoldStandard)
+
+
+
+# Get the API key from the UMLS_API_KEY environment variable or a local key
+# file; never hard-code the key in the script
+def get_umls_api_key(filename=None):
+    key = os.environ.get('UMLS_API_KEY', None)
+    if key is not None:
+        return key
+    
if filename is None: + path = os.environ.get('HOME', None) + if path is None: + path = os.environ.get('USERPROFILE', None) + if path is None: + path = '.' + filename = os.path.join(path, '.umls_api_key') + with open(filename, 'r') as f: + key = f.readline().strip() + return key + +myUTSAPIkey = get_umls_api_key() + + +''' +GoldStandard.xlsx - Already-assigned term list, from UMLS and other sources, + vetted. +''' + + +''' +SemanticNetworkReference - Customized version of the list at +https://www.nlm.nih.gov/research/umls/META3_current_semantic_types.html, +to be used to put search terms into huge bins. Should be integrated into +GoldStandard and be available at the end of the ML matching process. +''' +SemanticNetworkReference = '01_Pre-processing_files/SemanticNetworkReference.xlsx' + + +''' +- Run what remains against the UMLS API. + +Requires having your own license and API key; see https://www.nlm.nih.gov/research/umls/ +Not shown here: + - In huge files I sort by count and focus on terms searched by multiple + or many people. The 'long tail' can be huge. + - I have a database of terms aready assigned. I match these before + contacting UMLS; no need to check them again. Shortens processing time. +More options: + https://documentation.uts.nlm.nih.gov/rest/home.html + https://documentation.uts.nlm.nih.gov/rest/concept/ + +''' + + + +# unassignedAfterUmls1 = pd.read_excel(localdir + 'unassignedAfterUmls1.xlsx') + +''' +Register at RxNorm, get API key, and add it below. This is for drug misspellings. +''' + +# Generate a one-day Ticket-Granting-Ticket (TGT) +tgt = requests.post('https://utslogin.nlm.nih.gov/cas/v1/api-key', data = {'apikey':myUTSAPIkey}) +# For API key get a license from https://www.nlm.nih.gov/research/umls/ +# tgt.text +response = fromstring(tgt.text) +todaysTgt = response.xpath('//form/@action')[0] + +uiUri = "https://uts-ws.nlm.nih.gov/rest/search/current?" 
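The two-step CAS flow above (one day-long Ticket-Granting Ticket obtained from the API key, then one single-use Service Ticket per request) can be sketched without touching the network. The HTML snippet and helper names below are illustrative assumptions, not real UTS responses; the script itself uses lxml's `fromstring` instead of a regex:

```python
# Dependency-free sketch of the UTS ticket flow: the CAS api-key endpoint
# returns an HTML form whose action URL *is* the TGT; each search then needs
# a fresh single-use Service Ticket posted against that TGT URL.
import re

def extract_tgt_url(cas_html):
    """Pull the TGT URL out of the form's action attribute (like //form/@action)."""
    match = re.search(r'<form[^>]*\baction="([^"]+)"', cas_html)
    if match is None:
        raise ValueError('No form action found in CAS response')
    return match.group(1)

def build_search_params(term, service_ticket):
    """Query parameters for one normalizedString search, as used in the loop below."""
    return {'string': term, 'searchType': 'normalizedString', 'ticket': service_ticket}

# Illustrative snippet only -- a real CAS response is a full HTML page
sample = '<form action="https://utslogin.nlm.nih.gov/cas/v1/tickets/TGT-EXAMPLE-cas" method="POST"></form>'
print(extract_tgt_url(sample))
```

Keeping the ticket handling in small helpers like these also makes the rate-limited request loops further down easier to read and retry.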
+semUri = "https://uts-ws.nlm.nih.gov/rest/content/current/CUI/"
+
+
+
+#%%
+# =========================================
+# 2. UmlsApi1 - Normalized string matching
+# =========================================
+'''
+In this run the API calls use the Normalized String setting. Example:
+for the input string Yellow leaves, normalizedString would return two strings,
+leaf yellow and leave yellow. Each string would be matched exactly to the
+strings in the normalized string index to return a result.
+
+Re-start:
+# listOfUniqueUnassignedAfterGS = pd.read_excel('01_Pre-processing_files/listOfUniqueUnassignedAfterGS.xlsx')
+
+listToCheck6 = pd.read_excel(localDir + 'listToCheck6.xlsx')
+listToCheck7 = pd.read_excel(localDir + 'listToCheck7.xlsx')
+'''
+
+# ---------------------------------------
+# Batch rows so you can do separate runs
+# Max of 5,000 rows per run
+# ---------------------------------------
+
+# uniqueSearchTerms = search['adjustedQueryCase'].unique()
+
+# Reduce entry length, to focus on single concepts that UTS API can match
+listOfUniqueUnassignedAfterGS = listOfUniqueUnassignedAfterGS.loc[(listOfUniqueUnassignedAfterGS['adjustedQueryCase'].str.len() <= 20) == True]
+
+
+# .iloc end points are exclusive, so each batch starts where the last one ended
+# listToCheck1 = unassignedAfterGS.iloc[0:20]
+listToCheck1 = listOfUniqueUnassignedAfterGS.iloc[0:6000]
+listToCheck2 = listOfUniqueUnassignedAfterGS.iloc[6000:12000]
+listToCheck3 = listOfUniqueUnassignedAfterGS.iloc[12000:18000]
+listToCheck4 = listOfUniqueUnassignedAfterGS.iloc[18000:24000]
+listToCheck5 = listOfUniqueUnassignedAfterGS.iloc[24000:30000]
+listToCheck6 = listOfUniqueUnassignedAfterGS.iloc[30000:36000]
+listToCheck7 = listOfUniqueUnassignedAfterGS.iloc[36000:39523]
+
+
+
+'''
+listToCheck1 = unassignedToCheck.iloc[12497:20000]
+listToCheck2 = unassignedToCheck.iloc[20001:26000]
+listToCheck3 = unassignedToCheck.iloc[23225:28000]
+listToCheck4 = unassignedToCheck.iloc[28001:31256]
+
+mask = (unassignedToCheck['adjustedQueryCase'].str.len() <= 15)
+listToCheck3 = 
listToCheck3.loc[mask] +listToCheck4 = listToCheck4.loc[mask] +''' + + +# If multiple sessions required, saving to file might help +writer = pd.ExcelWriter(localDir + 'listToCheck7.xlsx') +listToCheck7.to_excel(writer,'listToCheck7') +# df2.to_excel(writer,'Sheet2') +writer.save() + +writer = pd.ExcelWriter(localDir + 'listToCheck2.xlsx') +listToCheck2.to_excel(writer,'listToCheck2') +# df2.to_excel(writer,'Sheet2') +writer.save() + +''' +OPTIONS + +# Bring in from file +listToCheck3 = pd.read_excel(localDir + 'listToCheck3.xlsx') +listToCheck4 = pd.read_excel(localDir + 'listToCheck4.xlsx') + +listToCheck1 = unassignedAfterGS +listToCheck2 = unassignedAfterGS.iloc[5001:10000] +listToCheck1 = unassignedAfterGS.iloc[10001:11335] +''' + + +#%% +# ---------------------------------------------------------- +# Run this block after changing listToCheck# top and bottom +# ---------------------------------------------------------- +''' +Until you put this into a function, you need to change listToCheck# +and apiGetNormalizedString# counts every run! +Stay below 30 API requests per second. With 4 API requests per item +(2 .get and 2 .post requests)... 
+time.sleep commented out: 6,000 / 35 min = 171 per minute = 2.9 items per second / 11.4 requests per second +Computing differently, 6,000 items @ 4 Req per item = 24,000 Req, divided by 35 min+ +686 Req/min = 11.4 Req/sec +time.sleep(.07): ~38 minutes to do 6,000; 158 per minute / 2.6 items per second +''' + +apiGetNormalizedString = pd.DataFrame() +apiGetNormalizedString['adjustedQueryCase'] = "" +apiGetNormalizedString['preferredTerm'] = "" +apiGetNormalizedString['SemanticTypeName'] = "" + +''' +For file 6, 7/5/18 1:05 p.m.: SSLError: HTTPSConnectionPool(host='utslogin.nlm.nih.gov', +port=443): Max retries exceeded with url: +/cas/v1/api-key/TGT-480224-qLwYAMKl5cTfa7Jwb7RWZ3kfexPUm479HfddD7yVUKt79lZ0Ta-cas +(Caused by SSLError(SSLError("bad handshake: SysCallError(60, 'ETIMEDOUT')",),)) + +Later, run 6 and 7 +''' + + +for index, row in listToCheck7.iterrows(): + currLogTerm = row['adjustedQueryCase'] + # === Get 'preferred term' and its concept identifier (CUI/UI) ========= + stTicket = requests.post(todaysTgt, data = {'service':'http://umlsks.nlm.nih.gov'}) # Get single-use Service Ticket (ST) + # Example: GET https://uts-ws.nlm.nih.gov/rest/search/current?string=tylenol&sabs=MSH&ticket=ST-681163-bDfgQz5vKe2DJXvI4Snm-cas + tQuery = {'string':currLogTerm, 'searchType':'normalizedString', 'ticket':stTicket.text} # removed 'sabs':'MSH', + getPrefTerm = requests.get(uiUri, params=tQuery) + getPrefTerm.encoding = 'utf-8' + tItems = json.loads(getPrefTerm.text) + tJson = tItems["result"] + if tJson["results"][0]["ui"] != "NONE": # Sub-loop to resolve "NONE" + currUi = tJson["results"][0]["ui"] + currPrefTerm = tJson["results"][0]["name"] + # === Get 'semantic type' ========= + stTicket = requests.post(todaysTgt, data = {'service':'http://umlsks.nlm.nih.gov'}) # Get single-use Service Ticket (ST) + # Example: GET https://uts-ws.nlm.nih.gov/rest/content/current/CUI/C0699142?ticket=ST-512564-vUxzyI00ErMRm6tjefNP-cas + semQuery = {'ticket':stTicket.text} + getPrefTerm = 
requests.get(semUri+currUi, params=semQuery) + getPrefTerm.encoding = 'utf-8' + semItems = json.loads(getPrefTerm.text) + semJson = semItems["result"] + currSemTypes = [] + for name in semJson["semanticTypes"]: + currSemTypes.append(name["name"]) # + " ; " + # === Post to dataframe ========= + apiGetNormalizedString = apiGetNormalizedString.append(pd.DataFrame({'adjustedQueryCase': currLogTerm, + 'preferredTerm': currPrefTerm, + 'SemanticTypeName': currSemTypes[0]}, index=[0]), ignore_index=True) + print('{} --> {}'.format(currLogTerm, currSemTypes[0])) # Write progress to console + # time.sleep(.06) + else: + # Post "NONE" to database and restart loop + apiGetNormalizedString = apiGetNormalizedString.append(pd.DataFrame({'adjustedQueryCase': currLogTerm, 'preferredTerm': "NONE"}, index=[0]), ignore_index=True) + print('{} --> NONE'.format(currLogTerm, )) # Write progress to console + # time.sleep(.06) +print ("* Done *") + + +writer = pd.ExcelWriter(localDir + 'apiGetNormalizedString7.xlsx') +apiGetNormalizedString.to_excel(writer,'apiGetNormalizedString') +# df2.to_excel(writer,'Sheet2') +writer.save() + + +# Free up memory: Remove listToCheck, listToCheck1, listToCheck2, listToCheck3, +# listToCheck4, nonForeign, searchLog, unassignedAfterGS + + +#%% +# ================================================================== +# 3. 
Isolate entries updated by API, complete tagging, and match to
+#    the current version of the search log - logAfterUmlsApi
+# ==================================================================
+'''
+To Do:
+
+    Isolate new assignments and:
+    - merge them into the master version of the log
+    - add to GoldStandard for next time
+
+    # Move unassigned entries into workflow for human identification
+
+To re-start
+
+unassignedAfterGS = pd.read_excel(localDir + 'unassignedAfterGS.xlsx')
+logAfterGoldStandard = pd.read_excel(localDir + 'logAfterGoldStandard.xlsx')
+
+listFromApi = pd.read_excel('02_UMLS_API_files/listFromApi1-April-May.xlsx')
+assignedByUmlsApi = pd.read_excel(localDir + 'assignedByUmlsApi.xlsx')
+
+# Fix temporary issue of nulls in SemanticTypeName, and wrong col name semTypeName
+
+listFromApi.drop(['SemanticTypeName'], axis=1, inplace=True)
+listFromApi.rename(columns={'semTypeName': 'SemanticTypeName'}, inplace=True)
+
+# listFromApi = listFromApi.dropna(subset=['SemanticTypeName'])
+'''
+
+
+# If you stored output from UMLS API in files, re-open and unite
+newAssignments1 = pd.read_excel(localDir + 'apiGetNormalizedString1.xlsx')
+newAssignments2 = pd.read_excel(localDir + 'apiGetNormalizedString2.xlsx')
+newAssignments3 = pd.read_excel(localDir + 'apiGetNormalizedString3.xlsx')
+newAssignments4 = pd.read_excel(localDir + 'apiGetNormalizedString4.xlsx')
+newAssignments5 = pd.read_excel(localDir + 'apiGetNormalizedString5.xlsx')
+newAssignments6 = pd.read_excel(localDir + 'apiGetNormalizedString6.xlsx')
+newAssignments7 = pd.read_excel(localDir + 'apiGetNormalizedString7.xlsx')
+
+
+# Put all seven batches together into one dataframe; df = df1.append([df2, df3])
+afterUmlsApi1 = newAssignments1.append([newAssignments2, newAssignments3, newAssignments4,
+                                        newAssignments5, newAssignments6, newAssignments7])
+
+
+'''
+afterUmlsApi1 = afterUmlsApi1.append(newAssignments3)
+afterUmlsApi1 = afterUmlsApi1.append(newAssignments4)
+'''
+
+
+# If you only 
used one df for listFromApi +# afterUMLSapi = listFromApi +# assignedByUmlsApi = listFromApi + + +# Reduce to a version that has only successful assignments + +# Remove various problem entries +assignedByUmlsApi1 = afterUmlsApi1.loc[(afterUmlsApi1['preferredTerm'] != "NONE")] +assignedByUmlsApi1 = assignedByUmlsApi1[~pd.isnull(assignedByUmlsApi1['preferredTerm'])] +assignedByUmlsApi1 = assignedByUmlsApi1.loc[(assignedByUmlsApi1['preferredTerm'] != "Null Value")] +assignedByUmlsApi1 = assignedByUmlsApi1[~pd.isnull(assignedByUmlsApi1['adjustedQueryCase'])] + + +# If you want to send to Excel +writer = pd.ExcelWriter(localDir + 'assignedByUmlsApi1.xlsx') +assignedByUmlsApi1.to_excel(writer,'assignedByUmlsApi1') +# df2.to_excel(writer,'Sheet2') +writer.save() + + +# Bring in subject category master file +# SemanticNetworkReference = pd.read_excel(localDir + 'SemanticNetworkReference.xlsx') +SemanticNetworkReference = pd.read_excel(SemanticNetworkReference) + +# Reduce to required cols +SemTypeData = SemanticNetworkReference[['SemanticTypeName', 'SemanticGroupCode', 'SemanticGroup', 'CustomTreeNumber', 'BranchPosition']] +# SemTypeData.rename(columns={'SemanticTypeName': 'semTypeName'}, inplace=True) # The join col + +# Add more semantic tagging to new UMLS API adds +newUmlsWithSemanticGroupData = pd.merge(assignedByUmlsApi1, SemTypeData, how='left', on='SemanticTypeName') + + +#%% +# ============================================================================ +# 4. Create logAfterUmlsApi as an update to logAfterGoldStandard by appending +# newUmlsWithSemanticGroupData +# ============================================================================ + +''' +Depending on what you're processing, use this or the next section of the below. + +Depends on how you choose to process - Like, down to one occurrence to API +in first batch, or not. 
+''' + + +logAfterGoldStandard = '01_Pre-processing_files/logAfterGoldStandard.xlsx' +logAfterGoldStandard = pd.read_excel(logAfterGoldStandard) + + +''' +# FIXME - Remove after this is fixed within the fixme above. +logAfterGoldStandard = logAfterGoldStandard.sort_values(by='adjustedQueryCase', ascending=True) +logAfterGoldStandard = logAfterGoldStandard.reset_index() +logAfterGoldStandard.drop(['index'], axis=1, inplace=True) +''' + + +# Eyeball. If you need to remove rows... +# logAfterGoldStandard = logAfterGoldStandard.iloc[760:] # remove before index... + +# Join new UMLS API adds to the current search log master +logAfterUmlsApi1 = pd.merge(logAfterGoldStandard, newUmlsWithSemanticGroupData, how='left', on='adjustedQueryCase') + +logAfterUmlsApi1.columns + +''' +['SessionID', 'StaffYN', 'Referrer', 'Query', 'Timestamp', + 'adjustedQueryCase', 'SemanticTypeName_x', 'SemanticGroup_x', + 'SemanticGroupCode_x', 'BranchPosition_x', 'CustomTreeNumber_x', + 'ResourceType', 'Address', 'EntrySource', 'contentSteward', + 'preferredTerm_x', 'SemanticTypeName_y', 'preferredTerm_y', + 'SemanticGroupCode_y', 'SemanticGroup_y', 'CustomTreeNumber_y', + 'BranchPosition_y'] + +''' + + +# Future: Look for a better way to do the above - MERGE WITH CONDITIONAL OVERWRITE. 
Temporary fix: +logAfterUmlsApi1['preferredTerm2'] = logAfterUmlsApi1['preferredTerm_x'].where(logAfterUmlsApi1['preferredTerm_x'].notnull(), logAfterUmlsApi1['preferredTerm_y']) +logAfterUmlsApi1['SemanticTypeName2'] = logAfterUmlsApi1['SemanticTypeName_x'].where(logAfterUmlsApi1['SemanticTypeName_x'].notnull(), logAfterUmlsApi1['SemanticTypeName_y']) +logAfterUmlsApi1['SemanticGroup2'] = logAfterUmlsApi1['SemanticGroup_x'].where(logAfterUmlsApi1['SemanticGroup_x'].notnull(), logAfterUmlsApi1['SemanticGroup_y']) +logAfterUmlsApi1['SemanticGroupCode2'] = logAfterUmlsApi1['SemanticGroupCode_x'].where(logAfterUmlsApi1['SemanticGroupCode_x'].notnull(), logAfterUmlsApi1['SemanticGroupCode_y']) +logAfterUmlsApi1['BranchPosition2'] = logAfterUmlsApi1['BranchPosition_x'].where(logAfterUmlsApi1['BranchPosition_x'].notnull(), logAfterUmlsApi1['BranchPosition_y']) +logAfterUmlsApi1['CustomTreeNumber2'] = logAfterUmlsApi1['CustomTreeNumber_x'].where(logAfterUmlsApi1['CustomTreeNumber_x'].notnull(), logAfterUmlsApi1['CustomTreeNumber_y']) +logAfterUmlsApi1.drop(['preferredTerm_x', 'preferredTerm_y', + 'SemanticTypeName_x', 'SemanticTypeName_y', + 'SemanticGroup_x', 'SemanticGroup_y', + 'SemanticGroupCode_x', 'SemanticGroupCode_y', + 'BranchPosition_x', 'BranchPosition_y', + 'CustomTreeNumber_x', 'CustomTreeNumber_y'], axis=1, inplace=True) +logAfterUmlsApi1.rename(columns={'preferredTerm2': 'preferredTerm', + 'SemanticTypeName2': 'SemanticTypeName', + 'SemanticGroup2': 'SemanticGroup', + 'SemanticGroupCode2': 'SemanticGroupCode', + 'BranchPosition2': 'BranchPosition', + 'CustomTreeNumber2': 'CustomTreeNumber' + }, inplace=True) + +# Save to file so you can open in future sessions, if needed +writer = pd.ExcelWriter(localDir + 'logAfterUmlsApi1.xlsx') +logAfterUmlsApi1.to_excel(writer,'logAfterUmlsApi1') +# df2.to_excel(writer,'Sheet2') +writer.save() + +''' +To Do: + - Create list of unmatched terms with freq + - Cluster similar spellings together? 
+
+- Look at "Not currently matchable" terms with "high" frequency counts. Eyeball to see if these were incorrectly matched in the past; assign historical term or update all to new term, save in gold standard file.
+- Process entries from the PubMed product page.
+- If you haven't done so, update RegEx list to improve future matching.
+- Every several months, through Flask interface, interactively update the gold standard, manually.
+
+# Reduce logAfterUmlsApi to unique, unmatched entries, prep for ML
+
+To re-start:
+logAfterUmlsApi = pd.read_excel(localDir + 'logAfterUmlsApi.xlsx')
+'''
+
+
+# ------------------------------------
+# Visualize results - logAfterUmlsApi
+# ------------------------------------
+
+# Pie for percentage of rows assigned; https://pythonspot.com/matplotlib-pie-chart/
+totCount = len(logAfterUmlsApi1)
+unassigned = logAfterUmlsApi1['SemanticGroup'].isnull().sum()
+assigned = totCount - unassigned
+labels = ['Assigned', 'Unassigned']
+sizes = [assigned, unassigned]
+colors = ['lightskyblue', 'lightcoral']
+explode = (0.1, 0)  # explode 1st slice
+plt.pie(sizes, explode=explode, labels=labels, colors=colors,
+        autopct='%1.f%%', shadow=True, startangle=100)
+plt.axis('equal')
+plt.title("Status after 'UMLS API' processing")
+plt.show()
+
+# Bar of SemanticGroup categories, horizontal
+# Source: http://robertmitchellv.com/blog-bar-chart-annotations-pandas-mpl.html
+ax = logAfterUmlsApi1['SemanticGroup'].value_counts().plot(kind='barh', figsize=(10,6),
+                                                           color="slateblue", fontsize=10);
+ax.set_alpha(0.8)
+ax.set_title("Categories assigned after 'UMLS API' processing", fontsize=14)
+ax.set_xlabel("Number of searches", fontsize=9);
+# set individual bar labels using above list
+for i in ax.patches:
+    # get_width pulls left or right; get_y pushes up or down
+    ax.text(i.get_width()+.1, i.get_y()+.31, \
+            str(round((i.get_width()), 2)), fontsize=9, color='dimgrey')
+# invert for largest on top
+ax.invert_yaxis()
+plt.gcf().subplots_adjust(left=0.3) 
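The pie chart above boils down to one number: the share of queries that received a SemanticGroup. That figure is worth logging on every run; here is a stdlib-only sketch of the calculation (the sample list is made up, and in the script the input is `logAfterUmlsApi1['SemanticGroup']`, where missing values are NaN rather than `None`):

```python
# Percent of rows with a non-null SemanticGroup, mirroring the
# totCount / isnull().sum() arithmetic used for the pie chart.
def assignment_rate(semantic_groups):
    """Return the assigned share as a percentage, rounded to one decimal."""
    total = len(semantic_groups)
    if total == 0:
        return 0.0
    assigned = sum(1 for g in semantic_groups if g is not None)
    return round(100.0 * assigned / total, 1)

sample = ['Disorders', 'Chemicals & Drugs', None, 'Disorders', None]
print(assignment_rate(sample))  # 60.0
```

Tracking this one number across runs (after the gold standard, after each API pass) shows whether each stage is still earning its processing time.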
+ +# Remove listOfUniqueUnassignedAfterGS, listToCheck1, etc., logAfterGoldStandard, logAfterUmlsApi1, +# newAssignments1 etc. + + + +#%% +# ======================= +# 5. Update GoldStandard +# ======================= + +# Open GoldStandard if needed +GoldStandard = '01_Pre-processing_files/GoldStandard.xlsx' +GoldStandard = pd.read_excel(GoldStandard) + +# Append fully tagged UMLS API adds to GoldStandard +GoldStandard = GoldStandard.append(newUmlsWithSemanticGroupData, sort=False) + +# Reset index +GoldStandard = GoldStandard.reset_index() +GoldStandard.drop(['index'], axis=1, inplace=True) +# temp GoldStandard.drop(['adjustedQueryCase'], axis=1, inplace=True) + +''' +Eyeball top and bottom of cols, remove rows by Index, if needed + +GoldStandard.drop(58027, inplace=True) +''' + + +# Write out the updated GoldStandard +writer = pd.ExcelWriter('01_Pre-processing_files/GoldStandard.xlsx') +GoldStandard.to_excel(writer,'GoldStandard') +writer.save() + + + +#%% +# ============================================================================ +# 6. Start new 'uniques' dataframe that gets new column for each of the below +# listOfUniqueUnassignedAfterUmls1 +# ============================================================================ + +''' +To Do: + - Create list of unmatched terms with freq + - Cluster similar spellings together? + +- Look at "Not currently matchable" terms with "high" frequency counts. Eyeball to see if these were incorrectly matched in the past; assign historical term or update all to new term, save in gold standard file. +- Process entries from the PubMed product page. +- If you haven't done so, update RegEx list to improve future matching. +- Every several months, through Flask interface, interactively update the gold standard, manually. 
+
+# Reduce logAfterUmlsApi to unique, unmatched entries, prep for ML
+
+To re-start:
+logAfterUmlsApi = pd.read_excel(localDir + 'logAfterUmlsApi.xlsx')
+'''
+
+listOfUniqueUnassignedAfterUmls1 = logAfterUmlsApi1[pd.isnull(logAfterUmlsApi1['SemanticGroup'])]
+listOfUniqueUnassignedAfterUmls1 = listOfUniqueUnassignedAfterUmls1.groupby('adjustedQueryCase').size()
+listOfUniqueUnassignedAfterUmls1 = pd.DataFrame({'timesSearched':listOfUniqueUnassignedAfterUmls1})
+listOfUniqueUnassignedAfterUmls1 = listOfUniqueUnassignedAfterUmls1.sort_values(by='timesSearched', ascending=False)
+listOfUniqueUnassignedAfterUmls1 = listOfUniqueUnassignedAfterUmls1.reset_index()
+
+writer = pd.ExcelWriter(localDir + 'listOfUniqueUnassignedAfterUmls1.xlsx')
+listOfUniqueUnassignedAfterUmls1.to_excel(writer,'unassignedToCheck')
+writer.save()
+
+# FY 18 Q3: 57,287
+
+
+#%%
+# =============================================================
+# 7. Google Translate API, https://cloud.google.com/translate/
+# =============================================================
+'''
+But it's not free; https://stackoverflow.com/questions/37667671/is-it-possible-to-access-to-google-translate-api-for-free
+'''
+
+
+#%%
+# ==========================================================================
+# 8. UmlsApi2 - Tag non-English terms in Roman character sets
+# ==========================================================================
+'''
+Some foreign terms can be matched. This run does not return a preferred term;
+it just returns what vocabulary the term is found in.
+
+Queries with words not in English are ignored by the first API run using
+"normalized string" matching. Here, try flagging what you can and take them
+out of the percent-complete calculation.
+
+The API apparently only supports U.S. English. RegEx could be used to convert
+UTF-8 Roman characters that are not English... Non-Roman languages (Chinese,
+Cyrillic, Arabic, Japanese, etc.) 
are not supported by the API; these should +be kept out of the API runs entirely. + +6/22/18, from David of UMLS support, TRACKING:000308010 + +> Can the UMLS REST API tell me the term's language? + +One option would be to specify returnIdType=sourceUi for your search +request. For example: + +https://uts-ws.nlm.nih.gov/rest/search/current?string=Infarto de miocardio&returnIdType=sourceUi&ticket= + +This will give you a set of codes back where there is a match, but will +also return a vocabulary (rootSource). If you have that, you can get +the language (in this case, Spanish). The first result may be all you +need. If you have the rootSource, you can match it to the "abbreviation" +and look up the language here: https://uts-ws.nlm.nih.gov/rest/metadata/current/sources. + +It won't be perfect. I'm seeing some problems with accented characters. +For example, coração returns no results, so that's not great, but may +not matter. Some strings will appear in multiple languages, too. + +Let me know how that works for you. - David +''' + + +# ------------------------------------------------------ +# Batch up your API runs. Re-starting, correcting, etc. 
+# ------------------------------------------------------
+
+# uniqueSearchTerms = search['adjustedQueryCase'].unique()
+
+# vocabCheck1 = unassignedAfterGS.iloc[0:20]
+vocabCheck1 = listOfUniqueUnassignedAfterUmls1.iloc[0:5000]
+# vocabCheck2 = listOfUniqueUnassignedAfterUmls1.iloc[5001:10678]
+
+
+# If multiple sessions required, saving to file might help
+writer = pd.ExcelWriter(localDir + 'vocabCheck1.xlsx')
+vocabCheck1.to_excel(writer,'vocabCheck')
+# df2.to_excel(writer,'Sheet2')
+writer.save()
+
+
+'''
+writer = pd.ExcelWriter(localDir + 'listToCheck2.xlsx')
+listToCheck2.to_excel(writer,'listToCheck2')
+# df2.to_excel(writer,'Sheet2')
+writer.save()
+'''
+
+
+'''
+OPTIONS
+
+# Bring in from file
+listToCheck3 = pd.read_excel('01 Pre-process/listToCheck3.xlsx')
+listToCheck4 = pd.read_excel('01 Pre-process/listToCheck4.xlsx')
+
+listToCheck1 = unassignedAfterGS
+listToCheck2 = unassignedAfterGS.iloc[5001:10000]
+listToCheck1 = unassignedAfterGS.iloc[10001:11335]
+'''
+
+
+'''
+Work with PostMan app to test/approve. (David's returnIdType=sourceUi advice
+is quoted in full above.)
+'''
+
+'''
+FIXME - Unfinished. 
+
+# THIS IS SOURCE VOCAB CODE
+
+https://uts-ws.nlm.nih.gov/rest/search/current?string=Infarto de miocardio&returnIdType=sourceUi&ticket=
+'''
+
+
+#%%
+# -----------------------------------
+# Gather list of source vocabularies
+# -----------------------------------
+
+uiUri = "https://uts-ws.nlm.nih.gov/rest/search/current?returnIdType=sourceUi"
+
+listOfSourceVocabularies = pd.DataFrame()
+listOfSourceVocabularies['adjustedQueryCase'] = ""
+listOfSourceVocabularies['sourceVocab'] = ""
+
+# Iterate over the batch created above
+for index, row in vocabCheck1.iterrows():
+    currLogTerm = row['adjustedQueryCase']
+    # === Get 'source vocab' =========
+    stTicket = requests.post(todaysTgt, data = {'service':'http://umlsks.nlm.nih.gov'}) # Get single-use Service Ticket (ST)
+    # Example: GET https://uts-ws.nlm.nih.gov/rest/search/current?string=tylenol&sabs=MSH&ticket=ST-681163-bDfgQz5vKe2DJXvI4Snm-cas
+    termQuery = {'string':currLogTerm, 'ticket':stTicket.text} # removed 'searchType':'word' (it's the default), 'sabs':'MSH',
+    getSourceVocab = requests.get(uiUri, params=termQuery)
+    getSourceVocab.encoding = 'utf-8'
+    tItems = json.loads(getSourceVocab.text)
+    tJson = tItems["result"]
+    if tJson["results"][0]["ui"] != "NONE": # Sub-loop to resolve "NONE"
+        sourceVocab = tJson["results"][0]["rootSource"]
+        # === Post to dataframe =========
+        listOfSourceVocabularies = listOfSourceVocabularies.append(pd.DataFrame({'adjustedQueryCase': currLogTerm,
+                                                   'sourceVocab': sourceVocab}, index=[0]), ignore_index=True)
+        print('{} --> {}'.format(currLogTerm, sourceVocab)) # Write progress to console
+        time.sleep(.07)
+    else:
+        # Post "NONE" to database and restart loop
+        listOfSourceVocabularies = listOfSourceVocabularies.append(pd.DataFrame({'adjustedQueryCase': currLogTerm, 'sourceVocab': "NONE"}, index=[0]), ignore_index=True)
+        print('{} --> NONE'.format(currLogTerm)) # Write progress to console
+        
time.sleep(.07)
+print ("* Done *")
+
+
+writer = pd.ExcelWriter(localDir + 'listOfSourceVocabularies.xlsx')
+listOfSourceVocabularies.to_excel(writer,'listOfSourceVocabularies')
+# df2.to_excel(writer,'Sheet2')
+writer.save()
+
+# Free up memory: Remove listToCheck, listToCheck1, listToCheck2, listToCheck3,
+# listToCheck4, nonForeign, searchLog, unassignedAfterGS
+
+
+#%%
+
+# Load external reference file: SourceVocabsForeign.xlsx
+
+# F&R Foreign vocab names with the language name, "Spanish," "Swedish"
+
+# Append to running list of updates
+
+
+
+# ------------------------------------------------------
+# Match vocabCheck
+# ------------------------------------------------------
+'''
+FIXME - RESULTING LIST NEEDS TO BE VETTED; START WITH HIGHEST-FREQUENCY USE.
+
+Update naming? This is the result from the API run for languages.
+
+This custom list of vocabs does not include any English vocabs; therefore,
+only foreign matches are returned, which is what we want.
+
+Re-start:
+listOfSourceVocabularies = pd.read_excel(localDir + 'listOfSourceVocabularies.xlsx')
+'''
+
+# Load list of Non-English vocabularies
+# 7/5/2018, https://www.nlm.nih.gov/research/umls/sourcereleasedocs/index.html (English vocabs not included.)
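The "F&R" step above (replace foreign vocabulary names with the language name) is only sketched in comments so far. A minimal illustration of the idea with a plain dict; the abbreviation-to-language pairs are assumptions patterned on UMLS source naming (e.g., MSHSPA for Spanish MeSH) and should be verified against the source-vocabulary list:

```python
# Illustrative only: map UMLS source-vocabulary abbreviations to language
# names; unmapped vocabularies come back as None so they can be reviewed.
# The abbreviations below are ASSUMED examples, not a vetted reference list.
vocabToLanguage = {'MSHSPA': 'Spanish',
                   'MSHSWE': 'Swedish',
                   'MSHFRE': 'French'}

def languageGuess(sourceVocab):
    """Return a language name for a non-English source vocabulary, else None."""
    return vocabToLanguage.get(sourceVocab)
```

In pandas the same mapping is `listOfSourceVocabularies['sourceVocab'].map(vocabToLanguage)`, which leaves unmapped vocabularies as NaN for manual review.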
+
+UMLS_NonEnglish_Vocabularies = pd.read_excel(localDir + 'UMLS_Non-English_Vocabularies.xlsx')
+
+# Inner join
+foreignButEnglishChar = pd.merge(listOfSourceVocabularies, UMLS_NonEnglish_Vocabularies, how='inner', left_on='sourceVocab', right_on='Vocabulary')
+
+
+# Get frequency count, reduce cols for easier manual checking
+PerhapsForeign = pd.merge(foreignButEnglishChar, listOfUniqueUnassignedAfterUmls1, how='inner', on='adjustedQueryCase')
+
+PerhapsForeign = PerhapsForeign.sort_values(by='timesSearched', ascending=False)
+PerhapsForeign = PerhapsForeign.reset_index()
+PerhapsForeign.drop(['index'], axis=1, inplace=True)
+col = ['adjustedQueryCase', 'timesSearched', 'Language']
+PerhapsForeign = PerhapsForeign[col]
+PerhapsForeign.rename(columns={'Language': 'LanguageGuess'}, inplace=True)
+
+# Send out for manual checking
+writer = pd.ExcelWriter(localDir + 'PerhapsForeign.xlsx')
+PerhapsForeign.to_excel(writer,'PerhapsForeign')
+# df2.to_excel(writer,'Sheet2')
+writer.save()
+
+'''
+In Excel or Flask, delete rows with terms that we use in English; check that
+the remaining rows contain terms that most English speakers would think are
+foreign.
+Supplement cols for the definite foreign terms, append to GoldStandard as
+foreign terms.
+'''
+
+#%%
+
+# Update GoldStandard with edits from PerhapsForeign result
+
+# Update current log file from PerhapsForeign result
+
+# Create new 'uniques' list for FuzzyWuzzy
+
+
+
+
+
+
+#%%
+# ===========================================================================
+# 5. Second UMLS API clean-up - Create logAfterUmlsApi2 as an
+# update to logAfterUmlsApi by appending newUmlsWithSemanticGroupData
+# ===========================================================================
+'''
+Use this AFTER you do a SECOND run against the UMLS Metathesaurus API. 
+ +Re-start: +logAfterUmlsApi2 = pd.read_excel(localDir + 'logAfterUmlsApi2.xlsx') +''' + +logAfterUmlsApi2 = pd.read_excel(localDir + 'logAfterUmlsApi1.xlsx') + +# FIXME - Remove after this is fixed within the fixme above. +logAfterUmlsApi2 = logAfterUmlsApi2.sort_values(by='adjustedQueryCase', ascending=False) +logAfterUmlsApi2 = logAfterUmlsApi2.reset_index() +logAfterUmlsApi2.drop(['index'], axis=1, inplace=True) + + +# Join new UMLS API adds to the current search log master +logAfterUmlsApi2 = pd.merge(logAfterUmlsApi, newUmlsWithSemanticGroupData, how='left', on='adjustedQueryCase') + +# Future: Look for a better way to do the above - MERGE WITH CONDITIONAL OVERWRITE. Temporary fix: +logAfterUmlsApi2['preferredTerm2'] = logAfterUmlsApi2['preferredTerm_x'].where(logAfterUmlsApi2['preferredTerm_x'].notnull(), logAfterUmlsApi2['preferredTerm_y']) +logAfterUmlsApi2['SemanticTypeName2'] = logAfterUmlsApi2['SemanticTypeName_x'].where(logAfterUmlsApi2['SemanticTypeName_x'].notnull(), logAfterUmlsApi2['SemanticTypeName_y']) +logAfterUmlsApi2['SemanticGroupCode2'] = logAfterUmlsApi2['SemanticGroupCode_x'].where(logAfterUmlsApi2['SemanticGroupCode_x'].notnull(), logAfterUmlsApi2['SemanticGroupCode_y']) +logAfterUmlsApi2['SemanticGroup2'] = logAfterUmlsApi2['SemanticGroup_x'].where(logAfterUmlsApi2['SemanticGroup_x'].notnull(), logAfterUmlsApi2['SemanticGroup_y']) +logAfterUmlsApi2['BranchPosition2'] = logAfterUmlsApi2['BranchPosition_x'].where(logAfterUmlsApi2['BranchPosition_x'].notnull(), logAfterUmlsApi2['BranchPosition_y']) +logAfterUmlsApi2['CustomTreeNumber2'] = logAfterUmlsApi2['CustomTreeNumber_x'].where(logAfterUmlsApi2['CustomTreeNumber_x'].notnull(), logAfterUmlsApi2['CustomTreeNumber_y']) +logAfterUmlsApi2.drop(['preferredTerm_x', 'preferredTerm_y', + 'SemanticTypeName_x', 'SemanticTypeName_y', + 'SemanticGroup_x', 'SemanticGroup_y', + 'SemanticGroupCode_x', 'SemanticGroupCode_y', + 'BranchPosition_x', 'BranchPosition_y', + 'CustomTreeNumber_x', 
'CustomTreeNumber_y'], axis=1, inplace=True) +logAfterUmlsApi2.rename(columns={'preferredTerm2': 'preferredTerm', + 'SemanticTypeName2': 'SemanticTypeName', + 'SemanticGroup2': 'SemanticGroup', + 'SemanticGroupCode2': 'SemanticGroupCode', + 'BranchPosition2': 'BranchPosition', + 'CustomTreeNumber2': 'CustomTreeNumber' + }, inplace=True) + +# Save to file so you can open in future sessions, if needed +writer = pd.ExcelWriter(localDir + 'logAfterUmlsApi2.xlsx') +logAfterUmlsApi2.to_excel(writer,'logAfterUmlsApi2') +# df2.to_excel(writer,'Sheet2') +writer.save() + + + +# ----------------------------------------------- +# Create files to assign Semantic Types manually +# ----------------------------------------------- +''' +If you want to add matches manually using two spreadsheet windows +To do in Python - cluster: + - Probable person names + - Probable NLM products, services, web pages + - Probable journal names +''' + +col = ['SemanticGroup', 'SemanticTypeName', 'Definition', 'Examples'] +SemRef = SemanticNetworkReference[col] + +# Get class distributions if you want to bolster under-represented sem types + +currentSemTypeCount = GoldStandard['SemanticTypeName'].value_counts() +currentSemTypeCount = pd.DataFrame({'TypeCount':currentSemTypeCount}) +currentSemTypeCount.sort_values("TypeCount", ascending=True, inplace=True) +currentSemTypeCount = currentSemTypeCount.reset_index() +currentSemTypeCount = currentSemTypeCount.rename(columns={'index': 'SemanticTypeName'}) + + + +# ------------------------------------ +# Visualize results - logAfterUmlsApi2 +# ------------------------------------ + +# Pie for percentage of rows assigned; https://pythonspot.com/matplotlib-pie-chart/ +totCount = len(logAfterUmlsApi2) +unassigned = logAfterUmlsApi2['SemanticGroup'].isnull().sum() +assigned = totCount - unassigned +labels = ['Assigned', 'Unassigned'] +sizes = [assigned, unassigned] +colors = ['lightskyblue', 'lightcoral'] +explode = (0.1, 0) # explode 1st slice +plt.pie(sizes, 
explode=explode, labels=labels, colors=colors,
+        autopct='%1.f%%', shadow=True, startangle=100)
+plt.axis('equal')
+plt.title("Status after 'UMLS API 2' processing")
+plt.show()
+
+# Bar of SemanticGroup categories, horizontal
+# Source: http://robertmitchellv.com/blog-bar-chart-annotations-pandas-mpl.html
+ax = logAfterUmlsApi2['SemanticGroup'].value_counts().plot(kind='barh', figsize=(10,6),
+                                                 color="slateblue", fontsize=10);
+ax.set_alpha(0.8)
+ax.set_title("Categories assigned after 'UMLS API 2' processing", fontsize=14)
+ax.set_xlabel("Number of searches", fontsize=9);
+# set individual bar labels using above list
+for i in ax.patches:
+    # get_width pulls left or right; get_y pushes up or down
+    ax.text(i.get_width()+.1, i.get_y()+.31, \
+            str(round((i.get_width()), 2)), fontsize=9, color='dimgrey')
+# invert for largest on top
+ax.invert_yaxis()
+plt.gcf().subplots_adjust(left=0.3)
+
+
+
+#%%
+# ===========================================================================
+# 6. Create new 'uniques' dataframe/file for fuzzy matching
+# ===========================================================================
+'''
+Re-start
+
+logAfterUmlsApi1 = pd.read_excel(localDir + 'logAfterUmlsApi1.xlsx')
+
+# Set a date range
+AprMay = logAfterUmlsApi1[(logAfterUmlsApi1['Timestamp'] > '2018-04-01 01:00:00') & (logAfterUmlsApi1['Timestamp'] < '2018-06-01 00:00:00')]
+
+logAfterUmlsApi2 = AprMay
+
+# Restrict to NLM Home
+searchfor = ['www.nlm.nih.gov$', 'www.nlm.nih.gov/$']
+logAfterUmlsApi2 = logAfterUmlsApi2[logAfterUmlsApi2.Referrer.str.contains('|'.join(searchfor))]
+
+'''
+
+
+
+listOfUniqueUnassignedAfterUmls2 = logAfterUmlsApi2[pd.isnull(logAfterUmlsApi2['preferredTerm'])]
+listOfUniqueUnassignedAfterUmls2 = listOfUniqueUnassignedAfterUmls2.groupby('adjustedQueryCase').size()
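The filter/groupby/size sequence here (finished just below with the sort and reset_index) is, at heart, a descending frequency count of the still-unassigned queries. A toy sketch of the same logic with the standard library; the query strings are illustrative only:

```python
from collections import Counter

# Stand-in for the 'adjustedQueryCase' values whose preferredTerm is still null
unassignedQueries = ['aspirin', 'cbd oil', 'aspirin', 'zika']

# most_common() yields (term, timesSearched) pairs, highest count first,
# the same shape the listOfUniqueUnassignedAfterUmls2 dataframe ends up with
uniques = Counter(unassignedQueries).most_common()
```

In pandas, `Series.value_counts()` gives the equivalent count-and-sort in one call.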
+listOfUniqueUnassignedAfterUmls2 = pd.DataFrame({'timesSearched':listOfUniqueUnassignedAfterUmls2}) +listOfUniqueUnassignedAfterUmls2 = listOfUniqueUnassignedAfterUmls2.sort_values(by='timesSearched', ascending=False) +listOfUniqueUnassignedAfterUmls2 = listOfUniqueUnassignedAfterUmls2.reset_index() + +writer = pd.ExcelWriter(localDir + 'unassignedToCheck2.xlsx') +listOfUniqueUnassignedAfterUmls2.to_excel(writer,'unassignedToCheck') +writer.save() diff --git a/03_Fuzzy_match.ipynb b/03_Fuzzy_match.ipynb new file mode 100644 index 0000000..02c26a6 --- /dev/null +++ b/03_Fuzzy_match.ipynb @@ -0,0 +1,530 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Part 3. Fuzzy match\n", + "App to analyze web-site search logs (internal search)
\n", + "**This script:** For training ML algorithms: Post fuzzy match candidates to browser so user can select manually
\n", + "Authors: dan.wendling@nih.gov,
\n", + "Last modified: 2018-09-09\n", + "\n", + "\n", + "## Script contents\n", + "\n", + "1. Start-up / What to put into place, where\n", + "2. FuzzyWuzzyListToAdd - FuzzyWuzzy matching\n", + "3. Add result to MySQL, process at http://localhost:5000/fuzzy/\n", + " (Use browser to update MySQL table)\n", + "4. Bring data from manual_assignments back into Pandas\n", + "5. Update log and GoldStandard with new matches from MySQL\n", + "6. Create new 'uniques' dataframe/file for ML\n", + "7. Next steps\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# 1. Start-up / What to put into place, where\n", + "# ============================================\n", + "\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from matplotlib.pyplot import pie, axis, show\n", + "import numpy as np\n", + "import requests\n", + "import json\n", + "import lxml.html as lh\n", + "from lxml.html import fromstring\n", + "import time\n", + "import os\n", + "\n", + "# Set working directory\n", + "os.chdir('/Users/wendlingd/Projects/webDS/_util')\n", + "\n", + "localDir = '03_Fuzzy_match_files/'\n", + "\n", + "\n", + "\n", + "# Bring in historical file of (somewhat edited) matches\n", + "GoldStandard = '01_Text_wrangling_files/GoldStandard_master.xlsx'\n", + "GoldStandard = pd.read_excel(GoldStandard)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. FuzzyWuzzyListToAdd - FuzzyWuzzy matching\n", + "5,000 in ~25 minutes; ~10,000 in ~50 minutes (at score_cutoff=85)\n", + "\n", + "Fuzzy match can be applied to an entire column of dataset_1 to return the \n", + "best score against the column of dataset_2. 
Here we set the scorer to \n",
+    "‘token_set_ratio’ with score_cutoff of (originally) 90.\n",
+    "\n",
+    "FuzzyWuzzy was written for single inputs to a web form; I, however, \n",
+    "am using it to compare one dataframe column to another dataframe's column,\n",
+    "which is poorly documented outside https://www.neudesic.com/blog/fuzzywuzzy-using-python/.\n",
+    "It takes some work to match the tokenized function output back \n",
+    "to the original untokenized term, which is necessary for this work.\n",
+    "\n",
+    "For more options see temp_FuzzyWuzzyHowTo.py\n",
+    "\n",
+    "Browser page looks like: \n",
+    "\n",
+    "\n",
+    "\n",
+    "\n",
+    "# Quick test, if you want - punctuation difference\n",
+    "fuzz.ratio('Testing FuzzyWuzzy', 'Testing FuzzyWuzzy!!')\n",
+    "\n",
+    "FuzzyWuzzyResults - What the results of this function mean:\n",
+    "('hippocratic oath', 100, 2987)\n",
+    "('Best match string from dataset_2' (GoldStandard), 'Score of best match', 'Index of best match string in GoldStandard')\n",
+    "\n",
+    "Re-start:\n",
+    "listOfUniqueUnassignedAfterUmls11 = pd.read_excel('02_Run_APIs_files/listOfUniqueUnassignedAfterUmls11.xlsx')\n",
+    "GoldStandard = pd.read_excel('01_Text_wrangling_files/GoldStandard_master.xlsx')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from fuzzywuzzy import fuzz, process\n",
+    "\n",
+    "# Recommendation: Test first\n",
+    "# fuzzySourceZ = listOfUniqueUnassignedAfterGS.iloc[0:25]\n",
+    "\n",
+    "# 2018-07-08: Created FuzzyWuzzyProcResult1, 3,000 records, in 24 minutes\n",
+    "# 2018-07-09: 5,000 in 39 minutes\n",
+    "# 2018-07-09: 4,000 in 32 minutes\n",
+    "fuzzySourceZ = listOfUniqueUnassignedAfterUmls2.iloc[0:4000]\n",
+    "\n",
+    "'''\n",
+    "fuzzySource1 = listOfUniqueUnassignedAfterGS.iloc[0:5000]\n",
+    "fuzzySource2 = listOfUniqueUnassignedAfterGS.iloc[5001:10678]\n",
+    "'''\n",
+    "\n",
+    "def fuzzy_match(x, choices, scorer, cutoff):\n",
+    "    return process.extractOne(\n",
+    "        x, 
choices=choices, scorer=scorer, score_cutoff=cutoff\n", + " )\n", + "\n", + "# Create series FuzzyWuzzyResults\n", + "FuzzyWuzzyProcResult1 = fuzzySourceZ.loc[:, 'adjustedQueryCase'].apply(\n", + " fuzzy_match,\n", + " args=( GoldStandard.loc[:, 'adjustedQueryCase'],\n", + " fuzz.token_set_ratio,\n", + " 95\n", + " )\n", + ")\n", + "\n", + "# Convert FuzzyWuzzyResults Series to df\n", + "FuzzyWuzzyProcResult2 = pd.DataFrame(FuzzyWuzzyProcResult1)\n", + "\n", + "# Move Index (IDs) into 'FuzzyIndex' col because Index values will be discarded\n", + "FuzzyWuzzyProcResult2 = FuzzyWuzzyProcResult2.reset_index()\n", + "FuzzyWuzzyProcResult2 = FuzzyWuzzyProcResult2.rename(columns={'index': 'FuzzyIndex'})\n", + "\n", + "# Remove nulls\n", + "FuzzyWuzzyProcResult2 = FuzzyWuzzyProcResult2[FuzzyWuzzyProcResult2.adjustedQueryCase.notnull() == True] # remove nulls\n", + "\n", + "# Move tuple output into 3 cols\n", + "FuzzyWuzzyProcResult2[['FuzzyToken', 'FuzzyScore', 'GoldStandardIndex']] = FuzzyWuzzyProcResult2['adjustedQueryCase'].apply(pd.Series)\n", + "FuzzyWuzzyProcResult2.drop(['adjustedQueryCase'], axis=1, inplace=True) # drop tuples\n", + "\n", + "# Merge result to the orig source list cols\n", + "FuzzyWuzzyProcResult3 = pd.merge(FuzzyWuzzyProcResult2, fuzzySourceZ, how='left', left_index=True, right_index=True)\n", + "\n", + "# Change col order for browsability if you want to analyze this by itself\n", + "FuzzyWuzzyProcResult3 = FuzzyWuzzyProcResult3[['adjustedQueryCase', 'FuzzyToken', 'FuzzyScore', 'timesSearched', 'FuzzyIndex', 'GoldStandardIndex']]\n", + "\n", + "# Merge result to GoldStandard supplemental info\n", + "# Don't have a second person altering GoldStandard during your work...\n", + "FuzzyWuzzyProcResult4 = pd.merge(FuzzyWuzzyProcResult3, GoldStandard, how='left', left_on='GoldStandardIndex', right_index=True)\n", + "\n", + "# Reduce and rename\n", + "FuzzyWuzzyProcResult4 = FuzzyWuzzyProcResult4[['adjustedQueryCase_x', 'preferredTerm', 'FuzzyToken', 
'SemanticTypeName', 'SemanticGroup', 'timesSearched', 'FuzzyScore']]\n",
+    "FuzzyWuzzyProcResult4 = FuzzyWuzzyProcResult4.rename(columns={'adjustedQueryCase_x': 'adjustedQueryCase'})\n",
+    "\n",
+    "# Change name to be sensical inside other procedures\n",
+    "FuzzyWuzzyRawRecommendations = FuzzyWuzzyProcResult4\n",
+    "\n",
+    "\n",
+    "# Save to file so you can open in future sessions, if needed\n",
+    "writer = pd.ExcelWriter(localDir + 'FuzzyWuzzyRawRecommendations.xlsx')\n",
+    "FuzzyWuzzyRawRecommendations.to_excel(writer,'FuzzyWuzzyRawRecommendations')\n",
+    "# df2.to_excel(writer,'Sheet2')\n",
+    "writer.save()\n",
+    "\n",
+    "\n",
+    "# Future: chart for percent of total that were assigned something...\n",
+    "\n",
+    "# Remove fuzzySource1, etc., FuzzyWuzzyProcResult1, FuzzyWuzzyProcResult2, etc.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# 3. Add result to MySQL, process at http://localhost:5000/fuzzy/\n",
+    "# ========================================================================\n",
+    "\n",
+    "# Requires manual_assignments table, see 06_Load_database.\n",
+    "\n",
+    "# Add dataframe to MySQL\n",
+    "\n",
+    "import pandas as pd\n",
+    "import mysql.connector\n",
+    "from pandas.io import sql\n",
+    "from sqlalchemy import create_engine\n",
+    "\n",
+    "dbconn = create_engine('mysql+mysqlconnector://wendlingd:pwd@localhost/ia')\n",
+    "\n",
+    "FuzzyWuzzyRawRecommendations.to_sql(name='manual_assignments', con=dbconn, if_exists = 'replace', index=False) # or if_exists='append'\n",
+    " \n",
+    "\n",
+    "'''\n",
+    "From MySQL command line:\n",
+    "LOAD DATA LOCAL INFILE '/Users/wendlingd/Downloads/FuzzyWuzzyRawRecommendations.csv' INTO TABLE manual_assignments FIELDS TERMINATED BY ',' (adjustedQueryCase, preferredTerm, FuzzyToken, SemanticTypeName, SemanticGroup, timesSearched, FuzzyScore);\n",
+    "\n",
+    "ALTER TABLE `manual_assignments` ADD `NewSemanticTypeName` VARCHAR(100) NULL AFTER 
`adjustedQueryCase`;\n",
+    "\n",
+    "Re-start:\n",
+    "FuzzyWuzzyRawRecommendations = pd.read_excel(localDir + 'FuzzyWuzzyRawRecommendations.xlsx')\n",
+    "\n",
+    "\n",
+    "select NewSemanticTypeName, count(*) as cnt\n",
+    "from manual_assignments\n",
+    "group by NewSemanticTypeName\n",
+    "order by cnt DESC;\n",
+    "\n",
+    "select count(*) cnt\n",
+    "from manual_assignments\n",
+    "WHERE NewSemanticTypeName IS NOT NULL\n",
+    "\n",
+    "\n",
+    "\n",
+    "FuzzyWuzzyRawRecommendations = pd.read_excel(localDir + 'FuzzyWuzzyRawRecommendations.xlsx')\n",
+    "\n",
+    "FuzzyWuzzyRawRecommendations.to_csv(localDir + 'FuzzyWuzzyRawRecommendations.csv', index=False, header=None)\n",
+    "\n",
+    "'''\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "'''\n",
+    "Resolve null column issues...\n",
+    "- No nulls\n",
+    "- Look right? Consistent, etc. \n",
+    "\n",
+    "Get SemanticTypeName for terms with new preferredTerm\n",
+    "\n",
+    "SELECT preferredTerm, NewSemanticTypeName, SemanticGroup\n",
+    "FROM manual_assignments\n",
+    "WHERE NewSemanticTypeName IS NULL\n",
+    "ORDER BY preferredTerm\n",
+    "\n",
+    "\n",
+    "When NewSemanticTypeName is null\n",
+    "\n",
+    "UPDATE manual_assignments\n",
+    "SET NewSemanticTypeName = SemanticTypeName\n",
+    "WHERE NewSemanticTypeName IS NULL\n",
+    "\n",
+    "\n",
+    "'''"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# 4. 
Bring data from manual_assignments back into Pandas\n", + "# ========================================================================\n", + "'''\n", + "Assign SemanticGroup from GoldStandard or other.\n", + "\n", + "'''\n", + "\n", + "\n", + "from sqlalchemy import create_engine\n", + "\n", + "dbconn = create_engine('mysql+mysqlconnector://wendlingd:DataSciPwr17@localhost/ia')\n", + "\n", + "\n", + "# Extract from MySQL to df\n", + "FuzAssigned = pd.read_sql('SELECT adjustedQueryCase, preferredTerm, NewSemanticTypeName FROM manual_assignments WHERE NewSemanticTypeName IS NOT NULL AND NewSemanticTypeName NOT LIKE \"Ignore\"', con=dbconn)\n", + "\n", + "\n", + "\n", + "# Write this to file (assuming multiple cycles)\n", + "writer = pd.ExcelWriter(localDir + 'FuzAssigned_BackFromMysql.xlsx')\n", + "FuzAssigned.to_excel(writer,'FuzAssigned')\n", + "writer.save()\n", + "\n", + "\n", + "# update SemanticGroup from GoldStandard_master\n", + "\n", + "gsUnique = GoldStandard[['preferredTerm', 'SemanticTypeName', 'SemanticGroup', 'SemanticGroupCode', 'BranchPosition', 'CustomTreeNumber']]\n", + "\n", + "gsUnique = gsUnique.drop_duplicates()\n", + "FuzAssigned2 = pd.merge(FuzAssigned, gsUnique, how='inner', on='preferredTerm')\n", + "\n", + "# Not sure why NewSemanticTypeName and SemanticTypeName are the same.\n", + "FuzAssigned2.drop(['NewSemanticTypeName'], axis=1, inplace=True)\n", + "\n", + "# Append to GoldStandard_master\n", + "GoldStandard = GoldStandard.append(FuzAssigned2, sort=True)\n", + "\n", + "# Write new GoldStandard\n", + "writer = pd.ExcelWriter('01_Text_wrangling_files/GoldStandard_master.xlsx')\n", + "GoldStandard.to_excel(writer,'GoldStandard')\n", + "# df2.to_excel(writer,'Sheet2')\n", + "writer.save()\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# 5. 
Update log and GoldStandard with new matches from MySQL\n", + "# ========================================================================\n", + "'''\n", + "Move clean-up work into browser.\n", + "\n", + "delete from manual_assignments\n", + "where NewSemanticTypeName like 'Ignore'\n", + "\n", + "Re-start:\n", + "logAfterUmlsApi1 = pd.read_excel('02_Run_APIs_files/logAfterUmlsApi1.xlsx')\n", + "'''\n", + "\n", + "logAfterUmlsApi1 = pd.read_excel('02_Run_APIs_files/logAfterUmlsApi1.xlsx')\n", + "\n", + "\n", + "# Apply to log file\n", + "logAfterFuzzyMatch = pd.merge(logAfterUmlsApi1, FuzAssigned2, how='left', on='adjustedQueryCase')\n", + "\n", + "# Future: Look for a better way to do the above - MERGE WITH CONDITIONAL OVERWRITE. Temporary fix:\n", + "logAfterFuzzyMatch['preferredTerm2'] = logAfterFuzzyMatch['preferredTerm_x'].where(logAfterFuzzyMatch['preferredTerm_x'].notnull(), logAfterFuzzyMatch['preferredTerm_y'])\n", + "logAfterFuzzyMatch['SemanticTypeName2'] = logAfterFuzzyMatch['SemanticTypeName_x'].where(logAfterFuzzyMatch['SemanticTypeName_x'].notnull(), logAfterFuzzyMatch['SemanticTypeName_y'])\n", + "logAfterFuzzyMatch['SemanticGroupCode2'] = logAfterFuzzyMatch['SemanticGroupCode_x'].where(logAfterFuzzyMatch['SemanticGroupCode_x'].notnull(), logAfterFuzzyMatch['SemanticGroupCode_y'])\n", + "logAfterFuzzyMatch['SemanticGroup2'] = logAfterFuzzyMatch['SemanticGroup_x'].where(logAfterFuzzyMatch['SemanticGroup_x'].notnull(), logAfterFuzzyMatch['SemanticGroup_y'])\n", + "logAfterFuzzyMatch['BranchPosition2'] = logAfterFuzzyMatch['BranchPosition_x'].where(logAfterFuzzyMatch['BranchPosition_x'].notnull(), logAfterFuzzyMatch['BranchPosition_y'])\n", + "logAfterFuzzyMatch['CustomTreeNumber2'] = logAfterFuzzyMatch['CustomTreeNumber_x'].where(logAfterFuzzyMatch['CustomTreeNumber_x'].notnull(), logAfterFuzzyMatch['CustomTreeNumber_y'])\n", + "logAfterFuzzyMatch.drop(['preferredTerm_x', 'preferredTerm_y',\n", + " 'SemanticTypeName_x', 'SemanticTypeName_y',\n", + " 
'SemanticGroup_x', 'SemanticGroup_y',\n", + " 'SemanticGroupCode_x', 'SemanticGroupCode_y',\n", + " 'BranchPosition_x', 'BranchPosition_y', \n", + " 'CustomTreeNumber_x', 'CustomTreeNumber_y'], axis=1, inplace=True)\n", + "logAfterFuzzyMatch.rename(columns={'preferredTerm2': 'preferredTerm',\n", + " 'SemanticTypeName2': 'SemanticTypeName',\n", + " 'SemanticGroup2': 'SemanticGroup',\n", + " 'SemanticGroupCode2': 'SemanticGroupCode',\n", + " 'BranchPosition2': 'BranchPosition',\n", + " 'CustomTreeNumber2': 'CustomTreeNumber'\n", + " }, inplace=True)\n", + "\n", + "\n", + "# FIXME - Why are duplicate rows introduced?\n", + "logAfterFuzzyMatch = logAfterFuzzyMatch.drop_duplicates()\n", + "\n", + "\n", + "# Save to file so you can open in future sessions, if needed\n", + "writer = pd.ExcelWriter(localDir + 'logAfterFuzzyMatch.xlsx')\n", + "logAfterFuzzyMatch.to_excel(writer,'logAfterFuzzyMatch')\n", + "# df2.to_excel(writer,'Sheet2')\n", + "writer.save()\n", + "\n", + "\n", + "\n", + "# ---------------------------------------\n", + "# Visualize results - logAfterFuzzyMatch\n", + "# ---------------------------------------\n", + " \n", + "# Pie for percentage of rows assigned; https://pythonspot.com/matplotlib-pie-chart/\n", + "totCount = len(logAfterFuzzyMatch)\n", + "unassigned = logAfterFuzzyMatch['SemanticGroup'].isnull().sum()\n", + "# unassigned = logAfterFuzzyMatch.loc[logAfterFuzzyMatch['preferredTerm'].str.contains('Unparsed') == True]\n", + "assigned = totCount - unassigned\n", + "labels = ['Assigned', 'Unassigned']\n", + "sizes = [assigned, unassigned]\n", + "colors = ['lightskyblue', 'lightcoral']\n", + "explode = (0.1, 0) # explode 1st slice\n", + "plt.pie(sizes, explode=explode, labels=labels, colors=colors,\n", + " autopct='%1.f%%', shadow=True, startangle=100)\n", + "plt.axis('equal')\n", + "plt.title(\"Status after 'fuzzy match' processing\")\n", + "plt.show()\n", + "\n", + "\n", + "# Bar of SemanticGroup categories, horizontal\n", + "# Source: 
http://robertmitchellv.com/blog-bar-chart-annotations-pandas-mpl.html\n",
+    "ax = logAfterFuzzyMatch['SemanticGroup'].value_counts().plot(kind='barh', figsize=(10,6),\n",
+    "                                                 color=\"slateblue\", fontsize=10);\n",
+    "ax.set_alpha(0.8)\n",
+    "ax.set_title(\"Categories assigned after 'fuzzy match' processing\", fontsize=14)\n",
+    "ax.set_xlabel(\"Number of searches\", fontsize=9);\n",
+    "# set individual bar labels using above list\n",
+    "for i in ax.patches:\n",
+    "    # get_width pulls left or right; get_y pushes up or down\n",
+    "    ax.text(i.get_width()+.1, i.get_y()+.31, \\\n",
+    "            str(round((i.get_width()), 2)), fontsize=9, color='dimgrey')\n",
+    "# invert for largest on top \n",
+    "ax.invert_yaxis()\n",
+    "plt.gcf().subplots_adjust(left=0.3)\n",
+    "\n",
+    "\n",
+    "\n",
+    "\n",
+    "'''\n",
+    "# Unite data, if there are multiple output files\n",
+    "f1 = pd.read_excel(localDir + 'FuzAssigned_Dan1.xlsx')\n",
+    "f2 = pd.read_excel(localDir + 'FuzAssigned_Dan2.xlsx')\n",
+    "f3 = pd.read_excel(localDir + 'FuzAssigned_Dan3.xlsx')\n",
+    "\n",
+    "# Concat\n",
+    "fmAdd1 = pd.concat([f1, f2, f3], ignore_index=True, sort=True)\n",
+    "\n",
+    "# drop SemanticTypeName (if present)\n",
+    "fmAdd1.drop(['SemanticTypeName'], axis=1, inplace=True)\n",
+    "\n",
+    "# Rename SemanticTypeName\n",
+    "fmAdd1 = fmAdd1.rename(columns={'NewSemanticTypeName': 'SemanticTypeName'})\n",
+    "\n",
+    "\n",
+    "# De-dupe. Future? July run, need to eyeball before deleting\n",
+    "# fmAdd1.drop_duplicates(subset=['A', 'C'], keep=False)\n",
+    "\n",
+    "searchLog.head(n=5)\n",
+    "searchLog.shape\n",
+    "searchLog.info()\n",
+    "searchLog.columns\n",
+    "\n",
+    "# Remove f1, etc.\n",
+    "'''\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# 6. Create new 'uniques' dataframe/file for ML\n",
+    "# =========================================================================\n",
+    "'''\n",
+    "Won't require Excel file if you run the df from here. 
File will have \n",
+    "updated entries from this session, by pulling back GoldStandard. Also will\n",
+    "make sure that previously found preferredTerm will be available as if they \n",
+    "were queries, to get maximum utility from the API.\n",
+    "\n",
+    "Training file: ApiAssignedSearches.xlsx (successful matches)\n",
+    "Unmatched terms we want to predict for: search-seed_the_ML.xlsx\n",
+    "\n",
+    "GoldStandard = pd.read_excel('01_Text_wrangling_files/GoldStandard_master.xlsx')\n",
+    "'''\n",
+    "\n",
+    "\n",
+    "# Base on UPDATED (above) GoldStandard\n",
+    "ApiAssignedSearches = GoldStandard\n",
+    "\n",
+    "col = ['adjustedQueryCase', 'preferredTerm', 'SemanticTypeName']\n",
+    "ApiAssignedSearches = ApiAssignedSearches[col]\n",
+    "\n",
+    "'''\n",
+    "get all preferredTerm items, dupe this into adjustedQueryCase column (so both \n",
+    "columns are the same, i.e., preferredTerm is also available as if it were \n",
+    "raw input); append to df, de-dupe rows.\n",
+    "'''\n",
+    "\n",
+    "prefGrabber = ApiAssignedSearches.drop(['adjustedQueryCase'], axis=1) # drop col\n",
+    "prefGrabber.drop_duplicates(inplace=True) # de-dupe rows\n",
+    "prefGrabber['adjustedQueryCase'] = prefGrabber['preferredTerm'].str.lower() # dupe and lc\n",
+    "\n",
+    "ApiAssignedSearches = ApiAssignedSearches.append(prefGrabber, sort=True) # append to orig\n",
+    "ApiAssignedSearches.drop_duplicates(inplace=True) # de-dupe rows after append\n",
+    "\n",
+    "# FIXME - Some adjustedQueryCase = nan\n",
+    "ApiAssignedSearches.adjustedQueryCase.fillna(ApiAssignedSearches.preferredTerm, inplace=True)\n",
+    "ApiAssignedSearches['adjustedQueryCase'] = ApiAssignedSearches['adjustedQueryCase'].str.lower() # str.lower the nan fixes\n",
+    "\n",
+    "# Write this to file\n",
+    "writer = pd.ExcelWriter(localDir + 'ApiAssignedSearches.xlsx')\n",
+    "ApiAssignedSearches.to_excel(writer,'ApiAssignedSearches')\n",
+    "writer.save()\n",
+    "\n",
+    "\n",
+    "# REMOVE\n",
+    "# Most variables but NOT ApiAssignedSearches"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 
null, + "metadata": {}, + "outputs": [], + "source": [ + "# 7. Next steps\n", + "# ==============\n", + "'''\n", + "Open 03_ML-classification.py, run the machine learning routines. You will use\n", + "these Excel files or dataframes\n", + "\n", + "- ApiAssignedSearches\n", + "- unassignedAfterUmls1 or unassignedAfterUmls2\n", + "\n", + "'''\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/04_Machine_learning_classification.ipynb b/04_Machine_learning_classification.ipynb new file mode 100644 index 0000000..bf7d3c1 --- /dev/null +++ b/04_Machine_learning_classification.ipynb @@ -0,0 +1,906 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Part 4. Machine Learning Classification\n", + "App to analyze web-site search logs (internal search)
\n", + "**This script:** scikit-learn for site-search classifications
\n", + "Authors: dan.wendling@nih.gov,
\n",
+    "Last modified: 2018-09-09\n",
+    "\n",
+    "\n",
+    "## THIS SCRIPT (A WORK IN PROGRESS)\n",
+    "\n",
+    "Some rough, not-quite in order, partly not-functioning machine learning \n",
+    "code that will eventually result in a dataframe of classification choices \n",
+    "for the ~30% of visitor search queries that the UMLS Metathesaurus is not \n",
+    "able to classify into broader categories (see 01_Pre-processing.py). \n",
+    "Some entries will be automatically assignable if misspellings can be \n",
+    "overcome with confidence.\n",
+    "\n",
+    "Desired end result in the future (feature name / column name is first row):\n",
+    "\n",
+    "| adjustedQueryCase | preferredTerm | SemanticType | SemanticGroup |\n",
+    "| --- | --- | --- | --- |\n",
+    "| gallbladder cancer | Malignant neoplasm of gallbladder | Neoplastic Process | Disorders |\n",
+    "\n",
+    "\n",
+    "What this script is trying to generate currently:\n",
+    " \n",
+    "| adjustedQueryCase | UmlsApproximate | pred-LinearSVC | pred-LogisticRegression | pred-NaiveBayesMultinomial |\n",
+    "| --- | --- | --- | --- | --- |\n",
+    "| cbd oil | nan | Organic Chemical | Intellectual Product | Organic Chemical |\n",
+    "\n",
+    "(cbd oil is a marijuana-based product that many visitors seem to be \n",
+    "interested in. Variations on cbd will be available in the training set. \n",
+    "An alternative to this work is clustering, but would it cluster correctly...)\n",
+    "\n",
+    "\n",
+    "Feature/column explanations:\n",
+    " \n",
+    "adjustedQueryCase - Lowercased version of query with most punctuation removed.\n",
+    "preferredTerm - Can be assigned later if needed, but these are the UMLS system\n",
+    "    picks from more than 200 medical vocabularies. 01_Pre-processing.py will \n",
+    "    assign when possible; this script does not use them, as described below.\n",
+    "    The UMLS system has 10s of thousands of these or more.\n",
+    "SemanticType - Select one of 130 ontology categories. 
 Eventually I will need\n",
+ " to select two or more of these categories for searches that look like \n",
+ " \"cancer exercise.\" For now I am okay capturing one category.\n",
+ "SemanticGroup - One of 15 ontology super-categories.\n",
+ "UmlsApproximate - A new run of the UMLS API that will be set to more liberal\n",
+ "matching.\n",
+ "pred-LinearSVC, etc. - Predictions from the various models. These can be used for eyeballing\n",
+ "entries and adding manual assignments, to get the under-represented classes \n",
+ "some more content for future matching.\n",
+ "\n",
+ "\n",
+ "## Script contents\n",
+ "\n",
+ "1. Start-up / What to put into place, where; dataframe mods\n",
+ "2. Eyeball level of balance among classes under study, in training set\n",
+ "3. Training set: Calculate tf-idf vector for each query\n",
+ "4. Training set, Chi square: Find the terms most correlated with each item\n",
+ "5. Train, test, predict with a multi-class classifier, Naive Bayes-Multinomial\n",
+ "6. Model selection - Which among four models is the BEST model for this dataset?\n",
+ "\n",
+ "Aspirational, not working\n",
+ "7. Look deeper, with a confusion matrix, into the most successful model of\n",
+ " our group, LinearSVC (Linear Support Vector Classification)\n",
+ "8. Understand the misclassifications. Should we change the model, or not?\n",
+ "\n",
+ "Somewhat working\n",
+ "9. Chi-square to find terms MOST CORRELATED with each category\n",
+ "10. TryLinearSVCdf - Unmatched terms with LinearSVC\n",
+ "11. TryLogisticRegressiondf - Unmatched terms with LogisticRegression\n",
+ "\n",
+ "Not working\n",
+ "12. Final report by category\n",
+ "\n",
+ "\n",
+ "## FIXMEs\n",
+ "\n",
+ "Things Dan wrote for Dan; modify as needed. There are more FIXMEs in context.\n",
+ "\n",
+ "* [ ] \n",
+ "\n",
+ "- Biggest problem: I have under-represented classes, such as for NLM\n",
+ " products, which we are building manually. <br>
See file search-seed_the_ML.xlsx - \n", + " these are not matchable to the UMLS API as currently configured (it is \n", + " configured for high-confidence matches). We're working now to improve \n", + " category prediction for things not found in the UMLS datasets, such as \n", + " misspellings, NLM products and services, partial NLM Web page titles \n", + " (I scrape the site so I have a file of these, but they are verbose), \n", + " historical names, commercial product names, etc. These will be added \n", + " to the \"GoldStandard\" file. We started with the highest-frequency \n", + " unmatched; hopefully ML can take over some or most of this. Clustering \n", + " and FuzzyWuzzy will probably help here.\n", + " \n", + "- Dan will add second and third runs for the UMLS API, as described in \n", + " 01_Pre-processing.py, to resolve non-English queries and provide a feature\n", + " (column) of UMLS API guesses, whose prediction scores were too low to \n", + " return in the \"normalized string\" procedure I am using in the single \n", + " current UMLS API run. Then an editor can perhaps choose among the UMLS, \n", + " LinearSVC, LogisticRegression, etc., predictions.\n", + " \n", + "- For the ML code below, I am trying to assign from the ~130 \n", + " *SemanticTypeName* categories (see file 01_Pre-processing_files/\n", + " SemanticNetworkReference.xlsx). 
 I think using *SemanticTypeName* is \n",
+ " best for the project; we could also try to match to the 15 \n",
+ " super-categories and then create more routines to match to the 130 \n",
+ " sub-categories.\n",
+ " \n",
+ "- Add FuzzyWuzzy, perhaps fix misspellings in place (in col adjustedQueryCase),\n",
+ " if confidence is high that it will be the right fix...\n",
+ "- Add stemming, lemmatization?\n",
+ "- Future: Add the ability to assign one query to multiple categories.\n",
+ "- More FIXMEs may appear in context below.\n",
+ "\n",
+ "\n",
+ "## INFLUENCES\n",
+ "\n",
+ "- Susan Li, https://towardsdatascience.com/multi-class-text-classification-with-scikit-learn-12f1e60e0a9f\n",
+ " (This code came from her code; I don't know what all of it does. Some of her wording is still in the comments.)\n",
+ "- Andreas Mueller, https://github.com/amueller/introduction_to_ml_with_python\n",
+ " (I am looking to add procedures from here that will assist in manual\n",
+ " assignments. Not sure what to add; LDA-based charts look useful.)\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 1. 
Start-up / What to put into place, where; dataframe mods\n", + "\n", + "Training file: ApiAssignedSearches.xlsx (successful matches)\n", + "Unmatched terms we want to predict for: search-seed_the_ML.xlsx\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from matplotlib.pyplot import pie, axis, show\n", + "import numpy as np\n", + "import os\n", + "\n", + "# Set working directory\n", + "os.chdir('/Users/wendlingd/Projects/webDS/_util')\n", + "\n", + "\n", + "'''\n", + "Bring in and adjust training file, ApiAssignedSearches.xlsx - previous \n", + "successful assignments; we will need adjustedQueryCase, preferredTerm, \n", + "SemanticTypeName, SemanticGroup...\n", + "'''\n", + "\n", + "df = pd.read_excel('02_UMLS_API_files/ApiAssignedSearches.xlsx')\n", + "\n", + "\n", + "# OR...\n", + "# df = ApiAssignedSearches\n", + "\n", + "\n", + "# Don't use preferredTerm for now - will be too inaccurate\n", + "df = df.drop(['preferredTerm'], axis=1) # drop col\n", + "\n", + "# Don't try to process any non-Roman characters; eyeball and remove\n", + "# df = df[17:]\n", + "\n", + "df.info()\n", + "# df.head()\n", + "# df.columns\n", + "\n", + "# 6/23: Trouble with fit, so trying this - remove integer data type, perhaps\n", + "# Remove int values in adjustedQueryCase by removal or coerced data type change\n", + "df['adjustedQueryCase'] = df['adjustedQueryCase'].astype(str)\n", + "\n", + "\n", + "df = df.sort_values(by='adjustedQueryCase', ascending=True)\n", + "df = df.reset_index()\n", + "df.drop(['index'], axis=1, inplace=True)\n", + "\n", + "'''\n", + "df.drop(12038, inplace=True)\n", + "df.drop(10714, inplace=True)\n", + "df.drop(6822, inplace=True)\n", + "'''\n", + "df.drop(26905, inplace=True)\n", + "df.drop(26904, inplace=True)\n", + "df.drop(26903, inplace=True)\n", + "\n", + "'''\n", + "To preserve changes to training file for future 
sessions\n", + "\n", + "# Useful to write out the cleaned up version; if you do re-processing, you can skip a bunch of work.\n", + "writer = pd.ExcelWriter('01_Pre-processing_files/ApiAssignedSearches.xlsx')\n", + "df.to_excel(writer,'ApiAssignedSearches')\n", + "# df2.to_excel(writer,'Sheet2')\n", + "writer.save()\n", + "'''\n", + "\n", + "# add a column encoding the product as an integer, because categorical \n", + "# variables are often better represented by integers than strings\n", + "df['category_id'] = df['SemanticTypeName'].factorize()[0]\n", + "\n", + "# create a couple of dictionaries for future use\n", + "category_id_df = df[['SemanticTypeName', \n", + " 'category_id']].drop_duplicates().sort_values('category_id')\n", + "category_to_id = dict(category_id_df.values)\n", + "id_to_category = dict(category_id_df[['category_id', 'SemanticTypeName']].values)\n", + "\n", + "# what the first rows look like after the mods\n", + "df.head()\n", + "\n", + "# Bring in entries to match\n", + "unassignedAfterUmls1 = pd.read_excel('02_UMLS_API_files/unassignedAfterUmls1.xlsx')\n", + "unassignedAfterUmls1 = unassignedAfterUmls1.drop(['timesSearched'], axis=1) # drop col\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "'''\n", + "*** I'M STUCK ON THIS ONE (FILE BELOW) ***\n", + "\n", + "Not sure of your definition of fuzzy matching, but I use it to describe \n", + "misspellings and also this - verbose product, service, etc. names.\n", + "\n", + "Advice on how this should be implemented would be very useful!\n", + "\n", + "People often look for words within web pages - librarians call this a \n", + "\"known item search,\" for a web page/product page/service page they are \n", + "trying to get to. Many person names are in our biography pages. Product,\n", + "service names, etc. 
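One possible answer to the fuzzy-matching question above — matching short queries such as "kenneth walker" against verbose page titles — is token-based scoring rather than whole-string matching, so that title length stops penalizing the score. This is only a sketch using the standard library's difflib; the `page_titles` list and the `best_title_match` helper are hypothetical illustrations, not part of this project's files, and a library such as FuzzyWuzzy's token_set_ratio would do roughly the same job.

```python
from difflib import SequenceMatcher

def token_score(query, title):
    """Average, over query tokens, of each token's best similarity to any
    title token. Verbose titles are not penalized, because only the query's
    tokens need to find a match."""
    q_tokens = query.lower().split()
    t_tokens = title.lower().split()
    if not q_tokens or not t_tokens:
        return 0.0
    per_token = [max(SequenceMatcher(None, q, t).ratio() for t in t_tokens)
                 for q in q_tokens]
    return sum(per_token) / len(per_token)

def best_title_match(query, titles, threshold=0.85):
    """Return (title, score) for the best-scoring title, or None if the
    best score is below the threshold."""
    title, score = max(((t, token_score(query, t)) for t in titles),
                       key=lambda pair: pair[1])
    return (title, score) if score >= threshold else None

# Hypothetical scraped page titles (e.g., from an SEO Spider export)
page_titles = [
    "NLM Mourns the Loss of H. Kenneth Walker, MD, MACP, FAAN, "
    "Former Chair of the National Library of Medicine Board of Regents",
    "MedlinePlus: Trusted Health Information for You",
]

match = best_title_match("kenneth walker", page_titles)
```

The threshold would need tuning against real logs; name extraction could still be layered on top for person-name queries.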
 Multiple searches for \"Kenneth Walker\" - people\n",
+ "are probably trying to get to https://www.nlm.nih.gov/news/nlm_mourns_ken_walker.html,\n",
+ "or that's where we want them to get to. \n",
+ "\n",
+ "I use SEO Spider to scrape web page content, including page titles; in this \n",
+ "case they probably are trying to get to the page titled \"NLM Mourns the Loss \n",
+ "of H. Kenneth Walker, MD, MACP, FAAN, Former Chair of the National \n",
+ "Library of Medicine Board of Regents.\"\n",
+ "\n",
+ "Do I have to vectorize this whole title? How should the matches between\n",
+ "visitor search terms and verbose page titles be implemented? Name \n",
+ "extraction first?\n",
+ "\n",
+ "I also have to do this with the list of 200+ named NLM products (mostly\n",
+ "databases such as pubmed.gov).\n",
+ "'''\n",
+ "\n",
+ "# This data needs to be used to fuzzy match against web page names.\n",
+ "ShouldBeFuzzyMatched = pd.read_excel('03_ML-classification_files/ShouldBeFuzzyMatched.xlsx')\n",
+ "\n",
+ "ShouldBeFuzzyMatched.head(n=10)\n",
+ "\n",
+ "'''\n",
+ "ShouldBeFuzzyMatched is not used in this script; please suggest methods. \n",
+ "For the above example I would like eventually to end up with:\n",
+ "\n",
+ "| adjustedQueryCase | preferredTerm | SemanticType | SemanticGroup |\n",
+ "| kenneth walker | NLM News 2018 | NLM Web Page | Intellectual Products |\n",
+ "\n",
+ "\n",
+ "FYI, preferredTerm is not part of this script currently; I am dropping \n",
+ "that column above. But the value of preferredTerm would be \n",
+ "ShouldBeFuzzyMatched['ContentGroup'] when a match is made to the page title.\n",
+ "Regarding SemanticType, to the original ontology of ~130 types I have added \n",
+ "several NLM-specific type names such as \"NLM Web Page.\" Not many in the\n",
+ "training set yet.\n",
+ "'''\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# 2. 
 Eyeball level of balance among classes under study, in training set\n",
+ "# Less than 1 minute\n",
+ "# =======================================================================\n",
+ "'''\n",
+ "Perhaps this should be used with item 12 below, Final report, after that\n",
+ "has been run.\n",
+ "'''\n",
+ "\n",
+ "fig = plt.figure(figsize=(10,20))\n",
+ "df.groupby('SemanticTypeName').adjustedQueryCase.count().sort_index(ascending=False).plot.barh(ylim=0, fontsize=6, color=\"slateblue\")\n",
+ "fig.subplots_adjust(left=0.3)\n",
+ "plt.title(\"Eyeball level of balance among classes\", fontsize=12)\n",
+ "plt.xlabel(\"Number of queries\")\n",
+ "plt.show()\n",
+ "\n",
+ "\n",
+ "'''\n",
+ "(Wow, lots of variation. Many under-represented classes.)\n",
+ "'''\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# 3. Training set: Calculate tf-idf vector for each query\n",
+ "# Less than 1 minute\n",
+ "# ========================================================\n",
+ "'''\n",
+ "Current classifiers and learning algorithms cannot directly process \n",
+ "text in its original form; most of them expect numerical feature vectors \n",
+ "with a fixed size, rather than raw text of variable length. Therefore, \n",
+ "in this preprocessing step, the query text will be converted to a more manageable \n",
+ "representation.\n",
+ "\n",
+ "One common approach for extracting features from text is to use the \n",
+ "\"bag of words\" model, where for each document, a search query \n",
+ "in our case, the presence (and often the frequency) of words is taken \n",
+ "into consideration, but the order in which they occur is ignored.\n",
+ "\n",
+ "Specifically, for each term in our dataset, we will calculate a measure \n",
+ "called Term Frequency, Inverse Document Frequency, abbreviated to tf-idf. 
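To make the tf-idf step concrete, here is a minimal sketch on a few toy queries (the query strings are invented for illustration). It mirrors the TfidfVectorizer settings used in this script, except that min_df is lowered to 1 so the tiny corpus still produces features.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented example queries, standing in for df.adjustedQueryCase
toy_queries = [
    "gallbladder cancer",
    "cancer exercise",
    "intermittent fasting",
    "gallbladder surgery",
]

# Same idea as the real run: unigrams + bigrams, sublinear tf, L2 norm;
# min_df=1 only because this toy corpus is tiny (the real run uses min_df=5).
tfidf_demo = TfidfVectorizer(sublinear_tf=True, min_df=1, norm='l2',
                             ngram_range=(1, 2), stop_words='english')
X_demo = tfidf_demo.fit_transform(toy_queries)

# 6 unigrams + 4 bigrams -> a (4, 10) matrix, one L2-normalized row per query
print(X_demo.shape)
print(sorted(tfidf_demo.vocabulary_))
```

Terms shared across queries (e.g., "gallbladder") get lower idf weight than terms unique to one query, which is what lets the classifier weigh distinctive words more heavily.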
\n",
+ "\n",
+ "We will use sklearn.feature_extraction.text.TfidfVectorizer to calculate \n",
+ "a tf-idf vector for each of our search queries.\n",
+ "\n",
+ "Cf. http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html\n",
+ "'''\n",
+ "\n",
+ "from sklearn.feature_extraction.text import TfidfVectorizer\n",
+ "\n",
+ "tfidf = TfidfVectorizer(sublinear_tf=True, # True = Use a logarithmic form for frequency\n",
+ " min_df=5, # minimum number of documents a word must be present in to be kept\n",
+ " norm='l2', # to ensure all our feature vectors have a Euclidean norm of 1\n",
+ " encoding='latin-1', \n",
+ " ngram_range=(1, 2), # both unigrams and bigrams\n",
+ " stop_words='english') # remove common \"noise\" words, limit resulting features to useful ones\n",
+ "\n",
+ "features = tfidf.fit_transform(df.adjustedQueryCase).toarray()\n",
+ "labels = df.category_id\n",
+ "features.shape\n",
+ "\n",
+ "\n",
+ "'''\n",
+ "Shape example, (29289, 75036)\n",
+ "\n",
+ "Now, each of 29289 queries is represented by 75036 features, representing \n",
+ "the tf-idf score for different unigrams and bigrams.\n",
+ "'''\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# 4. Training set, Chi square: Find the terms most correlated with each item\n",
+ "# Less than 1 minute\n",
+ "# ===========================================================================\n",
+ "'''\n",
+ "We can use sklearn.feature_selection.chi2 to find the terms that are the \n",
+ "most correlated with each of the categories.\n",
+ "\n",
+ "Cf. 
 http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.chi2.html\n",
+ "'''\n",
+ "\n",
+ "from sklearn.feature_selection import chi2\n",
+ "\n",
+ "N = 2\n",
+ "for Product, category_id in sorted(category_to_id.items()):\n",
+ " features_chi2 = chi2(features, labels == category_id)\n",
+ " indices = np.argsort(features_chi2[0])\n",
+ " feature_names = np.array(tfidf.get_feature_names())[indices]\n",
+ " unigrams = [v for v in feature_names if len(v.split(' ')) == 1]\n",
+ " bigrams = [v for v in feature_names if len(v.split(' ')) == 2]\n",
+ " print(\"# '{}':\".format(Product))\n",
+ " print(\" . Most correlated unigrams:\\n . {}\".format('\\n . '.join(unigrams[-N:])))\n",
+ " print(\" . Most correlated bigrams:\\n . {}\".format('\\n . '.join(bigrams[-N:])))\n",
+ " "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# 5. Train, test, predict with a multi-class classifier, Naive Bayes-Multinomial\n",
+ "# Less than 1 minute\n",
+ "# ===============================================================================\n",
+ "'''\n",
+ "To train supervised classifiers, we first transformed each search \n",
+ "query into a vector of numbers. 
 We explored vector \n",
+ "representations such as TF-IDF weighted vectors.\n",
+ "\n",
+ "After having a vector representation of the text, we can train \n",
+ "supervised-learning classifiers on it and then predict which of our \n",
+ "categories to assign to unseen queries.\n",
+ "\n",
+ "Here we will vectorize with CountVectorizer and transform with TfidfTransformer.\n",
+ "'''\n",
+ "\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "from sklearn.feature_extraction.text import CountVectorizer\n",
+ "from sklearn.feature_extraction.text import TfidfTransformer\n",
+ "\n",
+ "X_train, X_test, y_train, y_test = train_test_split(df['adjustedQueryCase'], df['SemanticTypeName'], random_state = 0)\n",
+ "count_vect = CountVectorizer()\n",
+ "X_train_counts = count_vect.fit_transform(X_train)\n",
+ "tfidf_transformer = TfidfTransformer()\n",
+ "X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)\n",
+ "\n",
+ "\n",
+ "'''\n",
+ "Now that we have all the features and labels, we can start training the \n",
+ "classifier. There are a number of algorithms that might be useful for the \n",
+ "current dataset. Naive Bayes is a common go-to. The model most suitable \n",
+ "for word counts is the multinomial variant.\n",
+ "\n",
+ "Cf. 
http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html\n", + "\n", + "'''\n", + "\n", + "from sklearn.naive_bayes import MultinomialNB\n", + "\n", + "clf = MultinomialNB().fit(X_train_tfidf, y_train)\n", + "\n", + "\n", + "# After fitting the training set, let’s try a few predictions.\n", + "\n", + "# Tests\n", + "print(clf.predict(count_vect.transform([\"herpes i\"])))\n", + "print(clf.predict(count_vect.transform([\"bemer\"])))\n", + "print(clf.predict(count_vect.transform([\"dental journals\"])))\n", + "print(clf.predict(count_vect.transform([\"intermittent fasting\"])))\n", + "print(clf.predict(count_vect.transform([\"cardiac tamponade pericardial lymphoma\"])))\n", + "print(clf.predict(count_vect.transform([\"fisioterapia\"])))\n", + "print(clf.predict(count_vect.transform([\"diabete\"])))\n", + "print(clf.predict(count_vect.transform([\"journal of clinical and diagnostic research\"])))\n", + "print(clf.predict(count_vect.transform([\"hippocrates\"])))\n", + "print(clf.predict(count_vect.transform([\"the new england journal of medicine\"])))\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# 6. Model selection - Which among four models is the BEST model for this dataset?\n", + "# ~1 minute\n", + "# REQUIRES AT LEAST 5 EXAMPLES PER CLASS\n", + "# =================================================================================\n", + "'''\n", + "Let's benchmark four models used for this type of dataset, evaluate their \n", + "accuracy, and visualize their classification accuracy for our dataset.\n", + "\n", + "1. (Multinomial) Naive Bayes, http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html\n", + "2. Logistic Regression, http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html\n", + "3. 
Linear Support Vector Classification, http://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html\n", + "4. Random Forest, http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html\n", + "'''\n", + "\n", + "from sklearn.naive_bayes import MultinomialNB\n", + "from sklearn.linear_model import LogisticRegression\n", + "from sklearn.svm import LinearSVC\n", + "from sklearn.ensemble import RandomForestClassifier\n", + "\n", + "from sklearn.model_selection import cross_val_score\n", + "\n", + "models = [\n", + " MultinomialNB(),\n", + " LogisticRegression(random_state=0),\n", + " LinearSVC(),\n", + " RandomForestClassifier(n_estimators=200, max_depth=3, random_state=0),\n", + "]\n", + "CV = 5\n", + "cv_df = pd.DataFrame(index=range(CV * len(models)))\n", + "entries = []\n", + "for model in models:\n", + " model_name = model.__class__.__name__\n", + " accuracies = cross_val_score(model, features, labels, scoring='accuracy', cv=CV)\n", + " for fold_idx, accuracy in enumerate(accuracies):\n", + " entries.append((model_name, fold_idx, accuracy))\n", + "cv_df = pd.DataFrame(entries, columns=['model_name', 'fold_idx', 'accuracy'])\n", + "\n", + "\n", + "import seaborn as sns\n", + "# Cf. https://seaborn.pydata.org/generated/seaborn.boxplot.html\n", + "\n", + "sns.boxplot(x='model_name', y='accuracy', data=cv_df).set_title(\"Classifier performance (box plot)\")\n", + "sns.stripplot(x='model_name', y='accuracy', data=cv_df, \n", + " size=8, jitter=True, edgecolor=\"gray\", linewidth=2)\n", + "plt.show()\n", + "\n", + "cv_df.groupby('model_name').accuracy.mean()\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# 7. 
 Look deeper, with a confusion matrix, into the most successful model of\n",
+ "# our group, LinearSVC (Linear Support Vector Classification)\n",
+ "# Less than 1 minute\n",
+ "# ========================================================================\n",
+ "'''\n",
+ "Too many categories to display well. In the future, could use SemanticGroup (only 15 classes)\n",
+ "\n",
+ "Continuing with LinearSVC, the most-accurate model of the ones we tested, \n",
+ "let's create a confusion matrix to show the discrepancies between predicted \n",
+ "and actual labels within the categories.\n",
+ "\n",
+ "Parameters: http://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html\n",
+ "'''\n",
+ "\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "\n",
+ "model = LinearSVC()\n",
+ "\n",
+ "X_train, X_test, y_train, y_test, indices_train, indices_test = train_test_split(\n",
+ " features, labels, df.index, test_size=0.33, random_state=0)\n",
+ "model.fit(X_train, y_train)\n",
+ "y_pred = model.predict(X_test)\n",
+ "\n",
+ "from sklearn.metrics import confusion_matrix\n",
+ "\n",
+ "conf_mat = confusion_matrix(y_test, y_pred)\n",
+ "fig, ax = plt.subplots(figsize=(10,8))\n",
+ "sns.heatmap(conf_mat, annot=True, fmt='d',\n",
+ " xticklabels=category_id_df.SemanticTypeName.values, \n",
+ " yticklabels=category_id_df.SemanticTypeName.values)\n",
+ "plt.rcParams.update({'font.size': 8})\n",
+ "plt.ylabel('Actual')\n",
+ "plt.subplots_adjust(left=0.5, bottom=0.5)\n",
+ "plt.xlabel('Predicted')\n",
+ "plt.show()\n",
+ "\n",
+ "# The category names are long; shorten them to see the heatmap as\n",
+ "# intended, with readable Actual and Predicted tick labels.\n",
+ "\n",
+ "'''\n",
+ "The vast majority of the predictions end up on the diagonal (predicted \n",
+ "label = actual label), where we want them to be. 
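One way to read a confusion matrix numerically rather than visually: normalize each row so that the diagonal gives per-class recall. A small self-contained sketch, using a made-up 3-class matrix standing in for the real conf_mat:

```python
import numpy as np

# Made-up 3-class confusion matrix (rows = actual, cols = predicted),
# standing in for the much larger conf_mat computed above.
conf_mat_demo = np.array([[50,  2,  3],
                          [ 4, 20,  6],
                          [ 1,  0,  9]])

row_totals = conf_mat_demo.sum(axis=1, keepdims=True)
normalized = conf_mat_demo / row_totals        # each row now sums to 1
per_class_recall = np.diag(normalized)         # diagonal = recall per class

print(per_class_recall)  # approx. [0.909, 0.667, 0.9]
```

Sorting classes by this recall vector is a quick way to surface the categories the model handles worst, without squinting at a 100+ class heatmap.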
\n", + "\n", + "However, there are a number of misclassifications, and it might be \n", + "interesting to see what those are caused by.\n", + "'''\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# 8. Understand the misclassifications. Should we change the model, or not?\n", + "# Less than 1 minute\n", + "# ==========================================================================\n", + "'''\n", + "From Susan Li's blog post, not working with this dataset.\n", + "\n", + "Uses dictionary id_to_category.\n", + "'''\n", + "\n", + "from IPython.display import display\n", + "\n", + "for predicted in category_id_df.category_id:\n", + " for actual in category_id_df.category_id:\n", + " if predicted != actual and conf_mat[actual, predicted] >= 6:\n", + " print(\"'{}' predicted as '{}' : {} examples.\".format(\n", + " id_to_category[actual], id_to_category[predicted], \n", + " conf_mat[actual, predicted]))\n", + " display(df.loc[indices_test[(y_test == actual) & \n", + " (y_pred == predicted)]]\n", + " [['SemanticTypeName', 'adjustedQueryCase']])\n", + " print('')\n", + "\n", + "'''\n", + "When things belong in multiple categories, errors will happen; not directly\n", + "fixable.\n", + "'''\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# 9. 
 Chi-square to find terms MOST CORRELATED with each category\n",
+ "# Less than 1 minute\n",
+ "# ========================================================================\n",
+ "'''\n",
+ "Again, we use the chi-squared test to find the terms that are the most \n",
+ "correlated with each of the categories.\n",
+ "\n",
+ "In IPython console:\n",
+ " Start recording print output: %logstart dan1.txt\n",
+ " Stop recording print output: %logstop\n",
+ "\n",
+ "Or don't specify a file name and then look for ipython_log.py\n",
+ "https://ipython.org/ipython-doc/3/interactive/reference.html\n",
+ "'''\n",
+ "\n",
+ "model.fit(features, labels)\n",
+ "\n",
+ "from sklearn.feature_selection import chi2\n",
+ "\n",
+ "N = 3\n",
+ "stringCapture = \"\"\n",
+ "for Product, category_id in sorted(category_to_id.items()):\n",
+ " indices = np.argsort(model.coef_[category_id])\n",
+ " feature_names = np.array(tfidf.get_feature_names())[indices]\n",
+ " unigrams = [v for v in reversed(feature_names) if len(v.split(' ')) == 1][:N]\n",
+ " bigrams = [v for v in reversed(feature_names) if len(v.split(' ')) == 2][:N]\n",
+ " print(\"# '{}':\".format(Product))\n",
+ " print(\" . Top unigrams:\\n . {}\".format('\\n . '.join(unigrams)))\n",
+ " print(\" . Top bigrams:\\n . {}\".format('\\n . '.join(bigrams)))\n",
+ " stringCapture += '\\n\\n' + str(Product) + '\\n Top unigrams:\\n ' + str(unigrams) + '\\n Top bigrams:\\n ' + str(bigrams)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# 10. 
 TryLinearSVCdf - Unmatched terms with LinearSVC\n",
+ "# Less than 1 minute\n",
+ "# ========================================================================\n",
+ "\n",
+ "TryLinearSVCdf = pd.DataFrame()\n",
+ "TryLinearSVCdf['adjustedQueryCase'] = \"\"\n",
+ "TryLinearSVCdf['pred-LinearSVC'] = \"\"\n",
+ "\n",
+ "TryLinearSVC = unassignedAfterUmls1['adjustedQueryCase'].astype(str)\n",
+ "\n",
+ "text_features = tfidf.transform(TryLinearSVC)\n",
+ "\n",
+ "predictions = model.predict(text_features)\n",
+ "\n",
+ "\n",
+ "for queryTerm, predicted in zip(TryLinearSVC, predictions):\n",
+ " TryLinearSVCdf = TryLinearSVCdf.append(pd.DataFrame({'adjustedQueryCase': queryTerm, \n",
+ " 'pred-LinearSVC': id_to_category[predicted]}, index=[0]), ignore_index=True, sort=True)\n",
+ "\n",
+ "TryLinearSVCdf = TryLinearSVCdf[['adjustedQueryCase', 'pred-LinearSVC']]\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# 11. TryLogisticRegressiondf - Unmatched terms with LogisticRegression\n",
+ "# Less than 1 minute\n",
+ "# ========================================================================\n",
+ "'''\n",
+ "https://towardsdatascience.com/logistic-regression-using-python-sklearn-numpy-mnist-handwriting-recognition-matplotlib-a6b31e2b166a\n",
+ "'''\n",
+ "\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "from sklearn.linear_model import LogisticRegression\n",
+ "\n",
+ "logisticRegr = LogisticRegression(random_state=0)\n",
+ "\n",
+ "logisticRegr.fit(X_train, y_train)\n",
+ "\n",
+ "predictions = logisticRegr.predict(X_train) # FIXME: training-set predictions; predict on the unassigned queries' features instead\n",
+ "\n",
+ "'''\n",
+ "# Use score method to get accuracy of model\n",
+ "score = logisticRegr.score(X_test, y_test)\n",
+ "print(score)\n",
+ "'''\n",
+ "\n",
+ "TryLogisticRegressionDf = pd.DataFrame()\n",
+ "TryLogisticRegressionDf['adjustedQueryCase'] = \"\"\n",
+ "TryLogisticRegressionDf['pred-LogisticReg'] = \"\"\n",
+ "\n",
+ "TryLogisticRegression = 
 unassignedAfterUmls1['adjustedQueryCase'].astype(str)\n",
+ "\n",
+ "text_features = tfidf.transform(TryLogisticRegression)\n",
+ "\n",
+ "# Predict on the unassigned queries' tf-idf features, so predictions\n",
+ "# lines up 1:1 with TryLogisticRegression in the zip below\n",
+ "predictions = logisticRegr.predict(text_features)\n",
+ "\n",
+ "for queryTerm, predicted in zip(TryLogisticRegression, predictions):\n",
+ " TryLogisticRegressionDf = TryLogisticRegressionDf.append(pd.DataFrame({'adjustedQueryCase': queryTerm, \n",
+ " 'pred-LogisticReg': id_to_category[predicted]}, index=[0]), ignore_index=True, sort=True)\n",
+ "\n",
+ "TryLogisticRegressionDf = TryLogisticRegressionDf[['adjustedQueryCase', 'pred-LogisticReg']]\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# JOIN NEW DATAFRAMES\n",
+ "\n",
+ "twoGuesses = pd.merge(TryLinearSVCdf, TryLogisticRegressionDf)\n",
+ "\n",
+ "writer = pd.ExcelWriter('01_Pre-processing_files/twoGuesses.xlsx')\n",
+ "twoGuesses.to_excel(writer,'twoGuesses')\n",
+ "# df2.to_excel(writer,'Sheet2')\n",
+ "writer.save()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# 12. 
Final report by category\n", + "# Less than 1 minute\n", + "# ========================================================================\n", + "'''\n", + "Classes where more training data is needed, or other changes need to be\n", + "made.\n", + "'''\n", + " \n", + "from sklearn import metrics\n", + "\n", + "print(metrics.classification_report(y_test, y_pred, \n", + " target_names=df['SemanticTypeName'].unique()))\n", + "\n", + "\n", + "'''\n", + "\n", + " precision recall f1-score support\n", + "\n", + " NLM Product or Service 0.66 0.48 0.55 299\n", + " Quantitative Concept 0.31 0.22 0.26 78\n", + "Nucleic Acid, Nucleoside, or Nucleotide 0.40 0.14 0.21 57\n", + " Therapeutic or Preventive Procedure 0.66 0.47 0.55 383\n", + " Plant 0.53 0.12 0.19 178\n", + " Organic Chemical 0.26 0.94 0.41 1212\n", + " Intellectual Product 0.50 0.29 0.36 270\n", + " Amino Acid, Peptide, or Protein 0.68 0.24 0.35 409\n", + " Cell Component 0.65 0.36 0.46 36\n", + " Pharmacologic Substance 0.29 0.15 0.20 143\n", + " Indicator, Reagent, or Diagnostic Aid 0.20 0.08 0.12 12\n", + " Temporal Concept 0.36 0.20 0.26 45\n", + " Nucleotide Sequence 0.00 0.00 0.00 3\n", + " Laboratory Procedure 0.57 0.54 0.55 98\n", + " Body Part, Organ, or Organ Component 0.55 0.39 0.45 199\n", + " Finding 0.40 0.26 0.32 334\n", + " Disease or Syndrome 0.68 0.51 0.58 1040\n", + " Spatial Concept 0.20 0.02 0.04 43\n", + " Manufactured Object 0.28 0.11 0.16 91\n", + " Cell 0.66 0.61 0.64 44\n", + " Gene or Genome 0.95 0.54 0.69 448\n", + " Vitamin 0.00 0.00 0.00 4\n", + " Immunologic Factor 0.78 0.34 0.47 116\n", + " Cell or Molecular Dysfunction 0.40 0.14 0.21 14\n", + " Diagnostic Procedure 0.62 0.41 0.50 104\n", + " Molecular Function 0.47 0.24 0.32 29\n", + " semTypeName 0.77 0.66 0.71 176\n", + " Neoplastic Process 0.00 0.00 0.00 5\n", + " Self-help or Relief Organization 0.20 0.16 0.18 19\n", + " Body Location or Region 0.58 0.37 0.45 119\n", + " Sign or Symptom 0.00 0.00 0.00 31\n", + " Acquired 
Abnormality 0.59 0.28 0.38 169\n", + " Medical Device 0.21 0.18 0.19 28\n", + " Anatomical Abnormality 0.74 0.67 0.70 116\n", + " Injury or Poisoning 0.41 0.33 0.37 27\n", + " Clinical Attribute 0.53 0.44 0.48 125\n", + " Mental or Behavioral Dysfunction 0.26 0.15 0.19 131\n", + " Pathologic Function 0.54 0.24 0.33 59\n", + " Population Group 1.00 0.33 0.50 9\n", + " Embryonic Structure 0.40 0.17 0.24 12\n", + " Regulation or Law 0.18 0.08 0.11 98\n", + " Qualitative Concept 0.62 0.22 0.33 94\n", + " Congenital Abnormality 0.29 0.10 0.15 72\n", + " Functional Concept 0.00 0.00 0.00 35\n", + " Occupational Activity 0.47 0.24 0.32 67\n", + " Mental Process 0.29 0.19 0.23 62\n", + " Professional or Occupational Group 0.25 0.09 0.13 11\n", + " Organization 0.50 0.29 0.37 90\n", + " Food 0.00 0.00 0.00 74\n", + " Eukaryote 0.00 0.00 0.00 20\n", + " Phenomenon or Process 0.54 0.22 0.32 58\n", + " Health Care Related Organization 0.50 0.10 0.17 49\n", + " Organism Function 0.12 0.06 0.09 31\n", + " Organ or Tissue Function 0.36 0.15 0.21 34\n", + " Social Behavior 0.31 0.08 0.13 59\n", + " Idea or Concept 0.96 0.32 0.48 78\n", + " Bacterium 0.00 0.00 0.00 2\n", + " Chemical 0.56 0.17 0.26 29\n", + " Natural Phenomenon or Process 0.31 0.17 0.22 24\n", + " Activity 0.00 0.00 0.00 2\n", + " Professional Society 0.45 0.41 0.43 123\n", + " Health Care Activity 0.33 0.06 0.10 17\n", + " Element, Ion, or Isotope 0.17 0.05 0.07 21\n", + " Physiologic Function 0.38 0.25 0.30 32\n", + " Daily or Recreational Activity 0.65 0.47 0.54 129\n", + " Biomedical Occupation or Discipline 0.00 0.00 0.00 10\n", + " Chemical Viewed Structurally 0.73 0.40 0.52 20\n", + " Receptor 0.63 0.32 0.42 38\n", + " Virus 0.75 0.17 0.27 18\n", + " Tissue 0.00 0.00 0.00 21\n", + " Organism Attribute 0.00 0.00 0.00 7\n", + " Chemical Viewed Functionally 0.50 0.31 0.38 32\n", + " Individual Behavior 0.12 0.12 0.12 8\n", + " Age Group 0.27 0.50 0.35 8\n", + " Group Attribute 0.44 0.41 0.42 17\n", + " NLM 
Organizational Component 0.27 0.25 0.26 12\n", + " Cell Function 0.44 0.21 0.28 39\n", + " Occupation or Discipline 0.43 0.07 0.12 90\n", + " Geographic Area 0.56 0.74 0.64 31\n", + " Clinical Drug 0.40 0.10 0.16 20\n", + " Fungus 0.45 0.49 0.47 49\n", + " Research Activity 0.14 0.07 0.10 14\n", + " Substance 0.00 0.00 0.00 2\n", + " Environmental Effect of Humans 1.00 0.29 0.44 7\n", + " Patient or Disabled Group 0.00 0.00 0.00 4\n", + " Human 0.38 0.19 0.25 16\n", + " Laboratory or Test Result 0.00 0.00 0.00 1\n", + " Reptile 0.00 0.00 0.00 2\n", + " Experimental Model of Disease 0.17 0.05 0.08 20\n", + " Biologically Active Substance 0.00 0.00 0.00 16\n", + " Conceptual Entity 0.47 0.21 0.29 39\n", + " Biomedical or Dental Material 0.69 0.19 0.30 47\n", + " Mammal 0.00 0.00 0.00 1\n", + " Amino Acid Sequence 0.11 0.11 0.11 18\n", + " Body Substance 0.00 0.00 0.00 3\n", + " Amphibian 0.00 0.00 0.00 6\n", + " Biologic Function 1.00 0.08 0.15 12\n", + " Bird 0.00 0.00 0.00 0\n", + " Anatomical Structure 0.50 0.12 0.20 8\n", + " Hormone 0.00 0.00 0.00 16\n", + " Classification 0.00 0.00 0.00 6\n", + " Fish 0.00 0.00 0.00 3\n", + " Animal 0.00 0.00 0.00 1\n", + " Event 0.31 0.31 0.31 13\n", + " Body Space or Junction 0.00 0.00 0.00 3\n", + " Antibiotic 0.00 0.00 0.00 14\n", + " Governmental or Regulatory Activity 0.77 0.38 0.51 26\n", + " Educational Activity 0.00 0.00 0.00 2\n", + " Archaeon 0.00 0.00 0.00 13\n", + " Hazardous or Poisonous Substance 0.00 0.00 0.00 5\n", + " Machine Activity 0.40 0.32 0.35 19\n", + " Genetic Function 0.00 0.00 0.00 14\n", + " Inorganic Chemical 0.00 0.00 0.00 4\n", + " Behavior 0.75 0.30 0.43 10\n", + " Human-caused Phenomenon or Process 0.00 0.00 0.00 7\n", + " Molecular Biology Research Technique 0.44 0.31 0.36 13\n", + " Body System 0.40 0.44 0.42 9\n", + " Language 0.50 0.07 0.12 15\n", + " Organism 0.00 0.00 0.00 5\n", + " Group 0.29 0.17 0.21 12\n", + " Family Group 0.00 0.00 0.00 2\n", + " Research Device 0.00 0.00 0.00 1\n", 
+ " Physical Object 0.00 0.00 0.00 2\n", + " NLM Product 0.00 0.00 0.00 1\n", + "\n", + " avg / total 0.51 0.42 0.40 8878 \n", + "'''" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/05_Chart_the_trends.ipynb b/05_Chart_the_trends.ipynb new file mode 100644 index 0000000..dd0d3eb --- /dev/null +++ b/05_Chart_the_trends.ipynb @@ -0,0 +1,293 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Part 5. Chart the trends\n", + "App to analyze web-site search logs (internal search)
\n", + "**This script:** Biggest Movers / Percent change charts
\n", + "Authors: dan.wendling@nih.gov,
\n", + "Last modified: 2018-09-09\n", + "\n", + "\n", + "## Script contents\n", + "\n", + "1. Start-up / What to put into place, where\n", + "2. Load and clean a subset of data\n", + "3. Put stats into form that matplotlib can consume and export data\n", + "4. Biggest movers bar chart - Percent change in search frequency\n", + "\n", + "\n", + "## FIXMEs\n", + "\n", + "Things Dan wrote for Dan; modify as needed. There are more FIXMEs in context.\n", + "\n", + "* [ ] \n", + "\n", + "\n", + "## RESOURCES\n", + "\n", + "- Partly based on code from Mueller-Guido 2017, Visualize_coefficients, p 341.\n", + "- https://stackoverflow.com/questions/tagged/matplotlib\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# 1. Start-up / What to put into place, where\n", + "# ============================================\n", + "\n", + "import pandas as pd\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "import os\n", + "\n", + "from matplotlib.colors import ListedColormap\n", + "\n", + "\n", + "# Set working directory\n", + "os.chdir('/Users/wendlingd/Projects/webDS/_util')\n", + "\n", + "localDir = '05_Chart_the_trends_files/' # Different than others, see about changing\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# 2. 
Load and clean a subset of data\n",
+ "# ===================================\n",
+ "\n",
+ "logAfterFuzzyMatch = pd.read_excel('03_Fuzzy_match_files/logAfterFuzzyMatch.xlsx')\n",
+ "\n",
+ "# Limit to off-LAN, NLM Home\n",
+ "df1 = logAfterFuzzyMatch.loc[logAfterFuzzyMatch['StaffYN'].str.contains('N') == True]\n",
+ "searchfor = ['www.nlm.nih.gov$', 'www.nlm.nih.gov/$']\n",
+ "df1 = df1[df1.Referrer.str.contains('|'.join(searchfor))]\n",
+ "\n",
+ "'''\n",
+ "# If you want to remove unparsed\n",
+ "df1 = df1[df1.SemanticGroup.str.contains(\"Unparsed\") == False]\n",
+ "df1 = df1[df1.preferredTerm.str.contains(\"PubMed strategy, citation, unclear, etc.\") == False]\n",
+ "'''\n",
+ "\n",
+ "\n",
+ "# reduce cols\n",
+ "df2 = df1[['Timestamp', 'preferredTerm', 'SemanticTypeName', 'SemanticGroup']]\n",
+ "\n",
+ "# Get nan count, remove nan rows\n",
+ "Unassigned = df2['preferredTerm'].isnull().sum()\n",
+ "df2 = df2[~pd.isnull(df2['Timestamp'])]\n",
+ "df2 = df2[~pd.isnull(df2['preferredTerm'])]\n",
+ "df2 = df2[~pd.isnull(df2['SemanticTypeName'])]\n",
+ "df2 = df2[~pd.isnull(df2['SemanticGroup'])]\n",
+ "\n",
+ "# Limit to May and June and assign month name\n",
+ "df2.loc[(df2['Timestamp'] > '2018-05-01 00:00:00') & (df2['Timestamp'] < '2018-06-01 00:00:00'), 'Month'] = 'May'\n",
+ "df2.loc[(df2['Timestamp'] > '2018-06-01 00:00:00') & (df2['Timestamp'] < '2018-07-01 00:00:00'), 'Month'] = 'June'\n",
+ "# Rows outside May-June have NaN in 'Month' (not \"\"), so drop NaN rather than filtering on != \"\"\n",
+ "df2 = df2.dropna(subset=['Month'])\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "'''\n",
+ "--------------------------\n",
+ "IN CASE YOU COMPLETE CYCLE AND THEN SEE THAT LABELS SHOULD BE SHORTENED\n",
+ "\n",
+ "# Shorten names if needed\n",
+ "df2['preferredTerm'] = df2['preferredTerm'].str.replace('National Center for Biotechnology Information', 'NCBI')\n",
+ "df2['preferredTerm'] = df2['preferredTerm'].str.replace('Samples of Formatted Refs J Articles', 'Formatted Refs Authors J Articles')\n",
+ "df2['preferredTerm'] = df2['preferredTerm'].str.replace('Formatted 
References for Authors of Journal Articles', 'Formatted Refs J Articles')\n", + "\n", + "dobby = df2.loc[df2['preferredTerm'].str.contains('Formatted') == True]\n", + "dobby = df2.loc[df2['preferredTerm'].str.contains('Biotech') == True]\n", + "\n", + "writer = pd.ExcelWriter('03_Fuzzy_match_files/logAfterFuzzyMatch.xlsx')\n", + "df2.to_excel(writer,'logAfterFuzzyMatch')\n", + "# df2.to_excel(writer,'Sheet2')\n", + "writer.save()\n", + "'''\n", + "\n", + "writer = pd.ExcelWriter('03_Fuzzy_match_files/logAfterFuzzyMatch.xlsx')\n", + "df2.to_excel(writer,'logAfterFuzzyMatch')\n", + "# df2.to_excel(writer,'Sheet2')\n", + "writer.save()\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "# Count number of unique preferredTerm\n", + "\n", + "# May counts\n", + "May = df2.loc[df2['Month'].str.contains('May') == True]\n", + "MayCounts = May.groupby('preferredTerm').size()\n", + "MayCounts = pd.DataFrame({'MayCount':MayCounts})\n", + "# MayCounts = MayCounts.sort_values(by='timesSearched', ascending=False)\n", + "MayCounts = MayCounts.reset_index()\n", + "\n", + "# June counts\n", + "June = df2.loc[df2['Month'].str.contains('June') == True]\n", + "JuneCounts = June.groupby('preferredTerm').size()\n", + "JuneCounts = pd.DataFrame({'JuneCount':JuneCounts})\n", + "# JuneCounts = JuneCounts.sort_values(by='timesSearched', ascending=False)\n", + "JuneCounts = JuneCounts.reset_index()\n", + "\n", + "\n", + "# Remove rows with a count less than 10; next code would make some exponential.\n", + "MayCounts = MayCounts[MayCounts['MayCount'] >= 10]\n", + "JuneCounts = JuneCounts[JuneCounts['JuneCount'] >= 10]\n", + "\n", + "# Join, removing terms not searched in BOTH months \n", + "df3 = pd.merge(MayCounts, JuneCounts, how='inner', on='preferredTerm')\n", + "\n", + "# Assign the percentage of that month's search share\n", + "# MayPercent\n", + "df3['MayPercent'] = \"\"\n", + "MayTotal = df3.MayCount.sum()\n", + 
"df3['MayPercent'] = df3.MayCount / MayTotal * 100\n", + "\n", + "# JunePercent\n", + "df3['JunePercent'] = \"\"\n", + "JuneTotal = df3.JuneCount.sum()\n", + "df3['JunePercent'] = df3.JuneCount / JuneTotal * 100\n", + "\n", + "# Assign Percent Change\n", + "df3['PercentChange'] = \"\"\n", + "df3['PercentChange'] = df3.JunePercent - df3.MayPercent\n", + "\n", + "# Prep for next phase\n", + "\n", + "PercentChangeData = df3\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# 3. Put stats into form that matplotlib can consume and export data\n", + "# ===================================================================\n", + "\n", + "PercentChangeData = PercentChangeData.sort_values(by='PercentChange', ascending=True)\n", + "PercentChangeData = PercentChangeData.reset_index()\n", + "PercentChangeData.drop(['index'], axis=1, inplace=True) \n", + " \n", + "negative_values = PercentChangeData.head(20)\n", + "\n", + "positive_values = PercentChangeData.tail(20)\n", + "positive_values = positive_values.sort_values(by='PercentChange', ascending=True)\n", + "positive_values = positive_values.reset_index()\n", + "positive_values.drop(['index'], axis=1, inplace=True) \n", + "\n", + "interesting_values = negative_values.append([positive_values])\n", + "\n", + "\n", + "# Write out full file and chart file\n", + "\n", + "writer = pd.ExcelWriter(localDir + 'PercentChangeData.xlsx')\n", + "PercentChangeData.to_excel(writer,'PercentChangeData')\n", + "# df2.to_excel(writer,'Sheet2')\n", + "writer.save()\n", + "\n", + "writer = pd.ExcelWriter(localDir + 'interesting_values.xlsx')\n", + "interesting_values.to_excel(writer,'interesting_values')\n", + "# df2.to_excel(writer,'Sheet2')\n", + "writer.save()\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# 4. 
Biggest movers bar chart - Percent change in search frequency\n", + "# =================================================================\n", + "'''\n", + "Re-start:\n", + "interesting_values = pd.read_excel(localDir + 'interesting_values.xlsx')\n", + "'''\n", + "\n", + "\n", + "# Percent change chart\n", + "cm = ListedColormap(['#0000aa', '#ff2020'])\n", + "colors = [cm(1) if c < 0 else cm(0)\n", + " for c in interesting_values.PercentChange]\n", + "ax = interesting_values.plot(x='preferredTerm', y='PercentChange',\n", + " kind='bar', \n", + " color=colors,\n", + " fontsize=10) # figsize=(30, 10), \n", + "ax.set_xlabel(\"preferredTerm\")\n", + "ax.set_ylabel(\"Percent change for June\")\n", + "ax.legend_.remove()\n", + "plt.axvline(x=19.4, linewidth=.5, color='gray')\n", + "plt.axvline(x=19.6, linewidth=.5, color='gray')\n", + "plt.subplots_adjust(bottom=0.4)\n", + "plt.ylabel(\"Percent change in search frequency\")\n", + "plt.xlabel(\"Standardized topic name from UMLS+\")\n", + "plt.xticks(rotation=60, ha=\"right\", fontsize=9)\n", + "plt.suptitle('Biggest movers - How June site searches were different from the past', fontsize=16, fontweight='bold')\n", + "plt.title('NLM Home page, classify-able search terms only. 
In June use of the terms on the left\\ndropped the most, and use of the terms on the right rose the most, compared to May.', fontsize=10)\n", + "plt.show()\n", + "\n", + "# How June was different than May\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Outlier check\n", + "# =================================================================\n", + "'''\n", + "Why did Bibliographic Entity increase by 4%?\n", + "'''\n", + "\n", + "huh = logAfterFuzzyMatch[logAfterFuzzyMatch.preferredTerm.str.startswith(\"Biblio\") == True] # retrieve records to eyeball\n", + "# huh = huh.groupby('preferredTerm').size()\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/05b_Chart_the_trends-BiggestMovers.ipynb b/05b_Chart_the_trends-BiggestMovers.ipynb new file mode 100644 index 0000000..41dca53 --- /dev/null +++ b/05b_Chart_the_trends-BiggestMovers.ipynb @@ -0,0 +1,578 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 05b Chart the trends - \"Biggest movers\" May-June\n", + "App to analyze web-site search logs (internal search)
\n", + "**This script:** May-June analysis, fuller than the 05 file. Biggest Movers / Percent change charts
\n", + "Authors: dan.wendling@nih.gov,
\n", + "Last modified: 2018-09-09\n", + "\n", + "\n", + "## Script contents\n", + "\n", + "1. Start-up / What to put into place, where\n", + "2. Unite search log data into single dataframe; globally update columns and rows\n", + "3. Separate out the queries with non-English characters\n", + "4. Run STAFF stats\n", + "5. Run PUBLIC (off-LAN) stats\n", + "6. Add result to MySQL, process at http://localhost:5000/searchsum\n", + "\n", + "\n", + "## FIXMEs\n", + "\n", + "Things Dan wrote for Dan; modify as needed. There are more FIXMEs in context.\n", + "\n", + "* [ ] " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# 1. Start-up / What to put into place, where\n", + "# ============================================\n", + "\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from matplotlib.pyplot import pie, axis, show\n", + "import numpy as np\n", + "import os\n", + "import string\n", + "\n", + "# Set working directory\n", + "os.chdir('/Users/wendlingd/Projects/webDS/_util')\n", + "\n", + "localDir = '05_Chart_the_trends_files/'\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# 2. 
Unite search log data into single dataframe; globally update columns and rows\n", + "# =================================================================================\n", + "# What is your new log file named?\n", + "\n", + "newSearchLogFile = '00_Source_files/FY18-q3.xlsx'\n", + "\n", + "x1 = pd.read_excel(newSearchLogFile, 'Page1_1', skiprows=2)\n", + "x2 = pd.read_excel(newSearchLogFile, 'Page1_2', skiprows=2)\n", + "x3 = pd.read_excel(newSearchLogFile, 'Page1_3', skiprows=2)\n", + "x4 = pd.read_excel(newSearchLogFile, 'Page1_4', skiprows=2)\n", + "x5 = pd.read_excel(newSearchLogFile, 'Page1_5', skiprows=2)\n", + "x6 = pd.read_excel(newSearchLogFile, 'Page1_6', skiprows=2)\n", + "# x5 = pd.read_excel('00 SourceFiles/2018-06/Queries-2018-05.xlsx', 'Page1_2', skiprows=2)\n", + "\n", + "searchLog = pd.concat([x1, x2, x3, x4, x5, x6], ignore_index=True) # , x3, x4, x5, x6, x7\n", + "\n", + "searchLog.head(n=5)\n", + "searchLog.shape\n", + "searchLog.info()\n", + "searchLog.columns\n", + "\n", + "# Drop ID column, not needed\n", + "# searchLog.drop(['ID'], axis=1, inplace=True)\n", + " \n", + "# Until Cognos report is fixed, problem of blank columns, multi-word col name\n", + "# Update col name\n", + "searchLog = searchLog.rename(columns={'Search Timestamp': 'Timestamp', \n", + " 'NLM IP Y/N':'StaffYN',\n", + " 'IP':'SessionID'})\n", + "\n", + "# Remove https:// to become joinable with traffic data\n", + "searchLog['Referrer'] = searchLog['Referrer'].str.replace('https://', '')\n", + "\n", + "# Dupe off the Query column into a lower-cased 'adjustedQueryCase', which \n", + "# will be the column you match against\n", + "searchLog['adjustedQueryCase'] = searchLog['Query'].str.lower()\n", + "\n", + "# Remove incomplete rows, which can cause errors later\n", + "searchLog = searchLog[~pd.isnull(searchLog['Referrer'])]\n", + "searchLog = searchLog[~pd.isnull(searchLog['Query'])]\n", + "\n", + "# Limit to NLM Home\n", + "searchfor = ['www.nlm.nih.gov$', 
'www.nlm.nih.gov/$']\n",
+ "HmPgLog = searchLog[searchLog.Referrer.str.contains('|'.join(searchfor))]\n",
+ "\n",
+ "timeBoundHmPgLog = HmPgLog\n",
+ "\n",
+ "# Limit to May and June and assign month name\n",
+ "timeBoundHmPgLog.loc[(timeBoundHmPgLog['Timestamp'] > '2018-05-01 00:00:00') & (timeBoundHmPgLog['Timestamp'] < '2018-06-01 00:00:00'), 'Month'] = 'May'\n",
+ "timeBoundHmPgLog.loc[(timeBoundHmPgLog['Timestamp'] > '2018-06-01 00:00:00') & (timeBoundHmPgLog['Timestamp'] < '2018-07-01 00:00:00'), 'Month'] = 'June'\n",
+ "# Rows outside May-June have NaN in 'Month' (not \"\"), so drop NaN\n",
+ "timeBoundHmPgLog.dropna(subset=['Month'], inplace=True) \n",
+ "\n",
+ "\n",
+ "# Useful to write out the cleaned up version; if you do re-processing, you can skip a bunch of work.\n",
+ "writer = pd.ExcelWriter(localDir + 'timeBoundHmPgLog.xlsx')\n",
+ "timeBoundHmPgLog.to_excel(writer,'timeBoundHmPgLog')\n",
+ "# df2.to_excel(writer,'Sheet2')\n",
+ "writer.save()\n",
+ "\n",
+ "# Remove x1., etc., searchLog, HmPgLog\n"
 ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
+ "# 3. Separate out the queries with non-English characters\n",
+ "# ========================================================\n",
+ "'''\n",
+ "Rows whose Query is ASCII-only keep their existing preferredTerm value.\n",
+ "Approach drawn from:\n",
+ "https://stackoverflow.com/questions/36340627/removing-non-ascii-characters-and-replacing-with-spaces-from-pandas-data-frame\n",
+ "https://stackoverflow.com/questions/27084617/detect-strings-with-non-english-characters-in-python\n",
+ "https://stackoverflow.com/questions/196345/how-to-check-if-a-string-in-python-is-in-ascii\n",
+ "https://stackoverflow.com/questions/16353729/how-do-i-use-pandas-apply-function-to-multiple-columns\n",
+ "And other places\n",
+ "\n",
+ "For testing\n",
+ "searchLogClean = pd.read_excel(localDir + 'searchLogClean.xlsx')\n",
+ "searchLogClean = searchLogClean.iloc[12000:13000]\n",
+ "searchLogClean['preferredTerm'] = searchLogClean['preferredTerm'].str.replace(None, '')\n",
+ "\n",
+ "Future: Break out languages better; assign language name, find translation API, etc.\n",
+ "\n",
+ "Re-start\n",
+ "MayJuneHmPg = pd.read_excel(localDir + 'searchLog-MayJune-HmPg.xlsx')\n",
+ "timeBoundHmPgLog = MayJuneHmPg\n",
+ "'''\n",
+ "\n",
+ "\n",
+ "# When it hangs... checkTrouble = searchLog.iloc[156422:156427]\n",
+ "\n",
+ "\n",
+ "timeBoundHmPgLog['preferredTerm'] = \"\"\n",
+ "\n",
+ "def foreignCharTest(row):\n",
+ "    try:\n",
+ "        row['Query'].encode('ascii')\n",
+ "        return row['preferredTerm']  # ASCII-only query; leave the row's value as-is\n",
+ "    except UnicodeEncodeError:\n",
+ "        return 'NON-ENGLISH CHARACTERS'\n",
+ "\n",
+ "timeBoundHmPgLog['preferredTerm'] = timeBoundHmPgLog.apply(foreignCharTest, axis=1)\n",
+ "\n",
+ "# Optional: convert empty strings to NaN\n",
+ "# searchLog['preferredTerm'].replace('', np.nan, inplace=True)\n",
+ "\n"
 ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
+ "# 4. 
Run STAFF stats\n",
+ "# ==============================\n",
+ "'''\n",
+ "On-LAN stats\n",
+ "FIXME - Check whether Cognos separation of Staff-YN can exclude reading room?\n",
+ "But, how many of the people in the reading room are on www.nlm.nih.gov at all?\n",
+ "'''\n",
+ "# Restrict to staff\n",
+ "staffStats = timeBoundHmPgLog.loc[timeBoundHmPgLog['StaffYN'].str.contains('Y') == True]\n",
+ "\n",
+ "# Staff search count\n",
+ "totSearchesStaff = staffStats.groupby('Month')['ID'].nunique()\n",
+ "print(\"\\nTotal STAFF SEARCHES in raw log file:\\n{}\".format(totSearchesStaff))\n",
+ "\n",
+ "# Staff unique queries\n",
+ "uniqueSearchesStaff = staffStats['Query'].nunique()\n",
+ "uniqueSearchesStaff\n",
+ "\n",
+ "uniqueSearchesStaffByMonth = staffStats.groupby('Month')['Query'].nunique()\n",
+ "uniqueSearchesStaffByMonth\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "# Staff session count\n",
+ "totSessionsStaff = staffStats.groupby('Month')['SessionID'].nunique()\n",
+ "print(\"\\nTotal STAFF SESSIONS in raw log file:\\n{}\".format(totSessionsStaff))\n",
+ "\n",
+ "'''\n",
+ "Bar chart - by number of searches per session\n",
+ "\n",
+ "Average searches per session\n",
+ "Median searches per session\n",
+ "Average searches per day (@ 22d/mo.)\n",
+ "Median searches per day (@ 22d/mo.)\n",
+ "Average sessions per day\n",
+ "Median sessions per day\n",
+ "Highest search count in one session\n",
+ "\n",
+ "'''\n",
+ "\n",
+ "\n",
+ "# Top queries from NLM LAN, from NLM Home (not normalized)\n",
+ "searchLogLanYesHmPg = staffStats.loc[staffStats['StaffYN'].str.contains('Y') == True]\n",
+ "searchfor = ['www.nlm.nih.gov$', 'www.nlm.nih.gov/$']\n",
+ "searchLogLanYesHmPg = searchLogLanYesHmPg[searchLogLanYesHmPg.Referrer.str.contains('|'.join(searchfor))]\n",
+ "searchLogLanYesHmPgQueryCounts = searchLogLanYesHmPg['Query'].value_counts()\n",
+ "searchLogLanYesHmPgQueryCounts = searchLogLanYesHmPgQueryCounts.reset_index()\n",
+ "searchLogLanYesHmPgQueryCounts = 
searchLogLanYesHmPgQueryCounts.rename(columns={'index': 'Top queries from NLM LAN, from Home, as entered', 'Query': 'Count'})\n",
+ "searchLogLanYesHmPgQueryCounts.head(n=25)\n",
+ "\n"
 ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
+ "# 5. Run PUBLIC (off-LAN) stats\n",
+ "# ==============================\n",
+ "\n",
+ "\n",
+ "visitorStats = timeBoundHmPgLog.loc[timeBoundHmPgLog['StaffYN'].str.contains('N') == True]\n",
+ "\n",
+ "# Count rows with foreign chars\n",
+ "foreignCount = visitorStats.loc[visitorStats['preferredTerm'].str.contains('NON-ENGLISH CHARACTERS') == True]\n",
+ "foreignCount.count()\n",
+ "\n",
+ "# Drop rows with foreign chars\n",
+ "visitorStats = visitorStats[visitorStats.preferredTerm != 'NON-ENGLISH CHARACTERS']\n",
+ "\n",
+ "# Visitor search count\n",
+ "totSearchesVisitors = visitorStats.groupby('Month')['ID'].nunique()\n",
+ "print(\"\\nTotal VISITOR SEARCHES in raw log file:\\n{}\".format(totSearchesVisitors))\n",
+ "\n",
+ "# Visitor unique queries\n",
+ "uniqueSearchesVisitors = visitorStats['Query'].nunique()\n",
+ "uniqueSearchesVisitors\n",
+ "\n",
+ "uniqueSearchesVisitorsByMonth = visitorStats.groupby('Month')['Query'].nunique()\n",
+ "uniqueSearchesVisitorsByMonth\n",
+ "\n",
+ "\n",
+ "# Visitor session count\n",
+ "totSessionsVisitors = visitorStats.groupby('Month')['SessionID'].nunique()\n",
+ "print(\"\\nTotal VISITOR SESSIONS in raw log file:\\n{}\".format(totSessionsVisitors))\n",
+ "\n",
+ "\n",
+ "\n",
+ "'''\n",
+ "Bar chart - by number of searches per session\n",
+ "\n",
+ "Average searches per session\n",
+ "Median searches per session\n",
+ "Average searches per day (@ 22d/mo.)\n",
+ "Median searches per day (@ 22d/mo.)\n",
+ "Average sessions per day\n",
+ "Median sessions per day\n",
+ "Highest search count in one session\n",
+ "\n",
+ "'''\n",
+ "\n",
+ "\n",
+ "# Highest session search count\n",
+ "SessionCounts = visitorStats['SessionID'].value_counts()\n",
+ 
"SessionCounts = pd.DataFrame({'TypeCount':SessionCounts})\n", + "SessionCounts.sort_values(\"TypeCount\", ascending=True, inplace=True)\n", + "SessionCounts = SessionCounts.reset_index()\n", + "\n", + "# test = searchLog.loc[searchLog['SessionID'].str.contains('47C9DEE89B48E22FB53E2BE2DB107763') == True]\n", + "\n", + "\n", + "# Top queries outside NLM LAN, from NLM Home (not normalized)\n", + "# May-June\n", + "df3LanNoHmPgQueryCounts = visitorStats['Query'].value_counts()\n", + "df3LanNoHmPgQueryCounts = df3LanNoHmPgQueryCounts.reset_index()\n", + "df3LanNoHmPgQueryCounts = df3LanNoHmPgQueryCounts.rename(columns={'index': 'Top queries off of LAN, from Home, as entered', 'Query': 'Count'})\n", + "df3LanNoHmPgQueryCounts.head(n=25)\n", + "\n", + "# May top 25\n", + "MayVisitorTop25 = visitorStats.loc[visitorStats['Month'].str.contains('May') == True]\n", + "MayVisitorTop25 = MayVisitorTop25['Query'].value_counts()\n", + "MayVisitorTop25 = MayVisitorTop25.reset_index()\n", + "MayVisitorTop25 = MayVisitorTop25.rename(columns={'index': 'Top VISITOR queries from NLM Home page, as entered', 'Query': 'Count'})\n", + "MayVisitorTop25.head(n=25)\n", + "\n", + "# June top 25\n", + "JuneVisitorTop25 = visitorStats.loc[visitorStats['Month'].str.contains('June') == True]\n", + "JuneVisitorTop25 = JuneVisitorTop25['Query'].value_counts()\n", + "JuneVisitorTop25 = JuneVisitorTop25.reset_index()\n", + "JuneVisitorTop25 = JuneVisitorTop25.rename(columns={'index': 'Top VISITOR queries from NLM Home page, as entered', 'Query': 'Count'})\n", + "JuneVisitorTop25.head(n=25)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# logAfterFuzzyMatch\n", + "\n", + "EffectOfLight = logAfterFuzzyMatch.loc[logAfterFuzzyMatch['Query'].str.contains('effect of light') == True]\n", + "\n", + "# Useful to write out the cleaned up version; if you do re-processing, you can skip a bunch of work.\n", + "writer = pd.ExcelWriter(localDir 
+ 'EffectOfLight.xlsx')\n", + "EffectOfLight.to_excel(writer,'EffectOfLight')\n", + "# df2.to_excel(writer,'Sheet2')\n", + "writer.save()\n", + "\n", + "\n", + "\n", + "dobby = logAfterFuzzyMatch.loc[logAfterFuzzyMatch['preferredTerm'].str.startswith('Samples of Formatted') == True]\n", + "\n", + "# Samples of Formatted References for Authors of Journal Articles\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# 6. Add result to MySQL, process at http://localhost:5000/searchsum\n", + "# ========================================================================\n", + "'''\n", + "timeBoundHmPgLog.columns\n", + "\n", + "In phpMyAdmin:\n", + "\n", + "DROP TABLE IF EXISTS `timeboundhmpglog`;\n", + "CREATE TABLE `timeboundhmpglog` (\n", + " `Timestamp` datetime DEFAULT NULL,\n", + " `preferredTerm` text,\n", + " `SemanticTypeName` text,\n", + " `SemanticTypeCode` int(11) DEFAULT NULL,\n", + " `SemanticGroup` text,\n", + " `SemanticGroupCode` int(11) DEFAULT NULL,\n", + " `Month` text\n", + ") ENGINE=InnoDB DEFAULT CHARSET=utf8;\n", + "\n", + " \n", + " \n", + "writer = pd.ExcelWriter(localDir + 'timeBoundHmPgLog.xlsx')\n", + "timeBoundHmPgLog.to_excel(writer,'timeBoundHmPgLog')\n", + "# df2.to_excel(writer,'Sheet2')\n", + "writer.save()\n", + "\n", + "\n", + "Re-start\n", + "MayJuneHmPg = pd.read_excel(localDir + 'searchLog-MayJune-HmPg.xlsx')\n", + "timeBoundHmPgLog = MayJuneHmPg\n", + "'''\n", + "\n", + "logAfterFuzzyMatch = pd.read_excel('03_Fuzzy_match_files/logAfterFuzzyMatch.xlsx')\n", + "\n", + "# Remove nans from Month\n", + "logAfterFuzzyMatch = logAfterFuzzyMatch.dropna(subset=['Month'])\n", + "\n", + "logAfterFuzzyMatch.columns\n", + "\n", + "# Reduce size for test\n", + "test = logAfterFuzzyMatch.iloc[0:49]\n", + "\n", + "\n", + "# Add dataframe to MySQL\n", + "\n", + "import mysql.connector\n", + "from pandas.io import sql\n", + "from sqlalchemy import create_engine\n", + "\n", + 
"dbconn = create_engine('mysql+mysqlconnector://wendlingd:DataSciPwr17@localhost/ia')\n", + "\n", + "logAfterFuzzyMatch.to_sql(name='timeboundhmpglog', con=dbconn, if_exists = 'replace', index=False) # or if_exists='append'\n", + " \n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "'''\n", + "\n", + "test = df3\n", + "\n", + "df3.set_index('SessionID', inplace=True)\n", + "test = df3.groupby(['col2','col3'], as_index=False).count()\n", + "\n", + "\n", + "\n", + "test = df3.groupby(['Month','StaffYN'], as_index=False)['SessionID'].count()\n", + "test\n", + "\n", + "test = df3.groupby(['Month','StaffYN'], as_index=False)['SearchID'].count()\n", + "test\n", + "\n", + "\n", + "test = df3['ID'].groupby([df3['Month'], df3['StaffYN']]).size()\n", + "test\n", + "\n", + "\n", + "df3['SearchID'].count()\n", + "\n", + "test = df3.groupby(['Month', 'StaffYN'])['Referrer'].size()\n", + "test\n", + "\n", + "totSearches = df3.groupby(['Month', 'StaffYN'])['SearchID'].count()\n", + "print(\"\\nTotal SEARCHES in raw log file:\\n{}\".format(totSearches))\n", + "\n", + "totSessions = df3.groupby(['Month', 'StaffYN']).size()\n", + "print(\"\\nTotal SESSIONS in raw log file:\\n{}\".format(totSessions))\n", + "\n", + "\n", + "# pd.crosstab(df3.ID, df3.SessionID, margins=True)\n", + "\n", + "\n", + "# df3 = df3.rename(columns={'ID': 'SearchID'})\n", + "'''\n", + "\n", + "\n", + "\n", + "# Total SEARCHES in raw log file\n", + "totSearches = df3['SearchID'].groupby([df3['Month'], df3['StaffYN']]).count()\n", + "print(\"\\nTotal SEARCHES in raw log file:\\n{}\".format(totSearches))\n", + "\n", + "# Total SESSIONS in raw log file\n", + "totSessions = df3['SessionID'].groupby([df3['Month'], df3['StaffYN']]).count()\n", + "print(\"\\nTotal SESSIONS in raw log file:\\n{}\".format(totSessions))\n", + "\n", + "\n", + "\n", + "print(\"Total searches in raw log file: {}\".format(len(df3)))\n", + "\n", + "# totals\n", + 
"print(\"\\nTotal SEARCH QUERIES, on NLM LAN or not\\n{}\".format(df3['StaffYN'].value_counts()))\n", + "\n", + "print(\"\\nTotal SESSIONS, on NLM LAN or not\\n{}\".format(df3.groupby('StaffYN')['SessionID'].nunique()))\n", + "\n", + "\n", + "\n", + "\n", + "test = df3['SearchID'].groupby(df3['Month'])\n", + "test.count()\n", + "\n", + "# If you see digits in text col, perhaps these are partial log entries - eyeball for removal\n", + "# df3.drop(76080, inplace=True)\n", + "\n", + "\n", + "test = df3['StaffYN'].groupby(df3['Month'])\n", + "test.count()\n", + "\n", + "\n", + "\n", + "# Total SEARCHES containing 'Non-English characters'\n", + "print(\"Total SEARCHES with non-English characters\\n{}\".format(df3['preferredTerm'].value_counts()))\n", + "\n", + "# Total SESSIONS containing 'Non-English characters'\n", + "# Future\n", + "\n", + "\n", + "\n", + "\n", + "# How to set a date range\n", + "AprMay = logAfterUmlsApi1[(logAfterUmlsApi1['Timestamp'] > '2018-04-01 01:00:00') & (logAfterUmlsApi1['Timestamp'] < '2018-06-01 00:00:00')]\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "# Top queries from LAN (not normalized)\n", + "df3LanYes = df3.loc[df3['StaffYN'].str.contains('Y') == True]\n", + "df3LanYesQueryCounts = df3LanYes['Query'].value_counts()\n", + "df3LanYesQueryCounts = df3LanYesQueryCounts.reset_index()\n", + "df3LanYesQueryCounts = df3LanYesQueryCounts.rename(columns={'index': 'Top staff queries as entered', 'Query': 'Count'})\n", + "df3LanYesQueryCounts.head(n=30)\n", + "\n", + "# Top queries from NLM LAN, from NLM Home (not normalized)\n", + "df3LanYesHmPg = df3.loc[df3['StaffYN'].str.contains('Y') == True]\n", + "searchfor = ['www.nlm.nih.gov$', 'www.nlm.nih.gov/$']\n", + "df3LanYesHmPg = df3LanYesHmPg[df3LanYesHmPg.Referrer.str.contains('|'.join(searchfor))]\n", + "df3LanYesHmPgQueryCounts = df3LanYesHmPg['Query'].value_counts()\n", + "df3LanYesHmPgQueryCounts = df3LanYesHmPgQueryCounts.reset_index()\n", + "df3LanYesHmPgQueryCounts = 
df3LanYesHmPgQueryCounts.rename(columns={'index': 'Top queries from NLM LAN, from Home, as entered', 'Query': 'Count'})\n", + "df3LanYesHmPgQueryCounts.head(n=25)\n", + "\n", + "\n", + "# Top queries outside NLM LAN (not normalized)\n", + "df3LanNo = df3.loc[df3['StaffYN'].str.contains('N') == True]\n", + "df3LanNoQueryCounts = df3LanNo['Query'].value_counts()\n", + "df3LanNoQueryCounts = df3LanNoQueryCounts.reset_index()\n", + "df3LanNoQueryCounts = df3LanNoQueryCounts.rename(columns={'index': 'Top queries off of LAN, as entered', 'Query': 'Count'})\n", + "df3LanNoQueryCounts.head(n=25)\n", + "\n", + "\n", + "\n", + "# Top home page queries, staff or public\n", + "searchfor = ['www.nlm.nih.gov$', 'www.nlm.nih.gov/$']\n", + "df3AllHmPgQueryCounts = df3[df3.Referrer.str.contains('|'.join(searchfor))]\n", + "df3AllHmPgQueryCounts = df3AllHmPgQueryCounts['Query'].value_counts()\n", + "df3AllHmPgQueryCounts = df3AllHmPgQueryCounts.reset_index()\n", + "df3AllHmPgQueryCounts = df3AllHmPgQueryCounts.rename(columns={'index': 'Top home page queries, staff or public, as entered', 'Query': 'Count'})\n", + "df3AllHmPgQueryCounts.head(n=25)\n", + "\n", + "\n", + "# FIXME - Add table, Percentage of staff, public searches done within pages, within search results\n", + "\n", + "\n", + "# FIXME - Add table for Top queries with columns/counts On LAN, Off LAN, Total\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "# Remove the searches run from within search results screens, vsearch.nlm.nih.gov/vivisimo/\n", + "# I'm not looking at these now; you might be.\n", + "df3 = df3[df3.Referrer.str.startswith(\"www.nlm.nih.gov\") == True]\n", + "\n", + "# Not sure what these are, www.nlm.nih.gov/?_ga=2.95055260.1623044406.1513044719-1901803437.1513044719\n", + "df3 = df3[df3.Referrer.str.startswith(\"www.nlm.nih.gov/?_ga=\") == False]\n", + "\n", + "\n", + "# FIXME - VARIABLE EXPLORER: After saving the stats, remove unneeded 'Type=DataFrame' items\n", + "'''\n", + "Remove manually for 
now.\n", + "Not finding an equiv to R's rm; cf https://stackoverflow.com/questions/32247643/how-to-delete-multiple-pandas-python-dataframes-from-memory-to-save-ram?rq=1\n", + "pd.x1(), pd.x2(), # pd.x3(), pd.x4(), pd.x5(), pd.x6(), pd.x7(), \n", + " pd.searchLogLanYes(), pd.searchLogLanYesHmPg(), \n", + " pd.searchLogLanNo(), pd.searchLogLanNoHmPg(),\n", + " pd.searchLogAllHmPg()\n", + "'''\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/06_Load_database.ipynb b/06_Load_database.ipynb new file mode 100644 index 0000000..9e7506c --- /dev/null +++ b/06_Load_database.ipynb @@ -0,0 +1,316 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Part 6. Load database\n", + "App to analyze web-site search logs (internal search)
\n",
+ "**This script:** Create and load the log table in the database<br/>
\n",
+ "Author: dan.wendling@nih.gov<br/>
\n", + "Last modified: 2018-09-09\n", + "\n", + "For now let's load search_log and semantic_network. I decided not to load other tables for now; let's see how the work goes. The next candidate would be 01_Text_wrangling_files/GoldStandard_master.xlsx\n", + "\n", + "Preference: Postgres. If MySQL, the 03_Fuzzy_match file has code for SQLAlchemy, MySQLConnector, etc.\n", + "\n", + "\n", + "# Contents\n", + "1. search_log table\n", + "2. manual_assignments table\n", + "3. semantic_network table" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 1. search_log table" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'\\nsql code; this has worked with past code.\\n\\nDROP TABLE IF EXISTS `search_log`;\\nCREATE TABLE `search_log` (\\n `search_log_id` INT PRIMARY KEY NOT NULL AUTO_INCREMENT,\\n `Timestamp` datetime DEFAULT NULL,\\n `Query` varchar(800) DEFAULT NULL,\\n `Address` varchar(900) DEFAULT NULL,\\n `SessionID` varchar(15) NOT NULL,\\n `preferredTerm` text,\\n `SemanticTypeName` text,\\n `SemanticTypeCode` int(11) DEFAULT NULL,\\n `SemanticGroup` text,\\n `SemanticGroupCode` int(11) DEFAULT NULL,\\n `Month` text\\n) ENGINE=InnoDB DEFAULT CHARSET=utf8;\\n\\n# For a quick start with data, please try 06_Load_database/logAfterGoldStandard.xlsx, \\nwhich was copied over from 01_Text_wrangling_files.\\n'" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "'''\n", + "sql code; this has worked with past code.\n", + "\n", + "DROP TABLE IF EXISTS `search_log`;\n", + "CREATE TABLE `search_log` (\n", + " `search_log_id` INT PRIMARY KEY NOT NULL AUTO_INCREMENT,\n", + " `Timestamp` datetime DEFAULT NULL,\n", + " `Query` varchar(800) DEFAULT NULL,\n", + " `Address` varchar(900) DEFAULT NULL,\n", + " `SessionID` varchar(15) NOT NULL,\n", + " `preferredTerm` text,\n", + " `SemanticTypeName` text,\n", + " 
`SemanticTypeCode` int(11) DEFAULT NULL,\n", + " `SemanticGroup` text,\n", + " `SemanticGroupCode` int(11) DEFAULT NULL,\n", + " `Month` text\n", + ") ENGINE=InnoDB DEFAULT CHARSET=utf8;\n", + "\n", + "# For a quick start with data, please try 06_Load_database/logAfterGoldStandard.xlsx, \n", + "which was copied over from 01_Text_wrangling_files.\n", + "'''" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 2. manual_assignments table\n", + "\n", + "See 03_Fuzzy_match for how this is constructed/used.\n", + "\n", + "" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "''' \n", + "DROP TABLE IF EXISTS manual_assignments;\n", + "CREATE TABLE `manual_assignments` (\n", + " `assignment_id` INT PRIMARY KEY NOT NULL AUTO_INCREMENT,\n", + " `adjustedQueryCase` varchar(200) NULL,\n", + " `NewSemanticTypeName` varchar(100) NULL,\n", + " `preferredTerm` varchar(200) NULL,\n", + " `FuzzyToken` varchar(50) NULL,\n", + " `SemanticTypeName` varchar(100) NULL,\n", + " `SemanticGroup` varchar(50) NULL,\n", + " `timesSearched` int(11) NULL,\n", + " `FuzzyScore` int(11) NULL\n", + ") ENGINE=InnoDB DEFAULT CHARSET=utf8;\n", + "'''" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 3. semantic_network table\n", + "I didn't see what I needed online, so created it here. Should be sufficient for joining with the processed logs for reporting.\n", + "\n", + "Table has one UMLS Semantic Type per row.\n", + "\n", + "* SemanticGroupCode: With SemanticGroup, SemanticGroupAbr, identifies ~15 supergroups, see McCray AT, Burgun A, Bodenreider O. (2001). Aggregating UMLS semantic types for reducing conceptual complexity. Stud Health Technol Inform. 84(Pt 1):216-20. PMID: 11604736. 
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4300099/ and https://semanticnetwork.nlm.nih.gov/.\n", + "* SemanticGroup: See SemanticGroupCode.\n", + "* SemanticGroupAbr: See SemanticGroupCode.\n", + "* CustomTreeNumber: An attempt to get queries to dump in the correct order so counts could be attached to each semantic type, with proper indentation.\n", + "* SemanticTypeName: See https://www.nlm.nih.gov/research/umls/META3_current_semantic_types.html\n", + "* BranchPosition: Use to create indents in browser-based reporting to make it look like https://www.nlm.nih.gov/research/umls/META3_current_semantic_types.html (but one column).\n", + "* Definition: Sem type Definition from UMLS documentation.\n", + "* Examples: Sem type examples from UMLS doc, with a few added.\n", + "* RelationName: Semantic \"triples\" from UMLS doc; not currently used.\n", + "* SemTypeTreeNo: Sem type tree number from UMLS doc; not currently used.\n", + "* UsageNote: From UMLS doc.\n", + "* Abbreviation: Sem type abbrev from UMLS doc; not currently used.\n", + "* UniqueID: Another attempt to sort the table in hierarchical order.\n", + "* NonHumanFlag: From UMLS doc; not currently used.\n", + "* RecordType: From UMLS doc; not currently used. The UMLS Semantic Network includes other content that could be added such as if we wanted to do {item} {howrelated} {item}. \n", + "\n", + "This information can go out of date over time. It is current as of Summer 2018." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "'''\n", + "# sql code\n", + "CREATE TABLE `semantic_network` (\n", + " `SemanticGroupCode` bigint(20) DEFAULT NULL,\n", + " `SemanticGroup` text,\n", + " `SemanticGroupAbr` text,\n", + " `CustomTreeNumber` bigint(20) DEFAULT NULL,\n", + " `SemanticTypeName` text,\n", + " `BranchPosition` bigint(20) DEFAULT NULL,\n", + " `Definition` text,\n", + " `Examples` text,\n", + " `RelationName` text,\n", + " `SemTypeTreeNo` text,\n", + " `UsageNote` text,\n", + " `Abbreviation` text,\n", + " `UniqueID` bigint(20) DEFAULT NULL,\n", + " `NonHumanFlag` text,\n", + " `RecordType` text\n", + ") ENGINE=InnoDB DEFAULT CHARSET=utf8;\n", + "\n", + "--\n", + "-- Dumping data for table `semantic_network`\n", + "--\n", + "\n", + "INSERT INTO `semantic_network` (`SemanticGroupCode`, `SemanticGroup`, `SemanticGroupAbr`, `CustomTreeNumber`, `SemanticTypeName`, `BranchPosition`, `Definition`, `Examples`, `RelationName`, `SemTypeTreeNo`, `UsageNote`, `Abbreviation`, `UniqueID`, `NonHumanFlag`, `RecordType`) VALUES\n", + "(1, 'Activities and Behaviors', 'ACTI', 2, 'Event', 1, 'A broad type for grouping activities, processes and states.', 'Anniversaries; Exposure to Mumps virus (event); Device Unattended', '{inverse_isa} Activity; {inverse_isa} Phenomenon or Process', 'B', 'Few concepts will be assigned to this broad type.', 'evnt', 1051, NULL, 'STY'),\n", + "(1, 'Activities and Behaviors', 'ACTI', 21, 'Activity', 2, 'An operation or series of operations that an organism or machine carries out or participates in.', 'Expeditions; Information Distribution; Social Planning', '{isa} Event; {inverse_isa} Behavior; {inverse_isa} Daily or Recreational Activity; {inverse_isa} Occupational Activity; {inverse_isa} Machine Activity', 'B1', 'Few concepts will be assigned to this broad type. Wherever possible, one of the more specific types from this hierarchy will be chosen. 
For concepts assigned to this type, the focus of interest is on the activity. When the focus of interest is the individual or group that is carrying out the activity, then a type from the \\'Behavior\\' hierarchy will be chosen. In general, concepts will not receive a type from both the \\'Activity\\' and the \\'Behavior\\' hierarchies.', 'acty', 1052, NULL, 'STY'),\n", + "(1, 'Activities and Behaviors', 'ACTI', 211, 'Behavior', 3, 'Any of the psycho-social activities of humans or animals that can be observed directly by others or can be made systematically observable by the use of special strategies.', 'Homing Behavior; Sexuality; Habitat Selection', '{isa} Activity; {inverse_isa} Social Behavior; {inverse_isa} Individual Behavior', 'B1.1', 'Few concepts will be assigned to this broad type. For concepts assigned to the \\'Behavior\\' hierarchy, the focus of interest is on the individual or group that is carrying out the activity. When the activity is of paramount interest, then a type from the \\'Activity\\' hierarchy will be chosen. 
In general, concepts will not receive a type from both the \\'Behavior\\' and the \\'Activity\\' hierarchies.', 'bhvr', 1053, 'Y', 'STY'),\n", + "(1, 'Activities and Behaviors', 'ACTI', 212, 'Daily or Recreational Activity', 3, 'An activity carried out for recreation or exercise, or as part of daily life.', 'Badminton; Dancing; Swimming', '{isa} Activity', 'B1.2', NULL, 'dora', 1056, NULL, 'STY'),\n", + "(1, 'Activities and Behaviors', 'ACTI', 213, 'Occupational Activity', 3, 'An activity carried out as part of an occupation or job.', 'Collective Bargaining; Commerce; Containment of Biohazards', '{isa} Activity; {inverse_isa} Health Care Activity; {inverse_isa} Research Activity; {inverse_isa} Governmental or Regulatory Activity; {inverse_isa} Educational Activity', 'B1.3', NULL, 'ocac', 1057, NULL, 'STY'),\n", + "(1, 'Activities and Behaviors', 'ACTI', 214, 'Machine Activity', 3, 'An activity carried out primarily or exclusively by machines.', 'Computer Simulation; Equipment Failure; Natural Language Processing', '{isa} Activity', 'B1.4', NULL, 'mcha', 1066, NULL, 'STY'),\n", + "(1, 'Activities and Behaviors', 'ACTI', 2111, 'Social Behavior', 4, 'Behavior that is a direct result or function of the interaction of humans or animals with their fellows. 
This includes behavior that may be considered anti-social.', 'Acculturation; Communication; Interpersonal Relations', '{isa} Behavior', 'B1.1.1', '\\'Social Behavior\\' requires the direct participation of others and is, thus, distinguished from \\'Individual Behavior\\' which is carried out by an individual, though others may be present.', 'socb', 1054, NULL, 'STY'),\n", + "(1, 'Activities and Behaviors', 'ACTI', 2112, 'Individual Behavior', 4, 'Behavior exhibited by a human or an animal that is not a direct result of interaction with other members of the species, but which may have an effect on others.', 'Assertiveness; Grooming; Risk-Taking', '{isa} Behavior', 'B1.1.2', '\\'Individual Behavior\\' is carried out by an individual, though others may be present, and is, thus, distinguished from \\'Social Behavior\\' which requires the direct participation of others.', 'inbe', 1055, NULL, 'STY'),\n", + "(1, 'Activities and Behaviors', 'ACTI', 2133, 'Governmental or Regulatory Activity', 4, 'An activity carried out by officially constituted governments, or an activity related to the creation or enforcement of the rules or regulations governing some field of endeavor.', 'Certification; Credentialing; Public Policy', '{isa} Occupational Activity', 'B1.3.3', NULL, 'gora', 1064, NULL, 'STY'),\n", + "(2, 'Anatomy ', 'ANAT', 112, 'Anatomical Structure', 3, 'A normal or pathological part of the anatomy or structural organization of an organism.', 'Cadaver; Pharyngostome; Anatomic structures', '{isa} Physical Object; {inverse_isa} Embryonic Structure; {inverse_isa} Fully Formed Anatomical Structure; {inverse_isa} Anatomical Abnormality', 'A1.2', 'Few concepts will be assigned to this broad type.', 'anst', 1017, 'Y', 'STY'),\n", + "(2, 'Anatomy ', 'ANAT', 1121, 'Embryonic Structure', 4, 'An anatomical structure that exists only before the organism is fully formed; in mammals, for example, a structure that exists only prior to the birth of the organism. 
This structure may be normal or abnormal.', 'Blastoderm; Fetus; Neural Crest', '{isa} Anatomical Structure', 'A1.2.1', NULL, 'emst', 1018, NULL, 'STY'),\n", + "(2, 'Anatomy ', 'ANAT', 1123, 'Fully Formed Anatomical Structure', 4, 'An anatomical structure in a fully formed organism; in mammals, for example, a structure in the body after the birth of the organism.', 'Entire body as a whole; Female human body; Set of parts of human body', '{isa} Anatomical Structure; {inverse_isa} Body Part, Organ, or Organ Component; {inverse_isa} Tissue; {inverse_isa} Cell; {inverse_isa} Cell Component; {inverse_isa} Gene or Genome', 'A1.2.3', 'Few concepts will be assigned to this broad type.', 'ffas', 1021, NULL, 'STY'),\n", + "(2, 'Anatomy ', 'ANAT', 1142, 'Body Substance', 4, 'Extracellular material, or mixtures of cells and extracellular material, produced, excreted, or accreted by the body. Included here are substances such as saliva, dental enamel, sweat, and gastric acid.', 'Amniotic Fluid; saliva; Smegma', '{isa} Substance', 'A1.4.2', NULL, 'bdsu', 1031, 'Y', 'STY'),\n", + "(2, 'Anatomy ', 'ANAT', 11231, 'Body Part, Organ, or Organ Component', 5, 'A collection of cells and tissues which are localized to a specific area or combine and carry out one or more specialized functions of an organism. This ranges from gross structures to small components of complex organs. These structures are relatively localized in comparison to tissues.', 'Aorta; Brain Stem; Structure of neck of femur', '{isa} Fully Formed Anatomical Structure', 'A1.2.3.1', 'When assigning this type, consider whether \\'Body Location or Region\\' might be the correct choice.', 'bpoc', 1023, NULL, 'STY'),\n", + "(2, 'Anatomy ', 'ANAT', 11232, 'Tissue', 5, 'An aggregation of similarly specialized cells and the associated intercellular substance. 
Tissues are relatively non-localized in comparison to body parts, organs or organ components.', 'Cartilage; Endothelium; Epidermis', '{isa} Fully Formed Anatomical Structure', 'A1.2.3.2', NULL, 'tisu', 1024, NULL, 'STY'),\n", + "(2, 'Anatomy ', 'ANAT', 11233, 'Cell', 5, 'The fundamental structural and functional unit of living organisms.', 'B-Lymphocytes; Dendritic Cells; Fibroblasts', '{isa} Fully Formed Anatomical Structure', 'A1.2.3.3', NULL, 'cell', 1025, NULL, 'STY'),\n", + "(2, 'Anatomy ', 'ANAT', 11234, 'Cell Component', 5, 'A part of a cell or the intercellular matrix, generally visible by light microscopy.', 'Axon; Golgi Apparatus; Organelles', '{isa} Fully Formed Anatomical Structure', 'A1.2.3.4', NULL, 'celc', 1026, NULL, 'STY'),\n", + "(2, 'Anatomy ', 'ANAT', 12141, 'Body System', 5, 'A complex of anatomical structures that performs a common function.', 'Endocrine system; Renin-angiotensin system; Reticuloendothelial System', '{isa} Functional Concept', 'A2.1.4.1', NULL, 'bdsy', 1022, NULL, 'STY'),\n", + "(2, 'Anatomy ', 'ANAT', 12151, 'Body Space or Junction', 5, 'An area enclosed or surrounded by body parts or organs or the place where two anatomical structures meet or connect.', 'Knee joint; Greater sac of peritoneum; Synapses', '{isa} Spatial Concept', 'A2.1.5.1', NULL, 'bsoj', 1030, 'Y', 'STY'),\n", + "(2, 'Anatomy ', 'ANAT', 12152, 'Body Location or Region', 5, 'An area, subdivision, or region of the body demarcated for the purpose of topographical description.', 'Forehead; Sublingual Region; Base of skull structure', '{isa} Spatial Concept', 'A2.1.5.2', 'When assigning this type, consider whether \\'Body Part, Organ, or Organ Component\\' might be the correct choice.', 'blor', 1029, 'Y', 'STY'),\n", + "(3, 'Chemicals and Drugs', 'CHEM', 1133, 'Clinical Drug', 4, 'A pharmaceutical preparation as produced by the manufacturer. 
The name usually includes the substance, its strength, and the form, but may include the substance and only one of the other two items.', 'Ranitidine 300 MG Oral Tablet [Zantac]; Aspirin 300 MG Delayed Release Oral Tablet; sleeping pill', '{isa} Manufactured Object', 'A1.3.3', 'Do not double type with Pharmacologic Substance, Antibiotic, or other chemical semantic types.', 'clnd', 1200, NULL, 'STY'),\n", + "(3, 'Chemicals and Drugs', 'CHEM', 1141, 'Chemical', 4, 'Compounds or substances of definite molecular composition. Chemicals are viewed from two distinct perspectives in the network, functionally and structurally. Almost every chemical concept is assigned at least two types, generally one from the structure hierarchy and at least one from the function hierarchy.', 'Acids; Chemicals; Ionic Liquids', '{isa} Substance; {inverse_isa} Chemical Viewed Structurally; {inverse_isa} Chemical Viewed Functionally', 'A1.4.1', 'Few concepts will be assigned to this broad type.', 'chem', 1103, NULL, 'STY'),\n", + "(3, 'Chemicals and Drugs', 'CHEM', 11411, 'Chemical Viewed Functionally', 5, 'A chemical viewed from the perspective of its functional characteristics or pharmacological activities.', 'Aerosol Propellants; Detergents; Stabilizing Agents', '{isa} Chemical; {inverse_isa} Pharmacologic Substance; {inverse_isa} Biomedical or Dental Material; {inverse_isa} Biologically Active Substance; {inverse_isa} Indicator, Reagent, or Diagnostic Aid; {inverse_isa} Hazardous or Poisonous Substance', 'A1.4.1.1', 'A specific chemical will not be assigned here. Groupings of chemicals viewed functionally, such as \\\"Aerosol Propellants\\\" may appropriately be assigned here. 
A name that is inherently functional, such as \\\"Food Additives\\\", will not also be assigned a type from the \\'Chemical Viewed Structurally\\' hierarchy.', 'chvf', 1120, NULL, 'STY'),\n", + "(3, 'Chemicals and Drugs', 'CHEM', 11412, 'Chemical Viewed Structurally', 5, 'A chemical or chemicals viewed from the perspective of their structural characteristics. Included here are concepts which can mean either a salt, an ion, or a compound (e.g., \\\"Bromates\\\" and \\\"Bromides\\\").', 'Ammonium Compounds; Cations; Sulfur Compounds', '{isa} Chemical; {inverse_isa} Organic Chemical; {inverse_isa} Element, Ion, or Isotope; {inverse_isa} Inorganic Chemical', 'A1.4.1.2', 'Concepts are assigned to this type if they can be both organic and inorganic, e.g. sulfur compounds. Do not use this type if the concept has an important functional aspect, e.g., \\\"Mylanta Double Strength Liquid\\\" contains Al(OH)3, Mg(OH)2, and simethicone, but would be assigned only to \\'Pharmacologic Substance\\'.', 'chvs', 1104, NULL, 'STY'),\n", + "(3, 'Chemicals and Drugs', 'CHEM', 114111, 'Pharmacologic Substance', 6, 'A substance used in the treatment or prevention of pathologic disorders. This includes substances that occur naturally in the body and are administered therapeutically.', 'Antiemetics; Cardiovascular Agents; Alka-Seltzer', '{isa} Chemical Viewed Functionally; {inverse_isa} Antibiotic', 'A1.4.1.1.1', 'If a substance is both endogenous and typically used as a drug, then this type and the type \\'Biologically Active Substance\\' or one of its children are assigned. Body substances that are used therapeutically such as whole blood preparation, NOS would only receive the type \\'Body Substance\\'. 
Substances used in the diagnosis or analysis of normal and abnormal body functions should be given the type \\'Indicator, Reagent, or Diagnostic Aid\\'.', 'phsu', 1121, NULL, 'STY'),\n", + "(3, 'Chemicals and Drugs', 'CHEM', 114112, 'Biomedical or Dental Material', 6, 'A substance used in biomedicine or dentistry predominantly for its physical, as opposed to chemical, properties. Included here are biocompatible materials, tissue adhesives, bone cements, resins, toothpastes, etc.', 'Acrylic Resins; Bone Cements; Dentifrices', '{isa} Chemical Viewed Functionally', 'A1.4.1.1.2', NULL, 'bodm', 1122, NULL, 'STY'),\n", + "(3, 'Chemicals and Drugs', 'CHEM', 114113, 'Biologically Active Substance', 6, 'A generally endogenous substance produced or required by an organism, of primary interest because of its role in the biologic functioning of the organism that produces it.', 'Cytokinins; Pheromone', '{isa} Chemical Viewed Functionally; {inverse_isa} Hormone; {inverse_isa} Enzyme; {inverse_isa} Vitamin; {inverse_isa} Immunologic Factor; {inverse_isa} Receptor', 'A1.4.1.1.3', 'If a substance is both endogenous and typically used as a drug, then this type and the type \\'Pharmacologic Substance\\' are assigned.', 'bacs', 1123, NULL, 'STY'),\n", + "(3, 'Chemicals and Drugs', 'CHEM', 114114, 'Indicator, Reagent, or Diagnostic Aid', 6, 'A substance primarily of interest for its use in laboratory or diagnostic tests and procedures to detect, measure, examine, or analyze other chemicals, processes, or conditions.', 'Fluorescent Dyes; Indicators and Reagents; India ink stain', '{isa} Chemical Viewed Functionally', 'A1.4.1.1.4', 'Radioactive imaging agents should be assigned to this type and not to the type \\'Pharmacologic Substance\\' unless they are also being used therapeutically.', 'irda', 1130, NULL, 'STY'),\n", + "(3, 'Chemicals and Drugs', 'CHEM', 114115, 'Hazardous or Poisonous Substance', 6, 'A substance of concern because of its potentially hazardous or toxic effects. 
This would include most drugs of abuse, as well as agents that require special handling because of their toxicity.', 'Carcinogens; Fumigant; Mutagens', '{isa} Chemical Viewed Functionally', 'A1.4.1.1.5', 'Most pharmaceutical agents, although potentially harmful, are excluded here and are assigned to the type \\'Pharmacologic Substance\\'. All pesticides are assigned to this type.', 'hops', 1131, NULL, 'STY'),\n", + "(3, 'Chemicals and Drugs', 'CHEM', 114121, 'Organic Chemical', 6, 'The general class of carbon-containing compounds, usually based on carbon chains or rings, and also containing hydrogen (hydrocarbons), with or without nitrogen, oxygen, or other elements in which the bonding between elements is generally covalent.', 'Benzene Derivatives', '{isa} Chemical Viewed Structurally; {inverse_isa} Nucleic Acid, Nucleoside, or Nucleotide; {inverse_isa} Amino Acid, Peptide, or Protein', 'A1.4.1.2.1', 'Salts of organic chemicals (such as Calcium Acetate) would be considered organic chemicals and should not also receive the type \\'Inorganic Chemical\\'.', 'orch', 1109, NULL, 'STY'),\n", + "(3, 'Chemicals and Drugs', 'CHEM', 114122, 'Inorganic Chemical', 6, 'Chemical elements and their compounds, excluding the hydrocarbons and their derivatives (except carbides, carbonates, cyanides, cyanates and carbon disulfide). Generally inorganic compounds contain ionic bonds. Included here are inorganic acids and salts, alloys, alkalies, and minerals.', 'Carbonic Acid; aluminum nitride; ferric citrate', '{isa} Chemical Viewed Structurally', 'A1.4.1.2.2', NULL, 'inch', 1197, NULL, 'STY'),\n", + "(3, 'Chemicals and Drugs', 'CHEM', 114123, 'Element, Ion, or Isotope', 6, 'One of the 109 presently known fundamental substances that comprise all matter at and above the atomic level. This includes elemental metals, rare gases, and most abundant naturally occurring radioactive elements, as well as the ionic counterparts of elements (NA+, Cl-), and the less abundant isotopic forms. 
This does not include organic ions such as iodoacetate to which the type \\'Organic Chemical\\' is assigned.', 'Carbon; Chromium Isotopes; Radioisotopes', '{isa} Chemical Viewed Structurally', 'A1.4.1.2.3', 'Group terms such as sulfates would be assigned to the type \\'Chemical Viewed Structurally\\'. Substances such as aluminum chloride would be assigned the type \\'Inorganic Chemical\\'. Technetium Tc 99m Aggregated Albumin would not receive this type.', 'elii', 1196, NULL, 'STY'),\n", + "(3, 'Chemicals and Drugs', 'CHEM', 1141111, 'Antibiotic', 7, 'A pharmacologically active compound produced by growing microorganisms which kill or inhibit growth of other microorganisms.', 'Antibiotics; bactericide; Thienamycins', '{isa} Pharmacologic Substance', 'A1.4.1.1.1.1', NULL, 'antb', 1195, NULL, 'STY'),\n", + "(3, 'Chemicals and Drugs', 'CHEM', 1141132, 'Hormone', 7, 'In animals, a chemical usually secreted by an endocrine gland whose products are released into the circulating fluid. Hormones act as chemical messengers and regulate various physiologic processes such as growth, reproduction, metabolism, etc. They usually fall into two broad classes, steroid hormones and peptide hormones.', 'Enteric Hormones; thymic humoral factor; Prohormone', '{isa} Biologically Active Substance', 'A1.4.1.1.3.2', 'Synthetic hormones that are used as drugs should receive this type and \\'Pharmacologic Substance\\'. Plant hormones are assigned only to the type \\'Pharmacologic Substance\\'.', 'horm', 1125, NULL, 'STY'),\n", + "(3, 'Chemicals and Drugs', 'CHEM', 1141133, 'Enzyme', 7, 'A complex chemical, usually a protein, that is produced by living cells and which catalyzes specific biochemical reactions. 
There are six main types of enzymes: oxidoreductases, transferases, hydrolases, lyases, isomerases, and ligases.', 'GTP Cyclohydrolase II; enzyme substrate complex; arginine amidase', '{isa} Biologically Active Substance', 'A1.4.1.1.3.3', 'Generally when a concept is assigned to this type, it will also be assigned to the type \\'Amino Acid, Peptide, or Protein\\'.', 'enzy', 1126, NULL, 'STY'),\n", + "(3, 'Chemicals and Drugs', 'CHEM', 1141134, 'Vitamin', 7, 'A substance, usually an organic chemical complex, present in natural products or made synthetically, which is essential in the diet of man or other higher animals. Included here are vitamin precursors, provitamins, and vitamin supplements.', '5,25-Dihydroxy cholecalciferol; alpha-tocopheryl oxalate; Vitamin A [EPC]', '{isa} Biologically Active Substance', 'A1.4.1.1.3.4', 'Essential amino acids are not assigned to this type. They will be assigned to the type \\'Amino Acid, Peptide, or Protein\\'. This can be used with \\'Pharmacologic Substance\\' if the compound is being administered therapeutically or if the source has it classified as therapeutic (i.e., N\\'ICE Sugarless Vitamin C Drops).', 'vita', 1127, NULL, 'STY'),\n", + "(3, 'Chemicals and Drugs', 'CHEM', 1141135, 'Immunologic Factor', 7, 'A biologically active substance whose activities affect or play a role in the functioning of the immune system.', 'Antigens; Immunologic Factors; Blood group antigen P', '{isa} Biologically Active Substance', 'A1.4.1.1.3.5', 'Antigens and antibodies are assigned to this type. Unlike most biologically active substances, some immunologic factors may be exogenous. Vaccines should be given this type and the type \\'Pharmacologic Substance\\'.', 'imft', 1129, NULL, 'STY'),\n", + "(3, 'Chemicals and Drugs', 'CHEM', 1141136, 'Receptor', 7, 'A specific structure or site on the cell surface or within its cytoplasm that recognizes and binds with other specific molecules. 
These include the proteins on the surface of an immunocompetent cell that binds with antigens, or proteins found on the surface molecules that bind with hormones or neurotransmitters and react with other molecules that respond in a specific way.', 'Binding Sites; Lymphocyte antigen CD4 receptor; integrin alpha11beta1', '{isa} Biologically Active Substance', 'A1.4.1.1.3.6', NULL, 'rcpt', 1192, NULL, 'STY'),\n", + "(3, 'Chemicals and Drugs', 'CHEM', 1141215, 'Nucleic Acid, Nucleoside, or Nucleotide', 7, 'A complex compound of high molecular weight occurring in living cells. These are basically of two types, ribonucleic (RNA) and deoxyribonucleic (DNA) acids. Nucleic acids are made of nucleotides (nitrogen-containing base, a 5-carbon sugar, and one or more phosphate group) linked together by a phosphodiester bond between the 5\\' and 3\\' carbon atoms. Nucleosides are compounds composed of a purine or pyrimidine base (usually adenine, cytosine, guanine, thymine, uracil) linked to either a ribose or a deoxyribose sugar.', 'Cytosine Nucleotides; Guanine; Oligonucleotides', '{isa} Organic Chemical', 'A1.4.1.2.1.5', 'Naturally occurring nucleic acids, nucleosides, or nucleotides will also be assigned a type from the \\'Biologically Active Substance\\' hierarchy.', 'nnon', 1114, NULL, 'STY'),\n", + "(3, 'Chemicals and Drugs', 'CHEM', 1141217, 'Amino Acid, Peptide, or Protein', 7, 'Amino acids and chains of amino acids connected by peptide linkages.', 'Amino Acids, Cyclic; Glycopeptides; Keratin', '{isa} Organic Chemical', 'A1.4.1.2.1.7', 'When the concept is both an enzyme and a protein, this type and the type \\'Enzyme\\' will be assigned.', 'aapp', 1116, NULL, 'STY'),\n", + "(4, 'Concepts and Ideas', 'CONC', 12, 'Conceptual Entity', 2, 'A broad type for grouping abstract entities or concepts.', 'Geographic Factors; Fractals; Secularism', '{isa} Entity; {inverse_isa} Organism Attribute; {inverse_isa} Finding; {inverse_isa} Idea or Concept; {inverse_isa} Occupation or 
Discipline; {inverse_isa} Organization; {inverse_isa} Group; {inverse_isa} Group Attribute; {inverse_isa} Intellectual Product; {inverse_isa} Language', 'A2', 'Few concepts will be assigned to this broad type.', 'cnce', 1077, NULL, 'STY'),\n", + "(4, 'Concepts and Ideas', 'CONC', 121, 'Idea or Concept', 3, 'An abstract concept, such as a social, religious or philosophical concept.', 'Capitalism; Civil Rights; Ethics', '{isa} Conceptual Entity; {inverse_isa} Temporal Concept; {inverse_isa} Qualitative Concept; {inverse_isa} Quantitative Concept; {inverse_isa} Spatial Concept; {inverse_isa} Functional Concept', 'A2.1', NULL, 'idcn', 1078, NULL, 'STY'),\n", + "(4, 'Concepts and Ideas', 'CONC', 124, 'Intellectual Product', 3, 'A conceptual entity resulting from human endeavor. Concepts assigned to this type generally refer to information created by humans for some purpose.', 'Decision Support Techniques; Information Systems; Literature', '{isa} Conceptual Entity; {inverse_isa} Regulation or Law; {inverse_isa} Classification', 'A2.4', 'Concepts referring to theorems, models, and systems are assigned here. In some cases, a concept may be assigned to both \\'Intellectual Product\\' and \\'Research Activity\\'. 
For example, the concept \\\"Comparative Study\\\" might be viewed as both an activity and the result, or product, of that activity.', 'inpr', 1170, NULL, 'STY'),\n", + "(4, 'Concepts and Ideas', 'CONC', 125, 'Language', 3, 'The system of communication used by a particular nation or people.', 'Armenian language; braille; Bilingualism', '{isa} Conceptual Entity', 'A2.5', NULL, 'lang', 1171, NULL, 'STY'),\n", + "(4, 'Concepts and Ideas', 'CONC', 128, 'Group Attribute', 3, 'A conceptual entity which refers to the frequency or distribution of certain characteristics or phenomena in certain groups.', 'Family Size; Group Structure; Life Expectancy', '{isa} Conceptual Entity', 'A2.8', NULL, 'grpa', 1102, NULL, 'STY'),\n", + "(4, 'Concepts and Ideas', 'CONC', 1211, 'Temporal Concept', 4, 'A concept which pertains to time or duration.', 'Birth Intervals; Half-Life; Postoperative Period', '{isa} Idea or Concept', 'A2.1.1', 'If the concept refers to a phase, stage, cycle, interval, period, or rhythm, it is assigned to this type.', 'tmco', 1079, NULL, 'STY'),\n", + "(4, 'Concepts and Ideas', 'CONC', 1212, 'Qualitative Concept', 4, 'A concept which is an assessment of some quality, rather than a direct measurement.', 'Clinical Competence; Consumer Satisfaction; Health Status', '{isa} Idea or Concept', 'A2.1.2', NULL, 'qlco', 1080, NULL, 'STY'),\n", + "(4, 'Concepts and Ideas', 'CONC', 1213, 'Quantitative Concept', 4, 'A concept which involves the dimensions, quantity or capacity of something using some unit of measure, or which involves the quantitative comparison of entities.', 'Age Distribution; Metric System; Selection Bias', '{isa} Idea or Concept', 'A2.1.3', 'If the concept refers to rate or distribution, the type \\'Temporal Concept\\' is not also assigned.', 'qnco', 1081, NULL, 'STY'),\n", + "(4, 'Concepts and Ideas', 'CONC', 1214, 'Functional Concept', 4, 'A concept which is of interest because it pertains to the carrying out of a process or activity.', 'Interviewer 
Effect; Problem Formulation; Endogenous', '{isa} Idea or Concept; {inverse_isa} Body System', 'A2.1.4', NULL, 'ftcn', 1169, NULL, 'STY'),\n", + "(4, 'Concepts and Ideas', 'CONC', 1215, 'Spatial Concept', 4, 'A location, region, or space, generally having definite boundaries.', 'Mandibular Rest Position; Lateral; Extrinsic', '{isa} Idea or Concept; {inverse_isa} Body Location or Region; {inverse_isa} Body Space or Junction; {inverse_isa} Geographic Area; {inverse_isa} Molecular Sequence', 'A2.1.5', NULL, 'spco', 1082, NULL, 'STY'),\n", + "(4, 'Concepts and Ideas', 'CONC', 1241, 'Classification', 4, 'A term or system of terms denoting an arrangement by class or category.', 'Anatomy (MeSH Category); Tumor Stage Classification; axis i', '{isa} Intellectual Product', 'A2.4.1', NULL, 'clas', 1185, NULL, 'STY'),\n", + "(4, 'Concepts and Ideas', 'CONC', 1242, 'Regulation or Law', 4, 'An intellectual product resulting from legislative or regulatory activity.', 'Building Codes; Criminal Law; Health Planning Guidelines', '{isa} Intellectual Product', 'A2.4.2', NULL, 'rnlw', 1089, NULL, 'STY'),\n", + "(5, 'Devices', 'DEVI', 1131, 'Medical Device', 4, 'A manufactured object used primarily in the diagnosis, treatment, or prevention of physiologic or anatomic disorders.', 'Bone Screws; Headgear, Orthodontic; Compression Stockings', '{isa} Manufactured Object; {inverse_isa} Drug Delivery Device', 'A1.3.1', 'A medical device may be used for research purposes, but since its primary use is for routine medical care, it is distinguished from a \\'Research Device\\' which is used primarily for research purposes.', 'medd', 1074, NULL, 'STY'),\n", + "(5, 'Devices', 'DEVI', 1132, 'Research Device', 4, 'A manufactured object used primarily in carrying out scientific research or experimentation.', 'Electrodes, Enzyme; DNA Microarray Chip; Particle Count and Size Analyzer', '{isa} Manufactured Object', 'A1.3.2', 'A research device is distinguished from a \\'Medical Device\\', which though it 
may be used for research purposes is used primarily for routine medical care.', 'resd', 1075, NULL, 'STY'),\n", + "(5, 'Devices', 'DEVI', 11311, 'Drug Delivery Device', 5, 'A medical device that contains a clinical drug or drugs.', 'Nordette 21 Day Pack; {7 (Terazosin 1 MG Oral Tablet) / 7 (Terazosin 2 MG Oral Tablet) } Pack; {10 (cefdinir 300 MG Oral Capsule [Omnicef]) } Pack [Omni-Pac]', '{isa} Medical Device', 'A1.3.1.1', NULL, 'drdd', 1203, NULL, 'STY'),\n", + "(6, 'Disorders', 'DISO', 122, 'Finding', 3, 'That which is discovered by direct observation or measurement of an organism attribute or condition, including the clinical history of the patient. The history of the presence of a disease is a \\'Finding\\' and is distinguished from the disease itself.', 'Birth History; Downward displacement of diaphragm; Decreased glucose level', '{isa} Conceptual Entity; {inverse_isa} Laboratory or Test Result; {inverse_isa} Sign or Symptom', 'A2.2', 'Only in rare circumstances will findings be double-typed with either \\'Pathologic Function\\' or \\'Anatomical Abnormality\\'. Most findings will be assigned the types \\'Laboratory or Test Result\\' or \\'Sign or Symptom\\'. 
Only those findings that relate to patient history or to the determination of a state will be assigned the type \\'Finding\\'.', 'fndg', 1033, NULL, 'STY'),\n", + "(6, 'Disorders', 'DISO', 223, 'Injury or Poisoning', 3, 'A traumatic wound, injury, or poisoning caused by an external agent or force.', 'Accidental Falls; Carbon Monoxide Poisoning; Snake Bites', '{isa} Phenomenon or Process', 'B2.3', 'An `Injury or Poisoning\\' is distinguished from a \\'Disease or Syndrome\\' that may be a result of prolonged exposure to toxic materials.', 'inpo', 1037, NULL, 'STY'),\n", + "(6, 'Disorders', 'DISO', 1122, 'Anatomical Abnormality', 4, 'An abnormal structure, or one that is abnormal in size or location.', 'Bronchial Fistula; Foot Deformities; Hyperostosis of skull', '{isa} Anatomical Structure; {inverse_isa} Congenital Abnormality; {inverse_isa} Acquired Abnormality', 'A1.2.2', 'Use this type if the abnormality in question can be either an acquired or congenital abnormality. Neoplasms are not included here. These are given the type \\'Neoplastic Process\\'. 
If an anatomical abnormality has a pathologic manifestation, then it will additionally be given the type \\'Disease or Syndrome\\', e.g., \\\"Diabetic Cataract\\\" will be double-typed for this reason.', 'anab', 1190, NULL, 'STY'),\n", + "(6, 'Disorders', 'DISO', 1222, 'Sign or Symptom', 4, 'An observable manifestation of a disease or condition based on clinical judgment, or a manifestation of a disease or condition which is experienced by the patient and reported as a subjective observation.', 'Dyspnea; Nausea; Pain', '{isa} Finding', 'A2.2.2', NULL, 'sosy', 1184, NULL, 'STY'),\n", + "(6, 'Disorders', 'DISO', 11221, 'Congenital Abnormality', 5, 'An abnormal structure, or one that is abnormal in size or location, present at birth or evolving over time as a result of a defect in embryogenesis.', 'Albinism; Cleft palate with cleft lip; Polydactyly of toes', '{isa} Anatomical Abnormality', 'A1.2.2.1', 'If the congenital abnormality involves multiple defects then the type \\'Disease or Syndrome\\' will also be assigned.', 'cgab', 1019, NULL, 'STY'),\n", + "(6, 'Disorders', 'DISO', 11222, 'Acquired Abnormality', 5, 'An abnormal structure, or one that is abnormal in size or location, found in or deriving from a previously normal structure. Acquired abnormalities are distinguished from diseases even though they may result in pathological functioning (e.g., \\\"hernias incarcerate\\\").', 'Hemorrhoids; Hernia, Femoral; Cauliflower ear', '{isa} Anatomical Abnormality', 'A1.2.2.2', NULL, 'acab', 1020, NULL, 'STY'),\n", + "(6, 'Disorders', 'DISO', 22212, 'Pathologic Function', 5, 'A disordered process, activity, or state of the organism as a whole, of a body system or systems, or of multiple organs or tissues. Included here are normal responses to a negative stimulus as well as patholologic conditions or states that are less specific than a disease. 
Pathologic functions frequently have systemic effects.', 'Inflammation; Shock; Thrombosis', '{isa} Biologic Function; {inverse_isa} Disease or Syndrome; {inverse_isa} Cell or Molecular Dysfunction; {inverse_isa} Experimental Model of Disease', 'B2.2.1.2', 'If the process is specific, for example to a site or substance, then \\'Disease or Syndrome\\' will be assigned and not \\'Pathologic Function\\'. For example, \\\"cerebral anoxia\\\", \\\"brain edema\\\", and \\\"milk hypersensitivity\\\" will all be assigned to \\'Disease or Syndrome\\' only.', 'patf', 1046, NULL, 'STY'),\n", + "(6, 'Disorders', 'DISO', 222121, 'Disease or Syndrome', 6, 'A condition which alters or interferes with a normal process, state, or activity of an organism. It is usually characterized by the abnormal functioning of one or more of the host\\'s systems, parts, or organs. Included here is a complex of symptoms descriptive of a disorder.', 'Diabetes Mellitus; Drug Allergy; Malabsorption Syndrome', '{isa} Pathologic Function; {inverse_isa} Mental or Behavioral Dysfunction; {inverse_isa} Neoplastic Process', 'B2.2.1.2.1', 'Any specific disease or syndrome that is modified by such modifiers as \\\"acute\\\", \\\"prolonged\\\", etc. will also be assigned to this type. 
If an anatomic abnormality has a pathologic manifestation, then it will be given this type as well as a type from the \\'Anatomical Abnormality\\' hierarchy, e.g., \\\"Diabetic Cataract\\\" will be double-typed for this reason.', 'dsyn', 1047, NULL, 'STY'),\n", + "(6, 'Disorders', 'DISO', 222122, 'Cell or Molecular Dysfunction', 6, 'A pathologic function inherent to cells, parts of cells, or molecules.', 'DNA Damage; Wallerian Degeneration; Atypical squamous metaplasia', '{isa} Pathologic Function', 'B2.2.1.2.2', 'This is not intended to be a repository for diseases whose molecular basis has been established.', 'comd', 1049, NULL, 'STY'),\n", + "(6, 'Disorders', 'DISO', 222123, 'Experimental Model of Disease', 6, 'A representation in a non-human organism of a human disease for the purpose of research into its mechanism or treatment.', 'Alloxan Diabetes; Liver Cirrhosis, Experimental; Transient Gene Knock-Out Model', '{isa} Pathologic Function', 'B2.2.1.2.3', NULL, 'emod', 1050, NULL, 'STY'),\n", + "(6, 'Disorders', 'DISO', 2221211, 'Mental or Behavioral Dysfunction', 7, 'A clinically significant dysfunction whose major manifestation is behavioral or psychological. These dysfunctions may have identified or presumed biological etiologies or manifestations.', 'Agoraphobia; Cyclothymic Disorder; Frigidity', '{isa} Disease or Syndrome', 'B2.2.1.2.1.1', NULL, 'mobd', 1048, NULL, 'STY'),\n", + "(6, 'Disorders', 'DISO', 2221212, 'Neoplastic Process', 7, 'A new and abnormal growth of tissue in which the growth is uncontrolled and progressive. The growths may be malignant or benign.', 'Abdominal Neoplasms; Bowen\\'s Disease; Polyp in nasopharynx', '{isa} Disease or Syndrome', 'B2.2.1.2.1.2', 'All neoplasms are assigned to this type. 
Do not also assign a type from the \\'Anatomical Abnormality\\' hierarchy.', 'neop', 1191, NULL, 'STY'),\n", + "(7, 'Genes and Molecular Sequences', 'GENE', 11235, 'Gene or Genome', 5, 'A specific sequence, or in the case of the genome the complete sequence, of nucleotides along a molecule of DNA or RNA (in the case of some viruses) which represent the functional units of heredity.', 'Alleles; Genome, Human; rRNA Operon', '{isa} Fully Formed Anatomical Structure', 'A1.2.3.5', NULL, 'gngm', 1028, NULL, 'STY'),\n", + "(7, 'Genes and Molecular Sequences', 'GENE', 12153, 'Molecular Sequence', 5, 'A broad type for grouping the collected sequences of amino acids, carbohydrates, and nucleotide sequences. Descriptions of these sequences are generally reported in the published literature and/or are deposited in and maintained by databanks such as GenBank, European Molecular Biology Laboratory (EMBL), National Biomedical Research Foundation (NBRF), or other sequence repositories.', 'Genetic Code; Homologous Sequences; Molecular Sequence', '{isa} Spatial Concept; {inverse_isa} Nucleotide Sequence; {inverse_isa} Amino Acid Sequence; {inverse_isa} Carbohydrate Sequence', 'A2.1.5.3', NULL, 'mosq', 1085, NULL, 'STY'),\n", + "(7, 'Genes and Molecular Sequences', 'GENE', 121531, 'Nucleotide Sequence', 6, 'The sequence of purines and pyrimidines in nucleic acids and polynucleotides. Included here are nucleotide-rich regions, conserved sequence, and DNA transforming region.', 'Base Sequence; Direct Repeat; RNA Sequence', '{isa} Molecular Sequence', 'A2.1.5.3.1', NULL, 'nusq', 1086, NULL, 'STY'),\n", + "(7, 'Genes and Molecular Sequences', 'GENE', 121532, 'Amino Acid Sequence', 6, 'The sequence of amino acids as arrayed in chains, sheets, etc., within the protein molecule. 
It is of fundamental importance in determining protein structure.', 'Signal Peptides; Homologous Sequences, Amino Acid; Abnormal amino acid sequence', '{isa} Molecular Sequence', 'A2.1.5.3.2', NULL, 'amas', 1087, NULL, 'STY'),\n", + "(7, 'Genes and Molecular Sequences', 'GENE', 121533, 'Carbohydrate Sequence', 6, 'The sequence of carbohydrates within polysaccharides, glycoproteins, and glycolipids.', 'Carbohydrate Sequence; Abnormal carbohydrate sequence', '{isa} Molecular Sequence', 'A2.1.5.3.3', NULL, 'crbs', 1088, NULL, 'STY'),\n", + "(8, 'Geographic Areas', 'GEOG', 12154, 'Geographic Area', 5, 'A geographic location, generally having definite boundaries.', 'Baltimore; Canada; Far East', '{isa} Spatial Concept', 'A2.1.5.4', NULL, 'geoa', 1083, NULL, 'STY'),\n", + "(9, 'Living Beings', 'LIVE', 111, 'Organism', 3, 'Generally, a living individual, including all plants and animals.', 'Organism; Infectious agent; Heterotroph', '{isa} Physical Object; {inverse_isa} Virus; {inverse_isa} Bacterium; {inverse_isa} Archaeon; {inverse_isa} Eukaryote', 'A1.1', NULL, 'orgm', 1001, NULL, 'STY'),\n", + "(9, 'Living Beings', 'LIVE', 129, 'Group', 3, 'A conceptual entity referring to the classification of individuals according to certain shared characteristics.', 'Focus Groups; jury; teams', '{isa} Conceptual Entity; {inverse_isa} Professional or Occupational Group; {inverse_isa} Population Group; {inverse_isa} Family Group; {inverse_isa} Age Group; {inverse_isa} Patient or Disabled Group', 'A2.9', 'Few concepts will be assigned to this broad type.', 'grup', 1096, NULL, 'STY'),\n", + "(9, 'Living Beings', 'LIVE', 1111, 'Archaeon', 4, 'A member of one of the three domains of life, formerly called Archaebacteria under the taxon Bacteria, but now considered separate and distinct. 
Archaea are characterized by: 1) the presence of characteristic tRNAs and ribosomal RNAs; 2) the absence of peptidoglycan cell walls; 3) the presence of ether-linked lipids built from branched-chain subunits; and 4) their occurrence in unusual habitats. While archaea resemble bacteria in morphology and genomic organization, they resemble eukarya in their method of genomic replication.', 'Thermoproteales; Haloferax volcanii; Methanospirillum', '{isa} Organism', 'A1.1.1', NULL, 'arch', 1194, NULL, 'STY'),\n", + "(9, 'Living Beings', 'LIVE', 1112, 'Bacterium', 4, 'A small, typically one-celled, prokaryotic micro-organism.', 'Acetobacter; Bacillus cereus; Cytophaga', '{isa} Organism', 'A1.1.2', NULL, 'bact', 1007, NULL, 'STY'),\n", + "(9, 'Living Beings', 'LIVE', 1113, 'Eukaryote', 4, 'One of the three domains of life (the others being Bacteria and Archaea), also called Eukarya. These are organisms whose cells are enclosed in membranes and possess a nucleus. They comprise almost all multicellular and many unicellular organisms, and are traditionally divided into groups (sometimes called kingdoms) including Animals, Plants, Fungi, various Algae, and other taxa that were previously part of the old kingdom Protista.', 'Order Acarina; Bees; Plasmodium malariae', '{isa} Organism; {inverse_isa} Plant; {inverse_isa} Fungus; {inverse_isa} Animal', 'A1.1.3', NULL, 'euka', 1204, NULL, 'STY'),\n", + "(9, 'Living Beings', 'LIVE', 1114, 'Virus', 4, 'An organism consisting of a core of a single nucleic acid enclosed in a protective coat of protein. A virus may replicate only inside a host living cell. 
A virus exhibits some but not all of the usual characteristics of living things.', 'Coliphages; Echovirus; Parvoviridae', '{isa} Organism', 'A1.1.4', NULL, 'virs', 1005, NULL, 'STY'),\n", + "(9, 'Living Beings', 'LIVE', 1291, 'Professional or Occupational Group', 4, 'An individual or individuals classified according to their vocation.', 'Clergy; Demographers; Hospital Volunteers', '{isa} Group', 'A2.9.1', 'If the concept refers to the discipline or vocation itself, rather than to the individuals who have the vocation, then the type \\'Occupation or Discipline\\' will be assigned instead.', 'prog', 1097, NULL, 'STY'),\n", + "(9, 'Living Beings', 'LIVE', 1292, 'Population Group', 4, 'An indivdual or individuals classified according to their sex, racial origin, religion, common place of living, financial or social status, or some other cultural or behavioral attribute.', 'Asian Americans; Ethnic group; Adult Offenders', '{isa} Group', 'A2.9.2', NULL, 'popg', 1098, NULL, 'STY'),\n", + "(9, 'Living Beings', 'LIVE', 1293, 'Family Group', 4, 'An individual or individuals classified according to their family relationships or relative position in the family unit.', 'Daughter; Is an only child; Unmarried Fathers', '{isa} Group', 'A2.9.3', NULL, 'famg', 1099, NULL, 'STY'),\n", + "(9, 'Living Beings', 'LIVE', 1294, 'Age Group', 4, 'An individual or individuals classified according to their age.', 'Adult; Infant, Premature; Adolescent (age group)', '{isa} Group', 'A2.9.4', NULL, 'aggp', 1100, NULL, 'STY'),\n", + "(9, 'Living Beings', 'LIVE', 1295, 'Patient or Disabled Group', 4, 'An individual or individuals classified according to a disability, disease, condition or treatment.', 'Amputees; Institutionalized Child; Mentally Ill Persons', '{isa} Group', 'A2.9.5', NULL, 'podg', 1101, NULL, 'STY'),\n", + "(9, 'Living Beings', 'LIVE', 11131, 'Animal', 5, 'An organism with eukaryotic cells, and lacking stiff cell walls, plastids and photosynthetic pigments.', 'Animals; Animals, 
Laboratory; Carnivore', '{isa} Eukaryote; {inverse_isa} Vertebrate', 'A1.1.3.1', NULL, 'anim', 1008, NULL, 'STY'),\n", + "(9, 'Living Beings', 'LIVE', 11132, 'Fungus', 5, 'A eukaryotic organism characterized by the absence of chlorophyll and the presence of a rigid cell wall. Included here are both slime molds and true fungi such as yeasts, molds, mildews, and mushrooms.', 'Aspergillus clavatus; Blastomyces; Neurospora', '{isa} Eukaryote', 'A1.1.3.2', NULL, 'fngs', 1004, NULL, 'STY'),\n", + "(9, 'Living Beings', 'LIVE', 11133, 'Plant', 5, 'An organism having cellulose cell walls, growing by synthesis of inorganic substances, generally distinguished by the presence of chlorophyll, and lacking the power of locomotion. Plant parts are included here as well.', 'Aloe; Pollen; Helianthus species', '{isa} Eukaryote', 'A1.1.3.3', NULL, 'plnt', 1002, NULL, 'STY'),\n", + "(9, 'Living Beings', 'LIVE', 111311, 'Vertebrate', 6, 'An animal which has a spinal column.', 'Vertebrates; Gnathostomata vertebrate; Craniata ', '{isa} Animal; {inverse_isa} Amphibian; {inverse_isa} Bird; {inverse_isa} Fish; {inverse_isa} Reptile; {inverse_isa} Mammal', 'A1.1.3.1.1', 'Few concepts will be assigned to this broad type.', 'vtbt', 1010, NULL, 'STY'),\n", + "(9, 'Living Beings', 'LIVE', 1113111, 'Amphibian', 7, 'A cold-blooded, smooth-skinned vertebrate which characteristically hatches as an aquatic larva, breathing by gills. When mature, the amphibian breathes with lungs.', 'Salamandra; Urodela; Brazilian horned frog', '{isa} Vertebrate', 'A1.1.3.1.1.1', NULL, 'amph', 1011, NULL, 'STY'),\n", + "(9, 'Living Beings', 'LIVE', 1113112, 'Bird', 7, 'A vertebrate having a constant body temperature and characterized by the presence of feathers.', 'Serinus; Ducks; Quail', '{isa} Vertebrate', 'A1.1.3.1.1.2', NULL, 'bird', 1012, NULL, 'STY'),\n", + "(9, 'Living Beings', 'LIVE', 1113113, 'Fish', 7, 'A cold-blooded aquatic vertebrate characterized by fins and breathing by gills. 
Included here are fishes having either a bony skeleton, such as a perch, or a cartilaginous skeleton, such as a shark, or those lacking a jaw, such as a lamprey or hagfish.', 'Bass; Salmonidae; Whitefish', '{isa} Vertebrate', 'A1.1.3.1.1.3', NULL, 'fish', 1013, NULL, 'STY'),\n", + "(9, 'Living Beings', 'LIVE', 1113114, 'Mammal', 7, 'A vertebrate having a constant body temperature and characterized by the presence of hair, mammary glands and sweat glands.', 'Ursidae Family; Hamsters; Macaca', '{isa} Vertebrate; {inverse_isa} Human', 'A1.1.3.1.1.4', NULL, 'mamm', 1015, NULL, 'STY'),\n", + "(9, 'Living Beings', 'LIVE', 1113115, 'Reptile', 7, 'A cold-blooded vertebrate having an external covering of scales or horny plates. Reptiles breathe by means of lungs and are generally egg-laying.', 'Alligators; Water Mocassin; Genus Python (organism)', '{isa} Vertebrate', 'A1.1.3.1.1.5', NULL, 'rept', 1014, NULL, 'STY'),\n", + "(9, 'Living Beings', 'LIVE', 11131141, 'Human', 8, 'Modern man, the only remaining species of the Homo genus.', 'Homo sapiens; jean piaget; Member of public', '{isa} Mammal', 'A1.1.3.1.1.4.1', 'If a concept describes a human being from the point of view of occupational, family, social status, etc., then a type from the \\'Group\\' hierarchy will be assigned instead.', 'humn', 1016, NULL, 'STY'),\n", + "(10, 'Objects', 'OBJC', 1, 'Entity', 1, 'A broad type for grouping physical and conceptual entities.', 'Gifts, Financial; Image; Product Part', '{inverse_isa} Physical Object; {inverse_isa} Conceptual Entity', 'A', 'Few concepts will be assigned to this broad type.', 'enty', 1071, NULL, 'STY'),\n", + "(10, 'Objects', 'OBJC', 11, 'Physical Object', 2, 'An object perceptible to the sense of vision or touch.', 'Printed Media; Meteors; Physical object', '{isa} Entity; {inverse_isa} Organism; {inverse_isa} Anatomical Structure; {inverse_isa} Manufactured Object; {inverse_isa} Substance', 'A1', NULL, 'phob', 1072, NULL, 'STY'),\n", + "(10, 'Objects', 'OBJC', 113, 
'Manufactured Object', 3, 'A physical object made by human beings.', 'car seat; Cooking and Eating Utensils; Goggles', '{isa} Physical Object; {inverse_isa} Medical Device; {inverse_isa} Research Device; {inverse_isa} Clinical Drug', 'A1.3', NULL, 'mnob', 1073, NULL, 'STY'),\n", + "(10, 'Objects', 'OBJC', 114, 'Substance', 3, 'A material with definite or fairly definite chemical composition.', 'Air (substance); Fossils; Plastics', '{isa} Physical Object; {inverse_isa} Body Substance; {inverse_isa} Chemical; {inverse_isa} Food', 'A1.4', NULL, 'sbst', 1167, NULL, 'STY'),\n", + "(10, 'Objects', 'OBJC', 1143, 'Food', 4, 'Any substance generally containing nutrients, such as carbohydrates, proteins, and fats, that can be ingested by a living organism and metabolized into energy and body tissue. Some foods are naturally occurring, others are either partially or entirely made by humans.', 'Beverages; Egg Yolk (Dietary); Ice Cream', '{isa} Substance', 'A1.4.3', 'Food additives, food preservatives, and food dyes should be given the type \\'Chemical Viewed Functionally\\'; \\\"Diet Coke\\\" would be assigned this type.', 'food', 1168, NULL, 'STY'),\n", + "(11, 'Occupations', 'OCCU', 126, 'Occupation or Discipline', 3, 'A vocation, academic discipline, or field of study, or a subpart of an occupation or discipline.', 'Aviation; Craniology; Ecology', '{isa} Conceptual Entity; {inverse_isa} Biomedical Occupation or Discipline', 'A2.6', 'If the concept refers to the individuals who have the vocation, the type \\'Professional or Occupational Group\\' will be assigned instead.', 'ocdi', 1090, NULL, 'STY'),\n", + "(11, 'Occupations', 'OCCU', 1261, 'Biomedical Occupation or Discipline', 4, 'A vocation, academic discipline, or field of study related to biomedicine.', 'Adolescent Medicine; Cellular Neurobiology; Dentistry', '{isa} Occupation or Discipline', 'A2.6.1', NULL, 'bmod', 1091, NULL, 'STY'),\n", + "(12, 'Organizations', 'ORGA', 127, 'Organization', 3, 'The result of uniting 
for a common purpose or function. The continued existence of an organization is not dependent on any of its members, its location, or particular facility. Components or subparts of organizations are also included here. Although the names of organizations are sometimes used to refer to the buildings in which they reside, they are not inherently physical in nature.', 'Labor Unions; United Nations; Boarding school', '{isa} Conceptual Entity; {inverse_isa} Health Care Related Organization; {inverse_isa} Professional Society; {inverse_isa} Self-help or Relief Organization', 'A2.7', NULL, 'orgt', 1092, NULL, 'STY'),\n", + "(12, 'Organizations', 'ORGA', 1271, 'Health Care Related Organization', 4, 'An established organization which carries out specific functions related to health care delivery or research in the life sciences.', 'Centers for Disease Control and Prevention (U.S.); Halfway Houses; Hospitals, Pediatric', '{isa} Organization', 'A2.7.1', 'Concepts for health care related professional societies are assigned the type \\'Professional Society\\'.', 'hcro', 1093, NULL, 'STY'),\n", + "(12, 'Organizations', 'ORGA', 1272, 'Professional Society', 4, 'An organization uniting those who have a common vocation or who are involved with a common field of study.', 'American Medical Association; International Council of Nurses; Library', '{isa} Organization', 'A2.7.2', NULL, 'pros', 1094, NULL, 'STY'),\n", + "(12, 'Organizations', 'ORGA', 1273, 'Self-help or Relief Organization', 4, 'An organization whose purpose and function is to provide assistance to the needy or to offer support to those sharing similar problems.', 'Alcoholics Anonymous; Charities - organization; Red Cross', '{isa} Organization', 'A2.7.3', NULL, 'shro', 1095, NULL, 'STY'),\n", + "(13, 'Phenomena', 'PHEN', 22, 'Phenomenon or Process', 2, 'A process or state which occurs naturally or as a result of an activity.', 'Disasters; Motor Traffic Accidents; Depolymerization', '{isa} Event; {inverse_isa} Injury or 
Poisoning; {inverse_isa} Human-caused Phenomenon or Process; {inverse_isa} Natural Phenomenon or Process', 'B2', NULL, 'phpr', 1067, NULL, 'STY'),\n", + "(13, 'Phenomena', 'PHEN', 221, 'Human-caused Phenomenon or Process', 3, 'A phenomenon or process that is a result of the activities of human beings.', 'Baby Boom; Cultural Evolution; Mass Media', '{isa} Phenomenon or Process; {inverse_isa} Environmental Effect of Humans', 'B2.1', 'If the concept refers to the activity itself, rather than the result of that activity, a type from the \\'Activity\\' hierarchy will be assigned instead.', 'hcpp', 1068, NULL, 'STY'),\n", + "(13, 'Phenomena', 'PHEN', 222, 'Natural Phenomenon or Process', 3, 'A phenomenon or process that occurs irrespective of the activities of human beings.', 'Air Movements; Corrosion; Lightning (phenomenon)', '{isa} Phenomenon or Process; {inverse_isa} Biologic Function', 'B2.2', NULL, 'npop', 1070, NULL, 'STY'),\n", + "(13, 'Phenomena', 'PHEN', 1221, 'Laboratory or Test Result', 4, 'The outcome of a specific test to measure an attribute or to determine the presence, absence, or degree of a condition.', 'Blood Flow Velocity; Serum Calcium Level; Spinal Fluid Pressure', '{isa} Finding', 'A2.2.1', 'Laboratory or test results are considered inherently quantitative and, thus, are not assigned the additional type \\'Quantitative Concept\\'.', 'lbtr', 1034, NULL, 'STY');\n", + "INSERT INTO `semantic_network` (`SemanticGroupCode`, `SemanticGroup`, `SemanticGroupAbr`, `CustomTreeNumber`, `SemanticTypeName`, `BranchPosition`, `Definition`, `Examples`, `RelationName`, `SemTypeTreeNo`, `UsageNote`, `Abbreviation`, `UniqueID`, `NonHumanFlag`, `RecordType`) VALUES\n", + "(13, 'Phenomena', 'PHEN', 2211, 'Environmental Effect of Humans', 4, 'A change in the natural environment that is a result of the activities of human beings.', 'Air Pollution; Desertification; Bioremediation', '{isa} Human-caused Phenomenon or Process', 'B2.1.1', NULL, 'eehu', 1069, NULL, 
'STY'),\n", + "(13, 'Phenomena', 'PHEN', 2221, 'Biologic Function', 4, 'A state, activity or process of the body or one of its systems or parts.', 'Antibody Formation; Drug resistance; Homeostasis', '{isa} Natural Phenomenon or Process; {inverse_isa} Physiologic Function; {inverse_isa} Pathologic Function', 'B2.2.1', 'Few concepts will be assigned to this broad type.', 'biof', 1038, 'Y', 'STY'),\n", + "(14, 'Physiology', 'PHYS', 123, 'Organism Attribute', 3, 'A property of the organism or its major parts.', 'Age; Birth Weight; Eye Color', '{isa} Conceptual Entity; {inverse_isa} Clinical Attribute', 'A2.3', NULL, 'orga', 1032, 'Y', 'STY'),\n", + "(14, 'Physiology', 'PHYS', 1231, 'Clinical Attribute', 4, 'An observable or measurable property or state of an organism of clinical interest.', 'Bone Density; heart rate; Range of Motion, Articular', '{isa} Organism Attribute', 'A2.3.1', 'These are the attributes that are being evaluated or measured, not the results of the evaluation.', 'clna', 1201, NULL, 'STY'),\n", + "(14, 'Physiology', 'PHYS', 22211, 'Physiologic Function', 5, 'A normal process, activity, or state of the body.', 'Biorhythms; Hearing; Vasodilation', '{isa} Biologic Function; {inverse_isa} Organism Function; {inverse_isa} Organ or Tissue Function; {inverse_isa} Cell Function; {inverse_isa} Molecular Function', 'B2.2.1.1', NULL, 'phsf', 1039, NULL, 'STY'),\n", + "(14, 'Physiology', 'PHYS', 222111, 'Organism Function', 6, 'A physiologic function of the organism as a whole, of multiple organ systems, or of multiple organs or tissues.', 'Breeding; Hibernation; Motor Skills', '{isa} Physiologic Function; {inverse_isa} Mental Process', 'B2.2.1.1.1', NULL, 'orgf', 1040, NULL, 'STY'),\n", + "(14, 'Physiology', 'PHYS', 222112, 'Organ or Tissue Function', 6, 'A physiologic function of a particular organ, organ system, or tissue.', 'Osteogenesis; Renal Circulation; Tooth Calcification', '{isa} Physiologic Function', 'B2.2.1.1.2', NULL, 'ortf', 1042, NULL, 
'STY'),\n", + "(14, 'Physiology', 'PHYS', 222113, 'Cell Function', 6, 'A physiologic function inherent to cells or cell components.', 'Cell Cycle; Cell division; Phagocytosis', '{isa} Physiologic Function', 'B2.2.1.1.3', NULL, 'celf', 1043, NULL, 'STY'),\n", + "(14, 'Physiology', 'PHYS', 222114, 'Molecular Function', 6, 'A physiologic function occurring at the molecular level.', 'Binding, Competitive; Electron Transport; Glycolysis', '{isa} Physiologic Function; {inverse_isa} Genetic Function', 'B2.2.1.1.4', NULL, 'moft', 1044, NULL, 'STY'),\n", + "(14, 'Physiology', 'PHYS', 2221111, 'Mental Process', 7, 'A physiologic function involving the mind or cognitive processing.', 'Anger; Auditory Fatigue; Avoidance Learning', '{isa} Organism Function', 'B2.2.1.1.1.1', NULL, 'menp', 1041, NULL, 'STY'),\n", + "(14, 'Physiology', 'PHYS', 2221141, 'Genetic Function', 7, 'Functions of or related to the maintenance, translation or expression of the genetic material.', 'Early Gene Transcription; Gene Amplification; RNA Splicing', '{isa} Molecular Function', 'B2.2.1.1.4.1', NULL, 'genf', 1045, NULL, 'STY'),\n", + "(15, 'Procedures', 'PROC', 2131, 'Health Care Activity', 4, 'An activity of or relating to the practice of medicine or involving the care of patients.', 'ambulatory care services; Clinic Activities; Preventive Health Services', '{isa} Occupational Activity; {inverse_isa} Laboratory Procedure; {inverse_isa} Diagnostic Procedure; {inverse_isa} Therapeutic or Preventive Procedure', 'B1.3.1', NULL, 'hlca', 1058, NULL, 'STY'),\n", + "(15, 'Procedures', 'PROC', 2132, 'Research Activity', 4, 'An activity carried out as part of research or experimentation.', 'Animal Experimentation; Biomedical Research; Experimental Replication', '{isa} Occupational Activity; {inverse_isa} Molecular Biology Research Technique', 'B1.3.2', 'In some cases, a concept may be assigned to both this type and the type \\'Intellectual Product\\'. 
For example, the concept \\\"Comparative Study\\\" might be viewed as both an activity and the result, or product, of that activity.', 'resa', 1062, NULL, 'STY'),\n", + "(15, 'Procedures', 'PROC', 2134, 'Educational Activity', 4, 'An activity related to the organization and provision of education.', 'Academic Training; Family Planning Training; Preceptorship', '{isa} Occupational Activity', 'B1.3.4', NULL, 'edac', 1065, NULL, 'STY'),\n", + "(15, 'Procedures', 'PROC', 21311, 'Laboratory Procedure', 5, 'A procedure, method, or technique used to determine the composition, quantity, or concentration of a specimen, and which is carried out in a clinical laboratory. Included here are procedures which measure the times and rates of reactions.', 'Blood Protein Electrophoresis; Crystallography; Radioimmunoassay', '{isa} Health Care Activity', 'B1.3.1.1', NULL, 'lbpr', 1059, NULL, 'STY'),\n", + "(15, 'Procedures', 'PROC', 21312, 'Diagnostic Procedure', 5, 'A procedure, method, or technique used to determine the nature or identity of a disease or disorder. 
This excludes procedures which are primarily carried out on specimens in a laboratory.', 'Biopsy; Heart Auscultation; Magnetic Resonance Imaging', '{isa} Health Care Activity', 'B1.3.1.2', NULL, 'diap', 1060, NULL, 'STY'),\n", + "(15, 'Procedures', 'PROC', 21313, 'Therapeutic or Preventive Procedure', 5, 'A procedure, method, or technique designed to prevent a disease or a disorder, or to improve physical function, or used in the process of treating a disease or injury.', 'Cesarean section; Dermabrasion; Family psychotherapy', '{isa} Health Care Activity', 'B1.3.1.3', NULL, 'topp', 1061, NULL, 'STY'),\n", + "(15, 'Procedures', 'PROC', 21321, 'Molecular Biology Research Technique', 5, 'Any of the techniques used in the study of or the directed modification of the gene complement of a living organism.', 'Northern Blotting; Genetic Engineering; In Situ Hybridization', '{isa} Research Activity', 'B1.3.2.1', NULL, 'mbrt', 1063, NULL, 'STY');\n", + "'''" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/07_UI_building.ipynb b/07_UI_building.ipynb new file mode 100644 index 0000000..88aa041 --- /dev/null +++ b/07_UI_building.ipynb @@ -0,0 +1,135 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Part 7. UI building\n", + "App to analyze web-site search logs (internal search)
\n", + "**This script:** Build UI information
\n", + "Authors: dan.wendling@nih.gov,
\n", + "Last modified: 2018-09-09\n", + "\n", + "## Script contents\n", + "\n", + "\n", + "## FIXMEs\n", + "\n", + "Things Dan wrote for Dan; modify as needed. There are more FIXMEs in context.\n", + "\n", + "* [ ] \n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# 1. Start-up / What to put into place, where\n", + "# ============================================\n", + "\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from matplotlib.pyplot import pie, axis, show\n", + "import numpy as np\n", + "import os\n", + "\n", + "''' 100-percent content inventory from SEO Spider or other. Our allPages \n", + " dataframe is based on a 100-percent content inventory, so we \n", + " can analyze pages with zero traffic or zero searches. Also includes\n", + " the page title, date the page was last updated - lots of rich info.\n", + "- Summary stats by communication package, from content inventory.'''\n", + "contentInventoryFileName = '00 SourceFiles/page.csv'\n", + "packageSummaryFileName = '00 SourceFiles/group.csv'\n", + "\n", + "''' Traffic log. This script assumes Google Analytics unsampled report; \n", + "references two column names: Page and Unique Pageviews. I export \n", + "report header so I'll know later what is in the file, which means my \n", + "import command skips the first ~6 rows.'''\n", + "newTrafficFileName = '00 SourceFiles/Pages_Q2.csv'\n", + "\n", + "'''\n", + "The following custom dictionary files need to be in place in /01/Pre-process\n", + "\n", + "GoldStandard.csv - Already-assigned term list, from UMLS and other sources, \n", + " vetted.\n", + "NamedEntities.csv - Known entities such as person names, product names, acronyms, \n", + " abbreviations, org parts, etc. 
Will overlap with GoldStandard; however, \n", + " UPDATE THIS FILE and this will replicate over to GoldStandard.\n", + "MisspelledOrForeign.csv - Short list of frequently misspelled words with HIGH\n", + " confidence that they can be replaced without review. Okay to include\n", + " foreign words.\n", + "'''\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Plots" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Pie for percentage of rows assigned; https://pythonspot.com/matplotlib-pie-chart/\n", + "totCount = len(logWithGoldStandard)\n", + "unassigned = logWithGoldStandard['SemanticGroup'].isnull().sum()\n", + "assigned = totCount - unassigned\n", + "labels = ['Assigned', 'Unassigned']\n", + "sizes = [assigned, unassigned]\n", + "colors = ['lightskyblue', 'lightcoral']\n", + "explode = (0.1, 0) # explode 1st slice\n", + "plt.pie(sizes, explode=explode, labels=labels, colors=colors,\n", + " autopct='%1.f%%', shadow=True, startangle=100)\n", + "plt.axis('equal')\n", + "plt.title(\"Status after 'GoldStandard' processing\")\n", + "plt.show()\n", + "\n", + "\n", + "# Bar of SemanticGroup categories, horizontal\n", + "# Source: http://robertmitchellv.com/blog-bar-chart-annotations-pandas-mpl.html\n", + "ax = logWithGoldStandard['SemanticGroup'].value_counts().plot(kind='barh', figsize=(10,6),\n", + " color=\"slateblue\", fontsize=10);\n", + "ax.set_alpha(0.8)\n", + "ax.set_title(\"Categories assigned after 'GoldStandard' processing\", fontsize=14)\n", + "ax.set_xlabel(\"Number of searches\", fontsize=9);\n", + "# set individual bar lables using above list\n", + "for i in ax.patches:\n", + " # get_width pulls left or right; get_y pushes up or down\n", + " ax.text(i.get_width()+.1, i.get_y()+.31, \\\n", + " str(round((i.get_width()), 2)), fontsize=9, color='dimgrey')\n", + "# invert for largest on top \n", + "ax.invert_yaxis()\n", + 
"plt.gcf().subplots_adjust(left=0.3)\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/08_Misc_fixes.ipynb b/08_Misc_fixes.ipynb new file mode 100644 index 0000000..a0c50ee --- /dev/null +++ b/08_Misc_fixes.ipynb @@ -0,0 +1,306 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Part 8. Misc fixes\n", + "App to analyze web-site search logs (internal search)
\n", + "**This script:** Re-usable code that doesn't belong anywhere in particular
\n", + "Authors: dan.wendling@nih.gov,
\n", + "Last modified: 2018-09-09\n", + "\n", + "## Script contents\n", + "\n", + "\n", + "## FIXMEs\n", + "\n", + "Things Dan wrote for Dan; modify as needed. There are more FIXMEs in context.\n", + "\n", + "* [ ] \n", + "\n", + "\n", + "Found this useful: https://stackoverflow.com/questions/tagged/matplotlib\n", + "\n", + "## 1. Start-up / What to put into place, where\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from matplotlib.pyplot import pie, axis, show\n", + "import os\n", + "\n", + "# Set working directory\n", + "os.chdir('/Users/wendlingd/Projects/webDS/_util')\n", + "\n", + "localDir = '08_Misc_fixes/' # Different than others, see about changing\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# 2. Load and clean logAfterUmlsApi1\n", + "# ===================================\n", + "\n", + "logAfterUmlsApi1 = pd.read_excel('02_Run_APIs_files/logAfterUmlsApi1.xlsx')\n", + "\n", + "logAfterUmlsApi1.loc[logAfterUmlsApi1['preferredTerm'].str.contains('^BLAST (physical force)', na=False), 'preferredTerm'] = 'Bibliographic Entity'\n", + "\n", + "\n", + "logAfterUmlsApi1['preferredTerm'] = logAfterUmlsApi1['preferredTerm'].str.replace(\"^BLAST \\(physical force\\)\", \"^BLAST$\", regex=True)\n", + "\n", + "logAfterUmlsApi1['preferredTerm'] = logAfterUmlsApi1['preferredTerm'].str.replace(\"BLAST Link\", \"BLAST\", regex=False)\n", + "\n", + "huh = logAfterUmlsApi1[logAfterUmlsApi1.adjustedQueryCase.str.startswith(\"blast\") == True] # retrieve records to eyeball\n", + "huh = huh.groupby('preferredTerm').size()\n", + "\n", + "logAfterUmlsApi1['preferredTerm'] = logAfterUmlsApi1['preferredTerm'].str.replace('Bibliographic Reference', 'Bibliographic Entity')\n", + "\n", + "logAfterUmlsApi1['preferredTerm'] = 
logAfterUmlsApi1['preferredTerm'].str.replace('Mesh surgical material', 'MeSH')\n", + "\n", + "# Write out the fixed file\n", + "writer = pd.ExcelWriter('02_Run_APIs_files/logAfterUmlsApi1.xlsx')\n", + "logAfterUmlsApi1.to_excel(writer,'logAfterUmlsApi1')\n", + "# df2.to_excel(writer,'Sheet2')\n", + "writer.save()\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# VIEW PREVIOUS ASSIGNMENTS IN GoldStandard_master\n", + "\n", + "\n", + "from matplotlib.pyplot import pie, axis, show\n", + "import numpy as np\n", + "import os\n", + "import string\n", + "\n", + "\n", + "# Bring in historical file of (somewhat edited) matches\n", + "GoldStandard = localDir + 'GoldStandard_Master.xlsx'\n", + "GoldStandard = pd.read_excel(GoldStandard)\n", + "\n", + "GoldStandard = GoldStandard[pd.notnull(GoldStandard['SemanticGroup'])]\n", + "\n", + "'''\n", + "SELECT * FROM `manual_assignments` \n", + "WHERE preferredTerm IS NULL\n", + "ORDER BY NewSemanticTypeName` DESC\n", + "\n", + "\n", + "preferredTerm, SemanticTypeName, SemanticGroup\n", + "'''\n", + "\n", + "df2 = GoldStandard[GoldStandard.preferredTerm.str.contains(\"photo\") == True]\n", + "df2 = GoldStandard[GoldStandard.SemanticTypeName.str.contains(\"foreign\") == True]\n", + "df2 = GoldStandard[GoldStandard.SemanticGroup.str.contains(\"foreign\") == True]\n", + "\n", + "\n", + "\n", + "\n", + "df = df.groupby('adjustedQueryCase').size()\n", + "df = pd.DataFrame({'timesSearched':df})\n", + "\n", + "\n", + "GoldStandard = GoldStandard.sort_values(by='timesSearched', ascending=False)\n", + "GoldStandard = GoldStandard.reset_index()\n", + "\n", + "\n", + "sum1 = logAfterUmlsApi1.groupby('SemanticTypeName').size()\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "# READ FROM SQL TO DATAFRAME\n", + "\n", + "\n", + "from sqlalchemy import create_engine\n", + "\n", + "dbconn = 
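A note on the BLAST fix-ups above: parentheses are regex metacharacters, so a pattern like `'^BLAST (physical force)'` is read as a capture group and never matches the literal text — the parens must be escaped, or `regex=False` passed. A small sketch on invented data:

```python
import pandas as pd

s = pd.Series(['BLAST (physical force)', 'BLAST Link'])

# Escaped parens match the literal parenthesized phrase
escaped = s.str.contains(r'^BLAST \(physical force\)', regex=True)

# regex=False treats the whole pattern as a literal substring
literal = s.str.contains('BLAST (physical force)', regex=False)
```

Both return `True` only for the first element; the unescaped version would match neither.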
create_engine('mysql+mysqlconnector://wendlingd:DataSciPwr17@localhost/ia')\n", + "\n", + "\n", + "# Extract from MySQL to df\n", + "mayJuneLog = pd.read_sql('SELECT * FROM timeboundhmpglog', con=dbconn)\n", + "\n", + "\n", + "\n", + "# Write this to file (assuming multiple cycles)\n", + "writer = pd.ExcelWriter(localDir + 'mayJuneLog.xlsx')\n", + "mayJuneLog.to_excel(writer,'timeboundhmpglog')\n", + "writer.save()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "# UPDATE SEMANTIC NETWORK table IN MYSQL\n", + "'''\n", + "DROP TABLE IF EXISTS `semantic_network`;\n", + "CREATE TABLE `semantic_network` (\n", + " `semnet_id` INT PRIMARY KEY NOT NULL AUTO_INCREMENT,\n", + " `SemanticGroupCode` int(11) NOT NULL,\n", + " `SemanticGroup` varchar(60) NOT NULL,\n", + " `SemanticGroupAbr` varchar(10) NOT NULL,\n", + " `CustomTreeNumber` int(11) NOT NULL,\n", + " `SemanticTypeName` varchar(100) NOT NULL,\n", + " `BranchPosition` int(11) NOT NULL,\n", + " `Definition` varchar(200) NOT NULL,\n", + " `Examples` varchar(100) NOT NULL,\n", + " `RelationName` varchar(60) NOT NULL,\n", + " `SemTypeTreeNo` varchar(60) NOT NULL,\n", + " `UsageNote` varchar(60) NOT NULL,\n", + " `Abbreviation` varchar(60) NOT NULL,\n", + " `UniqueID` int(11) NOT NULL,\n", + " `NonHumanFlag` varchar(60) NOT NULL,\n", + " `RecordType` varchar(60) NOT NULL\n", + ") ENGINE=InnoDB DEFAULT CHARSET=utf8;\n", + "'''\n", + "\n", + "SemanticNetworkReference = pd.read_excel('01_Text_wrangling_files/SemanticNetworkReference.xlsx')\n", + "\n", + "SemanticNetworkReference.columns\n", + "\n", + "\n", + "# Add dataframe to MySQL\n", + "\n", + "import mysql.connector\n", + "from pandas.io import sql\n", + "from sqlalchemy import create_engine\n", + "\n", + "dbconn = create_engine('mysql+mysqlconnector://wendlingd:DataSciPwr17@localhost/ia')\n", + "\n", + "SemanticNetworkReference.to_sql(name='semantic_network', con=dbconn, if_exists = 'replace', 
index=False) # or if_exists='append'\n", + "\n", + "# Reduce to needed columns\n", + "listCol = SemanticNetworkReference[['SemanticGroupCode', 'SemanticGroup']]\n", + "\n", + "listCol = listCol.drop_duplicates('SemanticGroup')\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# RE-NAME categories\n", + "\n", + "'''\n", + "SemanticGroup\n", + " Citation, PubMed strategy, complex, unclear, etc.\n", + "\n", + "logAfterFuzzyMatch\n", + "\n", + "'''\n", + "\n", + "\n", + "logAfterFuzzyMatch['preferredTerm'] = logAfterFuzzyMatch['preferredTerm'].str.replace('Bibliographic Entity', 'PubMed strategy, citation, unclear, etc.')\n", + "\n", + "logAfterFuzzyMatch.loc[logAfterFuzzyMatch['preferredTerm'].str.startswith('Bibliographic Entity', na=False), 'SemanticGroup'] = 'Unparsed'\n", + "logAfterFuzzyMatch.loc[logAfterFuzzyMatch['preferredTerm'].str.startswith('Bibliographic Entity', na=False), 'SemanticTypeName'] = 'Unparsed'\n", + "\n", + "\n", + "logAfterFuzzyMatch.loc[logAfterFuzzyMatch['preferredTerm'].str.contains('Numeric Entity', na=False), 'SemanticGroup'] = 'Accession Number'\n", + "logAfterFuzzyMatch.loc[logAfterFuzzyMatch['preferredTerm'].str.contains('Numeric Entity', na=False), 'SemanticTypeName'] = 'Accession Number'\n", + "\n", + "\n", + "writer = pd.ExcelWriter('03_Fuzzy_match_files/logAfterFuzzyMatch.xlsx')\n", + "logAfterFuzzyMatch.to_excel(writer,'logAfterFuzzyMatch')\n", + "# df2.to_excel(writer,'Sheet2')\n", + "writer.save()\n", + "\n", + "\n", + "\n", + "logAfterFuzzyMatch['adjustedQueryCase'] = logAfterFuzzyMatch['adjustedQueryCase'].str.replace('^benefits of ', '')\n", + "logAfterFuzzyMatch['adjustedQueryCase'] = logAfterFuzzyMatch['adjustedQueryCase'].str.replace('^cause of ', '')\n", + "logAfterFuzzyMatch['adjustedQueryCase'] = logAfterFuzzyMatch['adjustedQueryCase'].str.replace('^cause for ', '')\n", + "logAfterFuzzyMatch['adjustedQueryCase'] = 
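The `to_sql` round trip can be exercised without a MySQL server; a sketch using an in-memory SQLite connection in place of the engine above (table and column names borrowed from the cell, data invented):

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(':memory:')  # stand-in for the MySQL engine
df = pd.DataFrame({'SemanticGroupCode': [1, 2],
                   'SemanticGroup': ['Disorders', 'Genes & Molecular Sequences']})

# if_exists='replace' drops and recreates the table; 'append' adds rows to it
df.to_sql('semantic_network', con=conn, if_exists='replace', index=False)
df.to_sql('semantic_network', con=conn, if_exists='append', index=False)

roundtrip = pd.read_sql('SELECT * FROM semantic_network', con=conn)
```

After a `replace` followed by an `append`, `roundtrip` holds both copies of the rows, which makes the difference between the two modes easy to verify.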
logAfterFuzzyMatch['adjustedQueryCase'].str.replace('^causes for ', '')\n", + "logAfterFuzzyMatch['adjustedQueryCase'] = logAfterFuzzyMatch['adjustedQueryCase'].str.replace('^causes of ', '')\n", + "logAfterFuzzyMatch['adjustedQueryCase'] = logAfterFuzzyMatch['adjustedQueryCase'].str.replace('^definition for ', '')\n", + "logAfterFuzzyMatch['adjustedQueryCase'] = logAfterFuzzyMatch['adjustedQueryCase'].str.replace('^definition of ', '')\n", + "logAfterFuzzyMatch['adjustedQueryCase'] = logAfterFuzzyMatch['adjustedQueryCase'].str.replace('^effect of ', '')\n", + "logAfterFuzzyMatch['adjustedQueryCase'] = logAfterFuzzyMatch['adjustedQueryCase'].str.replace('^etiology of ', '')\n", + "logAfterFuzzyMatch['adjustedQueryCase'] = logAfterFuzzyMatch['adjustedQueryCase'].str.replace('^symptoms of ', '')\n", + "logAfterFuzzyMatch['adjustedQueryCase'] = logAfterFuzzyMatch['adjustedQueryCase'].str.replace('^treating ', '')\n", + "logAfterFuzzyMatch['adjustedQueryCase'] = logAfterFuzzyMatch['adjustedQueryCase'].str.replace('^treatment for ', '')\n", + "logAfterFuzzyMatch['adjustedQueryCase'] = logAfterFuzzyMatch['adjustedQueryCase'].str.replace('^treatments for ', '')\n", + "logAfterFuzzyMatch['adjustedQueryCase'] = logAfterFuzzyMatch['adjustedQueryCase'].str.replace('^treatment of ', '')\n", + "logAfterFuzzyMatch['adjustedQueryCase'] = logAfterFuzzyMatch['adjustedQueryCase'].str.replace('^what are ', '')\n", + "logAfterFuzzyMatch['adjustedQueryCase'] = logAfterFuzzyMatch['adjustedQueryCase'].str.replace('^what causes ', '')\n", + "logAfterFuzzyMatch['adjustedQueryCase'] = logAfterFuzzyMatch['adjustedQueryCase'].str.replace('^what is a ', '')\n", + "logAfterFuzzyMatch['adjustedQueryCase'] = logAfterFuzzyMatch['adjustedQueryCase'].str.replace('^what is ', '')\n", + "\n", + "\n", + "writer = pd.ExcelWriter('03_Fuzzy_match_files/logAfterFuzzyMatch.xlsx')\n", + "logAfterFuzzyMatch.to_excel(writer,'logAfterFuzzyMatch')\n", + "# df2.to_excel(writer,'Sheet2')\n", + "writer.save()\n" + 
] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "logAfterFuzzyMatch = logAfterFuzzyMatch.replace(np.nan, 'Unparsed', regex=True)\n", + "\n", + "logAfterFuzzyMatch['preferredTerm'] = logAfterFuzzyMatch['preferredTerm'].str.replace('National Center for Biotechnology Information', 'NCBI')\n", + "\n", + "logAfterFuzzyMatch['preferredTerm'] = logAfterFuzzyMatch['preferredTerm'].str.replace('Formatted References for Authors of Journal Articles', 'Refs for J Article Authors')\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}
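The long run of one-prefix-per-line `str.replace` calls above could be collapsed into a single anchored alternation. A sketch on toy queries; the prefix list here is illustrative, not the full set used in the cell:

```python
import pandas as pd

# Regex fragments covering several of the question-style prefixes stripped above
prefixes = ['benefits of', 'causes? (?:of|for)', 'definition (?:of|for)',
            'treatments? (?:of|for)', 'what are', 'what causes', 'what is(?: a)?']
pattern = r'^(?:' + '|'.join(prefixes) + r') '

queries = pd.Series(['what is a biopsy', 'causes of anemia',
                     'treatment for gout', 'mesh'])
stripped = queries.str.replace(pattern, '', regex=True)
```

One pass yields `['biopsy', 'anemia', 'gout', 'mesh']`; adding a prefix then means one list entry instead of another full `str.replace` line.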