updating README and use cases
jwa7 committed Mar 19, 2019
1 parent 530c17f commit 9a60c9f
Showing 5 changed files with 28 additions and 8 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -111,7 +111,7 @@ data/*
speedcom/frontend/output/*

#MAC
./.DS_Store
.DS_Store

#Testing file:
test_readData_temp.tsv
10 changes: 9 additions & 1 deletion README.md
@@ -10,7 +10,15 @@ Authors: **Joe Abbott**, **Ryan Beck**, **Hang Hu**, **Yang Liu**, **Lixin Lu**.

## Overview

_SPEEDCOM_ is an open source python package that aims to predict the fluorescence emission and absorption spectra of small conjugated organic molecules. These features are predicted using a neural network, implemented with [keras](https://github.com/keras-team/keras), and are trained on data from the [PhotochemCAD database](http://www.photochemcad.com/PhotochemCAD.html). The software has a graphical-user-interface (GUI) where users can input the [SMILES](https://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system) string for a given molecule and be returned its predicted spectra and associated characteristic quantities. For further details on the background science, and the operations of our program, please see our [use cases](https://github.com/emissible/SPEEDCOM/blob/master/use_cases.md).
_SPEEDCOM_ is an open source python package that aims to predict the fluorescence emission and absorption spectra of small conjugated organic molecules. These features are predicted using a convolutional neural network, implemented with [keras](https://github.com/keras-team/keras), and trained on data from the [PhotochemCAD database](http://www.photochemcad.com/PhotochemCAD.html). The software has a graphical-user-interface (GUI) where users can input the [SMILES](https://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system) string for a given molecule and be returned its predicted spectra and associated characteristic quantities. For further details on the background science, and the operations of our program, please see our [use cases](https://github.com/emissible/SPEEDCOM/blob/master/use_cases.md).

### GUI

The screenshots below show the predicted spectra and characteristics for an example molecule entered via our GUI.

<p align="center"><img src="doc/source/_static/prediction_screenshot.png" alt="SPEEDCOM spectra prediction" title="SPEEDCOM spectra prediction"/></p>

<p align="center"><img src="doc/source/_static/charac_screenshot.png" alt="SPEEDCOM characteristics prediction" title="SPEEDCOM characteristics prediction"/></p>


## Configuration
Binary file added doc/source/_static/charac_screenshot.png
Binary file added doc/source/_static/prediction_screenshot.png
24 changes: 18 additions & 6 deletions doc/use_cases.md
@@ -10,15 +10,15 @@ Other quantities characteristic for a certain molecule are the **molar extinctio

## Objectives

The objective of _SPEEDCOM_ is simple: to use deep-learning to generate the fluorescence emission and absorption spectra of a small organic molecule quickly and accurately, without the need for running expensive and time-consuming _ab initio_ calculations.
The objective of _SPEEDCOM_ is simple: to use a convolutional neural network to generate the emission and absorption spectra of small organic molecules quickly and accurately, without the need for running expensive and time-consuming _ab initio_ calculations.

## Components

<p align="center"><img src="source/_static/flow_chart.png" alt="Use Flow Diagram" title="Flow Diagram"/></p>

### <u>Front End</u>

#### GUI
### GUI
User interactions are handled through our Graphical User Interface (GUI). Here the user inputs a SMILES string for the molecule for which they would like to generate a spectrum. This string is passed to the data-cleaning portion of our program in the back end.

The user can then choose between generating an emission or an absorption spectrum for their molecule. This is displayed in the GUI, along with the skeletal, 2D representation of the molecule. The user can check this structure to confirm that the spectra they receive are for the molecule they intended.
@@ -28,10 +28,22 @@ The user may also choose to download the spectra and associated calculated physi

### <u>Back End</u>

#### Data Cleaning
### Data Cleaning

The SMILES string for the molecule is received from the front end GUI. Any counterions present are removed from the structures as they have no fixed coordinates relative to the molecule as a whole, and have no tangible effects on the electronic behaviour of the molecule with regard to After being converted into a 3D geometry, the nuclear coordinates and atomic charges are used to generate a _Coulomb Matrix_, which is useful representation of molecular geometry. The molecule can also be represented
The SMILES string for the molecule is received from the front end GUI. Any counterions present are removed from the structures as they have no fixed coordinates relative to the molecule as a whole, and have no tangible effects on the electronic behaviour of the molecule with regard to its absorption and emission wavelengths.
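
As an illustration of the counterion removal described above, here is a minimal sketch in plain Python. It relies on the SMILES convention that disconnected fragments are separated by `.` and keeps the fragment with the most atom-symbol letters. The function name `strip_counterions` and the "largest fragment" heuristic are assumptions made for this example; they are not taken from the SPEEDCOM source, which may clean structures differently.

```python
def strip_counterions(smiles: str) -> str:
    """Return the largest dot-separated fragment of a SMILES string.

    Hypothetical helper: assumes the chromophore is the largest fragment
    and that everything else (e.g. [Cl-], [Na+]) is a counterion.
    """
    fragments = [frag for frag in smiles.split(".") if frag]
    if not fragments:
        raise ValueError("empty SMILES string")
    # Rough size estimate: count alphabetic characters (element symbols).
    # Ties are resolved in favour of the first fragment.
    return max(fragments, key=lambda frag: sum(ch.isalpha() for ch in frag))


if __name__ == "__main__":
    # A quaternary ammonium salt: the chloride counterion is dropped.
    print(strip_counterions("C[N+](C)(C)c1ccccc1.[Cl-]"))  # C[N+](C)(C)c1ccccc1
```

A heuristic like this can fail for salts whose counterion is larger than the molecule of interest, so a curated list of known counterions may be a safer choice in practice.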

#### Neural Network
### Neural Network

Because of the complexity and quantity of the descriptors attributed to each molecule, our program involves the utilization of a neural network to build the model from the predictors and predict outputs. This is implemented with [keras](https://github.com/keras-team/keras), an open-source high-level neural-network programming interface.
**Descriptors for the molecule may be generated in the following ways:**

1. After being converted into a 3D geometry, the nuclear coordinates and atomic charges are used to generate a _Coulomb Matrix_, which is a useful representation of molecular geometry. This may be encoded and passed into a Deep Neural Network (DNN).

2. Using RDKit - an open source Python package used for processing chemical data - a unique 'fingerprint' for the molecule may be generated, which can be passed into a Convolutional Neural Network (CNN).

3. Numerical encoding of this fingerprint can be passed through a 'Long-Short-Term-Memory' Recurrent Neural Network (RNN).

4. However, the model that has given us the best accuracy upon predicting the spectroscopic details of molecules in our test data has been a CNN trained on numerically-encoded SMILES strings themselves.
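
Two of the descriptors listed above can be sketched in a few lines of plain Python. The first function builds a Coulomb matrix using the commonly cited definition (diagonal `0.5 * Z_i ** 2.4`, off-diagonal `Z_i * Z_j / |R_i - R_j|`), with coordinates assumed to be in consistent units such as bohr. The other two functions show one simple way to encode SMILES strings numerically: a character-level integer code, zero-padded to a fixed length. All function names, the character-level tokenisation, and the padding scheme are assumptions for illustration; the encoding actually used by SPEEDCOM is not specified here.

```python
import math


def coulomb_matrix(charges, coords):
    """Build a Coulomb matrix from nuclear charges and 3D coordinates."""
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal term: fitted self-interaction of atom i.
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                # Off-diagonal term: Coulomb repulsion between nuclei i and j.
                distance = math.dist(coords[i], coords[j])
                matrix[i][j] = charges[i] * charges[j] / distance
    return matrix


def build_vocab(smiles_list):
    """Map each character seen in the data to an integer (0 is padding)."""
    chars = sorted({ch for smiles in smiles_list for ch in smiles})
    return {ch: index + 1 for index, ch in enumerate(chars)}


def encode_smiles(smiles, vocab, max_len):
    """Encode a SMILES string as a fixed-length list of integers."""
    if len(smiles) > max_len:
        raise ValueError("SMILES string longer than max_len")
    codes = [vocab[ch] for ch in smiles]
    return codes + [0] * (max_len - len(codes))


if __name__ == "__main__":
    vocab = build_vocab(["CCO", "c1ccccc1"])
    print(encode_smiles("CCO", vocab, 5))  # [2, 2, 3, 0, 0]
```

Note that character-level tokenisation splits two-letter elements such as `Cl` into two tokens; a tokeniser that is aware of element symbols would avoid this, at the cost of a little extra parsing.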

**Implementation:**

Our CNN is implemented with [keras](https://github.com/keras-team/keras), an open-source high-level neural-network programming interface.
