NLP pipeline

A collection of Jupyter notebooks that preprocess different text inputs, then vectorize and prepare the text for different machine learning models. It also includes several supervised and unsupervised models to run on the data.

**These scripts borrow heavily from Applied Text Analysis with Python by Benjamin Bengfort, Rebecca Bilbro and Tony Ojeda.**

Preprocessing scripts

Takes previously downloaded text and PDF documents and creates a corpus reader, then uses spaCy to identify sentences, tokenize the text and remove stop words, producing lists of lists of preprocessed text. Finally, it pickles the results so they can be read back from disk.
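
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebooks' actual code: it assumes spaCy v3, uses a blank English pipeline with the rule-based `sentencizer` so that no model download is needed, and the helper names (`preprocess`, `pickle_document`, `load_document`) and the output file name are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

import spacy

# Assumption: a blank English pipeline plus the rule-based sentencizer.
# The notebooks may load a full pretrained spaCy model instead.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")


def preprocess(text):
    """Return one list of lowercased tokens per sentence,
    dropping stop words, punctuation and whitespace tokens."""
    doc = nlp(text)
    return [
        [
            tok.lower_
            for tok in sent
            if not (tok.is_stop or tok.is_punct or tok.is_space)
        ]
        for sent in doc.sents
    ]


def pickle_document(sentences, path):
    """Write the preprocessed list of lists to disk."""
    with open(path, "wb") as f:
        pickle.dump(sentences, f)


def load_document(path):
    """Read a preprocessed document back from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)


raw = "The corpus reader loads each file. Tokens are then filtered and saved."
processed = preprocess(raw)

# Hypothetical output location; a temporary directory keeps the sketch self-contained.
out_path = Path(tempfile.mkdtemp()) / "doc_0.pickle"
pickle_document(processed, out_path)
restored = load_document(out_path)
```

Pickling the token lists means the comparatively slow spaCy pass runs once per document, and later notebooks can load the result directly.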

The data is then vectorized in several ways in preparation for the different ML models.
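
As an illustration of what vectorizing already-tokenized text can look like, the sketch below uses scikit-learn's `CountVectorizer` and `TfidfVectorizer`. Which vectorizers the notebooks actually use is not stated here, so treat the choice as an assumption; the toy `docs` list and the `identity` helper are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical preprocessed documents: each is already a list of tokens.
docs = [
    ["corpus", "reader", "loads", "file"],
    ["tokens", "filtered", "saved"],
    ["reader", "loads", "tokens"],
]


def identity(tokens):
    """Pass tokens through unchanged, since tokenization is already done."""
    return tokens


# A callable analyzer tells scikit-learn to skip its own
# preprocessing and tokenization and use the lists as given.
count_vec = CountVectorizer(analyzer=identity)
counts = count_vec.fit_transform(docs)   # sparse term-frequency matrix

tfidf_vec = TfidfVectorizer(analyzer=identity)
tfidf = tfidf_vec.fit_transform(docs)    # sparse TF-IDF matrix

print(counts.shape, tfidf.shape)
print(sorted(count_vec.vocabulary_))
```

Both matrices have one row per document and one column per distinct token, so they can be passed to the same downstream models.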

Modelling scripts

Takes the preprocessed data and builds a pipeline from the various class objects, selects models, fits them and generates performance metrics.
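
A pipeline of this kind might look like the sketch below, which chains a vectorizer and a classifier with scikit-learn's `Pipeline` and reports metrics on held-out documents. The model choice (`LogisticRegression`), the toy data and the labels are assumptions made for illustration; the notebooks may use other estimators and metrics.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.pipeline import Pipeline


def identity(tokens):
    """Documents are already tokenized, so pass them through unchanged."""
    return tokens


# Hypothetical preprocessed training documents with made-up labels.
train_docs = [
    ["team", "won", "match"],
    ["player", "scored", "goal"],
    ["coach", "praised", "team"],
    ["match", "ended", "goal"],
    ["bank", "raised", "rates"],
    ["stock", "market", "fell"],
    ["bank", "stock", "rose"],
    ["market", "rates", "rose"],
]
train_labels = ["sports"] * 4 + ["finance"] * 4

test_docs = [["team", "scored", "goal"], ["bank", "market", "fell"]]
test_labels = ["sports", "finance"]

# Each step is a (name, object) pair; fit() runs them in order.
model = Pipeline([
    ("vectorize", TfidfVectorizer(analyzer=identity)),
    ("classify", LogisticRegression(max_iter=1000)),
])

model.fit(train_docs, train_labels)
predicted = model.predict(test_docs)

accuracy = accuracy_score(test_labels, predicted)
print(f"accuracy: {accuracy:.2f}")
print(classification_report(test_labels, predicted, zero_division=0))
```

Keeping the vectorizer inside the pipeline means it is fitted only on the training documents, which avoids leaking vocabulary statistics from the evaluation data.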

Files

Algo_Kmeans_w_textnormer_and_vectorizer.ipynb
Algo_LDA_w_textnormer.ipynb
Algo_classifier_w_textnormer_and_vectorizer.ipynb
Execution_LDA.ipynb
Execution_classifier.ipynb
Execution_kmeans.ipynb
Kfold_class.ipynb
Preproc_pipeline_PDF.ipynb
Preproc_pipeline_Text.ipynb
Testing_Vectorizer_&_model_pipeline.ipynb
readme.md

Repository: nebo333/NLP_pipeline