TextMining

This repository contains the code used to extract entities from raw text.

See TokenPipeline for our two-stage, token-level entity recognition work.

See DataPreProcessing for the wtsv documents and the code that transforms them into the CoNLL format (.conll) understood by BERT-NER. Code that processes JSON files and generates .conll files for unannotated articles is also available there.
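As a rough illustration of the target format, the sketch below writes sentences in a two-column CoNLL-style layout (one `token label` pair per line, a blank line between sentences). It is not the code in this repository: the function names `to_conll` and `unlabeled_sentence`, the two-column layout, and the `O` placeholder label for unannotated text are all assumptions made for the example, and the wtsv parsing step is left out because its layout is not described here.

```python
from typing import Iterable, List, Tuple

# A sentence is a list of (token, label) pairs. This type is an assumption
# for the example, not a structure taken from the repository.
Sentence = List[Tuple[str, str]]


def to_conll(sentences: Iterable[Sentence], sep: str = " ") -> str:
    """Serialize sentences as 'token<sep>label' lines, blank line between sentences."""
    blocks = []
    for sentence in sentences:
        lines = [f"{token}{sep}{label}" for token, label in sentence]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


def unlabeled_sentence(tokens: List[str], placeholder: str = "O") -> Sentence:
    """Attach a placeholder label to every token of an unannotated sentence."""
    return [(token, placeholder) for token in tokens]


if __name__ == "__main__":
    annotated = [("fMRI", "B-Concept"), ("scan", "I-Concept")]  # hypothetical labels
    unannotated = unlabeled_sentence(["We", "recruited", "participants"])
    print(to_conll([annotated, unannotated]), end="")
```

Giving unannotated text a placeholder label keeps both kinds of input in the same file format, so a single reader can load annotated and unannotated articles.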

See DataPostProcessing for the code that aligns the output of the two-stage model at the article level. This gives each article with its corresponding concepts and lets us evaluate the performance of the whole model. Query_generator is a file that retrieves the top-k articles using NBC, automating the retrieval of the articles used in human judgement.
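The article-level alignment can be pictured with the minimal sketch below, which groups token-level predictions by article and collects the set of concept types found in each one. It assumes BIO-style labels (`B-X`, `I-X`, `O`) and predictions supplied as `(article_id, tokens, labels)` triples. The function names, the input layout, and the example concept names are hypothetical and are not taken from the repository.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def spans_from_bio(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Return (span_text, concept_type) pairs decoded from BIO labels.

    An I- label that does not continue an open span of the same type is
    treated like O (a simplifying assumption of this sketch).
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type = ""
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], label[2:]
        elif label.startswith("I-") and current_tokens and label[2:] == current_type:
            current_tokens.append(token)
        else:
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], ""
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


def concepts_by_article(
    predictions: Iterable[Tuple[str, List[str], List[str]]]
) -> Dict[str, Set[str]]:
    """Map each article id to the set of concept types predicted in its sentences."""
    result: Dict[str, Set[str]] = defaultdict(set)
    for article_id, tokens, labels in predictions:
        concepts = result[article_id]  # registers articles with no entities too
        for _text, concept in spans_from_bio(tokens, labels):
            concepts.add(concept)
    return dict(result)
```

With one concept set per article, whole-model performance can be scored by comparing each predicted set against the gold concepts for that article.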

See TextClassification for a very simple example showing the performance of sentence-level classification.
