MicroTrawler

🎣 - go fishing for ancient antibiotic resistance in public databases

GAMEPLAN

Download database [nctc_trawler, cip_trawler] to a tsv
Extract best guess for culture year for each strain [helper_scripts/nctc,cip/extract_dates]
Use awk to filter strains to pre-antibiotic dates: awk '{if($2<=1940){print $0}}' $dates > ./ancient_strains
Download existing uploaded assemblies found in ENA/RefSeq for each strain [helper_scripts/generic/download_ena,download_refseq]
Search these FASTA files for antibiotic resistance (ABR) using RGI [helper_scripts/generic/look_for_ab_resistance]
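
The year filter in the awk step can also be written in Python. This is only a rough sketch, assuming a tab-separated dates file with the strain identifier in the first column and the best-guess culture year in the second; the function name `filter_ancient` and the sample rows are made up for illustration and are not part of the repo.

```python
import csv
import io


def filter_ancient(tsv_text, cutoff=1940):
    """Return rows whose second column is a year <= cutoff.

    Assumes (hypothetically) that column 1 is a strain ID and column 2 is
    the best-guess culture year. Rows with a missing or non-numeric year
    are skipped rather than guessed at.
    """
    kept = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 2:
            continue
        try:
            year = int(row[1])
        except ValueError:
            continue
        if year <= cutoff:
            kept.append(row)
    return kept


# Made-up example rows, for illustration only.
sample = "NCTC 1\t1920\nNCTC 2\t1955\nNCTC 3\tunknown\nNCTC 4\t1940\n"
ancient = filter_ancient(sample)
print(ancient)
```

Skipping rows without a usable year is a design choice in this sketch; strains with no date guess would need manual review instead.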

NCTC

To trawl the NCTC, clone and cd into the repo. Then,

chmod +x nctc_trawler
./nctc_trawler

There are 5 options:

-o/--output : output file to dump NCTC db to. default is ./YYYY-MM-DD_nctc_db.tsv
-s/--sleep : number of seconds to sleep before pinging NCTC again. default is 5
-m/--max : max NCTC accession number to search. default is the current max accession number in NCTC
-n/--min : min NCTC accession number to search. default is 1
-c/--clean : remove very sparse columns in final database. a column is 'very sparse' if it holds <= $clean values in total. default is 2
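
To make the flags and their defaults concrete, here is a minimal argparse sketch that mirrors the five options listed above. It is not taken from nctc_trawler itself; the argument types and the `None` placeholder for --max are assumptions, since the README only says the default is the current maximum accession number.

```python
import argparse
import datetime


def build_parser():
    """Build a parser mirroring the five documented options (a sketch only)."""
    today = datetime.date.today().isoformat()  # YYYY-MM-DD
    parser = argparse.ArgumentParser(prog="nctc_trawler")
    parser.add_argument("-o", "--output", default=f"./{today}_nctc_db.tsv")
    # Assumption: fractional seconds are allowed.
    parser.add_argument("-s", "--sleep", type=float, default=5)
    # The real default is the current maximum NCTC accession number, which is
    # not stated here, so None stands in for "look it up at run time".
    parser.add_argument("-m", "--max", type=int, default=None)
    parser.add_argument("-n", "--min", type=int, default=1)
    parser.add_argument("-c", "--clean", type=int, default=2)
    return parser


args = build_parser().parse_args(["--min", "100", "--max", "200", "-s", "10"])
print(args.min, args.max, args.sleep, args.clean, args.output)
```

The parsed arguments correspond to running ./nctc_trawler --min 100 --max 200 -s 10, which would search accession numbers 100 to 200 with a 10 second pause between requests.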

The script loops through every NCTC number from --min to --max and dumps all strain info it finds into --output. It requires pandas.
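
The loop and the --clean step could look roughly like the sketch below. It is an illustration under stated assumptions, not the script's actual code: `fetch` is a hypothetical callable standing in for the NCTC lookup, the range is assumed to include both --min and --max, and plain dicts are used instead of pandas so the example runs with the standard library alone.

```python
import time


def trawl(fetch, min_acc, max_acc, sleep_s=5):
    """Collect one record per accession number in [min_acc, max_acc].

    `fetch` is a hypothetical callable that takes an accession number and
    returns a dict of strain info, or None when nothing is found.
    """
    records = []
    for acc in range(min_acc, max_acc + 1):
        record = fetch(acc)
        if record is not None:
            records.append(record)
        if acc < max_acc:
            time.sleep(sleep_s)  # pause between requests
    return records


def drop_sparse_columns(records, clean=2):
    """Drop columns holding <= `clean` non-empty values across all records."""
    counts = {}
    for record in records:
        for key, value in record.items():
            if value not in (None, ""):
                counts[key] = counts.get(key, 0) + 1
    keep = [key for key, n in counts.items() if n > clean]
    return [{key: r.get(key, "") for key in keep} for r in records]


# Stand-in for a real lookup; the data is made up for illustration.
fake_db = {
    1: {"accession": "NCTC 1", "species": "A", "note": "x"},
    2: {"accession": "NCTC 2", "species": "B"},
    4: {"accession": "NCTC 4", "species": "C"},
}
rows = drop_sparse_columns(trawl(fake_db.get, 1, 4, sleep_s=0), clean=1)
print(rows)
```

Here accession 3 has no entry and is skipped, and the `note` column is dropped because it holds only one value, which is not more than `clean=1`.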

CIP

To trawl the CIP, clone and cd into the repo. Then,

chmod +x cip_trawler
./cip_trawler

There are 2 options:

-o/--output : output file to dump CIP db to. default is ./YYYY-MM-DD_cip_db.tsv
-s/--search : search term to use. default is 'CIP' (this grabs all CIP entries in database)

The CIP is very nifty in that it already has a button to download everything as an Excel spreadsheet. The script is still a programmatic option, so you don't have to deal with the website yourself. In theory, every search field can be accessed this way. Requires selenium, geckodriver, and pandas (the same pandas install as for the NCTC trawler).
