Ranchero

Is your mycobacterial metadata a mess? Grab the M. bovis by the horns with Ranchero.

Ranchero is a Python solution to the dozens of different metadata formats used in genomic datasets. While it is specifically focused on NCBI's collection of Mycobacterium tuberculosis complex metadata, it still has utility for other organisms. For information on what Ranchero considers "a sample" and the like, see ./docs/data_structure.md. For information on how to configure Ranchero, see ./docs/configuration.md.

Features

Input a TSV/JSON/CSV of new samples and their metadata into a dataframe
Merge columns of similar data types into a single column, filling in nulls/empty values as you go
Input a TSV of metadata to "inject" into an existing dataframe, optionally overriding metadata already present
Flatten all of those "missing" and "Not Applicable" strings into proper null values
Convert countries into three-letter country codes per ISO 3166
Convert dates into an ISO 8601-like YYYY-MM-DD format; missing months/days are denoted as NN
Convert common host animal names to a standardized Genus species "common name" format
(tuberculosis only) Convert old-school strain names to the modern lineage system
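
To make a few of the features above concrete, here is a minimal standard-library sketch of null flattening, column merging, and partial-date normalization. It is not Ranchero's API: the function names (flatten_null, coalesce, normalize_date) and the NULL_STRINGS list are hypothetical, chosen only for illustration, and Ranchero itself works on dataframes rather than single values.

```python
import re

# Hypothetical placeholder strings; Ranchero's actual list may differ.
NULL_STRINGS = {"", "missing", "not applicable", "n/a", "na", "unknown"}


def flatten_null(value):
    """Return None for placeholder strings, otherwise the stripped text."""
    if value is None:
        return None
    text = str(value).strip()
    return None if text.lower() in NULL_STRINGS else text


def coalesce(*values):
    """Merge similar columns: return the first value that is not null-like."""
    for value in values:
        cleaned = flatten_null(value)
        if cleaned is not None:
            return cleaned
    return None


def normalize_date(value):
    """Turn YYYY, YYYY-MM, or YYYY-MM-DD into YYYY-MM-DD, with NN for missing parts."""
    text = flatten_null(value)
    if text is None:
        return None
    match = re.fullmatch(r"(\d{4})(?:-(\d{1,2}))?(?:-(\d{1,2}))?", text)
    if match is None:
        # Unrecognized format; this sketch gives up rather than guessing.
        return None
    year, month, day = match.groups()
    month = month.zfill(2) if month else "NN"
    day = day.zfill(2) if day else "NN"
    return f"{year}-{month}-{day}"
```

Under these assumptions, normalize_date("2019-5") gives "2019-05-NN", and coalesce("missing", None, "Peru") gives "Peru".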

Dependencies

Python 3.11-ish (3.7+ should be okay)
pandas >= 2.0.0
pyarrow, even if not working with Apache Arrow datasets
polars for Python ==1.16.0
- Please check the minimum version; this code expects the behavior of pola-rs/polars#20069
tqdm

Supported inputs

JSON files directly from BigQuery
CSV files directly from NCBI Run Selector
Any arbitrary TSV file, provided it has a "BioSample" or "run_accession" column
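
Before handing an arbitrary TSV to Ranchero, it may help to confirm that it carries one of the two columns named above. The following is a small standard-library sketch of such a check; find_index_column is a hypothetical helper, not a Ranchero function.

```python
import csv
import io

# Column names taken from the list above; at least one must be present.
INDEX_COLUMNS = ("BioSample", "run_accession")


def find_index_column(tsv_text):
    """Return the first index column found in the TSV header, or None."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, [])  # an empty file yields an empty header
    for name in INDEX_COLUMNS:
        if name in header:
            return name
    return None
```

To check a file on disk, read it first, for example with open(path, encoding="utf-8").read(), and pass the text in. A return value of None means the file lacks both columns.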

Unsupported inputs

Excel (but Excel supports output to TSV)
XML from NCBI "full summary" file download
JSON files not directly from BigQuery
CSV files not directly from NCBI Run Selector

