ClinEpiDB was my main area of responsibility, and this repo handles most of its "ETL" (extract-transform-load) work:
- VEuPathDB/ClinEpiData repo
- my commits
- Several of my utility scripts can be found here
- I created this utility for myself to save time preprocessing raw datasets for loading: VEuPathDB/ClinEpiData/Load/bin/clu
Of more interest to genomics/bioinformatics: this package fetches sequences using the NCBI Entrez service (Entrez wrapper), and here is a utility that uses it: auditESTsVersusGenbank
A Nextflow example, with some Perl code embedded. It uses the NCBI EDirect tools to fetch and format sequences from the NCBI PopSet database: nextflow popset module
For a job application, I was asked to write “something new” as a Perl coding demo. This is the result, a script to convert SNP data to VCF format: https://github.com/jaycolin/demos/tree/main/veupathdb
Another job application offered a specific challenge assignment to create an ETL solution in Microsoft Azure: https://github.com/jaycolin/demos2