
Title: PANTHER requires two .bim file versions and results in 0 overlapping SNPs #8

@AlexHatoum


Dear X‑Wing team,

I’m encountering an issue when running PANTHER that might be worth documenting or patching.

When I try to run PANTHER using my target dataset, I run into two conflicting requirements for the .bim file:

munge_ref() expects the BIM file to have a header so that it can read column names for SNP alignment. This is consistent with the files downloaded from your GitHub page.

munge_bim() expects a headerless PLINK‑style BIM file (CHR, SNP, CM, BP, A1, A2); otherwise it throws:

"ValueError: invalid literal for int() with base 10: 'CHR'"

To work around this, I had to create two versions of the same file:

snpinfo_mult_1kg_hm3 → original, headered file (used by munge_ref)

snpinfo_mult_1kg_hm3.bim → headerless copy (used by munge_bim)
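In case it helps anyone hitting the same error, this is roughly how the headerless copy can be produced. It is only a sketch: `write_headerless_bim` is a hypothetical helper of mine, not part of PANTHER, and it assumes the headered file is whitespace-delimited with at least the columns CHR, SNP, BP, A1, A2. Because the headered file has no CM column, the sketch fills in 0, as in my headerless example.

```python
def write_headerless_bim(headered_path, bim_path):
    """Write a six-column, headerless PLINK-style .bim from a headered SNP file.

    Hypothetical helper, not part of PANTHER. Returns the number of SNPs written.
    """
    n_written = 0
    with open(headered_path) as src, open(bim_path, "w") as dst:
        header = src.readline().split()  # e.g. CHR SNP BP A1 A2
        for line in src:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            row = dict(zip(header, fields))
            # CM (genetic distance) is absent from the headered file; use 0.
            out = [row["CHR"], row["SNP"], row.get("CM", "0"),
                   row["BP"], row["A1"], row["A2"]]
            dst.write("\t".join(out) + "\n")
            n_written += 1
    return n_written
```

For my files the call would be `write_headerless_bim("snpinfo_mult_1kg_hm3", "snpinfo_mult_1kg_hm3.bim")`.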

This allowed the program to proceed, but at the alignment step (munge_sumstats / align_ldblk) PANTHER reports 0 overlapping SNPs and no SNPs survive to MCMC. Python/pandas confirms that 1,177,049 SNPs are shared across all four inputs: (1) the reference panel (snpinfo_mult_1kg_hm3), (2) the target BIM file, (3) the GWAS summary stats, and (4) the results from LOGODetect.
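The overlap check can be sketched as below. This is a simplified, standard-library version rather than the exact pandas commands I ran, and it only compares SNP IDs (it does not check alleles or positions). `read_snp_ids` and `shared_snps` are hypothetical helper names, and the sketch assumes each file is whitespace-delimited with either a header containing a SNP column or, for the headerless .bim, the SNP ID in the second column.

```python
def read_snp_ids(path, has_header=True, snp_col=1):
    """Return the set of SNP IDs found in a whitespace-delimited file.

    With a header, the column named 'SNP' is used; without one,
    the zero-based column index snp_col is used (1 for a PLINK .bim).
    """
    ids = set()
    with open(path) as fh:
        if has_header:
            header = fh.readline().split()
            snp_col = header.index("SNP")
        for line in fh:
            fields = line.split()
            if fields:
                ids.add(fields[snp_col])
    return ids


def shared_snps(*id_sets):
    """Return the SNP IDs present in every one of the given sets."""
    return set.intersection(*id_sets)
```

Calling `len(shared_snps(ref_ids, bim_ids, sumstats_ids))` on the sets read from each file gives the overlap count to compare against what PANTHER reports.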

Example of my files:

Headered BIM (snpinfo_mult_1kg_hm3):

CHR SNP BP A1 A2
1 rs28527770 751756 C T
1 rs3094315 752566 A G

Headerless .bim for PLINK/PANTHER:

1 rs28527770 0 751756 C T
1 rs3094315 0 752566 A G

GWAS summary stats (GSCANEur_Linux.txt):

CHR SNP BP A1 A2 BETA P
1 rs28527770 751756 C T 0.0403 0.9678
1 rs3094315 752566 A G -0.0410 0.9673

A reproducible example is below. I can send the GSCAN files I used; otherwise, any set of similarly formatted SNPs should be sufficient:

Use snpinfo_mult_1kg_hm3 with header + .bim headerless copy.

Run:

python PANTHER.py \
  --ref_dir PANTHER_1kg_ref \
  --bim_prefix PANTHER_1kg_ref/snpinfo_mult_1kg_hm3 \
  --sumstats GSCANEur_Linux.txt,GSCANAfr_Linux.txt \
  --n_gwas 724269,158284 \
  --anno_file LOGODetect_Test/annot_EUR.txt,LOGODetect_Test/annot_AFR.txt \
  --chrom 1 \
  --pop EUR,AFR \
  --target_pop AFR \
  --pst_pop AFR \
  --out_name SmkInit \
  --seed 3 \
  --out_dir PANTHER/post

Output: 0 overlapping SNPs

What do you think?
Is there any way to update munge_bim() to detect and skip a header automatically?

Is there any way to allow users to pass a single headered .bim file to simplify the workflow?
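To make the request concrete, here is a rough sketch of the kind of header detection I have in mind. It is not based on PANTHER's actual munge_bim() code, which I have not modified; `read_bim_any` and `PLINK_BIM_COLUMNS` are hypothetical names. The assumption is that a first line whose first field is not an integer is a header, in which case its column names are used; otherwise the standard six-column PLINK order is assumed.

```python
PLINK_BIM_COLUMNS = ["CHR", "SNP", "CM", "BP", "A1", "A2"]


def _is_int(token):
    """Return True if token parses as an integer."""
    try:
        int(token)
        return True
    except ValueError:
        return False


def read_bim_any(path):
    """Read a headered or headerless BIM-like file into a list of dicts.

    Hypothetical sketch: a non-integer first field on line one is treated
    as a header row and skipped, which avoids int('CHR') raising ValueError.
    """
    with open(path) as fh:
        lines = [ln.split() for ln in fh if ln.strip()]
    if not lines:
        return []
    if _is_int(lines[0][0]):
        columns, data = PLINK_BIM_COLUMNS, lines      # headerless PLINK .bim
    else:
        columns, data = lines[0], lines[1:]           # header row supplies names
    rows = []
    for fields in data:
        rec = dict(zip(columns, fields))
        rows.append({"CHR": int(rec["CHR"]), "SNP": rec["SNP"],
                     "BP": int(rec["BP"]), "A1": rec["A1"], "A2": rec["A2"]})
    return rows
```

With something like this, the headered snpinfo_mult_1kg_hm3 and its headerless .bim copy would parse to the same records, so a single file could serve both steps.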

Thank you for maintaining this tool!
I’m happy to test any patch that resolves the dual‑BIM requirement and the 0‑overlap filtering.

I can also create a small ZIP with the headered BIM + sumstats + GSCAN files if that would help reproduce the behavior.

Alexander
