Dear X‑Wing team,
I’m encountering an issue when running PANTHER that might be worth documenting or patching.
When I try to run PANTHER using my target dataset, I run into two conflicting requirements for the .bim file:
munge_ref() expects the BIM file to have a header so that it can read column names for SNP alignment. This matches the files downloaded from your GitHub page. However, munge_bim() expects a headerless PLINK‑style BIM file (CHR, SNP, CM, BP, A1, A2); otherwise it throws:
"ValueError: invalid literal for int() with base 10: 'CHR'"
To work around this, I had to create two versions of the same file:
snpinfo_mult_1kg_hm3 → original, headered file (used by munge_ref)
snpinfo_mult_1kg_hm3.bim → headerless copy (used by munge_bim)
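For reference, here is a minimal sketch of how the headerless copy can be produced. It assumes the headered file is whitespace-delimited with CHR, SNP, BP, A1 and A2 columns (as in the example further down) and fills the missing CM column with 0. The function name write_headerless_bim is my own and is not part of PANTHER.

```python
def write_headerless_bim(headered_path, bim_path):
    """Write a six-column, headerless PLINK-style .bim from a headered SNP file.

    Assumption: the input has a header naming at least CHR, SNP, BP, A1, A2.
    Returns the number of SNP rows written.
    """
    n_written = 0
    with open(headered_path) as src, open(bim_path, "w") as dst:
        header = src.readline().split()
        col = {name: i for i, name in enumerate(header)}
        for line in src:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            # The headered file has no genetic-distance column, so CM is set to 0.
            row = [
                fields[col["CHR"]],
                fields[col["SNP"]],
                "0",
                fields[col["BP"]],
                fields[col["A1"]],
                fields[col["A2"]],
            ]
            dst.write("\t".join(row) + "\n")
            n_written += 1
    return n_written
```

Usage would be something like write_headerless_bim("snpinfo_mult_1kg_hm3", "snpinfo_mult_1kg_hm3.bim").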
This allowed the program to proceed, but at the alignment step (munge_sumstats / align_ldblk) PANTHER reports 0 overlapping SNPs, so no SNPs survive to MCMC. This happens even though I verified with Python/pandas that 1,177,049 SNPs (~1.17 million) are shared across (1) the reference panel (snpinfo_mult_1kg_hm3), (2) the target BIM file, (3) the GWAS summary stats, and (4) the LOGODetect results.
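The overlap check can be sketched roughly as follows. This is an independent check on rsIDs only, not PANTHER's own alignment logic; the helper names read_snp_ids and count_overlap are hypothetical, and the sketch assumes whitespace-delimited files whose header (when present) contains a SNP column.

```python
def read_snp_ids(path, has_header=True, snp_col=1):
    """Collect the SNP IDs from a whitespace-delimited file.

    With a header, the SNP column is located by name; without one,
    snp_col gives its zero-based position (1 for a PLINK .bim).
    """
    ids = set()
    with open(path) as fh:
        if has_header:
            snp_col = fh.readline().split().index("SNP")
        for line in fh:
            fields = line.split()
            if fields:
                ids.add(fields[snp_col])
    return ids


def count_overlap(*id_sets):
    """Number of SNP IDs present in every one of the given sets."""
    return len(set.intersection(*id_sets))
```

For example, count_overlap(read_snp_ids("snpinfo_mult_1kg_hm3"), read_snp_ids("snpinfo_mult_1kg_hm3.bim", has_header=False), read_snp_ids("GSCANEur_Linux.txt")). Note that this compares IDs only, so it would not reveal mismatches in alleles or chromosome coding.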
Example of my files:
Headered BIM (snpinfo_mult_1kg_hm3):
CHR SNP BP A1 A2
1 rs28527770 751756 C T
1 rs3094315 752566 A G
Headerless .bim for PLINK/PANTHER:
1 rs28527770 0 751756 C T
1 rs3094315 0 752566 A G
GWAS summary stats (GSCANEur_Linux.txt):
CHR SNP BP A1 A2 BETA P
1 rs28527770 751756 C T 0.0403 0.9678
1 rs3094315 752566 A G -0.0410 0.9673
A reproducible example is below. I can send the GSCAN files I used; otherwise, any set of similarly formatted SNPs should be sufficient:
Use the headered snpinfo_mult_1kg_hm3 together with its headerless .bim copy.
Run:
python PANTHER.py \
--ref_dir PANTHER_1kg_ref \
--bim_prefix PANTHER_1kg_ref/snpinfo_mult_1kg_hm3 \
--sumstats GSCANEur_Linux.txt,GSCANAfr_Linux.txt \
--n_gwas 724269,158284 \
--anno_file LOGODetect_Test/annot_EUR.txt,LOGODetect_Test/annot_AFR.txt \
--chrom 1 \
--pop EUR,AFR \
--target_pop AFR \
--pst_pop AFR \
--out_name SmkInit \
--seed 3 \
--out_dir PANTHER/post
Output: 0 overlapping SNPs
What do you think?
Is there any way to update munge_bim() so that it detects and skips a header automatically?
Is there any way to let users pass a single headered .bim file to simplify the workflow?
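To illustrate the kind of header detection I have in mind, here is a rough sketch. It is not based on PANTHER's actual munge_bim() code; read_bim_any is a hypothetical helper, and it assumes numeric chromosome codes, as the int() error above suggests.

```python
PLINK_BIM_COLUMNS = ["CHR", "SNP", "CM", "BP", "A1", "A2"]


def read_bim_any(path):
    """Read a headered or headerless BIM-like file into a list of dicts.

    Assumption: chromosome codes are numeric, so a first field that is
    not all digits (e.g. 'CHR') means the first line is a header.
    """
    with open(path) as fh:
        lines = [ln.split() for ln in fh if ln.strip()]
    if not lines:
        return []
    first = lines[0]
    if first[0].isdigit():
        # No header: fall back to the standard six-column PLINK order.
        columns, rows = PLINK_BIM_COLUMNS, lines
    else:
        # Header present: use its names and skip it when reading rows.
        columns, rows = first, lines[1:]
    return [dict(zip(columns, row)) for row in rows]
```

With something along these lines, both versions of my file would yield the same SNP, BP, A1 and A2 values, and the headered one would simply lack CM.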
Thank you for maintaining this tool!
I’m happy to test any patch that resolves the dual‑BIM requirement and the 0‑overlap filtering.
I can also create a small ZIP with the headered BIM + sumstats + GSCAN files if that would help reproduce the behavior.
Alexander