Problem
I have encountered a problem with bwameth that pops up when the FASTQ comment contains a read group. In this case, bwameth only outputs the SAM header without any reads.
Details
The stdout/stderr output is here:
running: /home/oender/anaconda3/envs/population-epigenetics/bin/python /home/oender/anaconda3/envs/population-epigenetics/lib/python3.6/site-packages/bwameth-0.2.2-py2.7.egg-info/scripts/bwameth.py c2t data/test/test-line_A-R1.classified.qc.fastq data/test/test-line_A-R2.classified.qc.fastq |bwa mem -T 40 -B 2 -L 10 -CM -U 100 -p -R '@RG\tID:test-line_A-R.classified.qc\tSM:test-line_A-R.classified.qc' -t 4 ../data/arabidopsis_thaliana/genome_assembly/TAIR10.fasta.bwameth.c2t -
converting reads in data/test/test-line_A-R1.classified.qc.fastq,data/test/test-line_A-R2.classified.qc.fastq
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 320080 sequences (40000212 bp)...
[M::process] 0 single-end sequences; 320080 paired-end sequences
WARNING: 1709 reads with length < 80
: this program is designed for long reads
[M::process] read 121626 sequences (15199052 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 97487, 4, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (169, 215, 277)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 493)
[M::mem_pestat] mean and std.dev: (227.57, 79.20)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 601)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 320080 reads in 245.362 CPU sec, 61.338 real sec
['NM:i:21', 'MD:Z:40^GGAATTGTTGATTTGGATTT80G5', 'MC:Z:126M', 'AS:i:97', 'XS:i:83', 'RG:Z:test-line_A-R.classified.qc', 'XA:Z:f3,+14193782,40S86M,1;f3,+14204191,40S86M,1;', 'RG:Z:CB0L6ANXX:1:ATTCCT YS:Z:TTTGGATTTGGAATTGTTGAGAAAAGTTTATCGGGTTTGAGGAATTGTTGAGAAAAGTTTATTGGGTTTGAGGATTTGTTGATTAGGAGTGGAAATTGTTGAGAAAAATTTATTGGGTTTTAGGAA', 'YC:Z:CT']
700523F:121:CB0L6ANXX:1:1103:2712:2482
Traceback (most recent call last):
  File "/home/oender/anaconda3/envs/population-epigenetics/bin/bwameth.py", line 4, in <module>
    __import__('pkg_resources').run_script('bwameth==0.2.2', 'bwameth.py')
  File "/home/oender/anaconda3/envs/population-epigenetics/lib/python3.6/site-packages/pkg_resources/__init__.py", line 664, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/oender/anaconda3/envs/population-epigenetics/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1444, in run_script
    exec(code, namespace, namespace)
  File "/home/oender/anaconda3/envs/population-epigenetics/lib/python3.6/site-packages/bwameth-0.2.2-py2.7.egg-info/scripts/bwameth.py", line 509, in <module>
    main(sys.argv[1:])
  File "/home/oender/anaconda3/envs/population-epigenetics/lib/python3.6/site-packages/bwameth-0.2.2-py2.7.egg-info/scripts/bwameth.py", line 506, in main
    set_as_failed=args.set_as_failed)
  File "/home/oender/anaconda3/envs/population-epigenetics/lib/python3.6/site-packages/bwameth-0.2.2-py2.7.egg-info/scripts/bwameth.py", line 331, in bwa_mem
    as_bam(cmd, fa, set_as_failed)
  File "/home/oender/anaconda3/envs/population-epigenetics/lib/python3.6/site-packages/bwameth-0.2.2-py2.7.egg-info/scripts/bwameth.py", line 353, in as_bam
    for aln in handle_reads(pair_list, set_as_failed):
  File "/home/oender/anaconda3/envs/population-epigenetics/lib/python3.6/site-packages/bwameth-0.2.2-py2.7.egg-info/scripts/bwameth.py", line 376, in handle_reads
    orig_seq = aln.original_seq
  File "/home/oender/anaconda3/envs/population-epigenetics/lib/python3.6/site-packages/bwameth-0.2.2-py2.7.egg-info/scripts/bwameth.py", line 284, in original_seq
    return next(x for x in self.other if x.startswith("YS:Z:"))[5:]
StopIteration
[M::process] 0 single-end sequences; 121626 paired-end sequences
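As you can see, RG:Z:CB0L6ANXX:1:ATTCCT is the RG that was part of the FASTQ input.
I think it is a bug that bwameth adds RG:Z:test-line_A-R.classified.qc although I did not supply any read group parameter and actually want to pass through the RGs in the FASTQs. Indeed, when I run the command
bwameth.py c2t data/test/test-line_A-R1.classified.qc.fastq data/test/test-line_A-R2.classified.qc.fastq |bwa mem -T 40 -B 2 -L 10 -CM -U 100 -p -t 4 ../data/arabidopsis_thaliana/genome_assembly/TAIR10.fasta.bwameth.c2t -
(i.e., explicitly removing -R '...') everything works, although the SAM has to be converted back.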
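The StopIteration in the traceback can be reproduced in isolation. In the tag list printed just before the traceback, the YS:Z: value does not appear as its own element; it sits inside the same string as RG:Z:CB0L6ANXX:1:ATTCCT, separated by a space, so a lookup that only checks how each element starts never finds it. The sketch below uses a shortened copy of that tag list (the YS:Z value is truncated for readability). `original_seq_strict` mirrors the expression shown in the traceback; `original_seq_tolerant` is a hypothetical alternative, not part of bwameth, that also splits each element on whitespace.

```python
# Tag list modelled on the debug output above; the YS:Z value is shortened.
tags = [
    "NM:i:21",
    "RG:Z:test-line_A-R.classified.qc",
    "RG:Z:CB0L6ANXX:1:ATTCCT YS:Z:TTTGGATTTGG",  # RG and YS fused in one element
    "YC:Z:CT",
]


def original_seq_strict(other):
    # Same lookup as the line shown in the traceback: only matches
    # elements that begin with "YS:Z:".
    return next(x for x in other if x.startswith("YS:Z:"))[5:]


def original_seq_tolerant(other):
    # Hypothetical alternative: also look at whitespace-separated parts
    # of each element, so a fused "RG:Z:... YS:Z:..." string still works.
    for field in other:
        for part in field.split():
            if part.startswith("YS:Z:"):
                return part[5:]
    raise ValueError("no YS:Z tag found")


try:
    original_seq_strict(tags)
    strict_failed = False
except StopIteration:
    strict_failed = True

recovered = original_seq_tolerant(tags)
print(strict_failed, recovered)
```

This only illustrates why the lookup fails on such input; whether splitting on whitespace is the right fix inside bwameth is a separate question, since the duplicate RG:Z: entries would still be present in the record.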
Suggestion
As I see it, the problem arises because of the way in which the read group argument is handled. Probably, you can leave the function bwa_mem as it is but change how it is called. It is not quite clear but I guess in the call of bwa_mem,
rg=args.read_group or rname(*args.fastqs)
causes the trouble if I do not supply a read group parameter on the command line. Or you have to disentangle the addition of RG to the header from RGs for individual reads.
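To make the suggestion concrete, here is a minimal sketch of how the fallback could be made optional. Everything in it is an assumption for illustration: `default_rg_name` is a simplified stand-in for bwameth's `rname` (it keeps the characters on which the two file names agree, which matches the read group seen in the output above), and `choose_read_group`, `use_default`, and `bwa_rg_args` are hypothetical names that do not exist in bwameth.

```python
def default_rg_name(fq1, fq2):
    """Simplified stand-in for rname(): keep characters shared by both names."""
    return "".join(a for a, b in zip(fq1, fq2) if a == b)


def choose_read_group(read_group, fastqs, use_default=True):
    """Return the read group to pass to bwa mem, or None to pass none.

    With `read_group or default`, a missing command-line value always
    falls back to the derived name. The hypothetical `use_default`
    switch lets the caller opt out of that fallback.
    """
    if read_group:
        return read_group
    if use_default:
        return default_rg_name(*fastqs)
    return None


def bwa_rg_args(rg):
    """Build the -R arguments only when a read group is actually set."""
    if rg is None:
        return []
    # bwa mem expects the literal two-character sequence backslash-t here.
    return ["-R", "@RG\\tID:{0}\\tSM:{0}".format(rg)]


fastqs = ("test-line_A-R1.classified.qc", "test-line_A-R2.classified.qc")
print(choose_read_group(None, fastqs))                     # derived default
print(bwa_rg_args(choose_read_group(None, fastqs, False)))  # no -R at all
```

The sketch does not cover the SAM header: if -R is omitted and the per-read RG tags come from the FASTQ comments, the matching @RG header lines would still have to be added some other way.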
Thanks for the careful description. I don't intend to fix this, but I will accept a PR that does. I think that would also require manually setting the addition of the RG to the header.