This solver needs samples of at least 2 classes in the data #12

@skudashev

My issue is very similar to #9 (comment)

Parsing BAM file: chr22_alignments.sorted.bam
Identified 182998 introns
Annotated introns file /ei/projects/8/8289c66d-2d56-4706-a307-5a9a3eb3747e/data/Annotations/gencode.v44.annotated_juncs.bed provided
Identified 402454 annotated introns
debug: Tree structure:
debug: |--- jad <= 71.50
debug: |   |--- class: 0
debug: |--- jad >  71.50
debug: |   |--- is_canonical_motif <= 0.50
debug: |   |   |--- class: 0
debug: |   |--- is_canonical_motif >  0.50
debug: |   |   |--- class: 0
debug: Decision tree 1 confusion matrix:
debug: [[177013      0]
debug:  [  5985      0]]
Fetching junction sequences from /ei/projects/3/31655266-640a-41d2-8663-59bba38bc3c4/data/data/References/hg38_sequin.fa
Identified 132451 unique donors and 127498 unique acceptors
Scoring donor sequences with LR...
pgrep: /nbi/software/production/bin/core/../..//hpccore/5/x86_64/lib/liblzma.so.5: no version information available (required by /lib64/libsystemd.so.0)
pgrep: /nbi/software/production/bin/core/../..//hpccore/5/x86_64/lib/liblzma.so.5: no version information available (required by /lib64/libsystemd.so.0)
pgrep: /nbi/software/production/bin/core/../..//hpccore/5/x86_64/lib/liblzma.so.5: no version information available (required by /lib64/libsystemd.so.0)
pgrep: /nbi/software/production/bin/core/../..//hpccore/5/x86_64/lib/liblzma.so.5: no version information available (required by /lib64/libsystemd.so.0)
pgrep: /nbi/software/production/bin/core/../..//hpccore/5/x86_64/lib/liblzma.so.5: no version information available (required by /lib64/libsystemd.so.0)
pgrep: /nbi/software/production/bin/core/../..//hpccore/5/x86_64/lib/liblzma.so.5: no version information available (required by /lib64/libsystemd.so.0)
joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/ei/software/testing/python_miniconda/4.10.3_py3.9_sk/x86_64/envs/2passtools/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py", line 436, in _process_worker
    r = call_item()
  File "/ei/software/testing/python_miniconda/4.10.3_py3.9_sk/x86_64/envs/2passtools/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py", line 288, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/ei/software/testing/python_miniconda/4.10.3_py3.9_sk/x86_64/envs/2passtools/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "/ei/software/testing/python_miniconda/4.10.3_py3.9_sk/x86_64/envs/2passtools/lib/python3.6/site-packages/joblib/parallel.py", line 263, in __call__
    for func, args, kwargs in self.items]
  File "/ei/software/testing/python_miniconda/4.10.3_py3.9_sk/x86_64/envs/2passtools/lib/python3.6/site-packages/joblib/parallel.py", line 263, in <listcomp>
    for func, args, kwargs in self.items]
  File "/ei/software/testing/python_miniconda/4.10.3_py3.9_sk/x86_64/envs/2passtools/lib/python3.6/site-packages/lib2pass/seqlr.py", line 39, in train_and_predict
    lr.fit(X_train, y_train)
  File "/ei/software/testing/python_miniconda/4.10.3_py3.9_sk/x86_64/envs/2passtools/lib/python3.6/site-packages/sklearn/linear_model/_logistic.py", line 1376, in fit
    " class: %r" % classes_[0])
ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 0
"""

I followed the instructions there and then ran 2passtools with DEBUG logging on (-v DEBUG):

paftools.js gff2bed -j gencode.v44.annotation.gtf > gencode.v44.annotated_juncs.bed 
2passtools score -v DEBUG -f /ei/projects/3/31655266-640a-41d2-8663-59bba38bc3c4/data/data/References/hg38_sequin.fa -p 24 \
    -a /ei/projects/8/8289c66d-2d56-4706-a307-5a9a3eb3747e/data/Annotations/gencode.v44.annotated_juncs.bed --classifier-type decision_tree \
    -m "GTAG|GCAG|ATAG" -j 4 --keep-all-annot -o iPSC.merged.juncs.all.bed $subset_bam 
head -n 5  /ei/projects/8/8289c66d-2d56-4706-a307-5a9a3eb3747e/data/Annotations/gencode.v44.annotated_juncs.bed
chr1	12227	12612	ENST00000456328.2|lncRNA|DDX11L2	1000	+
chr1	12721	13220	ENST00000456328.2|lncRNA|DDX11L2	1000	+
chr1	12057	12178	ENST00000450305.2|transcribed_unprocessed_pseudogene|DDX11L1	1000	+
chr1	12227	12612	ENST00000450305.2|transcribed_unprocessed_pseudogene|DDX11L1	1000	+
chr1	12697	12974	ENST00000450305.2|transcribed_unprocessed_pseudogene|DDX11L1	1000	+

Could this have something to do with my canonical motifs? Also, my JAD threshold is set to 4 (-j 4), but the tree structure says jad <= 71.50. Is this correct?
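
For what it's worth, here is a minimal sketch of how I am reading the confusion matrix from the debug output. It assumes the matrix follows the scikit-learn layout (rows are true labels, columns are predicted labels), which I have not confirmed for 2passtools. The helper name parse_confusion_matrix is just something I made up for this check. If that layout is right, the decision tree never predicts class 1, and my guess (again unconfirmed) is that this is why the LR step later sees only class 0.

```python
# Confusion matrix lines copied verbatim from the debug log above.
log_lines = [
    "debug: [[177013      0]",
    "debug:  [  5985      0]]",
]


def parse_confusion_matrix(lines):
    """Parse a NumPy-style printed integer matrix from 'debug:' log lines.

    Hypothetical helper for this issue only; not part of 2passtools.
    """
    rows = []
    for line in lines:
        # Drop the 'debug:' prefix, then strip the brackets and split on whitespace.
        text = line.split("debug:", 1)[1]
        text = text.replace("[", " ").replace("]", " ")
        rows.append([int(token) for token in text.split()])
    return rows


matrix = parse_confusion_matrix(log_lines)

# Assumption: rows = true labels, columns = predicted labels.
# Summing each column gives the number of introns predicted as each class.
predicted_counts = [
    sum(row[col] for row in matrix) for col in range(len(matrix[0]))
]
predicted_classes = [cls for cls, n in enumerate(predicted_counts) if n > 0]

print("introns predicted per class:", predicted_counts)
print("classes present in predictions:", predicted_classes)
```

Under that reading, all 182998 introns (177013 + 5985) are predicted as class 0, including the 5985 whose true label is 1.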

Kind regards,
Sofia
