NEW: annotating real spectra vs calculated one #314
This module currently contains the conversion of a flat (annotated) library to a dense library. In the meantime, we have implemented a cleaner, more optimized version in alphabase.
See here: create_dense_matrices
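To make the flat -> dense idea concrete, here is a minimal sketch using plain Python lists. It is not the alphabase create_dense_matrices implementation: the record keys (precursor_idx, position, column, intensity), the function name flat_to_dense and the n_positions argument are all hypothetical, and the sketch only illustrates scattering one-row-per-fragment records into one matrix per precursor.

```python
# Illustrative only: hypothetical names, not the alphabase API.

def flat_to_dense(flat_rows, n_positions, columns):
    """Scatter flat fragment records into one dense matrix per precursor.

    flat_rows   : list of dicts, one per annotated fragment
    n_positions : dict mapping precursor_idx -> number of fragment positions
    columns     : ordered fragment column names, e.g. ["b_z1", "y_z1"]
    """
    col_index = {name: i for i, name in enumerate(columns)}
    # Start from all-zero matrices so unmatched fragments stay 0.0.
    dense = {
        idx: [[0.0] * len(columns) for _ in range(n)]
        for idx, n in n_positions.items()
    }
    for row in flat_rows:
        matrix = dense[row["precursor_idx"]]
        matrix[row["position"]][col_index[row["column"]]] = row["intensity"]
    return dense


flat_rows = [
    {"precursor_idx": 0, "position": 0, "column": "b_z1", "intensity": 0.8},
    {"precursor_idx": 0, "position": 1, "column": "y_z1", "intensity": 1.0},
    {"precursor_idx": 1, "position": 0, "column": "y_z1", "intensity": 0.5},
]
dense = flat_to_dense(flat_rows, n_positions={0: 2, 1: 1}, columns=["b_z1", "y_z1"])
```

Each precursor ends up with a positions x columns matrix, which is the shape a dense library expects; the optimized version in alphabase should be preferred over anything like this loop.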
# Loss type mappings - these map loss type numeric codes to their string representations
# Used in annotate.py and flat.py

LOSS_NUMBER_TO_TYPE = {0: "", 17: "_NH3", 18: "_H2O", 98: "_modloss"}
This is now covered in https://github.com/MannLabs/alphabase/blob/main/alphabase/peptide/fragment.py
How much redundancy is there between these two modules?
I'll try to minimize redundancy, thank you for pointing it out.
----------
speclib_flat : SpecLibFlat
A spectral library flat object containing precursor information.
raw_data : MzMLReader | ThermoRawData
I think the most accurate thing would be to use a MsDataBase from alpharaw as the annotation type here.
Can't use alpharaw, otherwise we get circular dependencies.
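One way to keep a precise annotation without importing alpharaw at runtime is a structural type. The sketch below uses typing.Protocol. The protocol name RawDataLike and the assumption that a reader exposes spectrum_df and peak_df attributes are illustrative and not confirmed anywhere in this PR.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class RawDataLike(Protocol):
    """Hypothetical structural type for raw-data readers."""

    spectrum_df: Any  # assumed: one row per spectrum
    peak_df: Any  # assumed: one row per peak


class FakeReader:
    """Stand-in reader used only for this example."""

    def __init__(self) -> None:
        self.spectrum_df = [{"spec_idx": 0}, {"spec_idx": 1}]
        self.peak_df = [{"mz": 500.3, "intensity": 10.0}]


def count_spectra(raw_data: RawDataLike) -> int:
    """Accept any object that satisfies the protocol, whatever its class."""
    return len(raw_data.spectrum_df)


n_spectra = count_spectra(FakeReader())
```

Because the check is structural, any reader with the right attributes satisfies the annotation, and this package never has to import the reader's package.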
)


def _get_dense_column(  # noqa: PLR0913
Part of flat -> dense
return "_".join(items)


def _add_frag_column_annotation(
Part of flat -> dense
)


def _assign_to_dense(
Part of flat -> dense
It's a bit of integration work, but then we should be good :)
)
from alphabase.spectral_library.flat import SpecLibFlat

UNANNOTATED_TYPE = 255
Can you move this to alphabase/peptide/fragment.py?
return outlib_flat


def calculate_pif(spectrum_peak_df: pd.DataFrame) -> float:
What do you think about moving them into a metrics.py module? Similar for gini etc.
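As an illustration of what a small, self-contained function in such a metrics.py could look like, here is a sketch of the standard Gini coefficient over non-negative values. Whether the PR's _sequence_gini_metric uses exactly this formula is an assumption, and the function name gini_coefficient is hypothetical.

```python
def gini_coefficient(values):
    """Standard Gini coefficient of non-negative values.

    0.0 means perfectly even; values close to 1.0 mean highly concentrated.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # convention chosen for this sketch
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i starting at 1
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


even = gini_coefficient([1.0, 1.0, 1.0, 1.0])
peaked = gini_coefficient([0.0, 0.0, 0.0, 1.0])
```

Keeping the pure computation separate from the code that reads and writes library columns makes it easy to test on its own.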
)


def _sequence_coverage_metric(flatlib: SpecLibFlat) -> None:
move to metrics.py
)


def _sequence_gini_metric(flatlib: SpecLibFlat) -> None:
move to metrics.py
)


def _normalized_count_metric(flatlib: SpecLibFlat) -> None:
move to metrics.py
return np.median(valid_values)


def _mass_accuracy_metric(flatlib: SpecLibFlat) -> None:
move to metrics.py
mass_error_ppm=mass_error_ppm,
)

matched_precursor_df.loc[i, "pif"] = calculate_pif(spectrum_peak_df)
Sorry that I'm asking you to fix my mistakes here :D
Could you change the calculate_pif() function so it's structured and applied in the same way as the other metrics, i.e. called after annotation and operating on the precursor df?
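A rough sketch of that pattern, under stated assumptions: FlatLibStandIn is a hypothetical stand-in for SpecLibFlat that uses lists of dicts instead of DataFrames, and the two metrics are placeholders rather than the PIF calculation itself. The point is only that every metric takes the library, returns None, and writes its result into the precursor table after annotation.

```python
from dataclasses import dataclass, field


@dataclass
class FlatLibStandIn:
    """Hypothetical stand-in for SpecLibFlat using lists of dicts, not DataFrames."""

    precursor_rows: list = field(default_factory=list)
    fragment_rows: list = field(default_factory=list)


def _fragment_count_metric(lib: FlatLibStandIn) -> None:
    """Write the number of annotated fragments into each precursor row."""
    counts = {}
    for frag in lib.fragment_rows:
        idx = frag["precursor_idx"]
        counts[idx] = counts.get(idx, 0) + 1
    for i, prec in enumerate(lib.precursor_rows):
        prec["n_fragments"] = counts.get(i, 0)


def _summed_intensity_metric(lib: FlatLibStandIn) -> None:
    """Write the summed annotated fragment intensity into each precursor row."""
    sums = {}
    for frag in lib.fragment_rows:
        idx = frag["precursor_idx"]
        sums[idx] = sums.get(idx, 0.0) + frag["intensity"]
    for i, prec in enumerate(lib.precursor_rows):
        prec["summed_intensity"] = sums.get(i, 0.0)


# Every metric shares one signature, so they can be applied in a loop.
METRICS = [_fragment_count_metric, _summed_intensity_metric]


def add_metrics(lib: FlatLibStandIn) -> None:
    """Run all metrics after annotation; each updates the precursor rows in place."""
    for metric in METRICS:
        metric(lib)


lib = FlatLibStandIn(
    precursor_rows=[{"sequence": "PEPTIDE"}, {"sequence": "PROTEIN"}],
    fragment_rows=[
        {"precursor_idx": 0, "intensity": 0.5},
        {"precursor_idx": 0, "intensity": 0.25},
    ],
)
add_metrics(lib)
```

With a shared signature, a PIF metric would just be one more entry in the list instead of a special case inside the annotation loop.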
)


def add_dense_lib(
What do you think about adding this function directly to the speclib flat object, so we can call SpecLibFlat.add_dense_lib()?
I think this fits the ab pattern better.
@@ -261,16 +262,16 @@ def get_full_charged_types(self, frag_df: pd.DataFrame) -> list:
# Now if we have a fragment type that is a,b,c we should have the corresponding x,y,z

corresponding = {"a": "x", "b": "y", "c": "z", "x": "a", "y": "b", "z": "c"}
loss_number_to_type = {0: "", 18: "_H2O", 17: "_NH3", 98: "_modloss"}
get_full_charged_types() is deprecated, so there is no need to make any changes here.
# Loss type mappings - these map loss type numeric codes to their string representations
# Used in annotate.py and flat.py

LOSS_NUMBER_TO_TYPE = {0: "", 17: "_NH3", 18: "_H2O", 98: "_modloss"}
I think this can be removed, as it's not needed by the new AB-based implementation :)
@GeorgWa added functionality to compare experimental spectra with calculated ones, so all credit to him.
Taken from https://github.com/MannLabs/metaptcm/blob/cbf579b52041d96636dd04c29d4855e5391bd136/metaptcm/utils/annotation.py, so @mschwoer has already reviewed this.
Feel free to suggest a different placement within the package. I also added some noqas.