NEW: annotating real spectra vs calculated ones #314

Draft: wants to merge 3 commits into main

@boopthesnoot (Contributor) commented Apr 9, 2025

@GeorgWa added functionality to compare experimental spectra with calculated ones, so all credit to him.

Taken from https://github.com/MannLabs/metaptcm/blob/cbf579b52041d96636dd04c29d4855e5391bd136/metaptcm/utils/annotation.py, so @mschwoer has already reviewed it.

Feel free to suggest a different placement within the package. I also added some noqa comments.

@boopthesnoot boopthesnoot requested review from GeorgWa and mschwoer April 9, 2025 16:01
@boopthesnoot boopthesnoot marked this pull request as draft April 10, 2025 07:26
@GeorgWa (Collaborator) left a comment

This module currently contains the conversion of a flat (annotated) library to a dense library. In the meantime, we have implemented a more optimized and cleaner version in alphabase.

See here: create_dense_matrices
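For context on what the flat -> dense step does, here is a minimal sketch assuming a toy flat fragment table for a single precursor with hypothetical columns position, type, charge, loss_type and intensity. It is not alphabase's create_dense_matrices and makes no claim about its internals; it only shows the reshaping idea: one row per cleavage position, one column per fragment type/loss/charge combination, with unobserved cells set to 0.

```python
import pandas as pd

# Loss code -> column-name suffix, copied from the LOSS_NUMBER_TO_TYPE dict in this PR
LOSS_NUMBER_TO_TYPE = {0: "", 17: "_NH3", 18: "_H2O", 98: "_modloss"}


def flat_to_dense(flat_df: pd.DataFrame, n_positions: int) -> pd.DataFrame:
    """Pivot the flat fragment rows of ONE precursor into a dense matrix.

    Hypothetical input columns: position (0-based cleavage site), type
    ("b", "y", ...), charge (int), loss_type (int code) and intensity.
    """
    # Column label per flat row, e.g. "b_z1" or "y_NH3_z2"
    labels = (
        flat_df["type"]
        + flat_df["loss_type"].map(LOSS_NUMBER_TO_TYPE)
        + "_z"
        + flat_df["charge"].astype(str)
    )
    dense = (
        flat_df.assign(column=labels)
        .pivot_table(index="position", columns="column", values="intensity", aggfunc="sum")
        .reindex(range(n_positions))  # keep positions without any matched fragment
        .fillna(0.0)  # unobserved fragment types become 0 intensity
    )
    dense.columns.name = None
    return dense
```

The dense layout trades memory for fixed-shape, position-indexed access, which is presumably why the optimized alphabase version exists alongside the flat format.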

# Loss type mappings - these map loss type numeric codes to their string representations
# Used in annotate.py and flat.py

LOSS_NUMBER_TO_TYPE = {0: "", 17: "_NH3", 18: "_H2O", 98: "_modloss"}
Collaborator:

How much redundancy is there between these two modules?

Contributor (author):

I'll try to minimize the redundancy, thanks for pointing it out.
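One way to cut the redundancy between annotate.py and flat.py is to keep the loss mapping in a single shared place and derive everything else from it. The sketch below assumes fragment column names of the form type + loss suffix + "_z" + charge (e.g. y_NH3_z2); frag_column_name and parse_frag_column_name are hypothetical helpers, not existing alphabase functions.

```python
# Single source of truth, imported by both modules (shared location is an assumption)
LOSS_NUMBER_TO_TYPE = {0: "", 17: "_NH3", 18: "_H2O", 98: "_modloss"}
# Inverse lookup derived from the forward dict instead of being maintained by hand
LOSS_TYPE_TO_NUMBER = {suffix: code for code, suffix in LOSS_NUMBER_TO_TYPE.items()}


def frag_column_name(frag_type: str, loss_code: int, charge: int) -> str:
    """Hypothetical helper: ("y", 17, 2) -> "y_NH3_z2"."""
    return f"{frag_type}{LOSS_NUMBER_TO_TYPE[loss_code]}_z{charge}"


def parse_frag_column_name(name: str) -> tuple[str, int, int]:
    """Hypothetical inverse of frag_column_name: "y_NH3_z2" -> ("y", 17, 2)."""
    body, charge = name.rsplit("_z", 1)  # split off the trailing charge
    frag_type, _, loss = body.partition("_")  # loss is "" when there is no loss
    return frag_type, LOSS_TYPE_TO_NUMBER[f"_{loss}" if loss else ""], int(charge)
```

Because the inverse is computed from the forward dict, adding a new loss type only requires touching one line.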

----------
speclib_flat : SpecLibFlat
A spectral library flat object containing precursor information.
raw_data : MzMLReader | ThermoRawData
Collaborator:

I think the most accurate thing would be to use the MsDataBase class from alpharaw as the type annotation here.

Contributor (author):

I can't use alpharaw here, since alpharaw itself depends on alphabase and we would get a circular dependency.
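A common workaround for this kind of cycle is structural typing with typing.Protocol: alphabase describes the attributes it needs, and any reader object that has them (alpharaw's or another) type-checks without alphabase importing alpharaw. The attribute set below (spectrum_df and peak_df) is an assumption for illustration; a real protocol should list exactly what the annotation code accesses. RawDataLike, n_spectra and FakeRawData are hypothetical names.

```python
from typing import Protocol, runtime_checkable

import pandas as pd


@runtime_checkable
class RawDataLike(Protocol):
    """Structural type for raw MS data; the attribute set is an assumption.

    Any object with these attributes matches, so alphabase does not need to
    import alpharaw (which would create the circular dependency).
    """

    spectrum_df: pd.DataFrame
    peak_df: pd.DataFrame


def n_spectra(raw_data: RawDataLike) -> int:
    # Static type checkers validate this attribute access against the protocol
    return len(raw_data.spectrum_df)


class FakeRawData:
    """Minimal stand-in for an alpharaw reader, e.g. for unit tests."""

    def __init__(self) -> None:
        self.spectrum_df = pd.DataFrame({"rt": [1.0, 2.0]})
        self.peak_df = pd.DataFrame({"mz": [100.0, 200.0, 300.0]})
```

A side benefit is that tests can pass a lightweight fake instead of a real raw file reader.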

)


def _get_dense_column( # noqa: PLR0913
Collaborator:

Part of flat -> dense

return "_".join(items)


def _add_frag_column_annotation(
Collaborator:

Part of flat -> dense

)


def _assign_to_dense(
Collaborator:

Part of flat -> dense

@GeorgWa GeorgWa self-requested a review April 11, 2025 11:53
@GeorgWa (Collaborator) left a comment

It's a bit of integration work, but then we should be good :)

)
from alphabase.spectral_library.flat import SpecLibFlat

UNANNOTATED_TYPE = 255
Collaborator:

Can you move this to alphabase/peptide/fragment.py?

return outlib_flat


def calculate_pif(spectrum_peak_df: pd.DataFrame) -> float:
Collaborator:

What do you think about moving these into a metrics.py module? The same goes for gini etc.

)


def _sequence_coverage_metric(flatlib: SpecLibFlat) -> None:
Collaborator:

move to metrics.py
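If the metrics move to metrics.py, giving them one shared signature makes them easy to apply uniformly. Below is a sketch of a sequence-coverage metric under assumed definitions: coverage is read as the fraction of the nAA - 1 backbone cleavage sites with at least one fragment of nonzero intensity, and the flat layout is simplified to a fragment_df with a hypothetical precursor_idx column. The (precursor_df, fragment_df) -> Series signature is also an assumption; the PR's _sequence_coverage_metric takes a SpecLibFlat and may define coverage differently.

```python
import pandas as pd


def sequence_coverage(precursor_df: pd.DataFrame, fragment_df: pd.DataFrame) -> pd.Series:
    """Fraction of backbone cleavage sites with >= 1 fragment of nonzero intensity.

    Assumed layout: precursor_df has 'nAA' (peptide length); fragment_df has a
    hypothetical 'precursor_idx' (row label in precursor_df), 'position'
    (0-based cleavage site) and 'intensity'.
    """
    observed = fragment_df[fragment_df["intensity"] > 0]
    # Distinct covered cleavage sites per precursor; precursors without hits get 0
    n_covered = (
        observed.groupby("precursor_idx")["position"]
        .nunique()
        .reindex(precursor_df.index, fill_value=0)
    )
    n_sites = precursor_df["nAA"] - 1  # a peptide of length n has n - 1 sites
    return n_covered / n_sites
```

Usage would be a single column assignment, e.g. precursor_df["sequence_coverage"] = sequence_coverage(precursor_df, fragment_df).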

)


def _sequence_gini_metric(flatlib: SpecLibFlat) -> None:
Collaborator:

move to metrics.py
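For reference, a sketch of a per-precursor Gini coefficient using the standard sorted-rank formula G = 2·Σ i·x_(i) / (n·Σ x) - (n + 1)/n, where x_(i) are the values sorted ascending and i runs from 1 to n. Applying it to each precursor's fragment intensities is an assumption; the PR's _sequence_gini_metric may aggregate differently (for example per sequence position). The precursor_idx column is hypothetical, as above.

```python
import numpy as np
import pandas as pd


def gini(values) -> float:
    """Gini coefficient of non-negative values: 0 = perfectly even, (n-1)/n = one value holds all."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    total = x.sum()
    if n == 0 or total == 0:
        return float("nan")  # undefined for empty or all-zero input
    ranks = np.arange(1, n + 1)  # 1-based ranks of the ascending-sorted values
    return float(2.0 * np.sum(ranks * x) / (n * total) - (n + 1) / n)


def sequence_gini(precursor_df: pd.DataFrame, fragment_df: pd.DataFrame) -> pd.Series:
    """Per-precursor Gini of fragment intensities (hypothetical 'precursor_idx' column)."""
    per_precursor = fragment_df.groupby("precursor_idx")["intensity"].agg(gini)
    return per_precursor.reindex(precursor_df.index)  # NaN for precursors without fragments
```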

)


def _normalized_count_metric(flatlib: SpecLibFlat) -> None:
Collaborator:

move to metrics.py

return np.median(valid_values)


def _mass_accuracy_metric(flatlib: SpecLibFlat) -> None:
Collaborator:

move to metrics.py
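A sketch of a mass-accuracy metric in the same shape: the ppm error is (observed - theoretical) / theoretical × 10^6, and the per-precursor value here is the median absolute ppm error over fragments with nonzero intensity. The observed_mz, theoretical_mz and precursor_idx column names, and the choice of absolute rather than signed error, are assumptions.

```python
import pandas as pd


def ppm_error(observed_mz: pd.Series, theoretical_mz: pd.Series) -> pd.Series:
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def mass_accuracy(precursor_df: pd.DataFrame, fragment_df: pd.DataFrame) -> pd.Series:
    """Median absolute ppm error of matched fragments, per precursor.

    Assumed columns: precursor_idx, observed_mz, theoretical_mz, intensity;
    rows with intensity 0 are treated as unmatched and skipped.
    """
    matched = fragment_df[fragment_df["intensity"] > 0]
    abs_err = ppm_error(matched["observed_mz"], matched["theoretical_mz"]).abs()
    per_precursor = abs_err.groupby(matched["precursor_idx"]).median()
    return per_precursor.reindex(precursor_df.index)  # NaN if nothing matched
```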

mass_error_ppm=mass_error_ppm,
)

matched_precursor_df.loc[i, "pif"] = calculate_pif(spectrum_peak_df)
Collaborator:

Sorry that I'm asking you to fix my mistakes here :D

Could you change the calculate_pif() function so that it's structured and applied in the same way as the other metrics?
That is, it should be called after annotation and operate on the precursor df.
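A minimal sketch of that structure, assuming a hypothetical Metric signature (precursor_df, fragment_df) -> one value per precursor: annotation runs first, then add_metrics applies each registered metric and stores the result as a precursor_df column. n_annotated is only a placeholder metric; calculate_pif would presumably become another entry, rewritten to read from these DataFrames rather than from a per-spectrum spectrum_peak_df.

```python
from typing import Callable, Dict

import pandas as pd

# Hypothetical shared signature: (precursor_df, fragment_df) -> one value per precursor
Metric = Callable[[pd.DataFrame, pd.DataFrame], pd.Series]


def n_annotated(precursor_df: pd.DataFrame, fragment_df: pd.DataFrame) -> pd.Series:
    """Placeholder metric: fragments with nonzero intensity per precursor."""
    counts = fragment_df[fragment_df["intensity"] > 0].groupby("precursor_idx").size()
    return counts.reindex(precursor_df.index, fill_value=0)


def add_metrics(
    precursor_df: pd.DataFrame, fragment_df: pd.DataFrame, metrics: Dict[str, Metric]
) -> pd.DataFrame:
    """Apply every metric AFTER annotation, storing each as a precursor_df column."""
    out = precursor_df.copy()
    for name, metric in metrics.items():
        out[name] = metric(out, fragment_df)
    return out
```

Keeping metrics as plain functions in one mapping means the annotation loop no longer needs to know about any individual metric.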

)


def add_dense_lib(
Collaborator:

What do you think about adding this function directly to the SpecLibFlat object,
so that we can call SpecLibFlat.add_dense_lib()?
I think this fits the alphabase pattern better.
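The method form could be a thin wrapper around module-level logic, so the conversion stays testable on its own while users call lib.add_dense_lib(). FlatLib and _build_dense below are hypothetical stand-ins, not SpecLibFlat's actual internals, and the sketch assumes the fragment table already carries a dense column label per row.

```python
from typing import Optional

import pandas as pd


def _build_dense(fragment_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical module-level helper that holds the actual conversion logic."""
    return fragment_df.pivot_table(
        index="position", columns="column", values="intensity", aggfunc="sum", fill_value=0.0
    )


class FlatLib:
    """Hypothetical stand-in for SpecLibFlat; only the calling pattern matters."""

    def __init__(self, fragment_df: pd.DataFrame) -> None:
        self.fragment_df = fragment_df
        self.dense_df: Optional[pd.DataFrame] = None

    def add_dense_lib(self) -> None:
        # Thin wrapper: the library updates itself, the helper stays testable alone
        self.dense_df = _build_dense(self.fragment_df)
```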

@@ -261,16 +262,16 @@ def get_full_charged_types(self, frag_df: pd.DataFrame) -> list:
# Now if we have a fragment type that is a,b,c we should have the corresponding x,y,z

corresponding = {"a": "x", "b": "y", "c": "z", "x": "a", "y": "b", "z": "c"}
loss_number_to_type = {0: "", 18: "_H2O", 17: "_NH3", 98: "_modloss"}
Collaborator:

get_full_charged_types() is deprecated, so there is no need to make any changes here.

# Loss type mappings - these map loss type numeric codes to their string representations
# Used in annotate.py and flat.py

LOSS_NUMBER_TO_TYPE = {0: "", 17: "_NH3", 18: "_H2O", 98: "_modloss"}
Collaborator:

I think this can be removed, as it's not needed by the new alphabase-based implementation :)

3 participants