NEW: annotating real spectra vs calculated one #314

Draft

boopthesnoot wants to merge 3 commits into main from annotate_migration

Contributor

boopthesnoot commented Apr 9, 2025 (edited)

@GeorgWa added the functionality to compare experimental spectra with calculated ones, so all credit goes to him.

Taken from https://github.com/MannLabs/metaptcm/blob/cbf579b52041d96636dd04c29d4855e5391bd136/metaptcm/utils/annotation.py, so @mschwoer has already reviewed this.

Feel free to suggest a different placement within the package. I also added some noqas.


          NEW: annotating real spectra vs calculated one

57d28f1

boopthesnoot requested review from GeorgWa and mschwoer

April 9, 2025 16:01

boopthesnoot marked this pull request as draft

April 10, 2025 07:26

GeorgWa approved these changes

Collaborator

GeorgWa left a comment

This module currently contains the conversion of a flat (annotated) library to a dense library. In the meantime we have implemented a more optimized and clean version in alphabase.

See here: create_dense_matrices
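For readers unfamiliar with the two layouts, here is a minimal sketch of what a flat-to-dense conversion does conceptually. It is not the `create_dense_matrices` implementation; the function name `flat_to_dense` and the column names `position`, `column`, and `intensity` are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Flat layout: one row per observed fragment (hypothetical column names).
flat = pd.DataFrame(
    {
        "position": [0, 1, 1, 2],
        "column": ["b_z1", "b_z1", "y_z1", "y_z1"],
        "intensity": [0.2, 1.0, 0.5, 0.8],
    }
)


def flat_to_dense(flat_df: pd.DataFrame, n_positions: int, columns: list) -> pd.DataFrame:
    """Scatter flat fragment rows into a (position x fragment column) matrix."""
    dense = np.zeros((n_positions, len(columns)), dtype=float)
    col_idx = {name: i for i, name in enumerate(columns)}
    rows = flat_df["position"].to_numpy()
    # Every value in flat_df["column"] must be listed in `columns`.
    cols = flat_df["column"].map(col_idx).to_numpy()
    dense[rows, cols] = flat_df["intensity"].to_numpy()
    return pd.DataFrame(dense, columns=columns)


dense_df = flat_to_dense(flat, n_positions=3, columns=["b_z1", "y_z1"])
```

Cells without a matching flat row stay at zero, which is why the dense form has a fixed shape regardless of how many fragments were observed.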

alphabase/constants/spectral_library.py

+              # Loss type mappings - these map loss type numeric codes to their string representations
+              # Used in annotate.py and flat.py
+              LOSS_NUMBER_TO_TYPE = {0: "", 17: "_NH3", 18: "_H2O", 98: "_modloss"}

Collaborator

GeorgWa Apr 10, 2025

This is now covered in https://github.com/MannLabs/alphabase/blob/main/alphabase/peptide/fragment.py

Collaborator

jalew188 Apr 10, 2025

How much redundancy is there between these two modules?

Contributor Author

boopthesnoot Apr 10, 2025

I'll try to minimize redundancy, thank you for pointing it out.

alphabase/spectral_library/annotate.py Outdated

+                  ----------
+                  speclib_flat : SpecLibFlat
+                      A spectral library flat object containing precursor information.
+                  raw_data : MzMLReader | ThermoRawData

Collaborator

GeorgWa Apr 10, 2025

I think the most accurate thing would be to use a MsDataBase from alpharaw here as the annotation type.

Contributor Author

boopthesnoot Apr 10, 2025

can't use alpharaw, otherwise we get circular dependencies
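One way to keep a meaningful annotation without importing alpharaw is a structural type. The sketch below is only an illustration of that idea, not what this PR does: `RawDataLike`, `count_peaks`, and `FakeReader` are hypothetical names, and the assumption that reader objects expose `spectrum_df` and `peak_df` attributes should be checked against alpharaw.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class RawDataLike(Protocol):
    """Hypothetical structural type: any object exposing these two attributes.

    In real code both would presumably be pandas DataFrames; `Any` keeps
    this sketch free of third-party imports.
    """

    spectrum_df: Any
    peak_df: Any


def count_peaks(raw_data: RawDataLike) -> int:
    """Toy consumer that relies only on attributes named in the protocol."""
    return len(raw_data.peak_df)


class FakeReader:
    """Stand-in for a reader class that lives in another package."""

    def __init__(self) -> None:
        self.spectrum_df = [{"rt": 1.0}, {"rt": 2.0}]
        self.peak_df = [(100.0, 1.0), (200.0, 2.0), (300.0, 3.0)]


reader = FakeReader()
n_peaks = count_peaks(reader)
is_raw_like = isinstance(reader, RawDataLike)
```

Because the check is structural, the reader class never has to inherit from or import anything defined here, so no dependency on the reader's package is introduced.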

alphabase/spectral_library/annotate.py Outdated

		)


		def _get_dense_column( # noqa: PLR0913

Collaborator

GeorgWa Apr 10, 2025

Part of flat -> dense

alphabase/spectral_library/annotate.py Outdated

		return "_".join(items)


		def _add_frag_column_annotation(

Collaborator

GeorgWa Apr 10, 2025

Part of flat -> dense

alphabase/spectral_library/annotate.py Outdated

		)


		def _assign_to_dense(

Collaborator

GeorgWa Apr 10, 2025

Part of flat -> dense

boopthesnoot added 2 commits

April 10, 2025 22:26


          FIX: circular dependencies

9fe02d9


          FIX: 3.9 typing

ce10314

GeorgWa self-requested a review

April 11, 2025 11:53

GeorgWa reviewed

Collaborator

GeorgWa left a comment

It's a bit of integration work, but then we should be good :)

alphabase/spectral_library/annotate.py

+              )
+              from alphabase.spectral_library.flat import SpecLibFlat
+              UNANNOTATED_TYPE = 255

Collaborator

GeorgWa Apr 11, 2025

can you move this to alphabase/peptide/fragment.py?

alphabase/spectral_library/annotate.py

		return outlib_flat


		def calculate_pif(spectrum_peak_df: pd.DataFrame) -> float:

Collaborator

GeorgWa Apr 11, 2025

What do you think about moving this into a metrics.py module? Same for gini etc.

alphabase/spectral_library/annotate.py

		)


		def _sequence_coverage_metric(flatlib: SpecLibFlat) -> None:

Collaborator

GeorgWa Apr 11, 2025

move to metrics.py

alphabase/spectral_library/annotate.py

		)


		def _sequence_gini_metric(flatlib: SpecLibFlat) -> None:

Collaborator

GeorgWa Apr 11, 2025

move to metrics.py

alphabase/spectral_library/annotate.py

		)


		def _normalized_count_metric(flatlib: SpecLibFlat) -> None:

Collaborator

GeorgWa Apr 11, 2025

move to metrics.py

alphabase/spectral_library/annotate.py

		return np.median(valid_values)


		def _mass_accuracy_metric(flatlib: SpecLibFlat) -> None:

Collaborator

GeorgWa Apr 11, 2025

move to metrics.py

alphabase/spectral_library/annotate.py

+                          mass_error_ppm=mass_error_ppm,
+                      )
+                      matched_precursor_df.loc[i, "pif"] = calculate_pif(spectrum_peak_df)

Collaborator

GeorgWa Apr 11, 2025

Sorry, that I'm asking you to fix my mistakes here :D

Could you change the calculate_pif() function so it's structured and applied in the same way as the other metrics?
That is, it's called after annotation and operates on the precursor df.
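The requested pattern (a metric that runs after annotation and writes its result to the precursor df) could look roughly like the sketch below. The PIF formula is not shown in this thread, so a placeholder metric, the annotated share of fragment intensity, stands in for it. `FlatLibStub`, `_annotated_fraction_metric`, and the column names `flat_frag_start_idx`, `flat_frag_stop_idx`, `type`, and `intensity` are assumptions for illustration.

```python
import pandas as pd

UNANNOTATED_TYPE = 255  # same value as the constant in the annotate.py diff


class FlatLibStub:
    """Hypothetical stand-in for SpecLibFlat holding only the two frames used here."""

    def __init__(self, precursor_df: pd.DataFrame, fragment_df: pd.DataFrame) -> None:
        self.precursor_df = precursor_df
        self.fragment_df = fragment_df


def _annotated_fraction_metric(flatlib: FlatLibStub) -> None:
    """Add an 'annotated_fraction' column to flatlib.precursor_df in place."""
    frag = flatlib.fragment_df
    values = []
    for start, stop in zip(
        flatlib.precursor_df["flat_frag_start_idx"],
        flatlib.precursor_df["flat_frag_stop_idx"],
    ):
        peaks = frag.iloc[start:stop]  # fragments belonging to this precursor
        total = peaks["intensity"].sum()
        annotated = peaks.loc[peaks["type"] != UNANNOTATED_TYPE, "intensity"].sum()
        values.append(annotated / total if total > 0 else float("nan"))
    flatlib.precursor_df["annotated_fraction"] = values


lib = FlatLibStub(
    precursor_df=pd.DataFrame(
        {"flat_frag_start_idx": [0, 3], "flat_frag_stop_idx": [3, 5]}
    ),
    fragment_df=pd.DataFrame(
        {
            # Any type code other than 255 counts as annotated in this sketch.
            "type": [98, 255, 121, 255, 255],
            "intensity": [2.0, 1.0, 1.0, 4.0, 6.0],
        }
    ),
)
_annotated_fraction_metric(lib)
```

A function with this shape takes the library, returns `None`, and mutates the precursor df, which matches the signatures of the other `_..._metric(flatlib: SpecLibFlat) -> None` functions shown in the diff.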

alphabase/spectral_library/annotate.py

		)


		def add_dense_lib(

Collaborator

GeorgWa Apr 11, 2025

what do you think about adding this function directly to the speclib flat object?
so we can call SpecLibFlat.add_dense_lib().
I think this fits the ab pattern better.

alphabase/spectral_library/flat.py

                       # Now if we have a fragment type that is a,b,c we should have the corresponding x,y,z
                       corresponding = {"a": "x", "b": "y", "c": "z", "x": "a", "y": "b", "z": "c"}
-                      loss_number_to_type = {0: "", 18: "_H2O", 17: "_NH3", 98: "_modloss"}

Collaborator

GeorgWa Apr 11, 2025

get_full_charged_types() is deprecated, so there is no need to do any changes here.

alphabase/constants/spectral_library.py

+              # Loss type mappings - these map loss type numeric codes to their string representations
+              # Used in annotate.py and flat.py
+              LOSS_NUMBER_TO_TYPE = {0: "", 17: "_NH3", 18: "_H2O", 98: "_modloss"}

Collaborator

GeorgWa Apr 11, 2025

I think this can be removed as it's not needed by the new AB-based implementation :).
