Skip to content

Centralize release id extraction #5761

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 18, 2025
Merged

Centralize release id extraction #5761

merged 2 commits into from
May 18, 2025

Conversation

snejus
Copy link
Member

@snejus snejus commented May 5, 2025

Description

Refactor: Centralize release ID extraction

This change introduces a new utility function extract_release_id in beets.util.id_extractors to handle the parsing of release IDs (or URLs containing IDs) for various metadata sources (Spotify, Deezer, Beatport, Discogs, MusicBrainz, Bandcamp, Tidal).

Key changes:

  • Added extract_release_id function and PATTERN_BY_SOURCE regex dictionary.
  • Converted MetadataSourcePlugin._get_id static method to an instance method which uses the data_source property to pick the correct id extractor.
  • Removed the id_regex property and updated _get_id calls in all modules.
  • Replaced old tests related to ID parsing in individual plugin test files with test/util/test_id_extractors.py that tests the extract_release_id function.

This refactoring simplifies the codebase, reduces redundancy, and makes it easier to manage and extend ID extraction logic for different sources.

@snejus snejus requested review from wisp3rwind and semohr May 5, 2025 16:42
Copy link

github-actions bot commented May 5, 2025

Thank you for the PR! The changelog has not been updated, so here is a friendly reminder to check if you need to add an entry.

@snejus snejus force-pushed the restructure-id-extractors branch from 75015dd to b386fea Compare May 5, 2025 16:43
@snejus snejus requested a review from Copilot May 5, 2025 16:44
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR centralizes the extraction of release IDs by introducing a new utility function extract_release_id and replacing source‐specific regex handling with a common implementation. Key changes include:

  • Adding and integrating extract_release_id into various plugins and test files.
  • Removing redundant code and tests that previously handled ID extraction separately.
  • Updating plugin methods to use the new extraction logic.

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
test/util/test_id_extractors.py Introduces tests for the centralized extract_release_id function.
test/test_plugins.py Removes outdated tests related to individual ID regex extraction.
test/plugins/test_musicbrainz.py Removes tests for the old _parse_id functionality.
test/plugins/test_discogs.py Deletes legacy Discogs ID parsing tests in favor of the new method.
beetsplug/spotify.py Updates album_for_id and track_for_id methods to use extract_release_id.
beetsplug/musicbrainz.py Switches from _parse_id to extract_release_id; updates album/track lookups.
beetsplug/discogs.py Replaces extract_discogs_id_regex with extract_release_id.
beetsplug/deezer.py Refines ID extraction using the new utility with walrus operator.
beetsplug/beatport.py Updates release ID extraction handling with the centralized approach.
beets/util/id_extractors.py Defines extract_release_id and a PATTERN_BY_SOURCE mapping.
beets/plugins.py Refactors _get_id to delegate ID extraction to extract_release_id.

@snejus snejus force-pushed the restructure-id-extractors branch 3 times, most recently from 2374b06 to c51da3f Compare May 5, 2025 16:56
@semohr
Copy link
Contributor

semohr commented May 5, 2025

Thanks for already implementing this!

Are we set on the functional approach here? I would prefer if this would be part of the plugins themself (e.g. defined as an required property or method in the abc). Is in my opinion closer to the actual logic and would required anyone who creates a new source plugin to properly register their ids.

I know not all plugins are SourcePlugins yet, but we might want to implement the shared parent class first before continuing with this. In my opinion such shared behavior (across all source plugins) is exactly a use case where abstract base classes shine bright.

@wisp3rwind
Copy link
Member

Are we set on the functional approach here? I would prefer if this would be part of the plugins themself (e.g. defined as an required property or method in the abc). Is in my opinion closer to the actual logic and would required anyone who creates a new source plugin to properly register their ids.

👍 I only had a very superficial look at the PR, but I'd agree that this looks like plugins should register their ID patterns. Right now, beetsplug and beets are unexpectedly intertwined.

@semohr

This comment has been minimized.

@snejus
Copy link
Member Author

snejus commented May 5, 2025

In an ideal world, I would also imagine that metadata source plugins have something like what @semohr suggested:

I would prefer if this would be part of the plugins themself (e.g. defined as an required property or method in the abc)

And they indeed originally had been part of the plugins, however they have been moved out #4633 in order to support saving external IDs by the musicbrainz plugin (see the relevant changes in beetsplug/musicbrainz.py in this PR).

Arguably, we may be able to remove this ID parsing from musicbrainz and instead just save the URLs? The code that will eventually make use of these IDs can parse them from URLs. Currently, we have no functionality that makes use of them.

On the other hand, what do we do about bandcamp and tidal id regexes? These plugins are not present in the codebase, so how would we parse them if we'd expect the plugins to define them?

@wisp3rwind

This comment has been minimized.

@semohr

This comment has been minimized.

@wisp3rwind

This comment has been minimized.

@semohr

This comment has been minimized.

@snejus snejus force-pushed the make-musicbrainz-a-plugin branch from f0afda7 to 0c2fc7b Compare May 6, 2025 20:58
@snejus snejus force-pushed the restructure-id-extractors branch 3 times, most recently from c51da3f to fb0d14e Compare May 6, 2025 21:25
@snejus

This comment has been minimized.

@semohr

This comment has been minimized.

@snejus
Copy link
Member Author

snejus commented May 6, 2025

I see. See my question from above:

On the other hand, what do we do about bandcamp and tidal id regexes? These plugins are not present in the codebase, so how would we parse them if we'd expect the plugins to define them?

@semohr
Copy link
Contributor

semohr commented May 6, 2025

These additional regexes are only used in the musicbrainz plugin, right? If this is the case, it wouldn't be too big of an issue to just have them as part of the mb plugin.

I'm not super set on this approach, but I generally think the extraction should be closer to the plugins.

@snejus snejus force-pushed the make-musicbrainz-a-plugin branch 2 times, most recently from 66fd8f5 to cab0246 Compare May 7, 2025 21:54
@snejus snejus force-pushed the restructure-id-extractors branch from fb0d14e to 409f3ab Compare May 7, 2025 21:57
@semohr
Copy link
Contributor

semohr commented May 7, 2025

I had a try at this myself and implemented the extraction on plugin level in an earlier commit of #5764.

To be honest my approach gets convoluted and incomprehensive quickly, even if working in theory. I would be happy to go with your inital approach for id extraction as it is more understandable and maintainable.

Could we move the extractors utils into the beetsplug folder tho? That way they are close to where the are used. I think that should be a decent compromise.

@snejus
Copy link
Member Author

snejus commented May 8, 2025

Could we move the extractors utils into the beetsplug folder tho? That way they are close to where the are used. I think that should be a decent compromise.

The extraction function is called from the generic MetadataSourcePlugin, i.e., it's imported in beets core beets.plugins. I think the consensus is that beets core does not import from beetsplug package since it's for plugins only.

@snejus snejus force-pushed the restructure-id-extractors branch from 409f3ab to ad0a784 Compare May 8, 2025 03:10
@semohr
Copy link
Contributor

semohr commented May 8, 2025

The extraction function is called from the generic MetadataSourcePlugin, i.e., it's imported in beets core beets.plugins. I think the consensus is that beets core does not import from beetsplug package since it's for plugins only.

The _get_id method is not required by the MetadataSourcePlugin abstract base class—for example, the Chroma plugin does not use ID extraction at all. Furthermore, only concrete implementations of MetadataSourcePlugin make use of _get_id, which is somewhat of an antipattern in abstract class design.

  • If this functionality is not needed for the general MetadataSourcePlugin interface (e.g. from the Beets core side), I would suggest omitting it from the base class entirely.
  • I view this base class primarily as a contract that plugins must fulfill in order to be recognized as MetadataSourcePlugins.

Also, once we merge #5764 into this PR, the _get_id method should already be removed.

Apologies for the back and forth—this really feels more like a design decision we should get right to ensure it's future-proof than a simple refactor.

@snejus
Copy link
Member Author

snejus commented May 8, 2025

Apologies for the back and forth—this really feels more like a design decision we should get right to ensure it's future-proof than a simple refactor.

No need to apologize! All good 😆

The _get_id method is not required by the MetadataSourcePlugin abstract base class

_get_id is an internal plugin util rather than a public method which must exist in order to fulfil plugin's designed purpose (like candidates etc.). Each metadata source plugin requires it (musicbrainz and discogs use extract_release_id function directly since they don't inherit MetadataSourcePlugin). One of the reasons why I want to make these two inherit MetadataSourcePlugin is so that they use the unified _get_id implementation.

the Chroma plugin does not use ID extraction at all

Indeed - another reason why _get_id shouldn't be abstract.

I don't think we need to overthink this too much - this method simply follows DRY design and defines shared functionality within the base class.

@snejus snejus force-pushed the make-musicbrainz-a-plugin branch from cab0246 to eae7f70 Compare May 8, 2025 20:54
@snejus snejus force-pushed the restructure-id-extractors branch from ad0a784 to 2ee5de4 Compare May 8, 2025 20:55
@semohr
Copy link
Contributor

semohr commented May 8, 2025

I do not think thought _get_id should be abstract. I think it is probably not necessary at all if we define the patterns together in one place 😅

We should really think about moving the extractor utils into the beetsplug folder tho. It is kinda confusing that the regex patterns are defined in beets, because they are not directly used in beets but only by beetsplug i.e. inside specific plugin implementations.

Nonetheless Im still pretty happy with these enhancments as they are a mayor improvment over the previous behaviour. We can still move/remove it in favor of direct imports later.

@snejus snejus force-pushed the make-musicbrainz-a-plugin branch 3 times, most recently from edf688f to f1dc75f Compare May 17, 2025 02:32
Base automatically changed from make-musicbrainz-a-plugin to master May 17, 2025 02:40
@snejus snejus force-pushed the restructure-id-extractors branch from 2ee5de4 to a8e33a2 Compare May 17, 2025 12:19
@snejus snejus force-pushed the restructure-id-extractors branch from 5202219 to 8936ae4 Compare May 17, 2025 13:57
@snejus
Copy link
Member Author

snejus commented May 18, 2025

Just need your approval @semohr if this looks good

@semohr
Copy link
Contributor

semohr commented May 18, 2025

Looks good to me 👍 Once we are done with this, It would be awesome to continue with #5787, since it partially depends on this PR.

@snejus snejus merged commit 4ddd782 into master May 18, 2025
20 checks passed
@snejus snejus deleted the restructure-id-extractors branch May 18, 2025 13:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants