Task
Resolution: Fixed
Normal
MBS-10621 added canonicalization of Tidal streaming URLs, but around 12,000 URLs in the database still appear to use pre-canonicalization forms. I'd like to rewrite these URLs to the canonicalized form so that, when seeding edits, a Tidal URL can be resolved to its artist/release/recording entity with a single API call instead of iterating through all the potential forms.
I'm identifying old URLs by running this against the 20230225-002009 dump (using GNU grep with -P so I can pass a PCRE):
grep -P '\thttps?://(listen\.tidal\.com|tidal\.com/browse)/(album|artist|track)/\d+\t' mbdump/url
That results in 12,094 rows. Almost all of them have a last_updated timestamp between 2015-06-30 and 2021-10-04, but there are 30 rows that were updated between 2023-01-13 and 2023-02-09. I'm trying to get in touch with the editor who added them to figure out how they were added.
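For anyone who wants to reproduce the timestamp breakdown, here is a minimal Python sketch that applies the same pattern as the grep command and tallies matching rows by year. It assumes the dump file is tab-separated and that last_updated is the final column of each row. That layout is an assumption about the dump rather than something verified here, and count_by_year is a hypothetical helper name.

```python
import re
from collections import Counter

# Same pattern as the grep command: a tab-delimited field holding an
# old-style Tidal URL (listen.tidal.com/... or tidal.com/browse/...).
OLD_TIDAL_ROW_RE = re.compile(
    r"\thttps?://(?:listen\.tidal\.com|tidal\.com/browse)"
    r"/(?:album|artist|track)/\d+\t"
)


def count_by_year(lines):
    """Tally rows containing an old-style Tidal URL by last_updated year.

    ASSUMPTION: rows are tab-separated and last_updated is the last column.
    """
    years = Counter()
    for line in lines:
        if not OLD_TIDAL_ROW_RE.search(line):
            continue
        last_updated = line.rstrip("\n").split("\t")[-1]
        # First four characters of a timestamp like "2015-06-30 ..." are the year.
        years[last_updated[:4]] += 1
    return years
```

It can be fed an open file handle for mbdump/url, or any other iterable of lines.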
I plan to use the derat_bot user and the code at https://github.com/derat/mbbot to rewrite the old URLs to the canonicalized https://tidal.com/album/123 form.
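The rewrite itself amounts to a small string transformation. The Python sketch below is illustrative only and is not the code in mbbot. It assumes that artist and track URLs follow the same https://tidal.com/TYPE/ID pattern as the album example above, and canonicalize_tidal_url is a hypothetical name.

```python
import re

# Old-style forms targeted by the grep above, anchored to the whole URL.
OLD_TIDAL_URL_RE = re.compile(
    r"https?://(?:listen\.tidal\.com|tidal\.com/browse)"
    r"/(album|artist|track)/(\d+)"
)


def canonicalize_tidal_url(url):
    """Return the canonical form of an old-style Tidal URL.

    Returns None when the URL is not one of the old forms, so URLs that are
    already canonical (or not Tidal URLs at all) are left for the caller to skip.
    ASSUMPTION: artist and track URLs canonicalize the same way as album URLs.
    """
    m = OLD_TIDAL_URL_RE.fullmatch(url)
    if m is None:
        return None
    kind, tidal_id = m.groups()
    return f"https://tidal.com/{kind}/{tidal_id}"
```

Returning None for non-matching input keeps the bot from submitting no-op edits for URLs that are already in canonical form.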
I've also started a forum thread about this: https://community.metabrainz.org/t/registering-a-bot-to-canonicalize-old-tidal-urls/625741