MusicBrainz Batch Edits: MBBE-71

Canonicalize Tidal streaming URLs to https://tidal.com/artist/123

    • Type: Task
    • Resolution: Fixed
    • Priority: Normal

      MBS-10621 added canonicalization of Tidal streaming URLs, but there appear to be around 12,000 URLs in the database using pre-canonicalization forms. I'd like to rewrite these URLs to use the canonicalized form so that Tidal URLs can be resolved to artist/release/recording entities using a single API call when seeding edits instead of needing to iterate through all the potential forms.

      I'm identifying old URLs by running this against the 20230225-002009 dump (using GNU grep with -P so I can pass a PCRE):

      grep -P '\thttps?://(listen\.tidal\.com|tidal\.com/browse)/(album|artist|track)/\d+\t' mbdump/url

      That results in 12,094 rows. Almost all of them have a last_updated timestamp between 2015-06-30 and 2021-10-04, but there are 30 rows that were updated between 2023-01-13 and 2023-02-09. I'm trying to get in touch with the editor who added them to figure out how they were added.
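The same filter can be sketched in Python to see how the matching rows are spread over time. This is only an illustration, not part of the ticket's tooling: it assumes the tab-separated `mbdump/url` columns are `id`, `gid`, `url`, `edits_pending`, `last_updated` in that order, and the function names `old_tidal_rows` and `count_by_year` are hypothetical.

```python
import re
from collections import Counter

# Same pattern as the grep command, applied to the url column only.
OLD_TIDAL_RE = re.compile(
    r"https?://(?:listen\.tidal\.com|tidal\.com/browse)/(?:album|artist|track)/[0-9]+"
)

def old_tidal_rows(lines):
    """Yield (url, last_updated) for rows whose URL uses an old Tidal form.

    Assumes columns: id, gid, url, edits_pending, last_updated.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip malformed rows
        url, last_updated = fields[2], fields[4]
        if OLD_TIDAL_RE.fullmatch(url):
            yield url, last_updated

def count_by_year(lines):
    """Count old-form rows per year, using the first four characters of last_updated."""
    return Counter(ts[:4] for _, ts in old_tidal_rows(lines))
```

Passing an open file handle for `mbdump/url` to `count_by_year` would give a per-year tally, which makes a late cluster like the 2023 rows easy to spot.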

      I plan to use the derat_bot user and the code at https://github.com/derat/mbbot to rewrite the old URLs to the canonicalized https://tidal.com/album/123 form.
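The rewrite itself amounts to a small mapping from the old forms to the canonical one. The following is a minimal sketch of that mapping, not the code in mbbot; the function name `canonicalize_tidal_url` is hypothetical, and the `https://tidal.com/track/123` form for tracks is assumed by analogy with the artist and album forms named in this ticket.

```python
import re

# Old forms: listen.tidal.com/<type>/<id> and tidal.com/browse/<type>/<id>.
_OLD_TIDAL_RE = re.compile(
    r"https?://(?:listen\.tidal\.com|tidal\.com/browse)/(album|artist|track)/([0-9]+)"
)

def canonicalize_tidal_url(url):
    """Return the canonical https://tidal.com/<type>/<id> URL, or None.

    None means the URL is not in one of the old forms, so it should be
    left untouched (this includes URLs that are already canonical).
    """
    m = _OLD_TIDAL_RE.fullmatch(url)
    if m is None:
        return None
    kind, ident = m.groups()
    return f"https://tidal.com/{kind}/{ident}"
```

Returning `None` for anything that does not match exactly keeps the sketch conservative: URLs with extra path segments or query strings are skipped rather than guessed at.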

      I've also started a forum thread about this: https://community.metabrainz.org/t/registering-a-bot-to-canonicalize-old-tidal-urls/625741

        Attachments:
        1. album_track_edits (0.9 kB, derat)
        2. url_rewrites.tsv (98 kB, derat)

            Assignee: derat
            Reporter: derat