  1. MusicBrainz Batch Edits
  2. MBBE-71

Canonicalize Tidal streaming URLs to https://tidal.com/artist/123

    • Type: Task
    • Resolution: Fixed
    • Priority: Normal
    • URL

      MBS-10621 added canonicalization of Tidal streaming URLs, but there appear to be around 12,000 URLs in the database using pre-canonicalization forms. I'd like to rewrite these URLs to use the canonicalized form so that Tidal URLs can be resolved to artist/release/recording entities using a single API call when seeding edits instead of needing to iterate through all the potential forms.

      I'm identifying old URLs by running this against the 20230225-002009 dump (using GNU grep with -P so I can pass a PCRE):

      grep -P '\thttps?://(listen\.tidal\.com|tidal\.com/browse)/(album|artist|track)/\d+\t' mbdump/url
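For anyone without GNU grep's -P support, the same filter can be sketched in Python. This is only an illustration of the pattern above, not the tooling actually used for this ticket; the function name `find_old_tidal_rows` is hypothetical, and the only assumption about the dump is the one the grep pattern already makes (the URL fills a whole tab-delimited field).

```python
import re

# Same pattern as the grep command: an old-style Tidal URL that occupies an
# entire tab-delimited field. re.ASCII keeps \d limited to 0-9, matching
# PCRE's default behavior.
OLD_TIDAL_FIELD = re.compile(
    r"\thttps?://(listen\.tidal\.com|tidal\.com/browse)/(album|artist|track)/\d+\t",
    re.ASCII,
)


def find_old_tidal_rows(lines):
    """Yield each row that contains a pre-canonicalization Tidal URL field."""
    for line in lines:
        if OLD_TIDAL_FIELD.search(line):
            yield line


# Usage sketch against the dump file named in the grep command:
# with open("mbdump/url", encoding="utf-8") as f:
#     old_rows = list(find_old_tidal_rows(f))
```

Like the grep command, this skips URLs that carry a query string or trailing path, since the digits must be followed directly by a tab.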

      That results in 12,094 rows. Almost all of them have a last_updated timestamp between 2015-06-30 and 2021-10-04, but there are 30 rows that were updated between 2023-01-13 and 2023-02-09. I'm trying to get in touch with the editor who added them to figure out how they were added.

      I plan to use the derat_bot user and the code at https://github.com/derat/mbbot to rewrite the old URLs to the canonicalized https://tidal.com/album/123 form.
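As a rough illustration of the intended rewrite (this is not the code from the mbbot repository; `canonicalize_tidal_url` is a hypothetical helper), the mapping from the old forms to the canonical form could look like this in Python. It assumes the only old forms are the two matched by the grep pattern above.

```python
import re

# Old-style forms taken from the grep pattern: listen.tidal.com/... and
# tidal.com/browse/..., over either http or https.
OLD_TIDAL_URL = re.compile(
    r"https?://(?:listen\.tidal\.com|tidal\.com/browse)/(album|artist|track)/(\d+)",
    re.ASCII,
)


def canonicalize_tidal_url(url):
    """Return the canonical https://tidal.com/<type>/<id> form of an old-style
    Tidal URL, or None if the URL is not in one of the old forms."""
    m = OLD_TIDAL_URL.fullmatch(url)
    if m is None:
        return None
    kind, ident = m.groups()
    return f"https://tidal.com/{kind}/{ident}"
```

Returning None for anything that is not an exact match keeps already-canonical URLs, and URLs with extra path or query components, out of the batch so they can be reviewed separately.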

      I've also started a forum thread about this: https://community.metabrainz.org/t/registering-a-bot-to-canonicalize-old-tidal-urls/625741

      derat
      Votes: 0
      Watchers: 3
