GeoCities was a titan of web hosting back in the Web 1.0 days. It shut down on October 26, 2009, while GeoCities Japan shut down on March 31, 2019.
      It may seem ridiculous to bring this up now, but there are a handful of these links still scattered throughout the database (primarily from the Japanese service).

          [MBBE-47] Mark GeoCities links as ended

          derat added a comment -

          I've created the remaining 466 edits: https://musicbrainz.org/user/derat_bot/edits/open

          (Some of the relationships were already marked as ended; some URLs had multiple relationships.)


          derat added a comment -

          I've created a few votable edits, in case anyone wants to take a look:

          https://musicbrainz.org/edit/97873639
          https://musicbrainz.org/edit/97873641
          https://musicbrainz.org/edit/97873642
          https://musicbrainz.org/edit/97873668

          Nicolás Tamargo added a comment -

          Not sure, but probably (re: removal)

          derat added a comment -

          I'm planning to make some trial edits for this soon using the following regular expression to match URLs:

          ^https?://(?:[-a-z0-9]+\.)?geocities\.(?:yahoo\.)?(com|jp|co\.jp)/.*$
          

          (There are 5 geocities.yahoo.co.jp URLs that I missed earlier. They also redirect to thanks.yahoo.co.jp now.)

          For each URL, I'm iterating over all of its relationships. If any of the relationships aren't ended, I'm ending them with the date 2019-03-31 (for .jp and .co.jp) or 2009-10-26 (otherwise).

          Does that all sound correct?

          I noticed a 2010-era edit from @jesus2099 that received some no votes before being canceled: https://musicbrainz.org/edit/13180626 . Does anyone remember the history there? Were the no votes just because it was removing the relationship rather than marking it as ended?
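The matching and end-date rule described in this comment could be sketched roughly as follows. This is only an illustration, not the bot's actual code: the function name `end_date_for` is hypothetical, and the sketch assumes URLs arrive as plain strings. The regular expression and the two dates are the ones quoted above.

```python
import re
from typing import Optional

# Regular expression quoted in the comment above; group 1 captures the TLD part.
GEOCITIES_RE = re.compile(
    r"^https?://(?:[-a-z0-9]+\.)?geocities\.(?:yahoo\.)?(com|jp|co\.jp)/.*$"
)

def end_date_for(url: str) -> Optional[str]:
    """Return the proposed end date for a GeoCities URL, or None if it doesn't match.

    Hypothetical helper: .jp and .co.jp get the GeoCities Japan shutdown date,
    everything else that matches gets the original GeoCities shutdown date.
    """
    match = GEOCITIES_RE.match(url)
    if match is None:
        return None
    if match.group(1) in ("jp", "co.jp"):
        return "2019-03-31"
    return "2009-10-26"
```

Because the pattern only accepts `com`, `jp`, and `co.jp`, the still-operational geocities.ws URLs fall through to `None` and would be left alone.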


          Nicolás Tamargo added a comment -

          You're probably right, yes. This does seem to require a DB mirror at least.

          derat added a comment -

          Thanks for the pointer to the script. I don't have a local mirror of the database to query, so I think I may need to take a different approach. :-/

          Just to make sure I'm not missing anything, am I correct in thinking that there isn't a way to get enough information out of the API to edit an existing relationship? /ws/2/url/<mbid>/?inc=artist-rels+release-rels+recording-rels doesn't seem to give me the ID that I'd need in order to modify or delete a relationship via /relationship-editor. It looks like I can extract the ID (and other relationship-related info) from the script tag in /url/<mbid>/edit, but since that's a bit hacky, I figured I'd double-check first.
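For reference, the web service lookup mentioned above could be issued along these lines. This is a hedged sketch: `build_url_lookup` and `fetch_url_rels` are hypothetical names, the User-Agent string is a placeholder, and requesting JSON via `fmt=json` is an assumption about how one would prefer to read the response. It only shows the request; it does not recover the relationship IDs that the comment says are missing.

```python
import json
import urllib.parse
import urllib.request

WS_ROOT = "https://musicbrainz.org/ws/2"

def build_url_lookup(mbid: str) -> str:
    """Build the /ws/2/url/<mbid>/ lookup with the inc= values from the comment."""
    # urlencode turns the spaces into '+', giving artist-rels+release-rels+recording-rels.
    query = urllib.parse.urlencode(
        {"inc": "artist-rels release-rels recording-rels", "fmt": "json"}
    )
    return f"{WS_ROOT}/url/{urllib.parse.quote(mbid)}/?{query}"

def fetch_url_rels(mbid: str, user_agent: str = "example-bot/0.1 (placeholder contact)") -> dict:
    """Fetch and decode the lookup response (performs a network request)."""
    request = urllib.request.Request(
        build_url_lookup(mbid), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

`build_url_lookup` can be checked offline; `fetch_url_rels` should only be called with a real URL MBID and a descriptive User-Agent.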


          yvanzo added a comment -

          Thanks for addressing this!

          Actually the whole MusicBrainz project is a Sisyphean task on its own.


          Nicolás Tamargo added a comment -

          If you decide to go for this, see https://github.com/reosarevok/musicbrainz-bot/blob/master/end_bbc.py for a script I wrote to mark a different site's links as ended; it could easily be adapted for this.

          derat added a comment -

          Part of me feels that keeping on top of broken links is a Sisyphean task, but it'd probably be pretty easy to update my bot code to do this.

          Here are the URL counts that I see in the 20230225-002009 dump (around 500 in total):

          227 www.geocities.com
          168 www.geocities.jp
          56 www.geocities.co.jp
          22 music.geocities.jp
          5 geocities.yahoo.co.jp
          5 geocities.com
          3 it.geocities.com
          3 es.geocities.com
          2 uk.geocities.com
          2 1st.geocities.jp
          1 sky.geocities.jp
          1 park.geocities.jp
          1 mx.geocities.com
          1 movie.geocities.jp
          1 island.geocities.jp
          1 geocities.jp
          1 de.geocities.com
          1 br.geocities.com
          1 beauty.geocities.jp
          1 au.geocities.com
          1 akiba.geocities.jp

          All of them seem to use http schemes rather than https.

          There are ~20 www.geocities.ws and geocities.ws URLs that should be preserved; that looks like it's an ironically-named webhost that's still operational.
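A per-host tally like the one above could be produced with something like the following. This is a sketch under assumptions: it takes an already-extracted list of URL strings rather than reading the dump itself, and `count_hosts` and its `keep_suffixes` parameter are hypothetical names. Hosts under geocities.ws are skipped, since those links should be preserved.

```python
from collections import Counter
from typing import Iterable, Tuple
from urllib.parse import urlsplit

def count_hosts(urls: Iterable[str], keep_suffixes: Tuple[str, ...] = ("geocities.ws",)) -> Counter:
    """Count URLs per hostname, skipping hosts that should be left untouched."""
    counts: Counter = Counter()
    for url in urls:
        host = urlsplit(url).hostname or ""
        # Skip geocities.ws and any of its subdomains (e.g. www.geocities.ws).
        if any(host == s or host.endswith("." + s) for s in keep_suffixes):
            continue
        counts[host] += 1
    return counts
```

Calling `count_hosts(urls).most_common()` then gives the hosts in descending order of frequency, which is the layout used in the list above.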


            People: derat, HibiscusKazeneko