MusicBrainz Server / MBS-13464

Inconsistent sorting of artist release/release group titles

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • Fix Version: Schema change, 2025 Q2

      I noticed that, everything else being equal, dates in titles are not sorted as expected. This is especially noticeable with release groups (RGs) containing live bootlegs. It seems to be a regression.

      I can reproduce this on both the main and beta servers.

      See here or here for some examples. I've also attached a screenshot.

      Edit: updating with more findings: this affects all titles (not only dates).

          Michael Wiencek added a comment:

          Description of schema changes:

          1. In the artist_release and artist_release_group tables, we'll replace the existing sort_character columns with name VARCHAR NOT NULL columns. (It's undetermined whether this will be done by dropping and recreating the tables, or by renaming the column and altering its data type.)
          2. Indexes using the sort_character column will similarly be recreated to use the name column instead:
            1. artist_release_nonva_idx_sort
            2. artist_release_va_idx_sort
            3. artist_release_group_nonva_idx_sort
            4. artist_release_group_va_idx_sort
          3. The get_artist_release_rows and get_artist_release_group_rows functions will be updated to insert the full release/release group names into the above tables.

          If the above tables were not empty, they will be repopulated with new data.
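The three steps in the schema change description can be sketched end to end. This is only an illustration under loud assumptions: it uses SQLite through Python's sqlite3 module as a stand-in for PostgreSQL, takes the "drop and recreate" route, and reduces the table to three columns. Only the table, index, and sort_character/name column names come from the ticket; the other columns, the index definition, the IDs, and the titles are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Old shape (simplified, hypothetical columns): only the first character is stored.
conn.execute("""
    CREATE TABLE artist_release_group (
        artist INTEGER NOT NULL,
        release_group INTEGER NOT NULL,
        sort_character TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE INDEX artist_release_group_nonva_idx_sort
    ON artist_release_group (artist, sort_character)
""")

ARTIST_ID = 42  # hypothetical artist row ID
release_groups = {1: "Live, Part 2", 2: "Live, Part 1", 3: "Live, Part 3"}  # hypothetical data

# Step 1: drop and recreate the table with a full "name" column.
# Dropping the table also drops the index that was built on it.
conn.execute("DROP TABLE artist_release_group")
conn.execute("""
    CREATE TABLE artist_release_group (
        artist INTEGER NOT NULL,
        release_group INTEGER NOT NULL,
        name VARCHAR NOT NULL
    )
""")

# Step 2: recreate the sort index on the new column.
conn.execute("""
    CREATE INDEX artist_release_group_nonva_idx_sort
    ON artist_release_group (artist, name)
""")

# Step 3: repopulate with full names instead of first characters.
conn.executemany(
    "INSERT INTO artist_release_group (artist, release_group, name) VALUES (?, ?, ?)",
    [(ARTIST_ID, rg_id, name) for rg_id, name in release_groups.items()],
)

names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM artist_release_group WHERE artist = ? ORDER BY name",
        (ARTIST_ID,),
    )
]
columns = [row[1] for row in conn.execute("PRAGMA table_info(artist_release_group)")]
```

With the full name stored, titles that share a first character can be ordered by the database itself. The real tables, indexes, and collation rules in PostgreSQL are more involved than this stand-in.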

          Michael Wiencek added a comment:

          Storing the full names only increases the artist_release_group table size by about 3.5% (50 MB), and it didn't affect sorting performance at all in my testing. (Initially the increase was 19% (250 MB), but when I came back later it had dropped, so I guess PG compressed it.) artist_release increases by about 18% (370 MB), but that might still drop too; sorting performance was likewise unaffected. So my fears may have been unfounded, and I think this is a reasonable change.
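As a rough back-of-the-envelope check, the figures in the comment above imply approximate table sizes before the change. The helper name below is hypothetical, and because the quoted percentages and megabyte values are rounded, the results are only estimates.

```python
def implied_base_size_mb(increase_mb, increase_pct):
    """Return the pre-change table size implied by an absolute and a relative increase."""
    return increase_mb / (increase_pct / 100)


# artist_release_group, using the later (smaller) measurement: 50 MB is about 3.5%.
rg_base_later = implied_base_size_mb(50, 3.5)

# artist_release_group, using the initial measurement: 250 MB was about 19%.
rg_base_initial = implied_base_size_mb(250, 19)

# artist_release: 370 MB is about 18%.
release_base = implied_base_size_mb(370, 18)

for label, size_mb in [
    ("artist_release_group (later measurement)", rg_base_later),
    ("artist_release_group (initial measurement)", rg_base_initial),
    ("artist_release", release_base),
]:
    print(f"{label}: roughly {size_mb:.0f} MB before the change")
```

The two artist_release_group estimates differ somewhat, which is expected given the rounding in the quoted numbers.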

          Michael Wiencek added a comment:

          This is likely because our materialized table which sorts the release groups only stores the first character of the name. The name is only factored in if all previous sort criteria are equal (official/unofficial, primary type, secondary types, and first release date), and storing the complete name would increase the size of the table significantly, as it's duplicated for every artist on the release.

          We should check again how much of a size increase storing the full titles would cause, and reconsider it if it does not regress sorting performance too much. I suppose we could also increase the number of characters stored (say, to 10, which would handle ISO dates), but having the complete name is obviously preferable, as we could then also correctly sort titles with volume/part numbers at the end.
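The effect described in this comment can be reproduced with a small sketch, assuming all earlier sort criteria are already equal so that only the stored part of the name decides the order. The titles and the sort_titles helper are hypothetical, and Python's plain string comparison stands in for whatever collation the database applies.

```python
def sort_titles(titles, prefix_len=None):
    """Sort titles by a key truncated to prefix_len characters (None means the full title).

    sorted() is stable, so titles whose truncated keys tie keep their input order,
    standing in for the arbitrary order in which a database may return tied rows.
    """
    def key(title):
        return title if prefix_len is None else title[:prefix_len]

    return sorted(titles, key=key)


# Hypothetical bootleg titles, deliberately listed out of order.
titles = [
    "2001-07-14: Berlin",
    "1999-03-02: Paris",
    "2001-07-14: Berlin, Volume 2",
    "1987-11-30: London",
    "2001-07-14: Berlin, Volume 1",
]

by_first_char = sort_titles(titles, prefix_len=1)   # what a one-character column allows
by_ten_chars = sort_titles(titles, prefix_len=10)   # enough for an ISO date prefix
by_full_name = sort_titles(titles)                  # complete name stored

for label, result in [
    ("first character", by_first_char),
    ("first 10 characters", by_ten_chars),
    ("full name", by_full_name),
]:
    print(label, result)
```

With a one-character key, the 1999 title stays ahead of the 1987 one because both keys are "1". Ten characters fixes the dates but still leaves "Volume 2" ahead of "Volume 1", and only the full-name key orders both correctly.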

            bitmap Michael Wiencek
            salo.rock salo.rock