Improvement
Resolution: Fixed
Normal
None
None
After overlong titles caused various issues with column indexing in Postgres on mirror servers, it was decided to put a prompt end to them.
Technically, the default B-tree index type that we use in Postgres requires each index page to hold at least 3 items, and a page is 8192 bytes, which is the default block size set when compiling Postgres. After page and item overhead, this allows at most 2704 bytes for each indexed string. Strings can contain multibyte characters, so a string composed only of code points that take four bytes in UTF-8 would be limited to 676 characters (2704 / 4).
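The byte-versus-character arithmetic can be checked with a few lines of Python. This is only an illustration: the constant names are hypothetical, and 2704 is the per-string figure quoted above, not a value computed from Postgres internals.

```python
# Illustrative arithmetic for the index size limits described above.
PAGE_SIZE = 8192          # default Postgres block size, in bytes
MAX_INDEXED_BYTES = 2704  # per-string limit quoted above (about a third of a page, minus overhead)


def utf8_length(s: str) -> int:
    """Return the number of bytes `s` occupies when encoded as UTF-8."""
    return len(s.encode("utf-8"))


# One character of each UTF-8 width: 1, 2, 3 and 4 bytes.
samples = ["a", "\u00e9", "\u97f3", "\U0001F3B5"]

if __name__ == "__main__":
    print(f"one third of a page: {PAGE_SIZE // 3} bytes")
    for ch in samples:
        width = utf8_length(ch)
        # How many characters of this width fit in the per-string byte limit.
        print(f"{ch!r}: {width} byte(s), at most {MAX_INDEXED_BYTES // width} characters")
```

For four-byte characters this gives 2704 // 4 = 676, the figure stated above.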
In terms of data quality, we also noticed that very few titles/names had a length over 1024 characters at that time (see their full list), and that most of these had been written intentionally to cause issues, while the few others were either mistakes or could be structured differently. In fact, titles are meant to be "minimal summaries" (see Title (publishing)), so overlong titles are likely better split into separate pieces of information that fit other fields of the database, such as work relationships.
In conclusion, to both preserve data quality and prevent further issues for mirrors, API clients, and any other software that has to handle data from the MusicBrainz database, the decision was made to limit titles/names to at most 1024 characters and at most 2704 bytes, which amounts to as few as 676 characters when every character takes four bytes.
This applies to all titles, names, and sort names, including those of artist credits and aliases.
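A client-side check of both limits could look like the sketch below. It is not MusicBrainz's own validation code: the function and constant names are hypothetical, and it assumes that "characters" means Unicode code points and that strings are stored as UTF-8.

```python
MAX_CHARS = 1024  # character limit from the decision above (assumed to count code points)
MAX_BYTES = 2704  # byte limit from the decision above (assumed UTF-8 encoding)


def check_name_length(value: str) -> list[str]:
    """Return a list of problems with `value`; an empty list means it is within both limits."""
    problems = []
    # len() on a Python str counts code points, not bytes.
    if len(value) > MAX_CHARS:
        problems.append(f"{len(value)} characters exceeds the limit of {MAX_CHARS}")
    # The byte limit is checked on the UTF-8 encoding of the string.
    encoded_size = len(value.encode("utf-8"))
    if encoded_size > MAX_BYTES:
        problems.append(f"{encoded_size} bytes exceeds the limit of {MAX_BYTES}")
    return problems


if __name__ == "__main__":
    print(check_name_length("x" * 1024))          # within both limits
    print(check_name_length("x" * 1025))          # too many characters
    print(check_name_length("\U0001F3B5" * 677))  # few enough characters, but too many bytes
```

Checking both limits separately matters because, for four-byte characters, the byte limit is reached well before the character limit.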