-
Sub-task
-
Resolution: Unresolved
Spotted at least the following three issues around the term “entity type”.
I) Shortening
First, “entity type” is often shortened to “entity”, which is a misuse of language that can lead to confusion, as these terms actually have different meanings: Artist is an entity type, Rick Astley is an (artist) entity. This becomes an issue because it occurs sometimes in the code and mostly in the documentation.
Suggested usage:
- Avoid shortening “entity type” to “entity”, starting with the documentation.
- Then gradually refactor the code as issues are spotted, e.g. "entities.json" should likely be renamed to "entity_types.json".
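The distinction between the two terms can be sketched in a few lines. This is only an illustration: `EntityType` and `Entity` are hypothetical names, not the classes used in musicbrainz-server, and the enum lists just two types for brevity.

```python
from dataclasses import dataclass
from enum import Enum


class EntityType(Enum):
    """A kind of thing (hypothetical, deliberately incomplete list)."""
    ARTIST = "artist"
    RELEASE = "release"


@dataclass(frozen=True)
class Entity:
    """One concrete thing that belongs to an entity type."""
    entity_type: EntityType
    name: str


# "Artist" is an entity type; "Rick Astley" is an (artist) entity.
rick = Entity(entity_type=EntityType.ARTIST, name="Rick Astley")
```

Under this reading, a file that lists artist, release, and so on describes entity types, which is why the "entity_types.json" name would fit better.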
II) Usage to end-users
Second, the term “entity” has a broader meaning in the code (where it can refer to any object model) than it has in the documentation and in the community (where it mostly refers to relatable or editable entities). In fact, this term has been introduced to end-users (editors and visitors) only very recently, through some very generic messages.
Genuine question: Can we avoid using it again in messages to end-users? It is quite an abstract concept, and we managed without it for many years.
Unfortunately, I don’t have any good suggestion here, but maybe just document massively misnamed code in HACKING.md at least.
III) Categorization terms
Last, various terms are used in the documentation and in the code to categorize entity types: “basic”, “core”, “musical production”, “MusicBrainz”, “normal”, “primary”, “relatable”, “secondary”, “supplementary”, “with editing restrictions”. Some of these terms are sometimes used as synonyms of one another. Some have contradictory meanings depending on the context.
- MusicBrainz API refers to (“MusicBrainz” or “core”) “entities” in a limited meaning: Only entity types that can have relationships, alphabetically ordered.
- MusicBrainz Database refers to “core data”, which includes, in this order: artists, release groups, releases, mediums, recordings, works, labels, relationships, URLs, and CD stubs. Most of these are entities, though, so they can easily be confused with “core entities”, whatever that means.
- MusicBrainz Database Schema refers to “primary entities” (by type) in a limited meaning: Only entity types that can have relationships, alphabetically ordered. It then refers to “secondary entities” for: artist credit, medium, and track.
- MusicBrainz Entity refers to “MusicBrainz” entity (types) in a limited meaning: Only entity types that can have relationships, alphabetically ordered. It splits into two categories: “normal” and “with editing restrictions”, alphabetically ordered.
- MusicBrainz Identifier refers to “entities” in a limited meaning: Only some things that have an MBID (among other properties): artists, release groups, releases, recordings, works, labels, areas, places and URLs. It mentions tracks separately. It adds that in the context of taggers, the MBIDs are most commonly used to identify: recording, release, label, track artist, release artist.
- python-musicbrainz3 refers to “core entities” with a meaning limited to: artist, label, medium, recording, release, release group, track (with an open question: is it a “top-level entity?”), url, work.
- Terminology refers to “Basic MusicBrainz entities” (should be entity types) in a limited meaning: Only entity types that can have relationships, in this specific ordering: artist, release group, release, recording, work, label, area, place, series, event, instrument, URL.
- Search selector in the navigation bar at https://musicbrainz.org (code) refers to “core entities” in a limited meaning: Only entity types that can have relationships, in this specific ordering: artist, (then “musical production”) event, recording, release, release group, series, work, (then “other core entities”) area, instrument, label, place.
- “Basic metadata” table at https://musicbrainz.org/statistics (code) refers to “core entities” in a quite broad meaning, in a specific ordering: artists, release groups, releases, mediums, recordings, tracks, labels, works, URLs, areas, series, instruments, events, genres. Then it refers to “other entities” as a catch-all: editors, relationships, collections, CD stubs, tags, ratings.
- Perl package MusicBrainz::Server::Entity (code) refers to “entity” (as a type/package/model) in a very broad meaning: Any object model with a row id.
- Perl package MusicBrainz::Server::Entity::CoreEntity (code) refers to “core entity” (as a type/package/model) in a quite broad meaning: any object model with a row id, a name, an MBID, and edits.
- File entities.json (code) refers to “entities” (should be entity types) in a broad meaning: cdtoc, gender, isrc, language, link/relationship…
- Flow.js types (code) refer to “core entity type” in a limited meaning: Only entity types that can have relationships.
- Types of entity that can have relationships (as in MusicBrainz relationships) are: area, artist, event, genre (new), instrument, label, place, recording, release, release group, series, url, work.
- Entities of all of these types also have an MBID and are editable, but editing entities of the following types requires specific privileges: area, genre, instrument.
- The main application of MBIDs has been and still is (even though it is no longer the only one) “managing a digital music collection”. Hence the higher importance of some relatable entity types: artist, label, recording, release, release group, series (of recordings), work; and also of some non-relatable entity types: artist credit, medium, track.
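The last few points can be written down as data. The following is a minimal sketch, assuming the lists given in this ticket are complete; the constant and function names are hypothetical and do not come from entities.json or from the server code.

```python
# Entity types that can have relationships, as listed in this ticket.
RELATABLE_ENTITY_TYPES = frozenset({
    "area", "artist", "event", "genre", "instrument", "label", "place",
    "recording", "release", "release group", "series", "url", "work",
})

# Relatable entity types whose entities need specific privileges to edit.
PRIVILEGED_EDIT_ENTITY_TYPES = frozenset({"area", "genre", "instrument"})


def is_relatable(entity_type: str) -> bool:
    """Return True if entities of this type can have relationships."""
    return entity_type in RELATABLE_ENTITY_TYPES


def editing_requires_privileges(entity_type: str) -> bool:
    """Return True if editing entities of this type is restricted.

    Only defined for relatable entity types, since the ticket makes this
    statement about those types only.
    """
    if not is_relatable(entity_type):
        raise ValueError(f"not a relatable entity type: {entity_type!r}")
    return entity_type in PRIVILEGED_EDIT_ENTITY_TYPES
```

For example, `editing_requires_privileges("genre")` is true while `editing_requires_privileges("artist")` is false, and `is_relatable("track")` is false because track is not in the relatable list.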
Audience | Data associated with row IDs | Public Domain data | Data associated with edits, MBIDs, relationships, Solr cores, and WS endpoints |
---|---|---|---|
MBS code/developers | Entity | CORE_ENTITY/CORE_TABLE | CoreEntity |
Developers using MB API/DB schema | data | core data/entity/table | core/primary entity |
Non-developers | (same if ever used) | (same if ever used) | (MB) entity |
Two terms are used with different meanings here:
- “Entity” doesn’t have the same meaning for developers and non-developers.
- Unfortunately it seems impossible to fix this, as “entity” is so widely used across MusicBrainz code, code using MusicBrainz, and the MusicBrainz community. However, it’s more or less alright, as no superset of MB entities is designated using the term “entity” for non-developers. For example, artist credit isn’t an entity for non-developers.
- “Core” is used with two different (even though overlapping) meanings at the same time. Note that “core” is also used in Solr for a logical (in Solr)/search (in MusicBrainz) index.
- With the exception of MBS code, this term is used for public domain data only, so in this first meaning it can be replaced with the already in-use term “public domain”. As for its second meaning, which is used in MBS code only, it can be replaced with a new term and given a clear definition. Using a new term will make a clean break from the current tainted term, which will still appear in old commits, old pull requests, and old tickets.
Suggestions
- Use Core/Supplementary for data only (w.r.t. license), do not use it for entity type
- Pros:
- This is its main current usage in the documentation.
- Avoid having different meanings for the same term: Currently it is mainly used for top-level entity types having a name and an MBID in the Catalyst/Perl code (which could probably use another term instead?) and for entity types having relationships in the Flow/JS code (which could use the below “relatable” instead).
- Cons:
- Even using it for data might not be self-explanatory.
- Some refactoring will be needed as there are 354 occurrences of "core" (entity/entities) in the musicbrainz-server repository.
- Audience: dump downloaders, jurists
- Use Relatable for entities that can have relationships (as in MusicBrainz relationships).
- Pros: This is quite self-explanatory (once you know “relationships”) and unambiguous.
- Cons: This term seems to be currently underused in the code (only in “entities.json” and code using these values) and not used at all in the documentation.
- Audience: developers (not just MB developers)
- Status: Already used for a long time and accepted
- Note: It differs from Linkable which is about hyperlinks rather than relationships.
- Use Permanent for entities that have MBID (GID in the code) and thus MBID redirects (as in MusicBrainz Identifier and for permanent link in entity page).
- Pros: This is quite self-explanatory (once you know MBID) and unambiguous.
- Cons: This term has not ever been used before in the code about entity itself.
- Audience: developers (not just MB developers)
- Fate: Refused as “it makes [bitmap] think of something that can't be removed”, see comments to the pull request #2706.
- Use Central for entities that can have edits, MBIDs, relationships, search indexes, and webservice endpoints, all at once.
- Pros: It’s not been used for anything else so far, it is among synonyms of previous “core”, and it is in the same lexical field as “relationship”.
- Cons: It isn’t self-explanatory either. (But there is no word to summarize editable, global, and relatable).
- Audience: developers (not just MB developers)
- Avoid all other terms except those below.
Minimal entity types:
The minimal subset of entity types that can be used for “managing a digital music collection”
The only two relatable entity types that can be entered without relationships are:
- recording, with mandatory artist credit
- release, with mandatory artist credit, medium, release group
These are linked through track.
- Relatable entity types: artist, recording, release, release group
- Non-relatable entity types: artist credit, medium, track
Other musical entity types:
The other entity types are more particularly about music or music production:
- Relatable entity types: genre, event, instrument, label, series, work
- Non-relatable entity types: ipi, isni, isrc, iswc...
Contextual entity types:
Everything else isn’t specifically about music, just contextual data:
- Relatable entity types: area, url
- Non-relatable entity types: country, gender, language, script...
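The three groups above can be sketched as a lookup table. This is an illustration under stated assumptions: the group keys and the `group_of` helper are hypothetical names, and the non-relatable lists that end with “...” in this ticket are left open-ended rather than completed.

```python
from typing import Optional, Tuple

# Group -> relatable / non-relatable entity types, copied from the lists above.
# Lists that end with "..." in the ticket are intentionally not completed here.
ENTITY_TYPE_GROUPS = {
    "minimal": {
        "relatable": ("artist", "recording", "release", "release group"),
        "non_relatable": ("artist credit", "medium", "track"),
    },
    "other musical": {
        "relatable": ("genre", "event", "instrument", "label", "series", "work"),
        "non_relatable": ("ipi", "isni", "isrc", "iswc"),
    },
    "contextual": {
        "relatable": ("area", "url"),
        "non_relatable": ("country", "gender", "language", "script"),
    },
}


def group_of(entity_type: str) -> Optional[Tuple[str, bool]]:
    """Return (group name, is_relatable), or None if the type is not listed."""
    for group, members in ENTITY_TYPE_GROUPS.items():
        if entity_type in members["relatable"]:
            return (group, True)
        if entity_type in members["non_relatable"]:
            return (group, False)
    return None
```

With this table, `group_of("track")` gives `("minimal", False)` and `group_of("url")` gives `("contextual", True)`. Note that `group_of("place")` gives `None`: place is listed among the relatable entity types earlier in this ticket but is not assigned to any of the three groups above.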
Has related issue: MBS-13003 Generate Flow types from properties in entities.json (Open)