• Type: New Feature
    • Resolution: Fixed
    • Priority: Normal
    • MBS-8393
    • Affects Version/s: None
    • Component/s: Entity attributes
    • None

      It seems there are releases (e.g. by BMG Direct) whose barcode uses Code 39 encoding. One example is the release entered in edit #15627943, but there are others.

      The text under the barcode is "D 140757", and the actual barcode decoded by e.g. onlinebarcodereader.com is:

      Format: CODE_39
      Type: TEXT
      
      D140757
      

      Should MusicBrainz support these barcodes in the barcode field, or is it better to add them as an additional catalog number?

      Discussion is also in this forum thread.

          [STYLE-787] Support "Code 39" barcodes

          Nicolás Tamargo added a comment -

          Being added as part of the implementation of MBS-8393

          Sean Burke added a comment -

          Took another look at this. For now, the best approach is probably to offer a limited subset of Code 39 sufficient for the one use case that's been mentioned, which is on BMG Direct releases. (D, numerals, and possibly space.) This also lets us do a minimal amount of validation on what's added. If we need broader support, that should probably be a new feature and a new ticket that gets added later. I'll start on this in a week or two unless someone else provides a concrete use case outside of the BMG Direct barcodes.
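
A minimal sketch of what validating that limited subset could look like. The pattern and the function name are assumptions for illustration, not MusicBrainz code; the digit count is left open because the ticket gives only one example ("D 140757").

```python
import re

# Hypothetical pattern for the restricted subset mentioned above:
# a leading "D", an optional single space, then one or more digits.
BMG_DIRECT_PATTERN = re.compile(r"D ?[0-9]+")


def is_bmg_direct_barcode(text: str) -> bool:
    """Return True if text uses only the restricted subset (D, digits, space)."""
    return BMG_DIRECT_PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ("D140757", "D 140757", "X140757", "D140757A"):
        print(repr(candidate), is_bmg_direct_barcode(candidate))
```

Restricting the input this tightly is what makes even minimal validation possible; the full Code 39 character set would accept far more strings.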

          Sean Burke added a comment -

          Based on discussion in #musicbrainz-devel, this is the approach I have decided to take with this:

          Code 39 barcodes will be stored in the same fashion as our current barcodes and entered using the same UI field. The validation of this field will be adjusted to allow the full set of characters allowed in Code 39. However, to prevent misentered barcodes, all-numeric barcodes will be stripped of all whitespace. Additionally, barcodes that validate as UPC/EAN after stripping punctuation and spaces will be stored in that stripped form.

          Some concern has been expressed that other forms of punctuation will creep into UPC/EAN barcodes, but in the interest of not restricting what users can enter overly much, numeric-and-punctuation barcodes won't be stripped. However, reports will be generated periodically after release to determine whether further restrictions would be beneficial.
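
The rules described in this comment can be sketched roughly as follows. This is an illustrative reading of the proposal, not the implemented validation: the function names are hypothetical, and accepting only lengths 8, 12 and 13 for UPC/EAN is an assumption. The character set is the standard Code 39 alphabet (digits, uppercase letters, space and `- . $ / + %`), and the check digit uses the usual GS1 modulo-10 weighting.

```python
import re

# Standard Code 39 data characters: digits, A-Z, space and - . $ / + %
CODE39_TEXT = re.compile(r"[0-9A-Z \-.$/+%]+")
CODE39_PUNCTUATION_AND_SPACE = re.compile(r"[ \-.$/+%]")


def has_valid_gtin_check_digit(digits: str) -> bool:
    """Check the GS1 modulo-10 check digit of an EAN-8, UPC-A or EAN-13."""
    if not (digits.isascii() and digits.isdigit()) or len(digits) not in (8, 12, 13):
        return False
    # Counting from the right, the check digit has weight 1, its neighbour 3, ...
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(reversed(digits)))
    return total % 10 == 0


def normalize_barcode(text: str) -> str:
    """Apply the stripping rules proposed in the comment above."""
    if not CODE39_TEXT.fullmatch(text):
        raise ValueError("not a valid Code 39 string")
    without_spaces = text.replace(" ", "")
    if without_spaces.isdigit():
        # All-numeric barcodes lose their whitespace.
        return without_spaces
    stripped = CODE39_PUNCTUATION_AND_SPACE.sub("", text)
    if has_valid_gtin_check_digit(stripped):
        # Validates as UPC/EAN once punctuation and spaces are removed.
        return stripped
    # Anything else, including numeric-and-punctuation input, is kept as entered.
    return text


if __name__ == "__main__":
    for raw in ("4 006381 333931", "0-36000-29145-2", "D 140757", "12-34"):
        print(repr(raw), "->", repr(normalize_barcode(raw)))
```

Under this reading, "D 140757" and "12-34" are stored unchanged, which is the case the periodic reports mentioned above would be meant to catch.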

          Alex Mauer added a comment -

          Yep, that seems reasonable to me.

          Sean Burke added a comment -

          That's a good point. What about allowing users to somehow note that a particular barcode they're entering is free text and not a UPC or EAN? This is encoding-agnostic, still allows for validation of UPC/EAN and hopefully allows Code 39 barcodes to be useful as identifiers.

          Alex Mauer added a comment -

          Right, but Code 39 has nothing to do with the fact that it's being used to store an identifier for the album.

          It's like saying that we should have a "Cyrillic" field to store anything written in Cyrillic. It just doesn't make sense to record "the code 39" because it could be anything written like that; unlike Cyrillic it is relatively hard for people to recognize how a barcode is encoded by looking at it, and unlike EAN/UPC, there is no particular standard for us to even detect "valid" vs. "invalid" code 39, except for falling outside the charset given above.

          I would wholly support allowing multiple barcodes and barcode formats, especially if they could be filled in semi-automatically by processing cover art images with a barcode scanner library like http://zbar.sourceforge.net/ (basically like MBS-3978 but with auto detection).

          Specifically "code 39" as a barcode field just makes little sense.

          Sean Burke added a comment -

          I see your point, but at the same time, Code 39 still seems to be used as an identifier of some sort for the album. Otherwise, why encode it as a barcode? Just as you say format says little about the meaning of the content, the potential uses of the format say little about the actual uses, and from a data standpoint, it seems problematic to create extra fields for any barcode standard which gets used as an identifier for CDs.

          As I recall, you and I had a similar discussion about this during the schema talks for BB. What if, as a middle ground, support for MBS-3978 were implemented and it was made possible to select UPC/EAN or Code 39, with the former as default and having the same restrictions as it does now?

          Alex Mauer added a comment -

          I guess that depends on what you mean by their basic purpose:

          UPC/EAN have a basic purpose of being a product number for ~anything.

          Code 39 has the basic purpose of storing an arbitrary string of the aforementioned characters, and therefore could have any actual purpose: from encoding the name of the album or artist, to the date of release or recording, to a copy of the catalog number (or even the UPC/EAN).

          The above also applies to most barcode formats, and in most cases the format says little about the meaning of the content. So storing that information is next to useless, and asking the user to determine it is asking for trouble, since users will mostly have no idea, and will just make that field very unreliable.

          Sean Burke added a comment -

          The problem I see with separating UPC/EAN and Code 39 barcodes is that their basic purpose is the same, and it would complicate querying releases by barcode. Though it's also not ideal, we could require the user to acknowledge that they're entering a Code 39 barcode. I'm not sure how we would educate users in this regard, but as it stands people can still enter UPC/EAN barcodes with missing or incorrect check digits even though the barcode itself reads fine. Acknowledging Code 39 is a bit more to ask, but potentially doable.

          nikki added a comment -

          I was initially fine with the idea of storing code 39 barcodes in the current barcode field, but after reading that conversation, I now really think the current barcode field should be defined as EANs/UPCs and we should add another way of entering other types of barcodes (something similar to what's proposed in MBS-3296 for works for example). This basically comes down to code 39 barcodes being too variable.

          Currently we strip spaces out of barcodes. People try entering them for UPCs/EANs because they're copying the numerical representation under the barcode, not the actual encoded barcode. The actual encoded barcode does not have spaces so they shouldn't be entered. According to Alex, allowing code 39 barcodes means we have to allow spaces and can't prevent people from inserting spaces where they don't belong. I think people attempting to put spaces in normal barcodes would be a far more common scenario than releases with code 39 barcodes. There are also EANs which have a letter before the barcode. This is not part of the barcode and shouldn't be entered, but since these would be valid code 39 barcodes, we can't prevent that either.

          According to previous comments, releases can have both a UPC/EAN and a code 39 barcode. This would mean people encountering these could just enter two barcodes in the barcode field and we wouldn't be able to detect these as being wrong either.
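
To make these concerns concrete, here is a small sketch showing that a bare Code 39 character-set check accepts all three kinds of mis-entry described in this comment. The sample values and the function name are made up for illustration; only the Code 39 alphabet itself is standard.

```python
# Standard Code 39 data characters: digits, A-Z, space and - . $ / + %
CODE39_ALPHABET = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ -.$/+%")


def is_code39_text(text: str) -> bool:
    """Return True if every character of text is in the Code 39 alphabet."""
    return bool(text) and all(ch in CODE39_ALPHABET for ch in text)


# Hypothetical mis-entries of the kinds described in the comment above.
problem_inputs = {
    "EAN copied with spaces": "4 006381 333931",
    "EAN with a printed letter prefix": "A4006381333931",
    "two barcodes in one field": "4006381333931 D140757",
}

if __name__ == "__main__":
    for label, value in problem_inputs.items():
        print(f"{label}: {value!r} accepted = {is_code39_text(value)}")
```

Every one of these strings passes, so a character-set check alone cannot tell a genuine Code 39 barcode from a mistyped UPC/EAN.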

            Assignee:
            yvanzo
            Reporter:
            Johannes W
            Votes:
            1
            Watchers:
            7

                Version Package
                MBS-8393