  1. Zapped: AcousticBrainz
  2. AB-301

Artist filtering for datasets


    • Improvement
    • Resolution: Unresolved
    • Normal
    • None
    • None
    • Dataset editor
    • None

      Part of the best practice in MIR machine learning is to perform "artist filtering": take a dataset and make sure that each artist appears only once in it, and never in more than one class. This prevents over-fitting during the model training stage.

      We have a hacky version of this in place, which runs during the training phase, but it is slow and hasn't been updated to reflect other changes in the database (https://tickets.metabrainz.org/browse/AB-300?focusedCommentId=42638&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-42638).

      I would like to move this artist filtering stage to the dataset editor. Once we do AB-115, we should be able to easily get a list of artists for each recording in the dataset.

      When we show the list of classes with the number of instances, we can show two numbers: one in low contrast with the actual number of items, and one in high contrast with the number of items left in the class after applying artist filtering. This way, the user can decide whether they really want to run the training process.
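
      A minimal sketch of how the two per-class numbers could be computed. The data shapes here are assumptions, not the existing AcousticBrainz code: `dataset` maps a class name to a list of recording IDs, and `recording_artists` maps a recording ID to a set of artist IDs (the kind of lookup AB-115 is expected to provide). The function names are hypothetical.

```python
def artist_filter(dataset, recording_artists):
    """Keep a recording only if none of its artists has been seen before.

    dataset: dict of class name -> list of recording IDs (assumed shape).
    recording_artists: dict of recording ID -> set of artist IDs (assumed shape).
    Recordings with no known artist are kept, which is an assumption.
    """
    seen_artists = set()
    filtered = {}
    for class_name, recordings in dataset.items():
        kept = []
        for recording in recordings:
            artists = recording_artists.get(recording, set())
            if artists & seen_artists:
                # At least one artist already appears elsewhere in the dataset.
                continue
            seen_artists |= artists
            kept.append(recording)
        filtered[class_name] = kept
    return filtered


def class_counts(dataset, recording_artists):
    """Return class name -> (actual count, count after artist filtering)."""
    filtered = artist_filter(dataset, recording_artists)
    return {
        name: (len(recordings), len(filtered[name]))
        for name, recordings in dataset.items()
    }
```

      This version keeps the first recording it meets for each artist, so the result depends on the order of classes and recordings. A real implementation would probably want a more deliberate rule for which recording to keep.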

      For challenges and cross-dataset evaluation, we also want to filter out artists who appear in the evaluation set - we could perform this filtering here too.
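
      The evaluation-set case could look roughly like this. Again a sketch under assumed data shapes: `dataset` maps class names to lists of recording IDs, `recording_artists` maps recording IDs to sets of artist IDs, and `evaluation_recordings` is a list of recording IDs in the evaluation set. None of these names come from the existing codebase.

```python
def exclude_evaluation_artists(dataset, recording_artists, evaluation_recordings):
    """Drop recordings that share an artist with the evaluation set.

    dataset: dict of class name -> list of recording IDs (assumed shape).
    recording_artists: dict of recording ID -> set of artist IDs (assumed shape).
    evaluation_recordings: iterable of recording IDs in the evaluation set.
    """
    # Collect every artist that appears on any evaluation recording.
    evaluation_artists = set()
    for recording in evaluation_recordings:
        evaluation_artists |= recording_artists.get(recording, set())

    # Keep only recordings with no artist in common with the evaluation set.
    return {
        class_name: [
            recording
            for recording in recordings
            if not (recording_artists.get(recording, set()) & evaluation_artists)
        ]
        for class_name, recordings in dataset.items()
    }
```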

      There are lots of ideas for extending this which aren't really in the scope of this ticket. We should open more tickets for:

      • Artists which change their style (how do you even begin to separate these?)
      • Artist aliases - we have two MBIDs for the same artist, e.g. they perform a similar style under a different name
      • Album filtering? Same concept; it is implicit if we use "normal" albums, but not for compilations by multiple artists.
      • Producer bias? This could be some interesting research.

            Unassigned
            alastairp Alastair Porter