  1. ListenBrainz
  2. LB-704

training data set split considerations

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Normal
    • None
    • listenbrainz-labs
    • None

      This ticket isn't actionable yet; it is mostly meant as a reminder that we have an issue to address going forward.

      Right now, when training our CF algorithm, we split our data set into training, validation, and test sets. This means that the tracks that end up in the test set are never recommended to users. For a user with very few listens, this means that some of that user's tracks will never be recommended to them.

      Perhaps we should randomize the tracks before breaking them into the three sets. We'll have to consider this issue more before we can proceed.
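
The shuffle-then-split idea above could be sketched roughly as follows. This is only an illustration: the function name `shuffled_split`, the 60/20/20 fractions, the seed, and the `(user, track)` row shape are all assumptions and are not taken from the listenbrainz-labs code. It shows only the mechanics of randomizing rows before cutting them into three sets. Whether that resolves the concern in this ticket is still open.

```python
import random


def shuffled_split(rows, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle rows, then cut them into training, validation and test sets.

    The fractions and seed are illustrative defaults, not project settings.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")

    # Copy first so the caller's data is left untouched.
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)

    n = len(shuffled)
    train_end = round(n * fractions[0])
    valid_end = train_end + round(n * fractions[1])
    return shuffled[:train_end], shuffled[train_end:valid_end], shuffled[valid_end:]


# Hypothetical listen rows: (user, track) pairs.
rows = [(user, track) for user in ("u1", "u2") for track in range(5)]
training, validation, test = shuffled_split(rows)
```

Every row still lands in exactly one of the three sets. The shuffle only changes which rows are held out, so that the held-out rows are not determined by the original ordering of the data.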

            Assignee: Unassigned
            Reporter: Robert Kaye (rob)