  1. ListenBrainz
  2. LB-469

ListenBrainz Labs: Splitting of data into training, validation and test sets should be principled rather than random.


    • Type: New Feature
    • Resolution: Unresolved
    • Priority: Normal

      Currently, ListenBrainz Labs randomly splits the pre-processed data into training, validation and test sets.

      What if all of a particular user's listens end up in the test set and the training set contains none of them? The model would then have nothing to learn that user's preferences from, so the recommendations for that user would probably be poor, or there would be none at all. The test data should be representative of the data set as a whole.
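To see how often this actually happens, a small check along the following lines could be run on any existing split. It is only a sketch: it assumes each row is a `(user, recording)` tuple, which may not match the real pre-processed schema, and `users_missing_from_training` is a hypothetical helper name, not an existing ListenBrainz Labs function.

```python
def users_missing_from_training(train, test):
    """Return the users that have rows in `test` but none in `train`.

    Assumption: every row is a (user, recording) tuple.
    """
    train_users = {user for user, _ in train}
    test_users = {user for user, _ in test}
    return sorted(test_users - train_users)


# Hypothetical example data: all of bob's listens landed in the test set.
train = [("alice", "rec1"), ("alice", "rec2"), ("carol", "rec3")]
test = [("bob", "rec4"), ("bob", "rec5"), ("alice", "rec6")]
print(users_missing_from_training(train, test))  # -> ['bob']
```

An empty result means every user being evaluated was also seen during training.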

      This is one thought; there may be other issues that arise from the random split. I am not sure how the data should be split, or whether we should actually spend time on splitting it more effectively.
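One possible direction, offered as a sketch and not as a decided approach, is to split each user's rows separately so that every user keeps at least one row in the training set. The example below uses plain Python for illustration. The `(user, recording)` row shape, the 70/15/15 fractions and the name `split_per_user` are all assumptions, and the real pipeline would need an equivalent in whatever framework it runs on.

```python
import random
from collections import defaultdict


def split_per_user(rows, fractions=(0.7, 0.15, 0.15), seed=0):
    """Split rows into train/validation/test one user at a time.

    Assumption: every row is a tuple whose first element is the user.
    Each user gets at least one training row; the third fraction is
    implicit, since the test set takes whatever is left over.
    """
    by_user = defaultdict(list)
    for row in rows:
        by_user[row[0]].append(row)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, validation, test = [], [], []
    for user in sorted(by_user):
        user_rows = by_user[user]
        rng.shuffle(user_rows)
        n = len(user_rows)
        n_train = max(1, int(n * fractions[0]))
        n_val = min(int(n * fractions[1]), n - n_train)
        train.extend(user_rows[:n_train])
        validation.extend(user_rows[n_train:n_train + n_val])
        test.extend(user_rows[n_train + n_val:])
    return train, validation, test


# Hypothetical data: user u0 has 1 listen, u1 has 2, and so on up to u4.
rows = [(f"u{i}", f"rec{j}") for i in range(5) for j in range(i + 1)]
train, validation, test = split_per_user(rows)
```

Users with very few listens end up only in the training set, which trades some evaluation coverage for the guarantee that no user is evaluated without having been seen in training.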

            Assignee: Unassigned
            Reporter: vansika (Vansika Pareek)
            Votes: 0
            Watchers: 3

              Created:
              Updated:
