New Feature
Resolution: Unresolved
Normal
Presently, ListenBrainz Labs is randomly splitting the pre-processed data into training, validation and test sets.
What if a particular user's listens end up concentrated in the test set, so that the training set contains none of that user's listens? The recommendations for that user would probably be poor, or there might be none at all. The test data should be representative of the data set as a whole.
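To make the concern concrete, here is a minimal sketch of a purely random split and a check for users who appear in the test set but have no training listens. It assumes listens are plain (user, recording) tuples and a 60/20/20 split; the function names `random_split` and `users_missing_from_train` are hypothetical and are not ListenBrainz Labs code. Because each listen is assigned independently, a user with k listens has no training listens with probability 0.4^k under these assumed fractions, so users with only a handful of listens are the ones at risk.

```python
import random


def random_split(listens, fractions=(0.6, 0.2, 0.2), seed=42):
    """Assign each (user, recording) listen independently to train/validation/test.

    Hypothetical stand-in for a purely random split; not ListenBrainz Labs code.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)
    train, validation, test = [], [], []
    train_cut = fractions[0]
    validation_cut = fractions[0] + fractions[1]
    for listen in listens:
        r = rng.random()  # uniform in [0, 1)
        if r < train_cut:
            train.append(listen)
        elif r < validation_cut:
            validation.append(listen)
        else:
            test.append(listen)
    return train, validation, test


def users_missing_from_train(train, test):
    """Return users that appear in the test set but have no listens in training."""
    train_users = {user for user, _ in train}
    return {user for user, _ in test} - train_users


if __name__ == "__main__":
    # "bob" has only two listens, so a random split can easily leave him out of training.
    listens = [("alice", f"rec{i}") for i in range(50)] + [("bob", "rec0"), ("bob", "rec1")]
    train, validation, test = random_split(listens)
    print(users_missing_from_train(train, test))  # contents depend on the seed
```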
This is just one concern; there may be other issues that arise from a purely random split. I am not sure how the data should be split, or whether we should actually spend time on splitting it more carefully.
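One possible alternative, sketched here only as an illustration, is to split each user's listens separately, so that every user keeps at least one listen in the training set. This assumes the same hypothetical (user, recording) tuples; `per_user_split` is not an existing ListenBrainz Labs function, and whether this is the right strategy is exactly the open question above.

```python
import random
from collections import defaultdict


def per_user_split(listens, fractions=(0.6, 0.2, 0.2), seed=42):
    """Split each user's listens separately (hypothetical per-user split).

    Validation and test sizes are rounded down, and training receives the
    remainder, so every user with at least one listen keeps one in training.
    """
    if abs(sum(fractions) - 1.0) > 1e-9 or fractions[0] <= 0:
        raise ValueError("fractions must sum to 1 with a positive training share")
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for listen in listens:
        by_user[listen[0]].append(listen)

    train, validation, test = [], [], []
    for user in sorted(by_user):  # sorted for reproducible output
        user_listens = by_user[user][:]
        rng.shuffle(user_listens)
        n = len(user_listens)
        n_validation = int(n * fractions[1])
        n_test = int(n * fractions[2])
        n_train = n - n_validation - n_test  # always >= 1 when n >= 1
        train.extend(user_listens[:n_train])
        validation.extend(user_listens[n_train:n_train + n_validation])
        test.extend(user_listens[n_train + n_validation:])
    return train, validation, test
```

With 10 listens a user gets 6/2/2, while a user with a single listen keeps it in training. The trade-off is that users with very few listens contribute little or nothing to validation and test.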