[LB-704] training data set split considerations - MetaBrainz Tickets

Type: Improvement
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Component/s: listenbrainz-labs
Labels: None

This ticket isn't actionable yet; it is mostly meant as a reminder that we have an issue to address going forward.

Right now, when training our CF algorithm, we split our data set into training, validation and test sets. This means that the tracks that end up in the test set are never recommended to users. If a user has very few listens, this means that some of these tracks will never be recommended to that user.

Perhaps we should randomize tracks before breaking them into three sets. We'll have to consider this issue more before we can proceed.
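
As a rough illustration of the idea above, here is a minimal sketch of shuffling rows before cutting them into three sets. It is not the listenbrainz-labs code: the function name `shuffled_split`, the 80/10/10 fractions, the seed, and the `(user, track, listen_count)` row layout are all assumptions made for the example.

```python
import random


def shuffled_split(rows, fractions=(0.8, 0.1, 0.1), seed=42):
    """Shuffle rows, then cut them into training, validation and test lists.

    Hypothetical helper: the fractions and the seed are illustrative defaults.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")

    # Copy first so the caller's list is left untouched, then shuffle
    # with a seeded generator so the split is reproducible.
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)

    n = len(shuffled)
    train_end = int(n * fractions[0])
    valid_end = train_end + int(n * fractions[1])

    # Whatever remains after the first two cuts becomes the test set.
    return (
        shuffled[:train_end],
        shuffled[train_end:valid_end],
        shuffled[valid_end:],
    )


# Made-up (user, track, listen_count) rows, for illustration only.
rows = [("user_%d" % (i % 5), "track_%d" % i, 1 + i % 3) for i in range(100)]
training, validation, test = shuffled_split(rows)
```

Shuffling only changes which rows land in each set. Rows in the test set are still held out of training, so whether this addresses the problem for users with very few listens still needs the further consideration mentioned above.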

Assignee: Unassigned

Reporter: Robert Kaye

Votes: 0

Watchers: 1

Created: 2020-08-17 17:41

Updated: 2023-12-09 10:33
