  1. ListenBrainz
  2. LB-469

ListenBrainz Labs: Splitting of data into training, validation and test sets should be more logical and not intuitive.

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Normal
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Presently, ListenBrainz Labs splits the pre-processed data randomly into training, validation and test sets.

      What if all of a particular user's listens end up in the test set and the training set has no listens from that user? The quality of recommendations for that user will probably be poor, or there may be no recommendations at all. Test data should be representative of the data set as a whole.

      This is one thought; there may be other issues that arise from this random split. I am not sure how the data should be split, or whether we should actually spend time on splitting it effectively.
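
      One possible direction, sketched below in plain Python purely for illustration: split each user's listens separately, so that every user keeps at least one listen in the training set. The helper name `split_per_user`, the (user, recording) tuple layout and the 60/20/20 fractions are assumptions made for this example and are not taken from the ListenBrainz Labs code.

```python
import random
from collections import defaultdict


def split_per_user(listens, train_frac=0.6, val_frac=0.2, seed=0):
    """Split (user, recording) pairs user by user instead of globally.

    Hypothetical helper for illustration only; not part of ListenBrainz Labs.
    """
    rng = random.Random(seed)

    # Group listens by user so each user can be split independently.
    by_user = defaultdict(list)
    for user, recording in listens:
        by_user[user].append((user, recording))

    train, val, test = [], [], []
    for user in sorted(by_user):
        rows = by_user[user]
        rng.shuffle(rows)
        n = len(rows)
        # Keep at least one listen per user in the training set.
        n_train = max(1, int(n * train_frac))
        n_val = int(n * val_frac)
        train.extend(rows[:n_train])
        val.extend(rows[n_train:n_train + n_val])
        test.extend(rows[n_train + n_val:])
    return train, val, test


if __name__ == "__main__":
    sample = [("alice", f"rec{i}") for i in range(10)] + [("bob", "rec0")]
    train, val, test = split_per_user(sample)
    print(len(train), len(val), len(test))
```

      With a split like this, a user with very few listens contributes only to the training set, which may or may not be the desired behaviour; whether such users should be filtered out instead is a separate decision.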

            People

            • Assignee:
              Unassigned
              Reporter:
              vansika Vansika Pareek
            • Votes:
              0
              Watchers:
              3

              Dates

              • Created:
                Updated:
