  1. ListenBrainz
  2. LB-440

Training and evaluation script for LB playground.

    • Type: Task
    • Resolution: Won't Fix
    • Priority: Normal
    • None
    • listenbrainz-labs
    • None

      Now that we are starting to create models with the collaborative filter in Apache Spark, we need to make the results of this filter available to our community so that people can give feedback on them.

      We need a new script that follows the guidelines outlined in this howto guide:

      https://github.com/metabrainz/metabrainz-howto-guides/blob/master/writing_effective_data_processing_scripts.md

      The script should do the following:

      1. Given a training data set (partial or full), pre-process it so that it can be fed to the CF algorithm, then train a model on it.
      2. Once the model is trained, output a set of 50 suggested tracks for each user in a given list of users. At first we should assume that the users are iliekcomputers, pristine and ruaok, but anyone in our community should be able to ask to be added to this list.
      3. On each run, write a simple, but quite likely large, HTML file that describes the statistics of the data set and the other inputs used to generate the model. It should include a nicely formatted table with the 50 tracks for each user, so that each user can easily view their recommended tracks.
      4. All of the parameters that go into defining the data set and training the model must be clearly identified in the output of the script. We need to be able to re-run the script on the exact same data set and reproduce the exact same results as before.
      5. Once the file is generated, it should be placed in a directory where users can browse the list of files. Publishing that directory is not part of this task: ruaok will make sure that its contents are available to the public.
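
      As a rough illustration of items 3 and 4, the report step could look something like the sketch below. It uses only the Python standard library and is not the actual LB playground script: the function names (`run_fingerprint`, `render_report`), the shape of the inputs and the idea of printing a hash of the parameters are all assumptions made for this example. The training step itself (Spark CF) is left out.

      ```python
      import hashlib
      import html
      import json


      def run_fingerprint(params):
          """Return a stable SHA-256 hex digest of the run parameters.

          Hypothetical helper: two runs with identical parameters get the
          same digest, which makes it easy to tell reports apart.
          """
          canonical = json.dumps(params, sort_keys=True)
          return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


      def _table(headers, rows):
          """Render a plain HTML table, escaping every cell."""
          head = "".join(f"<th>{html.escape(str(h))}</th>" for h in headers)
          body = "".join(
              "<tr>"
              + "".join(f"<td>{html.escape(str(c))}</td>" for c in row)
              + "</tr>"
              for row in rows
          )
          return f"<table><tr>{head}</tr>{body}</table>"


      def render_report(params, dataset_stats, recommendations):
          """Build the HTML report as a string.

          Assumed input shapes: params and dataset_stats are dicts with
          string keys; recommendations maps a user name to an ordered
          list of (artist, track) tuples.
          """
          parts = ["<html><body>", "<h1>Recommendation run</h1>"]
          parts.append(f"<p>Run fingerprint: {run_fingerprint(params)}</p>")
          parts.append("<h2>Parameters</h2>")
          parts.append(_table(["Parameter", "Value"], sorted(params.items())))
          parts.append("<h2>Data set</h2>")
          parts.append(_table(["Statistic", "Value"], sorted(dataset_stats.items())))
          # One section and one table per user, in a stable order.
          for user in sorted(recommendations):
              parts.append(f"<h2>{html.escape(user)}</h2>")
              rows = [
                  (position, artist, track)
                  for position, (artist, track) in enumerate(
                      recommendations[user], start=1
                  )
              ]
              parts.append(_table(["#", "Artist", "Track"], rows))
          parts.append("</body></html>")
          return "\n".join(parts)
      ```

      A call such as `render_report({"rank": 10, "seed": 42}, {"listens": 1000}, {"ruaok": [("Some Artist", "Some Track")]})` returns the page as a string that the script can write to a file. The parameter names and values here are made up for illustration. Recording the random seed alongside the other parameters is what would make a later run comparable, assuming the training step honours it.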

      With this script, all of the users should be able to see the tracks that were recommended for them and how the model was created. The goal is for the community to learn collectively how to train models and how to make better recommendations.

       

            vansika Vansika Pareek
            rob Robert Kaye