  1. Zapped: AcousticBrainz
  2. AB-300

Problem running dataset evaluation script due to absence of file - recordingtoartistmap.json

    • Type: Bug
    • Resolution: Fixed
    • Priority: Normal
    • Components: Dataset training, Server

       

      File "/home/vagrant/acousticbrainz-server/dataset_eval/../dataset_eval/artistfilter.py", line 20, in <module>
      rtoajson = json.load(open("recordingtoartistmap.json"))
      IOError: [Errno 2] No such file or directory: 'recordingtoartistmap.json'

       


          Alastair Porter added a comment - The artistfilter module no longer fails if this json file doesn't exist

          Alastair Porter added a comment -

          Thanks for looking into this. I opened a ticket to explain the reasoning behind artist filtering: AB-301

          The filtering process was very slow when I initially implemented this feature, so I added this quick hack with the cache file to make it faster (when I was using the data for a project).
          We haven't used artist filtering since, so I think we could clean this up. I recommend we do two things:

          1. Right now, change the loading of this file to set `rtoajson = {}` if the file `recordingtoartistmap.json` doesn't exist. This is an easy fix which we should have had in place a long time ago.
          2. Use our built-in cache (right now Memcache, but Redis once we merge this PR: https://github.com/metabrainz/acousticbrainz-server/pull/225). This should let us look up the data mapping recording->artist and keep it in the cache so that this step gets faster.
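
The first recommendation (fall back to an empty mapping when `recordingtoartistmap.json` is missing) could look roughly like the sketch below. The helper name `load_recording_to_artist_map` is hypothetical and not taken from `artistfilter.py`; only the file name and the `rtoajson = {}` fallback come from the comment above.

```python
import json


def load_recording_to_artist_map(path="recordingtoartistmap.json"):
    """Return the cached recording -> artist mapping, or {} if the file cannot be opened.

    Hypothetical helper: the name and signature are illustrative only.
    """
    try:
        with open(path) as f:
            return json.load(f)
    except IOError:
        # Raised when the file is missing or unreadable.
        # On Python 3, IOError is an alias of OSError, so this works on both 2 and 3.
        return {}


# Usage: replaces the module-level json.load(open(...)) call.
rtoajson = load_recording_to_artist_map()
```

With this in place, importing the module no longer raises `IOError: [Errno 2]` when the cache file is absent; lookups simply start from an empty mapping.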

          Nupur Baghel added a comment -

          @Alastair Porter 

          While debugging the issue I found that the file artistfilter.py uses the line

          rtoajson = json.load(open("recordingtoartistmap.json"))

          to build a cache of the MBIDs that have been looked up. However, the file recordingtoartistmap.json is never created, and nothing ever writes to it. The code to fix this can be added, but there is an additional issue. When I try to evaluate a dataset using "Filter by Artist" I get the error below:

          sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) column "mbid" does not exist

          LINE 1: SELECT mbid::text, data->'metadata'->'tags'->'musicbrainz_ar...

          So it is not clear to me whether the "Filter by Artist" feature is used at all. The job remains in status "running" forever when the error occurs.
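
Since the cache file is only ever read, one way to add the missing write step is sketched below. It assumes the mapping is a plain dict that serialises to JSON and that Python 3 is available (for `os.replace`); the function name `save_recording_to_artist_map` is hypothetical and does not exist in the codebase.

```python
import json
import os
import tempfile


def save_recording_to_artist_map(mapping, path="recordingtoartistmap.json"):
    """Write `mapping` to `path` as JSON, replacing any previous file.

    Hypothetical helper: the name and signature are illustrative only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory first, so that a crash
    # part-way through cannot leave a truncated cache file behind.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(mapping, f)
        os.replace(tmp_path, path)
    except Exception:
        os.remove(tmp_path)
        raise
```

Calling this after new recording->artist lookups have been added to the in-memory dict would let a later run start from the saved mapping instead of an empty one.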

           


            Assignee: Nupur Baghel
            Reporter: Nupur Baghel