ListenBrainz / LB-868

Try to reduce the number of times a Listen is converted to and from JSON during ingestion


    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Normal

      We suspect that the listen import pipeline spends a lot of time converting between strings, JSON objects, and the Listen class. We should audit the pipeline to verify whether this is the case and, if so, try to optimise it.

      From my quick investigation we have the following things happening:

      webserver -> convert text to dict with ujson -> create Listen object for each item -> write back to string with json and add to queue

      ts writer -> read from queue [how many at a time?] -> convert to object with ujson -> convert to Listen object -> serialise to row to insert into postgres (ujson) -> also serialise listen to push to unique queue (ujson)
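
The two flows above can be sketched as follows, to make each conversion explicit and countable. This is a minimal illustration, not the actual ListenBrainz code: the `Listen` dataclass, its fields, the function names, and the row layout are all hypothetical, and the standard-library `json` module stands in for ujson so that the sketch has no third-party dependency.

```python
import json  # stand-in for ujson so the sketch has no third-party dependency
from dataclasses import dataclass, asdict


@dataclass
class Listen:
    """Hypothetical, heavily simplified stand-in for the real Listen class."""
    user_name: str
    listened_at: int
    track_name: str
    artist_name: str

    @classmethod
    def from_dict(cls, d):
        # A missing key raises KeyError, which plays the role of validation here.
        return cls(d["user_name"], d["listened_at"], d["track_name"], d["artist_name"])

    def to_dict(self):
        return asdict(self)


def webserver_stage(request_body):
    """text -> dicts -> Listen objects -> text for the queue."""
    payload = json.loads(request_body)                      # JSON conversion 1
    listens = [Listen.from_dict(item) for item in payload]  # dict -> Listen
    return json.dumps([l.to_dict() for l in listens])       # JSON conversion 2


def ts_writer_stage(message):
    """text -> dicts -> Listen objects -> database rows, plus text for the unique queue."""
    payload = json.loads(message)                            # JSON conversion 3
    listens = [Listen.from_dict(item) for item in payload]   # dict -> Listen
    # Assumed row layout: (listened_at, user_name, serialised listen).
    rows = [(l.listened_at, l.user_name, json.dumps(l.to_dict()))  # JSON conversion 4
            for l in listens]
    unique_message = json.dumps([l.to_dict() for l in listens])    # JSON conversion 5
    return rows, unique_message
```

In this sketch each listen passes through five JSON conversions and two Listen constructions between the HTTP request and the database row, which is the kind of count the audit would need to confirm against the real code.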

      I think that the conversion in both places is necessary: in the webserver to validate the request, and in the ts writer to convert from the listen format to what is needed to insert into the database. Additionally, if we want to move MessyBrainz into the ts writer, then we definitely need this conversion to happen.

      The only possible speed improvement that I can see here is to omit the conversion to a Listen object. However, I think that we should keep it, as it helps us understand the data structure better. I would only recommend removing it if we can clearly show that it causes something like a 2x slowdown during reading/writing.
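
One way to check the 2x threshold is a small micro-benchmark that compares a JSON round trip with and without the intermediate Listen object. The following is a rough sketch under stated assumptions: the `Listen` class, its fields, and the helper names are hypothetical, the payload is synthetic, and `json` stands in for ujson, so the absolute numbers will differ from the real services.

```python
import json  # stand-in for ujson; swap in ujson to match the real pipeline
import timeit


class Listen:
    """Hypothetical minimal Listen; the real class has more fields and logic."""

    def __init__(self, user_name, listened_at, track_metadata):
        self.user_name = user_name
        self.listened_at = listened_at
        self.track_metadata = track_metadata

    @classmethod
    def from_dict(cls, d):
        return cls(d["user_name"], d["listened_at"], d["track_metadata"])

    def to_dict(self):
        return {
            "user_name": self.user_name,
            "listened_at": self.listened_at,
            "track_metadata": self.track_metadata,
        }


def with_listen(text):
    """Round trip through JSON and the Listen object."""
    return json.dumps([Listen.from_dict(d).to_dict() for d in json.loads(text)])


def dict_only(text):
    """Round trip through JSON only, skipping the Listen object."""
    return json.dumps(json.loads(text))


def slowdown_ratio(n_listens=1000, number=50):
    """Return time(with Listen) / time(dict only) for a synthetic batch."""
    batch = json.dumps([
        {
            "user_name": "u",
            "listened_at": 1_600_000_000 + i,
            "track_metadata": {"artist_name": "a", "track_name": f"t{i}"},
        }
        for i in range(n_listens)
    ])
    # Take the best of three runs to reduce noise from other processes.
    t_listen = min(timeit.repeat(lambda: with_listen(batch), number=number, repeat=3))
    t_dict = min(timeit.repeat(lambda: dict_only(batch), number=number, repeat=3))
    return t_listen / t_dict


if __name__ == "__main__":
    print(f"slowdown with Listen object: {slowdown_ratio():.2f}x")
```

A ratio at or above roughly 2 would meet the threshold suggested above. A synthetic benchmark like this is only indicative, so profiling the real webserver and ts writer would still be needed before removing anything.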

            Assignee: Unassigned
            Reporter: alastairp (Alastair Porter)