Uploaded image for project: 'Picard'
  1. Picard
  2. PICARD-441

High priority webservice queue not actually prioritised

    • Icon: Bug Bug
    • Resolution: Fixed
    • Icon: Normal Normal
    • 1.3
    • 1.2
    • Other
    • None
    • Win8 64-bit

      As I understand it, Picard should dispatch work from the high_priority_queues before the low-priority_queues, however reading the code in file: webservice.py function: _run_next_task it seems to me that it handles these with equal priority because for each key in _hosts it checks both queues rather than checking the high_priority queue first and only then looking in the low_priority_queue:

              for key in self._hosts:
                  queue = self._high_priority_queues.get(key) or self._low_priority_queues.get(key)
                  if not queue:
                      continue
                  now = time.time()
                  last = self._last_request_times.get(key)
                  request_delay = REQUEST_DELAY[key]
                  last_ms = (now - last) * 1000 if last is not None else request_delay
                  if last_ms >= request_delay:
                      self.log.debug("Last request to %s was %d ms ago, starting another one", key, last_ms)
                      d = request_delay
                      queue.popleft()()
                  else:
                      d = request_delay - last_ms
                      self.log.debug("Waiting %d ms before starting another request to %s", d, key)
                  if d < delay:
                      delay = d
      

      Should this code be more like:

      for priority_queues in (self._high_priority_queues,self._low_priority_queues)
              for key in self._hosts:
                  queue = priority_queues.get(key)
                  if not queue:
                      continue
                  now = time.time()
                  last = self._last_request_times.get(key)
                  request_delay = REQUEST_DELAY[key]
                  last_ms = (now - last) * 1000 if last is not None else request_delay
                  if last_ms >= request_delay:
                      self.log.debug("Last request to %s was %d ms ago, starting another one", key, last_ms)
                      d = request_delay
                      queue.popleft()()
                  else:
                      d = request_delay - last_ms
                      self.log.debug("Waiting %d ms before starting another request to %s", d, key)
                  if d < delay:
                      delay = d
      

      If experts agree with this I will code, test and create a pull request for it.

      S

          [PICARD-441] High priority webservice queue not actually prioritised

          Michael Wiencek added a comment - https://github.com/musicbrainz/picard/pull/166

          Sophist added a comment -

          PR-166 addresses this issue.

          Sophist added a comment - PR-166 addresses this issue.

          Sophist added a comment -

          IMO the high priority queue should be kept short by having a very limited set of things here and IMO we should limit this to interactive stuff plus redirects - with all the high-volume stuff i.e. loading collections in the low-priority queue.

          I will put this next on my list of fixes.

          Sophist added a comment - IMO the high priority queue should be kept short by having a very limited set of things here and IMO we should limit this to interactive stuff plus redirects - with all the high-volume stuff i.e. loading collections in the low-priority queue. I will put this next on my list of fixes.

          The web service requests are probably asynchronous i.e. queue.pop()() pops and executes a function call which sends a request. The loop should then continue and look at the next key - which it does. So the code as written is correct. Apologies.

          Yeah, that's accurate. Nothing to apologize over - this isn't exactly the clearest code ever written.

          I think the suggestions you've made make sense. Of course, they'll require testing to see what kind of difference they make (hopefully good).

          It's been a while since I've written this code. The "important" flag seems like a hack that avoids having to add a medium-priority queue. The only thing you suggest using it for is redirects, but I question whether it's even necessary to prioritize those over interactive actions like refreshing, etc. Like I said, fine-tuning this will require some testing - we might find that the important flag isn't necessary at all.

          Michael Wiencek added a comment - The web service requests are probably asynchronous i.e. queue.pop()() pops and executes a function call which sends a request. The loop should then continue and look at the next key - which it does. So the code as written is correct. Apologies. Yeah, that's accurate. Nothing to apologize over - this isn't exactly the clearest code ever written. I think the suggestions you've made make sense. Of course, they'll require testing to see what kind of difference they make (hopefully good). It's been a while since I've written this code. The "important" flag seems like a hack that avoids having to add a medium-priority queue. The only thing you suggest using it for is redirects, but I question whether it's even necessary to prioritize those over interactive actions like refreshing, etc. Like I said, fine-tuning this will require some testing - we might find that the important flag isn't necessary at all.

          Sophist added a comment -

          I would also add that in several functions the default Important=True.

          Important effectively means that the request is queued LIFO, so IMO its usage should be restricted to specific requests which make sense to queue-jump, whilst all other requests should be Important=False.

          Michael - what do you think?

          S

          Sophist added a comment - I would also add that in several functions the default Important=True. Important effectively means that the request is queued LIFO, so IMO its usage should be restricted to specific requests which make sense to queue-jump, whilst all other requests should be Important=False. Michael - what do you think? S

          Sophist added a comment -

          Upon reflection, I think I have misunderstood how the requests are processed.

          The web service requests are probably asynchronous i.e. queue.pop()() pops and executes a function call which sends a request. The loop should then continue and look at the next key - which it does. So the code as written is correct. Apologies.

          As for the queue priorities, based on what you have said I would personally queue requests as follows:

          Album refresh (Ctrl-R): High/Unimportant - user is probably waiting for this
          Album Other Versions: High/Unimportant - user is definitely waiting for this

          Submitting AcoustIDs, ratings, or releases to a collection: High/Unimportant - user may want to go to MusicBrainz and do some editing based on submitted data.

          Add files / folder: Low/Unimportant - queue in FIFO but below interactive and secondary lookups.
          Redirects & Secondary lookups: High/Important - want to complete active requests as quickly as possible.
          Downloading album art: High/Unimportant - Art downloads are likely to be on a different key to MB lookups so will not block them.

          I haven't checked the code in detail, but based on my own experiences, I am certain that this is not quite the code which has been used - if I am loading my collection and do a Refresh or select Other Version then the request for this is queued behind all the existing Album lookups.

          I realise that the original subject of this is now effectively closed (my mistake - apologies) - I suggest that once you have responded to this comment we close this and take the discussion e.g. to IRC if that is OK with you.

          S

          Sophist added a comment - Upon reflection, I think I have misunderstood how the requests are processed. The web service requests are probably asynchronous i.e. queue.pop()() pops and executes a function call which sends a request. The loop should then continue and look at the next key - which it does. So the code as written is correct. Apologies. As for the queue priorities, based on what you have said I would personally queue requests as follows: Album refresh (Ctrl-R): High/Unimportant - user is probably waiting for this Album Other Versions: High/Unimportant - user is definitely waiting for this Submitting AcoustIDs, ratings, or releases to a collection: High/Unimportant - user may want to go to MusicBrainz and do some editing based on submitted data. Add files / folder: Low/Unimportant - queue in FIFO but below interactive and secondary lookups. Redirects & Secondary lookups: High/Important - want to complete active requests as quickly as possible. Downloading album art: High/Unimportant - Art downloads are likely to be on a different key to MB lookups so will not block them. I haven't checked the code in detail, but based on my own experiences, I am certain that this is not quite the code which has been used - if I am loading my collection and do a Refresh or select Other Version then the request for this is queued behind all the existing Album lookups. I realise that the original subject of this is now effectively closed (my mistake - apologies) - I suggest that once you have responded to this comment we close this and take the discussion e.g. to IRC if that is OK with you. S

          Sophist added a comment -

          Yes - that was my assumption - I was assuming that there are a finite number of threads, and when a thread completes a request it looks for another to process - and that it should then check first to see if there is a high priority request for any host, and then check for low-priority requests for any host.

          It would still respect the rate limits for each host because these are stored separately from the queues.

          For example, if there were 2 requests in the HPQ and 1 in the LPQ for host ('musicbrainz.org', 80), your code would execute the first request in the HPQ for that host, and then execute the one in the LPQ, even though there's still 1 in the HPQ which should finish first.

          In this example, since the first entry in the HPQ executes a queue.pop, I would expect the loop to stop (though I do not see a break or return statement which might seem to be necessary - not sure what the queue.pop). Then when this HPQ finishes, I would expect the loop to start again from the top and check the HPQ again first. (And in any case, even if we just popped the last entry in the HPQ, we need to check the HPQ again before we check the LPQ because in the meantime another entry might have been made to the HPQ.)

          I do see your point on Album Art - I would not want to go and work on a loaded album until I could see the Album Art, so I would want it to stay in [loading album information] until it had downloaded the art.

          I know I am a novice at how Picard works, and this expert view is exactly what I needed. I have to admit that whilst my general programming skills are good, my python is intermediate and (without documentation or comments) my ability to understand how the Picard code works is variable.

          The good news is that we share the same perspective - which is that the interactive activities of the user should be prioritised over background batched activities like loading the collection.

          S

          Sophist added a comment - Yes - that was my assumption - I was assuming that there are a finite number of threads, and when a thread completes a request it looks for another to process - and that it should then check first to see if there is a high priority request for any host, and then check for low-priority requests for any host. It would still respect the rate limits for each host because these are stored separately from the queues. For example, if there were 2 requests in the HPQ and 1 in the LPQ for host ('musicbrainz.org', 80), your code would execute the first request in the HPQ for that host, and then execute the one in the LPQ, even though there's still 1 in the HPQ which should finish first. In this example, since the first entry in the HPQ executes a queue.pop, I would expect the loop to stop (though I do not see a break or return statement which might seem to be necessary - not sure what the queue.pop). Then when this HPQ finishes, I would expect the loop to start again from the top and check the HPQ again first. (And in any case, even if we just popped the last entry in the HPQ, we need to check the HPQ again before we check the LPQ because in the meantime another entry might have been made to the HPQ.) I do see your point on Album Art - I would not want to go and work on a loaded album until I could see the Album Art, so I would want it to stay in [loading album information] until it had downloaded the art. I know I am a novice at how Picard works, and this expert view is exactly what I needed. I have to admit that whilst my general programming skills are good, my python is intermediate and (without documentation or comments) my ability to understand how the Picard code works is variable. The good news is that we share the same perspective - which is that the interactive activities of the user should be prioritised over background batched activities like loading the collection. S

          Ah, I think your initial confusion was in assuming a high-priority request for one host should execute before a low-priority request for another host, which isn't necessarily the case. There are separate queues for each host because they can have different rate limits.

          Also, your suggested code would actually do the same thing as the current code in most cases, due to the rate limits. But it would interleave the HPQ and LPQ requests for hosts without any rate limit.

          Michael Wiencek added a comment - Ah, I think your initial confusion was in assuming a high-priority request for one host should execute before a low-priority request for another host, which isn't necessarily the case. There are separate queues for each host because they can have different rate limits. Also, your suggested code would actually do the same thing as the current code in most cases, due to the rate limits. But it would interleave the HPQ and LPQ requests for hosts without any rate limit.

          It doesn't handle them with equal priority. This line:

          queue = self._high_priority_queues.get(key) or self._low_priority_queues.get(key)

          only takes from the low-priority queue if there's nothing in the high-priority one. Your suggested code wouldn't work, because it interleaves high- and low-priority requests. For example, if there were 2 requests in the HPQ and 1 in the LPQ for host ('musicbrainz.org', 80), your code would execute the first request in the HPQ for that host, and then execute the one in the LPQ, even though there's still 1 in the HPQ which should finish first.

          Generic POSTs/PUTs/DELETEs default to High-Priority/Important - but I am not sure why - these are submitting data (which the user is not waiting for) rather than getting data (which the user probably is waiting for) so shouldn't these be LOw-Priority/Not Important?

          No, because a lot of people tend to load hundreds of albums in the background and work on them incrementally. If someone is submitting AcoustIDs, ratings, or releases to a collection, they want it to happen right away - otherwise the submissions would be blocked by the hundreds of albums that are still loading, and that could take many minutes.

          It seems to me that: [...] Updates pushed to the server and files such as Album Art being downloaded in the background are Low Priority and Not Important.

          The artwork is needed right away, it blocks the album from loading. If artwork was set to low-priority and not important, an album would be stuck on "[loading album information]" until all albums before it finished loading and all artwork before it finished loading. Everything would jam up for a while, because typically a bunch of album requests are made consecutively; by the time the first album got around to requesting its cover art, hundreds of albums would already be in the LPQ.

          Michael Wiencek added a comment - It doesn't handle them with equal priority. This line: queue = self._high_priority_queues.get(key) or self._low_priority_queues.get(key) only takes from the low-priority queue if there's nothing in the high-priority one. Your suggested code wouldn't work, because it interleaves high- and low-priority requests. For example, if there were 2 requests in the HPQ and 1 in the LPQ for host ('musicbrainz.org', 80), your code would execute the first request in the HPQ for that host, and then execute the one in the LPQ, even though there's still 1 in the HPQ which should finish first. Generic POSTs/PUTs/DELETEs default to High-Priority/Important - but I am not sure why - these are submitting data (which the user is not waiting for) rather than getting data (which the user probably is waiting for) so shouldn't these be LOw-Priority/Not Important? No, because a lot of people tend to load hundreds of albums in the background and work on them incrementally. If someone is submitting AcoustIDs, ratings, or releases to a collection, they want it to happen right away - otherwise the submissions would be blocked by the hundreds of albums that are still loading, and that could take many minutes. It seems to me that: [...] Updates pushed to the server and files such as Album Art being downloaded in the background are Low Priority and Not Important. The artwork is needed right away, it blocks the album from loading. If artwork was set to low-priority and not important, an album would be stuck on " [loading album information] " until all albums before it finished loading and all artwork before it finished loading. Everything would jam up for a while, because typically a bunch of album requests are made consecutively; by the time the first album got around to requesting its cover art, hundreds of albums would already be in the LPQ.

          Sophist added a comment -

          P.S. webservice.py also defines which type of request is High/Low priority and which is Important/Not (important is added to the front of the queue, not important to the back of the queue).

          But I am unclear why these are the way they are e.g.:

          • A redirect is always High-Priority/Important - which makes sense - if we have received a URL response which is a redirect we want to handle this ahead of any other requests.
          • Generic POSTs/PUTs/DELETEs default to High-Priority/Important - but I am not sure why - these are submitting data (which the user is not waiting for) rather than getting data (which the user probably is waiting for) so shouldn't these be LOw-Priority/Not Important?
          • Get_by_ID defaults to Low-Priority/Not important - but presumably the user is waiting for this info.

          etc.

          It seems to me that:

          • Individual requests initiated by the user directly in the GUI should be High Priority & Important.
          • Redirects should also be High Priority & Important.
          • Batched requests indirectly initiated by the user (i.e. who did an Add Folder) and which are updating the screen should be High-Priority and Not Important
          • Updates pushed to the server and files such as Album Art being downloaded in the background are Low Priority and Not Important.

          What do you think?

          Sophist added a comment - P.S. webservice.py also defines which type of request is High/Low priority and which is Important/Not (important is added to the front of the queue, not important to the back of the queue). But I am unclear why these are the way they are e.g.: A redirect is always High-Priority/Important - which makes sense - if we have received a URL response which is a redirect we want to handle this ahead of any other requests. Generic POSTs/PUTs/DELETEs default to High-Priority/Important - but I am not sure why - these are submitting data (which the user is not waiting for) rather than getting data (which the user probably is waiting for) so shouldn't these be LOw-Priority/Not Important? Get_by_ID defaults to Low-Priority/Not important - but presumably the user is waiting for this info. etc. It seems to me that: Individual requests initiated by the user directly in the GUI should be High Priority & Important. Redirects should also be High Priority & Important. Batched requests indirectly initiated by the user (i.e. who did an Add Folder) and which are updating the screen should be High-Priority and Not Important Updates pushed to the server and files such as Album Art being downloaded in the background are Low Priority and Not Important. What do you think?

            sophist Sophist
            sophist Sophist
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

              Created:
              Updated:
              Resolved:

                Version Package
                1.3