
[SEARCH-414] MusicBrainz rate limiting: limit is 22.0 per 20 seconds

    • Type: Bug
    • Resolution: Invalid
    • Priority: Normal

      I've created a new application called musicbrainz-tagger: https://github.com/tchoulihan/musicbrainz-tagger

      Even though I'm setting my User-Agent to: musicbrainz-tagger/1.0.3 (https://github.com/tchoulihan/musicbrainz-tagger)

      I'm still getting rate limited.
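
      For context, a minimal sketch (not from the tagger's source; assuming Python 3's standard library and an illustrative query) of sending such a User-Agent header to ws/2:

          import json
          import urllib.request

          # Identification string from the report above; MusicBrainz asks for
          # "app/version (contact URL)"-style User-Agent values.
          USER_AGENT = ("musicbrainz-tagger/1.0.3 "
                        "(https://github.com/tchoulihan/musicbrainz-tagger)")

          # Illustrative search request; any ws/2 endpoint takes the same header.
          req = urllib.request.Request(
              "http://musicbrainz.org/ws/2/recording/?query=artist:Fugazi&limit=1&fmt=json",
              headers={"User-Agent": USER_AGENT},
          )
          with urllib.request.urlopen(req) as resp:
              print(json.load(resp)["recordings"][0]["title"])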


          Tyler Houlihan added a comment -

          This is exactly what I needed. Thanks!

          nikki added a comment -

          Just for the record, my /var/lib/postgresql/ is 21 GB without edits. That is using mbslave for importing/replicating, but I guess that won't make much difference for the actual database size. The search indexes are separate from the database (or have I misunderstood something?) so that should be about 33 GB in total right now.

          The database and web server install instructions are in the git repository: https://bitbucket.org/metabrainz/musicbrainz-server/src/HEAD/INSTALL.md
          I've never set up the search server, but presumably the instructions for that are these ones which are also in git: https://bitbucket.org/metabrainz/search-server


          Tyler Houlihan added a comment -

          Release-centric is very understandable, but I've seen far too many mistagged individual songs. My library tags on six pieces of info: track title, track number, artist name, album name, duration, and year.

          Here's my library URL, btw:
          https://github.com/tchoulihan/musicbrainz-tagger

          I've had great success rates, as long as I give a track length to the Lucene search (currently using a 3-second window) and a year for the Lucene wildcard.

          25 GB is very doable for me, as long as I can set this up on a Linux server. Could you point me to where I can do this?

          Thanks for all this info BTW.
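
          A hypothetical sketch of those two clauses (my helper names; lengths assumed to be in milliseconds, with dur: and date: being the search server's Lucene fields):

              # Hypothetical helpers for the clauses described above: a 3-second
              # duration window (track length in ms, +/- 1500 ms) and a year
              # wildcard. The sample values match the example query quoted
              # elsewhere in this thread.
              def duration_clause(length_ms, window_ms=3000):
                  half = window_ms // 2
                  return "dur:[%d TO %d]" % (length_ms - half, length_ms + half)

              def year_clause(year):
                  return "date:%d*" % year

              print(duration_clause(232882))  # dur:[231382 TO 234382]
              print(year_clause(1990))        # date:1990*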


          Ian McEwen added a comment -

          The databases on my system are about 53 GB, but that includes the edit dump, which you presumably don't need, so more like 25 GB most likely (you'll want a database in order to update search indexes). The current search indexes themselves seem to be about 12 GB on one of the production servers.

          I can't think of a way to batch that sort of query (in general, in fact, not just with the current setup); generally it's only possible to batch things that already have MBIDs. Have you considered doing searches/processing at a release level? There are obviously rather fewer of those, and for collections where people tend to have full releases, release-centric is how most other taggers using our data operate. I suppose there is a different chance of bad matches, but you're not going to get particularly ideal results without at least some human input anyway (not everywhere, of course, but there are plenty of things it's hard to get good results for without a human choosing exactly which version, or such, they have).
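
          As a sketch of that release-centric flow (not from this ticket; it assumes the standard ws/2 release search and lookup endpoints, and the JSON field names may need verifying):

              import json
              import urllib.parse
              import urllib.request

              UA = {"User-Agent": "musicbrainz-tagger/1.0.3 "
                                  "(https://github.com/tchoulihan/musicbrainz-tagger)"}
              BASE = "http://musicbrainz.org/ws/2"

              def fetch(url):
                  req = urllib.request.Request(url, headers=UA)
                  with urllib.request.urlopen(req) as resp:
                      return json.load(resp)

              # One search request finds the release MBID...
              q = urllib.parse.quote('release:"Repeater + 3 Songs" AND artist:"Fugazi"')
              found = fetch(BASE + "/release/?query=" + q + "&limit=1&fmt=json")
              release_id = found["releases"][0]["id"]

              # ...and one lookup returns every track/recording on it.
              release = fetch(BASE + "/release/" + release_id + "?inc=recordings&fmt=json")
              for medium in release["media"]:
                  for track in medium["tracks"]:
                      print(track["position"], track["title"], track["recording"]["id"])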


          Tyler Houlihan added a comment -

          It's crucial enough to my application that I'm willing to set up a replicated server. How much disk space are we talking about?

          Overall though, did you guys have real problems with > 20 requests per second? I'd consider switching webservers if that's the case.

          Here's a list of web frameworks and servers by speed:
          http://www.techempower.com/benchmarks/#section=data-r10&hw=peak&test=json

          Most of those servers above can handle > 500k JSON requests/minute easily.

          I've never used an API that limits requests this slowly. Some people have music libraries of over 20k songs, which currently would take more than 5 hours to tag. If this were at 100 requests/second, then we're talking 3 minutes.
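
          The arithmetic behind those estimates:

              # 20,000 tracks, one search request per track.
              songs = 20000
              print(songs / 1 / 3600.0)   # ~5.6 hours at 1 request/second
              print(songs / 100 / 60.0)   # ~3.3 minutes at 100 requests/second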

          My use case is basically the following query, many many times (tagging a track/recording to an MBID):
          http://musicbrainz.org/ws/2/recording/?query=recording:%22Blueprint%22%20AND%20artist:%22Fugazi%22%20AND%20dur:[231382%20TO%20234382]%20AND%20number:5%20AND%20release:%22Repeater%20%2B%203%20Songs%22%20AND%20date:1990*&limit=1&fmt=json

          The results are nice because I can scrape the artist, release, and recording MBIDs and names in one request, so that is great.

          If there's a way to perform batch requests like the one I have above, I'd love to know.
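
          A sketch of that single-request scrape (standard library only; the query string is the decoded form of the URL above, and the JSON field names follow ws/2's search response as I understand it):

              import json
              import urllib.parse
              import urllib.request

              query = ('recording:"Blueprint" AND artist:"Fugazi" '
                       'AND dur:[231382 TO 234382] AND number:5 '
                       'AND release:"Repeater + 3 Songs" AND date:1990*')
              url = ("http://musicbrainz.org/ws/2/recording/?query="
                     + urllib.parse.quote(query) + "&limit=1&fmt=json")

              req = urllib.request.Request(url, headers={
                  "User-Agent": "musicbrainz-tagger/1.0.3 "
                                "(https://github.com/tchoulihan/musicbrainz-tagger)"})
              with urllib.request.urlopen(req) as resp:
                  rec = json.load(resp)["recordings"][0]

              # All three MBIDs and names come out of the one response.
              print("recording:", rec["id"], rec["title"])
              print("artist:", rec["artist-credit"][0]["artist"]["id"],
                    rec["artist-credit"][0]["name"])
              print("release:", rec["releases"][0]["id"], rec["releases"][0]["title"])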


          Ian McEwen added a comment -

          It's... possible, but the answer for how to do it right now isn't particularly palatable for most.

          So, the thing you can do right now is set up a replicated server, against which you can make as many requests as you want, since it obviously won't be configured with a ratelimiter. Note that this would also require setting up a search server (the ticket category this is actually under – not sure if intentionally or not?) and a process for updating indexes, since there's also ratelimiting on the default search server configured to point at search.musicbrainz.org. Replication means the server gets updates once an hour, so changes aren't immediate.

          The longer-term answer is that this has been a problem for a while and is on the list for things to solve for the next version of the webservice (by saner structure of the data, better cacheability, multiple-id-lookup, in this particular case), but that's not a very good answer since there's no real schedule we can associate with that; the project is substantially understaffed from a development perspective.

          It may be possible to restructure the requests your application makes into ones that require fewer requests, depending on exactly what you want to do – sometimes search provides enough information by itself (and you can trick it into returning results for several MBIDs, unlike the main WS), or browse requests do. But that's more difficult and not at all universally applicable.
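
          For the several-MBIDs trick, a hedged sketch: OR together MBID field terms in one search query (rid: is the recording-MBID search field; the placeholder MBIDs and the practical batch size are assumptions to verify):

              import urllib.parse

              # Placeholder MBIDs purely for illustration.
              mbids = [
                  "00000000-0000-0000-0000-000000000001",
                  "00000000-0000-0000-0000-000000000002",
              ]
              # A single search request can return a result for each of them.
              query = " OR ".join("rid:" + m for m in mbids)
              url = ("http://musicbrainz.org/ws/2/recording/?query="
                     + urllib.parse.quote(query)
                     + "&limit=" + str(len(mbids)) + "&fmt=json")
              print(url)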


          Tyler Houlihan added a comment -

          Is there no way to do better than 1 request per second? I'm writing some bigger applications and want to use MusicBrainz metadata.


          Ian McEwen added a comment -

          There is still a per-user (per-IP, really) ratelimit, which is as you list. You aren't immune to ratelimiting with a proper user-agent, just from user-agent-based ratelimiting if your application is otherwise behaving properly.
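
          A minimal client-side throttle sketch for staying under that limit (the one-second interval is derived from the 22-per-20-seconds figure in the title, rounded down to be safe):

              import time

              class Throttle:
                  """Block so successive calls are at least `interval` seconds apart."""

                  def __init__(self, interval=1.0):
                      self.interval = interval
                      self._last = 0.0

                  def wait(self):
                      delay = self._last + self.interval - time.monotonic()
                      if delay > 0:
                          time.sleep(delay)
                      self._last = time.monotonic()

              throttle = Throttle(interval=1.0)
              for track in ["track-1", "track-2", "track-3"]:  # stand-in for a library
                  throttle.wait()
                  print("would query ws/2 for", track)  # issue the real request here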


            Assignee: Unassigned
            Reporter: Tyler Houlihan (tchoulihan)
            Votes: 0
            Watchers: 0
