• New Feature
    • Resolution: Fixed
    • Normal
    • 2011-12-24

      AS A WFE sysadmin
      I WANT the search server to natively know how to check the ratelimit-server
      SO THAT I don't have to have a separate proxy to perform the checking

      When the search server receives a request via http://search.musicbrainz.org/ , it should perform rate-limiting. When it receives a request via musicbrainz-server, it should not perform rate-limiting.

      Here's how:

      Headers

      Requests via http://search.musicbrainz.org/ will arrive at the search server with two extra headers:

        X-Apply-Rate-Limit: yes
        X-MB-Remote-Addr: 193.195.43.199
      

      i.e. the client's IP address.

      Requests via musicbrainz-server will not say "X-Apply-Rate-Limit: yes" (the header will either be missing, or say "no"). The X-MB-Remote-Addr header may be present.

      Config

      The search server config will need to include the ratelimit-server endpoint address: ratelimitserver.host and ratelimitserver.port.

      Processing Logic

      When the search server receives a search request, first determine whether or not we will be applying rate limiting:

      • if ratelimitserver.host and/or ratelimitserver.port are not set, then skip rate limiting
      • otherwise, read the X-Apply-Rate-Limit header; if it's missing or anything other than "yes", then skip rate limiting
      • otherwise, read the X-MB-Remote-Addr header (it should be a dot-quad IP address, e.g. 1.2.3.4); if it's missing or malformed then skip rate limiting
      • otherwise, we will apply rate limiting
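The decision steps above could be sketched roughly as follows. This is an illustration in Python, not the search server's actual code: the function name, the plain dict of request headers, and the use of `ipaddress` to validate the dot-quad address are all assumptions made for the example.

```python
import ipaddress


def client_ip_to_limit(headers, host, port):
    """Return the client IP to rate-limit on, or None to skip rate limiting.

    `headers` is assumed to be a plain dict of request headers; `host` and
    `port` stand for ratelimitserver.host and ratelimitserver.port.
    """
    # Header names are case-insensitive in HTTP, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    # Step 1: no ratelimit-server configured, so skip.
    if not host or not port:
        return None
    # Step 2: only the exact value "yes" turns rate limiting on.
    if lowered.get("x-apply-rate-limit") != "yes":
        return None
    # Step 3: the address must be a well-formed dot-quad IPv4 address.
    addr = lowered.get("x-mb-remote-addr", "")
    try:
        ipaddress.IPv4Address(addr)
    except ValueError:
        return None
    # Step 4: rate limiting applies to this address.
    return addr
```

Returning the address itself (rather than a boolean) means the caller already has the value it needs to build the ratelimit key.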

      Next, we apply rate limiting (unless of course we're skipping it):

      • construct the ratelimit key, which should be: "search ip=x.x.x.x" (from X-MB-Remote-Addr header)
      • test the ratelimit (i.e. ask the ratelimit-server "over_limit search ip=x.x.x.x")
      • if the response was "Y" (over limit), then reject with a 503 response, ideally including the current rate / max rate / period in the response somewhere
      • otherwise, continue
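A minimal sketch of the steps above, in Python for illustration only. The transport is deliberately left out: `ask` is a hypothetical callable that sends one command to the ratelimit-server and returns its reply as a string (the real wire protocol is described in the attachment, not here). The reply layout "Y <rate> <limit> <period>" is also an assumption.

```python
def check_rate_limit(client_ip, ask):
    """Return None to continue, or (503, detail) to reject the request.

    `ask` is a hypothetical callable: it takes the command string and
    returns the ratelimit-server's reply as a string.
    """
    # The key format comes from the issue text: "search ip=x.x.x.x".
    key = "search ip=" + client_ip
    reply = ask("over_limit " + key)

    fields = reply.split()
    if fields and fields[0] == "Y":
        # Assumed reply layout: "Y <rate> <limit> <period>". Whatever
        # follows the "Y" is passed back so it can go into the 503 response.
        detail = " ".join(fields[1:])
        return 503, detail
    # Anything other than "Y" means: serve the search as normal.
    return None
```

For example, with a stub in place of the real server, `check_rate_limit("1.2.3.4", lambda cmd: "N")` returns `None` and the request proceeds.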

      Next (unless we've already 503'd), serve the search request as normal.

      See attached for how to talk to the ratelimit-server.

          [SEARCH-161] Search server should consult ratelimit-server

          Paul Taylor added a comment -

          Fixed


          Paul Taylor added a comment -

          Ok, done that.

          Final point: when I do contact the rate limiter, the request is always rejected (I guess it's just been set up like this for testing), but the rate is much less than the limit, so I assume we are hitting the global limit. Is it misleading to set the rate-limiting message and header if the user is not actually breaking their own limit but is just unlucky enough to hit the global limit? So I'm only going to set this part of the message if R > L, and set a different message if not.


          Paul Taylor added a comment -

          'test the ratelimit (i.e. ask the ratelimit-server "over_limit search ip=x.x.x.x")'
          'No, it's not. That's why it was added.'

          It wasn't specified in this issue that I should use it, which is why I didn't, but I'll change it.

          Re: Config
          Yes, it is configurable by hand in a config file (web.xml), but Rob likes the default value to be correct for production, so that when he does a new build/release he doesn't have to modify the config file every time.


          Dave Evans added a comment -

          "is it safe to use the rate server without using a request Id" - No, it's not. That's why it was added.
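One way to picture why the request Id matters: tag each request with a fresh Id and ignore any reply that carries a different one. The sketch below is in Python and is only an illustration. The wire layout it uses ("<id> over_limit <key>" out, "<id> <payload>" back) is an assumption made for the example, not the documented ratelimit-server protocol, and both function names are hypothetical.

```python
import itertools

# Process-wide counter used to hand out a fresh Id for every request.
_request_ids = itertools.count(1)


def build_request(key):
    """Return (request_id, line_to_send) for an over_limit query."""
    request_id = str(next(_request_ids))
    # Assumed layout: the Id is the first space-separated token.
    return request_id, f"{request_id} over_limit {key}"


def match_response(request_id, line):
    """Return the reply payload if `line` answers `request_id`, else None."""
    got_id, _, payload = line.partition(" ")
    if got_id != request_id:
        # A reply meant for some other request: ignore it.
        return None
    return payload
```

With this in place, a late or misdirected reply is dropped instead of being applied to the wrong search request.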


          Dave Evans added a comment -

          Re. config: we'll need the host/port of the rate limiter to be configurable, because we have more than one environment. So no one answer is correct.

          So there needs to be some mechanism to hand-edit some config file (ideally one that doesn't get splatted at each deployment) to set the host/port for that specific instance.

          Re. the message: "Clients will not know to look in the header" - clients will mostly be machines so they won't know to look in the body either. But I don't mind if you also want to put a message in the body.

          "Actually I dont quite get what these values are": rate=R limit=L period=P means you're allowed to make "L" requests per "P" seconds, but you're currently making "R" requests per "P" seconds, and R > L.
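As a small illustration of how those three values read, here is a Python sketch. The function name and the message wording are made up for the example; only the meaning of rate, limit and period comes from the explanation above.

```python
def describe_rate(rate, limit, period):
    """Turn rate=R limit=L period=P into a human-readable sentence."""
    if rate > limit:
        # R > L: the client is over the limit it is allowed.
        return (f"You are making {rate} requests per {period} seconds; "
                f"the limit is {limit}.")
    # R <= L: the client is within the limit it is allowed.
    return (f"You are making {rate} requests per {period} seconds, "
            f"within the limit of {limit}.")
```

With the values quoted elsewhere in this thread (0.1, 22.0, 20), the rate is below the limit, so the second branch applies.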


          Paul Taylor added a comment - - edited

          And what values are required to access the production rate limiter? I ask because I ship the code with production-ready config files.


          Paul Taylor added a comment -

          Testing the rate limiter on hobbes: requests from both test.musicbrainz.org and search.musicbrainz.org are setting the X-Apply-Rate-Limit and X-MB-Remote-Addr headers, which is different from what was expected.

          Also, I wonder about your suggestion of responding with
          'The MusicBrainz search server is currently busy\nPlease try again later.'
          and putting the rate limit problem in a header.

          Clients will not know to look in the header, so I think the rate limiting information should be added to the message,

          i.e.
          'The MusicBrainz search server is currently busy 0.1 22.0 20\nPlease try again later.'

          Actually I don't quite get what these values are:
          0.1 = Rate
          22.0 = Limit
          20 = Period

          But what does that actually mean?

          The one other thing I wonder: is it safe to use the rate server without using a request Id, i.e. is it possible that I send a request and then get a response intended for someone else?


          Dave Evans added a comment -

          You can read the ratelimit-server log, if you wish, as follows:

          cd /usr/local/ratelimit-server
          tail -F log/main/current | tai64nlocal


          Dave Evans added a comment -

          A ratelimit-server is now installed on hobbes: host=hobbes.localdomain (or 10.1.1.18 or 127.0.0.1, etc), port=2000.


          Dave Evans added a comment -

          I have modified nginx for search.test.musicbrainz.org so that "X-Apply-Rate-Limit: yes" is now set.

          Example on hobbes just now:

          GET / HTTP/1.0
          Host: search.test.musicbrainz.org
          X-MB-Remote-Addr: 81.187.237.83
          X-Apply-Rate-Limit: yes
          Connection: close
          User-Agent: Mozilla/5.0 (X11; Linux i686; rv:8.0) Gecko/20100101 Firefox/8.0
          Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
          Accept-Language: en-gb,en;q=0.5
          Accept-Encoding: gzip, deflate
          Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
          DNT: 1
          Cache-Control: max-age=2
          


            ijabz Paul Taylor
            djce Dave Evans

                Version Package
                2011-12-24