I've done some digging into how this works, and here are my findings...
Firstly, a look at Varnish. Varnish is an HTTP accelerator, but in this case it's being used to essentially provide hooks into certain parts of the HTTP request lifecycle. For example, doing something special on a GET request and caching the result, doing something special on cache hits, etc.
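To make the "hooks" idea concrete, here's a minimal VCL sketch (Varnish 3.x-era subroutine names; these were renamed in later versions, so treat this as illustrative rather than copy-pasteable):

```vcl
sub vcl_recv {
    # Runs on every incoming request: e.g. only try the cache for GETs.
    if (req.request != "GET") {
        return (pass);
    }
    return (lookup);
}

sub vcl_fetch {
    # Runs after a backend response comes back: e.g. cache it for 30s.
    set beresp.ttl = 30s;
    return (deliver);
}

sub vcl_hit {
    # Runs on a cache hit.
    return (deliver);
}
```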
So, the Varnish mod gives you two options.
- Handle the callback yourself, but use a cache. From what I understand of this approach, your web service code still makes the HTTP request to 3scale, but instead of calling 3scale's actual host, you make the call to yourself. Varnish intercepts this: the first call goes through to 3scale and the response gets cached for, say, 30 seconds. Within those 30 seconds, all calls just hit the cache. You essentially give users a 30-second grace window (the duration is configurable).
- Use Varnish to actually do the 3scale stuff as well. In this configuration, the Varnish VCL first makes a request to 3scale (caching it), and if that succeeds, passes through to our backend to do the web service request. This means you never need to worry about 3scale stuff in the server at all (and if people want to run the server locally, they don't have to faff around with disabling 3scale options).
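For the first option, the VCL on our side is basically just a caching pass-through in front of 3scale. Rough sketch (hostname is a placeholder, not 3scale's real endpoint, and this isn't taken from the module's docs):

```vcl
# The app points its 3scale client at this Varnish instance instead of
# 3scale directly; Varnish forwards to 3scale and caches the response.
backend threescale {
    .host = "3scale-backend.example.com";  # placeholder for the real 3scale host
    .port = "80";
}

sub vcl_recv {
    set req.backend = threescale;
    # The API key is in the query string, so the default hash (url + host)
    # already gives each user their own cached authorization.
    return (lookup);
}

sub vcl_fetch {
    set beresp.ttl = 30s;   # the 30-second grace window
    return (deliver);
}
```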
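The second option is roughly a two-pass flow using Varnish restarts: pass one authorizes against 3scale, pass two hits our backend. This is only a control-flow sketch reverse-engineered from how the example VCLs seem to work; the real module also rewrites and restores the request URL, which I've left out, and hostnames are placeholders:

```vcl
backend threescale {
    .host = "3scale-backend.example.com";  # placeholder
    .port = "80";
}
backend app {
    .host = "127.0.0.1";  # our actual web service
    .port = "8080";
}

sub vcl_recv {
    if (req.restarts == 0) {
        # First pass: send an authorize call to 3scale.
        # (The real module rewrites req.url to the authorize endpoint here.)
        set req.backend = threescale;
    } else {
        # Authorization succeeded; route the original request to our app.
        set req.backend = app;
    }
}

sub vcl_fetch {
    if (req.restarts == 0) {
        if (beresp.status == 200) {
            set beresp.ttl = 30s;    # cache the successful authorization
            return (restart);        # second pass goes to the app backend
        }
        set beresp.ttl = 0s;         # don't cache failed authorizations
    }
    return (deliver);
}
```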
It's all quite undocumented, so I mostly reverse-engineered the above from the C module and the example VCLs. However, it doesn't look overly complicated now that I have an idea of how it works.
Regarding the rate limiter, I think we talked about using a per-authorization rate limit with a high ceiling. We'll have to figure out what the right number actually is, but something like 10 simultaneous requests should be OK. CCing djce so we can discuss how this works.
Unassigning and closing; we aren't likely to use 3scale at this time.