- Improvement
- Resolution: Fixed
- Normal
- None
The recent Postgres troubles were resolved mostly by reducing the number of database backends (setting max_connections to 100). However, as the graph shows, the number of backend processes has been rising slowly since then. (Why there are more than 100 backends despite that setting should be investigated ...)
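To see where the extra backends come from, a query like the following against pg_stat_activity can help (this is a sketch; the backend_type column exists on Postgres 10+). One possible explanation for exceeding max_connections is background worker processes such as autovacuum workers, which are not counted against that limit:

```sql
-- Count backends grouped by type and state, to see who is connecting.
-- Background workers (e.g. autovacuum) do not count against
-- max_connections, which may explain seeing more than 100 processes.
SELECT backend_type, state, count(*)
FROM pg_stat_activity
GROUP BY backend_type, state
ORDER BY count(*) DESC;
```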
100 connections are probably still too many. The rule of thumb cited in various places, "2x number of cores (without HT) plus effective_spindle_count", suggests a value closer to 50 for a 24-core server. So I think we should apply the following settings:
- for pgbouncer, default_pool_size=50 and reserve_pool_size=0; not sure about max_client_conn, perhaps 150 or so?
- for Postgres itself, max_connections=65 so that there are 50 for the website plus some more for index generation, ad-hoc queries etc.
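A sketch of what the two config changes could look like; file paths and the max_client_conn value of 150 are assumptions, not settled:

```ini
; pgbouncer.ini (proposed values; max_client_conn is a guess)
[pgbouncer]
default_pool_size = 50
reserve_pool_size = 0
max_client_conn   = 150
```

```ini
# postgresql.conf
# 50 for the website via pgbouncer, plus headroom for index
# generation, ad-hoc queries etc.
max_connections = 65
```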
We should also increase swappiness again (perhaps not back to 60, but to 30 or so). The totoro swap-usage graph shows more than 3 GB that were in swap at the time the issues were fixed and have not been accessed once since. That could, e.g., be files in a tmpfs that are neither read nor written for a long time. It is good when practically unused memory like that is swapped out so the RAM can be used for caching database contents instead, but a swappiness of 0 prevents that.
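If we go with 30 (the tentative value above), the change could be applied and persisted roughly like this; whether it goes into /etc/sysctl.conf or a drop-in file under /etc/sysctl.d/ depends on how totoro is managed:

```ini
# Apply immediately (as root):
#   sysctl -w vm.swappiness=30
# Persist across reboots, e.g. in /etc/sysctl.conf:
vm.swappiness = 30
```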