Improvement
Resolution: Fixed
The scripts that dump the database (for full export, sample data, search indexes, replication packets) perform so many disk read/write operations that the server running them is almost unavailable for anything else while they run.
This has become even clearer since last year, with alerts firing every Wednesday and Saturday. (Alerts have been manually silenced since February.)
Two changes can be made to these scripts to try to improve the situation:
- Use ionice -c3 to lower the priority of these I/O operations;
- Create temporary files on the same volume as their final destination, so that moving files (with mv) does not require actually copying their content.
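
The two changes above can be sketched together as follows. This is only an illustration under stated assumptions: the function names `run_idle_io` and `write_then_rename` are hypothetical and do not come from the actual dump scripts, and the fallback to normal priority when `ionice` is missing or not permitted is a choice made for this sketch.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not taken from the dump scripts.

# Run a command in the idle I/O scheduling class (ionice -c3) when possible.
# If ionice is missing or not permitted, run the command at normal priority.
run_idle_io() {
    if ionice -c3 true >/dev/null 2>&1; then
        ionice -c3 "$@"
    else
        "$@"
    fi
}

# Write a command's output to a temporary file created in the same directory
# as the final destination, then rename it into place. Because both paths are
# on the same volume, mv only renames the file instead of copying its content.
write_then_rename() {
    dest="$1"
    shift
    tmp="$(mktemp "$(dirname "$dest")/.tmp.XXXXXX")" || return 1
    if run_idle_io "$@" >"$tmp"; then
        mv "$tmp" "$dest"
    else
        rm -f "$tmp"   # do not leave a partial file behind
        return 1
    fi
}
```

For example, `write_then_rename /some/volume/out.txt printf 'hello\n'` would leave `out.txt` in place only if the command succeeds. Note that `mktemp` creates the file readable by its owner only, so a real script may need to adjust permissions after the move.
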
Since the load depends on the server environment, the impact of each of these changes has to be evaluated in production. Below is the schedule for these tests:
Date | Step |
---|---|
from week 33, Mon 15th Aug | Unsilence alerts (no code change) |
from week 48½, Fri 2nd Dec | Create temporary files in the final destination volume |
from week 2, Tue 10th Jan | Also use ionice -c3 |
After the trille server was restarted on Tue 21st Mar to make the Docker daemon use the "systemd" cgroup driver (MBH-581) and cgroup v2 (MBH-582), a second round of tests followed:
Date | Step |
---|---|
from Tue 21st Mar 16:55 UTC | (production with no code change again) |
from Fri 24th Mar 11:45 UTC | Use ionice -c3 |
from Mon 27th Mar 12:50 UTC | Use --blkio-weight 50 |
from Mon 3rd Apr 14:15 UTC | Use --blkio-weight 10 |
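
As a rough illustration of the last two steps, the sketch below builds a `docker run` command line with a block I/O weight. It assumes the flag is passed to `docker run`; the function name `docker_dump_cmd`, the image name, and the dump command are hypothetical placeholders. The command is printed rather than executed, so the sketch can be tried without Docker.

```shell
#!/bin/sh
# Sketch only: the image name and dump command are hypothetical placeholders.

# Print a docker command line with a block I/O weight.
# Docker accepts weights from 10 to 1000; a lower weight gives the container
# a smaller relative share of block I/O.
docker_dump_cmd() {
    weight="$1"
    shift
    case "$weight" in
        ''|*[!0-9]*) echo "weight must be an integer" >&2; return 1 ;;
    esac
    if [ "$weight" -lt 10 ] || [ "$weight" -gt 1000 ]; then
        echo "weight must be between 10 and 1000" >&2
        return 1
    fi
    echo "docker run --rm --blkio-weight $weight $*"
}

docker_dump_cmd 10 example/dump-image ./dump.sh
# prints: docker run --rm --blkio-weight 10 example/dump-image ./dump.sh
```

The value 10 used in the final step is the lowest weight the flag accepts.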