We had a site outage because memcached on wiley crashed. This caused all of the web front ends to crash with the previously seen error message:
2011-11-28 17:31:50.495995500 [error] Caught exception in engine "Couldn't save expires:c1c611ace1febf8bc148e4a72d67ccbce510332b / 1322508709 in memcached storage"
(This message was taken from MBS-3590.)

Nagios did not send any messages about memcached on wiley being down. Can you please verify that we are monitoring memcached and, if not, add monitoring? Also, I see the mediawiki instance of memcached is managed by daemontools. Can we move the second instance of memcached on wiley to daemontools as well? The second instance runs on port 11215 and is started by /etc/init.d
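
In case it helps with the monitoring request, here is a rough sketch of what a check for the second instance could look like. It is not our existing Nagios configuration. It assumes the instance speaks the standard memcached text protocol (where `version` is answered with a `VERSION ...` line) and uses the usual Nagios plugin exit codes (0 for OK, 2 for CRITICAL). The function name `check_memcached` and the timeout value are made up for illustration.

```python
import socket

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def check_memcached(host, port, timeout=5.0):
    """Send the text-protocol 'version' command and return (exit_code, message).

    Hypothetical helper for illustration; host and port are supplied by the caller.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"version\r\n")
            reply = sock.recv(128)
    except OSError as exc:
        # Covers connection refused, DNS failures, and timeouts.
        return CRITICAL, f"MEMCACHED CRITICAL - {host}:{port} unreachable: {exc}"

    if reply.startswith(b"VERSION"):
        text = reply.decode(errors="replace").strip()
        return OK, f"MEMCACHED OK - {host}:{port} replied {text}"
    return CRITICAL, f"MEMCACHED CRITICAL - {host}:{port} unexpected reply {reply!r}"
```

To use it as a plugin, a small wrapper would call `check_memcached("wiley", 11215)`, print the message, and exit with the returned code, so that a crashed memcached produces a CRITICAL alert instead of silence.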