1. 2

  2. 1

    Good scenario to learn from (and avoid). Here is my short summary of the article:

    (2019) Upgraded prod Ubuntu servers from 2013 LTS to 2019. The upgrade required code/script changes. Everything was done, and everything worked well for the first 24 hours.

    Then they started seeing “missing semaphores” errors (the semaphores were used by the shopify/semian library). After much troubleshooting, it turned out that the new Ubuntu’s systemd-logind service was cleaning up IPC resources owned by the service’s user account, and that was where the semaphore errors came from.

    “.. A quick check-in with the service’s provisioning code confirmed that the account in question did in-fact live in user space, and remedial action would be to change default behaviour by setting RemoveIPC=no in /etc/systemd/logind.conf, or move the account to system space where it would be unaffected. ..”

    “ … Summed up succinctly, the service’s Unix account was in user space, whenever it was logged into then logged out of, all Inter Process Communication (IPC) resources would get cleaned up, of which semaphores are a part of. From the system’s perspective, semaphores were not missing, they were cleaned up as expected and it was the service that was misbehaving by trying to refer to semaphores that were no longer there. …”

    The reason it only showed up after about 24 hours is that a login was performed periodically to clean up logs, and that is when logind removed the account’s IPC resources.
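
For anyone who wants to check their own boxes, here is a rough Python sketch of the rule described in the quotes: an account is exposed when `RemoveIPC` is on and its UID is outside the system range. Everything named here is my own assumption, not something from the article: the helper names are made up, `SYSTEM_UID_MAX = 999` is only the common distro default (the real boundary is fixed when systemd is built), and the "on when unset" default is upstream systemd behaviour that some distros override. It also ignores drop-in files under `logind.conf.d/`.

```python
import configparser

# Assumption: common distro default; the real boundary is set at systemd build time.
SYSTEM_UID_MAX = 999

_TRUE = {"1", "yes", "y", "true", "t", "on"}
_FALSE = {"0", "no", "n", "false", "f", "off"}


def remove_ipc_enabled(conf_text, default=True):
    """Return the effective RemoveIPC value found in logind.conf-style text.

    `default` is used when the option is absent or commented out
    (assumed True, matching upstream systemd; distros may differ).
    """
    parser = configparser.ConfigParser(strict=False)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(conf_text)
    value = parser.get("Login", "RemoveIPC", fallback=None)
    if value is None:
        return default
    value = value.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    return default  # unparseable value: fall back to the default


def ipc_at_risk(uid, conf_text):
    """True if logind would clean up this UID's IPC objects after its last logout."""
    return uid > SYSTEM_UID_MAX and remove_ipc_enabled(conf_text)
```

To use it, pass the contents of `/etc/systemd/logind.conf` and the service account's UID, e.g. `ipc_at_risk(pwd.getpwnam("myservice").pw_uid, open("/etc/systemd/logind.conf").read())`, where `myservice` is a placeholder account name. If it returns `True`, the two fixes from the article apply: set `RemoveIPC=no`, or recreate the account as a system user.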