Improve resiliency to process corruption #79

@casperisfine

Context

A fundamental problem Pitchfork has to deal with is that neither POSIX nor Linux really supports running anything but async-signal-safe functions after a fork().

In practice, as long as you never spawned any background thread, you are fine. But many Ruby applications and gems do spawn threads, and in the presence of such background threads, forking at the wrong time can leave the subprocess in an unrecoverable state.

The typical case is forking while a background thread holds a lock: the thread does not exist in the child, so the lock remains locked there forever, and any attempt to acquire it will deadlock.
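As a minimal sketch of why the lock can never be released, the following counts the threads visible on each side of a fork. The helper name `thread_counts_across_fork` is made up for illustration, and the sleeping thread stands in for whatever background thread a gem might spawn. It does not reproduce the deadlock itself, only the fact that the would-be lock owner is gone in the child.

```ruby
# Returns [parent_count, child_count]: the number of live threads seen
# in the parent just before forking, and in the child just after.
# (Hypothetical helper, for illustration only.)
def thread_counts_across_fork
  background = Thread.new { sleep } # stand-in for a gem's background thread
  reader, writer = IO.pipe
  parent_count = Thread.list.size

  pid = Process.fork do
    reader.close
    # Only the thread that called fork survives in the child.
    writer.write(Thread.list.size.to_s)
    writer.close
    exit!(0) # skip at_exit handlers inherited from the parent
  end

  writer.close
  child_count = reader.read.to_i
  reader.close
  Process.wait(pid)
  [parent_count, child_count]
ensure
  background&.kill
end

parent_count, child_count = thread_counts_across_fork
puts "threads in parent: #{parent_count}, threads in child: #{child_count}"
```

If the background thread held a native lock at the moment of the fork, the child would inherit the locked state but not the thread that could unlock it.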

For instance this can happen with OpenSSL 3:

    [/usr/lib/x86_64-linux-gnu/libc.so.6] pthread_rwlock_wrlock
    [/usr/lib/x86_64-linux-gnu/libcrypto.so.3] CRYPTO_THREAD_write_lock
    [/usr/lib/x86_64-linux-gnu/libcrypto.so.3] CRYPTO_alloc_ex_data
    [/usr/lib/x86_64-linux-gnu/libcrypto.so.3] OPENSSL_thread_stop
    [/usr/lib/x86_64-linux-gnu/libcrypto.so.3] OPENSSL_cleanup
    [/usr/lib/x86_64-linux-gnu/libc.so.6] secure_getenv

So any background thread that uses an SSL connection may break reforking.

That's what Pitchfork.prevent_fork is for, but we should still try to handle such scenarios as gracefully as possible.
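For reference, here is a rough sketch of how a background thread might guard its risky section. It assumes `Pitchfork.prevent_fork` takes a block and returns the block's value; the wrapper name `without_fork` is hypothetical and only exists so the same code also runs outside of Pitchfork (console, tests).

```ruby
# Runs the block under Pitchfork.prevent_fork when Pitchfork is loaded,
# and simply yields otherwise. (Hypothetical wrapper; the block form of
# prevent_fork is an assumption.)
def without_fork(&block)
  if defined?(Pitchfork) && Pitchfork.respond_to?(:prevent_fork)
    Pitchfork.prevent_fork(&block)
  else
    yield
  end
end

# Example: a background thread wraps the part that may take native locks
# (for instance an SSL call) so a refork cannot land in the middle of it.
worker = Thread.new do
  without_fork do
    :done # placeholder for the real work
  end
end
worker.join
```

This only helps for code that opts in, which is why the rest of this issue is about coping with the cases that slip through.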

Action Plan

  • If we detect such a case, we should terminate the affected process.
  • Ideally we replace that process with a new one, but if for some reason we can't, we should gracefully terminate the whole server (last resort).
  • We should consider "reverting" Spawn molds instead of promoting workers #42.
    • Spawning the new mold out of a worker has the nice property of not impacting capacity as much.
    • However that fork is risky because workers are even more likely than molds to have background threads.
    • We should probably warn for every thread in the mold (Puma does something similar)
    • (optional) We could provide a way to run background threads in a dedicated process outside the mold.
    • Provide a callback to validate post-fork processes
      • Maybe even validate the usual suspects by default (OpenSSL)
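Two of the ideas above (warning about threads before forking, and validating a freshly forked process) could look roughly like the sketch below. Everything here is an assumption for discussion: the names `background_threads` and `fork_validates?`, the polling interval, and the default timeout are not part of Pitchfork.

```ruby
# Live threads other than the caller; each one is a candidate for a warning
# before forking. (Hypothetical helper.)
def background_threads
  Thread.list.reject { |t| t == Thread.current }
end

# Forks a child, runs the given check in it, and waits up to `timeout`
# seconds. Returns true only if the child exits with status 0 in time.
# A child that hangs (e.g. on an inherited lock) is killed and counts
# as a failure. (Hypothetical helper.)
def fork_validates?(timeout: 5)
  pid = Process.fork do
    ok = begin
      yield
    rescue StandardError
      false
    end
    exit!(ok ? 0 : 1)
  end

  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    _, status = Process.wait2(pid, Process::WNOHANG)
    return status.success? == true if status
    break if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep 0.05
  end

  Process.kill(:KILL, pid) # the child is presumed stuck
  Process.wait(pid)
  false
end

background_threads.each do |thread|
  warn "thread alive before fork: #{thread.inspect}"
end
```

A default check for the usual suspects could then be something like `fork_validates?(timeout: 2) { OpenSSL::Digest::SHA256.hexdigest("ping"); true }` after `require "openssl"`, though whether that particular call exercises the affected lock is untested.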
