prctl: add PR_{SET,GET}_CHILD_SUBREAPER to allow simple process supervision
Lennart Poettering authored

Userspace service managers/supervisors need to track their started
services.  Many services daemonize by double-forking and get implicitly
re-parented to PID 1.  The service manager will no longer be able to
receive the SIGCHLD signals for them, and is no longer in charge of
reaping the children with wait().  All information about the children is
lost at the moment PID 1 cleans up the re-parented processes.

With this prctl, a service manager process can mark itself as a sort of
'sub-init', able to stay as the parent for all orphaned processes
created by the started services.  All SIGCHLD signals will be delivered
to the service manager.

Receiving SIGCHLD and doing wait() is in cases of a service-manager much
preferred over any possible asynchronous notification about specific
PIDs, because the service manager has full access to the child process
data in /proc and the PID can not be re-used until the wait(), the
service-manager itself is in charge of, has happened.

As a side effect, the relevant parent PID information does not get lost
by a double-fork, which results in a more elaborate process tree and
'ps' output:

before:
  # ps afx
  253 ?        Ss     0:00 /bin/dbus-daemon --system --nofork
  294 ?        Sl     0:00 /usr/libexec/polkit-1/polkitd
  328 ?        S      0:00 /usr/sbin/modem-manager
  608 ?        Sl     0:00 /usr/libexec/colord
  658 ?        Sl     0:00 /usr/libexec/upowerd
  819 ?        Sl     0:00 /usr/libexec/imsettings-daemon
  916 ?        Sl     0:00 /usr/libexec/udisks-daemon
  917 ?        S      0:00  \_ udisks-daemon: not polling any devices

after:
  # ps afx
  294 ?        Ss     0:00 /bin/dbus-daemon --system --nofork
  426 ?        Sl     0:00  \_ /usr/libexec/polkit-1/polkitd
  449 ?        S      0:00  \_ /usr/sbin/modem-manager
  635 ?        Sl     0:00  \_ /usr/libexec/colord
  705 ?        Sl     0:00  \_ /usr/libexec/upowerd
  959 ?        Sl     0:00  \_ /usr/libexec/udisks-daemon
  960 ?        S      0:00  |   \_ udisks-daemon: not polling any devices
  977 ?        Sl     0:00  \_ /usr/libexec/packagekitd

This prctl is orthogonal to PID namespaces.  PID namespaces are isolated
from each other, while a service management process usually requires the
services to live in the same namespace, to be able to talk to each
other.

Users of this will be the systemd per-user instance, which provides
init-like functionality for the user's login session and D-Bus, which
activates bus services on-demand.  Both need init-like capabilities to
be able to properly keep track of the services they start.

Many thanks to Oleg for several rounds of review and insights.

[akpm@linux-foundation.org: fix comment layout and spelling]
[akpm@linux-foundation.org: add lengthy code comment from Oleg]
Reviewed-by: default avatarOleg Nesterov <oleg@redhat.com>
Signed-off-by: default avatarLennart Poettering <lennart@poettering.net>
Signed-off-by: default avatarKay Sievers <kay.sievers@vrfy.org>
Acked-by: default avatarValdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
ebec18a6
Name Last commit Last update
..
acpi Merge 3.3-rc2 into the driver-core-next branch.
asm-generic consolidate WARN_...ONCE() static variables
crypto crypto: remove the second argument of k[un]map_atomic()
drm headers: include linux/types.h where appropriate
keys keys: add a "logon" key type
linux prctl: add PR_{SET,GET}_CHILD_SUBREAPER to allow simple process supervision
math-emu math-emu: correct test for downshifting fraction in _FP_FROM_INT()
media Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
misc [media] altera-stapl: it is time to move out from staging
mtd mtd: document that MEMWRITE ioctl is NAND-specific
net Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
pcmcia Merge git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6
rdma Merge branches 'cma', 'cxgb3', 'cxgb4', 'ehca', 'iser', 'mad', 'nes', 'qib', 'srp' and 'srpt' into for-next
rxrpc atomic: use <linux/atomic.h>
scsi headers: include linux/types.h where appropriate
sound headers: include linux/types.h where appropriate
target target: Use array_zalloc for device_list
trace Merge tag 'nfs-for-3.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
video fbdev: sh_mipi_dsi: add extra phyctrl for sh_mipi_dsi_info
xen Merge tag 'stable/for-linus-3.4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
Kbuild [SCSI] FC Pass Thru support