守护进程是在一类脱离终端在后台执行的程序, 通常以d
结尾, 随系统启动, 其父进程(ppid)通常是init
进程
一般要让当前程序以守护进程形式运行, 在命令后加&
并重定向输出即可
$ python someprogram.py > /dev/null 2>&1 &
或者使用nohup
也可以
创建子进程, 父进程退出
父进程先于子进程退出会造成子进程成为孤儿进程,而每当系统发现一个孤儿进程时,就会自动由1号进程(init)收养它,这样,原先的子进程就会变成init进程的子进程-
在子进程中创建新会话
更改工作目录到
/
, 以便umount
一个文件系统重设文件权限掩码, 以便拥有完全的写的权限, 即重设继承来的默认文件权限值
调用
setuid
, 让当前进程成为新的会话组长和进程组长
执行第二次fork
关闭文件描述符, 一般是输入/输出和错误输出, 重定向到/dev/null
py代码
上面守护进程的生成步骤中涉及到了孤儿进程
任何孤儿进程产生时都会立即为系统进程init自动接收为子进程,这一过程也被称为“收养”. 但由于创建该进程的进程已不存在,所以仍应称之为“孤儿进程”与之相关的一个概念就是 僵尸进程了. 当子进程退出时, 父进程需要wait/waitpid
系统调用来读取子进程的exit status, 然后子进程被系统回收. 如果父进程没有wait的话, 子进程将变成一个"僵尸进程", 内核会释放这个子进程所有的资源,包括打开的文件占用的内存等, 但在进程表中仍然有一个PCB, 记录进程号和退出状态等信息, 并导致进程号一直被占用, 而系统能使用的进程号数量是有限的(可以用ulimit
查看相关限制), 如果产生大量僵尸进程的话, 将因为没有可用的进程号而导致系统不能产生新的进程
因此很多自带重启功能的服务实现就是用wait/waitpid
实现的.
def fork_processes(num_processes, max_restarts=100): """Starts multiple worker processes. If ``num_processes`` is None or <= 0, we detect the number of cores available on this machine and fork that number of child processes. If ``num_processes`` is given and > 0, we fork that specific number of sub-processes. Since we use processes and not threads, there is no shared memory between any server code. Note that multiple processes are not compatible with the autoreload module (or the ``autoreload=True`` option to `tornado.web.Application` which defaults to True when ``debug=True``). When using multiple processes, no IOLoops can be created or referenced until after the call to ``fork_processes``. In each child process, ``fork_processes`` returns its *task id*, a number between 0 and ``num_processes``. Processes that exit abnormally (due to a signal or non-zero exit status) are restarted with the same id (up to ``max_restarts`` times). In the parent process, ``fork_processes`` returns None if all child processes have exited normally, but will otherwise only exit by throwing an exception. """ global _task_id assert _task_id is None if num_processes is None or num_processes <= 0: num_processes = cpu_count() if ioloop.IOLoop.initialized(): raise RuntimeError("Cannot run in multiple processes: IOLoop instance " "has already been initialized. You cannot call " "IOLoop.instance() before calling start_processes()") gen_log.info("Starting %d processes", num_processes) children = {} def start_child(i): pid = os.fork() if pid == 0: # child process _reseed_random() global _task_id _task_id = i return i else: children[pid] = i return None for i in range(num_processes): id = start_child(i) if id is not None: return id num_restarts = 0 while children: try: pid, status = os.wait() except OSError as e: if errno_from_exception(e) == errno.EINTR: continue raise if pid not in children: continue id = children.pop(pid) if os.WIFSIGNALED(status): gen_log.warning("child %d (pid %d) killed by signal %d, restarting", id, pid, os.WTERMSIG(status)) elif os.WEXITSTATUS(status) != 0: gen_log.warning("child %d (pid %d) exited with status %d, restarting", id, pid, os.WEXITSTATUS(status)) else: gen_log.info("child %d (pid %d) exited normally", id, pid) continue num_restarts += 1 if num_restarts > max_restarts: raise RuntimeError("Too many child restarts, giving up") new_id = start_child(id) if new_id is not None: return new_id # All child processes exited cleanly, so exit the master process # instead of just returning to right after the call to # fork_processes (which will probably just start up another IOLoop # unless the caller checks the return value). sys.exit(0)
参考: