How to limit the scope of multiprocessing processes?

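Because of the nature of `os.fork()`, the child processes inherit every variable in the global namespace of your `__main__` module (assuming you're on a POSIX platform). That is why the children's memory usage reflects those variables as soon as the children are created. I'm not sure all of that memory is really being allocated, though. As far as I know, it is shared copy-on-write until you actually try to change it in the child, at which point a new copy is made. In CPython, even reading an object updates its reference count, which writes to the memory page holding it, so pages can get copied sooner than you might expect.

Windows, on the other hand, doesn't use `os.fork()`. It re-imports the main module in each child and pickles any local variables you want to send to the children. So on Windows you can avoid having a large global copied into the children by defining it only inside the `if __name__ == "__main__":` guard, because everything inside that guard runs only in the parent process: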

    import multiprocessing

    def foo(x):
        # Busy-wait to keep the child alive (e.g. so its memory usage can be inspected)
        for _ in range(2**28):
            pass
        print(x**2)

    if __name__ == "__main__":
        completely_unrelated_array = list(range(2**25))  # This will only be defined in the parent on Windows
        P = multiprocessing.Pool()
        for x in range(8):
            multiprocessing.Process(target=foo, args=(x,)).start()

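Now, in Python 2.x on a POSIX platform, `multiprocessing.Process` can only create new processes by forking. On Python 3.4+, however, you can use a context to specify how new processes are created. So we can pick the `"spawn"` context (the one Windows uses) to create our new processes, and use the same trick: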

    # Note that this is Python 3.4+ only
    import multiprocessing

    def foo(x):
        # Busy-wait to keep the child alive
        for _ in range(2**28):
            pass
        print(x**2)

    if __name__ == "__main__":
        completely_unrelated_array = list(range(2**23))  # Again, this only exists in the parent
        ctx = multiprocessing.get_context("spawn")  # Use process spawning instead of fork
        P = ctx.Pool()
        for x in range(8):
            ctx.Process(target=foo, args=(x,)).start()
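
Because a context is an ordinary object, you can also choose it at runtime. The hypothetical helper `pick_context` below is one way to do that, shown only as a sketch. It relies on `multiprocessing.get_all_start_methods()` and `multiprocessing.get_context()`, both available from Python 3.4. The preference order passed in the demo is an arbitrary assumption.

```python
import multiprocessing

def pick_context(preferred=("spawn",)):
    """Hypothetical helper: return a context for the first start method in
    `preferred` that this platform supports, else the default context."""
    available = multiprocessing.get_all_start_methods()
    for method in preferred:
        if method in available:
            # Affects only Process/Pool objects created from this context
            return multiprocessing.get_context(method)
    # Fall back to the program-wide default context
    return multiprocessing.get_context()

if __name__ == "__main__":
    ctx = pick_context(("forkserver", "spawn"))
    print("new processes will be started with:", ctx.get_start_method())
    # e.g. P = ctx.Pool() or ctx.Process(target=foo, args=(x,))
```

A context object only affects the `Process` and `Pool` objects created from it. `multiprocessing.set_start_method()` instead changes the default for the whole program and normally can only be called once. That makes a context the less intrusive choice when only some of your processes should be spawned.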

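If you need 2.x support, or want to stick with `os.fork()` for creating new `Process` objects, I think the best you can do is delete the offending object in the child immediately, which reduces the reported memory usage: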

    import gc
    import multiprocessing
    import time

    def foo(x):
        init()
        # Busy-wait to keep the child alive
        for _ in range(2**28):
            pass
        print(x**2)

    def init():
        # Drop this child's reference to the array it inherited through fork()
        global completely_unrelated_array
        completely_unrelated_array = None
        del completely_unrelated_array
        gc.collect()

    if __name__ == "__main__":
        completely_unrelated_array = list(range(2**23))
        P = multiprocessing.Pool(initializer=init)  # Each pool worker runs init() once at startup
        for x in range(8):
            multiprocessing.Process(target=foo, args=(x,)).start()
        time.sleep(100)  # Keep the parent alive while you inspect the children
