Question about multiprocess parallel computing in Python

2016-05-23 15:02:22 +08:00
 kingmo888

I found a demo online and tidied it up myself, and it runs fine, because the pool code is executed under if __name__ == "__main__":.

If the .py file is run directly as a script with that code at module level, not under the guard, multiprocessing falls apart completely.

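After running it several times, it looks as if, once the N processes have been started, the whole script is executed from top to bottom another N times?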

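PS, unrelated to the example: what I actually want to do is read a large amount of data from somewhere and then run the computation across multiple processes - -!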

Here is the working script:

import multiprocessing
import time
import pandas as pd
data = {}

for i in range(10):
    data[i] = pd.DataFrame(list(range(1000)),columns=['num'])

def tfunc(key, data):
    data['sum'] = data['num'].cumsum()
    #data['ma5'] = pd.rolling_apply()
    for i in range(len(data)):
        a=data.at[i,'sum']
        if a>5:
            pass
        if a>10:
            pass
        time.sleep(0.001)
    return data


def func(msg):
    for i in range(3):
        print ('func:',msg)
        #time.sleep(1)
    return "done " + msg

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=4)
    result = []
    for i in range(10):
        msg = "hello %d" %(i)
        result.append(pool.apply_async(tfunc, (i,data[i] )))
    pool.close()
    pool.join()
    for res in result:
        print('result:', res.get())
    print ("Sub-process(es) done.")

Here is the failing script. The difference between the two is that the if __name__ == "__main__": guard has been removed.

import multiprocessing
import time
import pandas as pd
data = {}

for i in range(10):
    data[i] = pd.DataFrame(list(range(1000)),columns=['num'])




def tfunc(key, data):
    data['sum'] = data['num'].cumsum()
    #data['ma5'] = pd.rolling_apply()
    for i in range(len(data)):
        a=data.at[i,'sum']
        if a>5:
            pass
        if a>10:
            pass
        #time.sleep(0.001)
    return data


def func(msg):
    for i in range(3):
        print ('func:',msg)
        #time.sleep(1)
    return "done " + msg


pool = multiprocessing.Pool(processes=4)
result = []
for i in range(10):
    msg = "hello %d" %(i)
    result.append(pool.apply_async(tfunc, (i,data[i] )))
pool.close()
pool.join()
for res in result:
    print('result:', res.get())
print ("Sub-process(es) done.")
4 replies
SErHo
2016-05-23 16:24:23 +08:00
kingmo888
2016-05-23 16:31:42 +08:00
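@SErHo Yes, it's Windows. In the current script, running the test under if __name__ == "__main__": works, whether or not I add it; without if __name__ == "__main__": it fails, and again adding it or not makes no difference - -!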
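
Whether the platform matters comes down to the process start method. A minimal sketch for checking it, using only the standard library; the printed value depends on the OS and Python version, so no particular output is assumed here:

```python
import multiprocessing

# The start method decides how child processes are created.
# "spawn" starts a fresh interpreter that has to re-import the main script;
# "fork" copies the already-running parent process instead.
method = multiprocessing.get_start_method()
available = multiprocessing.get_all_start_methods()

print("start method in use:", method)
print("available here:", available)
```

On Windows only spawn is available, so the main script is always re-imported by every child there.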
joshz
2016-05-23 16:52:12 +08:00
Due to the way the new processes are started, the child process needs to be able to import the script containing the target function. Wrapping the main part of the application in a check for __main__ ensures that it is not run recursively in each child as the module is imported. Another approach is to import the target function from a separate script.

Reference: https://pymotw.com/2/multiprocessing/basics.html
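
The quoted explanation can be illustrated without starting any real processes. The sketch below is a simulation, not the failing script itself: it assumes that a spawned child re-runs the parent script under the module name __mp_main__ (which is what CPython's spawn implementation does, as far as I know), and it uses runpy to execute a tiny hypothetical script under both names. SCRIPT, run_script_as and demo_script.py are made-up names for illustration.

```python
import os
import runpy
import tempfile

# Hypothetical stand-in for the script in the question.
SCRIPT = '''\
module_level_ran = True        # stands in for imports, data setup, def statements
guarded_block_ran = False
if __name__ == "__main__":
    guarded_block_ran = True   # stands in for creating the Pool and submitting jobs
'''


def run_script_as(run_name):
    """Write SCRIPT to a temporary file, execute it with __name__ set to
    run_name, and return the globals the script ended up with."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo_script.py")
        with open(path, "w") as f:
            f.write(SCRIPT)
        return runpy.run_path(path, run_name=run_name)


as_parent = run_script_as("__main__")     # how the file runs when you launch it
as_child = run_script_as("__mp_main__")   # assumed name used inside a spawned child

print("parent:", as_parent["module_level_ran"], as_parent["guarded_block_ran"])
print("child: ", as_child["module_level_ran"], as_child["guarded_block_ran"])
```

In both runs the module-level lines execute, but only the run named __main__ enters the guarded block. Without the guard, each child would reach the Pool creation code again, which matches the "script runs N more times" symptom described in the question.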
likuku
2016-05-23 16:57:30 +08:00
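With if __name__ == "__main__", the block under it runs only when the file is executed directly as the main script; everything outside it still runs from the first line down, including whenever the file is imported.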

Given that, why not go one step further and add a main function, putting everything under if __name__ == "__main__" into main()?
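
A minimal sketch of that main() layout, under some loud assumptions: plain lists replace the pandas DataFrames so the example needs no third-party packages, build_data is a hypothetical stand-in for "read a lot of data from somewhere", and tfunc is simplified to a cumulative sum. It is one possible arrangement, not the original poster's final code.

```python
import itertools
import multiprocessing


def tfunc(key, nums):
    """Worker: return the key together with the running totals of nums."""
    return key, list(itertools.accumulate(nums))


def build_data(n_items=10, n_rows=1000):
    """Hypothetical loader; real code would read from a file or database."""
    return {i: list(range(n_rows)) for i in range(n_items)}


def main():
    # Data is built here, not at module level, so spawned children
    # do not rebuild it every time they re-import this file.
    data = build_data()
    with multiprocessing.Pool(processes=4) as pool:
        # starmap unpacks each (key, nums) pair into tfunc(key, nums)
        # and blocks until every result is back.
        results = dict(pool.starmap(tfunc, data.items()))
    for key in sorted(results):
        print("result:", key, results[key][-1])
    print("Sub-process(es) done.")


if __name__ == "__main__":
    main()
```

Only the function definitions live at module level, so a child that re-imports the file picks up tfunc and nothing else; the pool is created once, in the parent.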


https://www.v2ex.com/t/280651
