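Python multiprocessing

Requirement: check the validity of URLs in bulk

I have a text file with 100,000 URLs. I need to check whether each URL can be opened and save the ones that can to a text file. I'm not familiar with Python's multiprocessing; I've read a few articles online and am still somewhat confused. Any pointers would be appreciated.

Original code: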

import requests

url_result_success = []
url_result_failed = []

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, compress',
    'Accept-Language': 'en-us;q=0.5,en;q=0.3',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0'
}

with open(r'urls.txt', 'r') as f:
    for url in f:
        url = url.strip()
        if not url:
            continue  # skip blank lines
        print(url)
        try:
            response = requests.get(url, headers=headers, allow_redirects=True, timeout=5)
            # Treat any status other than 200 as a failure
            if response.status_code != 200:
                raise requests.RequestException(u"Status code error: {}".format(response.status_code))
        except requests.RequestException:
            url_result_failed.append(url)
            continue
        url_result_success.append(url)

with open(r'valid_urls.txt', 'a+') as f:
    for url in url_result_success:
        url = url.strip()
        f.write(url + '\n')
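
Since the script above spends most of its time waiting on the network, one possible starting point is the thread-backed pool in `multiprocessing.dummy`, which exposes the same `Pool.map` interface as `multiprocessing` and also exists on Python 2.7. This is only a minimal sketch: `is_reachable`, `filter_urls`, and the pool size of 20 are illustrative names and values that do not appear in the original post.

```python
from multiprocessing.dummy import Pool  # thread-backed pool, same API as multiprocessing.Pool


def is_reachable(url, headers=None, timeout=5):
    """Return True if a GET request to url ends with HTTP 200 (hypothetical helper)."""
    # Imported lazily so filter_urls can be used even where requests is not installed
    import requests
    try:
        response = requests.get(url, headers=headers, allow_redirects=True, timeout=timeout)
    except requests.RequestException:
        return False
    return response.status_code == 200


def filter_urls(urls, check=is_reachable, workers=20):
    """Run check(url) on a pool of worker threads and keep the URLs that pass."""
    urls = list(urls)
    pool = Pool(workers)
    try:
        # map() blocks until every URL has been checked and preserves input order
        results = pool.map(check, urls)
    finally:
        pool.close()
        pool.join()
    return [url for url, ok in zip(urls, results) if ok]
```

To use it, read the stripped, non-empty lines of `urls.txt` into a list, pass that list to `filter_urls`, and write the returned URLs to `valid_urls.txt`. The worker count is a guess; it likely needs tuning against the available bandwidth and the 5-second timeout.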

Meli55a

2018-04-19 22:50:04 +08:00

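@zmj1316 @Nubia @albertofwb @wzwwzw The code has to run on 2003, so I can't install 3.6...
@doubleflower @cszhiyue @ToT Right, I'll use multithreading.
@skyleft I've come across related articles; they're worth studying.
@lusi1990 I know about what you mentioned, but I haven't looked into it in depth.
This afternoon the fiber around my neighborhood was cut by the subway construction crew, so my connection only just came back...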
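
For the multithreading route mentioned in the replies, a rough sketch with plain `threading.Thread` workers pulling from a shared queue might look like the following. It is written to run on both Python 2.7 and Python 3. The name `check_with_threads`, the `check` callback, and the default of 20 workers are assumptions made for illustration; `check` can be any callable that returns True for a reachable URL, for instance a wrapper around the `requests.get` call in the original script.

```python
import threading

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2


def check_with_threads(urls, check, workers=20):
    """Check URLs with plain worker threads fed from a shared queue."""
    tasks = queue.Queue()
    for url in urls:
        tasks.put(url)  # every task is queued before any worker starts

    valid = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                url = tasks.get_nowait()
            except queue.Empty:
                return  # queue drained, let this thread finish
            try:
                ok = check(url)
            except Exception:
                ok = False  # a failing check counts as an invalid URL
            if ok:
                with lock:
                    valid.append(url)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return valid
```

Note that the returned list is in completion order rather than input order, so sort it or compare it as a set if the order matters.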

https://www.v2ex.com/t/448198
