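How can I make this asyncio page-fetching code faster?

The code is below. It first fetches the sub-page links from the home page (about 64 of them), then requests the sub-pages concurrently. The whole run takes about 36s, which still feels a bit slow.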


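import asyncio
import time

import aiohttp
from parsel import Selector  # assumption: scrapy.Selector offers the same xpath()/extract_first() API

# Note: `headers` and `BaseService` come from the poster's project and are not shown in the post.
URL_MAP = {'home_page': 'https://xxx/stocks/industry', 'base': 'https://xxx.com'}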


class App(BaseService):

    def __init__(self):
        super(App, self).__init__()


    async def home_page(self):
        start = time.time()
        async with aiohttp.ClientSession() as session:

            async with session.get(url=URL_MAP['home_page'], headers=headers) as response:
                html = await response.text()  # this is where it blocks
                resp = Selector(text=html)
                industries = resp.xpath('//ul[@class="list-unstyled"]/a')
                task_list = []
                for industry in industries:
                    json_data = {}
                    industry_url = industry.xpath('.//@href').extract_first()
                    industry_name = industry.xpath('.//li/text()').extract_first()
                    json_data['industry_url'] = industry_url
                    json_data['industry_name'] = industry_name

                    # Schedule one task per sub-page; all of them share the same session.
                    task = asyncio.ensure_future(self.detail_list(session, industry_url, json_data))
                    task_list.append(task)

                await asyncio.gather(*task_list)
                end = time.time()

                print(f'time used {end-start}')

    async def detail_list(self, session, url, json_data):

        async with session.get(URL_MAP['base']+url, headers=headers) as response:
            html = await response.text()
            self.parse_detail(html, json_data)

    def parse_detail(self, html, json_data=None):
        resp = Selector(text=html)
        # info = resp.xpath('//div[@id="v_desc"]/div[@class="info open"]/text()').extract_first()
        title = resp.xpath('//title/text()').extract_first()
        print(title)


app = App()
asyncio.run(app.home_page())

ClericPy

2020-11-24 21:59:47 +08:00

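Print the time each request takes and go through them one by one. Only if nothing stands out would I look at code-level optimization (with something like pysnooper). I don't see any stray synchronous code blocking the coroutines, and the Session is reused where it should be, so my guess is that the bottleneck is mostly network speed...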
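
A rough sketch of what timing each request could look like, using only the standard library. The `timed` wrapper and `fake_fetch` are hypothetical names, and the URLs and delays are made up; in the real code, `fake_fetch(...)` would be replaced by the `self.detail_list(session, industry_url, json_data)` coroutine.

```python
import asyncio
import time


async def timed(label, coro, timings):
    """Await coro, store its duration in timings[label], and return its result."""
    start = time.perf_counter()
    try:
        return await coro
    finally:
        timings[label] = time.perf_counter() - start


async def fake_fetch(url, delay):
    # Hypothetical stand-in for session.get(...) followed by response.text().
    await asyncio.sleep(delay)
    return f"<title>{url}</title>"


async def main():
    timings = {}
    pages = {"/a": 0.05, "/b": 0.20, "/c": 0.10}  # made-up URLs and latencies
    results = await asyncio.gather(
        *(timed(url, fake_fetch(url, delay), timings) for url, delay in pages.items())
    )
    # Print the slowest requests first so outliers are easy to spot.
    for url, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{seconds:.3f}s  {url}")
    return results, timings


if __name__ == "__main__":
    asyncio.run(main())
```

If a handful of URLs dominate the list, the server or the network is the likely culprit; if every request takes about as long as the whole run, something is probably serializing them.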

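If you still want to speed up the synchronous part a little, try selectolax. I benchmarked it against lxml's CSS selector and it was roughly twice as fast. I haven't compared it against XPath, though; once I got used to CSS selector semantics I rarely touched XPath.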


https://www.v2ex.com/t/728863