I need to implement an API endpoint. The input is:
{"pdf_file": "http://myserver/somefolder/somefile.pdf"}
and the output is:
{"key1": "value1", "key2":"value2"}
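Here is a minimal sketch of that input/output contract. It assumes the request body is plain JSON with a single `pdf_file` URL and the result is a flat string-to-string map. `ExtractionRequest`, `from_json`, `filename` and `make_response` are hypothetical names for illustration, not part of any framework. The filename is taken from the URL path, so a `?query` suffix does not leak into it.

```python
import json
from dataclasses import dataclass
from pathlib import PurePosixPath
from urllib.parse import urlparse


@dataclass
class ExtractionRequest:
    """Hypothetical request type mirroring {"pdf_file": "<url>"}."""
    pdf_file: str

    @classmethod
    def from_json(cls, body: str) -> "ExtractionRequest":
        data = json.loads(body)
        url = data["pdf_file"]
        # Only accept http(s) URLs, since the server has to download the file
        if urlparse(url).scheme not in ("http", "https"):
            raise ValueError(f"pdf_file must be an http(s) URL, got {url!r}")
        return cls(pdf_file=url)

    @property
    def filename(self) -> str:
        # Last path segment of the URL, ignoring any ?query or #fragment
        return PurePosixPath(urlparse(self.pdf_file).path).name


def make_response(fields: dict) -> str:
    """Serialize the flat {"key": "value"} result; keys and values are coerced to str."""
    return json.dumps({str(k): str(v) for k, v in fields.items()}, ensure_ascii=False)
```

For example, `ExtractionRequest.from_json('{"pdf_file": "http://myserver/somefolder/somefile.pdf"}').filename` gives `"somefile.pdf"`.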
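The PDF files are text documents ranging from a few pages to a few dozen pages.
Current test results:
downloading the file takes about 5 seconds;
processing it takes about 5 seconds.
The question is: how can this whole process be sped up?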
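Before optimizing, it can help to time the two phases separately within a single request, so the roughly 5 s + 5 s split can be confirmed per file. Below is a minimal standard-library sketch. `download` and `process` are hypothetical callables standing in for the real download and PDF-processing steps.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str, timings: dict):
    """Record the wall-clock seconds spent inside the with-block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


def profile_pipeline(download, process, url: str):
    """Time the two phases of one request separately.

    `download(url) -> bytes` and `process(data) -> dict` are hypothetical
    stand-ins for the real download and PDF-processing steps.
    """
    timings: dict = {}
    with timed("download", timings):
        data = download(url)
    with timed("process", timings):
        result = process(data)
    return result, timings
```

`time.perf_counter` is monotonic and high-resolution, which makes it suitable for measuring short intervals like these.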
Here is an example I found online:
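~~~python
import asyncio
import os
from concurrent.futures import ProcessPoolExecutor
from http import HTTPStatus
from typing import Dict, Optional
from uuid import UUID, uuid4

import aiohttp
from fastapi import BackgroundTasks, FastAPI, HTTPException
from pydantic import BaseModel, Field

app = FastAPI()

class InsurancePoliciesExtractionReqeust(BaseModel):
    pdf_file: str  # URL of the PDF to download

class Job(BaseModel):
    uid: UUID = Field(default_factory=uuid4)  # default=uuid4() would give every job the same id
    status: str = "in_progress"
    result: Optional[dict] = None  # e.g. {"key1": "value1", "key2": "value2"}

jobs: Dict[UUID, Job] = {}

def cpu_bound_func(data: bytes, filename: str) -> dict:
    # Placeholder for the actual PDF processing; it must be a module-level
    # function so it can be pickled and run in a worker process.
    raise NotImplementedError

async def run_in_process(fn, *args):
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(app.state.executor, fn, *args)  # wait and return result

async def start_cpu_bound_task(uid: UUID, data: bytes, filename: str) -> None:
    try:
        jobs[uid].result = await run_in_process(cpu_bound_func, data, filename)
        jobs[uid].status = "complete"
    except Exception:
        jobs[uid].status = "failed"  # otherwise the job would stay "in_progress" forever

@app.post("/new_cpu_bound_task/", status_code=HTTPStatus.ACCEPTED)
async def task_handler(req: InsurancePoliciesExtractionReqeust, background_tasks: BackgroundTasks):
    ## My changes start here: download the PDF inside the request handler
    async with aiohttp.ClientSession() as session:
        async with session.get(req.pdf_file) as resp:
            if resp.status != 200:
                raise HTTPException(status_code=HTTPStatus.BAD_GATEWAY,
                                    detail=f"PDF download failed with status {resp.status}")
            content = await resp.read()  # plain bytes pickle cheaply for the worker process
    filename = os.path.basename(req.pdf_file)
    ## My changes end here
    new_task = Job()
    jobs[new_task.uid] = new_task
    background_tasks.add_task(start_cpu_bound_task, new_task.uid, content, filename)
    return new_task

@app.get("/status/{uid}")
async def status_handler(uid: UUID):
    if uid not in jobs:
        raise HTTPException(status_code=HTTPStatus.NOT_FOUND, detail="unknown job id")
    return jobs[uid]

@app.on_event("startup")
async def startup_event():
    app.state.executor = ProcessPoolExecutor()

@app.on_event("shutdown")
async def on_shutdown():
    app.state.executor.shutdown()
~~~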
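With this job pattern, `POST /new_cpu_bound_task/` responds with HTTP 202 and the job's `uid` once the handler returns. In this version, that happens after the download has finished. Processing then continues in a worker process, and the caller polls `GET /status/{uid}`. Below is a client-side polling sketch. It assumes a finished job reports any status other than `"in_progress"`, and that the server runs at a base URL you supply (`http://localhost:8000` in the usage note is only a placeholder). `wait_for_job` and `http_fetch_status` are hypothetical helper names.

```python
import json
import time
import urllib.request
from typing import Callable


def http_fetch_status(base_url: str) -> Callable[[str], dict]:
    """Return a function that GETs {base_url}/status/{uid} and decodes the JSON body."""
    def fetch(uid: str) -> dict:
        with urllib.request.urlopen(f"{base_url}/status/{uid}") as resp:
            return json.load(resp)
    return fetch


def wait_for_job(fetch_status: Callable[[str], dict], uid: str,
                 interval: float = 0.5, timeout: float = 60.0) -> dict:
    """Poll until the job leaves "in_progress", or raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status(uid)
        if job["status"] != "in_progress":
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {uid} still in progress after {timeout}s")
        time.sleep(interval)
```

Usage, assuming the POST returned a body with `"uid"`: `wait_for_job(http_fetch_status("http://localhost:8000"), uid)`. The status fetcher is passed in as a callable, so the polling loop can be tested without a running server.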