celery 为何部署到服务器上变成同步阻塞了

2023-08-29 16:49:37 +08:00
 lzjunika

最近整个小应用

业务服务:gunicorn + flask

后台服务:celery

主业务主文件转换与上传,比较耗时,所以用 celery 跑后台服务

本地(Mac book): 能正常异步跑,自动切换多个 worker,无阻塞,是理想效果:

[2023-08-29 16:37:14,279: INFO/MainProcess] Task pss.api.offce.put_content_to_obs[54e1286f-74c4-48d4-98e5-99937c65714c] received
[2023-08-29 16:37:14,417: INFO/MainProcess] Task pss.api.offce.put_content_to_obs[5b4d53fc-afc2-4e50-9b9c-905d5eddddde] received
[2023-08-29 16:37:14,486: INFO/MainProcess] Task pss.api.offce.put_content_to_obs[8fb95642-6900-4aaa-b666-11f98e3a0eea] received
[2023-08-29 16:37:14,531: INFO/MainProcess] Task pss.api.offce.put_content_to_obs[4df12aa2-458c-4946-8aad-3ed25e68c5e0] received
[2023-08-29 16:37:14,583: INFO/MainProcess] Task pss.api.offce.put_content_to_obs[93192a4d-7569-44c7-a09b-7035ea331901] received
[2023-08-29 16:37:14,618: INFO/MainProcess] Task pss.api.offce.put_content_to_obs[6897f139-bc7a-4b9f-aab8-22c6d7a07a85] received
[2023-08-29 16:37:14,660: INFO/MainProcess] Task pss.api.offce.put_content_to_obs[d301e702-accd-44b2-b85e-2f7d3c3a4e4f] received
[2023-08-29 16:37:14,690: WARNING/ForkPoolWorker-8] requestId:
[2023-08-29 16:37:14,693: WARNING/ForkPoolWorker-8] 0000018A4070950C5A0294A2CECAB8DF
[2023-08-29 16:37:14,701: WARNING/ForkPoolWorker-8] [2023-08-29 16:37:14,701] WARNING in offce: obs_upload_file:OK
[2023-08-29 16:37:14,701: WARNING/ForkPoolWorker-8] obs_upload_file:OK
[2023-08-29 16:37:14,702: WARNING/ForkPoolWorker-8] test_1.png
[2023-08-29 16:37:14,736: INFO/MainProcess] Task pss.api.offce.put_content_to_obs[42c63363-9528-4f59-9e21-b2816907141f] received
[2023-08-29 16:37:14,737: INFO/ForkPoolWorker-8] Task pss.api.offce.put_content_to_obs[54e1286f-74c4-48d4-98e5-99937c65714c] succeeded in 0.4250246670001161s: True
[2023-08-29 16:37:14,755: WARNING/ForkPoolWorker-1] requestId:
[2023-08-29 16:37:14,756: WARNING/ForkPoolWorker-1] 0000018A407095555502052E5A386783
[2023-08-29 16:37:14,763: WARNING/ForkPoolWorker-1] [2023-08-29 16:37:14,761] WARNING in offce: obs_upload_file:OK
[2023-08-29 16:37:14,761: WARNING/ForkPoolWorker-1] obs_upload_file:OK
[2023-08-29 16:37:14,767: WARNING/ForkPoolWorker-1] test_2.png
[2023-08-29 16:37:14,785: INFO/ForkPoolWorker-1] Task pss.api.offce.put_content_to_obs[5b4d53fc-afc2-4e50-9b9c-905d5eddddde] succeeded in 0.3451121250000142s: True
[2023-08-29 16:37:14,788: INFO/MainProcess] Task pss.api.offce.put_content_to_obs[835c3945-2396-4490-94d0-421298d1813f] received
[2023-08-29 16:37:14,890: WARNING/ForkPoolWorker-2] requestId:
[2023-08-29 16:37:14,891: WARNING/ForkPoolWorker-2] 0000018A407095E8540AE00FC334E409
[2023-08-29 16:37:14,892: WARNING/ForkPoolWorker-2] [2023-08-29 16:37:14,892] WARNING in offce: obs_upload_file:OK
[2023-08-29 16:37:14,892: WARNING/ForkPoolWorker-2] obs_upload_file:OK
[2023-08-29 16:37:14,893: WARNING/ForkPoolWorker-2] test_3.png
[2023-08-29 16:37:14,895: INFO/ForkPoolWorker-2] Task pss.api.offce.put_content_to_obs[8fb95642-6900-4aaa-b666-11f98e3a0eea] succeeded in 0.3848593749999054s: True

服务器(Centos7.6): 只能同步跑,只有一个固定 worder 在运行,同步阻塞:

[2023-08-29 16:25:58,664: INFO/MainProcess] Task pss.api.offce.put_content_to_obs[873c3f38-98b4-47cc-98e8-6f65a58c3269] received
[2023-08-29 16:25:58,733: WARNING/ForkPoolWorker-7] requestId:
[2023-08-29 16:25:58,734: WARNING/ForkPoolWorker-7] 0000018A406644C054084BB9021C6A9B
[2023-08-29 16:25:58,734: WARNING/ForkPoolWorker-7] [2023-08-29 16:25:58,734] WARNING in offce: obs_upload_file:OK
[2023-08-29 16:25:58,734: WARNING/ForkPoolWorker-7] obs_upload_file:OK
[2023-08-29 16:25:58,734: WARNING/ForkPoolWorker-7] test_8.png
[2023-08-29 16:25:58,735: INFO/ForkPoolWorker-7] Task pss.api.offce.put_content_to_obs[873c3f38-98b4-47cc-98e8-6f65a58c3269] succeeded in 0.07009824365377426s: True
[2023-08-29 16:26:00,287: INFO/MainProcess] Task pss.api.offce.put_content_to_obs[7a4868f2-305f-4f6b-992c-6ea0791f3427] received
[2023-08-29 16:26:00,370: WARNING/ForkPoolWorker-7] requestId:
[2023-08-29 16:26:00,370: WARNING/ForkPoolWorker-7] 0000018A40664B17550A56827D6506B2
[2023-08-29 16:26:00,371: WARNING/ForkPoolWorker-7] [2023-08-29 16:26:00,371] WARNING in offce: obs_upload_file:OK
[2023-08-29 16:26:00,371: WARNING/ForkPoolWorker-7] obs_upload_file:OK
[2023-08-29 16:26:00,371: WARNING/ForkPoolWorker-7] test_9.png
[2023-08-29 16:26:00,372: INFO/ForkPoolWorker-7] Task pss.api.offce.put_content_to_obs[7a4868f2-305f-4f6b-992c-6ea0791f3427] succeeded in 0.08343333378434181s: True

如上运行日志

本地正常运行,有 3 个 worker:ForkPoolWorker-8 、ForkPoolWorker-1 和 ForkPoolWorker-2 ,并行处理,服务不阻塞

远程服务器只能同步运行,且只有一个固定的 worker: ForkPoolWorker-7,服务阻塞的

是不是远程哪里的配置不对,远程服务器启动信息摘要也跟本地一样:

 -------------- celery@xg-003 v5.2.7 (dawn-chorus)
--- ***** -----
-- ******* ---- Linux-3.10.0-1062.1.2.el7.x86_64-x86_64-with-centos-7.7.1908-Core 2023-08-29 15:48:54
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app:         pss:0x7f348eaec160
- ** ---------- .> transport:   redis://***:6379/6
- ** ---------- .> results:     redis://***:6379/6
- *** --- * --- .> concurrency: 8 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
 -------------- [queues]
                .> celery           exchange=celery(direct) key=celery

请各位指教,是不是哪里配置有问题

1596 次点击
所在节点    Python
19 条回复
kkk9
2023-08-29 16:56:03 +08:00
redis 正常吗
lzjunika
2023-08-29 16:56:17 +08:00
都能正常运行,本地由于是异步的,很快执行完毕
服务器上,一真无法异步,worker 从默认的 prefork ,调到 eventlet 和 gevent ,都不行
有没有人遇到这种情况
lzjunika
2023-08-29 16:56:47 +08:00
@kkk9 redis 正常,换成本机了
purensong
2023-08-29 16:58:43 +08:00
我觉得是参数的问题,
kkk9
2023-08-29 17:01:39 +08:00
首先确定队列是正常的,redis 存活,可以连接,密码没错

然后是 queues ,看服务器上只有 celery ,是否 mac 不是 celery ,另外指定?
kkk9
2023-08-29 17:02:09 +08:00
可以注意下 seetings 部分
lzjunika
2023-08-29 17:08:15 +08:00
服务器和本地运行 celery 的命令一样:
celery -A make_celery worker --loglevel INFO --logfile=/logs/celery.log
lzjunika
2023-08-29 17:09:57 +08:00
@kkk9 queues 和 seetings 这两怎么排查
Anybfans
2023-08-29 17:43:29 +08:00
安装个 flower 看下
celerysoft
2023-08-29 17:59:46 +08:00
@lzjunika #7 看了下你的启动命令,可能是因为没有指定 concurrency ,默认 concurrency 和机器的 CPU 核心数相同,盲猜服务器是 1 核的?那指定一下 concurrency 参数应该就好了
lzjunika
2023-08-29 18:05:36 +08:00
redis 确实是有值,说明是联通的:
127.0.0.1:6379[6]> keys *
1) "celery-task-meta-a144f43b-93eb-4047-bc01-6d0fdfe9b8f6"
2) "celery-task-meta-865395d9-2226-4969-a269-a93d56ee3c4c"
3) "celery-task-meta-2c44dafc-93e4-4792-8a40-7f747bbd063b"
4) "celery-task-meta-0203b744-504b-414f-adda-41b45fe2aff9"
5) "celery-task-meta-16d37b55-b645-4e05-b58b-55b87fbf4e37"
6) "celery-task-meta-1e2fc20a-a31d-41a3-9003-5c7ffef30e42"
7) "celery-task-meta-a819a02b-7c15-475d-907a-7ab5ed5221cd"
8) "celery-task-meta-c2779805-d922-4423-b2bd-976317e5486d"
9) "celery-task-meta-7a4868f2-305f-4f6b-992c-6ea0791f3427"
10) "celery-task-meta-ff756f38-02c7-4e1f-8b20-39db4722fe83"
11) "celery-task-meta-0e38860b-dd44-47c2-9e40-4a1f4a7c4bb4"
12) "celery-task-meta-3187c555-d3a3-46b1-bf13-3bc38bc79fbd"
13) "celery-task-meta-873c3f38-98b4-47cc-98e8-6f65a58c3269"
14) "_kombu.binding.celery"
15) "_kombu.binding.celery.pidbox"
16) "celery-task-meta-bca09af8-14f4-4d00-84d1-baae7d233070"
17) "celery-task-meta-4f2c9e67-86a8-410f-bbe4-1a408981fd1a"
18) "celery-task-meta-cc93cd0f-f931-4a8c-a24e-795863531953"
19) "celery-task-meta-53d64e39-c872-46d7-a392-57e8617b8751"
20) "celery-task-meta-30efb54a-9f95-46e0-bd49-4190d5684f4c"
21) "celery-task-meta-ca6a5f83-3cab-4111-92c8-f154c2e03766"
22) "celery-task-meta-02a741d2-7426-4339-ad57-a5eea00c72e6"
23) "_kombu.binding.celeryev"
24) "celery-task-meta-94218c29-08b7-4982-ac15-2bc349767fa6"
25) "celery-task-meta-2a9fd0de-2f14-4dbe-a26e-56d6a22c8466"
26) "celery-task-meta-2c9da801-8383-4829-8db0-a0cf5fb8030b"
27) "celery-task-meta-d3d0c01d-359d-45d2-809c-9cbc5072b73d"
28) "celery-task-meta-71610058-15ea-4d5c-b282-226448300228"
29) "celery-task-meta-ee4efe45-43c3-44e6-af0e-df843cb57ea6"
30) "celery-task-meta-6ea9d50a-6b6e-4e28-a8cb-837c6517da54"
lzjunika
2023-08-29 18:09:24 +08:00
@celerysoft
8 核 32g 的服务器,按默认启动 concurrency 值是 8 ,上面有运行摘要截图,这个试过了,手动指定也不行
lzjunika
2023-08-29 18:26:07 +08:00
手动指定了 3 个并发
唯一改变的是 workerid:celery -A make_celery worker --concurrency=3 --loglevel INFO

但还是同步的:

[2023-08-29 18:19:00,670: INFO/MainProcess] Task pss.api.offce.put_content_to_obs[02ac662b-e0bd-4cc3-b659-6345a471505a] received
[2023-08-29 18:19:00,756: WARNING/ForkPoolWorker-1] requestId:
[2023-08-29 18:19:00,756: WARNING/ForkPoolWorker-1] 0000018A40CDC0F9540ADCD7126FE0E9
[2023-08-29 18:19:00,757: WARNING/ForkPoolWorker-1] [2023-08-29 18:19:00,757] WARNING in offce: obs_upload_file:OK
[2023-08-29 18:19:00,757: WARNING/ForkPoolWorker-1] obs_upload_file:OK
[2023-08-29 18:19:00,757: WARNING/ForkPoolWorker-1] test_8.png
[2023-08-29 18:19:00,757: INFO/ForkPoolWorker-1] Task pss.api.offce.put_content_to_obs[02ac662b-e0bd-4cc3-b659-6345a471505a] succeeded in 0.08660224080085754s: True

[2023-08-29 18:19:02,301: INFO/MainProcess] Task pss.api.offce.put_content_to_obs[19d3c1aa-20be-4dcb-a819-360191532325] received
[2023-08-29 18:19:02,400: WARNING/ForkPoolWorker-1] requestId:
[2023-08-29 18:19:02,400: WARNING/ForkPoolWorker-1] 0000018A40CDC7595A03C83BB2923AA0
[2023-08-29 18:19:02,401: WARNING/ForkPoolWorker-1] [2023-08-29 18:19:02,401] WARNING in offce: obs_upload_file:OK
[2023-08-29 18:19:02,401: WARNING/ForkPoolWorker-1] obs_upload_file:OK
[2023-08-29 18:19:02,401: WARNING/ForkPoolWorker-1] test_9.png
[2023-08-29 18:19:02,402: INFO/ForkPoolWorker-1] Task pss.api.offce.put_content_to_obs[19d3c1aa-20be-4dcb-a819-360191532325] succeeded in 0.09988882020115852s: True
deplivesb
2023-08-30 16:34:44 +08:00
给我开个 ssh 我上去看看
lzjunika
2023-08-31 11:17:39 +08:00
这几天,被这事搞得有点晕了,我本地,以及本地 docker 都试过,可以异步,到服务器就不行
今天试了,在服务器中多运行一个 docker 容器,来跑 celery
也就是在服务器上宿主运行一个 celery worker ,docker 中运行一个 worker, 相当于有两个消息费了
但还是同步,发起了 10 个任务,宿主执行 4 个,docker 中执行了 6 个,有分配,但还是同步,总的执行时间没变

docker 的执行命令分别试了两种,但一样的效果:
Dockerfile1(使用默认的 prefork ,8 个并发):
...
CMD ["celery", "-A", "docker_celery", "worker", "--loglevel", "INFO", "--logfile=logs/celery_docker.log"]

Dockerfile2(使用 eventlet ,5 并发):
...
CMD ["celery", "-A", "docker_celery", "worker", "--pool=eventlet", "--concurrency=5", "--loglevel", "INFO", "--logfile=logs/celery_docker.log"]

一样的结果,同步,总执行时间不变,郁闷了...
lzjunika
2023-08-31 11:21:36 +08:00
一个生产者发任务,对应两个消费者(宿主服务+docker 服务)了,任务消息能正常处理,就是不能异步
生产者发任务很正常:
put_content_to_obs.delay(new_name, local_name)
在生产者端也没要求返回结果,只发送
lzjunika
2023-08-31 11:22:13 +08:00
到底是哪配置的问题,非常诡异,还是换个服务器?
lzjunika
2023-09-01 11:23:51 +08:00
终于看到希望了,在同一个服务器上,部署一个最小 flask+celery 服务,通过 docker-compose 跑起来,可以同步了
那就是服务器没问题,redis 没问题,就是原来程序哪里写的不对,再找找:

[2023-09-01 11:08:45,531: INFO/MainProcess] Task app.controller.index.add_together[d4885e83-f346-46b9-98c2-a9f981d7d1de] received
[2023-09-01 11:08:45,533: INFO/MainProcess] Task app.controller.index.add_together[0abc8808-5603-4c61-87de-f6bcd2747d53] received
[2023-09-01 11:08:45,535: INFO/MainProcess] Task app.controller.index.add_together[e1211bbc-8a76-4d8c-94d6-e3904cc50bdc] received
[2023-09-01 11:08:45,538: INFO/MainProcess] Task app.controller.index.add_together[3a099971-abc5-4c2c-b784-1a2aaba86a24] received
[2023-09-01 11:08:45,539: INFO/MainProcess] Task app.controller.index.add_together[f1a6604d-2757-4742-b4b5-33c4b92bbbb8] received
[2023-09-01 11:08:45,541: INFO/MainProcess] Task app.controller.index.add_together[d380858f-3e65-4569-bcea-54ea8db5e6cf] received
[2023-09-01 11:08:45,542: INFO/MainProcess] Task app.controller.index.add_together[740fbfed-7074-49f1-8680-6ddc48bfc2da] received
[2023-09-01 11:08:45,544: INFO/MainProcess] Task app.controller.index.add_together[78b6ee5f-15a0-409b-b41f-709b0fdcb818] received
[2023-09-01 11:08:45,545: INFO/MainProcess] Task app.controller.index.add_together[a482a9d2-1ffd-47df-b421-0bfcd1b386e1] received
[2023-09-01 11:08:45,546: INFO/MainProcess] Task app.controller.index.add_together[7baa35a0-d695-4010-8120-051d5eea9af7] received
[2023-09-01 11:08:46,535: INFO/ForkPoolWorker-7] Task app.controller.index.add_together[d4885e83-f346-46b9-98c2-a9f981d7d1de] succeeded in 1.0014203377068043s: 231
[2023-09-01 11:08:46,535: INFO/ForkPoolWorker-8] Task app.controller.index.add_together[0abc8808-5603-4c61-87de-f6bcd2747d53] succeeded in 1.001225769519806s: 647
[2023-09-01 11:08:46,537: INFO/ForkPoolWorker-1] Task app.controller.index.add_together[e1211bbc-8a76-4d8c-94d6-e3904cc50bdc] succeeded in 1.001103661954403s: 308
[2023-09-01 11:08:46,540: INFO/ForkPoolWorker-2] Task app.controller.index.add_together[3a099971-abc5-4c2c-b784-1a2aaba86a24] succeeded in 1.0009450502693653s: 735
[2023-09-01 11:08:46,542: INFO/ForkPoolWorker-3] Task app.controller.index.add_together[f1a6604d-2757-4742-b4b5-33c4b92bbbb8] succeeded in 1.0019154399633408s: 554
[2023-09-01 11:08:46,544: INFO/ForkPoolWorker-5] Task app.controller.index.add_together[740fbfed-7074-49f1-8680-6ddc48bfc2da] succeeded in 1.000898975878954s: 455
[2023-09-01 11:08:46,545: INFO/ForkPoolWorker-4] Task app.controller.index.add_together[d380858f-3e65-4569-bcea-54ea8db5e6cf] succeeded in 1.0016995184123516s: 771
[2023-09-01 11:08:46,546: INFO/ForkPoolWorker-6] Task app.controller.index.add_together[78b6ee5f-15a0-409b-b41f-709b0fdcb818] succeeded in 1.0007124096155167s: 281
[2023-09-01 11:08:47,537: INFO/ForkPoolWorker-8] Task app.controller.index.add_together[7baa35a0-d695-4010-8120-051d5eea9af7] succeeded in 1.00179473310709s: 788
[2023-09-01 11:08:47,538: INFO/ForkPoolWorker-7] Task app.controller.index.add_together[a482a9d2-1ffd-47df-b421-0bfcd1b386e1] succeeded in 1.0018408931791782s: 729
misoomang
2023-09-04 14:10:21 +08:00
作为菜鸡提出的验证想法:可以写一个 @worker_process_init.connect() 写个方法打印日志,确定是否有多个 worker

猜想:在服务器上从上面看上传 test_8.png 任务是执行了 0.08s ,间隔接收 test_9.png 有 2s 的时间间隔,是否会是 io 的影响导致接收图片过慢,导致第一个任务再 worker1 执行,第二个任务也在 worker1 中执行

这是一个专为移动设备优化的页面(即为了让你能够在 Google 搜索结果里秒开这个页面),如果你希望参与 V2EX 社区的讨论,你可以继续到 V2EX 上打开本讨论主题的完整版本。

https://www.v2ex.com/t/969246

V2EX 是创意工作者们的社区,是一个分享自己正在做的有趣事物、交流想法,可以遇见新朋友甚至新机会的地方。

V2EX is a community of developers, designers and creative people.

© 2021 V2EX