I wrote a Sina Weibo album crawler and, using the images it collected, a Telegram bot that sends random pictures.

2015-11-17 09:52:41 +08:00
 lincanbin

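There are quite a few eye-candy accounts on Sina Weibo, and I wanted to save their pictures.
So yesterday, just before lunch, I spent a little time writing a crawler in Python, put it on a VPS, and crawled over ten thousand images.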

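I couldn't hold back the urge to share them, so I looked through the Telegram Bot API documentation. Telegram's bot design turned out to be quite simple, so I modified the demo Telegram provides and built a bot that sends pictures.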

Telegram bot demo: https://telegram.me/canbin_bot

Crawler: https://github.com/lincanbin/Sina-Weibo-Album-Downloader
Telegram Bot: https://github.com/lincanbin/Telegram-Simple-Image-Bot
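
The random-picture bot described above can be sketched roughly as follows. This is a minimal illustration and not the code from either repository: `pick_random_image`, `send_photo`, and `IMAGE_EXTENSIONS` are hypothetical names, and the sketch assumes the crawled images sit in one local directory. The only Telegram-specific part is the Bot API's `sendPhoto` method, called here through the third-party `requests` library.

```python
import random
from pathlib import Path

# Hypothetical set of file types to treat as images.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def pick_random_image(image_dir):
    """Return a random image path from image_dir, or None if it has no images."""
    images = [
        p for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    ]
    return random.choice(images) if images else None


def send_photo(token, chat_id, image_path):
    """Upload one local image to a chat through the Bot API's sendPhoto method."""
    import requests  # third-party; imported here so picking works without it

    url = "https://api.telegram.org/bot{}/sendPhoto".format(token)
    with open(image_path, "rb") as f:
        response = requests.post(
            url,
            data={"chat_id": chat_id},
            files={"photo": f},
            timeout=30,
        )
    response.raise_for_status()
    return response.json()
```

A bot loop would call `pick_random_image` whenever a user asks for a picture and pass the result to `send_photo` together with the bot token and that user's chat ID.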

13833 views
Node: 分享创造 (Share and Create)
35 replies
joewangyz
2015-11-18 14:44:55 +08:00
The key is finding the eye-candy accounts in the first place. Otherwise, where would you get the OID and the photo wall cookie?
cclishan
2015-11-18 16:12:30 +08:00
@lincanbin There is just too much on Tumblr... How about sharing your follow list?
banri
2015-11-18 21:37:31 +08:00
200 zettai ryouiki ("absolute territory") shots!
touch
2015-11-19 18:35:58 +08:00
@lincanbin I have crawled Weibo data before too, but my account got banned. How did you get around that?
lincanbin
2015-11-19 19:41:33 +08:00
@touch Crawling the image host is fine; it is the API that is rate-limited.
touch
2015-11-20 11:54:09 +08:00
@lincanbin I scraped the page HTML directly and never called the API.
touch
2015-11-20 11:55:53 +08:00
@lincanbin After crawling for a while, the account simply got banned. It was flagged as abnormal behavior.
lincanbin
2015-11-20 13:42:43 +08:00
@touch The part I crawl does not require logging in at all.
touch
2015-11-20 14:05:54 +08:00
@lincanbin #;#
fuliti
2015-11-22 16:03:12 +08:00
This looks amazing; too bad I don't know how to use it.
JiaFeiX
2015-12-02 12:34:52 +08:00
Which accounts did the OP crawl?
bbjoe
2016-08-30 17:43:21 +08:00
Why does the album crawl keep missing pictures? For example, with 402 image IDs, a full run only gives me a hundred or so.
lincanbin
2016-08-30 17:49:25 +08:00
@bbjoe Set CRAWL_PHOTOS_NUMBER = 402
That is the upper limit on the number of pictures to crawl.
lincanbin
2016-08-30 17:49:46 +08:00
@bbjoe Or just set CRAWL_PHOTOS_NUMBER = 10000
That is, use a very large value.
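
To illustrate why a cap like this makes a crawl look incomplete, here is a minimal sketch. It assumes the setting simply truncates the list of photo IDs before downloading; the default of 100 and the `select_photo_ids` helper are hypothetical and are not taken from the actual script.

```python
CRAWL_PHOTOS_NUMBER = 100  # hypothetical default, chosen only for illustration


def select_photo_ids(photo_ids, limit=CRAWL_PHOTOS_NUMBER):
    """Keep at most `limit` IDs; anything past the cap is never downloaded."""
    return photo_ids[:limit]


album = ["id{}".format(i) for i in range(402)]
print(len(select_photo_ids(album)))         # 100
print(len(select_photo_ids(album, 10000)))  # 402
```

Under that assumption, any limit at or above the album size returns every ID, which is why an oversized value such as 10000 is harmless.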
yxqcyl
2017-01-20 08:59:29 +08:00
What causes the following error?

['4065529837148919']
9f128f33jw1e8qgp5bmzyj2050050aa8.jpg
lxhxixi_org.gif
2Flxhxixi_org.gif
9f128f33ly1fbw8sp2ro7j20qo1beq4l.jpg
Exception in thread Thread-51:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 137, in _new_conn
(self.host, self.port), self.timeout, **extra_kw)
File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 67, in create_connection
for res in socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM):
File "/usr/lib/python3.5/socket.py", line 732, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -3] Temporary failure in name resolution

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 559, in urlopen
body=body, headers=headers)
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 353, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/lib/python3.5/http/client.py", line 1106, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python3.5/http/client.py", line 1151, in _send_request
self.endheaders(body)
File "/usr/lib/python3.5/http/client.py", line 1102, in endheaders
self._send_output(message_body)
File "/usr/lib/python3.5/http/client.py", line 934, in _send_output
self.send(msg)
File "/usr/lib/python3.5/http/client.py", line 877, in send
self.connect()
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 162, in connect
conn = self._new_conn()
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 146, in _new_conn
self, "Failed to establish a new connection: %s" % e)
requests.packages.urllib3.exceptions.NewConnectionError: <requests.packages.urllib3.connection.HTTPConnection object at 0x7f6d60069b00>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/requests/adapters.py", line 376, in send
timeout=timeout
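
The `socket.gaierror: [Errno -3] Temporary failure in name resolution` at the root of this traceback means a DNS lookup failed inside a download thread, and the exception was never caught there. One common way to cope with transient failures like this is to retry the download a few times with a pause in between. The sketch below is a generic helper under that assumption; `retry` is a hypothetical name and is not part of the crawler.

```python
import time


def retry(func, attempts=3, delay=1.0, exceptions=(OSError,)):
    """Call func(); on a listed exception, wait and try again, up to `attempts` calls."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of retries: let the caller see the error
            time.sleep(delay * attempt)  # wait a little longer each time
```

`socket.gaierror` is a subclass of `OSError`, so the default `exceptions` tuple covers it. A download call could then be wrapped as `retry(lambda: download(url))`, where `download` stands for whatever function fetches a single image.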
