Help: I want to scrape the images from this web page. What's a good way to do it?

2014-08-01 14:03:00 +08:00
 androidBrant
http://www.szeros-wedding.com/html/service/804.html#1

The images on that page. Please help, thanks.
4230 views
Node: Programmers
26 replies
xiandao7997
2014-08-01 14:04:29 +08:00
Wget
faceair
2014-08-01 14:14:42 +08:00
http://www.szeros-wedding.com/UpFile/editor/2014032002455418.jpg
http://www.szeros-wedding.com/UpFile/editor/2014032002455652.jpg
http://www.szeros-wedding.com/UpFile/editor/2014032002456340.jpg
http://www.szeros-wedding.com/UpFile/editor/2014032002456480.jpg
http://www.szeros-wedding.com/UpFile/editor/2014032002457027.jpg
http://www.szeros-wedding.com/UpFile/editor/2014032002457496.jpg
http://www.szeros-wedding.com/UpFile/editor/2014032002457996.jpg
http://www.szeros-wedding.com/UpFile/editor/2014032002458527.jpg
http://www.szeros-wedding.com/UpFile/editor/2014032002458652.jpg
http://www.szeros-wedding.com/UpFile/editor/2014032002459152.jpg
http://www.szeros-wedding.com/UpFile/editor/2014032002460184.jpg
http://www.szeros-wedding.com/UpFile/editor/2014032002460340.jpg
http://www.szeros-wedding.com/UpFile/editor/2014032002460512.jpg
http://www.szeros-wedding.com/UpFile/editor/2014032002461262.jpg
http://www.szeros-wedding.com/UpFile/editor/2014032002461902.jpg
http://www.szeros-wedding.com/UpFile/editor/2014032002462480.jpg
http://www.szeros-wedding.com/UpFile/editor/2014032002463027.jpg
http://www.szeros-wedding.com/UpFile/editor/2014032002463746.jpg
http://www.szeros-wedding.com/UpFile/editor/2014032002464809.jpg
http://www.szeros-wedding.com/UpFile/editor/2014032002464934.jpg
http://www.szeros-wedding.com/UpFile/editor/2014032002465652.jpg
http://www.szeros-wedding.com/UpFile/editor/2014032002466230.jpg
http://www.szeros-wedding.com/UpFile/editor/2014032002466730.jpg
http://www.szeros-wedding.com/UpFile/editor/2014032002466918.jpg
http://www.szeros-wedding.com/UpFile/editor/2014032002467590.jpg
http://www.szeros-wedding.com/UpFile/editor/2014032002467746.jpg
http://www.szeros-wedding.com/UpFile/editor/2014032002468449.jpg
http://www.szeros-wedding.com/UpFile/editor/2014032002469090.jpg
http://www.szeros-wedding.com/UpFile/editor/2014032002469230.jpg
http://www.szeros-wedding.com/UpFile/editor/2014032002469902.jpg
http://www.szeros-wedding.com/UpFile/editor/2014032002470699.jpg
http://www.szeros-wedding.com/UpFile/editor/2014032002470840.jpg

Paste these into Xunlei (Thunder) and it should be able to batch-download them.
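
For anyone without a download manager, a rough Python sketch of the same batch download follows. It is only an illustration using the standard library: the helper names `filename_for`, `fetch_bytes`, and `download_all` are made up for this example, and the `fetch` parameter exists only so the downloader can be swapped out.

```python
import os
from urllib.parse import urlsplit
from urllib.request import urlopen


def filename_for(url):
    """Return the last path segment of a URL, used as the local file name."""
    return os.path.basename(urlsplit(url).path)


def fetch_bytes(url, timeout=30):
    """Download one URL and return the response body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, dest_dir, fetch=fetch_bytes):
    """Save every URL into dest_dir and return the list of written paths."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in urls:
        path = os.path.join(dest_dir, filename_for(url))
        with open(path, "wb") as handle:
            handle.write(fetch(url))
        saved.append(path)
    return saved
```

Calling `download_all(urls, "pic")` with the URLs listed above would write each file into a `pic` directory, assuming the server still serves them.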
zzetao
2014-08-01 14:16:56 +08:00
Some browser extensions can actually do this.
androidBrant
2014-08-01 14:17:03 +08:00
@faceair How did you grab these URLs so quickly?
nealv2ex
2014-08-01 14:23:55 +08:00
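// Run in the browser console on that page (assumes the page loads jQuery).
var list = $('.pic img').map(function (index, item) {
    // Assigning to an anchor's href lets the browser resolve a relative path to an absolute URL.
    var a = document.createElement('a');
    // The real image path is read from the 'original' attribute rather than 'src'.
    a.href = $(item).attr('original');
    return a.href;
}).get(); // .get() turns the jQuery object into a plain array of strings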
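
The same idea can be sketched outside the browser in Python with only the standard library. This is a loose equivalent, not the snippet's exact behavior: it reads the `original` attribute from every `<img>` tag rather than only those under `.pic`, and the names `LazyImageCollector` and `collect_image_urls` are invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LazyImageCollector(HTMLParser):
    """Collect absolute URLs from a chosen attribute of every <img> tag."""

    def __init__(self, base_url, attribute="original"):
        super().__init__()
        self.base_url = base_url
        self.attribute = attribute
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        # attrs is a list of (name, value) pairs; names are already lowercased.
        value = dict(attrs).get(self.attribute)
        if value:
            self.urls.append(urljoin(self.base_url, value))


def collect_image_urls(markup, base_url, attribute="original"):
    """Parse an HTML string and return the resolved image URLs in page order."""
    parser = LazyImageCollector(base_url, attribute)
    parser.feed(markup)
    parser.close()
    return parser.urls
```

Passing `attribute="src"` instead would collect the ordinary image sources.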
androidBrant
2014-08-01 14:25:02 +08:00
@xiandao7997

jiaqiqunaerdeiMac:pic jiaqiqunaer$ wget -r http://www.szeros-wedding.com/UpFile/editor/

--2014-08-01 14:20:58-- http://www.szeros-wedding.com/UpFile/editor/
Resolving www.szeros-wedding.com... 211.154.142.215
Connecting to www.szeros-wedding.com|211.154.142.215|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2014-08-01 14:20:59 ERROR 403: Forbidden.
NemoAlex
2014-08-01 14:27:23 +08:00
faceair
2014-08-01 14:29:01 +08:00
imn1
2014-08-01 14:29:58 +08:00
save as...
complete html
Roboo
2014-08-01 14:32:35 +08:00
idm
xiandao7997
2014-08-01 14:36:02 +08:00
wget -r --level=2 --accept=jpg [the URL from the topic]
When it finishes, look in the upfile/editor subdirectory.
xiandao7997
2014-08-01 14:36:51 +08:00
@imn1 Feels like I watched The Social Network for nothing.
wesley
2014-08-01 15:24:05 +08:00
Clear the browser cache first, then open the page, then look in the browser's cache folder.
androidBrant
2014-08-01 15:25:19 +08:00
@faceair How would I find these URLs with XPath? What's the expression? Thanks.
mengzhuo
2014-08-01 15:36:09 +08:00
Here's a Python version as well.

import requests
from lxml import html

URL = 'http://www.szeros-wedding.com/html/service/804.html#1'
# Fetch the page, parse it, and list the src attribute of every <img> element.
tree = html.fromstring(requests.get(URL).text)
[x.attrib['src'] for x in tree.xpath('//img')]

-------

['/skins/20140425/images/bg74.gif',
'/skins/20140425/images/t0.gif',
'/skins/20140425/images/t3.gif',
'/skins/20140425/images/t01.gif',
'/skins/20140425/images/t01.gif',
'/skins/20140425/images/bg75.gif',
'/skins/20140425/images/logo.jpg',
'/skins/20140425/images/bg6.jpg',
'/skins/20140425/images/bg7.jpg',
'/skins/20140425/images/bg8.jpg',
'/skins/20140425/images/bg9.jpg',
'/skins/20140425/images/bg10.jpg',
'/skins/20140425/images/f.jpg',
'/ueditor/asp/../../UpFile/editor/2014032002455418.jpg',
'/ueditor/asp/../../UpFile/editor/2014032002455652.jpg',
'/ueditor/asp/../../UpFile/editor/2014032002456340.jpg',
'/ueditor/asp/../../UpFile/editor/2014032002456480.jpg',
'/ueditor/asp/../../UpFile/editor/2014032002457027.jpg',
'/ueditor/asp/../../UpFile/editor/2014032002457496.jpg',
'/ueditor/asp/../../UpFile/editor/2014032002457996.jpg',
'/ueditor/asp/../../UpFile/editor/2014032002458527.jpg',
'/ueditor/asp/../../UpFile/editor/2014032002458652.jpg',
'/ueditor/asp/../../UpFile/editor/2014032002459152.jpg',
'/ueditor/asp/../../UpFile/editor/2014032002460184.jpg',
'/ueditor/asp/../../UpFile/editor/2014032002460340.jpg',
'/ueditor/asp/../../UpFile/editor/2014032002460512.jpg',
'/ueditor/asp/../../UpFile/editor/2014032002461262.jpg',
'/ueditor/asp/../../UpFile/editor/2014032002461902.jpg',
'/ueditor/asp/../../UpFile/editor/2014032002462480.jpg',
'/ueditor/asp/../../UpFile/editor/2014032002463027.jpg',
'/ueditor/asp/../../UpFile/editor/2014032002463746.jpg',
'/ueditor/asp/../../UpFile/editor/2014032002464809.jpg',
'/ueditor/asp/../../UpFile/editor/2014032002464934.jpg',
'/ueditor/asp/../../UpFile/editor/2014032002465652.jpg',
'/ueditor/asp/../../UpFile/editor/2014032002466230.jpg',
'/ueditor/asp/../../UpFile/editor/2014032002466730.jpg',
'/ueditor/asp/../../UpFile/editor/2014032002466918.jpg',
'/ueditor/asp/../../UpFile/editor/2014032002467590.jpg',
'/ueditor/asp/../../UpFile/editor/2014032002467746.jpg',
'/ueditor/asp/../../UpFile/editor/2014032002468449.jpg',
'/ueditor/asp/../../UpFile/editor/2014032002469090.jpg',
'/ueditor/asp/../../UpFile/editor/2014032002469230.jpg',
'/ueditor/asp/../../UpFile/editor/2014032002469902.jpg',
'/ueditor/asp/../../UpFile/editor/2014032002470699.jpg',
'/ueditor/asp/../../UpFile/editor/2014032002470840.jpg',
'/skins/20140425/images/f.jpg',
'/skins/20140425/images/jd.jpg',
'/skins/20140425/hzjd/1.jpg',
'/skins/20140425/hzjd/2.jpg',
'/skins/20140425/hzjd/3.jpg',
'/skins/20140425/hzjd/4.jpg',
'/skins/20140425/hzjd/5.jpg',
'/skins/20140425/hzjd/6.jpg',
'/skins/20140425/hzjd/7.jpg',
'/skins/20140425/hzjd/8.jpg',
'/skins/20140425/hzjd/9.jpg',
'/skins/20140425/hzjd/10.jpg',
'/skins/20140425/images/link.jpg',
'/skins/20140425/images/logo1.jpg']
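
The list above mixes site chrome with the photos, and the photo paths still contain `../..` segments. A small sketch of cleaning that up with the standard library's `urljoin` follows; the function name `absolute_photo_urls` and the `marker` parameter are made up for this example, and the marker value is simply taken from the paths in the output.

```python
from urllib.parse import urljoin

PAGE_URL = "http://www.szeros-wedding.com/html/service/804.html#1"


def absolute_photo_urls(srcs, page_url=PAGE_URL, marker="/UpFile/editor/"):
    """Resolve each src against the page URL and keep only the uploaded photos."""
    # urljoin collapses the '..' segments while resolving the path.
    resolved = (urljoin(page_url, src) for src in srcs)
    return [url for url in resolved if marker in url]
```

Feeding it the list printed above should leave only the `UpFile/editor` images, as absolute URLs.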
zoudm
2014-08-01 16:34:16 +08:00
@androidBrant

Xpath:

/html/body/table/tbody/tr[1]/td/table/tbody/tr[5]/td/div/p[1]/img
...
...
/html/body/table/tbody/tr[1]/td/table/tbody/tr[5]/td/div/p[5]/img
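
Absolute paths like these tend to be brittle: browsers insert `tbody` while parsing, so it may not be present in the raw HTML a script downloads. A shorter relative expression is often easier to keep working. The sketch below shows the idea on a hypothetical, well-formed stand-in for the page, using the standard library's ElementTree, which supports only a subset of XPath; `SAMPLE` and `photo_srcs` are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page's nested-table layout.
SAMPLE = """
<html><body><table><tr><td>
  <img src="/skins/20140425/images/logo.jpg"/>
  <div>
    <p><img src="/UpFile/editor/2014032002455418.jpg"/></p>
    <p><img src="/UpFile/editor/2014032002455652.jpg"/></p>
  </div>
</td></tr></table></body></html>
"""


def photo_srcs(markup):
    """Return the src of every <img> directly inside a <p> that sits under a <div>."""
    root = ET.fromstring(markup.strip())
    # Relative path: any <div>, then a <p> child, then an <img> child with a src.
    return [img.get("src") for img in root.findall(".//div/p/img[@src]")]
```

With lxml, the roughly equivalent full XPath would be `//div/p/img/@src`; whether that matches the real page depends on its actual markup.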
muziyue
2014-08-01 17:05:27 +08:00
If there aren't too many pages, I usually just press Ctrl+S and then look in the saved folder.
decken
2014-08-01 17:54:01 +08:00
decken
2014-08-01 17:55:25 +08:00
mopvhs
2014-08-02 10:20:49 +08:00
Let me share the method I usually use:



http://gist.github.com/4b93757c88b5fe558846

The full version of this discussion is on V2EX:

https://www.v2ex.com/t/125609
