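Batch-fetching download links from a movie site

I'm using PHP to crawl every link under a site's Chinese-language category and extract the download links that start with ftp, roughly 1,000 pages in total. I put the links straight into an array and deduplicate there, but the program seems to die after about 300 pages. Any good suggestions?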

Wetoria

2017-05-20 03:26:49 +08:00

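What causes it to die? It sounds like you mean it ran out of memory?

If it isn't anti-crawling, just store the data in a database and make the link the primary key; deduplication then becomes simple.

If it is anti-crawling, you'll need countermeasures tailored to it.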
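
The database suggestion above could look roughly like the following. This is only a minimal sketch: it assumes the pdo_sqlite extension is available, and the table name `links` and the helper `saveLink()` are made up for illustration. An in-memory database keeps the example self-contained; a real crawl would point at a file instead.

```php
<?php
// Assumption: pdo_sqlite is installed. ':memory:' is used only so the
// sketch runs without touching the disk.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// The link itself is the primary key, so the database enforces uniqueness.
$db->exec('CREATE TABLE IF NOT EXISTS links (url TEXT PRIMARY KEY)');

// Hypothetical helper: returns true if the link was new, false if it
// was already stored.
function saveLink(PDO $db, string $url): bool
{
    // INSERT OR IGNORE skips rows that would violate the primary key.
    $stmt = $db->prepare('INSERT OR IGNORE INTO links (url) VALUES (:url)');
    $stmt->execute([':url' => $url]);
    return $stmt->rowCount() === 1;
}

// Placeholder link, not taken from the site.
$first  = saveLink($db, 'ftp://example.com/movie.mkv');
$second = saveLink($db, 'ftp://example.com/movie.mkv');
$count  = (int) $db->query('SELECT COUNT(*) FROM links')->fetchColumn();
```

Because each link is written as soon as it is found, a crash partway through loses nothing already stored, and a rerun simply skips what is already there.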

Srar

2017-05-20 04:04:26 +08:00

http://php.net/manual/zh/function.set-time-limit.php
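
The linked page documents `set_time_limit()`. PHP's `max_execution_time` defaults to 30 seconds outside the CLI, so a long crawl may simply be hitting that limit. A rough sketch of one way to apply it, where `crawlWithTimeLimit()` and the stand-in fetcher are hypothetical names used only for illustration:

```php
<?php
// Hypothetical helper: restarts the timeout counter before every page,
// so a long list of pages does not exhaust a single 30-second budget.
function crawlWithTimeLimit(array $urls, callable $fetch, int $secondsPerPage = 30): array
{
    $results = [];
    foreach ($urls as $url) {
        // set_time_limit() restarts the counter from zero on each call.
        set_time_limit($secondsPerPage);
        $results[$url] = $fetch($url);
    }
    return $results;
}

// Stand-in fetcher so the sketch runs offline; a real run would pass
// 'file_get_contents' instead.
$pages = crawlWithTimeLimit(['a', 'b'], function (string $url): string {
    return strtoupper($url);
});
```

Calling `set_time_limit(0)` once at the top of the script removes the limit entirely, which is simpler but also means a stuck script never gets killed.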

yangqi

2017-05-20 04:08:20 +08:00

The code is just badly written.

liuwenxu

2017-05-20 05:07:20 +08:00

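Here's the code so everyone can run it. Can the existing logic be optimized? Right now I can only run it by hand 20 pages at a time, and that will be too much hassle once the site updates.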
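<?php
$base = "http://www.ygdy8.net";
$detail = array(); // detail-page paths found on the list pages
$data = array();   // ftp download links found on the detail pages

// Step 1: walk list pages 70..89 and collect links to detail pages.
for ($i = 70; $i < 90; $i++) {
    $url = "http://www.ygdy8.net/html/gndy/china/list_4_{$i}.html";
    $str = file_get_contents($url);
    if ($str) {
        preg_match_all("/href=\"(.*?)\"/", $str, $urll);
        foreach ($urll[1] as $value) {
            // strpos() returns 0 for a match at position 0, so compare with false.
            if (strpos($value, "gndy/dyzz/") !== false && !in_array($value, $detail)) {
                $detail[] = $value;
            }
        }
    }
}
print_r($detail);

// Step 2: fetch each detail page and take its first ftp link.
foreach ($detail as $value) {
    $strs = file_get_contents($base . $value);
    if ($strs && preg_match("/(ftp.*?)\"/", $strs, $urlls)) {
        if (!in_array($urlls[1], $data)) {
            $data[] = $urlls[1];
        }
    }
}
print_r($data);

// Step 3: append links that are not already in data.txt.
// FILE_IGNORE_NEW_LINES strips the trailing newline so in_array() can match.
$has = file_exists("data.txt") ? file("data.txt", FILE_IGNORE_NEW_LINES) : array();
foreach ($data as $value) {
    // The file holds UTF-8 text, so convert before comparing.
    $value = iconv("gbk", "utf-8", $value);
    if (!in_array($value, $has)) {
        file_put_contents("data.txt", $value . PHP_EOL, FILE_APPEND);
    }
}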
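
One possible optimization of the deduplication step, offered only as a sketch: `in_array()` scans the whole array on every call, whereas using the link as an array key makes the duplicate check a hash lookup. The helper `collectDetailLinks()` and the sample HTML are made up for illustration and are not part of the script above.

```php
<?php
// Hypothetical helper: collects detail-page paths from one list page's HTML.
// $seen is passed by reference and used as a set (path => true).
function collectDetailLinks(string $html, array &$seen): void
{
    preg_match_all('/href="(.*?)"/', $html, $matches);
    foreach ($matches[1] as $href) {
        // strpos() can return 0, so compare against false explicitly.
        if (strpos($href, 'gndy/dyzz/') !== false) {
            $seen[$href] = true; // setting an existing key again changes nothing
        }
    }
}

// Made-up sample HTML standing in for a downloaded list page.
$html = '<a href="/html/gndy/dyzz/1.html">A</a>'
      . '<a href="/html/gndy/dyzz/1.html">A again</a>'
      . '<a href="/html/other/2.html">B</a>';

$seen = [];
collectDetailLinks($html, $seen);
$detail = array_keys($seen);
```

With the extraction in its own function, the page range in the outer loop can grow to the full list without the duplicate check slowing down as `$detail` gets larger.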