I wrote one in PHP:
links.php
```
<?php
require __DIR__ . '/vendor/autoload.php';

$base_uri = 'https://www.v2ex.com';
$t1 = microtime(true);
try {
    // Allow up to 30 minutes and 512 MB for large pages
    set_time_limit(1800);
    ini_set('memory_limit', '512M');

    // file_get_contents() returns false on failure instead of throwing
    $html_node = file_get_contents($base_uri);
    if ($html_node === false) {
        throw new RuntimeException('failed to fetch ' . $base_uri);
    }

    // Passing the base URI lets the crawler resolve relative hrefs to absolute URLs
    $crawler = new \Symfony\Component\DomCrawler\Crawler($html_node, $base_uri);
    $links = $crawler->filter('a')->links();

    // Initialise the array so a page without links still encodes as []
    $temp_links = [];
    foreach ($links as $link) {
        $temp_links[] = ['url' => $link->getUri(), 'text' => $link->getNode()->textContent];
    }
    file_put_contents('links.txt', json_encode($temp_links, JSON_UNESCAPED_UNICODE));
    echo 'success<br/>';
    $t2 = microtime(true);
    echo 'time consuming ' . round($t2 - $t1, 3) . ' s' . PHP_EOL;
} catch (Exception $exception) {
    echo $exception->getCode() . ', message:' . $exception->getMessage();
}
```
Partial output:
```
[
    {
        "url": "https:\/\/www.v2ex.com\/t\/575511",
        "text": "写代码的时候没有思路 不知道如何写起,请教如何培养训练编程思路 谢谢!"
    },
    {
        "url": "https:\/\/www.v2ex.com\/member\/fanmouji",
        "text": ""
    },
    {
        "url": "https:\/\/www.v2ex.com\/t\/575397",
        "text": "JD 的 618 是不是走走过场?"
    }
]
```
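The output above has `\/` escapes in every URL and an entry whose text is empty (presumably a link that wraps only an image). If that is unwanted, a post-processing step along these lines might help. This is only a sketch: `encode_links()` is a hypothetical helper that is not part of the script above, and dropping empty-text or duplicate entries is my assumption about what you would want, not something the original does.

```php
<?php
// Hypothetical helper (not in the original script): drops entries whose
// link text is blank and entries whose URL was already seen, then encodes
// the rest without "\/" escapes.
function encode_links(array $links): string
{
    $seen = [];
    $clean = [];
    foreach ($links as $link) {
        $text = trim($link['text']);
        if ($text === '' || isset($seen[$link['url']])) {
            continue; // skip blank-text links and repeated URLs
        }
        $seen[$link['url']] = true;
        $clean[] = ['url' => $link['url'], 'text' => $text];
    }
    // JSON_UNESCAPED_SLASHES keeps "/" as-is; JSON_PRETTY_PRINT adds indentation
    return json_encode(
        $clean,
        JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES | JSON_PRETTY_PRINT
    );
}

// Sample entries taken from the partial output shown earlier
$sample = [
    ['url' => 'https://www.v2ex.com/t/575511', 'text' => '写代码的时候没有思路 不知道如何写起,请教如何培养训练编程思路 谢谢!'],
    ['url' => 'https://www.v2ex.com/member/fanmouji', 'text' => ''],
    ['url' => 'https://www.v2ex.com/t/575397', 'text' => 'JD 的 618 是不是走走过场?'],
];
echo encode_links($sample), PHP_EOL;
```

In links.php, the same call could replace the `json_encode($temp_links, JSON_UNESCAPED_UNICODE)` argument passed to `file_put_contents()`.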
Project repository:
https://github.com/MasterCloner/Cornerstone