Golang 中 http.Get 的耗时比 curl 耗时大很多是什么原因?

2018-12-05 16:49:17 +08:00
 rayhy

可能这个问题比较小白,大家轻喷

我用 Golang 中http库的Get函数写个简单的下载网页的工具,发现下载网页时间很长,比 curl 长很多:

Golang 代码:


// Fetch prints the content found at a URL.
package main

import (
	"time"
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
)

func main() {

	for _, url := range os.Args[1:] {
		start := time.Now()
		if !(strings.HasPrefix(url, "https://") || strings.HasPrefix(url, "http://")) {
			url = "http://" + url
		}

		resp, err := http.Get(url)
		if err != nil {
			fmt.Fprintf(os.Stderr, "fetch: %v\n", err)
			os.Exit(1)
		}
		fmt.Printf("[\033[0;31m%d\033[0m] %v\n", resp.StatusCode, resp.Request.URL)

		// Test download cost.
		//fmt.Println("Download cost:", time.Since(start))

		if _, err := io.Copy(os.Stdout, resp.Body); err != nil {
			fmt.Fprintf(os.Stderr, "\033[0;31mERROR\033[0m %v\n", err)
			os.Exit(1)
		}
		fmt.Println("Total cost:", time.Since(start))
	}
}

看到这么大的差距,我以为是io.Copy可能出了点问题,就在io.Copy前添了一个时间测试,结果:

Download cost: 21.651425s
Total cost: 21.656501s

可以看到时间主要是花在http.Get这里了。

我多次测试后都可以复现(包含测试不同网站),Google 也没有什么结果,搜索关键词都不知道怎么设置。。

这究竟是啥原因啊?

7897 次点击
所在节点    Go 编程语言
25 条回复
ucanuup
2018-12-05 17:06:12 +08:00
帮你试了下,我这儿无法复现。
go version go1.11.2
katsusan
2018-12-05 17:10:52 +08:00
可以抓包看是建立连接之前的处理太长还是数据传输的花费时间长,我这里测试没问题。

```
➜ testing-ground go run httpget.go http://httpbin.org/get
Get cost:496.770582ms
[200] http://httpbin.org/get
{
"args": {},
"headers": {
"Accept-Encoding": "gzip",
"Connection": "close",
"Host": "httpbin.org",
"User-Agent": "Go-http-client/1.1"
},
"origin": "124.79.234.173",
"url": "http://httpbin.org/get"
}
Total cost: 497.479966ms
```
rayhy
2018-12-05 17:13:30 +08:00
@ucanuup 啊谢谢!我还能复现。我感觉肯定不是 golang 的原因,想不通哪出问题了。连个测试的方向我都想不出来(总不能是学校路由器针对 golang 的 user-agent 吧)。
补充一下:
我的版本是 go version go1.11 linux/amd64
curl 测试的那行代码是
start=$(date +'%s') && curl httpbin.org/get && echo "It took $(($(date +'%s') - $start)) seconds"
topbandit
2018-12-05 17:20:16 +08:00
strace 和抓包看看
icexin
2018-12-05 17:22:36 +08:00
用 httptrace 包看一下在哪里耗时了 https://godoc.org/net/http/httptrace#WithClientTrace
xivisi
2018-12-05 17:28:57 +08:00
@rayhy
curl 好像只获取连接的第一个界面的文本。
你的保存网页是不是,图片、CSS JS 之类的也下载下来了?
webluoye
2018-12-05 17:34:49 +08:00
没复现,go 1.11 Total cost: 501.175936ms
用 http.Client 自己包装一个测试下看那个步骤耗时比较多。
zeyexe
2018-12-05 17:40:17 +08:00
可能和 resolver 有关
icexin
2018-12-05 17:51:10 +08:00
或者不想手写 httptrace 的话,试一下 github.com/davecheney/httpstat,这个是用 net/http 加 net/http/httptrace 做的工具,可以详细展示各个阶段的耗时。

➜ httpstat http://httpbin.org/get

Connected to 52.86.186.182:80

HTTP/1.1 200 OK
Server: gunicorn/19.9.0
Access-Control-Allow-Credentials: true
Access-Control-Allow-Origin: *
Content-Length: 231
Content-Type: application/json
Date: Wed, 05 Dec 2018 09:49:00 GMT
Via: 1.1 vegur
Connection: keep-alive

Body discarded

DNS Lookup TCP Connection Server Processing Content Transfer
[ 612ms | 375ms | 237ms | 0ms ]
| | | |
namelookup:612ms | | |
connect:988ms | |
starttransfer:1225ms |
total:1225ms
rayhy
2018-12-05 17:54:56 +08:00
@icexin
@webluoye
@katsusan
谢谢!我测试一下,看看是哪里的问题
rayhy
2018-12-05 18:38:09 +08:00
@icexin
@webluoye
@katsusan
@xivisi
@topbandit
@ucanuup
又用 @icexin 提到的工具`httpstat`测试了下,发现时间主要花在 DNS Lookup 阶段:
https://s1.ax1x.com/2018/12/05/FlsTEj.png
可以看到 windows 下执行就是正常时间, wsl 下执行时间 DNS Lookup 阶段就花了很多时间。我分别用 Get-NetIPConfiguration 和 cat /etc/resolv.conf 看了下,wsl 和 windows 的 DNS 服务器都是 223.5.5.5 223.6.6.6
用的是同一个 DNS。

还是没有解决问题,同在 wsl 下,curl 和 go 的 http.Get 差别还是很大,理论上它俩是一个 DNS 服务器啊。
https://imgchr.com/i/FlsIbQ
rayhy
2018-12-05 18:41:22 +08:00
忘记说环境了,我是 windows10,出现差异的是 wsl(Linux 4.4.0-17134-Microsoft )环境,在 windows 下 powershell 里是没问题的。
icexin
2018-12-05 18:56:54 +08:00
可能是 dns cache 的问题,看下 net 下的文档 https://godoc.org/net#hdr-Name_Resolution
类 unix 系统默认是纯 go 的实现走 dns 协议 Lookup,windows 默认是走系统调用,可能有 cache 加速。
你在 wsl 下加上 export GODEBUG=netdns=cgo 强制走系统查询试试

The method for resolving domain names, whether indirectly with functions like Dial or directly with functions like LookupHost and LookupAddr, varies by operating system.

On Unix systems, the resolver has two options for resolving names. It can use a pure Go resolver that sends DNS requests directly to the servers listed in /etc/resolv.conf, or it can use a cgo-based resolver that calls C library routines such as getaddrinfo and getnameinfo.

By default the pure Go resolver is used, because a blocked DNS request consumes only a goroutine, while a blocked C call consumes an operating system thread. When cgo is available, the cgo-based resolver is used instead under a variety of conditions: on systems that do not let programs make direct DNS requests (OS X), when the LOCALDOMAIN environment variable is present (even if empty), when the RES_OPTIONS or HOSTALIASES environment variable is non-empty, when the ASR_CONFIG environment variable is non-empty (OpenBSD only), when /etc/resolv.conf or /etc/nsswitch.conf specify the use of features that the Go resolver does not implement, and when the name being looked up ends in .local or is an mDNS name.

The resolver decision can be overridden by setting the netdns value of the GODEBUG environment variable (see package runtime) to go or cgo, as in:

export GODEBUG=netdns=go # force pure Go resolver
export GODEBUG=netdns=cgo # force cgo resolver
The decision can also be forced while building the Go source tree by setting the netgo or netcgo build tag.

A numeric netdns setting, as in GODEBUG=netdns=1, causes the resolver to print debugging information about its decisions. To force a particular resolver while also printing debugging information, join the two settings by a plus sign, as in GODEBUG=netdns=go+1.

On Plan 9, the resolver always accesses /net/cs and /net/dns.

On Windows, the resolver always uses C library functions, such as GetAddrInfo and DnsQuery.
rayhy
2018-12-05 18:59:03 +08:00
在 wsl 里分别执行了 3 条命令,截取了部分结果:

$dig @223.6.6.6
> ;; Query time: 287 msec

$ ./httpstat
> Connected to 34.196.224.14:80
> namelookup:15450ms

$ curl -v
> * Connected to httpbin.org (52.3.53.115) port 80 (#0)
> It took 1 seconds


仿佛有些头绪了。。。。
rayhy
2018-12-05 19:09:16 +08:00
@icexin 感谢!完全解决!
export GODEBUG=netdns=cgo
> namelookup:154ms
export GODEBUG=netdns=go
> namelookup:15484ms
看来 curl 和 go 区别应该就在这。 @zeyexe 在#8 已经提到了,刚看我还没搞懂,现在才明白 @zeyexe 说的是啥。
mritd
2018-12-05 19:12:45 +08:00
@icexin #13 又给我普及知识了^_^
JaguarJack
2018-12-05 21:40:36 +08:00
长知识了,我要收藏起来
itstudying
2018-12-05 21:54:03 +08:00
@icexin #13 学习了
iyaozhen
2018-12-05 22:04:51 +08:00
涨姿势了。这才是论坛的意义呀
XIVN1987
2018-12-06 09:16:31 +08:00
@icexin
厉害了,,膜拜

这是一个专为移动设备优化的页面(即为了让你能够在 Google 搜索结果里秒开这个页面),如果你希望参与 V2EX 社区的讨论,你可以继续到 V2EX 上打开本讨论主题的完整版本。

https://www.v2ex.com/t/514632

V2EX 是创意工作者们的社区,是一个分享自己正在做的有趣事物、交流想法,可以遇见新朋友甚至新机会的地方。

V2EX is a community of developers, designers and creative people.

© 2021 V2EX