I ran into a problem when I tried to scrape data from Baidu Tongji.
After I logged in successfully, I got a 302 redirect to the main page.
That is, the request was redirected from https://tongji.baidu.com/web/welcome/ico?s=sdfsdfsdfsdf to s://tongji.baidu.com/web/12323243/overview/index?siteId=sdfsf.
I'd like to know how the program (maybe the browser? I'm not sure either, LOL) passes the cookie from the 302 page to the destination page, and why.
Could anyone help me out? Thanks in advance.
Addendum:
302 page: https://tongji.baidu.com/web/welcome/ico?s=sdfsdfsdfsdf
destination page: s://tongji.baidu.com/web/12323243/overview/index?siteId=sdfsf
#1 Cooky (2017-12-08 16:31:01 +08:00, via Android): Chinese please, or go to Stack Overflow.
#2 shanechiu (OP): @Cooky I'm a little worried that this question doesn't live up to Stack Overflow's strict standards.
#3 vincenttone (2017-12-08 16:35:05 +08:00): I'm not sure whether you can read Chinese. Answer in Chinese: 1. HTTP is stateless. 2. Cookies are passed via headers. 3. Pay attention to the cookie's domain.
#4 hhacker (2017-12-08 16:38:11 +08:00): Because cookies are shared within the same domain.
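To make points 2 and 3 concrete, here is a minimal sketch that follows a 302 by hand using only Python's standard library: it reads `Set-Cookie` from the 302 response and sends the value back in a `Cookie` header on the request to the `Location` URL. The local server, the `/login` and `/overview` paths, and the `SESSION_ID` cookie are hypothetical stand-ins, not Baidu Tongji's real endpoints. The sketch also skips the Domain/Path matching a real cookie jar performs; the cookie is resent here only because `Location` points at the same host.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the login (302) page and the destination page."""

    def do_GET(self):
        if self.path.startswith("/login"):
            # The 302 response both sets a cookie and points to the destination.
            self.send_response(302)
            self.send_header("Set-Cookie", "SESSION_ID=abc123; Path=/")
            self.send_header("Location", "/overview")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            # The destination page echoes whatever Cookie header the client sent.
            body = (self.headers.get("Cookie") or "<no cookie>").encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick any free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Request 1: hit the login URL; http.client does NOT follow redirects by itself.
conn = http.client.HTTPConnection("127.0.0.1", port)
conn.request("GET", "/login")
first = conn.getresponse()
first.read()
status = first.status
location = first.getheader("Location")
# Keep only "name=value"; attributes such as Path are instructions for the client.
cookie_pair = first.getheader("Set-Cookie").split(";", 1)[0]
conn.close()

# Request 2: follow Location ourselves and send the cookie back in a request header.
conn = http.client.HTTPConnection("127.0.0.1", port)
conn.request("GET", location, headers={"Cookie": cookie_pair})
echoed = conn.getresponse().read().decode()
conn.close()

server.shutdown()
server.server_close()

print(status, location, echoed)
```

The server never links the two requests; the only thing carrying the login state across the redirect is the header the client chose to send.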
#5 shanechiu (OP): @vincenttone So does that mean the cookie sent with the request to the 302 page is also passed to the destination page via a header, and acts as the request cookie there as well?
#6 fml87 (2017-12-08 16:50:47 +08:00): What does "logined" mean?
#7 shanechiu (OP): @fml87 It's the past tense of the word "login"; it means events or actions that happened in the past.
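#8 vincenttone (2017-12-08 16:54:42 +08:00): @shanechiu To understand how cookies behave on a 302 page, you first have to understand how they behave on an ordinary page. As I said a moment ago: 1. HTTP is stateless; that is the premise. Cookies are stored locally on the client, and since HTTP is stateless, it makes no difference whether or not you went through a 302 redirect.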
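The stateless point can be sketched the same way, again assuming a hypothetical local server where `/login` answers with a 302 plus `Set-Cookie` and `/overview` echoes the `Cookie` header. Here the client keeps an `http.cookiejar.CookieJar` plugged into `urllib` via `HTTPCookieProcessor`: the cookie from the 302 is stored locally and attached automatically when `urllib` follows the redirect, while the server remembers nothing between the two requests. A browser, or `requests.Session`, does essentially the same thing.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the login (302) page and the destination page."""

    def do_GET(self):
        if self.path.startswith("/login"):
            self.send_response(302)
            self.send_header("Set-Cookie", "SESSION_ID=abc123; Path=/")
            self.send_header("Location", "/overview")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            # Echo the Cookie header so we can see what the client sent.
            body = (self.headers.get("Cookie") or "<no cookie>").encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), DemoHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# The cookie jar is the client-side "memory"; the server keeps no state at all.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),  # ignore proxy env vars; talk to localhost directly
    urllib.request.HTTPCookieProcessor(jar),
)

# One call: urllib stores Set-Cookie from the 302, follows Location,
# and attaches the stored cookie to the second request automatically.
with opener.open(f"http://127.0.0.1:{port}/login") as resp:
    final_url = resp.geturl()
    echoed = resp.read().decode()

stored = {cookie.name: cookie.value for cookie in jar}

server.shutdown()
server.server_close()

print(final_url)
print(echoed)
print(stored)
```

If you drop the `HTTPCookieProcessor` from the opener, the destination page receives no `Cookie` header, because nothing on the client remembered the value from the 302 response.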
#9 shanechiu (OP): @vincenttone Thanks for your kindness and patience. I think I have a rough outline of how this works now.
#10 knightdf (2017-12-08 18:04:17 +08:00): Showing off like this? Judging by your post history, don't you actually know Chinese?
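#11 yospan (2017-12-09 15:19:00 +08:00): I just did this recently. Use a session: set a third-party viewing password in the Tongji admin panel, POST it, keep the session to request other pages, and then you can pull whatever stats data you like~ Take a look for reference; I'm new to Python:

```python
import json
from urllib.parse import parse_qs, urlparse

from requests import Session

# REQ_HEADERS was not shown in the original reply; this minimal header set is an assumption.
REQ_HEADERS = {"User-Agent": "Mozilla/5.0"}

# Baidu Tongji third-party viewing password: log in and obtain the session and siteId
idwd = {'passwd': '66666'}
session = Session()
logined = session.post("https://tongji.baidu.com/web/welcome/ico?s=8dfdafdafadfa4bccd",
                       data=idwd, headers=REQ_HEADERS)
# requests follows the 302 and keeps its cookies in the session, so logined.url is the
# destination URL: https://tongji.baidu.com/web/<webid>/overview/index?siteId=<siteid>
final_url = urlparse(logined.url)
siteid = parse_qs(final_url.query)["siteId"][0]
webid = final_url.path.split("/")[2]

# POST parameters for the search-term report
keyjson = {"siteId": siteid, "st": "", "et": "", "st2": "", "et2": "",
           "indicators": "['pv_count','visitor_count','ip_count','bounce_ratio','avg_visit_time']",
           "order": "pv_count,desc", "offset": "", "pageSize": "", "target": "-1",
           "flag": "indicator", "source": "", "isGroup": "0", "clientDevice": "all",
           "reportId": "12", "method": "source/searchword/a", "queryId": ""}
readkeyjson = session.post("https://tongji.baidu.com/web/" + webid + "/ajax/post",
                           data=keyjson, headers=REQ_HEADERS)
# Parse the response body as JSON
readjsondict = json.loads(readkeyjson.text)
keyNamejson = readjsondict['data']['items'][0]
for items in keyNamejson:
    print(items[0]['name'])
```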