A web-scraping question: how do I generate the signature (token) a website requires?

2018-11-07 12:21:32 +08:00
 freeman1974
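I'm using Python to scrape data from an overseas e-commerce site. Login works, and I get a remember_key. But when I then fetch an order's detail data, the POSTed form data has to include a token. I don't know how it is generated; its length varies, at around 40 characters.
How do I get past this? I suspect it is generated by local JS, but I've looked through the JS and found no such operation.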
2984 views
Node: Python
13 replies
ltoddy
2018-11-07 12:24:42 +08:00
Isn't that there for CSRF protection?
linchengzzz
2018-11-07 13:30:04 +08:00
The token is returned by the backend at login, isn't it?
mztql
2018-11-07 13:34:03 +08:00
Most likely it comes from JS or from the response to some earlier request.
lxy42
2018-11-07 13:35:50 +08:00
Check whether it's a CSRF_TOKEN.
freeman1974
2018-11-07 15:03:35 +08:00
@linchengzzz No, it isn't. At login the backend returns remember_key=xxxxx. There is no token.
freeman1974
2018-11-07 15:08:02 +08:00
Let me post the request.
:authority: sell.souq.com
:method: POST
:path: /orders/getUnitListDetails
:scheme: https
accept: application/json, text/plain, */*
accept-encoding: gzip, deflate, br
accept-language: zh-CN,zh;q=0.9
cache-control: no-cache
content-length: 325
content-type: application/x-www-form-urlencoded;charset=UTF-8
cookie: PLATEFORMC=ae; COCODE_AE=ae; PLATEFORML=en; c_Ident=15349883665889; s_fid=79AB962980789693-0187C28FB2F543C4; ab.storage.deviceId.2e4ae497-9aed-4a69-8a2d-91cd396ab384=%7B%22g%22%3A%224ad0c315-7a95-c21e-189f-a00d60fbbb7e%22%2C%22c%22%3A1534988369909%2C%22l%22%3A1534988369909%7D; _ga=GA1.2.546492056.1534988373; __gads=ID=088fbe71ec00d1b9:T=1534988375:S=ALNI_MYwWze-JpDhVgLk2t5ckQ_cN2F_TA; cto_lwid=dab74b02-330c-42a5-9d36-8558347551cb; optimizelyEndUserId=oeu1539745964092r0.11418005455424574; optimizelyBuckets=%7B%7D; _ga=GA1.1.546492056.1534988373; ab.storage.deviceId.dde4157a-6ed4-4e47-a940-cdd336f179b2=%7B%22g%22%3A%22c04c03b4-1478-17a3-fb29-d97324875588%22%2C%22c%22%3A1539745966917%2C%22l%22%3A1539745966917%7D; s_cc=true; s_source=%5B%5BB%5D%5D; cmgvo=Typed%2FBookmarkedTyped%2FBookmarkedundefined; s_ev21=%5B%5B%27Natural%2520Search%27%2C%271534988372649%27%5D%2C%5B%27Typed%2FBookmarked%27%2C%271539746164434%27%5D%5D; s_ev22=%5B%5B%27Natural%2520Search%253A%2520Baidu%253A%2520Keyword%2520Unavailable%27%2C%271534988372649%27%5D%2C%5B%27Typed%2FBookmarked%253A%2520HomePage%27%2C%271539746164435%27%5D%5D; VT_HDR=2; idc=16184445; ab.storage.userId.2e4ae497-9aed-4a69-8a2d-91cd396ab384=%7B%22g%22%3A%2216184445%22%2C%22c%22%3A1539746196470%2C%22l%22%3A1539746196470%7D; SCAUTT=BEARER; optimizelySegments=%7B%22182429971%22%3A%22referral%22%2C%22182476429%22%3A%22false%22%2C%22182494213%22%3A%22gc%22%7D; _gid=GA1.1.779890490.1540453689; formisimo=zA775h2lpdp6NtKHaHIEpJCQgd; _gid=GA1.2.1022938571.1540453881; s_campaign=NA%3ANA%3ANA%3ANA%3ANA%3ANA%3ANA%3ANA%3ANA%3ANA%3ANA%3ANA%3ANA%3ANA%3ANA%3Aae%3Aen%3ANA%3ANA%3ATab-Bookarked%3Afree; PHPSESSID=b22mbohdmk57qscsujbiv0dq0d4pp8ju; SCXAT=3HWrwJLXPXGe5YcNuo8M0DQIXOLI73wpPeXrOl8tSPM+1540501370639; s_vs=1; s_dl=1; remember_key=60879632d672d5167b313231660db7c1; is_logged_in=1; CARTID=15195280635a92287f547d0; customer_logged_in=0; 
ab.storage.sessionId.2e4ae497-9aed-4a69-8a2d-91cd396ab384=%7B%22g%22%3A%220f76d955-ac87-77a8-df13-85896d0caa82%22%2C%22e%22%3A1540503948783%2C%22c%22%3A1540501398853%2C%22l%22%3A1540502148783%7D; s_ppvl=HomePage%2C10%2C10%2C655%2C1366%2C655%2C1366%2C768%2C1%2CP; s_depth=10; s_sq=soqdev%3D%2526c.%2526a.%2526activitymap.%2526page%253DHomePage%2526link%253DSell%252520with%252520Us%2526region%253DinnerWrap%2526pageIDType%253D1%2526.activitymap%2526.a%2526.c%2526pid%253DHomePage%2526pidt%253D1%2526oid%253Dhttps%25253A%25252F%25252Fsell.souq.com%25252F%2526ot%253DA; SCAUAT=eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiIxNjE4NDQ0NSIsImZpcnN0TmFtZSI6IllhbiIsImxhc3ROYW1lIjoiWGllIiwicm9sZSI6IlNFTExFUiIsInNlbGxlcklkIjoxLjYxODQ0NDVFNywiZ3JvdXBJZCI6MS42MTg0NDQ1RTcsImlzcyI6IlNPVVEiLCJpZCI6MS42MTg0NDQ1RTcsImV4cCI6MTU0MDU4ODk4OCwidXNlck5hbWUiOiJVYnV5LXVhZSIsImlhdCI6MTU0MDUwMjU4OH0.vijDdv5s0coXK-n4DV5Ow6vLGWzLhLZJODNhTJlgFAAZVJ_ECiiwY2lmEVtmfJZPrGaWq4UKIRlouJ4jw6FPSQFdiiUnnI3XQbQTM8nN4V3pmhZL0ENU61xjkl_l8j6SIdCCPJt4E_aAeGmVapiuBMlUSapEFEmcoAAnL_aP32dz0zRzfERyu3LtHcfI1D9SIBAryGqc2pnsMzvg3OM2JeaMBofgNzG4Ro2EfD03DMjy2JilgxRyxghtteS0N88iOi9CrC6GQoXigr_hX2kN5wPfeMONBAJaTNXGJBUFH_fHxkwsGLkBfAQ2S26qjQykxsND0rOZcaVxVbfgKUieHEczjytBl32o--BV0ZAxUKrYkFLPir16aG5cfa_7CENv_rXKfxyps8ZhoJ7r4mj4utPSEQ-8AQIvzO1j7FDhipAiAw5vjKsGqHpcjmbHKEVQZjOQ3zx0RtnbDvqbQFGxtvwfxAbo0DYQh-RwengohrdAPe4gd_gfolofC5x9dVwHV_PG-ydEXa9nyUfM8jR2qu1FW0ymU6vRKTRBcVMtWfTUr6LVaaHj8Pro82xPfOyXMYdt0Y-VBErTHUWeZFveQn6HqN5wO_tw4vi7QgMP1lH7D1ptZQgALrPhv7Uac-V_ZgEWtxmGdIsQnbLVA1BgDUjcmlX5fiKkpc__UFCRM80; SCAURT=31120e270797f9d3b5f7da1a53493a52; ab.storage.sessionId.dde4157a-6ed4-4e47-a940-cdd336f179b2=%7B%22g%22%3A%22a70246b5-4dd8-5c31-1cb9-6ed1cdb1960a%22%2C%22e%22%3A1540504386313%2C%22c%22%3A1540501385158%2C%22l%22%3A1540502586313%7D; _gat=1; s_nr_lifetime=1540503851189-Repeat; s_nr_year=1540503851190-Repeat; s_nr_quarter=1540503851191-Repeat; s_ppv=HomePage%2C10%2C9%2C655%2C1366%2C655%2C1366%2C768%2C1%2CP
origin: https://sell.souq.com
pragma: no-cache
referer: https://sell.souq.com/orders/order-management?tab=confirmed
user-agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36
x-requested-with: XMLHttpRequest

data[0][0]: 3645400471
data[1][0]: 1250900376
data[2][0]: 231834900488
data[3][0]: 16000600468
data[4][0]: 4980800471
data[5][0]: 228639400488
data[6][0]: 4980600471
data[6][1]: 10055900471
data[7][0]: 5218600302
token: 3HWrwJLXPXGe5YcNuo8M0DQIXOLI73wpPeXrOl8tSPM

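That is the token, i.e. the signature over the request data. As for whether an earlier response returned it, I'll take another careful look.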
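The form body in the dump above uses bracketed keys such as data[6][1]. As a minimal sketch of how that body could be rebuilt in Python, the helper below flattens nested lists of IDs into those keys and appends the token. The function name `build_order_form` and the reading of the first index as a group and the second as an item within it are assumptions for illustration; the thread does not say what the indices mean.

```python
def build_order_form(unit_groups, token):
    """Flatten nested IDs into data[i][j] form fields and add the token.

    Hypothetical helper: the grouping semantics are assumed, not confirmed.
    """
    form = {}
    for i, group in enumerate(unit_groups):
        for j, unit_id in enumerate(group):
            form[f"data[{i}][{j}]"] = str(unit_id)
    form["token"] = token
    return form


# A few of the IDs from the captured request, kept in the same shape.
groups = [[3645400471], [1250900376], [4980600471, 10055900471]]
form = build_order_form(groups, "3HWrwJLXPXGe5YcNuo8M0DQIXOLI73wpPeXrOl8tSPM")
```

A dict like this can be passed as the `data=` argument of an HTTP client that sends application/x-www-form-urlencoded bodies, which matches the content-type in the capture.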
freeman1974
2018-11-07 15:26:00 +08:00
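@lxy42
On whether it is a CSRF_TOKEN:
I noticed that the server does return a parameter:
gSouqCsrfToken = "058e5fa7fff29647758beb38e6f5668a";
But at that point the token the site submits is:
token: Q3i6L7GD2HzjlrBN3CrzEe8NEj1wIOYbYT5ZvV43RL4
I also noticed that this token is carried in many requests, in the request headers, not in the response headers:
cookie: SCXAT=Q3i6L7GD2HzjlrBN3CrzEe8NEj1wIOYbYT5ZvV43RL4+1541574632410;
So I can only conclude that it is generated by JS rather than returned by the backend.
But I haven't found anywhere in the JS code that computes the token value.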
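The observation above, that the submitted token equals the start of the SCXAT cookie, can be checked mechanically. The sketch below parses a raw Cookie header by hand and compares the part of SCXAT before the "+" with the token. The header is shortened to three cookies taken from the thread, and `parse_cookie_header` is a hypothetical helper, not a library function.

```python
def parse_cookie_header(header):
    """Split a raw Cookie request header into a name -> value dict."""
    cookies = {}
    for pair in header.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep:  # skip empty fragments such as a trailing ";"
            cookies[name] = value
    return cookies


# Shortened header; the values are the ones quoted in this thread.
cookie_header = (
    "is_logged_in=1; "
    "SCXAT=Q3i6L7GD2HzjlrBN3CrzEe8NEj1wIOYbYT5ZvV43RL4+1541574632410; "
    "s_vs=1;"
)
submitted_token = "Q3i6L7GD2HzjlrBN3CrzEe8NEj1wIOYbYT5ZvV43RL4"

scxat = parse_cookie_header(cookie_header)["SCXAT"]
prefix, _, suffix = scxat.partition("+")
matches = prefix == submitted_token
```

The suffix after the "+" looks like a millisecond timestamp, but that is a guess; nothing in the thread confirms what it encodes.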
momocraft
2018-11-07 15:29:57 +08:00
AFAIK the only things that can add their own headers are XHR/fetch, and both are controlled by JS. Maybe you just haven't found it.
arrow8899
2018-11-07 15:39:33 +08:00
Step through the JS in a debugger and see where the value is read from.
freeman1974
2018-11-07 16:10:45 +08:00
@momocraft It is indeed in the JS, but I can't quite follow it.
var Service = function (e) {
    this.$http = e.get("$http");
    this._config = { method: "GET", timeout: 60 };
    this.method = function (e) {
        this._config.method = e;
        return this;
    };
    this.url = function (e) {
        if (e.indexOf("https://") !== -1 || e.indexOf("http://") !== -1) {
            this._config.url = e;
        } else {
            this._config.url = gBaseUrl + e.replace(/^\//, "");
        }
        return this;
    };
    this.params = function (e) {
        this._config.params = e;
        return this;
    };
    this.data = function (e) {
        this._config.data = e;
        return this;
    };
    this.config = function (e) {
        this._config = _.extend(this._config, e);
        return this;
    };
    this.defer = function () {
        if (_.isUndefined(this._config.url)) throw new Error("No URL in Service definition");
        var t = e.get("$cookies");
        this._config.data || (this._config.data = {});
        if (t.get("SCXAT")) {
            this._config.data.token = t.get("SCXAT").split("+")[0];
        }
        return this.$http(this._config);
    };
};
freeman1974
2018-11-07 16:13:18 +08:00
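This JS reads SCXAT. It looks as if the value is taken from a backend response, but in the debugging tools I don't see the backend returning it.
Maybe I've misread it.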
moxiaowei
2018-11-07 16:17:24 +08:00
This is there to prevent CSRF attacks. I really don't know how to handle it, so I'm bookmarking this to see how the experts solve it.
lxy42
2018-11-08 09:07:20 +08:00
this._config.data.token=t.get("SCXAT").split("+")[0]

This code appears to read SCXAT from the cookies.
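A rough Python equivalent of that JS line might look like the sketch below. It assumes the scraper already holds the site's cookies in something with a dict-style `.get(name)`, for example a plain dict or the cookie jar of a `requests.Session`. The name `attach_token` is hypothetical, and the sketch says nothing about which response sets SCXAT in the first place, since the thread does not settle that.

```python
def attach_token(form, cookies):
    """Mirror the JS: copy the part of SCXAT before '+' into form['token'].

    `cookies` is any mapping-like object with .get(name); if SCXAT is
    absent or empty, the form is returned without a token, as in the JS.
    """
    form = dict(form)  # work on a copy so the caller's dict is untouched
    scxat = cookies.get("SCXAT")
    if scxat:
        form["token"] = scxat.split("+")[0]
    return form


# Cookie value copied from the request capture earlier in the thread.
cookies = {"SCXAT": "3HWrwJLXPXGe5YcNuo8M0DQIXOLI73wpPeXrOl8tSPM+1540501370639"}
form = attach_token({"data[0][0]": "3645400471"}, cookies)
no_token_form = attach_token({"data[0][0]": "3645400471"}, {})
```

With a `requests.Session` that has already logged in, passing `session.cookies` as the second argument should behave the same way, provided the SCXAT cookie is present in the jar.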


https://www.v2ex.com/t/505334
