A scraping question: how do I generate the signature (token) a website requires?

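    freeman1974 · 2018-11-07 12:21:32 +08:00 · 3009 views
    I'm using Python to scrape data from an overseas e-commerce site. Login already works and gives me a remember_key. But when I then fetch the order detail data, the form data submitted via POST has to include a token. I don't know how it is generated; its length varies, at around 40 characters.
    How do I get past this? I suspected it was generated by local JS, but I read through the JS and found nothing that does that.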
    13 replies    2018-11-08 09:07:20 +08:00
    #1 · ltoddy · 2018-11-07 12:24:42 +08:00
    Isn't that used for CSRF protection?
    #2 · linchengzzz · 2018-11-07 13:30:04 +08:00
    The token is returned by the backend at login, isn't it?
    #3 · mztql · 2018-11-07 13:34:03 +08:00 via iPhone
    Most likely it comes from JS or from the response to an earlier request.
    #4 · lxy42 · 2018-11-07 13:35:50 +08:00
    Check whether it is a CSRF_TOKEN.
    #5 · freeman1974 (OP) · 2018-11-07 15:03:35 +08:00
    @linchengzzz No, it isn't. At login the backend returns remember_key=xxxxx. There is no token.
    #6 · freeman1974 (OP) · 2018-11-07 15:08:02 +08:00
    Here is the request I captured.
    :authority: sell.souq.com
    :method: POST
    :path: /orders/getUnitListDetails
    :scheme: https
    accept: application/json, text/plain, */*
    accept-encoding: gzip, deflate, br
    accept-language: zh-CN,zh;q=0.9
    cache-control: no-cache
    content-length: 325
    content-type: application/x-www-form-urlencoded;charset=UTF-8
    cookie: PLATEFORMC=ae; COCODE_AE=ae; PLATEFORML=en; c_Ident=15349883665889; s_fid=79AB962980789693-0187C28FB2F543C4; ab.storage.deviceId.2e4ae497-9aed-4a69-8a2d-91cd396ab384=%7B%22g%22%3A%224ad0c315-7a95-c21e-189f-a00d60fbbb7e%22%2C%22c%22%3A1534988369909%2C%22l%22%3A1534988369909%7D; _ga=GA1.2.546492056.1534988373; __gads=ID=088fbe71ec00d1b9:T=1534988375:S=ALNI_MYwWze-JpDhVgLk2t5ckQ_cN2F_TA; cto_lwid=dab74b02-330c-42a5-9d36-8558347551cb; optimizelyEndUserId=oeu1539745964092r0.11418005455424574; optimizelyBuckets=%7B%7D; _ga=GA1.1.546492056.1534988373; ab.storage.deviceId.dde4157a-6ed4-4e47-a940-cdd336f179b2=%7B%22g%22%3A%22c04c03b4-1478-17a3-fb29-d97324875588%22%2C%22c%22%3A1539745966917%2C%22l%22%3A1539745966917%7D; s_cc=true; s_source=%5B%5BB%5D%5D; cmgvo=Typed%2FBookmarkedTyped%2FBookmarkedundefined; s_ev21=%5B%5B%27Natural%2520Search%27%2C%271534988372649%27%5D%2C%5B%27Typed%2FBookmarked%27%2C%271539746164434%27%5D%5D; s_ev22=%5B%5B%27Natural%2520Search%253A%2520Baidu%253A%2520Keyword%2520Unavailable%27%2C%271534988372649%27%5D%2C%5B%27Typed%2FBookmarked%253A%2520HomePage%27%2C%271539746164435%27%5D%5D; VT_HDR=2; idc=16184445; ab.storage.userId.2e4ae497-9aed-4a69-8a2d-91cd396ab384=%7B%22g%22%3A%2216184445%22%2C%22c%22%3A1539746196470%2C%22l%22%3A1539746196470%7D; SCAUTT=BEARER; optimizelySegments=%7B%22182429971%22%3A%22referral%22%2C%22182476429%22%3A%22false%22%2C%22182494213%22%3A%22gc%22%7D; _gid=GA1.1.779890490.1540453689; formisimo=zA775h2lpdp6NtKHaHIEpJCQgd; _gid=GA1.2.1022938571.1540453881; s_campaign=NA%3ANA%3ANA%3ANA%3ANA%3ANA%3ANA%3ANA%3ANA%3ANA%3ANA%3ANA%3ANA%3ANA%3ANA%3Aae%3Aen%3ANA%3ANA%3ATab-Bookarked%3Afree; PHPSESSID=b22mbohdmk57qscsujbiv0dq0d4pp8ju; SCXAT=3HWrwJLXPXGe5YcNuo8M0DQIXOLI73wpPeXrOl8tSPM+1540501370639; s_vs=1; s_dl=1; remember_key=60879632d672d5167b313231660db7c1; is_logged_in=1; CARTID=15195280635a92287f547d0; customer_logged_in=0; 
ab.storage.sessionId.2e4ae497-9aed-4a69-8a2d-91cd396ab384=%7B%22g%22%3A%220f76d955-ac87-77a8-df13-85896d0caa82%22%2C%22e%22%3A1540503948783%2C%22c%22%3A1540501398853%2C%22l%22%3A1540502148783%7D; s_ppvl=HomePage%2C10%2C10%2C655%2C1366%2C655%2C1366%2C768%2C1%2CP; s_depth=10; s_sq=soqdev%3D%2526c.%2526a.%2526activitymap.%2526page%253DHomePage%2526link%253DSell%252520with%252520Us%2526region%253DinnerWrap%2526pageIDType%253D1%2526.activitymap%2526.a%2526.c%2526pid%253DHomePage%2526pidt%253D1%2526oid%253Dhttps%25253A%25252F%25252Fsell.souq.com%25252F%2526ot%253DA; SCAUAT=eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiIxNjE4NDQ0NSIsImZpcnN0TmFtZSI6IllhbiIsImxhc3ROYW1lIjoiWGllIiwicm9sZSI6IlNFTExFUiIsInNlbGxlcklkIjoxLjYxODQ0NDVFNywiZ3JvdXBJZCI6MS42MTg0NDQ1RTcsImlzcyI6IlNPVVEiLCJpZCI6MS42MTg0NDQ1RTcsImV4cCI6MTU0MDU4ODk4OCwidXNlck5hbWUiOiJVYnV5LXVhZSIsImlhdCI6MTU0MDUwMjU4OH0.vijDdv5s0coXK-n4DV5Ow6vLGWzLhLZJODNhTJlgFAAZVJ_ECiiwY2lmEVtmfJZPrGaWq4UKIRlouJ4jw6FPSQFdiiUnnI3XQbQTM8nN4V3pmhZL0ENU61xjkl_l8j6SIdCCPJt4E_aAeGmVapiuBMlUSapEFEmcoAAnL_aP32dz0zRzfERyu3LtHcfI1D9SIBAryGqc2pnsMzvg3OM2JeaMBofgNzG4Ro2EfD03DMjy2JilgxRyxghtteS0N88iOi9CrC6GQoXigr_hX2kN5wPfeMONBAJaTNXGJBUFH_fHxkwsGLkBfAQ2S26qjQykxsND0rOZcaVxVbfgKUieHEczjytBl32o--BV0ZAxUKrYkFLPir16aG5cfa_7CENv_rXKfxyps8ZhoJ7r4mj4utPSEQ-8AQIvzO1j7FDhipAiAw5vjKsGqHpcjmbHKEVQZjOQ3zx0RtnbDvqbQFGxtvwfxAbo0DYQh-RwengohrdAPe4gd_gfolofC5x9dVwHV_PG-ydEXa9nyUfM8jR2qu1FW0ymU6vRKTRBcVMtWfTUr6LVaaHj8Pro82xPfOyXMYdt0Y-VBErTHUWeZFveQn6HqN5wO_tw4vi7QgMP1lH7D1ptZQgALrPhv7Uac-V_ZgEWtxmGdIsQnbLVA1BgDUjcmlX5fiKkpc__UFCRM80; SCAURT=31120e270797f9d3b5f7da1a53493a52; ab.storage.sessionId.dde4157a-6ed4-4e47-a940-cdd336f179b2=%7B%22g%22%3A%22a70246b5-4dd8-5c31-1cb9-6ed1cdb1960a%22%2C%22e%22%3A1540504386313%2C%22c%22%3A1540501385158%2C%22l%22%3A1540502586313%7D; _gat=1; s_nr_lifetime=1540503851189-Repeat; s_nr_year=1540503851190-Repeat; s_nr_quarter=1540503851191-Repeat; s_ppv=HomePage%2C10%2C9%2C655%2C1366%2C655%2C1366%2C768%2C1%2CP
    origin: https://sell.souq.com
    pragma: no-cache
    referer: https://sell.souq.com/orders/order-management?tab=confirmed
    user-agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36
    x-requested-with: XMLHttpRequest

    data[0][0]: 3645400471
    data[1][0]: 1250900376
    data[2][0]: 231834900488
    data[3][0]: 16000600468
    data[4][0]: 4980800471
    data[5][0]: 228639400488
    data[6][0]: 4980600471
    data[6][1]: 10055900471
    data[7][0]: 5218600302
    token: 3HWrwJLXPXGe5YcNuo8M0DQIXOLI73wpPeXrOl8tSPM

    This is the token, i.e. the signature for the request data. I'll check more carefully whether it was returned by an earlier response.
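One way to check whether the token was handed out earlier is to search the captured Cookie header for the token value. The following is a minimal Python sketch of that check. The helper names `parse_cookie_header` and `cookies_containing` are made up for this example, and `cookie_header` is an abbreviated excerpt of the header in the request above, not the full string.

```python
def parse_cookie_header(header):
    """Split a raw Cookie request header into a name -> value dict."""
    cookies = {}
    for part in header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep:  # skip empty fragments such as a trailing "; "
            cookies[name] = value
    return cookies


def cookies_containing(header, needle):
    """Return the names of all cookies whose value contains `needle`."""
    return [name for name, value in parse_cookie_header(header).items()
            if needle in value]


# Abbreviated excerpt of the captured Cookie header (illustrative subset).
cookie_header = (
    "PHPSESSID=b22mbohdmk57qscsujbiv0dq0d4pp8ju; "
    "SCXAT=3HWrwJLXPXGe5YcNuo8M0DQIXOLI73wpPeXrOl8tSPM+1540501370639; "
    "s_vs=1; remember_key=60879632d672d5167b313231660db7c1; "
)
token = "3HWrwJLXPXGe5YcNuo8M0DQIXOLI73wpPeXrOl8tSPM"

matches = cookies_containing(cookie_header, token)
print(matches)  # ['SCXAT']
```

A hit here only shows that the value is already present in a cookie when the request is sent. It does not show which response or script put it there.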
    #7 · freeman1974 (OP) · 2018-11-07 15:26:00 +08:00
    @lxy42
    On whether it is a CSRF_TOKEN:
    I noticed that the server does send back a parameter:
    gSouqCsrfToken = "058e5fa7fff29647758beb38e6f5668a";
    But at that point the token the site submits is:
    token: Q3i6L7GD2HzjlrBN3CrzEe8NEj1wIOYbYT5ZvV43RL4
    I also noticed that this token appears in the request headers of many requests, not in the response headers:
    cookie: SCXAT=Q3i6L7GD2HzjlrBN3CrzEe8NEj1wIOYbYT5ZvV43RL4+1541574632410;
    So I can only conclude that it is generated by JS rather than returned by the backend.
    But I haven't found anywhere in the JS code that computes the token value.
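The SCXAT cookie quoted above has the shape `<token>+<digits>`. The sketch below splits it into those two parts. It assumes the digits are a Unix timestamp in milliseconds, which is a guess based on their magnitude and not something the site documents. `split_scxat` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def split_scxat(value):
    """Split 'token+digits' into (token, datetime).

    Treating the digits as epoch milliseconds is an assumption.
    """
    token, _, millis = value.partition("+")
    ms = int(millis)
    # Integer arithmetic avoids float rounding on the millisecond part.
    stamp = (datetime.fromtimestamp(ms // 1000, tz=timezone.utc)
             + timedelta(milliseconds=ms % 1000))
    return token, stamp


scxat = "Q3i6L7GD2HzjlrBN3CrzEe8NEj1wIOYbYT5ZvV43RL4+1541574632410"
token, stamp = split_scxat(scxat)
print(token)              # Q3i6L7GD2HzjlrBN3CrzEe8NEj1wIOYbYT5ZvV43RL4
print(stamp.isoformat())  # 2018-11-07T07:10:32.410000+00:00
```

The first part is identical to the submitted form token. Under the timestamp assumption, the suffix falls a few minutes before this reply was posted.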
    #8 · momocraft · 2018-11-07 15:29:57 +08:00
    As far as I know, only XHR/fetch can add custom headers, and both are driven by JS. Maybe you just haven't found it yet.
    #9 · arrow8899 · 2018-11-07 15:39:33 +08:00
    Step through the JS in a debugger and see where the value is read from.
    #10 · freeman1974 (OP) · 2018-11-07 16:10:45 +08:00
    @momocraft It is indeed in the JS, but I can't quite follow it.
    var Service = function (e) {
      this.$http = e.get("$http");
      this._config = {method: "GET", timeout: 60};
      this.method = function (e) { this._config.method = e; return this; };
      this.url = function (e) {
        // Absolute URLs are used as-is; other paths are appended to gBaseUrl.
        if (e.indexOf("https://") !== -1 || e.indexOf("http://") !== -1) {
          this._config.url = e;
        } else {
          this._config.url = gBaseUrl + e.replace(/^\//, "");
        }
        return this;
      };
      this.params = function (e) { this._config.params = e; return this; };
      this.data = function (e) { this._config.data = e; return this; };
      this.config = function (e) { this._config = _.extend(this._config, e); return this; };
      this.defer = function () {
        if (_.isUndefined(this._config.url)) throw new Error("No URL in Service definition");
        var t = e.get("$cookies");
        if (!this._config.data) this._config.data = {};
        // If the SCXAT cookie exists, send the part before "+" as data.token.
        if (t.get("SCXAT")) this._config.data.token = t.get("SCXAT").split("+")[0];
        return this.$http(this._config);
      };
    };
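    #11 · freeman1974 (OP) · 2018-11-07 16:13:18 +08:00
    This JS code reads SCXAT. It looks like the value is taken from a backend response, but in the debugging tools I didn't see the backend return it anywhere.
    Maybe I just haven't understood it.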
    #12 · moxiaowei · 2018-11-07 16:17:24 +08:00
    This is there to prevent CSRF attacks. I really don't know how to handle it; bookmarking this to see how the experts solve it.
    #13 · lxy42 · 2018-11-08 09:07:20 +08:00
    this._config.data.token=t.get("SCXAT").split("+")[0]

    Judging by this code, it seems to read SCXAT from the cookies.
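If that reading is right, the same step can be mirrored on the Python side: take the part of the SCXAT cookie before the "+" and send it as the `token` form field. The sketch below is a rough illustration only. The URL, the `data[i][j]` field names and the `x-requested-with` header are copied from the captured request in #6. `build_form` and `fetch_unit_details` are hypothetical helper names, and `session` is assumed to be a `requests.Session` that already holds the login cookies, including SCXAT. The thread does not establish how SCXAT gets set, so it may have to be copied over from the browser.

```python
URL = "https://sell.souq.com/orders/getUnitListDetails"  # from the captured request


def build_form(cookies, rows):
    """Build the POST form: data[i][j] entries plus the token read from SCXAT."""
    form = {}
    for i, row in enumerate(rows):
        for j, unit_id in enumerate(row):
            form[f"data[{i}][{j}]"] = str(unit_id)
    scxat = cookies.get("SCXAT")
    if scxat:  # mirror the JS: only add a token when the cookie exists
        form["token"] = scxat.split("+")[0]
    return form


def fetch_unit_details(session, rows):
    """POST the form using a requests.Session that already carries the cookies.

    Not called here; whether the server accepts this request is untested.
    """
    form = build_form(session.cookies.get_dict(), rows)
    headers = {"x-requested-with": "XMLHttpRequest"}
    return session.post(URL, data=form, headers=headers)


# Offline check with a plain dict standing in for the cookie jar.
demo = build_form(
    {"SCXAT": "3HWrwJLXPXGe5YcNuo8M0DQIXOLI73wpPeXrOl8tSPM+1540501370639"},
    [[3645400471], [4980600471, 10055900471]],
)
print(demo["token"])  # 3HWrwJLXPXGe5YcNuo8M0DQIXOLI73wpPeXrOl8tSPM
```

Keeping the form construction separate from the network call makes it easy to compare the generated fields against the browser's request before sending anything.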