
Two questions about robots.txt

    rotciv · Jun 1, 2021 · 2202 views
    robots.txt contents:
    User-agent: *
    Disallow: /subject_search
    Disallow: /amazon_search
    Disallow: /search
    Disallow: /group/search
    Disallow: /event/search
    Disallow: /celebrities/search
    Disallow: /location/drama/search
    Disallow: /forum/
    Disallow: /new_subject
    Disallow: /service/iframe
    Disallow: /j/
    Disallow: /link2/
    Disallow: /recommend/
    Disallow: /doubanapp/card
    Disallow: /update/topic/
    Disallow: /share/
    Allow: /ads.txt
    Sitemap: https://www.douban.com/sitemap_index.xml
    Sitemap: https://www.douban.com/sitemap_updated_index.xml
    # Crawl-delay: 5

    User-agent: Wandoujia Spider
    Disallow: /

    User-agent: Mediapartners-Google
    Disallow: /subject_search
    Disallow: /amazon_search
    Disallow: /search
    Disallow: /group/search
    Disallow: /event/search
    Disallow: /celebrities/search
    Disallow: /location/drama/search
    Disallow: /j/

    1. /group/topic appears in neither a Disallow nor an Allow rule. Should it be treated as allowed or disallowed by default?
    2. What is the unit of "# Crawl-delay: 5"?
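Both questions can be checked with Python's standard-library urllib.robotparser. The sketch below parses an abridged copy of the rules above (only a few Disallow lines are kept) and queries a hypothetical crawler named "MyCrawler". That crawler matches no named group, so it falls under "User-agent: *". Crawl-delay is not part of the robots.txt standard (RFC 9309). Crawlers that honor it usually read the value as a number of seconds, and Google ignores it. In this file the line starts with "#", which makes it a comment with no effect at all.

```python
from urllib.robotparser import RobotFileParser

# Abridged copy of the robots.txt posted in the thread (a few rules only).
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /group/search
Disallow: /forum/
Allow: /ads.txt
# Crawl-delay: 5

User-agent: Wandoujia Spider
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rp = build_parser(ROBOTS_TXT)

# "MyCrawler" is a hypothetical agent name; it matches no named group,
# so the "User-agent: *" group is applied to it.
for path in ["/group/topic/123/", "/group/search", "/forum/1"]:
    print(path, rp.can_fetch("MyCrawler", path))

# The Crawl-delay line is a comment, so the parser records no delay (None).
print("crawl delay:", rp.crawl_delay("MyCrawler"))

# If the "#" were removed, the parser would report the raw number, 5.
uncommented = build_parser(ROBOTS_TXT.replace("# Crawl-delay", "Crawl-delay"))
print("uncommented crawl delay:", uncommented.crawl_delay("MyCrawler"))
```

Running it against the full rule set gives the same answer for /group/topic, because none of the posted Disallow prefixes is a prefix of that path.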
    4 replies    2021-06-02 09:38:56 +08:00
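    #2 AoEiuV020 · Jun 1, 2021 · ❤️ 1
    Rules are inherited by every path beneath them, so /group/topic would only be disallowed if a broader rule such as Disallow: / covered it.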
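The "inheritance" here comes from prefix matching. A rule's value is a path prefix, so Disallow: / covers /group/topic along with everything else. The hand-written sketch below follows the matching described in RFC 9309: the longest matching rule wins, Allow wins a tie, and a path that no rule matches is allowed. For simplicity it skips the * and $ wildcards. is_allowed and the rule lists are illustrative names, not any crawler's real code. Python's urllib.robotparser behaves slightly differently because it applies the first matching rule in file order.

```python
from __future__ import annotations


def is_allowed(path: str, rules: list[tuple[str, bool]]) -> bool:
    """Decide whether `path` may be crawled under one group's rules.

    Each rule is (path_prefix, allow). Simplified RFC 9309 matching:
    a rule matches when the path starts with its prefix, the longest
    match wins, and Allow beats Disallow on a tie. Wildcards (* and $)
    are not supported in this sketch. No match means allowed.
    """
    best_len = -1
    allowed = True  # default when no rule matches
    for prefix, allow in rules:
        # An empty value (e.g. "Disallow:") matches nothing here.
        if not prefix or not path.startswith(prefix):
            continue
        if len(prefix) > best_len or (len(prefix) == best_len and allow):
            best_len = len(prefix)
            allowed = allow
    return allowed


# A few of the "User-agent: *" rules from the thread.
douban_rules = [("/group/search", False), ("/forum/", False), ("/ads.txt", True)]
# Hypothetical groups for comparison.
block_all = [("/", False)]
block_all_but_topics = [("/", False), ("/group/topic", True)]

print(is_allowed("/group/topic/123/", douban_rules))          # True: no rule matches
print(is_allowed("/group/topic/123/", block_all))             # False: inherited from "/"
print(is_allowed("/group/topic/123/", block_all_but_topics))  # True: longer Allow wins
print(is_allowed("/group/search?q=x", douban_rules))          # False
```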
    #3 rotciv (OP) · Jun 1, 2021
    @zengxs @AoEiuV020 Thanks!
    #4 marktask · Jun 2, 2021
    If robots.txt defines nothing for a crawler, the default is to allow. For example, an empty robots.txt lets every crawler crawl every directory.
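You can confirm this default with urllib.robotparser. An empty file defines no groups. An agent that matches no group is also allowed when there is no "User-agent: *" fallback. "AnyBot" is a made-up agent name used only for this example.

```python
from urllib.robotparser import RobotFileParser


def parser_from(text: str) -> RobotFileParser:
    """Build a parser from robots.txt text without any network access."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp


# An empty robots.txt defines no groups, so nothing is restricted.
empty = parser_from("")
print(empty.can_fetch("AnyBot", "/group/topic/123/"))  # True

# Only Wandoujia Spider has a group here and there is no "User-agent: *"
# fallback, so any other agent is allowed by default.
only_wandoujia = parser_from("User-agent: Wandoujia Spider\nDisallow: /\n")
print(only_wandoujia.can_fetch("Wandoujia Spider", "/group/topic/123/"))  # False
print(only_wandoujia.can_fetch("AnyBot", "/group/topic/123/"))  # True
```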