Su Dikang: A Test of Whether Sohu Weibo Blocked the Baidu Spider

Friends who follow SEO will know that Sohu Weibo has recently been taking a large share of Baidu's long-tail keyword traffic. For various reasons, Su Dikang was not involved in that. On June 9, 2011, a message was suddenly forwarded into the team QQ group Su Dikang belongs to, claiming that Sohu Weibo had blocked the Baidu spider, together with a link to an Admin5 forum thread. After some analysis, Su Dikang believes that Sohu Weibo did not block the Baidu spider, and that the claim is a misreading of Sohu Weibo's robots.txt file.

The content of Sohu Weibo's robots.txt file (evening of June 9, 2011):

User-agent: Baiduspider
Allow: /

User-agent: Sogou
…

User-agent: *
Disallow: /

First, let us look at the first part of Sohu Weibo's robots.txt, which is addressed to the Baidu spider. In the guide at the Baidu Search Help Center (www.baidu.com/search/robots.html) we can find this sentence: "'Disallow:' means that the robot is allowed to access all URLs of the site." An explicit "Allow: /" likewise permits access to every URL. Therefore, the first part of the file allows the Baidu spider to crawl all URLs.

The second part, which defines the crawl permissions of the Sogou search engine's spider, need not concern us here.

Finally, the third part. It uses the wildcard to tell all search engines that they may not crawl the root directory (which is equivalent to forbidding them from crawling any URL at all). Here we must again pay attention to the Baidu Search Help Center's explanation. Baidu's official documentation says: note in particular that the order of Disallow and Allow lines is significant; the robot decides whether it may access a URL according to the first Allow or Disallow line that matches. Since the Baidu spider is matched by the first part, the prohibition in the third part is invalid for it: according to the first part, the Baidu spider may crawl all URLs.

A claim made in words proves nothing, but it can be tested in practice. It is known that Google handles robots.txt files the same way Baidu does, so we can use the "Crawler access" feature in Google Webmaster Tools to run the test.


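The precedence argument above can also be checked locally with Python's standard-library robots.txt parser. The sketch below uses a simplified version of the quoted file (only the Baiduspider record and the wildcard record; the Sogou record is left out), and the t.sohu.com URL is only illustrative:

```python
from urllib import robotparser

# Simplified version of the Sohu Weibo robots.txt quoted above:
# a record allowing Baiduspider everything, and a wildcard record
# disallowing everything for all other crawlers.
ROBOTS_TXT = """\
User-agent: Baiduspider
Allow: /

User-agent: *
Disallow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Baiduspider is matched by its own record, so the wildcard ban
# in the last record does not apply to it.
print(parser.can_fetch("Baiduspider", "http://t.sohu.com/some-page"))   # True

# Any other crawler falls through to the wildcard record and is blocked.
print(parser.can_fetch("SomeOtherBot", "http://t.sohu.com/some-page"))  # False
```

This matches the article's conclusion: the "User-agent: * / Disallow: /" record only governs crawlers that have no more specific record of their own.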