A robots.txt file is placed in the webroot. In most cases this will be /wordpress/current/.
The content of the file determines the rules for user agents (web crawlers).
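For example, on a hypothetical site served from that webroot, the file stored at /wordpress/current/robots.txt would be publicly reachable at:
https://example.com/robots.txt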
Different crawlers support different rules, and standard regex does not apply. However, two special characters are widely supported (a combined example follows the list below):
* = Wildcard
$ = End of URL
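For instance, a hypothetical rule combining both characters blocks every URL ending in .gif, for all crawlers:
User-agent: *
Disallow: /*.gif$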
A robots.txt record follows this standard notation:
User-agent: [user-agent name]
Crawl-Delay: [delay in seconds between URL crawls]
Disallow: [URL string to exclude from crawling]
Blocking Seekport from all pages:
User-agent: Seekport
Disallow: /
Set a crawl delay of 120 seconds for Yahoo (Slurp) and block the /contact page:
User-agent: Slurp
Crawl-Delay: 120
Disallow: /contact$
Block msnbot from crawling any PDF files in /uploads:
User-agent: msnbot
Disallow: /uploads/*.pdf$
Multiple URLs for a single user agent:
User-agent: Slurp
Disallow: /example/$
Disallow: /contact/$
Disallow: /hidden/$
Multiple user agents. Separate each block with an empty line:
User-agent: Ahrefsbot
Crawl-Delay: 120
Disallow: /contact$

User-agent: Googlebot
Crawl-Delay: 120
Disallow: /contact$

User-agent: Slurp
Crawl-Delay: 120
Disallow: /contact$