Robots.txt for multiple domains

We have a different domain for each language:

  1. www.abc.com
  2. www.abc.se
  3. www.abc.de

And then we have a different sitemap.xml for each site. In robots.txt, I want to add a sitemap reference for each domain.

  1. Is it possible to have multiple sitemap references, one for each domain, in a single robots.txt?
  2. If there are multiple references, which one does a search engine pick?

The robots.txt can only inform the search engines of sitemaps for its own domain, so that will be the only one a search engine honors when it crawls that domain's robots.txt. If all three domains map to the same website and share a robots.txt, then the search engines will effectively find each sitemap.
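
For illustration, a shared robots.txt served on all three hosts could simply list every sitemap; the exact URLs below are assumptions based on the domains in the question:

User-agent: *
Disallow:

# Sitemap URLs are absolute, so a single shared file can reference all three sites
Sitemap: https://www.abc.com/sitemap.xml
Sitemap: https://www.abc.se/sitemap.xml
Sitemap: https://www.abc.de/sitemap.xml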

I see no need to use robots.txt for that. Use Google's and Bing's webmaster tools: there you have each domain registered and can submit sitemaps to them for each domain, if you want to make sure that your sitemaps are not crawled by a bot for the wrong language. Then you could leave robots-allow.txt as a blank file, or adjust the crawl delay for all search engines like this: User-agent: * Crawl-delay: 1. The number after Crawl-delay represents the minimum waiting time in seconds between requests to the server from the same user agent.
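
Written out as a complete file, that crawl-delay suggestion would look something like this (the one-second value is just the example from the answer; not every search engine honors Crawl-delay):

# Allow everything, but ask crawlers to wait at least 1 second between requests
User-agent: *
Crawl-delay: 1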

Yes, if both sites are separate domains that you want to use in different ways, then you should place a different robots.txt file in each domain root.

Based on Hans2103's answer, I wrote this one, which should be safe to include in just about every web project:

# URL rewrite solution for robots.txt for multiple domains on a single docroot
# (Apache does not allow trailing comments on directives, so each comment gets its own line)
RewriteEngine On
# not an existing directory
RewriteCond %{REQUEST_FILENAME} !-d
# not an existing file
RewriteCond %{REQUEST_FILENAME} !-f
# and the host-specific robots file exists on disk
RewriteCond %{DOCUMENT_ROOT}/robots/%{HTTP_HOST}.txt -f
RewriteRule ^robots\.txt$ robots/%{HTTP_HOST}.txt [L]

These rewrite conditions should serve the normal robots.txt if it's present, and otherwise fall back to a robots/ directory containing the host-specific file robots/<domain.tld>.txt.
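
For reference, the rewrite assumes a docroot layout roughly like the following; the per-host filenames are hypothetical, derived from the domains in the question:

docroot/
  .htaccess            (contains the rewrite rules above)
  robots/
    www.abc.com.txt
    www.abc.se.txt
    www.abc.de.txt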

N.B.: The above rewrite has not yet been tested. Feel free to correct me if you find any flaws; I will update this post for future reference upon any helpful corrective comments.

You need to conditionally serve a different robots.txt file based on which domain/host has been accessed; on Apache you can do this with mod_rewrite, as in the rewrite example above. Note that in a robots.txt file with multiple user-agent directives, each Disallow or Allow rule only applies to the user agent(s) specified in that particular line-break-separated group. If the file contains rules that could apply to more than one user agent, a crawler will only pay attention to (and follow the directives in) the most specific group of instructions.
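
For example, in a file like the following (hypothetical rules), Googlebot would follow only the group addressed to it and ignore the generic group entirely:

User-agent: *
Disallow: /private/

# Googlebot matches this more specific group, so for it only /tmp/ is disallowed
User-agent: Googlebot
Disallow: /tmp/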

Google's John Mueller said on Twitter that having a shared robots.txt across multiple domains is fine and should work for search.

Is it possible to exclude a domain name in robots.txt? I have a single Drupal site with multiple domains pointing at it. For example, our dev and staging servers are being crawled by Google when our .htaccess password protection is disabled. I don't intend to block any access to our development server, but I do want to tell search engine bots to stop crawling and indexing it.
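
Combined with the per-host rewrite shown earlier, one way to handle that would be to serve a blocking file only on the dev and staging hosts; the hostname below is a placeholder:

# robots/dev.example.com.txt -- keep all bots out of the development host
User-agent: *
Disallow: /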

OK, with you now. No, it's not an issue, as long as they are referenced correctly in your robots.txt files and in Search Console. Bear in mind that if you have multiple subdomains, BingBot must be able to fetch robots.txt at the root of each one of them, even if all these robots.txt files are the same. In particular, if a robots.txt file is missing from a subdomain, BingBot will not try to fall back to any other file on your domain, meaning it will consider itself allowed anywhere on that subdomain.

Comments
  • These are three different websites hosted together. Of course, the content is different, and each site also has its own sitemap.xml file.
  • This is the only answer of its kind and it has only one upvote. The rest of the internet seems to rewrite robots.txt to the domain-specific one using .htaccess, but this makes much more sense.
  • Isn't it required that the robots.txt is in the root of each domain?
  • @AlexioVay According to robotstxt.org/orig.html: "This file must be accessible via HTTP on the local URL '/robots.txt'." Using the method above, the file is accessible in the root, although it's not stored in the root.
  • This way, every request to /robots.txt gets answered with the contents of the file for that specific domain (if you've set up the directory and files correctly, of course).