robots.txt
A plain-text file at the root of a domain that tells crawlers which paths they're allowed (or disallowed) to fetch.
In long form.
robots.txt is a voluntary protocol: well-behaved bots respect it, and malicious scrapers ignore it. The file uses User-agent and Disallow directives to control crawler access, plus a Sitemap directive pointing to the site's XML sitemap. Disallowing a URL doesn't remove it from the index; it only prevents crawling, so a blocked URL can still be indexed if other pages link to it. To prevent indexing, use a noindex meta tag (or X-Robots-Tag header) on a page that crawlers can still reach, because a crawler that can't fetch the page never sees the directive.
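As a rough illustration of how those directives are read, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The file contents, the `example.com` URLs, the `ExampleBot` user agent, and the `is_allowed` helper are all made up for the example; real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; paths and sitemap URL are invented.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /cart/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
# parse() takes the file as a list of lines, so no network request is made.
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(path: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if the rules permit user_agent to fetch the given path."""
    return parser.can_fetch(user_agent, f"https://example.com{path}")


blog_allowed = is_allowed("/blog/post")     # no Disallow rule matches this path
admin_allowed = is_allowed("/admin/login")  # falls under Disallow: /admin/
sitemaps = parser.site_maps()               # Sitemap lines, or None if absent (Python 3.8+)
```

Note that this only answers "may a compliant bot fetch this URL?". It says nothing about whether the URL is indexed, which is the noindex side of the story.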
We've inherited sites where robots.txt was blocking the entire site (`Disallow: /`) — a relic from staging. The site had been invisible to Google for months. Always read robots.txt as part of an audit's first five minutes.
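A quick audit check for that failure mode might look like the sketch below. It assumes you have already fetched the robots.txt body as text; the `blocks_entire_site` helper, the probe paths, and the `example.com` host are hypothetical. Python's parser does simple prefix matching, so treat the result as a first-pass signal rather than a statement of how any particular search engine reads the file.

```python
from urllib.robotparser import RobotFileParser

# Arbitrary paths used to probe the rules; any ordinary public paths would do.
PROBE_PATHS = ("/", "/about", "/blog/some-post")


def blocks_entire_site(robots_txt: str, user_agent: str = "Googlebot") -> bool:
    """Return True if every probe path is disallowed for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not any(
        parser.can_fetch(user_agent, f"https://example.com{path}")
        for path in PROBE_PATHS
    )


# A leftover staging file versus a file that only blocks one directory.
staging_relic = "User-agent: *\nDisallow: /\n"
normal_file = "User-agent: *\nDisallow: /admin/\n"

relic_blocks = blocks_entire_site(staging_relic)
normal_blocks = blocks_entire_site(normal_file)
```

Running a check like this against the live file at the start of an audit catches the blanket `Disallow: /` case before any deeper analysis begins.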