What Is Robots.txt?

Robots.txt is a plain-text file at the root of a website that tells search engine crawlers which URLs they may or may not request. It is the first file most crawlers check when they visit a site.
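To make the idea concrete, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before requesting a URL. It uses Python's standard urllib.robotparser module. The rules, the "ExampleBot" user agent, and the example.com URLs are made up for illustration and do not come from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Load robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is an assumed crawler name; it falls under the "*" group.
print(parser.can_fetch("ExampleBot", "https://example.com/admin/login"))  # False
print(parser.can_fetch("ExampleBot", "https://example.com/blog/post"))    # True
```

Note that this only answers "may this URL be requested?". It says nothing about whether the URL ends up in a search index.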

Robots.txt does not control indexing

A crucial nuance: Disallow blocks crawling, not indexing. A blocked URL can still appear in Google (without a description) if other pages link to it. Google Search Console reports this as "Indexed, though blocked by robots.txt". To keep a page out of the index, allow crawling and use a noindex meta tag instead: a crawler has to fetch the page to see the noindex tag, so the tag has no effect on a URL that robots.txt blocks.
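The difference between crawling and indexing can be sketched in a few lines of Python. This is a simplified illustration, not how any search engine is actually implemented: the index_status function, its labels, and the "ExampleBot" user agent are hypothetical, and it only looks for a robots meta tag in the HTML.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Records whether the page contains <meta name="robots" content="...noindex...">."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        is_robots_meta = attr.get("name", "").lower() == "robots"
        if is_robots_meta and "noindex" in attr.get("content", "").lower():
            self.noindex = True


def index_status(robots_txt: str, url: str, html: str, user_agent: str = "ExampleBot") -> str:
    """Hypothetical helper: classify a URL as 'blocked', 'noindex', or 'indexable'."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(user_agent, url):
        # The page is never fetched, so a noindex tag on it is never seen.
        # The URL itself may still be indexed if other pages link to it.
        return "blocked"
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" if finder.noindex else "indexable"


robots = "User-agent: *\nDisallow: /private/\n"
page = '<html><head><meta name="robots" content="noindex"></head></html>'

print(index_status(robots, "https://example.com/private/page", page))  # blocked
print(index_status(robots, "https://example.com/public/page", page))   # noindex
```

The first call shows the trap: the page carries a noindex tag, but because robots.txt blocks the URL, the tag is never read.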

Pro tip
A blocked URL can still appear in Google without a description if other pages link to it. Use noindex - not robots.txt - to keep a page out of search results.

Key takeaways

Robots.txt tells crawlers which URLs they may request.
It lives at your domain root and is the first file most bots check.
Disallow blocks crawling, not indexing.
To keep a page out of the index, use a noindex meta tag instead.

Frequently asked questions

Where does robots.txt go?

At the root of your domain, for example example.com/robots.txt. Crawlers only look for the file at that location, so they will not find it in a subfolder.
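As a small illustration, the robots.txt location for any page can be derived by keeping the scheme and host and replacing everything else with /robots.txt. The robots_txt_url helper below is a hypothetical name used for this sketch, built on Python's standard urllib.parse module.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host; drop the path, query string, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://example.com/blog/post?id=1"))
# https://example.com/robots.txt
```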

Does robots.txt stop a page from being indexed?

No. It blocks crawling, not indexing. To prevent indexing, use a noindex meta tag on a page that crawlers are allowed to fetch.
