
Robots.txt

Robots.txt is a site file that gives crawler access rules for parts of a website.

Updated Jun 10, 2026 · Reviewed Jun 10, 2026

Robots.txt is a plain-text file, served at /robots.txt on the root of a host, that sets crawler access rules for parts of a website. It tells compliant crawlers which URL paths they may or may not request.

The file is a crawl control mechanism, not a content quality or indexing system. It can help keep crawlers away from irrelevant paths, but it can also accidentally block important pages if rules are too broad.

Why it matters

Robots rules sit before the page fetch. If an important guide, glossary term, or report is disallowed, a search crawler may not be able to request the page and inspect its content. That weakens SEO discovery and can also reduce the page’s ability to support AI search features that depend on accessible web sources.

Robots.txt is also part of publication hygiene. A static site should allow its public content routes and disallow internal build artifacts, admin paths, raw material, and accidental preview paths wherever those paths exist.

How it differs

Meta robots directives live on a specific page and are seen only after a crawler can fetch that page. Robots.txt works before fetching by controlling crawler access to URL patterns.

Noindex is an indexing directive. Robots.txt is a crawling rule. Blocking a page in robots.txt can prevent crawlers from seeing a page-level noindex directive, which is why robots.txt is not the right tool for normal page removal from search results.
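The interaction between the two mechanisms can be sketched in a few lines of Python. This is a minimal illustration, not a model of any particular search engine: the function name `noindex_outcome`, the agent name `ExampleBot`, and the example URL are hypothetical, and the standard library's `urllib.robotparser` stands in for a compliant crawler.

```python
from urllib.robotparser import RobotFileParser


def noindex_outcome(robots_txt: str, url: str, page_has_noindex: bool,
                    agent: str = "ExampleBot") -> str:
    """Describe whether a compliant crawler could see a page-level noindex."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    # The robots.txt check happens first: a disallowed page is never fetched,
    # so nothing inside its HTML (including meta robots) is read.
    if not parser.can_fetch(agent, url):
        return "blocked: page not fetched, noindex not seen"
    if page_has_noindex:
        return "fetched: noindex seen"
    return "fetched: no noindex present"


url = "https://www.example.com/old/page"
print(noindex_outcome("User-agent: *\nDisallow: /old/\n", url, True))
print(noindex_outcome("User-agent: *\nAllow: /\n", url, True))
```

The first call shows the trap described above: the page carries a noindex, but the disallow rule keeps the crawler from ever reading it.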

Example

User-agent: *
Disallow: /raw/
Disallow: /admin/
Allow: /

Sitemap: https://www.example.com/sitemap.xml

This example tells compliant crawlers not to request internal raw and admin paths, while keeping public routes available. The Sitemap line points crawlers toward the canonical sitemap, but it does not override disallow rules.
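One way to sanity-check rules like these before publishing is to run them through a parser. The sketch below uses Python's standard `urllib.robotparser`; the agent name `ExampleBot` and the test paths are made up for illustration. Note that this parser applies the first matching rule in file order, whereas RFC 9309 specifies the longest matching path, so results can differ for files with overlapping rules. For the example above, both approaches agree.

```python
from urllib.robotparser import RobotFileParser

# The example file from this page, verbatim.
ROBOTS_TXT = """\
User-agent: *
Disallow: /raw/
Disallow: /admin/
Allow: /

Sitemap: https://www.example.com/sitemap.xml
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical paths: one public page and two internal ones.
for path in ("/glossary/robots-txt", "/raw/notes.md", "/admin/login"):
    allowed = parser.can_fetch("ExampleBot", "https://www.example.com" + path)
    print(f"{path}: {'allowed' if allowed else 'disallowed'}")
```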

How teams use it

Teams review robots.txt when launching a site, moving content, opening a staging environment, changing CMS routes, or diagnosing crawl drops. The review should include the file’s HTTP status code, because crawlers must fetch the file successfully before they can apply its rules.
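To show why the status code matters, the sketch below maps a robots.txt response status to the handling that RFC 9309 describes. The function name and return strings are hypothetical, and individual crawlers differ in details such as redirect limits and caching, so treat this as a rough guide rather than a statement of how any specific crawler behaves.

```python
def robots_status_effect(status: int) -> str:
    """Rough mapping from robots.txt HTTP status to crawler handling (per RFC 9309)."""
    if 200 <= status < 300:
        # Successful fetch: the crawler parses the file and follows its rules.
        return "apply rules"
    if 300 <= status < 400:
        # Redirects are followed, up to a crawler-specific limit.
        return "follow redirect"
    if 400 <= status < 500:
        # "Unavailable": the crawler may act as if no rules exist.
        return "no rules apply"
    if 500 <= status < 600:
        # "Unreachable": the crawler assumes everything is disallowed.
        return "assume fully disallowed"
    return "unknown"


for code in (200, 301, 404, 503):
    print(code, robots_status_effect(code))
```

The 5xx case is the one that tends to surprise teams: a server error on robots.txt can pause crawling of the whole host, even though no rule in the file changed.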

A practical review checks:

Check	Question
Scope	Are rules targeted, or do they block broad public sections?
Sensitive paths	Are non-public paths excluded without relying on robots.txt for security?
Public content	Are published guides, glossary pages, and tools crawlable?
Sitemap	Does the sitemap URL point to the intended public sitemap?
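Some of these checks can be automated. The sketch below is one possible shape for such a review, assuming Python 3.8+ and the standard `urllib.robotparser`; the function `review_robots`, the agent name, the origin, and the sample paths are all hypothetical. It covers the public content, sensitive paths, and sitemap rows, but judging whether rules are too broad still needs a human reading the file.

```python
from urllib.robotparser import RobotFileParser


def review_robots(robots_txt, public_paths, private_paths, expected_sitemap,
                  origin="https://www.example.com", agent="ExampleBot"):
    """Return a list of problems found in robots.txt content (empty if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    problems = []

    # Public content: every published route should be crawlable.
    for path in public_paths:
        if not parser.can_fetch(agent, origin + path):
            problems.append(f"public path blocked: {path}")

    # Sensitive paths: these should be disallowed (this is not access control).
    for path in private_paths:
        if parser.can_fetch(agent, origin + path):
            problems.append(f"non-public path crawlable: {path}")

    # Sitemap: site_maps() returns None when no Sitemap line is present.
    if expected_sitemap not in (parser.site_maps() or []):
        problems.append(f"sitemap not listed: {expected_sitemap}")

    return problems


too_broad = "User-agent: *\nDisallow: /\n"
print(review_robots(
    too_broad,
    public_paths=["/glossary/robots-txt"],
    private_paths=["/admin/"],
    expected_sitemap="https://www.example.com/sitemap.xml",
))
```

Running the review against a file that disallows everything reports the blocked glossary page and the missing sitemap line.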

Common misunderstanding

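Robots.txt is not access control. It is a crawler instruction for compliant bots, not a security boundary. Disallowing a path does not make it private: the content stays publicly reachable, and the robots.txt file itself is readable by anyone. Private material needs real access controls, such as authentication.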
