Can robots.txt block a page from appearing in Google search results?

No, not reliably. Robots.txt prevents crawling, but if other pages link to a blocked URL, Google can still index it as a snippetless "URL-only" result. To remove a page from search results, use a noindex meta tag or an X-Robots-Tag HTTP header on a page that is not disallowed, so Google can crawl it and see the directive.
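
As a rough illustration of the two directives named above, the following Python sketch checks whether a response already carries `noindex`, either in an `X-Robots-Tag` header or in a `<meta name="robots">` tag. The function name `has_noindex` and its inputs (a plain dict of headers and an HTML string) are assumptions made for this example. It also ignores user-agent-scoped header values such as `googlebot: noindex`, treating them as a plain `noindex`.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collects the content of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            self.directives.append(attr_map.get("content", "").lower())


def has_noindex(headers, html):
    """Return True if a header or a robots meta tag contains noindex."""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            return True
    parser = _RobotsMetaParser()
    parser.feed(html)
    return any("noindex" in directive for directive in parser.directives)


print(has_noindex({"X-Robots-Tag": "noindex"}, ""))
print(has_noindex({}, '<meta name="robots" content="index, follow">'))
```

A check like this only helps if the crawler can fetch the page in the first place, which is why the URL must not be disallowed in robots.txt.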

How often does Google check robots.txt?

Google generally caches robots.txt for up to 24 hours, though it may hold the cached copy longer if your server returns errors. Every crawl decision uses the cached version until it refreshes. After an urgent change, open the robots.txt report in Google Search Console and request a recrawl rather than waiting for the cache to expire.

Should I block AI bots like GPTBot in robots.txt?

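It depends on your goals. Blocking crawlers such as GPTBot, CCBot, and Google-Extended keeps your content out of AI training datasets, but the effect on visibility differs by product. Google-Extended does not affect Google Search, so inclusion in AI Overviews follows your Googlebot rules rather than that token. Other answer engines may fetch pages with separate retrieval crawlers, and blocking those can remove you from their answers. If appearing in AI answers matters to you, keep the retrieval crawlers allowed and restrict only the training crawlers.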
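
If you do decide to block the training crawlers named above, a sketch like the following can generate the group and confirm its effect with Python's standard-library `urllib.robotparser`. The helper `build_robots_txt` and the `https://example.com/pricing` URL are hypothetical. The standard-library parser only does plain prefix matching, so this is a sanity check rather than a reproduction of how any particular crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens taken from the answer above.
AI_CRAWLERS = ["GPTBot", "CCBot", "Google-Extended"]


def build_robots_txt(blocked_agents):
    """Build a file that blocks the given agents and allows everyone else."""
    lines = []
    if blocked_agents:
        # Consecutive User-agent lines share the rules that follow them.
        lines += [f"User-agent: {agent}" for agent in blocked_agents]
        lines += ["Disallow: /", ""]
    lines += ["User-agent: *", "Disallow:"]
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(AI_CRAWLERS)
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

for agent in AI_CRAWLERS + ["Googlebot"]:
    print(agent, parser.can_fetch(agent, "https://example.com/pricing"))
```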

What happens if I have no robots.txt file at all?

If no robots.txt exists, crawlers assume they may request everything on the site. Your content still gets found, but bots waste budget on parameter URLs, internal search pages, and other low-value paths, and admin directories aren't kept out of the crawl. A minimal robots.txt that allows everything and declares your sitemap is almost always better than no file.
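
A minimal file of the kind described above might look like the string in this sketch, which also parses it with Python's `urllib.robotparser` to confirm that nothing is blocked and that the sitemap is picked up. The sitemap URL is a placeholder, and `site_maps()` assumes Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Allow everything and declare one sitemap (placeholder URL).
MINIMAL_ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
"""

minimal_parser = RobotFileParser()
minimal_parser.parse(MINIMAL_ROBOTS_TXT.splitlines())

print(minimal_parser.can_fetch("Googlebot", "https://example.com/any/page"))
print(minimal_parser.site_maps())
```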

Where exactly should the robots.txt file be located?

It must sit at the root of each host and be served over the matching protocol — https://example.com/robots.txt controls only https://example.com/. Subdomains like blog.example.com and separate protocols need their own files. A robots.txt placed in a subdirectory (for example /blog/robots.txt) is ignored by crawlers entirely.
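
To make the host-and-protocol rule concrete, this small sketch derives the robots.txt URL that governs a given page. The function name `robots_url_for` is an assumption for the example; it uses only `urllib.parse` from the standard library.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url):
    """Return the robots.txt URL for the scheme and host of page_url."""
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {page_url!r}")
    # Path is always /robots.txt at the host root; query and fragment are dropped.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url_for("https://example.com/blog/post?id=1"))
print(robots_url_for("https://blog.example.com/blog/robots.txt"))
print(robots_url_for("http://example.com/"))
```

Each distinct result is a separate file that has to exist and be maintained on its own.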

Technical SEO · 10 min read

Robots.txt Guide: Create, Test, and Optimize for SEO

A complete robots.txt guide: create, test, and optimize the file that controls crawlers — plus the disallow vs noindex trap that deindexes sites.

SlapMyWeb Team · April 20, 2026

Robots.txt is a plain-text file at the root of your domain (example.com/robots.txt) that tells search engine crawlers which URLs they may and may not request. Googlebot consults it, usually from a cached copy, before crawling any other URL on the host. It governs crawling and crawl budget rather than indexing, and one mistaken line, Disallow: /, can quietly wall your entire site off from search. This guide covers how to create a robots.txt file correctly, test it before it goes live, and tune it so crawlers spend their budget on pages that actually matter.

The single most important thing to know up front: robots.txt controls crawling, not indexing. A blocked page can still appear in Google as a bare URL if other sites link to it. If your goal is to remove a page from results, you need noindex, not Disallow — more on that distinction below.
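
The stray `Disallow: /` mentioned in the introduction is easy to catch before deployment. The sketch below is one possible pre-deploy check, assuming a hypothetical helper `homepage_is_blocked`. It only tests the root URL, using the standard-library parser, so a file that blocks `/` but carves out `Allow` exceptions would also be flagged.

```python
from urllib.robotparser import RobotFileParser


def homepage_is_blocked(robots_txt, user_agent="Googlebot"):
    """Return True if the given robots.txt text disallows the root URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "https://example.com/")


print(homepage_is_blocked("User-agent: *\nDisallow: /\n"))
print(homepage_is_blocked("User-agent: *\nDisallow: /admin/\n"))
```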

What Robots.txt Is and Why It Matters

Robots.txt is part of the Robots Exclusion Protocol, a standard every major search engine respects and which Google helped formalize as RFC 9309. The file lives at exactly one location — the root of each host — and applies only to that host and protocol. So https://example.com/robots.txt governs https://example.com/, while https://blog.example.com/ and http://example.com/ each need their own file.

Directive	Effect
`User-agent: *`	Rules apply to all crawlers without a more specific block
`User-agent: Googlebot`	Rules apply only to Google's main crawler
`Disallow:` (empty)	Allow everything — the explicit "open door"
`Disallow: /`	Block the entire site
`Disallow: /admin/`	Block the `/admin/` directory and everything under it
`Allow: /admin/public/`	Carve an exception out of a blocked parent
`Disallow: /*.pdf$`	Block all URLs ending in `.pdf`
`Sitemap: https://example.com/sitemap.xml`	Declare a sitemap (use the full absolute URL)
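
To show how the `*` and `$` patterns and the `Allow` exception in the table interact, here is a simplified matcher. It is a sketch, not a reference implementation: it assumes the longest matching pattern wins and that `Allow` wins a tie, and it skips percent-encoding normalization and user-agent grouping. The names `pattern_to_regex`, `is_allowed`, and `RULES` are invented for the example.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # '*' matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(path, rules):
    """Apply (directive, pattern) rules; the longest matching pattern wins."""
    best_length = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty Disallow blocks nothing
        if pattern_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            longer = len(pattern) > best_length
            tie_for_allow = len(pattern) == best_length and is_allow
            if longer or tie_for_allow:
                best_length = len(pattern)
                allowed = is_allow
    return allowed


# Rules taken from the directive table above.
RULES = [
    ("Disallow", "/admin/"),
    ("Allow", "/admin/public/"),
    ("Disallow", "/*.pdf$"),
]

for path in ["/admin/settings", "/admin/public/help",
             "/files/report.pdf", "/files/report.pdf?v=2", "/blog/"]:
    print(path, is_allowed(path, RULES))
```

Note that `/files/report.pdf?v=2` is not blocked in this sketch, because the `$` anchor requires the URL to end in `.pdf`.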

Goal	Right tool
Save crawl budget on large/duplicate sections	`Disallow` in robots.txt
Remove a page from search results	`noindex` (crawlable, not disallowed)
Keep a private file unreachable	Authentication / server rules — not robots.txt
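
The first two rows of this table can be combined into a quick diagnostic. The sketch below, built around a hypothetical `pick_outcome` helper, reports what is likely to happen to a URL given the robots.txt text and whether the page carries `noindex`. The outcome strings are illustrative wording, and the standard-library parser used here handles only plain prefix rules.

```python
from urllib.robotparser import RobotFileParser


def pick_outcome(robots_txt, url, page_has_noindex, user_agent="Googlebot"):
    """Describe the likely result of combining Disallow and noindex."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    crawlable = parser.can_fetch(user_agent, url)
    if crawlable and page_has_noindex:
        return "noindex will be seen: page can drop out of results"
    if crawlable:
        return "crawlable and indexable"
    if page_has_noindex:
        return "conflict: Disallow hides the noindex from the crawler"
    return "crawl blocked: URL may still be indexed without a snippet"


EXAMPLE_ROBOTS_TXT = "User-agent: *\nDisallow: /private/\n"

print(pick_outcome(EXAMPLE_ROBOTS_TXT, "https://example.com/private/report", True))
print(pick_outcome(EXAMPLE_ROBOTS_TXT, "https://example.com/old-page", True))
```

The "conflict" case is the trap: the page is both disallowed and marked `noindex`, so the crawler never fetches it and never sees the directive.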
