Robots.txt: What to Allow, Block, and Why

October 7, 2024 · Nexrena

Robots.txt tells crawlers what they can and can’t access. Get it wrong and you block important pages — or waste crawl budget on pages that don’t matter. Here’s how to configure it correctly.

What Robots.txt Does

  • Directs crawlers — Tells them which paths to skip (Disallow) or allow (Allow).
  • Does not block — It’s a request, not enforcement. Malicious bots may ignore it.
  • Affects crawl budget — Blocking low-value URLs frees Google to crawl important pages.

Basic Structure

User-agent: *
Disallow: /admin/
Disallow: /search/
Allow: /

Sitemap: https://yoursite.com/sitemap-index.xml

  • User-agent: * — Applies to all crawlers (Googlebot, Bingbot, etc.).
  • Disallow — Paths to skip.
  • Allow — Explicit permission. Usually unnecessary (allow is the default); mainly useful to carve exceptions out of a broader Disallow.
  • Sitemap — Points to your sitemap. One line, at the end.

What to Block

Admin and Login

  • /wp-admin/ — WordPress admin.
  • /admin/ — Custom admin panels.
  • /login/ — Login pages.

No SEO value. Security risk if indexed. Block them.
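Put together, the rules might look like this (the Allow line for admin-ajax.php is a common WordPress exception, since front-end features often call it):

```
User-agent: *
Disallow: /wp-admin/
Disallow: /admin/
Disallow: /login/
Allow: /wp-admin/admin-ajax.php
```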

Search Results

  • /?s= — WordPress search.
  • /search/ — Site search.
  • /*?q= — Query parameters (the leading wildcard matters: rules match from the start of the path).

Duplicate content. Infinite URLs. Wastes crawl budget.
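The corresponding rules might look like this (the * wildcard, supported by Google and Bing, matches any sequence of characters):

```
User-agent: *
Disallow: /search/
Disallow: /*?s=
Disallow: /*?q=
```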

Thank-You Pages

  • /thank-you/
  • /confirmation/

No value for search. Block.

Staging or Dev (If Exposed)

If staging is accidentally live, block it. Then fix the exposure. Blocking is a temporary measure.
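As a stopgap, the staging host's robots.txt can block everything (the durable fix is HTTP auth or IP allow-listing, plus noindex if pages were already indexed, since robots.txt alone doesn't remove them):

```
User-agent: *
Disallow: /
```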

Parameter-Heavy URLs

  • ?sort=
  • ?filter=
  • Session IDs

If these create infinite low-value URLs, consider blocking. Test first — some sites need them for discovery.
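If you do block them, wildcard patterns keep the rules compact. A sketch; the parameter names here are placeholders, so adjust them to what your platform actually emits:

```
User-agent: *
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /*sessionid=
```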

What Never to Block

Important Pages

  • /services/
  • /products/
  • /blog/
  • /resources/

Blocking these kills SEO. Double-check your Disallow rules.

CSS and JavaScript

Google needs CSS and JS to render pages. Don’t block:

  • *.css
  • *.js

Blocking them prevents Google from rendering pages properly, which can hurt indexing.
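If a broad Disallow accidentally catches assets, a more specific Allow can carve them back out, since Google resolves conflicts by longest matching rule. The /assets/ path here is a hypothetical example; the $ anchors the match to the end of the URL:

```
User-agent: *
Disallow: /assets/
Allow: /assets/*.css$
Allow: /assets/*.js$
```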

Sitemap

Don’t block your sitemap. Crawlers need to find it.

Common Mistakes

Blocking Everything

Disallow: /

Only use for sites you truly don’t want indexed (e.g., staging). Never on production.

Blocking Important Paths

A typo or overly broad rule can block key pages. Test with the robots.txt report in Search Console.

No Sitemap Reference

Add Sitemap: https://yoursite.com/sitemap-index.xml at the end. Helps discovery.

Conflicting Rules

Disallow: /blog/
Allow: /blog/

Allow and Disallow cover the same path. Google applies the most specific (longest) matching rule and, on a tie, prefers Allow. Avoid the ambiguity; keep rules simple.
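The useful version of this pattern is a broad Disallow with a narrower Allow. Because the Allow path is longer (more specific), it wins for the URLs it matches. The paths here are hypothetical:

```
User-agent: *
Disallow: /internal/
Allow: /internal/public-roadmap/
```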

B2B Considerations

  • Multi-location — Don’t block location pages. They can rank for local intent.
  • Case studies — Don’t block. They’re valuable for SEO and links.
  • Resources — Blog, guides, docs. Never block.

Testing

  1. Search Console — Check the robots.txt report, then use URL Inspection to confirm important URLs aren't blocked.
  2. Manual check — Visit https://yoursite.com/robots.txt. Verify rules.
  3. Crawl — Use Screaming Frog or Sitebulb. Ensure important pages are crawlable.
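You can also sanity-check simple rules locally with Python's built-in parser. A sketch with hypothetical paths; note that urllib.robotparser follows the original standard and does not implement Google's wildcard or longest-match extensions, so test wildcard rules with Google's tools instead:

```python
from urllib.robotparser import RobotFileParser

# Rules mirroring the basic example above
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /search/
Allow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Important pages stay crawlable; blocked paths are refused
print(rp.can_fetch("*", "https://yoursite.com/services/"))   # True
print(rp.can_fetch("*", "https://yoursite.com/admin/users"))  # False
```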

We configure robots.txt for every project. Start a project and we’ll audit your crawl configuration.
