ShipCheckr
Crawl setup

Sitemap and Robots.txt Checker

Sitemap and robots.txt files are small, public files with a big job: help crawlers find the pages that matter and tell them which paths they should not crawl. This checklist helps you review both files before you submit a new site or announce a production URL.

Who this is for

Built for practical launch reviews

  • Builders who have just deployed a new site and want a quick crawlability check.
  • Next.js, Vercel, and static-site owners using generated sitemap and robots routes.
  • Anyone who wants to catch accidental noindex tags, blocked routes, or staging URLs before launch.

What to check first

Start with the checks most likely to block launch

Open the files on production

Visit /sitemap.xml and /robots.txt on the final domain. Do not rely only on local output or a preview URL.

Scan the URLs

Look for localhost, preview domains, duplicate paths, old slugs, private paths, and pages that should not be public.

Read robots rules carefully

A broad Disallow rule can affect more routes than expected. Check that important public pages remain crawlable.

Practical checklist

Work through these checks before launch

Check the sitemap

The sitemap should describe the public pages that are worth crawling.

Use absolute production URLs

Each sitemap entry should point to the live production domain, not localhost, a preview deployment, or a staging hostname.
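As a rough illustration of this check, the sketch below parses a sitemap and lists every entry that is not an absolute HTTPS URL on the expected host. The function name, the sample XML, and the requirement that entries use https are assumptions for the example, not part of any ShipCheckr tool.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by standard sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def find_off_domain_urls(sitemap_xml: str, production_host: str) -> list[str]:
    """Return <loc> values that are not https URLs on production_host.

    Hypothetical helper: requiring https is an assumption of this sketch.
    """
    root = ET.fromstring(sitemap_xml)
    problems = []
    for loc in root.findall(".//sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        parsed = urlparse(url)
        if parsed.scheme != "https" or parsed.hostname != production_host:
            problems.append(url)
    return problems


# Illustrative sitemap with one good entry and one leftover localhost entry.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/open-graph-preview-checker</loc></url>
  <url><loc>http://localhost:3000/open-graph-preview-checker</loc></url>
</urlset>"""

if __name__ == "__main__":
    for bad in find_off_domain_urls(SAMPLE_SITEMAP, "example.com"):
        print("Not a production URL:", bad)
```

The same function would also flag preview or staging hostnames, since anything other than the exact production host is reported.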

List indexable pages only

Remove test pages, private dashboards, checkout states, duplicate URLs, and pages that intentionally carry noindex.

Keep generated pages in sync

If pages are generated from data, make sure the same source of truth is used for navigation and sitemap entries where possible.
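One way to picture a shared source of truth is a single page list that feeds both the navigation and the sitemap. The sketch below is a minimal Python version under that assumption; the PAGES structure, its field names, and BASE_URL are hypothetical, and a real Next.js or static-site project would express the same idea in its own generator.

```python
from xml.sax.saxutils import escape

BASE_URL = "https://example.com"  # assumed production origin

# Hypothetical single source of truth for public and private pages.
PAGES = [
    {"path": "/", "title": "Home", "indexable": True},
    {"path": "/open-graph-preview-checker", "title": "Open Graph Preview Checker", "indexable": True},
    {"path": "/dashboard", "title": "Dashboard", "indexable": False},
]


def build_nav(pages):
    """Return (title, path) pairs for the public navigation."""
    return [(p["title"], p["path"]) for p in pages if p["indexable"]]


def build_sitemap(pages, base_url=BASE_URL):
    """Return sitemap XML containing only the indexable pages."""
    entries = "\n".join(
        f"  <url><loc>{escape(base_url + p['path'])}</loc></url>"
        for p in pages
        if p["indexable"]
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>\n"
    )


if __name__ == "__main__":
    print(build_nav(PAGES))
    print(build_sitemap(PAGES))
```

Because both outputs filter the same list, a page cannot appear in the navigation while silently missing from the sitemap.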

Check robots.txt

Robots rules are simple, but a single broad rule can block more than intended.

Read the rules literally

Look for broad Disallow values such as /, and for patterns that also match important public routes.
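A literal reading can be approximated with Python's standard urllib.robotparser module. The sketch below reports which of your important paths a given robots.txt would block; the helper name and the sample paths are assumptions, and the standard-library parser may not interpret wildcard patterns the way major crawlers do, so treat it as a first pass only.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt: str, paths: list[str], user_agent: str = "*") -> list[str]:
    """Return the paths that robots_txt disallows for user_agent.

    Hypothetical helper built on the standard-library parser.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


# Illustrative list of routes that should stay crawlable.
IMPORTANT_PATHS = ["/", "/pricing", "/admin/users"]

if __name__ == "__main__":
    too_broad = "User-agent: *\nDisallow: /\n"
    scoped = "User-agent: *\nDisallow: /admin\n"
    print("Disallow: /      blocks", blocked_paths(too_broad, IMPORTANT_PATHS))
    print("Disallow: /admin blocks", blocked_paths(scoped, IMPORTANT_PATHS))
```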

Reference the sitemap

Add a Sitemap line with the absolute sitemap URL if your framework or hosting setup supports it, so crawlers can discover the file easily.
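To confirm the reference is actually present, the standard-library parser can list the Sitemap lines it finds (its site_maps() method is available from Python 3.8). This is a small sketch; the helper name and the sample robots.txt content are assumptions.

```python
from urllib.robotparser import RobotFileParser


def sitemap_references(robots_txt: str) -> list[str]:
    """Return the sitemap URLs declared in robots_txt, or an empty list."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


# Illustrative robots.txt content.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /admin
Sitemap: https://example.com/sitemap.xml
"""

if __name__ == "__main__":
    refs = sitemap_references(SAMPLE_ROBOTS)
    print(refs if refs else "No Sitemap line found")
```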

Avoid using robots.txt for secrets

Robots.txt is public. Do not rely on it to hide private data, admin URLs, tokens, or unpublished content.

Run a practical launch pass

The goal is to find obvious crawl mistakes before they become confusing search issues.

Compare navigation with sitemap entries

Important linked pages should usually appear in the sitemap unless you have a clear reason to exclude them.
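The comparison can be sketched as a set difference between navigation paths and sitemap paths. The function name and the sample data are hypothetical, and treating /blog and /blog/ as the same page is an assumption you may want to drop if your site distinguishes them.

```python
from urllib.parse import urlparse


def normalize(path: str) -> str:
    """Strip a trailing slash so /blog/ and /blog compare equal (an assumption)."""
    return path.rstrip("/") or "/"


def missing_from_sitemap(nav_paths: list[str], sitemap_urls: list[str]) -> list[str]:
    """Return navigation paths that have no matching sitemap entry."""
    sitemap_paths = {normalize(urlparse(u).path) for u in sitemap_urls}
    nav = {normalize(p) for p in nav_paths}
    return sorted(nav - sitemap_paths)


if __name__ == "__main__":
    nav = ["/", "/pricing", "/blog/"]
    sitemap = ["https://example.com/", "https://example.com/blog"]
    print("Linked but not in sitemap:", missing_from_sitemap(nav, sitemap))
```

Anything the function reports is either a gap in the sitemap or a page you excluded on purpose, which is worth confirming either way.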

Check noindex and canonical tags

A page can be allowed by robots.txt but still be excluded by a noindex tag or canonicalized to another URL.
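For a quick look at those page-level signals, the sketch below reads a page's HTML and reports whether it carries a robots noindex meta tag and which canonical URL it declares. The class and function names are hypothetical, and the sketch only inspects HTML; a noindex sent through the X-Robots-Tag response header would not be detected.

```python
from html.parser import HTMLParser


class IndexSignals(HTMLParser):
    """Collect the robots meta tag and canonical link from a page."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            if "noindex" in (attr.get("content") or "").lower():
                self.noindex = True
        elif tag == "link" and "canonical" in (attr.get("rel") or "").lower().split():
            self.canonical = attr.get("href")


def read_index_signals(html: str):
    """Return (noindex, canonical_url) for the given HTML string."""
    parser = IndexSignals()
    parser.feed(html)
    parser.close()
    return parser.noindex, parser.canonical


# Illustrative page head with both signals present.
SAMPLE_HTML = (
    '<html><head><meta name="robots" content="noindex, follow">'
    '<link rel="canonical" href="https://example.com/a"></head></html>'
)

if __name__ == "__main__":
    print(read_index_signals(SAMPLE_HTML))
```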

Retest after deployment changes

Routing, redirects, and generated metadata can change during launch work, so repeat the check after the final production build.

Practical examples

What good launch checks look like

Sitemap URL

A public page should appear as https://example.com/open-graph-preview-checker, not http://localhost:3000/open-graph-preview-checker.

Robots rule

Disallow: /admin is usually safer than Disallow: / if the public site should be crawlable. Rules match by path prefix, so Disallow: /admin also covers paths such as /admin/users.

Sitemap reference

A robots.txt file can include Sitemap: https://example.com/sitemap.xml so crawlers can discover the sitemap directly.

FAQ

Short answers before launch

Do I need both a sitemap and robots.txt?

Usually, yes. The sitemap lists important public URLs. Robots.txt tells crawlers which paths they may fetch and can point to the sitemap.

Should every page be in the sitemap?

No. Include public canonical pages you want crawled. Leave out private pages, duplicates, thin test pages, and pages intentionally marked noindex.

Can robots.txt hide private pages?

No. Robots.txt is public and only gives crawler instructions. Private content needs proper access control.

When should I recheck these files?

Recheck after domain changes, routing changes, new landing pages, framework upgrades, or any launch where generated metadata changes.
