Crawl setup

Sitemap and Robots.txt Checker

Sitemap and robots.txt files are small, public files with a big job: the sitemap helps crawlers find the pages that matter, and robots.txt tells them which paths they should not crawl. This checklist helps you review both files before you submit a new site or announce a production URL.

Last updated: June 2026

Who this is for

Built for practical launch reviews

Builders who have just deployed a new site and want a quick crawlability check.
Next.js, Vercel, and static-site owners using generated sitemap and robots routes.
Anyone who wants to catch an accidental noindex tag, blocked routes, or staging URLs before launch.

What to check first

Start with the checks most likely to block launch

Open the files on production

Visit /sitemap.xml and /robots.txt on the final domain. Do not rely only on local output or a preview URL.
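
One way to make this check repeatable is a small script. The sketch below uses only the Python standard library. The function names, the launch-check user agent string, and the example.com origin are illustrative assumptions, not part of any particular tool.

```python
from urllib.error import HTTPError
from urllib.parse import urljoin
from urllib.request import Request, urlopen


def launch_file_urls(production_origin):
    """Build the robots.txt and sitemap.xml URLs for a production origin."""
    return [
        urljoin(production_origin, "/robots.txt"),
        urljoin(production_origin, "/sitemap.xml"),
    ]


def fetch_status(url, timeout=10):
    """Return (HTTP status, final URL after redirects) for one file."""
    # The user agent string is a made-up label for this sketch.
    request = Request(url, headers={"User-Agent": "launch-check/0.1"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status, response.geturl()
    except HTTPError as error:
        # 4xx and 5xx responses raise HTTPError; report the code instead.
        return error.code, url
```

Calling fetch_status on each URL from launch_file_urls("https://example.com") should normally give a 200 status and a final URL that is still on the production domain. A 404 or a redirect to a preview host is worth investigating.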

Scan the URLs

Look for localhost, preview domains, duplicate paths, old slugs, private paths, and pages that should not be public.
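
Part of this scan can be automated. The following is a minimal sketch, assuming a standard urlset sitemap and a single production hostname. It flags entries on any other host (localhost, preview domains) and entries listed more than once. Old slugs and private paths still need a human read. The scan_sitemap name and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlparse

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def scan_sitemap(xml_text, production_host):
    """Return (url, problem) pairs for entries that look wrong."""
    root = ET.fromstring(xml_text)
    urls = [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]
    problems = []
    # Flag any entry whose hostname is not the production host.
    for url in urls:
        host = urlparse(url).hostname or ""
        if host != production_host:
            problems.append((url, f"unexpected host: {host}"))
    # Flag URLs that appear more than once.
    for url, count in Counter(urls).items():
        if count > 1:
            problems.append((url, f"listed {count} times"))
    return problems


# Made-up sample with one localhost entry and one duplicate.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>http://localhost:3000/pricing</loc></url>
  <url><loc>https://example.com/</loc></url>
</urlset>"""

for url, problem in scan_sitemap(SAMPLE_SITEMAP, "example.com"):
    print(f"{url}: {problem}")
```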

Read robots rules carefully

A broad Disallow rule can affect more routes than expected. Check that important public pages remain crawlable.

Practical checklist

Work through these checks before launch

Check the sitemap

The sitemap should describe the public pages that are worth crawling.

Use absolute production URLs

Each sitemap entry should point to the live production domain, not localhost, a preview deployment, or a staging hostname.

List indexable pages only

Remove test pages, private dashboards, checkout states, duplicate URLs, and pages that intentionally carry noindex.

Keep generated pages in sync

If pages are generated from data, make sure the same source of truth is used for navigation and sitemap entries where possible.
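
As an illustration of the single source of truth idea, the sketch below derives both the navigation links and the sitemap from one list of page records. The PAGES structure and its indexable and in_nav fields are assumptions made for this example. A real project might read the same data from a CMS or a routes file.

```python
from xml.sax.saxutils import escape

# Hypothetical page records shared by navigation and sitemap.
PAGES = [
    {"path": "/", "title": "Home", "indexable": True, "in_nav": True},
    {"path": "/pricing", "title": "Pricing", "indexable": True, "in_nav": True},
    {"path": "/dashboard", "title": "Dashboard", "indexable": False, "in_nav": False},
]


def nav_links(pages):
    """Return (title, path) pairs for pages shown in navigation."""
    return [(page["title"], page["path"]) for page in pages if page["in_nav"]]


def sitemap_xml(pages, base_url):
    """Render a sitemap containing only the indexable pages."""
    base = base_url.rstrip("/")
    entries = "\n".join(
        f"  <url><loc>{escape(base + page['path'])}</loc></url>"
        for page in pages
        if page["indexable"]
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>\n"
    )


print(sitemap_xml(PAGES, "https://example.com"))
```

Because both functions read the same list, adding or removing a page in one place updates the navigation and the sitemap together.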

Check robots.txt

Robots rules are simple, but a single broad rule can block more than intended.

Read the rules literally

Look for broad Disallow values such as / or patterns that cover important public routes.
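
A quick way to read rules literally is to ask a parser which of your important paths are blocked. This sketch uses Python's standard urllib.robotparser module. The blocked_paths helper and the sample paths are assumptions for illustration. The standard-library parser does simple prefix matching, so its answers may differ from a search engine's own parser for rules that rely on wildcards.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_text, paths, user_agent="*"):
    """Return the paths that the given robots.txt text disallows."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return [path for path in paths if not parser.can_fetch(user_agent, path)]


# Made-up list of public routes that should stay crawlable.
IMPORTANT_PATHS = ["/", "/pricing", "/blog/launch-notes"]

NARROW_RULES = """User-agent: *
Disallow: /admin
"""

BROAD_RULES = """User-agent: *
Disallow: /
"""

print(blocked_paths(NARROW_RULES, IMPORTANT_PATHS))
print(blocked_paths(BROAD_RULES, IMPORTANT_PATHS))
```

With the narrow rule the list of blocked important paths is empty. With Disallow: / every path in the list is reported as blocked.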

Good public-site pattern

Allow the public site to be crawled, block only routes that should not be crawled, and reference the sitemap on the same production domain.

Bad public-site pattern

A leftover pre-launch rule such as Disallow: / under User-agent: * tells compliant crawlers not to fetch any page on the site.

Reference the sitemap

Include the sitemap location if your framework or hosting setup supports it, so crawlers can discover the file easily.
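
To confirm the reference is present and points at the right domain, a check along these lines may help. It is a sketch that assumes Python 3.8 or later, where RobotFileParser.site_maps() is available. The sitemap_reference_issues name and the sample hosts are hypothetical.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def sitemap_reference_issues(robots_text, production_host):
    """Return a list of problems with the Sitemap lines in robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    sitemaps = parser.site_maps() or []
    if not sitemaps:
        return ["no Sitemap line found"]
    return [
        f"sitemap on unexpected host: {url}"
        for url in sitemaps
        if urlparse(url).hostname != production_host
    ]


# Made-up robots.txt that still references a preview deployment.
STALE_ROBOTS = """User-agent: *
Disallow: /admin
Sitemap: https://preview.example.dev/sitemap.xml
"""

print(sitemap_reference_issues(STALE_ROBOTS, "example.com"))
```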

Avoid using robots.txt for secrets

Robots.txt is public. Do not rely on it to hide private data, admin URLs, tokens, or unpublished content.

Run a practical launch pass

The goal is to find obvious crawl mistakes before they become confusing search issues.

Compare navigation with sitemap entries

Important linked pages should usually appear in the sitemap unless you have a clear reason to exclude them.
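
The comparison itself is a set difference. The sketch below assumes you already have two lists of absolute URLs, one collected from navigation and one from the sitemap. Treating URLs with and without a trailing slash as the same page is a simplifying assumption that may not match every site's routing.

```python
def compare_nav_to_sitemap(nav_urls, sitemap_urls):
    """Report URLs that appear in only one of the two lists."""
    # Ignore trailing slashes so /pricing and /pricing/ compare equal.
    nav = {url.rstrip("/") for url in nav_urls}
    sitemap = {url.rstrip("/") for url in sitemap_urls}
    return {
        "in_nav_not_sitemap": sorted(nav - sitemap),
        "in_sitemap_not_nav": sorted(sitemap - nav),
    }


# Made-up example data.
NAV_URLS = ["https://example.com/", "https://example.com/pricing/"]
SITEMAP_URLS = ["https://example.com/", "https://example.com/changelog"]

print(compare_nav_to_sitemap(NAV_URLS, SITEMAP_URLS))
```

Entries under in_nav_not_sitemap deserve a closer look first. Entries under in_sitemap_not_nav are often fine, since not every public page needs a navigation link.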

Check noindex and canonical tags

A page can be allowed by robots.txt but still excluded by metadata or canonicalized to another URL.
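
Both signals can be read from the page HTML. The following sketch uses Python's standard html.parser to pull out the robots meta tag and the canonical link. The class and function names are hypothetical, and the sketch does not cover a noindex sent through the X-Robots-Tag response header, which would need a separate check on the HTTP response.

```python
from html.parser import HTMLParser


class IndexSignals(HTMLParser):
    """Collect the robots meta content and canonical href from a page."""

    def __init__(self):
        super().__init__()
        self.robots = None
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            self.robots = attributes.get("content") or ""
        elif tag == "link":
            rel_values = (attributes.get("rel") or "").lower().split()
            if "canonical" in rel_values:
                self.canonical = attributes.get("href")


def index_signals(html, page_url):
    """Summarize noindex and canonical signals for one page."""
    parser = IndexSignals()
    parser.feed(html)
    canonical_elsewhere = (
        parser.canonical is not None
        and parser.canonical.rstrip("/") != page_url.rstrip("/")
    )
    return {
        "noindex": "noindex" in (parser.robots or "").lower(),
        "canonical": parser.canonical,
        "canonical_elsewhere": canonical_elsewhere,
    }


# Made-up page head with both signals set.
SAMPLE_HTML = (
    '<html><head><meta name="robots" content="noindex, follow">'
    '<link rel="canonical" href="https://example.com/pricing"></head></html>'
)

print(index_signals(SAMPLE_HTML, "https://example.com/pricing-old"))
```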

Retest after deployment changes

Routing, redirects, and generated metadata can change during launch work, so repeat the check after the final production build.

Practical examples

What good launch checks look like

Sitemap URL

A public page should appear as https://example.com/open-graph-preview-checker, not http://localhost:3000/open-graph-preview-checker.

Robots rule

Disallow: /admin is usually safer than Disallow: / if the public site should be crawlable.

Good robots.txt example

User-agent: * with Disallow: /admin and Sitemap: https://example.com/sitemap.xml keeps admin paths out of the crawl while still pointing crawlers at public pages.

Bad robots.txt example

User-agent: * with Disallow: / blocks the whole site from crawling. That is useful for some staging sites, but wrong for a public launch.

Sitemap reference

A robots.txt file can include Sitemap: https://example.com/sitemap.xml so crawlers can discover the sitemap directly.

Keep checking the same launch path

SEO setup

Google Search Console Setup Checklist for New Websites

Set up Google Search Console for a new website with clear checks for verification, sitemap submission, URL inspection, indexing basics, and launch follow-up.

Launch checks

Vercel Launch Checklist for New Websites

Launch a new Vercel website with practical checks for domains, redirects, production builds, environment values, SEO files, metadata, and final smoke tests.

Metadata

Open Graph Preview Checker

Check Open Graph preview basics before sharing a page, including title, description, image, URL, canonical metadata, and search snippet fallbacks.

FAQ

Short answers before launch

Do I need both a sitemap and robots.txt?

Usually, yes. The sitemap lists important public URLs. Robots.txt gives crawler instructions and can point to the sitemap.

Should every page be in the sitemap?

No. Include public canonical pages you want crawled. Leave out private pages, duplicates, thin test pages, and pages intentionally marked noindex.

Can robots.txt hide private pages?

No. Robots.txt is public, and it only gives instructions that well-behaved crawlers choose to follow. Private content needs proper access control.

When should I recheck these files?

Recheck after domain changes, routing changes, new landing pages, framework upgrades, or any launch where generated metadata changes.

Related checks

Related ShipCheckr checks

Follow these links to keep the launch review moving across search, metadata, deployment, and AI-specific checks.

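Guide

Start Here
New Website Indexing Troubleshooter
Google Search Console Setup Checklist
Website Launch Glossary
Launch Readiness Scorecard
Vercel Launch Checklist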

ShipCheckr tools

Useful tools for this review

Metadata (Live)

Meta Title and Description Preview

Check how your page title and description read in search results.

Launch checks (Live)

Launch Readiness Scorecard

Score the basics that make a small app feel ready to use.

Launch checks (Live)

AI App Launch Checklist

Run through the core checks before you publish an AI-built app.
