Sitemap and Robots.txt Checker
Sitemap and robots.txt files are small, public files with a big job: they help crawlers find the pages that matter and tell them which paths not to crawl. This checklist helps you review those files before you submit a new site or announce a production URL.
Who this is for
Built for practical launch reviews
- Builders who have just deployed a new site and want a quick crawlability check.
- Next.js, Vercel, and static-site owners using generated sitemap and robots routes.
- Anyone who wants to catch accidental noindex, blocked routes, or staging URLs before launch.
What to check first
Start with the checks most likely to block launch
Open the files on production
Visit /sitemap.xml and /robots.txt on the final domain. Do not rely only on local output or a preview URL.
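As a rough illustration, the manual visit can also be scripted. This is a minimal sketch using only the Python standard library; the function names and the User-Agent string are made up for the example, and example.com stands in for your final domain.

```python
from urllib.error import HTTPError
from urllib.parse import urlunsplit
from urllib.request import Request, urlopen


def launch_file_urls(domain):
    """Build the two URLs to open on the final production domain."""
    paths = ("/sitemap.xml", "/robots.txt")
    return [urlunsplit(("https", domain, path, "", "")) for path in paths]


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for a URL. This makes a real network request."""
    # The User-Agent value is an arbitrary placeholder for this sketch.
    request = Request(url, headers={"User-Agent": "launch-check/0.1"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        # urlopen raises on 4xx/5xx responses; report the code instead.
        return error.code


if __name__ == "__main__":
    for url in launch_file_urls("example.com"):
        print(url, fetch_status(url))
```

Note that urlopen follows redirects by default, so the status reflects the final response. It is still worth opening both files in a browser to read what they actually contain.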
Scan the URLs
Look for localhost, preview domains, duplicate paths, old slugs, private paths, and pages that should not be public.
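If the sitemap is long, a small script can do the first pass of this scan. The sketch below is one possible approach, not a complete audit: the preview host suffix and the private path prefixes are assumptions you would replace with your own, and the prefix match is deliberately loose so it flags entries for a second look.

```python
from collections import Counter
from urllib.parse import urlsplit

# Assumptions for illustration: adjust these to your own project.
PREVIEW_HOST_SUFFIXES = (".vercel.app",)
PRIVATE_PATH_PREFIXES = ("/admin", "/dashboard", "/checkout")


def scan_urls(urls):
    """Return (url, problem) pairs for entries that deserve a second look."""
    findings = []
    counts = Counter(urls)
    for url in dict.fromkeys(urls):  # keep order, visit each distinct URL once
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
        if host in ("localhost", "127.0.0.1"):
            findings.append((url, "localhost"))
        if host.endswith(PREVIEW_HOST_SUFFIXES):
            findings.append((url, "preview domain"))
        if parts.path.startswith(PRIVATE_PATH_PREFIXES):
            findings.append((url, "private path"))
        if counts[url] > 1:
            findings.append((url, "duplicate"))
    return findings
```

Old slugs and pages that should not be public still need a human read, because a script cannot know which slugs are current.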
Read robots rules carefully
A broad Disallow rule can affect more routes than expected. Check that important public pages remain crawlable.
Practical checklist
Work through these checks before launch
Check the sitemap
The sitemap should describe the public pages that are worth crawling.
Use absolute production URLs
Each sitemap entry should point to the live production domain, not localhost, a preview deployment, or a staging hostname.
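One way to check this mechanically is to parse the sitemap and compare each entry against the production host. The sketch below assumes a standard urlset sitemap (not a sitemap index) and assumes the production site is served over https; the function name is made up for the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def off_domain_entries(sitemap_xml, production_host):
    """Return sitemap <loc> values that are not absolute https URLs on production_host."""
    root = ET.fromstring(sitemap_xml)
    bad = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        parts = urlsplit(url)
        # Relative entries have an empty scheme, so they are flagged too.
        if parts.scheme != "https" or parts.hostname != production_host:
            bad.append(url)
    return bad


SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/open-graph-preview-checker</loc></url>
  <url><loc>http://localhost:3000/open-graph-preview-checker</loc></url>
</urlset>"""

print(off_domain_entries(SAMPLE, "example.com"))
# prints ['http://localhost:3000/open-graph-preview-checker']
```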
List indexable pages only
Remove test pages, private dashboards, checkout states, duplicate URLs, and pages that intentionally carry noindex.
Keep generated pages in sync
If pages are generated from data, make sure the same source of truth is used for navigation and sitemap entries where possible.
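To make the idea concrete, here is a minimal sketch in which one list of pages feeds both the navigation and the sitemap. The PAGES structure, the "public" flag, and both function names are hypothetical; a real project would read the same data from its CMS or route definitions.

```python
import xml.etree.ElementTree as ET

# Hypothetical single source of truth: one list that feeds both outputs.
PAGES = [
    {"path": "/", "title": "Home", "public": True},
    {"path": "/open-graph-preview-checker", "title": "Open Graph Preview Checker", "public": True},
    {"path": "/dashboard", "title": "Dashboard", "public": False},
]


def nav_links(pages):
    """Return (title, path) pairs for the public navigation."""
    return [(page["title"], page["path"]) for page in pages if page["public"]]


def build_sitemap(pages, base_url):
    """Return sitemap XML listing only the public pages as absolute URLs."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for page in pages:
        if not page["public"]:
            continue
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + page["path"]
    return ET.tostring(urlset, encoding="unicode")
```

Because both functions filter the same list, a page that is dropped from one output is dropped from the other as well.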
Check robots.txt
Robots rules are simple, but a single broad rule can block more than intended.
Read the rules literally
Look for broad Disallow values such as / or patterns that cover important public routes.
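A literal read can be approximated with a simple prefix check. The sketch below is intentionally naive: it ignores User-agent groups, Allow overrides, and wildcard patterns, so treat a hit as a prompt to read the file, not as a verdict. The function name and sample paths are made up for the example.

```python
def broad_disallow_rules(robots_txt, important_paths):
    """Return (line_number, rule_value, blocked_paths) for each Disallow
    value that is a prefix of at least one important path."""
    findings = []
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments
        field, separator, value = line.partition(":")
        if not separator or field.strip().lower() != "disallow":
            continue
        value = value.strip()
        if not value:
            continue  # an empty Disallow value blocks nothing
        blocked = [path for path in important_paths if path.startswith(value)]
        if blocked:
            findings.append((number, value, blocked))
    return findings


robots = "User-agent: *\nDisallow: /admin\nDisallow: /blog\n"
print(broad_disallow_rules(robots, ["/", "/blog/launch-notes", "/pricing"]))
# prints [(3, '/blog', ['/blog/launch-notes'])]
```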
Reference the sitemap
Include the sitemap location if your framework or hosting setup supports it, so crawlers can discover the file easily.
Avoid using robots.txt for secrets
Robots.txt is public. Do not rely on it to hide private data, admin URLs, tokens, or unpublished content.
Run a practical launch pass
The goal is to find obvious crawl mistakes before they become confusing search issues.
Compare navigation with sitemap entries
Important linked pages should usually appear in the sitemap unless you have a clear reason to exclude them.
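The comparison itself is a set difference once both sides are reduced to paths. This is a minimal sketch; it assumes you already have the navigation links and sitemap URLs as plain lists, and it treats trailing slashes as equivalent, which may not match your routing.

```python
from urllib.parse import urlsplit


def normalize(url_or_path):
    """Reduce a URL or path to a path without a trailing slash ("/" stays "/")."""
    path = urlsplit(url_or_path).path or "/"
    return path.rstrip("/") or "/"


def missing_from_sitemap(nav_links, sitemap_urls):
    """Return navigation paths that have no matching sitemap entry."""
    listed = {normalize(url) for url in sitemap_urls}
    linked = {normalize(link) for link in nav_links}
    return sorted(linked - listed)


nav = ["/", "/pricing", "/open-graph-preview-checker/"]
sitemap = ["https://example.com/", "https://example.com/open-graph-preview-checker"]
print(missing_from_sitemap(nav, sitemap))
# prints ['/pricing']
```

Anything in the result either belongs in the sitemap or needs a clear reason for being left out.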
Check noindex and canonical tags
A page can be allowed by robots.txt but still kept out of the index by a noindex robots meta tag, or canonicalized to another URL.
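Both signals live in the page's HTML head, so they can be read with the standard library parser. The sketch below only looks at the robots meta tag and the canonical link element; it does not see an X-Robots-Tag response header or crawler-specific meta names, and the class and function names are made up for the example.

```python
from html.parser import HTMLParser


class IndexingSignals(HTMLParser):
    """Collect the robots meta noindex flag and the canonical URL from HTML."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attributes.get("name", "").lower() == "robots":
            if "noindex" in attributes.get("content", "").lower():
                self.noindex = True
        if tag == "link" and "canonical" in attributes.get("rel", "").lower().split():
            self.canonical = attributes.get("href") or None


def indexing_signals(html):
    """Return (noindex, canonical_url) for an HTML document string."""
    parser = IndexingSignals()
    parser.feed(html)
    return parser.noindex, parser.canonical


page = (
    '<head><meta name="robots" content="noindex, follow">'
    '<link rel="canonical" href="https://example.com/pricing"></head>'
)
print(indexing_signals(page))
# prints (True, 'https://example.com/pricing')
```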
Retest after deployment changes
Routing, redirects, and generated metadata can change during launch work, so repeat the check after the final production build.
Practical examples
What good launch checks look like
Sitemap URL
A public page should appear as https://example.com/open-graph-preview-checker, not http://localhost:3000/open-graph-preview-checker.
Robots rule
Disallow: /admin blocks only paths that start with /admin, while Disallow: / blocks the whole site. Use the narrower rule if the public site should be crawlable.
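The difference between the two rules can be checked with Python's built-in robots.txt parser. This is a sketch for comparison only: urllib.robotparser does simple prefix matching and may not resolve wildcards or conflicting rules the way a particular search engine does. The helper name and the sample rule sets are made up for the example.

```python
from urllib.robotparser import RobotFileParser

NARROW = "User-agent: *\nDisallow: /admin\n"
BROAD = "User-agent: *\nDisallow: /\n"


def can_crawl(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


page = "https://example.com/open-graph-preview-checker"
print(can_crawl(NARROW, page))  # prints True
print(can_crawl(BROAD, page))   # prints False
```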
Sitemap reference
A robots.txt file can include Sitemap: https://example.com/sitemap.xml so crawlers can discover the sitemap directly.
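To confirm the reference is present and points at production, the Sitemap lines can be read back out of the file. This minimal sketch relies on RobotFileParser.site_maps(), which is available in Python 3.8 and later; the wrapper function name is made up for the example.

```python
from urllib.robotparser import RobotFileParser


def sitemap_references(robots_txt):
    """Return the Sitemap URLs declared in robots.txt text (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file declares no sitemap.
    return parser.site_maps() or []


robots = "User-agent: *\nDisallow: /admin\n\nSitemap: https://example.com/sitemap.xml\n"
print(sitemap_references(robots))
# prints ['https://example.com/sitemap.xml']
```

An empty result means crawlers have to discover the sitemap some other way, such as a manual submission.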
Related pages
Keep checking the same launch path
Google Search Console Setup Checklist for New Websites
Set up Google Search Console for a new website with clear checks for verification, sitemap submission, URL inspection, indexing basics, and launch follow-up.
Vercel Launch Checklist for New Websites
Launch a new Vercel website with practical checks for domains, redirects, production builds, environment values, SEO files, metadata, and final smoke tests.
Open Graph Preview Checker
Check Open Graph preview basics before sharing a page, including title, description, image, URL, canonical metadata, and search snippet fallbacks.
FAQ
Short answers before launch
Do I need both a sitemap and robots.txt?
Usually, yes. The sitemap lists important public URLs. Robots.txt gives crawler instructions and can point to the sitemap.
Should every page be in the sitemap?
No. Include public canonical pages you want crawled. Leave out private pages, duplicates, thin test pages, and pages intentionally marked noindex.
Can robots.txt hide private pages?
No. Robots.txt is public and only gives crawler instructions. Private content needs proper access control.
When should I recheck these files?
Recheck after domain changes, routing changes, new landing pages, framework upgrades, or any launch where generated metadata changes.