Sitemap Test
What is it?
An XML sitemap is a structured list of the URLs on your site that you want search engines to discover and consider for indexing, along with metadata such as the last-modified date for each URL. It is the single most efficient way to tell crawlers exactly which pages you care about, especially on large sites or sites where internal linking does not cleanly expose every important URL.
Why sitemaps still matter
Search engines can usually find your pages by following internal links, but a sitemap dramatically accelerates discovery and signals priority. Newly published pages get indexed faster, updated pages are recrawled sooner when their lastmod date changes, and pages buried deep in the site structure become reachable without relying on long crawl chains. For news, e-commerce, and content-heavy sites where freshness matters, the sitemap is one of the most cost-effective SEO investments you can make.
The sitemap also serves as a canonical source of truth that complements your robots.txt rules. While robots.txt tells crawlers what they may not access, the sitemap tells them what they should prioritize. Together they form a coherent crawl policy: blocked URLs stay out of indexes, listed URLs get attention, and everything else gets discovered through internal linking.
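For illustration, a minimal robots.txt that expresses both halves of that policy might look like the sketch below; the blocked path and the sitemap URL are placeholders to adapt to your own site.

```
# Placeholder example: block private paths, point crawlers at the sitemap.
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
```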
The newer reason: AI crawlers
AI training and answer-engine crawlers consume the same XML sitemap as classical search bots. Publishing a clean, current sitemap exposes your most important URLs to every well-behaved crawler in one fetch, which improves indexing odds across both classical search and AI-powered systems.
Common mistakes worth checking
- No sitemap published, especially on sites built with bespoke frameworks where the default sitemap module was not included.
- Sitemap lists 404 or redirected URLs, wasting crawl budget on dead entries.
- Mixed canonical and non-canonical URLs, splitting crawl signals across variants.
- Sitemap exceeding 50,000 URLs or 50 MB uncompressed, which requires splitting into multiple sitemaps with an index file.
- Sitemap not referenced from robots.txt or submitted to Google Search Console.
This test verifies that your site publishes an XML sitemap at a discoverable location. The fix guide below covers how to generate, publish, and submit sitemaps from the major content management systems and headless framework setups.
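If you want a quick self-check before the full fix guide below, a rough sketch like the following (assuming Node 18+ with its built-in fetch, and using a placeholder domain) confirms that a sitemap is reachable at the conventional location and served as XML.

```ts
// Rough sketch: confirm a sitemap is published at the conventional root path.
// Assumes Node 18+ (built-in fetch); run as an ES module. The domain is a placeholder.
const sitemapUrl = "https://example.com/sitemap.xml";

const res = await fetch(sitemapUrl, { redirect: "follow" });
const contentType = res.headers.get("content-type") ?? "";

if (res.ok && contentType.includes("xml")) {
  console.log(`Sitemap found at ${sitemapUrl} (${res.status}, ${contentType})`);
} else {
  console.warn(`No usable sitemap at ${sitemapUrl}: status ${res.status}, content-type "${contentType}"`);
}
```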
Pass rate:
- Top 100 websites: 83%. This value indicates the percent of the top 100 most visited websites in the US that pass this test (in the past 12 months).
- All websites: 79%. This value indicates the percent of all websites analyzed in SEO Site Checkup (500,000+) that pass this test in the past 12 months.
| Year | Pass rate |
|---|---|
| 2021 | 75% |
| 2022 | 80% |
| 2023 | 74% |
| 2024 | 83% |
How do I fix it?
An XML sitemap gives search engines and AI crawlers an explicit, machine-readable list of the URLs you want indexed, along with metadata such as last-modified dates. Fixing this issue means generating a valid sitemap, publishing it at a stable URL, and pointing crawlers to it via robots.txt and Search Console. This is especially important for large sites, sites with weak internal linking, or sites where new content needs to be discovered quickly.
Example
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-04-15</lastmod>
  </url>
  <url>
    <loc>https://example.com/blog/getting-started</loc>
    <lastmod>2026-03-22</lastmod>
  </url>
</urlset>
```
Where to make the change
- Raw HTML or static site: generate the XML at build time and publish it at `/sitemap.xml`. Many static-site generators have a built-in sitemap option (see the build-time sketch after this list).
- WordPress: a dedicated SEO plugin will generate and update the sitemap automatically, typically at `/sitemap_index.xml`.
- Shopify: Shopify generates `/sitemap.xml` automatically and updates it as products and collections change.
- Wix or Squarespace: both platforms generate the sitemap automatically and serve it at the standard root path.
- Headless or framework sites: use the framework's built-in sitemap module (for example `next-sitemap` for Next.js or the `@astrojs/sitemap` integration) so the file is regenerated on every build.
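For the raw HTML or static-site case in the first bullet, build-time generation can be as small as the following rough sketch; the page list, dates, and output path are assumptions to replace with data from your own content source.

```ts
// build-sitemap.ts - rough sketch of generating sitemap.xml at build time.
// The page list is a placeholder; a real build would derive it from your
// content source (markdown files, a CMS export, a router manifest).
import { writeFileSync } from "node:fs";

const site = "https://example.com";
const pages = [
  { path: "/", lastmod: "2026-04-15" },
  { path: "/blog/getting-started", lastmod: "2026-03-22" },
];

const entries = pages
  .map(
    (p) => `  <url>
    <loc>${site}${p.path}</loc>
    <lastmod>${p.lastmod}</lastmod>
  </url>`
  )
  .join("\n");

const xml = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
${entries}
</urlset>
`;

// Write into the publish directory so the file is served at /sitemap.xml.
writeFileSync("public/sitemap.xml", xml);
```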
Common causes and how to resolve them
- No sitemap published: generate one and publish it at the site root. Reference it from robots.txt with a `Sitemap:` line.
- Sitemap lists URLs that 404 or redirect: stale entries waste crawl budget and trigger Search Console warnings. Regenerate the sitemap from current published URLs only (a quick check script is sketched after this list).
- Mixed canonical and non-canonical URLs: the sitemap should list canonical URLs only. Listing both `http` and `https`, or trailing-slash variants, splits crawl signals.
- Sitemap exceeds size limits: a single sitemap may contain at most 50,000 URLs and must stay under 50 MB uncompressed. Split larger sites into multiple sitemaps and reference them from a sitemap index file.
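For the stale-entry problem in the second bullet, a rough check like this fetches every loc entry in the sitemap and flags anything that does not return a plain 200; it assumes Node 18+, a sitemap small enough to scan in one pass, and a placeholder sitemap URL.

```ts
// check-sitemap-urls.ts - rough sketch: flag sitemap entries that 404 or redirect.
// Assumes Node 18+ (built-in fetch); run as an ES module. The URL is a placeholder.
const sitemapUrl = "https://example.com/sitemap.xml";

const xml = await (await fetch(sitemapUrl)).text();
const urls = [...xml.matchAll(/<loc>\s*(.*?)\s*<\/loc>/g)].map((m) => m[1]);

for (const url of urls) {
  // redirect: "manual" surfaces 3xx responses instead of following them.
  const res = await fetch(url, { method: "HEAD", redirect: "manual" });
  if (res.status !== 200) {
    console.warn(`${res.status} ${url}`); // 404s, 301/302 redirects, server errors
  }
}
```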
Best practices
- Submit to Search Console: add the sitemap URL in Google Search Console (and Bing Webmaster Tools) so the platform tracks coverage and surfaces errors.
- Keep `lastmod` accurate: Google uses `lastmod` as a hint when prioritizing crawls. Update it whenever the page content meaningfully changes; do not bump it on every build.
- Exclude noindex and disallowed URLs: the sitemap should describe what you want indexed. Listing pages that are blocked from crawling sends mixed signals.
- Use sitemap index files for multi-sitemap setups: a sitemap index aggregates child sitemaps (for example one per content type) and is the single URL you submit to search engines.
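For that multi-sitemap setup, a minimal sitemap index looks roughly like this; the child sitemap names and dates are placeholders for whatever split fits your site (per content type, per date range, or numbered shards).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml</loc>
    <lastmod>2026-04-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2026-04-10</lastmod>
  </sitemap>
</sitemapindex>
```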