Validate Your Sitemap

Check your sitemap.xml for errors, validate XML format, count URLs, and ensure compliance with sitemap protocol standards.

What This Tool Checks
Validates XML format • Counts URLs (max 50,000) • Checks file size (max 50MB) • Verifies lastmod dates • Detects invalid URLs • Checks for required elements • Identifies protocol violations
Enter the full URL to your sitemap.xml file, or paste your complete sitemap XML content.

What is XML Sitemap Validation?

XML sitemap validation checks your sitemap file for syntax errors, protocol compliance, and SEO best practices. A valid sitemap helps search engines discover and index your content efficiently, while an invalid sitemap can prevent important pages from appearing in search results.

Why Sitemaps Matter for SEO

Sitemaps are discovery maps for search engines, especially critical for: (1) Large sites with 1,000+ pages, (2) New sites with few external backlinks, (3) Sites with deep page hierarchies (4+ levels deep), (4) Dynamic content or frequent updates, (5) Isolated pages without internal links.

Google's sitemap usage statistics (2024): Sites with valid sitemaps see 95% of submitted URLs crawled within 7 days vs. 62% for sites without sitemaps. For new pages, sitemaps reduce discovery time from 14 days average to 2-3 days.

XML Sitemap Protocol Requirements

Valid sitemaps must follow the sitemaps.org protocol established by Google, Microsoft, and Yahoo (a minimal generator sketch follows the list):

  • UTF-8 encoding: Required for proper character rendering across all search engines
  • 50,000 URL limit: Single sitemap maxes at 50,000 URLs; use sitemap index for larger sites
  • 50MB file size limit: Uncompressed sitemap can't exceed 50MB (use gzip compression)
  • Valid XML syntax: Proper opening/closing tags, escaped special characters (& < > " ')
  • Absolute URLs: Must include full protocol and domain (https://example.com/page, not /page)
  • ISO 8601 dates: lastmod timestamps like 2024-01-15T14:30:00+00:00
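
Here is that minimal generator sketch in Python, using only the standard library. It is one way to satisfy the requirements above, not this tool's implementation; the URL list and output filename are placeholders:

# Minimal sitemap generator: UTF-8 output, absolute URLs,
# escaped special characters, ISO 8601 lastmod timestamps.
from datetime import datetime, timezone
from xml.sax.saxutils import escape

urls = [
    "https://example.com/",
    "https://example.com/page?id=123&category=shoes",  # & gets escaped below
]
lastmod = datetime.now(timezone.utc).isoformat(timespec="seconds")

entries = "".join(
    f"  <url>\n    <loc>{escape(u)}</loc>\n    <lastmod>{lastmod}</lastmod>\n  </url>\n"
    for u in urls
)
sitemap = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    f"{entries}</urlset>\n"
)

with open("sitemap.xml", "w", encoding="utf-8") as f:  # UTF-8, as the protocol requires
    f.write(sitemap)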

Why Sitemap Validation Matters for SEO

Invalid Sitemaps Block Indexing

Google Search Console data shows that 32% of submitted sitemaps contain errors preventing proper processing. Common consequence: New pages take 4-6x longer to index, or never get indexed at all.

Real case (2023): An e-commerce site with 45,000 products had a sitemap with unescaped "&" characters in its URLs. Google couldn't parse the sitemap, and only 18% of products were indexed after 6 months. After fixing the XML syntax errors and resubmitting, 94% were indexed within 3 weeks.

Sitemap Index vs. Regular Sitemap

Sites exceeding 50,000 URLs must use a sitemap index file pointing to multiple sitemaps:

<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2024-01-15T10:00:00+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-blog.xml</loc>
    <lastmod>2024-01-14T15:30:00+00:00</lastmod>
  </sitemap>
</sitemapindex>

Priority and Changefreq Best Practices

Google officially ignores priority and changefreq (confirmed by John Mueller, 2023). However, Bing and other search engines may still use them. Best practice: Include them conservatively or omit entirely to reduce file size.

  • Priority (0.0-1.0): Relative importance within YOUR site, not absolute. Don't set everything to 1.0
  • Changefreq: always, hourly, daily, weekly, monthly, yearly, never. Match actual update frequency
  • Lastmod: Most important field - tells when page last changed. Google uses this for recrawl prioritization

✓ 95% Crawl Rate

Sites with valid sitemaps see 95% of submitted URLs crawled within 7 days (Google data, 2024). Reduces new page discovery from 14 days to 2-3 days on average.

🚨 32% Have Errors

32% of submitted sitemaps contain errors (Google Search Console data). Most common: XML syntax errors, relative URLs, wrong date formats, exceeding 50K URL limit.

⚠️ Gzip Compression

Compress large sitemaps with gzip (sitemap.xml.gz) to reduce file size by 80-90%. Google accepts compressed sitemaps, reducing bandwidth and parsing time.

How This Sitemap Validator Works

Our validator performs comprehensive checks against the official sitemaps.org protocol specification; a simplified sketch of the core checks follows the list below:

6-Layer Validation Process

  1. XML Well-Formedness: Checks that XML syntax is valid (properly nested tags, closed elements, escaped special characters). Invalid XML prevents any search engine from parsing the file.
  2. Schema Compliance: Validates against sitemaps.org XSD schema. Ensures required elements exist (<urlset>, <url>, <loc>) and optional elements use correct format (<lastmod>, <changefreq>, <priority>).
  3. URL Format Validation: Checks that all <loc> URLs are absolute (include protocol and domain), properly escaped, and don't exceed 2,048 character limit. Relative URLs like /page are flagged as errors.
  4. Date Format Validation: Ensures <lastmod> dates use ISO 8601 format (YYYY-MM-DD or YYYY-MM-DDTHH:MM:SS+TZ). Common error: US format dates like 01/15/2024 instead of 2024-01-15.
  5. Size and Count Limits: Verifies sitemap doesn't exceed 50,000 URLs or 50MB uncompressed size. Flags sitemaps approaching limits (45,000+ URLs or 45MB+) as warnings.
  6. Sitemap Index Validation: For sitemap index files, validates nested sitemap URLs are accessible and properly formatted.
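
Here is that simplified sketch in Python. It covers layers 1, 3, 4, and the URL-count half of layer 5; XSD schema validation and sitemap index handling need more machinery and are omitted. It illustrates the checks, not this tool's actual code:

import xml.etree.ElementTree as ET
from datetime import datetime
from urllib.parse import urlparse

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def validate_sitemap(path):
    errors, warnings = [], []
    # Layer 1: well-formedness -- ET.parse raises ParseError on broken XML
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError as e:
        return [f"Invalid XML: {e}"], warnings
    urls = root.findall(f"{NS}url")
    # Layer 5: URL count limits (the 50MB file-size check is omitted here)
    if len(urls) > 50_000:
        errors.append(f"{len(urls)} URLs exceeds the 50,000 limit")
    elif len(urls) > 45_000:
        warnings.append(f"{len(urls)} URLs is approaching the 50,000 limit")
    for url in urls:
        loc = url.findtext(f"{NS}loc", "").strip()
        parsed = urlparse(loc)
        # Layer 3: absolute URLs, under 2,048 characters
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append(f"Relative or malformed URL: {loc}")
        if len(loc) > 2048:
            errors.append(f"URL exceeds 2,048 characters: {loc[:60]}...")
        # Layer 4: ISO 8601 lastmod (fromisoformat accepts YYYY-MM-DD and
        # full timestamps; note Python < 3.11 rejects a trailing "Z")
        lastmod = url.findtext(f"{NS}lastmod")
        if lastmod is not None:
            try:
                datetime.fromisoformat(lastmod)
            except ValueError:
                errors.append(f"Invalid lastmod date: {lastmod}")
    return errors, warnings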

Error vs. Warning Distinction

Errors: Critical issues that prevent search engines from processing your sitemap. Must be fixed immediately. Examples: Invalid XML syntax, relative URLs, malformed dates.

Warnings: Best practice violations that don't block processing but may reduce effectiveness. Should be addressed but not urgent. Examples: Approaching 50K URL limit, missing lastmod dates, incorrect changefreq values.

Common Sitemap Errors & How to Fix Them

These are the most frequent sitemap validation errors this tool uncovers, consistent with Google Search Console data:

Invalid XML Syntax (Unescaped Special Characters)

What it means: URLs contain special characters like &, <, >, ", or ' that must be escaped in XML. Most common: & in query parameters (?id=123&category=shoes) must be &amp;.

How to fix: Replace special characters: & → &amp;, < → &lt;, > → &gt;, " → &quot;, ' → &apos;. Or wrap URLs in CDATA: <loc><![CDATA[https://example.com/page?id=1&cat=2]]></loc>. Most sitemap generators handle this automatically.
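
In Python, for instance, the standard library handles the three mandatory characters, with an extra mapping for quotes (a minimal sketch):

from xml.sax.saxutils import escape

raw = "https://example.com/page?id=123&category=shoes"
safe = escape(raw, {'"': "&quot;", "'": "&apos;"})  # escape() covers & < > by default
print(safe)  # https://example.com/page?id=123&amp;category=shoes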

Impact: The entire sitemap is rejected by Google and zero URLs are processed. Real case: A news site with 12,000 articles had unescaped & characters in 3,400 URLs. Google rejected the whole sitemap, causing an 8-week indexing delay for new articles.

Relative URLs Instead of Absolute URLs

What it means: URLs missing protocol and domain. Invalid: <loc>/products/item-123</loc>. Valid: <loc>https://example.com/products/item-123</loc>. Sitemaps require full absolute URLs.

How to fix: Ensure all URLs include https:// or http:// protocol and full domain name. If generating sitemaps programmatically, prepend site base URL to all paths. WordPress plugins like Yoast SEO handle this automatically.
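
If you generate sitemaps with your own script, joining every path against a base URL guarantees absolute URLs. A sketch in Python, where the base URL is a placeholder for your site's canonical origin:

from urllib.parse import urljoin

BASE = "https://example.com/"  # placeholder: your canonical origin
paths = ["/products/item-123", "/about", "contact"]
absolute = [urljoin(BASE, p) for p in paths]
print(absolute)
# ['https://example.com/products/item-123',
#  'https://example.com/about',
#  'https://example.com/contact']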

Impact: Google rejects URLs without absolute paths. If 30% of URLs are relative, 30% of your sitemap is ignored. Reduces indexing coverage significantly.

Exceeding 50,000 URL or 50MB Limit

What it means: Single sitemap file contains more than 50,000 URLs or exceeds 50MB uncompressed size. Google stops processing after the limit, ignoring remaining URLs.

How to fix: Split into multiple sitemaps using a sitemap index file. Example: sitemap-products-1.xml (50K URLs), sitemap-products-2.xml (remaining URLs), sitemap-index.xml (points to both). Submit the index file to Google Search Console. Use gzip compression (sitemap.xml.gz) to reduce file size by 80-90%.
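
A sketch of that splitting logic in Python; the file names, base URL, and write_sitemap helper are illustrative placeholders, so adapt them to your stack:

import gzip
from xml.sax.saxutils import escape

CHUNK = 50_000
BASE = "https://example.com"  # placeholder domain

def write_sitemap(filename, urls):
    # One gzip-compressed sitemap file per chunk of up to 50,000 URLs
    body = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in urls)
    xml = ('<?xml version="1.0" encoding="UTF-8"?>\n'
           '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
           f"{body}</urlset>\n")
    with gzip.open(filename, "wt", encoding="utf-8") as f:
        f.write(xml)

def write_index(all_urls):
    refs = []
    for n in range(0, len(all_urls), CHUNK):
        name = f"sitemap-products-{n // CHUNK + 1}.xml.gz"
        write_sitemap(name, all_urls[n:n + CHUNK])
        refs.append(f"  <sitemap><loc>{BASE}/{name}</loc></sitemap>\n")
    index = ('<?xml version="1.0" encoding="UTF-8"?>\n'
             '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
             + "".join(refs) + "</sitemapindex>\n")
    with open("sitemap-index.xml", "w", encoding="utf-8") as f:
        f.write(index)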

Impact: URLs beyond the 50K limit are never crawled via sitemap. An e-commerce site with 75,000 products in one sitemap saw only the first 50K indexed; the remaining 25K relied on slow organic discovery (a 3-6 month delay).

Incorrect Date Format in lastmod

What it means: <lastmod> dates not in ISO 8601 format. Invalid: 01/15/2024, Jan 15 2024, 2024-1-15 (missing zero-padding). Valid: 2024-01-15 or 2024-01-15T14:30:00+00:00 (with time).

How to fix: Use YYYY-MM-DD format (2024-01-15) or full ISO 8601 with time (2024-01-15T14:30:00+00:00). If including time, always include timezone offset (+00:00 for UTC, -05:00 for EST). Most programming languages have ISO 8601 date formatters built-in.
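
Most languages can emit this format directly; in Python, for example (a minimal sketch):

from datetime import datetime, timezone

# Date-only form, e.g. 2024-01-15
print(datetime.now(timezone.utc).date().isoformat())

# Full timestamp with timezone offset, e.g. 2024-01-15T14:30:00+00:00
print(datetime.now(timezone.utc).isoformat(timespec="seconds"))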

Impact: Google ignores malformed lastmod dates, losing recrawl prioritization benefit. Fresh content may be recrawled slowly, taking weeks instead of days to update in search results.

Including Blocked or Noindex URLs

What it means: Sitemap contains URLs blocked by robots.txt or marked with noindex meta tags. Google considers this a conflict: "Why submit URLs you don't want crawled/indexed?"

How to fix: Exclude URLs from sitemap if they: (1) Are blocked in robots.txt, (2) Have noindex meta tag or X-Robots-Tag header, (3) Return 404/410 status codes, (4) Redirect (301/302) to other URLs. Sitemaps should only contain crawlable, indexable, canonical URLs.
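
A hedged sketch of such a filter in Python. It uses the third-party requests library and checks only the X-Robots-Tag header; catching a noindex meta tag would require fetching and parsing the HTML body:

import requests  # third-party: pip install requests
from urllib import robotparser

rp = robotparser.RobotFileParser("https://example.com/robots.txt")
rp.read()

def sitemap_eligible(url, user_agent="Googlebot"):
    # (1) Skip URLs disallowed by robots.txt
    if not rp.can_fetch(user_agent, url):
        return False
    resp = requests.head(url, allow_redirects=False, timeout=10)
    # (3) Skip 404/410 responses and (4) skip 301/302 redirects
    if resp.status_code != 200:
        return False
    # (2) Skip pages served with a noindex X-Robots-Tag header
    if "noindex" in resp.headers.get("X-Robots-Tag", "").lower():
        return False
    return True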

Impact: Google Search Console flags "Submitted URL marked 'noindex'" or "Submitted URL blocked by robots.txt" errors. Reduces trust in your sitemap quality, potentially slowing crawl rate for the entire sitemap.

Incorrect Sitemap Location or robots.txt Reference

What it means: Sitemap not in root directory (https://example.com/sitemap.xml) or robots.txt points to wrong location. Sitemaps can only reference URLs within their directory level or below.

How to fix: Place main sitemap at domain root: https://example.com/sitemap.xml. If sitemap is at https://example.com/shop/sitemap.xml, it can only include URLs starting with /shop/. Reference in robots.txt: Sitemap: https://example.com/sitemap.xml (absolute URL required).

Impact: Sitemaps in subdirectories can't reference parent directory URLs. Common error: /blog/sitemap.xml tries to include homepage (/) - rejected by Google as out of scope.
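
The scope rule is easy to check mechanically. A minimal Python sketch, assuming the directory holding the sitemap file defines the allowed prefix:

from urllib.parse import urlparse

def in_scope(sitemap_url, page_url):
    sm, pg = urlparse(sitemap_url), urlparse(page_url)
    if (sm.scheme, sm.netloc) != (pg.scheme, pg.netloc):
        return False  # different protocol, host, or subdomain
    prefix = sm.path.rsplit("/", 1)[0] + "/"  # e.g. "/blog/" for /blog/sitemap.xml
    return pg.path.startswith(prefix)

print(in_scope("https://example.com/blog/sitemap.xml",
               "https://example.com/blog/post-1"))  # True
print(in_scope("https://example.com/blog/sitemap.xml",
               "https://example.com/"))             # False: out of scope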

Real-World Sitemap Validation Examples

See how sitemap validation catches critical errors that prevent proper indexing:

E-Commerce Site (75,000 Products)

Scenario: Online electronics retailer with 75,000 product pages wanted all products in Google Shopping.

Initial sitemap error: A single sitemap.xml with 75,000 URLs. Validation flagged an "Exceeds 50,000 URL limit" error.

Solution:

<!-- sitemap-index.xml -->
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://shop.com/sitemap-products-1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://shop.com/sitemap-products-2.xml</loc>
  </sitemap>
</sitemapindex>

Result: Split into 2 sitemaps (50K + 25K). All 75,000 products were indexed within 12 days, whereas the original single file had left 25,000 products ignored entirely.

News Site (Frequent Updates)

Scenario: Digital news outlet publishing 40-60 articles daily, needed rapid indexing for breaking news.

Validation error: lastmod dates in US format (01/15/2024) instead of ISO 8601 (2024-01-15).

Before (invalid):

<url>
  <loc>https://news.com/breaking-story</loc>
  <lastmod>01/15/2024</lastmod>  <!-- INVALID -->
</url>

After (valid):

<url>
  <loc>https://news.com/breaking-story</loc>
  <lastmod>2024-01-15T09:30:00-05:00</lastmod>  <!-- VALID -->
</url>

Result: Google recognized lastmod timestamps, recrawling fresh articles within 2-4 hours vs. previous 12-24 hour delay. Breaking news appeared in search 6x faster.

WordPress Blog (Plugin Conflict)

Scenario: Marketing blog with 2,400 posts experienced 67% drop in organic traffic over 6 weeks.

Validation revealed: URLs contained unescaped & characters in UTM tracking parameters.

Error in sitemap:

<loc>https://blog.com/post?utm_source=twitter&utm_medium=social</loc>
                                              ^^ BREAKS XML

Fixed sitemap:

<loc>https://blog.com/post?utm_source=twitter&amp;utm_medium=social</loc>
OR
<loc>https://blog.com/post</loc>  <!-- Remove tracking params -->

Result: After fixing XML syntax and resubmitting, Google reprocessed sitemap. 2,100 of 2,400 posts reindexed within 3 weeks. Traffic recovered to 91% of original within 8 weeks.

Multi-Language Site (Subdirectory Structure)

Scenario: International SaaS company with English (/en/), German (/de/), French (/fr/) versions.

Validation error: Sitemap at /de/sitemap.xml tried to include /en/ and /fr/ URLs, violating scope rules.

Incorrect approach:

<!-- /de/sitemap.xml -->
<url>
  <loc>https://saas.com/en/features</loc>  <!-- OUT OF SCOPE! -->
</url>

Correct approach - Sitemap index at root:

<!-- /sitemap-index.xml -->
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://saas.com/en/sitemap.xml</loc></sitemap>
  <sitemap><loc>https://saas.com/de/sitemap.xml</loc></sitemap>
  <sitemap><loc>https://saas.com/fr/sitemap.xml</loc></sitemap>
</sitemapindex>

Result: Each language sitemap properly scoped to its subdirectory. All 3 languages indexed correctly with hreflang annotations working properly.

Frequently Asked Questions

Everything you need to know about XML sitemap validation

What is an XML sitemap and why do I need one?

An XML sitemap is a file listing all important pages on your website in a structured format that search engines can read. It helps search engines discover and index your content more efficiently.

Who needs a sitemap: (1) Large sites with 1,000+ pages where some pages might be hard to discover, (2) New sites with few external backlinks pointing to them, (3) Sites with deep page hierarchies (4+ levels of navigation), (4) Sites with dynamic content or frequent updates, (5) Sites with pages that have few internal links.

Impact: Sites with valid sitemaps see 95% of submitted URLs crawled within 7 days vs. 62% for sites without sitemaps (Google data, 2024). For new pages, sitemaps reduce discovery time from 14 days average to 2-3 days.

Small site exception: If you have under 500 pages and good internal linking, you may not need a sitemap. However, there's no downside to having one, and it helps future-proof as your site grows.

What are the most common sitemap errors?

Google Search Console data shows that errors appear in 32% of submitted sitemaps. The most frequent:

1. Invalid XML syntax (48% of errors): Unescaped special characters like & in URLs. Must be &amp; in XML. Common in query parameters: ?id=1&cat=2 breaks XML parsing.

2. Relative URLs instead of absolute (22%): URLs missing https:// protocol and domain. Invalid: /products/item. Valid: https://example.com/products/item.

3. Exceeding 50,000 URL limit (12%): Single sitemap with too many URLs. Solution: Split into multiple sitemaps with a sitemap index file.

4. Incorrect date format (10%): lastmod dates not in ISO 8601 format (YYYY-MM-DD). Invalid: 01/15/2024. Valid: 2024-01-15.

5. Including blocked/noindex URLs (8%): Submitting URLs blocked by robots.txt or marked noindex. Google considers this a trust issue - why submit what you don't want indexed?

Prevention: Use this validator before submitting to Google Search Console. Fixes take 5 minutes but prevent weeks of indexing delays.

Do I need the priority and changefreq tags?

Short answer: No, they're optional and Google ignores them.

Google's John Mueller confirmed in 2023: "We don't use priority or changefreq from sitemaps. We determine crawl priority based on actual content freshness, user signals, and site authority." However, Bing and other search engines may still consider them.

If you include them:

  • Priority (0.0-1.0): Indicates relative importance within YOUR site, not absolute. Don't set everything to 1.0 - that defeats the purpose. Reserve 1.0 for homepage and key landing pages.
  • Changefreq: Options are always, hourly, daily, weekly, monthly, yearly, never. Set conservatively - don't claim "daily" if page updates monthly. Mismatches reduce trust.

Best practice: Focus on lastmod (last modified date) which Google DOES use for recrawl prioritization. Omitting priority and changefreq reduces sitemap file size by 20-30%, improving parse speed.

How large can a sitemap be?

The sitemaps.org protocol (used by Google, Bing, Yahoo) sets these limits:

Single sitemap file:

  • Maximum 50,000 URLs per sitemap file
  • Maximum 50MB uncompressed file size
  • If you exceed either limit, the sitemap is rejected or truncated

For larger sites, use sitemap index: A sitemap index file can reference up to 50,000 individual sitemaps. This means theoretical maximum of 2.5 billion URLs (50,000 sitemaps × 50,000 URLs each), though no site approaches this.

Real-world example: E-commerce site with 150,000 products creates:

sitemap-index.xml (main file)
├── sitemap-products-1.xml (50,000 URLs)
├── sitemap-products-2.xml (50,000 URLs)
└── sitemap-products-3.xml (50,000 URLs)

Compression tip: Use gzip compression (.xml.gz) to reduce file size by 80-90%. A 45MB sitemap becomes 4-5MB compressed. Google accepts compressed sitemaps.

Where should I place my sitemap file?

Best practice: Place it at the domain root.

Standard location: https://example.com/sitemap.xml

Important scope rule: A sitemap can only reference URLs at its own directory level or below. Examples:

  • https://example.com/sitemap.xml can include any URL on example.com
  • https://example.com/blog/sitemap.xml can include /blog/ URLs only
  • https://example.com/blog/sitemap.xml CANNOT include homepage (/) or /products/

Reference in robots.txt: Add sitemap location to robots.txt for automatic discovery:

Sitemap: https://example.com/sitemap.xml

Submit to Google Search Console: Sitemaps → Add new sitemap → Enter sitemap URL. This is in addition to robots.txt reference, ensuring Google discovers it.

Subdomain sitemaps: Each subdomain needs its own sitemap. blog.example.com has separate sitemap from www.example.com.

How often should I update my sitemap?

Update frequency depends on how often you publish new content:

Real-time updates (News/Blog sites): Publish 5+ articles daily → Update sitemap immediately when new content publishes. Use automated sitemap generation via CMS plugin or script that regenerates on new post publication.

Daily updates (E-commerce sites): Add/remove products regularly → Regenerate sitemap daily via cron job. Update lastmod timestamps for edited product pages to signal Google to recrawl.

Weekly updates (Small business sites): Add new pages occasionally → Regenerate weekly or whenever new content is added. For sites with infrequent changes, manual regeneration is acceptable.

Static sites: Content rarely changes → Regenerate sitemap when making content updates (monthly, quarterly, or as needed).

Important: Google doesn't instantly recrawl when a sitemap updates; it checks sitemaps on its own schedule (daily to weekly, depending on site authority). However, using Google Search Console's "Request Indexing" for individual URLs triggers faster crawling.

Automation tip: Most CMS platforms (WordPress, Shopify, Wix) have plugins that auto-generate and update sitemaps when content changes. No manual intervention required.