
The AI Visibility Checklist: 53 Data-Backed Checks with Fortune 500 Benchmarks

Published 2026-04-20 · PROGEOLAB Research

The PROGEOLAB 53-Point AI Visibility Checklist is an audit-ready list of checks covering every known signal that determines whether AI answer engines can access, understand, and cite your website's content. Each item has a Fortune 500 benchmark attached: the check passes when you match or exceed the adoption rate of the top-25 AI-ready companies.

Use it as a pre-launch audit, a quarterly review, or a gap analysis against the Fortune 500 leaders. The checklist is grouped into six dimensions. Higher-impact items are listed first within each dimension.

Dimension 1 — Access (9 checks)

  1. Homepage returns HTTP 200 to Chrome UA — 70.4% of Fortune 500 pass
  2. Homepage returns HTTP 200 to ChatGPT-User UA — 60% pass
  3. Homepage returns HTTP 200 to PerplexityBot UA
  4. Homepage returns HTTP 200 to ClaudeBot UA
  5. Homepage returns HTTP 200 to Googlebot UA (reverse-DNS verifiable)
  6. No Layer-2 datacenter IP blocking for known AI crawler IPs — 95.2% pass (24 fail)
  7. No Layer-3 TLS fingerprinting that rejects non-browser clients — 97% pass (15 fail)
  8. Server does not return challenge pages (JavaScript-only responses) to AI UAs
  9. No geographic blocking preventing AI crawlers based in US/EU datacenters from reaching the site
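The user-agent checks above (1–4) can be probed from any machine. A minimal Python sketch using only the standard library; the User-Agent strings below are representative placeholders, since vendors publish (and occasionally change) the exact strings, so verify them against each vendor's documentation before a real audit:

```python
# Minimal access probe for checks 1-4: request the homepage with each
# user agent and record the HTTP status. UA strings are placeholders.
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

USER_AGENTS = {
    "Chrome": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "ChatGPT-User": "ChatGPT-User/1.0",
    "PerplexityBot": "PerplexityBot/1.0",
    "ClaudeBot": "ClaudeBot/1.0",
}

def passes(status: int) -> bool:
    """A check passes only on a clean HTTP 200; 403s, challenge
    pages, and network failures (recorded as 0) all fail."""
    return status == 200

def probe(url: str) -> dict:
    """Return {label: HTTP status code} for each user agent."""
    results = {}
    for label, ua in USER_AGENTS.items():
        req = Request(url, headers={"User-Agent": ua})
        try:
            with urlopen(req, timeout=10) as resp:
                results[label] = resp.status
        except HTTPError as exc:
            results[label] = exc.code   # e.g. 403 from a WAF block
        except URLError:
            results[label] = 0          # DNS/TLS/connection failure
    return results
```

Run it from a residential or otherwise non-corporate IP; corporate egress proxies can mask the very blocks the checklist is hunting for.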

Dimension 2 — Bot Policy (9 checks)

  1. robots.txt exists and is reachable at /robots.txt
  2. robots.txt names at least 5 AI crawlers explicitly — 7.5% pass
  3. robots.txt distinguishes training crawlers from retrieval crawlers — 0% pass (first mover available)
  4. robots.txt includes GPTBot directive (Allow or Disallow)
  5. robots.txt includes ChatGPT-User directive
  6. robots.txt includes ClaudeBot directive
  7. robots.txt includes Google-Extended directive
  8. robots.txt declares a sitemap URL
  9. robots.txt policy matches WAF behavior (no declaration-enforcement gap)
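No Fortune 500 robots.txt currently separates training crawlers from retrieval crawlers (check 3), so here is a sketch of what that split can look like. The crawler names are real; the Allow/Disallow choices and the example.com sitemap URL are hypothetical, one possible policy rather than a recommendation:

```
# Training crawlers: opted out of model training
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Retrieval crawlers: allowed for real-time answer fetching
User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

Sitemap: https://www.example.com/sitemap.xml
```

Whatever policy you declare here, check 9 requires that the WAF actually enforces the same thing.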

Dimension 3 — Standards (10 checks)

  1. llms.txt exists at domain root and is reachable — 2.8% pass (body-validated)
  2. llms.txt contains at least 20 URLs organized in sections
  3. llms.txt Content-Type is text/plain or text/markdown (not HTML)
  4. llms.txt is reachable to ChatGPT-User specifically (not just Chrome)
  5. security.txt follows RFC 9116 format — 15% pass
  6. security.txt includes Contact and Expires fields
  7. security.txt includes PGP-signed URL (optional but signals maturity)
  8. sitemap.xml reachable and current
  9. ads.txt present if running programmatic advertising
  10. No soft-404 pages at AI-standard paths (ai.txt, agents.json, mcp.json)
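Checks 1–4 assume the emerging llms.txt convention: a plain-text/Markdown file at the domain root with an H1, a short blockquote summary, and sectioned link lists. A minimal sketch for a hypothetical Example Corp (all names and URLs are placeholders):

```
# Example Corp

> Example Corp builds industrial sensors. This file lists the pages
> most useful to AI answer engines, grouped by topic.

## Products
- [X100 Sensor](https://www.example.com/products/x100): full spec sheet
- [X200 Sensor](https://www.example.com/products/x200): full spec sheet

## Docs
- [Integration guide](https://www.example.com/docs/integration): setup steps
- [API reference](https://www.example.com/docs/api): endpoints and limits
```

Serve it with Content-Type text/plain or text/markdown (check 3), and confirm the ChatGPT-User agent can fetch it, not just a desktop browser (check 4).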

Dimension 4 — Structured Data (9 checks)

  1. JSON-LD present on homepage — 24.4% pass
  2. JSON-LD uses Corporation type (or more specific) not generic Organization
  3. JSON-LD includes legalName, url, logo fields
  4. JSON-LD includes numberOfEmployees — 8.7% pass
  5. JSON-LD includes foundingDate
  6. JSON-LD includes sameAs array
  7. sameAs includes Wikidata QID URL — 0.6% pass (Apple, Comcast, Repsol)
  8. sameAs includes Wikipedia entity URL
  9. sameAs includes verified social media profiles (LinkedIn, X)
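All nine structured-data checks can be satisfied by a single script block on the homepage. A sketch with entirely hypothetical values; the company name, QID, and URLs are placeholders, and the Wikidata QID (check 7) must point at your actual entity record:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Corporation",
  "legalName": "Example Corp, Inc.",
  "url": "https://www.example.com",
  "logo": "https://www.example.com/logo.png",
  "foundingDate": "1998-04-01",
  "numberOfEmployees": { "@type": "QuantitativeValue", "value": 5200 },
  "sameAs": [
    "https://www.wikidata.org/wiki/Q0000000",
    "https://en.wikipedia.org/wiki/Example_Corp",
    "https://www.linkedin.com/company/example-corp"
  ]
}
</script>
```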

Dimension 5 — Content (8 checks)

  1. Homepage title tag follows Brand + Descriptor format — brand-only titles fail the check
  2. Homepage meta description present and 120-160 chars — 148 Fortune 500 missing meta descriptions
  3. Homepage has semantic H1 with company name or value proposition
  4. Homepage text-to-code ratio exceeds 20% — Fortune 500 average 34%
  5. Content is server-rendered, not JS-only — 9 Fortune 500 homepages fail
  6. Pricing, availability, or specs are in HTML (not just images or PDFs)
  7. Product documentation uses H2/H3 hierarchy extractable by AI
  8. FAQ pages use FAQPage schema (if FAQs exist)
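Check 4's text-to-code ratio can be approximated with the standard library alone: visible text characters divided by total HTML characters, with script and style contents excluded. A rough sketch, not a rendering-aware audit; real tools differ in what they count:

```python
# Rough text-to-code ratio: visible text chars / total HTML chars.
# Script/style bodies are excluded since they never render as text.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def text_to_code_ratio(html: str) -> float:
    """Fraction of the raw HTML that is visible text (0.0-1.0)."""
    parser = TextExtractor()
    parser.feed(html)
    text = "".join(parser.parts).strip()
    return len(text) / max(len(html), 1)
```

A server-rendered page with real copy typically lands well above the 20% threshold; a JS-only shell lands near zero, which also flags check 5.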

Dimension 6 — Technical (8 checks)

  1. HTTPS-only, no HTTP-to-HTTPS mixed content
  2. Canonical URLs consistent (no http/https or www/non-www split)
  3. hreflang tags for international sites
  4. Server response time under 2s from common datacenter locations
  5. No JavaScript-only rendering for critical content paths
  6. Open Graph meta tags present (og:title, og:description, og:image)
  7. Twitter/X Card meta tags present
  8. AI crawler verification test runs quarterly (curl against all 4 UAs from non-corporate IP)
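Checks 2, 6, and 7 all live in the homepage head. A minimal sketch with placeholder values; the canonical URL, titles, and image path are hypothetical:

```html
<link rel="canonical" href="https://www.example.com/" />
<meta property="og:title" content="Example Corp | Industrial Sensors" />
<meta property="og:description" content="Example Corp builds industrial sensors for harsh environments." />
<meta property="og:image" content="https://www.example.com/og-image.png" />
<meta name="twitter:card" content="summary_large_image" />
```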

Scoring

Each check is binary (pass / fail). Sum the passes, divide by 53, and multiply by 100. Adjust the weighting if you want: in our internal scoring, the Access dimension carries 3× the weight of Technical.
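The scoring rule can be sketched in a few lines. The per-dimension check counts and the 3× Access weight come from this article; the weighted variant is otherwise a plain weighted average:

```python
# Checklist scoring: binary checks, optional per-dimension weights.
# Dimension sizes follow the checklist: 9, 9, 10, 9, 8, 8 (= 53).
DIMENSIONS = {
    "access": 9, "bot_policy": 9, "standards": 10,
    "structured_data": 9, "content": 8, "technical": 8,
}
WEIGHTS = {"access": 3.0}  # every other dimension weighs 1.0

def score(passed: dict, weighted: bool = False) -> float:
    """passed maps dimension name -> number of passing checks."""
    if not weighted:
        return 100 * sum(passed.values()) / sum(DIMENSIONS.values())
    num = sum(WEIGHTS.get(d, 1.0) * passed.get(d, 0) for d in DIMENSIONS)
    den = sum(WEIGHTS.get(d, 1.0) * n for d, n in DIMENSIONS.items())
    return 100 * num / den
```

Passing every check scores 100 under either variant; under the weighted variant, an Access-only site scores noticeably higher than its raw pass count suggests.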

  • 0-10 — Level 0, Invisible. Most Fortune 500 sit here accidentally.
  • 11-25 — Level 1, Present but unoptimized. Chrome works, nothing else.
  • 26-40 — Level 2, Partial. Some JSON-LD, some robots.txt coverage, and likely gaps between declared robots.txt policy and actual WAF behavior.
  • 41-60 — Level 3, Optimized. The 5-hour transformation lands here.
  • 61-80 — Level 4, AI-Ready. Top-25 Fortune 500 tier.
  • 81-100 — Level 5, AI-First. No Fortune 500 currently reaches this.