The AI Visibility Checklist: 53 Data-Backed Checks with Fortune 500 Benchmarks
Published 2026-04-20 · PROGEOLAB Research
The PROGEOLAB 53-Point AI Visibility Checklist is an audit-ready list of checks covering every known signal that determines whether AI answer engines can access, understand, and cite your website's content. Where we have measured data, an item carries a Fortune 500 benchmark: the check passes when you match or exceed the adoption rate of the top-25 AI-ready companies.
Use it as a pre-launch audit, a quarterly review, or a gap analysis against the Fortune 500 leaders. The checklist is grouped into six dimensions. Higher-impact items are listed first within each dimension.
Dimension 1 — Access (9 checks)
- Homepage returns HTTP 200 to Chrome UA — 70.4% of Fortune 500 pass
- Homepage returns HTTP 200 to ChatGPT-User UA — 60% pass
- Homepage returns HTTP 200 to PerplexityBot UA
- Homepage returns HTTP 200 to ClaudeBot UA
- Homepage returns HTTP 200 to Googlebot UA (reverse-DNS verifiable)
- No Layer-2 datacenter IP blocking for known AI crawler IPs — 95.2% pass (24 fail)
- No Layer-3 TLS fingerprinting that rejects non-browser clients — 97% pass (15 fail)
- Server does not return challenge pages (JavaScript-only responses) to AI UAs
- No geographic blocking preventing AI crawlers based in US/EU datacenters from reaching the site
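The user-agent checks in this dimension can be scripted. The following is a minimal sketch, not PROGEOLAB's audit tooling: `check_access` and `fetch_status` are hypothetical names, and the user-agent strings are abbreviated placeholders that should be replaced with the full string each vendor publishes. It only compares HTTP status codes, so it cannot detect IP-range or TLS-fingerprint blocking, which depend on where and how the request originates.

```python
import urllib.error
import urllib.request

# Placeholder user-agent strings (assumption): substitute the full strings
# each vendor publishes before relying on the results.
USER_AGENTS = {
    "Chrome": "Mozilla/5.0 (placeholder Chrome UA)",
    "ChatGPT-User": "ChatGPT-User (placeholder)",
    "PerplexityBot": "PerplexityBot (placeholder)",
    "ClaudeBot": "ClaudeBot (placeholder)",
    "Googlebot": "Googlebot (placeholder)",
}


def fetch_status(url, user_agent, timeout=10):
    """Return the HTTP status for a GET sent with the given UA, or None."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 403, 429, 503 and similar are still an answer
    except OSError:
        return None  # DNS failure, refused connection, or timeout


def check_access(url, user_agents=USER_AGENTS, fetch=fetch_status):
    """Map each crawler name to True when the homepage answers HTTP 200."""
    return {name: fetch(url, ua) == 200 for name, ua in user_agents.items()}
```

Calling `check_access("https://www.example.com/")` returns one boolean per user agent. A site that passes for Chrome but fails for one of the AI user agents is the pattern the Access checks are meant to catch.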
Dimension 2 — Bot Policy (9 checks)
- robots.txt exists and is reachable at /robots.txt
- robots.txt names at least 5 AI crawlers explicitly — 7.5% pass
- robots.txt distinguishes training crawlers from retrieval crawlers — 0% pass (first mover available)
- robots.txt includes GPTBot directive (Allow or Disallow)
- robots.txt includes ChatGPT-User directive
- robots.txt includes ClaudeBot directive
- robots.txt includes Google-Extended directive
- robots.txt declares a sitemap URL
- robots.txt policy matches WAF behavior (no declaration-enforcement gap)
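Several of the Bot Policy checks reduce to reading the `User-agent` and `Sitemap` lines of robots.txt. The sketch below is one way to do that, assuming the file body has already been downloaded as text; `audit_robots` is a hypothetical helper, and the crawler set contains only the names that appear in this checklist.

```python
# AI crawler tokens taken from this checklist; extend the set as needed.
AI_CRAWLERS = {
    "GPTBot", "ChatGPT-User", "ClaudeBot", "Google-Extended", "PerplexityBot",
}


def audit_robots(text, crawlers=AI_CRAWLERS):
    """Report which AI crawlers robots.txt names and whether it lists a sitemap."""
    wanted = {name.lower(): name for name in crawlers}
    named = set()
    has_sitemap = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent" and value.lower() in wanted:
            named.add(wanted[value.lower()])
        elif field == "sitemap" and value:
            has_sitemap = True
    return {
        "named_crawlers": sorted(named),
        "names_at_least_5": len(named) >= 5,
        "declares_sitemap": has_sitemap,
    }
```

This does not evaluate whether a crawler is allowed or disallowed, and it cannot verify that the declared policy matches WAF behavior; that last check needs live requests against the site.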
Dimension 3 — Standards (10 checks)
- llms.txt exists at domain root and is reachable — 2.8% pass (body-validated)
- llms.txt contains at least 20 URLs organized in sections
- llms.txt Content-Type is text/plain or text/markdown (not HTML)
- llms.txt is reachable to ChatGPT-User specifically (not just Chrome)
- security.txt follows RFC 9116 format — 15% pass
- security.txt includes Contact and Expires fields
- security.txt includes PGP-signed URL (optional but signals maturity)
- sitemap.xml reachable and current
- ads.txt present if running programmatic advertising
- No soft-404 pages at AI-standard paths (ai.txt, agents.json, mcp.json)
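The llms.txt and security.txt checks can be approximated with plain text parsing. This sketch assumes llms.txt organizes its URLs as Markdown links under `## ` section headings, and that the response body and Content-Type header are already in hand; `audit_llms_txt` and `audit_security_txt` are hypothetical helper names. A Content-Type of text/html on the llms.txt path usually indicates a soft 404 rather than a real file.

```python
import re

# Matches Markdown links such as [Docs](https://example.com/docs).
MARKDOWN_LINK = re.compile(r"\[[^\]]*\]\((https?://[^)\s]+)\)")


def audit_llms_txt(body, content_type):
    """Check link count, section headings, and a non-HTML media type."""
    urls = MARKDOWN_LINK.findall(body)
    sections = [ln for ln in body.splitlines() if ln.startswith("## ")]
    media_type = content_type.split(";", 1)[0].strip().lower()
    return {
        "has_20_urls": len(urls) >= 20,
        "has_sections": len(sections) >= 1,
        "plain_or_markdown": media_type in {"text/plain", "text/markdown"},
    }


def audit_security_txt(body):
    """Check that security.txt carries the Contact and Expires fields."""
    fields = set()
    for line in body.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and ":" in line:
            fields.add(line.split(":", 1)[0].strip().lower())
    return {"has_contact": "contact" in fields, "has_expires": "expires" in fields}
```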
Dimension 4 — Structured Data (9 checks)
- JSON-LD present on homepage — 24.4% pass
- JSON-LD uses Corporation type (or more specific) not generic Organization
- JSON-LD includes legalName, url, logo fields
- JSON-LD includes numberOfEmployees — 8.7% pass
- JSON-LD includes foundingDate
- JSON-LD includes sameAs array
- sameAs includes Wikidata QID URL — 0.6% pass (Apple, Comcast, Repsol)
- sameAs includes Wikipedia entity URL
- sameAs includes verified social media profiles (LinkedIn, X)
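To make the Structured Data checks concrete, here is a hedged sketch of a JSON-LD document that would satisfy them, plus a small checker. The company, its URLs, and the Wikidata QID are all illustrative placeholders, and `audit_jsonld` and `render_script_tag` are hypothetical helpers. The type check accepts only the literal `Corporation`; more specific subtypes would need to be added by hand.

```python
import json
import re

WIKIDATA_URL = re.compile(r"^https?://(www\.)?wikidata\.org/(wiki|entity)/Q\d+$")

# Hypothetical company: every value below is illustrative, including the QID.
EXAMPLE_JSONLD = {
    "@context": "https://schema.org",
    "@type": "Corporation",
    "legalName": "Example Corporation",
    "url": "https://www.example.com/",
    "logo": "https://www.example.com/logo.png",
    "numberOfEmployees": {"@type": "QuantitativeValue", "value": 1200},
    "foundingDate": "1998-05-01",
    "sameAs": [
        "https://www.wikidata.org/wiki/Q00000",
        "https://en.wikipedia.org/wiki/Example_Corporation",
        "https://www.linkedin.com/company/example-corporation",
    ],
}


def as_list(value):
    """Normalize a JSON-LD value that may be a string or a list."""
    return [value] if isinstance(value, str) else list(value)


def audit_jsonld(doc):
    """Evaluate the structured-data checks against one JSON-LD object."""
    types = as_list(doc.get("@type", []))
    same_as = as_list(doc.get("sameAs", []))
    return {
        "corporation_type": "Corporation" in types,
        "core_fields": all(k in doc for k in ("legalName", "url", "logo")),
        "number_of_employees": "numberOfEmployees" in doc,
        "founding_date": "foundingDate" in doc,
        "same_as_present": len(same_as) > 0,
        "wikidata": any(WIKIDATA_URL.match(u) for u in same_as),
        "wikipedia": any("wikipedia.org/wiki/" in u for u in same_as),
    }


def render_script_tag(doc):
    """Serialize the object for embedding in the homepage HTML."""
    return '<script type="application/ld+json">' + json.dumps(doc) + "</script>"
```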
Dimension 5 — Content (8 checks)
- Homepage title tag follows Brand + Descriptor format — brand-only titles fail the check
- Homepage meta description present and 120-160 chars — 148 Fortune 500 missing meta descriptions
- Homepage has semantic H1 with company name or value proposition
- Homepage text-to-code ratio exceeds 20% — Fortune 500 average 34%
- Content is server-rendered, not JS-only — 9 Fortune 500 homepages fail
- Pricing, availability, or specs are in HTML (not just images or PDFs)
- Product documentation uses H2/H3 hierarchy extractable by AI
- FAQ pages use FAQPage schema (if FAQs exist)
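Some of the Content checks can be run against the raw, unrendered HTML, which is also what a non-JavaScript crawler sees. The sketch below uses only the standard library. It assumes "text-to-code ratio" means visible text characters divided by total HTML characters, which is one common definition and may not match the one behind the benchmark; `HomepageAudit` and `audit_homepage` are hypothetical names.

```python
from html.parser import HTMLParser


class HomepageAudit(HTMLParser):
    """Collect the title, meta description, H1 count, and visible text size."""

    SKIPPED = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title = ""
        self.description = None
        self.h1_count = 0
        self.text_chars = 0
        self._stack = []  # open tags whose text is not visible body text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""
        elif tag == "h1":
            self.h1_count += 1
        if tag in self.SKIPPED or tag == "title":
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        if self._stack and self._stack[-1] == "title":
            self.title += data
        elif not self._stack:
            self.text_chars += len(data.strip())


def audit_homepage(html):
    """Apply the title, description, H1, and text-ratio checks to raw HTML."""
    parser = HomepageAudit()
    parser.feed(html)
    parser.close()
    desc = parser.description
    ratio = parser.text_chars / len(html) if html else 0.0
    return {
        "title": parser.title.strip(),
        "description_ok": desc is not None and 120 <= len(desc) <= 160,
        "has_h1": parser.h1_count >= 1,
        "text_ratio_ok": ratio > 0.20,
    }
```

Whether a title follows the Brand + Descriptor format still needs a human or a brand-name lookup, so the sketch returns the title rather than judging it.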
Dimension 6 — Technical (8 checks)
- HTTPS-only, with no mixed content (HTTP resources loaded on HTTPS pages)
- Canonical URLs consistent (no http/https or www/non-www split)
- hreflang tags for international sites
- Server response time under 2s from common datacenter locations
- No JavaScript-only rendering for critical content paths
- Open Graph meta tags present (og:title, og:description, og:image)
- Twitter/X Card meta tags present
- AI crawler verification test runs quarterly (curl against all 4 UAs from non-corporate IP)
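The head-tag checks in this dimension (canonical URL, Open Graph, Twitter/X Card) can be sketched in the same style. This is an illustrative example, not a complete validator: `HeadTags` and `audit_head` are hypothetical names, and `expected_origin` is the scheme and host you consider canonical, supplied by the caller.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

REQUIRED_OG = ("og:title", "og:description", "og:image")


class HeadTags(HTMLParser):
    """Collect meta tags (by property or name) and the canonical link."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            if key:
                self.meta[key.lower()] = attrs.get("content") or ""
        elif tag == "link":
            rel = (attrs.get("rel") or "").lower().split()
            if "canonical" in rel:
                self.canonical = attrs.get("href")


def origin(url):
    """Return (scheme, host) so http/https and www/non-www splits show up."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc.lower()


def audit_head(html, expected_origin):
    """Check Open Graph tags, the Twitter/X Card tag, and the canonical origin."""
    parser = HeadTags()
    parser.feed(html)
    parser.close()
    return {
        "open_graph": all(parser.meta.get(key) for key in REQUIRED_OG),
        "twitter_card": bool(parser.meta.get("twitter:card")),
        "canonical_consistent": parser.canonical is not None
        and origin(parser.canonical) == origin(expected_origin),
    }
```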
Scoring
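Each check is binary (pass or fail). Sum the passes, divide by 53, and multiply by 100 to get a score from 0 to 100, then round to the nearest whole number to place it in one of the bands below. You can also weight the dimensions: on our internal scoring, the Access dimension carries 3× the weight of Technical.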
- 0-10 — Level 0, Invisible. Most Fortune 500 companies sit here by accident.
- 11-25 — Level 1, Present but unoptimized. Chrome works, nothing else.
- 26-40 — Level 2, Partial. Some JSON-LD, some robots.txt, robots.txt-WAF contradictions likely.
- 41-60 — Level 3, Optimized. The 5-hour transformation lands here.
- 61-80 — Level 4, AI-Ready. Top-25 Fortune 500 tier.
- 81-100 — Level 5, AI-First. No Fortune 500 company currently reaches this.
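The scoring rule and level bands can be sketched as follows. The per-dimension check counts and band boundaries come from this checklist; the function names are hypothetical, and the optional weights are an assumption. The text only states that Access counts 3× as much as Technical, so any dimension not given a weight defaults to 1 here.

```python
CHECKS_PER_DIMENSION = {
    "Access": 9,
    "Bot Policy": 9,
    "Standards": 10,
    "Structured Data": 9,
    "Content": 8,
    "Technical": 8,
}

# (upper bound of band, label), in ascending order.
LEVELS = [
    (10, "Level 0, Invisible"),
    (25, "Level 1, Present but unoptimized"),
    (40, "Level 2, Partial"),
    (60, "Level 3, Optimized"),
    (80, "Level 4, AI-Ready"),
    (100, "Level 5, AI-First"),
]


def score(passes, weights=None):
    """Return a rounded 0-100 score from per-dimension pass counts."""
    weights = weights or {}
    earned = sum(
        passes.get(dim, 0) * weights.get(dim, 1) for dim in CHECKS_PER_DIMENSION
    )
    possible = sum(
        count * weights.get(dim, 1) for dim, count in CHECKS_PER_DIMENSION.items()
    )
    return round(100 * earned / possible)


def level(score_value):
    """Map a 0-100 score to its band label."""
    for upper, label in LEVELS:
        if score_value <= upper:
            return label
    raise ValueError("score must be between 0 and 100")
```

For example, a site passing 9 Access, 3 Bot Policy, 2 Standards, 3 Structured Data, 5 Content, and 6 Technical checks has 28 of 53 passes, which is a score of 53 (Level 3). With `weights={"Access": 3}` the same site scores 65 (Level 4), which shows how much a weighting choice can move a result across a band boundary.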