The Complete Guide to AI Visibility: What 134,000 Probes Reveal About Enterprise Readiness
Published 2026-04-20 · PROGEOLAB Research
Ten numbers that define enterprise AI visibility in 2026
The Fortune Global 500 AI Accessibility audit — 134,000 HTTP probes across 500 companies with four user agents between April 16 and April 19, 2026 — produces a stark picture of how the world's largest enterprises present themselves to AI answer engines. Ten numbers summarise the findings:
- 265 companies (53%) are accessible to ChatGPT-User
- 53 companies (10.6%) are in the GEO Visibility Gap — serving browsers but blocking AI
- 148 companies (29.6%) are unreachable by any automated client
- 20 of 267 robots.txt files (7.5%) name any AI bot
- 14 companies (2.8%) have a real, body-validated llms.txt
- 3 companies (0.6%) link their JSON-LD to Wikidata: Apple, Comcast, Repsol
- 75 companies (15%) have a real security.txt
- 160 companies (32%) serve soft-404 pages that inflate every adoption metric
- Nvidia scores 10.5 / 12 on the AI-Readiness Index — the highest of any Fortune 500 company
- Zero companies distinguish between training and retrieval crawlers in robots.txt
These numbers tell a story of massive untapped potential. The majority of Fortune 500 companies have not made a deliberate decision about their AI visibility — they have inherited default configurations that accidentally determine how AI systems represent them to hundreds of millions of users.
What is AI visibility (GEO)?
AI visibility — also called Generative Engine Optimization (GEO) — is the degree to which a website's content is accessible to, understandable by, and citable by AI answer engines like ChatGPT, Claude, Perplexity, and Google's AI Overviews. It is the AI-era counterpart to SEO, but with critical technical differences.
SEO governs how a page ranks in a list of search results. GEO governs whether a page's content appears at all — as a citation, an extract, or a factual claim — inside an AI-generated answer. If a crawler cannot access the page, the brand's current content cannot inform the answer. If the markup doesn't disambiguate the entity, the AI may conflate it with a similarly-named company. If the page structure is hostile to extraction, competitors whose markup is cleaner will be cited instead.
AI visibility is the infrastructure layer of the content; GEO strategy is the editorial and governance work built on top of that infrastructure.
The audit: 67 probes, 4 user agents, 500 companies
Between April 16 and April 19, 2026, PROGEOLAB probed the primary corporate domain of every company on the Fortune Global 500 list. Each domain was hit with 67 HTTP endpoint requests covering home page, robots.txt, sitemap, llms.txt, JSON-LD-bearing paths, security.txt, ads.txt, and a long tail of AI-standard files including ai.txt, agents.json, and MCP discovery endpoints.
Each probe was run four times, once per user agent:
- Research bot — an honest, unknown UA identifying itself. The baseline.
- Googlebot — tests how sites treat the most established crawler. Not from a Google IP.
- Chrome — tests upper-bound accessibility from a real-browser UA string.
- ChatGPT-User — the AI retrieval crawler. The signal we most care about.
500 companies × 67 probes × 4 user agents = 134,000 individual HTTP requests. All response bodies were captured and analysed for WAF signatures, soft-404 markers, JSON-LD, and content validity. Methodology detail is in the flagship report.
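The probe matrix is plain combinatorics: the Cartesian product of domains, endpoints, and user agents. A minimal sketch, with placeholder domains, endpoint paths, and UA strings standing in for the audit's real lists:

```python
from itertools import product

def build_probe_matrix(domains, endpoints, user_agents):
    """Yield one (domain, endpoint, user_agent) tuple per HTTP probe."""
    return list(product(domains, endpoints, user_agents))

# Illustrative scale check: 500 domains x 67 endpoints x 4 UAs = 134,000 probes.
domains = [f"company-{i}.example" for i in range(500)]          # placeholder domains
endpoints = ["/", "/robots.txt", "/llms.txt"] + [f"/path{i}" for i in range(64)]
user_agents = ["ResearchBot/1.0", "Googlebot/2.1", "Chrome/124", "ChatGPT-User/1.0"]

probes = build_probe_matrix(domains, endpoints, user_agents)
print(len(probes))  # 134000
```

The real audit additionally captured and stored every response body for WAF-signature and soft-404 analysis; this sketch only enumerates the request plan.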
The Three Contradictions
The single most important finding from synthesising all 26 research deliverables is a pattern of organisational disconnect. Three types of contradiction recur across the Fortune 500.
Contradiction 1 — Policy vs Enforcement
Cisco allows GPTBot, Google-Extended, and CCBot in its robots.txt — a deliberate content-team decision. But Cisco's Akamai WAF blocks ChatGPT-User completely. Goldman Sachs allows GPTBot and ChatGPT-User in robots.txt but blocks both at the WAF. The content team has declared a policy; the security team has not implemented it. This contradiction exists because robots.txt and WAF rules are managed by different teams that do not coordinate on AI-crawler policy.
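This class of contradiction is mechanically detectable: parse the published robots.txt verdict for a bot, then compare it against the status the edge actually returns to that bot's user agent. A minimal sketch using Python's standard-library robots parser; the robots.txt content and the 403 status below are illustrative, not any specific company's:

```python
from urllib.robotparser import RobotFileParser

def policy_allows(robots_txt: str, user_agent: str, path: str = "/") -> bool:
    """Return the robots.txt verdict for a given user agent and path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)

# Illustrative policy: robots.txt explicitly allows the AI retrieval bot...
robots_txt = """User-agent: ChatGPT-User
Allow: /

User-agent: *
Disallow: /private/
"""
waf_status = 403  # ...but the (simulated) WAF answers that same UA with 403.

allowed_by_policy = policy_allows(robots_txt, "ChatGPT-User")
blocked_by_waf = waf_status in (401, 403)
if allowed_by_policy and blocked_by_waf:
    print("CONTRADICTION: robots.txt allows ChatGPT-User, WAF blocks it")
```

Running this check per named AI bot, per domain, is how a policy-vs-enforcement audit scales from one anecdote to the whole Fortune 500.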
Contradiction 2 — Content vs Access
Salesforce has 205 curated links in its llms.txt — one of the most comprehensive in the Fortune 500. But Salesforce blocks ChatGPT-User entirely. The content directory exists for an audience that cannot read it. Maersk similarly has 54 llms.txt links but blocks AI crawlers. The marketing team that creates llms.txt and the security team that configures the WAF operate independently.
Contradiction 3 — Identity vs Resolution
122 Fortune 500 companies have JSON-LD on their homepage — a meaningful investment in structured data. But 119 of those 122 (97.5%) omit a Wikidata sameAs link — the single property that enables AI to unambiguously identify which real-world entity the website represents. The SEO team implements JSON-LD for Google rich snippets; the AI entity-disambiguation use case goes unconsidered.
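The missing property is small. A sketch of the Organization markup with the sameAs array included, built and serialised in Python; the company name, URLs, and Wikidata ID are placeholders, not any real company's:

```python
import json

# Hypothetical Corporation markup. The sameAs array is what lets an AI system
# resolve the site to one unambiguous real-world entity, not just a name string.
org = {
    "@context": "https://schema.org",
    "@type": "Corporation",
    "name": "Example Corp",
    "url": "https://www.example.com",
    "sameAs": [
        "https://www.wikidata.org/wiki/Q0000000",        # placeholder Wikidata entity
        "https://en.wikipedia.org/wiki/Example_Corp",    # placeholder Wikipedia page
        "https://www.linkedin.com/company/example-corp", # placeholder social profile
    ],
}

# This JSON is what goes inside a <script type="application/ld+json"> tag.
print(json.dumps(org, indent=2))
```

The delta from the 119 companies' existing markup is typically three lines: the sameAs array itself.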
These contradictions share a root cause: AI visibility crosses organisational boundaries. No single team owns the full stack. Content, SEO, security, and infrastructure each control one layer — and none reviews the others' AI impact.
The access layer: who can reach your site?
Access is the first gate. No subsequent signal matters if the crawler cannot fetch the response body. AI accessibility across the Fortune 500 decomposes into four buckets: fully open (265 companies serve every user agent), partially open (53 in the GEO gap), unreachable by all (148), and edge cases (34 with inconsistent behaviour).
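Under the simplifying assumption that each user agent's HTTP status is the only signal (the real audit also validates bodies), the four buckets can be sketched as a classification over the per-UA probe results:

```python
def classify_access(status_by_ua: dict) -> str:
    """Map per-user-agent HTTP statuses onto the four audit buckets.

    Simplified sketch: any 2xx counts as 'served', everything else as
    'blocked'. Keys mirror the audit's four UAs.
    """
    served = {ua for ua, status in status_by_ua.items() if 200 <= status < 300}
    everyone = {"research", "googlebot", "chrome", "chatgpt"}
    if served >= everyone:
        return "fully open"
    if "chrome" in served and "chatgpt" not in served:
        return "geo gap"        # serves browsers but blocks AI retrieval
    if not served:
        return "unreachable"
    return "edge case"          # inconsistent behaviour across UAs

print(classify_access({"research": 200, "googlebot": 200, "chrome": 200, "chatgpt": 200}))  # fully open
print(classify_access({"research": 403, "googlebot": 403, "chrome": 200, "chatgpt": 403}))  # geo gap
print(classify_access({"research": 403, "googlebot": 403, "chrome": 403, "chatgpt": 403}))  # unreachable
```

Anything that falls through to "edge case" here corresponds to the 34 companies with inconsistent behaviour.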
The access layer is controlled by three independent mechanisms — UA string inspection, IP reputation, and TLS fingerprinting — each a separate countermeasure. The three-layers research covers the technical detail. The practical implication is that restoring AI access requires coordinated change across WAF rules, network-level IP policy, and potentially vendor-level TLS handling. One team's fix doesn't cover the others' blocks.
The standards layer: files that tell AI who you are
Access is necessary but not sufficient. A reachable page still requires markup that AI can extract cleanly. The standards layer comprises four signals:
- robots.txt AI directives — an explicit Allow or Disallow for named AI bots. Only 20 of 267 Fortune 500 robots.txt files do this.
- llms.txt — a Markdown file at the root of the domain that curates URLs for AI systems. 14 Fortune 500 companies have a real one; 339 serve soft-404 placeholders that falsely suggest they do.
- JSON-LD structured data — Schema.org markup on the homepage. 122 companies publish it.
- sameAs links to Wikidata — the entity-disambiguation property. 3 companies include it.
Together these four signals form what PROGEOLAB calls the GEO Maturity Model: L0 (Invisible, no signals) through L5 (AI-First, all signals plus AI-specific content curation).
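The soft-404 problem means an HTTP 200 at /llms.txt is not evidence of adoption; the body has to be validated, which is why only 14 of the 353 responses count as real. A minimal heuristic sketch; the markers below are illustrative, not the audit's exact ruleset:

```python
def looks_like_real_llms_txt(status: int, body: str) -> bool:
    """Heuristic body validation for /llms.txt.

    A real llms.txt is Markdown: an H1 title plus a curated link list.
    A soft-404 returns 200 but serves an HTML error or landing page instead.
    """
    if status != 200:
        return False
    text = body.strip()
    if text.lower().startswith(("<!doctype", "<html")):
        return False                       # HTML served at a .txt path
    soft_404_markers = ("page not found", "does not exist", "error 404")
    if any(m in text.lower() for m in soft_404_markers):
        return False
    has_h1 = text.startswith("# ")
    has_links = "](" in text               # at least one Markdown link
    return has_h1 and has_links

print(looks_like_real_llms_txt(200, "# Example Corp\n\n- [Docs](https://example.com/docs)\n"))  # True
print(looks_like_real_llms_txt(200, "<!DOCTYPE html><title>Page not found</title>"))            # False
```

The same status-plus-body discipline applies to every file in the standards layer: a 200 alone proves only that the server answered.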
Industry patterns: who blocks, who leads, who surprises
Industry-level patterns diverge sharply. Telecommunications blocks ChatGPT at 26.7% — the highest rate of any sector. Software sits at 50% (small sample). Banking is unexpectedly open at 5.3%. Pharma sits at 14% — above average but below telecom. Automotive produces the widest within-sector range (Volkswagen scores 8, Tesla 0). Metals, trading, and energy sectors show zero AI-specific blocking.
The counterintuitive finding: tech infrastructure companies dominate the top-25 AI-ready ranking (Nvidia, Dell, HP, Apple, Samsung, Alphabet) even while the broader software sector blocks AI most aggressively. Companies that sell AI infrastructure invest in AI visibility; companies that sell software products block it. The AI Irony explores this split.
The 5-hour transformation: from L0 to L3
Moving a company from L0 (no AI visibility signals) to L3 (Optimized) requires approximately five hours of implementation. A step-by-step plan:
| Time | Action | Owner |
|---|---|---|
| Hr 1 (30 min) | Allow ChatGPT-User + PerplexityBot in WAF rules. Test with curl. | Security |
| Hr 1 (30 min) | Add the training-retrieval split robots.txt template. Name 10+ AI bots. | SEO |
| Hr 2 (60 min) | Create llms.txt with H1 + 20+ links to key content, organised by section. | Content |
| Hr 3 (30 min) | Add JSON-LD Corporation with sameAs to Wikidata, Wikipedia, social profiles. | SEO |
| Hr 3 (30 min) | Update title tag to brand + descriptor. Rewrite meta description to 120-160 chars. | SEO |
| Hr 4 (15 min) | Publish RFC 9116 security.txt with Contact + Expires fields. | Security |
| Hr 4 (45 min) | Verify: test every file with ChatGPT-User UA. Validate JSON-LD. Check for contradictions. | Verify |
| Hr 5 (60 min) | Document the AI visibility policy. Brief security team. Schedule quarterly review. | Ops |
These eight steps move a company from L0 to L3 (Optimized). The first Fortune 500 company to complete all of them is differentiated from 97% of peers. The 53-Point AI Visibility Checklist is the implementation-ready expansion of this roadmap.
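The training-retrieval split from Hour 1 — the distinction zero Fortune 500 robots.txt files currently make — can be sketched as a generated template. The grouping below follows the document's own framing (GPTBot, Google-Extended, and CCBot as training crawlers; ChatGPT-User and PerplexityBot as retrieval crawlers), but bot tokens and their purposes change, so verify each vendor's current documentation before deploying:

```python
# Hypothetical policy: disallow crawlers that harvest training data,
# allow crawlers that fetch pages to answer a live user's question.
TRAINING_BOTS = ["GPTBot", "Google-Extended", "CCBot", "ClaudeBot"]
RETRIEVAL_BOTS = ["ChatGPT-User", "PerplexityBot"]

def render_robots_txt(blocked, allowed) -> str:
    """Render a robots.txt that splits AI training from AI retrieval."""
    lines = ["# AI crawler policy: training blocked, retrieval allowed", ""]
    for bot in blocked:
        lines += [f"User-agent: {bot}", "Disallow: /", ""]
    for bot in allowed:
        lines += [f"User-agent: {bot}", "Allow: /", ""]
    lines += ["User-agent: *", "Disallow: /private/"]
    return "\n".join(lines)

print(render_robots_txt(TRAINING_BOTS, RETRIEVAL_BOTS))
```

Naming each bot explicitly, rather than relying on the wildcard group, is what makes the policy legible to the crawlers themselves and to the next audit.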
Two case studies: the best and the biggest blocker
Volkswagen operates the second-largest llms.txt in the Fortune 500 — 198 curated links organized into Models, Shopping Tools, Owners, Financial Services, and Newsroom sections, with 65% of links dedicated to owner resources. VW is accessible across all four user agents, has JSON-LD structured data, and runs F5 BIG-IP configured to permit AI crawlers. Its AI Readiness score is 9 — higher than most tech companies. A traditional German automaker, founded in 1937, has out-invested Salesforce, Oracle, and IBM on AI visibility. The full Volkswagen case study walks through the llms.txt structure line by line.
Amazon is the opposite — and useful for a different reason. Amazon's robots.txt blocks 47 bots by name, including 16 distinct AI crawlers (GPTBot, ChatGPT-User, ClaudeBot, PerplexityBot, CCBot, Amazonbot itself, and 10 others). The block is not accidental; it is the most deliberate and technically complete AI blocking policy in the Fortune 500. Amazon's choice is coherent: protect the purchase funnel, accept the narrative cost. Whether that choice is right for your company depends on whether your revenue model is transactional (Amazon-like) or content-authority-driven (pharma, finance, professional services).
The research library
The research behind this pillar is published in full across 26 PROGEOLAB deliverables. Each page cites this pillar; this pillar cites each of them. The library below is organised by role and theme.