The Fortune 500 Homepage Is 34% Text and 66% Code: What AI Actually Reads
Published 2026-04-20 · PROGEOLAB Research
When an AI crawler fetches a Fortune 500 homepage, the response is a mix of structural markup, JavaScript, CSS, tracking scripts, analytics tags, and, somewhere in the middle, actual text content. The text-to-HTML ratio measures how much of the page is human-readable content versus machine infrastructure. For AI systems that don't execute JavaScript, the ratio in that initial response determines what the model can actually read.
Across 400 accessible Fortune 500 homepages in our April 2026 audit, the mean text-to-HTML ratio is 34%. On average, two-thirds of the bytes an AI crawler receives are not content. The distribution is heavy-tailed: most homepages cluster in the 25-45% range, but a long tail of near-empty pages sits below 10%.
The distribution
- Top decile — text ratio above 50%. Server-rendered, content-heavy homepages. Newsrooms, investor-relations pages, companies that treat the homepage as a content surface rather than a brand stage
- Mean — 34%. The typical Fortune 500 homepage is marketing chrome with embedded content blocks
- Bottom decile — text ratio below 15%. Heavy JavaScript apps where the initial HTML is a content shell waiting to be hydrated client-side
- 9 homepages with text ratio under 1% — fully JavaScript-rendered, no server-side text content at all
The 9 JavaScript-only homepages
Nine Fortune 500 homepages return an initial HTML payload that contains no meaningful text content. The entire visible page is constructed by JavaScript executing in the browser after the initial request. For a user with a modern browser, this is invisible — the page renders normally. For an AI crawler that does not execute JavaScript — which is most of them — the response is effectively blank.
Eight of the nine JS-only homepages are in the GEO Visibility Gap. ChatGPT-User does not execute JavaScript in its retrieval mode. Googlebot does; Google-Extended is a robots.txt control token rather than a separate crawler, so it has no rendering behavior of its own. Perplexity and Anthropic retrieval crawlers behave like ChatGPT-User: server-rendered HTML only. A JS-only homepage effectively opts out of AI-mediated discovery for most AI systems.
What to measure
Two simple measurements reveal the text-ratio risk:
- Initial HTML text count. Run `curl` against the homepage with a non-browser UA. Strip HTML tags. Count words. If fewer than 300 words are in the initial response, AI crawlers see almost nothing.
- Ratio calculation. Divide the stripped-text byte length by the total response byte length. Below 20% means most of the page is infrastructure; below 10% means AI is reading empty chrome.
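Both measurements can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch, not the audit's actual tooling: the names `measure_html`, `risk_flags`, and `fetch`, the user-agent string, and the choice to collapse whitespace before counting bytes are all assumptions. The 300-word, 20%, and 10% thresholds are the ones quoted above.

```python
from html.parser import HTMLParser
import urllib.request


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def measure_html(html_bytes, encoding="utf-8"):
    """Return (word_count, text_ratio) for one raw HTML response body."""
    parser = _TextExtractor()
    parser.feed(html_bytes.decode(encoding, errors="replace"))
    parser.close()
    # Assumption: whitespace runs are collapsed so indentation is not counted as text.
    text = " ".join(" ".join(parser.chunks).split())
    word_count = len(text.split())
    ratio = len(text.encode(encoding)) / len(html_bytes) if html_bytes else 0.0
    return word_count, ratio


def risk_flags(word_count, ratio):
    """Apply the thresholds quoted in the text: 300 words, 20%, 10%."""
    flags = []
    if word_count < 300:
        flags.append("under_300_words")
    if ratio < 0.10:
        flags.append("ratio_below_10_percent")
    elif ratio < 0.20:
        flags.append("ratio_below_20_percent")
    return flags


def fetch(url, user_agent="text-ratio-audit/0.1"):
    """Fetch the initial HTML without executing JavaScript (hypothetical UA string)."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read()
```

For a single page, `measure_html(fetch("https://example.com/"))` returns the word count and ratio, and `risk_flags` maps them onto the thresholds. Because nothing here runs JavaScript, the numbers describe only the initial response, which is what a non-rendering crawler receives.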
Fix: server-side render at least the homepage's hero content, value proposition, and primary navigation. JavaScript hydration for interactivity is fine; JavaScript rendering for content discovery is a silent AI-visibility block. The text-to-code ratio is check #41 in the 53-point checklist, and the companion title/meta audit covers the related homepage-hygiene checks.
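One way to confirm the fix took effect is to inspect the initial HTML for the elements that should now be server-rendered. The sketch below uses rough proxies that are assumptions, not part of the checklist: a non-empty `<h1>` stands in for hero content, and at least three `<a href>` links inside a `<nav>` element stand in for primary navigation. The names `SSRCheck` and `check_server_rendered` and the `min_nav_links` default are hypothetical.

```python
from html.parser import HTMLParser


class SSRCheck(HTMLParser):
    """Records <h1> text and counts links inside <nav> in un-rendered HTML."""

    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self._nav_depth = 0
        self.h1_text = ""
        self.nav_links = 0

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
        elif tag == "nav":
            self._nav_depth += 1
        elif tag == "a" and self._nav_depth > 0:
            # Count only anchors that actually carry a non-empty href.
            if any(name == "href" and value for name, value in attrs):
                self.nav_links += 1

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False
        elif tag == "nav" and self._nav_depth > 0:
            self._nav_depth -= 1

    def handle_data(self, data):
        if self._in_h1:
            self.h1_text += data


def check_server_rendered(html, min_nav_links=3):
    """Heuristic check (assumed proxies): headline and navigation present before JS runs."""
    parser = SSRCheck()
    parser.feed(html)
    parser.close()
    return {
        "has_headline": bool(parser.h1_text.strip()),
        "has_navigation": parser.nav_links >= min_nav_links,
    }
```

A page that passes both checks can still hide its value proposition behind JavaScript, so this is best read as a quick smoke test to run alongside the word count and ratio, not a replacement for them.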