Tech Giants Blocking AI: The Industry That Built It, Blocks It Most
Published 2026-04-20 · PROGEOLAB Research
The tech-giant AI-blocking irony is the pattern in which the companies that build AI infrastructure or sell software products are the most aggressive at blocking AI crawlers. Across the Fortune 500, software is the sector with the highest AI-blocking rate — 50% in our (small) sample — while the companies that supply AI training compute (Nvidia, Dell) are among the most AI-visible. The industry that built it, blocks it most.
Across the 15 Fortune 500 technology companies sampled in our audit, the distribution is stark; nine representatives:
| Company | Chrome endpoints | ChatGPT endpoints | robots.txt AI policy | Score |
|---|---|---|---|---|
| Dell | 58 | 59 | Allow (1 bot) | 10 |
| Apple | 14 | 14 | — | 8 |
| HP | 8 | 7 | Allow (10 bots) | 8 |
| SAP | 2 | 2 | Allow (13 bots) | 7 |
| Meta | 2 | 14 | Allow (8 bots) | 5 |
| Salesforce | 13 | 0 | None | 2 |
| IBM | 12 | 0 | None | 2 |
| Amazon | — | — | Block (16 bots) | 1 |
| Oracle | 13 | 0 | None | 0 |
Amazon: the most thorough block
Amazon's robots.txt is 5,888 bytes with 48 User-agent sections. Sixteen of those sections name AI-specific crawlers, every one with Disallow: /. No other Fortune 500 company approaches this thoroughness. Amazon also has no JSON-LD, no llms.txt, no sameAs links — the AI-visibility score is 1, and the single point reflects that amazon.com remains accessible to Chrome.
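A count like this can be reproduced mechanically. A minimal sketch, assuming an illustrative bot list and an illustrative robots.txt fragment (not Amazon's actual file — a real audit would fetch the live file from the domain):

```python
# Count robots.txt User-agent sections that name AI crawlers
# and fully disallow them (Disallow: /). The AI_BOTS list and
# the sample file below are illustrative assumptions.
AI_BOTS = {"gptbot", "chatgpt-user", "claudebot", "perplexitybot",
           "google-extended", "ccbot", "bytespider"}

def ai_block_sections(robots_txt: str) -> list[str]:
    """Return the AI user-agents that are blocked with Disallow: /."""
    blocked, current = [], None
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            current = value.lower()
        elif field == "disallow" and value == "/" and current in AI_BOTS:
            blocked.append(current)
    return blocked

sample = """
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Googlebot
Disallow: /private
"""
print(ai_block_sections(sample))  # ['gptbot', 'claudebot']
```

Note that a partial block (Googlebot's Disallow: /private above) is deliberately not counted; Amazon's AI sections are distinctive precisely because all sixteen disallow the entire site.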
The strategic logic is internally consistent: Amazon's competitive advantage is the transaction, not the citation. When AI extracts and redistributes pricing, reviews, and availability, consumers may get answers without visiting Amazon. Every AI-mediated comparison that doesn't end with a click to amazon.com is a potential lost transaction.
Salesforce: the 205-link content directory nobody can read
Salesforce maintains one of the largest llms.txt files in the Fortune 500 — 206 lines, 205 curated links organized across product, developer, and enterprise documentation. The file was body-validated in our llms.txt adoption audit. Yet Salesforce blocks ChatGPT-User entirely (0 of 13 endpoints return content). The content directory exists for an audience that cannot read it.
This is the sharpest example of the Content-Access Contradiction documented in the pillar guide: marketing invested in AI-specific content; security blocked AI access; no cross-team review caught the conflict.
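The contradiction is detectable automatically once an audit has both signals: what the published policy allows and what the server actually serves. A minimal sketch with made-up audit results (the policy and observation dicts are illustrative, not Salesforce's data):

```python
# Flag the Content-Access Contradiction: bots that the published
# robots.txt policy invites in but that the WAF turns away at the
# HTTP layer. Both input dicts are illustrative assumptions.
def contradictions(robots_allows: dict[str, bool],
                   http_ok: dict[str, bool]) -> list[str]:
    """Bots allowed on paper but blocked in practice."""
    return [bot for bot, allowed in robots_allows.items()
            if allowed and not http_ok.get(bot, False)]

policy = {"GPTBot": True, "ChatGPT-User": True, "CCBot": False}
observed = {"GPTBot": True, "ChatGPT-User": False, "CCBot": False}
print(contradictions(policy, observed))  # ['ChatGPT-User']
```

A cross-team review amounts to running exactly this comparison before shipping either the robots.txt or the WAF rule.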
Meta: allow the crawlers, break the server
Meta's robots.txt explicitly allows 8 AI crawlers (GPTBot, ChatGPT-User, ClaudeBot, PerplexityBot, Google-Extended, Applebot-Extended, meta-externalagent, FacebookBot). But measured access is lopsided: Chrome reaches 2 of 64 endpoints; ChatGPT-User reaches 14. Meta's Cloudflare deployment appears to treat ChatGPT-User more permissively than generic datacenter Chrome traffic — possibly because Meta explicitly whitelisted OpenAI IP ranges without doing the same for other datacenter Chromium clients. Unintentional, but favourable to AI.
Dell, HP, SAP: tech's AI-native minority
Dell's 10/12 score comes from a 131-link llms.txt, JSON-LD, explicit Bytespider allow in robots.txt, and 59-of-64 ChatGPT accessibility. HP explicitly allows 10 AI bots and provides regional sitemaps for AI systems. SAP names 13 AI bots in robots.txt and runs security.txt. None of the three markets itself as AI-first; they simply haven't configured WAF rules that conflict with their content policy.
The companies that most loudly sell AI-powered products — Salesforce Einstein, Oracle Cloud AI, IBM Watson — are the ones whose web properties are most hostile to AI retrieval. The companies selling metal, chips, and servers are the ones whose web properties are most open. The pattern doesn't prove a thesis about AI product messaging; it does show that AI-visibility outcomes are uncorrelated with AI-brand marketing.