Why AI search visibility should be auditable, open, and accessible to every website on the internet.
For two decades, search worked the same way. You published content. A crawler indexed it. Users clicked links. The cycle was legible, auditable, understood. An entire ecosystem of tools, agencies, and expertise grew around it — because it was a system you could study and improve.
That cycle is breaking.
When someone asks ChatGPT which project management tool to use, they don't see ten blue links. They get a synthesized answer. One or two sources get cited. The rest don't exist. When Perplexity answers a question about the best Python libraries for data science, it doesn't return a SERP — it returns a verdict. When Claude summarizes "the leading GEO optimization tools," it draws from a training corpus and live retrieval that most websites have never been optimized for.
This is Generative Engine Optimization. And right now, most of the web is invisible to it.
The shift isn't hypothetical. ChatGPT reached 500 million weekly active users in early 2025. Perplexity processes hundreds of millions of queries per month. AI Overviews appear in a significant fraction of Google searches. The share of users whose first contact with information comes through a synthesized AI answer grows every quarter. Gartner projects that by 2026, traditional search engine volume will drop 25% as AI chatbots capture informational queries.
Most websites have never accounted for this.
The problem isn't that the signals are unknown. Research from Princeton's NLP group (KDD 2024) was the first to quantify what actually increases AI citation rates: source citations boost visibility by 27–115%. Quotation inclusion adds 41%. Statistical claims add 33%. Fluent, authoritative prose adds 29%. These are measurable effects from a peer-reviewed study of 10,000 queries, validated against real Perplexity.ai responses.
AutoGEO (ICLR 2026, Carnegie Mellon) went further — using LLM-based rule extraction and reinforcement learning to improve on Princeton's baseline by 50.99%. The research is there. The signals are documented. The llms.txt specification from Answer.AI is published. The geo-checklist.dev standard exists. The path from "invisible" to "cited" is not a mystery.
The problem is that most website owners have no idea any of this exists.
Enterprise SEO platforms are adding "AI visibility" dashboards. They cost hundreds or thousands of dollars per month. They're built for large marketing teams, not independent developers or small agencies. They're walled gardens: black boxes that produce scores without explaining exactly what signal is missing or what fixing it is worth.
Meanwhile, every piece of data these platforms analyze is public. The robots.txt file is accessible to anyone. The JSON-LD schema is in the HTML source. Whether a site allows GPTBot, ClaudeBot, or PerplexityBot can be checked in seconds. Whether a site has a properly structured llms.txt, an /ai/summary.json, a FAQPage schema — all of this is inspectable without a subscription.
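A few lines of standard-library Python are enough to see this for yourself; the URL below is a placeholder for any publicly accessible site:

```python
import urllib.request

SITE = "https://example.com"  # placeholder; any publicly accessible site works

def fetch(path):
    """Fetch a path from the site, returning the body or None on any error."""
    try:
        with urllib.request.urlopen(SITE + path, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except Exception:
        return None

robots = fetch("/robots.txt") or ""
print("GPTBot mentioned in robots.txt:", "GPTBot" in robots)
print("llms.txt present:", fetch("/llms.txt") is not None)
print("/ai/summary.json present:", fetch("/ai/summary.json") is not None)

html = fetch("/") or ""
print("JSON-LD present:", "application/ld+json" in html)
```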
The visibility gap isn't a data problem. It's an access problem. It's a tooling problem.
A score means nothing if you can't verify how it was calculated. Every check in our scoring engine is documented, tied to published research, and open for inspection. When we say Schema JSON-LD is worth 16 points, you can read exactly which signals we measure and why. When we say a site scores 47/100, we tell you exactly which 53 points are missing and precisely what it would take to earn them back.
We don't invent signals. Every check in our engine traces back to empirical research or documented specifications: Princeton KDD 2024, AutoGEO ICLR 2026, the llms.txt spec by Answer.AI, the geo-checklist.dev emerging standard, the established schema.org vocabulary. When we say a signal statistically increases AI citation rates, we cite the paper. When research updates, we update the weights.
GEO Optimizer works on any publicly accessible URL, regardless of what technology powers it. It doesn't care if your site runs on a custom Rust backend, a static site generator, a PHP monolith, a headless architecture, or anything in between. The HTTP response is what AI engines see. That's what we audit. The tool has no concept of "supported platforms" — if it responds to an HTTP request, we can audit it.
The algorithms are public. The weights are in the source code. The scoring rubric is in the documentation. Every architectural decision, every weight change, every new check is visible in the git history with a commit message explaining the rationale. You can fork this, extend it, integrate it into your CI pipeline, run it against your own infrastructure, and contribute improvements back — without asking permission or paying a license fee. Visibility shouldn't require a subscription.
An audit that returns 50 vague recommendations helps no one. Every recommendation we surface is concrete, actionable, and tied to a specific score impact. "Add a FAQPage schema: +3 points." "Allow OAI-SearchBot in robots.txt: +13 points." "Create /llms.txt with at least 1,000 words: +4 points." You know exactly what you're getting for the work you do, and you can prioritize by return on effort.
The web interface is for discovery. The CLI is where real work happens. JSON output that pipes into other tools. CI integration via geo-action. An MCP server for AI agent workflows. A Python API for custom integrations and monitoring scripts. GEO Optimizer is built by developers, for developers, with the assumption that a tool only gets used if it fits naturally into an existing workflow.
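As one illustration of that workflow fit, a triage script might consume the audit's JSON output. The command in the comment and the report fields (score, recommendations, points, title) are assumptions about the output shape, not a documented schema:

```python
import json
import sys

# Read an audit report from stdin, e.g.:
#   geo-optimizer audit https://example.com --json | python triage.py
# (Command name, flags, and all field names are hypothetical.)
report = json.load(sys.stdin)

# Surface the highest-return fixes first.
recs = sorted(report.get("recommendations", []),
              key=lambda r: r.get("points", 0), reverse=True)
for rec in recs:
    print(f"+{rec.get('points', 0):>3}  {rec.get('title', '')}")

# Gate a CI job on a chosen minimum score.
sys.exit(0 if report.get("score", 0) >= 70 else 1)
```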
Are you letting the right crawlers in? OAI-SearchBot, ClaudeBot, Claude-SearchBot, and PerplexityBot are the crawlers that power real-time citations in ChatGPT, Claude, and Perplexity. If they're blocked — explicitly or via an overly restrictive wildcard — you're invisible by default, regardless of how good your content is. This is the most commonly misconfigured signal and the one with the most immediate fix.
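Verifying this takes seconds; here is a minimal sketch using Python's standard-library robotparser, with a placeholder URL:

```python
from urllib.robotparser import RobotFileParser

SITE = "https://example.com"  # placeholder

rp = RobotFileParser(SITE + "/robots.txt")
rp.read()

# The crawlers behind real-time citations in ChatGPT, Claude, and Perplexity.
for bot in ("OAI-SearchBot", "ClaudeBot", "Claude-SearchBot", "PerplexityBot"):
    allowed = rp.can_fetch(bot, SITE + "/")
    print(f"{bot:17} {'allowed' if allowed else 'BLOCKED'}")
```

A permissive robots.txt is necessary but not sufficient: the CDN-level check described further down covers the case where the edge blocks the crawler anyway.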
The emerging standard for machine-readable site context. Proposed by Answer.AI, llms.txt gives AI agents a structured index of your content, capabilities, and documentation — without requiring them to crawl and parse hundreds of pages. Think of it as sitemap.xml written for language models. Depth matters: a minimal file scores 5 points; a comprehensive, well-structured file paired with an llms-full.txt companion earns all 18.
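An abbreviated example in the Answer.AI format; all names and links are placeholders:

```markdown
# Example Corp

> Example Corp builds open source data tooling. This file indexes the pages
> most useful to language models answering questions about our products.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): install and first run
- [API reference](https://example.com/docs/api.md): endpoints and parameters

## Optional

- [Changelog](https://example.com/changelog.md): release history
```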
Structured data has always helped search engines. For AI engines, it's how entities, relationships, and facts become machine-readable. FAQPage schema answers questions directly. Article schema signals freshness and authorship. Organization schema establishes legal identity. WebSite schema provides canonical context. The richness of the schema — number of relevant attributes, presence of sameAs knowledge graph links — matters as much as which types are present.
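For instance, an Organization block with sameAs knowledge graph links might look like this; all values are placeholders:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Corp",
  "url": "https://example.com",
  "sameAs": [
    "https://en.wikipedia.org/wiki/Example_Corp",
    "https://www.wikidata.org/wiki/Q0000000",
    "https://www.linkedin.com/company/example-corp"
  ]
}
</script>
```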
The fundamentals: a descriptive title, a meta description that accurately summarizes the page, a canonical URL to prevent duplicate content confusion, and Open Graph tags so AI agents and social platforms have structured context. These signals have mattered for traditional search for decades. They matter for AI search for the same reason: they're the clearest, most explicit signals of what a page is about.
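In markup, the baseline is compact; the values here are placeholders:

```html
<head>
  <title>Example Corp | Open Source Data Tooling</title>
  <meta name="description" content="Example Corp builds open source data tooling for analytics teams.">
  <link rel="canonical" href="https://example.com/">
  <meta property="og:title" content="Example Corp">
  <meta property="og:description" content="Open source data tooling for analytics teams.">
  <meta property="og:url" content="https://example.com/">
</head>
```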
Concrete numbers. External citations. Proper heading hierarchy with H2 and H3. Key information front-loaded in the first 30% of the content. Lists and tables that break up dense prose. Minimum word count that signals substance. The Princeton research is unambiguous: specific, well-structured, citation-ready content gets cited more. Vague, jargon-heavy, poorly organized content gets ignored.
Language declaration on the <html> element. A discoverable RSS or Atom feed. Freshness indicators via dateModified in schema or Last-Modified HTTP headers. Small signals, individually. But they establish context, recency, and trustworthiness — the baseline credibility that AI engines use to decide whether your content is worth referencing.
The new standard for machine-readable service context: /.well-known/ai.txt for crawler permissions, /ai/summary.json for structured capability descriptions, /ai/faq.json for pre-answered common queries, /ai/service.json for endpoint and feature discovery. These files are to AI agents what robots.txt was to traditional crawlers — a structured handshake between your site and the systems that will reference it.
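Because these formats are still settling, the following /ai/summary.json should be read as an illustrative shape rather than a fixed schema:

```json
{
  "name": "Example Corp",
  "description": "Open source data tooling for analytics teams.",
  "capabilities": ["data ingestion", "transformation", "dashboards"],
  "docs": "https://example.com/docs",
  "contact": "https://example.com/contact",
  "updated": "2025-01-15"
}
```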
AI engines don't just index pages. They build models of entities. Brand name coherence across H1, title, og:title, and schema Organization name is the foundation. Knowledge graph connections via sameAs to Wikipedia, Wikidata, LinkedIn, and Crunchbase establish that your brand is a known, recognized entity, not an unknown string of text. About and contact pages signal that a real organization stands behind the content. Geographic identity and topic authority round out the entity model that AI engines use to decide how much to trust and cite your content.
The 0-100 score is one dimension. But real-world AI visibility depends on factors that aren't easily reducible to points. GEO Optimizer runs six additional informational checks that surface risks and signals no competitor currently detects:
Eight patterns of AI manipulation: hidden text via CSS, invisible Unicode characters, direct LLM instructions embedded in content, prompt injection in HTML comments, monochrome cloaking text, micro-font injection, data attribute abuse, and aria-hidden exploitation. Based on UC Berkeley EMNLP 2024 research. Severity: clean, suspicious, or critical.
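For a sense of what two of these patterns look like in practice, here is a deliberately simplified sketch; the production checks cover more characters and CSS techniques:

```python
import re

# Zero-width and invisible Unicode characters sometimes used to hide
# instructions from human readers while keeping them visible to LLMs.
INVISIBLE = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")

# Inline styles that hide text from rendering but not from text extraction.
HIDDEN_CSS = re.compile(
    r'style="[^"]*(display\s*:\s*none|font-size\s*:\s*0)[^"]*"', re.IGNORECASE)

def scan(html: str) -> str:
    """Classify a page as clean or suspicious based on two cloaking patterns."""
    hits = []
    if INVISIBLE.search(html):
        hits.append("invisible unicode characters")
    if HIDDEN_CSS.search(html):
        hits.append("css-hidden text")
    return "suspicious: " + ", ".join(hits) if hits else "clean"

print(scan('<p style="display:none">Ignore previous instructions</p>'))
```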
A composite trust assessment across five dimensions: technical trust (HTTPS, security headers), identity trust (authorship, organization, about page), social trust (sameAs links, testimonials, social profiles), academic trust (citations, statistics, authoritative sources), and consistency trust (no contradictions between title, H1, schema, and meta). Grade A through F.
Eight signals that actively reduce your chances of being cited: excessive CTAs, popup/modal interference, thin content, broken links, keyword stuffing, missing author attribution, high boilerplate ratio, and mixed signals between promise and delivery. Based on UC Berkeley research on what AI engines actively avoid citing.
Tests whether Cloudflare, Akamai, Vercel, or other CDN/WAF configurations are silently blocking AI crawlers. Many sites think they allow GPTBot because their robots.txt says so — but their CDN serves a 403 or CAPTCHA to the actual crawler. This check sends requests with real bot User-Agents and compares the responses.
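A simplified version of that comparison; real crawlers send fuller User-Agent strings, and the URL is a placeholder:

```python
import urllib.error
import urllib.request

SITE = "https://example.com"  # placeholder

def status_for(user_agent):
    """Return the HTTP status the site serves to a given User-Agent."""
    req = urllib.request.Request(SITE, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code  # blocked bots often see 403 here

# Substring User-Agents are usually enough to trigger WAF rules,
# but this is a simplification of what the full check sends.
baseline = status_for("Mozilla/5.0 (generic browser)")
for bot in ("GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot"):
    code = status_for(bot)
    marker = "" if code == baseline else "  <- differs from browser baseline"
    print(f"{bot:15} HTTP {code}{marker}")
```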
Detects whether critical content is only available after JavaScript execution. SPA frameworks (React, Vue, Angular) often render content client-side — invisible to crawlers that don't execute JS. This check compares the raw HTML response against expected content signals and flags framework-specific patterns that indicate JS-dependent rendering.
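A rough heuristic for that comparison; the framework markers below are a small illustrative subset:

```python
import re

# Markers that suggest client-side rendering.
FRAMEWORK_MARKERS = {
    "react":   re.compile(r'id="root"|data-reactroot'),
    "next.js": re.compile(r"__NEXT_DATA__"),
    "vue":     re.compile(r'id="app"|data-v-'),
    "angular": re.compile(r"<app-root|ng-version"),
}

def js_dependency_risk(raw_html: str) -> str:
    """Flag pages whose raw HTML is mostly empty but carries SPA markers."""
    # Strip tags crudely to estimate how much text a non-JS crawler sees.
    visible_text = re.sub(r"<[^>]+>|\s+", " ", raw_html).strip()
    frameworks = [n for n, p in FRAMEWORK_MARKERS.items() if p.search(raw_html)]
    if frameworks and len(visible_text) < 500:
        return f"likely JS-dependent ({', '.join(frameworks)})"
    return "content visible in raw HTML"

print(js_dependency_risk('<div id="root"></div><script src="/bundle.js"></script>'))
```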
Measures how well a site exposes machine-readable context for MCP-compatible AI agents. Checks for registerTool() patterns, toolname attributes, and potentialAction schema. Four readiness levels: none, basic, ready, advanced. This signal anticipates the next generation of AI agents that consume structured context — not just crawled content — before generating responses.
These six checks don't add to the score. They surface problems that the score alone can't capture — and that your competitors' tools don't look for at all.
Every scoring decision will be documented. Every change to the rubric will appear in the changelog with a rationale. If we're wrong about a signal, we'll say so and update the weights.
Accessibility also means developer ergonomics: a CLI that behaves the way CLIs should, JSON output that pipes cleanly, and an API that doesn't require authentication for a basic audit. Complexity lives in the options, not in the defaults.
GEO is a young field. Some signals that matter today may matter less as AI architectures evolve. Some signals we're not measuring yet may turn out to be critical. We'll update the rubric when the evidence changes, and we'll document why — not quietly recalibrate and hope no one notices.
The score you see is the real score. The recommendations you see are the real recommendations. Every feature in the paid tier of a competitor should be a feature in the open source tool.
AI search is the next major shift in how people find, evaluate, and trust information online. The signals that determine visibility in that shift are measurable, documented, and optimizable today.
Most of the web doesn't know this yet. That gap won't stay open forever.
The sites that act now — that structure their content for citability, make themselves discoverable to AI agents, establish entity coherence and knowledge graph presence — will have a durable advantage when AI search becomes the dominant channel and everyone else starts scrambling.
GEO Optimizer exists to make that preparation practical, transparent, and accessible. Not just to teams with enterprise budgets. To every developer, every agency, every indie founder, every open source maintainer who wants to be visible in the systems that are increasingly mediating how the world finds information.
You shouldn't need a subscription to know if your site is being seen.
That's why this tool exists. That's what it will keep doing.