Why AI search visibility should be auditable, open, and accessible to every website on the internet.
For two decades, search worked the same way. You published content. A crawler indexed it. Users clicked links. The cycle was legible, auditable, understood. An entire ecosystem of tools, agencies, and expertise grew around it — because it was a system you could study and improve.
That cycle is breaking.
When someone asks ChatGPT which project management tool to use, they don't see ten blue links. They get a synthesized answer. One or two sources get cited. The rest don't exist. When Perplexity answers a question about the best Python libraries for data science, it doesn't return a SERP — it returns a verdict. When Claude summarizes "the leading GEO optimization tools," it draws from a training corpus and live retrieval that most websites have never been optimized for.
This is Generative Engine Optimization. And right now, most of the web is invisible to it.
The shift isn't hypothetical. ChatGPT reached 500 million weekly active users in early 2025. Perplexity processes hundreds of millions of queries per month. AI Overviews appear in a significant fraction of Google searches. The share of users whose first contact with information comes through a synthesized AI answer grows every quarter. Gartner projects that by 2026, traditional search engine volume will drop 25% as AI chatbots capture informational queries.
Most websites have never accounted for this.
The problem isn't that the signals are unknown. Research from Princeton's NLP group (KDD 2024) was the first to quantify what actually increases AI citation rates: source citations boost visibility by 27–115%. Quotation inclusion adds 41%. Statistical claims add 33%. Fluent, authoritative prose adds 29%. These are measurable effects from a peer-reviewed study of 10,000 queries, validated against real Perplexity.ai responses.
AutoGEO (ICLR 2026, Carnegie Mellon) went further — using LLM-based rule extraction and reinforcement learning to improve on Princeton's baseline by 50.99%. The research is there. The signals are documented. The llms.txt specification from Answer.AI is published. The geo-checklist.dev standard exists. The path from "invisible" to "cited" is not a mystery.
The problem is that most website owners have no idea any of this exists.
Enterprise SEO platforms are adding "AI visibility" dashboards. They cost hundreds or thousands of dollars per month. They're built for large marketing teams, not independent developers or small agencies. They're walled gardens: black boxes that produce scores without explaining exactly what signal is missing or what fixing it is worth.
Meanwhile, every piece of data these platforms analyze is public. The robots.txt file is accessible to anyone. The JSON-LD schema is in the HTML source. Whether a site allows GPTBot, ClaudeBot, or PerplexityBot can be checked in seconds. Whether a site has a properly structured llms.txt, an /ai/summary.json, a FAQPage schema — all of this is inspectable without a subscription.
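A few lines of standard-library Python are enough to see this for yourself; the URL below is a placeholder for any publicly accessible site:

```python
import urllib.request

SITE = "https://example.com"  # placeholder; any publicly accessible site works

def fetch(path):
    """Fetch a path from the site, returning the body or None on any error."""
    try:
        with urllib.request.urlopen(SITE + path, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except Exception:
        return None

robots = fetch("/robots.txt") or ""
print("GPTBot mentioned in robots.txt:", "GPTBot" in robots)
print("llms.txt present:", fetch("/llms.txt") is not None)
print("/ai/summary.json present:", fetch("/ai/summary.json") is not None)

html = fetch("/") or ""
print("JSON-LD present:", "application/ld+json" in html)
```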
The visibility gap isn't a data problem. It's an access problem. It's a tooling problem.
A score means nothing if you can't verify how it was calculated. Every check in our scoring engine is documented, tied to published research, and open for inspection. When we say Schema JSON-LD is worth 16 points, you can read exactly which signals we measure and why. When we say a site scores 47/100, we tell you exactly which 53 points are missing and precisely what it would take to earn them back.
We don't invent signals. Every check in our engine traces back to empirical research or documented specifications: Princeton KDD 2024, AutoGEO ICLR 2026, the llms.txt spec by Answer.AI, the geo-checklist.dev emerging standard, the established schema.org vocabulary. When we say a signal statistically increases AI citation rates, we cite the paper. When research updates, we update the weights.
GEO Optimizer works on any publicly accessible URL, regardless of what technology powers it. It doesn't care if your site runs on a custom Rust backend, a static site generator, a PHP monolith, a headless architecture, or anything in between. The HTTP response is what AI engines see. That's what we audit. The tool has no concept of "supported platforms" — if it responds to an HTTP request, we can audit it.
The algorithms are public. The weights are in the source code. The scoring rubric is in the documentation. Every architectural decision, every weight change, every new check is visible in the git history with a commit message explaining the rationale. You can fork this, extend it, integrate it into your CI pipeline, run it against your own infrastructure, and contribute improvements back — without asking permission or paying a license fee. Visibility shouldn't require a subscription.
An audit that returns 50 vague recommendations helps no one. Every recommendation we surface is concrete, actionable, and tied to a specific score impact. "Add a FAQPage schema: +3 points." "Allow OAI-SearchBot in robots.txt: +13 points." "Create /llms.txt with at least 1,000 words: +4 points." You know exactly what you're getting for the work you do, and you can prioritize by return on effort.
The web interface is for discovery. The CLI is where real work happens. JSON output that pipes into other tools. CI integration via geo-action. An MCP server for AI agent workflows. A Python API for custom integrations and monitoring scripts. GEO Optimizer is built by developers, for developers, with the assumption that a tool only gets used if it fits naturally into an existing workflow.
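As one illustration of that workflow fit, a triage script might consume the audit's JSON output. The command in the comment and the report fields (score, recommendations, points, title) are assumptions about the output shape, not a documented schema:

```python
import json
import sys

# Read an audit report from stdin, e.g.:
#   geo-optimizer audit https://example.com --json | python triage.py
# (Command name, flags, and all field names are hypothetical.)
report = json.load(sys.stdin)

# Surface the highest-return fixes first.
recs = sorted(report.get("recommendations", []),
              key=lambda r: r.get("points", 0), reverse=True)
for rec in recs:
    print(f"+{rec.get('points', 0):>3}  {rec.get('title', '')}")

# Gate a CI job on a chosen minimum score.
sys.exit(0 if report.get("score", 0) >= 70 else 1)
```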
Are you letting the right crawlers in? OAI-SearchBot, ClaudeBot, Claude-SearchBot, and PerplexityBot are the crawlers that power real-time citations in ChatGPT, Claude, and Perplexity. If they're blocked — explicitly or via an overly restrictive wildcard — you're invisible by default, regardless of how good your content is. This is the most commonly misconfigured signal and the one with the most immediate fix.
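Verifying this takes seconds; here is a minimal sketch using Python's standard-library robotparser, with a placeholder URL:

```python
from urllib.robotparser import RobotFileParser

SITE = "https://example.com"  # placeholder

rp = RobotFileParser(SITE + "/robots.txt")
rp.read()

# The crawlers behind real-time citations in ChatGPT, Claude, and Perplexity.
for bot in ("OAI-SearchBot", "ClaudeBot", "Claude-SearchBot", "PerplexityBot"):
    allowed = rp.can_fetch(bot, SITE + "/")
    print(f"{bot:17} {'allowed' if allowed else 'BLOCKED'}")
```

A permissive robots.txt is necessary but not sufficient: the CDN-level check described further down covers the case where the edge blocks the crawler anyway.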
The emerging standard for machine-readable site context. Proposed by Answer.AI, llms.txt gives AI agents a structured index of your content, capabilities, and documentation — without requiring them to crawl and parse hundreds of pages. Think of it as sitemap.xml written for language models. Depth matters: a minimal file scores 5 points; a comprehensive, well-structured file paired with an llms-full.txt companion earns all 18.
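An abbreviated example in the Answer.AI format; all names and links are placeholders:

```markdown
# Example Corp

> Example Corp builds open source data tooling. This file indexes the pages
> most useful to language models answering questions about our products.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): install and first run
- [API reference](https://example.com/docs/api.md): endpoints and parameters

## Optional

- [Changelog](https://example.com/changelog.md): release history
```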
Structured data has always helped search engines. For AI engines, it's how entities, relationships, and facts become machine-readable. FAQPage schema answers questions directly. Article schema signals freshness and authorship. Organization schema establishes legal identity. WebSite schema provides canonical context. The richness of the schema — number of relevant attributes, presence of sameAs knowledge graph links — matters as much as which types are present.
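For instance, an Organization block with sameAs knowledge graph links might look like this; all values are placeholders:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Corp",
  "url": "https://example.com",
  "sameAs": [
    "https://en.wikipedia.org/wiki/Example_Corp",
    "https://www.wikidata.org/wiki/Q0000000",
    "https://www.linkedin.com/company/example-corp"
  ]
}
</script>
```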
The fundamentals: a descriptive title, a meta description that accurately summarizes the page, a canonical URL to prevent duplicate content confusion, and Open Graph tags so AI agents and social platforms have structured context. These signals have mattered for traditional search for decades. They matter for AI search for the same reason: they're the clearest, most explicit signals of what a page is about.
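In markup, the baseline is compact; the values here are placeholders:

```html
<head>
  <title>Example Corp | Open Source Data Tooling</title>
  <meta name="description" content="Example Corp builds open source data tooling for analytics teams.">
  <link rel="canonical" href="https://example.com/">
  <meta property="og:title" content="Example Corp">
  <meta property="og:description" content="Open source data tooling for analytics teams.">
  <meta property="og:url" content="https://example.com/">
</head>
```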
Concrete numbers. External citations. Proper heading hierarchy with H2 and H3. Key information front-loaded in the first 30% of the content. Lists and tables that break up dense prose. Minimum word count that signals substance. The Princeton research is unambiguous: specific, well-structured, citation-ready content gets cited more. Vague, jargon-heavy, poorly organized content gets ignored.
Language declaration on the <html> element. A discoverable RSS or Atom feed. Freshness indicators via dateModified in schema or Last-Modified HTTP headers. Small signals, individually. But they establish context, recency, and trustworthiness — the baseline credibility that AI engines use to decide whether your content is worth referencing.
The new standard for machine-readable service context: /.well-known/ai.txt for crawler permissions, /ai/summary.json for structured capability descriptions, /ai/faq.json for pre-answered common queries, /ai/service.json for endpoint and feature discovery. These files are to AI agents what robots.txt was to traditional crawlers — a structured handshake between your site and the systems that will reference it.
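Because these formats are still settling, the following /ai/summary.json should be read as an illustrative shape rather than a fixed schema:

```json
{
  "name": "Example Corp",
  "description": "Open source data tooling for analytics teams.",
  "capabilities": ["data ingestion", "transformation", "dashboards"],
  "docs": "https://example.com/docs",
  "contact": "https://example.com/contact",
  "updated": "2025-01-15"
}
```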
AI engines don't just index pages. They build models of entities. Brand name coherence across H1, title, og:title, and schema Organization name is the foundation. Knowledge graph connections via sameAs to Wikipedia, Wikidata, LinkedIn, and Crunchbase establish that your brand is a known, recognized entity, not an unknown string of text. About and contact pages signal that a real organization stands behind the content. Geographic identity and topic authority round out the entity model that AI engines use to decide how much to trust and cite your content.
The 0-100 score is one dimension. But real-world AI visibility depends on factors that aren't easily reducible to points. GEO Optimizer runs six additional informational checks that surface risks and signals no competitor currently detects:
Eight patterns of AI manipulation: hidden text via CSS, invisible Unicode characters, direct LLM instructions embedded in content, prompt injection in HTML comments, monochrome cloaking text, micro-font injection, data attribute abuse, and aria-hidden exploitation. Based on UC Berkeley EMNLP 2024 research. Severity: clean, suspicious, or critical.
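For a sense of what two of these patterns look like in practice, here is a deliberately simplified sketch; the production checks cover more characters and CSS techniques:

```python
import re

# Zero-width and invisible Unicode characters sometimes used to hide
# instructions from human readers while keeping them visible to LLMs.
INVISIBLE = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")

# Inline styles that hide text from rendering but not from text extraction.
HIDDEN_CSS = re.compile(
    r'style="[^"]*(display\s*:\s*none|font-size\s*:\s*0)[^"]*"', re.IGNORECASE)

def scan(html: str) -> str:
    """Classify a page as clean or suspicious based on two cloaking patterns."""
    hits = []
    if INVISIBLE.search(html):
        hits.append("invisible unicode characters")
    if HIDDEN_CSS.search(html):
        hits.append("css-hidden text")
    return "suspicious: " + ", ".join(hits) if hits else "clean"

print(scan('<p style="display:none">Ignore previous instructions</p>'))
```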
A composite trust assessment across five dimensions: technical trust (HTTPS, security headers), identity trust (authorship, organization, about page), social trust (sameAs links, testimonials, social profiles), academic trust (citations, statistics, authoritative sources), and consistency trust (no contradictions between title, H1, schema, and meta). Grade A through F.
Eight signals that actively reduce your chances of being cited: excessive CTAs, popup/modal interference, thin content, broken links, keyword stuffing, missing author attribution, high boilerplate ratio, and mixed signals between promise and delivery. Based on UC Berkeley research on what AI engines actively avoid citing.
Tests whether Cloudflare, Akamai, Vercel, or other CDN/WAF configurations are silently blocking AI crawlers. Many sites think they allow GPTBot because their robots.txt says so — but their CDN serves a 403 or CAPTCHA to the actual crawler. This check sends requests with real bot User-Agents and compares the responses.
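A simplified version of that comparison; real crawlers send fuller User-Agent strings, and the URL is a placeholder:

```python
import urllib.error
import urllib.request

SITE = "https://example.com"  # placeholder

def status_for(user_agent):
    """Return the HTTP status the site serves to a given User-Agent."""
    req = urllib.request.Request(SITE, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code  # blocked bots often see 403 here

# Substring User-Agents are usually enough to trigger WAF rules,
# but this is a simplification of what the full check sends.
baseline = status_for("Mozilla/5.0 (generic browser)")
for bot in ("GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot"):
    code = status_for(bot)
    marker = "" if code == baseline else "  <- differs from browser baseline"
    print(f"{bot:15} HTTP {code}{marker}")
```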
Detects whether critical content is only available after JavaScript execution. SPA frameworks (React, Vue, Angular) often render content client-side — invisible to crawlers that don't execute JS. This check compares the raw HTML response against expected content signals and flags framework-specific patterns that indicate JS-dependent rendering.
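A rough heuristic for that comparison; the framework markers below are a small illustrative subset:

```python
import re

# Markers that suggest client-side rendering.
FRAMEWORK_MARKERS = {
    "react":   re.compile(r'id="root"|data-reactroot'),
    "next.js": re.compile(r"__NEXT_DATA__"),
    "vue":     re.compile(r'id="app"|data-v-'),
    "angular": re.compile(r"<app-root|ng-version"),
}

def js_dependency_risk(raw_html: str) -> str:
    """Flag pages whose raw HTML is mostly empty but carries SPA markers."""
    # Strip tags crudely to estimate how much text a non-JS crawler sees.
    visible_text = re.sub(r"<[^>]+>|\s+", " ", raw_html).strip()
    frameworks = [n for n, p in FRAMEWORK_MARKERS.items() if p.search(raw_html)]
    if frameworks and len(visible_text) < 500:
        return f"likely JS-dependent ({', '.join(frameworks)})"
    return "content visible in raw HTML"

print(js_dependency_risk('<div id="root"></div><script src="/bundle.js"></script>'))
```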
Measures how well a site exposes machine-readable context for MCP-compatible AI agents. Checks for registerTool() patterns, toolname attributes, and potentialAction schema. Four readiness levels: none, basic, ready, advanced. This signal anticipates the next generation of AI agents that consume structured context — not just crawled content — before generating responses.
These six checks don't add to the score. They surface problems that the score alone can't capture — and that your competitors' tools don't look for at all.
Every scoring decision will be documented. Every change to the rubric will appear in the changelog with a rationale. If we're wrong about a signal, we'll say so and update the weights.
Accessibility also means developer ergonomics: a CLI that behaves the way CLIs should, JSON output that pipes cleanly, and an API that doesn't require authentication for a basic audit. Complexity lives in the options, not in the defaults.
GEO is a young field. Some signals that matter today may matter less as AI architectures evolve. Some signals we're not measuring yet may turn out to be critical. We'll update the rubric when the evidence changes, and we'll document why — not quietly recalibrate and hope no one notices.
The score you see is the real score. The recommendations you see are the real recommendations. Every feature in the paid tier of a competitor should be a feature in the open source tool.
AI search is the next major shift in how people find, evaluate, and trust information online. The signals that determine visibility in that shift are measurable, documented, and optimizable today.
Most of the web doesn't know this yet. That gap won't stay open forever.
The sites that act now — that structure their content for citability, make themselves discoverable to AI agents, establish entity coherence and knowledge graph presence — will have a durable advantage when AI search becomes the dominant channel and everyone else starts scrambling.
GEO Optimizer exists to make that preparation practical, transparent, and accessible. Not just to teams with enterprise budgets. To every developer, every agency, every indie founder, every open source maintainer who wants to be visible in the systems that are increasingly mediating how the world finds information.
You shouldn't need a subscription to know if your site is being seen.
That's why this tool exists. That's what it will keep doing.