
AI visibility checklist: how to make your website easier to find, cite, and recommend

This checklist covers the signals that determine whether AI answer engines can find your site, parse its content, and cite it in responses. It's organized by category, from crawl-level basics to brand signals, with clear criteria for each item. Work through it top to bottom: the earlier categories are foundational, and the later ones pay off little until the basics are fixed.

Published June 2026 · 14 min read

[Figure: AI visibility checklist, five layers to audit: Crawlability, Meta & Signals, llms.txt + Schema, Content Quality, Brand & Entity. Fix in order; each layer depends on the previous.]
Five layers of AI visibility signals — work through them in order, since each builds on the previous.

How to use this checklist

Each item has a criterion and what to check. "Pass" means the signal is present and correct. "Fail" means it's missing, broken, or actively working against you. Some items have partial states.

You can run through this manually on your own site, or use an automated audit to get a scored report across these categories. The checklist covers the same signals as the GEO Optimizer audit; the audit gives you a numeric score and specific recommendations per page.

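Shorthand used below: AI crawler = agents like GPTBot, OAI-SearchBot, ClaudeBot, and PerplexityBot, plus Google-Extended (a robots.txt control token that Google's existing crawlers honor, not a separate user agent). Answer engine = ChatGPT web search, Perplexity, Gemini, Claude with web access, and similar.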

1. Crawlability

Foundation — fix this first

  • robots.txt doesn't block AI crawlers you want

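    Check: fetch yourdomain.com/robots.txt and look for Disallow: / rules under GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, Google-Extended, or under the wildcard User-agent: *, which applies to any crawler that has no group of its own. If a crawler is disallowed, the engine behind it can't retrieve your content.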

    Pass: AI crawlers you want are explicitly allowed, or not mentioned (default allow).

  • Pages return 200 OK

    Crawlers that hit 4xx or 5xx errors on important pages won't index them. Check that your key landing pages, guides, and product pages return clean 200s.

    Pass: no important pages returning error status codes.

  • Redirects are 301, not chains

    Redirect chains (A → B → C) slow down crawlers and dilute link equity. Each extra hop is a failure point. Use 301s directly to the canonical URL.

    Pass: no redirect chains longer than one hop on key pages.

  • Response time under 3 seconds

    Slow pages get deprioritized or timed out during crawl. Measure TTFB (time to first byte), not just browser load time.

    Pass: TTFB under ~500ms; full page under ~3s for crawlers.

  • Sitemap is present and linked from robots.txt

    Add Sitemap: https://yourdomain.com/sitemap.xml to your robots.txt so any crawler can discover all your pages.

    Pass: sitemap exists, is accessible, and is referenced in robots.txt.
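
The robots.txt and sitemap checks above can be scripted. Here is a minimal sketch using Python's standard urllib.robotparser; the sample file and the example.com URLs are hypothetical, and the function only tests the site root, so a rule that blocks a single directory would need a second call with that path.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens named in this checklist.
AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def audit_robots_txt(robots_txt, agents=AI_CRAWLERS):
    """Report which agents are blocked from "/" and which sitemaps are declared."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    blocked = [agent for agent in agents if not parser.can_fetch(agent, "/")]
    # site_maps() returns None when there is no Sitemap: line (Python 3.8+).
    sitemaps = parser.site_maps() or []
    return {"blocked": blocked, "sitemaps": sitemaps}


# Hypothetical robots.txt: blocks GPTBot, allows everyone else, declares a sitemap.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
"""

report = audit_robots_txt(SAMPLE)
print(report)
```

For the sample above, the report lists GPTBot as blocked and shows one declared sitemap. To audit a live site, fetch the file yourself and pass its text in; that keeps the parsing logic testable without network access.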

[Figure: Technical layer, five checks: robots.txt (AI crawlers allowed), Canonical (self-referencing tag present), Sitemap (XML submitted and reachable), llms.txt (AI context file published), Schema (structured data validated).]
The technical layer: five checks that determine whether AI engines can reach and index your pages.

2. Meta signals

Page-level identification

  • Each page has a unique, descriptive <title>

    Title tags are used by both classic search and AI retrieval systems to categorize pages. Duplicate or generic titles ("Home | Site Name" on every page) confuse retrieval.

    Pass: every important page has a unique title that describes its specific content.

  • Meta description present and accurate

    Not a direct ranking signal for AI, but descriptions are sometimes used as context by tools that summarize pages. Keep them factual and specific.

    Pass: meta description present on all key pages, accurate, under 160 characters.

  • Canonical tag present and self-referencing

    Without a canonical, retrieval systems may treat multiple URL variants of the same page as separate content, diluting its signal.

    Pass: <link rel="canonical"> on every page, pointing to the preferred URL.

  • Open Graph / og: tags present

    OG tags are used when pages are shared and summarized. Some AI summarization tools also read them. At minimum: og:title, og:description, og:image, og:url.

    Pass: og:title, og:description, og:image, og:url present on key pages.
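
The four meta checks above lend themselves to a per-page script. The sketch below uses only the standard library's html.parser; the class and function names are illustrative, the sample page is hypothetical, and it treats "self-referencing" as an exact string match against the URL you pass in, which is stricter than some sites need.

```python
from html.parser import HTMLParser

REQUIRED_OG = ("og:title", "og:description", "og:image", "og:url")


class HeadSignals(HTMLParser):
    """Collect <title>, meta description, canonical, and og:* tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self.canonical = None
        self.og = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            if (a.get("name") or "").lower() == "description":
                self.description = a.get("content") or ""
            prop = a.get("property") or ""
            if prop.startswith("og:"):
                self.og[prop] = a.get("content") or ""
        elif tag == "link" and "canonical" in (a.get("rel") or "").lower().split():
            self.canonical = a.get("href")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def meta_problems(html, page_url):
    """Return the failed meta checks for one page, in checklist order."""
    head = HeadSignals()
    head.feed(html)
    problems = []
    if not head.title.strip():
        problems.append("missing <title>")
    if not head.description:
        problems.append("missing meta description")
    elif len(head.description) >= 160:
        problems.append("meta description is 160 characters or longer")
    if head.canonical is None:
        problems.append("missing canonical")
    elif head.canonical != page_url:
        problems.append("canonical is not self-referencing")
    problems += [f"missing {prop}" for prop in REQUIRED_OG if prop not in head.og]
    return problems


# Hypothetical page that has everything except og:image and og:url.
PAGE = """<html><head>
<title>AI visibility checklist</title>
<meta name="description" content="Signals that help AI answer engines find and cite a site.">
<link rel="canonical" href="https://example.com/guides/checklist">
<meta property="og:title" content="AI visibility checklist">
<meta property="og:description" content="Signals that help AI answer engines find and cite a site.">
</head><body></body></html>"""

print(meta_problems(PAGE, "https://example.com/guides/checklist"))
# → ['missing og:image', 'missing og:url']
```

Title uniqueness is a cross-page property, so it isn't checked here; collecting head.title for every page and looking for duplicates would cover it.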

3. llms.txt

Curated content map for AI tools

  • llms.txt exists at domain root

    Check: yourdomain.com/llms.txt should return a plain text file, not a 404 or HTML page.

    Pass: file accessible, returns plain text (Content-Type: text/plain).

  • llms.txt follows the correct format

    The file should use Markdown: H1 for site name, a blockquote for the summary, H2 sections for categories, and link lists with descriptions. See our implementation guide.

    Pass: valid Markdown, site description present, at least one section with linked pages.

  • llms.txt includes most important pages

    A file that only lists low-priority pages (or auto-generates every post) is worse than a curated short list. Include homepage, key product/service pages, and your best content.

    Pass: core pages present; no more than ~30-50 entries (quality over quantity).
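
A rough structural check for the format described above can be written in a few lines. This is a sketch, not a validator for any official specification: it only looks for the H1, blockquote, H2, and link-list pieces named in this section, and the sample file and the 50-entry cap are assumptions taken from the pass criteria.

```python
import re

# A list item that starts with a Markdown link: "- [Title](https://...)"
LINK_ITEM = re.compile(r"^- \[[^\]]+\]\([^)\s]+\)")


def llms_txt_problems(text, max_entries=50):
    """Return structural problems found in an llms.txt file's text."""
    lines = [line.rstrip() for line in text.splitlines()]
    problems = []
    if not any(line.startswith("# ") for line in lines):
        problems.append("no H1 with the site name")
    if not any(line.startswith("> ") for line in lines):
        problems.append("no blockquote summary")
    if not any(line.startswith("## ") for line in lines):
        problems.append("no H2 section")
    links = [line for line in lines if LINK_ITEM.match(line)]
    if not links:
        problems.append("no linked pages")
    elif len(links) > max_entries:
        problems.append(f"{len(links)} entries; consider curating to {max_entries} or fewer")
    return problems


# Hypothetical llms.txt for an example site.
SAMPLE = """\
# Example Co

> Example Co makes scheduling software for small clinics.

## Guides

- [AI visibility checklist](https://example.com/guides/checklist): Signals to audit, in priority order
- [Pricing](https://example.com/pricing): Plans and limits
"""

print(llms_txt_problems(SAMPLE))         # → []
print(llms_txt_problems("Hello world"))  # lists all four missing pieces
```

The script can't judge whether the listed pages are your most important ones; that part of the check stays manual.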

4. Structured data (schema.org)

Explicit entity and relationship signals

  • Organization or Person schema present

    Every site should declare its identity. An Organization schema with name, URL, logo, and description helps AI systems build entity associations.

    Pass: valid JSON-LD Organization (or Person for personal sites) on homepage.

  • WebSite schema with search action

    Helps engines understand the site as a whole. Include a SearchAction if you have on-site search.

    Pass: WebSite schema present, name and URL match the canonical domain.

  • Article schema on blog/guide pages

    Marks content as authored, dated, and categorized. Include headline, author, datePublished, dateModified.

    Pass: valid Article schema on all editorial content pages.

  • FAQPage schema where applicable

    FAQ sections marked with FAQPage schema give AI systems clean question-answer pairs to pull from.

    Pass: FAQPage schema present on pages with Q&A sections; questions and answers accurate and specific.

  • BreadcrumbList schema on sub-pages

    Helps engines understand site structure and page context.

    Pass: breadcrumb schema present on all pages more than one level deep.

  • No schema validation errors

    Malformed JSON-LD (unclosed brackets, wrong types) can be silently ignored. Validate with Google's Rich Results Test or the Schema Markup Validator at validator.schema.org.

    Pass: no errors in validator; warnings are acceptable if the meaning is clear.
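
Before reaching for an online validator, you can catch the "silently ignored" case locally: extract each JSON-LD block and confirm it parses. The sketch below assumes blocks live in script tags with type application/ld+json; the names and the sample page are hypothetical, and it checks JSON syntax and top-level @type only, not schema.org vocabulary or @graph containers.

```python
import json
from html.parser import HTMLParser


class JsonLdBlocks(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buffer = None  # list of text chunks while inside a JSON-LD script

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.blocks.append("".join(self._buffer))
            self._buffer = None


def audit_jsonld(html):
    """Return the @type values that parsed cleanly and any JSON syntax errors."""
    parser = JsonLdBlocks()
    parser.feed(html)
    types, errors = [], []
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            errors.append(f"invalid JSON-LD: {exc.msg}")
            continue
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and "@type" in item:
                types.append(item["@type"])
    return {"types": types, "errors": errors}


# Hypothetical page: one valid Organization block, one Article block
# with a missing closing brace.
PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization",
 "name": "Example Co", "url": "https://example.com",
 "logo": "https://example.com/logo.png"}
</script>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article", "headline": "Checklist"
</script>
</head></html>"""

result = audit_jsonld(PAGE)
print(result["types"])   # → ['Organization']
print(result["errors"])  # one entry for the broken Article block
```

The broken Article block is exactly the failure that goes unnoticed in a browser: the page renders normally, but the schema never reaches the engine.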

5. Content signals

Quotability and clarity

  • Content is in server-rendered HTML

    View the raw HTML source (not the rendered DOM). Your key paragraphs, headings, and facts must appear there. If they only appear after JavaScript runs, assume a basic crawler won't see them.

    Pass: all important content present in raw HTML source.

  • Pages lead with a direct answer or summary

    AI retrieval favors pages where the first paragraph states the answer clearly. Burying the key point under marketing prose reduces quotability.

    Pass: first paragraph of each key page states its main point directly.

  • Headings match real questions or intents

    H2/H3 headings that match how people phrase queries help retrieval systems identify which section answers which question.

    Pass: headings are descriptive and query-shaped; not generic ("Introduction", "Section 3").

  • Facts are specific and verifiable

    Vague claims ("industry-leading performance") aren't quotable. Numbers, dates, and concrete statements are. A model can cite "response time under 200ms"; it can't cite "blazing fast".

    Pass: key claims are specific, dated where relevant, and don't require context to interpret.

  • One idea per paragraph

    Dense paragraphs that mix multiple ideas are harder to lift cleanly. Short, focused paragraphs make it easy for a model to pull a clean passage.

    Pass: most paragraphs cover one point; rarely more than 5-6 sentences.

  • Processes are written as numbered steps

    Numbered steps are one of the easiest formats for AI to quote and reproduce. If you're explaining how to do something, use an ordered list.

    Pass: all how-to content uses numbered steps, not prose-only descriptions.
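
The server-rendering check at the top of this list is the easiest one to automate: take the raw HTML as a crawler without JavaScript would receive it, ignore script and style content, and confirm your key facts are in what remains. A minimal sketch, with hypothetical sample pages and illustrative names:

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def missing_from_raw_html(raw_html, key_phrases):
    """Return the key phrases that do not appear in the page's static text."""
    parser = VisibleText()
    parser.feed(raw_html)
    parser.close()
    # Collapse whitespace so line breaks in the markup don't hide a phrase.
    text = " ".join(" ".join(parser.chunks).split()).lower()
    return [phrase for phrase in key_phrases if phrase.lower() not in text]


# Hypothetical pages: the same fact, server-rendered vs. injected by JavaScript.
SERVER_RENDERED = (
    "<html><body><h1>How fast is the API?</h1>"
    "<p>Median response time is under 200ms.</p></body></html>"
)
CLIENT_RENDERED = (
    '<html><body><div id="root"></div>'
    '<script>render("Median response time is under 200ms.")</script></body></html>'
)

facts = ["response time is under 200ms"]
print(missing_from_raw_html(SERVER_RENDERED, facts))  # → []
print(missing_from_raw_html(CLIENT_RENDERED, facts))  # → ['response time is under 200ms']
```

Feed it the response body from a plain HTTP fetch, not the DOM copied from browser dev tools, or the check will pass pages that a basic crawler can't read.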

[Figure: Content layer, five checks: Direct Answer (lead with the answer, not context first), Clear Headings (H2/H3 match real questions people ask), Specific Facts (numbers, dates, names the AI can cite), Numbered Steps (ordered lists signal clear procedure), One Idea per Paragraph (short, dense paragraphs easy to extract).]
The content layer: five signals that determine whether your pages are quotable by AI answer engines.

6. AI discovery signals

Findability beyond direct crawl

  • XML sitemap is complete and up to date

    A sitemap tells crawlers what URLs exist. An outdated sitemap with broken or missing URLs undermines discoverability.

    Pass: sitemap includes all important pages; lastmod dates are accurate; no 404s in the sitemap.

  • Internal linking connects key pages

    Orphan pages (reachable only via sitemap, not through in-site links) are less likely to be indexed and less likely to accumulate authority.

    Pass: every important page is linked from at least two other pages; no key page is reachable only through the sitemap.

  • No noindex on pages you want cited

    A noindex meta tag or X-Robots-Tag header tells crawlers to exclude the page from their index. Verify it isn't set accidentally on content you want visible.

    Pass: no noindex on pages you want indexed and cited.
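
An accidental noindex can come from either place named above, so it helps to check both at once. The sketch below takes a page's HTML and its response headers as plain inputs; the function name and sample values are hypothetical, and it only handles the generic robots meta tag and unscoped X-Robots-Tag values, not crawler-specific variants.

```python
from html.parser import HTMLParser


class RobotsMeta(HTMLParser):
    """Collect directives from <meta name="robots" content="...">."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("name") or "").lower() == "robots":
            content = a.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]


def noindex_sources(html, headers):
    """Return where a noindex directive was found: meta tag, header, or both."""
    sources = []
    parser = RobotsMeta()
    parser.feed(html)
    if "noindex" in parser.directives:
        sources.append("meta robots tag")
    for name, value in headers.items():
        # Header names are case-insensitive; values are comma-separated directives.
        if name.lower() == "x-robots-tag":
            if "noindex" in [d.strip().lower() for d in value.split(",")]:
                sources.append("X-Robots-Tag header")
    return sources


# Hypothetical page that is excluded twice over.
HTML = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
HEADERS = {"Content-Type": "text/html", "X-Robots-Tag": "noindex"}

print(noindex_sources(HTML, HEADERS))
# → ['meta robots tag', 'X-Robots-Tag header']
print(noindex_sources("<html><head></head></html>", {"Content-Type": "text/html"}))
# → []
```

Run it against every URL in your sitemap: a page that is listed there and also carries noindex is sending crawlers contradictory signals.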

7. Brand and entity signals

Who you are, consistently stated

  • Brand name used consistently across all pages

    Inconsistent naming (sometimes "Acme", sometimes "Acme Inc.", sometimes "acme.com") fragments entity associations. AI systems use consistent references to build confidence in who you are.

    Pass: brand name spelled identically across title tags, schema, footer, and copy.

  • About page describes the entity clearly

    A clear, factual About page (who you are, what you do, where you're based, when founded) gives models anchor points for entity recognition.

    Pass: About page exists, factual, describes the organization concisely.

  • Social profiles and external mentions are consistent

    LinkedIn, GitHub, Crunchbase, press mentions — all should use the same brand name and description. Cross-web consistency strengthens entity association.

    Pass: major external profiles exist and match on-site branding.

  • Wikidata or Wikipedia entry (if applicable)

    Not achievable for most sites, but the strongest entity signal if you qualify. Models are trained on Wikipedia and Wikidata; being there creates a canonical entity record.

    Pass: entry exists, accurate, linked from your site. Skip if not notable enough — a bad entry is worse than none.

Priority order for a first pass

If you're auditing a site for the first time and want to know where to start:

  1. Fix crawlability issues first — robots.txt blocks, redirect chains, slow responses.
  2. Add or fix canonical tags and meta titles on all key pages.
  3. Create an llms.txt — low effort, explicitly surfaces your priorities.
  4. Add Organization and Article schema to your homepage and content pages.
  5. Rewrite your most important page introductions to lead with direct answers.
  6. Fix internal linking so no important page is an orphan.
  7. Work on brand/entity consistency across the web — this is slower but compounds.

The GEO vs SEO guide explains why these priorities differ from a classic SEO checklist.

[Figure: the AI visibility loop: (1) Audit, full site scan; (2) Monitor, track changes; (3) Fix, apply patches; (4) Measure, check impact; (5) Improve, score grows. Repeat every 30 days.]
AI visibility is not a one-time audit. Treat it as a continuous loop: audit, monitor, fix, measure, improve.

Limits of this checklist

This checklist covers signals that are in your control. It doesn't cover:

  • Training data inclusion — you can't directly control whether your site was included in a model's training corpus, only whether future crawls can access it.
  • Citation frequency — even a perfect score doesn't guarantee citations. Models have their own retrieval and selection logic, and it changes between versions.
  • Query intent matching — if your content doesn't actually answer the queries people are asking, technical optimization won't manufacture relevance.
  • Domain authority — established, heavily-cited domains have an advantage that a checklist won't close overnight.

Use this checklist to remove friction and make your site's signals accurate. Don't use it expecting guaranteed citation outcomes — no checklist can promise that.

Get a scored audit across all these categories

Instead of working through this manually, run the free audit: it checks your site across the categories covered here and gives specific, actionable recommendations. No account required.

A GeoReady account lets you save reports, monitor domains over time, and track whether your changes improve your score. See pricing.
