AI Development

How to Measure AI Visibility Across ChatGPT & Perplexity

May 27, 2026

11 min read

FNA Technology

Why measurement comes first
What AI visibility measurement actually tracks
Step 1: Build your prompt library
Step 2: Run prompts and capture outputs
Step 3: Score what you find
Step 4: Calculate share of voice
What good AI visibility data looks like
Common mistakes that break the measurement
How FNA measures AI visibility for clients

The short version: You can't improve AI visibility without measuring it first. This framework covers the four steps — prompt library, output capture, citation scoring, and share of voice calculation — that turn vague AI mentions into a trackable metric. Most brands skip measurement entirely and wonder why their AI presence doesn't improve.

Most brands have strong opinions about whether they "show up" in AI. Few have data. The ones who do are ahead of competitors who are still guessing.

Why measurement comes first

The instinct when you discover AI visibility is to start publishing content. More blog posts. Better structured pages. More FAQ sections.

That's not wrong. But it's backwards if you haven't measured first.

Without a baseline, you can't tell whether the content you're creating is moving anything. You can't see which topics you're invisible on versus which you're already included in. And you can't prioritise — so you write for the topics that feel important rather than the ones where you're losing ground.

Measurement solves all three problems. It tells you where you stand, where the gap is widest, and which changes are actually working.

What AI visibility measurement actually tracks

AI visibility measurement has three components. They're different from each other, and each one tells you something the others don't.

Inclusion rate — What percentage of responses to a given prompt mention your brand at all? This is the base metric. If ChatGPT mentions you in 2 out of 10 runs of the same prompt, your inclusion rate for that prompt is 20%.

Characterisation — When you are mentioned, how? Are you described as a leading option, a niche alternative, or dismissed with a caveat? Inclusion without favourable characterisation is weak signal. A mention that says "some users consider Brand X, though it has limited integrations" is worse than not being mentioned at all in some competitive contexts.

Share of voice (SoV) — Across all prompts in a topic cluster, what percentage of total brand mentions are yours versus competitors? If your category generates 200 brand mentions across a prompt set and your brand accounts for 18 of them, your SoV is 9%. That number is what you improve against week over week.

Metric	What it measures	How to use it
Inclusion rate	Whether you appear in responses	Identify invisible topics
Characterisation	How you're described when mentioned	Flag negative or weak mentions
Share of voice	Your mentions vs. competitors total	Track competitive position over time

Step 1: Build your prompt library

The prompt library is the foundation. Every other measurement depends on it being well-constructed.

What makes a good prompt

The best prompts are phrased the way real buyers ask AI tools — not the way marketers think about keywords.

"AI visibility tools" is a keyword. "What tools can I use to track whether my brand appears in AI answers?" is a prompt. The second one generates more useful data because it's closer to actual user behaviour.

Use these sources to build your initial library:

People Also Ask — Google's PAA boxes show real questions in your category
Perplexity's "Related" suggestions — what users ask after an initial query
Reddit and Quora threads — how people phrase questions to other people
Your sales call recordings — how buyers describe their problems before they know your solution

How many prompts you need

20 prompts is enough to start. 40 gives you enough data to segment by topic and intent. More than 60 and you're adding diminishing returns unless you have a very complex competitive set.

Organise prompts into topic clusters:

Code

Topic: Pricing and ROI
  - "How much does [solution category] typically cost?"
  - "Is [solution category] worth the investment for small businesses?"
  - "What's the ROI on [solution category]?"

Topic: Comparison and alternatives
  - "What are the best [solution category] tools?"
  - "[Competitor] alternatives"
  - "[Competitor] vs [Competitor] — which is better?"

Topic: Use cases
  - "[Solution category] for [specific industry]"
  - "How do companies use [solution category] to [specific outcome]?"

What to avoid in prompt construction

Don't name your brand in the prompt. This biases the response — you're testing discoverability, not direct recall.

Don't use prompts so broad they produce encyclopaedic answers rather than brand recommendations. "What is AI?" will not generate useful citation data.

Don't rotate prompts between measurement periods. Fix the library for at least 6 weeks. Changing prompts mid-measurement is like changing the questions between surveys — you lose comparability.

Step 2: Run prompts and capture outputs

Which engines to test

At minimum: ChatGPT (GPT-4o) and Perplexity. They behave differently and weight sources differently, so a brand can be strong in one and invisible in the other.

If your buyers use Google's AI Overviews heavily, add that. For enterprise clients in some regions, Gemini is worth including. In practice, most B2B measurement programmes track two to three engines.

How to run at scale

Manual testing works for an initial audit of 20 prompts. For anything ongoing, you need API access.

ChatGPT: OpenAI API with GPT-4o. Set temperature to 0.3 to 0.5 — low enough to get consistent responses, high enough to surface the natural variation the model would show real users. Run each prompt 3 times and record all three responses.

Perplexity: Perplexity API with the sonar-pro model. Run each prompt twice — Perplexity's live web access means responses vary more than ChatGPT's, so the second run helps catch variance.

What to record for each response

For each prompt run, capture:

Which brands were mentioned (exact names as the model stated them)
Whether your brand was mentioned (yes/no)
How your brand was characterised (positive, neutral, negative, with caveat)
Which URLs were cited as sources (if shown)
The position of your first mention (first brand named, second, third, not mentioned)

Don't paraphrase. Capture exact text around your brand mention. "X is a solid option for mid-market teams" and "X is sometimes used by mid-market teams" are different characterisations.

Step 3: Score what you find

Raw data is noisy. Scoring gives you something to act on.

Citation scoring model

Assign each mention a score:

Mention type	Score
Named as primary recommendation	3
Named alongside top competitors	2
Named with a caveat or limitation	1
Not mentioned	0
Mentioned negatively	-1

Run this for every prompt, every run, every engine. The resulting matrix tells you:

Which topics you're scoring well on
Which topics need content work
Which topics have caveat problems (you're mentioned but being damped)

The caveat category is underrated. A lot of brands are technically being mentioned in AI answers but with language that reduces purchase intent. "Brand X is popular but some users find the onboarding complex" is a mention. It's not a good one.

Consistency score

Because AI models are probabilistic, run each prompt multiple times and measure how consistently you appear. If you show up in 1 out of 3 runs, that's low consistency — meaning you're on the edge of the model's consideration set for that topic.

Code

Consistency = (runs where brand was mentioned) / (total runs)

High SoV but low consistency means something. It means you appear in some responses for that topic but not reliably — which often means your on-site content is partially structured correctly but missing elements that lock in consistent inclusion.

Step 4: Calculate share of voice

SoV is the number you report on. Everything else feeds into it.

The calculation

For a given topic cluster over a measurement period:

Code

SoV = (your brand mentions across all prompt runs) / (total brand mentions across all prompt runs) × 100

If 5 competitors generated 240 total brand mentions across your prompt set and your brand generated 22, your SoV is 9.2%.

Calculate SoV per topic cluster and per AI engine. Your SoV on "pricing" questions might be 18% on Perplexity but 4% on ChatGPT. That gap tells you exactly where to focus: ChatGPT draws on different sources for pricing content, and your current content isn't in them.

What to compare against

Three benchmarks matter:

Your own historical baseline — is your SoV moving week over week?
Primary competitors — are you gaining or losing ground relative to the 2 or 3 brands you most often compete against in deals?
Topic-level spread — which of your topic clusters has the lowest SoV? That's where lost deals are hiding.

The topic-level spread is where most brands find the biggest gap between their self-perception and reality. A company might feel strong on "security" positioning and discover their AI SoV on security-related prompts is 3%, because the sources AI uses for security content don't include them at all.

What good AI visibility data looks like

A well-structured measurement output gives you a table like this per measurement period:

Topic cluster	Your SoV	Top competitor SoV	Your inclusion rate	Avg. characterisation score
Pricing & ROI	12%	31%	40%	1.8
Use cases	24%	28%	70%	2.4
Comparisons	5%	44%	20%	1.2
Technical integration	31%	19%	80%	2.7

That table tells a clear story: strong on technical integration, nearly invisible on comparison prompts, a characterisation problem on pricing. The actions follow directly.

Common mistakes that break the measurement

Rotating prompts. If you change the prompt set between measurement periods, your data is not comparable. Treat the library as fixed infrastructure, not a variable.

Running prompts only once. AI models have natural variance. A single run per prompt tells you what the model said once, not what it typically says. Run each prompt at least 3 times.

Counting mentions without scoring characterisation. A mention is not a good mention by default. Track how you're being described, not just whether you appear.

Measuring too many engines at once without enough prompts. If you have 10 prompts split across 5 AI engines, you have 2 data points per engine. That's not enough signal. Go deep on 2 engines before expanding.

Not separating topic clusters. A single aggregate SoV number hides which topics you're invisible on. Always break down by cluster.

How FNA measures AI visibility for clients

At FNA Technology, AI visibility measurement is part of our AI development services — built into the weekly reporting cycle, not treated as a one-off audit.

The setup takes about two weeks: we build a 30 to 40 prompt library based on your category, competitive set, and buyer questions, then run the first baseline measurement across ChatGPT and Perplexity. From there, measurement runs weekly. The cost is under $3 per week in API usage for a typical client.

The output is a prioritised action list: which topic clusters need content created, which existing pages need structural updates, and where your off-site presence is missing from the sources AI draws on. The score tells you where you are. The action list tells you what to change to move it.

Want to see where your brand stands in AI answers right now? Run a visibility audit with our team and we'll show you your baseline AI share of voice across your category before we do anything else.

Frequently Asked Questions

It tracks how often your brand is mentioned in AI-generated answers across tools like ChatGPT and Perplexity — broken down by topic, prompt type, and engine. Unlike traditional SEO metrics (rankings, clicks), AI visibility measurement focuses on inclusion: were you named, how were you characterised, and how often versus competitors? The output is a share of voice score per topic and per AI engine.

In our experience, 20 to 40 prompts per topic cluster gives you enough signal to detect meaningful patterns. Fewer than 10 prompts produces noisy data — one phrasing change can swing results significantly. We recommend a fixed prompt library held constant for at least 6 weeks before drawing conclusions. This lets you compare week-over-week rather than just taking snapshots.

Yes, but it's slow and hard to scale. Manual testing — entering prompts into ChatGPT and Perplexity and recording results in a spreadsheet — works for a quick initial audit. For ongoing tracking across 30+ prompts, 3 AI engines, and weekly cadence, API access to the models is necessary. The cost is low: around $1 to $3 per week for a mid-size client prompt library.

The framework is the same — prompt library, citation tracking, share of voice scoring — but the prompt topics and competitive set differ by industry. A software company tracks prompts like 'best project management tool for remote teams'. A professional services firm tracks prompts like 'top accounting firms in Dubai'. The structure transfers. The inputs are industry-specific.

It varies by AI system. Perplexity indexes content faster — sometimes within days for newly published pages. ChatGPT's training data updates on a slower cycle, so structural changes often take 4 to 12 weeks to show up in responses. This means AI visibility is a lagging indicator. You should expect a gap between publishing optimised content and seeing movement in your scores.

#AI visibility measurement#measure AI visibility#ChatGPT brand mentions#Perplexity visibility tracking#AI share of voice#generative engine optimisation#GEO metrics#AI search tracking#brand visibility AI#AI citation tracking

Share this article:

Written by

FNA Technology

Team Member at FNA Technology

FNA Technology is a software development company specializing in AI, mobile apps, and web solutions.

Work with us

AI Development

How to Measure AI Visibility Across ChatGPT & Perplexity

May 27, 2026

11 min read

FNA Technology

Why measurement comes first
What AI visibility measurement actually tracks
Step 1: Build your prompt library
Step 2: Run prompts and capture outputs
Step 3: Score what you find
Step 4: Calculate share of voice
What good AI visibility data looks like
Common mistakes that break the measurement
How FNA measures AI visibility for clients

The short version: You can't improve AI visibility without measuring it first. This framework covers the four steps — prompt library, output capture, citation scoring, and share of voice calculation — that turn vague AI mentions into a trackable metric. Most brands skip measurement entirely and wonder why their AI presence doesn't improve.

Most brands have strong opinions about whether they "show up" in AI. Few have data. The ones who do are ahead of competitors who are still guessing.

Why measurement comes first

The instinct when you discover AI visibility is to start publishing content. More blog posts. Better structured pages. More FAQ sections.

That's not wrong. But it's backwards if you haven't measured first.

Measurement solves all three problems. It tells you where you stand, where the gap is widest, and which changes are actually working.

What AI visibility measurement actually tracks

AI visibility measurement has three components. They're different from each other, and each one tells you something the others don't.

Metric	What it measures	How to use it
Inclusion rate	Whether you appear in responses	Identify invisible topics
Characterisation	How you're described when mentioned	Flag negative or weak mentions
Share of voice	Your mentions vs. competitors total	Track competitive position over time

Step 1: Build your prompt library

The prompt library is the foundation. Every other measurement depends on it being well-constructed.

What makes a good prompt

The best prompts are phrased the way real buyers ask AI tools — not the way marketers think about keywords.

Use these sources to build your initial library:

People Also Ask — Google's PAA boxes show real questions in your category
Perplexity's "Related" suggestions — what users ask after an initial query
Reddit and Quora threads — how people phrase questions to other people
Your sales call recordings — how buyers describe their problems before they know your solution

How many prompts you need

20 prompts is enough to start. 40 gives you enough data to segment by topic and intent. More than 60 and you're adding diminishing returns unless you have a very complex competitive set.

Organise prompts into topic clusters:

Code

Topic: Pricing and ROI
  - "How much does [solution category] typically cost?"
  - "Is [solution category] worth the investment for small businesses?"
  - "What's the ROI on [solution category]?"

Topic: Comparison and alternatives
  - "What are the best [solution category] tools?"
  - "[Competitor] alternatives"
  - "[Competitor] vs [Competitor] — which is better?"

Topic: Use cases
  - "[Solution category] for [specific industry]"
  - "How do companies use [solution category] to [specific outcome]?"

What to avoid in prompt construction

Don't name your brand in the prompt. This biases the response — you're testing discoverability, not direct recall.

Don't use prompts so broad they produce encyclopaedic answers rather than brand recommendations. "What is AI?" will not generate useful citation data.

Don't rotate prompts between measurement periods. Fix the library for at least 6 weeks. Changing prompts mid-measurement is like changing the questions between surveys — you lose comparability.

Step 2: Run prompts and capture outputs

Which engines to test

At minimum: ChatGPT (GPT-4o) and Perplexity. They behave differently and weight sources differently, so a brand can be strong in one and invisible in the other.

If your buyers use Google's AI Overviews heavily, add that. For enterprise clients in some regions, Gemini is worth including. In practice, most B2B measurement programmes track two to three engines.

How to run at scale

Manual testing works for an initial audit of 20 prompts. For anything ongoing, you need API access.

Perplexity: Perplexity API with the sonar-pro model. Run each prompt twice — Perplexity's live web access means responses vary more than ChatGPT's, so the second run helps catch variance.

What to record for each response

For each prompt run, capture:

Which brands were mentioned (exact names as the model stated them)
Whether your brand was mentioned (yes/no)
How your brand was characterised (positive, neutral, negative, with caveat)
Which URLs were cited as sources (if shown)
The position of your first mention (first brand named, second, third, not mentioned)

Don't paraphrase. Capture exact text around your brand mention. "X is a solid option for mid-market teams" and "X is sometimes used by mid-market teams" are different characterisations.

Step 3: Score what you find

Raw data is noisy. Scoring gives you something to act on.

Citation scoring model

Assign each mention a score:

Mention type	Score
Named as primary recommendation	3
Named alongside top competitors	2
Named with a caveat or limitation	1
Not mentioned	0
Mentioned negatively	-1

Run this for every prompt, every run, every engine. The resulting matrix tells you:

Which topics you're scoring well on
Which topics need content work
Which topics have caveat problems (you're mentioned but being damped)

Consistency score

Code

Consistency = (runs where brand was mentioned) / (total runs)

Step 4: Calculate share of voice

SoV is the number you report on. Everything else feeds into it.

The calculation

For a given topic cluster over a measurement period:

Code

SoV = (your brand mentions across all prompt runs) / (total brand mentions across all prompt runs) × 100

If 5 competitors generated 240 total brand mentions across your prompt set and your brand generated 22, your SoV is 9.2%.

What to compare against

Three benchmarks matter:

Your own historical baseline — is your SoV moving week over week?
Primary competitors — are you gaining or losing ground relative to the 2 or 3 brands you most often compete against in deals?
Topic-level spread — which of your topic clusters has the lowest SoV? That's where lost deals are hiding.

What good AI visibility data looks like

A well-structured measurement output gives you a table like this per measurement period:

Topic cluster	Your SoV	Top competitor SoV	Your inclusion rate	Avg. characterisation score
Pricing & ROI	12%	31%	40%	1.8
Use cases	24%	28%	70%	2.4
Comparisons	5%	44%	20%	1.2
Technical integration	31%	19%	80%	2.7

That table tells a clear story: strong on technical integration, nearly invisible on comparison prompts, a characterisation problem on pricing. The actions follow directly.

Common mistakes that break the measurement

Rotating prompts. If you change the prompt set between measurement periods, your data is not comparable. Treat the library as fixed infrastructure, not a variable.

Running prompts only once. AI models have natural variance. A single run per prompt tells you what the model said once, not what it typically says. Run each prompt at least 3 times.

Counting mentions without scoring characterisation. A mention is not a good mention by default. Track how you're being described, not just whether you appear.

Not separating topic clusters. A single aggregate SoV number hides which topics you're invisible on. Always break down by cluster.

How FNA measures AI visibility for clients

At FNA Technology, AI visibility measurement is part of our AI development services — built into the weekly reporting cycle, not treated as a one-off audit.

Want to see where your brand stands in AI answers right now? Run a visibility audit with our team and we'll show you your baseline AI share of voice across your category before we do anything else.

Frequently Asked Questions

Share this article:

Written by

FNA Technology

Team Member at FNA Technology

FNA Technology is a software development company specializing in AI, mobile apps, and web solutions.

Work with us

How to Measure AI Visibility Across ChatGPT & Perplexity

Table of Contents

Why measurement comes first

What AI visibility measurement actually tracks

Step 1: Build your prompt library

What makes a good prompt

How many prompts you need

What to avoid in prompt construction

Step 2: Run prompts and capture outputs

Which engines to test

How to run at scale

What to record for each response

Step 3: Score what you find

Citation scoring model

Consistency score

Step 4: Calculate share of voice

The calculation

What to compare against

What good AI visibility data looks like

Common mistakes that break the measurement

How FNA measures AI visibility for clients

Frequently Asked Questions

FNA Technology

How to Measure AI Visibility Across ChatGPT & Perplexity

Table of Contents

Why measurement comes first

What AI visibility measurement actually tracks

Step 1: Build your prompt library

What makes a good prompt

How many prompts you need

What to avoid in prompt construction

Step 2: Run prompts and capture outputs

Which engines to test

How to run at scale

What to record for each response

Step 3: Score what you find

Citation scoring model

Consistency score

Step 4: Calculate share of voice

The calculation

What to compare against

What good AI visibility data looks like

Common mistakes that break the measurement

How FNA measures AI visibility for clients

Frequently Asked Questions

FNA Technology