When Your SpeechRater Dimensions Look Great But the Overall Score Is Meh

From an Elite user:

Hey John,

May I ask you a quick question? Why do I sometimes receive a great score on each dimension, but a mediocre overall SpeechRater task score?

Answer: Thanks for asking. I know how strange it feels when the dimensions look great, but the overall SpeechRater score lands lower. Here's why the mismatch happens, how to read it correctly, and what to do next.

TL;DR

The SpeechRater Dimension scores shown in My Speaking Score are percentiles.
The overall SpeechRater score is produced by a separate, proprietary model from ETS that uses raw features, weighting, and scaling.
Percentiles compare you to others in one slice of performance, while the overall score summarizes many underlying signals together.
Mismatches are uncommon but expected at the edges of the model. Use them as a prompt to strengthen natural delivery and stability.

First principle: percentiles are not raw scores

The Dimension numbers you see in My Speaking Score are percentiles, not the raw values ETS uses internally.

A percentile tells you how you rank relative to the population on that specific dimension.
The overall score is not an average of percentiles. It is generated by a separate model that blends many raw features.

This is why you can post strong percentiles on several dimensions and still land at a mediocre overall score. The overall model is sensitive to patterns across features, not just the per-dimension rank snapshots.
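To make the distinction concrete, here is a minimal Python sketch of what a percentile is. The cohort values and the function name percentile_rank are invented for illustration. They are not ETS data, and this is not necessarily how My Speaking Score computes its percentiles.

```python
from bisect import bisect_right

def percentile_rank(value, cohort):
    """Percent of cohort values at or below `value` (one common definition)."""
    ordered = sorted(cohort)
    return 100.0 * bisect_right(ordered, value) / len(ordered)

# Hypothetical raw speaking rates (words per second) for a made-up cohort.
cohort_rates = [1.6, 1.8, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 3.0]

# Two speakers with different raw values receive the same percentile.
print(percentile_rank(2.6, cohort_rates))  # 90.0
print(percentile_rank(2.9, cohort_rates))  # 90.0
```

Both speakers see the same number on a percentile dashboard, yet a model that reads the raw values could treat 2.6 and 2.9 differently. That gap is one reason percentiles cannot be read as inputs to the overall score.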

How the overall SpeechRater score is built (at a high level)

ETS does not publish exact formulas, but in practice you should assume the overall score draws on:

Dozens of raw acoustic, prosodic, lexical, and structural features.
Weighting and scaling steps that balance these features.
Normalization to keep scores consistent across prompts and populations.

The result is a holistic estimate of speaking proficiency that may push the overall score up or down compared to your visible percentiles.

Why do “great dimensions + mediocre overall” cases happen?

Typical causes we see in outlier reports:

Feature interactions the percentile view cannot show
You might have high percentile ranks on individual dimensions, yet a particular combination of timing, rhythm, or error patterns depresses the overall score.
Stability and naturalness effects
ETS models reward natural, listener-friendly speech. If delivery feels rehearsed or “robotic,” it can underperform in the holistic model even when individual slices look strong.
Normalization and scaling
The overall model may compress or widen score ranges after comparing your full feature set to cohort norms, which can change where your final number lands.
True statistical outliers
Rarely, the system produces edge cases where visible percentiles and the overall score diverge more than expected. We track these to refine guidance.
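As a rough illustration of the first cause, the toy model below shows how an interaction between features can lower a combined score even when every individual feature looks strong. Everything here is hypothetical: the function toy_overall, the feature names, the weights, and the penalty are made up for this sketch. ETS does not publish the SpeechRater formula, and this is not it.

```python
def toy_overall(rate_z, fluency_z, rhythm_z):
    """Toy holistic score: weighted sum minus one interaction penalty.

    Inputs are standardized features where higher looks better in isolation.
    All weights are invented for illustration only.
    """
    base = 0.4 * rate_z + 0.3 * fluency_z + 0.3 * rhythm_z
    # Hypothetical interaction: very fast AND very unbroken speech together
    # is treated as sounding rehearsed, so the combination is penalized.
    penalty = 1.0 * max(0.0, rate_z - 1.0) * max(0.0, fluency_z - 1.0)
    return base - penalty

balanced = toy_overall(1.0, 1.0, 1.0)   # solid on every feature
extreme = toy_overall(2.0, 2.0, 1.0)    # even higher on two features

print(f"balanced: {balanced:.2f}")  # balanced: 1.00
print(f"extreme:  {extreme:.2f}")   # extreme:  0.70
```

The second speaker would rank at least as high as the first on every individual feature, yet scores lower once the interaction is counted. A per-dimension percentile view has no way to display that kind of effect.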

How to interpret your report the right way

Use the overall score as the final proficiency estimate.
Use percentiles diagnostically to target practice. They tell you where you are stronger or weaker relative to others, not how the overall model will weight you.
Look for patterns over multiple responses, not one-offs. Consistency is a real signal.

SpeechRater Dimensions: what they mean and how to act

Construct	Dimension	What it Measures	Why it Matters	If Your Percentile Is Low	Practice Moves That Help
Delivery	Speaking Rate (SR)	Words per second	Controls clarity and listener effort	Speech sounds rushed or sluggish	Target ~140–160 wpm (about 2.3–2.7 words per second) with short sentences; read back and record to calibrate pace
Delivery	Sustained Speech (SS)	Ability to keep talking without unnecessary breaks	Signals fluency and idea completion	Frequent stalls, broken phrases	60-second monologue drills; speak through commas, pause at periods
Delivery	Pause Frequency (PF)	How often you pause	Too many pauses increase effort	Choppy cadence	Breath timing drills; plan sentence “chunks” of 6–10 words
Delivery	Distribution of Pauses (DP)	Where pauses occur	Mid-phrase pauses disrupt comprehension	Stops in the middle of ideas	Mark pauses at clause boundaries; rehearse with punctuation
Delivery	Repetitions (Re)	Repeated words/phrases	Signals uncertainty and fills time	“I, I, I think…”	Slow the first second of each response; swap repeated words for synonyms
Delivery	Rhythm (Rh)	Cadence, stress, intonation	Natural rhythm improves coherence	Monotone or irregular stress	Shadow 30-second native clips; exaggerate pitch on keywords
Delivery	Vowels (Vo)	Vowel clarity and consistency	Core to intelligibility	Misheard words, unclear contrasts	Minimal pairs (“ship/sheep”); slow-motion practice then restore pace
Language Use	Vocabulary Depth (VDe)	Precision and appropriateness of word choice	Signals control and specificity	Generic or vague wording	Micro-lists of topic verbs and precise nouns; swap “good/bad” for specific descriptors
Language Use	Vocabulary Diversity (VDi)	Variety of unique words	Adds richness when used naturally	Recycling the same terms	Paraphrase drills: restate ideas with 3 alternative words
Language Use	Grammatical Accuracy (GA)	Correctness of forms and syntax	Directly affects clarity	Tense slips, agreement errors	Record and mark 3 errors per response; retell correctly once
Language Use	Grammatical Complexity (GC)	Average phrase/clause length and variety	Shows control over complex ideas	Only simple sentences	Add one modifier per sentence; use because/which/although once per response
Topic Development	Discourse Coherence (DC)	Organization and logical connections	Enables quick understanding	Ideas feel scattered	Use a 3-part template: claim → reason → example; repeat the claim in the last line
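The table describes Speaking Rate in words per second, while the practice target is given in words per minute. A small helper like the one below converts between the two when you check a recording against a transcript. The function name speaking_rate and the sample numbers (110 words in 45 seconds) are assumptions for this example, not values from a real report.

```python
def speaking_rate(word_count, seconds):
    """Return (words per second, words per minute) for one response."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    wps = word_count / seconds
    return wps, wps * 60

# Example values: count the words in your transcript and time the recording.
wps, wpm = speaking_rate(word_count=110, seconds=45)
in_target = 140 <= wpm <= 160

print(f"{wps:.2f} words/sec, {wpm:.0f} wpm, in 140-160 range: {in_target}")
# 2.44 words/sec, 147 wpm, in 140-160 range: True
```

This measures only raw pace. It says nothing about where your pauses fall or how natural the rhythm sounds, which are tracked by other dimensions.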

A practical way to reduce mismatches

Record 5 responses this week on different prompts.
Track three numbers per attempt: overall score, Speaking Rate, and Sustained Speech percentiles.
Listen for naturalness. Ask: would a friend find this engaging and easy to follow?
Fix one delivery lever at a time for two days each: pauses, rhythm, then rate.
Retest on fresh prompts. You should see tighter alignment between your overall score and the pattern of your percentiles.
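If you prefer to keep the log from the steps above in a script rather than a notebook, a minimal sketch might look like this. The field names and all the numbers are placeholders. Enter the values from your own reports, on whatever scale your report uses for the overall score.

```python
from statistics import mean, pstdev

# Hypothetical practice log: one entry per recorded response.
attempts = [
    {"overall": 3.0, "speaking_rate_pct": 82, "sustained_speech_pct": 75},
    {"overall": 3.2, "speaking_rate_pct": 85, "sustained_speech_pct": 78},
    {"overall": 3.1, "speaking_rate_pct": 80, "sustained_speech_pct": 74},
    {"overall": 3.4, "speaking_rate_pct": 88, "sustained_speech_pct": 81},
    {"overall": 3.3, "speaking_rate_pct": 86, "sustained_speech_pct": 80},
]

def summarize(log, key):
    """Average and spread of one tracked number across attempts."""
    values = [entry[key] for entry in log]
    return {"mean": mean(values), "spread": pstdev(values)}

for key in ("overall", "speaking_rate_pct", "sustained_speech_pct"):
    stats = summarize(attempts, key)
    print(f"{key}: mean={stats['mean']:.2f}, spread={stats['spread']:.2f}")
```

A shrinking spread across attempts is a simple way to see consistency improving, which is the pattern to look for rather than any single result.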

Pro tip: practicing as if you are speaking to a real person is a strong move. It stabilizes rhythm and pause placement, two signals that often reconcile “great dimensions, mediocre overall” cases.

Common scenarios and what they mean

High Vocabulary Depth and Diversity, average overall
You sound varied, but delivery is jerky or grammar slips under pressure. Prioritize Sustained Speech and GA cleanup.
Great Speaking Rate percentile, low overall
Pace is fine, but rhythm and pause placement reduce intelligibility. Smooth the cadence and pause at clause boundaries.
Strong Delivery percentiles, lower overall
This is a useful sign that content structure is the bottleneck. Tighten Discourse Coherence using a fixed claim-reason-example template.

FAQ

Q1. If dimensions are percentiles, should I ignore them and look only at the overall score?
No. Use the overall score as your proficiency estimate, and use percentiles to choose what to fix next. Percentiles are your practice roadmap.

Q2. Can I average my dimension percentiles to predict the overall score?
No. The overall score comes from a separate model with different inputs and weights. An average of percentiles will mislead you.

Q3. Why does my overall score sometimes swing more than my percentiles?
The overall model considers interactions among features. Small shifts in timing or naturalness can move the overall more than any single dimension’s percentile.

Q4. Is Vocabulary Diversity a shortcut to a higher overall score?
Not on its own. Variety helps when it supports clarity and coherence, but delivery, accuracy, and organization carry significant weight in the holistic estimate.

Q5. What should I do first if I see a mismatch?
Stabilize delivery: consistent pauses, steady rhythm, and a natural 140–160 wpm. Then fix one language lever (Grammatical Accuracy or Grammatical Complexity), and finally tighten coherence with a simple template.

Q6. Do rehearsed scripts help?
Scripts can raise certain percentiles, but the overall model rewards natural, connected delivery. Practice with bullet points and real-person intent rather than memorized lines.

The takeaway

Percentile dimensions are powerful diagnostics. The overall score is a separate, holistic estimate. When they diverge, assume the overall model is reacting to feature interactions you cannot see on the percentile dashboard. Use that signal to polish delivery, accuracy, and coherence. With a week or two of targeted practice, these mismatches usually disappear.