From an Elite user --
Hey John,
May I ask you a quick question? Why do I sometimes receive a great score on each dimension, but a mediocre overall SpeechRater task score?
Answer: Thanks for asking. I know how strange it feels when the dimensions look great, but the overall SpeechRater score lands lower. Here's why the mismatch happens, how to read it correctly, and what to do next.
TL;DR
- The SpeechRater Dimension scores shown in My Speaking Score are percentiles.
- The overall SpeechRater score is produced by a separate, proprietary model from ETS that uses raw features, weighting, and scaling.
- Percentiles compare you to others in one slice of performance, while the overall score summarizes many underlying signals together.
- Mismatches are uncommon but expected at the edges of the model. Use them as a prompt to strengthen natural delivery and stability.
First principle: percentiles are not raw scores
The Dimension numbers you see in My Speaking Score are percentiles, not the raw values ETS uses internally.
- A percentile tells you how you rank relative to the population on that specific dimension.
- The overall score is not an average of percentiles. It is generated by a separate model that blends many raw features.
This is why you can post several strong percentiles across dimensions and still land at a mediocre overall. The overall model is sensitive to patterns across features, not just the per-dimension rank snapshots.
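To make the percentile idea concrete, here is a minimal sketch. The population values and the `percentile_rank` helper are invented for illustration; ETS does not publish its raw features or norms, so treat this as a picture of how percentiles behave in general, not of how SpeechRater computes them.

```python
from bisect import bisect_right

def percentile_rank(population, value):
    """Percent of the population scoring at or below `value`."""
    ordered = sorted(population)
    return 100.0 * bisect_right(ordered, value) / len(ordered)

# Made-up raw speaking rates (words per minute) for ten speakers.
population_wpm = [95, 105, 110, 120, 125, 130, 135, 140, 150, 160]

# Two different raw values can share one percentile.
print(percentile_rank(population_wpm, 150))  # 90.0
print(percentile_rank(population_wpm, 159))  # 90.0
```

Both speakers see "90th percentile" on the dashboard, yet a model that reads the raw values sees 150 and 159. The percentile view hides that difference.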
How the overall SpeechRater score is built (at a high level)
ETS does not publish exact formulas, but in practice you should assume the overall score draws on:
- Dozens of raw acoustic, prosodic, lexical, and structural features.
- Weighting and scaling steps that balance these features.
- Normalization to keep scores consistent across prompts and populations.
The result is a holistic estimate of speaking proficiency that may push the overall score up or down compared to your visible percentiles.
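As a rough illustration of weighting, scaling, and normalization working together, here is a toy model. Everything in it is an assumption made for the example: the feature names, the weights, the cohort norms, and the 0 to 4 output range. It is not the ETS formula, which is not public.

```python
def toy_overall(features, weights, norms, lo=0.0, hi=4.0):
    """Weighted sum of z-scored raw features, shifted to the scale midpoint and clamped."""
    z_total = 0.0
    for name, weight in weights.items():
        mean, std = norms[name]
        # Normalize each raw feature against hypothetical cohort norms.
        z_total += weight * (features[name] - mean) / std
    midpoint = (lo + hi) / 2
    return max(lo, min(hi, midpoint + z_total))

# Hypothetical weights and (mean, std) norms, chosen only for this example.
weights = {"rate_wpm": 0.3, "pause_evenness": 0.4, "grammar_accuracy": 0.3}
norms = {
    "rate_wpm": (130.0, 20.0),
    "pause_evenness": (50.0, 10.0),
    "grammar_accuracy": (80.0, 10.0),
}

# Above the norm on rate and grammar, well below it on pause evenness.
speaker = {"rate_wpm": 150.0, "pause_evenness": 30.0, "grammar_accuracy": 90.0}
print(round(toy_overall(speaker, weights, norms), 2))  # 1.8
```

In this sketch, two above-average features do not offset one heavily weighted weak feature, so the result lands below the midpoint of 2.0. That is the kind of effect a per-dimension percentile view cannot show.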
Why do “great dimensions + mediocre overall” cases happen?
Typical causes we see in outlier reports:
- Feature interactions the percentile view cannot show
You might have high percentile ranks on individual dimensions, yet a particular combination of timing, rhythm, or error patterns depresses the overall score.
- Stability and naturalness effects
ETS models reward natural, listener-friendly speech. If delivery feels rehearsed or “robotic,” it can underperform in the holistic model even when individual slices look strong.
- Normalization and scaling
The overall model may compress or widen score ranges after comparing your full feature set to cohort norms, which can change where your final number lands.
- True statistical outliers
Rarely, the system produces edge cases where visible percentiles and the overall score diverge more than expected. We track these to refine guidance.
How to interpret your report the right way
- Use the overall score as the final proficiency estimate.
- Use percentiles diagnostically to target practice. They tell you where you are stronger or weaker relative to others, not how the overall model will weight you.
- Look for patterns over multiple responses, not one-offs. Consistency is a real signal.
SpeechRater Dimensions: what they mean and how to act
A practical way to reduce mismatches
- Record 5 responses this week on different prompts.
- Track three numbers per attempt: the overall score, the Speaking Rate percentile, and the Sustained Speech percentile.
- Listen for naturalness. Ask: would a friend find this engaging and easy to follow?
- Fix one delivery lever at a time for two days each: pauses, rhythm, then rate.
- Retest on fresh prompts. You should see tighter alignment between your overall and the pattern of percentiles.
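If you like to keep a log, the tracking step can be sketched as below. The numbers are made up, and the field names (`overall`, `speaking_rate_pct`, `sustained_speech_pct`) are just labels chosen for this example, not anything exported by My Speaking Score.

```python
from statistics import mean, pstdev

# Made-up practice log: one entry per recorded response.
attempts = [
    {"overall": 2.5, "speaking_rate_pct": 80, "sustained_speech_pct": 55},
    {"overall": 3.0, "speaking_rate_pct": 90, "sustained_speech_pct": 60},
    {"overall": 2.5, "speaking_rate_pct": 70, "sustained_speech_pct": 65},
    {"overall": 3.0, "speaking_rate_pct": 80, "sustained_speech_pct": 70},
    {"overall": 3.0, "speaking_rate_pct": 80, "sustained_speech_pct": 75},
]

def summarize(log, key):
    """Average and spread of one tracked number across attempts."""
    values = [entry[key] for entry in log]
    return {"mean": mean(values), "spread": pstdev(values)}

for key in ("overall", "speaking_rate_pct", "sustained_speech_pct"):
    stats = summarize(attempts, key)
    print(f"{key}: mean={stats['mean']:.2f} spread={stats['spread']:.2f}")
```

A shrinking spread from one week to the next is a simple way to see the consistency mentioned earlier.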
Pro tip: practicing as if you are speaking to a real person is a strong move. It stabilizes rhythm and pause placement, two signals that often reconcile “great dimensions, mediocre overall” cases.
Common scenarios and what they mean
- High Vocabulary Depth and Diversity, average overall
You sound varied, but delivery is jerky or grammar slips under pressure. Prioritize Sustained Speech and Grammatical Accuracy (GA) cleanup.
- Great Speaking Rate percentile, low overall
Pace is fine, but rhythm and pause placement reduce intelligibility. Smooth the cadence and pause at clause boundaries.
- Strong Delivery percentiles, lower overall
This is a useful sign that content structure is the bottleneck. Tighten Discourse Coherence using a fixed claim-reason-example template.
FAQ
Q1. If dimensions are percentiles, should I ignore them and look only at the overall score?
No. Use the overall score as your proficiency estimate, and use percentiles to choose what to fix next. Percentiles are your practice roadmap.
Q2. Can I average my dimension percentiles to predict the overall?
No. The overall score comes from a separate model with different inputs and weights. An average of percentiles will mislead you.
Q3. Why does my overall score sometimes swing more than my percentiles?
The overall model considers interactions among features. Small shifts in timing or naturalness can move the overall more than any single dimension’s percentile.
Q4. Is Vocabulary Diversity a shortcut to a higher overall?
Not on its own. Variety helps when it supports clarity and coherence, but delivery, accuracy, and organization carry significant weight in the holistic estimate.
Q5. What should I do first if I see a mismatch?
Stabilize delivery: consistent pauses, steady rhythm, and a natural 140–160 wpm. Then fix one language lever, either Grammatical Accuracy (GA) or Grammatical Complexity (GC), and finally tighten coherence with a simple template.
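To check your own pace against the 140–160 wpm band, a quick calculation from a transcript is enough. This sketch simply counts whitespace-separated words; how SpeechRater itself measures Speaking Rate may differ, and the helper names are hypothetical.

```python
def words_per_minute(transcript, duration_seconds):
    """Speaking rate from a transcript and the response length in seconds."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return len(transcript.split()) * 60.0 / duration_seconds

def in_target_range(wpm, low=140, high=160):
    """True when the rate falls inside the 140-160 wpm band."""
    return low <= wpm <= high

rate = words_per_minute("word " * 110, 45)  # 110 words in 45 seconds
print(round(rate, 1), in_target_range(rate))  # 146.7 True
```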
Q6. Do rehearsed scripts help?
Scripts can raise certain percentiles, but the overall model rewards natural, connected delivery. Practice with bullet points and real-person intent rather than memorized lines.
The takeaway
Percentile dimensions are powerful diagnostics. The overall score is a separate, holistic estimate. When they diverge, assume the overall model is reacting to feature interactions you cannot see on the percentile dashboard. Use that signal to polish delivery, accuracy, and coherence. With a week or two of targeted practice, these mismatches usually disappear.