The Truth About TOEFL Speaking Scoring (From a Former ETS Rater)

On June 21st, I hosted a webinar with my favorite partner in crime, Nathan Mills. If you don’t know Nathan, he spent 10 years scoring TOEFL Speaking responses at ETS, and together we’ve seen close to 150,000 student responses. This session was about truth vs. myth—what’s actually happening behind the scenes in scoring, and what test-takers need to do if they’re serious about getting a 26+.

You can watch the whole thing on Vimeo: https://vimeo.com/1095279490/51eedb2b66?share=copy (Note: the webinar includes a very emotional moment, which I'm sharing with the participant's permission.)

Here’s what we covered—and why it matters.

❌ The Myths That Keep You Stuck

We opened the webinar with some tough love: Most people under-resource their prep. They spend too little time, too little money, and they chase magic bullets like changing test centers or re-scoring requests. Nathan and I have seen this over and over.

The worst part? Many decisions are made based on myths.

Let’s deal with a few of the big ones.

🎯 Myth vs. Fact: The Scoring System

Here’s a breakdown of what people believe vs. how TOEFL Speaking actually works.

Myth | Truth
Test centers affect your score | Scores are calculated remotely and randomly. The center has zero impact.
Raters can have bad days and punish you | ETS uses strict calibration protocols before and during shifts. If raters deviate, they’re pulled.
Rescoring gives you a better chance | Re-scoring has a success rate below 20%. Human raters don’t know if it’s a re-score.
You need 2 reasons in Task 1 | One well-developed reason is not only fine, it’s better.
You must speak loudly to be heard | Shouting makes responses unscorable. Natural, clear delivery is the goal.
Accents hurt your score | They don’t, unless they affect intelligibility.

Let’s go deeper.

🔍 Behind the Curtain: How Scoring Really Works

Nathan gave us an inside look at the ETS rater workflow. It’s more rigorous than most people realize:

  • Raters calibrate before every shift. They have to pass mini-tests to continue scoring.
  • Each response is heard by two raters. If they disagree, a scoring leader reviews it.
  • No rater scores more than two tasks per test-taker, and they don’t know who you are.
  • Raters never see the SpeechRater score and don’t know if a response is being re-scored.
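
For readers who think in code, here is a rough sketch of two of the rules above: the double rating with escalation, and the two-task cap per rater. This is not ETS's actual software. The function names and the data layout are hypothetical, and treating any difference between the two scores as a "disagreement" is my own assumption, since the webinar did not define the threshold.

```python
from collections import Counter

# From the webinar: no rater scores more than two tasks per test-taker.
MAX_TASKS_PER_RATER = 2


def needs_leader_review(first_score: int, second_score: int) -> bool:
    """Flag a response for scoring-leader review.

    Assumption: any difference between the two ratings counts as a disagreement.
    """
    return first_score != second_score


def assignment_is_valid(raters_by_task: dict[int, tuple[str, str]]) -> bool:
    """Check one test-taker's rater assignments (hypothetical layout: task -> two rater IDs).

    Each task needs two different raters, and no rater may appear on more
    than MAX_TASKS_PER_RATER of this test-taker's tasks.
    """
    tasks_per_rater = Counter()
    for first_rater, second_rater in raters_by_task.values():
        if first_rater == second_rater:
            return False
        tasks_per_rater.update([first_rater, second_rater])
    return all(count <= MAX_TASKS_PER_RATER for count in tasks_per_rater.values())
```

With four tasks and two raters per task, the cap means at least four different raters hear each test-taker, which is part of why one person's "bad day" is unlikely to decide your score.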

Bottom line: The system is designed to be fair. If you’re not hitting your target, it’s probably not bias—it’s more likely a gap in delivery, development, or grammar that you haven’t fixed yet.

🛠️ Task 1: Strategy, Not Templates

Let’s talk about the most misunderstood part of TOEFL Speaking: Task 1.

Nathan destroyed the two-reason myth. He explained that you only need one clear, well-developed reason. His advice?

  • Use a value word (e.g., "important," "efficient," "practical").
  • Support it with real detail: examples, cause-effect, or short stories.
  • Avoid generic filler like “it’s helpful” or “it’s good.”

We call this going to the “third layer.” That’s what gets you a 4.

⚠️ Grammar, Pronunciation, and Automaticity

Nathan walked us through what human raters look for:

  • They don’t count grammar errors like a math quiz.
  • But patterns of mistakes (e.g., dropping -ed endings repeatedly) lower your score.
  • Pronunciation is judged based on clarity, not accent.
  • The gold standard is automaticity—speech that feels fluent and spontaneous.

One more thing: Don’t shout. It doesn’t help. In fact, it can make your audio unscorable by SpeechRater and frustrate human raters.

📦 Why Your Prep Might Not Be Working

I shared a case study of a test-taker who worked through 133 SpeechRater reports and 5 hours of coaching with Nathan, spent about $10 a day, and jumped to a 27.

We see this all the time. Most people aren’t scoring low because they’re incapable. They’re scoring low because they’re not investing enough effort, time, or intensity in the right areas.

This test is beatable. But you need real practice, real feedback, and a disciplined plan.

❓ FAQs from the Webinar

Q: Why didn’t my score change after re-scoring?

A: Fewer than 20% of re-score requests result in a changed score. Raters are calibrated to score consistently, and they don’t know it’s a re-score.

Q: Does the test center I choose matter?

A: No. Responses are scored remotely, not at the center, and scoring is randomized. Choose the environment where you feel most comfortable, but it won’t affect your score.

Q: Can I get a 4 with an accent?

A: Yes. As long as your words are clear and understandable, accent is not penalized.

Q: How many grammar errors are allowed?

A: It’s about patterns. One or two minor errors? Fine. But if you consistently mess up tense or agreement, your Language Use score drops.

Q: Is Task 1 more important than the others?

A: No. Each task counts the same.

Q: Can I be scored by the same rater across all tasks?

A: Nope. No more than 2 tasks per rater. And they don’t know it’s you.

Q: What happens if I don’t finish Task 4?

A: Your Topic Development score drops, but if your Delivery and Language Use are strong, you can still get a 3.

Q: If I get a 4–4–3, is that still a 4?

A: No. That’s a 3 overall. You need 4s across all three categories to get a 4.
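
If it helps to see that rule written out, here is a tiny sketch. It only encodes what we said in the webinar: a task earns a 4 when Delivery, Language Use, and Topic Development are all 4s. The function name is made up for illustration, and the sketch deliberately does not guess how mixed scores below 4 combine.

```python
def earns_overall_four(delivery: int, language_use: int, topic_development: int) -> bool:
    """Return True only when all three category scores are 4."""
    return all(score == 4 for score in (delivery, language_use, topic_development))


# The 4-4-3 case from the question above does not qualify.
print(earns_overall_four(4, 4, 3))
print(earns_overall_four(4, 4, 4))
```

The first call prints False and the second prints True, which is the whole point: one weak category is enough to keep a response out of the top band.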

🎯 Final Takeaways

  1. The scoring system is fair—but demanding.
  2. Task 1 requires depth, not a formula.
  3. Grammatical consistency matters more than perfection.
  4. Your mindset matters. Stop looking for shortcuts. Build the skill.
  5. SpeechRater + coaching = a serious edge.

If you want to build a disciplined plan around these insights—or you’re not sure how to break out of the 24/25 loop—reach out via our socials or help desk. We’ve seen hundreds of transformation stories, and we’d love to help write yours.

— John