What 1,200 TOEFL Speaking Questions Reveal About the New Test

I went through about 1,200 TOEFL Speaking questions that users have asked in our chat since the new TOEFL launched a couple of months ago.

The questions cluster into three areas:

Scoring
Listen & Repeat
Interview performance

That clustering matters.

It shows where the friction is.

Most users are not asking broad questions about English. They are trying to understand the mechanics of the new Speaking section: how it is scored, why one task drops, what counts as a real mistake, and what they should do under immediate-response conditions.

About half of our users are taking the TOEFL for the first time with this version, so there are fewer legacy-style questions about templates and integrated-task habits. But those questions have not disappeared. We still get daily questions about how the new scale relates to the old one, and some users are returning after years away from the test.

Below are the clearest patterns I found, along with answers.

1) Scoring is the biggest source of confusion

This was the number one theme by far.

People keep asking:

Why did my score drop from 4 to 3.5 from one task to the next?
What exactly caused this response to get a 3 instead of a 4?
How much do pauses affect my score?
Do small grammar mistakes lower my score, or is it mostly fluency?
How strict is the scoring if I miss a few words in Listen & Repeat?
What score do I need on the new scale if I want a 26 on the old scale?
How close was I to the next band?
Does speaking faster improve my score?
How much do fillers like “um” and “uh” hurt me?
Is pronunciation more important than grammar?
Do I lose points if I finish early?
Why is my Listen & Repeat score lower than Interview?
Are both tasks weighted equally?

What these questions tell us

Users want causal clarity.

They do not want a score in isolation. They want to know what moved it.

That is a major difference from older TOEFL prep conversations. Under the legacy format, users often asked how to structure an answer or how to memorize useful language. Under the new format, a much larger share of the conversation starts after the response is finished.

Answers

Why did my score drop from one task to the next?
Because the tasks measure different things. Listen & Repeat rewards accurate recall plus intelligible delivery. Interview rewards clarity, pace, development, and language control in spontaneous speech. A user can sound strong in Interview and still lose points in Listen & Repeat because of recall slips, delayed starts, omitted words, or rhythm problems.

What makes a 3 different from a 4?
Usually not one dramatic mistake. More often it is accumulation: a few hesitations, a weak stretch of delivery, limited elaboration, reduced control, or some pronunciation strain that starts to affect clarity. On Listen & Repeat, small changes that preserve meaning can still be acceptable, but once the response becomes incomplete, inaccurate, or hard to understand, the score drops more sharply.

How much do pauses matter?
Pauses matter when they disrupt continuity. Short natural pauses are normal. Frequent pauses, long pauses, and restart-heavy delivery hurt flow and make the response sound less controlled. In the Interview rubric, choppy pace and frequent filler words are associated with lower performance.

Do grammar mistakes matter?
Yes, but not all grammar mistakes matter equally. Minor grammar problems do less damage when the message stays clear. Repeated errors that reduce precision or make the response harder to follow matter more.

Does speaking faster improve the score?
No. Faster speech helps only if it remains clear and controlled. Once speed starts creating blur, swallowed words, missing endings, or unstable rhythm, it works against the speaker.

Do fillers hurt?
A few fillers are normal. Frequent fillers signal hesitation and can damage fluency.

Do I lose points if I finish early?
Sometimes. Finishing early often means there was not enough development, especially in the Interview task. That does not mean every shorter answer is weak, but many early-finished responses lose points because they are underdeveloped.

Why is Listen & Repeat lower than Interview?
Because the skill demand is different. Many users can speak about their own ideas more comfortably than they can reproduce heard language accurately under time pressure.

Are both tasks weighted equally?
The exact operational weighting is not usually explained to users in public-facing materials. Practically, though, both task types matter, so a weak Listen & Repeat performance can pull the overall Speaking result down even when Interview performance is stronger.

How does the new scale connect to the old one?
This remains one of the biggest transition questions. Users still think in old-score targets like 25 or 26. ETS materials now frame Speaking on the new banded scale, and users want a translation because universities, agents, and personal goals still live partly in the old system. That conversion question is not going away anytime soon. The ETS overview of the new test and its technical manual describe the 2026 scoring framework and its CEFR alignment.

2) Listen & Repeat creates a lot of friction

The questions here are more specific:

How am I supposed to start speaking immediately?
Do I need to repeat everything exactly?
Should I start speaking immediately or wait a second?
Why is Listen & Repeat so much harder than Interview?
I repeated every word but my score is not perfect. Why?
What happens if I miss one word?
Can I rephrase or do I have to copy exactly?
Do I need to mimic the accent of the speaker?
What if I understand the sentence but can’t remember it fully?
Does pronunciation matter more than accuracy here?
How fast should I speak in this section?
What causes a big score drop in Listen & Repeat?
Why does this feel harder than it looks?

What these questions tell us

This task looks simple and feels technical.

Users hear a sentence and assume the job is memory. It is more than memory.

Listen & Repeat combines:

auditory processing
short-term retention
speech timing
intelligibility
control under immediate-response conditions

That combination catches people off guard.

Answers

How am I supposed to start speaking immediately?
You need a fast start habit. That means no internal translation, no search for synonyms, and no attempt to redesign the sentence. The task rewards rapid auditory capture and accurate production.

Do I need to repeat everything exactly?
For the top score, the response should be an exact repetition. The scoring guide for Listen & Repeat says a 5 is an exact, fully intelligible repetition. A 4 can still capture the meaning with minor wording or grammar changes that do not substantially change meaning. Larger losses in completeness or accuracy drop the score further.

Should I start immediately or wait a second?
Do not build a visible delay into your response. A brief natural reaction time is fine. Waiting too long creates a broken start and increases the chance of losing the sentence entirely.

What if I miss one word?
One missed word does not automatically destroy the score. The effect depends on which word it is, whether meaning is preserved, and whether the response remains full and intelligible.

Can I rephrase?
Minor rewording may still keep you in a strong range if the meaning remains intact, but rephrasing is risky because the task is built around repetition, not paraphrase.

Do I need to mimic the accent?
No. You need to be intelligible. Imitating the speaker's accent is not part of the task.

Does pronunciation matter more than accuracy here?
Both matter. You can lose points for inaccurate repetition, and you can lose points if pronunciation reduces intelligibility.

How fast should I speak?
Fast enough to sound natural, slow enough to stay clear. The target is controlled delivery, not speed for its own sake.

What causes a big score drop?
Three common causes:

incomplete repetition
distorted meaning
low intelligibility

Those three factors match how the scoring guide separates higher and lower performance.

Why does this feel harder than it looks?
Because it compresses listening, memory, and speaking into one fast action. There is no planning buffer.

3) Interview questions are about execution

This cluster is more familiar to anyone who has taught speaking:

How do I avoid pauses and hesitation?
How can I speak for 45 seconds without stopping?
What should I do if I don’t know what to say?
How do I organize my answer quickly?
How do I keep speaking when I run out of ideas?
What’s the ideal speaking speed?
How do I recover if I make a mistake?
What should I say exactly during the response?
Can I say the same thing across all tasks?

What these questions tell us

Users are feeling the pressure of zero-prep speaking.

The Interview task is not producing confusion at the same rate as Listen & Repeat, but it creates a different problem: sustained spontaneous production. Users are trying to manage time, pacing, and idea development without the old planning habits they may have used for legacy TOEFL tasks.

Answers

How do I avoid pauses and hesitation?
Use a simple answer pattern. For example:

answer the question directly
give one reason
add one example
close cleanly

That structure reduces search time.

How can I speak for 45 seconds without stopping?
Do not try to sound advanced. Aim to sound continuous. One clear idea plus one concrete example usually fills the time better than three weak ideas.

What should I do if I don’t know what to say?
Choose something ordinary and develop it. Users waste time looking for the perfect idea. The test rewards clear delivery and support, not originality.

How do I organize quickly?
Think in units, not full sentences:

answer
reason
example
conclusion

How do I keep speaking when I run out of ideas?
Extend the example. Add a result, comparison, or detail from personal experience.

What’s the ideal speaking speed?
There is no magic number that saves a weak response. The goal is a steady pace with clear words and manageable pauses.

How do I recover if I make a mistake?
Correct it quickly if needed and continue. Long self-repair chains create more damage than the original error.

What should I say exactly during the response?
There is no single correct content template. The better question is: how can I keep the response direct, supported, and easy to follow?

Can I say the same thing across all tasks?
Not effectively. Recycled content tends to sound thin or mismatched. The task may tolerate generic examples, but strong responses feel connected to the question asked.

What all of this means

Three things stand out.

First, scoring literacy is now a major part of TOEFL prep

Users want a model of cause and effect. They want to know what changed, what mattered, and what to fix. That means score explanation is no longer a side issue. It is part of prep.

Second, Listen & Repeat is the main adjustment shock

Users underestimate it. They expect a simple imitation task and discover that it is a compressed performance task with almost no room to hide.

Third, the old TOEFL is still alive in users’ heads

Even after the new test's launch, people continue asking old-to-new questions:

Is this like a 26?
What does a 4 or 5 mean in older terms?
How should I explain this to a university?

That transition issue will likely persist for the next couple of years.

The practical takeaway

The question underneath all of these is simple:

Why did I get this score?

If a test-taker can answer that clearly, the next step becomes easier.

If they cannot, every new attempt feels random.

Theme	Typical User Questions	What the Questions Reveal	Best Short Answer
Scoring	Why did my score drop from 4 to 3.5? How much do pauses affect my score? Does finishing early hurt me? Why is one task lower than the other?	Users want cause-and-effect clarity. They do not want a score alone. They want to know what moved it.	Small changes in fluency, clarity, completeness, and control can move the score. Different tasks also reward different strengths.
Listen & Repeat	Do I need to repeat everything exactly? What happens if I miss one word? Why is this harder than Interview? Should I start speaking immediately?	Users underestimate the task. It looks simple, but it combines listening, memory, and speech control under time pressure.	A top response is exact and intelligible. Minor changes may still score well, but omissions, distortions, and weak intelligibility cause drops.
Interview	How do I speak for 45 seconds? How do I avoid hesitation? What do I do if I run out of ideas? How do I recover from a mistake?	Users are feeling the pressure of zero-prep speaking and want ways to stay organized while speaking spontaneously.	Use a simple response pattern: answer, reason, example, close. Clear development matters more than trying to sound advanced.
Old vs New Scale	What score do I need on the new scale if I want a 26? Is a 4 or 5 like a 25 or 26?	Users still think in legacy targets because universities, agents, and personal goals still often use the old score language.	The new test uses a different scoring framework, so users need interpretation support. This transition question is still very active.