Noise – Chasing Tales

You know that feeling when you see the perfect ball? You may not even see who the bowler is from the camera angle, but you see the ball, and it’s… beautiful and lethal and just. Like a song that hits the spot. It’s the physical manifestation of joy.

So when a bowler bowls such a ball, is that because they are in really good form, or is that just how they bowl in general? That has to be different for every bowler, right? And yes, this is about bowlers, because I just watched Mitch Starc bowl, and the bowling. The bowling. Poetry.

This piece asks a rude question: If you took a bowler whose underlying ability wasn’t changing from match to match, and just let randomness do its thing, how often would you see those streaks anyway? Also, what does “form” mean for a fan watching from the outside, and what does it mean for the bowler inside the game?

Why is this about bowlers and not batters?
When a batter scores runs, the runs are directly and unambiguously attributable to them. They chose the shot, they middled it or they didn’t, it went where they hit it or it didn’t. A batter who scores 80 runs scored 80 runs. The metric accumulates continuously across the innings: every ball faced is a data point, not just the ones that produce wickets. So a single innings might give you 150-200 data points building toward the final score. That’s a meaningful sample from one performance.

A bowler’s primary success metric, wickets, is a joint event. It requires the bowler to produce a good delivery AND the batter to fail to handle it. Even a perfect delivery can be survived if the batter gets a thick outside edge that falls short of slip. A rank bad ball can produce a wicket if the batter top-edges a slog. The wicket isn’t just measuring the bowler. It’s measuring the bowler plus the batter plus the fielding plus a slice of luck.

And wickets are rare. A good Test bowler might take one wicket every 40–60 balls on average.¹ ² That means a full spell might give you one, maybe two, wickets to evaluate. That’s almost no data, and because each wicket is a joint event, the noise in any single one is enormous relative to the signal.
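To put rough numbers on that, here is a minimal sketch that treats every ball in a spell as an independent trial with the same wicket probability (a binomial model). That independence is an assumption real spells don’t satisfy, the 48-ball spell and one-wicket-every-50-balls rate are hypothetical values picked from the range above, and `wicket_distribution` is a made-up helper.

```python
from math import comb


def wicket_distribution(balls: int, p_wicket: float, max_k: int = 3) -> dict[int, float]:
    """Chance of taking exactly k wickets in a spell for k below max_k,
    plus one lumped bucket for max_k or more.

    Assumes every ball is an independent trial with the same wicket
    probability (a binomial model), which real spells are not.
    """
    probs = {
        k: comb(balls, k) * p_wicket**k * (1 - p_wicket) ** (balls - k)
        for k in range(max_k)
    }
    probs[max_k] = 1 - sum(probs.values())  # max_k or more wickets
    return probs


# Hypothetical spell: 8 overs (48 balls) from a bowler who strikes once every 50 balls.
dist = wicket_distribution(balls=48, p_wicket=1 / 50)
for k, prob in dist.items():
    label = f"{k}+" if k == max(dist) else str(k)
    print(f"{label} wickets: {prob:.0%}")
```

Under those assumptions the bowler has close to a 38% chance of going wicketless in the spell and about a 37% chance of exactly one wicket, with nothing about their skill differing between those outcomes.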

You might think economy rate solves this, because it isn’t rare like wickets: it accumulates ball by ball, like a batting score. But economy rate is contaminated by things the bowler doesn’t control: fielding (a misfield gives away four runs the bowler didn’t deserve), the batter’s attacking intent (a batter in T20 mode will score off good balls that a Test batter would defend), and conditions (a fast outfield makes everything travel quicker to the boundary). A batter’s score is more directly their own than a bowler’s economy rate is their own.

A batter who scores a hundred has, by definition, not gotten out for the entire duration of that hundred. Getting out removes them from the sample, so a large batting score has a built-in quality filter: the batter proved they could handle everything thrown at them for that entire period. A bowler doesn’t have an equivalent filter. We’ve all seen bowlers getting absolutely pounded, but also bowlers simply not bowling, even when they are available, because the captain doesn’t trust them. And while a batter’s bad day ends when they get out, a bowler’s bad spell stays in the data.

Batting produces a continuous, accumulating, directly attributable metric with a survivorship filter. Bowling produces a rare, jointly caused metric with high per-event noise and no survivorship filter. They could have the same amount of underlying skill variation, but batting gives you better and more frequent measurements of it. So the signal-to-noise ratio is structurally worse for bowling than for batting, even before you account for the process/outcome problem with great balls going for four.
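One way to see the structural problem is the textbook reliability of an average: if each observation is true level plus independent noise, the share of an n-observation average that is signal is signal_var / (signal_var + noise_var / n). The sketch below assumes that simple model and uses invented variances; the 150 and 2 observation counts only echo the balls-faced and wickets-per-spell figures above, and `reliability` is a hypothetical helper.

```python
def reliability(signal_var: float, noise_var_per_obs: float, n_obs: int) -> float:
    """Share of the variance in an n-observation average that is true signal.

    Assumes each observation = true level + independent noise, so averaging
    n observations divides the noise variance by n.
    """
    return signal_var / (signal_var + noise_var_per_obs / n_obs)


# Hypothetical: identical skill spread and per-event noise for both disciplines;
# only the number of events per performance differs.
batting = reliability(signal_var=1.0, noise_var_per_obs=25.0, n_obs=150)  # balls faced
bowling = reliability(signal_var=1.0, noise_var_per_obs=25.0, n_obs=2)    # wickets in a spell
print(f"batting: {batting:.2f}  bowling: {bowling:.2f}")
```

With identical skill variation and identical noise per event, the batting measure comes out around 0.86 signal and the bowling one under 0.1, purely because of how many events each gets. Noisier individual events, as wickets are, would push the bowling figure lower still.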

So there are two questions tangled up in that one perfect ball. One: how much does a bowler’s true underlying state actually move around over time? Two: given how noisy bowling outcomes are, how much of what we call a purple patch is just random clumps sitting on top of that underlying state?

Streaks, Clumps and Slumps
Remember that time we kept losing coin tosses across formats, tournaments, venues and captains? If not, you can read more about it here and here. It was exasperating, but it is just something that happens naturally with numbers. In any long sequence of random outcomes, you will see streaks. Not because something changed, but because that’s just what randomness looks like. The longer the sequence, the longer the streaks you’d expect just from chance. Cricket careers are long sequences. Impressive-looking purple patches will appear in random data.
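Here is a minimal sketch of that claim using fair coin tosses: it simulates many random sequences of a given length and reports the average longest streak of identical outcomes. The helper names, the seed, and the trial counts are arbitrary choices for illustration.

```python
import random


def longest_streak(outcomes: list[bool]) -> int:
    """Length of the longest run of identical consecutive outcomes."""
    if not outcomes:
        return 0
    best = current = 1
    for prev, nxt in zip(outcomes, outcomes[1:]):
        current = current + 1 if nxt == prev else 1
        best = max(best, current)
    return best


def average_longest_streak(length: int, trials: int = 2000, seed: int = 1) -> float:
    """Average longest streak over many simulated fair coin-toss sequences."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        tosses = [rng.random() < 0.5 for _ in range(length)]
        total += longest_streak(tosses)
    return total / trials


for n in (10, 50, 200):
    print(f"{n} tosses: longest streak averages {average_longest_streak(n):.1f}")
```

The average longest streak keeps climbing as the sequences get longer, with no change at all in the coin.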

A signal is the meaningful, underlying information or pattern within a dataset that conveys useful information about a phenomenon.³ In statistics, it is the true pattern or underlying trend hidden within the data: the “message” you are trying to find. Noise is everything else: the random, unwanted, and unpredictable fluctuations in the data that obscure that signal.⁴ Think of information versus data, or someone singing under their breath in an otherwise busy room.

Any observed performance is a mixture of signal (the bowler’s actual underlying ability on that day) and noise (luck, conditions, batter quality, fielding, the specific random variation of where each ball lands). The question isn’t whether good performances cluster- they do. The question is whether the clustering is bigger than the noise would produce on its own. This requires knowing how much noise there is in cricket outcomes.

If real form exists, if something genuine changes in a bowler’s body or mind that persists across matches, then knowing how they bowled last match should help you predict how they’ll bowl this match. That’s called autocorrelation⁵ ⁶ ⁷: the extent to which a value in a sequence is correlated with the value before it. If form is real, you’d expect positive autocorrelation in performance sequences. If it’s just randomness, autocorrelation should hover near zero.
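A minimal sketch of a lag-1 autocorrelation, comparing a hypothetical flat bowler (fixed level plus independent noise) with a hypothetical bowler whose underlying level drifts. The drift is modelled as an AR(1) process with persistence 0.6 and match-to-match noise is added on top; both choices, and the helper name, are assumptions for illustration, not estimates from real cricket data.

```python
import random


def lag1_autocorrelation(values: list[float]) -> float:
    """Sample lag-1 autocorrelation: how much each value tracks the one before it."""
    n = len(values)
    centre = sum(values) / n
    dev = [v - centre for v in values]
    numerator = sum(dev[i] * dev[i - 1] for i in range(1, n))
    denominator = sum(d * d for d in dev)
    return numerator / denominator


rng = random.Random(7)

# Hypothetical flat bowler: fixed underlying level, independent noise each match.
flat = [rng.gauss(0, 1) for _ in range(1000)]

# Hypothetical "form" bowler: the underlying level drifts as an AR(1) process
# (persistence 0.6), and each match adds independent noise on top of it.
form, level = [], 0.0
for _ in range(1000):
    level = 0.6 * level + rng.gauss(0, 1)
    form.append(level + rng.gauss(0, 1))

print(f"flat bowler: {lag1_autocorrelation(flat):+.2f}")
print(f"form bowler: {lag1_autocorrelation(form):+.2f}")
```

The flat sequence lands near zero. The drifting one comes out clearly positive but well below 0.6 (roughly 0.37 in expectation under these settings), because the match-level noise dilutes the persistence in the underlying level.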

Also, there is a caveat: even if real form exists as a statistical signal, most cricket careers may not be long enough to detect it reliably. A bowler might play 50 Tests, and each Test gives you at most a handful of spells. Separating signal from noise in 50-100 data points, when each data point contains substantial outcome variance, requires a very strong signal.
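To see how strong “very strong” is, a rough rule of thumb: for a purely random sequence of n values, the sample lag-1 autocorrelation is approximately normal with standard error 1/√n, so at the usual 5% level it needs to exceed roughly 1.96/√n before it stands out. This sketch uses that large-sample approximation and treats each match as one data point; both are simplifications, and the function name is hypothetical.

```python
from math import sqrt


def autocorrelation_threshold(n_matches: int, z: float = 1.96) -> float:
    """Rough smallest lag-1 autocorrelation that stands out from pure chance.

    Large-sample approximation: for a random sequence of n values, the sample
    autocorrelation has a standard error of about 1/sqrt(n). z = 1.96 matches
    the conventional 5% two-sided level.
    """
    return z / sqrt(n_matches)


for n in (50, 100, 500):
    print(f"{n} matches: need |r| above about {autocorrelation_threshold(n):.2f}")
```

That works out to about 0.28 for 50 matches and about 0.20 for 100, and that is only the bar for noticing a signal at all, not for measuring it precisely.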

Regression to the Mean
Mean⁸ is a statistical word for average (sum all your data points and divide the sum by the number of data points, what we used to do in school). There’s a related concept that’s even more important for understanding how we perceive form, and it’s called regression to the mean: the statistical phenomenon where extreme, unusual, or outlier measurements tend to be followed by measurements closer to the average.⁹

Extreme performances, either very good or very bad, are partly skill and partly luck. After an unusually good spell, the most likely next result is something closer to average. Not because form dropped, or because the bowler did anything differently, but because the extreme outcome reflected an unusually lucky combination of skill and chance variation, and that combination is unlikely to repeat in exactly the same manner.

This creates a specific perceptual trap. Imagine a bowler takes wickets in four consecutive matches. Everyone says they’re in form. The next match is average. Everyone says they’ve lost their rhythm. What most likely happened is this: the four good matches were skill plus good luck, and the average match was skill without particularly good luck. The average is the better guide to what the bowler really is.

What looks like form peaking and then fading is often just extreme performances being followed by more typical ones, because that’s how averages work.
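A minimal simulation of that trap, assuming a hypothetical flat bowler whose true match rating never moves from 50 and whose results vary only through luck (a normal spread with standard deviation 15). It picks out the top 10% of matches and checks the match that followed each one; the rating scale and all numbers are invented.

```python
import random

rng = random.Random(42)

# Hypothetical flat bowler: true match rating is always 50; luck adds noise (sd 15).
TRUE_LEVEL, LUCK_SD = 50.0, 15.0
ratings = [rng.gauss(TRUE_LEVEL, LUCK_SD) for _ in range(20_000)]

# Find the standout matches (top 10%) and look at the match right after each one.
cutoff = sorted(ratings)[int(0.9 * len(ratings))]
standouts = [i for i in range(len(ratings) - 1) if ratings[i] >= cutoff]
standout_avg = sum(ratings[i] for i in standouts) / len(standouts)
next_avg = sum(ratings[i + 1] for i in standouts) / len(standouts)

print(f"standout matches averaged {standout_avg:.1f}")
print(f"the following match averaged {next_avg:.1f} (true level {TRUE_LEVEL:.0f})")
```

The standout matches average in the mid-70s, and the matches right after them come back to roughly 50. In this flat model the regression is total; for a bowler whose level genuinely moves, it would only be partial.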

Hot Hands
In 1985, three psychologists, Gilovich, Vallone, and Tversky, published a paper about basketball players.¹⁰ They tested whether a player who had made several consecutive shots was actually more likely to make the next one, as coaches, players, and fans universally believed. The answer was no. When they applied proper statistical tests to shooting data, the streaks that looked like “hot hands” were completely consistent with what you’d expect from random sequences. The hot hand was, they concluded, a cognitive illusion: the human brain is extraordinarily good at finding patterns and just terrible at recognising what randomness looks like.

Then, in 2018, two researchers named Miller and Sanjurjo found a problem with that conclusion.¹¹ There’s a subtle mathematical bias that appears when you look for streaks within short sequences: the way the original paper sampled the data produced an underestimate of the true streak effect. When they corrected for it, a small but real hot hand effect appeared in the data. Not large, not the dramatic momentum that commentators describe, but statistically detectable.
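The bias is easiest to see with a tiny example. This sketch enumerates every possible sequence of four fair coin flips, computes within each sequence the proportion of flips right after a head that are also heads, and averages those proportions. It illustrates the selection problem Miller and Sanjurjo describe; it is not their correction procedure, and the function name is made up.

```python
from fractions import Fraction
from itertools import product


def expected_streak_proportion(n_flips: int) -> Fraction:
    """Average, over all equally likely fair-coin sequences, of the proportion
    of flips immediately after a head that are also heads.

    Sequences with no head in the first n-1 flips have no such flips and are
    skipped, which is what a per-sequence analysis naturally does.
    """
    proportions = []
    for seq in product("HT", repeat=n_flips):
        followers = [seq[i + 1] for i in range(n_flips - 1) if seq[i] == "H"]
        if followers:
            proportions.append(Fraction(followers.count("H"), len(followers)))
    return sum(proportions) / len(proportions)


print(expected_streak_proportion(4))  # prints 17/42 (about 0.405), not 1/2
```

Even though every flip is a fair 50/50, the per-sequence average is 17/42, about 0.405. A streaky shooter measured this way has to climb out of that hole first, so an observed “about 50%” actually hints at a small positive effect.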

So: the hot hand probably exists, a little, but what it means is likely different for the bowler themself (internal), the fan (external), and the statistician (mathematical).

Has this been tested in cricket?

A study titled Significant hot hand effect in the game of cricket specifically looked at ODI and Test performances.¹² Unlike the 1985 basketball study, which concluded that outcomes were independent, this research used self-exciting point processes (a fancy way of saying “success increases the probability of immediate future success”; I don’t know why researchers talk like this, it’s so annoying) and found:

Predictability exists: In both ODIs and Tests, individual performance sequences showed more clustering than random chance would allow.
The “60% Rule”: The researchers found that models accounting for the hot hand outperformed random-null models (a simplified, randomised version of the real data used to test whether observed patterns are due to chance; it keeps some of the data’s structure fixed, like totals, but randomises the rest to create a baseline for comparison) about 60-62% of the time, which is statistically significant (meaning the outcome is unlikely to have occurred by chance alone¹³).
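The study’s models are not reproduced here. As a rough picture of what “self-exciting” means, this is a toy discrete-time version: each ball has a baseline wicket probability, and every wicket adds a temporary boost that fades by a fixed factor per ball. The function names and all parameter values are invented; this illustrates the idea, not the paper’s model.

```python
import random


def simulate_spell(n_balls: int, base: float, boost: float, decay: float,
                   rng: random.Random) -> list[int]:
    """Ball indices on which wickets fall in a toy discrete self-exciting model.

    Each ball's wicket probability is `base` plus leftover "excitation" from
    earlier wickets. A wicket adds `boost` for the next ball, and all
    excitation shrinks by the factor `decay` on every ball after that.
    """
    wickets: list[int] = []
    excitation = 0.0
    for ball in range(n_balls):
        took_wicket = rng.random() < min(1.0, base + excitation)
        if took_wicket:
            wickets.append(ball)
        excitation = excitation * decay + (boost if took_wicket else 0.0)
    return wickets


def quick_follow_up_share(wickets: list[int], window: int = 6) -> float:
    """Share of gaps between consecutive wickets that are `window` balls or fewer."""
    gaps = [later - earlier for earlier, later in zip(wickets, wickets[1:])]
    return sum(gap <= window for gap in gaps) / len(gaps) if gaps else 0.0


rng = random.Random(3)
for boost in (0.0, 0.1):
    wickets = simulate_spell(20_000, base=0.02, boost=boost, decay=0.8, rng=rng)
    print(f"boost {boost}: {len(wickets)} wickets, "
          f"{quick_follow_up_share(wickets):.0%} followed by another within 6 balls")
```

With the boost switched on, a much larger share of wickets is followed by another within six balls: the kind of clustering a random-null model would not produce.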

Real vs. Causal
A famous Yale study¹⁴ of bowling data (ten-pin bowling, that is) found something similar: the hot hand is real but not causal. One strike doesn’t magically cause the next; instead, players go through high‑ and low‑performance states where every ball in that window is a bit more or less likely to work.

So, a player isn’t “hot” because they just took a wicket; they are “hot” because they are currently in a high-performance state where the probability of a wicket is higher for every ball in that window.
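The state-based idea can be sketched too. In this toy model, a hypothetical bowler drifts at random between a high state and a low state, and the wicket probability depends only on the current state; a wicket never changes the state. The function name and all the numbers are assumptions chosen for illustration.

```python
import random


def simulate_states(n_balls: int, p_high: float, p_low: float, switch: float,
                    rng: random.Random) -> list[bool]:
    """Ball-by-ball wickets (True/False) for a bowler who drifts between states.

    The wicket chance depends only on the current state. After every ball the
    state flips with probability `switch`, regardless of whether a wicket fell.
    """
    high = rng.random() < 0.5
    outcomes = []
    for _ in range(n_balls):
        outcomes.append(rng.random() < (p_high if high else p_low))
        if rng.random() < switch:
            high = not high
    return outcomes


rng = random.Random(11)
balls = simulate_states(200_000, p_high=0.06, p_low=0.01, switch=0.02, rng=rng)

overall = sum(balls) / len(balls)
after_wicket = [nxt for prev, nxt in zip(balls, balls[1:]) if prev]
conditional = sum(after_wicket) / len(after_wicket)
print(f"P(wicket) overall: {overall:.3f}")
print(f"P(wicket | wicket on the previous ball): {conditional:.3f}")
```

Wickets still look “hot”: the chance of a wicket right after a wicket comes out higher than the overall rate, yet no wicket ever causes another. The previous wicket is just evidence that the bowler is probably in the high state.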

What Does “Normal” Look Like for This Bowler?
Here’s the idea: pick a bowler and choose a stretch of their career where they feel like roughly the same player (same role, same format, similar fitness, no huge technical overhauls), then take the average of all their performance numbers in that stretch. The result is what you’d expect from this bowler on a typical day in this phase of their career. That is the baseline: the normal level of this bowler in the period you care about.

Why does version control (what statistics calls dealing with non-stationarity¹⁵) matter? Because different versions of the same person should not be lumped into the same streak of matches: early Mitch Starc and present-day Mitch Starc are completely different bowlers. They might produce the same-looking ball, but the process and consistency behind it must be completely different (maybe, I think), and in that sense it is a different ball altogether.

And, as with any average, the larger the sample, the more reliably it reflects the bowler’s typical level in that phase.¹⁶ This is why we want a reasonable sample: think dozens of matches, not 3-4.
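A minimal sketch of the baseline step, assuming each match has been boiled down to one hypothetical “match rating” where higher is better, and that the career has already been split into phases by hand. Alongside the mean it reports the standard error, which shrinks with the square root of the number of matches. The ratings and the `baseline` helper are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def baseline(ratings: list[float]) -> tuple[float, float]:
    """Phase baseline (the mean) and its standard error.

    The standard error is stdev / sqrt(n): it shrinks with the square root of
    the number of matches, so a 3-4 match baseline is very shaky.
    """
    return mean(ratings), stdev(ratings) / sqrt(len(ratings))


# Hypothetical match ratings (higher = better), already split into phases by hand.
career = {
    "early": [38, 61, 45, 52, 70, 33, 58, 49],
    "current": [64, 71, 58, 80, 66, 73, 69, 62],
}
for phase, ratings in career.items():
    centre, se = baseline(ratings)
    print(f"{phase}: baseline {centre:.1f} (standard error {se:.1f})")
```

With only eight matches per phase, the uncertainty is still a few rating points; quadrupling the sample would roughly halve it.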

Now that we have the baseline, we want to know, game by game, whether the bowler was better than their own usual level or not, which means finding out for each match how much they deviated from that average. This is straightforward subtraction. Assuming a performance number where higher is better, if the average is A and the new data point (the performance from the current match) is B, then if:

A>B, the bowler didn’t bowl as well as their relevant baseline,
A<B, the bowler bowled better than their relevant baseline, and
A=B, the bowler bowled as well as their relevant baseline.

We can take this as:

A>B as a score of -1,
A<B as a score of +1, and
A=B as a score of 0, that is, they are bowling neither better nor worse than their current average.

So,

+1 = better than usual
0 = roughly normal
−1 = worse

Which means that over ten matches one might see: +1,0,−1,+1,+1,0,+1,−1,+1,+1.
That’s a crude form diary for this bowler in this phase of their career.
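Scoring a match against the baseline can be sketched as below. Two assumptions go beyond the text: a tolerance band around the baseline counts as “roughly normal”, since a real-valued rating will almost never equal the average exactly, and higher ratings are better (for economy rate you would flip the signs). The ratings, the baseline of 60, the tolerance of 5, and the `form_diary` helper are hypothetical, chosen to reproduce the ten-match diary above.

```python
def form_diary(ratings: list[float], baseline: float, tolerance: float) -> list[int]:
    """Score each match +1 / 0 / -1 against the phase baseline.

    `tolerance` is an assumed band around the baseline that counts as
    "roughly normal". Assumes higher ratings are better.
    """
    diary = []
    for rating in ratings:
        diff = rating - baseline
        if diff > tolerance:
            diary.append(1)    # better than usual
        elif diff < -tolerance:
            diary.append(-1)   # worse than usual
        else:
            diary.append(0)    # roughly normal
    return diary


ten_matches = [68, 62, 49, 71, 66, 57, 75, 52, 69, 70]
print(form_diary(ten_matches, baseline=60, tolerance=5))
```

The width of the tolerance band is a judgement call: widen it and more matches count as “normal”, narrow it and the diary fills up with +1s and −1s that may be pure noise.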

What randomness actually looks like
Imagine a bowler whose underlying ability is completely fixed- same skill, same fitness, same everything, match after match. No slumps, no golden periods, no form at all. Just a consistent underlying level of performance with some random variation in outcomes from match to match, because cricket is not a controlled experiment and outcomes are noisy.

Now watch that bowler for fifty matches.

You will see streaks. Strings of matches above their average. Strings below. At some point you will see five good matches in a row and think: they’re in form. At some point you will see four mediocre matches and think: they’ve lost it. Neither conclusion would be correct. You’d be reading patterns into randomness.
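A minimal sketch of that thought experiment, assuming a hypothetical flat bowler whose match ratings come from the same normal distribution (mean 50, standard deviation 15) in every match. Each simulated career is 50 matches long, “above average” means above that career’s own average, and the code counts how often a five-match purple patch appears anyway. The function names and numbers are illustrative assumptions.

```python
import random


def longest_above_run(ratings: list[float], level: float) -> int:
    """Longest run of consecutive matches rated above `level`."""
    best = current = 0
    for rating in ratings:
        current = current + 1 if rating > level else 0
        best = max(best, current)
    return best


def chance_of_purple_patch(n_matches: int = 50, streak: int = 5,
                           careers: int = 5000, seed: int = 0) -> float:
    """Share of simulated flat-bowler careers with `streak`+ above-average
    matches in a row. Ability never changes; only the luck does."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(careers):
        ratings = [rng.gauss(50, 15) for _ in range(n_matches)]
        own_average = sum(ratings) / n_matches
        if longest_above_run(ratings, own_average) >= streak:
            hits += 1
    return hits / careers


print(f"{chance_of_purple_patch():.0%} of flat careers contain a 5-match purple patch")
```

Under these assumptions, somewhere around half of all careers contain such a streak, even though the bowler never changes.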

In statistics, this is called a ‘run’. A statistical run is a streak of similar outcomes¹⁷, so above average, above average, above average is a run of three. The Runs Test¹⁸ asks: given this sequence of above-and-below-average performances, does the pattern of runs look like what you’d expect from pure randomness, or is there more clustering (or more alternating) than chance would produce? You don’t need the formula. The logic is: count the runs, compare to the expected number if the sequence were random, and ask whether the difference is bigger than chance alone would explain.
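For readers who do want the formula, here is a minimal sketch of the classic Wald-Wolfowitz version of the runs test, applied to a +1/0/−1 diary. Dropping the 0s is an assumption on our part (the test needs two categories), and `runs_test` is a hypothetical helper, not a library function.

```python
from math import sqrt


def runs_test(diary: list[int]) -> tuple[int, float, float]:
    """Wald-Wolfowitz runs test on a +1/-1 form diary (0s are dropped).

    Returns (observed runs, expected runs, z-score). A strongly negative z
    means more clustering than chance; strongly positive means more
    alternating. The normal approximation is rough for short sequences.
    """
    signs = [d for d in diary if d != 0]
    n_plus, n_minus = signs.count(1), signs.count(-1)
    if n_plus == 0 or n_minus == 0:
        raise ValueError("need at least one +1 and one -1")
    n = n_plus + n_minus
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    expected = 2 * n_plus * n_minus / n + 1
    variance = (2 * n_plus * n_minus * (2 * n_plus * n_minus - n)) / (n**2 * (n - 1))
    if variance == 0:
        raise ValueError("sequence too short for the normal approximation")
    return runs, expected, (runs - expected) / sqrt(variance)


runs, expected, z = runs_test([1, 0, -1, 1, 1, 0, 1, -1, 1, 1])
print(runs, expected, round(z, 2))  # prints: 5 4.0 1.08
```

For the ten-match diary above, dropping the two 0s leaves six +1s and two −1s in five runs against an expected four, a z-score of about 1.08: slightly more alternating than chance, nowhere near significant. With only eight matches, the normal approximation is shaky anyway.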

But to know what is above average, first we need to know what is average for that bowler.

The Poetry is the Signal
So, is a bowler who scores +1,0,−1,+1,+1,0,+1,−1,+1,+1 in good form or average form?

As far as I can see, there are three types of form in cricket bowling:

Type 1: Physical/biomechanical state: The bowler’s body is working or it isn’t. Rhythm, run-up, shoulder position, wrist at release. This is internal and real and the bowler feels it immediately. A niggle, a slight change in action, fatigue, even their mental state: all of these affect the actual delivery. This is the closest thing to true form.
Type 2: Outcome form: What the scorebook says. Wickets, economy, match figures. This is what fans and selectors see. It’s a noisy, delayed, jointly-caused signal that reflects Type 1 form plus batter quality plus fielding plus luck. It can diverge wildly from Type 1- a bowler can be in beautiful physical form and get hammered because the batters are brilliant that week, or be slightly off and take a five-for because edges keep flying to hand.
Type 3: Perceived form: What fans, commentators, and sometimes selectors believe based on Type 2. Subject to all the cognitive biases described above: the hot hand illusion, regression to the mean misread as form loss, pattern-finding in noise.

But bowlers are not coin tosses. People are not numbers, so bowlers remember the last ball, they feel their front leg blocking well or not, they sense whether the seam is landing upright, they know if their shoulder is sore. Those things change the underlying probability of a good ball in a way no simple random model can capture.

The hot‑hand studies in other sports end up in a similar place: they find real fluctuations in a player’s underlying performance level over time, but very little of it is the magical one-success-causes-the-next momentum commentators seem to love.

So,

Individual bowlers do have better and worse phases. There is some persistence in performance beyond pure randomness.
But bowling outcomes (wickets, runs) are so noisy and so joint that even a completely flat bowler would eventually generate streaks that look like form.
The “hot hand” we see on TV, such as wickets in clumps, commentators rhapsodising, is mostly our brain misreading random clumps as deep narrative, with a thin layer of real underlying changes.
For the bowler inside the game, “form” probably means something more process‑y: how their body and action feel, whether they can hit the length in their head, whether their corrections are working. The scorecard is a crude, laggy, sometimes unfair reflection of that.

The “Signal” isn’t the wicket. The wicket is the Outcome, and the Outcome is noisy, messy, and shared with ten other people. The “Signal” is that feeling I had watching Starc bowl. It’s the Process: the perfect snap of the wrist, the late tail of the ball, and only then, sometimes, the sound of the stumps.

Sources

Signal and Noise in Cricket Performance
