Practice notes on testing and learning

Field notes on starting small, learning in public, and building the confidence to change

Jun 30, 2026

Test and learn, in theory, is at its best when an organisation treats what it is learning about itself as seriously as what it is learning about the service.

And, thankfully, the approach seems to be enjoying a real sense of renewed energy in the UK right now. There is a spark, even beyond the usual bureaucracy wonks, around changing how government delivers change – testing assumptions earlier, learning through delivery rather than discovery alone, starting with real service conditions, and building capability through practice rather than waiting for the perfect framework.

Test and learn is not necessarily a new idea. But something does feel like it is gathering. Where The Radical How helped sharpen the case for changing how government changes, the Cabinet Office’s Test, Learn and Grow programme has given the practice a more visible centre of gravity. Perhaps most significantly from a policy perspective, the Magenta Book, the UK government’s evaluation bible (as Nick Kimber put it), now includes a Test and Learn annex. Even if only as a signal, this matters.

For me, this is exactly the work we are currently doing through CustomerFirst: a deliberately high-velocity slice of test and learn in a NewCo space, making small, real changes to a live service that can begin to gather momentum for wider transformation, while also learning what those changes reveal about the organisation around them.

So, in the midst of that momentum, it felt like a good moment to pause and reflect on what actually happens when an organisation tries to practise test and learn seriously. Not as a method to master, but as a way of thinking out loud. What it reveals, what it makes possible, what it unsettles, and what it asks of the people trying to change a live public service from within.

As ever, I feel very fortunate to have daily conversations with brilliant colleagues, and to have the space to use writing as a way of making sense of what we are learning together. I hope they see some of their own thinking reflected here too.

1. Start by testing the problem

One of the things test and learn has made us more attentive to is how quickly a service problem can smuggle in its own solution.

It may sound obvious, but in practice it is surprisingly easy to miss, and it is something I’m all too guilty of. A problem can arrive already shaped by the language around it. Too many people are calling. Too many cases are waiting. Too many applications are delayed. Too many people are doing the wrong thing. The temptation is to accept that framing and move straight into fixing it. Reduce the calls. Clear the queue. Speed up the process. Change the behaviour.

Sometimes that is exactly what is needed. But often the first problem statement is only the visible edge of something else.

A service might appear to have a demand problem when it actually has a confidence problem. People are not calling because they want to call. They are calling because the service has not given them enough certainty to wait. A process might appear to have a compliance problem when it actually has a comprehension problem. People are not failing to follow the rules because they are careless. They may be trying to navigate a form, a letter, a threshold or a decision they simply do not understand.

That is why problem definition matters so much in test and learn. The work is not just to ask, “What can we test?” It is to ask, “What do we think is really happening here, and how would we know if we were wrong?”

This is where the craft sits. Not accepting the first causal story too quickly. Not defining the solution into the problem. Not treating the data point as the explanation. Ten per cent of people filing late, a rise in avoidable contact, a queue that keeps growing — these things tell you something is happening. They do not, on their own, tell you why.

In practice, a better problem definition makes the work more testable. It gives you something behavioural and concrete to learn about. What are people unsure about? Where do they lose confidence? Which decision points create avoidable work? What would need to change for someone to act differently?

This is also where test and learn starts to move beyond the pipe logic of public service transformation. If the problem is defined only as demand exceeding capacity, then the answer will usually be to widen the pipe, unblock the pipe or move people through it faster. But if the problem is partly about uncertainty, trust, comprehension, risk or decision-making, then the work changes. You are no longer only improving flow. You are testing whether the service is organised around the right understanding of the problem in the first place.

For me, that is where test and learn begins. Not with the experiment, but with the discipline of refusing to pretend the first problem statement is the whole truth.

2. Hold the tension without turning it into pressure

One of the biggest pulls I’ve felt through this work is between the urgency to fix things at scale and the discipline to start small enough to actually learn something.

The ambition is deliberately big. We want to improve outcomes for citizens, make the work genuinely better for staff, and fundamentally rethink how a service operates. But when that ambition meets the reality of a live service – existing backlogs, growing operational pressure, partial data, legacy systems, local knowledge, risk and accountability – it also meets a history of attempted transformations and political pressure to demonstrate progress.

I always think about this type of work as stretching an elastic band. On one side is the scale of change people need and expect. On the other is what we can responsibly claim to know and do today. Starting small does not remove that tension. It creates space inside it, allowing us to make something real enough to learn from without pretending we already have the answer.

That may sound intense, so it is important to say that holding this tension together as a team is not the same as creating pressure. If the band is pulled too tightly, people brace against it. The work becomes hurried, performative or overly certain. If it is simply let go, things snap back to the way they were before. The challenge is to make the discomfort survivable, and even useful, so that people can stay open to what the work is revealing.

We have tried to name this directly within the CustomerFirst team. There will be moments when it feels like we are pushing too hard or testing something before we know the right answer. That feeling is not necessarily a sign that something has gone wrong. Sometimes it means we are getting closer to shipping something real rather than something safe. Holding this tension is something we have to support one another to do – noticing the discomfort, talking about it openly, and judging together whether it is productive.

This is not a tension you resolve once, either. Holding the tension can become the work. Every new idea, test or way of framing the problem stretches the elastic again. You do not get to bank the judgement from last time. You have to keep deciding how far to pull, when to hold and when to ease off. Hopefully, with practice, the team becomes a little better at recognising the difference between the discomfort that comes with learning and the pressure that stops learning altogether.

Don’t get me wrong, the point is not to make discomfort a virtue. We are all human, and what we are really working towards is a psychologically safe space where people can do brilliant work, support one another and enjoy themselves while doing it.

3. Test the organisation, not just the service

At its simplest, the high-velocity version of test and learn we’re working with keeps returning to three core questions: what is the hypothesis, what can we test to find out, and – the one that matters most – what does that unlock for the wider transformation?

The first two questions are principally about the service. Did this message reduce uncertainty? Did this small change help colleagues make better decisions? Does it move anything at all? But the third question tests something quieter and often unappreciated in this type of work. Can we work this way? Can we act before everything is known and live with what that feels like? Can the organisation tolerate the uncertainty that learning requires?

What I think this is really building toward is something less visible than the test itself. The harder transformation decisions that come later – the ones involving operating models, risk tolerance, policy constraints and organisational boundaries – will not be made by people simply reading a report. They will be made by people who have, or have not, experienced what it feels like to test something, learn something and change something as a result.

That experience changes what people believe is possible. It is not a side effect of testing. It is part of what the test is for. The external learning tells you something about the service. The internal learning – the growing confidence that evidence-led change is possible at all – is what allows you to keep doing it.

4. Start small enough to snowball

This is why a small experiment with one cohort or one part of a journey can matter, even when it feels counterintuitive in a transformation programme and inadequate against the scale of what needs to change. Early tests are not always trying to prove a business case or demonstrate statistical significance. Sometimes they are trying to create a signal – enough to know what breaks, what moves, what surprises you and whether a bigger test is worth running.

One of our first tests involved sending a hundred people a text message telling them their documents had been received. That was never going to provide definitive statistical proof. But that was never really the point. It took an idea out of a slide deck and put it into a live service within a matter of weeks. It made us confront the practical questions, bring the right people together and surface the blockers that would matter if we tried to do it at greater scale.

Dan Hill, writing about mission-oriented design, describes a related approach as snowballing: beginning with small, located prototypes that gather evidence, relationships, credibility and momentum as they move, until the work has enough gravity to attract the wider institutional and political support needed for transformation.

Snowball dynamics — Mission Design Handbook – Dan Hill

Starting small, then, is not a compromise on ambition. It is a decision about sequencing. One test creates the conditions for the next, and each turn of the snowball should leave the organisation with more than it had before. More evidence, more confidence and more people who have experienced a different way of working.

This is an organisation becoming more capable as it moves. That is a different kind of rigour, but it is still rigour because the evidence that eventually convinces institutions to commit, reshape governance or unlock investment is rarely produced in advance. It is produced through the accumulation of real-world interactions in meaningful contexts.

5. Notice what travels beyond the protected space

What happens, though, when the learning reaches the edges of the protected space where it began? I have written before about innovation labs and other protected spaces for change. At their best, they create the conditions for people to work differently, crossing boundaries and testing ideas that the ordinary organisation struggles to hold. But those conditions can also be fragile. The work can remain contained within the lab, disappear when attention moves elsewhere or be absorbed back into the organisation without changing very much at all.

And at the moment, that history is sitting with me in this work. Our NewCo deliberately creates different conditions around a live service, giving a multidisciplinary team the space and permission to move quickly. But the real test is not only whether those conditions help us make progress. It is whether anything travels beyond them.

Do these protected spaces eventually replace parts of the original organisation? Do they get absorbed back into it? Or do they diffuse into nothing once the attention moves elsewhere? We are too early to know which of these is happening to us.

It has been fascinating, and slightly disorienting, to watch the language and energy of this way of working begin to travel beyond the team that built it. Both in our team and within the DVLA, people are using the phrase “NewCo” – invoking it in exactly the kind of institutional argument it was designed to sidestep.

I’m unsure whether that is a good sign or a warning. It might mean the idea has enough weight that people reach for it instinctively, which feels closer to Andrew Greenway’s distinction between a method and a movement for test and learn – where a method can be handed over, but a movement has to travel through people, practice and repeated use. Or it might mean the protected space is already being folded into the politics it was meant to remain separate from, language and all.

I genuinely don’t know which way it is going yet. I can point to the experiments, but not yet to evidence of which direction the learning is travelling. That is an uncomfortable thing to admit in a piece that has otherwise been fairly confident about what test and learn is doing. But I think the discomfort is the honest answer, rather than a gap to paper over.

Some reflections going forward

Whatever happens next, the work has to stay rooted in the people who run the service. Many have spent years watching transformation programmes arrive and leave, consultants come in, strategies launch, governance get restructured, and the service stay much the same. They know the workarounds better than anyone because they are often the people who have had to build them.

What I keep relearning, every time I sit with the people who actually run the service, is that nothing replaces what they know. They remember the change that did not make sense at the time and still does not. They know which small thing would make the greatest difference – an extra field on a form, a sentence reordered, a button moved – because they are closest to the person on the other end of it. The biggest improvements are very rarely the ones that look biggest on a roadmap.

Test and learn cannot arrive as another solution looking for buy-in. It has to be an invitation into a process, and that invitation has to lead somewhere. In practice, this requires more than listening. It means listening and responding — taking what we have heard and pushing an idea through, rather than simply noting it down, while keeping an open door for teams to share what is getting in the way before we have decided what the answer is.

The risk, as test and learn gathers momentum, is that organisations scale the visible parts too quickly. The terminology, the templates, the governance decks, the pilot labels, the show-and-tells. Those things can help, but scaling the language does not necessarily scale the learning.

The question going forward is what each test leaves behind. Does it make the service better? Does it also make the next act of learning easier because the relationships are stronger, the evidence is clearer and more people have experienced change happening in practice?

Building that capacity to keep improving feels like the harder project. And probably the more important one.

John Mortimer

Thanks Jack. When I read what you have written, I am reminded that many of us need those that can pen our true thinking onto paper. Reading your article is like reading my mind, in a way that I cannot structure myself.

Thank you for articulating the true principles and intent of Test & Learn, as that is what matters.
