


Marshmallowy: What LLMs Change — and Where the Edges Are

February 10, 2026
The genuinely new thing LLMs bring to software isn’t intelligence or automation: It’s marshmallowiness.
LLMs are soft, squishy, language-first systems. They’re unreasonably good at understanding intent, filling in missing context, and producing explanations that feel human.
That changes how you should build. But it also forces you to learn—sometimes the hard way—where marshmallowy systems need boundaries.
Leaning Into Marshmallowy
Traditional software forces users to adapt to rigid structures. LLMs flip that: users can speak naturally and let the system interpret.
This is powerful. A user can say: “Can you chase up that invoice from the plumber, the one from a couple weeks ago?”
No invoice number. Approximate time. Vendor by trade, not name.
Classic software would fall over because this isn’t a structured query. At best, a UX designer could give you buttons to mash to build a structured query. With LLMs, however, modern systems thrive on this marshmallowy context.
That softness is the interface. It’s super powerful, and it’s actually quite hard to get used to the “we have an LLM” thing. That’s another post. But say it does click, and suddenly you see LLM-shaped opportunities everywhere: how does it go wrong? Where do you draw the line?
The Bounds for Humans: Guidance, Not Guessing
Doing Marshmallowy properly doesn’t mean “anything goes”. In practice, fully open-ended interfaces create uncertainty. Users don’t know what’s possible, what’s safe, or what the system expects.
So we’ve learned that marshmallowy systems still need to give the people using them guidance:
- templates instead of blank slates
- prompts that reveal capability
- progressive disclosure instead of overwhelming configuration
The LLM gives flexibility to accept anything, but the product still gives confidence through UX. This means experts can always go further, but new users aren’t left guessing.
Examples
Prompts that reveal capability
While it’s really cool that you can type anything into this field and the LLM will get it, that doesn’t help users understand or feel confident about what they can type there. Suggested prompts are a great way to give them the guidance and confidence to navigate the big empty text box.

Templates instead of blank slates
Another option is to provide a clear, known-good starting point or example. We have templates for common workflows so people can skip straight past the blank state.

The Bounds for Agents: Structure, Not Trust
Marshmallowy also doesn’t mean trusting the model. LLMs lie and they don’t listen. LLMs are persuasive, confident, and wrong in subtle ways. They hallucinate outcomes, skip steps, and slowly drift.
Keeping agents honest
Our job is to make sure that, given that reality, our agents are still reliable for our customers. So agents operate inside non-negotiable constraints that reduce, catch, or eliminate their shortcomings. We’ve developed an agentic system that harnesses an LLM through:
- explicit planning / todos before action
- tool-only execution
- tool verification against intent and permissions
- completion verification so “done” actually means done
- drift detection for long-running work
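To make “tool-only execution” and “tool verification against intent and permissions” concrete, here is a minimal Python sketch. It assumes every agent action arrives as a tool call, and it approximates “intent” as “the call matches a (tool, entity) pair declared in the plan”. `ToolCall`, `ToolGate`, and all tool and entity names are hypothetical illustrations, not the system described in this post.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolCall:
    """A tool invocation proposed by the agent (hypothetical shape)."""
    tool: str
    entity_id: str


@dataclass
class ToolGate:
    """Hypothetical gate that every agent action passes through before it runs."""
    registered_tools: set[str]  # tool-only execution: only these can ever run
    permitted_tools: set[str]   # tools this user and task are allowed to use
    planned_actions: set[tuple[str, str]] = field(default_factory=set)  # (tool, entity) pairs from the plan

    def check(self, call: ToolCall) -> tuple[bool, str]:
        """Return (allowed, reason) for a proposed call."""
        if call.tool not in self.registered_tools:
            return False, f"unknown tool: {call.tool}"
        if call.tool not in self.permitted_tools:
            return False, f"permission denied: {call.tool}"
        if (call.tool, call.entity_id) not in self.planned_actions:
            return False, f"not in plan: {call.tool} on {call.entity_id}"
        return True, "ok"


# Tool and entity names below are hypothetical.
gate = ToolGate(
    registered_tools={"read_invoice", "send_email", "void_invoice"},
    permitted_tools={"read_invoice", "send_email"},
    planned_actions={("send_email", "vendor-42")},
)
print(gate.check(ToolCall("send_email", "vendor-42")))  # (True, 'ok')
print(gate.check(ToolCall("void_invoice", "inv-7")))    # (False, 'permission denied: void_invoice')
print(gate.check(ToolCall("send_email", "vendor-99")))  # (False, 'not in plan: send_email on vendor-99')
```

The point of the design is ordering: the model never executes anything itself. It only proposes calls, and a deterministic check decides whether they run.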
Examples
Explicit planning / todos before action
We know agents perform better when they plan. But they don’t always plan. So we make them go through a planning phase before they are allowed to execute. This also provides a nice affordance for our users to put a task in “Supervised” mode, which requires explicit approval to move out of the Planning phase and into the Execution phase.
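A minimal Python sketch of the planning gate as a small state machine, assuming a task is only allowed to leave Planning once it has at least one to-do, and, in Supervised mode, once a human has approved the plan. `Phase`, `Task`, and their methods are hypothetical names for illustration.

```python
from enum import Enum


class Phase(Enum):
    PLANNING = "planning"
    EXECUTION = "execution"


class Task:
    """Hypothetical task lifecycle: an agent must plan before it is allowed to execute."""

    def __init__(self, supervised: bool = False) -> None:
        self.phase = Phase.PLANNING
        self.supervised = supervised  # "Supervised" mode needs a human to approve the plan
        self.todos: list[str] = []
        self.approved = False

    def add_todo(self, todo: str) -> None:
        if self.phase is not Phase.PLANNING:
            raise RuntimeError("todos can only be added during planning")
        self.todos.append(todo)

    def approve(self) -> None:
        """Called by a human reviewing the plan."""
        self.approved = True

    def start_execution(self) -> bool:
        """Try to move from Planning to Execution; return True if the move happened."""
        if not self.todos:
            return False  # no plan, no execution
        if self.supervised and not self.approved:
            return False  # wait for explicit approval
        self.phase = Phase.EXECUTION
        return True


task = Task(supervised=True)
task.add_todo("Find the plumber's invoice from the last few weeks")
print(task.start_execution())  # False: not yet approved
task.approve()
print(task.start_execution())  # True
print(task.phase)              # Phase.EXECUTION
```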


Ready to Execute Validates Intent: Context Gathering is Required
And because agents are trained to be eager to please, they need some encouragement to slow down and plan more carefully. So we make sure they gather context during the planning phase (with read tools), which lets them make a more specific plan.
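One way a “ready to execute” check could enforce context gathering, sketched in Python: the runtime records which tools were called during planning, and execution is refused unless read tools were used on every entity the plan intends to update. The read-tool names and the `ready_to_execute` function are hypothetical.

```python
# Hypothetical read-only tools the agent may call while planning.
READ_TOOLS = {"search_invoices", "get_vendor", "read_document"}


def ready_to_execute(
    planning_calls: list[tuple[str, str]],
    entities_to_update: set[str],
) -> tuple[bool, list[str]]:
    """Allow execution only if context was gathered for every entity the plan will update.

    planning_calls: (tool_name, entity_id) pairs the runtime recorded during planning.
    entities_to_update: entity ids the plan says it will modify.
    """
    read_entities = {entity for tool, entity in planning_calls if tool in READ_TOOLS}
    if not read_entities:
        return False, ["no context gathered: call a read tool before executing"]
    missing = sorted(entities_to_update - read_entities)
    return not missing, [f"not read during planning: {entity}" for entity in missing]


print(ready_to_execute([("get_vendor", "vendor-42")], {"inv-7"}))
# (False, ['not read during planning: inv-7'])
print(ready_to_execute([("read_document", "inv-7")], {"inv-7"}))
# (True, [])
```

The rejection reasons are returned as text so they can be fed straight back to the agent. Entities the plan will newly create can’t be read in advance, so a real check would need to exempt them.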

Quality Pass Before Completion
A separate sub-agent with a fresh context reviews the work. There are three outcomes: approved, fix and retry, or escalate to a human.
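The three-outcome review loop can be sketched in Python as below. The reviewer and fixer are passed in as plain functions standing in for LLM calls; `Verdict`, `quality_pass`, the retry limit, and the toy reviewer are all assumptions for illustration. The key property is that repeated “fix and retry” verdicts eventually become an escalation rather than an endless loop.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    # Hypothetical names for the three review outcomes.
    APPROVED = "approved"
    FIX_AND_RETRY = "fix_and_retry"
    ESCALATE = "escalate"


def quality_pass(
    work: str,
    review: Callable[[str], tuple[Verdict, str]],
    fix: Callable[[str, str], str],
    max_retries: int = 2,
) -> tuple[Verdict, str]:
    """Review work until it is approved, escalated, or retries run out.

    review: stands in for a sub-agent started with a fresh context; it sees only the work.
    fix: stands in for the original agent applying the reviewer's feedback.
    """
    for attempt in range(max_retries + 1):
        verdict, feedback = review(work)
        if verdict is not Verdict.FIX_AND_RETRY:
            return verdict, work  # approved, or the reviewer escalated directly
        if attempt < max_retries:
            work = fix(work, feedback)
    # Still failing after every retry: hand it to a human rather than loop forever.
    return Verdict.ESCALATE, work


def toy_review(work: str) -> tuple[Verdict, str]:
    # Toy reviewer: approves only work that mentions the vendor total.
    if "vendor total" in work:
        return Verdict.APPROVED, ""
    return Verdict.FIX_AND_RETRY, "add the vendor total"


def toy_fix(work: str, feedback: str) -> str:
    # Toy fixer: pretends to act on the feedback.
    return f"{work} (added vendor total)"


print(quality_pass("Reconciled March invoices", toy_review, toy_fix))
```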

Task Complete Validates All Todos
Agents are prone to hallucinating actions (especially after they’ve planned an action), so we built in guardrails: during the Planning phase, agents must say which tools they will need to complete each to-do and which documents or entities they will create or update. When they mark a to-do as “complete”, we can verify that those tools were actually called against those entities. If not, we reject the completion and ask the agent to try again.
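A minimal Python sketch of that verification, assuming each to-do declares (tool, entity) pairs during planning and the runtime keeps its own log of executed calls. Comparing against the runtime’s log, rather than the agent’s own account, is what catches hallucinated actions. `Todo`, `mark_complete`, and the example names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Todo:
    """A to-do declared during planning (hypothetical shape)."""
    id: str
    # (tool_name, entity_id) pairs the agent said it would need for this to-do
    actions: frozenset[tuple[str, str]]


def mark_complete(todo: Todo, call_log: list[tuple[str, str]]) -> tuple[bool, str]:
    """Accept a completion only if every declared action appears in the real call log.

    call_log: (tool_name, entity_id) pairs the runtime actually executed,
    not what the agent claims it did.
    """
    missing = sorted(todo.actions - set(call_log))
    if missing:
        details = "; ".join(f"{tool} was never called on {entity}" for tool, entity in missing)
        return False, f"Completion of {todo.id} rejected: {details}. Please try again."
    return True, f"{todo.id} complete"


todo = Todo("chase-invoice", frozenset({("update_invoice", "inv-7"), ("send_email", "vendor-42")}))
print(mark_complete(todo, [("update_invoice", "inv-7")]))
print(mark_complete(todo, [("update_invoice", "inv-7"), ("send_email", "vendor-42")]))
```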

Completion Review Challenge
The first task_complete call triggers a challenge: "Look at your original plan. Did you actually do everything?"
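A Python sketch of the challenge, assuming task_complete is a tool whose handler keeps a little state per task: the first call returns the challenge together with the original plan, and only a second call is accepted. `CompletionGate` and its return shape are hypothetical.

```python
CHALLENGE = "Look at your original plan. Did you actually do everything?"


class CompletionGate:
    """Hypothetical handler for the agent's task_complete tool: the first call is always challenged."""

    def __init__(self, plan: list[str]) -> None:
        self.plan = plan
        self.challenged = False

    def task_complete(self, summary: str) -> dict[str, str]:
        if not self.challenged:
            self.challenged = True
            # Put the original plan back in front of the agent alongside the challenge.
            plan_text = "\n".join(f"- {step}" for step in self.plan)
            return {"status": "challenged", "message": f"{CHALLENGE}\n{plan_text}"}
        return {"status": "accepted", "message": summary}


gate = CompletionGate(["Find the plumber's invoice", "Email the vendor"])
print(gate.task_complete("Emailed the vendor")["status"])  # challenged
print(gate.task_complete("Emailed the vendor")["status"])  # accepted
```

The challenge costs one extra turn per task, which is cheap compared with a “done” that isn’t. In practice it would sit alongside the to-do verification rather than replace it.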
Empowering agents in the finance domain
And while Marshmallowiness is great for building extremely flexible systems, we are building a specialised agent so our users can fly.
Our users don’t want to be constructing and tracking schemas for standard finance concepts like Invoices, FX, Prepayments, etc. Taking marshmallowiness to the extreme might look like “Well, can’t users build FX tracking by stringing together Tasks, Artifacts, etc.?” They could, but should they? We aren’t building a generic agent; we’re building one that’s fluent in finance.
We should support those structures so our users only need to think about what’s unique about their business domain: whether it is tracking horse breeds or AWS cost centers. We ask ourselves: will every finance team need this? If so, we should lay the foundations.
Examples
Prepayments & Amortization

FX/Currency Built Into Every Transaction

Multi-Stage Approval Workflow

Invoice Understanding (Not Just Raw PDFs)

Designing for Two Audiences
This is the key shift: When you build with LLMs, you’re no longer designing for just one user.
You’re designing for:
- humans, who need flexibility and confidence
- agents, which need freedom to interpret and hard edges to stay aligned
Marshmallowy is a new way of thinking: knowing where to lean into softness, knowing where to put very hard edges. Knowing that interpretation is valuable—and authority must be earned.
Get that balance right and you don’t just get a chatbot: you get supervised labour that actually works.
That’s what marshmallowy means.
Book in a demo with our Founder CEO today

A 30-min call is all it takes to see how Sterling can start helping you save time right away.
Book a demo with Nik
