Harnesses for Knowledge Work
The harness conversation of 2026 has been all about coding. Knowledge work is next.
Apr 22, 2026
The first principle of a knowledge-work harness is proactivity. In this post, we explore what that unlocks — and what it demands.
The conversation around AI has been dominated by chat assistants. The AI works with a human in front of it, who’s paying attention to the model in real time. As a result, the model’s response latency has become a primary cost metric to optimize for.
A harness built for knowledge work is different. Most knowledge work isn’t a user sitting at a prompt waiting for an answer. It’s a stream of situations that someone has to notice, think about, and act on. The bottleneck is no longer how fast the model can reply; it is how well the harness detects and reasons about situations before anyone else even sees them. For the developer, this means the real challenge of building such systems lies in the memory framework: when the system detects a new situation, it has to recall the similar one you resolved successfully last week.
The question behind a knowledge-work harness is what changes when you stop optimizing for the user’s attention and start optimizing for the quality of the output. That shift defines how we’ve built Qorpera, and it leads to an architecture that looks nothing like the AI tools most people are familiar with today.
How the work arrives
To understand why latency doesn’t matter in the way other harness designs assume, it helps to understand where the work comes from in the first place.
In a keystroke-bound harness, the user is the source of work. They type a query, phrase a task, describe an edit — and the harness responds. The session is the unit of work, and the user is the one triggering each cycle.
A knowledge-work harness inverts this. The work arrives on its own. The harness is connected to the tools a company already runs on — email, calendar, CRM, accounting, ticketing, document storage — and watches continuously, reading the signals that cross those tools. A payment sixty days late. A customer conversation that shifts tone. A contract up for renewal. A deal that hasn’t moved in weeks. Most of what a knowledge worker does all day is spotting these patterns; a harness connected to the same tools can spot them too, and it doesn’t stop at 6 p.m.
Spotting is the easy part. Making sense of what you’ve spotted is harder, and it’s where the second piece of the architecture comes in: the harness maintains a living model of the organisation it works inside. Who the customers are, which ones matter, how this team writes, what’s been decided about this account, what’s worked before in situations that look like this one. Without this context, a detected signal is noise: any competent model can tell you an invoice is overdue, but none can tell you whether this specific customer gets a phone call or an email, or who on your team should make it. That judgement lives in the company, not in the model, and the harness needs a structured way to keep that knowledge up to date on its own, or every proposal becomes generic.
Put the two together — continuous detection against a live organisational model — and the workflow that emerges looks very different from what a keystroke-bound harness produces. The harness spots a situation, assembles the context, reasons about what to do, and drafts a proposal. Minutes or hours later, the operator reviews it. Approve, reject, edit. The harness learns from the decision and resumes watching. The user is never waiting because the user was never asking. They’re reviewing work that arrived on its own.
That’s the loop the rest of this post is about. Every architectural decision downstream follows from this shape — from the fact that the harness is upstream of the user’s attention, not downstream of it.
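The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not Qorpera's implementation: the names `Signal`, `Proposal`, `Harness`, `handle`, and `record_review` are hypothetical, the organisational model is reduced to a dictionary of notes, and the "reasoning" step is a placeholder string.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    EDIT = "edit"


@dataclass
class Signal:
    source: str   # tool the signal came from, e.g. "accounting"
    summary: str  # e.g. "invoice 60 days overdue"


@dataclass
class Proposal:
    signal: Signal
    context: list[str]
    draft: str


@dataclass
class Harness:
    # Hypothetical stand-in for the organisational model: notes keyed by tool.
    org_model: dict[str, list[str]]
    review_queue: list[Proposal] = field(default_factory=list)
    history: list[tuple[Proposal, Decision]] = field(default_factory=list)

    def handle(self, signal: Signal) -> Proposal:
        # Assemble context and draft a proposal; nobody is waiting on this call.
        context = list(self.org_model.get(signal.source, []))
        draft = f"Proposed action for: {signal.summary} ({len(context)} context notes)"
        proposal = Proposal(signal, context, draft)
        self.review_queue.append(proposal)
        return proposal

    def record_review(self, proposal: Proposal, decision: Decision) -> None:
        # The operator's decision arrives later and is written back into memory.
        self.review_queue.remove(proposal)
        self.history.append((proposal, decision))
        self.org_model.setdefault(proposal.signal.source, []).append(
            f"{decision.value}: {proposal.signal.summary}"
        )
```

The point of the shape is that `handle` and `record_review` are decoupled in time: the queue sits between them, so the reasoning step can take as long as it needs.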
What thinking time unlocks
Without a user waiting, decision quality replaces latency as the scarce resource. And decision quality, unlike latency, benefits from more work, not less. Running three passes of reasoning costs roughly three times as much compute per decision, but that marginal cost is trivial against the cost of a wrong decision in a business context. A bad invoice-chase email sent to a high-value customer costs more than a hundred multi-pass reasoning cycles. A due-diligence finding that missed a red flag costs more than ten thousand.
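The trade-off can be made concrete with a small expected-cost calculation. Every number below is an assumption chosen only to show the shape of the argument, including the error rates; the post does not report measured figures, and `expected_cost` is a hypothetical helper.

```python
def expected_cost(passes: int, cost_per_pass: float,
                  error_rate: float, cost_of_error: float) -> float:
    """Compute spent on reasoning plus the expected cost of a wrong decision."""
    return passes * cost_per_pass + error_rate * cost_of_error


# Hypothetical figures, not measurements.
COST_PER_PASS = 0.50    # compute cost of one reasoning pass
COST_OF_ERROR = 500.00  # cost of one bad email to a high-value customer

single = expected_cost(1, COST_PER_PASS, error_rate=0.10, cost_of_error=COST_OF_ERROR)
triple = expected_cost(3, COST_PER_PASS, error_rate=0.06, cost_of_error=COST_OF_ERROR)

# Extra passes pay off when the drop in expected error cost exceeds the added compute.
extra_passes_worth_it = triple < single
```

With these made-up inputs the single-pass expected cost is 50.5 and the three-pass cost is 31.5. The conclusion is only as good as the assumed error rates, which in practice would have to be estimated from operator approvals and rejections.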
Thinking is cheaper than being wrong. That sentence is the organizing principle of a knowledge-work harness, and the architectural moves below all follow from it.
The work shifts from write-heavy to read-heavy. Before any proposal reaches the operator, the harness might read hundreds of pages of organisational context, run dozens of retrievals across tools, review resolved situations that look like the current one, and check policies that apply. The operator might receive three paragraphs; underneath those three paragraphs is an entire investigation, extending the read:write ratio toward 100:1. For perspective, the commoditized tools we’ve looked at during our research typically run between 2:1 and 6:1. A harness that reads heavily before writing can afford to be careful; one that writes constantly cannot.
Context becomes a resource to be curated, not a prompt to be stuffed. Most AI systems built on retrieval work the same way: run a vector search, grab the top-K results, stack them into the context window, hope relevance sorting does the work. It’s the only pattern that fits under a tight latency budget. With a wider budget, the harness can read more than it will use, evaluate what it read against the current situation, keep the parts that inform the reasoning, and discard the rest. Context is actively shaped across a single reasoning run, rather than pre-assembled and frozen at the start. The longer the reasoning continues, the more deliberate its attention becomes, rather than the more polluted.
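As a rough sketch of that curation step, assume the harness keeps a small working set of snippets and re-triages it each time it reads something new. The `relevance` function here is a deliberately crude word-overlap placeholder standing in for whatever evaluation a real harness would use (most likely a model call); `curate`, `min_score`, and `budget` are hypothetical names.

```python
def relevance(snippet: str, situation: str) -> float:
    # Placeholder scorer: fraction of the situation's words found in the snippet.
    situation_words = set(situation.lower().split())
    if not situation_words:
        return 0.0
    return len(situation_words & set(snippet.lower().split())) / len(situation_words)


def curate(working_set: list[str], new_reads: list[str], situation: str,
           min_score: float = 0.25, budget: int = 5) -> list[str]:
    """Re-evaluate everything read so far and keep only what informs the situation."""
    pool = list(dict.fromkeys(working_set + new_reads))  # de-duplicate, keep order
    scored = [(relevance(snippet, situation), snippet) for snippet in pool]
    ranked = sorted(scored, key=lambda pair: -pair[0])   # stable: ties keep read order
    kept = [snippet for score, snippet in ranked if score >= min_score]
    return kept[:budget]
```

Calling `curate` after every retrieval, rather than once at the start, is what keeps the working set shaped by the reasoning instead of frozen before it begins.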
Internal experimentation becomes a tool the harness reaches for. The harness can run work against itself before anything reaches the operator. Generate several candidate proposals, simulate the likely outcome of each using the organisational context, pick the one that survives. Rehearse a plan step by step, predict what happens at each step, flag the steps where the prediction is uncertain. Attack the draft with a critic framing, then rewrite it to incorporate the critique. Run the same proposal with a key assumption flipped to false and see whether the recommendation changes: divergence reveals which assumptions the answer depends on.
None of this is visible to the operator. What they see is a better first draft, with the weaker alternatives already discarded. Every one of these patterns costs compute the operator doesn’t see; none of them cost the operator’s attention.
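The assumption-flipping experiment is the easiest of these patterns to sketch. The code below assumes that candidates are plain strings, that assumptions are named booleans, and that some scoring function (hypothetical here, and in practice probably a model-backed simulation) rates a candidate under a given set of assumptions.

```python
from typing import Callable

Assumptions = dict[str, bool]
Scorer = Callable[[str, Assumptions], float]


def best_candidate(candidates: list[str], assumptions: Assumptions, score: Scorer) -> str:
    # Pick the candidate with the highest simulated outcome under these assumptions.
    return max(candidates, key=lambda candidate: score(candidate, assumptions))


def load_bearing_assumptions(candidates: list[str], assumptions: Assumptions,
                             score: Scorer) -> tuple[str, list[str]]:
    """Return the baseline recommendation and the assumptions it depends on."""
    baseline = best_candidate(candidates, assumptions, score)
    fragile = []
    for name, value in assumptions.items():
        flipped = {**assumptions, name: not value}
        # If flipping one assumption changes the winner, the answer depends on it.
        if best_candidate(candidates, flipped, score) != baseline:
            fragile.append(name)
    return baseline, fragile
```

An assumption that lands in the `fragile` list is a natural thing to verify before the proposal goes out, or to surface to the operator alongside the draft.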
What this demands
None of these moves work without a substrate to support them. A harness that reads hundreds of pages for every proposal, triages context mid-reasoning, explores paths in parallel, and runs experiments against its own drafts is generating state the whole time, and that state has to live somewhere durable, or every pass starts from zero.
This is where memory stops being a side feature and becomes load-bearing. There’s no way to hand over a context window between model calls; the only coherent coordination medium between passes is a durable artifact the first pass produces. Which means a knowledge-work harness isn’t a reasoning engine with memory bolted on — it’s a memory system with reasoning running on top of it.
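One way to picture the durable artifact is a file that each pass loads, extends, and writes back. This is a minimal sketch under the assumption that the state fits in a single JSON document; the `run_pass` helper and the `passes`/`findings` layout are hypothetical, and a production system would more likely use a database.

```python
import json
from pathlib import Path
from typing import Callable


def run_pass(name: str, artifact_path: Path,
             work: Callable[[list[str]], list[str]]) -> dict:
    """Load the artifact left by earlier passes, add this pass's findings, persist it."""
    if artifact_path.exists():
        state = json.loads(artifact_path.read_text())
    else:
        state = {"passes": [], "findings": []}

    # The pass sees only what earlier passes wrote down, never their context window.
    new_findings = work(list(state["findings"]))
    state["findings"].extend(new_findings)
    state["passes"].append(name)

    artifact_path.write_text(json.dumps(state, indent=2))
    return state
```

A second pass started hours later, or on a different machine with access to the same file, picks up exactly where the first one stopped.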
The further consequence: because the memory persists across time, it accumulates. Every decision the operator approves teaches the harness something about this company’s specific reality — the way this team writes, the customers who respond to phone calls rather than email, the situations where a particular approach has historically worked. Frontier labs will close the reasoning gap quarterly. The memory gap widens the longer the harness is in service.
Reasoning is commoditized. Memory is not. The durable value of a knowledge-work harness is built from what it has learned about a specific business over months of operation, something no model weight can be trained on, and no competitor can copy from the outside.
Qorpera is the knowledge-work harness we’ve been building for the last year. It connects to the tools a company already uses, maintains a living model of how that company operates, and proposes the operational decisions that knowledge workers would otherwise be making manually. The operator approves or rejects; the harness learns from both. Over time, situation types the operator has consistently approved can be delegated to autonomous execution — when, and only when, the operator decides so.
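The delegation rule can be read as a two-part gate: a track record of approvals, plus an explicit operator opt-in. The sketch below is one hypothetical reading, not Qorpera's actual policy; in particular, treating "consistently approved" as "at least 20 approvals and no rejections" is an assumption made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SituationType:
    name: str
    approvals: int = 0
    rejections: int = 0
    operator_opted_in: bool = False  # set only by an explicit operator action

    def record(self, approved: bool) -> None:
        # Both approvals and rejections are kept; the harness learns from both.
        if approved:
            self.approvals += 1
        else:
            self.rejections += 1

    def eligible(self, min_approvals: int = 20) -> bool:
        # Assumed threshold: enough approvals and no rejections on record.
        return self.rejections == 0 and self.approvals >= min_approvals

    def runs_autonomously(self, min_approvals: int = 20) -> bool:
        # A clean track record is necessary but never sufficient on its own.
        return self.operator_opted_in and self.eligible(min_approvals)
```

Keeping `operator_opted_in` separate from the counters means no amount of history can switch on autonomous execution by itself.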
Frontier models are going to keep getting cheaper, faster, and smarter. The cortex is a commodity. What isn’t — and what compounds with every week of operation inside a specific business — is the body around it.