Business Context on AI Data Won’t Be Solved by Retrieval.
Context Engineering
By Caber Team

10 Aug 2025

Think MCP, ANP, agents.json, Agora, LMOS, or AITP will "solve business context"? They won't. These protocols are good at making data available to agents and at passing the right chunks of that data between them. They do not tell you what any given chunk actually means in your business, or when it's appropriate to use. That's not a transport problem; it's a meaning and governance problem.

Business context isn't just the metadata attached to the document your AI happened to retrieve from, or the semantic relationships you can squeeze out of a single paragraph. It's an ecosystem of relationships, and the most important ones often live outside the immediate source.

The Hidden Life of a Chunk

When we talk about context, we're really talking about chunks. That's how LLMs, RAG systems, and most vector databases see the world. A "chunk" might be a sentence, a table cell, or a paragraph: small enough to embed, big enough to convey meaning.

Here's the catch: the exact same chunk might appear in dozens of different documents, each with its own metadata, authorship, timestamps, confidentiality flags, regulatory tags, workflow stage, and so on.

If you only look at the document the chunk was read from, you miss the bigger story.

That's like judging a single line of code based solely on the file it's in, without knowing it's been copied into six other repos, patched in three of them, and is currently being used in production by a system with a different security model.
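The idea can be sketched in a few lines: identify a chunk by its normalized content rather than by its source file, so identical text retrieved from different documents resolves to one identity with many metadata contexts. The document names and labels below are invented for illustration.

```python
import hashlib

def chunk_id(text: str) -> str:
    """Identify a chunk by its normalized content, not its source document."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode()).hexdigest()[:12]

# The same paragraph, read from three different documents.
clause = "Payment is due within 30 days of invoice."
occurrences = [
    {"doc": "2024_public_filing.pdf",  "classification": "public",       "chunk": clause},
    {"doc": "q3_earnings_draft.docx",  "classification": "confidential", "chunk": clause},
    {"doc": "vendor_contract_v2.docx", "classification": "restricted",   "chunk": clause},
]

# Group occurrences by content identity: one chunk, three metadata contexts.
index: dict[str, list[dict]] = {}
for occ in occurrences:
    index.setdefault(chunk_id(occ["chunk"]), []).append(occ)

assert len(index) == 1  # identical content collapses to a single chunk ID
```

Content hashing is only the simplest possible identity function; a production system would also need fuzzy matching for near-duplicates, but the principle is the same: the chunk, not the file, is the unit of governance.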

Why Proximate Metadata Falls Short

Most classification and governance tools still think like librarians: "This document has these attributes, so its contents must too." The reality?

  • That "confidential" paragraph in your quarterly earnings report may be identical to one already published in last year's public filing.
  • The same compliance clause in a contract may also appear in an internal policy draft with stricter access rules.

If your governance layer only sees the local metadata, it will either over-restrict (blocking safe use) or under-restrict (risking a leak).
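A minimal illustration of the failure mode (document names and labels are hypothetical): a proximate-metadata tool's answer depends entirely on which copy of the paragraph the retriever happened to read.

```python
# The identical paragraph lives in two places with conflicting labels.
labels_by_doc = {
    "2023_10k_public_filing.pdf": "public",
    "q3_2024_earnings_draft.docx": "confidential",
}

def local_label(retrieved_from: str) -> str:
    """What a proximate-metadata tool reports: the label of the source file."""
    return labels_by_doc[retrieved_from]

# Same content, two answers, depending only on the retrieval path:
# reading from the draft over-restricts if the filing already made it public;
# reading from the filing under-restricts if the draft's label was correct.
assert local_label("2023_10k_public_filing.pdf") == "public"
assert local_label("q3_2024_earnings_draft.docx") == "confidential"
```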

The Authorized Junk Problem

There is a third failure mode that over-restrict and under-restrict don't capture: authorized irrelevance. Authorization determines whether a user is allowed to see a piece of data. It says nothing about whether that data is relevant, current, or helpful for the question being asked.

An AI system that fills its context window with authorized-but-irrelevant fragments produces worse answers than one with a smaller, curated context. Boilerplate clauses, stale drafts, duplicated paragraphs, and template language all pass access checks without difficulty. They are permissioned. They are also noise. And because they satisfy the policy bar, no governance tool flags them.

The result is a context window full of content the user is allowed to see but that degrades the answer the user actually receives. Every governance tool that stops at "is this user allowed to see this" lets noise through. Authorization is necessary, but it is not a quality signal.
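One way to sketch the fix, with a made-up relevance score standing in for whatever the retriever produces: treat authorization as a necessary gate and relevance as a separate threshold, and require both before a fragment enters the context window.

```python
from dataclasses import dataclass

@dataclass
class Fragment:
    text: str
    allowed: bool     # did the access check pass?
    relevance: float  # hypothetical retriever score in [0, 1]

def admit(fragments: list[Fragment], min_relevance: float = 0.5) -> list[Fragment]:
    """Authorization is a gate, not a quality signal: require access AND relevance."""
    return [f for f in fragments if f.allowed and f.relevance >= min_relevance]

candidates = [
    Fragment("Boilerplate indemnification clause", allowed=True,  relevance=0.1),
    Fragment("Current pricing terms",              allowed=True,  relevance=0.9),
    Fragment("Unreleased acquisition memo",        allowed=False, relevance=0.95),
]

context = admit(candidates)
assert [f.text for f in context] == ["Current pricing terms"]
```

The boilerplate clause passes the access check and still gets dropped; the memo is highly relevant and still gets dropped. Neither condition substitutes for the other.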

The Broad-View Context Model

To really capture business context, you need to:

  1. Aggregate metadata across every occurrence of a chunk, in every document, across time.
  2. Resolve conflicts using precedence (source of authority) and prevalence (most common use) rules.
  3. Maintain lineage so you can see exactly where that chunk has been, who touched it, and under what policy it lived at each point.

This is a fundamentally different mindset. Instead of starting with "this document says X about this chunk," you start with "this chunk exists in 27 places, and here's the union of everything we know about what it means and how it can be used"; in other words, its business significance.
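The precedence-then-prevalence resolution in step 2 might look like this sketch, where the occurrence metadata is invented and "authoritative" stands in for whatever signal marks a source of authority in a real corpus:

```python
from collections import Counter

# Hypothetical metadata for every occurrence of one chunk across the corpus.
occurrences = [
    {"doc": "2023_10k.pdf",        "label": "public",       "authoritative": True},
    {"doc": "earnings_draft.docx", "label": "confidential", "authoritative": False},
    {"doc": "board_deck.pptx",     "label": "confidential", "authoritative": False},
]

def resolve(occurrences: list[dict]) -> str:
    """Precedence first: an authoritative source wins outright.
    Otherwise fall back to prevalence: the most common label."""
    for occ in occurrences:
        if occ["authoritative"]:
            return occ["label"]
    counts = Counter(occ["label"] for occ in occurrences)
    return counts.most_common(1)[0][0]

assert resolve(occurrences) == "public"  # the filing is the source of authority
```

Note that prevalence alone would have voted "confidential" here; precedence is what lets the already-published filing override the two internal copies.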

In practice, this broad view enables every fragment to be evaluated against four questions before it enters an AI context window:

  1. Is it current? Has the source been superseded, updated, or retracted since this chunk was embedded?
  2. What is its provenance? Where did this chunk originate, where else does it appear, and which source is authoritative?
  3. Is it authorized? Does the requesting user have permission to access this specific fragment, given the union of all its metadata across every location it appears?
  4. Is it relevant? Does this fragment contribute substantive information to the question being asked, or is it noise that happens to pass the access check?

Without the broad view, you cannot answer any of these questions reliably. With it, governance stops being a gate that either blocks or permits, and becomes a filter that ensures only current, authoritative, authorized, and relevant content reaches the model.
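The four questions compose into a single admission filter. This is a sketch under the assumption that currency, provenance, authorization, and relevance have already been computed per chunk from the aggregated metadata; all field values below are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    superseded: bool   # has an authoritative source replaced this content?
    provenance: list   # every document the chunk is known to appear in
    authorized: bool   # access decision over the union of all its metadata
    relevance: float   # retriever score in [0, 1] (hypothetical)

def admit_to_context(chunk: Chunk, min_relevance: float = 0.5) -> bool:
    """Apply the four checks in order: current, traceable, authorized, relevant."""
    return (
        not chunk.superseded                   # 1. is it current?
        and bool(chunk.provenance)             # 2. is its provenance known?
        and chunk.authorized                   # 3. is it authorized?
        and chunk.relevance >= min_relevance   # 4. is it relevant?
    )

stale = Chunk("2022 pricing terms", True,  ["old_contract.pdf"], True, 0.9)
fresh = Chunk("2025 pricing terms", False, ["contract_v3.pdf"],  True, 0.8)
assert not admit_to_context(stale)  # permissioned and relevant, but superseded
assert admit_to_context(fresh)
```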

Why Agent Communication Protocols Can't Fix This

Let's be clear: protocols aren't the villain; they're just not the mechanism for meaning. MCP, ANP, Agora, agents.json, LMOS, and AITP define how agents talk to each other, how they exchange tasks, pass along context, and authenticate participants.

Many of them address critical security pillars for agent interactions:

  • Confidentiality of communications (so only the intended parties see the data)
  • Integrity (so the data isn't altered in transit)
  • Authenticity and non-repudiation (so you know who sent it, and they can't deny it later)

That's essential for trust between agents. But none of it answers the core questions: what does this chunk mean in your business, and under what conditions can it be used?

Knowing which agent handed you a paragraph doesn't disambiguate its business significance. The same chunk can appear across silos under conflicting metadata. Until you resolve that conflict at the chunk level, policies will fail, regardless of how well your agents authenticate each other.

Protocols move context securely. Meaning and policy come from a cross-corpus, metadata-aggregated view of the chunk itself, the broad view we've been talking about.

The Payoff

When you take the broad view:

  • AI answers become explainable, because every retrieved chunk comes with a history.
  • Permissions stop being brittle, because you're applying the right policy for the chunk, not the default for the nearest file.
  • Data quality improves, because duplicate and stale chunks can be identified and resolved before they pollute your retrieval set.
  • Authorized noise gets filtered, because relevance and currency are evaluated alongside access, not ignored.

And perhaps most importantly, you stop treating "context" as whatever happens to be nearby, and start treating it as the complete, cross-document truth.
