Meaning Contracts for AI Agents and Enterprise Data

Most enterprise AI agents will not fail because the warehouse lacks data. They will fail because the warehouse does not know what the business means.

That sounds subtle until an agent answers a question like "Which active patients had low therapy usage last month?" or "How many active accounts did we have this quarter?" The query may run. The SQL may be valid. The answer may still be wrong because the system missed the rules that experienced analysts apply before they trust a number.

Those rules are rarely stored in one place. They are scattered across dashboards, SQL, Slack threads, Jira comments, meeting notes, data quality exceptions, and the memories of people who know the history.

Humans have been compensating for that mess for years. AI agents cannot safely guess their way through it.

That is the gap I think data platforms need to address next: not another catalog entry, not another dashboard, and not raw warehouse access wrapped in a prompt. The missing interface is what I call a meaning contract: a governed, machine-readable description of what a metric or data product means, how it can be used, who owns it, what quality expectations apply, and what caveats must travel with the answer.

Figure 1. Earlier data stacks relied on people to complete the last mile of business meaning.

The problem AI exposes

Experienced analysts rarely answer business questions by looking at a table name alone.

They know which metric Finance trusts. They know which table should not be joined directly. They know which feed is useful for exploration but not executive reporting. They know who to ask when a number looks suspicious.

This is why two teams can run valid SQL against valid source data and still arrive at different answers. The issue is not always a broken pipeline. Often, it is a business assumption that one group knows and another group does not.

Which status codes count? Which exceptions should be excluded? Which device event is a real business signal and which one is operational noise? Which definition changed after a leadership review but never made it back into the model?

That hidden judgment layer is manageable when the consumer is a person. It becomes risky when the consumer is software that can generate SQL, summarize trends, recommend action, or trigger workflows.

Semantic layers are not new. The missing part is the contract.

Semantic layers are often discussed as if they appeared because of AI. They did not.

Earlier versions of the same idea existed as data dictionaries, enterprise glossaries, BI models, certified marts, data catalogs, and metric stores. Each solved part of the problem.

Capability	What it helps with	What it still does not fully answer
Data catalog	Discovery, ownership, lineage, descriptions	Whether the answer should be trusted for a specific business use
Data dictionary	Field-level explanation and terminology	Business exceptions, decision rules, and historical assumptions
Semantic layer	Reusable metrics, dimensions, joins, and business-friendly access	Operational caveats, maturity, quality, ownership, and permitted use
Quality and observability	Freshness, checks, failures, and reliability signals	Why a metric is trusted only for a narrow context

The change is not that semantics suddenly matter.

The change is that agents need this context to be explicit, governed, and machine-readable. A person can interpret ambiguity. An agent can turn ambiguity into a confident answer.

Metadata alone will not save the agent

Catalogs and semantic layers are necessary, but they are not sufficient on their own.

Why this matters

Discovery without meaning creates confusion. Meaning without quality creates risk. Quality without ownership creates orphaned responsibility. Ownership without policy creates unsafe access.

For AI, these pieces need to work together. The agent should not simply know that a table exists. It should know whether the metric is certified, what grain it supports, which joins are valid, whether the data is fresh enough, which caveats apply, and whether the user is allowed to use the answer.

Meaning in practice: two examples

The easiest way to see the difference is through questions that look simple but are not.

Example 1: healthcare or medtech telemetry

Question: "Which active patients had low therapy usage last month?"

Without a meaning contract, an agent might generate a query like this:

SELECT
    patient_id,
    SUM(therapy_minutes) AS therapy_minutes
FROM telemetry_events
WHERE event_date >= DATE_TRUNC('month', CURRENT_DATE - INTERVAL '1 month')
  AND event_date < DATE_TRUNC('month', CURRENT_DATE)
GROUP BY patient_id
HAVING SUM(therapy_minutes) < 900;

The SQL is syntactically correct and it will run. But the answer can still be wrong.

It may include test devices, demo devices, maintenance events, discontinued patients, therapy days that are not eligible for reporting, or telemetry delayed by connectivity. It may also return patient-level detail to a user who should only see an aggregate.

With governed meaning, the agent would know the certified metric, the reporting grain, the allowed population, the exclusions, the freshness expectation, the permitted use, and the caveats that must travel with the answer.

A governed query path for the same healthcare question might look like this:

-- Governed path: query the certified metric, not raw telemetry events
SELECT
    patient_id,
    metric_value AS therapy_minutes,
    definition,
    grain,
    owner,
    freshness_status,
    permitted_use,
    caveats
FROM semantic_metrics.therapy_usage_last_month
WHERE reporting_month = DATE_TRUNC('month', CURRENT_DATE - INTERVAL '1 month')
  AND metric_value < 900
  AND status = 'certified'
  AND access_scope = 'patient_level_allowed';

The exclusions for test and demo devices, maintenance events, discontinued patients, ineligible therapy days, and delayed telemetry handling are defined inside the certified metric, not rewritten by the agent every time.

Same source data. Different level of trust.

The difference is not the model. The difference is meaning.
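To make the contrast concrete, here is a minimal Python sketch over a handful of made-up telemetry rows. The field names (device_kind, event_kind, patient_active, eligible_day) and the sample values are assumptions for illustration, not a real schema. Only the 900-minute threshold comes from the queries above.

```python
from collections import defaultdict

# Hypothetical telemetry rows; field names and values are illustrative only.
EVENTS = [
    {"patient_id": "p1", "minutes": 500, "device_kind": "assigned",
     "event_kind": "therapy", "patient_active": True, "eligible_day": True},
    {"patient_id": "p1", "minutes": 600, "device_kind": "demo",
     "event_kind": "therapy", "patient_active": True, "eligible_day": True},
    {"patient_id": "p2", "minutes": 300, "device_kind": "assigned",
     "event_kind": "therapy", "patient_active": False, "eligible_day": True},
    {"patient_id": "p3", "minutes": 700, "device_kind": "assigned",
     "event_kind": "therapy", "patient_active": True, "eligible_day": True},
    {"patient_id": "p3", "minutes": 400, "device_kind": "assigned",
     "event_kind": "maintenance", "patient_active": True, "eligible_day": True},
]

LOW_USAGE_THRESHOLD = 900  # minutes, same threshold as the SQL above


def low_usage(events, keep):
    """Sum minutes per patient over the events that `keep` accepts,
    then return the patients whose total is below the threshold."""
    totals = defaultdict(int)
    for event in events:
        if keep(event):
            totals[event["patient_id"]] += event["minutes"]
    return sorted(p for p, minutes in totals.items() if minutes < LOW_USAGE_THRESHOLD)


def raw_rule(event):
    # The ungoverned query counts every row in the table.
    return True


def governed_rule(event):
    # The certified metric applies the inclusion and exclusion rules once.
    return (
        event["patient_active"]
        and event["device_kind"] == "assigned"
        and event["event_kind"] == "therapy"
        and event["eligible_day"]
    )


raw_result = low_usage(EVENTS, raw_rule)
governed_result = low_usage(EVENTS, governed_rule)
print("raw:", raw_result)
print("governed:", governed_result)
```

On this sample the raw rule flags only p2, a discontinued patient, while the governed rule flags p1 and p3, whose totals were inflated by a demo device and a maintenance event. One limitation of the sketch: an active patient with no eligible events at all would not appear in either result, so a real metric would start from the active patient population rather than from the event table.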

Figure 2. Healthcare example: a meaning contract changes what the agent counts and how much the answer can be trusted.

Example 2: commercial and finance reporting

Question: "How many active accounts did we have this quarter?"

Without a meaning contract, one team might count CRM account status. Another might count billing activity. Another might count shipped accounts. Sales, Finance, and Operations can all produce slightly different numbers and still believe they are correct.

-- CRM interpretation: active because the CRM record says active
SELECT COUNT(DISTINCT account_id) AS active_accounts
FROM crm_accounts
WHERE account_status = 'Active'
  AND last_modified_at >= DATE_TRUNC('quarter', CURRENT_DATE);

-- Billing interpretation: active because the account had billable activity
SELECT COUNT(DISTINCT account_id) AS active_accounts
FROM billing_invoices
WHERE invoice_date >= DATE_TRUNC('quarter', CURRENT_DATE)
  AND invoice_date < DATE_TRUNC('quarter', CURRENT_DATE) + INTERVAL '3 months'
  AND invoice_status = 'posted'
  AND invoice_amount > 0;

-- Operations interpretation: active because something shipped
SELECT COUNT(DISTINCT account_id) AS active_accounts
FROM shipment_events
WHERE shipped_at >= DATE_TRUNC('quarter', CURRENT_DATE)
  AND shipped_at < DATE_TRUNC('quarter', CURRENT_DATE) + INTERVAL '3 months'
  AND shipment_status = 'delivered';

With governed meaning, the agent uses the certified active-account definition, including the effective date, account exclusions, billing treatment, approved dimensional cuts, and reporting caveats. It can also explain when the metric is suitable for forecasting, executive reporting, or operational follow-up.

-- Governed path: query the certified metric, not raw source tables
SELECT
    metric_value AS active_accounts,
    definition,
    grain,
    owner,
    freshness_status,
    permitted_use,
    caveats
FROM semantic_metrics.active_accounts_current_quarter
WHERE reporting_quarter = DATE_TRUNC('quarter', CURRENT_DATE)
  AND status = 'certified';

This is the practical gap AI exposes. The problem is not just finding data. The problem is knowing how the business expects the answer to be interpreted.
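Before certifying one definition, it can help to measure how far the competing interpretations actually diverge. The sketch below is one hedged way to do that in Python. The account IDs are invented, and compare_definitions is a hypothetical helper, not part of any tool named in this article.

```python
# Hypothetical account IDs returned by each of the three interpretations.
DEFINITIONS = {
    "crm": {"a1", "a2", "a3", "a4"},
    "billing": {"a2", "a3", "a5"},
    "operations": {"a3", "a4", "a6"},
}


def compare_definitions(definitions):
    """Return the count per definition, the accounts every definition
    agrees on, and the accounts at least one definition disputes."""
    counts = {name: len(ids) for name, ids in definitions.items()}
    agreed = set.intersection(*definitions.values())
    disputed = set.union(*definitions.values()) - agreed
    return counts, agreed, disputed


counts, agreed, disputed = compare_definitions(DEFINITIONS)
print(counts)
print("agreed:", sorted(agreed))
print("disputed:", sorted(disputed))
```

The disputed set is the useful output. Each account in it is a concrete case the metric owner has to rule on, and each ruling becomes an inclusion rule, an exclusion rule, or a caveat in the certified definition.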

What mature data platforms have been solving all along

The best public platform write-ups show the same pattern: mature data teams eventually manage business meaning, not just pipelines.

Example	What they built	Practical lesson
Airbnb Minerva	A companywide metric platform used as a source of truth for analytics, reporting, and experimentation. Airbnb reported 12,000+ metrics, 4,000 dimensions, and 200+ producers in Minerva.	Metric consistency needs lifecycle management, definitions, code review, validation, quality checks, serving APIs, and deprecation.
Netflix DataJunction	An API-first semantic layer that stores metrics and dimensions, uses connected graph metadata, and generates SQL through a serving layer.	A semantic layer becomes more valuable when it is programmable, discoverable, connected to lineage, and used by applications and AI interfaces.
Uber uMetric and Data180	Unified metric and data-culture efforts focused on metric definition, discovery, computation, quality, access, and consumption.	Standardization must include governance, deduplication, ownership, quality, access control, and social adoption.
DataHub architecture	A metadata platform using APIs, a metadata graph, stream-based metadata changes, and support for federated metadata services.	Metadata becomes more useful when it is connected, fresh, and integrated into workflows rather than treated as static documentation.

These examples do not show that every company needs to build the same platform. They show something more useful: at scale, trusted business meaning has to become a managed product, not scattered documentation.

Architecture pattern: keep agents behind a governed meaning interface

A meaning contract does not have to be a new vendor category. It is an architectural responsibility: making business meaning available before data reaches BI tools, notebooks, applications, or AI agents.

Figure 3. Reference architecture: meaning contracts sit between data products and consumers as a governed interface layer.

A useful meaning interface should answer the questions an experienced analyst would ask before trusting an output:

Is this metric certified?
What grain does it support?
Which dimensions and joins are approved?
Is the data fresh enough for this question?
Which caveats apply?
Who owns the definition?
Is the user allowed to see or use the answer?
Is this suitable for exploration, operational reporting, or executive reporting?

In this design, the semantic layer remains useful, but it is only one part of the broader architecture. It gives structure to metrics and dimensions. It should not be expected to carry quality, ownership, lineage, security, lifecycle state, and organizational memory by itself.
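Several of the questions above can be checked mechanically before any query runs. The following Python sketch shows one possible shape for that check. The class names, field names, and the review_request function are assumptions made for illustration. The sample values mirror the therapy-usage example used elsewhere in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricContract:
    # Hypothetical, trimmed-down view of a meaning contract.
    name: str
    status: str
    grain: str
    owner: str
    allowed_dimensions: frozenset
    permitted_use: frozenset


@dataclass(frozen=True)
class MetricRequest:
    # What the agent wants to do with the metric.
    metric: str
    dimensions: frozenset
    intended_use: str


def review_request(contract, request):
    """Return the reasons a request should be refused; an empty list means OK."""
    problems = []
    if contract.status != "certified":
        problems.append(f"metric status is '{contract.status}', not certified")
    unapproved = request.dimensions - contract.allowed_dimensions
    if unapproved:
        problems.append("unapproved dimensions: " + ", ".join(sorted(unapproved)))
    if request.intended_use not in contract.permitted_use:
        problems.append(f"use '{request.intended_use}' is not permitted")
    return problems


therapy_usage = MetricContract(
    name="therapy_usage_last_month",
    status="certified",
    grain="patient_month",
    owner="clinical_operations_analytics",
    allowed_dimensions=frozenset({"reporting_month", "device_model", "patient_region"}),
    permitted_use=frozenset({"operational monitoring", "internal analytics"}),
)

bad_request = MetricRequest(
    metric="therapy_usage_last_month",
    dimensions=frozenset({"device_model", "patient_age"}),
    intended_use="executive reporting",
)
for reason in review_request(therapy_usage, bad_request):
    print("refused:", reason)
```

Returning reasons rather than a bare boolean is a deliberate choice in this sketch: the agent can relay them to the user, and the owner can see which rules are being hit most often.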

What a meaning contract looks like

It is easy to talk about business context abstractly. In practice, it needs to become a contract.

For example, a therapy-usage metric in a healthcare or medical-device environment might be represented like this:

version: 1
contract_type: metric
metric: therapy_usage_last_month
grain: patient_month
owner: clinical_operations_analytics
status: certified
source_data_product: therapy_usage_product

business_definition: >
  Total eligible therapy minutes recorded for an active patient
  during the reporting month.

inclusion_rules:
  - patient is active during the reporting period
  - device is assigned to the patient
  - therapy day is eligible for reporting

exclusion_rules:
  - test or demo devices
  - service or maintenance events
  - discontinued patients after discontinuation date

freshness_expectation: telemetry available within 24 hours

quality_checks:
  - completeness_by_device
  - duplicate_event_check
  - eligible_day_validation

allowed_dimensions:
  - reporting_month
  - device_model
  - patient_region

privacy_classification: restricted_internal

permitted_use:
  - operational monitoring
  - internal analytics

caveats:
  - delayed connectivity may understate recent usage
  - timezone is based on patient local time

Most of this is not data. It is the meaning that determines whether an answer is trusted or rejected.
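A contract is only useful if incomplete ones are caught before they are published. Below is a minimal validation sketch in Python, assuming the contract has already been parsed into a dictionary (the standard library has no YAML parser, so the parsing step is left out). The list of required fields and the set of allowed status values are my assumptions, not a standard.

```python
# Fields this sketch treats as mandatory; the selection is an assumption.
REQUIRED_FIELDS = (
    "metric", "grain", "owner", "status", "business_definition",
    "exclusion_rules", "freshness_expectation", "permitted_use", "caveats",
)

# Hypothetical lifecycle states; only "certified" appears in the article.
ALLOWED_STATUSES = {"draft", "team", "certified", "deprecated"}


def validate_contract(contract):
    """Return a list of human-readable problems; empty means the contract passes."""
    errors = [
        f"missing or empty field: {name}"
        for name in REQUIRED_FIELDS
        if not contract.get(name)
    ]
    status = contract.get("status")
    if status and status not in ALLOWED_STATUSES:
        errors.append(f"unknown status: {status}")
    return errors


THERAPY_CONTRACT = {
    "metric": "therapy_usage_last_month",
    "grain": "patient_month",
    "owner": "clinical_operations_analytics",
    "status": "certified",
    "business_definition": "Total eligible therapy minutes recorded for an "
                           "active patient during the reporting month.",
    "exclusion_rules": ["test or demo devices", "service or maintenance events"],
    "freshness_expectation": "telemetry available within 24 hours",
    "permitted_use": ["operational monitoring", "internal analytics"],
    "caveats": ["delayed connectivity may understate recent usage"],
}

print(validate_contract(THERAPY_CONTRACT))
```

A check like this fits naturally into code review or CI for the repository that holds the contracts, so a metric without an owner or without caveats never reaches an agent.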

A useful agent response should return not only the answer, but also the definition, grain, owner, freshness, caveats, and policy state behind it.

{
  "metric": "therapy_usage_last_month",
  "grain": "patient_month",
  "status": "certified",
  "owner": "clinical_operations_analytics",
  "freshness": "telemetry available within 24 hours",
  "caveats": [
    "Delayed connectivity may understate recent usage",
    "Timezone is based on patient local time"
  ],
  "answer_provenance": {
    "source_data_product": "therapy_usage_product",
    "quality_checks_passed": true,
    "permitted_use": "internal analytics"
  }
}

That is the difference between an AI system that only retrieves data and an AI system that can explain why the answer should be trusted.
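One way to guarantee that shape is to build the response in a single function, so that a number cannot leave the tool layer without its context. The Python sketch below assumes the contract is available as a dictionary. The function name build_agent_response and the "value" key are hypothetical additions for illustration.

```python
import json


def build_agent_response(contract, value, quality_checks_passed, released_for):
    """Package a metric value together with the context from its contract."""
    return {
        "metric": contract["metric"],
        "value": value,  # assumption: the sample response above omits the number
        "grain": contract["grain"],
        "status": contract["status"],
        "owner": contract["owner"],
        "freshness": contract["freshness_expectation"],
        "caveats": list(contract["caveats"]),
        "answer_provenance": {
            "source_data_product": contract["source_data_product"],
            "quality_checks_passed": quality_checks_passed,
            "permitted_use": released_for,
        },
    }


CONTRACT = {
    "metric": "therapy_usage_last_month",
    "grain": "patient_month",
    "status": "certified",
    "owner": "clinical_operations_analytics",
    "freshness_expectation": "telemetry available within 24 hours",
    "caveats": [
        "Delayed connectivity may understate recent usage",
        "Timezone is based on patient local time",
    ],
    "source_data_product": "therapy_usage_product",
}

response = build_agent_response(CONTRACT, 640, True, "internal analytics")
print(json.dumps(response, indent=2))
```

Because the function reads every field from the contract, a change to the definition or caveats shows up in agent answers without touching the agent itself.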

A safer agent access pattern

The safest architecture is not raw warehouse access plus a clever prompt.

It is a governed tool layer that retrieves approved metrics, cites definitions, enforces access, checks freshness, and returns caveats with the answer.

Figure 4. A safer flow keeps AI agents behind a meaning API, policy checks, and metric-level quality gates.

A safer flow looks like this:

1. The user asks a business question.
2. The agent resolves the question to a certified metric or approved data product.
3. The meaning API returns definitions, grain, allowed dimensions, owner, caveats, and permitted use.
4. The policy layer checks whether the user is allowed to ask that question at that level of detail.
5. The quality layer checks freshness and reliability at the metric or data-product level.
6. The query broker executes against approved data products.
7. The answer returns with provenance, not just a number.

This pattern also creates a better audit trail. When an agent gives an answer, the platform can record which metric definition, data product, policy, quality result, and query path were used.
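The flow and its audit trail can be sketched in a few dozen lines of Python. Everything here is a simplified assumption: GovernedBroker, its fields, and the scope names are hypothetical, and the policy and quality layers are reduced to lookups so the ordering of the checks stays visible.

```python
from dataclasses import dataclass, field


@dataclass
class GovernedBroker:
    contracts: dict       # metric name -> contract dictionary
    user_scopes: dict     # user -> set of access scopes granted by policy
    healthy_metrics: set  # metrics whose quality and freshness checks pass
    audit_log: list = field(default_factory=list)

    def answer(self, user, metric, required_scope, run_query):
        # Every request is recorded, including the ones that are refused.
        record = {"user": user, "metric": metric, "outcome": None}
        self.audit_log.append(record)

        contract = self.contracts.get(metric)
        if contract is None or contract["status"] != "certified":
            record["outcome"] = "refused: no certified metric"
            return None
        if required_scope not in self.user_scopes.get(user, set()):
            record["outcome"] = "refused: policy"
            return None
        if metric not in self.healthy_metrics:
            record["outcome"] = "refused: quality"
            return None

        # Only now does a query run, and only against the approved product.
        product = contract["source_data_product"]
        value = run_query(product)
        record["outcome"] = "answered"
        record["source_data_product"] = product
        return {
            "value": value,
            "definition": contract["business_definition"],
            "owner": contract["owner"],
            "caveats": contract["caveats"],
        }


broker = GovernedBroker(
    contracts={"therapy_usage_last_month": {
        "status": "certified",
        "source_data_product": "therapy_usage_product",
        "business_definition": "Total eligible therapy minutes per active patient.",
        "owner": "clinical_operations_analytics",
        "caveats": ["delayed connectivity may understate recent usage"],
    }},
    user_scopes={"analyst": {"patient_level_allowed"}},
    healthy_metrics={"therapy_usage_last_month"},
)

# The lambda stands in for a real query against the data product.
allowed = broker.answer("analyst", "therapy_usage_last_month",
                        "patient_level_allowed", lambda product: 640)
denied = broker.answer("intern", "therapy_usage_last_month",
                       "patient_level_allowed", lambda product: 640)
print([entry["outcome"] for entry in broker.audit_log])
```

The agent never receives a database connection in this design. It receives a narrow tool whose refusals are as well recorded as its answers.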

That matters when the question is sensitive, cross-domain, or likely to influence action.

The organizational reality: domain context, horizontal stewardship

Technology is only half the challenge. The harder part is the operating model.

Modern data architecture often argues for domain ownership. The principle is right: people closest to the business usually understand the data best. Finance understands finance definitions. Commercial teams understand commercial behavior. Clinical and operational teams understand the reality behind process and device data.

But a literal version of fully decentralized data ownership often breaks down in large organizations.

Valuable data products do not stay politely inside one domain. Customer data becomes useful to Finance. Finance data becomes useful to Commercial teams. Operational data becomes useful to Product and Support. Clinical context can influence multiple downstream analytics and reporting needs.

The more valuable a data product becomes, the more cross-functional it becomes.

That is where power, priority, and control challenges appear. If every domain independently owns its own pipelines, definitions, tooling, and standards, teams optimize locally. Definitions drift. Duplicate pipelines appear. Reuse becomes political. A central Jira queue then becomes the place where inconsistencies are discovered after the fact.

The model I have seen work better is hybrid.

Figure 5. A hybrid operating model keeps domain meaning close while preserving reusable, cross-domain implementation discipline.

This is not old-school centralization. It is horizontal stewardship.

Business domains provide meaning. The data organization turns that meaning into governed, reusable, reliable products.

Jira or intake workflows can still exist. The difference is that requests flow into a product-oriented backlog owned by a data product team, not into a fragmented support model where nobody owns the reusable asset end to end.

How I would implement it

The implementation should be boring in the best possible way. Do not start by trying to build an enterprise ontology for everything. Start with the questions and metrics that repeatedly cause disagreement across domains. A practical rollout can look like this:

1. Pick 10 to 20 cross-domain metrics that are reused often and debated often.
2. Map each metric to a data product, owner, grain, definition, exclusions, allowed dimensions, and approved joins.
3. Store those definitions as versioned contracts in Git or in a governed semantic registry.
4. Add metric-level quality and freshness checks, not only raw-table checks.
5. Attach privacy classification, policy rules, and permitted-use rules before exposing metrics to agents.
6. Expose the metrics through a semantic API or governed query broker that returns the answer, definition, lineage, caveats, and owner.
7. Connect AI agents only after the governed tool layer exists, with narrow tools, audit logs, access checks, and answer provenance.
8. Review actual agent questions and failed responses to improve definitions, caveats, and contracts over time.
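The metric-level freshness check in that rollout is worth a small illustration, because it differs from a raw-table check. A table can be "fresh" because one device reported a minute ago while another has been silent for days. The Python sketch below assumes the rule that a metric is only as fresh as its most delayed device. That rule, the function name, and the device IDs are my assumptions. The 24-hour window comes from the sample contract.

```python
from datetime import datetime, timedelta, timezone


def metric_freshness(latest_by_device, max_lag, now):
    """Report metric-level freshness from the last event time of each device.

    Assumption: the metric is stale as soon as any contributing device
    has been silent for longer than `max_lag`.
    """
    stale = sorted(
        device for device, seen_at in latest_by_device.items()
        if now - seen_at > max_lag
    )
    return {"status": "stale" if stale else "fresh", "stale_devices": stale}


# A fixed clock keeps the example deterministic.
now = datetime(2024, 5, 2, 12, 0, tzinfo=timezone.utc)
latest = {
    "dev-a": now - timedelta(hours=2),
    "dev-b": now - timedelta(hours=30),
}
report = metric_freshness(latest, timedelta(hours=24), now)
print(report)
```

Returning the lagging devices alongside the status lets the answer carry a specific caveat, such as which patients may be understated, instead of a generic warning.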

This is not glamorous architecture. That is the point. The best version looks like production discipline: ownership, tests, review, contracts, APIs, and feedback loops.

Figure 6. Design principles for meaning contracts and AI-ready data platforms.

Design principles for an AI-ready data platform

Treat definitions as production assets. Metrics, dimensions, and business rules should be versioned, reviewed, tested, and owned.

Capture assumptions, not just descriptions. The most important meaning is often why something is included, excluded, deprecated, or trusted only for a narrow use case.

Attach quality to business meaning. Freshness and reliability should be visible at the level consumers use: metric, data product, and API, not only raw table.

Expose meaning through APIs. AI agents, BI tools, notebooks, and applications should consume the same definitions instead of duplicating logic.

Govern by maturity. Exploration, team-level metrics, and enterprise-certified metrics should have different review paths and expectations.

Keep domain expertise close, but standards horizontal. Domains know meaning. The data organization should own reusable implementation and cross-domain consistency.

Final thought

For years, analysts quietly carried the meaning that enterprise systems failed to capture.

AI changes the equation because software cannot rely on institutional memory.

The organizations that succeed will not be the ones with the largest models. They will be the ones that finally make business meaning a first-class architectural asset.

AI did not create the enterprise data problem. It exposed it.
