I've spent most of the past decade building business-intelligence environments at every scale: Fortune-1000 banks, mid-market manufacturers, healthcare networks, non-profits with $400K budgets. The technology gets blamed for failures more often than it deserves. The actual culprit is almost always upstream of the technology: nobody decided what the metrics mean.
This essay is about the artifact where that decision lives — the semantic model — and why I think it's the most under-appreciated component of every BI rollout I've ever seen.
What a semantic model is, in one paragraph
A semantic model sits between your raw data and every report that depends on it. It's where you say "a customer is identified by this column," "active means this," "revenue is recognized this way," and "these two tables join on this key." Every chart, scorecard, and dashboard that gets built on top inherits those decisions. Power BI calls it a "dataset" (now "semantic model"); dbt calls them "models"; Looker calls them "explores." The vocabulary differs; the function is identical.
The semantic model is, in plain terms, the contract between data and decisions.
Why most rollouts skip it
The fastest way to ship a Power BI report is to connect Power BI Desktop directly to a database, drag a table onto the canvas, and start building visuals. It works. The first week, it feels productive. Reports show up; meetings get screenshots; everyone's happy.
The trouble starts around week six. Someone in finance pulls a "monthly revenue" number and someone in sales pulls a "monthly revenue" number and they're $80K apart. Then someone asks "well, which is right?" and the honest answer is: both, depending on whether you count an invoice on the date it was sent or the date it was paid, and whether you exclude credit memos, and whether you include the rebate accruals.
That conversation should have happened on day one, in the semantic model, in writing. Instead, it happens on day forty, in a Slack thread, with three different reports as exhibits.
The cheapest place to define a metric is the semantic model. The most expensive place is a board meeting.
The four contracts a semantic model encodes
When I review a model that's been built well, I look for four explicit decisions:
1. Identity contracts
Who is a customer? A row in the customer table? A row joined to a billing table? A row that has at least one paid invoice in the last twelve months? Each definition produces a different "active customer count," and they can be off by 30% or more.
Identity contracts get encoded in dimension tables. DimCustomer is the canonical list of customer entities; the columns on it represent the agreed-upon attributes. Anyone needing to count, segment, or filter customers reaches for DimCustomer, not for raw transactional data.
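To make that concrete, here's a sketch of how one of those definitions — "a customer with at least one paid invoice in the trailing twelve months" — might be encoded as a measure. The table and column names (`FactInvoice`, `CustomerKey`, `Status`, `DimDate`) are illustrative assumptions, not a real schema:

```dax
-- Illustrative: one explicit, named definition of "active"
-- instead of three reports each filtering differently.
ActiveCustomers =
CALCULATE (
    DISTINCTCOUNT ( 'FactInvoice'[CustomerKey] ),
    'FactInvoice'[Status] = "Paid",
    DATESINPERIOD ( 'DimDate'[Date], MAX ( 'DimDate'[Date] ), -12, MONTH )
)
```

Whichever definition the business picks, the point is that it's written down once, in one place, with a name everyone reaches for.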
2. Metric contracts
What is "Revenue"? Net of returns? Inclusive of recurring + one-time? Recognized when invoiced or when paid? Currency-converted at the transaction date or month-end?
This is where DAX measures live. The convention I follow:
```dax
Revenue =
CALCULATE (
    SUM ( 'FactInvoice'[NetAmount] ),
    'FactInvoice'[Status] = "Recognized"
)

RecurringRevenue =
CALCULATE (
    [Revenue],
    'FactInvoice'[InvoiceType] = "Recurring"
)

OneTimeRevenue = [Revenue] - [RecurringRevenue]
```
Three observations about that block:
- The names are readable English. Not `m_xxx_001`. A finance lead can read it, even without DAX fluency.
- The composition is explicit. `OneTimeRevenue` is defined as the residual of `Revenue` minus `RecurringRevenue`. If the definition of `Revenue` changes, the others follow automatically.
- The filter logic is centralized. "Recognized" appears only in the base measure. Every dependent measure inherits it. No drift.
That's a metric contract. It's worth more than half the dashboards built on top of it.
3. Time contracts
"Year-over-year" only means anything if you've agreed on what year boundary to use. Calendar year? Fiscal year ending in March? Comparable-period (last 30 days vs the 30 before that)?
A clean semantic model has a dedicated date table (a "calendar dimension") with explicit columns for fiscal periods, calendar periods, and rolling-window flags. Time intelligence then composes cleanly:
```dax
RevenueLY =
CALCULATE (
    [Revenue],
    SAMEPERIODLASTYEAR ( 'DimDate'[Date] )
)

RevenueYoY% =
DIVIDE ( [Revenue] - [RevenueLY], [RevenueLY] )
```
Once those two measures exist, every report that needs a YoY comparison reaches for RevenueYoY%. Anyone re-implementing it from scratch in a new report has done two things wrong: built something that already exists, and risked drifting from the canonical definition.
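The comparable-period variant composes the same way. A sketch, assuming `DimDate` is the dedicated date table from above — the measure names are mine, not canonical:

```dax
-- Trailing 30 days, ending at the latest date in the current filter context.
RevenueL30D =
CALCULATE (
    [Revenue],
    DATESINPERIOD ( 'DimDate'[Date], MAX ( 'DimDate'[Date] ), -30, DAY )
)

-- The 30 days immediately before that window; the two ranges don't overlap.
RevenuePrior30D =
CALCULATE (
    [Revenue],
    DATESINPERIOD ( 'DimDate'[Date], MAX ( 'DimDate'[Date] ) - 30, -30, DAY )
)
```

Which boundary convention you pick matters less than picking one and naming it.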
4. Security contracts
Who can see what? In a multi-site, multi-region, or multi-product business, the answer is rarely "everyone sees everything." Sales reps should see their own pipeline, not the team's. Site managers should see their site, not the others. Executives should see all of it.
Row-level security (RLS) is configured on the semantic model, not on individual reports. That means: build the rules once, and every dashboard that depends on the model inherits them. Add a new dashboard tomorrow; it's secure by default.
I've seen organizations layer security at the report level instead. It's a maintenance trap. Every new report becomes another place where the rules might leak. RLS at the model layer is one place to audit, one place to maintain, one place to break if it breaks.
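In Power BI, an RLS rule is a DAX filter expression attached to a role on the model. A minimal sketch, assuming the dimension table carries a column mapping rows to user sign-ins (the column name is hypothetical):

```dax
-- Filter expression on 'DimSalesRep' for a "Sales Rep" role:
-- each signed-in user sees only the rows tied to their own identity.
'DimSalesRep'[RepEmail] = USERPRINCIPALNAME ()
```

Because the filter lives on the dimension table, it propagates through the relationships to every fact table and every report built on the model.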
How we build them: the actual workflow
For Stage 04 engagements at BiWize, the semantic model is built in roughly this sequence:
Step 1: KPI workshop
One half-day session with leadership and the people closest to the data. Whiteboard the 5-15 metrics that matter. For each: who owns it, where the source data lives, what definitional ambiguities exist. The output is a one-page metric register that becomes the spec for the model.
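One entry in such a register might look like this — the contents are illustrative, not a template we mandate:

```
Metric:       Monthly Recurring Revenue
Owner:        VP Finance
Source:       FactInvoice (ERP), InvoiceType = "Recurring"
Ambiguities:  recognized vs. paid date; credit memos excluded?
              rebate accruals included?
```

Fifteen of those on one page is the spec for the model.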
Step 2: Source audit
Trace every metric back to its origin. ERP table? CRM API? Spreadsheet on someone's laptop? Identify the gaps before the build. This is where data-quality flags get logged: "this column has 12% nulls," "these two source rows disagree by $1,200/month."
Step 3: Star schema design
Most operational reporting fits cleanly into star schema: one fact table per business process (invoices, sales, orders), surrounded by dimension tables (customers, products, dates, regions). I prefer this even when the data warehouse is already shaped differently — the star schema is the lens, not the storage.
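For the invoice process running through this essay, the star might sketch out like this (column names illustrative):

```
FactInvoice (InvoiceKey, CustomerKey, ProductKey, DateKey, NetAmount, Status)
    ├── DimCustomer (CustomerKey, Name, Segment, Region)
    ├── DimProduct  (ProductKey, SKU, Category)
    └── DimDate     (DateKey, Date, FiscalYear, FiscalQuarter)
```

One fact table per process, one row per event, every dimension joined on a single key.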
Step 4: Model build
In Power BI Desktop, in Tabular Editor, in .bim files, in source control. The model gets named tables, explicit relationships, and governed measures. RLS rules get encoded. The audit trail starts here.
Step 5: Documentation
Every measure gets a description. Every relationship gets a comment. The metric register from Step 1 becomes the README that ships with the model.
Step 6: Reports built on top
Only after the model is in good shape do we build dashboards on it. The first dashboard takes a week; the next ones take days, because they inherit everything that's already been decided.
The DAX-readability rule
Here's the rule I don't compromise on. Every measure in the model must be readable by a non-DAX-fluent business user, and re-implementable by a junior analyst, six months after I'm gone.
Practically:
- Names are English. `RevenueYoY%` beats `m_rev_yoy_001` in every meeting that follows.
- Comments explain intent, not syntax. If someone needs to read a comment to understand what `SAMEPERIODLASTYEAR` does, they need to read the docs, not your code.
- Composition over duplication. If `OneTimeRevenue` can be derived from `Revenue`, derive it. Don't reimplement the filter logic.
- One canonical home per concept. If "Active Customer" appears in three different measures with three different filter conditions, two of them are wrong.
We don't write a single DAX measure that the team can't read six months later.
The case against shortcuts
I want to address two common objections, because they come up on every fit call.
"We're a small shop. We don't need a semantic model." If your business runs on three reports that two people maintain, that's true. The moment a fourth person needs to extend a report, or a fourth report needs to reuse the same metric, the cost of not having a model crosses zero. Most teams don't realize they've crossed that line until they're already on the wrong side of it.
"Power BI Desktop has DirectQuery. We don't need a separate model." DirectQuery is a query engine, not a semantic layer. You still need to decide what a customer is, what revenue is, how time periods compare. DirectQuery just changes where the query runs. The contracts still need to exist somewhere; the question is whether they're in your head, in a spreadsheet, or in a model that survives the next person who joins the team.
What good looks like, in a sentence
You can tell a good semantic model from a bad one by handing it to someone who didn't build it and watching what happens. Good models are extensible — the new person reads the existing measures, understands the conventions, and adds three new ones that fit the same patterns. Bad models are forks — the new person looks at the existing mess, despairs, and builds something parallel that won't talk to anything else.
The semantic model is the contract. Sign it once, properly, before anyone builds a report. Everything that follows gets cheaper, more reliable, and more defensible. Skip it, and every dashboard you build is renting credibility on borrowed time.