
Meeting summaries (AI or not) - forced conclusions and missing history
Published 14.5.2026 by Christian Ginter
Categories: use-cases, ai-tools

Quite a while ago I realised that I often did not actually want a summarization even though I asked the AI to, e.g., 'summarize this' ... what I wanted was more like "Summarize and extract the key takeaways, with citations for the key aspects."
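
To make that concrete, here is a minimal sketch of the two asks side by side. call_llm is a hypothetical helper standing in for whatever LLM API you use, not a specific library call.

    # A minimal sketch, assuming a hypothetical call_llm(prompt: str) -> str
    # helper that wraps whatever LLM API you use.

    transcript = "..."  # the raw meeting transcript

    # What I used to ask for:
    vague_prompt = f"Summarize this:\n\n{transcript}"

    # What I actually wanted:
    explicit_prompt = (
        "Summarize the following meeting and extract the key takeaways. "
        "For every key aspect, quote the sentence from the transcript that "
        "supports it. If there is no supporting evidence, leave that section empty.\n\n"
        f"{transcript}"
    )

    # summary = call_llm(explicit_prompt)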

Related to that, I've been spending some time figuring out how to integrate AI tools into daily workflows without just speeding up the rate at which mess is created. When we automate meeting summaries, we often force models to jump straight to conclusions without the history needed to support them. That is a big part of why allowing empty sections is actually fine.

But maybe that is too simplified ... it depends on what the output is supposed to stand for. Should it just be a quick refresh for someone who was part of the meeting last week, or is it intended for the client-documentation folder that the whole organisation uses?

While a "quick refresh" might survive the hallucinations of a standard AI prompt, any summary intended for shared documentation or decision-making requires a more rigorous architecture to prevent shipping "cognitive debt."

Identification - asking if the data actually supports the claim

I was reading two articles on this topic: "LLM Summarizers Skip the Identification Step" by William Gieng and "Identification: The Key to Credible Causal Inference" by Murat Unal.

They discuss a failure mode that happens when we ask a model to summarize. In causal inference, there is a step called "Identification." It’s the part where you stop and ask: Does the data I have actually support the claim I want to make?

Only after you prove identification do you move on to "Estimation" (producing the actual number or claim).

Gieng argues that standard AI summarization pipelines skip Identification entirely. If the prompt asks for "Decisions," the system is essentially forced to estimate a decision to fill the template. If the meeting was just five minutes of vague banter, the model will often infer a firm decision from a half-finished sentence—much like a human note-taker guessing what their boss wanted to hear.

It isn't summarizing; it's pattern-matching what a summary should look like.
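
To see why the template itself is the problem, compare a prompt that forces the model to fill a "Decisions" field with one that explicitly permits abstention. Both templates are my own illustration, not prompts from Gieng's article.

    # Illustrative only: a template that forces the model to produce decisions,
    # and one that explicitly allows it to decline.

    FORCED_TEMPLATE = """Summarize the meeting using exactly this structure:
    Decisions:
    - ...
    Action Items:
    - ...
    """

    ABSTAINING_TEMPLATE = """Summarize the meeting using this structure.
    If the transcript contains no explicit evidence for a section, write
    "None explicitly identified" instead of inferring anything.
    Decisions:
    Action Items:
    """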

A meeting is more than its transcript

This got me thinking about what we are actually feeding into these models.

When we summarize a meeting, we don't just rely on the verbatim words spoken. We bring a ton of context:

  1. The raw transcript (what was said).
  2. Visual artifacts (the slides or code we looked at).
  3. Real-time intent (what someone cared enough about to write down in their notes).
  4. Historical trajectory (knowing that this client has complained about this exact bug three times in the last month).

If you hand an audio recording to a human assistant who has zero context about the project, their summary will likely miss the point. The same is true for AI. If we only feed the AI the raw transcript, it lacks the context that actually makes a conversation matter. If someone says "Everything is mostly on track," the literal transcript sees a positive update. But the historical context knows we missed the last two deadlines—making that polite statement a massive risk.
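
As a rough sketch, the extra context from the list above could be bundled into a structure like this; the field names are my own illustration, not any real tool's schema.

    # Assumed structure, not a real product's schema: bundling the four
    # context sources instead of handing the model only a transcript.
    from dataclasses import dataclass, field


    @dataclass
    class MeetingContext:
        transcript: str                                             # 1. what was said, verbatim
        artifacts: list[str] = field(default_factory=list)          # 2. slides or code shown
        participant_notes: list[str] = field(default_factory=list)  # 3. real-time intent
        history: list[str] = field(default_factory=list)            # 4. trajectory, e.g. past delays


    ctx = MeetingContext(
        transcript='Engineer: "Everything is mostly on track."',
        history=[
            "Milestone pushed back on 2026-03-02",
            "Milestone pushed back on 2026-04-15",
        ],
    )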

If we design systems that only feed the AI a raw transcript, intentionally blinding it to context and intent, we end up shipping cognitive debt into our team's shared understanding.

Exploring a different architecture

So, how do we use these tools without letting them invent consensus?

Gieng suggests we have to force the Identification step into the pipeline. Instead of a single prompt, he proposes a constrained architecture:

  1. Extraction: Pull facts conservatively. Don't invent.
  2. Synthesis: Build claims, but explicitly label them (e.g., Observed vs Inferred) and point to the evidence.
  3. Audit: A final review stage that is strictly constrained to weaken claims.

This audit stage feels like the key. It cannot rewrite text to make it smoother. It can only delete a claim, downgrade its certainty, or insert an [Insufficient Evidence] tag.
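
Here is a minimal sketch of that three-stage shape in code. call_llm is a placeholder for whatever model API you use, and the prompts are my reading of the idea rather than Gieng's actual implementation.

    # A minimal sketch of Extraction -> Synthesis -> Audit, assuming a
    # placeholder call_llm helper. Not Gieng's implementation.

    def call_llm(prompt: str) -> str:
        raise NotImplementedError("wire this to your LLM API of choice")


    EXTRACT = (
        "List only facts stated verbatim in the transcript, each with the "
        "quote that supports it. Do not infer anything.\n\nTranscript:\n{transcript}"
    )

    SYNTHESIZE = (
        "Build summary claims from these facts. Label every claim as "
        "[Observed] or [Inferred] and point to the supporting quote.\n\nFacts:\n{facts}"
    )

    AUDIT = (
        "Review these claims against the facts. You may ONLY delete a claim, "
        "downgrade its certainty, or replace it with [Insufficient Evidence]. "
        "You may not reword claims or add new ones.\n\nFacts:\n{facts}\n\nClaims:\n{claims}"
    )


    def summarize(transcript: str) -> str:
        facts = call_llm(EXTRACT.format(transcript=transcript))    # 1. Extraction: pull, don't invent
        claims = call_llm(SYNTHESIZE.format(facts=facts))          # 2. Synthesis: labelled claims
        return call_llm(AUDIT.format(facts=facts, claims=claims))  # 3. Audit: weaken only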

For those interested in the mechanics of how these constraints are mathematically and logically enforced, I highly recommend reading Gieng's original piece linked in the references below.

How this looks in practice

To make this concrete, here are two common failure modes and how forcing the Identification step changes the outcome.

Example 1: The polite update (Missing History)

Imagine a quick sync where an engineer says: "Yeah, we're mostly on track, just a few final things to wrap up."

  • Standard AI summary: Outputs a clean bullet point: "Status: Project on track."
  • The Audit Pipeline: The Extraction step pulls the quote. The Synthesis step tries to claim "on track." But the Audit stage compares this against the historical trajectory (the CRM shows this milestone has been pushed back three times). The Audit stage is forced to weaken the claim, outputting: Status: [Unclear - statement contradicts historical delays].

Instead of silently shipping false confidence to the team, the system flags the risk.
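
That history check can even be done deterministically before anything ships; the threshold and wording below are my own illustration.

    # An assumed, deterministic version of the history check: if the status
    # claim says "on track" but the record shows repeated delays, weaken it.

    def audit_status_claim(claim: str, history: list[str]) -> str:
        delays = [event for event in history if "pushed back" in event.lower()]
        if "on track" in claim.lower() and len(delays) >= 2:
            return "Status: [Unclear - statement contradicts historical delays]"
        return claim


    print(audit_status_claim(
        "Status: Project on track.",
        [
            "Milestone pushed back on 2026-03-02",
            "Milestone pushed back on 2026-04-15",
            "Milestone pushed back on 2026-05-01",
        ],
    ))
    # -> Status: [Unclear - statement contradicts historical delays]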

Example 2: The thin meeting (Forced Abstention)

Imagine a 15-minute call where two people brainstorm vaguely about a new feature but don't actually commit to who is doing what.

  • Standard Approach: Forces the model to infer who is doing what just to fill its required template, resulting in three hallucinated Action Items.
  • The Audit Pipeline: The Synthesis stage might try to guess action items based on who spoke last. But during the Audit stage, the AI looks for concrete extraction evidence (like "I will do X by Tuesday") and finds none. Because it is forbidden from smoothing text, it deletes the hallucinated items and outputs: Action Items: [None explicitly identified].
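
A sketch of how that abstention rule could look; the commitment pattern is my own illustration, not something from the article.

    # Assumed rule: only keep action items if the transcript contains at least
    # one concrete commitment ("I will do X by ..."); otherwise abstain.
    import re

    COMMITMENT = re.compile(r"\bI('ll| will) \w+.*\bby\b", re.IGNORECASE)


    def audit_action_items(items: list[str], transcript: str) -> list[str]:
        if COMMITMENT.search(transcript):
            return items
        return ["[None explicitly identified]"]


    print(audit_action_items(
        ["Alice to draft the feature spec", "Bob to set up tracking"],
        "Maybe we could do something with notifications? Yeah, could be cool.",
    ))
    # -> ['[None explicitly identified]']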

Abstention is a feature, not a bug

This is the part to take away: if a meeting was thin, unstructured, or lacked real decisions, the summary should be empty.

We often view an empty summary section as a failure of the AI. But Gieng points out that allowing empty sections is actually letting the system do the right thing—declining to assert what the source doesn't support.

Closing

I'm still only starting to experiment with how to build these constraints into daily workflows. But realizing that "speed" often just means skipping the identification step changes how we look at AI summarization completely. An honest, empty summary is better than a fast, fabricated one.

References