Counterparty Data Reconciliation at Scale: A 350,000-Record Case Study – E133
Most counterparty data reconciliation projects fail at the same assumption: that one identifier, usually a tax ID, can resolve who you’re actually doing business with. In Episode 133 of What Counts, Maura Dunn walks through a real two-year project to reconcile 350,000 counterparty records across eight systems at a company built through acquisition: four contract management platforms, one ERP carrying both customer and supplier masters, and three trading systems, each with its own naming conventions, character limits, and overflow fields. She unpacks the 18 months of unproductive matching that came first, the rule-precedence approach that finally worked once Snowflake and Elasticsearch replaced the spreadsheet attempts, and the 10-to-1 collapse from 350,000 records down to 35,000 true entities. She also makes the case for where AI fits this kind of work today, and names the one thing it still can’t do unless you put deep institutional knowledge into the prompt.
If you want to see what’s hiding in your own shared drives right now, search TrailBlazer Insight in the Microsoft Store. It scans locally for PII, HIPAA, PCI, and other compliance risks with no cloud upload and no IT ticket required.
This episode picks up where Episode 132: Data Reconciliation Before AI left off — Maura delivers the full case study we teased last time.
Topics covered in this episode include post-acquisition contract data cleanup, duplicate counterparty detection across multiple CLM platforms, rule-based data matching at scale using Snowflake and Elasticsearch, and the role of AI in contract data reconciliation when source systems lack consistent identifiers.
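For readers who want a feel for what rule-precedence matching can look like, here is a minimal sketch in plain Python. It is not the project's actual rule set, which these notes do not spell out, and it stands in for work the team did in Snowflake and Elasticsearch. The field names (name, tax_id, postal_code), the legal-suffix list, and the two rules are all hypothetical. The idea it illustrates: each rule combines more than one variable, rules are tried in a fixed order, and the first rule that hits decides which entity a record joins.

```python
import re

# Hypothetical suffix list; a real project would maintain a far longer one.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co", "corporation",
                  "incorporated", "limited", "company"}

def normalize_name(name):
    """Lowercase, replace punctuation with spaces, drop common legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)

# Each rule returns a match key, or None when the record lacks the fields.
# Both rules are illustrative assumptions, not rules from the episode.
def rule_tax_id_and_name(r):
    if r.get("tax_id") and r.get("name"):
        return ("tax+name", r["tax_id"], normalize_name(r["name"]))
    return None

def rule_name_and_postal(r):
    if r.get("name") and r.get("postal_code"):
        return ("name+postal", normalize_name(r["name"]), r["postal_code"])
    return None

# Precedence order: earlier rules win over later ones.
RULES = [rule_tax_id_and_name, rule_name_and_postal]

def reconcile(records):
    """Return one (entity_id, matching_rule_name_or_None) pair per record."""
    index = {}          # match key -> entity id
    assignments = []
    next_id = 0
    for r in records:
        keys = [rule(r) for rule in RULES]
        entity, matched_by = None, None
        for rule, key in zip(RULES, keys):
            if key is not None and key in index:
                entity, matched_by = index[key], rule.__name__
                break   # the highest-precedence hit decides
        if entity is None:          # no rule hit: start a new entity
            entity = next_id
            next_id += 1
        for key in keys:            # register keys without overwriting
            if key is not None:
                index.setdefault(key, entity)
        assignments.append((entity, matched_by))
    return assignments

SAMPLE = [
    {"name": "Acme Corp.", "tax_id": "12-3456789", "postal_code": "10001"},
    {"name": "ACME CORPORATION", "tax_id": "12-3456789", "postal_code": "94105"},
    {"name": "Acme, Inc", "tax_id": None, "postal_code": "94105"},
    {"name": "Beta Holdings LLC", "tax_id": "98-7654321", "postal_code": "10001"},
]

if __name__ == "__main__":
    for record, (entity, rule) in zip(SAMPLE, reconcile(SAMPLE)):
        print(entity, rule, record["name"])
```

In the sample data, the three Acme records collapse into one entity even though one has no tax ID and two sit at different postal codes, while Beta Holdings stays separate. Keeping the name of the rule that fired alongside each assignment gives reviewers an audit trail, which matters when institutional knowledge has to confirm or overturn a match.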
Episode length: 00:21:01
0:00 – Pre-roll: TrailBlazer Insight — local compliance scanning for PII, HIPAA, and PCI
0:20 – Show intro
0:47 – Setting up the case study: 350,000 records across 8 systems
1:52 – How growth through acquisition created 7 (then 8) active counterparty sources
3:55 – Why the same legal entity can appear differently in every system
6:22 – The small business analogy: 4 addresses in 13 years
8:09 – The first 18 months: why tax ID matching failed at scale
10:48 – Name matching, character limits, overflow fields, and legacy system formatting
12:41 – Spinning wheels, spreadsheet by spreadsheet
13:25 – The breakthrough: contract type bucketing + multi-variable matching
15:08 – Moving to Snowflake and Elasticsearch for rule-precedence matching
16:14 – Where AI could accelerate this today — and what it still needs from you
17:41 – The result: 350K records collapsed to 35K true entities
18:13 – What came out of all that work
19:49 – Teaser: next episode covers how to prevent this from happening again
What Counts is produced by TrailBlazer Consulting, LLC and hosted by Lee Karas and Maura Dunn. Learn more at trailblazer.us.com or email us at info@trailblazer.us.com. Explore compliance-ready training at the TrailBlazer Learning Academy. Read more from Maura at mauradunn.substack.com. Music by Jason Blake. Full disclaimer.














