what is driving the semantic layer revival? (ai can’t live without it)

TL;DR: We have been trying to define meaningful context layers for decades, but the humans we created them for just wouldn’t embrace them. Along come AI agents and they are thirsty (yes, I’m personifying AI) for this clarifying metadata in an attempt to provide more accurate answers.

Same old problem

Database, table, and column names are defined by technologists and follow a variety of abbreviation patterns, and their original intent is rarely rigorously aligned with the data that eventually gets stored in them. It should come as no surprise, then, that we have always needed a well-documented and regularly maintained business translation layer. Nowadays, we call this the ‘semantic layer’, but 30+ years ago when I started in software we called it a ‘data dictionary’.

There have been many other names along the way (data catalog, business object universe, glossary, metadata repository, etc.), but the goal is still the same: to create an abstraction between raw data and the people or systems consuming it. Why have we never really solved this in a pervasive way? As is usually the case, I blame us humans.

Skinning elbows

Humans have a tendency to want to learn everything themselves. Imagine how advanced a civilization we would be if we could just take at face value all of the knowledge and life lessons that those before us built up in their lifetimes. Unfortunately, we have to ‘skin our elbows’ on the same problems that have been solved so many times before. That unwillingness to learn from others is at the heart of why we don’t already have incredibly accurate semantic layers.

Without that desire to simply accept the knowledge others are trying to share, it is also no wonder the ‘data czar’ role never really took hold at the vast majority of enterprises. Where it did, it was more like an intellectual society writing for itself while nobody else wanted to listen. Then came the self-service BI world, which proclaimed there was no need for a semantic layer.

Awesome BI tools brought forward the ‘storytellers’ of the data. These artists sure didn’t need a business glossary to help them tell their stories around the campfire!

Interested clients

So what’s new? Yep, GenAI is here, like it or not. Along with it comes the expectation that these probabilistic pattern-matching models will somehow become cognizant and provide truly deterministic answers. That isn’t going to happen, but as with the earlier phases of ‘prompt engineering’ and RAG, the current round of AI ‘agents’ (yep, they still lean heavily on LLMs) returns better answers when provided with additional definitions, business meanings, and example scenarios (aka CONTEXT).
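To make that CONTEXT idea a bit more concrete, here is a minimal sketch of a tiny data dictionary being rendered into text and placed ahead of a user's question. Everything in it is an assumption for illustration: the column names, the definitions, and the helper names (`ColumnDefinition`, `render_context`, `build_prompt`) are hypothetical, and no particular LLM or agent framework is implied. It only builds the prompt string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnDefinition:
    """One entry in a (hypothetical) data dictionary."""
    physical_name: str   # cryptic name as it exists in the database
    business_name: str   # what the business actually calls it
    description: str     # plain-language meaning
    example: str = ""    # optional sample value


# Hypothetical entries, invented purely for illustration.
DATA_DICTIONARY = [
    ColumnDefinition(
        "cust_acct_sts_cd",
        "Customer account status",
        "Lifecycle state of the account: A (active), S (suspended), C (closed).",
        "A",
    ),
    ColumnDefinition(
        "ord_ttl_amt",
        "Order total",
        "Order value in US dollars, including tax and shipping.",
        "149.99",
    ),
]


def render_context(definitions):
    """Turn dictionary entries into plain text a model can read."""
    lines = ["Column definitions:"]
    for d in definitions:
        line = f"- {d.physical_name} ({d.business_name}): {d.description}"
        if d.example:
            line += f" Example: {d.example}"
        lines.append(line)
    return "\n".join(lines)


def build_prompt(question, definitions):
    """Place the business context ahead of the user's question."""
    return f"{render_context(definitions)}\n\nQuestion: {question}"


if __name__ == "__main__":
    print(build_prompt("How many active accounts do we have?", DATA_DICTIONARY))
```

The code itself is trivial, which is the point: the hard part is the human work of keeping those descriptions accurate as the data changes.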

Will it all work this time? Will this generation of data systems actually have accurate semantic layers representing them? The pessimistic-optimist in me says it just might, but the pragmatic programmer in me thinks it probably won’t. Costs will continue to rise for AI tooling, unrealistic expectations will continue to drive AI adoption, and ultimately I fear folks won’t want to maintain these data dictionaries and executives won’t want to pay for their creation and upkeep.

The semantic layer — it is STILL a good idea!

Published by lestermartin

Developer advocate, trainer, blogger, and data engineer focused on data lake & streaming frameworks including Trino, Hive, Spark, Flink, Kafka and NiFi.
