Rethink Metadata … Gaps & Opportunities

Deepak Chandramouli
Published in StellarSense
6 min read · Dec 6, 2021


Background Image by Yeshi Kangrang on Unsplash

This is the 4th episode of the Rethink Metadata … blog series.

So far, we have covered:

  1. Metadata & its Facets
  2. Relevance of metadata
  3. Business Functions, Capabilities & Products

In this episode, let’s look at the Gaps in leveraging Enterprise Metadata in its current state.

Current Challenges

To better understand the challenges, let us first consider the nature of the problems & then analyze the impact they have on the business.

Nature of Problems

1. Enterprise Metadata is fragmented

The scope of Metadata is ever expanding. The term originally revolved mostly around data, but today metadata covers many different facets, as we saw in the first episode. As it evolved, Enterprise Metadata became siloed across highly specialized products that are each critical for operations. A few examples:

  • API Catalogs are prominent in any enterprise that runs large scale services.
  • Data Catalogs deal with data centric details.
  • App Catalogs power the Data Application lifecycle.
  • ML Catalogs enable Model Management in the ML space.
  • Feature Catalogs are gaining prominence with the evolution of ML.
  • Infra Catalogs are foundational to the technology landscape, helping manage the lifecycle of systems & databases.

2. Deep Links between Facets of Metadata are hard to establish

As covered on several occasions in this series:

Modern world’s Enterprise Metadata is a Deep Super Graph connecting People, Technologies & their Lifecycles.

A lack of strong connections between the three realms (People, Lifecycles, Technologies) prevents an enterprise from attaining a higher level of intelligence.
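To make the graph idea concrete, here is a minimal sketch, not taken from any real catalog product. It assumes a toy graph in which every node is tagged with one of the three realms; all node names, the `REALM` mapping and the helper functions are hypothetical. It shows how removing a single link between realms shrinks what can be learned starting from a person.

```python
from collections import defaultdict, deque

# Hypothetical nodes, invented for illustration only.
# Each node is tagged with one of the three realms named in the post.
REALM = {
    "alice": "people",
    "fraud-model": "technology",
    "payments-table": "technology",
    "model-retraining": "lifecycle",
}

# Undirected links between nodes: the "deep links" between realms.
EDGES = [
    ("alice", "fraud-model"),
    ("fraud-model", "payments-table"),
    ("fraud-model", "model-retraining"),
]


def build_graph(edges):
    """Build an adjacency map from a list of undirected edges."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def reachable_realms(graph, start):
    """Return the set of realms reachable from `start` (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return {REALM[name] for name in seen}


connected = build_graph(EDGES)
siloed = build_graph(EDGES[1:])  # same graph without the people-to-technology link

print(reachable_realms(connected, "alice"))
print(reachable_realms(siloed, "alice"))
```

With the link in place, a traversal from the person reaches all three realms. Without it, the traversal never leaves the People realm, which is the situation described above.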

3. Current forms of integration are not sustainable

While integrations across the various facets of metadata are evolving, they remain loosely coupled. We often see quick integrations between products and services as a way of connecting the dots in a platform centric ecosystem. Over time, however, these integrations tend to wither away as the individual products evolve to keep up with their own ever demanding roadmaps.

Let’s take Data Lineage as an example. To deliver data lineage effectively, an organization has to collate details from the facets of API, App, Data, Classification, ML and Features. The solution most often comes in the form of metadata integrations or loosely coupled data points. But the facets of metadata are not owned and managed by a single team or organization, and each product has its own priorities to deliver in a given period. Over the longer term, we end up with loosely coupled systems that do not deliver an effective solution, one that outlives changes in teams, organizations and technology stacks.

4. Each Facet of Metadata offers only a Linear View of its realm

Products & Features in each Facet of Metadata offer a very linear view of their realm. To really fulfill enterprise use cases, it is necessary to collate information across metadata systems.

  • ML Catalogs go deep into Machine Learning metadata specs, but rich information about the data that the models rely on often lies within Feature Catalogs. Going one step further, the true metadata about the datasets is entrenched in Data Catalogs.
  • App Catalogs are crucial for data driven enterprises, since they power the Data Application lifecycle. However, App Catalogs offer little information on the Data that is consumed or produced by the applications. For this level of information, one has to hop over to Data Catalogs.
  • API Catalogs, as covered in the previous episodes, are core to enterprises that run large scale services. To understand the downstream flow of information from the various APIs & Services, one has to stitch the information together with other metadata systems such as the App Catalog, Data Catalog and ML Catalog. This is most commonly seen as Lineage.
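The hops described in the list above can be sketched in a few lines. This is only an illustration under stated assumptions: the three catalogs are modeled as plain in-memory dictionaries, and every model, feature and dataset name, as well as the `datasets_behind_model` helper, is hypothetical. Real catalogs are separate products with their own APIs and owners, which is what makes each hop expensive.

```python
# Hypothetical catalog contents; all names are invented for illustration.
ML_CATALOG = {
    "churn-model": {"features": ["days_since_login", "avg_order_value"]},
}

FEATURE_CATALOG = {
    "days_since_login": {"source_dataset": "events.logins"},
    "avg_order_value": {"source_dataset": "sales.orders"},
}

DATA_CATALOG = {
    "events.logins": {"owner": "platform-team", "classification": "internal"},
    "sales.orders": {"owner": "commerce-team", "classification": "confidential"},
}


def datasets_behind_model(model_name):
    """Hop ML Catalog -> Feature Catalog -> Data Catalog for one model."""
    result = {}
    for feature in ML_CATALOG[model_name]["features"]:
        # Hop 1: the ML Catalog only knows feature names.
        dataset = FEATURE_CATALOG[feature]["source_dataset"]
        # Hop 2: ownership and classification live in the Data Catalog.
        result[dataset] = DATA_CATALOG[dataset]
    return result


for name, details in datasets_behind_model("churn-model").items():
    print(name, details)
```

No single catalog in this sketch can say which confidential datasets sit behind a model. The answer only appears once all three are joined.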

Resulting Business Impact

1. Big Picture is hard to paint

Having an Enterprise sense means gaining broad visibility of the landscape across the dimensions of People, Technologies and Lifecycles. Today we lack such a capability in complex ecosystems, and context is limited to specific fields. Here is a video elaborating on this specific problem.

2. User Experience is hard to tailor for various personas

Each Metadata centric product is designed to deliver a tailored user experience.

As an example, the user experience in an ML Catalog is crafted around the lifecycle of the Machine Learning domain. API Catalogs, on the other hand, focus on the governance & operations of APIs and services. But a solution like data lineage touches many more facets of metadata, such as Apps, Data & ML.

Along similar lines, a privacy use case requires a user experience connecting Data, Classification (DLP), Glossary & Apps.

3. Hard to Unlock Hidden Potentials & Risks

The very separation of concerns, by nature, inhibits connecting the dots in an Enterprise. Let’s take productivity as an example. Imagine surfacing recommendations around Artifacts, Data, Apps and similar users for a new employee in an organization. This requires building context from deep connections between org, user, app & data, and then generating similarities from that context. With conventional solutions, building such recommendations is a stretch because the depth of information is limited. Any attempt to build a solution would entail moving vast amounts of metadata across the realms of app, user, data, etc.
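As a rough illustration of "generating similarities via the context", the sketch below uses Jaccard similarity over sets of context tags. This is one simple choice picked for illustration, not a method prescribed by this series. The `CONTEXT` data, the tag names and the `recommend` helper are all hypothetical, and the sketch assumes the context has already been gathered in one place, which is the hard part in practice.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical context per user, already merged from org, app and data metadata.
CONTEXT = {
    "new_hire": {"org:risk", "app:fraud-dashboard"},
    "bob": {"org:risk", "app:fraud-dashboard", "data:payments", "data:chargebacks"},
    "carol": {"org:marketing", "app:campaign-tool", "data:clicks"},
}


def recommend(user, context, top_n=1):
    """Suggest items used by the most similar peers that `user` does not have yet."""
    mine = context[user]
    peers = sorted(
        (name for name in context if name != user),
        key=lambda name: jaccard(mine, context[name]),
        reverse=True,
    )[:top_n]
    suggestions = set()
    for peer in peers:
        suggestions |= context[peer] - mine
    return peers, sorted(suggestions)


peers, suggestions = recommend("new_hire", CONTEXT)
print(peers, suggestions)
```

The similarity step itself is short. The cost lies in assembling `CONTEXT`, because org, app and data metadata live in different systems.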

4. In its current form, Metadata driven Decision Making comes at a High Cost

Mature organizations track & deliver complex solutions via KRIs, KPIs or other forms of insights. But with the focus mostly on operationalizing the individual stacks, reporting values & insights becomes a last-mile problem. It often gets outsourced to BI products that extract data out of various metadata systems to surface insights. Such solutions are expensive to develop, maintain & evolve over a longer period of time, resulting in a High Total Cost of Ownership.

5. Delayed Insights result in Slower Decision Making

Advanced insights are often necessary to take timely actions, be it mitigating risk or improving operational efficiency. Due to the fragmented nature of Metadata & its facets, gaining new or quick insights is not feasible. Such a requirement often entails time consuming projects, which result in prolonged risk exposure and lost opportunities.

Opportunities

To maximize the value of metadata & monetize it, we have to address the underlying nature of the problems. This is a huge opportunity to deliver positive business impact.

Stay Tuned…

In the upcoming posts, I’ll take a two-step approach to addressing the underlying nature of the problems by:

  1. Looking at the current system design characteristics of Metadata centric or Infra Products.
  2. Discussing a new product centric approach to address the challenges.
Photo by Kevin Butz on Unsplash
