

      Many organisations face difficulties when managing their internal data flows. Understanding the end-to-end flow of data as it moves within and across systems is often a challenging task, due not only to the inherent complexity of the systems themselves, but also to a lack of structure and documentation.

      Data lineage provides a structured framework for tracking the flow of data across tables, programs, and applications, detailing its origins and movements. It offers a comprehensive view of your data ecosystem while also enabling in-depth analysis of specific processes. As a critical component of effective master data management, data lineage ensures greater transparency and control within your organisation.

      Regulatory requirements are often a key driver behind data lineage initiatives. For example, GDPR requires businesses to keep track of the personal data they process, which can be difficult without a clear overview of data lineage.

      Data lineage also plays a key role in maintaining and evolving legacy systems. Developing, updating, or migrating away from these systems can be challenging due to accumulated technical debt or limited expertise. By automatically mapping upstream and downstream dependencies and data flows, data lineage simplifies this process and provides valuable insights for smoother transitions.

      Our NewTech team is a leader in generative AI and the adoption of emerging technologies. We have successfully developed GenAI-based accelerators for data lineage that automatically analyze source scripts to establish lineage with exceptional accuracy. Tasks that would typically take an experienced programmer weeks to complete are now accomplished in just minutes.


      Mads Galatius

      Director, Advisory

      KPMG in Denmark



      We can help you with:

      Our data lineage accelerator is versatile and can be applied across various stages of the data flow. Whether it's analyzing program dependencies in SAS scripts, managing the loading and saving of persistent data tables, or tracing field-level data flows between SQL tables, the accelerator can be tailored to meet any use case. By combining agent networks, prompt engineering, and structured connections, the solution achieves an impressive 95% accuracy (based on real client engagements), surpassing human capabilities.

      The output is a graph network of nodes and edges that visually illustrates the locations of data and their relationships. This network can be easily queried for valuable insights or navigated visually for an in-depth overview. With this tool, you can quickly answer essential questions such as:

      • Which downstream data fields are influenced by my original data field?
      • What programs does this SAS script rely on?
      • Which files does this script save to, and how are they integrated into other programs?
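The first question above can be illustrated with a small graph query. The sketch below is a minimal example in plain Python, not the accelerator's own implementation: the field names in `EDGES` and the helper functions `build_graph` and `downstream` are hypothetical, and the lineage is assumed to be available as simple (source, target) pairs.

```python
from collections import defaultdict, deque

# Hypothetical field-level lineage edges: (source field, target field).
EDGES = [
    ("raw.customers.birth_date", "staging.customers.age"),
    ("staging.customers.age", "mart.risk_scores.age_band"),
    ("staging.customers.age", "mart.pricing.premium"),
    ("raw.orders.amount", "mart.pricing.premium"),
]


def build_graph(edges):
    """Build an adjacency map from each node to its direct successors."""
    graph = defaultdict(set)
    for source, target in edges:
        graph[source].add(target)
    return graph


def downstream(graph, start):
    """Return every node reachable from `start`, using breadth-first search."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for successor in graph.get(node, ()):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen


if __name__ == "__main__":
    graph = build_graph(EDGES)
    for field in sorted(downstream(graph, "raw.customers.birth_date")):
        print(field)
```

Reversing each edge before building the graph would answer the opposite question, namely which upstream fields feed a given field.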

      The inter-dependencies within a data ecosystem define how scripts interact with various components, such as data sources, programs, tables, and other scripts. Scripts often depend on specific programs, libraries, or software environments; these are known as program dependencies. For example, a SAS script might rely on external functions, packages, or tools to perform its tasks, and changes to these dependencies, such as version updates, can impact the script's behavior or results.

      Furthermore, scripts may read from or write to specific databases, tables, or files, establishing dependencies on the availability and integrity of those data sources (known as data-flow dependencies).
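To make the two dependency types concrete, the sketch below separates program dependencies from data-flow dependencies in a small SAS script. It uses simple regular expressions as a stand-in and is not the GenAI-based approach described on this page. The script content, paths, and the `classify_dependencies` helper are hypothetical, and the patterns only cover `%include`, `data`, and `set` statements in their simplest form.

```python
import re

# Hypothetical SAS script used purely for illustration.
SAS_SCRIPT = """
%include "/shared/macros/clean_dates.sas";
libname dw "/data/warehouse";

data dw.customers_clean;
    set dw.customers_raw;
    age = intck('year', birth_date, today());
run;
"""

# %include "<path>"  -> another program this script relies on.
INCLUDE_RE = re.compile(r'%include\s+"([^"]+)"', re.IGNORECASE)
# data <lib.table>;  -> table written by a DATA step.
WRITE_RE = re.compile(r"^[ \t]*data\s+([\w.]+)\s*;", re.IGNORECASE | re.MULTILINE)
# set <lib.table>;   -> table read by a DATA step.
READ_RE = re.compile(r"^[ \t]*set\s+([\w.]+)\s*;", re.IGNORECASE | re.MULTILINE)


def classify_dependencies(script):
    """Split a script's dependencies into program and data-flow groups."""
    return {
        "program": INCLUDE_RE.findall(script),
        "reads": READ_RE.findall(script),
        "writes": WRITE_RE.findall(script),
    }


if __name__ == "__main__":
    for kind, names in classify_dependencies(SAS_SCRIPT).items():
        print(kind, names)
```

Real SAS code includes macros, `proc sql` blocks, and dynamically built table names that patterns like these would miss, so the sketch only shows how the two categories differ.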

      We will both establish the data lineage snapshot and provide tools to automate the updating process. Data lineage is only as good as its accuracy, and any change to the underlying processes or programs requires an update. By leveraging a GenAI-based approach, we eliminate the risk of outdated lineage and save time in the future by automating what would otherwise be a manual process.
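One possible way to decide when lineage needs refreshing is to fingerprint each script and compare against the fingerprints stored with the last snapshot. This is an assumption for illustration only, since the page does not describe how updates are triggered. The `fingerprint` and `scripts_needing_refresh` helpers and the file names are hypothetical.

```python
import hashlib


def fingerprint(source: str) -> str:
    """Return a SHA-256 hex digest of a script's text."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def scripts_needing_refresh(current_sources, stored_fingerprints):
    """Return the names of scripts that are new or changed since the last snapshot.

    current_sources: mapping of script name -> current script text.
    stored_fingerprints: mapping of script name -> digest saved with the snapshot.
    """
    return sorted(
        name
        for name, source in current_sources.items()
        if stored_fingerprints.get(name) != fingerprint(source)
    )


if __name__ == "__main__":
    # Hypothetical snapshot state and current script contents.
    stored = {
        "load.sas": fingerprint("data a; set b; run;"),
        "report.sas": fingerprint("proc print data=a; run;"),
    }
    current = {
        "load.sas": "data a; set b; run;",
        "report.sas": "proc print data=a2; run;",
        "export.sas": "proc export data=a; run;",
    }
    print(scripts_needing_refresh(current, stored))
```

Only the scripts returned here would need to be re-analysed. Unchanged scripts keep their existing lineage.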

      There are several existing tools that leverage data lineage to generate insights, with Microsoft Purview, Informatica, and Collibra being prominent examples. However, commercially available systems typically cover only 50–70% of the complete data lineage, leaving gaps in the full data flow analysis. Our data lineage accelerator can seamlessly integrate with the aforementioned tools, allowing clients to leverage their existing systems while gaining a more comprehensive understanding of their data lineage.




      Other relevant services

      We deliver end-to-end services in GenAI – from identifying and ensuring you have the right level of data quality to building the software application.

      We offer tailor-made projects and managed services, having delivered some of the largest enterprise-wide automation solutions.

      Let our master data management experts accelerate your journey towards achieving consistent and trustworthy data across your organization.

      Explore our insights on AI & data

      Your one-stop destination for AI insights, events, and services.
