Data Integration – Choosing the Right Approach

By Dr. Chris Harding, Founder and Principal of Lacibus Ltd.

How do you approach data integration?

Data integration is often done on a case-by-case basis, but a recent survey of enterprise and solution architects by The Open Group found that 62% of organisations are using or planning to use a specific data integration approach such as data virtualisation, data fabric, or data mesh. How would you pick the right approach for your organisation?

Questions like this are not best answered “off the top of your head”. You want to understand how the different approaches would work in your particular situation. You want to read explanations of them, and case studies. You want to talk to people with similar problems, see what they are doing or have done, and how this compares with what you want to do. You want to be able to follow standards and best practices.

The Open Group survey gives us a solid understanding of the state of data integration in enterprises today, and of the issues faced by enterprise and solution architects. It is a starting point for the development of standards and best practices to guide architects in the future.

The State of Data Integration

The survey was completed initially by some members of The Open Group Architecture Forum. It was then modified slightly in the light of feedback, and completed by members of the Association of Enterprise Architects. In all, there were over 600 responses.

You will probably recognise many features of the picture that the survey paints. Business leaders mostly view data as a strategic corporate asset, but data use is often localised by business unit. Some data is in the Cloud, some on-premises. Overall data quality is mixed: some excellent, some terrible, most somewhere in between. There are often islands of “quality” data with differing management regimes.

Respondents identified several improvement points:

  • Governance and Stewardship (47%)
  • Accelerating speed of discovery and delivery of data – e.g. DataOps (20%)
  • Creating a data platform (18%)
  • Self-Service (7%)
  • Systematically protecting data (3%)
  • Culture, data and content modelling, silos, technical capabilities, and understanding value were also mentioned.

Data to be integrated comes from corporate functions and lines of business. It is mostly in databases, but often in electronic documents, and sometimes from real-time sensors or social media. Information requirements are mostly specified by the CIO and business analysts, sometimes by business departments. Quality characteristics are specified for some, but not all, data. The data may include personally identifiable information (PII).

Some 29% of respondents had a corporate integrated information-sharing environment such as a data warehouse, data lake, or archive; 17% had point-to-point interfaces between applications and services, both internal and external; 16% had line-of-business data silos; and 36% had a mixture of these. Whatever their current environment, 62% were in organisations that were using or planning to use a specific data integration approach, such as data virtualisation (37%), data fabric (27%), or data mesh (23%).

Pain Points

The survey included a free-form question that asked respondents for their biggest data integration problems and pain points. The answers can be grouped into five main areas.

1. Lack of commitment from business units

Departments don’t want to share their data. They have no understanding of the business value of doing so. It is difficult to find the data that is needed, and hard to get the subject-matter experts to explain it.

2. Lack of commitment at corporate level

Enterprise data integration is not seen as a business initiative that justifies investment.

3. Heterogeneous sources and tool-stacks

There are different formats with different processing needs, different data platforms, web services built with different languages and running on different operating systems, and SaaS providers with different interfaces.

4. Conflicting data models

There is often no enterprise data model. The data is not standardized. It includes data from legacy and open systems. It incorporates external data that is ontologically and taxonomically un-normalized and/or at odds with internal data.

5. No culture of data management

There is often no data governance task force, and no policies that speak directly to data. There are data quality issues, such as inconsistent data from different sources and duplicate records.

A Guide to Data Integration

To help architects cope with these issues, The Open Group Data Integration Work Group will produce a Guide to Data Integration using The Open Group Standards.

Since its first version in 1995, the TOGAF® standard has been adopted by more than 80% of the world’s leading enterprises as the architecture framework and development method of choice. In 2019 and 2020, The Open Group added two further standards. The Digital Practitioner Body of Knowledge™ Standard assists individuals and organizations who wish to create and manage product offerings with an increasing digital component, or to lead their organization to becoming a Digital-First Enterprise. The Open Agile Architecture™ standard incorporates agile practices to guide enterprises through digital transformation. These three standards are at the core of The Open Group Digital Portfolio of Open Standards.

This is an exciting time for Data Integration, with emerging technical developments such as Data Fabric, Data Mesh, and DataOps, driven by advances in enterprise computing and artificial intelligence. While the Digital Portfolio standards provide a solid framework for traditional and agile Enterprise Architecture suited to the digital age, they do not provide specific guidance on Data Integration. The new Guide will fill this gap.

The Data Integration Work Group, which is part of The Open Group Architecture Forum, conducted the survey of enterprise and solution architects, having previously published a White Paper on Technical Standards for Data Integration. Development of the Guide is the next stage in its roadmap. To write the Guide, the Work Group will research trends and use cases and review the relevant Open Group standards. The work will be carried out by members of the Architecture Forum, with the assistance of invited experts from other bodies.

To find out more about the Architecture Forum, the Data Integration Work Group, and the benefits of participating in the effort to develop the new Guide, please contact The Open Group.

Dr. Chris Harding is Founder and Principal of Lacibus Ltd. He formed the company to provide services based on virtual data lakes and data-centered architecture. Chris developed the ideas that led to the formation of the company while working as Director of the Open Platform 3.0™ Forum of The Open Group.

Chris was a staff member of The Open Group for many years, supporting its member activities in data communications, directory interoperability, web, service-oriented architecture, cloud computing, and other areas. He was the lead author of The Open Group Guide: Cloud Computing for Business, has helped produce a number of other publications by The Open Group, and has written many online articles. He remembers the early development of the TOGAF® Standard, a standard of The Open Group, and maintains an interest in Enterprise Architecture as a member of the Work Group on TOGAF Supporting the Digital Enterprise. His main focus is now on data platforms. He follows several industry initiatives related to this, and participates in The Open Group Data Integration Work Group.

Before joining The Open Group, Chris was a data communications consultant and, before that, a software engineer and team leader. He has a PhD in Mathematical Logic.

He lives with his wife in Lincolnshire, UK, where he has scope to pursue his hobbies of gardening and photography.