Unified Data Integration In A Distributed Data Landscape

Companies need new and innovative solutions to manage and govern data ingestion, data integration and data movement across workload optimized analytical systems in the distributed data architecture.


DOWNLOAD

The Increasingly Distributed Data Landscape

Over the last several years, data complexity has increased dramatically in many companies, and it continues to do so as more data is captured and more databases and object stores emerge to store it. This includes both internal and external data sources.

  • Core transaction processing systems are now on the cloud as well as on-premises.
  • NoSQL databases are being adopted to allow web and mobile commerce applications to capture non-transactional data at scale.
  • The adoption of cloud storage is also increasing – especially for capturing big data.
  • Multiple data warehouses have been built creating islands of overlapping historical data.
  • Big data platforms like Hadoop and Graph DBMSs have entered the enterprise extending analytical environments beyond the data warehouse.

The Modern Analytical Ecosystem

The emergence of big data has resulted in new analytical workloads that are not well suited to traditional data warehouse environments. These workloads, typically driven by data characteristics (variety, velocity and volume) and the types of analysis required, have caused many companies to extend their analytical set-up beyond the data warehouse to include multiple analytical data stores. Multiple platforms now exist in the enterprise to support different analytical workloads. As a result, data integration and data movement have increased rapidly across data stores in this new analytical ecosystem.

The distributed data landscape is causing increased complexity. Different data integration technologies are being used in different parts of the ecosystem, spanning both production and agile self-service tools, and silos have emerged. Many companies are also rapidly approaching a ‘data deluge’, where data arrives faster than they can consume it. The conclusion is obvious: there has to be a better, more governed way to fuel productivity and agility without causing data inconsistency and chaos. Everyone for themselves is not an option.


Data Integration Use Cases in a Distributed Data Lake

  • Data is being collected via streaming, batch ingest, replication and archiving with some data too big to move once captured.
  • Data lakes / reservoirs are increasingly becoming distributed.
  • Compliance with different data privacy laws in different jurisdictions around the world is a key reason why some data will be kept apart.
  • Data integration software should exploit the power of underlying platforms to scale ETL processing.
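The last point above — exploiting the power of the underlying platform to scale ETL processing — can be illustrated with a minimal pushdown sketch. The function and table names below are hypothetical, not Diyotta's actual API: the idea is simply that the tool generates SQL and ships it to the engine that already holds the data, so only the statement (not the data) travels over the network.

```python
# Hypothetical pushdown-style ETL sketch: rather than extracting rows,
# compose a single INSERT ... SELECT that runs entirely on the data
# platform, letting the scalable engine do the heavy lifting.

def build_pushdown_sql(source_table: str, target_table: str,
                       columns: list[str], filters: list[str]) -> str:
    """Generate one statement that executes where the data lives."""
    select_list = ", ".join(columns)
    where_clause = " AND ".join(filters) if filters else "1=1"
    return (f"INSERT INTO {target_table} "
            f"SELECT {select_list} FROM {source_table} "
            f"WHERE {where_clause}")

sql = build_pushdown_sql(
    source_table="raw.orders",          # illustrative names only
    target_table="mart.orders_eu",
    columns=["order_id", "customer_id", "amount"],
    filters=["region = 'EU'", "amount > 0"],
)
print(sql)
```

The design point is that the integration tool acts as an orchestrator and SQL generator; the Hadoop cluster, Spark engine or MPP database supplies the compute.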

 

Managing Distributed Data Integration Using Diyotta

Diyotta is a new vendor offering distributed data integration software that handles the complexity of multiple platforms in a modern analytical ecosystem. The Diyotta Data Integration Suite supports a range of on-premises, cloud-based, and external data sources, and integrates structured, semi-structured and unstructured data. Data integration jobs are executed in a distributed fashion: jobs are developed centrally and metadata is stored centrally, but execution happens locally. Tasks are pushed down to run close to the data, and all data is moved point-to-point. Diyotta can leverage the power of Hadoop, Spark and MPP RDBMSs, and it enables data integration jobs to be broken up into re-usable components. It can also invoke data integration jobs developed in 3rd party tools to help unify silos and coordinate data integration across a distributed data environment.

Data Integration Requirements in a Distributed Landscape

  • Integrate multiple data types in-motion and at rest
  • Define once, execute anywhere
  • Pushdown processing to exploit scalable platforms
  • Execute in a hybrid environment
  • Nest workflows and invoke 3rd party data integration jobs
  • Support rule versioning for compliance
  • Data integration on-demand and as a service
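The "define once, execute anywhere" requirement above can be sketched as a single logical job definition that is rendered for whichever engine holds the data. The engine names and SQL templates below are illustrative assumptions, not Diyotta's actual product interface.

```python
# Hypothetical "define once, execute anywhere" sketch: one logical job
# definition, rendered into idiomatic SQL for different target engines.

JOB = {
    "name": "dedupe_customers",      # illustrative job metadata
    "source": "staging.customers",
    "target": "core.customers",
    "key": "customer_id",
}

TEMPLATES = {
    # Each engine gets its own rendering of the same logical intent:
    # keep one row per key, writing the result to the target table.
    "hive": ("INSERT OVERWRITE TABLE {target} "
             "SELECT * FROM (SELECT *, ROW_NUMBER() OVER "
             "(PARTITION BY {key} ORDER BY {key}) AS rn FROM {source}) t "
             "WHERE rn = 1"),
    "mpp":  ("INSERT INTO {target} "
             "SELECT DISTINCT ON ({key}) * FROM {source}"),
}

def render(job: dict, engine: str) -> str:
    """Compile the engine-neutral job definition for one target engine."""
    return TEMPLATES[engine].format(**job)

print(render(JOB, "hive"))
print(render(JOB, "mpp"))
```

Only the templates differ per platform; the job definition (and its metadata) is authored once and stored centrally, which is the essence of the requirement.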

Conclusion

The explosion in the number of data sources, together with the need to analyze new types of data, has led many companies to extend their analytical environments beyond the data warehouse to include new data stores and platforms optimized for new analytical workloads. The data is becoming harder to access because it sits in multiple data stores and multiple formats and yet, paradoxically, the business is demanding more agility and the ability to respond more rapidly than ever before.

In this kind of environment, companies need new tools to manage and govern data ingestion, data integration and data movement across workload-optimized analytical systems. They also need the ability to scale to handle volume and velocity as required. Diyotta Data Integration Suite is a clear candidate technology to meet this new set of requirements, address the data deluge, unify data integration silos and allow companies to remain agile in a distributed data landscape.