We live in a new era of data. It’s not just big data, it is modern data, with new types, sources, volumes, and locations of data like never before. We also live in an era of emerging solutions to meet the most pressing data integration challenges. This modern data is more complex and more distributed than anything we have encountered in the past, and if we are going to take advantage of it, we must connect emerging data with modern data integration.

At Diyotta we have identified five key principles of modern data integration to unlock unprecedented new insight from the matrix of data that surrounds us. Working together, they take advantage of the evolution of new data and new platforms, rather than fighting against the rising tide.



1. Take the processing to where the data lives.

There is too much data to move every time the need to blend or transform arises. The logic is simple: place agents where the data lives and process it locally. Carefully coordinated instructions are sent to each agent, and the work is done on the host platform before any data is moved. By taking the processing to where the data lives, you eliminate the bottleneck of the ETL server and decrease the movement of data across the network.
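To make the idea concrete, here is a minimal sketch in Python. It is an illustration only, not Diyotta's implementation: the `LocalAgent` class, the `sales` table, and the in-memory SQLite database standing in for a host platform are all assumptions made for the example. It compares pulling every row out of the platform with sending the agent an instruction and moving only the result.

```python
import sqlite3


def make_source():
    """Stand-in for a host data platform (assumption: an in-memory SQLite DB)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 10.0), ("east", 5.0), ("west", 7.5), ("west", 2.5), ("west", 1.0)],
    )
    return conn


class LocalAgent:
    """Hypothetical agent that sits next to the data and runs instructions there."""

    def __init__(self, conn):
        self.conn = conn

    def run(self, instruction):
        # The instruction executes on the host platform; only its result is returned.
        return self.conn.execute(instruction).fetchall()


agent = LocalAgent(make_source())

# Pull-everything approach: every row leaves the platform before any work is done.
all_rows = agent.run("SELECT region, amount FROM sales")

# Process-in-place approach: the aggregation runs on the host, only the summary moves.
summary = agent.run(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
)

rows_moved_full = len(all_rows)
rows_moved_pushdown = len(summary)
print(rows_moved_full, rows_moved_pushdown, summary)
```

In this toy case five rows shrink to two before anything crosses the network. On real tables the gap depends on how much the local step reduces the data.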

2. Fully leverage all platforms based on what they were designed to do well.

You wouldn’t use a pair of scissors to cut your grass, so why would you use Hadoop for speedy queries? Whether you have already invested in powerful databases or are making new investments in modern data platforms, every platform has a set of workloads that it handles well and a set of built-in functions. Modern data integration allows you to call those native functions locally, process them within powerful platforms, and distribute the workload in the most efficient way possible. By fully leveraging existing platforms, you increase the performance of all of your data blending and enrichment while minimizing data movement.

3. Move data point-to-point to avoid single server bottlenecks.

Since data is moving in all directions, it is vital to establish processes that move data point-to-point, rather than through a server. This approach gives you the ability to move data at the right time. There are times when you want to move all of your data, but most of the time it makes more sense to process it in its native environment and then move the resulting data set. By moving data point-to-point and eliminating server bottlenecks, you get massive network savings and increase the speed at which you transfer data.
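A back-of-the-envelope sketch shows where the savings come from. The sizes below are hypothetical, chosen only to show the arithmetic. The model is deliberately simple: it counts bytes placed on the network and ignores compression, protocol overhead, and parallelism.

```python
def bytes_on_network(payload_bytes, hops):
    """Total bytes placed on the network when a payload crosses `hops` links."""
    return payload_bytes * hops


raw_bytes = 10_000_000_000   # hypothetical 10 GB source table
result_bytes = 50_000_000    # hypothetical 50 MB result after local processing

# Through a central server: source -> server -> target, two network crossings.
via_server = bytes_on_network(raw_bytes, hops=2)

# Point-to-point with raw data: source -> target, one crossing.
direct_raw = bytes_on_network(raw_bytes, hops=1)

# Point-to-point after processing in place: only the result set moves.
direct_result = bytes_on_network(result_bytes, hops=1)

print(via_server, direct_raw, direct_result)
```

Removing the intermediate server halves the traffic for a full copy. Processing in the native environment first reduces it further, by whatever factor the result is smaller than the source.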

4. Manage all of the business rules and data logic centrally.

Who has the time to chase their data in different directions? Operating dispersed integration applications creates chaos in today’s changing data ecosystem. New data and modern data platforms must be managed centrally with all business rules and data logic in a single design studio. You can only do this in an architecture where a central design studio is completely separate from local processing agents using native functions. Managing all of your business rules and data logic centrally gives you complete transparency, accessible lineage, and maximum reuse. You’ll never have to waste your time redesigning flows.

5. Make changes using existing rules and logic.

Don’t waste your time doing the same job over again. With all management handled centrally, you are also able to keep all the existing business rules and data logic templates in the metadata repository. As a result, when changes need to be made for new data, new platforms, or even migrations from one platform to another, you are able to make those changes quickly using the existing rules and logic.
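Principles 4 and 5 can be sketched together: rules are defined once in a central repository and rendered into each platform's native syntax on demand. This is a sketch under stated assumptions, not a description of any product. The rule names, the `platform_a` and `platform_b` targets, and their expression templates are all hypothetical.

```python
from string import Template

# Central repository: each business rule is defined once, in platform-neutral form.
RULES = {
    "full_name": {"op": "concat", "args": ["first_name", "last_name"]},
    "net_price": {"op": "subtract", "args": ["price", "discount"]},
}

# Per-platform templates for each operation (hypothetical platforms and syntax).
PLATFORM_TEMPLATES = {
    "platform_a": {
        "concat": Template("$a || ' ' || $b"),
        "subtract": Template("$a - $b"),
    },
    "platform_b": {
        "concat": Template("CONCAT($a, ' ', $b)"),
        "subtract": Template("$a - $b"),
    },
}


def render(rule_name, platform):
    """Render a centrally stored rule as an expression for one target platform."""
    rule = RULES[rule_name]
    a, b = rule["args"]
    return PLATFORM_TEMPLATES[platform][rule["op"]].substitute(a=a, b=b)


# Migrating from platform_a to platform_b reuses the same rule definitions.
for platform in ("platform_a", "platform_b"):
    print(platform, "->", render("full_name", platform))
```

Supporting a new platform in this sketch means adding one entry to `PLATFORM_TEMPLATES`. The rules themselves are untouched, which is the reuse the principle describes.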


It’s time for a shift in paradigm to modern data integration so that we can spend more energy gaining insight from our data rather than fumbling around with it.