3rd Principle of Modern Data Integration
Posted By Jonathan Wu
04 May 2016
The first principle of modern data integration is to “take the processing to where the data lives.” The objective of the 1st principle is to use host systems for specific processing in order to create efficiency by preparing and moving only the data that is needed. The second principle is to “fully leverage all platforms based on what they were designed to do well.” The 2nd principle was defined to create an optimal balance of processing and workload by utilizing the source and target platforms for the capabilities that they were designed to provide. The third principle is to “move data point-to-point to avoid single server bottlenecks.” The objective of the 3rd principle is to move data in the fastest and most efficient manner possible.
The best way to illustrate the first three principles of modern data integration is to compare and contrast them with the architecture of traditional data integration technologies, commonly referred to as Extraction, Transformation and Load (“ETL”). In an ETL architecture, there is a two-step process of moving data from source systems to a data integration server, and then moving the data from the integration server to the target system. The data integration server performs the processing, transformation, and integration of the data. In this two-step data movement process, the data crosses the network twice, and processing is limited by and dependent upon the resources of the data integration server. The first two principles of modern data integration revamp the ETL architecture by eliminating the middle component, the data integration server, and placing the processing capabilities at the source and target systems. The 3rd principle of modern data integration directs the flow of data straight from source to target systems, which is a one-step data movement process.
Diyotta’s first customer entertained the idea of moving and processing 5 terabytes of data daily from its operational systems to a massively parallel processing (MPP) platform using a traditional data integration technology. Given the ETL architecture of that technology, the data processing would not take advantage of MPP. They also realized that a traditional approach to data integration would create a bottleneck in the data flow and hinder the availability of the data for downstream applications. After they implemented Diyotta’s modern data integration technology, they were able to move and process 5 terabytes of data daily in a matter of a couple of hours, versus the tens of hours predicted for a traditional ETL approach.
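To put rough numbers on the two-step versus one-step comparison, here is a back-of-the-envelope sketch in Python. It assumes the full daily payload crosses the network on every hop, which is a simplification, and the function name `network_volume_tb` is made up for this illustration. It counts network volume only and says nothing about elapsed time.

```python
def network_volume_tb(payload_tb: float, hops: int) -> float:
    """Total data crossing the network when the payload is moved `hops` times.

    Assumption: the whole payload is sent on each hop, with no filtering
    or compression along the way.
    """
    return payload_tb * hops


daily_payload_tb = 5.0  # daily volume from the customer example

# Traditional ETL: source -> integration server -> target (two hops)
etl_tb = network_volume_tb(daily_payload_tb, hops=2)

# Point-to-point: source -> target (one hop)
p2p_tb = network_volume_tb(daily_payload_tb, hops=1)

print(f"ETL: {etl_tb} TB, point-to-point: {p2p_tb} TB")
```

Under these assumptions the integration server doubles the traffic before any processing cost is counted.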
In today’s information environment, data is moving in all directions, and it is vital to establish processes that move data point-to-point rather than through an integration server. This approach gives you the ability to move data at the right time. There are times when you want to move all of your data; most of the time, it makes more sense to process it in its native environment and then move only the result set. By moving data point-to-point and eliminating integration server bottlenecks, you reduce network traffic substantially and greatly increase the speed at which data is transferred.
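The idea of processing data in its native environment and then moving only the result set can be sketched with two in-memory SQLite databases standing in for a source and a target system. The table names, columns, and sample rows are hypothetical and chosen only for illustration; this is not how any particular product implements the principle.

```python
import sqlite3

# Two independent in-memory databases stand in for source and target systems.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

# Hypothetical source table with a handful of sample rows.
source.execute("CREATE TABLE orders (region TEXT, amount REAL)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 10.0), ("east", 15.0), ("west", 7.5), ("west", 2.5), ("west", 5.0)],
)

target.execute("CREATE TABLE region_totals (region TEXT, total REAL)")

# Process the data where it lives: the source engine does the aggregation.
result_set = source.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()

# Move only the result set, directly from source to target.
target.executemany("INSERT INTO region_totals VALUES (?, ?)", result_set)
target.commit()

rows_at_source = source.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
rows_moved = len(result_set)
print(f"rows at source: {rows_at_source}, rows moved: {rows_moved}")
```

Here five source rows reduce to two rows on the wire. With real volumes, the gap between the raw data and the result set is where the network savings would come from.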
The remaining two principles of modern data integration continue to build upon the first three principles to form an efficient and highly effective architecture to address modern and big data.