Data Integration – Where do we go from here?
Posted By claudia
08 March 2016
- Creation of highly portable data integration processes. Packaging data integration processes so they are portable insulates them from technology changes and from new or evolving data sources and targets. In effect, it means building self-contained “containers” of data integration logic. You may start with a relational DBMS and later move your data and integration processing elsewhere, say to Hadoop, without reworking the processes you have already built. Wherever you choose to build your BI or analytics environment, your data integration should be able to adapt to the technology changes that will inevitably come, and that resilience should be built into the data integration architecture from the start.
- Support for multiple deployment options. Organizations are deploying their analytic assets on-premises and in public and private clouds, in traditional databases and on Hadoop clusters. Data must therefore be integrated in the cloud as well as on-premises, and the integration itself may need to run within databases, applications, or middleware. Finally, data integration must support several delivery modes: batch, request/response through APIs, and real time or micro-batch. Real time means the data integration platform must be able to capture high volumes of small, rapidly arriving messages from streaming data.
- Support for both the traditional data pull (extract, then integrate) and the newer push process. In the push model, the source sends data to the back-end data integration service whenever something changes or on a timed schedule. This differs sharply from the traditional mechanism of pulling, or extracting, data on demand, and for some data sets it is far more responsive and timely. In this model, ETL stands for event-transform-load, and data is delivered continuously to the transformation and loading processes.
- Distributed vs. centralized. As businesses expand globally (e.g., Uber, Netflix, Airbnb) and geographic boundaries matter less, the data landscape necessarily becomes more diverse and distributed. This is most visible as cloud, mobile, and IoT initiatives gain momentum across enterprises. Even if data volume is not an issue in your organization and you have no IoT initiative, it is unrealistic in today’s analytics architectures to require that all data end up in a single, physically centralized data store; there are many reasons why data may remain distributed. We need modern architectures that can manage distributed as well as centralized data processing and storage.
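The difference between the traditional pull and the newer push (event-transform-load) model in the list above can be illustrated with a minimal Python sketch. The `Source` and `Warehouse` classes, the `transform` step, and the record fields are hypothetical stand-ins, not any product's API; a real deployment would more likely push changes through a message queue or a change-data-capture feed than through an in-process callback.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical record shapes, for illustration only.
Row = Dict[str, int]        # source row, e.g. {"id": 1, "amount_cents": 1250}
Record = Dict[str, float]   # transformed record, e.g. {"id": 1, "amount": 12.5}


def transform(row: Row) -> Record:
    """Shared transform step used by both models: cents -> currency units."""
    return {"id": row["id"], "amount": row["amount_cents"] / 100}


@dataclass
class Warehouse:
    """Hypothetical target store (could be a DBMS table, a Hadoop table, ...)."""
    rows: Dict[int, Record] = field(default_factory=dict)

    def load(self, record: Record) -> None:
        self.rows[int(record["id"])] = record  # upsert by key


@dataclass
class Source:
    """Hypothetical operational system that can be polled or can push changes."""
    rows: List[Row] = field(default_factory=list)
    subscribers: List[Callable[[Row], None]] = field(default_factory=list)

    def extract_all(self) -> List[Row]:
        # Pull model: the integration job requests the data when it runs.
        return list(self.rows)

    def subscribe(self, handler: Callable[[Row], None]) -> None:
        self.subscribers.append(handler)

    def write(self, row: Row) -> None:
        # A change in the source; push subscribers receive it immediately.
        self.rows.append(row)
        for handler in self.subscribers:
            handler(row)


def pull_etl(source: Source, target: Warehouse) -> None:
    """Extract-transform-load, run on demand or on a schedule."""
    for row in source.extract_all():
        target.load(transform(row))


def push_etl(source: Source, target: Warehouse) -> None:
    """Event-transform-load: each change is transformed and loaded as it arrives."""
    source.subscribe(lambda row: target.load(transform(row)))


src = Source()
pulled, pushed = Warehouse(), Warehouse()
push_etl(src, pushed)
src.write({"id": 1, "amount_cents": 1250})
src.write({"id": 2, "amount_cents": 99})
print(len(pushed.rows), len(pulled.rows))  # 2 0: the pull target is empty until its job runs
pull_etl(src, pulled)
print(pulled.rows == pushed.rows)          # True
```

Keeping `transform` shared between both paths reflects the idea that the two models differ in what triggers the work, not in the transformation logic itself.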
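For the real-time and micro-batch delivery modes mentioned in the list above, one common approach is to buffer small streaming messages and hand them to the integration step in groups. The Python sketch below assumes a size limit and a time limit; `MicroBatcher`, `max_size`, and `max_wait` are illustrative names, and the time limit is checked only when a new message arrives (a production service would typically also flush on a timer).

```python
from typing import Callable, List, Optional


class MicroBatcher:
    """Hypothetical helper that groups small streaming messages into micro-batches.

    A batch is flushed when it reaches max_size messages, or when a message
    arrives max_wait seconds or more after the current batch was started.
    """

    def __init__(self, max_size: int, max_wait: float,
                 sink: Callable[[List[str]], None]) -> None:
        self.max_size = max_size
        self.max_wait = max_wait
        self.sink = sink                      # downstream transform/load step
        self.buffer: List[str] = []
        self.started_at: Optional[float] = None

    def add(self, message: str, now: float) -> None:
        # Time limit: close the current batch if it has been open too long.
        # (Checked on arrival only, to keep the sketch deterministic.)
        if self.started_at is not None and now - self.started_at >= self.max_wait:
            self.flush()
        if not self.buffer:
            self.started_at = now
        self.buffer.append(message)
        # Size limit: close the batch as soon as it is full.
        if len(self.buffer) >= self.max_size:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.sink(self.buffer)
        self.buffer = []                      # new list, so the sink keeps its batch intact
        self.started_at = None


batches: List[List[str]] = []
batcher = MicroBatcher(max_size=3, max_wait=1.0, sink=batches.append)
for i, t in enumerate([0.0, 0.1, 0.2, 0.3, 1.5]):
    batcher.add(f"event-{i}", now=t)
batcher.flush()
print(batches)  # [['event-0', 'event-1', 'event-2'], ['event-3'], ['event-4']]
```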