Wednesday, November 23, 2011

Data Virtualisation As An Approach To Data Integration

Many different approaches are now available for Data Integration, yet far and away the most popular approach currently still remains as Extract Transform and Load (ETL).
However the pace of Business change and the requirement for agility demands that organizations support multiple styles of data integration.

Three leading options present themselves; let’s now describe the differences among the three major styles of integration.

1.        Physical Movement and Consolidation

Probably the most commonly used approach is physical data movement.  This is used when you need to replicate data from one database to another.  There are two major genres of physical data movement, Extract Transform & Load (ETL) and Change Data Capture (CDC). 
ETL is typically run according to a schedule and is used for bulk data movement, usually in in batch.  CDC is event driven and delivers real-time incremental replication.  Example products in these areas are Informatica (ETL) and GoldenGate (CDC).


 2.        Message based synchronization & propagation

Whilst ETL and CDC are Database to Database integration approaches, the next approach, message based syncronisation and data propogation is used for application to application integration.  Once again there are two main genres, Enterprise Application Integration (EAI) and Enterprise Service Bus (ESB) approaches, but both of these are used primarily for the purpose of event driven business process automation.  A leading product example in this area is the ESB from Tibco.

 3.        Abstraction / Virtual Consolidation (aka Federation)

Thirdly you have Data Virtualization (DV).  The key here is that the data source (usually a database), and the target or consuming application (usually a business application) are isolated from each other.  The information is delivered on-demand, to the Business Application when the user needs it.  The consuming business application can consume the data as though it were a database table, a star schema, an XML message or in many other forms.  The key point with a DV approach is that the form of the underlying source data is isolated from the consuming application.  The key rationale for Data Virtualization within an overall Data Integration strategy is to overcome complexity, increase agility and reduce cost.  A leading product example in this area is Composite Software.

ETL or DV?
The suitability of Data Integration approaches needs to be considered for each case.  Here are 6 key considerations to ponder:

1. Will the data be replicated in both the DW and the Operational System?

      Will data need to be updated in one or both locations?
      If data is physically in two locations beware of regulatory & compliance issues associated with having additional copies of the data (e.g. SoX, HIPPA, BASEL2, FDA etc)

2. Data Governance

      Is the data only to be managed in the originating Operational System?

      What is the certainty that a DW will be a reporting DW only
(vs Operational DW)?

3. Currency of the data, i.e. Does it need to be up to the minute?

      How up to date are the data requirements of the DW?
      Is there a need to see the operational data?

4. Time to solution i.e. how quickly is the solution required?

      Immediate requirement?
      Confirmed users & usage?

5. What is the life expectancy of source system(s)?
      Are any of the source systems likely to be retired?
      Will new systems be commissioned?
      Are new sources of data likely to be required?

6. Need for historical / summary / aggregate data
      How much historical data is required in the DW solution?
      How much aggregated / summary data is required in the DW solution?

 Leading analyst firms like Gartner are recommending that data virtualization be added to your integration tool kit, and that you should use the right style of data integration for the job for optimal results. 
 Just like so many things in Infromation MAnagement - there's more than way way to accomplish Data Integration; ETL is not the only way.  Data Virtualisation is well worth considering a a part of your overall strategy.