Monday, July 8, 2013

The Bookbinder, the Librarian & a Data Governance story

Over the last few years I have worked on several high-end Data Governance programmes and listened to several excellent (and even more awful) Data Governance presentations at conferences.  I have also had the honour of sitting on the panel of judges at the annual Data Governance best practice awards.

Two of the recurring themes, irrespective of industry sector, are:
1) How do you make the "business case" for Data Governance; and 
2) How do you best encourage "the business" to take real ownership responsibility for data?

I intend to cover the first point in more detail in a separate blog posting, but for now the key thing to mention is to keep the case "real". Make it relevant to real business problems that are being, or have been, encountered. Collect "horror stories" (note: please don't use the phrase "burning platforms" for horror stories if you ever intend to work in the Oil & Gas sector).  Armed with the horror stories, develop strategies that demonstrate how the proposed DG approach would have caught them (I like to use simple swim lane diagrams to illustrate these scenarios) and then develop interim transition organisation structures for the client to migrate to.

Moving to the "ownership" question, just how can you best encourage "the business" to take real ownership responsibility for data?  Firstly, I hate the term "the business" but I'm not going to get all prissy and go on about that.
The challenge many people face regarding business ownership of data in the context of a Data Governance strategy is that most business folks a) kind of assume it's IT who do this anyway; and b) have little frame of reference as to what's actually involved ... what does "owning the data" really mean?  To many it sounds like extra work!

Trying to come up with a meaningful story or analogy relating to data ownership has proven difficult.
Then a few weeks ago, whilst interviewing several CxOs during a DG strategy engagement at a big bank, I had a light bulb moment.  In the fabulously plush offices were bookcases holding some of the old ledgers from the bank's early days.

During our discussions we reminisced about the days when the accountants and bankers would, using their best copperplate handwriting, enter details into the ledgers in best double-entry book-keeping style.  They would also add details of debtors and creditors to separate ledgers.   Sometimes this would initially be done on fine vellum or parchment paper and then passed to the bookbinders, who would beautifully bind the pages together inside leather covers with fabulous seam stitching.  The bound ledgers would then be filed by the librarians, typically in date order but with additional customer index cards so that they could readily be accessed when required.

During our reminiscences, I said to the bank CxOs, "so it was the bookbinders who "owned" the data then, as they controlled where it was stored?"
Light chortles ensued, so I replied, "well, if not the bookbinders, wasn't it the librarians who owned it, as they were the people who controlled how it was indexed and archived to provide easy retrieval?"
No, no, they said.  It was the Chief Accountant, or the Head Teller, or the Account Manager who "owned" the data then, as they were the real interface with the customers.

Ahhhh, I smiled, so what's changed now?  Why have you passed "ownership" to the modern-day bookbinders and librarians, i.e. IT?

The "light bulb" revelation moment was priceless. At that point they got it.
 
Now I realise any analogy can be picked apart, & before IT folks get too defensive I know there's more that they do; however, the analogy worked for these guys.
 
From this point on in our discussion, the concept of business ownership of data was firmly accepted. Following the CxOs' endorsement of the DG programme, the organisation structures, roles and responsibilities are slotting in nicely.  A key enabler to this programme's success is getting the hearts & minds culture change message sorted and providing on-going mentoring to the data owners.
 
IT are fully bought in & still "own" the technical systems environments whilst playing a major part in data custodianship.

Sunday, January 13, 2013

Data Governance is about Hearts and Minds, not Technology


Unsurprisingly, the principal point of discussion at FIMA 2012 was information management and the rise of its importance within the finance sector. With regulatory pressure driving interest – hardly something the finance industry is unused to! – and the proposed legal entity identifier pushing all businesses towards a growing demand for detailed, even real-time, knowledge, information management was a topic that permeated almost every discussion at the three-day event in London.

Of course, increasing awareness and debate around this topic can surely only be good news for the industry. There is clear benefit in those in the finance sector now realising that failure to manage data effectively – and therefore to conform to legislative and regulatory requirements – can have catastrophic effects, resulting in imprisonment as well as businesses being shut down. After all, where other sectors, such as pharmaceuticals, have been deploying information management systems for some time, in the finance arena it is still a surprisingly new concept.

Therefore, the interesting workshops dedicated to information management at FIMA 2012 were very welcome and apt; however, they could have held more relevance through cross-industry comparisons. Had these presentations and workshops shown delegates examples of successful deployments of information management systems and processes in a comparable industry, then those who were slightly on the fence about the need for information management would have left with a solid understanding of how such a system can really benefit a business.

On a related note, the drive for organisations to hire a Chief Data Officer was also highlighted at FIMA 2012. It is becoming glaringly apparent that the role of a CIO (largely through their typical experience) is solely to manage IT systems and infrastructure, and information management therefore rarely goes beyond its protection and storage. In order to instead manage data appropriately as a corporate asset, organisations must hire or internally develop an individual to take responsibility for and ensure this data governance – a trend I would actively encourage in the near future.

Overall, FIMA 2012 stoked the coals of a rising Information management emphasis within the finance sector. It is apparent that the industry is thankfully now seeing such activity as a necessity. Whether this is through fear of legislative backlash or a drive to improve efficiency and visibility is largely immaterial, provided there is a recognition that a failure to store, manage and use data appropriately is likely to lead to regulatory or customer service-related horror stories being unveiled at FIMA 2013. 


Wednesday, November 14, 2012

Chief INFORMATION Officer - Not really!

The recent news that the Prudential has been fined £50,000 by the Information Commissioner's Office (ICO) after a "mix-up" over the administration of two customers’ accounts should send a further warning to CIOs and Compliance Officers that managing information as a real, critical business asset must be taken seriously.

Chris Bradley, IPL’s Chief Development Officer and Head of Business Consulting who has been evangelising the Information Management message across the world for many years said:
“Unfortunately, only a few companies are really serious about managing information as a vital corporate asset.  If the assets of cash, physical property or employees were treated as poorly as information, there would be a major scandal, but the mis-management (albeit unintentional) of information is fast becoming a critical impediment to business success.”

The Prudential mix-up led to tens of thousands of pounds, meant for an individual’s retirement fund, ending up in the wrong customer’s account.  Bradley further commented that:
“This is important because it is the first ICO penalty served which does not relate to loss of data, but rather puts the spotlight firmly on master data management in companies”

The original error was caused when the records of both customers, who share the same first name, surname and date of birth, were mistakenly merged in March 2007.

Stephen Eckersley, ICO head of enforcement, said, “In this case two customer files were consistently confused and the company failed to remedy the situation despite being alerted to the problem on more than one occasion before it was finally resolved”

IPL has been advising its clients on the vital importance of Data Governance and the critical role Master Data Management (MDM) plays in this.  IPL has successfully introduced its business-focused MDM approach into global organisations in the Finance, Oil & Gas and Pharmaceutical sectors.  Acknowledged Information Management thought leader and author Bradley further commented: “We’re delighted to help our clients truly see the value that effective, business-focused Master Data Management brings and how it is critical to achieving effective information governance.”

He further continued, “make no mistake, this is one of the most important considerations for CIO’s in just about any organisation that is subject to any degree of regulatory or compliance pressure”

His insight is echoed by Gartner, whose recent research stated: “By 2016, 20 percent of CIOs in regulated industries will lose their jobs for failing to implement the discipline of information governance successfully.” The same 2012 Gartner survey also supports IPL’s Information Management position, stating: “Through 2016, spending on governing information must increase to five times the current level to be successful”.

The good news is that some enlightened organisations are recognising the importance of managing data as an asset; however, several holders of the CIO role are not really taking responsibility for information, focusing instead upon the delivery of technology and applications.

The realisation that information must be managed as a corporate asset has given rise to the new phenomenon of the Chief Data Officer.  As recently as June 21, 2012, according to a survey by GoldenSource Corporation, over 60 percent of firms surveyed were actively working towards creating specialised data stewards, and eventually Chief Data Officers, for their enterprise.

So, a few years after the financial crisis, institutions are still struggling to get a 360-degree view of their data.  Considering the organisational, policy and behavioural context within which a data control framework operates is as important as the underlying technology services that enable it. Appointing data stewards and Chief Data Officers to drive governance across these firms will be crucial to success.

Bradley concludes that the Chief Data Officer, distinct from the Chief Information Officer, will be one of the top critical hires in 2013/14.

Wednesday, June 13, 2012

DFD's and ELH's ... Back to The Future?

I was recently asked by a client, and also at a conference, "we need to be able to model flows of data - how do we do that?" How to model data flows ..... hmmm, that'll be Data Flow Diagrams then.

These were one of the great features of methods like LSDM and SSADM and were pretty well supported by early generation CASE tools. I still remember with some fondness how useful they were, and we used them extensively in the 80's and 90's. With DFDs there was some difficulty in determining how to create diagrams of the appropriate level, but TBH this was really a question of practitioner experience.
[Figure: Simplified Level 1 DFD]

As the fundamental problem that was addressed by DFDs still exists, I'm not really sure why modelling tools no longer support them.
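For anyone who wants to experiment without a CASE tool to hand, here is a minimal sketch (in Python, purely illustrative; the node names and the single consistency rule are my own assumptions, not part of any method) of how the elements of a level 1 DFD – external entities, processes, data stores and the flows between them – can be captured and sanity-checked.

    # Illustrative sketch only: a level 1 DFD held as simple Python data,
    # with one basic consistency check (every flow must touch a process,
    # since a DFD allows no store-to-store or external-to-external flows).

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Node:
        name: str
        kind: str  # "external", "process" or "store"

    @dataclass(frozen=True)
    class Flow:
        source: Node
        target: Node
        data: str  # the data item carried by the flow

    # Hypothetical example nodes
    customer   = Node("Customer", "external")
    take_order = Node("1. Take Order", "process")
    orders     = Node("D1 Orders", "store")

    flows = [
        Flow(customer, take_order, "Purchase Order"),
        Flow(take_order, orders, "Order Details"),
    ]

    def check(flows):
        for f in flows:
            ok = "process" in (f.source.kind, f.target.kind)
            status = "OK" if ok else "INVALID"
            print(f"{status}: {f.data} ({f.source.name} -> {f.target.name})")

    check(flows)

Trivial as it is, even this level of rigour catches the classic beginner's mistake of drawing a flow directly between two data stores.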

I was also asked during a data modelling class, "we know we shouldn't create separate entities for each state that an entity could be in, but how do we model the change of entity state?"

Just in case you're not sure what this means, a simple example of state change would be how a Suspect becomes a Prospect, then a Customer, then a Gold Customer and maybe a Lapsed Customer. All of these are examples of the entity changing state over the course of its lifetime.
Here's a simple example for a Purchase Order:
[Figure: Purchase Order Simplified ELH]

So, for a bonus point, the approach to modelling the change of state of a data entity over time is Entity Life Histories and State Transition Diagrams.
[Figure: Simple State Transition Diagram]
These were also very popular in the 80's and 90's but again seemed to become unpopular for a while and were not widely supported by modelling or CASE tools. Fortunately their importance is now being recognised again, and I've increasingly seen these (or similar approaches) being picked up again recently.
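For readers who prefer something executable to a diagram, here is a minimal sketch (Python, illustrative only; the transition table, including the assumption that a lapsed customer may come back, is mine rather than anything prescribed by SSADM) of the Suspect to Prospect to Customer life history above expressed as a small state machine, with invalid state changes rejected.

    # Illustrative only: the customer life history as a simple state machine.
    # The ALLOWED table makes the permitted transitions explicit.

    ALLOWED = {
        "Suspect":         {"Prospect"},
        "Prospect":        {"Customer"},
        "Customer":        {"Gold Customer", "Lapsed Customer"},
        "Gold Customer":   {"Lapsed Customer"},
        "Lapsed Customer": {"Customer"},   # assumption: a lapsed customer may return
    }

    class Party:
        def __init__(self, name):
            self.name = name
            self.state = "Suspect"        # every party starts life as a Suspect
            self.history = [self.state]   # the entity's life history so far

        def transition(self, new_state):
            if new_state not in ALLOWED.get(self.state, set()):
                raise ValueError(f"{self.name}: {self.state} -> {new_state} is not allowed")
            self.state = new_state
            self.history.append(new_state)

    p = Party("Acme Ltd")
    p.transition("Prospect")
    p.transition("Customer")
    p.transition("Gold Customer")
    print(p.history)   # ['Suspect', 'Prospect', 'Customer', 'Gold Customer']

The point is the same one the ELH makes graphically: the valid states, and the valid order of those states, are part of the data model, not an afterthought.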

The problems that DFD's, ELH's and State Transition Diagrams address haven't gone away, so let's use the approaches that actually work! Maybe it's a case of back to the future?

Thursday, March 1, 2012

Data Governance: a vital component of IT Governance

Thursday 1st March ... I have just finished delivering a presentation on Data Governance at the Ovum London conference: http://governance-planning.ovumevents.com/
I'll add a link to the slides here shortly.

I'm sure that we all know that data is growing at a vast rate; however, there's an even bigger problem concerning uncontrolled growth that I have recently read about ...
 ... 12 grey rabbits were brought to Australia in the 19th century for sport.  Within a couple of years, in excess of 2 million per year were being shot and still the population wasn't dented.  A few years later the population was over 400 million.  So, even the data explosion highlighted by the 2011 IDC Digital Universe study hasn't yet reached these proportions.

IT Governance:
From several of the well-established frameworks (e.g. ITIL), the common key components of an IT Governance framework seem to be:

1) Strategic Alignment: 
Alignment of the business and IT strategies with regard to the definition, review and improvement of IT’s contribution to value.
2) Value Delivery:
Within their service cycle, IT services in their entirety bring a benefit in respect of the corporate strategy and generate added value for the enterprise.
3) Resource Management:
Efficient management of resources such as applications, information, infrastructure and people, as well as optimization of the investment.
4) Risk Management:
Identification and analysis of the risks in order to avoid unpleasant surprises and to gain a clear understanding of the company’s risk preference.
5) Performance Management:
Monitoring and control of all performances in terms of their orientation towards the corporate strategy.

Looking now at Data Governance, some of the key areas that need to be considered, certainly for folks more used to IT Governance, are:

1) There are usually 3 main drivers for Data Governance:
Pre-emptive: where an organisation is facing major change or threats; designed to ward off significant issues that could affect the success of the company.
Reactive: where efforts are designed to respond to current pains.
Pro-active: where governance efforts are designed to improve capabilities to resolve risk and data issues. This builds on reactive governance to create an ever-increasing body of validated rules, standards and tested processes.

2) Data Governance can be implemented in 3 ways, often these may overlap (Tactical, Operational, Strategic).

3) There is certainly no "one size fits all" approach to Data Governance. Organisations need a flexible approach to Data Governance that delivers maximum business value from their data asset.
Data Governance can drive massive benefit; however, to accomplish this there needs at least to be reuse of data, common models, consistent understanding, data quality, and shared master and reference data.
Organisationally, different parts of the business have different needs, and different criteria for their data.  A matrix approach is needed, as these different parts of the organisation and data types are driven from different directions.
However, no matter how federated the organisation may be, there will be some degree of central organisation required.  This is to drive Data Governance adoption, implement corporate repositories and establish corporate standards.
The IPL Business Consulting practice has a flexible DG framework that can be tailored to help.

4) Communication & stakeholder engagement is key.  No matter how brilliant the framework is, or how great your policies or DG council are, if you don't adequately engage and communicate with the stakeholders, the DG initiative will go nowhere.

5) Finally, all of this is only important if Information REALLY is a key corporate asset for your organisation ..... so ask yourself, is it?

So IT Governance vs. Data Governance?
In summary, Data Governance is a vital, frequently overlooked component of an overall IT Governance approach.  Remember the 5 common components of an IT Governance approach?  Well, let's apply these in a Data Governance context and we see ...

1) Strategic Alignment:
Alignment of the business information needs and the IT methods and processes for delivering information that is fit for purpose.
2) Value Delivery:
Delivering information to the requisite quality, time, completeness and accuracy levels and  optionally monetising the value of information.
3) Resource Management:
Ensuring people and technology resources are optimised so that the definition, ownership and delivery of information resources meet business needs.
4) Risk Management:
Information security, backup & retention and delivery are balanced against regulatory and accessibility needs as befits the company’s risk preference.
5) Performance Management:
Monitoring and control of Data Governance roles, responsibilities and workflows such that they meet the demands of the corporate strategy.

Wednesday, November 23, 2011

Data Virtualisation As An Approach To Data Integration

Many different approaches are now available for Data Integration, yet far and away the most popular approach currently remains Extract, Transform and Load (ETL).
However, the pace of business change and the requirement for agility demand that organisations support multiple styles of data integration.

Three leading options present themselves; let’s now describe the differences among the three major styles of integration.

1.        Physical Movement and Consolidation

Probably the most commonly used approach is physical data movement.  This is used when you need to replicate data from one database to another.  There are two major genres of physical data movement: Extract, Transform & Load (ETL) and Change Data Capture (CDC). 
ETL is typically run according to a schedule and is used for bulk data movement, usually in batch.  CDC is event-driven and delivers real-time incremental replication.  Example products in these areas are Informatica (ETL) and GoldenGate (CDC).
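To make the contrast concrete, here is a hedged sketch (plain Python with SQLite standing in for real source and target databases; no vendor product or API is implied) of a scheduled bulk ETL pass versus a CDC-style incremental pass that only picks up rows changed since the last run.

    # Illustrative sketch only: bulk ETL versus incremental, CDC-style movement.
    import sqlite3

    src = sqlite3.connect(":memory:")   # pretend operational source
    tgt = sqlite3.connect(":memory:")   # pretend warehouse target

    src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")
    src.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                    [(1, 100.0, "2013-07-01"), (2, 250.0, "2013-07-05")])
    tgt.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount_gbp REAL)")

    def etl_full_load():
        """Bulk ETL: extract everything, transform, reload the whole target."""
        rows = src.execute("SELECT id, amount FROM orders").fetchall()
        tgt.execute("DELETE FROM orders")                              # full refresh
        tgt.executemany("INSERT INTO orders VALUES (?, ?)",
                        [(i, round(a * 0.65, 2)) for i, a in rows])    # e.g. USD -> GBP

    def cdc_incremental(last_run):
        """CDC-style: move only the rows changed since the last run."""
        rows = src.execute("SELECT id, amount FROM orders WHERE updated_at > ?",
                           (last_run,)).fetchall()
        for i, a in rows:
            tgt.execute("INSERT OR REPLACE INTO orders VALUES (?, ?)",
                        (i, round(a * 0.65, 2)))

    etl_full_load()                   # the scheduled, batch-style pass
    cdc_incremental("2013-07-03")     # only order 2 is touched in this pass

Real CDC products read the database transaction log rather than a timestamp column, but the trade-off is the same: full, simple refreshes on a schedule versus lighter, event-driven increments.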


 2.        Message-based synchronisation & propagation

Whilst ETL and CDC are database-to-database integration approaches, the next approach, message-based synchronisation and data propagation, is used for application-to-application integration.  Once again there are two main genres, Enterprise Application Integration (EAI) and Enterprise Service Bus (ESB) approaches, but both of these are used primarily for the purpose of event-driven business process automation.  A leading product example in this area is the ESB from Tibco.

 3.        Abstraction / Virtual Consolidation (aka Federation)

Thirdly you have Data Virtualization (DV).  The key here is that the data source (usually a database), and the target or consuming application (usually a business application) are isolated from each other.  The information is delivered on-demand, to the Business Application when the user needs it.  The consuming business application can consume the data as though it were a database table, a star schema, an XML message or in many other forms.  The key point with a DV approach is that the form of the underlying source data is isolated from the consuming application.  The key rationale for Data Virtualization within an overall Data Integration strategy is to overcome complexity, increase agility and reduce cost.  A leading product example in this area is Composite Software.
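As a hedged sketch of the principle only (plain Python; the sources, field names and the VirtualCustomerView class are all hypothetical and imply nothing about how products such as Composite actually work), the consumer below asks one logical view for a customer record and neither knows nor cares that the attributes come from two differently shaped sources, fetched on demand rather than pre-copied.

    # Illustrative sketch of data virtualisation: one logical view over two
    # underlying sources, queried on demand and hidden from the consumer.

    CRM_DB = {101: {"name": "A N Other", "segment": "Gold"}}   # pretend relational store

    def balance_service(customer_id):
        """Pretend application service returning account balances."""
        return {"balance": 1234.56}

    class VirtualCustomerView:
        """Federates both sources into one logical record, on demand."""
        def get(self, customer_id):
            record = dict(CRM_DB.get(customer_id, {}))
            record.update(balance_service(customer_id))
            return record

    view = VirtualCustomerView()
    print(view.get(101))   # {'name': 'A N Other', 'segment': 'Gold', 'balance': 1234.56}

Swap either source for something else and the consuming code does not change; that isolation is where the agility and cost argument for DV comes from.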

ETL or DV?
The suitability of Data Integration approaches needs to be considered for each case.  Here are 6 key considerations to ponder:

1. Will the data be replicated in both the DW and the Operational System?

      Will data need to be updated in one or both locations?
      If data is physically in two locations, beware of regulatory & compliance issues associated with having additional copies of the data (e.g. SOX, HIPAA, Basel II, FDA, etc.)

2. Data Governance

      Is the data only to be managed in the originating Operational System?

      What is the certainty that a DW will be a reporting DW only (vs. an operational DW)?

3. Currency of the data, i.e. Does it need to be up to the minute?

      How up to date does the data in the DW need to be?
      Is there a need to see the operational data?

4. Time to solution i.e. how quickly is the solution required?

      Immediate requirement?
      Confirmed users & usage?

5. What is the life expectancy of source system(s)?
      Are any of the source systems likely to be retired?
      Will new systems be commissioned?
      Are new sources of data likely to be required?

6. Need for historical / summary / aggregate data
      How much historical data is required in the DW solution?
      How much aggregated / summary data is required in the DW solution?

 Leading analyst firms like Gartner are recommending that data virtualization be added to your integration tool kit, and that you should use the right style of data integration for the job for optimal results. 
Just like so many things in Information Management, there's more than one way to accomplish Data Integration; ETL is not the only way.  Data Virtualisation is well worth considering as a part of your overall strategy. 

Saturday, July 2, 2011

Big Data – Same Problems?

A recent (June 2011) IDC Digital Universe study found that the world's data is doubling every two years; this is growing faster than Moore's Law.  It reckoned that 1.8 zettabytes (1.8 trillion gigabytes) would be created and replicated in 2011, that enterprises will manage 50x more data, and that files will grow 75x in the next decade.
The “big data” phenomenon is driving transformational technological, scientific and economic changes, and "information taming" technologies are driving down the cost of creating, capturing, managing and storing information.

We’ve all seen how organisations have an insatiable desire for more data as they believe that this information will radically change their businesses.

They are right – but it’s only the effective exploitation of that data, turning it into really useful information and then into knowledge & applied decision making that will realise the true potential of this vast mountain of data.

Incidentally, do you have any idea how much data 1.8 zettabytes really is?  It’s about the same amount of data if every person in the world sent twenty tweets an hour for the next 1200 years!

Data by itself is useless; it has to be turned into useful information and then have effective business intelligence applied to realise its true potential.

The problem is that big data analytics push the limits of traditional data management.  Allied to this, the most complex big data problems start with huge volumes of highly volatile data in disparate stores.  Big data problems aren't just about volume, though; there's also the volatility of the data sources and their rate of change, the variety of the data formats, and the complexity of the individual data types themselves.  So is it always the most appropriate route to pull all this data into yet another location for its analysis? 

Unfortunately, though, many organisations are constrained by traditional data integration approaches that can slow the adoption of big data analytics. 

Approaches which can provide high performance data integration to overcome data complexity & data silos will be those which win through.  These need to integrate the major types of “big data” into the enterprise.  The typical “big data” sources include:
  • Key/value Data Stores such as Cassandra,
  • Columnar/tabular NoSQL Data Stores such as Hadoop & Hypertable,
  • Massively Parallel Processing Appliances such as Greenplum & Netezza,  and
  • XML Data Stores such as CouchDB & MarkLogic.
Fortunately approaches such as Data Federation / Data Virtualisation are stepping up to meet this challenge.

Finally, and of utmost importance, is managing the quality of the data.  What's the use of this vast resource if its quality and trustworthiness are questionable?  Thus, driving your data quality capability up the maturity levels is key (a simple flavour of the kind of automated check involved is sketched after the maturity model below).

Data Quality Maturity – 5 levels of maturity

Level 1 - Initial: Limited awareness within the enterprise of the importance of information quality.  Very few, if any, processes in place to measure the quality of information.  Data is often not trusted by business users.

Level 2 - Repeatable: The quality of a few data sources is measured in an ad hoc manner.  A number of different tools are used to measure quality.  The activity is driven by projects or departments.  Limited understanding of good versus bad quality.  Identified issues are not consistently managed.

Level 3 - Defined: Quality measures have been defined for some key data sources.  Specific tools adopted to measure quality, with some standards in place.  The processes for measuring quality are applied at consistent intervals.  Data issues are addressed where critical.

Level 4 - Managed: Data quality is measured for all key data sources on a regular basis.  Quality metrics information is published via dashboards etc.  Active management of data issues through the data ownership model ensures issues are often resolved.  Quality considerations are baked into the SDLC.

Level 5 - Optimised: The measurement of data quality is embedded in many business processes across the enterprise.  Data quality issues are addressed through the data ownership model.  Data quality issues are fed back to be fixed at source.
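To make the lower rungs of that ladder tangible, here is a minimal, illustrative sketch (Python; the records, rules and field names are invented for the example) of the sort of completeness and validity check a Level 2 organisation runs by hand and a Level 3 or 4 organisation schedules, publishes on a dashboard and acts upon.

    # Illustrative sketch only: simple completeness and validity checks whose
    # pass rates could feed a data quality dashboard.
    import re

    records = [
        {"customer_id": "C001", "email": "a.other@example.com", "dob": "1970-02-01"},
        {"customer_id": "C002", "email": "",                    "dob": "1985-13-40"},  # missing / invalid
    ]

    RULES = {
        "email_present": lambda r: bool(r["email"]),
        "dob_valid":     lambda r: bool(re.fullmatch(
            r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])", r["dob"])),
    }

    def profile(records):
        """Return the pass rate per rule, the metric a dashboard would publish."""
        return {name: sum(rule(r) for r in records) / len(records)
                for name, rule in RULES.items()}

    print(profile(records))   # {'email_present': 0.5, 'dob_valid': 0.5}

The higher maturity levels are less about the checks themselves and more about who owns the results: the same pass rates, routed through the data ownership model so that issues are fixed at source.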