The most recent blog in this series concentrated on data quality and the various ways that bad data can enter and pollute your big data. In this blog, I want to review other aspects of data governance around big data and finish off by taking a look at how you might get started with big data governance.
For me, that final point is key. Information governance can feel like a daunting prospect, with organisations unsure of where to start and put off by the sheer scale of the problem. This situation only gets worse with the increasing number of big data projects that are tapping into new data sources where volumes, variety and velocity dwarf those of traditional sources. Add in the internet of things and it’s clear we are likely to be drowning in a sea of data if we don’t quickly get a handle on an information governance strategy.
In addition to data quality, there are two aspects that will need to be managed:
- Lifecycle management of the data
- Privacy concerns related to what is being stored and permission rights
All Data Has Value, But Some Data Has More Value Than Others
At first, lifecycle management might seem like a low priority. After all, why worry about archiving information out of the big data system when it’s all about the volume and analysing as much information as possible? The reality is that lots of irrelevant data just creates noise that makes it harder for business people to see the key trends and insights in their data. The concept of business-relevant data is a key pillar of information governance.
Research has shown that business users think that 69% of enterprise data has no value, so it makes a lot of sense to set up business rules around data relevancy before you go live with your big data project. If using an in-memory database, you should be particularly ruthless here because although memory is cheaper than it’s ever been, there is still a cost to storing unnecessary data.
As well as filtering out irrelevant data, you should also look to establish a retention policy. You will need to identify if any of your big data has any regulatory or legal retention conditions and ensure you regularly check for compliance. Plan to create a process and set rules around when you want data to be moved out of the in-memory system into an archive. Some big data bloggers have advocated classifying data as hot, warm or cold in order to help drive the retention policy.
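As a rough illustration of the hot, warm and cold idea, here is a minimal Python sketch. The thresholds (30 and 365 days), the use of last-access date as the measure, and the function names are all assumptions made for the example; your own retention policy and any regulatory conditions would set the real rules.

```python
from datetime import date

# Hypothetical thresholds; real values would come from your retention policy.
HOT_DAYS = 30
WARM_DAYS = 365


def classify_temperature(last_accessed: date, today: date) -> str:
    """Return 'hot', 'warm' or 'cold' based on days since last access."""
    age_days = (today - last_accessed).days
    if age_days <= HOT_DAYS:
        return "hot"
    if age_days <= WARM_DAYS:
        return "warm"
    return "cold"


def should_archive(last_accessed: date, today: date) -> bool:
    """In this sketch, only cold data is a candidate to leave the in-memory system."""
    return classify_temperature(last_accessed, today) == "cold"
```

A scheduled job could run a check like `should_archive` over each data set and hand the cold ones to whatever archive process you have defined.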
The second aspect of data governance which needs addressing is data privacy. This is probably an aspect that you didn’t worry about too much when your big data project was in the pilot stage. Once you start to think about go-live, you will need to identify what sensitive data forms part of your big data and work out how to protect it from users who are not entitled to access it. You will also need to invest in some kind of audit capability so that you can see who is accessing the sensitive information. If this control and audit process seems counter to the spirit of the big data project, you should consider anonymising the data so that trends and data relationships can still be analysed without compromising data privacy rules.
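One simple way to approach this is to replace sensitive values with a keyed hash, so the same customer always maps to the same token and relationships in the data survive. The sketch below assumes hypothetical field names (`name`, `email`) and a placeholder key. Strictly speaking this is pseudonymisation rather than full anonymisation, so treat it as a starting point and not a compliance guarantee.

```python
import hashlib
import hmac

# Placeholder key for illustration; in practice it would come from a managed secret store.
SECRET_KEY = b"replace-with-a-managed-secret"

# Hypothetical list of fields treated as sensitive.
SENSITIVE_FIELDS = {"name", "email"}


def pseudonymise(record: dict, key: bytes = SECRET_KEY) -> dict:
    """Return a copy of the record with sensitive fields replaced by keyed-hash tokens."""
    out = {}
    for field, value in record.items():
        if field in SENSITIVE_FIELDS:
            digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
            out[field] = digest[:16]  # shortened token; the same input gives the same token
        else:
            out[field] = value
    return out
```

Because the token is stable for a given input and key, analysts can still group and join on it without seeing the underlying value.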
3 Steps to Kick Start Your Big Data Governance
- The first step of any information governance project is understanding the current state of your data. All of our customers have undertaken a data assessment, which can be used to understand both the quality of your big data sources and the big data itself. An assessment will give you a fast view of how much of your data is business-relevant, how it ranks against various data quality dimensions and, perhaps more importantly, whether it actually supports the requirements of your big data project.
- The second step is all about prioritisation. Inevitably, a data assessment is going to bring out a large number of issues and it can often be difficult to figure out where to start. My advice here is simple: follow the money. By that I mean, look at the data objects that will have the greatest financial impact on the business and fix those. For example, if you are looking to analyse supplier performance and spend to help optimise that part of the supply chain, look to identify duplicates across all your sources so your analytics consolidate correctly. The stark reality is that these areas of greatest financial impact are going to attract sponsorship, funding and resources for your big data governance project more easily than others. If you work in the public sector then profit is unlikely to be a driver of funding, so here I would look for data areas which could achieve significant cost savings or would improve customer service levels. Fraud detection is a popular big data use case in the public sector and therefore governing ‘customer’ big data would be a higher priority in this situation.
- The third step involves turning information governance into a robust and repeatable business process. Ownership will need to be assigned to big data stewards. They will be responsible for fixing the data quality issues and, most importantly, for being part of the ongoing process of monitoring quality. For this step you will need an information governance platform that enables you to manage the data stewards and automate the process of monitoring and alerting them to big data issues that need remediation.
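To make the supplier duplicate example from the second step more concrete, here is a minimal Python sketch. It assumes each record is a hypothetical `(source, supplier_name)` pair and matches on a crudely normalised name; the list of legal suffixes is illustrative only, and real matching would usually need fuzzier rules and more attributes than the name.

```python
import re
from collections import defaultdict

# Illustrative list of company suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"ltd", "limited", "inc", "llc", "plc", "gmbh"}


def normalise_name(name: str) -> str:
    """Lower-case the name, strip punctuation and drop common legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def find_duplicates(records):
    """Group (source, name) pairs by normalised name and keep groups with more than one entry."""
    groups = defaultdict(list)
    for source, name in records:
        groups[normalise_name(name)].append((source, name))
    return {key: members for key, members in groups.items() if len(members) > 1}
```

For instance, "Acme Ltd." from one system and "ACME Limited" from another would fall into the same group, which a data steward could then review and consolidate.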
So do you feel confident in your big data governance endeavours? Are you ready for big data governance? I think it’s time for you to take that first step.