Understanding Data Governance: Tools, Documents, Processes, and People
Data governance is a common need across organizations and can be a very challenging subject to tackle. Understanding data governance’s components, what good governance looks like, and the drivers behind adopting it is essential to creating a successful governance effort.
What Is Governance?
Governance (more specifically, data governance) is a discipline for managing all aspects of organizational information. That’s a very broad definition – and governance is a very broad topic. It is also, however, a key part of any data management strategy, something that is top of mind for CIOs and CEOs according to KPMG and Pitney Bowes:
Source: pitneybowes.com
Some particular areas of focus within governance include:
- Data quality
- Data usability
- Data integrity
- Data security
- Data preservation
Data Quality: Is the data accurate and correct? Problems can arise throughout your data’s lifecycle. It might be entered incorrectly, or corrupted by a transformation process, or be out of date due to changes that were not accurately reflected.
Data Usability: How easy is it to use the organization’s data? This is a critical challenge, and a lot of governance solutions revolve around it. Users need to be able to discover data assets, understand what they contain and how they work, and understand the tools and techniques required to use them.
Data Integrity: Is the data consistent with other data? This is closely related to data quality, but data may be correct on its own and still not consistent with respect to other data. This may be due to data quality issues directly, varying refresh times for different sources, differing definitions of terms (does “annual total” mean calendar or fiscal year?), or other reasons.
Data Security: This is the most widely adopted facet of data governance (although most organizations do not consider it in that light). Nearly every organization considers who has access to their data; however, these decisions are not always revisited on a regular basis.
Data Preservation: When and how is data preserved, archived, and deleted?
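The data quality and data integrity questions above can be made concrete with a few simple checks. The following is only a minimal sketch under stated assumptions: the record layout, the two source systems (a CRM and a billing system), the two-year staleness threshold, and the function names are all hypothetical and chosen for illustration, not taken from any particular governance tool.

```python
from datetime import date

# Hypothetical customer records from two source systems (illustrative data only).
crm_customers = [
    {"customer_id": "C001", "email": "ann@example.com", "updated": date(2024, 1, 15)},
    {"customer_id": "C002", "email": "", "updated": date(2021, 6, 1)},
]
billing_customer_ids = {"C001", "C003"}


def quality_issues(record, as_of, max_age_days=730):
    """Data quality: list the problems found in a single record.

    The 730-day staleness threshold is an assumed default, not a standard.
    """
    issues = []
    if not record["email"]:
        issues.append("missing email")
    if (as_of - record["updated"]).days > max_age_days:
        issues.append("stale record")
    return issues


def integrity_issues(crm_records, billing_ids):
    """Data integrity: customer IDs that appear in only one of the two systems."""
    crm_ids = {record["customer_id"] for record in crm_records}
    return sorted(crm_ids ^ billing_ids)
```

Calling `quality_issues` on each CRM record flags the second one as both incomplete and stale, while `integrity_issues(crm_customers, billing_customer_ids)` reports the IDs that the two systems disagree about. In practice the rules themselves would come out of the governance process rather than being hard-coded.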
Why Do You Need Governance?
As organizations use data (and analytics) more, and for more important questions, the need to govern those assets increases. Every organization should be concerned about data quality in their source systems, but often these concerns are isolated and not visible across departments. In fact, these issues are commonly known only to the business report writers who create manual workarounds for them in Excel, Access, or similar systems. Sometimes these solutions are embedded in spreadsheets and forgotten by current employees.
A more distributed organization with multiple source systems covering the same subject areas has a greater need for data governance. Different parts of the organization might have different methods for storing the same entity (for example, different ways of tracking customer IDs, or a different set of attributes for product descriptions and hierarchies). Each department or unit functions on its own, but any sort of consolidation across them runs into consistency issues. Even if some or all parts of the organization are managing their data, they may do so differently and those differences may or may not be understood. For example, imagine one group that deletes data after 2 years and one that retains data for 5 years; a historical report may look very skewed to someone not aware of this.
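The retention example above can be illustrated with a small calculation. This is a hedged sketch using made-up order data: the department names, the order years, and both helper functions are assumptions introduced purely to show how differing retention windows skew a consolidated historical report.

```python
from collections import Counter

# Hypothetical order years held by two departments as of 2024.
# Department A deletes data after 2 years; department B retains 5 years.
dept_a_orders = [2023, 2023, 2024, 2024]
dept_b_orders = [2020, 2021, 2022, 2023, 2024]


def orders_per_year(*sources):
    """Naively consolidate order counts per year across all sources."""
    counts = Counter()
    for source in sources:
        counts.update(source)
    return dict(sorted(counts.items()))


def years_with_full_coverage(current_year, retention_years_by_source):
    """Years for which every source still holds data, given each retention window."""
    shortest = min(retention_years_by_source.values())
    return list(range(current_year - shortest + 1, current_year + 1))
```

Here `orders_per_year(dept_a_orders, dept_b_orders)` shows one order per year for 2020 through 2022 and three per year afterwards, which looks like sudden growth but only reflects department A's deleted history. `years_with_full_coverage(2024, {"A": 2, "B": 5})` identifies 2023 and 2024 as the only years that can be compared fairly.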
A data warehouse is often a trigger for surfacing data quality and integrity issues, as it brings data into a new analytic context and attempts to integrate data that may have lived in independent source systems. For many organizations this leads to the first conscious attempts at understanding data governance as they try to reconcile competing data definitions, integrate disparate data sources, and correct data quality issues that are affecting analytic reports.
Today, many organizations are formalizing a bimodal analytics architecture that enables data discovery tools like Tableau to work alongside traditional warehouse analytics. In these environments, data is being accessed in carefully prepared data marts as well as more homegrown approaches where elements may be combined, transformed, and integrated on the fly. This is not new – users have been doing this in Excel for years. What is new is acknowledging it as part of an organization’s data architecture. In this environment, managing the data assets is even more important. Data discovery users need to understand what sources are available to them and how they should connect.
What Does Governance Look Like?
As a discipline, data governance covers a broad range of processes, tools, and documents. One useful way of visualizing the many elements under governance’s umbrella is Informatica’s 10 facets of governance diagram (shown below). As you can see, there is a lot you need to account for.
Source: informatica.com
Some of the more typical examples of these are listed below. These are not the only ones, and they may not be the right ones for a particular organization. It’s important to understand company culture and needs when implementing any aspect of a governance solution.
Data Dictionary: A listing of data elements along with attributes such as description, owner, security, retention policy, lineage, etc. This is what many people think of when they think governance. The important consideration is the “entry point” that this type of document creates; people who are new to the organization, or looking at a new data source, need a way to understand what is out there and what they can use. Defining data elements provides a common ground for different parts of the organization to ensure consistency across analytics.
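As a rough illustration of the attributes listed above, a data dictionary entry could be modeled as a small record type. This is a minimal sketch, not a reference to any specific product: the `DataElement` and `DataDictionary` classes, their field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DataElement:
    """One data dictionary entry, using the attributes named in the text."""
    name: str
    description: str
    owner: str
    security: str
    retention_policy: str
    lineage: List[str] = field(default_factory=list)  # upstream sources, in order


class DataDictionary:
    """A simple in-memory lookup that serves as the entry point for new users."""

    def __init__(self) -> None:
        self._elements: Dict[str, DataElement] = {}

    def add(self, element: DataElement) -> None:
        self._elements[element.name] = element

    def lookup(self, name: str) -> Optional[DataElement]:
        return self._elements.get(name)

    def owned_by(self, owner: str) -> List[str]:
        """Names of all elements assigned to the given owner, sorted."""
        return sorted(e.name for e in self._elements.values() if e.owner == owner)


# Illustrative entry; every value here is made up.
dictionary = DataDictionary()
dictionary.add(DataElement(
    name="annual_total",
    description="Total revenue for the fiscal year (not the calendar year).",
    owner="Finance",
    security="internal",
    retention_policy="retain 5 years",
    lineage=["ERP general ledger", "data warehouse revenue mart"],
))
```

Recording the definition explicitly, as in the `annual_total` description, is what gives different parts of the organization the common ground described above. A real dictionary would usually live in a shared catalog or database rather than in memory.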
Stewardship Process: Populating a data dictionary is not a simple task. Even just listing all data elements and understanding where they came from can be daunting, and that pales in comparison to the disputes that can arise over differing uses of common business terms. This work is not a one-time task; over time, new data elements will arise, definitions will change, and business needs will evolve. A stewardship process provides the framework for ensuring that these issues are resolved in an ongoing and consistent manner.
Governance Council: Whatever you name it, having a group of people responsible for overseeing the governance process is critical to long-term success. Just as data dictionaries help users understand data discovery efforts, a governance council serves as a known entry point into resolving data governance issues. The specifics of the group will vary by organization, and there is often more than one group created to handle specific types of governance processes.
InfoSphere Information Governance Catalog: This is an IBM tool designed to assist with governance tasks (data quality, business glossaries, data lineage, etc.). Having a dedicated tool helps a great deal with ensuring that the right information and workflows are supported. Regardless of the specific tool, however, it is important to consider the mechanics of storing the output of governance processes.
As mentioned earlier, there are many, many more documents, processes, and tools within data governance, but this is a good starting point.
You Have to Maintain Governance
There is a lot of detailed information available on data governance, and a lot of areas to consider. When starting a governance program, it’s very easy to take a more theoretical approach and create a set of processes and outputs which place a high burden on the organization. This makes it less likely that the governance program will succeed. There may be results, but they will fade over time. Documents will become obsolete as they are not updated with changes. User adoption will decrease. Alternate resolution methods (often isolated and inconsistent) will emerge.
Instead, it is important to carefully consider what is important to the organization – where does governance provide the most business value? Create a program that provides that value as simply as possible. As that program matures, the organization will better perceive the types of value that data governance provides and identify areas of expansion. As with most analytics efforts, starting small and thinking incrementally is the path to success.
Are You Ready for Governance?
Data governance involves all parts of the organization that touch data: those who enter it, administer it, consume it, store it, and analyze it. For a governance program to succeed, stakeholders from all those areas should be involved and convinced of the value that data governance can provide. Governance should not just be driven by IT and pushed out to the rest of the organization. In fact, according to TDWI, business involvement is essential, as the business side’s role in data and analytics has only increased over the years.
Source: tdwi.org
For this reason, organizations often find that creation and adoption of an enterprise analytics strategy is a precursor for strong governance efforts. A data warehouse often reveals data issues (quality, integration, definition, security, retention, etc.) that require conversation across a wider organizational audience to resolve. Groups that may have operated independently (for example, those maintaining an ERP system) should now be drawn into a larger architecture as their data is put to new uses.
If an organization has a data warehouse, it becomes much easier to assess its readiness for governance. Have data issues been identified, and how seriously are they considered by the various parts of the organization? Is the business concerned with helping IT find a solution, or do they consider it “not their problem”? At the same time, it’s also possible to gain insight into the degree of governance challenges that exist and thus estimate the business value that different aspects of data governance can provide.
If an organization does not have enterprise analytics but is interested in governance, it’s important to understand what is driving this interest. Is the organization particularly decentralized? Is data quality particularly challenging? Usually, behind these challenges will be a business model that places greater than usual emphasis on data assets. Making sure all parts of the organization understand this connection is critical to implementing governance.
The Discipline of Governance
Data governance combines tools, documents, processes, and people to maintain and improve an organization’s data assets. While every organization has governance needs, not every organization needs (or is ready for) a dedicated governance program. Understanding business value and organizational readiness is essential to crafting a successful data governance program.
About Ironside
Ironside was founded in 1999 as an enterprise data and analytics solution provider and system integrator. Our clients hire us to acquire, enrich and measure their data so they can make smarter, better decisions about their business. No matter your industry or specific business challenges, Ironside has the experience, perspective and agility to help transform your analytic environment.