Although data quality exercises have a variety of business paybacks, they are often low on the radar until a particular business need (or failure!) arises.  That can be short-sighted: often enough they can be justified by a cost-benefit analysis of the existing data quality alone.  Business decision-makers who don’t want to allocate budget should be made properly aware of the ramifications of saying no – all too often, the case is simply not presented clearly enough in a business context.  And once undertaken, a data quality project should eventually transform into ongoing data quality processes, including audits and governance, which are far less costly than revisiting the same issues later when they resurface through different failures.

Data quality projects can emerge from many issues affecting the business.  General examples are problems with:

  • accuracy
  • completeness
  • timeliness
  • consistency

which can especially derail change in an organisation, whether new business development or new IT functionality.
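
To make those dimensions a little more concrete – this isn’t from the talk, just a minimal sketch assuming a hypothetical customer table with email, country and updated_at fields, every name in it invented for the example – each can be expressed as a measurable check:

    from datetime import datetime, timedelta, timezone
    import re

    # Hypothetical customer records; field names are illustrative only.
    customers = [
        {"email": "a@example.com", "country": "AU",
         "updated_at": datetime(2024, 1, 5, tzinfo=timezone.utc)},
        {"email": "", "country": "Oz",
         "updated_at": datetime(2019, 6, 1, tzinfo=timezone.utc)},
    ]

    def completeness(records, field):
        # Proportion of records with a non-empty value for the field.
        return sum(1 for r in records if r.get(field)) / len(records)

    def accuracy_email(records):
        # Proportion of records whose email at least matches a basic pattern.
        pattern = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
        return sum(1 for r in records if pattern.match(r.get("email", ""))) / len(records)

    def timeliness(records, max_age_days=365):
        # Proportion of records updated within the last max_age_days.
        cutoff = datetime.now(timezone.utc) - timedelta(days=max_age_days)
        return sum(1 for r in records if r["updated_at"] >= cutoff) / len(records)

    def consistency_country(records, valid_codes=frozenset({"AU", "NZ", "US", "GB"})):
        # Proportion of records using an agreed country-code list.
        return sum(1 for r in records if r.get("country") in valid_codes) / len(records)

    print("completeness(email):  ", completeness(customers, "email"))
    print("accuracy(email):      ", accuracy_email(customers))
    print("timeliness:           ", timeliness(customers))
    print("consistency(country): ", consistency_country(customers))

None of these checks is sophisticated, but even scores this crude give the business something concrete to weigh in a cost-benefit discussion.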

A talk given through TDWI by Chris King illustrated some typical experiences with a data quality project, at the regional level of an international hotel chain.  There were 8,000 employees in the region, half of whom could directly impact the data.

The starting point was the ‘single customer view’ objective.  This is a very common confluence of business need with IT strategy: as often as we hear that a company’s customers are its most significant business resource, it is almost axiomatic that the biggest data quality issues are to be found in customer data.  Customer data tends to come from a variety of sources of variable quality – too often, the customer can enter their own information without any human mediation.
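
To make the matching problem concrete, here is a hedged sketch – not from the talk, with all sources and field names invented – of the kind of logic a single customer view ultimately rests on: records from several sources are reduced to a normalised key and grouped:

    from collections import defaultdict

    # Illustrative records from three sources; all names and fields are invented.
    records = [
        {"source": "central_reservations", "name": "Joan Smith", "email": "Joan.Smith@example.com"},
        {"source": "travel_site",          "name": "J. Smith",   "email": " joan.smith@example.com"},
        {"source": "email_marketing",      "name": "Joan Smyth", "email": "joan.smyth@example.com"},
    ]

    def match_key(record):
        # Crude matching key: lower-cased, trimmed email address.
        return record["email"].strip().lower()

    clusters = defaultdict(list)
    for r in records:
        clusters[match_key(r)].append(r)

    for key, members in clusters.items():
        print(key, "->", [m["source"] for m in members])

Real matching uses far richer rules (names, addresses, loyalty numbers), but the principle is the same: the poorer the source data, the more such keys fragment or collide.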

Yet the relationship between a customer-centric business and its data quality strategy is variable.  Jim Harris at OCDQ has a tragicomic tale to relate about an MDM/EDW* project with 20 customer sources.  He characterised the company as having a business need to identify its most valuable customers, yet they “just wanted to get the data loaded” (sounds familiar) and intended to rely on MDM and “real-time data quality” via the ETL processing.
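
For contrast, ‘relying on data quality in the ETL’ often amounts to something like the following – a rough sketch with invented names, shown only to make the point: rows that fail in-flight validation are diverted to a reject pile, and nothing upstream gets examined or fixed:

    def validate(row):
        # Minimal in-flight checks; real rules would be far richer.
        problems = []
        if not row.get("customer_id"):
            problems.append("missing customer_id")
        if "@" not in row.get("email", ""):
            problems.append("bad email")
        return problems

    def etl_load(rows, target, rejects):
        # Load rows that pass; divert the rest to a reject pile.
        # Nothing here explains *why* the source data is bad, or fixes it at source.
        for row in rows:
            problems = validate(row)
            if problems:
                rejects.append((row, problems))
            else:
                target.append(row)

    target, rejects = [], []
    etl_load(
        [{"customer_id": "C1", "email": "a@example.com"},
         {"customer_id": "", "email": "not-an-email"}],
        target, rejects,
    )
    print(len(target), "loaded,", len(rejects), "rejected")

With 20 customer sources, that reject pile grows quietly while the ‘most valuable customer’ question stays unanswered.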

How valid is that approach?  It should be decided by the key business stakeholders, with input from the technical analysts on current data quality (and project constraints).  From the sound of it, that’s not how the decision-making was done – yet even if it had been, how confident were the key business stakeholders that they had a good handle on the issues (and weren’t lost in the technical details)?

In the case of the hotel chain, 40% of bookings arrived centrally, while 60% came from people booking directly.  Generally the former contained better quality data, as it tended to involve repeat customers with an established history – which had resulted in some informal cleansing in the past.

Issues were sourced to the variety of collection points, such as:

  • call centre: cost containment requirements had crunched call time, with an attendant reduction in data capture;
  • third-party collectors of information, such as travel websites: they may have their own data capture requirements, but they’re just as likely to regard the customer as their own, and forward minimal details;
  • email marketing: less focus on eliciting the full gamut of customer details.

Making fields mandatory presents a typical quandary: you want to capture as much as possible, but people will always find a reason – and a way – to circumvent mandatory fields.  And what’s worse than no data?  Bad data – especially when shuffled in with good data.  Among the ideas tested were simply highlighting some fields rather than mandating them, and a trial of requesting driver’s licences.
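
The difference between mandating and highlighting is easy to sketch – again purely illustrative, with assumed field names: hard rules block submission, soft rules only flag a field for follow-up, so there is less temptation to type junk just to get past the form:

    def validate_booking(form):
        # Split checks into blocking errors (mandated) and non-blocking warnings (highlighted).
        errors, warnings = [], []
        if not form.get("surname"):
            errors.append("surname is required")             # hard: block submission
        if not form.get("email"):
            warnings.append("email missing - highlight it")  # soft: flag, don't force
        if not form.get("phone"):
            warnings.append("phone missing - highlight it")
        return errors, warnings

    errors, warnings = validate_booking({"surname": "Smith", "email": ""})
    if errors:
        print("block submission:", errors)
    else:
        print("accept booking; follow up on:", warnings)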

They separated information from I.T., as Information Services.  This was to better deliver information management, champion data quality, and support decision-making.  In contrast to Jim Harris’ example above, they worked on data quality before data integration projects – which can significantly reduce the cost of such projects when their turn comes.  In fact, Chris commented that once the objectives of the data quality project were well understood, it was far easier to introduce the changes, and the exercise softened up the stakeholders for other objectives such as integration.

Data Stewardship is an important part of the ongoing process.  Once you’ve brought people together initially, it’s easier to set up a structure to manage data continuously – not just as a centralised dictionary, but as a necessary and useful dialogue with affected stakeholders.  This can prevent situations like those Chris uncovered, such as finding that one person’s VIP code had been set up by someone else to flag inclusion in a blacklist.
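
The VIP-code clash is exactly the kind of thing a shared, stewarded dictionary of codes surfaces early.  A toy sketch, with everything in it assumed for illustration:

    class CodeRegistry:
        # Toy shared dictionary: each code value carries one agreed meaning and owner.
        def __init__(self):
            self._codes = {}

        def register(self, code, meaning, steward):
            existing = self._codes.get(code)
            if existing and existing["meaning"] != meaning:
                raise ValueError(
                    "code %r already means %r (steward: %s); proposed %r by %s"
                    % (code, existing["meaning"], existing["steward"], meaning, steward)
                )
            self._codes[code] = {"meaning": meaning, "steward": steward}

    registry = CodeRegistry()
    registry.register("V1", "VIP guest", steward="loyalty team")
    try:
        registry.register("V1", "blacklisted guest", steward="security team")
    except ValueError as clash:
        print(clash)  # the clash surfaces at registration, not months later in a report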

Data quality thresholds were addressed by incentives ranging from ice cream in call centres through to General Manager bonuses.

Chris commented that there remained some wider business issues to resolve, such as tracking business versus leisure travel, and upselling into different brands [of hotel].  But as I said, further developments are less likely to be stymied by poor data once a cleansing exercise is under one’s belt and a quality structure is in place.

* Master Data Management, Enterprise Data Warehouse