Although data quality exercises have a variety of business paybacks, they are often low on the radar until a particular business need (or failure!) arises.  That can be short-sighted; they can often be justified by a cost-benefit analysis of the existing data quality.  Business decision-makers who don’t want to allocate budget should be properly aware of the ramifications of saying no – all too often, the case is not presented clearly enough in a business context.  But once undertaken, a data quality project should eventually transform into ongoing data quality processes, including audits and governance, which are far less costly than revisiting the same issues later when they resurface through different failures.

Data quality projects can emerge from many issues affecting the data.  General examples are problems with:

  • accuracy
  • completeness
  • timeliness
  • consistency

which can especially derail change in an organisation, whether new business development or new IT functionality.
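
To make those dimensions a little more concrete, here is a minimal sketch in Python of how three of them might be scored against a handful of customer records.  Everything in it is an assumption for illustration: the field names, the dialling-code lookup and the one-year freshness threshold are all hypothetical.  Accuracy is left out, because it can only really be measured against a trusted reference source.

```python
from datetime import date

# Hypothetical customer records; field names are illustrative only.
customers = [
    {"id": 1, "email": "ann@example.com", "country": "AU",
     "phone_prefix": "+61", "updated": date(2010, 6, 1)},
    {"id": 2, "email": "", "country": "NZ",
     "phone_prefix": "+61", "updated": date(2008, 1, 15)},
    {"id": 3, "email": "bob@example.com", "country": "AU",
     "phone_prefix": "+61", "updated": date(2009, 1, 1)},
]

REQUIRED = ("email", "country")           # assumed mandatory fields
DIAL_CODES = {"AU": "+61", "NZ": "+64"}   # assumed cross-field rule


def completeness(rows):
    """Share of rows with every required field populated."""
    ok = sum(all(r.get(f) for f in REQUIRED) for r in rows)
    return ok / len(rows)


def timeliness(rows, as_of, max_age_days=365):
    """Share of rows updated within max_age_days of as_of."""
    ok = sum((as_of - r["updated"]).days <= max_age_days for r in rows)
    return ok / len(rows)


def consistency(rows):
    """Share of rows whose phone prefix agrees with their country."""
    ok = sum(DIAL_CODES.get(r["country"]) == r["phone_prefix"] for r in rows)
    return ok / len(rows)


as_of = date(2010, 7, 1)
scores = {
    "completeness": completeness(customers),
    "timeliness": timeliness(customers, as_of),
    "consistency": consistency(customers),
}
for name, value in scores.items():
    print(f"{name}: {value:.0%}")
```

Scores like these are only a starting point for the cost-benefit conversation; the thresholds and rules themselves need to come from the business.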

A talk given through TDWI by Chris King illustrated some typical experiences with a data quality project, in the context of a regional level of an international hotel chain.  There were 8,000 employees in the region, half of whom could directly impact the data.

The starting point was the ‘single customer view’ objective.  This is a very common confluence of business need with IT strategy: as common as it is to hear that a company’s customers are its most significant business resource, it is almost axiomatic that the biggest data quality issues are to be found with customer data.  Customer data tends to come from a variety of sources – of variable quality; too often the customer can enter their own information without human mediation.

Yet the relationship between a customer-centric business and its data quality strategy is variable.  Jim Harris at OCDQ has a tragicomic tale to relate about an MDM/EDW* project with 20 customer sources.  He characterised the company as having a business need to identify its most valuable customers, yet they “just wanted to get the data loaded” (sounds familiar) and intended to rely on MDM and “real-time data quality” via the ETL processing.

How valid is that approach?  It should be decided by the key business stakeholders, with input from the technical analysts on current data quality (and project constraints).  From the sound of it, that’s not how the decision-making was done – yet even if so, how confident were the key business stakeholders that they had a good handle on the issues (and weren’t confused by the technical details)?

In the case of the hotel chain, 40% of bookings arrived centrally, while 60% were people applying directly.  Generally there was better quality data in the former, as it tended to be repeat customers with an established history – resulting in some informal cleansing in the past.

Issues were sourced to the variety of collection points, such as:

  • call centre: cost containment requirements had crunched call time, with an attendant reduction in data capture;
  • third-party collectors of information, such as travel websites: they may have their own data capture requirements, but they’re just as likely to regard the customer as their own, and forward minimal details;
  • email marketing: less focus on eliciting the full gamut of customer details.

Mandation of fields presents a typical quandary: you want as much as possible, but people will always find a reason to circumvent mandatory fields, and a way.  But what’s worse than no data?  Bad data – especially when shuffled into good data.  Among the ideas tested were simply highlighting some fields rather than mandating them, and a trial of requesting driver’s licences.

They separated information from I.T., as Information Services.  This was to better deliver information management, champion data quality, and support decision-making.  As opposed to Jim Harris’ example above, they worked on data quality before data integration projects – which can significantly reduce the cost of such projects when their turn comes.  In fact, Chris commented that once the objectives of the data quality project were well understood, it was far easier to introduce the changes, and the stakeholders were softened up for other objectives like integration.

Data Stewardship is an important part of the ongoing process.  Once you’ve brought people together initially, it’s easier to set up a structure to manage data continuously, not just as a centralised dictionary, but as a necessary and useful dialogue with affected stakeholders.  This can head off situations like one Chris uncovered: a code that one person used to flag VIPs had been set up by someone else to flag inclusion in a blacklist.

Data quality thresholds were addressed by incentives as basic as ice cream in call centres, through to General Manager bonuses.

Chris commented that there remained some wider business issues for resolution, such as tracking business vs leisure travel, and upselling into different brands [of hotel].  But as I said, further developments are less likely to be stymied by poor data, with a cleaning exercise under one’s belt and a quality structure in place.

* Master Data Management, Enterprise Data Warehouse

Yesterday I was involved in a few discussions about meeting business needs.

Well, that covers a multitude of sins.

Someone said that in his experience, getting business requirements for BI results in either “give me exactly what I have already”, or blue sky, ie “everything”.  That’s been pretty much my experience too, and can signal that the stakeholder isn’t successfully engaged, perhaps because they don’t know what they can get, or they don’t prioritise the exercise highly enough to put in the requisite effort.

Managing scope is another issue.  BI projects are especially susceptible to scope creep, for a number of reasons.  In particular, business stakeholders often only engage belatedly on the fuller range of opportunities presented to them.  This can be for rational reasons, as early deliveries often trigger further ideas and needs – not to mention their realisation that you can deliver them something meaningful, cool even.

Still, scope needs management one way or another.  Formalised signoffs are common, but what do you do for enhancement requests or incremental changes?  A trickle can become a steady stream.  In some situations I’ve seen a very strict policy taken: any further requirements can only be admitted via a subsequent project.  The most extreme was when a project was underquoted by an external supplier, and cost was fixed.  Black-letter adherence to a document can lead to poisonous – or at least cold – relationships, so usually there’s been some tolerance allowed or built in.  Ideally, you’d quote in a bit of slack, over-deliver, make everyone happy, and generate further collaboration.

Then there’s business-as-usual BI.

Identifying opportunities for further BI development:  not usually high on the agenda.  This is because of a familiar experience that was voiced yesterday: the six-month queue for new development.  Delivering business intelligence is more a matter of managing what’s being requested than drumming up work (how to get a six-month queue: drum up work).

Prioritising is necessary, but not the ultimate answer: it doesn’t shorten the queue, and you can guarantee that as a result some worthy requests will end up languishing in a permanent limbo; somebody will be put offside.

Another common approach, which I favour wherever possible, is to foster skills loci in individual business units.  It’s often possible to identify someone in a given business area who has an analytical bent – who, by temperament, interest or both, is not only open to the idea but keen for the opportunity to extract and analyse data themselves.

That’s a two-edged sword for several reasons.  Primarily: unfettered access can result in people building non-conforming versions of commonly-used metrics; some sort of auditing or filtering process needs to take place.
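
As a rough sketch only, such an audit might compare the definitions that power users have built against a published list of standard metrics, each with an acknowledged business owner.  The metric names, owners and formulas below are hypothetical, and comparing formula text is a crude stand-in for a proper review; the point is simply that divergence can be surfaced rather than discovered by accident.

```python
# Published standards: metric name -> business owner and agreed formula.
# All names and formulas here are hypothetical.
STANDARDS = {
    "gross_margin": {"owner": "Finance",
                     "formula": "(revenue - cost) / revenue"},
    "churn_rate": {"owner": "Customer Care",
                   "formula": "customers_lost / customers_at_start"},
}


def normalise(formula):
    """Ignore case and whitespace when comparing formula text."""
    return "".join(formula.lower().split())


def audit(local_definitions):
    """Return (business unit, metric, finding) for each problem found."""
    findings = []
    for unit, metrics in local_definitions.items():
        for name, formula in metrics.items():
            standard = STANDARDS.get(name)
            if standard is None:
                findings.append((unit, name, "unpublished"))
            elif normalise(formula) != normalise(standard["formula"]):
                findings.append((unit, name, "non-conforming"))
    return findings


# Definitions as built locally by power users in two business units.
local = {
    "Sales": {"gross_margin": "(revenue - cost) / cost"},
    "Marketing": {"gross_margin": "(Revenue - Cost)/Revenue",
                  "lead_score": "clicks * 2"},
}

findings = audit(local)
for unit, name, finding in findings:
    print(f"{unit}: {name} is {finding}")
```

An "unpublished" finding isn’t necessarily a problem; it may just be a candidate for the next round of published standards.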

Mentioned yesterday was a forum of such power users, meeting monthly under the auspices of a BI professional.  Sharing experience and best practice is one aim, but it also helps to be aware of the directions people are headed, training needs, and to keep on top of resourcing levels.  I don’t think control should be an issue per se, but with workload decentralisation it’s easy to lose sight of the use of both toolsets and resources, which understanding is necessary when planning updates or changes to environment or data.  Again, it remains important to keep an eye on the use of metrics, where possible via published – and updated – standards, with acknowledged business owners.  This model can become unwieldy when there is not at least centralised insight into the use of the data resources provided.

I don’t think any of this is particularly new, but for various reasons it’s not always effected with sufficient enthusiasm – on either side.  While it’s important to ensure people are reading off the same script, I don’t think that either business or IT interests are served by maintaining BI skills within IT – with or without business analysts interfacing.  Even if there’s pushback from the business units, they will have to acknowledge they are their own subject matter experts, and shouldn’t abrogate that knowledge by delegating to those without a direct interest.

It’s hard to get through 2010 without stumbling across the term ‘agile’, which is being spilt everywhere.  Like most bandwagonesque ideas, the exact meaning is by turns carelessly mislaid, blithely trampled on, or deliberately stolen.

The origins of “agile software development” go right back to 2001, when The Agile Manifesto was published.  In theory, anything not referencing it is either wilful, ignorant, or indifferent.  But language is organic; these things will happen.

The Wikipedia definition of agile software development accords with the Manifesto.  And an example of the breakout process comes from Maureen Clarry of CONNECT: “Confusing Agile with a capital A and agility is a common mistake. Agile as a methodology is a small piece compared to organizational agility. Closely related to that, we sometimes see BI organizations that use Agile methodology as an excuse so that they don’t have to define standards or document anything. This is another example of trading speed and adaptability for standardization and reuse. It does not need to be an either/or proposition.”

Ouch.  The battle lines are clearly drawn; it can’t be surprising to see it in the business intelligence arena.

This current discussion will look at the capital A, which has definition.  As such, the Agile Manifesto is not for everyone.  Up front, they say:

“we have come to value:

Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan”.

That’s not motherhood – and it’s obviously not universally applicable.  Enterprise-level organisations will necessarily favour processes and tools, simply because they need good communication, integration between parts of the body to make it work – and grow – well.  In that context, the Manifesto could be seen as permission for cancer to grow: it may be successful, but out of step with the rest of the body.  On the other hand, it may be good for pilots where they don’t need tight integration with the body corporate.

The Agile Principles should be viewed in full here, but a short version could be summarised as:

  1. Highest priority: to satisfy the customer through early and continuous delivery of valuable software.
  2. Embrace changing requirements, even late in development.
  3. Deliver working software frequently.
  4. Business people and developers must work together daily.
  5. Build projects around motivated individuals, and resource them.
  6. Face-to-face meetings!
  7. Working software is the primary measure.
  8. Sustainable development: the ability to maintain a constant pace indefinitely.
  9. Continuous attention to good design.
  10. Simplicity: maximise the amount of work not done.
  11. Self-organising teams.
  12. Reflect as a team at regular intervals, on how to be more effective.

MIP, an Australian data management consultancy, are the ones who first brought MicroStrategy, Brio and Informatica to Australia.  Recently they gave a presentation on Agility in its formal sense, in the context of presenting RED, a data warehouse development tool from a New Zealand company called WhereScape.

WhereScape RED has:

– automatic creation of surrogate keys, load timestamps, etc;

– code generation, execution, re-execution;

– a source repository;

– change management/version control, including comparison tools;

– generated user and technical documentation, with auto commenting, diagrams, glossaries;

– prototyping facilities;

– notification of data issues (although it is not a data quality tool per se, it uses an error mart).

MIP presented WhereScape RED as inextricably linked to Agile development; a simpler IDE than Microsoft’s Visual Studio, and an intuitive ETL tool.  It has been customer-quoted as a “perfect complement” to SQL Server technology (albeit I can’t say how well it fits in with other database technology).
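
As an illustration of the first item on that feature list, here is a minimal sketch of what automatically assigned surrogate keys and load timestamps amount to.  This is not WhereScape RED’s generated code or API, which I haven’t reproduced; the function and field names (load_dimension, sk, load_ts) are hypothetical, and a real tool would generate database code rather than work on an in-memory dictionary.

```python
from datetime import datetime, timezone


def load_dimension(dimension, source_rows, business_key, now=None):
    """Upsert source rows into an in-memory dimension.

    New business keys get the next surrogate key; existing ones keep
    theirs.  Every row touched is stamped with the load time.
    """
    now = now or datetime.now(timezone.utc)
    next_key = max((row["sk"] for row in dimension.values()), default=0) + 1
    for src in source_rows:
        bk = src[business_key]
        if bk in dimension:
            # Existing member: refresh attributes, keep the surrogate key.
            dimension[bk].update(src, load_ts=now)
        else:
            # New member: assign the next surrogate key.
            dimension[bk] = {"sk": next_key, **src, "load_ts": now}
            next_key += 1
    return dimension


customer_dim = {}
first_load = datetime(2010, 8, 1, tzinfo=timezone.utc)
second_load = datetime(2010, 8, 2, tzinfo=timezone.utc)

load_dimension(
    customer_dim,
    [{"customer_no": "C100", "name": "Ann"},
     {"customer_no": "C200", "name": "Bob"}],
    "customer_no",
    now=first_load,
)
load_dimension(
    customer_dim,
    [{"customer_no": "C200", "name": "Robert"},
     {"customer_no": "C300", "name": "Cy"}],
    "customer_no",
    now=second_load,
)

for row in customer_dim.values():
    print(row["sk"], row["customer_no"], row["name"], row["load_ts"].date())
```

The appeal of having a tool generate this sort of plumbing is that it is the same every time, which matters when tables are being rebuilt repeatedly during rapid iterations.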

What I saw did look good.  It makes sense that it would suit an Agile development project.  I noted one caveat at the time: that with such tools and methodology, it would be easy to lose the development thread in the process of rapid reiteration.  A challenge, but not an insurmountable one, for the data professional.

Update 05-Aug-10: The Data Warehousing Institute’s World Conference has Agile as its theme.  Some of the adjunct discussions can be seen to muddy the waters somewhat (is it a methodology? a process? a style? – depends on who’s talking, and how loose their language is).  An earlier discussion – “Being” Agile vs. “Doing” Agile – is salient, especially the comments.  One of the author’s own comments is worth repeating, that promoting Agile on the basis of speed specifically is “wrong-headed”:

“When speed is the primary objective, quality and value often suffer. But when the focus is on incremental value delivery (of high quality); increased productivity occurs naturally”.

Yesterday I took the opportunity to attend CA Expo 10 in Sydney – CA’s annual talkfest.  Having been there in the past, I knew there would be some pearls to be gleaned on current and future directions in IT.

CA, once known as Computer Associates, now wants to be known as CA Technologies – returning the core emphasis to their brand name.  They’re one of the largest software and services organisations in the world, traditionally aimed at system management in the mainframe and enterprise market.

Their theme this year was cloud technology, which they took pains to portray as a step up from virtualisation.  The primary target of their presentations seemed to be Chief Information Officers – and anyone else influencing the IT spend.  Although they emphasised their point that The Cloud was an evolutionary move (not revolutionary!), their aim was clear: they wanted to scare the bejesus out of the CIO.

CA wants to ensure the CIO is aware of the large-scale changes in the wind, but not to worry: CA are there with the solutions.

In fairness, they drove a number of meaningful points…

The keynote speech was given by Peter Hinssen, presented as a European technology writer/lecturer/strategist.  As a keynote, the intention was both to entertain and to flag CA’s themes.  One of them was the cloud as a change of strategic emphasis: from efficiency to agility.  As context, he depicted a cultural trend from digital as a novelty to digital as the norm.  We’re “halfway there” he said, somewhat arbitrarily, but while we have been beavering away in the background over the past 15 years, digital has infiltrated the mainstream consumer end of society.  He rightly reminded us (who were mostly old enough to remember Pong and Space Invaders) that current entrants into the workforce had grown up surrounded by digital – and [some] could claim to have better technology at home than at work.

Ah, but I digress.  Hinssen depicted the cloud as not only Software as a Service, but Platform [and infrastructure] as a service.  Security issues require a change in thinking: from firewalling people out to a “conditional yes” – we will need partners and customers to integrate with our systems.  And he, too, wanted to scare the CIO: “You will now be known as the Cloud Interface Officer” (a “joke” repeated elsewhere).

But beneath the words, the structural changes are both daunting and complex.  CA were sometimes roundabout, sometimes direct.  EVP Ajei Gopal: there will be a transition from the “IT [department] as a monolithic supplier of services, to manager of a supply chain”.  (Therein lies the challenge.  It won’t be immediate, and it will never be comprehensive, but technology management will inevitably be forced to grapple with that changing role.  Is this the same as outsourcing?  In some ways yes, in some ways no – the change is likely to be far more gradual, as new functionality is quietly placed in a cloud domain, for example.)  CA’s Chris Dickson said competitive advantage comes from managing external resources.  (But that rather begs the question: how good is your capacity to manage external resources?)

But back to scaring the CIO. Gopal: “You will get a call from the CEO: What are we doing in the cloud?”  (The absolutely natural response to hearing that is to ensure there’s a proof-of-concept pilot in place, just to show IT has the concept on the map.)  Gopal wanted to demonstrate that CA had the answers, by listing a number of CA acquisitions in recent times, including Oblicore, Nimsoft, NetQoS, and 3tera, which he characterised as strategic to CA’s cloud focus.

The proof of the pudding, however… CA want to prove their capability, with their knowledge base, their software, their expertise.  They have security product, which they presented at length.  But a question from the floor stymied their chief security architect: on managing social networks spilling out from the workplace.  It’s a known challenge, and they’re working on it.

On networking: CA have set up a “Cloud Commons” community, where experiences can be exchanged, best practice shared.  They developed the infrastructure, but communities only work when a critical mass is hit.  Various product (and other) scores can be aggregated in this Commons, for example – but as we see all the time, only useful where enough people participate.

CA went into more detail on managing infrastructure, on security, on transition, all of which were meaningful, while at the same time saying “here is your problem, and we are your solution”.

In conception, their vision is what the cloud is.  The more your capacity is successfully abstracted into the cloud, the fewer points of failure to affect business with your partners and your customers.  But CA’s value propositions are large-scale, high cost projects.  They’re reaching for the sky.  Yet in the short term, they may have to settle for hand-holding exercises in proof-of-concept.

All presentations from the day can be viewed here.

Addendum:  Gartner has just put out a release on SaaS.  Inter alia, it forecasts the SaaS market to grow to US$8.5 billion in 2010 from $7.5b.  Interestingly, they estimate that “75 percent of the current SaaS delivery revenue could be considered as a cloud service”, but that will increase “as the SaaS model matures and converges with cloud services models”.  Further, they expect SaaS to comprise 26% of the CRM market in 2010 (due in no small measure to Salesforce.com, I’d say).  That’s likely to be the easiest route to a pilot cloud project at the moment.

BBC news carried a commentary recently which provides a salient warning for outsourcing strategies.

In the wake of the Gulf oil spill disaster, it was noted that for outsourcing to be successful, it was essential that an organisation had “contractor management” as one of its “core competencies”.

The commentator went on to characterise the disaster as a “management failure”. How could BP have avoided the disaster? By modelling the contingencies, he said.

Would that have sufficed in your organisation?  For BP it did not.

This highlights one of the often-overlooked perils of outsourcing. It’s not just internal contingencies that need to be taken into consideration. In BP’s case, although they weren’t the organisation on the ground (so to speak), it is their global brand that bears the burden – externally and worldwide.

Last week I read analysis that suggested outsourcing had largely peaked (within the BI sphere). But management models come and go in waves; there has certainly been an amount of reinstitutionalising of hitherto outsourced functionality. Anecdotally, some of the reasons mentioned included the cost reductions not meeting expectations – and the difficulty maintaining functionality to a sufficient standard compared to inhousing.

Is contractor management a clear core competency of your organisation? Are the risks sufficiently modelled, understood and accepted? Can your organisation withstand a BP-level crisis originating externally? Or are your management processes more robust when functionality is maintained internally?

A quick listing of HP’s latest analysis of trends within Business Intelligence:

1.  Data and BI program governance

– ie managing BI [and especially data] more strategically.

2. Enterprise-wide data integration

– recognising the value of such investment.

3. (the promise of) semantic technologies

– especially taking taxonomical (categorising) and ontological (relating) approaches to data.

4. Use of advanced analytics

– going beyond reporting/OLAP, to data mining, statistical analysis, visualisation, etc.

5. Narrowing the gap between operational systems and data warehouses

6. New generation, new priorities in BI and DW – ie updating BI/DW systems

– HP identifies renewals of systems, greater investment in new technology – perhaps in an emerging economic recovery context.

7. Complex event processing

– correlating many, varied base events to infer meaning (especially in the financial services sector)

8. Integrating/analysing content

– including unstructured data and external sources.

9. Social Computing [for BI]

– yet at the moment it takes great manual effort to incorporate such technology into BI

10. Cloud Computing [for BI]

You can find the full 60-minute presentation here.  HP noted that these points are very much inter-related.  I would also add a general tenor that I got from the discussion: that these are clearly more aspirational trends than widespread current initiatives.  HP’s research additionally highlighted the four most important current BI initiatives separately:

– data quality

– advanced analytics [again]

– data governance

– Master Data Management

Other current buzzwords, such as open source, Software as a Service, and outsourcing, didn’t emerge at the forefront of concerns.  For the first two, the comment was made that these were more background enabling technologies.  As for outsourcing, it looked like those who were going to do it had largely done it, and there was current stability around that situation.

Business Intelligence has obviously moved away from simple reporting from a single repository.   Concerns are now around data quality, integration/management – and making greater sense of it, particularly for decision-making.  Those trends are clear and current.  But I’d also like to note one small point almost buried in the above discussion: the use of external data sources.  Business value of data must inevitably move away from simple navel-gazing towards facing the whole of the world, and making business sense of it.  That’s a high mountain, and we’re only just becoming capable of moving towards that possibility in a meaningful way.