Again, this week I am gathering together a few reads that I have found to stick in my mind, for one reason or another.

The future of Analytics
The Data Warehousing Institute (TDWI) has a series of “Best Practice Reports”; a recent one is called Delivering Insights with Next-Generation Analytics.  It provides an analysis of the future of analysis, backed up with some survey results.  It characterises BI as central to analytics in a business context (and it’s hard to say what part of business analytics BI would not be involved in).  Reporting and monitoring remain crucial components of such activity, but TDWI places an emphasis on differentiating users of information and analytics, from production report consumers (wide in scope but terse in analytical focus) to the power user analysts and managers concerned with forecasting and modelling.  The essence of its recommendations is to provide appropriate tools to the differentiated users, and keep an eye on technology.  Although at a top level this isn’t exactly news, this report is packed with useful detail for those making an effort to keep on top of the intersection between business and technology.

The future of Data Warehouses
Although I had a look at some new technology in data warehousing recently, this second TDWI report (Next generation Data Warehouse Platforms) is necessarily more systematic.  It models the DW technology stack, outlines new technology and business drivers, intersperses user stories, and outlines emerging trends (e.g. appliances, in-memory, cloud/SaaS, columnar, open source) not too different from my list.  Recommendations include: focusing on the business drivers; moving away from expensive in-house development; preparing for high-volume data; anticipating multiple path solutions, including open source.

In-memory databases
TDWI’s above report treated in-memory DWs seriously, without going into much detail on feasibility.  This is odd, given one of their recommendations involves preparing for an explosion in data to be stored.  I read a discussion on this technology (TDWI again: Q&A: In-memory Databases Promise Faster Results), which still doesn’t convince me that this isn’t a cat chasing its own tail.  The only realistic way forward I can see is by developing a dichotomy between core and peripheral data and functionality.  Haven’t seen that discussed.  Yet.

Forrester on trends and spotting them
Forrester has a new report aimed at Enterprise Architects: The Top 15 Technology Trends EA Should Watch.  These are grouped into five themes: “social computing for enterprises, process-centric information, restructured IT service platforms, Agile applications, and mobile as the new desktop”.  Some of it is discussed here, by Bill Ives.  Further, Forrester gives an outline of the criteria it uses for paying attention to a technology.  This includes how meaningful it is in the near term, its business impact, its game-changing potential, and its integrational complexity.

Vendor news: Oracle and bulk financials
Finally, news that Oracle has bought up again, this time taking over HyperRoll, whose software is geared for analysing “large amounts of financial data”.  Sounds a sensible move.


Coincidence: a scant twelve days after I discussed the contribution BI can make to process improvement, I found myself listening to Olivera Marjanovic similarly drawing a confluence between the two – from a more structured, organisational perspective.

Dr Marjanovic, an academic with the University of Sydney, has a focus on the integration of Business Process Management with Business Intelligence (a paper of hers on that subject can be found here).  At a recent TDWI meeting (abstract here), she aimed to present a roadmap for this integration.

Countering the business/data technology community’s traditional cynicism towards academia, her talk was wide-ranging and stimulating.  I can only summarise some of what she said, because she raised many more discussion points than can be covered in a brief post – or captured in hurried notes.

BPM suffers from a variety of definitions and practices – and has changed over time – so it’s important to put a context on the term.  Dr Marjanovic says in her abstract that BPM “has evolved beyond technologies for process automation and methodologies for process efficiency improvement”.  Her definitional synthesis (based on one I have seen usually attributed to the Aberdeen Group) is:

Business Process Management is the “identification, comprehension, management and improvement of business processes that involves people and systems, both within and across organisations”.

It’s a process-driven management philosophy, one where effectiveness is more crucial than efficiency (which is pointless if a process is not effective).  Technology is, of itself, insufficient: success comes from the interaction of strategy, people, processes and systems.  From the HR perspective, this should include training in:
– design thinking
– process innovation
– lateral thinking; but in particular
– learning how to learn.

Within knowledge management, Dr Marjanovic emphasised an organisation’s tacit knowledge – that is, experiential knowledge, that which conveys competitive advantage – over explicit knowledge, something that can easily be replicated [in other organisations].  This is the difference between

  • procedural business processes: highly structured decisions, tight information/decision coupling, and decision-centred BP improvement
  • practice-oriented business processes: unstructured decision-making, loose information/decision coupling, and collaborative knowledge-based improvement.

In this sense “knowledge management repositories don’t work” for process improvement – in that they date too rapidly (they are better suited for business continuity and risk management functions).

“The greatest value of BI comes from being increasingly embedded within the business processes themselves” – TDWI’s Best Of BI, 2008.

Dr Marjanovic offered some lessons for Business Intelligence, which included:
– BI practitioners are not [sufficiently] trained in process improvement or process thinking in general
– BI training methods are still very traditional and skill-based; BI practitioners need advanced training methods to help them learn how to learn.

When I outlined my experience with process improvement through BI or data quality initiatives (mentioned a couple of weeks ago), Dr Marjanovic suggested this was not common practice.  She clarified: “What is not common is a systematic (rather than ad-hoc) application of BP improvement methodologies in the context of BPM/BI integration.”  That does not surprise me: it accords with my (anecdotal) experience that the two disciplines don’t often meet.  But as I’ve said before, if BI practitioners, by intention or direction, retain a narrow focus on BI-specific projects, both they and their organisation risk abrogating the value they could express in both process and data improvement.

This week I have pointers to three discussions I’ve been reading.

BI Workspace: another ‘future of BI’ is discussed here, traceable back to a report from industry analysts Forrester (executive summary here).  What it is: the concept of a fully navigable data environment, geared specifically to the power user who has sufficient understanding of the data and its business context to make rational use of the full extent of data exploration.

Data Quality as an issue of context: A discussion at the always-useful OCDQ on data quality being a wider issue than simply accuracy.  Data accuracy was fully acknowledged, but other dimensions were raised.  My contribution to the discussion focused (as usual) on the quality – fitness – of the data as a business resource: including timeliness, format, usability, relevance – and delivery mechanisms. (To give the discussion its due, it was prompted by Rick Sherman’s report on a TDWI Boston meeting.)

Quality Attributes as a point of architecture: An ambitious point was raised for discussion on LinkedIn’s TDWI group.  The essence was a suggestion that data quality dimensions be defined as standards or Architectural Criteria when designing repositories.  Should standards such as ‘availability’, ‘portability’, ‘recovery’ be built into a data repository’s initial design?  Sounds laudable, but how practical is it to define them to measurable detail?  How intrinsic should such measures be to such a project’s SLAs?

Finally, a comment by Atif Abdul-Rahman (blog Knowledge Works) on my previous post linking business intelligence to business process improvement.  Atif effectively said BI+EPM=BPM.  My first reaction was to treat it as spam 🙂   – what do you think?

What does Business Process Improvement have to do with BI?

Not much, if you read Wikipedia.  But I’m beginning to suspect that a large number of the Wikipedia articles that stand at the confluence between business and technology are written by management ‘experts’ who are practising for their next book, based on their part-time management studies.  Certainly most of those articles are arcane enough for those of us steeped in practical application of technology.

Yet there are other ways to bring about improvement in business processes than by following a rigorous methodology imposed from on high.

On the one hand, it’s often possible to just walk into a business and identify candidates for improvement, if not processes that are thoroughly broken.  That’s not just because a fresh set of eyes can help, but because experience, and enough experiences in different workplaces, can help to quickly identify both the work practices that are worth repeating, and those that are well broken.

But that’s not what I’m talking about either.

It is this: a business’ data is a model of the business and its practices.  In turn, business intelligence is a process of accurately reflecting that business and its processes.  And in endeavouring to do so, a good amount of business analysis is called for, to understand the business as you are reflecting it.  And that holistic engagement process has a habit of uncovering what is not working as expected in business processes, both in practice (when analysing what people are doing) and in virtualisation – because when the data is shown to be incorrect and/or not as expected, that mismatch tends to reflect business processes that are awry.

That is not necessarily a part of the brief of a business intelligence professional.  Yet with forward-thinking management, it can be.

But at the very least, business intelligence professionals are ideally placed to gain insight into both a business and the model of the business and, in identifying mismatches, to foster improvements in business processes.  It would be negligent to waste such opportunities.

Why the buzz over columnar databases recently?  They’ve been around since the 1970s at least.  At the moment they remain the realm of niche players, although Sybase has had such a product for years, in Sybase IQ.

As far back as 2007, Gartner has been giving it a big tick.

Yet for some reason, I’ve been assailed by the concept from several disparate sources over the past month or so, some of which are heavy on the blah blah blah advantages but light on specifics such as reasons for those advantages.

I don’t pretend to have done the research, so I’ll just present a quick overview and links.  (At the very least, you can refer to Wikipedia’s article at top.)

In a nutshell, it is as it says, in that data is stored by column rather than by row (however, retrieval implementation seems to vary, with both row- and column-based querying variously supported).  Typified as meaningful in an OLAP more than OLTP context, it is said to be particularly beneficial when frequently working with a small subset of columns (in a table which has a large number of columns).  And, you guessed it, aggregation particularly thrives.
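To make the row-versus-column distinction concrete, here is a minimal sketch in Python.  It is only an illustration of the storage idea, not how any particular product implements it; the sales records and column names are invented for the example, and real columnar engines add compression and indexing that are not shown here.

```python
# Hypothetical sales records, held row by row (one dict per record).
# The table and column names are invented for illustration.
rows = [
    {"order_id": 1, "region": "north", "product": "widget", "amount": 120.0},
    {"order_id": 2, "region": "south", "product": "gadget", "amount": 80.0},
    {"order_id": 3, "region": "north", "product": "gadget", "amount": 50.0},
]


def to_columns(records):
    """Pivot row-oriented records into one list per column.

    Assumes every record carries the same set of keys.
    """
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every whole record is visited just to total one field.
row_total = sum(record["amount"] for record in rows)

# Column layout: only the single 'amount' list is read.
column_total = sum(columns["amount"])
```

Both totals are the same; the difference is how much data has to be touched to get there.  With a wide table and a query that needs only one or two columns, the column layout reads a small fraction of what the row layout does, which is the usual argument for its advantage in aggregation.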


  • There’s a simple overview in a blog by Paul Nielsen, who puts his hand up for an implementation in SQL Server;
  • There’s a small simulation in Oracle, in a blog by Hemant Chitale (with caveat in a comment);

[Part one of this discussion looked at different definitions of BI, and a very salient example of how it can be done well.]

When I’ve presented to people on the opportunities inherent in business intelligence, they marvel when they see information that is directly relevant to their work, in a new and meaningful light: summarised, for example, or detailed, or with direct visual impact that promotes new insights.

That’s the easy part.  Delivery is harder.

When I need to take a step back and assess what I am doing, I ask:

What does business want out of business intelligence?

This is particularly cogent if a BI implementation is less than successful – and I’ve never seen an implementation that really, I mean really, delivers.  I’m not talking about simply analysing business requirements, but understanding what is needed to deliver effectively.

There are many different ways of answering this question.

1) The anecdotal

My experience is probably not too different from many others.  In general, the feedback I’ve had from business stakeholders is:

  • They don’t know what they want; and/or
  • They want you to do it all for them

That’s a bit glib, but later I’ll extract some value from it.  In fact, as long as you’re delivering tangible value, I’ve found the business information consumers are reasonably happy.  It’s easy enough to rest on that, but as a professional it pays to think ahead.  Unfortunately, there remains a need for a level of business commitment to information issues – and I’m not talking about getting training in the tools or the data qua data, more about adopting an information-as-strategic-resource mindset.

2) The statistical

In a recent survey run by BeyeNetwork, the top two desires of business for BI are:

  • Better access to data
  • More informed decision-making

Axiomatic, no?  These effectively say the same thing, but there is nuance in each.

On the one hand, can business get whatever information they can possibly envisage, and in a format (whether presentation or analytical input) they can use effectively?  Clearly not – that’s a moving target.  But it’s also a goal to constantly strive for.

On the other hand, for business decisions to be made, it needs to be asked: what would support them in that process?  That’s too high-level for an immediate answer from most people.  Drilling into the detail of the processes is business analysis.  Maintaining such an understanding of business processes should rightly belong with the business, who should be fully on top of what they do and how they do it.  In practice, it’s often only when prompted by necessity – such as analysing information needs – that that exercise is done with much rigour.

3) The ideal

In an ideal world we would provide the knowledge base for a worker to be truly effective – which includes not just the passive support information, but the active intelligence that can generate useful new insights.  There’s a lot that can go into this, but the wishlist includes fuller realisation of:

  • Data integration: of information from disparate sources (not just databases)
  • Transformation: from data to business meaning
  • Presentation: insightful representation of information (current buzzword being visualisations)
  • Discovery: the opportunity to explore the information
  • Timeliness: information when they need it, where they need it, no delays
  • Control: the ability to save (and share) meaning that they encounter
  • Tools: a good, intuitive user experience – no learning hurdle, no techy barrier
  • Technical integration: seamless integration with the software and hardware environment (applications, devices respectively)
  • Autonomy: the ability to do it themselves

That last one is an interesting one: it’s the exact opposite of what I said I’d experienced.  But the gap there is in the toolset, the environment in which the information is presented.  If it’s something they can intuitively explore for themselves, extract meaning without a painful learning curve, they would want to do it themselves.

This can’t be achieved by the data professional in isolation.  Achieving the above needs collaborative effort: with business stakeholders, other IT professionals, and software vendors.

I don’t think there’s any BI implementation out there that delivers to the ideal.  Better business engagement, better business commitment, more resources for BI, better software tools, better integration: these would help.

We will get a lot closer to the delivery ideal.  But by then, BI will look rather different from today’s experience.

The dangling question: are new paradigms needed for BI to be fully realised?  If it is so hard to properly achieve the potential of BI today, there must be ways of working better.

Bill Inmon is one of the two gurus of data warehousing.  His claim is to have invented the modern concept of data warehousing, and he favours the top-down approach to design.

[Ralph Kimball is the other modern guru, who is credited with dimensional modelling – facts and dimensions.  He favours bottom-up design, first building data marts.]

Inmon is associated with the BeyeNetwork, maintaining his own “channel” there, on Data Warehousing.

Recently discussing data quality, he canvassed the issue of whether to correct data in a warehouse when it’s known to be wrong.

One approach is that it is better to correct data – where known to be in error – before it reaches the warehouse (Inmon credits Larry English for this perspective).

In contrast, there’s the notion that data should be left in the warehouse as it stands, incorrect but accurately representing the production databases. Inmon attributes this approach to Geoff Holloway.

Of course, Inmon easily demonstrates cases for both perspectives.  This is understandable because both versions of the data – corrected or incorrect – provide information.  On the one hand, business data consumers would want correct information, no mucking around.

But on the other hand, incorrect data is an accurate reflection of production values – and it can be misleading to represent it otherwise.  In particular, bad data highlights the business process issues that led to the entry of the errors, and that in itself is valuable business information.

And here’s where I branch beyond Inmon.  I would argue the case for both forms of the data to be preserved in one form or another.

We have all experienced the exasperation of being faced with poor quality data flowing into business reports/information.  On a day-to-day basis, the information consumer doesn’t want to know about errors – they just want to use the information as it should rightly be, as a business input.  They may well be aware of the issues, but prefer to put them to one side, and deal with BAU* as it stands.

What this is saying is that the approach to data quality fixes should really be a business decision.  At the very least, the relevant business stakeholders should be aware of the errors – especially when systemic – and make the call on how to approach them.  In fact, ideally this is a case for… a Data Governance board – to delegate as they see fit.  But unless the issues are fully trivial, errors should not be fully masked from the business stakeholders.

So if the stakeholders are aware of the data issues, but the fix is not done and they don’t want to see the errors in day-to-day reportage, how to deal with the need to fix – at least as the data is represented?

I see four options here, and I think the answer just pops out.

Option 1: correct the data in the reports
Option 2: correct the DW’s representation of the data with a view
Option 3: correct the data itself in the DW
Option 4: correct it in ETL processing

Option 1 is fully fraught.  I have done this on occasion when it has been demanded of me, but it is a poor contingency.  You’re not representing the data as it exists in the DW, but more importantly, if you have to run a transform in one report, you may well have to reproduce that transform.  Over and over.

Option 2: creating a view is adding a layer of complexity to the DW that is just not warranted.  It makes the schema much harder to maintain, and it slows down all processing – both ETL and reporting.

Fixing the DW data directly (option 3) does get done.  But again, it may have to be done over and over, if ETL overwrites it again with the bad data.  And there is a very sensible dictum I read recently, paraphrased thus: any time you touch the data, you can introduce more errors.  Tricky.  Who can say with certainty that they have never done that?

Of course, I would favour handling it in ETL.  More specifically, I would like to see production data brought to rest in a staging area that is preserved, then transformed into the DW.  That way, you have not touched the data directly, but you have executed a repeatable, documentable process that performs the necessary cleansing.
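As a rough illustration of that staging approach, here is a minimal sketch using Python’s built-in sqlite3 module.  Everything in it is an assumption made for the example: the table names (staging_customer, dw_customer), the state-code errors, and the cleansing rules are all hypothetical, and a real ETL tool would look quite different.  The point is only the shape of the process: the raw extract is preserved untouched, and the warehouse copy is rebuilt from it by a repeatable rule set.

```python
import sqlite3

# In-memory database standing in for the warehouse environment.
# Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_customer (customer_id INTEGER, state TEXT);
    CREATE TABLE dw_customer (
        customer_id INTEGER,
        state TEXT,
        state_corrected INTEGER  -- 1 if cleansing changed the value
    );
""")

# The production extract lands in staging exactly as received, errors and all.
production_extract = [(1, "NSW"), (2, "N.S.W."), (3, "Vic"), (4, "QLD")]
conn.executemany("INSERT INTO staging_customer VALUES (?, ?)", production_extract)

# Cleansing rules, assumed to have been agreed with the business data owner.
STATE_FIXES = {"N.S.W.": "NSW", "Vic": "VIC"}


def load_warehouse(connection):
    """Rebuild the warehouse table from staging, applying the cleansing rules."""
    connection.execute("DELETE FROM dw_customer")
    staged = connection.execute(
        "SELECT customer_id, state FROM staging_customer ORDER BY customer_id"
    ).fetchall()
    cleansed = [
        (customer_id, STATE_FIXES.get(state, state), int(state in STATE_FIXES))
        for customer_id, state in staged
    ]
    connection.executemany("INSERT INTO dw_customer VALUES (?, ?, ?)", cleansed)
    connection.commit()


load_warehouse(conn)
load_warehouse(conn)  # running it again gives the same result

staging_rows = conn.execute(
    "SELECT customer_id, state FROM staging_customer ORDER BY customer_id"
).fetchall()
dw_rows = conn.execute(
    "SELECT customer_id, state, state_corrected FROM dw_customer ORDER BY customer_id"
).fetchall()
```

The state_corrected flag is one way of keeping both forms of the data visible: reports read the cleansed value, while a count of flagged rows (joined back to staging if needed) shows the business how much bad data the source process is still producing.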

Not always possible, with resource limitations.  Storage space is one problem, but it may be more likely (as I have experienced) that the ETL processing window is sufficiently narrow that an extra step of ETL processing is just not possible.  Oh well.  There’s no perfect answer; the solution always has to fit the circumstance.  Again, of course, it’s a matter of collaboration with the business (as appropriate via the data steward or DG board).

Oh, and most importantly: get back to the business data owner, and get them working (or work with them) on the process issue that led to the bad data.

*BAU = Business As Usual – at risk of spelling out the obvious.  I find acronyms anathema, but spelling them out can interrupt the flow of ideas.  So I will endeavour to spell them out in footnotes, where they don’t have to get in the way.