Posts Tagged ‘future’

This week I have pointers to three discussions I’ve been reading.

BI Workspace: another ‘future of BI’ is discussed here, traceable back to a report from industry analysts Forrester (executive summary here).  What it is: the concept of a fully navigable data environment, geared specifically to the power user who has sufficient understanding of the data and its business context to make rational use of the full extent of data exploration.

Data Quality as an issue of contextA discussion at the always-useful OCDQ on data quality being a wider issue than simply accuracy.  Data accuracy was fully acknowledged, but other dimensions raised.  My contribution to the discussion focused (as usual) on the quality – fitness – of the data as a business resource: including timeliness, format, usability, relevance – and delivery mechanisms. (To give the discussion its due, it was prompted by Rick Sherman’s report on a TDWI Boston meeting.)

Quality Attributes as a point of architecture: An ambitious point was raised as a discussion point on LinkedIn’s TDWI group.  The essence was a suggestion that data quality dimensions defined as standards or Architectural Criteria when designing repositories.  Should standards such as ‘availability’, ‘portability’, ‘recovery’ be built into a data repository’s initial design?  Sounds laudible, but how practical is it to define it to measurable detail?  How intrinsic should such measures be to such a project’s SLAs?

Finally, a comment by Atif Abdul-Rahman (blog Knowledge Works) on my previous post linking business intelligence to business process improvement.  Atif effectively said BI+EPM=BPM.  My first reaction was to treat it as spam 🙂   – what do you think?


Read Full Post »

Is there a single future for Data Warehousing?  Obviously the path is not set in concrete, but some clear trends are emerging when people advocate advancement.  Here, I quickly sketch the alternatives listed last week, and synthesise a direction.

I have rearranged the listed order of last week’s future visions, to group them more logically.

Classic DW (#1): no innovation, but in its various warehouse/mart permutations it will survive for some time to come.

ROLAP (#7): Not only do we see the ‘is ROLAP dead’ debate, but the ‘ROLAP is back’ movement (see ‘is MOLAP dead?’).  Something to the effect of a simple copy of production data, instead of transformation into a star/snowflake form.  The argument here is that improvements in hardware processing/storage obviate the need for ETL.  Simple, yes, and I heard this argument only a few months ago from a consultant at a TDWI meeting: “why a data warehouse at all? Not necessary with current technology”.  Not so fast: there will always be a need to optimise from transaction processing to querying, particularly with larger amounts of data.

Bus architecture (#3):  This is also described as a Kimball architecture, Ralph Kimball being one of the two original DW prophets.  His original article on this dates back to 1998, and can be found here.  Also discussed here, described as not advocating centralised DWs, and consisting of two types of data marts, aggregated and atomic, while incorporating star schemas.  Although I’ve seen this mentioned recently as an alternative to a DW, it’s not exactly new.

MPP Warehouse appliance (#2): That’s Massive Parallel Processing, and a warehouse appliance is simply a packaged product – in theory, just feed it the data.  This debate could be framed in the same language as ERP commodification, an accepted reality – who still builds/maintains a customer ERP from scratch?  Likewise DWs.  Implementation would be an admixture of customisation and adjusting an enterprise’s own business processes to meet the software – SAP being the archetype here.  Is it the future?  One version, yes.  This is no conceptual advance, just more of the same, albeit heavily commodified.

Column-oriented databases: Yes, that’s opposed to row-oriented.  See the overview in Wikipedia.   There are particular efficiencies in a Data Warehouse context, somewhat reminiscent of cubing.

Enterprise Information Integration (#6): In a nutshell, EII entails the abstraction of data from consumers, and the provision of a standardised interface.  This will be familiar as APIs such as ODBC and OLE DB.

Data Delivery Platform (#4):  This from Rick van der Lans.  He starts by enumerating flaws in the classical architecture, such as issues of latency, redundacy, flexibility, unstructure data, and non-shareable specifications.  His Platform, echoing EII, is a mediation between data sources and data consumers, such that the consumers request information without regard to the structure or source – which, presumably, can change without affecting consumer or provider.  He doesn’t advocate removal of the DW, just abstraction of the data from its consumers.  He doesn’t seem to prescribe the exact form the Platform (and its data) will take at any time, and that is the point – that the model contains sufficient flexibility to be able to change the inherent storage architecture, or even hardware/software technologies, without affecting consumer or provider.  Rick’s description here.

Data Provisioning (#5):  Similar to the Data Delivery Platform, this was described in a previous post.  In a nutshell, it revolves around a staging area where the data is clean and governed, and data is delivered any way desired, especially as fungible (disposable) DWs.  I had asked its progenitor, Karen Heath, for some further documentation, but all she sent me was Ricks’ Data Delivery Platform above, from which it obviously originated (so no web reference available for this).  Not to detract from Rick’s work, it stands in contrast with its additions, and would be a work in progress.

DW 2.0 (#8): This term has actually been trademarked by Bill Inmon, the other DW prophet.  A central issue is to come to grips with unstructured data, and it looks like Inmon goes for integration via a(n amount of) parsing of the documents as they are stored (as ETL processing, presumably).  This to include extracting/standardising dates and other measures found in the data.  Presumably extraction of titular information is also not beyond the pale.  This would seem to be an advance, rather than a restructuring, of the classic DW, to accomodate the already-pressing issue of unstructured data.  (See Inmon’s exposition here.)

In summary, the “alternatives” frequently offered up in general entail either more of the same, or a decoupling of data consumers from the specific underlying technology or sources.  That is probably the best takeaway: that some sort of SOA abstraction could well come to predominate in the DW business – which ties in nicely with the “anywhere” philosophy of cloud computing.  I for one, would be happy if my clients the data consumers could extract their business information/knowledge without needing to be trained/retrained on technologies or underlying data structures.

It makes sense to abstract repository from user.  Whether there’s a separate staging area or not, where the cleaning and transforms are done, whether untransformed data is kept, are relatively minor points of separation.  I for one would like a) access to untransformed data, for reality checks, and b) strong data governance, which entails business ownership of data, an overarching responsibility for the data’s quality and standards. (This is something business stakeholders usually seem to flick to I.T., who are not of themselves fully skilled to make business sense of the data.)

Abstraction, governance, and access to untransformed data.  That’s a healthy wish list.

Update 30-Jul-09:  IBM’s new Smart Analytics Optimiser is yet another option – in a uniquely IBM vein.  It sounds like a rather involved way to overlay analytics on production data.  I’m unconvinced it will last, but you can read about it here.

Update 12-Aug-09:  Added Column-based architecture to the above list.’

Update 21-Oct-09: In my latest post, BI Readings 2, I outline and link to TDWI’s paper on next-gen DWs.

Read Full Post »