Unravelling the future of Data Warehousing

Is there a single future for Data Warehousing?  Obviously the path is not set in concrete, but some clear trends emerge from the advances people advocate.  Here, I quickly sketch the alternatives listed last week, and synthesise a direction.

I have rearranged the listed order of last week’s future visions, to group them more logically.

Classic DW (#1): no innovation, but in its various warehouse/mart permutations it will survive for some time to come.

ROLAP (#7): Not only do we see the ‘is ROLAP dead’ debate, but also the ‘ROLAP is back’ movement (see ‘is MOLAP dead?’).  The idea is something to the effect of a simple copy of production data, instead of transformation into a star/snowflake form.  The argument here is that improvements in hardware processing/storage obviate the need for ETL.  Simple, yes, and I heard this argument only a few months ago from a consultant at a TDWI meeting: “why a data warehouse at all? Not necessary with current technology”.  Not so fast: there will always be a need to restructure data that was laid out for transaction processing so that it is optimised for querying, particularly with larger amounts of data.
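To make the contrast concrete, here is a minimal sketch of what the transformation into a star form involves, as opposed to querying a straight copy of production rows. The order records, field names and helper functions are all invented for illustration; a real ETL process would of course be far more involved.

```python
# Hypothetical flat production rows, invented for illustration only.
orders = [
    {"order_id": 1, "customer": "Acme", "product": "Widget", "amount": 100},
    {"order_id": 2, "customer": "Acme", "product": "Gadget", "amount": 40},
    {"order_id": 3, "customer": "Birch", "product": "Widget", "amount": 60},
]


def build_dimension(records, field):
    """Assign a surrogate key (1, 2, 3, ...) to each distinct value of one field."""
    keys = {}
    for record in records:
        keys.setdefault(record[field], len(keys) + 1)
    return keys


# Dimension tables: one surrogate key per distinct customer / product.
customer_dim = build_dimension(orders, "customer")
product_dim = build_dimension(orders, "product")

# Fact table: only keys and measures, no descriptive text.
fact_sales = [
    {
        "customer_key": customer_dim[order["customer"]],
        "product_key": product_dim[order["product"]],
        "amount": order["amount"],
    }
    for order in orders
]


def total_by_customer(facts, dimension):
    """Aggregate the fact table, resolving keys back to names via the dimension."""
    names = {key: name for name, key in dimension.items()}
    totals = {}
    for fact in facts:
        name = names[fact["customer_key"]]
        totals[name] = totals.get(name, 0) + fact["amount"]
    return totals


totals = total_by_customer(fact_sales, customer_dim)
```

The "simple copy" position amounts to skipping the dimension and fact steps and aggregating `orders` directly, which works until data volumes make the query-oriented layout worth its cost.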

Bus architecture (#3):  This is also described as a Kimball architecture, Ralph Kimball being one of the two original DW prophets.  His original article on this dates back to 1998, and can be found here.  It is also discussed here, where it is described as not advocating centralised DWs, and as consisting of two types of data marts, aggregated and atomic, while incorporating star schemas.  Although I’ve seen this mentioned recently as an alternative to a DW, it’s not exactly new.

MPP Warehouse appliance (#2): That’s Massively Parallel Processing, and a warehouse appliance is simply a packaged product – in theory, just feed it the data.  This debate could be framed in the same language as ERP commodification, an accepted reality – who still builds/maintains a custom ERP from scratch?  Likewise DWs.  Implementation would be an admixture of customisation and adjusting an enterprise’s own business processes to meet the software – SAP being the archetype here.  Is it the future?  One version, yes.  This is no conceptual advance, just more of the same, albeit heavily commodified.

Column-oriented databases: Yes, that’s as opposed to row-oriented.  See the overview in Wikipedia.  There are particular efficiencies in a Data Warehouse context, somewhat reminiscent of cubing.
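As a rough illustration of where the efficiency comes from, the sketch below holds the same data in a row layout and a column layout and sums one field from each. The sales records and function names are invented for the example; real column stores add compression and much else on top of this basic idea.

```python
# Hypothetical sales records, invented for illustration only.
rows = [
    {"region": "North", "product": "A", "amount": 100},
    {"region": "South", "product": "B", "amount": 250},
    {"region": "North", "product": "B", "amount": 50},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_row_store(records, field):
    """Row layout: every whole record is visited to read a single field."""
    return sum(record[field] for record in records)


def total_column_store(columns, field):
    """Column layout: only the one column the query needs is scanned."""
    return sum(columns[field])


columns = to_columns(rows)
row_total = total_row_store(rows, "amount")
column_total = total_column_store(columns, "amount")
```

Both totals are the same; the difference is how much data has to be touched to get there, which matters for the wide, aggregate-heavy queries typical of a warehouse.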

Enterprise Information Integration (#6): In a nutshell, EII entails the abstraction of data from consumers, and the provision of a standardised interface.  The approach will be familiar from APIs such as ODBC and OLE DB.

Data Delivery Platform (#4):  This from Rick van der Lans.  He starts by enumerating flaws in the classical architecture, such as issues of latency, redundancy, flexibility, unstructured data, and non-shareable specifications.  His Platform, echoing EII, is a mediation between data sources and data consumers, such that the consumers request information without regard to the structure or source – which, presumably, can change without affecting consumer or provider.  He doesn’t advocate removal of the DW, just abstraction of the data from its consumers.  He doesn’t seem to prescribe the exact form the Platform (and its data) will take at any time, and that is the point – that the model contains sufficient flexibility to be able to change the inherent storage architecture, or even hardware/software technologies, without affecting consumer or provider.  Rick’s description here.
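The mediation idea can be sketched in a few lines. This is not Rick’s design, only an illustration of the principle under my own assumptions: the class name, the logical dataset name and the two source functions are all hypothetical.

```python
class DeliveryPlatform:
    """Hypothetical mediation layer: consumers ask for data by logical name only."""

    def __init__(self):
        self._providers = {}

    def register(self, name, provider):
        """Bind (or rebind) a logical dataset name to a zero-argument callable."""
        self._providers[name] = provider

    def request(self, name):
        """Return the dataset; the caller never learns which source supplied it."""
        return self._providers[name]()


def from_warehouse():
    # Stand-in for a query against a classic warehouse (invented data).
    return [{"customer": "Acme", "revenue": 1200}]


def from_production_copy():
    # Stand-in for a query against a copy of production data (invented data).
    return [{"customer": "Acme", "revenue": 1200}]


platform = DeliveryPlatform()

platform.register("customer_revenue", from_warehouse)
first = platform.request("customer_revenue")

# The underlying source is swapped; the consumer's request is unchanged.
platform.register("customer_revenue", from_production_copy)
second = platform.request("customer_revenue")
```

The consumer’s call is identical before and after the swap, which is the sense in which the storage architecture can change without affecting consumer or provider.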

Data Provisioning (#5):  Similar to the Data Delivery Platform, this was described in a previous post.  In a nutshell, it revolves around a staging area where the data is clean and governed, and data is delivered any way desired, especially as fungible (disposable) DWs.  I had asked its progenitor, Karen Heath, for some further documentation, but all she sent me was Rick’s Data Delivery Platform above, from which it obviously originated (so no web reference available for this).  Without detracting from Rick’s work, Data Provisioning is distinguished from it by its additions, and would be a work in progress.

DW 2.0 (#8): This term has actually been trademarked by Bill Inmon, the other DW prophet.  A central issue is to come to grips with unstructured data, and it looks like Inmon goes for integration via a(n amount of) parsing of the documents as they are stored (as ETL processing, presumably).  This would include extracting/standardising dates and other measures found in the data.  Presumably extraction of titular information is also not beyond the pale.  This would seem to be an advance, rather than a restructuring, of the classic DW, to accommodate the already-pressing issue of unstructured data.  (See Inmon’s exposition here.)
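As a very small illustration of extracting and standardising dates from free text, the sketch below picks out dates written in one format (day-month-year, such as 30-Jul-09) and converts them to proper date values. The sample document, the single supported format and the assumption that two-digit years fall in the 2000s are all mine; this is not Inmon’s method, just the general flavour of the parsing step.

```python
import re
from datetime import date

# Month abbreviations mapped to month numbers (English only, an assumption).
MONTHS = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12,
}

# Matches dates such as 30-Jul-09; other formats are deliberately ignored.
DATE_PATTERN = re.compile(r"\b(\d{1,2})-([A-Za-z]{3})-(\d{2})\b")


def extract_dates(text):
    """Return every recognisable dd-Mon-yy date in the text as a date object."""
    found = []
    for day, month, year in DATE_PATTERN.findall(text):
        month_number = MONTHS.get(month.lower())
        if month_number is None:
            continue  # three letters that are not a month abbreviation
        try:
            # Assumption: two-digit years belong to the 2000s.
            found.append(date(2000 + int(year), month_number, int(day)))
        except ValueError:
            continue  # impossible dates such as 31-Feb-09 are skipped
    return found


# Hypothetical unstructured document text.
document = "Contract signed 30-Jul-09, amended 12-Aug-09, review due 31-Feb-10."
standardised = [found.isoformat() for found in extract_dates(document)]
```

Once dates are in a standard form they can be loaded alongside structured data and queried in the usual way, which is the point of doing the parsing at storage time.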

In summary, the “alternatives” frequently offered up in general entail either more of the same, or a decoupling of data consumers from the specific underlying technology or sources.  That is probably the best takeaway: that some sort of SOA abstraction could well come to predominate in the DW business – which ties in nicely with the “anywhere” philosophy of cloud computing.  I, for one, would be happy if my clients, the data consumers, could extract their business information/knowledge without needing to be trained/retrained on technologies or underlying data structures.

It makes sense to abstract repository from user.  Whether there’s a separate staging area or not, where the cleaning and transforms are done, whether untransformed data is kept, are relatively minor points of separation.  I for one would like a) access to untransformed data, for reality checks, and b) strong data governance, which entails business ownership of data, an overarching responsibility for the data’s quality and standards. (This is something business stakeholders usually seem to flick to I.T., who are not of themselves fully skilled to make business sense of the data.)

Abstraction, governance, and access to untransformed data.  That’s a healthy wish list.

Update 30-Jul-09:  IBM’s new Smart Analytics Optimiser is yet another option – in a uniquely IBM vein.  It sounds like a rather involved way to overlay analytics on production data.  I’m unconvinced it will last, but you can read about it here.

Update 12-Aug-09:  Added Column-based architecture to the above list.

Update 21-Oct-09: In my latest post, BI Readings 2, I outline and link to TDWI’s paper on next-gen DWs.

~ by Stephen Simmonds on 10 July, 2009.

3 Responses to “Unravelling the future of Data Warehousing”

  1. Hey there,

    I think this is an interesting article, especially about the entire ecosystem for Data Warehousing — Data Provisioning and Delivery aren’t always considered when people look at platforms.

    I think you might find an article I’ve written on http://www.roadtofailure.com to be of interest. It describes the first revolution in Data Warehousing — affordable, scalable processing on Big Data. Take a look if you get a chance :)

    • Thx, Bradford.

      Must admit, I’d not even heard of those Apache products, Hbase/Hadoop, much less heard tell of an implementation. Something to watch out for, definitely, especially as open source BI/DW gains traction.

      Still, I’m not myself ready to punt on either the decline or eclipse of the DW in the near future…

  2. [...] future of Data Warehouses Although I had a look at some new technology in data warehousing recently, this second TDWI report (Next generation Data Warehouse Platforms) is necessarily more [...]
