Saturday, January 3, 2009

Should data mining be a part of a data warehousing tool?

Data Mining ... we keep hearing about it so often ... but then what is all the fuss about data mining...?

Imagine this ... you are a consumer company that has been selling widgets for the past 20 years, and you have about 5 years of reasonably clean sales data.

A product variant is being launched with much fanfare and a lot of money spent on advertising. The budget committee is concerned that the spend is on the higher side and wants to know how the company has fared in similar launches, how much sales growth came from a promotion, and how much of the sales can be attributed to a particular form of advertising...

A very pertinent question ... but what does it bode for us...?

Scenario 1: You call in the data mining experts. Who are they?

The domain experts - typically people who understand the market dynamics very well, having worked in the same field for years.

The number crunchers - those who can make sense of the data.

The number crunchers work with the domain experts to arrive at the required answer using complicated models, some statistical, some stochastic, some causal, etc. ... jargon to many!

Scenario 2: The data warehousing team is asked to provide some answers to the question since they best know the data that they store....

Interesting scenario ... what do you do ?

Let's take the following scenario in SAP BI....

The first answer that you get is "Go to TCODE RSANWB" (I have come across some such answers in the forums!)

What is this? It is the Analysis Process Designer, which 'also' has data mining capabilities.

But then the dilemma surrounding data mining in a data warehouse environment is that there is a vast skill set mismatch.

The person who owns / runs the data warehouse understands the data that is in there, but not necessarily all the information in it. The architect knows the data models and relationships, but the knowledge brought in by the data mining modeler is absent. For instance, I know the data resides in the Sales Order Line Item cube ... but I have no clue whether I should use clustering, a decision tree, a regression, etc.

Statistics is not my cup of tea. And even if I do decide to try something out ... I get answers that I cannot make sense of. I can only present the results to the person who asked for them and hope that this is what they wanted....

Data mining, moreover, has become a buzzword, with so many companies setting up shop in the field of analytics and specialized niche domains. Because of this, each company tries to distinguish itself through complex data mining models and domain expertise.

One thing that gets lost in the detail is the simpler models for data mining. We used to do a t-test or an F-test in Excel and on paper ... such simple tests could still be done in SAP BI, but when the same test is dressed up in the form of a data mining model ... the meaning gets lost.
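To show how little is hiding behind those names, here is a rough sketch of a two-sample t statistic and an F ratio in plain Python. This is only an illustration, not how SAP BI or the Analysis Process Designer computes anything: the weekly sales figures are invented, the function names `welch_t` and `f_ratio` are my own, and I have assumed Welch's unequal-variance form of the t-test.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic (does not assume equal variances)."""
    var_a, var_b = variance(a), variance(b)  # sample variances (n - 1 denominator)
    return (mean(a) - mean(b)) / sqrt(var_a / len(a) + var_b / len(b))


def f_ratio(a, b):
    """F statistic: ratio of the two sample variances."""
    return variance(a) / variance(b)


# Hypothetical weekly unit sales, invented purely for illustration.
promo_weeks = [120, 135, 128, 140, 132]
normal_weeks = [110, 115, 108, 120, 112]

t_stat = welch_t(promo_weeks, normal_weeks)
f_stat = f_ratio(promo_weeks, normal_weeks)
print(f"t = {t_stat:.2f}, F = {f_stat:.2f}")
```

A large t value suggests the promotion weeks really did sell more on average, and the F ratio compares how much the two sets of weeks vary. Turning either number into a yes / no answer still needs a lookup against the t or F distribution, which is exactly where the statistician comes in.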

This may be one of the reasons why data mining in SAP BI or any data warehouse

What do you think? Should data mining be a part of mainstream data warehousing, or be left to the domain experts and the analytics experts to figure out?

1 comment:

  1. Warehousing of data, leading in turn to cleansing and extraction towards sensible dashboards, makes BI technology complete.