OLAP/MDX query correlation/association within same dimension -
03-24-2006
, 07:16 AM
Simple dim mod setup:
Fact table is (user_id, context_id, action_id, <facts/measures>)
User dimension is (user_id, <user attributes>)
Context dimension is (context_id, <context attributes>)
Action dimension is (action_id, action_name, <other action attributes>)
Users does a number of actions in a number of different contexts, this
is recorded in the fact table. Hierarchies are not relevant for the
time being.
An OLAP cube is created in SSAS 2005.
I need to query the correlation or association within same dimension,
and I need the result as a pivot table with the same set of members on
both axes, e.g. a ActionName X ActionName cross tabulation.
As I understand it, by default you are not allowed to execute MDX with
same dimension on two or more axes. This is initially logical, due to
that the outcome would a near complete zero-matrix, but with the
measure aggregate in the diagonal. The content of the diagonal is far
easier and more intuitive to retrieve in other ways.
However this zero-matrix with diagonal was not my intention anyway. By
introducing a "Sample Unit" or as called in Data Mining "The Case
Key", a ActionName X ActionName query would make far more sense in my
pursuit of measuring the correlation of Action Names. E.g. selecting
the sample unit to be users and performing the query should tell me how
Actions are used together by users, i.e. the correlation in pairs of
actions if query is 2 dimensional, triplets if query is 3 dimensional,
and so on. The measure aggregate could be anything from distinct user
count to aggregates on the facts. E.g. 43 users did both Action_X and
Action_Y, 11 users did both Action_Y and Action_Z, and so on for all
combinations of two actions.
I need to be able to change the sample unit to other dimensions, e.g.
selecting context would give me the correlation of action in respect to
context. I also need to be able to change the dimension of interest,
e.g. Context X Context with users as sample unit to see the correlation
of contexts over users.
This can be done directly on a relational database with SQL, however
you do have to do the pivoting yourself. But with millions of entries
in both fact and dimension tables, the performance becomes a serious
problem in a real time system, and therefore I put my eyes on cubes.
I looked into Data Mining with Sql Server 2005 with is able to run on
OLAP-cubes and hence potentially high performance. However it seems
like DM is all about prediction based on rare batch generated Mining
Models in flattened tables. I came to the conclusion that what I
actually needed was a pivot version of the Mining Model tables. But I
do not expect that real time generation and pivoting of Mining Models
are feasible - because in that case I might as well do my own SQL on
a relational database to ensure optimal performance tuning.
So what do I do? I really hoped for OLAP and MDX on SSAS 2005 would do
the trick, but does it, and how?
I suspect that the solution would involve changing the dimensions I
want to be sample units into fact tables and thus introduce
many-to-many relations, which is handled by SSAS 2005.
/je |