Pentaho Data Integration - Kettle

As an ETL developer, I would like a step for maintaining helper tables for multivalued dimensions

Details

  • Type: New Feature New Feature
  • Status: Open Open
  • Priority: Severe Severe
  • Resolution: Unresolved
  • Affects Version/s: None
  • Fix Version/s: Future Release
  • Component/s: Step
  • Customer Case:
  • Description:
    Hide
    Imagine you have a fact table joined to the helper table. The helper table is joined to the dimension table, but also includes a weighting factor column (such that fact rollups are accurate). This is used to handle many-to-many relationships in the dimension.
    Summarized nicely at: http://www.dbmsmag.com/9808d05.html

    Imagine I have two transactional records:

    1,one,$100
    2,two,$500.

    I also have the following records relating site to market:

    100,1,10
    101,1,20
    102,2,10

    Site 1 is part of market 10 & 20. Site 2 is just part of market 10.
    So you can see there is a many-to-many relationship between site and market.

    Ideally, I need a fact table built that would contain the following:

    fact key, join to helper table key, $50
    fact key, join to helper table key, $50
    fact key, join to helper table key, $500

    This way the sum of the revenue is correct.

    This is where the dimensional helper table comes in. It would look like the following:

    helper key, site id, market id, weighting
    200,1,10,0.5
    201,1,20,0.5
    202,2,10,1.0

    There must be a way to build the helper table dynamically. (i.e. weights are dependent on the number of many to many relationships to market for a given site).


    Show
    Imagine you have a fact table joined to the helper table. The helper table is joined to the dimension table, but also includes a weighting factor column (such that fact rollups are accurate). This is used to handle many-to-many relationships in the dimension. Summarized nicely at: http://www.dbmsmag.com/9808d05.html Imagine I have two transactional records: 1,one,$100 2,two,$500. I also have the following records relating site to market: 100,1,10 101,1,20 102,2,10 Site 1 is part of market 10 & 20. Site 2 is just part of market 10. So you can see there is a many-to-many relationship between site and market. Ideally, I need a fact table built that would contain the following: fact key, join to helper table key, $50 fact key, join to helper table key, $50 fact key, join to helper table key, $500 This way the sum of the revenue is correct. This is where the dimensional helper table comes in. It would look like the following: helper key, site id, market id, weighting 200,1,10,0.5 201,1,20,0.5 202,2,10,1.0 There must be a way to build the helper table dynamically. (i.e. weights are dependent on the number of many to many relationships to market for a given site).
  1. 3698_helperTable.zip
    (3 kB)
    Jens Bleuel
    14/Aug/08 8:51 AM

Activity

Jens Bleuel made changes - 14/Aug/08 8:43 AM
Field Original Value New Value
Component/s Step [ 10321 ]
Customer Case 3698
Hide
Jens Bleuel added a comment - 14/Aug/08 8:51 AM
Sample as a Transformation for starting with this...
Problem when counts don't divide equally, e.g. when there is a rest of division (need to add the rest to one of the parts).
Show
Jens Bleuel added a comment - 14/Aug/08 8:51 AM Sample as a Transformation for starting with this... Problem when counts don't divide equally, e.g. when there is a rest of division (need to add the rest to one of the parts).
Jens Bleuel made changes - 14/Aug/08 8:51 AM
Attachment 3698_helperTable.zip [ 14123 ]
Hide
Jens Bleuel added a comment - 14/Aug/08 8:52 AM
User comment:
I think it would be interesting to add this as a permanent step (along the same lines as the closure generator step) for others to use.

My suggestion is that you would give the step the group id in the dimension and the sub group id of the dimension. It would handle the calculations and generate the helper table with the dimensional key and add the weighting. In more complex examples as change data comes in, it should also be able to allow for versions and create and end dates for the helper table records.

Very powerful stuff with applicability in many industries (such an insurance with diagnosis groups, etcetera).
Show
Jens Bleuel added a comment - 14/Aug/08 8:52 AM User comment: I think it would be interesting to add this as a permanent step (along the same lines as the closure generator step) for others to use. My suggestion is that you would give the step the group id in the dimension and the sub group id of the dimension. It would handle the calculations and generate the helper table with the dimensional key and add the weighting. In more complex examples as change data comes in, it should also be able to allow for versions and create and end dates for the helper table records. Very powerful stuff with applicability in many industries (such an insurance with diagnosis groups, etcetera).
Jake Cornelius made changes - 16/Feb/09 9:05 AM
Status Open [ 1 ] Open [ 1 ]
Fix Version/s Future Release [ 10307 ]
Assignee Triage [ project admin ] Unassigned User [ unassigned ]
Jake Cornelius made changes - 26/Feb/09 8:09 AM
Customer Case 3698
Jake Cornelius made changes - 26/Feb/09 8:09 AM
Status Open [ 1 ] Open [ 1 ]
Priority Unknown [ 7 ] Severe [ 3 ]
Gretchen Moran made changes - 04/Aug/09 6:34 PM
Workflow Pentaho Engineering 3.0 Workflow [ 178440 ] Pentaho Engineering 4.0 Workflow [ 370393 ]
Gretchen Moran made changes - 04/Aug/09 6:37 PM
Workflow Pentaho Engineering 3.0 Workflow [ 370393 ] Pentaho Engineering 4.0 Workflow [ 372811 ]
Marc Batchelor made changes - 04/Aug/09 8:38 PM
Workflow Pentaho Engineering 4.0 Workflow [ 372811 ] Pentaho Engineering 5.0 Workflow [ 411108 ]
Marc Batchelor made changes - 17/Aug/09 10:16 AM
Workflow Pentaho Engineering 5.0 Workflow [ 411108 ] Pentaho Engineering 6.0 Workflow [ 426833 ]
Marc Batchelor made changes - 06/May/10 6:57 PM
Workflow Pentaho Engineering 6.0 Workflow [ 426833 ] Pentaho Engineering 7.0 Workflow [ 501901 ]
Marc Batchelor made changes - 04/Aug/10 10:19 PM
Workflow Pentaho Engineering 7.0 Workflow [ 501901 ] Pentaho Engineering 8.0 Workflow [ 524307 ]
Transition Status Change Time Execution Times Last Executer Last Execution Date
Open Open Open Open
186d 1h 22m
1
Jake Cornelius
16/Feb/09 9:05 AM
Open Open Open Open
9d 23h 4m
1
Jake Cornelius
26/Feb/09 8:09 AM

People

Dates

  • Created:
    14/Aug/08 8:42 AM
    Updated:
    26/Feb/09 8:09 AM