Pentaho Analysis - Mondrian
  1. Pentaho Analysis - Mondrian
  2. MONDRIAN-962

Create support for Non-Additive and Semi-Additive Measures

    Details

    • Type: Improvement Improvement
    • Status: Open
    • Priority: Severe Severe
    • Resolution: Unresolved
    • Affects Version/s: 3.6.x (5.0.0 GA Backlog)
    • Component/s: None
    • Labels:
      None

      Description

      Please create support for Non-Additive measure creation in the schema.

      It is very difficult to go creating measures that handle all the roll-up and non-roll up cases all along the hierarchies. This would be handled much better schema side by standardized ways of declaring them.

        Issue Links

          Activity

          Hide
          Jake Cornelius added a comment -
          Can you search and de-dupe
          Show
          Jake Cornelius added a comment - Can you search and de-dupe
          Hide
          Jagdeesh.SS added a comment -
          Hi,

          Kindly include "semi-additive" measures as well.
          Request from the customer.

          Regards,
          Jagdeesh
          Show
          Jagdeesh.SS added a comment - Hi, Kindly include "semi-additive" measures as well. Request from the customer. Regards, Jagdeesh
          Hide
          Julian Hyde added a comment -
          Here is a specification for a possible feature to address this issue.

          The idea is to specify which dimensions a measure can roll up on. For example:

          <Measure name='Inventory units' column='inventory_units' aggregator='sum' datatype='integer' rollupExcept='Time'/>

          The 'rollupExcept' attribute has a comma-separated list of dimensions it cannot roll up on. Mondrian would have to find a fact or agg table that has the measure at that precise level of the Time dimension.

          If you specify 'rollupExcept=*' you are creating a non-additive measure.

          (If we implemented this feature, I would probably do the XML slightly differently, so we are not parsing asterisks and commas out of an XML attribute. But you get the idea.)

          This feature is not perfect. It doesn't allow you to say "you can roll up from month to year using the LAST_VALUE function", so you'd end up creating more agg tables than absolutely necessary. But I think it is a reasonable first step, and I can't envisage the perfect solution right now. It would be about 4 weeks of development effort, on top of mondrian-4.
          Show
          Julian Hyde added a comment - Here is a specification for a possible feature to address this issue. The idea is to specify which dimensions a measure can roll up on. For example: <Measure name='Inventory units' column='inventory_units' aggregator='sum' datatype='integer' rollupExcept='Time'/> The 'rollupExcept' attribute has a comma-separated list of dimensions it cannot roll up on. Mondrian would have to find a fact or agg table that has the measure at that precise level of the Time dimension. If you specify 'rollupExcept=*' you are creating a non-additive measure. (If we implemented this feature, I would probably do the XML slightly differently, so we are not parsing asterisks and commas out of an XML attribute. But you get the idea.) This feature is not perfect. It doesn't allow you to say "you can roll up from month to year using the LAST_VALUE function", so you'd end up creating more agg tables than absolutely necessary. But I think it is a reasonable first step, and I can't envisage the perfect solution right now. It would be about 4 weeks of development effort, on top of mondrian-4.
          Hide
          Julian Hyde added a comment -
          Design:

          Example #1: A measure similar to distinct-count that cannot be rolled up.
          <Measure name='Customer Count' aggregation='none'/>

          If there is not an aggregate table of the exact granularity, the result is null (and Mondrian logs a warning). It is impossible to calculate the value, because there is no known SQL aggregate function to compute it from the fact table rows.

          Example #2: A measure similar to distinct-count that can be rolled up along a particular dimension
          <Measure name='Customer Count' aggregation='none'>
          <Rollup dimension='Customer' aggregation='sum'/>
          </Measure>

          To illustrate that you can rollup distinct-count along the customer dimension but not the product dimension:
          If you know that 5 customers bought beer and 3 customers bought milk, then you can't compute how many customers bought beer or milk. (The number is between 5 and 8.)
          If you know that 7 male customers bought beer and 4 female customers bought beer, you know that 11 customers bought beer.

          If the value exists in an aggregate table for a set of customers, you can roll up customers.

          Example #3: Other forms of roll-up
          <Measure name='inventory' aggregation='sum'>
          <Rollup dimension='Time' aggregation='last'/>
          <Rollup dimension='Warehouse' aggregation='avg'/>
          </Measure>

          To compute ([Measures].[Inventory], [Warehouse].[Region 1], [Time].[2012].[5]), we first compute ([Measures].[Inventory], <warehouse>, [Time].[2012].[5]) for each warehouse, then avg them.


          Tasks:
          1. Implement aggregation='none'. Make sure to disable in-memory rollup. 10d.
          2. Implement <Rollup>, to allow rollup using sum along 1 or more dimensions. In-memory rollup disabled. 5d.
          3. Enable in-memory rollup when rollup directive is present. 5d.
          4. Implement <Rollup> to allow rollup using min, max or a mixture of sum, min, max along dimensions. May need to generate nested queries: 'SELECT sum… FROM (SELECT max… GROUP BY x, y, z) GROUP BY x, y'. ?d
          5. Implement <Rollup … aggregation='last'/>. ?d
          6. Implement user-defined rollups. ?d
          Show
          Julian Hyde added a comment - Design: Example #1: A measure similar to distinct-count that cannot be rolled up. <Measure name='Customer Count' aggregation='none'/> If there is not an aggregate table of the exact granularity, the result is null (and Mondrian logs a warning). It is impossible to calculate the value, because there is no known SQL aggregate function to compute it from the fact table rows. Example #2: A measure similar to distinct-count that can be rolled up along a particular dimension <Measure name='Customer Count' aggregation='none'> <Rollup dimension='Customer' aggregation='sum'/> </Measure> To illustrate that you can rollup distinct-count along the customer dimension but not the product dimension: If you know that 5 customers bought beer and 3 customers bought milk, then you can't compute how many customers bought beer or milk. (The number is between 5 and 8.) If you know that 7 male customers bought beer and 4 female customers bought beer, you know that 11 customers bought beer. If the value exists in an aggregate table for a set of customers, you can roll up customers. Example #3: Other forms of roll-up <Measure name='inventory' aggregation='sum'> <Rollup dimension='Time' aggregation='last'/> <Rollup dimension='Warehouse' aggregation='avg'/> </Measure> To compute ([Measures].[Inventory], [Warehouse].[Region 1], [Time].[2012].[5]), we first compute ([Measures].[Inventory], <warehouse>, [Time].[2012].[5]) for each warehouse, then avg them. Tasks: 1. Implement aggregation='none'. Make sure to disable in-memory rollup. 10d. 2. Implement <Rollup>, to allow rollup using sum along 1 or more dimensions. In-memory rollup disabled. 5d. 3. Enable in-memory rollup when rollup directive is present. 5d. 4. Implement <Rollup> to allow rollup using min, max or a mixture of sum, min, max along dimensions. May need to generate nested queries: 'SELECT sum… FROM (SELECT max… GROUP BY x, y, z) GROUP BY x, y'. ?d 5. Implement <Rollup … aggregation='last'/>. ?d 6. Implement user-defined rollups. ?d
          Hide
          Julian Hyde added a comment -
          1572 describes a case of a semi-additive measure over a many-to-many relationship.
          Show
          Julian Hyde added a comment - 1572 describes a case of a semi-additive measure over a many-to-many relationship.
          Hide
          Bryan Senseman added a comment - - edited
          I would like to add another use case around distinct counts. At the aggregate measure level tell mondrian that it is valid to re-calculate the distinct count from the aggregate table versus doing a summarization. This is especially useful when you are actually counting a dimensional key which is included in the aggregate! In my use case I have equipment, owners, locations, statuses and hours (plus alot more dimensionality that makes pre-calculating agg tables an impossible task), I can create aggregate tables on TIME that result in 90% row compression which would allow for VERY fast distinct counts, yet I have no way to tell mondrian this is ok.

          Maybe something as simple as:
                  <AggMeasure column="equipment_count" name="[Measures].[Equipment Count]" reCalc="true">
                  </AggMeasure>

          On a releated note; it appears Mondrian gets comfused if you have both a dimension and a measure on the same field and have aggregate tables. Mondrian never used the aggregate table until I did the following trick.
              <Measure name="Equipment Count" datatype="Integer" formatString="#,##0" aggregator="distinct-count" caption="Equipment Count" description="Count of distinct equipment items during the reporting period." visible="true">
                <MeasureExpression>
                  <SQL dialect="generic">
                    <![CDATA["fact_utilization_hourly"."equipment_id"]]>
                  </SQL>
                </MeasureExpression>
              </Measure>
          Show
          Bryan Senseman added a comment - - edited I would like to add another use case around distinct counts. At the aggregate measure level tell mondrian that it is valid to re-calculate the distinct count from the aggregate table versus doing a summarization. This is especially useful when you are actually counting a dimensional key which is included in the aggregate! In my use case I have equipment, owners, locations, statuses and hours (plus alot more dimensionality that makes pre-calculating agg tables an impossible task), I can create aggregate tables on TIME that result in 90% row compression which would allow for VERY fast distinct counts, yet I have no way to tell mondrian this is ok. Maybe something as simple as:         <AggMeasure column="equipment_count" name="[Measures].[Equipment Count]" reCalc="true">         </AggMeasure> On a releated note; it appears Mondrian gets comfused if you have both a dimension and a measure on the same field and have aggregate tables. Mondrian never used the aggregate table until I did the following trick.     <Measure name="Equipment Count" datatype="Integer" formatString="#,##0" aggregator="distinct-count" caption="Equipment Count" description="Count of distinct equipment items during the reporting period." visible="true">       <MeasureExpression>         <SQL dialect="generic">           <![CDATA["fact_utilization_hourly"."equipment_id"]]>         </SQL>       </MeasureExpression>     </Measure>
          Hide
          Sarah Baca added a comment - - edited
          This needs to be broken down. Prioritize the work within the case, please. Non-additive, semi-additive measurement requirements.
          Show
          Sarah Baca added a comment - - edited This needs to be broken down. Prioritize the work within the case, please. Non-additive, semi-additive measurement requirements.

            People

            • Assignee:
              Unassigned User
              Reporter:
              Brandon Jackson
            • Votes:
              18 Vote for this issue
              Watchers:
              22 Start watching this issue

              Dates

              • Created:
                Updated: