Uploaded image for project: 'Pentaho Analysis - Mondrian'
  1. Pentaho Analysis - Mondrian
  2. MONDRIAN-2398

Wrong results when aggregate tables used to calculate avg value which rolls up one or more NULLs

    Details

    • Type: Bug
    • Status: Closed
    • Severity: Urgent
    • Resolution: Fixed
    • Affects Version/s: Master, 5.3.0.5, 5.4.0.2 GA
    • Fix Version/s: 7.1.0 GA
    • Component/s: None
    • Labels:
      None
    • Story Points:
      0
    • Notice:
      When an issue is open, the "Fix Version/s" field conveys a target, not necessarily a commitment. When an issue is closed, the "Fix Version/s" field conveys the version that the issue was fixed in.
    • Sprint Team:
      Maintenance

      Description

      When calculating an average from a summed rollup in the aggregate table, Mondrian will compute

      sum(factvalue) / fact_count

      I.e. the summed fact value over the count of the fact rows in that rollup. Unfortunately if any of the individual fact values is equal to NULL this calculation will differ from a SQL avg, which excludes NULLs.

      When this was discussed on the Mondrian dev group, the suggestion for a fix was to include new fact_count columns in the aggregate table that are specific to the measure, such that we have a count of non-NULLs to use for the avg calc.

      Possible workaround While this bug is outstanding, one workaround would be to create aggregate tables that are specific to the avg measure in question (i.e. no other measures present) and use a fact_count value that matches the count of non-NULLs. This workaround needs to be tested before suggesting for production use.

        Issue Links

          Activity

          Hide
          henninggodske Henning Godske added a comment -

          The use of fact_count shall be removed. It do only make sense when the aggregations are true sums of fact tables.
          Many other uses are possible for aggregation tables!

          In my case I have very different aggregation types per row in the data base table (sum, avg, max, etc). So fact_count is always set to 1.
          The aggregated data is basically not aggregated from the fact tables, but loaded as already aggregated and precalculated data from other systems using complex aggregation formulas (e.g. weighted average)). This is a very common case in a real-world production system.

          So only way to calculate a proper average value is to use the native "avg" function of the data-base. I think all data-bases have one. The same is true for other aggregation types.

          Show
          henninggodske Henning Godske added a comment - The use of fact_count shall be removed. It do only make sense when the aggregations are true sums of fact tables. Many other uses are possible for aggregation tables! In my case I have very different aggregation types per row in the data base table (sum, avg, max, etc). So fact_count is always set to 1. The aggregated data is basically not aggregated from the fact tables, but loaded as already aggregated and precalculated data from other systems using complex aggregation formulas (e.g. weighted average)). This is a very common case in a real-world production system. So only way to calculate a proper average value is to use the native "avg" function of the data-base. I think all data-bases have one. The same is true for other aggregation types.
          Hide
          henninggodske Henning Godske added a comment - - edited

          Still not fixed. How hard can it be to use native AVG function and skip Mondrians own (not working) function?

          Show
          henninggodske Henning Godske added a comment - - edited Still not fixed. How hard can it be to use native AVG function and skip Mondrians own (not working) function?

            People

            • Assignee:
              Unassigned
              Reporter:
              mcampbell Matt Campbell
            • Votes:
              2 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: