Details

    • Type: Bug
    • Status: Closed
    • Severity: Urgent
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.0.0 GA
    • Component/s: Aggregation Tables
    • Labels:
      None
    • Notice:
      When an issue is open, the "Fix Version/s" field conveys a target, not necessarily a commitment. When an issue is closed, the "Fix Version/s" field conveys the version that the issue was fixed in.
    • Sprint Team:
      Maintenance

      Description

      When aggregate tables are used and AVG is calculated among measures, it is calculated incorrectly.

      The SQL query looks like this :

      select
      sum(`id`) as `m0`,
      sum(`measure_avg`) / sum(`fact_count`) as `m1`,
      sum(`measure_sum`) as `m2`
      from
      `agg_c_101_something`

      Indeed, sum of averages divided by sum of number of facts is not equal to average.

      Possible solution: always calculate sum in aggregate tables instead of avg, using

      INSERT INTO agg_l_101_something (
      id,
      measure_avg,
      measure_sum,
      fact_count)
      SELECT
      COUNT AS `id`,
      SUM(`measure`) AS `measure_avg`, /* NOW AVG */
      SUM(`measure`) AS `measure_sum`,
      COUNT AS `fact_count`
      FROM
      etc.

      in this case the result will be correct.

      ----------------

      NOTE: created 2 new cases (MONDRIAN-2398, MONDRIAN-2399) to cover variations of bad behavior with averages in aggregate tables. We've focused this case specifically on the issue of average values from aggregate tables being mishandled when a query hits the lowest granularity of the table. NULL handling and issues with aggregation method selection are covered under the other 2 cases.

        Issue Links

          Activity

          Hide
          henninggodske Henning Godske added a comment - - edited

          Similar issue if some of the measures are NULL

          avg(1,2,null,3,4) = 2.5

          but

          sum(1,2,null,3,4) / sum(1,1,1,1,1) = 2.0

          As fact_count will still be 1 (or other positive value) if other measures in data table row is not null.

          Show
          henninggodske Henning Godske added a comment - - edited Similar issue if some of the measures are NULL avg(1,2,null,3,4) = 2.5 but sum(1,2,null,3,4) / sum(1,1,1,1,1) = 2.0 As fact_count will still be 1 (or other positive value) if other measures in data table row is not null.
          Hide
          henninggodske Henning Godske added a comment -

          Still no plans for a fix? This produces invalid data!
          Why is it not fixed?

          Show
          henninggodske Henning Godske added a comment - Still no plans for a fix? This produces invalid data! Why is it not fixed?
          Hide
          henninggodske Henning Godske added a comment -

          The description is NOT correct.

          Event if it was possible to always use the sum, the calculations are still wrong in the case some values are NULL. Please read previous comments.

          As aggregation values can be anything, including externally calculated values (like avg), the only way to ensure a proper average calculation is to call the database native avg function.

          Show
          henninggodske Henning Godske added a comment - The description is NOT correct. Event if it was possible to always use the sum, the calculations are still wrong in the case some values are NULL. Please read previous comments. As aggregation values can be anything, including externally calculated values (like avg), the only way to ensure a proper average calculation is to call the database native avg function.
          Hide
          mcampbell Matt Campbell added a comment -

          You're correct, presence of NULL values will result in incorrect results. I've created a separate defect for that issue (MONDRIAN-2398). Unfortunately that's only one of 3 identified issues with averages in aggregate tables. This case will cover the issue of bad data when queries are for the lowest granularity of the aggregate table (whether or not NULLs are present).

          Show
          mcampbell Matt Campbell added a comment - You're correct, presence of NULL values will result in incorrect results. I've created a separate defect for that issue ( MONDRIAN-2398 ). Unfortunately that's only one of 3 identified issues with averages in aggregate tables. This case will cover the issue of bad data when queries are for the lowest granularity of the aggregate table (whether or not NULLs are present).
          Hide
          andrey.khayrutdinov Andrey Khayrutdinov (Inactive) added a comment -

          Henning Godske, doesn't MONDRIAN-2398 describe the problem you are talking about?

          Show
          andrey.khayrutdinov Andrey Khayrutdinov (Inactive) added a comment - Henning Godske , doesn't MONDRIAN-2398 describe the problem you are talking about?

            People

            • Assignee:
              aliaksandr Aliaksandr Bialkevich
              Reporter:
              dedushka_shubin dedushka_shubin
            • Votes:
              3 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Agile