Pentaho Analysis - Mondrian
  1. Pentaho Analysis - Mondrian
  2. MONDRIAN-747

When joining a shared dimension into a cube at a level other than its leaf level, Mondrian gives wrong results.

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Cannot Reproduce
    • Affects Version/s: 3.1.6 (3.5.2 GA Suite Release)
    • Component/s: None
    • Labels:
      None
    • Customer Case:

      Description

      This is related to a customer case. The customer has tested this issue on the Foodmart DB. He has added a fact table (transformation attached).

      I observed the same behavior locally. Following are the case comments from the customer:

      ========================================================================
      I have a schema with 3 cubes (2 cubes and 1 virtual cube) and 2 shared dimensions.
      I want to use DimensionUsage - to join a shared dimension into a cube at a level other than its leaf level.
      So, in one of the cubes, I am using the following code : <DimensionUsage source="Store" level="state" name="Store" foreignKey="sales_state_province"> cf Schema2.xml attached.

      I am using foodmart data set available in mondrian demo folder (http://sourceforge.net/projects/mondrian/). I have created a transformation (ss.ktr attached) that create a new fact table named sales_fact_1997_test with the additional column sales_state_province. The aim is to have a cube with store dimension at the level of the city (leaf of the dimension) (cube1) and a second one (cube2) that use the same dimension but at diferent level (state : not the leaf). An additional requirement is the use of a virtual cube that uses this shared dimension and virtual measures that come from both cubes cube1 and cube2.

      The issues are the following one :

      1 : on the cube2
      open analyzer
      open cube2 : unexpected behaviour of analyzer : I can see the level city in my dimension store
      add the levels country and state to the cube
      add the measure unitsales2
      add the level city : it crashes with the error "java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.NullPointerException"

      2 : on the cube named virtual_cube
      open analyzer
      open virtual_cube
      add the levels country and state to the cube
      add the measure unitsales2
      add the measure unitsales1
      add the level city
      add the level product_name as column : it crashes with the error "java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.NullPointerException"

      Using the same measure and the same levels (country, state) in both cube cube1 and cube2 doesn't show the same result.
      It seems that the result from cube2 is X time bigger that cube1; X is the number of members "city" below the level "state".
      ========================================================================

      I have verified the behavior on both, JPivot and the Analyzer. The only difference in the two tools is that the JPivot does not crash due to the null pointer exception. But it does not give any output.

      I am attaching the schema and the transformation, to create/populate the "sales_fact_1997_test " table in foodmart, to the JIRA.
      1. Schema2.xml
        2 kB
        Rahul Gurjar
      2. ss.ktr
        15 kB
        Rahul Gurjar

        Issue Links

          Activity

          Hide
          Julian Hyde added a comment -
          Having a hard time reproducing this issue. I have created "sales_fact_1997_test" on oracle as follows:

          create table "sales_fact_1997_test" as select * from "sales_fact_1997";
          alter table "sales_fact_1997_test" add "sales_state_province" varchar2(30);
          update "sales_fact_1997_test" set "sales_state_province" = (
            select "store_state" from "store" where "store_id" = "sales_fact_1997_test"."store_id");
          commit;

          Is that the same effect as the kettle transform? Hope so.

          Using mondrian-3.2.0, I can several queries. The following queries all succeed, and all give at least one row with positive cell values.

          select [Measures].[unitsales2] on 0, [Store].members on 1 from [cube2]
          select [Measures].[unitsales2] on 0, [Store].[country].members on 1 from [cube2]
          select [Measures].[unitsales2] on 0, [Store].[state].members on 1 from [cube2]
          select [Measures].[unitsales2] on 0, [Store].[city].members on 1 from [cube2]
          select non empty [Measures].[unitsales2] on 0, non empty [Store].[state].members on 1 from [cube2]

          The following query gives no rows. It should probably fail, because Store is joined at the [state] level. If you care about this, log another bug:

            select non empty [Measures].[unitsales2] on 0, non empty [Store].[city].members on 1 from [cube2]

          For now, cannot reproduce this issue. To proceed, I will need (a) MDX of queries that fail, (b) confirm that the way I have created the table is equivalent to the kettle transform, (c) full error stacks (from tomcat log if it does not appear on the screen), (d) log of all SQL statements mondrian executes.
          Show
          Julian Hyde added a comment - Having a hard time reproducing this issue. I have created "sales_fact_1997_test" on oracle as follows: create table "sales_fact_1997_test" as select * from "sales_fact_1997"; alter table "sales_fact_1997_test" add "sales_state_province" varchar2(30); update "sales_fact_1997_test" set "sales_state_province" = (   select "store_state" from "store" where "store_id" = "sales_fact_1997_test"."store_id"); commit; Is that the same effect as the kettle transform? Hope so. Using mondrian-3.2.0, I can several queries. The following queries all succeed, and all give at least one row with positive cell values. select [Measures].[unitsales2] on 0, [Store].members on 1 from [cube2] select [Measures].[unitsales2] on 0, [Store].[country].members on 1 from [cube2] select [Measures].[unitsales2] on 0, [Store].[state].members on 1 from [cube2] select [Measures].[unitsales2] on 0, [Store].[city].members on 1 from [cube2] select non empty [Measures].[unitsales2] on 0, non empty [Store].[state].members on 1 from [cube2] The following query gives no rows. It should probably fail, because Store is joined at the [state] level. If you care about this, log another bug:   select non empty [Measures].[unitsales2] on 0, non empty [Store].[city].members on 1 from [cube2] For now, cannot reproduce this issue. To proceed, I will need (a) MDX of queries that fail, (b) confirm that the way I have created the table is equivalent to the kettle transform, (c) full error stacks (from tomcat log if it does not appear on the screen), (d) log of all SQL statements mondrian executes.
          Hide
          Erwan JEZEQUEL added a comment -
          You did not replicate my table "sales_fact_1997_test". Indeed, you have created a table in which you have as many lines as in the original one. My idea was to agregate measures (store_sales, store_cost, unit_sales) by sales_state_province on the new table sales_fact_1997_test.

          So here is a new example to illustrate the issue.

          I am still working on the same database foodmart, on sales_fact_1998.
          I create this view aggregating on store_country.
          CREATE VIEW test AS
          SELECT `product_id`, `time_id`, `customer_id`, `promotion_id`, `store`.`store_country` , SUM(`store_sales`) as store_sales, SUM(`store_cost`) as store_cost, SUM(`unit_sales`) as unit_sales FROM sales_fact_1998 INNER JOIN store ON sales_fact_1998.store_id = store.store_id
          GROUP BY `product_id`, `time_id`, `customer_id`, `promotion_id`

          The new cube schema is attached and named schema_foodmart_2.xml

          select NON EMPTY {[Measures].[unitsales1]} ON COLUMNS,
            NON EMPTY {[Store].[Canada], [Store].[Mexico], [Store].[USA]} ON ROWS
          from [cube1]

          results cube1:
          Measures
          Store unitsales1
          Canada
          46,157
          Mexico
          203,914
          USA
          259,916

          log cube1 :
          2010-06-24 17:52:21,005 DEBUG [mondrian.sql] 0: Segment.load: executing sql [select sum("sales_fact_1998"."unit_sales") as "m0" from "sales_fact_1998" as "sales_fact_1998"]
          2010-06-24 17:52:21,068 DEBUG [mondrian.sql] 0: , exec 47 ms
          2010-06-24 17:52:21,114 DEBUG [mondrian.sql] 0: , exec+fetch 109 ms, 0 rows
          2010-06-24 17:52:39,786 DEBUG [mondrian.sql] 1: SqlMemberSource.getMemberChildren: executing sql [select "store"."store_country" as "c0" from "store" as "store" group by "store"."store_country" order by ISNULL("store"."store_country"), "store"."store_country" ASC]
          2010-06-24 17:52:39,802 DEBUG [mondrian.sql] 1: , exec 0 ms
          2010-06-24 17:52:39,802 DEBUG [mondrian.sql] 1: , exec+fetch 0 ms, 3 rows
          2010-06-24 17:52:39,880 DEBUG [mondrian.sql] 2: RolapStar.Column.getCardinality: executing sql [select count(distinct "store"."store_country") as "c0" from "store" as "store"]
          2010-06-24 17:52:39,880 DEBUG [mondrian.sql] 2: , exec 0 ms
          2010-06-24 17:52:39,880 DEBUG [mondrian.sql] 2: , exec+fetch 0 ms, 1 rows
          2010-06-24 17:52:39,880 DEBUG [mondrian.sql] 3: Segment.load: executing sql [select "store"."store_country" as "c0", sum("sales_fact_1998"."unit_sales") as "m0" from "store" as "store", "sales_fact_1998" as "sales_fact_1998" where "sales_fact_1998"."store_id" = "store"."store_id" group by "store"."store_country"]
          2010-06-24 17:52:40,646 DEBUG [mondrian.sql] 3: , exec 766 ms
          2010-06-24 17:52:40,646 DEBUG [mondrian.sql] 3: , exec+fetch 766 ms, 0 rows



          select NON EMPTY {[Measures].[unitsales2]} ON COLUMNS,
            NON EMPTY {[Store].[Canada], [Store].[Mexico], [Store].[USA]} ON ROWS
          from [cube2]

          results cube2:
          Measures
          Store unitsales2
          Canada
          92,314
          Mexico
          1,835,226
          USA
          3,638,824

          log cube2
          2010-06-24 17:54:41,693 DEBUG [mondrian.sql] 4: Segment.load: executing sql [select "store"."store_country" as "c0", sum("test"."unit_sales") as "m0" from "store" as "store", "test" as "test" where "test"."store_country" = "store"."store_country" group by "store"."store_country"]
          2010-06-24 17:54:49,614 DEBUG [mondrian.sql] 4: , exec 7906 ms
          2010-06-24 17:54:49,614 DEBUG [mondrian.sql] 4: , exec+fetch 7906 ms, 0 rows


          I don't know if it is the correct behaviour of mondrian. I would have expeced the same results for both cube.
          I see the issue on the SQL queries generated that do a cartesian product for the mdx query on the cube2 : the result is multiplied by the number of stores inside the country.

          Thank you.
          Show
          Erwan JEZEQUEL added a comment - You did not replicate my table "sales_fact_1997_test". Indeed, you have created a table in which you have as many lines as in the original one. My idea was to agregate measures (store_sales, store_cost, unit_sales) by sales_state_province on the new table sales_fact_1997_test. So here is a new example to illustrate the issue. I am still working on the same database foodmart, on sales_fact_1998. I create this view aggregating on store_country. CREATE VIEW test AS SELECT `product_id`, `time_id`, `customer_id`, `promotion_id`, `store`.`store_country` , SUM(`store_sales`) as store_sales, SUM(`store_cost`) as store_cost, SUM(`unit_sales`) as unit_sales FROM sales_fact_1998 INNER JOIN store ON sales_fact_1998.store_id = store.store_id GROUP BY `product_id`, `time_id`, `customer_id`, `promotion_id` The new cube schema is attached and named schema_foodmart_2.xml select NON EMPTY {[Measures].[unitsales1]} ON COLUMNS,   NON EMPTY {[Store].[Canada], [Store].[Mexico], [Store].[USA]} ON ROWS from [cube1] results cube1: Measures Store unitsales1 Canada 46,157 Mexico 203,914 USA 259,916 log cube1 : 2010-06-24 17:52:21,005 DEBUG [mondrian.sql] 0: Segment.load: executing sql [select sum("sales_fact_1998"."unit_sales") as "m0" from "sales_fact_1998" as "sales_fact_1998"] 2010-06-24 17:52:21,068 DEBUG [mondrian.sql] 0: , exec 47 ms 2010-06-24 17:52:21,114 DEBUG [mondrian.sql] 0: , exec+fetch 109 ms, 0 rows 2010-06-24 17:52:39,786 DEBUG [mondrian.sql] 1: SqlMemberSource.getMemberChildren: executing sql [select "store"."store_country" as "c0" from "store" as "store" group by "store"."store_country" order by ISNULL("store"."store_country"), "store"."store_country" ASC] 2010-06-24 17:52:39,802 DEBUG [mondrian.sql] 1: , exec 0 ms 2010-06-24 17:52:39,802 DEBUG [mondrian.sql] 1: , exec+fetch 0 ms, 3 rows 2010-06-24 17:52:39,880 DEBUG [mondrian.sql] 2: RolapStar.Column.getCardinality: executing sql [select count(distinct "store"."store_country") as "c0" from "store" as "store"] 2010-06-24 17:52:39,880 DEBUG [mondrian.sql] 2: , exec 0 ms 2010-06-24 17:52:39,880 DEBUG [mondrian.sql] 2: , exec+fetch 0 ms, 1 rows 2010-06-24 17:52:39,880 DEBUG [mondrian.sql] 3: Segment.load: executing sql [select "store"."store_country" as "c0", sum("sales_fact_1998"."unit_sales") as "m0" from "store" as "store", "sales_fact_1998" as "sales_fact_1998" where "sales_fact_1998"."store_id" = "store"."store_id" group by "store"."store_country"] 2010-06-24 17:52:40,646 DEBUG [mondrian.sql] 3: , exec 766 ms 2010-06-24 17:52:40,646 DEBUG [mondrian.sql] 3: , exec+fetch 766 ms, 0 rows select NON EMPTY {[Measures].[unitsales2]} ON COLUMNS,   NON EMPTY {[Store].[Canada], [Store].[Mexico], [Store].[USA]} ON ROWS from [cube2] results cube2: Measures Store unitsales2 Canada 92,314 Mexico 1,835,226 USA 3,638,824 log cube2 2010-06-24 17:54:41,693 DEBUG [mondrian.sql] 4: Segment.load: executing sql [select "store"."store_country" as "c0", sum("test"."unit_sales") as "m0" from "store" as "store", "test" as "test" where "test"."store_country" = "store"."store_country" group by "store"."store_country"] 2010-06-24 17:54:49,614 DEBUG [mondrian.sql] 4: , exec 7906 ms 2010-06-24 17:54:49,614 DEBUG [mondrian.sql] 4: , exec+fetch 7906 ms, 0 rows I don't know if it is the correct behaviour of mondrian. I would have expeced the same results for both cube. I see the issue on the SQL queries generated that do a cartesian product for the mdx query on the cube2 : the result is multiplied by the number of stores inside the country. Thank you.
          Hide
          Erwan JEZEQUEL added a comment -
          Cube definition of schema_foodmart_2.xml

          <Schema name="Test_DimensionUsage">
              <Dimension type="StandardDimension" highCardinality="false" name="Store">
                  <Hierarchy hasAll="true" primaryKey="store_id">
                      <Table name="store">
                      </Table>
                      <Level name="country" column="store_country" type="String" uniqueMembers="false" levelType="Regular" hideMemberIf="Never">
                      </Level>
                      <Level name="state" column="store_state" type="String" uniqueMembers="false" levelType="Regular" hideMemberIf="Never">
                      </Level>
                      <Level name="city" column="store_city" type="String" uniqueMembers="false" levelType="Regular" hideMemberIf="Never">
                      </Level>
                  </Hierarchy>
              </Dimension>
              <Dimension type="StandardDimension" highCardinality="false" name="Product">
                  <Hierarchy name="New Hierarchy 0" hasAll="true" primaryKey="product_id">
                      <Table name="product">
                      </Table>
                      <Level name="product_name" column="product_name" type="String" uniqueMembers="false" levelType="Regular" hideMemberIf="Never">
                      </Level>
                  </Hierarchy>
              </Dimension>
              <Cube name="cube1" cache="true" enabled="true">
                  <Table name="sales_fact_1998">
                  </Table>
                  <DimensionUsage source="Store" name="Store" foreignKey="store_id" highCardinality="false">
                  </DimensionUsage>
                  <DimensionUsage source="Product" name="Product" foreignKey="product_id" highCardinality="false">
                  </DimensionUsage>
                  <Measure name="unitsales1" column="unit_sales" datatype="Numeric" aggregator="sum" visible="true">
                  </Measure>
              </Cube>
              <Cube name="cube2" cache="true" enabled="true">
                  <Table name="test">
                  </Table>
                  <DimensionUsage source="Store" level="country" name="Store" foreignKey="store_country" highCardinality="false">
                  </DimensionUsage>
                  <DimensionUsage source="Product" name="Product" foreignKey="product_id" highCardinality="false">
                  </DimensionUsage>
                  <Measure name="unitsales2" column="unit_sales" datatype="Numeric" aggregator="sum" visible="true">
                  </Measure>
              </Cube>
              <VirtualCube enabled="true" name="virtual_cube">
                  <VirtualCubeDimension highCardinality="false" name="Store">
                  </VirtualCubeDimension>
                  <VirtualCubeDimension highCardinality="false" name="Product">
                  </VirtualCubeDimension>
                  <VirtualCubeMeasure cubeName="cube1" name="[Measures].[unitsales1]" visible="true">
                  </VirtualCubeMeasure>
                  <VirtualCubeMeasure cubeName="cube2" name="[Measures].[unitsales2]" visible="true">
                  </VirtualCubeMeasure>
              </VirtualCube>
          </Schema>
          Show
          Erwan JEZEQUEL added a comment - Cube definition of schema_foodmart_2.xml <Schema name="Test_DimensionUsage">     <Dimension type="StandardDimension" highCardinality="false" name="Store">         <Hierarchy hasAll="true" primaryKey="store_id">             <Table name="store">             </Table>             <Level name="country" column="store_country" type="String" uniqueMembers="false" levelType="Regular" hideMemberIf="Never">             </Level>             <Level name="state" column="store_state" type="String" uniqueMembers="false" levelType="Regular" hideMemberIf="Never">             </Level>             <Level name="city" column="store_city" type="String" uniqueMembers="false" levelType="Regular" hideMemberIf="Never">             </Level>         </Hierarchy>     </Dimension>     <Dimension type="StandardDimension" highCardinality="false" name="Product">         <Hierarchy name="New Hierarchy 0" hasAll="true" primaryKey="product_id">             <Table name="product">             </Table>             <Level name="product_name" column="product_name" type="String" uniqueMembers="false" levelType="Regular" hideMemberIf="Never">             </Level>         </Hierarchy>     </Dimension>     <Cube name="cube1" cache="true" enabled="true">         <Table name="sales_fact_1998">         </Table>         <DimensionUsage source="Store" name="Store" foreignKey="store_id" highCardinality="false">         </DimensionUsage>         <DimensionUsage source="Product" name="Product" foreignKey="product_id" highCardinality="false">         </DimensionUsage>         <Measure name="unitsales1" column="unit_sales" datatype="Numeric" aggregator="sum" visible="true">         </Measure>     </Cube>     <Cube name="cube2" cache="true" enabled="true">         <Table name="test">         </Table>         <DimensionUsage source="Store" level="country" name="Store" foreignKey="store_country" highCardinality="false">         </DimensionUsage>         <DimensionUsage source="Product" name="Product" foreignKey="product_id" highCardinality="false">         </DimensionUsage>         <Measure name="unitsales2" column="unit_sales" datatype="Numeric" aggregator="sum" visible="true">         </Measure>     </Cube>     <VirtualCube enabled="true" name="virtual_cube">         <VirtualCubeDimension highCardinality="false" name="Store">         </VirtualCubeDimension>         <VirtualCubeDimension highCardinality="false" name="Product">         </VirtualCubeDimension>         <VirtualCubeMeasure cubeName="cube1" name="[Measures].[unitsales1]" visible="true">         </VirtualCubeMeasure>         <VirtualCubeMeasure cubeName="cube2" name="[Measures].[unitsales2]" visible="true">         </VirtualCubeMeasure>     </VirtualCube> </Schema>
          Hide
          Julian Hyde added a comment -
          Erwan, Can you clarify. Does your test case produce a NullPointerException? That is the issue I was trying to reproduce.

          I know there may be other issues. I strongly suspect that there are. But if so, I would like to log them under a different bug number. We are chasing our tails here, lumping several issues into one.
          Show
          Julian Hyde added a comment - Erwan, Can you clarify. Does your test case produce a NullPointerException? That is the issue I was trying to reproduce. I know there may be other issues. I strongly suspect that there are. But if so, I would like to log them under a different bug number. We are chasing our tails here, lumping several issues into one.
          Hide
          Julian Hyde added a comment -
          In change 13724, I have checked in the test case I described in my comment dated 21/Jun/10. It already worked (or at least, does not produce an NPE), but I want to make sure that it stays working.
          Show
          Julian Hyde added a comment - In change 13724, I have checked in the test case I described in my comment dated 21/Jun/10. It already worked (or at least, does not produce an NPE), but I want to make sure that it stays working.
          Hide
          Erwan JEZEQUEL added a comment -
          Julian,

          my test doesn't produce NullPointerException. My issue is that the result (the value of measure inside the cube) is different from what I expect. And I am wondering if it is normal or not.
          Show
          Erwan JEZEQUEL added a comment - Julian, my test doesn't produce NullPointerException. My issue is that the result (the value of measure inside the cube) is different from what I expect. And I am wondering if it is normal or not.
          Hide
          Russ Weedon added a comment -
          Hi,

          I think I can see what's happening here. Rahul is trying to combine facts of differing granularity using a star schema (dimension levels all in one table). This results in results which are too large - for example, summary data with a 'state' grain connected to a store dimension with a 'city' grain should not produce a Null Pointer, but will produce results which are too large - in the example shown below, I'm guessing that Canada has 2 cities, Mexico 9 and USA 14, leading to results which are double, 9x and 14x the desired/correct ones. I'm not sure what impacts this has in Analyzer/Virtual Cube scenario, but it certainly won't deliver what you want. The correct way to handle this in Mondrian is (I believe) to snowflake the store dimension using a join as source and define a table for each level used in a fact table. That way, when the country level data are connected to the store dimension, they are joined to a 'country' table which has country as its primary key and are not multiplied.

          Sadly this does not appear to work for nested joins - so correctly combining fact data at country, state and city level would not currently be possible. I am logging a new bug for this - reference to follow.

          Hope I'm helping and not adding to the confusion!

          Russ
          Show
          Russ Weedon added a comment - Hi, I think I can see what's happening here. Rahul is trying to combine facts of differing granularity using a star schema (dimension levels all in one table). This results in results which are too large - for example, summary data with a 'state' grain connected to a store dimension with a 'city' grain should not produce a Null Pointer, but will produce results which are too large - in the example shown below, I'm guessing that Canada has 2 cities, Mexico 9 and USA 14, leading to results which are double, 9x and 14x the desired/correct ones. I'm not sure what impacts this has in Analyzer/Virtual Cube scenario, but it certainly won't deliver what you want. The correct way to handle this in Mondrian is (I believe) to snowflake the store dimension using a join as source and define a table for each level used in a fact table. That way, when the country level data are connected to the store dimension, they are joined to a 'country' table which has country as its primary key and are not multiplied. Sadly this does not appear to work for nested joins - so correctly combining fact data at country, state and city level would not currently be possible. I am logging a new bug for this - reference to follow. Hope I'm helping and not adding to the confusion! Russ
          Hide
          Russ Weedon added a comment -
          New JIRA Bug is #794
          Show
          Russ Weedon added a comment - New JIRA Bug is #794
          Hide
          Julian Hyde added a comment -
          I can reproduce now. See 'if (!Bug.Mondrian747Fixed) ...' I just added to SchemaTest. Total for [Stores].[All Stores] and [Stores].[USA] should be 266,773. It currently comes out higher due to cartesian-product behavior.
          Show
          Julian Hyde added a comment - I can reproduce now. See 'if (!Bug.Mondrian747Fixed) ...' I just added to SchemaTest. Total for [Stores].[All Stores] and [Stores].[USA] should be 266,773. It currently comes out higher due to cartesian-product behavior.

            People

            • Assignee:
              Julian Hyde
              Reporter:
              Rahul Gurjar
            • Votes:
              2 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: