Pentaho Analysis - Mondrian
  1. Pentaho Analysis - Mondrian
  2. MONDRIAN-1133

Issue with native filter evaluation always joining fact table

    Details

    • QA Validation Status:
      Not Yet Validated

      Description

      When the property mondrian.native.filter.enable is set to true, Mondrian will evaluate some Filter MDX functions in SQL. The function may or may not have references to measures from the fact table. However, Mondrian is always including a join to the fact table regardless if it is needed or not.

      Ex.

      With
      Set [*NATIVE_CJ_SET] as '[*BASE_MEMBERS_Product]'
      Set [*SORTED_ROW_AXIS] as 'Order([*CJ_ROW_AXIS],[Measures].[*FORMATTED_MEASURE_2],BASC)'
      Set [*BASE_MEMBERS_Product] as 'Filter([Product].[Product Name].Members,[Product].CurrentMember.Caption Matches ("(?i).*cdr.*"))'
      Set [*BASE_MEMBERS_Measures] as '{[Measures].[*FORMATTED_MEASURE_2],[Measures].[*FORMATTED_MEASURE_0],[Measures].[*FORMATTED_MEASURE_1]}'
      Set [*CJ_ROW_AXIS] as 'Generate([*NATIVE_CJ_SET], {([Product].currentMember)})'
      Set [*CJ_COL_AXIS] as '[*NATIVE_CJ_SET]'
      Member [Measures].[*FORMATTED_MEASURE_2] as '[Measures].[Unit Sales]', FORMAT_STRING = 'Standard', SOLVE_ORDER=400
      Member [Measures].[*FORMATTED_MEASURE_1] as '[Measures].[Sales Count]', FORMAT_STRING = '#,###', SOLVE_ORDER=400
      Member [Measures].[*FORMATTED_MEASURE_0] as '[Measures].[Store Sales]', FORMAT_STRING = '#,###', SOLVE_ORDER=400
      Select
      [*BASE_MEMBERS_Measures] on columns,
      [*SORTED_ROW_AXIS] on rows
      From [Sales]


      Result in this SQL:

      HighCardSqlTupleReader.readTuples [[Product].[Product Name]]: done executing sql [select `product_class`.`product_family` as `c0`, `product_class`.`product_department` as `c1`, `product_class`.`product_category` as `c2`, `product_class`.`product_subcategory` as `c3`, `product`.`brand_name` as `c4`, `product`.`product_name` as `c5` from `FOODMART`.`product` as `product`, `FOODMART`.`product_class` as `product_class`, `FOODMART`.`sales_fact_1997` as `sales_fact_1997` where `product`.`product_class_id` = `product_class`.`product_class_id` and `sales_fact_1997`.`product_id` = `product`.`product_id` group by `product_class`.`product_family`, `product_class`.`product_department`, `product_class`.`product_category`, `product_class`.`product_subcategory`, `product`.`brand_name`, `product`.`product_name` having UPPER(c5) REGEXP '.*cdr.*' order by ISNULL(`product_class`.`product_family`) ASC, `product_class`.`product_family` ASC, ISNULL(`product_class`.`product_department`) ASC, `product_class`.`product_department` ASC, ISNULL(`product_class`.`product_category`) ASC, `product_class`.`product_category` ASC, ISNULL(`product_class`.`product_subcategory`) ASC, `product_class`.`product_subcategory` ASC, ISNULL(`product`.`brand_name`) ASC, `product`.`brand_name` ASC, ISNULL(`product`.`product_name`) ASC, `product`.`product_name` ASC], exec+fetch 284 ms, 28 rows

      The fact table join is incorrect and actually results in the wrong result.

      If we set mondrian.native.filter.enable to false, then we get the correct result:

      SqlTupleReader.readTuples [[Product].[Product Name]]: executing sql [select `product_class`.`product_family` as `c0`, `product_class`.`product_department` as `c1`, `product_class`.`product_category` as `c2`, `product_class`.`product_subcategory` as `c3`, `product`.`brand_name` as `c4`, `product`.`product_name` as `c5` from `FOODMART`.`product` as `product`, `FOODMART`.`product_class` as `product_class` where `product`.`product_class_id` = `product_class`.`product_class_id` group by `product_class`.`product_family`, `product_class`.`product_department`, `product_class`.`product_category`, `product_class`.`product_subcategory`, `product`.`brand_name`, `product`.`product_name` order by ISNULL(`product_class`.`product_family`) ASC, `product_class`.`product_family` ASC, ISNULL(`product_class`.`product_department`) ASC, `product_class`.`product_department` ASC, ISNULL(`product_class`.`product_category`) ASC, `product_class`.`product_category` ASC, ISNULL(`product_class`.`product_subcategory`) ASC, `product_class`.`product_subcategory` ASC, ISNULL(`product`.`brand_name`) ASC, `product`.`brand_name` ASC, ISNULL(`product`.`product_name`) ASC, `product`.`product_name` ASC], exec 23 ms

      However, now we don't get the benefit of native evaluation of the contains/regexp filter which would kill performance on large dimensions.
      1. Last Analyzer Query Log.htm
        7 kB
        Benny Chow
      2. mondrian.properties
        6 kB
        Benny Chow

        Issue Links

          Activity

          Hide
          Julian Hyde added a comment -
          I agree this is a serious problem. However I would like to see the bug further characterized.

          1. What is the minimal set of properties that causes this behavior?
          2. What is the minimal MDX query that causes this behavior?
          Show
          Julian Hyde added a comment - I agree this is a serious problem. However I would like to see the bug further characterized. 1. What is the minimal set of properties that causes this behavior? 2. What is the minimal MDX query that causes this behavior?
          Hide
          Benny Chow added a comment -
          SQL Log with HighCardSqlTupleReader and unncessary join to fact table
          Show
          Benny Chow added a comment - SQL Log with HighCardSqlTupleReader and unncessary join to fact table
          Hide
          Benny Chow added a comment -
          Here's a simple MDX to reproduce the problem:

          With
          Set [*BASE_MEMBERS_Product] as 'Filter([Product].[Product Name].Members,[Product].CurrentMember.Caption Matches ("(?i).*cdr.*"))'
          Select
          [*BASE_MEMBERS_Product] on columns
          From [Sales]

          Please use the attached mondrian.properties
          Show
          Benny Chow added a comment - Here's a simple MDX to reproduce the problem: With Set [*BASE_MEMBERS_Product] as 'Filter([Product].[Product Name].Members,[Product].CurrentMember.Caption Matches ("(?i).*cdr.*"))' Select [*BASE_MEMBERS_Product] on columns From [Sales] Please use the attached mondrian.properties
          Hide
          Luc Boudreau added a comment - - edited
          Fixed in https://github.com/pentaho/mondrian/commit/f37603f44e08d6d2f20dbd53b6f359348102f7ec

          To reproduce, create a schema with the following content:

          ---------------------------------------------------------
          <?xml version="1.0"?>
          <Schema name="custom">
            <Dimension name="Store">
              <Hierarchy hasAll="true" primaryKey="store_id">
                <Table name="store"/>
                <Level name="Store Country" column="store_country" uniqueMembers="true"/>
                <Level name="Store State" column="store_state" uniqueMembers="true"/>
                <Level name="Store City" column="store_city" uniqueMembers="false"/>
                <Level name="Store Name" column="store_name" uniqueMembers="true">
                </Level>
              </Hierarchy>
            </Dimension>
            <Dimension name="Time" type="TimeDimension">
              <Hierarchy hasAll="true" primaryKey="time_id">
                <Table name="time_by_day"/>
                <Level name="Year" column="the_year" type="Numeric" uniqueMembers="true"
                    levelType="TimeYears"/>
                <Level name="Quarter" column="quarter" uniqueMembers="false"
                    levelType="TimeQuarters"/>
                <Level name="Month" column="month_of_year" uniqueMembers="false" type="Numeric"
                    levelType="TimeMonths"/>
              </Hierarchy>
            </Dimension>
            <Cube name="Sales1" defaultMeasure="Unit Sales">
              <Table name="sales_fact_1997">
              </Table>
              <DimensionUsage name="Store" source="Store" foreignKey="store_id"/>
              <DimensionUsage name="Time" source="Time" foreignKey="time_id"/>
              <Measure name="Unit Sales" column="unit_sales" aggregator="sum"
                formatString="Standard"/>
              <Measure name="Store Cost" column="store_cost" aggregator="sum"
                formatString="#,###.00"/>
              <Measure name="Store Sales" column="store_sales" aggregator="sum"
                formatString="#,###.00"/>
            </Cube>
          </Schema>
          ---------------------------------------------------------

          Deploy it in the BI server and configure datasources.xml so that it points to the FoodMart datasource. Then login to the BI server and launch the following MDX.

          ---------------------------------------------------------
          With
          Set [*BASE_MEMBERS_Product] as 'Filter([Store].[Store State].Members,[Store].CurrentMember.Caption Matches ("(?i).*CA.*"))'
          Select
          [*BASE_MEMBERS_Product] on columns
          From [Sales1] ]
          ---------------------------------------------------------


          and check the logs for the following SQL.

          ---------------------------------------------------------
          select
              `store`.`store_country` as `c0`,
              `store`.`store_state` as `c1`
          from
              `store` as `store`
          group by
              `store`.`store_country`,
              `store`.`store_state`
          having
              c1 REGEXP '.*CA.*'
          order by
              ISNULL(`store`.`store_country`) ASC, `store`.`store_country` ASC,
              ISNULL(`store`.`store_state`) ASC, `store`.`store_state` ASC
          ---------------------------------------------------------

          Notice that the fact table is not joined to.
          Show
          Luc Boudreau added a comment - - edited Fixed in https://github.com/pentaho/mondrian/commit/f37603f44e08d6d2f20dbd53b6f359348102f7ec To reproduce, create a schema with the following content: --------------------------------------------------------- <?xml version="1.0"?> <Schema name="custom">   <Dimension name="Store">     <Hierarchy hasAll="true" primaryKey="store_id">       <Table name="store"/>       <Level name="Store Country" column="store_country" uniqueMembers="true"/>       <Level name="Store State" column="store_state" uniqueMembers="true"/>       <Level name="Store City" column="store_city" uniqueMembers="false"/>       <Level name="Store Name" column="store_name" uniqueMembers="true">       </Level>     </Hierarchy>   </Dimension>   <Dimension name="Time" type="TimeDimension">     <Hierarchy hasAll="true" primaryKey="time_id">       <Table name="time_by_day"/>       <Level name="Year" column="the_year" type="Numeric" uniqueMembers="true"           levelType="TimeYears"/>       <Level name="Quarter" column="quarter" uniqueMembers="false"           levelType="TimeQuarters"/>       <Level name="Month" column="month_of_year" uniqueMembers="false" type="Numeric"           levelType="TimeMonths"/>     </Hierarchy>   </Dimension>   <Cube name="Sales1" defaultMeasure="Unit Sales">     <Table name="sales_fact_1997">     </Table>     <DimensionUsage name="Store" source="Store" foreignKey="store_id"/>     <DimensionUsage name="Time" source="Time" foreignKey="time_id"/>     <Measure name="Unit Sales" column="unit_sales" aggregator="sum"       formatString="Standard"/>     <Measure name="Store Cost" column="store_cost" aggregator="sum"       formatString="#,###.00"/>     <Measure name="Store Sales" column="store_sales" aggregator="sum"       formatString="#,###.00"/>   </Cube> </Schema> --------------------------------------------------------- Deploy it in the BI server and configure datasources.xml so that it points to the FoodMart datasource. Then login to the BI server and launch the following MDX. --------------------------------------------------------- With Set [*BASE_MEMBERS_Product] as 'Filter([Store].[Store State].Members,[Store].CurrentMember.Caption Matches ("(?i).*CA.*"))' Select [*BASE_MEMBERS_Product] on columns From [Sales1] ] --------------------------------------------------------- and check the logs for the following SQL. --------------------------------------------------------- select     `store`.`store_country` as `c0`,     `store`.`store_state` as `c1` from     `store` as `store` group by     `store`.`store_country`,     `store`.`store_state` having     c1 REGEXP '.*CA.*' order by     ISNULL(`store`.`store_country`) ASC, `store`.`store_country` ASC,     ISNULL(`store`.`store_state`) ASC, `store`.`store_state` ASC --------------------------------------------------------- Notice that the fact table is not joined to.
          Hide
          Luc Boudreau added a comment -
          Just checked in a better fix for MONDRIAN-1133 and fixes regressions MONDRIAN-1248 and MONDRIAN-1246. When deciding whether or not to join the fact table, the RolapNativeFilter was not very smart. I've used a MdxVisitor to go through the filter expression and spot all measures. If one is present, we have to join on the fact table. Additionally, this fix introduced a regression when Role objects come in after the fact and added more conditionals. They were presuming that the fact table was already part of the query but they were not mandatory anymore. They will now join correctly to the required tables according to the graph.

          ref: https://github.com/pentaho/mondrian/commit/2abe132f83a8a162ca19bdaf005b60fbd9fdf814
          Show
          Luc Boudreau added a comment - Just checked in a better fix for MONDRIAN-1133 and fixes regressions MONDRIAN-1248 and MONDRIAN-1246 . When deciding whether or not to join the fact table, the RolapNativeFilter was not very smart. I've used a MdxVisitor to go through the filter expression and spot all measures. If one is present, we have to join on the fact table. Additionally, this fix introduced a regression when Role objects come in after the fact and added more conditionals. They were presuming that the fact table was already part of the query but they were not mandatory anymore. They will now join correctly to the required tables according to the graph. ref: https://github.com/pentaho/mondrian/commit/2abe132f83a8a162ca19bdaf005b60fbd9fdf814
          Hide
          Li Deng added a comment - - edited
          Followed the steps above and verified that the fact table is now in the query:

          Segment.load: done executing sql [select "store"."store_state" as "c0", sum("sales_fact_1997"."unit_sales") as "m0" from "store" as "store", "sales_fact_1997" as "sales_fact_1997" where "sales_fact_1997"."store_id" = "store"."store_id" and "store"."store_state" in ('CA', 'Yucatan', 'Zacatecas') group by "c0"], exec+fetch 164 ms, 1 rows, ex=74, close=74, open=[]
          Show
          Li Deng added a comment - - edited Followed the steps above and verified that the fact table is now in the query: Segment.load: done executing sql [select "store"."store_state" as "c0", sum("sales_fact_1997"."unit_sales") as "m0" from "store" as "store", "sales_fact_1997" as "sales_fact_1997" where "sales_fact_1997"."store_id" = "store"."store_id" and "store"."store_state" in ('CA', 'Yucatan', 'Zacatecas') group by "c0"], exec+fetch 164 ms, 1 rows, ex=74, close=74, open=[]

            People

            • Assignee:
              Li Deng
              Reporter:
              Benny Chow
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: