Details

    • Type: Bug
    • Status: Closed
    • Priority: Severe
    • Resolution: Fixed
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • Environment:
      mondrian 3.1.1; mondrian latest from p4

      Description

      We discovered that after applying the partial roll-up policy to several mid-size dimensions, query performance degrades severely.

      Attached is a modified FoodMart schema, with the following changes:
      1. The Customers dimension is shared, and the Sales cube uses the dimension 3 times, i.e. [Customers], [Customers2], and [Customers3].
      2. A role is created for each State+City combination, e.g. CA.Los Angeles, WA.Seattle, etc.
      3. Each role restricts access to dimensions [Customers], [Customers2], and [Customers3] for the particular State+City combination, with partial roll-up policy.
      4. A role named "Test" is created to union all the above roles.

      So, now we have partial roll-up policy on 3 dimensions with 10k members each.

      Using the schema, the simplest MDX of:

      SELECT FROM [Sales]

      takes more than 60 seconds to finish on my 2 GHz Core 2 Duo machine.

      I think this performance problem makes Mondrian a rather poor choice for a multi-tenant database, where a restriction has to be set on each and every dimension. If there is no viable solution to this, perhaps a warning should be added to http://mondrian.pentaho.org/documentation/schema.php#Rollup_policy
      1. FoodMart.xml
        117 kB
        Yap Sok Ann
      2. FoodMart2.xml
        60 kB
        Yap Sok Ann

        Activity

        Yap Sok Ann added a comment -
        modified FoodMart schema
        Yap Sok Ann added a comment -
        Forgot to mention that the MDX is run with the "Test" role.
        Yap Sok Ann added a comment -
        A related problem is that the query is just as slow in subsequent runs. I suppose no caching is being done for the partial roll-up policy?
        Julian Hyde added a comment -
        I suspect that there is a performance problem because of the sheer number of roles you have created. Possibly an O(n^2) loop in a piece of code looking up a role; I suspect it is in the Union role code, which was recently introduced.

        I'm guessing that the MDX runs fine when you run it as one of the tenant-specific roles, e.g. WA.Seattle. So multi-tenanting shouldn't be a problem.
        Julian Hyde added a comment -
        As I suspected... just ran the test case, and RoleImpl.getAccess is being called 371M times. This is a problem with union roles only.
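To illustrate how a union role can multiply access checks, here is a minimal sketch. It does not use Mondrian's actual Role or RoleImpl classes; SimpleRole, MemberRole, NaiveUnionRole and NaiveUnionDemo are hypothetical stand-ins, and the assumption is simply that a union answers each check by scanning its underlying roles until one grants access.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-in; NOT Mondrian's real Role interface.
interface SimpleRole {
    boolean canAccess(String member);
}

// A role that grants access to a fixed set of member names.
final class MemberRole implements SimpleRole {
    static long calls = 0; // counts lookups across all MemberRole instances
    private final Set<String> granted;

    MemberRole(Set<String> granted) {
        this.granted = granted;
    }

    @Override
    public boolean canAccess(String member) {
        calls++;
        return granted.contains(member);
    }
}

// Assumed behaviour: the union grants access if any underlying role does.
final class NaiveUnionRole implements SimpleRole {
    private final List<SimpleRole> roles;

    NaiveUnionRole(List<SimpleRole> roles) {
        this.roles = roles;
    }

    @Override
    public boolean canAccess(String member) {
        // A member that no role grants forces a scan of every underlying role.
        for (SimpleRole role : roles) {
            if (role.canAccess(member)) {
                return true;
            }
        }
        return false;
    }
}

final class NaiveUnionDemo {
    // Checks memberCount ungranted members against a union of roleCount roles
    // and returns how many underlying lookups were made.
    static long countLookups(int roleCount, int memberCount) {
        MemberRole.calls = 0;
        List<SimpleRole> roles = new ArrayList<>();
        for (int i = 0; i < roleCount; i++) {
            roles.add(new MemberRole(Collections.singleton("city" + i)));
        }
        SimpleRole union = new NaiveUnionRole(roles);
        for (int m = 0; m < memberCount; m++) {
            union.canAccess("unknown" + m);
        }
        return MemberRole.calls;
    }

    public static void main(String[] args) {
        System.out.println(countLookups(100, 1000)); // prints 100000
    }
}
```

In this sketch the number of underlying lookups is the product of the role count and the number of members checked, which is the kind of growth that makes a large union of roles expensive.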
        Yap Sok Ann added a comment -
        Ah, thanks for locating the problem so quickly. Now I have a better idea of how to work around this.

        Currently, we actually have a role for each Location-Company combination, e.g. WA.Seattle-Company A, CA.Los Angeles-Company B. This allows us to union them together as necessary, e.g. John can have access to all WA.Seattle data across companies, while Mary can have access to USA data for just Company B. This also allows us to change user permissions while keeping the Mondrian schema fairly static.

        Anyway, that's our use case for union roles. I guess for now we just have to use a one-to-one mapping between users and roles.
        Yap Sok Ann added a comment -
        I don't think union roles should take all the blame here. Attached is a modified schema with one big role consisting of many MemberGrant elements. The "SELECT FROM [Sales]" query takes 6 seconds with this schema, which is 10 times faster than with union roles but still far from stellar, especially since repeated runs also take 6 seconds.
        Julian Hyde added a comment -
        Improved performance significantly by adding implementation of role that caches results, rather than scanning through all underlying roles. Fixed in change 13067; will be in mondrian-3.1.5 and mondrian-4.0.
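The general idea of caching results instead of rescanning the underlying roles can be sketched as follows. This is not the code from change 13067 and does not use Mondrian's real API; AccessCheck, CachingUnionRole and CachingUnionDemo are hypothetical names, and the sketch assumes access decisions do not change while the cache is alive and that the role is used from a single thread.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in; NOT Mondrian's real Role interface.
interface AccessCheck {
    boolean canAccess(String member);
}

// Union role that remembers each answer so the underlying roles
// are scanned at most once per distinct member.
final class CachingUnionRole implements AccessCheck {
    private final List<AccessCheck> roles;
    private final Map<String, Boolean> cache = new HashMap<>();
    long scans = 0; // number of full scans over the underlying roles

    CachingUnionRole(List<AccessCheck> roles) {
        this.roles = roles;
    }

    @Override
    public boolean canAccess(String member) {
        Boolean cached = cache.get(member);
        if (cached != null) {
            return cached; // cache hit: no underlying role is consulted
        }
        scans++;
        boolean result = false;
        for (AccessCheck role : roles) {
            if (role.canAccess(member)) {
                result = true;
                break;
            }
        }
        cache.put(member, result); // both grants and denials are cached
        return result;
    }
}

final class CachingUnionDemo {
    public static void main(String[] args) {
        AccessCheck seattle = m -> m.equals("WA.Seattle");
        AccessCheck losAngeles = m -> m.equals("CA.Los Angeles");
        CachingUnionRole test = new CachingUnionRole(Arrays.asList(seattle, losAngeles));

        for (int i = 0; i < 3; i++) {
            test.canAccess("WA.Seattle");
        }
        System.out.println(test.scans); // prints 1
    }
}
```

The trade-off in a design like this is memory for speed: the cache grows with the number of distinct members checked, and it would need invalidating if grants could change at run time.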
        Yap Sok Ann added a comment -
        Excellent. The SELECT FROM [Sales] query with union roles only takes 2.4 seconds now, a 25x improvement in speed.

          People

          • Assignee:
            Julian Hyde
            Reporter:
            Yap Sok Ann
          • Votes:
            0
            Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved: