Pentaho Analysis - Mondrian
MONDRIAN-1130

Random IllegalArgumentException and NullPointerException exceptions when executing queries in parallel threads

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.4.1 GA (4.5.0 GA Suite Release)
    • Component/s: None
    • Labels: None
    • QA Validation Status: Not Yet Validated

      Description

      I am using Mondrian from JRuby through the mondrian-olap library (https://github.com/rsim/mondrian-olap).

      Recently I upgraded the mondrian-olap library from Mondrian 3.3.0 to 3.4.1, and in my production application I started to see random exceptions when executing queries. When I start many different queries against the same cube (each query runs in a separate thread in my JRuby application), I get random IllegalArgumentException and NullPointerException exceptions. Here are the tops of two stack traces recorded in my logs:

      org.olap4j.OlapException: mondrian gave exception while executing query
          "root cause: java.lang.IllegalArgumentException: ",
          "root cause: mondrian.rolap.RolapAggregator$1.aggregate(RolapAggregator.java:73)",
          "root cause: mondrian.rolap.agg.SegmentBuilder.rollup(SegmentBuilder.java:408)",
          "root cause: mondrian.rolap.FastBatchingCellReader.loadAggregations(FastBatchingCellReader.java:293)",
          "root cause: mondrian.rolap.RolapResult.phase(RolapResult.java:500)",
          "root cause: mondrian.rolap.RolapResult.executeBody(RolapResult.java:839)",
          "mondrian/olap4j/MondrianOlap4jConnection.java:759:in `createException'",
          "mondrian/olap4j/MondrianOlap4jStatement.java:421:in `executeOlapQueryInternal'",
          "mondrian/olap4j/MondrianOlap4jPreparedStatement.java:72:in `executeQuery'",
          "mondrian/olap4j/MondrianOlap4jPreparedStatement.java:42:in `executeQuery'",
          

      org.olap4j.OlapException: mondrian gave exception while executing query
          "root cause: java.lang.NullPointerException: ",
          "root cause: mondrian.rolap.RolapAggregator$1.aggregate(RolapAggregator.java:62)",
          "root cause: mondrian.rolap.agg.SegmentBuilder.rollup(SegmentBuilder.java:376)",
          "root cause: mondrian.rolap.FastBatchingCellReader.loadAggregations(FastBatchingCellReader.java:293)",
          "root cause: mondrian.rolap.RolapResult.phase(RolapResult.java:500)",
          "root cause: mondrian.rolap.RolapResult.loadMembers(RolapResult.java:609)",
          "mondrian/olap4j/MondrianOlap4jConnection.java:759:in `createException'",
          "mondrian/olap4j/MondrianOlap4jStatement.java:421:in `executeOlapQueryInternal'",
          "mondrian/olap4j/MondrianOlap4jPreparedStatement.java:72:in `executeQuery'",
          "mondrian/olap4j/MondrianOlap4jPreparedStatement.java:42:in `executeQuery'",

      It is quite hard to reproduce this issue on a sample schema. As additional information, here are some anonymized sample MDX statements that fail (different MDX statements fail on each run):

      SELECT NON EMPTY HIERARCHIZE([Time].[Year].Members) ON COLUMNS,
      NON EMPTY HIERARCHIZE([People].[Person].Members) ON ROWS
      FROM [Projects]
      WHERE ([Project Status].[Active], [Projects and Tasks].[Company 1], [Completion Status].[All Completion Statuses], [Measures].[Actual hours])

      SELECT NON EMPTY HIERARCHIZE([Projects and Tasks].[Company].Members) ON COLUMNS,
      NON EMPTY {[Measures].[Actual hours]} ON ROWS
      FROM [Projects]
      WHERE ([People].[Company 1].[Person 1], [Time.Weekly].[2011], [Project Status].[Active], [Completion Status].[All Completion Statuses])

      When I run the reports that execute these MDX statements sequentially, I do not get any errors. But when I run about 10 reports in parallel, at least one MDX execution always fails.
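For anyone trying to reproduce this outside JRuby, here is a minimal sketch of a parallel run driven from plain Java. The class name `ParallelQueryHarness` and the stand-in tasks are assumptions made for illustration; each task would have to be replaced by code that executes one MDX statement against the cube. The harness collects every failure instead of stopping at the first one, which helps when a different statement fails on each run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical harness, not part of Mondrian or mondrian-olap.
class ParallelQueryHarness {

    /** Runs all tasks on a fixed thread pool and returns the causes of the failed ones. */
    static List<Throwable> runInParallel(List<Callable<Object>> tasks, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Throwable> failures = new ArrayList<>();
        try {
            // invokeAll blocks until every task has either returned or thrown.
            for (Future<Object> future : pool.invokeAll(tasks)) {
                try {
                    future.get();
                } catch (ExecutionException e) {
                    // Keep the original exception thrown inside the task.
                    failures.add(e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
        return failures;
    }

    public static void main(String[] args) throws Exception {
        List<Callable<Object>> tasks = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            final int n = i;
            // Stand-in for executing one MDX statement; replace with a real query call.
            tasks.add(() -> "result " + n);
        }
        List<Throwable> failures = runInParallel(tasks, 10);
        System.out.println(failures.size() + " of " + tasks.size() + " tasks failed");
    }
}
```

Running the same task list with a pool size of 1 gives the sequential baseline for comparison.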

      When I downgraded mondrian.jar and olap4j.jar to the versions from Mondrian 3.3.0, I could no longer reproduce the issue. So I think the issue is caused by some change between Mondrian 3.3.0 and Mondrian 3.4.1 that made something no longer thread-safe.

      If you have any suggestions about what could be causing this issue and what additional debug information I should collect, please let me know and I will try to provide more details.

        Activity

        Raimonds Simanovskis added a comment -
        It would be good either to change the status from Resolved back to Open or to create a new issue for the unresolved problem that I described.

        I would be very happy to hear of any progress on this quite critical issue :)
        Luc Boudreau added a comment -
        This should all be fixed in
        https://github.com/pentaho/mondrian/commit/6ad920a317277855771d1a4dd0801d1d9f6ea6d4

        Instead of doing a bunch of instanceof checks I'm using the measure datatype to determine the correct class to aggregate to.
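As a rough illustration of the difference between the two approaches (this is not the code from the commit; the class `DatatypeDrivenSum`, the `Datatype` enum, and both methods are hypothetical), the sketch below contrasts a sum that guesses the result class from the first value with one that is told the datatype up front. Under these assumptions, the first variant throws IllegalArgumentException when the first value is null and NullPointerException when a later value is null.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not Mondrian's actual aggregator code.
class DatatypeDrivenSum {

    /** Hypothetical stand-in for a measure's declared datatype. */
    enum Datatype { INTEGER, NUMERIC }

    /** Fragile variant: infers the result class from the first value. */
    static Object sumByInstanceof(List<Object> values) {
        Object first = values.get(0);
        if (first instanceof Integer) {
            int sum = 0;
            for (Object v : values) {
                sum += (Integer) v; // unboxing a null value throws NullPointerException
            }
            return sum;
        }
        if (first instanceof Double) {
            double sum = 0;
            for (Object v : values) {
                sum += (Double) v;
            }
            return sum;
        }
        // A null first value matches neither check and ends up here.
        throw new IllegalArgumentException("unsupported value: " + first);
    }

    /** Sturdier variant: the declared datatype picks the result class; nulls are skipped. */
    static Object sumByDatatype(List<Object> values, Datatype datatype) {
        switch (datatype) {
        case INTEGER: {
            int sum = 0;
            for (Object v : values) {
                if (v != null) {
                    sum += ((Number) v).intValue();
                }
            }
            return sum;
        }
        case NUMERIC: {
            double sum = 0;
            for (Object v : values) {
                if (v != null) {
                    sum += ((Number) v).doubleValue();
                }
            }
            return sum;
        }
        default:
            throw new IllegalArgumentException("unsupported datatype: " + datatype);
        }
    }

    public static void main(String[] args) {
        List<Object> values = Arrays.asList(null, 2, 3);
        System.out.println(sumByDatatype(values, Datatype.INTEGER));
    }
}
```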
        Raimonds Simanovskis added a comment -
        I tried the latest build from http://ci.pentaho.com/job/mondrian-git/112/ and now all parallel queries (that previously gave random failures) completed successfully.

        So it seems that this issue is fixed. Thanks a lot for the solution :)
        Raimonds Simanovskis added a comment -
        Testing this latest Mondrian build in production with the same set of parallel queries, I got new error messages :(

        Here is the top of the stack trace:

        Mondrian::OLAP::Error: org.olap4j.OlapException: mondrian gave exception while executing query
            "root cause: java.lang.ClassCastException: java.util.ArrayList cannot be cast to mondrian.rolap.SmartRestrictedMemberReader$AccessAwareMemberList",
            "root cause: mondrian.rolap.SmartRestrictedMemberReader.getMemberChildren(SmartRestrictedMemberReader.java:37)",
            "root cause: mondrian.rolap.RolapSchemaReader.internalGetMemberChildren(RolapSchemaReader.java:167)",
            "root cause: mondrian.rolap.RolapSchemaReader.getMemberChildren(RolapSchemaReader.java:149)",
            "root cause: mondrian.olap.DelegatingSchemaReader.getMemberChildren(DelegatingSchemaReader.java:246)",
            "root cause: mondrian.olap.fun.FunUtil.getNonEmptyMemberChildren(FunUtil.java:2164)",
            "mondrian/olap4j/MondrianOlap4jConnection.java:827:in `createException'",
            "mondrian/olap4j/MondrianOlap4jStatement.java:421:in `executeOlapQueryInternal'",
            "mondrian/olap4j/MondrianOlap4jPreparedStatement.java:72:in `executeQuery'",
            "mondrian/olap4j/MondrianOlap4jPreparedStatement.java:42:in `executeQuery'"

        As I see in SmartRestrictedMemberReader.java, the failing code is

                        (AccessAwareMemberList) reader.cacheHelper
                            .getChildrenFromCache(parentMember, constraint);

        Do you have any idea why this sometimes fails, or do you need some output from jdb to find out what the getChildrenFromCache method call returns in this case?
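To make the failure mode concrete, here is a small sketch of how an unchecked cast of a cached value fails and how a type check avoids it. It does not claim to be the cause or the fix in Mondrian; `CheckedCacheRead`, `AccessAwareList`, and `readChecked` are hypothetical names that only mimic the stack trace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; names mimic the stack trace, not Mondrian's code.
class CheckedCacheRead {

    /** Hypothetical stand-in for a list subtype that carries extra access information. */
    static class AccessAwareList extends ArrayList<String> {
        AccessAwareList(List<String> members) {
            super(members);
        }
    }

    /** Returns the cached list only if it has the expected subtype, otherwise null. */
    static AccessAwareList readChecked(Object cached) {
        if (cached instanceof AccessAwareList) {
            return (AccessAwareList) cached;
        }
        // A plain ArrayList (or null) is treated as a cache miss instead of being cast blindly.
        return null;
    }

    public static void main(String[] args) {
        Object plain = new ArrayList<>(Arrays.asList("Person 1"));
        Object wrapped = new AccessAwareList(Arrays.asList("Person 1"));

        System.out.println("plain list accepted: " + (readChecked(plain) != null));
        System.out.println("wrapped list accepted: " + (readChecked(wrapped) != null));

        try {
            // The unchecked cast reproduces the kind of error seen in the log.
            AccessAwareList broken = (AccessAwareList) plain;
            System.out.println(broken.size());
        } catch (ClassCastException e) {
            System.out.println("unchecked cast failed with ClassCastException");
        }
    }
}
```

If two code paths write to the same cache key, one storing a plain list and the other the wrapper type, a reader that casts without checking would fail only when it happens to see the plain entry, which would fit an intermittent error under parallel load.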
        Luc Boudreau added a comment -
        This is a different issue. MONDRIAN-1259

        It should be fixed within the next few hours. Marking this issue as fixed for now.

          People

          • Assignee: Unassigned
          • Reporter: Raimonds Simanovskis
          • Votes: 2
          • Watchers: 6