  1. Pentaho Data Integration - Kettle
  2. PDI-8715

PDI Operations Mart: Out of memory exception on loading the data mart on larger log tables


      Description

      The four transformations that load the data mart (ETL - Operations Mart ...) do not take the start date into account in the SELECT query of the "Read log table information" step; rows are only filtered out later by the last run date.
      The result of "Get Last Run Date" should be applied in the SELECT query to avoid unnecessary workload.

      The JavaScript step "Read log table information" also loads all rows to be processed into memory, which throws an OutOfMemoryError (OOME) on larger log tables.
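
A minimal sketch of the requested behavior, assuming plain JDBC: the last run date is bound as a parameter in the WHERE clause, and rows are handed to a callback one at a time instead of being collected in memory. The class `LogTableReader`, the `RowHandler` interface, the fetch size and the table name passed in are hypothetical; only the `LOG_DATE` column comes from this issue, and this is not the code that was committed.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;

public class LogTableReader {

    /** Hypothetical callback, invoked once per row so rows are never buffered. */
    public interface RowHandler {
        void handle(ResultSet row) throws SQLException;
    }

    /**
     * Builds the SELECT with the last-run-date filter pushed into the query.
     * The table name is concatenated, so it must come from trusted configuration.
     */
    public static String buildSql(String logTable) {
        StringBuilder sql = new StringBuilder();
        sql.append("SELECT * FROM ").append(logTable);
        sql.append(" WHERE LOG_DATE > ?");
        return sql.toString();
    }

    /** Streams rows newer than lastRunDate to the handler and returns the row count. */
    public static long readSince(Connection conn, String logTable,
                                 Timestamp lastRunDate, RowHandler handler)
            throws SQLException {
        long count = 0;
        try (PreparedStatement ps = conn.prepareStatement(buildSql(logTable))) {
            ps.setFetchSize(500);            // a hint only; drivers may ignore it
            ps.setTimestamp(1, lastRunDate); // binds the "?" in the WHERE clause
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    handler.handle(rs);      // process one row, then move on
                    count++;
                }
            }
        }
        return count;
    }
}
```

Whether the driver actually streams the result set depends on the database and its JDBC settings, so the fetch size alone may not be enough to bound memory use.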

        Activity

        mburgess Matt Burgess added a comment -

        Committed Kettle 4.4.0 revision 40414 to change JS steps to UDJC steps. Still need to add the WHERE filter to the SQL query if possible.

        mburgess Matt Burgess added a comment -

        Committed revision 40497

        mburgess Matt Burgess added a comment -

        Dev tested in pdi-operations-mart CI build, assigned to Jens for verification

        jbleuel Jens Bleuel added a comment -

        The following three transformations have REPLAYDATE in the UDJC WHERE clause and write an exception to the log (but do not fail):
        1) ETL - Operations Datamart - Step
        2) ETL - Operations Datamart - Job Entry
        3) ETL - Operations Datamart - Performance

        It must be
        sql.append(" WHERE LOG_DATE > ?");

        Also, the transformation should stop when an exception is thrown, e.g. when the log table no longer exists.
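
The stop-on-exception behavior asked for above could look roughly like the following sketch, which rethrows a database error instead of only logging it. `FailFastRead`, `StepFailedException` and `SqlWork` are hypothetical names used for illustration; in a real UDJC step the engine's own exception type (for example `KettleException`) would presumably take their place.

```java
import java.sql.SQLException;

public class FailFastRead {

    /** Hypothetical stand-in for the engine's own step-failure exception. */
    public static class StepFailedException extends Exception {
        public StepFailedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Hypothetical unit of work that may fail with a SQLException. */
    public interface SqlWork {
        void run() throws SQLException;
    }

    /**
     * Runs the work and converts any SQLException into a failure that
     * propagates to the caller, rather than logging it and continuing.
     */
    public static void runOrFail(String stepName, SqlWork work)
            throws StepFailedException {
        try {
            work.run();
        } catch (SQLException e) {
            // Keep the original exception as the cause so the log shows the root error.
            throw new StepFailedException(
                "Step '" + stepName + "' failed: " + e.getMessage(), e);
        }
    }
}
```

With this shape, a missing log table surfaces as a failed step that the caller has to handle, not as a log entry on an otherwise successful run.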

        mburgess Matt Burgess added a comment -

        Committed revision 40757

        mburgess Matt Burgess added a comment -

        Dev re-tested in pdi-operations-mart CI build, assigned to Jens for verification

        jbleuel Jens Bleuel added a comment -

        Validated in CI build.


          People

          • Assignee:
            jbleuel Jens Bleuel
            Reporter:
            jbleuel Jens Bleuel