Pentaho Data Integration - Kettle / PDI-4212

Get data from XML step - Bad performance with multiple source files


    Details

    • Operating System/s:
      Windows XP

      Description

      We have found a performance issue when the "Get data from XML" step has to read multiple files (wildcard). In our test case we import about 400 XML files, each up to 500 MB. The transformation starts at a good speed of over 1000k rows/s, but after a few minutes it slows down to under 10 rows/s.

      Simply adding a "Get File Names" step in front and configuring the XML input with "XML source is defined in a field" + "XML source is a filename" makes the XML input step's throughput massively better, and it stays consistently high for the whole run.

      Could it be that the XML input step opens all files in parallel?
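      For comparison, the file-by-file pattern that the workaround produces (expand the wildcard into a list of names first, then parse one file at a time) can be sketched outside PDI. This is a hypothetical Python sketch, not Kettle's implementation: it assumes each record is a `<row>` element whose children are the fields, and the names `iter_rows` and `rows_from_wildcard` are made up for illustration. It does not establish what actually causes the slowdown in the step.

```python
import glob
import xml.etree.ElementTree as ET
from typing import Iterator


def iter_rows(paths: list[str], row_tag: str) -> Iterator[dict[str, str]]:
    """Yield one dict per <row_tag> element, reading the files strictly one after another."""
    for path in paths:
        # iterparse streams the file instead of building the whole tree up front;
        # only the current file is being read at any time
        context = ET.iterparse(path, events=("start", "end"))
        _, root = next(context)  # the first event is the start of the root element
        for event, elem in context:
            if event == "end" and elem.tag == row_tag:
                # child element text becomes the row's field values
                yield {child.tag: (child.text or "") for child in elem}
                # detach already-processed rows so memory does not grow with file size
                root.clear()


def rows_from_wildcard(pattern: str, row_tag: str) -> Iterator[dict[str, str]]:
    # Step 1 (comparable to "Get File Names"): expand the wildcard into a sorted list of names
    paths = sorted(glob.glob(pattern))
    # Step 2 (comparable to "XML source is a filename"): process each name in turn
    return iter_rows(paths, row_tag)
```

      Usage: `for row in rows_from_wildcard("/data/in/*.xml", "row"): ...` (the path is a placeholder). Clearing the root after each row is the usual idiom for keeping `iterparse` memory bounded on very large files.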


            People

            • Assignee:
              Sean Flatley (Inactive)
              Reporter:
              Michael Bieri
            • Votes:
              0
              Watchers:
              2
