We have found a performance issue when the "Get data from XML" step has to read multiple files (wildcard). In our test case, we import about 400 XML files of up to 500 MB each. The transformation starts at a good speed, over 1000k r/s, but after a few minutes it slows down to < 10 r/s.
Simply adding a "Get File Names" step first, and configuring the XML input with "XML source is defined in a field" and "XML source is a filename", makes the throughput of the XML input step massively better, and it stays consistently high for the whole run.
It looks as if the XML input step opens all files in parallel?
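
For illustration only, the workaround can be pictured as a two-stage pipeline: first resolve the wildcard to a list of file names, then hand the names to the reader one at a time. The Python sketch below is not Kettle code and says nothing about how the step is actually implemented; the function names (`list_files`, `read_rows`, `read_all`) and the `row_tag` parameter are hypothetical, chosen only to show the pattern.

```python
import glob
import xml.etree.ElementTree as ET


def list_files(pattern):
    """Stage 1 (like "Get File Names"): resolve the wildcard; no file is opened."""
    return sorted(glob.glob(pattern))


def read_rows(path, row_tag):
    """Stage 2: stream a single file, yielding one dict per <row_tag> element."""
    for _event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == row_tag:
            yield {child.tag: child.text for child in elem}
            elem.clear()  # drop the element's children and text once emitted


def read_all(pattern, row_tag):
    """Process files strictly one after another, driven by the name list."""
    for path in list_files(pattern):
        yield from read_rows(path, row_tag)
```

In this shape only one file is being parsed at any moment, which is the behaviour we would expect from the "XML source is defined in a field" configuration.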