  1. Pentaho Data Integration - Kettle
  2. PDI-8695

REST Client is very slow compared to HTTP Post



    • Type: Improvement
    • Status: Open
    • Severity: High
    • Resolution: Unresolved
    • Affects Version/s: 4.2.1 (4.1.0 GA Suite Release)
    • Fix Version/s: Backlog
    • Component/s: Step
    • Labels:
    • Environment:
      Windows 7
    • PDI Sub-component:


      To demonstrate the problem, I created two identical transformations that generate 1,000 rows of test input and POST the data to a web service.

      The HTTP Post example finishes in 1.2 seconds, with the HTTP Post step averaging 890 rows/second.

      The REST Client example, however, takes over 30 seconds to complete, with the REST Client step averaging only 32 rows/second.

      I can accept smaller performance discrepancies, but a slowdown of roughly 30:1 makes the REST Client unacceptable for my uses. For one thing, I am using PDI to update an Apache Solr search engine with millions of documents.

      I do not know exactly what is wrong with the REST Client implementation. It appears to use Jersey as its JAX-RS implementation. Jersey, however, is overkill for PDI: a java.net.HttpURLConnection would do just as well to connect to a web service, set the HTTP method and headers, send a request body, and receive a response body. None of those things require Jersey.
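
To illustrate the point about java.net.HttpURLConnection, here is a minimal sketch of a POST using only the JDK. The class name SimplePost and the method post are hypothetical names chosen for this example, not part of PDI, and the sketch leaves out timeouts, authentication, and retries that a real step would need.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not PDI code.
public class SimplePost {

    /** Sends body as an HTTP POST and returns the response body as a UTF-8 string. */
    public static String post(String url, String contentType, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("POST");                        // control the HTTP method
        conn.setDoOutput(true);                               // we will send a request body
        conn.setRequestProperty("Content-Type", contentType); // control HTTP headers

        byte[] payload = body.getBytes(StandardCharsets.UTF_8);
        conn.setFixedLengthStreamingMode(payload.length);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(payload);                               // send the request body
        }

        // Error responses (4xx/5xx) are exposed on the error stream instead.
        int status = conn.getResponseCode();
        InputStream in = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
        if (in == null) {
            return "";
        }
        // Reading the response to the end and closing the stream (rather than
        // calling disconnect()) lets the JDK reuse the connection via keep-alive.
        try (InputStream stream = in) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = stream.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

A call would look like SimplePost.post(url, "text/xml", xmlBody), where url and xmlBody are whatever the transformation supplies for each row.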

      I do see, however, that the Jersey objects are fully initialized on each call to processRow. If some of this initialization carries non-negligible overhead, that could explain at least part of the performance problem.
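
The kind of change I have in mind can be sketched as follows. This is not the actual REST Client step code: ExpensiveClient is a hypothetical stand-in for whatever client object is costly to build, and Step is a hypothetical stand-in for the step class. The only point is that the client is created on first use and then reused for every later row, instead of being rebuilt per call to processRow.

```java
// Hypothetical sketch of "initialize once, reuse per row"; not PDI or Jersey code.
public class ReuseClientSketch {

    /** Stand-in for a client object that is expensive to construct. */
    static final class ExpensiveClient {
        static int instancesCreated = 0;

        ExpensiveClient() {
            instancesCreated++; // count constructions so the effect is visible
        }

        String send(String row) {
            return "sent:" + row;
        }
    }

    /** Stand-in for a step whose processRow is called once per input row. */
    static final class Step {
        private ExpensiveClient client; // kept across calls to processRow

        String processRow(String row) {
            if (client == null) {
                client = new ExpensiveClient(); // built only on the first row
            }
            return client.send(row);
        }
    }

    public static void main(String[] args) {
        Step step = new Step();
        for (int i = 0; i < 1000; i++) {
            step.processRow("row-" + i);
        }
        // Prints "clients created: 1"; building the client inside the loop
        // would have created 1000 instances instead.
        System.out.println("clients created: " + ExpensiveClient.instancesCreated);
    }
}
```

Whether this accounts for the whole 30:1 gap would need to be confirmed by profiling the step.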




    • Assignee: Unassigned
    • Reporter: Jeff Sturm (jsturm)