Pentaho Data Integration - Kettle: PDI-19035

Increasing the NIO buffer size does not improve performance on the Pentaho server


    Details

    • Type: Bug
    • Status: Open
    • Severity: Urgent
    • Resolution: Unresolved
    • Affects Version/s: 8.3.0.16 GA, 9.1.0.1 GA
    • Fix Version/s: Backlog
    • Component/s: Step
    • Labels:
      None
    • Story Points:
      0
    • Notice:
      When an issue is open, the "Fix Version/s" field conveys a target, not necessarily a commitment. When an issue is closed, the "Fix Version/s" field conveys the version that the issue was fixed in.
    • Steps to Reproduce:
      1. Start Spoon.
      2. Connect to the Pentaho repository.
      3. Import the attached CSV_test.ktr.
      4. Save this transformation in the repository.
      5. Create and save test.csv.
        1. Repeat the single data record of sample.csv to produce 10,000,000 rows.
        2. Save the file created in step 5.1 as test.csv in any directory.
      6. Set the path of test.csv as the filename in the CSV File Input step and save the transformation.
        NOTE
        NIO buffer size option is set to the default value of 50000.
      7. Execute this transformation on the Pentaho Server and check the processing time.
        NOTE
        Processing time is the time taken from the start to the end of the transformation.
      8. Change the NIO buffer size option in the CSV File Input step to 1048576 (1 MB) and save the transformation.
      9. Execute this transformation on the Pentaho Server and check the processing time.
      10. Change the NIO buffer size option in the CSV File Input step to 5242880 (5 MB) and save the transformation.
      11. Execute this transformation on the Pentaho Server and check the processing time.
      12. Change the NIO buffer size option in the CSV File Input step to 10485760 (10 MB) and save the transformation.
      13. Execute this transformation on the Pentaho Server and check the processing time.
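
      Step 5 can be scripted. The following is a minimal sketch that is not part of the original report; the class name MakeTestCsv is hypothetical, and it assumes sample.csv holds one header line followed by the single data record (if the attached file has no header, drop the header write). It reads and writes with ISO-8859-1 so that every byte of the record is copied unchanged whatever the file's real encoding; line endings are written with the platform's line separator.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

/**
 * Hypothetical helper for step 5: builds test.csv by repeating the single
 * data record of sample.csv. ASSUMPTION: line 1 of sample.csv is a header,
 * line 2 is the data record.
 */
public class MakeTestCsv {

    /** Writes the header once, then the data record {@code rows} times. */
    static void generate(Path sample, Path out, long rows) throws IOException {
        // ISO-8859-1 maps every byte to one char and back, so bytes are preserved.
        List<String> lines = Files.readAllLines(sample, StandardCharsets.ISO_8859_1);
        if (lines.size() < 2) {
            throw new IllegalArgumentException("expected a header line and one data line");
        }
        String header = lines.get(0);
        String record = lines.get(1);
        try (BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.ISO_8859_1)) {
            w.write(header);
            w.newLine();
            for (long i = 0; i < rows; i++) {
                w.write(record);
                w.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path sample = Paths.get(args.length > 0 ? args[0] : "sample.csv");
        Path out = Paths.get(args.length > 1 ? args[1] : "test.csv");
        generate(sample, out, 10_000_000L); // 10,000,000 data rows, as in step 5.1
    }
}
```

      Usage: java MakeTestCsv sample.csv test.csv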

      Expected Result
      Increasing the NIO buffer size should improve performance and increase throughput.

      Actual Result
      Increasing the NIO buffer size neither improves performance nor increases throughput.
      The execution results in our environment are as follows.

      NIO buffer size (bytes)   Processing time
      50000                     63.6 s
      1048576                   63.3 s
      5242880                   63.5 s
      10485760                  63.9 s

      Compared with the 50000-byte default, the 1048576-byte run is only 0.3 s faster.
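
      Expressed relative to the 50000-byte default, the processing times above differ by at most about half a percent, where t_s is the processing time with a buffer of s bytes. The report does not say how many runs each figure represents, so differences this small may be within run-to-run variation.

```latex
\begin{align*}
\Delta_s &= \frac{t_s - t_{50000}}{t_{50000}} \times 100\% \\
\Delta_{1048576}  &= \frac{63.3 - 63.6}{63.6} \times 100\% \approx -0.47\% \\
\Delta_{5242880}  &= \frac{63.5 - 63.6}{63.6} \times 100\% \approx -0.16\% \\
\Delta_{10485760} &= \frac{63.9 - 63.6}{63.6} \times 100\% \approx +0.47\%
\end{align*}
```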

      However, when we ran the transformation locally in the Spoon client, we could see a difference:

      • Execute the transformation locally.
      • Go to the Metrics tab in the Spoon client.
      • Repeat the steps you mentioned, changing the NIO buffer size value each time.
      • You will notice that performance improves.

      Description

      On the Pentaho Server, changing the NIO buffer size has no effect on performance.
      The execution results in our environment are as follows.

      NIO buffer size (bytes)   Processing time
      50000                     63.6 s
      1048576                   63.3 s
      5242880                   63.5 s
      10485760                  63.9 s

      Compared with the 50000-byte default, the 1048576-byte run is only 0.3 s faster.

      The Pentaho documentation, "Pentaho Data Integration performance tips", states:
      https://help.pentaho.com/Documentation/8.3/Setup/Pentaho_Data_Integration_performance_tips
      "These new steps have been rewritten using Non-blocking I/O (NIO) features.
      Typically, the larger the NIO buffer you specify in the step, the better your read performance will be."

      The document "Best Practices, Performance Tuning for PDI.pdf" states:
      "NIO Buffer Size: This parameter determines the amount of data read at one time from a text file. This can be adjusted to increase throughput."
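
      To see what "the amount of data read at one time" means in isolation, the following minimal Java sketch reads a file through java.nio.channels.FileChannel into a direct ByteBuffer of a chosen size and counts read calls, bytes and newline bytes. It is a hypothetical probe (class NioBufferProbe), not Kettle's CSV File Input implementation, and it cannot show how the server uses the setting; it only times a raw read of test.csv with the four buffer sizes from the report on a given machine.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

/** Hypothetical probe: raw NIO read of a file with a configurable buffer size. */
public class NioBufferProbe {

    /** Result of one pass over the file. */
    static final class Stats {
        final long bytes;    // total bytes read
        final long reads;    // read() calls that returned data
        final long newlines; // '\n' bytes seen, roughly the row count

        Stats(long bytes, long reads, long newlines) {
            this.bytes = bytes;
            this.reads = reads;
            this.newlines = newlines;
        }
    }

    /** Reads the whole file, requesting up to bufferSize bytes per read call. */
    static Stats scan(Path file, int bufferSize) throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(bufferSize);
        long bytes = 0;
        long reads = 0;
        long newlines = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            int n;
            while ((n = ch.read(buf)) != -1) {
                if (n > 0) {
                    reads++;
                    bytes += n;
                }
                buf.flip(); // switch the buffer from filling to draining
                while (buf.hasRemaining()) {
                    if (buf.get() == '\n') {
                        newlines++;
                    }
                }
                buf.clear(); // make the whole buffer available for the next read
            }
        }
        return new Stats(bytes, reads, newlines);
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "test.csv");
        // The four buffer sizes used in the report, in bytes.
        int[] sizes = {50_000, 1_048_576, 5_242_880, 10_485_760};
        for (int size : sizes) {
            long start = System.nanoTime();
            Stats s = scan(file, size);
            double secs = (System.nanoTime() - start) / 1e9;
            System.out.printf("%,11d bytes: %,d reads, %,d rows, %.2f s%n",
                    size, s.reads, s.newlines, secs);
        }
    }
}
```

      Usage: java NioBufferProbe /path/to/test.csv. The operating system caches the file after the first pass, so later passes are usually faster; running each size several times, or in a different order, gives a fairer comparison.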

      However, with the same settings, a local execution (running the transformation in Spoon) does show a performance improvement.

        Attachments

        1. CSV_test.ktr
          53 kB
        2. sample.csv
          0.3 kB


            People

            Assignee:
            Unassigned
            Reporter:
            jagdeeshss
            Votes:
            0
            Watchers:
            4
