  1. Pentaho Data Integration - Kettle
  2. PDI-19144

"Univariate statistics": The output of "Percentile" is different from the expected result.


    Details

    • Type: Bug
    • Status: Closed
    • Severity: Urgent
    • Resolution: Not a Bug
    • Affects Version/s: 8.3.0 GA, 8.3.0.8 GA, 9.1.0 GA, 9.1.0.4 GA
    • Fix Version/s: Backlog
    • Component/s: Step
    • Labels:
    • Story Points:
      0
    • PDI Sub-component:
    • Sprint Team:
      Tatooine (Maint)
    • Steps to Reproduce:
      • Open the attached transformation.
      • Make sure all the paths are changed to point to your filesystem.
      • Run the sample transformation.
      • Check the output; the percentile value is not the expected one.

      Description

      We are using the "Univariate statistics" step to calculate a percentile, and it returns a result different from the one we expect.

      We use the following formula to calculate the position (rank) of the percentile in the sorted data:

      (n+1)*P/100
      n: Number of data values.
      P: Value set in the "Percentile" field.

      For n = 200 and P = 75:
      (200+1)*75/100 = 150.75

      If the result has a fractional part, the percentile is interpolated between the neighbouring values, where K is the integer part of the result and Decimal is its fractional part:
      [K th value] + Decimal*([K+1 th value] - [K th value])

      Here K is 150 and Decimal is 0.75:
      150+0.75*(151-150) = 150.75

      However, PDI outputs 150.5.
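
      The reporter's calculation can be reproduced with a minimal Python sketch. It assumes the input is the integers 1 to 200: the contents of data1.csv are not shown in this issue, but the worked example uses 150 and 151 as the 150th and 151st values. The function names are hypothetical. The second function uses a different rank convention, n*P/100 + 0.5, purely for comparison. It happens to give 150.5 on the assumed data, but this issue does not confirm which formula the step actually uses.

```python
import math


def value_at_rank(sorted_data, rank):
    """Linearly interpolate the value at a 1-based, possibly fractional rank."""
    n = len(sorted_data)
    k = math.floor(rank)          # integer part of the rank: K
    frac = rank - k               # fractional part of the rank: "Decimal"
    if k < 1:                     # rank falls before the first value
        return float(sorted_data[0])
    if k >= n:                    # rank falls at or after the last value
        return float(sorted_data[-1])
    lower, upper = sorted_data[k - 1], sorted_data[k]
    return lower + frac * (upper - lower)


def percentile_n_plus_1(values, p):
    """Reporter's convention: rank = (n + 1) * P / 100."""
    data = sorted(values)
    return value_at_rank(data, (len(data) + 1) * p / 100.0)


def percentile_half_offset(values, p):
    """Alternative convention (assumption, for comparison): rank = n * P / 100 + 0.5."""
    data = sorted(values)
    return value_at_rank(data, len(data) * p / 100.0 + 0.5)


# Assumed sample data: 1..200 (not taken from data1.csv).
sample = list(range(1, 201))
print(percentile_n_plus_1(sample, 75))     # 150.75
print(percentile_half_offset(sample, 75))  # 150.5
```

      Both conventions interpolate linearly and differ only in how the rank is derived from n and P. A difference of this kind would be consistent with the "Not a Bug" resolution.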

      Expected Result
      The text file is output as follows.
      ----------
      data1(75th percentile)
      150.75

      Actual result
      The text file was output as follows.
      ----------
      data1(75th percentile)
      150.5

        Attachments

        1. CCL-PDI-S065501-037-L.ktr
          132 kB
        2. data1.csv
          0.9 kB
        3. PDI-19144_1.PNG
          142 kB
        4. PDI-19144_2.PNG
          14 kB


            People

            Assignee:
            brana Bhupendra Rana
            Reporter:
            nprakash Nikhil Prakash
            Votes:
            0
            Watchers:
            5

              Dates

              Created:
              Updated:
              Resolved: