  1. Pentaho Data Integration - Kettle
  2. PDI-19144

"Univariate statistics": The output of "Percentile" is different from the expected result.

    Details

    • Type: Bug
    • Status: Closed
    • Severity: Urgent
    • Resolution: Not a Bug
    • Affects Version/s: 8.3.0 GA, 8.3.0.8 GA, 9.1.0 GA, 9.1.0.4 GA
    • Fix Version/s: Backlog
    • Component/s: Step
    • Labels:
    • Story Points:
      0
    • PDI Sub-component:
    • Notice:
      When an issue is open, the "Fix Version/s" field conveys a target, not necessarily a commitment. When an issue is closed, the "Fix Version/s" field conveys the version that the issue was fixed in.
    • Sprint Team:
      Tatooine (Maint)
    • Steps to Reproduce:
      • Open the attached transformation.
      • Change all file paths so that they point to your filesystem.
      • Run the sample transformation.
      • Check the output: the percentile value is wrong.

      Description

      We are using the Univariate Statistics step to calculate a percentile, and it returns an unexpected result.

      We expect the position (rank) of the percentile in the sorted data to be calculated with the following formula.

      (n+1)*P/100
      n: Number of data values.
      P: Value set in "Percentile".

      With n = 200 and P = 75:
      (200+1)*75/100 = 150.75

      If the result has a fractional part, the percentile is interpolated between the two neighbouring values, where K is the integer part of the result and Decimal is its fractional part.
      [K th value] + Decimal*([K+1 th value] - [K th value])

      Here K is 150, the fractional part is 0.75, the 150th value is 150 and the 151st value is 151.
      150 + 0.75*(151-150) = 150.75

      However, PDI outputs 150.5 instead.
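
The calculation above can be reproduced with a short sketch. The function names and the sample data (the integers 1 to 200, which matches the 150th and 151st values quoted above but is an assumption, since data1.csv is not shown here) are illustrative only. The second function is one common alternative percentile definition that happens to return 150.5 on this data; the ticket does not confirm that PDI uses it.

```python
import math


def percentile_n_plus_1(values, p):
    """Percentile with rank = (n + 1) * p / 100 and linear interpolation."""
    data = sorted(values)
    n = len(data)
    rank = (n + 1) * p / 100      # 1-based position, may be fractional
    k = int(rank)                 # integer part K
    frac = rank - k               # fractional part ("Decimal" in the report)
    if k < 1:
        return data[0]
    if k >= n:
        return data[-1]
    return data[k - 1] + frac * (data[k] - data[k - 1])


def percentile_averaged(values, p):
    """Alternative definition (assumption, not confirmed as PDI's method):
    rank = n * p / 100; average two neighbours when the rank is an integer."""
    data = sorted(values)
    n = len(data)
    rank = n * p / 100
    k = int(rank)
    if rank == k and 1 <= k < n:
        return (data[k - 1] + data[k]) / 2
    k = min(max(math.ceil(rank), 1), n)
    return data[k - 1]


sample = range(1, 201)  # assumed stand-in for data1.csv
print(percentile_n_plus_1(sample, 75))
print(percentile_averaged(sample, 75))
```

The two results differ only because the two definitions place the 75th percentile at different ranks (150.75 versus 150), which may be why a value other than 150.75 is not necessarily a defect.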

      Expected Result
      The text file should contain the following.
      ----------
      data1(75th percentile)
      150.75

      Actual Result
      The text file contains the following.
      ----------
      data1(75th percentile)
      150.5

          Attachments

          1. PDI-19144_2.PNG (14 kB)
          2. PDI-19144_1.PNG (142 kB)
          3. data1.csv (0.9 kB)
          4. CCL-PDI-S065501-037-L.ktr (132 kB)

              People

              Assignee:
              brana Bhupendra Rana
              Reporter:
              nprakash Nikhil Prakash
              Votes:
              0
              Watchers:
              5

                Dates

                Created:
                Updated:
                Resolved: