  1. Pentaho Data Integration - Kettle
  2. PDI-1824

Parsing numbers with scientific notation

    Details

    • Type: New Feature
    • Status: Closed
    • Severity: Unknown
    • Resolution: Fixed
    • Affects Version/s: 3.2.0 GA
    • Fix Version/s: 3.2.0 GA
    • Component/s: None
    • Labels:
      None

      Description

      Kettle 3.1 and prior releases have had limited support for scientific
      notation, e.g. 4.03075e+006. One forum question about this was
      answered with the suggestion to try BigNumber:

      http://forums.pentaho.org/showthread.php?t=61836&highlight=exponent

      That has worked until now. In the trunk for future releases, BigNumber
      has been enhanced to take the Format into consideration by using
      DecimalFormat.parse(). However, this change also means Kettle
      loses any built-in support for scientific notation. Text File Input can
      no longer parse such a value as either a Number or a BigNumber.

      Moreover, since Text File Input silently truncates the non-numeric part
      of number fields, 4.03075e+006 would not result in an error but in
      4.03075. I think this might lead to confusion and difficulties later on.
      See this discussion in the forum:

      http://forums.pentaho.org/showthread.php?t=65037&highlight=no-break-space
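
      The truncation can be reproduced outside Kettle with a plain
      java.text.DecimalFormat. This is only a sketch: the class name
      TruncationDemo and the "#.#" pattern are made up for illustration, and
      it assumes Text File Input ends up calling DecimalFormat.parse(String),
      which is not confirmed here.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParseException;
import java.util.Locale;

class TruncationDemo {
    // Lenient parse: parse(String) only fails when nothing at the start of
    // the text can be parsed; any unparsed trailing text is ignored.
    static Number lenientParse(String text) {
        DecimalFormat format =
            new DecimalFormat("#.#", DecimalFormatSymbols.getInstance(Locale.US));
        try {
            return format.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number: " + text, e);
        }
    }

    public static void main(String[] args) {
        // Parsing stops at the lowercase 'e', so only the mantissa 4.03075
        // comes back and no error is raised.
        System.out.println(lenientParse("4.03075e+006"));
    }
}
```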

      Unfortunately, I have no good solution that would solve this easily.
      One suggestion would be to use parse(String, ParsePosition) for
      string-to-number conversions and add a checkbox to Text File Input that
      selects whether such numbers raise an error or are truncated, as
      discussed in the forum.
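
      A minimal sketch of the parse(String, ParsePosition) idea follows. The
      class and method names (StrictNumberParse, parseStrict) are
      hypothetical and not part of Kettle; the point is only that comparing
      the parse position with the string length detects trailing text that
      would otherwise be dropped.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParsePosition;
import java.util.Locale;

class StrictNumberParse {
    // Parses the whole text or fails; never silently truncates.
    static Number parseStrict(String text, DecimalFormat format) {
        ParsePosition pos = new ParsePosition(0);
        Number value = format.parse(text, pos);
        // value == null: nothing could be parsed at all.
        // index < length: parsing stopped early, trailing text remains.
        if (value == null || pos.getIndex() < text.length()) {
            throw new IllegalArgumentException(
                "Unparseable number \"" + text + "\" at index " + pos.getIndex());
        }
        return value;
    }

    public static void main(String[] args) {
        DecimalFormat format =
            new DecimalFormat("#.#", DecimalFormatSymbols.getInstance(Locale.US));
        System.out.println(parseStrict("4.03075", format));
        try {
            parseStrict("4.03075e+006", format);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

      A checkbox in Text File Input could then choose between this strict
      behavior and the current truncating one.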

      However, this doesn't solve the scientific notation problem. That could
      perhaps be done by acting on a parse position < length and trying to
      parse the remainder of the field as an exponent (comparing with the
      specified Format). If you think this is a good approach (it feels a bit
      dirty to me), you could have a look at the BigDecimal constructor,
      which does this kind of parsing.
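
      One way this fallback might look is sketched below, as an assumption
      rather than a proposal for the actual Kettle code: the mantissa is
      parsed with the given DecimalFormat (so its decimal symbol is
      respected), and if text remains it is accepted only when it looks like
      an exponent. SciNotationParse and its EXPONENT pattern are hypothetical
      names; the comparison with the specified Format mentioned above is not
      attempted here.

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParsePosition;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SciNotationParse {
    // Exponent suffix such as e+006, E-3 or e12 (illustrative pattern).
    private static final Pattern EXPONENT = Pattern.compile("[eE]([+-]?\\d+)");

    static BigDecimal parse(String text, DecimalFormat format) {
        // Work on a copy so the caller's format is not modified.
        DecimalFormat f = (DecimalFormat) format.clone();
        f.setParseBigDecimal(true);

        ParsePosition pos = new ParsePosition(0);
        Number mantissa = f.parse(text, pos);
        if (!(mantissa instanceof BigDecimal)) {
            throw new IllegalArgumentException("Not a number: " + text);
        }
        BigDecimal value = (BigDecimal) mantissa;

        // Parse position == length: the format consumed everything.
        String rest = text.substring(pos.getIndex());
        if (rest.isEmpty()) {
            return value;
        }
        // Parse position < length: accept the rest only if it is an exponent.
        Matcher m = EXPONENT.matcher(rest);
        if (!m.matches()) {
            throw new IllegalArgumentException(
                "Unexpected text \"" + rest + "\" in number: " + text);
        }
        return value.scaleByPowerOfTen(Integer.parseInt(m.group(1)));
    }

    public static void main(String[] args) {
        DecimalFormat us =
            new DecimalFormat("#.#", DecimalFormatSymbols.getInstance(Locale.US));
        System.out.println(parse("4.03075e+006", us).toPlainString());
    }
}
```

      Anything left over that is not an exponent is still rejected, so this
      could be combined with the error-or-truncate option suggested above.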

      If not, maybe a workaround could be suggested in the Wiki?

      Regards,
      Claes

              People

              • Assignee:
                gdavid Golda Thomas
                Reporter:
                wwwclaes Claes Svensson
              • Votes:
                0
                Watchers:
                1