Pentaho Data Integration - Kettle / PDI-9957

"Unable to get VFS File object for filename" message using the Hadoop File Input step


    Details

    • Operating System/s:
      Ubuntu 12.x (64-bit)

      Description

      I encountered the following error while testing an existing Pentaho MapReduce transformation with a dev build of the CDH42 big data plugin:

      ERROR 28-05 11:47:46,166 - FileInputList - org.pentaho.di.core.exception.KettleFileException:

      Unable to get VFS File object for filename 'hdfs://ubuntu:8020/user/cloudera/callrecords/reference/areacodes.csv' : Could not resolve file "hdfs://ubuntu:8020/user/cloudera/callrecords/reference/areacodes.csv".

      at org.pentaho.di.core.vfs.KettleVFS.getFileObject(KettleVFS.java:161)
      at org.pentaho.di.core.vfs.KettleVFS.getFileObject(KettleVFS.java:104)
      at org.pentaho.di.core.fileinput.FileInputList.createFileList(FileInputList.java:301)
      at org.pentaho.di.core.fileinput.FileInputList.createFileList(FileInputList.java:151)
      at org.pentaho.di.trans.steps.textfileinput.TextFileInputMeta.getTextFileList(TextFileInputMeta.java:1610)
      at org.pentaho.di.trans.steps.textfileinput.TextFileInput.init(TextFileInput.java:1612)
      at org.pentaho.di.trans.step.StepInitThread.run(StepInitThread.java:62)
      at java.lang.Thread.run(Thread.java:662)

      This file can be browsed interactively in Spoon using the Hadoop File Input step, but it fails to resolve when the transformation is deployed in a mapper.
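
      For reference, the failing resolution happens in FileInputList.createFileList(), which calls KettleVFS.getFileObject() on the configured filename (see the stack trace above). The following is a minimal sketch of that call, assuming the PDI core jars and the big data plugin are on the classpath; the class name and main() harness are illustrative only, and the FileObject import assumes the Commons VFS version bundled with this PDI build:

      import org.apache.commons.vfs.FileObject;
      import org.pentaho.di.core.KettleEnvironment;
      import org.pentaho.di.core.vfs.KettleVFS;

      public class HdfsVfsResolveCheck {
        public static void main(String[] args) throws Exception {
          // Initialize the Kettle environment; this registers the VFS providers
          // contributed by plugins, including the hdfs:// scheme from the big data plugin.
          KettleEnvironment.init();

          String url = "hdfs://ubuntu:8020/user/cloudera/callrecords/reference/areacodes.csv";

          // Same call that FileInputList.createFileList() makes. It throws
          // KettleFileException ("Unable to get VFS File object for filename ...")
          // when the hdfs scheme cannot be resolved, as seen in the mapper JVM.
          FileObject file = KettleVFS.getFileObject(url);
          System.out.println("exists: " + file.exists());
        }
      }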

      To recreate (steps 1-3 are also sketched programmatically after this list):

      1. Create the following HDFS directories:

      /user/cloudera/callrecords/input
      /user/cloudera/callrecords/output
      /user/cloudera/callrecords/reference

      2. Copy the file callrecords_all.csv to /user/cloudera/callrecords/input

      3. Copy the file areacodes.csv to /user/cloudera/callrecords/reference

      4. Adjust the hostname:port values in the .kjb and .ktr for your cluster
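
      As a rough sketch of steps 1-3 against the HDFS API, assuming a Hadoop 2-based client (CDH 4.2) on the classpath; the NameNode address is taken from the error above and the local source paths are illustrative only:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class SetupCallRecordsDirs {
        public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          // NameNode address from the error message; adjust for your cluster.
          conf.set("fs.defaultFS", "hdfs://ubuntu:8020");
          FileSystem fs = FileSystem.get(conf);

          // Step 1: create the HDFS directories.
          fs.mkdirs(new Path("/user/cloudera/callrecords/input"));
          fs.mkdirs(new Path("/user/cloudera/callrecords/output"));
          fs.mkdirs(new Path("/user/cloudera/callrecords/reference"));

          // Steps 2 and 3: copy the sample files (local paths are placeholders).
          fs.copyFromLocalFile(new Path("/tmp/callrecords_all.csv"),
              new Path("/user/cloudera/callrecords/input/callrecords_all.csv"));
          fs.copyFromLocalFile(new Path("/tmp/areacodes.csv"),
              new Path("/user/cloudera/callrecords/reference/areacodes.csv"));

          fs.close();
        }
      }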

            People

            • Assignee:
              Unassigned
            • Reporter:
              dhenry Dave Henry (Inactive)
            • Votes:
              0
            • Watchers:
              3
