  1. Pentaho Data Integration - Kettle
  2. PDI-7610

Cassandra output is much faster with Thrift than with CQL - offer an option in the Cassandra writer to write via Thrift


      Description

      When writing millions of rows to Cassandra from a Pentaho MapReduce job, performance is greatly improved by writing via Thrift rather than CQL. At very high write volumes, the built-in CQL-based writer in the Cassandra output slows down and starts to receive service unavailable errors.

      Using Thrift-based batch mutate instead improves throughput and eliminates the service unavailable errors.
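
The batching idea behind this can be sketched roughly as follows. This is a minimal illustration, not the reporter's code and not the Cassandra output step's implementation: `BatchingWriter`, its `batchSize` parameter, and the `sink` callback are hypothetical names. In a real writer the sink would presumably wrap a single Thrift `batch_mutate` call; here it is a plain `Consumer` so the sketch has no Cassandra dependency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch: buffers rows and hands them to a sink in fixed-size
 * batches, so that one round trip carries many rows instead of one.
 */
final class BatchingWriter<R> {
    private final int batchSize;
    private final Consumer<List<R>> sink; // assumption: stands in for one batch write
    private final List<R> buffer;

    BatchingWriter(int batchSize, Consumer<List<R>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
        this.buffer = new ArrayList<>(batchSize);
    }

    /** Buffers one row and flushes once the buffer holds batchSize rows. */
    void write(R row) {
        buffer.add(row);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends any buffered rows to the sink as one batch; call at end of input. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // Pass a copy so the sink may keep the list after the buffer is cleared.
        sink.accept(new ArrayList<>(buffer));
        buffer.clear();
    }
}
```

A caller would invoke `write` once per incoming row and `flush` once when the input ends. A suitable batch size depends on row size and cluster capacity, and would need to be found by measurement.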

      I have written code for this. Contact me if you are interested in using it in the Cassandra writer.

            People

            • Assignee:
              Unassigned
              Reporter:
              ronanstokes Ronan Stokes