Pentaho Data Integration - Kettle
PDI-7610

Cassandra output is much faster with thrift than with CQL - offer option in Cassandra writer to write via thrift.

    Details

    • QA Validation Status:
      Unvalidatable

      Description

      When writing millions of rows to Cassandra from a Pentaho MapReduce job, performance is greatly improved by writing via Thrift rather than CQL. At very high write volumes, throughput of the built-in CQL-based writer in the Cassandra output slows and the job starts to get service unavailable errors.

      Using Thrift-based batch mutate improves throughput and eliminates the service unavailable errors.
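      As a rough illustration of the batching idea (not the code referred to in this issue), the sketch below groups incoming rows into a nested map shaped like the argument of Thrift's batch_mutate call (row key, then column family, then a list of mutations) and hands each full map to a sender in one call. The names write_in_batches, send_batch, and batch_size are hypothetical, and send_batch stands in for whatever client call actually performs the write.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# One incoming row: (row key, {column name: value}). Hypothetical shape for illustration.
Row = Tuple[str, Dict[str, str]]
# Mirrors the nesting of a batch mutation map:
# row key -> column family -> list of (column, value) mutations.
MutationMap = Dict[str, Dict[str, List[Tuple[str, str]]]]


def write_in_batches(
    rows: Iterable[Row],
    column_family: str,
    send_batch: Callable[[MutationMap], None],
    batch_size: int = 100,
) -> int:
    """Group rows into mutation maps and pass each one to send_batch.

    send_batch is a placeholder for the real client call (an assumption here).
    Returns the number of batches sent.
    """
    batch: MutationMap = defaultdict(lambda: defaultdict(list))
    pending = 0
    sent = 0
    for key, columns in rows:
        batch[key][column_family].extend(columns.items())
        pending += 1
        if pending >= batch_size:
            send_batch(batch)
            sent += 1
            # Start a fresh map so the sent batch is not mutated afterwards.
            batch = defaultdict(lambda: defaultdict(list))
            pending = 0
    if pending:
        # Flush the final, partially filled batch.
        send_batch(batch)
        sent += 1
    return sent
```

      Sending one request per batch instead of one per row reduces the number of round trips; a suitable batch size depends on row width and cluster capacity and would need to be tuned.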

      I have written code for this. Contact me if you are interested in using it in the Cassandra writer.

        Activity

        Ronan Stokes created issue -
        Doug Moran made changes -
        Field Original Value New Value
        Status Open [ 1 ] Open [ 1 ]
        Priority Unknown [ 7 ] Critical [ 2 ]
        Assignee Triage [ project admin ] Mark Hall [ mhall ]
        Fix Version/s 4.3.1 [ 11367 ]
        Doug Moran made changes -
        Affects Version/s 4.3.0 GA (4.5.0 GA Suite Release) [ 11229 ]
        Affects Version/s 4.3.0 RC (4.5.0 RC Suite Release) [ 11364 ]
        Mark Hall made changes -
        Fix Version/s Big Data - 4.6 Backlog [ 11446 ]
        Fix Version/s 4.3.1 [ 11367 ]
        Doug Moran made changes -
        Labels q_bad
        Mark Hall made changes -
        Status Open [ 1 ] In Progress [ 3 ]
        Mark Hall added a comment - https://github.com/pentaho/big-data-plugin/pull/43
        Mark Hall added a comment - https://github.com/pentaho/big-data-plugin/commit/63867fa410076fc4791074e99d7c9d761608fc5d
        Mark Hall made changes -
        Status In Progress [ 3 ] Resolved [ 5 ]
        Mark Hall made changes -
        Assignee Mark Hall [ mhall ] Unassigned User [ unassigned ]
        Doug Moran made changes -
        Fix Version/s 4.4.0 GA (4.8.0 GA Suite Release) [ 11367 ]
        Fix Version/s BD 4.4 (Platform 4.8) Backlog [ 11446 ]
        Golda David made changes -
        Assignee Unassigned User [ unassigned ] Carter Everett [ ceverett ]
        Carter Everett made changes -
        Summary Cassandra output is much faster with thrift than with CQL - offer option in Cassandra writer to write via thrift. Cassandra output is much faster with thrift than with CQL - offer option in Cassandra writer to write via thrift.
        Carter Everett added a comment -

        Mark, could you provide a repro for this case?

        Mark Hall added a comment -

        Err, I don't think I can. It didn't have an option to use Thrift before and now it does. The results of writing data to a table using either CQL or Thrift mode should be the same.

        Carter Everett added a comment -

        Mark, did you create any unit tests for this one?

        Mark Hall added a comment -

        Hi Carter,

        There are no unit tests at all for Cassandra yet. There is a JIRA for refactoring the Cassandra steps to allow for some unit tests to be written:

        http://jira.pentaho.com/browse/PDI-8229

        The best we can do at the moment is to test the CQL and thrift modes manually to ensure that they produce the same results.
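        One way to make that manual comparison less error-prone is sketched below. It assumes the rows written in each mode have already been read back into lists of column-name-to-value dictionaries (how they are exported is not covered here), and the names normalize and diff_rows are hypothetical. Rows are compared as multisets, so row order and column order do not matter.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, str]
RowKey = Tuple[Tuple[str, str], ...]


def normalize(rows: Iterable[Row]) -> Counter:
    """Turn rows into a multiset of sorted (column, value) tuples."""
    return Counter(tuple(sorted(row.items())) for row in rows)


def diff_rows(
    cql_rows: Iterable[Row], thrift_rows: Iterable[Row]
) -> Tuple[List[RowKey], List[RowKey]]:
    """Return (rows only in the CQL output, rows only in the Thrift output)."""
    cql, thrift = normalize(cql_rows), normalize(thrift_rows)
    return list((cql - thrift).elements()), list((thrift - cql).elements())
```

        Two empty lists mean both modes wrote the same set of rows.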

        Carter Everett made changes -
        Assignee Carter Everett [ ceverett ] Unassigned User [ unassigned ]
        Sean Flatley made changes -
        Assignee Unassigned User [ unassigned ] Sean Flatley [ sflatley ]
        Carter Everett made changes -
        Assignee Sean Flatley [ sflatley ] Carter Everett [ ceverett ]
        Carter Everett added a comment -

        Per discussion with Doug this is currently unvalidatable.

        Carter Everett made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        QA Validation Status Not Yet Validated Unvalidatable
        Resolution Fixed [ 1 ]
        Carter Everett made changes -
        Resolution Fixed [ 1 ]
        Status Closed [ 6 ] Reopened [ 4 ]
        Carter Everett made changes -
        Status Reopened [ 4 ] In Progress [ 3 ]
        Carter Everett made changes -
        Status In Progress [ 3 ] Open [ 1 ]
        Carter Everett made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Carter Everett made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Assignee Carter Everett [ ceverett ] Unassigned User [ unassigned ]
        Transition                 Time In Source Status   Execution Times   Last Executor    Last Execution Date
        Open -> Open               4d 23h 4m               1                 Doug Moran       28/Mar/12 3:17 PM
        Open -> In Progress        112d 4h 8m              1                 Mark Hall        18/Jul/12 7:26 PM
        In Progress -> Resolved    18d 8h 38m              1                 Mark Hall        06/Aug/12 4:05 AM
        Closed -> Reopened         14s                     1                 Carter Everett   11/Oct/12 3:26 PM
        Reopened -> In Progress    3s                      1                 Carter Everett   11/Oct/12 3:26 PM
        In Progress -> Open        3s                      1                 Carter Everett   11/Oct/12 3:26 PM
        Open -> Resolved           6s                      1                 Carter Everett   11/Oct/12 3:26 PM
        Resolved -> Closed         66d 11h 21m             2                 Carter Everett   11/Oct/12 3:27 PM

          People

          • Assignee:
            Unassigned User
          • Reporter:
            Ronan Stokes
          • Votes:
            0
          • Watchers:
            0
