I’ve been working with Cassandra recently, and have been using its Thrift interface quite a bit. Cassandra stores values as byte arrays, which leaves a lot of room for flexibility but also makes it a bit clumsy to get values in. The following Java code shows a minimal Cassandra example, opening a connection and inserting a few columns into a row.
```java
package org.mccv;

import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransportException;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.TException;
import org.apache.cassandra.service.Cassandra;
import org.apache.cassandra.service.InvalidRequestException;
import org.apache.cassandra.service.UnavailableException;
import java.io.UnsupportedEncodingException;

public class CassandraJava {
    public void insertCassandra(String cassHost, int cassPort) {
        try {
            // boilerplate to get Cassandra set up
            TSocket sock = new TSocket(cassHost, cassPort);
            TBinaryProtocol tr = new TBinaryProtocol(sock);
            Cassandra.Client c = new Cassandra.Client(tr, tr);
            sock.open();

            // run the inserts
            c.insert("MyTable", "key", "colFamily:colName1",
                     "foo".getBytes("UTF-8"), System.currentTimeMillis(), false);
            c.insert("MyTable", "key", "colFamily:colName2",
                     "bar".getBytes("UTF-8"), System.currentTimeMillis(), false);
            c.insert("MyTable", "key", "colFamily:colName3",
                     "baz".getBytes("UTF-8"), System.currentTimeMillis(), false);
        } catch (TTransportException e) {
            e.printStackTrace();
        } catch (TException te) {
            te.printStackTrace();
        } catch (InvalidRequestException ire) {
            ire.printStackTrace();
        } catch (UnavailableException ue) {
            ue.printStackTrace();
        } catch (UnsupportedEncodingException uee) {
            uee.printStackTrace();
        }
    }
}
```
A few things to note. First, calling getBytes("UTF-8") on strings is sorta ugly. Note that this gets even worse when you switch to Cassandra trunk/0.4, as the column names also become byte arrays. Second, I have to pass in my table, key, timestamp and blocking arguments on each call even though they don’t change.
Let’s see if we can solve the first problem with Scala. The following sample is a first attempt. Note the implicit function defined at the beginning of the object. This tells the Scala compiler that anywhere you need an Array[Byte] and have a String (well, anywhere that implicit function is in scope), it can call the function to do the conversion.
```scala
package org.mccv

import org.apache.thrift.transport.{TSocket,TTransportException}
import org.apache.thrift.protocol.TBinaryProtocol
import org.apache.thrift.TException
import org.apache.cassandra.service.{Cassandra,
  InvalidRequestException,UnavailableException}

object CassandraScala{
  // create a transparent conversion of strings to byte arrays
  private implicit def string2ByteArray(s: String): Array[Byte] =
    s.getBytes("UTF-8")

  def cassInsert(cassHost:String,cassPort:Int) = {
    // boilerplate to get cassandra set up
    val sock = new TSocket(cassHost, cassPort)
    val tr = new TBinaryProtocol(sock)
    val c = new Cassandra.Client(tr,tr)
    sock.open

    // run the inserts
    c.insert("MyTable","key","colFamily:colName1",
             "foo",System.currentTimeMillis(),false)
    c.insert("MyTable","key","colFamily:colName2",
             "bar",System.currentTimeMillis(),false)
    c.insert("MyTable","key","colFamily:colName3",
             "baz",System.currentTimeMillis(),false)
  }
}
```
The resulting code is quite a bit shorter. Note that I’m cheating a bit by not catching all the exceptions… but it’s still a fair bit more concise.
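To try the implicit conversion without a Cassandra server, here is a minimal standalone sketch. The `store` method is a hypothetical stand-in for a Thrift call that expects a byte array; it is not part of the Cassandra API, and the object name is made up for illustration.

```scala
import scala.language.implicitConversions

object ImplicitBytesDemo {
  // The conversion from the post: wherever an Array[Byte] is expected and a
  // String is supplied, the compiler inserts a call to this function.
  implicit def string2ByteArray(s: String): Array[Byte] = s.getBytes("UTF-8")

  // Hypothetical stand-in for a Thrift method that takes a byte-array value.
  // It only reports how many bytes it received.
  def store(column: String, value: Array[Byte]): Int = value.length

  def main(args: Array[String]): Unit = {
    // Compiles because the compiler rewrites this call to
    // store("colFamily:colName1", string2ByteArray("foo")).
    val viaImplicit = store("colFamily:colName1", "foo")

    // The explicit form that the Java version has to spell out each time.
    val viaExplicit = store("colFamily:colName1", "foo".getBytes("UTF-8"))

    println(viaImplicit == viaExplicit) // prints "true"
  }
}
```

Only the value argument is converted here, because only that parameter is declared as Array[Byte]. The column name stays a String.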
But what about all the duplicate arguments? Here we can use partial application (often loosely called currying) to create a new function with our default arguments already supplied, leaving us to pass only the arguments that change.
```scala
package org.mccv

import org.apache.thrift.transport.{TSocket,TTransportException}
import org.apache.thrift.protocol.TBinaryProtocol
import org.apache.thrift.TException
import org.apache.cassandra.service.{Cassandra,
  InvalidRequestException,UnavailableException}

object CassandraScala{
  // create a transparent conversion of strings to byte arrays
  private implicit def string2ByteArray(s: String): Array[Byte] =
    s.getBytes("UTF-8")

  def cassInsertCurried(cassHost:String,cassPort:Int) = {
    // boilerplate to get cassandra set up
    val sock = new TSocket(cassHost, cassPort)
    val tr = new TBinaryProtocol(sock)
    val c = new Cassandra.Client(tr,tr)
    sock.open

    // create a partially applied function with our
    // default arguments in place
    val insert = c.insert("MyTable","key",_:String,_:String,
                          System.currentTimeMillis(),false)
    insert("colFamily:colName1","foo")
    insert("colFamily:colName2","bar")
    insert("colFamily:colName3","baz")
  }
}
```
Here we create a new Function object called insert. The syntax _:String in our call means that those arguments are not applied, and will need to be supplied in calls to insert. Since Function defines an apply method, you can call it just like a normal method. The partial application may throw some people, but the resulting insert calls are almost certainly more readable than the version that takes six arguments.
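The placeholder syntax can also be tried on its own. The sketch below is self-contained and does not touch Cassandra: the six-argument `insert` method and the `nextTick` fake clock are hypothetical stand-ins, chosen to show that the fixed arguments are re-evaluated on every call rather than captured once.

```scala
object PartialApplicationDemo {
  // Hypothetical stand-in for the six-argument Thrift insert call. It returns
  // a description of the call instead of talking to a server.
  def insert(table: String, key: String, column: String, value: String,
             timestamp: Long, block: Boolean): String =
    s"$table/$key/$column=$value@$timestamp block=$block"

  // A fake clock so the behaviour is easy to observe: each call returns the
  // next integer, starting at 1.
  private var ticks = 0L
  def nextTick(): Long = { ticks += 1; ticks }

  // The two typed placeholders leave column and value open. The compiler
  // expands this expression to
  //   (column, value) => insert("MyTable", "key", column, value, nextTick(), false)
  // so nextTick() runs on each call, not once when the function is defined.
  val insertWithDefaults: (String, String) => String =
    insert("MyTable", "key", _: String, _: String, nextTick(), false)

  def main(args: Array[String]): Unit = {
    println(insertWithDefaults("colFamily:colName1", "foo"))
    // Calling apply explicitly is the same as the call above.
    println(insertWithDefaults.apply("colFamily:colName2", "bar"))
  }
}
```

Because the placeholder form expands to an ordinary anonymous function, the same reasoning suggests that System.currentTimeMillis() in the Cassandra version is read afresh for each insert.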
Nice – I like. Were all those exceptions run time exceptions – is that why you aren't catching them?
Not sure on the exact question… all of the Java ones are checked exceptions. Scala doesn't force you to catch them, so I didn't. Realistically there should be a try block somewhere to handle Thrift and Cassandra errors.
Ah yes of course – I remembered that soon after I posted