
Tuesday, May 31, 2011

Bridging the gap between SQL and NoSQL: A state of the art

Here is a state-of-the-art report I wrote on SQL and NoSQL, and on a way to bring them closer. This is actually the theme of my master's thesis, so you can expect more posts on this topic in the future.

Hasta. ;)


Monday, May 9, 2011

Running a Cassandra cluster with only one machine

I've noticed that if you want to run a Cassandra cluster on your own PC, for small tests, there is no guide in the wiki that explains how to do just that.

Therefore, here is how I've done it.

First off, you'll need to create an alias for your network interface:

Mac OS
ifconfig en0 alias

Linux
ifconfig eth0:0

Here I've chosen the en0 (or eth0) interface, but you can use whichever interface and IP address you like.

Each "node" runs from its own copy of the Cassandra directory. The first file you'll have to edit in that copy is conf/cassandra.yaml:

- Change the commit log, data and saved caches directories (commitlog_directory, data_file_directories and saved_caches_directory), so they don't conflict with the ones from the previous "node";
- Change the rpc_port (used for Thrift or Avro) to one that is free;
- Change the listen_address to the IP of your "fake" interface.

Next open the conf/cassandra-env.sh file and change the JMX_PORT.

The last file to edit is bin/cassandra.in.sh, where you'll need to change all the occurrences of $CASSANDRA_HOME to the path of the "node". For example, if your bin directory is in /XXX/YYY/node2/bin, the path is /XXX/YYY/node2.

You can repeat this to create as many nodes as you want, and then just run each one as usual, with bin/cassandra -f.
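
To keep track of what has to differ between the "nodes", the steps above can be sketched as a small Java program that prints the per-node values. This is only an illustration under assumptions: the class and method names are made up, the IP addresses and the starting JMX port are example values, and the data/commitlog/saved_caches sub-directory layout is just one possible choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: the settings that must differ between Cassandra "nodes" sharing one machine. */
final class LocalClusterPlan {

    static final class NodeSettings {
        final String home;          // replaces $CASSANDRA_HOME in bin/cassandra.in.sh
        final String listenAddress; // listen_address in conf/cassandra.yaml
        final int rpcPort;          // rpc_port in conf/cassandra.yaml
        final int jmxPort;          // JMX_PORT in conf/cassandra-env.sh

        NodeSettings(String home, String listenAddress, int rpcPort, int jmxPort) {
            this.home = home;
            this.listenAddress = listenAddress;
            this.rpcPort = rpcPort;
            this.jmxPort = jmxPort;
        }

        /** Hypothetical layout: each node keeps its directories under its own home. */
        String dir(String kind) {
            return home + "/" + kind;
        }

        @Override
        public String toString() {
            return home + ": listen_address=" + listenAddress
                    + ", rpc_port=" + rpcPort
                    + ", JMX_PORT=" + jmxPort
                    + ", data=" + dir("data")
                    + ", commitlog=" + dir("commitlog")
                    + ", saved_caches=" + dir("saved_caches");
        }
    }

    /** One entry per address: node1, node2, ... with consecutive ports. */
    static List<NodeSettings> plan(String baseDir, List<String> addresses,
                                   int firstRpcPort, int firstJmxPort) {
        List<NodeSettings> nodes = new ArrayList<NodeSettings>();
        for (int i = 0; i < addresses.size(); i++) {
            String home = baseDir + "/node" + (i + 1);
            nodes.add(new NodeSettings(home, addresses.get(i), firstRpcPort + i, firstJmxPort + i));
        }
        return nodes;
    }

    public static void main(String[] args) {
        // Example addresses only; use the real IP and the alias you created.
        List<String> addresses = Arrays.asList("192.0.2.10", "192.0.2.11");
        for (NodeSettings node : plan("/XXX/YYY", addresses, 9160, 8080)) {
            System.out.println(node);
        }
    }
}
```

The program does not touch any file; it only lists the values you would then type into each node's configuration by hand.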

Sunday, May 1, 2011

Inserting data with Thrift and Cassandra 0.7

A lot has changed from Cassandra 0.6 to 0.7, and sometimes it is hard to find examples of how things work. I'll be posting how-tos on some of the most common operations you might want to perform when using Cassandra, written in Java.

First off, you have to establish a connection to the server:

TFramedTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
// The transport has to be opened before the client can be used
transport.open();

Here I'm using localhost and the default port for Cassandra, but this can be configured.
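
One way to make the host and port configurable is to read them from a single "host:port" setting and fall back to 9160 when no port is given. The sketch below is just an assumption about how you might do it: the Endpoint class and the cassandra.endpoint property name are hypothetical, and it does not handle IPv6 literals.

```java
/** Sketch: parse "host" or "host:port", defaulting to the Thrift port used in this post. */
final class Endpoint {
    static final int DEFAULT_THRIFT_PORT = 9160;

    final String host;
    final int port;

    Endpoint(String host, int port) {
        this.host = host;
        this.port = port;
    }

    static Endpoint parse(String spec) {
        int colon = spec.lastIndexOf(':');
        if (colon < 0) {
            // No port given: keep the default
            return new Endpoint(spec, DEFAULT_THRIFT_PORT);
        }
        String host = spec.substring(0, colon);
        int port = Integer.parseInt(spec.substring(colon + 1));
        return new Endpoint(host, port);
    }

    public static void main(String[] args) {
        // "cassandra.endpoint" is a made-up property name, e.g. -Dcassandra.endpoint=localhost:9161
        Endpoint e = parse(System.getProperty("cassandra.endpoint", "localhost"));
        System.out.println(e.host + " " + e.port);
    }
}
```

The resulting host and port would then be passed to the TSocket constructor instead of the hard-coded values.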

One difference from previous versions of Cassandra is that the connection is bound to a keyspace, which is set like so:

client.set_keyspace("MyKeyspace"); // "MyKeyspace" is a placeholder for your own keyspace name

With the connection established, all you need now is the data to insert. This data is passed to the server in the form of mutations (org.apache.cassandra.thrift.Mutation).

In this example, I'll be adding a column to a row in a column family in the predefined keyspace.

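List<Mutation> insertion_list = new ArrayList<Mutation>();

// Column(name, value, timestamp): name and value are UTF-8 encoded bytes
Column col_to_add = new Column(ByteBuffer.wrap(("name").getBytes("UTF8")), ByteBuffer.wrap(("value").getBytes("UTF8")), System.currentTimeMillis());

Mutation mut = new Mutation();
mut.setColumn_or_supercolumn(new ColumnOrSuperColumn().setColumn(col_to_add));
insertion_list.add(mut);

// Column family name -> mutations ("ColumnFamily" is a placeholder name)
Map<String,List<Mutation>> columnFamilyValues = new HashMap<String,List<Mutation>>();
columnFamilyValues.put("ColumnFamily", insertion_list);

// Row key -> column families ("row_key" is a placeholder key)
Map<ByteBuffer,Map<String,List<Mutation>>> rowDefinition = new HashMap<ByteBuffer,Map<String,List<Mutation>>>();
rowDefinition.put(ByteBuffer.wrap(("row_key").getBytes("UTF8")), columnFamilyValues);

// Send everything to the server with consistency level ONE
client.batch_mutate(rowDefinition, ConsistencyLevel.ONE);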


The code is pretty much self-explanatory, apart from some values that can be changed at will, such as the encoding of the strings (I've used UTF8) and the consistency level of the insertion (I've used ONE).
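
If the nesting of the map (row key, then column family, then list of mutations) is hard to picture, here is a minimal sketch of that shape using only the standard library. It is an illustration, not part of the Thrift API: the MutationMapSketch class and its helpers are hypothetical, and a plain String stands in for the Mutation type so the example runs without Cassandra on the classpath.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MutationMapSketch {

    /** Encodes a string as UTF-8 without the checked exception of getBytes("UTF8"). */
    static ByteBuffer utf8(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    /** Files one mutation under rowKey -> columnFamily, creating the inner map and list on demand. */
    static <M> void add(Map<ByteBuffer, Map<String, List<M>>> rows,
                        ByteBuffer rowKey, String columnFamily, M mutation) {
        rows.computeIfAbsent(rowKey, k -> new HashMap<>())
            .computeIfAbsent(columnFamily, k -> new ArrayList<>())
            .add(mutation);
    }

    public static void main(String[] args) {
        // String stands in for org.apache.cassandra.thrift.Mutation in this sketch
        Map<ByteBuffer, Map<String, List<String>>> rows = new HashMap<>();
        add(rows, utf8("row_key"), "ColumnFamily", "first mutation");
        add(rows, utf8("row_key"), "ColumnFamily", "second mutation");

        // Both mutations end up in the same list, because ByteBuffers with equal content are equal keys
        System.out.println(rows.get(utf8("row_key")).get("ColumnFamily").size()); // prints 2
    }
}
```

Because the helper is generic, the same add method would also accept real Mutation objects, so several columns and rows can be collected into one map before a single call to the server.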

In the case of the consistency levels, you should check out Cassandra's wiki to better understand their usage.
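
As a rough guide while you read about consistency levels, the usual arithmetic can be sketched like this: a quorum is a majority of the replicas, and a read is only guaranteed to see the latest write when the replicas written plus the replicas read exceed the replication factor. The ConsistencyMath class is a hypothetical helper for illustration, not something Cassandra provides.

```java
/** Sketch of the replica arithmetic behind consistency levels. */
final class ConsistencyMath {

    /** Number of replicas that make up a majority for the given replication factor. */
    static int quorum(int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replication factor must be at least 1");
        }
        return replicationFactor / 2 + 1;
    }

    /** True when every read set must share at least one replica with every write set. */
    static boolean overlaps(int writeReplicas, int readReplicas, int replicationFactor) {
        return writeReplicas + readReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        System.out.println(quorum(rf));                           // prints 2
        System.out.println(overlaps(1, 1, rf));                   // ONE + ONE: prints false
        System.out.println(overlaps(quorum(rf), quorum(rf), rf)); // QUORUM + QUORUM: prints true
    }
}
```

So writing with ONE, as in the insertion above, is the fastest option, but a later read with ONE may still hit a replica that has not received the write yet.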

Closing the connection to the server is as easy as:

transport.close();

Hope you find this useful. Next I'll give an example of how to get data from the server, as soon as I have some time. ;)