Getting Started with Cassandra

After spending some time with Cassandra, I thought about writing a small post. This post is an attempt to get you started with Cassandra through quick examples. It does not try to explain the Data Model.

Prerequisites

1. Cassandra 0.7
2. Cassandra GUI
3. Hector (Cassandra client). You can use any other client with slight modifications.

You will need the following jars in your classpath (available with the downloads above):
1. apache-cassandra-0.7.0.jar
2. hector-core-0.7.0-22.jar
3. slf4j-log4j12-1.6.1.jar
4. slf4j-api-1.6.1.jar
5. libthrift-0.5.jar
6. log4j-1.2.16.jar
7. perf4j-0.9.12.jar
8. high-scale-lib-1.0.jar

NOTE: I tweaked the Cassandra GUI to work with Cassandra 0.7.

The reason for using the GUI is to see how things are stored within Cassandra. This visualization helps in understanding things faster.

We shall explore Cassandra with 3 independent examples.

Download the example code from https://code.google.com/p/cassandra-examples

Let's first do some housekeeping, like starting Cassandra and applying the schema.

Starting Cassandra

1. Download and extract Cassandra 0.7
2. Go to the Cassandra-install-dir/bin directory
3. Execute the following command:
> cassandra -f

This starts Cassandra in the foreground.

NOTE: We are running a single Cassandra node for the sake of simplicity.

Applying the Schema

The schema used for the examples is part of the example code.

Steps to apply the schema
1. Keep the schema file (cassandra.yaml) in the classpath of the Cassandra server, for example in the conf directory
2. Launch jconsole (from JAVA_HOME/bin)
3. Connect to Cassandra

JConsole Connect Dialog

4. Click MBeans tab and navigate to org.apache.cassandra.db.StorageService MBean
5. Expand Operations and click on loadSchemaFromYaml

Load Schema

6. Click the loadSchemaFromYaml button in the right pane

You should receive a success message, which means the schema was loaded.

Let's see how the schema looks in cassandra-gui.

Schema View

So far so good. Let's move on to our first sample.

Example 1: Tweets

When I started with Cassandra, I learned a lot from the Tweet examples, so here is my simple Tweet version. For those who are looking for a complete Tweet application, twissjava is the way to go.

The idea of this sample is to store every tweet that is received, giving each one a unique id.

The POJO has just three fields

public class Tweet implements Serializable {
    private final UUID key;
    private final String uname;
    private final String body;

    public Tweet(UUID key, String uname, String body) {
        this.key = key;
        this.uname = uname;
        this.body = body;
    }

    // Getters omitted for clarity
}

Now we need to get hold of the Cluster. The usage is specific to Hector.

final static Cluster cluster = HFactory.createCluster("LogsCluster",
            new CassandraHostConfigurator("localhost:9160"));
final static Keyspace keyspace = HFactory.createKeyspace("LogData", cluster);

These two lines get a reference to the Cassandra cluster and the keyspace we are using. The names match those specified in the cassandra.yaml that we loaded.

Now let's see how we save the tweets.

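public void saveTweet(Tweet tweet) {
    // ss (presumably a String serializer) and TWEETS (the column family name)
    // are defined elsewhere in the example code
    Mutator<String> m1 = HFactory.createMutator(keyspace, ss);
    // Queue two columns, uname and body, under the tweet's UUID row key
    m1.addInsertion(tweet.getKey().toString(),
                    TWEETS,
                    HFactory.createStringColumn("uname", tweet.getUname()))
      .addInsertion(tweet.getKey().toString(),
                    TWEETS,
                    HFactory.createStringColumn("body", tweet.getBody()));
    // Send the queued insertions
    m1.execute();
}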

Here we create a Mutator for the given keyspace and insert the tweet details, namely the username and the tweet body, as two columns. The row key is a UUID, which uniquely identifies a tweet.

Now let's see the main function.

public static void main(String[] args) {

    // Number of Tweets to be stored
    int count = 500;

    TweetSample sample = new TweetSample();
    System.out.println("Saving Tweets ....");

    for (int i = 0; i < count; i++) {
        Tweet tweet = new Tweet(UUID.randomUUID(),
                                "paliwalashish",
                                "This is tweet# " + i);
        sample.saveTweet(tweet);
        System.out.println("Saving Tweet # : " + i);
    }

    System.out.println("Tweet Saved....");
}

This is simple: we create tweet objects and insert them in a loop. So how does our Cassandra data look after we run this program?

Tweets DB View

As we can see, for each key (UUID) we have stored two columns: the username and the tweet body.

Example 2: Saving User Action Log

This example is very similar to our first example. Here we are storing userid, action and the URL in the DB.
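
The full version is in the example code. As a rough illustration of the layout only, here is a minimal in-memory sketch in plain Java. It does not use Hector or talk to Cassandra, and the UserAction and UserActionStore classes, as well as the column names userid, action and url, are assumptions made for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

// Hypothetical POJO for this example; the field names are assumptions.
final class UserAction {
    final String userId;
    final String action;
    final String url;

    UserAction(String userId, String action, String url) {
        this.userId = userId;
        this.action = action;
        this.url = url;
    }
}

// In-memory stand-in for a column family: row key -> (column name -> value).
// It only mimics the storage layout; nothing is sent to Cassandra.
final class UserActionStore {
    private final Map<String, Map<String, String>> rows = new LinkedHashMap<>();

    // Stores one action under a fresh UUID row key and returns that key.
    String save(UserAction a) {
        String rowKey = UUID.randomUUID().toString();
        Map<String, String> columns = new TreeMap<>(); // columns sorted by name
        columns.put("userid", a.userId);
        columns.put("action", a.action);
        columns.put("url", a.url);
        rows.put(rowKey, columns);
        return rowKey;
    }

    Map<String, String> columnsFor(String rowKey) {
        return rows.get(rowKey);
    }

    int rowCount() {
        return rows.size();
    }

    public static void main(String[] args) {
        UserActionStore store = new UserActionStore();
        String key = store.save(new UserAction("paliwalashish", "VIEW", "/home"));
        System.out.println(key + " -> " + store.columnsFor(key));
    }
}
```

With a real client, each put above would become one column insertion under the same row key, in the same way saveTweet adds uname and body in Example 1.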

Example 3: Saving Logs per Hour

In this example we want to save logs per hour so that we can analyze them easily. I chose to use a SuperColumn for this, with the day and the hour as the keys. There are other approaches that give the same functionality. The idea is to have the following structure for the logs.

Log Storage

For each day, we will store logs per hour.

The Log POJO just has a string message to be saved. A real-world scenario can be more sophisticated.

Let's see how we add the data to the SuperColumn.

public void saveLogs(String tag, String hrTag, Log logMessage) {
    Mutator<String> mutator = HFactory.createMutator(keyspace, ss);
    // Row key: tag (day); super column: hrTag (hour); column: random UUID -> message
    mutator.insert(tag, LOGS, HFactory.createSuperColumn(hrTag,
            Arrays.asList(HFactory.createStringColumn(UUID.randomUUID().toString(),
                    logMessage.getLogMessage())),
            ss, ss, ss));
    mutator.execute();
}

The calls are essentially similar, but we add more keys: tag is the day key (YYYYMMDD) and serves as the row key, LOGS is the name of the super column family, and hrTag is the name of the SuperColumn for that hour. Inside the SuperColumn, we add the log message with a unique id.
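
To make the nesting concrete, here is a minimal in-memory sketch in plain Java of the day, hour and message structure. It does not use Hector or talk to Cassandra. The HourlyLogSketch class, the use of UTC, and the "HH" format for the hour tag are assumptions for illustration; the example code may build its tags differently.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.Map;
import java.util.TimeZone;
import java.util.TreeMap;
import java.util.UUID;

// In-memory stand-in for the LOGS layout:
// day key -> (hour tag -> (unique column name -> log message)).
final class HourlyLogSketch {
    private final Map<String, Map<String, Map<String, String>>> rows = new TreeMap<>();

    // Formats a timestamp in UTC with the given pattern, e.g. "yyyyMMdd" or "HH".
    static String tag(Date when, String pattern) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(when);
    }

    // Files the message under its day and hour, keyed by a random UUID.
    void save(Date when, String message) {
        String dayKey = tag(when, "yyyyMMdd"); // plays the role of the row key
        String hourTag = tag(when, "HH");      // plays the role of the super column name
        rows.computeIfAbsent(dayKey, k -> new TreeMap<>())
            .computeIfAbsent(hourTag, k -> new TreeMap<>())
            .put(UUID.randomUUID().toString(), message);
    }

    // Number of messages stored for the given day and hour.
    int countFor(String dayKey, String hourTag) {
        Map<String, Map<String, String>> hours = rows.get(dayKey);
        if (hours == null) {
            return 0;
        }
        Map<String, String> logs = hours.get(hourTag);
        return logs == null ? 0 : logs.size();
    }

    public static void main(String[] args) {
        HourlyLogSketch logs = new HourlyLogSketch();
        Date now = new Date();
        logs.save(now, "Server started");
        logs.save(now, "Request received");
        System.out.println(logs.countFor(tag(now, "yyyyMMdd"), tag(now, "HH")));
    }
}
```

The random UUID as the innermost key keeps two messages from the same hour from overwriting each other, which is the same reason saveLogs uses UUID.randomUUID() as the column name.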

Here is how it looks, when stored in Cassandra.

Logs DB View

The Cassandra Data Model is slightly tricky to understand in the beginning. There are really wonderful posts out there explaining it. Take some time to read about the Data Model, tweak the examples, and have fun.

Happy Scaling 🙂

6 thoughts on “Getting Started with Cassandra”

  1. Hi,

    That may not be a great data schema for the logs. The data is partitioned between your Cassandra nodes based on the row key. Since your row key is the same for a whole day, this would mean all logs for each day get sent to only one of your nodes, which may or may not be what you want. See http://www.datastax.com/docs/1.0/cluster_architecture/partitioning

    It may be better to use a UUID for the key so that the logs are distributed between servers.

    Jeff

    Thanks Jeff! Your point is well taken. This was just a sample to get started, so I didn’t go deeper into partitions. And a UUID is way better for even partitions 🙂

    cheers !
    ashish
