Hi! Welcome...

Thread.currentThread().join() I am Ashish. I am a member of the Apache MINA PMC, an ASF committer and an avid code hacker.

24 January 2011 ~ 2 Comments

Getting Started with Cassandra


After spending some time with Cassandra, I thought about writing a small post. It is an attempt to help you get started with Cassandra through quick examples. This post does not try to explain the Data Model.

Pre-requisite

1. Cassandra 0.7
2. Cassandra GUI
3. Hector (Cassandra client). You can use any other client with slight modifications

You shall need the following jars in your classpath (available with the downloads above):
1. apache-cassandra-0.7.0.jar
2. hector-core-0.7.0-22.jar
3. slf4j-log4j12-1.6.1.jar
4. slf4j-api-1.6.1.jar
5. libthrift-0.5.jar
6. log4j-1.2.16.jar
7. perf4j-0.9.12.jar
8. high-scale-lib-1.0.jar

NOTE: I tweaked the Cassandra GUI to work with Cassandra 0.7.

The reason for using the GUI is to see how things are stored within Cassandra; this visualization helps in understanding things faster.

We shall explore Cassandra with 3 independent examples.

Download the example code from https://code.google.com/p/cassandra-examples

Let's first do some housekeeping, like starting Cassandra and applying the schema.


Starting Cassandra

1. Download and extract Cassandra 0.7
2. Go to Cassandra-install-dir/bin directory
3. Execute the following command
> cassandra -f

This shall start Cassandra in the foreground.

NOTE: We are running a single Cassandra node for the sake of simplicity.

Applying the Schema

The schema used for the examples is part of the example code.

Steps to apply the schema
1. Keep the schema file (cassandra.yaml) in the classpath of the Cassandra server, for example in the conf directory
2. Launch jconsole (from JAVA_HOME\bin)
3. Connect to Cassandra

JConsole Connect Dialog


4. Click MBeans tab and navigate to org.apache.cassandra.db.StorageService MBean
5. Expand Operations and click on loadSchemaFromYaml


Load Schema


6. Click the loadSchemaFromYaml button in the right pane

You shall receive a success message, which means the schema was loaded.

Let's see how the schema looks in cassandra-gui

Schema View


So far so good. Let's move on to our first sample.


Example 1: Tweets

When I started with Cassandra, I found there was a lot to learn from the Tweet examples. So here is my simple Tweet version. For those who are looking for a complete Tweet application, twissjava is the way to go.

The idea of this sample is to store every tweet that is received, giving each one a unique id.

The POJO has just three fields

public class Tweet implements Serializable {
    private final UUID key;
    private final String uname;
    private final String body;

    public Tweet(UUID key, String uname, String body) {
        this.key = key;
        this.uname = uname;
        this.body = body;
    }

    // Getters omitted for clarity
}

Now we need to get hold of the Cluster. The usage is specific to Hector.

final static Cluster cluster = HFactory.createCluster("LogsCluster",
            new CassandraHostConfigurator("localhost:9160"));
final static Keyspace keyspace = HFactory.createKeyspace("LogData", cluster);

These two lines get a reference to the Cassandra cluster and the keyspace we are using. The names are the ones specified in the cassandra.yaml that we loaded.

Now let's see how we save the tweets

public void saveTweet(Tweet tweet) {
    // ss (a String serializer) and TWEETS (the column family name) are defined elsewhere in the class
    Mutator<String> m1 = HFactory.createMutator(keyspace, ss);
    m1.addInsertion(tweet.getKey().toString(),
                    TWEETS,
                    HFactory.createStringColumn("uname", tweet.getUname()))
      .addInsertion(tweet.getKey().toString(),
                    TWEETS,
                    HFactory.createStringColumn("body", tweet.getBody()));
    m1.execute();
}

Here we create a Mutator for the given keyspace and insert the tweet details, namely the username and the tweet body. The row key is a UUID, which uniquely identifies a tweet.

Now let's see the main function

public static void main(String[] args) {

    // Number of Tweets to be stored
    int count = 500;

    TweetSample sample = new TweetSample();
    System.out.println("Saving Tweets ....");

    for (int i = 0; i < count; i++) {
        Tweet tweet = new Tweet(UUID.randomUUID(),
                                "paliwalashish",
                                "This is tweet# " + i);
        sample.saveTweet(tweet);
        System.out.println("Saving Tweet # : " + i);
    }

    System.out.println("Tweet Saved....");
}

This is simple: we create Tweet objects and insert them in a loop. So how does our Cassandra data look after we run this program?


Tweets DB View



As we can see, for each key (UUID) we have stored two columns: the username and the tweet body.

Example 2: Saving User Action Log

This example is very similar to our first example. Here we are storing the user id, the action and the URL in the DB.
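
The code for this example is not reproduced in the post. A minimal sketch of what its POJO might look like, mirroring the Tweet class from Example 1. The class name UserAction, the column names and the toColumns() helper are assumptions made for illustration and are not taken from the example code.

```java
import java.io.Serializable;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical POJO: class, field and column names are illustrative assumptions.
class UserAction implements Serializable {
    private final String userId;
    private final String action;
    private final String url;

    UserAction(String userId, String action, String url) {
        this.userId = userId;
        this.action = action;
        this.url = url;
    }

    // Column name -> value pairs that would be written under one row key.
    Map<String, String> toColumns() {
        Map<String, String> columns = new LinkedHashMap<String, String>();
        columns.put("userid", userId);
        columns.put("action", action);
        columns.put("url", url);
        return columns;
    }
}
```

Each entry of the returned map would then become one addInsertion call on the Mutator, in the same way saveTweet does for uname and body.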


Example 3: Saving Logs per Hour

In this example we want to save logs per hour so that we can analyze them easily. I chose to use a SuperColumn for this, with the day and the hour as the keys. There are other approaches that give the same functionality. The idea is to have the following structure for the logs


Log Storage


For each day, we store logs per hour.

The Log POJO just has a string message to be saved. A real-world scenario can be more sophisticated.

Let's see how we add the data to the SuperColumn

public void saveLogs(String tag, String hrTag, Log logMessage) {
    Mutator<String> mutator = HFactory.createMutator(keyspace, ss);
    mutator.insert(tag, LOGS, HFactory.createSuperColumn(hrTag,
            Arrays.asList(HFactory.createStringColumn(UUID.randomUUID().toString(),
                                                      logMessage.getLogMessage())),
            ss, ss, ss));
    mutator.execute();
}

The calls are essentially similar, but we add more keys: tag is the day key (YYYYMMDD), LOGS is the column family that holds the SuperColumns, and hrTag is the name of the SuperColumn for the hour. Inside the SuperColumn, we add the log message as a column with a unique id.
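
To make the day / hour / message nesting concrete, here is a rough in-memory sketch that does not touch Cassandra at all. The class LogLayoutSketch, the two-digit "HH" hour tag and the use of UTC are assumptions for illustration; only the YYYYMMDD day key comes from the description above.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

// In-memory stand-in for the layout: day key -> hour tag -> (log id -> message).
class LogLayoutSketch {
    // Day key in YYYYMMDD form; UTC is an assumption.
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);
    // Hour tag as two digits (00-23); the format is an assumption.
    private static final DateTimeFormatter HOUR =
            DateTimeFormatter.ofPattern("HH").withZone(ZoneOffset.UTC);

    final Map<String, Map<String, Map<String, String>>> rows = new TreeMap<>();

    static String dayKey(Instant when) {
        return DAY.format(when);
    }

    static String hourTag(Instant when) {
        return HOUR.format(when);
    }

    // Files the message under its day and hour, with a random UUID as the column name.
    void save(Instant when, String message) {
        rows.computeIfAbsent(dayKey(when), d -> new TreeMap<>())
            .computeIfAbsent(hourTag(when), h -> new TreeMap<>())
            .put(UUID.randomUUID().toString(), message);
    }
}
```

In terms of saveLogs, dayKey would supply the tag argument and hourTag the hrTag argument.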

Here is how it looks, when stored in Cassandra.


Logs DB View



The Cassandra Data Model is slightly tricky to understand in the beginning. There are really wonderful posts out there explaining it. Take some time to read about the Data Model, tweak the examples, and have fun.

Happy Scaling :)



17 August 2010 ~ 11 Comments

Getting Started with Terracotta Toolkit – Part 3 (Clustered Map)


Did you ever want a Map that can be clustered across JVMs, without doing much work to replicate data or keep it coherent? Terracotta Toolkit allows you to fulfill your dream. Use the same Map APIs you have been using for long and see your data being clustered transparently with Terracotta. The fun begins when you add new clients and they have the data with them without any additional programming.

Let's start with some general information about the Toolkit. Refer to the Toolkit javadoc for more details.

Pre-requisite

  • Download Terracotta 3.3.0 from the Download page
  • Include terracotta-toolkit-1.0-runtime-1.0.0.jar in your classpath to use the Toolkit. The jar is present inside the Terracotta_Install_dir/common folder.
  • You can spend some time reading the following posts on Toolkit, although they can be read in any order

    Let's explore a bit what we achieve when we say Clustered Map

    Unclustered Map


    As you can see in the figure above, we have different JVMs, each having a Map instance used by the application. If we were to share the same instance across apps running on different nodes, we would have to write a lot of infrastructure code to replicate data, keep data coherent and so on.

    Clustered Map


    The figure above depicts a Clustered Map. The Map instance is visible to apps across JVMs as if they were accessing a local instance. You can achieve this in a few lines, without worrying about the infrastructure code. Let's see how we can achieve this in a few simple steps.

    Steps to create a Clustered Map

  • Initialize Toolkit
  • Get a Map reference
  • Use the Map

    Initializing the Toolkit

    Before any operations can be performed using the Toolkit, it needs to be initialized. The initialization code is very simple, just a single line

    ClusteringToolkit clustering = new TerracottaClient("localhost:9510").getToolkit();
    

    This line initializes the Terracotta Client and gets an instance of Toolkit to work with. The argument passed to the TerracottaClient is the IP Address and Port of the Terracotta Server.

    I have created a simple function to initialize the Toolkit.

    ClusteringToolkit clusterToolkit = null;
    
    public void initializeToolKit(String serverAdd) {
        clusterToolkit = new TerracottaClient(serverAdd).getToolkit();
    }
    

    Get/Create the Map

    public void initializeMap() {
        clusteredMap = clusterToolkit.getMap(MAP_NAME);
    }
    

    The code above creates a Clustered Map and gives us a reference to it. There is no more infrastructure code that you have to write. The best part is that there are no new APIs to learn; we continue to use the same old Map APIs.

    Let's see the data class that we would share across the Map.

    public class SharedData implements Serializable {
    
        private String data;
    
        public SharedData(String data) {
            this.data = data;
        }
    
        public String getData() {
            return data;
        }
    
        public void setData(String data) {
            this.data = data;
        }
    }
    

    It's a no-brainer class, just a simple class for demonstration purposes.

    To keep life simple, we are going to create a function which populates the Map, and the nodes can then consume the populated data. Consume here just means calling get() and printing the result. In a real-life scenario, you can mutate the data, and the updated data shall be visible across the cluster.

    Let's see the populate function

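    protected void populateMap(int objectCount) {
        for (int i = 0; i < objectCount; i++) {
            SharedData data = new SharedData("" + i);
            // Store the serialized form, since consumeMapData reads byte[] values from the Map
            clusteredMap.put(i, serializeObject(data));
        }
    }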
    

    It's a simple function which just puts the data in the Map.

    Let's look at the consume function

     protected void consumeMapData(int maxObjectId) {
            Range range = new Range(1, maxObjectId);
            while(true) {
                long key = range.getRandomKey();
                byte[] dataBytes = clusteredMap.get((int)key);
                System.out.println("Key = "+key);
                if(dataBytes == null) {
                    System.out.println("OOPS!.. somethings wrong dude...");
                    return;
                }
                SharedData data = (SharedData)deserializeObject(dataBytes);
                if(data != null) {
                    System.out.println("Data = "+data.getData());
                }
                try {
                    // Sleep for 5 secs
                    Thread.sleep(5000);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }
    

    The function just takes a random key and calls get() on the Map.

    To run this implementation, we would have three nodes running: one producer and two consumers. The picture below gives an idea of the topology.
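
The Range class used by consumeMapData is not shown in this post. A minimal sketch of such a helper follows, assuming it only needs a constructor taking two bounds and a getRandomKey() method returning a long, as the call sites suggest. Treating the upper bound as exclusive is my assumption; it is chosen because populateMap stores keys 0 to objectCount - 1.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper: the real Range class ships with the sample source.
class Range {
    private final long min;
    private final long max;

    Range(long min, long max) {
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        this.min = min;
        this.max = max;
    }

    // Returns a random key in [min, max); if min equals max, returns min.
    long getRandomKey() {
        if (min == max) {
            return min;
        }
        return ThreadLocalRandom.current().nextLong(min, max);
    }
}
```

With an exclusive upper bound, new Range(1, maxObjectId) never asks for key maxObjectId, which the populate loop would not have stored when maxObjectId equals objectCount.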

    Runtime Topology


    To run the implementation

  • Start Terracotta Server
  • Start Producer
  • Start one or more Consumers
  • You can get the complete source of the sample here


    09 August 2010 ~ 1 Comment

    Getting Started with Terracotta Toolkit – Part 2 (Cluster Events)


    In the last post we saw the Terracotta Toolkit Queue. In this part, let's explore Cluster Events. Cluster Events are propagated across the Terracotta Cluster whenever a specific event happens inside the Cluster. In this post we shall explore the different types of Events available.

    Cluster Events

    As of 3.3.0, the following four Events are available

    • Node Joined
    • Node Left
    • Operations Enabled
    • Operations Disabled

    So what do we need to do, in order to receive these events? Just two simple steps

    • Implement ClusterListener
    • Register the Listener with toolkit

    Pre-requisite

  • Download Terracotta 3.3.0 from the Download page
  • Include terracotta-toolkit-1.0-runtime-1.0.0.jar in your classpath to use the Toolkit. The jar is present inside the Terracotta_Install_dir/common folder.

    Implementing ClusterListener

    Here is the code

    class MyClusterListener implements ClusterListener {
    
            public void nodeJoined(ClusterEvent clusterEvent) {
                try {
                    System.out.println("Event Type = "+clusterEvent.getType());
                    System.out.println("Node Joined "+clusterEvent.getNode().getId()
                            + ", IP="+clusterEvent.getNode().getAddress());
                } catch (UnknownHostException e) {
                    e.printStackTrace();
                }
            }
    
            public void nodeLeft(ClusterEvent clusterEvent) {
                try {
                    System.out.println("Node Left "+clusterEvent.getNode().getId()
                            + ", IP="+clusterEvent.getNode().getAddress());
                } catch (UnknownHostException e) {
                    e.printStackTrace();
                }
            }
    
            public void operationsEnabled(ClusterEvent clusterEvent) {
                System.out.println("Operations Enabled");
            }
    
            public void operationsDisabled(ClusterEvent clusterEvent) {
                System.out.println("Operations Disabled");
            }
        }
    

    As you can see, the implementation is pretty simple. Now it's time to register the listener

    ClusterInfo clusterInfo = clusterToolkit.getClusterInfo();
    clusterInfo.addClusterListener(new MyClusterListener());
    

    Get the complete code here

    Time to see the implementation in Action

    • Start Terracotta Server
    • Start the Listener
    • Start/Stop Terracotta Clients and see the events being received

    There are some innovative things that can be done inside the ClusterListener. For example, you can send SNMP Traps to your management system for nodeJoined or nodeLeft events.

    Stay tuned for more on Terracotta Toolkit...

    Looking for more details? You can find them here

  • Terracotta Documentation
  • Terracotta Toolkit Javadoc

    04 August 2010 ~ 0 Comments

    Introducing Apache Vysper


    Apache Vysper aims to be a fully compliant XMPP Server. It's a sub-project of Apache MINA. Its latest release is currently 0.5, and work is in progress on the 0.6 release. It already has implementations of the XEP-0045 (Multi User Chat) and XEP-0060 (Publish-Subscribe) extensions. Let's see it in action.

    Pre-requisite

    Download Apache Vysper from http://mina.apache.org/vysper/downloads.html

    Vysper can be run in standalone mode as well as in an embedded mode.

    Running as Standalone Server

    Go to the bin directory and execute

    run.sh

    The Server shall start :)

    Running in an Embedded Mode

    Embedding Vysper into your own App is easy. Here is a glimpse

    XMPPServer server = new XMPPServer("myembeddedjabber.com");
    server.start();
    

    There is still some glue code needed to get the Server fully functional. Rather than duplicating it here, please refer to http://mina.apache.org/vysper/embed-into-another-application.html

    Looks interesting! Give Vysper a try and let us know about your experience.

    02 August 2010 ~ 2 Comments

    Getting Started with Terracotta Toolkit – Part 1


    Terracotta Toolkit was released as part of Terracotta 3.3.0. The Toolkit is a delight for developers working on scalable apps and frameworks. For more details on the features of the Toolkit, refer to this link.
    In this post we shall work out a few samples using the Toolkit. We shall start by looking at Cluster Events and Clustered Queues, and then move on to clustered Locks.

    Let's start with some general information about the Toolkit. Refer to the Toolkit javadoc for more details.

    Pre-requisite

  • Download Terracotta 3.3.0 from the Download page
  • Include terracotta-toolkit-1.0-runtime-1.0.0.jar in your classpath to use the Toolkit. The jar is present inside the Terracotta_Install_dir/common folder.

    Initializing the Toolkit

    Before any operations can be performed using the Toolkit, it needs to be initialized. The initialization code is very simple, just a single line

    ClusteringToolkit clustering = new TerracottaClient("localhost:9510").getToolkit();
    

    This line initializes the Terracotta Client and gets an instance of Toolkit to work with. The argument passed to the TerracottaClient is the IP Address and Port of the Terracotta Server.

    I have created a simple function to initialize the Toolkit.

    ClusteringToolkit clusterToolkit = null;
    
    public void initializeToolKit(String serverAdd) {
        clusterToolkit = new TerracottaClient(serverAdd).getToolkit();
    }
    


    Playing with Toolkit Queue

    The Toolkit provides various data structures that you can use, Queue being one of them.

    Let's try to create a Producer-Consumer sample using the Toolkit Queue. Here is what we are trying to do




    We have a producer which is going to produce some work, and consumers that shall process the work in different JVMs.

    To achieve this, we need to do the following

    • Create a Work object which shall be pushed by Producer to consumers.
    • We need to create a Producer that shall produce the work
    • We need to create a Consumer which shall consume the work. We can have multiple instances of Consumers

    Let's have a look at the Work object

    class Work implements Serializable, Runnable {
        private String workItem;
    
        Work(String workItem) {
            this.workItem = workItem;
        }
    
        public String getWorkItem() {
            return workItem;
        }
    
        public void setWorkItem(String workItem) {
            this.workItem = workItem;
        }
    
        public void run() {
            System.out.println(Thread.currentThread().getName() + " processing - " + getWorkItem());
        }
    }
    

    NOTE: The Objects that shall be pushed onto the Queue should implement Serializable.

    What have we done here? It's a very simple implementation, where we just print what the Producer pushed.
    Implementing Runnable is not necessary. I just added it so that I can push the work unit directly to an Executor.

    Creating Producer

    
    static final String QUEUE_NAME = "DATA_QUEUE";
    
    public void startProducer(int capacity) {
        BlockingQueue<byte[]> queue = clusterToolkit.getBlockingQueue(QUEUE_NAME, capacity);
        System.out.println("Starting Producer....");
        int i = 0;
        while (i < capacity) {
            queue.add(serializeObject(new Work("" + i++)));
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
    

    Creating a clustered Queue is a one-line job. We have already initialized the Toolkit, so now we just call the getBlockingQueue() API on the toolkit, and we get a distributed Queue backed by Terracotta :)
    After creating the Queue, we keep adding work to it until the number of items added equals its capacity.
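
The producer above calls serializeObject, and the samples pair it with deserializeObject on the reading side; neither helper is shown in the post. A minimal sketch using standard Java serialization could look like the following. The wrapper class SerializationSketch and the choice to wrap failures in IllegalStateException are assumptions, and the helpers in the downloadable sample may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helpers built on standard Java serialization.
class SerializationSketch {

    // Turns a Serializable object into a byte[] that can be placed on the queue.
    static byte[] serializeObject(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new IllegalStateException("Could not serialize " + obj, e);
        }
        return bytes.toByteArray();
    }

    // Rebuilds the object from the bytes; the caller casts it to the expected type.
    static Object deserializeObject(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Could not deserialize", e);
        }
    }
}
```

This is also why the objects pushed onto the Queue have to implement Serializable: writeObject rejects anything that does not.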

    Creating Consumer

    public void startConsumer(int capacity) {
        BlockingQueue<byte[]> queue = clusterToolkit.getBlockingQueue(QUEUE_NAME, capacity);
    
        // Lets add some threads to consumer
        ExecutorService executors = Executors.newFixedThreadPool(5);
    
        // a dumb way for a loop
        while (true) {
            try {
                executors.execute((Work) deserializeObject(queue.take()));
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
    

    Aha! There is no difference in creation; the only difference is that our Consumer calls the take() API on the Queue to consume the work objects. Here we have a fixed-size thread pool that keeps consuming the work that the producer is pushing.

    This is it. Our implementation is complete :) We are ready to run the sample. To execute it, we need three steps
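
To try the same add() / take() flow without a Terracotta server, the clustered queue can be swapped for a plain java.util.concurrent.LinkedBlockingQueue inside a single JVM. This is only a local stand-in for experimenting with the pattern; the class LocalQueueSketch and its run method are names made up for this sketch, and nothing in it is clustered.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Single-JVM stand-in: a LinkedBlockingQueue replaces the clustered queue.
class LocalQueueSketch {

    // Produces `items` work strings, hands them to a small pool, returns how many were processed.
    static int run(int items) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<String>(items);
        AtomicInteger processed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(items);
        ExecutorService workers = Executors.newFixedThreadPool(5);
        try {
            // Producer side: add() as in startProducer, without the sleep.
            for (int i = 0; i < items; i++) {
                queue.add("work-" + i);
            }
            // Consumer side: take() blocks until an item is available.
            for (int i = 0; i < items; i++) {
                String work = queue.take();
                workers.execute(() -> {
                    System.out.println(Thread.currentThread().getName() + " processing - " + work);
                    processed.incrementAndGet();
                    done.countDown();
                });
            }
            // Wait for the pool to finish the submitted work.
            done.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            workers.shutdown();
        }
        return processed.get();
    }
}
```

Note that add() throws IllegalStateException when a bounded queue is full, whereas put() would block; the sketch never exceeds the capacity, so add() is safe here.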

    • Start Terracotta Server
    • Start Producer
    • Start one or more Consumers

    See them in action yourself.

    The complete code can be found at http://code.google.com/p/terracotta-samples/

    Some ideas for what we can do with a distributed queue
    1. Distributed SEDA stages across JVMs
    2. Distributed task processing
    ... add your own ideas..

    Would be interested in knowing what you do around Terracotta Toolkit Queue. Please do add comments with your implementations.

    What's Next?

    In the next post we shall be exploring more about the Cluster Events as part of Terracotta Toolkit.