Exploring Terracotta tim-async – Scalable way to off-load DB updates from App Transaction

Let's start this post with a short conversation.

Manager: What are the options we have to increase the TPS of our SMSC Server?

Architect: Can you elaborate a bit on this?

Manager: As of now we process a request and update the database in the same transaction. Essentially, we are operating our server at the TPS at which our DB can operate.

Architect: Well, we have to ensure that our data is persisted.

Manager: Yes, but can't we update the DB asynchronously, without losing data?

Architect: Yes, it can be done, but we have to create a lot of infrastructure code around it so that we don't lose data on failed transactions, process crashes, etc.

If you have been in such a situation, this post is meant for you.

Summary

The need is simple to state: don't update the database in the same transaction as the request, yet make sure the data is still persisted. This is a common requirement for improving the responsiveness of an application. An application that works on in-memory data is much faster than one that fetches and updates data from a database on every request. Suppose you had a solution that lets your application work in memory and write the data to the datastore in a separate transaction, with HA and scalability out of the box. If that is what you are looking for, the Terracotta async processor (tim-async) is the right fit for you.

This article first describes tim-async briefly and then walks through a small POC that I built around it.

Pre-requisite

You should have a basic understanding of Terracotta 3.2 and its configuration.
You need to download tim-async using the tim-get tool.

What is tim-async?

TIM stands for Terracotta Integration Module, and tim-async is the module for asynchronous processing. tim-async provides a scalable, high-performance way to asynchronously write business data to a data source (typically a database) while the application works on in-memory data structures. Decoupling the main processing from the underlying database decreases the write latency of domain objects, while high availability of the data is provided by Terracotta.

What features does it have?

  • Multithreaded write-behind to flush the processed data from the client VMs (L1s) to the data source.
  • Data waiting to be flushed remains highly available (HA), as it is shared with Terracotta.
  • Every client takes responsibility for writing its own data set to the data source.
  • Listeners to monitor work progress at the bucket level.

NOTE: I have pulled this information from the attachment here.

Building Blocks of tim-async from User Perspective

From a User's POV, these three are the important building blocks

  • ItemProcessor
  • AsyncCoordinator
  • AsyncConfig

These are all classes or interfaces that are part of tim-async. Let's look at them in brief.

ItemProcessor

It's an interface, and the class that implements it is where we write the actual processing logic for the domain object, such as persisting it to the DB.

public interface ItemProcessor<I> {
    public void process(I item) throws ProcessingException;
}

AsyncCoordinator

It's the core class and needs to be declared as a shared root in tc-config.xml. It allows work to be added and processed asynchronously.
The class has a start() method that every client node (Terracotta L1) must call to participate in processing.
An ItemProcessor instance is local to the node, so it's the right place to hold things like a DB connection.

AsyncConfig

It's an interface that can be implemented to tailor the configuration of async processing. If we don't define one, org.terracotta.modules.async.configs.DefaultAsyncConfig is used.

public interface AsyncConfig {
    public long getWorkDelay();

    public long getMaxAllowedFallBehind();

    public boolean isStealingEnabled();
}
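A custom configuration only has to return three values. The sketch below is a minimal, hedged illustration: it redeclares the interface locally so that it compiles on its own, and the class name SmsAsyncConfig, the chosen values and the assumption that the two long values are durations are mine, not taken from tim-async. How the instance is handed to AsyncCoordinator is not shown in this post, so check the tim-async Javadoc for that.

```java
// Local copy of the AsyncConfig interface shown above, so the sketch compiles alone.
interface AsyncConfig {
    long getWorkDelay();

    long getMaxAllowedFallBehind();

    boolean isStealingEnabled();
}

// Hypothetical immutable configuration; names and values are illustrative only.
final class SmsAsyncConfig implements AsyncConfig {
    private final long workDelay;
    private final long maxAllowedFallBehind;
    private final boolean stealingEnabled;

    SmsAsyncConfig(long workDelay, long maxAllowedFallBehind, boolean stealingEnabled) {
        // Assumption: both values are durations, so negative numbers make no sense.
        if (workDelay < 0 || maxAllowedFallBehind < 0) {
            throw new IllegalArgumentException("values must not be negative");
        }
        this.workDelay = workDelay;
        this.maxAllowedFallBehind = maxAllowedFallBehind;
        this.stealingEnabled = stealingEnabled;
    }

    @Override
    public long getWorkDelay() {
        return workDelay;
    }

    @Override
    public long getMaxAllowedFallBehind() {
        return maxAllowedFallBehind;
    }

    @Override
    public boolean isStealingEnabled() {
        return stealingEnabled;
    }
}
```

Keeping the configuration immutable means every processing thread on a node sees the same settings for the lifetime of the coordinator.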

Now let's see how we can use tim-async in our application (we shall look at the sample application next).

- Implement ItemProcessor to do the custom processing, for example writing the data to the DB.
- Initialize AsyncCoordinator and call its start() API, passing the ItemProcessor from the previous step as a parameter.
- Create tc-config.xml (pick the sample from the tim-async example) and add the AsyncCoordinator field as a root.
- Start the TC server and the client JVMs with the tc-config.xml created above.

Let's see these steps in action.

Sample Application

I had written an SMSC simulator based on jsmpp that receives a SubmitSM request from the client, persists it in the DB and returns the ID to the client. I chose this application for a specific reason: I already had it running, and the SubmitSM only needs to be persisted in the DB for subsequent processing by another process. That makes the business case clear: the server stays very responsive to the client while the data is still guaranteed to be persisted.

Let's walk through a simple processing sequence.

  • The client sends a SubmitSM request
  • The server translates it into an internal SMS POJO and assigns a unique ID to the SMS
  • The server persists the SMS in the database
  • The server sends the generated ID back as part of the SubmitSM response

The application uses Hibernate to save the POJO in a MySQL database.

The domain object is not important here, so I won't discuss the internals of the SMS POJO.

Let's see how we enhanced the application to update the DB asynchronously.

Implementing the ItemProcessor

public class AsyncSMSProcessor implements ItemProcessor<SMS> {
    public void process(final SMS sms) throws ProcessingException {
        // Fetch the session once and reuse it for the whole unit of work
        Session hibernateSession = DAO.getSession();
        hibernateSession.getTransaction().begin();
        hibernateSession.save(sms);
        hibernateSession.getTransaction().commit();
    }
}

Essentially, what we have to write here is the logic for saving the SMS object to the database. This is the object that Terracotta passes back to us.
What we do here is save the SMS in a Hibernate transaction. This completes our ItemProcessor code.
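The processor above does not roll back when the save fails. The following is a minimal sketch of one way to structure that, assuming nothing about tim-async beyond the ItemProcessor interface shown earlier. ItemProcessor and ProcessingException are redeclared locally so the sketch compiles on its own, and UnitOfWork, TransactionalProcessor and RecordingUnitOfWork are hypothetical names standing in for the Hibernate session. What tim-async does with an item whose processing throws is not covered in this post.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-ins for the tim-async types, so the sketch compiles alone.
interface ItemProcessor<I> {
    void process(I item) throws ProcessingException;
}

class ProcessingException extends Exception {
    ProcessingException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical abstraction standing in for a Hibernate session and transaction.
interface UnitOfWork<T> {
    void begin();
    void save(T item);
    void commit();
    void rollback();
}

// Commits on success; rolls back and reports the failure otherwise.
final class TransactionalProcessor<T> implements ItemProcessor<T> {
    private final UnitOfWork<T> work; // node-local resource held by the processor

    TransactionalProcessor(UnitOfWork<T> work) {
        this.work = work;
    }

    @Override
    public void process(T item) throws ProcessingException {
        work.begin();
        try {
            work.save(item);
            work.commit();
        } catch (RuntimeException e) {
            work.rollback();
            throw new ProcessingException("could not persist " + item, e);
        }
    }
}

// In-memory implementation used only to exercise the sketch.
final class RecordingUnitOfWork implements UnitOfWork<String> {
    final List<String> committed = new ArrayList<>();
    final List<String> events = new ArrayList<>();
    private String pending;

    public void begin() { pending = null; events.add("begin"); }

    public void save(String item) {
        if (item.isEmpty()) throw new IllegalArgumentException("empty item");
        pending = item;
    }

    public void commit() { committed.add(pending); events.add("commit"); }

    public void rollback() { pending = null; events.add("rollback"); }
}
```

Because each node creates its own processor, the unit of work it holds never needs to be shared through Terracotta.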

Initialize AsyncCoordinator

public class SMSProcessor {

    // Declared as a DSO root in tc-config.xml so that it is shared across the cluster
    public AsyncCoordinator asyncUpdator;

    public SMSProcessor() {
        // Use the default configuration
        asyncUpdator = new AsyncCoordinator();
        // Every node calls start() to take part in processing
        asyncUpdator.start(new AsyncSMSProcessor());
    }

    public long processIncomingSMS(SubmitSm submitSm,
            SMPPServerSession source, ...) {

        // some logic here
        // give the data to AsyncCoordinator
        asyncUpdator.add(sms);
        // some logic there

    }
}

This is the only change we made here. Earlier we saved the SMS to the DB at this point; now we hand the data to AsyncCoordinator for subsequent processing.

Let's look at the tc-config.xml

This is what we need to add to the tc-config.xml:

<application>
  <dso>
    <!-- Declaring a field of a class as a root makes it available to all instances
         of our app that run via DSO -->
    <roots>
      <!-- XXX: -->
      <root>
        <field-name>com.tc.timasync.demo.smsAsyncCoordinator</field-name>
      </root>
    </roots>
  </dso>
</application>

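We declared the AsyncCoordinator variable as a DSO root here, which makes it shared across the cluster. The field-name value must be the fully qualified name of the field that holds the AsyncCoordinator (package, class, then field), so adjust it to match your own class.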

That's it.

We are ready. Now we start the Terracotta server with our tc-config file and start our SMSC simulator as a TC client.

How does it all fit together?

As we saw, instead of updating the DB in the same transaction, we handed the data over to Terracotta. Essentially, this means that our data has gone to the Terracotta server and won't be lost if a process crashes. The application operates at memory speed and responds to the client right away. Based on the tim-async configuration, the ItemProcessor is then called in a different transaction on a participating client node. We can have multiple nodes updating the DB asynchronously, or arrange it some other way, depending on the use case.
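The request path and the write path described above can be illustrated with plain java.util.concurrent. This is a single-JVM stand-in for the write-behind idea, not tim-async: nothing here is clustered or highly available, and the names WriteBehindDemo, accept and shutdown are hypothetical. The "database" is just an in-memory list.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Single-JVM illustration of write-behind; NOT tim-async and not highly available.
final class WriteBehindDemo {
    // One writer thread keeps the writes in submission order.
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    private final List<String> database = new CopyOnWriteArrayList<>(); // pretend DB
    private final AtomicLong nextId = new AtomicLong();

    // Request path: assign an ID, queue the write, return without touching the DB.
    long accept(String message) {
        long id = nextId.incrementAndGet();
        writer.execute(() -> database.add(id + ":" + message));
        return id;
    }

    // Lets all queued writes finish, then returns what was persisted.
    List<String> shutdown() throws InterruptedException {
        writer.shutdown();
        if (!writer.awaitTermination(10, TimeUnit.SECONDS)) {
            throw new IllegalStateException("writer did not drain in time");
        }
        return database;
    }
}
```

If the JVM running this sketch dies, every queued write is lost. Closing that gap across processes is the part the post relies on Terracotta for.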

Let's run both implementations

Note on Comparison of both the implementations

I shall run both implementations on the same hardware and OS: my laptop with 2 GB of RAM, running Windows and MySQL 5.x. The comparison is relative between the two implementation approaches, not a set of absolute figures. The figures would be much better if the samples were run on higher-end machines, with the Terracotta server (L2) on a dedicated machine and the clients (L1) on different machines. The idea here is just to see the TPS of the server.

Case I: DB updates in App transaction

For 10 runs of sending 1000 SubmitSM requests, the client took the following times:

Total Time for 1000 SMS = 28891
Total Time for 1000 SMS = 34234
Total Time for 1000 SMS = 27812
Total Time for 1000 SMS = 34547
Total Time for 1000 SMS = 29953
Total Time for 1000 SMS = 34344
Total Time for 1000 SMS = 33985
Total Time for 1000 SMS = 35328
Total Time for 1000 SMS = 36968

Case II: DB updates based on tim-async

Total Time for 1000 SMS = 1359
Total Time for 1000 SMS = 1125
Total Time for 1000 SMS = 781
Total Time for 1000 SMS = 813
Total Time for 1000 SMS = 750
Total Time for 1000 SMS = 719
Total Time for 1000 SMS = 703
Total Time for 1000 SMS = 719
Total Time for 1000 SMS = 718
Total Time for 1000 SMS = 735

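There was a significant decrease in the processing time per request 🙂 The total for 1000 SMS dropped from roughly 28,000 to 37,000 in Case I to roughly 700 to 1,400 in Case II (in the units printed by the client, presumably milliseconds), which is close to a 40-fold reduction on average.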
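The ratio between the two cases can be worked out directly from the figures listed above. The sketch below simply averages each list and divides; the class and method names are mine, and the unit is whatever the client printed, since the post does not state it.

```java
import java.util.Arrays;

// Averages the timings listed above and derives the relative speed-up.
final class TimingSummary {
    // Case I: DB updates in the application transaction (nine figures are listed).
    static final long[] SYNC = {
        28891, 34234, 27812, 34547, 29953, 34344, 33985, 35328, 36968
    };
    // Case II: DB updates through tim-async (ten figures are listed).
    static final long[] ASYNC = {
        1359, 1125, 781, 813, 750, 719, 703, 719, 718, 735
    };

    static double mean(long[] values) {
        return Arrays.stream(values).average().getAsDouble();
    }

    // How many times faster the client finished 1000 requests on average.
    static double speedup() {
        return mean(SYNC) / mean(ASYNC);
    }

    public static void main(String[] args) {
        System.out.printf("Case I mean:  %.1f%n", mean(SYNC));
        System.out.printf("Case II mean: %.1f%n", mean(ASYNC));
        System.out.printf("Speed-up:     %.1fx%n", speedup());
    }
}
```

Note that this measures how quickly the client gets its responses, not how quickly the rows reach the database, which is the point of moving the write out of the request path.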

What's next

Download Terracotta from here and try it yourself. You can also register for the upcoming Darwin release, which has many interesting new features.

Resources

  • http://blog.terracottatech.com/2009/02/offloading_a_db_even_when_upda.html - A wonderful post by Ari
  • http://www.slideshare.net/sbtourist/real-terracotta-presentation
  • http://forums.terracotta.org/forums/forums/list.page

4 thoughts on “Exploring Terracotta tim-async – Scalable way to off-load DB updates from App Transaction

  1. Why shouldn’t I consider using a message queue for write-behind? MQs expose manageability and configuration that can get one up and running very fast.

    If all I need is in-memory async operations, why not something like redis[1] or an in-memory queue data structure upon which two threads (1. customer-facing write response and 2. DB-facing write-behind operation) operate?

    [1] http://code.google.com/p/redis/

  2. Well there are multiple ways to achieve this. BTW, the example shown there works across multiple JVM. That’s the beauty. And with Terracotta your Threads in different JVM’s can coordinate as if they were in same JVM. In the next release write behind is the feature in-built in the product 🙂

    Also, things work a little differently with TC: redis saves your data periodically, whereas TC takes a different approach. I encourage you to try Terracotta and see for yourself.

  3. Hi Ashish

    I have just started using terracotta. Can you tell me from where u downloaded tim-async jar ? I have already downloaded few jars but i cannot resolve AsyncCoordinator class with those jars ? Do we have to use multiple jars ?
