Let's start this post with a short conversation.
Manager: What are the options we have to increase the TPS of our SMSC Server?
Architect: Can you elaborate a bit on this?
Manager: As of now, we process a request and update the database in the same transaction. Essentially, our server can only operate at the TPS that our DB can sustain.
Architect: Well, we have to ensure that our data is persisted.
Manager: Yes, but can't we update the DB asynchronously, without losing data?
Architect: Yes, it can be done, but we would have to write a lot of infrastructure code around it so that we don't lose data on failed transactions, process crashes, etc.
If you have been in such a situation, the post is meant for you.
Summary
The requirement: for any request, don't update the database in the same transaction, yet make sure the data is persisted. This has always been a typical way to make an application more responsive, because an application working on in-memory data is much faster than one fetching and updating data from a database. Assume you have a solution that lets your application work in memory and write the data to the datastore in a separate transaction, providing HA and scalability out of the box. If this is what you are looking for, the Terracotta async processor (tim-async) is the right fit for you.
The article first describes tim-async in brief and then walks through a small POC that I built around it.
Prerequisites
You should have a basic understanding of Terracotta 3.2 and its configuration.
You need to download tim-async using the tim-get tool.
What is tim-async?
TIM stands for Terracotta Integration Module, and tim-async is the module for asynchronous processing. tim-async provides a scalable, high-performance way to asynchronously write business data to a data source (typically a database) while the application works on in-memory data structures. Decoupling the main processing from the underlying database decreases the write latency of domain objects, while high availability of the data is provided by Terracotta.
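To make the write-behind idea concrete, here is a minimal single-JVM sketch that uses only java.util.concurrent. It is not tim-async and has none of its clustering or HA guarantees (queued items are lost if the JVM dies); the class WriteBehindSketch, its methods, and the in-memory list standing in for the database are all hypothetical names chosen for illustration.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of write-behind inside one JVM (NOT tim-async).
class WriteBehindSketch {
    // Stand-in for the database: records what the background writer stored.
    private final List<String> store = new CopyOnWriteArrayList<>();
    // A single background thread performs the "slow" writes in submission order.
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    // Called on the request path: returns immediately, the write happens later.
    void add(String item) {
        writer.execute(() -> store.add(item));
    }

    // Stops accepting work, waits for queued writes, and returns what was stored.
    List<String> drain() {
        writer.shutdown();
        try {
            if (!writer.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("writer did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return store;
    }

    public static void main(String[] args) {
        WriteBehindSketch sketch = new WriteBehindSketch();
        for (int i = 0; i < 3; i++) {
            sketch.add("sms-" + i); // the caller is not blocked by the write
        }
        System.out.println(sketch.drain()); // prints [sms-0, sms-1, sms-2]
    }
}
```

The gap this sketch leaves open, surviving a crash between add() and the actual write, is exactly the infrastructure that tim-async is meant to provide.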
What features does it have?
- Multithreaded write-behind to flush the processed data from the client VMs (L1s) to the data source
- Data to be flushed remains highly available (HA) because it is shared with Terracotta
- Every client takes responsibility for writing its own data set to the data source
- Listeners to monitor work progress at bucket level
NOTE: I have pulled this information from the attachment here.
Building Blocks of tim-async from User Perspective
From a user's point of view, these three are the important building blocks:
- ItemProcessor
- AsyncCoordinator
- AsyncConfig
These are all classes/interfaces that are part of tim-async. Let's look at them in brief.
ItemProcessor
It's an interface, and its implementation class is where we write the actual processing logic for the domain object, such as persisting it to the DB.
public interface ItemProcessor<I> {
    public void process(I item) throws ProcessingException;
}
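As a small illustration of implementing this interface, the sketch below records items in memory instead of writing them to a database, which can be handy for exercising the wiring without a DB. ItemProcessor and ProcessingException are declared locally as stand-ins so the snippet compiles without tim-async on the classpath; the real ProcessingException may have different constructors, and InMemoryProcessor is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Entry point first so the file can be launched directly with `java`.
class InMemoryProcessorDemo {
    public static void main(String[] args) throws ProcessingException {
        InMemoryProcessor<String> processor = new InMemoryProcessor<>();
        processor.process("sms-1");
        processor.process("sms-2");
        System.out.println(processor.processed()); // prints [sms-1, sms-2]
    }
}

// Local stand-ins mirroring the interface shown above (assumption: in a real
// project these come from tim-async and are imported, not redeclared).
interface ItemProcessor<I> {
    void process(I item) throws ProcessingException;
}

class ProcessingException extends Exception {
    ProcessingException(String message) {
        super(message);
    }
}

// Hypothetical processor that keeps processed items in memory.
class InMemoryProcessor<I> implements ItemProcessor<I> {
    private final List<I> processed = Collections.synchronizedList(new ArrayList<>());

    @Override
    public void process(I item) throws ProcessingException {
        if (item == null) {
            // Signal a failed item the same way a failed DB write would.
            throw new ProcessingException("null item");
        }
        processed.add(item);
    }

    // Returns a snapshot copy of everything processed so far.
    List<I> processed() {
        synchronized (processed) {
            return new ArrayList<>(processed);
        }
    }
}
```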
AsyncCoordinator
It's the core class and needs to be declared as a shared root in tc-config.xml. It allows work to be added and processed asynchronously.
The class has a start() method that every client node (Terracotta L1) needs to call to participate in processing.
The ItemProcessor instance passed to start() is local to the node, so it's the right place to hold things such as a DB connection.
AsyncConfig
It's an interface that can be implemented to tailor the configuration of async processing. If we don't define one, org.terracotta.modules.async.configs.DefaultAsyncConfig is used.
public interface AsyncConfig {
    public long getWorkDelay();
    public long getMaxAllowedFallBehind();
    public boolean isStealingEnabled();
}
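A custom configuration could look roughly like the sketch below. AsyncConfig is redeclared locally as a stand-in so the snippet compiles on its own; FixedAsyncConfig and the chosen values are hypothetical, and the comments on each getter only describe what the method names suggest, not documented tim-async semantics or units.

```java
// Entry point first so the file can be launched directly with `java`.
class CustomAsyncConfigDemo {
    public static void main(String[] args) {
        AsyncConfig config = new FixedAsyncConfig(500L, 5000L, true);
        System.out.println(config.getWorkDelay() + " "
                + config.getMaxAllowedFallBehind() + " "
                + config.isStealingEnabled()); // prints 500 5000 true
    }
}

// Local stand-in copied from the interface above (assumption: the real one
// is imported from tim-async).
interface AsyncConfig {
    long getWorkDelay();
    long getMaxAllowedFallBehind();
    boolean isStealingEnabled();
}

// Hypothetical immutable configuration with caller-supplied values.
final class FixedAsyncConfig implements AsyncConfig {
    private final long workDelay;
    private final long maxAllowedFallBehind;
    private final boolean stealingEnabled;

    FixedAsyncConfig(long workDelay, long maxAllowedFallBehind, boolean stealingEnabled) {
        if (workDelay < 0 || maxAllowedFallBehind < 0) {
            throw new IllegalArgumentException("values must be non-negative");
        }
        this.workDelay = workDelay;
        this.maxAllowedFallBehind = maxAllowedFallBehind;
        this.stealingEnabled = stealingEnabled;
    }

    // Name suggests: how long to wait between rounds of processing work.
    @Override
    public long getWorkDelay() {
        return workDelay;
    }

    // Name suggests: how far processing may lag behind newly added work.
    @Override
    public long getMaxAllowedFallBehind() {
        return maxAllowedFallBehind;
    }

    // Name suggests: whether idle nodes may take over work from busy ones.
    @Override
    public boolean isStealingEnabled() {
        return stealingEnabled;
    }
}
```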
Now let's see how we can use tim-async in our application (the sample application follows these steps):
- Implement ItemProcessor to do the custom processing, for example writing the data to the DB
- Initialize an AsyncCoordinator and call its start() API, passing the ItemProcessor from the previous step as a parameter
- Create tc-config.xml (pick the sample from the tim-async example) and add the AsyncCoordinator field as a root
- Start the TC server and the client JVMs with the tc-config.xml created above
Let's see these steps in action.
Sample Application
I had written an SMSC simulator based on jsmpp that receives a SubmitSM request from the client, persists it in the DB, and returns the ID to the client. There is a specific reason why I chose this application: I already had it running, and the SubmitSM only needs to be persisted in the DB for subsequent processing by another process. That justifies the business case: the server should be very responsive to the client while still ensuring that the data is persisted.
Let's walk through a simple processing sequence:
- The client sends a SubmitSM request
- The server translates it into an internal SMS POJO and assigns a unique ID to the SMS
- The server persists the SMS in the database
- The server sends the generated ID as part of the SubmitSM response
The application uses Hibernate to save the POJO in a MySQL database.
Since the domain object is the least significant part here, I won't discuss the internals of the SMS POJO.
Let's see how we enhanced the application to update the DB asynchronously.
Implementing the ItemProcessor
public class AsyncSMSProcessor implements ItemProcessor<SMS> {
    public void process(final SMS sms) throws ProcessingException {
        // Fetch the session once and reuse it for the whole unit of work
        Session hibernateSession = DAO.getSession();
        hibernateSession.getTransaction().begin();
        hibernateSession.save(sms);
        hibernateSession.getTransaction().commit();
    }
}
Essentially, all we have to write is the logic that saves the SMS object to the database. This is the object that Terracotta is going to pass back to us.
What we do here is save the SMS in a Hibernate transaction. This completes our ItemProcessor code.
Initialize AsyncCoordinator
public class SMSProcessor {
    // Shared across the cluster once declared as a DSO root in tc-config.xml
    public AsyncCoordinator asyncUpdator;

    public SMSProcessor() {
        asyncUpdator = new AsyncCoordinator();
        // Use the default API
        asyncUpdator.start(new AsyncSMSProcessor());
    }

    public long processIncomingSMS(SubmitSm submitSm,
            SMPPServerSession source, ...) {
        // some logic here
        // give the data to AsyncCoordinator
        asyncUpdator.add(sms);
        // some logic there
    }
}
This is the only change we made here. Earlier we saved the SMS to the DB at this point; now we hand the data to the AsyncCoordinator for subsequent processing.
Let's look at the tc-config.xml
This is what we need to add to the tc-config.xml:
<application>
  <dso>
    <!-- Declaring a field of a class as a root makes it available to all instances
         of our app that run via DSO -->
    <roots>
      <!-- XXX: -->
      <root>
        <field-name>com.tc.timasync.demo.smsAsyncCoordinator</field-name>
      </root>
    </roots>
  </dso>
</application>
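We declared the AsyncCoordinator field as a DSO root here, which makes it shared across the cluster. The field-name value must be the fully qualified name (class plus field) of the field that holds the AsyncCoordinator in your application.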
That's it.
We are ready. Now we start the Terracotta server with our tc-config file and start our SMSC simulator as a TC client.
How does it all fit together
As we saw, instead of updating the DB in the same transaction, we handed the data over to Terracotta. Essentially, this means that our data has gone to the Terracotta server and won't be lost if a process crashes. The application operates at memory speed and responds to the client. Based on the tim-async configuration, the ItemProcessor is then called in a different transaction. We can have multiple nodes updating the DB asynchronously, or some other arrangement, depending on the use case.
Let's run both implementations
Note on the comparison of the two implementations
I shall run both implementations on the same hardware and OS, which is my laptop with 2 GB of RAM, running Windows and MySQL 5.x. The comparison is relative between the two approaches and is not about the absolute figures. The figures would be much better if the samples were run on higher-end machines, with the Terracotta server (L2) running on a dedicated machine and the clients (L1) running on different machines. The idea here is just to see the TPS of the server.
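For readers who want to reproduce this kind of measurement, a batch timer along the following lines would do. This is a sketch, not the client used for the figures in this post: BatchTimer and timeBatch are hypothetical names, the lambda is a placeholder for "send one SubmitSM and wait for the response", and reporting milliseconds is an assumption.

```java
import java.util.function.IntConsumer;

// Hypothetical helper that times a batch of requests.
class BatchTimer {
    // Runs `action` for indexes 0..count-1 and returns the elapsed
    // wall-clock time in milliseconds.
    static long timeBatch(int count, IntConsumer action) {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            action.accept(i);
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        for (int run = 1; run <= 10; run++) {
            // Placeholder work; replace with a real request/response round trip.
            long elapsed = timeBatch(1000, i -> Math.sqrt(i));
            System.out.println("Total Time for 1000 SMS = " + elapsed);
        }
    }
}
```

Running the same batch several times, as done here, helps to smooth out warm-up effects such as JIT compilation and connection setup.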
Case I: DB updates in App transaction
For 10 runs of sending 1000 SubmitSM requests, the client took the following times:
Total Time for 1000 SMS = 28891
Total Time for 1000 SMS = 34234
Total Time for 1000 SMS = 27812
Total Time for 1000 SMS = 34547
Total Time for 1000 SMS = 29953
Total Time for 1000 SMS = 34344
Total Time for 1000 SMS = 33985
Total Time for 1000 SMS = 35328
Total Time for 1000 SMS = 36968
Case II: DB updates based on tim-async
Total Time for 1000 SMS = 1359
Total Time for 1000 SMS = 1125
Total Time for 1000 SMS = 781
Total Time for 1000 SMS = 813
Total Time for 1000 SMS = 750
Total Time for 1000 SMS = 719
Total Time for 1000 SMS = 703
Total Time for 1000 SMS = 719
Total Time for 1000 SMS = 718
Total Time for 1000 SMS = 735
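There was a significant decrease in the processing time of requests: the total for 1000 SMS dropped from roughly 28,000 to 37,000 in Case I to roughly 700 to 1,400 in Case II (same units), which is a reduction of well over an order of magnitude.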
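To put a number on the difference, the figures listed above can be averaged and compared. The snippet below uses exactly the totals printed in the two cases (nine values for Case I and ten for Case II, as listed); TimingSummary and mean are hypothetical names.

```java
import java.util.Arrays;

// Averages the totals listed above and prints the ratio between the two cases.
class TimingSummary {
    // Case I: DB updates in the application transaction.
    static final long[] SYNC = {28891, 34234, 27812, 34547, 29953, 34344, 33985, 35328, 36968};
    // Case II: DB updates through tim-async.
    static final long[] ASYNC = {1359, 1125, 781, 813, 750, 719, 703, 719, 718, 735};

    static double mean(long[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        double sync = mean(SYNC);
        double async = mean(ASYNC);
        System.out.printf("Case I mean  = %.1f%n", sync);
        System.out.printf("Case II mean = %.1f%n", async);
        System.out.printf("Ratio        = %.1f%n", sync / async);
    }
}
```

The means come out at roughly 32,896 and 842, a ratio of about 39. If the totals are in milliseconds (the unit is not stated above), that corresponds to roughly 30 versus roughly 1,190 requests per second as seen by the client.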
What's next
Download Terracotta from here and try it yourself. You can also register for the upcoming Darwin release, which has many interesting new features.
Resources
- http://blog.terracottatech.com/2009/02/offloading_a_db_even_when_upda.html - A wonderful post by Ari
- http://www.slideshare.net/sbtourist/real-terracotta-presentation
- http://forums.terracotta.org/forums/forums/list.page