Cloud-Enabled Scalability – My Story
Something prompted me to blog again after a long time, and I am not sure whether I will stay motivated to continue. To start, I thought I would share a long-pending experience from about eight years back.
The best weapon the rise of the Cloud has given us is on-demand scalability (my reference is eight years old; everyone knows this now). Most of the time when we talk about cloud, we do not talk about enabling the application for cloud adoption, only about cloud infrastructure management (which is sad).
I am going to talk about my experience in making an application cloud-enabled and the design principles we used. Since this experience is eight years old (I am a bit rusty now), I will go through the journey as a case study.
That's How It Started
So our story starts back in 2011, when our hero, Mr Dan, came up with an idea to monetize online image content with advertisements and make some money (back then it was a cool idea; Google was doing it only for its videos).
I will not get into the architectural design of the application, as that is not the purpose of this blog.
Let me jump to the main turning point of our story. Application version 1.0 was ready. The pilot website was ready. The time was chosen to show the world the new power of website real estate monetization.
That's How It Unfolded
At 6 AM, the whole customer team was ready to see the results. My team was on the console, reading the pulse of the application. Everything was set.
And ....
Within 10 minutes, the application went BOOM. We were all devastated that our whole effort had blown up in minutes.
The retrospective showed that all the lexical relevance and proximity algorithms were good. Coding standards were fine. So what went wrong?
WE COULD NOT SCALE with the traffic flow.
Let's get to the point now: what we did to fix the issue. Below are the scalability design principles we came up with and implemented.
That's How It Changed
The following are the lessons learned from the experience, in the order we implemented them, to make your application scalable and able to leverage cloud infrastructure. This "Design Pattern for Scalable Applications" was derived almost eight years back, but it still holds good.
Concurrency
First and foremost, your application should support concurrency. That means your application design should allow mutually exclusive tasks to be processed independently. It lets us make the best use of CPU time slicing; in other words, we first have to use each CPU to the best of its capacity.
In Java, the concurrency framework comes to your rescue. The two basic units are processes and threads; without going into detail, a process contains threads, and threads are lightweight units of execution within a process. The java.util.concurrent package offers much better support for concurrency than direct use of Threads.
What we did
We created separate processes for the 'Crawler', 'Data Processing' and 'Best Match' stages of handling each URL, as these were completely independent pieces of the puzzle. That let us control resource allocation for each one individually. Within the 'Data Processing' process, each URL was processed in its own thread.
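To make this concrete, here is a minimal sketch (not our original code) of the per-URL threading in the 'Data Processing' piece, using java.util.concurrent; the class name, pool size and placeholder logic are all illustrative.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch: the 'Data Processing' stage runs as its own process,
// and each URL is handled as an independent task on a worker thread.
public class DataProcessingService {

    // The pool size is tuned independently of the Crawler and Best Match processes.
    private final ExecutorService urlWorkers = Executors.newFixedThreadPool(8);

    public void processAll(List<String> urls) {
        for (String url : urls) {
            // Each URL is a mutually exclusive task, so it can run concurrently.
            urlWorkers.submit(() -> process(url));
        }
    }

    private void process(String url) {
        // Placeholder for the actual per-URL data processing logic.
        System.out.println("Processing " + url + " on " + Thread.currentThread().getName());
    }

    public void shutdown() {
        urlWorkers.shutdown();
    }
}
```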
Parallelism
Parallelism means the ability to break a task into smaller tasks and complete them faster by running them at the same time. Divide and conquer is the classic implementation example (remember merge sort). Another excellent design example for parallel processing is MapReduce.
Our privilege of simply relying on ever-faster CPUs ran out somewhere around 2005 to 2008, when single-core clock speeds stopped climbing. Then the era of multicore processors started, and with it the need for parallelism.
Back then, Java had a concurrency framework but unfortunately little built-in support for parallelism. Later we saw the rise of languages designed to handle this challenge, like Go.
Java later came up with pipelines and streams (more of a MapReduce style). The JDK also added a Fork/Join implementation to the concurrency framework (Fork/Join arrived in JDK 7, and streams in JDK 8).
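Just to show the shape of that API, here is a tiny, purely illustrative JDK 8 parallel-stream snippet (the numbers are made up); the stream runs its work on the common Fork/Join pool.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: a MapReduce-style map/reduce over a list, parallelised
// by JDK 8 streams on top of the Fork/Join common pool.
public class ParallelStreamExample {
    public static void main(String[] args) {
        List<Integer> criterionScores = Arrays.asList(72, 85, 40, 93, 67);

        // "Map" each score, then "reduce" them to a single total,
        // with the work split across CPU cores automatically.
        int total = criterionScores.parallelStream()
                                   .mapToInt(score -> score * 2)
                                   .sum();

        System.out.println("Total: " + total);
    }
}
```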
What we did
Unfortunately, we did not have much language or framework support back then, so we went back to basics and implemented it with plain threads (our own fork and join). Each URL had to be processed against multiple data analysis criteria to arrive at a page score. Each criterion's score was calculated in a separate child thread (divide), and the results were then merged into a final weighted page score (conquer).
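A rough sketch of that divide-and-conquer scoring, as I remember it, could look like the following; the criterion names, weights and scoring stubs are purely hypothetical stand-ins for the real algorithms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: each criterion score is computed on its own thread
// (divide) and the partial results are merged into a weighted page score (conquer).
public class PageScorer {

    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    public double score(String url) throws Exception {
        double[] weights = {0.5, 0.3, 0.2}; // illustrative weightage
        List<Callable<Double>> criteria = new ArrayList<>();
        criteria.add(() -> lexicalRelevance(url));
        criteria.add(() -> proximityScore(url));
        criteria.add(() -> keywordDensity(url));

        // Divide: run every criterion in parallel and wait for all results.
        List<Future<Double>> results = pool.invokeAll(criteria);

        // Conquer: merge the partial scores into one weighted page score.
        double pageScore = 0.0;
        for (int i = 0; i < results.size(); i++) {
            pageScore += weights[i] * results.get(i).get();
        }
        return pageScore;
    }

    // Placeholder scoring functions; the real algorithms are out of scope here.
    private double lexicalRelevance(String url) { return 0.8; }
    private double proximityScore(String url)   { return 0.6; }
    private double keywordDensity(String url)   { return 0.4; }
}
```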
Remove Contention
If you have a non-scalable module on which all or most of the application depends, you have a contention point. These need to be removed, or your other scalability improvements may lose their relevance to the overall solution. There is no silver bullet for it; it all depends on the application design and the needs of the solution.
What we did
We did a quick performance evaluation of the application and realized that even though we had implemented parallel processing, all the different pieces were heavily dependent on the DB. This prompted three decisions. First, redesign the DB and de-normalize the tables around each module's specific needs rather than keeping a purely normalized schema. Second, turn the persistence layer into a scalable service so it could sit behind a load balancer. And lastly, reduce DB I/O by keeping an in-memory store for operations that could wait a while before being persisted.
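That third decision can be sketched as a simple write-behind store: writes land in memory immediately, and a background thread persists them later. This is only a minimal illustration of the idea (the store, queue and persistence stub are made up), not our actual implementation.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative write-behind store: hot-path requests never block on DB I/O;
// a background thread drains pending writes to the database later.
public class WriteBehindStore {

    private final ConcurrentMap<String, String> inMemory = new ConcurrentHashMap<>();
    private final BlockingQueue<String> pendingKeys = new LinkedBlockingQueue<>();

    public WriteBehindStore() {
        Thread flusher = new Thread(this::flushLoop, "db-flusher");
        flusher.setDaemon(true);
        flusher.start();
    }

    public void put(String key, String value) {
        inMemory.put(key, value);   // immediate, in-memory write
        pendingKeys.offer(key);     // persistence can happen later
    }

    public String get(String key) {
        return inMemory.get(key);
    }

    private void flushLoop() {
        try {
            while (true) {
                String key = pendingKeys.take();           // wait for pending work
                persistToDatabase(key, inMemory.get(key)); // deferred DB write
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void persistToDatabase(String key, String value) {
        // Placeholder for the real DB write (JDBC, ORM, etc.).
        System.out.println("Persisted " + key + " -> " + value);
    }
}
```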
Microism / Modularism
Most of us know by 2020 what this means. The more modular we make our application design, or the more micro we make our services, the better our chances of using cloud infrastructure to scale the application at runtime.
What we did
As you have seen, each of the previous steps contributed to making the services modular. The more formalized version of this in today's world is a microservices architecture.
Hardware Scaling
The last step is to leverage hardware scaling. The cloud gives us the capability to scale hardware almost at will, and by the time we reach this step, we are king. The first step is vertical scaling: we add higher-capacity CPUs, and with concurrency implemented, we are ready to use them to our advantage. The second step is again vertical scaling, this time by adding multicore processing power; with parallelism in place, we are ready to use the added horsepower.
The next step is horizontal scaling. We add load-balanced machines based on the firepower needed by each individual component of the application (DB, persistence, analytics, etc.), and since our application is modularized and micro, we are all set to take advantage of it.
Conclusion
Based on my learning, if you really want to leverage the cloud for near-limitless scale, follow these steps in order to refactor your application, and the world is yours.
What happened to us
We refactored the whole application in two months and had a major success. It was a very big lesson in how to achieve success with the right decisions. Many of the things we implemented at that time are readily available in languages or frameworks now. We had to build our own producer-consumer model, but now we have Apache Kafka. Similarly, we went with a binary transfer protocol rather than text-based data protocols like JSON for any internal data exchange.
I hope this made some sense, or that you at least enjoyed the story. Thanks for reading.