Thursday, July 11, 2013

Scaling Mount Success




With the success of a product come demands on your infrastructure from usage and the data associated with it. Invariably you have a mountain of data that you can use to personalize or learn from. The infrastructure needed to do that was exactly the topic of a great discussion with a person from the mobile and social gaming industry. His query was: "How would you scale when you have millions of users and a billion-plus user records, and growing?" In other words, do we need to add servers, or should we use cloud infrastructure such as Amazon Web Services? Excellent question, and I suspect he already knew the answer. My answer, and indeed the answer of most startups I know today, is the same: move to a cloud-based infrastructure. It makes sense not just from a setup cost perspective, but also from a cost-benefit analysis and for the ability to scale up or down based on demand. His point that game usage has a fad trajectory (extremely popular for a few months and then gone the next) reinforced that.
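To make the cost-benefit argument concrete, here is a minimal back-of-the-envelope sketch in Python. Every number in it (the monthly user counts, the users each server can handle, and both monthly prices) is a made-up assumption for illustration, not data from the conversation. It compares owning enough servers for the peak month against renting only what each month needs.

```python
def servers_needed(users, users_per_server):
    """Ceiling division: whole servers required to serve `users` (at least one)."""
    return max(1, -(-users // users_per_server))


def fixed_cost(monthly_users, users_per_server, cost_per_server_month):
    """Own enough servers for the peak month and pay for them every month."""
    peak = max(servers_needed(u, users_per_server) for u in monthly_users)
    return peak * cost_per_server_month * len(monthly_users)


def elastic_cost(monthly_users, users_per_server, cost_per_server_month):
    """Rent only the servers each month actually needs."""
    return sum(
        servers_needed(u, users_per_server) * cost_per_server_month
        for u in monthly_users
    )


# Hypothetical fad trajectory: a game that spikes and then fades.
monthly_users = [100_000, 2_000_000, 5_000_000, 3_000_000, 500_000, 50_000]
USERS_PER_SERVER = 100_000   # assumed capacity of one server
OWNED_PRICE = 300            # assumed monthly cost of an owned server
CLOUD_PRICE = 500            # assumed (higher) monthly cost of a cloud server

owned_total = fixed_cost(monthly_users, USERS_PER_SERVER, OWNED_PRICE)
cloud_total = elastic_cost(monthly_users, USERS_PER_SERVER, CLOUD_PRICE)
print("owned:", owned_total, "cloud:", cloud_total)
```

With these assumed numbers the elastic option comes out cheaper even though each cloud server costs more per month, because the owned fleet sits mostly idle outside the peak. A flatter usage curve would shift the result the other way.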

Having said that, I do see regulatory requirements (particularly around security and privacy), existing investment in infrastructure, stringent SLA requirements, and the need for full control over the infrastructure as the primary factors behind not going to IaaS.

The next question, naturally, was around scaling data. How would you handle data warehouses, which can become unwieldy as your customer base grows? Yes, we can always take high availability (HA), fault tolerance (FT), and cluster computing for granted in our architecture. But in today's world, we have to think Big Data...

I particularly like Hive on Hadoop. Hadoop is already a mainstream platform, one on which packaged or SaaS applications like Silvo can address millions of retail interactions and deliver business value to users, and it is becoming widely available. Walmart Labs in San Bruno is doing amazing things, and I expect them to be the flag bearer. I believe Hadoop has gone beyond its first generation of applications, which were primarily consumer-focused, to become a core platform for delivering data-heavy products for enterprises. Retail science delivered over a SaaS platform can benefit from its ability to scale.

Yes, Hadoop can be slow, but using it with Hive for preprocessing, rather than for live or real-time (RT) queries, seems to make sense. I favor Cassandra for real-time live queries. In general, I look for solutions that hide the complexity of database partitioning, federated tables, or sharding, offer high performance, and can be scaled. What's more, I am seeing some elements of support for ORM that will allow me to write Rails apps easily.
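For a sense of the sharding complexity I would rather have hidden from me, here is a minimal sketch in Python of hash-based shard routing. The `ShardedStore` class and `shard_for` function are hypothetical names invented for this illustration, and in-memory dicts stand in for real database nodes. It is not how Cassandra or any particular product does it.

```python
import hashlib


def shard_for(user_id, num_shards):
    """Map a user id to a shard index using a stable hash of the id."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedStore:
    """Toy key-value store that spreads user records across several shards."""

    def __init__(self, num_shards):
        # Each dict stands in for a separate database node.
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, user_id, record):
        self.shards[shard_for(user_id, len(self.shards))][user_id] = record

    def get(self, user_id):
        return self.shards[shard_for(user_id, len(self.shards))].get(user_id)


store = ShardedStore(num_shards=4)
store.put("player-42", {"level": 7})
print(store.get("player-42"))
```

The caller only sees `put` and `get`; which shard holds the record is an internal detail. Note that simple modulo routing like this moves most keys when the number of shards changes, which is one reason I prefer a platform that handles partitioning for me.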


More on that later! Would love to hear your thoughts.




1 comment:

Unknown said...

Hi Satya

Fantastic points. I will send you a presentation I made at Grenoble at IBM that reflects the evolving architecture.