QCon San Francisco 2010 Redux

So, I just got back from QCon SF 2010 last night.  All in all, a very good conference.  Rather than write up any kind of extensive summary, I’ll offer up my rapid digest of the major themes from the sessions I attended.  Without further ado, here it is in outline form:

  1. Dealing with data at large scale
    1. OLTP
      1. Those who can get away with it are using systems with more flexible consistency models than a traditional RDBMS (CAP-theorem trade-offs)
        1. Most using some form of eventual consistency
        2. Many sites implementing their own Read-Your-Own-Writes consistency on top of more general storage systems (a minimal sketch of the idea follows this list)
        3. These systems must deal with data growth (partitioning data across nodes)
        4. Must deal with hot spots (redistributing / caching hot data across many nodes; see the consistent-hashing sketch after the OLTP items)
        5. Must deal with multiple data centers (some are simply punting on this)
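To make the read-your-own-writes point concrete, here is a minimal sketch of one common recipe, not any particular presenter's implementation: after a write, remember the value in a short-lived per-client cache and prefer it on reads until replication has had time to catch up. The `EventuallyConsistentStore` interface and the five-second window are assumptions made up for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Stand-in for an eventually consistent key-value store (replicated MySQL, memcached tiers, etc.). */
interface EventuallyConsistentStore {
    void put(String key, String value);
    String get(String key);
}

/** Read-your-own-writes layered on top: recent writes are served back to the writer from a local cache. */
class SessionConsistentClient {
    private static final long WINDOW_MS = 5_000; // assumed upper bound on replication lag

    private final EventuallyConsistentStore store;
    private final Map<String, TimestampedValue> recentWrites = new ConcurrentHashMap<>();

    SessionConsistentClient(EventuallyConsistentStore store) {
        this.store = store;
    }

    void put(String key, String value) {
        store.put(key, value); // the write propagates to replicas asynchronously
        recentWrites.put(key, new TimestampedValue(value, System.currentTimeMillis()));
    }

    String get(String key) {
        TimestampedValue recent = recentWrites.get(key);
        if (recent != null && System.currentTimeMillis() - recent.writtenAt() < WINDOW_MS) {
            return recent.value(); // our own write; replication may not have caught up yet
        }
        recentWrites.remove(key); // window expired: the store is now trusted for this key
        return store.get(key);
    }

    private record TimestampedValue(String value, long writtenAt) {}
}
```

The same shape shows up with version numbers or vector clocks in place of the wall-clock window; either way, the writer compensates locally for the store's weaker consistency.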
      2. Twitter and Facebook have both built their own key-value stores on top of MySQL and memcached
        • Twitter’s solution seemed a little cleaner, Facebook’s a little more crusty
      3. Amazon S3: also a key-value store, with its own caching, replication, and consistency models
        • This one had the most sophisticated-seeming solution for dealing with hot spots
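Several of the partitioning and hot-spot points above come down to some variant of consistent hashing: give each physical node many virtual positions on a hash ring so that keys (and load) move incrementally when nodes are added or removed. This is a bare-bones sketch of that general technique, not a description of any system named above; the vnode count is an arbitrary choice.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.SortedMap;
import java.util.TreeMap;

/** Consistent hash ring with virtual nodes: adding or removing a node only remaps nearby keys. */
class HashRing {
    private static final int VNODES_PER_NODE = 128; // more vnodes = smoother spread of keys and load

    private final TreeMap<Long, String> ring = new TreeMap<>();

    void addNode(String node) {
        for (int i = 0; i < VNODES_PER_NODE; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        for (int i = 0; i < VNODES_PER_NODE; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    /** Owner of a key: the first virtual node clockwise from the key's position on the ring. */
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes on the ring");
        }
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    private static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF); // first 8 bytes of the digest as the ring position
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError(e);
        }
    }
}
```

Hot keys can then be spread further by hashing key#0 … key#N so that extra cached copies land on several different nodes.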
    2. OLAP
      1. Lots of people using Hadoop to crunch offline data
        1. Good tools for job workflow, dependency management, and monitoring are essential (the hand-rolled alternative is sketched after this list)
        2. Quantcast found that EC2 could not match the throughput of their own highly tuned cluster, though it has improved over time
          • still good to have on hand for surge capacity
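For context on the workflow point, the hand-rolled baseline is a driver that launches each Hadoop job only after the job it depends on reports success, roughly as below (identity map/reduce and made-up stage names and paths, purely for illustration). Real pipelines outgrow this quickly, which is exactly why dedicated workflow, dependency, and monitoring tools matter.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ChainedJobsDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Stage 1: e.g. parse raw logs (identity map/reduce used here as a stand-in).
        Job extract = Job.getInstance(conf, "extract");
        extract.setJarByClass(ChainedJobsDriver.class);
        extract.setOutputKeyClass(LongWritable.class);
        extract.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(extract, new Path(args[0]));
        FileOutputFormat.setOutputPath(extract, new Path(args[1] + "/extracted"));

        // Stage 2 depends on stage 1's output, so only launch it if stage 1 succeeded.
        if (!extract.waitForCompletion(true)) {
            System.err.println("extract stage failed; aborting pipeline");
            System.exit(1);
        }

        Job aggregate = Job.getInstance(conf, "aggregate");
        aggregate.setJarByClass(ChainedJobsDriver.class);
        aggregate.setOutputKeyClass(LongWritable.class);
        aggregate.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(aggregate, new Path(args[1] + "/extracted"));
        FileOutputFormat.setOutputPath(aggregate, new Path(args[1] + "/aggregated"));

        System.exit(aggregate.waitForCompletion(true) ? 0 : 1);
    }
}
```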
  2. Operating on the public cloud
    1. Increased demand for monitoring; most monitoring tools aren’t built for cloud instances that wink in and out of existence (a toy heartbeat registry is sketched after this list)
    2. Increased demand for fault tolerance; latency varies more widely, and hardware failures happen outside your control
    3. Increased demand for sophisticated deployment automation
    4. The motivation is that you want to use a cloud, not build one
      1. Capacity planning is difficult when you’re in a huge growth scenario
      2. Leverage the staffing and expertise of the public cloud companies (Amazon, GigaSpaces, etc.)
      3. A data center is a large, inflexible capital commitment
    5. Traditional CDNs are still necessary and useful for low-latency, high-bandwidth media delivery
    6. PCI-compliant storage in the cloud is not a solved problem
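On the monitoring point, one common answer to instances that come and go is to drop the static host list entirely: each instance registers itself and heartbeats, and the monitor expires anything it has not heard from recently. A toy sketch of that registry follows; the 90-second timeout (and the implied 30-second heartbeat interval) are arbitrary values chosen for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Tracks cloud instances by heartbeat instead of a hand-maintained host list. */
class InstanceRegistry {
    private static final Duration TIMEOUT = Duration.ofSeconds(90); // roughly three missed 30s heartbeats

    private final Map<String, Instant> lastSeen = new ConcurrentHashMap<>();

    /** Called by each instance on boot and then periodically while it is alive. */
    void heartbeat(String instanceId) {
        lastSeen.put(instanceId, Instant.now());
    }

    /** Instances considered alive right now; terminated instances simply age out. */
    Set<String> liveInstances() {
        Instant cutoff = Instant.now().minus(TIMEOUT);
        lastSeen.values().removeIf(seen -> seen.isBefore(cutoff));
        return Set.copyOf(lastSeen.keySet());
    }
}
```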
  3. Serious interest in alternative languages, both on and off the JVM
    1. There are lots of serious choices available in this sphere (Scala, JRuby, JavaScript via Node.js, Erlang, Clojure)
    2. Lots of enthusiasm for the JVM itself, less enthusiasm for Oracle’s ability or intention to be a good steward of it
  4. Though there were many very good sessions, especially in the Architectures You Always Wondered About track, in terms of sheer rock-star appeal these two presentations were the standouts that had everyone talking:
    1. LMAX – How to do over 100k concurrent transactions per second at less than 1ms latency (a toy sketch of the core idea follows below)
    2. Node.js
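On the LMAX talk: as presented, the headline numbers come from running the business logic on a single thread and feeding it from pre-allocated, lock-free ring buffers rather than contended queues and locks. The toy single-producer/single-consumer ring below is only meant to convey that shape (sequence counters over a fixed array, no allocation or locking on the hot path); it is not LMAX's implementation.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Toy single-producer/single-consumer ring buffer: slots are pre-allocated and
 * claimed/released via monotonically increasing sequence counters, so the hot
 * path never allocates and never takes a lock.
 */
class SpscRingBuffer {
    private final long[] slots;          // pre-allocated payloads (here just longs)
    private final int mask;              // capacity must be a power of two
    private final AtomicLong published = new AtomicLong(-1); // last slot the producer filled
    private final AtomicLong consumed = new AtomicLong(-1);  // last slot the consumer processed

    SpscRingBuffer(int capacityPowerOfTwo) {
        slots = new long[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    /** Producer: spin until a slot frees up, then write and publish it. */
    void publish(long value) {
        long next = published.get() + 1;
        while (next - consumed.get() > slots.length) {
            Thread.onSpinWait();                   // buffer full; wait for the consumer
        }
        slots[(int) (next & mask)] = value;
        published.set(next);                       // makes the slot visible to the consumer
    }

    /** Consumer: spin until something is published, then read and release the slot. */
    long consume() {
        long next = consumed.get() + 1;
        while (next > published.get()) {
            Thread.onSpinWait();                   // buffer empty; wait for the producer
        }
        long value = slots[(int) (next & mask)];
        consumed.set(next);                        // frees the slot for reuse
        return value;
    }
}
```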
