Category Archives: Software Engineering

Software engineering best practices, theory, horror stories, etc.

Java Open Source Ruby Software Engineering The Business The Innerweb

QCon San Francisco 2010 Redux

So, I just got back from QCon SF 2010 last night.  All in all, a very good conference.  Rather than write up any kind of extensive summary, I’ll offer up my rapid digest of the major themes from the sessions I attended.  Without further ado, here it is in outline form:

  1. Dealing with data at large scale
    1. OLTP
      1. Those who can get away with it are using systems that have more flexible consistency models than traditional RDBMS (CAP theorem trade-offs)
        1. Most using some form of eventual consistency
        2. Many sites implementing their own Read-Your-Own-Writes consistency on top of more general storage systems
        3. These systems must deal with data growth (partitioning data across nodes)
        4. Must deal with hot spots (redistributing / caching hot data across many nodes)
        5. Must deal with multiple data centers (some are simply punting on this)
      2. Twitter and Facebook both built their own key-value stores on top of MySQL and Memcached
        • Twitter’s solution seemed a little cleaner, Facebook’s a little more crusty
      3. Amazon S3: also a key-value store, with its own caching, replication and consistency models
        • This one had the most sophisticated-seeming solution for dealing with hot spots
    2. OLAP
      1. Lots of people using Hadoop to crunch offline data
        1. Good tools for workflow of jobs, dependency management, monitoring are essential
        2. Quantcast found that EC2 was not adequate for their needs in terms of throughput compared to an owned, highly-tuned cluster, though it has improved over time
          • still good to have on hand for surge capability
  2. Operating on the public cloud
    1. Increased demand for monitoring, and most monitoring tools are not built for cloud instances that wink in and out of existence
    2. Increased demand for fault-tolerance: latency can vary more widely, and hardware failures happen out of your control
    3. Increased demand for sophisticated deployment automation
    4. Motivation is that you want to use a cloud, not build one
      1. Capacity planning is difficult when you’re in a huge growth scenario
      2. Leverage the staffing and expertise of the public cloud companies (Amazon, GigaSpaces, etc.)
      3. Data center is a large, inflexible capital commitment
    5. Traditional CDNs are still necessary and useful for low-latency, high-bandwidth media delivery
    6. PCI-compliant storage in the cloud is not a solved problem
  3. Serious interest in alternative languages, both on and off the JVM
    1. There are lots of serious choices available in this sphere (Scala, JRuby, JavaScript via Node.js, Erlang, Clojure)
    2. Lots of enthusiasm for the JVM, less enthusiasm for Oracle’s ability or intention to be a good steward of it
  4. Though there were many very good sessions, especially in the Architectures You Always Wondered About track, in terms of sheer rock-star appeal these two presentations appeared to be the standouts that had everyone talking:
    1. LMAX – How to do over 100k concurrent transactions per second at less than 1ms latency
    2. Node.js
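
For anyone wondering what "Read-Your-Own-Writes on top of a more general storage system" looks like in practice, here is a minimal sketch in Ruby. It is not what any of the presenters described; LaggyStore, Session, replicate! and caught_up! are all hypothetical names, and an explicit replicate! call stands in for real replication lag. The idea is only that a session remembers which keys it has written and reads those from the primary, while everything else is served from a possibly stale replica.

```ruby
# Hypothetical store: a primary that takes writes and a replica that only
# sees them after an explicit replicate! call (standing in for replication lag).
class LaggyStore
  def initialize
    @primary = {}
    @replica = {}
  end

  def write(key, value)
    @primary[key] = value
  end

  def read_primary(key)
    @primary[key]
  end

  def read_replica(key)
    @replica[key]
  end

  def replicate!
    @replica = @primary.dup
  end
end

# Hypothetical per-user session: it remembers its own recent writes and routes
# reads for those keys to the primary; all other reads go to the replica.
class Session
  def initialize(store)
    @store = store
    @dirty = {}
  end

  def write(key, value)
    @store.write(key, value)
    @dirty[key] = true
  end

  def read(key)
    @dirty[key] ? @store.read_primary(key) : @store.read_replica(key)
  end

  # Call once replication is known to have caught up.
  def caught_up!
    @dirty.clear
  end
end

store = LaggyStore.new
alice = Session.new(store)
bob   = Session.new(store)

alice.write("status", "at QCon")
alice_sees = alice.read("status")      # "at QCon": her own write is visible to her
bob_sees   = bob.read("status")        # nil: the replica has not caught up yet

store.replicate!
bob_sees_later = bob.read("status")    # "at QCon" once replication has run
```

The trade-off in this sketch is that the primary absorbs extra reads for recently written keys, which is one reason a real system would want to expire the "dirty" markers as soon as it safely can.
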
Open Source Software Engineering

The Web Browser as a NeoVictorian Computing Triumph

I had a minor realization today about what Mark Bernstein was talking about in his blog posts about NeoVictorian Computing (which I mentioned earlier).  I had emotionally connected to what he was saying about how the software industry should return to an ethos of craft and artisanship, and how software should not try to hide its joints or its materials, but rather be constructed honestly and display its structure.  I got it, kind of, but the ideas remained abstract for me.

But I realized today that in fact the web browser, now so ubiquitous, is in its own way very much of a kind with his ideas.  Sure, browsers now are built by large teams, but the original Mosaic browser was designed and implemented by a few talented people at NCSA. And though browsers have gained a lot of functionality and efficiency through years of revision, they still retain the essential form of the original.  The exposed URLs in the location bar still give evidence of the network protocols and of the spaces between the loaded documents. That simplicity and willingness to make the user meet the technology head on is part of what has made the browser such a success as a technology.

Ruby Software Engineering

First Impressions of Ruby

I’ve been reading my way through The Ruby Programming Language by David Flanagan and Yukihiro Matsumoto over the last couple of weeks (very well-written, by the way) and having read up through Chapter 7, I’m about at the point now where I’m ready to do something useful with the language.  I figure now is a good time to jot down a few quick first impressions of Ruby before I get comfortable enough with the language that all of its idiosyncrasies vanish into familiarity.

First, the negative side:

  • I’m skeptical of the value of optional parentheses in method invocation.  I see how being able to omit parentheses in a call to a setter method makes it look more like straight assignment, and that’s pretty cool.  But other than that, I see nothing clearer or cleaner about “o.do_something a, b” than “o.do_something(a, b)”.  Given the numerous situations in which confusion can arise with the rules about when parentheses can and can’t be omitted, I think it’s highly debatable whether optional parentheses are a plus.
  • Method aliasing.  I have not gone through the chapter on meta-programming yet, which is where I think the real justification for method aliasing comes from.  However, in regular-old-programming land, having multiple aliases for methods sucks.  The claim is that it makes the language more expressive because you can define a name for a method that fits the use case better.  An example is the Range class’s include? method, which also has the alias member?.  To me, a method that needs aliasing is a method that wasn’t properly named in the first place.  It’s also just more complexity than necessary.
  • “do” and “end” as the conventional block-delimiters for large blocks.  Ugh.  I guess Matz must have been an avid shell-scripter.  To me, this is just bad usability.  The C family of languages (and Perl!) got it right with { and }.  There’s a symmetry there that mirrors their logical function, they are commonly used as delimiters for a set in mathematical notation, and what is a block but a set of statements or expressions? Plus, the construct of a block is something you will be typing over and over and over.  Why do we have to type a total of five characters to delimit it?  “do” and “end” don’t really flow nicely, and they don’t even come close to mimicking any sort of natural language expression.  I don’t see where they add any clarity.  Sigh.  However, I’ve come to like Python’s rather individual use of significant whitespace, so perhaps I’ll come to like this as well.
  • Constants that are not enforced as constant.  We have a name for this where I come from: global variables.
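
To make the complaints above concrete, here is a small sketch. The Thermostat class, the adjust method and the constant names are made up for illustration; only the language behavior is Ruby's own. It shows a setter called without parentheses, the same call written with and without parentheses, the include?/member? alias pair on Range, both block delimiters, and a "constant" being changed.

```ruby
# Hypothetical class, used only to illustrate the points above.
class Thermostat
  attr_accessor :target          # generates the target and target= methods

  def adjust(delta, unit)
    "#{delta} #{unit}"
  end
end

t = Thermostat.new
t.target = 21                    # setter call without parentheses reads like assignment

with_parens    = t.adjust(2, "C")
without_parens = t.adjust 2, "C" # the same call with the parentheses omitted

# Range#member? and Range#include? are two names for the same check.
aliased_same = (1..10).include?(5) == (1..10).member?(5)

# Both block delimiters in action: braces, then do/end.
doubled_braces = [1, 2, 3].map { |n| n * 2 }
doubled_do_end = [1, 2, 3].map do |n|
  n * 2
end

# "Constants" are not enforced. Mutating the object is silent, and
# reassigning the name only prints a warning before carrying on.
LIMITS = [1, 2, 3]
LIMITS << 4                      # no warning at all
MAX_RETRIES = 3
MAX_RETRIES = 5                  # warning: already initialized constant MAX_RETRIES
```

Calling freeze on the array would at least stop the silent mutation, though the name itself can still be reassigned with nothing worse than a warning.
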

On the positive side:

  • I like the use of $, @, and @@ prefixes on variable names to indicate global, instance and class scope.
  • The ability to pass arbitrary blocks to methods is an interesting approach to functional programming.
  • From the little I’ve seen, Ruby’s reflection abilities look like they let you do more than Java’s reflection API with fewer lines of code. That may be dangerous, but it certainly looks like fun.
  • Modules!  Yay!  I’m sick of writing utility-style classes with static methods in Java and then playing tricks to keep people from instantiating them.
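
Here is a rough sketch of the features on the positive list, again with made-up names (Counter, TextUtil, count_to and shout are hypothetical). It shows the three variable prefixes, a method that takes a block, a couple of reflection calls, and a module used as a home for utility functions.

```ruby
# Hypothetical names throughout; only the language features are real.
$request_count = 0               # $ prefix: global variable

class Counter
  @@instances = 0                # @@ prefix: class variable, shared by all instances

  def initialize
    @count = 0                   # @ prefix: instance variable
    @@instances += 1
    $request_count += 1
  end

  def self.instances
    @@instances
  end

  # The caller supplies a block; yield runs it once per step.
  def count_to(n)
    n.times { |i| @count += yield(i) }
    @count
  end
end

# A module holding utility functions. Unlike a Java utility class, there is
# no constructor to hide, because a module cannot be instantiated.
module TextUtil
  def self.shout(s)
    s.upcase + "!"
  end
end

c = Counter.new
total = c.count_to(3) { |i| i * 2 }      # adds 0, 2 and 4

# Reflection: ask an object about itself, or call a method by name.
has_method = c.respond_to?(:count_to)
via_send   = TextUtil.send(:shout, "hi")
ivars      = c.instance_variables
```

The send call is the part that looks both fun and dangerous: the method name is just data, so nothing checks it until the call actually runs.
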

I will try to write a follow-up after I’ve used the language a bit more.  I’m sure I’ll disabuse myself of some of my criticisms along the way.

Software Engineering The Business

business and engineering

I’ve been thinking lately about the balancing act that a lot of engineering managers have to do between serving the needs of “the business” and serving more technological motivations. It’s certainly a tightrope walk. On one hand, the manager’s job is to keep their team’s activity within budget and in line with the higher-level goals of the business. I haven’t had those responsibilities, but I can see that they’re both necessary and full of difficulties. Standing opposite that perspective is the need to stay on top of technological change. Things move fast in the software world, and people who stand still are frequently left behind.

My impression is that the most common mistake that otherwise smart, responsible engineering managers make is to focus on the business aspects of their responsibilities to the exclusion of the technology aspects. This is understandable: presumably their training and earlier professional experience are in technology, and thus the general business aspect of the job is what demands their extra attention. However, I’m of the firm opinion that losing enthusiasm for technology, or losing touch with what’s happening in technology, is a cardinal sin for someone in technology management.

It’s not just that those sorts of losses represent a dangerous detachment from one’s roots, but also that they take away what would be the technology manager’s edge. They put one in the position of simply being a “person who knows how to talk to the engineers”. There are probably people like that graduating from business school every day. A far rarer creature is the one who could build the system from the ground up by themselves using all of the latest technologies and who also understands how to guide their team to fulfilling the needs of the larger business. It’s the latter person who is in a better position to deliver innovative and timely solutions.

Java Software Engineering

Yegge’s Code Bloat

This is another quick foray into blog criticism in response to a recent missive, Code’s Worst Enemy, by the inimitable Steve Yegge. Let me preface my criticism by saying that I’m sure Steve Yegge is a nice guy, and I realize that he hyperbolizes for effect in his blog. I am merely doing the same here. So, to the matter at hand…

Mr. Yegge’s post has a few main points:

1) That code bloat is a serious problem. He claims to be somehow in the minority opinion on this, but actually I think it’s kind of a noncontroversial point. No one likes code bloat. In fact, the very term “bloat” is deliberately negative. If anyone felt neutral about it, it would be called “code growth”.

2) That the Java programming language and Java programmers are the worst offenders when it comes to code bloat.

3) That dynamic languages (a la JavaScript, Python, Ruby) are the answer.

I’d like to call bullshit on both his rhetorical style and point #2 (Java is evil). Let’s start with the rhetoric.

Mr. Yegge plays a few rhetorical tricks in his writing. The first thing he does is partition off his intended audience, saying that he’s only writing for the young (high school or college) programmer because, “unfortunately most so-called experienced programmers do not know how to detect bloat, and they’ll point at severely bloated code bases and claim they’re skinny as a rail.” This allows him to write off all criticism from anyone who might know what they’re talking about as the irrational whining of jaded professionals.

Fortunately for us, the wise and intrepid Mr. Yegge is one of the few people in the world (in his estimation) qualified to lead us through this exploration of code bloat, having written a half-million line codebase all on his very own. This is another rhetorical trick. All of us can shut up, he says, because we just don’t understand.

Yegge also warns that we might hear a lot of “hand-wavy” complaints about his writing. This in particular is amusing because I’d say that his essay is a pretty good example of a hand-wavy non-argument. This is ironic given the fact that his audience is putatively comprised of the young and inexperienced, who could not be expected to fill in the missing details of his arguments. How does he dodge the details? Here’s an example:

However, copy-and-paste is far more insidious than most scarred industry programmers ever suspect. The core problem is duplication, and unfortunately there are patterns of duplication that cannot be eradicated from Java code. These duplication patterns are everywhere in Java; they’re ubiquitous, but Java programmers quickly lose the ability to see them at all.

I guess at least one Java programmer (ahem, Mr. Yegge) must have lost the ability to list any of them, since he simply moves on from this rather incendiary accusation with no supporting arguments. The essay is full of these sorts of a priori assertions. Dependency injection makes your codebase bigger, design patterns intrinsically lead to duplication, Java itself means that your code will bloat. Everything is asserted and nothing is supported. If he’s right about the things he says, you wouldn’t know from reading his words.

Mr. Yegge’s basis for claiming expertise in the matter of code bloat is his sizeable and single-handedly maintained codebase supporting some variety of MUD game. Props to him for writing the game, I’m sure it’s nice to play and an impressive feat of solo-engineering. However, if he’s saying that there is NO level of compression he could do to his codebase while keeping it in Java, I think he’s exaggerating. I agree that one can probably cut down on the amount of code in a given codebase to a greater degree working in a more high-level language, but to say that there is nothing that could be done to improve upon his existing Java code is likely false.

I think the more accurate way to read what Mr. Yegge is saying would be as below (my words):

I wrote this game in Java. The code has somewhat of a bloat problem and I’d like to do a major rewrite of it. However, Java has become unfashionable over the last couple of years, and dynamic languages are much more in vogue with the software cognoscenti. And anyway, I already know Java quite well. If I’m going to do a rewrite of this big piece of software all on my own, I might as well do it in one of the more fashionable dynamic languages like Ruby or JavaScript/ECMAScript so that I gain some deeper expertise in those out of the process. JavaScript seems like the one to use, especially since the company I work for is getting behind it in a big way. Hey kids, everybody follow me over to JavaScript; maybe in a couple of years I can write a book on the topic and you can all buy it!

That would not be a bad argument to make, actually. It would certainly be more honest than the one he does make.