Category Archives: Java

Anything related to the Java programming language


QCon San Francisco 2010 Redux

So, I just got back from QCon SF 2010 last night.  All in all, a very good conference.  Rather than write up any kind of extensive summary, I’ll offer up my rapid digest of the major themes from the sessions I attended.  Without further ado, here it is in outline form:

  1. Dealing with data at large scale
    1. OLTP
      1. Those who can get away with it are using systems that have more flexible consistency models than traditional RDBMS (CAP theorem trade-offs)
        1. Most using some form of eventual consistency
        2. Many sites implementing their own Read-Your-Own-Writes consistency on top of more general storage systems
        3. These systems must deal with data growth (partitioning data across nodes)
        4. Must deal with hot spots (redistributing / caching hot data across many nodes)
        5. Must deal with multiple data centers (some are simply punting on this)
      2. Twitter and Facebook both built their own key-value stores on top of MySQL and Memcached
        • Twitter’s solution seemed a little cleaner, Facebook’s a little more crusty
      3. Amazon S3: also a key-value store, with its own caching, replication, and consistency models
        • This one had the most sophisticated-seeming solution for dealing with hot spots
    2. OLAP
      1. Lots of people using Hadoop to crunch offline data
        1. Good tools for workflow of jobs, dependency management, monitoring are essential
        2. Quantcast found that EC2 was not adequate for their needs in terms of throughput compared to an owned, highly-tuned cluster, though it has improved over time
          • still good to have on hand for surge capability
    3. Operating on the public cloud
      1. Increased demand for monitoring — and most monitoring tools not built for cloud instances that wink in and out of existence
      2. Increased demand for fault-tolerance — latency can vary more widely, hardware failures happen out of your control
      3. Increased demand for sophisticated deployment automation
      4. Motivation is that you want to use a cloud, not build one
        1. Capacity planning is difficult when you’re in a huge growth scenario
        2. Leverage the staffing and expertise of the public cloud companies (Amazon, Gigaspaces, etc)
        3. Data center is a large, inflexible capital commitment
      5. Traditional CDNs are still necessary and useful for low-latency, high bandwidth media delivery
      6. PCI compliant storage in the cloud is not a solved problem
    4. Serious interest in alternative languages, both on and off the JVM
      1. There are lots of serious choices available in this sphere (Scala, JRuby, JavaScript -> Node.js, Erlang, Clojure)
      2. Lots of enthusiasm for the JVM, less enthusiasm for Oracle’s ability or intention to be a good steward of it
    5. Though there were many very good sessions, especially in the Architectures You Always Wondered About track, in terms of sheer rock-star appeal these two presentations appeared to be the standouts that had everyone talking:
      1. LMAX – How to do over 100k concurrent transactions per second at less than 1ms latency
      2. Node.js
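
The read-your-own-writes idea mentioned under OLTP above can be sketched roughly as follows. This is not any particular site's implementation: the class names (Store, ReadYourWritesSession, VersionedValue) are hypothetical, and the sketch assumes the primary stamps every write with an increasing version while replication to the replica happens asynchronously somewhere else. A session remembers the version of its own last write per key and only trusts a replica that has caught up to it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: all names here are illustrative assumptions.
class VersionedValue {
    final String value;
    final long version;

    VersionedValue(String value, long version) {
        this.value = value;
        this.version = version;
    }
}

// A trivial in-memory store standing in for a primary or a replica node.
class Store {
    private final Map<String, VersionedValue> data = new HashMap<>();
    private long clock = 0;

    // Used on the primary: stamps the write with the next version.
    VersionedValue write(String key, String value) {
        VersionedValue v = new VersionedValue(value, ++clock);
        data.put(key, v);
        return v;
    }

    // Used on a replica: applies a value copied over from the primary.
    void replicate(String key, VersionedValue v) {
        data.put(key, v);
    }

    VersionedValue get(String key) {
        return data.get(key);
    }
}

class ReadYourWritesSession {
    private final Store primary;
    private final Store replica; // may lag behind the primary
    private final Map<String, Long> lastWritten = new HashMap<>();

    ReadYourWritesSession(Store primary, Store replica) {
        this.primary = primary;
        this.replica = replica;
    }

    void write(String key, String value) {
        VersionedValue v = primary.write(key, value);
        lastWritten.put(key, v.version); // remember what this session wrote
    }

    String read(String key) {
        long required = lastWritten.getOrDefault(key, 0L);
        VersionedValue fromReplica = replica.get(key);
        if (fromReplica != null && fromReplica.version >= required) {
            return fromReplica.value; // replica is fresh enough for this session
        }
        // Replica is missing the key or is stale for this session: use the primary.
        VersionedValue fromPrimary = primary.get(key);
        return fromPrimary == null ? null : fromPrimary.value;
    }
}
```

Other sessions that never wrote the key still read from the lagging replica, which is the eventual-consistency half of the trade-off.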

jdb: pretty f#*king useful

I rediscovered the joys of using Java’s built-in command-line debugger, jdb, today.  I’d gotten so used to using the graphical debugger in Eclipse that I’d forgotten all about jdb, but lately Eclipse has been choking on debugging projects with many Maven dependencies for me.  I really needed a low-tech, high-reliability solution for debugging, and going back to the old “add logging statement, recompile, run” routine was not the answer (yes, it helps in production, but in your own dev environment you deserve better).  jdb actually does the job pretty well.  It could definitely do with some improvements, like more shell-like line editing behavior and the ability to specify alternate init scripts at runtime.  But like the can opener on a Swiss Army knife, it will get the job done.
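
For anyone who has not touched jdb in a while, here is a minimal sketch of a session. The JdbDemo class and its sum() method are made up for illustration. The jdb commands in the comments are standard ones, but what jdb prints back varies between JDK versions, so no output is shown.

```java
// JdbDemo.java: a hypothetical toy program for trying out jdb.
//
// Compile with debug info, then start the debugger:
//   javac -g JdbDemo.java
//   jdb JdbDemo
//
// A typical session at the jdb prompt:
//   stop in JdbDemo.sum     (set a breakpoint at method entry)
//   run                     (start the program; it halts at the breakpoint)
//   locals                  (show arguments and locals; needs javac -g)
//   next                    (step over one line)
//   print total             (evaluate an expression)
//   where                   (dump the current thread's stack)
//   cont                    (resume execution)
class JdbDemo {
    static int sum(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {1, 2, 3}));
    }
}
```

Using "stop in" with a method name rather than "stop at" with a line number keeps the breakpoint valid when the source file is edited.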


Yegge’s Code Bloat

This is another quick foray into blog criticism in response to a recent missive, Code’s Worst Enemy, by the inimitable Steve Yegge. Let me preface my criticism by saying that I’m sure Steve Yegge is a nice guy, and I realize that he hyperbolizes for effect in his blog. I am merely doing the same here. So, to the matter at hand…

Mr. Yegge’s post has a few main points:

1) That code bloat is a serious problem. He claims to be somehow in the minority opinion on this, but actually I think it’s kind of a noncontroversial point. No one likes code bloat. In fact, the very term “bloat” is deliberately negative. If anyone felt neutral about it, it would be called “code growth”.

2) That the Java programming language and Java programmers are the worst offenders when it comes to code bloat.

3) That dynamic languages (a la JavaScript, Python, Ruby) are the answer.

I’d like to call bullshit on both his rhetorical style and point #2 (Java is evil). Let’s start with the rhetoric.

Mr. Yegge plays a few rhetorical tricks in his writing. The first thing he does is partition off his intended audience, saying that he’s only writing for the young (high school or college) programmer because, “unfortunately most so-called experienced programmers do not know how to detect bloat, and they’ll point at severely bloated code bases and claim they’re skinny as a rail.” This allows him to write off all criticism from anyone who might know what they’re talking about as the irrational whining of jaded professionals.

Fortunately for us, the wise and intrepid Mr. Yegge is one of the few people in the world (in his estimation) qualified to lead us through this exploration of code bloat, having written a half-million line codebase all on his very own. This is another rhetorical trick. All of us can shut up, he says, because we just don’t understand.

Yegge also warns that we might hear a lot of “hand-wavy” complaints about his writing. This in particular is amusing because I’d say that his essay is a pretty good example of a hand-wavy non-argument. This is ironic given the fact that his audience is putatively comprised of the young and inexperienced, who could not be expected to fill in the missing details of his arguments. How does he dodge the details? Here’s an example:

“However, copy-and-paste is far more insidious than most scarred industry programmers ever suspect. The core problem is duplication, and unfortunately there are patterns of duplication that cannot be eradicated from Java code. These duplication patterns are everywhere in Java; they’re ubiquitous, but Java programmers quickly lose the ability to see them at all.”

I guess at least one Java programmer (ahem, Mr. Yegge) must have lost the ability to list any of them, since he simply moves on from this rather incendiary accusation with no supporting arguments. The essay is full of these sorts of a priori assertions. Dependency injection makes your codebase bigger, design patterns intrinsically lead to duplication, Java itself means that your code will bloat. Everything is asserted and nothing is supported. If he’s right about the things he says, you wouldn’t know from reading his words.

Mr. Yegge’s basis for claiming expertise in the matter of code bloat is his sizeable and single-handedly maintained codebase supporting some variety of MUD game. Props to him for writing the game, I’m sure it’s nice to play and an impressive feat of solo-engineering. However, if he’s saying that there is NO level of compression he could do to his codebase while keeping it in Java, I think he’s exaggerating. I agree that one can probably cut down on the amount of code in a given codebase to a greater degree working in a more high-level language, but to say that there is nothing that could be done to improve upon his existing Java code is likely false.

I think the more accurate way to read what Mr. Yegge is saying would be as below (my words):

I wrote this game in Java. The code has somewhat of a bloat problem and I’d like to do a major rewrite of it. However, Java has become unfashionable over the last couple of years, and dynamic languages are much more in vogue with the software cognoscenti. And anyway, I already know Java quite well. If I’m going to do a rewrite of this big piece of software all on my own, I might as well do it in one of the more fashionable dynamic languages like Ruby or JavaScript/ECMAScript so that I gain some deeper expertise in those out of the process. JavaScript seems like the one to use, especially since the company I work for is getting behind it in a big way. Hey kids, everybody follow me over to JavaScript; maybe in a couple of years I can write a book on the topic and you can all buy it!

That would not be a bad argument to make, actually. It would certainly be more honest than the one he does make.


Ugly But Proud: the instanceof operator

Ah, the unlovely instanceof operator, that poor cousin of the reflection API and redheaded stepchild of Java’s object-oriented programming model: does it have no friends in this world? Was it brought forth from the depths of James Gosling’s mind only to instigate mass hackishness? If you’ve read almost any book on Java programming, you’ve no doubt been warned repeatedly that use of instanceof is a warning sign on the road to BAD OBJECT ORIENTED DESIGN. The accusation does have its merits. Abuse of instanceof can often be a lazy man’s stand-in for a more properly thought out use of polymorphism. But we should not be so hasty to throw stones at this humble and venerable Java language feature. instanceof is not evil, it’s simply misunderstood.

The type of example often used in warning young programmers away from instanceof goes something like this:

public void doSomething(MyClass obj) {
  if (obj instanceof MyClassDescendant1) {
    // take one path...
  } else if (obj instanceof MyClassDescendant2) {
    // take another path...
  } else if (obj instanceof MyClassDescendant3) {
    // take a third path...
  }
}

Clearly, if we had instead declared the doSomething() method as a member of MyClass and allowed subclasses to simply override the method as needed to specialize the behavior, then we could have the much cleaner:

// using a factory simply to illustrate that
// we don't know what the actual instance type is
MyClass obj = new MyClassFactory().newMyClass();

// the appropriate subclass method is executed
obj.doSomething();
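
A fuller, runnable sketch of the polymorphic version might look like the following. The class names are the ones used above, but the method bodies, the String return type, and the factory's kind parameter are assumptions added so that the example has something to print.

```java
// Sketch only: method bodies and the factory's selection rule are
// assumptions made for illustration.
class MyClass {
    String doSomething() {
        return "default path";
    }
}

class MyClassDescendant1 extends MyClass {
    @Override
    String doSomething() {
        return "one path";
    }
}

class MyClassDescendant2 extends MyClass {
    @Override
    String doSomething() {
        return "another path";
    }
}

class MyClassFactory {
    // Hypothetical selection rule; a real factory decides however it likes.
    MyClass newMyClass(int kind) {
        switch (kind) {
            case 1:
                return new MyClassDescendant1();
            case 2:
                return new MyClassDescendant2();
            default:
                return new MyClass();
        }
    }
}

class PolymorphismDemo {
    public static void main(String[] args) {
        MyClassFactory factory = new MyClassFactory();
        for (int kind = 0; kind <= 2; kind++) {
            MyClass obj = factory.newMyClass(kind);
            // No instanceof needed: dynamic dispatch picks the override.
            System.out.println(obj.doSomething());
        }
    }
}
```

The caller never inspects the concrete type, so adding a MyClassDescendant3 later means writing one new class rather than hunting down every if/else chain.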

So one should avoid using instanceof to select behavior based on object type when polymorphism could do that in a much cleaner way. But if you’ve spent any time programming in Java, then you’ve seen that instanceof is commonly used. Often it’s abused, but sometimes it’s used with good reason. So, what are the guidelines for its proper use?

1. Use instanceof for parameter checking. For example, if you’re overriding Object.equals() in one of your own classes, a common first part of such a method is to write:

public class MyClass {
  // ...
  @Override
  public boolean equals(Object o1) {
    if (!(o1 instanceof MyClass)) {
      return false;
    }
    // rest of method...
  }
}
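
A complete version of this pattern might look like the sketch below. The Point class and its fields are invented for illustration. Note that instanceof evaluates to false for null, so no separate null check is needed, and that hashCode() is overridden alongside equals(), as the Object contract requires.

```java
import java.util.Objects;

// Hypothetical value class used only to illustrate the equals() pattern.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        // Rejects null and objects of unrelated types in a single test.
        if (!(o instanceof Point)) {
            return false;
        }
        Point other = (Point) o; // safe: the type was just checked
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        // Equal objects must produce equal hash codes.
        return Objects.hash(x, y);
    }
}
```

The class is declared final here to sidestep the question of how equals() should behave across subclasses, which is a separate design decision.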

2. Use instanceof when you are obliged to program to an interface that involves objects typed as one of those empty “marker” interfaces as parameters. A perfect example can be found by looking at the interface. If you endeavor to implement this interface, you will quickly find yourself using instanceof, since the single parameter to the handle() method is an array of Callback objects, and Callback is an empty interface.
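
The interface name has gone missing from the sentence above. The description (a handle() method whose single parameter is an array of Callback objects) matches javax.security.auth.callback.CallbackHandler from the JDK, so the sketch below assumes that is the one meant. The FixedCredentialsHandler class and its hard-coded credentials are purely illustrative.

```java
import java.io.IOException;
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.auth.callback.UnsupportedCallbackException;

// Hypothetical handler that answers name and password requests
// with fixed values supplied at construction time.
class FixedCredentialsHandler implements CallbackHandler {
    private final String name;
    private final char[] password;

    FixedCredentialsHandler(String name, char[] password) {
        this.name = name;
        this.password = password.clone();
    }

    @Override
    public void handle(Callback[] callbacks)
            throws IOException, UnsupportedCallbackException {
        for (Callback callback : callbacks) {
            // Callback declares no methods, so the only way to find out
            // what is being asked for is to test the concrete type.
            if (callback instanceof NameCallback) {
                ((NameCallback) callback).setName(name);
            } else if (callback instanceof PasswordCallback) {
                ((PasswordCallback) callback).setPassword(password);
            } else {
                throw new UnsupportedCallbackException(callback);
            }
        }
    }
}
```

Throwing UnsupportedCallbackException for anything unrecognized is what the exception exists for, and it keeps the instanceof chain from silently ignoring new callback types.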