Thursday, September 24, 2009

andThen andThen andThen... Scala, where's my car?

When I saw Dude Where's My Car?, several years ago with my wife (who is Chinese), she started laughing right as they pulled into the parking lot of a Chinese restaurant. I had no idea why, but she said to wait until the end of the scene. In the scene, after they order each item, a voice from the speaker says, "and then?" and continues after they run out of things to order until Ashton Kutcher eventually destroys the speaker in frustration. When the scene was over, I asked her why she started to laugh at the beginning of the scene. She said she had heard people quoting that at work, but she never realized that the Chinese name of the restaurant, which she saw at the beginning of the scene, was "And Then."

Appropriately, Scala defines infinite loops in actors using andThen. Here's how...

In Scala, one way to compose functions is "f andThen g", which is the same as
def newFunc[T](x: T) = g(f(x))
You can think of it as "evaluate f(x) and then evaluate g with that result". It's an easy way to chain functions and it's good for noncommutative operations, where "order matters," like matrix operations (the same type of idea you would use in implementing a safe navigation operator {quickly ducking for cover}). For instance, if you want make a function that takes a matrix, transposes it and inverts it and then returns the determinant, you could write that like this:
def transInv = transpose andThen invert andThen determinant
Those not familiar with functional programming will note the lack of arguments and return type in the definition of transInv. TransInv gets its arg types from transpose and its return type from determinant, so you can still use transInv(myMatrix). The definition of Function1[A, R].andThen is (here, Function1 takes an argument of type A and returns a result of type R):
def andThen[A](g: R => A): T1 => A = { x => g(apply(x)) }
Scala defines infinite loops in actors using andThen:
def loop(body: => Unit): Unit = body andThen loop(body)
So, if want to simulate the scene in Dude, Where's My Car?, using an actor, you can say:
loop {choose}
which translates to:
((_: Unit) => choose) andThen (_ => choose) andThen (_ => choose) andThen...

Monday, September 21, 2009

Xus: A Simple Peer-to-Peer Platform

A friend of mine wanted to implement peer-to-peer source code repositories with social networking features, like popularity ratings and such and Plexus was already on the way to being able to support something like that, so I decided to break out the back end of Plexus into a separate project, called Xus. We started talking about what such a thing could do.

Then the bottom fell out of Plexus.

Apparently, the FreePastry project is winding down. It was, in my opinion, the most promising of the major peer-to-peer platforms. It was straightforward and relatively small. We chose it over JXTA for our communication in Plexus. We actually started with JXTA, back in 2002, but JXTA was just too over the top with the excessive design patterns, the large surface area, the lack of a usability layer, and things of this nature. Why does Sun seem to delight in requiring so much ritual from developers to do the simplest things with their libraries? So Pastry is way cleaner than JXTA. Like JXTA, Pastry was made to scale to 1000s of nodes in a cloud and had interesting routing schemes and other mechanisms to promote scalability and reliability. We liked it a lot. We did experience a few problems:
  1. An improperly configured NATted peer could royally screw up the cloud, because all of the peers take part in routing and connect directly to each other
  2. We had to roll our own presence using our own heartbeats, because Pastry doesn't inform you when a peer disconnects and you couldn't easily enumerate the peers on a broadcast channel (publishing topic)
This is all moot now, because FreePastry is not going to be actively developed.


So I'm announcing the Xus project. Here's the web page and here's the source, so far. There's a VERY simple, console-based chat to test it. I don't have unit tests yet. I've only been working on this for about a week and I'm not in the TDD habit. Maybe I should be. Anyway, because of the FreePastry news, Xus is a peer-to-peer platform and the other stuff will be services on top of it. Here's a snippet from the doc:
Xus is a peer-to-peer protocol with a reference implementation in Scala. The point of Xus is to provide a peer-to-peer platform that is simple, powerful, practical, and easy to use. People can use it for games, social networks, source code repositories, or whatever.

  • Simple: able to implemented with a relatively small amount of code so that it's easy to maintain and fix
    • Layered: there are 3 layers. Layering separates the concerns of the protocol and makes it easier to understand
    • Self contained: the reference implementation does not rely on third party libraries, reducing cognitive overhead for future developers
    • Scala: reference implementation is is Scala, which allows for smaller code that is "less noisy" than Java
    • Straightforward implementation: simple, direct routing
  • Powerful: strong enough to support common p2p tropes and some new ones
    • Messaging: multicast, unicast, dht, and delegated and direct p2p messaging
    • Delegated messaging allows peers behind NATs to receive messages from other peers
    • File storage, backed with Git
    • Replicated properties for simple shared data
  • Practical
    • Medium scale allows you to know when peers disconnect
    • Works through NATs
  • Easy to use
    • Built-in port testing peers can request other peers to test their port configurations
    • API is small with a minimum of required setup
    • Reference implementation will include a UPnP port configuration tool (ported from the Plexus code base)
Xus is centered around "topic spaces," which are groups of topics. A topic space is a slice of the total peer-to-peer network, consisting of a medium amount of peers and each peer can connect to many topic spaces. Probably 100 or fewer peers in a topic space -- depends on the app. All of the peers in a topic space connect to the active owner, which does all of the message routing for the space. So, if a peer wants to broadcast a message to a topic, it sends a message to the active owner of the topic's space and the owner then sends a message to each peer in the space. Any peer can potentially be an active topic space owner if it can accept connections and it is, in fact, an owner of that topic space (Xus uses strong authentication for this kind of thing) . Peers which can't accept connections can still participate; they just can't be active topic space owners.

I use NIO in the reference implementation, so this routing isn't computationally intensive, but astute readers will notice that this broadcasting arrangement takes up a relatively lot of bandwidth for the active topic space owner, compared to everyone else. Hence the "medium amount of peers". By the way, Xus accomplishes broadcasting in a way that mirrors the most common architecture for first person shooter servers, so it CAN work extremely well (see Sauerbraten). There are issues and constraints, of course.

See, this is just the sort of thing that Pastry was trying to avoid by having all of the peers do routing, but it was also the thing that accounted for some of the major complexity in Pastry and all of the problems we had with it. Thus in Xus, broadcasting is limited to medium-sized groups of peers but this allows Xus to be more reliable and faster -- latency is usually much lower with this architecture. Xus is for when you want to build a peer-to-peer app quickly and defer scalability issues to when you realize that your project has, in fact, become wildly popular. At that point, there is nothing preventing you from making levels 1 and 2 of the Xus stack more scalable. You'll even be able to reuse the Xus protocol, probably adding messages and attributes that are required to support routing.

The protocol is XML-based and I'm using fastinfoset (which is built-in to Java) to take the fat off the wire. This allows you to use your favorite XML tools without worrying about how bloated the communication is. I'm using TCP for connections to simplify things. Maybe I'll look into SCTP, but for now, it's just TCP. The protocol is full duplex but does support request-response in cases where messages could fail, like direct p2p messages, for instance.

I'm not entirely sure whether to have broadcasts return a response with a list of the nodes that failed to accept the broadcast for whatever reason (security or lack of resources, perhaps). This sounds like it would be useful for some applications. Right now, Plexus is what's driving the requirements. If other developers see needs for things like this, they are welcome to jump on board the project and help out.

Sunday, September 6, 2009

Can functional programming really offer something substantial to Java?

This post on Scala-lang talks about research done on how the style of language affects productivity (the time it takes to understand other peoples' code). It is specifically comparing Java with Scala. Here is the paper that the post refers to.

Scala allows functional programming, as in Scheme, as opposed to procedural or object-oriented programming, as in Java, but Scala allows a nice blend of the two, which is a unique approach among programming languages. Yes, there was LOOPS and CLOS, but Scala's approach is quite different and I think it's much a more integrated one. There seem to be more and more posts out there about the importance of functional languages. Personally, I agree with the premise: that functional techniques lead to denser code and that can be (if used properly) easier to understand and maintain. Actually, this is related to one thing that I emphatically agree with the eXtreme Programming camp on: don't use accessors unless absolutely necessary (i.e. use refactoring to insert them when you have need of them).

There is a controversy here, though. Every now and then, over the years, I get extreme push-back on my coding style, because some programmers REALLY hate functional style. They act as if it's offensive -- as if people who use functional techniques are snobs and feel they're too good to use the straightforward, meat-and-potatoes approaches that the gumshoe programmer in the street uses. That's not it at all for me -- I use procedural techniques too. Just not all the time; not as a golden hammer.

It IS possible to use functional techniques in Java (using anonymous inner classes), but they are ugly and verbose. Nevertheless, I use them in places where I feel they're appropriate. Some people have complained bitterly about this in the past. That hasn't had an impact on my code, because of my position and the unique constraints and properties of it but in other circumstances I would be forced to write in what I consider a less effective way.

Maybe this attitude will change with the attention that functional languages are getting. I was a Smalltalk programmer in the 80s and 90s, and I remember the Java hype. A lot of Smalltalkers were saying that it would pass because it was just hype. After all, they said, look how inferior Java is with its primitive types, substandard collection library, and skeletal GUI support, compared to Smalltalk's! And anyway, it was just the new new thing. I figured Java was going to eat Smalltalk's lunch. I made noises so that people at our company got trained in Java and I made a technology that connected Smalltalk to Java so that Smalltalk could essentially use Java as a display medium, which I thought might slow Java's impact on Java (as of last year, there was still at least one Smalltalk project using that technology).

Are functional languages just being hyped now as the latest attempt at a silver bullet or do they REALLY have something to offer? I've always thought they were interesting and powerful and fun to program with. As I mentioned before, the functional paradigm has been around a long, LONG time: Church introduced Lambda Calculus in 1932 and Turing introduced his machine in 1936. They were both VERY different approaches to computing and the Turing Machine approach is currently "in vogue."

That may change. What if functional languages are shown to scale better in several dimensions, such as code comprehension and multithreaded programming? I guess we'll see, but it's starting to smell that way to me. I did a lot of programming in both LISP and Smalltalk, back in the day. When I switched to Java, the lack of lambda in JDK 1.0.X was a real pain. When they introduced inner classes in 1.1, things got better, but so far they really don't have the brevity of Smalltalk's blocks.

Groovy and Scala have nice lambdas and Scala really focuses on functional tools. My money's on a change that integrates functional tools into the JVM. Maybe that'll come quickly as more and more deep pocketed groups adopt Scala. That's how Smalltalk got mainstreamed in the 90s. Even if Scala doesn't get super broad adoption, if high tech, high finance, and infrastructure orgs continue to adopt Scala, there will be a nice little Scala niche for a lot of people I know.

Tuesday, September 1, 2009

Livecoding: connecting the dots from Lambda Calculus to Gibson's cowboys

Dot #1: Lambda Calculus...

My wife and I took a vacation recently and I had a chance to learn some more Lambda Calculus; how the Y-combinator works, in particular. At Purdue, we focused on Turing Machines and either we didn't really cover LC or I slept during that part. If you haven't, I HIGHLY recommend learning LC. Lambda Calculus is the most elegant system I, personally, know of (maybe that's not saying much). You functional-types out there will already know this next part, but here's some history and stuff.

It was invented by Alonzo Church, who was Turing's Ph. D. adviser (later they proved that the Lambda Calculus and Universal Turning Machines are equivalent, which was one really impressive feat). The Y-combinator is a way to define recursive functions using only lambda binding. Lambda binding is very simple and it's pretty much what Lambda Calculus is. It more or less just lets you assign values to variables and then use them in an expression, like this (in LISP):

( (lambda (x) (print x)) 50)

(lambda (x) (print x)) is a function definition that says to print whatever x is bound to. Using it like in the above "applies" the function to the argument by first "binding" x to the value 50 and then "evaluating" the expression (print x). Pretty simple concept. The trick is how to define recursive functions using this mechanism, since both the set of variables to be bound and the expression to evaluate are separate. In order to be able to call a function, you either need to pass it in as an argument or have it defined by an "outer" function.

A recursive function can't take itself as its own argument and it's also not defined in an outer scope, so it essentially has to be given a copy of itself as an argument. The Y-combinator is a relatively clean way to do this. I wasn't able to understand it while we were out on our vacation, because I (deliberately) left my computer at home. Once home though, I was able to type things into LISP and play with them to finally be able to grok the combinator (hmm, "grok the combinator" sounds like some sort of euphemism).


Dot #2: Scala...

So, that launched me on a journey to rediscover functional programming, since I hadn't messed with it really since the late 80s, when Roy Riggs and I (TEAM CTHULHU) were working on our MUD. My friend, Serj, reminded me of this functional language for the JVM called Scala and told me that James Strachan wrote in a blog post that if someone had shown him the new Scala book back in 2003, he never would have written Groovy. Groovy has been my favorite scripting language for quite a few years, because it integrates with Java so well (low barrier to adoption for Java coworkers). So, I started messing with Scala. I even made this blog so I could post some musings and info about my Ober project, which I updated from JavaScript to Scala. I recoded a bunch of Groovy stuff for Dataswarm in Scala and that's worked out really well.


Dot #3: Scheme...

I met this guy on a plane to Israel a few months ago. He's in his 50s and he got his masters in education, applying virtual worlds to education. he's not a programmer (yet), but he's been wanting to essentially "do cyberspace" for education since 1996. He got a 17 year old Israeli programmer to work with him and make a bunch of interesting mockups. They tried to commercialize it and whatnot, but never really got anywhere. When I told him about Plexus (which is currently stalled, by the way), he was sort of shocked into action and really got inspired to work on his project again. I asked him that since Plexus was open source anyway, why not build on top of it instead of from scratch?

Eventually, I realized that he really needed a better way to get a feel of what's currently possible, etc. and that would require him to know something of how to program. I looked for a good intro to pure programming, with little between him and the algorithm and that led me to PLT Scheme (http://plt-scheme.org/). There's a GREAT book on programming that basically plows through a few semesters of CS very quickly (How to Design Programs here: http://docs.plt-scheme.org/). The quick intro also does graphics on the command-line, which is pretty slick, letting you define shapes that print out like smileys. This reminded me of "the good old days" in our LISP MUD when we could type in code and change the game as it was running.


Dot #4: testing and log analysis...

I've been doing a lot of log analysis with our technology at work. We're working on making it "bullet proof" now, with the goal of being able to remove the battery from a laptop in a demo, sticking it back in, and starting back up without data loss (after the file system recovers). This requires some heavy journaling and online backup. Test runs can be long and produce giant diagnostic logs (0.5 - 2G). I have to comb through these logs and find errors. Then I have to trim them and transfer them from Kansas to my machine in Israel so I can inspect them more closely. This takes a lot of ad-hoc scripting and the needs change after I fix each bug. This reminds me of what we did with our MUD, because I'm "writing code as I go". One of my hopes with the Ober project is that I can have a nice tool to do this with.


Dot #5: Gibson's cowboys...

I started rereading Neuromancer and it's even MORE impressive than it was the first time I read it. Early in the book, Armitage tells case that he was there when they "invented" Case -- i.e. when they created the software cowboys use for net running. This reminded me of typing code "in-game" with our MUD and I remembered how cool it was to adapt the game as people were playing it. So I was thinking that there ought to be some powerful stuff programmers could do that we might not really be doing yet.

I had told Serj about PLT Scheme. He's early in college and I'm trying to mentor him to be a guru -- I think he has the potential. He started messing with Scheme and reading the book and sent me a link to "fluxus" http://www.pawfal.org/fluxus/ This is a "livecoding" system. People are using this to actually code music and graphics in front of an audience: http://toplap.org/uk/about/ The first vid is fairly poor and what you might fear such a performance would be like (sorry guys who made the vid), but the rest are very interesting, I think. Again, kind of the type of thing that we used to do with our MUD.


Dot #6: livecoding in Scala...

So people are working on creating systems specifically to accomodate changing programs as they run. You could say "that's what a shell is" and you'd be right, to an extent. Shell tools are powerful ways to slice and dice text, but for things like the log analysis I'm doing, I think I need something more like an IDE. Something that gives programmers an environment they can use to modify code and test it in a tight loop, with access to the code, the data, and intermediate results as the program is running, like in the Haskell Hackery performance here http://toplap.org/uk/about/ it's well worth watching. That's more like EMACS or WILY than a shell, I think.

I'm hoping Ober can do something like this for Scala, backed with something like Eclipse, to get syntax highlighting, and whatnot. One idea might be to make Ober an Eclipse plugin instead (or in addition to) a stand-alone environment. It might be interesting to "skin" Eclipse and remove the right-button menus, replacing the behavior with Oberon menus (or 'tags' in Wily -- a single Oberon menu bar above all of the editors might be sufficient, with some commands to open subpanels or something). Perhaps running in the debugger will help or ClassGhost (http://classghost.sourceforge.net/).

One sticky issue is that Java can't redefine arbitrary methods in a running VM -- you can't redefine a method using a different signature as you could in Smalltalk and you can't add or remove fields. A quick check of the new features of JDK7 doesn't show any evidence of change to these limitations. Coding conventions would help with this (kind of like writing "structured COBOL"). As opposed to hotswapping methods, you can just "reload" a class, which defines a NEW class with the SAME name. Production servlet hotswappers are familiar with the perils of this.

Ober's command/namespace registry makes it a system like a shell or Oberon. Indirecting through the registry instead of using hard pointers allows you to reload a class file without worrying about signatures, provided the fundamental command interface didn't change (i.e. the command itself should be the only thing keeping references to its objects). A central storage area where commands could place simple data would allow commands to keep state that was protected from redefinition and they could be initialized with this state.

So here's a simple way to get what I think would be decent support for livecoding with Scala:
  • run Ober out of the debugger so you can use hotswapping when possible, using Ober to access intermediate results and as a scratchpad for data munging and Eclipse to change code
  • make the class loading namespace check for hints in the class, like annotations for namespaces and such
  • provide per-command data storage from the Ober object (a trait can make this even easier to access)
  • make a Reload command as a fallback for when hotswapping doesn't work
  • make a log that appends useful Ober commands, such as prefilled Reload commands and access to previous command results and provide easy access to this from the Ober object and a command to show the log
  • keep commands clean -- don't hold references to other commands' objects; go through the command registery to get them
  • simplify access to the command registry to make it simple for commands to use it to access each other