Tuesday, December 27, 2011

Code Google

UPDATE: Added structure and a few more ideas...

Intro

Have you ever seen someone include a giant open source library in a project, just to access some small part of its functionality? It's very valuable to have someone else write, debug, and maintain a large part of your code, but when you do this over and over, it can add quite a lot of bloat to an app for comparatively little gain in functionality.

This is a huge problem, as I see it, but maybe there is a solution?

What about allowing people to integrate only small subsets of modules into their code? I'm calling this idea "code google," for lack of a better term, although it's a LOT more than just search. It involves search, analysis, IDE integration, and social networking, so that library developers can track what parts of their code are actually being used. It's far from trivial to implement, but I think it could be very useful.

Before our company started, a friend and I wrote a fair-sized game in LISP (a couple tens of thousands of lines) and later versions used OO languages (both class-based and prototype-based).  My friend and I seem both to be coming to the conclusion that what is happening in the Java world is not a good thing.  A lot of modern programming seems actually to be just gluing together libraries and frameworks.  Many times, frameworks are chosen because of a few of their features and glued onto other frameworks similarly chosen.

Module systems concentrate on versioning in the large -- each framework is versioned as a whole, despite the fact that so many people (apparently) pick a small set of features and use only that.  Programmers achieve modularity by marking off a segment of code and declaring it to be a module.  APIs are defined explicitly by developers, but I, on the other hand, often need to use implicit APIs that the developers didn't consider to be "first class" when they wrote their code.

Code Search

What I think is needed, in addition to explicit module declaration, is a "code Google" that searches the vast sea of code and finds a small subset that does what I need at the time.  Curiously enough, this already exists, in part, for Haskell programmers: Hoogle lets you search for functions based on their type signatures.  It might be hard to imagine how this could possibly work in Java, because you won't necessarily know the name of the actual type you are looking for, but it becomes more practical as Java gets closer to LISP (or Haskell), because it becomes easier to express generic behavior without knowing as many explicit names.  A "code Google" for Java could allow you to inline a method signature without knowing the name of the SAM (single abstract method) type that the code needs.  This could be done before Java actually has a lambda syntax.
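To make the signature-search idea a bit more concrete, here is a minimal sketch in Java. It is only an illustration under heavy assumptions: the names SignatureSearch and find are made up for this post, it uses reflection over classes already on the classpath rather than an index of source code, and it matches types exactly, whereas Hoogle is smarter about generalizing.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: searches a handful of classes for public static
// methods whose signature matches exactly. Not a real "code Google".
final class SignatureSearch {

    static List<String> find(Class<?> returnType, Class<?>[] paramTypes, Class<?>... haystack) {
        List<String> hits = new ArrayList<>();
        for (Class<?> c : haystack) {
            for (Method m : c.getMethods()) {
                boolean matches = Modifier.isStatic(m.getModifiers())
                        && m.getReturnType().equals(returnType)
                        && Arrays.equals(m.getParameterTypes(), paramTypes);
                if (matches) {
                    hits.add(c.getSimpleName() + "." + m.getName());
                }
            }
        }
        Collections.sort(hits); // stable, readable ordering for display
        return hits;
    }

    public static void main(String[] args) {
        // "I need something that turns a String into an int."
        System.out.println(find(int.class, new Class<?>[] { String.class }, Integer.class));
    }
}
```

Asking for a String-to-int function in Integer turns up Integer.parseInt without the caller having to know its name, which is the point of searching by signature.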

Closures will go a long way toward making Java simpler to use, since so many design patterns are trivial with them -- I made a command-driven framework in Java 1.0.4, before there were even anonymous inner classes, and had to make a separate class file for each of several hundred commands.  For an example of this detangling, look at how much easier AWT became after the introduction of anonymous inner classes: you don't have to subclass Button anymore when you make a GUI; you just make an anonymous inner class as the event handler.  In Java 8 (or later?), each of these 1.0.4 files would just be a lambda expression -- the same idea as an anonymous inner class in Java 6, but a LOT easier on the eyes.
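Here is a small sketch of that difference, assuming Java 8 or later for the lambda. The CommandTable class and its two commands are invented for illustration; they are not the original framework.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical command table: each command is just a Runnable.
final class CommandTable {
    private final Map<String, Runnable> commands = new LinkedHashMap<>();
    final StringBuilder log = new StringBuilder();

    CommandTable() {
        // Anonymous inner class: no separate source file, but still verbose.
        commands.put("look", new Runnable() {
            @Override
            public void run() {
                log.append("You look around.\n");
            }
        });

        // Lambda expression: same behavior, one line.
        commands.put("wait", () -> log.append("Time passes.\n"));
    }

    /** Runs the named command; returns false if there is no such command. */
    boolean run(String name) {
        Runnable command = commands.get(name);
        if (command == null) {
            return false;
        }
        command.run();
        return true;
    }

    public static void main(String[] args) {
        CommandTable table = new CommandTable();
        table.run("look");
        table.run("wait");
        System.out.print(table.log);
    }
}
```

With several hundred commands, the lambda form keeps the whole table in one readable file instead of several hundred class files.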

Code Snippet Retrieval

In addition to search, Code Google should be able to retrieve the dependencies of the code snippet you need and pack it all up for you in a library jar with source included.  Hoogle does not do anything like this -- it just gets you to the module version.  Again, as Java gets closer to LISP, dependencies won't need to be as tangled up: as closures become easier to use and more prevalent, it becomes easier to write code that stands on its own (see the AWT evolution for examples).  It's already possible to do this in Java by using anonymous inner classes, but these are verbose and a lot of people don't like to use them (except when they write event handlers :) ).
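The heart of snippet retrieval is working out what else a snippet drags in. A rough sketch, assuming the dependency graph has already been extracted somehow (that analysis is the hard part and is not shown); SnippetClosure and the names in the sample graph are hypothetical:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: computes everything a snippet transitively depends on.
final class SnippetClosure {

    /** Returns root plus every unit reachable from it through deps. */
    static Set<String> closure(String root, Map<String, List<String>> deps) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(root);
        while (!todo.isEmpty()) {
            String current = todo.pop();
            if (seen.add(current)) { // skip units we already packed (handles cycles)
                for (String d : deps.getOrDefault(current, Collections.<String>emptyList())) {
                    todo.push(d);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Made-up graph: a big library where we only want Tokenizer.
        Map<String, List<String>> deps = new HashMap<>();
        deps.put("Tokenizer", Arrays.asList("CharBuffer", "Token"));
        deps.put("Token", Arrays.asList("Position"));
        deps.put("Renderer", Arrays.asList("Token", "Canvas"));
        System.out.println(closure("Tokenizer", deps));
    }
}
```

In this made-up graph, asking for Tokenizer pulls in CharBuffer, Token, and Position, but leaves Renderer and Canvas behind, which is exactly the bloat the jar would otherwise carry.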

Social Networking and IDE integration

Once you have the snippet, all jarred up, there should be a way to track the search query and the version of the module it came from, so you can automatically update your code.  When someone downloads a snippet, there should be a social networking tie-in with the tools that the developers use, so that they see an annotation in their code telling them that people are using that particular API.  That way, the developer doesn't unintentionally change or remove that implicit API.  If the search does fail because the code is no longer in a module, someone can still spawn another open source module to support it.  The social networking support should also indicate how many people are using a piece of functionality and allow people to attach comments to the code so the developers and other users can see them.
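One possible way to carry that tracking information on the user's side is a Java annotation on the retrieved code. This is only a sketch: the SnippetOrigin annotation, its fields, and the module name and version shown are all invented for illustration, not part of any existing tool.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical metadata recorded when a snippet is pulled into a project.
@Retention(RetentionPolicy.RUNTIME)
@Target({ ElementType.TYPE, ElementType.METHOD })
@interface SnippetOrigin {
    String module();   // module the snippet was extracted from
    String version();  // module version at retrieval time
    String query();    // the search that found it, for re-running later
}

final class ProvenanceDemo {

    // A retrieved snippet, tagged with where it came from (made-up values).
    @SnippetOrigin(module = "example-lib", version = "1.2.0", query = "String -> int")
    static int parse(String s) {
        return Integer.parseInt(s.trim());
    }

    /** Describes the recorded origin of a method in this class, if any. */
    static String describe(String methodName, Class<?>... params) {
        try {
            SnippetOrigin origin = ProvenanceDemo.class
                    .getDeclaredMethod(methodName, params)
                    .getAnnotation(SnippetOrigin.class);
            if (origin == null) {
                return "no recorded origin";
            }
            return origin.module() + "@" + origin.version() + " via \"" + origin.query() + "\"";
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such method: " + methodName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("parse", String.class));
    }
}
```

An IDE plugin or command line tool could read the same metadata to re-run the query against newer module versions, and report usage back to the library's developers.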

As an alternative to IDE integration, there should also be command line and web-based tools for your project.  Mods to Gitweb and Fossil, maybe?

Summary

  1. Code Search indexes vast amounts of code and lets people search for what they need by specifying type signatures in addition to just unstructured text
  2. Code Snippet Retrieval incorporates a snippet and its dependencies into your project's codebase and includes metadata that describes where the snippet came from (the module, version number, site, etc.)
  3. IDE Integration puts this information at your fingertips so users can be informed when new versions are available and developers can be informed which pieces of their code people are finding important and how many people are using them
  4. Social Networking lets people comment and collaborate with the developers to keep them connected to the users of their code

The Eclipse Code Recommenders plugin looks like it's trying to do the search portion, at least.