Lambda on the Web. Or, how I learned to stop worrying and love linked lists.
Arrays and linked lists are the two classic datatypes for storing “collections of things”. Arrays typically are preferred in conventional programming tasks. They take up less memory, have better data localization, and are faster for look-ups, provided you know the index of what you’re looking for. However, I’ve recently come to the opinion that a lot of common web-oriented tasks just are not well-suited for arrays. I’m not saying arrays aren’t useful, but that lambda type operations on linked-list data structures are starting to make a lot of sense to me.
One reason that linked-lists are more useful is that web developers are usually not coding with heavily optimized, low-level languages like C. Many developers will code for virtual-machine platforms that do not allocate and handle arrays optimally for speed or efficiency. Adobe’s AVM2 is a good example of this, and several people have noticed that linked-lists can outperform arrays for a lot of relevant tasks. While there are workarounds for working efficiently in arrays, these can introduce a lot of cognitive overhead, especially when working with complex objects (such as XML/JSON data types).
On the server side, linked-lists are useful because it’s not always possible to store a simple index of items (people, pages, clicks, etc.) on one machine. This is one reason that Google has started to work more with Lambda type operations such as MapReduce, and to spread the calculation of PageRank scores over multiple machines in a flexible and more fail-safe environment. Lambda and linked list data types go together quite well because Lambda methods are normally intended to be “stateless.” For each step, a typical Lambda method usually only concerns itself with an individual element, a function, and some output element. For this reason, they can work on successive individual elements of a linked list without having to keep the whole list or index in memory. The drawback to linked lists are that you don’t have random access to any of the elements (you have to go through them in order), and you incur a significant amount of memory overhead for each element (each element has to contain a pointer to the next element, which essentially is a memory location… i.e. for a 32 bit machine, it’ll take at least 4 bytes). So, if you’re only storing (common) Integers in a linked-list, then the space required to store the pointers is going to be equal to the data itself. However, keeping these limitations in mind, they still can be very useful in appropriate contexts.
One of the reasons that I’ve been interested in haXe, is that it includes a Lambda class as part of its API. This fact, combined with its concise type-checking facilities, means that you can create very powerful Lambda type operations, and implement them with strong type checking. This is great for performance and debugging, and haXe’s target options for client side and server side could make this type of programming extremely useful.
So to summarize, array’s performance benefits can be lost when dealing with client/server side virtual machines, or in large scale server side information processing. I think it’s worth checking into some functional programming techniques to improve performance or scalability for some common web site tasks, and I think haXe is a wonderful place to start.
To this end, I’ve started to put together a library that extends the basic functionality of the haXe Lambda class, and offered it as part of an open source haXe library called “sugar-hx“. The library includes functions taken from different Lambda method implementations in other languages. It currently includes scan, unfold, zip, unique, bifurcate, as well as several “grouping” functions. If you use it, or have suggestions, let me know!