Thinking Erlang
January 10, 2009
I set out this morning to learn how Joe Armstrong handles responses in his Simple Web Server.
But pressing questions about Erlang have been gathering in my mind like looming thunderclouds. Felt it might be more productive to tease them out into the light of day.
The most nagging question is this:
It’s been far more difficult than I expected to trace the core logic of Joe’s Web Server. And, it’s been damned tedious, as you may have noticed.
Why should this be?
No doubt much of the problem stems from my own inexperience with the language. Certainly my ignorance of anonymous functions is a stumbling block.
But I talked with a young Austrian programmer a few months ago who warned me that Erlang source could be rather opaque. And, I’ve noted passing references to this problem in various Erlang-critical blogs.
With your indulgence, I’d like to get your input on this before we continue. Rather than criticize, I’d prefer to analyze, synthesize, and strive toward deeper understanding. I’d even hope that we could work toward an explicit approach to Erlang development and documentation that would vastly simplify code review.
If such a methodology already exists, by all means please point the way.
As I see it now, the problem is four-fold:
1) Processes
Cheap concurrent processes are the heart of Erlang. But as Joe spins them off in his Simple Web Server I find it increasingly difficult to maintain clarity of mind. It’s like trying to keep track of all the magician’s spinning plates. I try to make little bubble diagrams, but still lose track of who’s passing what to whom.
2) Factoring
Factoring is the art of breaking down complexity into a set of simple essential functional components, then explicitly denoting the interfaces between them.
It appears to me that factoring decisions face us at three levels in Erlang:
Functions: Taking into account performance constraints, how do we design functions that are simple and easy to understand, yet maximally powerful in combination with other functions?
Modules: Erlang groups functions into modules, presumably for two reasons: to aid conceptual understanding of the program, for example by hiding the complexities of individual function design, and to provide useful “toolkits” or libraries of functionality that can be reused across programs.
The issue here, again, is how do we most efficiently map module functionality to the conceptual components of our various problem domains? In other words, how do we decide which functions should go where in our module scheme?
Processes: How much functionality should we include within a process? When and how should we decide to spawn off a new process?
In general, I found the functions that comprise Joe Armstrong’s Simple Web Server clean and easy to understand. Moreover, the conceptual factoring of modules into web server, http driver, and tcp driver makes sense on the surface.
But the concurrent processes that make up Simple Web Server seem to cut across the grain of the modules. To understand a given process, I needed to move across different modules. It was this shifting back and forth across modules, it seems, that tended to break my thread of thought.
I sent Joe Armstrong an e-mail asking his thoughts about factoring Erlang source code. Haven’t heard back. Perhaps he never caught it.
3) Anonymous Functions
Much said in earlier posts. I’m still not confident that my walk-through of Joe’s Simple Web Server to date is 100 percent correct, largely because I lose track of which anonymous function is being passed where.
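To make the difficulty concrete, here is a minimal sketch of the pattern as I currently understand it. It is not taken from Joe’s code: the module name fun_trace and the functions start_worker/1, loop/1, and demo/0 are hypothetical names of my own. It shows an anonymous function created in one process and executed later in another.

```erlang
-module(fun_trace).
-export([demo/0, start_worker/1]).

%% Hypothetical example: takes a fun and runs it inside a NEW process.
%% The fun is defined by the caller but executed by the worker.
start_worker(Handler) ->
    spawn(fun() -> loop(Handler) end).

%% Runs in the worker process. Handler is whatever fun was handed in.
loop(Handler) ->
    receive
        {From, Msg} ->
            From ! {self(), Handler(Msg)},
            loop(Handler);
        stop ->
            ok
    end.

%% Runs in the caller's process.
demo() ->
    Prefix = "echo: ",
    %% The fun captures Prefix here, in the caller...
    Handler = fun(Msg) -> Prefix ++ Msg end,
    %% ...but it is applied over there, in the worker.
    Pid = start_worker(Handler),
    Pid ! {self(), "hello"},
    receive
        {Pid, Reply} ->
            Pid ! stop,
            Reply
    after 1000 ->
        timeout
    end.
```

Even in a toy like this, the fun is written in demo/0, passed through start_worker/1, and finally applied in loop/1. That is three places to keep in mind for one small function, which is roughly where I lose the thread.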
4) Documentation
Erlang has excellent documentation standards and tools. Joe neglected to avail himself of them.
So, do my difficulties following the logic of Simple Web Server ring a bell with you? Is this a generic problem with Erlang? Or is it a function of a given Erlang programmer’s style? Or is it my own ignorance?
What’s your experience? Does tracing Erlang source become easier as one accumulates understanding of the language? Is there profit in continuing this discussion? Or would you rather move on with the walk-through? Or maybe move on to another topic altogether?
Please let me know.
January 10, 2009 at 8:47 pm
The most difficult part of thinking in Erlang was similar for me. I didn’t have problems with anonymous functions or anything, but I really struggled with the added dimension of processes. I only really started to get a handle on it when my mind, trained to compartmentalize functionality in modules (or objects or namespaces or whatever), finally started accepting that Erlang processes represent a whole new dimension. A single module will often have functions that run in completely different process spaces. There is usually a function that starts up a gen_server, for example, but that runs in the process space of the calling code. Then there are the functions that run in the gen_server loop, and then often you spawn other functions within the same module that run in their own process space. In other words, you may have module_1 that has 3 different process spaces, and/or you may have a process space that utilizes functions in multiple modules.
That is to say, the process dimension isn’t mirrored by the module dimension; it’s, well, a completely new dimension.
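Here is a minimal sketch of what I mean, not taken from any real project. The module name plates and the functions ping/0 and ping_async/1 are hypothetical; only the gen_server calls are standard OTP. One module, three process spaces:

```erlang
-module(plates).
-behaviour(gen_server).

-export([start_link/0, ping/0, ping_async/1]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Runs in the CALLER's process: it only asks OTP to start a server process.
start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Also runs in the caller's process; the call blocks until the server replies.
ping() ->
    {pong, ServerPid} = gen_server:call(?MODULE, ping),
    {caller, self(), server, ServerPid}.

%% Spawns a THIRD process that runs code from this same module.
ping_async(ReplyTo) ->
    spawn(fun() -> ReplyTo ! {worker, self(), ping()} end).

%% The callbacks below run in the gen_server process.
init([]) ->
    {ok, no_state}.

handle_call(ping, _From, State) ->
    %% self() here is the server's pid, not the caller's.
    {reply, {pong, self()}, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

Calling plates:ping() from the shell after plates:start_link() returns a tuple with two different pids even though both functions live in the same file. Calling plates:ping_async(self()) adds a third pid to the mix.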
Initially I would try to separate out my modules so that their functions would be in the same process space (more or less), but my brain eventually grew to accept the new dimension. Only after having written several complexly interacting gen_servers with spawned functions myself, though.
Now that my brain accepts the new dimension, it’s very difficult going back, and I have no problem writing or reading modules that mix “process space.” I’m having a hard time articulating this, but the point is that one day the added dimension does click, and, as per the Blub Paradox, you’ll be unable to explain to Python users (for example) why you feel so constrained by that language these days…
I’d also add that while we love Joe’s contributions to the understanding of the language and appreciate his code, he is (and considers himself) an Erlang evangelist, not necessarily a hard-core Erlang coder. For amazing code that’s well documented, look into ejabberd or etorrent.
January 10, 2009 at 11:01 pm
Thanks, Joseph, for your really thoughtful reply.
I think I dig what you’re saying.
I’m wondering, are you aware of any meta-notations, graphic doodles, etc., that can help one keep the spinning plates clearly in sight until one’s sixth sense for process dimensionality kicks in?
Thanks again,
LRP
January 11, 2009 at 4:23 am
Something like sequence diagrams ( http://en.wikipedia.org/wiki/Sequence_diagrams ) may be useful to model control flow between processes, for a specific “protocol” implementation.
January 11, 2009 at 4:33 am
Thank you!
I’ll take a look.
LRP.
January 13, 2009 at 12:04 am
I agree with the comments above about the extra dimension of processes just being a new concept to internalize when reading Erlang code. It just sinks in after a while.
Regarding Joe’s use or non-use of tools, such as the documentation tools – remember that Joe’s use of Erlang predates the existence of these tools, and that masters of a craft sometimes do things that their followers shouldn’t necessarily emulate.
Jim
January 13, 2009 at 12:47 am
Hi Jim,
You’re the second person to urge caution in emulating Master Joe too slavishly. And your respectful rationale makes sense.
Do you have favorite tools for documenting or otherwise enhancing the productivity of your Erlang programming?
Thanks,
LRP
January 13, 2009 at 10:38 am
Yes, it was certainly a little difficult for me to grasp at first, but I was so excited by the possibilities that I kept at it. My knowledge of JavaScript and using anonymous functions there helped immensely.
I think you’re being tag-teamed by being new to the functional paradigm and Erlang’s original concepts all at once.
January 13, 2009 at 5:10 pm
Much appreciate the encouragement, Matt.
I’m certainly excited by the possibilities of the language. And, I think I’m beginning to see daylight conceptually. Now I just need to write a bunch of code to build experience.
I have found any number of published Erlang programs that I’d love to play with but find user docs insufficient for one reason or another. This is one of my motivations for learning how to probe more deeply into published source code. I’m hoping to get good enough at it that I can share my findings with others.
Thanks again,
LRP