This is one of those gems that come across mailing lists every now and then:
http://www.haskell.org/mailman/listinfo/haskell
------------
Asked how she had come to choose GHC as the topic for her award-nominated PhD dissertation, freshly graduated doctor of software archeology Simone Tolduso revealed:
"At first, there were a few small curiosities that triggered my interest, like why were darcs patches sent to the cvs-ghc mailinglist, or why did GHC releases traditionally bundle the predecessor of the current Cabal version when the missing libraries depended on its successor?
But then I looked into the repository, with its layers on layers of build systems, source formats, deprecation warnings, directory structure fragments, todo logs, broken builds resulting either from OS-tools advancing and playing havoc with the built-in assumptions of fragile build configurations or from multiple, partially completed, mutually incompatible heart-liver-and-lung transplants supporting the newest language extensions (which of course were all needed to build the compiler branch supporting said features, and whose documentation tended to be spread over user manual, API comments, mailing list threads, research papers, plus half a dozen different Wikis and ticket trackers), supported by often outdated documentation in a never-ending variety of formats, and I knew I had stumbled onto a goldmine.
Not to mention remains of earlier projects (what were fptools, or libraries?), a variety of test and compilation languages (including Haskell, C, Perl, Python, alongside the usual scripting suspects), or the proliferation of sediment layers into user space by the simple, but ingenious, means of binary incompatibility. In spite of its comparatively small size, the project was beginning to rival the complexities of other Microsoft products of the same period.
In what seems to have been an attempt to push open source ideas to their logical conclusion, you actually had to guess at the right combination of versions for a number of independently evolving toolchains, libraries, OSes, and use those to bootstrap from a consistent snapshot of the compiler, library, and sometimes even tool sources, or nothing worked - a situation which was later increasingly exacerbated by the dispersion of the Haskell Cabal replacing coordinated releases. Preliminary mining of the relevant mailing list and bug tracker archives suggests that binary releases were mainly public data points used to indicate intermediate states of GHC _not_ suitable for specific applications (apart from the obligatory Cabal pre-version lacking the new features needed for installing the extra libraries, other examples include versions of Data.ByteString _not_ based on the famous paper, _not_ supporting essential optimisations, or _not_ supporting API safety fixes). So there seemed to be no way to avoid direct access to the source repositories with their associated build processes and toolchains.
And let us not forget that, unlike the programmers at the time, we are in the fortunate situation of already having complete repositories for the pieces and dependencies involved. Finding matching versions is a non-trivial, but essentially combinatorial exercise, while for them, the process of building GHC would often have involved developing and submitting the patches that make up our repositories of all the pieces of software GHC builds depended on.
We still haven't found the key that enabled the ancients to navigate this labyrinth and to keep their toolchains up to date while still making any progress in their daily work, not to mention recording such progress via darcs (in itself written in Haskell, and not free of troubles). Agent-based simulations of developer communities at arbitrary slices through the repositories show the majority of agents getting stuck in a recursive cycle of installing, debugging, and updating dependency chains without ever reaching a productive state, so we do know that we are missing some crucial information.
Several of my correspondents have come to favour the somewhat controversial theory that the general programmer in those days must have been substantially more intelligent than people are today. And it does make sense, in a way - I mean, if anyone had been the slightest bit bothered by all this complexity, surely someone would have tried to simplify things?
Of course, my work has not all been happy progress: for instance, while there really was an 'evil mangler', the equally persistent rumour that GHC was named after some Scottish town has turned out to be a wild goose chase (cf. Appendix GC); my colleagues in dirt archeology assure me there was no town called 'glorious'. The 'real' archeologists, as they call themselves, had a field day laughing about my gullibility there. Still, there are so many buried treasures in this area - just waiting to be investigated."
Dr Tolduso is currently working on a follow-on project, "Haskell by committee - design and syntax through the ages".
Dept. of Software Archeology, University of New Atlantis
(for immediate release)