2007/10/23

Technical libraries for technical times

I hope that libraries of technical information are going to be unrecognisable in the future, and I hope that information is going to become globalised and centralised. This post collects the first solid thoughts I’ve been able to put together on some ideas I’ve been vaguely musing over for the last few months. Their themes are how we are taught, how we learn, and how we research. Obviously my viewpoint is going to be heavily biased towards my own experiences, but I hope that the ideas here can eventually be generalised.

There are several projects around the place to create open centres of learning, an initiative that I strongly support; examples are Wikiversity, Wikibooks, The Open University and Connexions. Unfortunately, the problem so far seems to be that putting together even a semester’s worth of learning material is an enormous task, and few people are creating content for these websites. Browsing through them reveals extraordinary nuggets of information completely out of context, and shows how far we have to go before it’s possible to access learning materials for an entire discipline like Mechanical Engineering (to use something I’m familiar with).

(Note that these projects differ from Google Books, OpenLibrary or even the pioneering Project Gutenberg, which simply collect books without linking them together or providing facilities to create new books or edit existing ones. Both kinds of project have their place.)

But there’s more to the problem than getting a thousand engineers to write a thousand books and calling it a day. Wanting open content is not enough. The content has to be written for a purpose, and it needs to be written differently depending on what it’s being written for. If we imagine the ideal case, where everything we wanted to know was linked through a giant library, how would we use that library? I break it down into three categories: learning, reference, and research.

The similarities between learning and reference are much greater than those between reference and research. (Indeed, I’m not even sure about the “research” layer at this stage; more on that later.) Much of their content could even overlap. But whereas a reference book will be explicit and terse, a learning book will have analogies, examples and tutorials, and may well skip the detail that makes a reference book what it is (dry and boring; no, I jest).

But remember that we’re no longer talking about books. This information would exist as “blobs” in the library, to be chained together in whichever order made sense for the application. Control theory, for example, applies across at least mechanical, electrical, and chemical engineering, but the way it is taught often varies considerably between them. The same goes for the more fundamental maths that underpins the more rigorous engineering subjects.

And this chaining, I feel, is one of the fundamental advantages of a central store of information. Places like Wikiversity might have modules that are related to each other, but the best they can hope for is a cross-reference linking them. It’s impossible to break science into pieces of “things to know” small enough to be placed in a single linear sequence and absorbed all at once. There are branches, dead-ends, intersections, and circular loops that defy any canonical reference. Different applications need different references to be written. By chaining blobs together, not only can material be re-used efficiently, but consistent terminology can be used across all scientific disciplines.

Greater abstractions can only be built on top of steady foundations, and as more and more becomes known about the world, we’re approaching the limits of what we can learn in the four or five years we’re given as graduate researchers. This is where the “research layer” I spoke of earlier comes in. Every new research student, guided or not, will follow a literature trail in the subject of their thesis. Their evolving bibliographic database is a representation of the “information space” they have mapped through the research they’ve managed to find, and they’ll proceed to carve out their own little niche in that space.

I’ve observed in my own research that my literature search is never complete. Reading others’ papers, it’s obvious that theirs never is either, when you find similar papers published years apart. Sometimes all I want to do is catalogue as much research as I can find, and this is where the seed of an idea came from: a framework for documenting ongoing progress. Why should two researchers on opposite sides of the world each have to repeat the same journey to find work in their field that was done years before?

I’d like to see “literature review” become a giant web of cross-references. It would differ from a reference library in that old work wouldn’t be forgotten, exactly, just hidden away behind the newer work that encompasses it. When a new research book is written, it can cover years of work in a field, making the papers it draws on, in a sense, obsolete. This resource would allow “forward linking” from any paper you stumble across, so that you can easily follow what research has come out of that work. And if there is none, is there scope for more research?


All of these ideas have been glommed together over the last while, as I’ve had time to tack them together. The concepts are muddy in my head and I’m not even sure how feasible this project is. Perhaps it’s impossible. Probably it’s impossible, at least today. I’ve got many more ideas and details in my head, but I’ll let them ferment for a little longer.