2007/09/28

My first steps with open source licences (& LaTeX)

I finally got around to learning a little bit about open source licences the other day. The whole premise seems easy enough: I write this code and don’t put restrictions on it for other people’s use. But the devil’s in the details, and there was a lot to get my head around at first. This is a short summary of what I’ve learnt (or, at least, what I think I’ve learnt).

First things first: it’s a Bad Idea to make code public that doesn’t have a licence. You will be legally responsible, theoretically, for any bad things that happen resulting from others using that code. Secondly, it’s Not Possible to release code “into the public domain”, although many people claim to do just that in an attempt to sidestep their copyright responsibilities. Copyright is automatically assigned and it’s legally murky ground to attempt to get around that (and how successful you will be in that attempt varies from country to country).

It’s easy to say “well, my code will never be used by anyone else anyway, so it doesn’t matter if I don’t release it with a copyright licence” but that’s a little short-sighted. It wouldn’t be public if you didn’t think that anyone would find it useful, and if someone wants to re-use what you’ve written, the absence of a licence will prevent them from doing so, even if you’d like them to in principle. Furthermore, the absence of a warranty disclaimer (again, theoretically) could get you in hot water if things turn out poorly due to an error on your part. So free code must be licensed.

The question is then “which licence to use?”. You wouldn’t think this would be such a problem, but there’re heaps to choose from and many of them are quite similar. Making a good choice without knowing the details is more a matter of luck than anything else. Over at Google Code Project Hosting, they’re trying really hard to restrict the number of open source licences around by only offering a small number of choices for the projects they host; a laudable goal. And yet their list is still eight deep. Even if you want people to use your code essentially without restriction, there are three to choose from: the BSD, MIT, & Apache licences. Which to choose even in this simple case? I’ll discuss their differences five paragraphs hence.

There are three broad classes of open source licence that can be summed up by three specific “best practice” ones: the GNU General Public Licence (GPL), the Lesser GPL (LGPL), and the Apache Licence. The GPL is probably the best known and most popular free software licence: it requires that the work be distributed with its source code and stipulates that derivative works also follow the GPL. This ensures freedom at all costs, at the expense of flexibility; you’ll never see GPL code turn up inside proprietary products (illegal exceptions notwithstanding).

The LGPL was written to allow proprietary software to use the functionality of GPL-like free software without having to open the entire product. A library under the LGPL can be used in a closed product without having to open the source for the whole project. I won’t really consider this class of licence too much here (the Mozilla Public Licence is similar). Suffice it to say that it’s a slightly more liberal licence than the GPL for certain types of software.

Finally, the Apache licence is a model example of a licence that lets you do pretty much anything you like with the code. Not only is the code free, but it can be re-used wherever you like, under whatever licence you like. There’s an obvious tension between a “copyleft” licence like the GPL and an Apache-like licence: for the former the code is free and will always be free; for the latter, the code is free but someone might take it, improve it, and lock it up, which doesn’t help you any but you do allow it.

I’m in the Apache licence camp more than GPL: I’d rather my code be maximally useful to as many people as possible than restrict its use in order to ensure that it will “always be free”. Of course, if everyone used the GPL then that wouldn’t matter, but that’s simply not going to happen. I might change my tune if my code were more directly usable in commercial products, however. I can certainly see the idealistic appeal of the GPL. (While I’m on the matter, the GPL recently had some major changes made for v3.0, and it’s apparently rather controversial. I don’t understand the whole matter at this stage so I’ll leave the intricacies of this licence for another time.)


If you don’t want to choose the GPL for similar reasons to me, let’s revisit the question “which licence to choose?” and discuss the differences between the various (popular) Apache-like licences. The distinctions are subtle but there are valid reasons for choosing between them. As mentioned, the big three are the BSD, MIT, and Apache licences, where the latter is a later and more formal extension of the ideas in the other two.

The MIT licence is the simplest: you can do whatever you like with the code (distribute, sell, modify, relicense), provided that “The [ … ] copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.” Even the text of the licence itself can be changed.

The BSD licence adds one condition on top: “Neither the name of the [organization] nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.” Sounds sensible to me.

The Apache licence is the one that I’ve been implicitly endorsing by using it as the “best case” example at the beginning for these “no restriction” free licences. I’m really taking a cue from Greg Stein of Google who says:

That is one of the reasons that Google chooses the Apache License (2.0) as the default for the software it open-sources. It is permissive like BSD, but (unlike BSD) actually happens to mention the rights under copyright law and gives you a license under those rights. In other words, it actually knows what it is doing unlike some of the other permissive licenses.

(Not necessarily an unbiased comment, I have to admit; he’s also Chairman of the Apache Software Foundation.) The additional terms in the Apache licence (over BSD & MIT) require changes made in modified works to be prominently marked as such. I like to think of such measures as “enforced politeness”; it’s not like people won’t be doing this in general anyway. I believe that the Apache licence itself cannot be altered, but I don’t actually know for sure.


Finally, I got into all of this because of the various bits and pieces of LaTeX code that I have written. They’re licensed under the LaTeX Project Public Licence (LPPL), which is different again to those I’ve already discussed above. It’s pretty interesting, and I think it deserves a little attention. (Disclaimer for the link above: at time of writing some of that Wikipedia page was written by me.)

Because LaTeX code almost always defines a document syntax (it’s a programming language of communication, essentially), it’s pretty important that things don’t change meaning without warning. I want a document that is typeset on my machine to be exactly the same on your machine under reasonably similar circumstances. While LaTeX is free to modify and distribute, its licence doesn’t allow people to take the code and alter it without potential users knowing that it’s not canonical. This follows the original licence of TeX itself, probably the earliest piece of free software still in use. (According to Wikipedia, Emacs was first released in 1984; development on TeX started in 1977 but the version most similar to the one we know today was released in 1982.)

To try and formalise TeX’s licence, the LPPL allows modification and distribution only under the proviso that the user is made well aware that they’re using a modified version of the work. This is usually done with a change in name, but technically speaking minimal conformance could be achieved (though it would be strongly frowned upon) simply by printing out a message on the console stating that the package you’ve loaded isn’t the original version. A good example is a conference proceedings document class, for which you certainly don’t want someone changing the margins or fonts without calling it something different!
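
To make that concrete, here is a minimal sketch of what a modified package might look like if it did both things: changed its name and announced itself on the console. The package names (“mypackage” and “mypackage-modified”) and the version string are hypothetical, invented purely for illustration, and this is only my reading of what the licence asks for, not an official recipe.

```latex
% mypackage-modified.sty
% Hypothetical modified copy of a hypothetical package called "mypackage".
\NeedsTeXFormat{LaTeX2e}

% The new name and the description both make clear this is not the original.
\ProvidesPackage{mypackage-modified}
  [2007/09/28 v1.0m Modified version of mypackage (not the original)]

% Print a message to the console and the log file every time it is loaded.
\PackageWarningNoLine{mypackage-modified}{%
  This is a modified version of `mypackage',\MessageBreak
  not the original distributed by its maintainer}

% ... the modified code itself would follow here ...
\endinput
```

A document would then have to ask for the modified version explicitly with \usepackage{mypackage-modified}, so nobody ends up with it by accident.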

So if only the copyright holder is allowed to make changes to the code without changing the name of the package, what happens if the original author loses interest in or can no longer work on the project? The LPPL also defines the concept of a project “maintainer”, who may make public changes to the work with the authority of the copyright holder. You can become a maintainer of a project either by being bestowed the title or (when the previous maintainer cannot be contacted) by announcing publicly your intent to start maintaining the code; maintainership falls to you after three months if your claim is uncontested.
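
In practice the maintenance status is declared in a comment block at the top of each file. The sketch below follows, as best I recall it, the notice suggested in the text of the LPPL itself; the author name and file names are placeholders I’ve made up, and the exact wording should be checked against the licence before use.

```latex
%% mypackage.sty
%% Copyright 2007 A. U. Thor   (placeholder name)
%
% This work may be distributed and/or modified under the
% conditions of the LaTeX Project Public License, either version 1.3
% of this license or (at your option) any later version.
% The latest version of this license is in
%   http://www.latex-project.org/lppl.txt
% and version 1.3 or later is part of all distributions of LaTeX
% version 2005/12/01 or later.
%
% This work has the LPPL maintenance status `maintained'.
%
% The Current Maintainer of this work is A. U. Thor.
%
% This work consists of the files mypackage.dtx and mypackage.ins
% and the derived file mypackage.sty.
```

If maintainership changes hands, it is the “Current Maintainer” line that gets updated; the copyright line stays with the original author.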

None of this changes the problem of ensuring backwards compatibility in packages, but it goes a long way to ensure that documents remain portable into the foreseeable future. This is a laudable goal when compared to the philosophy of “closed source” document programs like WordPerfect or Microsoft Word, whose old files are sometimes now unreadable.


Now, in my explanations above I have omitted many specifics in order to try and get the ideas across about the licences I was talking about. Diving too deep into the legalese makes it impossible to get a broad picture of each licence to be able to compare them. Obviously, I am not a lawyer and my terminology could be improved but I hope that I got the gist across. (Also, I hope that I’ve understood it correctly myself!)

I’m using the British spelling for licence and license here (for noun and verb respectively; cf. practice & practise — I remember these rules by the mnemonic “ice is a noun”). When I talk about licences above, I’m referring to their current versions: 3-clause BSD, 2-clause MIT, LPPL v1.3c, Apache v2. One day I might understand the difference between GPL v2 and v3, but not at the moment.