Open source software is broken, and it's not getting better

September 14th, 2024

I've been trying to write this blog post for six months. I started writing it back in March of 2024. This is the 7th iteration. It's been really difficult for me to find the right words to express my feelings about the current state of affairs of Open Source Software (OSS), because it's such a delicate topic.

Open source software is broken.

I'm pretty sure it's not the first time you've read this or a similar statement. You have probably also seen that now-famous meme of the overworked OSS dev, represented as a small, thin stick which supports the weight of an entire software ecosystem. This has always been the reality of open source software. I won't even go into the countless examples of consumers of open source software directly contacting maintainers with support and enhancement demands. Yet, despite all of the negatives, OSS remained the holy grail for developers who are passionate about their craft - the pros almost always outweighed the cons.

When people ask me what it is that attracts me to OSS, I give these three reasons:

  1. Popular (not all!) OSS repos tend to have a very high bar for contributions. You get to work with some of the brightest engineers, learn and apply best practices and, if the stars align, use new technologies.
  2. The code you write is used by thousands of individuals and companies. Real impact from day one. This is obviously not true for all OSS projects, but the assumption is that you are contributing to (or better yet - authoring) a project which is widely used.
  3. You gain a lot of credibility and visibility by contributing to, and being associated with, well-known OSS projects.

You never go into open source for the money

Because there is none. There were a few attempts by various companies (remember Open Collective?) but nothing really serious enough to make a dent. The current model which seems to work for some companies is the "Open core, paid services" model. The core of the product is developed out in the open and is free for anyone to use, but premium features and maintainer support require a paid contract. What does that leave, then? The satisfaction of knowing - and having others know - that you contributed to this code, authored that package or fixed that bug.

LLMs complicated things further

Let me start by saying that LLMs are, in aggregate, a net positive addition to the software development ecosystem. I occasionally use them myself. However, the biggest problem I personally have with LLMs is that they are trained on this vast ocean of code - sometimes with complete disregard for the license of the code they are trained on. Most OSS licenses require whoever is making use of the code to give credit to the original author(s). LLMs don't do that. They are not built in a way which allows for attribution - the whole idea of predicting the next token based on the previous tokens doesn't really account for the concept of "standing on the shoulders of giants". As far as the LLMs are concerned, they are the authors of the code they generate.

Sure, there are ways to improve the situation by using some form of retrieval-augmented generation (RAG), but that is akin to putting lipstick on a pig. The fundamental problem remains - LLMs are not built to respect the licenses of the code they are trained on. Commercial LLMs also pose a moral dilemma - private companies are making money off code they didn't write, and the original authors are not getting any compensation. This has implications.

We have switched from open by default to closed by default

Let's say you developed a library which you know would solve a real problem for many users. Would you open source it? A decade ago, my answer to that question was a resounding YES! Today, I'm leaning no. Recall my observation that you don't go into open source for the money, but you do go into it for the recognition. If you open source your library, there is no guarantee that the same code you wrote won't be regurgitated by an LLM in Sally's VS Code, and she won't even know that you wrote it. You lose the recognition, and Sally gets the credit.

It's not only code where the default has changed; data access is also being closed off behind paid APIs and paywalls, for very similar reasons.

Maybe it's a good thing

Taking off my software engineer hat and putting on my economist hat for a moment: LLMs democratize access to software engineering. I do agree with the sentiment that a good LLM is akin to a technical co-founder. And that's a very appealing proposition for entrepreneurs who, for the first time in history, can conjure minimum viable products in days - and without depending on someone to write the code for them.

But what about OSS?

I think we'll need to evolve the OSS model to account for the presence of LLMs. I don't know what that would look like, but if it means that we will just need to give less away for free, then so be it - maybe that's what OSS needed to begin with in order to remain sustainable.