I’ve been trying to decide how I feel about the rise of open-source libraries and frameworks at the expense of enterprise frameworks.
Before I dig in, I want to define some terms, at least as I understand them. To my mind, .NET Core is commercial because it is backed by a corporation, and a framework because it does many things. Perhaps it is not fully commercial, as it’s somewhat removed from Microsoft’s core business of selling Windows (compared to, say, Active Directory or the Win32 API) and I don’t know who pays the developers, but it still comes across as a corporate offering. By comparison, most NodeJS libraries I’ve used are maintained by one or a few developers. At best they might have been side projects at a company, but typically they seemed unrelated to any enterprise, so I’d class them as non-commercial; and as their function tended to be quite specific (e.g. an XML parser or an OpenID Connect client), I’d call them libraries. As with most things, these are not binary choices: commercial vs. non-commercial and framework vs. library are continua, with projects falling everywhere between the respective endpoints.
My transition to open source libraries
The bulk of my experience has been in the Microsoft ecosystem, but only in the last year have I started working with open source offerings, notably ASP.NET MVC 5, and I’m now firmly ensconced in .NET Core. In that year I was also involved in my first NodeJS back-end development.
Before this lack of open-source exposure casts me as some kind of Luddite, mollycoddled in the Microsoft way, two caveats. First, I’m referring to my business experience, where ‘time is money’, not the tools I’ve played with in personal projects. Second, I’ve certainly used open source libraries and frameworks as a web developer – the main ones were PrototypeJs, YUI 2, jQuery, and ExtJS (before and after it became commercial). There were also plenty of small libraries used to fulfill specific web and back-end objectives – at one point I had to list them during a due diligence exercise and I’m pretty sure we got into the 30s. However, the bulk of my development time has been spent coding against frameworks and libraries that were either commercial (closed and open source, and usually free-of-cost) or very mature.
Thus in the last year I have gone from coding against predominantly mature open source or closed source commercial frameworks to coding against a wide mix of small and large open source frameworks and libraries, and I’ve often found this transition detrimental to my goal of building an application to meet a business need. That brings the introduction to the purpose of this post: to elaborate on my thoughts about the consequences of open source for building software products.
My negative experiences with open source libraries
The area where NodeJS differed most starkly from my previous experience was that many of the functions needed to make a basic application required an external library. The most memorable example was finding an XML parsing and manipulation library. I don’t recall how many libraries I tried, but ultimately none of them offered a holistic solution for XML in the way DOMParser does. Looking back, I don’t recall which ones we tried and why they didn’t work (possibly TypeScript-related at times), or even which one we eventually settled on; I just remember it being an annoying process that took us away from actually building a product. And I know NodeJS is all about JSON as a fundamental data structure, but XML is still everywhere and has to be consumed and produced all the time, so for a major environment to be without a sanctioned and reliable XML manipulator was, well, a culture shock.
The XML library experience illustrates one common characteristic of open source libraries: they tend to implement only some subset of a specification or expectation.
The challenge then is to know which parts of the specification are implemented and how correctly those parts are implemented. In some cases the library wiki or documentation provides a guide on what it does or doesn’t cover, and in some cases a suite of tests hints at correctness. Ultimately, the only reliable way to learn whether the library will do the job is to code against it and test the result.
I found this out the hard way recently. After following the documentation and struggling to understand why a key function didn’t work, I got the source code of the library, and managed to step through it to discover the functionality simply hadn’t been implemented. I also eventually found a vaguely related GitHub issue confirming that. That was nearly a day wasted which could have been saved by a short list on the GitHub landing page saying ‘this library supports these 4 (of only 5) major functions’.
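One way to shorten that kind of discovery is to write a handful of throwaway probes before committing to a library. What follows is only a minimal sketch in TypeScript: `candidateXmlLib` is a hypothetical stand-in invented for illustration (it is not a real package), and the probe names and checks are assumptions about what an application might need. Each probe exercises one capability and records whether it worked, returned the wrong answer, or threw.

```typescript
// One probe exercises a single capability and reports whether the result was right.
type FeatureProbe = { feature: string; check: () => boolean };
type ProbeResult = { feature: string; outcome: "supported" | "wrong result" | "throws" };

// Run every probe, catching exceptions so one missing feature does not hide the rest.
function probeLibrary(probes: FeatureProbe[]): ProbeResult[] {
  return probes.map((probe): ProbeResult => {
    try {
      return { feature: probe.feature, outcome: probe.check() ? "supported" : "wrong result" };
    } catch {
      return { feature: probe.feature, outcome: "throws" };
    }
  });
}

// Hypothetical stand-in for the library being evaluated; not a real package.
type XmlNode = { tag: string; text: string };
const candidateXmlLib = {
  // Handles only a single element with text content and no attributes.
  parse(xml: string): XmlNode {
    const match = /^<(\w+)>([^<]*)<\/\1>$/.exec(xml);
    if (match === null) throw new Error("unsupported input");
    return { tag: match[1] ?? "", text: match[2] ?? "" };
  },
  serialize(node: XmlNode): string {
    return `<${node.tag}>${node.text}</${node.tag}>`;
  },
  // Present in the API but never implemented, like the function in the story above.
  query(_xml: string, _path: string): string[] {
    throw new Error("not implemented");
  },
};

// Assumed requirements for an imaginary application; adjust to the real use case.
const xmlProbes: FeatureProbe[] = [
  { feature: "parse element", check: () => candidateXmlLib.parse("<a>hi</a>").text === "hi" },
  {
    feature: "round trip",
    check: () => candidateXmlLib.serialize(candidateXmlLib.parse("<a>hi</a>")) === "<a>hi</a>",
  },
  { feature: "attributes", check: () => candidateXmlLib.parse('<a id="1">hi</a>').tag === "a" },
  { feature: "path query", check: () => candidateXmlLib.query("<a>hi</a>", "/a").length === 1 },
];

const probeResults = probeLibrary(xmlProbes);
for (const result of probeResults) {
  console.log(`${result.feature}: ${result.outcome}`);
}
```

The printed list is roughly the ‘supports these functions’ summary I wished the landing page had carried. It proves nothing about correctness beyond the cases probed, but it surfaces an unimplemented feature in minutes rather than after a day of debugging.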
To be fair, this is not unique to open source. I recall with anguish the peril of straying off the beaten path with several mature or commercial libraries, where things the library felt like it should be able to do became prohibitively complex.
My biggest gripe with open source libraries is that their documentation tends to be somewhere between poor and acceptable only for the primary use case. This is completely rational – if the contributors are primarily developers then their most effective contribution is to develop. As a result, there seems to be an acceptance that developers using the library will mostly have to help themselves via extensive searching or finding a public forum, like Stack Overflow, to get questions answered. This can be very time-consuming (especially when time-zones don’t match up) and again detracts from building business value.
A paid library, by contrast, typically comes with support, and because it is in the best interests of the company to minimize expensive human support time, it tends to provide excellent documentation, forums, and other ways for people to help themselves easily.
I have to say that I’ve worked in the technical writing industry, and there is a substantial difference between what developers and good technical writers produce as documentation. Technical writers have an understanding of how people learn and come at concepts from multiple angles, and can be systematic about identifying what is and isn’t covered.
The framework that illustrates this point most effectively at present is .NET Core. On the surface it looks like there is significant documentation, but compared to what MSDN provides for the .NET Framework, a great deal is missing: namespaces are often not shown (I curse at the amount of time I spend tracking them down); API references lack examples of both class and property use; inheriting classes do not list their inherited properties; cross-referencing between concept topics and API topics is poor; and concept topics are shallow.
It’s entirely possible Microsoft has been paying technical writers to do this and I am therefore perhaps criticizing their investment levels rather than the style of content, in which case it is a problem of commercial priorities rather than open source in general.
Speaking as a developer, creating new functionality is fun. Fixing bugs is not fun, nor is writing extensive automated tests, or localization. And if you’re a great developer but struggle with written communication, then taking time to document the library or to support its users in writing seems like a really poor use of your time. So given a choice between expanding the library, and perhaps gaining the pride of greater adoption, or making minor fixes, what is the rational choice?
This is the natural consequence of removing both the support of a wider organization, with its customer support, documentation, and QA specialists, and the commercial incentives to meet paying customers’ needs. It is much easier to ignore a tricky issue if no-one is paying for it.
Let me be clear that I’m not denigrating developers here – most developers I’ve met have a strong sense of pride in their work and will do their best to provide quality and capability, but ultimately are limited in the time and desire they have available.
And again, this problem isn’t unique to open source. Companies make the same trade-offs all the time, often to their paying customers’ ire, and can get away with it because it costs the customer too much to switch away from them.
But Open Source == Freedom, Right?
Having cast aspersions on open-source libraries for several paragraphs, it is time to throw out some of the positives.
Top of my list of the benefits of open source is that the initial barriers to entry have basically evaporated. Do you have a computer and can code? Then you can solve someone’s problems without it costing you more than your labor plus the electricity to run the computer.
I’m careful to say initial here, because the concerns above are certainly barriers in themselves, but they tend not to strike early on in development because we usually start out following the paradigm of a given library, and only when we stretch away from its core capabilities do we encounter some of the aforementioned problems.
Unless the library in question is dead (i.e. no longer being maintained), I’ve found that issues generally get fixed faster than they do in commercial offerings. This may be because smaller teams are more nimble, or because open source developers are often top-shelf developers adhering to many of the practices that enable fast turnaround, like good test coverage and continuous integration. Companies tend to be less responsive because they have greater inertia, which comes from the time cost in organizing across departments as well as teams. Some of that inertia is in providing things like the documentation or localization, so being responsive does come at a price.
With open source libraries you are not dependent on a vendor’s whims to get issues resolved. Instead there is the option to download the source and step through it to figure out what is going wrong, potentially fix the problem, and submit the fix back to the library so future versions will include it (having your own branch of a library is not desirable).
With the source code it is also possible to deeply understand what the library is doing and better understand how it was designed to be worked with. Source code is the ultimate documentation, and it is even better if it comes with a decent test suite.
But all this comes with a price – time. Trying to read and understand unfamiliar source code is a complicated and time-consuming activity, and a developer unfamiliar with the code may take orders of magnitude longer to fix a problem than one who knows it.
I didn’t come through this with an agenda. The negatives are longer than the positives simply because it is easier to find things to complain about than to itemize what works. I’ve had some of these thoughts for a while and wanted to put them all down and think about them.
I think, in summary, that as an industry we’ve decided to trade time for money. Instead of paying for tools that are warranted to do the job by their vendors, we go free, but spend more time figuring out how to use the tools because the documentation is limited and the implementation is less complete than what that documentation might lead people to expect.
The first resulting question is, is this a good use of our time (i.e. company money)? Developers are expensive. Having developers diverted from their business mission because of tool challenges could be considered wasteful, or it could be considered the cost of doing business.
The next question is, is this what we (developers) want to be doing? Sometimes the answer is yes – we want to be continually learning or on the cusp of new technology; but sometimes it is no – we simply have a job to get done. What would be more useful is better ways of telling which libraries are good and what they are good at. Obviously better documentation would help, but aggregators that work on networks and statistics are also very useful. For instance, the download counts in NuGet or npm, the scoring in npmsearch, or the Google rank tell us about the adoption of a library, which is assumed to correlate with its value. The downside of putting too much emphasis on scoring tools is that it solidifies the status quo and therefore limits innovation. Is accidentally becoming early adopters, and getting angry with new libraries, an acceptable price for the industry to pay to allow innovation to prosper?
And finally, have I identified the wrong problem? Much of what I’ve noted is also a feature of many less mature or less widely used commercial libraries. Is what I’ve observed actually the consequence of my recent career transition which could also be described as from mature and conservative frameworks to newer and less tested ones? For instance, what would comparing the documentation between .NET Framework 1.1 and .NET Core 1.1 be like?
I’ve chosen to end with questions because I don’t have the answers. There are many trade-offs and different people in different circumstances will have different valid answers. There are undoubtedly frustrations with open source libraries, just as there are in commercial ones. There are also improvements that can be made to the open source ecosystem, like encouraging the involvement (through pride) of technical writers and QA experts to improve the quality of what we consume.