I’ve been working towards hosting a website for my musical compositions, and one thing I wanted to do was add text and images to my PDFs to indicate that the music is a preview, i.e. a watermark.
There are a great many existing PDF libraries out there, but I opted to build something myself. In a business setting this would be a poor economic decision smacking of severe NIH syndrome. However, this was for a personal project, so cost was a factor, and the existing solutions for .NET either come at considerable cost (which I can understand, having now spent time with the spec), are of hard-to-judge quality, or are ports from other languages that don’t take advantage of .NET features. Finally, it has been quite some time since I wrote a lexer and parser, so it was a nice exercise.
The library, in its current state, is available on GitHub. There is no NuGet package so far, so using it requires cloning the repo and then following one of the examples in it. The classes focus on loading and saving PDFs and on working with the objects found directly in the document. When it comes to manipulating the contents of a page, the user must (at present) understand the format being used, i.e. sections 8 (Graphics) and 9 (Text) of the PDF 1.7 spec.
Taking a first look at the PDF format was quite interesting. Its syntax is based on PostScript, so for instance dictionaries are surrounded by double angle brackets (<< and >>). It structures items as objects, which can either be referenced from, or embedded directly in, the objects that use them. Binary objects, like images, are typically stored within compressed streams.
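To make that structure a little more concrete, here is a rough sketch in Python (not the .NET library's API, just an illustration) that writes out a stream object and a dictionary that refers to it. The function names, the object numbers and the watermark text are all made up for the example, the page dictionary is deliberately incomplete, and the output is a pair of fragments rather than a complete, valid PDF file.

```python
import zlib


def make_stream_object(obj_num: int, data: bytes) -> bytes:
    """Serialize data as an indirect stream object, compressed with Flate.

    Hypothetical helper for illustration only.
    """
    compressed = zlib.compress(data)
    header = (
        f"{obj_num} 0 obj\n"
        # The stream's dictionary sits between << and >>.
        f"<< /Length {len(compressed)} /Filter /FlateDecode >>\n"
        "stream\n"
    ).encode("ascii")
    return header + compressed + b"\nendstream\nendobj\n"


def make_page_dict(contents_obj_num: int, font_obj_num: int) -> bytes:
    """Build a simplified page dictionary (required entries are omitted).

    /Contents points at another object by reference ("4 0 R"), while the
    /Resources dictionary is embedded directly inside the page dictionary.
    """
    return (
        f"<< /Type /Page "
        f"/Contents {contents_obj_num} 0 R "
        f"/Resources << /Font << /F1 {font_obj_num} 0 R >> >> >>"
    ).encode("ascii")


# Page content operators that draw the word PREVIEW (assumed example text).
content = b"BT /F1 48 Tf 100 400 Td (PREVIEW) Tj ET"
stream_obj = make_stream_object(4, content)
page = make_page_dict(contents_obj_num=4, font_obj_num=5)

print(page.decode("ascii"))
print(stream_obj[:60])
```

The content string passed in is the kind of thing sections 8 and 9 of the spec describe: operators that select a font, move to a position and show text.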
I look forward to putting this library into practice, and maybe it will prove useful to other people too.