Leadership

This is something of a digression from previous ‘how-to’ posts. Instead I’ve felt motivated to share my perspective on leadership, an issue that affects organizations of all sizes and kinds, from parenting through to corporations and government.

What is Leadership

I’ve often been struck by the distance between leadership as it is defined in management texts and how it is executed. My trusty management textbook places leadership as the fourth pillar of management in the section “Leading – To Inspire Effort”, and defines leadership as “the process of inspiring others to work hard to accomplish important tasks” (Schermerhorn 2001, p262). This is a fairly open definition that could include anything from managing an empowered self-managing team through to slavery. To be fair to the author, it is followed by six chapters expanding on the subject.

The contents of that text are based on the outcome of decades of research and analysis. In general, research seeks to simplify the thing under study as much as possible – to be the crucible that burns away insignificance and leaves us with the key factors that impact something. I believe concise leadership contingency models like Hersey-Blanchard and its three-dimensional matrix of relationship and task behavior and follower readiness illustrate how complicated systems can be abstracted to their significant details, and that such models are critical for illuminating various facets of management and preparing managers to handle the many different people and situations they may encounter (and to be clear: leaders are managers. If you are inspiring people then you are managing them).

What scientific management seems to cover less (or at least less so in introductory textbooks) is that people are.. well.. people. They aren’t ‘rational agents’ conforming to the box neatly defined by research, and they have – insert-deity-here forbid – feelings! People are squishy and unpredictable, and frankly if you’re in a management position and don’t think that I’m stating the bleeding obvious, then you need to find another job. The literature on this side of management tends to be more anecdotal, but also easy to empathize with regardless of which side of the managing/managed fence you fall on.

Theory vs Practice

So now I will add my anecdotes through some hypothesizing. I’m told (by Wikipedia) that around 1% of the population suffers from the most extreme form of narcissism, and it is my contention that these people tend to cluster in leadership roles. The very nature of “knowing you’re right” and projecting that confidence (however un-examined it may be) creates the vision that management theory looks for. It also creates the environment followers need in order to feel a sense of fulfillment – after all, in our comfortable post-Maslow world we need to make a difference to find satisfaction, and what better way than fulfilling a vision to ‘achieve great change/improvement/innovation/etc’. The people who espouse this confidence are also lauded by their superiors, who naturally prefer supposed simplicity over the complex reality of the situation, and thus these people tend to rise into positions of power. Unfortunately people who ‘know’ they’re right also tend to be extremely resistant to anything that challenges their perspective. Such a conflict can be very personal and highly destructive given that any challenge is perceived as a threat to that person’s self-image or core values.

In practice the leader’s vision tends to be skewed towards their own goals, and while organizational alignment is usually covered by at least lip service, the goals tend to be angled towards their individual needs, whether for career progression (who hires the manager who thought the status quo was working great and opted not to change anything?) or a psychological need (e.g. admiration, entitlement).

This is the point where I start to struggle with these people. I believe I’m experienced enough to be positive about people and work with them to foster the goals of the relevant organization, but my natural desire for analysis means that over time I tend to find concerning dissonance in their positions. Where I’m not experienced enough, or perhaps just disinclined to submit to this aspect of culture, is that I will point out that dissonance, and in doing so create the conflict.

The world of management theory tells us to be transparent with problems because organizations can’t fix problems they can’t see, and it tells us managers that a moderate level of conflict is good (too little means people have stopped caring and are probably looking for other jobs). What it doesn’t tell us is that some of the time the manager is going to see that as a personal threat, or they’re going to ignore it and place you in the ‘whiner’ box, because these managers aren’t entirely cut out for their jobs, but there is seldom any way to observe this problem and correct it in an organization. Studies strongly indicate that the most significant factor in employee retention is their immediate manager, and yet dysfunction in that relationship is often invisible to the organization until it is too late.

We know what’s good for us, but…

Perhaps the most scientific expression of this I’ve run into is in the book Good to Great by Jim Collins. Chapter 3 very clearly summarizes that the best leaders have the opposite traits to narcissists. They are modest and under-stated, are diligent and workman-like, and they give credit where it is due but shield others from failure (Collins 2001, p39). This doesn’t stop them having a firm vision and a strong will to achieve it, but they do so by getting the right people and getting them to buy into the vision and steer the organization toward it, and expecting they’ll do the same at the next level of the organization. It is a positive and virtuous cycle if achieved.

The literature also highlights how the salary and performance of top leaders correlate negatively. And yet this need for ‘leadership’, for the self-fulfilment and simplicity reasons I highlighted earlier, means these leaders, who are by all accounts bad at their jobs, continue to be highly rewarded – and probably more so than their less confident peers, given that their heightened sense of self-worth likely translates into salary expectations.

I doubt any of this is new. Much has been written about how “ignorance more frequently begets confidence than does knowledge” (Darwin). What remains surprising, or perhaps depressing, is that for all we’ve learned about scientific management and about people and behaviors, we still reward sub-optimal behavior. Put another way, society seems to revere leaders who overestimate and under-deliver, and who are comfortable treating us as disposable minions to be crushed on the path to their own glory. And that doesn’t seem like progress.

Profiling .NET Core

Some of my application requests are running slowly and I need an overview of what is taking so long, so I turned to the internet to find a profiler. With .NET Core still relatively new I expected that finding mature profilers would be challenging. In addition .NET (in general) has thrown a curve-ball in the direction of profilers in the last few years with the use of async.

Calling await in a method causes the compiler to generate a state machine that splits the function up into many different parts and fills stack traces with MoveNext() functions. To be useful, a profiler needs to link these pieces of state machine – which I believe could be running on different threads – back together so the developer can understand what it is waiting for.
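
As a minimal sketch of why this matters (the class and method names below are illustrative, not from my application), an awaited call like the one here shows up in traces as frames such as MyService+&lt;GetLengthAsync&gt;d__1.MoveNext() rather than GetLengthAsync itself, and the code after the await may resume on a different thread:

using System.Net.Http;
using System.Threading.Tasks;

public class MyService
{
    private readonly HttpClient _client = new HttpClient();

    // The compiler rewrites this method into a hidden state-machine type whose
    // MoveNext() method is what actually appears in stack traces and profiles.
    public async Task<int> GetLengthAsync(string url)
    {
        var response = await _client.GetAsync(url);          // state machine suspends here
        var body = await response.Content.ReadAsStringAsync();
        return body.Length;                                   // may resume on another thread
    }
}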

The Field

The only profiler that seemed to handle async was ANTS Performance 9.6. I initially found its results quite counter-intuitive until I changed the timing options drop-down to wall-clock time. Then it became much clearer from the call tree where the delays were. However it didn’t seem to load the source code despite PDB files being in place, and it was also the most expensive tool I evaluated.

The best free tool, in my opinion, was CodeTrack which provides a reasonable timeline view to enable navigation of the calls, but doesn’t have any in-built async handling.

A similar function was provided by dotTrace 2017.2 (EAP3). dotTrace also seems to be able to handle a few async cases, combining calls from the same source with async or cont, but for most cases it didn’t link them together.

There are also lightweight profilers, intended more for monitoring. MiniProfiler seems tailored for full MVC apps, and I couldn’t get it to produce output in my view-less API project. Prefix didn’t seem to work at all, as noted by other commenters on their website, which may be related to my using Core 1.1.

Finally, I should note that I do not have Visual Studio 2017, so I don’t know what its profiler is like.

.Net Core Serializing File and Objects

For one of my API methods I wanted to send a file as well as object data. This is straight-forward enough when the object data consists of value types: the front end adds key-value-pairs to a FormData object, including the File object as one of the values; and the .NET Core back-end model object includes an IFormFile. e.g.

// JavaScript client
let data = new FormData();       
data.append("file", file);
data.append("id", "44b6...");
return this.httpClient.fetch(`...`, { method: 'post', body: data });
// C# Model
public class MyObj {
    public Microsoft.AspNetCore.Http.IFormFile File { get; set; }
    public Guid Id { get; set; }
}
// C# Controller Method
[HttpPost]
public async Task<IActionResult> Post(MyObj request) { ... }

However this approach fails if the model includes objects as in the following case where Numbers will be null.

public class MyObj {
    public Microsoft.AspNetCore.Http.IFormFile File { get; set; }
    public Guid Id { get; set; }
    public List<int> Numbers { get; set; }
}

At this point the model deserialization in .NET Core and the serialization done in JavaScript don’t match. However I found the commonly suggested techniques to be somewhat over-complicated. My impression is the ‘right’ approach is to use a custom model binder. This seemed nice enough, but it quickly got into the details of creating and configuring value binders, when I really just wanted to use some built-in ones for handling lists.
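
For contrast, here is a rough sketch of what that model-binder route starts to look like (hedged: the binder name and target type are illustrative, and a real solution also needs a binder provider or [ModelBinder] attributes to wire it up):

using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc.ModelBinding;
using Newtonsoft.Json;

// Illustrative only: a binder that JSON-deserializes a single form field into a List<int>.
public class IntListModelBinder : IModelBinder
{
    public Task BindModelAsync(ModelBindingContext bindingContext)
    {
        var valueResult = bindingContext.ValueProvider.GetValue(bindingContext.ModelName);
        if (valueResult == ValueProviderResult.None)
            return Task.CompletedTask;

        // Parse the raw form value and hand the result back to MVC
        var list = JsonConvert.DeserializeObject<List<int>>(valueResult.FirstValue);
        bindingContext.Result = ModelBindingResult.Success(list);
        return Task.CompletedTask;
    }
}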

In the end I went with a different, perhaps less flexible or DRY, but vastly simpler approach: creating objects that shadowed the real object and whose get/set did the serialization.

public class ControllerMyObj : MyObj {
    // Shadow the List<int> property with a string so model binding receives plain text,
    // then (de)serialize to and from the base property.
    public new string Numbers {
        get {
            return base.Numbers == null ? null : Newtonsoft.Json.JsonConvert.SerializeObject(base.Numbers);
        }
        set {
            // deserialize the incoming value, not the property itself
            base.Numbers = Newtonsoft.Json.JsonConvert.DeserializeObject<List<int>>(value);
        }
    }
}

// Controller Method
[HttpPost]
public async Task<IActionResult> Post(ControllerMyObj request) { 
   MyObj myObj = request;
   ...
}

And now the front-end needs to be changed to send JSON serialized objects. That can be done specifically by key or using a more generic approach as follows.

let body = new FormData();
Object.keys(data).forEach(key => {
    let value = data[key];
    if (typeof (value) === 'object')
        body.append(key, JSON.stringify(value));
    else
        body.append(key, value);
});
body.append("file", file);
// fetch ...

.NET Core Code Coverage

I haven’t written anything for a while because, frankly, I’m past the platform R&D stages of my current application and just churning out features, and so far I haven’t found much to inspire me to write about. After digging through my code looking at things I’d done, one area I thought might be interesting to readers is getting (free) code coverage in .NET Core.

OpenCover

The open source tool of choice for code coverage seems to be OpenCover. This comes as a nice zip file which can be extracted anywhere. Getting set up for .NET Core was mostly a case of following various instructions online, and there was just one gotcha: the MSBuild DebugType must be full, which is typically not the case for .NET Core where the goal is deployment to multiple operating systems. To get around this my coverage script overwrites the .csproj files before running and puts portable back when it is done.

The script runs the dotnet executable from the assembly folder, meaning the assemblies aren’t directly specified in the script. The graphical output of the coverage is put together using ReportGenerator, which I have deployed inside my report output folder.

Here is a cut-down version of my Powershell script:

Push-Location

# change portable to full for projects
$csprojFiles = gci [Repository-Path] -Recurse -Include *.csproj
$csprojFiles | %{
     (Get-Content $_ | ForEach  { $_ -replace 'portable', 'full' }) | 
     Set-Content $_
}

# Setup filter to exclude classes with no methods
$domainNsToInclude = @("MyNamespace.Auth.*", "MyNamespace.Data.*")
# Combine [assembly] with namespaces
$domainFilter = '+[AssemblyPrefix.Domain]' + ($domainNsToInclude -join ' +[AssemblyPrefix.Domain]')
$filter = "+[AssemblyPrefix.Api]* $domainFilter"

# Integration Test Project
$integrationOutput = "[output-path]\Integration.xml"
cd "D:\Code\RepositoryRoot\test\integration"
dotnet build
[open-cover-path]\OpenCover.Console.exe `
    -register:user `
    -oldStyle `
    "-target:C:\Program Files\dotnet\dotnet.exe" `
    "-targetargs:test" `
    "-filter:$filter" `
    "-output:$integrationOutput" `
    -skipautoprops

# Generate Report
$reportFolder = "[output-path]\ReportFolder"
[report-generator-path]\ReportGenerator.exe `
    "-reports:$integrationOutput" `
    "-targetdir:$reportFolder"

# restore portable in projects
$csprojFiles | %{
     (Get-Content $_ | ForEach  { $_ -replace 'full', 'portable' }) |
     Set-Content $_
}

Pop-Location

The end result after opening the index.html is something like this (looks like I need to work on that branch coverage!):
[screenshot: coverage report generated by ReportGenerator]

ASP.NET Core Authentication

One of the challenges I had early in developing my current project was getting authentication set up nicely. My back-end is an API running in .NET Core, and my general impression is that ASP.NET Core’s support for API use cases is somewhat weaker than for MVC applications.

ASP.NET Core’s default transport for authentication context still seems to be via cookies. This was quite surprising, as my impression of the industry is that, between their complexity (from which it is easy to make security mistakes) and recent EU rules, cookies were on their way out. ASP.NET Core also introduced Identity for authentication, but the use of ViewModels in its examples indicates it is targeted towards MVC applications.

My preference was to use JSON Web Tokens (JWTs) sent as bearer tokens in the authorization header of an HTTP request. I also wanted to use authorization attributes, like [Authorize("PolicyName")], to enforce security policy on the API controllers.
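
For illustration, applying the policy defined later in this post to an API controller looks something like the following (a minimal sketch; the controller name is hypothetical):

using Microsoft.AspNetCore.Authorization;
using Microsoft.AspNetCore.Mvc;

// Hypothetical controller: the "MustBeValidatedUser" policy guards every action on it.
[Authorize("MustBeValidatedUser")]
[Route("api/[controller]")]
public class WidgetsController : Controller
{
    [HttpGet]
    public IActionResult Get() => Ok();
}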

Validation and Authorization

.NET Core has support for validating JWTs via the System.IdentityModel.Tokens.Jwt package. Applying this requires something like the following in the Startup.Configure method:

JwtSecurityTokenHandler.DefaultInboundClaimTypeMap.Clear();
app.UseJwtBearerAuthentication(new JwtBearerOptions()
{
    Authority = Configuration["AuthorityUrl"],
    TokenValidationParameters = new TokenValidationParameters() { ValidateAudience = false },
    RequireHttpsMetadata = true,
    AutomaticAuthenticate = true,
    Events = new JwtBearerEvents { OnTokenValidated = IocContainer.Resolve<Auth.IValidatedTokenHandling>().AddUserClaimsToContext },
});

The recommended approach to authorization in ASP.NET Core is to use claims and policies. To that end the code above responds to the OnTokenValidated event and sends it to a method that queries the user and adds claims based on information about the user.

public async Task AddUserClaimsToContext(TokenValidatedContext context) 
{
    var claims = new List<Claim>();

    // JWT subject is the userid
    var sub = context.Ticket.Principal.FindFirst("sub")?.Value;
    if(sub != null)
    {
        var user = await _users.FindById(Guid.Parse(sub));
        if(user != null)
        {
            if(user.UserVerification > 0)
                claims.Add(new Claim("MustBeValidatedUser", "true", ClaimValueTypes.Boolean));
        }
    }
    var claimsIdentity = context.Ticket.Principal.Identity as ClaimsIdentity;
    claimsIdentity.AddClaims(claims);
}

Finally the policies themselves must be defined, typically in the Startup.ConfigureServices method:

mvc.AddAuthorization(options => {
    options.AddPolicy("MustBeValidatedUser", policy => policy.RequireClaim(Auth.ClaimDefinitions.MustBeValidatedUser, "true"));                   
});

Generating Tokens

.NET Core does not have support for generating JWTs. For this it recommends IdentityServer4.

IdentityServer4 is intended to be a fully fledged authentication server supporting the many flows of OAuth2 and Open ID Connect. For my purposes I only required username and password validation, so in many respects IdentityServer4 was overkill, but given lack of alternatives for generating JWTs, I forged ahead with it anyway.

It is worth noting my solution deviates from the norm. IdentityServer seems predicated on the idea that the authentication service is a standalone server, microservice style. Given the early stage of development I was at, having another server seemed like an annoyance, so I opted to have the authentication service as part of the API server. Really the only problem with this was it obscured the distinction between the ‘client’ (the JWT validation and authorization) and the ‘server’ (IdentityServer4) meaning it perhaps took a little longer than I’d have preferred to understand my authentication and authorization solution.

Using IdentityServer is trivial – one line in Startup.Configure: app.UseIdentityServer();. Set up, even for a basic solution, is a little more complex, and I will admit that to this day I do not fully understand scopes and their implications.

Supporting the server involves defining various resources in Startup. The scopes referenced in the Configure method end up in the scopes field in the JWT payload.

using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.DependencyInjection;
using System.IdentityModel.Tokens.Jwt;
using Microsoft.AspNetCore.Authentication.JwtBearer;
using Microsoft.IdentityModel.Tokens;

public virtual IServiceProvider ConfigureServices(IServiceCollection services)
{
    services.AddIdentityServer()
        .AddInMemoryIdentityResources(Auth.IdentityServerConfig.GetIdentityResources())
        .AddInMemoryApiResources(Auth.IdentityServerConfig.GetApiResources())
        .AddInMemoryClients(Auth.IdentityServerConfig.GetClients())
        .AddTemporarySigningCredential();

    // build and return a provider to satisfy the IServiceProvider return type
    return services.BuildServiceProvider();
}
public void Configure(IApplicationBuilder app, IHostingEnvironment env, ILoggerFactory loggerFactory, IApplicationLifetime appLifetime)
{
    app.UseIdentityServer();
    // Configure authorization in the API to parse and validate JWT bearer tokens
    JwtSecurityTokenHandler.DefaultInboundClaimTypeMap.Clear();
    app.UseJwtBearerAuthentication(GetJwtBearerOptions());
    app.AllowScopes(new[] {
        IdentityServer4.IdentityServerConstants.StandardScopes.OpenId,
        IdentityServer4.IdentityServerConstants.StandardScopes.Profile,
        Auth.IdentityServerConfig.MY_API_SCOPE
    });
}

The configurations referenced in ConfigureServices link to a static class with a similar structure to that from the quick starts.
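
As a hedged sketch of that shape (the resource names, client id, and secret below are illustrative quick-start style values, not my actual configuration):

using System.Collections.Generic;
using IdentityServer4.Models;

namespace Auth
{
    public static class IdentityServerConfig
    {
        public const string MY_API_SCOPE = "my_api";

        public static IEnumerable<IdentityResource> GetIdentityResources() =>
            new List<IdentityResource>
            {
                new IdentityResources.OpenId(),
                new IdentityResources.Profile()
            };

        public static IEnumerable<ApiResource> GetApiResources() =>
            new List<ApiResource> { new ApiResource(MY_API_SCOPE, "My API") };

        public static IEnumerable<Client> GetClients() =>
            new List<Client>
            {
                new Client
                {
                    ClientId = "my-client",                               // illustrative
                    AllowedGrantTypes = GrantTypes.ResourceOwnerPassword, // username/password only
                    ClientSecrets = { new Secret("secret".Sha256()) },    // illustrative
                    AllowedScopes =
                    {
                        IdentityServer4.IdentityServerConstants.StandardScopes.OpenId,
                        IdentityServer4.IdentityServerConstants.StandardScopes.Profile,
                        MY_API_SCOPE
                    }
                }
            };
    }
}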

Testing

The final challenge with this set up was running integration tests with ASP.NET Core’s TestServer. The difficulty was that the authentication process would try to make a web request to the authentication server URL (e.g. http://localhost:5000). However, because TestServer is not a real server listening on a port, no authentication response would be received.

To resolve this an additional option was added to the JwtBearerOptions during Startup only for the integration tests. This class intercepts the authentication request and copies it to the TestServer’s client instance (using a static, which I’m not proud of). This is all illustrated below.

options.BackchannelHttpHandler = new RedirectToTestServerHandler();

using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public class RedirectToTestServerHandler : System.Net.Http.HttpClientHandler
{
    ///<summary>Change URL requests made to the server to use the TestServer.HttpClient rather than a custom one</summary>
    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        HttpRequestMessage copy = new HttpRequestMessage(request.Method, request.RequestUri);
        foreach (var header in request.Headers)
            copy.Headers.Add(header.Key, header.Value);
        copy.Content = request.Content;

        Serilog.Log.Information("Intercepted request to {uri}", request.RequestUri);
        HttpResponseMessage result = TestContext.Instance.Client.SendAsync(copy, cancellationToken).GetAwaiter().GetResult();
        return Task.FromResult(result);
    }
}

What happened to enterprise frameworks?

I’ve been trying to decide how I feel about the rise of open-source libraries and frameworks at the expense of enterprise frameworks.

Definitions

Before I dig in, I want to define some terms, at least as I understand them. To my mind, .NET Core is commercial because it is backed by a corporation and a framework because it does many things. Perhaps it is not fully commercial, as it’s marginally distant from their core business of selling Windows (compared to say Active Directory or the Win32 API) and I don’t know who pays the developers, but it still comes across as being an offering by a corporation. By comparison, most NodeJS libraries I’ve used are maintained by one or a few developers; at best they might have been side projects at a company, but they typically seemed unrelated to any enterprise, so I’d class them as non-commercial, and as their function tended to be quite specific (e.g. an XML parser, Open ID Connect client) I’d call them libraries. As with most things, these are not binary choices: commercial vs. non-commercial and framework vs. library are continua, with projects falling everywhere between the respective endpoints.

My transition to open source libraries

The bulk of my experience has been in the Microsoft ecosystem, but only in the last year have I started working with open source offerings, notably ASP.NET MVC 5 and now I’m very ensconced in .NET Core. In that year I was also involved in my first NodeJS back-end development.

Before this lack of open-source exposure casts me as some kind of Luddite, molly-coddled in the Microsoft way: 1. I’m referring to my business experience, where ‘time is money’, not the tools I’ve played with in personal projects; and 2. I’ve certainly used open source libraries and frameworks as a web developer – the main ones were PrototypeJs, YUI 2, jQuery, and ExtJS (before and after it became commercial). There were also plenty of small libraries used to fulfill specific web and back-end objectives – at one point I had to list them during a due diligence exercise and I’m pretty sure we got into the 30s. However the bulk of my development has been against frameworks and libraries that were either commercial (closed and open source, and usually free-of-cost) or very mature.

Thus in the last year I have gone from coding against predominately mature open source or closed source commercial frameworks to coding against a wide mix of small and large open source frameworks and libraries, and I’ve often found this transition to be detrimental to my goal of building an application to meet a business need. And thus we can conclude the introduction having reached the purpose behind this post: to elaborate on my thoughts about the consequences of open source on building software products.

My negative experiences with open source libraries

The area where NodeJS was starkly different to my previous experience was that many of the functions needed to make a basic application required an external library. The most memorable of these was finding an XML parsing and manipulation library. I don’t recall how many libraries I tried, but ultimately none of them represented a holistic solution for XML in the way System.XML or DOMParser does. Looking back now I don’t recall which ones were tried and why they didn’t work (possibly TypeScript related at times) or even which one we eventually settled on, I just remember it being an annoying process that took us away from actually building a product. And I know NodeJS is all about JSON as a fundamental data structure, but XML is still everywhere and has to be consumed and produced all the time so for a major environment to be without a sanctioned and reliable XML manipulator was, well, a culture shock.

Partial Implementations

The XML library experience illustrates one common characteristic of open source libraries, which is that they tend to implement only a sub-set of a specification or expectation.

The challenge then is to know what part of the specifications are implemented and how correctly those parts are implemented. In some cases the library wiki or documentation provides a guide on what it does or doesn’t cover, and in some cases a suite of tests hints at correctness. Ultimately the only reliable way to learn if the library will do the job is to code against it and test the result.

I found this out the hard way recently. After following the documentation and struggling to understand why a key function didn’t work, I got the source code of the library, and managed to step through it to discover the functionality simply hadn’t been implemented. I also eventually found a vaguely related GitHub issue confirming that. That was nearly a day wasted which could have been saved by a short list on the GitHub landing page saying ‘this library supports these 4 (of only 5) major functions’.

To be fair this is not unique to open source. I recall with anguish the peril of straying off the beaten path with several mature or commercial libraries, where things it felt like they should be able to do became prohibitively complex.

Poor Documentation

My biggest gripe with open source libraries is their documentation tends to be somewhere between poor and acceptable only for the primary use case. This is completely rational – if the contributors are primarily developers then their most effective contribution is to develop. As a result, there seems to be an acceptance that developers using the library will mostly have to help themselves via extensive searching or finding a public forum, like Stack Overflow, to get questions answered. This can be very time-consuming (especially when time-zones don’t match up) and again detracts from building business value.

A paid library, by contrast, typically comes with support, and as it is in the best interests of the company to minimize expensive human support time, they provide excellent documentation, forums, and other ways for people to help themselves easily.

I have to say that I’ve worked in the technical writing industry, and there is a substantial difference between what developers and good technical writers produce as documentation. Technical writers have an understanding of how people learn and come at concepts from multiple angles, and can be systematic about identifying what is and isn’t covered.

The framework that illustrates this point most effectively at present is .NET Core. On the surface it looks like there is significant documentation, but compared to what MSDN provides for the .NET Framework, it is missing a great deal: lack of namespacing (I curse at the amount of time I spend tracking down namespaces); API references lacking examples of both class and property use; inheriting classes missing inherited properties; poor cross-referencing between concept topics and API topics; shallow concept topics.

It’s entirely possible Microsoft has been paying technical writers to do this and I am therefore perhaps criticizing their investment levels rather than the style of content, in which case it is a problem of commercial priorities rather than open source in general.

Boring Stuff

Speaking as a developer, creating new functionality is fun. Fixing bugs is not fun, and neither is writing extensive automated tests or localization. And if you’re a great developer but struggle with written communication, then taking time to document or to textually support library users seems like a really poor use of your time. So given a choice between expanding the library, and perhaps gaining the pride of greater adoption, or making minor fixes, what is the rational choice?

This is the natural consequence of removing both the support of a wider organization with customer support, documentation, and QA specialists; and removing the commercial incentives to meet paying customers’ needs. It is much easier to ignore a tricky issue if no-one is paying for it.

Let me be clear that I’m not denigrating developers here – most developers I’ve met have a strong sense of pride in their work and will do their best to provide quality and capability, but ultimately are limited in the time and desire they have available.

And again, this problem isn’t unique to open source. Companies make the same trade-offs all the time, often to their paying customers’ ire, and can get away with it because it costs the customer too much to switch away from them.

But Open Source == Freedom, Right?

Having cast aspersions on open-source libraries for several paragraphs, it is time to throw out some of the positives.

Top of my list of the benefits of open source is that the initial barriers to entry have basically evaporated. Do you have a computer and can code? Then you can solve someone’s problems without it costing you more than your labor plus the electricity to run the computer.

I’m careful to say initial here, because the concerns above are certainly barriers in themselves, but they tend not to strike early on in development because we usually start out following the paradigm of a given library, and only when we stretch away from its core capabilities do we encounter some of the aforementioned problems.

Responsiveness

Unless the library in question is dead (i.e. no longer being maintained), I’ve found that issues generally get fixed faster. This may be because smaller teams are more nimble, or that open source developers are often top-shelf developers adhering to many of the practices that enable fast turnaround, like good test coverage and continuous integration. Companies tend to be less responsive because they have greater inertia, which comes from the time cost of organizing across departments as well as teams. Some of that inertia is in providing things like documentation or localization, so being responsive does come at a price.

Transparency

With open source libraries you are not dependent on a vendor’s whims to get issues resolved. Instead there is the option to download the source and step through it to figure out what is going wrong, potentially fix the problem, and submit the fix back to the library so future versions will include it (having your own branch of a library is not desirable).

With the source code it is also possible to deeply understand what the library is doing and better understand how it was designed to be worked with. Source code is the ultimate documentation, and it is even better if it comes with a decent test suite.

But all this comes with a price – time. Trying to read and understand unfamiliar source code is a complicated and time consuming activity, and compared to a developer familiar with the code, it may take orders of magnitude longer for an unfamiliar developer to fix a problem.

Evaluating

I didn’t come to this with an agenda. The negatives are longer than the positives simply because it is easier to find things to complain about than to itemize what works. I’ve had some of these thoughts for a while and wanted to put them all down and think about them.

I think, in summary, that as an industry we’ve decided to trade time for money. Instead of paying for tools that are warranted to do the job by their vendors, we go free, but spend more time figuring out how to use the tools because the documentation is limited and the implementation is less complete than what that documentation might lead people to expect.

The first resulting question is, is this a good use of our time (i.e. company money)? Developers are expensive. Having developers diverted from their business mission because of tool challenges could be considered wasteful, or it could be considered the cost of doing business.

The next question is, is this what we (developers) want to be doing? Sometimes the answer is yes – we want to be continually learning or on the cusp of new technology; but sometimes it is no – we simply have a job to get done. What would be more useful are better ways of telling which libraries are good and what they are good at. Obviously better documentation would help, but aggregators that work on networks and statistics are also very useful. For instance the download counts in nuget or npm, the scoring in npmsearch, or the Google rank tell us about the adoption of a library, which is assumed to correlate with its value. The downside of putting too much emphasis on scoring tools is that it solidifies the status quo and therefore limits innovation. Is accidentally being early-adopters and getting angry with new libraries an acceptable price for the industry to pay to allow innovation to prosper?

And finally, have I identified the wrong problem? Much of what I’ve noted is also a feature of many less mature or less widely used commercial libraries. Is what I’ve observed actually the consequence of my recent career transition which could also be described as from mature and conservative frameworks to newer and less tested ones? For instance, what would comparing the documentation between .NET Framework 1.1 and .NET Core 1.1 be like?

Conclusion

I’ve chosen to end with questions because I don’t have the answers. There are many trade-offs and different people in different circumstances will have different valid answers. There are undoubtedly frustrations with open source libraries, just as there are in commercial ones. There are also improvements that can be made to the open source ecosystem, like encouraging the involvement (through pride) of technical writers and QA experts to improve the quality of what we consume.

Musings on Unit Testing and Architecture in .NET Core

One of the challenges I’ve found in architecture has been how to effectively mock the data layer for unit testing domain classes. I’ve worked with various combinations of tiers and repositories, and what I consider the optimum approach is to take a domain-first approach. In a domain-first approach we construct operations in terms of the models and interfaces needed to fulfill the operation, then rely on an overseer, the ‘dependency injector’ or ‘composition root’, to serve up objects that implement those interfaces. The nice thing about this approach is it allows for very granular operations, which at their extreme can be single-operation classes in the style used by the command pattern. This granularity fits well with SOLID design principles because a single operation has clear responsibilities, we are injecting the dependencies, and we can define highly specific interfaces, giving us excellent interface segregation.
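
As a concrete (if simplified) sketch of this style, with all names being illustrative: the operation declares only the narrow interfaces it needs, and the composition root decides which concrete class satisfies them.

using System.Threading.Tasks;

public class Something { public int Id { get; set; } public string Name { get; set; } }

// A narrow, domain-focused interface rather than a general repository contract
public interface IGetSomethingByName
{
    Task<Something> Get(string name);
}

// A single-operation class, command-pattern style
public class RenameSomethingOperation
{
    private readonly IGetSomethingByName _getByName;

    public RenameSomethingOperation(IGetSomethingByName getByName)
    {
        _getByName = getByName;
    }

    public async Task Execute(string oldName, string newName)
    {
        var something = await _getByName.Get(oldName);
        something.Name = newName;
        // ... persistence would go through another narrow interface, e.g. ISaveSomething
    }
}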

Typically a good chunk of these interfaces will be for accessing data, and the result of this approach would be a repository class something like

public class SomethingRepository : IGetSomethingByName, IGetSomethingByGuid, IGetSomethingByNameAndType, ...

This is often somewhat confusing because we’re encouraged to create repository classes that are as generic as possible in order to avoid repetition.

// A classic generic repository interface
public interface IRepository<Something> {       
  IEnumerable<Something> Get();
  Something GetById(int id);
  IEnumerable<Something> Find(Expression<Func<Something, bool>> predicate);
  void Add(Something something);
  void Delete(Something something);
  void Edit(Something something);
  void Save();
}

Already there is a mismatch. The domain behaviour expressed by the interfaces acts in terms of targeted methods like IGetSomethingByName.Get(string name) while the generic repository uses a more general Find(predicate). Some compromise must be made – either we let the domain know more about the data layer by getting it to specify predicates, thus diluting our domain-first approach and interface segregation; or we extend the generic repository for Something with the tailored methods.
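
To illustrate the first compromise (a hedged sketch; the interface name is mine, not a standard one), the domain interface ends up exposing the data layer’s query shape:

using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Instead of IGetSomethingByName.Get(string name), the domain is handed a
// predicate-based query, leaking the data layer's shape into the domain and
// weakening interface segregation. (Illustrative names only.)
public interface IQuerySomething
{
    IEnumerable<Something> Find(Expression<Func<Something, bool>> predicate);
}

// Domain code must now know how to build the predicate itself:
//   var matches = _querySomething.Find(s => s.Name == name);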

Then we get to more complex operations that involve multiple data-sources and we either have to get units of work involved, which now means sharing context between repositories which in turn makes creation (by injection) awkward; or we create wider scoped repositories more suitable for the whole bounded context which tends to reduce cohesion. And then we have to consider how to deal with transactions.

The point is that after all this we’ve created a very extensive plumbing layer to fulfil two purposes: to get a gold star for architectural design; and to allow the domain to be effectively tested.

How do we implement the repository behemoth layer? If we’re dealing with a database then the default today is to get out Entity Framework because writing raw SQL comes with maintenance penalties. And here is where it all goes a little wrong…

Here is the opening paragraph on the Repository pattern from P of EAA:

A system with a complex domain model often benefits from a layer, such as the one provided by Data Mapper, that isolates domain objects from details of the database access code. In such systems it can be worthwhile to build another layer of abstraction over the mapping layer where query construction code is concentrated.

This is what Entity Framework is. When we use Entity Framework (and I’m thinking code-first here) we define a domain model and then we tell EF how to map that data to a schema e.g. how to map inheritance, keys, constraints, etc. The repositories are each DbSet in the DbContext, and the DbContext itself is a unit of work.
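
A minimal code-first sketch of that point (using the same SqlContext name that appears in the test example at the end of this post; the entity and mapping details here are illustrative): each DbSet acts as a repository and the context is the unit of work.

using Microsoft.EntityFrameworkCore;

public class SqlContext : DbContext
{
    public SqlContext(DbContextOptions<SqlContext> options) : base(options) { }

    // Each DbSet is effectively a repository
    public DbSet<Something> Somethings { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Mapping details: keys, constraints, inheritance, etc.
        modelBuilder.Entity<Something>().HasKey(s => s.Id);
        modelBuilder.Entity<Something>().Property(s => s.Name).IsRequired();
    }
}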

So if we create a custom repository layer that calls EF we’re basically trying to re-implement EF using EF, which is not a very good use of time. If instead we expressed our domain behavior in terms that EF understands, like IQueryable, then we could just use EF.

At this point you could argue that using DbContext as a dependency is not a well segregated interface at all, and overall I’d agree as EF doesn’t map to our domain interfaces. But the granularity of its methods allows us to express domain behavior in terms of domain objects and limited tools for manipulating those, so I feel satisfied it is a good clean boundary. And of course, we’re in business, so let’s not waste valuable time and mental resources on extra layers whose only purpose is to earn an architecture award.

But this lack of a concise interface is a problem for testing, because adequately mocking something of the scope of EF is an enormous challenge. And historically this is where having that extra layer wrapping EF was beneficial; even necessary.

Finally we’ve reached the tool that inspired this post.

In Entity Framework 7 there is a new feature, an in-memory database provider. To quote the docs:

InMemory is designed to be a general purpose database for testing, and is not designed to mimic a relational database.

With this tool our testing problem has gone. We can now effectively mock a DbContext by setting it up with pre-canned data, just as we would have via manual repository mocks, and then inject it into the test. It’s that simple: the same DbContext class used for production can be used in test by giving it a different database provider.

Here is the builder my unit tests use. NewGuid() is used to give the instance a unique name because, by default, the same in-memory database instance will be shared by all instances of a given context type.

var dbContextOptions = new DbContextOptionsBuilder<SqlContext>()
  .UseInMemoryDatabase(databaseName: "UnitTest" + Guid.NewGuid()).Options;

var ctx = new SqlContext(dbContextOptions);
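
For completeness, a hedged usage sketch of the ‘pre-canned data’ idea (assuming a DbSet like the Somethings set from the earlier sketch; SomethingQueries is a hypothetical domain class and the assertion is xUnit-style):

// Seed pre-canned data into the in-memory context
ctx.Somethings.Add(new Something { Name = "seeded" });
ctx.SaveChanges();

// Hand the very same SqlContext type used in production to the class under test
var queries = new SomethingQueries(ctx);
var result = await queries.GetByName("seeded");

Assert.NotNull(result);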