Hype

I like shiny new things.

It’s pretty universal that we humans are attracted to what is new, the latest-and-greatest, whether out of curiosity, restlessness, or envy. On top of this is layered a positive feedback cycle – our desire to be part of the group and be in on the topics of conversation – that reinforces the shiniest and most popular objects out there. These can be called fads or trends, and at times they are rightfully the center of attention, but when the reality can’t live up to the expectation, we’ve entered the world of hype.

It has been very clear right from my first days at university that ongoing professional development is essential to maintaining employability, and few industries move as fast as software development. Learning about new technologies by playing with them a little is both healthy and important, and helps us make better choices about them in our professional careers. What I want to focus on, however, is when businesses succumb to the hype.

A business becomes a victim of hype when it adopts a new technology for the sake of adopting it rather than considering whether it adds business value. Lest I sound like a stick-in-the-mud, there are myriad ways that new technologies add business value: they can make development faster or more flexible, or interact more smoothly with other enabling products; they may simply have lower costs, whether through licensing, labour, or lifetime/support costs; and they may make the company more attractive to potential employees.

But a new technology also comes with costs: time for individuals to come up to speed with the technology and its ecosystem; reduced ability to deliver product, and therefore to respond to organizational needs and competitive threats (Joel Spolsky described rewriting your codebase as the “single worst strategic mistake that any software company can make”); and staff turnover, as specialists opt to take their existing expertise elsewhere and their replacements require extensive training in the organization’s domain.

Failing to evaluate the costs and benefits of adopting new technology, and/or to plan how it is adopted, can seriously damage a business. And yet it still happens. I posit this is partly due to our pursuit of the new and shiny, but equally important is the ‘resume factor’, which brings us to the next section…

Recruitment

Hype is driven by recruitment. This is truer of IT than of many other industries, where professional standards or outright experience tend to carry far more weight. In IT, there remains a prejudice that, because the industry moves so fast, existing practitioners are likely to be lagging behind its forefront.

Firstly, this is a poor assumption. Whilst I have seen some individuals flat-line their professional development, the vast majority of people I’ve worked with are doing what they can to keep up with the changes and trends in the industry, both in their work and spare time. Secondly, with experience comes wisdom, and we need that to grow as an industry or we will keep making the same mistakes.

Finally, what behaviour is this going to drive in a mature developer? They’re going to put their weight behind adopting the new technology. As altruistic as we like to try to be, there is always a tension between career needs and business needs, and given an opportunity to advance their careers, employees are likely to favor it over overall business value, because they can move on more easily.

Is Newer Better?

When you’re talking about replacing some part of an existing codebase, a cost-benefit analysis has to compare the existing technology with the new option(s). And the first question that should come up is: is it better?

I know that sounds both obvious and a little stupid, but hype has a way of making the decision-making process turn a bit stupid at times. Perhaps I should clarify by adding the implied rest of the sentence: is it better for your organization?

At present my web framework of choice is Aurelia. Whilst it isn’t the best known, it has solid support, ongoing development, and just makes things easy. But I need to keep up with what is going on in the industry, so I had a play with React, and have since encountered it more (reading rather than writing) in my current job. Based on my needs, I feel like React is hype. Why (in my eyes)?… The way state and event management works is a big step back from the fairly seamless data-binding of Aurelia or Angular or most non-web UI frameworks, and it requires quite particular data flows for props. Whilst it doesn’t require Redux, I’m not sure I’ve ever seen a job ad for React that doesn’t include Redux, and Redux is massive overkill. I’m not sure I’ve ever needed to centralize state like that in a single page application, nor manage it in a pseudo-CQRS style. If I were building something as interactive as a spreadsheet then Redux might be handy, but the majority of web applications are still focused on one UI element at a time and can fetch data on demand without really interrupting the UX. What React does very well is componentize – it certainly feels more natural to make components in React than in Aurelia.

In short, my cost-benefit analysis says that Aurelia is better for my purposes than React. Decisions are always made with limited information, and people with more knowledge of React would certainly disagree with my assessment. And this is the point – my environment, background, and needs are different from theirs, so we should reach different conclusions.

Conclusion

I want to work with things that are new because I like to learn things, but I also work for a business that needs sound decision making that considers far more than ‘what’s cool’. As software professionals we have to keep on learning so we can provide and evaluate all the options to the businesses that we are involved with, but we also have to be able to step back from the hype and make sound technology decisions based on our environment and the good of the business.

Rate Limited Async Loop

A recent project included some modest load testing. For this we created a small console application to hit our API over HTTPS. A key metric in load testing is the number of requests an endpoint can handle per second, so it’s useful to be able to control and configure the rate at which requests are made.

This in itself is not difficult: a basic sleep-wait of 1/requests-per-second between requests will achieve this. However, we had an additional constraint that called for a slightly more complex solution.

The application uses Auth0, an authentication-as-a-service provider, which rate-limits use of its API. Exceeding the rate results in failed HTTP requests and, if frequent enough, can result in users being blocked. Furthermore, it is a remote and relatively slow API, with round-trip times on the order of 3 seconds (i.e. fetching 100 users serially would take 5 minutes), so it’s important that we access it concurrently, up to our limit. Additionally, the token received from calling it is cacheable until its expiry, and if we can get the token from our cache then we want to skip any sleep-wait in order to minimize running time.

This leads to the goal: to maximize the number of concurrent requests made to an API up to a fixed number of requests per second; and to use cached data (and therefore not use a request) where possible. To solve this I want a rate-limited concurrent loop.

Implementation

A little searching on the internet turned up either extensive libraries that implemented a different paradigm, like Reactive, or things that didn’t quite meet my requirements. I therefore – having taken the appropriate remedies to treat potential Not-Invented-Here Syndrome – went ahead and put something together myself.

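using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public class RateLimitedTaskProperties
{
    public bool IgnoreRateLimit { get; set; }
}

// Starts action(item) for each item, at most perSec new tasks per second,
// skipping the wait when a task completed synchronously (e.g. a cache hit)
// and asked to ignore the rate limit.
public static async Task RateLimitedLoop<T>(int perSec, IEnumerable<T> enumerable,
    Func<T, Task<RateLimitedTaskProperties>> action)
{
    int periodMs = 1000 / perSec;
    var tasks = new List<Task<RateLimitedTaskProperties>>();
    foreach (T item in enumerable)
    {
        Task<RateLimitedTaskProperties> task = action(item);
        tasks.Add(task);

        // Only a task that has already run to completion can be inspected without blocking;
        // RanToCompletion also excludes faulted tasks, whose Result would throw.
        if (task.Status == TaskStatus.RanToCompletion && task.Result.IgnoreRateLimit)
            continue;

        // Wait asynchronously rather than blocking the thread with Thread.Sleep.
        await Task.Delay(periodMs);
    }

    // The requests overlap; only wait for them all once every task has been started.
    await Task.WhenAll(tasks);
}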

The loop starts a new task every periodMs. Concurrency is achieved by using tasks, which are non-blocking, and waiting for their completion outside the loop with await Task.WhenAll(tasks). The case where something has been retrieved from a cache is handled by the task returning synchronously and setting the IgnoreRateLimit flag. This combination causes the loop to skip the sleep and move straight on to triggering the next task.

The following is an example of its use, where MyOperation() is a method that returns a flag indicating whether or not it performed a fetch from the rate-limited API.

const int tokenReqsPerSec = 5;
await RateLimitedLoop(tokenReqsPerSec, items, async(item) =>
{
    bool requiredFetch = await item.MyOperation();
    // don't rate limit if I got it from the cache (fetch wasn't required)
    return new RateLimitedTaskProperties { IgnoreRateLimit = !requiredFetch };
});
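For anyone wanting the same behaviour in Python, here is a minimal asyncio sketch of the idea. It is not a port of any existing library: `rate_limited_loop` is a hypothetical name, and it assumes `action` returns `True` when it was served from a cache (playing the role of `IgnoreRateLimit`). One wrinkle: a C# async method runs synchronously up to its first `await`, whereas a Python coroutine does nothing until the event loop schedules it, so the sketch yields once with `asyncio.sleep(0)` before checking whether the new task has already finished.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")


async def rate_limited_loop(per_sec: int, items: Iterable[T],
                            action: Callable[[T], Awaitable[bool]]) -> List[bool]:
    """Start action(item) for each item, at most per_sec new starts per second.

    Hypothetical convention: action returns True when it was served from a cache
    (no rate-limited request was made), so the loop moves on without waiting.
    """
    period = 1.0 / per_sec
    tasks = []
    for item in items:
        task = asyncio.ensure_future(action(item))
        tasks.append(task)
        # Yield once so the new task runs up to its first real suspension point;
        # a cache hit that never awaits will already be finished when we resume.
        await asyncio.sleep(0)
        if (task.done() and not task.cancelled()
                and task.exception() is None and task.result()):
            continue  # cached: skip the wait
        await asyncio.sleep(period)
    # The requests overlap; only wait for them all once every task has been started.
    return await asyncio.gather(*tasks)
```

As with the C# version, the limit only governs how quickly requests are started; nothing caps how many are in flight at once, which suits an API limited by requests per second rather than by concurrent connections.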

django ms-sql datetimeoffset

My current project has me dealing with Python, which is a language I’ve dabbled with for many years, but I think this is the first time I’ve used it professionally. It’s quite interesting seeing where the language has evolved: I recall having previously been enamored with the quasi-Lisp approach of processing lists with map and filter, but have found list and dictionary comprehensions to be the current standard.

The project is a small Django API (if I were selling it to a VC then it’d be called a microservice :rolleyes:) and, as the back-end is predominantly in the Microsoft stack, it references a SQL Server database. This database includes some DateTimeOffset columns. I’m not sure I see the need for this type – dates should always be stored as UTC for maximum portability, and clients can display the local time based on client settings. If it’s necessary for a service to work with those dates, then the database should store a user timezone name or offset, but that is specific to the user and not the date. Anyway, I digress… Unfortunately, DateTimeOffset columns are not natively supported by the common Python ODBC connectors, so something of a workaround was required.

This was made extra challenging by Django, which intermediates the database relationship via its models, and therefore thwarted some early attempts to treat the columns as bytes. What it does expose is a connection_created signal, which allows the connection to be intercepted before it is used, and the underlying database connection includes an add_output_converter method for handling ODBC types. In this case the type is -155 (SQL Server’s DateTimeOffset), and using a little struct magic we can construct a Python datetime.

One area of concern was ensuring that the signal handling was tidied up, even if exceptions were thrown. To handle this, the DateTimeOffset handling code was wrapped into a class that supports the with statement.

Apologies in advance if some of this Python code is highly naive – as already noted, it’s my first professional Python foray 🙂

import struct
import datetime
from django.db.backends.signals import connection_created

class DjangoSqlDateTimeOffset(object):

    def __enter__(self):
        connection_created.connect(self.on_connection_created)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # to see connection info, including queries, uncomment and look at cnx with settings.py DEBUG = True
        # (also requires: from django.db import connections)
        # cnx = connections['qs-sql']
        connection_created.disconnect(self.on_connection_created)

    def on_connection_created(self, sender, **kwargs):
        conn = kwargs['connection']
        # -155 is the ODBC type code for SQL Server's DATETIMEOFFSET
        conn.connection.add_output_converter(-155, self.handle_datetimeoffset)

    def handle_datetimeoffset(self, dto_value):
        # year, month, day, hour, minute, second (shorts); fraction in nanoseconds (unsigned int);
        # offset hours and minutes (shorts, both carrying the offset's sign)
        tup = struct.unpack("<6hI2h", dto_value)  # e.g., (2017, 3, 16, 10, 35, 18, 0, -6, 0)
        # building the timezone directly also handles offsets such as -05:30
        tz = datetime.timezone(datetime.timedelta(hours=tup[7], minutes=tup[8]))
        # datetime only stores microseconds, so integer-divide the nanoseconds by 1000
        return datetime.datetime(*tup[:6], tup[6] // 1000, tzinfo=tz)

With that class available, querying DateTimeOffset columns becomes nice and simple:

with DjangoSqlDateTimeOffset():
    # raw() is lazy, so evaluate it inside the block while the signal handler is connected
    items = list(ModelName.objects.raw('SELECT ... '))
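To see what the converter actually receives, the following standalone sketch packs a value into the same `<6hI2h` layout (year through second as shorts, the fraction as an unsigned int of nanoseconds, then the offset's hours and minutes) and decodes it, with no Django or database involved. The `decode_datetimeoffset` name and the sample value are hypothetical. As I understand the ODBC driver's struct, the offset's minute part carries the same sign as the hour part, so adding them in a single `timedelta` handles offsets such as -05:30.

```python
import datetime
import struct

# Layout of the ODBC -155 (DATETIMEOFFSET) value, as unpacked in the class above:
# six shorts (year..second), an unsigned int of nanoseconds, two shorts for the offset.
DTO_FORMAT = "<6hI2h"


def decode_datetimeoffset(raw: bytes) -> datetime.datetime:
    year, month, day, hour, minute, second, nanos, tz_h, tz_m = struct.unpack(DTO_FORMAT, raw)
    tz = datetime.timezone(datetime.timedelta(hours=tz_h, minutes=tz_m))
    # Python datetimes only hold microseconds, so the last three digits are dropped.
    return datetime.datetime(year, month, day, hour, minute, second, nanos // 1000, tzinfo=tz)


# Hypothetical sample: 2017-03-16 10:35:18.5 at offset -05:30.
sample = struct.pack(DTO_FORMAT, 2017, 3, 16, 10, 35, 18, 500_000_000, -5, -30)
value = decode_datetimeoffset(sample)
print(value.isoformat())                        # 2017-03-16T10:35:18.500000-05:30
print(value.astimezone(datetime.timezone.utc))  # 2017-03-16 16:05:18.500000+00:00
```

If, like me, you'd rather work in UTC, calling `astimezone(datetime.timezone.utc)` on the decoded value normalizes it without losing the instant.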

PowerShell History

I do like PowerShell, but sometimes find myself pressing the up-arrow a lot to find commands made in previous sessions. Unfortunately the F8 search shortcut only works with the current session, so I wanted a way to find older commands more easily.

Knowing that PowerShell can retrieve history from older sessions, I assumed it must be stored on disk, and after a bit of guessing found this file: %appdata%\Microsoft\Windows\PowerShell\PSReadline\ConsoleHost_history.txt

To make it a bit more useful, I’ve removed common commands and duplicates using the following script.

$patterns = @("^cls", "^cd.*", "^\w:", "^exit", "^mkdir")

Get-Content "$env:APPDATA\Microsoft\Windows\PowerShell\PSReadline\ConsoleHost_history.txt" | 
    Select-String -pattern ($patterns -join "|") -notmatch | 
    Select -Unique |
    Out-File commands.txt
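If you'd rather do the same clean-up outside PowerShell, here is a rough Python equivalent. It's a sketch: `filter_history` and `export_history` are hypothetical names, the path is only the Windows one found above, and because `Select-String` matches case-insensitively by default, the patterns are compiled with `re.IGNORECASE` to behave comparably.

```python
import os
import re
from typing import Iterable, List, Sequence

# Same patterns as the PowerShell script.
PATTERNS = (r"^cls", r"^cd.*", r"^\w:", r"^exit", r"^mkdir")


def filter_history(lines: Iterable[str], patterns: Sequence[str] = PATTERNS) -> List[str]:
    """Drop commands matching any pattern, then drop repeats, keeping first occurrences in order."""
    skip = re.compile("|".join(patterns), re.IGNORECASE)
    seen = set()
    kept = []
    for line in lines:
        command = line.rstrip("\r\n")
        if skip.search(command) or command in seen:
            continue
        seen.add(command)
        kept.append(command)
    return kept


def export_history(out_path: str = "commands.txt") -> None:
    """Hypothetical helper: filter the PSReadLine history file (Windows path) into out_path."""
    history = os.path.join(os.environ["APPDATA"], "Microsoft", "Windows", "PowerShell",
                           "PSReadline", "ConsoleHost_history.txt")
    with open(history, encoding="utf-8-sig") as src:
        commands = filter_history(src)
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.writelines(c + "\n" for c in commands)
```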

Auth0 Mock

Auth0 is a well-known authentication-as-a-service provider. Its database connection storage option allows organizations to reference a custom database, which is very useful if you want to store your user information with your business data and maintain integrity between those using foreign key constraints. You can do this in Auth0 by setting up a connection that accesses your hosted database (with appropriate firewall restrictions!) to add, update, and remove users.

A challenge with this is that each new environment requires a new database and Auth0 setup. This is particularly difficult if that environment is a developer’s machine, which Auth0 can’t reach with a connection string from the internet (due to firewalls/NAT). One option is for each developer to have their own cloud database, but that gets expensive quickly, and adds unrealistic latency to database calls from their machine, making development more difficult.

I was faced with this problem while building integration tests using Auth0 and .NET Core, and opted to create a mock object.

Implementation

The top level interface for Auth0 in C# is IManagementApiClient. This consists of a number of client interface properties, and it’s these that I found most appropriate to mock using Moq. This leads to a basic structure as follows:

using System;
using Auth0.Core;
using Auth0.Core.Collections;
using Auth0.Core.Http;
using Auth0.ManagementApi;
using Auth0.ManagementApi.Clients;
using Auth0.ManagementApi.Models;
using Moq;

public class Auth0Mock : IManagementApiClient
{
  Mock<IUsersClient> _usersClient = new Mock<IUsersClient>();
  Mock<ITicketsClient> _ticketsClient = new Mock<ITicketsClient>();

  public Auth0Mock()
  {
    // setup for _usersClient and _ticketsClient methods
  }

  public IUsersClient Users => _usersClient.Object;
  public ITicketsClient Tickets => _ticketsClient.Object;

  public IBlacklistedTokensClient BlacklistedTokens => throw new NotImplementedException();
  // etc. for ClientGrants, Clients, Connections, DeviceCredentials, EmailProvider, Jobs, Logs, ResourceServers, Rules, Stats, TenantSettings, UserBlocks
  public ApiInfo GetLastApiInfo()
  {
    throw new NotImplementedException();
  }
}

In this project only a small number of Auth0 methods were used (something I expect would be true for most projects), so only a few Auth0 client methods actually needed to be mocked. However, for integration testing it is quite important that these methods replicate the key behaviours of Auth0, including writing to a database and storing user metadata (which isn’t always in the database). To support these, the mock class includes some custom SQL and a small cache, which are used by the mocked methods. The following code illustrates this using two methods, which are set up in the constructor and implemented as separate private methods.

using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
using Dapper;

private string _sql;

// local cache storing information that our sql table doesn't
private Dictionary<string, Auth0.Core.User> _users = new Dictionary<string, Auth0.Core.User>();

public Auth0Mock(/* injection for _sql connection string */)
{
  _usersClient.Setup(s => s.CreateAsync(It.IsAny<UserCreateRequest>()))
    .Returns((UserCreateRequest req) => CreateAsync(req));
  _usersClient.Setup(s => s.DeleteAsync(It.IsAny<string>()))
    .Returns((string id) => DeleteAsync(id));
}

private async Task<Auth0.Core.User> CreateAsync(UserCreateRequest request)
{
  int userId = 0;
  using (var conn = new SqlConnection(_sql))
  {
    var rows = await conn.QueryAsync(@"INSERT INTO [MyUserTable] ...", new { ... });
    userId = (int)rows.Single().userId;
  }

  var user = new Auth0.Core.User
  {
    AppMetadata = request.AppMetadata,
    Email = request.Email,
    FirstName = request.FirstName,
    LastName = request.LastName,
    UserId = "auth0|" + userId
  };
  _users[user.UserId] = user;
  return user;
}

private async Task DeleteAsync(string id)
{
  // Auth0 ids take the form "auth0|<our user id>"
  var match = Regex.Match(id, @"auth0\|(.+)");
  string userId = match.Groups[1].Value;

  using (var conn = new SqlConnection(_sql))
    await conn.ExecuteAsync(@"DELETE FROM [MyUserTable] ...", new { userId });

  // Remove is a no-op if the user was never cached
  _users.Remove(id);
}

Being a mock object, there are limitations. For instance, in this example the cache only includes users added via CreateAsync, not all the users in the test database. However, where these limitations lie depends entirely on your testing priorities, as the sophistication of the mock is up to you.

One downside to this approach is that Moq doesn’t support optional parameters, so the signatures for some methods can get quite onerous:

_usersClient.Setup(s => s.GetAllAsync(0, 100, null, null, null, null, null, It.IsAny<string>(), "v2"))
  .Returns((int? i1, int? i2, bool? b3, string s4, string s5, string s6, bool? b7, string q, string s9)
    => GetAllAsync(i1, i2, b3, s4, s5, s6, b7, q, s9));

private Task<IPagedList<Auth0.Core.User>> GetAllAsync(int? page, int? perPage, bool? includeTotals, string sort, string connection, string fields, bool? includeFields, string query, string searchEngine)
{
  // regex to match query and fetch from SQL and/or _users cache
}
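The same pattern, a mock whose methods delegate to small hand-written implementations that write to a real test database and cache whatever the database doesn't hold, can be sketched in Python with `unittest.mock` and an in-memory SQLite database. This only illustrates the idea; it does not mock any real Auth0 SDK, and the `Auth0UsersMock` class, its `create`/`delete` methods and the `users` table are hypothetical.

```python
import re
import sqlite3
from typing import Optional
from unittest.mock import Mock


class Auth0UsersMock:
    """Hypothetical users-client mock backed by a test database plus a local cache."""

    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn
        self._users = {}  # information our SQL table doesn't store, e.g. app metadata
        self.client = Mock()
        # side_effect routes each mocked call to a real implementation,
        # while the Mock still records the calls for later assertions.
        self.client.create.side_effect = self._create
        self.client.delete.side_effect = self._delete

    def _create(self, email: str, app_metadata: Optional[dict] = None) -> dict:
        cur = self._conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        user = {"user_id": f"auth0|{cur.lastrowid}", "email": email,
                "app_metadata": app_metadata or {}}
        self._users[user["user_id"]] = user
        return user

    def _delete(self, user_id: str) -> None:
        match = re.fullmatch(r"auth0\|(\d+)", user_id)
        if match is None:
            raise ValueError(f"unexpected user id: {user_id}")
        self._conn.execute("DELETE FROM users WHERE id = ?", (int(match.group(1)),))
        self._users.pop(user_id, None)  # no-op if the user was never cached
```

Because `side_effect` forwards the call, a test can assert on both the database contents and the recorded calls (e.g. `mock.client.create.call_count`).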

Authorization

The Auth0 mock class provides authentication, but not authorization, and it would be nice if integration tests could also check authorization policies. The run-time system expects to process a cookie or token on each request and turn that into a ClaimsPrincipal with a set of claims. Therefore our tests must also populate the principal, and do so before authorization is checked.

For this we need a piece of middleware that goes into the pipeline before authorization (which is part of UseMvc()). My approach was to place the call to UseAuthentication() into a virtual method in Startup and override that method in the test’s Startup:

public class TestStartup : Startup
{
  protected override void SetAuthenticationMiddleware(IApplicationBuilder app)
  {
    app.UseMiddleware<TestAuthentication>();
  }
  
  protected override void SetAuthenticationService(IServiceCollection services)
  {
    // This is here to get expected responses on Authorize failures.
    // Authentication outcomes (user/claims) will be set via TestAuthentication middleware,
    // hence there are no token settings.
    services.AddAuthentication(JwtBearerDefaults.AuthenticationScheme).AddJwtBearer();
  }
}

The middleware, TestAuthentication, remembers the last user that was set. It must be registered as a singleton with the dependency-injection framework so that the user is remembered between service calls. Testing code can set the user at any time by calling SetUser().

When a request is made TestAuthentication‘s InvokeAsync method applies claims based on that user. These claims will be processed as policies in the normal way so that Authorize attributes work as intended.

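using System.Collections.Generic;
using System.Security.Claims;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;

public class TestAuthentication : IMiddleware
{
  private string _userId;
  private string _roleName;

  public async Task InvokeAsync(HttpContext context, RequestDelegate next)
  {
    // Only authenticate once a test has called SetUser
    if (!string.IsNullOrEmpty(_userId))
    {
      var claims = new List<Claim>
      {
        new Claim("http://schemas.xmlsoap.org/ws/2005/05/identity/claims/nameidentifier", "auth0|" + _userId),
        new Claim("http://myuri/", $"Role:{_roleName}")
      };
      // A non-empty authentication type ("Test" is arbitrary) makes IsAuthenticated true,
      // which the default [Authorize] policy requires
      var identity = new ClaimsIdentity(claims, "Test");

      context.User = new ClaimsPrincipal(identity);
    }
    await next(context);
  }

  public void SetUser(string userId, string roleName)
  {
    _userId = userId;
    _roleName = roleName;
  }
}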
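Stripped of the ASP.NET specifics, the middleware is a long-lived object that remembers the last user a test chose and stamps matching claims onto each request before authorization runs. The framework-free Python sketch below shows that shape; `Request`, the dictionary of claims and `require_role` are hypothetical stand-ins rather than any web framework's API, and the 401/403 strings just mimic the responses you'd expect from an `Authorize` failure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

NAME_ID = "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/nameidentifier"
ROLE = "http://myuri/"


@dataclass
class Request:
    path: str
    claims: Optional[Dict[str, str]] = None  # None means nobody is authenticated


Handler = Callable[[Request], str]


class TestAuthentication:
    """Shared (singleton) middleware: remembers the last user a test set."""

    def __init__(self) -> None:
        self._user_id: Optional[str] = None
        self._role_name: Optional[str] = None

    def set_user(self, user_id: str, role_name: str) -> None:
        self._user_id = user_id
        self._role_name = role_name

    def __call__(self, request: Request, next_handler: Handler) -> str:
        if self._user_id:  # only authenticate once a test has chosen a user
            request.claims = {NAME_ID: "auth0|" + self._user_id,
                              ROLE: f"Role:{self._role_name}"}
        return next_handler(request)


def require_role(role: str, handler: Handler) -> Handler:
    """Stand-in for an authorization policy, evaluated after authentication has run."""
    def guarded(request: Request) -> str:
        if request.claims is None:
            return "401 Unauthorized"
        if request.claims.get(ROLE) != f"Role:{role}":
            return "403 Forbidden"
        return handler(request)
    return guarded
```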

With this combination we are able to successfully mock Auth0 while retaining our ability to work with our database, test non-Auth0 functionality, and test authorization.

Sharing Test Dependencies with Startup

An issue I’ve had while developing integration tests in .NET Core is sharing information between my TestContext and the Startup class.

The documented approach looks something like this:

var hostBuilder = new WebHostBuilder().UseStartup<Startup>();
_server = new TestServer(hostBuilder);

The problem is that Startup is instantiated from deep within new TestServer, making it impossible to pass a reference from the calling context. This is particularly a problem with integration tests on an API, where we need an HttpClient to be made from the TestServer instance in order to call API methods.

_client = _server.CreateClient();

Dependency Injection into Startup

What I hadn’t originally appreciated is that the Startup class accepts dependencies defined by the host. Therefore anything already configured in the services (the container for ASP.NET’s dependency injection system) is available for injection into Startup.

For instance, to pass a reference to the current TestContext we register the current instance as a singleton before calling UseStartup:

var hostBuilder = new WebHostBuilder()
  .ConfigureServices(s => { s.AddSingleton(this); })
  .UseStartup<Startup>();

Now the TestContext parameter in the following Startup constructor will be populated:

public class Startup {
  private TestContext _ctx;
  public Startup(IConfiguration config, TestContext ctx) {
     _ctx = ctx;
  }
...

Passing a Shared Object

A more cohesive approach is to place mutual dependencies in another class and make it available in much the same way. The following example allows any class to access the TestServer’s client.

public interface ITestDependencies {
  TestContext Context { get; }
  // also various Mock objects...
}

public class TestDependencies : ITestDependencies {
  public TestContext Context {get; private set;}

  public TestDependencies(TestContext ctx) {
    Context = ctx;
  }
}

public class Startup {
  private readonly ITestDependencies _testDependencies;
  public Startup(IConfiguration configuration, ITestDependencies testDependencies) {
    _testDependencies = testDependencies;
  }
  // other methods - use _testDependencies.Context.Client
}

public class TestContext {
  public HttpClient Client {get; private set;}
  private readonly TestServer _server;

  public TestContext() {
    var builder = new WebHostBuilder()
      .ConfigureServices((IServiceCollection services) => {
        services.AddSingleton(typeof(ITestDependencies), new TestDependencies(this));
      })
      .UseStartup<Startup>();
    _server = new TestServer(builder);
    Client = _server.CreateClient();
  }
}
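The mechanism both examples rely on is constructor injection: the host inspects the parameters Startup’s constructor asks for and supplies whatever was registered for those types. The toy Python container below illustrates only that idea; `ServiceCollection`, `add_singleton` and `create` are hypothetical names echoing the ASP.NET ones, and the real container does far more (lifetimes, scopes, factories).

```python
import inspect
import typing
from typing import Any, Dict, Type, TypeVar

T = TypeVar("T")


class ServiceCollection:
    """Toy container mapping a type to one shared instance (a singleton)."""

    def __init__(self) -> None:
        self._singletons: Dict[type, Any] = {}

    def add_singleton(self, service_type: type, instance: Any) -> "ServiceCollection":
        self._singletons[service_type] = instance
        return self

    def create(self, cls: Type[T]) -> T:
        """Construct cls, supplying each constructor argument by its type annotation."""
        hints = typing.get_type_hints(cls.__init__)
        kwargs = {}
        for name, param in inspect.signature(cls.__init__).parameters.items():
            if name == "self" or param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
                continue
            service_type = hints.get(name)
            if service_type not in self._singletons:
                raise LookupError(f"no service registered for parameter '{name}'")
            kwargs[name] = self._singletons[service_type]
        return cls(**kwargs)


# Mirrors the C# shape: Startup asks for the shared test dependencies.
class TestContext:
    pass


class TestDependencies:
    def __init__(self, context: TestContext) -> None:
        self.context = context


class Startup:
    def __init__(self, test_dependencies: TestDependencies) -> None:
        self.test_dependencies = test_dependencies
```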

Measurement and Agile Software Development

Introduction

I’m going to start this politically, but I promise it’ll get to software development. The trigger for this scribbling of thoughts was an article discussing the under-funding of many areas of the public sector, and a quote from the finance spokesperson for New Zealand’s recently-ousted governing party, now in opposition: “… the government should be thanking [the] National [party] for inheriting such a strong economy.” It struck me that economic performance was the sole benchmark by which they gauged success. In reality, a country is vastly more complex than one set of economic indicators, and different people have very different perspectives on what constitutes success.

The ‘duh’ disclaimer

As I’ve said in some previous articles, none of this will be new to anyone who has spent time in, studied, or even thought about management. And it certainly isn’t the first time I’ve thought about it, but the above article engaged some dormant mental spirit to write things down 🙂

You are what you measure

Different people’s values mean that what they consider important and unimportant will vary, and that is fine and healthy. The challenges with measurement lie in its consequences: how people’s behavior changes in response to the measure.

To take a non-software example, the New Zealand education system places strong emphasis on NCEA achievement, which has translated into students being encouraged to take easier courses, or teachers being encouraged to teach to the tests. In this case the goal of giving students the best high school education has been subverted by a measurement that effectively demands certain pass rates.

The classic example in software development is measuring lines of code. Lines of code is a basic metric for the overall size of a code base, and therefore the likely cost of learning and maintaining it. It is an appalling measure of programmer productivity: good programmers will write less code through reuse; refactoring may end up removing code altogether; and readability matters far more than concision.

Thankfully I believe the industry is well past measuring productivity by LoC, or even the highly amorphous function points. However the beast is far from slain, for instead we have story points and velocity.

Agile Software Development

Agile Software Development, according to Dave Thomas, co-author of The Pragmatic Programmer and of The Manifesto for Agile Software Development, can be summarized by this process:

  • find out where you are
  • take a small step towards your goal
  • adjust your understanding based on what you’ve learned
  • repeat

And when faced with alternatives that deliver similar value, take the path that makes future changes easier.

This is very idealistic and quickly crashes into commercial reality, where managers, usually on behalf of customers, want to know: when will it be ‘done’ and what will it cost? Of course, this ignores all the benefits of learning-as-we-go, Lean style (which is essentially the same thing as agile software development but applied to business), and the fact that you get much better, albeit far less predictable-at-the-outset, outcomes than with any process based on upfront planning. But we can’t really ask everyone to be rational, can we?

Nevertheless, marketing release dates and the like meant we had to invent ways to measure progress and estimate ‘completion’ (I keep using inverted commas because I think we all know that done or complete are very subjective terms). And so Agile (sorry Dave T, I’m going to be using it as a noun) planning evolved from concepts of managing risk and uncertainty via loose estimation in Agile Estimating and Planning to full-blown methodologies that are so militaristic they require specialized commanders like Scrum Masters.

A plague of story points

And here’s where I feel agile software development goes wrong. The people involved are so invested in the process they forget the actual goals of their organization or of agile software development. Having the ‘right’ ceremonies and getting the points right become the focus. More significantly, people become concerned with the consequences of their measurement, so they will avoid having a high-scoring sprint because it’ll increase expectations on their future performance (and by this stage the team probably isn’t feeling all that empowered, but that’s another story).

So now the process is about having accurate estimates, and consistent or slightly growing measurements, regardless of the impact on the delivered product. Because although it might be possible to explain to your manager that your productivity (as measured by story points) has bombed in the last month because you decided to refactor X in order to speed up lots of expected future work, by the time it’s aggregated to their manager and so on, that nuance is lost. And now that manager is getting shafted based on that measurement which doesn’t actually reflect whether or not your team is doing a good job.

My favorite Agile

The first time I ‘did agile’ was almost by accident. We had a three-person development team working on a product and a product manager who had a three-page Word table with a prioritized list of well broken-down features. And every fortnight, we wrote down on a whiteboard what, from the list, each of us was going to work on and how many days we thought it would take. If something needed to be re-prioritized, the product manager would come in (any time) and we’d change what we were doing and update the whiteboard.

The point is that we were focused on delivering the outcomes that the business wanted almost as soon as it knew it wanted them. Sometimes we’d be asked to have a bit of a guess at how long half a page of priorities might take, leading to a 6-8 week kind of estimate. But all parties also understood that estimates were exactly that and things might change, both in terms of time taken, and in terms of what was critical to get done. Unfortunately I don’t believe this approach really scales, and it requires serious buy-in from stakeholders (despite all the evidence of the value of Agile/Lean approaches).

Conclusion

As is normal for these drawn out discussion posts, I can’t conclude with ‘the answer’ – and there are a lot of people out there who’ve spent a lot of time trying to find ‘the answer’ and haven’t found one.

What I am confident of is that measurements can’t show nuance, and that they subvert the very behavior they are intended to measure. So it’s incredibly important to continually reflect on whether your measurements, and their driving processes, are serving you well or whether people are now just optimizing for that measurement at the expense of actually achieving things.

I understand that an organization needs to gauge how it’s performing – whether it can be more productive, achieve different goals, eliminate waste. To do this it needs concise explanations of whether it is meeting relevant sub-goals. But the consequence of this concision is a loss of nuance that sands off the random edges that create effectiveness.