From Developer to CTO Part 7: Code Quality, Entropy, and Bugs

I care a lot about code quality, and I truly believe understanding and managing it is foundational to long-term success as a technology leader.  I have a very specific definition for code quality as I see it, and in the following definition I mean “quality” in the sense of a facet or attribute of something, not in the sense of “degree of goodness”:

“That quality of code that reduces or eliminates the difficulty in determining code’s intent and thus increases its bug resistance”.

Not actively managed, code quality naturally degrades over time in an entropic process, and eventually gives rise to ever larger refactoring needs.

1.1. The Reasons for Entropy in Software Engineering

Entropy is everywhere.  In nature, wind and water erode structures built over millions of years.  Human-built structures need to be maintained to stave off the constant forces of nature that would fight to bring them to the ground.  Destructive pressures are in constant tension with constructive ones.  Similarly, by virtue of engineers’ interactions with code, it’s only by routine maintenance that the code base can fight the forces of entropy; code is a complex set of interconnected documents, and the connections between them and the logic therein are not impervious to degradation.  In fact, it’s a guarantee that without mindful effort to keep code healthy, its quality will degrade over time.

But what pressures cause entropy to manifest when engineers interact with the code base?  It comes down to a few main categories: changes to the purpose of software over time (functional extensions) that leave the software’s design no longer match-fit for its purpose, challenges in knowledge management of large platforms, and those forces that affect the ability or will of the developer to remain quality-minded.  I’ll discuss these phenomena in order.

Few platforms remain static for the duration of their life, at least until they are sunsetted anyway.  The need to remain competitive and to meet the changing needs of its users will generally mean a platform’s code rarely remains untouched for long.  In its earliest incarnation the design is ostensibly match-fit to its purpose; code should be optimal, and as small as it will likely ever be for the job it is expected to do.  Over time however, as functionality is added to even a well-designed system, inevitably the original structure and assumptions of the code are no longer appropriate for the new way it is expected to behave.  This forces developers into compromises in their implementation; refactoring may not always be practical or advisable, and in an effort to make something work, they must try to shoehorn functionality into existing code.  Over time extension becomes more and more difficult.

Add in that over time development teams experience turnover, especially if the platform’s initial development effort was completed and a maintenance team was left in place.  There are difficulties for developers in coming to comprehend the design holistically, especially when they don’t have access to the original developers anymore.  Their efforts to extend the platform often result in doubling down on the extension of problematic parts of the implementation just to get the job done, in the belief that it’s the least risky way to handle the effort.  Additional code is added because the developers aren’t aware that code already exists to fulfill their needs, especially as it relates to lower-level functions.  This causes differences in the way data is treated, and can introduce new, undesirable dependencies into the platform.  Code is often duplicated by way of copy and paste so that an original implementation of a class or function isn’t broken, and then modified for purpose.  This of course is also a partial duplication of functionality, and to a new developer who will later join the team, it can be difficult to know what the proper implementation is.

Lastly, cultural considerations affect the way developers interact with the codebase too.  Rushed developers will often knowingly introduce hygiene issues to meet delivery commitments, in effect pushing risk further right instead of mitigating it on the spot.  As a result, code becomes even less extensible, oftentimes less performant, unit tests become harder to write, and quality degrades.  A management that believes in doing the minimum work possible, without periodic refactoring to get the code clean, creates a disengaged culture in which quality-mindedness is an afterthought.

These forces are perpetual and strong.  Even in the best of cases and with the most conscientious and well managed teams, over time these will catch up with the software, causing it to tend towards bugginess.

1.2. The Root Cause of Bugs is Usually Not Logic Issues

Flies don’t hang around anywhere there isn’t something decomposing.  I don’t feel I’m being overly general when I say that to see the root cause of bugs as simple logic issues in isolation is to not see them for what they truly are; make no mistake, the root cause of a bug is usually not the mistake in logic that underlies it.

When we encounter bugs in software engineering, the first question we often ask is “why didn’t we catch it in testing?”, followed by the only slightly better question “how can we prevent these kinds of bugs from happening again?”.  The question we should be asking is “what is it about this section of code that made this bug probable?”.  If you subject the bug fix to the 5 Whys test, you don’t have to search for long before the answer isn’t about a single line of code anymore, but something much more pervasive.  For avoidance of doubt, the answer is not a lack of test coverage or ineffective test coverage (although that doesn’t help), even though this may be as far as some teams get in contemplating the problem.

Before digging in here, I want to identify and thereafter exclude from conversation a couple of categories of “behavioural issues” that get pulled in under the bug category (correctly or otherwise), as I want to keep my focus here on the category of bugs that exist because of code quality and design issues.  The first category I’d like to exclude is those issues that eventuate from well specified and executed functionalities that don’t do what the business really needed.  The second is those behaviours that manifest due to issues in configuration of heavily configuration-driven systems that cause unexpected code paths that were never explicitly designed or intended (due to conflicting settings that are not mutually exclusive, for example).  The category I wish to examine is those bugs that result directly from issues in code organisation and comprehensibility.

If you want to get at the root cause of why most bugs exist, your engineers often need look no further than this question: “Is the code and design easy to read and comprehend?”.  Specifically:

  • Is the intent of the code obvious?
  • Do I have to spend too much time reading it to work it out?
  • Are conditions and loops nested too deeply to understand it structurally?
  • Can I readily count the code paths of a function on one hand?

These are really simple questions to answer.  Yet, when it comes to code reviews these are often the questions given least consideration; more often code reviews address whether the code achieves its functional goal (to me this is what tests are for, but more on that thought later).  I’ve seen many a code review pass on incredibly long functions, classes, and files, with nary a thought given to these questions.  And still, when it comes to contemplating why bugs end up getting caught late, we turn our attention to test coverage.  I’m going to expand on these thoughts in the following sections.

1.3. Good Code Hygiene Cannot Compensate for Bad Design

Now to be clear, to stop looking here is to dramatically oversimplify the bug landscape, so lest I should convey the impression that I believe code hygiene is where the hunt for bugs begins and ends, I’ll spend a moment describing a particularly insidious class of bug – state management bugs.  I do so not for the purpose of listing another bug category for its own sake, but for the sole purpose of pointing out that again, application of proper principles and good technical oversight can reduce the chance of these happening at all, and as a technology executive this is something you need to have a plan for.  You cannot afford to assume these principles will manifest by themselves without real applied leadership.

State management bugs can present in systems that manage large amounts of state, and for which there are poorly defined boundaries and separations of concern around what part of the code base can modify that state, and when (in other words, the state is poorly encapsulated).  In a system where poorly encapsulated shared variables are used at either the global or class member levels, or where data is shared by way of a database with little constraint over what can read or write it, it is possible for one piece of code to modify data in a way that another piece of code was not expecting.  Add in that the temporal dimension may make it difficult to determine the order of operations that led the shared data to have the value it does, and you have yourself a particularly nasty bug to track down.

I single out this category of bug because code readability may not even be a contributing factor; it may have low cyclomatic complexity, low indenting, low lines of code, be perfectly readable, and not even show up on the radar by any other reasonable code complexity metric.  However, organisation and design in this kind of system is most certainly a problem.  Again, I don’t describe this bug category for the sake of doing so but to reiterate my main point: it’s hard to classify the reason for this kind of bug’s existence as a logic issue; it’s much more a function of the level of coding discipline and design that got the system there.   This is entirely preventable.
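To make the state management point concrete, here is a minimal Python sketch.  Every name in it (`billing_job`, `reversal_job`, `Account`) is hypothetical and invented purely for illustration.  The first half shows two perfectly readable functions that corrupt shared data between them; the second half shows the same data behind an owner that enforces the rules and records the order of changes.

```python
# Poorly encapsulated: a plain dict that any code can modify at any time.
def billing_job(state):
    state["balance"] -= 80


def reversal_job(state):
    # Written without knowledge of billing_job; nothing stops it from
    # driving the balance negative.
    state["balance"] -= 50


class Account:
    """Encapsulated: the balance changes only through methods that enforce rules."""

    def __init__(self, opening_balance):
        self._balance = opening_balance
        self._log = []  # ordered record of which caller changed what

    @property
    def balance(self):
        return self._balance

    def withdraw(self, amount, source):
        if amount <= 0:
            raise ValueError("amount must be positive")
        if amount > self._balance:
            raise ValueError(f"{source}: insufficient funds")
        self._balance -= amount
        self._log.append((source, -amount))

    def history(self):
        # Return a copy so callers cannot rewrite the audit trail.
        return list(self._log)


shared = {"balance": 100}
billing_job(shared)
reversal_job(shared)
print(shared["balance"])  # -30, and no record of how it got there

account = Account(100)
account.withdraw(80, "billing")
try:
    account.withdraw(50, "reversal")
except ValueError as error:
    print(error)  # reversal: insufficient funds
print(account.history())  # [('billing', -80)]
```

Neither `billing_job` nor `reversal_job` would register on a complexity metric, which is the point: the defect lives in the design, not in any single line.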

1.4. Difficulties Visualizing and Policing Code Quality

Simply, good code has a certain “look” to it.  The tenets or underpinnings of any given design choice may have more or less merit than others, but in general, even a poorer design (performance-wise, for example) can be coded cleanly.  The ability to assess whether a design is good or not is what architects and principal engineers are for after all, and ostensibly a solid architecture function should prevent bad designs from becoming considerations anyway.  But design quality aside, clean code is clean code.

Now this is easy to say, but it doesn’t take a very large code base before determining the degree of code quality becomes difficult for even a skilled engineer, especially across a large, distributed team.  No architect should need to be reviewing every code commit, and I’d argue that not even a principal engineer should be expected to do that either (beyond spot checking, of course).  We instead need them collaborating with the architects, understanding needs, designing, laying down frameworks, and the like.

1.5. A Practical Approach to Managing Code Quality

To my earlier point, good code “looks” a certain way.  There is a well-known set of “smells” that poor code has that are readily detectable to the human eye, and thankfully, in an automated fashion too.  As I mentioned earlier, with a large code base and a large team we cannot hope to get the coverage to keep up with what we’re committing to our branches.  We need an automated way to do this, and one that engages engineers when they are about to pump lower quality code into a commit, so they know it the second they do it – something proactive.  If you’re not hearing your developers talking about code quality or code smells on a regular basis, they are almost certainly not thinking about them, either.

There is a variety of tools on the market that take a “reduce the typical bug count” approach by looking at common faults in code (dereferencing null pointers, for example).  These tools are good at this and have a place in making sure a code base is solid.  Enterprise subscriptions of some tools are even able to detect common flaws in code that lead to vulnerabilities, such as parameters that are open to injection.  These tools are great at what they do and they have their place, but they will not complain about poor quality code.  They will happily pass quality gates on poorly organized code.

The thing these tools don’t do is tell me that the code is poor quality, poorly organized, and due to its structure, there will be difficulties in its interpretation, making it hard to know exactly what the code’s intent is, what you can safely change, what its dependencies are, and the like.  That is to say, the second another developer gets their hands on it to fix a bug or make a functional extension (or even the original developer for that matter, given time), they will ponder over its intent, make a change with their fingers crossed, and introduce another bug because the code’s intent is now changed, and they introduced or changed a code path that results in manifestation of a problem.  For a very large function, there also stands a good chance there were never complete tests written to cover it either, so there is greater risk in making changes to it as you have no insurance policy.

Now consider that over time, engineers roll off and new ones roll on, and it doesn’t take long before no single person knows the whole code base anymore.  As a technology executive this is a frightening reality because without some tool to tell me where my code has issues or to what extent my developers know the code base, I basically do not know what I need to manage or who I need to manage, outside of maybe a list of anecdotal (or demonstrated) issues that the engineers I still have are able to illuminate for me.

A big question that is challenging to answer unless there has been enough continuity in developers to permit it, or unless an engineer is prepared to dig through months and years of commits to a git repository, is “who are my most qualified developers in this area of the code base?”.  This has always been a really important thing for me to find an answer to.  I needed to understand where I had knowledge islands and total gaps, and how my team aligned over these.
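As a rough illustration of what “digging through commits” can look like when automated, the Python sketch below counts commits per author for each top-level directory and flags areas dominated by one person.  It assumes you have saved the output of `git log --name-only --pretty=format:"@@%an"` as text; the `@@` marker, the function names, and the 80% threshold are all my own assumptions, and commit counts are only a crude proxy for real knowledge.

```python
from collections import Counter, defaultdict


def knowledge_map(log_text, depth=1):
    """Count commits per author for each area (leading directories) of a repo.

    Expects text where each commit starts with an '@@<author>' line
    followed by the paths of the files that commit touched.
    """
    areas = defaultdict(Counter)
    author = None
    credited = set()  # areas already counted for the current commit
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("@@"):
            author = line[2:]
            credited = set()
            continue
        if author is None:
            continue  # ignore anything before the first commit marker
        directories = line.split("/")[:-1]
        area = "/".join(directories[:depth]) or "(root)"
        if area not in credited:
            credited.add(area)
            areas[area][author] += 1
    return dict(areas)


def knowledge_islands(areas, threshold=0.8):
    """Return {area: author} where one author made most of the commits."""
    islands = {}
    for area, counts in areas.items():
        owner, commits = counts.most_common(1)[0]
        if commits / sum(counts.values()) >= threshold:
            islands[area] = owner
    return islands


SAMPLE_LOG = """@@Ana
billing/invoice.py
billing/tax.py

@@Ana
billing/invoice.py

@@Raj
search/index.py

@@Ana
search/query.py
README.md
"""

areas = knowledge_map(SAMPLE_LOG)
print(knowledge_islands(areas))  # {'billing': 'Ana', '(root)': 'Ana'}
```

In this made-up history only Ana has ever touched `billing`, which is exactly the kind of knowledge island you want to know about before she rolls off the team.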

Finally, I’ve always needed a way to tell whether there was any correlation between organization issues in the artifacts in the code base and where our bugs were being reported.  Fortunately for me, I came across a tool I will truly never work without again – CodeScene. 

1.6. CodeScene for Code Quality Governance

CodeScene is a tool that no technology executive with responsibility for a code base should be without – it is one of the most exciting and game-changing finds of my career.  It is a tool that can be cloud-hosted or locally hosted that connects to your code repositories, analyses them, detects the smells that really strike at the heart of poor code organization, and reports them in a comprehensive interface to navigate its findings.  It can produce regular reports on trends in code quality across the architectural components and files in your system so you can track progress over time.  More, it provides an interactive mechanism through which a developer can directly view the impact on code quality of their changes by seeing the smells they introduced or eliminated and shows recommendations on what should be done to clean them up.

Using CodeScene I was able to ensure that on a week-over-week basis the code health of each repository (which for me had a 1:1 relationship with a microservice) continued to trend in the right direction, and it was able to flag commits that caused a degradation of code quality.  Not only was I able to see it at the administrative level, but the engineers themselves could see it by way of its integration to git; right in the pull request a message would be sent back by CodeScene that allowed the developer to see whether they had caused a reduction in code quality, allowing them the chance to either address it on the spot or to add it to a code quality backlog for their next sprint.

The most powerful aspect of CodeScene for me was the ability to make determinations on code quality in the most relevant areas without direct interaction with any of my team members.  With an engineering background I was able to read the flagged code and determine what I thought priority should be using my own judgment and could use this to inform the future of the technical debt roadmap.  If you don’t have an engineering background, a good relationship with your Architect, for whom CodeScene should be a primary utility, should allow you to determine prioritization, and the Architect can put suggested changes in the context of what is approaching on product and technical debt roadmaps.  For example, they could schedule work to clean up some code and add test coverage before a functional extension you have on the product roadmap so that your team has a clean starting point.  You can also take a top-down approach on cleaning up code that is having quality issues.  This is massively enabling.  Your issues are visible to you, and your liabilities are knowable.

Devoid of a tool like CodeScene there are few ways in which to gather meaningful metrics from a team as it relates to code quality.  Through work item category tracking it should be possible to directly correlate effort put into code quality work and the incidence of bugs in those areas of the code base.  Moreover, it should be possible to directly demonstrate that the proactive work put into this endeavour is an order of magnitude less expensive than having bugs slip through to production due to fragile code, by the time the costs of customer attrition and the weight of the customer and engineering support functions are considered.

Quality commits should be a matter of pride for an engineer, but typically, there are very few practical, efficient ways in which this can be managed at scale, and the efforts of an engineer recognized in a positive way that reinforces and publicizes good behaviour and discipline.  Once I incorporated CodeScene’s git pull request integration I saw engineers actually competing to create the largest positive delta in code quality.  I also required teams to show their CodeScene metrics during sprint demos so a) they knew they were accountable to it and b) they could tout their efforts.  For those engineers that were struggling to get increases in quality, the distinctly measurable efforts of those that were succeeding served as a reference point they could emulate, leading to a very positive uptick in quality commits across the board.  No developer dared commit a significant decrease in quality, and not because of any punitive measures on my part, but because they personally wanted to show they were good enough to always produce quality commits.  The integration of this tool into the development lifecycle created a positive sustainable culture in ways that I never imagined possible.  It made the developers better, and they competed to be the best.

1.7. Holding Developers Accountable Using Cyclomatic Complexity

If you cannot get a CodeScene license, all is not lost.  A metric called cyclomatic complexity can be a valuable tool and get you part way there too, and there is no need to be particularly technical to understand it and use it.  It is only one of several metrics CodeScene analyses, so you won’t get as comprehensive an understanding of your code smells, but it’s a good start nevertheless.  Depending on the language you’re using, most mainstream integrated development environments (IDEs) support the generation of this metric in one form or another (in the IDE itself, or by way of IDE extensions). 

In loose terms, cyclomatic complexity is a measure of how many independent code paths there are through a function or method (in a file or a class), and thus a good guide to the number of unit tests that would need to be written to cover every branch of the function.  The lower this number, the simpler the logic in the function, the less challenging it should be to determine the code’s intent, and by association the more bug-resistant it should be.
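For readers who want to see the metric rather than take it on faith, the following Python sketch approximates cyclomatic complexity using only the standard library `ast` module.  It starts each function at 1 and adds one for every decision point it finds.  Which constructs count as decision points is my own simplification; established analysers make slightly different choices, so treat the numbers as indicative.  The `shipping_cost` sample function is hypothetical.

```python
import ast

# Node types that each add one branch. This list is a simplification;
# real analysers differ in exactly what they count.
_BRANCH_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While,
                 ast.ExceptHandler, ast.IfExp)


def _decision_points(node):
    if isinstance(node, _BRANCH_NODES):
        return 1
    if isinstance(node, ast.BoolOp):
        # "a and b and c" adds two short-circuit branches
        return len(node.values) - 1
    if isinstance(node, ast.comprehension):
        # one for the loop, plus one per "if" filter
        return 1 + len(node.ifs)
    return 0


def cyclomatic_complexity(source):
    """Return {function name: approximate cyclomatic complexity}."""
    scores = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Start at 1 for the straight-through path. Nested functions
            # are also counted towards their enclosing function here.
            scores[node.name] = 1 + sum(
                _decision_points(child) for child in ast.walk(node)
            )
    return scores


SAMPLE = '''
def shipping_cost(order):
    if order.total > 100 and order.domestic:
        return 0
    for item in order.items:
        if item.fragile:
            return 15
    return 5
'''

print(cyclomatic_complexity(SAMPLE))  # {'shipping_cost': 5}
```

The sample scores 5: one for the base path, two for the `if` with its `and`, one for the loop, and one for the inner `if`.  Even a function this short already sits at the level I consider the high side.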

If it is difficult to manually count the code paths, it’s your indication there are too many.  Deep nesting, loops with complex conditionals, ternaries, and anonymous functions are all very capable of obfuscating structure and intent.  For my comfort level there should be a very small number of code paths; I consider 5 to be on the high side.  Anything much more than that creates a situation where unit test coverage is going to be missed because you don’t detect a pathway, and that is fertile ground for bugs before long.  I’ve seen 500-line functions with no accompanying unit tests because they were too long to determine code paths and had too many inbound dependencies, making it impractical to write a single meaningful test.  Frustratingly, functions get to that size because other developers over time continue to extend long functions without conscience or effort to break them up.  This is a very, very common practice and is poison to a code base and overall product quality.
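Here is a small, hypothetical Python example of what breaking a function up buys you.  Both versions compute the same discount; the first buries its rules in nested conditionals, while the second splits eligibility and rates into named helpers so that each piece has only a couple of paths to test.  The business rules and names are invented for illustration.

```python
def discount_nested(customer, order_total):
    # Five levels deep: the reader must hold every condition in their head.
    if customer is not None:
        if customer["active"]:
            if order_total > 0:
                if customer["tier"] == "gold":
                    if order_total >= 100:
                        return 0.2
                    else:
                        return 0.1
                else:
                    if order_total >= 100:
                        return 0.05
                    else:
                        return 0.0
            else:
                return 0.0
        else:
            return 0.0
    else:
        return 0.0


def _is_eligible(customer, order_total):
    return bool(customer is not None and customer["active"] and order_total > 0)


def _tier_rates(tier):
    # (rate for orders of 100 or more, rate for smaller orders)
    return (0.2, 0.1) if tier == "gold" else (0.05, 0.0)


def discount_flat(customer, order_total):
    # Guard clause first, then one decision on order size.
    if not _is_eligible(customer, order_total):
        return 0.0
    large_rate, small_rate = _tier_rates(customer["tier"])
    return large_rate if order_total >= 100 else small_rate
```

The total amount of logic has not changed, but each helper can now be named, read, and tested on its own, which is what makes the intent recoverable by the next developer.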

Now to make my position clear on unit testing, I’m not advocating that cyclomatic complexity should be used to drive unit test coverage of 100% of code paths, but I do believe in solid unit test coverage in the right spots.  Some functions have such dependence on other objects that writing a true unit test (with interface mocks, etc.) can take more time than it is worth, and it is better to cover the function with an integration test instead that injects real objects.  However, with this all said, if there is justification for not writing a unit test for a function, it should be clearly documented in the code.  Part of my reason for saying this is that unit tests are by their nature highly coupled to the structure of the code, and any refactoring effort can invalidate many tests, requiring new ones to be written; oftentimes, the original tests get abandoned.  Integration tests tend to be more resilient to structural changes and judiciously used, can be just as effective.  It’s a judgment call, but again, it should be a documented decision.

Absent a good tool it can be difficult to collect the cyclomatic complexity metric in a centralized fashion, so make the developers present these metrics to you as part of their definition of done.  It often isn’t possible to reduce this number to a safe level in a single commit due to delivery time constraints, but any code that is checked in that has a high cyclomatic complexity should be put on the technical debt register for future attention so it can be attacked incrementally.  Even one change to a long function once a sprint can make a difference.  The metric should most certainly never worsen; this must be seriously frowned upon.  In any event, if the metrics are too high their code should be reviewed by Architecture and thereafter adjusted.
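The two rules above (never let the metric worsen, and register anything still too high) are simple enough to check mechanically.  The Python sketch below assumes you already have complexity scores per function from before and after a change, however you collected them; the function name `review_complexity`, the sample function names, and the default limit of 5 are illustrative assumptions.

```python
def review_complexity(before, after, limit=5):
    """Compare per-function complexity scores from before and after a change.

    Returns (worsened, debt_register):
      worsened      - {function: (old score, new score)} where the score rose
      debt_register - sorted names of functions still above the limit
    """
    worsened = {
        name: (before[name], score)
        for name, score in after.items()
        if name in before and score > before[name]
    }
    debt_register = sorted(
        name for name, score in after.items() if score > limit
    )
    return worsened, debt_register


before = {"parse_order": 12, "send_email": 3, "price": 6}
after = {"parse_order": 10, "send_email": 4, "price": 6, "export_csv": 8}

worsened, debt = review_complexity(before, after)
print(worsened)  # {'send_email': (3, 4)}
print(debt)      # ['export_csv', 'parse_order', 'price']
```

Note that `parse_order` improved from 12 to 10 and is rightly not flagged as worsened, yet it stays on the register until it drops below the limit; that is the incremental attack in practice.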

1.8. A Final Plea - Code Quality as a Lifestyle

As if I haven’t made my point strongly enough already, I’m going to finish up by approaching this from a more relatable level.  Code Quality must be an 8x5 consideration, not something only occasionally performed during a clean-up period.  It should be discussed every day.  Metaphorically speaking and bluntly put, don’t act surprised if you cannot sustain a healthy weight if you eat poorly five days a week and work out only on weekends.

I find the health and fitness analogy appropriate and a good way to identify the values that can also drive your efforts in code quality.  I’ll list just a few of the most relevant from my own routine; there’s nothing crazy here:

  • Regular strength training that covers the whole body. I don’t neglect areas due to concern over creating imbalance and compensating injuries.  I’m 48, after all.
  • Putting good stuff into my body in the right amount.
  • Limiting the amount of unhealthy stuff I eat.
  • Wanting results.
  • Consistency in application.

So – regularity, not neglecting parts, put good stuff in, limit bad stuff in, desire for results, and consistency.  These 100% apply to coding discipline as well.  These statements might sound naively virtuous and throwaway - but - using my earlier practical guidance these are 100% possible to visualize, eminently manageable, and there is no reason to expect that you cannot attain them.  These are values. These are fundamentals.  Don’t get caught saying you see these as nice-to-haves and that you don’t see how you can get there.  Again, your ability to manage code quality and technical debt will define your legacy.

I’ll finish this long-winded analogy with a final thought.  We all know that not paying attention to ergonomics, for example, can one day cause longer term health issues that you can no longer cure; you just have to manage the symptoms.  This is oversimplifying of course, but some symptoms are entirely predictable.  They can creep up on you and one day become noticeable, and you may not immediately tie them back to a root cause because they developed over a long time.  Some of them you’ll look back on and realize you decided to kick that particular can down the road.  I have carpal tunnel and a perpetually stiff neck to show for my neglect.  I’ll extend the analogy no further but to say that truly, code is no different.  It is a complex set of living documents, and they’re highly interconnected.  As it relates to your code, don’t do this to yourself.  This is preventable.

If you believe these values are important and you buy into them, your developers also have to believe in them.  They must buy in, be students of the experience, and develop the skills to become truly conscientious developers.  No matter how technical or knowledgeable a developer is, if they don’t live these values every day, they will cause more long-term harm than good, and then you’ll have to manage your way out of that corner, too.

Next Up

As a technology executive it's generally in your interests to be able to delegate this code quality governance function to someone else in your organisation, because success in this endeavour requires attention to a lot of detail, knowledge of tooling, and the ability to spend time coaching and redirecting the efforts of your engineers.  For this reason, I'll continue this discussion on how such governance can be executed with the help of a well-defined departmental architecture function.

You can find my next post titled "The Architecture Function is the Departmental Glue" here.
