There is nothing more deterministic and scientific than electric current running through logic gates, performing numerical operations and boolean algebra. But we have been crafting creative experiences that change the way we interact with the entire world through gold-plated silicon boards. Our experiences in the world are dominated by software interactions that extend our physical reach and dematerialise the real world.
There is real power in being the craftsman of such systems. We see that power manifest in companies with trillions of dollars of market valuation and also in the everyday joy of seamless user experiences. This is probably the only profession that directly turns labour into capital (a superpower indeed, if you ask Karl Marx).
But this is by no means an easy feat.
In the last few years, I have encountered too many engineering teams struggling to build good software systems across different businesses, countries, and technical skill levels. The most curious observation is that the root cause of all these struggles is strangely similar. Hence, I am writing this blog series to articulate and document my opinionated methodology of software design, which attempts to provide a clear and practical process to improve software development practices. This is called Adaptive Systems Design.
## The challenge
There are too many people writing code, but too few designing solutions. With the popularity of no-code, low-code, frameworks and wrappers, it's easy enough these days to cobble together a few web components and run a serverless backend to make an app just work. This works wonders for a quick MVP. The age of Agile glorifies scrappiness and quick time to market. But most companies struggle to make good software once the MVP has validated the market assumptions and there are suddenly millions of users to serve. Scalability becomes a daunting challenge, especially when the software has evolved haphazardly from the MVP.
This manifests in hours of unnecessary and repeated operational management and firefighting, hindering the pace of innovation. In a typical engineering team, a software engineer spends around 70% of their time on firefighting and operational tasks, leaving only 30% of productive time for innovation and shipping new product features.
So should we go back to the days of big design upfront? Absolutely not. That approach has already been invalidated. But the pendulum of antithesis has swung us too far from any kind of formal design in the land of Agile. The key is just-in-time design that keeps technical debt and complexity under control as a software system grows. (Think of Buster Keaton in the 1926 film "The General", clearing stray timbers off the track right before the train derails. The train of shipping features doesn't slow down for you!)
Unfortunately, there is no single prescribed amount of upfront design that counts as best practice, because the business complexity of every software system varies. Building good software with sound systems design is therefore an art: there is no one-size-fits-all solution. There is a level of subjectivity in the problem-solution fit and in making the right tradeoffs at the right time to realise the most developer productivity gains.
Hence, knowing what makes a software systems design good is the first step we must take.
## A good software design
The purpose of a software system is to facilitate business transactions and deliver real value to the end users. There is no need for a software system if there is no value to be created. Software design and development must work backwards from the business needs.
Let’s start with the tactical challenges that a good software system must address:
- Changing requirements: new product features added to an existing system, expanded or reduced product scope
- Changing usage levels: growth in baseline traffic or spiky surges
- Changing technology: modernisation, package upgrades, infra operations
- Changing integrations: more modules, more integrations and more developers
To no one's surprise, we are always dealing with change. The highest level of evaluation for a software system is its ability and responsiveness in adapting to new changes. This translates into maintainable code bases that can keep up with the pace of evolution demanded by the business.
The systemic and structural challenges that slow down innovation and bog down developer productivity are simply symptoms of a lack of adaptiveness at a system level. We usually try to fix this with more tools, frameworks and practices, addressing one engineering pain point at a time. Many builders get lost in the myriad of software technology offerings and gurus preaching best practices. These are at best symptomatic reliefs in isolation, and can create more complexity in the long run as you deal with the integrations, upgrades and deprecations of different systems. This introduces more deep-rooted complexities that are difficult to reduce, and so we enter a vicious cycle.
Therefore, when making technical choices of frameworks and tooling, careful consideration and research are required. We must resist defaulting to the most popular framework of the moment, or the tool we happen to be most comfortable with. Stay true to the problem and pick the best solution for it.
## Software complexity
We can encapsulate the 4 tactical changes above as sources of software complexity. A sound systems design, in both the software and the infrastructure, seeks to minimise the increase in software complexity as the product changes. In other words, it controls the degree of disorder and keeps the entropy of the system in check.
In order to minimise software complexity, we first need a way to measure the amount of complexity in a given software system.
So let's take a closer look at the inherent graph structure of any software system: nodes (logic processing units) and edges (integrations and dependencies).
For example:
- In a simple Python script with functions, the nodes are the functions and the edges are the function calls and invocations (see the sketch below).
- In a distributed microservice system, the nodes are independent software modules that may be deployed across separate infrastructure, and the edges are the integration surfaces such as emitted events or API calls.
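To make the first example concrete, here is a minimal sketch that extracts this node-and-edge view from a toy script using Python's standard ast module (the functions in SOURCE are purely illustrative):

```python
import ast

# A toy script: three functions (nodes) and the calls between them (edges).
SOURCE = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def run(path):
    return parse(load(path))
'''

tree = ast.parse(SOURCE)
nodes = [f.name for f in ast.walk(tree) if isinstance(f, ast.FunctionDef)]

edges = []
for func in ast.walk(tree):
    if isinstance(func, ast.FunctionDef):
        for call in ast.walk(func):
            # Only count calls from one of our own functions to another.
            if isinstance(call, ast.Call) and isinstance(call.func, ast.Name) and call.func.id in nodes:
                edges.append((func.name, call.func.id))

print(nodes)  # ['load', 'parse', 'run']
print(edges)  # [('run', 'parse'), ('run', 'load')]
```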
Nodes and edges exist at different levels of the software system, depending on the scope and encapsulation of the source code.
Following these fundamental assumptions, we can conclude that the overall software complexity is proportional to the cyclomatic complexity within each node multiplied by the number of integrations (edges).
Let's verify this quantification method.
A monolithic application typically has high software complexity, as there is a high level of cyclomatic complexity in the single node that centralises complex business logic. On the other hand, a nano-service architecture has little cyclomatic complexity in each node but far too many integration surfaces (edges), leading to high maintenance overhead.
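As a rough sketch of how such a score could be computed (the weighting and numbers below are illustrative assumptions, not a calibrated metric), consider the same business logic packaged at three different granularities:

```python
def complexity_score(node_cc: dict[str, int], edges: list[tuple[str, str]]) -> int:
    """Toy proxy: each node contributes its cyclomatic complexity multiplied by the
    number of edges it touches; the system score is the sum of those contributions."""
    degree: dict[str, int] = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return sum(cc * max(degree.get(node, 0), 1) for node, cc in node_cc.items())

# One coarse node holding all the branching logic, with a couple of integrations.
monolith = complexity_score({"shop": 90}, [("shop", "db"), ("shop", "payments")])

# The same logic shredded into tiny services that all talk to each other.
nano = complexity_score(
    {f"svc{i}": 6 for i in range(15)},
    [(f"svc{i}", f"svc{j}") for i in range(15) for j in range(i + 1, 15)],
)

# A middle ground: a few cohesive services with a handful of integrations.
balanced = complexity_score(
    {"catalogue": 30, "orders": 30, "payments": 30},
    [("catalogue", "orders"), ("orders", "payments")],
)

print(monolith, nano, balanced)  # 180, 1260, 120
```

The nano-service mesh pays for its trivial nodes with an explosion of edges. The monolith scores deceptively well under this crude weighting because its 90 points of branching logic sit in a single node, which is precisely the per-node overload that the following paragraphs address by matching complexity to engineering effort.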
To reduce overall software complexity, we aim to lower the cyclomatic complexity within each node and minimise the number of dependencies (edges) to achieve a balanced system.
- By reducing the number of dependencies or edges, we achieve loose coupling.
- By reducing the total number of nodes in a software system, we achieve high cohesion.
- By matching the aggregate engineering effort dedicated to the development and operation of a particular node to that node's complexity, we achieve high development velocity.
When it comes to evaluating the cyclomatic complexity of a single node, the right amount of complexity should be proportional to the engineering effort dedicated to building and maintaining that node. Engineering effort refers not only to the team of engineers directly responsible for the module, but also indirectly to specialists, the platform engineering team, and so on.
## Adaptive Systems Design
Thus, we establish that a quality software system is one that does not exceed a software complexity threshold, which means it has the right balance of high cohesion and loose coupling given the product requirements. This results in software units with high independence in all stages of the SDLC, allowing us to build faster and better.
In other words, our software system has great adaptability in addressing the 4 changes that result in software complexity. This translates into faster iterations and shorter time to market, lowering the engineering cost of innovation. We can distill the heuristics into a single idea:
Adaptive Systems Design: "Incremental and extensible architectures that minimise the cost of future change without the need for significant overhauls. The resulting agility and velocity enable businesses to respond effectively to changing demands and/or technological advancements, thereby creating a sustainable competitive advantage."
Over years of building complex systems and completing more than 20 enterprise modernisation projects, I distilled a simple mental model for building Adaptive Systems:
- Business
- Software
- Teams
- **Business**: The business value creation processes must stay at the heart of the software. The value of a piece of software is directly proportional to how well it supports the underlying business transactions, so the specification of what we build is derived from business and product decisions. If supporting a new, high-value product feature requires a major change to the software, go ahead with it. The last thing we want is to corner our business model into whatever our accumulated technical debt meagerly allows.
- **Software**: Translating requirements into a systems design requires procedural guidance, with guardrails against design blind spots. By following the reality of how our business operates, we ensure that the high-level breakdown into services and modules stays consistent, so that those components can be delivered relatively independently, without a web of mutual dependencies. Tactical design decisions within each service are important, but they are more reversible. The key is to carve out a cohesive collection of business requirements for each service, with just enough complexity for its team to manage (see the interface sketch after this list).
- **Team**: A team is responsible for independently delivering and operating a service or module within the wider software landscape, so it is important to match the amount of complexity to the aggregate development effort of the team. Remove as many dependencies on people outside the team as possible, so that the delivery team can make decisions independently and act rapidly. This should be a multi-disciplinary team that works closely together day to day with the sole goal of software innovation.
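As a small, hypothetical illustration of such a boundary in code, a module can expose one narrow, business-facing interface that other teams depend on, while everything behind it stays private to the owning team (the billing names below are made up for this sketch):

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Receipt:
    customer_id: str
    amount_cents: int
    reference: str


class BillingService(Protocol):
    """The only surface other teams depend on; the implementation, data model and
    infrastructure behind it remain an internal concern of the billing team."""

    def charge(self, customer_id: str, amount_cents: int) -> Receipt:
        ...
```

Keeping the edge this narrow is what lets the owning team change, redeploy or even rewrite its internals without coordinating with every consumer.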
## No failed experiments
Continuously experiment, within reasonable margins, with different configurations of technology and team structure to find the local productivity maximum for each software module. The goal is to discover a process that works specifically for your software system and your team in your business.
The moment we sink into the comfort zone of using familiar tools and resist challenges to our way of working, technical debt starts to build up and will demand interest payments in no time.
In these engineering productivity experiments, we should monitor the performance of our software system, baselined against its own previous performance. Useful metrics include the DORA metrics as well as key business performance metrics specific to the system.
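For instance, a first cut at two of the DORA metrics can be derived from a simple deployment log; the records below are hypothetical, and a real setup would pull this data from CI/CD and incident tooling:

```python
from datetime import date

# Hypothetical deployment log: (deployment date, whether it caused an incident).
deployments = [
    (date(2024, 5, 2), False),
    (date(2024, 5, 6), True),
    (date(2024, 5, 9), False),
    (date(2024, 5, 13), False),
]

days_observed = (deployments[-1][0] - deployments[0][0]).days or 1
deployment_frequency = len(deployments) / days_observed          # deployments per day
change_failure_rate = sum(failed for _, failed in deployments) / len(deployments)

print(f"Deployment frequency: {deployment_frequency:.2f} per day")
print(f"Change failure rate:  {change_failure_rate:.0%}")
```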
## Next up
Now that we understand the mental models for building complex software systems and how to evaluate whether a particular systems design is sound, the next part will explore the practical design process and specific methodologies. Stay tuned.