Written by Adrian Hennelly, Chief Delivery Officer, Evergen
The engineering protection racket
As a growing technology company, one of your main challenges is always going to be balancing technical and functional debt against a desire to deliver new features to your customers.
We are no different at Evergen, and it is something we are continuously working on. We have developed an internal process, which we call QSARs, to attack this problem, and here are some insights into how it works. But some background first…
Evergen is a modern tech company that sets quarterly Objectives and Key Results (OKRs) and works in fortnightly sprints to deliver results for the business. We are product-led, value our customers and foster a culture of openness and clear communication despite being a highly distributed team with engineers in 9 countries.
Maintaining clear communication across half a dozen time zones is a challenge, particularly when half the team is in Australia, which is, in time zone terms, very inconvenient for the rest of the world.
We employ a number of strategies to help us manage communication across time zones. We put a lot of focus on making the most of the 2 hours of cross-over we have each day, where everyone is online to enjoy each other’s company and collaborate heavily. This has required us to formalise some of our problem-solving and collaboration methods. Most of these tools and processes are widely used in the industry, but one is, as far as we can tell, unique to Evergen: ‘Quarterly System Architecture Reviews’, or, with affection, just QSARs.
Before going any further, let’s clarify how we interpret two terms that have slightly different meanings across the industry.
- Technical Debt:
Technical or Product re-work that needs to be done due to an unconscious mistake or because we didn’t know any better at the time. This might be because the product morphed over time, the product scope was incomplete, or because the team unknowingly made the wrong decision.
- Functional Debt:
Technical or Product compromise that was consciously made. Usually, this comes in the form of an agreed MVP or a compromise to meet a delivery timeline.
As a growing company, how do you balance attending to your technical and functional debt with your roadmap and your desire to deliver value to your customers?
The tech landscape is filled with horror stories of companies whose constant push for new features, without regard for the debt they created, caused their tech stack to collapse. One of the earliest and best examples of this is Friendster, which prioritised new features over addressing internal issues until it was crushed under its own weight (1).
eBay suffered a similar problem in 1999 but survived by taking the radical step of fully re-writing its platform twice in a relatively short period of time. This led eBay to build “headroom” into its company culture, where 20% of engineering time is given back to the engineering team to use as it sees fit, to prevent another disaster (2). The concept of “headroom”, or 20% time, has been a default solution for many software companies ever since.
Pay up or else!
In our experience, “headroom” or “20% time” feels like a protection racket, with the business paying its engineering team to prevent the unimaginable. That’s not to say that this approach isn’t addressing the problem, but it seems the rest of the business is divesting itself of accountability and visibility by blindly writing off an amount of time to prevent disaster.
A business would be better served by understanding its technical and functional debt, what risks it poses and what needs to be done to correct it, rather than writing the engineers a 20% cheque so that the rest of the business can pretend that everything is ok. I guarantee you, no matter how small your tech stack is, things are not ok!
A better solution?
When it came time to solve this problem at Evergen, the CTO and I agreed that visibility needed to be at the heart of our solution. This has been an iterative process and there will be more changes to come, but here is a snapshot of our solution as of today.
As with any business process, its success or failure relies not on how good the process is but on how bought-in the people who carry it out are. So far we have had great success, primarily because the team agreed that we had a problem and we worked together to build out a solution. This solution has now become a core tenet of our team culture.
First up, if a problem can be easily solved, just solve it. Having a process is great, but it comes with an administrative burden and, on occasion, that burden is disproportionate to the benefit gained. We have a rule of thumb that we ask our engineers to adhere to: if they find some technical or functional debt that they believe can be solved within a day of effort, they should discuss it with their squad, and if the others agree, they have the authority to adjust their sprint to get this problem solved.
If the problem can’t be quickly resolved then it goes through the QSAR process.
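The rule of thumb above can be sketched as a small decision function. This is an illustration only, not Evergen's tooling: the `DebtItem` fields, the `triage` function and the 8-hour figure standing in for "a day of effort" are all hypothetical. It also assumes that a small item the squad does not agree to fix immediately is logged on the QSAR page, a case the process description does not spell out.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    FIX_IN_SPRINT = "fix in the current sprint"
    LOG_IN_QSAR = "log on the squad's QSAR page"


# Assumption: "a day of effort" is modelled as 8 hours; the article gives no exact figure.
ONE_DAY_HOURS = 8.0


@dataclass
class DebtItem:
    """Hypothetical record of a debt item; field names are illustrative."""
    title: str
    estimated_hours: float
    squad_agrees: bool


def triage(item: DebtItem) -> Route:
    """Route a debt item: small, squad-approved fixes go into the sprint.

    Everything else is logged for the quarterly review. Sending small items
    without squad agreement to the QSAR page is an assumption of this sketch.
    """
    if item.estimated_hours <= ONE_DAY_HOURS and item.squad_agrees:
        return Route.FIX_IN_SPRINT
    return Route.LOG_IN_QSAR
```

For example, `triage(DebtItem("Flaky retry logic", 4, True))` routes the item into the sprint, while a 40-hour refactor is routed to the QSAR page regardless of squad agreement.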
Throughout the quarter, each squad maintains a page in our documentation tool (notion.so), using the following template to log debt as it comes up.
Tip: each squad will own a page and as they come across issues they will fill in at least the first 3 fields of the table without dwelling on the issue too long.
Towards the end of the quarter we will put time into the sprint for the squads to flesh out and complete their QSAR pages. Depending on the issue, this might entail the creation of a full-blown project page in Notion or it could just be a couple of Jira issues. We also use this as a checkpoint to update any system-level documentation to ensure that our documentation reflects the actual state of each system.
Prior to the end of the quarter our CTO will review each squad’s QSAR page and meet with the squad to discuss each issue and if needed assist with solution design. The CTO decides if an issue should be worked on in the coming quarter and then assumes the responsibility of being the product owner for each QSAR that has been flagged to be worked on. Importantly, the CTO acts as the project sponsor within the business and advocates for the allocation of the engineering team’s time on solution building.
This gives each engineering squad a C-level stakeholder who can advocate for them and make sure that technical and functional debt is spoken about at the highest levels of the business. This makes it much easier for the team to maintain the balance between new features and maintaining the existing system without creating an unhealthy level of friction between engineering, product and sales.
At the beginning of the quarter, when it comes time to set OKRs, the squads will include any debt items that the CTO has flagged for the quarter, and an agreement will be reached with the Product Team on where these will fit in the quarter’s roadmap. At Evergen, OKRs are published across the business and every team member is aware of every squad’s and team’s OKRs.
The OKR process allows us to be very clear with the wider business about what new features we will and won’t be delivering, as well as what technical and functional debt we have, how it is affecting us and what needs to be done to remedy it.
The QSAR process has been great for the engineering team. It has streamlined how we track and make time to address technical and functional debt across the business. We don’t create any less debt, but the debt we have is no longer causing stress or friction across the business.
The increase in debt visibility has also had some interesting side effects that we didn’t foresee when we started this journey.
Can the business write debt off?
By giving the business greater visibility of the issues, we enable it to make decisions that allow us to avoid having to resolve debt while still managing risks effectively. For example, when we made the business aware of areas within the platform that don’t scale well and would need a re-work, other departments said…
“That feature doesn’t generate financial value anyway, we should just stop selling it.”
The debt is still there, but because the business was made aware of it we can defer paying it back.
Better debt management
We have also found that by making the business aware of our technical and functional debt, we are able to work more effectively to manage it, and lean on other departments to assist with the mitigation of debt while we undertake the necessary refactor.
No matter how amazing the team or tech stack, technical and functional debt are impossible to avoid. Instead of pretending that it doesn’t exist and paying into the protection racket of “20% time”, our experience suggests discussing the problems you have and working to provide transparency of your technical and functional debt to the entire business.
References:
1. SlashGear
2. Blog – Marty Cagan