I can't stop thinking about this lately. As a psychologist for software teams, after several research projects and 25+ in-depth sessions with teams tasked with engineering transformation, I think we're fundamentally failing to prepare engineering managers for the sociocognitive work of triage.
Here’s a signal that this failure is happening:
"We spent an endless amount of time planning and we still ended up in a crisis. The crisis was, honestly, devastating. We’ll probably be picking up the pieces for a year if I’m honest.”
"Wow...what is the business doing to invest in the engineering teams healing from this?" This is usually met with laughter, so I have to pivot to a question engineering managers will answer: “So...what have you changed?”
“Well, we never could’ve planned for this…"
You may think of emergency rooms when you hear the word triage, and you should. But triage has a broader application as a form of cognitive problem-solving. It is a skill set specifically for dealing with the unexpected and prioritizing among many urgent tasks, usually at a cost to something else that we value. Not all engineering crises are disaster work (although some are, and this should be more widely recognized: software is part of what keeps the lights on and the hospitals running), but handling a crisis still produces a pervasive, measurable, very tangible impact on knowledge workers.
But instead of talking about triage as its own experience, I see teams and their managers talk a lot about "planning" and "complexity" and other vague areas (so high-level as to be unhelpful, in my opinion). We try to apply a lot of pre-planning to triage situations, and get frustrated when it fails.
I think, for one thing, we could better support engineering managers by preparing them to distinguish between triage and non-triage planning. One key psychological skill for handling triage is cognitive flexibility: for example, a timely, adaptive recognition that "the old rules" aren't holding anymore. Eng managers tell me a lot of stories about sticking to old plans for too long and failing to hear the early signals of breakdown.
One fascinating thing about situations of triage is that quick decision-making gets massively prioritized, and we often achieve that speed by relying on highly patterned behaviors. Sometimes those patterns are highly biased. Under stress, our diminished cognitive resources make us different. It is a simple point but often hard to remember. So it's important to recognize that responding to a crisis often means we fall back on old patterns, "shadow" rules, and heuristics.
Another signal, conversations like this:
"We're evidence based, absolutely." "Really?" "Well....you know when things are really, really going wrong, you have to move fast. Rely on your gut instinct." "And did that make you feel like you've got the information you need to make a decision?" "Absolutely not. My team is angry I made the best call I could, and it feels like they'll never forgive me."
Based on my research, engineering managers are deeply stressed in general. I think this burden is massively underreported and undermeasured. I'm not playing the tiny violin for some of the most powerful employees in any modern technology organization, but let's be real: it's a sign that support is needed. When we talk about any single role "failing" inside an organization, we are often sampling a systemic failure through information surfaced by individuals.
In other words, engineering managers can be seen as an early warning system for the whole environment. But I hear massive differences in perception between the manager POV and other POVs: leaders complain to me about the eng managers, and thought leaders in the software research space seem able to build entire careers on the theme "managers can never be trusted." I find this alarming. It feels like a variant of contest culture, and we already know that takes a large toll on people's wellbeing.
One area where I think we could improve our manager support is by allowing managers to access over-time information about engineering work, for just-in-time diagnosis. Sometimes this information might need to be comparative and shared between managers. Managers tell me constantly that they feel incredibly isolated from both their peers and from comparative information in moments of crisis, especially new managers who may be worried about proving themselves. While I am incredibly aware of the potential harms of tracking our work (after all, I used to work in education), we need to think about where "trace data" tracking can reduce cognitive load for managers who are trying to make swift decisions, handling a barrage of conflicting reports, and working without a career's worth of exposure to different software team situations. All people learn by exposure to diverse examples. This is a principle of human learning, but it's unclear to me how we provide that to engineering managers in any structured way.
A signal that some thinking is needed here: "I went around the team and gathered everyone's best memory about what was going on...well, lots of people didn't really agree with each other, so I had to stand up in a meeting in front of everyone and try to synthesize a POV. I did my best to please everyone, and it feels like it ended up pleasing no one. They hate me for it now, but I guess it turns out it's just the job of a manager to be hated sometimes."
We will absolutely not sustain healthy leadership structures with this kind of learned helplessness. Let's consider a different sociocognitive story: “We came together to honestly share about the steps towards this incident, and then we all checked ourselves against a few pieces of data. Together, we figured out the story. I made a tough choice and we saw our limitations, but at least we were in it together. I learned even if I don't have all the answers, it's the job of a manager to make space for us to come together like this.”
Finally, one of the biggest risk signals I think about across all my data and the stories I hear is the overall, cumulative weight of it. In some industries and orgs, engineering managers are called upon by their organizations to take responsibility for triage not just once in a while, but often. Do engineering organizations have a sense of the cadence of this triage? When engineering managers become more effective and adept at using triage skills to prevent failures, does that feel recognized and known? Do we know whose career this is happening to, and do we think about this invisible load as a significant threat to the software stability of the world?
I don’t think so. One signal that makes me think we aren’t fully getting this: the sociocognitive, psychological cost of triage is thrown into the same general bucket as the paper cuts of “developer experience”, like adrenaline dumps for six months and chronic stress are the same thing as investing in a new license for a slightly shinier tool. Yes, those paper cuts add up, but “developer experience” is not currently providing a human-centered perspective on the real crisis work that many software teams have to face. We need psychology for that.