Goodhart, Bad Intentions!
Peter Drucker famously said, “What gets measured, gets managed.” This is largely true: you can only improve what you measure. However, there’s another conniving little force that comes into play when a metric is known to those being measured. Goodhart’s Law states that when a measure becomes a target, it ceases to be a good measure. This is because when people know they are being judged on a particular measure, they do everything in their power to improve that metric, whether or not doing so improves the thing the metric was meant to capture.
There is an (in)famous story of a company that measured time spent on customer support calls. This measure was chosen to improve customer resolution speeds. The top-rated customer service representative at this company had unbelievable resolution speeds. Most of his calls lasted less than a third of the time it took other people in the department. It was only when management looked much deeper into his workflow (in order to learn from it and improve others) that they realized why his call times were so short: he simply hung up on customers after giving them a single potential solution, without waiting for them to try it out and confirm that the issue was actually resolved. The poor customers simply had to call back and talk to someone else who would really solve their problem, while that someone else took a hit on their own closure speed metric!
Or take my favourite example of my car company’s service arm (I won’t name the company because I still need to get my car serviced). They have a system where, after each service, their customer satisfaction team calls the customer to ask for a rating. They have a target to get a rating of 9 or above. Every time I get my car serviced, I get a call from them asking for a rating. If I give a rating below 9, they offer all sorts of discounts and freebies and request me to give a higher rating when the “actual” customer satisfaction team calls! They game the system by simply pre-empting the ratings call!
Performance Ratings
Goodhart’s Law plays out in HCM far more than most people realize. Take the company Cypress, whose story T.J. Rodgers shared in the Harvard Business Review article “No Excuses Management” in the 1990s. It was a highly measurement-driven company. I quote the article here:
“All of Cypress’s 1,400 employees have goals, which, in theory, makes them no different from employees at most other companies. What does make our people different is that every week they set their own goals, commit to achieving them by a specific date, enter them into a database, and report whether or not they completed prior goals. Cypress’s computerized goal system is an important part of our managerial infrastructure. It is a detailed guide to the future and an objective record of the past. In any given week, some 6,000 goals in the database come due. Our ability to meet those goals ultimately determines our success or failure.”
Have you heard of Cypress? I suspect not.
One of the things that happens when you measure people on the number of goals and track aggregate completions is that people put in goals that are easy to complete. They did 6,000 things that didn’t really contribute much. After all, “brush my teeth” counts the same as “build a Tesla” if all you measure is the number of goals.
This may be an exaggerated example, but I’ve seen most companies succumb to this fallacy. We set goals in April, review them the following March, and base our performance appraisals on that. We feel “scientific” when we make SMART, quantifiable, objective goals. But unfortunately, when we start to measure people on objective, reduced metrics and tie a paycheck to them, people tend to figure out how to boost the metric in letter without necessarily boosting it in spirit.
“Boost revenue by 10%”? Sure! I’ll make sales with razor-thin margins and sacrifice profitability in favour of revenue!
“Complete 2 certifications”? Sure! Let me pick the easiest certifications I can find and use cheat sheets to get through!
You get the point.
So how do you solve this?
Well, you don’t. Not fully. But what you can do is use multiple measures, which is my favourite approach anyway: regular reviews combined with several different metrics for measuring performance. You can round out performance metrics across areas that cover contribution to the team, feedback from peers, feedback from juniors (360-degree feedback is important), innovativeness, ability to bounce back from difficult situations, rate of learning from mistakes, outcomes generated, and so on. The idea is to look at a person’s contribution as a whole rather than reducing people to a single talent rating.
The other thing to do is to use proxy metrics. Analytical folks have an inherent bias towards objectivity, and therefore prefer objective metrics. But in most human businesses, the qualitative outcomes matter more. The way to measure qualitative impact is to look at proxies. For instance, look at the Customer Satisfaction (CSAT) “score”, but also look at the number of renewals, public testimonials, positive words shared by customers, referenceability, the customer’s willingness to pay higher rates to retain the original team, etc. as proxy metrics for customer satisfaction. These proxies are much harder to optimize or game.
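To make the arithmetic behind this concrete, here is a minimal sketch in Python. The metric names and the equal weighting are purely illustrative assumptions, not a prescription; the point is simply that maxing out one visible target moves a blended score far less than it moves the single metric:

```python
# Hypothetical proxy metrics, each on a 0-100 scale.
# Equal weights are an illustrative assumption, not a recommendation.
METRICS = ["csat_score", "renewal_rate", "testimonials", "peer_feedback"]

def composite(scores: dict) -> float:
    """Blend several proxy metrics into one overall performance view."""
    return sum(scores[m] for m in METRICS) / len(METRICS)

honest = {"csat_score": 70, "renewal_rate": 70,
          "testimonials": 70, "peer_feedback": 70}

# Game the one metric everyone can see, leaving the rest untouched.
gamed = dict(honest, csat_score=100)

print(composite(honest))  # 70.0
print(composite(gamed))   # 77.5 -- a 30-point jump in one metric shifts the blend by only 7.5
```

A single-metric target would have rewarded that 30-point jump in full; the blend rewards it at a quarter of the value, so gaming one number is no longer worth the effort.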
A lot of this depends on a company’s size, scale, culture, and nature. We’ve been helping customers find the talent processes that work for them. While these are just two ways to improve performance appraisals as a process, each company needs its own approach to talent management.
The Orbrick Way: POKR
We at Orbrick implement a version of this idea in our very own POKR framework, which builds on John Doerr’s OKRs. POKR stands for Purpose, Objectives, Key Results/KPIs, and Rituals. These are big, ambitious, purpose-guided objectives that are measurable, but the measures aren’t targets. The purpose is always personal to each employee. It is the answer to “why do you come to Orbrick on a Monday?”, or “we know why Orbrick hired you, but why did YOU hire Orbrick?”. The purpose then brings out the objectives, and the objectives that align with the organization in some way are selected (generally no more than three per quarter). There are no consequences for failure. We then use something called a talent dossier, which tracks over 20 different metrics to gauge how a person has performed. People’s POKRs are reviewed quarterly, and the talent dossier is updated regularly too. This helps us keep biases (like recency bias or the availability heuristic) low and is more resilient to gaming!
Purpose drives Objectives. Objectives guide Key Results or Goals. Goals are met through Rituals. Rituals reflect Identity. Identity fuels Purpose.