Before You Publish Appraisals, Ask Yourself: Are You About to Reward Bias? 

The Mars Curiosity Rover wasn’t just sent to Mars to collect data – it was built to explore questions no one could yet answer. It didn’t land there by chance either. It happened because thousands of people believed that discovering new things isn’t optional – it’s a responsibility.  

Progress, after all, depends on a certain kind of thinking. The willingness to ask why, to challenge assumptions, and to keep pushing on “what if” long enough for patterns to emerge and insights to take shape.

Curiosity, in that sense, isn’t just a machine. It’s a mindset. 

And yet, that mindset is often missing where it matters most – like during the appraisal season. 

By the time performance ratings are finalized, most organizations have shifted gears into execution mode. Reviews are written. Ratings are calibrated. Promotion recommendations are lined up. The focus shifts to rollout – getting everything published smoothly and on time.

But what if this window, right before the results go live, isn’t the end of the process – but your cue to be curious?

Curious as in taking a deliberate pause to interrogate outcomes. To ask: Why are certain teams rated consistently higher than others? Are people being rated on a full year of performance, or just on what was most recent or most visible?

Because the bitter truth is that performance cycles don’t fail with a big bang; they fail quietly – through subtle errors and inconsistencies, subconscious assumptions, and patterns that only emerge when someone is curious enough to look for them.

Here, we’ll make you uncomfortable by asking a harsh question:

Are you about to reward bias?

To avoid that, let’s walk through three key checks worth running every single year before you hit publish:

Are your people being paid their worth?

Let’s start with a question your compensation team almost certainly knows the answer to, but your HR Business Partner might not have on their radar. 

Where does each employee actually sit in their salary band?

Not just their grade. Not just their title. But within the range itself – are they near the bottom, hovering around the middle, or nudging the top? That position has a name: compa-ratio, the actual salary divided by the midpoint of the band. A ratio of 1.0 means the employee is right at the midpoint. Below 1.0, they’re in the lower half. Above it, the upper.
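For an illustrative example: an employee earning 115,000 against a band midpoint of 100,000 has a compa-ratio of 115,000 / 100,000 = 1.15 – well into the upper half of the band.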

Now take that number and lay it next to the performance rating.

Two uncomfortable clusters tend to show up, and once you see them, you can’t unsee them. 

The first is someone who’s been rated “Below Expectations” but is sitting at a compa-ratio of 1.15 or above. They’re being paid well above the midpoint of their band, and they’re not delivering the performance that justifies it. That’s a budget problem, yes, but it’s also quietly unfair to every colleague around them who’s working harder and earning less. 

The second cluster is the one that should keep you up at night: your best people, rated “Exceeds Expectations”, sitting at a compa-ratio below 0.85. They’re in the bottom quartile of their salary band despite delivering exceptional work. And here’s the thing: they usually know. They can feel it. If they haven’t started quietly exploring other options yet, the moment you publish this appraisal without addressing it, the clock starts ticking.


Neither of these is a rare case. They exist in almost every performance cycle, in almost every organization. They persist not because HR doesn’t care, but because nobody ran the cross-check before hitting publish. 

So do it now, while you still can. Pull compa-ratios from compensation, lay them against your ratings, and look for these two clusters. What you choose to do with what you find is what separates a genuinely fair appraisal process from one that just looks the part.
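If the finalized ratings and compa-ratios can be pulled into one flat table, the cross-check itself is only a few lines of analysis. Below is a minimal sketch in Python with pandas – the column names, sample values, and even the 1.15 / 0.85 thresholds are illustrative, not a prescribed Oracle extract:

```python
import pandas as pd

# Illustrative export: one row per employee, with the finalized
# rating and the compa-ratio supplied by compensation.
df = pd.DataFrame({
    "employee":    ["A101", "A102", "A103", "A104"],
    "rating":      ["Below Expectations", "Exceeds Expectations",
                    "Meets Expectations", "Exceeds Expectations"],
    "compa_ratio": [1.18, 0.82, 1.01, 0.97],
})

# Cluster 1: paid well above the band midpoint, rated below expectations.
overpaid_underachievers = df[
    (df["rating"] == "Below Expectations") & (df["compa_ratio"] >= 1.15)
]

# Cluster 2: top-rated people sitting in the bottom of their band.
underpaid_high_performers = df[
    (df["rating"] == "Exceeds Expectations") & (df["compa_ratio"] < 0.85)
]

print(overpaid_underachievers)    # flags A101
print(underpaid_high_performers)  # flags A102
```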

Are your managers rating fairly – or just consistently?

Here’s something nobody says out loud during performance calibration but probably should: every manager walks into that room convinced their ratings are fair. 

And most of them genuinely believe it. 

But there’s a difference between meaning well and rating fairly – and the gap between those two things only becomes visible when you zoom out and look at the numbers side by side. 

So, ask yourself: how does each manager’s rating distribution actually compare to everyone else’s?  

Picture this. Your company’s average for “Exceeds Expectations” sits at 18%. One manager has rated 55% of their team there. Now, is it possible they’ve somehow assembled the most exceptional group of people in the entire organization? Sure, it’s possible. But it’s worth asking the question, because what’s more likely is that this manager struggles to have difficult conversations, defaults to generous ratings to keep their team happy, and has quietly made “Exceeds” mean something very different on their floor than it does everywhere else. 

The other end of the spectrum is just as damaging, just in a quieter way. A manager who rates 95% of their team as “Meets Expectations” – regardless of what actually happened that year – is essentially telling their strongest performers that outstanding work and average work look the same from where they’re sitting. That’s not just demoralizing. It’s the kind of thing that makes people update their LinkedIn profiles.

This isn’t a call to force everyone onto a bell curve. It’s simpler than that. Find the managers whose distributions are genuine outliers, sitting more than one standard deviation from the peer group average, and have a calibration conversation with them before the results go live. 

Because once the appraisals are published, you’re no longer course-correcting. You’re managing the fallout. 
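Surfacing those outlier distributions is equally mechanical once the ratings sit in a flat table. Here is a rough sketch, under the same illustrative assumptions about column names and sample data:

```python
import pandas as pd

# Illustrative export: one row per employee, with manager and final rating.
df = pd.DataFrame({
    "manager": ["M1", "M1", "M1", "M1",
                "M2", "M2", "M2", "M2",
                "M3", "M3", "M3", "M3"],
    "rating":  ["Exceeds Expectations", "Exceeds Expectations",
                "Exceeds Expectations", "Meets Expectations",
                "Meets Expectations", "Meets Expectations",
                "Meets Expectations", "Exceeds Expectations",
                "Meets Expectations", "Meets Expectations",
                "Below Expectations", "Meets Expectations"],
})

# Share of "Exceeds Expectations" ratings per manager.
exceeds_share = (
    df["rating"].eq("Exceeds Expectations")
      .groupby(df["manager"]).mean()
)

# Flag managers sitting more than one standard deviation from the
# peer-group average -- these are the calibration candidates.
mean, std = exceeds_share.mean(), exceeds_share.std()
outliers = exceeds_share[(exceeds_share - mean).abs() > std]
print(outliers)  # flags M1, who rates 75% of their team "Exceeds"
```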

Is the same performance worth less depending on who delivers it? 

This one will make quite a few people uncomfortable, which is precisely why it needs to be asked. 

Don’t run this check across the whole organization – at that level, it becomes too broad. Look within the same job grade and the same function, control for tenure where you can, and compare average performance ratings across gender.

Now look at the numbers.

If women at Grade 5 in your Sales function are consistently rated 0.3 points lower than men at the same grade, and that gap holds across multiple managers and multiple review cycles, that is a systemic signal. Not an anomaly. Not a coincidence. A pattern. 
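A sketch of that slice-and-compare is below, assuming the ratings have already been mapped onto a numeric scale; every name and number in it is illustrative:

```python
import pandas as pd

# Illustrative numeric ratings for one function and grade.
df = pd.DataFrame({
    "function": ["Sales"] * 8,
    "grade":    [5] * 8,
    "gender":   ["F", "F", "F", "F", "M", "M", "M", "M"],
    "rating":   [3.1, 3.2, 3.0, 3.3, 3.5, 3.6, 3.4, 3.5],
})

# Average rating by gender within each grade/function cell.
avg = (
    df.groupby(["function", "grade", "gender"])["rating"]
      .mean()
      .unstack("gender")
)

# A gap of ~0.3 points that persists across managers and review
# cycles is a systemic signal, not noise.
avg["gap"] = avg["M"] - avg["F"]
print(avg)  # here the gap comes out at 0.35 points
```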

 

And here’s what makes this particular check so important – the bias is rarely intentional. Nobody sat down and decided to rate women lower. Yet as per McKinsey and LeanIn’s Women in the Workplace 2024 report, for every 100 men promoted, only 81 women are promoted – barely three more than the figure the same study reported in 2018. Ratings feed promotions, so the ratings themselves were part of the problem, year after year, long before anyone noticed.

The window before appraisals go live is where you can actually do something about it. You still have room to course-correct, to go back to managers, to ask harder questions, and to fix what can be fixed. Once the results are published, that window closes. What was a quiet pattern becomes an official record that shapes someone’s pay, their promotion trajectory, and their sense of whether this organization actually values them.

Summing Up 

The organizations that answer these questions right don’t just run appraisal cycles. They investigate them.  

They treat the gap between “ratings complete” and “results published” as a window for discovery, a chance to surface bias, test alignment, and ensure that what looks fair on the surface actually is fair underneath. 

In other words, they do what Curiosity was built to do: They ask better questions before they accept the answers. 

The good news is that your Oracle Fusion HCM Cloud already captures the data for every check we’ve discussed above – salary bands, final performance ratings, legislative and demographic attributes. It’s just a matter of running those checks.

Oracle’s Workforce Compensation module can flag compa-ratio and performance-rating mismatches – the two danger zones, Overpaid Underachievers and Underpaid High Performers – to managers before they even submit their promotion recommendations. Most implementations never configure these warnings. They exist. They just sit unused.

One route to reducing leniency or severity bias is switching the rating scale. Instead of letting managers review on a 5-point scale, with a comfortable “Meets Expectations” midpoint where they can park 80% of their team, switch to a 4-point scale that leaves no middle to hide in. Every rating then has to be a deliberate choice. It’s a small configuration change, but the impact is significant. Furthermore, leveraging automated rating calculation can limit the scope for arbitrary or inconsistent scoring.

As per Harvard Business Review’s article “How Gender Bias Corrupts Performance Reviews, and What to Do About It”, the fix for gender bias starts with 360-degree feedback. Oracle’s Performance Management is well-equipped for this: it can gather input from supervisors, matrix managers, and even peers, while check-ins can be set up as regular touchpoints throughout the year rather than a single end-of-cycle review. The rating criteria can be fixed too – use outcome-specific, gender-neutral criteria that evaluate what an employee did, not who their manager thinks they are.

And then there’s AI. An AI agent working over your HCM data will not wait for HR or anyone else to ask these questions. It keeps monitoring: flagging managers whose ratings have drifted beyond one standard deviation from the peer group, shedding light on compa-ratio and rating mismatches, and running gender-bias checks on demographic attributes automatically. This may sound like a distant possibility, but it is what outcome-based tools like HindsightAI are already designed to do – to not let your data vanish into the dark before the window closes.

Curiosity was built to explore a planet about 140 million miles away; exploring and deriving actionable insights from your HCM data should be an easier problem to solve.

 


Dhriti Kothari is a Senior HCM Consultant with a passion for helping organizations create smarter and more people-centric HR processes. She enjoys exploring the intersection of technology, workforce strategy, and employee experience. Outside of work, Dhriti loves singing and getting lost in a good fantasy novel.

Frequently Asked Questions

How often does bias actually creep into performance reviews?

Way more than managers want to admit, honestly. Affinity bias is probably the most common one. You just hit it off with someone, and somehow they walk out of review season with stronger feedback than the quieter person who actually shipped more. Recency bias is another big one. Whatever happened in the last three weeks tends to colour the entire year’s rating, even if the first ten months told a completely different story.

Gender bias is everywhere too. A woman pushes back in a meeting and she’s “abrasive” or “too much.” A guy does literally the same thing, and he’s “driven” or “confident.” Then you’ve got the halo effect, where one really strong project makes everything else look better than it actually was. And in reverse: one bad miss, and people can’t seem to let it go for months. New parents, mums especially, quietly get labelled “less committed”, even when their output hasn’t dropped at all.

The frustrating part is that in the moment, none of this feels like bias. It usually just feels like a gut call. 

Can you get rid of bias completely?

You’re not going to get rid of it completely – we’re human, and our brains cut corners. But you can stop it from running the show.

The biggest thing that helps is structure. When every manager answers the same questions and must back up their ratings with actual examples, there’s less room for gut feelings to take over. Calibration sessions help too. Getting managers in a room to compare ratings forces them to explain why one person is a 5 and another is a 3, and that’s usually where shaky reasoning falls apart. 

It also helps to actually look at the words being used. “Aggressive”, “not a team player”, “rough around the edges” – these show up way more often for some groups than others, and they’re worth questioning. Giving feedback throughout the year rather than cramming it into one big review takes the pressure off memory. And every so often, look at the patterns in your data. If the same kinds of people keep ending up at the bottom, something other than performance is probably going on.

Are organizations actually getting better at this?

Sort of. The conversation is happening more than it used to; most companies run some kind of bias training, talk about fairness, and have at least a basic review framework in place. That’s progress.

But there’s a real gap between talking about bias and actually fixing it. Plenty of employees still leave reviews feeling like the rating had more to do with their manager’s preferences than their work. Women, people of colour, and remote employees are especially likely to say the process doesn’t feel fair, and that’s not something you can train your way out of in an afternoon. 

What’s shifting things more than anything is employees themselves pushing for clearer criteria and the ability to push back without it hurting their careers. That’s where the actual change tends to come from. 
