Why Do Good People Use Bad Psychometrics?

There are many different tests out there that purport to measure key characteristics of employees like personality, values and preferences. But not all psychometrics are created equal. There are some excellent tests, while bad tests can be worse than useless.

So why are so many invalid, unscientific or downright bad tests still widely used? Why do knowledgeable consultants, skilled HR professionals and excellent coaches still use bad psychometric tools?

Familiarity. Often consultants and coaches have an old favourite: a test they have used for a long time and are comfortable and confident using. There may be more recent and more valid tests, but it’s easiest to use what you know even if it isn’t the best tool for the job.

Cost. Cost is always a factor, so why not use a test that is inexpensive, or even free? There are freely available psychometric tests, some of which may provide relatively useful information. But often the old axiom holds: you get what you pay for.

Common Language. Psychometrics can be complex, but a shared set of terms and definitions can be extraordinarily useful. Once a group learns the jargon and has a new way to talk about a concept like personality, the terms stick. Once a group is taught the lingo and feels “in the know”, it creates a sense of shared group knowledge and insight. Even if the words don’t really mean anything.

Starts the Conversation. One of the most common reasons given by consultants who know they use bad psychometrics is that the tests give people something to talk about. A standard testing framework can help make very personal or abstract thoughts, feelings and emotions more concrete. The testing, along with that common language, can spark any number of conversations, activities, games, and “places to go” in a training workshop.

The Client Wants It. Oftentimes, a client is already “sold” on a particular psychometric, but they need someone trained to deliver the workshop. This is a business after all, and so some will just give the customer what they want. And an excellent trainer can deliver a good workshop or training session even if they are using poor psychometrics.

The Marketing Machine. Some of the least valid, most unreliable psychometrics have a skilled or well-funded marketing machine behind them. A good sales team can sell anything, even if there’s little actual value delivered by the product.

The Output is Attractive. There’s an old computing expression: “Garbage In, Garbage Out”. Unreliable tests cannot produce reliable results. But unreliable results can be very nicely packaged, particularly in flattering language that makes everyone feel good about themselves.

Problems “getting it right”. Some people, teams and organisations are dysfunctional. There’s an old joke about psychologists and changing light bulbs. The light bulb has to want to change. Often, dysfunctional groups fear change and want to preserve the status quo. It feels “safer” to do some useless development activity that ticks that box on the HR checklist instead of using a good psychometric that might identify the real problems.

The field of testing is still relatively new, and there are some legitimately good, scientifically sound psychometric tools. Excuses not to find and use the good ones still run rampant, but their legitimacy is fading. And many companies and consultants are getting savvy about which tests are genuinely good, valid and useful, and which legitimately add value.


Ian MacRae

Not Everyone is Normal, but Most People are Average

With few exceptions, all human characteristics from bra size to brain size and height to hearing are normally distributed.  Everything, not just intelligence, is a bell curve.  And because we know a lot about the statistical properties of a bell curve we “know” most of us are average on most things.  The trouble is when others tell us that this is so.

What the bell curve shows is that around 68% of us fall within one standard deviation above or below the mean.  In IQ terms that is 85 to 115; in height for men it's probably around 5 foot 5 to 6 foot; in bra size it's probably 34 to 38.  And about 95% of the population lie within two standard deviations: in IQ terms 70 to 130.  A few are really bright and a similar number really dim.  Most of us are average on most things… alas!
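
These proportions are easy to check.  A minimal sketch in Python, assuming the conventional IQ scaling of mean 100 and standard deviation 15 (which is what the 85 and 115 figures imply); the helper name share_within is purely illustrative:

```python
from statistics import NormalDist

# Assumption: IQ is scaled to mean 100 and standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

def share_within(dist, k):
    """Proportion of a normal population within k standard deviations of the mean."""
    lo = dist.mean - k * dist.stdev
    hi = dist.mean + k * dist.stdev
    return dist.cdf(hi) - dist.cdf(lo)

for k in (1, 2):
    lo, hi = 100 - 15 * k, 100 + 15 * k
    print(f"within {k} SD (IQ {lo} to {hi}): {share_within(iq, k):.1%}")
# within 1 SD (IQ 85 to 115): 68.3%
# within 2 SD (IQ 70 to 130): 95.4%
```

The same shares hold for any normally distributed trait; only the units change.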

Sometimes it’s a relief being told one is average.  Most middle-aged baby boomers react positively to being told that their sex life is average, which can also mean normal.  But for generation X, brought up on a rich diet of self-esteem improvement, being told one is average as opposed to extraordinary, special, very talented… is not so wonderful.

However there is one area of life where the bell-curve rarely rings true and that is in the world of appraisal.  Curiously this is despite the fact that we know that most work-related human abilities are normally distributed.  Consider the typical 5-point rating scale designed by HR for progress reviews and performance appraisal.  Typically 5 will be described as “outstanding” (or other superlatives like “exceptional performance”); 4 as “above standard”; 3 as “meets standards”; 2 as “below standards”; and 1 as “well-below standards”.  If work performance were a bell curve we would know exactly how many of each number we would find.
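
To make that concrete, here is a rough sketch of the expected share of each rating if performance really were normally distributed.  The cut-offs between ratings (at -1.5, -0.5, +0.5 and +1.5 standard deviations) are an illustrative assumption, not something any particular appraisal scheme specifies:

```python
from statistics import NormalDist

# Hypothetical cut-offs (in standard deviations) separating ratings 1|2|3|4|5.
# These are an illustrative assumption; no real appraisal scheme is implied.
CUTOFFS = (-1.5, -0.5, 0.5, 1.5)

def expected_shares(cutoffs=CUTOFFS):
    """Share of a normally distributed workforce falling into each rating band."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    edges = [0.0] + [z.cdf(c) for c in cutoffs] + [1.0]
    return [hi - lo for lo, hi in zip(edges, edges[1:])]

shares = expected_shares()
for rating, share in enumerate(shares, start=1):
    print(f"rating {rating}: {share:.1%} of staff")
```

Under these assumed cut-offs roughly 7% of staff would earn a 1 and a further 24% a 2, so about three in ten ratings would fall below “meets standards”.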

But everyone knows from experience that this never happens.  The five-point scale becomes, in effect, a three-point scale because numbers 2 and 1 are so rarely used.  Organisations are aware of this and try to deal with it in one of three ways:

The rarest, most Draconian and least successful is the forced distribution method.  In effect, this forces the bell curve by giving the rater a specific number of 1s, 2s, 3s, 4s and 5s that he/she can use.  If they “run out” of 4s they have to use 3s etc.  Managers hate this because they are forced to differentiate, which has implications for giving negative feedback, which is what they really hate most of all.

The second method is to change the labels.  So one has nearly all positive rating labels indicating above average: only the lowest score suggests below average, or may even suggest average itself.  Thus the message is that the worst you can be is average.  This is the favoured approach particularly where the culture is litigious.  This is a test of feedback by euphemism: a language spoken with unselfconscious eloquence by special relationship, transatlantic, ex-colonial friends.  Weaknesses used to be called developmental opportunities for the skill-challenged.  Now they are called latent strengthettes! The trouble with this method is that we soon run short of descriptors for Stakhanovite Wunderkinds.  The descriptors can also sound very hollow to the sceptical supervisor who actually works with these average but positive-feedback-hungry souls.

The third method is to widen the scale.  If a five-point scale effectively becomes a three-point scale, then theoretically a seven-point scale becomes five points, a nine-point scale becomes six or seven, etc.  Whilst this is usually true, scale constructors run out of labels for a 10-point scale.  And there is the problem of words not matching numbers: is the difference between exceptional and strong (performance) the same as between average and weak? This method taxes the form constructors’ verbal skills too much and is soon abandoned.

And all this fuss is about having to tell someone the truth: that much of their performance at work and much of their behaviour (attitude, skill, output) is average.  We all want to believe in Lake Wobegon, “Where all the women are strong and the men are good looking and the children are all above average”.  But if Lake Wobegon is a reasonable size, the truth must out: some women are strong but most are average, some men are handsome but others ugly, and alas only about half of the children are above average.

Differentiating feedback works well.  To be given poor ratings is an excellent warning or wake-up call for the serious under-performer.  He or she may “buck up their ideas”, increase effort or look for another job.  Unless the rating is biased or the employee deluded, the well-below-standard, serious-deficiency feedback is an excellent yellow card.  Top ratings only work well if they are both accurate and rare.  Satisfaction from a top score soon evaporates if one discovers others have similar ratings; further, a top score has to be matched by rewards like money, promotion or some other valued perk.

The problem lies with that middle group of 2/3 to 3/4 of employees who are pretty average.  They are OK; average, middle-of-the-road.  Most are the backbone, mainstay, heart of the company.  But vanity and pride and what the Chinese call “face” mean that people do not enjoy giving or getting that message.  Of course, the message is often harder to give because it is tied to monetary reward – the most tangible sort of feedback.

So raters come up with well-rehearsed and familiar lines explaining why they do not or cannot (“in all honesty”) give the statistically expected number of average ratings.  They include: “My people are actually above average” – meaning by both company and population standards.  After all, we selected them, and they work for the world-famous “Acme Widgets”, so they must be better than average.  Another approach apes the funny way Oxford dons used to make wonderfully fine distinctions: ‘alpha + - -’, which is subtly worse than ‘alpha + + -’.  So by making fine distinctions ‘within’ the average category, one can, paradoxically, ban the concept of average.

The third method of trying to avoid the “I think you are average” feedback is currently very popular.  It’s the 360-degree method whereby employees are not only rated by their boss, but also by their colleagues, subordinates, clients, customers… anybody, in fact, who knows them.  Here the feedback given can say you are average because the manager is simply the messenger of a large group, not personally responsible for the rating.  If ‘everybody’ says you are average it must be true! Perhaps the popularity of 360-degree feedback lies partly in the way people can receive feedback about their averageness.

Interestingly, if people evaluate themselves on a whole range of characteristics, the results seem fairly consistent.  People think they are slightly above average (about half to a full standard deviation) on all desirable characteristics (altruism, empathy, intelligence), and about the same amount below average on less desirable characteristics (dishonesty, jealousy etc.).  It may be psychologically healthy for the ego to maintain these beliefs, but it partly explains reactions to feedback that does not fit that carefully constructed but fragile picture.

The self-esteem movement that originated in America and which has had a profound effect on school children and university students is bubbling through to the world of work.  It is making it much more difficult for the honest appraiser to reveal the truth during any type of appraisal.  Those who are used to, and expect, exclusively positive, above-average feedback react rather badly when given the unusual message – your work is average.  The paradox is now that average is exceptional.


Adrian Furnham