Why Do Good People Use Bad Psychometrics?

There are many different tests out there that purport to measure key characteristics of employees like personality, values and preferences. But not all psychometrics are created equal. Some tests are excellent, while bad tests can be worse than useless.

So why are so many invalid, unscientific or downright bad tests still widely used? Why do knowledgeable consultants, skilled HR professionals and excellent coaches still use bad psychometric tools?

Familiarity. Often consultants and coaches have an old favourite: a test they have used for a long time and are comfortable and confident using. There may be more recent and more valid tests, but it’s easiest to use what you know, even if it isn’t the best tool for the job.

Cost. Cost is always a factor, so why not use a test that is inexpensive, or even free? There are freely available psychometric tests, some of which may provide relatively useful information. But often the old axiom holds: you get what you pay for.

Common Language. Psychometrics can be complex, but a shared set of terms and definitions can be extraordinarily useful. Once a group learns the jargon and has a new way to talk about a concept like personality, the terms stick. Being taught the lingo and feeling “in the know” creates a sense of shared group knowledge and insight, even if the words don’t really mean anything.

Starts the Conversation. One of the most common reasons consultants give for using psychometrics they know to be bad is that the tests give people something to talk about. A standard testing framework can help make very personal or abstract thoughts, feelings and emotions more concrete. The testing, along with that common language, can spark all manner of conversations, activities, games, and “places to go” in a training workshop.

The Client Wants It. Oftentimes, a client is already “sold” on a particular psychometric, but they need someone trained to deliver the workshop. This is a business after all, and so some will just give the customer what they want. And an excellent trainer can deliver a good workshop or training session even if they are using poor psychometrics.

The Marketing Machine. Some of the least valid, most unreliable psychometrics have a skilled or well-funded marketing machine behind them. A good sales team can sell anything, even if there’s little actual value delivered by the product.

The Output is Attractive. There’s an old computing expression: “Garbage In, Garbage Out”. Unreliable tests cannot produce reliable results. But unreliable results can be very nicely packaged, particularly in feel-good language that flatters everyone who reads the report.

Problems “getting it right”. Some people, teams and organisations are dysfunctional. There’s an old joke about psychologists and changing light bulbs. The light bulb has to want to change. Often, dysfunctional groups fear change and want to preserve the status quo. It feels “safer” to do some useless development activity that ticks that box on the HR checklist instead of using a good psychometric that might identify the real problems.

The field of testing is still relatively new, and there are some legitimately good, scientifically sound psychometric tools. The excuses for not finding and using the good ones still run rampant, but they are losing their legitimacy. Many companies and consultants are getting savvy about which tests are genuinely valid, useful and worth the money.


Ian MacRae

Seven Important Facts About Personality Testing at Work

Over the past few decades, publishers and management consultants have pushed, peddled and praised psychometric tests into widespread popularity. Yet despite nearly 100 years of research into psychometric testing, there are still missed opportunities, poorly administered tests, and many tests with little or no real evidence to support their use.

The popularity of testing ebbs and flows, and there are still many cynics, skeptics and traditionalists who never trusted the tests and who rejoice in their publicized failures.

But this periodic coverage is useful for reinvigorating interest in testing.  Here are seven frequently asked questions, with answers.

1. Why do people use psychometric tests in recruitment?

Tests can provide reliable data for better decision making.  People are complicated, ambiguous, capricious and difficult to read. Some people conceal, others exaggerate and over-share. Recruiters, managers and HR departments are trying to gauge many different things: creativity, handling pressure, integrity, punctuality, team-working.  They need reliable data to find, hire, promote and develop the right people.

2. Are tests cost-effective?

Cost-effectiveness depends on the accuracy of the test, the knowledge and ability of the test administrator, and how well the results are applied.  When considering cost-effectiveness, consider the cost of getting it wrong.  Ever tried to get rid of a well-dug-in, incompetent staff member who was a bad selection decision right from the start?  Ever seen someone you turned down years before now running a successful competitor company?  Weigh the benefits of making a good decision against the cost of making an error.

3. Should tests be used to select in or select out?

Most recruitment (should) start with a job analysis followed by an accurate and measurable list of the attributes required to succeed in the work. Use tests and the available evidence to look for competencies and capabilities. But it is equally important to look for negative traits as well as positive ones. Charm can be a useful attribute, but it may be problematic if it comes with flattery, deception and manipulation.

4. Is lying or faking good on these tests frequent, easy and really a problem?

Everyone presents themselves differently in interviews. They commit sins of omission and commission. Psychologists describe this as self-presentation and impression-management, while most people would say they are just doing what they need to do to get the job. Interviews, tests and even performance can be faked to a certain degree.  But if everyone gave the “obvious” and desirable answer there would be two consequences.  First, they would all give the same answer (which they patently do not).  Second, there would be no evidence of test validity (which there is).  There are many ways to catch dissimulation (a polite word for lying). The degree of dishonesty often depends on how the test is presented.

5. How do clients choose between tests?

There are well over 10,000 tests available, yet the average recruiter or HR manager has one or two favourites and knowledge of only a few. Some of the most popular tests have a strong marketing machine but little real validity. Peddlers of both valid and invalid tests know that clients do not know what questions to ask. The main thing is to understand (at least a little about) psychometric qualities such as reliability, validity and process, and how to assess the tests.

6. How important is personality at work anyway?

There are a number of factors that determine performance and potential at work, but five are clear: ability, motivation, personality, colleagues, and the organisation’s processes and procedures.  You need to be bright enough for the job and motivated to do it (well).  You need a functional, ship-shape, well-managed organisation.  No “ideal” personality profile can compensate if the other features are missing.  So it’s as dangerous to believe personality is all-important as to believe it is not at all important.

7. Does personality change over time?

Personality can change, but rarely does.  Go to a school reunion for evidence. Most personality and ability characteristics are hard-wired.  Behaviours, values and motivation can change, but personality rarely does. There is always more evidence of continuity than change, of stability than variability, of consistency than inconsistency.  Trauma, training, brain injury and therapy can change people.  But typically, by the mid-twenties, what you see is what you get. Teenage introverts are introverts at 90, though they may learn to fake extroversion when it’s required.

Of course personality is important at work.  Of course there are more or less desirable profiles for particular jobs.  The question remains: how do you choose to find out about an applicant’s personality?


Adrian Furnham

Not Everyone is Normal, but Most People are Average

With few exceptions, all human characteristics from bra size to brain size and height to hearing are normally distributed.  Everything, not just intelligence, is a bell curve.  And because we know a lot about the statistical properties of a bell curve we “know” most of us are average on most things.  The trouble is when others tell us that this is so.

What the bell curve shows is that around 68% of us fall within one standard deviation above or below the mean.  In IQ terms that is 85 to 115; in height for men it's probably around 5 foot 5 to 6 foot; in bra size it's probably 34 to 38.  And around 95% of the population lie within two standard deviations: in IQ terms 70 to 130.  A few are really bright and a similar number really dim.  Most of us are average on most things… alas!
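Those percentages fall straight out of the normal cumulative distribution function, and can be checked with nothing more than Python's standard library (the IQ mean of 100 and standard deviation of 15 used below are the conventional scaling, as in the text):

```python
from math import erf, sqrt

def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

# Share of the population within 1 and 2 standard deviations of the mean
within_1sd = normal_cdf(1) - normal_cdf(-1)
within_2sd = normal_cdf(2) - normal_cdf(-2)
print(f"within 1 SD: {within_1sd:.1%}")   # 68.3%
print(f"within 2 SD: {within_2sd:.1%}")   # 95.4%

# In IQ terms (mean 100, SD 15): 1 SD spans 85-115, 2 SDs span 70-130
print(f"IQ 85-115: {normal_cdf(115, 100, 15) - normal_cdf(85, 100, 15):.1%}")
```

The exact figures are 68.3% and 95.4%, which is why the "68" and "95" shorthand is used above.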

Sometimes it’s a relief being told one is average.  Most middle-aged baby boomers react positively to being told that their sex life is average, which can also mean normal.  But for Generation X, brought up on a rich diet of self-esteem improvement, being told one is average as opposed to extraordinary, special, very talented… is not so wonderful.

However there is one area of life where the bell-curve rarely rings true and that is in the world of appraisal.  Curiously this is despite the fact that we know that most work-related human abilities are normally distributed.  Consider the typical 5-point rating scale designed by HR for progress reviews and performance appraisal.  Typically 5 will be described as “outstanding” (or other superlatives like “exceptional performance”); 4 as “above standard”; 3 as “meets standards”; 2 as “below standards”; and 1 as “well-below standards”.  If work performance were a bell curve we would know exactly how many of each number we would find.
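If ratings really did follow the bell curve, each scale point would cover a fixed, predictable share of staff. A minimal sketch of that claim, assuming (purely for illustration) that each of the five bands spans one standard deviation, centred on the mean:

```python
from math import erf, sqrt

def normal_cdf(z):
    """Cumulative probability of a standard normal distribution at z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Band boundaries in SD units: rating 1 below -1.5 SD, rating 5 above +1.5 SD
bounds = [float("-inf"), -1.5, -0.5, 0.5, 1.5, float("inf")]
for rating in range(1, 6):
    share = normal_cdf(bounds[rating]) - normal_cdf(bounds[rating - 1])
    print(f"rating {rating}: {share:.1%}")
# rating 1: 6.7%, rating 2: 24.2%, rating 3: 38.3%, rating 4: 24.2%, rating 5: 6.7%
```

Under this assumption roughly 7% of staff should receive a 1 and 24% a 2; as the next paragraph notes, those two numbers are almost never handed out.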

But everyone knows from experience that this never happens.  The five-point scale becomes, in effect, a three-point scale because numbers 2 and 1 are so rarely used.  Organisations are aware of this and try to deal with it in one of three ways:

The rarest, most Draconian and least successful is the forced distribution method.  In effect, this forces the bell curve by giving the rater a specific number of 1s, 2s, 3s, 4s and 5s that he or she can use.  If they “run out” of 4s they have to use 3s, and so on.  Managers hate this because they are forced to differentiate, which has implications for giving negative feedback, which is what they really hate most of all.

The second method is to change the labels.  So one has nearly all positive rating labels indicating above average: only the lowest score suggests below average, or may even suggest average itself.  Thus the message is that the worst you can be is average.  This is the favoured approach, particularly where the culture is litigious.  This is feedback by euphemism: a language spoken with unselfconscious eloquence by our special-relationship, transatlantic, ex-colonial friends.  Weaknesses used to be called developmental opportunities for the skill-challenged.  Now they are called latent strengthettes! The trouble with this method is that we soon run short of descriptors for Stakhanovite Wunderkinds.  The descriptors can also sound very hollow to the sceptical supervisor who actually works with these average but positive-feedback-hungry souls.

The third method is to widen the scale.  If a five-point scale effectively becomes a three-point scale, then theoretically a seven-point scale becomes five points, a nine-point scale six or seven, and so on.  Whilst this is usually true, scale constructors run out of labels for a ten-point scale.  And there is the problem of words not matching numbers: is the difference between exceptional and strong (performance) the same as between average and weak? This method taxes the form constructors’ verbal skills too much and is soon abandoned.

And all this fuss is about having to tell someone the truth: that much of their performance at work and much of their behaviour (attitude, skill, output) is average.  We all want to believe in Lake Wobegon, where “all the women are strong, all the men are good looking, and all the children are above average”.  But if Lake Wobegon is a reasonable size, the truth must out: some women are strong but most are average, some men are handsome but others ugly, and alas only about half of the children are above average.

Differentiating feedback works well.  To be given poor ratings is an excellent warning or wake-up call for the serious under-performer.  He or she may “buck up their ideas”, increase effort or look for another job.  Unless the rating is biased or the employee deluded, the well-below-standard, serious-deficiency feedback is an excellent yellow card.  Top ratings only work well if they are accurate and rare.  Satisfaction from a top score soon evaporates if one discovers others have similar ratings; further, a top score has to be matched by rewards like money, promotion or some other valued perk.

The problem lies with that middle group of two-thirds to three-quarters of employees who are pretty average.  They are OK: average, middle-of-the-road.  Most are the backbone, mainstay and heart of the company.  But vanity, pride and what the Chinese call “face” mean that people do not enjoy giving or getting that message.  Of course, the message is often harder to give because it is tied to monetary reward – the most tangible sort of feedback.

So raters come up with well-rehearsed and familiar lines explaining why they do not or cannot (“in all honesty”) give the statistically expected number of average ratings.  They include: “My people are actually above average” – meaning by both company and population standards.  After all, we selected them; and they work for the world-famous “Acme Widgets”, so they must be better than average.  Another approach apes the funny way Oxford dons used to make wonderfully fine distinctions: ‘alpha + - -’, which is subtly worse than ‘alpha + + -’.  So by making fine distinctions within the average category, one can, paradoxically, banish the concept of average.

The third method of trying to avoid the “I think you are average” feedback is currently very popular: the 360-degree method, whereby employees are rated not only by their boss but also by their colleagues, subordinates, clients, customers… anybody, in fact, who knows them.  Here the feedback can say you are average because the manager is simply the messenger of a large group, not personally responsible for the rating.  If ‘everybody’ says you are average it must be true! Perhaps the popularity of 360-degree feedback lies partly in the way it lets people receive feedback about their averageness.

Interestingly, if people evaluate themselves on a whole range of characteristics, the results are fairly consistent.  People think they are slightly above average (about half to a full standard deviation) on all desirable characteristics: altruism, empathy, intelligence; and about the same amount below average on less desirable characteristics (dishonesty, jealousy, etc.).  It may be psychologically healthy for the ego to maintain these beliefs, but it partly explains reactions to feedback that does not fit that carefully constructed but fragile picture.

The self-esteem movement that originated in America, and which has had a profound effect on school children and university students, is bubbling through to the world of work.  It is making it much more difficult for the honest appraiser to reveal the truth during any type of appraisal.  Those who are used to, and expect, exclusively positive, above-average feedback react rather badly when given the unusual message: your work is average.  The paradox now is that average is exceptional.


Adrian Furnham