Benchmarking is a bit like putting a stake in the ground so you can see how far you've come when you reach the end of a process.
How it relates to navigation design
– It allows you to see the scale of the problem at the start of a design process, and
– measure the outcome once it's launched.
Ultimately your goal is to create something that doesn't make the problem any worse.
Why you should benchmark
You might think having a sense of where you are now isn't that important, but things can get lost in design processes. This is especially true if the time between problem identification and launch is long.
Benchmarking also gives you a bit more information on the problem you're trying to solve (win-win).
The key reason though is to help you understand if your changes are a success or failure.
What is benchmarking?
I've heard a lot of people ask 'what's the industry best practice for x so we can benchmark against it'.
At which point my head usually hits my desk with a thunk.
Navigation design performance is unique to your product or service. You're not keeping up with the Joneses.
The measurements you use are widely used, but how you interpret the data or insights needs to be unique to you.
A lot of analytics data is a closely kept secret in an organisation.
The only universal metric that can be benchmarked to an industry standard is bounce rate. However, even then the time limit for this can be adjusted in your analytics software.
How do you benchmark?
You first need to define the parameters.
These will need to include:
– What you are benchmarking (please see understanding the problem for more information on this):
> Analytics data
> Screen capture session data
> How well someone completes a task in testing
> Number of customer support queries
> NPS findability score
> Tree test score (aka findability)
– Data parameters, which could include (this is not an exhaustive list)
> Date range – e.g. a full calendar month, three calendar months
> Volume – e.g. percentage of people who are using Google as navigation
> Accuracy – e.g. 4 out of 5 people couldn't find the navigation in testing
> Findability – e.g. x% of people couldn't find what they were looking for
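One way to keep those parameters pinned down is to write each benchmark out as a small record, so the same definition can be reused when you measure again after launch. This is only a rough sketch: the class name, field names, dates and metric wording are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BenchmarkDefinition:
    """One thing you are benchmarking, plus the parameters it is measured under."""
    source: str   # e.g. "analytics", "tree test", "support queries"
    metric: str   # what is being counted
    start: date   # first day of the date range
    end: date     # last day of the date range (inclusive)

    def days_covered(self) -> int:
        # Inclusive count of days in the date range.
        return (self.end - self.start).days + 1


# Hypothetical example: three full calendar months of analytics data.
search_as_navigation = BenchmarkDefinition(
    source="analytics",
    metric="percentage of people using Google as navigation",
    start=date(2021, 1, 1),
    end=date(2021, 3, 31),
)
print(search_as_navigation.days_covered())
```

Freezing the record is a deliberate choice here: once a benchmark is defined, changing its date range or metric part-way through would make the before and after numbers hard to compare.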
Once you've defined them, you'll need to get the data that answers these questions.
Statistically viable data
The data you're capturing needs to be a statistically viable representation of the problem and the audience.
One week's worth of data is not a true representation of the behavioural patterns. That week could be anomalous.
Even when working on a website like NHS Digital, which had 1 billion visits in 2021, I refused to take a week's sample as evidence of a significant problem.
I could see problems starting to form but I still had to adopt a watch and wait approach. This was to see if:
– it was a coronavirus blip,
– there was an external force causing a blip,
– there was a seasonal impact, or
– we had a problem.
If you're wondering, I saw all of those things happening from the search analytics data alone.
Why it's essential it's statistically viable
I worked on a project where stakeholders had been against a navigation approach proposed by the team working on it.
So the stakeholders designed something of their own and compared it against what they currently had.
They declared what they wanted to implement was a winner, and they had data to prove it!
The problem was the sample sizes.
The sample size of their preferred option was smaller than the benchmark.
Both sample sizes were also very small.
The site got in excess of a million visitors a year.
The benchmark sample had around 34 people in it.
The preferred option had 26.
The two were not comparable, and both were too small to give a statistically significant result to base an important choice on.
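To put a rough number on how little samples of 34 and 26 can tell you, here is a minimal sketch of the worst-case margin of error for a percentage. It assumes a simple random sample and uses the standard normal approximation at a 95% confidence level, which is itself shaky at sizes this small. The function name is made up for illustration.

```python
import math


def worst_case_margin_of_error(sample_size: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion, at its widest (p = 0.5)."""
    return z * math.sqrt(0.25 / sample_size)


for label, n in [("benchmark", 34), ("preferred option", 26)]:
    moe = worst_case_margin_of_error(n)
    print(f"{label}: n={n}, margin of error about ±{moe * 100:.0f} percentage points")
```

Under those assumptions the margins come out at roughly 17 and 19 percentage points either way, so the two options would need to be a very long way apart before you could call one a winner.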
If you want to sell a major change to stakeholders who don't understand what you're trying to do, you need a representative sample size.
Sample sizes
I'm not a statistician. I suck with numbers. I had to resit my GCSE Maths. So I'm not about to get all numbersey on you.
A while ago I did do some searches on what represents a good sample size for the testing we do. I didn't find very much (note to self, must do this again, probably before writing a piece like this).
What I did find was from Optimal Workshop, who give clear guidance on what they see to be representative sample sizes on their testing platform.
Their advice holds true: I've run a number of different tests and found their guidance to be accurate.
The problem is (isn't it always?) the bloody stakeholders.
If you're using this data to get sign-off on budget or approval, they will want to see numbers that represent live traffic data.
The way I get around this, because let's be honest, a representative sample of 1 billion visitors would blow a budget, is to be iterative.
The more data you generate as you go through something, the stronger your case is.
If you've got analytics data that shows a high percentage of people are not using your navigation, it's a live, indisputable fact.
However, if you've identified a problem in usability testing or research, it's likely you only have a small amount of insight. You would probably want to use that as a prompt to look at other types of research, to help you build a picture of how significant the problem is.
Tree testing or first click testing as benchmark tools
Tree testing can give you a findability score for navigation structure. I'm heavily caveating this because tree testing doesn't allow for context created by visual language.
Using tree testing
Tree testing allows you to put your taxonomy into a test and you can ask people to find things. It gives you a degree of statistical viability.
It also gives you a measure of findability.
For example, if you test what you have now and find that 68% of people couldn't find the key things in your structure, you have a problem.
The problem could be the labelling.
I say could be, because visual context is vital in navigation design. Your labels might work but you have a burger menu icon and 80% of traffic is on mobile.
On the flip side, your navigation could be awesome, but your labels are meaningless.
A word of caution. If you use tree testing for benchmarking you will need to write the questions to work for benchmarking and testing. If you don't, you could bias the result of the testing. This isn't easy, sorry.
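A findability score from a tree test is only as solid as the number of people behind it. As a rough sketch, the function below turns a success count into a score plus a plausible range, using the Wilson score interval at a 95% confidence level. The function name and the participant numbers are hypothetical, picked to match the 68% failure example above.

```python
import math


def findability_with_interval(successes: int, participants: int, z: float = 1.96):
    """Return (score, low, high): the success rate and its Wilson 95% interval."""
    p = successes / participants
    denom = 1 + z**2 / participants
    centre = (p + z**2 / (2 * participants)) / denom
    half_width = (z / denom) * math.sqrt(
        p * (1 - p) / participants + z**2 / (4 * participants**2)
    )
    return p, centre - half_width, centre + half_width


# Hypothetical tree test: 16 of 50 people found the target (32% success, 68% failure).
score, low, high = findability_with_interval(16, 50)
print(f"findability {score:.0%}, plausible range {low:.0%} to {high:.0%}")
```

Reporting the range alongside the score makes it easier to see whether a later result is a real change or just noise from a small test.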
First click testing
First click testing uses the taxonomy with the context of the design. Yay!
But it's static images, so you can only put one scenario into play at a time.
Using the previous example, the structure may be brilliant, but you've got a burger menu icon.
This is how you can tell if it's causing a problem, with quantifiable insights.
However, if you wanted to test the labels in situ of the design you would have to show the navigation menu exposed.
The same caution applies, in that you would have to write the test scenario questions in a way that works for benchmarking and testing.
The wrap up
Benchmarking is a numbers game.
The purpose of benchmarking is:
– to gain a greater understanding of a problem
– to get some numbers to keep the stakeholders quiet
– to help you understand the impact once you've launched
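For that last point, a minimal sketch of comparing a benchmark against a post-launch measurement is a two-sided two-proportion z-test. It assumes both samples are independent and reasonably large, and that at least one person succeeded and one failed across the two samples. The function name and the numbers are hypothetical.

```python
import math


def two_proportion_z_test(success_a: int, n_a: int, success_b: int, n_b: int):
    """Return (difference, p_value) for a two-sided two-proportion z-test."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, p_value


# Hypothetical: 16 of 50 people succeeded at benchmark, 31 of 50 after launch.
difference, p_value = two_proportion_z_test(16, 50, 31, 50)
print(f"change of {difference:+.0%}, p-value {p_value:.4f}")
```

With these made-up numbers the p-value lands well below the usual 0.05 threshold, which is the kind of result that suggests the change is more than chance.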
This post is part of a series on the four phases of navigation design testing and research. Please also check out the taxonomy of testing and research, and understanding the problem.