Accuracy, Explainability and Ethics – Part 1

17 November 2021

The typical answer to that question is, perhaps rather uninspiringly, “it depends”. And although it’s true that the quality of prediction that is possible can vary depending on the type of data, there is usually a more subtle reason that leads us to that response: there is often a tradeoff between ‘explainability’ and accuracy in Artificial Intelligence (we’ll expand on what we mean by both those terms shortly).

Artificial Intelligence can be a pretty complex and impenetrable topic at the best of times. With that in mind, over this blog post and Part 2, we will try to walk through the impact that the tradeoff between explainability and accuracy can have on organisations, without going too deep into the technical details.

To try and do this, we will work through an example of trying to predict the quality of red wine (!) using a real-life dataset that contains 1,600 different red wines and their attributes which include Alcohol Level, pH Level and Volatile Acidity (whatever that is?!) and gives each wine a ‘quality’ score out of 10 as rated by professional tasters.

We’ll ignore the fact that measuring the ‘quality’ of wine is arguably pretty subjective, and instead set ourselves the challenge of trying to predict which wines would score 7 out of 10 or more. We’ll call the wines that score at least 7 out of 10 ‘Good Wines’ and (perhaps rather harshly) those that score below 7 ‘Bad Wines’. If you’re interested, you can download the same dataset we’re using, which is provided courtesy of The University of California.
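For anyone who likes to see things written down, the ‘Good Wine’ / ‘Bad Wine’ split can be sketched in a few lines of Python. This is only an illustration: the wine names and scores below are made up rather than taken from the dataset, and `label_wine` and `GOOD_THRESHOLD` are names we have invented for the example.

```python
# Hypothetical sample of (wine name -> quality score out of 10).
# These are NOT real rows from the dataset.
sample_scores = {"wine_a": 5, "wine_b": 7, "wine_c": 6, "wine_d": 8}

GOOD_THRESHOLD = 7  # a score of 7 or more counts as a 'Good Wine'


def label_wine(quality: int) -> str:
    """Return 'Good' for scores of 7 or more, otherwise 'Bad'."""
    return "Good" if quality >= GOOD_THRESHOLD else "Bad"


# Label every wine in the sample.
labels = {name: label_wine(score) for name, score in sample_scores.items()}
print(labels)
```

Turning a score out of 10 into a simple yes/no label like this is what lets us ask the question “is this a good wine?” rather than “what score will it get?”.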

Ethics

So where do ethics come into this? Well, within Artificial Intelligence and Machine Learning, there is a range of different approaches and tools that can be used to make predictions. In general, we would call something that makes a prediction (such as the expected quality of red wine given its other attributes) a ‘model’. And in very basic terms, a good model would be one that usually makes accurate predictions.

To come up with a good model is often quite a complicated process, and people who describe themselves as ‘Data Scientists’ are (amongst other things) specialists in understanding how to manipulate data and select the right tools to create models that provide good results.

But, becoming a Data Scientist isn’t everybody’s cup of tea (to say the least!), and the models that they create can range from incredibly simple to completely impenetrable even to the experts. Therefore, if a non-Data Scientist had the task of explaining a prediction that one of these models has made, that task could range from being pretty straightforward to almost impossible depending on the type of model chosen.

The ethical aspects of this come about because although there might not be too much of an ethical challenge selecting a decent bottle of red wine, what about if you’re applying for a loan or a University place and you get rejected? In many of those cases, an answer of “The computer says no…” probably isn’t going to quite cut it.

It is in these types of examples (where a model is making a decision that has a direct impact on someone’s life) that a decision which can’t be explained and justified is at best an ethical grey area and, at worst, simply unethical. Furthermore, even when there isn’t a direct impact on someone’s life, or where it is thought that the impact could only ever be a positive one, care still needs to be taken, because unintended side-effects of well-meaning actions can still raise ethical concerns.

Explainability

In a nutshell, we would say that ‘explainability’ is how easy it would be for a non-Data Scientist to understand and explain why a particular prediction has been made. And, it is often the case that there is a tradeoff between how explainable a particular model is, and how accurate its predictions are.

So, to try and bring this to life, we’re going to look at a handful of Artificial Intelligence models that might be used with our red wine dataset to try and make predictions. We’ll try to keep the technical details to an absolute minimum, as the point of this post isn’t to go into lots of detail on specific models. Rather, it’s intended to highlight how using different models might require a tradeoff between accuracy and the ability to easily explain how a particular model has made its predictions. We’ll cover one model in this post, and a further two models in Part 2.

Technical Disclaimer (skip this if you’re not a Data Scientist!!!!): Before we begin, we just wanted to say that we do realise that we’ve massively oversimplified all this, and also that there is not an inherent link between the ability for a model to accurately make a prediction and the complexity/explainability of that model. It’s certainly true that some of the simplest models can produce excellent results in particular situations.

However, we do find in many of our real-world interactions with clients that the more sophisticated models we use (which are typically more difficult to understand and explain) often make better predictions, and therefore in many cases this correlation does exist even if there is no causal link.

One final point is that we have massively oversimplified the models and metrics including terms like ‘accuracy’ in quite a ‘non-data sciencey’ way, to try and make the post more understandable to the average person. We tried using more technical measures for each model’s performance, but found that they took away from the point of this article, so went with much looser terms – for which we ask your forgiveness! Right, now that’s out of the way…

Model 1: Decision Tree

Decision Trees are a commonly used and powerful Artificial Intelligence model, and are great in terms of explainability – they are arguably one of the most easily explainable models out there.

Think of a Decision Tree as being a bit like the game ‘Twenty Questions’, where you try and guess the name of a famous person by asking a series of yes/no questions. And, very much like ‘Twenty Questions’, asking the right questions is the key to homing in on an answer quickly.

So, for instance, asking “Is it Sandra Bullock?” as your first question probably isn’t the best idea, so instead you ask “Is it a woman?” because that eliminates approximately half of all famous people. A computer can do a similar job by analysing a dataset to work out which questions narrow down the possibilities in the fewest steps.
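For the curious, here is a rough sketch of how a computer might decide which question is ‘best’. It scores each yes/no question by how mixed the resulting groups are, using a measure called Gini impurity, which is one common choice for Decision Trees. We are not claiming this is what any particular tool does under the bonnet, and the rows, attribute names and thresholds below are all made up for illustration.

```python
def gini(labels):
    """Gini impurity of a list of True/False labels (0 means the group is pure)."""
    if not labels:
        return 0.0
    p_good = sum(labels) / len(labels)
    return 2 * p_good * (1 - p_good)


def split_score(rows, question):
    """Weighted impurity left over after asking a yes/no question (lower is better)."""
    yes = [is_good for features, is_good in rows if question(features)]
    no = [is_good for features, is_good in rows if not question(features)]
    total = len(rows)
    return (len(yes) / total) * gini(yes) + (len(no) / total) * gini(no)


# Made-up rows: ({attribute: value}, is_good_wine). Not real data.
rows = [
    ({"alcohol": 12.5, "ph": 3.2}, True),
    ({"alcohol": 12.8, "ph": 3.4}, True),
    ({"alcohol": 9.5, "ph": 3.3}, False),
    ({"alcohol": 10.0, "ph": 3.1}, False),
    ({"alcohol": 9.8, "ph": 3.5}, False),
    ({"alcohol": 11.0, "ph": 3.3}, False),
]

# Two candidate questions; the thresholds are arbitrary choices for the example.
questions = {
    "alcohol >= 12?": lambda f: f["alcohol"] >= 12,
    "pH >= 3.3?": lambda f: f["ph"] >= 3.3,
}

scores = {name: split_score(rows, q) for name, q in questions.items()}
best_question = min(scores, key=scores.get)
print(scores)
print("Best first question:", best_question)
```

In this toy sample the alcohol question separates good from bad wines perfectly, so it wins. A real Decision Tree repeats the same idea over many candidate questions, and then again within each resulting group.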

And the best thing about a Decision Tree is that this can be done by pretty much anyone who has basic computer skills. We used Microsoft Power BI’s Decision Tree extension to do just that, and within a few clicks we got the following (don’t worry if this doesn’t make much sense – we’ll explain what it’s telling us below):

[Figure: Decision Tree]

So what this is basically telling us is that the best set of predictors for a good red wine is:

Alcohol Level is 12% or above(!)
Sulphates measure 0.69 or above
Free sulphur dioxide is below 19

And sure enough, if you apply just those three rules to the dataset that we started with, it reduces the original list of 1,600 wines down to just 49 wines, and (crucially) the percentage of good wines in that list is 83.7% as opposed to just 13.6% in the original list.
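If you wanted to try this yourself, applying the three rules is just a filter. The sketch below assumes each wine is a simple record with the column names shown (these names are our own invention and may not match the downloaded file), and the six rows are made up, so the percentages it prints are not the real ones quoted above.

```python
# Made-up rows for illustration only; column names are assumptions.
wines = [
    {"alcohol": 12.5, "sulphates": 0.75, "free_sulphur_dioxide": 10, "quality": 7},
    {"alcohol": 12.1, "sulphates": 0.70, "free_sulphur_dioxide": 15, "quality": 8},
    {"alcohol": 13.0, "sulphates": 0.80, "free_sulphur_dioxide": 25, "quality": 6},
    {"alcohol": 9.4, "sulphates": 0.56, "free_sulphur_dioxide": 11, "quality": 5},
    {"alcohol": 12.0, "sulphates": 0.69, "free_sulphur_dioxide": 18, "quality": 6},
    {"alcohol": 10.2, "sulphates": 0.90, "free_sulphur_dioxide": 8, "quality": 5},
]


def meets_rules(wine):
    """True if a wine passes all three rules from the Decision Tree."""
    return (
        wine["alcohol"] >= 12
        and wine["sulphates"] >= 0.69
        and wine["free_sulphur_dioxide"] < 19
    )


def share_good(rows):
    """Fraction of wines in `rows` scoring 7 or more."""
    return sum(r["quality"] >= 7 for r in rows) / len(rows)


shortlist = [w for w in wines if meets_rules(w)]

print(f"All wines: {len(wines)}, share good: {share_good(wines):.1%}")
print(f"Shortlist: {len(shortlist)}, share good: {share_good(shortlist):.1%}")
```

Even in this tiny example the pattern is the same as with the real data: the shortlist is much smaller than the full list, but a far higher share of it is good.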

To put that another way, if you chose a wine at random from the original list of 1,600 wines, then you would only expect to get a good wine about 14% of the time. However, if you randomly chose a wine that meets the three criteria above, then you have about an 84% chance of ending up with a good bottle of red.

So what?

Well, it’s difficult to argue with the effectiveness of a Decision Tree, certainly in this example at least. By applying just three easily understandable rules, you can have a huge impact on your likelihood of choosing a good wine. (And by the way, this is a result that comes from real data on red wine, so if you can get hold of those three bits of information about a wine then you’re onto a winner. You’re welcome!)

Is there a downside?

Well, perhaps not in this example. But, there are two points worth noting:

Although this is a great result for the amount of effort that we have put in, we can probably get much more accurate results with a more sophisticated model
The way that we have applied a Decision Tree in this example reduces the initial list of 1,600 wines to just 49. And whilst it’s true that within those 49 wines you have a high probability of finding a good one, this has told us very little about the other 1,551 wines on the list that don’t meet the criteria. And, if you had one of those bottles of wine in your hand, this Decision Tree doesn’t help all that much. The Decision Tree could certainly be expanded to include additional complexity and depth to provide insight into the remaining 1,551 wines, but to do that would also take away from the simplicity of being able to say “if it meets these three things, then it’s probably a good bottle of wine”
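To put a rough number on that second point, we can do some back-of-the-envelope arithmetic using only the figures quoted in this post. Because those figures are rounded, the results below are approximate, and the variable names are just ours for the example.

```python
# Figures quoted in this post (all rounded, so results are approximate).
total_wines = 1600
shortlist_size = 49
good_rate_overall = 0.136    # 13.6% of all wines are good
good_rate_shortlist = 0.837  # 83.7% of the shortlist are good

# Approximate counts of good wines.
good_in_shortlist = round(shortlist_size * good_rate_shortlist)
good_overall = round(total_wines * good_rate_overall)

# What is left outside the shortlist.
remaining = total_wines - shortlist_size
good_in_remaining = good_overall - good_in_shortlist

# Share of all good wines that the three rules actually find.
share_captured = good_in_shortlist / good_overall
# Chance of a good wine among those that do not meet the rules.
good_rate_remaining = good_in_remaining / remaining

print(f"Good wines found by the rules: {good_in_shortlist} of about {good_overall}")
print(f"Share of good wines captured: {share_captured:.0%}")
print(f"Share good among the other {remaining}: {good_rate_remaining:.0%}")
```

On these rough numbers the three rules find fewer than one in five of the good wines, and the wines that don’t meet the rules are still good roughly 11% of the time. So the rules are great at telling you when a wine is probably good, but much less helpful at telling you when it isn’t.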

Decision Tree Summary

Very quick and easy to build a model
Easily explainable results
Can be limited in terms of accuracy and the types of questions it helps you answer

In the next post…

That’s it for this post. In the next post we will continue this theme by looking at two more Artificial Intelligence models and compare their ease of creation, accuracy and explainability and consider some of the associated tradeoffs – and in particular, how different models may impact the ethical use of Artificial Intelligence.