Detect Outliers Automatically

8.2.2022

You can detect outliers automatically with a little help from advanced algorithms. Without automation, finding outliers is laborious, so many companies detect only the most prominent ones. Often even those are detected late, which causes harm or extra work.

Manual Checking Is Laborious and Error-Prone

A time series is a sequence of values in time order. These can be monthly costs, daily absences, hourly energy consumption, or nearly anything that is recorded as numeric values at regular intervals. Suppose we have a time series containing monthly sales figures. If our sales have lately been around € 1,000,000 per month and rising, then a sudden value of € 6,000 is likely an outlier. This is easy to notice.

However, even if our aggregate sales over all cost centers are roughly € 1,000,000 each month, an individual cost center can still have an outlier in its sales, for example when a drop in one cost center is masked by a rise in another. We might be losing revenue without even noticing that something unusual is going on. Therefore, we should be looking at the individual time series of each cost center, account, and any other relevant dimension. This can become a lot of work.
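To make the masking effect concrete, here is a minimal sketch in Python. It is not the algorithm from the demo: the sales figures and cost center names are invented for illustration, and the check is a generic robust rule (the modified z-score based on the median absolute deviation, with the commonly used cutoff of 3.5).

```python
from statistics import median

def robust_outliers(values, threshold=3.5):
    """Return the indices whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        return []  # no spread to compare against; flag nothing in this sketch
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical monthly sales in euros per cost center (illustrative data only).
sales = {
    "north": [400_000, 410_000, 405_000, 415_000, 408_000, 412_000],
    "south": [350_000, 355_000, 348_000, 352_000, 250_000, 351_000],
    "west":  [250_000, 245_000, 252_000, 248_000, 350_000, 249_000],
}

# The aggregate stays close to 1,000,000 every month ...
totals = [sum(month) for month in zip(*sales.values())]
aggregate_flags = robust_outliers(totals)

# ... while two cost centers deviate strongly in the same month.
per_center_flags = {}
for name, series in sales.items():
    flagged = robust_outliers(series)
    if flagged:
        per_center_flags[name] = flagged

print("aggregate:", aggregate_flags)
print("per cost center:", per_center_flags)
```

In this made-up data the drop in "south" and the rise in "west" cancel out in the total, so only the per-cost-center check reveals them.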

Furthermore, outliers can be tricky to notice even when you are looking at the right data. A time series can have complex seasonal components mixed with a trend. The human eye is not great at noticing complex patterns even from a graph, much less from raw numbers. Additionally, the time series can be influenced by many other factors. One such factor is a temporary disruption, such as those caused by COVID-19.

Another example is when the time series is affected by another variable. For example, the energy consumption of a factory can spike during an exceptionally busy production period. Such a spike is expected rather than an outlier. Therefore, it is not always enough to look at one time series in isolation; a group of related time series may need to be considered together.
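One way to consider two series together is to model one as a function of the other and look for outliers in what the model cannot explain. The sketch below is only an illustration under stated assumptions: the production and energy figures are invented, the relationship is assumed to be linear, and the line is fitted with the Theil-Sen estimator (median of pairwise slopes) so that the outlier itself does not distort the fit.

```python
from itertools import combinations
from statistics import median

def theil_sen(x, y):
    """Robust line fit: median of pairwise slopes, then median intercept."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(zip(x, y), 2)
              if x2 != x1]
    slope = median(slopes)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept

def unexplained_outliers(x, y, threshold=3.5):
    """Flag points whose residual from the fitted line is unusually large."""
    slope, intercept = theil_sen(x, y)
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    center = median(residuals)
    mad = median(abs(r - center) for r in residuals)
    if mad == 0:
        return []
    return [i for i, r in enumerate(residuals)
            if 0.6745 * abs(r - center) / mad > threshold]

# Hypothetical monthly figures (illustrative data only).
production = [100, 120, 110, 240, 130, 150, 140]  # units produced
energy = [252, 288, 270, 531, 370, 349, 331]      # MWh consumed

slope, intercept = theil_sen(production, energy)
flagged = unexplained_outliers(production, energy)
print("flagged months:", flagged)
```

With these numbers the energy peak in the busy month (index 3) is explained by production and is not flagged, while the smaller rise at index 4 is flagged because production does not account for it.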

Algorithms Help Humans Concentrate on the Essentials

Because manual checking is laborious and error-prone, outliers would ideally be detected much like auto-correction works on text. The user would automatically be notified of potential outliers in any of their time series and given suggestions for the correct values. They could then investigate and decide how to fix the issue.

Luckily, this can be achieved with an automated algorithm. Such algorithms look at the historical data and decompose each time series into components such as trend and seasonality. Values that deviate strongly from what these components predict are flagged as outliers and reported to the user. The user sees a report of all relevant outliers instead of going through all the time series themselves. The report can include where the outliers are, how big they are, what value was expected instead, and why they are deemed outliers. Naturally, visualizations can be included in the automatic report.
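The idea of decomposing a series and reporting deviations can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the algorithm shown in the demo: it assumes a linear trend, a fixed and known season length, and invented quarterly data. The function name detect_outliers and the report fields are hypothetical.

```python
from statistics import median

def detect_outliers(values, period, threshold=3.5):
    """Report points that deviate from a simple trend + seasonality model."""
    n = len(values)
    # Trend: median change per step between points one full season apart.
    slope = median((values[t + period] - values[t]) / period
                   for t in range(n - period))
    detrended = [v - slope * t for t, v in enumerate(values)]
    # Seasonality: median detrended level at each position in the cycle.
    seasonal = [median(detrended[pos::period]) for pos in range(period)]
    expected = [slope * t + seasonal[t % period] for t in range(n)]
    residuals = [v - e for v, e in zip(values, expected)]
    center = median(residuals)
    mad = median(abs(r - center) for r in residuals)
    if mad == 0:
        return []  # no residual spread to compare against in this sketch
    report = []
    for t, (v, e, r) in enumerate(zip(values, expected, residuals)):
        score = 0.6745 * abs(r - center) / mad  # modified z-score
        if score > threshold:
            report.append({"index": t, "actual": v, "expected": e,
                           "deviation": r, "score": round(score, 1)})
    return report

# Hypothetical quarterly series with a rising trend and a yearly pattern.
series = [110, 98, 93, 113, 119, 104, 103, 118, 124, 153, 110, 128]

report = detect_outliers(series, period=4)
for row in report:
    print(row)
```

Each report row states where the outlier is, what was observed, what the model expected, and how large the deviation is. A production solution would typically use a more flexible decomposition and handle missing data, changing seasonality, and external variables.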

With an automated solution like this, thousands of time series can be analyzed in the time it would take a person to look for outliers in just one. The solutions can be implemented rapidly with a “plug and play” mentality, yet they can still be customized for specific needs. A human will still be an integral part of the loop, but instead of dividing their attention over a vast and tedious task, or skipping it with a cursory glance, they can spend their time investigating the most impactful figures.

Summary

Finding outliers manually and relying on the human eye is laborious and error-prone. Even big outliers can be missed in a large dataset. With advanced algorithms you can detect outliers automatically, save a lot of time, and correct errors faster. When automated algorithms do the hard work of going through all the data and pointing out the outliers, humans can focus on analyzing the findings.

We have created a video to demonstrate how our algorithm works in practice. Watch it to see how to detect outliers automatically.

Outlier Detection Demo Video

If you would like to learn more about our advanced analytics and optimization solutions, check out the solution page.

Visa Linkiö
Lead Data Scientist