Numeric Scoring Metrics

Find the Right Metric for a Prediction Model

by Maarit Widmann

Quantitative data have endless stories to tell!

Models that predict numeric values have many consequences in the real world, from the decisions of portfolio managers to the pricing of electricity at different times of the day, week, and year. Numeric scoring metrics are needed to assess how well such models perform.

(Root) Mean Squared Error, (R)MSE — Which model best captures the rapid changes in the volatile stock market?

In Figure 1, below, you see the development of the LinkedIn stock closing price from 2011 to 2016. Within the time period, the behavior includes sudden peaks, sudden lows, longer periods of increasing and decreasing value, and a few stable periods. Forecasting this kind of volatile behavior is challenging, especially in the long term. However, for the stakeholders of LinkedIn, it’s valuable. Therefore, we prefer a forecasting model that captures the sudden changes to a model that performs well on average over the period of five years.

Figure 1. LinkedIn daily stock market closing price from 2011 to 2016: data with few regular patterns and many sudden changes with low forecastability. We select the forecasting model with the lowest (root) mean squared error because it weighs the big forecast errors more and favors a model that captures the sudden peaks and lows.
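To make the weighting of big errors concrete, here is a minimal Python sketch of (R)MSE. The prices and the two models are invented for illustration and are not the LinkedIn data from Figure 1. Model A is slightly off every day, while model B is exact on calm days but misses the peak.

```python
import math


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((y - f) ** 2 for y, f in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the MSE, expressed in the unit of the target column."""
    return math.sqrt(mean_squared_error(actual, predicted))


# Hypothetical closing prices with one sudden peak on the fourth day.
actual = [100.0, 102.0, 101.0, 130.0, 103.0]
# Hypothetical model A: off by 4 every day, but it follows the peak.
model_a = [104.0, 98.0, 105.0, 126.0, 107.0]
# Hypothetical model B: exact on calm days, off by 20 at the peak.
model_b = [100.0, 102.0, 101.0, 110.0, 103.0]

rmse_a = root_mean_squared_error(actual, model_a)
rmse_b = root_mean_squared_error(actual, model_b)
print(f"RMSE model A: {rmse_a:.2f}")
print(f"RMSE model B: {rmse_b:.2f}")
```

Both models have the same total absolute error (20 over five days), yet the RMSE is 4.00 for model A and about 8.94 for model B, because the single large miss at the peak is squared.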

Mean Absolute Error, MAE — Which model best estimates the energy consumption in the long term?

Figure 2. Hourly energy consumption values in June 2009 in Dublin, collected from a cluster of households and industries. The data shows a relatively regular behavior and can therefore be easily forecasted in the long term. We select the forecasting model with the lowest mean absolute error because this metric is robust towards outliers.
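The robustness claim can be checked with a small Python sketch. The consumption values below are made up for illustration, not the Dublin data from Figure 2, and the MSE is computed only as a point of comparison.

```python
def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(y - f) for y, f in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average of the squared differences, shown here only for comparison."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((y - f) ** 2 for y, f in zip(actual, predicted)) / len(actual)


# Hypothetical hourly consumption; the fourth reading (90.0) is an outlier.
actual = [50.0, 52.0, 51.0, 90.0, 49.0]
predicted = [51.0, 51.0, 52.0, 50.0, 50.0]

# The same series with the outlier hour removed.
actual_calm = [50.0, 52.0, 51.0, 49.0]
predicted_calm = [51.0, 51.0, 52.0, 50.0]

mae_calm = mean_absolute_error(actual_calm, predicted_calm)
mae_all = mean_absolute_error(actual, predicted)
mse_calm = mean_squared_error(actual_calm, predicted_calm)
mse_all = mean_squared_error(actual, predicted)

print(f"MAE without / with outlier: {mae_calm:.1f} / {mae_all:.1f}")
print(f"MSE without / with outlier: {mse_calm:.1f} / {mse_all:.1f}")
```

In this toy series the single outlier raises the MAE from 1.0 to 8.8, whereas the MSE jumps from 1.0 to 320.8, which is why MAE is the calmer choice when occasional spikes should not dominate the score.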

Mean Absolute Percentage Error, MAPE — Are the sales forecasting models for different products equally accurate?

On a hot summer day, the supply of both sparkling water and ice cream should be guaranteed! We want to check if the two forecasting models that predict the sales of these two products are equally accurate.

Notice, though, that MAPE values can be biased when the actual values are close to zero. For example, the sales of ice cream are relatively low during the winter months compared to summer months, whereas sales of milk remain pretty constant through the entire year. When we compare the accuracies of the forecasting models for milk vs. ice cream by their MAPE values, the small values in the ice cream sales make the forecasting model for ice cream look unreasonably bad compared to the forecasting model for milk.

In Figure 3, in the line plot in the middle, you see the sales of milk (blue line) and ice cream (green line) and the predicted sales of both products (red lines). If we take a look at the MAPE values, the forecasting accuracy is apparently much better for milk (MAPE = 0.016) than for ice cream (MAPE = 0.266). However, this huge difference is due to the low values of ice cream sales in the winter months. The line plot on the right in Figure 3 shows exactly the same actual and predicted sales for ice cream and milk, with ice cream sales shifted up by 25 items for each month. Without the bias from the values close to zero, the forecasting accuracies for ice cream (MAPE = 0.036) and milk (MAPE = 0.016) are now much closer to each other.

Figure 3. Three line plots showing actual and predicted values of ice cream and sparkling water (line plot on the left) and ice cream and milk (line plots in the middle and on the right). In the line plot on the right, the ice cream sales values are shifted up by 25 items in order to avoid the bias in mean absolute percentage error introduced by small actual values.
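The effect of small actual values on MAPE can be reproduced with a short Python sketch. The sales numbers are invented for illustration and are not the data behind Figure 3. The constant of 25 items follows the shift described above.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Average of the absolute errors relative to the actual values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(y == 0 for y in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return sum(abs(y - f) / abs(y) for y, f in zip(actual, predicted)) / len(actual)


# Hypothetical low winter sales of ice cream; every forecast is off by one item.
actual = [2.0, 4.0, 5.0, 10.0]
predicted = [3.0, 3.0, 6.0, 9.0]

# The same series with 25 items added to both actual and predicted values,
# so the absolute errors stay exactly the same.
shift = 25.0
actual_shifted = [y + shift for y in actual]
predicted_shifted = [f + shift for f in predicted]

mape_original = mean_absolute_percentage_error(actual, predicted)
mape_shifted = mean_absolute_percentage_error(actual_shifted, predicted_shifted)

print(f"MAPE on small values:   {mape_original:.4f}")
print(f"MAPE after the shift:   {mape_shifted:.4f}")
```

The absolute error is one item in every month of both series, yet the MAPE drops sharply after the shift, because each error is divided by a much larger actual value.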

Mean Signed Difference — Does a running app provide unrealistic expectations?

A smartwatch can be connected to a running application which then estimates the finishing time in a 10km run. It could be that, as a motivator, the app estimates the time lower than what’s realistically expected.

Figure 4. Estimated (red line) and actual (orange line) finishing times in a 10km run in the period of six months. The estimated times are biased downwards, also shown by the negative value of the mean signed difference.
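A minimal Python sketch of the mean signed difference follows. It assumes the sign convention "predicted minus actual", which matches the negative value reported for downward-biased estimates. The finishing times are hypothetical.

```python
def mean_signed_difference(actual, predicted):
    """Average of (predicted - actual); the sign convention is an assumption."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(f - y for y, f in zip(actual, predicted)) / len(actual)


# Hypothetical 10km finishing times in minutes.
actual_minutes = [55.0, 53.0, 53.0, 51.0]
# Hypothetical app estimates, each a little too optimistic.
estimated_minutes = [52.0, 51.0, 50.0, 49.0]

msd = mean_signed_difference(actual_minutes, estimated_minutes)
print(f"Mean signed difference: {msd:.1f} minutes")
```

Here the result is -2.5 minutes: on average the app promises a finish two and a half minutes faster than the runner achieves. Unlike MAE, positive and negative errors cancel out, so a value near zero indicates no systematic bias rather than a small error.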

R-squared — How much of our years of education can be explained through access to literature?

R-squared tells how much of the variance of the target column (years of education) is explained by the model. Based on the model's R-squared value of 0.76, access to literature explains 76% of the variance in the years of education.

Figure 5. Linear regression line modeling the relationship between access to literature and years of education. R-squared is used to measure the model fit, i.e., how much of the variance in the target column (years of education) can be explained by the model, 76% in this case.
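As a rough sketch, R-squared can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The values below are invented and do not reproduce the 0.76 from Figure 5.

```python
def r_squared(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_total = sum((y - mean_actual) ** 2 for y in actual)
    if ss_total == 0:
        raise ValueError("R-squared is undefined when the actual values are constant")
    ss_residual = sum((y - f) ** 2 for y, f in zip(actual, predicted))
    return 1 - ss_residual / ss_total


# Hypothetical years of education and the values predicted by some model.
years_actual = [10.0, 12.0, 14.0, 16.0]
years_predicted = [11.0, 11.0, 15.0, 15.0]

r2 = r_squared(years_actual, years_predicted)
print(f"R-squared: {r2:.2f}")
```

In this toy example the total sum of squares is 20 and the residual sum of squares is 4, so the model explains 80% of the variance.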

The numeric scoring metrics introduced above are shown in Figure 6. The metrics are listed along with the formulas used to calculate them and a few key properties. In the formulas, y_i is the actual value and f(x_i) is the predicted value.

Figure 6. Common numeric scoring metrics, their formulas, and key properties. In the formulas, y_i is the actual value, f(x_i) is the forecasted value, and n is the sample size.
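Since the figure itself is not reproduced here, the following LaTeX block writes out the usual textbook definitions of these metrics in the same notation. It is a reconstruction under assumptions, not a copy of Figure 6: the sign convention of the mean signed difference (predicted minus actual) is chosen to match the negative value for downward-biased estimates, and \bar{y}, the mean of the actual values, is a symbol introduced here.

```latex
\begin{align*}
\mathrm{MSE}  &= \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - f(x_i)\bigr)^2 \\
\mathrm{RMSE} &= \sqrt{\mathrm{MSE}} \\
\mathrm{MAE}  &= \frac{1}{n}\sum_{i=1}^{n}\bigl|y_i - f(x_i)\bigr| \\
\mathrm{MAPE} &= \frac{1}{n}\sum_{i=1}^{n}\frac{\bigl|y_i - f(x_i)\bigr|}{|y_i|} \\
\mathrm{MSD}  &= \frac{1}{n}\sum_{i=1}^{n}\bigl(f(x_i) - y_i\bigr) \\
R^2           &= 1 - \frac{\sum_{i=1}^{n}\bigl(y_i - f(x_i)\bigr)^2}{\sum_{i=1}^{n}\bigl(y_i - \bar{y}\bigr)^2}
\end{align*}
```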

In this article, we’ve introduced the most commonly used numeric error metrics and the perspectives that they provide on a model’s performance.

It’s often recommended to take a look at multiple numeric scoring metrics to gain a comprehensive view of the model’s performance. For example, by reviewing the mean signed difference, you can see if your model has a systematic bias, whereas by studying the (root) mean squared error, you can see which model best captures the sudden fluctuations. Visualizations, such as a line plot, complement the model evaluation.

For a practical implementation, take a look at the example workflows built in the visual data science tool KNIME Analytics Platform.

Download and inspect these free workflows from the KNIME Hub:
