Model Performance Differs by Analytics Tool — A Tale of Caution

Business analysts rarely give much thought to the analytics tools they use; most simply work with whatever software is already available in their enterprise. Even more common is reliance on default settings within such systems: strapped for time, analysts and reporting staff use the defaults in their software packages to complete tasks on time. This is risky, as default settings are not always the best choice for a specific business problem or modeling task.

We recently performed an analysis to see how much machine learning and statistics tools differ in performance when running the same methods, and the results surprised us. Across three different software tools, the same modeling approach produced performance that varied by as much as 200% on some metrics.

Experimenting with Platforms

To illustrate, we used a publicly available breast cancer data set in which, given a set of tumor attributes, we try to predict whether a tumor is malignant or benign. While we chose a health-related data set, the experiment can be repeated on any data. We then chose three popular machine learning platforms and, in each case, ran the same type of model: a logistic regression evaluated with ten-fold cross-validation. We also repeated the experiment several times to confirm that the differences between results were statistically significant.
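The post anonymizes the platforms, but the protocol itself is straightforward to reproduce. Below is a minimal sketch in scikit-learn (used purely for illustration; it is not one of the platforms tested), assuming the standard Breast Cancer Wisconsin data set bundled with the library:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Breast Cancer Wisconsin: 569 tumors, 30 numeric attributes,
# target encoded as 0 = malignant, 1 = benign.
X, y = load_breast_cancer(return_X_y=True)

# Deliberately stick to the library's defaults, since that is what a
# time-pressed analyst would do in any of the platforms.
model = LogisticRegression(max_iter=10000)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"10-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```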

What we found was fascinating: while accuracy stays roughly similar across platforms, the effectiveness of predicting positive or negative cases can differ significantly. In the charts below, we show the number of False Positives on the cancer data set, along with the sensitivity scores for each platform.
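To get the same per-class view in your own tool, pull the error counts out of the out-of-fold predictions rather than relying on headline accuracy. A sketch, repeating the illustrative scikit-learn setup from above:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

X, y = load_breast_cancer(return_X_y=True)  # 0 = malignant, 1 = benign
model = LogisticRegression(max_iter=10000)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Out-of-fold predictions: every example is scored by a model
# that never saw it during training.
y_pred = cross_val_predict(model, X, y, cv=cv)

# Treat "malignant" (label 0) as the positive class.
tn, fp, fn, tp = confusion_matrix(y, y_pred, labels=[1, 0]).ravel()
sensitivity = recall_score(y, y_pred, pos_label=0)

print(f"False Positives: {fp}   False Negatives: {fn}")
print(f"Sensitivity (malignant recall): {sensitivity:.3f}")
```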

This is particularly interesting because in many analytic marketing scenarios, one is less concerned about accuracy and more focused on predicting the positive or negative cases. For example, in a churn reduction scenario for a marketing/sales department, one is more concerned about False Positives (customers you say will churn but in reality don't) than about total accuracy. In the breast cancer case, returning a False Negative (incorrectly identifying someone as not having a cancer risk when they in fact do) could lead to disastrous results.
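A small worked example makes the asymmetry concrete. The platforms, error counts, and dollar costs below are all hypothetical; the point is only that two tools with identical accuracy can rank in opposite orders once each kind of error is priced:

```python
# Two hypothetical platforms with identical 95% accuracy on 1,000
# customers, but opposite error profiles (50 errors each).
platforms = {
    "Platform A": {"fp": 40, "fn": 10},
    "Platform B": {"fp": 10, "fn": 40},
}

# Which platform is "better" depends entirely on what each error
# costs the business. Both cost schedules are made-up numbers.
cost_schedules = {
    "FP-heavy (retention offers are expensive)": {"fp": 100, "fn": 20},
    "FN-heavy (losing a customer is expensive)": {"fp": 20, "fn": 100},
}

for label, costs in cost_schedules.items():
    print(label)
    for name, errors in platforms.items():
        total = errors["fp"] * costs["fp"] + errors["fn"] * costs["fn"]
        print(f"  {name}: ${total:,}")
# FP-heavy: Platform A costs $4,200, Platform B costs $1,800
# FN-heavy: Platform A costs $1,800, Platform B costs $4,200
```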

Implications for Customer Analytics

If you are working in customer analytics, these results have significant implications that warrant a few words of advice. First, make sure you understand the platform you use to analyze customer data. Whether you are building propensity models, analyzing churn risk, or performing other types of analysis, explore your data using different model settings and algorithms, and measure how well the resulting models perform under each.
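In illustrative scikit-learn terms again, sweeping settings while scoring on the metric you actually care about might look like the following; the parameter grid and the choice of sensitivity as the target metric are assumptions for the example:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)  # 0 = malignant, 1 = benign

pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=10000))

# An arbitrary example grid: regularization strength and class weights.
grid = {
    "logisticregression__C": [0.01, 0.1, 1.0, 10.0],
    "logisticregression__class_weight": [None, "balanced"],
}

# Optimize sensitivity for the malignant class rather than accuracy,
# since a missed malignancy is the costly error in this data set.
sensitivity = make_scorer(recall_score, pos_label=0)
search = GridSearchCV(pipe, grid, cv=10, scoring=sensitivity)
search.fit(X, y)

print("Best settings:", search.best_params_)
print(f"Best cross-validated sensitivity: {search.best_score_:.3f}")
```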

In extreme cases, companies that want full control over every aspect of the model-building process opt to build their own algorithms from the ground up. We recommend this only if your sales and marketing work requires extreme sensitivity, such as when fraud detection or churn is central to the success of your business.

In closing, it is crucial to test the sensitivity of the machine learning tools you use. Ask yourself how much risk there is if a model makes a mistake (e.g., incorrectly classifying a customer); ensure you thoroughly test your models and the platforms you use to build them.

Written by Wojciech Gryc

Wojciech Gryc is the CEO of Canopy Labs. Prior to Canopy Labs, Wojciech was a consultant with McKinsey & Co. and a researcher at IBM Research. Wojciech is a Rhodes Scholar and Loran Scholar.


  1. George

    Can you share which platform is which software? (Platform 1 = Weka? Orange?)

    1. Wojciech Gryc (Post author)

      That’s a great question. We didn’t share the platform names because we didn’t want people to assume that one platform is generally better than another for this. Different data sets tend to perform differently on each platform, but it’s difficult to say if one is better or worse.
