4th - PAKDD 2009 Data Mining Competition


The data were credit card applications. The challenge was that the test set was collected a few years after the training set, so the objective was to build a model that remained robust over time.
See here for details.

WINNER - NCDM Analytic Challenge 2008


Challenge08: Which Participant Will Win The Battle to Optimize a Marketing Budget?

Come see industry-leading analytic innovators compete to win the battle of how to optimize a marketing budget across different product lines. Contestants will look at product optimization, build statistical solutions to identify the best products to cross-sell, and optimize a marketing budget accordingly.


Here Dr Phil Brierley of Tiberius teamed up with David Vogel of Data Mining Solutions. The objective was to maximise the profit of a mailout by determining who to mail and which product offering each recipient should receive.

There were two components required to solve the task:

1. Build propensity models for each of the 2 products.

2. Use the propensity scores to decide which product to allocate (given certain constraints) in order to maximise profit.

Even in the absence of optimisation techniques, component model rankings generally corresponded to overall challenge performance. Both our component models were ranked 1st.

The competition was very close, but our solution generated 1.3% more profit than the next best solution.

Rank  Profit    Team
1     $264,760  Tiberius Data Mining & Data Mining Solutions
2     $261,410  DataLab USA
3     $261,045  Equifax
4     $260,770  DMW Worldwide LLC

Honourable Mention: Travelers, Salford Systems, Merkle, Data Management Marketing
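
The two components described for this challenge (propensity scores first, then product allocation) can be illustrated with a small sketch. This is not the winning solution: the competition's actual constraints are not given here, so the sketch assumes one offer per customer, a fixed cost per mailing and a cap on the number of mailings. The function name, margins and costs are all hypothetical.

```python
from typing import Dict, List, NamedTuple


class Offer(NamedTuple):
    customer_id: str
    product: str
    expected_profit: float


def allocate_offers(
    propensities: Dict[str, Dict[str, float]],
    margins: Dict[str, float],
    mail_cost: float,
    max_mailings: int,
) -> List[Offer]:
    """Pick at most one product per customer, then keep the most profitable mailings.

    Assumed constraints (hypothetical, not taken from the competition rules):
    one offer per customer, a fixed cost per mailing, a cap on total mailings.
    """
    candidates = []
    for customer_id, scores in propensities.items():
        # Expected profit of a product = response probability * margin - mailing cost.
        product, value = max(
            ((prod, p * margins[prod] - mail_cost) for prod, p in scores.items()),
            key=lambda pair: pair[1],
        )
        if value > 0:  # only mail when the expected profit is positive
            candidates.append(Offer(customer_id, product, value))
    candidates.sort(key=lambda offer: offer.expected_profit, reverse=True)
    return candidates[:max_mailings]


# Illustrative propensity scores for two products, "A" and "B" (made-up numbers).
scores = {
    "cust1": {"A": 0.10, "B": 0.02},
    "cust2": {"A": 0.01, "B": 0.05},
    "cust3": {"A": 0.001, "B": 0.001},
}
offers = allocate_offers(scores, margins={"A": 50.0, "B": 120.0}, mail_cost=1.0, max_mailings=2)
for offer in offers:
    print(offer.customer_id, offer.product, round(offer.expected_profit, 2))
```

Under these simplified constraints, ranking customers by their best expected profit is enough. Richer constraints, such as a separate quota per product, would generally call for a proper optimisation routine.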




'A public transportation company is expecting increasing demand for its services and is planning to acquire new buses and to extend its terminals. These investments require a reliable forecast of future demand which should be based on historic demand stored in the company’s data warehouse. For each 15-minute interval between 6:30 hours and 22 hours the number of passengers arriving at the terminal has been recorded and stored. As a forecasting consultant you have been asked to forecast the number of passengers arriving at the terminal.'


We won this competition outright, using the time series capabilities in Tiberius (Salford Systems were the 2005 winners).
The competition homepage can be found here.  The published results can be found here.
Our submission report can be found here.

2nd in accuracy - PAKDD 2006 Data Mining Competition


'An Asian telco operator which has successfully launched a third generation (3G) mobile telecommunications network would like to make use of existing customer usage and demographic data to identify which customers are likely to switch to using their 3G network.'

Accuracy Results

Predictions were submitted for a holdout set of 5,000 customers.

Entries were mainly from academia, but also from some of the world's leading predictive modelling software vendors and consultancies.

The accuracy results showed our algorithms and software to be as good as any currently available.

Rank  Accuracy  Team
1     81.9%     Nanjing University
2     81.8%     Tiberius
3     81.2%     Salford Systems
4     80.8%     Oklahoma State University
5     80.7%     University of Waikato
6     80.6%     Inductis

Not only did we make accurate predictions, we also accurately predicted how accurate those predictions would be, which means we would know the 'response rate' if this were a marketing campaign.

Of all those we predicted to be 3G, we expected 781 to be correct (it was actually 755)
Of all those we predicted to be 2G, we expected 4,384 to be correct (it was actually 4,408)
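
One common way to estimate in advance how many predictions will be correct is to sum the model's own probabilities. The sketch below shows that idea. It is an assumption about how such an estimate could be produced, not a description of how the figures above were derived. It relies on the scores being well calibrated, and the 0.5 cut-off and function name are hypothetical.

```python
from typing import Iterable, Tuple


def expected_correct(probabilities: Iterable[float], threshold: float = 0.5) -> Tuple[float, float]:
    """Return the expected number of correct calls among predicted positives and negatives.

    Each probability is the model's estimate that a customer belongs to the
    positive class (here, 3G). If the scores are calibrated, a customer called
    positive is correct with probability p, and one called negative is correct
    with probability 1 - p.
    """
    probs = list(probabilities)
    predicted_pos = [p for p in probs if p >= threshold]
    predicted_neg = [p for p in probs if p < threshold]
    return sum(predicted_pos), sum(1.0 - p for p in predicted_neg)


# Made-up scores for five customers.
exp_pos, exp_neg = expected_correct([0.9, 0.8, 0.6, 0.3, 0.1])
print(round(exp_pos, 2), round(exp_neg, 2))
```

Comparing these expected counts with the actual counts on the holdout set is a simple check on how well calibrated the scores are.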

Full details and results of the competition can be found here.

Leading - KDDCUP 2004 Ongoing contest


The goal in this task is to learn a classification rule that differentiates between two types of particles generated in high energy collider experiments. It is a binary classification problem with 78 attributes. The training set has 50,000 examples, and the test set is of size 100,000. Your task is to provide 4 sets of predictions for the test set that optimize

  • accuracy (maximize)
  • ROC area (maximize)
  • cross entropy (minimize)
  • q-score: a domain-specific performance measure sometimes used in particle physics to measure the effectiveness of prediction models. (maximize: larger q-scores indicate better performance.)
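
The first three measures in this list can be computed with a few lines of code. The sketch below uses textbook definitions, which may differ in detail from the contest's own scoring (for example in the threshold used for accuracy or in how cross entropy is normalised). The q-score is omitted because its definition is not given here. All function names are illustrative.

```python
import math
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of examples whose thresholded score matches the 0/1 label."""
    hits = sum((s >= threshold) == (y == 1) for y, s in zip(labels, scores))
    return hits / len(labels)


def roc_area(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a random
    positive outscores a random negative (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_entropy(labels: Sequence[int], scores: Sequence[float], eps: float = 1e-15) -> float:
    """Mean negative log-likelihood of the labels (natural log), with scores clipped away from 0 and 1."""
    total = 0.0
    for y, s in zip(labels, scores):
        s = min(max(s, eps), 1.0 - eps)
        total -= y * math.log(s) + (1 - y) * math.log(1.0 - s)
    return total / len(labels)


# Tiny made-up example: four particles with labels and predicted probabilities.
y_true = [1, 0, 1, 0]
y_score = [0.9, 0.2, 0.4, 0.6]
print(accuracy(y_true, y_score), roc_area(y_true, y_score), cross_entropy(y_true, y_score))
```

The pairwise ROC calculation is quadratic in the number of examples, so a rank-based version would be preferable for a test set of 100,000 rows.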

This is a competition we originally entered and subsequently revisited, having further developed the algorithms and ideas in Tiberius. You too can now get to the top of the leaderboard with Tiberius and a few mouse clicks.

The latest standings and more information can be found here

Learnings - PAKDD 2007


We like to keep learning, to make sure that Tiberius offers predictive accuracy comparable with any software available. The organizers of PAKDD 2007 allowed us to do this, and we implemented the findings in Tiberius so that a few mouse clicks allow you to equal the best predictive results in this competition.
See here for our analysis of the results.