Predictive Analytics and Big Data

Predictive analytics is the practice of using existing data sources to predict specific behaviours as accurately as possible. It is now used in many disciplines, such as financial analysis, auditing and marketing. Creascience combines its knowledge of these fields of application with extensive experience in data analysis and modeling to offer tailored solutions, with a particular focus on groups wanting to get started in this field.

Creascience provides support for all phases involved in building a reliable prediction model. We believe that four steps require particular attention:

Preparing a Clean and Informative Dataset

To start with, the data preparation phase deserves a lot of attention. It is illusory to believe that any statistical model can deal perfectly with issues like missing values, redundant information, and variables with too many categories. Many models can address some of these issues to some degree, but they perform much better if the data have first been prepared adequately. In the same way, it is illusory to believe that one can build a good model without any information on the data and the context of application. When the time comes to remove or keep specific variables, it is crucial to know what each one measures and how reliable it is. Therefore, we consider that adequate preparation of the data to be analyzed should never be overlooked.
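As a minimal sketch of two of the checks mentioned above, the snippet below measures how much of a variable is missing and merges rare category levels into a single level. The helper names (`missing_fraction`, `collapse_rare_levels`), the `min_count` threshold and the toy records are assumptions made for illustration only; they are not taken from any specific project or tool.

```python
from collections import Counter


def missing_fraction(rows, column):
    """Share of rows where `column` is absent or None."""
    return sum(1 for r in rows if r.get(column) is None) / len(rows)


def collapse_rare_levels(rows, column, min_count=2, other="other"):
    """Return copies of `rows` where levels of `column` seen fewer than
    `min_count` times are replaced by `other`. The input is not modified."""
    counts = Counter(r[column] for r in rows if r.get(column) is not None)
    cleaned = []
    for r in rows:
        r = dict(r)  # copy so the original record stays untouched
        value = r.get(column)
        if value is not None and counts[value] < min_count:
            r[column] = other
        cleaned.append(r)
    return cleaned


# Made-up records, for illustration only.
rows = [
    {"region": "north", "income": 42.0},
    {"region": "north", "income": None},
    {"region": "south", "income": 38.5},
    {"region": "south", "income": 51.0},
    {"region": "islands", "income": 47.2},
]

income_missing = missing_fraction(rows, "income")
cleaned = collapse_rare_levels(rows, "region", min_count=2)
```

Whether a variable with many missing values should be dropped, imputed or kept is a judgment that depends on what the variable measures, which is why such checks support, rather than replace, knowledge of the data.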

Selecting an Adapted Prediction Criterion

Most ways of measuring the predictive ability of a model are based on the idea that different data should be used to build the model and to assess its performance. There are, however, several ways of implementing this idea, such as a single hold-out set or cross-validation, and practical constraints sometimes also have to be accounted for when defining the objective.
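One common way to keep model building and model assessment on different data is k-fold cross-validation. The sketch below only generates the index splits; the function name `k_fold_indices` and the default seed are assumptions chosen for illustration, and other schemes (a single hold-out set, splits by time period, and so on) may fit a given context better.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Each of the n observations appears in exactly one test fold.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


splits = list(k_fold_indices(n=10, k=5))
```

Each model is then fitted on the `train` indices and scored on the matching `test` indices, so that no observation is used to evaluate a model it helped to build.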

Comparing the Performance of Several Methods

There is no such thing as the overall best modeling technique. The best choice depends first on the type of response one tries to predict, but also on the context and ultimately on the data themselves. Therefore, we never limit our investigations to a single type of model. We believe that the key to successful predictive analytics lies in testing the performance of several types of models in order to select the one that performs best in a given context.
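The comparison described above can be sketched as follows: two deliberately simple candidate models are scored with the same cross-validated criterion, and the one with the lower error is retained. The candidates (a constant mean and a least-squares straight line), the mean absolute error criterion, the modulo-based folds and the synthetic data are all assumptions made for illustration; a real study would compare richer model families on real data.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline: always predict the training mean of the response."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Least-squares straight line with one predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cv_mae(fit, xs, ys, k=5):
    """Cross-validated mean absolute error; fold = observation index modulo k."""
    errors = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        test = [i for i in range(len(xs)) if i % k == fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.extend(abs(ys[i] - model(xs[i])) for i in test)
    return mean(errors)


# Synthetic data for illustration: a linear trend plus a small alternating error.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]

scores = {
    "mean baseline": cv_mae(fit_mean, xs, ys),
    "straight line": cv_mae(fit_line, xs, ys),
}
best = min(scores, key=scores.get)
```

Because every candidate is scored on the same folds with the same criterion, the comparison stays fair; on a different dataset the ranking could well be reversed.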

Providing Actionable Results

Building a predictive model is rarely limited to providing a set of predicted values. First, the model itself might need to be delivered in a usable form. Second, model performance can rarely be summarized with a single measure: a model typically works well for some data and not so well for others. Last but not least, many recent modeling techniques appear as "black boxes" when one tries to understand what the most important drivers or predictors are and how they affect the response. We make sure to address these issues, notably by quantifying the uncertainty of the predictions and by providing understandable plots and tables that illustrate the relationship between the response and the key predictors.
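As one simple illustration of attaching uncertainty to a prediction, the sketch below turns held-out residuals (observed minus predicted values on data not used for fitting) into an approximate 90% interval around a new prediction. The function name `residual_interval` and the residual values are assumptions made up for the example, and the approach presumes that future errors resemble the held-out ones; it is only one of several possible techniques.

```python
from statistics import quantiles


def residual_interval(residuals, prediction):
    """Approximate 90% interval: the prediction shifted by the 5th and
    95th percentiles of the held-out residuals."""
    cuts = quantiles(residuals, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    return prediction + cuts[0], prediction + cuts[-1]


# Made-up held-out residuals, for illustration only.
residuals = [-1.2, -0.8, -0.5, -0.1, 0.0, 0.2, 0.4, 0.9, 1.1, 1.5]
low, high = residual_interval(residuals, prediction=10.0)
```

Reporting such an interval next to each predicted value makes it visible where the model is precise and where it is not.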