Machine Learning

To improve the accuracy of our predictive models, we use both supervised and unsupervised machine learning algorithms.

Put simply, supervised models learn a function that maps one or more input variables (X) to a known output variable (Y). The two most common task types are regression (for a continuous Y variable) and classification (for a dichotomous/binary or nominal Y variable); an ordinal Y variable can be handled with ordinal regression or treated as one of the two. Unsupervised machine learning algorithms, on the other hand, work with input data (X) only, with no corresponding output variable.

For supervised learning, we test the performance of several algorithms before choosing the final model. We typically compare the following models: neural networks, random forests, support vector machines, XGBoost, and linear and logistic regression.
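The comparison step can be sketched as k-fold cross-validation: each candidate is fitted on part of the data and scored on the held-out part, and the candidate with the best average score is kept. The Python sketch below is only an illustration under stated assumptions. The data are synthetic, and the two candidates (a majority-class baseline and a nearest-centroid classifier) are simple stand-ins chosen so that the example runs with the standard library alone; they are not the algorithms listed above, and all function names are hypothetical.

```python
import random

def majority_baseline(train):
    """Stand-in model 1: always predict the most frequent training label."""
    labels = [y for _, y in train]
    most_common = max(set(labels), key=labels.count)
    return lambda x: most_common

def nearest_centroid(train):
    """Stand-in model 2: predict the class whose mean feature vector is closest."""
    centroids = {}
    for label in {y for _, y in train}:
        rows = [x for x, y in train if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
        )

    return predict

def cross_val_accuracy(fit, data, k=5):
    """Average held-out accuracy over k folds."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        model = fit(train)
        scores.append(sum(model(x) == y for x, y in held_out) / len(held_out))
    return sum(scores) / k

# Synthetic two-class data (an assumption for illustration only).
rng = random.Random(1)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(60)]
data += [((rng.gauss(2, 1), rng.gauss(2, 1)), 1) for _ in range(60)]
rng.shuffle(data)

candidates = {
    "majority_baseline": majority_baseline,
    "nearest_centroid": nearest_centroid,
}
results = {name: cross_val_accuracy(fit, data) for name, fit in candidates.items()}
best = max(results, key=results.get)  # name of the candidate with the highest accuracy
print(results, best)
```

In practice the same loop would be run over the real candidate models, usually with a metric suited to the problem rather than plain accuracy.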

For unsupervised learning, we use clustering techniques, principal component analysis, and item response theory to uncover unknown patterns in the data.
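To make the idea of finding structure without an output variable concrete, here is a minimal k-means clustering sketch in Python. It is one possible clustering technique among many, written with the standard library only. The two groups of points are synthetic, and the function names and parameters are hypothetical choices for this illustration.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: alternate between assigning points and updating centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(col) / len(members) for col in zip(*members))
    # clusters holds the assignment made in the final iteration
    return centroids, clusters

# Synthetic data (an assumption): two groups of 50 points, no labels given to the algorithm.
rng = random.Random(42)
group_a = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(50)]
group_b = [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(50)]
points = group_a + group_b

centroids, clusters = kmeans(points, k=2)
print(centroids, [len(c) for c in clusters])
```

The algorithm never sees which group a point came from; the grouping is recovered from the input data (X) alone.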

Example of a logistic regression model (dichotomous Y variable):
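As a minimal sketch of such a model (not the model behind the original example), the Python code below fits a logistic regression with one input variable by gradient descent. The data are synthetic, and the learning rate, number of epochs, and all function names are assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit P(Y = 1 | x) = sigmoid(b0 + b1 * x) by batch gradient descent."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # gradient of the log-loss w.r.t. the linear term
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

# Synthetic data (an assumption): Y is dichotomous and more likely to be 1 as x grows.
random.seed(0)
xs = [random.uniform(-3, 3) for _ in range(200)]
ys = [1 if random.random() < sigmoid(2.0 * x - 0.5) else 0 for x in xs]

b0, b1 = fit_logistic(xs, ys)

def predict(x):
    """Classify as 1 when the estimated probability is at least 0.5."""
    return 1 if sigmoid(b0 + b1 * x) >= 0.5 else 0

accuracy = sum(predict(x) == y for x, y in zip(xs, ys)) / len(xs)
print(b0, b1, accuracy)
```

A positive b1 means the estimated probability of Y = 1 rises with x. Note that the accuracy here is measured on the training data, so it is an optimistic estimate.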

Example of a more complex supervised model, built to predict individual performance at a Brazilian telecommunications company. The plot below shows which variables are the best predictors of individual performance according to the XGBoost algorithm.