Wednesday, November 6th, 2019.
- Introduction to the powerful classification and predictive algorithms I: K-nearest neighbour:
- Modelling of cases and variables using the K-nearest neighbour (KNN) algorithm, which builds regression and classification models from distance matrices. These are some of the more advanced procedures for data mining. Students will apply these techniques to a diversity of paleobiological databases.
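The distance-based voting at the heart of KNN can be sketched in a few lines of plain Python (the course itself works mainly in R; the data and function names below are illustrative, not part of the course materials):

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training cases."""
    # Euclidean distance from the query to every training case
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Majority vote among the k closest neighbours
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy morphometric data: two measurements per specimen, two classes
X = [(1.0, 1.2), (1.1, 0.9), (0.9, 1.0), (3.0, 3.1), (3.2, 2.9), (2.8, 3.0)]
y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(X, y, (1.05, 1.0), k=3))  # -> A
```

In practice the full pairwise distance matrix would be computed once and reused, which is how the R implementations used in class operate.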
- Introduction to the powerful classification and predictive algorithms II: Partial Least Squares Discriminant Analysis:
- Modelling of cases and variables using Partial Least Squares Discriminant Analysis (PLSDA) algorithms that build upon regression and classification. These are some of the more advanced procedures for data mining. Students will apply these techniques to a diversity of paleobiological databases.
- Introduction to the powerful classification and predictive algorithms III: Support Vector Machines:
- Support Vector Machines (SVM) are some of the most advanced non-linear classifiers and can be used for dichotomous target variables or multi-group categorical variables. They are used for both classification and prediction and rank among the most powerful classifiers in machine learning. Students will apply these techniques to a diversity of paleobiological databases.
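The margin-maximising idea behind SVM can be illustrated with a minimal linear soft-margin version trained by sub-gradient descent on the hinge loss. This is only a sketch of the principle; the kernelised, non-linear SVMs used in class come from R libraries, and the data and hyperparameters below are made up for illustration:

```python
def train_linear_svm(X, y, lr=0.01, lam=0.01, epochs=1000):
    """Fit a linear soft-margin SVM by sub-gradient descent on the hinge loss.
    Class labels must be coded as +1 / -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            margin = target * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # point violates the margin: move the hyperplane towards it
                w = [wi + lr * (target * xi - 2 * lam * wi)
                     for wi, xi in zip(w, x)]
                b += lr * target
            else:
                # point is safely classified: only apply regularisation shrinkage
                w = [wi - lr * 2 * lam * wi for wi in w]
    return w, b

def svm_predict(w, b, x):
    """Sign of the decision function gives the predicted class."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Two well-separated toy clusters
X = [(1.0, 1.0), (1.5, 0.8), (3.0, 3.2), (3.5, 3.0)]
y = [-1, -1, 1, 1]
w, b = train_linear_svm(X, y)
print(svm_predict(w, b, (1.2, 0.9)), svm_predict(w, b, (3.2, 3.1)))
```

The `lam` term penalises large weights, which is what trades margin width against misclassification in the soft-margin formulation.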
- Introduction to the powerful classification and predictive algorithms IV: Neural Networks:
- Neural networks are the most computationally demanding algorithms, but also some of the most advanced at detecting features and generating both predictions and classifications. The basic structure and concepts of neural networks and perceptrons will be learnt, as well as additional methods for controlling model training, such as tuning learning rates.
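The perceptron, the basic building block covered in this module, is simple enough to sketch in plain Python: a weighted sum, a step activation, and an update rule scaled by the learning rate. The AND-function demo below is a standard illustration, not course material:

```python
def train_perceptron(X, y, lr=0.1, epochs=20):
    """Single perceptron: weighted sum + step activation, trained with the
    classic perceptron rule. `lr` is the learning rate discussed in class."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            error = target - pred
            # weight update scaled by the learning rate
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b

# Learn the logical AND function (linearly separable, so the rule converges)
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)
preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0 for x in X]
print(preds)  # -> [0, 0, 0, 1]
```

A smaller learning rate makes training slower but steadier; stacking many such units into layers gives the networks used later in the course.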
- Boosting, Bagging and Cross-Validation:
- Introduce students to methods for assessing the reliability of inferences, aiming at high confidence (>95% of cases) in the classification of data or in numeric predictions. Students will use several of the previous databases and others from James et al. (An Introduction to Statistical Learning, Springer).
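The cross-validation procedure from this module can be sketched generically: shuffle the cases into k folds, hold each fold out in turn, and average the held-out accuracy. The 1-nearest-neighbour demo classifier and the two-cluster data below are illustrative placeholders, assuming any of the course's classifiers could be plugged in:

```python
import math
import random

def k_fold_indices(n, k=5, seed=42):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(X, y, fit, predict, k=5):
    """Mean held-out accuracy over k folds; `fit` and `predict` can wrap
    any classifier."""
    accs = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_X = [X[j] for j in range(len(X)) if j not in held_out]
        train_y = [y[j] for j in range(len(X)) if j not in held_out]
        model = fit(train_X, train_y)
        correct = sum(predict(model, X[j]) == y[j] for j in test_idx)
        accs.append(correct / len(test_idx))
    return sum(accs) / len(accs)

# Demo: a trivial 1-nearest-neighbour classifier on two separated clusters
def fit_1nn(train_X, train_y):
    return (train_X, train_y)

def predict_1nn(model, x):
    tx, ty = model
    return min(zip(tx, ty), key=lambda pair: math.dist(pair[0], x))[1]

X = ([(i * 0.1, i * 0.1) for i in range(10)]
     + [(5 + i * 0.1, 5 + i * 0.1) for i in range(10)])
y = ["A"] * 10 + ["B"] * 10
acc = cross_validate(X, y, fit_1nn, predict_1nn, k=5)
print(acc)  # -> 1.0
```

Bagging and boosting reuse the same resampling machinery: bagging fits many models on bootstrap resamples and averages them, while boosting reweights cases the previous models got wrong.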
Thursday, November 7th, 2019.
- Rattle and Random Forests:
- In this section, students will use a GUI in R (Rattle) to apply some of the previous analyses in a more intuitive way. They will also learn how to build Random Forests, an ensemble method that applies bagging and random feature selection to decision trees in order to identify the variables that most accurately support the right classification or prediction.
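The bagging-plus-voting core of a random forest can be sketched with one-split decision stumps standing in for full trees. This is a simplified illustration under stated assumptions: real random forests, as used in class, grow deep trees and also sample a random subset of features at every split; the taphonomic labels below are invented for the example:

```python
import random
from collections import Counter

def fit_stump(X, y):
    """One-split decision stump: pick the (feature, threshold) pair that
    best separates the classes on this sample."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            left = [yi for x, yi in zip(X, y) if x[f] <= t]
            right = [yi for x, yi in zip(X, y) if x[f] > t]
            if not left or not right:
                continue
            # predict the majority class on each side of the threshold
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = (sum(yi != l_lab for yi in left)
                      + sum(yi != r_lab for yi in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    _, f, t, l_lab, r_lab = best
    return lambda x: l_lab if x[f] <= t else r_lab

def bagged_forest(X, y, n_trees=25, seed=1):
    """Bagging: fit each stump on a bootstrap resample of the cases,
    then classify new cases by majority vote across all stumps."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        boot = [rng.randrange(len(X)) for _ in range(len(X))]
        stumps.append(fit_stump([X[i] for i in boot], [y[i] for i in boot]))
    return lambda x: Counter(s(x) for s in stumps).most_common(1)[0][0]

# Invented two-variable example with two agent classes
X = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.1), (3.0, 0.5), (3.2, 0.7), (2.9, 0.4)]
y = ["carnivore", "carnivore", "carnivore", "hominin", "hominin", "hominin"]
forest = bagged_forest(X, y)
print(forest((1.1, 1.9)), forest((3.1, 0.6)))
```

Counting how often each feature is chosen by the stumps gives a crude version of the variable-importance scores that make random forests useful for variable selection.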
- Introduction to H2O and CARET:
- Here, a special single-topic session will be devoted to two of the most advanced R libraries for machine learning: H2O and caret. Comparative exercises will be carried out with the previous algorithms to show the power of each library in solving the same problems.
Friday, November 8th, 2019.
- Introduction to Deep Learning and Computer Vision: Convolutional Neural Networks:
- Provide all the theoretical tools to understand the most powerful mathematical algorithms that exist for prediction and classification, with a clear focus on image detection and classification. Neural networks will be explained and some of their most advanced architectures, like convolutional neural networks, will be used. For this last module, the sessions involved will require learning some basics of Python, and for that purpose the Anaconda distribution and Jupyter notebooks will be used. The use of neural networks will be carried out using both R and Python. The depth of this module, by far the most complex of the course, will depend on the learning pace of students.
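The convolution operation that gives these networks their name is a small sliding-window product-sum. A minimal sketch in plain Python (strictly speaking a cross-correlation, which is what most deep-learning libraries compute under the name "convolution"; the edge-detector example is illustrative):

```python
def conv2d(image, kernel):
    """'Valid' 2-D convolution: slide the kernel over the image and take
    the elementwise product-sum at each position."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            s = sum(image[r + i][c + j] * kernel[i][j]
                    for i in range(kh) for j in range(kw))
            row.append(s)
        out.append(row)
    return out

# A tiny 'image' with a hard left/right boundary, and a vertical-edge kernel
image = [[0, 0, 1, 1]] * 4
kernel = [[1, -1]]
result = conv2d(image, kernel)
print(result[0])  # -> [0, -1, 0]  (nonzero only where the intensity changes)
```

A convolutional network learns the kernel values themselves, stacking many such filters with non-linearities and pooling to build up from edges to complex image features.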
- This last module will focus on practical applications of all the software tools learnt, with several cases for data mining. Students will work on a personal supervised project using the datasets best suited to their professional interests. Taphonomic datasets on both BSM and bone breakage can be used.
Required textbook: James, G., Witten, D., Hastie, T., Tibshirani, R., 2013. An Introduction to Statistical Learning with Applications in R. Springer. A PDF version is available for free HERE.