Advanced Courses in Life Sciences

1st Edition

Introduction to Machine Learning applied to Taphonomy

November 4th-8th, 2019, Madrid (Spain)

Course Overview

This course introduces students to the most advanced tools in Artificial Intelligence (AI): machine learning methods that make data mining and data processing a fascinating topic.

Obtaining and analyzing data is now a well-developed field in computer science. Finding patterns in these data, or processing this information, is less straightforward and is sometimes subject to bias. Data Mining has recently given way to Process Mining, in which powerful statistical and software tools are combined to detect patterns, classify customers or products reliably, and make accurate predictions. For Paleobiology, these tools provide the most advanced computing techniques for accurate classification and prediction.

This course offers a practical introduction to Machine Learning applied to Taphonomy. From the first class, students will learn to use these information-managing tools on their own computers. After completing the course, students will be prepared to understand the patterns hidden in any database, regardless of its size and complexity. For practical demonstration, data from two taphonomic fields will be provided.

The study of bone surface modifications (BSM) has been one of the most difficult and controversial areas in taphonomic research. Only AI has provided a way to understand the subtleties of this type of analysis, yielding systematic BSM identification rates with accuracy above 90%. This constitutes a major revolution in this field.

The second taphonomic field is biometric. As a practicum, metric properties of broken bones will be used to discern process (dry and green breaking) and agency (human or carnivore) in bone fragmentation.

Teaching will be done using R. In the last module involving computer vision and deep learning, both R and Python will be used.

LOCATION

The Institute of Human Evolution in Africa

C/ Covarrubias 36, 28010
Madrid, Spain

DATE

November 4th-8th, 2019

LANGUAGE

English

COURSE LENGTH & ECTS

30 hours on-site.

This course is equivalent to 1 ECTS (European Credit Transfer System) at the Life Science Zurich Graduate School.

The recognition of ECTS by other institutions depends on each university or school.

PLACES

Places are limited to 15 participants and will be allocated in strict order of registration.

Participants who have completed the course will receive a certificate at the end of it.

Instructor

Dr. Manuel Domínguez-Rodrigo
Complutense University
Spain

Coordinators

Dr. Ana Rosa Gómez-Cano
Transmitting Science
Spain

Dr. Soledad De Esteban-Trivigno
Transmitting Science
Spain

Requirements

Basic knowledge of R is strongly recommended. If you are not familiar with R, you can learn it using the swirl package.

Although students will benefit from prior knowledge of statistics (namely, univariate and bivariate or multivariate statistics), the teaching system does not require any statistical background. Concepts will be explained from their foundations so that they are fully understood by students with different backgrounds.

Students must bring their own laptops.

Required textbook: James, G., Witten, D., Hastie, T., Tibshirani, R., 2013. An Introduction to Statistical Learning with applications in R. Springer. A pdf version is available for free HERE.

Program

Monday, November 4th, 2019. 

  • Microscopic characteristics of Bone Surface Modifications (BSM):
    • Compiling all the microscopic characteristics that identify the different types of BSM; tooth marks (by all bone-modifying biotic agents), percussion marks, stone-tool and metal marks, trampling marks, biochemical marks. Practicum: microscopic observation of referential collections.
  • Comparison of traditional techniques to identify and quantify BSM:
    • Showing the advantages and disadvantages of all the BSM tallying methods. Practicum: microscopic observation of referential collections II.
  • Introduction to Machine Learning. Practicum: an introduction to R:
    • Introducing students to Big Data and the various ways data are generated and handled. Describing data volume, velocity, and veracity methods. Differentiating between Data obtainment, Data Mining and Data Processing. Introduction to R: vectors, matrices, data frames and data classes.
  • Simple prediction. Practicum: Simple regression:
    • Seeking measurable patterns in variables. Differentiating among variable types, covariance and variable correlation. How to estimate the influence of variables on each other and predict values of one dependent variable from an explanatory variable. We will start using paleobiological examples.
  • Complex prediction. Practicum: Multiple regression:
    • Extend prediction to estimating one dependent variable from a set of multiple variables. Analyze covariance and interactions between variables. Combine different types of explanatory variables. We will continue with paleobiological examples. Students will also analyze profit predictions for a company based on its investment in several types of advertising media.
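
As a rough illustration of the simple regression practicum above, the sketch below fits a least-squares line in plain Python rather than in R. The function names and the measurements are hypothetical, invented only for this example; they are not course material.

```python
from statistics import mean


def fit_simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, x_new):
    """Predict the dependent variable for a new value of the explanatory variable."""
    return intercept + slope * x_new


# Made-up data: fragment length (mm) as explanatory variable,
# fragment breadth (mm) as dependent variable.
lengths = [10.0, 20.0, 30.0, 40.0, 50.0]
breadths = [7.0, 12.0, 17.0, 22.0, 27.0]

intercept, slope = fit_simple_regression(lengths, breadths)
predicted_breadth = predict(intercept, slope, 60.0)
```

The slope estimates how much the dependent variable changes per unit of the explanatory variable. Multiple regression generalizes the same idea to several explanatory variables at once.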

Tuesday, November 5th, 2019. 

  • Advanced techniques to identify BSM: 3D and geometric morphometric (GM) approaches:
    • Learn the metric approach to the study of BSM and how it compares with other methods. Use of GM to identify not only the type of BSM, but also variability according to tool type and raw material type. Practicum: Learning the technique of image capturing and the use of specific software for 3D reconstruction. Use of GM statistics.
  • Big data prediction (I). Practicum: Regression trees:
    • Teach a very powerful analytical tool (trees), which can combine various types of variables and does not require data to follow any specific distribution. Trees are powerful for numeric prediction and allow the use of a very large number of variables. Students will learn how to predict numerical target variables.
  • Big data prediction and classification on categorical and mixed sets (II). Practicum: Decision Trees:
    • Decision trees and rule learners identify patterns that can be used for predictive classification. Information is structured in logical trees or rules, which result in all-purpose classifiers. They use categorical dependent variables. Students will learn how to apply these tools to a large array of examples. Students will apply powerful algorithms such as C5.0, one-rule algorithms (such as OneR) and error-reducing algorithms such as RIPPER.
  • Big data classification on categorical and mixed sets (III). Practicum: Mixture Discriminant Analysis:
    • Introduce students to a powerful machine learning method for identifying associations among items through reduced dimensionality. Paleobiological examples will be used for practice.
  • Identifying associations among objects and behavioural patterns. Practicum: K-means clustering:
    • Teach methods for the machine learning task of clustering, which consists of finding natural groupings in data. This method is used for knowledge discovery rather than prediction. It provides powerful insights into groupings found in natural data.
  • Big data classification on categorical and mixed sets (IV). Practicum: Naïve Bayes:
    • This machine learning method uses principles of probability for classification. It easily provides the estimated probability for any given prediction. Paleobiological examples will be used for practice.
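
The clustering idea in the K-means practicum above can be sketched in a few lines. This is a minimal plain-Python version of Lloyd's algorithm, not the R implementation used in class. The points and the choice of starting centroids are made up for illustration.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm on tuples of coordinates, starting from given centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster where it was
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids


# Made-up 2D measurements forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
```

No class labels are supplied: the groups emerge from the distances alone, which is why clustering serves knowledge discovery rather than prediction. In practice the result depends on the starting centroids, so several random starts are usually compared.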

Wednesday, November 6th, 2019. 

  • Introduction to the powerful classification and predictive algorithms I: K-nearest neighbour:
    • Modelling of cases and variables using K-nearest neighbour (KNN) algorithms that build upon regression and classification using distance matrices. These are some of the more advanced procedures for data mining. Students will apply these techniques on a diversity of paleobiological databases.
  • Introduction to the powerful classification and predictive algorithms II: Partial Least Squares Discriminant Analysis:
    • Modelling of cases and variables using Partial Least Squares Discriminant Analysis (PLSDA) algorithms that build upon regression and classification. These are some of the more advanced procedures for data mining. Students will apply these techniques on a diversity of paleobiological databases.
  • Introduction to the powerful classification and predictive algorithms III: Support Vector Machines:
    • Support Vector Machines (SVM) are some of the most advanced non-linear classifiers and can be used for dichotomous target variables or multi-group categorical variables. They are used for classification and prediction and are among the three most powerful classifiers in ML. Students will apply these techniques on a diversity of paleobiological databases.
  • Introduction to the powerful classification and predictive algorithms IV: Neural Networks:
    • Neural networks are the most computationally demanding algorithms, but also some of the most advanced in detecting features and generating both predictions and classifications. The basic structure and concepts of neural networks and perceptrons will be learnt, as well as additional methods for controlling model training and learning rates.
  • Boosting, Bagging and Cross-Validation:
    • Introduce students to methods that improve and assess the reliability of inference, aiming for high confidence (>95% of cases) in the classification of data or in numeric predictions. Students will use several of the previous databases and others from (James et al., An Introduction to Statistical Learning, Springer).
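
To make the link between a classifier and its validation concrete, the sketch below combines a K-nearest neighbour classifier with k-fold cross-validation in plain Python. The data, the class labels ("dry", "green") and the function names are hypothetical, and the interleaved fold assignment is just one simple choice.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training cases."""
    neighbours = sorted(
        zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def k_fold_accuracy(X, y, n_folds=5, k=3):
    """Estimate accuracy by holding out each fold once and training on the rest."""
    n = len(X)
    correct = 0
    for fold in range(n_folds):
        # Interleaved folds: cases fold, fold + n_folds, fold + 2 * n_folds, ...
        test_idx = set(range(fold, n, n_folds))
        train_X = [X[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        for i in test_idx:
            if knn_predict(train_X, train_y, X[i], k) == y[i]:
                correct += 1
    return correct / n


# Made-up, well-separated measurements for two hypothetical breakage classes.
group_a = [(0.1 * i, 0.2 * i) for i in range(10)]
group_b = [(5 + 0.1 * i, 5 + 0.2 * i) for i in range(10)]
X = group_a + group_b
y = ["dry"] * 10 + ["green"] * 10

accuracy = k_fold_accuracy(X, y, n_folds=5, k=3)
```

Every case is predicted by a model that never saw it during training, so the resulting accuracy is an estimate of performance on new data rather than a measure of fit to the training set.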

Thursday, November 7th, 2019. 

  • Rattle and Random Forests:
    • In this section, students will use Rattle, a GUI for R, to apply some of the previous analyses in a more intuitive way. They will also learn how to build Random Forests, which combine bagging and random variable selection across many regression or decision trees and help identify the variables that contribute most to the right classification or prediction.
  • Introduction to H2O and CARET:
    • A dedicated session will be devoted to two of the most advanced R libraries for machine learning: H2O and caret. Comparative exercises will be carried out with the previous algorithms to show how each library performs on the same problems.

Friday, November 8th, 2019. 

  • Introduction to Deep Learning and Computer Vision: Convolutional Neural Networks:
    • Provide all the theoretical tools to understand the most powerful mathematical algorithms that exist for prediction and classification, with a clear focus on image detection and classification. Neural networks will be explained and some of their most advanced algorithms, like convolutional neural networks, will be used. This last module requires learning some basics of Python, for which the Anaconda distribution and Jupyter notebooks will be used. Neural networks will be run using both R and Python. The depth of this module, by far the most complex of the course, will depend on the learning rate of students.
  • Practicum:
    • This last module will focus on practical applications of all the software tools learnt, with several data mining cases. Students will work on a personal supervised project with the data sets best suited to their professional interests. Both taphonomic data sets, on BSM and on bone breakage, can be used.
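
For readers new to convolutional neural networks, the core operation of a convolutional layer can be sketched without any deep learning framework. The function below is a plain-Python illustration, not the R or Python tooling used in the course. The image patch and the kernel are made up. In a real network the kernel values are learnt from data rather than fixed by hand.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation, the operation CNN layers call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            # Weighted sum of the image window whose top-left corner is (r, c).
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Made-up 4x4 greyscale patch: dark on the left, bright on the right.
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-written kernel that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d_valid(patch, vertical_edge)
```

The feature map is non-zero only where the window straddles the edge, which shows how a convolutional layer turns raw pixels into localized features for later layers to classify.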

Fees

  • Course Fee
    • Early bird (until July 31st, 2019): 975 € * (780 € for Ambassador Institutions)
    • Regular (after July 31st, 2019): 1,100 € * (880 € for Ambassador Institutions)
  • This includes course material, coffee breaks and lunches (VAT included).
    * Participants from companies/industry will have an extra charge of 100 €.

You can check the list of Ambassador Institutions HERE. If you want your institution to become a Transmitting Science Ambassador please contact us at communication@transmittingscience.org.

Discounts (see Funding below) are not cumulative and apply only on the fee.

We offer the possibility of paying in two instalments (contact the course coordinators).

Schedule

  • Monday to Friday:
    • 10:00 to 13:30 Lessons.
    • 13:30 to 15:00 Lunch (included).
    • 15:00 to 18:00 Lessons.

The schedule is approximate; it is possible that the content of one day may run into the next and a working day may be longer than advertised.

Funding

Discounts

Former participants will have a 5 % discount on the Course Fee.

Furthermore, a 20 % discount on the Course Fee is offered to members of some organizations (see Organizations with discount). If you want to apply for this discount, please indicate it in the Registration form (proof will be requested later).

Unemployed scientists living in Spain, as well as PhD students based in Spain without any grant or scholarship to develop their PhD, may benefit from a 40 % discount on the Course Fee. If you want to ask for this discount, please contact the course coordinator. This discount applies to a maximum of 2 places, which will be allocated in strict order of registration.

Discounts are not cumulative and apply only on the fee, not to Accommodation Package or other options.

Organizers

Transmitting Science
IDEA

Collaborators

Col·legi Oficial de Biòlegs de la Comunitat Valenciana
Colegio Oficial de Biólogos de Castilla y León
Colegio Oficial de Biólogos de Euskadi
Colexio Oficial de Biólogos de Galicia

Registration