END OF EARLY REGISTRATION JUNE 30, 2013
The main goal of tree-based methods, such as CART (Classification and Regression Trees), is to model and predict a single response variable from a set of explanatory (independent) variables. These methods can be particularly effective for modelling interactions between explanatory variables. They were initially proposed in the social sciences (Morgan & Sonquist, 1963) and later developed statistically (Breiman, Friedman, Olshen & Stone, 1984).
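As a rough illustration of how a regression tree chooses a single binary partition, the sketch below searches for the threshold on one explanatory variable that minimises the within-node sum of squares of the response. The function names (`sse`, `best_split`) and the toy data are assumptions made for this example; the course exercises themselves are solved with R, and Python is used here only for illustration.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse) for the best binary split x <= threshold.

    If no split improves on the unsplit node, the threshold is None.
    """
    pairs = sorted(zip(x, y))
    best = (None, sse(y))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [yy for _, yy in pairs[:i]]
        right = [yy for _, yy in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Toy data (hypothetical): two clearly separated groups of responses.
x = [1, 2, 3, 10, 11, 12]
y = [5.0, 6.0, 5.5, 20.0, 21.0, 19.5]
threshold, total = best_split(x, y)
print(threshold, total)
```

A full tree would apply the same search recursively within each resulting node and over every explanatory variable, stopping according to a size or complexity rule.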
Neural networks were initially conceived as an emulation of the human brain (McCulloch & Pitts, 1943), particularly with respect to the interactions and communication between neurons, with the aim of developing computational methods to solve complex problems. Current methods based on neural networks have been developed in both the artificial intelligence and statistics fields, which have converged in a number of ways. As a statistical model, a neural network is based on linear and non-linear combinations of explanatory variables that interact with other combinations to predict or explain an outcome variable. Feed-forward neural networks (Bishop, 1995; Hertz, Krogh & Palmer, 1991; Ripley, 1993, 1996), in which input variables are combined through one or more hidden layers to predict an output variable, are among the most popular.
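To make the idea of "linear and non-linear combinations" concrete, here is a minimal sketch of the forward pass of a feed-forward network with one hidden layer, a logistic activation function and a linear output (suitable for a continuous outcome). All weights and the function names `logistic` and `forward` are hypothetical values chosen for illustration, not fitted estimates.

```python
import math


def logistic(z):
    """Logistic (sigmoid) activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward-propagate one observation through a single hidden layer."""
    # Each hidden unit applies a non-linear activation to a linear combination of inputs.
    hidden = [
        logistic(sum(w * xi for w, xi in zip(weights, x)) + b)
        for weights, b in zip(hidden_weights, hidden_biases)
    ]
    # The output is a linear combination of the hidden units.
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hypothetical network: 2 inputs, 2 hidden units, 1 continuous output.
hidden_weights = [[0.5, -0.5], [1.0, 1.0]]
hidden_biases = [0.0, -1.0]
output_weights = [2.0, -1.0]
output_bias = 0.5

prediction = forward([1.0, 1.0], hidden_weights, hidden_biases, output_weights, output_bias)
print(prediction)
```

Training a network amounts to choosing these weights numerically so that predictions match the observed outcomes as closely as possible.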
In practice, both CART and neural network methods can provide good results when explaining or predicting an outcome variable, particularly when there are many interactions between explanatory variables. These techniques have also been used for data mining problems, in analyses of large databases. Nevertheless, they tend to over-fit the data, so the models must be validated. ROC (Receiver Operating Characteristic) methods, including sensitivity/specificity analyses, and/or external validation can be used to assess the consistency of these techniques.
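The sensitivity/specificity analysis mentioned above can be sketched as follows: for each candidate threshold on the predicted score, compute sensitivity and 1 - specificity, then approximate the area under the resulting ROC curve with the trapezoidal rule. The scores, labels and function names are hypothetical, and the sketch assumes that both classes are present in `labels`.

```python
def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) pairs, one per candidate threshold.

    Assumes labels are 0/1 and that both classes occur at least once.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical predicted scores and true classes for six observations.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
area = auc(points)
print(points, area)
```

An area close to 1 indicates good discrimination, whereas an area near 0.5 is what random guessing would produce. To guard against over-fitting, the scores should come from data not used to fit the model.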
Applications cover a wide range of problems, including species classification in biology, prediction of patient prognosis in biomedicine, analysis of customer loyalty in business intelligence, and functional genomics analyses in microarray experiments.
Dr. Llorenç Badiella
(Universitat Autònoma de Barcelona, Spain).
Dr. Montserrat Martínez-Alonso
(Biomedical Research Institute of Lleida, Spain).
Dr. Joan Valls
(Biomedical Research Institute of Lleida, Spain).
Dr. Soledad De Esteban-Trivigno
(Transmitting Science, Spain).
Basic knowledge of statistics and R. All participants must bring their own laptop (Windows, Macintosh, Linux).
The course will present the basic theory, then focus on practical issues. Sessions will also include exercises with real data that will be solved with R.
1st Day. Regression trees.
- Introduction to trees.
- Quantitative response variable and decomposition of the sum of squares in a model.
- Recursive binary algorithm for generating partitions.
- Size of the tree and comparison with linear models.
2nd Day. Classification trees.
- Deviance, entropy and Gini index.
- Binary partitions for categorical outcome.
- Comparison with logistic regression model.
3rd Day. Neural networks.
- Specification of a neural network: Input, output, layers, hidden layers, forward-propagation, activation function.
- Neural network for continuous output.
- Neural networks with skip-layers.
- Training of a neural network. Numerical methods for estimation.
- Weight decay and neural network for categorical output.
- Comparison with random forest methods.
4th Day. Validation methods.
- Internal and external validation.
- Training and testing sample.
- Sensitivity and specificity analyses: ROC curves.
- Cross-validation methods.
- Application to CART and neural networks.
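As a small illustration of the cross-validation topic in the programme above, the following sketch splits observation indices into k folds, so that each fold serves once as the testing sample while the remaining folds form the training sample. The function name `k_fold_indices` and its `seed` parameter are assumptions made for this example.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # shuffle reproducibly
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical use: 10 observations, 5 folds.
splits = list(k_fold_indices(10, 5))
for train, test in splits:
    print(sorted(test), len(train))
```

A model (a tree or a neural network) would be fitted on each training set and evaluated on the corresponding testing set, and the k error estimates would then be averaged.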
- Ripley B (1996) Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge.
- Venables W, Ripley B (2002) Modern Applied Statistics with S, Springer, New York.
- Faraway J (2005) Linear Models with R, Chapman & Hall, Boca Raton.
- Faraway J (2006) Extending the Linear Model with R, Chapman & Hall, Miami.
- Hastie T, Tibshirani R, Friedman J (2009) The Elements of Statistical Learning. Data Mining, Inference and Prediction, Springer, New York.
- Torgo L (2010) Data Mining with R. Learning with Case Studies, Chapman & Hall, Miami.
24 hours on-site.
Monday to Thursday:
9:30 to 13:30: Lessons.
13:30 to 15:00: Lunch.
15:00 to 17:00: Lessons.
- There will be a coffee break each day, halfway through each morning lesson session.
Places are limited to 20 participants and will be allocated strictly in order of registration.
Reduced registration fee until June 30, 2013: 390 €. Full registration fee after June 30, 2013: 545 €. Participation fees include course material, coffee breaks and lunches.
Former participants will have a 5 % discount on the current course fee.
We offer the possibility of paying in two instalments (contact us at email@example.com).
Please complete and submit your Registration Form (see below); we will confirm your acceptance by e-mail.
If you wish to cancel your participation in this course, cancellations up to 20 days before the course start date will incur a 30 % cancellation fee. For later cancellations, or non-attendance, the full course fee will be charged.
If Transmitting Science must cancel this course due to unforeseen circumstances beyond the control of Transmitting Science, you will either be entitled to a full refund of the course fee, or your fee can be credited toward a future course/workshop. Transmitting Science is not responsible for travel fees, or any expenses incurred by you as a result of such cancellation. Every effort will be made to avoid the cancellation of any planned course/workshop.
The course will take place in the city of Sabadell, Barcelona (Spain).
You can stay in Barcelona city or Sabadell. Information about hotels and hostels in Sabadell here.
How to get to Sabadell from Barcelona city.
Unfortunately there are no internal grants available for this course. For information on External Financial Support, please check the link.
Unemployed Spanish scientists or students, as well as students working on their PhD without a grant, can benefit from a 40 % discount on the course fee. This discount applies to a maximum of 4 places, which will be allocated strictly in order of registration.
For further information contact: firstname.lastname@example.org.