Regression and Principal Component Analysis: An Application-Based Introduction

17 October 2016 – 21 October 2016

Bodø, Norway


Deadline for registration: September 1, 2016.

The maximum number of participants is 20. Admission among qualified applicants is decided by order of registration.

5 ECTS credits

Goal: The goal of this course is to provide the course participants with knowledge and skills in performing regression analysis, including generalized linear models and nonparametric regression, and component methods of dimension reduction, including principal component analysis and correspondence analysis.

Content: This course deals with methods that are broadly classified under the umbrella term of regression modelling as well as the complementary field of dimension-reduction methods called principal component analysis (PCA). While regression models the relationship between a response variable and several explanatory variables, PCA identifies relationships within a set of variables. Each of these areas has special cases depending on the type of data under scrutiny: for example, the area of generalized linear modelling includes methods for modelling a response variable on a continuous scale (the classic regression problem), as well as on a frequency scale (Poisson regression) or categorical scale (logistic regression). While PCA identifies relationships within a set of continuous variables, its companion method correspondence analysis (CA) does the same within a set of categorical variables. This course gives a comprehensive application-based introduction to these methods, with the accent on examples from social science research.
The practical classes in the afternoons comprise computer laboratory sessions, generally using the SPSS statistical package, applications to social-science data, and, to some degree, analysis and discussion of participants' own data sets.
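To make the "frequency scale" case mentioned above concrete, here is a minimal sketch of a Poisson regression (a generalized linear model with a log link), fitted by Newton's method on simulated counts. This is an illustration in Python, not part of the course's SPSS-based labs, and all data and variable names are invented:

```python
import numpy as np

# Simulated count data: the response is Poisson with log-linear mean
rng = np.random.default_rng(4)
n = 400
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])   # design matrix with intercept
true_beta = np.array([0.5, 0.8])
y = rng.poisson(np.exp(X @ true_beta))

# Newton's method for the Poisson log-likelihood (log link),
# started from the intercept-only estimate
beta = np.array([np.log(y.mean()), 0.0])
for _ in range(25):
    mu = np.exp(X @ beta)              # expected counts under the log link
    grad = X.T @ (y - mu)              # score vector
    hess = X.T @ (X * mu[:, None])     # observed information
    beta = beta + np.linalg.solve(hess, grad)

print("estimated coefficients:", np.round(beta, 2))
```

With 400 observations the estimates land close to the generating values (0.5, 0.8), illustrating how the log link turns a multiplicative model for counts into a linear one on the scale of the predictor.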

Learning outcomes:

Based on this course the student should have an understanding:

1. of the fundamental differences between supervised learning (exemplified by regression analysis) and unsupervised learning (exemplified by principal component analysis);
2. of the linear regression model, and the meaning of its parameters and model estimates;
3. of how the linear regression model can be extended to generalized linear models;
4. of how classification and regression trees function, offering a completely different approach to regression modelling;
5. of how principal component analysis quantifies relationships within a set of variables, and produces new scales;
6. of extensions to principal component analysis that apply to different types of data, for example correspondence analysis for categorical data.

The student should also be able:

7. to apply these statistical methods to data, using appropriate software;
8. to interpret the results and make valid conclusions from the data analysis;
9. to present their results in a systematic way, using graphical representation as much as possible, for reports and research articles.

Preliminary timetable


Kurslab 0042
Introduction to multivariate analysis
· Functional methods (a.k.a. supervised learning)
· Structural methods (a.k.a. unsupervised learning)
Linear regression analysis
· The regression model
· The general linear model
· Estimating the parameters
· Applications
· Interpreting the results
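The parameter-estimation step above can be sketched in a few lines. The following Python fragment (simulated data with invented coefficients; the course labs themselves use SPSS) computes the ordinary least-squares estimates directly from the design matrix:

```python
import numpy as np

# Simulated data: response y depends linearly on two explanatory variables
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.5 + 2.0 * x1 - 0.7 * x2 + rng.normal(scale=0.5, size=n)

# Design matrix with an intercept column
X = np.column_stack([np.ones(n), x1, x2])

# Ordinary least squares: minimize ||y - X beta||^2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta
r_squared = 1 - residuals.var() / y.var()

print("estimates:", np.round(beta, 2))
print("R^2:", round(r_squared, 3))
```

The estimates recover the generating values (1.5, 2.0, -0.7) up to sampling error, and R² close to 1 reflects the small noise level chosen for the simulation.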


Gerhard Schøning (A8)
Model selection
· Things that can go wrong: model diagnostics
· Choosing from many explanatory variables
· Interaction effects
· The special role of the logarithmic transformation
· Applications
Generalized linear models
· Poisson and logistic regression models
· Estimation and interpretation of parameters
· Applications
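The "categorical scale" case, logistic regression, can likewise be sketched outside SPSS. Below is a hypothetical Python illustration fitting a logistic model by Newton-Raphson (equivalently, iteratively reweighted least squares) on simulated binary data:

```python
import numpy as np

# Simulated binary data with a logistic relationship to one predictor
rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([-0.5, 1.2])
p = 1 / (1 + np.exp(-X @ true_beta))
y = rng.binomial(1, p)

# Newton-Raphson / IRLS for the logistic log-likelihood
beta = np.zeros(2)
for _ in range(25):
    mu = 1 / (1 + np.exp(-X @ beta))   # fitted probabilities
    W = mu * (1 - mu)                  # IRLS weights
    grad = X.T @ (y - mu)
    hess = X.T @ (X * W[:, None])
    beta = beta + np.linalg.solve(hess, grad)

print("estimated coefficients:", np.round(beta, 2))
```

The fitted slope is interpreted on the log-odds scale: a one-unit increase in x multiplies the odds of the response by exp(beta[1]).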

Ole Tobias Olsen (A15)
Dimension reduction methods (1)
· Basic principles of dimension reduction
· Principal component analysis (PCA) as a geometric method
· PCA as a scaling method
· Applications
· Interpretation of PCA parameters
· Interpretation of PCA biplots
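PCA as a scaling method can be illustrated compactly via the singular value decomposition of the centred data matrix. The sketch below (simulated data, Python rather than SPSS) produces the component scores, the "new scales" mentioned in the learning outcomes, and the proportion of variance each component explains:

```python
import numpy as np

# Simulated data: two strongly correlated variables plus an independent one,
# so most of the variance lies along a single direction
rng = np.random.default_rng(2)
n = 300
z = rng.normal(size=n)
data = np.column_stack([z + 0.1 * rng.normal(size=n),
                        2 * z + 0.1 * rng.normal(size=n),
                        rng.normal(size=n)])

# PCA via the SVD of the centred data matrix
centred = data - data.mean(axis=0)
U, s, Vt = np.linalg.svd(centred, full_matrices=False)

scores = U * s                     # principal component scores (new scales)
explained = s**2 / np.sum(s**2)    # proportion of variance per component

print("variance explained:", np.round(explained, 3))
```

Because the first two variables are nearly collinear, the first component alone captures most of the total variance; the rows of Vt give the loadings used to interpret each component.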

Factor analysis
· Essential differences and similarities to PCA
· Applications

Ole Tobias Olsen (A15)
Dimension reduction methods (2)
· PCA on different types of data, and the logarithmic transformation revisited
· Applications
· Categorical data and correspondence analysis (CA)
· Multiple correspondence analysis (MCA)
· Applications
· Combining regression and PCA/CA
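For the categorical-data case, correspondence analysis reduces to an SVD as well: of the matrix of standardized residuals of a contingency table. The following Python sketch uses an invented 3×3 table of counts purely for illustration:

```python
import numpy as np

# A hypothetical 3x3 contingency table of counts
N = np.array([[70, 30, 20],
              [25, 60, 35],
              [15, 25, 70]], dtype=float)

P = N / N.sum()                  # correspondence matrix
r = P.sum(axis=1)                # row masses
c = P.sum(axis=0)                # column masses

# Standardized residuals from the independence model
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

U, s, Vt = np.linalg.svd(S)
inertia = s**2                   # principal inertias per dimension

# Principal row coordinates for the first two dimensions
row_coords = (U[:, :2] * s[:2]) / np.sqrt(r)[:, None]

print("total inertia:", round(inertia.sum(), 4))
print("share of first dimension:", round(inertia[0] / inertia.sum(), 3))
```

The total inertia equals the table's chi-square statistic divided by the grand total, and the row (and analogously column) coordinates are what a CA map or biplot displays. The last singular value is zero by construction, since the residuals satisfy the row- and column-sum constraints.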
Ole Tobias Olsen (A15)
Classification and regression trees
· A decision-tree approach to modelling and prediction
· Rules for splitting the data
· Applications
· Comparison of linear regression and regression trees
· Cross-validation for prediction
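The splitting rule behind regression trees can be shown with a one-split "stump": choose the threshold that minimizes the total within-group squared error of the two resulting means. A hypothetical Python sketch on simulated data with a jump at x = 4:

```python
import numpy as np

# Simulated data: the mean of y jumps at x = 4
rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=400)
y = np.where(x < 4, 1.0, 5.0) + rng.normal(scale=0.5, size=400)

def best_split(x, y):
    """Return the threshold minimizing total within-group squared error."""
    best_t, best_sse = None, np.inf
    for t in np.unique(x)[1:]:         # candidate split points
        left, right = y[x < t], y[x >= t]
        sse = (((left - left.mean())**2).sum()
               + ((right - right.mean())**2).sum())
        if sse < best_sse:
            best_t, best_sse = t, sse
    return best_t

threshold = best_split(x, y)
print("chosen split point:", round(threshold, 2))
```

A full tree applies this rule recursively within each resulting group; cross-validation then decides how far the splitting should go before it starts fitting noise.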
General summary and discussion
13:00-15:00 PC Lab
· PC Lab (starts with students presenting their data sets -- voluntary!), Ole Tobias Olsen (A15)
· PC Lab, Gerhard Schøning (A8)
· PC Lab, Elias Blix (A13)
· PC Lab

Exam and evaluation: Participation in lectures and a paper on an application of multivariate analysis. The paper is graded pass/fail.
Language of education: English

Course presenter: Professor Michael Greenacre, Department of Economics and Business, Pompeu Fabra University, and Barcelona Graduate School of Economics, Barcelona, Spain

Compulsory literature:
Chatterjee, S. and Hadi, A. S. 2012. Regression Analysis by Example, 5th Edition. Wiley.
James, G., Witten, D., Hastie, T. and Tibshirani, R. 2014. An Introduction to Statistical Learning. Springer. By a special arrangement with the publisher, this book is freely available for download.
Greenacre, M. 2010. Biplots in Practice. BBVA Foundation, Madrid. In the public domain; available as a free download.

Recommended literature:
Tinsley, H. E. A. and Brown, S. D. (eds) 2000. Handbook of Applied Multivariate Statistics and Mathematical Modeling. Academic Press/Elsevier. Individual chapters are available for download.
Jolliffe, I. T. 2002. Principal Component Analysis, 2nd Edition. Springer.
Greenacre, M. 2007. Correspondence Analysis in Practice. Second Edition. Chapman & Hall / CRC.