STT9000 Clustering and Scaling of Categorical Data, with Applications in Business and Social Sciences

Bodø, 29-30 April and 2-3 May 2019


ECTS credits: 5
Level of course: Ph.D. course
Type of course: Elective for students in business or other behavioral and social sciences. 
Duration: 29-30 April and 2-3 May 2019 
Application deadline: 29 March 2019
Location: Nord University, Bodø 
Course responsible person: Tor Korneliussen, Nord University 
Language: English. 

Faculty: Professor Michael Greenacre, Universitat Pompeu Fabra, Barcelona and Professor II at Nord University Business School https://www.barcelonagse.eu/people/greenacre-michael

Costs:
The course is free. 

Course evaluation: 
An online survey will be conducted to collect feedback on the course, including how it can be improved.

Course content:
This course is aimed at PhD students and researchers working with large sets of categorical data, for example social survey data, purchasing behaviour data, public health data, or data from official databases. The fundamental concepts of similarity and distance are explained, as well as dimension reduction. These concepts underlie the multivariate methods used to analyse such data, either by forming groups (clusters) or by forming new scales that combine several variables (scaling, or dimension reduction) with maximum retention of information.
Methods treated include: 
hierarchical and nonhierarchical cluster analysis, 
latent class analysis, 
metric and non-metric multidimensional scaling, 
simple and multiple correspondence analysis, 
canonical correspondence analysis and 
discriminant analysis. 
Attention will also be given to large-scale applications of these methods to related problems, for example nearest-neighbour prediction, network analysis and recommender systems.
The course is practically oriented, with the emphasis on understanding enough of the theory of these methodologies to feel comfortable applying them to real-life empirical data and interpreting the results. Students will apply these methods in the course's practical sessions, using different software options. The advantages and disadvantages of different statistical packages will be illustrated and discussed.

Learning outcomes:
Based on this course the participants will:

Knowledge
have an understanding of the theory and algorithms underlying the various multivariate methods of clustering and scaling, especially the basic concepts of similarity and distance, and the important principle of dimension reduction;
have an insight into the extensive toolbox of available methods for analysing a real-life data set to answer research questions;
have an understanding of the results of these methods and how to interpret them;
have an appreciation of the advantages and disadvantages of the available software options.

Skills
have the ability to recognize the correct multivariate method for solving a clustering or scaling problem in a particular research context and for a particular data set;
have the capability to implement these methods using statistical software;
have the confidence to interpret the analytical results, report them correctly and draw accurate conclusions.

General competence
have mastered the use of a set of multivariate methods for clustering and scaling, to be able to include advanced statistical analysis in their research;
have increased their knowledge of an area of advanced statistical analysis to be used in their research;
have increased their ability to communicate quantitative results in their scientific writing and presentations. 

Course prerequisites:
Students should be admitted to a PhD program or have the qualifications to be admitted to a PhD program. 
A knowledge of basic statistics is assumed, namely the concepts of univariate statistics: summary statistics (e.g. mean and variance), data plotting (e.g. histograms and boxplots) and hypothesis testing (e.g. t-tests), as well as regression analysis and analysis of variance.

Mode of delivery:
Face-to-face lectures.

Organization and learning activities:
This is an intensive course of four days with individual study required prior to and after the course. 

Exam and evaluation:
Participation in lectures, plus a course assignment in which one or more of the statistical methods covered in the course are applied and written up as a paper. The paper will be graded pass / fail.

Reading list:
Everitt, B. et al. (2011) Cluster Analysis, 5th edition, Wiley, UK (a very comprehensive book on cluster analysis)
Greenacre, M. (2010) Biplots in Practice. BBVA Foundation, Madrid. In the public domain, free download from www.multivariatestatistics.org (an anthology of methods of dimension reduction, including almost all the scaling methods dealt with in this course)
Greenacre, M. (2013) Multivariate Analysis of Ecological Data. BBVA Foundation, Bilbao. In the public domain, free download from www.multivariatestatistics.org (a handbook for ecologists, but just as useful for any applied researcher, including methods of clustering and scaling)
Greenacre, M. (2016) Correspondence Analysis in Practice, 3rd edition, Chapman & Hall / CRC (the prime reference for correspondence analysis; a free edition in Spanish translation is available at www.multivariatestatistics.org)

Course presenter: 
Michael Greenacre is Professor of Statistics at the Universitat Pompeu Fabra, Barcelona, specializing in multivariate data analysis, principally in the social and environmental sciences. He also teaches Methods of Marketing Research in the Barcelona School of Management and Data Visualization in the Barcelona Graduate School of Economics. Apart from more than 80 published articles in international journals, he has written six books on correspondence analysis and related methods and co-edited four books (with Jörg Blasius) on data visualization. He has given short courses in 15 countries in Europe, North and South America, Africa and Australia.




Application
Deadline 29 March 2019

Practical information

The city of Bodø

Bodø is home to around 50,000 people and is one of the fastest growing cities in the country, with a lively urban scene.

 


Getting to Bodø


  • Bodø is the hub of Nordland and can be reached by plane, train and boat. 
  • Bodø's airport is located in the city itself, making it quick and easy to fly in and out. Oslo Gardermoen is a 90-minute flight away.


Transport in Bodø

  • The main campus is located at Mørkved, about 9 kilometres from the centre of Bodø.
  • It is easy to take the bus from the airport or city centre to campus.

Accommodation in Bodø

  • On campus there is a student hotel, “Nordavind”, which offers short-term rentals.
  • You may also stay at a hotel in the centre of Bodø.
  • We advise you to book accommodation as early as possible, as hotels in Bodø are at times fully booked.