Subject description for 2023/24
Advanced Biological Data Analysis
BIO9000
This course offers an in-depth overview of statistical methods for biological data analysis. The course consists of a series of lectures, demonstrations and computer laboratories that cover good practice in statistics and biological data analysis. Topics include general and generalized linear models, categorical data analysis, parametric and non-parametric statistics, and multivariate statistics.

The course consists of two major parts:

Part I (non-compulsory)

This part is organised together with the master's course BI300F during the last week of August as an intensive training week in basic biological data analysis. Attending this part is strongly recommended if you haven’t had a similar course before, or if you want to refresh your basic data analysis skills.

Part II (compulsory)

This part covers advanced biological data analysis and follows Part I, with one afternoon session per week in September and October.

Advanced Biological Data Analysis - BIO9000 (part I)

1. Introduction to R
2. Pearson correlation
3. T-test
4. Simple linear regression
5. Model diagnosis and influential observations
6. One-way between group ANOVA
7. Multiple linear regression and interaction
8. Multiway between group ANOVA
9. ANCOVA
10. Nonparametric statistics
11. Analysis of contingency tables
12. Data visualisation

Advanced Biological Data Analysis - BIO9000 (part II)

1 - Complex ANOVA designs: Repeated measures ANOVA, complex ANOVA designs, nested ANOVA and linear mixed models, model selection

2 - Generalized linear models: Logistic regression, generalized linear models, generalized linear mixed models

3 - Advanced regression techniques: Polynomial regression, nonlinear regression and nonlinear mixed models

4 - Multivariate statistics: PCA and biplot, non-metric multidimensional scaling and cluster analysis

5 - Special topics I

6 - Special topics II

7 - Special topics III

Practical Information

Part I includes a series of videos which you have to watch before each session (flipped classroom style). The sessions themselves focus mainly on data analysis in practice using the R/RStudio software.

Part II consists of classical lectures. On the first four Mondays of part II, we cover complex ANOVA designs, generalized linear models, advanced regression techniques and multivariate statistics. On one of the last three Mondays, you will do the teaching yourself: you choose a topic from the list of "special topics" (see below), and you prepare and deliver a 45-minute lecture on this topic, combining theory and exercises. You don't have to start from scratch, as you will receive a PowerPoint presentation and an R script on the topic. Your task is to make sure that you understand the topic and the R script, practise your presentation, and make sure that the R script is free of bugs.

The 10 special topics to choose from are:

1. Spline based regression techniques
2. Generalized additive models
3. Survival analysis
4. Nonparametric statistics, bootstrapping and permutation tests
5. Bayesian inference
6. Discriminant analysis
7. MANOVA
8. Canonical ordination
9. Experimental design
10. Power analysis

This is a PhD-level course.

Students must have completed BI300F Scientific Communication and Research Methods or a similar course.

After having completed the course, the student should:

Knowledge:
- have acquired in-depth understanding of statistical methods and their appropriate use in biological data analysis.

Skills:
- have acquired the tools and abilities to conduct statistical analyses, including data quality control, data visualisation, model diagnosis, analyses, interpretation of results, and reporting of results.
- have acquired the ability to work with the statistical software R.

General competencies:
- be able to exchange statistical skills and knowledge with biologists and contribute to the development of good practice in biological data analysis;
- develop an understanding of statistical methods in modern scientific research.

No tuition fees. Costs for semester registration and course literature apply.

Elective: PhD Biosciences

Lectures, practice sessions, presentations, seminars

In order to pass the course, you have to fulfil the following aspects of the "portfolio":

1. 80% attendance of lectures and practice sessions. This only applies to part II of the course, so you can skip 1 of the 7 sessions (note that it is possible to attend the course remotely).

2. Oral presentation (= your special topic presentation)

3. Written report (= analysis of a dataset of your own choice, ideally your PhD data or data from another project you are involved in), to be delivered via Inspera.

About the written report

This assignment can be a win-win as you can both obtain 5 ECTS credits and simultaneously make progress with your thesis work (provided that you work with your own data). Importantly, the spirit of the assignment is that you demonstrate that you are able to:

1. Adequately/independently perform data analysis: especially, that you can avoid big mistakes such as wrong (or suboptimal) choice of analysis, wrong statistical inference, concluding something based on figures alone, concluding something based on statistics alone, not adequately addressing the biological question with an appropriate statistical method, violation of assumptions or lack of model diagnosis, big coding mistakes, and so on. Any of these issues could be a reason to fail the assignment. In this case you will have to submit an improved version of the report at a later stage. It is also important that you demonstrate that you can use at least two advanced statistical methods (i.e. methods from at least two different topics covered by part II of the course).

2. Produce the data analysis sections and statistical results of your PhD chapters: This relates to the quality of the report itself, and how adequately and understandably you explain your hypothesis, what data you collected, how you analysed the data, and which results you obtained.

Portfolio, grading rule Pass/Fail.