BioSS Courses


Our current portfolio of BioSS courses in statistics, mathematical modelling and molecular sequence analysis is shown below. Most of these courses are held in Edinburgh, Aberdeen and Dundee, but other locations can be offered by arrangement. We are also now offering some of our courses online. The number of course participants is deliberately limited to give more time for interaction between participants and presenters.

For information on course timetables and charges please select from the menu on the right.

If you wish to register for a course, please book online. If your course is not currently scheduled, please register interest; this lets us gauge demand and will increase the chances of it running in future. For other enquiries, please email BioSS Training.

Online Courses (1)

Getting Started in R - Online Course

R is a free software environment for statistical computing and graphics, available for Windows, Linux, Unix and Macintosh systems. This course introduces participants to the use of R: the basics of how to write simple scripts in R, read and manipulate data, create graphics, do statistical analyses and very basic programming are covered. Practicals form an important part of the course. Our online course consists of self-learning supported by videos, practicals and other supplementary material. It is structured in 4 modules with an interactive session with a BioSS tutor after each module.

No prior knowledge of R is required, but you will need to have R and RStudio installed on your computer before starting the course. Note that this is an introductory course about learning to work in R; it is not a course about learning basic statistics. The final module of the course (Module 4) illustrates how to carry out t-tests, regression and ANOVA, but it does not discuss their use and interpretation in detail. For those who do not have existing knowledge of basic statistics, we recommend that you take Modules 1-3 of this course and then take the BioSS "Basic Statistics in R" course.

Courses Held In-person (13)

Getting Started in R

R is a free software environment for statistical computing and graphics, available for Windows, Linux, Unix and Macintosh systems.

This course introduces participants to the use of R on Windows systems. The basics of how to install R, write simple scripts, read and manipulate data, create graphics, do statistical analyses and very basic programming are covered. Practicals form an important part of the course. No previous experience with R is assumed.

Basic Statistics courses

These two-day courses introduce the important ideas in statistics and data analysis. It is assumed that participants have no previous knowledge of statistics, or have not used it for a long time.

Emphasis is placed on exploratory methods for examining data prior to analysis, using graphs, tables and summary statistics. The course then progresses to cover the elementary aspects of estimation and testing, including

  • One- and two-sample t-tests
  • Confidence intervals
  • One- and two-way analysis of variance
  • Simple linear regression

The course includes several practical sessions that provide an opportunity to discuss the theory and try out the methods. The practical exercises make use of computer software that may be Genstat, Minitab, R or MS Excel, depending on the particular course. Sometimes more than one package is available and participants can use the package of their choice. Details of which package(s) will be available are indicated on the course timetable. The basics of how to use these packages are also covered; no previous experience is necessary.

These courses also form a good foundation for the more specialised courses, which cover particular topics in statistics.

Experimental Design and Analysis of Variance

Experimentation is fundamental to much scientific and engineering research. Careful consideration of design is needed to make best use of the experimental resources available. Appropriate analysis reveals the conclusions that can be drawn from the resulting data.

This two-day course covers the important topics in design and analysis, including randomisation, replication, blocking, factorial treatment structures, use of covariates and choice of design. Analysis of variance is used to interpret experimental results.

This course is suitable for scientists who have a good working knowledge of basic statistical concepts but have little or no experience of collecting and analysing data in more complex situations. To benefit from the course delegates should be familiar with the ideas of estimation, including standard errors and confidence intervals, and understand the rationale and output of standard statistical tests such as the t-test.

A Basic Statistics course is available if you feel you need to increase or refresh your knowledge of statistics before embarking on this course. More specialised courses in this area, including Statistical Methods for Repeated Measures Data and Introduction to Mixed Models and REML, are also available. If you are in any doubt about which course would be most appropriate for your needs please contact Graham Horgan.

Regression and Curve Fitting

Regression is used to investigate and quantify the relationships between variable quantities. This two-day course begins with a careful examination of the simplest case, linear regression, and then progresses to include

  • Regression with several explanatory variables
  • Non-linear regression (exponential and growth curves)
  • General non-linear modelling
  • Generalised linear models

The notes also cover the ideas of generalised additive models, which interpret trends in data by direct smoothing rather than by fitting parametric curves.

Introduction to Mixed Models and REML

Mixed models are used when data have a complex structure with random variation occurring at different levels. REML provides a method to estimate how much variability is due to each level, and the extent to which factors and covariates of interest affect the outcome variables. This course introduces the use of mixed models and the topics covered include:

  • When to use mixed models
  • Fixed and random effects
  • Choosing a model of variability
  • Estimating and testing fixed and random effects
  • Mixed models and regression / covariates
  • Modelling dependency among observations
  • Generalised linear mixed models

Practical sessions in which data (including any provided by the participants) are analysed using Genstat are an important part of the course. Participants should be familiar with Basic Statistics including ANOVA and simple linear regression.

Statistical Methods for Repeated Measures Data

Repeated measures data arise whenever several measurements of the same variable are made on each subject of study, usually at different times. A range of methods for studying such data is described, and the situations where each may be used are discussed. The topics covered in this 1.5-day course are

  • Plotting and displaying repeated measures data
  • Analysis of summaries
  • Split plot analysis
  • Multivariate analysis of variance
  • Antedependence modelling
  • Design of repeated measures experiments

Graphical Methods for Multivariate Data

Multivariate data arise whenever several variables are recorded on each subject of study. Many methods are now available for studying the structure and patterns in such data. This two-day course will cover the most useful of these, including

  • Principal components analysis and biplots
  • Canonical variates and discriminant analysis
  • Principal coordinates analysis
  • Multidimensional scaling
  • Classification techniques
  • Clustering

Introduction to Mathematical Modelling

This course is suitable for scientists seeking to understand how process-orientated mathematical models are designed, implemented and used in order to aid scientific insight into dynamic biological processes. The emphasis is on building models based on assumptions derived from current understanding of the key biological processes that make up the system under study. Computational and mathematical techniques are then used to follow through the consequences of these assumptions, and discrepancies between the resultant (emergent) behaviour and observations can then be used to assess the validity of the underlying biological understanding.

Both deterministic and stochastic methods are used to illustrate generic issues in modelling dynamic biological processes including the translation of biological understanding into mathematical descriptions, computer-based simulation, and their mathematical analysis. No previous experience or mathematical knowledge is required. Key broadly applicable methodologies are illustrated via application to simple examples drawn from a wide range of systems studied in mathematical biology including population dynamics, predator-prey interactions and epidemiology.

Although tidy distinctions become blurred in practice, the philosophy of such dynamic modelling contrasts with purely statistical approaches (e.g. regression) which seek to identify patterns in data, for example by correlating genotype with measured phenotypic characteristics, or disease distribution with potential risk factors. Dynamic process-orientated modelling offers an important tool for the biological scientist since it provides a rigorous approach for testing the current understanding of a system, often leading to important scientific insights and highlighting areas for further study. In so far as key processes are modelled correctly, it also offers a more robust approach to extrapolation beyond the range of current observations as compared with purely correlative approaches. (This course does not use Genstat).

Introduction to Multiple Alignment and Phylogenetic Analysis

This one-day course introduces methods, software and practical skills for aligning molecular sequences and estimating a phylogenetic tree from an alignment, based on protein sequence data or protein-coding DNA. The objective of the analysis could be either to estimate a species phylogeny (using one or more orthologous genes) or to study a multi-gene family (involving speciation events and gene duplication events). The quality of the multiple alignment is crucial for all analyses that are subsequently carried out. Modern phylogenetic methods based on Maximum Likelihood (e.g. RAxML, PhyML, IQ-TREE) will be covered. Advanced topics (e.g. Bayesian phylogenetic methods) will be discussed and software reviewed (e.g. TOPALi, MEGA).

Association Mapping using R

This course will introduce the basic concepts of Association Mapping. The course will equip participants with the necessary information and software to conduct an Association Mapping Analysis on their own data, highlighting areas that need to be considered such as accounting for population structure and relationships between individuals. The course is interactive with practical examples using the software R. It assumes participants are:

  • Familiar with the concept of a simple linear model
  • Proficient in R

Linkage analysis and QTL mapping in plants

This course covers the theory and practice of linkage analysis and QTL mapping in plants. The course is split into three sections:

  • Exploratory analysis and linkage analysis in crosses derived from homozygous parents e.g. DH, BC, F2
  • QTL mapping in crosses derived from homozygous parents
  • Extension of methods to linkage analysis and QTL mapping in full sib populations derived from heterozygous parents

The course assumes a knowledge of basic statistical concepts, such as simple probability and analysis of variance.

Participants are welcome to bring their own data, but should contact us to discuss file formats beforehand.

Getting started in Bacterial Phylogenomics

This one-day course introduces methods, software and practical skills for analysing about 30-50 loci from bacterial genomes. This constitutes a small-scale phylogenomics project and may also be of interest to people not working with bacteria. A number of housekeeping loci will be selected for analysis to allow an estimate of the core phylogeny, and the other loci will come from the accessory genome. Project management will involve using TOPALi to hold individual alignments and Sequence Matrix to concatenate loci and reformat the multi-locus alignment for analysis in RAxML or IQ-TREE, and to produce gene-content matrices. The DENDROSCOPE Tanglegram option will be used to visually compare each accessory locus to the core phylogeny, and statistical tests of lack of congruence of trees will be carried out using IQ-TREE. Interpretation of results in terms of Horizontal Gene Transfer and gene loss/gene gain will also be covered. The use of phylogenetic networks (SPLITSTREE) to visualise the variability in a set of phylogenetic trees estimated from several loci will also be included.

Statistical Design and Analysis of Microarray Experiments

In recent years microarray technology has made it possible to generate gene expression data for thousands of genes simultaneously. Although this technology allows scientists to investigate many new and fascinating processes at the molecular level, it also brings with it the problem of how to analyse such masses of complex data. In this two-day course we will cover some of the most important statistical tools for making sense of microarray data. We will also discuss statistical issues in the design of the experiment that facilitate a meaningful subsequent analysis. The course will deal with two-channel cDNA arrays as well as with oligonucleotide arrays (Affymetrix chips). In addition to experimental design, it will cover problems of normalisation, detecting differentially expressed genes and multivariate analysis (e.g. cluster analysis).