Training for Scientists

Description of BioSS Training Courses

BioSS runs a series of courses in statistics, mathematical modelling and molecular sequence analysis each year. The objectives are to give scientists confidence in analysing their own data and to enable them to recognise when they should consult a statistician. Statistical ideas are presented in more intuitive and appealing ways than in traditional statistical courses. Courses are held in a computing laboratory, which gives participants plenty of opportunities to gain practical experience in handling data. Course numbers are limited to allow time for interaction between participants and presenters. Most of these courses are held in Edinburgh, Aberdeen, Ayr and Dundee, but other locations can be offered by arrangement.

Basic Statistics courses

These two-day courses introduce the important ideas in statistics and data analysis. It is assumed that participants have no previous knowledge of statistics, or have not used it for a long time.

Emphasis is placed on exploratory methods for examining data prior to analysis, using graphs, tables and summary statistics. The course then progresses to cover the elementary aspects of estimation and testing, including

The course includes several practical sessions that provide an opportunity to discuss the theory and try out the methods. The practical exercises use computer software, which may be Genstat, Minitab, R or MS Excel depending on the particular course. Sometimes several packages are available and participants can use the package of their choice. The package(s) available are indicated on the course timetable. The basics of how to use these packages are also covered; no previous experience is necessary.

These courses also form a good foundation for the more specialised courses, which cover particular topics in statistics.

Getting Started in R

R is a free software environment for statistical computing and graphics, available for Windows, Linux, Unix and Macintosh systems.

This course introduces participants to the use of R on Windows systems. The basics of how to install R, write simple scripts, read and manipulate data, create graphics, do statistical analyses and very basic programming are covered. Practicals form an important part of the course. No previous experience with R is assumed.

Experimental Design and Analysis of Variance

Experimentation is fundamental to much scientific and engineering research. Careful consideration of design is needed to make best use of the experimental resources available. Appropriate analysis reveals the conclusions that can be drawn from the resulting data.

This two-day course covers the important topics in design and analysis, including randomisation, replication, blocking, factorial treatment structures, use of covariates and choice of design. Analysis of variance is used to interpret experimental results.

This course is suitable for scientists who have a good working knowledge of basic statistical concepts but have little or no experience of collecting and analysing data in more complex situations. To benefit from the course delegates should be familiar with the ideas of estimation, including standard errors and confidence intervals, and understand the rationale and output of standard statistical tests such as the t-test.

A Basic Statistics course is available if you feel you need to increase or refresh your knowledge of statistics before embarking on this course. More specialised courses in this area, including Statistical Methods for Repeated Measures Data and Introduction to Mixed Models and REML, are also available. If you are in any doubt about which course would be most appropriate for your needs, please contact Graham Horgan.

Regression and Curve Fitting

Regression is used to investigate and quantify the relationships between variable quantities. This two-day course begins with a careful examination of the simplest case, linear regression, and then progresses to include

The notes also cover the ideas of Generalized Additive Models which interpret trends in data by direct smoothing rather than by fitting parametric curves.

Graphical Methods for Multivariate Data

Multivariate data arise whenever several variables are recorded on each subject of study. Many methods are now available for studying the structure and patterns in such data. This two-day course will cover the most useful of these, including

Statistical Methods for Repeated Measures Data

Repeated measures data arise whenever several measurements of the same variable are made on each subject of study, usually at different times. A range of methods for studying such data are described, and the situations where each may be used are discussed. The topics covered in this 1.5 day course are

Introduction to Mixed Models and REML

Mixed models are used when data have a complex structure with random variation occurring at different levels. REML provides a method to estimate how much variability is due to each level, and the extent to which factors and covariates of interest affect the outcome variables. This course introduces the use of mixed models and the topics covered include:

Practical sessions in which data (including any provided by the participants) are analysed using Genstat are an important part of the course. Participants should be familiar with Basic Statistics including ANOVA and simple linear regression.

Introduction to Mathematical Modelling

This course is suitable for scientists seeking to understand how process-orientated mathematical models are designed, implemented and used to aid scientific insight into dynamic biological processes. The emphasis is on building models based on assumptions derived from current understanding of the key biological processes that make up the system under study. Computational and mathematical techniques are then used to follow through the consequences of these assumptions, and discrepancies between the resultant (emergent) behaviour and observations can then be used to assess the validity of the underlying biological understanding.

Both deterministic and stochastic methods are used to illustrate generic issues in modelling dynamic biological processes including the translation of biological understanding into mathematical descriptions, computer-based simulation, and their mathematical analysis. No previous experience or mathematical knowledge is required. Key broadly applicable methodologies are illustrated via application to simple examples drawn from a wide range of systems studied in mathematical biology including population dynamics, predator-prey interactions and epidemiology.

Although tidy distinctions become blurred in practice, the philosophy of such dynamic modelling contrasts with purely statistical approaches (e.g. regression) which seek to identify patterns in data, for example by correlating genotype with measured phenotypic characteristics, or disease distribution with potential risk factors. Dynamic process-orientated modelling offers an important tool for the biological scientist since it provides a rigorous approach for testing the current understanding of a system, often leading to important scientific insights and highlighting areas for further study. In so far as key processes are modelled correctly, it also offers a more robust approach to extrapolation beyond the range of current observations as compared with purely correlative approaches. (This course does not use Genstat.)

Introduction to Multiple Alignment and Phylogenetic Analysis

This one-day course introduces methods, software and practical skills for aligning molecular sequences and estimating a phylogenetic tree from an alignment, based on protein sequence data or protein-coding DNA. The objective of the analysis could be either to estimate a species phylogeny (using one or more orthologous genes) or to study a multi-gene family (involving speciation events and gene duplication events). The quality of the multiple alignment is crucial for all analyses that are subsequently carried out. Modern phylogenetic methods based on Maximum Likelihood (e.g. RAxML, PhyML, IQ-TREE) will be covered. Advanced topics (e.g. Bayesian phylogenetic methods) will be discussed and software reviewed (e.g. TOPALi, MEGA).

Statistical Design and Analysis of Microarray Experiments

In recent years microarray technology has made it possible to generate gene expression data for thousands of genes simultaneously. Although this technology allows scientists to investigate many new and fascinating processes at the molecular level, it also brings with it the problem of how to analyse such masses of complex data. In this two-day course we will cover some of the most important statistical tools for making sense of microarray data. We will also discuss statistical issues in the design of the experiment that facilitate a meaningful subsequent analysis. The course will deal with two-channel cDNA arrays as well as oligonucleotide arrays (Affymetrix chips). In addition to experimental design, it will cover problems of normalisation, detecting differentially expressed genes and multivariate analysis (e.g. cluster analysis).

Association Mapping using R

This course will introduce the basic concepts of Association Mapping. The course will equip participants with the necessary information and software to conduct an Association Mapping Analysis on their own data, highlighting areas that need to be considered such as accounting for population structure and relationships between individuals. The course is interactive with practical examples using the software R. It assumes participants are:

Linkage analysis and QTL mapping in plants

These courses cover the theory and practice of linkage analysis and QTL mapping in plants. The course is split into three sections:

The course assumes a knowledge of basic statistical concepts, such as simple probability and analysis of variance.

Participants are welcome to bring their own data, but should contact us to discuss file formats beforehand.

Getting started in Bacterial Phylogenomics

This one-day course introduces methods, software and practical skills for analysing about 30-50 loci from bacterial genomes. This constitutes a small-scale phylogenomics project and may also be of interest to people not working with bacteria. A number of housekeeping loci will be selected for analysis to allow an estimate of the core phylogeny, and the other loci will come from the accessory genome. Project management will involve using TOPALi to hold individual alignments and Sequence Matrix to concatenate loci and reformat the multi-locus alignment for analysis in RAxML or IQ-TREE, and to produce gene-content matrices. The DENDROSCOPE Tanglegram option will be used to visually compare each accessory locus to the core phylogeny, and statistical tests of lack of congruence of trees will be carried out using IQ-TREE. Interpretation of results in terms of Horizontal Gene Transfer and gene loss/gene gain will also be covered. The use of phylogenetic networks (SPLITSTREE) to visualize the variability in a set of phylogenetic trees estimated from several loci will also be included.

If you wish to register for a course, please book on-line. For enquiries, you can contact Magda Widera or Graham Horgan.
