This Task View contains information about using R to analyse ecological and environmental data.
The base version of R ships with a wide range of functions for use within the field of environmetrics.
This functionality is complemented by a plethora of packages available via CRAN, which provide specialist
methods such as ordination & cluster analysis techniques. A brief overview of the available packages is
provided in this Task View, grouped by topic or type of analysis. As a testament to the popularity of R for the
analysis of environmental and ecological data, a
Journal of Statistical Software
was produced in 2007.
Those useRs interested in environmetrics should consult the
Complementary information is also available in the
If you have any comments or suggestions for additions or improvements, then please contact the
A list of available packages and functions is presented below, grouped by analysis type.
These packages are general, having wide applicability to the environmetrics field.
Modelling species responses and other data
Analysing species response curves or modelling other data often involves fitting standard statistical models
to ecological data, including simple (multiple) regression, Generalised Linear Models (GLM), extended regression
(e.g. Generalised Least Squares [GLS]), Generalised Additive Models (GAM), and mixed effects models, amongst others.
The base installation of R provides
for fitting linear and generalised
linear models, respectively.
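As a minimal sketch, assuming the base functions referred to above are `lm()` and `glm()` from the standard package stats, a linear model and a Poisson GLM can be fitted to simulated data as follows. The variable names and the simulated gradient are hypothetical and serve only as illustration.

```r
# Simulated data: a continuous response and a count response along one gradient.
set.seed(42)
n  <- 100
ph <- runif(n, 4, 8)                              # hypothetical environmental gradient
biomass <- 2 + 1.5 * ph + rnorm(n, sd = 1)        # Gaussian response
counts  <- rpois(n, lambda = exp(-1 + 0.4 * ph))  # count response

# Linear model fitted by ordinary least squares.
fit_lm <- lm(biomass ~ ph)

# Generalised linear model with a Poisson error distribution and log link.
fit_glm <- glm(counts ~ ph, family = poisson(link = "log"))

coef(fit_lm)
coef(fit_glm)
```

Both fitted objects work with the usual generic functions such as `summary()`, `anova()` and `predict()`.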
Generalised least squares and linear and non-linear mixed effects models extend the simple regression model
to account for clustering, heterogeneity and correlations within the sample of observations. Package
provides functions for fitting these models. The package is supported by Pinheiro & Bates (2000)
Mixed-effects Models in S and S-PLUS
, Springer, New York. An updated approach to mixed effects models,
which also fits Generalised Linear Mixed Models (GLMM) and Generalised non-Linear Mixed Models (GNLMM), is provided
package, though this is currently beta software and does not yet allow correlations within
the error structure.
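The following sketch assumes that the package supported by Pinheiro & Bates (2000) is the recommended package nlme, which is distributed with R. It fits a linear mixed effects model and a generalised least squares model with a within-subject AR(1) correlation structure to the `Orthodont` data shipped with that package. The model formulae are chosen purely for illustration.

```r
library(nlme)  # recommended package distributed with R

# Orthodont: repeated measurements of a distance on 27 children.
data(Orthodont, package = "nlme")

# Linear mixed model: random intercept for each subject.
fit_lme <- lme(distance ~ age + Sex, random = ~ 1 | Subject, data = Orthodont)

# Generalised least squares with AR(1) errors within subject.
fit_gls <- gls(distance ~ age + Sex, data = Orthodont,
               correlation = corAR1(form = ~ 1 | Subject))

fixef(fit_lme)
coef(fit_gls)
```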
fits GAMs and Generalised Additive Mixed Models (GAMM) with
automatic smoothness selection via generalised cross-validation. The author of
also written a companion monograph, Wood (2006)
Generalized Additive Models: An Introduction with R
Chapman Hall/CRC, which has an accompanying package
provides an implementation of the S-PLUS function
includes LOESS smooths.
Proportional odds models for ordinal responses can be fitted using
package, of Bill Venables and Brian Ripley.
A negative binomial family for GLMs to model over-dispersion in count data is available in
Models for overdispersed counts and proportions
also contains several functions for dealing with over-dispersed count data. Poisson or
negative binomial distributions are provided for both zero-inflated and hurdle models.
provides a suite of functions to analyse overdispersed counts or proportions, plus utility
functions to calculate e.g. AIC, AICc, Akaike weights.
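The quantities mentioned above can also be computed by hand in base R. The sketch below is not the interface of any particular package. It checks for overdispersion with the Pearson dispersion statistic and then computes AICc and Akaike weights for a small, hypothetical candidate set of Poisson GLMs. When the dispersion statistic is well above 1, a negative binomial model (for example `glm.nb()` in MASS) may be a better basis for model comparison than the Poisson fits used here.

```r
set.seed(1)
n <- 60
x <- runif(n)
z <- runif(n)
# Overdispersed counts drawn from a negative binomial distribution.
y <- rnbinom(n, mu = exp(0.5 + 1.2 * x), size = 1.5)

# Candidate Poisson GLMs (illustrative only).
models <- list(
  null = glm(y ~ 1,     family = poisson),
  x    = glm(y ~ x,     family = poisson),
  xz   = glm(y ~ x + z, family = poisson)
)

# Pearson dispersion statistic: values well above 1 suggest overdispersion.
dispersion <- function(fit) {
  sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
}

# Small-sample corrected AIC: AICc = AIC + 2k(k + 1) / (n - k - 1).
aicc <- function(fit) {
  k <- attr(logLik(fit), "df")
  n_obs <- nobs(fit)
  AIC(fit) + 2 * k * (k + 1) / (n_obs - k - 1)
}

aicc_values <- sapply(models, aicc)
delta <- aicc_values - min(aicc_values)
akaike_weights <- exp(-delta / 2) / sum(exp(-delta / 2))

sapply(models, dispersion)
round(akaike_weights, 3)
```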
Detecting change points and structural changes in parametric models is well catered for in the
package and the
has recently been
the subject of an R News article (
R News, volume 8 issue 1
Tree-based models are being increasingly used in ecology, particularly for their ability to fit flexible models to
complex data sets and the simple, intuitive output of the tree structure. Ensemble methods such as bagging, boosting and
random forests are advocated for improving predictions from tree-based models and to provide information on uncertainty
in regression models or classifiers.
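To make the idea of bagging concrete, the sketch below fits a single classification tree and a small bagged ensemble. It assumes the recommended package rpart, which is distributed with R. The bagging loop is written by hand for illustration and is not the interface of any dedicated ensemble package.

```r
library(rpart)  # recommended package distributed with R

set.seed(123)
data(iris)

# A single classification tree.
tree <- rpart(Species ~ ., data = iris, method = "class")
pred_tree <- predict(tree, iris, type = "class")

# Bagging: fit trees to bootstrap resamples and take a majority vote.
n_trees <- 25
votes <- sapply(seq_len(n_trees), function(b) {
  idx <- sample(nrow(iris), replace = TRUE)
  fit <- rpart(Species ~ ., data = iris[idx, ], method = "class")
  as.character(predict(fit, iris, type = "class"))
})
pred_bag <- apply(votes, 1, function(v) names(which.max(table(v))))

mean(pred_tree == iris$Species)  # training accuracy of the single tree
mean(pred_bag == iris$Species)   # training accuracy of the bagged ensemble
```

Training accuracy is shown only to keep the example short. An honest comparison would use out-of-bag or held-out samples.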
Tree-structured models for regression, classification and survival analysis, following the ideas in the CART book,
are implemented in
provides an implementation of conditional inference trees which embed tree-structured regression
models into a well-defined theory of conditional inference procedures.
Multivariate trees are available in
can also handle multivariate responses.
Ensemble techniques for trees:
The Random Forest method of Breiman and Cutler is implemented in
randomForest, providing classification
and regression based on a forest of trees using random inputs
provides functions for improved predictive models for classification, regression and
Graphical tools for the visualization of trees are available in package
implement Multivariate Adaptive Regression Splines (MARS), a technique
which provides a more flexible, tree-based approach to regression than the piecewise constant functions used in
R and add-on packages provide a wide range of ordination methods, many of which are specialised techniques
particularly suited to the analysis of species data. The two main packages are
derives from the traditions of the French school of
Analyse des Données
and is based on the use of the duality diagram.
the approach of Mark Hill, Cajo ter Braak and others, though the implementation owes more to that presented in
Legendre & Legendre (1998)
Numerical Ecology, 2nd English Edition
, Elsevier. Where the
two packages provide duplicate functionality, the user should choose whichever framework best suits their
Principal Components Analysis (PCA) is available via the
ade4), provide more ecologically-orientated implementations.
Redundancy Analysis (RDA) is available via
Canonical Correspondence Analysis (CCA) is implemented in
Detrended Correspondence Analysis (DCA) is implemented in
Principal coordinates analysis (PCO) is implemented in
Non-Metric multi-Dimensional Scaling (NMDS) is provided by
nmds(), a wrapper function for
is also provided by package
provides helper function
isoMDS(), implementing random starts of the algorithm and standardised scaling of the NMDS results.
The approach adopted by
is the recommended approach for ecological
Coinertia analysis is available via
mcoa(), both in
Co-correspondence analysis to relate two ecological species data matrices is available in
Canonical Correlation Analysis (CCoA - not to be confused with CCA, above) is available in
in standard package stats.
Procrustes rotation is available in
ade4, with both
providing functions to test the significance of
the association between ordination configurations (as assessed by Procrustes rotation) using permutation/randomisation
and Monte Carlo methods.
Constrained Analysis of Principal Coordinates (CAP), implemented in
fits constrained ordination models similar to RDA and CCA but with any dissimilarity coefficient.
Constrained Quadratic Ordination (CQO; formerly known as Canonical Gaussian Ordination (CGO)) is a maximum likelihood
estimation alternative to CCA fit by Quadratic Reduced Rank Vector GLMs. Constrained Additive Ordination (CAO) is a
flexible alternative to CQO which uses Quadratic Reduced Rank Vector GAMs. These methods and more are provided in
Fuzzy set ordination (FSO), an alternative to CCA/RDA and CAP, is available in package
complements a recent paper on fuzzy sets in the journal
by Dave Roberts (2008, Statistical analysis of
multidimensional fuzzy set ordinations.
See also the
task view for complementary information.
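Two of the methods listed above, PCA and PCO, can be sketched with the standard package stats alone, using `prcomp()` and `cmdscale()`. The simulated community matrix and the choice of a Hellinger transformation are assumptions made for this illustration. The ecologically-orientated implementations in contributed packages offer considerably more.

```r
# Simulated community matrix: 20 sites (rows) by 6 species (columns).
set.seed(7)
comm <- matrix(rpois(20 * 6, lambda = 5), nrow = 20,
               dimnames = list(paste0("site", 1:20), paste0("sp", 1:6)))

# PCA on Hellinger-transformed abundances (square root of row proportions).
hell <- sqrt(comm / rowSums(comm))
pca  <- prcomp(hell, center = TRUE, scale. = FALSE)
summary(pca)$importance[, 1:2]

# PCO (classical metric scaling) of Euclidean distances on the same data.
pco <- cmdscale(dist(hell), k = 2, eig = TRUE)

# With Euclidean distances, PCO site scores match PCA scores up to sign.
max(abs(abs(pco$points[, 1]) - abs(pca$x[, 1])))
```

The last line illustrates why PCO is mainly of interest with non-Euclidean dissimilarities: with Euclidean distances it reproduces the PCA configuration.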
Much ecological analysis proceeds from a matrix of dissimilarities between samples. A large amount of effort has
been expended formulating a wide range of dissimilarity coefficients suitable for ecological data. A selection of
the more useful coefficients is available in R and various contributed packages.
Standard functions that produce square, symmetric matrices of pair-wise dissimilarities include:
in standard package stats
in recommended package
a suite of functions in
provides functions for the calculation of similarity and multiple plot similarity
measures with binary data (for instance presence/absence species data)
can be used to calculate dissimilarity between samples
of one matrix and those of a second matrix. The same function can be used to produce pair-wise dissimilarity matrices,
though the other functions listed above are faster.
can also be used to generate
matrices based on Gower's coefficient for mixed data (mixtures of binary, ordinal/nominal and continuous variables).
provides a faster implementation of Gower's coefficient for
mixed-mode data than
if a standard dissimilarity matrix is required. Function
also computes Gower's coefficient and implements extensions to ordinal variables.
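As a hedged illustration of the two situations described above, the sketch below computes a standard dissimilarity matrix with `dist()` from stats and Gower's coefficient for mixed-mode data with `daisy()` from the recommended package cluster. The small data sets are made up for the example.

```r
library(cluster)  # recommended package distributed with R

# Species abundances at four sites (made-up numbers).
abund <- matrix(c(10, 0, 3,
                   8, 1, 0,
                   0, 5, 7,
                   1, 4, 9),
                nrow = 4, byrow = TRUE,
                dimnames = list(paste0("site", 1:4), c("spA", "spB", "spC")))

# Euclidean distances between sites; the result is a "dist" object.
d_euc <- dist(abund, method = "euclidean")

# Full square, symmetric matrix form.
as.matrix(d_euc)

# Mixed-mode data: continuous, nominal and ordered variables.
env <- data.frame(
  depth     = c(1.2, 3.4, 0.8, 2.5),
  substrate = factor(c("sand", "mud", "sand", "rock")),
  cover     = factor(c("low", "high", "mid", "high"),
                     levels = c("low", "mid", "high"), ordered = TRUE)
)

# Gower's coefficient handles the mixture of variable types.
d_gower <- daisy(env, metric = "gower")
as.matrix(d_gower)
```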
Cluster analysis aims to identify groups of samples within multivariate data sets. A large range of
approaches to this problem have been suggested, but the main techniques are hierarchical cluster analysis,
partitioning methods, such as k-means, and finite mixture models or model-based clustering. In the machine
learning literature, cluster analysis is an unsupervised learning problem.
task view provides a more detailed discussion of available cluster analysis methods and
appropriate R functions and packages.
Hierarchical cluster analysis:
in standard package stats
provides functions for cluster analysis following the methods
described in Kaufman and Rousseeuw (1990)
Finding Groups in Data: An Introduction to Cluster Analysis
Wiley, New York.
is a package for assessing the uncertainty in hierarchical cluster analysis. It provides
p-values as well as bootstrap
in stats provides
implements a fuzzy version of the
also provides functions for various partitioning methodologies.
Mixture models and model-based cluster analysis:
provide implementations of model-based cluster analysis.
clusters a species presence-absence matrix object by calculating an
from the distances, and applying maximum likelihood Gaussian
mixtures clustering to the MDS points. The web site of the maintainer, Christian Hennig, contains several publications in
ecological contexts that use
prabclus, especially Hausdorf & Hennig (2007;
Oikos 116 (2007), 818-828).
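A minimal sketch of the hierarchical and partitioning approaches, using only `hclust()`, `cutree()` and `kmeans()` from the standard package stats, is given below. The simulated data contain two well-separated groups, so both methods are expected to recover the same partition. Real ecological data are rarely this clear-cut.

```r
set.seed(99)
# Two well-separated groups of samples in two dimensions.
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2),
           matrix(rnorm(40, mean = 4, sd = 0.5), ncol = 2))

# Hierarchical clustering with average linkage, cut into two groups.
hc <- hclust(dist(x), method = "average")
groups_hc <- cutree(hc, k = 2)

# k-means partitioning with several random starts.
km <- kmeans(x, centers = 2, nstart = 10)

# Cross-tabulate the two solutions; they should agree up to label switching.
table(hierarchical = groups_hc, kmeans = km$cluster)
```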
There is a growing number of packages and books that focus on the use of R for theoretical ecological models.
provides a wide range of functions related to ecological theory, such as diversity indices
Hill's numbers [e.g. Hill's N
] and rarefaction), ranked abundance diagrams,
Fisher's log series, Broken Stick model, Hubbell's abundance model, amongst others.
provides the diversity measures suggested by Jost
2006, Oikos 113(2), 363-375
2007, Ecology 88(10), 2427-2439
provides a collection of utilities for biodiversity data, including the simulation of ecological drift
under Hubbell's Unified Neutral Theory of Biodiversity, and the calculation of various diagnostics such as Preston
is support software for Stevens
A Primer of Ecology with R
). The package provides a variety of functions for modeling ecological data and basic theoretical ecology,
including functions related to demographic matrix models, metapopulation and source-sink models, host-parasitoid and
disease models, multiple basins of attraction, the storage effect, neutral theory, and diversity partitioning.
provides a GUI for biodiversity and community ecology analysis.
implements all of the diversity indices reviewed in
Koleff et al. (2003;
Journal of Animal Ecology 72(3), 367-382).
also provides a
method to produce the co-occurrence frequency triangle plots
of the type found in Koleff et al (2003).
betadisper(), also in
vegan, implements Marti Anderson's distance-based test for
homogeneity of multivariate dispersions (PERMDISP, PERMDISP2), a multivariate analogue of Levene's test (Anderson
). Anderson et al (2006;
Ecology Letters 9(6), 683-693
demonstrate the use of this approach for measuring beta diversity.
package computes several functional diversity indices from multiple traits.
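The diversity indices and Hill's numbers mentioned in this section follow from simple formulae. The sketch below computes them in base R for a single made-up sample and is not the interface of any of the packages listed. With relative abundances p, the Hill numbers of order 0, 1 and 2 are species richness, the exponential of Shannon entropy, and the inverse Simpson index.

```r
# Abundances of five species in one sample (made-up numbers).
counts <- c(50, 30, 10, 8, 2)
p <- counts / sum(counts)        # relative abundances

shannon <- -sum(p * log(p))      # Shannon entropy H'
simpson <- sum(p^2)              # Simpson concentration

# Hill numbers (effective numbers of species) of order 0, 1 and 2.
hill_N0 <- length(counts[counts > 0])  # species richness
hill_N1 <- exp(shannon)                # exponential of Shannon entropy
hill_N2 <- 1 / simpson                 # inverse Simpson index

c(N0 = hill_N0, N1 = hill_N1, N2 = hill_N2)
```

Hill numbers never increase with order, so N0 >= N1 >= N2. The gap between them reflects how uneven the abundances are.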
Estimating animal abundance and related parameters
This section concerns estimation of population parameters (population size, density, survival probability, site occupancy
etc.) by methods that allow for incomplete detection. Many of these methods use data on marked animals, variously called
'capture-recapture', 'mark-recapture' or 'capture-mark-recapture' data.
fits loglinear models to estimate population size and survival rate from capture-recapture data, as described by
Baillargeon and Rivest (2007).
estimates survival by fitting the Cormack-Jolly-Seber open population model, using a flexible formula-based
approach for covariates (see also RMark).
also fits Huggins' closed population model and computes Horvitz-Thompson
estimates of open-population size from CJS models.
estimates population density given spatially explicit capture-recapture data from traps, passive DNA
sampling, automatic cameras, sound recorders etc. Models are fitted by maximum likelihood. The detection function may be
half-normal, exponential, cumulative gamma etc. Density surfaces may be fitted. Covariates of density and detection parameters are
specified via formulae, as in
provides a graphical interface for fitting a spatially explicit capture-recapture model to photographic
'capture' data by the Bayesian method described in Royle et al. (
Ecology 90: 3233-3244
provides analyses of line-transect distance sampling data in which the density surface and the detection function
are estimated simultaneously (
et al. 2009
fits hierarchical models of occurrence and abundance to data collected on species subject to imperfect detection.
Examples include single- and multi-season occupancy models, binomial mixture models, and hierarchical distance sampling models. The data
can arise from survey methods such as temporally replicated counts, removal sampling, double-observer sampling, and distance sampling.
Parameters governing the state and observation processes can be modeled as functions of covariates.
provides a formula-based R interface for the MARK package which fits a wide variety of capture-recapture
models. See the
(pdf) for further details.
provides a framework for handling data and analysis for mark-recapture.
can fit Cormack-Jolly-Seber (CJS) and Jolly-Seber (JS) models via maximum likelihood and the CJS model via MCMC. Maximum likelihood estimates for the CJS model can be obtained using R or via a link to the Automatic Differentiation Model Builder software. A
description of the package
was published in Methods in Ecology and Evolution.
fits detection functions to point and line transect distance sampling survey data (for both single and double observer surveys). Abundance can be estimated using Horvitz-Thompson-type estimators.
is a simpler interface to
for single observer distance sampling surveys.
density surface models
to spatially-referenced distance sampling data. Count data are corrected using detection function models fitted using
Distance. Spatial models are constructed as in
can also be used to simulate data from their respective models.
See also the
task view for analysis of animal tracking data under
Moving objects, trajectories
Modelling population growth rates:
can be used to construct and analyse age- or stage-specific matrix population models.
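The core calculation behind a matrix population model can be sketched in base R with `eigen()`. The projection matrix below uses made-up vital rates for three hypothetical stages, and the code is not the interface of any dedicated package. The dominant eigenvalue gives the asymptotic growth rate and its eigenvector gives the stable stage distribution.

```r
# Stage-structured projection matrix (made-up vital rates):
# row 1 holds fecundities; other entries are survival/transition rates.
A <- matrix(c(0.0, 1.5, 3.0,
              0.5, 0.0, 0.0,
              0.0, 0.7, 0.8),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("juv", "sub", "adult"), c("juv", "sub", "adult")))

ev <- eigen(A)
dominant <- which.max(Re(ev$values))

# Asymptotic growth rate: the dominant eigenvalue.
lambda <- Re(ev$values[dominant])

# Stable stage distribution: the matching right eigenvector, scaled to sum to 1.
w <- Re(ev$vectors[, dominant])
stable_stage <- w / sum(w)

# Check by projecting a population forward for many time steps.
n <- c(100, 0, 0)
for (step in 1:200) n <- A %*% n
projected_stage <- as.vector(n / sum(n))

lambda
round(cbind(stable_stage, projected_stage), 4)
```

A value of `lambda` above 1 indicates a growing population under these rates, and a value below 1 a declining one.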
Environmental time series
Time series objects in R are created using the
function, though see
below for alternatives.
Classical time series functionality is provided by the
standard package stats for autoregressive (AR), moving average (MA), autoregressive moving average (ARMA) and
integrated ARMA (ARIMA) models.
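As a minimal sketch of the classical functionality in the standard package stats, the example below creates a monthly time series object with `ts()`, fits an AR(1) model with `arima()` and produces a short forecast. The series is simulated with `arima.sim()`, and the start date is an arbitrary choice for illustration.

```r
set.seed(2024)
# Simulate 10 years of monthly data from an AR(1) process.
sim <- arima.sim(model = list(ar = 0.6), n = 120)

# Create a time series object with a start date and monthly frequency.
y <- ts(as.numeric(sim), start = c(2000, 1), frequency = 12)

# Fit an AR(1) model, written as ARIMA(1, 0, 0).
fit <- arima(y, order = c(1, 0, 0))
fit

# Forecast the next 12 months.
fc <- predict(fit, n.ahead = 12)
fc$pred
```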
package provides methods and tools for displaying and analysing univariate time series
forecasts including exponential smoothing via state space models and automatic ARIMA modelling
package provide a variety of more advanced estimation methods
and multivariate time series analysis.
provide general handling and analysis of time series data.
Irregular time series can be handled using packages
its, as well as by
provides functions specifically tailored for the analysis of space-time ecological series.
allows for testing, dating and monitoring of structural change in linear regression
Detecting change points in time series data --- see
package implements statistical methods for the modeling of and change-point detection
in time series of counts, proportions and categorical data. Focus is on outbreak detection in count data time series.
provides a convenient interface to fitting time series regressions via ordinary least
provides a different approach to that of
dynlm, which allows time series data to
be used with any regression function written in the style of lm such as
others, whilst preserving the time series information.
provides functions to assist in the processing and exploration of data from monitoring programs
for aquatic ecosystems, with a focus on time series data for physical and chemical properties of water, and for the
provides numerous tools to analyse, interpret and understand air pollution time series data
package is a collection of widely used methods to analyse, visualise, and interpret wind data. Wind resource analyses can subsequently be combined with characteristics of wind turbines to estimate the potential energy production.
Additionally, a fuller description of available packages for time series analysis can be found in the
Spatial data analysis
CRAN Task View for an overview of spatial analysis in R.
provides functions for models for extreme value statistics and is support software for Coles (2001)
An Introduction to Statistical Modelling of Extreme Values
, Springer, New York. Other packages for extreme value
Phylogenetics and evolution
Packages specifically tailored for the analysis of phylogenetic and evolutionary data include:
task view provides more detailed coverage of the subject area and related functions
UseRs may also be interested in Paradis (2006)
Analysis of Phylogenetics and Evolution with R
New York, a book in the new UseR series from Springer.
Several packages are now available that implement R functions for widely-used methods and approaches in pedology.
provides functions for soil texture plots, classification and transformation.
contains a collection of algorithms related to modeling of soil resources, soil classification,
soil profile aggregation, and visualization.
estimates the parameters in infiltration and water retention models by curve-fitting
The Soil Water project on r-forge.r-project.net offers packages providing soil water retention functions,
soil hydraulic conductivity functions and pedotransfer functions to estimate their parameters from easily available soil
properties. Two packages form the project:
Hydrology and Oceanography
A growing number of packages are available that implement methods specifically related to the fields of hydrology and
oceanography. Also see the
for related packages.
estimates the parameters in infiltration and water retention models by curve-fitting
is a package for management, analysis, interpolation and plotting of time series used in hydrology
and related environmental sciences.
is a package implementing both statistical and graphical goodness-of-fit measures between
observed and simulated values, mainly oriented to be used during the calibration, validation, and application of
hydrological/environmental models. Related packages are
tiger, which allows temporally resolved groups of
typical differences (errors) between two time series to be determined and visualized, and
provides quantitative and qualitative criteria to compare models with data and to measure similarity of patterns
is a model-independent global optimization tool for calibration of environmental and other real-world models that need to be executed from the system console.
implements a state-of-the-art
(SPSO-2011 and SPSO-2007 capable), with several fine-tuning options. The package is parallel-capable, to alleviate the computational burden of complex models.
provides a flexible foundation on which scientists, engineers, and policy makers can base
teaching exercises, as well as for more applied use in modelling complex eco-hydrological interactions.
is a set of hydrological functions including an R implementation of the hydrological model
TOPMODEL, which is based on the 1995 FORTRAN version by Keith Beven. New functionality is being developed as part of
package on R-Forge.
is a native R implementation and enhancement of the Dynamic TOPMODEL, Beven and Freer's (2001) extension to the semi-distributed hydrological model TOPMODEL (Beven and Kirkby, 1979).
provides tools for data processing and visualisation of results of the hydrological model WASIM-ETH
provides functions for calculating parameters of the seawater carbonate system.
package contains functions for calculating stream metabolism
characteristics, such as GPP, NDM, and R, from single station diurnal oxygen curves.
supports the analysis of Oceanographic data, including ADP measurements, CTD measurements,
sectional data, sea-level time series, and coastline files.
package provides a collection of statistical tools for objective (non-supervised) applications
of the Regional Frequency Analysis methods in hydrology.
package is a collection of functions implementing the one-dimensional Boussinesq Equation
is a package for geostatistical interpolation of data with irregular spatial support, such as
runoff-related data or data from administrative units.
Several packages are related to the field of climatology.
implements a number of functions for analysis and graphics of seasonal data.
is a set of S3 and S4 functions for spatial multi-site stochastic generation of daily
time series of temperature and precipitation making use of Vector Autoregressive Models.
performs hourly interpolation of daily minimum and maximum temperature series, for example
when hourly time series must be downscaled from the daily information.
Palaeoecology and stratigraphic data
Several packages now provide specialist functionality for the import, analysis, and plotting of
Transfer function models including weighted averaging (WA), modern analogue technique (MAT), Locally-weighted WA, &
maximum likelihood (aka Gaussian logistic) regression (GLR) are provided by some or all of the
Import of common, legacy palaeodata formats is provided by package
(Cornell format) and
(Cornell and Tilia format). In addition,
also allows for import of C2 model files.
Stratigraphic data plots can be drawn using
provides extensive support for developing and interpreting MAT transfer function models,
including ROC curve analysis. Summary of stratigraphic data is supported via principal curves in the
Constrained clustering of stratigraphic data is provided by function
in the form of constrained
hierarchical clustering in
Several other relevant contributed packages for R are available that do not fit under nice headings.
and provides a collection of tools for the analysis of habitat
selection by animals.
provides tools to represent, visualize, filter, analyse, and summarize time-depth recorder (TDR) data
for research on animal diving and movement behaviour.
implements methods for density estimation and nonparametric regression on irregular regions.
A useful alternative to kernel density estimation for e.g. estimating animal densities and home ranges in regions with
irregular boundaries or holes.
package provides some statistical tests and graphics for assessing
tests of equivalence. Such tests have similarity as the alternative hypothesis instead of the null. The package
contains functions to perform two one-sided t-tests (TOST) and paired t-tests of equivalence.
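The logic of the two one-sided tests (TOST) procedure can be sketched with `t.test()` from the standard package stats. This is not the interface of the package described above. The equivalence margin `delta` is a hypothetical value that, in practice, must be justified on subject-matter grounds.

```r
set.seed(11)
# Two samples whose true means differ by a negligible amount.
a <- rnorm(50, mean = 10.0, sd = 1)
b <- rnorm(50, mean = 10.1, sd = 1)

# Equivalence margin: differences within +/- delta are treated as negligible.
delta <- 1  # hypothetical margin, chosen for illustration

# H0a: mean(a) - mean(b) <= -delta, tested against "greater".
p_lower <- t.test(a, b, mu = -delta, alternative = "greater")$p.value
# H0b: mean(a) - mean(b) >= delta, tested against "less".
p_upper <- t.test(a, b, mu = delta, alternative = "less")$p.value

# Equivalence is concluded only if both one-sided tests reject.
p_tost <- max(p_lower, p_upper)
p_tost
```

Taking the larger of the two p-values reflects the fact that dissimilarity is the null hypothesis: both one-sided nulls must be rejected before similarity is claimed.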
package provides an object oriented framework and tools to simulate
ecological (and other) dynamic systems within R. See the
on the package for further information.
Functions for circular statistics are found in
provides functions for latent class analysis, short time Fourier transform, fuzzy clustering,
support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, and more...
provides a suite of miscellaneous functions for data analysis in ecology.
provides functions for handling and reporting on multivariate count data in ecology and
Sensitivity analysis of models is provided by packages
contains a collection of functions for factor screening and global sensitivity analysis of model
is an implementation of the Fourier Amplitude Sensitivity Test (FAST), a method to determine
global sensitivities of a model to parameter changes with relatively few model runs.
Functions to analyze coherence, boundary clumping, and turnover following the pattern-based metacommunity analysis of
Leibold and Mikkelson (2002)
are provided in the
Growth curve estimation via noncrossing and nonparametric regression quantiles is implemented in package
quantregGrowth. A supporting paper is
Muggeo et al.
package provides an R platform for experimenting with spatially explicit individual-based vegetation models. A supporting paper is
García, O. (2014)