Identifying and characterizing extrapolation in multivariate response data

Authors: Meridith L. Bartley aff001; Ephraim M. Hanks aff001; Erin M. Schliep aff002; Patricia A. Soranno aff003; Tyler Wagner aff004
Author affiliations: Department of Statistics, Pennsylvania State University, University Park, Pennsylvania, United States of America aff001; Department of Statistics, University of Missouri, Columbia, Missouri, United States of America aff002; Department of Fisheries and Wildlife, Michigan State University, East Lansing, Michigan, United States of America aff003; U.S. Geological Survey, Pennsylvania Cooperative Fish and Wildlife Research Unit, Pennsylvania State University, University Park, Pennsylvania, United States of America aff004
Published in: PLoS ONE 14(12)
Category: Research Article


Faced with limited data, funding, and time, ecologists are often tasked with making predictions beyond the range of their data. In ecological studies, it is not always obvious when and where extrapolation occurs because of the multivariate nature of the data. Previous work on identifying extrapolation has focused on univariate response data, but these methods are not directly applicable to multivariate response data, which are common in ecological investigations. In this paper, we extend previous work that identified extrapolation using the predictive variance in the univariate setting to the multivariate case. We propose using the trace or determinant of the predictive variance matrix to obtain a scalar-valued measure that, when paired with a selected cutoff value, allows for delineation between prediction and extrapolation. We illustrate our approach through an analysis of jointly modeled lake nutrients and indicators of algal biomass and water clarity in more than 7000 inland lakes from across the Northeast and Midwest US. In addition, we outline novel exploratory approaches, based on classification and regression trees, for identifying regions of covariate space where extrapolation is more likely to occur. Using our Multivariate Predictive Variance (MVPV) measures with multiple cutoff values when exploring the validity of predictions from multivariate statistical models can help guide ecological inferences.
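As a rough illustration of the scalar measures described in the abstract, the following Python sketch reduces each location's predictive variance matrix to its trace or determinant and flags locations whose value exceeds a cutoff. It is a minimal sketch under stated assumptions, not the authors' implementation: the predictive variance matrices are assumed to have been computed already from a fitted multivariate model, the cutoff is assumed to be an empirical quantile of the measure over the observed locations (the abstract says only that a cutoff is "selected"), and the function names and toy matrices are hypothetical.

```python
from typing import Callable, List, Sequence

Matrix = Sequence[Sequence[float]]


def trace(m: Matrix) -> float:
    """Sum of the diagonal: total predictive variance across responses."""
    return sum(m[i][i] for i in range(len(m)))


def determinant(m: Matrix) -> float:
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [[float(x) for x in row] for row in m]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def quantile(values: Sequence[float], q: float) -> float:
    """Empirical quantile with linear interpolation, 0 <= q <= 1."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def flag_extrapolation(
    observed_cov: Sequence[Matrix],
    new_cov: Sequence[Matrix],
    measure: Callable[[Matrix], float] = trace,
    q: float = 1.0,
) -> List[bool]:
    """Flag new locations whose scalar measure exceeds the cutoff.

    Assumption: the cutoff is the q-th quantile of the measure over the
    observed locations (q = 1.0 uses the observed maximum).
    """
    cutoff = quantile([measure(m) for m in observed_cov], q)
    return [measure(m) > cutoff for m in new_cov]


# Hypothetical 2x2 predictive variance matrices (two jointly modeled responses).
observed_cov = [
    [[1.0, 0.2], [0.2, 1.0]],
    [[1.5, 0.1], [0.1, 0.5]],
    [[0.8, 0.0], [0.0, 0.9]],
]
new_cov = [
    [[1.2, 0.3], [0.3, 0.7]],  # comparable to the observed locations
    [[3.0, 0.5], [0.5, 2.0]],  # much larger predictive variance
]

print(flag_extrapolation(observed_cov, new_cov, measure=trace))
print(flag_extrapolation(observed_cov, new_cov, measure=determinant))
```

With these toy inputs both measures flag only the second new location. Lowering `q` (for example to 0.95) gives a stricter cutoff, which is one simple way to compare results across multiple cutoff values.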

Keywords:

Conditioned response – Covariance – Eutrophication – Extrapolation – Lakes – Probability distribution – Water quality



Article published in: 2019, Issue 12