Title: | Ecological Niche Modeling using Presence-Absence Data |
---|---|
Description: | A set of tools to perform Ecological Niche Modeling with presence-absence data. It includes algorithms for data partitioning, model fitting, calibration, evaluation, selection, and prediction. Other functions help to explore signals of ecological niche using univariate and multivariate analyses, and model features such as variable response curves and variable importance. Unique characteristics of this package are the ability to exclude models with concave quadratic responses, and the option to clamp model predictions to specific variables. These tools are implemented following principles proposed in Cobos et al., (2022) <doi:10.17161/bi.v17i.15985>, Cobos et al., (2019) <doi:10.7717/peerj.6281>, and Peterson et al., (2008) <doi:10.1016/j.ecolmodel.2007.11.008>. |
Authors: | Luis F. Arias-Giraldo [aut, cre] , Marlon E. Cobos [aut] , A. Townsend Peterson [ctb] |
Maintainer: | Luis F. Arias-Giraldo <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.1.9 |
Built: | 2024-11-12 04:25:39 UTC |
Source: | https://github.com/luisagi/enmpa |
An object of the class enmpa_calibration storing the results from GLM calibration.
cal_res
An object of class enmpa_calibration with results from the function calibration_glm.
data("cal_res", package = "enmpa")
str(cal_res)
Creates candidate models based on distinct parameter settings, evaluates models, and selects the ones that perform the best.
calibration_glm(data, dependent, independent, weights = NULL, response_type = "l", formula_mode = "moderate", minvar = 1, maxvar = NULL, user_formulas = NULL, cv_kfolds = 5, partition_index = NULL, seed = 1, n_threshold = 100, selection_criterion = "TSS", exclude_bimodal = FALSE, tolerance = 0.01, out_dir = NULL, parallel = FALSE, n_cores = NULL, verbose = TRUE)
data |
data.frame or matrix of data to be used in model calibration. Columns represent dependent and independent variables. |
dependent |
(character) name of dependent variable. |
independent |
(character) vector of name(s) of independent variable(s). |
weights |
(numeric) a vector with the weights for observations. |
response_type |
(character) a character string that must contain "l", "p", "q" or a combination of them. l = linear, q = quadratic, p = interaction between two variables. Default = "l". |
formula_mode |
(character) a character string to indicate the strategy to
create the formulas for candidate models. Options are: "light", "moderate",
"intensive", or "complex". Default = "moderate". "complex" returns only the
most complex formula defined in |
minvar |
(numeric) minimum number of independent variables in formulas. |
maxvar |
(numeric) maximum number of independent variables in formulas. |
user_formulas |
(character) vector with formula(s) to test. Default = NULL. |
cv_kfolds |
(numeric) number of folds to use for k-fold
cross-validation exercises. Default = 5. Ignored if |
partition_index |
list of indices for cross-validation in k-fold. The
default, NULL, uses the function |
seed |
(numeric) a seed for k-fold partitioning. |
n_threshold |
(numeric) number of threshold values to produce evaluation metrics. Default = 100. |
selection_criterion |
(character) criterion used to select best models, options are "TSS" and "ESS". Default = "TSS". |
exclude_bimodal |
(logical) whether to filter out models with one or more variables presenting concave responses. Default = FALSE. |
tolerance |
(numeric) value to modify the limit value of the metric used to filter models during model selection if none of the models meet initial considerations. Default = 0.01 |
out_dir |
(character) output directory name to save the main calibration results. Default = NULL. |
parallel |
(logical) whether to run in parallel or sequentially. Default = FALSE. |
n_cores |
(numeric) number of cores to use. Default = number of free processors - 1. |
verbose |
(logical) whether to print messages and show progress bar. Default = TRUE |
Model evaluation considers the ability to predict presences and absences, as well as model fit and complexity. Model selection consists of three steps: 1) a first filter keeps the models with ROC AUC >= 0.5 (statistically significant models); 2) a second filter keeps only the models that meet the selection_criterion ("TSS": TSS >= 0.4; or "ESS": maximum Accuracy - tolerance); and 3) from those, the models with delta AIC <= 2 are picked.
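The three selection steps can be illustrated with a minimal base R sketch on a toy table. The data frame and its column names (model, roc_auc, tss, aic) are hypothetical and are not the columns produced by calibration_glm; the sketch also ignores the tolerance adjustment and the "ESS" criterion.

```r
# Toy summary of candidate models (hypothetical values and column names)
stats <- data.frame(
  model   = paste0("m", 1:5),
  roc_auc = c(0.45, 0.72, 0.80, 0.78, 0.66),
  tss     = c(0.10, 0.35, 0.52, 0.48, 0.41),
  aic     = c(510, 480, 450, 451.5, 470),
  stringsAsFactors = FALSE
)

# Step 1: keep models with ROC AUC >= 0.5
step1 <- stats[stats$roc_auc >= 0.5, ]

# Step 2: keep models that meet the "TSS" criterion (TSS >= 0.4)
step2 <- step1[step1$tss >= 0.4, ]

# Step 3: among those, keep models with delta AIC <= 2
step2$delta_aic <- step2$aic - min(step2$aic)
selected <- step2[step2$delta_aic <= 2, ]

selected$model
```

In this toy table only the two models whose AIC lies within 2 units of the lowest AIC among the filtered candidates remain.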
formula_mode options determine the strategy used to iterate over the predictors, with the response types defined in response_type, when creating models:
light: returns simple iterations of complex formulas.
moderate: returns a comprehensive number of iterations.
intensive: returns all possible combinations. Very time-consuming for 6 or more independent variables.
complex: returns only the most complex formula.
An object of the class enmpa_calibration containing: selected models, a summary of statistics for all models, results obtained in cross-validation for all models, original data used, weights, and data-partition indices used.
# Load species occurrences and environmental data.
data("enm_data", package = "enmpa")
head(enm_data)

# Calibration using linear (l), quadratic (q), and product (p) responses.
cal_res <- calibration_glm(data = enm_data, dependent = "Sp",
                           independent = c("bio_1", "bio_12"),
                           response_type = "lpq", formula_mode = "moderate",
                           selection_criterion = "TSS", cv_kfolds = 3,
                           exclude_bimodal = TRUE, verbose = FALSE)
print(cal_res)
summary(cal_res)

head(cal_res$calibration_results)
head(cal_res$summary)
head(cal_res$selected)
head(cal_res$data)
A dataset containing information on presence and absence, and independent variables used to fit GLM models.
enm_data
A data frame with 5627 rows and 3 columns.
Sp: numeric, values of 0 = absence and 1 = presence.
bio_1: numeric, temperature values.
bio_12: numeric, precipitation values.
data("enm_data", package = "enmpa")
head(enm_data)
enmpa contains a set of tools to perform detailed Ecological Niche Modeling using presence-absence data. It includes algorithms for data partitioning, model fitting, calibration, evaluation, selection, and prediction. Other functions help to explore model features such as variable response curves and variable importance.
enmpa
calibration_glm, evaluation_stats, fit_glms, fit_selected, get_formulas, independent_eval1, kfold_partition, model_selection, model_validation, niche_signal, optimize_metrics, predict_glm, predict_selected, response_curve, resp2var, var_importance, jackknife, plot_jk
Maintainer: Luis F. Arias-Giraldo [email protected] (ORCID)
Authors:
Marlon E. Cobos [email protected] (ORCID)
Other contributors:
A. Townsend Peterson [email protected] (ORCID) [contributor]
Constructor of S3 objects of class enmpa_calibration
new_enmpa_calibration(selected, summary, calibration_results, data, partitioned_data, weights = NULL)
selected |
data.frame with information about selected models. |
summary |
data.frame with a summary of statistics for all models. |
calibration_results |
data.frame with results obtained from cross-validation for all models. |
data |
data.frame or matrix with the input data used for calibration. |
partitioned_data |
a list of partition indices. |
weights |
(numeric) a vector with the weights for observations. Default = NULL. |
An S3 object of class enmpa_calibration.
Constructor of S3 objects of class enmpa_fitted_models
new_enmpa_fitted_models(glms_fitted, selected, data, weights = NULL)
glms_fitted |
a list of fitted GLMs. |
selected |
data.frame with information about selected models. |
data |
data.frame or matrix with the input data used for calibration. |
weights |
(numeric) a vector with the weights for observations. Default = NULL. |
An S3 object of class enmpa_fitted_models.
Calculates the mean and standard deviation of evaluation results for all candidate models across the cross-validation folds.
evaluation_stats(evaluation_results, bimodal_toexclude = FALSE)
evaluation_results |
data.frame model evaluation results. These results
are the output of the function |
bimodal_toexclude |
(logical) whether models in which bimodal variable response curves were detected will be excluded during selection processes. |
A data.frame with the mean and standard deviation for all metrics considering cross-validation kfolds.
# data
data("cal_res", package = "enmpa")
all_res <- cal_res$calibration_results[, -1]

# statistics for all evaluation results
evaluation_stats(all_res, bimodal_toexclude = TRUE)
Functions to facilitate fitting multiple GLMs.
fit_selected(glm_calibration)
fit_glms(formulas, data, weights = NULL, id = NULL)
glm_calibration |
a list resulting from |
formulas |
(character) a vector containing the formula(s) for GLM(s). |
data |
data.frame with the dependent and independent variables. |
weights |
(numeric) a vector with the weights for observations. Default = NULL. |
id |
(character) id code for models fitted. Default = NULL. |
A list of fitted GLMs.
For fit_selected, an enmpa_fitted_models object.
# GLM calibration results
data(cal_res, package = "enmpa")

# Fitting selected models
sel_fit <- fit_selected(cal_res)
sel_fit

# Custom formulas
forms <- c("Sp ~ bio_1 + I(bio_1^2) + I(bio_12^2)",
           "Sp ~ bio_12 + I(bio_1^2) + I(bio_12^2)")

# Fitting models
fits <- fit_glms(forms, data = cal_res$data)
fits$ModelID_1
Generate GLM formulas for independent variables predicting a dependent variable, taking into account response types required. All possible combinations of variables can be created using arguments of the function.
get_formulas(dependent, independent, type = "l", mode = "moderate", minvar = 1, maxvar = NULL)
get_formulas_main(dependent, independent, type = "l", complex = FALSE, minvar = 1, maxvar = NULL)
aux_var_comb(var_names, minvar = 2, maxvar = NULL)
aux_string_comb(string)
dependent |
(character) name of dependent variable. |
independent |
(character) vector of name(s) of independent variable(s). |
type |
(character) a character string that must contain "l", "p", "q" or a combination of them. l = linear, q = quadratic, p = interaction between two variables. Default = "l". |
mode |
(character) a character string to indicate the strategy to create the formulas for candidate models. Options are: "light", "moderate", "intensive", or "complex". Default = "moderate". |
minvar |
(numeric) minimum number of independent variables in formulas. |
maxvar |
(numeric) maximum number of independent variables in formulas. |
complex |
(logical) whether to return the most complex formula. |
var_names |
same as |
string |
same as |
mode options determine the strategy used to iterate over the predictors defined in type when creating models:
light: returns simple iterations of complex formulas.
moderate: returns a comprehensive number of iterations.
intensive: returns all possible combinations. Very time-consuming for 6 or more independent variables.
complex: returns only the most complex formula.
A character vector containing the resulting formula(s).
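To give a sense of why exhaustive iteration grows quickly with the number of predictors, the following base R sketch enumerates every non-empty subset of predictors and builds one linear-only formula per subset. It is an illustration under simplifying assumptions (no quadratic or product terms) and is not how get_formulas is implemented internally.

```r
dep <- "sp"
ind <- c("temp", "rain", "slope")

# All non-empty subsets of predictors, from 1 to length(ind) variables
subsets <- unlist(
  lapply(seq_along(ind), function(k) combn(ind, k, simplify = FALSE)),
  recursive = FALSE
)

# One linear-only formula (as a character string) per subset
forms <- vapply(subsets, function(v) {
  paste(dep, "~", paste(v, collapse = " + "))
}, character(1))

length(forms)  # 2^3 - 1 subsets
forms
```

With p predictors this already yields 2^p - 1 formulas before any quadratic or interaction terms are considered, which is consistent with the warning about 6 or more independent variables.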
# example variables
dep <- "sp"
ind <- c("temp", "rain", "slope")

# The most complex formula according to "type"
get_formulas(dep, ind, type = "lqp", mode = "complex")

# mode = 'light', combinations according to type
get_formulas(dep, ind, type = "lqp", mode = "light")

# mode = 'intensive', combinations according to type
get_formulas(dep, ind, type = "lqp", mode = "intensive")
Final evaluation steps for model predictions using an independent dataset (not used in model calibration).
independent_eval1(prediction, threshold, test_prediction = NULL, lon_lat = NULL)
independent_eval01(prediction, observation, lon_lat = NULL)
prediction |
(numeric) vector or |
threshold |
(numeric) the lowest predicted probability value for an occurrence point. This value must be defined for presences-only data. Default = NULL. |
test_prediction |
(numeric) vector of predictions for independent data. Default = NULL. |
lon_lat |
matrix or data.frame of coordinates (longitude and latitude,
in that order) of independent data. Points must be located within the valid
area of |
observation |
(numeric) vector of observed (known) values of presence
or absence to test against |
A data.frame or list containing evaluation results.
# Independent test data based on coordinates (lon/lat WGS 84) from presence
# and absence records
data("test", package = "enmpa")
head(test)

# Loading a model prediction
pred <- terra::rast(system.file("extdata", "proj_out_wmean.tif",
                                package = "enmpa"))
terra::plot(pred)

# Evaluation using presence-absence data
independent_eval01(prediction = pred, observation = test$Sp,
                   lon_lat = test[, 2:3])

# Evaluation using presence-only data
test_p_only <- test[test$Sp == 1, ]
th_maxTSS <- 0.1274123  # threshold based on the maxTSS

independent_eval1(prediction = pred, threshold = th_maxTSS,
                  lon_lat = test_p_only[, 2:3])
The jackknife function provides a detailed view of the impact of each variable on the overall model, considering four different measures: ROC-AUC, TSS, AICc, and deviance.
jackknife(data, dependent, independent, user_formula = NULL, cv = 3, response_type = "l", weights = NULL)
data |
data.frame or matrix of data to be used in model calibration. Columns represent dependent and independent variables. |
dependent |
(character) name of dependent variable. |
independent |
(character) vector of name(s) of independent variable(s). |
user_formula |
(character) custom formula to test. Default = NULL. |
cv |
(numeric) number of folds to use for k-fold cross-validation exercises. Default = 3. |
response_type |
(character) a character string that must contain "l", "p", "q" or a combination of them. l = linear, q = quadratic, p = interaction between two variables. Default = "l". |
weights |
(numeric) a vector with the weights for observations. |
A list including model performance metrics (ROC-AUC, TSS, AICc, and deviance) for the complete model, model performance when excluding a specific predictor, and the independent contribution of that predictor to the model.
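The idea of comparing the complete model against models without, and with only, each predictor can be sketched in base R as follows. This is a simplified illustration on simulated data using AIC alone; the data, the helper fit_aic, and the output column names are assumptions for the demo and do not reproduce the structure returned by jackknife.

```r
set.seed(1)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- rbinom(n, 1, plogis(1.5 * d$x1))  # only x1 drives presence here

vars <- c("x1", "x2")

# Hypothetical helper: AIC of a binomial GLM using the given predictors
fit_aic <- function(v) {
  rhs <- if (length(v) == 0) "1" else paste(v, collapse = " + ")
  AIC(glm(as.formula(paste("y ~", rhs)), data = d, family = binomial()))
}

jk <- data.frame(
  variable = vars,
  full     = fit_aic(vars),                                         # complete model
  without  = vapply(vars, function(v) fit_aic(setdiff(vars, v)), numeric(1)),
  only     = vapply(vars, function(v) fit_aic(v), numeric(1)),
  row.names = NULL
)
jk
```

A predictor that matters shows a clear loss of fit (higher AIC) when it is left out and a comparatively good fit when used alone.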
# Load data
data("enm_data", package = "enmpa")

jk <- jackknife(data = enm_data, dependent = "Sp",
                independent = c("bio_1", "bio_12"), user_formula = NULL,
                cv = 3, response_type = "lpq")
jk

# plot JK's results
plot_jk(jk, metric = "TSS")
plot_jk(jk, metric = "ROC_AUC")
plot_jk(jk, metric = "AIC")
plot_jk(jk, metric = "Residual_deviance")
Creates indices to partition available data into k equal-sized subsets or folds, maintaining the global proportion of presence-absences in each fold.
kfold_partition(data, dependent, k = 2, seed = 1)
data |
data.frame or matrix containing at least two columns. |
dependent |
(character) name of column that contains the presence-absence records (1-0). |
k |
(numeric) the number of groups that the given data is to be split into. |
seed |
(numeric) integer value to specify an initial seed. Default = 1. |
A list of vectors with the indices of rows corresponding to each fold.
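A stratified partition of this kind can be sketched in base R by assigning fold labels separately within presences and absences. This is an illustrative approach under stated assumptions, not necessarily the algorithm used by kfold_partition; the fold names simply mirror the Fold_1 style of the package output.

```r
set.seed(1)
y <- c(rep(0, 80), rep(1, 20))  # 80 absences, 20 presences
k <- 4

# Assign fold labels within each class so that every fold keeps
# roughly the global presence/absence proportion
fold_id <- integer(length(y))
for (cls in c(0, 1)) {
  idx <- which(y == cls)
  fold_id[idx] <- sample(rep_len(seq_len(k), length(idx)))
}

# List of row indices per fold
folds <- split(seq_along(y), fold_id)
names(folds) <- paste0("Fold_", seq_len(k))

# Number of presences per fold
sapply(folds, function(i) sum(y[i]))
```

Because the class sizes here are multiples of k, every fold ends up with the same number of presences and absences.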
# example data
data <- data.frame(species = c(rep(0, 80), rep(1, 20)),
                   variable1 = rnorm(100),
                   variable2 = rpois(100, 2))

# create partition indices
kfolds <- kfold_partition(data, dependent = "species", k = 2)

# data for partition 1
data[kfolds$Fold_1, ]
Applies a series of criteria to select best candidate models.
model_selection(evaluation_stats, criterion = "TSS", exclude_bimodal = FALSE, tolerance = 0.01)
evaluation_stats |
data.frame with the statistics of model evaluation
results. These results are the output of the function
|
criterion |
(character) metric used as the predictive criterion for model selection. |
exclude_bimodal |
(logical) whether to exclude models in which bimodal variable response curves were detected. |
tolerance |
(numeric) |
A data.frame with one or more selected models.
# data
data("cal_res", package = "enmpa")
eval_stats <- cal_res$summary[, -1]

# selecting best model
selected_mod <- model_selection(eval_stats, exclude_bimodal = TRUE)
Model evaluation using entire set of data and a k-fold cross validation approach. Models are assessed based on discrimination power (ROC-AUC), classification ability (accuracy, sensitivity, specificity, TSS, etc.), and the balance between fitting and complexity (AIC).
model_validation(formula, data, family = binomial(link = "logit"), weights = NULL, cv = FALSE, partition_index = NULL, k = NULL, dependent = NULL, n_threshold = 100, keep_coefficients = FALSE, seed = 1)
formula |
(character) |
data |
data.frame with dependent and independent variables. |
family |
a |
weights |
(numeric) vector with weights for observations. Default = NULL. |
cv |
(logical) whether to use a k-fold cross validation for evaluation. Default = FALSE. |
partition_index |
list of indices for cross validation in k-fold.
Obtained with the function |
k |
(numeric) number of folds for a new k-fold index preparation.
Ignored if |
dependent |
(character) name of dependent variable. Ignored if
|
n_threshold |
(numeric) number of threshold values to be used for ROC. Default = 100. |
keep_coefficients |
(logical) whether to keep model coefficients. Default = FALSE. |
seed |
(numeric) a seed number. Default = 1. |
A data.frame with results from evaluation.
# Load species occurrences and environmental data.
data("enm_data", package = "enmpa")
head(enm_data)

# Custom formula
form <- c("Sp ~ bio_1 + I(bio_1^2) + I(bio_12^2)")

# Model evaluation using the entire set of records
model_validation(form, data = enm_data)

# Model evaluation using a k-fold cross-validation (k = 3)
model_validation(form, data = enm_data, cv = TRUE, k = 3, dependent = "Sp")
Identifies whether a signal of niche can be detected using one or multiple variables. This is an implementation of the methods developed by Cobos & Peterson (2022) doi:10.17161/bi.v17i.15985 that focuses on identifying niche signals in presence-absence data.
niche_signal(data, condition, variables, method = "univariate", permanova_method = "mahalanobis", iterations = 1000, set_seed = 1, verbose = TRUE, ...)
niche_signal_univariate(data, condition, variable, iterations = 1000, set_seed = 1, verbose = TRUE)
niche_signal_permanova(data, condition, variables, permutations = 999, permanova_method = "mahalanobis", verbose = TRUE, ...)
data |
matrix or data.frame containing at least the following
information: a column representing |
condition |
(character) name of the column with numeric information about detection (positive = 1 or negative = 0). |
variables |
(character) vector of one or more names of columns to be
used as environmental variables. If |
method |
(character) name of the method to be used for niche comparison. Default = "univariate". |
permanova_method |
(character) name of the dissimilarity index to be
used as |
iterations |
(numeric) number of iterations to be used in analysis.
Default = 1000. If |
set_seed |
(numeric) integer value to specify an initial seed. Default = 1. |
verbose |
(logical) whether or not to print messages about the process. Default = TRUE. |
... |
other arguments to be passed to |
variable |
(character) name of the column containing data to be used as environmental variable. |
permutations |
number of permutations to be performed. |
A list with results from analysis depending on method.
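The general logic of a univariate randomization test can be sketched in base R: the statistic observed for positive records is compared against a null distribution obtained by repeatedly drawing the same number of records at random from all records. The simulated data, variable names, and one-sided comparison below are assumptions for illustration and may differ from what niche_signal_univariate computes.

```r
set.seed(1)

# Simulated sites: positives are more likely at warmer sites (demo assumption)
temp <- rnorm(500, mean = 15, sd = 4)
positive <- rbinom(500, 1, plogis((temp - 18) / 1.5))

n_pos <- sum(positive == 1)
observed <- mean(temp[positive == 1])

# Null distribution: mean of n_pos records sampled at random from all records
null_means <- replicate(1000, mean(sample(temp, n_pos)))

# Proportion of null values at least as large as the observed mean
p_upper <- mean(null_means >= observed)

c(observed = observed, null_mean = mean(null_means), p_upper = p_upper)
```

An observed value far in the tail of the null distribution suggests that positives occupy a non-random portion of the sampled conditions.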
# Load species occurrences and environmental data.
data("enm_data", package = "enmpa")
head(enm_data)

# Detection of niche signal using a univariate non-parametric test
sn_bio1 <- niche_signal(data = enm_data, variables = "bio_1",
                        condition = "Sp", method = "univariate")
sn_bio1

sn_bio12 <- niche_signal(data = enm_data, variables = "bio_12",
                         condition = "Sp", method = "univariate")
sn_bio12
The metrics true skill statistic (TSS), sensitivity, and specificity are explored by comparing actual vs predicted values to find the threshold values that produce sensitivity = specificity, maximum TSS, and a sensitivity value of 0.9.
optimize_metrics(actual, predicted, n_threshold = 100)
actual |
(numeric) vector of actual values (0, 1) to be compared to
|
predicted |
(numeric) vector of predicted probability values to be
thresholded and compared to |
n_threshold |
(numeric) number of threshold values to be used. Default = 100. |
A list containing a data.frame with the resulting metrics for all threshold values tested, and a second data.frame with the results for the threshold values that produce sensitivity = specificity (ESS), maximum TSS (maxTSS), and a sensitivity value of 0.9 (SEN90).
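How the maximum-TSS threshold arises can be shown with a small base R sketch that computes sensitivity, specificity, and TSS = sensitivity + specificity - 1 over a grid of thresholds. The toy vectors and the coarse 11-value grid are assumptions for illustration; this is not the internal code of optimize_metrics.

```r
actual    <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
predicted <- c(0.95, 0.85, 0.65, 0.35, 0.75, 0.45, 0.25, 0.25, 0.15, 0.05)

thresholds <- seq(0, 1, length.out = 11)

# Metrics for each threshold: a prediction >= threshold counts as presence
metrics <- do.call(rbind, lapply(thresholds, function(th) {
  pred_bin <- as.integer(predicted >= th)
  sens <- sum(pred_bin == 1 & actual == 1) / sum(actual == 1)
  spec <- sum(pred_bin == 0 & actual == 0) / sum(actual == 0)
  data.frame(threshold = th, sensitivity = sens, specificity = spec,
             TSS = sens + spec - 1)
}))

# Threshold with the highest TSS
best <- metrics[which.max(metrics$TSS), ]
best
```

The same table could be searched for the threshold where sensitivity and specificity are closest, or for the largest threshold that still keeps sensitivity at 0.9 or above.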
# example data
act <- c(rep(1, 20), rep(0, 80))
pred <- c(runif(20, min = 0.4, max = 0.7), runif(80, min = 0, max = 0.5))

# run example
om <- optimize_metrics(actual = act, predicted = pred)
om$optimized
Visualization of the results obtained with the function var_importance.
plot_importance(x, xlab = NULL, ylab = "Relative contribution", main = "Variable importance", extra_info = TRUE, ...)
x |
data.frame output from |
xlab |
(character) a label for the x axis. |
ylab |
(character) a label for the y axis. |
main |
(character) main title for the plot. |
extra_info |
(logical) when results are from more than one model, it adds information about the number of models using each predictor and the mean contribution found. |
... |
A plot.
# Load species occurrences and environmental data.
data("enm_data", package = "enmpa")

# Custom formulas
forms <- c("Sp ~ bio_1 + I(bio_1^2) + I(bio_12^2)",
           "Sp ~ bio_12 + I(bio_1^2) + I(bio_12^2)")

# Fitting models
fits <- fit_glms(forms, data = enm_data)

# Variable importance for single models
vi_1 <- var_importance(fits$ModelID_1)
plot_importance(x = vi_1)

vi_2 <- var_importance(fits$ModelID_2)
plot_importance(x = vi_2)

# Variable importance for multiple models
vi_c <- var_importance(fits)
plot_importance(x = vi_c)
The jackknife figure shows the impact of each variable on the full model, providing detailed information about the function and significance of each variable. Light blue indicates the impact on the model if the variable is not included, while dark blue indicates the independent contribution of the variable to the model.
plot_jk(x, metric = "ROC_AUC", legend = TRUE, colors = c("cyan", "blue", "red"), xlab = NULL, main = NULL)
x |
list output from |
metric |
(character) model metric to plot. Default = "ROC_AUC". |
legend |
(logical) whether to add legend. Default = TRUE. |
colors |
(character) vector of colors. Default = c("cyan", "blue", "red"). |
xlab |
(character) a label for the x axis. |
main |
(character) main title for the plot. |
Plots to interpret results from niche_signal tests (Cobos & Peterson (2022) doi:10.17161/bi.v17i.15985).
plot_niche_signal(niche_signal_list, statistic = "mean", variables = NULL, ellipses = FALSE, level = 0.99, breaks = "Sturges", main = "", xlab = NULL, ylab = NULL, h_col = "lightgray", h_cex = 0.8, lty = 2, lwd = 1, l_col = c("blue", "black"), e_col = c("black", "red"), pch = 19, pt_cex = c(1.3, 0.8), pt_col = c("black", "red"), ...)
plot_niche_signal_univariate(niche_signal_univariate_list, statistic = "mean", breaks = "Sturges", main = "", xlab = NULL, ylab = "Frequency", h_col = "lightgray", h_cex = 0.8, lty = 2, lwd = 1, l_col = c("blue", "black"), ...)
plot_niche_signal_permanova(niche_signal_permanova_list, variables = NULL, ellipses = FALSE, level = 0.99, main = "", xlab = NULL, ylab = NULL, e_col = c("black", "red"), lty = 2, lwd = 1, pch = 19, pt_cex = c(1.3, 0.8), pt_col = c("black", "red"), ...)
niche_signal_list |
list of results from niche_signal. |
statistic |
(character) name of the statistic for which results will be explored when results come for univariate analysis. Default = "mean". Options are: "mean", "median", "SD", and "range". |
variables |
(character) name of variables to be used in plots when
results come from analysis using the |
ellipses |
(logical) whether to use ellipses to represent all and positive data when results come from PERMANOVA. The default, FALSE, plots points instead. |
level |
(numeric) value from 0 to 1 representing the limit of the ellipse to be plotted. Default = 0.99. |
breaks |
breaks in the histogram as in |
main |
(character) title for plot. Default = "". |
xlab |
(character) x axis label. Default = NULL. For results from PERMANOVA, appropriate variable names are used. |
ylab |
(character) y axis label. Default = NULL. For univariate results, the default becomes "Frequency". For results from PERMANOVA, appropriate variable names are used. |
h_col |
a color to be used to fill the bars of histograms. Default = "lightgray". |
h_cex |
(numeric) value by which plotting text and symbols should be magnified relative to the default in histograms. Default = 0.8. |
lty |
(numeric) line type. See options in |
lwd |
(numeric) line width. See options in |
l_col |
line color for observed value of positives and confidence intervals. Default = c("blue", "black"). |
e_col |
color of ellipse lines for all and positive data. Default = c("black", "red"). |
pch |
point type. See options in |
pt_cex |
(numeric) value by which points will be magnified. Values for all and positive points are recommended. Default = c(1.3, 0.8). |
pt_col |
color for points. Values for all and positive points are recommended. Default = c("black", "red"). |
... |
other plotting arguments to be used. |
niche_signal_univariate_list |
list of results from niche_signal_univariate. |
niche_signal_permanova_list |
list of results from niche_signal_permanova. |
A plot.
# Load species occurrences and environmental data.
data("enm_data", package = "enmpa")
head(enm_data)

# Detection of niche signal using a univariate non-parametric test
sn_bio1 <- niche_signal(data = enm_data, variables = "bio_1", condition = "Sp", method = "univariate")
plot_niche_signal(sn_bio1, variables = "bio_1")

sn_bio12 <- niche_signal(data = enm_data, variables = "bio_12", condition = "Sp", method = "univariate")
plot_niche_signal(sn_bio12, variables = "bio_12")
Obtains predictions from fitted generalized linear model objects. It also offers a clamping option to restrict extrapolation in areas outside the calibration area.
predict_glm( model, newdata, data = NULL, extrapolation_type = "E", restricted_vars = NULL, type = "response" )
model |
a |
newdata |
a data.frame or matrix with the new data to project the predictions. |
data |
data.frame or matrix of data used in the model calibration step. Default = NULL. |
extrapolation_type |
(character) to indicate extrapolation type of model. Models can be transferred with three options: free extrapolation ('E'), extrapolation with clamping ('EC'), and no extrapolation ('NE'). Default = 'E'. |
restricted_vars |
(character) a vector containing the names of the variables that will undergo clamping or no extrapolation. For clamping, values of these variables outside the calibration range are set to the minimum or maximum calibration value. For no extrapolation, values of these variables outside calibration limits become NA. The default, NULL, applies clamping ('EC') or no extrapolation ('NE') to all variables. Ignored if extrapolation_type = 'E'. |
type |
(character) the type of prediction required. For a binomial model, predictions on the scale of the linear predictor are log-odds (probabilities on the logit scale). The default, "response", returns predicted probabilities. |
A SpatRaster object or a vector with predictions.
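The three extrapolation options can be illustrated with base R alone. The sketch below is not the package's implementation: `restrict_vars` is a hypothetical helper, and the calibration data are simulated. It only shows how free extrapolation ('E'), clamping ('EC'), and no extrapolation ('NE') change the values handed to a fitted GLM.

```r
# Hypothetical helper (not part of enmpa): restrict new data to calibration limits.
restrict_vars <- function(newdata, calib, vars = names(newdata),
                          extrapolation_type = c("E", "EC", "NE")) {
  extrapolation_type <- match.arg(extrapolation_type)
  if (extrapolation_type == "E") return(newdata)  # free extrapolation
  for (v in vars) {
    lims <- range(calib[[v]], na.rm = TRUE)
    x <- newdata[[v]]
    if (extrapolation_type == "EC") {
      # Clamping: values outside the limits are set to the nearest limit
      newdata[[v]] <- pmin(pmax(x, lims[1]), lims[2])
    } else {
      # No extrapolation: values outside the limits become NA
      x[x < lims[1] | x > lims[2]] <- NA
      newdata[[v]] <- x
    }
  }
  newdata
}

# Simulated calibration data (assumption, for illustration only)
set.seed(1)
calib <- data.frame(bio_1 = runif(200, 10, 25))
calib$Sp <- rbinom(200, 1, plogis(-6 + 0.35 * calib$bio_1))
fit <- glm(Sp ~ bio_1, data = calib, family = binomial)

# New conditions: below, inside, and above the calibration range
new <- data.frame(bio_1 = c(5, 15, 30))
p_e  <- predict(fit, restrict_vars(new, calib, "bio_1", "E"),  type = "response")
p_ec <- predict(fit, restrict_vars(new, calib, "bio_1", "EC"), type = "response")
p_ne <- predict(fit, restrict_vars(new, calib, "bio_1", "NE"), type = "response")
```

Under clamping, the first and third predictions equal those at the calibration minimum and maximum; under no extrapolation they are NA, because `predict()` passes missing values through by default.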
# Load fitted model
data("sel_fit", package = "enmpa")

# Load raster layers to be projected
env_vars <- terra::rast(system.file("extdata", "vars.tif", package = "enmpa"))

# Prediction
pred <- predict_glm(sel_fit$glms_fitted$ModelID_7, newdata = env_vars, data = sel_fit$data)
terra::plot(pred)
Wrapper function that facilitates prediction with the models selected as the most robust. It also allows the calculation of consensus models when more than one model is selected.
predict_selected(fitted, newdata, extrapolation_type = "E", restricted_vars = NULL, type = "response", consensus = TRUE)
fitted |
an enmpa-class |
newdata |
a |
extrapolation_type |
(character) to indicate extrapolation type of model. Models can be transferred with three options: free extrapolation ('E'), extrapolation with clamping ('EC'), and no extrapolation ('NE'). Default = 'E'. |
restricted_vars |
(character) a vector containing the names of the variables that will undergo clamping or no extrapolation. For clamping, values of these variables outside the calibration range are set to the minimum or maximum calibration value. For no extrapolation, values of these variables outside calibration limits become NA. The default, NULL, applies clamping ('EC') or no extrapolation ('NE') to all variables. Ignored if extrapolation_type = 'E'. |
type |
(character) the type of prediction required. For a binomial model, predictions on the scale of the linear predictor are log-odds (probabilities on the logit scale). The default, "response", returns predicted probabilities. |
consensus |
(logical) valid if |
A list with predictions of selected models on the newdata and fitted selected model(s). Consensus predictions are added if multiple selected models exist and if newdata is a SpatRaster object.
# Load a fitted selected model
data(sel_fit, package = "enmpa")

# Load raster layers to be projected
env_vars <- terra::rast(system.file("extdata", "vars.tif", package = "enmpa"))

# Predictions (only one selected model, no consensus required)
preds <- predict_selected(sel_fit, newdata = env_vars, consensus = FALSE)

# Plot prediction
terra::plot(preds$predictions)
Print a short version of elements in 'calibration' and 'fitted models' objects
## S3 method for class 'enmpa_calibration'
print(x, ...)

## S3 method for class 'enmpa_fitted_models'
print(x, ...)
x |
object of enmpa_fitted_models or enmpa_calibration |
... |
additional arguments affecting the summary produced. Ignored in these functions. |
proc_enm applies partial ROC tests to model predictions.
proc_enm(test_prediction, prediction, threshold = 5, sample_percentage = 50, iterations = 500)
test_prediction |
(numeric) vector of model predictions for testing data. |
prediction |
|
threshold |
(numeric) value from 0 to 100 to represent the percentage of potential error (E) that the data could have due to any source of uncertainty. Default = 5. |
sample_percentage |
(numeric) percentage of testing data to be used in each bootstrapped process for calculating the partial ROC. Default = 50. |
iterations |
(numeric) number of bootstrap iterations to be performed. Default = 500. |
Partial ROC is calculated following Peterson et al. (2008) doi:10.1016/j.ecolmodel.2007.11.008.
A list with the summary of the results and a data.frame containing the AUC values and AUC ratios calculated for all iterations.
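The general idea behind the bootstrapped AUC ratios can be sketched in base R. This is a simplified illustration under stated assumptions, not the computation used by `proc_enm`: `partial_roc_ratio` is a hypothetical helper, the suitability values are simulated, and the null expectation is taken as the line where sensitivity equals the proportion of area predicted.

```r
# Hypothetical helper (not enmpa's proc_enm): simplified partial ROC ratio.
partial_roc_ratio <- function(test_prediction, prediction, threshold = 5) {
  # Candidate cut-offs: every test value plus the overall minimum
  cuts <- sort(unique(c(min(prediction, test_prediction), test_prediction)),
               decreasing = TRUE)
  # x: proportion of the area predicted suitable; y: sensitivity (1 - omission)
  x <- vapply(cuts, function(cv) mean(prediction >= cv), numeric(1))
  y <- vapply(cuts, function(cv) mean(test_prediction >= cv), numeric(1))
  # Keep only the part of the curve where omission is at most E percent
  keep <- y >= 1 - threshold / 100
  x <- x[keep]
  y <- y[keep]
  # Trapezoidal areas under the model curve and under the null line (y = x)
  trapz <- function(a, b) sum(diff(a) * (head(b, -1) + tail(b, -1)) / 2)
  trapz(x, y) / trapz(x, x)
}

set.seed(1)
prediction <- runif(5000)            # simulated suitability across the area
test_prediction <- rbeta(100, 5, 2)  # simulated suitability at test records

# Bootstrap: resample 50% of the test data in each iteration
ratios <- replicate(200, {
  boot <- sample(test_prediction, size = 50, replace = TRUE)
  partial_roc_ratio(boot, prediction, threshold = 5)
})

# Proportion of iterations in which the model is no better than the null
p_value <- mean(ratios <= 1)
```

Ratios above 1 indicate predictions better than the null expectation within the allowed omission error; `p_value` summarizes how often that fails across iterations.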
# Loading a model prediction
pred <- terra::rast(system.file("extdata", "proj_out_wmean.tif", package = "enmpa"))

# Simulated data
test <- runif(100, min = 0.3, max = 0.8)

# partial ROC calculation
pr <- proc_enm(test, pred, threshold = 5, sample_percentage = 50, iterations = 500)
A view of the species probability in a two-dimensional environmental space.
resp2var(model, variable1, variable2, modelID = NULL, data = NULL, n = 1000, new_data = NULL, extrapolate = FALSE, add_bar = TRUE, add_limits = FALSE, color.palette = NULL, xlab = NULL, ylab = NULL, ...)
model |
an object of class |
variable1 |
(character) name of the variable to be plotted in x axis. |
variable2 |
(character) name of the variable to be plotted in y axis. |
modelID |
(character) name of the ModelID if inputed |
data |
data.frame or matrix of data to be used in model calibration. Default = NULL. |
n |
(numeric) an integer guiding the number of breaks. Default = 1000. |
new_data |
a |
extrapolate |
(logical) whether to allow extrapolation to study the
behavior of the response outside the calibration limits. Ignored if
|
add_bar |
(logical) whether to add bar legend. Default = TRUE. |
add_limits |
(logical) whether to add calibration limits if
|
color.palette |
(function) a color palette function to be used to assign colors in the plot. Default = function(n) rev(hcl.colors(n, "terrain")). |
xlab |
(character) a label for the x axis. The default, NULL, uses the
name defined in |
ylab |
(character) a label for the y axis. The default, NULL, uses the
name defined in |
... |
additional arguments passed to
|
The function calculates probabilities by focusing on each combination of the two supplied environmental variables while keeping all other variables constant at their mean values.
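A minimal base R sketch of this calculation follows. It assumes simulated data and a plain `glm`, and it is not the package's code: two variables are varied over a regular grid within their calibration ranges, a third is fixed at its mean, and predicted probabilities are arranged as a matrix ready for plotting.

```r
# Simulated calibration data with three predictors (assumption, for illustration)
set.seed(1)
n_obs <- 300
d <- data.frame(bio_1  = runif(n_obs, 10, 25),
                bio_12 = runif(n_obs, 500, 2000),
                bio_15 = runif(n_obs, 20, 80))
eta <- -4 + 0.3 * d$bio_1 - 0.002 * d$bio_12 + 0.02 * d$bio_15
d$Sp <- rbinom(n_obs, 1, plogis(eta))
fit <- glm(Sp ~ bio_1 + bio_12 + bio_15, data = d, family = binomial)

# Regular grid over the two variables of interest, within calibration limits
n <- 50
x <- seq(min(d$bio_1), max(d$bio_1), length.out = n)
y <- seq(min(d$bio_12), max(d$bio_12), length.out = n)
grid <- expand.grid(bio_1 = x, bio_12 = y)

# All other variables are held constant at their mean
grid$bio_15 <- mean(d$bio_15)

# Predicted probabilities; rows follow x and columns follow y
z <- matrix(predict(fit, newdata = grid, type = "response"), nrow = n, ncol = n)

# Draw the surface only in interactive sessions
if (interactive()) {
  image(x, y, z, col = rev(hcl.colors(50, "terrain")),
        xlab = "bio_1", ylab = "bio_12")
}
```

Because `expand.grid()` varies its first argument fastest, filling the matrix column-wise places `x` along rows, which is the layout `image()` expects.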
A plot with the response interaction of two environmental dimensions for variable1 and variable2. The function does not return a value.
# Load a fitted selected model
data(sel_fit, package = "enmpa")

# Two-Way interaction response plot in the calibration limits
resp2var(sel_fit, variable1 = "bio_1", variable2 = "bio_12", xlab = "BIO-1", ylab = "BIO-12", modelID = "ModelID_7")

# Two-Way interaction response plot allowing extrapolation
resp2var(sel_fit, variable1 = "bio_1", variable2 = "bio_12", xlab = "BIO-1", ylab = "BIO-12", modelID = "ModelID_7", extrapolate = TRUE)
A view of variable responses in models. Responses based on single or multiple models can be provided.
response_curve(fitted, variable, data = NULL, modelID = NULL, n = 100, new_data = NULL, extrapolate = TRUE, xlab = NULL, ylab = "Probability", col = "red", ...)
fitted |
an object of class |
variable |
(character) name of the variable to be plotted. |
data |
data.frame or matrix of data used in the model calibration step. Default = NULL. |
modelID |
(character) vector of ModelID(s) to be considered when the
fitted models is an |
n |
(numeric) an integer guiding the number of breaks. Default = 100 |
new_data |
a |
extrapolate |
(logical) whether to allow extrapolation to study the
behavior of the response outside the calibration limits. Ignored if
|
xlab |
(character) a label for the x axis. The default, NULL, uses the
name defined in |
ylab |
(character) a label for the y axis. Default = "Probability". |
col |
(character) color for lines. Default = "red". |
... |
additional arguments passed to |
The function calculates these probabilities by focusing on a single environmental variable while keeping all other variables constant at their mean values.
When responses for multiple models are to be plotted, the mean and confidence intervals for the set of responses are calculated using a GAM.
A plot with the response curve for a variable.
# Load a fitted selected model
data(sel_fit, package = "enmpa")

# Response curve for a single model
response_curve(sel_fit, variable = "bio_1", modelID = "ModelID_7")

# Response curve when model(s) are in a list (only one model in this one)
response_curve(sel_fit, variable = "bio_12")
An object of the class enmpa_fitted_models containing fitted selected model(s) and the information from model evaluation for such model(s).
sel_fit
A list with four elements.
a fitted glm (ModelID_7).
a data.frame with results from evaluation of ModelID_7
a data.frame containing information on presence and absence, and independent variables used to fit GLM models.
a vector with the weights for observations.
data("sel_fit", package = "enmpa")
Summary of 'calibration' and 'fitted models'
## S3 method for class 'enmpa_calibration'
summary(object, ...)

## S3 method for class 'enmpa_fitted_models'
summary(object, ...)
object |
object of class enmpa_calibration or enmpa_fitted_models |
... |
additional arguments affecting the summary produced. Ignored in these functions. |
A printed summary.
A dataset containing information on presence and absence, and independent variables used to fit GLM models.
test
A data frame with 100 rows and 3 columns.
numeric, values of 0 = absence and 1 = presence.
numeric, longitude values.
numeric, latitude values.
data("test", package = "enmpa")
head(test)
Calculates the relative importance of predictor variables based on the concept of explained deviance. This is achieved by fitting GLMs multiple times, each time leaving out a different predictor variable to observe its impact on the model's performance.
var_importance(fitted, modelID = NULL, data = NULL)
fitted |
an object of class |
modelID |
(character) vector of ModelID(s) to be considered when the
|
data |
data.frame or matrix of data used in the model calibration step. It must be defined in case the model entered does not explicitly include a data component. Default = NULL. |
The process begins by fitting the full GLM model, which includes all predictor variables. Subsequently, separate GLM models are fitted, excluding one variable at a time to assess the influence of its absence on the model's performance. By systematically evaluating the effect of removing each predictor variable, the function provides valuable insights into their individual contributions to the model's overall performance and explanatory power.
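The leave-one-out procedure described above can be sketched with base R. The data are simulated, and scaling the deviance losses so they sum to 1 is an assumption made for illustration; the package may summarize contributions differently.

```r
# Simulated presence-absence data with three predictors (assumption)
set.seed(1)
n_obs <- 300
d <- data.frame(bio_1  = runif(n_obs, 10, 25),
                bio_12 = runif(n_obs, 500, 2000),
                bio_15 = runif(n_obs, 20, 80))
eta <- -4 + 0.3 * d$bio_1 - 0.002 * d$bio_12 + 0.02 * d$bio_15
d$Sp <- rbinom(n_obs, 1, plogis(eta))

vars <- c("bio_1", "bio_12", "bio_15")

# Full model with all predictors
full <- glm(reformulate(vars, response = "Sp"), data = d, family = binomial)

# Increase in residual deviance when each predictor is left out in turn
dev_loss <- vapply(vars, function(v) {
  reduced <- glm(reformulate(setdiff(vars, v), response = "Sp"),
                 data = d, family = binomial)
  deviance(reduced) - deviance(full)
}, numeric(1))

# Relative contribution: share of the total deviance loss (assumed scaling)
importance <- data.frame(predictor = vars,
                         contribution = unname(dev_loss / sum(dev_loss)))
importance <- importance[order(importance$contribution, decreasing = TRUE), ]
```

A predictor whose removal barely changes the residual deviance receives a contribution close to zero, while one that the model depends on heavily takes most of the total.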
A data.frame containing the relative contribution of each variable. An identification for distinct models is added if fitted contains multiple models.
# Load a fitted selected model
data(sel_fit, package = "enmpa")

# Variable importance for single models
var_importance(sel_fit, modelID = "ModelID_7")

# Variable importance for multiple models (only one model in this list)
var_importance(sel_fit)