Efficient way to find species specific coefficients for a function call

Andrew Robinson shows in icebreakeR how to compute tree volume from diameter and height. He writes a function whose coefficients depend on species and diameter. A simplified version looks like:

funRobinson <- function(species, diameter, height) {
  bf_params <- data.frame(species  = c("Spruce", "Oak"),
                          b0_small = c(26.729,  29.790),
                          b1_small = c( 0.01189, 0.00997),
                          b0_large = c(32.516,  85.150),
                          b1_large = c( 0.01181, 0.00841))
  dimensions <- data.frame(diameter   = diameter,
                           height     = height,
                           species    = as.character(species),
                           this_order = 1:length(species))
  dimensions <- merge(y=dimensions, x=bf_params, all.y=TRUE, all.x=FALSE)
  dimensions <- dimensions[order(dimensions$this_order, decreasing=FALSE),]
  b0 <- with(dimensions, ifelse(diameter <= 20.5, b0_small, b0_large))
  b1 <- with(dimensions, ifelse(diameter <= 20.5, b1_small, b1_large))
  b0 + b1 * dimensions$diameter^2 * dimensions$height
}
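Written out, the function evaluates the following volume equation for a tree of species s with diameter D and height H. The coefficient pair (b0, b1) is chosen per species and switches at D = 20.5:

$$
V(s, D, H) = b_0(s, k) + b_1(s, k)\,D^2 H,
\qquad
k = \begin{cases} \text{small}, & D \le 20.5,\\ \text{large}, & D > 20.5. \end{cases}
$$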

To me this method looks straightforward, but it creates an additional data.frame that has to be merged and re-sorted, and it calls ifelse twice to distinguish between small (diameter <= 20.5) and large trees. I'm looking for a more efficient way (lower memory consumption and execution time) to look up species-specific coefficients. It would also be nice to be able to add coefficients for other species without editing the function.

Example data set and performance:

dat <- data.frame(species = c("Spruce", "Spruce", "Oak", "Oak", "Fir"),
                  diameter = c(4,   30,  4,   30,  30),
                  height  = c(30,  100, 30,  100, 100))
with(dat, funRobinson(species, diameter, height))
#[1]   32.4362 1095.4160   34.5756  842.0500        NA
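As a quick hand check, the four non-missing values can be reproduced directly from the equation with the coefficients copied from funRobinson. Fir has no coefficients in the table, which is why the last value is NA:

```r
# Hand check of the four non-missing results of funRobinson on dat
by_hand <- c(
  26.729 + 0.01189 * 4^2  * 30,   # Spruce, diameter 4  <= 20.5: small-tree coefficients
  32.516 + 0.01181 * 30^2 * 100,  # Spruce, diameter 30 >  20.5: large-tree coefficients
  29.790 + 0.00997 * 4^2  * 30,   # Oak, small-tree coefficients
  85.150 + 0.00841 * 30^2 * 100   # Oak, large-tree coefficients
)
by_hand
#[1]   32.4362 1095.4160   34.5756  842.0500
```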

library(microbenchmark)
microbenchmark(
  Robinson = with(dat, funRobinson(species, diameter, height))
)
#Unit: milliseconds
#     expr      min       lq     mean   median       uq      max neval
# Robinson 1.832604 1.860334 1.948054 1.876155 1.905009 3.054021   100


set.seed(0)
size <- 1e5
dat2 <- data.frame(species = sample(c("Spruce", "Oak", "Fir"), size=size, replace = TRUE)
       , diameter = runif(size, 1, 50)
       , height  = runif(size, 1, 100))

microbenchmark(
  Robinson = with(dat2, funRobinson(species, diameter, height))
)
#Unit: milliseconds
#     expr      min       lq     mean   median       uq      max neval
# Robinson 203.8171 219.9265 234.0798 227.5911 250.6204 278.9918   100

I guess the key is to avoid the data frame and pull the values directly out of a vector (or matrix). The indices are the same for b0 and b1, so they only need to be calculated once.

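Below is a quick attempt; most likely it can be made faster. I make one matrix per parameter (b0 and b1), with species as rows and size class (small/large) as columns, and pick out the element whose row is given by the species and whose column is given by the diameter class: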

f2 <- function(species, diameter, height) {
  species_avail=c("Spruce", "Oak")
  params_b0 = cbind(b0_small = c(26.729,  29.790),
                    b0_large = c(32.516,  85.150))
  rownames(params_b0) = species_avail
  params_b1 = cbind(b1_small = c( 0.01189, 0.00997),
                    b1_large = c( 0.01181, 0.00841))
  rownames(params_b1) = species_avail
  ROWS = match(species,species_avail)
  COLS = +(diameter > 20.5) + 1
  idx = cbind(ROWS,COLS)
  b0 <- params_b0[idx]
  b1 <- params_b1[idx]

  b0 + b1 * diameter^2 * height
}
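The lookup in f2 relies on R's indexing of a matrix by a two-column matrix: each row of the index matrix selects one (row, column) element, and an index row containing NA (here a species without coefficients) yields NA. A minimal illustration with the b0 coefficients only:

```r
# b0 coefficients: rows = species, columns = size class (small, large)
params_b0 <- cbind(b0_small = c(26.729, 29.790),
                   b0_large = c(32.516, 85.150))
rownames(params_b0) <- c("Spruce", "Oak")

species  <- c("Spruce", "Oak", "Fir")
diameter <- c(4, 30, 30)

rows <- match(species, rownames(params_b0))  # 1, 2, NA ("Fir" is not in the table)
cols <- (diameter > 20.5) + 1                # 1 = small, 2 = large
res  <- params_b0[cbind(rows, cols)]         # one element per (row, col) pair
res
#[1] 26.729 85.150     NA
```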

Create data:

set.seed(0)
size <- 1e5
dat2 <- data.frame(species = sample(c("Spruce", "Oak", "Fir"), size=size, replace = TRUE)
                   , diameter = runif(size, 1, 50)
                   , height  = runif(size, 1, 100))

Check that the function returns the same result:

identical(
  with(dat2, funRobinson(species, diameter, height)),
  with(dat2, f2(species, diameter, height))
)
#[1] TRUE

Test:

microbenchmark(
  Robinson = with(dat2, funRobinson(species, diameter, height)),
  f2 = with(dat2, f2(species, diameter, height))
)

#Unit: milliseconds
#     expr        min        lq      mean    median        uq       max neval cld
# Robinson 249.677157 275.23375 303.97532 298.72475 329.04318 391.53807   100   b
#       f2   9.423471  10.16365  13.86918  10.48073  16.06827  65.19541   100  a

This uses the same approach as @StupidWolf, but removes the match call by using the integer codes of the species factor directly, with the coefficients stored in the order of the factor levels. Keeping the coefficients in the enclosing environment of the returned function (a closure) avoids setting them up each time the function is called.

funGKiCl  <- function(params, speciesLevels) {
  force(params)
  force(speciesLevels)
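  nSpecies <- length(speciesLevels)
  # Reorder the coefficient rows to follow the factor levels
  # (NA for levels without coefficients, e.g. "Fir")
  i <- match(speciesLevels, params$species)
  # Stack the coefficients: positions 1..nSpecies hold the small-tree values,
  # positions nSpecies+1..2*nSpecies the large-tree values
  params_b0  <- c(params$b0_small[i], params$b0_large[i])
  params_b1  <- c(params$b1_small[i], params$b1_large[i])
  # Drop objects the returned function does not need from the enclosing environment
  rm(i, params, speciesLevels)
  function(species, diameter, height) {
    # species must be a factor with levels speciesLevels: its integer code picks
    # the small-tree entry, shifted by nSpecies for trees with diameter > 20.5
    i <- unclass(species) + (diameter > 20.5) * nSpecies
    params_b0[i] + params_b1[i] * diameter * diameter * height
  }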
}
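The closure idea itself can be reduced to a few lines. In this sketch, make_volume_fun and vol are hypothetical names, and the species is passed directly as an integer position. The coefficient vectors are prepared once when the factory is called and then live in the environment of the returned function, so each call only indexes them:

```r
# Hypothetical closure factory: coefficients are fixed once, at creation time
make_volume_fun <- function(b0, b1) {
  force(b0)
  force(b1)
  function(i, diameter, height) b0[i] + b1[i] * diameter * diameter * height
}

# Small-tree coefficients for Spruce (position 1) and Oak (position 2)
vol <- make_volume_fun(b0 = c(26.729, 29.790), b1 = c(0.01189, 0.00997))
ls(environment(vol))   # the coefficients are stored with the function
#[1] "b0" "b1"
vol(1, 4, 30)          # Spruce, diameter 4, height 30
#[1] 32.4362
```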

dat <- data.frame(species = c("Spruce", "Spruce", "Oak", "Oak", "Fir")
                , diameter = c(4,   30,  4,   30,  30)
                , height  = c(30,  100, 30,  100, 100)
                , stringsAsFactors = TRUE)  # funGKi needs species as a factor (not the default since R 4.0.0)

params <- read.table(header = TRUE, text = "
species b0_small b1_small b0_large b1_large
Spruce    26.729  0.01189   32.516  0.01181
Oak       29.790  0.00997   85.150  0.00841")
funGKi <- compiler::cmpfun(funGKiCl(params, levels(dat$species)))
with(dat, funGKi(species, diameter, height))
#[1]   32.4362 1095.4160   34.5756  842.0500        NA
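The index arithmetic inside the closure can be traced step by step. A factor's integer codes follow the order of its levels, so stacking the small-tree coefficients for all levels before the large-tree coefficients lets code + (diameter > 20.5) * nSpecies point straight at the right element. This sketch uses only b0; as.integer() is used here just so the printed index carries no attributes, whereas funGKi uses unclass(), which gives the same codes. Note that since R 4.0.0, data.frame() no longer converts strings to factors by default; a character species vector would need factor(species, levels = ...) with the levels the closure was built from.

```r
# Species as a factor: integer codes follow the (alphabetical) level order
species  <- factor(c("Spruce", "Spruce", "Oak", "Oak", "Fir"))
diameter <- c(4, 30, 4, 30, 30)
lev      <- levels(species)         # "Fir" "Oak" "Spruce"
nSpecies <- length(lev)

coefs <- data.frame(species  = c("Spruce", "Oak"),
                    b0_small = c(26.729, 29.790),
                    b0_large = c(32.516, 85.150))

# Reorder the coefficients to follow the levels; "Fir" has none and gets NA
i <- match(lev, coefs$species)      # NA 2 1
# Stacked lookup vector: first nSpecies entries small trees, next nSpecies large trees
params_b0 <- c(coefs$b0_small[i], coefs$b0_large[i])
params_b0
#[1]     NA 29.790 26.729     NA 85.150 32.516

# Factor code, shifted by nSpecies for large trees
idx <- as.integer(species) + (diameter > 20.5) * nSpecies
idx
#[1] 3 6 2 5 4
out <- params_b0[idx]
out
#[1] 26.729 32.516 29.790 85.150     NA
```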

Currently the method from @GKi is the fastest and uses the least memory.

Data:

dat <- data.frame(species = c("Spruce", "Spruce", "Oak", "Oak", "Fir")
                , diameter = c(4,   30,  4,   30,  30)
                , height  = c(30,  100, 30,  100, 100)
                , stringsAsFactors = TRUE)  # funGKi needs species as a factor (not the default since R 4.0.0)

set.seed(0)
size <- 1e5
dat2 <- data.frame(
   species = sample(c("Spruce", "Oak", "Fir"), size=size, replace = TRUE)
 , diameter = runif(size, 1, 50)
 , height  = runif(size, 1, 100)
 , stringsAsFactors = TRUE)  # funGKi needs species as a factor (not the default since R 4.0.0)

Methods:

funRobinson <- function(species, diameter, height) {
  bf_params <- data.frame(species  = c("Spruce", "Oak"),
                          b0_small = c(26.729,  29.790),
                          b1_small = c( 0.01189, 0.00997),
                          b0_large = c(32.516,  85.150),
                          b1_large = c( 0.01181, 0.00841))
  dimensions <- data.frame(diameter   = diameter,
                           height     = height,
                           species    = as.character(species),
                           this_order = 1:length(species))
  dimensions <- merge(y=dimensions, x=bf_params, all.y=TRUE, all.x=FALSE)
  dimensions <- dimensions[order(dimensions$this_order, decreasing=FALSE),]
  b0 <- with(dimensions, ifelse(diameter <= 20.5, b0_small, b0_large))
  b1 <- with(dimensions, ifelse(diameter <= 20.5, b1_small, b1_large))
  b0 + b1 * dimensions$diameter^2 * dimensions$height
}
with(dat, funRobinson(species, diameter, height))

funStupidWolf <- function(species, diameter, height) {
  species_avail=c("Spruce", "Oak")
  params_b0 = cbind(b0_small = c(26.729,  29.790),
                    b0_large = c(32.516,  85.150))
  rownames(params_b0) = species_avail
  params_b1 = cbind(b1_small = c( 0.01189, 0.00997),
                    b1_large = c( 0.01181, 0.00841))
  rownames(params_b1) = species_avail
  ROWS = match(species,species_avail)
  COLS = +(diameter > 20.5) + 1
  idx = cbind(ROWS,COLS)
  b0 <- params_b0[idx]
  b1 <- params_b1[idx]
  b0 + b1 * diameter^2 * height
}
with(dat, funStupidWolf(species, diameter, height))

funGKiCl  <- function(params, speciesLevels) {
  force(params)
  force(speciesLevels)
  nSpecies <- length(speciesLevels)
  i <- match(speciesLevels, params$species)
  params_b0  <- c(params$b0_small[i], params$b0_large[i])
  params_b1  <- c(params$b1_small[i], params$b1_large[i])
  rm(i, params, speciesLevels)
  function(species, diameter, height) {
    i <- unclass(species) + (diameter > 20.5) * nSpecies
    params_b0[i] + params_b1[i] * diameter * diameter * height
  }
}
params <- read.table(header = TRUE, text = "
species b0_small b1_small b0_large b1_large
Spruce    26.729  0.01189   32.516  0.01181
Oak       29.790  0.00997   85.150  0.00841")
funGKi <- compiler::cmpfun(funGKiCl(params, levels(dat$species)))
with(dat, funGKi(species, diameter, height))
rm(funGKiCl, params)

fun <- alist(Robinson = funRobinson(species, diameter, height)
           , StupidWolf = funStupidWolf(species, diameter, height)
           , GKi = funGKi(species, diameter, height))

Time:

library(microbenchmark)

attach(dat)
microbenchmark(list = fun, check = "equal")
#Unit: microseconds
#       expr      min       lq       mean    median        uq      max neval
#   Robinson 1876.491 1911.583 1997.00924 1934.8835 1962.3145 3131.453   100
# StupidWolf   15.618   17.371   22.30764   18.9995   26.5125   33.239   100
#        GKi    2.270    2.965    4.04041    3.6825    5.0415    7.434   100
microbenchmark(list = fun, check = "equal", control=list(order="block"))
#Unit: microseconds
#       expr      min        lq       mean   median        uq      max neval
#   Robinson 1887.906 1918.0475 2000.55586 1938.847 1962.9540 3131.112   100
# StupidWolf   15.184   16.2775   16.97111   16.668   17.2230   34.646   100
#        GKi    2.063    2.1560    2.37552    2.255    2.4015    5.616   100

attach(dat2)
microbenchmark(list = fun, setup = gc(), check = "equal")
#Unit: milliseconds
#       expr        min         lq       mean     median         uq        max neval
#   Robinson 189.342408 193.222831 193.682868 193.573419 194.181910 198.231698   100
# StupidWolf   6.755601   6.786439   6.836253   6.804451   6.832409   7.370937   100
#        GKi   1.756241   1.767335   1.794328   1.782949   1.806370   1.964409   100


library(bench)
attach(dat)
mark(exprs = fun, iterations = 100)
#  expression     min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
#  <bch:expr> <bch:t> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
#1 Robinson    1.87ms   1.9ms      521.        0B     10.6    98     2    188.1ms
#2 StupidWolf 16.48µs 17.46µs    55666.        0B      0     100     0      1.8ms
#3 GKi         2.67µs  2.86µs   337265.        0B      0     100     0    296.5µs

attach(dat2)
mark(exprs = fun, iterations = 100)
#  expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>
#1 Robinson   188.96ms 216.15ms      4.44   67.71MB     11.4   100   257
#2 StupidWolf   6.79ms   6.85ms    131.      8.77MB     30.0   100    23
#3 GKi           1.7ms   1.72ms    552.      2.29MB     22.1   100     4
#Some expressions had a GC in every iteration; so filtering is disabled. 

Memory:

memUse <- function(list, setup = "", gctort = FALSE) {
  as.data.frame(lapply(list, function(z) {
    eval(setup)
    tt <- sum(.Internal(gc(FALSE, TRUE, TRUE))[13:14])
    gctorture(on = gctort)
    eval(z)
    gctorture(on = FALSE)
    sum(.Internal(gc(FALSE, FALSE, TRUE))[13:14]) - tt
  }))
}

attach(dat)
memUse(list=fun, gctort = FALSE)
# Robinson StupidWolf GKi
#1     0.9          0   0
memUse(list=fun, gctort = TRUE)
# Robinson StupidWolf GKi
#1       0          0   0

attach(dat2)
memUse(list=fun, gctort = FALSE)
# Robinson StupidWolf GKi
#1    71.9        8.8 2.3
memUse(list=fun, gctort = TRUE)
# Robinson StupidWolf GKi
#1    29.7        6.5 2.3

object.size(funRobinson)
#109784 bytes

object.size(funStupidWolf)
#68240 bytes

object.size(funGKi)
#21976 bytes
