UNDER DEVELOPMENT

1. Introduction

Passive acoustic monitoring is an effective means of monitoring marine mammals; however, the value of acoustic detections depends on our ability to identify the source of the sounds we detect. Manual classification by trained acousticians can be used to develop a set of training data for supervised classification algorithms, such as BANTER (Bio-Acoustic eveNT classifiER).

A BANTER acoustic classifier is written in open source software, R, and requires minimal human intervention, providing more consistent results with fewer biases and errors than manual classification. BANTER also produces a classification error rate which is valuable for evaluating predicted labels when there is no independent verification of species identity. BANTER has been developed in a general manner such that it can be applied to sounds from any source (anthropogenic, terrestrial animals, marine animals).

BANTER is a flexible, hierarchical supervised machine learning algorithm for classifying acoustic events consisting of two stages, each consisting of a set of Random Forest classifiers (Rankin et al. 2017). For the purposes of the model, an acoustic event is any related set of acoustic recordings of a single or set of individual animals collected at the same time and place. For example, this could be a single sighting of a school of dolphins that were continuously recorded from the start to the end of the sighting. Alternatively, this recording could also be subset into several events, if for example, there were periods where the dolphins were out of range of the recorder or there was excessive ship noise.

The first stage of the algorithm is to build a set of individual classification models for each call type (e.g. whistle, echolocation, burst pulses) that were extracted from the event. These are referred to as “Detector Models”. In the second stage, the results from the first stage Detector Models (specifically, the mean of the species classification probabilities for each call type) are combined with event-level variables, such as length of the event or location of the event, to create a model classifying each event to species. This model is referred to as the “Event Model”.

BANTER classifier models (both the Detector and Event Models) are based on the Random Forest supervised learning algorithm (randomForest package in R, Liaw and Wiener 2002). A Random Forest model is a suite of a large number of decision trees, each trained on a different subset of the data, with predictions beeing aggregated across all trees (the forest). For each decision tree in the forest, some portion of the samples will be left out during the construction of the decision tree (referred to as the Out-Of-Bag or OOB samples). The model internally evaluates its own performance by running each of the OOB samples through the trees they are OOB for and comparing its predicted group classification to the a-priori designated group. Thus, there is no need for separate cross-validation or another test to get an unbiased estimate of the classification error. Random Forest can handle a large number of input variables which can be discrete or categorical, is not prone to issues related to correlated variables, and does not require that predictors fit any particular parametric distribution. The random subsetting of samples and variables, and use of OOB samples prevents overfitting of the model.

Here we present a user guide for the BANTER acoustic classification algorithm, using the built-in datasets provided in the BANTER package.

2. Methods

At a minimum, BANTER requires data to train a classifier which can then be applied to predict species identity on a novel dataset that has the same predictors. Here we will use data provided within the BANTER R package for both training and testing models.

Once you have the data, you first need to initialize a BANTER model to train. The BANTER model can be developed in stages (first the Detector Model, then the Event Model) or all at once. We suggest running these separately so that each model can be modified to improve performance and ensure stability. Once the models are optimized, we present options for summarizing and interpreting your results.

This guide was developed based on BANTER version 0.9.5.

2.1 Data Requirements and Limitations

BANTER has flexible data requirements which allow it to be applied to a wide array of training data. BANTER consists of two stages: (1) Detector Models and (2) Event Model. At its core, BANTER is an event classifier: it classifies a group of sounds observed at the same time. Multiple call type detectors can be considered; if your species of interest only produces a single call type, we have found that minor changes to the detector settings can lead to differences between species that can be informative (see discussion of multiple whistle and moan detectors in Rankin et al. 2017).

BANTER accepts raw data in a generic R data frame format. There can be one or more detector data frame and only one event data frame.

The event data frame must have one row per event. The columns must be:

  • event.id a unique character or number identifying each event.
  • species a character or number that assigns each event to a given species.
  • All other columns will be used as predictor variables for the event (i.e. event-level variables).
    Note: The species column is only required when training a model. It is not required for predicting on novel data.

Each detector data frame must have one row per call. The columns must be:

  • event.id a unique character or number identifying each event. This is used to connect each call to the appropriate event in the event data frame described above.
  • call.id unique character or number for this individual call in this detector.
  • All other columns will be used as predictor variables for the call (i.e. acoustic measures for individual calls).

If you use the PAMGuard open source software, you can process and export your data formatted for BANTER using the export_banter() function in the PAMpal package. This will create a list with three items: ‘events’, ‘detectors’, and ‘na’ following the data frame structure outlined above. You can also use the ‘dropVars’ argument in this function to remove any variables you do not want to include in your BANTER model.”

BANTER cannot accommodate missing data (NA). Any predictors with missing data will be excluded from the model. As Random Forest uses OOB data for internal validation, there must be a minimum of two events for each species in your model. Any species with fewer than two events will be excluded from the model. If a species is absent from one of your detector models, but occurs in other detector models, that species can still be can be used in the event model.

BANTER is a supervised machine learning classification model, and the strength of the classifications necessarily relies on the quality of the training data. Likewise, if you are applying a classifier you built to predict novel data, it is imperative that the novel data be collected in the same manner, and have the same variables, as the training data. Here we provide tools to help you assess your model, but we recommend that you do more detailed explorations of your data to fully understand its strengths and limitations.

First, install the following R packages

install.packages(c("banter", "rfPermute", "tidyverse"))

Then load the R packages

library(banter)
library(rfPermute)
library(tidyverse)

2.2 Create BANTER Model

We will use the data provided in the BANTER package (train.data). We must first load the training data, and take a look at the first few lines of data.

# load example data
data(train.data) 
# show names of train.data list
names(train.data)
[1] "events"    "detectors"

The train.data object is a list that contains both the event data frame (train.data$events) and a list of data frames for each of three call detectors (train.data$detectors).

2.2.1 Initialize BANTER Model

Once we have our data, the next step is to initialize the BANTER model with the event data:

# initialize BANTER model
bant.mdl <- initBanterModel(train.data$events) 

When the BANTER model has been initialized, it is good to check the summary() to verify the distribution of the number of events for each species:

# summarize BANTER model
summary(bant.mdl)
Number of events and model classification rate:
          species num.events
1      D.capensis          7
2       D.delphis        116
3       G.griseus          5
4 G.macrorhynchus          1
5   L.obliquidens         10
6       O.orcinus          1
7  S.coeruleoalba         13
8         Overall        153

2.2.2 Adding Detectors

The addBanterDetector() function adds Detectors to your model, where the detector information is tagged by Event. If the detector data is a single data frame, then the name of the detector (for example, “bp” is the “bp” detector) needs to be provided (see https://cran.r-project.org/web/packages/banter/banter.pdf). If detector data is a named list of data frames, the name does not need to be provided (can be NULL). The addBanterDetector() function can be called repeatedly to add additional detectors or detectors can be added all at once. If your models require different parameters for different detectors, you may want to model them separately. Here we will load all detectors at once.

# Add BANTER Detectors and Run Detector Models
bant.mdl <- addBanterDetector(
  bant.mdl, 
  data = train.data$detectors, # Identify all detectors in the train.data dataset
  ntree = 100, # Number of trees to run. See section on 'Tune BANTER Model' for more information.
  importance = TRUE, # Retain the importance information for downstream analysis
  sampsize = 2 # Number of samples used for each tree. See section on 'Tune _BANTER_ Model' for more information.
)
Warning: Detector model (dw): sampsize = 2 is >= species frequencies:
  O.orcinus: 2
These species will be used in the model:
  D.capensis: 350
  D.delphis: 5444
  G.griseus: 103
  G.macrorhynchus: 50
  L.obliquidens: 182
  S.coeruleoalba: 486

This initializes and creates the Random Forest detector models for each detector added. The function will generate reports of species excluded from models due to an insufficient number of samples. When complete, a summary of the model shows the mean correct classification rates of each species in each detector:

summary(bant.mdl)
Model run times:
   start                 stop                  run.time   
bp "2022-02-09 17:48:15" "2022-02-09 17:48:15" "0.37 secs"
dw "2022-02-09 17:48:15" "2022-02-09 17:48:15" "0.38 secs"
ec "2022-02-09 17:48:15" "2022-02-09 17:48:16" "0.31 secs"

Number of events and model classification rate:
          species num.events    bp    dw    ec
1      D.capensis          7 12.37 32.57 30.57
2       D.delphis        116 36.16 33.05 19.92
3       G.griseus          5 57.14 87.38 13.60
4 G.macrorhynchus          1 20.00 52.00 16.00
5   L.obliquidens         10 28.99 32.42 39.20
6       O.orcinus          1    NA    NA 52.00
7  S.coeruleoalba         13 50.00 31.48 15.38
8         Overall        153 35.46 33.88 21.27

You can then create and examine the Error Trace Plot to determine the stability of your model. You may want to modify the sampsize and ntree parameters in the model to improve performance and ensure a stable model. See the section on Tune BANTER Model for more information on interpreting these plots and tuning your model.

plotDetectorTrace(bant.mdl)

Once you are satisfied with the Detector Models, you are ready to set up and run the second stage BANTER event model. This model will be based on output from the Detector Models, as well as any event-level variables you may have.

bant.mdl <- runBanterModel(bant.mdl, ntree = 10, sampsize = 1)
Warning: Event model: sampsize = 1 is >= species frequencies:
  G.macrorhynchus: 1
  O.orcinus: 1
These species will be used in the model:
  D.capensis: 7
  D.delphis: 115
  G.griseus: 5
  L.obliquidens: 10
  S.coeruleoalba: 13

In these examples, we have chosen some arbitrary values for the ntree (number of trees in the Random Forest) and sampsize (number of samples randomly selected for each species in each tree) parameters. In the next section, we will explore the effect of changing these parameters for both Detector and Event Models to improve performance and model stability.

2.3 Tune BANTER Model

Here, we will discuss several arguments that can be modified to improve model performance, and several functions that can be used to visualize and evaluate the model. The arguments provided in the Detector and/or Event models include:

  • sampsize = bootstrapped number of samples to use in each tree The sample size (sampsize) is the number of samples randomly selected from each species to build each tree in the ‘forest’ (model). Note that for BANTER models, samples are randomly selected without replacement (replace = FALSE in randomForest) as this allows for unbiased, balanced sampling. Increasing sampsize leads to a forest that is trained on smaller set of unique random combinations of samples and may miss patterns in small subsets of the sample space. Decreasing sampsize increases the variation from tree to tree in the forest, which strengthens some of the built-in protections against overfitting. However, this may come at the expense of model performance which can be addressed by increasing the number of trees in the forest (ntree).

The model will use n = sampsize samples for each species when creating each tree in the model, and the remaining samples will be used as out-of-bag (OOB) for model testing. By default, sampsize is set as half of the smallest sample size of all species. By choosing an equal number of samples for each species, a balanced an unbiased model is ensured. Since it is half of the smallest sample size, this ensures that at least half of each species will be OOB for internal validation. Models will run faster for low sample sizes and large number of trees, rather than vice-versa (there is little computational cost to running a very large number of trees). We have conducted tests which show that we can obtain the same performance with sample sizes as low as 1-2 per species and very large numbers of trees (> 10,000).

  • ntree = number of trees There is a low computational cost to increasing the number of trees, so we recommend increasing the number of trees until the classification results are extremely stable (see the plotDetectorTrace() function). Each tree is based on a random subset of samples, and therefore, the more trees you run in your model, the more you can reduce the variance. Therefore, you want to increase ntree until the classification results are stable. In the Error Trace plot below, you want any model variation (vertical movement in any lines) to occur in the first 1-5% of the the total length of the trace, resulting in a trace that is primarily flat (stable).

  • importance = TRUE Importance in Random Forest is a measure of the predictive power of a variable. This variable will be used in downstream processing, and we recommend setting importance=TRUE to save these values in your BANTER detector model (it is automatically saved in the event model). As a tree is trained, a permutation experiment is conducted that scrambles the predictor values. If this scrambling increases the final error rate, then this variable is a relatively important predictor. However, if this experiment shows that changes to the value of this variable do not impact the overall error rate, then this variable is not as important.

  • num.cores = number of cores to use for Random Forest model num.cores refers to the number of cores used by your computer in processing data. The default is num.cores = 1, but it can be set to a maximum of 1 less than the number of cores available on your computer. If num.cores is set to >1, the importance variables cannot be saved. While there may be value in increasing the num.cores during preliminary processing (to ‘tune’ the model), we recommend reducing num.cores = 1 for the final processing in order to allow for importance = TRUE.

Note that BANTER uses the Random Forest default value for the number of predictors to randomly select in building the trees (mtry). We have found that modifying this parameter does not appreciably affect model accuracy or performance.

It is important that your BANTER model is stable: the results should not change when you rerun the model. We will explain how to tune the model using the case of the poorly performing BANTER Event Model we created above. These same methods can be applied to the Detector Models, to ensure that your stage 1 models are stable (in this small case they are reasonably stable).

Once the Event Model has been run, the summary() function provides two useful plots for an initial evaluation of the model:

summary(bant.mdl)
Model run times:
      start                 stop                  run.time   
bp    "2022-02-09 17:48:15" "2022-02-09 17:48:15" "0.37 secs"
dw    "2022-02-09 17:48:15" "2022-02-09 17:48:15" "0.38 secs"
ec    "2022-02-09 17:48:15" "2022-02-09 17:48:16" "0.31 secs"
event "2022-02-09 17:48:17" "2022-02-09 17:48:17" "0.33 secs"

Number of events and model classification rate:
          species num.events    bp    dw    ec event
1      D.capensis          7 12.37 32.57 30.57 57.14
2       D.delphis        116 36.16 33.05 19.92 77.39
3       G.griseus          5 57.14 87.38 13.60 20.00
4 G.macrorhynchus          1 20.00 52.00 16.00    NA
5   L.obliquidens         10 28.99 32.42 39.20 10.00
6       O.orcinus          1    NA    NA 52.00    NA
7  S.coeruleoalba         13 50.00 31.48 15.38 53.85
8         Overall        153 35.46 33.88 21.27 68.00

<< Summary for "event" model >>

Number of trees: 10 

Sample sizes:
    D.capensis      D.delphis      G.griseus  L.obliquidens S.coeruleoalba 
             1              1              1              1              1 

Distribution of percent correctly classified overall in last 5 (50%) trees:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  54.67   59.33   62.67   62.00   65.33   68.00 

Sample inbag proportion distribution:
         Min. 1st Qu. Median Mean 3rd Qu. Max.
expected  0.9     7.7     10 10.6    14.3   20
observed  0.0     0.0      0  3.3     0.0   50

Confusion matrix:
               D.capensis D.delphis G.griseus L.obliquidens S.coeruleoalba
D.capensis              4         2         0             1              0
D.delphis              15        89         1             1              9
G.griseus               1         0         1             1              2
L.obliquidens           3         1         5             1              0
S.coeruleoalba          6         0         0             0              7
Overall                NA        NA        NA            NA             NA
               pct.correct   LCI_0.95 UCI_0.95
D.capensis        57.14286 18.4051568 90.10117
D.delphis         77.39130 68.6539477 84.66848
G.griseus         20.00000  0.5050763 71.64179
L.obliquidens     10.00000  0.2528579 44.50161
S.coeruleoalba    53.84615 25.1345482 80.77676
Overall           68.00000 59.8994214 75.37305

The top plot is the trace of the error by the number of trees. This gives you an idea of the stability of the model. The plot is created using the plotTrace() function from the rfPermute package and shows the cumulative error (y-axis) across an increasing number of trees (x-axis). Much like with the Detector Models, the goal is to have a stable Error Trace (flat lines) across a majority of the trace. Ideally, most of the variability in the trace is restricted to the first 5% of the trees.

The second plot shows the distribution of the percent of all trees samples were used to train the model or “inbag” (the opposite of “out of bag” or OOB). For example, in the above plot, most samples were never inbag (x-axis = 0), because so few trees were run. The red lines show the expected frequencies. The goal is to run enough trees so that the observed distribution (grey bars) tightly match the expected distribution (red lines). This ensures that all samples are being used and there is a good mixing of samples in creating trees. This plot can also be generated for extracted Random Forest models using the plotInbag() function in the rfPermute package.

Remember that for our BANTER model, we used sampsize = 1 and ntree = 10 (bant.mdl <- run_BANTER_Model(bant.mdl, ntree = 10, sampsize = 1). Clearly these values were insufficient to create a stable model. We will need to increase the sample size and/or the number of trees in our model to improve performance. We suggest first increasing ntree until the trace is flat (or close). If this takes too many trees (run times are unnacceptably long), then sampsize can be incrementally increased until you are satisfied with the performance. Remember that it is best to keep sampsize less than or equal to half of the smallest species frequency.

Here we will rerun our model with an improved set of parameters and examine the difference in the results and summary information.

bant.mdl <- runBanterModel(bant.mdl, ntree = 100000, sampsize = 3)
Warning: Event model: sampsize = 3 is >= species frequencies:
  G.macrorhynchus: 1
  O.orcinus: 1
These species will be used in the model:
  D.capensis: 7
  D.delphis: 115
  G.griseus: 5
  L.obliquidens: 10
  S.coeruleoalba: 13
summary(bant.mdl)
Model run times:
      start                 stop                  run.time   
bp    "2022-02-09 17:48:15" "2022-02-09 17:48:15" "0.37 secs"
dw    "2022-02-09 17:48:15" "2022-02-09 17:48:15" "0.38 secs"
ec    "2022-02-09 17:48:15" "2022-02-09 17:48:16" "0.31 secs"
event "2022-02-09 17:48:18" "2022-02-09 17:48:27" "9.15 secs"

Number of events and model classification rate:
          species num.events    bp    dw    ec event
1      D.capensis          7 12.37 32.57 30.57 85.71
2       D.delphis        116 36.16 33.05 19.92 79.13
3       G.griseus          5 57.14 87.38 13.60 80.00
4 G.macrorhynchus          1 20.00 52.00 16.00    NA
5   L.obliquidens         10 28.99 32.42 39.20 90.00
6       O.orcinus          1    NA    NA 52.00    NA
7  S.coeruleoalba         13 50.00 31.48 15.38 92.31
8         Overall        153 35.46 33.88 21.27 81.33

<< Summary for "event" model >>

Number of trees: 1e+05 

Sample sizes:
    D.capensis      D.delphis      G.griseus  L.obliquidens S.coeruleoalba 
             3              3              3              3              3 

Distribution of percent correctly classified overall in last 50000 (50%) trees:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  81.33   81.33   81.33   81.33   81.33   81.33 

Sample inbag proportion distribution:
         Min. 1st Qu. Median Mean 3rd Qu. Max.
expected  2.6    23.1   30.0 31.7    42.9 60.0
observed  2.5     2.6    2.6 10.0     2.7 60.1

Confusion matrix:
               D.capensis D.delphis G.griseus L.obliquidens S.coeruleoalba
D.capensis              6         0         0             0              1
D.delphis              16        91         1             0              7
G.griseus               0         0         4             1              0
L.obliquidens           0         0         1             9              0
S.coeruleoalba          1         0         0             0             12
Overall                NA        NA        NA            NA             NA
               pct.correct LCI_0.95 UCI_0.95
D.capensis        85.71429 42.12768 99.63897
D.delphis         79.13043 70.55788 86.14719
G.griseus         80.00000 28.35821 99.49492
L.obliquidens     90.00000 55.49839 99.74714
S.coeruleoalba    92.30769 63.97026 99.80544
Overall           81.33333 74.16409 87.22317

Once you are satisfied with your model, you can extract the Random Forest model (and model data) as separate objects for further analysis.

bant.rf <- getBanterModel(bant.mdl)
bantData.df <- getBanterModelData(bant.mdl)

Random Forest models for each detector can also be extracted if it is desired to explore the performance of these models independently. We will give a brief example of this below.

dw.rf <- getBanterModel(bant.mdl, "dw")
bp.rf <- getBanterModel(bant.mdl, "bp")
ec.rf <- getBanterModel(bant.mdl, "ec")

You are now ready to summarize and interpret your models and results.

2.4 Interpret BANTER Results

The summary() function provides information regarding your model results; however, conducting a ‘deep dive’ into these results will give you a better understanding of the strengths and limitations of your results and may guide you towards improving those results. Here we demonstrate a number of options for interpreting your BANTER results.

2.4.1 Model Information

Detector Names & Sample Sizes
Show the Detector Names and Sample Sizes

# Get detector names for your _BANTER_ Model
getDetectorNames(bant.mdl)
[1] "bp" "dw" "ec"
# Get Sample sizes
getSampSize(bant.mdl)
    D.capensis      D.delphis      G.griseus  L.obliquidens S.coeruleoalba 
             3              3              3              3              3 

Number of Calls & Events, Proportion of Calls
Number of calls (numCalls()), proportion of calls (propCalls()) and number of events (numEvents()) in your BANTER detector models (or specify by event/species)

# number of calls in detector model
numCalls(bant.mdl)
          species num.bp num.dw num.ec
1      D.capensis    283    350    350
2       D.delphis   4837   5444   5732
3       G.griseus    182    103    250
4 G.macrorhynchus     50     50     50
5   L.obliquidens    307    182    500
6       O.orcinus      0      0     50
7  S.coeruleoalba    134    486    650
# number of calls by species (can also do by event)
numCalls(bant.mdl, "species")
          species num.bp num.dw num.ec
1      D.capensis    283    350    350
2       D.delphis   4837   5444   5732
3       G.griseus    182    103    250
4 G.macrorhynchus     50     50     50
5   L.obliquidens    307    182    500
6       O.orcinus      0      0     50
7  S.coeruleoalba    134    486    650
# proportion of calls in detector model
propCalls(bant.mdl)
          species   prop.bp   prop.dw   prop.ec
1      D.capensis 0.2878942 0.3560529 0.3560529
2       D.delphis 0.3020671 0.3399738 0.3579592
3       G.griseus 0.3401869 0.1925234 0.4672897
4 G.macrorhynchus 0.3333333 0.3333333 0.3333333
5   L.obliquidens 0.3104146 0.1840243 0.5055612
6       O.orcinus 0.0000000 0.0000000 1.0000000
7  S.coeruleoalba 0.1055118 0.3826772 0.5118110
# proportion of calls by event (can also do by species)
#propCalls(bant.mdl, "event")
#[this is commented out as printout is long]

# number of events, with default for Event Model
numEvents(bant.mdl)
          species num.events
1      D.capensis          7
2       D.delphis        116
3       G.griseus          5
4 G.macrorhynchus          1
5   L.obliquidens         10
6       O.orcinus          1
7  S.coeruleoalba         13
# number of events for a specific detector 
numEvents(bant.mdl, "bp")
          species num.events
1      D.capensis          7
2       D.delphis        113
3       G.griseus          5
4 G.macrorhynchus          1
5   L.obliquidens         10
6       O.orcinus          0
7  S.coeruleoalba         11

2.4.2 Random Forest Summaries

The following functions are available in the rfPermute package and take a randomForest or rfPermute model object. Recall from above that the actual randomForest object can be extracted from the BANTER model with the getBanterModel() function.

Confusion Matrix
The Confusion Matrix is the most commonly used output for a Random Forest model, and is provided by summary(). The output includes the percent correctly classified for each species, the lower and upper confidence levels, and the priors (expected classification rate).

By default, summary() reports the 95% confidence levels of the percent correctly classified. By using the confusionMatrix() function, we can specify a different confidence level if desired. However, unlike summary(), confusionMatrix() takes a randomForest object like the one we extracted above.

# Confusion Matrix
confusionMatrix(bant.rf, conf.level = 0.75)
               D.capensis D.delphis G.griseus L.obliquidens S.coeruleoalba
D.capensis              6         0         0             0              1
D.delphis              16        91         1             0              7
G.griseus               0         0         4             1              0
L.obliquidens           0         0         1             9              0
S.coeruleoalba          1         0         0             0             12
Overall                NA        NA        NA            NA             NA
               pct.correct LCI_0.75 UCI_0.75
D.capensis        85.71429 57.20285 98.11049
D.delphis         79.13043 74.01716 83.56583
G.griseus         80.00000 44.36987 97.36472
L.obliquidens     90.00000 68.32814 98.67356
S.coeruleoalba    92.30769 74.89589 98.97809
Overall           81.33333 77.07386 85.03762

The confusionMatrix() function also has a threshold argument that provides the binomial probability that the true classification probability (given infinite data) is greater than or equal to this value. For example, if we want to know what the probability is that the true classification probability for each species is >= 0.80, we set threshold = 0.8:

# Confusion Matrix with medium threshold
confusionMatrix(bant.rf, threshold = 0.8)
               D.capensis D.delphis G.griseus L.obliquidens S.coeruleoalba
D.capensis              6         0         0             0              1
D.delphis              16        91         1             0              7
G.griseus               0         0         4             1              0
L.obliquidens           0         0         1             9              0
S.coeruleoalba          1         0         0             0             12
Overall                NA        NA        NA            NA             NA
               pct.correct LCI_0.95 UCI_0.95    Pr.gt_0.8
D.capensis        85.71429 42.12768 99.63897 8.445244e-92
D.delphis         79.13043 70.55788 86.14719 4.253745e-08
G.griseus         80.00000 28.35821 99.49492 7.453220e-96
L.obliquidens     90.00000 55.49839 99.74714 3.153290e-86
S.coeruleoalba    92.30769 63.97026 99.80544 4.219544e-81
Overall           81.33333 74.16409 87.22317 6.896342e-01

This shows that D. capensis has a high probability of having a true classification score above 0.8 (Pr.gt_0.8 = 79.0). Conversely, the probability that the classification rate for D.delphis is above 0.8 is very low (Pr.gt_0.8 = 6.8).

And alternative view of the confusion matrix comes in the form of a heat map.

# Plot Confusion Matrix Heatmap
plotConfMat(bant.rf, title="Confusion Matrix HeatMap") 

We can also examine confusion matrices for individual detectors, such as the whistle detector (“dw”):

dw.rf <- getBanterModel(bant.mdl, "dw")
confusionMatrix(dw.rf)
                D.capensis D.delphis G.griseus G.macrorhynchus L.obliquidens
D.capensis             114        77        28              24            51
D.delphis             1067      1799      1251             192           560
G.griseus                4         1        90               2             4
G.macrorhynchus         10         3         2              26             3
L.obliquidens           22         3        62              30            59
S.coeruleoalba         151        78        10              59            35
Overall                 NA        NA        NA              NA            NA
                S.coeruleoalba pct.correct LCI_0.95 UCI_0.95
D.capensis                  56    32.57143 27.68472 37.75609
D.delphis                  575    33.04555 31.79624 34.31307
G.griseus                    2    87.37864 79.38447 93.10523
G.macrorhynchus              6    52.00000 37.41519 66.33949
L.obliquidens                6    32.41758 25.68056 39.73837
S.coeruleoalba             153    31.48148 27.37299 35.81705
Overall                     NA    33.87755 32.73676 35.03257
plotConfMat(dw.rf) 

Model Percent Correct
This function operates on a BANTER model object and provides a summary data frame with the percent of each species correctly classified for each detector model and the event model. It is a summary of the diagonal values from the confusion matrices for all models.

modelPctCorrect(bant.mdl)
# A tibble: 8 × 5
  species            bp    dw    ec event
  <fct>           <dbl> <dbl> <dbl> <dbl>
1 D.capensis       12.4  32.6  30.6  85.7
2 D.delphis        36.2  33.0  19.9  79.1
3 G.griseus        57.1  87.4  13.6  80  
4 G.macrorhynchus  20    52    16    NA  
5 L.obliquidens    29.0  32.4  39.2  90  
6 O.orcinus        NA    NA    52    NA  
7 S.coeruleoalba   50    31.5  15.4  92.3
8 Overall          35.5  33.9  21.3  81.3

Plot Votes
The strength of a classification model depends on the number of trees that ‘voted’ for the correct species. We can look at the votes from each of these 5,000 trees for an event to see how many of them were correct. This plot shows these votes where each vertical slice is an event, and the percentage of votes for each species is represented by their color. If all events for a species were to be correctly classified by all of the trees (votes) in the forest, then the plot for that species would be solid in the color that represents that species.

# Plot Vote distribution
plotVotes(bant.rf) 

Percent Correct
Another way to visualize this distribution is to evaluate the percent of events correctly classified for a given threshold (specified percent of trees in the forest voting for that species).

# Percent Correct for a series of thresholds
pctCorrect(bant.rf, pct = c(seq(0.2, 0.6, 0.2), 0.95))
           class pct.correct_0.2 pct.correct_0.4 pct.correct_0.6
1     D.capensis        85.71429        85.71429        42.85714
2      D.delphis        79.13043        74.78261        33.04348
3      G.griseus        80.00000        60.00000        20.00000
4  L.obliquidens        90.00000        70.00000        40.00000
5 S.coeruleoalba        92.30769        84.61538        46.15385
6        Overall        81.33333        75.33333        34.66667
  pct.correct_0.95
1                0
2                0
3                0
4                0
5                0
6                0

These values will always decrease as the percent of trees threshold increases. That is because as stringency is decreased (lower thresholds), more samples are likely to be correctly classified. These values give an indication of the fraction of events that can be classified with high certainty. As we can see in this example data, the distribution goes to zero for all species at 95%. That is there are no events in any species that are correctly classified with 95% certainty.

Plot Predicted Probabilities
The full distribution of assignment probabilities to the predicted species class can be visualized with the plotPredictedProbs() function in rfPermute. Ideally, all events would be classified to the correct species (identified by the color), and would be strongly classified to the correct species (higher probablity of assignment). This plot can be used to understand the distribution of these classifications, and how strong the misclassifications were, by species.

plotPredictedProbs(bant.rf, bins = 30, plot = TRUE)

Proximity Plot
The proximity plot provides a visualization of the distribution of events within the tree space. It shows the relative distance of events based on their average distance in nodes in the trees across the forest. For each event in the plot, the color of the central dot represents the true species identity, and the color of the circle represents the BANTER classification. Ideally, these would form rather distinct clusters, one for each species. The wider the spread of the events in this feature space, the more variation found in these predictors. Some species differentiation may be predicted by other predictors and may not be clear based on this pair of dimensions (those may be differentiated with different predictors).

# Proximity Plot
plotProximity(bant.rf)

Importance Heat Map
The importance heat map provides a visual assessment of the important predictors for the overall model. The BANTER event model relies on the mean assignment probability for each of the detectors in our detector model, as well as any event level measures. For example, in this heat map, the first variable is ‘dw.D.delphis’, which is the mean probability that a detection was assigned to the species ‘D.delphis’ in the whistle detector. This requires extra steps to dig down to the specific whistle measures that are the important predictor variables for the whistle detector.

# Importance Heat Map
plotImportance(bant.rf, plot.type = "heatmap")

2.4.3 Mis-Classified Events

By segregating the misclassified events, you can dive deeper into these data to understand why the model failed. Perhaps they were incorrectly classified in the first place (inaccurate training data) or the misclassification could be due to natural variability in the call characteristics. There are any number of possibilities, and by diving into the misclassifications, you can learn a lot about your data and your model. We do not recommend eliminating misclassifications simply because they are misclassifications. The point is to learn more about your data, not to cherry pick your data to get the best performing model.

Case Predictions If it is important to identify only strong classification results, they can be identified and filtered using the casePredictions() function in rfPermute.

casePredict <- casePredictions(bant.rf)
head(casePredict)
  id      original     predicted is.correct D.capensis  D.delphis  G.griseus
1  7     D.delphis     D.delphis       TRUE 0.40304756 0.45728798 0.06294187
2 17     G.griseus     G.griseus       TRUE 0.07665488 0.06597101 0.71283060
3 18     G.griseus     G.griseus       TRUE 0.09492357 0.11979284 0.34386650
4 26 L.obliquidens L.obliquidens       TRUE 0.10960924 0.11957502 0.12076347
5 29     G.griseus L.obliquidens      FALSE 0.03318734 0.03116006 0.15717682
6 48 L.obliquidens L.obliquidens       TRUE 0.01702705 0.01437237 0.29378434
  L.obliquidens S.coeruleoalba
1    0.04391771     0.03280488
2    0.12880410     0.01573940
3    0.23563262     0.20578448
4    0.45680780     0.19324446
5    0.54503817     0.23343762
6    0.57027046     0.10454578

This function returns a data frame with the original and predicted species for each event along with if the event was correctly classified and the assignment probabilities to each species.
To identify misclassified events, we just filter this data frame and grab the original event id.

misclass <- casePredict %>% 
  filter(!is.correct) %>%
  select(id)

We can then look closer at these events to learn more about them.

2.4.4 Variable Importance

One of the powerful features of Random Forest is the ability to assess and rank which predictors are important to the classification model. These values are based on measures of how much worse the classifier performs when the predictor variables are randomly permuted. The importance() function in the randomForest package extracts these values from a randomForest object.

# Get importance scores and convert to a data frame
bant.imp <- data.frame(importance(bant.rf))
head(bant.imp)
              D.capensis  D.delphis    G.griseus L.obliquidens S.coeruleoalba
duration       -7.972135  -3.222351  46.50625720     39.132070      0.3119724
prop.bp       -13.876983  43.185110  -8.37692561     -2.630635     78.3716706
prop.dw       -15.933639  32.030778  -0.06805889     35.185321     28.2794087
prop.ec        -7.850480  25.873545 -16.19843374     12.663769     46.2107517
bp.D.capensis 101.336365  58.558314  37.26179087     25.599379    -36.5418976
bp.D.delphis   66.330585 167.284453 112.35805842     68.701433    121.9548050
              MeanDecreaseAccuracy MeanDecreaseGini
duration                  20.76905        0.2363776
prop.bp                   68.59852        0.2543507
prop.dw                   47.88022        0.1927803
prop.ec                   40.41800        0.1624973
bp.D.capensis             79.64982        0.5053987
bp.D.delphis             168.19577        0.9296247

To see the actual distribution of these values in each species, we can first identify the most important predictors (highest importance scores)

# Select top 4 important event stage predictors
bant.4imp <- bant.imp[order(bant.imp$MeanDecreaseAccuracy, decreasing = TRUE), ][1:4, ]
bant.4imp
                  D.capensis D.delphis G.griseus L.obliquidens S.coeruleoalba
dw.D.delphis        45.37301 163.81067 143.42024     123.19125       83.47402
bp.D.delphis        66.33059 167.28445 112.35806      68.70143      121.95480
dw.S.coeruleoalba   95.17281  64.55798  69.32990      93.29123      136.55730
ec.L.obliquidens    64.97518  38.84887 -10.87847     162.03459       59.48529
                  MeanDecreaseAccuracy MeanDecreaseGini
dw.D.delphis                  179.7994        1.0894643
bp.D.delphis                  168.1958        0.9296247
dw.S.coeruleoalba             146.6527        0.8800543
ec.L.obliquidens              125.7512        0.9128095

The predictors that showed the greatest importance came from the whistle (dw) detector and the burst pulse (bp) detectors. We can then plot the distribution of the predictor variables on these classes (in this case, a violin plot for each of these four most important variables).

plotImpPreds(bant.rf, bantData.df, "species", max.vars = 4)

2.5 Predict

The goal of building an acoustic classifier is to ultimately apply this classifier to novel data. It is critical to understand that we should apply our BANTER classifier to data collected in the same manner. All variables (detectors, detector measures, event-level variables) must also be the same (with the same labels). For example, novel data collected using a different hydrophone with different sensitivity curves may result in different measurements from your original model (unless the data is calibrated). Even in the case where a classifier is applied to the appropriate data, it is wise to validate a subset of this novel data.

To run a prediction model, you must have your BANTER model, and new data. Here we will use the bant.mdl object we made previously, and apply it to the test.data provided in the BANTER package.

Predict The predict() function will apply your BANTER model to novel data and provide you with a data frame with the events used in the Event Model for predictions, and a data frame of predicted species and assignment probabilities for each event.
Thepredict() function will ignore any a priori species designation in the event and detection data, so this can be specified if it is desired to compare model predictions with other identifications, or left un-specified.

data(test.data)
predict(bant.mdl, test.data)
$events
  event.id     species duration    prop.bp   prop.dw   prop.ec bp.D.capensis
1  414_415  D.capensis 27.75000 0.33333333 0.3333333 0.3333333     0.1724000
2      380   D.delphis 29.60000 1.00000000 0.0000000 0.0000000     0.1433333
3      400 F.attenuata 30.98333 0.07936508 0.1269841 0.7936508     0.1700000
  bp.D.delphis bp.G.griseus bp.G.macrorhynchus bp.L.obliquidens
1    0.1676000    0.1844000          0.1584000        0.1566000
2    0.1233333    0.1633333          0.2433333        0.1133333
3    0.1000000    0.1540000          0.2100000        0.1980000
  bp.S.coeruleoalba dw.D.capensis dw.D.delphis dw.G.griseus dw.G.macrorhynchus
1         0.1606000       0.17900       0.1928      0.14040            0.12420
2         0.2133333       0.00000       0.0000      0.00000            0.00000
3         0.1680000       0.19875       0.2000      0.03375            0.17375
  dw.L.obliquidens dw.S.coeruleoalba ec.D.capensis ec.D.delphis ec.G.griseus
1           0.1830           0.18060        0.1716       0.1448       0.1416
2           0.0000           0.00000        0.0000       0.0000       0.0000
3           0.1675           0.22625        0.1324       0.1076       0.1420
  ec.G.macrorhynchus ec.L.obliquidens ec.O.orcinus ec.S.coeruleoalba   rate.bp
1             0.1224           0.1264       0.1454            0.1478 1.8018018
2             0.0000           0.0000       0.0000            0.0000 0.1013514
3             0.1336           0.1248       0.1912            0.1684 0.1613771
    rate.dw  rate.ec
1 1.8018018 1.801802
2 0.0000000 0.000000
3 0.2582033 1.613771

$predict.df
  event.id      predicted D.capensis D.delphis G.griseus L.obliquidens
1  414_415     D.capensis    0.40690   0.40150   0.07951       0.04105
2      380      G.griseus    0.13538   0.06263   0.49512       0.19796
3      400 S.coeruleoalba    0.13056   0.09773   0.09832       0.07196
  S.coeruleoalba    original correct
1        0.07104  D.capensis    TRUE
2        0.10891   D.delphis   FALSE
3        0.60143 F.attenuata   FALSE

$detector.freq
  detector num.events
1       bp          3
2       dw          2
3       ec          2

$validation.matrix
             predicted
original      D.capensis G.griseus S.coeruleoalba
  D.capensis           1         0              0
  D.delphis            0         1              0
  F.attenuata          0         0              1

3. Discussion

BANTER has been developed in a general manner such that it can be applied to a wide range of acoustic data (biological, anthropogenic). We have encouraged development of additional software (PAMpal) to facilitate BANTER classification of data analyzed in PAMGuard software. We encourage development of additional open source software to simplify BANTER classification of data analyzed using other signal processing software. While this classifier is easy to use, and can be powerful, we highly recommend that users examine their data and their results to ensure the data are appropriately applied. This is especially important when a classifier is applied to novel data for prediction purposes.

Acknowledgements

Many thanks to our original co-authors for their help in developing the original BANTER trial. Thoughtful reviews were provided by Anne Simonis and Marie Zahn. Funding for development of BANTER was provided by NOAA’s Advanced Sampling Technology Working Group.

References

Liaw, A. and M. Wiener. (2002) Classification and regression by randomForest. R News 2(3):18-22.

Rankin, S., Archer, F., Keating, J. L., Oswald, J. N., Oswald, M., Curtis, A. and Barlow, J. (2017) Acoustic classification of dolphins in the California Current using whistles, echolocation clicks, and burst pulses. Mar Mam Sci, 33: 520-540.doi:10.1111/mms.12381