Problem: Most of our events are from 2-minute-long duty-cycled recordings, but some are from continuous recordings and are longer than two minutes.
Solution: We’ll split the longer events into smaller 2-minute events, appending _1, _2, etc. to the event names of the events we split up.
First let’s take a look at our problem. The checkStudy function checks for possible errors in your event data. It is run automatically after you process your data, but we can also use it on its own. Here we can have it check if any events are longer (in seconds) than some maximum value we specify. This is useful if your data is from duty-cycled recordings, since no events should be longer than your recording length.
longEvents <- checkStudy(myStudy, maxLength = 120)
Looks like there are a lot, but this is because we know our dataset has a mix of duty-cycled and continuous recordings. We want to be able to find these longer recordings and split them up into a set of 2-minute-long events instead of having one longer event. Our plan of attack is to go through each event, find the start and end time of that event, and if that event is longer than 120 seconds break it up into a series of shorter events by using PAMpal’s filter function to filter by UTC.
First let’s write a couple of helper functions that will make our code neater and easier to follow down the line. getTimes is going to take an AcousticEvent and just return all the detection times (UTC). We’ll also keep track of the UID just in case we need it.
# bind_rows comes from the dplyr package
library(dplyr)

getTimes <- function(event) {
    # get all the detector data in this event
    allDets <- getDetectorData(event)
    # This goes through all of the $click, $whistle, and $cepstrum detectors if present
    justTimes <- bind_rows(lapply(allDets, function(x) {
        x[, c('UID', 'UTC')]
    }))
    justTimes
}
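As a quick optional check, we could run this on a single event and peek at the output:
# All detection times and UIDs from the first event
firstTimes <- getTimes(events(myStudy)[[1]])
head(firstTimes)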
timeToStartEnd is going to take in a vector of UTC times (like the output of the getTimes function we just wrote) and turn it into a list of start and end times for our new shorter events. We start at the beginning of the event and just keep adding 120 seconds until we have passed the end of the event.
timeToStartEnd <- function(time, length = 120) {
    range <- range(time)
    lenSecs <- as.numeric(difftime(range[2], range[1], units='secs'))
    # Figure out how many events of length "length" we can have
    numSplits <- ceiling(lenSecs / length)
    # Each event starts some multiple of "length" from the original start
    start <- range[1] + length * 0:(numSplits-1)
    # Doesn't matter that the last end is past the actual event end
    # here since we are just using these times to filter later
    end <- start + length
    list(start=start, end=end)
}
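To see what this returns, we can try it on a few made-up times: a 250-second span should need ceiling(250 / 120) = 3 splits.
# Three hypothetical detection times spanning 250 seconds
exTimes <- as.POSIXct('2020-01-01 00:00:00', tz='UTC') + c(0, 100, 250)
timeToStartEnd(exTimes, length = 120)
# Expect starts at 0, 120, and 240 seconds past the first time,
# with ends 120 seconds after each start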
Now we’re ready to create our new list of shorter events. Hopefully you can follow along with the comments in the code.
# We're going to use a for loop, but first we're going to create
# a place to store our output for the new events
newEvents <- vector('list', length = length(events(myStudy)))
for(i in seq_along(newEvents)) {
    # One event at a time, use our functions from earlier to get the
    # start and end times for our new events
    thisEvent <- events(myStudy)[[i]]
    thisTime <- getTimes(thisEvent)
    thisStartEnd <- timeToStartEnd(thisTime$UTC, length=120)
    # If it only made one start/end, we don't need to change anything!
    if(length(thisStartEnd$start) == 1) {
        # The "list" part might look weird, but it's because when we are breaking
        # up events we are going to be storing lists of events in each "newEvents"
        # spot, and we'll plan on "unlisting" them at the end of this process
        newEvents[[i]] <- list(thisEvent)
        next
    }
    # Make a place to store each smaller event
    evList <- vector('list', length = length(thisStartEnd$start))
    for(s in seq_along(thisStartEnd$start)) {
        # Create each smaller event by filtering the whole event
        # to detections only between the start/end times we created earlier
        onePart <- filter(thisEvent, UTC >= thisStartEnd$start[s],
                          UTC < thisStartEnd$end[s])
        # There's a chance this resulted in no detections, so if that happens
        # just skip to the next one
        if(is.null(onePart)) next
        # We need to assign this a new event ID or we'll have a bunch of repeats
        id(onePart) <- paste0(id(onePart), '_', s)
        evList[[s]] <- onePart
    }
    newEvents[[i]] <- evList
}
# Now that we are done, unlist everything so that we have a big list of
# AcousticEvents. We'll stick those into an AcousticStudy that's a copy
# of our original so we can compare our results
newEvents <- unlist(newEvents)
names(newEvents) <- sapply(newEvents, id)
shortStudy <- myStudy
events(shortStudy) <- newEvents
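As a quick sanity check, we can peek at the first few event IDs to confirm the new _1, _2 suffixes are there:
# Split events should show up with _1, _2, etc. appended to their IDs
head(names(events(shortStudy)))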
Done! Let’s compare our new AcousticStudy to what we started with to see what changed.
myStudy
shortStudy
noWarns <- checkStudy(shortStudy, maxLength = 120)
No more warnings! All our events are now under two minutes, and we’ve got over twice as many.
Problem: We marked out some detections of interest using PAMGuard’s Spectrogram Annotation module, but there isn’t an easy way to read in only these detections.
Solution: We’ll use the function readSpecAnno from the PAMmisc package to read in the Spectrogram Annotation tables, then use these tables as the event grouping files for processing with mode='time'. This will get all detections that start within our boxed times, then we can use the filter function to remove detections outside of the frequency bounds of our boxes.
First let’s create a PAMpalSettings object like normal, then we can use readSpecAnno to read in our Spectrogram Annotation tables.
dbFolder <- './Data/Databases'
bin <- './Data/Binary'
library(PAMpal)
pps <- PAMpalSettings(dbFolder, bin, sr_hz='auto', filterfrom_khz=10,
                      filterto_khz=NULL, winLen_sec=.0025)
library(PAMmisc)
library(dplyr)
# Now we can get the database files out of the PPS we created
# If you named your SA table something else, provide that value in the table argument
specAnno <- bind_rows(lapply(pps@db, function(x) {
    readSpecAnno(x, table='Spectrogram_Annotation')
}))
# If you only had one database, this code could also be simpler:
specAnno <- readSpecAnno(pps@db, table='Spectrogram_Annotation')
This function will already apply most of the formatting we need to use it as a grouping file for running processPgDetections with mode='time': it gives us columns id, start, end, and db. It will also read in all other columns, so if you had used another field to store the species ID you could then use that to create a new column species. You may also need to provide the sample rate in your grouping file (see here).
# If you had the species stored somewhere, create a species column
specAnno$species <- specAnno$MySpeciesLabelColumn
# Optional, if needed
specAnno$sr <- 192e3
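It can also be worth a quick look at the table before processing to confirm the grouping columns came through as expected:
# Inspect the columns readSpecAnno created for us
str(specAnno)
head(specAnno[, c('id', 'start', 'end', 'db')])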
The data read in for each event may not end up being exactly what you expect. This will read in all detections whose start time lies within one of the annotation boxes you created, but this could include a detection that does not end within your box. It could also be a detection that is not within the frequency bounds of your box. There is also a chance that if your box bounds were not drawn very carefully you might be unintentionally excluding detections: if a detection starts slightly before your annotation box, it will not be included here. To deal with this last issue, one option is to shift back the start times of your boxes by a very small amount. This could obviously also start including detections that you did not want, so be thoughtful when choosing a value (if you need one at all).
# If your boxes accidentally did not include the start of some signals, you can
# try shifting their start times back by a small amount
specAnno$start <- specAnno$start - .005
Now this is ready to pass on to processPgDetections!
data <- processPgDetections(pps, mode='time', grouping=specAnno, id='SpecAnnoCaseStudy')
Now we can work on filtering our detections down to only those that are entirely contained within the boxes we drew. First we’ll deal with the frequency bounds. Whistles and GPL detections have parameters freqBeg and freqEnd that are in units of Hz, and clicks have a parameter peak that is in units of kHz (sorry for the inconsistency!). The specAnno table also has values fmin and fmax (units of Hz) that are read in from PAMGuard (I’ve renamed these from f1 and f2 in the PAMGuard database), so we can use these values to do some filtering. When processing with mode='time', PAMpal will store the relevant row of your grouping file in each AcousticEvent’s ancillary slot for easier access. Whistle and GPL detections also have a duration parameter; we can use this to make sure each detection ends before our box does.
Here we’re going to loop through each event, grab the grouping info from the ancillary slot (which contains the fmin and fmax values we want), then filter by the frequency parameters PAMpal has measured for us. We need to use a loop to do the filtering (rather than just call filter on data) because each event needs different filter values.
# Creating a copy to store the filtered data so we can compare the 2, you don't need to do this
filtData <- data
for(i in seq_along(events(filtData))) {
    # get grouping info
    thisGroup <- ancillary(filtData[[i]])$grouping
    # Do filtering. Note fmin/max are in Hz, convert where appropriate
    filtData[[i]] <- filter(filtData[[i]],
                            freqBeg > thisGroup$fmin,
                            freqBeg < thisGroup$fmax,
                            freqEnd > thisGroup$fmin,
                            freqEnd < thisGroup$fmax,
                            UTC + duration < thisGroup$end,
                            peak > thisGroup$fmin / 1e3,
                            peak < thisGroup$fmax / 1e3)
}
Now let’s use the nWhistles function to compare how many whistles are in each event before and after filtering.
# Comparing total numbers
nWhistles(data)
nWhistles(filtData)
# For each event
sapply(events(data), nWhistles)
# If some of these are 0, and you weren't expecting that, you may want to investigate
# those annotations in PAMGuard
sapply(events(filtData), nWhistles)
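If you want to chase those down, one quick way is to list the IDs of events that lost all their whistles:
# Event IDs with zero whistles after filtering, worth rechecking in PAMGuard
emptyIds <- names(events(filtData))[sapply(events(filtData), nWhistles) == 0]
emptyIds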