BatlabJuliaUtils Documentation
- BatlabJuliaUtils Documentation
- Some Julia Notes
- Defaults.jl
- Plotting Helpers
- Time-Frequency Helper Functions
- Read Audio Data
- Filters
- Signal-to-Noise Ratio
- Chirp Sequences
- "Melody"
- Optimization
- Miscellaneous
Some Julia Notes
Keyword Arguments (kwargs... Notation)
If a function is as follows:
function helloworld(a; kwargs)then you can pass in any keyword arguments you want, as follows:
helloworld(2; b=3, c="hello");The use of these keyword arguments, when present, is described in the documentation.
Plotting
As Julia has some issues with plotting in Jupyter notebooks, it's recommended to use the Plotting Helpers listed here. If you use one of the built-in Julia plot functions, you can pass in the output of getplottingsettings as keyword arguments as follows:
plot(x_data, y_data; getplottingsettings("x label", "y label", "my title")...);Otherwise, the plot will show up as tiny unless you pass in the keyword argument html_output_format=:png, and the axis labels may be cut off unless you pass in left_margin=10Plots.mm and bottom_margin=10Plots.mm.
Also, if some plotting function is not producing an output in a Jupyter notebook, make sure there is no semicolon at the end of the statement.
Broadcasting Across an Array
To broadcast a one number-to-one number function across an array, you can add a dot (.) after the function name.
For instance, if you have complex array
A = [3.0 + 4.0im, 5.0 + 2.3im, 3.4 + 0.9im];you can get the magnitude of each element by abs.(A), which produces
1×3 Matrix{Float64}:
5.0 5.50364 3.5171MethodErrors
If a function throws a MethodError: no method matching..., where the function wants you to pass in a Matrix but you passed in a Vector, you can apply vectortomatrix to the vector before passing it into the function. If you have the opposite problem, you can use matrixtovector.
If the function takes in a single number but you want to pass in an array instead, look at Broadcasting Across an Array.
If the datatype (e.g., Int, Real, Float64, etc.) of the input data is wrong, you can cast them to the correct type:
- If
Ais an array ofInts, you can cast them toFloat64usingFloat64.(A), and cast them toRealusingReal.(A). - If
Ais an array ofFloat64s, you can cast them toInts usingInt.(round.(A))(you can replaceroundwithfloororceil, depending on how you want to round the numbers). If you omit theround, andAcontains decimal numbers, then casting them to anIntwill throw anInexactError.
-If A is an array of complex numbers but you know that the imaginary part should be zero (e.g., taking the inverse FFT of an FFT of a real array), you can do real.(A) to take the real part.
Defaults.jl
BatlabUtils/src/Defaults.jl stores default values for sampling frequencies (250 kHz for audio and 360 Hz for video), the speed of sound (344.69 m/s), default plot dimensions, and some algorithm parameters.
Plotting Helpers
BatlabJuliaUtils.getplottingsettings — Functiongetplottingsettings(xlabel::String, ylabel::String, title::String;
kwargs...) -> Dict{Symbol, Any}Builds a dictionary of keyword arguments to pass into any plotting function, populated with the provided labels and titles, default font sizes and plot dimensions, and any additional keyword arguments passed into the function.
Inputs:
xlabel: label of the x-axis.ylabel: label of the y-axis.title: plot title.kwargs...: you can pass in any other keyword arguments to specify plotting parameters or override the defaults set here.
Output:
kwargs_with_defaults: dictionary of plotting keyword arguments.
BatlabJuliaUtils.myplot — Functionmyplot(args...; kwargs...)Mimics Julia's plot function, except with defaults from getplottingsettings passed in. You can pass in any arguments you would to the regular plot function.
BatlabJuliaUtils.myplot! — Functionmyplot!(args...; kwargs...)Same as myplot, except adds to the last plot produced instead of making a new plot.
BatlabJuliaUtils.plotmicdata — Functionplotmicdata(idxs::AbstractArray{Int}, y::AbstractArray; plot_idxs=idxs,
kwargs...)Plot a time segment of audio data y, with milliseconds on the x-axis and voltage on the y-axis, and default plotting settings from getplottingsettings.
Inputs:
idxs: time indices of input audio data to plot.y: audio data, where each column is one microphone.plot_idxs(default:idxs): the x-axis labels of the plot will beaudioindextoms.(plot_idxs). Use this parameter if the x-axis labels you want don't match the value you passed in foridxs.kwargs...: you can pass in any additional keyword arguments to set plotting parameters.
plotmicdata(y::AbstractArray; plot_idxs=1:size(y, 1), kwargs...)Plot audio data, with milliseconds on the x-axis and voltage on the y-axis, and default plotting settings from getplottingsettings. Plots the full length of y.
Inputs:
y: audio data, where each column is one microphone.plot_idxs(default:1, 2, 3,...): the x-axis labels of the plot will beaudioindextoms.(plot_idxs).kwargs...: you can pass in any additional keyword arguments to set plotting parameters.
BatlabJuliaUtils.plotmicdata! — Functionplotmicdata!(idxs::AbstractArray{Int}, y::AbstractArray; plot_idxs=idxs,
kwargs...)Same as plotmicdata(idxs, y), except adds to the last plot produced instead of making a new plot.
plotmicdata!(y::AbstractArray; plot_idxs=1:size(y, 1), kwargs...)Same as plotmicdata(y), except adds to the last plot produced instead of making a new plot.
BatlabJuliaUtils.plotfftmag — Functionplotfftmag(idxs::AbstractArray{Int}, y::AbstractArray;
fft_idxs=1:length(idxs), kwargs...)Plot the Fourier transform magnitude of a time segment of real-valued time-domain audio data y.
Inputs:
idxs: time indices of input audio data to take the Fourier transform of.y: time-domain audio data, where each column is one microphone.fft_indices(default: plot all indices): indices of the Fourier transform to plot. Use this parameter if you only want to plot some frequencies.kwargs...: you can pass in any additional keyword arguments to set plotting parameters.
plotfftmag(y::AbstractArray; fft_idxs=1:size(y, 1), kwargs...)Plot the Fourier transform magnitude of real-value time-domain audio data y. Takes the Fourier transform of the full length of y.
Inputs:
y: time-domain audio data, where each column is one microphone.fft_indices(default: plot all indices): indices of the Fourier transform to plot. Use this parameter if you only want to plot some frequencies.kwargs...: you can pass in any additional keyword arguments to set plotting parameters.
Time-Frequency Helper Functions
ColWiseFFTs
BatlabJuliaUtils.colwisefft — Functioncolwisefft(A::AbstractArray) -> Matrix{Complex}Applies the Fast fourier Transform (FFT) to each column of input matrix A. A can also be a vector, in which case it is first transformed into a one-col matrix.
BatlabJuliaUtils.colwiseifft — Functioncolwiseifft(A::AbstractArray) -> Matrix{Complex}Applies the Inverse FFT to each column of input matrix A. A can also be a vector, in which case it is first transformed into a one-col matrix.
BatlabJuliaUtils.rowwisefft — Functionrowwisefft(A::Matrix) -> Matrix{Complex}Applies the Fast fourier Transform (FFT) to each row of input matrix A.
BatlabJuliaUtils.rowwiseifft — Functionrowwiseifft(A::Matrix) -> Matrix{Complex}Applies the Inverse FFT to each row of input matrix A.
Convert Data Indices to Time or Frequency
BatlabJuliaUtils.audioindextosec — Functionaudioindextosec(index::Int; fs=250 kHz) -> RealGiven an index of the audio data, calculate the number of seconds since the beginning of the audio data.
Inputs:
index: index of audio datafs: sampling frequency, in Hertz. Default set in Defaults.jl.
Output:
- Seconds since the beginning of the audio data
BatlabJuliaUtils.audioindextoms — Functionaudioindextoms(index::Int; fs=250 kHz) -> RealGiven an index of the audio data, calculate the number of milliseconds since the beginning of the audio data.
Inputs:
index: index of audio datafs: sampling frequency, in Hertz. Default set in Defaults.jl.
Output:
- Milliseconds since the beginning of the audio data
BatlabJuliaUtils.fftindextofrequency — Functionfftindextofrequency(index::Int, N_fft::Int; fs=250 kHz) -> RealGiven an index of an Discrete Fourier Transform taken on a segment of audio datal determine what frequency (in Hz) it corresponds to.
Inputs:
index: index of the Fourier Transform.N_fft: length of the Fourier Transform, in samples.fs: sampling frequency, in Hertz. Default set in Defaults.jl.
Output:
- Frequency, in Hz, of the specified Fourier Transform index
BatlabJuliaUtils.getfftfrequencies — Functiongetfftfrequencies(N_fft::Int; fs=250 kHz) -> Array{Real}Return an array of length N_fft, where each element is the frequency, in Hz, of the corresponding index of a length-N_fft Fourier Transform of a segment of audio data.
Inputs:
N_fft: length of the Fourier Transform, in samples.fs: sampling frequency, in Hertz. Default set in Defaults.jl.
BatlabJuliaUtils.videoindextosec — Functionvideoindextosec(idx_video::Int, L_video::Int; fs_video=360) -> RealGiven an index of the video data, return the number of seconds since the start of the audio data.
The video and audio data are synchronized such that the end of the video data is simultaneous with the 8-second mark of the audio data.
Inputs:
idx_video: index of the video data.L_video: length, in frames, of the video data.fs_video: sampling rate of the video data, in Hertz. Default set in Defaults.jl.
BatlabJuliaUtils.sectovideoindex — Functionsectovideoindex(time_audio::Number, L_video::Int; fs_video=360) -> IntGiven a time (since the start of the audio data) in seconds, calculate the index of the closest video frame.
The video and audio data are synchronized such that the end of the video data is simultaneous with the 8-second mark of the audio data.
Inputs:
time_audio: time, in seconds, since the start of the audio data.L_video: length, in frames, of the video data.fs_video: sampling rate of the video data, in hertz. Default set in Defaults.jl.
BatlabJuliaUtils.getvideoslicefromtimes — Functiongetvideoslicefromtimes(location_data::Matrix{Real}, t1::Real,
t2::Real, fs_video=360) -> UnitRange{Int}, Matrix{Real}Returns the video data corresponding to time interval [t1, t2] of the audio data, where t1 and t2 are in seconds.
The two datasets are synchronized as follows: the end of the video data coincides with the 8-second mark of the audio data.
Inputs:
location_data: full centroid data, where the columns represent coordinates (x, y, z).t1: start time (inclusive) of interval, in seconds since the onset of the audio data.t2: end time (inclusive) of interval, in seconds since the onset of the audio data.fs_video(default set inDefaults.jl): sampling frequency of the centroid data, in Hertz.
Outputs:
slice_idxs: indices of the audio data corresponding to the[t1, t2]slice taken.centroid_slice: centroid data in the interval[t1, t2].
BatlabJuliaUtils.getvideodataslicefromaudioindices — Functiongetvideodataslicefromaudioindices(location_data::Matrix{Real},
t1_idx::Real, t2_idx::Real, fs_video=360, FS=250k)
-> UnitRange{Int}, Matrix{Real}Same as getvideoslicefromtimes, but takes audio data indices in lieu of times.
Inputs:
location_data: full centroid data, where the columns represent coordinates (x, y, z).t1_idx: index of the audio data (inclusive) at which to start the centroid data slice.t2_idx: index of the audio data (inclusive) at which to end the centroid data slice.fs_video(default set inDefaults.jl): sampling frequency of the centroid data, in Hertz.fs(default set inDefaults.jl): sampling frequency of the audio data, in Hertz.
Outputs: see getvideoslicefromtimes
Short-Time Fourier Transforms (STFTs) and Spectrograms
BatlabJuliaUtils.STFTwithdefaults — FunctionSTFTwithdefaults(y::AbstractArray, nfft=256,
noverlap=Int(round(nfft/2)), window=hamming(nfft), zero_pad=true)
-> Matrix{Complex}Wrapper around DSP.Periodograms.stft, which takes the short-time Fourier Transform (STFT) of a signal, with some reasonable default values set.
Inputs:
y: one-dimensional signal of which to take the STFT.nfft(default: 256): length of each window of the STFT.noverlap(default:nfft/2): overlap between adjacent STFT windows.window(default: Hamming): function multiplicatively applied to each window to reduce spectral leakage.zero_pad(default:true): if set totrue, zero-pad the beginning and end of y with (nfft-1) zeros on either end.
Output:
Sy: STFT of y. Matrix with (nfft / 2 + 1) rows, each corresponding to a different frequency andNcolumns, whereNis the number of time windows taken.
BatlabJuliaUtils.plotSTFTdb — FunctionplotSTFTdb(Sy_db::Matrix; nfft=Int(2*floor(size(Sy_db, 1))),
noverlap=Int(round(nfft/2)), crange=50, fs=250 kHz,
plotting_kwargs...)Plots spectrogram Sy_db, where Sy_db is in decibels (If Sy is the STFT of signal y, then Sy_db is 20 times log10 of the magnitude squared of Sy).
Inputs:
Sy_db: STFT, in decibels.nfft(default:2*(height of Sy_db - 1)): length of each window of the STFT. Used for axis labeling.noverlap(default:nfft/2): overlap between adjacent STFT windows. Used for axis labeling.window(default: Hamming): function multiplicatively applied to each window to reduce spectral leakage.crange(default: 50): the lowest value shown on the colorbar is the maximum value ofSy_db, minus crange. All elements ofSy_dbthat are smaller than this will show up in the spectrogram as this lowest value.fs: sampling frequency of the audio data, in Hertz. Default set in Defaults.jl.plotting_kwargs: extra keyword arguments for plotting (passed into the heatmap function).
BatlabJuliaUtils.plotSTFT — FunctionplotSTFT(Sy::Matrix{Complex}; nfft=Int(2*floor(size(Sy_db, 1))),
noverlap=Int(round(nfft/2)), crange=50, fs=250 kHz,
plotting_kwargs...)Plots the spectrogram corresponding to STFT Sy (i.e., plots a heatmap of Sy_db = 20*log10(|Sy|^2)).
Inputs:
Sy: STFT, in decibels.- See
plotSTFTdbfor the rest of the arguments.
BatlabJuliaUtils.plotSTFTtime — FunctionplotSTFTtime(y::AbstractArray; nfft=256, noverlap=Int(round(nfft/2)),
window=hamming(nfft), zero_pad=true, crange=50, fs=250 kHz,
plotting_kwargs...)Takes the STFT of y and then plots the spectrogram using plotSTFT.
Inputs:
y: time-domain signal.nfft,noverlap,window,zero_pad: seeSTFTwithdefaults.crange,fs,plotting_kwargs: seeplotSTFTdb
Read Audio Data
BatlabJuliaUtils.readmicdata — Functionreadmicdata(mat_filename::String, n_channels=4) -> Matrix{Real}Reads audio data from a MAT file into a matrix where each column represents a different microphone.
These can be produced by running matlab_utils/tdms_to_mat.m on a TDMS file with fields including Voltage_0, Voltage_1, etc.
Inputs:
mat_filename: name of a.matfile with variablesVoltage_i, where microphone index i counts up from 0, each variable is a time-series array of voltages for the corresponding microphone, and all arrays are of the same length.n_channels(default 4): number of microphones
Output:
y: N x K matrix, where N is the number of samples per microphone and K is the number of microphones. Each column corresponds to a different microphone.
Filters
BatlabJuliaUtils.movingaveragefilter — Functionmovingaverage(x::AbstractArray, half_len::Int, stride=1)
-> Matrix{Real}Applies a symmetrical moving average filter of width 2*half_len + 1 to signal x. To make the output the same length as the input and avoid edge effects, the input is padded by duplicating the first and last elements each half_len times. The output is aligned with the input: i.e., if there is a large enough local maximum at index i of the input, there will also be a local maximum at the same index of the output.
x may have multiple columns, where each column represents a different set of time-series data. In that case, the filter is applied separately to each column.
Inputs:
x: input vector, or matrix where each column is a different dataset (i.e., channel).half_len: determines the length of the filter, as described above.stride(default: 1): ifstrideis not 1, then the output is downsampled by a factor ofstride.
Output:
y: result of applying a moving average filter tox.
BatlabJuliaUtils.maxfilter — Functionmaxfilter(x::AbstractArray, half_len::Int) -> Matrix{Real}Applies a maximum filter to the input: y[n] = maximum(x[n-half_len:n+half_len]).
Input signal x may have multiple columns; each column represents a different set of time-series data.
Inputs:
x: input vector, or matrix where each column is a different dataset (i.e., channel).half_len: determines the length of the filter, as described above.
Output:
y: output of the filter, as described above.
BatlabJuliaUtils.bandpassfilter — Functionbandpassfilter(x::AbstractArray, min_Hz::Number, max_Hz::Number;
fs=250 kHz) -> AbstractArray{Real}Applies an ideal bandpass filter with cutoffs min_Hz and max_Hz to input signal x.
Input signal x may have multiple columns; each column represents a different set of time-series data
Inputs:
x: input vector, or matrix where each column is a different dataset (i.e., channel).min_Hz: lower cutoff of the filter.max_Hz: upper cutoff of the filter.fs: Sampling frequency, in Hertz. Default set in Defaults.jl.
Output:
y:x, with frequencies belowmin_Hzor abovemax_Hzzeroed out.
BatlabJuliaUtils.bandpassfilterFFT — FunctionbandpassfilterFFT(x_fft::AbstractArray, min_Hz::Number, max_Hz::Number;
fs=250 kHz) -> AbstractArraySame as bandpassfilter, except the input and output are in the frequency domain.
Inputs:
x_fft: input vector (or matrix) in the frequency domain.min_Hz: lower cutoff of the filter.max_Hz: upper cutoff of the filter.fs: Sampling frequency, in Hertz. Default set in Defaults.jl.
Output:
y_fft:x_fft, with frequencies belowmin_Hzor abovemax_Hzzeroed out.
BatlabJuliaUtils.bandpassfilterspecgram — Functionbandpassfilterspecgram(Sx::Matrix, min_Hz::Number, max_Hz::Number;
nfft=2*(size(Sx, 1)-1), fs=FS) -> AbstractArraySame as bandpassfilter, except the input and output are spectrograms.
Inputs:
Sx: input spectrogram.min_Hz: lower cutoff of the filter.max_Hz: upper cutoff of the filter.nfft(default:2*(height of Sx-1)): size of the window used to produce the spectrogram.fs: Sampling frequency, in Hertz. Default set in Defaults.jl.
Output:
Sy:Sx, with frequencies belowmin_Hzor abovemax_Hzzeroed out.
BatlabJuliaUtils.circconv — Functioncircconv(x::AbstractArray, h::AbstractArray; real_output=true) -> AbstractArrayPerform circular convolution in the frequency domain: y = iFFT(FFT(x) * FFT(h)).
Inputs:
x,h: two vectors of the same dimension, or matrtices where each column is a different channel of data.real_output(default:true): whether to take the real component of the output before returning (falseto leave the output a complex number).
Output:
- circular convolution of x and y
BatlabJuliaUtils.deconvolve — Functiondeconvolve(Y::AbstractArray, X::AbstractArray; fft_thresh=0.1)
-> AbstractArrayPerform deconvolution: i.e., given linear time-invariant system Y and input X, such that Y is X convolved with impulse response H, find H using Fourier transforms (H = iFFT(FFT(Y) ./ FFT(X))).
Optionally, zero out indices of FFT(Y) ./ FFT(X) where the magnitude of FFT(X) is above some threshold, fft_thresh.
X and Y must have the same dimensions. If they are matrices, deconvolution is applied separately to each column.
Inputs:
Y: system output; either a vector or a matrix where each column is a different data channel.X: system input; same dimensions asY.fft_thresh(default: 0): described above.
Outputs:
H: estimated impulse response; same dimensions asXandY.
Signal-to-Noise Ratio
BatlabJuliaUtils.getnoisesampleidxs — Functiongetnoisesampleidxs(mic_data::AbstractArray; window_size=200)
-> UnitRange{Int}Given audio data, find the longest segment that is just noise.
First, the algorithm divides the data into segments of length window_size. The amplitude of a window is defined as the maximum deviation of mic_data from its mean (over the window). The maximum noise level is set at twice the minimum amplitude of any window. The algorithm finds the indices of the longest section of mic_data where no window has an amplitude above the noise level.
Inputs:
mic_data: matrix of audio data, where each column is a different channel.window_size: length of the windows described above.
Outputs:
UnitRange(i.e., the datatype of the object1:10) of the indices of the longest segment of the data that is only noise.
BatlabJuliaUtils.windowedenergy — Functionwindowedenergy(x::AbstractArray, window_size::Int; window=hamming(nfft))
-> Matrix{Real}Finds the energy over sliding windows of the signal x, where the windows have stride 1 (i.e., if the first window starts at index 1, the second window starts at index 2, etc.)
Inputs:
x: vector of data, or matrix where each column is a different channel of data.window_size(default: 64): window length, in samples.window(default:ones(window_size)): vector to multiply each window by. The default,ones, multiplies each element by 1. To emphasize the center of long windows, you can use windows likehamming(window_size)from theDSPpackage.
BatlabJuliaUtils.estimatesnr — Functionestimatesnr(y::AbstractArray, noise_sample::AbstractArray, window_size=256,
window=ones(window_size)) -> Matrix{Real}Crude estimate of the signal-to-noise ratio of the input signal y. The signal and noise levels are computed by taking the energy over sliding windows (with stride 1) of y and noise_sample, respectively. The SNR is estimated as the signal level of each window, divided by the mean noise level.
Inputs:
y: input signal, which may be a vector, or a matrix where each column is a different channel.noise_sample: sample of noise, with the same number of channels asy.window_size(default: 64): window length, in samples.window(default:ones(window_size)): vector to multiply each window by. The default,ones, multiplies each element by 1. To emphasize the center of long windows, you can use windows likehamming(window_size)from theDSPpackage.
Output:
snr_est: estimated SNR, in decibels (log scale, times 20) of every timepoint iny.
Chirp Sequences
Terminology: Chirp Sequences
A chirp sequence is defined as all microphone outputs that result from a single bat vocalization.
This is a somewhat overloaded term, which can mean one of two things:
- For a single microphone, a chirp and subsequent echos. We can call this a "single-mic chirp sequence".
- The chirp and subsequent echos, but for all microphones that picked up the chirp. We can call this a "multi-mic chirp sequence".
BatlabJuliaUtils.estimatebuzzphase — Functionfunction estimatebuzzphase(snr::AbstractArray;
buzz_phase_chirp_separation_ms=20) -> Matrix{Int}Makes a very rudimentary estimate of chirp onsets, as heard by microphones, and then finds all times where chirp onsets are less than buzz_phase_chirp_separation_ms apart, for each microphone. It then prints out these time windows in descending order of length (these are the potential buzz phase times).
Note that can also classify some echos as chirp sequences, so expect there to be false positives.
Inputs:
snr: output ofestimatesnr.buzz_phase_chirp_separation_ms(default: 20): maximum number of milliseconds expected between onsets of consecutive buzz phase chirps.
BatlabJuliaUtils.ChirpSequence — Typestruct ChirpSequence
start_idx::Int
length::Int
vocalization_time_ms::Real
snr_data::Vector{Real}
mic_data::Vector{Real}
mic_num::Int
endDatastructure to store individual chirp sequences (for a single microphone).
Fields:
start_idx: start of the chirp, in number of samples since the beginning of the audio data that this chirp sequence comes from (i.e., the MAT data that was initially read in).length: length, in audio samples, of the chirp sequence.vocalization_time_ms: time (since the beginning of the audio data) that the bat made the vocalization. Estimated using the centroid data.snr_data: vector of estimated SNR values over the duration of the chirp sequence. Produced byestimatesnr.mic_data: audio data for the given microphone, over the duration of the chirp sequence.mic_num: which microphone (from 1 to 4) the data corresponds to.
BatlabJuliaUtils.getboundsfromboxes — Functiongetboundsfromboxes(boxes::AbstractArray;
filter_fn=(start_idx, stop_idx) -> true) -> Matrix{Int}Given bitarray boxes (as a column vector), return a matrix where the first column is the start indices of the sections where boxes==1, and the second column is the end indices of those sections.
Optionally, only keep boxes where filter_fn returns true.
Inputs:
boxes: one-dimensional bitarray.filter_fn(default: always returntrue): function that takes in the start and end indices of a region whereboxes==1and returnsfalsefor boxes to discard.
Output:
bounds: two-column matrix, as described above.
BatlabJuliaUtils.findhighsnrregions — Functionfindhighsnrregions(snr::AbstractArray, signal_thresh::Number,
min_peak_thresh::Number, maxfilter_length::Int;
snr_drop_thresh=20, peak_snr_thresh_radius=1500)-> BitArrayGiven the estimated SNR of the audio data (from the function estimatesnr), determine which regions are likely to contain chirp sequences, as follows:
- Apply a
maxfilterto thesnrarray: this helps us find contiguous regions with high SNR. - Find all sections where the maxfiltered SNR is above
signal_thresh. - Of those sections, keep the ones where, at some point, the SNR goes above another, higher threshold.
The threshold in step 3 above is set as follows:
- We look around in a range of
peak_snr_thresh_radiusaround the "signal region" and we find the maximum SNR present in that range. - We require that the peak SNR of the "signal region" be at most
snr_drop_threshdB lower than that maximum value. - We also require that the peak SNR be no lower than
min_peak_thresh.
Inputs:
snr: estimated SNR of the audio data, produced byestimatesnr, either the full matrix or one column.signal_thresh,min_peak_thresh,snr_drop_thresh,peak_snr_thresh_radius: thresholds, descsribed above.maxfilter_length: half-length, in samples, of the maximum filter.
Output:
high_snr_locations: bitarray that is1in regions that likely contain chirp sequences.
BatlabJuliaUtils.findhighsnrregionidxs — Functionfindroughchirpsequencebounds(snr::AbstractArray, mic::Int,
signal_thresh::Number, peak_thresh::Number, maxfilter_length::Int)
-> Matrix{Int}For a single microphone/channel, converts the output of findhighsnrregions to a two-column matrix, where the first column is the start index of each presumed chirp sequence.
Inputs:
snr: estimated SNR of the audio data, produced byestimatesnr(full matrix).signal_thresh,peak_thresh,snr_drop_thresh,peak_snr_thresh_radius: thresholds, described infindhighsnrregions.maxfilter_length: half-length, in samples, of the maximum filter.
Output: described above.
BatlabJuliaUtils.adjusthighsnridxs — Functionadjusthighsnridxs(snr::AbstractArray, mic::Int,
rough_bounds::Vector{Int}, max_end_idx::Int; tail_snr_thresh::Real,
max_seq_len=MAX_SEQUENCE_LENGTH, maxfilter_seq_end=50) -> Vector{Int}Given rough bounds for a single chirp sequence (one row of the output of findhighsnrregionidxs), adjust the end index to ensure that the chirp sequence isn't cut off early.
The process is similar to findhighsnrregions, but with more lenient thresholds.
- Apply a
maxfilterto thesnrarray, with a filter size that is ideally longer than the one used forfindhighsnrregionidxs. - Find the first index of the maxfiltered SNRs that goes below
tail_snr_thresh, a threshold lower than the one used forfindhighsnrregionidxs, and sets the end index of the chirp sequence to this. If this index is beyondmax_end_idx, then the end of the chirp sequence is set tomax_end_idx.
Inputs:
snr: estimated SNR of the audio data, produced byestimatesnr(full matrix).mic: microphone number, from 1 to 4.rough_bounds: row of the matrix produced byfindhighsnrregionidxs.max_end_idx: the start of the next chirp sequence, or the end of the signal if this is the last chirp sequence for a particular microphone.tail_snr_thresh: described above.max_seq_lenmaximum length of the chirp sequence. Default set inDefaults.jl.maxfilter_length(default: 50): half-length, in samples, of the maximum filter.
Output:
refined_bounds: two-element vector, where the first element is the start of the chirp sequence (unchanged), and the second element is the
BatlabJuliaUtils.getvocalizationtimems — Functiongetvocalizationtimems(chirp_start_index::Int, mic::Int,
location_data::Matrix{Real}, mic_positions::Matrix{Real};
buffer_time_ms=100, fs_video=360,
interp_type=QuadraticInterpolation) -> RealGiven the index of the audio data at which a chirp sequence starts, estimate the time that the bat made a vocalization, in milliseconds since the start of the audio data.
This is achieved by taking a slice of the centroid data of radius buffer_time_ms around the time the chirp reached the microphone, performing quadratic interpolation of the centroid data over that slice, and solving for t in
distance_from_mic(t) = speed_of_sound * (time_chirp_reached_mic - t)
to get the vocalization time.
This function returns NaN (not a number) if there isn't sufficient location data to determine the vocalization time.
Inputs:
chirp_start_index: index of the audio data that the chirp sequence started.mic: microphone that heard the chirp sequence.location_data: full centroid data, where the columns represent coordinates (x, y, z).mic_positions: matrix of microphone positions, where each row is a different microphone and the columns represent coordinates (x, y, z).buffer_time_ms(default: 100): radius, in milliseconds, of the slice of centroid data to examine.fs_video(default set inDefaults.jl): sampling rate of the centroid data.interp_type(default:QuadraticInterpolation): type of interpolation (from the packageDataInterpolations).
BatlabJuliaUtils.groupchirpsequencesbystarttime — Functiongroupchirpsequencesbystarttime(
chirp_sequence_bounds_per_mic::AbstractArray{Matrix{Int}},
snr::Matrix{Real}, y::Matrix{Real}, location_data::Matrix{Real},
mic_locations::Matrix{Real}, single_chirp_snr_thresh=100,
vocalization_start_tolerance_ms=1.5, any_mic_snr_thresh=45)
-> Vector{Dict{Int, ChirpSequence}}, Vector{Real}Given start and end indices of chirp sequences, for all microphones, determine which chirp sequences came from the same initial chirp. Only keep chirp sequences arising from vocalizations heard by at least two microphones.
Inputs:
chirp_sequence_bounds_per_mic: array of[chirp sequence indices for mic 1, ..., chirp sequence indices for mic 4]. "Chirp sequence indices for mic i" refers to a two-column matrix where the first column is the start indices of each chirp sequence and the second column is the corresponding end indices.snr: estimated SNR of the audio data, produced byestimatesnr.y: matrix of audio data, where each column is a different microphone.location_data: full centroid data, where the columns represent coordinates (x, y, z).mic_positions: matrix of microphone positions, where each row is a different microphone and the columns represent coordinates (x, y, z).single_mic_snr_thresh(default: 100): if a chirp sequence only has data from one microphone, still store the chirp sequence if it has a SNR over this value.any_mic_snr_thresh(default: 45): if none of the microphones have an SNR above this threshold, then we probably picked up an echo instead of a chirp, which should be disregarded.vocalization_start_tolerance_ms(default: 1.5): if the estimated vocalization time for two chirp sequences (for different microphones) is withinvocalization_start_tolerance_msmilliseconds, then they are considered to be from the same vocalization.
Output:
chirp_sequences: example form[{1 -> ChirpSequences(...), 2 -> ChirpSequence(...)}, {2-> ChirpSequence(...), 4 -> ChirpSequence(...), 3 -> ChirpSequence(...)}, ... ]Each element of thechirp_sequencesvector corresponds to all chirp sequences arising from a single vocalization. This is represented by a dictionary mapping microphone number toChirpSequencestructure.vocalization_times: vector, where each element is the estimated vocalization time for the corresponding chirp sequence.
BatlabJuliaUtils.plotchirpsequence — Functionfunction plotchirpsequence(chirp_seq::Dict{Int, ChirpSequence};
plot_separate=false, plot_spectrogram=false, same_length=true,
n_cols=true)Plots a chirp sequence on one of three formats:
- If
plot_separateandplot_spectrogramare bothfalse, it plots the data from all microphones in the same plot. - If
plot_spectrogramistrue, then it plots the spectrogram of the chirp sequence for each microphone (in separate plots). - Otherwise, if
plot_separateistrue, it plots the time-domain waveforms for each microphone (in separate plots).
Inputs:
chirp_seq_all_mics: dictionary mapping microphone number to aChirpSequenceobject.plot_separate,plot_spectrogram: descibed above.same_length(default: true): zero-pad shorter chirp sequences at the end so that all chirp sequences are the same length. Only relevant forplot_separateorplot_spectrogram.n_cols (default: 2): number of plots per row.
BatlabJuliaUtils.plotchirpsequenceboxes — Functionplotchirpsequenceboxes(start_ms::Real, stop_ms::Real,
vocalization_times::Array, chirp_sequences::Array{Dict{Int, ChirpSequence}},
y::Matrix{Real}, mics=1:size(y, 2))Plots audio data (specified by matrix y) from start_ms to stop_ms milliseconds, with boxes around all chirp sequences in that interval. Estimated vocalization times are written above all boxes.
Inputs:
start_ms: start time of the plot, in milliseconds.stop_ms: stop time of the plot, in milliseconds.vocalization_times: list of vocalization times output bygroupchirpsequencesbystarttime.chirp_sequences: list of mappings from microphone toChirpSequenceobject produced bygroupchirpsequencesbystarttime.y: matrix of microphone data, where each column is a different microphone.mics(default: all): microphones for which to plot chirp sequences.
"Melody"
We define the "melody" of the vocalization as the fundamental harmonic of the chirp.
This section contains documentation for estimating the melody, as well as some basic methods for separating chirps from echos.
BatlabJuliaUtils.findmelody — Functionfindmelody(chirp_seq_single_mic::ChirpSequence, peak_snr_thresh::Real;
find_highest_snr_in_first_ms=1, nfft=256,
bandpass_filter=(20_000, 100_000), maximum_melody_slope=5)
-> Vector{Int}Given a chirp sequence (for a single microphone), estimate the "melody" (i.e., the fundamental harmonic) of the vocalization using the spectrogram.
The melody is traced as follows:
- Find the time in the first
find_highest_snr_in_first_msmilliseconds of the chirp with the highest SNR, and find the melody of the chirp at that point (here, the SNR is hopefully high enough to accurately estimate the melody). - Work backwards until the beginning of the chirp, at each index looking for the strongest frequency within some small range of the last frequency found. This range is determined by the parameter
maximum_melody_slope. - Repeat, but this time work towards the end of the chirp. To avoid picking up echos, enforce that, once the slope of the melody (with respect to time) becomes negative, it can never become positive.
After tracing the melody, we need to see if we found the fundamental harmonic or some higher harmonic. This is done by taking the loudest part of the melody and dividing the frequency by 2, 3, etc. until we go below 20 kHz. The fundamental harmonic is the lowest such frequency with power at most melody_drop_thresh_db below the loudest harmonic.
The melody is computed in terms of Fourier transform indices. To find the melody in Hertz, use the fftindextofrequency function or use findmelodyhertz instead.
Inputs:
chirp_seq_single_mic: input chirp sequence object (for a single microphone).peak_snr_thresh: threshold previously set for the peak SNR of a chirp sequence.find_highest_snr_in_first_ms(default: 1): described in step 1 above.nfft(default: 256): window size for computing the spectrogram.bandpass_filter(default:(20_000, 100_000)): before computing the melody, a band-pass filter is applied to the spectrogram.bandpass_filteris a tuple of the (lower cutoff, upper cutoff), in Hertz.maximum_melody_slope(default: 5): maximum amount, in Fourier transform indices, that that the melody is allowed to change from one index to the next.melody_drop_thresh_db(default: 20): used to find the fundamental harmonic; described above.
Output:
melody: described above.
BatlabJuliaUtils.findmelodyhertz — Functionfindmelodyhertz(chirp_seq_single_mic::ChirpSequence, peak_snr_thresh::Real;
nfft=256, melody_keyword_arguments...) -> Vector{Real}Same as findmelody, but converts the result to Hertz.
BatlabJuliaUtils.getharmonic — Functiongetharmonic(chirp_seq_single_mic::ChirpSequence, melody::AbstractArray{Int},
harmonic_num::Int; nfft=256, band_size=2) -> Vector{Int}Given the output of findmelody for chirp_seq_single_mic, find the harmonic given by harmonic_num. This is done by searching for the strongest frequencies in a region of radius band_size around harmonic_num times the melody.
Inputs:
chirp_seq_single_mic: input chirp sequence object (for a single microphone).melody: output offindmelody.harmonic_num: which harmonic to find (e.g., 2, 3, etc.)nfft(default: 256): window size for computing the spectrogram.band_size(default: 2): described above.
Outputs:
harmonic: array that contains the estimated harmonic, in Fourier transform indices.
BatlabJuliaUtils.smoothmelody — Functionsmoothmelody(melody::AbstractArray{Int}; filter_size=64)Smooths a melody by produced by findmelody by applying a moving average filter.
Inputs:
melody: output offindmelody.filter_size(default: 64): kernel size of the moving average filter.
Output:
melody, with a moving average filter applied, with each element rounded to the nearest integer.
BatlabJuliaUtils.estimatechirpbounds — Function estimatechirpbounds(chirp_seq_single_mic::ChirpSequence,
melody::Vector{Int}, peak_snr_thresh::Real; nfft=256,
bandpass_filter=(20_000, 100_000),melody_drop_thresh_db=20, melody_thresh_db_low=-20,moving_avg_size=10) -> IntGiven a chirp sequence object (from a single mic) and the melody estimated by findmelody, estimate the end index of the chirp (i.e., separate the chirp from the echos) as follows:
- Find the point where the melody is the strongest.
- Find the first index, after this point, where the melody strength drops over
melody_drop_thresh_dbdecibels from its peak value (if this cutoff value is belowmelody_thresh_db_low, we instead find where the melody strength goes belowmelody_thresh_db_low).- If the melody strength never drops below this threshold, then just return the last index of the chirp sequence.
- Apply a moving average filter to the melody strength.
- Apply the following heuristic:
- The end of the chirp is the first local minimum of the melody strength after the index from step 2, or the first time the melody strength dips below
melody_thresh_db_low, whichever comes first. - If neither event happens, return the last index of the chirp sequence.
- The end of the chirp is the first local minimum of the melody strength after the index from step 2, or the first time the melody strength dips below
Inputs:
chirp_seq_single_mic: input chirp sequence object (for a single microphone).melody: result offindmelody.peak_snr_thresh: threshold set for the peak SNR of a chirp sequence.use_second_harmonic_if_melody_starts_below(default: 35,000): if the beginning of the melody, in Hertz, is below this number, then the end probably goes below the range of the microphone. So, we want to check if using the second harmonic leads to a longer chip and, if so, use the second harmonic to estimate the chirp onset and offset.nfft(default: 256): window size for computing the spectrogram.bandpass_filter(default:(20_000, 100_000)): before computing the melody, a band-pass filter is applied to the spectrogram.bandpass_filteris a tuple of the (lower cutoff, upper cutoff), in Hertz.melody_drop_thresh_db,melody_thresh_db_low(defaults: 20, -20): described in step 2 above.melody_drop_thresh_db_start(default: 35): the start index of the chirp is computed as the first index where the melody strength is at most this amount lower than its maximum value.moving_avg_size: radius of the moving average filter from step 4 above.
Output:
chirp_start_index: estimated start index of the chirp, in indices since the start of the chirp sequence.chirp_end_index: estimated end index of the chirp, in indices since the start of the chirp sequence.
BatlabJuliaUtils.getchirpstartandendindices — Functiongetchirpstartandendindices(
chirp_sequence_all_mics::Dict{Int, ChirpSequence}, peak_snr_thresh::Real;
chirp_kwargs...) -> Dict{Int, Int}, Dict{Int, Int}Given a dictionary mapping microphones to ChirpSequence objects, run findmelody and estimatechirpbounds for each ChirpSequence and return two dictionaries mapping microphones to chirp start indices and chirp end indices, respectively. Indices are calculated in samples since the start of the chirp sequence (i.e., since the beginning of the mic_data array of the ChirpSequence object).
Inputs:
chirp_seq_all_mics: mapping of microphone index toChirpSequenceobject.peak_snr_thresh: threshold set for the peak SNR of a chirp sequence.chirp_kwargs: you can additionally pass in any keyword arguments for
findmelody and/or estimate_chirp_bounds.
Outputs:
chirp_starts: dictionary mapping microphone indices to their respective chirp start indices (in samples since the start of the chirp sequence).chirp_ends: dictionary mapping microphone indices to their respective chirp end indices (in samples since the start of the chirp sequence).
BatlabJuliaUtils.plotmelody — Functionplotmelody(chirp_seq_single_mic::ChirpSequence, melody::Vector{Int},
chirp_end=nothing; nfft=256, bandpass_filter=(20_000, 100_000),
melody_color="blue", end_color="cyan")Plots the melody estimated by findmelody, overlayed on the spectrogram of the chirp sequence. Optionally, plot a vertical line at the estimated end of the chirp sequence.
Inputs:
chirp_seq_single_mic: input chirp sequence object (for a single microphone).melody: result offindmelody.chirp_end(default:nothing): optionally, result ofestimatechirpend. Ifchirp_endis nothing, then no vertical line is plotted.nfft(default: 256): window size for computing the spectrogram.bandpass_filter(default:(20_000, 100_000)): before computing the melody, a band-pass filter is applied to the spectrogram.bandpass_filteris a tuple of the (lower cutoff, upper cutoff), in Hertz.- melody_color (default: "blue"): color used to plot the melody.
- end_color (default: "cyan"): color used to plot the vertical line at the end of the chirp sequence.
BatlabJuliaUtils.plotmelodydb — Functionplotmelodydb(chirp_seq_single_mic::ChirpSequence, melody::Vector{Int};
nfft=256, bandpass_filter=(20_000, 100_000))Plots the strength of the melody estimated by findmelody, in decibels.
Inputs:
chirp_seq_single_mic: input chirp sequence object (for a single microphone).melody: result offindmelody.nfft(default: 256): window size for computing the spectrogram.bandpass_filter(default:(20_000, 100_000)): before computing the melody, a band-pass filter is applied to the spectrogram.bandpass_filteris a tuple of the (lower cutoff, upper cutoff), in Hertz.
BatlabJuliaUtils.estimatechirp — Functionestimatechirp(chirp_seq_single_mic::ChirpSequence, peak_snr_thresh::Real;
chirp_kwargs...) -> Int, Vector{Real}Use estimatechirpend to separate the chirp from the echos (for a single ChirpSequence object) by returning the chirp sequence up until the estimated end index.
Inputs:
chirp_seq_single_mic: input chirp sequence object (for a single microphone).peak_snr_thresh: threshold set for the peak SNR of a chirp sequence.chirp_kwargs: you can additionally pass in any keyword arguments forfindmelodyand/orestimate_chirp_bounds.
Output:
chirp_start: index of the chirp start, in samples since the beginning of the chirp sequence.chirp_est: audio data of the estimated chirp.
BatlabJuliaUtils.plotestimatedchirps — Functionplotestimatedchirps(chirp_seq_all_mics::Dict{Int, ChirpSequence},
peak_snr_thresh::Real; nfft=256, bandpass_filter=(20_000, 100_000),
maximum_melody_slope=5, melody_drop_thresh_db=20,
melody_thresh_db_low=-20, moving_avg_size=10)Plots the result of estimatechirp, for every microphone present in chirp_seq_all_mics, by plotting the estimated chirp on top of the chirp sequence.
Inputs:
chirp_seq_all_mics: mapping of microphone index toChirpSequenceobject.same_length(default:true): whether to zero-pad the ends of chirp sequences so that all of them are the same length.- Rest of the arguments: see
estimatechirp.
BatlabJuliaUtils.computemelodyoffsets — Functioncomputemelodyoffsets(chirp_sequence_all_mics::Dict{Int, ChirpSequence},
peak_snr_thresh::Real; max_offset=500, max_negative_offset=100,
tolerance=1, chirp_kwargs...)Sometimes, especially for noisy data, the beginnings of the chirps get cut off. This function takes in data from all (high-snr) microphones for a chirp sequence and estimates how many samples were cut off from the beginning of each chirp.
It does this by:
- Finding the mic with the highest SNR. This mic will be used as reference.
- Finding the melody and chirp lengths for each microphone.
- Shifting the chirps for the non-reference microphones forward and backward in time until we find the shift that minimizes the distance between that chirp's melody and the reference microphone's melody.
This ensures that we can "align" all of the chirps.
Inputs:
chirp_seq_all_mics: mapping of microphone index toChirpSequenceobject.peak_snr_thresh: threshold set for the peak SNR of a chirp sequence.max_offset(default: 500): maximum shift forward to try, in audio samples (i.e., if the beginning of the chirp was cut off).max_negative_offset(default: 100): maximum shift backward to try, in audio samples (i.e., if there is some noise picked up before the beginning of the actual chirp). This should be relatively small to avoid erroneously cutting off the beginnings of chirps.tolerance(default: 1): if the offset found in step 3 does not decrease the mean absolute error of the difference between the reference melody and the given microphone's melody by at leasttolerance, then just set the offset to zero.chirp_kwargs: you can additionally pass in any keyword arguments forfindmelodyand/orestimate_chirp_bounds.
Output:
offsets: dictionary mapping microphone number to the best shift, in samples of audio data, to the number of samples cut off from the beginning of that microphone's chirp sequence. If the number of samples is negative, this corresponds to extra noise at the beginning of the chirp sequence.
BatlabJuliaUtils.plotoffsetchirps — Functionfunction plotoffsetchirps(chirpseqall_mics::Dict{Int, ChirpSequence}, offsets::Dict{Int, Int})
Plot a chirp sequence, where the data from each microphone is shifted by the value found in computemelodyoffsets. Plots spectrograms in a vertical layout so that you can visially see if the chirps are aligned with each other.
Inputs:
chirp_seq_all_mics: mapping of microphone index toChirpSequenceobject.offsets: output ofcomputemelodyoffsets.start_idxs: dictionary mapping microphone indices to their respective chirp start indices (in samples since the start of the chirp sequence). You can get this usinggetchirpstartandendindices.
BatlabJuliaUtils.separatechirpkwargs — Functionseparatechirpkwargs(;chirp_kwargs...) -> Dict{Symbol, Any},
Dict{Symbol, Any},
Dict{Symbol, Any}Given keyword arguments from a combination of findmelody, estimatechirpbounds, and computemelodyoffsets, separate them into three dictionaries: one for each of those functions.
Optimization
This section contains helper methods for performing blind deconvolution on a chirp sequence to estimate the initial bat vocalization.
BatlabJuliaUtils.colwisenormalize — Functioncolwisenormalize(Y::Matrix) -> MatrixNormalizes each column of matrix Y such that each column has an amplitude (i.e., maximum absolute value) of 1.
Inputs:
Y: matrix, where each column is a different channel of data.
Outputs:
Y, with each column divided by its amplitude.
BatlabJuliaUtils.getchirpsequenceY — FunctiongetchirpsequenceY(chirp_seq_all_mics::Dict{Int, ChirpSequence},
offsets::Dict{Int, Int}, pad_len::Int; normalize=true)
-> Matrix{Real}, Vector{Int}Given chirp sequence objects (for all microphones) corresponding to the same initial vocalization, produce a matrix of microphone data, where each column is a different microphone. The microphone data is zero-padded at the beginning and the end by pad_len zeros. In addition, if the beginning of any chirp was cut off (as determined by offsets, which is the output of computemelodyoffsets), the beginning is zero-padded by the number of samples that were cut off. By default, each column of the output is normalized to have amplitude 1.
Inputs:
chirp_seq_all_mics: mapping of microphone index toChirpSequenceobject.offsets: output ofcomputemelodyoffsets; mapping of microphone index to how many samples were cut off (if any) at the beginning of the chirp.pad_len: how many zeros to add to the beginning and end of the mic data. This is used to mitigate edge effects from circular convolution.normalize(default:true): whether to ensure each column of the output has amplitude1.
Outputs:
Y: L x K matrix, where L is the length of the longest chirp sequene inchirp_seq_all_mics, plus2*pad_lenandKis the number of microphones used.mics: list of microphone number, where the microphone number at each index is the microphone used for the corresponding column ofY.
getchirpsequenceY(chirp_seq_all_mics::Dict{Int, ChirpSequence},
peak_snr_thresh::Real, pad_len::Int; normalize=true,
offset_kwargs...) -> Matrix{Real}, Vector{Int}Like the above function (getchirpsequenceY(chirp_seq_all_mics, offsets, pad_len, normalize)), except the offsets are automatically computed using computemelodyoffsets.
Inputs:
chirp_seq_all_mics: mapping of microphone index toChirpSequenceobject.peak_snr_thresh: threshold set for the peak SNR of a chirp sequence.pad_len: how many zeros to add to the beginning and end of the mic data. This is used to mitigate edge effects from circular convolution.normalize(default:true): whether to ensure each column of the output has amplitude1.offset_kwargs: optionally, you can pass in any keyword arguments forcomputemelodyoffsets.
Outputs:
Y: L x K matrix, where L is the length of the longest chirp sequene inchirp_seq_all_mics, plus2*pad_lenandKis the number of microphones used.mics: list of microphone number, where the microphone number at each index is the microphone used for the corresponding column ofY.
getchirpsequenceY(chirp_seq_all_mics::Dict{Int, ChirpSequence},
pad_len::Int; normalize=true) -> Matrix{Real}, Vector{Int}Like the above function (getchirpsequenceY(chirp_seq_all_mics, offsets, pad_len, normalize)), except the offsets are all set to zero.
Inputs
chirp_seq_all_mics: mapping of microphone index toChirpSequenceobject.pad_len: how many zeros to add to the beginning and end of the mic data. This is used to mitigate edge effects from circular convolution.normalize(default:true): whether to ensure each column of the output has amplitude1.
Outputs:
Y: L x K matrix, where L is the length of the longest chirp sequene inchirp_seq_all_mics, plus2*pad_lenandKis the number of microphones used.mics: list of microphone number, where the microphone number at each index is the microphone used for the corresponding column ofY.
BatlabJuliaUtils.getinitialconditionsnr — Functiongetinitialconditionsnr(Y::AbstractArray,
chirp_seq_all_mics::Dict{Int, ChirpSequence}, mics::Vector{Int},
peak_snr_thresh::Real; h_fft_thresh=0.1, nfft=256, chirp_kwargs...)
-> Vector{Real}, Matrix{Real}, IntProduces an initial condition for the blind deconvolution algorithm (i.e.: an estimate of the bat chirp and the impulse responses mapping the bat chirp to the audio output of each microphone). It uses the chirp estimate (from estimatechirp) for the highest-SNR microphone as the estimate of the bat chirp, and performs Fourier deconvolution (see the deconvolve function) to get the impulse responses.
This is the preferred method of obtaining the initial condition for noisy data.
Inputs:
Y: matrix of audio data, produced bygetchirpsequenceY.chirp_seq_all_mics: mapping of microphone index toChirpSequenceobject.mics: list of microphones used, produced bygetchirpsequenceY.peak_snr_thresh: threshold set for the peak SNR of a chirp sequence.h_fft_thresh(default: 0.1): the Fourier transform ofH_initis set to zero at indices where the FFT ofX_initis below this threshold.nfft: window size of the STFT used to get the melody.chirp_kwargs: you can pass in any keywords forestimatechirp.
Outputs:
X_init: estimated bat vocalization.H_init: matrix of estimated impulse responses, the k-th column ofH_init, convolved withX_init, produces the k-th column ofY.longest_chirp: length of the longest estimated chirp.
BatlabJuliaUtils.getinitialconditionsparsity — Functiongetinitialconditionsparsity(Y::Matrix{Real},
chirp_seq_all_mics::Dict{Int, ChirpSequence}, mics::Vector{Int},
peak_snr_thresh::Real; h_fft_thresh=0.1, nfft=256, chirp_kwargs...)
-> Vector{Real}, Matrix{Real}, IntProduces an initial condition for the blind deconvolution algorithm (i.e.: an estimate of the bat chirp and the impulse responses mapping the bat chirp to the audio output of each microphone). It uses the chirp estimate (from estimatechirp) for one of the microphones as the estimate of the bat chirp, and performs Fourier deconvolution (see the deconvolve function) to get the impulse responses. The chirp is estimated using whichever microphone produces the sparserst impulse response.
This is not recommended for noisy data; getinitialconditionsnr is better for that case.
Inputs:
Y: matrix of audio data, produced bygetchirpsequenceY.chirp_seq_all_mics: mapping of microphone index toChirpSequenceobject.mics: list of microphones used, produced bygetchirpsequenceY.peak_snr_thresh: threshold set for the peak SNR of a chirp sequence.h_fft_thresh(default: 0.1): the Fourier transform ofH_initis set to zero at indices where the FFT ofX_initis below this threshold.nfft: window size of the STFT used to get the melody.chirp_kwargs: you can pass in any keywords forestimatechirp.
Outputs:
X_init: estimated bat vocalization.nfft/2H_init: matrix of estimated impulse responses, the k-th column ofH_init, convolved withX_init, produces the k-th column ofY.longest_chirp: length of the longest estimated chirp.
BatlabJuliaUtils.getmelodyregularization — Functionfunction getmelodyregularization(chirpseqallmics::Dict{Int, ChirpSequence}, N::Int, peaksnrthresh::Real; melodyradius = 2, nfft=256, stftstride=Int(floor(nfft/4)), maxfreqhz=100000, chirp_kwargs...) -> Matrix
Produces a weight matrix (of the same dimensions of the STFT of a bat vocalization, where the length of the vocalization is length of the longest estimated chirp for chirp_seq_all_mics) that quantifies how far any part of the spectrogram is from the melody or one of its harmonics.
Details of the algorithm are as follows:
Initialize
Mx2, the element-wise square of the weight matrix, to be all infinity.For each microphone in
chirp_seq_all_mics: -Find the melody, chirp start/end indices, and whether the beginning of the chirp was cut off.Loop through all harmonics under
max_freq_hz:Find the distance of each point on the spectrogram to some range around the melody. The radius of this range is either,
melody_radius, which is found by linearly interpolatingmelody_radius_startandmelody_radius_end, or the slope of the melody at the given point, whichever is larger.This provides some leeway in where the actual melody falls, where the leeway is determined by how fast the melody is changing.
Set
Mx2to the element-wise minimum of its current value and the squared distance from step i.
Truncate
Mx2to the maximum estimated chirp length.Downsample
Mx2by a factor ofstft_stride, setting each index of the downsampled version to the minimum value in a time radius ofnfft/2.
Inputs:
chirp_seq_all_mics: mapping of microphone index toChirpSequenceobject.N: upper bound on the length of the bat vocalization.peak_snr_thresh: threshold set for the peak SNR of a chirp sequence.melody_radius_start,melody_radius_end(defaults: 6, 1): described in step 2.b.i above.nfft(default: 256): window length to use for the STFT.stft_stride(default:nfft/4): amount to downsample in step 4 above.max_freq_hz(default: 100k): used to find the highest possible harmonic.chirp_kwargs: optionally, you can pass in any keyward arguments forfindmelody,estimatechirpbounds,computemelodyoffsets, orgetmelodyregularization. See those functions for more detauls
Outputs:
Mx2: weight matrix, described above.max_chirp_len: maximum chirp length fromestimatechirp.
BatlabJuliaUtils.optimizePALM — FunctionoptimizePALM(chirp_seq::Dict{Int, ChirpSequence}, Y::Matrix{Real},
H_init::Matrix{Real}, X_init::Array{Real}, peak_snr_thresh::Real,
data_fitting_weight::Real, sparsity_weight::Real, melody_weight::Real;
max_iter=1000, alpha=1e-3, gamma_H=1, gamma_X=1, num_debug=10, nfft=256,
stft_stride=Int(floor(nfft/4)), chirp_kwargs...) -> Matrix, Matrix, IntPerforms blind deconvolution, formulated as an optimization problem with a combination of the following objectives:
- Data-fitting: making sure
X, convolved with any column ofH, the matrix of impulse responses, is close to the corresponding column ofY, the microphone data marix produced bygetchirpsequenceY. - Impulse response sparsity: encouraging sparsity of
H, measured by the absolute sum (L1 norm) of the elements ofH. - Melody following: making sure the energy of the spectrogram of
Xis close to the melody or one of its harmonics. Seegetmelodyregularizationfor more details.
This optimization problem is solved using PALM (proximal alternating linearized optimization: paper), following the general process outlined in this paper.
Inputs:
chirp_seq: mapping of microphone index toChirpSequenceobject.Y: matrix of microphone data; output ofgetchirpsequenceY.H_init: initial condition for the impulse responses; output ofgetinitialconditionsnrorgetinitialconditionsparsity.X_init: initial condition for the bat vocalization. Produced by the same function asH_init.peak_snr_thresh: threshold set for the peak SNR of a chirp sequence.data_fitting_weight: weight of the data-fitting term in the optimization objective. Can be thought of as a percentage of how much that term matters (60 is a generally good value).sparsity_weight: weight of the sparsity term in the objective. 10 is a generally good value.melody_weight: weight of the melody-following term in the objective. 50 is a generally good value.max_iter(default: 1000): number of iterations to run the algorithm for.alpha(default: 0.001): the absolute sum of the elements ofHis approximated using the smooth function|x| ≈ sqrt(x^2 + alpha^2) - alpha, wherealphais some small numbergamma_H(default: 1): factor to slow down updates ofH(must be at least 1, or the algorithm might not converge).gamma_X(default: 1): factor to slow down updates ofX(must be at least 1, or the algorithm might not converge).num_debug(default: 10): number of times to print debug statements over the course of the optimization algorithm.nfft(default: 256): window length to use for the spectrogram ofX.stft_stride(default:nfft/4): spacing between adjacent spectrogram windows.chirp_kwargs: optionally, you can pass in any keyward arguments forfindmelody,estimatechirpbounds, orcomputemelodyoffsets.
Outputs:
X: bat vocalization, as estimated by the optimization algorithm,H: matrix of impulse responses, as estimated by the optimization algorithm.max_chirp_len: maximum chirp length fromestimatechirp.
Miscellaneous
BatlabJuliaUtils.matrixtovector — Functionmatrixtovector(mat::AbstractArray)Given an input matrix, return the vector produced by stacking all of the columns together. If the input is already a vector, it remains unchanged.
Inputs:
mat: matrix to transform to a vector.
Output:
vec: e.g., ifmatis[1 2 3; 4 5 6], vec will be[1, 4, 2, 5, 3, 6].
BatlabJuliaUtils.vectortomatrix — Functionvectortomatrix(vec::AbstractArray)Given an input vector, return a matrix that consists of one column. If the input is already a matrix, it remains unchanged.
Inputs:
vec: vector to transform to a one-column matrix.
Output:
mat: single-column matrix.
BatlabJuliaUtils.randint — Functionrandint(end_idx::Int) -> IntReturns a random integer from 1 to end_idx, inclusive.
BatlabJuliaUtils.distancefrommic — Functiondistancefrommic(location_data::Vector{Real},
mic_positions::Matrix{Real}, mic_num::Int) -> RealGiven one frame of centroid data, compute the distance from mic mic_num.
Inputs:
location_data: one frame of centroid data, as a vector.mic_positions: matrix of microphone positions, where each row is a different microphone and the columns represent coordinates (x, y, z).mic_num: microphone for which to compute distances.
Output: distance from mic mic_num.
distancefrommic(location_data::Matrix{Real},
mic_positions::Matrix{Real}, mic_num::Int) -> Vector{Real}Given centroid data for multiple time points, compute the distance from mic mic_num (for each time point).
Inputs:
location_data: slice of centroid data, where the columns represent coordinates (x, y, z).mic_positions: matrix of microphone positions, where each row is a different microphone and the columns represent coordinates (x, y, z).mic_num: microphone for which to compute distances.
Output: vector of distance from mic mic_num, for each row of location_data.