Filtering and data processing#
Introduction#
The xoa.filter module provides comprehensive filtering and data processing
utilities for oceanographic and atmospheric data. It includes spatial filtering
with convolution kernels, specialized temporal filters for tidal signals, mask
erosion operations, and data decimation tools.
Key features:
Spatial filtering: N-dimensional convolution with NaN handling
Temporal filtering: Tidal filters (Demerliac, Godin) for time series
Mask operations: Iterative mask erosion and coast filling
Data decimation: Intelligent undersampling based on spatial proximity
Spatial filtering with convolution#
The core of spatial filtering in xoa is the convolve() function,
which performs N-dimensional convolution while properly handling missing data (NaNs).
Basic convolution#
In [1]: import xarray as xr
In [2]: import numpy as np
In [3]: import xoa.filter
# Create sample data with some noise
In [4]: np.random.seed(42)
In [5]: data = xr.DataArray(
...: np.random.normal(size=(50, 70)),
...: dims=('y', 'x'),
...: name='temperature'
...: )
...:
# Add some missing data
In [6]: data.values[10:20, 10:20] = np.nan
# Apply spatial smoothing with a simple box kernel
In [7]: smoothed = xoa.filter.convolve(data, kernel=5, normalize=True, na_thres=0.5)
The convolve function has important parameters:
normalize: If True, divides by local sum of weights (creates weighted average)
na_thres: Controls NaN contamination tolerance (0=strict, 1=permissive)
Smoothing shortcut#
For convenience, smooth() is a shortcut that automatically
sets normalize=True:
# Equivalent to convolve with normalize=True
In [8]: smoothed = xoa.filter.smooth(data, kernel=5, na_thres=0.5)
Window functions#
The get_window_func() function provides access to various
window functions from NumPy and SciPy:
# Get a Gaussian window function
In [9]: gaussian_func = xoa.filter.get_window_func("gaussian", std=0.2)
# Get a Hamming window
In [10]: hamming_func = xoa.filter.get_window_func("hamming")
# Create custom window from array
In [11]: custom_func = xoa.filter.get_window_func([1, 2, 5, 2, 1])
Available window types include:
NumPy windows:
bartlett,blackman,hamming,hanning,kaiserSciPy windows:
gaussian,tukey,bohman,parzen, and many moreCustom arrays: Provide your own weight array
Kernel generation#
Kernels can be specified in multiple ways:
1. Simple integer (box kernel):
# 5x5 box kernel
In [12]: smoothed = xoa.filter.smooth(data, kernel=5)
2. Dictionary for anisotropic kernels:
# Different sizes along different dimensions
In [13]: kernel_spec = {'x': 7, 'y': 3}
In [14]: smoothed = xoa.filter.smooth(data, kernel=kernel_spec)
3. Dictionary with a window function:
# Gaussian window along both dimensions
In [15]: kernel_spec = {'x': 9, 'y': 5}
In [16]: smoothed = xoa.filter.smooth(data, kernel=kernel_spec,
....: kernel_kwargs=dict(window_func='gaussian'))
....:
4. Explicit arrays:
# Custom 1D weights
In [17]: kernel_spec = {'x': [1, 2, 5, 2, 1], 'y': [1, 2, 1]}
In [18]: smoothed = xoa.filter.smooth(data, kernel=kernel_spec)
Isotropic kernels#
For isotropic (radially symmetric) kernels, use generate_isotropic_kernel():
# Create 2D isotropic Gaussian kernel
In [19]: iso_kernel = xoa.filter.generate_isotropic_kernel(
....: shape=(15, 15),
....: window_func='gaussian'
....: )
....:
In [20]: print(iso_kernel.shape)
(15, 15)
Shapiro kernel#
The Shapiro kernel is a specialized filter kernel commonly used in numerical models for smoothing:
# 2D Shapiro kernel
In [21]: shapiro = xoa.filter.shapiro_kernel(('y', 'x'))
In [22]: print(shapiro.values)
[[0. 1. 0.]
[1. 2. 1.]
[0. 1. 0.]]
# Apply Shapiro smoothing
In [23]: smoothed = xoa.filter.smooth(data, kernel=shapiro)
Temporal filtering for tidal signals#
Oceanographic time series often contain tidal signals that need to be filtered out.
The xoa.filter module provides specialized tidal filters.
Demerliac filter#
The Demerliac filter is a widely-used tidal filter designed to remove tidal components while preserving lower-frequency signals:
# Create hourly time series with tidal signal
In [24]: import pandas as pd
In [25]: time = pd.date_range('2020-01-01', periods=30*24, freq='h')
# Simulate signal: trend + tidal component + noise
In [26]: t_hours = np.arange(len(time))
In [27]: signal = (
....: 0.5 * t_hours / 24 + # trend
....: 0.3 * np.sin(2 * np.pi * t_hours / 12.42) + # M2 tide
....: 0.1 * np.random.normal(size=len(time)) # noise
....: )
....:
In [28]: sea_level = xr.DataArray(
....: signal,
....: coords={'time': time},
....: dims='time',
....: name='sea_level'
....: )
....:
# Apply Demerliac filter
In [29]: filtered = xoa.filter.demerliac(sea_level, na_thres=0.2)
The Demerliac filter uses pre-defined weights optimized for hourly data. It automatically handles sub-hourly data by interpolating the weights.
Other tidal filters#
The tidal_filter() function provides access to multiple
tidal filter types:
# Godin filter (alternative tidal filter)
In [30]: godin_filtered = xoa.filter.tidal_filter(sea_level, 'godin')
# Simple moving average (24-hour)
In [31]: mean_filtered = xoa.filter.tidal_filter(sea_level, 'mean')
Available filter types:
demerliac: Demerliac tidal filter (default for
demerliac())godin: Godin tidal filter
mean: Simple 24-hour moving average
Important notes for tidal filtering:
Data must have a valid time coordinate
Time step should be hourly or sub-hourly (sub-hourly is interpolated)
Time step should be relatively constant (checked with
dt_tolparameter)The
na_thresparameter controls tolerance for missing data in the time series
Mask operations#
Mask erosion and coast filling are important operations for preparing ocean model output or observational data.
Eroding masks#
The erode_mask() function iteratively fills masked values
using smoothing:
# Create data with a masked region
In [32]: data_with_mask = data.copy()
In [33]: data_with_mask.values[20:30, 30:40] = np.nan
# Erode mask by 3 iterations
In [34]: eroded = xoa.filter.erode_mask(data_with_mask, until=3)
The erosion process:
Smooth the data (using Shapiro kernel by default)
Fill NaN values with smoothed values
Repeat for specified number of iterations
You can also erode until a specific mask is satisfied:
# Define a target mask (True = should remain masked)
In [35]: target_mask = xr.DataArray(
....: np.zeros((50, 70), dtype=bool),
....: dims=('y', 'x')
....: )
....:
In [36]: target_mask.values[25:27, 35:37] = True # Keep small region masked
# Erode until only target_mask regions remain masked
In [37]: eroded = xoa.filter.erode_mask(data_with_mask, until=target_mask)
Coast erosion#
The erode_coast() function is specialized for horizontal
(geographic) dimensions and automatically identifies X and Y dimensions:
# Create data with land mask
In [38]: lon = xr.DataArray(np.linspace(-10, 0, 70), dims='lon')
In [39]: lat = xr.DataArray(np.linspace(40, 50, 50), dims='lat')
In [40]: sst = xr.DataArray(
....: np.random.normal(15, 2, size=(50, 70)),
....: coords={'lat': lat, 'lon': lon},
....: dims=('lat', 'lon'),
....: name='sst'
....: )
....:
# Add coastal mask
In [41]: sst.values[:, :10] = np.nan # "land" on the left
# Fill coastal regions
In [42]: sst_filled = xoa.filter.erode_coast(sst, until=5)
The function automatically:
Identifies longitude and latitude dimensions using CF conventions
Uses Shapiro kernel on horizontal dimensions only
Preserves vertical or temporal dimensions
Data decimation#
When working with very dense observational data (e.g., satellite altimetry,
drifter trajectories), it’s often necessary to reduce the number of points
while preserving spatial coverage. The decimate() function
provides intelligent spatial undersampling.
Basic decimation#
# Create dense random points
In [43]: np.random.seed(123)
In [44]: npts = 500
In [45]: lons = np.random.uniform(-20, -10, npts)
In [46]: lats = np.random.uniform(40, 50, npts)
In [47]: values = np.random.normal(15, 2, npts)
In [48]: dense_data = xr.Dataset({
....: 'temp': (['npts'], values)
....: }, coords={
....: 'lon': (['npts'], lons),
....: 'lat': (['npts'], lats)
....: })
....:
# Decimate to ~50km spacing
In [49]: sparse_data = xoa.filter.decimate(dense_data, radius=50e3, method='pick')
In [50]: print(f"Original points: {npts}")
Original points: 500
In [51]: print(f"Decimated points: {sparse_data.sizes['npts']}")
Decimated points: 155
Decimation methods#
Two methods are available:
1. Pick method (method=’pick’): Simply selects points ensuring minimum spacing
In [52]: decimated_pick = xoa.filter.decimate(dense_data, radius=50e3, method='pick')
2. Average method (method=’average’): At each selected point, averages all nearby points within a radius
# Average with smoothing factor
In [53]: decimated_avg = xoa.filter.decimate(
....: dense_data,
....: radius=50e3,
....: method='average',
....: smooth_factor=1.5 # Average over 75km radius
....: )
....:
The smooth_factor parameter controls the averaging radius:
smooth_factor=0: Equivalent to ‘pick’ methodsmooth_factor=1: Average over same radius as spacingsmooth_factor>1: Larger averaging radius (smoother result)
Typical use cases#
1. Preparing data for kriging:
# Reduce dense satellite data before kriging
satellite_decimated = xoa.filter.decimate(
satellite_obs,
radius=25e3, # 25 km spacing
method='average',
smooth_factor=1.0
)
2. Visualization of dense trajectories:
# Thin drifter trajectories for clearer plotting
drifter_thinned = xoa.filter.decimate(
drifter_positions,
radius=10e3, # 10 km spacing
method='pick'
)
3. Reducing computational cost:
# Subsample before expensive geostatistical processing
data_subset = xoa.filter.decimate(
observations,
radius=100e3, # 100 km spacing
method='average',
smooth_factor=2.0 # Smooth over 200 km
)
Best practices and tips#
Handling missing data#
The na_thres parameter is critical for controlling how NaNs propagate:
# Strict: output masked if any input is masked
strict = xoa.filter.smooth(data, kernel=5, na_thres=0)
# Moderate: output masked if >50% input is masked
moderate = xoa.filter.smooth(data, kernel=5, na_thres=0.5)
# Permissive: output masked only if all input is masked
permissive = xoa.filter.smooth(data, kernel=5, na_thres=1.0)
Choosing kernel sizes#
General guidelines:
Smaller kernels (3-5): Preserve features, less smoothing
Medium kernels (7-15): Balance smoothing and feature preservation
Larger kernels (>15): Heavy smoothing, removes fine structure
For anisotropic smoothing:
# Smooth more along one direction
kernel = {'x': 15, 'y': 5} # More smoothing in x-direction
smoothed = xoa.filter.smooth(data, kernel=kernel)
Computational considerations#
Convolution is computationally intensive for large kernels
Consider using FFT-based methods for very large kernels (not yet in xoa)
For repeated filtering, pre-generate kernels:
# Generate kernel once
kernel = xoa.filter.generate_kernel({'x': 11, 'y': 11}, data, window_func='gaussian')
# Reuse for multiple datasets
smoothed1 = xoa.filter.smooth(data1, kernel=kernel)
smoothed2 = xoa.filter.smooth(data2, kernel=kernel)
Working with multi-dimensional data#
Filtering works seamlessly with multi-dimensional arrays:
# 4D data: time, depth, lat, lon
kernel_4d = {
'time': 3, # Light temporal smoothing
'depth': 1, # No vertical smoothing
'lat': 5, # Spatial smoothing
'lon': 5
}
smoothed_4d = xoa.filter.smooth(data_4d, kernel=kernel_4d)
See also#
xoa.filter: Complete filtering module referencexoa.regrid: For interpolation and regridding operationsscipy.signal: Underlying signal processing functionsscipy.ndimage: N-dimensional image processing functions