Metadata, CF and naming conventions#

Warning

Have a quick look at the appendix The default configuration before going any further!

Introduction#

The xoa.meta subpackage is an application and extension of a subset of the CF conventions in application to naming conventions. It has two main intents:

  • Searching xarray.DataArray variables or coordinates within another xarray.DataArray or a xarray.Dataset, by scanning name and attributes like units and standard_name.

  • Formatting xarray.DataArray variables or coordinates with unique name, and standard_name, long_name and units attributes, with support of staggered grid location syntax.

The module offers capabilities for the user to extend and specialize default behaviors for user’s special datasets.

xoa provides ready-to-use configurations for decoding and encoding a few standard dataset types. Look at Specialized configurations.

Note

This module shares common features with the excellent and long awaited cf_xarray package and started a long time before with the Vacumm package. The most notable differences include:

  • The current module is also designed for data variables and dimensions, not only coordinates.

  • It searches for items not only using standard_names but also specialized names. It means that it works even with datasets that are not well formatted.

  • It is not only available as accessors, but also as independent specification objects that can be configured by the user for each type of dataset.

Quick start with accessors#

The easiest way to use xoa.meta is through xarray accessors. Let’s set up a sample dataset and register the accessors:

In [1]: from xoa import meta

In [2]: meta_specs = meta.get_meta_specs()
In [3]: nx = 3

In [4]: lon = xr.DataArray(np.arange(3, dtype='d'), dims='mylon',
   ...:     attrs={'standard_name': 'longitude'})
   ...: 

In [5]: temp = xr.DataArray(np.arange(20, 23, dtype='d'), dims='mylon',
   ...:     coords={'mylon': lon},
   ...:     attrs={'standard_name': 'sea_water_temperature'})
   ...: 

In [6]: sal = xr.DataArray(np.arange(33, 36, dtype='d'), dims='mylon',
   ...:     coords={'mylon': lon},
   ...:     attrs={'standard_name': 'sea_water_salinity'})
   ...: 

In [7]: ds = xr.Dataset({'mytemp': temp, 'mysal': sal})
In [8]: import xoa

In [9]: xoa.register_accessors()

All these arrays are CF compliant according to their standard_name attribute, despite their names not being very explicit. The accessors make it easy to find, format and chain operations:

In [10]: temp
Out[10]: 
<xarray.DataArray (mylon: 3)> Size: 24B
array([20., 21., 22.])
Coordinates:
  * mylon    (mylon) float64 24B 0.0 1.0 2.0
Attributes:
    standard_name:  sea_water_temperature

In [11]: temp.meta.get("lon") # access by .get
Out[11]: 
<xarray.DataArray 'mylon' (mylon: 3)> Size: 24B
array([0., 1., 2.])
Coordinates:
  * mylon    (mylon) float64 24B 0.0 1.0 2.0
Attributes:
    standard_name:  longitude

In [12]: ds.meta.get("temp") # access by .get
Out[12]: 
<xarray.DataArray 'mytemp' (mylon: 3)> Size: 24B
array([20., 21., 22.])
Coordinates:
  * mylon    (mylon) float64 24B 0.0 1.0 2.0
Attributes:
    standard_name:  sea_water_temperature

In [13]: ds.meta.lon # access by attribute
Out[13]: 
<xarray.DataArray 'mylon' (mylon: 3)> Size: 24B
array([0., 1., 2.])
Coordinates:
  * mylon    (mylon) float64 24B 0.0 1.0 2.0
Attributes:
    standard_name:  longitude

In [14]: ds.meta.coords.lon  # specific search = ds.meta.coords.get("lon")
Out[14]: 
<xarray.DataArray 'mylon' (mylon: 3)> Size: 24B
array([0., 1., 2.])
Coordinates:
  * mylon    (mylon) float64 24B 0.0 1.0 2.0
Attributes:
    standard_name:  longitude

In [15]: ds.meta.temp # access by attribute
Out[15]: 
<xarray.DataArray 'mytemp' (mylon: 3)> Size: 24B
array([20., 21., 22.])
Coordinates:
  * mylon    (mylon) float64 24B 0.0 1.0 2.0
Attributes:
    standard_name:  sea_water_temperature

In [16]: ds.meta["temp"].name # access by item
Out[16]: 'mytemp'

In [17]: ds.meta.data_vars.temp.name  # specific search = ds.meta.data_vars.get("temp")
Out[17]: 'mytemp'

In [18]: ds.meta.data_vars.bathy is None # returns None when not found
Out[18]: True

In [19]: ds.meta.temp.meta.lon.name  # chaining
Out[19]: 'mylon'

In [20]: ds.meta.temp.meta.name # generic meta name, not real name
Out[20]: 'temp'

In [21]: ds.meta.temp.meta.attrs # attributes, merged with CF attrs
Out[21]: 
{'standard_name': 'sea_water_temperature',
 'long_name': 'Sea water in situ temperature',
 'units': 'degrees_celsius'}

In [22]: ds.meta.temp.meta.standard_name # single attribute
Out[22]: 'sea_water_temperature'

In [23]: ds.mytemp.meta.auto_format()
Out[23]: 
<xarray.DataArray 'temp' (lon: 3)> Size: 24B
array([20., 21., 22.])
Coordinates:
  * lon      (lon) float64 24B 0.0 1.0 2.0
Attributes:
    standard_name:  sea_water_temperature
    long_name:      Sea water in situ temperature
    units:          degrees_celsius

In [24]: ds.meta.auto_format()
Out[24]: 
<xarray.Dataset> Size: 72B
Dimensions:  (lon: 3)
Coordinates:
  * lon      (lon) float64 24B 0.0 1.0 2.0
Data variables:
    temp     (lon) float64 24B 20.0 21.0 22.0
    sal      (lon) float64 24B 33.0 34.0 35.0

Accessing an accessor attribute or item makes an implicit call to get(). The accessor gives access to two sub-accessors, data_vars and coords, for specializing the searches.

See also

xoa.accessors.MetaDataArrayAccessor xoa.accessors.MetaDatasetAccessor

Searching and matching#

Under the hood, accessor lookups are powered by the MetaSpecs methods. You can use them directly.

Check if arrays match known or explicit meta items:

In [25]: meta_specs.coords.match(lon, "lon") # explicit
Out[25]: True

In [26]: meta_specs.coords.match(lon, "lat") # explicit
Out[26]: False

In [27]: meta_specs.coords.match(lon) # any known
Out[27]: 'lon'

In [28]: meta_specs.data_vars.match(temp) # any known
Out[28]: 'temp'

In [29]: meta_specs.data_vars.match(sal) # any known
Out[29]: 'sal'

Search for known meta items:

In [30]: mytemp = meta_specs.search(ds, "temp")

In [31]: mylon = meta_specs.search(mytemp, "lon")

Datasets are searched for data variables (“data_vars”) and data variables are searched for coordinates (“coords”). You can also search for coordinates in datasets, for instance like this:

In [32]: meta_specs.coords.search(ds, "lon")
Out[32]: 
<xarray.DataArray 'mylon' (mylon: 3)> Size: 24B
array([0., 1., 2.])
Coordinates:
  * mylon    (mylon) float64 24B 0.0 1.0 2.0
Attributes:
    standard_name:  longitude

See also

  • Meta items: lon lat temp sal

  • Methods: xoa.meta.MetaCoordSpecs.match() xoa.meta.MetaVarSpecs.match() xoa.meta.general.MetaSpecs.search() xoa.meta.MetaCoordSpecs.search() xoa.meta.MetaVarSpecs.search()

Formatting, encoding and decoding#

The idea#

Formatting means changing or setting names and attributes. It is possible to format, or even auto-format data variables and coordinates.

During an auto-formatting, each array is matched against meta specs, and the array is formatted when a matching is successful. If the array contains coordinates, the same process is applied on them, as soon as the format_coords keyword is True. If the name key is set in the matching item specs, its value can be used to name the variable or coordinate array, else the generic name is used like "lon".

Explicit formatting:

In [33]: meta_specs.format_coord(lon, "lon")
Out[33]: 
<xarray.DataArray 'lon' (x: 3)> Size: 24B
array([0., 1., 2.])
Dimensions without coordinates: x
Attributes:
    standard_name:  longitude
    long_name:      Longitude
    units:          degrees_east
    axis:           X

In [34]: meta_specs.format_data_var(temp, "temp")
Out[34]: 
<xarray.DataArray 'temp' (lon: 3)> Size: 24B
array([20., 21., 22.])
Coordinates:
  * lon      (lon) float64 24B 0.0 1.0 2.0
Attributes:
    standard_name:  sea_water_temperature
    long_name:      Sea water in situ temperature
    units:          degrees_celsius

Auto-formatting:

In [35]: ds2 = meta_specs.auto_format(ds)

In [36]: ds2.temp
Out[36]: 
<xarray.DataArray 'temp' (lon: 3)> Size: 24B
array([20., 21., 22.])
Coordinates:
  * lon      (lon) float64 24B 0.0 1.0 2.0
Attributes:
    standard_name:  sea_water_temperature
    long_name:      Sea water in situ temperature
    units:          degrees_celsius

In [37]: ds2.lon
Out[37]: 
<xarray.DataArray 'lon' (lon: 3)> Size: 24B
array([0., 1., 2.])
Coordinates:
  * lon      (lon) float64 24B 0.0 1.0 2.0
Attributes:
    standard_name:  longitude
    long_name:      Longitude
    units:          degrees_east
    axis:           X

It can be applied to a data array or a full dataset.

See also

xoa.meta.general.MetaSpecs.format_coord() xoa.meta.general.MetaSpecs.format_data_var() xoa.meta.general.MetaSpecs.auto_format()

Encoding and decoding#

By default, formatting renames known arrays to their generic name, like “temp” in the example above. We speak here of encoding. If the specialize keyword is set to True, arrays are renamed with their specialized name if set in the specs with the name option. We speak here of decoding. Two shortcut methods exist for these tasks:

  • Decoding: decode()

  • Encoding: encode()

Chaining the two methods should lead to the initial dataset or data array. See the last section of this page for an example: Example: decoding/encoding Croco model outputs.

See also

xoa.meta.general.MetaSpecs.decode() xoa.meta.general.MetaSpecs.encode()

Staggered grid locations#

A MetaSpecs instance comes with a SGLocator that is accessible through the sglocator attribute. A SGLocator helps parsing and formatting staggered grid location from name, standard_name and long_name data array attributes. It is configured at the sglocator section in which you can specify the format of the name (name_format and the allowed location names allowed_locations.

In [38]: sglocator = meta_specs.sglocator

In [39]: sglocator.formats
Out[39]: 
{'name': '{root}_{loc}',
 'standard_name': '{root}_at_{loc}_location',
 'long_name': '{root} at {loc} location'}

In [40]: sglocator.update(name_format="{root}_{loc}")

In [41]: sglocator.format_attr("name", "lon", "rho")
Out[41]: 'lon_rho'

In [42]: sglocator.format_attr("standard_name", "lon", "rho")
Out[42]: 'lon_at_rho_location'

In [43]: sglocator.format_attr("long_name", "lon", "rho")
Out[43]: 'Lon at RHO location'

In [44]: sglocator.parse_attr("name", "lon_rho")
Out[44]: ('lon', 'rho')

Understanding the specifications#

Scanning and formatting actions are based on specifications, and this module natively includes a default configuration for various oceanographic, sea surface and atmospheric variables and coordinates. A distinction is made between data variables (data_vars) and coordinates (coords), like in xarray.

Getting the current specifications for data variables and coordinates with the get_meta_specs() function:

In [45]: meta_specs["data_vars"] is meta_specs.data_vars
Out[45]: True

In [46]: meta_specs["coords"] is meta_specs.coords
Out[46]: True

In [47]: meta_specs.data_vars.names[:3]
Out[47]: ['sal', 'ptemp', 'ssh']

In [48]: meta_specs.coords.names[:3]
Out[48]: ['x', 'y', 'sig']

See the appendix The default configuration for the list of available default specifications.

An instance of the MetaSpecs has other configuration sections than data_vars, coords and dims: registration, sglocator, vertical and accessors.

Data variables#

Here is the example of the sst data variable:

In [49]: from pprint import pprint

In [50]: pprint(meta_specs.data_vars['sst'])
{'add_coords_loc': {},
 'add_loc': False,
 'alt_names': [],
 'attrs': {'long_name': ['Sea surface temperature'],
           'standard_name': ['sea_surface_temperature',
                             'surface_sea_water_temperature'],
           'units': ['degrees_celsius']},
 'cmap': 'cmo.thermal',
 'exclude': False,
 'inherit': None,
 'loc': None,
 'name': None,
 'processed': True,
 'search_order': 'sn',
 'select': {},
 'squeeze': None}

Description of specification keys:

meta specs for Data variables (data_vars)#

Key

Type

Description

name

str

Specialized name for decoding and encoding which is empty by default

alt_names

list(str)

Alternate names for decoding

standard_name

list(str)

“standard_name” attributes

long_name

list(str)

“long_name” attributes

units

list(str)

“units” attributes

domain

choice

Domain of application, within {‘generic’, ‘atmos’, ‘ocean’, ‘surface’}

search_order

str

Search order within properties as combination of letters: [n]name, [s]tandard_name, [u]nits

cmap

str

Colormap specification

inherit

str

Inherit specifications from another data variable

select

eval

Item selection evaluated and applied to the array

squeeze

list(str)

List of dimensions that must be squeezed out

Note

The standard_name, long_name and units specifications are internally stored as a dict in the attrs key.

Get the specialized name and the attributes only:

In [51]: meta_specs.data_vars.get_name("sst")
Out[51]: 'sst'

In [52]: meta_specs.data_vars.get_attrs("sst")
Out[52]: 
{'standard_name': 'sea_surface_temperature',
 'long_name': 'Sea surface temperature',
 'units': 'degrees_celsius'}

Coordinates#

Here is the example of the lon coordinate:

In [53]: from pprint import pprint

In [54]: pprint(meta_specs.coords['lon'])
{'add_coords_loc': {},
 'add_loc': None,
 'alt_names': ['longitude', 'nav_lon', 'longitudes', 'lons'],
 'attrs': {'axis': 'X',
           'long_name': ['Longitude'],
           'standard_name': ['longitude'],
           'units': ['degrees_east',
                     'degree_east',
                     'degree_e',
                     'degrees_e',
                     'degreee',
                     'degreese']},
 'exclude': False,
 'inherit': None,
 'loc': None,
 'name': None,
 'processed': True,
 'search_order': 'snu'}

Description of specification keys:

meta specs for Coordinates (coords)#

Key

Type

Description

name

str

Specialized name for decoding and encoding which is empty by default

alt_names

list(str)

Alternate names for decoding

standard_name

list(str)

“standard_name” attributes

long_name

list(str)

“long_name” attributes

units

list(str)

“units” attributes

axis

str

“axis” attribute like X, Y, Z, T or F

search_order

str

Search order within properties as combination of letters: [n]name, [s]tandard_name, [u]nits

inherit

str

Inherit specifications from another data variable or coordinate

Note

Like for data variables, the standard_name, long_name, units and axis specifications are internally stored as a dict in the attrs key.

Get the specialized name and the attributes only:

In [55]: meta_specs.coords.get_name("lon")
Out[55]: 'lon'

In [56]: meta_specs.coords.get_attrs("lon")
Out[56]: 
{'standard_name': 'longitude',
 'long_name': 'Longitude',
 'units': 'degrees_east',
 'axis': 'X'}

Dimensions#

meta specs for Dimensions (dims)#

Key

Type

Description

x

list(str)

Possible names of X-type dimensions

y

list(str)

Possible names of Y-type dimensions

z

list(str)

Possible names of Z-type vertical dimensions

t

list(str)

Possible names of time dimensions

f

list(str)

Possible names of forecast dimensions

This section of the specs defines the names that the dimensions of type x, y, z, t and f (forecast) can take. It is filled automatically by default with possible names (name and alt_names) of the coordinates that have their axis defined. The user can also add more specific names.

These lists of names are used when searching for dimensions that are not coordinates: since they don’t have attributes, their type can only be guessed from their name.

Warning

It is recommended to not fill the dims section, but fill the coordinate section instead.

Here is the default content:

In [57]: meta_specs.dims  # or meta_specs["dims"]
Out[57]: 
{'x': ['x',
  'xi',
  'lon',
  'xi',
  'nx',
  'ni',
  'x',
  'imt',
  'ipi',
  'longitude',
  'nav_lon',
  'longitudes',
  'lons'],
 'y': ['y',
  'eta',
  'lat',
  'yi',
  'ny',
  'nj',
  'y',
  'jmt',
  'jpj',
  'latitude',
  'nav_lat',
  'latitudes',
  'lats'],
 'z': ['sig',
  's',
  'depth',
  'level',
  'altitude',
  'station',
  's_level',
  'sigma',
  's',
  'sigma_level',
  'dep',
  'model_level',
  'lev',
  'nz',
  'nk',
  'z',
  'kmt',
  'zi',
  'altitudet',
  'altitudeu',
  'altitudev',
  'stations'],
 't': ['time', 'nt'],
 'f': ['forecast', 'nf', 'fcst']}

Other sections#

In [58]: meta_specs["register"]
Out[58]: {'name': 'croco', 'attrs': {'ini_file': ['*CROCO*'], 'type': ['*ROMS*']}}

In [59]: meta_specs["sglocator"]
Out[59]: 
{'name_format': '{root}_{loc}',
 'valid_locations': ['rho', 'w', 'u', 'v', 'r', 'psi']}

In [60]: meta_specs["accessors"]
Out[60]: 
{'properties': {'coords': ['ptemp',
   'lon',
   'lat',
   'depth',
   'sig',
   'level',
   'altitude',
   'time',
   'forecast'],
  'data_vars': ['temp',
   'sal',
   'u',
   'v',
   'bathy',
   'ssh',
   'temp',
   'sal',
   'u',
   'v',
   'bathy',
   'ssh']}}

In [61]: meta_specs["vertical"]
Out[61]: {'positive': 'up', 'type': 'sigma'}

Customizing the specifications#

Default user file#

The xoa.meta module has internal defaults as shown in appendix The default configuration.

You can extend these defaults with a user file, whose location is printable with the following command, at the line containing “user meta specs file”:

$ xoa info paths
- xoa library dir: /home/docs/checkouts/readthedocs.org/user_builds/xoa/conda/v2026.2.0/lib/python3.14/site-packages/xoa
- user config file: /home/docs/.config/xoa/xoa.cfg [*]
- user CF specs file: /home/docs/.config/xoa/meta.cfg [*]
- data samples: OBS/ARGO/argo-7900573.nc MODELS/CMEMS-IBI/ibi-argo-7900573.nc MODELS/CROCO/SOUTH-AFRICA/croco.south-africa.meridional.nc MODELS/CROCO/SOUTH-AFRICA/croco.south-africa.surf.nc MODELS/CROCO/SOUTH-AFRICA/croco.south-africa.zonal.nc MODELS/HYCOM/hycom.gdp.h.nc MODELS/HYCOM/hycom.gdp.u.nc MODELS/HYCOM/hycom.gdp.v.nc OBS/DRIFTERS/gdp-6203641.csv
*: file not present

Updating current specs#

The current specs can be updated with different methods.

From a well structured dictionary:

In [62]: meta_specs.load_cfg({"data_vars": {"banana": {"standard_name": "banana"}}})

In [63]: meta_specs.data_vars["banana"]
Out[63]: 
{'name': None,
 'alt_names': [],
 'cmap': None,
 'inherit': None,
 'squeeze': None,
 'search_order': 'sn',
 'loc': None,
 'add_loc': False,
 'exclude': False,
 'add_coords_loc': {},
 'attrs': {'long_name': ['Banana'], 'standard_name': ['banana'], 'units': []},
 'select': {},
 'processed': True}

From a configuration file: instead of the dictionary as an argument to load_cfg() method, you can give either a file name or a multi-line string with the same content as the file. Following the previous example:

[data_vars]
    [[banana]]
        standard_name: banana

If you only want to update a category, you can use such method (here set_specs()):

In [64]: meta_specs.data_vars.set_specs("banana", name="bonono")

In [65]: meta_specs.data_vars["banana"]["name"]
Out[65]: 'bonono'

Alternatively, a xoa.meta.general.MetaSpecs instance can be loaded with the load_cfg() method, as explained below.

Creating new specs#

To create new specs, you must instantiate the xoa.meta.general.MetaSpecs class, with an input type as those presented above:

  • A config file name.

  • A multi-line string in the format of a config file.

  • A dictionary.

  • A configobj.ConfigObj instance.

  • Another MetaSpecs instance.

  • A list of them, with the first having priority over the last.

The initialization also accepts two options:

  • default: whether to load or not the default internal config.

  • user: whether to load or not the user config file.

A config created from default and user configs:

In [66]: banana_specs = {"data_vars": {"banana": {"attrs": {"standard_name": "banana"}}}}

In [67]: mymeta_specs = meta.general.MetaSpecs(banana_specs)

In [68]: mymeta_specs["data_vars"]["sst"]["attrs"]["standard_name"]
Out[68]: ['sea_surface_temperature', 'surface_sea_water_temperature']

In [69]: mymeta_specs["data_vars"]["banana"]["attrs"]["standard_name"]
Out[69]: ['banana']

A config created from scratch:

In [70]: mymeta_specs = meta.general.MetaSpecs(banana_specs, default=False, user=False)

In [71]: mymeta_specs.pprint(depth=2)
{'accessors': {'properties': {...}},
 'coords': {},
 'data_vars': {'banana': {...}},
 'dims': {'f': [], 't': [], 'x': [], 'y': [], 'z': []},
 'exclude_names': None,
 'register': {'attrs': {}, 'name': ''},
 'sglocator': {'name_format': False, 'valid_locations': None},
 'vertical': {'positive': None, 'type': None}}

A config created from two other configs:

In [72]: meta_specs_banana = meta.general.MetaSpecs(banana_specs, default=False, user=False)

In [73]: apple_specs = {"data_vars": {"apple": {"attrs": {"long_name": "Big apple"}}}}

In [74]: meta_specs_apple = meta.general.MetaSpecs(apple_specs, default=False, user=False)

In [75]: meta_specs_fruits = meta.general.MetaSpecs((meta_specs_apple, meta_specs_banana),
   ....:     default=False, user=False)
   ....: 

In [76]: meta_specs_fruits.data_vars.names
Out[76]: ['apple', 'banana']

Temporarily replacing specs#

As shown before, the current meta specs are accessible with the xoa.meta.get_meta_specs() function. You can replace them with the xoa.meta.set_meta_specs class, to be used as a function.

In [77]: meta_specs_old = meta.get_meta_specs()

In [78]: meta.set_meta_specs(meta_specs_banana)
Out[78]: <xoa.meta.set_meta_specs at 0x759687ed5810>

In [79]: meta.get_meta_specs() is meta_specs_banana
Out[79]: True

In [80]: meta.set_meta_specs(meta_specs_old)
Out[80]: <xoa.meta.set_meta_specs at 0x759687ed5480>

In [81]: meta.get_meta_specs() is meta_specs_old
Out[81]: True

In case of a temporary change, you can use set_meta_specs in a context statement:

In [82]: with meta.set_meta_specs(meta_specs_banana) as myspecs:
   ....:     print('inside', meta.get_meta_specs() is meta_specs_banana)
   ....:     print('inside', myspecs is meta.get_meta_specs())
   ....: 
inside True
inside True

In [83]: print('outside', meta.get_meta_specs() is meta_specs_old)
outside True

For convenience, you can set specs directly with a dictionary:

In [84]: with meta.set_meta_specs({"data_vars": {"apple": {}}}) as myspecs:
   ....:     print("apple" in meta.get_meta_specs())
   ....: 
True

In [85]: print("apple" in meta.get_meta_specs())
False

Application with an accessor usage:

In [86]: data = xr.DataArray([5], attrs={'standard_name': 'sea_surface_banana'})

In [87]: ds = xr.Dataset({'toto': data})

In [88]: mymeta_specs = meta.general.MetaSpecs({"data_vars": {"ssb":
   ....:     {"standard_name": "sea_surface_banana"}}})
   ....: 

In [89]: with meta.set_meta_specs(mymeta_specs):
   ....:     print(ds.meta.get("ssb"))
   ....: 
<xarray.DataArray 'toto' (dim_0: 1)> Size: 8B
array([5])
Dimensions without coordinates: dim_0
Attributes:
    standard_name:  sea_surface_banana

Registering and inferring specs#

It is possible to register specialized MetaSpecs instances with register_meta_specs() for future access.

Here we register new specs with an internal registration name "mycroco":

In [90]: content = {
   ....:     "register": {
   ....:         "name": "mycroco"
   ....:     },
   ....:     "data_vars": {
   ....:         "temp": {
   ....:             "name": "supertemp"
   ....:         }
   ....:     },
   ....:     "coords": {
   ....:         "lon": {
   ....:             "name": "mylon"
   ....:         }
   ....:     }
   ....: }
   ....: 

In [91]: mymeta_specs = meta.general.MetaSpecs(content)

In [92]: meta.register_meta_specs(mymeta_specs)

We can now access it with the get_meta_specs() function:

In [93]: these_meta_specs = meta.get_meta_specs('mycroco')

In [94]: these_meta_specs is mymeta_specs
Out[94]: True

If you set the meta_specs attribute or encoding of a dataset to the name of a registered MetaSpecs instance, you can get it automatically with the get_meta_specs() function.

Let’s register another MetaSpecs instance:

In [95]: content = {
   ....:     "register": {
   ....:         "name": "myhycom"
   ....:     },
   ....:     "data_vars": {
   ....:         "sal": {
   ....:             "name": "supersal"
   ....:         }
   ....:     },
   ....: }
   ....: 

In [96]: mymeta_specs2 = meta.general.MetaSpecs(content)

In [97]: meta.register_meta_specs(mymeta_specs2)

Let’s create a dataset:

In [98]: ds = xr.Dataset({'supertemp': ("mylon", [0, 2])}, coords={"mylon": [10, 20]})

Now find the best registered specs instance which has either the name myhycom or mycroco:

In [99]: meta_specs_auto = meta.infer_meta_specs(ds)

In [100]: print(meta_specs_auto.name)
mycroco

In [101]: ds_decoded = meta_specs_auto.decode(ds)

In [102]: ds_decoded
Out[102]: 
<xarray.Dataset> Size: 32B
Dimensions:  (lon: 2)
Coordinates:
  * lon      (lon) int64 16B 10 20
Data variables:
    temp     (lon) int64 16B 0 2

In [103]: meta_specs_auto.encode(ds)
Out[103]: 
<xarray.Dataset> Size: 32B
Dimensions:    (mylon: 2)
Coordinates:
  * mylon      (mylon) int64 16B 10 20
Data variables:
    supertemp  (mylon) int64 16B 0 2

It is mycroco as expected.

Assigning specs to datasets#

All xoa routines that need to access specific coordinates or variables try to infer the appropriate specs, which default to the current specs. When the meta_specs attribute or encoding is set, get_meta_specs() uses it to search within registered specs.

In [104]: ds.encoding.update(meta_specs="mycroco")

In [105]: meta_specs = meta.get_meta_specs(ds)

In [106]: meta_specs.encode(ds)
Out[106]: 
<xarray.Dataset> Size: 32B
Dimensions:    (mylon: 2)
Coordinates:
  * mylon      (mylon) int64 16B 10 20
Data variables:
    supertemp  (mylon) int64 16B 0 2

The meta_specs encoding is set at the dataset level, not at the data array level:

In [107]: meta.get_meta_specs(ds.supertemp) is meta_specs
Out[107]: True

To propagate to all the data arrays, use assign_meta_specs():

In [108]: meta.assign_meta_specs(ds, "mycroco")
Out[108]: 
<xarray.Dataset> Size: 32B
Dimensions:    (mylon: 2)
Coordinates:
  * mylon      (mylon) int64 16B 10 20
Data variables:
    supertemp  (mylon) int64 16B 0 2

In [109]: ds.mylon.encoding
Out[109]: {'meta_specs': 'mycroco'}

In [110]: meta.get_meta_specs(ds.supertemp) is meta_specs
Out[110]: True

Specialized helper modules#

The xoa.meta module provides the foundation for finding and identifying variables and coordinates. Building on this, xoa provides specialized modules with convenient functions for common oceanographic and atmospheric variables. All these functions use the meta specs under the hood.

Finding coordinates with xoa.coords#

The xoa.coords module provides high-level functions to find specific coordinates in a dataset or data array.

Available coordinate finder functions:

Dimension finder functions:

Example usage:

In [111]: import xoa.coords

# Rebuild a sample dataset with coordinates
In [112]: lon = xr.DataArray(np.arange(3, dtype='d'), dims='mylon',
   .....:     attrs={'standard_name': 'longitude'})
   .....: 

In [113]: temp = xr.DataArray(np.arange(20, 23, dtype='d'), dims='mylon',
   .....:     coords={'mylon': lon},
   .....:     attrs={'standard_name': 'sea_water_temperature'})
   .....: 

# Get longitude coordinate from a data array
In [114]: found_lon = xoa.coords.get_lon(temp)

In [115]: print(found_lon.name)
mylon

# Check if a coordinate is latitude
In [116]: print(xoa.coords.is_lon(found_lon))
True

In [117]: print(xoa.coords.is_lat(found_lon))
False

Finding thermodynamic variables with xoa.thermdyn#

The xoa.thermdyn module provides functions to find thermodynamic variables like temperature, salinity, and density in datasets.

Each function accepts a variant keyword to target a specific type:

  • Temperature variants: "temp" (in situ), "ptemp" (potential), "ctemp" (conservative), "atemp" (absolute)

  • Salinity variants: "sal" (in situ), "psal" (practical), "asal" (absolute), "pfsal" (preformed)

In [118]: import xoa.thermdyn

In [119]: sal = xr.DataArray(np.arange(33, 36, dtype='d'), dims='mylon',
   .....:     coords={'mylon': lon},
   .....:     attrs={'standard_name': 'sea_water_salinity'})
   .....: 

In [120]: ds_thd = xr.Dataset({'mytemp': temp, 'mysal': sal})

# Find any temperature variable
In [121]: found_temp = xoa.thermdyn.get_temp(ds_thd)

In [122]: print(found_temp.name)
mytemp

# Check if a variable is temperature-like
In [123]: print(xoa.thermdyn.is_temp(found_temp))
True

# Find salinity
In [124]: found_sal = xoa.thermdyn.get_sal(ds_thd)

In [125]: print(found_sal.name)
mysal

Finding dynamical variables with xoa.dyn#

The xoa.dyn module provides functions for ocean dynamics variables, particularly sea level.

Supported variants: "ssh" (sea surface height), "adt" (absolute dynamic topography), "sla" (sea level anomaly), "mdt" (mean dynamic topography), "mss" (mean sea surface).

Example: decoding/encoding Croco model outputs#

Here are the specs for Croco:

[register]
name=croco

    [[attrs]]
    ini_file=*CROCO*
    type=*ROMS*

[sglocator]
name_format={root}_{loc}
valid_locations=rho,w,u,v,r,psi

[accessors]

    [[properties]]
    coords=ptemp

[vertical]
positive=up
type=sigma

[data_vars]

    [[sal]]
    name=salt

    [[ptemp]]
    name=temp

    [[ssh]]
    name=zeta

    [[bathy]]
    name=h

    [[mask]]
    mask=mask
    add_loc=True

    [[corio]]
    name=f

    [[akt]]
    name=AKt

    [[ex]]
    name=pm

    [[ey]]
    name=pn

    [[sig]]
    name=s

    [[cs]]
    name=Cs
    add_loc=True

[coords]

    [[x]]
    name=xi
    search_order=n
    standard_name=x_grid_index

    [[y]]
    name=eta
    search_order=n
    standard_name=y_grid_index

    [[sig]]
    name=s
    add_loc=True

    [[depth]]
    add_loc=True

Register them:

In [126]: meta_config_file = xoa.meta.get_meta_config_file("croco")

In [127]: print(meta_config_file)
/home/docs/checkouts/readthedocs.org/user_builds/xoa/conda/v2026.2.0/lib/python3.14/site-packages/xoa/meta/configs/croco.cfg

In [128]: xoa.meta.register_meta_specs(meta_config_file) # xoa.meta.register_meta_specs("croco")

In [129]: xoa.meta.is_registered_meta_specs("croco")
Out[129]: True

Register the xoa accessor:

In [130]: xoa.register_accessors()

Now let’s open a Croco sample as a xarray dataset:

In [131]: ds = xoa.open_data_sample("MODELS/CROCO/SOUTH-AFRICA/croco.south-africa.meridional.nc")

In [132]: ds
Out[132]: 
<xarray.Dataset> Size: 48kB
Dimensions:     (time: 1, s_w: 33, eta_rho: 56, xi_rho: 1, s_rho: 32, auxil: 4,
                 xi_u: 1, eta_v: 55)
Coordinates: (12/13)
  * time        (time) float64 8B 2.592e+06
  * s_w         (s_w) float32 132B -1.0 -0.9688 -0.9375 ... -0.0625 -0.03125 0.0
  * eta_rho     (eta_rho) float32 224B 6.0 7.0 8.0 9.0 ... 58.0 59.0 60.0 61.0
  * xi_rho      (xi_rho) float32 4B 131.0
    lat_rho     (eta_rho, xi_rho) float32 224B ...
    lat_v       (eta_v, xi_rho) float32 220B ...
    ...          ...
    lon_v       (eta_v, xi_rho) float32 220B ...
  * s_rho       (s_rho) float32 128B -0.9844 -0.9531 ... -0.04688 -0.01562
  * xi_u        (xi_u) float32 4B 131.5
    lat_u       (eta_rho, xi_u) float32 224B ...
    lon_u       (eta_rho, xi_u) float32 224B ...
  * eta_v       (eta_v) float32 220B 6.5 7.5 8.5 9.5 ... 57.5 58.5 59.5 60.5
Dimensions without coordinates: auxil
Data variables: (12/23)
    AKt         (time, s_w, eta_rho, xi_rho) float32 7kB ...
    Cs_r        (s_rho) float32 128B ...
    Cs_w        (s_w) float32 132B ...
    Vtransform  float32 4B ...
    angle       (eta_rho, xi_rho) float32 224B ...
    el          float32 4B ...
    ...          ...
    time_step   (time, auxil) int32 16B ...
    u           (time, s_rho, eta_rho, xi_u) float32 7kB ...
    v           (time, s_rho, eta_v, xi_rho) float32 7kB ...
    w           (time, s_rho, eta_rho, xi_rho) float32 7kB ...
    xl          float32 4B ...
    zeta        (time, eta_rho, xi_rho) float32 224B ...
Attributes: (12/57)
    type:           ROMS history file
    title:          BENGUELA TEST MODEL
    date:           
    rst_file:       CROCO_FILES/croco_rst.nc
    his_file:       CROCO_FILES/croco_his.nc
    avg_file:       CROCO_FILES/croco_avg.nc
    ...             ...
    v_sponge:       0.0
    sponge_expl:    Sponge parameters : extent (m) & viscosity (m2.s-1)
    SRCS:           main.F step.F read_inp.F timers_roms.F init_scalars.F ini...
    CPP-options:    REGIONAL BENGUELA_VHR MPI OBC_EAST OBC_WEST OBC_NORTH OBC...
    history:        Tue Mar 31 16:26:24 2020: ncks -O -d time,30 -d xi_rho,13...
    NCO:            4.4.2

Let’s decode it:

In [133]: dsd = ds.xoa.decode()

In [134]: dsd
Out[134]: 
<xarray.Dataset> Size: 48kB
Dimensions:     (time: 1, sig_w: 33, y_rho: 56, x_rho: 1, sig_rho: 32,
                 auxil: 4, x_u: 1, y_v: 55)
Coordinates: (12/13)
  * time        (time) float64 8B 2.592e+06
  * sig_w       (sig_w) float32 132B -1.0 -0.9688 -0.9375 ... -0.03125 0.0
  * y_rho       (y_rho) float32 224B 6.0 7.0 8.0 9.0 ... 58.0 59.0 60.0 61.0
  * x_rho       (x_rho) float32 4B 131.0
    lat_rho     (y_rho, x_rho) float32 224B ...
    lat_v       (y_v, x_rho) float32 220B ...
    ...          ...
    lon_v       (y_v, x_rho) float32 220B ...
  * sig_rho     (sig_rho) float32 128B -0.9844 -0.9531 ... -0.04688 -0.01562
  * x_u         (x_u) float32 4B 131.5
    lat_u       (y_rho, x_u) float32 224B ...
    lon_u       (y_rho, x_u) float32 224B ...
  * y_v         (y_v) float32 220B 6.5 7.5 8.5 9.5 10.5 ... 57.5 58.5 59.5 60.5
Dimensions without coordinates: auxil
Data variables: (12/23)
    akt         (time, sig_w, y_rho, x_rho) float32 7kB ...
    cs_r        (sig_rho) float32 128B ...
    cs_w        (sig_w) float32 132B ...
    Vtransform  float32 4B ...
    angle       (y_rho, x_rho) float32 224B ...
    el          float32 4B ...
    ...          ...
    time_step   (time, auxil) int32 16B ...
    u           (time, sig_rho, y_rho, x_u) float32 7kB ...
    v           (time, sig_rho, y_v, x_rho) float32 7kB ...
    w           (time, sig_rho, y_rho, x_rho) float32 7kB ...
    xl          float32 4B ...
    ssh         (time, y_rho, x_rho) float32 224B ...
Attributes: (12/57)
    type:           ROMS history file
    title:          BENGUELA TEST MODEL
    date:           
    rst_file:       CROCO_FILES/croco_rst.nc
    his_file:       CROCO_FILES/croco_his.nc
    avg_file:       CROCO_FILES/croco_avg.nc
    ...             ...
    v_sponge:       0.0
    sponge_expl:    Sponge parameters : extent (m) & viscosity (m2.s-1)
    SRCS:           main.F step.F read_inp.F timers_roms.F init_scalars.F ini...
    CPP-options:    REGIONAL BENGUELA_VHR MPI OBC_EAST OBC_WEST OBC_NORTH OBC...
    history:        Tue Mar 31 16:26:24 2020: ncks -O -d time,30 -d xi_rho,13...
    NCO:            4.4.2

Let’s re-encode it!

In [135]: dse = dsd.xoa.encode()

In [136]: dse
Out[136]: 
<xarray.Dataset> Size: 48kB
Dimensions:     (time: 1, s_w: 33, eta_rho: 56, xi_rho: 1, s_rho: 32, auxil: 4,
                 xi_u: 1, eta_v: 55)
Coordinates: (12/13)
  * time        (time) float64 8B 2.592e+06
  * s_w         (s_w) float32 132B -1.0 -0.9688 -0.9375 ... -0.0625 -0.03125 0.0
  * eta_rho     (eta_rho) float32 224B 6.0 7.0 8.0 9.0 ... 58.0 59.0 60.0 61.0
  * xi_rho      (xi_rho) float32 4B 131.0
    lat_rho     (eta_rho, xi_rho) float32 224B ...
    lat_v       (eta_v, xi_rho) float32 220B ...
    ...          ...
    lon_v       (eta_v, xi_rho) float32 220B ...
  * s_rho       (s_rho) float32 128B -0.9844 -0.9531 ... -0.04688 -0.01562
  * xi_u        (xi_u) float32 4B 131.5
    lat_u       (eta_rho, xi_u) float32 224B ...
    lon_u       (eta_rho, xi_u) float32 224B ...
  * eta_v       (eta_v) float32 220B 6.5 7.5 8.5 9.5 ... 57.5 58.5 59.5 60.5
Dimensions without coordinates: auxil
Data variables: (12/23)
    AKt         (time, s_w, eta_rho, xi_rho) float32 7kB ...
    Cs_r        (s_rho) float32 128B ...
    Cs_w        (s_w) float32 132B ...
    Vtransform  float32 4B ...
    angle       (eta_rho, xi_rho) float32 224B ...
    el          float32 4B ...
    ...          ...
    time_step   (time, auxil) int32 16B ...
    u           (time, s_rho, eta_rho, xi_u) float32 7kB ...
    v           (time, s_rho, eta_v, xi_rho) float32 7kB ...
    w           (time, s_rho, eta_rho, xi_rho) float32 7kB ...
    xl          float32 4B ...
    zeta        (time, eta_rho, xi_rho) float32 224B ...
Attributes: (12/57)
    type:           ROMS history file
    title:          BENGUELA TEST MODEL
    date:           
    rst_file:       CROCO_FILES/croco_rst.nc
    his_file:       CROCO_FILES/croco_his.nc
    avg_file:       CROCO_FILES/croco_avg.nc
    ...             ...
    v_sponge:       0.0
    sponge_expl:    Sponge parameters : extent (m) & viscosity (m2.s-1)
    SRCS:           main.F step.F read_inp.F timers_roms.F init_scalars.F ini...
    CPP-options:    REGIONAL BENGUELA_VHR MPI OBC_EAST OBC_WEST OBC_NORTH OBC...
    history:        Tue Mar 31 16:26:24 2020: ncks -O -d time,30 -d xi_rho,13...
    NCO:            4.4.2

Et voilà !

Example: decoding and merging Hycom split outputs#

At Shom, the Hycom model outputs are split into separate files, one for each variable. Conflicts may occur when using variables that are not at the same staggered grid location, since all variables are stored with dimensions (y, x) and lon and lat coordinates, all with the same name. To avoid this problem, the horizontal dimensions and coordinates of staggered variables are renamed to indicate their location.

Here are the specs to take care of the staggered grid indicators in the names:

[register]
name=hycom

    [[attrs]]
    nsigma = "[0-9]?"

[sglocator]
name_format="{root}_{loc}"
valid_locations=u,v

[vertical]
positive=down
type=dz

[data_vars]

    [[u]]
    loc=u
        [[[add_coords_loc]]]
        lon=True
        lat=True
        x=True
        y=True

    [[v]]
    loc=v
        [[[add_coords_loc]]]
        lon=True
        lat=True
        x=True
        y=True

    [[bathy]]
    name=bathymetry
    add_loc=True

    [[dz]]
    name=h

[coords]

    [[x]]
    name=X

    [[y]]
    name=Y

    [[z]]
    name=lev

Note the add_coords_loc sections. Location is not added to the u and v variables but to their dimensions and coordinates.

Register them:

In [137]: xoa.meta.register_meta_specs("hycom")

In [138]: xoa.meta.is_registered_meta_specs("hycom")
Out[138]: True

Overview of the U dataset:

In [139]: dsu = xoa.open_data_sample("MODELS/HYCOM/hycom.gdp.u.nc")

In [140]: dsu
Out[140]: 
<xarray.Dataset> Size: 173kB
Dimensions:  (Y: 11, X: 13, time: 50, lev: 6)
Coordinates:
  * time     (time) datetime64[ns] 400B 2021-03-01 ... 2021-03-25T12:00:00
  * lev      (lev) int32 24B 1 2 3 4 5 6
Dimensions without coordinates: Y, X
Data variables:
    lon      (Y, X) float32 572B ...
    lat      (Y, X) float32 572B ...
    u        (time, lev, Y, X) float32 172kB ...
Attributes: (12/26)
    history:                   Fri Apr  9 12:15:12 2021: ncrcat -x -v bathyme...
    Conventions:               COARDS
    nhybrd:                    20
    nsigma:                    19
    kapref:                    2
    hybflg:                    0
    ...                        ...
    dp00ft:                    [1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 ...
    dp00xt:                    [21.874 21.414 21.414 13.197  7.85   3.85   4....
    ds00t:                     [3.555 3.026 3.026 1.547 0.828 0.828 0.968 0.8...
    ds00ft:                    [1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 ...
    ds00xt:                    [21.874 21.414 21.414  7.197  3.85   3.85   4....
    nco_openmp_thread_number:  1

Decoding:

In [141]: dsu = dsu.xoa.decode()

In [142]: dsu
Out[142]: 
<xarray.Dataset> Size: 173kB
Dimensions:  (y_u: 11, x_u: 13, time: 50, z: 6)
Coordinates:
    lon_u    (y_u, x_u) float32 572B ...
    lat_u    (y_u, x_u) float32 572B ...
  * time     (time) datetime64[ns] 400B 2021-03-01 ... 2021-03-25T12:00:00
  * z        (z) int32 24B 1 2 3 4 5 6
Dimensions without coordinates: y_u, x_u
Data variables:
    u        (time, z, y_u, x_u) float32 172kB ...
Attributes: (12/26)
    history:                   Fri Apr  9 12:15:12 2021: ncrcat -x -v bathyme...
    Conventions:               COARDS
    nhybrd:                    20
    nsigma:                    19
    kapref:                    2
    hybflg:                    0
    ...                        ...
    dp00ft:                    [1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 ...
    dp00xt:                    [21.874 21.414 21.414 13.197  7.85   3.85   4....
    ds00t:                     [3.555 3.026 3.026 1.547 0.828 0.828 0.968 0.8...
    ds00ft:                    [1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 ...
    ds00xt:                    [21.874 21.414 21.414  7.197  3.85   3.85   4....
    nco_openmp_thread_number:  1

Now we can read, decode and merge all files without any conflict:

In [143]: dsv = xoa.open_data_sample("MODELS/HYCOM/hycom.gdp.u.nc").xoa.decode()

In [144]: dsh = xoa.open_data_sample("MODELS/HYCOM/hycom.gdp.h.nc").xoa.decode()

In [145]: print(dsu)
<xarray.Dataset> Size: 173kB
Dimensions:  (y_u: 11, x_u: 13, time: 50, z: 6)
Coordinates:
    lon_u    (y_u, x_u) float32 572B ...
    lat_u    (y_u, x_u) float32 572B ...
  * time     (time) datetime64[ns] 400B 2021-03-01 ... 2021-03-25T12:00:00
  * z        (z) int32 24B 1 2 3 4 5 6
Dimensions without coordinates: y_u, x_u
Data variables:
    u        (time, z, y_u, x_u) float32 172kB ...
Attributes: (12/26)
    history:                   Fri Apr  9 12:15:12 2021: ncrcat -x -v bathyme...
    Conventions:               COARDS
    nhybrd:                    20
    nsigma:                    19
    kapref:                    2
    hybflg:                    0
    ...                        ...
    dp00ft:                    [1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 ...
    dp00xt:                    [21.874 21.414 21.414 13.197  7.85   3.85   4....
    ds00t:                     [3.555 3.026 3.026 1.547 0.828 0.828 0.968 0.8...
    ds00ft:                    [1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 ...
    ds00xt:                    [21.874 21.414 21.414  7.197  3.85   3.85   4....
    nco_openmp_thread_number:  1

In [146]: ds = xr.merge([dsu, dsv, dsh])

In [147]: ds
Out[147]: 
<xarray.Dataset> Size: 346kB
Dimensions:  (y_u: 11, x_u: 13, time: 50, z: 6, y: 11, x: 13)
Coordinates:
    lon_u    (y_u, x_u) float32 572B -3.213 -3.138 -3.062 ... -2.388 -2.312
    lat_u    (y_u, x_u) float32 572B 44.69 44.69 44.69 ... 45.22 45.22 45.22
  * time     (time) datetime64[ns] 400B 2021-03-01 ... 2021-03-25T12:00:00
  * z        (z) int32 24B 1 2 3 4 5 6
    lon      (y, x) float32 572B ...
    lat      (y, x) float32 572B ...
Dimensions without coordinates: y_u, x_u, y, x
Data variables:
    u        (time, z, y_u, x_u) float32 172kB 0.03078 0.02891 ... 0.1252 0.1272
    bathy    (y, x) float32 572B ...
    dz       (time, z, y, x) float32 172kB ...
Attributes: (12/26)
    history:                   Fri Apr  9 12:15:12 2021: ncrcat -x -v bathyme...
    Conventions:               COARDS
    nhybrd:                    20
    nsigma:                    19
    kapref:                    2
    hybflg:                    0
    ...                        ...
    dp00ft:                    [1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 ...
    dp00xt:                    [21.874 21.414 21.414 13.197  7.85   3.85   4....
    ds00t:                     [3.555 3.026 3.026 1.547 0.828 0.828 0.968 0.8...
    ds00ft:                    [1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 ...
    ds00xt:                    [21.874 21.414 21.414  7.197  3.85   3.85   4....
    nco_openmp_thread_number:  1