Naming conventions with xoa.cf

Warning

Have a quick loop to the appendix The default configuration before going any further!

Introduction

This module is an application and extension of a subset of the CF conventions in application to naming conventions. It has two intents:

  • Searching xarray.DataArray variables or coordinates with another xarray.DataArray or a xarray.Dataset, by scanning name and attributes like units and standard_name.

  • Formatting xarray.DataArray variables or coordinates with unique name, and standard_name, long_name and units attributes, with support of staggered grid location syntax.

The module offers capabilities for the user to extend and specialize default behaviors for user’s special datasets.

Note

This module shares common feature with the excellent and long awaited xf_xarray package and started a long time before with the Vacumm package. The most notable differences differences include:

  • The current module is also designed for data variables, not only coordinates.

  • It search for items not only using standard_names but also specialized names.

  • It is not only available as accessors, but also as independant objects that can be configured for each type of dataset or in contexts by the user.

Accessing the current specifications

Scanning and formatting actions are based on specifications, and this module natively includes a default configuration for various oceanographic, sea surface and atmospheric variables and coordinates. A distinction is made between data variables (data_vars) and coordinates (coords), like in xarray.

Getting the current specifications for data variables and coordinates with the get_cf_specs() function:

In [1]: from xoa import cf

In [2]: cfspecs = cf.get_cf_specs()

In [3]: cfspecs["data_vars"] is cfspecs.data_vars
Out[3]: True

In [4]: cfspecs["data_vars"] is cfspecs.coords
Out[4]: False

In [5]: cfspecs.data_vars.names[:3]
Out[5]: ['ptemp', 'ctemp', 'temp']

In [6]: cfspecs.coords.names[:3]
Out[6]: ['x', 'y', 'level']

See the appendix The default configuration for the list of available default specifications.

An instance of the CFSpecs has other configuration sections than data_vars, coords and dims: registration, sglocator, vertical and accessors.

Data variables

Here is the example of the sst data variable:

In [7]: from pprint import pprint

In [8]: pprint(cfspecs.data_vars['sst'])
{'add_coords_loc': {},
 'add_loc': False,
 'alt_names': [],
 'attrs': {'long_name': ['Sea surface temperature'],
           'standard_name': ['sea_surface_temperature',
                             'surface_sea_water_temperature'],
           'units': ['degrees_celsius']},
 'cmap': 'cmo.thermal',
 'exclude': False,
 'inherit': None,
 'loc': None,
 'name': None,
 'processed': True,
 'search_order': 'sn',
 'select': {},
 'squeeze': None}

Description of specification keys:

CF specs for Data variables (data_vars)

Key

Type

Description

name

str

Specialized name for decoding and encoding which is empty by default

alt_names

list(str)

Alternate names for decoding

standard_name

list(str)

“standard_name” attributes

long_name

list(str)

“long_name” attributes

units

list(str)

“units” attributes

domain

choice

Domain of application, within {‘generic’, ‘atmos’, ‘ocean’, ‘surface’}

search_order

str

Search order within properties as combination of letters: [n]name, [s]tandard_name, [u]nits

cmap

str

Colormap specification

inherit

str

Inherit specifications from another data variable

select

eval

Item selection evaluated and applied to the array

squeeze

list(str)

List of dimensions that must be squeezed out

Note

The standard_name, long_name and units specifications are internally stored as a dict in the attrs key.

Get the specialized name and the attributes only:

In [9]: cfspecs.data_vars.get_name("sst")
Out[9]: 'sst'

In [10]: cfspecs.data_vars.get_attrs("sst")
Out[10]: 
{'standard_name': 'sea_surface_temperature',
 'long_name': 'Sea surface temperature',
 'units': 'degrees_celsius'}

Coordinates

Here is the example of the lon coordinate:

In [11]: from pprint import pprint

In [12]: pprint(cfspecs.coords['lon'])
{'add_coords_loc': {},
 'add_loc': None,
 'alt_names': ['longitude', 'nav_lon', 'longitudes', 'lons'],
 'attrs': {'axis': 'X',
           'long_name': ['Longitude'],
           'standard_name': ['longitude'],
           'units': ['degrees_east',
                     'degree_east',
                     'degree_e',
                     'degrees_e',
                     'degreee',
                     'degreese']},
 'exclude': False,
 'inherit': None,
 'loc': None,
 'name': None,
 'processed': True,
 'search_order': 'snu'}

Description of specification keys:

CF specs for Coordinates (coords)

Key

Type

Description

name

str

Specialized name for decoding and encoding which is empty by default

alt_names

list(str)

Alternate names for decoding

standard_name

list(str)

“standard_name” attributes

long_name

list(str)

“long_name” attributes

units

list(str)

“units” attributes

axis

str

“axis” attribute like X, Y, Z, T or F

search_order

str

Search order within properties as combination of letters: [n]name, [s]tandard_name, [u]nits

inherit

str

Inherit specifications from another data variable or coordinate

Note

Like for data variables, the standard_name, long_name, units and axis specifications are internally stored as a dict in the attrs key.

Get the specialized name and the attributes only:

In [13]: cfspecs.coords.get_name("lon")
Out[13]: 'lon'

In [14]: cfspecs.coords.get_attrs("lon")
Out[14]: 
{'standard_name': 'longitude',
 'long_name': 'Longitude',
 'units': 'degrees_east',
 'axis': 'X'}

Dimensions

CF specs for Dimensions (dims)

Key

Type

Description

x

list(str)

Possible names of X-type dimensions

y

list(str)

Possible names of Y-type dimensions

z

list(str)

Possible names of Z-type vertical dimensions

t

list(str)

Possible names of time dimensions

f

list(str)

Possible names of forecast dimensions

This section of the specs defines the names that the dimensions of type x, y, z, t and f (forecast) can take. It is filled automatically by default with possible names (name and alt_names) of the coordinates that have their axis defined. The user can also add more spcific names.

Theses list of names are used when searching for dimensions that are not coordinates: since they don’t have attribute, their type can only be guessed from their name.

Warning

It is recommended to not fill the dims section, but fill the coordinate section instead.

Here is the default content:

In [15]: cfspecs.dims  # or cfspecs["dims"]
Out[15]: 
{'x': ['x',
  'lon',
  'xi',
  'nx',
  'ni',
  'x',
  'imt',
  'ipi',
  'longitude',
  'nav_lon',
  'longitudes',
  'lons'],
 'y': ['y',
  'lat',
  'yi',
  'ny',
  'nj',
  'y',
  'jmt',
  'jpj',
  'latitude',
  'nav_lat',
  'latitudes',
  'lats'],
 'z': ['level',
  'depth',
  'altitude',
  'sig',
  'station',
  'model_level',
  'lev',
  'nz',
  'nk',
  'z',
  'kmt',
  'zi',
  'dep',
  'altitudet',
  'altitudeu',
  'altitudev',
  's_level',
  'sigma',
  's',
  'sigma_level',
  'stations'],
 't': ['time', 'nt'],
 'f': ['forecast', 'nf', 'fcst']}

Other sections

In [16]: cfspecs["register"]
Out[16]: {'name': 'generic', 'attrs': {}}

In [17]: cfspecs["sglocator"]
Out[17]: {'name_format': '{root}_{loc}', 'valid_locations': None}

In [18]: cfspecs["accessors"]
Out[18]: 
{'properties': {'coords': ['lon',
   'lat',
   'depth',
   'sig',
   'level',
   'altitude',
   'time',
   'forecast'],
  'data_vars': ['temp', 'sal', 'u', 'v', 'bathy', 'ssh']}}

In [19]: cfspecs["vertical"]
Out[19]: {'positive': None, 'type': None}

Searching within a Dataset or DataArray

Let’s define a minimal dataset:

In [20]: nx = 3

In [21]: lon = xr.DataArray(np.arange(3, dtype='d'), dims='mylon',
   ....:     attrs={'standard_name': 'longitude'})
   ....: 

In [22]: temp = xr.DataArray(np.arange(20, 23, dtype='d'), dims='mylon',
   ....:     coords={'mylon': lon},
   ....:     attrs={'standard_name': 'sea_water_temperature'})
   ....: 

In [23]: sal = xr.DataArray(np.arange(33, 36, dtype='d'), dims='mylon',
   ....:     coords={'mylon': lon},
   ....:     attrs={'standard_name': 'sea_water_salinity'})
   ....: 

In [24]: ds = xr.Dataset({'mytemp': temp, 'mysal': sal})

All these arrays are CF compliant according to their standard_name attribute, despite their name is not really explicit.

Check if they match known or explicit CF items:

In [25]: cfspecs.coords.match(lon, "lon") # explicit
Out[25]: True

In [26]: cfspecs.coords.match(lon, "lat") # explicit
Out[26]: False

In [27]: cfspecs.coords.match(lon) # any known
Out[27]: 'lon'

In [28]: cfspecs.data_vars.match(temp) # any known
Out[28]: 'temp'

In [29]: cfspecs.data_vars.match(sal) # any known
Out[29]: 'sal'

Search for known CF items:

In [30]: mytemp = cfspecs.search(ds, "temp")

In [31]: mylon = cfspecs.search(mytemp, "lon")

Datasets are searched for data variables (“data_vars”) and data variables are searched for coordinates (“coords”). You can also search for coordinates in datasets, for instance like this:

In [32]: cfspecs.coords.search(ds, "lon")
Out[32]: 
<xarray.DataArray 'mylon' (mylon: 3)>
array([0., 1., 2.])
Coordinates:
  * mylon    (mylon) float64 0.0 1.0 2.0
Attributes:
    standard_name:  longitude

The staggered-grid locator SGLocator

A CFSpecs instance comes with SGLocator that is accessible through the sglocator attribute. A SGLocator helps parsing and formatting staggered grid location from name, standard_name and long_name data array attributes. It is configured at the sglocator section in which you can specify the format of the name (name_format and the allowed location names allowed_locations.

In [33]: sglocator = cfspecs.sglocator

In [34]: sglocator.formats
Out[34]: 
{'name': '{root}_{loc}',
 'standard_name': '{root}_at_{loc}_location',
 'long_name': '{root} at {loc} location'}

In [35]: sglocator.format_attr("name", "lon", "rho")
Out[35]: 'lon_rho'

In [36]: sglocator.format_attr("standard_name", "lon", "rho")
Out[36]: 'lon_at_rho_location'

In [37]: sglocator.format_attr("long_name", "lon", "rho")
Out[37]: 'Lon at RHO location'

In [38]: sglocator.parse_attr("name", "lon_rho")
Out[38]: ('lon', 'rho')

Formatting i.e encoding and decoding

The idea

Formatting means changing or setting names and attributes. It is possible to format, or even auto-format data variables and coordinates.

During an auto-formatting, each array is matched against CF specs, and the array is formatted when a matching is successfull. If the array contains coordinates, the same process is applied on them, as soon as the format_coords keyword is True. If the name key is set in the matching item specs, its value can be used to name the variable or coordinate array, else the generic name is used like "lon".

Explicit formatting:

In [39]: cfspecs.format_coord(lon, "lon")
Out[39]: 
<xarray.DataArray 'lon' (x: 3)>
array([0., 1., 2.])
Dimensions without coordinates: x
Attributes:
    standard_name:  longitude
    long_name:      Longitude
    units:          degrees_east
    axis:           X

In [40]: cfspecs.format_data_var(temp, "temp")
Out[40]: 
<xarray.DataArray 'temp' (lon: 3)>
array([20., 21., 22.])
Coordinates:
  * lon      (lon) float64 0.0 1.0 2.0
Attributes:
    standard_name:  sea_water_temperature
    long_name:      Sea water in situ temperature
    units:          degrees_celsius

Auto-formatting:

In [41]: ds2 = cfspecs.auto_format(ds)

In [42]: ds2.temp
Out[42]: 
<xarray.DataArray 'temp' (lon: 3)>
array([20., 21., 22.])
Coordinates:
  * lon      (lon) float64 0.0 1.0 2.0
Attributes:
    standard_name:  sea_water_temperature
    long_name:      Sea water in situ temperature
    units:          degrees_celsius

In [43]: ds2.lon
Out[43]: 
<xarray.DataArray 'lon' (lon: 3)>
array([0., 1., 2.])
Coordinates:
  * lon      (lon) float64 0.0 1.0 2.0
Attributes:
    standard_name:  longitude
    long_name:      Longitude
    units:          degrees_east
    axis:           X

It can be applied to a data array or a full dataset.

Encoding and decoding

By default, formatting renames known arrays to their generic name, like “temp” in the example above. We speak here of encoding. If the specialize keyword is set to True, arrays are renamed with their specialized name if set in the specs with the name option. We speak here of decoding. Two shortcut methods exists for these tasks:

Chaining the two methods should lead to the initial dataset or data array. See the last section of this page for an exemple: Example: decoding/encoding Croco model outputs.

Using the accessors

Accessors for xarray.Dataset and xarray.DataArray can be registered with the xoa.cf.register_cf_accessors():

In [44]: import xoa

In [45]: xoa.register_accessors(xcf=True)

The accessor is named here xcf to not conflict with the cf accessor of cf-xarray.

Note

All xoa accessors can be be registered with xoa.register_accessors(). Note also that all functionalities of the xcf accessor are also available with the more global xoa accessor.

These accessors make it easy to use some of the xoa.cf.CFSpecs capabilities. Here are examples of use:

In [46]: temp
Out[46]: 
<xarray.DataArray (mylon: 3)>
array([20., 21., 22.])
Coordinates:
  * mylon    (mylon) float64 0.0 1.0 2.0
Attributes:
    standard_name:  sea_water_temperature

In [47]: temp.xcf.get("lon") # access by .get
Out[47]: 
<xarray.DataArray 'mylon' (mylon: 3)>
array([0., 1., 2.])
Coordinates:
  * mylon    (mylon) float64 0.0 1.0 2.0
Attributes:
    standard_name:  longitude

In [48]: ds.xcf.get("temp") # access by .get
Out[48]: 
<xarray.DataArray 'mytemp' (mylon: 3)>
array([20., 21., 22.])
Coordinates:
  * mylon    (mylon) float64 0.0 1.0 2.0
Attributes:
    standard_name:  sea_water_temperature

In [49]: ds.xcf.lon # access by attribute
Out[49]: 
<xarray.DataArray 'mylon' (mylon: 3)>
array([0., 1., 2.])
Coordinates:
  * mylon    (mylon) float64 0.0 1.0 2.0
Attributes:
    standard_name:  longitude

In [50]: ds.xcf.coords.lon  # specific search = ds.cf.coords.get("lon")
Out[50]: 
<xarray.DataArray 'mylon' (mylon: 3)>
array([0., 1., 2.])
Coordinates:
  * mylon    (mylon) float64 0.0 1.0 2.0
Attributes:
    standard_name:  longitude

In [51]: ds.xcf.temp # access by attribute
Out[51]: 
<xarray.DataArray 'mytemp' (mylon: 3)>
array([20., 21., 22.])
Coordinates:
  * mylon    (mylon) float64 0.0 1.0 2.0
Attributes:
    standard_name:  sea_water_temperature

In [52]: ds.xcf["temp"].name # access by item
Out[52]: 'mytemp'

In [53]: ds.xcf.data_vars.temp.name  # specific search = ds.cf.coords.get("temp")
Out[53]: 'mytemp'

In [54]: ds.xcf.data_vars.bathy is None # returns None when not found
Out[54]: True

In [55]: ds.xcf.temp.xcf.lon.name  # chaining
Out[55]: 'mylon'

In [56]: ds.xcf.temp.xcf.name # CF name, not real name
Out[56]: 'temp'

In [57]: ds.xcf.temp.xcf.attrs # attributes, merged with CF attrs
Out[57]: 
{'standard_name': 'sea_water_temperature',
 'long_name': 'Sea water in situ temperature',
 'units': 'degrees_celsius'}

In [58]: ds.xcf.temp.xcf.standard_name # single attribute
Out[58]: 'sea_water_temperature'

In [59]: ds.mytemp.xcf.auto_format() # or ds.temp.xcf()
Out[59]: 
<xarray.DataArray 'temp' (lon: 3)>
array([20., 21., 22.])
Coordinates:
  * lon      (lon) float64 0.0 1.0 2.0
Attributes:
    standard_name:  sea_water_temperature
    long_name:      Sea water in situ temperature
    units:          degrees_celsius

In [60]: ds.xcf.auto_format() # or ds.xcf()
Out[60]: 
<xarray.Dataset>
Dimensions:  (lon: 3)
Coordinates:
  * lon      (lon) float64 0.0 1.0 2.0
Data variables:
    temp     (lon) float64 20.0 21.0 22.0
    sal      (lon) float64 33.0 34.0 35.0

As you can see, accessing an accessor attribute or item make an implicit call to get. The root accessor cf give access to two sub-accessors, data_vars and coords, for being able to specialize the searches.

See also

xoa.cf.DataArrayCFAccessor xoa.cf.DatasetCFAccessor

Changing the CF specs

Default user file

The xoa.cf module has internal defaults as shown in appendix The default configuration.

You can extend these defaults with a user file, whose location is printable with the following command, at the line containing “user CF specs file”:

$ xoa info paths
- xoa library dir: /home/docs/checkouts/readthedocs.org/user_builds/xoa/conda/latest/lib/python3.11/site-packages/xoa
- user config file: /home/docs/.config/xoa/xoa.cfg [*]
- user CF specs file: /home/docs/.config/xoa/cf.cfg [*]
- user CF cache file: /home/docs/.cache/xoa/cf.pyk
- data samples: mercator.cfg hycom.gdp.u.nc argo-7900573.nc hycom.gdp.v.nc gdp-6203641.csv croco.south-africa.surf.nc croco.cfg croco.south-africa.meridional.nc hycom.gdp.h.nc hycom.cfg croco.south-africa.zonal.nc ibi-argo-7900573.nc gdp.cfg argo.cfg
*: file not present

Update the current specs

The current specs can be updated with different methods.

From a well structured dictionary:

In [61]: cfspecs.load_cfg({"data_vars": {"banana": {"standard_name": "banana"}}})

In [62]: cfspecs.data_vars["banana"]
Out[62]: 
{'standard_name': 'banana',
 'name': None,
 'alt_names': [],
 'cmap': None,
 'inherit': None,
 'squeeze': None,
 'search_order': 'sn',
 'loc': None,
 'add_loc': False,
 'exclude': False,
 'add_coords_loc': {},
 'attrs': {'long_name': ['Banana'], 'standard_name': [], 'units': []},
 'select': {},
 'processed': True}

From a configuration file: instead of the dictionary as an argument to load_cfg() method, you can give either a file name or a multi-line string with the same content as the file. Following the previous example:

[data_vars]
    [[banana]]
        standard_name: banana

If you only want to update a category, you can use such method (here set_specs()):

In [63]: cfspecs.data_vars.set_specs("banana", name="bonono")

In [64]: cfspecs.data_vars["banana"]["name"]
Out[64]: 'bonono'

Alternatively, a xoa.cf.CFSpecs instance can be loaded with the load_cfg() method, as explained below.

Create new specs from scratch

To create new specs, you must instantiate the xoa.cf.CFSpecs class, with an input type as those presented above:

  • A config file name.

  • A Multi-line string in the format of a config file.

  • A dictionary.

  • A configobj.ConfigObj instance.

  • Another CFSpecs instance.

  • A list of them, with the having priority over the lasts.

The initialization also accepts two options:

  • default: wether to load or not the default internal config.

  • user: wether to load or not the user config file.

An config created from default and user configs:

In [65]: banana_specs = {"data_vars": {"banana": {"attrs": {"standard_name": "banana"}}}}

In [66]: mycfspecs = cf.CFSpecs(banana_specs)

In [67]: mycfspecs["data_vars"]["sst"]["attrs"]["standard_name"]
Out[67]: ['sea_surface_temperature', 'surface_sea_water_temperature']

In [68]: mycfspecs["data_vars"]["banana"]["attrs"]["standard_name"]
Out[68]: ['banana']

An config created from scratch:

In [69]: mycfspecs = cf.CFSpecs(banana_specs, default=False, user=False)

In [70]: mycfspecs.pprint(depth=2)
{'accessors': {'properties': {...}},
 'coords': {},
 'data_vars': {'banana': {...}},
 'dims': {'f': [], 't': [], 'x': [], 'y': [], 'z': []},
 'register': {'attrs': {}, 'name': ''},
 'sglocator': {'name_format': '{root}_{loc}', 'valid_locations': None},
 'vertical': {'positive': None, 'type': None}}

An config created from two other configs:

In [71]: cfspecs_banana = cf.CFSpecs(banana_specs, default=False, user=False)

In [72]: apple_specs = {"data_vars": {"apple": {"attrs": {"long_name": "Big apple"}}}}

In [73]: cfspecs_apple = cf.CFSpecs(apple_specs, default=False, user=False)

In [74]: cfspecs_fruits = cf.CFSpecs((cfspecs_apple, cfspecs_banana),
   ....:     default=False, user=False)
   ....: 

In [75]: cfspecs_fruits.data_vars.names
Out[75]: ['apple', 'banana']

Replacing the currents CF specs

As shown before, the currents CF specs are accessible with the xoa.cf.get_cf_specs() function. You can replace them with the xoa.cf.set_cf_specs class, to be used as a fonction.

In [76]: cfspecs_old = cf.get_cf_specs()

In [77]: cf.set_cf_specs(cfspecs_banana)
Out[77]: <xoa.cf.set_cf_specs at 0x7f37bc230850>

In [78]: cf.get_cf_specs() is cfspecs_banana
Out[78]: True

In [79]: cf.set_cf_specs(cfspecs_old)
Out[79]: <xoa.cf.set_cf_specs at 0x7f37bc1c7b50>

In [80]: cf.get_cf_specs() is cfspecs_old
Out[80]: True

In case of a temporary change, you can used set_cf_specs in a context statement:

In [81]: with cf.set_cf_specs(cfspecs_banana) as myspecs:
   ....:     print('inside', cf.get_cf_specs() is cfspecs_banana)
   ....:     print('inside', myspecs is cf.get_cf_specs())
   ....: 
inside True
inside True

In [82]: print('outside', cf.get_cf_specs() is cfspecs_old)
outside True

For convience, you can set specs directly with a dictionary:

In [83]: with cf.set_cf_specs({"data_vars": {"apple": {}}}) as myspecs:
   ....:     print("apple" in cf.get_cf_specs())
   ....: 
False

In [84]: print("apple" in cf.get_cf_specs())
False

Application with an accessor usage:

In [85]: data = xr.DataArray([5], attrs={'standard_name': 'sea_surface_banana'})

In [86]: ds = xr.Dataset({'toto': data})

In [87]: mycfspecs = cf.CFSpecs({"data_vars": {"ssb":
   ....:     {"standard_name": "sea_surface_banana"}}})
   ....: 

In [88]: with cf.set_cf_specs(mycfspecs):
   ....:     print(ds.xcf.get("ssb"))
   ....: 
None

Working with registered specs

Registering and accessing new specs

It is possible to register specialized CFSpecs instances with register_cf_specs() for future access.

Here we register new specs with a internal registration name "mycroco":

In [89]: content = {
   ....:     "register": {
   ....:         "name": "mycroco"
   ....:     },
   ....:     "data_vars": {
   ....:         "temp": {
   ....:             "name": "supertemp"
   ....:         }
   ....:     },
   ....:     "coords": {
   ....:         "lon": {
   ....:             "name": "mylon"
   ....:         }
   ....:     }
   ....: }
   ....: 

In [90]: mycfspecs = cf.CFSpecs(content)

In [91]: cf.register_cf_specs(mycfspecs)

We can now access with it the get_cf_specs() function:

In [92]: these_cfspecs = cf.get_cf_specs('mycroco')

In [93]: these_cfspecs is mycfspecs
Out[93]: True

Inferring the best specs for my dataset

If you set the cfspecs attribute or encoding of a dataset to the name of a registered CFSpecs instance, you can get it automatically with the get_cf_specs() or infer_cf_specs() functions.

Let’s register another CFSpecs instance:

In [94]: content = {
   ....:     "register": {
   ....:         "name": "myhycom"
   ....:     },
   ....:     "data_vars": {
   ....:         "sal": {
   ....:             "name": "supersal"
   ....:         }
   ....:     },
   ....: }
   ....: 

In [95]: mycfspecs2 = cf.CFSpecs(content)

In [96]: cf.register_cf_specs(mycfspecs2)

Let’s create a dataset:

In [97]: ds = xr.Dataset({'supertemp': ("mylon", [0, 2])}, coords={"mylon": [10, 20]})

Now find the best registered specs instance which has either the name myhycom or mycroco:

In [98]: cf_specs_auto = cf.infer_cf_specs(ds)

In [99]: print(cf_specs_auto.name)
mycroco

In [100]: ds_decoded = cf_specs_auto.decode(ds)

In [101]: ds_decoded
Out[101]: 
<xarray.Dataset>
Dimensions:  (lon: 2)
Coordinates:
  * lon      (lon) int64 10 20
Data variables:
    temp     (lon) int64 0 2

In [102]: cf_specs_auto.encode(ds)
Out[102]: 
<xarray.Dataset>
Dimensions:    (mylon: 2)
Coordinates:
  * mylon      (mylon) int64 10 20
Data variables:
    supertemp  (mylon) int64 0 2

It is mycroco as expected.

Assigning registered specs to a dataset or data array

All xoa routines that needs to access specific coordinates or variables try to infer the approriate specs, which default to the current specs. When the cfspecs attribute or encoding is set, get_cf_specs() uses it to search within registered specs.

In [103]: ds.encoding.update(cfspecs="mycroco")

In [104]: cfspecs = cf.get_cf_specs(ds)

In [105]: cfspecs.encode(ds)
Out[105]: 
<xarray.Dataset>
Dimensions:    (mylon: 2)
Coordinates:
  * mylon      (mylon) int64 10 20
Data variables:
    supertemp  (mylon) int64 0 2

The cfspecs encoding is set at the dataset level, not at the data array level:

In [106]: cf.get_cf_specs(ds.supertemp) is cfspecs
Out[106]: True

To propagate to all the data arrays, use assign_cf_specs():

In [107]: cf.assign_cf_specs(ds, "mycroco")
Out[107]: 
<xarray.Dataset>
Dimensions:    (mylon: 2)
Coordinates:
  * mylon      (mylon) int64 10 20
Data variables:
    supertemp  (mylon) int64 0 2

In [108]: ds.mylon.encoding
Out[108]: {'cf_specs': 'mycroco'}

In [109]: cf.get_cf_specs(ds.supertemp) is cfspecs
Out[109]: False

Example: decoding/encoding Croco model outputs

Here are the specs for Croco:

[register]
name=croco

    [[attrs]]
    ini_file=*CROCO*
    type=*ROMS*

[sglocator]
valid_locations=rho,w,u,v,r,psi

[accessors]

    [[properties]]
    coords=ptemp

[vertical]
positive=up
type=sigma

[data_vars]

    [[sal]]
    name=salt

    [[ptemp]]
    name=temp

    [[ssh]]
    name=zeta

    [[bathy]]
    name=h

    [[mask]]
    mask=mask
    add_loc=True

    [[corio]]
    name=f

    [[akt]]
    name=AKt

    [[ex]]
    name=pm

    [[ey]]
    name=pn

    [[sig]]
    name=s

    [[cs]]
    name=Cs
    add_loc=True

[coords]

    [[x]]
    name=xi
    search_order=n
    standard_name=x_grid_index

    [[y]]
    name=eta
    search_order=n
    standard_name=y_grid_index

    [[sig]]
    name=s
    add_loc=True

Register them:

In [110]: xoa.cf.register_cf_specs(xoa.get_data_sample("croco.cfg"))

In [111]: xoa.cf.is_registered_cf_specs("croco")
Out[111]: True

Register the xoa accessor:

In [112]: xoa.register_accessors()

Now let’s open a Croco sample as a xarray dataset:

In [113]: ds = xoa.open_data_sample("croco.south-africa.meridional.nc")

In [114]: ds
Out[114]: 
<xarray.Dataset>
Dimensions:     (time: 1, s_w: 33, eta_rho: 56, xi_rho: 1, s_rho: 32,
                 eta_v: 55, xi_u: 1, auxil: 4)
Coordinates: (12/13)
  * eta_rho     (eta_rho) float32 6.0 7.0 8.0 9.0 10.0 ... 58.0 59.0 60.0 61.0
  * eta_v       (eta_v) float32 6.5 7.5 8.5 9.5 10.5 ... 57.5 58.5 59.5 60.5
    lat_rho     (eta_rho, xi_rho) float32 ...
    lat_u       (eta_rho, xi_u) float32 ...
    lat_v       (eta_v, xi_rho) float32 ...
    lon_rho     (eta_rho, xi_rho) float32 ...
    ...          ...
    lon_v       (eta_v, xi_rho) float32 ...
  * s_rho       (s_rho) float32 -0.9844 -0.9531 -0.9219 ... -0.04688 -0.01562
  * s_w         (s_w) float32 -1.0 -0.9688 -0.9375 ... -0.0625 -0.03125 0.0
  * time        (time) float64 2.592e+06
  * xi_rho      (xi_rho) float32 131.0
  * xi_u        (xi_u) float32 131.5
Dimensions without coordinates: auxil
Data variables: (12/23)
    AKt         (time, s_w, eta_rho, xi_rho) float32 ...
    Cs_r        (s_rho) float32 ...
    Cs_w        (s_w) float32 ...
    Vtransform  float32 ...
    angle       (eta_rho, xi_rho) float32 ...
    el          float32 ...
    ...          ...
    time_step   (time, auxil) int32 ...
    u           (time, s_rho, eta_rho, xi_u) float32 ...
    v           (time, s_rho, eta_v, xi_rho) float32 ...
    w           (time, s_rho, eta_rho, xi_rho) float32 ...
    xl          float32 ...
    zeta        (time, eta_rho, xi_rho) float32 ...
Attributes: (12/57)
    type:           ROMS history file
    title:          BENGUELA TEST MODEL
    date:           
    rst_file:       CROCO_FILES/croco_rst.nc
    his_file:       CROCO_FILES/croco_his.nc
    avg_file:       CROCO_FILES/croco_avg.nc
    ...             ...
    v_sponge:       0.0
    sponge_expl:    Sponge parameters : extent (m) & viscosity (m2.s-1)
    SRCS:           main.F step.F read_inp.F timers_roms.F init_scalars.F ini...
    CPP-options:    REGIONAL BENGUELA_VHR MPI OBC_EAST OBC_WEST OBC_NORTH OBC...
    history:        Tue Mar 31 16:26:24 2020: ncks -O -d time,30 -d xi_rho,13...
    NCO:            4.4.2

Let’s decode it:

In [115]: dsd = ds.xoa.decode()

In [116]: dsd
Out[116]: 
<xarray.Dataset>
Dimensions:     (time: 1, sig_w: 33, y_rho: 56, x_rho: 1, sig_rho: 32, y_v: 55,
                 x_u: 1, auxil: 4)
Coordinates: (12/13)
  * y_rho       (y_rho) float32 6.0 7.0 8.0 9.0 10.0 ... 58.0 59.0 60.0 61.0
  * y_v         (y_v) float32 6.5 7.5 8.5 9.5 10.5 ... 56.5 57.5 58.5 59.5 60.5
    lat_rho     (y_rho, x_rho) float32 ...
    lat_u       (y_rho, x_u) float32 ...
    lat_v       (y_v, x_rho) float32 ...
    lon_rho     (y_rho, x_rho) float32 ...
    ...          ...
    lon_v       (y_v, x_rho) float32 ...
  * sig_rho     (sig_rho) float32 -0.9844 -0.9531 -0.9219 ... -0.04688 -0.01562
  * sig_w       (sig_w) float32 -1.0 -0.9688 -0.9375 ... -0.0625 -0.03125 0.0
  * time        (time) float64 2.592e+06
  * x_rho       (x_rho) float32 131.0
  * x_u         (x_u) float32 131.5
Dimensions without coordinates: auxil
Data variables: (12/23)
    akt         (time, sig_w, y_rho, x_rho) float32 ...
    cs_r        (sig_rho) float32 ...
    cs_w        (sig_w) float32 ...
    Vtransform  float32 ...
    angle       (y_rho, x_rho) float32 ...
    el          float32 ...
    ...          ...
    time_step   (time, auxil) int32 ...
    u           (time, sig_rho, y_rho, x_u) float32 ...
    v           (time, sig_rho, y_v, x_rho) float32 ...
    w           (time, sig_rho, y_rho, x_rho) float32 ...
    xl          float32 ...
    ssh         (time, y_rho, x_rho) float32 ...
Attributes: (12/57)
    type:           ROMS history file
    title:          BENGUELA TEST MODEL
    date:           
    rst_file:       CROCO_FILES/croco_rst.nc
    his_file:       CROCO_FILES/croco_his.nc
    avg_file:       CROCO_FILES/croco_avg.nc
    ...             ...
    v_sponge:       0.0
    sponge_expl:    Sponge parameters : extent (m) & viscosity (m2.s-1)
    SRCS:           main.F step.F read_inp.F timers_roms.F init_scalars.F ini...
    CPP-options:    REGIONAL BENGUELA_VHR MPI OBC_EAST OBC_WEST OBC_NORTH OBC...
    history:        Tue Mar 31 16:26:24 2020: ncks -O -d time,30 -d xi_rho,13...
    NCO:            4.4.2

Let’s re-encode it!

In [117]: dse = dsd.xoa.encode()

In [118]: dse
Out[118]: 
<xarray.Dataset>
Dimensions:     (time: 1, s_w: 33, eta_rho: 56, xi_rho: 1, s_rho: 32,
                 eta_v: 55, xi_u: 1, auxil: 4)
Coordinates: (12/13)
  * eta_rho     (eta_rho) float32 6.0 7.0 8.0 9.0 10.0 ... 58.0 59.0 60.0 61.0
  * eta_v       (eta_v) float32 6.5 7.5 8.5 9.5 10.5 ... 57.5 58.5 59.5 60.5
    lat_rho     (eta_rho, xi_rho) float32 ...
    lat_u       (eta_rho, xi_u) float32 ...
    lat_v       (eta_v, xi_rho) float32 ...
    lon_rho     (eta_rho, xi_rho) float32 ...
    ...          ...
    lon_v       (eta_v, xi_rho) float32 ...
  * s_rho       (s_rho) float32 -0.9844 -0.9531 -0.9219 ... -0.04688 -0.01562
  * s_w         (s_w) float32 -1.0 -0.9688 -0.9375 ... -0.0625 -0.03125 0.0
  * time        (time) float64 2.592e+06
  * xi_rho      (xi_rho) float32 131.0
  * xi_u        (xi_u) float32 131.5
Dimensions without coordinates: auxil
Data variables: (12/23)
    AKt         (time, s_w, eta_rho, xi_rho) float32 ...
    Cs_r        (s_rho) float32 ...
    Cs_w        (s_w) float32 ...
    Vtransform  float32 ...
    angle       (eta_rho, xi_rho) float32 ...
    el          float32 ...
    ...          ...
    time_step   (time, auxil) int32 ...
    u           (time, s_rho, eta_rho, xi_u) float32 ...
    v           (time, s_rho, eta_v, xi_rho) float32 ...
    w           (time, s_rho, eta_rho, xi_rho) float32 ...
    xl          float32 ...
    zeta        (time, eta_rho, xi_rho) float32 ...
Attributes: (12/57)
    type:           ROMS history file
    title:          BENGUELA TEST MODEL
    date:           
    rst_file:       CROCO_FILES/croco_rst.nc
    his_file:       CROCO_FILES/croco_his.nc
    avg_file:       CROCO_FILES/croco_avg.nc
    ...             ...
    v_sponge:       0.0
    sponge_expl:    Sponge parameters : extent (m) & viscosity (m2.s-1)
    SRCS:           main.F step.F read_inp.F timers_roms.F init_scalars.F ini...
    CPP-options:    REGIONAL BENGUELA_VHR MPI OBC_EAST OBC_WEST OBC_NORTH OBC...
    history:        Tue Mar 31 16:26:24 2020: ncks -O -d time,30 -d xi_rho,13...
    NCO:            4.4.2

Et voilà !

Example: decoding and merging Hycom splitted outputs

At Shom, the Hycom model outputs are splitted into separate files, one for each variable. Conflicts may occur when using variables that are not at the same staggered grid location, since all variables are stored with dimensions (y, x) and lon and lat coordinates, all with the same name. To avoid this problem, the horizontal dimensions and coordinates of staggered variables are renamed to indicate their location.

Here are the specs to take care of the staggered grid indicators in the names:

[register]
name=hycom

    [[attrs]]
    nsigma = "[0-9]?"

[vertical]
positive=down
type=dz

[data_vars]

    [[u]]
    loc=u
        [[[add_coords_loc]]]
        lon=True
        lat=True
        x=True
        y=True

    [[v]]
    loc=v
        [[[add_coords_loc]]]
        lon=True
        lat=True
        x=True
        y=True

    [[bathy]]
    name=bathymetry
    add_loc=True

    [[dz]]
    name=h

[coords]

    [[x]]
    name=X

    [[y]]
    name=Y

    [[z]]
    name=lev

Note the add_coords_loc sections. Location is not added to the u and v variables but to their dimensions and coordinates.

Register them:

In [119]: xoa.cf.register_cf_specs(xoa.get_data_sample("hycom.cfg"))

In [120]: xoa.cf.is_registered_cf_specs("hycom")
Out[120]: True

Overview of the U dataset:

In [121]: dsu = xoa.open_data_sample("hycom.gdp.u.nc")

In [122]: dsu
Out[122]: 
<xarray.Dataset>
Dimensions:  (Y: 11, X: 13, lev: 6, time: 50)
Coordinates:
  * lev      (lev) int32 1 2 3 4 5 6
  * time     (time) datetime64[ns] 2021-03-01 ... 2021-03-25T12:00:00
Dimensions without coordinates: Y, X
Data variables:
    lon      (Y, X) float32 ...
    lat      (Y, X) float32 ...
    u        (time, lev, Y, X) float32 ...
Attributes: (12/26)
    history:                   Fri Apr  9 12:15:12 2021: ncrcat -x -v bathyme...
    Conventions:               COARDS
    nhybrd:                    20
    nsigma:                    19
    kapref:                    2
    hybflg:                    0
    ...                        ...
    dp00ft:                    [1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 ...
    dp00xt:                    [21.874 21.414 21.414 13.197  7.85   3.85   4....
    ds00t:                     [3.555 3.026 3.026 1.547 0.828 0.828 0.968 0.8...
    ds00ft:                    [1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 ...
    ds00xt:                    [21.874 21.414 21.414  7.197  3.85   3.85   4....
    nco_openmp_thread_number:  1

Decoding:

In [123]: dsu = dsu.xoa.decode()

In [124]: dsu
Out[124]: 
<xarray.Dataset>
Dimensions:  (y_u: 11, x_u: 13, z: 6, time: 50)
Coordinates:
    lon_u    (y_u, x_u) float32 ...
    lat_u    (y_u, x_u) float32 ...
  * z        (z) int32 1 2 3 4 5 6
  * time     (time) datetime64[ns] 2021-03-01 ... 2021-03-25T12:00:00
Dimensions without coordinates: y_u, x_u
Data variables:
    u        (time, z, y_u, x_u) float32 ...
Attributes: (12/26)
    history:                   Fri Apr  9 12:15:12 2021: ncrcat -x -v bathyme...
    Conventions:               COARDS
    nhybrd:                    20
    nsigma:                    19
    kapref:                    2
    hybflg:                    0
    ...                        ...
    dp00ft:                    [1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 ...
    dp00xt:                    [21.874 21.414 21.414 13.197  7.85   3.85   4....
    ds00t:                     [3.555 3.026 3.026 1.547 0.828 0.828 0.968 0.8...
    ds00ft:                    [1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 ...
    ds00xt:                    [21.874 21.414 21.414  7.197  3.85   3.85   4....
    nco_openmp_thread_number:  1

Now we can read, decode and merge all files without any conflict:

In [125]: dsv = xoa.open_data_sample("hycom.gdp.u.nc").xoa.decode()

In [126]: dsh = xoa.open_data_sample("hycom.gdp.h.nc").xoa.decode()

In [127]: ds = xr.merge([dsu, dsv, dsh])

In [128]: ds
Out[128]: 
<xarray.Dataset>
Dimensions:  (y_u: 11, x_u: 13, z: 6, time: 50, y: 11, x: 13)
Coordinates:
    lon_u    (y_u, x_u) float32 -3.213 -3.138 -3.062 ... -2.463 -2.388 -2.312
    lat_u    (y_u, x_u) float32 44.69 44.69 44.69 44.69 ... 45.22 45.22 45.22
  * z        (z) int32 1 2 3 4 5 6
  * time     (time) datetime64[ns] 2021-03-01 ... 2021-03-25T12:00:00
    lon      (y, x) float32 ...
    lat      (y, x) float32 ...
Dimensions without coordinates: y_u, x_u, y, x
Data variables:
    u        (time, z, y_u, x_u) float32 0.03078 0.02891 ... 0.1252 0.1272
    bathy    (y, x) float32 ...
    dz       (time, z, y, x) float32 ...
Attributes: (12/26)
    history:                   Fri Apr  9 12:15:12 2021: ncrcat -x -v bathyme...
    Conventions:               COARDS
    nhybrd:                    20
    nsigma:                    19
    kapref:                    2
    hybflg:                    0
    ...                        ...
    dp00ft:                    [1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 ...
    dp00xt:                    [21.874 21.414 21.414 13.197  7.85   3.85   4....
    ds00t:                     [3.555 3.026 3.026 1.547 0.828 0.828 0.968 0.8...
    ds00ft:                    [1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 ...
    ds00xt:                    [21.874 21.414 21.414  7.197  3.85   3.85   4....
    nco_openmp_thread_number:  1