Metadata, CF and naming conventions#
Warning
Have a quick look at the appendix The default configuration before going any further!
Introduction#
The xoa.meta subpackage is an application and extension of a subset of the
CF conventions in application
to naming conventions.
It has two main intents:
Searching
xarray.DataArrayvariables or coordinates within anotherxarray.DataArrayor axarray.Dataset, by scanning name and attributes likeunitsandstandard_name.Formatting
xarray.DataArrayvariables or coordinates with uniquename, andstandard_name,long_nameandunitsattributes, with support of staggered grid location syntax.
The module offers capabilities for the user to extend and specialize default behaviors for user’s special datasets.
xoa provides ready-to-use configurations for decoding and encoding a few standard dataset types.
Look at Specialized configurations.
Note
This module shares common features with the excellent and long awaited cf_xarray package and started a long time before with the Vacumm package. The most notable differences include:
The current module is also designed for data variables and dimensions, not only coordinates.
It searches for items not only using standard_names but also specialized names. It means that it works even with datasets that are not well formatted.
It is not only available as accessors, but also as independent specification objects that can be configured by the user for each type of dataset.
Quick start with accessors#
The easiest way to use xoa.meta is through xarray accessors.
Let’s set up a sample dataset and register the accessors:
In [1]: from xoa import meta
In [2]: meta_specs = meta.get_meta_specs()
In [3]: nx = 3
In [4]: lon = xr.DataArray(np.arange(3, dtype='d'), dims='mylon',
...: attrs={'standard_name': 'longitude'})
...:
In [5]: temp = xr.DataArray(np.arange(20, 23, dtype='d'), dims='mylon',
...: coords={'mylon': lon},
...: attrs={'standard_name': 'sea_water_temperature'})
...:
In [6]: sal = xr.DataArray(np.arange(33, 36, dtype='d'), dims='mylon',
...: coords={'mylon': lon},
...: attrs={'standard_name': 'sea_water_salinity'})
...:
In [7]: ds = xr.Dataset({'mytemp': temp, 'mysal': sal})
In [8]: import xoa
In [9]: xoa.register_accessors()
All these arrays are CF compliant according to their
standard_name attribute, despite their names not being very explicit.
The accessors make it easy to find, format and chain operations:
In [10]: temp
Out[10]:
<xarray.DataArray (mylon: 3)> Size: 24B
array([20., 21., 22.])
Coordinates:
* mylon (mylon) float64 24B 0.0 1.0 2.0
Attributes:
standard_name: sea_water_temperature
In [11]: temp.meta.get("lon") # access by .get
Out[11]:
<xarray.DataArray 'mylon' (mylon: 3)> Size: 24B
array([0., 1., 2.])
Coordinates:
* mylon (mylon) float64 24B 0.0 1.0 2.0
Attributes:
standard_name: longitude
In [12]: ds.meta.get("temp") # access by .get
Out[12]:
<xarray.DataArray 'mytemp' (mylon: 3)> Size: 24B
array([20., 21., 22.])
Coordinates:
* mylon (mylon) float64 24B 0.0 1.0 2.0
Attributes:
standard_name: sea_water_temperature
In [13]: ds.meta.lon # access by attribute
Out[13]:
<xarray.DataArray 'mylon' (mylon: 3)> Size: 24B
array([0., 1., 2.])
Coordinates:
* mylon (mylon) float64 24B 0.0 1.0 2.0
Attributes:
standard_name: longitude
In [14]: ds.meta.coords.lon # specific search = ds.meta.coords.get("lon")
Out[14]:
<xarray.DataArray 'mylon' (mylon: 3)> Size: 24B
array([0., 1., 2.])
Coordinates:
* mylon (mylon) float64 24B 0.0 1.0 2.0
Attributes:
standard_name: longitude
In [15]: ds.meta.temp # access by attribute
Out[15]:
<xarray.DataArray 'mytemp' (mylon: 3)> Size: 24B
array([20., 21., 22.])
Coordinates:
* mylon (mylon) float64 24B 0.0 1.0 2.0
Attributes:
standard_name: sea_water_temperature
In [16]: ds.meta["temp"].name # access by item
Out[16]: 'mytemp'
In [17]: ds.meta.data_vars.temp.name # specific search = ds.meta.data_vars.get("temp")
Out[17]: 'mytemp'
In [18]: ds.meta.data_vars.bathy is None # returns None when not found
Out[18]: True
In [19]: ds.meta.temp.meta.lon.name # chaining
Out[19]: 'mylon'
In [20]: ds.meta.temp.meta.name # generic meta name, not real name
Out[20]: 'temp'
In [21]: ds.meta.temp.meta.attrs # attributes, merged with CF attrs
Out[21]:
{'standard_name': 'sea_water_temperature',
'long_name': 'Sea water in situ temperature',
'units': 'degrees_celsius'}
In [22]: ds.meta.temp.meta.standard_name # single attribute
Out[22]: 'sea_water_temperature'
In [23]: ds.mytemp.meta.auto_format()
Out[23]:
<xarray.DataArray 'temp' (lon: 3)> Size: 24B
array([20., 21., 22.])
Coordinates:
* lon (lon) float64 24B 0.0 1.0 2.0
Attributes:
standard_name: sea_water_temperature
long_name: Sea water in situ temperature
units: degrees_celsius
In [24]: ds.meta.auto_format()
Out[24]:
<xarray.Dataset> Size: 72B
Dimensions: (lon: 3)
Coordinates:
* lon (lon) float64 24B 0.0 1.0 2.0
Data variables:
temp (lon) float64 24B 20.0 21.0 22.0
sal (lon) float64 24B 33.0 34.0 35.0
Accessing an accessor attribute or item makes an
implicit call to get().
The accessor gives access to
two sub-accessors, data_vars
and coords,
for specializing the searches.
See also
xoa.accessors.MetaDataArrayAccessor
xoa.accessors.MetaDatasetAccessor
Searching and matching#
Under the hood, accessor lookups are powered by the MetaSpecs
methods. You can use them directly.
Check if arrays match known or explicit meta items:
In [25]: meta_specs.coords.match(lon, "lon") # explicit
Out[25]: True
In [26]: meta_specs.coords.match(lon, "lat") # explicit
Out[26]: False
In [27]: meta_specs.coords.match(lon) # any known
Out[27]: 'lon'
In [28]: meta_specs.data_vars.match(temp) # any known
Out[28]: 'temp'
In [29]: meta_specs.data_vars.match(sal) # any known
Out[29]: 'sal'
Search for known meta items:
In [30]: mytemp = meta_specs.search(ds, "temp")
In [31]: mylon = meta_specs.search(mytemp, "lon")
Datasets are searched for data variables (“data_vars”) and data variables are searched for coordinates (“coords”). You can also search for coordinates in datasets, for instance like this:
In [32]: meta_specs.coords.search(ds, "lon")
Out[32]:
<xarray.DataArray 'mylon' (mylon: 3)> Size: 24B
array([0., 1., 2.])
Coordinates:
* mylon (mylon) float64 24B 0.0 1.0 2.0
Attributes:
standard_name: longitude
Formatting, encoding and decoding#
The idea#
Formatting means changing or setting names and attributes. It is possible to format, or even auto-format data variables and coordinates.
During an auto-formatting, each array is matched against meta specs,
and the array is formatted when a matching is successful.
If the array contains coordinates, the same process is applied on them,
as soon as the format_coords keyword is True.
If the name key is set in the matching item specs, its value can be used
to name the variable or coordinate array, else the generic name is used
like "lon".
Explicit formatting:
In [33]: meta_specs.format_coord(lon, "lon")
Out[33]:
<xarray.DataArray 'lon' (x: 3)> Size: 24B
array([0., 1., 2.])
Dimensions without coordinates: x
Attributes:
standard_name: longitude
long_name: Longitude
units: degrees_east
axis: X
In [34]: meta_specs.format_data_var(temp, "temp")
Out[34]:
<xarray.DataArray 'temp' (lon: 3)> Size: 24B
array([20., 21., 22.])
Coordinates:
* lon (lon) float64 24B 0.0 1.0 2.0
Attributes:
standard_name: sea_water_temperature
long_name: Sea water in situ temperature
units: degrees_celsius
Auto-formatting:
In [35]: ds2 = meta_specs.auto_format(ds)
In [36]: ds2.temp
Out[36]:
<xarray.DataArray 'temp' (lon: 3)> Size: 24B
array([20., 21., 22.])
Coordinates:
* lon (lon) float64 24B 0.0 1.0 2.0
Attributes:
standard_name: sea_water_temperature
long_name: Sea water in situ temperature
units: degrees_celsius
In [37]: ds2.lon
Out[37]:
<xarray.DataArray 'lon' (lon: 3)> Size: 24B
array([0., 1., 2.])
Coordinates:
* lon (lon) float64 24B 0.0 1.0 2.0
Attributes:
standard_name: longitude
long_name: Longitude
units: degrees_east
axis: X
It can be applied to a data array or a full dataset.
See also
xoa.meta.general.MetaSpecs.format_coord()
xoa.meta.general.MetaSpecs.format_data_var()
xoa.meta.general.MetaSpecs.auto_format()
Encoding and decoding#
By default, formatting renames known arrays to their generic name,
like “temp” in the example above. We speak here of encoding.
If the specialize keyword is set to True, arrays are
renamed with their specialized name if set in the specs with the
name option.
We speak here of decoding.
Two shortcut methods exist for these tasks:
Decoding:
decode()Encoding:
encode()
Chaining the two methods should lead to the initial dataset or data array. See the last section of this page for an example: Example: decoding/encoding Croco model outputs.
See also
xoa.meta.general.MetaSpecs.decode()
xoa.meta.general.MetaSpecs.encode()
Staggered grid locations#
A MetaSpecs instance comes with a SGLocator
that is accessible through the sglocator attribute.
A SGLocator helps parsing and formatting staggered grid
location from name, standard_name and long_name
data array attributes.
It is configured at the sglocator section
in which you can specify the
format of the name (name_format and
the allowed location names allowed_locations.
In [38]: sglocator = meta_specs.sglocator
In [39]: sglocator.formats
Out[39]:
{'name': '{root}_{loc}',
'standard_name': '{root}_at_{loc}_location',
'long_name': '{root} at {loc} location'}
In [40]: sglocator.update(name_format="{root}_{loc}")
In [41]: sglocator.format_attr("name", "lon", "rho")
Out[41]: 'lon_rho'
In [42]: sglocator.format_attr("standard_name", "lon", "rho")
Out[42]: 'lon_at_rho_location'
In [43]: sglocator.format_attr("long_name", "lon", "rho")
Out[43]: 'Lon at RHO location'
In [44]: sglocator.parse_attr("name", "lon_rho")
Out[44]: ('lon', 'rho')
Understanding the specifications#
Scanning and formatting actions are based on specifications,
and this module natively includes a
default configuration
for various oceanographic, sea surface and atmospheric variables and coordinates.
A distinction is made between
data variables (data_vars)
and coordinates (coords), like in xarray.
Getting the current specifications for data variables and coordinates
with the get_meta_specs() function:
In [45]: meta_specs["data_vars"] is meta_specs.data_vars
Out[45]: True
In [46]: meta_specs["coords"] is meta_specs.coords
Out[46]: True
In [47]: meta_specs.data_vars.names[:3]
Out[47]: ['sal', 'ptemp', 'ssh']
In [48]: meta_specs.coords.names[:3]
Out[48]: ['x', 'y', 'sig']
See the appendix The default configuration for the list of available default specifications.
An instance of the MetaSpecs has other configuration
sections
than
data_vars,
coords and
dims:
registration,
sglocator,
vertical and
accessors.
Data variables#
Here is the example of the sst data variable:
In [49]: from pprint import pprint
In [50]: pprint(meta_specs.data_vars['sst'])
{'add_coords_loc': {},
'add_loc': False,
'alt_names': [],
'attrs': {'long_name': ['Sea surface temperature'],
'standard_name': ['sea_surface_temperature',
'surface_sea_water_temperature'],
'units': ['degrees_celsius']},
'cmap': 'cmo.thermal',
'exclude': False,
'inherit': None,
'loc': None,
'name': None,
'processed': True,
'search_order': 'sn',
'select': {},
'squeeze': None}
Description of specification keys:
Key |
Type |
Description |
|
str |
Specialized name for decoding and encoding which is empty by default |
|
list(str) |
Alternate names for decoding |
|
list(str) |
“standard_name” attributes |
|
list(str) |
“long_name” attributes |
|
list(str) |
“units” attributes |
|
choice |
Domain of application, within {‘generic’, ‘atmos’, ‘ocean’, ‘surface’} |
|
str |
Search order within properties as combination of letters: [n]name, [s]tandard_name, [u]nits |
|
str |
Colormap specification |
|
str |
Inherit specifications from another data variable |
|
eval |
Item selection evaluated and applied to the array |
|
list(str) |
List of dimensions that must be squeezed out |
Note
The standard_name, long_name and units specifications are
internally stored as a dict in the attrs key.
Get the specialized name and the attributes only:
In [51]: meta_specs.data_vars.get_name("sst")
Out[51]: 'sst'
In [52]: meta_specs.data_vars.get_attrs("sst")
Out[52]:
{'standard_name': 'sea_surface_temperature',
'long_name': 'Sea surface temperature',
'units': 'degrees_celsius'}
Coordinates#
Here is the example of the lon coordinate:
In [53]: from pprint import pprint
In [54]: pprint(meta_specs.coords['lon'])
{'add_coords_loc': {},
'add_loc': None,
'alt_names': ['longitude', 'nav_lon', 'longitudes', 'lons'],
'attrs': {'axis': 'X',
'long_name': ['Longitude'],
'standard_name': ['longitude'],
'units': ['degrees_east',
'degree_east',
'degree_e',
'degrees_e',
'degreee',
'degreese']},
'exclude': False,
'inherit': None,
'loc': None,
'name': None,
'processed': True,
'search_order': 'snu'}
Description of specification keys:
Key |
Type |
Description |
|
str |
Specialized name for decoding and encoding which is empty by default |
|
list(str) |
Alternate names for decoding |
|
list(str) |
“standard_name” attributes |
|
list(str) |
“long_name” attributes |
|
list(str) |
“units” attributes |
|
str |
“axis” attribute like X, Y, Z, T or F |
|
str |
Search order within properties as combination of letters: [n]name, [s]tandard_name, [u]nits |
|
str |
Inherit specifications from another data variable or coordinate |
Note
Like for data variables, the standard_name, long_name, units
and axis specifications are internally stored as a dict in the attrs key.
Get the specialized name and the attributes only:
In [55]: meta_specs.coords.get_name("lon")
Out[55]: 'lon'
In [56]: meta_specs.coords.get_attrs("lon")
Out[56]:
{'standard_name': 'longitude',
'long_name': 'Longitude',
'units': 'degrees_east',
'axis': 'X'}
Dimensions#
Key |
Type |
Description |
|
list(str) |
Possible names of X-type dimensions |
|
list(str) |
Possible names of Y-type dimensions |
|
list(str) |
Possible names of Z-type vertical dimensions |
|
list(str) |
Possible names of time dimensions |
|
list(str) |
Possible names of forecast dimensions |
This section of the specs defines the names that the dimensions
of type x, y, z, t and f (forecast) can take.
It is filled automatically by default with possible names (name and alt_names)
of the coordinates that have their axis defined.
The user can also add more specific names.
These lists of names are used when searching for dimensions that are not coordinates: since they don’t have attributes, their type can only be guessed from their name.
Warning
It is recommended to not fill the dims section, but fill the coordinate section instead.
Here is the default content:
In [57]: meta_specs.dims # or meta_specs["dims"]
Out[57]:
{'x': ['x',
'xi',
'lon',
'xi',
'nx',
'ni',
'x',
'imt',
'ipi',
'longitude',
'nav_lon',
'longitudes',
'lons'],
'y': ['y',
'eta',
'lat',
'yi',
'ny',
'nj',
'y',
'jmt',
'jpj',
'latitude',
'nav_lat',
'latitudes',
'lats'],
'z': ['sig',
's',
'depth',
'level',
'altitude',
'station',
's_level',
'sigma',
's',
'sigma_level',
'dep',
'model_level',
'lev',
'nz',
'nk',
'z',
'kmt',
'zi',
'altitudet',
'altitudeu',
'altitudev',
'stations'],
't': ['time', 'nt'],
'f': ['forecast', 'nf', 'fcst']}
Other sections#
In [58]: meta_specs["register"]
Out[58]: {'name': 'croco', 'attrs': {'ini_file': ['*CROCO*'], 'type': ['*ROMS*']}}
In [59]: meta_specs["sglocator"]
Out[59]:
{'name_format': '{root}_{loc}',
'valid_locations': ['rho', 'w', 'u', 'v', 'r', 'psi']}
In [60]: meta_specs["accessors"]
Out[60]:
{'properties': {'coords': ['ptemp',
'lon',
'lat',
'depth',
'sig',
'level',
'altitude',
'time',
'forecast'],
'data_vars': ['temp',
'sal',
'u',
'v',
'bathy',
'ssh',
'temp',
'sal',
'u',
'v',
'bathy',
'ssh']}}
In [61]: meta_specs["vertical"]
Out[61]: {'positive': 'up', 'type': 'sigma'}
Customizing the specifications#
Default user file#
The xoa.meta module has internal defaults as shown
in appendix The default configuration.
You can extend these defaults with a user file, whose location is printable with the following command, at the line containing “user meta specs file”:
$ xoa info paths
- xoa library dir: /home/docs/checkouts/readthedocs.org/user_builds/xoa/conda/latest/lib/python3.14/site-packages/xoa
- user config file: /home/docs/.config/xoa/xoa.cfg [*]
- user CF specs file: /home/docs/.config/xoa/meta.cfg [*]
- data samples: OBS/ARGO/argo-7900573.nc MODELS/CMEMS-IBI/ibi-argo-7900573.nc MODELS/CROCO/SOUTH-AFRICA/croco.south-africa.meridional.nc MODELS/CROCO/SOUTH-AFRICA/croco.south-africa.surf.nc MODELS/CROCO/SOUTH-AFRICA/croco.south-africa.zonal.nc MODELS/HYCOM/hycom.gdp.h.nc MODELS/HYCOM/hycom.gdp.u.nc MODELS/HYCOM/hycom.gdp.v.nc OBS/DRIFTERS/gdp-6203641.csv
*: file not present
Updating current specs#
The current specs can be updated with different methods.
From a well structured dictionary:
In [62]: meta_specs.load_cfg({"data_vars": {"banana": {"standard_name": "banana"}}})
In [63]: meta_specs.data_vars["banana"]
Out[63]:
{'name': None,
'alt_names': [],
'cmap': None,
'inherit': None,
'squeeze': None,
'search_order': 'sn',
'loc': None,
'add_loc': False,
'exclude': False,
'add_coords_loc': {},
'attrs': {'long_name': ['Banana'], 'standard_name': ['banana'], 'units': []},
'select': {},
'processed': True}
From a configuration file: instead of the dictionary as an argument
to load_cfg() method, you can give either a
file name or a multi-line string with the same content as
the file.
Following the previous example:
[data_vars]
[[banana]]
standard_name: banana
If you only want to update a category,
you can use such method (here set_specs()):
In [64]: meta_specs.data_vars.set_specs("banana", name="bonono")
In [65]: meta_specs.data_vars["banana"]["name"]
Out[65]: 'bonono'
Alternatively, a xoa.meta.general.MetaSpecs instance can be loaded
with the load_cfg() method, as explained below.
Creating new specs#
To create new specs, you must instantiate the xoa.meta.general.MetaSpecs class,
with an input type as those presented above:
A config file name.
A multi-line string in the format of a config file.
A dictionary.
A
configobj.ConfigObjinstance.Another
MetaSpecsinstance.A list of them, with the first having priority over the last.
The initialization also accepts two options:
default: whether to load or not the default internal config.user: whether to load or not the user config file.
A config created from default and user configs:
In [66]: banana_specs = {"data_vars": {"banana": {"attrs": {"standard_name": "banana"}}}}
In [67]: mymeta_specs = meta.general.MetaSpecs(banana_specs)
In [68]: mymeta_specs["data_vars"]["sst"]["attrs"]["standard_name"]
Out[68]: ['sea_surface_temperature', 'surface_sea_water_temperature']
In [69]: mymeta_specs["data_vars"]["banana"]["attrs"]["standard_name"]
Out[69]: ['banana']
A config created from scratch:
In [70]: mymeta_specs = meta.general.MetaSpecs(banana_specs, default=False, user=False)
In [71]: mymeta_specs.pprint(depth=2)
{'accessors': {'properties': {...}},
'coords': {},
'data_vars': {'banana': {...}},
'dims': {'f': [], 't': [], 'x': [], 'y': [], 'z': []},
'exclude_names': None,
'register': {'attrs': {}, 'name': ''},
'sglocator': {'name_format': False, 'valid_locations': None},
'vertical': {'positive': None, 'type': None}}
A config created from two other configs:
In [72]: meta_specs_banana = meta.general.MetaSpecs(banana_specs, default=False, user=False)
In [73]: apple_specs = {"data_vars": {"apple": {"attrs": {"long_name": "Big apple"}}}}
In [74]: meta_specs_apple = meta.general.MetaSpecs(apple_specs, default=False, user=False)
In [75]: meta_specs_fruits = meta.general.MetaSpecs((meta_specs_apple, meta_specs_banana),
....: default=False, user=False)
....:
In [76]: meta_specs_fruits.data_vars.names
Out[76]: ['apple', 'banana']
Temporarily replacing specs#
As shown before, the current meta specs are accessible with the
xoa.meta.get_meta_specs() function.
You can replace them with the xoa.meta.set_meta_specs class,
to be used as a function.
In [77]: meta_specs_old = meta.get_meta_specs()
In [78]: meta.set_meta_specs(meta_specs_banana)
Out[78]: <xoa.meta.set_meta_specs at 0x7b38c5c7e650>
In [79]: meta.get_meta_specs() is meta_specs_banana
Out[79]: True
In [80]: meta.set_meta_specs(meta_specs_old)
Out[80]: <xoa.meta.set_meta_specs at 0x7b38c5c7e3f0>
In [81]: meta.get_meta_specs() is meta_specs_old
Out[81]: True
In case of a temporary change, you can use set_meta_specs
in a context statement:
In [82]: with meta.set_meta_specs(meta_specs_banana) as myspecs:
....: print('inside', meta.get_meta_specs() is meta_specs_banana)
....: print('inside', myspecs is meta.get_meta_specs())
....:
inside True
inside True
In [83]: print('outside', meta.get_meta_specs() is meta_specs_old)
outside True
For convenience, you can set specs directly with a dictionary:
In [84]: with meta.set_meta_specs({"data_vars": {"apple": {}}}) as myspecs: ....: print("apple" in meta.get_meta_specs()) ....: True In [85]: print("apple" in meta.get_meta_specs()) False
Application with an accessor usage:
In [86]: data = xr.DataArray([5], attrs={'standard_name': 'sea_surface_banana'})
In [87]: ds = xr.Dataset({'toto': data})
In [88]: mymeta_specs = meta.general.MetaSpecs({"data_vars": {"ssb":
....: {"standard_name": "sea_surface_banana"}}})
....:
In [89]: with meta.set_meta_specs(mymeta_specs):
....: print(ds.meta.get("ssb"))
....:
<xarray.DataArray 'toto' (dim_0: 1)> Size: 8B
array([5])
Dimensions without coordinates: dim_0
Attributes:
standard_name: sea_surface_banana
Registering and inferring specs#
It is possible to register specialized MetaSpecs instances
with register_meta_specs() for future access.
Here we register new specs with an internal registration name "mycroco":
In [90]: content = {
....: "register": {
....: "name": "mycroco"
....: },
....: "data_vars": {
....: "temp": {
....: "name": "supertemp"
....: }
....: },
....: "coords": {
....: "lon": {
....: "name": "mylon"
....: }
....: }
....: }
....:
In [91]: mymeta_specs = meta.general.MetaSpecs(content)
In [92]: meta.register_meta_specs(mymeta_specs)
We can now access it with the get_meta_specs() function:
In [93]: these_meta_specs = meta.get_meta_specs('mycroco')
In [94]: these_meta_specs is mymeta_specs
Out[94]: True
If you set the meta_specs attribute or encoding of a dataset
to the name of a registered MetaSpecs instance, you can
get it automatically with the get_meta_specs() function.
Let’s register another MetaSpecs instance:
In [95]: content = {
....: "register": {
....: "name": "myhycom"
....: },
....: "data_vars": {
....: "sal": {
....: "name": "supersal"
....: }
....: },
....: }
....:
In [96]: mymeta_specs2 = meta.general.MetaSpecs(content)
In [97]: meta.register_meta_specs(mymeta_specs2)
Let’s create a dataset:
In [98]: ds = xr.Dataset({'supertemp': ("mylon", [0, 2])}, coords={"mylon": [10, 20]})
Now find the best registered specs instance which has either the name
myhycom or mycroco:
In [99]: meta_specs_auto = meta.infer_meta_specs(ds)
In [100]: print(meta_specs_auto.name)
mycroco
In [101]: ds_decoded = meta_specs_auto.decode(ds)
In [102]: ds_decoded
Out[102]:
<xarray.Dataset> Size: 32B
Dimensions: (lon: 2)
Coordinates:
* lon (lon) int64 16B 10 20
Data variables:
temp (lon) int64 16B 0 2
In [103]: meta_specs_auto.encode(ds)
Out[103]:
<xarray.Dataset> Size: 32B
Dimensions: (mylon: 2)
Coordinates:
* mylon (mylon) int64 16B 10 20
Data variables:
supertemp (mylon) int64 16B 0 2
It is mycroco as expected.
Assigning specs to datasets#
All xoa routines that need to access specific coordinates
or variables try to infer the appropriate specs, which default
to the current specs.
When the meta_specs attribute or encoding is set,
get_meta_specs() uses it to search within
registered specs.
In [104]: ds.encoding.update(meta_specs="mycroco")
In [105]: meta_specs = meta.get_meta_specs(ds)
In [106]: meta_specs.encode(ds)
Out[106]:
<xarray.Dataset> Size: 32B
Dimensions: (mylon: 2)
Coordinates:
* mylon (mylon) int64 16B 10 20
Data variables:
supertemp (mylon) int64 16B 0 2
The meta_specs encoding is set at the dataset level,
not at the data array level:
In [107]: meta.get_meta_specs(ds.supertemp) is meta_specs
Out[107]: True
To propagate to all the data arrays, use assign_meta_specs():
In [108]: meta.assign_meta_specs(ds, "mycroco")
Out[108]:
<xarray.Dataset> Size: 32B
Dimensions: (mylon: 2)
Coordinates:
* mylon (mylon) int64 16B 10 20
Data variables:
supertemp (mylon) int64 16B 0 2
In [109]: ds.mylon.encoding
Out[109]: {'meta_specs': 'mycroco'}
In [110]: meta.get_meta_specs(ds.supertemp) is meta_specs
Out[110]: True
Specialized helper modules#
The xoa.meta module provides the foundation for finding and identifying
variables and coordinates. Building on this, xoa provides specialized
modules with convenient functions for common oceanographic and atmospheric variables.
All these functions use the meta specs under the hood.
Finding coordinates with xoa.coords#
The xoa.coords module provides high-level functions to find specific
coordinates in a dataset or data array.
Available coordinate finder functions:
get_depth()/is_depth(): Depthget_altitude()/is_altitude(): Altitudeget_level()/is_level(): Generic vertical levelget_vertical(): Any vertical coordinate (depth, altitude, or level)get_time()/is_time(): Time
Dimension finder functions:
Example usage:
In [111]: import xoa.coords
# Rebuild a sample dataset with coordinates
In [112]: lon = xr.DataArray(np.arange(3, dtype='d'), dims='mylon',
.....: attrs={'standard_name': 'longitude'})
.....:
In [113]: temp = xr.DataArray(np.arange(20, 23, dtype='d'), dims='mylon',
.....: coords={'mylon': lon},
.....: attrs={'standard_name': 'sea_water_temperature'})
.....:
# Get longitude coordinate from a data array
In [114]: found_lon = xoa.coords.get_lon(temp)
In [115]: print(found_lon.name)
mylon
# Check if a coordinate is latitude
In [116]: print(xoa.coords.is_lon(found_lon))
True
In [117]: print(xoa.coords.is_lat(found_lon))
False
Finding thermodynamic variables with xoa.thermdyn#
The xoa.thermdyn module provides functions to find thermodynamic variables
like temperature, salinity, and density in datasets.
get_temp()/is_temp(): Temperatureget_dens()/is_dens(): Density
Each function accepts a variant keyword to target a specific type:
Temperature variants:
"temp"(in situ),"ptemp"(potential),"ctemp"(conservative),"atemp"(absolute)Salinity variants:
"sal"(in situ),"psal"(practical),"asal"(absolute),"pfsal"(preformed)
In [118]: import xoa.thermdyn
In [119]: sal = xr.DataArray(np.arange(33, 36, dtype='d'), dims='mylon',
.....: coords={'mylon': lon},
.....: attrs={'standard_name': 'sea_water_salinity'})
.....:
In [120]: ds_thd = xr.Dataset({'mytemp': temp, 'mysal': sal})
# Find any temperature variable
In [121]: found_temp = xoa.thermdyn.get_temp(ds_thd)
In [122]: print(found_temp.name)
mytemp
# Check if a variable is temperature-like
In [123]: print(xoa.thermdyn.is_temp(found_temp))
True
# Find salinity
In [124]: found_sal = xoa.thermdyn.get_sal(ds_thd)
In [125]: print(found_sal.name)
mysal
Finding dynamical variables with xoa.dyn#
The xoa.dyn module provides functions for ocean dynamics variables,
particularly sea level.
get_sea_level(): Find a sea level variable
Supported variants: "ssh" (sea surface height), "adt" (absolute dynamic
topography), "sla" (sea level anomaly), "mdt" (mean dynamic topography),
"mss" (mean sea surface).
See also
Example: decoding/encoding Croco model outputs#
Here are the specs for Croco:
[register]
name=croco
[[attrs]]
ini_file=*CROCO*
type=*ROMS*
[sglocator]
name_format={root}_{loc}
valid_locations=rho,w,u,v,r,psi
[accessors]
[[properties]]
coords=ptemp
[vertical]
positive=up
type=sigma
[data_vars]
[[sal]]
name=salt
[[ptemp]]
name=temp
[[ssh]]
name=zeta
[[bathy]]
name=h
[[mask]]
mask=mask
add_loc=True
[[corio]]
name=f
[[akt]]
name=AKt
[[ex]]
name=pm
[[ey]]
name=pn
[[sig]]
name=s
[[cs]]
name=Cs
add_loc=True
[coords]
[[x]]
name=xi
search_order=n
standard_name=x_grid_index
[[y]]
name=eta
search_order=n
standard_name=y_grid_index
[[sig]]
name=s
add_loc=True
[[depth]]
add_loc=True
Register them:
In [126]: meta_config_file = xoa.meta.get_meta_config_file("croco")
In [127]: print(meta_config_file)
/home/docs/checkouts/readthedocs.org/user_builds/xoa/conda/latest/lib/python3.14/site-packages/xoa/meta/configs/croco.cfg
In [128]: xoa.meta.register_meta_specs(meta_config_file) # xoa.meta.register_meta_specs("croco")
In [129]: xoa.meta.is_registered_meta_specs("croco")
Out[129]: True
Register the xoa accessor:
In [130]: xoa.register_accessors()
Now let’s open a Croco sample as a xarray dataset:
In [131]: ds = xoa.open_data_sample("MODELS/CROCO/SOUTH-AFRICA/croco.south-africa.meridional.nc")
In [132]: ds
Out[132]:
<xarray.Dataset> Size: 48kB
Dimensions: (time: 1, s_w: 33, eta_rho: 56, xi_rho: 1, s_rho: 32, auxil: 4,
xi_u: 1, eta_v: 55)
Coordinates: (12/13)
* time (time) float64 8B 2.592e+06
* s_w (s_w) float32 132B -1.0 -0.9688 -0.9375 ... -0.0625 -0.03125 0.0
* eta_rho (eta_rho) float32 224B 6.0 7.0 8.0 9.0 ... 58.0 59.0 60.0 61.0
* xi_rho (xi_rho) float32 4B 131.0
lat_rho (eta_rho, xi_rho) float32 224B ...
lat_v (eta_v, xi_rho) float32 220B ...
... ...
lon_v (eta_v, xi_rho) float32 220B ...
* s_rho (s_rho) float32 128B -0.9844 -0.9531 ... -0.04688 -0.01562
* xi_u (xi_u) float32 4B 131.5
lat_u (eta_rho, xi_u) float32 224B ...
lon_u (eta_rho, xi_u) float32 224B ...
* eta_v (eta_v) float32 220B 6.5 7.5 8.5 9.5 ... 57.5 58.5 59.5 60.5
Dimensions without coordinates: auxil
Data variables: (12/23)
AKt (time, s_w, eta_rho, xi_rho) float32 7kB ...
Cs_r (s_rho) float32 128B ...
Cs_w (s_w) float32 132B ...
Vtransform float32 4B ...
angle (eta_rho, xi_rho) float32 224B ...
el float32 4B ...
... ...
time_step (time, auxil) int32 16B ...
u (time, s_rho, eta_rho, xi_u) float32 7kB ...
v (time, s_rho, eta_v, xi_rho) float32 7kB ...
w (time, s_rho, eta_rho, xi_rho) float32 7kB ...
xl float32 4B ...
zeta (time, eta_rho, xi_rho) float32 224B ...
Attributes: (12/57)
type: ROMS history file
title: BENGUELA TEST MODEL
date:
rst_file: CROCO_FILES/croco_rst.nc
his_file: CROCO_FILES/croco_his.nc
avg_file: CROCO_FILES/croco_avg.nc
... ...
v_sponge: 0.0
sponge_expl: Sponge parameters : extent (m) & viscosity (m2.s-1)
SRCS: main.F step.F read_inp.F timers_roms.F init_scalars.F ini...
CPP-options: REGIONAL BENGUELA_VHR MPI OBC_EAST OBC_WEST OBC_NORTH OBC...
history: Tue Mar 31 16:26:24 2020: ncks -O -d time,30 -d xi_rho,13...
NCO: 4.4.2
Let’s decode it:
In [133]: dsd = ds.xoa.decode()
In [134]: dsd
Out[134]:
<xarray.Dataset> Size: 48kB
Dimensions: (time: 1, sig_w: 33, y_rho: 56, x_rho: 1, sig_rho: 32,
auxil: 4, x_u: 1, y_v: 55)
Coordinates: (12/13)
* time (time) float64 8B 2.592e+06
* sig_w (sig_w) float32 132B -1.0 -0.9688 -0.9375 ... -0.03125 0.0
* y_rho (y_rho) float32 224B 6.0 7.0 8.0 9.0 ... 58.0 59.0 60.0 61.0
* x_rho (x_rho) float32 4B 131.0
lat_rho (y_rho, x_rho) float32 224B ...
lat_v (y_v, x_rho) float32 220B ...
... ...
lon_v (y_v, x_rho) float32 220B ...
* sig_rho (sig_rho) float32 128B -0.9844 -0.9531 ... -0.04688 -0.01562
* x_u (x_u) float32 4B 131.5
lat_u (y_rho, x_u) float32 224B ...
lon_u (y_rho, x_u) float32 224B ...
* y_v (y_v) float32 220B 6.5 7.5 8.5 9.5 10.5 ... 57.5 58.5 59.5 60.5
Dimensions without coordinates: auxil
Data variables: (12/23)
akt (time, sig_w, y_rho, x_rho) float32 7kB ...
cs_r (sig_rho) float32 128B ...
cs_w (sig_w) float32 132B ...
Vtransform float32 4B ...
angle (y_rho, x_rho) float32 224B ...
el float32 4B ...
... ...
time_step (time, auxil) int32 16B ...
u (time, sig_rho, y_rho, x_u) float32 7kB ...
v (time, sig_rho, y_v, x_rho) float32 7kB ...
w (time, sig_rho, y_rho, x_rho) float32 7kB ...
xl float32 4B ...
ssh (time, y_rho, x_rho) float32 224B ...
Attributes: (12/57)
type: ROMS history file
title: BENGUELA TEST MODEL
date:
rst_file: CROCO_FILES/croco_rst.nc
his_file: CROCO_FILES/croco_his.nc
avg_file: CROCO_FILES/croco_avg.nc
... ...
v_sponge: 0.0
sponge_expl: Sponge parameters : extent (m) & viscosity (m2.s-1)
SRCS: main.F step.F read_inp.F timers_roms.F init_scalars.F ini...
CPP-options: REGIONAL BENGUELA_VHR MPI OBC_EAST OBC_WEST OBC_NORTH OBC...
history: Tue Mar 31 16:26:24 2020: ncks -O -d time,30 -d xi_rho,13...
NCO: 4.4.2
Let’s re-encode it!
In [135]: dse = dsd.xoa.encode()
In [136]: dse
Out[136]:
<xarray.Dataset> Size: 48kB
Dimensions: (time: 1, s_w: 33, eta_rho: 56, xi_rho: 1, s_rho: 32, auxil: 4,
xi_u: 1, eta_v: 55)
Coordinates: (12/13)
* time (time) float64 8B 2.592e+06
* s_w (s_w) float32 132B -1.0 -0.9688 -0.9375 ... -0.0625 -0.03125 0.0
* eta_rho (eta_rho) float32 224B 6.0 7.0 8.0 9.0 ... 58.0 59.0 60.0 61.0
* xi_rho (xi_rho) float32 4B 131.0
lat_rho (eta_rho, xi_rho) float32 224B ...
lat_v (eta_v, xi_rho) float32 220B ...
... ...
lon_v (eta_v, xi_rho) float32 220B ...
* s_rho (s_rho) float32 128B -0.9844 -0.9531 ... -0.04688 -0.01562
* xi_u (xi_u) float32 4B 131.5
lat_u (eta_rho, xi_u) float32 224B ...
lon_u (eta_rho, xi_u) float32 224B ...
* eta_v (eta_v) float32 220B 6.5 7.5 8.5 9.5 ... 57.5 58.5 59.5 60.5
Dimensions without coordinates: auxil
Data variables: (12/23)
AKt (time, s_w, eta_rho, xi_rho) float32 7kB ...
Cs_r (s_rho) float32 128B ...
Cs_w (s_w) float32 132B ...
Vtransform float32 4B ...
angle (eta_rho, xi_rho) float32 224B ...
el float32 4B ...
... ...
time_step (time, auxil) int32 16B ...
u (time, s_rho, eta_rho, xi_u) float32 7kB ...
v (time, s_rho, eta_v, xi_rho) float32 7kB ...
w (time, s_rho, eta_rho, xi_rho) float32 7kB ...
xl float32 4B ...
zeta (time, eta_rho, xi_rho) float32 224B ...
Attributes: (12/57)
type: ROMS history file
title: BENGUELA TEST MODEL
date:
rst_file: CROCO_FILES/croco_rst.nc
his_file: CROCO_FILES/croco_his.nc
avg_file: CROCO_FILES/croco_avg.nc
... ...
v_sponge: 0.0
sponge_expl: Sponge parameters : extent (m) & viscosity (m2.s-1)
SRCS: main.F step.F read_inp.F timers_roms.F init_scalars.F ini...
CPP-options: REGIONAL BENGUELA_VHR MPI OBC_EAST OBC_WEST OBC_NORTH OBC...
history: Tue Mar 31 16:26:24 2020: ncks -O -d time,30 -d xi_rho,13...
NCO: 4.4.2
Et voilà !
Example: decoding and merging Hycom split outputs#
At Shom, the Hycom model outputs are split into separate files, one for each variable.
Conflicts may occur when using variables that are not at the same staggered grid location,
since all variables are stored with dimensions (y, x) and lon and lat
coordinates, all with the same name.
To avoid this problem, the horizontal dimensions and coordinates of
staggered variables are renamed to indicate their location.
Here are the specs to take care of the staggered grid indicators in the names:
[register]
name=hycom
[[attrs]]
nsigma = "[0-9]?"
[sglocator]
name_format="{root}_{loc}"
valid_locations=u,v
[vertical]
positive=down
type=dz
[data_vars]
[[u]]
loc=u
[[[add_coords_loc]]]
lon=True
lat=True
x=True
y=True
[[v]]
loc=v
[[[add_coords_loc]]]
lon=True
lat=True
x=True
y=True
[[bathy]]
name=bathymetry
add_loc=True
[[dz]]
name=h
[coords]
[[x]]
name=X
[[y]]
name=Y
[[z]]
name=lev
Note the add_coords_loc sections.
Location is not added to the u and v variables
but to their dimensions and coordinates.
Register them:
In [137]: xoa.meta.register_meta_specs("hycom")
In [138]: xoa.meta.is_registered_meta_specs("hycom")
Out[138]: True
Overview of the U dataset:
In [139]: dsu = xoa.open_data_sample("MODELS/HYCOM/hycom.gdp.u.nc")
In [140]: dsu
Out[140]:
<xarray.Dataset> Size: 173kB
Dimensions: (Y: 11, X: 13, time: 50, lev: 6)
Coordinates:
* time (time) datetime64[ns] 400B 2021-03-01 ... 2021-03-25T12:00:00
* lev (lev) int32 24B 1 2 3 4 5 6
Dimensions without coordinates: Y, X
Data variables:
lon (Y, X) float32 572B ...
lat (Y, X) float32 572B ...
u (time, lev, Y, X) float32 172kB ...
Attributes: (12/26)
history: Fri Apr 9 12:15:12 2021: ncrcat -x -v bathyme...
Conventions: COARDS
nhybrd: 20
nsigma: 19
kapref: 2
hybflg: 0
... ...
dp00ft: [1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 ...
dp00xt: [21.874 21.414 21.414 13.197 7.85 3.85 4....
ds00t: [3.555 3.026 3.026 1.547 0.828 0.828 0.968 0.8...
ds00ft: [1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 ...
ds00xt: [21.874 21.414 21.414 7.197 3.85 3.85 4....
nco_openmp_thread_number: 1
Decoding:
In [141]: dsu = dsu.xoa.decode()
In [142]: dsu
Out[142]:
<xarray.Dataset> Size: 173kB
Dimensions: (y_u: 11, x_u: 13, time: 50, z: 6)
Coordinates:
lon_u (y_u, x_u) float32 572B ...
lat_u (y_u, x_u) float32 572B ...
* time (time) datetime64[ns] 400B 2021-03-01 ... 2021-03-25T12:00:00
* z (z) int32 24B 1 2 3 4 5 6
Dimensions without coordinates: y_u, x_u
Data variables:
u (time, z, y_u, x_u) float32 172kB ...
Attributes: (12/26)
history: Fri Apr 9 12:15:12 2021: ncrcat -x -v bathyme...
Conventions: COARDS
nhybrd: 20
nsigma: 19
kapref: 2
hybflg: 0
... ...
dp00ft: [1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 ...
dp00xt: [21.874 21.414 21.414 13.197 7.85 3.85 4....
ds00t: [3.555 3.026 3.026 1.547 0.828 0.828 0.968 0.8...
ds00ft: [1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 ...
ds00xt: [21.874 21.414 21.414 7.197 3.85 3.85 4....
nco_openmp_thread_number: 1
Now we can read, decode and merge all files without any conflict:
In [143]: dsv = xoa.open_data_sample("MODELS/HYCOM/hycom.gdp.u.nc").xoa.decode()
In [144]: dsh = xoa.open_data_sample("MODELS/HYCOM/hycom.gdp.h.nc").xoa.decode()
In [145]: print(dsu)
<xarray.Dataset> Size: 173kB
Dimensions: (y_u: 11, x_u: 13, time: 50, z: 6)
Coordinates:
lon_u (y_u, x_u) float32 572B ...
lat_u (y_u, x_u) float32 572B ...
* time (time) datetime64[ns] 400B 2021-03-01 ... 2021-03-25T12:00:00
* z (z) int32 24B 1 2 3 4 5 6
Dimensions without coordinates: y_u, x_u
Data variables:
u (time, z, y_u, x_u) float32 172kB ...
Attributes: (12/26)
history: Fri Apr 9 12:15:12 2021: ncrcat -x -v bathyme...
Conventions: COARDS
nhybrd: 20
nsigma: 19
kapref: 2
hybflg: 0
... ...
dp00ft: [1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 ...
dp00xt: [21.874 21.414 21.414 13.197 7.85 3.85 4....
ds00t: [3.555 3.026 3.026 1.547 0.828 0.828 0.968 0.8...
ds00ft: [1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 ...
ds00xt: [21.874 21.414 21.414 7.197 3.85 3.85 4....
nco_openmp_thread_number: 1
In [146]: ds = xr.merge([dsu, dsv, dsh])
In [147]: ds
Out[147]:
<xarray.Dataset> Size: 346kB
Dimensions: (y_u: 11, x_u: 13, time: 50, z: 6, y: 11, x: 13)
Coordinates:
lon_u (y_u, x_u) float32 572B -3.213 -3.138 -3.062 ... -2.388 -2.312
lat_u (y_u, x_u) float32 572B 44.69 44.69 44.69 ... 45.22 45.22 45.22
* time (time) datetime64[ns] 400B 2021-03-01 ... 2021-03-25T12:00:00
* z (z) int32 24B 1 2 3 4 5 6
lon (y, x) float32 572B ...
lat (y, x) float32 572B ...
Dimensions without coordinates: y_u, x_u, y, x
Data variables:
u (time, z, y_u, x_u) float32 172kB 0.03078 0.02891 ... 0.1252 0.1272
bathy (y, x) float32 572B ...
dz (time, z, y, x) float32 172kB ...
Attributes: (12/26)
history: Fri Apr 9 12:15:12 2021: ncrcat -x -v bathyme...
Conventions: COARDS
nhybrd: 20
nsigma: 19
kapref: 2
hybflg: 0
... ...
dp00ft: [1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 ...
dp00xt: [21.874 21.414 21.414 13.197 7.85 3.85 4....
ds00t: [3.555 3.026 3.026 1.547 0.828 0.828 0.968 0.8...
ds00ft: [1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 ...
ds00xt: [21.874 21.414 21.414 7.197 3.85 3.85 4....
nco_openmp_thread_number: 1