Naming conventions with xoa.cf
Warning
Have a quick loop to the appendix The default configuration before going any further!
Introduction
This module is an application and extension of a subset of the CF conventions in application to naming conventions. It has two intents:
Searching
xarray.DataArray
variables or coordinates with anotherxarray.DataArray
or axarray.Dataset
, by scanning name and attributes likeunits
andstandard_name
.Formatting
xarray.DataArray
variables or coordinates with uniquename
, andstandard_name
,long_name
andunits
attributes, with support of staggered grid location syntax.
The module offers capabilities for the user to extend and specialize default behaviors for user’s special datasets.
Note
This module shares common feature with the excellent and long awaited xf_xarray package and started a long time before with the Vacumm package. The most notable differences differences include:
The current module is also designed for data variables, not only coordinates.
It search for items not only using standard_names but also specialized names.
It is not only available as accessors, but also as independant objects that can be configured for each type of dataset or in contexts by the user.
Accessing the current specifications
Scanning and formatting actions are based on specifications,
and this module natively includes a
default configuration
for various oceanographic, sea surface and atmospheric variables and coordinates.
A distinction is made between
data variables (data_vars)
and coordinates (coords), like in xarray
.
Getting the current specifications for data variables and coordinates
with the get_cf_specs()
function:
In [1]: from xoa import cf
In [2]: cfspecs = cf.get_cf_specs()
In [3]: cfspecs["data_vars"] is cfspecs.data_vars
Out[3]: True
In [4]: cfspecs["data_vars"] is cfspecs.coords
Out[4]: False
In [5]: cfspecs.data_vars.names[:3]
Out[5]: ['ptemp', 'ctemp', 'temp']
In [6]: cfspecs.coords.names[:3]
Out[6]: ['x', 'y', 'level']
See the appendix The default configuration for the list of available default specifications.
An instance of the CFSpecs
has other configuration
sections
than
data_vars
,
coords
and
dims
:
registration
,
sglocator
,
vertical
and
accessors
.
Data variables
Here is the example of the sst
data variable:
In [7]: from pprint import pprint
In [8]: pprint(cfspecs.data_vars['sst'])
{'add_coords_loc': {},
'add_loc': False,
'alt_names': [],
'attrs': {'long_name': ['Sea surface temperature'],
'standard_name': ['sea_surface_temperature',
'surface_sea_water_temperature'],
'units': ['degrees_celsius']},
'cmap': 'cmo.thermal',
'exclude': False,
'inherit': None,
'loc': None,
'name': None,
'processed': True,
'search_order': 'sn',
'select': {},
'squeeze': None}
Description of specification keys:
Key |
Type |
Description |
|
str |
Specialized name for decoding and encoding which is empty by default |
|
list(str) |
Alternate names for decoding |
|
list(str) |
“standard_name” attributes |
|
list(str) |
“long_name” attributes |
|
list(str) |
“units” attributes |
|
choice |
Domain of application, within {‘generic’, ‘atmos’, ‘ocean’, ‘surface’} |
|
str |
Search order within properties as combination of letters: [n]name, [s]tandard_name, [u]nits |
|
str |
Colormap specification |
|
str |
Inherit specifications from another data variable |
|
eval |
Item selection evaluated and applied to the array |
|
list(str) |
List of dimensions that must be squeezed out |
Note
The standard_name
, long_name
and units
specifications are
internally stored as a dict in the attrs
key.
Get the specialized name and the attributes only:
In [9]: cfspecs.data_vars.get_name("sst")
Out[9]: 'sst'
In [10]: cfspecs.data_vars.get_attrs("sst")
Out[10]:
{'standard_name': 'sea_surface_temperature',
'long_name': 'Sea surface temperature',
'units': 'degrees_celsius'}
Coordinates
Here is the example of the lon
coordinate:
In [11]: from pprint import pprint
In [12]: pprint(cfspecs.coords['lon'])
{'add_coords_loc': {},
'add_loc': None,
'alt_names': ['longitude', 'nav_lon', 'longitudes', 'lons'],
'attrs': {'axis': 'X',
'long_name': ['Longitude'],
'standard_name': ['longitude'],
'units': ['degrees_east',
'degree_east',
'degree_e',
'degrees_e',
'degreee',
'degreese']},
'exclude': False,
'inherit': None,
'loc': None,
'name': None,
'processed': True,
'search_order': 'snu'}
Description of specification keys:
Key |
Type |
Description |
|
str |
Specialized name for decoding and encoding which is empty by default |
|
list(str) |
Alternate names for decoding |
|
list(str) |
“standard_name” attributes |
|
list(str) |
“long_name” attributes |
|
list(str) |
“units” attributes |
|
str |
“axis” attribute like X, Y, Z, T or F |
|
str |
Search order within properties as combination of letters: [n]name, [s]tandard_name, [u]nits |
|
str |
Inherit specifications from another data variable or coordinate |
Note
Like for data variables, the standard_name
, long_name
, units
and axis
specifications are internally stored as a dict in the attrs
key.
Get the specialized name and the attributes only:
In [13]: cfspecs.coords.get_name("lon")
Out[13]: 'lon'
In [14]: cfspecs.coords.get_attrs("lon")
Out[14]:
{'standard_name': 'longitude',
'long_name': 'Longitude',
'units': 'degrees_east',
'axis': 'X'}
Dimensions
Key |
Type |
Description |
|
list(str) |
Possible names of X-type dimensions |
|
list(str) |
Possible names of Y-type dimensions |
|
list(str) |
Possible names of Z-type vertical dimensions |
|
list(str) |
Possible names of time dimensions |
|
list(str) |
Possible names of forecast dimensions |
This section of the specs defines the names that the dimensions
of type x
, y
, z
, t
and f
(forecast) can take.
It is filled automatically by default with possible names (name
and alt_names
)
of the coordinates that have their axis
defined.
The user can also add more spcific names.
Theses list of names are used when searching for dimensions that are not coordinates: since they don’t have attribute, their type can only be guessed from their name.
Warning
It is recommended to not fill the dims section, but fill the coordinate section instead.
Here is the default content:
In [15]: cfspecs.dims # or cfspecs["dims"]
Out[15]:
{'x': ['x',
'lon',
'xi',
'nx',
'ni',
'x',
'imt',
'ipi',
'longitude',
'nav_lon',
'longitudes',
'lons'],
'y': ['y',
'lat',
'yi',
'ny',
'nj',
'y',
'jmt',
'jpj',
'latitude',
'nav_lat',
'latitudes',
'lats'],
'z': ['level',
'depth',
'altitude',
'sig',
'station',
'model_level',
'lev',
'nz',
'nk',
'z',
'kmt',
'zi',
'dep',
'altitudet',
'altitudeu',
'altitudev',
's_level',
'sigma',
's',
'sigma_level',
'stations'],
't': ['time', 'nt'],
'f': ['forecast', 'nf', 'fcst']}
Other sections
In [16]: cfspecs["register"]
Out[16]: {'name': 'generic', 'attrs': {}}
In [17]: cfspecs["sglocator"]
Out[17]: {'name_format': '{root}_{loc}', 'valid_locations': None}
In [18]: cfspecs["accessors"]
Out[18]:
{'properties': {'coords': ['lon',
'lat',
'depth',
'sig',
'level',
'altitude',
'time',
'forecast'],
'data_vars': ['temp', 'sal', 'u', 'v', 'bathy', 'ssh']}}
In [19]: cfspecs["vertical"]
Out[19]: {'positive': None, 'type': None}
Searching within a Dataset
or DataArray
Let’s define a minimal dataset:
In [20]: nx = 3
In [21]: lon = xr.DataArray(np.arange(3, dtype='d'), dims='mylon',
....: attrs={'standard_name': 'longitude'})
....:
In [22]: temp = xr.DataArray(np.arange(20, 23, dtype='d'), dims='mylon',
....: coords={'mylon': lon},
....: attrs={'standard_name': 'sea_water_temperature'})
....:
In [23]: sal = xr.DataArray(np.arange(33, 36, dtype='d'), dims='mylon',
....: coords={'mylon': lon},
....: attrs={'standard_name': 'sea_water_salinity'})
....:
In [24]: ds = xr.Dataset({'mytemp': temp, 'mysal': sal})
All these arrays are CF compliant according to their
standard_name
attribute, despite their name is not really explicit.
Check if they match known or explicit CF items:
In [25]: cfspecs.coords.match(lon, "lon") # explicit
Out[25]: True
In [26]: cfspecs.coords.match(lon, "lat") # explicit
Out[26]: False
In [27]: cfspecs.coords.match(lon) # any known
Out[27]: 'lon'
In [28]: cfspecs.data_vars.match(temp) # any known
Out[28]: 'temp'
In [29]: cfspecs.data_vars.match(sal) # any known
Out[29]: 'sal'
Search for known CF items:
In [30]: mytemp = cfspecs.search(ds, "temp")
In [31]: mylon = cfspecs.search(mytemp, "lon")
Datasets are searched for data variables (“data_vars”) and data variables are searched for coordinates (“coords”). You can also search for coordinates in datasets, for instance like this:
In [32]: cfspecs.coords.search(ds, "lon")
Out[32]:
<xarray.DataArray 'mylon' (mylon: 3)>
array([0., 1., 2.])
Coordinates:
* mylon (mylon) float64 0.0 1.0 2.0
Attributes:
standard_name: longitude
See also
The staggered-grid locator SGLocator
A CFSpecs
instance comes with SGLocator
that is accessible through the sglocator
attribute.
A SGLocator
helps parsing and formatting staggered grid
location from name
, standard_name
and long_name
data array attributes.
It is configured at the sglocator
section
in which you can specify the
format of the name (name_format
and
the allowed location names allowed_locations
.
In [33]: sglocator = cfspecs.sglocator
In [34]: sglocator.formats
Out[34]:
{'name': '{root}_{loc}',
'standard_name': '{root}_at_{loc}_location',
'long_name': '{root} at {loc} location'}
In [35]: sglocator.format_attr("name", "lon", "rho")
Out[35]: 'lon_rho'
In [36]: sglocator.format_attr("standard_name", "lon", "rho")
Out[36]: 'lon_at_rho_location'
In [37]: sglocator.format_attr("long_name", "lon", "rho")
Out[37]: 'Lon at RHO location'
In [38]: sglocator.parse_attr("name", "lon_rho")
Out[38]: ('lon', 'rho')
Formatting i.e encoding and decoding
The idea
Formatting means changing or setting names and attributes. It is possible to format, or even auto-format data variables and coordinates.
During an auto-formatting, each array is matched against CF specs,
and the array is formatted when a matching is successfull.
If the array contains coordinates, the same process is applied on them,
as soon as the format_coords
keyword is True
.
If the name
key is set in the matching item specs, its value can be used
to name the variable or coordinate array, else the generic name is used
like "lon"
.
Explicit formatting:
In [39]: cfspecs.format_coord(lon, "lon")
Out[39]:
<xarray.DataArray 'lon' (x: 3)>
array([0., 1., 2.])
Dimensions without coordinates: x
Attributes:
standard_name: longitude
long_name: Longitude
units: degrees_east
axis: X
In [40]: cfspecs.format_data_var(temp, "temp")
Out[40]:
<xarray.DataArray 'temp' (lon: 3)>
array([20., 21., 22.])
Coordinates:
* lon (lon) float64 0.0 1.0 2.0
Attributes:
standard_name: sea_water_temperature
long_name: Sea water in situ temperature
units: degrees_celsius
Auto-formatting:
In [41]: ds2 = cfspecs.auto_format(ds)
In [42]: ds2.temp
Out[42]:
<xarray.DataArray 'temp' (lon: 3)>
array([20., 21., 22.])
Coordinates:
* lon (lon) float64 0.0 1.0 2.0
Attributes:
standard_name: sea_water_temperature
long_name: Sea water in situ temperature
units: degrees_celsius
In [43]: ds2.lon
Out[43]:
<xarray.DataArray 'lon' (lon: 3)>
array([0., 1., 2.])
Coordinates:
* lon (lon) float64 0.0 1.0 2.0
Attributes:
standard_name: longitude
long_name: Longitude
units: degrees_east
axis: X
It can be applied to a data array or a full dataset.
Encoding and decoding
By default, formatting renames known arrays to their generic name,
like “temp” in the example above. We speak here of encoding.
If the specialize
keyword is set to True
, arrays are
renamed with their specialized name if set in the specs with the
name
option.
We speak here of decoding.
Two shortcut methods exists for these tasks:
Chaining the two methods should lead to the initial dataset or data array. See the last section of this page for an exemple: Example: decoding/encoding Croco model outputs.
Using the accessors
Accessors for xarray.Dataset
and xarray.DataArray
can be registered with the xoa.cf.register_cf_accessors()
:
In [44]: import xoa
In [45]: xoa.register_accessors(xcf=True)
The accessor is named here xcf
to not conflict with the cf
accessor of
cf-xarray.
Note
All xoa accessors can be be registered with
xoa.register_accessors()
. Note also that all functionalities
of the xcf
accessor are also available with the more global
xoa
accessor.
These accessors make it easy to use some of the xoa.cf.CFSpecs
capabilities.
Here are examples of use:
In [46]: temp
Out[46]:
<xarray.DataArray (mylon: 3)>
array([20., 21., 22.])
Coordinates:
* mylon (mylon) float64 0.0 1.0 2.0
Attributes:
standard_name: sea_water_temperature
In [47]: temp.xcf.get("lon") # access by .get
Out[47]:
<xarray.DataArray 'mylon' (mylon: 3)>
array([0., 1., 2.])
Coordinates:
* mylon (mylon) float64 0.0 1.0 2.0
Attributes:
standard_name: longitude
In [48]: ds.xcf.get("temp") # access by .get
Out[48]:
<xarray.DataArray 'mytemp' (mylon: 3)>
array([20., 21., 22.])
Coordinates:
* mylon (mylon) float64 0.0 1.0 2.0
Attributes:
standard_name: sea_water_temperature
In [49]: ds.xcf.lon # access by attribute
Out[49]:
<xarray.DataArray 'mylon' (mylon: 3)>
array([0., 1., 2.])
Coordinates:
* mylon (mylon) float64 0.0 1.0 2.0
Attributes:
standard_name: longitude
In [50]: ds.xcf.coords.lon # specific search = ds.cf.coords.get("lon")
Out[50]:
<xarray.DataArray 'mylon' (mylon: 3)>
array([0., 1., 2.])
Coordinates:
* mylon (mylon) float64 0.0 1.0 2.0
Attributes:
standard_name: longitude
In [51]: ds.xcf.temp # access by attribute
Out[51]:
<xarray.DataArray 'mytemp' (mylon: 3)>
array([20., 21., 22.])
Coordinates:
* mylon (mylon) float64 0.0 1.0 2.0
Attributes:
standard_name: sea_water_temperature
In [52]: ds.xcf["temp"].name # access by item
Out[52]: 'mytemp'
In [53]: ds.xcf.data_vars.temp.name # specific search = ds.cf.coords.get("temp")
Out[53]: 'mytemp'
In [54]: ds.xcf.data_vars.bathy is None # returns None when not found
Out[54]: True
In [55]: ds.xcf.temp.xcf.lon.name # chaining
Out[55]: 'mylon'
In [56]: ds.xcf.temp.xcf.name # CF name, not real name
Out[56]: 'temp'
In [57]: ds.xcf.temp.xcf.attrs # attributes, merged with CF attrs
Out[57]:
{'standard_name': 'sea_water_temperature',
'long_name': 'Sea water in situ temperature',
'units': 'degrees_celsius'}
In [58]: ds.xcf.temp.xcf.standard_name # single attribute
Out[58]: 'sea_water_temperature'
In [59]: ds.mytemp.xcf.auto_format() # or ds.temp.xcf()
Out[59]:
<xarray.DataArray 'temp' (lon: 3)>
array([20., 21., 22.])
Coordinates:
* lon (lon) float64 0.0 1.0 2.0
Attributes:
standard_name: sea_water_temperature
long_name: Sea water in situ temperature
units: degrees_celsius
In [60]: ds.xcf.auto_format() # or ds.xcf()
Out[60]:
<xarray.Dataset>
Dimensions: (lon: 3)
Coordinates:
* lon (lon) float64 0.0 1.0 2.0
Data variables:
temp (lon) float64 20.0 21.0 22.0
sal (lon) float64 33.0 34.0 35.0
As you can see, accessing an accessor attribute or item make an
implicit call to get
.
The root accessor cf
give access to
two sub-accessors, data_vars
and coords
,
for being able to specialize the searches.
See also
xoa.cf.DataArrayCFAccessor
xoa.cf.DatasetCFAccessor
Changing the CF specs
Default user file
The xoa.cf
module has internal defaults as shown
in appendix The default configuration.
You can extend these defaults with a user file, whose location is printable with the following command, at the line containing “user CF specs file”:
$ xoa info paths
- xoa library dir: /home/docs/checkouts/readthedocs.org/user_builds/xoa/conda/latest/lib/python3.11/site-packages/xoa
- user config file: /home/docs/.config/xoa/xoa.cfg [*]
- user CF specs file: /home/docs/.config/xoa/cf.cfg [*]
- user CF cache file: /home/docs/.cache/xoa/cf.pyk
- data samples: mercator.cfg hycom.gdp.u.nc argo-7900573.nc hycom.gdp.v.nc gdp-6203641.csv croco.south-africa.surf.nc croco.cfg croco.south-africa.meridional.nc hycom.gdp.h.nc hycom.cfg croco.south-africa.zonal.nc ibi-argo-7900573.nc gdp.cfg argo.cfg
*: file not present
Update the current specs
The current specs can be updated with different methods.
From a well structured dictionary:
In [61]: cfspecs.load_cfg({"data_vars": {"banana": {"standard_name": "banana"}}})
In [62]: cfspecs.data_vars["banana"]
Out[62]:
{'standard_name': 'banana',
'name': None,
'alt_names': [],
'cmap': None,
'inherit': None,
'squeeze': None,
'search_order': 'sn',
'loc': None,
'add_loc': False,
'exclude': False,
'add_coords_loc': {},
'attrs': {'long_name': ['Banana'], 'standard_name': [], 'units': []},
'select': {},
'processed': True}
From a configuration file: instead of the dictionary as an argument
to load_cfg()
method, you can give either a
file name or a multi-line string with the same content as
the file.
Following the previous example:
[data_vars]
[[banana]]
standard_name: banana
If you only want to update a category
,
you can use such method (here set_specs()
):
In [63]: cfspecs.data_vars.set_specs("banana", name="bonono")
In [64]: cfspecs.data_vars["banana"]["name"]
Out[64]: 'bonono'
Alternatively, a xoa.cf.CFSpecs
instance can be loaded
with the load_cfg()
method, as explained below.
Create new specs from scratch
To create new specs, you must instantiate the xoa.cf.CFSpecs
class,
with an input type as those presented above:
A config file name.
A Multi-line string in the format of a config file.
A dictionary.
A
configobj.ConfigObj
instance.Another
CFSpecs
instance.A list of them, with the having priority over the lasts.
The initialization also accepts two options:
default
: wether to load or not the default internal config.user
: wether to load or not the user config file.
An config created from default and user configs:
In [65]: banana_specs = {"data_vars": {"banana": {"attrs": {"standard_name": "banana"}}}}
In [66]: mycfspecs = cf.CFSpecs(banana_specs)
In [67]: mycfspecs["data_vars"]["sst"]["attrs"]["standard_name"]
Out[67]: ['sea_surface_temperature', 'surface_sea_water_temperature']
In [68]: mycfspecs["data_vars"]["banana"]["attrs"]["standard_name"]
Out[68]: ['banana']
An config created from scratch:
In [69]: mycfspecs = cf.CFSpecs(banana_specs, default=False, user=False)
In [70]: mycfspecs.pprint(depth=2)
{'accessors': {'properties': {...}},
'coords': {},
'data_vars': {'banana': {...}},
'dims': {'f': [], 't': [], 'x': [], 'y': [], 'z': []},
'register': {'attrs': {}, 'name': ''},
'sglocator': {'name_format': '{root}_{loc}', 'valid_locations': None},
'vertical': {'positive': None, 'type': None}}
An config created from two other configs:
In [71]: cfspecs_banana = cf.CFSpecs(banana_specs, default=False, user=False)
In [72]: apple_specs = {"data_vars": {"apple": {"attrs": {"long_name": "Big apple"}}}}
In [73]: cfspecs_apple = cf.CFSpecs(apple_specs, default=False, user=False)
In [74]: cfspecs_fruits = cf.CFSpecs((cfspecs_apple, cfspecs_banana),
....: default=False, user=False)
....:
In [75]: cfspecs_fruits.data_vars.names
Out[75]: ['apple', 'banana']
Replacing the currents CF specs
As shown before, the currents CF specs are accessible with the
xoa.cf.get_cf_specs()
function.
You can replace them with the xoa.cf.set_cf_specs
class,
to be used as a fonction.
In [76]: cfspecs_old = cf.get_cf_specs()
In [77]: cf.set_cf_specs(cfspecs_banana)
Out[77]: <xoa.cf.set_cf_specs at 0x7f37bc230850>
In [78]: cf.get_cf_specs() is cfspecs_banana
Out[78]: True
In [79]: cf.set_cf_specs(cfspecs_old)
Out[79]: <xoa.cf.set_cf_specs at 0x7f37bc1c7b50>
In [80]: cf.get_cf_specs() is cfspecs_old
Out[80]: True
In case of a temporary change, you can used set_cf_specs
in a context statement:
In [81]: with cf.set_cf_specs(cfspecs_banana) as myspecs:
....: print('inside', cf.get_cf_specs() is cfspecs_banana)
....: print('inside', myspecs is cf.get_cf_specs())
....:
inside True
inside True
In [82]: print('outside', cf.get_cf_specs() is cfspecs_old)
outside True
For convience, you can set specs directly with a dictionary:
In [83]: with cf.set_cf_specs({"data_vars": {"apple": {}}}) as myspecs: ....: print("apple" in cf.get_cf_specs()) ....: False In [84]: print("apple" in cf.get_cf_specs()) False
Application with an accessor usage:
In [85]: data = xr.DataArray([5], attrs={'standard_name': 'sea_surface_banana'})
In [86]: ds = xr.Dataset({'toto': data})
In [87]: mycfspecs = cf.CFSpecs({"data_vars": {"ssb":
....: {"standard_name": "sea_surface_banana"}}})
....:
In [88]: with cf.set_cf_specs(mycfspecs):
....: print(ds.xcf.get("ssb"))
....:
None
Working with registered specs
Registering and accessing new specs
It is possible to register specialized CFSpecs
instances
with register_cf_specs()
for future access.
Here we register new specs with a internal registration name "mycroco"
:
In [89]: content = {
....: "register": {
....: "name": "mycroco"
....: },
....: "data_vars": {
....: "temp": {
....: "name": "supertemp"
....: }
....: },
....: "coords": {
....: "lon": {
....: "name": "mylon"
....: }
....: }
....: }
....:
In [90]: mycfspecs = cf.CFSpecs(content)
In [91]: cf.register_cf_specs(mycfspecs)
We can now access with it the get_cf_specs()
function:
In [92]: these_cfspecs = cf.get_cf_specs('mycroco')
In [93]: these_cfspecs is mycfspecs
Out[93]: True
Inferring the best specs for my dataset
If you set the cfspecs
attribute or encoding of a dataset
to the name of a registered CFSpecs
instance, you can
get it automatically with the get_cf_specs()
or
infer_cf_specs()
functions.
Let’s register another CFSpecs
instance:
In [94]: content = {
....: "register": {
....: "name": "myhycom"
....: },
....: "data_vars": {
....: "sal": {
....: "name": "supersal"
....: }
....: },
....: }
....:
In [95]: mycfspecs2 = cf.CFSpecs(content)
In [96]: cf.register_cf_specs(mycfspecs2)
Let’s create a dataset:
In [97]: ds = xr.Dataset({'supertemp': ("mylon", [0, 2])}, coords={"mylon": [10, 20]})
Now find the best registered specs instance which has either the name
myhycom
or mycroco
:
In [98]: cf_specs_auto = cf.infer_cf_specs(ds)
In [99]: print(cf_specs_auto.name)
mycroco
In [100]: ds_decoded = cf_specs_auto.decode(ds)
In [101]: ds_decoded
Out[101]:
<xarray.Dataset>
Dimensions: (lon: 2)
Coordinates:
* lon (lon) int64 10 20
Data variables:
temp (lon) int64 0 2
In [102]: cf_specs_auto.encode(ds)
Out[102]:
<xarray.Dataset>
Dimensions: (mylon: 2)
Coordinates:
* mylon (mylon) int64 10 20
Data variables:
supertemp (mylon) int64 0 2
It is mycroco
as expected.
Assigning registered specs to a dataset or data array
All xoa routines that needs to access specific coordinates
or variables try to infer the approriate specs, which default
to the current specs.
When the cfspecs
attribute or encoding is set,
get_cf_specs()
uses it to search within
registered specs.
In [103]: ds.encoding.update(cfspecs="mycroco")
In [104]: cfspecs = cf.get_cf_specs(ds)
In [105]: cfspecs.encode(ds)
Out[105]:
<xarray.Dataset>
Dimensions: (mylon: 2)
Coordinates:
* mylon (mylon) int64 10 20
Data variables:
supertemp (mylon) int64 0 2
The cfspecs
encoding is set at the dataset level,
not at the data array level:
In [106]: cf.get_cf_specs(ds.supertemp) is cfspecs
Out[106]: True
To propagate to all the data arrays, use assign_cf_specs()
:
In [107]: cf.assign_cf_specs(ds, "mycroco")
Out[107]:
<xarray.Dataset>
Dimensions: (mylon: 2)
Coordinates:
* mylon (mylon) int64 10 20
Data variables:
supertemp (mylon) int64 0 2
In [108]: ds.mylon.encoding
Out[108]: {'cf_specs': 'mycroco'}
In [109]: cf.get_cf_specs(ds.supertemp) is cfspecs
Out[109]: False
Example: decoding/encoding Croco model outputs
Here are the specs for Croco:
[register]
name=croco
[[attrs]]
ini_file=*CROCO*
type=*ROMS*
[sglocator]
valid_locations=rho,w,u,v,r,psi
[accessors]
[[properties]]
coords=ptemp
[vertical]
positive=up
type=sigma
[data_vars]
[[sal]]
name=salt
[[ptemp]]
name=temp
[[ssh]]
name=zeta
[[bathy]]
name=h
[[mask]]
mask=mask
add_loc=True
[[corio]]
name=f
[[akt]]
name=AKt
[[ex]]
name=pm
[[ey]]
name=pn
[[sig]]
name=s
[[cs]]
name=Cs
add_loc=True
[coords]
[[x]]
name=xi
search_order=n
standard_name=x_grid_index
[[y]]
name=eta
search_order=n
standard_name=y_grid_index
[[sig]]
name=s
add_loc=True
Register them:
In [110]: xoa.cf.register_cf_specs(xoa.get_data_sample("croco.cfg"))
In [111]: xoa.cf.is_registered_cf_specs("croco")
Out[111]: True
Register the xoa
accessor:
In [112]: xoa.register_accessors()
Now let’s open a Croco sample as a xarray dataset:
In [113]: ds = xoa.open_data_sample("croco.south-africa.meridional.nc")
In [114]: ds
Out[114]:
<xarray.Dataset>
Dimensions: (time: 1, s_w: 33, eta_rho: 56, xi_rho: 1, s_rho: 32,
eta_v: 55, xi_u: 1, auxil: 4)
Coordinates: (12/13)
* eta_rho (eta_rho) float32 6.0 7.0 8.0 9.0 10.0 ... 58.0 59.0 60.0 61.0
* eta_v (eta_v) float32 6.5 7.5 8.5 9.5 10.5 ... 57.5 58.5 59.5 60.5
lat_rho (eta_rho, xi_rho) float32 ...
lat_u (eta_rho, xi_u) float32 ...
lat_v (eta_v, xi_rho) float32 ...
lon_rho (eta_rho, xi_rho) float32 ...
... ...
lon_v (eta_v, xi_rho) float32 ...
* s_rho (s_rho) float32 -0.9844 -0.9531 -0.9219 ... -0.04688 -0.01562
* s_w (s_w) float32 -1.0 -0.9688 -0.9375 ... -0.0625 -0.03125 0.0
* time (time) float64 2.592e+06
* xi_rho (xi_rho) float32 131.0
* xi_u (xi_u) float32 131.5
Dimensions without coordinates: auxil
Data variables: (12/23)
AKt (time, s_w, eta_rho, xi_rho) float32 ...
Cs_r (s_rho) float32 ...
Cs_w (s_w) float32 ...
Vtransform float32 ...
angle (eta_rho, xi_rho) float32 ...
el float32 ...
... ...
time_step (time, auxil) int32 ...
u (time, s_rho, eta_rho, xi_u) float32 ...
v (time, s_rho, eta_v, xi_rho) float32 ...
w (time, s_rho, eta_rho, xi_rho) float32 ...
xl float32 ...
zeta (time, eta_rho, xi_rho) float32 ...
Attributes: (12/57)
type: ROMS history file
title: BENGUELA TEST MODEL
date:
rst_file: CROCO_FILES/croco_rst.nc
his_file: CROCO_FILES/croco_his.nc
avg_file: CROCO_FILES/croco_avg.nc
... ...
v_sponge: 0.0
sponge_expl: Sponge parameters : extent (m) & viscosity (m2.s-1)
SRCS: main.F step.F read_inp.F timers_roms.F init_scalars.F ini...
CPP-options: REGIONAL BENGUELA_VHR MPI OBC_EAST OBC_WEST OBC_NORTH OBC...
history: Tue Mar 31 16:26:24 2020: ncks -O -d time,30 -d xi_rho,13...
NCO: 4.4.2
Let’s decode it:
In [115]: dsd = ds.xoa.decode()
In [116]: dsd
Out[116]:
<xarray.Dataset>
Dimensions: (time: 1, sig_w: 33, y_rho: 56, x_rho: 1, sig_rho: 32, y_v: 55,
x_u: 1, auxil: 4)
Coordinates: (12/13)
* y_rho (y_rho) float32 6.0 7.0 8.0 9.0 10.0 ... 58.0 59.0 60.0 61.0
* y_v (y_v) float32 6.5 7.5 8.5 9.5 10.5 ... 56.5 57.5 58.5 59.5 60.5
lat_rho (y_rho, x_rho) float32 ...
lat_u (y_rho, x_u) float32 ...
lat_v (y_v, x_rho) float32 ...
lon_rho (y_rho, x_rho) float32 ...
... ...
lon_v (y_v, x_rho) float32 ...
* sig_rho (sig_rho) float32 -0.9844 -0.9531 -0.9219 ... -0.04688 -0.01562
* sig_w (sig_w) float32 -1.0 -0.9688 -0.9375 ... -0.0625 -0.03125 0.0
* time (time) float64 2.592e+06
* x_rho (x_rho) float32 131.0
* x_u (x_u) float32 131.5
Dimensions without coordinates: auxil
Data variables: (12/23)
akt (time, sig_w, y_rho, x_rho) float32 ...
cs_r (sig_rho) float32 ...
cs_w (sig_w) float32 ...
Vtransform float32 ...
angle (y_rho, x_rho) float32 ...
el float32 ...
... ...
time_step (time, auxil) int32 ...
u (time, sig_rho, y_rho, x_u) float32 ...
v (time, sig_rho, y_v, x_rho) float32 ...
w (time, sig_rho, y_rho, x_rho) float32 ...
xl float32 ...
ssh (time, y_rho, x_rho) float32 ...
Attributes: (12/57)
type: ROMS history file
title: BENGUELA TEST MODEL
date:
rst_file: CROCO_FILES/croco_rst.nc
his_file: CROCO_FILES/croco_his.nc
avg_file: CROCO_FILES/croco_avg.nc
... ...
v_sponge: 0.0
sponge_expl: Sponge parameters : extent (m) & viscosity (m2.s-1)
SRCS: main.F step.F read_inp.F timers_roms.F init_scalars.F ini...
CPP-options: REGIONAL BENGUELA_VHR MPI OBC_EAST OBC_WEST OBC_NORTH OBC...
history: Tue Mar 31 16:26:24 2020: ncks -O -d time,30 -d xi_rho,13...
NCO: 4.4.2
Let’s re-encode it!
In [117]: dse = dsd.xoa.encode()
In [118]: dse
Out[118]:
<xarray.Dataset>
Dimensions: (time: 1, s_w: 33, eta_rho: 56, xi_rho: 1, s_rho: 32,
eta_v: 55, xi_u: 1, auxil: 4)
Coordinates: (12/13)
* eta_rho (eta_rho) float32 6.0 7.0 8.0 9.0 10.0 ... 58.0 59.0 60.0 61.0
* eta_v (eta_v) float32 6.5 7.5 8.5 9.5 10.5 ... 57.5 58.5 59.5 60.5
lat_rho (eta_rho, xi_rho) float32 ...
lat_u (eta_rho, xi_u) float32 ...
lat_v (eta_v, xi_rho) float32 ...
lon_rho (eta_rho, xi_rho) float32 ...
... ...
lon_v (eta_v, xi_rho) float32 ...
* s_rho (s_rho) float32 -0.9844 -0.9531 -0.9219 ... -0.04688 -0.01562
* s_w (s_w) float32 -1.0 -0.9688 -0.9375 ... -0.0625 -0.03125 0.0
* time (time) float64 2.592e+06
* xi_rho (xi_rho) float32 131.0
* xi_u (xi_u) float32 131.5
Dimensions without coordinates: auxil
Data variables: (12/23)
AKt (time, s_w, eta_rho, xi_rho) float32 ...
Cs_r (s_rho) float32 ...
Cs_w (s_w) float32 ...
Vtransform float32 ...
angle (eta_rho, xi_rho) float32 ...
el float32 ...
... ...
time_step (time, auxil) int32 ...
u (time, s_rho, eta_rho, xi_u) float32 ...
v (time, s_rho, eta_v, xi_rho) float32 ...
w (time, s_rho, eta_rho, xi_rho) float32 ...
xl float32 ...
zeta (time, eta_rho, xi_rho) float32 ...
Attributes: (12/57)
type: ROMS history file
title: BENGUELA TEST MODEL
date:
rst_file: CROCO_FILES/croco_rst.nc
his_file: CROCO_FILES/croco_his.nc
avg_file: CROCO_FILES/croco_avg.nc
... ...
v_sponge: 0.0
sponge_expl: Sponge parameters : extent (m) & viscosity (m2.s-1)
SRCS: main.F step.F read_inp.F timers_roms.F init_scalars.F ini...
CPP-options: REGIONAL BENGUELA_VHR MPI OBC_EAST OBC_WEST OBC_NORTH OBC...
history: Tue Mar 31 16:26:24 2020: ncks -O -d time,30 -d xi_rho,13...
NCO: 4.4.2
Et voilà !
Example: decoding and merging Hycom splitted outputs
At Shom, the Hycom model outputs are splitted into separate files, one for each variable.
Conflicts may occur when using variables that are not at the same staggered grid location,
since all variables are stored with dimensions (y, x)
and lon
and lat
coordinates, all with the same name.
To avoid this problem, the horizontal dimensions and coordinates of
staggered variables are renamed to indicate their location.
Here are the specs to take care of the staggered grid indicators in the names:
[register]
name=hycom
[[attrs]]
nsigma = "[0-9]?"
[vertical]
positive=down
type=dz
[data_vars]
[[u]]
loc=u
[[[add_coords_loc]]]
lon=True
lat=True
x=True
y=True
[[v]]
loc=v
[[[add_coords_loc]]]
lon=True
lat=True
x=True
y=True
[[bathy]]
name=bathymetry
add_loc=True
[[dz]]
name=h
[coords]
[[x]]
name=X
[[y]]
name=Y
[[z]]
name=lev
Note the add_coords_loc
sections.
Location is not added to the u
and v
variables
but to their dimensions and coordinates.
Register them:
In [119]: xoa.cf.register_cf_specs(xoa.get_data_sample("hycom.cfg"))
In [120]: xoa.cf.is_registered_cf_specs("hycom")
Out[120]: True
Overview of the U dataset:
In [121]: dsu = xoa.open_data_sample("hycom.gdp.u.nc")
In [122]: dsu
Out[122]:
<xarray.Dataset>
Dimensions: (Y: 11, X: 13, lev: 6, time: 50)
Coordinates:
* lev (lev) int32 1 2 3 4 5 6
* time (time) datetime64[ns] 2021-03-01 ... 2021-03-25T12:00:00
Dimensions without coordinates: Y, X
Data variables:
lon (Y, X) float32 ...
lat (Y, X) float32 ...
u (time, lev, Y, X) float32 ...
Attributes: (12/26)
history: Fri Apr 9 12:15:12 2021: ncrcat -x -v bathyme...
Conventions: COARDS
nhybrd: 20
nsigma: 19
kapref: 2
hybflg: 0
... ...
dp00ft: [1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 ...
dp00xt: [21.874 21.414 21.414 13.197 7.85 3.85 4....
ds00t: [3.555 3.026 3.026 1.547 0.828 0.828 0.968 0.8...
ds00ft: [1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 ...
ds00xt: [21.874 21.414 21.414 7.197 3.85 3.85 4....
nco_openmp_thread_number: 1
Decoding:
In [123]: dsu = dsu.xoa.decode()
In [124]: dsu
Out[124]:
<xarray.Dataset>
Dimensions: (y_u: 11, x_u: 13, z: 6, time: 50)
Coordinates:
lon_u (y_u, x_u) float32 ...
lat_u (y_u, x_u) float32 ...
* z (z) int32 1 2 3 4 5 6
* time (time) datetime64[ns] 2021-03-01 ... 2021-03-25T12:00:00
Dimensions without coordinates: y_u, x_u
Data variables:
u (time, z, y_u, x_u) float32 ...
Attributes: (12/26)
history: Fri Apr 9 12:15:12 2021: ncrcat -x -v bathyme...
Conventions: COARDS
nhybrd: 20
nsigma: 19
kapref: 2
hybflg: 0
... ...
dp00ft: [1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 ...
dp00xt: [21.874 21.414 21.414 13.197 7.85 3.85 4....
ds00t: [3.555 3.026 3.026 1.547 0.828 0.828 0.968 0.8...
ds00ft: [1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 ...
ds00xt: [21.874 21.414 21.414 7.197 3.85 3.85 4....
nco_openmp_thread_number: 1
Now we can read, decode and merge all files without any conflict:
In [125]: dsv = xoa.open_data_sample("hycom.gdp.u.nc").xoa.decode()
In [126]: dsh = xoa.open_data_sample("hycom.gdp.h.nc").xoa.decode()
In [127]: ds = xr.merge([dsu, dsv, dsh])
In [128]: ds
Out[128]:
<xarray.Dataset>
Dimensions: (y_u: 11, x_u: 13, z: 6, time: 50, y: 11, x: 13)
Coordinates:
lon_u (y_u, x_u) float32 -3.213 -3.138 -3.062 ... -2.463 -2.388 -2.312
lat_u (y_u, x_u) float32 44.69 44.69 44.69 44.69 ... 45.22 45.22 45.22
* z (z) int32 1 2 3 4 5 6
* time (time) datetime64[ns] 2021-03-01 ... 2021-03-25T12:00:00
lon (y, x) float32 ...
lat (y, x) float32 ...
Dimensions without coordinates: y_u, x_u, y, x
Data variables:
u (time, z, y_u, x_u) float32 0.03078 0.02891 ... 0.1252 0.1272
bathy (y, x) float32 ...
dz (time, z, y, x) float32 ...
Attributes: (12/26)
history: Fri Apr 9 12:15:12 2021: ncrcat -x -v bathyme...
Conventions: COARDS
nhybrd: 20
nsigma: 19
kapref: 2
hybflg: 0
... ...
dp00ft: [1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 ...
dp00xt: [21.874 21.414 21.414 13.197 7.85 3.85 4....
ds00t: [3.555 3.026 3.026 1.547 0.828 0.828 0.968 0.8...
ds00ft: [1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 1.15 ...
ds00xt: [21.874 21.414 21.414 7.197 3.85 3.85 4....
nco_openmp_thread_number: 1