Naming conventions with xoa.cf¶
Introduction¶
This module is an application and extension of a subset of the CF conventions in application to naming conventions. It has two intents:
Searching
xarray.DataArrayvariables or coordinates with anotherxarray.DataArrayor axarray.Dataset, by scanning name and attributes likeunitsandstandard_name.Formatting
xarray.DataArrayvariables or coordinates with uniquename, andstandard_name,long_nameandunitsattributes, with support of staggered grid location syntax.
The module offers capabilities for the user to extend and specialize default behaviors for user’s special datasets.
Note
This module shares common feature with the excellent and long awaited xf_xarray package and started a long time before with the Vacumm package. The most differences differences include:
The current module is also designed for data variables, not only coordinates.
It is not only available as accessors, but also as independant objects that can be configured for each type of dataset or in contexts by the user.
Accessing the current specifications¶
Scanning and formatting actions are based on specifications,
and this module natively includes a
default configuration
for various oceanographic, sea surface and atmospheric variables and coordinates.
A distinction is made between
data variables (data_vars)
and coordinates (coords), like in xarray.
Getting the current specifications for data variables and coordinates
with the get_cf_specs() function:
In [1]: from xoa import cf
In [2]: cfspecs = cf.get_cf_specs()
In [3]: cfspecs["data_vars"] is cfspecs.data_vars
Out[3]: True
In [4]: cfspecs["data_vars"] is cfspecs.coords
Out[4]: False
In [5]: cfspecs.data_vars.names[:3]
Out[5]: ['ptemp', 'temp', 'sal']
In [6]: cfspecs.coords.names[:3]
Out[6]: ['x', 'y', 'level']
See the appendix The default configuration for the list of available default specifications.
An instance of the CFSpecs has other configuration
sections
than
data_vars,
coords and
dims:
registration,
sglocator,
vertical and
accessors.
Data variables¶
Here is the example of the sst data variable:
In [7]: from pprint import pprint
In [8]: pprint(cfspecs.data_vars['sst'])
{'add_loc': False,
'alt_names': [],
'attrs': {'long_name': ['Sea surface temperature'],
'standard_name': ['sea_surface_temperature',
'surface_sea_water_temperature'],
'units': ['degrees_celsius']},
'cmap': 'cmo.thermal',
'inherit': None,
'name': None,
'processed': True,
'search_order': 'sn',
'select': {},
'squeeze': None}
Description of specification keys:
Key |
Type |
Description |
|
str |
Specialized name for decoding and encoding which is empty by default |
|
list(str) |
Alternate names for decoding |
|
list(str) |
“standard_name” attributes |
|
list(str) |
“long_name” attributes |
|
list(str) |
“units” attributes |
|
choice |
Domain of application, within {‘generic’, ‘atmos’, ‘ocean’, ‘surface’} |
|
str |
Search order within properties as combination of letters: [n]name, [s]tandard_name, [u]nits |
|
str |
Colormap specification |
|
str |
Inherit specifications from another data variable |
|
eval |
Item selection evaluated and applied to the array |
|
list(str) |
List of dimensions that must be squeezed out |
Note
The standard_name, long_name and units specifications are
internally stored as a dict in the attrs key.
Get name and attributes only:
In [9]: cfspecs.data_vars.get_name("sst")
Out[9]: 'sst'
In [10]: cfspecs.data_vars.get_attrs("sst")
Out[10]:
{'standard_name': 'sea_surface_temperature',
'long_name': 'Sea surface temperature',
'units': 'degrees_celsius'}
Coordinates¶
Here is the example of the lon coordinate:
In [11]: from pprint import pprint
In [12]: pprint(cfspecs.coords['lon'])
{'add_loc': True,
'alt_names': ['longitude', 'nav_lon'],
'attrs': {'axis': 'X',
'long_name': ['Longitude'],
'standard_name': ['longitude', 'longitudes'],
'units': ['degrees_east',
'degree_east',
'degree_e',
'degrees_e',
'degreee',
'degreese']},
'inherit': None,
'name': None,
'processed': True,
'search_order': 'snu'}
Description of specification keys:
Key |
Type |
Description |
|
str |
Specialized name for decoding and encoding which is empty by default |
|
list(str) |
Alternate names for decoding |
|
list(str) |
“standard_name” attributes |
|
list(str) |
“long_name” attributes |
|
list(str) |
“units” attributes |
|
str |
“axis” attribute like X, Y, Z, T or F |
|
str |
Search order within properties as combination of letters: [n]name, [s]tandard_name, [u]nits |
|
str |
Inherit specifications from another data variable or coordinate |
Note
Like for data variables, the standard_name, long_name, units
and axis specifications are internally stored as a dict in the attrs key.
Get name and attributes only:
In [13]: cfspecs.coords.get_name("lon")
Out[13]: 'lon'
In [14]: cfspecs.coords.get_attrs("lon")
Out[14]:
{'standard_name': 'longitude',
'long_name': 'Longitude',
'units': 'degrees_east',
'axis': 'X'}
Dimensions¶
Key |
Type |
Description |
|
list(str) |
Possible names of X-type dimensions |
|
list(str) |
Possible names of Y-type dimensions |
|
list(str) |
Possible names of Z-type vertical dimensions |
|
list(str) |
Possible names of time dimensions |
|
list(str) |
Possible names of forecast dimensions |
This section of the specs defines the names that the dimensions
of type x, y, z, t and f (forecast) can take.
It is automatically by default with possible names (name and alt_names)
of the coordinates that have their axis defined.
The use can also add more spcific names.
Theses list of names are used when searching for dimensions that are not coordinate: since they don’t have attribute, their type can only guessed from their name.
Warning
It is recommended to not fill this section by fill in the coordinate section instead.
Here is the content:
In [15]: cfspecs.dims # or cfspecs["dims"]
Out[15]:
{'x': ['x',
'lon',
'xi',
'nx',
'ni',
'x',
'imt',
'ipi',
'longitude',
'nav_lon'],
'y': ['y',
'lat',
'yi',
'ny',
'nj',
'y',
'jmt',
'jpj',
'latitude',
'nav_lat',
'latitudes'],
'z': ['level',
'depth',
'altitude',
'sig',
'model_level',
'lev',
'nz',
'nk',
'z',
'kmt',
'zi',
'dep',
'altitudet',
'altitudeu',
'altitudev',
's_level',
'sigma',
's',
'sigma_level'],
't': ['time', 'nt'],
'f': ['forecast', 'nf', 'fcst']}
Other sections¶
In [16]: cfspecs["register"]
Out[16]: {'name': 'generic', 'attrs': {}}
In [17]: cfspecs["sglocator"]
Out[17]: {'name_format': '{root}_{loc}', 'valid_locations': None}
In [18]: cfspecs["accessors"]
Out[18]:
{'properties': {'coords': ['lon',
'lat',
'depth',
'sig',
'level',
'altitude',
'time',
'forecast'],
'data_vars': ['temp', 'sal', 'u', 'v', 'bathy']}}
In [19]: cfspecs["vertical"]
Out[19]: {'positive': None, 'type': None}
Searching within a Dataset or DataArray¶
Let’s define a minimal dataset:
In [20]: nx = 3
In [21]: lon = xr.DataArray(np.arange(3, dtype='d'), dims='mylon',
....: attrs={'standard_name': 'longitude'})
....:
In [22]: temp = xr.DataArray(np.arange(20, 23, dtype='d'), dims='mylon',
....: coords={'mylon': lon},
....: attrs={'standard_name': 'sea_water_temperature'})
....:
In [23]: sal = xr.DataArray(np.arange(33, 36, dtype='d'), dims='mylon',
....: coords={'mylon': lon},
....: attrs={'standard_name': 'sea_water_salinity'})
....:
In [24]: ds = xr.Dataset({'mytemp': temp, 'mysal': sal})
All these arrays are CF compliant according to their
standard_name attribute, despite their name is not really explicit.
Check if they match known or explicit CF items:
In [25]: cfspecs.coords.match(lon, "lon") # explicit
Out[25]: True
In [26]: cfspecs.coords.match(lon, "lat") # explicit
Out[26]: False
In [27]: cfspecs.coords.match(lon) # any known
Out[27]: 'lon'
In [28]: cfspecs.data_vars.match(temp) # any known
Out[28]: 'temp'
In [29]: cfspecs.data_vars.match(sal) # any known
Out[29]: 'sal'
Search for known CF items:
In [30]: mytemp = cfspecs.search(ds, "temp")
In [31]: mylon = cfspecs.search(mytemp, "lon")
Datasets are searched for data variables (“data_vars”) and data variables are searched for coordinates (“coords”). You can also search for coordinates in datasets, for instance like this:
In [32]: cfspecs.coords.search(ds, "lon")
Out[32]:
<xarray.DataArray 'mylon' (mylon: 3)>
array([0., 1., 2.])
Coordinates:
* mylon (mylon) float64 0.0 1.0 2.0
Attributes:
standard_name: longitude
See also
The staggered-grid locator SGLocator¶
A CFSpecs instance comes with SGLocator
that is accessible through the sglocator attribute.
A SGLocator helps parsing and formatting staggered grid
location from name, standard_name and long_name
data array attributes.
It is configured at the sglocator section
in which you can specify the
format of the name (name_format and
the allowed location names allowed_locations.
In [33]: sglocator = cfspecs.sglocator
In [34]: sglocator.formats
Out[34]:
{'name': '{root}_{loc}',
'standard_name': '{root}_at_{loc}_location',
'long_name': '{root} at {loc} location'}
In [35]: sglocator.format_attr("name", "lon", "rho")
Out[35]: 'lon_rho'
In [36]: sglocator.format_attr("standard_name", "lon", "rho")
Out[36]: 'lon_at_rho_location'
In [37]: sglocator.format_attr("long_name", "lon", "rho")
Out[37]: 'Lon at RHO location'
In [38]: sglocator.parse_attr("name", "lon_rho")
Out[38]: ('lon', 'rho')
Formatting i.e encoding and decoding¶
The idea¶
Formatting means changing or setting names and attributes. It is possible to format, or even auto-format data variables and coordinates.
During an auto-formatting, each array is matched against CF specs,
and the array is formatted when a matching is successfull.
If the array contains coordinates, the same process is applied on them,
as soon as the format_coords keyword is True.
If the name key is set in the matching item specs, its value is used
to name the variable or coordinate array, else the generic name is used
like "lon".
Explicit formatting:
In [39]: cfspecs.format_coord(lon, "lon")
Out[39]:
<xarray.DataArray 'lon' (mylon: 3)>
array([0., 1., 2.])
Dimensions without coordinates: mylon
Attributes:
standard_name: longitude
long_name: Longitude
units: degrees_east
axis: X
In [40]: cfspecs.format_coord(lon, "lon", loc="rho")
Out[40]:
<xarray.DataArray 'lon_rho' (mylon: 3)>
array([0., 1., 2.])
Dimensions without coordinates: mylon
Attributes:
standard_name: longitude_at_rho_location
long_name: Longitude at RHO location
units: degrees_east
axis: X
In [41]: cfspecs.format_data_var(temp, "temp")
Out[41]:
<xarray.DataArray 'temp' (lon: 3)>
array([20., 21., 22.])
Coordinates:
* lon (lon) float64 0.0 1.0 2.0
Attributes:
standard_name: sea_water_temperature
long_name: Temperature
units: degrees_celsius
Auto-formatting:
In [42]: ds2 = cfspecs.auto_format(ds)
In [43]: ds2.temp
Out[43]:
<xarray.DataArray 'temp' (lon: 3)>
array([20., 21., 22.])
Coordinates:
* lon (lon) float64 0.0 1.0 2.0
Attributes:
standard_name: sea_water_temperature
long_name: Temperature
units: degrees_celsius
In [44]: ds2.lon
Out[44]:
<xarray.DataArray 'lon' (lon: 3)>
array([0., 1., 2.])
Coordinates:
* lon (lon) float64 0.0 1.0 2.0
Attributes:
standard_name: longitude
long_name: Longitude
units: degrees_east
axis: X
It can be applied to a data array or a full dataset.
Encoding and decoding¶
By default, formatting renames known arrays to their generic name,
like “temp” in the example above. We speak here of encoding.
If the specialize keyword is set to True, arrays are
renamed with their specialized name if set in the specs with the
name option.
We speak here of decoding.
Two shortcut methods exists for these tasks:
Chaining the two methods should lead to the initial dataset or data array. See the last section of this page for an exemple: Example: decoding/encoding Croco model outputs.
Using the accessors¶
Accessors for xarray.Dataset and xarray.DataArray
can be registered with the xoa.cf.register_cf_accessors():
In [45]: import xoa
In [46]: xoa.register_accessors(cf="xcf")
The accessor is named here xcf
to not conflict with the cf accessor of
cf-xarray.
Note
All xoa accessors can be be registered with
xoa.register_accessors(). Note also that all functionalities
of the xcf
accessor are also available with the more global
xoa accessor.
These accessors make it easy to use some of the xoa.cf.CFSpecs
capabilities.
Here are examples of use:
In [47]: temp
Out[47]:
<xarray.DataArray (mylon: 3)>
array([20., 21., 22.])
Coordinates:
* mylon (mylon) float64 0.0 1.0 2.0
Attributes:
standard_name: sea_water_temperature
In [48]: temp.xcf.get("lon") # access by .get
Out[48]:
<xarray.DataArray 'mylon' (mylon: 3)>
array([0., 1., 2.])
Coordinates:
* mylon (mylon) float64 0.0 1.0 2.0
Attributes:
standard_name: longitude
In [49]: ds.xcf.get("temp") # access by .get
Out[49]:
<xarray.DataArray 'mytemp' (mylon: 3)>
array([20., 21., 22.])
Coordinates:
* mylon (mylon) float64 0.0 1.0 2.0
Attributes:
standard_name: sea_water_temperature
In [50]: ds.xcf.lon # access by attribute
Out[50]:
<xarray.DataArray 'mylon' (mylon: 3)>
array([0., 1., 2.])
Coordinates:
* mylon (mylon) float64 0.0 1.0 2.0
Attributes:
standard_name: longitude
In [51]: ds.xcf.coords.lon # specific search = ds.cf.coords.get("lon")
Out[51]:
<xarray.DataArray 'mylon' (mylon: 3)>
array([0., 1., 2.])
Coordinates:
* mylon (mylon) float64 0.0 1.0 2.0
Attributes:
standard_name: longitude
In [52]: ds.xcf.temp # access by attribute
Out[52]:
<xarray.DataArray 'mytemp' (mylon: 3)>
array([20., 21., 22.])
Coordinates:
* mylon (mylon) float64 0.0 1.0 2.0
Attributes:
standard_name: sea_water_temperature
In [53]: ds.xcf["temp"].name # access by item
Out[53]: 'mytemp'
In [54]: ds.xcf.data_vars.temp.name # specific search = ds.cf.coords.get("temp")
Out[54]: 'mytemp'
In [55]: ds.xcf.data_vars.bathy is None # returns None when not found
Out[55]: True
In [56]: ds.xcf.temp.xcf.lon.name # chaining
Out[56]: 'mylon'
In [57]: ds.xcf.temp.xcf.name # CF name, not real name
Out[57]: 'temp'
In [58]: ds.xcf.temp.xcf.attrs # attributes, merged with CF attrs
Out[58]:
{'standard_name': 'sea_water_temperature',
'long_name': 'Temperature',
'units': 'degrees_celsius'}
In [59]: ds.xcf.temp.xcf.standard_name # single attribute
Out[59]: 'sea_water_temperature'
In [60]: ds.mytemp.xcf.auto_format() # or ds.temp.xcf()
Out[60]:
<xarray.DataArray 'temp' (lon: 3)>
array([20., 21., 22.])
Coordinates:
* lon (lon) float64 0.0 1.0 2.0
Attributes:
standard_name: sea_water_temperature
long_name: Temperature
units: degrees_celsius
In [61]: ds.xcf.auto_format() # or ds.xcf()
Out[61]:
<xarray.Dataset>
Dimensions: (lon: 3)
Coordinates:
* lon (lon) float64 0.0 1.0 2.0
Data variables:
temp (lon) float64 20.0 21.0 22.0
sal (lon) float64 33.0 34.0 35.0
As you can see, accessing an accessor attribute or item make an
implicit call to get.
The root accessor cf give access to
two sub-accessors, data_vars
and coords,
for being able to specialize the searches.
See also
xoa.cf.DataArrayCFAccessor
xoa.cf.DatasetCFAccessor
Changing the CF specs¶
Default user file¶
The xoa.cf module has internal defaults as shown
in appendix The default configuration.
You can extend these defaults with a user file, whose location is printable with the following command, at the line containing “user CF specs file”:
$ xoa info paths
- xoa library dir: /home/docs/checkouts/readthedocs.org/user_builds/xoa/conda/v0.2.0/lib/python3.9/site-packages/xoa
- user config file: /home/docs/.config/xoa/xoa.cfg [*]
- user CF specs file: /home/docs/.config/xoa/cf.cfg [*]
- user CF cache file: /home/docs/.cache/xoa/cf.pyk
- data samples: hycom.gdp.u.nc hycom.gdp.v.nc gdp-6203641.csv croco.cfg gdp.cfg hycom.cfg croco.south-africa.zonal.nc croco.south-africa.surf.nc hycom.gdp.h.nc croco.south-africa.meridional.nc
*: file not present
Update the current specs¶
The current specs can be updated with different methods.
From a well structured dictionary:
In [62]: cfspecs.load_cfg({"data_vars": {"banana": {"standard_name": "banana"}}})
In [63]: cfspecs.data_vars["banana"]
Out[63]:
{'standard_name': 'banana',
'name': None,
'alt_names': [],
'cmap': None,
'inherit': None,
'squeeze': None,
'search_order': 'sn',
'add_loc': False,
'attrs': {'long_name': ['Banana'], 'standard_name': [], 'units': []},
'select': {},
'processed': True}
From a configuration file: instead of the dictionary as an argument
to load_cfg() method, you can give either a
file name or a multi-line string with the same content as
the file.
Following the previous example:
[data_vars]
[[banana]]
standard_name: banana
If you only want to update a category,
you can use such method (here set_specs()):
In [64]: cfspecs.data_vars.set_specs("banana", name="bonono")
In [65]: cfspecs.data_vars["banana"]["name"]
Out[65]: 'bonono'
Alternatively, a xoa.cf.CFSpecs instance can be loaded
with the load_cfg() method, as explained below.
Create new specs from scratch¶
To create new specs, you must instantiate the xoa.cf.CFSpecs class,
with an input type as those presented above:
A config file name.
A Multi-line string in the format of a config file.
A dictionary.
A
configobj.ConfigObjinstance.Another
CFSpecsinstance.A list of them, with the having priority over the lasts.
The initialization also accepts two options:
default: wether to load or not the default internal config.user: wether to load or not the user config file.
An config created from default and user configs:
In [66]: banana_specs = {"data_vars": {"banana": {"attrs": {"standard_name": "banana"}}}}
In [67]: mycfspecs = cf.CFSpecs(banana_specs)
In [68]: mycfspecs["data_vars"]["sst"]["attrs"]["standard_name"]
Out[68]: ['sea_surface_temperature', 'surface_sea_water_temperature']
In [69]: mycfspecs["data_vars"]["banana"]["attrs"]["standard_name"]
Out[69]: ['banana']
An config created from scratch:
In [70]: mycfspecs = cf.CFSpecs(banana_specs, default=False, user=False)
In [71]: mycfspecs.pprint(depth=2)
{'accessors': {'properties': {...}},
'coords': {},
'data_vars': {'banana': {...}},
'dims': {'f': [], 't': [], 'x': [], 'y': [], 'z': []},
'register': {'attrs': {}, 'name': ''},
'sglocator': {'name_format': '{root}_{loc}', 'valid_locations': None},
'vertical': {'positive': None, 'type': None}}
An config created from two other configs:
In [72]: cfspecs_banana = cf.CFSpecs(banana_specs, default=False, user=False)
In [73]: apple_specs = {"data_vars": {"apple": {"attrs": {"long_name": "Big apple"}}}}
In [74]: cfspecs_apple = cf.CFSpecs(apple_specs, default=False, user=False)
In [75]: cfspecs_fruits = cf.CFSpecs((cfspecs_apple, cfspecs_banana),
....: default=False, user=False)
....:
In [76]: cfspecs_fruits.data_vars.names
Out[76]: ['apple', 'banana']
Replacing the currents CF specs¶
As shown before, the currents CF specs are accessible with the
xoa.cf.get_cf_specs() function.
You can replace them with the xoa.cf.set_cf_specs class,
to be used as a fonction.
In [77]: cfspecs_old = cf.get_cf_specs()
In [78]: cf.set_cf_specs(cfspecs_banana)
Out[78]: <xoa.cf.set_cf_specs at 0x7f0706182730>
In [79]: cf.get_cf_specs() is cfspecs_banana
Out[79]: True
In [80]: cf.set_cf_specs(cfspecs_old)
Out[80]: <xoa.cf.set_cf_specs at 0x7f07061821f0>
In [81]: cf.get_cf_specs() is cfspecs_old
Out[81]: True
In case of a temporary change, you can used set_cf_specs
in a context statement:
In [82]: with cf.set_cf_specs(cfspecs_banana) as myspecs:
....: print('inside', cf.get_cf_specs() is cfspecs_banana)
....: print('inside', myspecs is cf.get_cf_specs())
....:
inside True
inside True
In [83]: print('outside', cf.get_cf_specs() is cfspecs_old)
outside True
For convience, you can set specs directly with a dictionary:
In [84]: with cf.set_cf_specs({"data_vars": {"apple": {}}}) as myspecs: ....: print("apple" in cf.get_cf_specs()) ....: False In [85]: print("apple" in cf.get_cf_specs()) False
Application with an accessor usage:
In [86]: data = xr.DataArray([5], attrs={'standard_name': 'sea_surface_banana'})
In [87]: ds = xr.Dataset({'toto': data})
In [88]: mycfspecs = cf.CFSpecs({"data_vars": {"ssb":
....: {"standard_name": "sea_surface_banana"}}})
....:
In [89]: with cf.set_cf_specs(mycfspecs):
....: print(ds.xcf.get("ssb"))
....:
None
Working with registered specs¶
Registering and accessing new specs¶
It is possible to register specialized CFSpecs instances
with register_cf_specs() for future access.
Here we register new specs with a internal registration name "mycroco":
In [90]: content = {
....: "register": {
....: "name": "mycroco"
....: },
....: "data_vars": {
....: "temp": {
....: "name": "supertemp"
....: }
....: },
....: "coords": {
....: "lon": {
....: "name": "mylon"
....: }
....: }
....: }
....:
In [91]: mycfspecs = cf.CFSpecs(content)
In [92]: cf.register_cf_specs(mycfspecs)
We can now access with it the get_cf_specs() function:
In [93]: these_cfspecs = cf.get_cf_specs('mycroco')
In [94]: these_cfspecs is mycfspecs
Out[94]: True
Inferring the best specs for my dataset¶
If you set the cfspecs attribute or encoding of a dataset
to the name of a registered CFSpecs instance, you can
get it automatically with the get_cf_specs() or
infer_cf_specs() functions.
Let’s register another CFSpecs instance:
In [95]: content = {
....: "register": {
....: "name": "myhycom"
....: },
....: "data_vars": {
....: "sal": {
....: "name": "supersal"
....: }
....: },
....: }
....:
In [96]: mycfspecs2 = cf.CFSpecs(content)
In [97]: cf.register_cf_specs(mycfspecs2)
Let’s create a dataset:
In [98]: ds = xr.Dataset({'supertemp': ("mylon", [0, 2])}, coords={"mylon": [10, 20]})
Now find the best registered specs instance which has either the name
myhycom or mycroco:
In [99]: cf_specs_auto = cf.infer_cf_specs(ds)
In [100]: print(cf_specs_auto.name)
mycroco
In [101]: ds_decoded = cf_specs_auto.decode(ds)
In [102]: ds_decoded
Out[102]:
<xarray.Dataset>
Dimensions: (lon: 2)
Coordinates:
* lon (lon) int64 10 20
Data variables:
temp (lon) int64 0 2
In [103]: cf_specs_auto.encode(ds)
Out[103]:
<xarray.Dataset>
Dimensions: (mylon: 2)
Coordinates:
* mylon (mylon) int64 10 20
Data variables:
supertemp (mylon) int64 0 2
It is mycroco as expected.
Assigning registered specs to a dataset or data array¶
All xoa routines that needs to access specific coordinates
or variables try to infer the approriate specs, which default
to the current specs.
When the cfspecs attribute or encoding is set,
get_cf_specs() uses it to search within
registered specs.
In [104]: ds.encoding.update(cfspecs="mycroco")
In [105]: cfspecs = cf.get_cf_specs(ds)
In [106]: cfspecs.encode(ds)
Out[106]:
<xarray.Dataset>
Dimensions: (mylon: 2)
Coordinates:
* mylon (mylon) int64 10 20
Data variables:
supertemp (mylon) int64 0 2
The cfspecs encoding is set at the dataset level,
not at the data array level:
In [107]: cf.get_cf_specs(ds.supertemp) is cfspecs
Out[107]: False
To propagate to all the data arrays, use assign_cf_specs():
In [108]: cf.assign_cf_specs(ds, "mycroco")
Out[108]:
<xarray.Dataset>
Dimensions: (mylon: 2)
Coordinates:
* mylon (mylon) int64 10 20
Data variables:
supertemp (mylon) int64 0 2
In [109]: ds.mylon.encoding
Out[109]: {'cfspecs': 'mycroco'}
In [110]: cf.get_cf_specs(ds.supertemp) is cfspecs
Out[110]: True
Example: decoding/encoding Croco model outputs¶
Here are the specs for Croco:
[register]
name=croco
[[attrs]]
ini_file=*CROCO*
[sglocator]
valid_locations=rho,w,u,v,r
[accessors]
[[properties]]
coords=ptemp
[vertical]
positive=up
type=sigma
[data_vars]
[[sal]]
name=salt
[[ptemp]]
name=temp
[[ssh]]
name=zeta
[[bathy]]
name=h
[[mask]]
mask=mask
add_loc=True
[[corio]]
name=f
[[akt]]
name=AKt
[[ex]]
name=pm
[[ey]]
name=pn
[[sig]]
name=s
[[cs]]
name=Cs
add_loc=True
[coords]
[[x]]
name=xi
search_order=n
standard_name=x_grid_index
[[y]]
name=eta
search_order=n
standard_name=y_grid_index
Register them:
In [111]: xoa.cf.register_cf_specs(xoa.get_data_sample("croco.cfg"))
In [112]: xoa.cf.is_registered_cf_specs("croco")
Out[112]: True
Register the xoa accessor:
In [113]: xoa.register_accessors()
Now let’s open a Croco sample as a xarray dataset:
In [114]: ds = xoa.open_data_sample("croco.south-africa.meridional.nc")
In [115]: ds
Out[115]:
<xarray.Dataset>
Dimensions: (auxil: 4, eta_rho: 56, eta_v: 55, s_rho: 32, s_w: 33, time: 1, xi_rho: 1, xi_u: 1)
Coordinates: (12/13)
* eta_rho (eta_rho) float32 6.0 7.0 8.0 9.0 10.0 ... 58.0 59.0 60.0 61.0
* eta_v (eta_v) float32 6.5 7.5 8.5 9.5 10.5 ... 57.5 58.5 59.5 60.5
lat_rho (eta_rho, xi_rho) float32 ...
lat_u (eta_rho, xi_u) float32 ...
lat_v (eta_v, xi_rho) float32 ...
lon_rho (eta_rho, xi_rho) float32 ...
... ...
lon_v (eta_v, xi_rho) float32 ...
* s_rho (s_rho) float32 -0.9844 -0.9531 -0.9219 ... -0.04688 -0.01562
* s_w (s_w) float32 -1.0 -0.9688 -0.9375 ... -0.0625 -0.03125 0.0
* time (time) float64 2.592e+06
* xi_rho (xi_rho) float32 131.0
* xi_u (xi_u) float32 131.5
Dimensions without coordinates: auxil
Data variables: (12/23)
AKt (time, s_w, eta_rho, xi_rho) float32 ...
Cs_r (s_rho) float32 ...
Cs_w (s_w) float32 ...
Vtransform float32 ...
angle (eta_rho, xi_rho) float32 ...
el float32 ...
... ...
time_step (time, auxil) int32 ...
u (time, s_rho, eta_rho, xi_u) float32 ...
v (time, s_rho, eta_v, xi_rho) float32 ...
w (time, s_rho, eta_rho, xi_rho) float32 ...
xl float32 ...
zeta (time, eta_rho, xi_rho) float32 ...
Attributes: (12/57)
type: ROMS history file
title: BENGUELA TEST MODEL
date:
rst_file: CROCO_FILES/croco_rst.nc
his_file: CROCO_FILES/croco_his.nc
avg_file: CROCO_FILES/croco_avg.nc
... ...
v_sponge: 0.0
sponge_expl: Sponge parameters : extent (m) & viscosity (m2.s-1)
SRCS: main.F step.F read_inp.F timers_roms.F init_scalars.F ini...
CPP-options: REGIONAL BENGUELA_VHR MPI OBC_EAST OBC_WEST OBC_NORTH OBC...
history: Tue Mar 31 16:26:24 2020: ncks -O -d time,30 -d xi_rho,13...
NCO: 4.4.2
Let’s decode it:
In [116]: dsd = ds.xoa.decode()
In [117]: dsd
Out[117]:
<xarray.Dataset>
Dimensions: (auxil: 4, sig_rho: 32, sig_w: 33, time: 1, x_rho: 1, x_u: 1, y_rho: 56, y_v: 55)
Coordinates: (12/13)
* y_rho (y_rho) float32 6.0 7.0 8.0 9.0 10.0 ... 58.0 59.0 60.0 61.0
* y_v (y_v) float32 6.5 7.5 8.5 9.5 10.5 ... 56.5 57.5 58.5 59.5 60.5
lat_rho (y_rho, x_rho) float32 -37.67 -37.61 -37.54 ... -34.03 -33.96
lat_u (y_rho, x_u) float32 -37.67 -37.61 -37.54 ... -34.03 -33.96
lat_v (y_v, x_rho) float32 -37.64 -37.57 -37.51 ... -34.06 -33.99
lon_rho (y_rho, x_rho) float32 18.83 18.83 18.83 ... 18.83 18.83 18.83
... ...
lon_v (y_v, x_rho) float32 18.83 18.83 18.83 ... 18.83 18.83 18.83
* sig_rho (sig_rho) float32 -0.9844 -0.9531 -0.9219 ... -0.04688 -0.01562
* sig_w (sig_w) float32 -1.0 -0.9688 -0.9375 ... -0.0625 -0.03125 0.0
* time (time) float64 2.592e+06
* x_rho (x_rho) float32 131.0
* x_u (x_u) float32 131.5
Dimensions without coordinates: auxil
Data variables: (12/23)
akt (time, sig_w, y_rho, x_rho) float32 0.0 0.0 ... 0.00501 0.00501
cs_r (sig_rho) float32 -0.9639 -0.8824 ... -0.0002295 -2.53e-05
cs_w (sig_w) float32 -1.0 -0.9245 -0.8382 ... -0.0001015 0.0
Vtransform float32 2.0
angle (y_rho, x_rho) float32 -0.0004444 -0.0004438 ... -0.0004062
el float32 9.969e+36
... ...
time_step (time, auxil) int32 2161 1 31 30
u (time, sig_rho, y_rho, x_u) float32 -0.03134 -0.03715 ... 0.0
v (time, sig_rho, y_v, x_rho) float32 -0.02077 -0.02401 ... 0.0
w (time, sig_rho, y_rho, x_rho) float32 -4.694e-05 ... 0.0
xl float32 9.969e+36
ssh (time, y_rho, x_rho) float32 -0.07026 -0.04691 ... 0.0 0.0
Attributes: (12/57)
type: ROMS history file
title: BENGUELA TEST MODEL
date:
rst_file: CROCO_FILES/croco_rst.nc
his_file: CROCO_FILES/croco_his.nc
avg_file: CROCO_FILES/croco_avg.nc
... ...
v_sponge: 0.0
sponge_expl: Sponge parameters : extent (m) & viscosity (m2.s-1)
SRCS: main.F step.F read_inp.F timers_roms.F init_scalars.F ini...
CPP-options: REGIONAL BENGUELA_VHR MPI OBC_EAST OBC_WEST OBC_NORTH OBC...
history: Tue Mar 31 16:26:24 2020: ncks -O -d time,30 -d xi_rho,13...
NCO: 4.4.2
Let’s re-encode it!
In [118]: dse = dsd.xoa.encode()
In [119]: dse
Out[119]:
<xarray.Dataset>
Dimensions: (auxil: 4, eta_rho: 56, eta_v: 55, s_rho: 32, s_w: 33, time: 1, xi_rho: 1, xi_u: 1)
Coordinates: (12/13)
* eta_rho (eta_rho) float32 6.0 7.0 8.0 9.0 10.0 ... 58.0 59.0 60.0 61.0
* eta_v (eta_v) float32 6.5 7.5 8.5 9.5 10.5 ... 57.5 58.5 59.5 60.5
lat_rho (eta_rho, xi_rho) float32 -37.67 -37.61 -37.54 ... -34.03 -33.96
lat_u (eta_rho, xi_u) float32 -37.67 -37.61 -37.54 ... -34.03 -33.96
lat_v (eta_v, xi_rho) float32 -37.64 -37.57 -37.51 ... -34.06 -33.99
lon_rho (eta_rho, xi_rho) float32 18.83 18.83 18.83 ... 18.83 18.83
... ...
lon_v (eta_v, xi_rho) float32 18.83 18.83 18.83 ... 18.83 18.83 18.83
* s_rho (s_rho) float32 -0.9844 -0.9531 -0.9219 ... -0.04688 -0.01562
* s_w (s_w) float32 -1.0 -0.9688 -0.9375 ... -0.0625 -0.03125 0.0
* time (time) float64 2.592e+06
* xi_rho (xi_rho) float32 131.0
* xi_u (xi_u) float32 131.5
Dimensions without coordinates: auxil
Data variables: (12/23)
AKt (time, s_w, eta_rho, xi_rho) float32 0.0 0.0 ... 0.00501 0.00501
Cs_r (s_rho) float32 -0.9639 -0.8824 -0.7925 ... -0.0002295 -2.53e-05
Cs_w (s_w) float32 -1.0 -0.9245 -0.8382 ... -0.0004109 -0.0001015 0.0
Vtransform float32 2.0
angle (eta_rho, xi_rho) float32 -0.0004444 -0.0004438 ... -0.0004062
el float32 9.969e+36
... ...
time_step (time, auxil) int32 2161 1 31 30
u (time, s_rho, eta_rho, xi_u) float32 -0.03134 -0.03715 ... 0.0
v (time, s_rho, eta_v, xi_rho) float32 -0.02077 -0.02401 ... 0.0
w (time, s_rho, eta_rho, xi_rho) float32 -4.694e-05 ... 0.0
xl float32 9.969e+36
zeta (time, eta_rho, xi_rho) float32 -0.07026 -0.04691 ... 0.0 0.0
Attributes: (12/57)
type: ROMS history file
title: BENGUELA TEST MODEL
date:
rst_file: CROCO_FILES/croco_rst.nc
his_file: CROCO_FILES/croco_his.nc
avg_file: CROCO_FILES/croco_avg.nc
... ...
v_sponge: 0.0
sponge_expl: Sponge parameters : extent (m) & viscosity (m2.s-1)
SRCS: main.F step.F read_inp.F timers_roms.F init_scalars.F ini...
CPP-options: REGIONAL BENGUELA_VHR MPI OBC_EAST OBC_WEST OBC_NORTH OBC...
history: Tue Mar 31 16:26:24 2020: ncks -O -d time,30 -d xi_rho,13...
NCO: 4.4.2
Et voilà !