How to identify time, lon, and lat coordinates in xarray?

What is the best way to determine which coordinates of an xarray DataArray object contain longitude, latitude and time?

A typical dataset might look like this:

<xarray.Dataset>
Dimensions:    (ensemble: 9, lat: 224, lon: 464, time: 12054)
Coordinates:
  * lat        (lat) float64 25.06 25.19 25.31 25.44 ... 52.56 52.69 52.81 52.94
  * lon        (lon) float64 -124.9 -124.8 -124.7 ... -67.31 -67.19 -67.06
  * time       (time) datetime64[ns] 1980-01-01 1980-01-02 ... 2012-12-31
Dimensions without coordinates: ensemble
Data variables:
    elevation  (lat, lon) float64 dask.array<shape=(224, 464), chunksize=(224, 464)>
    temp       (ensemble, time, lat, lon) float64 dask.array<shape=(9, 12054, 224, 464), chunksize=(1, 287, 224, 464)>

One approach could be to loop through the coordinates of a variable, e.g. temp.coords, looking for standard_name attributes of time, longitude, and latitude. But many datasets don't seem to include standard_name attributes for all variables.

I guess another approach would be to search the units attributes and check whether they hold appropriate values (e.g. degrees_east for longitude, degrees_north for latitude).

Is there a better way?

The MetPy package includes some helpers for systematic coordinate identification like this. You can see the basics of how this works in the xarray with MetPy tutorial. For example, if you want the time coordinate of a DataArray called temp (assuming it came from a dataset that has been parsed by MetPy), you would simply call:

temp.metpy.time

This is done internally by parsing the coordinate metadata according to the CF conventions.

Here's a short example:

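import xarray as xr
import metpy  # importing MetPy registers the .metpy accessor on xarray objects

ds = xr.tutorial.load_dataset('air_temperature')
# Parse the CF metadata so MetPy can identify the coordinate types
ds = ds.metpy.parse_cf()

# Ask for the coordinates by axis type rather than by name
x, y, t = ds['air'].metpy.coordinates('x', 'y', 'time')

print([coord.name for coord in (x, y, t)])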

which produces:

['lon', 'lat', 'time']

You can probably do something similar to the code below (written for netCDF4's get_variables_by_attributes) with xarray's filter_by_attrs:

def x_axis(nc):
    """Return the variables of a netCDF4 Dataset that look like an x axis."""
    xnames = ['longitude', 'grid_longitude', 'projection_x_coordinate']
    # Units are lower-cased before comparison, so the list is lower-case too
    xunits = [
        'degrees_east',
        'degree_east',
        'degree_e',
        'degrees_e',
        'degreee',
        'degreese',
    ]
    # Union of the variables matched by axis, standard_name, or units
    xvars = list(set(
        nc.get_variables_by_attributes(
            axis=lambda x: x and str(x).lower() == 'x'
        ) +
        nc.get_variables_by_attributes(
            standard_name=lambda x: x and str(x).lower() in xnames
        ) +
        nc.get_variables_by_attributes(
            units=lambda x: x and str(x).lower() in xunits
        )
    ))
    return xvars
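
The same three checks (axis, standard_name, units) extend to the y and time axes. Below is a minimal sketch that works on plain attribute dictionaries rather than on a file handle; the names CF_RULES, classify, and identify_coords are hypothetical, and the lookup tables are only a loose reading of the CF conventions, not a complete one.

```python
# Hypothetical lookup tables, loosely based on the CF conventions.
# All values are lower-case because attributes are lower-cased before matching.
CF_RULES = {
    "x": {
        "axis": "x",
        "standard_names": {"longitude", "grid_longitude", "projection_x_coordinate"},
        "units": {"degrees_east", "degree_east", "degree_e",
                  "degrees_e", "degreee", "degreese"},
    },
    "y": {
        "axis": "y",
        "standard_names": {"latitude", "grid_latitude", "projection_y_coordinate"},
        "units": {"degrees_north", "degree_north", "degree_n",
                  "degrees_n", "degreen", "degreesn"},
    },
    "time": {
        "axis": "t",
        "standard_names": {"time"},
        "units": set(),  # time units are handled separately below
    },
}


def classify(attrs):
    """Return 'x', 'y', 'time', or None for one attribute dictionary."""
    axis = str(attrs.get("axis", "")).lower()
    standard_name = str(attrs.get("standard_name", "")).lower()
    units = str(attrs.get("units", "")).lower()
    for kind, rule in CF_RULES.items():
        if (axis == rule["axis"]
                or standard_name in rule["standard_names"]
                or units in rule["units"]):
            return kind
    # CF time units have the form "<unit> since <reference date>"
    if " since " in units:
        return "time"
    return None


def identify_coords(coord_attrs):
    """Group coordinate names by axis type, given a {name: attrs} mapping."""
    found = {}
    for name, attrs in coord_attrs.items():
        kind = classify(attrs)
        if kind is not None:
            found.setdefault(kind, []).append(name)
    return found
```

With xarray, the input mapping could be built as {name: c.attrs for name, c in temp.coords.items()}. One caveat: when xarray decodes times it typically moves the units attribute into the coordinate's encoding, so a decoded time coordinate may be easier to spot by its datetime64 dtype than by its attributes.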

If you are looking for just the special coords that act as indexes, then you can iterate over the ds.indexes and do some string parsing on their names. Something like:

import xarray as xr

ds = xr.tutorial.load_dataset('air_temperature')
# Remove one standard_name to simulate a dataset with incomplete metadata
ds.lat.attrs.pop('standard_name')

for k in ds.indexes.keys():
    v = ds[k]
    sn = v.attrs.get('standard_name')
    if not sn:
        if 'lon' in k:
            v.attrs.update(standard_name='longitude')
            continue
        if 'lat' in k:
            v.attrs.update(standard_name='latitude')
            continue
        if 'time' in k or k in ['day', 't', 'month', 'year']:
            v.attrs.update(standard_name='time')
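
Plain substring tests such as 'lat' in k also match names like latent_heat. A slightly stricter variant splits the name into tokens first. This is only a sketch: the alias tables and the function name guess_standard_name are hypothetical and would need extending for real datasets.

```python
import re

# Hypothetical alias tables; adjust them to the naming habits of your data.
TOKEN_ALIASES = {
    "longitude": {"lon", "long", "longitude"},
    "latitude": {"lat", "latitude"},
    "time": {"time", "date"},
}
# Short or ambiguous names are only accepted as exact matches
EXACT_ALIASES = {"t": "time", "day": "time", "month": "time", "year": "time"}


def guess_standard_name(name):
    """Guess a standard_name from a coordinate name, or return None."""
    lowered = name.lower()
    if lowered in EXACT_ALIASES:
        return EXACT_ALIASES[lowered]
    # Split on anything that is not a letter, e.g. "nav_lon" -> {"nav", "lon"}
    tokens = set(re.split(r"[^a-z]+", lowered))
    for standard_name, aliases in TOKEN_ALIASES.items():
        if tokens & aliases:
            return standard_name
    return None
```

In the loop above, the chain of if statements could then be replaced by a single call, updating standard_name only when the guess is not None.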

Comments
  • I just loop and look for latitude/lat and longitude/lon. Is there a convention outlined somewhere? Time/date/day/etc. are also tough. A common climate data library with something like climate_toolz.standardize_dims would be great.
  • When I first tried installing metpy into my conda environment, it wanted to upgrade and downgrade a bunch of stuff. For this coordinate identification it was sufficient to use conda install -c conda-forge metpy pint pooch --no-deps.
  • I like this idea. Could standard_name also be provided by an Intake Catalog?