Resample xarray object to lower resolution spatially

Use xarray to resample to lower spatial resolution

I want to resample my xarray object to a lower spatial resolution (fewer pixels).

import pandas as pd
import numpy as np
import xarray as xr

time = pd.date_range(np.datetime64('1998-01-02T00:00:00.000000000'), np.datetime64('2005-12-28T00:00:00.000000000'), freq='8D')
x = np.arange(1200)
y = np.arange(1200)

latitude = np.linspace(40,50,1200)
longitude = np.linspace(0,15.5572382,1200)

latitude, longitude = np.meshgrid(latitude, longitude)

BHR_SW = np.ones((365, 1200, 1200))

output_da = xr.DataArray(BHR_SW, coords=[time, y, x])
latitude_da = xr.DataArray(latitude, coords=[y, x])
longitude_da = xr.DataArray(longitude, coords=[y, x])

output_da = output_da.rename({'dim_0':'time','dim_1':'y','dim_2':'x'})
latitude_da = latitude_da.rename({'dim_0':'y','dim_1':'x'})
longitude_da = longitude_da.rename({'dim_0':'y','dim_1':'x'})

output_ds = output_da.to_dataset(name='BHR_SW')
output_ds = output_ds.assign({'latitude':latitude_da, 'longitude':longitude_da})

print(output_ds)

<xarray.Dataset>
Dimensions:    (time: 365, x: 1200, y: 1200)
Coordinates:
  * time       (time) datetime64[ns] 1998-01-02 1998-01-10 ... 2005-12-23
  * y          (y) int64 0 1 2 3 4 5 6 7 ... 1193 1194 1195 1196 1197 1198 1199
  * x          (x) int64 0 1 2 3 4 5 6 7 ... 1193 1194 1195 1196 1197 1198 1199
Data variables:
    BHR_SW     (time, y, x) float64 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0
    latitude   (y, x) float64 40.0 40.01 40.02 40.03 ... 49.97 49.98 49.99 50.0
    longitude  (y, x) float64 0.0 0.0 0.0 0.0 0.0 ... 15.56 15.56 15.56 15.56
```
My question is: how do I resample this along the x and y coordinates to a 200x200 grid?

That is, I want to reduce the spatial resolution of the variable.

What I have tried is the following:

output_ds.resample(x=200).mean()

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-54-10fbdf855a5d> in <module>()
----> 1 output_ds.resample(x=200).mean()

/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/xarray/core/common.pyc in resample(self, indexer, skipna, closed, label, base, keep_attrs, **indexer_kwargs)
    701         group = DataArray(dim_coord, coords=dim_coord.coords,
    702                           dims=dim_coord.dims, name=RESAMPLE_DIM)
--> 703         grouper = pd.Grouper(freq=freq, closed=closed, label=label, base=base)
    704         resampler = self._resample_cls(self, group=group, dim=dim_name,
    705                                        grouper=grouper,

/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/pandas/core/resample.pyc in __init__(self, freq, closed, label, how, axis, fill_method, limit, loffset, kind, convention, base, **kwargs)
   1198                              .format(convention))
   1199
-> 1200         freq = to_offset(freq)
   1201
   1202         end_types = set(['M', 'A', 'Q', 'BM', 'BA', 'BQ', 'W'])

/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/pandas/tseries/frequencies.pyc in to_offset(freq)
    174                     delta = delta + offset
    175         except Exception:
--> 176             raise ValueError(libfreqs._INVALID_FREQ_ERROR.format(freq))
    177
    178     if delta is None:

ValueError: Invalid frequency: 200

But I get the error shown.

How can I complete this spatial resampling for x and y?

Ideally I want to do this:

output_ds.resample(x=200, y=200).mean()

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-55-e0bfce19e037> in <module>()
----> 1 output_ds.resample(x=200, y=200).mean()

/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/xarray/core/common.pyc in resample(self, indexer, skipna, closed, label, base, keep_attrs, **indexer_kwargs)
    679         if len(indexer) != 1:
    680             raise ValueError(
--> 681                 "Resampling only supported along single dimensions."
    682             )
    683         dim, freq = indexer.popitem()

ValueError: Resampling only supported along single dimensions.

NOTE: Real data has different behaviour

The errors above are from the test data I created. The real data, read in from a netCDF file, looks like this:

<xarray.Dataset>
Dimensions:    (time: 368, x: 1200, y: 1200)
Coordinates:
  * time       (time) datetime64[ns] 1998-01-02 1998-01-10 ... 2005-12-28
Dimensions without coordinates: x, y
Data variables:
    latitude   (y, x) float32 ...
    longitude  (y, x) float32 ...
    Data_Mask  (y, x) float32 ...
    BHR_SW     (time, y, x) float32 ...
Attributes:
    CDI:               Climate Data Interface version 1.9.5 (http://mpimet.mp...
    Conventions:       CF-1.4
    history:           Fri Dec 07 13:29:13 2018: cdo mergetime GLOBALBEDO/Glo...
    content:           extracted variabel BHR_SW of the original GlobAlbedo (...
    metadata_profile:  beam
    metadata_version:  0.5
    CDO:               Climate Data Operators version 1.9.5 (http://mpimet.mp...
```

I have tried a similar thing:

ds.resample(x=200).mean()

/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/xarray/core/common.pyc in resample(self, indexer, skipna, closed, label, base, keep_attrs, **indexer_kwargs)
    686         dim_coord = self[dim]
    687
--> 688         if isinstance(self.indexes[dim_name], CFTimeIndex):
    689             raise NotImplementedError(
    690                 'Resample is currently not supported along a dimension '

/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/xarray/core/coordinates.pyc in __getitem__(self, key)
    309         if key not in self._sizes:
    310             raise KeyError(key)
--> 311         return self._variables[key].to_index()
    312
    313     def __unicode__(self):

KeyError: 'x'

Any help very much appreciated.

As piman314 suggests, groupby is the only way to do this in xarray. Resample can only be used for datetime coordinates.

Since xarray currently does not handle multidimensional groupby, this has to be done in two stages:

# this results in bin centers on 100, 300, ..., i.e. 6 bins per dimension for 1200 pixels
reduced = (
    output_ds
    .groupby(((output_ds.x//200) + 0.5) * 200)
    .mean(dim='x')
    .groupby(((output_ds.y//200) + 0.5) * 200)
    .mean(dim='y'))

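If you simply want to downsample your data (keep every n-th pixel without averaging), you can use positional slicing on the DataArray; a Dataset does not support positional indexing:

output_ds['BHR_SW'][:, ::200, ::200]

or, using named dims, which also works on the whole Dataset: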

output_ds[{'x': slice(None, None, 200), 'y': slice(None, None, 200)}]
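
For the 200x200 target in the question, the step would be 6 rather than 200, since 1200 / 6 = 200. A minimal sketch using isel on a small stand-in dataset; the variable name BHR_SW is taken from the question, while the array contents and the name `subsampled` are made up for illustration:

```python
import numpy as np
import xarray as xr

# Stand-in for the question's dataset: 3 time steps on a 1200x1200 grid.
ds = xr.Dataset(
    {"BHR_SW": (("time", "y", "x"), np.ones((3, 1200, 1200), dtype="float32"))}
)

# Keep every 6th pixel along x and y (no averaging): 1200 / 6 = 200.
step = 6
subsampled = ds.isel(x=slice(None, None, step), y=slice(None, None, step))

print(subsampled["BHR_SW"].shape)
```

This only picks pixels; it does not average the ones it skips, so it is a subsample rather than a spatial mean.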

Finally, there are other packages out there that are specifically designed for fast regridding compatible with xarray. xESMF is a good one.


To do it using xarray, the most obvious way is to use groupby_bins; however, it turns out this is incredibly slow. It's probably much more efficient to drop into numpy and use its very fast strided indexing ([:, ::frequency, ::frequency]).

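nsamples = 200
# nsamples bins need nsamples + 1 edges; for 1200 pixels the integer edges
# fall at 0, 5, 11, ..., 1199, i.e. 6 pixels per bin
bins = np.linspace(int(output_ds.x.min()),
                   int(output_ds.x.max()), nsamples + 1).astype(int)
# bins are right-closed, so include_lowest=True keeps x == 0; repeat for 'y'
output_ds = output_ds.groupby_bins('x', bins, include_lowest=True).first()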
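
If a block average is wanted rather than the first value per bin, one option in plain NumPy is a reshape-based block mean. This is a different technique from the strided indexing mentioned above, sketched here under the assumption that the grid size is an exact multiple of the block size; `block_mean` is a hypothetical helper name, not part of NumPy or xarray:

```python
import numpy as np

def block_mean(arr, factor):
    """Average non-overlapping factor x factor blocks over the last two axes.

    Hypothetical helper; expects an array shaped (time, y, x).
    """
    t, ny, nx = arr.shape
    if ny % factor or nx % factor:
        raise ValueError("grid size must be divisible by factor")
    # Split y and x into (block index, position inside block) and
    # average over the two "inside block" axes.
    blocks = arr.reshape(t, ny // factor, factor, nx // factor, factor)
    return blocks.mean(axis=(2, 4))

# Small made-up example: a 12x12 grid reduced by a factor of 6 to 2x2.
data = np.arange(2 * 12 * 12, dtype=float).reshape(2, 12, 12)
coarse = block_mean(data, 6)
print(coarse.shape)
```

For the dataset in the question, the input would be output_ds['BHR_SW'].values with factor 6, giving a (time, 200, 200) array.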


As you are using a NetCDF file that was already manipulated with CDO, you could also use either CDO's SAMPLEGRID operator or NCO's bilinear_interp function:

SAMPLEGRID (https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf) does not interpolate; it just removes every n-th grid point.

bilinear_interp (http://nco.sourceforge.net/nco.html#Bilinear-interpolation) does interpolation.

As you probably want mean, max, or similar albedo values, you would probably prefer NCO's bilinear_interp. But CDO's SAMPLEGRID can give you the grid_out you need for NCO's bilinear_interp.


Comments
  • Relevant: stackoverflow.com/a/42463491/1456927
  • Nowadays this is slightly easier with groupby_bins or even coarsen.
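
The coarsen approach mentioned in the last comment can be sketched as follows, assuming a recent xarray version that provides Dataset.coarsen. The stand-in dataset mimics the question's layout (BHR_SW on time/y/x, latitude on y/x, no x/y coordinates, as in the real file); the values themselves are made up:

```python
import numpy as np
import xarray as xr

# Stand-in dataset: dimensions without coordinates, like the real netCDF data.
lat2d = np.tile(np.linspace(40, 50, 1200), (1200, 1))
ds = xr.Dataset(
    {
        "BHR_SW": (("time", "y", "x"), np.ones((3, 1200, 1200), dtype="float32")),
        "latitude": (("y", "x"), lat2d),
    }
)

# Average non-overlapping 6x6 pixel windows: 1200 / 6 = 200 along x and y.
# The default boundary="exact" requires the sizes to be divisible by 6.
coarse = ds.coarsen(x=6, y=6).mean()

print(coarse["BHR_SW"].shape)
print(coarse["latitude"].shape)
```

Because coarsen works on dimension sizes, it does not need x and y index coordinates, which avoids the KeyError: 'x' seen on the real data. The 2-D latitude and longitude variables are averaged over the same windows.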