## Python natural smoothing splines

I am trying to find a python package that would give an option to fit *natural smoothing splines* with user selectable smoothing factor. Is there an implementation for that? If not, how would you use what is available to implement it yourself?

By natural spline I mean that there should be a condition that the second derivative of the fitted function at the endpoints is zero (linear).

By smoothing spline I mean that the spline should not be 'interpolating' (passing through all the datapoints). I would like to decide the correct smoothing factor lambda (see the Wikipedia page for smoothing splines) myself.

##### What I have found

scipy.interpolate.CubicSpline [link]: Does natural (cubic) spline fitting. Does interpolation, and there is no way to smooth the data.

scipy.interpolate.UnivariateSpline [link]: Does spline fitting with user selectable smoothing factor. However, there is no option to make the splines natural.

After hours of investigation, I did not find any pip installable packages which could fit a natural cubic spline with user-controllable smoothness. However, after deciding to write one myself, while reading about the topic I stumbled upon a blog post by github user madrury. He has written python code capable of producing natural cubic spline models.

The model code is available here (NaturalCubicSpline) with a BSD-licence. He has also written some examples in an IPython notebook.

But since this is the Internet and links tend to die, I will copy the relevant parts of the source code here + a helper function (`get_natural_cubic_spline_model`

) written by me, and show an example of how to use it. The smoothness of the fit can be controlled by using different number of knots. The position of the knots can be also specified by the user.

##### Example

from matplotlib import pyplot as plt import numpy as np def func(x): return 1/(1+25*x**2) # make example data x = np.linspace(-1,1,300) y = func(x) + np.random.normal(0, 0.2, len(x)) # The number of knots can be used to control the amount of smoothness model_6 = get_natural_cubic_spline_model(x, y, minval=min(x), maxval=max(x), n_knots=6) model_15 = get_natural_cubic_spline_model(x, y, minval=min(x), maxval=max(x), n_knots=15) y_est_6 = model_6.predict(x) y_est_15 = model_15.predict(x) plt.plot(x, y, ls='', marker='.', label='originals') plt.plot(x, y_est_6, marker='.', label='n_knots = 6') plt.plot(x, y_est_15, marker='.', label='n_knots = 15') plt.legend(); plt.show()

##### The source code for `get_natural_cubic_spline_model`

import numpy as np import pandas as pd from sklearn.base import BaseEstimator, TransformerMixin from sklearn.linear_model import LinearRegression from sklearn.pipeline import Pipeline def get_natural_cubic_spline_model(x, y, minval=None, maxval=None, n_knots=None, knots=None): """ Get a natural cubic spline model for the data. For the knots, give (a) `knots` (as an array) or (b) minval, maxval and n_knots. If the knots are not directly specified, the resulting knots are equally space within the *interior* of (max, min). That is, the endpoints are *not* included as knots. Parameters ---------- x: np.array of float The input data y: np.array of float The outpur data minval: float Minimum of interval containing the knots. maxval: float Maximum of the interval containing the knots. n_knots: positive integer The number of knots to create. knots: array or list of floats The knots. Returns -------- model: a model object The returned model will have following method: - predict(x): x is a numpy array. This will return the predicted y-values. """ if knots: spline = NaturalCubicSpline(knots=knots) else: spline = NaturalCubicSpline(max=maxval, min=minval, n_knots=n_knots) p = Pipeline([ ('nat_cubic', spline), ('regression', LinearRegression(fit_intercept=True)) ]) p.fit(x, y) return p class AbstractSpline(BaseEstimator, TransformerMixin): """Base class for all spline basis expansions.""" def __init__(self, max=None, min=None, n_knots=None, n_params=None, knots=None): if knots is None: if not n_knots: n_knots = self._compute_n_knots(n_params) knots = np.linspace(min, max, num=(n_knots + 2))[1:-1] max, min = np.max(knots), np.min(knots) self.knots = np.asarray(knots) @property def n_knots(self): return len(self.knots) def fit(self, *args, **kwargs): return self class NaturalCubicSpline(AbstractSpline): """Apply a natural cubic basis expansion to an array. The features created with this basis expansion can be used to fit a piecewise cubic function under the constraint that the fitted curve is linear *outside* the range of the knots.. The fitted curve is continuously differentiable to the second order at all of the knots. This transformer can be created in two ways: - By specifying the maximum, minimum, and number of knots. - By specifying the cutpoints directly. If the knots are not directly specified, the resulting knots are equally space within the *interior* of (max, min). That is, the endpoints are *not* included as knots. Parameters ---------- min: float Minimum of interval containing the knots. max: float Maximum of the interval containing the knots. n_knots: positive integer The number of knots to create. knots: array or list of floats The knots. """ def _compute_n_knots(self, n_params): return n_params @property def n_params(self): return self.n_knots - 1 def transform(self, X, **transform_params): X_spl = self._transform_array(X) if isinstance(X, pd.Series): col_names = self._make_names(X) X_spl = pd.DataFrame(X_spl, columns=col_names, index=X.index) return X_spl def _make_names(self, X): first_name = "{}_spline_linear".format(X.name) rest_names = ["{}_spline_{}".format(X.name, idx) for idx in range(self.n_knots - 2)] return [first_name] + rest_names def _transform_array(self, X, **transform_params): X = X.squeeze() try: X_spl = np.zeros((X.shape[0], self.n_knots - 1)) except IndexError: # For arrays with only one element X_spl = np.zeros((1, self.n_knots - 1)) X_spl[:, 0] = X.squeeze() def d(knot_idx, x): def ppart(t): return np.maximum(0, t) def cube(t): return t*t*t numerator = (cube(ppart(x - self.knots[knot_idx])) - cube(ppart(x - self.knots[self.n_knots - 1]))) denominator = self.knots[self.n_knots - 1] - self.knots[knot_idx] return numerator / denominator for i in range(0, self.n_knots - 2): X_spl[:, i+1] = (d(i, X) - d(self.n_knots - 2, X)).squeeze() return X_spl

**Smoothing Spline Formulation,** should not be 'interpolating' (passing through all the datapoints). Cubic and Natural Cubic Splines Cubic spline is a piecewise polynomial with a set of extra constraints (continuity, continuity of the first derivative, and continuity of the second derivative). In general, a cubic spline with K knots uses cubic spline with a total of 4 + K degrees of freedom.

You could use this numpy/scipy implementation of natural cubic smoothing spline for univariate/multivariate data smoothing. Smoothing parameter should be in range [0.0, 1.0]. If we use smoothing parameter equal to 1.0 we get natural cubic spline interpolant without data smoothing. Also the implementation supports vectorization for univariate data.

Univariate example:

import numpy as np import matplotlib.pyplot as plt import csaps np.random.seed(1234) x = np.linspace(-5., 5., 25) y = np.exp(-(x/2.5)**2) + (np.random.rand(25) - 0.2) * 0.3 sp = csaps.UnivariateCubicSmoothingSpline(x, y, smooth=0.85) xs = np.linspace(x[0], x[-1], 150) ys = sp(xs) plt.plot(x, y, 'o', xs, ys, '-') plt.show()

Bivariate example:

import numpy as np import matplotlib.pyplot as plt from mpl_toolkits.mplot3d import Axes3D import csaps xdata = [np.linspace(-3, 3, 61), np.linspace(-3.5, 3.5, 51)] i, j = np.meshgrid(*xdata, indexing='ij') ydata = (3 * (1 - j)**2. * np.exp(-(j**2) - (i + 1)**2) - 10 * (j / 5 - j**3 - i**5) * np.exp(-j**2 - i**2) - 1 / 3 * np.exp(-(j + 1)**2 - i**2)) np.random.seed(12345) noisy = ydata + (np.random.randn(*ydata.shape) * 0.75) sp = csaps.MultivariateCubicSmoothingSpline(xdata, noisy, smooth=0.988) ysmth = sp(xdata) fig = plt.figure() ax = fig.add_subplot(111, projection='3d') ax.plot_wireframe(j, i, noisy, linewidths=0.5, color='r') ax.scatter(j, i, noisy, s=5, c='r') ax.plot_surface(j, i, ysmth, linewidth=0, alpha=1.0) plt.show()

**scipy.interpolate.UnivariateSpline,** Definition; Implementation; Differences from smoothing splines in SciPy 1: The natural cubic spline interpolant csaps is implemented as a pure (without C-extensions) Python modified port of MATLAB CSAPS function that is an Constructing Natural Cubic Splines with Python. Finally, let us explore how we can code the algorithm. Step 1: Create our Own Jacobi Method. Here, we define tolerance as the norm of the difference

The programming language R offers a very good implementation of natural cubic smoothing splines. You can use R functions in Python with `rpy2`

:

import rpy2.robjects as robjects r_y = robjects.FloatVector(y_train) r_x = robjects.FloatVector(x_train) r_smooth_spline = robjects.r['smooth.spline'] #extract R function# run smoothing function spline1 = r_smooth_spline(x=r_x, y=r_y, spar=0.7) ySpline=np.array(robjects.r['predict'](spline1,robjects.FloatVector(x_smooth)).rx2('y')) plt.plot(x_smooth,ySpline)

If you want to directly set `lambda`

: `spline1 = r_smooth_spline(x=r_x, y=r_y, lambda=42)`

doesn't work, because `lambda`

has already another meaning in Python, but there is a solution: How to use the lambda argument of smooth.spline in RPy WITHOUT Python interprating it as lambda.

To get the code running you first need to define the data `x_train`

and `y_train`

and you can define ```
x_smooth=np.array(np.linspace(-3,5,1920)).
```

if you want to plot it between -3 and 5 in Full-HD-resolution.

**scipy.interpolate.UnivariateSpline,** Degree of the smoothing spline. Must be <= 5. Default is k=3, a cubic spline. sfloat or None, optional. Positive smoothing factor used to choose Cubic smoothing splines with natural boundary conditions and automated choice of the smoothing parameter - eldad-a/natural-cubic-smoothing-splines

**Introduction to Regression Splines (with Python codes),** One-dimensional smoothing spline fit to a given set of data points. Fits a spline y=s(x) of degree k to the provided x, y data. s specifies the number of knots by A natural cubic smoothing splines module to smooth-out noise and obtain an estimate of the first two derivatives (velocity and acceleration in the case of a particle trajectory). Various methods have been introduced for the automatic choice of the smoothing parameter. It can also be used to get an interpolating natural cubic spline.

**natural-cubic-smoothing-splines/splines.py at master · eldad-a ,** Piece wise Step Functions; Basis Functions; Piece wise Polynomials; Constraints and Splines; Cubic and Natural Cubic splines; Choosing the In order to find the spline representation, there are two different ways to represent a curve and obtain (smoothing) spline coefficients: directly and parametrically. The direct method finds the spline representation of a curve in a 2-D plane using the function splrep .

**eldad-a/natural-cubic-smoothing-splines: Cubic smoothing ,** Cubic smoothing splines with natural boundary conditions and automated choice of from past.builtins import xrange # Python 2 and 3: backward-compatible. An introduction to modeling for statistical/machine learning via smoothing splines. You can find the code from this video here: http://bit.ly/rudeboybert_splines.