## Significant mismatch between `r2_score` of `scikit-learn` and the R^2 calculation

##### Question

Why is there a significant difference between the `r2_score` function in scikit-learn and the formula for the coefficient of determination described on Wikipedia? Which one is correct?

##### Context

I'm using Python 3.5 to fit linear and quadratic models, and one of the measures of goodness of fit that I'm trying out is the coefficient of determination (R^2). However, while testing, there's a marked difference between the `r2_score` metric in `scikit-learn` and the calculation described on Wikipedia.

##### Code

I'm providing my code here as reference, which computes the example in the Wikipedia page linked above.

```python
from sklearn.metrics import r2_score
import numpy

y = [1, 2, 3, 4, 5]
f = [1.9, 3.7, 5.8, 8.0, 9.6]

# Convert to numpy arrays in double precision to avoid
# single-precision rounding errors
observed = numpy.array(y, dtype=numpy.float64)
predicted = numpy.array(f, dtype=numpy.float64)

score = r2_score(observed, predicted)
print(score)  # -3.8699999999999992
```

As is evident, the value calculated by scikit-learn is `-3.8699999999999992`, while the reference value in Wikipedia is `0.998`.

Thank you!

**UPDATE:** This is different from this question about how R^2 is calculated in scikit-learn: what I'm trying to understand is the discrepancy between the two results. That question states that the formula used in scikit-learn is the same as Wikipedia's, which should not produce different values.

**UPDATE #2:** It turns out I misread the example in the Wikipedia article. Answers and comments below point out that the example I quoted is for the linear least-squares fit of the (x, y) values, for which Wikipedia's R^2 value of 0.998 is correct. For the R^2 between the two vectors themselves, scikit-learn's answer is also correct. Thanks a lot for your help!

The linked question is correct: if you work through the calculation of the residual sum of squares and the total sum of squares, you get the same value as sklearn:

```python
In [85]: import numpy as np

In [86]: y = [1, 2, 3, 4, 5]

In [87]: f = [1.9, 3.7, 5.8, 8.0, 9.6]

In [88]: SSres = sum(map(lambda x: (x[0] - x[1])**2, zip(y, f)))

In [89]: SStot = sum([(x - np.mean(y))**2 for x in y])

In [90]: SSres, SStot
Out[90]: (48.699999999999996, 10.0)

In [91]: 1 - (SSres / SStot)
Out[91]: -3.8699999999999992
```

The idea behind a negative value is that you would have been closer to the actual values had you just predicted the mean every time (which corresponds to an R^2 of 0).
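To make that baseline concrete, here is a minimal sketch (same toy data as above) showing that a model that always predicts the mean of the observed values scores exactly 0, so the toy prediction scores well below that baseline:

```python
import numpy as np
from sklearn.metrics import r2_score

y = [1, 2, 3, 4, 5]
f = [1.9, 3.7, 5.8, 8.0, 9.6]

# A constant "model" that always predicts the mean of the observed values
baseline = [np.mean(y)] * len(y)

print(r2_score(y, baseline))  # 0.0 -- the reference point for the score
print(r2_score(y, f))         # -3.8699999999999992 -- worse than the mean
```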

I think you have misinterpreted Wikipedia. The example on Wikipedia does **not** state:

```
y = [1, 2, 3, 4, 5]
f = [1.9, 3.7, 5.8, 8.0, 9.6]
R^2 = 0.998
```

Instead, it says that the `R^2` for a linear least-squares fit to the data:

```
x = [1, 2, 3, 4, 5]
y = [1.9, 3.7, 5.8, 8.0, 9.6]
```

is equal to `0.998`.

Consider this script, which first uses `np.linalg.lstsq` to find the least-squares fit, and then uses both methods to find an `R^2`

```python
import numpy as np
from sklearn.metrics import r2_score

x = np.arange(1, 6, 1)
y = np.array([1.9, 3.7, 5.8, 8.0, 9.6])

A = np.vstack([x, np.ones(len(x))]).T

# Use numpy's least-squares function
m, c = np.linalg.lstsq(A, y, rcond=None)[0]
print(m, c)  # 1.97 -0.11

# Define the values of our least-squares fit
f = m * x + c
print(f)  # [ 1.86  3.83  5.8   7.77  9.74]

# Calculate R^2 explicitly
yminusf2 = (y - f)**2
sserr = sum(yminusf2)
mean = float(sum(y)) / float(len(y))
yminusmean2 = (y - mean)**2
sstot = sum(yminusmean2)
R2 = 1. - (sserr / sstot)
print(R2)  # 0.99766066838

# Use scikit-learn
print(r2_score(y, f))        # 0.99766066838
print(r2_score(y, f) == R2)  # True
```

The coefficient of determination effectively compares the variance in the data to the variance of the residuals. A residual is the difference between a predicted and an observed value, and its variance here is the sum of squares of those differences.

If the prediction is perfect, the variance of the residual is zero. Hence, the coefficient of determination is one. If the prediction is not perfect some of the residuals are non-zero and the variance of the residuals is positive. Hence, the coefficient of determination is lower than one.
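As a quick sanity check of that behaviour (a sketch with made-up numbers): a perfect prediction scores exactly 1, and any non-zero residual pulls the score below 1:

```python
import numpy as np
from sklearn.metrics import r2_score

y = np.array([1.9, 3.7, 5.8, 8.0, 9.6])

print(r2_score(y, y))        # 1.0 -- zero residuals
print(r2_score(y, y + 0.5))  # below 1.0 -- the constant offset leaves residuals
```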

The toy problem obviously has a low coefficient of determination, since most of the predicted values are way off. A coefficient of determination of `-3.87` means that the variance of the residuals is `4.87` times as large as the variance in the observed values.

The `0.998` value is the coefficient of determination of the linear least-squares fit to the set of data. That is, the observed values are related to the predicted values by a linear relation (plus a constant) that minimizes the variance of the residuals. Since the observed and predicted values in the toy problem are almost perfectly linearly dependent, the coefficient of determination of the linear least-squares fit is very close to one.
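One way to verify this numerically (a sketch using the toy data): for a simple least-squares line with an intercept, R^2 equals the squared Pearson correlation between the two vectors, which `np.corrcoef` gives directly:

```python
import numpy as np

observed = np.array([1, 2, 3, 4, 5], dtype=float)
predicted = np.array([1.9, 3.7, 5.8, 8.0, 9.6])

# For a least-squares line with an intercept, R^2 is the squared
# Pearson correlation between the two vectors
r = np.corrcoef(observed, predicted)[0, 1]
print(r**2)  # ~0.9977, the 0.998 quoted from Wikipedia
```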

Both methods use the same formula to calculate the R-squared; check the code below:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Data
X = np.array([1.9, 3.7, 5.8, 8.0, 9.6]).reshape(-1, 1)
y = [1, 2, 3, 4, 5]

# Fit a linear regression
reg = LinearRegression().fit(X, y)

# Predict the target variable
y_pred = reg.predict(X)

# R-squared via the metrics function
print('R-Square(metrics):', r2_score(y, y_pred))

# R-squared via the estimator's score method
print('R-Square(Score):', reg.score(X, y))
```

Output:

```
R-Square(metrics): 0.9976606683804627
R-Square(Score): 0.9976606683804627
```

You're incorrect that it is the correlation coefficient. In the doctest example, `r2_score([1, 2, 3], [3, 2, 1])` is calculated as `-3.0`; the correlation coefficient would be `-1`.