How to get odds-ratios and other related features with scikit-learn

I'm going through this odds ratios in logistic regression tutorial and trying to get exactly the same results with scikit-learn's logistic regression module. With the code below I can get the coefficient and intercept, but I could not find a way to obtain the other properties of the model listed in the tutorial, such as the log-likelihood, Odds Ratio, Std. Err., z, P>|z|, and [95% Conf. Interval]. If someone could show me how to calculate them with the sklearn package, I would appreciate it.

import pandas as pd
from sklearn.linear_model import LogisticRegression

url = 'http://www.ats.ucla.edu/stat/mult_pkg/faq/general/sample.csv'
df = pd.read_csv(url, na_values=[''])
y = df.hon.values
X = df.math.values
y = y.reshape(200,1)
X = X.reshape(200,1)
clf = LogisticRegression(C=1e5)
clf.fit(X,y)
clf.coef_
clf.intercept_

You can get the odds ratios by taking the exponent of the coefficients:

import numpy as np
X = df.female.values.reshape(200,1)
clf.fit(X,y)
np.exp(clf.coef_)

# array([[ 1.80891307]])
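
As a quick sanity check (not part of the original answer), you can trace how the coefficient, the odds, and the predicted probability relate for a single hypothetical input, here female = 1:

import numpy as np

x = np.array([[1.0]])                        # hypothetical input value (female = 1)
log_odds = clf.intercept_ + clf.coef_ * x    # linear predictor on the log-odds scale
odds = np.exp(log_odds)                      # odds of hon = 1
prob = odds / (1 + odds)                     # logistic transform back to a probability
# prob should agree with clf.predict_proba(x)[:, 1]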

As for the other statistics, these are not easy to get from scikit-learn (where model evaluation is mostly done using cross-validation); if you need them, you're better off using a different library such as statsmodels.
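
That said, if you want to stay within scikit-learn you can approximate the missing statistics yourself. This is only a sketch, not part of sklearn's API: it assumes the effectively unpenalized fit above (C=1e5 makes the penalty negligible) and computes Wald standard errors from the observed information matrix, with clf, X and y being the objects from the earlier code:

import numpy as np
from scipy import stats

# design matrix with an explicit intercept column, matching [intercept_, coef_]
X1 = np.hstack([np.ones((X.shape[0], 1)), X])
p = clf.predict_proba(X)[:, 1]                 # fitted probabilities
W = np.diag(p * (1 - p))                       # logistic-regression weights
cov = np.linalg.inv(X1.T @ W @ X1)             # asymptotic covariance of the estimates
se = np.sqrt(np.diag(cov))                     # Std. Err.
beta = np.concatenate([clf.intercept_, clf.coef_.ravel()])
z = beta / se                                  # z statistics
p_values = 2 * stats.norm.sf(np.abs(z))        # P>|z|
conf_int = np.exp(np.column_stack([beta - 1.96 * se,
                                   beta + 1.96 * se]))  # 95% CI on the odds-ratio scale

The same numbers come out of the statsmodels summary below, which is usually the less error-prone route.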

In addition to @maxymoo's answer, you can use statsmodels to get the other statistics. Assuming you have your data in a DataFrame called df, the code below should show a good summary:

import pandas as pd
from patsy import dmatrices
import statsmodels.api as sm

y, X = dmatrices('label ~ age + gender', data=df, return_type='dataframe')
mod = sm.Logit(y, X)
res = mod.fit()
print(res.summary())
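
As a small follow-up (not in the original answer), the fitted results object also gives you the odds ratios and their confidence intervals directly; a short sketch using the standard Logit results attributes:

import numpy as np

params = res.params                      # coefficients on the log-odds scale
conf = res.conf_int()                    # 95% confidence intervals, log-odds scale
conf['OR'] = params
conf.columns = ['2.5%', '97.5%', 'OR']
print(np.exp(conf))                      # exponentiate to get odds ratios and CI bounds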

I don't know of a way to do this with scikit-learn, but Table2x2 from statsmodels.api.stats could be useful in your case, as it provides the OR, SE, CI, and p-value in three lines of code:

import numpy as np
import statsmodels.api as sm

table = sm.stats.Table2x2(np.array([[73, 756], [14, 826]]))
table.summary(method='normal')
"""
               Estimate    SE   LCB    UCB p-value
Odds ratio        5.697       3.189 10.178   0.000
Log odds ratio    1.740 0.296 1.160  2.320   0.000
Risk ratio        5.283       3.007  9.284   0.000
Log risk ratio    1.665 0.288 1.101  2.228   0.000
"""

Comments
  • fyi you should do the import as from sklearn.linear_model import LogisticRegression
  • Thanks @maxymoo. Yes, I could get other summaries with statsmodels.
  • Are these values valid odds ratios only if the features are independent (i.e., there is no interaction between the features)?
  • @user48956 take a look at stats.stackexchange.com/questions/57031/… for a good example of how to interpret odds ratios