Scikit learn SVC predict probability doesn't work as expected


I built a sentiment analyzer using an SVM classifier. I trained the model with probability=True, and it gives me probabilities. But when I pickled the model and loaded it again later, the probability output no longer worked.

The model:

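# assumed imports (not shown in the original post), matching the older
# scikit-learn (< 0.20) / Python 2 API used below
import cPickle
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.grid_search import GridSearchCV
from sklearn.cross_validation import StratifiedKFold
from sklearn.svm import SVC, LinearSVC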
pipeline_svm = Pipeline([
    ('bow', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('classifier', SVC(probability=True)),])

# pipeline parameters to automatically explore and tune
param_svm = [
  {'classifier__C': [1, 10, 100, 1000], 'classifier__kernel': ['linear']},
  {'classifier__C': [1, 10, 100, 1000], 'classifier__gamma': [0.001, 0.0001], 'classifier__kernel': ['rbf']},
]

grid_svm = GridSearchCV(
    pipeline_svm,
    param_grid=param_svm,
    refit=True,
    n_jobs=-1, 
    scoring='accuracy',
    cv=StratifiedKFold(label_train, n_folds=5),)

svm_detector_reloaded = cPickle.load(open('svm_sentiment_analyzer.pkl', 'rb'))
print(svm_detector_reloaded.predict(["Today is awesome day"])[0])

Gives me:

AttributeError: predict_proba is not available when probability=False

In case it helps, pickling the model with:

import pickle
pickle.dump(grid_svm, open('svm_sentiment_analyzer.pkl', 'wb'))

and loading the model and predicting with

svm_detector_reloaded = pickle.load(open('svm_sentiment_analyzer.pkl', 'rb'))
print(svm_detector_reloaded.predict_proba(["Today is an awesome day"])[0])

returned two probabilities just fine, after I adapted your code so that I could rerun it and trained the model on a pandas DataFrame named sents with

grid_svm.fit(sents.Sentence.values, sents.Positive.values)
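
To make that check reproducible without the original data, here is a minimal sketch of the same pickle round trip. The toy corpus, the reduced parameter grid, and the newer `sklearn.model_selection` API (in place of the older `StratifiedKFold(label_train, n_folds=5)` call from the question) are assumptions for illustration, not the asker's actual setup.

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus (the question's training data is not shown).
nouns = ["day", "movie", "meal", "trip", "song"]
texts = [f"{adj} {noun}" for adj in ["awesome", "great", "lovely"] for noun in nouns]
texts += [f"{adj} {noun}" for adj in ["awful", "terrible", "horrible"] for noun in nouns]
labels = [1] * 15 + [0] * 15

pipeline_svm = Pipeline([
    ("bow", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("classifier", SVC(probability=True, random_state=0)),
])

grid_svm = GridSearchCV(
    pipeline_svm,
    param_grid={"classifier__C": [1, 10], "classifier__kernel": ["linear"]},
    refit=True,
    scoring="accuracy",
    cv=StratifiedKFold(n_splits=3),
)
grid_svm.fit(texts, labels)

# Round-trip through pickle in memory instead of a file on disk.
svm_detector_reloaded = pickle.loads(pickle.dumps(grid_svm))

# The probability flag lives on the fitted classifier step and is pickled with it.
reloaded_svc = svm_detector_reloaded.best_estimator_.named_steps["classifier"]
proba = svm_detector_reloaded.predict_proba(["Today is an awesome day"])[0]
print(reloaded_svc.probability, proba)
```

If the reloaded classifier step reports `probability` as False, the pickle file was most likely written from a model that was fitted without that flag, which would be worth ruling out first.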

Best practices (e.g. using joblib) on model serialization can be found at https://scikit-learn.org/stable/modules/model_persistence.html
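
As a rough illustration of the joblib route, the sketch below saves and reloads a fitted SVC and compares the probabilities before and after. The synthetic data and the file name `svm_model.joblib` are placeholders, not part of the original question.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in data; not the sentiment corpus from the question.
X, y = make_classification(n_samples=200, random_state=0)
model = SVC(probability=True, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "svm_model.joblib")  # hypothetical file name
    joblib.dump(model, path)
    reloaded = joblib.load(path)

# The reloaded estimator should reproduce the original probabilities.
same = np.allclose(model.predict_proba(X[:5]), reloaded.predict_proba(X[:5]))
print(same)
```

As with pickle, the file is generally only expected to load reliably under the same scikit-learn version that wrote it.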

From the SVC documentation:

probability : bool, default=False. Whether to enable probability estimates. This must be enabled prior to calling fit, will slow down that method as it internally uses 5-fold cross-validation, and predict_proba may be inconsistent with predict. Read more in the User Guide.

tol : float, default=1e-3. Tolerance for stopping criterion.

You can use CalibratedClassifierCV to get probability scores as output.

from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import LinearSVC

model_svc = LinearSVC()
model = CalibratedClassifierCV(model_svc) 
model.fit(X_train, y_train)

Save model using pickle.

import pickle
filename = 'linearSVC.sav'
pickle.dump(model, open(filename, 'wb'))

Load model using pickle.load.

model = pickle.load(open(filename, 'rb'))

Now start prediction.

pred_class = model.predict(pred)
probability = model.predict_proba(pred)
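
The steps above can be put together into one runnable sketch. The synthetic data stands in for `X_train`, `y_train` and `pred`, which are not defined in the answer, and the pickle round trip is done in memory rather than through `linearSVC.sav`.

```python
import pickle

import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-in for X_train / y_train and the inputs to predict on.
X, y = make_classification(n_samples=300, random_state=0)
X_train, X_new, y_train, _ = train_test_split(X, y, random_state=0)

# LinearSVC has no predict_proba; the calibration wrapper adds one.
model = CalibratedClassifierCV(LinearSVC())
model.fit(X_train, y_train)

# In-memory pickle round trip.
model = pickle.loads(pickle.dumps(model))

pred_class = model.predict(X_new)
probability = model.predict_proba(X_new)
print(probability.shape, np.allclose(probability.sum(axis=1), 1.0))
```

Because the calibrator is fitted as part of `model`, the probability output does not depend on any `probability=True` flag on the underlying SVM.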

Predicting probability from scikit-learn SVC decision_function: Your link has sufficient resources, so let's go through: when you call decision_function(), you get the output from each of the pairwise classifiers (n*(n-1)/2 ...
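
The pairwise count mentioned above can be checked directly. This is a small sketch on synthetic four-class data (an assumption chosen so that the one-vs-one and one-vs-rest shapes differ); it compares the number of columns returned by `decision_function` under the two `decision_function_shape` settings.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic 4-class data, chosen so the two output shapes differ.
n_classes = 4
X, y = make_blobs(n_samples=200, centers=n_classes, random_state=0)

ovo = SVC(decision_function_shape="ovo").fit(X, y)
ovr = SVC(decision_function_shape="ovr").fit(X, y)

ovo_scores = ovo.decision_function(X[:3])
ovr_scores = ovr.decision_function(X[:3])

print(ovo_scores.shape)  # one column per pair of classes
print(ovr_scores.shape)  # one column per class
```

These scores are signed distances, not probabilities; turning them into probabilities still needs a calibration step such as `probability=True` or `CalibratedClassifierCV`.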

Use: SVC(probability=True)

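or set it on the classifier step of the pipeline before building the search (GridSearchCV itself does not accept a probability argument):

pipeline_svm.set_params(classifier__probability=True)
grid_svm = GridSearchCV(
    pipeline_svm,
    param_grid=param_svm,
    refit=True,
    n_jobs=-1,
    scoring='accuracy',
    cv=StratifiedKFold(label_train, n_folds=5),)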

API inconsistency of predict and predict_proba in SVC (scikit-learn issue #13211): When using SVC(probability=True) or SVR(probability=True), the output of SVC predict_proba does not always correspond to the class with the highest ... Also note that in https://github.com/scikit-learn/scikit-learn/pull/16769/files# ...

The really strange thing is that svm_predict() gives the wrong answer while svm_predict_probability(), a more complicated function, which falls back to svm_predict(), gives the right thing. (The correct predictions jump around because libsvm does some sort of cross-validation thing that I don't understand yet to train them, but they're always ...

scikit-learn/scikit-learn: This is separate from #13211: if you'd force scikit-learn to do something similarly bad with ... [API inconsistency of predict and predict_proba in SVC, #13211] ... so SVC(probability=True).predict_proba does return correct results.

When using SVC(probability=True) or SVR(probability=True), the output of predict_proba will not necessarily be consistent with predict, in the sense that np.argmax(self.predict_proba(X), axis=1) != self.predict(X). This is documented in the user guide.
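
A quick way to see whether this affects a given model is to count how often the two outputs disagree. The sketch below uses synthetic, imbalanced data as an assumption; whether any mismatch actually appears depends on the data and the fit, so the count may well be zero.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic, imbalanced data; whether any mismatch shows up depends on the data.
X, y = make_classification(n_samples=300, weights=[0.2, 0.8], random_state=0)
clf = SVC(probability=True, random_state=0).fit(X, y)

# argmax yields column indices, so map them back through classes_ first.
labels_from_proba = clf.classes_[np.argmax(clf.predict_proba(X), axis=1)]
labels_from_predict = clf.predict(X)

n_mismatch = int(np.sum(labels_from_proba != labels_from_predict))
print(f"{n_mismatch} of {len(X)} samples disagree")
```

Mapping through `classes_` matters when the labels are not simply 0..n-1; comparing raw argmax indices against `predict` would then report spurious mismatches.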

Calibrate Predicted Probabilities In SVC: In scikit-learn, the predicted probabilities must be generated when the model is being trained. This can be done by setting SVC's probability to ...

sklearn.svm.libsvm.predict_proba: sklearn.svm.libsvm.predict_proba(). Predict probabilities. svm_model stores all parameters needed to predict a given value. For speed, all real work is done at ...

You need to do a GridSearchCrossValidation instead of just CV. CV is used for performance evaluation and itself doesn't fit the estimator actually.

from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.grid_search import GridSearchCV

# unbalanced classification
X, y = make_classification(n_samples=1000, weights=[0.1, 0.9])

# use grid search for tuning
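
The distinction between plain cross-validation and a grid search can be sketched as follows. This is an illustration only: it uses `sklearn.model_selection` from newer scikit-learn releases rather than the older `sklearn.grid_search` module, and the sample size and the `C` grid are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

# Unbalanced classification data, kept small for speed.
X, y = make_classification(n_samples=300, weights=[0.1, 0.9], random_state=0)

# cross_val_score fits clones internally and only returns the scores,
# so the estimator passed in stays unfitted.
svc = SVC(probability=True, random_state=0)
scores = cross_val_score(svc, X, y, cv=3)
fitted_after_cv = hasattr(svc, "classes_")

# GridSearchCV with refit=True (the default) refits the best candidate
# on all the data and exposes it for prediction.
grid = GridSearchCV(svc, param_grid={"C": [1, 10]}, cv=3)
grid.fit(X, y)
proba = grid.predict_proba(X[:2])
print(scores.shape, fitted_after_cv, grid.best_params_, proba.shape)
```

Here `grid.predict_proba` delegates to the refitted `best_estimator_`, which is why the search object can be used for prediction while the bare `svc` cannot.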

Comments
  • Can you show the code where you originally save the object to 'svm_sentiment_analyzer.pkl'?
  • Did you try to call predict_proba rather than predict when getting that AttributeError? Otherwise this is a bit puzzling.