Fitting a gamma distribution with (python) Scipy

Can anyone help me fit a gamma distribution in Python? I have some data (X and Y coordinates) and I want to find the gamma parameters that fit this distribution. According to the SciPy docs a fit method exists, but I don't know how to use it. First, what format must the "data" argument be in, and how can I provide the second argument (the parameters), since those are what I'm looking for?

Generate some gamma data:

import scipy.stats as stats    
alpha = 5
loc = 100.5
beta = 22
data = stats.gamma.rvs(alpha, loc=loc, scale=beta, size=10000)    
print(data)
# [ 202.36035683  297.23906376  249.53831795 ...,  271.85204096  180.75026301
#   364.60240242]

Here we fit the data to the gamma distribution:

fit_alpha, fit_loc, fit_beta=stats.gamma.fit(data)
print(fit_alpha, fit_loc, fit_beta)
# (5.0833692504230008, 100.08697963283467, 21.739518937816108)

print(alpha, loc, beta)
# (5, 100.5, 22)

I was unsatisfied with the ss.gamma.rvs function, as it can generate negative numbers, which a gamma distribution is not supposed to produce. So I fitted the sample by matching the expected value to mean(data) and the variance to var(data) (see Wikipedia for details) and wrote a function that yields random samples from a gamma distribution without scipy (which, as a side note, I found hard to install properly):

import random
import numpy

data = [6176, 11046, 670, 6146, 7945, 6864, 767, 7623, 7212, 9040, 3213, 6302, 10044, 10195, 9386, 7230, 4602, 6282, 8619, 7903, 6318, 13294, 6990, 5515, 9157]

# Fit gamma distribution through mean and variance
mean_of_distribution = numpy.mean(data)
variance_of_distribution = numpy.var(data)

def gamma_random_sample(mean, variance, size):
    """Yields a list of random numbers following a gamma distribution defined by mean and variance"""
    g_alpha = mean*mean/variance
    g_beta = mean/variance
    for i in range(size):
        yield random.gammavariate(g_alpha,1/g_beta)

# force integer values to get integer sample
grs = [int(i) for i in gamma_random_sample(mean_of_distribution,variance_of_distribution,len(data))]

print("Original data: ", sorted(data))
print("Random sample: ", sorted(grs))

# Original data: [670, 767, 3213, 4602, 5515, 6146, 6176, 6282, 6302, 6318, 6864, 6990, 7212, 7230, 7623, 7903, 7945, 8619, 9040, 9157, 9386, 10044, 10195, 11046, 13294]
# Random sample:  [1646, 2237, 3178, 3227, 3649, 4049, 4171, 5071, 5118, 5139, 5456, 6139, 6468, 6726, 6944, 7050, 7135, 7588, 7597, 7971, 10269, 10563, 12283, 12339, 13066]
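
For reference, the moment matching above relies on two identities for a gamma distribution with shape alpha and rate beta: mean = alpha/beta and variance = alpha/beta^2. Solving them gives alpha = mean^2/variance and beta = mean/variance. A minimal sketch that checks this numerically follows. It uses NumPy's generator instead of the random module, the helper name gamma_params_from_moments is made up for illustration, and the target mean and variance are arbitrary:

```python
import numpy as np

def gamma_params_from_moments(mean, variance):
    """Return (shape, scale) of the gamma distribution with this mean and variance."""
    shape = mean * mean / variance   # alpha = mean^2 / variance
    scale = variance / mean          # scale = 1 / beta = variance / mean
    return shape, scale

# Arbitrary target moments, chosen only for illustration
shape, scale = gamma_params_from_moments(7000.0, 9.0e6)

# Draw a large sample and compare its moments with the targets
rng = np.random.default_rng(0)
sample = rng.gamma(shape, scale, size=200_000)
print(shape, scale)
print(sample.mean(), sample.var())
```

The sample mean and variance should land close to the targets, and every drawn value is positive.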

If you want a long example including a discussion about estimating or fixing the support of the distribution, then you can find it in https://github.com/scipy/scipy/issues/1359 and the linked mailing list message.

Preliminary support to fix parameters, such as location, during fit has been added to the trunk version of scipy.
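
In current SciPy releases, parameters are held fixed through keyword arguments to fit such as floc and fscale. A minimal sketch, assuming the data are known to start at zero so the location can be pinned there (the true parameters and seed are arbitrary):

```python
import scipy.stats as stats

# Synthetic data with a known location of 0 (illustrative values only)
data = stats.gamma.rvs(5, loc=0, scale=22, size=10000, random_state=0)

# floc=0 holds the location fixed; only shape and scale are estimated
fit_alpha, fit_loc, fit_beta = stats.gamma.fit(data, floc=0)
print(fit_alpha, fit_loc, fit_beta)
```

Fixing the location usually makes the fit more stable, because the three-parameter problem can trade location against shape.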

1) The "data" variable can be a Python list or tuple, or a numpy.ndarray, which can be obtained with:

data=numpy.array(data)

where the second data in the line above is a list or tuple containing your data.

2) The "parameter" variable is an optional first guess that the fitting function uses as its starting point, so it can be omitted.

3) A note on @mondano's answer: using moments (mean and variance) to work out the gamma parameters works reasonably well for large shape parameters (alpha > 10), but can yield poor results for small values of alpha (see Statistical Methods in the Atmospheric Sciences by Wilks, and Thom, H. C. S., 1958: A note on the gamma distribution. Mon. Wea. Rev., 86, 117–122).

Maximum likelihood estimation, as implemented in the scipy module, is regarded as a better choice in such cases.
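
The points above can be sketched as follows: a plain list is converted to an array, and the moment estimate is compared with the maximum likelihood fit on a sample with a small shape parameter. The true parameters, sample size and seed are arbitrary, and the location is held at 0 with floc so that both methods estimate the same two quantities:

```python
import numpy as np
import scipy.stats as stats

# A plain Python list of observations, converted to an ndarray (point 1)
raw = stats.gamma.rvs(0.5, scale=2.0, size=5000, random_state=1).tolist()
data = np.array(raw)

# Maximum likelihood estimate with the location fixed at 0
mle_alpha, _, mle_scale = stats.gamma.fit(data, floc=0)

# Method-of-moments estimate for comparison (point 3)
mean, var = data.mean(), data.var()
mom_alpha, mom_scale = mean**2 / var, var / mean

print("MLE:    ", mle_alpha, mle_scale)
print("Moments:", mom_alpha, mom_scale)
```

On any single sample either estimate may land closer to the true values; the point about small alpha concerns the spread of the moment estimates over repeated samples. For point 2, starting guesses go in as extra arguments, for example stats.gamma.fit(data, 1.0, loc=0, scale=1), where the positional value is the initial shape.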

OpenTURNS has a simple way to do this with the GammaFactory class.

First, let's generate a sample:

import openturns as ot
gammaDistribution = ot.Gamma()
sample = gammaDistribution.getSample(100)

Then fit a Gamma to it:

distribution = ot.GammaFactory().build(sample)

Then we can draw the PDF of the Gamma:

import openturns.viewer as otv
otv.View(distribution.drawPDF())

which produces a plot of the fitted PDF (figure not reproduced here).

More details on this topic at: http://openturns.github.io/openturns/latest/user_manual/_generated/openturns.GammaFactory.html

Usage. First, let us create a data sample with N = 10,000 points from a gamma distribution:

from scipy import stats
data = stats.gamma.rvs(2, loc=1.5, scale=2, size=10000)

The fitter.fitter.Fitter.summary() method shows the best-fitting distributions. Once the fitting is performed, one may want to get the parameters of the best distribution. The parameters are stored in fitted_param. For instance, in the example above, the summary told us that the gamma distribution has the best fit.

Comments
  • Thanks a lot! But why did you create the variable x in the beginning?
  • Ah, it seems that my message is too late. Thank you very much again ;)
  • scipy.stats uses maximum likelihood estimation for fitting, so you need to pass the raw data and not the pdf/pmf (x, y)
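
As the last comment notes, fit expects raw observations. If only (x, y) points on a density curve are available, one possible workaround (not part of the answers above) is a least-squares fit of the pdf with scipy.optimize.curve_fit. The sketch below assumes a location of 0, uses noise-free synthetic points, and takes its starting values from the moments of the curve; the wrapper name gamma_pdf is made up for illustration:

```python
import numpy as np
import scipy.stats as stats
from scipy.optimize import curve_fit

# Hypothetical (x, y) input: points on a gamma density curve, noise-free here
x = np.linspace(1.0, 400.0, 200)
y = stats.gamma.pdf(x, 5, loc=0, scale=22)

def gamma_pdf(x, a, scale):
    # Location is assumed to be 0 to keep the problem well behaved
    return stats.gamma.pdf(x, a, loc=0, scale=scale)

# Starting values from the moments of the curve, treating y as weights
mean = np.sum(x * y) / np.sum(y)
var = np.sum((x - mean) ** 2 * y) / np.sum(y)
p0 = (mean**2 / var, var / mean)

# Bounds keep both parameters positive during the optimisation
(a_hat, scale_hat), _ = curve_fit(gamma_pdf, x, y, p0=p0, bounds=(1e-6, np.inf))
print(a_hat, scale_hat)
```

This is a curve fit rather than a maximum likelihood estimate, so with noisy or coarsely binned values its results can differ from what stats.gamma.fit would give on the underlying raw data.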