Vertical line at the end of a CDF histogram using matplotlib

I'm trying to create a CDF, but at the right end of the graph there is a vertical line dropping back down to the x-axis.

I've read that this is because matplotlib uses the edges of the bins to draw the vertical lines, which makes sense, so I added the following to my code:

bins = sorted(X) + [np.inf]

where X is the data set I'm using, and I pass these bins when plotting:

plt.hist(X, bins = bins, cumulative = True, histtype = 'step', color = 'b')

This does remove the line at the end and produces the desired effect. However, when I now normalise the graph, it raises an error:

ymin = max(ymin*0.9, minimum) if not input_empty else minimum

UnboundLocalError: local variable 'ymin' referenced before assignment

Is there any way to either normalise the data with

bins = sorted(X) + [np.inf]

in my code or is there another way to remove the line on the graph?
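A NumPy-only check of one likely contributor, offered as an assumption rather than a confirmed diagnosis of that exact traceback: with an infinite last bin edge, normalising by bin width (what density does) turns the last cumulative value into NaN, because the last density is 0 and 0 * inf is NaN. Normalising through equal weights of 1/len(X) instead (a swapped-in technique, not something from the question) never divides by bin width, so the last value stays finite.

```python
import numpy as np

X = np.random.randn(1000)
edges = np.append(np.sort(X), np.inf)  # same idea as sorted(X) + [np.inf]

# density=True divides each bin's count by its width; the last width is inf,
# so the last density is 0, and multiplying back by the width gives 0 * inf = nan.
density, _ = np.histogram(X, bins=edges, density=True)
with np.errstate(invalid="ignore"):
    cum_density = np.cumsum(density * np.diff(edges))

# Normalising through weights avoids any division by bin width.
weights = np.full(len(X), 1.0 / len(X))
counts, _ = np.histogram(X, bins=edges, weights=weights)
cum_weighted = np.cumsum(counts)  # ends close to 1.0
```

In plt.hist, the analogous call would pass weights=np.full(len(X), 1 / len(X)) instead of normalising with density; I have not checked that this avoids the traceback on the matplotlib version that raised it.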

An alternative way to plot a CDF would be as follows (in my example, X is a bunch of samples drawn from the unit normal):

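import numpy as np
import matplotlib.pyplot as plt

X = np.random.randn(10000)
# Fraction of samples <= each sorted value: 1/N, 2/N, ..., 1
# (np.float was removed from NumPy, and plain division already gives floats)
n = np.arange(1, len(X) + 1) / len(X)
Xs = np.sort(X)
fig, ax = plt.subplots()
# where='post' holds each level until the next sample, so the jump happens at the sample itself
ax.step(Xs, n, where='post')
plt.show()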

I needed a solution that would not require altering the rest of my code (using plt.hist(...) or, with pandas, dataframe.plot.hist(...)) and that I could easily reuse many times in the same Jupyter notebook.

I now use this little helper function to do so:

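import matplotlib as mpl

def fix_hist_step_vertical_line_at_end(ax):
    # histtype='step' draws each histogram as an unclosed Polygon on the axes
    axpolygons = [poly for poly in ax.get_children() if isinstance(poly, mpl.patches.Polygon)]
    for poly in axpolygons:
        # Drop the last vertex: the vertical segment falling back to the x-axis at the right edge
        poly.set_xy(poly.get_xy()[:-1])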

Which can be used like this (without pandas):

import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt

X = np.sort(np.random.randn(1000))

fig, ax = plt.subplots()
plt.hist(X, bins=100, cumulative=True, density=True, histtype='step')

fix_hist_step_vertical_line_at_end(ax)

Or like this (with pandas):

import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.randn(1000))

fig, ax = plt.subplots()
ax = df.plot.hist(ax=ax, bins=100, cumulative=True, density=True, histtype='step', legend=False)

fix_hist_step_vertical_line_at_end(ax)

This works well even if you have multiple cumulative density histograms on the same axes.

Warning: this may not give the intended result if your axes contain other patches that are instances of mpl.patches.Polygon (for example, shapes drawn with ax.fill), because the helper trims those too. That was not the case for me, so I prefer using this little helper function in my plots.
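If that Polygon caveat matters for your plots, one option is to trim only the outlines returned by the hist call itself. This sketch assumes that ax.hist(..., histtype='step') returns its outline as a list containing one Polygon (as current matplotlib documents for a single dataset) and that the vertex layout is the same one fix_hist_step_vertical_line_at_end relies on. trim_step_closing_segment is a hypothetical name, and this variant does not fit the pandas route, since df.plot.hist returns the axes rather than the patches.

```python
import numpy as np
import matplotlib.pyplot as plt


def trim_step_closing_segment(patches):
    """Hypothetical helper: drop the closing vertical segment of step-histogram outlines.

    Only the Polygons returned by one hist() call are modified, so any other
    Polygon artists on the same axes are left untouched.
    """
    for poly in patches:
        # As in fix_hist_step_vertical_line_at_end, the last vertex is the drop back to the baseline.
        poly.set_xy(poly.get_xy()[:-1])


X = np.random.randn(1000)
fig, ax = plt.subplots()
n, bins, patches = ax.hist(X, bins=100, cumulative=True, density=True, histtype='step')
trim_step_closing_segment(patches)
```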

Assuming that your intentions are purely aesthetic, add a vertical line of the same color as your plot background:

ax.axvline(x=value, color='white', linewidth=2)

where value is the right edge of the rightmost bin.
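To avoid hard-coding both the position and the color, they can be taken from the plot itself: ax.hist returns the bin edges, so bins[-1] is the right edge of the rightmost bin, and ax.get_facecolor() gives the axes background color. A minimal sketch under those assumptions (mask is just an illustrative variable name); note that the masking line also hides anything else drawn at that x position, such as a grid line.

```python
import numpy as np
import matplotlib.pyplot as plt

X = np.random.randn(1000)
fig, ax = plt.subplots()
# hist() returns the bin edges; bins[-1] is where the closing vertical segment sits
n, bins, patches = ax.hist(X, bins=100, cumulative=True, density=True, histtype='step')
# Paint over that segment with the axes background color.
# zorder=3 keeps the mask above the step outline, which matplotlib draws at zorder 2.
mask = ax.axvline(x=bins[-1], color=ax.get_facecolor(), linewidth=2, zorder=3)
```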

Comments
  • Not sure why this got down-voted. This is an artifact of how hist + step works. You may be better off computing the cumulative histogram and then using ax.step.
  • Do you want a CDF or a histogram? If it's a CDF, which one?
  • This is a brilliant and beautiful alternative!
  • The problem is that plot will linearly interpolate between points, but the true cumulative function should have these "jumps".
  • Yes, that's probably a fair point - although it won't make much difference for large samples of data. Nonetheless, I have updated my answer to use plt.step instead. Thanks!
  • Thanks! That worked for me. I have a complementary CDF, so I just needed to change poly.set_xy(poly.get_xy()[:-1]) to poly.set_xy(poly.get_xy()[1:])
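The first comment's suggestion (compute the cumulative histogram yourself, then draw it with ax.step) can be sketched as follows, assuming NumPy's np.histogram for the binning. Each bin's cumulative value is held across that bin with where='post', and the last value is repeated at the final edge, so the line simply stops there instead of dropping back to zero.

```python
import numpy as np
import matplotlib.pyplot as plt

X = np.random.randn(10000)

# Bin the data ourselves, then accumulate and normalise so the curve ends at 1.0
counts, edges = np.histogram(X, bins=100)
cdf = np.cumsum(counts) / counts.sum()

fig, ax = plt.subplots()
# One y-value per edge: cdf[i] is held across bin i, and the final value is
# repeated at the last edge, so no closing vertical segment is drawn.
line, = ax.step(edges, np.append(cdf, cdf[-1]), where='post')
```

For an exceedance (complementary) curve like the one in the last comment, np.cumsum(counts[::-1])[::-1] / counts.sum() should be the analogous quantity, plotted the same way.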