Python & Pandas - Group by day and count for each day
I am new to pandas and I don't yet see how to arrange my time series. Here is what it looks like:
date & time of connection
19/06/2017 12:39
19/06/2017 12:40
19/06/2017 13:11
20/06/2017 12:02
20/06/2017 12:04
21/06/2017 09:32
21/06/2017 18:23
21/06/2017 18:51
21/06/2017 19:08
21/06/2017 19:50
22/06/2017 13:22
22/06/2017 13:41
22/06/2017 18:01
23/06/2017 16:18
23/06/2017 17:00
23/06/2017 19:25
23/06/2017 20:58
23/06/2017 21:03
23/06/2017 21:05
This is a sample from a dataset of 130k rows. I tried:
df.groupby('date & time of connection')['date & time of connection'].apply(list)
That is not enough, I guess.
I think I should:
- Create a dictionary with an index running from dd/mm/yyyy to dd/mm/yyyy
- Convert "date & time of connection" from datetime to date
- Group and count the dates in "date & time of connection"
- Put the counts inside the dictionary?
What do you think of my logic? Do you know any tutorials? Thank you very much.
df = (pd.to_datetime(df['date & time of connection'])
        .dt.floor('d')
        .value_counts()
        .rename_axis('date')
        .reset_index(name='count'))
print (df)
        date  count
0 2017-06-23      6
1 2017-06-21      5
2 2017-06-19      3
3 2017-06-22      3
4 2017-06-20      2
s = pd.to_datetime(df['date & time of connection'])
df = s.groupby(s.dt.floor('d')).size().reset_index(name='count')
print (df)
  date & time of connection  count
0                2017-06-19      3
1                2017-06-20      2
2                2017-06-21      5
3                2017-06-22      3
4                2017-06-23      6
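One caveat with the dd/mm/yyyy strings in the question: without an explicit format, pd.to_datetime may read an ambiguous value such as 01/07/2017 month-first. Below is a minimal sketch, assuming every timestamp follows '%d/%m/%Y %H:%M'. The sample frame and the name counts_by_day are made up for illustration. It also builds the date-to-count dictionary the question asks about.

```python
import pandas as pd

# Hypothetical sample in the question's dd/mm/yyyy layout.
df = pd.DataFrame({
    'date & time of connection': [
        '19/06/2017 12:39', '19/06/2017 12:40', '19/06/2017 13:11',
        '20/06/2017 12:02', '20/06/2017 12:04',
        '01/07/2017 09:32',  # ambiguous unless the format is given
    ]
})

# An explicit format makes 01/07/2017 parse as 1 July, not 7 January.
s = pd.to_datetime(df['date & time of connection'], format='%d/%m/%Y %H:%M')

# Floor each timestamp to midnight, then count rows per day.
counts = s.groupby(s.dt.floor('d')).size()

# Turn the result into a plain dict keyed by a dd/mm/yyyy string.
counts_by_day = {day.strftime('%d/%m/%Y'): int(n) for day, n in counts.items()}
print(counts_by_day)
```

If the real data also carries seconds, the format string would need a trailing ':%S'.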
np.random.seed(1542)
N = 220000
a = np.unique(np.random.randint(N, size=int(N/2)))
df = pd.DataFrame(pd.date_range('2000-01-01', freq='37T', periods=N)).drop(a)
df.columns = ['date & time of connection']
df['date & time of connection'] = df['date & time of connection'].dt.strftime('%d/%m/%Y %H:%M:%S')
print (df.head())

In : %%timeit
...: df['date & time of connection']=pd.to_datetime(df['date & time of connection'])
...: df1 = df.groupby(by=df['date & time of connection'].dt.date).count()
...:
539 ms ± 45.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In : %%timeit
...: df1 = (pd.to_datetime(df['date & time of connection'])
...:        .dt.floor('d')
...:        .value_counts()
...:        .rename_axis('date')
...:        .reset_index(name='count'))
...:
12.4 ms ± 350 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In : %%timeit
...: s = pd.to_datetime(df['date & time of connection'])
...: df2 = s.groupby(s.dt.floor('d')).size().reset_index(name='count')
...:
17.7 ms ± 140 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
First, make sure your column is in datetime format:
df['date & time of connection']=pd.to_datetime(df['date & time of connection'])
Then you can group the data by date and do a count:
df.groupby(by=df['date & time of connection'].dt.date).count()
Out:
                           date & time of connection
date & time of connection
2017-06-19                                         3
2017-06-20                                         2
2017-06-21                                         5
2017-06-22                                         3
2017-06-23                                         6
Hey, I found an easy way to do this with resample.
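# Set the date column as the index. drop=False keeps it as a column too,
# so it can still be selected below. The column must already be datetime.
df = df.set_index('your_date_column', drop=False)
# Count rows per calendar day
df_counts = df.your_date_column.resample('D').count()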
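One difference from the groupby approaches is that resample('D') emits a row for every calendar day in the range, so days with no connections show up with a count of 0. Below is a small sketch of that behaviour, assuming day-first timestamps. The sample values and the names events and daily are hypothetical.

```python
import pandas as pd

# Hypothetical sample with no connections on 21/06/2017.
stamps = pd.to_datetime(
    pd.Series(['19/06/2017 12:39', '19/06/2017 12:40',
               '20/06/2017 12:02', '22/06/2017 13:22']),
    format='%d/%m/%Y %H:%M',
)

# resample needs a DatetimeIndex, so use the timestamps as the index
# of a dummy Series (one row per connection).
events = pd.Series(1, index=pd.DatetimeIndex(stamps))

# One row per calendar day; empty days are kept with a count of 0.
daily = events.resample('D').size()
print(daily)
```

Whether the zero rows are wanted depends on the use case: they help when plotting a continuous daily series, while groupby gives only the days that actually occur.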
That said, your column name is long and contains spaces, which makes me cringe a little. I would use dashes instead of spaces.