Format GitHub JSON data into a pandas DataFrame with daily dates

I'm trying to retrieve commit activity for the ethereum/go-ethereum repo from GitHub and format it into a DataFrame with daily dates as the index and the commit count as the column.

I looked around, but the JSON data I'm getting from GitHub looks pretty strange to me and I'm not sure how to deal with it.

GitHub JSON data:

      days                  total   week
0   [0, 2, 1, 2, 2, 3, 2]   12      1515283200
1   [0, 3, 2, 0, 0, 0, 0]   5       1515888000
2   [0, 2, 6, 1, 1, 5, 0]   15      1516492800

Code

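import json
import urllib.request

from pandas import json_normalize  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize

# Get GitHub data
with urllib.request.urlopen('https://api.github.com/repos/ethereum/go-ethereum/stats/commit_activity') as url:
    jStr = url.read()

# Format data
data = json.loads(jStr)
data_activity = json_normalize(data)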

I hope to achieve:

               ETH commits   
2017-11-26     2
2017-11-27     3
...

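Pass 'days' as the record path and 'week' as metadata to json_normalize, so that each daily count becomes its own row with the week timestamp repeated next to it. Then convert the week column to a DatetimeIndex and add a timedelta of the row position modulo 7, which shifts each row from the week start to its own day: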

>>> import numpy as np
>>> import pandas as pd
>>> from pandas import json_normalize
>>> data_activity = (json_normalize(data, 'days', 'week')
...                    .set_index('week').rename(columns={0: 'ETH commits'}))
>>> data_activity.index = (pd.to_datetime(data_activity.index, unit='s') +
...                        pd.to_timedelta(np.arange(len(data_activity.index)) % 7, unit='D'))
>>> print(data_activity.head(10))

            ETH commits
2018-01-07            0
2018-01-08            2
2018-01-09            1
2018-01-10            2
2018-01-11            2
2018-01-12            3
2018-01-13            2
2018-01-14            0
2018-01-15            3
2018-01-16            2
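
To try the same idea without calling the API, here is a minimal sketch that uses the three sample weeks from the question as an inline list. It swaps json_normalize for np.repeat and np.tile to build the week starts and day offsets, so it is a variation on the answer rather than the answer's exact code. The names sample and daily are only for illustration.

```python
import numpy as np
import pandas as pd

# Three sample weeks copied from the question (stand-in for the API response).
sample = [
    {"days": [0, 2, 1, 2, 2, 3, 2], "total": 12, "week": 1515283200},
    {"days": [0, 3, 2, 0, 0, 0, 0], "total": 5, "week": 1515888000},
    {"days": [0, 2, 6, 1, 1, 5, 0], "total": 15, "week": 1516492800},
]

# Flatten the daily counts into one array, in week order.
counts = np.concatenate([w["days"] for w in sample])

# Repeat each week's start timestamp 7 times, once per day of that week.
week_start = pd.to_datetime(np.repeat([w["week"] for w in sample], 7), unit="s")

# Offsets 0..6 days, tiled once per week.
day_offset = pd.to_timedelta(np.tile(np.arange(7), len(sample)), unit="D")

daily = pd.DataFrame({"ETH commits": counts}, index=week_start + day_offset)
print(daily.head(10))
```

This assumes every week carries exactly seven daily counts, which is what the sample data shows.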

Here, week is the Unix time of the start of each week. I'm not sure how to explain this, but I'm 70% sure this code will give you the format you want:

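import datetime

def github_norm(d):
    # Yield one (date, commit count) pair per day.
    for week in d:  # usually 52 weeks, but not guaranteed(?)
        # "week" is a Unix timestamp; read it as UTC so the dates do not
        # shift with the local time zone of the machine running the code.
        week_timestamp = datetime.datetime.fromtimestamp(week["week"], tz=datetime.timezone.utc)
        for day_n, commits in enumerate(week["days"]):
            yield week_timestamp + datetime.timedelta(days=day_n), commits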
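
As a quick check of how such a generator can be consumed, here is a small standard-library sketch run against two of the sample weeks from the question. It redefines the generator so the block is self-contained, pins the timestamps to UTC, and yields plain dates. The sample and daily names are hypothetical.

```python
import datetime


def github_norm(d):
    # Same idea as the generator above, pinned to UTC so the dates do not
    # depend on the local time zone.
    for week in d:
        start = datetime.datetime.fromtimestamp(week["week"], tz=datetime.timezone.utc)
        for day_n, commits in enumerate(week["days"]):
            yield (start + datetime.timedelta(days=day_n)).date(), commits


# Two sample weeks copied from the question (stand-in for the API response).
sample = [
    {"days": [0, 2, 1, 2, 2, 3, 2], "total": 12, "week": 1515283200},
    {"days": [0, 3, 2, 0, 0, 0, 0], "total": 5, "week": 1515888000},
]

# Collect the (date, commits) pairs into a dict keyed by date.
daily = dict(github_norm(sample))
for day, commits in daily.items():
    print(day, commits)
```

The same pairs can also be handed to pandas, for example with pd.DataFrame(list(github_norm(sample)), columns=["date", "commits"]).set_index("date").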

I implemented a date generator to make this easier.

However, I think the way the df is created can be improved, as I am not very familiar with pandas DataFrames.

from datetime import timedelta, date, datetime
import pandas as pd


def date_generator(first_date):
    """ given first date returns generator which yields given date and next days as date """
    yield first_date
    while True:
        first_date += timedelta(days=1)
        yield first_date


day_wise_commit_count = dict()
for week_index, week in enumerate(data):
    # print('Week ', week_index)
    f_date = datetime.utcfromtimestamp(int(week['week']))  # first date of week
    date_gen = date_generator(f_date)
    # If you are sure that list is in order you can keep above two lines out of loop
    # just use first object for first date
    for day_index, commit_count in enumerate(week["days"]):
        # print('WeekDay ', week_index, day_index)
        commit_date = next(date_gen)
        day_wise_commit_count[commit_date] = commit_count

df = pd.DataFrame(index=day_wise_commit_count.keys(), columns=['commit_count'])
for d, cc in day_wise_commit_count.items():
    df.loc[d, 'commit_count'] = cc  # .ix was removed in pandas 1.0; use .loc

print(df)
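
On the point that creating the df could be improved: one possible sketch, under the assumption that the dict maps each date to its commit count as above, is to pass the dict to pd.Series and convert that to a one-column frame, which avoids the row-by-row assignment. The sample list here is a stand-in for the API response, using two weeks from the question.

```python
from datetime import datetime, timedelta, timezone

import pandas as pd

# Two sample weeks copied from the question (stand-in for the API response).
sample = [
    {"days": [0, 2, 1, 2, 2, 3, 2], "total": 12, "week": 1515283200},
    {"days": [0, 3, 2, 0, 0, 0, 0], "total": 5, "week": 1515888000},
]

day_wise_commit_count = {}
for week in sample:
    # First day of the week, read as UTC and then made naive for a plain index.
    first = datetime.fromtimestamp(week["week"], tz=timezone.utc).replace(tzinfo=None)
    for offset, count in enumerate(week["days"]):
        day_wise_commit_count[first + timedelta(days=offset)] = count

# Dict keys become the index, values become the single column.
df = pd.Series(day_wise_commit_count, name="commit_count").to_frame()
print(df)
```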

Comments
  • @wizzwizz4 - Thank you.