What is the most efficient way to add a column that is a binary indicator of a recurring number in a time series dataframe?

I have a dataframe that is similar to this example dataframe:

example <- data.frame(id = c("1","1","1", "1", "2", "2", "2"),
                      amount = c(2300, 1765, 2300, 1500, 35, 180, 180),
                      date = c("2010-11-01", "2010-11-02", "2010-11-03", "2010-11-04", "2010-11-01", "2010-11-02", "2010-11-03"))

I want to add a column that is 1 when an amount is recurring and 0 otherwise. An amount counts as recurring only if it repeats within the same id. The result should look like this:

desiredResult <- data.frame(id = c("1","1","1", "1", "2", "2", "2"),
                      amount = c(2300, 1765, 2300, 1500, 35, 180, 180),
                      date = c("2010-11-01", "2010-11-02", "2010-11-03", "2010-11-04", "2010-11-01", "2010-11-02", "2010-11-03"),
                      probableRecurringAmount = c(1,0,1,0,0,1,1)) 

The dataset is very large, and I am having a hard time coming up with an efficient solution. I considered building a key column from combinations of the other columns, but I only want a binary flag.

Group by id and amount, then flag every row whose group has more than one member:

library(dplyr)    
example %>%
  group_by(id, amount) %>%
  mutate(probableRecurringAmount  = ifelse(n() > 1, 1, 0))

# A tibble: 7 x 4
# Groups:   id, amount [5]
#   id    amount date       probableRecurringAmount
#   <fct>  <dbl> <fct>                        <dbl>
# 1 1       2300 2010-11-01                       1
# 2 1       1765 2010-11-02                       0
# 3 1       2300 2010-11-03                       1
# 4 1       1500 2010-11-04                       0
# 5 2         35 2010-11-01                       0
# 6 2        180 2010-11-02                       1
# 7 2        180 2010-11-03                       1
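
Since the question mentions a very large dataset, the same group-size test can also be sketched with data.table, which adds the column by reference instead of building a grouped tibble. This is only a sketch: the data.table package is an assumption on my part (it is not used anywhere in the question), and I have not benchmarked it against the dplyr version.

```r
library(data.table)

example <- data.frame(
  id = c("1", "1", "1", "1", "2", "2", "2"),
  amount = c(2300, 1765, 2300, 1500, 35, 180, 180),
  date = c("2010-11-01", "2010-11-02", "2010-11-03", "2010-11-04",
           "2010-11-01", "2010-11-02", "2010-11-03")
)

# as.data.table() makes a copy, so `example` itself is left untouched
dt <- as.data.table(example)

# .N is the number of rows in the current (id, amount) group;
# := adds the column in place and keeps the original row order
dt[, probableRecurringAmount := as.integer(.N > 1), by = .(id, amount)]

dt$probableRecurringAmount
```

Grouping by both id and amount is what keeps an amount that appears only once under each of two different ids from being flagged.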

You can use duplicated() to find the repeated (id, amount) pairs, then join them back onto the original data so that both the first occurrence and its repeats are flagged.

library(tidyverse)
example <- data.frame(id = c("1","1","1", "1", "2", "2", "2"),
                      amount = c(2300, 1765, 2300, 1500, 35, 180, 180),
                      date = c("2010-11-01", "2010-11-02", "2010-11-03", "2010-11-04", "2010-11-01", "2010-11-02", "2010-11-03"))

# Find duplicated (id, amount) pairs; distinct() keeps one row per pair
# so that an amount occurring three or more times cannot multiply rows in the join
dups = example %>% 
  select(id, amount) %>% 
  mutate(recurring=as.numeric(duplicated(.))) %>% 
  filter(recurring==1) %>% 
  distinct()

# Flag both the original and duplicated rows as recurring
example %>% 
  left_join(dups, by = c("id", "amount")) %>% 
  replace_na(list(recurring=0))
#>   id amount       date recurring
#> 1  1   2300 2010-11-01         1
#> 2  1   1765 2010-11-02         0
#> 3  1   2300 2010-11-03         1
#> 4  1   1500 2010-11-04         0
#> 5  2     35 2010-11-01         0
#> 6  2    180 2010-11-02         1
#> 7  2    180 2010-11-03         1

Created on 2020-01-14 by the reprex package (v0.3.0)
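
One thing to watch with any join-based flag is a lookup table that holds the same key more than once, because the join then returns one output row per match. Here is a small base R sketch of that effect, assuming a made-up `payments` data frame in which one amount occurs three times; `merge()` stands in for the join only to show the row counts.

```r
# Hypothetical data: the same amount occurs three times for one id
payments <- data.frame(id = c("1", "1", "1"), amount = c(50, 50, 50))

# duplicated() marks the 2nd and 3rd occurrence, so the lookup has two identical rows
keys <- payments[duplicated(payments), ]

# Each of the 3 rows matches both lookup rows
inflated <- merge(payments, keys)
nrow(inflated)
#> [1] 6

# Deduplicating the lookup first keeps one output row per input row
safe <- merge(payments, unique(keys))
nrow(safe)
#> [1] 3
```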

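We can use duplicated() from base R. Calling it in both directions (the default and fromLast = TRUE) and combining the results marks the first occurrence as well as the later repeats: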

example$recurring <-  +(duplicated(example[c('id', 'amount')])|
         duplicated(example[c('id', 'amount')], fromLast = TRUE))
example$recurring
#[1] 1 0 1 0 0 1 1
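
To see why both directions are needed, here is a small worked example on a plain vector (the vector `x` is made up for illustration):

```r
x <- c(180, 35, 180, 180)

# Default direction: the first occurrence of each value is NOT marked
first_pass <- duplicated(x)
first_pass
#> [1] FALSE FALSE  TRUE  TRUE

# Scanning from the end: the last occurrence of each value is NOT marked
last_pass <- duplicated(x, fromLast = TRUE)
last_pass
#> [1]  TRUE FALSE  TRUE FALSE

# OR-ing the two marks every member of a repeated value;
# unary + converts the logical vector to integer 0/1
flag <- +(first_pass | last_pass)
flag
#> [1] 1 0 1 1
```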
