## Find global index for first and last non-NA value by group

I have a data set of the form

    # create data.frame
    df <- data.frame(id = rep(1:3, each = 10), value = rnorm(30))
    # throw in some NAs
    df[c(1:5, 25:30), 2] <- NA
    df[1:10, ]
       id      value
    1   1         NA
    2   1         NA
    3   1         NA
    4   1         NA
    5   1         NA
    6   1 -1.0763008
    7   1 -0.4026228
    8   1  1.6110506
    9   1 -1.0626593
    10  1 -0.4058101

I would like to find the first and last non-NA value by group. I tried to code up a function that does that and it works fine if there's no grouping:

    first.last.non.na = function(x){
      return(c(min(which(!is.na(x))), max(which(!is.na(x)))))
    }

When I try to use this in combination with aggregate, it unfortunately only returns the indices of the first and last non-NA value *within* groups (as is to be expected):

    aggregate(df[,2], by = list(df[,1]), FUN = first.last.non.na)
      Group.1 x.1 x.2
    1       1   6  10
    2       2   1  10
    3       3   1   4

My desired output is the "global" indices of the first and last non-NA values, i.e.

      Group.1 x.1 x.2
    1       1   6  10
    2       2  11  20
    3       3  21  24

Any solutions that would also work with extremely large data sets?

The main idea is to create a variable based on the row numbers before grouping. Using `dplyr`,

    library(dplyr)

    df %>%
      mutate(rn = row_number()) %>%
      group_by(id) %>%
      summarise(v1 = first(rn[!is.na(value)]),
                v2 = last(rn[!is.na(value)]))

which gives,

    # A tibble: 3 x 3
         id    v1    v2
      <int> <int> <int>
    1     1     6    10
    2     2    11    20
    3     3    21    24
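The same idea can be sketched in base R without extra packages: keep the global row numbers of the non-NA rows and take their range within each group. The helper name `global_non_na_range` and the column names `first` and `last` are illustrative, not from the original answers.

```r
# Hypothetical helper: global first/last non-NA row index per group.
global_non_na_range <- function(value, id) {
  ok  <- !is.na(value)                      # rows with an observed value
  rn  <- seq_along(value)[ok]               # global row numbers of those rows
  idx <- lapply(split(rn, id[ok]), range)   # c(first, last) for each group
  out <- do.call(rbind, idx)                # one row per group
  data.frame(id = rownames(out), first = out[, 1], last = out[, 2],
             row.names = NULL)
}

# Same example data as in the question
df <- data.frame(id = rep(1:3, each = 10), value = rnorm(30))
df[c(1:5, 25:30), 2] <- NA

res_base <- global_non_na_range(df$value, df$id)
res_base
```

Because the row numbers are taken before splitting, this does not require the rows to be sorted by `id`. Note that `id` comes back as character here, since it is recovered from the group names.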


Same idea as @Sotos in `data.table`:

    library(data.table)

    setDT(df)[!is.na(value), .(x.1 = .I[1], x.2 = .I[.N]), by = id]
       id x.1 x.2
    1:  1   6  10
    2:  2  11  20
    3:  3  21  24

Here we first filter for the non-missing values of your `df` (in the `value` column) and then extract the global row numbers (`.I`) for both the first (`[1]`) and last (`[.N]`) value per `id`.


Here is a base R solution using `aggregate`:

    res <- aggregate(value ~ id, df, function(x) range(which(!is.na(x))),
                     na.action = NULL)
    res$value[-1, 1] <- res$value[-1, 1] + cumsum(res$value[-nrow(res$value), 2])
    res$value[, 2] <- cumsum(res$value[, 2])

such that

    > res
      id value.1 value.2
    1  1       6      10
    2  2      11      20
    3  3      21      24
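The cumulative-sum adjustment above adds up the last non-NA positions, so it matches the global row numbers only when every group before the last one ends with a non-NA value, as happens in the example data. A minimal sketch of a variant that offsets by group sizes instead, assuming the rows are sorted so that each `id` is contiguous; `df2`, `sizes`, `offset`, and `res2` are illustrative names, not part of the original answer.

```r
# Example data where group 1 also ends with NAs (rows 9 and 10)
df2 <- data.frame(id = rep(1:3, each = 10), value = rnorm(30))
df2[c(1:5, 9:10, 25:30), 2] <- NA

# Within-group first/last non-NA positions, keeping the NA rows
res2 <- aggregate(value ~ id, df2, function(x) range(which(!is.na(x))),
                  na.action = NULL)

sizes  <- as.vector(table(df2$id))              # rows per group, in sorted id order
offset <- c(0, cumsum(sizes)[-length(sizes)])   # number of rows before each group
res2$value <- res2$value + offset               # shift both columns to global indices
res2
```

Here `offset` is recycled down both columns of the matrix `res2$value`, so each group's first and last positions are shifted by the number of rows that precede that group.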
