Passing a certain number of rows into a function in R

I would like to pass the data into a function 1500 rows at a time until it reaches the end of the dataset. Currently I hard-code the row ranges. My code:

AA1 = AA[1:1500,]
AA2 = AA[1501:3000,]
AA3 = AA[3001:4500,]
AA4 = AA[4501:6000,]
AA5 = AA[6001:6573,]
#passing into the function generate_pa
AAdone1 = generate_pa(AA1)
AAdone2 = generate_pa(AA2)
AAdone3 = generate_pa(AA3)
AAdone4 = generate_pa(AA4)
AAdone5 = generate_pa(AA5)

Is there any way I can do this more efficiently? Should I create a for loop?

In base R, we can also use by, which does the split-apply-combine step in a single call. Since you already have two methods to split the data every n rows, I'll show another way using gl.

n <- 1500
by(AA, gl(ceiling(nrow(AA)/n), n)[1:nrow(AA)], generate_pa)

This splits the data every 1500 rows and applies the generate_pa function to each chunk.
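A minimal, self-contained sketch of the by() call on toy data. The real generate_pa is not shown in the question, so a hypothetical stand-in that reports each chunk's size and sum is used here:

```r
# Hypothetical stand-in for generate_pa: reports the size and sum of each chunk
generate_pa <- function(chunk) {
  data.frame(n_rows = nrow(chunk), total = sum(chunk$x))
}

AA <- data.frame(x = 1:10)  # toy data: 10 rows
n <- 3                      # chunk size (1500 in the question)

# gl(k, n) builds the factor 1,1,1,2,2,2,...; trimming it to nrow(AA) handles the short last chunk
groups <- gl(ceiling(nrow(AA) / n), n)[seq_len(nrow(AA))]
output <- by(AA, groups, generate_pa)

sapply(output, function(res) res$n_rows)  # rows per chunk: 3, 3, 3, 1
```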

You can split a data.table into chunks of chunksize rows using data.table's split method (split.data.table with its by argument).

You can then feed the resulting list l to any function using lapply(l, ...).

What it actually does: setDT(AA) converts AA to a data.table by reference; [, rowID := (.I-1) %/% chunksize] creates a new column from the integer division of the row number (.I is used because data.table does not have row names). The result is then split by the newly created rowID column.

library(data.table)

#sample data
set.seed(123)
AA <- data.frame(data = rnorm(10))

#     data
# 1  -0.56047565
# 2  -0.23017749
# 3   1.55870831
# 4   0.07050839
# 5   0.12928774
# 6   1.71506499
# 7   0.46091621
# 8  -1.26506123
# 9  -0.68685285
# 10 -0.44566197

chunksize = 3
l <- split( setDT(AA)[, rowID := (.I-1) %/% chunksize][], by = "rowID")

# $`0`
#          data rowID
# 1: -0.5604756     0
# 2: -0.2301775     0
# 3:  1.5587083     0
# 
# $`1`
#          data rowID
# 1: 0.07050839     1
# 2: 0.12928774     1
# 3: 1.71506499     1
# 
# $`2`
#          data rowID
# 1:  0.4609162     2
# 2: -1.2650612     2
# 3: -0.6868529     2
# 
# $`3`
#         data rowID
# 1: -0.445662     3
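The same integer-division key also works without data.table. A minimal base-R sketch, offered as an alternative rather than as part of the data.table answer; the per-chunk summary is a hypothetical stand-in for generate_pa:

```r
set.seed(123)
AA <- data.frame(data = rnorm(10))
chunksize <- 3

# (row - 1) %/% chunksize gives 0,0,0,1,1,1,2,2,2,3: the same key as rowID above
key <- (seq_len(nrow(AA)) - 1) %/% chunksize
l <- split(AA, key)

# Hypothetical stand-in for generate_pa, applied to each chunk
res <- lapply(l, function(chunk) data.frame(n_rows = nrow(chunk), mean = mean(chunk$data)))

# Stack the per-chunk results into one data frame
combined <- do.call(rbind, res)
combined
```

With data.table itself, rbindlist(lapply(l, generate_pa)) would do the same combining step, assuming generate_pa returns a data frame or data.table.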

Here is an approach very similar to Wimpel's:

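rows <- seq_len(nrow(AA))  # 1:6573 for the data in the question

# list of row-index vectors: 1:1500, 1501:3000, ..., 6001:6573
lists <- split(rows, ceiling(seq_along(rows) / 1500))

# subset AA by each index vector so generate_pa receives rows, not indices
lapply(lists, function(idx) generate_pa(AA[idx, , drop = FALSE]))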
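Since the question also asks about a for loop: a minimal sketch of the same chunking written as an explicit loop over chunk start positions. generate_pa is again a hypothetical stand-in, and the results go into a preallocated list instead of numbered variables such as AAdone1, AAdone2:

```r
# Hypothetical stand-in for generate_pa
generate_pa <- function(chunk) data.frame(first = chunk$x[1], n_rows = nrow(chunk))

AA <- data.frame(x = 1:6573)  # same row count as in the question
n <- 1500

starts <- seq(1, nrow(AA), by = n)         # 1, 1501, 3001, 4501, 6001
results <- vector("list", length(starts))  # one slot per chunk

for (i in seq_along(starts)) {
  rows <- starts[i]:min(starts[i] + n - 1, nrow(AA))  # the last chunk is shorter
  results[[i]] <- generate_pa(AA[rows, , drop = FALSE])
}

# Stack the per-chunk results into one data frame
combined <- do.call(rbind, results)
combined
```

The loop produces the same five chunks as the hard-coded version (the last one is rows 6001 to 6573), but works for any number of rows.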

Let nr be the number of rows in the input data frame and k the number of rows in each chunk. Using the built-in anscombe data frame for reproducibility, either of these split lines creates a list of chunks, and you can then lapply your function over that list. No packages are used.

nr <- nrow(anscombe)
k <- 3 # 1500 in your case

split(anscombe, rep(1:nr, each = k, length.out = nr))

# or
split(anscombe, droplevels(gl(nr, k, nr)))
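Either list can then be passed to lapply. A short sketch, using colMeans only as a placeholder for generate_pa:

```r
nr <- nrow(anscombe)  # 11 rows
k <- 3

chunks <- split(anscombe, rep(seq_len(nr), each = k, length.out = nr))
sapply(chunks, nrow)  # chunk sizes: 3, 3, 3, 2

# colMeans is only a placeholder for generate_pa
res <- lapply(chunks, colMeans)
do.call(rbind, res)   # one row of column means per chunk
```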

Comments
  • Is there any way I can then combine all the generated data into a data frame?
  • @Hal yes, did you try the answer ? What did it return ?
  • Yes I did. It outputs all the data on the console. I tried doing 'save=by(AA, gl(ceiling(nrow(AA)/n), n)[1:nrow(AA)], generate_pa)' but it didn't work.
  • Do output <- by(AA, gl(ceiling(nrow(AA)/n), n)[1:nrow(AA)], generate_pa) and then check output.
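For the follow-up about combining everything into one data frame: a minimal sketch, assuming generate_pa returns a data frame for each chunk (a hypothetical two-column stand-in is defined here), that stacks the elements of the by() result with do.call(rbind, ...):

```r
# Hypothetical stand-in for generate_pa; the real function is not shown
generate_pa <- function(chunk) {
  data.frame(row_in_chunk = seq_len(nrow(chunk)), x = chunk$x)
}

AA <- data.frame(x = 1:10)
n <- 3

output <- by(AA, gl(ceiling(nrow(AA) / n), n)[1:nrow(AA)], generate_pa)

# by() returns a list-like "by" object; stack its elements into one data frame
combined <- do.call(rbind, output)
rownames(combined) <- NULL  # replace whatever row names rbind produced with 1..n
combined
```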