Stepping through a pipeline with intermediate results

magrittr
%t>%

Is there a way to output the result of a pipeline at each step without doing it manually (e.g. without selecting and running only the selected chunks)?

I often find myself running a pipeline line by line, either to remember what it does or while developing an analysis.

For example:

library(dplyr)

mtcars %>% 
  group_by(cyl) %>% 
  sample_frac(0.1) %>% 
  summarise(res = mean(mpg))
# Source: local data frame [3 x 2]
# 
# cyl  res
# 1   4 33.9
# 2   6 18.1
# 3   8 18.7

I'd have to select and run:

mtcars %>% group_by(cyl)

and then...

mtcars %>% group_by(cyl) %>% sample_frac(0.1)

and so on...

But selecting code and pressing CMD/CTRL+ENTER in RStudio is tedious; I'd like a more efficient method.

Can this be done in code?

Is there a function that takes a pipeline and runs it step by step, showing the output of each step in the console and continuing when you press Enter, as demo(...) or example(...) do for package documentation?

This is easy with a magrittr function chain (a functional sequence). For example, define a function my_chain with:

library(magrittr)

# Three trivial steps; each adds 1 to its input
foo <- function(x) x + 1
bar <- function(x) x + 1
baz <- function(x) x + 1
my_chain <- . %>% foo %>% bar %>% baz

and get the final result of a chain as:

my_chain(0)
# [1] 3

You can get a function list with functions(my_chain) and define a "stepper" function like this:

stepper <- function(fun_chain, x, FUN = print) {
  f_list <- functions(fun_chain)
  for(i in seq_along(f_list)) {
    x <- f_list[[i]](x)
    FUN(x)
  }
  invisible(x)
}

Then run the chain with print interposed after each step:

stepper(my_chain, 0, print)

# [1] 1
# [1] 2
# [1] 3

Or wait for user input after each step:

stepper(my_chain, 0, function(x) {print(x); readline()})
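
The stepper above needs the pipeline written as a function chain. A similar idea can be sketched for an ordinary pipeline expression by walking its call tree. This is only a sketch under stated assumptions: step_pipeline and the placeholder name .x are hypothetical names, not part of magrittr, and it only splits on %>% (not %T>% or other operators).

```r
library(magrittr)

# Hypothetical helper (not part of magrittr): evaluates a %>% pipeline one
# step at a time and calls FUN on every intermediate result.
step_pipeline <- function(expr, FUN = print, env = parent.frame()) {
  expr <- substitute(expr)          # capture the pipeline without running it
  steps <- list()
  # Peel off right-hand sides while the outermost call is `%>%`
  while (is.call(expr) && identical(expr[[1]], as.name("%>%"))) {
    steps <- c(list(expr[[3]]), steps)
    expr <- expr[[2]]
  }
  x <- eval(expr, env)              # the leftmost value of the pipeline
  FUN(x)
  for (s in steps) {
    # Rebuild `.x %>% <step>` and evaluate it with .x bound to the current value
    x <- eval(call("%>%", quote(.x), s), list(.x = x), env)
    FUN(x)
  }
  invisible(x)
}

step_pipeline(c(1, 4, 9) %>% sqrt() %>% sum())
# [1] 1 4 9
# [1] 1 2 3
# [1] 6
```

With dplyr loaded, the same call shape should work for the pipeline in the question, and passing FUN = function(x) {print(x); readline()} pauses after each step.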

You can select which results to print by using the tee-operator (%T>%) and print(). The tee-operator is used exclusively for side-effects like printing.

# %T>% comes from magrittr; dplyr re-exports only %>%
library(magrittr)
mtcars %>%
  group_by(cyl) %T>% print() %>%
  sample_frac(0.1) %T>% print() %>%
  summarise(res = mean(mpg))
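
To see why the tee operator matters for side-effect functions, here is a minimal sketch using str(), which returns NULL rather than its input. The variable names with_pipe and with_tee are only for illustration.

```r
library(magrittr)

# str() prints a summary and returns NULL, so with a plain %>%
# the value flowing down the pipeline is lost.
with_pipe <- c(1, 4, 9) %>% str()

# %T>% runs str() for its side effect and passes the original
# left-hand side on to the next step instead.
with_tee <- c(1, 4, 9) %T>% str() %>% sum()

is.null(with_pipe)  # TRUE
with_tee            # 14
```

print() happens to return its argument, which is why it also works with a plain %>%; functions such as str() or plot() do not, and those are the cases where %T>% is needed.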

Add print:

mtcars %>% 
  group_by(cyl) %>% 
  print %>% 
  sample_frac(0.1) %>% 
  print %>% 
  summarise(res = mean(mpg))

IMHO magrittr is mostly useful interactively, that is when I am exploring data or building a new formula/model.

In these cases, storing intermediate results in distinct variables is time-consuming and distracting, while pipes let me focus on the data rather than on typing:

x %>% foo
## reason on results and 
x %>% foo %>% bar
## reason on results and 
x %>% foo %>% bar %>% baz
## etc.

The problem here is that I don't know in advance what the final pipe will be, as @bergant's answer assumes.

Typing, as in @zx8754's answer,

x %>% print %>% foo %>% print %>% bar %>% print %>% baz

adds too much overhead and, to me, defeats the whole purpose of magrittr.

Essentially magrittr lacks a simple operator that both prints and pipes results. The good news is that it seems quite easy to craft one:

`%P>%` <- function(lhs, rhs) { print(lhs); lhs %>% rhs }

Now you can print and pipe:

1:4 %P>% sqrt %P>% sum 
## [1] 1 2 3 4
## [1] 1.000000 1.414214 1.732051 2.000000
## [1] 6.146264

I found that if one defines key bindings for %P>% and %>%, the prototyping workflow becomes very streamlined (see Emacs ESS or RStudio).
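
One limitation of the one-liner is that the right-hand side must be a bare function name: a step such as head(2) is evaluated as an ordinary argument before it ever reaches the pipe. A possible variant, offered only as a sketch, captures the right-hand side unevaluated and hands it to %>%. The placeholder name .lhs_value is hypothetical, and this redefines %P>% from above.

```r
library(magrittr)

# Sketch of a print-and-pipe operator that also accepts calls with
# arguments on the right-hand side, e.g. x %P>% head(2).
`%P>%` <- function(lhs, rhs) {
  print(lhs)                         # show the value entering this step
  rhs_expr <- substitute(rhs)        # keep the step unevaluated
  # Bind the value to a placeholder in a child of the caller's environment
  env <- new.env(parent = parent.frame())
  env$.lhs_value <- lhs
  # Evaluate `.lhs_value %>% <step>` so magrittr handles the step as usual
  eval(call("%>%", quote(.lhs_value), rhs_expr), env)
}

1:4 %P>% sqrt() %P>% sum()
c(3, 1, 2) %P>% sort(decreasing = TRUE)
```

As with the original, the value leaving the last step is not printed by the operator itself; at the console it is shown by R's usual auto-printing.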

Comments
  • Check out R's debug() function. It is close to what you want. You could use it with the print() statements. This post on Cross Validated talks more about it.
  • When the output is a dataframe I find it useful to use %T>% View() %>% to see the intermediate results
  • I get that print returns its argument and so this works, but it's not really shorter/faster/more convenient than just hand-selecting and running chunks.
  • @andrewwong Tell us more, why would you need to run it line by line, more importantly why would you want to look at print output one by one?
  • Updated the question. I want something like an interactive stepper in the console, or an auto-magic markdown document with all the intermediates generated. Thanks for your thoughts!