Design pattern for buffering pipeline input to PowerShell cmdlet

I occasionally encounter situations where it makes sense to support pipeline input to a cmdlet, but where the operations I want to perform (e.g. database access) are better batched over a sensible number of objects.

A typical way to achieve this appears to be something like the following:

function BufferExample {
<#
.SYNOPSIS
Example of filling and using an intermediate buffer.
#>
[CmdletBinding()]
param(
    [Parameter(ValueFromPipeline)]
    $InputObject
)

BEGIN {
    $Buffer = New-Object System.Collections.ArrayList(10)
    function _PROCESS {
        # Do something with a batch of items here.
        Write-Output "First element of the batch is $($Buffer[0])"
        # This could be a high latency operation such as a DB call where we 
        # retrieve a set of rows by primary key rather than each one individually.

        # Then empty the buffer.
        $Buffer.Clear()
    }
}

PROCESS {
    # Accumulate into the buffer; when it's full, process the batch.
    # (Adding first ensures the current object is never dropped.)
    [void]$Buffer.Add($InputObject)
    if ($Buffer.Count -eq $Buffer.Capacity) {
        _PROCESS
    }
}

END {
    # The buffer may be partially filled, so process any remainder.
    if ($Buffer.Count -gt 0) {
        _PROCESS
    }
}
}

Is there a less "boilerplate" way to do this?

One method may be to write the function I call "_PROCESS" here to accept array arguments (but not pipeline input), and then make the cmdlet exposed to the user a proxy function that buffers the input and passes the buffer on, as described in Proxy commands. A sketch of what I mean is below.
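For illustration, a minimal sketch of that split, assuming a hypothetical inner function Invoke-BatchOperation with an array parameter (all names here are illustrative):

function Invoke-BatchOperation {
    # Hypothetical inner function: accepts a whole batch as an array
    # argument and performs the expensive operation once per batch.
    param([object[]]$Item)
    Write-Output "Processing a batch of $($Item.Count) items"
}

function Invoke-Operation {
    # Public wrapper: buffers pipeline input and forwards full batches.
    [CmdletBinding()]
    param(
        [Parameter(ValueFromPipeline)]
        $InputObject,
        [int]$BatchSize = 10
    )
    BEGIN {
        $Buffer = New-Object System.Collections.ArrayList($BatchSize)
    }
    PROCESS {
        [void]$Buffer.Add($InputObject)
        if ($Buffer.Count -eq $BatchSize) {
            Invoke-BatchOperation -Item $Buffer.ToArray()
            $Buffer.Clear()
        }
    }
    END {
        if ($Buffer.Count -gt 0) {
            Invoke-BatchOperation -Item $Buffer.ToArray()
        }
    }
}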

Alternatively, I could dot-source dynamically generated code in the body of the cmdlet I wish to write to support this functionality; however, this seems error-prone and potentially hard to debug and understand.


The nature of the pipeline puts some constraints on (easily) doing this the way you'd like, mostly because a Process block is designed to receive (and process) a single object.

If you want to buffer all objects, that's fairly simple: collect the objects in an array or other collection within your Process block, then do all the work in the End block, similar to the way a cmdlet like Sort-Object handles it.
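A minimal sketch of that collect-everything shape (the function name is made up for the example):

function Invoke-AfterCollecting {
    [CmdletBinding()]
    param(
        [Parameter(ValueFromPipeline)]
        $InputObject
    )
    BEGIN {
        $all = New-Object System.Collections.ArrayList
    }
    PROCESS {
        # Just accumulate; no real work happens per object.
        [void]$all.Add($InputObject)
    }
    END {
        # All pipeline input is available here, as in Sort-Object.
        Write-Output "Received $($all.Count) objects in total"
    }
}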

In the case of buffering for the sake of the underlying resource, like a web-based API, or your example of DB access, I think the approach you take will need to be situation-specific. There's unlikely to be a great general way to achieve it.


One of those approaches is to split the operation into two (maybe more) functions.

For example, I wrote some functions to send metrics to Graphite. I split them between Format-Graphite and Out-Graphite; the former generates a properly formatted metric string based on the parameters and the pipeline, while the latter sends the string(s) to the Graphite collector. This lets the client code be more versatile in how it gets and generates its data, because it can pipe to Format-Graphite or make individual calls to it without worrying that the network portion will be inefficient, and it doesn't have to manually collect its own data just to avoid that. It's not the best example without code to demo, but I can't post that code right now.
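Since the real code isn't shown, here is only a guess at the shape of that split; every name, parameter, and default below is an assumption rather than the actual implementation (Graphite's plaintext protocol is "path value timestamp", conventionally on TCP port 2003):

function Format-Graphite {
    # Assumed shape only: formats one metric line per input value.
    [CmdletBinding()]
    param(
        [Parameter(ValueFromPipeline)]
        $InputObject,
        [string]$MetricPath = 'example.metric'   # placeholder path
    )
    PROCESS {
        '{0} {1} {2}' -f $MetricPath, $InputObject, [DateTimeOffset]::UtcNow.ToUnixTimeSeconds()
    }
}

function Out-Graphite {
    # Assumed shape only: buffers all lines, sends them in one connection.
    [CmdletBinding()]
    param(
        [Parameter(ValueFromPipeline)]
        [string]$Metric,
        [string]$CollectorHost = 'graphite.example.com',   # placeholder host
        [int]$Port = 2003
    )
    BEGIN {
        $lines = New-Object System.Collections.ArrayList
    }
    PROCESS {
        [void]$lines.Add($Metric)
    }
    END {
        # One network conversation for the whole pipeline's worth of metrics.
        $client = New-Object System.Net.Sockets.TcpClient($CollectorHost, $Port)
        try {
            $writer = New-Object System.IO.StreamWriter($client.GetStream())
            foreach ($line in $lines) { $writer.WriteLine($line) }
            $writer.Flush()
        }
        finally {
            $client.Close()
        }
    }
}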


Another approach, for things where the "expensive" part of a single operation is the initialization and teardown code, is to just do that stuff in Begin and End and then use Process normally.

For example, making dozens of database calls over a single connection you established in Begin may not be so bad; it may even be preferable to something like building a big SQL string and sending it all at once.
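A rough sketch of that shape, assuming Windows PowerShell where System.Data.SqlClient is available (the connection string, table, and column names are placeholders):

function Get-ThingById {
    [CmdletBinding()]
    param(
        [Parameter(ValueFromPipeline)]
        [int]$Id,
        # Placeholder connection string; adjust for your environment.
        [string]$ConnectionString = 'Server=.;Database=Example;Integrated Security=True'
    )
    BEGIN {
        # Expensive setup happens once, not once per pipeline object.
        $connection = New-Object System.Data.SqlClient.SqlConnection($ConnectionString)
        $connection.Open()
        $command = $connection.CreateCommand()
        $command.CommandText = 'SELECT Name FROM dbo.Things WHERE Id = @Id'
        [void]$command.Parameters.Add('@Id', [System.Data.SqlDbType]::Int)
    }
    PROCESS {
        # One cheap query per object over the already-open connection.
        $command.Parameters['@Id'].Value = $Id
        $reader = $command.ExecuteReader()
        try {
            while ($reader.Read()) { $reader['Name'] }
        }
        finally {
            $reader.Close()
        }
    }
    END {
        $command.Dispose()
        $connection.Close()
    }
}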


Ultimately I think you might be better off looking at each individual use case and determining the best approach for your needs, balancing performance/efficiency and ease/intuitiveness of invoking the code.

If you have a specific use case and post a question about that, I'd like to read it; send me a link.


I made a re-usable cmdlet from your example.

function Buffer-Pipeline {
[CmdletBinding()]
param(
    $Size = 10,
    [Parameter(ValueFromPipeline)]
    $InputObject
)
    BEGIN {
        $Buffer = New-Object System.Collections.ArrayList($Size)
    }

    PROCESS {
        [void]$Buffer.Add($InputObject)

        if ($Buffer.Count -eq $Size) {
            # Emit the full batch as a single object, then start a fresh buffer.
            $b = $Buffer
            $Buffer = New-Object System.Collections.ArrayList($Size)
            Write-Output -NoEnumerate $b
        }
    }

    END {
        # The buffer may be partially filled, so emit any remainder as a final batch.
        if ($Buffer.Count -ne 0) {
            Write-Output -NoEnumerate $Buffer
        }
    }
}

Usage:

@(1;2;3;4;5) | Buffer-Pipeline -Size 3 | % { "$($_.Count) items: ($($_ -join ','))" }

Output:

3 items: (1,2,3)
2 items: (4,5)

Another example:

1,2,3,4,5 | Buffer-Pipeline -Size 10 | Measure-Object

Count    : 2

Processing each batch:

 1,2,3 | Buffer-Pipeline -Size 2 | % { 
    $_ | % { "Starting batch of $($_.Count)" } { $_ * 100 } { "Done with batch of $($_.Count)" }
}

Starting batch of 2
100
200
Done with batch of 2
Starting batch of 1
300
Done with batch of 1


I think the approach you are using is one of the best.

But since you are looking for better debugging, you could introduce some logging inside the function, writing to a log file at a location of your choice.

Further, for dot sourcing, you can still add a dot-source reference, call the function that lives inside the other script, and wrap the whole thing in one function; a sketch of that idea follows.
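A minimal sketch of that wrapping, where the helper script path and the function it defines (Add-ToBatch) are purely illustrative:

function Invoke-Buffered {
    [CmdletBinding()]
    param(
        [Parameter(ValueFromPipeline)]
        $InputObject
    )
    BEGIN {
        # Dot-source the helper script so its functions are available
        # in this function's scope. Path and file name are illustrative.
        . (Join-Path $PSScriptRoot 'BatchHelpers.ps1')
    }
    PROCESS {
        # Call a function defined inside BatchHelpers.ps1 (hypothetical name).
        Add-ToBatch $InputObject
    }
}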

The best way to organize multiple functions is to create a module.

Hope this helps.
