F# PSeq.iter does not seem to be using all cores

I've been doing some computationally intensive work in F#. Functions like Array.Parallel.map, which use the .NET Task Parallel Library, have sped up my code considerably for minimal effort.

However, due to memory concerns, I remade a section of my code so that it can be lazily evaluated inside a sequence expression (this means I have to store and pass less information). When it came time to evaluate I used:

// processor and memory intensive task, results are not stored
let calculations : seq<Calculation> =  seq { ...yield one thing at a time... }

// extract results from calculations for summary data
PSeq.iter someFuncToExtractResults calculations

Instead of:

// processor and memory intensive task, storing these results is an unnecessary task
let calculations : Calculation[] = ...do all the things...

// extract results from calculations for summary data
Array.Parallel.map someFuncToExtractResults calculations 

When using any of the Array.Parallel functions I can clearly see all the cores on my computer kick into gear (~100% CPU usage). However, the extra memory required means the program never finishes.

With the PSeq.iter version when I run the program, there's only about 8% CPU usage (and minimal RAM usage).

So: Is there some reason why the PSeq version runs so much slower? Is it because of the lazy evaluation? Is there some magic "be parallel" stuff I am missing?

Thanks,

Other resources, source code implementations of both (they seem to use different Parallel libraries in .NET):

https://github.com/fsharp/fsharp/blob/master/src/fsharp/FSharp.Core/array.fs

https://github.com/fsharp/powerpack/blob/master/src/FSharp.PowerPack.Parallel.Seq/pseq.fs

EDIT: Added more detail to code examples and details

Code:

  • Seq

    // processor and memory intensive task, results are not stored
    let calculations : seq<Calculation> =  
        seq { 
            for index in 0..data.Length-1 do
                yield calculationFunc data.[index]
        }
    
    // extract results from calculations for summary data (different module)
    PSeq.iter someFuncToExtractResults calculations
    
  • Array

    // processor and memory intensive task, storing these results is an unnecessary task
    let calculations : Calculation[] =
        Array.Parallel.map calculationFunc data
    
    // extract results from calculations for summary data (different module)
    Array.Parallel.map someFuncToExtractResults calculations 
    

Details:

  • The version that stores the intermediate array runs quickly (as far as it gets, under 10 minutes) but uses ~70GB RAM before it crashes (64GB physical, the rest paged)
  • The seq version takes over 34mins and uses a fraction of the RAM (only around 30GB)
  • I'm calculating roughly a billion values. A billion doubles (at 64 bits each) = 7.4505806GB. There are more complex forms of data as well, plus a few unnecessary copies I'm cleaning up, hence the current massive RAM usage.
  • Yes the architecture isn't great, the lazy evaluation is the first part of me attempting to optimize the program and/or batch up the data into smaller chunks
  • With a smaller dataset, both chunks of code output the same results.
  • @pad, I tried what you suggested, the PSeq.iter seemed to work properly (all cores active) when fed the Calculation[], but there is still the matter of RAM (it eventually crashed)
  • both the summary part of the code and the calculation part are CPU intensive (mainly because of large data sets)
  • With the Seq version I just aim to parallelize once

Based on your updated information, I'm shortening my answer to just the relevant part. You just need this instead of what you currently have:

let result = data |> PSeq.map (calculationFunc >> someFuncToExtractResults)

And this will work the same whether you use PSeq.map or Array.Parallel.map.

However, your real problem is not going to be solved. This problem can be stated as: when the desired degree of parallel work is reached in order to get to 100% CPU usage, there is not enough memory to support the processes.

Can you see how this will not be solved? You can either process things sequentially (less CPU efficient, but memory efficient) or you can process things in parallel (more CPU efficient, but runs out of memory).

The options then are:

  1. Change the degree of parallelism to be used by these functions to something that won't blow your memory:

    let result = data 
                 |> PSeq.withDegreeOfParallelism 2 
                 |> PSeq.map (calculationFunc >> someFuncToExtractResults)
    
  2. Change the underlying logic for calculationFunc >> someFuncToExtractResults so that it is a single function that is more efficient and streams data through to results. Without knowing more detail, it's not simple to see how this could be done. But internally, certainly some lazy loading may be possible.
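
If the two stages have to stay separate (the question mentions they live in different modules), one middle ground is to process the input in chunks, so that only one chunk of intermediate values is held at a time while each chunk still runs in parallel. This is a minimal sketch, not the asker's code: the `Calculation` record, the bodies of `calculationFunc` and `someFuncToExtractResults`, and the helper `summarizeInChunks` are all hypothetical stand-ins.

```fsharp
// Hypothetical stand-ins for the question's type and functions.
type Calculation = { Values: float[] }

let calculationFunc (x: float) : Calculation =
    { Values = Array.init 100 (fun i -> x * float i) }

let someFuncToExtractResults (c: Calculation) : float =
    Array.sum c.Values

/// Processes `data` one chunk at a time. Each chunk is computed in parallel,
/// and its intermediate Calculation[] becomes unreachable once the chunk's
/// summaries have been extracted.
let summarizeInChunks (chunkSize: int) (data: float[]) : float[] =
    data
    |> Seq.chunkBySize chunkSize          // lazily yields float[] chunks
    |> Seq.collect (fun chunk ->
        let calculations = Array.Parallel.map calculationFunc chunk
        Array.Parallel.map someFuncToExtractResults calculations)
    |> Seq.toArray
```

Here `chunkSize` is the tuning knob: larger chunks keep the cores busier, smaller chunks hold fewer intermediate results in memory.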

Array.Parallel.map uses Parallel.For under the hood, while PSeq is a thin wrapper around PLINQ. But the reason they behave differently here is that PSeq.iter is starved of work: the seq<Calculation> is evaluated sequentially and yields new results too slowly to keep the other cores busy.

I do not see the point of using an intermediate seq or array. Supposing data is the input array, moving all the calculations into one place is the way to go:

// Should use PSeq.map to match with Array.Parallel.map
PSeq.map (calculationFunc >> someFuncToExtractResults) data

and

Array.Parallel.map (calculationFunc >> someFuncToExtractResults) data

You avoid consuming too much memory and have intensive computation in one place which leads to better efficiency in parallel execution.
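
The difference between the two arrangements can be sketched with plain PLINQ, which is what PSeq wraps. The function `expensive` is a hypothetical stand-in for calculationFunc, and the `+ 1L` stands in for a cheap summary step. As far as I understand PLINQ, a plain seq source is pulled through a single shared enumerator, so work done inside the seq itself is not spread across cores.

```fsharp
open System.Linq

// Hypothetical stand-in for the expensive calculationFunc.
let expensive (x: int) : int64 =
    let mutable acc = 0L
    for i in 1 .. 100000 do
        acc <- acc + int64 ((x + i) % 7)
    acc

// Pattern from the question: the expensive call sits inside the seq, so it
// runs while the sequence is being enumerated. Only the cheap `+ 1L` step
// is handed to the parallel stage.
let lazyThenParallel (data: int[]) : int64 =
    let calculations = seq { for x in data do yield expensive x }
    calculations.AsParallel().Select(fun c -> c + 1L) |> Seq.sum

// Rearranged: the source only hands out cheap inputs, and the expensive
// call runs inside the parallel stage.
let parallelAllTheWay (data: int[]) : int64 =
    data.AsParallel().Select(fun x -> expensive x + 1L) |> Seq.sum
```

Both functions return the same value; only where the expensive work runs differs.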

I had a problem similar to yours and solved it by adding the following to the solution's App.config file:

<runtime> 
    <gcServer enabled="true" />
    <gcConcurrent enabled="true"/>
</runtime>

A calculation that was taking 5'49'' and showing roughly 22% CPU utilization on Process Lasso took 1'36'' showing roughly 80% CPU utilization.
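
To confirm that the setting was actually picked up, the running process can report its GC mode. This is a small sketch using the standard `System.Runtime.GCSettings` properties; it only reads the current mode and does not change it.

```fsharp
open System.Runtime

// Reports the GC configuration the process was started with.
printfn "Server GC: %b" GCSettings.IsServerGC
printfn "Latency mode: %A" GCSettings.LatencyMode
```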

Another factor that may influence the speed of parallelized code is whether hyperthreading (Intel) or SMT (AMD) is enabled in the BIOS. I have seen cases where disabling it leads to faster execution.

Comments
  • Lazy evaluation doesn't play nice with parallel execution. For a fair comparison, pass the same Calculation[] to PSeq.iter and Array.Parallel.map. It's impossible to tell the reason without having more details of Calculation and someFuncToExtractResults.
  • Thanks for the suggestion, I tried this and PSeq behaves well when given the array instead of the lazy seq... however it doesn't solve the RAM issue
  • Both are intensive. I'm not sure what you mean by your second point, can you elaborate please?
  • @AnthonyTruskinger: I've made some significant updates based on the extra info you provided. Note that you must choose a trade off somewhere if you don't want the algorithm changed (you won't get 100% CPU and efficient memory without changing the algorithm). If you can change the algorithm, well, see my answer.