Python Multiprocessing: What's the difference between map and imap?

I'm trying to learn how to use Python's multiprocessing package, but I don't understand the difference between map and imap.

Is the difference that map returns, say, an actual array or set, while imap returns an iterator over an array or set? When would I use one over the other?

Also, I don't understand what the chunksize argument is. Is this the number of values that are passed to each process?

That is the difference. One reason why you might use imap instead of map is if you wanted to start processing the first few results without waiting for the rest to be calculated. map waits for every result before returning.
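
A minimal sketch of that difference follows. It assumes `multiprocessing.pool.ThreadPool` as a stand-in, because it exposes the same `map`/`imap` interface as `multiprocessing.Pool` and runs without an `if __name__ == "__main__":` guard; the `square` task is a made-up example.

```python
from multiprocessing.pool import ThreadPool  # assumption: stands in for multiprocessing.Pool


def square(x):
    """Toy task; any function of one argument works."""
    return x * x


with ThreadPool(4) as pool:
    # map blocks until every result is ready, then returns a list.
    all_at_once = pool.map(square, range(5))

    # imap returns immediately with an iterator; results arrive in input order.
    lazy = pool.imap(square, range(5))
    first = next(lazy)   # usable as soon as the first task has finished
    rest = list(lazy)    # drains the remaining results

print(all_at_once)  # [0, 1, 4, 9, 16]
print(first, rest)  # 0 [1, 4, 9, 16]
```

With a real process pool the calls look the same, but the pool should be created under an `if __name__ == "__main__":` guard.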

As for chunksize, it is sometimes more efficient to dole out work in larger quantities because every time the worker requests more work, there is IPC and synchronization overhead.
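
To make the batching idea concrete, here is a rough sketch. `split_into_chunks` is a hypothetical helper that only illustrates how an iterable is grouped for a given chunksize; it is not the library's internal code. `ThreadPool` is again assumed as a stand-in for `multiprocessing.Pool`.

```python
from itertools import islice
from multiprocessing.pool import ThreadPool  # assumption: same interface as multiprocessing.Pool


def split_into_chunks(iterable, chunksize):
    """Hypothetical helper: group items the way chunksize conceptually does."""
    it = iter(iterable)
    chunks = []
    while True:
        chunk = tuple(islice(it, chunksize))
        if not chunk:
            return chunks
        chunks.append(chunk)


def square(x):
    return x * x


# Ten items with chunksize=4 become three tasks instead of ten.
batches = split_into_chunks(range(10), 4)
print(batches)  # [(0, 1, 2, 3), (4, 5, 6, 7), (8, 9)]

with ThreadPool(2) as pool:
    # chunksize changes how work is handed out, not the results.
    one_by_one = pool.map(square, range(10), chunksize=1)
    batched = pool.map(square, range(10), chunksize=4)

print(one_by_one == batched)  # True
```

Fewer, larger tasks mean fewer round trips between the parent and the workers, which is where the savings come from.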

There is also an imap in the itertools module (Python 2 only). It is a different function from multiprocessing's Pool.imap, but the list-versus-iterator distinction is analogous. In Python 2.7, the builtin map returns a list, whereas itertools.imap returns an iterator that produces each value on demand, which saves memory. The code blocks below show the difference.

map returns a list, which can be printed directly:

from math import sqrt

integers = [1, 2, 3, 4, 5]
sqr_ints = map(sqrt, integers)  # Python 2: the builtin map returns a list
print(sqr_ints)

imap returns an iterator, which has to be converted to a list before printing:

from itertools import imap  # Python 2 only; removed in Python 3
from math import sqrt

integers = [1, 2, 3, 4, 5]
sqr_ints = imap(sqrt, integers)  # lazy iterator; values are computed on demand
print(list(sqr_ints))

chunksize splits the iterable into pieces of (approximately) the specified size, and each piece is submitted as a separate task.

With imap, the calls run in parallel rather than one after another. For example, suppose you are querying three exchanges for their order books. Instead of hitting exchange 1, then exchange 2, then exchange 3 in sequence, the pool's imap call is non-blocking and dispatches requests to all three exchanges as soon as you call it.

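from pathos.multiprocessing import ProcessingPool as Pool

# Fragment from inside a class that defines getOrderBook; Exchanges and
# Tickers are equal-length sequences defined elsewhere.
self.pool = Pool().imap
results = self.pool(self.getOrderBook, Exchanges, Tickers)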


Comments
  • Closely related: multiprocessing.pool: What's the difference between map_async and imap?
  • So how does one go about determining a reasonable value for chunksize, then? If bigger means less IPC and sync overhead due to pickling, what's the tradeoff? (i.e., why is picking chunksize == len(iterable) a bad idea, or is it?)
  • @Adam If you pick chunksize = len(iterable), then all the jobs will be assigned to a single process! len(iterable) // numprocesses is the maximum that is useful. The tradeoff is between synchronization overhead and CPU utilization (large chunksizes will cause some processes to finish before others, wasting potential processing time).
  • OK, I see that, but does that simply mean picking a reasonable chunksize boils down to trial and error on particular data in a particular setting?
  • I think so. Most optimization requires profiling and fine tuning.
  • It's also worth mentioning that imap can be applied to a generator, while map will turn your generator into a list-like object, so imap doesn't wait for the input to get generated.
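
The last comment, about generator input, can be sketched as follows. `readings` is a hypothetical generator standing in for a slow or large input source, and `ThreadPool` is assumed as a stand-in for `multiprocessing.Pool` since both share the same interface. In CPython, `map` converts an input without a length into a list before dispatching any work, while `imap` pulls items from the generator as it submits tasks.

```python
from multiprocessing.pool import ThreadPool  # assumption: same interface as multiprocessing.Pool


def readings(n):
    """Hypothetical generator standing in for a slow or large input source."""
    for i in range(n):
        yield i


def square(x):
    return x * x


with ThreadPool(2) as pool:
    # imap takes the generator directly and yields results in input order.
    from_imap = []
    for value in pool.imap(square, readings(5)):
        from_imap.append(value)  # each result can be handled as it arrives

    # map also accepts the generator, but materializes it into a list first.
    from_map = pool.map(square, readings(5))

print(from_imap)  # [0, 1, 4, 9, 16]
print(from_map)   # [0, 1, 4, 9, 16]
```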