Hot questions for using ZeroMQ in Docker

Question:

I'm trying to set up a toy example of Docker networking with ZeroMQ on macOS, where serverd.py sends a message to clientd.py and the client simply displays it, using PUSH/PULL. If I run them outside of containers they work fine, but I'm having trouble getting them to communicate when they run in separate containers. It seems my clientd.py cannot connect to the server by container name, even though both containers are on the same bridge network. I also tried replacing the hostname with the IP address assigned to serverd_dev_1, but that doesn't work either.

Here's my setup:

  1. I created a new network with docker network create -d bridge mynet. Here is the output from docker network inspect mynet:

    {
        "Name": "mynet",
        "Id": "cec7f8037c0ef173d9a9a66065bb46cb6a631fea1c0636876ccfe5a792f92412",
        "Created": "2017-08-19T09:52:44.8034344Z",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "172.18.0.0/16",
                    "Gateway": "172.18.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "5fa8dc2f8059d675dfd3dc4f2e50265be99361cd8a8f2730eb273772c0148742": {
                "Name": "serverd_dev_1",
                "EndpointID": "3a62e82b1b34d5c08f2a9f340ff93aebd65c0f3dfde70e354819befe21422d0b",
                "MacAddress": "02:42:ac:12:00:02",
                "IPv4Address": "172.18.0.2/16",
                "IPv6Address": ""
            },
            "ec1e5f8c525ca8297611e02bcd3a64198fda3a07ce8ed82c0c4298609ba0357f": {
                "Name": "clientd_dev_1",
                "EndpointID": "a8ce6f178a225cb2d39ac0009e16c39abdd2dae02a65ba5fd073b7900f059bb8",
                "MacAddress": "02:42:ac:12:00:03",
                "IPv4Address": "172.18.0.3/16",
                "IPv6Address": ""
            }
        },
        "Options": {},
        "Labels": {}
    }
    
  2. I created serverd.py and clientd.py like this and put them in separate folders along with their Dockerfiles and docker-compose.yml:

serverd.py:

import zmq
import time

context = zmq.Context()
socket = context.socket(zmq.PUSH)
address = "tcp://127.0.0.1:5557"
socket.bind(address)
print("Sending to {}...".format(address))
while True:
    message = socket.send_string("Got it!")
    print("Sent message")
    time.sleep(1)

clientd.py:

import zmq

context = zmq.Context()
socket = context.socket(zmq.PULL)
address = "tcp://serverd_dev_1:5557"
socket.connect(address)
print("Listening to {}...".format(address))
while True:
    message = socket.recv_string()
    print("Client got message! {}".format(message))

I have two Dockerfiles and two docker-compose.yml files:

Dockerfile for serverd.py:

FROM python:3.6

RUN mkdir src
ADD serverd.py /src/
RUN pip install pyzmq
WORKDIR /src/
EXPOSE 5557

Dockerfile for clientd.py:

FROM python:3.6

RUN mkdir src
ADD clientd.py /src/
RUN pip install pyzmq
WORKDIR /src/
EXPOSE 5557

docker-compose.yml for serverd.py:

dev:
  build: .
  command: ["python", "-u", "./serverd.py"]
  net: mynet

docker-compose.yml for clientd.py:

dev:
  build: .
  command: ["python", "-u", "./clientd.py"]
  net: mynet

  3. serverd.py starts up as expected with docker-compose up:

Sending to tcp://127.0.0.1:5557...

  4. clientd.py won't start up, because it cannot resolve the hostname in tcp://serverd_dev_1:5557:

    Attaching to countd_dev_1
    dev_1  | Traceback (most recent call last):
    dev_1  |   File "./countd.py", line 6, in <module>
    dev_1  |     socket.connect(address)
    dev_1  |   File "zmq/backend/cython/socket.pyx", line 528, in zmq.backend.cython.socket.Socket.connect (zmq/backend/cython/socket.c:5971)
    dev_1  |   File "zmq/backend/cython/checkrc.pxd", line 25, in zmq.backend.cython.checkrc._check_rc (zmq/backend/cython/socket.c:10014)
    dev_1  | zmq.error.ZMQError: Invalid argument
    
  5. If I replace the URI tcp://serverd_dev_1:5557 with tcp://172.18.0.2:5557, it no longer crashes, but it simply idles without receiving any of the messages from the server. Obviously I'm doing something wrong, but I'm not entirely sure what exactly. I feel like I have been following the Docker documentation as closely as possible, and I would be very grateful for any ideas.


Answer:

Your primary problem is that you have configured your server with the address tcp://127.0.0.1:5557. Because it's bound to localhost (127.0.0.1), that socket isn't going to be visible to anything outside of that container. So the first thing you need to fix is the server bind address. Consider:

address = "tcp://0.0.0.0:5557"

A second problem is that you're using the name serverd_dev_1 in the client, but it's not clear this would actually be the name of your serverd container (that would depend on the directory names in use when you run docker-compose up).

Naming is easier to manage with a single docker-compose.yaml file. For example, I set things up like this:

version: "2"

services:
  serverd:
    build: serverd
    command: ["python", "-u", "./serverd.py"]
    environment:
      SERVER_LISTEN_URI: tcp://0.0.0.0:5557

  clientd:
    build: clientd
    command: ["python", "-u", "./clientd.py"]
    environment:
      SERVER_CONNECT_URI: tcp://serverd:5557

This will launch both containers in a dedicated network (because this is what docker-compose does by default), so you don't need to explicitly create or reference mynet.

As you can probably infer from the above, I modified your code to get the ZMQ uri from an environment variable, because this made it easier to experiment. You can find the above docker-compose.yaml and the modified code at:
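
As a rough sketch (not the linked code itself), the environment-driven serverd.py might look like the following; SERVER_LISTEN_URI is the variable set in the docker-compose.yaml above, and the fallback default here is an assumption. clientd.py is analogous, reading SERVER_CONNECT_URI and calling connect() instead of bind():

# serverd.py (sketch): read the bind address from the environment
import os
import time

import zmq

context = zmq.Context()
socket = context.socket(zmq.PUSH)
# SERVER_LISTEN_URI is injected by docker-compose; the default is an assumption
address = os.environ.get("SERVER_LISTEN_URI", "tcp://0.0.0.0:5557")
socket.bind(address)
print("Sending to {}...".format(address))
while True:
    socket.send_string("Got it!")
    print("Sent message")
    time.sleep(1)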

Update

In case you really want/need to have two separate docker-compose.yaml files, I've updated the example to include per-service files. These examples use the aliases option to provide a name by which the client can contact the server, regardless of your local directory layout:

version: "2"

services:
  serverd:
    build: .
    command: ["python", "-u", "./serverd.py"]
    environment:
      SERVER_LISTEN_URI: tcp://0.0.0.0:5557
    networks:
      mynet:
        aliases:
          - serverd

networks:
  mynet:
    external: True

This configuration requires that you create mynet before bringing up the containers.

Question:

I've got a problem with a ZeroMQ socket: it fails on .bind() to the 0.0.0.0:5555 address when I try to run it in a Docker container via Rancher Cattle.

Every time I try to run it, I'm getting the same error:

zmq.error.ZMQError: Address already in use.

I tried EXPOSE 5555 and EXPOSE 5555/tcp in my Dockerfile, but that did not help.

Here is a part of my code:

...
self.context = zmq.Context()
self.socket = self.context.socket(zmq.PUB)
self.socket.bind('tcp://%s:%d' % ('0.0.0.0', 5555))
...

Maybe somebody has had the same problem. How can I solve it?


Answer:

The ZeroMQ API documentation for zmq_bind() defines three ways to go:

Assigning a local address to a socket

When assigning a local address to a socket using zmq_bind() with the tcp:// transport, the endpoint shall be interpreted as an interface followed by a colon and the TCP port number to use. An interface may be specified by either of the following:

  • The wild-card *, meaning all available interfaces.
  • The primary IPv4 address assigned to the interface, in its numeric representation.
  • The interface name as defined by the operating system. Interface names are not standardised in any way and should be assumed to be arbitrary and platform dependent. On Win32 platforms no short interface names exist, thus only the primary IPv4 address may be used to specify an interface.

So at least one of these ought to get the job done.
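
In pyzmq the three forms look roughly like the sketch below (the numeric address and the interface name are assumptions; pick whichever applies to your container and keep only that bind):

import zmq

ctx = zmq.Context()
sock = ctx.socket(zmq.PUB)

# 1. Wild-card: bind on all available interfaces
sock.bind("tcp://*:5555")

# 2. Primary IPv4 address of the desired interface, in numeric form
#    (the address below is an assumption)
# sock.bind("tcp://172.17.0.2:5555")

# 3. OS-defined interface name (not standardised, platform dependent)
# sock.bind("tcp://eth0:5555")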

Question:

I want to build a modularized system with modules communicating over ZeroMQ. To improve usability, I want to dockerize (some of) these modules, so that users don't have to set up an environment. However, I cannot get a dockerized publisher to have its messages received by a non-dockerized subscriber.

System
  • Ubuntu 18.04
  • Python 3.7
  • libzmq version 4.2.5
  • pyzmq version is 17.1.2
  • Docker version 18.09.0, build 4d60db4
Minimal test case

zmq_sub.py

# CC0

import zmq


def main():
    # ZMQ connection
    url = "tcp://127.0.0.1:5550"
    ctx = zmq.Context()
    socket = ctx.socket(zmq.SUB)
    socket.bind(url)  # subscriber creates ZeroMQ socket
    socket.setsockopt(zmq.SUBSCRIBE, ''.encode('ascii'))  # any topic
    print("Sub bound to: {}\nWaiting for data...".format(url))

    while True:
        # wait for publisher data
        topic, msg = socket.recv_multipart()
        print("On topic {}, received data: {}".format(topic, msg))


if __name__ == "__main__":
    main()

zmq_pub.py

# CC0

import zmq
import time


def main():
    # ZMQ connection
    url = "tcp://127.0.0.1:5550"
    ctx = zmq.Context()
    socket = ctx.socket(zmq.PUB)
    socket.connect(url)  # publisher connects to subscriber
    print("Pub connected to: {}\nSending data...".format(url))

    i = 0

    while True:
        topic = 'foo'.encode('ascii')
        msg = 'test {}'.format(i).encode('ascii')
        # publish data
        socket.send_multipart([topic, msg])  # 'test'.format(i)
        print("On topic {}, send data: {}".format(topic, msg))
        time.sleep(.5)

        i += 1


if __name__ == "__main__":
    main()

When I open 2 terminals and run:

  • python zmq_sub.py
  • python zmq_pub.py

The subscriber receives the data without error (On topic b'foo', received data: b'test 1').

Dockerfile

I've created the following Dockerfile:

FROM python:3.7.1-slim

MAINTAINER foo bar <foo@spam.eggs>

RUN apt-get update && \
  apt-get install -y --no-install-recommends \
  gcc

WORKDIR /app
COPY requirements.txt /app
RUN pip install -r requirements.txt

COPY zmq_pub.py /app/zmq_pub.py

EXPOSE 5550

CMD ["python", "zmq_pub.py"]

and then I successfully build a Dockerized publisher with the command: sudo docker build . -t foo/bar

Attempts
Attempt 1

Now that I have my Docker container with the publisher, I'm trying to get my non-dockerized subscriber to receive the data. I run the following 2 commands:

  1. python zmq_sub.py
  2. sudo docker run -it foo/bar

I see my publisher inside the container publishing data, but my subscriber receives nothing.

Attempt 2

With the idea that I have to map the internal port of my dockerized publisher to a port on my machine, I run the following 2 commands:

  1. python zmq_sub.py
  2. sudo docker run -p 5550:5550 -it foo/bar

However, then I receive the following error: docker: Error response from daemon: driver failed programming external connectivity on endpoint objective_shaw (09b5226d89a815ce5d29842df775836766471aba90b95f2e593cf5ceae0cf174): Error starting userland proxy: listen tcp 0.0.0.0:5550: bind: address already in use.

It seems to me that my subscriber has already bound to 127.0.0.1:5550 and therefore Docker cannot do this anymore when I try to map it. If I change it to -p 5549:5550, Docker doesn't give an error, but then it's the same situation as with Attempt 1.

Question

How do I get my dockerized publisher to publish data to my non-dockerized subscriber?

Code

Edit 1: Code updated to also give an example of how to use docker-compose for automatic IP inference.

GitHub: https://github.com/NumesSanguis/pyzmq-docker


Answer:

This is mostly a docker networking question, and isn't specific to pyzmq or zeromq. You would have the same issues with anything trying to connect from a container to the host.

To be clear, in this example, you have a server running on the host (zmq_sub.py, which calls bind), and you want to connect to it from a client running inside a docker container (zmq_pub.py).

Since the docker container is the one connecting, you don't need to do any docker port exposing or forwarding. EXPOSE and forwarding ports are only for making it possible to connect to a container (i.e. bind is called in the container), not to make outbound connections from a container, which is what's going on here.

The main thing here is that when it comes to networking with docker, you should think of each container as a separate machine on a local network. In order to be connectable from other containers or the host, a service should bind onto all interfaces (or at least an accessible interface). Binding on localhost in a container means that only other processes in that container should be able to talk to it. Similarly, binding localhost on the host means that docker containers shouldn't be able to connect.

So the first change is that your bind url should be:

url = 'tcp://0.0.0.0:5550'
...
socket.bind(url)

or pick the IP address that corresponds to your Docker virtual network.

Then your connect URL needs to be the IP of your host as seen from the container. You can find it via ifconfig. Typically any of the host's IP addresses will do, but if you have a docker0 interface, that would be the logical choice.
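
As a sketch, the publisher inside the container would then connect to the host's address on the Docker bridge; 172.17.0.1 is the typical docker0 gateway, but that value and the HOST_IP variable are assumptions, so verify the address with ifconfig on your host:

# zmq_pub.py (sketch): connect to the host as seen from inside the container
import os

import zmq

ctx = zmq.Context()
socket = ctx.socket(zmq.PUB)
# 172.17.0.1 is the usual docker0 gateway address; check `ifconfig docker0`
# on the host, or pass a different value via the (hypothetical) HOST_IP variable.
host_ip = os.environ.get("HOST_IP", "172.17.0.1")
socket.connect("tcp://{}:5550".format(host_ip))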

Question:

We have an aiohttp-based web service which uses ZMQ to send jobs to workers and wait for the results. We are of course using the ZMQ event loop, so we can wait on ZMQ sockets. "Sometimes" the process crashes and we get this stack trace:

...
await socket.send(z, flags=flags)
File "/usr/local/lib/python3.5/dist-packages/zmq/eventloop/future.py", line 165, in send
kwargs=dict(flags=flags, copy=copy, track=track),
File "/usr/local/lib/python3.5/dist-packages/zmq/eventloop/future.py", line 276, in _add_send_event
timeout_ms = self._shadow_sock.sndtimeo
File "/usr/local/lib/python3.5/dist-packages/zmq/sugar/attrsettr.py", line 45, in _getattr_
return self._get_attr_opt(upper_key, opt)
File "/usr/local/lib/python3.5/dist-packages/zmq/sugar/attrsettr.py", line 49, in _get_attr_opt
return self.get(opt)
File "zmq/backend/cython/socket.pyx", line 449, in zmq.backend.cython.socket.Socket.get (zmq/backend/cython/socket.c:4920)
File "zmq/backend/cython/socket.pyx", line 221, in zmq.backend.cython.socket._getsockopt (zmq/backend/cython/socket.c:2860)

"Sometimes" means, that the code works fine, if I just run it on my test machine. We encountered the problem in some rare cases when using docker containers, but were never able to reproduce it in an reliable way. Since we moved our containers into a Kubernetes cluster, it occurs much more often. Does anybody know, what could be the source of the above stack trace?


Answer:

aiohttp is not intended to be used with vanilla pyzmq. Use aiozmq loopless streams instead.

See also https://github.com/zeromq/pyzmq/issues/894 and https://github.com/aio-libs/aiozmq/blob/master/README.rst
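
As an illustration only, a minimal loopless aiozmq stream on the sending side might look like the sketch below (the endpoint and payload are placeholders; see the aiozmq README linked above for the actual API):

import asyncio

import aiozmq
import zmq

async def send_job():
    # Loopless PUSH stream; the endpoint is a placeholder
    stream = await aiozmq.create_zmq_stream(zmq.PUSH, connect="tcp://127.0.0.1:5557")
    stream.write([b"job payload"])   # multipart message: a list of frames
    await stream.drain()             # wait until the frames are flushed
    stream.close()

asyncio.get_event_loop().run_until_complete(send_job())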

Question:

I have a publisher running on the host on port N.

It would be painful to pack it into a container.

And I have a subscriber running inside a container.

The problem is that if I first run the publisher on the host, then I can't publish the port with docker run -d -p N:N publisher (where N is the port number):

Error response from daemon: Cannot start container 41202025441bf02ad5c8cf2a85fb1f1bd04c2211e648f5ec446442f9af4a6274:
Error starting userland proxy: listen tcp 0.0.0.0:5570: bind: address already in use

And if I start the container first, then the publisher gets the "address already in use" error instead.

I think the problem is that when Docker publishes a port from the container to the host, it binds that port on the host itself, so nothing else on the host can bind to it.

Can I do something other than running the publisher in a container?


Answer:

Make sure your Docker daemon is running with --icc=true (communication between containers/networks) and that the docker0 network adapter has been added to your host's iptables config (usually done by the Docker parameter --iptables=true).

Then run your container with -p N (not N:N). This means that the port is exposed but not mapped to a host port.

Then look up the container's IP address:

$ docker inspect publisher

Now contact your subscriber not at localhost:5570 but at ip-address:5570, using the container's IP address from docker inspect.
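
One way to read this answer: the subscriber inside the container binds on all interfaces on port 5570, and the host-side publisher then connects straight to the container's IP address. A sketch of the host side, where 172.17.0.2 is an assumption standing in for the address reported by docker inspect:

# Host-side publisher (sketch): connect directly to the container's IP
import zmq

ctx = zmq.Context()
pub = ctx.socket(zmq.PUB)
# 172.17.0.2 is an assumption; use the IP address shown by
# `docker inspect publisher` for your container.
pub.connect("tcp://172.17.0.2:5570")
pub.send_string("hello from the host")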