Hot questions for Using Cap'n Proto in protocol buffers

Question:

At the moment we are using Protocol Buffers to exchange data between Python and C++. However, we are running into the maximum file size limitation of Protocol Buffers and are considering switching everything to Cap'n Proto. Since it is somewhat related to Protocol Buffers, I was wondering whether Cap'n Proto also has a limit on the maximum file size?


Answer:

Cap'n Proto has a maximum file size of approximately 2^64 bytes, or 16 exbibytes -- which "should be enough for anyone". :)

Cap'n Proto is in fact an excellent format for extremely large data files, because it supports random access and lazy loading. When reading a huge Cap'n Proto file, I recommend using mmap() to map the file into memory, then passing the bytes directly to the Cap'n Proto implementation (e.g. capnp::FlatArrayMessageReader in C++). This way, only the pages of the file that you actually use will be brought into memory by the operating system. (In contrast, with Protocol Buffers, it is necessary to parse the entire file upfront into in-memory data structures before you can access any of it.)
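The mmap() approach can be sketched as follows. This is a minimal illustration for POSIX systems, not code from the Cap'n Proto library: MappedFile is a hypothetical helper name, and the Cap'n Proto calls are shown only in the trailing comment (with MyStruct standing in for your own schema type) so that the block compiles without the library installed.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Cap'n Proto): maps a whole file read-only
// and unmaps it on destruction. Pages are loaded lazily by the OS on access.
class MappedFile {
public:
  explicit MappedFile(const std::string& path) {
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) throw std::runtime_error("cannot open " + path);

    struct stat st;
    if (::fstat(fd, &st) != 0) {
      ::close(fd);
      throw std::runtime_error("cannot stat " + path);
    }
    assert(st.st_size >= 0);
    size_ = static_cast<std::size_t>(st.st_size);

    if (size_ > 0) {  // mmap() rejects zero-length mappings
      void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
      if (p == MAP_FAILED) {
        ::close(fd);
        throw std::runtime_error("cannot mmap " + path);
      }
      data_ = p;
    }
    ::close(fd);  // the mapping stays valid after the descriptor is closed
  }

  ~MappedFile() {
    if (data_ != nullptr) ::munmap(data_, size_);
  }

  MappedFile(const MappedFile&) = delete;
  MappedFile& operator=(const MappedFile&) = delete;

  const void* data() const { return data_; }
  std::size_t sizeInBytes() const { return size_; }
  // Cap'n Proto measures message sizes in 8-byte words.
  std::size_t sizeInWords() const { return size_ / 8; }

private:
  void* data_ = nullptr;
  std::size_t size_ = 0;
};

// With <capnp/serialize.h> available, the mapping could be read roughly like:
//   MappedFile file("data.bin");  // hypothetical file name
//   auto words = kj::arrayPtr(
//       reinterpret_cast<const capnp::word*>(file.data()), file.sizeInWords());
//   capnp::FlatArrayMessageReader reader(words);
//   auto root = reader.getRoot<MyStruct>();  // MyStruct: your schema's root type
```

The MappedFile object has to outlive the reader and anything obtained from it, because the reader points into the mapped pages rather than copying them.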

Note that an individual List value in a Cap'n Proto structure has a limit of 2^29-1 elements. Text and Data (strings and byte blobs) are special kinds of lists, so this implies that any single contiguous text or byte blob is limited to just under 512 MiB (2^29-1 bytes). However, you can have multiple such blobs, so larger data can be stored in a single file by splitting it into pieces.
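To illustrate the splitting, here is a small sketch that plans how a large payload could be cut into pieces that each fit in one Data value. The names kMaxListElements, ChunkRange and planChunks are made up for this example; how the pieces are stored (for instance in a List(Data) field of your own schema) is left to the schema and is an assumption, not something Cap'n Proto prescribes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Maximum element count of a single Cap'n Proto list, per the limit above.
// For Data and Text the elements are bytes: 2^29 - 1 = 536,870,911.
constexpr std::uint64_t kMaxListElements = (std::uint64_t{1} << 29) - 1;

// One piece of the original payload (hypothetical helper type).
struct ChunkRange {
  std::uint64_t offset;  // start of the piece within the payload, in bytes
  std::uint64_t length;  // size of the piece, in bytes
};

// Plans consecutive pieces of at most maxChunk bytes covering totalBytes.
std::vector<ChunkRange> planChunks(std::uint64_t totalBytes,
                                   std::uint64_t maxChunk = kMaxListElements) {
  assert(maxChunk > 0 && maxChunk <= kMaxListElements);
  std::vector<ChunkRange> chunks;
  std::uint64_t offset = 0;
  while (offset < totalBytes) {
    std::uint64_t remaining = totalBytes - offset;
    std::uint64_t length = remaining < maxChunk ? remaining : maxChunk;
    chunks.push_back({offset, length});
    offset += length;  // never exceeds totalBytes, so it cannot overflow
  }
  return chunks;
}
```

For example, a 1 GiB payload (2^30 bytes) needs three pieces with the default limit: two of 2^29-1 bytes and a final one of 2 bytes. In practice a smaller, rounder chunk size is probably more convenient.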

Note also that most Cap'n Proto implementations by default impose a "traversal limit" when reading a Cap'n Proto structure in order to defend against malicious data containing pointer loops. Typically this defaults to 64MiB. For larger data, you'll want to raise the limit -- in C++, set traversalLimitInWords on a custom ReaderOptions and pass it to the MessageReader constructor.
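As a rough sketch of picking a larger limit: the limit is expressed in 8-byte words, so a 64 MiB default corresponds to 8 * 1024 * 1024 words. The helper below (traversalLimitForFile, a name invented for this example) derives a limit from the file size and the number of times you expect to walk the data, since repeated reads count against the limit again. The Cap'n Proto calls appear only in the trailing comment so the block compiles without the library.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kBytesPerWord = 8;
// 64 MiB expressed in words, matching the default mentioned above.
constexpr std::uint64_t kDefaultTraversalLimitInWords = 8ull * 1024 * 1024;

// Hypothetical helper: a traversal limit (in words) large enough to walk a
// file of fileBytes bytes `passes` times, never below the default.
// Assumes fileBytes * passes stays far below 2^64.
std::uint64_t traversalLimitForFile(std::uint64_t fileBytes,
                                    std::uint64_t passes = 1) {
  assert(passes >= 1);
  std::uint64_t words = fileBytes / kBytesPerWord;
  if (fileBytes % kBytesPerWord != 0) ++words;  // round up to whole words
  std::uint64_t needed = words * passes;
  return needed > kDefaultTraversalLimitInWords ? needed
                                                : kDefaultTraversalLimitInWords;
}

// With <capnp/serialize.h> available, the value could be applied roughly like:
//   capnp::ReaderOptions options;
//   options.traversalLimitInWords = traversalLimitForFile(fileSizeInBytes);
//   capnp::FlatArrayMessageReader reader(words, options);
```

Whether to size the limit from the file like this or simply set it very high depends on how much you trust the input; for untrusted data a tighter limit keeps the protection meaningful.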

Question:

When calling Cap'n Proto's writeMessageToFd(pipe, message), I get this error:

terminate called after throwing an instance of 'kj::ExceptionImpl'
  what():  src/kj/io.c++:323: failed: ::writev(fd, current, iov.end() - current): Bad file descriptor; fd = -1
stack: 0x7efead69cf89 0x7efead6a0c7f 0x7efead6a2648 0x7efead6a24f7 0x7efead8f40b7 0x7efead8f42a4 0x402c7b 0x402a36 0x4028df 0x7efeabd50e50 0x7efeabd5181a 0x7efeabd52669 0x7efeabd52a03 0x7efeabd52bb2 0x402865 0x4027ab

Answer:

You haven't really asked a question, but the exception tells you what went wrong: writeMessageToFd was called with an invalid file descriptor (the exception text says "Bad file descriptor; fd = -1").

You have two options:

- don't call that function if pipe == -1 (probably best, you should really have checked that the call which returned pipe didn't return -1)
- surround your call to writeMessageToFd() with a try/catch and handle the exception appropriately

You should really go with the former and handle a -1 value in pipe appropriately.
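A minimal sketch of the first option, assuming the descriptor comes from POSIX pipe() (the question does not show how pipe was obtained, so this is a guess). The helper names isOpenFd and openPipe are invented for this example, and the Cap'n Proto call is shown only in the trailing comment.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <cassert>

// True if fd refers to a descriptor that is currently open in this process.
bool isOpenFd(int fd) {
  if (fd < 0) return false;
  return ::fcntl(fd, F_GETFD) != -1;  // fails with EBADF for a closed fd
}

// Creates a pipe. On failure returns false and leaves both ends at -1,
// so the caller cannot accidentally pass a stale value on.
bool openPipe(int& readEnd, int& writeEnd) {
  readEnd = -1;
  writeEnd = -1;
  int fds[2];
  if (::pipe(fds) != 0) return false;
  assert(isOpenFd(fds[0]) && isOpenFd(fds[1]));
  readEnd = fds[0];
  writeEnd = fds[1];
  return true;
}

// With <capnp/serialize.h> available, the call site could look roughly like:
//   int readEnd, writeEnd;
//   if (!openPipe(readEnd, writeEnd)) {
//     std::perror("pipe");  // report errno and bail out instead of writing
//     return 1;
//   }
//   capnp::writeMessageToFd(writeEnd, message);
```

The point is simply that the failure is handled where the descriptor is created, so an fd of -1 never reaches writeMessageToFd.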