gRPC: make a high-throughput client in Java/Scala

I have a service that transfers messages at quite a high rate.

Currently it is served by akka-tcp and it achieves 3.5M messages per minute. I decided to give gRPC a try. Unfortunately, it resulted in much lower throughput: ~500k messages per minute, and sometimes even less.

Could you please recommend how to optimize it?

My setup

Hardware: 32 cores, 24 GB heap.

grpc version: 1.25.0

Message format and endpoint

The message is basically a binary blob. The client streams 100K–1M (and more) messages into the same request (asynchronously); the server doesn't respond with anything, and the client uses a no-op observer:

service MyService {
    rpc send (stream MyMessage) returns (stream DummyResponse);
}

message MyMessage {
    int64 someField = 1;
    bytes payload = 2;  //not huge
}

message DummyResponse {
}
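The sending side can be sketched like this (a minimal sketch: the generated stub and message classes — `MyServiceGrpc`, `MyMessage`, `DummyResponse` — are assumed from the proto above, and the host/port are illustrative):

```java
import com.google.protobuf.ByteString;
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import io.grpc.stub.StreamObserver;

public class AsyncSender {
    static final long MESSAGES = 1_000_000;

    public static void main(String[] args) {
        ManagedChannel channel = ManagedChannelBuilder
                .forAddress("localhost", 50051)  // illustrative endpoint
                .usePlaintext()
                .build();

        MyServiceGrpc.MyServiceStub stub = MyServiceGrpc.newStub(channel);

        // No-op observer: the server never sends meaningful responses.
        StreamObserver<DummyResponse> noop = new StreamObserver<DummyResponse>() {
            @Override public void onNext(DummyResponse r) { }
            @Override public void onError(Throwable t) { t.printStackTrace(); }
            @Override public void onCompleted() { }
        };

        // One long-lived client stream carrying all messages.
        StreamObserver<MyMessage> requests = stub.send(noop);
        byte[] blob = new byte[64];  // "not huge" payload
        for (long i = 0; i < MESSAGES; i++) {
            requests.onNext(MyMessage.newBuilder()
                    .setSomeField(i)
                    .setPayload(ByteString.copyFrom(blob))
                    .build());
        }
        requests.onCompleted();
    }
}
```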

Problems: The message rate is low compared to the akka implementation. I observe low CPU usage, so I suspect the gRPC call is actually blocking internally even though it is documented as non-blocking. Calling onNext() indeed doesn't return immediately, but GC is also on the table.

I tried to spawn more senders to mitigate this issue, but that didn't yield much improvement.

My findings: gRPC actually allocates an 8 KB byte buffer for each message when serializing it. See the stack trace:

java.lang.Thread.State: BLOCKED (on object monitor)
    at com.google.common.io.ByteStreams.createBuffer(ByteStreams.java:58)
    at com.google.common.io.ByteStreams.copy(ByteStreams.java:105)
    at io.grpc.internal.MessageFramer.writeToOutputStream(MessageFramer.java:274)
    at io.grpc.internal.MessageFramer.writeKnownLengthUncompressed(MessageFramer.java:230)
    at io.grpc.internal.MessageFramer.writeUncompressed(MessageFramer.java:168)
    at io.grpc.internal.MessageFramer.writePayload(MessageFramer.java:141)
    at io.grpc.internal.AbstractStream.writeMessage(AbstractStream.java:53)
    at io.grpc.internal.ForwardingClientStream.writeMessage(ForwardingClientStream.java:37)
    at io.grpc.internal.DelayedStream.writeMessage(DelayedStream.java:252)
    at io.grpc.internal.ClientCallImpl.sendMessageInternal(ClientCallImpl.java:473)
    at io.grpc.internal.ClientCallImpl.sendMessage(ClientCallImpl.java:457)
    at io.grpc.ForwardingClientCall.sendMessage(ForwardingClientCall.java:37)
    at io.grpc.ForwardingClientCall.sendMessage(ForwardingClientCall.java:37)
    at io.grpc.stub.ClientCalls$CallToStreamObserverAdapter.onNext(ClientCalls.java:346)
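A blocked onNext() usually means the sender is outrunning the transport's write buffer. grpc-java exposes manual flow control via ClientCallStreamObserver (isReady() / setOnReadyHandler()); a sketch, with MyMessage/DummyResponse assumed to be the generated types from the proto above:

```java
import io.grpc.stub.ClientCallStreamObserver;
import io.grpc.stub.ClientResponseObserver;
import java.util.Iterator;

// Transport-aware sending: only call onNext() while the HTTP/2 transport has
// room, instead of queueing unboundedly on the client side.
class FlowControlledSender
        implements ClientResponseObserver<MyMessage, DummyResponse> {

    private final Iterator<MyMessage> source;
    private boolean completed = false;

    FlowControlledSender(Iterator<MyMessage> source) {
        this.source = source;
    }

    @Override
    public void beforeStart(ClientCallStreamObserver<MyMessage> requestStream) {
        // The handler fires each time the transport becomes writable again.
        requestStream.setOnReadyHandler(() -> {
            while (requestStream.isReady() && source.hasNext()) {
                requestStream.onNext(source.next());
            }
            if (!source.hasNext() && !completed) {
                completed = true;
                requestStream.onCompleted();
            }
        });
    }

    @Override public void onNext(DummyResponse value) { }
    @Override public void onError(Throwable t) { t.printStackTrace(); }
    @Override public void onCompleted() { }
}
```

Passing such an observer to the async stub's send(...) makes grpc-java invoke beforeStart() before the stream opens, so all writes go through the on-ready handler.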

Any help with best practices on building high-throughput grpc clients appreciated.

I solved the issue by creating several ManagedChannel instances per destination. Although articles say that a single ManagedChannel can spawn enough connections on its own, so one instance should be enough, that wasn't true in my case.
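Sketched, this looks like a small pool of independent channels (and thus independent HTTP/2 connections) to the same backend, picked round-robin per stream; class and method names here are illustrative:

```java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import java.util.concurrent.atomic.AtomicLong;

// N channels to one destination; each channel owns its own connection,
// so streams spread across them are not serialized onto a single socket.
class MultiChannelPool {
    private final ManagedChannel[] channels;
    private final AtomicLong counter = new AtomicLong();

    MultiChannelPool(String host, int port, int size) {
        channels = new ManagedChannel[size];
        for (int i = 0; i < size; i++) {
            channels[i] = ManagedChannelBuilder.forAddress(host, port)
                    .usePlaintext()
                    .build();
        }
    }

    // Round-robin: the (counter mod size) index cycles over the pool.
    ManagedChannel next() {
        return channels[(int) (counter.getAndIncrement() % channels.length)];
    }
}
```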

Performance is now on par with the akka-tcp implementation.

Interesting question. Computer network packets are encoded using a stack of protocols, and each protocol is built on top of the specification of the one below it. Hence the performance (throughput) of a protocol is bounded by the performance of the protocol used to build it, since you are adding extra encoding/decoding steps on top of the underlying one.

For instance, gRPC is built on top of HTTP/2, which is a protocol at the application layer (L7), and as such its performance is bounded by the performance of HTTP. HTTP itself is built on top of TCP, at the transport layer (L4), so we can deduce that gRPC throughput cannot be larger than that of equivalent code served directly at the TCP layer.

In other words: if your server is able to handle raw TCP packets, how would adding new layers of complexity (gRPC) improve performance?

I'm quite impressed with how well Akka TCP has performed here :D

Our experience was slightly different. We were working on much smaller instances using Akka Cluster. For Akka remoting, we switched from Akka TCP to UDP using Artery and achieved a much higher rate plus lower, more stable response times. There is even a config setting in Artery that helps balance CPU consumption against response time from a cold start.

My suggestion is to use a UDP-based framework that also takes care of transmission reliability for you (e.g. that Artery UDP), and just serialize using Protobuf instead of using full-fledged gRPC. The HTTP/2 transmission channel is not really designed for high-throughput, low-response-time purposes.
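The Artery settings mentioned above look roughly like this in application.conf (values illustrative; advanced.idle-cpu-level is the CPU-vs-latency knob referred to):

```hocon
akka {
  remote {
    artery {
      enabled = on
      transport = aeron-udp          # UDP via Aeron instead of classic TCP remoting
      canonical.hostname = "127.0.0.1"
      canonical.port = 25520
      # Trades CPU spinning for wake-up latency: 1 (low CPU) .. 10 (low latency)
      advanced.idle-cpu-level = 5
    }
  }
}
```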

Comments
  • Are you using Protobuf? This code path should only be taken if the InputStream returned by MethodDescriptor.Marshaller.stream() does not implement Drainable. The Protobuf Marshaller does support Drainable. If you are using Protobuf, is it possible a ClientInterceptor is changing the MethodDescriptor?
  • @EricAnderson thank you for your response. I tried the standard protobuf with Gradle (com.google.protobuf:protoc:3.10.1, io.grpc:protoc-gen-grpc-java:1.25.0) and also scalapb. This stack trace was probably indeed from scalapb-generated code. I removed everything related to scalapb, but it didn't help much with respect to performance.
  • @EricAnderson I solved my problem. Pinging you as a developer of grpc. Does my answer make sense?
  • ManagedChannel (with built-in LB policies) does not use more than one connection per backend. So if you are high-throughput with few backends it is possible to saturate the connections to all the backends. Using multiple channels can increase performance in those cases.
  • @EricAnderson thanks. In my case spawning several channels even to a single backend node has helped
  • The fewer the backends and the higher the bandwidth, the more likely you need multiple channels. So "single backend" makes it more likely that more channels will help.
  • For exactly that reason I use the streaming approach: I pay once for establishing an HTTP connection and send ~300M messages over it. It uses websockets under the hood, which I expect to have relatively low overhead.
  • For gRPC you also pay once for establishing a connection, but you have the extra burden of parsing protobuf. Anyway, it's hard to make guesses without more information, but I would bet that, in general, since you are adding extra encoding/decoding steps in your pipeline, the gRPC implementation would be slower than the equivalent web socket one.
  • Akka adds some overhead as well. Anyway, a 5x slowdown looks like too much.
  • I think you may find this interesting: github.com/REASY/akka-http-vs-akka-grpc. In his case (and I think this extends to yours), the bottleneck may be due to high memory usage in protobuf (de)serialization, which in turn triggers more calls to the garbage collector.
  • Just out of curiosity, what was your issue after all?