Why Segment Chose Go, gRPC, and Envoy to Build Their New Config API (stackshare.io)
125 points by sahin on Dec 15, 2018 | 28 comments


OP here. There's a lot more to say about this stack, particularly what we've learned running gRPC and Envoy over the past six months. But overall I think the service framework and service proxy pattern is really powerful.


Agreed. Lightweight, real-time distributed tracing creates a feedback loop that can in turn inform the design of your application, even if you are only hosting a dozen microservice nodes instead of thousands. And if you're familiar with Go channel patterns such as timeouts, retries, etc., it all feels like second nature.
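For anyone unfamiliar, here's a minimal sketch of the kind of channel-based timeout pattern being referred to (the service call is a stand-in for a real RPC):

    package main

    import (
        "fmt"
        "time"
    )

    // callService stands in for an RPC to a downstream microservice.
    func callService(out chan<- string) {
        time.Sleep(50 * time.Millisecond) // simulated network latency
        out <- "response"
    }

    func main() {
        out := make(chan string, 1)
        go callService(out)

        // Race the response against a deadline; whichever channel is
        // ready first wins the select.
        select {
        case resp := <-out:
            fmt.Println("got:", resp)
        case <-time.After(100 * time.Millisecond):
            fmt.Println("timed out; retry or fail fast here")
        }
    }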

Expect more on the state of the art as KubeCon vids come online ;)

https://www.youtube.com/channel/UCvqbFHwN-nwalWPjPUKpvTA/vid...


Looking forward to that article. I've read a lot about the features that Envoy and gRPC provide, but it'd be really interesting to know how they mesh together and what problems they actually help solve.


I would love to hear how you are managing proto files across repositories. Do you have a central repository for them? It's something we're trying to figure out with our gRPC implementation.


We opted to store proto files in a singular location. In my view, this has three benefits:

1. Protobufs importing each other becomes effortless. You don't need to download multiple repositories; if you have the proto repo, you have all the dependencies.

2. Decoupling "ownership" allows for a bit more freedom when creating proto files that describe shared data structures.

3. You get a single place to browse your infrastructure. It doesn't describe relationships, of course, but it's handy to be able to see what services you have and what their APIs look like.

Disclaimer: I don't use gRPC, but I do use Protobufs with another RPC framework, Twirp.


This is something I've pondered myself but haven't had a chance to play around with. My idea was simply to integrate it with make (or whatever build tool/script you prefer) so that it pulls the .proto file from master in git unless you specify a branch. The downside is that you aren't versioning the .proto file with the client.

It's probably best to keep the proto file separate from both the server and the client and treat it as a dependency.

If you're using a mono-repo the solution is possibly a little simpler.


I recently built something like this: https://github.com/reverbdotcom/protopkg


OP here.

We keep them all in one repo.

Right now the entire API is a monorepo so it’s easy. But I plan to split it up soon...

All the protos in one repo. Ideally public so Segment users can use it as a reference too!

Then all the generated Go clients, servers, mocks, TypeScript, Swagger, etc. go in another repo. This is what programs can import.

Then we can build gRPC services in their own repos if we want to. And it’s easy to produce and consume messages for all the other services.
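To illustrate that last layer, a consuming program would just import the generated package. The module path, service, and RPC names below are hypothetical, not Segment's actual layout:

    package main

    import (
        "context"
        "log"
        "time"

        "google.golang.org/grpc"

        // Hypothetical generated-code repo; one import pulls in the
        // client, message types, and mocks built from the proto repo.
        configpb "github.com/example/grpc-gen-go/config/v1"
    )

    func main() {
        conn, err := grpc.Dial("config-api.internal:8080", grpc.WithInsecure())
        if err != nil {
            log.Fatalf("dial: %v", err)
        }
        defer conn.Close()

        client := configpb.NewConfigServiceClient(conn) // generated constructor
        ctx, cancel := context.WithTimeout(context.Background(), time.Second)
        defer cancel()

        // GetWorkspace is a hypothetical RPC defined in the central protos.
        if _, err := client.GetWorkspace(ctx, &configpb.GetWorkspaceRequest{Id: "ws_123"}); err != nil {
            log.Fatalf("GetWorkspace: %v", err)
        }
    }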


Sounds like a great followup article ready to be written.


I work in a small shop, much "behind the times", and recently tried introducing gRPC. It went okay. The problem was that it was a hard sell: for our limited needs, a lot of gRPC's features addressed things we were already happily solving with REST, and the more complex features REST doesn't support were things we didn't feel we needed. Altogether, gRPC was a big burden for me to push, with little payoff for people. I think I was the only one who liked it, and even I disliked parts of it.

So before pushing the migration too far, I decided to take a step back and assess which benefits we were actually getting from gRPC. Mainly it was the Protobuf spec, the RPC design (i.e., not REST verbs and whatnot), and code generation.

Because those were the things we all agreed upon, I went with Twirp[0]. It's been a great experience so far. The rollout is just a bit ahead of where our gRPC was, and it's been mostly effortless.

The most effortless thing? I've got an (internal) API user switching from the old REST API to the proto RPC spec, and all they're changing is the URL and a few field naming schemes. It's still JSON for them. They're not generating code, they're not doing anything. It all "just works", which is pretty impressive.

TL;DR: We love simpler options that let us gain the benefits of Protobufs, without the complexity of gRPC when we don't need it.

[0]: https://blog.twitch.tv/twirp-a-sweet-new-rpc-framework-for-g...


I work on gRPC. Twirp is a serious competitor to gRPC: it's easier to use (as you noticed) and gets most of the benefits just by using Protobuf. It's a problem for us because it's hard to explain the value we add.

For me, the reason to use gRPC is that it has a very good story about the boring details of networking. The stuff that would make other programmers' eyes glaze over but is nonetheless important has basically been solved by gRPC. What happens when the connection gets dropped but you haven't sent the RPC yet? Does it try to find another server to send it to? Does it time out? What happens if the DNS entries change after you connect? Does it even use all the addresses that DNS returned? If the connection has been idle for 30 minutes, does it keep it around or shut it down? How does it even know the connection is still good after being idle so long? Home routers have a bad habit of breaking connections due to their NAT.

For most of these problems, TCP and UDP only give you the tools to solve them; they don't solve them out of the box. Coming back to Twirp, I don't recall seeing these things addressed in its code (beyond what Go's net and http libraries already provide). If you don't care about those networking details (and it's completely legitimate not to; some people don't need to), then I think Twirp's API is much more pleasant to use, and still gets you 80% of the benefit of gRPC.
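To make a couple of those details concrete, here's a minimal sketch of how grpc-go exposes them as client-side dial options (the target and timings are illustrative):

    package main

    import (
        "log"
        "time"

        "google.golang.org/grpc"
        "google.golang.org/grpc/keepalive"
    )

    func main() {
        conn, err := grpc.Dial(
            "dns:///config-api.internal:8080", // use every address DNS returns
            grpc.WithInsecure(),
            grpc.WithBalancerName("round_robin"), // spread RPCs across backends
            grpc.WithKeepaliveParams(keepalive.ClientParameters{
                Time:                30 * time.Second, // ping after this much idle time
                Timeout:             10 * time.Second, // declare dead if no ack
                PermitWithoutStream: true,             // ping even with no active RPCs
            }),
        )
        if err != nil {
            log.Fatalf("dial: %v", err)
        }
        defer conn.Close()
    }

Those keepalive pings are exactly what catches the NAT-broken idle connections mentioned above.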


gRPC is the enterprise-ready, high-performance RPC infrastructure; I cannot believe Google still can't figure out the right pitch.

There isn't anything in the world that can match gRPC's features and still perform at the same level, plus it has Google's decades of large-scale use behind it.

Guys, I am so disappointed at how Google pushes its internal technology. People won't come to you without pitching; be smart...


PM for gRPC here. Feedback received! :) What would you like to see us do more of?


Unrelated to the above comment, but could you all please maintain an Ubuntu PPA with binary builds? We're trying to use gRPC at my shop in C++ and the build is a huge pain.


HTTP 1.1 support would help those of us working in AWS with ELBs


Is the GRPC API less pleasant because you have to handle those details? Or could it be evolved to be more user friendly?


OP here.

Agree completely that .proto offers big advantages on its own.

Also agree that gRPC is complex and challenging. We considered Twirp too.

A few things that pushed us towards gRPC...

Canonical error codes and semantics. Standards here go a long way toward building reliable systems (there's a small sketch of this at the end of this comment).

The ecosystem. There are a lot of middlewares, validators, transcoders written specifically for the Go gRPC stack.

Envoy. This has a lot of things we wanted, like rate limiting and auth, which are implemented over gRPC. With AWS adopting Envoy for their service mesh offering, it's a safe technology bet now.

Performance. This doesn't matter for our API stack but could be a huge gain in Segment's data plane.

That said I still wish all this was simpler. Twirp looks like a great choice in that regard.
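The error-code sketch mentioned above, using grpc-go's status package (the workspace service and lookup are hypothetical):

    package server

    import (
        "context"

        "google.golang.org/grpc/codes"
        "google.golang.org/grpc/status"
    )

    type Workspace struct{ ID string }

    // GetWorkspace returns canonical gRPC codes, which every client,
    // middleware, and proxy (including Envoy) interprets the same way.
    func GetWorkspace(ctx context.Context, id string) (*Workspace, error) {
        if id == "" {
            return nil, status.Error(codes.InvalidArgument, "id is required")
        }
        ws, ok := lookup(id) // hypothetical store lookup
        if !ok {
            return nil, status.Error(codes.NotFound, "workspace not found")
        }
        return ws, nil
    }

    func lookup(id string) (*Workspace, bool) { return &Workspace{ID: id}, id == "ws_123" }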


This might be a dumb question, but why translate REST to gRPC and not expose the gRPC services directly?


Browsers don't speak gRPC natively. There is a separate spec for browser support, but it's a very new project: https://github.com/grpc/grpc-web/blob/master/README.md


OP here. We considered this, but decided against it for a couple reasons.

First and foremost, our customers universally favor REST APIs. GraphQL and gRPC are better tech-wise, but REST is more familiar.

Next, gRPC uses pure HTTP/2, which doesn't work with an AWS ALB.

Thanks to the gRPC ecosystem, the HTTP/1 and REST transcoding is just an Envoy config.


Loved the article, no criticisms of it, but I hate all the highlighted keywords. They made it really hard to scan the page because the grey blobs just grab my eyes and won't let go.


PM for gRPC here. Thank you for writing this up! Do you folks secure inter-service RPCs in some way, e.g. mTLS? How do you see your stack evolving over time?


I'm curious how you're doing authorization in Envoy. Is it a JWT-based httpfilter?


OP here.

We use the Envoy authz filter.

For every incoming request, Envoy first calls out to a custom auth check service with all the request metadata, like the path and HTTP headers.

The auth service can return a "fail" response, which tells Envoy not to forward the original request any further. Or it can return a "pass" response plus data to add to the original request's headers.

Docs here:

https://www.envoyproxy.io/docs/envoy/latest/configuration/ht...
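For reference, a minimal sketch of what such a check service can look like in Go, using go-control-plane's v2 ext_authz API (the token check and injected header are placeholders, not Segment's actual logic):

    package main

    import (
        "context"
        "log"
        "net"

        core "github.com/envoyproxy/go-control-plane/envoy/api/v2/core"
        auth "github.com/envoyproxy/go-control-plane/envoy/service/auth/v2"
        rpcstatus "google.golang.org/genproto/googleapis/rpc/status"
        "google.golang.org/grpc"
        "google.golang.org/grpc/codes"
    )

    type authServer struct{}

    // Check receives the request metadata (path, headers, etc.) from Envoy
    // and decides whether the original request may proceed.
    func (s *authServer) Check(ctx context.Context, req *auth.CheckRequest) (*auth.CheckResponse, error) {
        headers := req.GetAttributes().GetRequest().GetHttp().GetHeaders()

        // Placeholder check; real logic would validate a token or session.
        if headers["authorization"] == "Bearer valid-token" {
            return &auth.CheckResponse{
                Status: &rpcstatus.Status{Code: int32(codes.OK)},
                HttpResponse: &auth.CheckResponse_OkResponse{
                    OkResponse: &auth.OkHttpResponse{
                        // "Pass" plus extra data added to the upstream request headers.
                        Headers: []*core.HeaderValueOption{{
                            Header: &core.HeaderValue{Key: "x-user-id", Value: "user-123"},
                        }},
                    },
                },
            }, nil
        }

        // "Fail": Envoy will not forward the original request.
        return &auth.CheckResponse{
            Status: &rpcstatus.Status{Code: int32(codes.PermissionDenied)},
        }, nil
    }

    func main() {
        lis, err := net.Listen("tcp", ":9001")
        if err != nil {
            log.Fatalf("listen: %v", err)
        }
        srv := grpc.NewServer()
        auth.RegisterAuthorizationServer(srv, &authServer{})
        log.Fatal(srv.Serve(lis))
    }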


Can't speak for OP, but we have a similar stack and use middleware on the actual services, since they need to decide which permissions are required, rather than doing centralized auth at the gateway level.


The funny thing with Envoy is that you can take the source of the authz service and build your own auth mechanism for other protocols. We are doing that for MQTT.


I am getting an application error on page load. Can't load page.


I've been using lots of Go lately.

In the analytics space, for example, we built endpoints in Go which collect data and post it to BigQuery or Redshift, helping startups create their data pipelines.

We use Go everywhere whenever we need to glue things together.

One cool thing about Go is that it doesn't take much time to understand what's happening in the codebase, which we've found is good for avoiding mistakes in an app receiving billions of events per week.



