Retina – eBPF distributed networking observability tool for Kubernetes (github.com/microsoft)
166 points by boratanrikulu on March 19, 2024 | hide | past | favorite | 29 comments


Red Hat has a similar eBPF-based tool: https://github.com/netobserv/network-observability-operator (disclaimer: I work on it). The cool thing, IMHO, about Retina, Red Hat NetObserv, or Pixie is that they aren't tied to a specific CNI. One problem that arises as these tools multiply is potential conflicts and a lack of collaboration between eBPF-based tools. A project called bpfman aims to address this.


I've used this in OpenShift and it is -very- neat. It has saved incredible amounts of time troubleshooting problems.


You can also check out DeepFlow [1], where we implemented distributed tracing for microservices using eBPF, which of course also includes observability of K8s networks.

1. https://deepflow.io


Probably this means it's not for me, but what is this useful for?

Anyone using this in their prod set-up and has a scenario where they found this useful?


Just today at $DAYJOB we had a complaint that one team's Azure Kubernetes cluster was "slow". About the only metric out of the norm was network traffic, but the lack of detailed instrumentation meant we couldn't really isolate the cause to a specific container or process.


If the only explanation someone can provide is that "a cluster is slow", the issue isn't with network observability. They need to do at least the minimum level of analysis before escalating.


Yes, that would be great, but unfortunately there are application teams (particularly in the enterprise) that lack such discipline when blaming infrastructure for issues.

Good old silos are alive and well, and ownership is not always part of the culture.


In our case the expected golden path is that once our team figures out the proper procedure, we will establish it for the downstream teams that directly support the application teams.

So at least in theory things are somewhat well set up, but there's too much siloing at our level (wildly separate network teams, teams for specific clouds, etc.)


Unfortunately, we were the right team for escalation before escalating to the network team.


It’s like Cilium + Hubble, but useful when you don’t or can’t run Cilium. It uses eBPF to collect metrics and stats on what flows where, and can record an impressive amount of stuff without any required instrumentation of your applications. Amazingly handy when you run both first-party and third-party apps in your K8s cluster. The network maps these tools produce are handy too.

Although, Cilium is pretty great, so not sure why you wouldn’t run it, given the option…


Cilium has been bought out by Cisco, so its monetization is only a matter of time.

Also, not everyone needs to implement a service mesh.


Cilium is a CNCF project and Apache 2 licensed. Isovalent and their enterprise product were bought by Cisco.


Neither Cilium nor Hubble are service meshes.

Cilium is a CNI - the component that provides the K8s cluster's inter-pod networking. The fact that it uses eBPF to deliver its functionality is what gives it the impressive observability you usually only get from a service mesh. I agree that not everyone needs a service mesh.


You are right that cilium is a CNI but one of its many features is also providing a sidecar-less service mesh using eBPF. https://cilium.io/use-cases/service-mesh/


Haven't used this, but I tried out Pixie to debug where outgoing traffic was coming from and where it was going. I was fairly successful, although Pixie wasn't very stable and had a lot of issues causing crashes.

In this case, we had a couple services talking to 3rd party services running on AWS so it wasn't obvious from generic flow logs.

I also used Lacework a couple years ago which is eBPF based and it was pretty trivial to see things phoning home or one off maintenance where a new connection was being initiated.


The linked docs provide more info: https://retina.sh/docs/intro


There is a flood of observability tools based on eBPF coming out these days [1]. eBPF is used to collect metrics without the need to instrument the code.

--

1: https://ebpf.io/applications/


DeepFlow [1][2] is one of them, where we implemented distributed tracing for microservices using eBPF.

1. https://deepflow.io 2. https://github.com/deepflowio/deepflow


See also: Network Mapper - a low-privilege, non-eBPF network observability tool for K8s

https://hackernews.hn/item?id=39761114


Speaking of observability tools. Anybody here know how to gather more in-depth metrics on mTLS requests? Have an internal (self signed) CA and just want to know which issued certs are presented to nodes. Would be nice to get cert serial number and other metadata as well


https://github.com/microsoft/retina/issues/85

That is a very interesting ask; let me raise an issue against the repo and see how we can solve this with eBPF. I am pretty sure this is a very common problem for a lot of kube admins.


oh, thank you! Will follow the issue


What's the performance impact of using VM-based eBPF for these tasks instead of native code?


Linux eBPF is JIT-compiled. It has native performance.


Even WASM doesn't have native performance, so I question this claim.


Sigh, anyone have a name suggestion for a (Rust) RTSP library? https://crates.io/crates/retina


Don't change the name of this crate, it is in an unrelated domain. (Also unrelated to retina displays and uhh.. this Brazilian intrusion detection system https://sunsoftware.com.br/retina/ and whatever this thing is https://retina.ai/ among other things)



Name collisions for open-source projects stopped mattering a long time ago.



