Yo dawg, I heard you liked containers so I put a container inside your container!
In all seriousness though, I wonder if this could solve one of our issues with running multi-container apps in Azure, which is currently causing us some trouble with regard to VNETs and Private Endpoints. I’m sure it’s likely because we should be using Kubernetes or similar, or at the very least have someone who knows how to actually set up the Ops part of DevOps better than we currently do. Long story short, those things won’t realistically happen, and if we could solve them by running multiple podmans inside a Docker container deployed as an Azure app, then we could likely fix our security issue of not being able to set up VNETs and PEs on multi-container apps, a feature that has been “upcoming” for at least a year.
I’d be interested to know your thoughts on it. The efficiency doesn’t matter as much, our workloads are around 1-5% of our Azure budget which is basically a blip on the bigger budgets.
App Service won't allow the level of privilege needed to start the "head" container that enables container-in-container to work. Options like `--privileged` and `--cap-add SYS_ADMIN`, or mounting special volumes from the host, are the first step in "how to escape from a container", so shared container services don't allow them. The options might be more allowable in your own App Service Environment, but I don't think the API is even there to set the options in the first place.
Container-in-container also pushes a number of responsibilities from the App Service down into the head container, like ingress, service management, health, logging etc. You might as well run compose/swarm on your own VM(s) rather than reinvent those wheels.
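To see whether a given environment actually grants the capability mentioned above, one rough check is to read the effective capability mask the kernel reports in `/proc/<pid>/status`. This is a minimal sketch, assuming a Linux host; the function names are made up for illustration, and only the `CapEff:` line format and the bit number of `CAP_SYS_ADMIN` (21) come from the kernel.

```python
CAP_SYS_ADMIN = 21  # bit number of CAP_SYS_ADMIN in linux/capability.h


def effective_capabilities(status_text):
    """Return the CapEff mask found in /proc/<pid>/status content as an int."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The value is a hexadecimal bit mask, one bit per capability.
            return int(line.split(":", 1)[1].strip(), 16)
    raise ValueError("no CapEff line found")


def has_capability(status_text, cap_bit):
    """True if the given capability bit is set in the effective mask."""
    return bool(effective_capabilities(status_text) & (1 << cap_bit))
```

Inside a container, passing the contents of `/proc/self/status` to `has_capability(..., CAP_SYS_ADMIN)` would show whether the platform let the flag through. On a locked-down shared service the expectation is that it returns False.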
AKS + kompose might be a not too bad option. AKS can deploy on the private vnet and can do private endpoints. The cluster will probably fall over once a year for <reasons> but it will mostly manage itself. If you leave k8s to auto upgrade, run a microk8s instance as a test env somewhere and it will hit the upgrade issues before AKS releases a k8s version.
This is very useful for running containers in lightweight untrusted environments.
I personally use this setup for my GitLab CI runners, using a mix of podman and buildah to build containers. I can't 100% trust the container build script, so avoiding having to expose a Docker socket to that untrusted code is a critical safety layer.
I realize that Linux wasn't made for containerization originally and that's why it's so painful to nest containers, run them without root etc. But I wish we already lived in a world where all binaries (maybe with the exception of command line tools) ran containerized by default and I, as the user, had to grant explicit access to files, network, hardware/devices, etc. Not just for security reasons (though it'd be nice if `npm install` and `pip install` couldn't mess up my entire system) but also to make external dependencies more explicit.
Isn't that what Snap and Flatpak are doing? 1) Include dependencies (there can be shared runtimes but they don't mess up the host), 2) sandboxed by default and you need to grant permissions.
I built Packj [1] sandboxing for securing “pip/NPM install”. It uses strace for sandboxing and blocks access to sensitive files and limits traffic to known-good IP addresses.
It's a shame that multi-ring security never really made a comeback.
The ability to let any ring (such as a user ring) handle the library and system calls of an outer ring (such as a container ring) seems like it would be convenient.
Alternately, some microkernels have a more peer-to-peer setup which could work as well.
It's not so painful, and most of the pain comes from Linux's insistence on /proc-based APIs as opposed to something similar to the Solaris descendants, where it's a JSON spec (I think XML originally).
Sounds like android or what a modern linux distro should look like.
Sadly, both the most-used userspace distros and the kernel they come with are borderline dangerous and shouldn't be trusted by the people using them, even if they're technical enough to feel comfortable within their environment.
> Most of this has historically been related to Docker in Docker (DIND), but now, people also want to run Podman in Podman (PINP) or Podman in Docker (PIND).
Kind is super cool! I used it for my thesis, as I needed many nodes without getting super into k8s configuration. I love, love kind. If you're into learning Kubernetes, it's probably the easiest way to get yourself a cluster quickly.
For example, your CI/CD agent is running in a container, and has to build and test containerized software. You can use something like buildah to build your app containers without a container runtime, but you want to run them to run the tests.
Ideally, you would want to do this while remaining platform-agnostic, so your agents will be able to start a container no matter where they are themselves: on bare metal, in Podman, in Docker, in k8s.
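A platform-agnostic agent first has to work out where it is running. Below is a minimal sketch of that detection step, assuming the usual marker files and environment variable are present. The checks are heuristics rather than guarantees, and the function name and the `root`/`environ` parameters are hypothetical (they exist mainly so the logic can be exercised without a real container).

```python
import os
from pathlib import Path


def detect_container_environment(root="/", environ=None):
    """Return a best-effort label for where this process is running."""
    env = os.environ if environ is None else environ
    root_path = Path(root)
    # Kubernetes injects this variable into the containers of a pod.
    if "KUBERNETES_SERVICE_HOST" in env:
        return "kubernetes"
    # Podman creates /run/.containerenv inside containers it starts.
    if (root_path / "run" / ".containerenv").exists():
        return "podman"
    # Docker creates /.dockerenv at the root of the container filesystem.
    if (root_path / ".dockerenv").exists():
        return "docker"
    return "bare-metal-or-unknown"


if __name__ == "__main__":
    print(detect_container_environment())
```

An agent could branch on the result to decide whether it needs nested-container settings or can start containers directly.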
I was about to say the same thing. We currently use a podman container, deploy it in Kubernetes, build container images with it for CI, push them and scale down the podman container.
I've been reading every line of this blog for a week trying to get it to work, and failed.
The goal is container driven development. But if all my dev tools are inside a container, how do I iterate on building new containers within a container? This is the core feature I wanted.
But unfortunately, this challenge proved way too difficult to pull off and nothing seemed to work for me.
My guess is these setups heavily depend on the host machine and how podman was set up on it, which breaks the entire value of using container driven dev.
I ship containers to customers. Would be nice to run tests against them before releasing a container and it would be nice if those tests themselves could run in a container in order to have a clean test environment.
You could create that test environment image beforehand, mirror it and use it for the job that needs it. No need for adding another container within a container.
Ever since we started moving to Docker it's been a massive pain. I'm on a maintenance team doing both legacy (non-Docker) and the occasional Docker-based system that gets handed to us (and then a few months later taken back, it's kind of annoying), and not a single member of my team has gotten the Docker-based systems to work. We all end up with different issues getting Docker to run, or for the different containers to communicate, or being able to browse to the app running in the container. Different systems from different teams also have a tendency to interfere with each other. Two years after this started I finally got an admission from the main development team that every new dev on their team has had similar problems.
On the flipside, LXD containers have "just worked" for everyone who has tried it so far. We've been able to use them on a whim for testing stuff, no problems at all.
So I've been wondering for a while if we could use LXD containers to provide a clean slate to run the Docker containers inside of, maybe then we'd at least all have the same problems, if not be able to solve them entirely.
I use MicroOS (https://microos.opensuse.org/), to keep the base operating system clean you'd install helper tools for constructing containers in a container... so two levels of containers would be very helpful
Being able to run rootful stuff in rootless containers is nice for CI where you need, for example, to install a bunch of stuff or mount things. Some things require being root, and you might not want to give real root access to your CI.
Yes, there are other ways to do it, but adding another tool as an option is good, too. Being able to nest containers makes composability a little bit better, too. For instance, if I want an image that effectively has multiple base images, I can instead just run both base images in containers inside of the parent container. You can bundle a lot of related containers into an arbitrary app container. It's a natural expression of programming composition into a containerization context, in my opinion.
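The reason "root" in a rootless container is not real root is the user namespace ID mapping, which the kernel exposes in `/proc/<pid>/uid_map` as lines of `inside outside length`. Here is a minimal sketch of reading that mapping; the helper names are invented for illustration, and "outside" is treated as the parent namespace's ID, which is the common case.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map content into (inside, outside, length) tuples."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, length = (int(field) for field in line.split())
            entries.append((inside, outside, length))
    return entries


def host_id_for(inside_id, entries):
    """Translate an ID seen inside the namespace to the ID outside it."""
    for inside, outside, length in entries:
        if inside <= inside_id < inside + length:
            return outside + (inside_id - inside)
    return None  # the ID is not mapped in this namespace
```

With a mapping such as `0 1000 1` followed by `1 100000 65536`, UID 0 inside translates to the unprivileged UID 1000 outside, so anything done "as root" in the container happens with that user's rights on the host.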
Containers being nestable components makes a lot of natural wants much more natural to express.
Here's another one. I have an app which spawns subprocesses to do computations. I would love to put the subprocesses in containers to constrain their resource usage (I know that's not the only way to do it, but it's an effective and well-understood way to do it). But I would also like to be able to run my application in a container!
While using containers is common and well-understood, containers within containers are not. It's novel enough to warrant this blog post on how to do it!
If all you want is resource constraints on your spawned processes, it's easier and more common to just use cgroups. It's straightforward and you should have a working understanding of cgroups anyway if you want to be effective at using containers, which are built on top of cgroups.
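For a sense of how small the cgroup route is, here is a minimal sketch against the cgroup v2 file interface (`memory.max`, `cpu.max`, `cgroup.procs`). It assumes a unified v2 hierarchy, write access to the parent group (root or a delegated subtree), and that the memory and cpu controllers are enabled in the parent's `cgroup.subtree_control`. The function names and the `cgroup_root` parameter are hypothetical; pointing `cgroup_root` at an ordinary directory gives a dry run.

```python
from pathlib import Path


def cpu_max_value(cpu_fraction, period_us=100_000):
    """Format a cpu.max value: '<quota_us> <period_us>'."""
    quota_us = int(cpu_fraction * period_us)
    return f"{quota_us} {period_us}"


def limit_process(pid, name, memory_bytes, cpu_fraction,
                  cgroup_root="/sys/fs/cgroup"):
    """Create a child cgroup, set its limits, and move pid into it."""
    group = Path(cgroup_root) / name
    group.mkdir(exist_ok=True)
    # Hard memory limit in bytes.
    (group / "memory.max").write_text(f"{memory_bytes}\n")
    # CPU bandwidth: quota and period in microseconds.
    (group / "cpu.max").write_text(cpu_max_value(cpu_fraction) + "\n")
    # Writing a PID to cgroup.procs moves that process into the group.
    (group / "cgroup.procs").write_text(f"{pid}\n")
    return group
```

A parent process could call something like `limit_process(child.pid, "worker-1", 512 * 1024 * 1024, 0.5)` right after spawning a worker to cap it at 512 MiB and half a CPU.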
Cgroups are really easy to use and I feel like people aren't bothering to learn about them. :(
To be fair cgroupsv1 and the original tooling around them weren't so great. And people were stuck with them for aaaaaages because the cpu controller wasn't ported to v2 for so long. And then they were still stuck on v1 because of Docker...
Building containers within a CI system, for example. CI jobs run as images, with a DinD service attached, which they can use to build images themselves.
But even if I run in a container, e.g. a GitHub Actions runner, I can use buildah to build my image, right? So no need for another container. Why is another layer needed?
Overcomplicating the engineering (Docker) involved in getting some x86 code into RAM and onto a CPU stack by overcomplicating it further (DIND). It's essentially another way to abstract an abstraction, and engineers are finding ways to justify it.
It's not really over complicating anything, though? It makes perfect sense, it's not hard to understand. Nested VMs have been a thing for a looong time.
I honestly don't understand where the criticism for Docker and k8s comes from. Some of HN comments sound like old men shouting at the clouds because they don't understand modern technology and its purposes.
If you want, go right ahead with your managed, on-site VMs where you copy a .php file to an Apache server using a thumb drive and restart the service. I don't care. But that's not how modern teams are working.
I'd suggest you do more research. Plenty of teams in (very) big companies, handling (very) large workloads, are not doing what you'd want them to.
Docker? Docker is ideal for teams deploying quickly evolving (possibly stateless) applications across multiple platforms. If you have no need to deploy across multiple platforms, simply deploy directly to your OS using the best primitives possible.
Kubernetes? Excellent for when you need to do multiple, rapid deployments per day of small (ideally stateless) services. A good example of this is small services that serve up adverts in response to some "profile" of a visitor sent to it from a browser - these things evolve a lot, and often.
Abstracting your abstractions because you don't want to learn how to make the OS better or use what's already there isn't clever. Learning to use the tools is the important thing, and Docker and K8s are just tools.
In e.g. the website example, the parent podman container does everything from running the integration tests to converting a video of the test at the end into a slowed down GIF and combines it with screenshots to put in the docs. It orchestrates the child podman containers - playwright + the webapp.
I'm certain that if all this tooling were run on different host machines it would have a myriad of "works on my machine" problems all over the place - whether due to mac, WSL, weird linux distros, github actions idiosyncrasies or whatever.
I'm equally certain that if the tooling were bundled in the app container it would needlessly fatten it with unnecessary and potentially conflicting dependencies. I don't want video conversion tooling installed in my web app even if I do want it to generate my docs.
I just tried this out.
The new systemd directive OpenFile= opens up the possibility to pass the file descriptor of a file from the host to a container running in a container (using rootless Podman running rootless Podman):
sudo systemd-run --property User=test --property OpenFile=/etc/secretfile.txt --collect --pipe --wait --quiet podman run --security-opt label=disable --user podman --device /dev/fuse quay.io/podman/stable podman run -q alpine sh -c "cat <&3"
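The inner `cat <&3` works because an open file descriptor is inherited across process boundaries, so the innermost process can read a file it could never open by path. The same mechanism can be sketched without systemd or Podman. This is only an analogy using Python's `subprocess` with `pass_fds`, and the helper name is invented for illustration.

```python
import os
import subprocess
import sys

# The child reads from the descriptor number given as its first argument.
CHILD_CODE = (
    "import os, sys; "
    "sys.stdout.write(os.read(int(sys.argv[1]), 65536).decode())"
)


def read_via_inherited_fd(path):
    """Open path in the parent and let a child read it through the fd only."""
    fd = os.open(path, os.O_RDONLY)
    try:
        result = subprocess.run(
            [sys.executable, "-c", CHILD_CODE, str(fd)],
            pass_fds=(fd,),  # keep this descriptor open in the child
            capture_output=True,
            text=True,
            check=True,
        )
    finally:
        os.close(fd)
    return result.stdout
```

The child never sees the path, only the already-open descriptor, which is what makes the approach attractive for handing a single secret to otherwise unprivileged nested containers.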