Managing Apt Repos in S3 Using Lambda (webscale.plumbing)
75 points by daenney on Aug 7, 2016 | 29 comments


Interesting idea to use Lambda. (Personally, I've used Aptly.)

---

For the Apt transport, I recommend apt-boto-s3 (https://www.lucidchart.com/techblog/2016/06/13/apt-transport...) that I authored. Unlike apt-transport-s3, it

  * Works with AWS v4 signatures (required in some regions)
  * Supports If-Modified-Since caching
  * Uses pipelining
  * Uses standard AWS credential resolution, like ~/.aws/credentials or IAM roles
  * Allows credentials to be specified per-repo (in the URL)
  * Supports both path and virtual-host styles for S3 URLs
And it works with proxies if you use HTTPS_PROXY/HTTP_PROXY environment variables (though I haven't tested this).
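
For reference, the sources.list entries end up looking something along these lines; the bucket name and distribution here are made up, and the exact URL syntax (including how to put per-repo credentials in the URL) is documented in the project README:

    # path-style S3 URL (hypothetical bucket/distribution)
    deb s3://s3.amazonaws.com/my-apt-bucket xenial main
    # virtual-host-style S3 URL
    deb s3://my-apt-bucket.s3.amazonaws.com xenial main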


Oh cool. The only reason I chose apt-transport-s3 was that it was included in Ubuntu 16.10. It also only uses the standard Python library, so there's less bootstrapping before it's usable.

That does look really cool though. For proxies it would be nice if it supported the standard apt-get proxy declaration. Apt-transport-s3 basically does that and sets the proxy env vars.

You could probably get yours into the official Ubuntu repos if you wanted, though.


Great write-up. We use S3 for our Apt & Yum repos at work, too, only with website hosting turned on for the buckets (losing some of the security OP's solution has but adding simplicity).

I'd like to point out a tool I'm using now for building and managing our Apt repos: Aptly (https://www.aptly.info/)

It's really nice and Just Works™. It even has S3 publishing built in. Oh, and you can build Apt repos on Debian, CentOS, MacOS X & FreeBSD (thanks Go!).

To solve for Yum & Apt I wrote a tool in Go that keeps all the metadata in a JSON file in S3. It's just a map of S3 buckets that are repos to a list of S3 URLs for each RPM / Deb. Whenever that file gets updated, just like OP, I send an event from S3 to Lambda. Unlike OP, though, I just have Lambda launch a task in ECS, and then my repo builds (& S3 syncing) take place in containers (one Ubuntu container for Apt repos & one CentOS container for Yum).
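
The metadata file itself is nothing fancy; a made-up sketch of that JSON (bucket and package names are hypothetical) looks roughly like:

    {
      "my-apt-repo-bucket": [
        "s3://build-artifacts/myapp_1.2.0_amd64.deb",
        "s3://build-artifacts/othertool_0.9.1_amd64.deb"
      ],
      "my-yum-repo-bucket": [
        "s3://build-artifacts/myapp-1.2.0-1.x86_64.rpm"
      ]
    }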


So, here be dragons, but... You could just have a small JavaScript or Python wrapper that simply calls out to Aptly. As long as you compile Aptly for the right target you can bundle it up in the zip that you ship to Lambda. Since it's Go, it'll just run. Then your wrapper just takes care of calling Aptly with the right arguments depending on what event got sent from S3.
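
A minimal sketch of what that wrapper could look like in Python, assuming a bundled aptly binary and config file, a local repo named "myrepo" that has already been published once (so publish update works), and an S3 publish endpoint defined in aptly.conf; all of those names are placeholders, and you'd also need to point aptly's rootDir at /tmp since that's the only writable path in Lambda:

    import subprocess
    import boto3

    s3 = boto3.client("s3")

    def handler(event, context):
        # React to S3 "object created" events for .deb files
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            if not key.endswith(".deb"):
                continue
            local = "/tmp/" + key.split("/")[-1]
            s3.download_file(bucket, key, local)
            # Add the package to the local repo and republish to the
            # S3 endpoint configured in the bundled aptly.conf
            subprocess.check_call(["./aptly", "-config=./aptly.conf",
                                   "repo", "add", "myrepo", local])
            subprocess.check_call(["./aptly", "-config=./aptly.conf",
                                   "publish", "update", "xenial",
                                   "s3:my-endpoint:"])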


I actually prefer using the containers. It's easily repeatable anywhere I can run docker (doesn't have to be AWS). I can also avoid any race conditions by checking to see if the task is already running.

(And in general I'm not a fan of trying to do a lot in lambda. I like it a lot more as a glue layer / event handler).


I use deb-s3[1] which is sort of like this, but handles uploading the package and updating the Packages index in a single step. deb-s3 can also sign your Release file so apt won't complain about unsigned packages.
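
For reference, publishing with deb-s3 is basically a one-liner; the bucket name, key id, and package file below are placeholders (see the README for the full set of options):

    deb-s3 upload --bucket my-apt-bucket --sign=DEADBEEF mypackage_1.0.0_amd64.deb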

The post mentioned deb-s3 and dismissed it as more complicated to set up than this solution. While the Lambda solution is neat, I'm not sure I would describe it as simple. For now I think I'll stick with deb-s3.

[1]: https://github.com/krobertson/deb-s3


True, lambda setup is not super simple. However with deb-s3 you have to have a local copy of the .deb, whereas with what I'm doing the files never leave S3.


If you care about your time, PackageCloud is a hosted solution for managing dpkg and rpm repositories. It supports repository signing out of the box. I'm a happy customer.

https://packagecloud.io/


I love this!

But I have to admit I'm a little confused. Is this mostly for large repositories? I ship some software on multiple platforms and part of my build process builds an apt repo (available over HTTP) as follows:

    run_gpg_command dpkg-sig -g " --yes" \
        --sign builder *.deb && \
        rm -rf repo && \
        mkdir repo && \
        cp *.deb repo && \

        cd repo && \
            (dpkg-scanpackages . /dev/null > Packages) && \
            (bzip2 -kf Packages) && \
            (apt-ftparchive release . > Release) && \
                (run_gpg_command gpg --yes -abs -o Release.gpg Release) && \
                (gpg --export --armour > archive.key)
Then I sync it to S3 using aws s3 sync.
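
Concretely, the sync step is just something like this (bucket and prefix made up):

    aws s3 sync repo/ s3://my-apt-bucket/ubuntu/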

What are the problems doing it this way?


It's not just for large repos. The idea is that you can just build your package, drop it in S3 and the rest is taken care of. You don't need to generate the rest of the metadata yourself, like the Packages or Release file.

It does not, however, address the issue of signing. It would be interesting to see if someone could extend it using an additional bucket to store credentials, with the necessary policies to only allow the Lambda function to get and use them.


Seems to me that it's not really beneficial for most small software publishers in that case. They can just use a variant of the script I posted above, I suppose.


I use it for "small"-ish repos. Mainly I don't want to maintain a copy on disk. I have a list of (and tooling around) which .debs go in a release, and I copy from a master S3 location to the correct apt repo S3 location, and the Packages index gets generated automatically. So I just create a new repo each time, but not every .deb changes each time. Other architectural issues kind of drove the need to move to S3, and at that point I wanted to see if I could just keep everything in S3.


Looks really interesting!

I may be about to show my apt inexperience here, but I don't see any explicit callout of a signing step. Does apt handle signing only at the package level, such that you'd just sign when you build a single package and then upload that?

I've written a tool for managing Arch Linux repos in S3 (https://github.com/amylum/s3repo), which I run in a container myself so that I can use my GPG key to sign the packages and the full repo metadata. I'd love to move to something like this, so I'd be able to let AWS handle running the Lambda on my behalf, but I've yet to figure out a good way to do that with the signing structure necessary.


Not OP, but I left another comment further up:

I also run my Yum & Apt repo builds in containers and I have lambda launch those containers when S3 is updated...

In order to facilitate signing I have my GPG key on the container host machine and mount that directory as a volume in the container so the container has access to the key as well.

For initially provisioning the container server I deploy the GPG key as an encrypted blob and decrypt it on the host.
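
Concretely, that's just a read-only volume mount; the paths and image name here are made up:

    docker run --rm \
      -v /home/deploy/.gnupg:/root/.gnupg:ro \
      -v /srv/repo-work:/work \
      my-org/apt-repo-builder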


Yup yup. I do something similar at the moment. I was brainstorming a bit more; I think if I were to port to Lambda, I'd probably use a KMS-encrypted S3 object to store the private key.

That said, it would mean fixing up my original key strategy. At the moment, I'm using a signing subkey of my personal GPG key, since it's just a personal repo and I control the VM/container where the stuff happens. By comparison, if I were dropping it into a lambda/s3, I'd probably be more motivated to split onto a fully separate key for the repo.
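
A rough sketch of that key-fetching step, assuming boto3 in a Python Lambda, an object stored with SSE-KMS (so S3 decrypts it transparently as long as the function's role has kms:Decrypt on the key), and made-up bucket/object names:

    import os
    import subprocess
    import boto3

    def load_signing_key():
        s3 = boto3.client("s3")
        obj = s3.get_object(Bucket="my-secrets-bucket", Key="repo-signing-key.asc")
        # Lambda only lets us write under /tmp
        os.makedirs("/tmp/gnupg", mode=0o700, exist_ok=True)
        with open("/tmp/signing-key.asc", "wb") as f:
            f.write(obj["Body"].read())
        subprocess.check_call(["gpg", "--homedir", "/tmp/gnupg",
                               "--import", "/tmp/signing-key.asc"])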


Yeah, you just sign the packages. There is an option to create a signed "Release" file, but I'm not creating a "Release" file; "Packages" is the only one I'm creating for now. If you wanted to create a signed Release file or other files, you could just add that after the step that creates Packages.


Very interesting - I've been working on exactly the same approach for a Yum repo, but haven't solved the race condition issue yet. I might have to "be inspired" by this post :)


I'm wondering what people are using to _build_ custom deb packages. We currently have a set of simple shell scripts which first install the dependencies, then compile some source code and finally use checkinstall to build a Debian package from that. It works, but I'd much rather have a config file with all the required steps instead of a buggy shell script. Is there any better way?



Looks good! Additionally, there's fpm-cookery (https://github.com/bernd/fpm-cookery), which comes pretty close to what I want. Somebody also created a Docker container for that: https://github.com/andytinycat/fpm-dockery

Combining this tool with Jenkins, so that a build is triggered whenever I change the recipe, would be all I need for now.
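
For anyone unfamiliar: fpm-cookery is a wrapper around plain fpm, and even bare fpm gets you a .deb from a directory in one line (package name, version, and paths below are made up):

    fpm -s dir -t deb -n myapp -v 1.2.0 --prefix /opt/myapp ./build/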


Any particular reason why there is no Debian packaging for it? Debhelper makes it so trivial to package things these days.


Debhelper felt a bit too "low-level" when I first tried it. After learning a lot about Debian packages it might be a good choice now. Maybe I'll have another look. Thanks!


Check out the "not your grandpa's debhelper" presentation (http://penta.debconf.org/dc9_schedule/events/418.en.html). The official documentation seems to disregard a lot of the recent (as in, from the last 8 years) improvements.


These days I barely need anything more than /usr/share/doc/debhelper/examples/rules.tiny
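
For anyone who hasn't seen it, that file is essentially just the canonical three-line debian/rules (the dh line must be indented with a tab):

    #!/usr/bin/make -f
    %:
    	dh $@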


I use CMake/CPack.


I like that this solution takes into account the "race condition" for doing multiple things at once. To me that was the biggest issue I ran into with deb-s3 and it's why I wrote deb-simple (https://github.com/esell/deb-simple).


My co-worker just set up something similar for RPMs. His solution was Lambda + ECS: https://github.com/erumble/s3-repo-sync


This is an awesome write-up.


Thanks! And thanks everyone else for all the comments!



