Exposing docker socket to a container
Do you guys expose the docker socket to any of your containers or is that a strict no-no? What are your thoughts behind it if you don't? How do you justify this decision from a security standpoint if you do?
I am still fairly new to docker but I like the idea of something like Watchtower. Even though I am not a fan of auto-updates, and probably wouldn't use that feature, I still find it interesting to get a notification when some container needs an update. However, it needs access to the docker socket to do its work, and I've read a lot about why that is a bad idea: it can result in root access to your host filesystem from within a container.
There are probably other containers as well, especially in this whole monitoring and maintenance category, that need that privilege, so I wanted to ask how other people handle this situation.
Cheers!
Per this guide https://cheatsheetseries.owasp.org/cheatsheets/Docker_Security_Cheat_Sheet.html I do not. I have a cron/service script that updates containers automatically (`docker compose pull`, I think) for things I don't care about if they fail for a bit (PDF converter, RSS reader, etc.) or that are exposed to the internet directly (Authentik, Caddy).
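For reference, a minimal sketch of what such an update script could look like - the `/opt/stacks` layout and the `DRYRUN` switch are my own assumptions, not the poster's actual script:

```shell
#!/bin/sh
# Sketch of a cron-driven updater: for each compose project under
# $STACKS_DIR, pull newer images and recreate the containers.
# DRYRUN=1 only prints the commands instead of running them.
STACKS_DIR="${STACKS_DIR:-/opt/stacks}"

run() {
  if [ "${DRYRUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

for dir in "$STACKS_DIR"/*/; do
  # Skip anything that isn't a compose project.
  [ -f "${dir}docker-compose.yml" ] || continue
  run docker compose --project-directory "$dir" pull
  run docker compose --project-directory "$dir" up -d
done
```

Dropped into `/etc/cron.daily/` (or hooked to a systemd timer), something like this keeps the low-stakes stacks current without mounting the docker socket into any container.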
Note that smart peeps say the docker socket is not safe even as read-only. Watchtower is inherently untenable, sadly, and so is Traefik's socket access (trusting a docker-socket-proxy container with giga root permissions only made sense to me if I could audit the whole thing and keep auditing it with every update, and I cannot). https://stackoverflow.com/a/52333163 https://blog.quarkslab.com/why-is-exposing-the-docker-socket-a-really-bad-idea.html
I then just have scripts to do the `docker compose pull` manually for things with oodles of breaking changes (Immich) or things I'd care about if they broke suddenly (paperless).
Overall, I've only had a few break over a few years - and that's because I also run all services (per link above) as a user, read-only, and with no capabilities (that aren't required; afaik none need any). And while some containers are well coded, many are not, and if an update suddenly wants to write to `/npm/staging`, the read-only torches that until I can figure it out and put in a tmpfs fix. The few failures are worth the peace of mind that it's locked the fuck down.
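For anyone new to this, the hardening described above looks roughly like the following in a compose file. The service name, image, and paths are made up for illustration; the `tmpfs` line is the kind of fix mentioned for containers that suddenly want scratch space:

```yaml
services:
  rss-reader:              # hypothetical service
    image: example/rss-reader:latest
    user: "1500:1500"      # run as an unprivileged UID:GID, not root
    read_only: true        # root filesystem is read-only
    cap_drop:
      - ALL                # drop every Linux capability
    security_opt:
      - no-new-privileges:true
    tmpfs:
      - /tmp               # writable scratch space for apps that need it
    volumes:
      - ./data:/data       # the only persistent, writable path
```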
I hope to move to podman sometime to eliminate the last security risk - the docker daemon running the containers, which runs as root. Rootless docker seems to be a significant hassle to do at any scale, so I haven’t bothered with that.
Edit: this effort is to prevent the attack vector of "someone hacks or buys access to a well-used project (e.g., Watchtower, last updated 2 years ago, or a commonly used docker socket proxy) which is known to have docker socket access, and then pushes a malicious update to encrypt and ransom your server via root access escalated from the docker socket". As long as no container has root (and the container doesn't breach the docker daemon…), the fallout from a good container turned bad is limited to the newly bad container.
All true, wanted to add on to this:
That's true, and it's not just something mildly imperfect - read-only straight up does nothing. For connecting to a socket, Linux ignores the read-only mount state and only checks write permission on the socket file itself. Read-only would only make it impossible to create a new socket there. And once you do have a connection, that connection can write anything it wants to it. Traefik and other "read-only" uses still have to send GET queries for the data they need, so writing to the socket is happening in the legitimate use cases too.
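You can see that it's the write-permission bits (not any read-only flag) that gate the connection with plain unix sockets - a quick Python sketch, nothing Docker-specific, and note that root bypasses file permission checks entirely:

```python
import os
import socket
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.sock")

# A listening unix socket, standing in for /var/run/docker.sock.
srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
srv.bind(path)
srv.listen(1)

# Strip the write bits: connect(2) checks write permission on the
# socket inode itself, so this is what actually blocks clients.
os.chmod(path, 0o444)
c = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
try:
    c.connect(path)
    print("connected anyway (probably running as root)")
except PermissionError:
    print("connect denied: no write permission on the socket")

# Restore the write bits and the same connect succeeds - and once
# connected, the client can send whatever it wants down it.
os.chmod(path, 0o666)
c2 = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
c2.connect(path)
print("connected after restoring write permission")
```

A `:ro` bind mount changes none of this, which is why read-only on the docker socket is cosmetic.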
If you really need a "GET-only" Docker socket, it has to be done with some other kind of mechanism, and frankly the options aren't very good. Docker has authorization plugins, which seem like too much of a headache to set up, and the proxies don't seem very good to me either.
Or TL;DR: `:ro` or stripping off permission bits doesn't do anything aside from potentially breaking all uses of the socket. If it can connect at all, it's root-equivalent (or has all the privileges of your rootless user) unless you took other steps. That might or might not be a massive problem for your setup, but it is something you should know when doing it.

Thanks for explaining the underworkings - I never dug in to see what happens and how it works. I see it's bad.
Thank you for your comment and the resources you provided, I'll definitely look into these. I like your approach of minimizing the attack surface. As I said, I am still new to all of this, and I only came across the `user:` option of docker compose recently when I installed Jellyfin. However, I thought the actual container image has to be configured in a way that makes this even possible - otherwise you can run into permission errors and such. Do you just specify a non-root user and see if it still works?
And while we're at it, how would you set up something like Jellyfin with regard to read/write permissions? I currently haven't restricted it to read-only, and in my current setup I most certainly need write permissions as well, because I store the artwork in the respective directories inside my media folder. Would you just save these files to the non-persisted storage inside the container, because you can re-download them anyway, and keep the media volume read-only?
So I've found that if you use the `user:` option with a username (`user: UserName`), it requires the container to also have that username inside. If you do it with a UID/GID (`user: 1500:1500`), it maps the container's default user (likely root, 0) to the UID/GID you provide. For many containers it just works; for linuxserver containers (a group that produces containers for stuff) I think it biffs it - those are way jacked up. I put the containers that won't play ball in an LXC container (via the Incus GUI), or for simple permission fixes I make a permissions-fixing version of the container (runs as root, but only executes commands I provide) to fill a volume with data that has the right permissions, then load that volume into the container. Luckily Jellyfin doesn't need that.

I give Jellyfin read-only access (via `:ro` in the `volumes:` section) to my media stuff because it doesn't need to write to it. I think it's fine if your use case needs `:rw` - just keep a backup (even if you `:ro`!).

Here's my docker-compose.yml; I gave Jellyfin its own IP with macvlan. It's pretty janky and I'm still working on it, but you can have Jellyfin use your server's IP by deleting everything after `jellyfin-nw:` (but keep `jellyfin-nw:`!) in both the `networks:` section and the `services:` section. Delete the `mac:` in the `services:` section too. In the `ports:` part, that `10.0.1.69` would be the IP of your server (or in this case, what I declare the Jellyfin container's IP to be) - it makes it so the container can only bind to the IP you provide; otherwise it can bind to anything the server has access to (as far as I understand). And of course, I have GPU acceleration working here with an embedded Intel iGPU. Hope this helps!
Lastly, thought I should add the external stuff needed to get the hardware acceleration working and the user going: the Jellyfin user isn't added to the `render` group on the host; rather, the group is added to the container in the docker-compose.yml file.
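To find the GID that goes in the container's group list, something like this on the host works - the group name `render` and the device path are assumptions for a typical Intel iGPU setup, with a fallback if the group doesn't exist:

```shell
# Look up the host GID of the `render` group; this number is what
# gets added to the container so /dev/dri is accessible.
RENDER_GID=$(getent group render | cut -d: -f3)
echo "render GID: ${RENDER_GID:-not found}"

# Cross-check against the device node itself, if present:
if [ -e /dev/dri/renderD128 ]; then
  stat -c 'renderD128 group: %g' /dev/dri/renderD128
fi
```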