UID and GID 999 are system IDs and are not to be used by non-default-distro processes such as container services #33


Open
bbruun opened this issue Sep 4, 2024 · 10 comments


@bbruun

bbruun commented Sep 4, 2024

I was trying out the Docker image, setting it up with docker-compose, and kept getting access errors. The host OS distro is RHEL 8.

It appears that "you" use UID 999 (and 1000 for some reason) and GID 999 for the service inside the Docker image.
That overlaps with the default convention that IDs below 1000 are reserved for system users and groups.

You set the UID/GID's here: https://github.com/search?q=repo%3Avalkey-io%2Fvalkey-container%20999&type=code

On RHEL 8, for example:

  • UID 999 is the systemd-coredump user, managed and installed by systemd
  • GID 999 is the input group (I don't know what uses it on a server).

Could you fix this mixup of valkey posing as a system service maintained by the distros, and make the UID and GID larger than 1000, which is also the advised practice for non-system accounts? E.g. 65534 for both would be ideal, as it is the default "nobody" user/group on most systems.

@polarathene

It is fine to use whatever UID/GID in the container. On the host, <1000 is for system services, and anything above that can be assigned to something else as well. You cannot always run a container with a single UID/GID when a distinction needs to be made for operation.

In a rootless container, the container itself can run as UID 0 (root) and be mapped to whatever UID on the host the host deems appropriate. Within the container, any volume mount then has its UID mapped accordingly, which is probably the correct approach. Podman does this well with the --uidmap + --gidmap options.
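To make the remapping concrete, here is a sketch of the rootful form of those Podman flags; the flags are real Podman options, but the specific ID ranges and paths are illustrative, not a recommendation:

```shell
# Map container UIDs/GIDs 0-999 onto host IDs 100000-100999, so the
# container's UID 999 shows up on the host as 100999 and never collides
# with a real host account (ranges here are illustrative):
podman run --rm \
  --uidmap 0:100000:1000 \
  --gidmap 0:100000:1000 \
  -v ./data:/data:Z \
  valkey/valkey
```

With rootless Podman the same flags map against the user's /etc/subuid and /etc/subgid ranges instead.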

I haven't looked into what this image is doing, but it's not uncommon for containers that only need to persist data with a specific UID/GID to make that configurable at runtime. You could also use named volumes instead of bind-mount volumes if you don't need direct file access on the host (which should be fine for a DB).
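A sketch of the named-volume approach mentioned above (the volume name is illustrative):

```shell
# A named volume is initialized by Docker from the image's /data,
# including ownership, so the container's UID 999 never needs to map
# to anything meaningful on the host filesystem:
docker volume create valkey-data
docker run -d --name valkey -v valkey-data:/data valkey/valkey
```

The trade-off is that the data lives under Docker's storage root rather than a host path you browse directly.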

@bbruun
Author

bbruun commented Feb 17, 2025

The problem is (was, as I've fixed it) that the container uses UID 999, that you don't document this on https://hub.docker.com/r/valkey/valkey, and that in my case it conflicted with the systemd-bus-proxy user, which on RHEL gets UID/GID 999/997 by default, causing a conflict between systemd and your container when run as currently documented.

Normal, I would say sane, defaults for a container are to make it as non-intrusive/secure by default as possible, e.g. by using the nobody user with a UID/GID of 65534 (or a similar high UID/GID) to avoid conflicting with OS UID/GIDs, like most other containers do.

If it somehow is a requirement for the container to run with UID 999, then please add that to the container's documentation, so that --uidmap/--gidmap or Docker's --user is part of the documentation.
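For illustration, overriding the baked-in ID at runtime with Docker's --user might look like this; the ID 1500 and the directory name are hypothetical, and the bind-mounted directory must be writable by whatever ID you pick:

```shell
# Prepare a host directory owned by the chosen (arbitrary) ID,
# then run the container as that ID instead of the image default:
mkdir -p ./valkey-data && sudo chown 1500:1500 ./valkey-data
docker run -d --user 1500:1500 -v "$PWD/valkey-data:/data" valkey/valkey
```

This only works cleanly when the image doesn't hard-require its baked-in user for file access inside the image itself.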

I'm most likely not the only one who tries out the container and gets a UID overlap because of this. Not every test needs to be set up in a full-blown lab that thinks of nothing but security.

Changing the UID from 999 to 65534 (or thereabouts) is the easiest way to avoid a CVE caused by UIDs overlapping with the host OS, which could grant more access if it were somehow possible to break out of the container with a valid host UID to work with.

@roshkhatri
Member

Hey, thank you so much for raising this issue and for the information in it. To be frank, it would be really helpful if you could raise a PR with a fix; that would also help other users avoid getting a CVE.

Is this the fix we are looking for?

Before:

	addgroup -S -g 1000 valkey; \

After:

	# alpine already has a gid 999, so we'll use a nobody id
	addgroup -S -g 65534 valkey; \
	adduser -S -G valkey -u 65534 valkey

Also, the Redis container was previously set up the same way; were we facing the same issue then?

@polarathene

polarathene commented Feb 17, 2025

Might want to give this consideration:

Hello. Using a high uid/gid for files in the image requires reserving a lot of uids/gids per operating-system user when running Docker rootless or Podman rootless.

It would be more practical to keep nonroot at 1000 or 1001. If no files are owned by nobody, then maybe it doesn't matter so much which uid it has assigned.

NOTE: I've not investigated that myself, nor am I endorsing their suggestion to default to 1000/1001 instead. Just mentioning it as a datapoint 😅 (my personal preference is for images to default to 0:0 and support + document advice for --user or rootless containers, but I do understand why non-root is broadly adopted)


@bbruun @roshkhatri

Quick overview of 999:999 with popular database containers:


@bbruun

Could you fix this mixup of valkey being a system service maintained by the distros

I don't see how this is a mixup. It's perfectly normal with containers; I think you're just not familiar with that? If you insist that you know better, please cite images or resources stating that this is how it should be done within containers (note that I cited links to various official Docker images that say otherwise).

I understand your concern from a sysadmin perspective when you don't take into consideration how it is with containers; that threw me off when I first got into working with containers myself. The thing is, a container can vary in base image, so the UID/GID assignment by distro is not always in alignment, even for common system users/groups like bin, daemon, mail, uucp, etc.

This UID/GID concern can vary across new releases of the same distro as well, but is more of an evident issue when an image installs packages that create users/groups, as the order can matter.

If your base image (or your own image) were to change packages in a future build in a way that shifts the install order, it can change the UID/GID in the next image release. Any existing user's storage persisted outside the image would then no longer align with the UID/GID values of the container's users/groups, requiring a manual fix by each affected user (although sometimes containers apply this at runtime with an entrypoint script).

Trying not to step on UID/GIDs used on the host system is a bit foolish, as you will not know what these are. As the sysadmin for the running containers, any data you persist to host storage is yours to ensure isn't mishandled; and should the UID/GID conflict with an assignment on the host, it rarely causes a real problem in practice.

Keep your volume data persisted to a common location on the host system where this boundary is a non-issue; alternatively, deploy with rootless containers or user namespace remapping (which solves your issue properly). There are also ID-mapped mounts, which rootful containers can leverage (Podman supports this; Docker doesn't officially yet, although it can be done manually IIRC).


make the UID and GID larger than 1000 which is also the advised method for non system accounts.

It's a system service, not one dependent upon a user session. Like the references I've provided above, they either explicitly create with 999 UID/GID or they have it implicitly by requesting system user/group during creation.
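The implicit assignment described above comes from shadow-utils: `useradd -r`/`groupadd -r` request a *system* account, whose ID is allocated downward from SYS_UID_MAX / SYS_GID_MAX (999 by default, per /etc/login.defs). A sketch of why so many images land on 999 (run as root in a fresh Debian/Fedora-style base image; the account name is illustrative):

```shell
# A system-account request picks the highest free ID at or below
# SYS_UID_MAX (default 999 in /etc/login.defs):
groupadd --system valkey
useradd --system --gid valkey --shell /usr/sbin/nologin valkey

# In a fresh base image with no other system packages installed yet,
# this often reports uid=999 gid=999:
id valkey
```

That is also why install order matters: each package that creates a system user shifts which IDs remain free.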

Normal, I would say sane, defaults for a container is to make it as non-intrusive/secure by default as possible and e.g. use the nobody user with UID/GID of 65534 (or similar high UID/GID) to avoid conflicting with OS UID/GID's - like most other containers does.

Citation needed for other official database images that are running as nobody please.

If it somehow is a requirement for the container to run with UID 999 then please add it to the documentation of the container so that --uidmap/--gidmap or Docker's --user is part of the documentation.

I agree with you here, I wish images would document their UID/GID more visibly, especially when it's not configurable (without extending/customizing the image)


I'm most likely not the only one who tries out the container and gets a UID overlap because of this. Not every test needs to be set up in a full-blown lab that thinks of nothing but security.

Besides the UID overlap, does it cause any actual problem in practice? I assume your concern is related to a container escape, similar to escape as a root user?

If you're properly locking down the containers for security reasons, that really shouldn't be happening, you'll find escapes are reliant upon non-default capabilities being granted to the container root user (it's not equivalent to root on host which has far more capabilities granted by default).

The switch to a non-root user already results in all caps being dropped. The only time that becomes an issue is when the image modifies binaries with setcap to grant non-root users capabilities they'd otherwise not have. These are usually applied with a kernel enforcement check as well, rather than the process raising the capability at runtime; so if you intentionally drop the cap and don't use the feature, sadly the container fails to run, as the dumb capability enforcement check prevents it.

Other than that, the most common way to break out is via access to the Docker API (usually the socket), but that's an explicit mount (and one that SELinux forbids access to when enabled for the container, IIRC). I have seen some images run as non-root but give that non-root user permission to use the socket within the container 🤦‍♂

If you're serious about security though, unless you need rootful containers, go for rootless (these have some limitations, but are usually fine). The whole UID/GID concern will be handled for you implicitly (at least it is with Podman IIRC).

@polarathene

@roshkhatri

is this the fix we are looking for?

You shouldn't be assigning a new user/group to the UID/GID already assigned to the existing nobody/nogroup entries.

Personally I prefer containers to run as root (0:0) by default unless there's an actual need to switch to a non-root user. I know it goes against some "best practice" advice parroted around, but similar to VOLUME directive use, it often causes more problems.

The main value of the switch is for the convenience of a non-root user dropping all caps, so that the sysadmin deploying the container doesn't have to explicitly do so (drop caps, or change the container to run as a different UID/GID). Something that shouldn't be an issue if the sysadmin deployed with rootless containers instead, but I understand the precaution is taken given the broad audience that often doesn't know any better.

FWIW, breakouts happen with non-root users too, and even if you are running in the container as nobody, that doesn't make you exempt from someone gaining access as root on the host when running rootful containers. All that requires is someone misconfiguring an image or container at runtime to enable the exploit; one of the obvious ones is access to the Docker socket.


If you want to go the extra mile, go get Valkey packaged as a slice for Canonical's chisel tool, so that the bare minimum for a container is installed and promote that as the default image (no shell, no package manager).

Then, if Valkey can run as the nobody user by default, you can do that if you like; there's no need for a special valkey user if Valkey is the only process running in the container. It has no relevance outside the container, where storage is persisted only as UID/GID values (shown with whatever friendly text mapping exists in /etc/passwd + /etc/group on the host, if any).
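The point that storage persists only as numeric IDs is easy to see with `stat`: the numeric owner is what's stored on disk, while the name column is just a lookup in the /etc/passwd of whichever machine (or container) is doing the looking:

```shell
# Ownership on disk is stored as a number; the user *name* shown is
# resolved from the local /etc/passwd and can differ between the host
# and the container for the very same file:
touch /tmp/ownership-demo
stat -c 'numeric uid: %u  resolved name: %U' /tmp/ownership-demo
```

A file written by container UID 999 therefore shows up on the host as whatever the host's passwd maps 999 to, or just the bare number if nothing does.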

The sysadmin should ideally be able to use --user 0:0 (or whatever UID/GID pair they like) to run the image as that without issues. This is effectively the same as the user switch entrypoint, but is done by the container runtime prior to the entrypoint being run.


As I showed in my previous comment, Valkey is presently aligned with the same UID/GID as all other popular DB images. It would seem wise to be consistent there unless a proper discussion with the other projects can all agree on it being pragmatic to change away from that.

Personally, the reasoning from @bbruun doesn't seem like good enough justification IMO, and I hope this verbose response has made the reasons why rather evident.

@bbruun
Author

bbruun commented Feb 19, 2025

Quick overview of 999:999 with popular database containers:

I understand this and I'm fully aware of this, but the valkey container just hit a ... nerve that overlapped with RHEL's systemd setup which the others don't (or haven't as of yet).

But just because "everyone else is doing it" does not make it the correct, safe, or secure way to do it.

I don't see how this is a mixup. It's perfectly normal with containers, I think you're just not familiar with that? If you insist that you know better, please cite images or resources that state this should/is the way it should be within containers (note that I cited links to various official Docker images that say otherwise).

I am familiar with it (see my last sentence about my situation).

A few sources:
OWASP has "RULE #2 - Set a user":
"Configuring the container to use an unprivileged user is the best way to prevent privilege escalation attacks."

From https://linuxhandbook.com/uid-linux/
Do note that in most Linux distributions, UID 1-500 are usually reserved for system users. In Ubuntu and Fedora, UID for new users start from 1000.

RHEL that I'm on is in the Fedora family...

I understand your concern from a sysadmin perspective when you don't take into consideration how it is with containers, that threw me off when I first got into working with containers myself. The thing is a container can vary in base image, so the UID/GID assignment by distro is not always in alignment, even with common system users/groups like bin,daemon,mail,uucp..

I'm fully aware; I've been at this since my first FROM scratch container almost a decade ago, so I know what it actually takes to make a container properly (and FROM <distro>:<version> is a very, very good starting point compared with scratch).
Most of the UID/GID issues can be handled by the (docker-)entrypoint.sh script, and looking at it, it is already prepared for this: just add a usermod and chown when an env variable has been set to change the runtime UID/GID.

This UID/GID concern can vary across new releases of the same distro as well, but is more of an evident issue when an image installs packages that create users/groups, as the order can matter.

Yes, hence the best practice of not creating users in containers within well-known UID/GID ranges for system users or accounts, specifically to avoid that kind of scenario; nobody is a good candidate to use by default, as it is a non-privileged user on most systems.

If your base image or your own were to change packages in a future build that would shift that install order, it can affect the UID/GID of the next image release made, such that any existing users persisted storage outside of the image is no longer in alignment with the UID/GID values for your containers users/groups, requiring manual fix by each affected user (or sometimes containers apply this at runtime with an entrypoint script).

I know, but since the container uses docker-entrypoint.sh and doesn't use the USER setting, a usermod/groupmod could be added to change the UID/GID inside the container and chown the files based on an environment variable, which valkey-server's params would then use.
Not an elegant solution, but it works, e.g. docker run -e SET_UID=1234 -e SET_GID=1234 -p ...:... valkey or similar.
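A minimal sketch of that idea; the SET_UID/SET_GID variable names are the suggestion above, not anything the actual image implements, and the remap only works when the entrypoint still starts as root:

```shell
#!/bin/sh
# Hypothetical fragment for a docker-entrypoint.sh: remap the baked-in
# valkey user/group to a caller-chosen UID/GID, fix ownership of the
# data directory, then drop privileges before starting the server.
set -eu
if [ "$(id -u)" = '0' ] && [ -n "${SET_UID:-}" ]; then
    groupmod -o -g "${SET_GID:-$SET_UID}" valkey
    usermod  -o -u "$SET_UID" valkey
    chown -R valkey:valkey /data
fi
exec setpriv --reuid=valkey --regid=valkey --clear-groups -- valkey-server "$@"
```

setpriv is the same privilege-drop mechanism the existing entrypoint already uses, so this slots in ahead of it.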

Trying not to step on UID/GID used on the host system is a bit foolish as you will not know what these are. As the sysadmin for the containers running, any data you persist to the host storage is up to you with ensuring it's not mishandled, should the UID/GID conflict with assignment on the host that it causes a real problem (it rarely does in practice).

True, yet very avoidable by not using UID/GID ranges that are well known to be reserved for system users and accounts.

Keep your volume data persisted to a common location on the host system where this boundary is a non-issue, alternatively deploy with rootless or user namespace remapping (solves your issue properly). There is also ID Mapped mounts which rootful containers can leverage (Podman supports this, while Docker doesn't officially yet, it can be done manually IIRC).

make the UID and GID larger than 1000 which is also the advised method for non system accounts.

It's a system service, not one dependent upon a user session. Like the references I've provided above, they either explicitly create with 999 UID/GID or they have it implicitly by requesting system user/group during creation.

A container is not a system service.
Volume mappings are always an issue due to exactly this problem, in Podman more than in Docker, as Docker volumes (if directly mapped to the filesystem) set the ownership automatically, whereas Podman, running as the executing user, sometimes has issues due to restrictions on non-root accounts. That is the main disadvantage of Podman; everything else is better IMHO.

Normal, I would say sane, defaults for a container are to make it as non-intrusive/secure by default as possible, e.g. by using the nobody user with a UID/GID of 65534 (or a similar high UID/GID) to avoid conflicting with OS UID/GIDs, like most other containers do.

Citation needed for other official database images that are running as nobody please.

I don't have any citations. It is based on old introductory docs from Docker's site back in the day, and on almost all other "get started with containers" documentation.

The main issue here is that if you install Valkey/Redis/Postgres/Mongo etc. from the package manager, it will create a user (and often a group) using adduser/addgroup, which will not overlap with any existing system users or accounts; in the container this is not possible, as the IDs are hardcoded.
There are generally two solutions to this:

  1. use a very high UID e.g. the nobody user which exists on most systems
  2. use variables and usermod/chown in the entrypoint.sh script to fix it. People using the variables will also know to set directory ownership for volume mounts, as they specifically chose a UID/GID to run as.

If it somehow is a requirement for the container to run with UID 999 then please add it to the documentation of the container so that --uidmap/--gidmap or Docker's --user is part of the documentation.

I agree with you here, I wish images would document their UID/GID more visibly, especially when it's not configurable (without extending/customizing the image)

Agreement is a nice thing :-)

I'm most likely not the only one who tries out the container and gets a UID overlap because of this. Not every test needs to be set up in a full-blown lab that thinks of nothing but security.

Besides the UID overlap, does it cause any actual problem in practice? I assume your concern is related to a container escape, similar to escape as a root user?

No, which is why I've not made a PR to "fix" the issue "for me", but only asked about it to make the maintainers aware of the problems with using low UID/GIDs, which I hope is OK?

If you're properly locking down the containers for security reasons, that really shouldn't be happening, you'll find escapes are reliant upon non-default capabilities being granted to the container root user (it's not equivalent to root on host which has far more capabilities granted by default).

Agree. But when testing out a container, and the documentation or Docker Hub "example" does not mention this, it is not something to take into account for testing; hence this issue.

The switch to a non-root user already results in all caps being dropped, thus the only time that becomes an issue is when the image modifies binaries with setcap to grant non-root users capabilities they'd otherwise not have (these are usually done with kernel enforcement check applied as well, rather than raising the capability for the process at runtime, thus if you drop the cap intentionally and don't use the feature, sadly the container fails to run as the dumb capability enforcement check prevents it).

Agree. But that is outside the scope of the low hardcoded UID/GID in the container.

Other than that the most common other way to break out is via access to the Docker API (usually the socket), but that's an explicit mount (and one that is forbidden access to with SELinux enabled for the container IIRC). I have seen some images run as non-root but give that non-root user permission to use the socket within the container 🤦‍♂

I know, but it is often better to be safe than sorry; I would guesstimate that 99% of all security thinking is about what could happen vs. how small the chance is of it actually happening.

If you're serious about security though, unless you need rootful containers, go for rootless (these have some limitations, but are usually fine). The whole UID/GID concern will be handled for you implicitly (at least it is with Podman IIRC).

I agree, but this was found while testing the container on a test RHEL server, to see whether using the container is a valid choice vs. a manual installation (I'm thinking ahead, mostly in regards to upgrades).

Podman is not something my colleagues like; they have trouble enough understanding and accepting the use of containers to begin with, as they only see issues with them vs. the package-managed applications they have been using for decades. I'll get them there, but I can't make them do 1000 new things they don't understand from day 1; I have to do it gently, even if that means using Docker and its root daemon.

@bbruun
Author

bbruun commented Feb 19, 2025

Hey, thank you so much for raising this issue and for the information in it. To be frank, it would be really helpful if you could raise a PR with a fix; that would also help other users avoid getting a CVE.

Because it is (was) an inconvenience for me when I had the user overlap but not critical for running the container.

alpine already has a gid 999, so we'll use a nobody id

addgroup -S -g 65534 valkey;
adduser -S -G valkey -u 65534 valkey

I know how to fix it - that is not the problem.

I would add 2 variables and have the docker-entrypoint.sh script do a usermod and chown before running valkey-server, making it possible to run it with a custom UID/GID.

Also, the Redis container was previously set up the same way; were we facing the same issue then?

I didn't use the Redis container. Migrating away from the AWS version to on-prem, I was looking at alternatives, and Valkey seems to be the best choice. Using the container is mostly to test out upgrades later on vs. using a package manager or a manual install.

@polarathene

TL;DR: Apologies for verbosity, I'm short on time.

  • The cited error/access problem needs more context.
    • It should not only happen with Valkey when other DB images have the same UID/GID (Potential issue identified if using Docker Compose).
    • Before any "fix" is considered by maintainers, verifying via reproduction should be achieved first.
  • Rootless containers would avoid the host overlap concern.
    • Rootful containers can use ID mapped volumes (Docker might not support this properly until v28).
    • Rootful containers could also use UserNS remapping (I've not personally used this feature).

Quick overview of 999:999 with popular database containers:

I understand this and I'm fully aware of this, but the valkey container just hit a ... nerve that overlapped with RHEL's systemd setup which the others don't (or haven't as of yet).

Could you please clarify with a reproduction? Are you certain this is what you think it is and not an XY problem?

For example, these DB images all use VOLUME; if you were using Docker Compose with the same service name and no explicit volumes for persistence, the Redis and Valkey images (among others) declare the same VOLUME /data instruction... which Docker Compose will gladly carry over when you change the image but not the name of the service (services.<service name>.image).

That can cause a variety of mishaps if you're not careful.
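One quick way to catch that kind of carry-over before swapping images is to compare what each image actually declares; the template fields below are standard `docker image inspect` fields:

```shell
# Compare the declared runtime user and VOLUME paths of the two images
# before pointing an existing Compose service at the new one:
docker image inspect redis        --format '{{.Config.User}} {{json .Config.Volumes}}'
docker image inspect valkey/valkey --format '{{.Config.User}} {{json .Config.Volumes}}'
```

Matching VOLUME paths plus an unchanged service name is exactly the combination that lets old anonymous-volume data follow you to the new image.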

You shouldn't be proposing a fix with a UID/GID change when the problem isn't reproducible for you on other similar images using that same UID/GID. Instead, it's better to understand what difference between the images (or in what you did) caused the underlying failure. Adjusting the UID/GID may have "fixed" it, but that's not necessarily the correct solution; you've tried connecting some dots, but from what you've described, the same problem could occur if both images changed to the same UID/GID values and you repeated the steps. It may have nothing to do with the existing host assignment.


Where I do agree with a change is for consistency. This is not ok:

FROM alpine:3.21
# add our user and group first to make sure their IDs get assigned consistently, regardless of whatever dependencies get added
RUN set -eux; \
# alpine already has a gid 999, so we'll use the next id
addgroup -S -g 1000 valkey; \
adduser -S -G valkey -u 999 valkey

FROM debian:bookworm-slim
# add our user and group first to make sure their IDs get assigned consistently, regardless of whatever dependencies get added
RUN set -eux; \
groupadd -r -g 999 valkey; \
useradd -r -g valkey -u 999 valkey

Image variants should be compatible with their UID/GID for the containerized service. However changing them will also impact all existing users of the images, that is a breaking change. I haven't reviewed the image in full, so the valkey group itself may not have much relevance but changing the UID would.


I don't see how this is a mixup. It's perfectly normal with containers, I think you're just not familiar with that? If you insist that you know better, please cite images or resources that state this should/is the way it should be within containers (note that I cited links to various official Docker images that say otherwise).

I am familiar with it (see my last sentence about my situation).

A few sources: OWASP has a "RULE #2 - Set a user" Configuring the container to use an unprivileged user is the best way to prevent privilege escalation attacks.

From https://linuxhandbook.com/uid-linux/ Do note that in most Linux distributions, UID 1-500 are usually reserved for system users. In Ubuntu and Fedora, UID for new users start from 1000.

RHEL that I'm on is in the Fedora family...

Regarding the OWASP advice: privilege escalation attacks can happen as non-root users too, but switching away from the root user implicitly drops all capabilities for the container user, so the sysadmin doesn't have to, and it minimizes some damage in the event a user escapes as that user (there are privilege escalation attacks in which they can become root regardless).

I have seen some image authors blindly follow such advice, but do so poorly by implementing workarounds that reduce security via setcap, defeating the purpose entirely when they grant non-default privileges that assist in carrying out attacks.

Yes, <1000 UIDs are typically reserved for system users; the range is configurable, but please understand what a system user is... Valkey qualifies as a system user, just as it would when installed on the host.

Non-system users are those with login shells and intended for actual user sessions. The kernel has some features like:

# Default is 1024, Docker runs this with it set to 0 implicitly,
# as it's reasonably safe in the container context:
sysctl net.ipv4.ip_unprivileged_port_start=80

# It wasn't the case with other container engines for some time though,
# So some images would instead grant their non-root program this capability
# to ignore the security restriction (as it would apply for root)
#
# NOTE: the `+e` enforces this by the kernel preventing the program from running
# if the capability were denied by the sysadmin, even when the program is configured
# to bind to a port above 1024 which would have otherwise been valid..
setcap 'cap_net_bind_service=+ep' /path/to/program

An unprivileged user can be a system user, btw, and a UID of 1000+ can also be privileged in the sense of being granted ambient capabilities (Docker does not support this AFAIK; systemd does). This is distinct from a program/process being granted capabilities to its permitted set (setcap with p; e is the effective set, which a process could natively raise at runtime if permitted).


Most of the UID/GID issues can be handled by the (docker-)entrypoint.sh script and having a look then it is already ready for it by adding a usermod and chown if a env variable has been set to change runtime UID/GID.

# allow the container to be started with `--user`
if [ "$1" = 'valkey-server' -a "$(id -u)" = '0' ]; then
find . \! -user valkey -exec chown valkey '{}' +
exec setpriv --reuid=valkey --regid=valkey --clear-groups -- "$0" "$@"
fi

Ah, alright, yeah, that resolves the ownership concern, provided no other separate tooling/images expect the 999 UID/GID; otherwise it's still breaking.

However for a common ENV configurable I see across images with PUID / PGID, I suppose that works and is useful when the container runs with a non-root user instead of leveraging rootless containers 😅

Off-topic: that conditional would read better as the following (works with both bash and ash, which /bin/sh symlinks to):

if [[ "$1" == 'valkey-server' && "$(id -u)" == '0' ]]; then

This UID/GID concern can vary across new releases of the same distro as well, but is more of an evident issue when an image installs packages that create users/groups, as the order can matter.

Yes, hence the best practices to not create users in containers that are in well know UID/GID ranges for system users or accounts to specifically avoid that kind of scenario and nobody is a good candidate to use by default as it is a non-privileged user on most systems.

Sorry, this doesn't make sense to me. For clarity, so that we're on the same page: when you've explicitly mounted a volume to the host, anything written to that location is where your concern is?

Otherwise it's a bit bizarre to expect no UID/GID conflicts between containers and the host. Base images differ here by distro, and for each image on those base images, any further system packages installed will shuffle new UID/GID assignments accordingly; you can't do much about that beyond choosing fixed UID/GIDs in advance (I've had to do this for ClamAV and its DB, for example, so that ownership is stable across image upgrades).

999 on your host is a system UID but not necessarily privileged... systemd-coredump itself will initially be invoked privileged to create a socket, but the related service runs unprivileged. Your UID/GID grants permissions (rwx) for ownership/access, but privilege is more about the capabilities the process has (which utilities like setcap can grant at the file level, aka "capability-dumb", or which systemd can augment processes with).

I think you have a misunderstanding about the relation of privilege to capabilities, as opposed to UID/GID ownership?
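The distinction between ID-based permissions and capability-based privilege can be inspected directly; both tools below ship with libcap (the scanned path is illustrative, and output varies by system):

```shell
# What capability sets does the current process actually hold?
# (For a non-root user with nothing granted, these are empty.)
capsh --print

# Which binaries have been granted file capabilities via setcap?
# These run privileged in specific ways regardless of UID:
getcap -r /usr/bin 2>/dev/null
```

A UID 999 process with empty capability sets is unprivileged, no matter how "system-like" the number looks.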


Trying not to step on UID/GID used on the host system is a bit foolish as you will not know what these are. As the sysadmin for the containers running, any data you persist to the host storage is up to you with ensuring it's not mishandled, should the UID/GID conflict with assignment on the host that it causes a real problem (it rarely does in practice).

True yet very avoidable by not using well known UID/GID ranges that are well know for system user and accounts.

Again... you can't justify that way when it's not consistent across distros 🤷‍♂

This UID/GID doesn't exist in Fedora by default unless systemd is installed (for desktop/server ISOs that's usually a given).

```console
$ docker run --rm -it fedora:42

# Nothing:
$ grep coredump /etc/passwd

# Install systemd:
$ dnf install -y systemd

$ grep coredump /etc/passwd
systemd-coredump:x:998:998:systemd Core Dumper:/:/usr/sbin/nologin

$ grep coredump /etc/group
systemd-coredump:x:998:

# This is what got assigned 999 instead:
$ grep oom /etc/passwd
systemd-oom:x:999:999:systemd Userspace OOM Killer:/:/usr/sbin/nologin

$ grep oom /etc/group
systemd-oom:x:999:
```

Now the UID is 998, while for you it's 999... Does Fedora need to "fix" this now? No. It's no different from using a VM, or migrating data from another install of the same OS where UID/GIDs ended up mismatched due to implicit assignment at package-install time.


A container is not a system service.
Volume mappings are always an issue due to exactly this problem, in Podman more than Docker: Docker volumes (if directly mapped to the filesystem) set the ownership automatically, whereas Podman running as the executing user sometimes has issues due to restrictions on non-root accounts. That is the main disadvantage of Podman; everything else is better IMHO.

The container is a sandbox (namespace) that runs one or more processes, those can be services as they would be outside of a container, why you want to make a distinction here I do not know. If you need isolation from host UID/GID values, go with rootless or related rootful features for remapping.

Volumes are only allowed to write to disk what it's permitted to on the host, a non-issue when the container is rootful, or when you use rootless containers with user namespace remapping (/etc/subuid + /etc/subgid range for a user to leverage). You can also use ID mapped volumes (better supported in Podman, but requires rootful).
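As a sketch of the ID-mapped volume approach mentioned above (rootful Podman with the `idmap` volume option; the host path is illustrative and `valkey/valkey` is assumed to be the image in question):

```shell
# Rootful Podman with an ID-mapped volume: files written by the container's
# internal UID/GID (999 in this image) show up on the host remapped through
# the mount's ID mapping, instead of landing on whatever the host already
# assigned UID 999 to.
sudo mkdir -p /srv/valkey-data
sudo podman run -d --name valkey \
  -v /srv/valkey-data:/data:idmap \
  valkey/valkey
```

Docker's support for equivalent ID-mapped mounts has lagged behind Podman's, which is why rootless mode is usually the simpler answer there.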

So no, it's not really the volumes that are the problem... it's the way a user chooses to run the image, and how the image author approaches it in their image. If you just use root in the container, it's not really a problem, is it? The exception is the time before rootless containers were available, which is why we have all this "best practice" advice to run containers as non-root users internally (similar to the VOLUME "best practice", despite it being effectively legacy). The equivalent rootful container with non-root user security benefits mostly comes down to dropping capabilities for the root user, but that can be inconvenient friction for users (plus they'd need to explicitly grant back any capabilities that non-root required setcap workarounds to function).

Push for the PGID/PUID feature if you like, or just use rootless containers. It's true that rootless outside of volume concerns do have other limitations, but that shouldn't affect most containers.

I'm not sure what you are on about with Podman disadvantage... it's daemonless, unlike Docker. If you want rootful container, then run Podman commands as root?


Normal, I would say sane, defaults for a container are to make it as non-intrusive/secure by default as possible, e.g. use the nobody user with UID/GID 65534 (or a similarly high UID/GID) to avoid conflicting with OS UID/GIDs, like most other containers do.

Citation needed for other official database images that are running as nobody please.

I don't have any citations. It is based on old introduction docs from Dockers site back in the day, and almost all other "get started with containers" documentation.

Perhaps it was removed from the docs for a reason then if it was previously suggested? I don't see how all images using nobody is any better when their own data would then overlap with the ownership of other containers writing to the host allowing anybody as the nobody user to access it 🤷‍♂

The main issue here is that if you install Valkey/Redis/Postgres/Mongo etc. from the package manager, it will create a user (and often a group) using adduser/addgroup, which will not overlap with any system users or accounts.

The system packages avoid the conflict for a good reason, but I really don't see it being an issue with containers. By that logic, containers shouldn't have root 0:0 🤦‍♂ Your problem is entirely resolved with ID mapping, just use rootless? Depending on storage driver, IIRC the container's own internal filesystem layout can be on the host with all its files accessible; that'd be even worse for your concern, except no host service unrelated to the container should be interacting with that area of the filesystem.

but in the container this is not possible as it is hardcoded.

The hard-coding has no relevance...? Without that, it's not going to read the hosts users and groups, that'd be very bad.

FWIW, the official Docker docs encourage pinning explicit UID/GID, and they warn about large UID/GID values (for nobody this isn't particularly bad, adds only 20MB to the image):

Image

For rootless containers, this also affects the range assignment (you allocate 2^16 sub-UID/GID for a host user).
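To make the range assignment concrete, here is a small worked example of the default rootless mapping (Podman-style), with hypothetical /etc/subuid values:

```shell
# Rootless user-namespace mapping: container UID 0 maps to your own host UID,
# and container UID N (for N >= 1) maps to SUBUID_START + N - 1, where
# SUBUID_START is the first subordinate UID from your /etc/subuid entry
# (e.g. "alice:100000:65536" — hypothetical values below).
SUBUID_START=100000
CONTAINER_UID=999     # the UID this image runs as internally
HOST_UID=$((SUBUID_START + CONTAINER_UID - 1))
echo "$HOST_UID"      # → 100998
```

So even if the container uses UID 999, the files it writes through a rootless volume land on an unprivileged high UID on the host, sidestepping the systemd-coredump overlap entirely.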


I'm most likely not the only one that tries out the container and gets a UID overlap because of this. Not every test needs to be setup in a full blown lab that thinks of nothing but security.

Besides the UID overlap, does it cause any actual problem in practice? I assume your concern is related to a container escape, similar to escape as a root user?

No - which is why I've not made a PR to "fix" the issue "for me" but only asked about it/make the maintainers aware of the problems with using low UID/GIDs, which I hope is OK?

Yes questions are great, just a reminder that I'm not a maintainer of this image (I recall your first reply to me might have mistaken me for one).

BTW, I mean no disrespect in my responses where I'm potentially over-explaining things you already know as a sysadmin, but I have enough experience as a sysadmin and with containers that I feel I can weigh in that this sounds like an XY problem with potential knowledge gaps on your end for the more niche aspects.

You do seem rather knowledgeable but something seems off with the reported problem (errors and access) only affecting Valkey.


If you're properly locking down the containers for security reasons, that really shouldn't be happening, you'll find escapes are reliant upon non-default capabilities being granted to the container root user (it's not equivalent to root on host which has far more capabilities granted by default).

Agree. But for testing out a container and the documentation or Docker Hub "example" does not document this then that is not something to take into account for testing - hence this issue.

Locking down capabilities would be more advanced security practice. Using non-root users as default in images or adding the support for such is usually to benefit the majority audience that doesn't have a good grasp on such things and thus rely upon "best practice" advice from resources they trust.

As such you won't see that sort of guidance in specific images READMEs. I disagree with the practice of rootful containers using non-root users due to various caveats that can bring, the default capabilities for container root are fine as-is, but I understand the precaution, especially for projects that just want to offer Docker / Containers as a deployment option but otherwise don't have much of a handle on this sort of security that well either (I've seen enterprise grade, well funded projects that are open-source that don't implement things on their end properly too, so you can imagine why such practices are prevalent).

Docs wise, the image should at least communicate the UID/GID it uses when deviating from root as default.


Agree. But that is outside the scope of the low hardcoded UID/GID in the container.

Apologies, I'm more focused on the UID/GID concern in general, rather than specific to this image.

In that sense I consider it relevant context and in scope, should anyone arrive at this issue to go over the pro/cons of what is being discussed.


Other than that the most common other way to break out is via access to the Docker API (usually the socket), but that's an explicit mount (and one that is forbidden access to with SELinux enabled for the container IIRC). I have seen some images run as non-root but give that non-root user permission to use the socket within the container 🤦‍♂

I know, but it is often better to be safe than sorry... hence I would guestimate that 99% of all security thinking is about what could happen vs how little the chance of it actually happening. Better safe than sorry.

Fair point.


If you're serious about security though, unless you need rootful containers, go for rootless (these have some limitations, but are usually fine). The whole UID/GID concern will be handled for you implicitly (at least it is with Podman IIRC).

I agree, but this was found in a test of the container on a test RHEL server, scoped only to seeing whether using the container is a valid choice vs manual installation (I'm thinking ahead in regards to upgrades mostly).

Sorry, I don't follow the concern here?

If you're evaluating a container vs host install, and your concern is the container writes volume data to the host with a UID/GID already assigned to something else, why wouldn't ID mapped volumes or rootless containers make sense?

For context, this specific image may only have the one UID/GID mapping concern so your nobody user/group solution works for you. Some containers though will have more than a single user/group for write/read access, it's uncommon but it does happen (I happen to maintain one).


Podman is not something my colleagues like - they have trouble enough understanding and accepting the use of containers to begin with, as they only see issues with them vs the normal package-managed applications they have been using for decades. I'll get them there, but I can't make them do 1000 new things that they don't understand from day 1; I have to do it gradually, even if it means using Docker and its root daemon.

FWIW Docker does have rootless too, and if this is mostly a concern on the developers desktop systems, Docker Desktop while rootful is effectively rootless in the sense of it using a VM to manage Docker. You can use that on Linux too, which still provides the docker CLI. I mention this because Docker Desktop also has ECI (enhanced container isolation), should you need even stricter security requirements (this will add friction should a container require access to the docker socket).

I understand the choice to go with Docker for users, and to an extent for developers. If your colleagues are sysadmins already comfortable with systemd, however, Podman Quadlets is the Docker Compose equivalent in Podman, but with systemd units (they use a generator service to extend the config with some Podman/container-specific metadata settings, but the generator outputs a standard systemd service). The difference between rootful and rootless then becomes system- or user-scope systemd services based on standard systemd config locations 👍 (compose.yaml is otherwise friendlier and more widely supported, so troubleshooting can be easier, with certain caveats specific to Docker Compose)

@bbruun
Copy link
Author

bbruun commented Feb 21, 2025

TL;DR: Apologies for verbosity, I'm short on time.

No problem.

  • Error/access cited problem needs more context.

    • It should not only happen with Valkey when other DB images have the same UID/GID (Potential issue identified if using Docker Compose).
    • Before any "fix" is considered by maintainers, verifying via reproduction should be achieved first.
  • Rootless containers would avoid the host overlap concern.

    • Rootful containers can use ID mapped volumes (Docker might not support this properly until v28).
    • Rootful containers could also use UserNS remapping (I've not personally used this feature).

Quick overview of 999:999 with popular database containers:

I understand this and I'm fully aware of this, but the valkey container just hit a ... nerve that overlapped with RHEL's systemd setup which the others don't (or haven't as of yet).

Could you please clarify with a reproduction? Are you certain this is what you think it is and not an XY problem?

For example these DB images all use VOLUME, and if you were using Docker Compose with the same service name and no explicit volumes for persistence, Redis and Valkey images (among others) declare the same VOLUME /data instruction... which Docker Compose will gladly carry over when you change images but not the name of the service (services.<service name>.image).

I just checked a slew of RHEL servers. There is an overlap with 3 distro package-managed applications that install a user using adduser/addgroup to get a dynamic system account:

```
polkitd:x:999:998:User for polkitd:/:/sbin/nologin
systemd-bus-proxy:x:999:997:systemd Bus Proxy:/:/sbin/nologin
systemd-coredump:x:999:997:systemd Core Dumper:/:/sbin/nologin
```

So on any of these servers, a container whose user has the hardcoded UID 999 will overlap.

That can cause a variety of mishaps if you're not careful.

I'm pretty sure that, in the case of a container exploit that breaks out of the container, it could do quite some damage as the polkitd user.

You shouldn't be proposing a fix with a UID/GID change, when that's not reproducible for you on other similar images using that same UID/GID. Instead it's better to understand the difference between the images (or what you did) to cause the underlying failure scenario itself. Adjusting the UID/GID may have "fixed" it, but that's not necessarily the correct solution... you've tried connecting some dots, but from what you've described the same problem could occur if both images changed to the same UID/GID values and you repeated the steps, it may have nothing to do with the existing host assignment.

I would say that the 500+ servers I just checked are effectively a reproduction of the issue, hence it is OK for me to propose a fix or workaround for this problem, since the distro comes before the container and it is the container that has the overlapping UID.


Where I do agree with a change is for consistency. This is not ok:

Image variants should be compatible with their UID/GID for the containerized service. However changing them will also impact all existing users of the images, that is a breaking change. I haven't reviewed the image in full, so the valkey group itself may not have much relevance but changing the UID would.

The valkey user and group are just UID/GIDs the valkey-server changes to after start, and can be used for volume mapping. The actual integer values are irrelevant as long as they are within the valid range on the host OS.

Regarding OWASP advice, privilege escalation attacks can happen as non-root users but switching away from the root user will drop all capabilities to the container user implicitly for the sysadmin so that they don't have to, and it minimizes some damage in the event a user escapes as that user (there are privilege escalation attacks in which they can become root regardless).

Yes, and that is what this is about - not escaping as root, but escaping as an existing user with privileges, potentially with polkitd permissions as shown above in my case on a few systems.
I'm pretty sure I'm not the only one running RHEL with system daemons from systemd or polkitd or similar using UID 999.

I have seen some image authors blindly follow such advice, but do so poorly by implementing workarounds that reduce security via setcap, defeating the purpose entirely when they grant non-default privileges that assist in carrying out attacks.

Agreed. Containers are not native system services - they are just that: ephemeral containers, with volume mappings and e.g. docker compose files for setting up and restoring a runtime environment. They are not system services - they are more akin to the old applications you ran directly off a CD-ROM/DVD that came with magazines.

Yes, <1000 UID is typically reserved for system users, the range is configurable but please understand what a system user is... Valkey qualifies as a system user as it would when installed on the host.

Valkey does when installed via a package manager or manually on the distro, but not in a container, as the container's hardcoded values are not part of the operating system's overall management - and that is where this issue arises from.

Non-system users are those with login shells, intended for actual user sessions. The kernel has some features like:

```sh
# Default is 1024; Docker runs with this set to 0 implicitly,
# as it's reasonably safe in the container context:
sysctl net.ipv4.ip_unprivileged_port_start=80

# It wasn't the case with other container engines for some time though,
# so some images would instead grant their non-root program this capability
# to ignore the security restriction (as it would apply for root).
# NOTE: the +e enforces this by the kernel preventing the program from running
# if the capability were denied by the sysadmin, even when the program is
# configured to bind to a port above 1024, which would have otherwise been valid.
setcap 'cap_net_bind_service=+ep' /path/to/program
```

An unprivileged user can be a system user btw. A UID of 1000+ can also be privileged in the sense of being granted ambient capabilities (Docker does not support this AFAIK, systemd does however), rather than a program/process being granted capabilities to its permitted set (setcap with p; while e is the effective set, which a process itself could natively raise at runtime if permitted).

Yes, it can, but in this case we are talking about a container, not a package-managed application that creates a normal user and then gets extra permissions to e.g. run natively on port 80, without doing it like e.g. Apache, which starts as root, binds to port 80, and then switches its runtime UID.

Most of the UID/GID issues can be handled by the (docker-)entrypoint.sh script, and having a look, it is already close to ready for it: just add a usermod and chown when an env variable is set to change the runtime UID/GID.

valkey-container/docker-entrypoint.sh

Lines 10 to 14 in 796eef7

```sh
# allow the container to be started with --user
if [ "$1" = 'valkey-server' -a "$(id -u)" = '0' ]; then
	find . ! -user valkey -exec chown valkey '{}' +
	exec setpriv --reuid=valkey --regid=valkey --clear-groups -- "$0" "$@"
fi
```
Ah alright, yeah that resolves the ownership concern provided no other separate tooling/images are expecting the 999 UID/GID, otherwise still breaking.

True, but the main issue here is the lack of documentation for the container on Docker Hub, which states neither the UID the container wants to run as, nor that --user <string|int> can be used.
The entrypoint above is quite good and usable, but finding out whether it can be used requires going through the repo.
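To illustrate what that would look like in practice (a sketch, not official usage: the host path is illustrative and `valkey/valkey` is assumed to be the image), overriding the baked-in 999 with nobody's IDs. Note the host directory must be pre-chowned, since the entrypoint shown above only fixes ownership when it starts as root:

```shell
# Run the container as 65534:65534 ("nobody"/"nogroup" on most distros) so
# nothing it writes to the volume can collide with a host system account
# such as systemd-coredump (999).
sudo mkdir -p /srv/valkey-data
sudo chown 65534:65534 /srv/valkey-data
docker run -d --name valkey \
  --user 65534:65534 \
  -v /srv/valkey-data:/data \
  valkey/valkey
```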

However for a common ENV configurable I see across images with PUID / PGID, I suppose that works and is useful when the container runs with a non-root user instead of leveraging rootless containers 😅

Perhaps - I'm mostly interested in not having hardcoded UID/GIDs in containers running on host servers, where they have no clue what is running underneath. Yes, there are rootless and --user options, but if the container had been created with a "this runs in a container and should not overlap with the host operating system" mindset, there would not be an issue.

Off-topic: That conditional would look better like (works with both bash and ash, which /bin/sh symlinks to):

Depends - the image is alpine IIRC so it does not need to be usable by other base images.

Yes, hence the best practice of not creating users in containers within well-known UID/GID ranges for system users or accounts, specifically to avoid that kind of scenario; nobody is a good default candidate as it is a non-privileged user on most systems.

Sorry, doesn't make sense to me here. For clarity so that we're on the same page, when you've explicitly mounted a volume to the host, anything written to that location is where your concern is?

No - I ran the container with a volume mount to a path on the host, using Docker. When I looked at the path it was owned by the host's systemd-coredump user; I looked into it and found that the UID valkey runs as in the container is 999, which is <1000 - which IMHO it shouldn't be, as it is then technically running as another service on the host, which is not good.
Setting the UID/GID in the container to something high, or to the default UID for nobody on most hosts, would make this less of an issue.

Documenting the requirement for the container could have made me prep the host to not have the systemd-coredump user on UID 999 or re-build the container and/or run it with --user or rootless or rebuild the container so it uses 65534 instead.

But neither was done on the test I ran.
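A minimal pre-flight check along those lines, before running any image with a fixed internal UID (a sketch; getent is assumed to be available, as it is on RHEL and most Linux hosts):

```shell
# Report what, if anything, already owns a given UID/GID on this host,
# e.g. before running an image known to write volume data as 999:999.
check_id() {
  id="$1"
  getent passwd "$id" || echo "UID $id is unassigned"
  getent group  "$id" || echo "GID $id is unassigned"
}
check_id 999
```

On the RHEL 8 hosts described above, this would print the systemd-coredump (or polkitd) passwd entry, flagging the overlap before any data is written.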

Otherwise it's a bit bizarre to expect no UID/GID conflict between containers and the host. Base images differ here by distro, and for each image built on those bases, any further system packages installed will shuffle new UID/GID assignments accordingly. You can't do much about that beyond choosing fixed UID/GIDs in advance (I've had to do this for ClamAV with its DB, for example, so that it has stable ownership across image upgrades).

No, what is bizarre is to think of an application running in a container as a system service on the host, when the host can be any random OS set up in any random way that only (for Linux-based OSes) mostly adheres to best practices, e.g. keeping service accounts below UID 1000 and users above. I think that started being the norm around ~20 years ago, as it used to be UID <500 when I started learning Linux in the late '90s (I'm old - sorry).

999 on your host is a system UID but not necessarily privileged... systemd-coredump itself will initially be invoked privileged to create a socket, but the related service runs unprivileged. Your UID/GID grants permissions (rwx) for ownership/access, but privileges are more to do with capabilities the process has (which utilities like setcap can grant at a file level aka "capability-dumb", or systemd can augment processes).

True, but polkitd by nature is privileged, and there might be other users with other setups or services (monitoring/management/etc.) running on the host before Docker is even installed, making UIDs <1000 (IMHO <65000) in containers a problem.

I think you have a misunderstanding with the relation of privilege to capabilities, not UID/GID ownership?

No, I'm not - I'm well aware of both and how they differ, but I'm also aware that applications running in containers are not system services - they are containerized applications that are not part of the system, hence they should not overlap with the host. That is kinda the whole idea: keep them separate from the underlying host, ephemeral, and host-agnostic.

Trying not to step on UID/GIDs used on the host system is a bit foolish, as you will not know what these are. As the sysadmin for the running containers, any data you persist to host storage is yours to ensure isn't mishandled, should the UID/GID conflict with a host assignment in a way that causes a real problem (it rarely does in practice).

True, yet very avoidable by not using UID/GID ranges that are well known to be reserved for system users and accounts.

Again... you can't justify that way when it's not consistent across distros 🤷‍♂

Having the issue on RHEL using systemd is enough; I don't have to recreate it on the 1000s of distros out there to justify a basic issue that clearly stems from treating a container as a host system service when it is not.

This UID/GID doesn't exist in Fedora by default unless systemd is installed (for desktop/server ISOs that's usually a given).

Did you test Red Hat Enterprise Linux?

```console
$ docker run --rm -it fedora:42

# Nothing:
$ grep coredump /etc/passwd

# Install systemd:
$ dnf install -y systemd

$ grep coredump /etc/passwd
systemd-coredump:x:998:998:systemd Core Dumper:/:/usr/sbin/nologin

$ grep coredump /etc/group
systemd-coredump:x:998:

# This is what got assigned 999 instead:
$ grep oom /etc/passwd
systemd-oom:x:999:999:systemd Userspace OOM Killer:/:/usr/sbin/nologin

$ grep oom /etc/group
systemd-oom:x:999:
```

Now the UID is 998, while for you it's 999... Does Fedora need to "fix" this now? No. It's no different from using a VM, or migrating data from another install of the same OS where UID/GIDs ended up mismatched due to implicit assignment at package-install time.

Running and testing such things in a container base image does not compare to a real VM or bare-metal server that is actually set up to be useful for people.

And just for clarification: I have a mix of chrony, polkitd, sssd, libstoragemgmt and systemd-network on UID 998, hence UID 999 being used for e.g. systemd-coredump.
This is most likely because someone installed a system service with a custom high (but <1000) UID using adduser, and the counter then started from there.

A container is not a system service.
Volume mappings are always an issue due to exactly this problem, in Podman more than Docker: Docker volumes (if directly mapped to the filesystem) set the ownership automatically, whereas Podman running as the executing user sometimes has issues due to restrictions on non-root accounts. That is the main disadvantage of Podman; everything else is better IMHO.

The container is a sandbox (namespace) that runs one or more processes, those can be services as they would be outside of a container, why you want to make a distinction here I do not know. If you need isolation from host UID/GID values, go with rootless or related rootful features for remapping.

Yes, they can be services running whatever your heart desires, but they are not system services in the way you try to define it - they are containers. They are ephemeral and can be moved to other hosts at any given time, which is exactly what Kubernetes, Docker Swarm, etc. do, because they are not host-specific like real system services. They are containers used to make distributing and running the application easy, since they carry all their dependencies inside, versus a system service that is part of the host's filesystem (and the slew of hashed files under /var/lib/docker/... does not count as that).

Volumes are only allowed to write to disk what it's permitted to on the host, a non-issue when the container is rootful, or when you use rootless containers with user namespace remapping (/etc/subuid + /etc/subgid range for a user to leverage). You can also use ID mapped volumes (better supported in Podman, but requires rootful).

Yes, and that is where the UID overlap originates from - but you writing these things here does not fix the missing documentation for the container when finding it and testing it out...

So no, it's not really the volumes that are the problem... it's the way a user chooses to run the image, and how the image author approaches it in their image. If you just use root in the container, it's not really a problem, is it? The exception is the time before rootless containers were available, which is why we have all this "best practice" advice to run containers as non-root users internally (similar to the VOLUME "best practice", despite it being effectively legacy). The equivalent rootful container with non-root user security benefits mostly comes down to dropping capabilities for the root user, but that can be inconvenient friction for users (plus they'd need to explicitly grant back any capabilities that non-root required setcap workarounds to function).

The user (me in this case) didn't choose UID 999, nor the existing systemd-coredump user having UID 999. The problem/issue is that the container assumes, per multiple statements above and the docker-entrypoint.sh script, that it is an integrated part of the host operating system, when in fact it is not - and the UID the service runs as in the container is absolutely irrelevant to whatever runs on the host. That is the core issue here.

Push for the PGID/PUID feature if you like, or just use rootless containers. It's true that rootless outside of volume concerns do have other limitations, but that shouldn't affect most containers.

I'm not pushing for it - I made the ticket to make the team aware of an issue that seems to arise from "it is a system service and should have the same UID/GID to run" and "it works on my computer so no problem with this config" thinking, i.e. treating an application in a container as a native package-managed system service, which it is not.

I'm not sure what you are on about with Podman disadvantage... it's daemonless, unlike Docker. If you want rootful container, then run Podman commands as root?

I have some issues with volumes in some containers, where my user can't create volumes for the running container without using sudo. That is the kind of thing I'm talking about. Running containers without volumes is not an issue.
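For that specific rootless-volume ownership pain, one common workaround (a sketch; the path is illustrative and `valkey/valkey` is the assumed image): podman unshare runs a command inside your user namespace, so a chown to the container's internal UID there resolves to the correct subordinate UID on the host, no sudo required.

```shell
# Prepare a host directory for a rootless container that runs as UID/GID 999
# internally, without needing root on the host.
mkdir -p ~/valkey-data
podman unshare chown 999:999 ~/valkey-data
podman run -d --name valkey -v ~/valkey-data:/data valkey/valkey
```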

Normal, I would say sane, defaults for a container are to make it as non-intrusive/secure by default as possible, e.g. use the nobody user with UID/GID 65534 (or a similarly high UID/GID) to avoid conflicting with OS UID/GIDs, like most other containers do.

Citation needed for other official database images that are running as nobody please.

I don't have any citations. It is based on old introduction docs from Dockers site back in the day, and almost all other "get started with containers" documentation.

Perhaps it was removed from the docs for a reason then if it was previously suggested? I don't see how all images using nobody is any better when their own data would then overlap with the ownership of other containers writing to the host allowing anybody as the nobody user to access it 🤷‍♂

I think they revamped most of their site a few years ago when they switched to containerd as the backend and changed focus to an enterprise product, where you either buy courses/documentation/support or are required to know things beforehand. From what I know they have a great forum and user ecosystem, but none of that addresses the lack of documentation on the Docker Hub page for the valkey container.

The main issue here is that if you install Valkey/Redis/Postgres/Mongo etc. from the package manager, it will create a user (and often a group) using adduser/addgroup, which will not overlap with any system users or accounts.

The system packages avoid the conflict for a good reason, but I really don't see it being an issue with containers. By that logic, containers shouldn't have root 0:0 🤦‍♂ Your problem is entirely resolved with ID mapping, just use rootless? Depending on storage driver, IIRC the container's own internal filesystem layout can be on the host with all its files accessible; that'd be even worse for your concern, except no host service unrelated to the container should be interacting with that area of the filesystem.

I could - but as stated several times, I tested the container and observed the hardcoded UID 999, which overlapped with actual system services (systemd-coredump in this case on this server; the UID is also used by 2 other system accounts/users on other RHEL servers).

I'll repeat: a container is not a system service and should not overlap with anything on the host operating system, to avoid conflicts, as it is an isolated application. It might run a service that is considered a system service, but since it is not part of the system, it is not a system service; it is an isolated containerized application with no access to the host operating system, and it should adhere to that state rather than try to run like a host package-managed application, which actually uses dynamic UID/GIDs when installing to avoid overlapping UID/GIDs.

Your problem is entirely resolved with ID mapping

Yes, but please keep in mind I just ran a test of the container as per the Docker Hub documentation and saw the UID overlap when I added a volume and then made you guys aware of it.

but in the container this is not possible as it is hardcoded.

The hard-coding has no relevance...? Without that, it's not going to read the hosts users and groups, that'd be very bad.

The hardcoded UID is the root of this issue, so yes, it very much has relevance. How you do not see this is kind of frightening.
Imagine running a container with hardcoded UID 1000, which on most desktops is the default user's UID with sudo rights, and an application in the container escapes: full host control. Not good.
Same with system service UIDs <1000: the person hardcoding 999 in the container knows neither the host operating system's setup nor whether a volume will be attached and escaped from.

FWIW, the official Docker docs encourage pinning an explicit UID/GID, and they warn about large UID/GID values (for nobody this isn't particularly bad; it adds only about 20MB to the image):

(screenshot: Docker docs recommendation to specify explicit UID/GID values)

For rootless containers, this also affects the range assignment (you allocate 2^16 sub-UID/GID for a host user).
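That range assignment can be illustrated with a small sketch of the mapping arithmetic (values here are illustrative, not read from your host's /etc/subuid): container UID 0 maps to the host user itself, and container UIDs >= 1 map into the subordinate range, offset by one.

```shell
# Illustrative rootless ID mapping arithmetic (not your actual /etc/subuid).
subuid_start=100000   # e.g. the line "alice:100000:65536" in /etc/subuid
container_uid=999     # the valkey user inside the container

# Container UID 1 -> subuid_start, UID 2 -> subuid_start+1, and so on:
host_uid=$(( subuid_start + container_uid - 1 ))
echo "container UID ${container_uid} -> host UID ${host_uid}"   # -> host UID 100998
```

So the in-container UID 999 never touches the host's own UID 999 under this scheme.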

Yes. But that is totally irrelevant when testing the container as per the Docker Hub documentation, which lacks any kind of UID information.

No - which is why I've not made a PR to "fix" the issue "for me" but only asked about it / made the maintainers aware of the problems with using low UID/GIDs, which I hope is OK?

Yes questions are great, just a reminder that I'm not a maintainer of this image (I recall your first reply to me might have mistaken me for one).

I might have for which I'm sorry. Long thread :-P

BTW, I mean no disrespect in my responses where I'm potentially over-explaining things you already know as a sysadmin, but I have enough experience as a sysadmin and with containers that I feel I can weigh in that this sounds like an XY problem with potential knowledge gaps on your end for the more niche aspects.

No worries, but apparently we've learned about containers differently. I've been with it since the beginning and know about rootless, --user, and rebuilding images I dislike so they adhere to practical security practices, e.g. changing the UID to something that does not overlap with the host system (i.e. not <1000); I often use >65000 to be sure, as since '98 I've yet to see users created in that UID range. I often use variables to set UID/GID dynamically in my own containers, like you already mentioned.

You do seem rather knowledgeable but something seems off with the reported problem (errors and access) only affecting Valkey.

Thanks and you too.

My only problem here is the hardcoded UID of 999, which is something I will defend until my death... or until containers automatically re-map UIDs to avoid such trivial issues.

If you're properly locking down the containers for security reasons, that really shouldn't be happening, you'll find escapes are reliant upon non-default capabilities being granted to the container root user (it's not equivalent to root on host which has far more capabilities granted by default).

Agree. But when testing out a container whose documentation or Docker Hub "example" does not document this, that is not something to take into account - hence this issue.

Locking down capabilities would be more advanced security practice. Using non-root users as default in images or adding the support for such is usually to benefit the majority audience that doesn't have a good grasp on such things and thus rely upon "best practice" advice from resources they trust.

We fully agree. However, when using non-root users in containers, they should also be set up in an ID range outside what the host operating systems the container is intended to run on normally use, i.e. well above 1000. In my opinion it should be >65000.

As such you won't see that sort of guidance in specific images' READMEs. I disagree with the practice of rootful containers using non-root users due to the various caveats that can bring; the default capabilities for container root are fine as-is. But I understand the precaution, especially for projects that just want to offer Docker / containers as a deployment option but otherwise don't have much of a handle on this sort of security either (I've seen enterprise-grade, well-funded open-source projects that don't implement things properly on their end too, so you can imagine why such practices are prevalent).

I've seen the same - and Valkey seems to be the industry's new baby so it should adhere to these practices and not try to just be a native system service as it is not when running in a container.

Docs wise, the image should at least communicate the UID/GID it uses when deviating from root as default.

Agree. But that is outside the scope of the low hardcoded UID/GID in the container.

Apologies, I'm more focused on the UID/GID concern in general, rather than specific to this image.

In that sense I consider it relevant context and in scope, should anyone arrive at this issue to go over the pro/cons of what is being discussed.

I disagree. This ticket's scope is specifically the hardcoded use of UID 999 in the container, so in my view that is very much what is in scope.

Other than that the most common other way to break out is via access to the Docker API (usually the socket), but that's an explicit mount (and one that is forbidden access to with SELinux enabled for the container IIRC). I have seen some images run as non-root but give that non-root user permission to use the socket within the container 🤦‍♂

I know, but it is often better to be safe than sorry... hence I would guesstimate that 99% of all security thinking is about what could happen vs how small the chance of it actually happening is.

Fair point.

If you're serious about security though, unless you need rootful containers, go for rootless (these have some limitations, but are usually fine). The whole UID/GID concern will be handled for you implicitly (at least it is with Podman IIRC).

I agree, but this was found in a test of the container on a test RHEL server, so rootless was not in scope; the test was to see if using the container is a valid choice vs manual installation (I'm mostly thinking ahead in regards to upgrades).

Sorry, I don't follow the concern here?

If you're evaluating a container vs host install, and your concern is the container writes volume data to the host with a UID/GID already assigned to something else, why wouldn't ID mapped volumes or rootless containers make sense?

Evaluation is more than setting it up and running it in production - testing includes a lot of things. In this case: testing whether or not the container actually works as I expected, without altering it, and getting to know it.

I've not used Valkey before and needed to see if my previous Redis knowledge from my previous company worked and it seems to do with version 7, but seems to need updating for version 8 from what I can remember from the Valkey documentation about the future.

During the testing I observed the UID overlap and created this ticket, as a container shouldn't overlap with the host operating system - just like you wouldn't have multiple services all trying to use the same port (80 for apache/nginx/etc.). When it comes to hardcoded values in an ephemeral container created by someone else, they should not overlap with identical system services such as webservers or, as mentioned in another answer, valkey, postgres, mariadb, mysql, mongo, etc. If they use unique UID/GIDs there is less chance of something happening, even via a configuration mistake in a docker compose volume update, or in Kubernetes with a PVC name overlap.

For context, this specific image may only have the one UID/GID mapping concern so your nobody user/group solution works for you. Some containers though will have more than a single user/group for write/read access, it's uncommon but it does happen (I happen to maintain one).

Using nobody was just an example that could be used. For Kubernetes I would still recommend dynamic UID/GIDs for applications in containers, as I just mentioned above.
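One minimal way to make the IDs dynamic at deploy time, rather than baked into the image, is to derive them from the invoking user. This is only a sketch: the PUID/PGID names follow the linuxserver.io-style convention mentioned later in this thread, the volume path is illustrative, and the docker command is printed rather than executed.

```shell
# Sketch: pass the deploying user's own UID/GID to the container at run time.
PUID="$(id -u)"
PGID="$(id -g)"

# Illustrative invocation only (printed, not run); /srv/valkey is a placeholder path:
cmd="docker run -d --user ${PUID}:${PGID} -v /srv/valkey:/data valkey/valkey"
echo "$cmd"
```

The same idea works in a compose file via environment variable substitution for `user:`.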

Podman is not something my colleagues like - they have trouble enough understanding and accepting the use of containers to begin with, as they only see issues with them vs the normal package-managed applications they have been using for decades. I'll get them there, but I can't make them do 1000 new things that they don't understand from day 1; I have to do it in a documented way, even if that means using Docker and its root daemon.

FWIW Docker does have rootless too, and if this is mostly a concern on the developers desktop systems, Docker Desktop while rootful is effectively rootless in the sense of it using a VM to manage Docker. You can use that on Linux too, which still provides the docker CLI. I mention this because Docker Desktop also has ECI (enhanced container isolation), should you need even stricter security requirements (this will add friction should a container require access to the docker socket).

Docker didn't have it when containers started to gain traction ~8ish years ago, and that is when they (old farts) made up their minds that containers are a bad way to run things. So I'm on a quest...

I understand the choice to go with Docker for users and, to an extent, with developers. If your colleagues are sysadmins and already comfortable with systemd however, Podman Quadlets is the Docker Compose equivalent in Podman but with systemd units (they use a generator service to extend the config with some Podman/container specific metadata settings, but the generator outputs a standard systemd service). The difference between rootful and rootless then becomes system or user scope systemd services based on standard systemd config locations 👍 (compose.yaml is otherwise more friendly and widely supported, so troubleshooting can be easier, except with certain caveats specific to Docker Compose)

They are, but they are old farts, like me; though unlike them I like containers (and my manager does too), so I'm on a quest to make them love containers, get into the mindset, and understand that they don't have to re-learn 30 years of Linux/BSD commands to live in the future.
DevOps and containers are not everywhere, yet.

@polarathene

TL;DR: This response took a considerable amount of my day (not your fault, I'm just bad at estimating how long I take to respond). I'm going to try to summarize; you are not expected to respond to everything below (especially since it's iterative and I don't have the additional time to revise/compact it after the fact).

  • I don't intend to repeat all that in this discussion; we have a similar communication style that's introducing repetitive elements or outdated responses due to the length of our replies.
  • We have different opinions regarding UID/GID in containers and what is acceptable, that's fine.
  • The actual issue (UID/GID and documentation aside) regarding errors and access were not reported well, but without a reproduction as detailed below (not what you were thinking of), I can only assume it is an XY problem. See below if it wasn't apparent that this Valkey image works just fine with --user option.
  • Resolution of the issue is effectively documentation improvements for usage/info regarding UID/GID. No change to the Dockerfile or entrypoint is necessary or encouraged regarding UID/GID.
  • I side track a fair amount below and get a bit frustrated in my response towards the end, I'm tired, so apologies if I come off a bit irritated or ignorant.

As the discussion was also steering off into a bit more informal direction, should you want to DM me directly about anything instead, you can do so via my LinkedIn.

EDIT: I just saw how long this response is.. 😨 I can't imagine any sane person taking the time to read through it. I want to emphasize again that I do not have any expectations of a full response, the TL;DR above is enough with any relevant context (like the code fences below) as supplementary. I do not want to encourage further drawn out discussion as I cannot afford to spare more time responding in kind.


Could you please clarify with a reproduction? Are you certain this is what you think it is and not an XY problem?
For example these DB images all use VOLUME, and if you were using Docker Compose with the same service name and no explicit volumes for persistence, Redis and Valkey images (among others) declare the same VOLUME /data instruction... which Docker Compose will gladly carry over when you change images but not the name of the service (services.<service name>.image).

I just checked a slew of RHEL servers. There is an overlap with 3 distro package managed applications that install a user using adduser/addgroup to get a dynamic system account:

polkitd:x:999:998:User for polkitd:/:/sbin/nologin
systemd-bus-proxy:x:999:997:systemd Bus Proxy:/:/sbin/nologin
systemd-coredump:x:999:997:systemd Core Dumper:/:/sbin/nologin

So any of these servers running a container with a user using the hardcoded UID 999 will overlap on these servers.
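A quick way to check for such overlaps on a given host is to query the account databases before binding a volume. This is a hypothetical helper (the function name and output format are illustrative), not part of the image:

```shell
#!/bin/sh
# Report what, if anything, a given UID/GID pair is already assigned to on
# this host, before pointing a container volume at files owned by it.
check_id_assignment() {
  uid="$1"; gid="$2"
  owner="$(getent passwd "$uid" | cut -d: -f1)"
  group="$(getent group "$gid" | cut -d: -f1)"
  echo "UID $uid: ${owner:-unassigned}"
  echo "GID $gid: ${group:-unassigned}"
}

check_id_assignment 999 999
```

On the RHEL 8 hosts above this would report e.g. systemd-coredump for UID 999.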

You misunderstood the request...

You have reported that Valkey is at fault, but not other common DB images which also use UID/GID of 999 and have VOLUME (some with the same internal path, which can cause bugs when you're doing these tests via compose.yaml and not changing the service name, due to how Docker Compose handles anonymous volumes differently).

What I want is a more helpful reproduction of the claim that only Valkey produces these errors/access concerns.

I am more than happy to test it against a host OS with 999 UID and GID already assigned, but my point was what you are stating should apply to the other DB images as well, hence the request for reproduction to verify as it sounds like an XY problem (the errors you refer to are potentially due to implicit behaviour you're not aware of, and thus leading you to think it's only the fault of Valkey).

If I am right, then it's not Valkey at fault, but Compose (or images using VOLUME rather). I've tried to reason about it upstream, but Docker maintainers and other DB image authors disagree with my views that VOLUME is legacy and causes more problems than good. These have been issues known / reported for 8 years. I think there's been a request for opt-out via daemon config or CLI option for about 2-3 years? (Podman supports this opt-out FWIW)

So I hope the request for reproduction and the reasoning for that is more clear now.


That can cause a variety of mishaps if you're not careful.

I'm pretty sure that a container exploit that breaks out of the container could do quite some damage as the polkitd user.

I know what Polkit is, but I've honestly not done much with it directly at that level to explore. Are you sure that it's capable of doing anything without privileges? (capabilities granted to it)

I'm somewhat doubtful much damage could be done, and it sounds like you're just theorizing it rather than providing actual proof, so I assume you don't know any more than I do.

The most likely damage would be write access to some config file that a privileged process would then use, or the host service running with that user/group performing something unprivileged that's problematic. These are avoided by having root ownership for write access and group ownership for read access restrictions, but usually 644 / rw-r--r-- is fine. Services don't necessarily need the ability to write to their own configs.
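A minimal illustration of that ownership/permission split (using a temp file rather than a real config): the owner, ideally root, can write the file, while the service's group and others can only read it.

```shell
# rw-r--r--: owner writes, group/other read only.
cfg="$(mktemp)"
chmod 644 "$cfg"
mode="$(stat -c '%a' "$cfg")"
echo "mode: $mode"   # prints: mode: 644
rm -f "$cfg"
```

With that layout, even a process running as the overlapping UID/GID cannot alter the config, only read it.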

So I'm not saying it's not possible to exploit such, just that it's often not the case even if you have this overlap. In the event the container escape allows escaping as root (regardless of what user the container itself is running as), none of this matters. Container escape itself is very unlikely for reasons I've cited previously, but you can use rootless to minimize the impact further when you're extra cautious about security.


You shouldn't be proposing a fix with a UID/GID change, when that's not reproducible for you on other similar images using that same UID/GID. Instead it's better to understand the difference between the images (or what you did) to cause the underlying failure scenario itself. Adjusting the UID/GID may have "fixed" it, but that's not necessarily the correct solution... you've tried connecting some dots, but from what you've described the same problem could occur if both images changed to the same UID/GID values and you repeated the steps, it may have nothing to do with the existing host assignment.

I would say that the +500 servers I just checked are effectively a reproduction of the issue, hence it is OK for me to propose a fix or workaround for this problem, since the distro comes before the container and the container is the one with the overlapping UID.

Please re-read the part of my response you quoted there. I've taken extra care to re-iterate that earlier in my current reply to you about why you're mistaken with your "reproduction".

Your concern isn't even about the overlap in the image, it's about the UID/GID the process runs as (regarding container escape) and when writing files as the UID/GID.

  • You should be able to find all image / container content with ownership of UID/GID in that container at /var/lib/docker/overlay2/<hexadecimal>/merged/ for example. How's that different to your volume concern here, storage wise? Or were you not aware of this?
  • That UID/GID cannot be reliably assumed to not conflict on the host. The desire for PUID/PGID ENV or similar support is fine if you are not comfortable with root in rootful containers and for whatever reason can't justify more security focused rootless containers that resolve this far better for you. (EDIT: I was under the impression --user wasn't sufficient, never mind)
  • I don't consider nobody any better if you want to use ownership/permissions as they're intended for files, especially if you're concerned about security since you now allow access to any sensitive data owned by nobody? (go with your container escape concern) This is a far more pragmatic example of risk, no? So the proposal just shifts your concern, it's not a proper solution, just convenient.

FWIW non-system users (eg UID 1000) can be just as bad, if not worse on systems where such a user is granted access to the Docker daemon without requiring credentials via sudo to use the Docker CLI. If that host user is compromised or a container escape occurs, it grants root access to the system. Yet it's common for developers to do as a convenience on their systems (granted not production systems).

Your concern with separating the system UID range from the container is not one I agree with. You would have the same with a VM guest and storage mounted to the host, which is equivalent to volume mounts. Migrating services to another system isn't uncommon either, and that host OS may differ, or its package installs may differ, such that two systems even with the same host distro do not align in their UID assignment. Even without containers or VMs, migrating data in that scenario needs care to re-map the UID assignment, yet I'm sure you'd agree both systems are right to configure their same services with system UIDs even when those may differ unintentionally, with some overlap/conflict like discussed here.
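That re-mapping step can be sketched like this. The UIDs and the temp directory are illustrative (a real migration would target the service's data directory), and chown to an arbitrary UID requires root:

```shell
# Sketch: after migrating data, re-own every file still owned by the old UID.
old_uid=1500; new_uid=1600
d="$(mktemp -d)"
touch "$d/dump.rdb"
chown "$old_uid" "$d/dump.rdb"   # simulate files arriving with the old UID

# -xdev keeps find on one filesystem; -h avoids following symlinks:
find "$d" -xdev -user "$old_uid" -exec chown -h "$new_uid" {} +
stat -c '%u' "$d/dump.rdb"       # prints: 1600
```

The same one-liner applies whether the mismatch came from a container, a VM mount, or two differently configured hosts.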

Please take this into consideration.

Image variants should be compatible with their UID/GID for the containerized service. However changing them will also impact all existing users of the images, that is a breaking change. I haven't reviewed the image in full, so the valkey group itself may not have much relevance but changing the UID would.

The valkey user and group are just a UID/GID that valkey-server changes to after start and can be used for volume mapping. The actual integer values are irrelevant as long as they are within the valid range on the host OS.

This was earlier in my response before I got to your mention of the entrypoint script already handling it 👍

It's absolutely a breaking change when an image doesn't have an equivalent entrypoint, which is my initial assumption for images that lack a feature to customize the runtime UID/GID.

I'm not sure what you're referring to with valid range on the host OS, unless you mean assignments exceeding 2^16 like you'd have on the host within /etc/subuid, systemd has some good docs on that and various caveats/usage to keep in mind.


Slight rant:

Trusting images that you don't build and maintain yourself is often problematic like this as the image authors have different opinions and expertise that you can either defer trust to them or ensure consistency on your end.

As such I have to inspect if an image uses VOLUME instructions for example because of the various problems those have caused me in the past.

The quality of images vary, sometimes it's simpler to just choose a base image with a package of the software you want, optionally pin the package (Fedora releases for example do update packages across a release cycle, while I've found Debian to be averse to patch releases to address bugs/regressions from a new major release version that a Debian release upgrade introduced 🤦‍♂).

Some of the official images for projects like PHP/Python/Node/Rust/etc are more aligned with building that software from scratch, which has its uses sometimes when you want flexibility in version and base selection, as if it were a package. When you disagree with what they do, like the user/group assignment, you can either extend the image with your own modifications, or keep it simple/predictable with the earlier approach I described (a common build image can be used for other projects when that's not an option).


Yes, <1000 UID is typically reserved for system users, the range is configurable but please understand what a system user is... Valkey qualifies as a system user as it would when installed on the host.

Valkey does when installed via a package manager or manually on the distro, but not in a container, as the container's hardcoded values are not part of the operating system's overall management - and that is where this issue arises from.

Disagree, as already covered in this response. Hard-coding has nothing to do with it; you'd get a system user installed in the container implicitly if it were handled as a package rather than a manual build.


I personally wouldn't create any custom user/group for Valkey and just have the image run as root, directing users to rootless containers when that's a concern, or creating a host path with the needed ownership before bind mounting a volume for persistence and running via --user to whatever they want.

When the container needs to run as root internally to perform/support certain functionality, --user isn't necessarily viable, persistence can be more problematic to manage, and this is where leaving the container runtime user as root works better with the distinction being rootful vs rootless containers.

As a maintainer of a less conventional but quite popular Docker image that runs multiple services with varied ownership requirements, this is a necessity and makes a lot more sense without introducing extra complexity for security paranoia requests from users that refuse to adopt the more secure approach (rootless) due to their own comfort with rootful (switching to rootless is deemed inconvenient friction). I say this in reference to my various experiences from users across projects/communities, rather than this Valkey specific discussion.


Yes, it can, but in this case we are talking about a container, not a package-managed application that creates a normal user and then gets extra permissions to e.g. run natively on port 80 - or does it like e.g. Apache, which starts as root, binds to port 80, and then switches its runtime user.

It's not that different with containers.

The notable difference, regardless of whether the container listens on port 80 or 8080 internally, is whether that capability is granted to a non-root user, or the sysctl tunable is modified (as it is by default for Docker) to allow non-root users to bind ports below 1024.

When that container publishes a port mapping to the host, such as 80:80 or 80:8080, the host mapping is unrelated to whatever privilege that container user has. Podman rootless I think will have an issue with it, like you would trying to do so with any service on the host without privilege (or the mentioned workarounds).

One concern with port publishing is it disregards any firewall rules, since Docker manages the rules directly as a rootful service, less noticeable with firewalld which can leverage a specialized zone.

I've been aware of exploits related to this:

  • Even when the port isn't published, IIRC it was possible on L2 switches for a system to connect to another docker host's containers that were only intended to be accessible via 127.0.0.1. Firewalld was immune to this when the docker zone was used.
  • Another is with IPv6 (resolved since Docker v27 IIRC) host connections that'd route through an IPv4 bridge gateway IP (which if a service trusted a private subnet, would accidentally trust foreign public connections to the server).

I realize this information may seem a bit off-topic, but it's to further highlight what sounds like an XY problem in the reported issue itself (the errors encountered), by referencing lesser known and implicit gotchas with Docker that I've been involved in (I've encountered and reported a fair amount, along with providing evidence / justification to get them resolved over the years).

The UID/GID overlap concern with the host is a different thing that I don't mind discussing for the potential benefit that either of us learn something that adjusts our stance where there is disagreement.


True, but the main issue here is the lack of documentation for the container on Docker Hub, which states neither the UID the container wants to run as, nor informs potential users that --user <string|int> can be used.
The above entrypoint is quite good and usable, but requires going through the repo to find it and see whether it can be used or not.

The main issue is that you are reporting Valkey's image is causing an error due to its choice of 999 UID + GID, while insisting that this does not happen with other images that do the same, and that the host OS having these already assigned is somehow at fault (other concerns aside).

Your report is rather vague and light on details to reproduce that, I don't even know what the error is, just that you encountered one. So for the most part we're debating the UID/GID concern from different perspectives and experiences.


As already discussed, yes, the Valkey image could better document the default user behaviour (root init, followed by a switch to valkey / 999), and that it still supports --user (some images do not have the conditional guard to support skipping the root + non-root switch; it can depend on init requirements, so instead of --user they rely on other methods). This documentation concern is fairly common across images unfortunately.

The image's entrypoint script we've discussed specifically only runs the chown + setpriv (to switch running the valkey-server process as valkey UID/GID of 999) when running the container as the default root user, since both commands require that sort of privilege.

  • If you actually want to run valkey-server as root, you need to work around that conditional (which was originally introduced in this PR (Feb 2016)).
  • If you want to run as another user just provide --user 1337 or --user nobody for example, but ensure that you mount a volume to /data with the equivalent ownership (should you want to persist data without errors).
# Either of these will work (--user 0:0 is implicit since the image didn't change it),
# either override the entrypoint or ensure the first arg isn't `valkey-server`:
podman run --rm -itd --name valkey --entrypoint /usr/local/bin/valkey-server valkey/valkey
podman run --rm -itd --name valkey valkey/valkey sh -c 'exec valkey-server'

# Save to disk:
$ podman exec -it valkey valkey-cli SAVE

# Despite the ownership of `/data` still being `valkey:valkey` / `999:999`
# we're running as root so no problem:
$ podman exec -it valkey ls -l
-rw-------. 1 root root 89 Feb 22 06:55 dump.rdb
# When choosing to run as a different non-root user:
$ podman run -itd --name valkey --user 1337 valkey/valkey

$ podman exec -it valkey sh -c 'grep Uid "/proc/$(pidof valkey-server)/status"'
Uid:    1337    1337    1337    1337

# Save to disk fails:
$ podman exec -it valkey valkey-cli SAVE
(error) ERR

# Inspect the logs while shutting down and you'll see similar permission errors:
# FIX: Mount a volume at `/data` with the correct ownership
$ podman stop valkey
$ podman logs valkey
# ...
1:signal-handler (1740211583) Received SIGTERM scheduling shutdown...
1:M 22 Feb 2025 08:06:23.518 * User requested shutdown...
1:M 22 Feb 2025 08:06:23.518 * Saving the final RDB snapshot before exiting.
1:M 22 Feb 2025 08:06:23.518 # Failed opening the temp RDB file temp-1.rdb (in server root dir /data) for saving: Permission denied
1:M 22 Feb 2025 08:06:23.518 # Error trying to save the DB, can't exit.
1:M 22 Feb 2025 08:06:23.518 # Errors trying to shut down the server. Check the logs for more information.
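The fix mentioned in the log comment above can be sketched as follows: pre-create the host directory with ownership matching the --user value before bind mounting it at /data. The UID 1337 and the temp path are illustrative, chown requires root, and the podman command is only printed here, not executed:

```shell
# Pre-create the bind-mount source with ownership matching --user:
datadir="$(mktemp -d)"
chown 1337:1337 "$datadir"
stat -c '%u:%g' "$datadir"   # prints: 1337:1337

# Illustrative invocation only (printed, not run):
echo "podman run -d --user 1337 -v ${datadir}:/data valkey/valkey"
```

With matching ownership in place, SAVE and the shutdown-time RDB write succeed for the non-root user.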

The common practice for rootful containers to support running as non-root users is to have an entrypoint like the one in this Valkey image: start as the root user, then switch to non-root after any initial setup (often with common ENV like PUID/PGID support, as LSIO documents for their MariaDB image).

As you can see from that link, the OSS org linuxserver.io handles the documentation/inconsistency gripe by maintaining various service images with solid documentation and consistency.

Valkey doesn't need the PUID/PGID feature since --user works fine with their image.


NOTE: This section is for additional context on the topic / caveats around differences. Feel free to ignore it.

Bitnami is similar, but opinionated differently. Unlike LSIO, Bitnami actually has a Valkey image like I cited early on in this issue, but their default user is non-root 1001. You can shell into the container to see Valkey run as that user:

$ docker run --rm -itd --name valkey --env ALLOW_EMPTY_PASSWORD=yes bitnami/valkey
$ docker exec -it valkey bash

$ grep valkey <<< $(ps -u)
1001           1  0.6  0.5  55864 10424 pts/0    Ssl+ 02:43   0:00 valkey-server 0.0.0.0:6379

If you try to run as your own UID/GID, it'll fail too to no surprise:

$ docker run --rm -it --user 4242:7777 --name valkey --env ALLOW_EMPTY_PASSWORD=yes bitnami/valkey
mkdir: cannot create directory '/opt/bitnami/valkey/tmp': Permission denied

It would kinda support setting the ENV VALKEY_DAEMON_USER=nobody / VALKEY_DAEMON_GROUP=nogroup if they weren't treated as "read-only" (overwritten) here, thus the Bitnami image doesn't really support a custom user/group either.

However when the image is run as root with --user 0:0 instead, it'll trigger this logic running valkey-server as the valkey user (1000):

$ docker run --rm -itd --user 0:0 --name valkey --env ALLOW_EMPTY_PASSWORD=yes bitnami/valkey
$ docker exec -it valkey bash

$ grep valkey <<< $(ps -u)
valkey           1  0.6  0.5  55864 10424 pts/0    Ssl+ 02:43   0:00 valkey-server 0.0.0.0:6379

$ grep valkey /etc/passwd
valkey:x:1000:1000::/home/valkey:/bin/sh

That creates a valkey user at runtime with a default UID of 1000. This is supported via their own common scripts shared across their images, so it's likely intended to dynamically handle an event where you've extended the image and introduced a conflicting UID or your own valkey user.

It does assign that valkey user a login shell, as we can see (which shouldn't be necessary). While not really applicable to most containers due to their sandboxed nature, that has been exploited previously on a host system running Redis, where an attacker was able to write an SSH key to disk via Redis and log into the server as that service user (I am aware that you can prevent this even with a login shell):

Perhaps the most notable example is SSH. If you successfully authenticate as a user over SSH, SSH then launches the user's login shell (or uses it to run the command you provide, if you use the ssh [email protected] 'command to run' syntax).

I'd left a Redis instance with no password open to the world. It got targeted by the Redis crackit attack, in which the attacker inserts an SSH public key into your Redis database, then sends a command telling Redis to write the contents of the database to .ssh/authorized_keys, then tries to SSH in.

I was saved from the consequences of my own incompetence only by the fact that the maintainers of the redis-server Apt package (which I'd used to install Redis) had the wisdom to make it run Redis as a redis user who has no home directory or login shell; had it not been for them, my server would likely have been ransomwared or ended up part of some hacker's botnet. - Source

This official Valkey image also creates a system user with a login shell:

$ docker run --rm -it valkey/valkey grep valkey /etc/passwd
valkey:x:999:999::/home/valkey:/bin/sh
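A hardened variant would create the service user without a login shell instead. Here's a hypothetical Dockerfile sketch using Debian shadow-utils syntax (this is not what the official image actually does, just an illustration):

```dockerfile
# Sketch only: create a system group/user pinned to 999 with no login shell
RUN groupadd --system --gid 999 valkey \
 && useradd --system --uid 999 --gid valkey \
      --home-dir /data --no-create-home \
      --shell /usr/sbin/nologin valkey
```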

But then again the other official images maintained by the docker-library org do the same:

$ docker run --rm -it postgres grep postgres /etc/passwd
postgres:x:999:999::/var/lib/postgresql:/bin/bash

FWIW, when I install both DB packages into a Fedora / Alpine container, we have:

$ docker run --rm -it fedora
$ dnf install -y postgresql-server valkey
$ grep -E 'postgres|valkey' /etc/passwd
postgres:x:26:26:PostgreSQL Server:/var/lib/pgsql:/bin/bash
valkey:x:998:998:Valkey Database Server:/dev/null:/sbin/nologin


$ docker run --rm -it alpine
$ apk add postgresql valkey
$ grep -E 'postgres|valkey' /etc/passwd
postgres:x:70:70:PostgreSQL user:/var/lib/postgresql:/bin/sh
valkey:x:100:101:valkey:/var/lib/valkey:/sbin/nologin

The need for a login shell is often justified by other tooling/automation (like cron) needing to run Postgres CLI tools, or a user switching to the postgres user via su to run commands like psql, but otherwise it isn't actually necessary beyond convenience.


However for a common ENV configurable I see across images with PUID / PGID, I suppose that works and is useful when the container runs with a non-root user instead of leveraging rootless containers 😅

Perhaps - I'm mostly interested in not having hardcoded UID/GIDs in containers running on host servers, where the container has no clue what is running underneath. Yes, there are rootless and --user options, but if the container had been created with a "this is running in a container and should not overlap with the host operating system" mindset then there would not be an issue.

While fixing this in the way you would like might make you happy, it'll likely make others unhappy. I see this sort of thing happen a lot in projects, when it's the easier route to resolve an issue rather than pull it apart and address the real concern properly (your errors / access issues).

Rootless will prevent the overlap; use it (far quicker to do that than to participate in this lengthy discussion). I consider rootful containers using non-root users for security intent more of an anti-pattern; I've made that stance rather clear. The fact that it can be supported to a certain degree is what causes these concerns (and the various issues they often introduce).

Docker's bind mounts will implicitly create the directory on the host for you when it doesn't exist, provided you use the --volume definition; the newer and more capable --mount option does not inherit that behaviour. Similarly with Podman: since it came later, it also doesn't implicitly create the host path, even with --volume and rootful containers; you're expected to do that yourself in advance. I mention this because the convenience, while nice, also leads to frustration for users attempting non-root/rootless when a volume's host path lacks the permissions to be created or is mismatched in ownership from the container's user.


Off-topic: That conditional would look better like (works with both bash and ash, which /bin/sh symlinks to):
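The snippet referenced didn't survive in this thread. As a sketch of the kind of portable conditional meant here (assuming the entrypoint inspects its first argument, as the Redis-derived entrypoints do), a POSIX `[` test with a single `=` works in both bash and ash:

```shell
# Sketch only: POSIX-compatible test, no bash-only [[ ... == ... ]]
set -- valkey-server --save 60   # simulate the container's arguments
if [ "$1" = 'valkey-server' ]; then
    echo "first arg is valkey-server"
fi
```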

Depends - the image is alpine IIRC so it does not need to be usable by other base images.

This image publishes Valkey with both Debian and Alpine bases.

You should generally prefer the Debian one if you've been hit by the various Alpine gotchas (often related to musl or DNS concerns). Some of the Alpine issues are now historical or rather specific to what you're using the image for (the default memory allocator, for example, is notably worse; try building/running Rust or Python vs using glibc or mimalloc).

Deno's official alpine image, for example, copies over a glibc build and uses LD_LIBRARY_PATH instead of leveraging patchelf to use a relative path. That breaks stuff (attempting apk add patchelf in that image and trying to use it should be an easy example of that), yet it's an official image, and this sort of thing happens a lot because this is not the speciality of many project developers; it relies upon someone with the expertise to come along and correct it (I've been meaning to, but I've got a long backlog).

The Dockerfile for Valkey, if I'm not mistaken, is likely forked from the Redis docker-library org copy and modified with what is familiar. Since the docker-library org images I referenced all do the same user/group management for their DBs with 999 set, many projects that reference them are going to follow along too; that's what has been adopted by these more reputable images, and the fact this hasn't changed for so long gives image authors a stronger case for being reluctant to deviate from it.

If anything, should you still hold a strong opinion that the container UID/GID is the real problem here to resolve, you want to shift the discussion to docker-library and get such a change more broadly adopted... however, I don't think you'll have success in convincing them. Their views will be similar to mine on the topic, I imagine.


Yes, hence the best practice of not creating users in containers within well-known UID/GID ranges for system users or accounts, specifically to avoid that kind of scenario; nobody is a good candidate to use by default as it is a non-privileged user on most systems.

Sorry, doesn't make sense to me here. For clarity so that we're on the same page, when you've explicitly mounted a volume to the host, anything written to that location is where your concern is?

No - I ran the container with a volume mount to a path on the host, using Docker. When I looked at the path it was owned by the host's systemd-coredump user, and then I looked into it and found out that the UID valkey runs as in the container is 999, which is <1000, which IMHO it shouldn't be, as it is technically running as another service on the host; that is not good. Setting the UID/GID in the container to something high, or to the default UID for nobody on most hosts, would make this less of an issue.

Yes, you had the Docker daemon run a rootful container, one that runs as a user other than root, and it wrote to that location as that UID (999), along with the ownership change due to this image's entrypoint choices to support that.

There is no surprise there; that's exactly what my question was asking: whether your problem was about the volume host path having data written to it that was assigned ownership to 999. This was allowed because it's a rootful container; rootless would not permit it.

With rootless, you would have the non-root user of your choice run the container, and it would typically map the container's root UID to that host user. So whenever the container writes data to a volume as root, it'll appear as UID 0 in the container shell, but outside of the container the host will show your host user's UID for that same file, due to the mapping.

Then to support any other UID/GID in that container, such as 999 in this case, additional mappings (including a range) can be done using the subuid/subgid bounds defined for that host user; these are outside the standard 2^16 range, which is exactly what you're asking for and wanting... Please just use rootless if this bothers you so heavily; it's literally what it's meant for. Image authors don't have to think about it, and the sysadmin has the host decide what UIDs to manage.

If you would like a quick pointer or example I can demonstrate this with Fedora where it's very simple (their Podman package basically automates much of the rootless setup for you by default).


FWIW you should be able to inspect your processes on the host; you should find that the valkey-server process is visible and running as its assigned UID 999:

# Rootful:
$ docker run --rm -itd --name valkey valkey/valkey

# Run on the Fedora host that is running the `valkey/valkey` container:
$ grep valkey-server <<< $(ps -au)
systemd+ 1748578  2.8  0.5  55868 10140 pts/0    Ssl+ 04:57   0:00 valkey-server *:6379

# On the Fedora host, 999 is assigned to systemd-oom (for reasons I've already demonstrated earlier)
$ grep 999 /etc/passwd
systemd-oom:x:999:999:systemd Userspace OOM Killer:/:/usr/sbin/nologin

And here it is with rootless podman:

# Rootless:
$ podman run --rm -itd --name valkey valkey/valkey

# Run on the Fedora host that is running the `valkey/valkey` container:
$ grep valkey-server <<< $(ps -au)
590822   3642731  0.5  0.5  55868  9928 pts/0    Rsl+ 17:59   0:00 valkey-server *:6379

# And now the container, which lacks `ps` command so we'll query it differently:
# See that internally it's UID 999
$ podman exec -it valkey sh -c 'grep Uid "/proc/$(pidof valkey-server)/status"'
Uid:    999     999     999     999

Here is how the UID mapping support is working:

# I am running as `my-user` with UID 1001:
$ whoami
my-user
$ id -u
1001

# I mentioned this file in the past, you can see it configures a host user
# with a large starting UID value and assigns a range of 2^16 from that point:
$ cat /etc/subuid
linuxuser:524288:65536
my-user:589824:65536

# This is implicitly created by podman,
# but when flexibility is needed Podman CLI supports it via `--uidmap`
# Rootless containers will map <container UID> => <host UID> => <range>
# Thus root user in the container is mapped to 1001 on the host,
# and every UID afterwards is incremented from 589824 on the host
# Since UID 0 is already remapped, UID 999 will be the 998th increment
# (since 589824 is the starting point with UID 1, sort of like 0-based indexing)
# thus you get UID 999 mapped to 590822 on the host,
# which `/etc/subuid` permits writing to `my-user` owned directories,
# but for hopefully obvious reasons cannot coalesce all container UID to 1001 on the host.
$ podman exec -it valkey cat /proc/self/uid_map
         0       1001          1
         1     589824      65536
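The arithmetic in those comments can be checked directly. Assuming the subordinate range starts at 589824 and maps container UID 1 onward:

```shell
# host UID for a container UID, given the uid_map above:
# container UID 0 -> 1001, container UID N (N >= 1) -> 589824 + (N - 1)
container_uid=999
host_uid=$(( 589824 + container_uid - 1 ))
echo "$host_uid"   # 590822, matching the ps output earlier
```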

Notice how using the CLI with rootless was the same as doing it with rootful? Yet the desired outcome is what you're asking for, and that does not require you to ask every image author to accommodate you with a UID that suits.


Documenting the requirement for the container could have made me prep the host to not have the systemd-coredump user on UID 999, run it with --user or rootless, or rebuild the container so it uses 65534 instead.

But neither was done on the test I ran.

Already covered agreement on the documentation being better.

I'm not sure what test you ran:

$ mkdir /tmp/my-nobody-owned-data-dir
$ chown nobody /tmp/my-nobody-owned-data-dir

# Ensure that `/data` has a volume mounted with the correct user ownership,
# `:Z` is required due to Fedora host using SELinux, otherwise writes would fail.
# This does assume `nobody` has the same UID between host and container (usually the case)
$ docker run --rm -itd --name valkey --user nobody -v /tmp/my-nobody-owned-data-dir:/data:Z valkey/valkey

# Works (unlike the earlier example I gave with `--user 1337`):
$ docker exec valkey valkey-cli SAVE
OK

There you go, works just fine! 💪


Otherwise it's a bit bizarre to expect no conflict in UID/GID between containers and the host. Base images differ here by distro, and for each image on those base images, any further system packages installed will shuffle new UID/GID assignments accordingly; you can't do much about that beyond choosing fixed UID/GIDs in advance (I've had to do this for ClamAV with its DB, for example, so that it has stable ownership across image upgrades).

No, what is bizarre is to think of an application running in a container as a system service on the host, when the host can be any random OS set up in any kind of random way that only (for Linux-based OSes) mostly adheres to best practices, e.g. keep service accounts below UID 1000 and users above (which I think started being the norm ~20 years ago, as it used to be UID <500 when I started learning Linux in the late '90s - I'm old, sorry).

I've been using Linux since 2008? I understand your concern, but I seriously think you're looking at this the wrong way.

I know I've repeated myself a fair bit on this:

  • Transfer data between hosts, VM guest mount to host/network, Container volume mount.. your concern is the data boundary being crossed. It's really not that different from the other scenarios in that sense. VM guests can be escaped too. Multiple hosts can share network storage despite different service mappings between them and the storage host.
  • Rootless or user namespace remapping address this issue.

The earlier mention of container filesystems being visible on the host with all their own internal UID/GIDs is another valid data point.

If you'd rather focus on the process(es) in the container and the UID/GID they run as, you'll find OpenShift and Kubernetes have support to automate setting that to a random UID (like via --user) when the image is compatible with that. I personally have no issue when I run rootful containers; I treat those processes as equivalent to native packages on my system, and if I feel uneasy about that I'll use rootless.

999 on your host is a system UID but not necessarily privileged... systemd-coredump itself will initially be invoked privileged to create a socket, but the related service runs unprivileged. Your UID/GID grants permissions (rwx) for ownership/access, but privileges are more to do with the capabilities the process has (which utilities like setcap can grant at a file level, aka "capability-dumb", or which systemd can augment processes with).

True, but polkitd by nature is, and there might be other users with other setups, or services (monitoring/management/etc.) running on the host before Docker is even installed, making UIDs <1000 in containers (IMHO <65000) a problem.

I think you have a misunderstanding with the relation of privilege to capabilities, not UID/GID ownership?

No I'm not - I'm well aware of both and how they differ, but I'm also aware that applications running in containers are not system services - they are containerized applications that are not part of the system, hence they should not overlap with the host. That is kinda the whole idea: keep them separate from the underlying host, ephemeral and host agnostic.

Use rootless then. You cannot expect image authors to satisfy everyone's expectations out of the box.

As I've stated, I have no idea how you would expect to handle an image which has multiple internal UID/GIDs to map for persistence on the host, and runs multiple processes with different UIDs. What does work and makes sense in that scenario is rootless; it greatly simplifies that concern.


Again... you can't justify that way when it's not consistent across distros 🤷‍♂

Having the issue on RHEL using systemd is enough, and I don't have to recreate it on the 1000s of distros out there to justify a basic issue that clearly stems from treating a container as a host system service when it is not.

This UID/GID doesn't exist in Fedora by default unless systemd is installed (for desktop/server ISOs that's usually a given).

Did you test RedHat Enterprise Linux ?

You're missing the point. This is my fault for not being clear enough in my prior response.

I hope you understand what I meant about reproduction this time, and that I used the fedora container as an example of how host UIDs are not predictable (on Alpine, 999 is assigned to the group ping).

Don't leave it up to the image to fix for you when this concern is one you have control over via rootless containers.

Rootless is not difficult to switch to and use. You'll have more headaches pursuing containers if you need to wrestle with this and think about it with every single image you adopt, whereas with rootless it's resolved (thus you can avoid having this sort of debate with various projects, or maintaining your own image variants/patches as a workaround).

Alternatively, as Docker Desktop does... wrap rootful containers in a VM guest... it kinda defeats the purpose, but depending on your deployment needs, a single VM to sandbox all those rootful containers may work for you. FWIW Podman can also manage VMs.


The user (me in this case) didn't choose the UID 999, nor the existing systemd-coredump user having 999 as a UID. The problem/issue is that the container assumes, from multiple statements above and the docker-entrypoint.sh script, that it is an integrated part of the host operating system when in fact it is not.

I honestly have no idea what you expect an image to do here, other than pick some randomly high UID value and pray that it's never used, nor conflicts or causes any other problems. I've already stated why centralizing on nobody can be bad; it might not affect you, but it can for other container use-cases.

The container supports you providing your own alternative UID/GID pair instead, should you want that. And you can run rootless if you'd rather not think about it (and everyone wins, at least until you need a privileged container, but again there is DIND which can kind of work around that).

This is not a new concern, it's an opinionated one (with valid solutions). If there was a major concern with this practice, you would not have the other official DB images also doing what Valkey is 🤷‍♂ (which it inherited from them anyway)


Push for the PGID/PUID feature if you like, or just use rootless containers. It's true that rootless outside of volume concerns do have other limitations, but that shouldn't affect most containers.

I'm not pushing for it - I made the ticket to make the team aware of an issue that seems to arise from "it is a system service and should have the same UID/GID to run" and "it works on my computer so no problem with this config" thinking - treating an application in a container as a native, package-managed system service, which it is not.

Yeah, you gave a very vague report about the issue you had. With the experience you have, one would hope you could report a bug with a bit more information.

I'm still convinced this is an XY problem that I'm wasting an incredible amount of time to point out and educate on (unless you really just want to debate the UID/GID thing, which is entirely separate from "I have errors trying to use the Valkey image and I think it's because of the 999 UID overlap with my host's assignment for it").

I'm not sure what you are on about with the Podman disadvantage... it's daemonless, unlike Docker. If you want a rootful container, then run Podman commands as root?

I have some issues with volumes in some containers, where my user can't create volumes for the container without using sudo to make them. That is the kind I'm talking about. Running containers without volumes is not an issue.

That's a problem you are far more likely to encounter with non-root containers (I don't like them) vs rootful containers (that stay as root) and rootless containers doing their remapping thing (but otherwise staying/appearing as root within the container).

If you can elaborate a bit better about your problem, I can probably guide you to an actual understanding of the issue you encounter and how to resolve that. I have a WIP docs page to guide users with rootless + Podman Quadlets (aka systemd unit config vs compose.yaml), it is rather informative on the whole --uidmap thing (when the default podman does isn't sufficient and instead you need something like --uidmap "+0@$(id -u)", which uses some advanced syntax that is podman specific AFAIK).
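As a rough sketch of the Quadlet approach mentioned (the file name and volume path are hypothetical; `UserNS=keep-id:uid=…` is the Podman option that presents your host user as a specific in-container UID):

```ini
# ~/.config/containers/systemd/valkey.container (hypothetical example)
[Container]
Image=docker.io/valkey/valkey
Volume=%h/valkey-data:/data:Z
# Present the host user as UID/GID 999 inside the container,
# matching what valkey-server runs as:
UserNS=keep-id:uid=999,gid=999

[Install]
WantedBy=default.target
```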

Keep in mind the implicit volumes issue I've mentioned can really be problematic with compose.yaml, should you mess around with changing users or images while a Compose service name remains the same and the image uses VOLUME (like all these DB containers do). That combination can quite easily cause errors or subtle bugs due to different images sharing /data anonymous volumes (switching between redis and valkey images, for example, or mongo), or due to that data not being cleared when you create new containers (you need to explicitly request Compose remove/replace the implicit anonymous volumes).

Using explicit volumes is much less surprising vs those problems; their failure causes are often easy to troubleshoot and fix.


I'll repeat: A container is not a system service and should not overlap with anything on the host operating system, to avoid conflicts, as it is an isolated application. It might run a service that is considered a system service, but since it is not part of the system, it is not a system service; it is an isolated containerized application with no access to the host operating system, and it should adhere to that state and not try to run as a host, package-managed application - those actually use dynamic UID/GIDs when installing, to avoid overlapping UID/GIDs.

Strongly disagree.


Your problem is entirely resolved with ID mapping

Yes, but please keep in mind I just ran a test of the container as per the Docker Hub documentation and saw the UID overlap when I added a volume and then made you guys aware of it.

Again, I have no involvement in this project. I merely saw this issue when I opened my own to discourage the usage of VOLUME, and thought I'd chime in.

If you have an actual bug / error to report, then do so with more information otherwise we're debating opinions. AFAIK whatever test you've done was done wrong and led to the wrong conclusions for what caused the error.


but in the container this is not possible as it is hardcoded.

The hard-coding has no relevance...? Without that, it's not going to read the host's users and groups; that'd be very bad.

The hardcoded UID is the root of this issue, so yes, it very much has relevance. How you do not see this is kinda frightening. Imagine running a container with hardcoded UID 1000, which on most desktops is the default user's UID with sudo rights, and an application in the container escapes: full host control. Not good. Same with system service UIDs <1000, as the person hardcoding 999 in the container does not know the host operating system's setup, nor whether it will be attacked and escaped from.

This has become so drawn out that I was lost where I detailed the same exploit for UID 1000, but then I realized it's in this very reply that I'm still typing up hours later as I go through your reply.

So yes, I do know this and I'm still standing by my opinion that the decision for 999 is fine. A container escape is very unlikely these days unless you as the sysadmin choose to do something stupid, deploying a server with a user that is non-root but has the same ability to leverage docker as if it were root is foolish.

On a dev system it's a convenience at the expense of security; choosing to run rootful containers like that is their decision, but at least if it's only used internally instead of being made publicly accessible, the risk of something accessing the container and escaping it should be low - unless the user is extra foolish and blindly trusts any image they come across, and that image is intentionally malware.

I'd be more wary of hard-coding UIDs like 1000 for that very reason on dev systems; the system range is safer in that regard, since most such users are not that capable without the necessary capabilities granted to them. Feel free to demonstrate by switching to any of those users on the host system without any capabilities; I'd appreciate an actual threat... anything you find is likely going to rely on calling a binary that enables privilege escalation, or on write access to configs (which they likely shouldn't need in the first place).

Use rootless containers; crisis averted regardless of image author UID/GID decisions (unless you still have a host user permitted to run rootful containers without credentials?)


Yes questions are great, just a reminder that I'm not a maintainer of this image (I recall your first reply to me might have mistaken me for one).

I might have for which I'm sorry. Long thread :-P

I see we are both replying iteratively instead of reading through a full response first 😅

That explains the repetition 😂

BTW, I mean no disrespect in my responses where I'm potentially over-explaining things you already know as a sysadmin, but I have enough experience as a sysadmin and with containers that I feel I can weigh in that this sounds like an XY problem with potential knowledge gaps on your end for the more niche aspects.

No worries, but apparently we've learned about containers differently. I've been with them since the beginning and know about rootless, --user, and rebuilding images I dislike to adhere to practical security practices, e.g. changing the UID to something that does not overlap with the host system aka <1000; I often use >65000 to be sure, as since '98 I've yet to see users created in that UID range. I often use variables to set UID/GID dynamically in my own containers, like you already mentioned.

Accounts aren't really meant to be created on the host beyond 2^16 IIRC; I'd have to look up my resource on that (EDIT: Here you go). Slight risk of conflict depending on the environment.

If you know about rootless (not non-root) or the various caveats I've brought up, like with VOLUME and Docker Compose, then I don't know why you've been encountering an error (or have any issue with the explicit volumes you mention) while pointing the blame at a host UID/GID assignment that matches what valkey-server runs as by default in this Valkey image 🤷‍♂

Container escapes are a separate thing; your error though is an XY problem unrelated to the host UID overlap.

Surely somewhere in this extensive reply from me there's an a-ha moment for you, a knowledge gap that led to your error experience (because I'm rather certain something isn't right when you claim the other images, running just like Valkey, magically avoid the error).


You do seem rather knowledgeable but something seems off with the reported problem (errors and access) only affecting Valkey.

Thanks and you too.

My only problem here is the hardcoded UID of 999, which is something I will defend until my death... or until containers automatically re-map UIDs to avoid such trivial issues.

I'm confused. Rootless containers automatically handle the re-map just like I demonstrated earlier in this response... I'm not going to fight your opinion to death, I don't have the motivation to convince you nor the time to sacrifice 😝

Locking down capabilities would be more advanced security practice. Using non-root users as default in images or adding the support for such is usually to benefit the majority audience that doesn't have a good grasp on such things and thus rely upon "best practice" advice from resources they trust.

We fully agree. However, when using non-root users in containers, they should also be set up in an ID range outside that of the normal host operating systems the container is intended to run on, aka above 1000. In my opinion it should be >65000.

...but that's exactly what rootless does! 😭 (rootful can do it with ID-mapped volumes, but as mentioned that may not work with Docker until the containerd 2.0 upgrade with Docker v28, nor does it offer simple syntax for that type of mount; you have to do it as you would on a host)


Evaluation is more than setting it up and running it in production - testing includes a lot of things. In this case, testing whether or not the container actually works as I expected, without altering it, and getting to know it.

I've not used Valkey before and needed to see if my Redis knowledge from my previous company still worked; it seems to with version 7, but seems to need updating for version 8, from what I can remember of the Valkey documentation about the future.

I've not used Redis or Valkey before myself... until today? I ran the commands as demonstrated in this reply, looked up the docs to figure out some way to cause it to write to the volume path, and bam, I've tested the image - and I wouldn't consider --user nobody altering it.


just like not having multiple services installed all trying to use e.g. the same port (80 for apache/nginx/etc.) - not good either. But when it comes to hardcoded values in an ephemeral container created by someone else, it should not overlap, just like overlapping/identical system services such as webservers, or as mentioned in another answer valkey, postgres, mariadb, mysql, mongo etc.

Are you seriously applying this same concern to ports as well? That would drive me nuts.

It's 100% ok for you to run multiple containers that all listen on port 80 internally; you can publish each to a different host port if you like, but within the container a standard port is expected and should remain consistent...

Not that you would need to publish different host ports for containers providing services that can sit behind a reverse proxy.

If you're seriously doing this with ports, please please stop. Not needing to do that is one of the perks of using containers. There are already enough real pragmatic concerns with adopting containers (if your colleagues saw this comment with the variety of caveats you can run into, that'd probably discourage them enough, opinionated concerns aside).
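To be concrete about the ports point, here's a compose.yaml sketch (image and service names are examples only) where both services keep the standard container port 80 and only the published host ports differ:

```yaml
services:
  web1:
    image: nginx
    ports:
      - "8080:80"   # host 8080 -> container 80
  web2:
    image: nginx
    ports:
      - "8081:80"   # host 8081 -> container 80
```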


Podman is not something my colleagues like - they have trouble enough understanding and accepting the use of containers to begin with, as they only see issues with them vs the normal package-managed applications they have been using for decades. I'll get them there, but I can't make them do 1000 new things that they don't understand from day 1; I have to document it, even if it means using Docker and its root daemon.

FWIW Docker does have rootless too, and if this is mostly a concern on the developers' desktop systems, Docker Desktop, while rootful, is effectively rootless in the sense of using a VM to manage Docker. You can use that on Linux too, and it still provides the docker CLI. I mention this because Docker Desktop also has ECI (Enhanced Container Isolation), should you need stricter security requirements (this will add friction should a container require access to the Docker socket).

Docker didn't have it when containers started to gain traction ~8ish years ago, and that is when they (old farts) made up their minds that containers are a bad way to run things. So I'm on a quest...

FWIW I've been using Docker since 2016. Despite all the issues I have had with it I'd still advocate for containers. I'm reasonably aware of the history, but I've only ever worked with containers on a voluntary basis after 2016, so most of my knowledge and experience is from open-source engagement and personal projects rather than paid roles.
