Skip to content

Getting collations to work on Alpine using ICU #1004

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
lukaromih opened this issue Oct 14, 2022 · 2 comments · Fixed by docker-library/docs#2220
Closed

Getting collations to work on Alpine using ICU #1004

lukaromih opened this issue Oct 14, 2022 · 2 comments · Fixed by docker-library/docs#2220

Comments

@lukaromih
Copy link

I need custom collations and would really prefer to use Alpine as a base.
I know about Musl (currently) only supporting C or POSIX locales so instead of using a libc as a provider, I tried using icu.

My sources:

I couldn't figure out, why the Docker image postgres:13-alpine doesn't contain all icu locales, because this line in its build source would have me believe it's included already. If anybody could explain that, that would be bonus points.

So I decide to write my own Dockerfile, based on the example of locale customization in Debian based image.

My Dockerfile:

FROM postgres:13-alpine
RUN apk add --no-cache icu-data-full
ENV LANG sl-SI-x-icu

In DB, I test the working of collation:

CREATE TABLE test (some_field text);

INSERT INTO test (some_field)
VALUES ('BA'), ('be'), ('Bi'), ('CA'), ('ce'), ('Ci'), ('ČA'), ('če'), ('Či'), ('DA'), ('de'), ('Di');

SELECT * FROM test ORDER BY some_field;

It returns the values in the wrong order: "BA", "Bi", "CA", "Ci", "DA", "Di", "be", "ce", "de", "ČA", "Či", "če"
I get the same result with:SELECT * FROM test ORDER BY some_field COLLATE "C";

I do get the right result with an explicit: SELECT * FROM test ORDER BY some_field COLLATE "sl-SI-x-icu";
The correct order is: "BA", "be", "Bi", "CA", "ce", "Ci", "ČA", "če", "Či", "DA", "de", "Di"

So Postgres CAN use the collation if instructed so explicitly in every statement (unpractical).
If I do SHOW LC_COLLATE;, it returns sl-SI-x-icu. So it should work, right?
Well, if I change my dockerfile to intead say ENV LANG ghjfgjfc, the container sets up just fine and even SHOW LC_COLLATE; will return "ghjfgjfc" (a nonexistent locale), but keep on using C locale as default.
If I were to set the same gibberish as the LANG environment variable in a Debian based image, it would error out and shut down.

Am I missing something trivial? Should I be setting environment variables differently? Can't it be done with said variables and it could be achieved with a custom initialiaztion script? Should I just give up on Alpine, use Debian and be done with it?

@wglambert
Copy link

wglambert commented Oct 14, 2022

Maybe setting some more environment variables like LC_ALL and LC_COLLATE would apply the right sort order? #725 (comment)

@lukaromih
Copy link
Author

I've tried setting all the different LC_* variables to my desired locale to no avail.
However, being stubborn enough finally paid off, and I did find the solution!
Being able to use Alpine as container base is a victory to me. I hope this solution helps anybody, who appreciates lean containers.

The key is in version 15 of Postgres. And the timing couldn't be better, because it's only like 3-4 days old.
As mentioned in PostgreSQL 15.0 release docs:

Allow ICU collations to be set as the default for clusters and databases (Peter Eisentraut)

Previously, only libc-based collations could be selected at the cluster and database levels. ICU collations could only be used via explicit COLLATE clauses.

So, as per docs on locales and locale providers, the solution is a simple one-liner for initializing your DB cluster with any desired icu collation:
initdb --locale-provider=icu --icu-locale=en (just replace "en" with any other supported locale)

I first used my desired collation with Alpine with a simple custom Dockerfile:

FROM postgres:15-alpine
ENV POSTGRES_INITDB_ARGS "--locale-provider=icu --icu-locale=sl-SI"

Worked like a charm, so I skipped the need for a custom Dockerfile altogether and just passed that ENV in through the docker-compose.yml using the unmodified postgres:15-alpine image.
You could probably also add those 2 parameters to the postgresql.conf file and use that. Though I haven't tested that yet.

Might I suggest updating the docs (Locale Customization) to include the solution for Alpine based images.
Particularly the

Also of note, Alpine-based variants do not support locales

statement is not correct anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants