postgresql-style pre-startup init scripts hook #53

Closed

anentropic

Here is what I meant in issue #52.

For now I've only implemented this in the 2.6/ tree, because I'm seeking feedback first; I can add it to the other versions if it's OK.

This is a direct copy of what the postgres image does: https://github.com/docker-library/postgres/

Baking it into the image saves end users from having to copy and paste docker-entrypoint.sh and override it in their own projects.

Instead, they can add a line like the following to their Dockerfile:

```dockerfile
COPY ./docker-entrypoint-initdb.d /docker-entrypoint-initdb.d
```

I noticed the rest of your docker-entrypoint.sh is already quite similar to the postgres one, so I'm hoping a standard way of achieving this kind of thing is emerging across the docker-library images.
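A rough sketch of how the hook could dispatch files, assuming the postgres convention of sourcing `*.sh` files and handing data files to the database client. The function name `run_initdb_scripts`, and treating `*.js` as mongo's counterpart to postgres's `*.sql`, are assumptions for illustration rather than code from this PR; it also assumes the temporary mongod is already running.

```sh
# Hypothetical sketch of a postgres-style init hook for mongo (not the PR's code).
# Usage: run_initdb_scripts DIR CLIENT [CLIENT_ARGS...]
#   *.sh files are sourced into the current shell, as the postgres entrypoint does;
#   *.js files are passed to CLIENT (assumed to be the mongo shell);
#   anything else is reported and skipped.
run_initdb_scripts() {
	local dir="$1" f
	shift
	for f in "$dir"/*; do
		# An empty directory leaves the glob unexpanded; skip that literal pattern.
		[ -e "$f" ] || continue
		case "$f" in
			*.sh) echo "$0: running $f"; . "$f" ;;
			*.js) echo "$0: running $f"; "$@" "$f" ;;
			*)    echo "$0: ignoring $f" ;;
		esac
	done
}
```

In an entrypoint this might be called as `run_initdb_scripts /docker-entrypoint-initdb.d mongo --quiet`, so that each `*.js` file is run by the `mongo` shell as a script file.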

```
@@ -13,6 +13,20 @@ if [ "$1" = 'mongod' ]; then
set -- $numa "$@"
fi

# internal start of server in order to allow set-up using mongo client
gosu mongodb mongod --fork --dbpath=/data/db --syslog
```
Member

Shouldn't this be `gosu mongodb "$@" --fork --bind_ip 127.0.0.1`? That way the temporary server is only reachable from inside the container, and we keep `$numa` and any passed-in flags.

I'm not sure whether `--syslog` or `--logpath /dev/stdout` should be used, since one of them is required when using `--fork`.

Author

In the postgres image they don't pass the `$@` args when starting the temporary server; I'm not sure what the reasoning is either way. I think the only really important thing is that the temporary server uses the same data dir as the real server.

I think `--logpath /dev/stdout` might be best: under Docker all logs are supposed to go there, and the main server's logs will, so it would be consistent.

I'm happy with whatever you'd advise in both cases.

Member

`/data/db` is mongod's default dbpath, so it doesn't need to be passed explicitly (docs.mongodb).

We actually need to adjust this image, postgres, and the other SQL images to start the temporary server with the passed-in options, so that users can override things like the data directory or run with `--storageEngine=wiredTiger`. I don't think using `/dev/stdout` for `--logpath` will work, since that stdout won't be the same stdout as the bash process running it. We might as well just put the log in `/var/log/mongodb/`.

```sh
gosu mongodb "$@" --fork --bind_ip 127.0.0.1 --logpath /var/log/mongodb/mongo-init.log
```

The other option is to drop `--fork` and `--logpath`, background the process with `&`, and wait in a "try to connect" loop.
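The background-and-poll alternative could look roughly like this. `wait_for_ready` is a hypothetical helper; the probe in the comments assumes the legacy `mongo` shell and its `--eval` option, and the retry count and interval are arbitrary choices.

```sh
# Hypothetical helper: run a probe command until it succeeds or the attempts run out.
# Usage: wait_for_ready TRIES INTERVAL_SECONDS PROBE [PROBE_ARGS...]
wait_for_ready() {
	local tries="$1" interval="$2" i=0
	shift 2
	while [ "$i" -lt "$tries" ]; do
		if "$@" >/dev/null 2>&1; then
			return 0
		fi
		i=$((i + 1))
		sleep "$interval"
	done
	return 1
}

# Possible use in an entrypoint (assumes gosu and the mongo shell are installed):
#   gosu mongodb "$@" --bind_ip 127.0.0.1 &
#   pid="$!"
#   wait_for_ready 30 1 mongo --host 127.0.0.1 --quiet --eval 'db.adminCommand({ ping: 1 })' \
#       || { echo >&2 'temporary mongod did not come up'; exit 1; }
#   ...run the init scripts...
#   kill "$pid" && wait "$pid"   # SIGTERM asks mongod to shut down cleanly
```

Since the server is never forked away from the shell, its log output stays on the container's stdout, which sidesteps the `--logpath` question.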

Member

I experimented with this a bit more, and the following works (from tianon/gosu#8 (comment)):

```console
$ docker run -it --rm mongo bash
root@201b1fb3ccb3:/# chown --dereference mongodb /dev/stdout
root@201b1fb3ccb3:/# gosu mongodb mongod --bind_ip 127.0.0.1 --logpath /dev/stdout
2017-02-14T00:03:55.866+0000 I CONTROL  [initandlisten] MongoDB starting : pid=9 port=27017 dbpath=/data/db 64-bit host=201b1fb3ccb3
2017-02-14T00:03:55.866+0000 I CONTROL  [initandlisten] db version v3.4.2
2017-02-14T00:03:55.866+0000 I CONTROL  [initandlisten] git version: 3f76e40c105fc223b3e5aac3e20dcd026b83b38b
2017-02-14T00:03:55.866+0000 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.1t  3 May 2016
2017-02-14T00:03:55.866+0000 I CONTROL  [initandlisten] allocator: tcmalloc
2017-02-14T00:03:55.866+0000 I CONTROL  [initandlisten] modules: none
2017-02-14T00:03:55.866+0000 I CONTROL  [initandlisten] build environment:
2017-02-14T00:03:55.866+0000 I CONTROL  [initandlisten]     distmod: debian81
2017-02-14T00:03:55.866+0000 I CONTROL  [initandlisten]     distarch: x86_64
2017-02-14T00:03:55.866+0000 I CONTROL  [initandlisten]     target_arch: x86_64
2017-02-14T00:03:55.866+0000 I CONTROL  [initandlisten] options: { net: { bindIp: "127.0.0.1" }, systemLog: { destination: "file", path: "/dev/stdout" } }
2017-02-14T00:03:55.872+0000 I STORAGE  [initandlisten] 
2017-02-14T00:03:55.872+0000 I STORAGE  [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2017-02-14T00:03:55.872+0000 I STORAGE  [initandlisten] **          See http://dochub.mongodb.org/core/prodnotes-filesystem
2017-02-14T00:03:55.872+0000 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=15562M,session_max=20000,eviction=(threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2017-02-14T00:03:56.106+0000 I CONTROL  [initandlisten] 
2017-02-14T00:03:56.106+0000 I CONTROL  [initandlisten] ** WARNING: Access control is not enabled for the database.
2017-02-14T00:03:56.106+0000 I CONTROL  [initandlisten] **          Read and write access to data and configuration is unrestricted.
2017-02-14T00:03:56.106+0000 I CONTROL  [initandlisten] 
2017-02-14T00:03:56.173+0000 I FTDC     [initandlisten] Initializing full-time diagnostic data capture with directory '/data/db/diagnostic.data'
2017-02-14T00:03:56.279+0000 I INDEX    [initandlisten] build index on: admin.system.version properties: { v: 2, key: { version: 1 }, name: "incompatible_with_version_32", ns: "admin.system.version" }
2017-02-14T00:03:56.279+0000 I INDEX    [initandlisten] 	 building index using bulk method; build may temporarily use up to 500 megabytes of RAM
2017-02-14T00:03:56.280+0000 I INDEX    [initandlisten] build index done.  scanned 0 total records. 0 secs
2017-02-14T00:03:56.281+0000 I COMMAND  [initandlisten] setting featureCompatibilityVersion to 3.4
2017-02-14T00:03:56.282+0000 I NETWORK  [thread1] waiting for connections on port 27017
```

The main remaining issue is that, as written, this code will rerun on every startup of MongoDB, so we need a reasonably non-hacky way to determine whether a database has already been initialized. In the PostgreSQL image, we check for a specific file that Postgres itself always creates (`PG_VERSION`). In the MySQL image, we took a check from upstream that looks for a `mysql` database within the configured datadir.
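One possible run-once check, sketched under a loud assumption: rather than relying on any particular file mongod creates, the hypothetical `dbpath_is_initialized` helper simply treats any non-empty dbpath as already initialized. It would have to run before the temporary mongod starts, because starting mongod itself creates files (such as `mongod.lock`) in the dbpath.

```sh
# Hypothetical heuristic (an assumption, not an upstream check):
# a dbpath that already contains any entry is treated as initialized.
# Returns 0 if initialized, 1 if the directory is missing or empty.
dbpath_is_initialized() {
	local dbpath="$1"
	[ -d "$dbpath" ] || return 1
	# ls -A includes hidden entries but omits . and ..
	[ -n "$(ls -A "$dbpath")" ]
}
```

The trade-off is that any stray file in a mounted volume would suppress the init scripts, which is part of why a check tied to a file the database itself always writes is preferable when one exists.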

This image is even slightly more complicated, because `--dbpath` might be passed on the command line and we need to handle that intelligently (it might also be hidden in a config file passed with `-f`), so I think this might be layering hacks deeper and deeper. 😞 😢
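A sketch of handling only the command-line part of the `--dbpath` problem. `find_dbpath` is hypothetical: it recognizes `--dbpath VALUE` and `--dbpath=VALUE`, falls back to mongod's default of `/data/db`, and deliberately does not parse a config file passed with `-f`/`--config`, which remains the hard case.

```sh
# Hypothetical helper: print the dbpath implied by a mongod argument list.
# Handles "--dbpath VALUE" and "--dbpath=VALUE"; later occurrences overwrite
# earlier ones. Does NOT read config files given via -f/--config.
find_dbpath() {
	local dbpath=/data/db
	while [ "$#" -gt 0 ]; do
		case "$1" in
			--dbpath=*) dbpath="${1#--dbpath=}" ;;
			--dbpath)   [ "$#" -ge 2 ] && { dbpath="$2"; shift; } ;;
		esac
		shift
	done
	printf '%s\n' "$dbpath"
}
```

It can be fed the same `"$@"` the entrypoint passes to mongod, including a `numactl` prefix, since unrelated words are skipped.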

Author

Oh, I didn't realise postgres only ran the scripts once. I'd assumed the startup scripts should be idempotent, so it wouldn't matter if they ran every time.

@metal3d

metal3d commented Nov 27, 2015

👍 for this PR.

I discovered the init paths in the MySQL and Postgres Docker images, and they are very useful. If this PR is accepted (once you agree on the startup command changes ;) ), I think a lot of people will be happy :)

@jshimko

jshimko commented Dec 6, 2015

Just curious where this PR stands. I've been using a fairly similar setup with a custom mongo container I put together, but I'd love to switch back to the official image if I can get the same functionality out of it.

Is there anything else that needs to be done to this? Would be happy to contribute if there's still something else to do.

@aheissenberger

👍 for this PR

@CpuID

CpuID commented Jul 3, 2016

+1 to getting this merged once everyone's happy with it

@juicemia

Does anybody know roughly when this is going to be merged?

This is a feature I'm in need of.

@tianon
Member

tianon commented Jan 25, 2017

To summarize the issues with implementing this behavior in mongo that I see right away, from looking into what it would take to bring this PR up to speed:

  • needs to use "$@" for launching the temporary mongod, especially so that numactl is used appropriately too
  • bind_ip and logpath issues noted above (#53 (comment))
  • need to detect whether initialization has already happened so initdb scripts only run once, which is complicated by the fact that users can supply --dbpath either directly or via --config to change it from the default value of /data/db (so we can't simply look at /data/db to determine whether we're already initialized)

8 participants