Skip to content

Add collectionName and databaseName attributes to MongoDbProvider #3322

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

jesmith17
Copy link
Contributor

Added support for defining the target collectionName and databaseName for the logs to the config file. This allows users to define the values via config instead of only configuring them via the connectionString.

@ppkarwasz ppkarwasz changed the title Main Add collectionName/databaseName attributes to MongoDB provider Dec 25, 2024
@ppkarwasz
Copy link
Contributor

@jesmith17,

Thank you for your contribution.

While I am not a MongoDB user, the current behavior of using the connection string to infer the name of the database and collection does not make sense to me. Your proposal looks like a valid improvement. According to the MongoDB connection string documentation the connection string has the form:

mongodb://[username:password@]host1[:port1][,...hostN[:portN]][/[defaultauthdb][?options]]

where defaultauthdb is documented as:

The authentication database to use if the connection string includes username:password@ authentication credentials but the authSource option is unspecified.

Therefore:

  • defaultauthdb does not seem to be the right value to be used as database name to store the logs: I imagine that some MongoDB installation might use a different database for authentication and to store the data, right?
  • the collection name does not even appear in the official connection string. It only appears in the Java-specific ConnectionString class. I guess that many users don't know that syntax.

@garydgregory, what do you think?

Copy link
Contributor

@ppkarwasz ppkarwasz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jesmith17,

This looks like a very valid improvement! 💯

The changes to the MongoDB provider should be documented. Can you:

  • Modify database.adoc and add a description of the databaseName and collectionName configuration attributes. IMHO these attributes should be marked as "required" in the documentation and the description should mention their fallback values (which are kept only for backward compatibility). The fallback values can be marked as deprecated and we might remove them in a future Log4j Core version.
  • Since IMHO providing the database and collection name explicitly seems like the right thing to do, all the examples and integration tests should be rewritten to use the databaseName and collectionName attributes.

@ppkarwasz
Copy link
Contributor

@jesmith17,

The integration tests in the log4j-mongodb module use Docker. If you want to run the tests locally, you need to have Docker available on your system and run:

./mvnw verify -pl log4j-mongodb -Pdocker

@garydgregory
Copy link
Member

I'm not a fan because this creates confusion with the connection string. IMO: It should be that EITHER you provide a connection string OR you provide all the bits and pieces to build a connection string. I don't like the idea of doing it half one way and half the other.

IMO, we should follow the KISS principle and only provide connection string support.

Just like with JDBC, you just need to know a driver's connection string syntax, which invariably changes from time to time, and I certainly don't want to have to change this code because there is some Mongo driver change in some version that breaks our code.

@ppkarwasz
Copy link
Contributor

@garydgregory,

I'm not a fan because this creates confusion with the connection string. IMO: It should be that EITHER you provide a connection string OR you provide all the bits and pieces to build a connection string. I don't like the idea of doing it half one way and half the other.

These were my initial thoughts too, until I dug deeper into the connection string.
From what I have understood from the connection string (mainly this documentation), it does not provide neither the database nor the collection name:

  • it does not contain the database name, because the defaultauthdb part is the name of the database, where the user is defined, not the database that will be used by the MongoDB client. The MongoDB client can use any database available on the server. Did I misinterpret this part?
  • the official connections string does not even have a "collectionName" part. The collection name is only defined by the Java-specific ConnectionString. So Java developers might know this alternative syntax, but MongoDB administrators might not know it.

As I mentioned before, I have almost no experience with MongoDB, so these interpretations might not make sense.

Just like with JDBC, you just need to know a driver's connection string syntax, which invariably changes from time to time, and I certainly don't want to have to change this code because there is some Mongo driver change in some version that breaks our code.

For JDBC you also need to provide the name of the table.

@garydgregory
Copy link
Member

So the real problem is that the Mongo documentation could be better, like in any software.

I am concerned this opens the door for more and more options to be added to "override" the connection string components, making maintenance harder.

We should only, IMO, offer configuration hooks for behavior that can't be achieved otherwise. Let's review:

The database as part of the connection string is documented here (where Google took me):
https://www.mongodb.com/docs/manual/reference/connection-string-examples/#std-label-connections-connection-examples
For example. 2 examples I picked from the page:
Self-Hosted records Database Running Locally:

mongodb://myDatabaseUser:D1fficultP%40ssw0rd@localhost/records

and fancier:
MongoDB Atlas Cluster that Authenticates with AWS IAM credentials:

mongodb+srv://<aws access key id>:<aws secret access key>@cluster0.example.com/testdb?authSource=$external&authMechanism=MONGODB-AWS

The collection is specified as "databaseName.collectionName", see our XML configuration files.

The classic problem is that we do not want to create support tickets like "My appender connects to A instead of B, why?" where we then have to say "Oh, that's because you specified the foo in both the connection string and in the other thing and this takes precedence over that, blah blah".

For these use cases of double configuration, we should throw an exception.

Who would port this to log4j-mongodb4? to master?

@vy
Copy link
Member

vy commented Dec 25, 2024

Agreeing with @garydgregory's sentiment that we should not add more configuration knobs, if the shared use case can be served with just the connection string. I can spare time for this further after the vacation (i.e., 2025-01-02), if not earlier.

@jesmith17
Copy link
Contributor Author

I agree that there is a slippery slope here, but specifying a table name for a db connection in the connection string is very very odd. Allowing the collection name via config instead of connection string puts this in line how the JDBC appender works which required a table name as part of the config. I would actually prefer to make this a config only part and have it override the connection string value. With the way the code currently functions you have to extract the db and collection name from the connection string to pass them to other code blocks anyway. But to preserve backwards compatibility I left it in.

@ppkarwasz
Copy link
Contributor

I looked into the Spring Boot configuration properties for MongoDB:

  • Spring Boot offers a set of properties (username, password, host, port, additional-hosts) that are mutually exclusive with uri. We should certainly not introduce those.
  • The database (and authentication-database) property can however be used **together** with uri`.

It seems that Spring Boot interprets the /defaultauthdb part as "default database to use for authentication and data", while it allows users to specify a different database for each one of these functionalities. We should probably offer a similar configuration option. As documented in Authentication Database, you can create a user in database foo and allow the user to access database bar, so we need at least (and at most 😉) an additional configuration knob for the database to use (or authentication database to use).

@jesmith17
Copy link
Contributor Author

I would also point out that spring doesn't have a configuration property for the collection, and that's really the key item here. They assume, appropriately, that you will have various objects being written to various collections inside the app. So you configure the collection for items differently than you do the DB.

That's really the pattern I think is needed here.

@jesmith17
Copy link
Contributor Author

So I am working on making the unit test changes but I am finding 2 key challenges.

  1. The MongoDbProvider and the MongoDbConnection classes aren't easily unitTested. Essentially they don't expose anyways to ask them about the database or collection (or even the connection for that matter) based on how they currently sit. So Im finding that I have to make some more changes to make them really testable.
  2. I am getting blocked by the io.fabric5:docker-maven-plugin and it's unhappiness that I am running on a M1 Mac. It can't seem to resolve any of the docker server connections so it fails when trying to start docker. I have verified that it's all running, but seem to be stuck. Has anyone recently hit this issue with the Mac M1 process and docker? According to the docker-maven-plugin this was supposed to be resolved in 0.38 and the version mine is finding is 0.45.1.

Copy link
Contributor

@ppkarwasz ppkarwasz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jesmith17,

This looks almost done to me, thanks! It is still missing some details:

  • can you modify the config files in log4j-mongodb/src/test/resources to use the databaseName and collectionName attributes?
  • can you add a changelog entry to src/changelog/.2.x.x? Just copy/paste an existing one and modify the data.

@jesmith17
Copy link
Contributor Author

Unable to do the changelog entry as this work is done on the main (3.x) branch and that file doesn't exist. I can work to back-port (or recreate these in the 2.x structure and can make those changes to that version as needed.

@ppkarwasz
Copy link
Contributor

Unable to do the changelog entry as this work is done on the main (3.x) branch and that file doesn't exist. I can work to back-port (or recreate these in the 2.x structure and can make those changes to that version as needed.

Sorry, I meant src/changelog/.3.x.x 😉

Copy link
Contributor Author

@jesmith17 jesmith17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look good to me

@ppkarwasz ppkarwasz requested review from garydgregory and vy February 12, 2025 08:42
@ppkarwasz
Copy link
Contributor

@garydgregory, @vy,

Could you review this PR?

@jesmith17
Copy link
Contributor Author

@vy Because the filestrucutres between main and 2.x are different I don't think there is a way to just retarget the code changes. In the 2.x branch the MongoDbProvider class is just a shell that wraps calls to MongoDb4Provider (which is in a different project). I can create a new PR with these changes (and the feedback above) and do that if that makes sense. but trying to port these changes to the 2.x branch ends up being a much bigger change to the core of the appender itself.

To your point on the changes to the IT tests to use the new format in all cases, I agree that a test needs to exist for using the old format, but I made those changes at the request of @ppkarwasz on Dec 29th. I don't personally have a strong opinion either way, so if you can both get connected and let me know which way you prefer it, I can make the changes.

@jesmith17
Copy link
Contributor Author

Actually for the IT tests, I just added an additional one that explicitly tests the old format. That's more descriptive and clear anyway.

@vy vy self-assigned this Feb 17, 2025
@vy vy added the appenders Affects one or more Appender plugins label Feb 17, 2025
@vy vy changed the title Add collectionName/databaseName attributes to MongoDB provider Add collectionName and databaseName attributes to MongoDbProvider Feb 17, 2025
@vy vy added this to the 2.25.0 milestone Feb 20, 2025
@ppkarwasz
Copy link
Contributor

@jesmith17,

Could you address @vy's suggestions?

@ppkarwasz ppkarwasz added the waiting-for-user More information is needed from the user label Apr 13, 2025
@ppkarwasz ppkarwasz moved this from To triage to Waiting for user in Log4j bug tracker Apr 13, 2025
Removed the configuration block for the maven-surefire-plugin as false is the default so no need to explicitly state it.
@vy
Copy link
Member

vy commented May 19, 2025

Hey @jesmith17! Are you still working on this issue? We are about to release 2.25.0 and we would really appreciate having this PR merged before the release.

@jesmith17
Copy link
Contributor Author

@vy all of the requested changes have been made. I updated the pom.xml to remove the configuration section and committed that to this same PR.

Are there additional changes that are needed to get this merged?

@vy
Copy link
Member

vy commented May 19, 2025

@jesmith17, CI is failing due to linting issues. Would you mind running ./mvnw spotless:apply, committing the suggested changes, and pushing those commits, please?

You can locally verify this as follows:

./mvnw spotless:apply
./mvnw install -DskipTests

@jesmith17
Copy link
Contributor Author

@vy

So the ./mvnw command never works for me. It tells me that my maven version has an issue with the distributionSha256Sum property. I have reported this before, but it just seemed to go round and round about build env.

But running
mvn spotless:apply only showed a change to the pom.xml which I committed.

At least I think I did, it's almost impossible at this point to distinguish which pull request is which. Let me know if it's better. I have never been able to run the full set of tests locally. See the above conversation thread for those details.

@ppkarwasz
Copy link
Contributor

So the ./mvnw command never works for me.

Did #3676 fix it for you?

@ppkarwasz ppkarwasz enabled auto-merge (squash) May 19, 2025 21:30
@jesmith17
Copy link
Contributor Author

Doesn't appear to have. I did a fresh pull before I did the spotless:apply and that would have included the new file, but I still got the same error.

@vy vy removed the request for review from garydgregory May 20, 2025 07:19
@vy vy removed the waiting-for-user More information is needed from the user label May 20, 2025
@vy
Copy link
Member

vy commented May 20, 2025

@jesmith17, could you please sign your commits and force-push, please?

@vy
Copy link
Member

vy commented May 20, 2025

@jesmith17, alternatively you can grant me write-access to your fork, and I can take care of any leftover work.

@jesmith17
Copy link
Contributor Author

Just invited you and gave you access.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
appenders Affects one or more Appender plugins
Projects
Status: Waiting for user
Development

Successfully merging this pull request may close these issues.

4 participants