Do not manually generate Mongo ObjectIds on the client. Let MongoDB to create them. #238
Hi @sebaoliveri - Thanks for the PR! I'm struggling with this one a bit. Does it cause #219 to regress? It seems like it should.
I think it's the Mongo (Java) client that generates the …
It is not related to #219 and it has nothing to do with realtime collections.

Issue: Not all expected Mongo documents are read when using "currentEventsByTag".

When you manually create ObjectID(A) (using BSONObjectID.generate()) in thread(1), there is a window of time before ObjectID(A) is actually persisted. The same applies to a second id, ObjectID(B). The issue happens when "currentEventsByTag" is executed between the two persists.

I am going to explain with a real case, providing more details. Let's say that I call "currentEventsByTag" twice (the second call is made 500ms after the first call).

1) When I first call "currentEventsByTag", this is what Mongo looks like:

This is the query that reads events from Mongo: `val query = BSONDocument(`
When the query above reads the events listed above, it returns a Source. Once that Source is consumed, the last event it returns is row (18), because the query sorts by ObjectId in ascending order. So this ObjectId is the highest one in the event list and the last one consumed from the Source. It is the one I use as the offset when I call "currentEventsByTag" a second time.

2) When I call "currentEventsByTag" for the second time, this is what Mongo looks like:

I now pass in ObjectId("5d35bd59730000ae66336776") as the offset (the highest ObjectId returned by the Source from the first call). But please note this row, which was not yet persisted when "currentEventsByTag" was called the first time: ObjectId("5d35bd59730000ae66336773") IS LESS THAN ObjectId("5d35bd59730000ae66336776"). Because of this, the $gt query filter completely discards row 19, and the resulting Source never provides this row from Mongo.

Solution: Let MongoDB create ObjectIDs so chronology is guaranteed.
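
To make the comparison above concrete, here is a minimal sketch using only the two ids quoted in this comment. It assumes the ids are compared as fixed-length lowercase hex strings, which gives the same order as comparing the underlying bytes. The names `ObjectIdOffsetDemo` and `isAfter` are made up for illustration and are not part of the plugin or of ReactiveMongo.

```scala
object ObjectIdOffsetDemo {
  // Assumption: both ids are 24-character lowercase hex strings, so
  // lexicographic string order matches the byte order of the ObjectIds.
  def isAfter(id: String, offset: String): Boolean = id.compareTo(offset) > 0

  // Highest id returned by the first call, reused as the offset.
  val offset = "5d35bd59730000ae66336776"

  // Id that was generated earlier but persisted after the first call.
  val lateId = "5d35bd59730000ae66336773"

  def main(args: Array[String]): Unit = {
    // A "$gt offset" style filter keeps an id only if isAfter is true,
    // so the late row is filtered out on the second call.
    println(s"lateId passes the filter: ${isAfter(lateId, offset)}")
  }
}
```

Since `lateId` sorts before `offset`, a greater-than filter on the offset can never return it, no matter how many times the query is repeated.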
Hi @sebaoliveri,
…d, otherwise delegate the creation to MongoDB
Hi @yahor-filipchyk, I understand. We are already in PROD, using the 'ReactiveMongo' driver, and we are not using realtime collections at all. I added some more code so that ObjectIDs are generated manually only when realtime collections are used.
I just noticed this is closely related to #214 (symptoms). I think the fix to that is a side branch that is slowly progressing towards a monotonic counter / sequence numbers that should eliminate all of these problems. As for the latest changes, I have a couple concerns.
The only change for ReactiveMongo to be considered should be there.
@scullxbones no, I am gonna close this PR
Closing doesn't fix the leak; you need to contact GitHub to remove the public trace, and change the credentials anyway.
I was persisting 1000 events in a highly concurrent scenario and then pulling them using "currentEventsByTag", passing in the correct offset each time, but the total number of events pulled varied randomly between 850 and 999 and never reached 1000.
This happens for the following reason:
Suppose that ObjectIds were natural numbers. What I saw in Mongo was that documents were stored like this:
*1
*2
*4
*3
*5
*6
The numbers above are ObjectIds.
The issue occurs when I pull the events using "currentEventsByTag" and the last event I get from the Source is *4.
When I get *4, I keep it as the offset, so the next time I call "currentEventsByTag" with that offset, the query returns *5 and *6 but never *3, so that event is lost.
Because the ObjectId is generated manually, in a concurrent scenario there is a gap between its creation and the actual persist, so the order of ObjectIds is not guaranteed in MongoDB.
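
The scenario above can be reproduced without a database. The following is a rough sketch under stated assumptions: ObjectIds are modelled as integers, as in the description, and `currentEvents` is a hypothetical stand-in for "currentEventsByTag" that returns the visible ids greater than the offset in ascending order. It is not the plugin's actual query.

```scala
object OffsetGapDemo {
  // Ids in the order they became visible in the collection:
  // 3 was generated before 4 but persisted after it.
  val persistOrder: Vector[Int] = Vector(1, 2, 4, 3, 5, 6)

  // Hypothetical stand-in for currentEventsByTag: keep ids strictly greater
  // than the offset among the documents visible so far, sorted ascending.
  def currentEvents(visible: Seq[Int], offset: Int): Seq[Int] =
    visible.filter(_ > offset).sorted

  def run(): (Seq[Int], Seq[Int]) = {
    // First call: only 1, 2 and 4 have been persisted so far.
    val firstRead = currentEvents(persistOrder.take(3), offset = 0)
    // The highest id seen becomes the offset for the next call.
    val nextOffset = firstRead.last
    // Second call: all six documents are now persisted.
    val secondRead = currentEvents(persistOrder, nextOffset)
    (firstRead, secondRead)
  }

  def main(args: Array[String]): Unit = {
    val (first, second) = run()
    // Anything persisted but never returned by either call is lost.
    val missed = persistOrder.toSet -- first -- second
    println(s"first=$first second=$second missed=$missed")
  }
}
```

In this run the first call returns 1, 2 and 4, the second returns 5 and 6, and 3 is never delivered, which matches the missing events observed with the 1000-event test.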