File locking and/or atomic writes of attributes (and blocks) #63
Comments
For now, there isn't any file locking for blocks, so access from multiple threads or processes must be synchronized externally. I had a look into this a while ago for the C++ side. There are two basic locking mechanisms on POSIX systems: `flock` and `lockf` / `fcntl` record locks.
I did not move forward on this as I am not sure how portable these solutions are (though supporting locking on POSIX first would be ok).
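For reference, the difference between the two POSIX mechanisms is that `flock` takes a single advisory lock on the whole file, while `lockf`/`fcntl` can lock an arbitrary byte range, which matters if several chunks live in one file. A minimal sketch using Python's stdlib bindings (the function name and layout are illustrative, not z5 API):

```python
import fcntl
import os


def write_block_locked(path, offset, data):
    """Write data at a byte offset, holding an exclusive fcntl
    record lock over just the range being written."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # lock only [offset, offset + len(data)); other ranges stay free
        fcntl.lockf(fd, fcntl.LOCK_EX, len(data), offset, os.SEEK_SET)
        os.pwrite(fd, data, offset)
        fcntl.lockf(fd, fcntl.LOCK_UN, len(data), offset, os.SEEK_SET)
    finally:
        os.close(fd)
```

The whole-file equivalent would be `fcntl.flock(fd, fcntl.LOCK_EX)`. Both are advisory: they only exclude other processes that also take the lock.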
Regarding the issue with the AttributeManager: I haven't looked into file locking in python, but another option would be to use the C++ implementation of reading / writing attributes from python (and then using lockf in C++). Getting the values from python to C++ might be a bit brittle / inconvenient though. That's why I reimplemented the functionality in python in the first place. But if the file locking for chunks works out I could give it another look.
Not sure this helps but FWIW in Zarr the ProcessSynchronizer uses a package called fasteners for file locking: https://github.com/zarr-developers/zarr/blob/master/zarr/sync.py
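For context, the ProcessSynchronizer approach is one lock file per key (array or chunk) under a dedicated directory, with fasteners doing the locking. A stdlib-only sketch of that shape, assuming `flock`-based locking; the class and path layout here are illustrative, not Zarr's actual implementation:

```python
import fcntl
import os
from contextlib import contextmanager


class DirectorySynchronizer:
    """Hand out an exclusive inter-process lock per key, backed by
    one lock file per key under lock_dir (the shape fasteners'
    InterProcessLock uses)."""

    def __init__(self, lock_dir):
        self.lock_dir = lock_dir
        os.makedirs(lock_dir, exist_ok=True)

    @contextmanager
    def lock(self, key):
        lock_path = os.path.join(self.lock_dir, key + ".lock")
        fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until acquired
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
            os.close(fd)
```

Usage would look like `with sync.lock("0.0"): write_chunk(...)`. Whether this holds up on a shared filesystem depends on whether the filesystem implements POSIX locks at all (see the NFS discussion below).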
The solutions on stackoverflow for python are generally "use a separate lock file", which sounds kind of like solving a race condition with another race condition. It'd be much more portable though: as we tend to keep N5 files on remote servers, there is the possibility of different OSes trying to access and host them at the same time. Probably best not to go that route if the reference implementation doesn't, though.

Windows has its own file locking semantics, and in rust there's a library which acts as a wrapper around both that and flock. I think there are a few of those in python, in various states of disrepair.

I was just thinking about pushing all the attribute stuff through the C++ layer. The "stupid but it just might work" solution to passing JSON objects between python and C++ would be to just pass the string; it is a serialisation protocol after all ;) The important thing would be to get the C++ layer to hold the file lock for as long as it takes to send the data to python, receive data back from python, and write it out.

Testing these race conditions sounds like it would be a real pain, too.
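A sketch of what such a cross-platform wrapper boils down to. The helper names are made up for illustration; libraries like portalocker wrap these same two OS calls:

```python
import os

if os.name == "nt":
    import msvcrt

    def lock_exclusive(f):
        # Windows: mandatory lock on one byte at the current position
        msvcrt.locking(f.fileno(), msvcrt.LK_LOCK, 1)

    def unlock(f):
        msvcrt.locking(f.fileno(), msvcrt.LK_UNLCK, 1)
else:
    import fcntl

    def lock_exclusive(f):
        # POSIX: advisory exclusive lock on the whole file
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)

    def unlock(f):
        fcntl.flock(f.fileno(), fcntl.LOCK_UN)
```

The semantics still differ (Windows locks are mandatory, POSIX `flock` is advisory), which is part of why the Python wrappers tend to rot.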
@alimanfoo: Thanks for the pointer. Do you happen to know if this works for shared filesystems?

@clbarnes: Yes, just sending the json string would work :). Let's see if […]
First attempt at file locks for writing chunks (so far only implemented for n5) in #65. I have no idea if this actually works, so we will need some proper test cases.
I haven't tested fasteners on shared file systems. My guess is it will work on nfsv4, otherwise no idea.

Also FWIW in Zarr for multithreaded parallelism we use thread locks instead of file locks. But the synchronisation mechanism is pluggable, in the sense that you can use any type of synchronisation with any type of storage, or no synchronisation at all if you know you will always align write operations with chunk boundaries.
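The thread-lock variant is essentially a dict of `threading.Lock`s keyed by chunk. A sketch of the idea (not Zarr's exact class):

```python
import threading
from collections import defaultdict


class ThreadSynchronizer:
    """One threading.Lock per key, for same-process parallelism
    where inter-process file locks would be overkill."""

    def __init__(self):
        self._guard = threading.Lock()           # protects the dict itself
        self._locks = defaultdict(threading.Lock)

    def __getitem__(self, key):
        with self._guard:
            return self._locks[key]
```

Usage: `with sync["0.0"]: write_chunk(...)`. The pluggability point is that the writer only needs something that hands out a context manager per key, so a file-lock synchronizer, a thread-lock synchronizer, or a no-op can be swapped in.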
I guess we can test whether the locks are respected by creating a block, locking it from python, and then trying to write to it in the normal way. We could test race conditions by having one big block, spawning X python processes, and having them wait for a […] I could take a crack at this today.
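A sketch of the race-test shape, using a `multiprocessing.Barrier` so all writers hit the target at the same moment. A plain append-mode file stands in for a chunk here; the function names are illustrative:

```python
import multiprocessing as mp
import os


def writer(barrier, path, payload):
    barrier.wait()                 # all processes release together
    with open(path, "ab") as f:    # append, so total size is deterministic
        f.write(payload)


def run_race(path, n_procs=4, payload=b"x" * 1024):
    """Spawn n_procs writers that all write simultaneously;
    return the resulting file size."""
    barrier = mp.Barrier(n_procs)
    procs = [mp.Process(target=writer, args=(barrier, path, payload))
             for _ in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return os.path.getsize(path)
```

For a real chunk-locking test the assertion would be on the chunk contents being one writer's payload intact, rather than on the file size.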
Here's my branch with the tests; I'm not sure if this testing strategy will work, so I've based it on master for now and will rebase on the locking implementation if they (correctly) fail. https://github.com/clbarnes/z5/blob/race-tests/src/python/test/test_locking.py

This depends on the assumptions that […]
Hm, tests are hanging a lot, always a threat with multiprocessing. |
Got the failures I wanted! I'll just check these refactors still work and then will rebase and PR. |
Putting this here for reference: |
It sounds like the only reliable method would be lock files, which feels pretty gross because you'd need to poll the file system, which can be pretty slow over a network. |
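Concretely, the lock-file approach relies on atomic create-if-absent (`O_CREAT | O_EXCL`) plus polling, and the poll loop is exactly where network-filesystem latency bites. An illustrative helper, not z5 code (note that `O_EXCL` was historically unreliable on very old NFS versions):

```python
import os
import time


def acquire_lockfile(path, timeout=10.0, poll=0.1):
    """Spin until we atomically create the lock file, or time out."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the file already exists,
            # atomically, so only one process can win
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return True                  # we own the lock
        except FileExistsError:
            if time.monotonic() >= deadline:
                return False             # gave up
            time.sleep(poll)             # poll the filesystem again


def release_lockfile(path):
    os.remove(path)
```

Every failed attempt is a round trip to the server, and a crashed process leaves a stale lock file behind, which is the other classic failure mode of this scheme.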
I was thinking about how to deal with the fact that when attributes are changed, the file is opened, read, closed, truncated + opened, and then written to. This may lead to a race condition if two processes are trying to write to the same attributes file. Unfortunately, python stdlib support for file locking is pretty terrible, from the looks of things.
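The truncate-and-rewrite step can at least be made atomic with the classic write-temp-then-rename trick, so a reader never sees a half-written attributes file; note this does not fix the lost-update race between two writers, which still needs a lock. A sketch (the function name is illustrative):

```python
import json
import os
import tempfile


def write_attributes(attrs_path, attrs):
    """Replace an attributes JSON file atomically: write a temp file
    in the same directory, then rename over the original. Readers see
    either the old or the new contents, never a truncated file."""
    dirname = os.path.dirname(os.path.abspath(attrs_path))
    fd, tmp_path = tempfile.mkstemp(dir=dirname, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(attrs, f)
            f.flush()
            os.fsync(f.fileno())      # make sure the data hits disk
        os.replace(tmp_path, attrs_path)  # atomic on POSIX
    except BaseException:
        os.remove(tmp_path)
        raise
```

The temp file must live in the same directory as the target, because rename is only atomic within one filesystem.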
I also had a quick grep of the C++ side of things and was wondering if there was any file locking when writing blocks. I'm going to start using z5py with a multiprocessing pipeline soon and it would be convenient if I didn't have to manage block access myself.