Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Races between readdir/getdents64 and name access. #908

Closed
verygreen opened this issue Apr 14, 2016 · 16 comments
Closed

Races between readdir/getdents64 and name access. #908

verygreen opened this issue Apr 14, 2016 · 16 comments
Assignees

Comments

@verygreen
Copy link
Contributor

verygreen commented Apr 14, 2016

Currently borg reads the directory content, then iterates the names, does lstat on them and calls a processing function based on a file type.
This is racy as between lstat and open/whatever inside of the Archive.process_* method somebody might come in and replace the underlying file (atomically via rename, or via delete + new create).

If the scandir code in #905 is also extended to its logical conclusion and would skip lstat altogether, using file type to determine what process method to call, the race window would extend even more since there might be significant time passing between reading the directory and actually getting into some of the files later in the output.
Additional races are possible even inside of the process methods, some of which are documented in #906

I suspect the correct way to deal with this would be to pass in inode number into the process_* method.
The very first thing such method would then do is to open the name presented (with possible exception of process_symlink, since symlinks cannot be opened).
Code needs to be prepared to get ENOENT on open/readlink. if this happens, this should be handled as if this name was not even in the readdir list (since we lost the race to access it anyway).
Once the open is done, fstat on the file descriptor will tell us if inode number has changed and along with it the file type so that we might need to call another processing function (if we can pass an open fd there, we can totally avoid additional racing there).
The rest of the processing function needs to keep using fd-based accesses since in a live filesystem a file name might start point to a different location at any moment and assuming it's always the same inode is unsafe.

Additional problem is present in Archiver._process method for the directory handling.
Upon noticing that we are dealing with a directory, first archive.process_directory is called on the file name, second, we do readdir or scandir on it, bt by the time we get to it, the directory might disappear (which is ok, I guess, we'll just have an entry for an empty dir), or it might be replaced with another file type (also ok, I guess, we still get the empty dir in the archive) or it might be replaced by another dir - this is more problematic as we'd then proceed to insert wrong dir content into the archive which might pose problems starting from just inconsistent data in the archive and then security issues (imagine if the first scan had permissible access, but by the time we got to read the contents - we got one that was a lot more restrictive).
Finally even worse variation of this would be file-replacement - since iteration is happening with the full path every time (yuck), if I have
/path/dir1/file1
/path/dir2/file1
and I rename dir2 to dir1 just at the right time, I am backing up wrong file1 while processing dir1 and I am none the wiser.
The fix here would be to move all the iterating into process_dir().
it would start as usual with open (O_DIRECTORY), then once we have that fd, we are golden, we just need to use fgetdirents64 so that we read this directory at the fd not at some inconclusive path.
Also when we process the entries - we need to perform openat(2) to open that named in the directory referenced by directory filedescriptor so they don't switch under us.
I guess the alternative here would be for _process to be called not with a full name, but with a file descriptor, fstat it if needed, and pass it down to process_*.
On dir processing it would do iteration via fgetdents, perform openat(2) on results and pass that fd down recursively instead of the name.
For platforms not implementing fgetdents/openat, I guess it's possible to chdir into every subdir and do relative lookups, but returning to the previous level might get tricky then.

bountry (paid using borgbackup org funds): https://www.bountysource.com/issues/32968949-races-between-readdir-getdents64-and-name-access

@verygreen
Copy link
Contributor Author

Another attack vector that's easier to pull out is to rename a symlink into the file name between the stat and the open to make borg open another file.
To thwart that, all opens should also have "O_NOFOLLOW" open flag, then when such a thing happened, open would fail and we'd be safe.

Another worry, looking into process_file as an example, I see it takes st (the stat call output) and runs with it without rechecking! Whatever are those other fstat sources, they are not ending up in the archive so potential for things going wrong increase even more.

Basically I just need to catch a right moment when borg is processing something in my home directory, replace one of the files with a symlink to /etc/shadow (or even a hardlink! if on the same fs) and then get the sysadmin to restore that file for me (or use some automated system).

@ThomasWaldmann
Copy link
Member

ThomasWaldmann commented Apr 15, 2016

good finds about the race conditions also, maybe have 1 issue per thing you find.

@verygreen
Copy link
Contributor Author

The races are one issue, it's just really convoluted with many facets. Until you close the root cause, it's just whack-a-mole game. So I feel like this should remain a single issue.

I just filed #909 for the file_known_and_unchanged idea.

@enkore
Copy link
Contributor

enkore commented Jul 9, 2016

Windows has pretty much the same thing as inodes, just calls them differently (iirc no real official name, just MFT entry). They have 64 bit IDs, called "file index" or "file ID". nFileIndexHigh/ nFileIndexLow. NTFS also has 16 byte OIDs for files (that are also mentioned in #983 as metadata to back up), these are also unique but application-assignable.

I am unsure whether the file ID includes the counterpart of Linux'/Unix' "inode generation".

@ThomasWaldmann
Copy link
Member

ThomasWaldmann commented Aug 6, 2018

note: we use os.scandir(path) and it requires python >= 3.7 to support giving a directory FD instead of a path on UNIX.

see also: benhoyt/scandir#108

@borgbackup borgbackup deleted a comment from verygreen Aug 6, 2018
@ThomasWaldmann
Copy link
Member

ThomasWaldmann commented Aug 12, 2018

I had a look at open modes and got the impression that there is no way to get a fd to each possible fs item type that can be also used to get xattrs/acls/bsdflags. symlinks and blk/chr/fifo items also seem problematic with that approach.

I had a look at tar then and they seem to always use *at(parentfd, name) kind of calls:

https://git.savannah.gnu.org/cgit/tar.git/tree/src/create.c

So, assuming we have an open directory fd in parentfd and do multiple *at(parentfd, name) calls with it, is this good enough to avoid stuff changing in between these calls? Is the directory and inodes it refers to kind of frozen in the moment we open it (from the perspective of parentfd users)?

@enkore
Copy link
Contributor

enkore commented Aug 12, 2018

Is the directory and inodes it refers to kind of frozen in the moment we open it (from the perspective of parentfd users)?

No

@verygreen
Copy link
Contributor Author

What item types (outside of symlinks) do you think you cannot get an fd on? Should be doable for everything else.

parent fd is good, but end item is better otherwise a small race window still remains.
fgetxattr will give you attributes on that fd, acl_get_fd(3) gives you ACLs by fd. I am not sure what is bsdflags but there should be fd-specific way of getting those too.

@enkore
Copy link
Contributor

enkore commented Aug 12, 2018

I mapped the BSD-flags (nodump, immutable etc.) on Linux via the FS_IOC_GETFLAGS ioctl, so that always works through a fd anyway.

@ThomasWaldmann
Copy link
Member

Besides symlinks, I read some comments in the code that opening some devices may trigger delays or cause issues.

@ThomasWaldmann
Copy link
Member

Also, it is not totally clear to me, which fs items may have which attributes.
E.g. can xattrs or bsdflags be on anything?

@verygreen
Copy link
Contributor Author

yes, xattrs, ACLs and bsd attributes might be on any inode.

I think O_NDELAY might help with device blocking at times, but yeah, otherwise it's risky business and a path based calls might be warranted here (combined with the inode/fid check you got from your readdir/getdirents call). Granted you can always path based with inode check, but then you'll need some extra logic to retry on changes?

There are two types of changes you want to be aware of:
getdirent returns something, but subsequent lstat/open(NO_FOLLOW) returns ENOENT == valid race, the file was just deleted, don't include it at all.
lstat/fstat return a different fid/inode number - somebody switched the file under us - make sure you update the internal state to match this latest info even if the file type has changed, and so the contents can be properly updated too.

Also if the file type switches from one that results in path-based to fd-based logic - you want to retry the open anyway (the other way around supposedly should not matter? either the open will fail or it will block - and you have zero control over both).

@ThomasWaldmann ThomasWaldmann self-assigned this Jan 25, 2019
ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Feb 17, 2019
scenario:

- x is a regular file
- borg does stat on x: is a regular file
- so borg dispatches to process_file
- attack: x gets replaced by a symlink (mv symlink x)
- in process_file, borg opens x and must not follow the symlink nor
  continue processing as a normal file, but rather error in open()
  due to NOFOLLOW.
ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Feb 17, 2019
we must avoid a handler processing a fs item of wrong file type,
so check if it has changed.
@ThomasWaldmann
Copy link
Member

@verygreen can you review #4043, please?

in update_check_stat() i do not check for a inode change, only for file type change (and for now it will just skip the file if the latter happens).

is there a problem left with a inode change race if I use the current stat info (and if so, which)?

@verygreen
Copy link
Contributor Author

which particular patch? there are many in strange time order on #4043

if you do not check for inode number you are open for races like: Imagine I have /home on the same partition as /, the backup starts, gets a listing of files in one of my dirs and then starts working off files, meanwhile I work from the other end of it and use link(2) on some of my files so the name now point at some file I have no read rights to, like say /etc/shadow - the backup then gets /etc/shadow content into a file otherwise owned by me. And if there's a restore I get to see what's in that secret file.

@verygreen
Copy link
Contributor Author

ok, figured the commit and left some comments

ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Feb 20, 2019
we must avoid a handler processing a fs item of wrong file type,
so check if it has changed.
@ThomasWaldmann
Copy link
Member

fixed by #4043.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants