.. #1

Open — wants to merge 4 commits into base: master
104 changes: 48 additions & 56 deletions README.md
@@ -1,44 +1,20 @@
# BtrFsGit - BFG

<p align="center"><a href="https://pypi.python.org/pypi/btrfsgit"><img src="https://img.shields.io/pypi/v/btrfsgit.svg" alt="pypi Release Status"></a>

BFG stands for "B-tree Filesystem Git". It borrows git concepts for operations on BTRFS subvolumes without a central location: commit, push, stash, checkout, pull, etc. And it tries to get the most out of incremental send/receive. I built this because my scenario is not just simple backup, but also transferring subvolumes back and forth between multiple machines, where no one machine is a single source of truth. In other words, a desktop computer and a notebook, and a subvol with a bunch of VM images. And then maybe a bunch of external backup HDDs...





## cool features
* It tries to figure out shared parents smartly, by walking the uuids of subvolumes of both filesystems. It doesn't just expect the last transferred snapshot to "be there", in a fixed location, like other tools do.
* No config files, just specify a source subvol and a target subvol on the command line, and in case of a remote machine, a ssh command to use.
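The UUID-walking idea above can be sketched in a few lines of Python. This is an illustration of the concept only, not BFG's actual `VolWalker`; the function name and record format here are my own:

```python
def shared_parent_candidates(subvols_by_uuid, start_uuid, remote_uuids):
    """Walk parent_uuid/received_uuid links (in both directions) from the
    source subvolume and collect read-only subvolumes that also exist on
    the other filesystem; those are usable as `btrfs send -p` parents.

    subvols_by_uuid: uuid -> {'parent_uuid', 'received_uuid', 'ro'}
    """
    seen = set()
    stack = [start_uuid]
    candidates = []
    while stack:
        uuid = stack.pop()
        if uuid is None or uuid in seen or uuid not in subvols_by_uuid:
            continue
        seen.add(uuid)
        record = subvols_by_uuid[uuid]
        if record['ro'] and uuid in remote_uuids:
            candidates.append(uuid)
        # walk "up": where did this subvolume come from?
        stack.append(record['parent_uuid'])
        stack.append(record['received_uuid'])
        # walk "down": snapshots and received copies made from this one
        for other_uuid, other in subvols_by_uuid.items():
            if uuid in (other['parent_uuid'], other['received_uuid']):
                stack.append(other_uuid)
    return candidates
```

Because the walk follows uuid links rather than looking in a fixed snapshot directory, it still finds a usable parent after snapshots have been moved or renamed.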

## what this doesn't do (yet?)
* snapshot pruning
* cleanup after failure / .tmp destination
* finding shared parent by simply listing the snapshots dirs
* config files (for configuring your "remotes", for example)


## planned features
* automatically saving and propagating `sub list` dumps - to allow finding shared parents also for offline generating of send streams, even across multiple machine hops
@@ -52,30 +28,42 @@ I built this because my scenario is not just simple backup, but also transferring
* an attempt to imitate more of git, like merging, exact same command syntax, commit messages (well, maybe commit messages would make sense, maybe as a backend to datalad?)..


## status
`commit_and_push_and_checkout`, `remote_commit_and_pull`, and other commands work, but shared parent logic and some other areas are still being improved. The CLI lib we use, python-fire, behaves in unexpected ways sometimes.


## install:
```
pip install --user btrfsgit
```

## example workflow

This is how I ping-pong my data between my two machines:
```
bfg \
--sshstr=$SSHSTR \
commit_and_push_and_checkout \
--SUBVOLUME=/d \
--REMOTE_SUBVOLUME=/mx500data/lean
```

...this:
* comes up with a snapshot name; by default this is "{timestamp}_from_{hostname}"
* makes a read-only snapshot of /d as /.bfg_snapshots.d/{snapshot name}
* finds the best shared parent and sends the snapshot to the other machine over ssh
* receives it on the other machine as /mx500data/.bfg_snapshots.lean/.incomplete/{snapshot name}
* makes a read-only snapshot of /mx500data/lean as /mx500data/.bfg_snapshots.lean/{timestamp}_stash
* deletes /mx500data/lean
* makes a read-write snapshot of the received snapshot as /mx500data/lean
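The path convention used in these steps can be sketched as follows. This is illustrative Python; `bfg_snapshot_dir` and `snapshot_paths` are names I made up for the sketch, not BFG's API:

```python
import os

def bfg_snapshot_dir(subvol):
    # For a subvolume /a/b, snapshots live next to it in /a/.bfg_snapshots.b,
    # so the managed subvolume itself stays free of nested subvolumes.
    parent, name = os.path.split(subvol.rstrip('/'))
    return os.path.join(parent, '.bfg_snapshots.' + name)

def snapshot_paths(subvol, remote_subvol, snap_name):
    """Where the read-only snapshot lands locally, and where it is
    received on the remote side while the transfer is in progress."""
    local_snap = os.path.join(bfg_snapshot_dir(subvol), snap_name)
    incomplete = os.path.join(bfg_snapshot_dir(remote_subvol), '.incomplete', snap_name)
    return local_snap, incomplete
```

For example, `snapshot_paths('/d', '/mx500data/lean', 'x')` yields the local snapshot path `/.bfg_snapshots.d/x` and the remote in-progress path `/mx500data/.bfg_snapshots.lean/.incomplete/x`, matching the steps above.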


And back:
```
bfg --sshstr=$SSHSTR remote_commit_and_pull --SUBVOLUME=/d --REMOTE_SUBVOLUME=/mx500data/lean
```

In this case, my SSHSTR is `'/opt/hpnssh/usr/bin/ssh -p 2222 -o TCPRcvBufPoll=yes -o NoneSwitch=yes -o NoneEnabled=yes [email protected]'`.

full output:
[example_session.md](misc/example_session.md)

@@ -85,24 +73,17 @@ see also:
## available commands
[docs](docs/bfg/bfg.md)

## prerequisites
### mount the root
#### problem
If your root partition is BTRFS, your "/" is probably not the true top-level subvolume (id 5) of the filesystem, but merely "/@". This is how Linux distributions set up your system by default. These non-toplevel mounts make it hard for BFG to map subvolume UUIDs to full filesystem paths, so it may not be able to figure out the correct `-p` argument for `btrfs send` commands.
#### solution
Make sure that the root subvolume of your BTRFS filesystem is always mounted. For example, my fstab entry is:
```
/dev/mapper/nvme0n1p6_crypt /nvme0n1p6_crypt_root btrfs defaults,subvol= 0 2
```
For some operations, you will need to pass this mountpoint like so: `--LOCAL_FS_TOP_LEVEL_SUBVOL_MOUNT_POINT=...` or `--REMOTE_FS_TOP_LEVEL_SUBVOL_MOUNT_POINT=...`.
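If you would rather not pass the mountpoint by hand, the id-5 mountpoint can in principle be discovered from `/proc/mounts`, which is the same idea as the untested helper in `bfg.py`. A sketch (the function name is mine):

```python
def id5_mount_point(proc_mounts_text, device):
    """Return where the top-level subvolume (subvol=/) of `device` is
    mounted, given the text of /proc/mounts, or None if it isn't mounted."""
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == device:
            if 'subvol=/' in fields[3].split(','):
                return fields[1]  # the mount point
    return None
```

With the fstab entry above, looking up `/dev/mapper/nvme0n1p6_crypt` would return `/nvme0n1p6_crypt_root`.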

### avoid nested subvolumes
#### problem
To be able to make use of stash and checkout, the subvolume that you want to manage with BFG should not contain other subvolumes, so that it can be `btrfs subvolume delete`'d without affecting your snapshots or other subvolumes. (or possibly we could just `mv`?)
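Whether a subvolume can be deleted this way is easy to check from the list of subvolume paths that `btrfs sub list` reports. A sketch with a hypothetical helper name:

```python
def nested_subvols(all_subvol_paths, managed):
    """Return the subvolume paths (relative to the id-5 mountpoint) that
    live inside `managed`; any hit would block a plain `btrfs subvolume
    delete` of the managed subvolume."""
    prefix = managed.rstrip('/') + '/'
    return [p for p in all_subvol_paths if p != managed and p.startswith(prefix)]
```

An empty result means stash and checkout can delete and recreate the subvolume safely.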
@@ -113,6 +94,17 @@ As an example, I have a subvolume `/data`, and by default, BFG will store all sn
#### problem
BTRFS doesn't make a subvolume read-only while it's `btrfs receive`-ing. If another program writes into it at that time, something bad will happen.
#### solution
Don't touch them! Snapshots in progress are stored in `.incomplete/`.


## dev install:
```
pip install --user poetry
cd BtrFsGit
poetry install
```
`poetry install` only installs the executable into somewhere like `/.cache/pypoetry/virtualenvs/bfg-iXQCHChq-py3.6/bin/`. Poetry just doesn't have a "development mode" like setuptools has with `pip install -e .`, so find that directory and copy `bfg` into your `~/.local/bin/`. This is about to be [fixed soon](https://github.com/python-poetry/poetry/issues/34).

## current todo
* what happens when there is only an incomplete snapshot on target? it should be rw, so the receive should fail. Do we want to get into ensuring that a snapshot is complete before using it? This sounds more like snazzer territory, but otoh, we can at least refuse to use .tmp's by default?

* should we optionally store a history inside each subvol, say, `.bfg_parents.json`? Basically, every snapshot created off a subvolume would have its UUID recorded in a list. This would be another way to track down shared parents.
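A minimal sketch of that last idea. The file name `.bfg_parents.json` comes from the todo above, but the format (a flat JSON list of uuids, oldest first) and the function name are my assumptions:

```python
import json
import os

def record_snapshot_uuid(subvol_path, snapshot_uuid):
    """Append snapshot_uuid to <subvol>/.bfg_parents.json, creating the
    file if needed. The resulting list would be another way to track
    down shared parents."""
    history_file = os.path.join(subvol_path, '.bfg_parents.json')
    try:
        with open(history_file) as f:
            uuids = json.load(f)
    except FileNotFoundError:
        uuids = []
    if snapshot_uuid not in uuids:
        uuids.append(snapshot_uuid)
    with open(history_file, 'w') as f:
        json.dump(uuids, f, indent=1)
    return uuids
```

Since the history file travels inside the subvolume, every snapshot of it carries its own ancestry along.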
155 changes: 122 additions & 33 deletions bfg/bfg.py
@@ -417,25 +417,78 @@ def remote_send(s, REMOTE_SNAPSHOT, LOCAL_DIR, PARENT, CLONESRCS):
        _prerr('exit code ' + str(p2.returncode))
        exit(1)

def _top_subvol_mount_point(path):
    """
    untested, but this should remove the need to specify TOP_LEVEL_SUBVOL_MOUNT_POINT.
    """
    # second line of `df` output, first column: the device backing `path`
    fs_device = cmd(['df', path]).splitlines()[1].split()[0]
    with open('/proc/mounts') as mounts:
        for mount_line in mounts.readlines():
            mount = mount_line.split()
            if mount[0] == fs_device:
                options = mount[3].split(',')
                for o in options:
                    if o == 'subvol=/':
                        return mount[1]



def find_common_parent(s, subvolume, remote_subvolume, my_uuid, direction):
    winner = s._find_common_parent2(s, subvolume, remote_subvolume, my_uuid, direction)
    _prerr(f'DETERMINED COMMON PARENT: {winner}.')
    s._add_abspath(winner)
    return winner


def _find_common_parent2(s, subvolume, remote_subvolume, my_uuid, direction):
    all_subvols = s.all_subvols_by_uuid(s, subvolume, remote_subvolume)
    by_uuid_walking = s._parent_candidates_by_uuid_walking(all_subvols, my_uuid, direction)
    if len(by_uuid_walking) != 0:
        return Res(by_uuid_walking[0])
    _prerr('looking up shared parents by walking UUIDs failed.')

    by_path = s._parent_candidates_by_path(subvolume, remote_subvolume, all_subvols, my_uuid, direction).val
    if len(by_path) != 0:
        return Res(by_path[0])
    _prerr('looking up shared parents by filesystem path failed.')

    return Res(None)

def _parent_candidates_by_path(s, subvolume, remote_subvolume, all_subvols, my_uuid, direction):

    # if subvolume or remote_subvolume is not an absolute sv5-mountpoint path,..
    subvolume = s.subvol_abs_path(subvolume)
    remote_subvolume = s.subvol_abs_path(remote_subvolume)

    for k, v in all_subvols.items():
        if not v['ro']: continue
        if not v['machine'] == 'local': continue


def subvol_abs_path(subvol_path):
    subvol_id = int(cmd('btrfs', 'ins', 'rootid', subvol_path))
    r = subvol_record_by_id(all_subvols, subvol_id)
    s._add_abspath(r)
    return r['abspath']



def _add_abspath(s, subvol_record):
    if subvol_record['machine'] == 'remote':
        s._remote_add_abspath(subvol_record)
    else:
        s._local_add_abspath(subvol_record)

def _local_add_abspath(s, subvol_record):
    if s._local_fs_id5_mount_point is None:
        s._local_fs_id5_mount_point = prompt(
Expand All @@ -459,18 +512,46 @@ def _remote_add_abspath(s, subvol_record):
    subvol_record['abspath'] = s._remote_fs_id5_mount_point + '/' + s._remote_cmd(['btrfs', 'ins', 'sub', str(subvol_record['subvol_id']), s._remote_fs_id5_mount_point]).strip()


def parent_candidates(s, subvolume, remote_subvolume, my_uuid, direction):

    # ^ refactor these two
    # if runner == s._remote_cmd:
    #
    # _remote_fs_id5_mount_point becomes a memoized property

    # allow functioning without subvol5 mounted, but dont record abspath? this might be useful if we dont allow it on one machine but at least allow it on the other maybe? Dont need to have it mounted on receiver, if i determine parents by uuid or by simple fs path

    # "if btrfs sub list would print an absolute, complete and reliable path of each subvolume, in a consistent way"
    # abspath might better be named topvol_absolute_fs_path and ideally we'd care more about the part after the mount point path, to abstract away from concrete mount points?

def _parent_candidates_by_uuid_walking(s, all_subvols, my_uuid, direction):
    candidates = []
    for c in VolWalker(all_subvols, direction).walk(my_uuid):
        candidates.append(c)
        _prerr('shared parent found by walking UUIDs: ' + c['local_uuid'])
    return candidates


def all_subvols_by_uuid(s, subvolume, remote_subvolume):
    #remote_subvols = _get_subvolumes(s._remote_cmd, remote_subvolume, is_sv5=False) ..
    #local_subvols = _get_subvolumes(s._local_cmd, subvolume, is_sv5=False) ...

    remote_subvols = _get_subvolumes(s._remote_cmd, s._remote_fs_id5_mount_point)
    local_subvols = _get_subvolumes(s._local_cmd, s._local_fs_id5_mount_point)
    other_subvols = load_subvol_dumps()

    all_subvols = []
@@ -490,58 +571,63 @@ def _parent_candidates(s, subvolume, remote_subvolume, my_uuid, direction):
            raise Exception('wut')  # raising a bare string is invalid in Python 3
        all_subvols2[i['local_uuid']] = i

    return all_subvols2


def _get_subvolumes(command_runner, subvolume, is_sv5=True):
    """
    :param subvolume: filesystem path to the filesystem that we want to get a list of subvolumes of
    :return: list of subvolume records
    """
    subvols = []
    cmd = ['btrfs', 'subvolume', 'list', '-q', '-t', '-R', '-u']
    for line in command_runner(cmd + [subvolume]).splitlines()[2:]:
        subvol = _make_snapshot_struct_from_sub_list_output_line(line, is_sv5)
        subvols.append(subvol)

    # mark read-only subvols
    ro_subvols = set()
    for line in command_runner(cmd + ['-r', subvolume]).splitlines()[2:]:
        subvol = _make_snapshot_struct_from_sub_list_output_line(line, is_sv5)
        ro_subvols.add(subvol['local_uuid'])

    for i in subvols:
        i['ro'] = i['local_uuid'] in ro_subvols

    subvols.sort(key=lambda sv: -sv['subvol_id'])
    return subvols

def _make_snapshot_struct_from_sub_list_output_line(line, is_sv5):
    #logging.debug(line)
    items = line.split()

    snapshot = {}

    parent_uuid = items[3]
    received_uuid = items[4]

    if received_uuid == '-':
        received_uuid = None
    if parent_uuid == '-':
        parent_uuid = None

    snapshot['received_uuid'] = received_uuid
    snapshot['parent_uuid'] = parent_uuid
    snapshot['local_uuid'] = items[5]
    snapshot['subvol_id'] = int(items[0])

    # could be omitted if you dont want to mount subvol5. Otherwise it is assumed to be the path
    # relative to the subvol5 mount point. This means that you should be sub list'ing the
    # mountpoint. "-a" appears to be total garbage too.
    if is_sv5:
        snapshot['sv5_path'] = items[6]

    return snapshot

@@ -558,3 +644,6 @@ def main():

if __name__ == "__main__":
    main()  # pragma: no cover


# https://wiki.tnonline.net/w/Blog/Finding_subvolumes