
Wishes for 2022 #896

Open
vt-alt opened this issue Dec 22, 2021 · 5 comments
@vt-alt
Contributor

vt-alt commented Dec 22, 2021

Not to assign blame, but here is a list of weaknesses in burp that we sometimes run into. (By the way, it seems development has stalled?)

  • Sometimes big trees (like an unpacked Linux kernel source) are backed up very, very slowly (many hours) with very low CPU load. We see this on about 1 in 10 notebooks. I wish to debug it further, but it is hard to reproduce (it is 100% reproducible for the affected people once it starts happening to them, but I cannot take their notebooks for experiments).
  • Under normal circumstances (and most importantly for first backups), backup speed is limited by zlib compression pegging a single CPU core at 100%. I would suggest supporting faster, parallelizable compression algorithms such as zstd.
  • Restoring a particular directory is very slow. Maybe this is related to the fact that files can only be selected for restore by a regex.

Recently I wanted to restore the package database from a backup several days old, like this:

$ burp -ar -b 0000687 -d 2021-11-21 -r '^/var/[^/]+/(apt|rpm)/' -v

It's ~400M, but one restore takes about an hour. Also, when I wanted to re-run the command under time, I could not restart the restore quickly because of the repository lock: I had to wait another hour for the server process to finish. The inability to run restores in parallel is a real problem.

  • I only use protocol 1. Protocol 2 is permanently not production-ready, while competitors have long offered chunked, deduplicated backups.
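As a side note on the restore selector above: burp's `-r` option takes a regex, and the one used here matches anything under an `apt` or `rpm` directory one level below `/var`. A minimal sketch of what it selects (the sample paths are hypothetical, chosen only to illustrate the pattern):

```python
import re

# The same selector passed to burp's -r option above.
pattern = re.compile(r'^/var/[^/]+/(apt|rpm)/')

# Hypothetical sample paths, not taken from any real backup.
paths = [
    "/var/lib/rpm/Packages",          # matches: rpm database under /var/lib
    "/var/cache/apt/archives/a.deb",  # matches: apt cache under /var/cache
    "/var/lib/dpkg/status",           # no match: dpkg is neither apt nor rpm
    "/etc/rpm/macros",                # no match: not under /var
]

selected = [p for p in paths if pattern.match(p)]
print(selected)
```

Only the first two paths are selected, which is why the restore pulls in both the apt and rpm databases in one pass.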
@grke
Owner

grke commented Dec 22, 2021

Hello,

Yes, I am a bit stalled at the moment due to lack of time, and I am the only developer.
I intend to keep working on burp when I get some time.

Thank you for the suggestions.
I don't think implementing zstd is as simple as you might think. It requires parallel threads, which I think would basically require rewriting most of the internals of burp. And that wouldn't help if you had multiple clients backing up at the same time.
Actually - which part of the backup are you talking about here - phase2 or something else?

Some ideas for two of the speed issues above, if you are not doing these already:

If you have lots of small files to back up, you might want to turn off librsync (set librsync=0).

For faster restores, you might want to try using hardlinked_archive=1.
With hardlinked backups, the restore doesn't have to apply any diffs when restoring a file, so it can feed the bytes straight off the disk. You can see which backups are already hardlinked by standing in the client's storage directory on the server and running an 'ls */hardlinked'.
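Putting the two suggestions above into a config sketch (server-side `burp-server.conf`; option names are exactly those mentioned above, but check `man burp.conf` for where each belongs in your setup):

```
# Sketch based on the suggestions above -- verify against man burp.conf.
hardlinked_archive = 1   # keep backups fully hardlinked, so restores stream
                         # bytes straight off the disk with no diffs to apply
librsync = 0             # skip delta-diffing; can help with many small files
```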

@vt-alt
Contributor Author

vt-alt commented Dec 22, 2021

Thanks for the reply and suggestions!

phase2 or something else?

Yes, where file transfer occurs.

@grke
Owner

grke commented Dec 22, 2021

Yes, where file transfer occurs.

Do you see the 100% cpu on the client, or server, or both?

@pagalba-com

I think if these are Windows clients, they may be hitting the Windows Task Scheduler reduced-priority issue. Please look at https://aavtech.site/2018/01/windows-task-scheduler-changing-task-priority/
After some update, Windows changed the default task priority.

@pagalba-com

One more thing, @vt-alt: when the rsync library is used on large files, low CPU and network usage can be seen on both client and server while it is busy finding differences, especially for very large files. Data that is already duplicated is neither sent nor reprocessed, but the search itself takes time. In some cases it is faster to set the rsync-library file-size cutoff in the config file.
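A sketch of that cutoff as a config line. I believe the relevant option is `librsync_max_size` (files larger than this skip the delta-diff path and are sent whole), but that name and the size syntax are my assumption here; confirm them in `man burp.conf` before using:

```
# Assumed option name -- verify in man burp.conf.
# Files above this size bypass librsync delta processing entirely.
librsync_max_size = 512Mb
```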
