This repository was archived by the owner on Jun 6, 2023. It is now read-only.

cleanupTempfiles.minutes - default value #61

Open
air3ijai opened this issue Jan 4, 2023 · 1 comment
air3ijai commented Jan 4, 2023

Hello,

We just ran a test of how the Pod handles multiple restarts during backups.

  1. At some point a snapshot creation may be started and then interrupted
  2. As a result, we may be left with an unfinished temporary backup file
    drwxr-xr-x. 1 root root          56 Jan  4 11:04 ..
    -rw-r--r--. 1 root root 20547669028 Jan  4 10:02 dump.rdb
    -rw-r--r--. 1 root root  5188599808 Jan  4 11:03 temp-1-3.rdb
    -rw-r--r--. 1 root root  1432674655 Jan  4 10:46 temp-1-9.rdb
    -rw-r--r--. 1 root root  1078273848 Jan  4 10:46 temp-2086607563.1.rdb
    -rw-r--r--. 1 root root           0 Jan  4 11:06 temp-2088105784.1.rdb
    
  3. On the next start, KeyDB will load the data and then start to sync from the Master
  4. After the sync, it will perform a new backup
  5. This backup can also be interrupted, leaving one more temp file.

Repeating this in a loop, we may run out of disk space. It is certainly a corner case.

The current default for cleanupTempfiles.minutes is 60 minutes, so it will not delete temp files left by crashes that happened just a few minutes ago.

What is the main reason for such a large default value?
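For illustration, here is a minimal sketch of what a time-based cleanup like cleanupTempfiles.minutes presumably does (assumed behavior; the chart's actual sidecar code may differ, and the function name and paths here are hypothetical):

```shell
#!/bin/sh
# Sketch of a time-based temp-file cleanup (assumed behavior of
# cleanupTempfiles.minutes; names and paths are illustrative).
cleanup_tempfiles() {
  data_dir="$1"   # e.g. /data, the persistence mount
  minutes="$2"    # e.g. 60, the current default

  # -mmin "+N" matches files last modified MORE than N minutes ago,
  # so a temp file from a crash only a few minutes ago is NOT removed
  # until it ages past the threshold.
  find "$data_dir" -maxdepth 1 -name 'temp-*.rdb' -mmin "+$minutes" -delete
}
```

With the 60-minute default, `cleanup_tempfiles /data 60` would leave every temp file from the recent crash loop in place, which is exactly the scenario described above.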

For the Bitnami Redis chart we use the following:

master:
  preExecCmds: "rm -rf /data/temp*.*"

This deletes all temporary files right before Redis starts.
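As a standalone sketch, the preExecCmds hook above amounts to the following (the /data path mirrors the chart's default persistence mount and is an assumption here):

```shell
#!/bin/sh
# Unconditional pre-start cleanup, equivalent to the preExecCmds hook:
# remove every leftover temp file before the server starts, keeping
# dump.rdb intact.
pre_start_cleanup() {
  data_dir="$1"
  # temp*.* matches temp-1-3.rdb, temp-2086607563.1.rdb, etc.,
  # but not dump.rdb; -f avoids an error when no temp files exist.
  rm -f "$data_dir"/temp*.*
}
```

Because the server is not running yet at that point, there is no risk of deleting a temp file that an in-progress BGSAVE still needs.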

@air3ijai
Contributor Author

Hit the issue today on Dev, after the many problems we experienced yesterday in the Kubernetes cluster.

Dumps

drwxr-xr-x. 2 root root         102 Jan 11 15:38 .
drwxr-xr-x. 1 root root          56 Jan 10 14:53 ..
-rw-r--r--. 1 root root 20553313533 Jan 10 07:13 dump.rdb
-rw-r--r--. 1 root root  2190929920 Jan 10 07:43 temp--1701050988.1.rdb
-rw-r--r--. 1 root root  1806061732 Jan 11 15:38 temp-324797-0.rdb
-rw-r--r--. 1 root root  2700424307 Jan 10 07:43 temp-652292-0.rdb

Save error loop

1:319:S 11 Jan 2023 15:35:47.933 * Replica 192.168.10.10:6379 asks for synchronization
1:319:S 11 Jan 2023 15:35:47.933 * Full resync requested by replica 192.168.10.10:6379
1:319:S 11 Jan 2023 15:35:47.933 * Starting BGSAVE for SYNC with target: disk
1:319:S 11 Jan 2023 15:35:48.105 * Background saving started by pid 324179
1:319:S 11 Jan 2023 15:35:48.105 * Background saving started
324179:319:C 11 Jan 2023 15:38:35.454 # Write error saving DB on disk: No space left on device
1:319:S 11 Jan 2023 15:38:36.601 # Background saving error
1:319:S 11 Jan 2023 15:38:36.601 # SYNC failed. BGSAVE child returned an error
1:319:S 11 Jan 2023 15:38:36.601 # Connection with replica 192.168.10.10:6379 lost.

1:319:S 11 Jan 2023 15:38:36.783 * Replica 192.168.10.10:6379 asks for synchronization
1:319:S 11 Jan 2023 15:38:36.783 * Full resync requested by replica 192.168.10.10:6379
1:319:S 11 Jan 2023 15:38:36.783 * Starting BGSAVE for SYNC with target: disk
1:319:S 11 Jan 2023 15:38:36.956 * Background saving started by pid 324797
1:319:S 11 Jan 2023 15:38:36.956 * Background saving started
324797:319:C 11 Jan 2023 15:41:19.609 # Write error saving DB on disk: No space left on device
1:319:S 11 Jan 2023 15:41:20.887 # Background saving error
1:319:S 11 Jan 2023 15:41:20.887 # SYNC failed. BGSAVE child returned an error
1:319:S 11 Jan 2023 15:41:20.887 # Connection with replica 192.168.10.10:6379 lost.
