[filesys] issues: archive bloating and plugin timeouts #3965

Open
TrevorBenson opened this issue Mar 14, 2025 · 8 comments

Comments

@TrevorBenson (Member) commented Mar 14, 2025

I have been investigating some issues with the filesys plugin.

  1. Bloating of archives from around 25 MB compressed to 50-150 MB
  2. Timeouts during execution on platforms with 20 or more storage devices, leading to missing sos_commands/filesys content.

Bloating

I observed this over time, generally after 3-9 months of server uptime. It appears related to the files in /proc/fs/ext4/<device>/mb_groups growing to 10 MB or more when there are 20-60 storage drives. I don't expect any solution at the sos report level; I'm just reporting it since I noticed it while tracking issues in filesys.
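As a side note on confirming the bloat on an affected host: procfs entries report a size of 0 to stat(), so the files have to be read to see their real contribution to the archive. A minimal sketch (path pattern taken from the observation above; run as root):

import glob

# stat() reports 0 bytes for procfs files, so read each one to measure its real size
for path in glob.glob('/proc/fs/ext4/*/mb_groups'):
    with open(path, 'rb') as fobj:
        size = len(fobj.read())
    print(f"{path}: {size / (1024 * 1024):.1f} MiB")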

Timeouts

The timeouts appear to be specific to dumpe2fs taking extremely long to run. I intentionally use the dumpe2fs output, so I have been extending the plugin-timeout option to work around the plugin timeouts, but I started investigating because setting the plugin timeout to 900 or even 1500 still resulted in no filesys command output being included in the sos archive.

Based on the PluginOpt definitions, it would appear that lsof, dumpe2fs, and e2freefrag should be optional, since all are set to default=False.

option_list = [
    PluginOpt('lsof', default=False,
              desc='collect information on all open files'),
    PluginOpt('dumpe2fs', default=False, desc='dump filesystem info'),
    PluginOpt('frag', default=False,
              desc='collect filesystem fragmentation status')
]

Except dumpe2fs output exists in sos_commands/filesys for every collection that does not time out. I actually thought it was enabled by default until I reviewed the filesys plugin more closely:

dumpe2fs_opts = '-h'
if self.get_option('dumpe2fs'):
    dumpe2fs_opts = ''
mounts = '/proc/mounts'
ext_fs_regex = r"^(/dev/\S+).+ext[234]\s+"
for dev in self.do_regex_find_all(ext_fs_regex, mounts):
    self.add_cmd_output(f"dumpe2fs {dumpe2fs_opts} {dev}",
                        tags="dumpe2fs_h")
    if self.get_option('frag'):
        self.add_cmd_output(f"e2freefrag {dev}", priority=100)

Inside the block iterating over devices:

  • dumpe2fs is always executed.
  • The tag is dumpe2fs_h regardless of whether the -h option is included.
  • e2freefrag is only executed if its PluginOpt is set to True.

So dumpe2fs simply removes the -h option by default and does not collect superblock information.

-h     only display the superblock information and not any of the block group descriptor detail information.
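To make the behaviour of the block above concrete, here is a small standalone sketch (not sos code; device names are made up) that mirrors the conditional and shows the commands that would be queued with and without -k filesys.dumpe2fs:

# Standalone illustration of the conditional above; device names are hypothetical.
def planned_commands(dumpe2fs_enabled, devices):
    dumpe2fs_opts = '-h'
    if dumpe2fs_enabled:
        dumpe2fs_opts = ''
    return [f"dumpe2fs {dumpe2fs_opts} {dev}" for dev in devices]

print(planned_commands(False, ['/dev/sda1', '/dev/sdb1']))
# ['dumpe2fs -h /dev/sda1', 'dumpe2fs -h /dev/sdb1']
print(planned_commands(True, ['/dev/sda1', '/dev/sdb1']))
# ['dumpe2fs  /dev/sda1', 'dumpe2fs  /dev/sdb1']  <- dumpe2fs still runs, just without -h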

Possible Solutions

  1. Change the conditional block so that dumpe2fs is disabled unless its PluginOpt is set to True (a rough sketch follows this list).
    • Changes current plugin behavior.
    • Makes the existing option work as its description implies.
    • Breaking change for users of -k filesys.dumpe2fs
  2. Change the dumpe2fs PluginOpt description to state the option is about collecting superblock w/o block group descriptor detail.
    • Maintains current plugin behavior.
    • Clarifies the true purpose of the option.
    • No way to disable dumpe2fs
    • No breaking changes for users of -k filesys.dumpe2fs
  3. Add a new PluginOpt dumpe2fs_superblock with default=False and change the dumpe2fs PluginOpt to default=True.
    • Maintains current plugin behavior.
    • Provides control over the superblock option.
    • Breaking change for users of -k filesys.dumpe2fs
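As a rough illustration of what Option 1 could look like inside the plugin (a sketch only, not a proposed patch; whether the enabled form keeps -h, and what tag it should carry, are open questions):

# Option 1 sketch: skip dumpe2fs entirely unless the plugin option is set.
if self.get_option('dumpe2fs'):
    for dev in self.do_regex_find_all(ext_fs_regex, mounts):
        self.add_cmd_output(f"dumpe2fs {dev}", tags="dumpe2fs")
# e2freefrag would stay gated by the existing 'frag' option as it is today.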

I'm hesitant to just select what I think is the best option and open a PR without feedback from others.

@TurboTurtle @pmoravec @arif-ali @jcastill I would appreciate any feedback you can provide. I'm happy to make the commits and open the PR as long as a consensus can be reached about the correct solution to use.

  • From the plugin description it appears the original implementation expected it to be disabled and requested by the user.
  • The commit history is quite old for the plugin, so I suspect many people presume dumpe2fs is enabled by default and rely on it.
@TurboTurtle (Member)

Oof, this one is difficult.

I lean towards "the existing behavior is a bug, and dumpe2fs should be entirely gated by the plugin option", but I'm willing to bet that several groups (at a minimum, several from Red Hat) are used to and expect the "buggy" behavior. @sosreport/team-red-hat can we informally poll the SBRs that use this plugin the most to get a bead on expected behavior and how often they use this collection?

I don't think option 3 is a way we want to go; Options Are Evil(tm) and I'm generally resistant to multiple plugin options around what ultimately ends up being the "same" collection.

No way to disable dumpe2fs

Not a plugin option, but users (today) can use --skip-commands dumpe2fs to do this.
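For example (a usage illustration, assuming a current sos version): sos report --skip-commands dumpe2fs should drop those collections without any code change.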

Regardless of what we decide here, we'll want to address the tagging inconsistency.
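For reference, one minimal way the tag could follow the flag actually used (an illustrative sketch, not an agreed change; the non-"-h" tag name here is made up):

# Sketch: tag according to whether -h was actually passed.
tag = "dumpe2fs_h" if dumpe2fs_opts == '-h' else "dumpe2fs"
self.add_cmd_output(f"dumpe2fs {dumpe2fs_opts} {dev}", tags=tag)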

@pmoravec (Contributor)

Do I understand correctly that even dumpe2fs -h <dev> times out? Hmm..

How rare can this be? We have some commands guarded by a plugin option for the same reason of potential timeouts (though right now I found just the lsof option in filesys). Knowing this would help us assess Option 1.

I agree the current behaviour is counter-intuitive, but I don't know what the default behaviour (or plugin options) should be. Due to the legacy reasons / commit history, I am inclined to collect dumpe2fs -h <dev> by default (unless it often times out) and change the options accordingly.

I am asking my filesys colleagues for feedback to understand the users' requirements.

@jcastill (Member)

I'll check with the Insights and AI teams to see if they are using that output.

The way I understand the situation at the moment is:
By default we always gather superblock information with -h, because that's very quick* and offers useful information to the filesystem team, and only if the dumpe2fs option is used do we gather the extended output (i.e. superblock and block group descriptors), which could take longer depending on how big the filesystem is and other details about it.

So point 2:

Change the dumpe2fs PluginOpt description to state the option is about collecting superblock w/o block group descriptor detail.

Could probably be:

Change the dumpe2fs PluginOpt description to state the option is about collecting extended information about the filesystem, i.e. superblock and block group descriptor detail.

  • Very quick in theory, but as @pmoravec asked... do you get timeouts with -h as well?

@bmr-cymru (Member) commented Mar 17, 2025

There's a fairly long history here of tension between gathering useful data vs. bloat, runtime, and IO load concerns. Roughly a decade ago we were told that dumpe2fs (at least -h) was essential and it was reinstated - that situation may have changed in the meantime, so it's always worth checking back.

The history of the option goes like this:

The dumpe2fs collection was originally added in commit 5349c95, replacing the older tune2fs -l collection:

https://bugzilla.redhat.com/show_bug.cgi?id=443037

It then got reverted back in commit 8d443ca, and switched back to dumpe2fs in commit c24bf71.

The dumpe2fs plugin option to gate the collection was added in commit 18191c4 ("- moved 1.9 to trunk" / tag r1.9).

(this is all before my time)

In commit 868acaf I changed dumpe2fs to dumpe2fs -h unconditionally for all ext2/3/4 file systems. At this point the whole thing was gated by filesys.dumpe2fs.

In commit d6272e0 the current situation was introduced: dumpe2fs -h is collected by default, and full dumpe2fs only if filesys.dumpe2fs is set. This was requested in #297 by Pier Lambri via RH bz1105629:

https://bugzilla.redhat.com/show_bug.cgi?id=1105629

The incorrect tagging was added in 7761406 - most likely Insights at this point only knew about the default dumpe2fs_h output, but it's confusing if applied to full dumpe2fs (and potentially leads to problems with the extra data collected in this mode).

Collecting dumpe2fs -h should only read a few tens of KiB from the device regardless of its size, although this is obviously multiplied by the number of devices containing ext2/3/4 file systems on the host, so there is still the potential for timeouts with very large numbers of devices.

So dumpe2fs simply removes the -h option by default and does not collect superblock information.

It's the other way around: with -h the command only collects superblock information. Without -h it collects the superblock and all block group descriptors from the entire device. Large file systems have many block groups, hence the slowdown with large devices. If we were to gate this separately it would want to be something like dumpe2fs_groups, although I don't think that's the best option.

Option 1 may not be viable, since collecting dumpe2fs -h by default was explicitly requested when it was reinstated. The only way to find out is to ask (I would avoid ripping it out and seeing who complains 🙂 )

Option 2 is, I think, a good idea irrespective of the other decisions. The current description string is highly misleading:

 filesys.dumpe2fs          off             dump filesystem info

That should at least be something like "dump full filesystem info" or "dump filesystem info with group descriptors".
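For example, the option definition could read something like (wording illustrative only):

PluginOpt('dumpe2fs', default=False,
          desc='dump full filesystem info including group descriptors'),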

Option 3: I tend to agree with @TurboTurtle; more options aren't the answer here. Perhaps one alternative would be to restrict the set of devices we operate on, either by an arbitrary cutoff or by limiting the collection to "system devices" (for some definition of "system devices"). This would complicate things, but it might be the only reasonable middle ground between support, who want this always-on, and large system users, who are impacted by the time consumption and IO generated by running these extra commands.
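Purely to illustrate the cutoff idea (the limit value is arbitrary and not an existing sos option, and a "system devices" filter would need its own definition):

# Hypothetical illustration: cap how many ext2/3/4 devices get the per-device commands.
MAX_EXT_DEVICES = 10  # arbitrary example cutoff, not an existing option
devs = self.do_regex_find_all(ext_fs_regex, mounts)
for dev in devs[:MAX_EXT_DEVICES]:
    self.add_cmd_output(f"dumpe2fs -h {dev}", tags="dumpe2fs_h")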

@TrevorBenson (Member, Author) commented Mar 19, 2025

Do I understand correctly that even dumpe2fs -h <dev> times out? Hmm..

@pmoravec I believe so. The output I see is usually dumpe2fs_-h_.dev.sdb1 etc. for each device. The one time in the last few months that I had access to investigate, I manually tested the dumpe2fs (and e2freefrag) commands on the storage devices.

The commands were taking anywhere from around 20 seconds to over 300 seconds to complete per device. The 300-second case (in this instance the e2freefrag results) occurred with 80% idle CPU and I/O wait under 5%. When systems have been freshly rebooted, either command appears to return almost instantly, or at least quickly enough not to be a concern with the default plugin timeout value, even with around 60 storage devices to enumerate.

It's possible the manual test did not include -h, however. When I had someone else test the other week, I only had them recreate the results with e2freefrag and didn't ask them to test dumpe2fs at that time.

@TrevorBenson (Member, Author) commented Mar 19, 2025

By default we always gather superblock information with -h, because that's very quick* and offers useful information to the filesystem team, and only if the dumpe2fs option is used do we gather the extended output (i.e. superblock and block group descriptors), which could take longer depending on how big the filesystem is and other details about it.

For clarity, I observe the issues with sos report and no PluginOpt values being passed to filesys. From this, and fairly minimal testing months ago, it appears the timeouts can happen even with the superblock-only collection, not only when extended information is gathered. Right before reporting this we tried plugin-timeout = 900 in the sos.conf file, and filesys on a couple of systems still did not complete, so sos_commands/filesys did not exist at all in the collected archive.

EDIT:
However, I don't have a copy of the test to review to be 100% sure -h was used. Given that I only noticed the logic in the conditional block during my recent investigation, it's quite possible that I didn't test with -h, but given that this system only had 26 drives and the mb_groups files in /proc were zero bytes, my suspicion is that it is still dumpe2fs causing the plugin timeouts, even with -h.

@TrevorBenson (Member, Author)

I'll check with the engineer in France to see if we can access one of the platforms where we recently observed the issue, and whether we can retest with dumpe2fs -h as well as each add_cmd_output() executed with the default PluginOpts.

@TrevorBenson (Member, Author)

So dumpe2fs simply removes the -h option by default and does not collect superblock information.

It's the other way around: with -h the command only collects superblock information. Without -h it collects the superblock and all block group descriptors from the entire device.

Yep, poorly worded on my part; I should have said it does not collect only superblock information.
