lib/cpuinfo: Increase the file descriptors limit to handle more CPUs #263

Closed
wants to merge 1 commit into from

babumoger
Contributor

The pqos tool fails with the following errors on systems with 300 or more CPU cores:
$ pqos
NOTE:  Mixed use of MSR and kernel interfaces to manage
       CAT or CMT & MBM may lead to unexpected behavior.
ERROR: Could not open /sys/fs/resctrl directory
ERROR: Failed to stop resctrl events
ERROR: Failed to start all selected OS monitoring events
Monitoring start error on core(s) 339, status 1

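By default, the file descriptor limit is set to 1024 for a session. The pqos monitor uses 3 descriptors per CPU for perf monitoring, so 1024 descriptors cover at most 341 CPUs, and fewer once the descriptors the process already holds are counted. As a result, pqos runs out of descriptors on systems with 300 or more CPUs.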
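
As a rough check of this arithmetic, the following C sketch computes how many CPUs fit under a given descriptor limit. The function name `max_monitored_cpus` and the `reserved` parameter (descriptors the process already holds, such as stdin, stdout, and stderr) are illustrative assumptions and are not taken from the pqos code.

```c
/*
 * Hypothetical helper (not part of pqos): number of CPUs that can be
 * monitored when each CPU needs fds_per_cpu descriptors and the process
 * already holds `reserved` descriptors out of fd_limit.
 */
static long max_monitored_cpus(long fd_limit, long fds_per_cpu, long reserved)
{
        if (fds_per_cpu <= 0 || fd_limit <= reserved)
                return 0;

        /* Integer division: a partially covered CPU does not count. */
        return (fd_limit - reserved) / fds_per_cpu;
}
```

With a limit of 1024 and 3 descriptors per CPU, `max_monitored_cpus(1024, 3, 0)` gives 341, and reserving 3 descriptors for the standard streams gives 340, which is close to the failure reported at core 339.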

Fix the issue by detecting the number of CPUs in the system and raising the descriptor limit with the getrlimit and setrlimit system calls. The limit is raised to 4 times the number of CPUs, which leaves headroom above the 3 descriptors per CPU for other open files.
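
A minimal sketch of this approach, assuming the limit in question is `RLIMIT_NOFILE`. The helper name `raise_fd_limit`, the macro `FDS_PER_CPU_BUDGET`, and the choice to fail when the hard limit is too low are assumptions made for illustration; the actual patch may differ.

```c
#include <sys/resource.h>

/* Assumption: budget 4 descriptors per CPU, as the description states. */
#define FDS_PER_CPU_BUDGET 4

/*
 * Hypothetical helper (not the code from this pull request): raise the
 * soft RLIMIT_NOFILE limit to at least 4 * num_cpus.
 * Returns 0 if the soft limit is sufficient afterwards, -1 otherwise.
 */
static int raise_fd_limit(long num_cpus)
{
        struct rlimit rl;
        rlim_t wanted;

        if (num_cpus <= 0)
                return -1;

        if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
                return -1;

        wanted = (rlim_t)num_cpus * FDS_PER_CPU_BUDGET;

        /* Nothing to do if the current soft limit is already high enough. */
        if (rl.rlim_cur >= wanted)
                return 0;

        /* An unprivileged process cannot raise the soft limit past the hard limit. */
        if (wanted > rl.rlim_max)
                return -1;

        rl.rlim_cur = wanted;
        return setrlimit(RLIMIT_NOFILE, &rl) == 0 ? 0 : -1;
}
```

A caller might obtain the CPU count with `sysconf(_SC_NPROCESSORS_CONF)` and pass it in. How the actual patch detects the CPU count is not shown in this conversation.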

Affected parts

  • library
  • pqos utility
  • rdtset utility
  • App QoS
  • other: (please specify)

Motivation and Context

#261

How Has This Been Tested?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

@babumoger
Contributor Author

Please take a look at the code.

@rkanagar
Contributor

> Please take a look at the code.

Yes, we are reviewing this code. Thanks.

@rkanagar
Contributor

rkanagar commented Apr 18, 2024

Hi Babu,
Please implement the attached fd_diff.txt
fd_diff.txt

@babumoger babumoger force-pushed the AMD-Max_cores_fix-0.1 branch from 9b41e2a to 244a252 Compare April 19, 2024 14:10
@babumoger
Contributor Author

> Hi Babu,
> Please implement the attached fd_diff.txt
> fd_diff.txt

Hi Raghavan, I have implemented your changes. Please review. Thanks.

@babumoger babumoger force-pushed the AMD-Max_cores_fix-0.1 branch from 244a252 to c25bd6d Compare April 25, 2024 16:07
Signed-off-by: Babu Moger <[email protected]>
@babumoger babumoger force-pushed the AMD-Max_cores_fix-0.1 branch from c25bd6d to 42475e2 Compare April 30, 2024 19:00
@rkanagar
Contributor

Merged in v24.05 release

@rkanagar rkanagar closed this Jun 13, 2024
2 participants