Fix for Recognizing --threads Option in Benchmarking #2161
base: dev
Conversation
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
```diff
@@ -26,7 +26,14 @@ def image_format(self):

     def load(self, model_path, inputs=None, outputs=None):
```
`num_threads` parameter not present?
Yes, that needs to be added
Hi @anandhu-eng, since we are using a general `load` function call, we need to ensure that the `num_threads` parameter is added to the `load` function of every individual backend.py file, to prevent the current GitHub Actions from failing. Additionally, is this a useful feature to add across all backends? Since it is currently missing in all of them, is there a specific reason it was not included? Obviously, everyone will want to use the maximum resources of their system to get the best results; the only scenario I can think of is when the benchmarking has to be done with relatively little RAM.
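For illustration, a minimal sketch of that signature change, assuming a base `Backend` class with a backward-compatible default so existing call sites (and CI) keep working; the structure here is illustrative, not the actual MLPerf code:

```python
# Hypothetical sketch: add num_threads to every backend's load() with a
# default of None, so callers that don't pass it are unaffected.
class Backend:
    def load(self, model_path, inputs=None, outputs=None, num_threads=None):
        raise NotImplementedError("Backend:load")
```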
@sujik18 another option would be to use an ENV variable.
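For example (a sketch; which variable applies depends on the backend's threading runtime, and `OMP_NUM_THREADS` is the standard OpenMP knob):

```bash
# Limit OpenMP-based backends to 2 threads for this run only.
# TensorFlow also reads TF_NUM_INTRAOP_THREADS / TF_NUM_INTEROP_THREADS.
OMP_NUM_THREADS=2 python main.py --threads 2
```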
Hi @sujik18, I just skimmed through the code. I see that `args.threads` is being used here, and it maps directly to the `QueueRunner` class here (taking the Offline scenario as the example), where the workers (threads) are defined. Would you say the current implementation's thread usage is sub-optimal, or is this PR primarily focused on enhancing the performance of internal operations during prediction?
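For reference, a stripped-down sketch of that worker pattern (illustrative, not the exact MLPerf `QueueRunner`): `threads` controls how many workers pull queries from a queue and run predictions in parallel.

```python
import queue
import threading

class QueueRunner:
    """Spawn `threads` workers that pull items and run model.predict()."""
    def __init__(self, model, threads):
        self.tasks = queue.Queue(maxsize=threads * 4)
        self.workers = [threading.Thread(target=self._run, args=(model,), daemon=True)
                        for _ in range(threads)]
        for w in self.workers:
            w.start()

    def _run(self, model):
        while True:
            item = self.tasks.get()
            if item is None:  # sentinel: stop this worker
                break
            model.predict(item)
```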
Hi @anandhu-eng, currently, while running the script, the `--threads` flag's value is not taken into account and the script always utilizes all threads available system-wide. Even though `args.threads` is passed to `QueueRunner`, it seems the backend might be ignoring it.
With this PR, I have primarily tried to configure the backend to respect the `threads` parameter's value.
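As a sketch of what "respecting the `threads` value" can mean for an ONNX Runtime backend (the `SessionOptions` attributes below are real onnxruntime APIs; the surrounding function is illustrative, not this PR's exact code):

```python
import onnxruntime as rt

def make_session(model_path, num_threads):
    opt = rt.SessionOptions()
    opt.intra_op_num_threads = num_threads  # threads used within one operator
    opt.inter_op_num_threads = num_threads  # threads used across operators
    return rt.InferenceSession(model_path, opt)
```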
Thank you @sujik18. So I understand that the current change controls the number of threads used by the model backend.
Out of curiosity, I would like to ask whether the `threads` argument is meant only to control the number of inputs processed in parallel by the model backend, or if it is also intended to control the parallel operations performed by the model backend. I'm considering the case where the user doesn't need to control the number of threads used by the model backend but wants to control the number of inputs processed in parallel.
> whether the threads argument is meant only to control the number of inputs processed in parallel by the model backend, or if it is also intended to control the parallel operations performed by the model backend.

The `threads` argument is intended to control the number of physical cores/threads that the backend will utilise for parallel execution of the model. On a different note, I am not sure whether we can control through the script which particular CPUs get utilized, beyond the option to disable the ones that shouldn't be used.
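On Linux, at least process-level CPU pinning is possible from Python via the standard library (a sketch, separate from what this PR changes):

```python
import os

# Restrict the current process (pid 0 = self) to CPUs 0 and 1.
# Threads spawned afterwards inherit this affinity mask. Linux-only.
os.sched_setaffinity(0, {0, 1})
```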
> I'm considering the case where the user doesn't need to control the number of threads used by the model backend but wants to control the number of inputs processed in parallel by the model backend.

I guess a user can achieve that by setting the `batch_size` parameter and not setting the `threads` parameter.
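For example, assuming the harness exposes a batch-size flag such as `--max-batchsize` (the flag name here is an assumption, not confirmed by this thread):

```bash
# Process inputs in batches of 32 while leaving backend threading at its default
python main.py --max-batchsize 32
```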
```diff
@@ -646,7 +646,9 @@ def main():
     model = backend.load(
         args.model,
         inputs=args.inputs,
-        outputs=args.outputs)
+        # outputs=args.output,
```
Hi @sujik18, wasn't it originally `args.outputs`, which is different from `args.output`?
Oh, right! It seems I had unknowingly removed it and didn't notice it afterwards.
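For clarity, the intended call site after the fix would presumably look something like this (the `threads=` keyword name is an assumption based on this discussion, not a quote of the final code):

```python
model = backend.load(
    args.model,
    inputs=args.inputs,
    outputs=args.outputs,   # restored: args.outputs, not args.output
    threads=args.threads)   # propagated from the --threads flag
```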
In reference to this issue: updated `main.py` to correctly propagate the `--threads` argument to the backend loader, which was previously missing.

Test Logs:
mlc log thread command run.txt
System stats during the execution of the script with `--threads=2`