# Multithreading vs. multiprocessing #171
I sometimes run into problems with multi-processing, e.g. an older version of …

Yes, I also remembered that …
I think "heavy processing" is always relative, but anyway, the overhead might still occupy most of the computing time. Measuring the time spent in the processing function:

```python
import time

import audb
import audinterface
import audmath


def process_func(signal, sampling_rate):
    global tsum
    tx = time.time()
    res = audmath.db(audmath.rms(signal))
    tsum += time.time() - tx
    return res


db = audb.load("emodb", version="1.4.1")

for multiprocessing in [False, True]:
    for num_workers in [1, 5]:
        interface = audinterface.Feature(
            ["rms"],
            process_func=process_func,
            num_workers=num_workers,
            multiprocessing=multiprocessing,
        )
        tsum = 0.0
        t0 = time.time()
        df = interface.process_index(db.files)
        t = time.time() - t0
        print(
            f"{multiprocessing=}, {num_workers=}: {t:.2f} s, "
            f"processing time: {tsum:.2f} s"
        )
```
The figure for the last row (multiprocessing) is not correct with this method, of course, but for the outputs with one worker we see that only a small part of the execution time is spent in …
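This overhead pattern can be reproduced without any audio tooling. A minimal sketch (not using audinterface; the pools and sizes are illustrative only): when the per-item function is trivial, a process pool additionally has to pickle arguments and results and dispatch each tiny task to a worker process, so it typically comes out slower than a thread pool on the same workload.

```python
import multiprocessing
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cheap(x):
    # Trivial per-item work: pool overhead dominates the runtime
    return x * x


items = list(range(1000))
ctx = multiprocessing.get_context("fork")  # assumes a POSIX system

for pool in [
    ThreadPoolExecutor(max_workers=5),
    ProcessPoolExecutor(max_workers=5, mp_context=ctx),
]:
    t0 = time.time()
    with pool:
        results = list(pool.map(cheap, items))
    print(f"{type(pool).__name__}: {time.time() - t0:.3f} s")
```

On a typical run the process pool is noticeably slower here, because each of the 1000 tiny tasks pays serialization and inter-process dispatch costs that dwarf the actual work.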
I repeated the measurement with:

```python
import time

import audb
import opensmile


db = audb.load("emodb", version="1.4.1")

for multiprocessing in [False, True]:
    for num_workers in [1, 5]:
        interface = opensmile.Smile(
            num_workers=num_workers,
            multiprocessing=multiprocessing,
        )
        t0 = time.time()
        df = interface.process_index(db.files)
        t = time.time() - t0
        print(f"{multiprocessing=}, {num_workers=}: {t:.2f} s")
```

and there it does not make a difference whether we use multi-processing or not:
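One plausible explanation for the openSMILE result is that its feature extraction runs in native code that releases the GIL, so plain threads already parallelize it and multi-processing adds only overhead. A sketch of that effect, using numpy as a stand-in for such native code (numpy releases the GIL during large array operations; exact timings depend on your machine and BLAS configuration):

```python
import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def native_work(seed):
    # The matrix product runs in C and releases the GIL,
    # so several threads can execute it truly in parallel.
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((400, 400))
    return float((a @ a).sum())


for num_workers in [1, 5]:
    t0 = time.time()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(native_work, range(10)))
    print(f"{num_workers=}: {time.time() - t0:.2f} s")
```

With GIL-releasing work like this, five threads usually finish well before one thread, which would match seeing no extra benefit from switching to processes.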
But when testing with another feature extractor:

```python
import time

import audb
import audmld


db = audb.load("emodb", version="1.4.1")

for multiprocessing in [False, True]:
    for num_workers in [1, 5]:
        interface = audmld.Mld(
            num_workers=num_workers,
            multiprocessing=multiprocessing,
        )
        t0 = time.time()
        df = interface.process_index(db.files)
        t = time.time() - t0
        print(f"{multiprocessing=}, {num_workers=}: {t:.2f} s")
```

there is indeed a difference:
So I guess this indicates that we made some (wrong?) choice in its implementation, with the result that it only benefits from multi-processing?
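It need not be a wrong choice: if the audmld extraction spends its time in pure Python (or in native code that holds the GIL), that alone would explain why threads do not help but processes do. A sketch of that behaviour with a GIL-bound function (illustrative only; audmld itself is not involved):

```python
import multiprocessing
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def gil_bound(n):
    # A pure-Python loop holds the GIL for its entire duration,
    # so threads cannot run it in parallel; processes can.
    s = 0
    for i in range(n):
        s += i * i
    return s


ctx = multiprocessing.get_context("fork")  # assumes a POSIX system

for name, pool in [
    ("threads", ThreadPoolExecutor(max_workers=5)),
    ("processes", ProcessPoolExecutor(max_workers=5, mp_context=ctx)),
]:
    t0 = time.time()
    with pool:
        results = list(pool.map(gil_bound, [2_000_000] * 5))
    print(f"{name}: {time.time() - t0:.2f} s")
```

On standard CPython the thread pool takes roughly as long as running the five tasks serially, while the process pool scales with the number of workers, matching the audmld measurement.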
At the moment we have `multiprocessing=False` as default, but I wonder what was/is the reasoning behind it. When browsing the web, I can find the following statement:
When doing a simple test:
it returns (after running the second time)
Even though we don't do heavy processing here, multi-processing seems to be faster in this case. Is this expected?
/cc @ureichel, @ChristianGeng, @frankenjoe, @maxschmitt, @audeerington, @schruefer