Multithreading vs. multiprocessing #171

Open
hagenw opened this issue Apr 12, 2024 · 4 comments
Labels: question (Further information is requested)

Comments

@hagenw
Member

hagenw commented Apr 12, 2024

At the moment the default is multiprocessing=False, but I wonder what the reasoning behind it was/is.

When browsing the web, I find statements like the following:

  • multi-threading is good for IO-bound processes like reading or downloading files
  • multi-processing is good for computationally heavy tasks (see the sketch below)
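
The usual explanation for this rule of thumb is CPython's GIL: pure-Python, CPU-bound code cannot run in parallel across threads, while the GIL is released during blocking IO (and inside many C extensions). A minimal sketch, independent of audinterface, that shows the effect with concurrent.futures:

import concurrent.futures
import time

def cpu_bound(n):
    # pure-Python loop, holds the GIL the whole time
    return sum(i * i for i in range(n))

def io_bound(seconds):
    # sleeping stands in for waiting on IO; the GIL is released while blocked
    time.sleep(seconds)

if __name__ == "__main__":
    for executor_cls in [
        concurrent.futures.ThreadPoolExecutor,
        concurrent.futures.ProcessPoolExecutor,
    ]:
        with executor_cls(max_workers=4) as pool:
            t0 = time.time()
            list(pool.map(cpu_bound, [1_000_000] * 8))
            t_cpu = time.time() - t0
            t0 = time.time()
            list(pool.map(io_bound, [0.2] * 8))
            t_io = time.time() - t0
        print(f"{executor_cls.__name__}: cpu-bound {t_cpu:.2f} s, io-bound {t_io:.2f} s")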

When doing a simple test:

import audb
import audinterface
import audmath
import time

def process_func(signal, sampling_rate):
    return audmath.db(audmath.rms(signal))

db = audb.load("emodb", version="1.4.1")
for multiprocessing in [False, True]:
    for num_workers in [1, 5]:
        interface = audinterface.Feature(
            ["rms"],
            process_func=process_func,
            num_workers=num_workers,
            multiprocessing=multiprocessing,
        )
        t0 = time.time()
        df = interface.process_index(db.files)
        t = time.time() - t0
        print(f"{multiprocessing=}, {num_workers=}: {t:.2f} s")

it returns (when run a second time)

multiprocessing=False, num_workers=1: 0.16 s                                                        
multiprocessing=False, num_workers=5: 0.26 s
multiprocessing=True, num_workers=1: 0.16 s
multiprocessing=True, num_workers=5: 0.11 s

Even though we don't do heavy processing here, multi-processing seems to be faster in this case. Is this expected?

/cc @ureichel, @ChristianGeng, @frankenjoe, @maxschmitt, @audeerington, @schruefer

hagenw added the question label on Apr 12, 2024
@frankenjoe
Collaborator

I sometimes run into problems with multi-processing, e.g. an older version of opensmile did not support it, I think.

@hagenw
Member Author

hagenw commented Apr 12, 2024

Yes, I also remember that multiprocessing=False seemed to be the safer choice, and in audb it does provide the expected speed-up when downloading files. But I wonder if this might be different when executing the process function in audinterface.

@maxschmitt
Contributor

I think "heavy processing" is always relative, but either way the overhead might still account for most of the computing time.

Measuring time spent in the processing function:

import audb
import audinterface
import audmath
import time

def process_func(signal, sampling_rate):
    global tsum
    tx = time.time()
    res = audmath.db(audmath.rms(signal))
    tsum += time.time() - tx
    return res

db = audb.load("emodb", version="1.4.1")
for multiprocessing in [False, True]:
    for num_workers in [1, 5]:
        interface = audinterface.Feature(
            ["rms"],
            process_func=process_func,
            num_workers=num_workers,
            multiprocessing=multiprocessing,
        )
        tsum = 0.
        t0 = time.time()
        df = interface.process_index(db.files)
        t = time.time() - t0
        print(f"{multiprocessing=}, {num_workers=}: {t:.2f} s, "
              f"processing time: {tsum:.2f} s")

multiprocessing=False, num_workers=1: 0.87 s, processing time: 0.06 s
multiprocessing=False, num_workers=5: 0.47 s, processing time: 0.60 s
multiprocessing=True, num_workers=1: 0.40 s, processing time: 0.05 s
multiprocessing=True, num_workers=5: 0.39 s, processing time: 0.00 s

The figure in the last row (multiprocessing with 5 workers) is of course not correct with this method, since the global tsum is then updated in the worker processes and never reaches the parent. But for the runs with one worker we see that only a small part of the execution time is spent in process_func, so the differences are likely mainly due to overhead.
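
One way to sanity-check this independently of the interface is to time the bare processing loop itself. A rough sketch (assuming emodb is already cached and db.files holds absolute paths, which is audb's default):

import time

import audb
import audiofile
import audmath

db = audb.load("emodb", version="1.4.1")
t0 = time.time()
for file in db.files:
    # read each file and compute the feature sequentially,
    # without any interface or worker-pool overhead
    signal, sampling_rate = audiofile.read(file)
    audmath.db(audmath.rms(signal))
print(f"sequential read + process_func: {time.time() - t0:.2f} s")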

@hagenw
Member Author

hagenw commented Apr 15, 2024

I repeated the measurement with opensmile:

import audb
import opensmile
import time

db = audb.load("emodb", version="1.4.1")
for multiprocessing in [False, True]:
    for num_workers in [1, 5]: 
        interface = opensmile.Smile(
            num_workers=num_workers,
            multiprocessing=multiprocessing,
        )
        t0 = time.time()
        df = interface.process_index(db.files)
        t = time.time() - t0
        print(f"{multiprocessing=}, {num_workers=}: {t:.2f} s")

and there it makes no difference whether we use multi-processing or not:

multiprocessing=False, num_workers=1: 20.27 s                                                       
multiprocessing=False, num_workers=5: 6.29 s
multiprocessing=True, num_workers=1: 20.32 s
multiprocessing=True, num_workers=5: 6.54 s

But when testing with another feature extractor:

import audb
import audmld
import time

db = audb.load("emodb", version="1.4.1")
for multiprocessing in [False, True]:
    for num_workers in [1, 5]: 
        interface = audmld.Mld(
            num_workers=num_workers,
            multiprocessing=multiprocessing,
        )
        t0 = time.time()
        df = interface.process_index(db.files)
        t = time.time() - t0
        print(f"{multiprocessing=}, {num_workers=}: {t:.2f} s")

there is indeed a difference:

multiprocessing=False, num_workers=1: 118.00 s                                                      
multiprocessing=False, num_workers=5: 189.54 s
multiprocessing=True, num_workers=1: 106.39 s
multiprocessing=True, num_workers=5: 46.43 s

So I guess this indicates that we made some (wrong?) choice in its implementation, with the result that it only scales when using multiprocessing?
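
A plausible explanation (an assumption, not verified against the audmld code): opensmile does its feature extraction inside a C++ extension that releases the GIL, so multi-threading scales fine, whereas a process func that spends its time in pure Python keeps the GIL and only benefits from multi-processing. A minimal sketch of the two cases with audinterface (gil_releasing and gil_holding are made-up example functions):

import time

import audb
import audinterface
import numpy as np

def gil_releasing(signal, sampling_rate):
    # heavy work inside numpy releases the GIL for most of the time
    x = np.tile(signal.astype(np.float64), 50)
    return float(np.sqrt(np.mean(x * x)))

def gil_holding(signal, sampling_rate):
    # pure-Python loop keeps the GIL the whole time
    total = 0.0
    for value in signal.flatten().tolist()[:200_000]:
        total += value * value
    return total

db = audb.load("emodb", version="1.4.1")
for process_func in [gil_releasing, gil_holding]:
    for multiprocessing in [False, True]:
        interface = audinterface.Feature(
            ["value"],
            process_func=process_func,
            num_workers=5,
            multiprocessing=multiprocessing,
        )
        t0 = time.time()
        df = interface.process_index(db.files)
        t = time.time() - t0
        print(f"{process_func.__name__}, {multiprocessing=}: {t:.2f} s")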
