-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cheb2leg is slow #97
Comments
I guess its all in the plan: julia> @time p = plan_cheb2leg(randn(100_000));
8.466608 seconds (7 allocations: 781.531 KiB)
julia> @time p * randn(100_000);
0.160250 seconds (172 allocations: 1.536 MiB) I remember the Toeplitz dot Hankel being much faster. |
Indeed, the plans are slower than the Toeplitz-dot-Hankel approach but the execution times are much faster. Comparing with Figure 5 from the Webb, Townsend, Olver paper, I see:
I think that for multiple transforms, this is a better alternative. There is one (possibly surprising) reason the plans are slower than before: in C, the Float64 plans are first created in long double precision before being converted to double precision data structures to squeeze out the trailing bits more accurately. Similarly, in Float32, they are first created in double precision. If one would accept 12-14 digits instead of 14-16, then the purely 64-bit plans would be fine for 64-bit execution. The plan time at 100_000 is then just a bit less than the current Float32 plan: Notice that now, the 1D plans are also multithreaded, so: julia> c = randn(Float64, 100_000);
julia> @time p = plan_cheb2leg(c);
8.197012 seconds (5 allocations: 208 bytes)
julia> @time p*c;
0.092681 seconds (8 allocations: 781.844 KiB)
julia> x = randn(Float64, 100_000, 20);
julia> @time p*x;
0.514451 seconds (8 allocations: 15.259 MiB)
With four logical cores, the twenty-column execution is not twenty times slower. |
Ok so we should use Toeplitz dot Hankel for |
Better would be to implement the Alpert--Rokhlin scheme in C for cases where the connection coefficients and bounds on off-diagonal block ranks are all known. See #75 (comment) |
This seems very slow:
The text was updated successfully, but these errors were encountered: