Open
grepdemos/ImageSharp
#3
Description
As @saucecontrol pointed out in his comment, we can get rid of the VPERMPS (Avx2.PermuteVar8x32)
in the following code:
ImageSharp/src/ImageSharp/Processing/Processors/Transforms/Resize/ResizeKernel.cs
Lines 104 to 112 in e2211c3
result256_0 = Fma.MultiplyAdd(
    Unsafe.As<Vector4, Vector256<float>>(ref rowStartRef),
    Avx2.PermuteVar8x32(Vector256.CreateScalarUnsafe(*(double*)bufferStart).AsSingle(), mask),
    result256_0);
result256_1 = Fma.MultiplyAdd(
    Unsafe.As<Vector4, Vector256<float>>(ref Unsafe.Add(ref rowStartRef, 2)),
    Avx2.PermuteVar8x32(Vector256.CreateScalarUnsafe(*(double*)(bufferStart + 2)).AsSingle(), mask),
    result256_1);
If FMA is detected, we should allocate a 4x buffer and do the duplication in ResizeKernelMap.Calculate,
which should be much cheaper than doing it in every convolution:
ImageSharp/src/ImageSharp/Processing/Processors/Transforms/Resize/ResizeKernelMap.cs
Lines 115 to 120 in e2211c3
public static ResizeKernelMap Calculate<TResampler>(
    in TResampler sampler,
    int destinationSize,
    int sourceSize,
    MemoryAllocator memoryAllocator)
    where TResampler : struct, IResampler
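
To illustrate the idea, here is a minimal, portable sketch of the duplication step. It is not ImageSharp's actual code: `KernelDuplicationSketch`, `DuplicateWeights` and `Convolve` are hypothetical names, and plain `Vector4` arithmetic stands in for the FMA path. The assumption is that each kernel weight is stored four times (once per pixel channel), so the convolution can read the weights already in the layout the permute currently produces.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

public static class KernelDuplicationSketch
{
    // Hypothetical helper: expands [w0, w1, ...] into
    // [w0, w0, w0, w0, w1, w1, w1, w1, ...], i.e. a 4x larger buffer.
    // This would run once per kernel, when the kernel map is calculated.
    public static float[] DuplicateWeights(ReadOnlySpan<float> weights)
    {
        var expanded = new float[weights.Length * 4];
        for (int i = 0; i < weights.Length; i++)
        {
            expanded.AsSpan(i * 4, 4).Fill(weights[i]);
        }

        return expanded;
    }

    // Convolves one row of pixels with the pre-duplicated weights.
    // The weights are reinterpreted as Vector4 values directly, so no
    // per-iteration broadcast or permute is needed.
    public static Vector4 Convolve(ReadOnlySpan<Vector4> row, ReadOnlySpan<float> expanded)
    {
        ReadOnlySpan<Vector4> weights = MemoryMarshal.Cast<float, Vector4>(expanded);
        Vector4 result = Vector4.Zero;
        for (int i = 0; i < row.Length; i++)
        {
            result += row[i] * weights[i];
        }

        return result;
    }
}
```

In the AVX2/FMA path the same expanded buffer could presumably be read eight floats at a time as a `Vector256<float>` (two weights, four copies each) and passed straight to `Fma.MultiplyAdd`. The trade-off is four times the kernel memory in exchange for removing the permute from the inner loop.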