I ran a brief benchmark joining two dataframes of 43K rows each. The join columns contain only unique values, meaning each row in df_A matches at most one row in df_B.
The inner join in go-gota took 37.68s.
In contrast, the same join, executed using Pandas in Python, took barely 1s.
From the looks of the present implementation, go-gota appears to use a nested loop join, which is inefficient for large datasets.
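For illustration, here is a minimal sketch of that pattern; the `row` type and the data are placeholders for this example, not gota's actual internals:

```go
package main

import "fmt"

// row is a simplified placeholder; the real library operates on
// dataframe/series values rather than a plain struct.
type row struct {
	key   string
	value int
}

func main() {
	dfA := []row{{"a", 1}, {"b", 2}}
	dfB := []row{{"b", 20}, {"c", 30}}

	// Nested loop join: every row of dfA is compared against every
	// row of dfB, i.e. O(n*m) comparisons. For two 43K-row frames
	// that is roughly 1.8 billion key comparisons.
	var out [][2]row
	for _, a := range dfA {
		for _, b := range dfB {
			if a.key == b.key {
				out = append(out, [2]row{a, b})
			}
		}
	}
	fmt.Println(out) // [[{b 2} {b 20}]]
}
```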
Can I check if there is a roadmap to address this issue? If not, would it be possible for me to try and submit a PR implementing hash join and merge join? I believe those would significantly speed up join performance; see the sketch below.
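A hash join replaces the per-row scan with a hash index built once over one side, bringing the cost down to O(n+m) expected time. The sketch below uses the same placeholder `row` type as above and is only meant to show the shape of the approach, not a final implementation (a merge join, by contrast, would sort both sides on the key and do a single linear merge pass, which is attractive when the data is already sorted):

```go
package main

import "fmt"

// row is a simplified placeholder; the real library operates on
// dataframe/series values rather than a plain struct.
type row struct {
	key   string
	value int
}

// hashJoin performs an inner join on key in O(n+m) expected time:
// build a hash index over one side, then probe it once per row of
// the other side, instead of rescanning it for every row.
func hashJoin(left, right []row) [][2]row {
	// Build phase: index the left side by join key.
	index := make(map[string][]row, len(left))
	for _, l := range left {
		index[l.key] = append(index[l.key], l)
	}
	// Probe phase: one expected-O(1) lookup per right-side row.
	var out [][2]row
	for _, r := range right {
		for _, l := range index[r.key] {
			out = append(out, [2]row{l, r})
		}
	}
	return out
}

func main() {
	dfA := []row{{"a", 1}, {"b", 2}}
	dfB := []row{{"b", 20}, {"c", 30}}
	fmt.Println(hashJoin(dfA, dfB)) // [[{b 2} {b 20}]]
}
```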
A couple of open questions on the above; looking for inputs here.