-
Notifications
You must be signed in to change notification settings - Fork 501
[WIP][SYSTEMDS-3907] tee ooc operator #2317
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @mboehm7 , could you please check this, I would be grateful for any review comments.
if (hop.getParent().size() > 1) { | ||
for (Hop consumer : hop.getParent()) { | ||
if (consumer instanceof TeeOp) { | ||
if (consumer instanceof DataOp) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this check only checking for DataOp.
// Create the new TeeOp with the original hop as input | ||
TeeOp teeOp = new TeeOp(sharedInput); | ||
// TeeOp teeOp = new TeeOp(sharedInput); | ||
DataOp teeOp = new DataOp("tee_out_" + sharedInput.getName(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
presently we are creating different variable names than the ones created automatically _mvar1 ... --> is this the correct approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no worries, these hop names are just for debugging - while generating instructions, we generate the _mvar, _fvar, _var accordingly.
DataOp teeOp = new DataOp("tee_out_" + sharedInput.getName(), | ||
sharedInput.getDataType(), | ||
sharedInput.getValueType(), | ||
Types.OpOpData.TRANSIENTWRITE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it supposed to be TRANSIENTWRITE or TRANSIENTREAD?
import org.apache.sysds.lops.LopsException; | ||
import org.apache.sysds.lops.Sql; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll remove *
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes exactly - let's try to avoid the wildcard imports (to avoid ambiguity).
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2317 +/- ##
============================================
+ Coverage 72.26% 72.28% +0.01%
- Complexity 46636 46690 +54
============================================
Files 1496 1499 +3
Lines 176862 177039 +177
Branches 34767 34794 +27
============================================
+ Hits 127805 127965 +160
- Misses 39407 39420 +13
- Partials 9650 9654 +4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making progress here @j143 - below you find my answers, additional comments, and a recommendation for a design change that might simplify things.
import org.apache.sysds.lops.LopsException; | ||
import org.apache.sysds.lops.Sql; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes exactly - let's try to avoid the wildcard imports (to avoid ambiguity).
|
||
private boolean _recompileRead = true; | ||
|
||
private boolean _isTeeOp = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather make it a op-type of DataOp like TWRITE.
setLineNumbers(l); | ||
setLops(l); | ||
// setLineNumbers(l); | ||
// setLops(l); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the line numbers and lops should also be set for the TeeOp
// Create the new TeeOp with the original hop as input | ||
TeeOp teeOp = new TeeOp(sharedInput); | ||
// TeeOp teeOp = new TeeOp(sharedInput); | ||
DataOp teeOp = new DataOp("tee_out_" + sharedInput.getName(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no worries, these hop names are just for debugging - while generating instructions, we generate the _mvar, _fvar, _var accordingly.
boolean isTransposeMM = isTranposePattern(hop); | ||
boolean isMatrix = hop.getDataType() == Types.DataType.MATRIX; | ||
|
||
return isOOCEnabled && multipleConsumers && isNotAlreadyTee && isTransposeMM && isMatrix; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's fine to currently only do it for transpose - it will not affect other things, and later we generalize the approach.
for (Hop parent: hop.getParent()) { | ||
String opString = parent.getOpString(); | ||
if (parent instanceof ReorgOp) { | ||
if (opString.contains("r'") || opString.contains("transpose")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please always use the following utils: HopRewriteUtils.isReorgOp(parent, ReOrgOp.TRANS)
} | ||
} | ||
else if (parent instanceof AggBinaryOp) | ||
if (opString.contains("*") || opString.contains("ba+*")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here as above.
String[] out = outputs.split(Lop.OPERAND_DELIMITOR); | ||
String output2 = outputs + "_copy"; | ||
|
||
// This method generates the instruction string: OOC°tee°input°output1°output2... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend not making tee an operation with two outputs but just one. (copying the input). After thinking a bit more about it, maybe the entire notion of a tee was not a good idea. Potentially, we could make the stream resettable, and the tee operator takes a normal stream, and outputs a resettable stream which can then be consumed by arbitrary many other operations (which also solves the problem of complex control flow interactions where the multiple parents are not immediately visible)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @mboehm7 . Are you referring to the way, where we have lightweight marker for the stream so the multiple consumers could utilize it. Please let me raise a different PR with the option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes exactly - I'm thinking about something like setResettable()
on the taskqueue, which means the blocks are never removed (and potentially subject to eviction) but different operations can get new iterables that always iterate the all blocks. The first operation would simply pull all blocks into memory, and the rmvar then drops all of them. If there is a single consumer we would not need to make it resettable, but can just stream them in.
implement tee ooc operator
Done:
TODO:
2. presently only, meant for t(X) %*% X, need to see if this works in general.