-
Notifications
You must be signed in to change notification settings - Fork 8
Conversation
Signed-off-by: Alexander Bezzubov <[email protected]>
This change includes: - part of the UastIterExt impl - better error handling Signed-off-by: Alexander Bezzubov <[email protected]>
Signed-off-by: Alexander Bezzubov <[email protected]>
Signed-off-by: Alexander Bezzubov <[email protected]>
CI fails on bblfhsh driver installation that is not relevant for the change.
Will restart it manually. |
Signed-off-by: Alexander Bezzubov <[email protected]>
CI is green now. As soon as FIXMEs are addressed it's going to be ready for review. |
Signed-off-by: Alexander Bezzubov <[email protected]>
Signed-off-by: Alexander Bezzubov <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 9 of 15 files at r1, 1 of 3 files at r2, 3 of 4 files at r3.
Reviewable status: 13 of 15 files reviewed, 10 unresolved discussions (waiting on @bzz, @creachadair, and @dennwc)
src/main/native/jni_utils.h, line 30 at r3 (raw file):
extern const char METHOD_JARR_ADD[]; extern const char METHOD_OBJ_TO_STR[]; extern const char MERHOD_RE_INIT[];
typo: METHOD_*
here and on the next line, and at their points of initialization in jni_utils.cc
.
src/main/native/jni_utils.h, line 43 at r3 (raw file):
JNIEnv *getJNIEnv(); jobject NewJavaObject(JNIEnv *, const char *, const char *, ...);
It would be very helpful if either the prototypes or the definitions of these functions could get comments, as they seem both to lack them now. I don't think they need to be particularly detailed, but the names alone don't make it clear which ones are constructing new material vs. accessing existing material.
src/main/native/jni_utils.cc, line 76 at r3 (raw file):
jmethodID toString = env->GetMethodID(exceptionCls, "toString", METHOD_OBJ_TO_STR); if (env->ExceptionCheck() || !toString) {
Since ||
short-circuits, should we do the toString
check first?
src/main/native/org_bblfsh_client_v2_libuast_Libuast.cc, line 132 at r3 (raw file):
// lookup searches for a specific node handle. // Helper for iterators. jobject lookup(NodeHandle node) { return toJ(node); }
I'm not clear why this is needed outside the iterator itself. Can we not inline the toJ
call into the Iterate
method? Or if not (e.g., if there is some conversion magic happening that I overlooked), could this be made private so that it is not exposed to arbitrary callers?
src/main/native/org_bblfsh_client_v2_libuast_Libuast.cc, line 599 at r3 (raw file):
JNIEnv *env, jobject self) { // this.ctx - delete Context as iterator owns it auto ctx = getHandle<Context>(env, self, "ctx");
If the iterator owns this, maybe a better solution would be to equip the iterator with a releaseHandle
method. Right now we're relying on the (undocumented!) assumption that the Context will no longer use the handle we have stolen from it, even though nothing in the code helps enforce that.
Relatedly below—I'm worried about the lifetime issues here, and there doesn't seem to be any documentation about the expectations.
src/main/native/org_bblfsh_client_v2_libuast_Libuast.cc, line 616 at r3 (raw file):
auto iter = reinterpret_cast<uast::Iterator<Node *> *>(iterPtr); try {
What are we protecting against here? If we think we could have an invalid iterator shouldn't we check that separately? Exception handlers are expensive.
src/main/scala/org/bblfsh/client/v2/libuast/Libuast.scala, line 42 at r3 (raw file):
override def next(): T = { if (!lookedAhead) {
I think this could be simpler as
if (hasHext()) {
return nextNode
}
return null
That way all the lookahead logic is self-contained in the one method that actually needs to distinguish whether the lookahead happened yet.
src/main/scala/org/bblfsh/client/v2/libuast/Libuast.scala, line 46 at r3 (raw file):
} lookedAhead = false nextNode // never null, iff called after .hasNext
Even without my suggestion at line 42, I don't think this can be true: If .hasNext
reported false (say because the iterator was immediately closed, or it has no elements) then I believe nextNode
is null.
src/test/scala/org/bblfsh/client/v2/LibuastManagedIteratorTest.scala, line 78 at r3 (raw file):
val poActual = Seq("root", "son1", "son1_1", "son1_2", "son2", "son2_1", "son2_2") nodes should have size (poActual.size) nodes shouldEqual poActual
Doesn't this subsume the length check?
src/test/scala/org/bblfsh/client/v2/LibuastNativeIteratorTest.scala, line 79 at r3 (raw file):
val nodes = iter.toList nodes shouldNot be(empty) println(s"Iterator returned ${nodes.size} nodes")
Debug leftovers?
Signed-off-by: Alexander Bezzubov <[email protected]>
Signed-off-by: Alexander Bezzubov <[email protected]>
Signed-off-by: Alexander Bezzubov <[email protected]>
Signed-off-by: Alexander Bezzubov <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 11 of 15 files reviewed, 10 unresolved discussions (waiting on @bzz, @creachadair, and @dennwc)
src/main/native/jni_utils.h, line 30 at r3 (raw file):
Previously, creachadair (M. J. Fromberger) wrote…
typo:
METHOD_*
here and on the next line, and at their points of initialization injni_utils.cc
.
Done 🤦♂️
src/main/native/jni_utils.h, line 43 at r3 (raw file):
Previously, creachadair (M. J. Fromberger) wrote…
It would be very helpful if either the prototypes or the definitions of these functions could get comments, as they seem both to lack them now. I don't think they need to be particularly detailed, but the names alone don't make it clear which ones are constructing new material vs. accessing existing material.
Done 👍
src/main/native/jni_utils.cc, line 76 at r3 (raw file):
Previously, creachadair (M. J. Fromberger) wrote…
Since
||
short-circuits, should we do thetoString
check first?
I think this is fine - it may be just a bit excessive as the first check is thorough and should be enough.
There may be many reasons for an exception, but the last one only verifies that the method indeed was found. But the more I grapple with duality of native/managed error handling in JNI, the more I realize that it's always better to err on the safe side :)
src/main/native/org_bblfsh_client_v2_libuast_Libuast.cc, line 132 at r3 (raw file):
Previously, creachadair (M. J. Fromberger) wrote…
I'm not clear why this is needed outside the iterator itself. Can we not inline the
toJ
call into theIterate
method? Or if not (e.g., if there is some conversion magic happening that I overlooked), could this be made private so that it is not exposed to arbitrary callers?
I do not see a way to inline this logic into the UastIterExt.nativeNext()
(I presume that's what the iterator above means) without duplicating this code, as the same needs to be done for getting a root node (proxy JVM object) out of the native contextExt.
At the same time, I do not see how this can be made private, given that it is needed in those 2 different places.
Pushed the small change to remove confusing comment and make the whole flow (hopefully) a tiny bit clearer. Hope it helps!
src/main/native/org_bblfsh_client_v2_libuast_Libuast.cc, line 599 at r3 (raw file):
assumption that the Context will no longer use the handle we have stolen from it
I'm a bit confused as Context does not use any handle.
May be what is confusing here is the getHandle
method name? It is just a convenience for reading the opaque pointer to a native object out of the managed object instance field and type-casting. I have added missing documentation for this helper.
So what happens when the managed iterator is disposed (after casting the type of opaque pointer) the native Context is deleted. Comment above was meant to document this assumption.
UastIterExt
e.g does not do that, as it does not own the Context, which thus can be used by other managed/native code later and is somebody else's responsibility.
Hope this helps, but do let me know if you have some other suggestions here.
src/main/native/org_bblfsh_client_v2_libuast_Libuast.cc, line 616 at r3 (raw file):
Previously, creachadair (M. J. Fromberger) wrote…
What are we protecting against here? If we think we could have an invalid iterator shouldn't we check that separately? Exception handlers are expensive.
It is the way of how libuast API is designed to handle errors 🤷♂️
Python client does exactly the same, so I suggest to keep it as close as possible for the sanity of maintenance of both of them.
src/main/scala/org/bblfsh/client/v2/libuast/Libuast.scala, line 46 at r3 (raw file):
Previously, creachadair (M. J. Fromberger) wrote…
Even without my suggestion at line 42, I don't think this can be true: If
.hasNext
reported false (say because the iterator was immediately closed, or it has no elements) then I believenextNode
is null.
Sorry for confusion - you are obvious right. It's just the comment that is miss-placed here.
The sole intention of this is for .hasNext()
to never return true in case when the next node is null
(which happens, as it's a way the native iterators work)
JFYI iterators in all previous versions of the scale-client were always adding the extra null
as the last element of the result :/
Updated the documentation to make it clear.
src/test/scala/org/bblfsh/client/v2/LibuastManagedIteratorTest.scala, line 78 at r3 (raw file):
Previously, creachadair (M. J. Fromberger) wrote…
Doesn't this subsume the length check?
Yes, it does, but only when everything works :)
Meaning: while gradually improving the implementation, test were "grown" to refine the conditions they verify.
Signed-off-by: Alexander Bezzubov <[email protected]>
Signed-off-by: Alexander Bezzubov <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 10 of 15 files reviewed, 10 unresolved discussions (waiting on @bzz, @creachadair, and @dennwc)
src/main/scala/org/bblfsh/client/v2/libuast/Libuast.scala, line 42 at r3 (raw file):
Previously, creachadair (M. J. Fromberger) wrote…
I think this could be simpler as
if (hasHext()) { return nextNode } return nullThat way all the lookahead logic is self-contained in the one method that actually needs to distinguish whether the lookahead happened yet.
I see what you mean.
One obstacle is that in Scala, an expression of type null
does no conform to the type T
.
Another is that we still need to set lookedAhead = false
, otherwise next lookahead will not happen.
Done.
Signed-off-by: Alexander Bezzubov <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 9 of 15 files reviewed, 4 unresolved discussions (waiting on @bzz, @creachadair, and @dennwc)
src/main/native/org_bblfsh_client_v2_libuast_Libuast.cc, line 132 at r3 (raw file):
Previously, bzz (Alexander) wrote…
I do not see a way to inline this logic into the
UastIterExt.nativeNext()
(I presume that's what the iterator above means) without duplicating this code, as the same needs to be done for getting a root node (proxy JVM object) out of the native contextExt.At the same time, I do not see how this can be made private, given that it is needed in those 2 different places.
Pushed the small change to remove confusing comment and make the whole flow (hopefully) a tiny bit clearer. Hope it helps!
I see what you mean. It's just weird to have this random public (and non-static) member function that has nothing to do with the contents of the instance. If it's not specific to the type, can it be a free function rather than a member?
src/main/native/org_bblfsh_client_v2_libuast_Libuast.cc, line 599 at r3 (raw file):
Previously, bzz (Alexander) wrote…
assumption that the Context will no longer use the handle we have stolen from it
I'm a bit confused as Context does not use any handle.
May be what is confusing here is the
getHandle
method name? It is just a convenience for reading the opaque pointer to a native object out of the managed object instance field and type-casting. I have added missing documentation for this helper.So what happens when the managed iterator is disposed (after casting the type of opaque pointer) the native Context is deleted. Comment above was meant to document this assumption.
UastIterExt
e.g does not do that, as it does not own the Context, which thus can be used by other managed/native code later and is somebody else's responsibility.Hope this helps, but do let me know if you have some other suggestions here.
Maybe I'm using the wrong terminology. The code as written here does a somewhat stately type conversion to a Context*
, diddles some state of the ContextExt
, then deletes the Context*
. Are you saying that we're pulling out and invalidating a field of ContextExt
? And if so: Could the ContextExt
have a member function to do that, e.g., ext->disposeContext()
or something—so that it's clear we're not removing something that will dangle?
As an aside, the names of these things are frustratingly generic, and there are not nearly enough comments anywhere. I don't know what a "context" is or what its relationship is supposed to be to a "contextExt" and as far as I can tell "handle" is just a void*
cast to an integer. Why did we do that, I wonder? The confusion is not your fault—all these things pre-date this PR—but it makes it really hard to understand the effects of the new code. So please forgive me if what I'm saying is missing some key point.
src/main/native/org_bblfsh_client_v2_libuast_Libuast.cc, line 616 at r3 (raw file):
Previously, bzz (Alexander) wrote…
It is the way of how libuast API is designed to handle errors 🤷♂️
Python client does exactly the same, so I suggest to keep it as close as possible for the sanity of maintenance of both of them.
That makes sense—but then I think we should take a careful sweep over the other methods here, because I think there may be other cases where we're not doing this check but where an exception could occur.
src/main/scala/org/bblfsh/client/v2/libuast/Libuast.scala, line 42 at r3 (raw file):
Previously, bzz (Alexander) wrote…
I see what you mean.
One obstacle is that in Scala, an expression of type
null
does no conform to the typeT
.
Another is that we still need to setlookedAhead = false
, otherwise next lookahead will not happen.Done.
I see, thanks for the explanation.
I think what you have here now is OK. I wonder, though, if it would not be simpler to use private var nextNode : Option[T] = _
. We're essentially emulating that anyway, by carefully adjusting a separate Boolean value, and at that point I'm not sure the advantage of storing the T
by value is sufficient to warrant the confusion. At least an Option encapsulates the presence checks.
Maybe that has different problems, I don't know, but I feel like this is too complicated for such a simple task as keeping track of a single item that may or may not be present.
src/test/scala/org/bblfsh/client/v2/LibuastManagedIteratorTest.scala, line 78 at r3 (raw file):
Previously, bzz (Alexander) wrote…
Yes, it does, but only when everything works :)
Meaning: while gradually improving the implementation, test were "grown" to refine the conditions they verify.
I mean—if two sequences are equal, they must be the same length, right? And if the test harness is broken so that it can't compare them, wouldn't we have other problems?
Anyway, this isn't wrong, and it's fine; it just struck me as unnecessary, and I was trying to figure out why the extra check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 15 files at r1, 3 of 5 files at r4.
Reviewable status: 13 of 15 files reviewed, 4 unresolved discussions (waiting on @bzz, @creachadair, and @dennwc)
Signed-off-by: Alexander Bezzubov <[email protected]>
Signed-off-by: Alexander Bezzubov <[email protected]>
Signed-off-by: Alexander Bezzubov <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 13 of 15 files reviewed, 2 unresolved discussions (waiting on @creachadair and @dennwc)
src/main/native/org_bblfsh_client_v2_libuast_Libuast.cc, line 132 at r3 (raw file):
Previously, creachadair (M. J. Fromberger) wrote…
I see what you mean. It's just weird to have this random public (and non-static) member function that has nothing to do with the contents of the instance. If it's not specific to the type, can it be a free function rather than a member?
Got it, thank you for explaining, I see the concern now!
But it does use that type - it access this
to link a new node to this contextExt.
Of course we can have a "factory" functions that receives both ctx and nodeHandler, but as soon as it uses the type and the python client follows the same convention, I would prefer to keep it.
I'm sorry that it may be feels like I keep repeating this argument way too often, and this should be a separate conversation later on whether we want to keep it this way.
But at least for now, until we have all clients feature-complete and rolling, I feel like the ability to maintain the clients depends on this fact, as it does reduce the bus factor for two project twice.
src/main/native/org_bblfsh_client_v2_libuast_Libuast.cc, line 599 at r3 (raw file):
Previously, creachadair (M. J. Fromberger) wrote…
Maybe I'm using the wrong terminology. The code as written here does a somewhat stately type conversion to a
Context*
, diddles some state of theContextExt
, then deletes theContext*
. Are you saying that we're pulling out and invalidating a field ofContextExt
? And if so: Could theContextExt
have a member function to do that, e.g.,ext->disposeContext()
or something—so that it's clear we're not removing something that will dangle?As an aside, the names of these things are frustratingly generic, and there are not nearly enough comments anywhere. I don't know what a "context" is or what its relationship is supposed to be to a "contextExt" and as far as I can tell "handle" is just a
void*
cast to an integer. Why did we do that, I wonder? The confusion is not your fault—all these things pre-date this PR—but it makes it really hard to understand the effects of the new code. So please forgive me if what I'm saying is missing some key point.
I was about to say that this part of code has nothing to do with ContextExt
, only to find the typo in the code that have lead to this confusion 🤦♂️
It's fixed now.
src/main/native/org_bblfsh_client_v2_libuast_Libuast.cc, line 616 at r3 (raw file):
Previously, creachadair (M. J. Fromberger) wrote…
That makes sense—but then I think we should take a careful sweep over the other methods here, because I think there may be other cases where we're not doing this check but where an exception could occur.
I this we will be better off doing that in a separate run, for the both clients at the same time - this implementation follows the logic of python client as close as possible, so that the knowledge of how to maintain it is at least shared and thus can be spread among the team.
I also belive Nacho is already on it for python (WIP) and the fixes, as soon as they are done, can be easily ported in the following PRs.
src/main/scala/org/bblfsh/client/v2/libuast/Libuast.scala, line 42 at r3 (raw file):
Previously, creachadair (M. J. Fromberger) wrote…
I see, thanks for the explanation.
I think what you have here now is OK. I wonder, though, if it would not be simpler to use
private var nextNode : Option[T] = _
. We're essentially emulating that anyway, by carefully adjusting a separate Boolean value, and at that point I'm not sure the advantage of storing theT
by value is sufficient to warrant the confusion. At least an Option encapsulates the presence checks.Maybe that has different problems, I don't know, but I feel like this is too complicated for such a simple task as keeping track of a single item that may or may not be present.
Interesting point, thank you for a suggestion!
One thing is that Options in Scala are immutable, so the cache invalidation in this case will create garbage. And we'll still need to unwrap Options in .next()
(as otherwise it defeats the purpose of Iterator protocol).
I have pushed the Option version. On the upside, it now does not have any returns so it's going to be faster (return
in Scala means throw new NonLocalReturnControl()
in JVM).
For me personally, previous version seems more readable, as having a meaningful variable name makes lookahead logic obvious to the reader, instead of it being hidden behind Option.
src/test/scala/org/bblfsh/client/v2/LibuastManagedIteratorTest.scala, line 78 at r3 (raw file):
Previously, creachadair (M. J. Fromberger) wrote…
I mean—if two sequences are equal, they must be the same length, right? And if the test harness is broken so that it can't compare them, wouldn't we have other problems?
Anyway, this isn't wrong, and it's fine; it just struck me as unnecessary, and I was trying to figure out why the extra check.
Ofc, true, it's not strictly necessary, but that's why they are in this order, so the stronger one can fail later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@creachadair I belive all feedback has been adressed, ready for another round.
Reviewable status: 13 of 15 files reviewed, 2 unresolved discussions (waiting on @creachadair and @dennwc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r5.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @dennwc)
src/main/native/org_bblfsh_client_v2_libuast_Libuast.cc, line 132 at r3 (raw file):
Previously, bzz (Alexander) wrote…
Got it, thank you for explaining, I see the concern now!
But it does use that type - it accessthis
to link a new node to this contextExt.Of course we can have a "factory" functions that receives both ctx and nodeHandler, but as soon as it uses the type and the python client follows the same convention, I would prefer to keep it.
I'm sorry that it may be feels like I keep repeating this argument way too often, and this should be a separate conversation later on whether we want to keep it this way.
But at least for now, until we have all clients feature-complete and rolling, I feel like the ability to maintain the clients depends on this fact, as it does reduce the bus factor for two project twice.
I see, thank you for your patient explanation. Yeah, I agree that for now we should probably leave this alone.
src/main/native/org_bblfsh_client_v2_libuast_Libuast.cc, line 616 at r3 (raw file):
Previously, bzz (Alexander) wrote…
I this we will be better off doing that in a separate run, for the both clients at the same time - this implementation follows the logic of python client as close as possible, so that the knowledge of how to maintain it is at least shared and thus can be spread among the team.
I also belive Nacho is already on it for python (WIP) and the fixes, as soon as they are done, can be easily ported in the following PRs.
Makes sense, agreed.
src/main/scala/org/bblfsh/client/v2/libuast/Libuast.scala, line 42 at r3 (raw file):
Previously, bzz (Alexander) wrote…
Interesting point, thank you for a suggestion!
One thing is that Options in Scala are immutable, so the cache invalidation in this case will create garbage. And we'll still need to unwrap Options in
.next()
(as otherwise it defeats the purpose of Iterator protocol).I have pushed the Option version. On the upside, it now does not have any returns so it's going to be faster (
return
in Scala meansthrow new NonLocalReturnControl()
in JVM).For me personally, previous version seems more readable, as having a meaningful variable name makes lookahead logic obvious to the reader, instead of it being hidden behind Option.
Well, as I said, what you had before does work. But having to carefully manipulate the flag in two separate methods meant I had to read through the code repeatedly to convince myself we had done it correctly (you had—but I went through a couple iterations of thinking otherwise till I traced it out). So my rationale for not having the explicit Boolean is that the Option signals the intent more clearly.
That said: I would also not block if you think the explicit version was truly preferable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! all files reviewed, all discussions resolved (waiting on @dennwc)
src/main/scala/org/bblfsh/client/v2/libuast/Libuast.scala, line 42 at r3 (raw file):
Previously, creachadair (M. J. Fromberger) wrote…
Well, as I said, what you had before does work. But having to carefully manipulate the flag in two separate methods meant I had to read through the code repeatedly to convince myself we had done it correctly (you had—but I went through a couple iterations of thinking otherwise till I traced it out). So my rationale for not having the explicit Boolean is that the Option signals the intent more clearly.
That said: I would also not block if you think the explicit version was truly preferable.
I agree, having a single piece of state with well-known and understood semantics is of a great benefit for the reader. And am great full for your suggestions - after looking at it again, final version seems cleaner at the end 🎉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! all files reviewed, all discussions resolved (waiting on @dennwc)
Addresses part of the #83
TODOs:
This change is