Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash when filtering entities using an unkown column name #6600

Closed
grzesiek2010 opened this issue Feb 11, 2025 · 11 comments · Fixed by #6601
Closed

Crash when filtering entities using an unkown column name #6600

grzesiek2010 opened this issue Feb 11, 2025 · 11 comments · Fixed by #6601
Assignees

Comments

@grzesiek2010
Copy link
Member

Problem description

The app crashes when filtering entities using an expression that checks if another question in a form has no answer, using undefined or null instead of an empty string. Logic like: ${q0} = undefined.

android.database.sqlite.SQLiteException: no such column: p_undefined (code 1 SQLITE_ERROR): , while compiling: SELECT *, i.rowid
                                                                                                    FROM "peopleUndefined" e, "peopleUndefined_row_numbers" i
                                                                                                    WHERE e._id = i._id AND p_undefined = ?
                                                                                                    ORDER BY i.rowid

Steps to reproduce the problem

  1. Open this form https://staging.getodk.cloud/#/projects/137/forms/peopleUndefined2
  2. Try to navigate till the end of it.

Expected behavior

There should be no crash. We can't optimize such filters that reference other questions, as the SQLite queries we build for optimization rely solely on the entities database.

@lognaturel
Copy link
Member

lognaturel commented Feb 12, 2025

This should be a duplicate of #6597

undefined should be treated as a column which doesn't exist. The shape of the sql query and the error suggest that is the case.

@seadowg seadowg moved this from inbox to in progress in ODK Collect Feb 12, 2025
@seadowg
Copy link
Member

seadowg commented Feb 12, 2025

@grzesiek2010 I've assigned #6597 to you as well, so the PR should just mark both issues as closed (assuming the fix does tackle both).

@grzesiek2010
Copy link
Member Author

grzesiek2010 commented Feb 13, 2025

This should be a duplicate of #6597

To me, this is not a duplicate, as these are two distinct issues:

@lognaturel please correct me if I'm wrong.

@seadowg
Copy link
Member

seadowg commented Feb 13, 2025

@grzesiek2010 yeah it sounds like they're subtly different to me from that. I still think it makes sense to just tackle them together though as the changes should be within the same area?

@grzesiek2010
Copy link
Member Author

Yes the fix is actually the same I just wanted to make it clear that the issues are not duplicates.

@lognaturel
Copy link
Member

Sorry if I'm coming across as pedantic! I want to make sure we have alignment on this because if we have misunderstandings about the expression the implementation might be wrong.

The user intent is different between the two so I agree on that. But I still believe that the query shape is identical. It's entirely possible that I have a misunderstanding too so I want to outline my thinking.

that references another question with an undefined/null value

To me, this wording implies that undefined would be treated as though it referenced a question in the form. That should not be what the expression ${question}=undefined is interpreted as when it's in a predicate (choice filter). The undefined part should be treated as a relative reference in the context of the secondary instance. So it should be treated as a property/column which happens not to exist.

Maybe the form I shared with this pattern caused confusion because it also had calculates with expressions like ${question}=undefined. When that expression is in a calculate, undefined is a relative reference in the context of the primary instance. So it is indeed treated as a question that does not exist. Even then, it's not that it has an undefined/null value, it's that JR returns '' when it can't evaluate a reference.

when we fall back to Javarosa, it evaluates correctly as ${question}=""

That sounds right. That part of the expression should be equivalent to ${question}="" for each item in secondary instance. In other words, JR should be evaluating ${question}=undefined in the context of each item in the secondary instance, finding that undefined doesn't exist, and returning '' instead.

However, Javarosa returns an empty list of properties for such a non-existent property, which makes sense to me.

Can you say more about what that means and how it's different from how the other query is evaluated?

@grzesiek2010
Copy link
Member Author

The user intent is different between the two so I agree on that. But I still believe that the query shape is identical. It's entirely possible that I have a misunderstanding too so I want to outline my thinking.

Yes, the query is identical as in both cases a missing column is detected and an SQLiteException is thrown. That's because that undefined is treated as a column. However, when we fall back to JR it is treated as a value (empty string) that we compare with an answer entered for a given question.

To me, this wording implies that undefined would be treated as though it referenced a question in the form. That should not be what the expression ${question}=undefined is interpreted as when it's in a predicate (choice filter). The undefined part should be treated as a relative reference in the context of the secondary instance. So it should be treated as a property/column which happens not to exist.

Maybe the form I shared with this pattern caused confusion because it also had calculates with expressions like ${question}=undefined. When that expression is in a calculate, undefined is a relative reference in the context of the primary instance. So it is indeed treated as a question that does not exist. Even then, it's not that it has an undefined/null value, it's that JR returns '' when it can't evaluate a reference.

I didn't debug JR (I only added tests) so maybe I'm wrong but I assumed that in the example you shared with me earlier:
station_number=${staff_station_number} and (${birth_date_formatted}="" or birth_date=${birth_date_formatted})
when the answer for ${staff_station_number} is, say, 10, all entities with a station_number property of 10 will be returned - provided that either the answer for ${birth_date_formatted} is not entered or, if entered, it matches the birth_date property.

isn't that right? That's what I meant by saying that:

when we fall back to Javarosa, it evaluates correctly as ${question}=""

Can you say more about what that means and how it's different from how the other query is evaluated?

Again at the moment of building an SQLite query, both cases result in a missing column, and an exception is thrown but then when we fall back to JR:
For #6597 when we have a simple filtering with one property and that property does not exist then an empty list of entities will be returned. Again I didn't debug JR but that's what my tests show: https://github.com/getodk/collect/pull/6601/files#diff-4a25b4c0ba0780186c2edabe26d8518d4e3e6a022a482669373e2a71ceac6dfbR624
For #6600 it is no longer missing something as JR is able to evaluate the expression correctly. In other words, it has access to that referenced question and also is able to treat undefined/null as an empty string.

That's the difference. Does that make sense?

@seadowg
Copy link
Member

seadowg commented Feb 14, 2025

I've just gone a bit of a journey caching up on this.

Image

There are two things that were throwing me off:

  1. For issues like this, I think it's a good idea to avoid using XLSForm sugar. I'm personally finding it really hard to follow. As I understand it, the issue is referring to the expression instance('peopleUndefined')/root/item[ /data/q0 =undefined] that's used in a nodeset for a select1 right (looking at the referenced form).
  2. What also had me really confused (even though it now seems obvious) is that undefined (or null) has no special meaning in XPath as far as I (now) understand - JavaRosa would just parse it as a relative path expression for a child named "undefined". I went into this thinking undefined was sugar for '' (as there's no null concept in XPath).

Given that, I now understand the comment "undefined should be treated as a column which doesn't exist" from @lognaturel (and the following discussion) which was messing with me, but I'm now also confused as to why this isn't a duplicate of #6597: undefined (in this issue) and distributed (in #6597) are semantically equivalent. In both issues we've got an Eq expression against a relative path expression to a non-existent node. I'm probably just badly rewording everything you've both already said here, but I needed to get it down to get it straight in my own head 😂, so apologies for that.

Please correct me if I'm wrong here as I don't want to review #6601 again until I'm confident I've got this in my head properly.

@lognaturel
Copy link
Member

Got it about XLSForm sugar, sorry about that.

I still see them as duplicate issues. As I understand it, @grzesiek2010 is making a distinction between an expression with an unknown column and a literal value vs an unknown column and a form reference. See more at #6601 (comment)

@seadowg
Copy link
Member

seadowg commented Feb 14, 2025

I still see them as duplicate issues. As I understand it, @grzesiek2010 is making a distinction between an expression with an unknown column and a literal value vs an unknown column and a form reference. See more at #6601 (comment)

Ok so basically so we can break it down to two kinds of related expressions that are causing problems:

Forgetting about the optimizations for a sec and just considering XPath semantics, both expressions are semantically equivalent to true if the right (here, but they could be reversed obviously) side's actual value is an empty string or false for a non-empty string I think? A non-existent child's value is always '' basically.

If that's the case, I think we can optimize these expressions - even if they're nested with an and/or. I think it's worth discussing whether that's worth it or not. We can just fall back to JavaRosa's standard evaluation instead (like in #6601), but that will mean loading the whole entity list into memory. My initial assumption is that that's ok because these cases will be rare, but I wanted to throw that out there to see if anyone disagrees. The scenario in #6600 (a property referenced in a follow-up that doesn't get added until an update form in a later project stage) seems like it could be quite common to me for example.

@lognaturel
Copy link
Member

I agree that it would be very nice to optimize these but not critical for now. I put some thoughts on how at #6601 (comment)

@lognaturel lognaturel changed the title Crash when filtering entities using an expression that references another question with an undefined/null value Crash when filtering entities using an unkown column name Feb 17, 2025
@github-project-automation github-project-automation bot moved this from in progress to done in ODK Collect Feb 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: done
Development

Successfully merging a pull request may close this issue.

3 participants