Counting query results #448

Open
stevenvergenz opened this issue Apr 9, 2014 · 6 comments

Comments

@stevenvergenz
Contributor

Currently, there seems to be no clean way to determine the number of statements a given query will return. I can make a query, then recurse over the "more" URLs to build an aggregate total, throwing away the actual statement bodies, but that seems unnecessarily bandwidth- and processing-intensive for both the client and the server.

It would be great if successive versions of the xAPI included some means of obtaining this sort of aggregate data directly from the LRS database without having to resort to multiple large AJAX calls. Perhaps another query argument for the GET /statements API?
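For reference, the workaround described above can be sketched roughly as follows. This is only an illustration, not code from any LRS client: `count_statements` and `fetch_page` are hypothetical names, and the sketch assumes each page is a StatementResult object with a `statements` array and a `more` field that is a relative URL, or an empty string when no pages remain.

```python
from typing import Any, Callable, Dict
from urllib.parse import urljoin


def count_statements(fetch_page: Callable[[str], Dict[str, Any]], first_url: str) -> int:
    """Count statements by walking every page of a GET /statements result.

    fetch_page is a hypothetical callable that performs the HTTP GET
    (with whatever auth and headers the LRS requires) and returns the
    decoded JSON body of one page.
    """
    total = 0
    url = first_url
    while url:
        page = fetch_page(url)
        # Only the length is kept; the statement bodies are discarded.
        total += len(page.get("statements", []))
        # Assumption: "more" is relative to the LRS host, empty when done.
        more = page.get("more") or ""
        url = urljoin(url, more) if more else ""
    return total
```

Every page still has to be downloaded in full just to be counted, which is the overhead this issue is about.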

@andyjohnson
Contributor

We discussed this on the call. While we do recognize the need for features such as this, we would expect a Reporting API to handle query operations.

@stevenvergenz
Contributor Author

Yeah, I'm already working on a reporting API (https://github.com/adlnet/xAPI-Dashboard), but it's all client-side JavaScript, so it's less efficient than it would be LRS-side. It's good enough for now though, until some reporting standards emerge for the LRS.

@fugu13
Contributor

fugu13 commented May 12, 2014

Reporting is a lot trickier than it seems. Even something as simple as a
count can be difficult as data scales (there's a reason that, for example,
Google search results don't give you the right number of results, sometimes
by a drastic amount). What's more, with streaming events, the exact total
number of events that match a coarse filter is rarely that interesting --
sure it's a number that could be displayed, but it's not a very good number
to understand anything other than approximate data volume (which an
approximate count can do just as well).

Similarly, as you've no doubt found, it's not going to make sense to do it
in-browser with JavaScript.

We've got some light experimentation in generic LRS reporting going on in
our Domesday code base (open source) (I spend waaay too much time thinking
up names): https://github.com/Saltbox/domesday . We're going with an
approach analogous to old school web analytics -- cronnable commands that
do batch processing (eventually batch incremental processing) to generate
static files for inspection. Right now we're just generating the core CSVs,
but we imagine eventually generating browseable HTML interfaces on top of
those.
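
As a rough illustration of the batch idea (not Domesday's actual code), a cron-driven job could tally statements per verb and object and write the totals to a static CSV. The name `write_counts_csv` and the column layout are assumptions made for this sketch; it also assumes each statement has a `verb.id` and an `object` that may carry an `id`.

```python
import csv
from collections import Counter
from typing import Any, Dict, Iterable, TextIO


def write_counts_csv(statements: Iterable[Dict[str, Any]], out: TextIO) -> None:
    """Write one CSV row per (verb id, object id) pair with its statement count."""
    counts = Counter(
        (s["verb"]["id"], s["object"].get("id", "")) for s in statements
    )
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["verb", "object", "count"])
    # Sort so repeated runs over the same data produce identical files.
    for (verb, obj), n in sorted(counts.items()):
        writer.writerow([verb, obj, n])
```

A scheduled run would open the output file and pass it as `out`; readers of the report then only ever touch the precomputed file, never the LRS.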

Sincerely,
Russell


@stevenvergenz
Contributor Author

Here's the thing though: one rarely has to work with the entire body of statements in the LRS. If you're querying the database and it's trying to return millions of statements, you're probably doing it wrong. My JavaScript statement database can run queries on 10k statements no problem, and that's probably a reasonable scope for simple analytics (for example, an average test score).

I agree, accurate analytics on astronomical datasets are an unreasonable expectation. But for limited datasets it would be very helpful.

To your statement @andyjohnson, reporting tools do belong in a Reporting API. But since many LRSs will probably be implementing some form of reporting anyway, a set of guidelines, or even an optional standard, would be better than nothing.
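
To make the "average test score" case above concrete, here is a minimal sketch of that kind of small-scope analytic over statements that have already been fetched. The function name `average_scaled_score` is hypothetical, and the sketch assumes scores live in `result.score.scaled`; statements without a scaled score are skipped.

```python
from typing import Any, Dict, Iterable, Optional


def average_scaled_score(statements: Iterable[Dict[str, Any]]) -> Optional[float]:
    """Return the mean of result.score.scaled, or None if no statement has one."""
    scores = []
    for s in statements:
        scaled = s.get("result", {}).get("score", {}).get("scaled")
        # Skip statements with no result, no score, or a non-numeric value.
        if isinstance(scaled, (int, float)):
            scores.append(scaled)
    return sum(scores) / len(scores) if scores else None
```

On a few thousand statements this is cheap to compute; the cost lies in getting the statements to the client in the first place.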

@andyjohnson
Contributor

I could definitely see a reporting spec/standard emerging - hopefully
sooner rather than later. The problem we've seen historically with
patchwork, thrown-together solutions is that they somehow become "best
practices", not "best" in the sense they are actually good practices, but
they become adopted in a widespread manner. It will take some time, but I
see ADL stepping up to the plate if it doesn't come from anywhere else.


Andy Johnson
ADL Technical Team
608-318-0049

@fugu13
Contributor

fugu13 commented May 13, 2014

Hi Steven,

We're already seeing single activities for some customers hitting hundreds
of thousands to millions of statements -- data collection possible with the
#xapi is very rich. What's more, even if the number needing to be counted
isn't absurd (millions isn't), the problem is the number of combinations
needing to be counted (to return counts efficiently), and that grows much,
much more quickly.
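
One way to read the point about combinations: if an LRS kept a precomputed count for every combination of filter values, the number of counters would be the product of the distinct values per filter. The sketch below only does that arithmetic; the dimension names and sizes are made-up illustrative figures, not measurements from any real LRS.

```python
from math import prod
from typing import Dict


def combination_count(distinct_values: Dict[str, int]) -> int:
    """Number of filter combinations, given distinct values per dimension."""
    return prod(distinct_values.values())


# Hypothetical sizes: 1,000 agents, 50 verbs, 200 activities.
example_dimensions = {"agents": 1000, "verbs": 50, "activities": 200}
example_total = combination_count(example_dimensions)  # 10,000,000 counters
```

Adding one more filter dimension multiplies the total again, which is why the combinations outgrow the raw statement count so quickly.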

Sure, even given that, simple result rollup over a single activity isn't
too bad (though I think most would find the time to fetch 10k records to
get basic summaries of results for a single activity less than responsive).
But that's really not enough. At the very least people trying to get real
use out of data will want to do rapid comparisons across multiple slices of
people and activities -- we've got a report for a customer that rolls up
over hundreds of agent-activity combinations over up to hundreds of
thousands of complexly filtered people (and more statements).

I don't want to be too negative -- I'm not! #xapi reporting is a
fascinating and broad thing that bears diverse exploration. But in addition
to looking at things that are easy/basic to include now, it's also very
important to keep an eye on the broader future.

Sincerely,
Russell



3 participants