Assignment of UUID #767

Open
rswetnam opened this issue Oct 11, 2015 · 27 comments

@rswetnam
Contributor

Section 2.4 describes an id as:

UUID assigned by LRS if not set by the Activity Provider.

It seems to me that this could cause problems. Let's say I am a small community college and I have an activity provider that generates statements but does not assign UUIDs to those statements. Let's say I am responsible for sending my statements of learning experience to the regional LRS and the national LRS. This means that the national and regional LRSs would have different UUIDs for the same learning statements that I generated. I see this as problematic.

@garemoko
Contributor

This is a good catch - whilst most APs don't send statements to multiple LRSs, some might and may not realize the particular importance of generating the statement id.

In section 2.4.1 we already say Ids SHOULD be generated by the Activity Provider. For 1.0.3 we could also say Where the Activity Provider will send the statement to multiple LRSs, the Activity Provider SHOULD* ensure that the same Id is used. This leaves open the option of sending the statement without id to one LRS and re-using that id for subsequent LRSs, which would be fine.
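
Both options can be sketched in a few lines. This is only an illustration: `FakeLRS` and `post_statement` are hypothetical in-memory stand-ins for a real LRS and its HTTP statement API, and the statement content is made up.

```python
import uuid


class FakeLRS:
    """Hypothetical in-memory stand-in for an LRS (no HTTP involved)."""

    def __init__(self, name):
        self.name = name
        self.statements = {}

    def post_statement(self, statement):
        # Copy so the caller's dict is not mutated.
        stored = dict(statement)
        # Assign an id only when the Activity Provider did not set one.
        stored.setdefault("id", str(uuid.uuid4()))
        self.statements[stored["id"]] = stored
        # Hand the id back to the caller as a one-element list.
        return [stored["id"]]


statement = {
    "actor": {"mbox": "mailto:learner@example.com"},
    "verb": {"id": "http://adlnet.gov/expapi/verbs/completed"},
    "object": {"id": "http://example.com/activities/xapi-101"},
}

# Option 1: the AP generates the id once and sends it to every LRS.
regional, national = FakeLRS("regional"), FakeLRS("national")
with_id = dict(statement, id=str(uuid.uuid4()))
ids_option_1 = regional.post_statement(with_id) + national.post_statement(with_id)

# Option 2: send without an id first, then reuse the id the first LRS returned.
regional2, national2 = FakeLRS("regional"), FakeLRS("national")
(first_id,) = regional2.post_statement(statement)
(second_id,) = national2.post_statement(dict(statement, id=first_id))
```

Either way, both stores end up holding the statement under one id. The failure case raised in this issue only appears when the id-less `statement` is posted to both LRSs independently.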

We should also add some wording explaining the rationale for using the same id in both LRSs, perhaps also explaining why the id is a UUID at the same time.

@rswetnam
Contributor Author

I don't believe that what I described is an edge case nor do I believe that exhorting activity providers to be good citizens will solve it - especially if the xAPI becomes widely used. The problem I believe stems from the current "definition" of the Activity Provider which is not a definition but a casual description which does not identify the key functions of this component in the xAPI ecosystem. I suggest that the definition of this component be changed from:

The software object that is communicating with the LRS to record information about a learning experience. May be similar to a SCORM package in that it is possible to bundle learning assets with the software object that performs this communication, but an Activity Provider could also be separate from the experience it is reporting about.

to something like:

A software component responsible for generating syntactically correct, properly formatted and uniquely identified statements of learning experience and providing these to Learning Record Stores.

I don't think that the learning record store component should be responsible for generating UUIDs. That's not to say that the current LRSs out there should not be generating UUIDs - as many probably contain the statement generation function - but that's a discussion for another thread...

@brianjmiller
Contributor

Why would the AP be sending to the regional and/or national LRS, why wouldn't the local LRS do it, at which point the id has already been assigned?

I think the fact that you can say "I see this as problematic." means that so will the implementer on that side, IOW they'll think "hey I have to already have the id to make this match up, how do I achieve that?". Ensuring an AP can talk to multiple LRS would be more difficult. Making the AP assign an id puts burden on the AP which we've always tried to avoid, and having the LRS have that burden seems reasonable, and an id is certainly a requirement.

I'm 👎 on the change in general, this is a technical spec, people should read it that way. On the AP description, certainly could change, but I think there are issues with the suggested definition, not the least of which is there are APIs other than the statement ones. An AP is just a system that communicates with an LRS.

@rswetnam
Contributor Author

What does IOW mean?


@aaronesilvers
Contributor

In Other Words (IOW)


@rswetnam
Contributor Author

@brianjmiller I'm afraid this is the first technical spec that I've commented on so any suggestions you have on how I should be reading it or am misreading it would be greatly appreciated.

Also, I was unaware that you've always tried putting the burden on the AP to generate unique identifiers. What's the rationale for this? If you could help me here, I think I would be in a better position to reply.

@rswetnam
Contributor Author

I am concerned that the proposed solution for multiple unique identifiers for the same learning experiences relies on Activity Providers to not only be good citizens but smart citizens.

Consider the following use case. xAPI has become very popular and recruitment agencies are asking job applicants to post their CVs to their Learning Record Stores. I've just graduated from Blogs University and have just figured out that I can create a table of my learning experiences and write a little program that will allow me to post my collections of learning statements to multiple recruitment agencies by simply inputting their address and pressing a button. Because I didn't attend the class on unique identifiers, I don't associate UUIDs with my statements - but no problem - all 50 of them will accept my postings and generate those UUIDs for me.

@canweriotnow
Contributor

Section 2.4 describes an id as:

UUID assigned by LRS if not set by the Activity Provider.

It seems to me that this could cause problems.

There are probably enough things in the spec that "could cause problems" to overflow a 32-bit unsigned integer. But LRS-assigned UUIDs are a positive boon because:

  1. Most APs don't need to burden themselves with UUID-generation, as that seems needless.
  2. LRS-generated (and returned to the AP as per the spec!) UUIDs can optimize by using, for instance, SQUUIDs (semi-sequential UUIDs), which avoid index fragmentation on the LRS.
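
To make the second point concrete, here is a rough sketch of a semi-sequential UUID generator. The layout (Unix seconds in the top 32 bits of an otherwise random version-4 UUID) is an assumption modeled on common "squuid" helpers; the spec does not define it, and the function name `squuid` is hypothetical.

```python
import time
import uuid


def squuid(now=None):
    """Semi-sequential UUID: a random UUID whose top 32 bits hold Unix seconds."""
    seconds = int(time.time() if now is None else now) & 0xFFFFFFFF
    # Keep the low 96 bits of a random UUID; they include the version and variant bits.
    random_part = uuid.uuid4().int & ((1 << 96) - 1)
    return uuid.UUID(int=(seconds << 96) | random_part)


# Ids generated in later seconds sort after ids generated earlier.
earlier = squuid(now=1_444_600_000)
later = squuid(now=1_444_600_001)
```

Because new ids mostly sort after older ones, inserts tend to land near the end of an index rather than at random positions, which is the fragmentation benefit described above.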

What @rswetnam is describing is not an AP->LRS problem, it's an LRS->LRS problem; if you need to move statements between LRSs, take the whole statement including UUID as the spec (at least seemingly) mandates. Don't generate a new one just because you're moving data around, that defeats the purpose of having UUIDs in the first place.

@aaronesilvers
Contributor

@canweriotnow, this makes a lot of sense to me:

if you need to move statements between LRSs, take the whole statement including UUID as the spec (at least seemingly) mandates. Don't generate a new one just because you're moving data around, that defeats the purpose of having UUIDs in the first place.

The spec is vague in that regard. At the time this language was first put in, LRS-to-LRS communication wasn’t something yet practiced and (2.4) as it stands could be interpreted both ways.

If in practice LRSs aren’t currently respecting the original UUID in LRS-to-LRS communication, I can see this clarification as something I’d like to see in a 2.0 because this could certainly be a breaking change for some.

@aaronesilvers
Contributor

@rswetnam,

Also, I was unaware that you've always tried putting the burden on the AP to generate unique identifiers. What's the rationale for this? If you could help me here, I think I would be in a better position to reply.

Each AP is its own hub of activity in an emergent network of APs. To localize the use of the AP’s generated data, an AP has to have access to the UUIDs for the statements it creates.

@canweriotnow
Contributor

@aaronesilvers So, point by point:

  1. I'm not sure this is 2.0 because it's not (in semver terms) necessarily a breaking change; once a UUID has been assigned, it is a part of that immutable statement - for 2.0, we might want to address a single AP submitting to multiple LRSs at once, because once that statement is recorded, and the UUID assigned, they effectively become two different statements containing the same data. This is a potential area for concern.
  2. We've dealt with AP->Intermediary LRS->Final/canonical LRS situations. It's not a problem if implemented intelligently. Intelligent implementation is left as an exercise for the reader.
  3. Since the response to a statement POST (or PUT) is an Array of statement UUIDs, the AP always has access. Yet another implementation detail.

I guess it frustrates me that we're treading in the footsteps of the IETF who wrote RFCs expecting that people would implement TCP or HTTP intelligently and clean up their own messes if they screwed up, and then fall in these morasses which more closely resemble CSS, where the standards designer assumed he was designing for idiots, and we had to invent things like LESS and SASS/SCSS to get around the idiotic idiot-proofing.

I'd rather design for the intelligent case.

P.S. Not calling anyone here an idiot, I just have been through too many standards processes and seen successes and cock-ups and would like to stay on the success side (which is where I believe we are headed).

@canweriotnow
Contributor

@aaronesilvers Also, consider that not all APs are created equal; I know you're aware, but many of us forget, xAPI is not just another iteration of SCORM; it goes so far beyond trite LMS applications... a thousand beacons or RFID readers could be reporting to a single embedded system (Rasp Pi, Android device, etc.), that is actually capable of constructing statements and making an HTTPS connection to an LRS... or maybe a simpler system that assembles data, sends bytes over TCP to a system capable of HTTPS communication... what, then is the AP? The RFID reader? The intermediary, slightly smarter chip? The system that can actually make an HTTPS POST and get the UUIDs back?

We're not building for the same world that a lot of the old SCORM folks are thinking of... and if we target xAPI to that we're shooting ourselves in the face, repeatedly, with a mortar.

We're used to a world of pure signal, test scores and the like... I suggest that anyone interested in the future of xAPI read Claude Shannon before proceeding.

(Also, I use HTTPS here b/c if you're sending data to an LRS over HTTP, I'm going to MiTM attack that out of sheer principle. It'd be unethical to do otherwise.)

@brianjmiller
Contributor

@rswetnam I'll reply to you directly since you addressed me but I think the others have captured well the thoughts. I think you are reading it fine, I just wanted to caution against adding too much to a technical spec that wasn't purely technical, which is to say testable as well. I think your thoughts and commentary are worthwhile, even just in an issue like this where they can be searched, are discoverable, the history can be captured, etc. I consider that working really well, without having to add non-technical language to describe things to the document itself. To me (read: opinion) it is enough to say APs can generate ids, LRSs must, and that they are UUID, leave the use cases, best practices, and reasons as to why for issues, blog posts, tweets, etc.

I think you reversed something, we try to put the burden on the LRS (avoid doing so on the AP). The reason to put the burden on the LRS rather than the AP is because we are hoping there are a lot of them (millions perhaps), but there are very likely to remain few implementations of LRSs (hundred? hundreds? right now I think we are at about 10-15) and probably at least one order of magnitude fewer installations.

@rswetnam
Contributor Author

In suggesting that APs be required to include unique identifiers with statements posted to Learning Record Stores, I am making the following assumptions:

  • There is a very real possibility that APs will broadcast statements of learning experience to multiple LRSs, such as the case of job applicants broadcasting their CVs to the LRSs of multiple employers
  • If they are not required to do so, some will not include unique identifiers with their statements
  • Having multiple copies of the same learning experiences with different uuids is a problem for the system
  • Requiring APs to include UUIDs with learning statements would not be that much of an additional burden. If a piece of software is capable of generating and posting properly formatted and syntactically correct statements to an LRS, then also including UUIDs is not that much of an additional hurdle
  • The burden to the APs and reduced accessibility that requiring UUIDs would result in would be more than offset by the increased integrity of the overall system.

I am more than open to the possibility that any of my assumptions are wrong and would be interested to learn which and why.

@brianjmiller
Contributor

what, then is the AP?

I think that is the point. Personally I'd like to see us use the first sentence in the original definition of an AP, because that is the only thing that one can really say about it. It is specifically a system communicating with an LRS. All of the things that aren't using the model and APIs defined in the xAPI spec are not APs and they are talking to systems that are not LRSs. They may well be doing things correctly and efficiently, they just aren't using xAPI even if that data is eventually translated (the key word here) into xAPI. And yes, an LRS can itself be an AP, as it may be a system communicating with an LRS (either internally with itself or with another LRS).

I think @aaronesilvers comment that "At the time this language was first put in, LRS-to-LRS communication wasn’t something yet practiced" isn't correct. LRS to LRS communication was known to be needed and had been explored at the earliest conceptions of the spec and is the reason for a number of the properties of statements, such as timestamp vs stored, the authority, etc.

@brianjmiller
Contributor

@rswetnam that's why it is a "SHOULD" requirement, it is considered a best practice and they should do so. But making it a "MUST" may well break some use cases and at this point would certainly be backwards incompatible.

If they are not required to do so, some will not include unique identifiers with their statements

But that isn't a problem for the spec, that is a problem for the AP. If it makes that AP less capable then the market will handle that.

Having multiple copies of the same learning experiences with different uuids is a problem for the system

May be a problem for the system, that system may also know how to handle that problem wisely. Something being a technical challenge through poor authoring isn't something that the spec has to correct for.

I'll agree, I don't see generation of UUIDs as a burden, but I suspect that is also why it is included as a "SHOULD" instead of being left out completely. Having said that, random number generation can be taxing on a system, and as @canweriotnow has pointed out, the types of systems likely to be generating statement-like data could be very low powered and may not need to know the UUIDs of their statements, so unnecessarily burdening them to solve a non-problem doesn't seem necessary.

As far as an AP sending a statement to multiple LRSs, I disagree that they need independent identifiers, there isn't anything indicating an AP can't send the same statement to multiple LRSs. The LRS is expected to be able to handle receiving the same statement multiple times, how to do so is explicitly called out in the specification, and covers the case where it may come from multiple different systems (such as the originating AP vs. another LRS).
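
As a rough illustration of an LRS tolerating the same statement arriving more than once, the sketch below keys an in-memory store on the statement id. The class and exception names are hypothetical, plain dict equality stands in for the spec's statement comparison rules, and raising `Conflict` is my reading of where a real LRS would answer with a conflict response.

```python
import uuid


class Conflict(Exception):
    """Raised where a real LRS would reject a mismatching statement."""


class DedupingStore:
    """Hypothetical in-memory statement store keyed by statement id."""

    def __init__(self):
        self._by_id = {}

    def put(self, statement):
        existing = self._by_id.get(statement["id"])
        if existing is None:
            self._by_id[statement["id"]] = dict(statement)
            return "stored"
        if existing == statement:
            # Same id, same content: nothing new to record.
            return "already stored"
        # Same id, different content: refuse rather than overwrite.
        raise Conflict(statement["id"])

    def __len__(self):
        return len(self._by_id)


store = DedupingStore()
statement = {
    "id": str(uuid.uuid4()),
    "verb": {"id": "http://adlnet.gov/expapi/verbs/completed"},
}

from_ap = store.put(statement)               # first arrival, from the AP
from_other_lrs = store.put(dict(statement))  # same statement relayed by another LRS

try:
    store.put(dict(statement, verb={"id": "http://adlnet.gov/expapi/verbs/failed"}))
    conflicted = False
except Conflict:
    conflicted = True
```

This only works when every copy carries the same id, which is the condition the rest of the thread is debating.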

@aaronesilvers
Contributor

I think @aaronesilvers's comment that "At the time this language was first put in, LRS-to-LRS communication wasn’t something yet practiced" isn't correct. LRS to LRS communication was known to be needed and had been explored at the earliest conceptions of the spec and is the reason for a number of the properties of statements, such as timestamp vs stored, the authority, etc.

@brianjmiller to try and be clear here, it’s not a dig.

Yes, we absolutely knew LRS-to-LRS communication was needed and had been explored, but you have to admit there’s a difference between knowing, modeling and prototyping and that communication being realized with multiple LRSs developed by different teams. My point being, yes, of course, conceiving of the need for LRS-to-LRS communication influenced several properties — and still there are nuances that may be recognized only when it’s in the wild and we realize that despite our best efforts, without the aid of conformance requirements spelling out what are seemingly minute details, that implementation may vary.

@aaronesilvers
Contributor

+1 @brianjmiller


@canweriotnow
Contributor

👍 @brianjmiller and @aaronesilvers

@rswetnam
Contributor Author

I accept that there is no appetite for requiring APs to include unique identifiers with their statements. At the risk of beating a dead horse, I'd like to have one last kick at the can for consideration of this position in a future version of the spec.

I'm looking to land a job in an LRS factory.

  • I send out the statement that I passed xAPI 101 to 50 recruitment agencies but do not include a UUID with the statement.
  • These 50 agencies accept my statement and assign their own UUIDs.
  • Then 20 of those agencies send my statement along with their own UUIDs to 3 potential LRS employers.
  • Because the UUIDs are different, each of the LRS employers accept 20 apparently different records describing my one learning experience.

Now sure, these LRS employers can parse the 20 records and see that they describe the same experience - and do something with them. I guess my point is why add this cost and complexity to the overall system when it could be avoided simply by requiring APs to include unique identifiers with their statements - kind of like client-side validation for the xAPI ecosystem. I'm not convinced that this is a significant marginal cost for APs or that it will greatly reduce accessibility to the system.
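
The scenario can be simulated in a few lines. This is a toy model only: the function name `forward_via_agencies` is hypothetical, each agency is reduced to "assign an id if one is missing, then forward", and the employer's store is a plain dict keyed by statement id.

```python
import uuid

statement = {
    "actor": {"mbox": "mailto:applicant@example.com"},
    "verb": {"id": "http://adlnet.gov/expapi/verbs/passed"},
    "object": {"id": "http://example.com/courses/xapi-101"},
}


def forward_via_agencies(statement, agencies=20):
    """Return what one employer LRS holds after each agency forwards its copy."""
    employer_store = {}
    for _ in range(agencies):
        forwarded = dict(statement)
        # Each agency LRS assigns its own id only when none was supplied.
        forwarded.setdefault("id", str(uuid.uuid4()))
        employer_store[forwarded["id"]] = forwarded
    return employer_store


without_id = forward_via_agencies(statement)
with_id = forward_via_agencies(dict(statement, id=str(uuid.uuid4())))

print(len(without_id), len(with_id))  # 20 records versus 1
```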

Fortunately most APs are not lazy like me and will follow best practices and include unique identifiers in their xAPI statements.

@canweriotnow
Contributor

@rswetnam

Fortunately most APs are not lazy like me and will follow best practices and include unique identifiers in their xAPI statements.

The above is only true for your (and related) use case(s). But let's follow your logic:

Why not follow better practices for distributed systems and:

  • Pass a single statement to an authoritative LRS that allows public GET(s)
  • Pass a URI pointing to that statement to 50 recruitment agencies
  • Let them retrieve it with LRS-generated UUID
  • Go to bed and maybe since they're not trying to untangle a clusterf*ck of possibly (or not) contradictory data, wake up to an email inviting you to an interview?

Apologies for the snark, but at this point, it really does seem like a "learn the spec and be smart about implementation" sort of problem.

And FWIW, I do work in an "LRS Factory" or the formal equivalent, and if you passed the same statement to 50 LRSs from a single AP w/o realizing "oh, hey, this is the use case where I should include a UUID", you'd just have disqualified yourself as a candidate 😸

@garemoko
Contributor

I'm going to try and summarize the key points so we can discuss on a future call.

Stuff we agree on:

  1. The spec recommends that the AP generates the UUID. It also says that statements are immutable.
  2. It's possible that an AP might send the same statement to multiple LRSs. (There's some debate on how common this is/will be).
  3. It's possible that the AP might not generate the statement id in this case. Each LRS would then generate a different statement id. This would be bad. (There's some debate on how bad this is).

Possible options:
@rswetnam would like to always require the AP to generate the statement id because he thinks this will be a common case and very bad. This would be a breaking major version change.

I would like to add a sentence or two explaining why the AP should ideally generate the statement id and why it's especially important if sending the statement to multiple LRSs. I think it's a not-impossible use case that could cause headaches for reporting.

@brianjmiller would like to leave the spec as it is. APs should be able to figure it out for themselves.

@canweriotnow seems to be (correct me if I'm wrong) arguing for recommending that the LRS generates the statement id and outlined some benefits of that case.

@canweriotnow
Contributor

@garemoko I'm not for disallowing AP id generation, but I'm arguing the case for LRS id generation as allowed, and perhaps preferred, practice. Pretty much in line with @brianjmiller, I think.

If I'm against anything, it's designing APs to send to multiple LRSs by default; there should be an intermediary redundancy check, at least. I can envision scenarios in which for instance, tons of sensors have tons of potential "hubs" (for lack of a better word) to which they send data; multiple hubs might have received duplicate data from sensors in range; therefore the logical solution is to have them forward that data to a location actually responsible for crafting the statements and sending them to an LRS.

Of course, this is an inverse of @rswetnam's problem; he describes a conventional system broadcasting data to LRSs around the globe. I see two solutions:

  1. Create a pub/sub model for xAPI 2.0
  2. Fix the problem before it gets there.

Now, I'm not in principle against (1), in fact, it's pretty sexy as distributed data infra goes. But I don't think it's necessary. As for (2), that's easy. IF YOUR AP IS GOING TO PUBLISH TO >1 LRS, GENERATE A UUID; if not, the spec just works as is.

If you don't know in advance which is the case, FIND OUT BEFORE DEPLOYING YOUR FRAKKING AP.

I'm happy to discuss this on the next call if necessary, but my gods, of all the issues we need to deal with in this spec, this should be just under "what color do we paint the bikeshed?"

@aaronesilvers
Contributor

FWIW, https://github.com/pubsubhubbub is a pub/sub model and it is referenced (maybe even used?) by the Federal Learning Registry, which was conveniently developed as a predecessor to xAPI.

As @canweriotnow asserted, the very notion of this is definitely a 2.0 topic for discussion and possibly consideration.

@andyjohnson
Contributor

Great discussion! @rswetnam this is why new blood is so important to really get these things thought through. I think all of the points have been made, here's where I line up:

  1. I don't see the same Statement out there with multiple UUIDs as a huge problem. The internet is up to about 99% repeated crap. The cases where this happens would be small and get smaller as the "worse" APs would either learn or be forced out of the market. With timestamps, we could also see LRSs cleaning up their collective Statement repetitions (I know Statements are immutable, but we don't talk about LRSs purging Statements that are not needed). Also, how many data flows would send Statements FROM one AP TO multiple LRSs which THEN compare Statements? If the LRSs have a "broski" relationship, they can work this problem out.

  2. All for a pub/sub model, but way beyond scope of this version. I'd prefer someone else solves this problem as it isn't specific to xAPI.

  3. I'm glad @canweriotnow made the point of smaller devices sending Statements. APs may not have the processing power or functional capability to generate UUIDs, which is fine for that use case and not something we want to lose. Also, SCORM made ISDs have to learn SCORM, I'd prefer people can create xAPI content without understanding UUIDs (and I don't want to dump that on the tools).

  4. Our primary use case is still single AP to single LRS. There will be LRS ecosystems, but they shouldn't be driving the spec.

@rswetnam
Contributor Author

Thanks @garemoko for the excellent summary here. I see that the ideas I am putting forward involve breaking changes and are more suitable for discussion on the next major semantic version of the spec.

Is there a more appropriate area where I could put forward this argument, as I do not want to muddy discussions about the current minor version? For that venue, I would like to make the following points:

  • Having an ecosystem in which there are multiple UUIDs for the same learning statements is a definite possibility in a world where some people with "personal learning lockers" are able to broadcast statements without UUIDs.
  • I'm not sure how bad for the broader system this is. As @andyjohnson points out the internet is made up of 99% repeated crap.
  • I would suggest, however, that allowing for the possibility of multiple UUIDs for the same statements requires an additional level of capability and complexity for LRSs that are required to do additional parsing and handling of "dirty" data that they would not have to do otherwise. I am not sure how much of an extra burden this is.
  • I do think, however, that there might be unintended consequences here: removing the burden on activity providers of supplying UUIDs with their statements - presumably as a means of increasing accessibility to the system - places an additional burden on LRSs and somewhat reduces the accessibility of the system to smaller LRS providers.
  • It could be argued that such a system makes it harder for small recruitment agencies or HR departments to deploy simpler LRS appliances for receiving statements of learning from personal learning lockers, requiring them instead to rely on LRSs supplied by larger companies better able to deal with increased complexity.

Let me say that I am not advocating for some sort of Jeffersonian democracy of small independent LRSs. I'm just saying that we should consider the possibility that building additional complexity into the system by not requiring generators of learning statements to include UUIDs with them may have consequences that we have not fully thought out. That is perhaps a discussion for another forum.

Finally, @canweriotnow Jason Lewis I would request that you tone down the personal attacks against me. While getting the other kids in the playground to shout out bikeshed colours is effective in showing your contempt for me and my tedious arguments, I don't think it adds to the level of discussion in this forum. I would however, be interested in learning more about how small-powered devices which might not be able to generate UUIDs fit into the system that you mentioned earlier in this thread.

@thomasturrell
Contributor

There are some interesting effects caused by the way in which the spec handles identity, equality and immutability.

Two statements with different ids could be equal.
Two statements with the same id are not necessarily equal (but the second one would be rejected by an LRS).
Statements are immutable but the id (and some other properties) can be mutated.

Given that equality is not defined by two statements having the same ID, it is difficult to see how it would be possible to determine if the same statement existed in two different LRSs.
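
One way a system might approximate "same statement, different id" is to compare everything except the properties an LRS can assign or alter. The sketch below is a heuristic, not the spec's comparison algorithm: the set of ignored properties is an assumption, and `same_content` is a hypothetical helper.

```python
import uuid

# Assumption: properties an LRS may set or alter, excluded from the comparison.
IGNORED_PROPERTIES = {"id", "stored", "timestamp", "authority", "version"}


def same_content(a, b):
    """Compare two statements while ignoring the properties listed above."""
    def strip(statement):
        return {k: v for k, v in statement.items() if k not in IGNORED_PROPERTIES}
    return strip(a) == strip(b)


base = {
    "actor": {"mbox": "mailto:learner@example.com"},
    "verb": {"id": "http://adlnet.gov/expapi/verbs/completed"},
    "object": {"id": "http://example.com/activities/xapi-101"},
}

# The same experience, stored by two LRSs that each assigned their own id.
copy_a = dict(base, id=str(uuid.uuid4()))
copy_b = dict(base, id=str(uuid.uuid4()))
# A genuinely different statement about the same actor and activity.
different = dict(copy_a, verb={"id": "http://adlnet.gov/expapi/verbs/failed"})

duplicates_detected = same_content(copy_a, copy_b)
false_match = same_content(copy_a, different)
```

Ignoring `timestamp` has a cost: two separate attempts at the same activity with identical content would be treated as one, which illustrates why content comparison is a weaker guarantee than a shared id.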

Interestingly I have seen LRPs send the same statement to an LRS multiple times but given that the LRS has no way of testing if the same statement has already been stored, it simply stores it again.

Of course letting the learning record provider set the id presupposes that it is capable of generating UUIDs in a way which avoids collisions. In the past, I have seen LRPs which frequently send the same ID.

Personally I would like to see a future version of the spec either require the learning record provider to set the ID or prohibit the activity provider from setting the ID. Allowing both the activity provider and LRS to set the ID seems unnecessary.

It's interesting to note that cmi5 requires activity providers to set the ID in statements.
