Allow HTML in all user facing strings #778
Comments
Per the 10/21/15 call: we can't prefer one markup language over another and can't assume a browser, so we really have to leave them as "plain text". Leave the text as is. Systems are free to manipulate data however they want; xAPI allows other avenues for varied formats of "Strings", etc. |
To me "plain text" means that the text shouldn't contain any formatting; if it contains markup code (i.e. tags), the tags are not supposed to be understood as markup, but as plain text. Previously the specification said they would be strings, i.e. they could be plain text or they could be markup. Perhaps we should revert to strings and add a standardized way to specify the "markup type"? So 'format: "html5"' for instance? |
I don't think you can argue that because the spec says "strings", any markup can be used and we should expect systems interpreting the data to figure it out. By that logic, why shouldn't the string be a series of 0s and 1s representing the text in binary? A property representing markup type seems like a good idea, but any new property would have to wait for at least a version 1.1, which is not coming any time soon. The conclusion of #475 was that an extension can be used to store the markup version. If that extension gains significant adoption, then that's a good indication that it's a feature worth incorporating into the spec. |
You can send any markup you want, but yeah, you definitely can't expect a system interpreting the data to figure it out. I think an extension is a good idea. My recommendation would be to put a version that's reasonably human readable even if markup isn't interpreted in the existing xAPI fields, and to put the marked up version, annotated with information on the type of markup, elsewhere. My recommendation would be to use extensions based on existing ways of doing this (with content type for markup involved and so forth). |
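A minimal sketch of that recommendation, assuming an activity definition whose `description` carries the readable plain-text version and whose `extensions` carry the marked-up version together with its content type. The extension IRI and the `contentType`/`value` keys inside it are hypothetical and are not defined by the xAPI spec or any registry.

```python
import json

# Hypothetical extension IRI, used for illustration only.
MARKUP_EXT = "https://example.com/xapi/extensions/marked-up-description"


def build_activity_definition(plain_text, marked_up, content_type, lang="en-US"):
    """Return an activity definition with a plain-text description and a
    marked-up copy stored in an extension, annotated with its content type."""
    return {
        # Standard language map: readable even if no markup is interpreted.
        "description": {lang: plain_text},
        "extensions": {
            # Assumed layout: content type plus a language map of markup.
            MARKUP_EXT: {
                "contentType": content_type,
                "value": {lang: marked_up},
            }
        },
    }


definition = build_activity_definition(
    "Click on the picture of the dog.",
    "<p>Click on the picture of the <em>dog</em>.</p>",
    "text/html",
)
print(json.dumps(definition, indent=2))
```

A consumer that does not recognise the extension still gets a usable `description`; one that does can decide for itself whether it trusts the markup enough to render it.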
I was referencing this change in the master branch: https://github.com/adlnet/xAPI-Spec/pull/560/files. So yes, my suggestion is adding this property to 1.1 so that statements can have all relevant data and those "consuming" the statement will know how to handle the strings in the statements. I think it should be pretty obvious that this is needed, and thus we shouldn't have to wait for the community to find a way (or multiple ways) to do it. If this has already been discussed and the conclusion is to wait and see, I don't agree with the conclusion, but I don't have a big problem with it either. |
Adding that outside of an extension would be incompatible with every existing LRS and would cause a need for a minor version change. Every other change we've been working on has been descriptive and does not require a minor version change. The right solution in this push, which is about organization and clarity, is to use extensions (or attachments, which you can think of as basically a specialized kind of extension). It's definitely reasonable to go on the table for future discussions; I think you'll find there's less practical need than you're supposing, though. Even if a system knows there's, say, HTML in a value, that doesn't make it safe to display that HTML, unless the system is fully confident of the provenance of the HTML. Rendering markup is an attack vector (even just display markup, if you allow styles that could move display elements around to where they shouldn't be). It'll always be a good idea to include a non marked-up version of text for systems that receive the data but aren't confident in its provenance. |
Isn't that something the LRS should handle? I mean purifying the HTML before displaying it? There aren't lots of HTML purifiers out there, but there are some good ones, at least if you can use GPL code. All CMSs have the same problem with user input from untrusted users, but they don't have to fall back to plain text for that reason. Sidenote: I have an impression that there might be too many LRS people in here compared to the number of authoring tool people. Soon you guys will have us aggregate the data for you as well ;) |
If you're bringing up CMSs, quite a few of them only allow authoring with a very tiny subset of HTML for low privilege general users, and for good reason; that's not a safety net that applies to systems working with xAPI data. Systems that take input from a broad variety of sources for later use, such as Google when choosing snippets of websites, generally strip all markup prior to display, even if they could preserve some of it. If you're worried about taking data and displaying it within the authoring tool, you're already free to send and interpret HTML (though I'd advise restraint unless you're fairly confident it was your authoring tool that sent it), you just can't rely on it being handled the same way by others. If you're looking to provide fields for general consumption by other systems, it will still always be a good idea to have a version of the data that isn't marked up at all, and that's why the fields in xAPI are that way. |
You said it, Google strips it. Not the websites that provide it. And of course, our authoring tool also supports just a few tags, and which tags we support depends on where they are used. We also strip away a lot of properties and protocols inside each tag, but that is a lot better than plain text. We would even provide safe HTML in the statements in most cases, but those receiving our statements of course can't trust them. Now that plain text has been added to the specification, it tells authoring tools to strip important data from the statements, instead of keeping it there and stripping it only when needed. |
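For illustration, stripping on the receiving side might look like the following sketch, which uses only Python's standard `html.parser` module. The names `strip_markup` and `_TextExtractor` are hypothetical. This is a tag stripper, not a sanitizer: it keeps the text content of every element (including `script` and `style`), and its output is plain text that still needs escaping before it is placed in an HTML page.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and discards every tag and attribute."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_markup(value):
    """Return the text content of an HTML fragment with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(value)
    parser.close()
    return " ".join("".join(parser.parts).split())


print(strip_markup("<p>Click on the <b>dog</b></p>"))
```

Note that a stripper like this cannot tell markup from plain text that merely mentions a tag, so it only makes sense when the receiver already knows the value is HTML.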
Per the 10/4/15 call, we realized the UTF-8 String requirement didn't make it into the 1.0.3 Restructure. Recommend more string language in Part 2: New Section 4.1 (bumping others). Suggest language around expectations. Maybe "Systems digesting these strings cannot be expected to render various coding languages." - the text in Language Maps is currently pretty good. Take Andrew's example of a quiz question that is actually about HTML: a system shouldn't render it. The system that does not render is more correct than the system that doesn't not. Keep extensions, but short. |
You probably have a typo here? "The system that does not render is more correct than the system that doesn't not." "Systems digesting these strings cannot be expected to render various coding languages." sounds good. The systems will always have to regard the statements as untrusted and thus strip tags/purify etc. I see the problem with HTML code not being easily rendered in a non web view system, but if that system at least would know that this is HTML they could handle the markup if they want to - for instance display images. |
Ahh yes thanks, "does not" was the intent. |
@falcon-git it's been brought up before on this thread, but have you taken a close look at the attachment capability? (https://github.com/adlnet/xAPI-Spec/blob/master/xAPI.md#4111-attachments) The use case you describe at the start of the issue seems like the sort of thing attachments were added for though there does seem to be a gap in tying the "completion evidence" in the attachment to interaction type activities. |
Yes I have, but it was hard to know how to map the attached image to an alternative (probably what you were saying as well). For images that belong to the question "What do you see in this photo?" the attachment makes a lot of sense. |
Suggest we re-tag this issue as I don't think we're planning to do anything in this patch version. |
Sorry for commenting on an old issue. This issue was indirectly referenced from a pull request in the cmi5 spec. I believe that the name and description in the verb definition have limited usefulness for a client that does not know the content type of those values. A simple statement viewer of a reasonably well-used LRS illustrates the problem quite nicely. There is often a mix of HTML, control characters, white space, etc. Simply dealing with newline characters is difficult enough. See example. I suggest that adding a MIME type to the language map might be useful. However, an alternative approach could be that the language map is always assumed to contain text/plain and is always rendered as such. |
In the current spec, the type is plain text, so the client should know what the content type is. If an application is rendering the content as anything other than plain text then either that's a bug, or an allowance to account for the LRP sending statements containing non-plain text data in plain text fields. |
Thank you @garemoko, since you have contributed so much to this specification, it is very useful to have your input.
Sorry I missed this.
How does an application know when to make an allowance? The difference between a bug and an allowance seems to be very subtle. If I have an application running on a Smart TV which is rendering name and description from the verb definition, how does that application know how to render statements that came from a multitude of LRPs? If it is always plain text then I understand that a browser-based application could substitute newlines for line break tags. However, when it comes to HTML, markdown or even plain text like a poem or ASCII art where whitespace might be important, it is very hard to know what to do. Should an application rendering a name and description look for HTML tags and remove them for the case when it is not an allowance? Should a browser-based application substitute multiple spaces for non-breaking spaces when rendering names and descriptions? Sorry for all the questions, I'm stuck on this issue because it seems like a fundamental block to interoperability between systems. Amongst some of the contributors to the xAPI and cmi5 specifications there seems to be a difference of opinion. Perhaps @cawerkenthin @brianjmiller or @gavbaa could contribute their thoughts, understanding or experience to this issue? Although I appreciate that everyone might be a little fatigued by this! |
@thomasturrell I'm seeing much difference in opinion about the current state of the spec here, maybe in terms of how urgent making a change would be, or even which strings such a change might be desirable for? I think you're right that the spec doesn't provide interoperability for formatting in properties defined as strings. There is no way to know what systems "make an allowance" so it's best not to count on any such allowance. Making any such allowance would risk mangling strings that literally include tags but reasonably don't escape them because they're expecting plain text rendering: eg: I mentioned attachments earlier in the thread, but I'll do so again here. xAPI does have an interoperable way to store formatted text (or images, or anything else) in the form of attachments: https://github.com/adlnet/xAPI-Spec/blob/master/xAPI-Data.md#2411-attachments. Though even in that case, it makes sense to have a plain text string to fall back on to deal with systems incapable of or unwilling to render whatever content type is specified. The verb "display" property in particular was meant as just a display name for the verb itself, little more than "Experienced" instead of |
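As a sketch of what "plain text rendering" can mean for a browser-based viewer: escape the string so that any literal tags are shown as text, then convert newlines for display. The function name is hypothetical, and the choice to emit `<br>` for newlines is an assumption about the viewer, not something the spec prescribes.

```python
import html


def render_plain_text_as_html(value):
    """Escape a plain-text string for insertion into an HTML page.

    Literal tags in the input are displayed, not interpreted. Newlines are
    turned into <br> (an assumed display choice for this sketch).
    """
    escaped = html.escape(value, quote=True)
    return escaped.replace("\r\n", "\n").replace("\n", "<br>\n")


print(render_plain_text_as_html("Which tag makes text bold: <b> or <i>?\nPick one."))
```

Runs of spaces are left untouched here; a viewer that cares about poems or ASCII art could instead wrap the escaped text in an element styled with CSS `white-space: pre-wrap`.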
@bscSCORM I agree, I think attachments (and possibly extensions, which were mentioned above) would suit many of the use cases that I am seeing in the wild. |
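A rough sketch of the attachment route, assuming formatted text is sent as a `text/html` attachment while the statement's ordinary string fields keep a plain-text fallback. The `usageType` IRI and the display text are hypothetical; `usageType`, `display`, `contentType`, `length` and `sha2` are the attachment properties named in the xAPI spec. Sending the payload itself (as part of a multipart request) is not shown.

```python
import hashlib

# Hypothetical usage type IRI, used for illustration only.
USAGE_TYPE = "https://example.com/xapi/attachments/formatted-description"


def build_attachment(markup, content_type="text/html", lang="en-US"):
    """Return (attachment metadata, payload bytes) for a formatted-text attachment."""
    payload = markup.encode("utf-8")
    attachment = {
        "usageType": USAGE_TYPE,
        "display": {lang: "Formatted description"},
        "contentType": content_type,
        # Length in octets and SHA-2 hash of the attachment data.
        "length": len(payload),
        "sha2": hashlib.sha256(payload).hexdigest(),
    }
    return attachment, payload


attachment, payload = build_attachment("<p>Click on the picture of the <em>dog</em>.</p>")
print(attachment)
```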
Related issue: #475
Strings that are "fetched from the ui" and put into a statement may contain urls, and are allowed to in the current version of the spec. The related issue #475 changes this so that tags will be stripped away. By doing this we lose information. The reason for doing this seems to be to make the job easier for LRSs and analytic tools. A better approach might be to let these solve their own problem by stripping tags where they do not support tags, and keep tags where tags are indeed supported.
A practical example from one of our customers is the use of inline images in alternatives in multichoice: "Click on the picture of the dog". If the pictures are stripped, the information in the statement becomes useless. The images' alt text typically won't hold information about the animal in the picture, since this could be used to cheat, and a task like this won't work for blind people unless you add a huge alt text describing the animal in each picture without naming it. In practice this might be done in the education sector, but will often not be done in the enterprise sector.
Other examples are advanced analytics tools tailor-made for some content types. If they don't get the HTML they won't be able to reproduce the content type correctly. (People are looking into creating tools like this for H5P, for instance.)