<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>MediaStreamTrack Content Hints</title>
<script src="https://www.w3.org/Tools/respec/respec-w3c" class="remove"></script>
<script class='remove'>
"use strict";
// See https://github.com/w3c/respec/wiki/ for how to configure ReSpec
var respecConfig = {
"doRDFa": false,
"githubAPI": "https://api.github.com/repos/w3c/mst-content-hint",
"issueBase": "https://www.github.com/w3c/mst-content-hint/issues/",
"noLegacyStyle": true,
"editors": [
{ name: "Harald Alvestrand", company: "Google", w3cid: "24610" }
],
"formerEditors": [{
name: "Peter Boström",
company: "Google",
},
// Add additional editors here.
// https://github.com/w3c/respec/wiki/editors
],
"shortName": "mst-content-hint",
"edDraftURI": "https://w3c.github.io/mst-content-hint/",
"subjectPrefix": "[mst-content-hint]",
"otherLinks": [{
"key": "Repository",
"data": [{
"value": "We are on GitHub.",
"href": "https://github.com/w3c/mst-content-hint"
}, {
"value": "File a bug.",
"href": "https://github.com/w3c/mst-content-hint/issues"
}, {
"value": "Commit history.",
"href": "https://github.com/w3c/mst-content-hint/commits/gh-pages"
}, ]
}],
wg: "Web Real-Time Communications Working Group",
// URI of the public WG page
wgURI: "https://www.w3.org/2011/04/webrtc/",
// name (without the @w3c.org) of the public mailing to which comments are due
wgPublicList: "public-media-capture",
wgPatentURI: "https://www.w3.org/2004/01/pp-impl/47318/status",
specStatus: "ED",
// until we separate out the test suite for this spec:
testSuiteURI: "https://github.com/web-platform-tests/wpt/tree/master/webrtc/",
};
</script>
</head>
<body>
<section id="abstract">
<p>
This specification extends <a>MediaStreamTrack</a> to provide a
media-content hint attribute. This optional hint permits
<a>MediaStreamTrack</a> consumers such as <dfn>RTCPeerConnection</dfn>
(defined in [[webrtc]]) or <dfn>MediaRecorder</dfn> (defined in
[[mediastream-recording]]) to encode or process track media with methods
more appropriate to the type of content that is being consumed.
</p>
<p>
Adding a media-content hint provides a way for a web application to help
track consumers make more informed decisions about which encoder parameters
and processing algorithms to use on the consumed content.
</p>
</section>
<section id="sotd">
</section>
<section id="introduction">
<h2>Introduction</h2>
<p>
Algorithms used for processing speech and music differ greatly.
Echo-cancellation algorithms developed for speech-type content might not
work well on music, and noise-suppression algorithms might remove drum
snares or other "noisy" content. While this makes speech more intelligible,
it is less appropriate for music signals.
</p>
<p>
For video, webcam content often requires denoising and is often
intelligible even when downscaled or encoded with high quantization levels.
Screencast content of presentations or webpages with a lot of text, by
contrast, is completely unintelligible if the quantization levels are too
high or if the content is downscaled or otherwise blurry.
</p>
<p>
Without automatic detection of media content, a <a>MediaStreamTrack</a>
consumer can only make an educated guess. One such guess is that
screencast content, such as that from
<code>chrome.desktopCapture</code>, contains text and should therefore be
encoded with low quantization levels, dropping frames extensively to meet
bitrate requirements. Another is that regular USB video devices provide
webcam video, for which higher quantization levels and downscaling are
acceptable.
</p>
<p>
While usually appropriate, this educated guess leads to sub-optimal
settings when incorrect. Treating high-motion screencast content, such as
a movie or a streamed video game, as text results in heavy frame dropping.
Conversely, treating highly detailed content as regular webcam video leads
to content that is quantized or downscaled beyond readability to meet
bitrate requirements. This mismatch may also happen when HDMI video-capture
cards are seen as USB webcams but actually screencast webpage text.
</p>
<figure id="text-downscaling">
<img src="lorem-ipsum.png" alt="Lost text intelligibility when downscaling."/>
<figcaption>
While downscaling can be done to preserve motion in low-bitrate
scenarios, this example illustrates lost text intelligibility when
it is incorrectly applied to detailed content. The example shows the
original (100%) alongside 50% and 25% cubic downscales, corresponding to
downscaling from HD to VGA and QVGA resolutions respectively.
</figcaption>
</figure>
<p>
In some cases the web application can make a more-educated guess or take
user input to inform consumers of what kind of content is being encoded. A
web application that streams video-game content would be able to preserve
motion from desktop capture at the cost of individual frame detail. A
music-studio application would be able to prevent noise suppression from
removing snares from a music track.
</p>
<p>
These settings are not intended to replace encoder-level settings
completely but rather complement them with a simpler hint that does not
require broad knowledge of video encoders, audio-processing steps or
more extensive tuning.
</p>
<p>
A separate section of this specification describes expected behavior
for specific components that process a MediaStreamTrack.
</p>
</section>
<section id="conformance">
</section>
<section id="terminology">
<h2>Terminology</h2>
<p> The terms <dfn data-cite="!WEBRTC#dom-rtcrtpsender">RTCRtpSender</dfn>
and <dfn data-cite="!WEBRTC#dom-rtcrtpsendparameters">RTCRtpSendParameters</dfn>
are defined in [[!WEBRTC]].
</p>
</section>
<section data-link-for="MediaStreamTrack" id="mediastreamtrack-extension">
<h2>Extension to MediaStreamTrack</h2>
<pre class="idl">
partial interface <span class="idlInterfaceID">MediaStreamTrack</span> {
attribute DOMString contentHint;
};
</pre>
<p>
This specification extends <dfn>MediaStreamTrack</dfn> and makes use of
its <dfn>kind</dfn> attribute as defined in [[GETUSERMEDIA]].
</p>
<p>
Each <a>MediaStreamTrack</a> has an associated <dfn>application-set
content hint</dfn>, which is initially <code>""</code>, signifying unset.
This <a>application-set content hint</a> corresponds to the
<dfn data-dfn-for="MediaStreamTrack">contentHint</dfn> attribute of <a>MediaStreamTrack</a> which may be
used by the web application to provide a hint of what type of content is
contained within the track, to guide how it should be treated by
<a>MediaStreamTrack</a> consumers.
</p>
<p>
Valid values for the application-set content hint depend on the
<a>kind</a> of the <a>MediaStreamTrack</a>. On setting
<a>contentHint</a> to <i>value</i>, run the following steps:
</p>
<ol>
<li>
If this <a>MediaStreamTrack</a>'s <a>kind</a> attribute is
<code>"audio"</code>, and <i>value</i> is not one of <code>""</code>,
<code>"speech"</code>, <code>"speechRecognition"</code>, or <code>"music"</code>, abort these steps.
</li>
<li>
If this <a>MediaStreamTrack</a>'s <a>kind</a> attribute is
<code>"video"</code>, and <i>value</i> is not one of <code>""</code>,
<code>"motion"</code>, <code>"detail"</code> or <code>"text"</code>,
abort these steps.
</li>
<li>
Set this <a>MediaStreamTrack</a>'s application-set content hint to
<i>value</i>.
</li>
<li>
The implementation should adapt its decision on how to handle the
content of this <a>MediaStreamTrack</a> according to the new value of
its application-set content hint. This adaptation should happen as
quickly as reasonable, e.g. within the next couple of captured video
frames or audio buffers.
</li>
</ol>
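<p>
The setter and getter steps above can be sketched in JavaScript as follows.
This is an illustrative model only: <code>FakeTrack</code> and
<code>trySetContentHint</code> are hypothetical names that do not appear in
this specification, and the stand-in class models nothing beyond
<code>kind</code> and <code>contentHint</code>.
</p>

```javascript
// Hint values accepted for each track kind, as listed in the steps above.
const VALID_HINTS = {
  audio: ["", "speech", "speechRecognition", "music"],
  video: ["", "motion", "detail", "text"],
};

// Hypothetical stand-in for a MediaStreamTrack; it models only `kind` and
// the `contentHint` setter and getter steps.
class FakeTrack {
  #hint = ""; // application-set content hint, initially unset

  constructor(kind) {
    this.kind = kind; // expected to be "audio" or "video"
  }

  get contentHint() {
    return this.#hint;
  }

  set contentHint(value) {
    const allowed = VALID_HINTS[this.kind] ?? [];
    if (!allowed.includes(value)) return; // not valid for this kind: abort
    this.#hint = value;
  }
}

// Hypothetical helper: sets a hint and reports whether it took effect.
function trySetContentHint(track, hint) {
  if (!("contentHint" in track)) return false; // attribute not available
  track.contentHint = hint;
  return track.contentHint === hint;
}
```

<p>
Because an invalid value is silently ignored rather than rejected with an
exception, reading the attribute back after setting it is one way for an
application to check whether a hint was accepted.
</p>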
<p>
On getting <a>contentHint</a>,
</p>
<ol>
<li>
Return this <a>MediaStreamTrack</a>'s application-set content hint.
</li>
</ol>
<p>
<i>Note that the initial value of the application-set content hint is
<code>""</code>, meaning that no hint has been provided. It does not
default to the implementation's best guess of the type of content
contained.</i>
</p>
<section>
<h3>Audio Content Hints</h3>
<p>
Audio content hints are only applicable when the <a>MediaStreamTrack</a>
contains an audio track.
</p>
<table class="simple"><tbody><tr><th colspan="2">Audio content hints</th></tr>
<tr><td><code>""</code></td><td><p>
No hint has been provided. The implementation should make its
best-informed guess on how to handle the contained audio data. This may be
inferred from how the track was opened or by doing content analysis.
</p></td></tr>
<tr><td><code id="idl-def-AudioContentHint.speech">speech</code></td><td><p>
The track should be treated as if it contains speech data. When consuming
this signal, it may be appropriate to apply noise suppression or boost the
intelligibility of the incoming signal.
</p></td></tr>
<tr><td><code id="idl-def-AudioContentHint.speechRecognition">speechRecognition</code></td><td><p>
The track should be treated as if it contains data for the purpose of
speech recognition by a machine. When consuming this signal, it may be
appropriate to boost the intelligibility of the incoming signal for
transcription and turn off audio-processing components that are intended
for human consumption.
</p></td></tr>
<tr><td><code id="idl-def-AudioContentHint.music">music</code></td><td><p>
The track should be treated as if it contains music data. Generally this
might imply tuning or turning off audio-processing components that are
used to process speech data to prevent the audio from being distorted.
</p></td></tr>
</tbody></table>
</section>
<section>
<h3>Video Content Hints</h3>
<p>
Video content hints are only applicable when the <a>MediaStreamTrack</a>
contains a video track.
</p>
<table class="simple"><tbody><tr><th colspan="2">Video content hints</th></tr>
<tr><td><code>""</code></td><td><p>
No hint has been provided. The implementation should make its
best-informed guess on how the contained video content should be treated.
This can, for example, be inferred from how the track was opened or by
doing content analysis.
</p></td></tr>
<tr><td><code id="idl-def-VideoContentHint.motion">motion</code></td><td><p>
The track should be treated as if it contains video where motion is
important. This is normally webcam video, movies or video games.
Quantization artefacts and downscaling are acceptable in order to
preserve motion as well as possible while still meeting target
bitrates. At low bitrates, when compromises have to be made, more
effort is spent on preserving frame rate than edge quality and details.
</p></td></tr>
<tr><td><code id="idl-def-VideoContentHint.detail">detail</code></td><td><p>
The track should be treated as if video details are extra important.
This is generally applicable to presentations or web pages with text
content, painting or line art. This setting would normally optimize for
detail in the resulting individual frames rather than smooth playback.
Artefacts from quantization or downscaling that make small text or line
art unintelligible should be avoided.
</p></td></tr>
<tr><td><code id="idl-def-VideoContentHint.text">text</code></td><td><p>
The track should be treated as if video details are extra important,
and that significant sharp edges and areas of consistent color can
occur frequently.
This is generally applicable to presentations or web pages with text
content. This setting would normally optimize for
detail in the resulting individual frames rather than smooth
playback, and may take advantage of encoder tools that optimize for
text rendering.
Artefacts from quantization or downscaling that make small text
or line art unintelligible should be avoided.
</p></td></tr>
</tbody></table>
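<p>
As a rough illustration of how an application might choose among the video
hints above, the following sketch maps the kind of content a user says
they are sharing to a hint. The category names and the
<code>hintVideoTrack</code> helper are assumptions made for this example
and are not defined by this specification.
</p>

```javascript
// Hypothetical content categories a sharing UI might offer, mapped to the
// video hints in the table above.
const VIDEO_HINT_FOR_CONTENT = new Map([
  ["slides", "text"],       // presentations or web pages with text
  ["line-art", "detail"],   // painting or line art
  ["video-game", "motion"], // motion matters more than frame detail
  ["movie", "motion"],
]);

// Sets the hint on a video track. Unknown categories clear the hint ("")
// so that the implementation falls back to its own best guess.
function hintVideoTrack(track, contentCategory) {
  if (track.kind !== "video") {
    throw new TypeError("expected a video track");
  }
  track.contentHint = VIDEO_HINT_FOR_CONTENT.get(contentCategory) ?? "";
  return track.contentHint;
}
```

<p>
In a browser, the track passed in would typically be one of the video
tracks of a stream obtained from screen or camera capture.
</p>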
</section>
</section>
<section>
<h2>Behaviors of other components based on content-hint</h2>
<section>
<h3>Behavior of a MediaStreamTrack</h3>
<p>When setting a contentHint value for a MediaStreamTrack, the
UA MUST <a>apply a default</a> as follows:
</p>
<ul>
<li>For an audio track with the value "music", apply a default of
"false" to the constraints echoCancellation, autoGainControl and
noiseSuppression.</li>
<li>For an audio track with the value "speech", apply a default of
"true" to the constraints echoCancellation and
autoGainControl.</li>
<li>For an audio track with the value "speechRecognition", apply a
default of "false" to the constraints echoCancellation,
autoGainControl and noiseSuppression.</li>
</ul>
<p>To <dfn>apply a default</dfn> to the constraint <var>c</var>
with value <var>t</var>, perform the following steps:</p>
<ul>
<li>If the value <var>t</var> satisfies the applied constraints,
set the setting corresponding to <var>c</var> to <var>t</var>.</li>
<li>Otherwise, choose a value for the setting corresponding to <var>c</var>
that satisfies the applied constraints.</li>
<li>Remember the value <var>t</var>.</li>
<li>In parallel, update the track with the new settings.</li>
</ul>
<p>Whenever the "apply constraints" algorithm is subsequently
run, the UA MUST choose the remembered value <var>t</var> if it is
now a permitted value.
</p>
<p>When setting a contentHint value of "", all remembered values of
<var>t</var> are removed.
</p>
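<p>
The following sketch models the "apply a default" steps and the handling
of remembered values. It rests on simplifying assumptions: each constraint
is represented as a list of currently permitted boolean values, the
<code>ProcessingState</code> class is hypothetical, and the in-parallel
track update is not modelled.
</p>

```javascript
// Defaults per audio hint, taken from the list above.
const HINT_DEFAULTS = {
  music: { echoCancellation: false, autoGainControl: false, noiseSuppression: false },
  speech: { echoCancellation: true, autoGainControl: true },
  speechRecognition: { echoCancellation: false, autoGainControl: false, noiseSuppression: false },
};

// Hypothetical model of a track's processing state. `permitted` maps each
// constraint name to the values the applied constraints currently allow.
class ProcessingState {
  constructor(permitted) {
    this.permitted = permitted;
    this.settings = {};
    this.remembered = new Map();
    for (const [name, values] of Object.entries(permitted)) {
      this.settings[name] = values[0]; // arbitrary initial choice
    }
  }

  // "Apply a default": use t if permitted, else a permitted value; remember t.
  applyDefault(name, t) {
    const values = this.permitted[name] ?? [true, false];
    this.settings[name] = values.includes(t) ? t : values[0];
    this.remembered.set(name, t);
  }

  // Setting a hint applies its defaults; "" removes all remembered values.
  setContentHint(hint) {
    if (hint === "") {
      this.remembered.clear();
      return;
    }
    for (const [name, t] of Object.entries(HINT_DEFAULTS[hint] ?? {})) {
      this.applyDefault(name, t);
    }
  }

  // Later runs of "apply constraints" choose the remembered value if permitted.
  applyConstraints(permitted) {
    this.permitted = permitted;
    for (const [name, values] of Object.entries(permitted)) {
      const t = this.remembered.get(name);
      if (values.includes(t)) {
        this.settings[name] = t;
      } else if (!values.includes(this.settings[name])) {
        this.settings[name] = values[0];
      }
    }
  }
}
```

<p>
In this model, a "music" hint set while echo cancellation is constrained to
<code>true</code> leaves that setting at <code>true</code>, and the
remembered default of <code>false</code> takes effect once a later
constraint change permits it.
</p>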
</section>
<section>
<h3>Degradation preference when encoding</h3>
<p>
When encoding video, and some constraint (bandwidth, CPU) prevents
encoding at the configured framerate and resolution, the encoder
must make a choice on how to modify the encoding parameters.
</p>
<p>
This section defines terms to describe the choice, and an enum
that can be used to indicate that choice in an API.
</p>
<div>
<pre class="idl"
>enum RTCDegradationPreference {
"maintain-framerate",
"maintain-resolution",
"balanced"
};</pre>
<table data-link-for="RTCDegradationPreference" data-dfn-for=
"RTCDegradationPreference" class="simple">
<tbody>
<tr>
<th colspan="2"><code><a>RTCDegradationPreference</a></code> Enumeration description</th>
</tr>
<tr>
<td data-tests="RTCRtpParameters-degradationPreference.html"><dfn data-idl><code>maintain-framerate</code></dfn></td>
<td>
<p>Degrade resolution in order to maintain framerate.</p>
</td>
</tr>
<tr>
<td><dfn data-idl><code>maintain-resolution</code></dfn></td>
<td>
<p>Degrade framerate in order to maintain resolution.</p>
</td>
</tr>
<tr>
<td data-tests="RTCRtpParameters-degradationPreference.html"><dfn data-idl><code>balanced</code></dfn></td>
<td>
<p>Degrade a balance of framerate and resolution.</p>
</td>
</tr>
</tbody>
</table>
</div>
<p>A member is defined for the RTCRtpSendParameters dictionary that allows
this choice to be explicitly indicated for an RTCRtpSender:</p>
<pre class='idl'>partial dictionary RTCRtpSendParameters {
RTCDegradationPreference degradationPreference;
};</pre>
<section>
<h2>Dictionary <code><a>RTCRtpSendParameters</a></code> New Members</h2>
<dl>
<dt data-tests="RTCRtpParameters-degradationPreference.html"><dfn data-idl><code>degradationPreference</code></dfn> of type
<span class="idlMemberType"><a>RTCDegradationPreference</a></span>.
<dd>
<p>When bandwidth is constrained and the
<code><a>RTCRtpSender</a></code> needs to choose between degrading
resolution or degrading framerate,
<code>degradationPreference</code> indicates which is
preferred.</p>
</dd>
</dl>
</section>
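<p>
A minimal sketch of how an application might set this member, assuming a
sender object that offers the <code>getParameters()</code> and
<code>setParameters()</code> methods defined for RTCRtpSender in [[WEBRTC]].
The helper name <code>setDegradationPreference</code> is hypothetical, and
whether a given user agent honors the member is not something this sketch
can establish.
</p>

```javascript
// The three values of the RTCDegradationPreference enum defined above.
const DEGRADATION_PREFERENCES = [
  "maintain-framerate",
  "maintain-resolution",
  "balanced",
];

// Hypothetical helper: reads the sender's parameters, changes one member,
// and writes the parameters back. Returns whatever setParameters() returns
// (a promise in a browser).
function setDegradationPreference(sender, preference) {
  if (!DEGRADATION_PREFERENCES.includes(preference)) {
    throw new TypeError(`unknown degradation preference: ${preference}`);
  }
  const params = sender.getParameters();
  params.degradationPreference = preference;
  return sender.setParameters(params);
}
```

<p>
Reading the parameters first and writing the same object back keeps the
other members of the dictionary unchanged.
</p>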
</section>
<section>
<h3>Behavior of an RTCPeerConnection</h3>
<p>An <a>RTCRtpSender</a> transmitting a MediaStreamTrack for which a
contentHint attribute has been
set MUST use the following degradation preferences, unless an
explicit degradationPreference attribute has been set in the sender's
parameters:
</p>
<ul>
<li>For a video track with the attribute value "motion",
use "maintain-framerate".
</li>
<li>For a video track with the attribute value "detail",
use "maintain-resolution".
</li>
<li>For a video track with the attribute value "text",
use "maintain-resolution". In addition, if the
encoding codec is AV1, activate encoding tools for "text" mode.
</li>
</ul>
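<p>
The precedence described above, where an explicit degradationPreference
overrides the preference implied by the content hint, can be sketched as
follows. The function name is hypothetical, and the sketch deliberately
returns <code>undefined</code> when neither source applies, since this
section does not state a fallback.
</p>

```javascript
// Degradation preference implied by each video content hint (list above).
const PREFERENCE_FOR_HINT = new Map([
  ["motion", "maintain-framerate"],
  ["detail", "maintain-resolution"],
  ["text", "maintain-resolution"],
]);

// Returns the preference a sender would use, or undefined when neither an
// explicit preference nor a mapped hint applies.
function effectiveDegradationPreference(track, sendParameters) {
  if (sendParameters.degradationPreference !== undefined) {
    return sendParameters.degradationPreference; // explicit setting wins
  }
  if (track.kind !== "video") return undefined;
  return PREFERENCE_FOR_HINT.get(track.contentHint);
}
```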
</section>
<section>
<h3>Behavior of a MediaRecorder</h3>
<p>For a video track with the attribute value "text", if the
encoding codec is AV1, activate encoding tools for "text"
mode.
</p>
</section>
</section>
<section>
<h2>Security and Privacy Considerations</h2>
<p>
This specification adds an API through which the web application can
send information to the user agent. This information is not
communicated elsewhere, and is not stored in any permanent
location.
</p>
<p>
By probing this API, the client may get some information on how
the user agent implements this specification, which may give some
insight into its configuration. Apart from this, no user agent
information or data is exposed to the client through this API,
and the API does not give any access to browser UI, influence
over the user agent's security characteristics or the generation
of temporary identifiers.
</p>
<p>
This API does not distinguish between first-party, third-party
or incognito contexts; it behaves identically in all these
cases.
</p>
<p>
No personally identifiable or otherwise sensitive information is
handled via the use of this API.
</p>
</section>
</body>
</html>