Audio/Video Transport Core Maintenance (avtcore) Working Group
===============================================================

CHAIRS: Jonathan Lennox, Bernard Aboba

Virtual Interim Agenda

Date: Thursday, February 23, 2023
Time: 08:00 - 10:00 Pacific Time
Notes: https://notes.ietf.org/s/notes-ietf-interim-2023-avtcore-01-avtcore
Meeting link: https://datatracker.ietf.org/meeting/interim-2023-avtcore-01/session/avtcore
Remote instructions: https://meetings.conf.meetecho.com/interim/?short=4da9b4c1-62cd-458e-8a8f-58fa91cf8ce4
Slides: https://docs.google.com/presentation/d/1QAo7WiUmIfKWYp5ntV37aYvtnSuyZ0gNqrcrnQS4KkM/

-------------------------------------------------

1. Preliminaries (Chairs, 15 min)
   Note Well, Note Takers, Agenda Bashing, Draft status
2. RTP Payload Format for SCIP (D. Hanson, 15 min)
   https://datatracker.ietf.org/doc/html/draft-ietf-avtcore-rtp-scip
3. RTP Control Protocol (RTCP) Messages for Green Metadata (Yong He, 15 min)
   https://datatracker.ietf.org/doc/html/draft-ietf-avtcore-rtcp-green-metadata
4. RTP over QUIC (J. Ott, M. Engelbart, S. Dawkins, 25 min)
   https://datatracker.ietf.org/doc/html/draft-ietf-avtcore-rtp-over-quic
5. Viewport and Region-of-Interest-Dependent Delivery of Visual Volumetric Media (S. Gudumasu, 10 min)
   https://datatracker.ietf.org/doc/html/draft-gudumasu-avtcore-rtp-volumetric-media-roi
6. Wrapup and Next Steps (Chairs, 15 min)

-------------------------------------------------

# Draft status updates

* VP9 payload is in MISSREF, waiting on framemarking draft.

# RTP Payload Format for SCIP (Dan Hanson, Mike Faller)

IESG Ballot has completed, 3 DISCUSS comments.

* In SCIP, packetization and rate control are handled by the underlying codec. If the underlying codec (e.g. G.729) does not have the ability to control rate, then the ability of SCIP to respond to congestion will be limited.
* SCIP supports both control and data traffic. Control traffic handles key management as well as negotiation. Control traffic has structure as defined in the SCIP documents and includes a length field, so SCIP control messages can be split between RTP packets. Data traffic consists of audio/video as well as chat and whiteboard, is encrypted and therefore appears to RTP as an opaque blob that can be split between RTP packets. This means that the SCIP stream can appear differently depending on the state of SCIP and whether the traffic represents control or data.
* SCIP authors are looking for WG review of the proposed changes before publishing revision 5.
* Bernard Aboba: Would be useful to take a look at the ballot positions. The IESG is looking for a standard RTP payload type document, with sections on the RTP packet format, RTCP feedback, etc. but they didn't get that.
* BA: The SCIP authors appear to have responded to Francesca's review comments relating to change control. Roman is asking if the traffic is entirely opaque - once you've exchanged the keys, it is entirely opaque, correct?
* Dan Hanson: Yes. But the control traffic has structure, as described in SCIP-210.
* BA: While you don't have to go into detail on the key establishment process, I do think you need to describe how control traffic and data traffic are packetized. Once keys are negotiated, the data traffic is entirely opaque. Packetization is handled by the underlying codec, which hands the payload up to SCIP for encryption and placement in the payload field. So if the codec is H.264, an H.264 packetizer is providing a payload in RFC 6184 format to SCIP for encryption.
* BA: You may need to say a little bit about how the SCIP messages are fragmented into RTP packets.
* DH: The messages have their own header with message length.
* BA: So you will need to describe how the SCIP control messages are split between packets (e.g. how fragmentation works) as well as how the fragments are de-packetized.
* BA: There is another comment from Roman about secure session establishment protocol behaviour. I'd suggest talking about the basics, but I don't think you need to reproduce the SCIP state machine, because you just need enough to describe how packetization and de-packetization work.
* BA: Sarker's comment asks what RTP profile should be used. Also need to specify exactly what "highly variable" means.
* Jonathan Lennox: The RTP profile will depend on the underlying codec.
* BA: However, SDP does not reflect that, because the negotiation is handled within SCIP.
* Mike Faller: We've been throwing the word "codec" around a lot, but that underlying codec could be a chat session, so it's better to think of it as data being carried by SCIP.
* BA: Yes, since the payload can represent media or chat or whiteboard, data is probably a better term.
* BA: This seems like a question that a description of the overall architecture and layering might address.
* MF: What do you need to know about SCIP to implement it?
* BA: Since negotiation of the underlying codec is handled by SCIP, the RTP profile does not need to be negotiated in SDP. Also, that is why the traffic is "highly variable".
* MF: One of the fears from networking equipment vendors is wanting to know how to filter on SCIP, but you can't as it's so variable and changing.
* Jonathan Lennox: Need a statement in the document to just not try.
* BA: QUIC deals with this in the "ossification" section - you could add a section here telling implementers to not bother with deep packet inspection because the traffic can vary depending on whether it is data or control traffic. Attempting to parse opaque data is pointless.

# RTP Control Protocol (RTCP) Messages for Green Metadata (Yong He)

* Jonathan Lennox: Question from Nokia - if there was any negotiation of the resolution in SDP, how does that interact with these messages? Probably that you should never go above the resolution which SDP allows.
* JL: Question from Magnus Westerlund about the format.
* JL: Better to call the document "Temporal-Spatial Resolution" as the Green Metadata effort within MPEG is more than just TSR.

# RTP over QUIC (Mathis Engelbart, Spencer Dawkins)

* Mathis Engelbart: Had feedback that the congestion control section needed to describe what kind of congestion control was required, and which layer should perform congestion control.

*Spencer's summary of controversial ideas in congestion control:*

1. *"Disabling QUIC CC"* - obviously, any implementation can do anything at its end, but there's not a defined way to tell the OTHER end to do that. So, how do we make sure we're talking to an "other end" that will Do The Right Thing? Port numbers? ALPN (as is in the draft today)? Just assume the RTP sender will never cause QUIC CC to kick in, because it's interactive media, and think happy thoughts? Other ideas?
2. *Conforming to RFC 8085* (because we're running over UDP with little or no congestion control happening there).
3. *Is there an MTI rate adaptation algorithm for RTP-over-QUIC?* - Spencer doesn't think so, because NADA and SCReAM are IRTF-stream algorithms. For some value of "we", "we" could talk to the authors about how to bring this through the IETF. Are we going to do that? Spencer also notes that Christian Huitema has a proposal for media-aware CC in MOQ now, so it may be too soon to pick an algorithm now.

* Spencer Dawkins: Been talking about disabling QUIC CC for years. Any implementation can do whatever it wants at its end, but no way for the peer to be told to do the right thing. Could be in ALPN, but some people don't like that.
* SD: What would we need to do to conform to RFC 8085, as we're running over UDP? Turning off QUIC CC would require an understanding of what the app layer will do.
* Bernard Aboba: In QUIC, both sides don't need to use the same congestion control algorithm. The CC algorithm isn't negotiated, and yet the two sides can interoperate. On disabling QUIC CC, is this a serious thing that people want to do? We've talked in various places about trying to choose CCs that would work. If you are compiling a QUIC library within your application, you could remove the QUIC CC algorithm and handle CC in the application instead, but it's highly unlikely that a browser would allow an application to disable QUIC CC. As an example, in WebTransport, there is no constructor argument for "no congestion control".
* SD: There's two layers to this - using the language of disabling QUIC CC, but how do all the entities in an implementation know who is doing congestion control, and how to know that's not appropriate for H3? I.e. don't use BBR or other CCs that do bandwidth probing.
* BA: In the case you've just mentioned, it's possible for one peer to use BBR and the other to use New Reno. The Probe RTT phase would destabilize rate control (the application sees loss or increased delay, so it reduces rate, then Probe RTT kicks in, delay goes back up and the application will reduce rate even more though it is not necessary). So rather than worrying about CC negotiation, the draft should instead describe what algorithms are likely to provide good results.
* SD: Should we have a mandatory-to-implement rate adaptation algorithm? Spencer doesn't think so, but wondering if others agree?
* BA: Rate adaptation depends on what mechanisms the encoder provides. The encoder may support scalable video coding, which enables dropping or adding of layers, or it may support per-frame or per-macroblock QP control. Or the encoder might just allow changes in resolution or framerate. So a given rate control algorithm may not be implementable if the encoder doesn't offer the required controls.
* SD: A lot of the work in RMCAT (NADA, SCReAM) was done on how to not break other flows of NADA or SCReAM, but not necessarily what would break other congestion controllers.
* Peter Thatcher: I don't think this is a question of whether we're turning off QUIC CC - what is the feedback it's going to use to implement feedback to RTP - is that going to be feedback from QUIC or from RTP? Are we going to extend QUIC to have the timestamps necessary, or is it going to be feedback embedded in streams or datagrams? How implementations use that feedback should be up to that implementation.
* SD: Agree with Peter. I want to improve the quality of discussions by having better descriptions on issues on github. We haven't had a lot of conversation that focuses on all three options.
* Jonathan Lennox: Agree on whether you have congestion control at the RTP level - I think the decision of how much is available to send can happen at either end, but the decision of what to send needs to be made by the sender only, and this needs to be clarified. My expectation is that if you run a rate-based CC like GCC or NADA on top of a CC like New Reno or CUBIC, they'd work together well, but not with BBR. The issue is communication between the layers.
* BA: Key frames cause the most issues because they are so much larger than everything else. Average Bitrate Targets are just averages, but since keyframes are 10+ times larger than P-frames, keyframes are much more susceptible to loss and queuing delays. Also, when sent over QUIC reliable transport, the size of the congestion window is useful to know, since keyframes may require multiple roundtrips to send. For example, at high resolutions, the keyframe might be 150 KB and if the cwnd is 15 KB (10 packets), that implies 10 roundtrips! This can dictate the startup latency, or the length of a video freeze that will be experienced in the event that a keyframe is needed to switch between streams or to recover from loss. W3C WebTransport WG has been working on metrics to provide to the application.
* SD: Going to have an arms race between people wanting to send more and those wanting better compression. The very biggest frames that we send are getting bigger over time, until someone comes up with a way to amend that.
* BA: Better compression means you can either keep the quality the same and save bits, or use the same number of bits but do more with it. In many applications (such as 4K gaming or AR/VR) the inclination will be to provide higher quality, so the bitrate won't go down.
* SD: Mathis and Joerg have already done some due diligence with QUIC CC, and noted it wasn't kicking in on some representative RTP traffic. It didn't, but we need to keep an eye open in case that starts happening.
* BA: The amount of concurrency is an issue. If you're sending the key frame at the same time as P-frames, this allows the pipe to stay full even if the keyframe is experiencing loss and retransmission. While this may drag out the length of time needed to send a key frame, it can decrease glass-glass latency because the pipe is kept full, and the cwnd grows or recovers more quickly (since it is being pushed up by the concurrent sending of P-frames).

# Viewport and Region-of-Interest-Dependent Delivery of Visual Volumetric Media (Srinivas Gudumasu)

* Jonathan Lennox: Are you seeking adoption?
* Srinivas Gudumasu: Not at this time, just looking for feedback.
* Spencer Dawkins: Is this the rtp-v3c-00 draft?
* SG: No, this is on top of that.
* JL: Might be better for MPEG to define what a region and the viewport is - make sure you have the correct experts looking at it.
* SG: Already working on this.
* Mo Zanaty: You were talking about the coordinate system - does V3C describe it in terms of meters? Or are they arbitrary and not based on real-world sizes? Might not be interoperable when looking at an observer's real-world dimensions.
* SG: The sender converts the dimensions and translates them into the coordinates for the spatial regions.
* Mo Zanaty: Needs more wordsmithing to make this more obvious that it's images and pixels relative to the media itself.
# Action items

* SCIP authors to add material to the introduction providing an overview of SCIP, section on RTP payload format (including packetization/de-packetization) answering IESG questions.

# Log of Zulip Chat

Sam Hurst
I can try taking notes - will be my first time for an IETF meeting but I can give it a go 11:05:52

Dale Worley
I'm not hearing any audio, though the browser and Meetecho think I've got sound enabled and Meetecho reports multiple kbps. I heard the previous speaker. 11:17:11

Jonathan Lennox
Dale, is that still the case? 11:19:14

Dale Worley
Yes, and toggling the immediately obvious controls doesn't fix it. I am getting the notifications as to who is speaking. 11:20:12
Ugh, my UI problem. The "Mute Audio" icon in the upper left is "mute audio to me" whereas the same icon in the lower right is "mute audio I am generating". 11:22:23

Sam Hurst
My audio glitched out for a moment there - what doesn't matter (so I can add it to the notes) 11:33:03

Rui Paulo
we can also tell the other side to disable CC via QUIC transport parameters 11:50:17

Joerg Ott
That would mean that a QUIC library without RTP would be able to signal that, too, which we may not want to suggest. No CC at all would not be a good idea. 11:51:10

Peter Thatcher
This isn't really about "disabling QUIC CC". This is more about "the congestion control for the RTP packets is done using feedback other than QUIC feedback". Whether it's "in QUIC" or "out of QUIC" is an implementation detail. The real question for a standard is: what feedback is being used. Is it QUIC feedback or RTP/RTCP feedback? 11:54:30

Joerg Ott
Right, I am saying only that you should not be allowed to negotiate turning off QUIC CC inside QUIC transport parameter negotiation 11:55:39

Peter Thatcher
For example, in the WebRTC world, we don't standardize googc. We standardize transport-wide-cc (or whatever was the result of trying to standardize it :). 11:56:01
I agree it doesn't make sense to negotiate turning off QUIC CC. It might make sense to negotiate some kind of QUIC version of REMB, though. 11:56:58

Harald Alvestrand
well, we failed to find the resources to actually propose transport-cc for standardization, so that kind of died..... 11:57:10
the proper view is probably that we should have a congestion limitation as part of QUIC, but a decision on what to send next (or decide not to send) as part of the application. 11:58:21

Peter Thatcher
Options for feedback: A. REMB style: have the receiver tell the sender a number. Probably need a timestamp similar to the RTP abs-send-time header extension attached to each packet for the receiver to make a good calculation. B. transport-wide-cc style: have the receiver send per-packet receive timestamps in ACKs. Let the sender do the calculation. I think in either case, extending QUIC will be much better than doing this above QUIC. 12:02:25
I think this works for B: https://www.ietf.org/archive/id/draft-smith-quic-receive-ts-00.txt 12:03:52

Joerg Ott
B. is what we have in the draft 12:05:15
including quite a bit of discussion on what to inherit from the QUIC layer 12:05:40

Vidhi Goel
How big is the Pframe? 12:06:12

Joerg Ott
Of course, good ol' RTCP is also an option when you need something that QUIC doesn't give you (yet) 12:06:29

Peter Thatcher
Joerg: That sounds like a good approach. Why don't we just try and implement that with, say, googcc, and see if it works well? I'm guessing it will. What more do we need? 12:07:08

Mo Zanaty
MOQ has a similar need / delimma. That argues for solving this within the QUIC CC and feedback layer, not above it. 12:07:53

Peter Thatcher
And I'd like googc to work in WebTransport as well. 12:08:14

Mo Zanaty
For receive timestamp extensions to QUIC, there is also Christian's original proposal: https://datatracker.ietf.org/doc/draft-huitema-quic-ts/ 12:09:10

Peter Thatcher
If anyone is interested in adding support for https://www.ietf.org/archive/id/draft-smith-quic-receive-ts-00.txt and googcc to an impl of QUIC in Rust, let me know. I'm interested in making that work. 12:10:30

Joerg Ott
Peter: we implemented quite a bit of this, and it seemed to work. But the details of QUIC CC interaction remain. 12:10:43
Let's chat offline about this (I will have to disappear shortly for boarding my flight) 12:11:22

Vidhi Goel
I can work on congestion control issues 12:13:29

Peter Thatcher
Joerg: I'm interested to see how far you got 12:13:50
If anyone wants to chat QUIC CC offline, send me an email (pthatcher@microsoft.com) 12:15:25

Joerg Ott
We'll reach out. 12:15:44