diff --git a/source/configure/calls-deployment.rst b/source/configure/calls-deployment.rst index 8ddea4df247..3f0c0b07c87 100644 --- a/source/configure/calls-deployment.rst +++ b/source/configure/calls-deployment.rst @@ -4,460 +4,184 @@ Calls self-hosted deployment .. include:: ../_static/badges/allplans-cloud-selfhosted.rst :start-after: :nosearch: -This document provides information on how to successfully make the Calls plugin work on self-hosted deployments. It also outlines some of the most common deployment strategies with example diagrams, and also provides the deployment guidelines for the recording and transcription service. - -- `Terminology <#terminology>`__ -- `Plugin components <#plugin-components>`__ -- `Requirements <#requirements>`__ -- `Limitations <#limitations>`__ -- `Configuration <#configuration>`__ -- `Performance <#performance>`__ -- `RTCD Service <#the-rtcd-service>`__ -- `Configure recording and transcriptions <#configure-recording-and-transcriptions>`__ -- `Kubernetes deployments <#kubernetes-deployments>`__ -- `Frequently asked questions <#frequently-asked-questions>`__ -- `Troubleshooting <#troubleshooting>`__ +This document provides an overview of Mattermost Calls deployment options for self-hosted environments, including deployment architectures, key requirements, and important considerations. -Terminology ------------ +Quick Links +---------- -- `WebRTC `__: The set of underlying protocols/specifications on top of which calls are implemented. -- **RTC (Real Time Connection)**: The real-time connection. This is the channel used to send media tracks (audio/video/screen). -- **WS (WebSocket)**: The WebSocket connection. This is the channel used to set up a connection (signaling process). -- `NAT (Network Address Translation) `__: A networking technique to map IP addresses. -- `STUN (Session Traversal Utilities for NAT) `__: A protocol/service used by WebRTC clients to help traversing NATs. 
On the server side it's mainly used to figure out the public IP of the instance. -- `TURN (Traversal Using Relays around NAT) `__: A protocol/service used to help WebRTC clients behind strict firewalls connect to a call through media relay. +For details on specific topics, see these guides: -Plugin components ------------------ +- `RTCD Setup and Configuration `__: Comprehensive guide for setting up the dedicated RTCD service +- `Calls Troubleshooting `__: Detailed troubleshooting steps and debugging techniques +- `Calls Metrics and Monitoring `__: Guide to monitoring Calls performance using metrics and observability -- **Calls plugin**: This is the main entry point and a requirement to enable channel calls. +About Mattermost Calls +---------------------- -- **rtcd**: This is an optional service that can be deployed to offload all the functionality and data processing involved with the WebRTC connections. Read more about when and why to use `rctd <#the-rtcd-service>`__ below. +Mattermost Calls provides integrated audio calling and screen sharing capabilities within Mattermost channels. It is built on WebRTC and can be deployed in one of two modes: -Requirements ------------- +1. **Integrated mode**: Built into the Calls plugin (simpler, suitable for smaller deployments) +2. 
**RTCD mode**: Using a dedicated service for improved performance and scalability (recommended for production environments) + +Terminology +----------- -Server -~~~~~~ +- `WebRTC `__: The set of protocols on which calls are built +- **RTC**: Real-Time Connection, the channel used to send media (audio/video/screen) +- **WS**: WebSocket connection used for signaling and connection setup +- **SFU**: Selective Forwarding Unit, the component that routes media between participants +- `NAT `__: Network Address Translation, a technique for mapping IP addresses +- `STUN `__: Session Traversal Utilities for NAT, a protocol used by WebRTC clients to help traverse NATs +- `TURN `__: Traversal Using Relays around NAT, a protocol to relay media for clients behind strict firewalls -- Run Mattermost server on a secure (HTTPs) connection. This is a necessary requirement on the client to allow capturing devices (e.g., microphone, screen). See the `config TLS `__ section for more info. -- See `network requirements `__ below. +Key Components +-------------- -Client -~~~~~~ +- **Calls plugin**: The main plugin that enables calls functionality +- **RTCD service**: Optional dedicated service for offloading media processing (Enterprise feature) +- **calls-offloader**: Service for call recording and transcription (if enabled) -- Clients need to be able to connect (send and receive data) to the instance hosting the calls through the UDP port configured as ``RTC Server Port``. If this is not possible a TURN server should be used to achieve connectivity. -- Depending on the platform or operating system, clients may need to grant additional permissions to the application (e.g., browser, desktop app) to allow them to capture audio inputs or share the screen. 
+Network Requirements +------------------ -Network -~~~~~~~ +The following network connectivity is required: -+---------------------------------+--------+-----------------+------------------------------------------------------------+------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| Service | Ports | Protocols | Source | Target | Purpose | -+---------------------------------+--------+-----------------+------------------------------------------------------------+------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| API (Calls plugin) | 80,443 | TCP (incoming) | Mattermost clients (web/desktop/mobile) | Mattermost instance (Calls plugin) | To allow for HTTP and WebSocket connectivity from clients to Calls plugin. This API is exposed on the same connection as Mattermost, so there’s likely no need to change anything. 
| -+---------------------------------+--------+-----------------+------------------------------------------------------------+------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| RTC (Calls plugin or ``rtcd``) | 8443 | UDP (incoming) | Mattermost clients (Web/Desktop/Mobile) | Mattermost instance or ``rtcd`` service | To allow clients to establish connections that transport calls related media (e.g. audio, video). This should be open on any network component (e.g. NAT, firewalls) in between the instance running the plugin (or ``rtcd``) and the clients joining calls so that UDP traffic is correctly routed both ways (from/to clients). | -+---------------------------------+--------+-----------------+------------------------------------------------------------+------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| RTC (Calls plugin or ``rtcd``) | 8443 | TCP (incoming) | Mattermost clients (Web/Desktop/Mobile) | Mattermost instance or ``rtcd`` service | To allow clients to establish connections that transport calls related media (e.g. 
audio, video). This should be open on any network component (e.g. NAT, firewalls) in between the instance running the plugin (or ``rtcd``) and the clients joining calls so that TCP traffic is correctly routed both ways (from/to clients). This can be used as a backup channel in case clients are unable to connect using UDP. It requires ``rtcd`` version >= v0.11 and Calls version >= v0.17. | -+---------------------------------+--------+-----------------+------------------------------------------------------------+------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| API (``rtcd``) | 8045 | TCP (incoming) | Mattermost instance(s) (Calls plugin) | ``rtcd`` service | To allow for HTTP/WebSocket connectivity from Calls plugin to ``rtcd`` service. Can be expose internally as the service only needs to be reachable by the instance(s) running the Mattermost server. 
| -+---------------------------------+--------+-----------------+------------------------------------------------------------+------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| STUN (Calls plugin or ``rtcd``) | 3478 | UDP (outgoing) | Mattermost Instance(s) (Calls plugin) or ``rtcd`` service | Configured STUN servers | (Optional) To allow for either Calls plugin or ``rtcd`` service to discover their instance public IP. Only needed if configuring STUN/TURN servers. This requirement does not apply when manually setting an IP or hostname through the |ice_host_override_link| config option. 
| -+---------------------------------+--------+-----------------+------------------------------------------------------------+------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ ++-------------------+--------+-----------------+-------------------------+------------------------+ +| Service | Ports | Protocols | Source | Target | ++===================+========+=================+=========================+========================+ +| Calls plugin API | 80,443 | TCP (incoming) | Mattermost clients | Mattermost server | ++-------------------+--------+-----------------+-------------------------+------------------------+ +| RTC media | 8443 | UDP (incoming) | Mattermost clients | Mattermost or RTCD | ++-------------------+--------+-----------------+-------------------------+------------------------+ +| RTC media | 8443 | TCP (incoming) | Mattermost clients | Mattermost or RTCD | ++-------------------+--------+-----------------+-------------------------+------------------------+ +| RTCD API | 8045 | TCP (incoming) | Mattermost server | RTCD service | ++-------------------+--------+-----------------+-------------------------+------------------------+ +| STUN | 3478 | UDP (outgoing) | Mattermost or RTCD | STUN servers | ++-------------------+--------+-----------------+-------------------------+------------------------+ -.. |ice_host_override_link| replace:: `ICE Host Override `__ +For complete network requirements, see the `RTCD Setup and Configuration `__ guide. 
Limitations ----------- - In Mattermost Cloud, up to 200 participants per channel can join a call. -- In Mattermost self-hosted deployments, the default maximum number of participants is unlimited. The recommended maximum number of participants per call is 200. This setting can be changed in **System Console > Plugin Management > Calls > Max call participants**. There's no limit to the total number of participants across all calls as the supported value greatly depends on instance resources. For more details, refer to the `performance section `__ below. +- In Mattermost self-hosted deployments, the default maximum number of participants is unlimited. The recommended maximum number of participants per call is 200. +- You can change the maximum number of participants per call in **System Console > Plugin Management > Calls > Max call participants**. Configuration ------------- For Mattermost self-hosted customers, the calls plugin is pre-packaged, installed, and enabled. Configuration to allow end-users to use it can be found in the `System Console `__. -Modes of operation ------------------- +Deployment Architecture Options +------------------------------- -Depending on how the Mattermost server is running, there are several modes under which the Calls plugin can operate. Please refer to the section below on `the rtcd service <#the-rtcd-service>`__ to learn about the ``rtcd`` and the Selective Forwarding Unit (SFU). 
+Mattermost Calls can be deployed in several configurations: -============================ =============== ================= - Mattermost deployment SFU SFU deployment -============================ =============== ================= - Single instance integrated - Single instance rtcd - High availability cluster integrated clustered - High availability cluster integrated single handler - High availability cluster rtcd -============================ =============== ================= - -Single instance -~~~~~~~~~~~~~~~ - -Integrated -^^^^^^^^^^ - -This is the default mode when first installing the plugin on a single Mattermost instance setup. The WebRTC service is integrated in the plugin itself and runs alongside the Mattermost server. +Single Instance Deployments +~~~~~~~~~~~~~~~~~~~~~~~~~~ .. image:: ../images/calls-deployment-image3.png :alt: A diagram of the integrated configuration model of a single instance. + :width: 600px -rtcd -^^^^ - -An external, dedicated and scalable WebRTC service (``rtcd``) is used to handle all calls media routing. +**Integrated mode**: The WebRTC service runs within the Calls plugin on the Mattermost server. .. image:: ../images/calls-deployment-image7.png :alt: A diagram of a Web RTC deployment configuration. + :width: 600px -High availability cluster -~~~~~~~~~~~~~~~~~~~~~~~~~ +**RTCD mode**: A dedicated RTCD service handles media routing, reducing load on the Mattermost server. -Clustered -^^^^^^^^^ - -This is the default mode when running the plugin in a high availability cluster. Every Mattermost node will run an instance of the plugin that includes a WebRTC service. Calls are distributed across all available nodes through the existing load-balancer: a call is hosted on the instance where the initiating websocket connection (first client to join) is made. A single call will be hosted on a single cluster node. +High Availability Deployments +~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. 
image:: ../images/calls-deployment-image4.png - :alt: A diagram of a single handler deployment. - -Single handler -^^^^^^^^^^^^^^ - -This is a fallback mode to only let one node in the cluster to host calls. While the plugin would still run on all nodes, all calls will be routed through the handler node. This mode must be enabled by running the instance with a special environment variable set (MM_CALLS_IS_HANDLER=true). - -.. image:: ../images/calls-deployment-image5.png :alt: A diagram of a clustered calls deployment. + :width: 600px -rtcd (HA) -^^^^^^^^^^ +**Clustered mode**: Each Mattermost node runs an instance of the plugin with its own WebRTC service. .. image:: ../images/calls-deployment-image2.png :alt: A diagram of an rtcd deployment. + :width: 600px -Performance ------------ - -Calls performance primarily depends on two resources: CPU and bandwidth (both network latency and overall throughput). The final consumption exhibits quadratic growth with the number of clients transmitting and receiving media. - -As an example, a single call with 10 participants of which two are unmuted (transmitting voice data) will generally consume double the resources than the same call with a single participant unmuted. What ultimately counts towards performance is the overall number of concurrent media flows (in/out) across the server. 
- -Benchmarks -~~~~~~~~~~ - -Here are some results from internally conducted performance tests on a dedicated ``rtcd`` instance: - -+-------+------------+--------------+----------------+-----------+--------------+--------------------+----------------+ -| Calls | Users/call | Unmuted/call | Screen sharing | CPU (avg) | Memory (avg) | Bandwidth (in/out) | Instance (EC2) | -+=======+============+==============+================+===========+==============+====================+================+ -| 100 | 8 | 2 | no | 60% | 0.5GB | 22Mbps / 125Mbps | c6i.xlarge | -+-------+------------+--------------+----------------+-----------+--------------+--------------------+----------------+ -| 100 | 8 | 2 | no | 30% | 0.5GB | 22Mbps / 125Mbps | c6i.2xlarge | -+-------+------------+--------------+----------------+-----------+--------------+--------------------+----------------+ -| 100 | 8 | 2 | yes | 86% | 0.7GB | 280Mbps / 2.2Gbps | c6i.2xlarge | -+-------+------------+--------------+----------------+-----------+--------------+--------------------+----------------+ -| 10 | 50 | 2 | no | 35% | 0.3GB | 5.25Mbps / 86Mbps | c6i.xlarge | -+-------+------------+--------------+----------------+-----------+--------------+--------------------+----------------+ -| 10 | 50 | 2 | no | 16% | 0.3GB | 5.25Mbps / 86Mbps | c6i.2xlarge | -+-------+------------+--------------+----------------+-----------+--------------+--------------------+----------------+ -| 10 | 50 | 2 | yes | 90% | 0.3GB | 32Mbps / 1.33Gbps | c6i.xlarge | -+-------+------------+--------------+----------------+-----------+--------------+--------------------+----------------+ -| 10 | 50 | 2 | yes | 45% | 0.3GB | 32Mbps / 1.33Gbps | c6i.2xlarge | -+-------+------------+--------------+----------------+-----------+--------------+--------------------+----------------+ -| 5 | 200 | 2 | no | 65% | 0.6GB | 8.2Mbps / 180Mbps | c6i.xlarge | 
-+-------+------------+--------------+----------------+-----------+--------------+--------------------+----------------+ -| 5 | 200 | 2 | no | 30% | 0.6GB | 8.2Mbps / 180Mbps | c6i.2xlarge | -+-------+------------+--------------+----------------+-----------+--------------+--------------------+----------------+ -| 5 | 200 | 2 | yes | 90% | 0.7GB | 31Mbps / 2.2Gbps | c6i.2xlarge | -+-------+------------+--------------+----------------+-----------+--------------+--------------------+----------------+ - -Dedicated service -~~~~~~~~~~~~~~~~~ - -For Enterprise customers we offer a way to offload performance costs through a `dedicated service `__ that can be used to further scale up calls. - -Load testing -~~~~~~~~~~~~ - -We provide a `load-test tool `__ that can be used to simulate and measure the performance impact of calls. - -Monitoring -~~~~~~~~~~ - -Both the plugin and the external ``rtcd`` service expose some Prometheus metrics to monitor performance. We provide an `official dashboard `__ that can be imported in Grafana. You can refer to `Performance monitoring `__ for more information on how to set up Prometheus and visualize metrics through Grafana. - -Calls plugin metrics -^^^^^^^^^^^^^^^^^^^^ - -Metrics for the calls plugin are exposed through the public ``/plugins/com.mattermost.calls/metrics`` API endpoint. - -**Process** - -- ``mattermost_plugin_calls_process_cpu_seconds_total``: Total user and system CPU time spent in seconds. -- ``mattermost_plugin_calls_process_max_fds``: Maximum number of open file descriptors. -- ``mattermost_plugin_calls_process_open_fds``: Number of open file descriptors. -- ``mattermost_plugin_calls_process_resident_memory_bytes``: Resident memory size in bytes. -- ``mattermost_plugin_calls_process_virtual_memory_bytes``: Virtual memory size in bytes. - -**WebRTC connection** - -- ``mattermost_plugin_calls_rtc_conn_states_total``: Total number of RTC connection state changes. 
-- ``mattermost_plugin_calls_rtc_errors_total``: Total number of RTC errors. -- ``mattermost_plugin_calls_rtc_rtp_bytes_total``: Total number of sent/received RTP packets in bytes. - - - Note: removed as of v0.16.0 - -- ``mattermost_plugin_calls_rtc_rtp_packets_total``: Total number of sent/received RTP packets. - - - Note: removed as of v0.16.0 - -- ``mattermost_plugin_calls_rtc_rtp_tracks_total``: Total number of incoming/outgoing RTP tracks. - - - Note: added as of v0.16.0 - -- ``mattermost_plugin_calls_rtc_sessions_total``: Total number of active RTC sessions. - -**Database** - -- ``mattermost_plugin_calls_store_ops_total``: Total number of db store operations. - -**WebSocket** - -- ``mattermost_plugin_calls_websocket_connections_total``: Total number of active WebSocket connections. -- ``mattermost_plugin_calls_websocket_events_total``: Total number of WebSocket events. - -WebRTC service metrics -^^^^^^^^^^^^^^^^^^^^^^ - -Metrics for the ``rtcd`` service are exposed through the ``/metrics`` API endpoint. - -**Process** - -- ``rtcd_process_cpu_seconds_total``: Total user and system CPU time spent in seconds. -- ``rtcd_plugin_calls_process_max_fds``: Maximum number of open file descriptors. -- ``rtcd_plugin_calls_process_open_fds``: Number of open file descriptors. -- ``rtcd_plugin_calls_process_resident_memory_bytes``: Resident memory size in bytes. -- ``rtcd_plugin_calls_process_virtual_memory_bytes``: Virtual memory size in bytes. - -**WebRTC Connection** - -- ``rtcd_rtc_conn_states_total``: Total number of RTC connection state changes. -- ``rtcd_rtc_errors_total``: Total number of RTC errors. -- ``rtcd_rtc_rtp_bytes_total``: Total number of sent/received RTP packets in bytes. - - - Note: removed as of v0.10.0 - -- ``rtcd_rtc_rtp_packets_total``: Total number of sent/received RTP packets. - - - Note: removed as of v0.10.0 - -- ``rtcd_rtc_rtp_tracks_total``: Total number of incoming/outgoing RTP tracks. 
- - - Note: added as of v0.10.0 - -- ``rtcd_rtc_sessions_total``: Total number of active RTC sessions. - -**WebSocket** - -- ``rtcd_ws_connections_total``: Total number of active WebSocket connections. -- ``rtcd_ws_messages_total``: Total number of received/sent WebSocket messages. - -System tunings -~~~~~~~~~~~~~~ +**RTCD with HA**: Dedicated RTCD services handle media routing for high availability. -If you want to host many calls or calls with a large number of participants, take a look at the following platform specific (Linux) tunings (this is the only officially supported target for the plugin right now): - -.. code:: - - # Setting the maximum buffer size of the receiving UDP buffer to 16MB - net.core.rmem_max = 16777216 - - # Setting the maximum buffer size of the sending UDP buffer to 16MB - net.core.wmem_max = 16777216 - - # Allow to allocate more memory as needed for more control messages that need to be sent for each socket connected - net.core.optmem_max = 16777216 - -The rtcd service ----------------- - -.. include:: ./calls-rtcd-ent-only.rst - :start-after: :nosearch: - -The Calls plugin has a built-in `Selective Forwarding Unit (SFU) `__ to route audio and screensharing data. This is the ``integrated`` option described in the `<#modes-of-operation>`__ section above. But this SFU functionality can be deployed separately as an external ``rtcd`` instance. - -Reasons to use the ``rtcd`` service -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -This section will help you understand when and why your organization would want to use ``rtcd``. - -.. note:: - - ``rtcd`` is a standalone service, which adds operational complexity, maintenance costs, and requires an enterprise licence. For those who are evaluating Calls, and for many small instances of Mattermost, the integrated SFU (the one included in the Calls plugin) may be sufficient initially. 
- -The ``rtcd`` service is the recommended way to host Calls for the following reasons: - -- **Performance of the main Mattermost server(s).** When the Calls plugin runs the SFU, calls traffic is added to the processing load of the server running the rest of your Mattermost services. If Calls traffic spikes, it can negatively affect the responsiveness of these services. Using an rtcd service isolates the calls traffic processing to those rtcd instances, and also reduces costs by minimizing CPU usage spikes. - -- **Performance, scalability, and stability of the Calls product.** If Calls traffic spikes, or more overall capacity is needed, ``rtcd`` servers can be added to balance the load. As an added benefit, if the Mattermost traffic spikes, or if a Mattermost instance needs to be restarted, those people in a current call will not be affected - current calls won't be dropped. - -Some caveats apply here. Web socket events (for example: emoji reactions, hand raising, muting/unmuting) will not be transmitted while the main Mattermost server is down. But the call itself will continue while the main server restarts. - -- **Kubernetes deployments.** In a Kubernetes deployment, ``rtcd`` is strongly recommended; it is currently the only officially supported way to run Calls. -- **Technical benefits.** The dedicated ``rtcd`` service has been optimized and tuned at the system/network level for real-time audio/video traffic, where latency is generally more important than throughput. - -In general, ``rtcd`` is the preferred solution for a performant and scalable deployment. With ``rtcd``, the Mattermost server will be minimally impacted when hosting a high number of calls. - -Horizontal scalability -~~~~~~~~~~~~~~~~~~~~~~ - -The supported way to enable horizontal scalability for Calls is through a form of DNS based load balancing. This can be achieved regardless of how the ``rtcd`` service is deployed (bare bone instance, Kubernetes, or an alternate way). 
- -In order for this to work, the `RTCD Service URL `__ should point to a hostname that resolves to multiple IP addresses, each pointing to a running ``rtcd`` instance. The Mattermost Calls plugin will then automatically distribute calls amongst the available hosts. - -The expected requirements are the following: - -- When a new ``rtcd`` instance is deployed, it should be added to the DNS record. The plugin side will then be able to pick it up and start assigning calls to the new host. - -- If a ``rtcd`` instance goes down, it should be removed from the DNS record. The plugin side can then detect the change and stop assigning new calls to that host. - -.. note:: - Load balancing is done at the call level. This means that a single call will always live on a single ``rtcd`` instance. - There's currently no support for spreading sessions belonging to the same call across a fleet of instances. - -Configure recording and transcriptions --------------------------------------- - -Before you can start recording and transcribing calls, you need to configure the ``calls-offloader`` job service. You can read about how to do that `here `__. Performance and scalability recommendations related to this service can be found in `here `__. - -.. note:: - If deploying the service in a Kubernetes cluster, refer to the later section on `Helm charts <#helm-charts>`__. - -Once the ``calls-offloader`` service is running, recordings should be explicitly enabled through the `Enable call recordings `__ config setting and the service's URL should be configured using `Job service URL `__. - - -Call transcriptions can be enabled through the `Enable call transcriptions `__ config setting. - -.. note:: - The call transcriptions functionality is available starting in Calls version v0.22.0 - -Kubernetes deployments ----------------------- - -The Calls plugin has been designed to integrate well with Kubernetes to offer improved scalability and control over the deployment. 
-This is a sample diagram showing how the ``rtcd`` standalone service can be deployed in a Kubernetes cluster: +Kubernetes Deployments +~~~~~~~~~~~~~~~~~~~~~~ .. image:: ../images/calls-deployment-kubernetes.png :alt: A diagram of calls deployed in a Kubernetes cluster. + :width: 600px -If Mattermost isn't deployed in a Kubernetes cluster, and you want to use this deployment type, visit the `Kubernetes operator guide `__. +For Kubernetes deployments, the RTCD service is strongly recommended and is the only officially supported approach. -Helm Charts -~~~~~~~~~~~ - -The recommended way to deploy Calls related components and services in a Kubernetes deployment is to use the officially provided Helm charts. Related documentation including detailed information on how to deploy these services can be found in our ``mattermost-helm`` repository: +The recommended way to deploy the Calls-related services is with the officially provided Helm charts: - `rtcd Helm chart `__ - - `calls-offloader Helm chart `__ -Frequently asked questions --------------------------- +When to Use RTCD +---------------- -Is there encryption? -~~~~~~~~~~~~~~~~~~~~ - -Media (audio/video) is encrypted using security standards as part of WebRTC. It's mainly a combination of DTLS and SRTP. It's not e2e encrypted in the sense that in the current design all media needs to go through Mattermost which acts as a media router and has complete access to it. Media is then encrypted back to the clients so it's secured during transit. In short: only the participant clients and the Mattermost server have access to unencrypted call data. +The dedicated RTCD service (available with an Enterprise license) is recommended for: -Are there any third-party services involved? 
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +- **Production environments**: Isolates call traffic from other Mattermost services +- **Performance optimization**: Dedicated service tuned for real-time media +- **Scalability**: Add RTCD instances as call volume grows +- **Call stability**: Calls continue even if Mattermost server needs to restart +- **Kubernetes deployments**: Required for officially supported Kubernetes deployments -The only external service used is a Mattermost official STUN server (``stun.global.calls.mattermost.com``) which is configured as default. This is primarily used to find the public address of the Mattermost instance if none is provided through the |ice_host_override_link| option. The only information sent to this service is the IP addresses of clients connecting as no other traffic goes through it. It can be removed in cases where the |ice_host_override_link| setting is provided. +For detailed RTCD setup instructions, see the `RTCD Setup and Configuration `__ guide. -Is using UDP a requirement? -~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Call Recording and Transcription +------------------------------ -Yes, UDP is the recommended protocol to serve real-time media as it allows for the lowest latency between peers. However, there are a couple of possible solutions to cover clients that due to limitations or strict firewalls are unable to use UDP: +For call recording and transcription, you need to: -- Since plugin version 0.17 and ``rtcd`` version 0.11 the RTC service will listen for TCP connections in addition to UDP ones. If configured correctly (e.g. using commonly allowed ports such as 80 or 443) it's possible to have clients connect directly through TCP when unable to do it through the preferred UDP channel. +1. Deploy the ``calls-offloader`` service +2. Configure the service URL in the System Console +3. 
Enable call recordings and/or transcriptions in the plugin settings -- Run calls through an external TURN server that listens on TCP and relays all media traffic between peers. However, this is a sub-optimal solution that should be avoided if possible as it will introduce extra latency along with added infrastructural cost. +Performance Considerations +------------------------ -Do I need a TURN server? -~~~~~~~~~~~~~~~~~~~~~~~~ +Calls performance primarily depends on: -TURN becomes necessary when you expect to have clients that are unable to connect through the configured UDP port. This can happen due to very restrictive firewalls that either block non standard ports even in the outgoing direction or don't allow the use of the UDP protocol altogether (e.g. some corporate firewalls). In such cases TURN is needed to allow connectivity. +- **CPU resources**: More participants require more processing power +- **Network bandwidth**: Both incoming and outgoing traffic increases with participant count +- **Active speakers**: Unmuted participants require significantly more resources -We officially support and recommend using `coturn `__ for a stable and performant TURN service implementation. +For detailed performance metrics, benchmarks, and monitoring guidance, see the `Calls Metrics and Monitoring `__ guide. -How will this work with an existing reverse proxy sitting in front of Mattermost? -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Frequently Asked Questions +------------------------ -Generally clients should connect directly to either Mattermost or, if deployed, the dedicated ``rtcd`` service through the configured UDP port . However, it's also possible to route the traffic through an existing load balancer as long as this has support for routing the UDP protocol (e.g. nginx). 
Of course this will require additional configuration and potential changes to how the plugin is run as it won't be possible to load balance the UDP flow across multiple instances like it happens for HTTP. +**Is calls traffic encrypted?** +Yes, using WebRTC security standards (DTLS/SRTP). Traffic is encrypted in transit. -Do calls require a dedicated server to work or can they run alongside Mattermost? -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +**Are there any third-party services involved?** +Only a Mattermost STUN server (``stun.global.calls.mattermost.com``) is used by default. This can be removed if you set the ICE Host Override configuration. -The plugin can function in different modes. By default calls are handled completely by the plugin which runs as part of Mattermost. It's also possible to use a dedicated service to offload the computational and bandwidth costs and scale further (Enterprise only). +**Is using UDP a requirement?** +UDP is recommended for best performance, but TCP fallback is supported since plugin version 0.17 and RTCD version 0.11. -Can the traffic between Mattermost and ``rtcd`` be kept internal or should it be opened to the public? -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +**Do I need a TURN server?** +Only if clients are behind restrictive firewalls that block UDP. We recommend `coturn `__ if needed. -When possible, it's recommended to keep communication between the Mattermost cluster and the dedicated ``rtcd`` service under the same private network as this can greatly simplify deployment and security. There's no requirement to expose ``rtcd``'s HTTP API to the public internet. +**Can RTCD traffic be kept internal?** +Yes, and it's recommended. Only the media ports need to be accessible to end-users. 
Troubleshooting --------------- -Connectivity issues -~~~~~~~~~~~~~~~~~~~ - -If calls are failing to connect or timing out, it's likely there could be a misconfiguration at either the plugin config or networking level. - -For example, the `RTC Server Port (UDP) `__ or the `RTC Server Port (TCP) `__ may not be open or forwarded correctly. - - -Connectivity checks -^^^^^^^^^^^^^^^^^^^ - -An easy way to check whether data can go through is to perform some tests using the ``netcat`` command line tool. - -On the host running Calls (could be the Mattermost instance itself or the one running ``rtcd`` depending on the chosen setup), run the following: - -.. code-block:: bash - - nc -l -u -p 8443 - -On the client side (i.e., the machine you would normally use to run the Mattermost desktop app or browser), run the following: - -.. code-block:: bash - - nc -v -u HOST_IP 8443 - -If connection succeeds, you should be able to send and receive text messages by typing and hitting enter on either side. - -.. note:: - ``HOST_IP`` should generally be the public (client facing) IP of the Mattermost - (or ``rtcd``) instance hosting the calls. When set, it should be the value of the |ice_host_override_link| - config setting. - - ``8443`` should be changed with the port configured in `RTC Server Port `__. - - The same checks can be performed to test connectivity through the TCP port using the same commands with ``-u`` flag removed. - -Network packets debugging -^^^^^^^^^^^^^^^^^^^^^^^^^ - -A more advanced way to debug networking issues is to use the ``tcpdump`` command line utility to temporaily monitor network packets flowing in and out of the instance hosting calls. - -On the server side, run the following: - -.. code-block:: bash +For comprehensive troubleshooting steps and debugging techniques, please refer to the `Calls Troubleshooting `__ guide. - sudo tcpdump -n port 8443 +Next Steps +--------- -This command will output information (i.e. 
source and destination addresses) for all the network packets being sent or received through port ``8443``. This is a good way to check whether data is getting in and out of the instance and can be used to quickly identify network configuration issues. +1. For detailed setup instructions, see `RTCD Setup and Configuration `__ +2. For monitoring guidance, see `Calls Metrics and Monitoring `__ +3. If you encounter issues, see `Calls Troubleshooting `__ \ No newline at end of file diff --git a/source/configure/calls-metrics-monitoring.rst b/source/configure/calls-metrics-monitoring.rst new file mode 100644 index 00000000000..241b8fc282b --- /dev/null +++ b/source/configure/calls-metrics-monitoring.rst @@ -0,0 +1,322 @@ +Calls Metrics and Monitoring +========================= + +.. include:: ../_static/badges/allplans-cloud-selfhosted.rst + :start-after: :nosearch: + +This guide provides detailed information on monitoring Mattermost Calls performance and health through metrics and observability tools. Effective monitoring is essential for maintaining optimal call quality and quickly addressing any issues that arise. + +- `Metrics overview <#metrics-overview>`__ +- `Setting up monitoring <#setting-up-monitoring>`__ +- `Key metrics to monitor <#key-metrics-to-monitor>`__ +- `Grafana dashboards <#grafana-dashboards>`__ +- `Alerting recommendations <#alerting-recommendations>`__ +- `Performance baselines <#performance-baselines>`__ + +Metrics Overview +-------------- + +Mattermost Calls provides metrics through Prometheus for both the Calls plugin and the RTCD service. 
These metrics help track: + +- Active call sessions and participants +- Media track statistics +- Connection states and errors +- Resource utilization (CPU, memory, network) +- WebSocket connections and events + +The metrics are exposed through HTTP endpoints: + +- **Calls Plugin**: ``/plugins/com.mattermost.calls/metrics`` +- **RTCD Service**: ``/metrics`` (default) or a configured endpoint + +Setting Up Monitoring +--------------------- + +Prerequisites +^^^^^^^^^^^^^ + +To monitor Calls metrics, you'll need: + +1. **Prometheus**: For collecting and storing metrics +2. **Grafana**: For visualizing metrics (optional but recommended) + +Installing Prometheus +^^^^^^^^^^^^^^^^^^^^^ + +1. **Download and install Prometheus**: + + Visit the `Prometheus download page <https://prometheus.io/download/>`__ for installation instructions. + +2. **Configure Prometheus** to scrape metrics from Mattermost and RTCD: + + Example ``prometheus.yml`` configuration: + + .. code-block:: yaml + + scrape_configs: + - job_name: 'mattermost-calls' + scrape_interval: 15s + metrics_path: '/plugins/com.mattermost.calls/metrics' + static_configs: + - targets: ['mattermost-server:8065'] + + - job_name: 'rtcd' + scrape_interval: 15s + static_configs: + - targets: ['rtcd-server:9090'] + +Installing Grafana +^^^^^^^^^^^^^^^^^^ + +1. **Download and install Grafana**: + + Visit the `Grafana download page <https://grafana.com/grafana/download>`__ for installation instructions. + +2. **Configure Grafana** to use Prometheus as a data source: + + - Add a new data source in Grafana + - Select Prometheus as the type + - Enter the URL of your Prometheus server + - Test and save the configuration + +Enabling Metrics in RTCD +^^^^^^^^^^^^^^^^^^^^^^^^ + +Add the following to your RTCD configuration file: + +.. 
code-block:: json + + { + "metrics": { + "enableProm": true, + "promPort": 9090 + } + } + +Key Metrics to Monitor +-------------------- + +RTCD Metrics +^^^^^^^^^^ + +Process Metrics +"""""""""""""" + +These metrics help monitor the health and resource usage of the RTCD process: + +- ``rtcd_process_cpu_seconds_total``: Total CPU time spent +- ``rtcd_process_open_fds``: Number of open file descriptors +- ``rtcd_process_max_fds``: Maximum number of file descriptors +- ``rtcd_process_resident_memory_bytes``: Memory usage in bytes +- ``rtcd_process_virtual_memory_bytes``: Virtual memory used + +**Interpretation**: + +- High CPU usage (>70%) may indicate the need for additional RTCD instances +- Steadily increasing memory usage might indicate a memory leak +- High number of file descriptors could indicate connection handling issues + +WebRTC Connection Metrics +""""""""""""""""""""""" + +These metrics track the WebRTC connections and media flow: + +- ``rtcd_rtc_conn_states_total{state="X"}``: Count of connections in different states +- ``rtcd_rtc_errors_total{type="X"}``: Count of RTC errors by type +- ``rtcd_rtc_rtp_tracks_total{direction="X"}``: Count of RTP tracks (incoming/outgoing) +- ``rtcd_rtc_sessions_total``: Total number of active RTC sessions + +**Interpretation**: + +- Increasing error counts may indicate connectivity or configuration issues +- Track by state to see if connections are failing to establish or dropping +- Larger track counts require proportionally more CPU and bandwidth + +WebSocket Metrics +""""""""""""""" + +These metrics track the signaling channel: + +- ``rtcd_ws_connections_total``: Total number of active WebSocket connections +- ``rtcd_ws_messages_total{direction="X"}``: Count of WebSocket messages (sent/received) + +**Interpretation**: + +- Connection count should match expected participant numbers +- Unusually high message counts might indicate protocol issues +- Connection drops might indicate network issues + +Calls Plugin Metrics 
+^^^^^^^^^^^^^^^^^^^^ + +Similar metrics are available for the Calls plugin with the following prefixes: + +- Process metrics: ``mattermost_plugin_calls_process_*`` +- WebRTC connection metrics: ``mattermost_plugin_calls_rtc_*`` +- WebSocket metrics: ``mattermost_plugin_calls_websocket_*`` +- Store metrics: ``mattermost_plugin_calls_store_ops_total`` + +Grafana Dashboards +------------------ + +Official Dashboard +^^^^^^^^^^^^^^^^^^ + +Mattermost provides an official Grafana dashboard for monitoring Calls performance: + +1. **Download the dashboard JSON**: + + Get it from `GitHub <https://github.com/mattermost/mattermost-performance-assets/blob/master/grafana/mattermost-calls-performance-monitoring.json>`__ + +2. **Import the dashboard** into Grafana: + + - Navigate to Dashboards > Import + - Upload the JSON file or paste its contents + - Select your Prometheus data source + - Click Import + +3. **Key panels** in the dashboard: + + - Active Calls and Participants + - RTC Connection States + - Media Tracks (In/Out) + - CPU and Memory Usage + - Network Traffic + - Error Counts + +Custom Dashboard Panels +^^^^^^^^^^^^^^^^^^^^^^^ + +Consider adding these custom panels to your dashboard: + +1. **Error Rate Panel**: + + PromQL query: + + .. code-block:: text + + sum(rate(rtcd_rtc_errors_total[5m])) by (type) + +2. **Connection Success Rate**: + + PromQL query: + + .. code-block:: text + + sum(rtcd_rtc_conn_states_total{state="connected"}) / (sum(rtcd_rtc_conn_states_total{state="connected"}) + sum(rtcd_rtc_conn_states_total{state="failed"})) + +3. **Media Track Count by Direction**: + + PromQL query: + + .. code-block:: text + + sum(rtcd_rtc_rtp_tracks_total) by (direction) + +Alerting Recommendations +------------------------ + +Setting up alerts helps you respond quickly to potential issues. Here are recommended alert thresholds: + +1. **High CPU Usage Alert**: + + PromQL query: + + .. 
code-block:: text + + rate(rtcd_process_cpu_seconds_total[5m]) > 0.8 + + This alerts when the RTCD process averages more than 0.8 CPU cores (80% of a single core) over 5 minutes. + +2. **Connection Failure Rate Alert**: + + PromQL query: + + .. code-block:: text + + sum(rate(rtcd_rtc_conn_states_total{state="failed"}[5m])) / sum(rate(rtcd_rtc_conn_states_total[5m])) > 0.1 + + This alerts when more than 10% of connection attempts fail over 5 minutes. + +3. **WebSocket Connection Drop Alert**: + + PromQL query: + + .. code-block:: text + + rate(rtcd_ws_connections_total{state="closed"}[5m]) > 5 + + Because ``rate()`` returns a per-second average, this alerts when more than 5 WebSocket connections close per second over a 5-minute window. Multiply the expression by 60 to express the threshold per minute. + +4. **Memory Leak Detection**: + + PromQL query: + + .. code-block:: text + + delta(rtcd_process_resident_memory_bytes[30m]) > 1024 * 1024 * 10 + + This alerts when resident memory grows by more than 10MB over 30 minutes. ``delta()`` is used because resident memory is a gauge, and ``rate()`` is only meaningful for counters. + +Performance Baselines +--------------------- + +Understanding normal performance patterns helps identify anomalies. Here are baseline expectations based on call volume: + +Small Deployment (1-10 concurrent calls) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +- **CPU Usage**: 5-15% on a modern 4-core server +- **Memory Usage**: 200-500MB +- **Network**: 5-20 Mbps (depending on participant count and unmuted users) + +Medium Deployment (10-50 concurrent calls) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +- **CPU Usage**: 15-40% on a modern 8-core server +- **Memory Usage**: 500MB-1GB +- **Network**: 20-100 Mbps + +Large Deployment (50+ concurrent calls) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +- **CPU Usage**: Consider multiple RTCD instances +- **Memory Usage**: 1-2GB per instance +- **Network**: 100Mbps-1Gbps (with horizontal scaling) + +Below are the detailed benchmarks based on internal performance testing: + ++-------+------------+--------------+----------------+-----------+--------------+--------------------+----------------+ +| Calls | Users/call | Unmuted/call | Screen sharing | CPU (avg) | Memory (avg) | Bandwidth 
(in/out) | Instance (EC2) | ++=======+============+==============+================+===========+==============+====================+================+ +| 100 | 8 | 2 | no | 60% | 0.5GB | 22Mbps / 125Mbps | c6i.xlarge | ++-------+------------+--------------+----------------+-----------+--------------+--------------------+----------------+ +| 100 | 8 | 2 | no | 30% | 0.5GB | 22Mbps / 125Mbps | c6i.2xlarge | ++-------+------------+--------------+----------------+-----------+--------------+--------------------+----------------+ +| 100 | 8 | 2 | yes | 86% | 0.7GB | 280Mbps / 2.2Gbps | c6i.2xlarge | ++-------+------------+--------------+----------------+-----------+--------------+--------------------+----------------+ +| 10 | 50 | 2 | no | 35% | 0.3GB | 5.25Mbps / 86Mbps | c6i.xlarge | ++-------+------------+--------------+----------------+-----------+--------------+--------------------+----------------+ +| 10 | 50 | 2 | no | 16% | 0.3GB | 5.25Mbps / 86Mbps | c6i.2xlarge | ++-------+------------+--------------+----------------+-----------+--------------+--------------------+----------------+ +| 10 | 50 | 2 | yes | 90% | 0.3GB | 32Mbps / 1.33Gbps | c6i.xlarge | ++-------+------------+--------------+----------------+-----------+--------------+--------------------+----------------+ +| 10 | 50 | 2 | yes | 45% | 0.3GB | 32Mbps / 1.33Gbps | c6i.2xlarge | ++-------+------------+--------------+----------------+-----------+--------------+--------------------+----------------+ +| 5 | 200 | 2 | no | 65% | 0.6GB | 8.2Mbps / 180Mbps | c6i.xlarge | ++-------+------------+--------------+----------------+-----------+--------------+--------------------+----------------+ +| 5 | 200 | 2 | no | 30% | 0.6GB | 8.2Mbps / 180Mbps | c6i.2xlarge | ++-------+------------+--------------+----------------+-----------+--------------+--------------------+----------------+ +| 5 | 200 | 2 | yes | 90% | 0.7GB | 31Mbps / 2.2Gbps | c6i.2xlarge | 
++-------+------------+--------------+----------------+-----------+--------------+--------------------+----------------+ + +Metric Retention Recommendations +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +For historical analysis and trend identification: + +- **Short-term metrics**: Keep 15-second resolution data for 2 weeks +- **Medium-term metrics**: Keep 1-minute resolution data for 2 months +- **Long-term metrics**: Keep 5-minute resolution data for 1 year + +Configure Prometheus storage accordingly to balance disk usage with retention needs. \ No newline at end of file diff --git a/source/configure/calls-rtcd-setup.rst b/source/configure/calls-rtcd-setup.rst new file mode 100644 index 00000000000..73450faaabd --- /dev/null +++ b/source/configure/calls-rtcd-setup.rst @@ -0,0 +1,450 @@ +RTCD Setup and Configuration +========================= + +.. include:: ../_static/badges/allplans-cloud-selfhosted.rst + :start-after: :nosearch: + +.. raw:: html + +
+ +Note + +|plans-img-yellow| The rtcd service is available only on `Enterprise `__ plans + +.. |plans-img-yellow| image:: ../_static/images/badges/flag_icon_yellow.svg + :class: mm-badge-flag + +.. raw:: html + +
+ +This guide provides detailed instructions for setting up, configuring, and validating a Mattermost Calls deployment using the dedicated RTCD service. + +- `Why use RTCD <#why-use-rtcd>`__ +- `Prerequisites <#prerequisites>`__ +- `Installation and deployment <#installation-and-deployment>`__ +- `Configuration <#configuration>`__ +- `Validation and testing <#validation-and-testing>`__ +- `Horizontal scaling <#horizontal-scaling>`__ +- `Integration with Mattermost <#integration-with-mattermost>`__ + +Why use RTCD +----------- + +The RTCD service (Real-Time Communication Daemon) is the recommended way to host Mattermost Calls for production environments for the following key reasons: + +1. **Performance isolation**: RTCD runs as a standalone service, isolating the resource-intensive calls traffic from the main Mattermost servers. This prevents call traffic spikes from affecting the rest of your Mattermost deployment. + +2. **Scalability**: When calls traffic increases, additional RTCD instances can be deployed to handle the load, without affecting your Mattermost servers. + +3. **Call stability**: With RTCD, if a Mattermost server needs to be restarted, ongoing calls won't be disrupted. The call audio/video will continue while the Mattermost server restarts (though some features like emoji reactions will be temporarily unavailable). + +4. **Kubernetes support**: For Kubernetes deployments, RTCD is the only officially supported way to run Calls. + +5. **Real-time optimization**: The RTCD service is specifically optimized for real-time audio/video traffic, with configurations prioritizing low latency over throughput. 
+ +Prerequisites +------------ + +Before deploying RTCD, ensure you have: + +- A Mattermost Enterprise license +- A server or VM with sufficient CPU and network capacity (see the `Performance `__ section for sizing guidance) +- Network configuration that allows: + - UDP port 8443 (default) open between clients and RTCD servers + - TCP port 8045 (default) open between Mattermost servers and RTCD servers + - TCP port 8443 (optional backup) between clients and RTCD servers + +Installation and Deployment +-------------------------- + +There are multiple ways to deploy RTCD, depending on your environment: + +Bare Metal or VM Deployment +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +1. Download the latest release from the `RTCD GitHub repository `__ + +2. Create a configuration file (``config.toml``) with the following minimal settings: + + .. code-block:: toml + + [api] + http.listen_address = ":8045" + + [rtc] + ice_address_udp = "" + ice_port_udp = 8443 + ice_host_override = "YOUR_RTCD_SERVER_PUBLIC_IP" + +3. Run the RTCD service: + + .. code-block:: bash + + ./rtcd --config config.toml + +Kubernetes Deployment +^^^^^^^^^^^^^^^^^^^ + +For Kubernetes deployments, use the official Helm chart: + +1. Add the Mattermost Helm repository: + + .. code-block:: bash + + helm repo add mattermost https://helm.mattermost.com + helm repo update + +2. Install the RTCD chart: + + .. code-block:: bash + + helm install mattermost-rtcd mattermost/mattermost-rtcd \ + --set ingress.enabled=true \ + --set ingress.host=rtcd.example.com \ + --set service.annotations."service\\.beta\\.kubernetes\\.io/aws-load-balancer-backend-protocol"=udp \ + --set rtcd.ice.hostOverride=rtcd.example.com + + Refer to the `RTCD Helm chart documentation `__ for additional configuration options. + +Docker Deployment +^^^^^^^^^^^^^^^ + +1. Create a configuration file as described in the Bare Metal section + +2. Run the RTCD container: + + .. 
code-block:: bash + + docker run -d --name rtcd \ + -p 8045:8045 \ + -p 8443:8443/udp \ + -p 8443:8443/tcp \ + -v /path/to/config.toml:/rtcd/config/config.toml \ + mattermost/rtcd:latest + +Configuration +----------- + +RTCD Configuration File +^^^^^^^^^^^^^^^^^^^^^ + +The RTCD service uses a TOML configuration file. Here's a comprehensive example with commonly used settings: + +.. code-block:: toml + + [api] + # The address and port to which the HTTP API server will listen + http.listen_address = ":8045" + # Security settings for authentication + security.allow_self_registration = false + security.enable_admin = true + security.admin_secret_key = "YOUR_API_KEY" + # Configure allowed origins for CORS + security.allowed_origins = ["https://mattermost.example.com"] + + [rtc] + # The UDP address and port for media traffic + ice_address_udp = "" + ice_port_udp = 8443 + # The TCP address and port for fallback connections + ice_address_tcp = "" + ice_port_tcp = 8443 + # Public hostname or IP that clients will use to connect + ice_host_override = "rtcd.example.com" + + [logger] + # Logging configuration + enable_console = true + console_json = false + console_level = "INFO" + enable_file = true + file_json = true + file_level = "DEBUG" + file_location = "rtcd.log" + + [metrics] + # Prometheus metrics configuration + enable_prom = true + prom_port = 9090 + +Key Configuration Options: + +- **api.http.listen_address**: The address and port where the RTCD HTTP API service listens +- **rtc.ice_address_udp**: The UDP address for media traffic (empty means listen on all interfaces) +- **rtc.ice_port_udp**: The UDP port for media traffic +- **rtc.ice_address_tcp**: The TCP address for fallback media traffic +- **rtc.ice_port_tcp**: The TCP port for fallback media traffic +- **rtc.ice_host_override**: The public hostname or IP address clients will use to connect to RTCD +- **api.security.allowed_origins**: List of allowed origins for CORS +- **api.security.admin_secret_key**: API 
key for Mattermost servers to authenticate with RTCD + +STUN/TURN Configuration +^^^^^^^^^^^^^^^^^^^^^ + +For clients behind strict firewalls, you may need to configure STUN/TURN servers. In the RTCD configuration file, reference your STUN/TURN servers as follows: + +.. code-block:: toml + + [rtc] + # STUN/TURN server configuration + ice_servers = [ + { urls = ["stun:stun.example.com:3478"] }, + { urls = ["turn:turn.example.com:3478"], username = "turnuser", credential = "turnpassword" } + ] + +We recommend using `coturn `__ for your TURN server implementation. For setting up and configuring coturn: + +1. Refer to the `official coturn documentation `__ +2. A basic coturn configuration file might look like this: + + .. code-block:: text + + # Basic coturn configuration - customize for your environment + # Refer to official documentation for complete options + + # Listener interface(s) + listening-ip=YOUR_SERVER_IP + listening-port=3478 + + # Relay interface(s) + relay-ip=YOUR_SERVER_IP + min-port=49152 + max-port=65535 + + # Authentication + lt-cred-mech + user=turnuser:turnpassword + + # TLS (recommended for production) + # cert=/path/to/cert.pem + # pkey=/path/to/privkey.pem + + # Logging + verbose + fingerprint + +3. Always test your TURN server connectivity before deploying to production using a tool like `Trickle ICE `__ + +For more advanced scenarios or troubleshooting, consult the official coturn documentation and WebRTC resources. + +System Tuning +^^^^^^^^^^^ + +For high-volume deployments, tune your Linux system: + +1. Add the following to ``/etc/sysctl.conf``: + + .. code-block:: bash + + # Increase UDP buffer sizes + net.core.rmem_max = 16777216 + net.core.wmem_max = 16777216 + net.core.optmem_max = 16777216 + +2. Apply the settings: + + .. code-block:: bash + + sudo sysctl -p + +Validation and Testing +-------------------- + +After deploying RTCD, validate the installation: + +1. **Check service status**: + + .. 
code-block:: bash + + curl http://YOUR_RTCD_SERVER:8045/api/v1/health + # Should return {"status":"ok"} + +2. **Test UDP connectivity**: + + On the RTCD server: + + .. code-block:: bash + + nc -l -u -p 8443 + + On a client machine: + + .. code-block:: bash + + nc -v -u YOUR_RTCD_SERVER 8443 + + Type a message and hit Enter on either side. If messages are received on both ends, UDP connectivity is working. + +3. **Test TCP connectivity** (if enabled): + + Similar to the UDP test, but remove the ``-u`` flag from both commands. + +4. **Monitor metrics**: + + If you've enabled Prometheus metrics, access them at: + + .. code-block:: bash + + curl http://YOUR_RTCD_SERVER:9090/metrics + +Horizontal Scaling +---------------- + +To scale RTCD horizontally: + +1. **Deploy multiple RTCD instances**: + + Deploy multiple RTCD servers, each with their own unique IP address. + +2. **Configure DNS-based load balancing**: + + Set up a DNS record that points to multiple RTCD IP addresses: + + .. code-block:: bash + + rtcd.example.com. IN A 10.0.0.1 + rtcd.example.com. IN A 10.0.0.2 + rtcd.example.com. IN A 10.0.0.3 + +3. **Configure health checks**: + + Set up health checks to automatically remove unhealthy RTCD instances from DNS. + +4. **Configure Mattermost**: + + In the Mattermost System Console, set the **RTCD Service URL** to your DNS name (e.g., ``rtcd.example.com``). + +The Mattermost Calls plugin will distribute calls among the available RTCD hosts. Remember that a single call will always be hosted on one RTCD instance; sessions belonging to the same call are not spread across different instances. + +RTCD Connectivity Diagrams +----------------------- + +Understanding the network connectivity between clients, Mattermost servers, and RTCD services is crucial for proper deployment. The following diagrams illustrate the key communication paths in different deployment scenarios. 
+ +Basic RTCD Deployment +^^^^^^^^^^^^^^^^^^^ + +In this basic deployment model, RTCD handles all media traffic while the Mattermost server manages signaling: + +:: + + +----------------+ +----------------+ +----------------+ + | | 1 | | 2 | | + | Client A |<----->| Mattermost |<----->| RTCD | + | | WS | Server | API | Service | + | | | | | | + +----------------+ +----------------+ +----------------+ + ^ ^ + | | + | Media (RTP) | + | 3 | + +-------------------------------------------------+ + +1. **WebSocket Connection (WS)**: Clients connect to Mattermost server using WebSockets for signaling and call control +2. **API Connection**: Mattermost server communicates with RTCD service for call setup and management +3. **Media (RTP) Connection**: Clients send/receive audio and screen sharing directly with RTCD service + +High Availability RTCD Deployment +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +For high availability, multiple RTCD instances can be deployed with DNS-based load balancing: + +:: + + +----------------+ +----------------+ +----------------+ + | | | | | RTCD #1 | + | Client A | | Mattermost |<----->| | + | | | Server | +----------------+ + +----------------+ | HA | + ^ | | +----------------+ + | | | | RTCD #2 | + +----------------+ | |<----->| | + | | | | +----------------+ + | Client B |<----->| | + | | | | +----------------+ + +----------------+ | | | RTCD #3 | + ^ | |<----->| | + | +----------------+ +----------------+ + | ^ + | | + +-------------------------------------------------+ + Media flows to appropriate + RTCD instance + +In this model: +- Each client connects to Mattermost through the load balancer +- Mattermost distributes calls among available RTCD instances +- A single call is always hosted on one RTCD instance +- If an RTCD instance fails, only calls on that instance are affected + +RTCD with TURN Server +^^^^^^^^^^^^^^^^^^ + +For environments with restrictive firewalls, a TURN server can relay media: + +:: + + +----------------+ 
+----------------+ +----------------+ + | | | | | | + | Client A |<----->| Mattermost |<----->| RTCD | + | (Firewall) | | Server | | Service | + | | | | | | + +----------------+ +----------------+ +----------------+ + ^ ^ + | | + | | + v | + +----------------+ | + | | | + | TURN Server |<---------------------------------------+ + | | Media Relay + +----------------+ + +- Clients behind restrictive firewalls connect to the TURN server +- TURN server relays media between clients and RTCD +- Adds some latency but enables connectivity in challenging network environments + +Detailed Network Protocol Diagram +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +This diagram shows the specific protocols and ports used in a typical RTCD deployment: + +:: + + WebSockets HTTP(S) RTCD API + TCP 80/443 TCP 8045 + +--------------+ +--------------+ +------------------------+ + | | | | | | + | Clients |<--| Mattermost |<--| RTCD | + | | | Server | | | + +--------------+ +--------------+ +------------------------+ + ^ ^ + | | + | | + | Media (RTP/RTCP) | + +------------------------------------------+ + UDP 8443 (preferred) + TCP 8443 (fallback) + +Integration with Mattermost +------------------------- + +Once RTCD is properly set up and validated, configure Mattermost to use it: + +1. Go to **System Console > Plugins > Calls** + +2. Enable the **Enable RTCD Service** option + +3. Set the **RTCD Service URL** to your RTCD service address (either a single server or DNS load-balanced hostname) + +4. If configured, enter the **RTCD API Key** that matches the one in your RTCD configuration + +5. Save the configuration + +6. Test by creating a new call in any Mattermost channel + +7. Verify that the call is being routed through RTCD by checking the RTCD logs and metrics + +For detailed Mattermost Calls configuration options, see the `Calls Plugin Configuration Settings `__ documentation. 
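Before enabling the RTCD service in the System Console, it can help to confirm that the Mattermost host can actually reach the RTCD API. The following is a minimal Python sketch of such a check. It assumes the health endpoint and response body shown in the validation section of this guide (``/api/v1/health`` returning ``{"status":"ok"}``). The function names ``is_healthy`` and ``check_rtcd`` are illustrative helpers, not part of RTCD or Mattermost.

```python
import json
import urllib.error
import urllib.request


def is_healthy(body: str) -> bool:
    """Return True if a health response body reports status "ok"."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def check_rtcd(base_url: str, timeout: float = 5.0) -> bool:
    """Query the health endpoint assumed by this guide and report the result.

    Any network error, non-2xx response, or malformed body counts as unhealthy.
    """
    url = base_url.rstrip("/") + "/api/v1/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8")
            return resp.status == 200 and is_healthy(body)
    except (urllib.error.URLError, OSError, ValueError):
        return False
```

Run from a Mattermost server, a call such as ``check_rtcd("http://rtcd.example.com:8045")`` (a placeholder address) returns ``True`` only when the API is reachable and reports a healthy status, which is a reasonable precondition for step 3 above.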
\ No newline at end of file diff --git a/source/configure/calls-troubleshooting.rst b/source/configure/calls-troubleshooting.rst new file mode 100644 index 00000000000..d824cdfbca8 --- /dev/null +++ b/source/configure/calls-troubleshooting.rst @@ -0,0 +1,440 @@ +Troubleshooting Mattermost Calls +================================ + +.. include:: ../_static/badges/allplans-cloud-selfhosted.rst + :start-after: :nosearch: + +This guide provides comprehensive troubleshooting steps for Mattermost Calls, particularly focusing on the dedicated RTCD deployment model. Follow these steps to identify and resolve common issues. + +- `Common issues <#common-issues>`__ +- `Connectivity troubleshooting <#connectivity-troubleshooting>`__ +- `Log analysis <#log-analysis>`__ +- `Performance issues <#performance-issues>`__ +- `Debugging tools <#debugging-tools>`__ +- `Advanced diagnostics <#advanced-diagnostics>`__ + +Common Issues +------------- + +Calls Not Connecting +^^^^^^^^^^^^^^^^^^^^ + +**Symptoms**: Users can start calls but cannot connect, or calls connect but drop quickly. + +**Possible causes and solutions**: + +1. **Network connectivity issues**: + - Verify that UDP port 8443 (or your configured port) is open between clients and RTCD servers + - Ensure TCP port 8045 is open between Mattermost and RTCD servers + - Check that any load balancers are properly configured for UDP traffic + +2. **ICE configuration issues**: + - Verify the ``ice_host_override`` setting in the RTCD configuration file matches the publicly accessible hostname or IP + - Ensure STUN/TURN servers are properly configured if needed + +3. **API connectivity**: + - Verify that Mattermost servers can reach the RTCD API endpoint + - Check that the API key is correctly configured in both Mattermost and RTCD + +4. 
**Plugin configuration**: + - Ensure the Calls plugin is enabled and properly configured + - Verify the RTCD service URL is correct in the System Console + +Audio Issues +^^^^^^^^^^^ + +**Symptoms**: Users can connect to calls, but audio is one-way, choppy, or not working. + +**Possible causes and solutions**: + +1. **Client permissions**: + - Ensure browser/app has microphone permissions + - Check if users are using multiple audio devices that might interfere + +2. **Network quality**: + - High latency or packet loss can cause audio issues + - Try testing with TCP fallback enabled (requires RTCD v0.11+ and Calls v0.17+) + +3. **Audio device configuration**: + - Users should verify their audio input/output settings + - Try different browsers or the desktop app + +Call Quality Issues +^^^^^^^^^^^^^^^^^ + +**Symptoms**: Calls connect but quality is poor, with latency, echo, or distortion. + +**Possible causes and solutions**: + +1. **Server resources**: + - Check CPU usage on RTCD servers - high CPU can cause quality issues + - Refer to the `Performance Monitoring setup guide <../performance-monitoring/setup-guide.rst>`__ for detailed instructions on monitoring and optimizing performance + - Monitor network bandwidth usage + +2. **Network congestion**: + - Check for packet loss between clients and RTCD + - Consider network QoS settings to prioritize real-time traffic + +3. **Client-side issues**: + - Browser or app limitations + - Hardware limitations (CPU, memory) + - Network congestion at the user's location + +Connectivity Troubleshooting +-------------------------- + +Basic Connectivity Tests +^^^^^^^^^^^^^^^^^^^^^^ + +1. **HTTP API connectivity test**: + + Test if the RTCD API is reachable: + + .. code-block:: bash + + curl http://YOUR_RTCD_SERVER:8045/api/v1/health + # Expected response: {"status":"ok"} + +2. **UDP connectivity test**: + + On the RTCD server: + + .. code-block:: bash + + nc -l -u -p 8443 + + On a client machine: + + .. 
code-block:: bash + + nc -v -u YOUR_RTCD_SERVER 8443 + + Type a message and press Enter. If you see the message on both sides, UDP connectivity is working. + +3. **TCP fallback connectivity test**: + + Same as the UDP test, but without the ``-u`` flag: + + On the RTCD server: + + .. code-block:: bash + + nc -l -p 8443 + + On a client machine: + + .. code-block:: bash + + nc -v YOUR_RTCD_SERVER 8443 + +Network Packet Analysis +^^^^^^^^^^^^^^^^^^^^^ + +To capture and analyze network traffic: + +1. **Capture UDP traffic on the RTCD server**: + + .. code-block:: bash + + sudo tcpdump -n 'udp port 8443' -i any + +2. **Capture TCP API traffic**: + + .. code-block:: bash + + sudo tcpdump -n 'tcp port 8045' -i any + +3. **Analyze traffic patterns**: + + - Verify packets are flowing both ways + - Look for ICMP errors that might indicate firewall issues + - Check for patterns of packet loss + +4. **Use Wireshark for deeper analysis**: + + For more detailed packet inspection, capture traffic with tcpdump and analyze with Wireshark: + + .. code-block:: bash + + sudo tcpdump -n -w calls_traffic.pcap 'port 8443' + + Then analyze the ``calls_traffic.pcap`` file with Wireshark. + +Firewall Configuration Checks +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +1. **Check iptables rules** (Linux): + + .. code-block:: bash + + sudo iptables -L -n + + Ensure there are no rules blocking UDP port 8443 or TCP ports 8045/8443. + +2. **Check cloud provider security groups**: + + Verify that security groups or network ACLs allow: + - Inbound UDP on port 8443 from client networks + - Inbound TCP on port 8045 from Mattermost server networks + - Inbound TCP on port 8443 (if TCP fallback is enabled) + +3. **Check intermediate firewalls**: + + - Corporate firewalls might block UDP traffic + - Some networks might require TURN servers for traversal + +Log Analysis +---------- + +RTCD Logs +^^^^^^^^ + +The RTCD service logs important events and errors. Set the log level to "debug" for troubleshooting: + +1. 
**In the configuration file**: + + .. code-block:: json + + { + "log": { + "level": "debug", + "json": true + } + } + +2. **Common log patterns to look for**: + + - **Connection errors**: Look for "failed to connect" or "connection error" messages + - **ICE negotiation failures**: Look for "ICE failed" or "ICE timeout" messages + - **API authentication issues**: Look for "unauthorized" or "invalid API key" messages + +Mattermost Logs +^^^^^^^^^^^^^ + +Check the Mattermost server logs for Calls plugin related issues: + +1. **Enable debug logging** in System Console > Environment > Logging > File Log Level + +2. **Filter for Calls-related logs**: + + .. code-block:: bash + + grep -i "calls" /path/to/mattermost.log + +3. **Look for common patterns**: + + - Connection errors to RTCD + - Plugin initialization issues + - WebSocket connection problems + +Browser Console Logs +^^^^^^^^^^^^^^^^^ + +Instruct users to check their browser console logs: + +1. **In Chrome/Edge**: + - Press F12 to open Developer Tools + - Go to the Console tab + - Look for errors related to WebRTC, Calls, or media permissions + +2. **Specific patterns to look for**: + + - "getUserMedia" errors (microphone permission issues) + - "ICE connection" failures + - WebSocket connection errors + +Performance Issues +--------------- + +Diagnosing High CPU Usage +^^^^^^^^^^^^^^^^^^^^^^^ + +If RTCD servers show high CPU usage: + +1. **Check concurrent calls and participants**: + + - Access the Prometheus metrics endpoint to see active sessions + - Compare with the benchmark data in the documentation + +2. **Profile CPU usage** (Linux): + + .. code-block:: bash + + top -p $(pgrep rtcd) + + Or for detailed per-thread usage: + + .. code-block:: bash + + ps -eLo pid,ppid,tid,pcpu,comm | grep rtcd + +3. **Enable pprof profiling** (if needed): + + Add to your RTCD configuration: + + .. code-block:: json + + { + "debug": { + "pprof": true, + "pprofPort": 6060 + } + } + + Then capture a CPU profile: + + .. 
code-block:: bash + + curl http://localhost:6060/debug/pprof/profile > cpu.profile + + Analyze with: + + .. code-block:: bash + + go tool pprof -http=:8080 cpu.profile + +Diagnosing Network Bottlenecks +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +If you suspect network bandwidth issues: + +1. **Monitor network utilization**: + + .. code-block:: bash + + iftop -n + +2. **Check for packet drops**: + + .. code-block:: bash + + netstat -su | grep -E 'drop|error' + +3. **Verify system network buffers**: + + .. code-block:: bash + + sysctl -a | grep net.core.rmem + sysctl -a | grep net.core.wmem + sysctl -a | grep net.core.optmem + + Ensure the ``_max`` values match the recommended settings: + + .. code-block:: bash + + net.core.rmem_max = 16777216 + net.core.wmem_max = 16777216 + net.core.optmem_max = 16777216 + +Debugging Tools +--------------- + +WebRTC Internals (Chrome) +^^^^^^^^^^^^^^^^^^^^^^^^^ + +For in-depth WebRTC diagnostics in Chrome: + +1. **Access chrome://webrtc-internals** in a new browser tab while on a call + +2. **Examine the connection details**: + + - ICE connection state + - Selected candidate pairs + - DTLS/SRTP setup + - Bandwidth estimation + +3. **Look for specific issues**: + + - Candidate gathering delays + - Failed ICE connections + - Bandwidth limitations + +Prometheus Metrics Analysis +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Use Prometheus metrics for real-time and historical performance data: + +1. **Key metrics to monitor**: + + - ``rtcd_rtc_sessions_total``: Number of active RTC sessions + - ``rtcd_rtc_conn_states_total``: Connection state transitions + - ``rtcd_rtc_errors_total``: Error counts + - ``rtcd_rtc_rtp_tracks_total``: Media track count + - ``rtcd_process_cpu_seconds_total``: CPU usage + +2. **Set up Grafana dashboards**: + + Import the official `Mattermost Calls dashboard <https://github.com/mattermost/mattermost-performance-assets/blob/master/grafana/mattermost-calls-performance-monitoring.json>`__ into Grafana for visualization.
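For a quick look at the key metrics listed above without a dashboard, the raw Prometheus text output can be filtered from the command line. This is a minimal sketch under stated assumptions: ``filter_rtcd_metrics`` is a hypothetical helper name, and the example URL assumes RTCD serves metrics at ``/metrics`` on the API port (8045), which may differ in your deployment.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read Prometheus text-format metrics on stdin and
# keep only the sample lines for the key RTCD metrics listed above.
# Comment lines (# HELP / # TYPE) and unrelated metrics are dropped.
filter_rtcd_metrics() {
  grep -E '^(rtcd_rtc_sessions_total|rtcd_rtc_conn_states_total|rtcd_rtc_errors_total|rtcd_rtc_rtp_tracks_total|rtcd_process_cpu_seconds_total)([{ ]|$)'
}

# Example usage (assumed endpoint; adjust host, port, and path as needed):
# curl -s http://YOUR_RTCD_SERVER:8045/metrics | filter_rtcd_metrics
```

Filtering on the metric name followed by ``{``, a space, or end of line avoids matching other metrics that merely share a prefix.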
+ +Advanced Diagnostics +-------------------- + +WebRTC Diagnostic Commands +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +For detailed WebRTC diagnostics: + +1. **Test STUN server connectivity**: + + .. code-block:: bash + + # Using stun-client (you may need to install it) + stun-client stun.global.calls.mattermost.com + + This should return your public IP address if STUN is working correctly. + +2. **Verify TURN server**: + + .. code-block:: bash + + # Using turnutils_uclient (part of coturn); -u is the username, -w the password + turnutils_uclient -v -u username -w password your-turn-server + + This tests if your TURN server is correctly configured. + +3. **Test end-to-end latency**: + + Between client locations and RTCD server: + + .. code-block:: bash + + ping -c 10 your-rtcd-server + + Look for consistent, low latency (ideally under 100 ms for voice calls). + +Client-Side Testing Tools +^^^^^^^^^^^^^^^^^^^^^^^^^ + +Tools to help diagnose client-side issues: + +1. **WebRTC Troubleshooter**: + + Direct users to `WebRTC Troubleshooter <https://test.webrtc.org/>`__ for browser capability testing. + +2. **Network Quality Tests**: + + Use `Speedtest <https://www.speedtest.net/>`__ or similar to check internet connection quality. + +3. **Browser-Specific WebRTC Info**: + + - Chrome: chrome://webrtc-internals + - Firefox: about:webrtc + +When to Contact Support +^^^^^^^^^^^^^^^^^^^^^^^ + +Consider contacting Mattermost Support when: + +1. You've tried basic troubleshooting steps without resolution +2. You're experiencing persistent connection failures across multiple clients +3. You notice unexpected or degraded performance despite proper configuration +4. You need help interpreting diagnostic information +5.
You suspect a bug in the Calls plugin or RTCD service + +When contacting support, please include: + +- RTCD version and configuration (with sensitive information redacted) +- Mattermost server version +- Calls plugin version +- Client environments (browsers, OS versions) +- Relevant logs and diagnostic information +- Detailed description of the issue and steps to reproduce \ No newline at end of file diff --git a/source/scale/elasticsearch.rst b/source/scale/elasticsearch.rst index 72ffe8f3d5f..176ad2d7914 100644 --- a/source/scale/elasticsearch.rst +++ b/source/scale/elasticsearch.rst @@ -217,7 +217,7 @@ The following JSON provides an example of a "least privilege" permission set tha "index_permissions": [ { "index_patterns": [ - "t-70907*" + "\*" ], "allowed_actions": [ "indices:admin/get", @@ -245,7 +245,7 @@ A simpler, more flexible, and resilient variant of the above would be: "index_permissions": [ { "index_patterns": [ - "t-70907*" + "\*" ], "allowed_actions": [ "indices:*"