Tuning PostgreSQL for a Matrix Synapse Homeserver

7. Disk Space

  1. Database Size
    1. Synapse Compress State Utility
  2. Media Size

Efficient disk space management ensures that your server remains responsive and that you're making the most of your available resources. This is difficult to cover in detail, as the applications and usage of a Matrix server vary wildly, but I've included some general guidance below:

Database Size

Over time, your PostgreSQL database will grow as more data is added. It's important to keep an eye on the size of your tables, especially those that are known to grow rapidly, such as state_groups_state in Synapse.

This query will list your largest tables:

WITH table_sizes AS (
    SELECT table_schema,
           table_name, 
           pg_total_relation_size('"' || table_schema || '"."' || table_name || '"') AS size
    FROM information_schema.tables
    WHERE table_schema NOT IN ('pg_catalog', 'information_schema')
    ORDER BY size DESC
)
SELECT table_schema AS schema,
       table_name AS table,
       pg_size_pretty(size) AS "size"
FROM table_sizes
LIMIT 10;

 schema |            table             |  size
--------+------------------------------+--------
 public | state_groups_state           | 29 GB
 public | event_json                   | 818 MB
...

On a Synapse server, you should find state_groups_state is by far the largest one, and can see which rooms are the largest with a query like this:

WITH room_counts AS (
    SELECT room_id,
           COUNT(*),
           COUNT(*) * 1.0 / SUM(COUNT(*)) OVER () AS ratio
    FROM state_groups_state
    GROUP BY room_id
), table_sizes AS (
    SELECT table_schema,
           table_name, 
           pg_total_relation_size('"' || table_schema || '"."' || table_name || '"') AS size
    FROM information_schema.tables
    WHERE table_name = 'state_groups_state'
)
SELECT rc.room_id AS room_id,
       rc.count AS state_entries,
       ROUND(rc.ratio * 100, 2) AS percentage,
       pg_size_pretty(ts.size * rc.ratio) AS estimated_size
FROM room_counts rc, table_sizes ts
ORDER BY rc.count DESC
LIMIT 10;

            room_id             | state_entries | percentage | estimated_size
--------------------------------+---------------+------------+----------------
 !OGEhHVWSdvArJzumhm:matrix.org |     125012687 |      91.75 | 26 GB
 !ehXvUhWNASUkSLvAGP:matrix.org |      10003431 |       7.34 | 2152 MB
...

Synapse Compress State Utility

For Synapse, the state_groups_state table can grow significantly. To help manage this, The Matrix Foundation has developed a tool called Synapse Compress State that can compress state maps without losing any data.

Media Size

Media files, such as images and videos and other message attachments, are stored on the filesystem rather than the database, but are tracked in PostgreSQL. Large media files can consume significant disk space, and it can be a challenge to narrow down what is using all of the space through Synapse directly.

With this query you can see how many files of each type were uploaded each month, and the total disk space that consumes:

WITH media_size AS (
    SELECT EXTRACT(YEAR FROM to_timestamp(created_ts / 1000)) AS year,
        EXTRACT(MONTH FROM to_timestamp(created_ts / 1000)) AS month,
        media_type AS mime_type,
        COUNT(*) AS files,
        SUM(media_length) AS total_bytes
    FROM local_media_repository
    GROUP BY media_type, year, month
    ORDER BY total_bytes DESC
)
SELECT year,
    month,
    mime_type,
    files,
    pg_size_pretty(total_bytes) AS total_size
FROM  media_size
LIMIT 10;

 year | month | mime_type  | files | total_size
------+-------+------------+-------+------------
 2023 |     9 | video/mp4  |   464 | 2004 MB
 2023 |     9 | image/png  |   592 | 1648 MB
 2023 |    10 | video/mp4  |   308 | 1530 MB
 2023 |     8 | image/png  |  2614 | 1316 MB
 ...

Note: For tips on how to have Synapse automatically reduce the size of your media files, see Extra Tips in the Synapse Deployment Guide.