
matrix-synapse-diskspace-janitor

scruffy the janitor from futurama

toilets and boilers, boilers and toilets

Matrix-synapse (a matrix homeserver implementation) needs a postgres database server for any serious deployment. It stores a lot of stuff in this postgres database: information about all the rooms that users on the server have joined, etc.

The problem at hand:

Matrix-synapse stores a lot of data that it has no way of cleaning up or deleting.

Specifically, there is a table it creates in the database called state_groups_state:

root@matrix:~# sudo -u postgres pg_dump synapse -t state_groups_state --schema-only
--
-- PostgreSQL database dump
--
...

CREATE TABLE public.state_groups_state (
    state_group bigint NOT NULL,
    room_id text NOT NULL,
    type text NOT NULL,
    state_key text NOT NULL,
    event_id text NOT NULL
);

I don't understand what this table is for. However, I can recognize fairly easily that it accounts for the vast majority of the disk space bloat of a matrix-synapse instance:

top 10 tables by disk space used, cyberia.club instance:

a pie chart showing state_groups_state using 87% of the disk space
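For anyone who wants to reproduce a ranking like this on their own instance, here is a minimal sketch. The query uses the standard postgres `pg_total_relation_size` function and `pg_statio_user_tables` view. It is not necessarily how the chart above was produced. The `TableSize` type and `percentOfTotal` helper are hypothetical names made up for this example, and actually running the query from Go would need a postgres driver, which is left out here.

```go
package main

import "fmt"

// topTablesQuery lists the ten largest tables in the current database.
// pg_total_relation_size counts the table plus its indexes and TOAST data.
const topTablesQuery = `
SELECT relname, pg_total_relation_size(relid) AS total_bytes
FROM pg_catalog.pg_statio_user_tables
ORDER BY total_bytes DESC
LIMIT 10`

// TableSize is a hypothetical row type for one result row of topTablesQuery.
type TableSize struct {
	Name  string
	Bytes int64
}

// percentOfTotal returns the share (0 to 100) of the summed bytes that the
// named table uses. It returns 0 if the table is missing or the total is zero.
func percentOfTotal(tables []TableSize, name string) float64 {
	var total, match int64
	for _, t := range tables {
		total += t.Bytes
		if t.Name == name {
			match += t.Bytes
		}
	}
	if total == 0 {
		return 0
	}
	return float64(match) * 100 / float64(total)
}

func main() {
	// Made-up sizes for illustration only, not measurements from a real server.
	tables := []TableSize{
		{Name: "state_groups_state", Bytes: 75},
		{Name: "event_json", Bytes: 15},
		{Name: "events", Bytes: 10},
	}
	fmt.Println(topTablesQuery)
	fmt.Printf("state_groups_state: %.0f%%\n", percentOfTotal(tables, "state_groups_state"))
}
```

Note that the percentage here is relative to the tables passed in, so with a `LIMIT 10` result it is the share of the top 10 tables, not of the whole database.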

So, I think it's safe to say that if we can cut down the size of state_groups_state, then we can solve our disk space issues.

I know that there are other projects dedicated to this, like https://github.com/matrix-org/rust-synapse-compress-state

However, a cursory examination of the data in state_groups_state led me to believe there may be an easier and better way.

state_groups_state DOES have a room_id column on it. It's not indexed by room_id, but we can still count the # of rows for each room and rank them:

top 100 rooms by number of state_groups_state rows, cyberia.club instance:

a pie chart with two slices taking up about 2 thirds of the pie, and the remaining third taken up mostly by the next 8 slices
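The per-room count can be sketched like this. The query is a plain `GROUP BY` over the `room_id` column from the schema above. Since there is no index on `room_id`, expect it to scan the whole table and take a while on a big instance. The `roomsCoveringPercent` helper is a hypothetical name made up for this example: it answers "how many of the biggest rooms does it take to account for N% of the rows?"

```go
package main

import (
	"fmt"
	"sort"
)

// rowsPerRoomQuery ranks rooms by how many state_groups_state rows they own.
const rowsPerRoomQuery = `
SELECT room_id, count(*) AS row_count
FROM state_groups_state
GROUP BY room_id
ORDER BY row_count DESC
LIMIT 100`

// roomsCoveringPercent returns the smallest number of rooms, taken from the
// largest down, whose combined row count reaches the given percent (0 to 100)
// of all rows in rowCounts. The input slice is not modified.
func roomsCoveringPercent(rowCounts []int64, percent int64) int {
	sorted := append([]int64(nil), rowCounts...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })

	var total int64
	for _, c := range sorted {
		total += c
	}

	var running int64
	for i, c := range sorted {
		running += c
		// Integer comparison avoids floating point rounding at the threshold.
		if running*100 >= total*percent {
			return i + 1
		}
	}
	return len(sorted)
}

func main() {
	// Made-up row counts for illustration only, not data from a real server.
	counts := []int64{5, 50, 10, 30, 5}
	fmt.Println(rowsPerRoomQuery)
	fmt.Println("rooms needed for 80% of rows:", roomsCoveringPercent(counts, 80))
	fmt.Println("rooms needed for 90% of rows:", roomsCoveringPercent(counts, 90))
}
```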

In summary, it looks like

about 90% of the disk space used by matrix-synapse is in state_groups_state, and about 90% of the rows in state_groups_state come from just a handful of rooms.

So from this information we have hatched a plan:

Just delete those rooms from our homeserver 4head

Unfortunately, the matrix-synapse delete room API does not remove anything from state_groups_state.

This is similar to the way that the matrix-synapse message retention policies also do not remove anything from state_groups_state.

In fact, this probably helps explain why state_groups_state gets hundreds of millions of rows and takes up so much disk space: nothing ever deletes from it!!