Commit graph

42 commits

Author SHA1 Message Date
ppom
3a61db9e6f
plugin: shutdown: add a function that permits graceful shutdown on signal
Handling SIGTERM (and similar signals) permits a graceful shutdown, including resource cleanup.

Added to ipset and cluster.
2026-02-12 12:00:00 +01:00
ppom
ae28cfbb31
cluster: adapt to plugin interface change 2026-02-09 12:00:00 +01:00
ppom
05c6c1fbce
Fix tests
I initially wrote these tests to use a test secret key file in the same directory.
It is better to have them write their own secret key file in their own directory
than to keep a dangling test file in the source tree and depend on the directory the tests are run from.
2026-01-19 12:00:00 +01:00
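The following is a minimal sketch of the approach this commit describes, built on a few assumptions. Each test creates its own directory under the system temp dir and writes its key file there, so no test depends on the working directory. The helper names `test_dir` and `write_test_key`, the file name `secret_key`, and the dummy 32-byte contents are all hypothetical. The real tests presumably write an actual iroh SecretKey instead.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process;
use std::sync::atomic::{AtomicUsize, Ordering};

// Per-process counter so that several tests in the same run get distinct directories.
static NEXT_ID: AtomicUsize = AtomicUsize::new(0);

/// Create a fresh directory that belongs to a single test (hypothetical helper).
fn test_dir(name: &str) -> io::Result<PathBuf> {
    let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
    let dir = std::env::temp_dir().join(format!("{}-{}-{}", name, process::id(), id));
    // Remove leftovers from an earlier run that happened to reuse the same PID.
    let _ = fs::remove_dir_all(&dir);
    fs::create_dir_all(&dir)?;
    Ok(dir)
}

/// Write a placeholder secret key file into `dir` and return its path.
/// The 32 bytes are dummy data, not a real iroh SecretKey.
fn write_test_key(dir: &Path) -> io::Result<PathBuf> {
    let path = dir.join("secret_key");
    fs::write(&path, [0x42u8; 32])?;
    Ok(path)
}
```

The PID plus the counter keeps directories distinct even when tests run as parallel threads inside one test binary.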
ppom
615d721c9a
cluster: Upgrade iroh to 0.95.1 2026-01-19 12:00:00 +01:00
ppom
19ee5688a7
Testing with clusters of up to 15 nodes. Fails at ~6 to 9 nodes.
Still a "connection lost" issue.
It happens intermittently.
Nodes tend to ignore incoming connections because their id is too small.
I should debug why this is the case.
Nodes sometimes manage to re-establish connections,
but they should not lose connections on localhost like that...
2026-01-19 12:00:00 +01:00
ppom
fb6f54d84f
Disable the test where one plugin is in multiple nodes of the same cluster. Tests pass! 2026-01-19 12:00:00 +01:00
ppom
4fce6ecaf5
No long-lived task to try to connect to a node: use a one-shot task instead. Add randomness to the retry interval. 2026-01-19 12:00:00 +01:00
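One possible reading of this change, sketched with the standard library only: each connection attempt is a short task rather than a long-lived loop. The task sleeps for a base interval plus a random extra delay, tries once, and exits. To stay dependency-free, the randomness comes from std's `RandomState`; a real implementation would more likely use the `rand` crate. OS threads also stand in for the tokio tasks the project actually uses. `jittered_interval`, `spawn_one_shot` and the `try_connect` closure are hypothetical names.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Return `base` plus a pseudo-random extra delay in [0, max_jitter).
/// Each RandomState::new() gets different keys, so the hash output varies between calls.
fn jittered_interval(base: Duration, max_jitter: Duration) -> Duration {
    let max_ms = max_jitter.as_millis() as u64;
    if max_ms == 0 {
        return base;
    }
    let mut hasher = RandomState::new().build_hasher();
    hasher.write_u64(0); // fixed input; the randomly keyed hasher supplies the variation
    let extra_ms = hasher.finish() % max_ms;
    base + Duration::from_millis(extra_ms)
}

/// Spawn a one-shot attempt: wait once, try once, return the result, then exit.
/// `try_connect` is a stand-in for the real connection logic.
fn spawn_one_shot<F>(delay: Duration, try_connect: F) -> JoinHandle<bool>
where
    F: FnOnce() -> bool + Send + 'static,
{
    thread::spawn(move || {
        thread::sleep(delay);
        try_connect()
    })
}
```

The jitter likely serves to stop nodes that failed at the same moment from retrying in lockstep. Retrying in lockstep would recreate the same simultaneous-connection collisions.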
ppom
5bfcf318c7
Tests on a cluster of 2 nodes 2026-01-19 12:00:00 +01:00
ppom
7ede2fa79c
cluster: Fix use of stream timestamp in action 2026-01-19 12:00:00 +01:00
ppom
1e082086e5
cluster: add tests
- on configuration
- on sending messages to its own cluster
2026-01-19 12:00:00 +01:00
ppom
8b3bde456e
cluster: UTC: no need for conversions, as Time is already UTC-aware 2025-12-14 12:00:00 +01:00
ppom
2095009fa9
cluster: use treedb for message queue persistence 2025-12-15 12:00:00 +01:00
ppom
c595552504
plugin: Remove action oneshot response 2025-12-07 12:00:00 +01:00
ppom
fbf8c24e31
cluster: try_connect opens the channels and performs the handshake itself
This fixes a deadlock where each node is initiating a connection
and therefore unable to accept an incoming connection.

connection_rx can now be either a raw connection or an initialized connection.
Cluster startup has been refactored to account for this, and
ConnectionManager now creates this channel itself.
2025-12-08 12:00:00 +01:00
ppom
da257966d9
Fix connection timeout 🎉
I misread a millisecond argument as seconds, so the timeout was 2 ms
and the keep-alive 200 ms. What could go wrong?

I also pass this TransportConfig to connect. Otherwise the
default config is used, not the Endpoint's own config.
https://github.com/n0-computer/iroh/issues/2872
2025-12-07 12:00:00 +01:00
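The failure mode described in this commit can be made concrete with a hypothetical settings type. The type stores `Duration` values instead of bare integers, so the unit is fixed at the point where each value is built. With a 2 ms idle timeout and a 200 ms keep-alive, the connection idles out long before the first keep-alive could be sent. `ConnectionTimeouts` and its methods are illustrative only. They are not iroh's or quinn's `TransportConfig` API.

```rust
use std::time::Duration;

// Hypothetical settings: Duration fields carry their unit with the value.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ConnectionTimeouts {
    idle_timeout: Duration,
    keep_alive: Duration,
}

impl ConnectionTimeouts {
    /// Build from integers that are explicitly documented as milliseconds.
    fn from_millis(idle_ms: u64, keep_alive_ms: u64) -> Self {
        ConnectionTimeouts {
            idle_timeout: Duration::from_millis(idle_ms),
            keep_alive: Duration::from_millis(keep_alive_ms),
        }
    }

    /// Keep-alives only help if they fire before the idle timeout expires.
    fn keep_alive_is_effective(&self) -> bool {
        self.keep_alive < self.idle_timeout
    }
}
```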
ppom
b14f781528
cluster: use reaction_plugin's PatternLine 2025-12-08 12:00:00 +01:00
ppom
79d85c1df1
Reduce usage of chrono
TODO: handle migrations
2025-12-07 12:00:00 +01:00
ppom
1c423c5258
Fix panic caused by previous commit
Connections still close as soon as they go idle :/
2025-12-07 12:00:00 +01:00
ppom
b667b1a373
Get rid of remoc for peer communications
I couldn't understand why all communications timed out, with a remoc
RecvError::ChMux "multiplexer terminated", as soon as all messages were sent.

So I'm getting rid of remoc (for now at least) and sending/receiving
raw data over the stream.

For now it panics after the handshake completes, which is already good
after only one test O:D
2025-12-07 12:00:00 +01:00
ppom
83ac520d27
Connections have ids, to fix races between simultaneous connections 2025-12-07 12:00:00 +01:00
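The log does not say how the ids are used. A common way to settle simultaneous connections, sketched here as an assumption, is a deterministic tie-break. When two nodes dial each other at once, both keep only the connection opened by the node with the greater id. Both sides then agree on which connection survives without exchanging extra messages. `Direction` and `keep_connection` are hypothetical. The ids can be any ordered type, such as integers or 32-byte public keys compared byte by byte.

```rust
/// Which side opened a connection.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Direction {
    Outgoing,
    Incoming,
}

/// Hypothetical tie-break: keep only the connection opened by the node with the greater id.
/// Both nodes apply the same rule to the same pair of ids, so they agree on the survivor.
fn keep_connection<T: Ord>(local_id: T, remote_id: T, direction: Direction) -> bool {
    match direction {
        // We opened it: keep it only if we are the greater node.
        Direction::Outgoing => local_id > remote_id,
        // The peer opened it: keep it only if the peer is the greater node.
        Direction::Incoming => remote_id > local_id,
    }
}
```

This kind of rule has a caveat. The node with the smaller id drops its own outgoing attempt and waits. If the greater node's dial then fails and it does not retry, the pair stays disconnected.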
ppom
3ed2ebd488
Two nodes succeeded in exchanging messages 🎉
Moved try_connect to a separate task, to prevent the two sides from blocking each other

Send a byte on the new stream so that the other side can see the stream
and accept it.
2025-12-07 12:00:00 +01:00
ppom
ff5200b0a0
cluster: add many DEBUG messages, and a Show trait to ease logging 2025-12-07 12:00:00 +01:00
ppom
a5d31f6c1a
cluster: First round of fixes and tests after first run
Still not working!
2025-12-07 12:00:00 +01:00
ppom
43fdd3a877
cluster: finish first draft
finish ConnectionManager main loop
handle local & remote messages, maintain local queue
2025-12-07 12:00:00 +01:00
ppom
0635bae544
cluster: created ConnectionManager
Reorganized code.
Moved some functionality from EndpointManager to ConnectionManager.
Still a lot to do there, but little in the rest of the code.
2025-12-07 12:00:00 +01:00
ppom
552b311ac4
Move shutdown module to reaction-plugin and use in cluster 2025-12-07 12:00:00 +01:00
ppom
71d26766f8
plugin: Stream plugins now pass time information along their lines
This will allow the cluster to accurately receive information that is older
than the present moment, and will allow potential log plugins (journald?) to go
back in time at startup.
2025-12-07 12:00:00 +01:00
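To show why carrying the time matters, here is a hypothetical line type with a windowed count. The count uses each line's own timestamp rather than its arrival time. `TimedLine`, `count_matches_in_window`, and the substring test (standing in for real pattern matching) are assumptions, not the plugin interface.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical shape of a line produced by a stream plugin: the text plus
/// the time the event happened, rather than the time it reached reaction.
#[derive(Debug, Clone)]
struct TimedLine {
    line: String,
    time: SystemTime,
}

/// Count lines containing `needle` whose own timestamp lies within `window` before `now`.
/// A line that arrives late (for example, relayed by a cluster peer) is counted
/// or excluded according to when it actually happened.
fn count_matches_in_window(
    lines: &[TimedLine],
    needle: &str,
    now: SystemTime,
    window: Duration,
) -> usize {
    lines
        .iter()
        .filter(|l| l.line.contains(needle))
        .filter(|l| match now.duration_since(l.time) {
            Ok(age) => age <= window,
            // Timestamp later than `now`: ignored here, as a policy choice.
            Err(_) => false,
        })
        .count()
}
```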
ppom
a70b45ba2d
Move parse_duration to reaction-plugin and fix dependency tree 2025-12-07 12:00:00 +01:00
ppom
40c6202cd4
WIP switch to one task per connection 2025-12-07 12:00:00 +01:00
ppom
7e680a3a66
Remove shared_secret option 2025-12-07 12:00:00 +01:00
ppom
9235873084
Expose parse_duration to the plugin
It may be better to put it in the reaction-plugin module instead
2025-12-07 12:00:00 +01:00
ppom
ba9ab4c319
Remove insecure handshake and just check whether we know this public key 2025-12-07 12:00:00 +01:00
ppom
2e7fa016c6
Insecure hash-based handshake. I must find something else. 2025-12-07 12:00:00 +01:00
ppom
3f6e74d096
Accept remote connections. Prepare work for shared_secret handshake
Renamed ConnectionInitializer to EndpointManager.
Endpoint isn't shared with Cluster anymore.

Moved the big `match` inside the `loop` into its own function, mainly to separate it from
the select macro and reduce LSP latency. It's cleaner too.
2025-12-07 12:00:00 +01:00
ppom
983eff13eb
cluster initialization
- Actions are connected to Cluster,
- Separate task to (re)initialize connections
2025-12-07 12:00:00 +01:00
ppom
cd2d337850
Fixed communication error: do not use serde_json::Value
So maybe serde_json's Value can't be serialized with postbag.
Created my own Value type that can be converted to and from serde_json's.

Removed one useless tokio::spawn.
2025-12-07 12:00:00 +01:00
ppom
310d3dbe99
Fix plugin build, one secret key per cluster, more work on cluster init 2025-12-07 12:00:00 +01:00
ppom
58180fe609
fmt, clippy, tests, fix some tests after startup refactoring 2025-12-07 12:00:00 +01:00
ppom
a7604ca8d5
WIP: allow plugins to print errors to stderr and capture them
I have a race condition where reaction quits before printing the process's stderr.
This will be an opportunity to rework reaction's daemon startup (again)
2025-12-07 12:00:00 +01:00
ppom
124a2827d9
Cluster plugin init
- Remove PersistData utility
- Provide plugins with a state directory instead, by starting them inside it.
- Store the secret key as a file inside this directory.
- Use iroh's crate for base64 encoding, thus removing one dependency.
- Implement plugin's stream_impl and action_impl functions,
  creating all necessary data structures.
2025-12-07 12:00:00 +01:00
ppom
e3060d0404
cluster: retrieve, generate and store iroh SecretKey 2025-12-07 12:00:00 +01:00
ppom
61fe405b85
Add cluster plugin skeleton 2025-12-07 12:00:00 +01:00