Discovery
A DataStorm node needs to discover other nodes in order to publish or subscribe to the data elements of a topic. By default, DataStorm discovers other nodes using UDP multicast, but a node can also discover other nodes by connecting to a node that acts as a registry for topic discovery.
Multicast
By default, a DataStorm node sends datagrams to advertise topics to other nodes and listens for advertisements from other nodes through UDP multicast. Multicast discovery can be disabled by setting the property DataStorm.Node.Multicast.Enabled=0.
Connection
While multicast discovery makes it easy to find nodes without any configuration, it's not suitable for network environments where multicast isn't allowed or where nodes are deployed in different networks that can't exchange UDP multicast datagrams. Connection-based discovery is available when multicast can't be used. A node can connect to another node through a TCP connection and register itself with that node to advertise its topics. The node that accepts the connection advertises these topics to its other connected nodes, and also through multicast if multicast forwarding is enabled.
A node uses the DataStorm.Node.ConnectTo property to configure the endpoint of the node to connect to. Multiple endpoints can be specified to allow failover in case a node becomes unreachable. The node connects to only one endpoint at a time, but if this endpoint becomes unavailable, it tries to connect to the other endpoints. In order to accept connections, a node must configure its server endpoints to listen on a well-known port with the DataStorm.Node.Server.Endpoints property.
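The two properties described above can be sketched as the following pair of configurations. This is a minimal illustration rather than an excerpt from the DataStorm distribution: the host names and port 10000 are hypothetical, and the endpoints are written in the usual Ice endpoint syntax, with a colon separating multiple endpoints.

```
# Configuration of the node that accepts connections
# (port 10000 is a hypothetical well-known port)
DataStorm.Node.Server.Endpoints=tcp -p 10000

# Configuration of a node that connects to it (a separate process).
# Two endpoints are listed so that the node can fail over to the second
# one if the first becomes unreachable; both host names are hypothetical.
DataStorm.Node.ConnectTo=tcp -h node1.example.com -p 10000:tcp -h node2.example.com -p 10000
```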
Connection-based discovery supports several deployment scenarios for DataStorm writer and reader applications. The following sections examine these scenarios.
One writer with multiple readers
The writer listens on a well-known server endpoint and multiple readers connect to it.
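A possible configuration for this scenario is sketched below. The host name writer.example.com and port 10000 are assumptions made for illustration only.

```
# Writer: listens for reader connections on a hypothetical well-known port
DataStorm.Node.Server.Endpoints=tcp -p 10000

# Each reader: connects to the writer (hypothetical host name)
DataStorm.Node.ConnectTo=tcp -h writer.example.com -p 10000
```

Every reader uses the same ConnectTo setting, so adding a reader requires no change to the writer's configuration.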
Multiple writers with one reader
The reader listens on a well-known server endpoint and multiple writers connect to it.
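In this scenario the roles are reversed: the listening node is the reader. A sketch of the configuration, again with a hypothetical host name and port:

```
# Reader: listens for writer connections on a hypothetical well-known port
DataStorm.Node.Server.Endpoints=tcp -p 10000

# Each writer: connects to the reader (hypothetical host name)
DataStorm.Node.ConnectTo=tcp -h reader.example.com -p 10000
```

Which node listens and which node connects is a deployment choice and does not depend on whether the node reads or writes data.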
Multiple writers with multiple readers and a single registry node
The readers and writers connect to one registry node for discovery purposes.
In this scenario, the DataStorm node that acts as the registry can be any DataStorm node. For instance, it could be one of the readers or writers. It can also be the dsnode executable provided with the DataStorm distribution.
In this scenario, the registry node is used for discovery purposes only. Readers and writers continue to exchange data directly by establishing a direct TCP connection.
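The single-registry deployment could be configured along these lines. The host name registry.example.com and port 10000 are hypothetical; the registry configuration applies to whichever node plays that role.

```
# Registry node: only needs to accept connections
DataStorm.Node.Server.Endpoints=tcp -p 10000

# Every reader and every writer: connect to the registry for discovery
DataStorm.Node.ConnectTo=tcp -h registry.example.com -p 10000
```

Only topic advertisements travel through the registry in this setup; the data itself is exchanged over the direct connections that readers and writers establish with each other.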
Multiple writers with multiple readers and multiple registry nodes
The scenario presented above has a single point of failure: the single registry node.
In order to eliminate this single point of failure, it's possible to deploy multiple registry nodes and configure the readers and writers to connect to any of these registry nodes. Each reader or writer still connects to a single registry, but if this connection fails and the original registry is no longer available, the reader or writer establishes a connection with another registry. This ensures that there's no single point of failure.
Again, the registry nodes here can be any DataStorm nodes, including reader or writer nodes.
The registry nodes must connect to each other in order to exchange topic advertisements from their connected readers or writers.
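A deployment with two registries might be configured as sketched below. The host names and ports are hypothetical, and linking the registries through a single ConnectTo setting on the second registry is an assumption of this sketch, not something the description above prescribes.

```
# Registry 1 (hypothetical host registry1.example.com)
DataStorm.Node.Server.Endpoints=tcp -p 10000

# Registry 2 (hypothetical host registry2.example.com):
# accepts connections and also connects to registry 1 so that the two
# registries exchange the topic advertisements of their connected nodes
DataStorm.Node.Server.Endpoints=tcp -p 10000
DataStorm.Node.ConnectTo=tcp -h registry1.example.com -p 10000

# Every reader and every writer: list both registries to allow failover
DataStorm.Node.ConnectTo=tcp -h registry1.example.com -p 10000:tcp -h registry2.example.com -p 10000
```

Each of the three sections is the configuration of a different process.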