Overview

DataStorm is a new high-performance brokerless publish/subscribe framework. It allows you to distribute data between your applications through a simple yet powerful API.

Concepts

An application using DataStorm consists of nodes that publish or subscribe to "data elements" within topics. A topic is a named local object that contains a key-value dictionary; each key-value entry in this dictionary is a data element.

From a topic, you create data writers to send samples and data readers to receive them. Writers and readers exchange samples for the data elements of topics that have the same name. Each sample contains a data element plus lifecycle information, such as the timestamp at which the sample was published.

To minimize network usage, DataStorm provides a feature called "partial updates", which allows the application to send only the part of a value being updated instead of the whole value.

A data reader subscribes to the data writers of a topic with the same name, using either a data element's key or a key filter. With a key filter, the reader receives samples from all writers whose key matches the filter. Readers can also use sample filters to receive only specific samples: a writer only sends the samples that match the reader's sample filter. Applications can provide their own filters as custom functions, and DataStorm provides a few predefined filters, such as a regular expression filter for keys of type string.

Brokerless

DataStorm nodes do not require a central service to communicate. Nodes use UDP multicast or TCP to advertise their topics, and connect to each other directly when they have topics in common. This model has no single point of failure, and it greatly simplifies the deployment and management of distributed applications.

When using TCP for discovery, each node connects to another node, either a regular node or a node dedicated to discovery. Nodes must connect to the same node to discover each other. These discovery nodes can be replicated and connected to each other so that discovery has no single point of failure.

Based on Ice

DataStorm is implemented on top of Ice and naturally plays well with Ice: you can easily use DataStorm to distribute data whose types are defined with Ice.

You can also use DataStorm without working with Ice directly. In particular:

  • you don't need to know Ice or Ice APIs to use DataStorm
  • you can easily distribute data with simple types using DataStorm
  • you can distribute data with more complex types using DataStorm by providing your own serialization/deserialization functions

For Ice types, DataStorm automatically uses the Ice-generated marshaling and unmarshaling code.

DataStorm vs IceStorm

Ice already provides a pub/sub service named IceStorm. So if you need pub/sub with Ice, should you use IceStorm or DataStorm?

IceStorm is a broker-based pub/sub service, where the broker (the IceStorm service) can be replicated for fault tolerance. It is mature and available for all programming languages supported by Ice.

IceStorm is all about distributing remote Ice calls: when your publisher makes a oneway call on a given topic, it makes a regular Ice remote call, and IceStorm replicates and routes this call to all the subscribers registered with this topic. These subscribers are regular Ice objects that you implement, and they receive the publisher's call (a oneway request) just like any other dispatched Ice request.

DataStorm, in contrast, is a brand-new, library-based pub/sub framework.

DataStorm is all about distributing data. When one of your applications needs data produced by another application, DataStorm helps you publish, filter, and receive data items easily: you don't need to worry about network connections or making remote calls.

Languages

DataStorm currently supports only the C++ programming language.