Skip to main content

Your First Dataflow

Hydroflow+ programs require special structure to support code generation and distributed deployments. There are two main components of a Hydroflow+ program:

  • The flow graph describes the dataflow logic of the program.
  • The deployment describes how to map the flow graph to a physical deployment.
tip

We recommend using the Hydroflow+ template to get started with a new project. The template comes with a pre-configured build system and the following example pre-implemented.

cargo install cargo-generate
cargo generate gh:hydro-project/hydroflow template/hydroflow_plus

cd into the generated folder, ensure the correct nightly version of rust is installed, and test the generated project:

cd <my-project>
rustup update
cargo test

Let's look a minimal example of a Hydroflow+ program. We'll start with a simple flow graph that prints out the first 10 natural numbers. First, we'll define the flow graph.

The Flow Graph

src/first_ten_distributed.rs
use hydroflow_plus::*;
use stageleft::*;

pub struct P1 {}
pub struct P2 {}

pub fn first_ten_distributed(flow: &FlowBuilder) -> (Process<P1>, Process<P2>) {
let process = flow.process::<P1>();
let second_process = flow.process::<P2>();

let numbers = flow.source_iter(&process, q!(0..10));
numbers
.send_bincode(&second_process)
.for_each(q!(|n| println!("{}", n)));

(process, second_process)
}

To build a Hydroflow+ application, we need to define a dataflow that spans multiple processes. The FlowBuilder parameter captures the global dataflow, and we can instantiate processes to define boundaries between distributed logic. When defining a process, we also pass in a type parameter to a "tag" that identifies the process. When transforming streams, the Rust type system will guarantee that we are operating on streams on the same process.

let process = flow.process::<P1>();
let second_process = flow.process::<P2>();

Now, we can build out the dataflow to run on this process. Every dataflow starts at a source that is bound to a specific process. First, we instantiate a stream that emits the first 10 natural numbers.

let numbers = flow.source_iter(&process, q!(0..10));

In Hydroflow+, whenever there are snippets of Rust code passed to operators (like source_iter, map, or for_each), we use the q! macro to mark them. For example, we may use Rust snippets to define static sources of data or closures that transform them.

To print out these numbers, we can use the for_each operator (note that the body of for_each is a closure wrapped in q!):

numbers
.send_bincode(&second_process)
.for_each(q!(|n| println!("{}", n)));

In the next section, we will look at how to deploy this program to run on multiple processs.