Getting Started

Prerequisites

DYAD has the following minimum requirements to build and install:

  • A C99-compliant C compiler

  • A C++11-compliant C++ compiler

  • Autoconf 2.63

  • Automake

  • Libtool

  • Make

  • pkg-config

  • Jansson 2.10

  • flux-core

Installation

Manual Installation

Attention

Currently, DYAD can only be installed manually. This page will be updated as additional methods of installation are added.

Note

Recommended for developers and contributors

You can get DYAD from its GitHub repository using these commands:

$ git clone https://github.com/flux-framework/dyad.git
$ cd dyad

DYAD uses the Autotools for building and installation. To start the build process, run the following command to generate the necessary configuration scripts using Autoconf:

$ ./auotgen.sh

Next, configure DYAD using the following command:

$ ./configure --prefix=<INSTALL_PATH>

Besides the normal configure script flags, DYAD’s configure script also has the following flags:

Flag

Type (default)

Description

–enable-dyad-debug

Bool (true if provided)

if enabled, include debugging prints and logs for DYAD at runtime

–enable-perfflow

Bool (true if provided)

if enabled, build PerfFlow Aspect-based performance measurement annotations for DYAD

Note

The installation prefix (i.e., --prefix) is also used to try to locate flux-core. First, configure will look for flux-core in the installation prefix. If it is not found there, configure will then use pkg-config to locate flux-core.

Finally, build and install DYAD using the following commands:

$ make [-j]
$ make install

Building with PerfFlow Aspect Support (Optional)

DYAD has optional support for collecting cross-cutting performance data using PerfFlow Aspect. To enable this support, first build PerfFlow Aspect for C/C++ using their instructions. Then, modify your method of choice for building DYAD as follows:

  • Manual Installation: add --enable-perfflow to your invocation of ./configure

Using DYAD’s APIs

Currently, DYAD provides APIs for the following programming languages:

  • C

  • C++

This section describes the basics of integrating them into an application.

C API

DYAD’s C API leverages the LD_PRELOAD trick to integrate into user applications. As a result, users can utilize DYAD’s C API by simply adding the following before the shell command that launches their application:

$ LD_PRELOAD=path/to/dyad_wrapper.so

Once preloaded, DYAD’s C API will intercept the open and fopen functions when consuming files and the close and fclose functions when producing files. As a result, if their code already uses thse functions, users do not need to change their code.

C++ API

DYAD’s C++ API is implemented as a small library that wraps C++’s Standard Library file streams. To use DYAD’s C++ API, first, add the following to your code:

#include <dyad_stream_api.hpp>

This header defines thin wrappers around the file streams provided by the C++ Standard Library. More specifically, it provides the following classes:

  • dyad::basic_ifstream_dyad

  • dyad::ifstream_dyad

  • dyad::wifstream_dyad

  • dyad::basic_ofstream_dyad

  • dyad::ofstream_dyad

  • dyad::wofstream_dyad

  • dyad::basic_fstream_dyad

  • dyad::fstream_dyad

  • dyad::wfstream_dyad

When using DYAD, these file streams should be used in place of the file streams from the C++ Standard Library. The DYAD file streams should be directly used to do the following:

  • Open files (with the file stream’s open method)

  • Close files (with the file stream’s close method or destructor)

  • Access the underlying C++ Standard Library file stream using the DYAD stream’s get_stream method

All reading from and writing to files should be done using the underlying C++ Standard Library file stream. A simple example of using DYAD’s C++ API in a producer application is shown below:

#include <dyad_stream_api.hpp>

void produce_file(std::string& full_path, int32_t* data, std::size_t data_size)
{
    dyad::ofstream_dyad ofs_dyad;
    ofs_dyad.open(full_path, std::ofstream::out | std::ios::binary);
    std::ofstream& ofs = ofs_dyad.get_stream();
    ofs.write((char*) data, data_size);
    ofs_dyad.close();
}

After replacing C++ Standard Library file streams with their DYAD equivalents, there is one final requirement to using the C++ API. When compiling your code, you must link the associated library (i.e., libdyad_stream.so or libdyad_stream.a). This library can be found in the lib subdirectory of the install prefix.

Running DYAD

There are three steps to running DYAD-enabled applications:

  1. Create a Flux key-value store (KVS) namespace

  2. Determine the managed directories for each application

  3. Load DYAD’s Flux module

  4. Configure and run the DYAD-enabled applications

Create a Flux KVS Namespace

DYAD uses its own namespace in Flux’s hierarchical key-value store (KVS) to isolate itself from the KVS entries from other Flux services. Thus, the first step in running DYAD is to create a KVS namespace. This namespace is used by DYAD to exchange file information (e.g., the Flux broker that “owns” a file) needed to synchronize the consumer application and transfer the file from producer to consumer. To create this namespace, run the following:

$ flux kvs namespace create <DYAD_KVS_NAMESPACE>

The namespace can be whatever string value you want.

Determine the Managed Directories for Each Application

To determine when to perform synchronization and data transfer, DYAD tracks two directories for each application: a producer-managed directory and a consumer-managed directory. At least one of these directories must be specified for DYAD to do anything. If neither are provided, the application will still run, but DYAD will not do anything.

When a producer-managed directory is provided, DYAD will store information about any file stored in that directory (or its subdirectories) into a namespace within the Flux key-value store (KVS). This information is later used by DYAD to transfer files from producer to consumer.

When a consumer-managed directory is provided, DYAD will block the application whenever a file inside that directory (or subdirectory) is opened. This blocking will last until DYAD sees information about the file in the Flux KVS namespace. If the information retrieved from the KVS indicates that the file is actually located elsewhere, DYAD will use Flux’s remote procedure call (RPC) system to ask DYAD’s Flux module to transfer the file. If a transfer occurs, the file’s contents will be stored at the file path passed to the original file opening function (e.g., open, fopen).

Before running the following steps, determine the producer- and/or consumer-managed directories for each application. These directories will need to be provided to the commands in the next steps.

Note

When opening or closing a file not in the producer- or consumer-managed directories, DYAD will simply open or close the file. DYAD changes the behavior of opening or closing only the files in the managed directories.

Load DYAD’s Flux Module

The next step in running DYAD is to load DYAD’s Flux module. The module is the component of DYAD responsible for sending files from producer to consumer. Once loaded, this module will run whenever its associated Flux broker receives a relevant remote procedure call from a DYAD-enabled consumer. To load the module, first, determine where dyad.so is located. This should normally be <PREFIX>/lib/dyad.so. Once you have found the path to dyad.so, you can load the module on the current Flux broker using:

$ flux module load path/to/dyad.so <DYAD_PATH_PRODUCER>

The dyad.so module takes a single command-line argument: the producer-managed directory. The producer uses this directory as the root from which the module will look for files to transfer.

Note that the command above will only load the module on the Flux broker on which the command is run. This can be an issue if you are submitting jobs because you will not know on which broker your jobs will be run. As a result, it is highly recommended that you launch the DYAD module on all brokers in your Flux instance. You can do this by running:

$ flux exec -r all flux module load path/to/dyad.so <DYAD_PATH_PRODUCER>

Configure and Run the DYAD-Enabled Applications

Once the KVS namespace and DYAD module are set up, the DYAD-enabled applications can be run. To run a DYAD-enabled application, simply run your application as normal with certain environment variables set. A table containing the current environment variables recognized by DYAD is shown below.

Name

Type

Required?

Default

Description

DYAD_KVS_NAMESPACE

String

Yes

N/A

The Flux KVS namespace that DYAD will use to record or look

for file information

DYAD_PATH_PRODUCER

Directory Path

Yes [1]

N/A

The producer-managed path of the application

DYAD_PATH_CONSUMER

The consumer-managed path of the application

DYAD_SHARED_STORAGE

0 or 1

No

0

If 1 (i.e., true), only provide per-file synchronization of

the consumer (i.e., no transfer)

DYAD_KEY_DEPTH [2]

Integer

No

3

The number of levels in Flux’s hierarchical KVS to use

within DYAD’s namespace

DYAD_KEY_BINS [2]

Integer

No

1024

The maximum number of unique values for the keys associated

with any given level of Flux’s hierarchical KVS within

DYAD’s namespace