Utility API

Internal utilities

Defines

_GNU_SOURCE

DYAD_UTIL_LOGGER

DYAD_PATH_DELIM

Functions

uint32_t hash_str(const char *str, const uint32_t seed)

Computes a non-zero MurmurHash3 hash of a string.

Hashes str using MurmurHash3_x64_128 with the given seed, then reduces the four 32-bit output words via XOR and adds 1 to ensure the result is never zero. This allows zero to be used unambiguously as an error sentinel by the caller.

If str is shorter than 128 bytes, it is padded with '@' characters to 128 bytes before hashing to improve hash distribution for short strings, consistent with the padding used in gen_path_key().

Parameters:

str – [in] Null-terminated string to hash. If NULL or empty, returns 0.
seed – [in] Seed value for MurmurHash3.

Return values:

0 – str is NULL or empty.
non-zero – The computed hash value.

Returns:

uint32_t
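The padding and reduction steps described above can be sketched roughly as follows. This is only an illustration, not the DYAD implementation: the names pad_key, fold_digest, and PAD_LEN are hypothetical, the MurmurHash3 call itself is omitted, and the four words passed to fold_digest are assumed to be the 128-bit digest split into 32-bit pieces.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAD_LEN 128 /* hypothetical name for the 128-byte padding target */

/* Copy s into out and right-pad with '@' up to PAD_LEN bytes.
 * Returns the number of bytes that would be hashed, or 0 if out is too small.
 * The output is a byte buffer for hashing and is not null-terminated. */
size_t pad_key(const char *s, char *out, size_t out_cap)
{
    size_t n = strlen(s);
    size_t total = n < PAD_LEN ? PAD_LEN : n;

    if (total > out_cap)
        return 0;
    memcpy(out, s, n);
    if (n < PAD_LEN)
        memset(out + n, '@', PAD_LEN - n);
    return total;
}

/* XOR-fold four 32-bit words and add 1, so an all-zero fold maps to 1. */
uint32_t fold_digest(const uint32_t w[4])
{
    return (w[0] ^ w[1] ^ w[2] ^ w[3]) + 1u;
}
```

In this sketch a fold equal to 0xFFFFFFFF would wrap to 0 under uint32_t arithmetic; how the actual implementation treats that case is not covered here.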

uint32_t hash_path_prefix(const char *str, const uint32_t seed, const size_t len)

Computes a non-zero MurmurHash3 hash of the first len bytes of a string.

Hashes only the first len bytes of str using MurmurHash3_x64_128 with the given seed, then reduces the four 32-bit output words via XOR and adds 1 to ensure the result is never zero. This allows zero to be used unambiguously as an error sentinel by the caller.

If len is shorter than 128 bytes, the prefix is padded with '@' characters to 128 bytes before hashing to improve hash distribution for short prefixes, consistent with the padding used in hash_str() and gen_path_key().

If str is shorter than len bytes, the function returns 0 since the requested prefix length exceeds the actual string length.

Parameters:

str – [in] Null-terminated string whose prefix is to be hashed. If NULL, returns 0.
seed – [in] Seed value for MurmurHash3.
len – [in] Number of bytes to hash from the start of str. If 0 or greater than strlen(str), returns 0.

Return values:

0 – str is NULL, len is 0, or len exceeds the length of str.
non-zero – The computed hash value of the first len bytes.

Returns:

uint32_t
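The argument checks listed above (NULL string, zero length, length beyond the end of the string) could be expressed along these lines. The helper name prefix_len_valid is hypothetical and the sketch covers only the validation, not the hashing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True when the first len bytes of str exist and contain no terminator,
 * i.e. len is non-zero and does not exceed strlen(str). */
bool prefix_len_valid(const char *str, size_t len)
{
    if (str == NULL || len == 0)
        return false;
    /* memchr stops at the first '\0', so it never reads past the string. */
    return memchr(str, '\0', len) == NULL;
}
```

A caller would typically run a check like this before hashing and return 0 when it fails, matching the sentinel convention described above.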

char *concat_str(char *str, const char *to_append, const char *connector, size_t str_capacity)

Appends a string to an existing buffer, joining them with a connector.

Concatenates connector and to_append onto the end of str in-place, producing "str + connector + to_append". The result is written back into the str buffer.

If str already ends with connector, the trailing connector is stripped before appending to avoid duplicating it. For example, concatenating "foo/" with connector "/" and to_append "bar" produces "foo/bar" rather than "foo//bar".

The operation is performed via an intermediate heap-allocated buffer to safely handle the in-place update of str. If str, to_append, or connector overlap in memory, the function returns NULL without modifying str.

Note

This function allocates a temporary heap buffer internally for the concatenation and frees it before returning.

Parameters:

str – [inout] Null-terminated string to append to. Also serves as the output buffer. Must not be NULL and must be at least str_capacity bytes in size.
to_append – [in] Null-terminated string to append. Must not be NULL and must not overlap with str.
connector – [in] Null-terminated string to insert between str and to_append (e.g. "/"). Must not overlap with str.
str_capacity – [in] Total size of the str buffer in bytes. The combined result must fit within this capacity including the null terminator.

Return values:

str – The operation succeeded and str now contains the concatenated result.
NULL – The combined result would exceed str_capacity, or str, to_append, or connector overlap in memory.

Returns:

char*
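A minimal sketch of the join behavior described above, including the trailing-connector rule and the capacity check. The function name join_sketch is hypothetical. Unlike the documented function, this version writes directly into str rather than going through a temporary heap buffer, and it does not detect overlapping arguments.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative only: assumes str, to_append, and connector do not overlap. */
char *join_sketch(char *str, const char *to_append, const char *connector,
                  size_t str_capacity)
{
    if (str == NULL || to_append == NULL || connector == NULL)
        return NULL;

    size_t len = strlen(str);
    size_t clen = strlen(connector);
    size_t alen = strlen(to_append);

    /* Ignore one trailing connector so "foo/" + "/" + "bar" gives "foo/bar". */
    if (clen > 0 && len >= clen && strcmp(str + len - clen, connector) == 0)
        len -= clen;

    /* The result plus its null terminator must fit; str is untouched on failure. */
    if (len + clen + alen + 1 > str_capacity)
        return NULL;

    memcpy(str + len, connector, clen);
    memcpy(str + len + clen, to_append, alen + 1); /* copies the terminator too */
    return str;
}
```

For example, calling join_sketch on a 16-byte buffer holding "foo/" with connector "/" and to_append "bar" leaves "foo/bar" in the buffer, while a 6-byte buffer holding "foo" is rejected and left unchanged.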

bool extract_user_path(const char *prefix, const char *full, const char *delim, char *upath, const size_t upath_capacity)

Extracts the path component following a managed directory prefix.

Checks whether full begins with prefix (separated by delim) and, if so, extracts the portion of full that follows the prefix and delimiter into upath. This is used to derive the path of a file relative to a DYAD-managed directory from its absolute path.

For example, with prefix "/managed", delim "/", and full "/managed/subdir/file.txt", the extracted upath would be "subdir/file.txt".

The following conditions all cause the function to return false without modifying upath:

upath is NULL.
prefix, full, or delim overlaps with the upath buffer.
full does not begin with prefix.
Any path argument exceeds PATH_MAX bytes.
full is equal to prefix with no user path component following it.
The delimiter is not present between prefix and the user path in full (e.g. "/managed_other/file" does not match prefix "/managed").
The extracted user path would exceed upath_capacity bytes including the null terminator.

If prefix itself ends with delim, the trailing delimiter is stripped before matching to avoid requiring a double delimiter between the prefix and the user path.

Note

upath is not explicitly null-terminated by this function. Callers should zero-initialize the buffer before calling to ensure the result is null-terminated.

Parameters:

prefix – [in] Null-terminated managed directory path to match against the start of full. Must not be NULL.
full – [in] Null-terminated absolute file path to extract from. Must not be NULL and must not overlap with upath.
delim – [in] Null-terminated path delimiter string (e.g. "/"). If NULL, treated as an empty string.
upath – [out] Buffer to receive the extracted relative path. Must not be NULL or overlap with any other argument. Not null-terminated by this function; the caller should ensure the buffer is zeroed before calling.
upath_capacity – [in] Size of the upath buffer in bytes. The extracted path must fit within this capacity including a null terminator.

Return values:

true – full begins with prefix and the relative path was successfully extracted into upath.
false – Any of the failure conditions listed above were met. upath is not modified.

Returns:

bool
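The matching rules above can be approximated by the sketch below. The name extract_sketch is hypothetical, the PATH_MAX and overlap checks are left out for brevity, and, unlike the documented function, this version null-terminates upath itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative only: no PATH_MAX or overlap checks. */
bool extract_sketch(const char *prefix, const char *full, const char *delim,
                    char *upath, size_t upath_capacity)
{
    if (prefix == NULL || full == NULL || upath == NULL)
        return false;
    if (delim == NULL)
        delim = "";

    size_t plen = strlen(prefix);
    size_t dlen = strlen(delim);

    /* Ignore a trailing delimiter on the prefix ("/managed/" -> "/managed"). */
    if (dlen > 0 && plen >= dlen &&
        strncmp(prefix + plen - dlen, delim, dlen) == 0)
        plen -= dlen;

    /* full must start with the prefix, followed directly by the delimiter. */
    if (strncmp(full, prefix, plen) != 0)
        return false;
    if (strncmp(full + plen, delim, dlen) != 0)
        return false;

    const char *rest = full + plen + dlen;
    size_t rlen = strlen(rest);

    /* Reject an empty user path or one that does not fit with its terminator. */
    if (rlen == 0 || rlen + 1 > upath_capacity)
        return false;

    memcpy(upath, rest, rlen + 1);
    return true;
}
```

When calling the real extract_user_path(), the note above still applies: declare the output as something like char upath[PATH_MAX + 1] = {0}; so the result is terminated.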

bool cmp_canonical_path_prefix(const dyad_ctx_t *ctx, const bool is_prod, const char *path, char *upath, const size_t upath_capacity)

Checks whether a path falls under a DYAD-managed directory and extracts its relative component.

Determines if path is under the DYAD-managed directory for either the producer (is_prod is true) or consumer (is_prod is false), and if so, extracts the portion of path following the managed prefix into upath.

To handle symlinks and non-canonical paths, the check is attempted in up to four passes before returning false:

Hash and match path against the managed path prefix.
Hash and match path against the canonical (real) managed path prefix, if one is available (can_prefix_len > 0).
Resolve path to its canonical form via realpath(), then hash and match the result against the managed path prefix.
Hash and match the canonical form of path against the canonical managed path prefix.

Each pass first hashes the leading bytes of the path, up to the length of the relevant prefix, and compares the result against the pre-computed prefix hash stored in the context. extract_user_path() is called only on a hash match, which avoids the cost of a full string comparison for paths that clearly do not match.

upath is populated by the first passing match and the function returns immediately without attempting further passes.

Note

The function assumes that the prefix lengths (prod_managed_len, cons_managed_len, etc.) and pre-computed hashes stored in ctx are accurate and consistent with the corresponding path strings. No internal validation of these values is performed.

Note

Hash collisions between an unrelated path and a managed prefix will cause extract_user_path() to be called unnecessarily, but the full string comparison inside extract_user_path() will correctly reject the mismatch.

Note

This function works correctly only when a file is not reachable through multiple absolute paths via hard links.

Parameters:

ctx – [in] Pointer to the DYAD context. Must not be NULL. Provides the managed path, its canonical form, their lengths, and their pre-computed hashes for both producer and consumer sides.
is_prod – [in] If true, match against the producer-managed path (ctx->prod_managed_path). If false, match against the consumer-managed path (ctx->cons_managed_path).
path – [in] Null-terminated path to check. May be a symlink or non-canonical path; realpath() is used as a fallback if direct matching fails.
upath – [out] Buffer to receive the relative path component following the managed prefix. Should be zero-initialized by the caller. Not explicitly null-terminated by this function.
upath_capacity – [in] Size of the upath buffer in bytes.

Return values:

true – path (or its canonical form) is under the managed directory and the relative component has been written to upath.
false – ctx is NULL, path does not fall under the managed directory under any of the four matching passes, or realpath() failed when resolving path.

Returns:

bool
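The order of the four passes and the hash pre-filter can be sketched as below. Everything here is a simplified stand-in: struct prefix_info and match_pass are hypothetical, FNV-1a replaces MurmurHash3 to keep the example self-contained, the caller supplies the realpath() result (NULL models a failed resolution), and a plain strncmp() stands in for extract_user_path(), so the delimiter check is not modeled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in hash (FNV-1a) over the first n bytes of s. */
static uint32_t fnv1a(const char *s, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= (unsigned char)s[i];
        h *= 16777619u;
    }
    return h;
}

/* Hypothetical view of one managed prefix and its pre-computed hash. */
struct prefix_info {
    const char *path;
    size_t len;    /* 0 means "not available" */
    uint32_t hash; /* hash of the first len bytes of path */
};

/* Cheap hash comparison first; string comparison only on a hash match. */
static int prefix_hit(const struct prefix_info *p, const char *path)
{
    if (p->len == 0 || strlen(path) < p->len)
        return 0;
    if (fnv1a(path, p->len) != p->hash)
        return 0;
    return strncmp(path, p->path, p->len) == 0;
}

/* Returns the number (1-4) of the first pass that matches, or 0 if none. */
int match_pass(const struct prefix_info *managed,
               const struct prefix_info *canonical,
               const char *path, const char *resolved)
{
    if (prefix_hit(managed, path))
        return 1;
    if (prefix_hit(canonical, path))
        return 2;
    if (resolved == NULL)
        return 0;
    if (prefix_hit(managed, resolved))
        return 3;
    if (prefix_hit(canonical, resolved))
        return 4;
    return 0;
}
```

Returning at the first hit mirrors the early exit described above; the later passes only run for paths that reach the managed directory through a symlink or a non-canonical spelling.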

int mkpath(const char *dir, const mode_t m)

Recursively creates a directory and all missing parent directories.

Creates dir and any intermediate parent directories that do not yet exist, similar to mkdir -p. If dir already exists, returns 0 immediately without error.

The implementation recurses up the directory tree via dirname() until it reaches a directory that already exists, then creates each missing component on the way back down. strdupa() is used to duplicate the path before passing it to dirname() since dirname() may modify its argument in place.

See https://stackoverflow.com/questions/2336242/recursive-mkdir-system-call-on-unix for the basis of this implementation.

Note

The permission mode m is applied to each directory created during the recursive descent. The effective permissions may differ from m depending on the process umask.

Note

This function uses strdupa() which allocates on the stack. Deep directory hierarchies or very long paths may cause stack overflow.

Warning

Return codes from intermediate mkdir() calls during recursion are not checked. Only the return value of the final mkdir() for dir itself is returned to the caller.

Parameters:

dir – [in] Null-terminated path of the directory to create. Must not be NULL. If NULL, sets errno to EINVAL and returns 1.
m – [in] Permission mode bits to apply to each newly created directory, passed directly to mkdir().

Return values:

0 – dir already exists or was successfully created along with all required parent directories.
1 – dir is NULL (errno set to EINVAL).
non-zero – The return value of mkdir() for the final directory component if creation failed, with errno set by mkdir().

Returns:

int

int mkdir_as_needed(const char *path, const mode_t m)

Creates a directory and all missing parent directories, with existence and permission checks.

Creates path and any missing intermediate parent directories using mkpath(). Before attempting creation, checks whether path already exists and validates that it is a directory with the expected permission bits. The same checks are repeated after mkpath() returns a non-zero value, since a concurrent process may have created the directory in the interim.

The process umask is temporarily set to 0 during directory creation to ensure that the permission bits specified by m are applied exactly as requested. The original umask is restored after mkpath() returns.

If DYAD_SYNC_DIR is defined at compile time, the parent directory of path is synced via sync_containing_dir() after successful creation to ensure the new directory entry is durable on storage.

Note

The umask is restored to its original value after mkpath() returns, but is not restored if mkpath() is interrupted abnormally.

Note

Return code 5 is not an error in the strict sense, since the directory is usable, but callers may wish to log or handle the permission mismatch depending on their security requirements.

Warning

This function calls perror() directly on mkpath() failure, which writes to stderr. Callers that manage their own error output should be aware of this side effect.

Parameters:

path – [in] Null-terminated path of the directory to create. Must not be NULL or empty.
m – [in] Permission mode bits to apply to newly created directories. The umask is set to 0 during creation so these bits are applied exactly.

Return values:

0 – The directory was successfully created.
1 – The directory already exists with the requested permissions.
5 – The directory already exists but with different permission bits.
-1 – mkpath() failed and the directory does not exist afterward.
-2 – path already exists but is not a directory.
-3 – path is NULL or empty.
-4 – mkpath() failed but a subsequent stat() found path exists as a non-directory entry.

Returns:

int
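Because the return codes mix success, soft-warning, and failure cases, a caller might wrap them in small helpers such as the ones below. Both helper names are hypothetical and the message strings are only paraphrases of the table above.

```c
#include <stdbool.h>

/* Non-negative codes (0, 1, 5) all leave a usable directory behind. */
bool dir_is_usable(int rc)
{
    return rc == 0 || rc == 1 || rc == 5;
}

/* Human-readable summary of a mkdir_as_needed() return code. */
const char *describe_mkdir_rc(int rc)
{
    switch (rc) {
    case 0:  return "created";
    case 1:  return "already exists with requested permissions";
    case 5:  return "already exists with different permissions";
    case -1: return "creation failed and directory is absent";
    case -2: return "path exists but is not a directory";
    case -3: return "path is NULL or empty";
    case -4: return "creation failed and path is a non-directory";
    default: return "unknown return code";
    }
}
```

A caller could then treat dir_is_usable(rc) as the success test and log describe_mkdir_rc(rc) when rc is 5 or negative.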

int get_path(const int fd, const size_t max_size, char *path)

Resolves the file path associated with an open file descriptor.

Reads the symbolic link /proc/self/fd/ followed by fd via readlink() to obtain the path of the file currently open on fd, and writes the result into path. This is a Linux-specific mechanism and requires /proc to be mounted.

path is zero-initialized up to max_size + 1 bytes before the readlink() call. If readlink() returns exactly max_size bytes, the path may have been silently truncated, and a message is logged to flag this.

Note

If readlink() returns exactly max_size bytes, the path may have been truncated. A debug message is logged but the function still returns 0. Callers that require exact paths should use a buffer of at least PATH_MAX + 1 bytes.

Note

This function relies on /proc/self/fd/, which is Linux-specific and requires /proc to be mounted.

Warning

There is an off-by-one issue in the null terminator placement: path[max_size + 1] is written rather than path[max_size], which writes one byte past the end of a max_size + 1 sized buffer. The path buffer should be at least max_size + 2 bytes to avoid a buffer overwrite.

Parameters:

fd – [in] Open file descriptor whose path is to be resolved.
max_size – [in] Maximum number of bytes to write into path, excluding the null terminator. Must be at least 1. The path buffer nominally needs max_size + 1 bytes to accommodate the null terminator, but because of the off-by-one issue described in the warning above it should be at least max_size + 2 bytes.
path – [out] Buffer to receive the resolved path. Zero-initialized by this function up to max_size + 1 bytes before the readlink() call.

Return values:

0 – The path was successfully resolved and written to path.
-1 – max_size is less than 1, or readlink() failed (errno set by readlink()).

Returns:

int
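The /proc/self/fd lookup can be sketched as follows. The name fd_path_sketch is hypothetical, and this version terminates the string at the length reported by readlink(), which is one possible way to stay inside a max_size + 1 byte buffer; it is not a copy of the documented function.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Illustrative only: path must have room for max_size + 1 bytes. */
int fd_path_sketch(int fd, size_t max_size, char *path)
{
    char link[64];

    if (max_size < 1 || path == NULL)
        return -1;

    memset(path, 0, max_size + 1);
    snprintf(link, sizeof link, "/proc/self/fd/%d", fd);

    /* readlink() does not null-terminate and returns the byte count. */
    ssize_t n = readlink(link, path, max_size);
    if (n < 0)
        return -1;

    path[n] = '\0'; /* n <= max_size, so this index is within the buffer */
    return 0;
}
```

A result of exactly max_size bytes still signals possible truncation, so callers that need exact paths may want a PATH_MAX-sized buffer as suggested above.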

bool is_path_dir(const char *path)

Checks whether the path is a directory.

bool is_fd_dir(int fd)

Checks whether an open file descriptor refers to a directory.

Parameters:

fd – [in] File descriptor to check. If negative, returns false immediately without calling fstat().

Return values:

true – fd is a valid open file descriptor referring to a directory.
false – fd is negative, fstat() failed, or fd does not refer to a directory.

Returns:

bool

ssize_t get_file_size(int fd)

Returns the size of an open file in bytes.

Calls fstat() on fd to obtain the file size from the file’s stat structure. Does not modify the file position and works on any file descriptor for which fstat() is supported.

Note

On a return value of 0, errno can be checked to distinguish an empty file from an fstat() failure. If fstat() failed, errno is set to one of:

EBADF: fd is not a valid open file descriptor.
EFAULT: The stat buffer address is invalid (internal error).
EIO: An I/O error occurred while reading file metadata.

Parameters:

fd – [in] Open file descriptor to measure.

Return values:

>=0 – The size of the file in bytes. A value of 0 means the file is empty or fstat() failed.

Returns:

ssize_t
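Since 0 covers both an empty file and a failure, a caller can clear errno before the call and inspect it afterwards. The sketch below assumes the size function does not reset errno itself; file_size_sketch stands in for get_file_size(), and classify_size is a hypothetical caller-side helper.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Stand-in with the documented contract: 0 on fstat() failure. */
ssize_t file_size_sketch(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return 0;
    return (ssize_t)st.st_size;
}

/* Returns -1 if fstat() failed, 1 if the file is empty, 0 otherwise.
 * On success the size is stored in *size_out. */
int classify_size(int fd, ssize_t *size_out)
{
    errno = 0; /* so a stale errno is not mistaken for a new failure */
    ssize_t n = file_size_sketch(fd);

    if (n == 0 && errno != 0)
        return -1;
    *size_out = n;
    return n == 0 ? 1 : 0;
}
```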

dyad_rc_t dyad_excl_flock(const dyad_ctx_t *ctx, int fd, struct flock *lock)

Acquires an exclusive (write) lock on an open file descriptor.

Sets a POSIX write lock (F_WRLCK) over the entire file using fcntl() with F_SETLKW, blocking the caller until the lock is acquired. This prevents other processes from acquiring any lock (shared or exclusive) on the file until the lock is released via dyad_release_flock().

If lock is NULL, the function returns without taking any action.

Parameters:

ctx – [in] DYAD context.
fd – [in] File descriptor of the open file to lock.
lock – [out] Pointer to a flock structure populated by this function. Must not be NULL. The structure is used for subsequent unlock calls via dyad_release_flock().

Return values:

DYAD_RC_OK – The lock was successfully acquired.
DYAD_RC_BADFIO – The fcntl() call failed to acquire the lock.

Returns:

dyad_rc_t return code indicating the outcome.

dyad_rc_t dyad_shared_flock(const dyad_ctx_t *ctx, int fd, struct flock *lock)

Acquires a shared (read) lock on an open file descriptor.

Sets a POSIX read lock (F_RDLCK) over the entire file using fcntl() with F_SETLKW, blocking the caller until the lock is acquired. Multiple consumers holding shared locks on the same file may coexist, but a shared lock cannot be acquired while an exclusive lock is held, and vice versa.

If lock is NULL, the function returns without taking any action.

Parameters:

ctx – [in] DYAD context.
fd – [in] File descriptor of the open file to lock.
lock – [out] Pointer to a flock structure populated by this function. Must not be NULL. The structure is used for subsequent unlock calls via dyad_release_flock().

Return values:

DYAD_RC_OK – The shared lock was successfully acquired.
DYAD_RC_BADFIO – The fcntl() call failed to acquire the lock.

Returns:

dyad_rc_t return code indicating the outcome.

dyad_rc_t dyad_release_flock(const dyad_ctx_t *ctx, int fd, struct flock *lock)

Releases a lock previously acquired on an open file descriptor.

Clears a POSIX lock (F_UNLCK) over the entire file using fcntl() with F_SETLKW, releasing any lock (exclusive or shared) previously set by dyad_excl_flock() or dyad_shared_flock(). Other processes blocked on a lock acquisition for this file will be allowed to proceed.

If lock is NULL, the function returns without taking any action.

Parameters:

ctx – [in] DYAD context.
fd – [in] File descriptor of the open file to unlock.
lock – [inout] Pointer to the flock structure previously populated by dyad_excl_flock() or dyad_shared_flock(). Must not be NULL.

Return values:

DYAD_RC_OK – The lock was successfully released.
DYAD_RC_BADFIO – The fcntl() call failed to release the lock.

Returns:

dyad_rc_t return code indicating the outcome.
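The three locking functions share one pattern: fill a struct flock that covers the whole file and pass it to fcntl() with F_SETLKW. The sketch below shows that pattern with a single hypothetical helper, lock_whole_file; it returns the raw fcntl() result instead of a dyad_rc_t and does not take a DYAD context.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* type is F_WRLCK (exclusive), F_RDLCK (shared), or F_UNLCK (release).
 * Returns 0 on success and -1 on failure or when lock is NULL. */
int lock_whole_file(int fd, short type, struct flock *lock)
{
    if (lock == NULL)
        return -1;

    memset(lock, 0, sizeof *lock);
    lock->l_type = type;
    lock->l_whence = SEEK_SET;
    lock->l_start = 0;
    lock->l_len = 0; /* length 0 means "through the end of the file" */

    /* F_SETLKW blocks until a conflicting lock is released. */
    return fcntl(fd, F_SETLKW, lock);
}
```

A typical sequence is lock_whole_file(fd, F_WRLCK, &lk), the protected I/O, then lock_whole_file(fd, F_UNLCK, &lk). These are advisory, per-process locks, so they only coordinate processes that also take them.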

int sync_containing_dir(const char *path)

Runs fsync() on the directory containing the given path. For example, if path is "/a/b", fsync() is called on "/a". This cannot be used with DYAD interception.