ggml_backend_sched_t typedef

ggml_backend_sched_t = Pointer<ggml_backend_sched>

The backend scheduler allows multiple backend devices to be used together. It handles compute buffer allocation, assignment of tensors to backends, and copying of tensors between backends. The backends are selected based on:

  • the backend that supports the operation
  • the location of the pre-allocated tensors (e.g. the weights)

Example usage:

    // operations that use tensors allocated in a buffer with USAGE_WEIGHTS will be assigned
    // preferably to run on the same backend as the buffer
    ggml_backend_buffer_set_usage(buf_weights, GGML_BACKEND_BUFFER_USAGE_WEIGHTS);

    sched = ggml_backend_sched_new({backend_gpu, backend_gpu2, backend_cpu}, NULL, num_backends, GGML_DEFAULT_GRAPH_SIZE, false);

    // initialize buffers from a max size graph (optional)
    reserve_graph = build_graph(sched, max_batch_size);

    // manually assign nodes to a backend (optional, should not be needed in most cases)
    struct ggml_tensor * node = ggml_mul_mat(ctx, ...);
    ggml_backend_sched_set_tensor_backend(sched, node, backend_gpu);

    ggml_backend_sched_reserve(sched, reserve_graph);

    // compute
    graph = build_graph(sched); // the graph and its tensors are single-use in terms of allocation, multi-use in terms of computation
    for (int i = 0; i < 10; ++i) {
        ggml_backend_sched_graph_compute(sched, graph); // on the first iteration the graph is allocated automatically
    }

    // if there are graph inputs:
    graph = build_graph(sched); // get a new graph that is not allocated (the metadata for the old graph is freed once ggml_free is called)
    ggml_backend_sched_reset(sched); // clear the allocation of the previous graph
    ggml_backend_sched_alloc_graph(sched, graph); // explicitly allocate the new graph but do not execute it
    ggml_backend_tensor_set(input_tensor, ...); // copy data to the newly allocated graph tensors
    ggml_backend_sched_graph_compute(sched, graph); // execute the graph

    // as an alternative to the above it is also possible to assign the inputs to a dedicated
    // context and allocate them statically via ggml_backend_alloc_ctx_tensors

Implementation

typedef ggml_backend_sched_t = ffi.Pointer<ggml_backend_sched>;