# Defining User Q-Functions

An important feature of LibCEED.jl is the ability to define user Q-functions natively in Julia. These user Q-functions work with both the CPU and CUDA backends.

User Q-functions describe the action of the $D$ operator at quadrature points (see libCEED's theoretical framework). Since the Q-functions are invoked at every quadrature point, efficiency is very important.

## Apply mass Q-function in C

Before describing how to define user Q-functions in Julia, we will briefly given an example of a user Q-function defined in C. This is the "apply mass" Q-function from ex1-volume.c, which computes the action of the mass operator. The mass operator on each element can be written as $B^\intercal D B$, where $B$ is the basis operator, and $D$ represents multiplication by quadrature weights and geometric factors (i.e. the determinant of the mesh transformation Jacobian at each qudarture point). It is the action of $D$ that the Q-function must implement. The C source of the Q-function is:

/// libCEED Q-function for applying a mass operator
CEED_QFUNCTION(f_apply_mass)(void *ctx, const CeedInt Q,
const CeedScalar *const *in,
CeedScalar *const *out) {
const CeedScalar *u = in[0], *qdata = in[1];
CeedScalar *v = out[0];
// Quadrature Point Loop
CeedPragmaSIMD
for (CeedInt i=0; i<Q; i++) {
v[i] = qdata[i] * u[i];
} // End of Quadrature Point Loop
return 0;
}

From this example, we see that a user Q-function is a C callback that takes a "data context" pointer, a number of quadrature points, and two arrays of arrays, one for inputs, and one for outputs.

In this example, the first input array is u, which is the value of the trial function evaluated at each quadrature point. The second input array is qdata, which contains the precomputed geometric factors. There is only one output array, v, which will store the pointwise product of u and data. Given the definition of this Q-function, the CeedQFunction object is created by

CeedQFunctionCreateInterior(ceed, 1, f_apply_mass, f_apply_mass_loc, &apply_qfunc);
CeedQFunctionAddInput(apply_qfunc, "u", 1, CEED_EVAL_INTERP);
CeedQFunctionAddInput(apply_qfunc, "qdata", 1, CEED_EVAL_NONE);
CeedQFunctionAddOutput(apply_qfunc, "v", 1, CEED_EVAL_INTERP);

When adding the inputs and outputs, CEED_EVAL_INTERP indicates that the $B$ basis operator should be used to interpolate the trial and test functions from nodal points to quadrature points, and CEED_EVAL_NONE indicates that the qdata is already precomputed at quadrature points, and no interpolation is requried.

## Apply mass Q-function in Julia

We now replicate this Q-function in Julia. The main way of defining user Q-functions in Julia is using the @interior_qf macro. The above C code (both the definition of the Q-function, its creation, and adding the inputs and outputs) is analogous to the following Julia code:

@interior_qf apply_qfunc = (
ceed, Q,
(u, :in, EVAL_INTERP, Q), (qdata, :in, EVAL_NONE, Q),
(v, :out, EVAL_INTERP, Q),
@inbounds @simd for i=1:Q
v[i] = qdata[i]*u[i]
end
)

This creates a QFunction object named apply_qfunc. The Q-function is defined by the tuple on the right-hand side. ceed is the name of the Ceed object where the Q-function will be created, and the second argument, Q, is the name of that variable that will contain the number of quadrature points. The next three arguments are specifications of the input and output fields:

    (u, :in, EVAL_INTERP, Q),
(qdata, :in, EVAL_NONE, Q),
(v, :out, EVAL_INTERP, Q),

Each input or output field specification is a tuple, where the first entry is the name of the array, and the second entry is either :in or :out, according to whether the array is an input or output array. The third entry is the EvalMode of the field. The remaining entries are the dimensions of the array. The first dimension is always equal to the number of quadrature points. In this case, all the arrays are simply vectors whose size is equal to the number of quadrature points, but in more sophisticated examples (e.g. the apply diffusion Q-function) these arrays could consists of vectors or matrices at each quadrature point. After providing all of the array specifications, the body of the Q-function is provided.

## Apply diffusion Q-function in Julia

For a more sophisticated example of a Q-function, we consider the "apply diffusion" Q-function, used in ex2-surface. This Q-function computes the action of the diffusion operator. When written in the form $B^\intercal D B$, in this case $B$ represents the basis gradient matrix, and $D$ represents multiplication by $w \det(J) J^{-\intercal} J^{-1}$, where $J$ is the mesh transformation Jacobian, and $w$ is the quadrature weight.

This Q-function is implemented in Julia as follows:

@interior_qf apply_qfunc = (
ceed, Q, dim=dim,
(du, :in, EVAL_GRAD, Q, dim),
(qdata, :in, EVAL_NONE, Q, dim*(dim+1)÷2),
(dv, :out, EVAL_GRAD, Q, dim),
@inbounds @simd for i=1:Q
dXdxdXdxT = getvoigt(@view(qdata[i,:]), CeedDim(dim))
dui = SVector{dim}(@view(du[i,:]))
dv[i,:] .= dXdxdXdxT*dui
end
)

In contrast to the previous example, before the field specifications, this Q-function includes a constant definition dim=dim. The @interior_qf macro allows for any number of constant definitions, which make the specified values available within the body of the Q-function as compile-time constants.

In this example, dim is either 1, 2, or 3 according to the spatial dimension of the problem. When the user Q-function is defined, LibCEED.jl will JIT compile the body of the Q-function and make it available to libCEED as a C callback. In the body of this Q-function, dim will be available, and its value will be a compile-time constant, allowing for (static) dispatch based on the value of dim, and eliminating branching.

Note that dim is also available for use in the field specifications. In this example, the field specifications are slightly more involved that in the previous example. The arrays are given by

    (du, :in, EVAL_GRAD, Q, dim),
(qdata, :in, EVAL_NONE, Q, dim*(dim+1)÷2),
(dv, :out, EVAL_GRAD, Q, dim),

Note that the input array du has EvalMode EVAL_GRAD, meaning that this array stores the gradient of the trial function at each quadrature point. Therefore, at each quadrature point, du stores a vector of length dim, and so the shape of du is (Q, dim). Similarly, the action of $D$ is given by $w \det(J) J^{-\intercal} J^{-1} \nabla u$, which is also a vector of length dim at each quadrature point. This means that the output array dv also has shape (Q, dim).

The geometric factors stored in qdata represent the symmetric matrix $w \det(J) J^{-\intercal} J^{-1}$ evaluated at every quadrature point. In order to reduce data usage, instead of storing this data as a $d \times d$ matrix, we use the fact that we know it is symmetric to only store $d(d+1)/2$ entries, and the remaining entries we infer by symmetry. These entries are stored using the Voigt convention. LibCEED.jl provides some utilities for storing and extracting symmetric matrices stored in this fashion.

After the field specifications, we have the body of the Q-function:

@inbounds @simd for i=1:Q
dXdxdXdxT = getvoigt(@view(qdata[i,:]), CeedDim(dim))
dui = SVector{dim}(@view(du[i,:]))
dv[i,:] .= dXdxdXdxT*dui
end

First, the matrix $w \det(J) J^{-\intercal} J^{-1}$ is stored in the variable dXdxdXdxT. The symmetric entries of this matrix are accesed using @view(qdata[i,:]), which avoids allocations. getvoigt is used to convert from Voigt notation to a symmetric matrix, which returns a statically sized SMatrix. The version for the correct spatial dimension is selected using CeedDim(dim), which allows for compile-time dispatch, since dim is a constant whose value is known as a constant when the Q-function is JIT compiled.

Then, the gradient of $u$ at the given quadrature point is loaded as a fixed-size SVector. The result is placed into the output array, where the StaticArrays.jl package evaluates dXdxdXdxT*dui using an optimized matrix-vector product for small matrices (since their sizes are known statically).

## GPU Kernels

If the Ceed resource uses a CUDA backend, then the user Q-functions defined using @interior_qf are automatically compiled as CUDA kernels using CUDA.jl. Some Julia features are not available in GPU code (for example, dynamic dispatch), so if the Q-function is intended to be run on the GPU, the user should take care when defining the body of the user Q-function.