Computational Complexity Analysis

1. Global Parameters

Symbol

Meaning

\(S\)

Number of input sequences.

\(M_s\)

Number of enabled embedding models.

\(L_s\)

Number of selected layers (only affects storage).

\(L_q\)

Average sequence length.

\(E_L\)

Number of reference embeddings in the lookup table.

\(C_{\text{embed}}\)

Inference cost per sequence/model (depends on architecture and sequence length).

\(C_{\text{dist}}\)

Cost of computing the distance between two embeddings.

\(p\)

Effective parallelism factor (processors or GPU multiprocess).

2. Stage A — Embedding Generation

\[\text{Cost A} \approx O\left(\frac{S \times M_s \times L_q \times C_{\text{embed}}}{p_{\text{embed}}}\right)\]
  • A single embedding is generated for each sequence and model.

  • \(L_s\) does not affect compute cost — it only impacts I/O (layer dumps in HDF5).

  • The cost scales linearly with:

    • number of sequences \(S\),

    • number of models \(M_s\),

    • average sequence length \(L_q\),

    • and proportionally to model size and architecture.

\[p_{\text{embed}} \text{ represents the effective speed-up from batching, GPU cores, VRAM and I/O throughput.}\]

Note

HDF5 writing cost is negligible compared to inference. The real bottleneck is the forward pass.

3. Stage B — Lookup & Distance Computation

\[\text{Lookup cost} \approx O\left(\frac{E_q \times E_L \times L_s \times C_{\text{dist}}}{p_{\text{lookup}}}\right)\]

where:

\[E_q = S \times M_s\]
  • For each selected layer of each enabled model, the lookup operation must be executed independently.

  • This results in a multiplication over three dimensions:

    • number of query embeddings (\(E_q\)),

    • number of reference embeddings (\(E_L\)),

    • number of selected layers (\(L_s\)).

  • The lookup operation itself is implemented as a vectorized matrix multiplication between:

    \[Q \in \mathbb{R}^{(E_q \times L_s) \times d} \quad \text{and} \quad R \in \mathbb{R}^{E_L \times d}\]

    where \(d\) is the embedding dimensionality.

    This produces:

    \[\text{Similarity matrix} \in \mathbb{R}^{(E_q \times L_s) \times E_L}\]
  • Taxonomy or redundancy filters:

    • add a small constant overhead,

    • reduce the effective reference size \(E_L^{\text{eff}}\),

    • but do not change the asymptotic order.

If filters are applied:

\[E_L^{\text{eff}} = f(E_L) \quad \text{with } f(E_L) \leq E_L\]

and

\[\text{Lookup cost} \approx O\left(\frac{E_q \times E_L \times L_s \times C_{\text{dist}}}{p_{\text{lookup}}}\right)\]

where:

\[E_q = S \times M_s\]
\[p_{\text{lookup}} \text{ captures the effective GPU acceleration for the vectorized kernel.}\]

Note

Since each selected layer multiplies the number of lookup passes, \(L_s\) becomes a first-class scaling factor in lookup complexity. This is typically the dominant term when using multiple models and multiple layers per model.

4. Stage C — Post-processing

\[\text{Cost C} \approx O\left(E_q \times K \times C_{\text{collapse}}\right)\]
  • \(K\): number of retained neighbors per query embedding.

  • This stage does not scale quadratically — its cost depends on aggregation and annotation per neighbor.

  • As \(K\) increases, selection and downstream annotation become non-negligible.

Note

Post-processing is linear in \(K\) and can significantly contribute to runtime when \(K\) is large.

5. Overall Complexity

\[T_{\text{total}} \approx O\left(\frac{S \times M_s \times L_q \times C_{\text{embed}}}{p_{\text{embed}}}\right) + O\left(\frac{S \times M_s \times L_s \times E_L \times C_{\text{dist}}}{p_{\text{lookup}}}\right) + O\left(S \times M_s \times K \times C_{\text{collapse}}\right)\]
  • If \(E_L\) is large, the lookup term dominates.

  • If \(S\) is very large and \(E_L\) moderate, Stage A can become significant.

  • Stage C grows linearly with \(K\) and may meaningfully contribute to total runtime at high neighbor counts.

6. Dominant Factors per Stage

Stage

Dominant factors

Secondary influence

Comment

A — Embedding

\(S\), \(M_s\), \(L_q\)

Write I/O (negligible)

Controlled by model/layer selection and batch size.

B — Lookup

\(S\), \(M_s\), \(E_L\), \(L_s\)

Filters (asymptotically neutral)

Quadratic scaling — main bottleneck.

C — Post

\(S\), \(M_s\), \(K\)

Linear and predictable.

7. Practical Observations

  • Average sequence length impacts only Stage A.

  • Number of layers affects the lookup stage linearly — each selected layer multiplies the number of lookup operations.

  • Filters help reduce memory and absolute runtime, but do not change asymptotic complexity.

  • Parallelism is critical:

    • Stage A: batching + VRAM.

    • Stage B: vectorized GPU kernels.

  • High values of \(K\) also increase the cost of top-K selection and annotation in Stage C.

  • For large-scale experiments with millions of reference embeddings, and especially at large \(K\), Stage B (lookup) together with top-K selection becomes the asymptotic bottleneck.