Computational Complexity Analysis¶

1. Global Parameters¶

Symbol	Meaning
\(S\)	Number of input sequences.
\(M_s\)	Number of enabled embedding models.
\(L_s\)	Number of selected layers (only affects storage).
\(L_q\)	Average sequence length.
\(E_L\)	Number of reference embeddings in the lookup table.
\(C_{\text{embed}}\)	Inference cost per sequence/model (depends on architecture and sequence length).
\(C_{\text{dist}}\)	Cost of computing the distance between two embeddings.
\(p\)	Effective parallelism factor (processors or GPU multiprocess).

2. Stage A — Embedding Generation¶

\[\text{Cost A} \approx O\left(\frac{S \times M_s \times L_q \times C_{\text{embed}}}{p_{\text{embed}}}\right)\]

A single embedding is generated for each sequence and model.
\(L_s\) does not affect compute cost — it only impacts I/O (layer dumps in HDF5).
The cost scales linearly with:
- number of sequences \(S\),
- number of models \(M_s\),
- average sequence length \(L_q\),
- and proportionally to model size and architecture.

\[p_{\text{embed}} \text{ represents the effective speed-up from batching, GPU cores, VRAM and I/O throughput.}\]

Note

HDF5 writing cost is negligible compared to inference. The real bottleneck is the forward pass.

3. Stage B — Lookup & Distance Computation¶

\[\text{Lookup cost} \approx O\left(\frac{E_q \times E_L \times L_s \times C_{\text{dist}}}{p_{\text{lookup}}}\right)\]

where:

\[E_q = S \times M_s\]

For each selected layer of each enabled model, the lookup operation must be executed independently.
This results in a multiplication over three dimensions:
- number of query embeddings (\(E_q\)),
- number of reference embeddings (\(E_L\)),
- number of selected layers (\(L_s\)).
The lookup operation itself is implemented as a vectorized matrix multiplication between:

\[Q \in \mathbb{R}^{(E_q \times L_s) \times d} \quad \text{and} \quad R \in \mathbb{R}^{E_L \times d}\]

where \(d\) is the embedding dimensionality.

This produces:

\[\text{Similarity matrix} \in \mathbb{R}^{(E_q \times L_s) \times E_L}\]
Taxonomy or redundancy filters:
- add a small constant overhead,
- reduce the effective reference size \(E_L^{\text{eff}}\),
- but do not change the asymptotic order.

If filters are applied:

\[E_L^{\text{eff}} = f(E_L) \quad \text{with } f(E_L) \leq E_L\]

and

\[\text{Lookup cost} \approx O\left(\frac{E_q \times E_L \times L_s \times C_{\text{dist}}}{p_{\text{lookup}}}\right)\]

where:

\[E_q = S \times M_s\]

\[p_{\text{lookup}} \text{ captures the effective GPU acceleration for the vectorized kernel.}\]

Note

Since each selected layer multiplies the number of lookup passes, \(L_s\) becomes a first-class scaling factor in lookup complexity. This is typically the dominant term when using multiple models and multiple layers per model.

4. Stage C — Post-processing¶

\[\text{Cost C} \approx O\left(E_q \times K \times C_{\text{collapse}}\right)\]

\(K\): number of retained neighbors per query embedding.
This stage does not scale quadratically — its cost depends on aggregation and annotation per neighbor.
As \(K\) increases, selection and downstream annotation become non-negligible.

Note

Post-processing is linear in \(K\) and can significantly contribute to runtime when \(K\) is large.

5. Overall Complexity¶

\[T_{\text{total}} \approx O\left(\frac{S \times M_s \times L_q \times C_{\text{embed}}}{p_{\text{embed}}}\right) + O\left(\frac{S \times M_s \times L_s \times E_L \times C_{\text{dist}}}{p_{\text{lookup}}}\right) + O\left(S \times M_s \times K \times C_{\text{collapse}}\right)\]

If \(E_L\) is large, the lookup term dominates.
If \(S\) is very large and \(E_L\) moderate, Stage A can become significant.
Stage C grows linearly with \(K\) and may meaningfully contribute to total runtime at high neighbor counts.

6. Dominant Factors per Stage¶

Stage	Dominant factors	Secondary influence	Comment
A — Embedding	\(S\), \(M_s\), \(L_q\)	Write I/O (negligible)	Controlled by model/layer selection and batch size.
B — Lookup	\(S\), \(M_s\), \(E_L\), \(L_s\)	Filters (asymptotically neutral)	Quadratic scaling — main bottleneck.
C — Post	\(S\), \(M_s\), \(K\)	—	Linear and predictable.

7. Practical Observations¶

Average sequence length impacts only Stage A.
Number of layers affects the lookup stage linearly — each selected layer multiplies the number of lookup operations.
Filters help reduce memory and absolute runtime, but do not change asymptotic complexity.
Parallelism is critical:
- Stage A: batching + VRAM.
- Stage B: vectorized GPU kernels.
High values of \(K\) also increase the cost of top-K selection and annotation in Stage C.
For large-scale experiments with millions of reference embeddings, and especially at large \(K\), Stage B (lookup) together with top-K selection becomes the asymptotic bottleneck.