Computational Complexity Analysis¶
1. Global Parameters¶
Symbol |
Meaning |
|---|---|
\(S\) |
Number of input sequences. |
\(M_s\) |
Number of enabled embedding models. |
\(L_s\) |
Number of selected layers (only affects storage). |
\(L_q\) |
Average sequence length. |
\(E_L\) |
Number of reference embeddings in the lookup table. |
\(C_{\text{embed}}\) |
Inference cost per sequence/model (depends on architecture and sequence length). |
\(C_{\text{dist}}\) |
Cost of computing the distance between two embeddings. |
\(p\) |
Effective parallelism factor (processors or GPU multiprocess). |
2. Stage A — Embedding Generation¶
A single embedding is generated for each sequence and model.
\(L_s\) does not affect compute cost — it only impacts I/O (layer dumps in HDF5).
The cost scales linearly with:
number of sequences \(S\),
number of models \(M_s\),
average sequence length \(L_q\),
and proportionally to model size and architecture.
Note
HDF5 writing cost is negligible compared to inference. The real bottleneck is the forward pass.
3. Stage B — Lookup & Distance Computation¶
where:
For each selected layer of each enabled model, the lookup operation must be executed independently.
This results in a multiplication over three dimensions:
number of query embeddings (\(E_q\)),
number of reference embeddings (\(E_L\)),
number of selected layers (\(L_s\)).
The lookup operation itself is implemented as a vectorized matrix multiplication between:
\[Q \in \mathbb{R}^{(E_q \times L_s) \times d} \quad \text{and} \quad R \in \mathbb{R}^{E_L \times d}\]where \(d\) is the embedding dimensionality.
This produces:
\[\text{Similarity matrix} \in \mathbb{R}^{(E_q \times L_s) \times E_L}\]Taxonomy or redundancy filters:
add a small constant overhead,
reduce the effective reference size \(E_L^{\text{eff}}\),
but do not change the asymptotic order.
If filters are applied:
and
where:
Note
Since each selected layer multiplies the number of lookup passes, \(L_s\) becomes a first-class scaling factor in lookup complexity. This is typically the dominant term when using multiple models and multiple layers per model.
4. Stage C — Post-processing¶
\(K\): number of retained neighbors per query embedding.
This stage does not scale quadratically — its cost depends on aggregation and annotation per neighbor.
As \(K\) increases, selection and downstream annotation become non-negligible.
Note
Post-processing is linear in \(K\) and can significantly contribute to runtime when \(K\) is large.
5. Overall Complexity¶
If \(E_L\) is large, the lookup term dominates.
If \(S\) is very large and \(E_L\) moderate, Stage A can become significant.
Stage C grows linearly with \(K\) and may meaningfully contribute to total runtime at high neighbor counts.
6. Dominant Factors per Stage¶
Stage |
Dominant factors |
Secondary influence |
Comment |
|---|---|---|---|
A — Embedding |
\(S\), \(M_s\), \(L_q\) |
Write I/O (negligible) |
Controlled by model/layer selection and batch size. |
B — Lookup |
\(S\), \(M_s\), \(E_L\), \(L_s\) |
Filters (asymptotically neutral) |
Quadratic scaling — main bottleneck. |
C — Post |
\(S\), \(M_s\), \(K\) |
— |
Linear and predictable. |
7. Practical Observations¶
Average sequence length impacts only Stage A.
Number of layers affects the lookup stage linearly — each selected layer multiplies the number of lookup operations.
Filters help reduce memory and absolute runtime, but do not change asymptotic complexity.
Parallelism is critical:
Stage A: batching + VRAM.
Stage B: vectorized GPU kernels.
High values of \(K\) also increase the cost of top-K selection and annotation in Stage C.
For large-scale experiments with millions of reference embeddings, and especially at large \(K\), Stage B (lookup) together with top-K selection becomes the asymptotic bottleneck.