Utilities

Helper functions and utilities.

Search

Search utilities for VSA codebook operations.

This module provides search functions for finding hypervectors in codebooks, including k-nearest neighbors, threshold-based search, and batch similarity computation.

Key Features:
  • K-nearest neighbors (K-NN) search

  • Threshold-based retrieval

  • Vectorized batch similarity computation

  • Efficient codebook operations

Based on:

Standard VSA search operations for associative memory and content-addressable storage.

References

Kanerva (2009): Hyperdimensional Computing

Plate (2003): Holographic Reduced Representations

holovec.utils.search.nearest_neighbors(query: Any, codebook: Dict[str, Any], model: VSAModel, k: int = 5, return_similarities: bool = True) -> Tuple[List[str], List[float] | None]

Find k-nearest neighbors in codebook.

Computes similarity between query and all codebook entries, returning the k entries with highest similarity.

Parameters:
  • query – Query hypervector

  • codebook – Dictionary mapping labels to hypervectors

  • model – VSA model for similarity computation

  • k – Number of neighbors to return (default: 5)

  • return_similarities – If True, return similarities (default: True)

Returns:

Tuple of (labels, similarities):

  • labels: List of k labels sorted by similarity (highest first)

  • similarities: List of k similarities if return_similarities=True, otherwise None

Raises:
  • TypeError – If arguments are not correct types

  • ValueError – If k < 1, k > codebook size, or codebook is empty

Examples

>>> # Find 5 nearest neighbors
>>> labels, sims = nearest_neighbors(query, codebook, model, k=5)
>>> for label, sim in zip(labels, sims):
...     print(f"{label}: {sim:.3f}")
>>>
>>> # Get only labels
>>> labels, _ = nearest_neighbors(
...     query, codebook, model, k=3, return_similarities=False
... )

References

Kanerva (2009): Hyperdimensional computing and associative memory
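To illustrate the ranking step only, the following self-contained sketch scores a query against every codebook entry and keeps the k best. It assumes plain Python lists as hypervectors and cosine similarity as the measure; `cosine` and `knn_sketch` are hypothetical names that are not part of holovec, and the library's own implementation may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_sketch(
    query: Sequence[float], codebook: Dict[str, Sequence[float]], k: int = 5
) -> Tuple[List[str], List[float]]:
    """Return the k labels most similar to the query, best first."""
    if not codebook:
        raise ValueError("codebook is empty")
    if not 1 <= k <= len(codebook):
        raise ValueError("k must be between 1 and the codebook size")
    # Score every entry, then sort by similarity in descending order.
    scored = [(label, cosine(query, vec)) for label, vec in codebook.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    top = scored[:k]
    return [label for label, _ in top], [sim for _, sim in top]


# Toy codebook (illustrative values, not real hypervectors).
codebook = {
    "x_axis": [1.0, 0.0, 0.0],
    "y_axis": [0.0, 1.0, 0.0],
    "diagonal": [1.0, 1.0, 0.0],
}
labels, sims = knn_sketch([1.0, 0.1, 0.0], codebook, k=2)
```

A full sort costs O(n log n) in the codebook size; for small k on large codebooks, a heap-based selection would be a reasonable alternative.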

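holovec.utils.search.threshold_search(query, codebook, model, threshold=0.8, return_similarities=True)

Find all codebook entries at or above a similarity threshold.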

Returns all entries where similarity(query, entry) >= threshold, sorted by similarity (highest first).

Parameters:
  • query – Query hypervector

  • codebook – Dictionary mapping labels to hypervectors

  • model – VSA model for similarity computation

  • threshold – Minimum similarity threshold (default: 0.8)

  • return_similarities – If True, return similarities (default: True)

Returns:

Tuple of (labels, similarities):

  • labels: List of labels above threshold, sorted by similarity

  • similarities: List of similarities if return_similarities=True, otherwise None

Raises:
  • TypeError – If arguments are not correct types

  • ValueError – If threshold not in [0.0, 1.0] or codebook is empty

Examples

>>> # Find all matches above 0.9 similarity
>>> labels, sims = threshold_search(
...     query, codebook, model, threshold=0.9
... )
>>> print(f"Found {len(labels)} matches")
>>>
>>> # Lenient threshold
>>> labels, _ = threshold_search(
...     query, codebook, model, threshold=0.5,
...     return_similarities=False
... )

References

Standard associative memory retrieval operation
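The filtering rule described above (keep entries with similarity >= threshold, best first) can be sketched without the library. This is an assumption-laden illustration: vectors are plain lists, similarity is cosine, and `threshold_sketch` is a hypothetical helper rather than holovec code.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def threshold_sketch(
    query: Sequence[float],
    codebook: Dict[str, Sequence[float]],
    threshold: float = 0.8,
) -> Tuple[List[str], List[float]]:
    """Return all labels whose similarity to the query is >= threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0.0, 1.0]")
    if not codebook:
        raise ValueError("codebook is empty")
    hits = [(label, cosine(query, vec)) for label, vec in codebook.items()]
    hits = [(label, sim) for label, sim in hits if sim >= threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in hits], [sim for _, sim in hits]


# Small bipolar codebook (illustrative values).
codebook = {
    "a": [1.0, 1.0, 1.0, 1.0],
    "b": [1.0, 1.0, 1.0, -1.0],
    "c": [-1.0, -1.0, 1.0, 1.0],
}
query = [1.0, 1.0, 1.0, 1.0]
strict_labels, _ = threshold_sketch(query, codebook, threshold=0.9)
lenient_labels, lenient_sims = threshold_sketch(query, codebook, threshold=0.5)
```

Unlike k-NN, the result length is not fixed: a strict threshold may return an empty list, which callers should handle.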

holovec.utils.search.batch_similarity(queries: List[Any], codebook: Dict[str, Any], model: VSAModel) -> List[Dict[str, float]]

Compute similarities between multiple queries and codebook.

Efficiently computes similarity between each query and all codebook entries, returning results as a list of dictionaries.

Parameters:
  • queries – List of query hypervectors

  • codebook – Dictionary mapping labels to hypervectors

  • model – VSA model for similarity computation

Returns:

List of dictionaries, one per query, mapping labels to similarities

Raises:
  • TypeError – If arguments are not correct types

  • ValueError – If queries is empty or codebook is empty

Examples

>>> # Batch process multiple queries
>>> results = batch_similarity([q1, q2, q3], codebook, model)
>>> for i, sims in enumerate(results):
...     print(f"Query {i}:")
...     best_label = max(sims, key=sims.get)
...     print(f"  Best: {best_label} ({sims[best_label]:.3f})")
>>>
>>> # Find best match for each query
>>> for query_sims in results:
...     best = max(query_sims.items(), key=lambda x: x[1])
...     print(f"Best: {best[0]} with similarity {best[1]:.3f}")

References

Vectorized operations for efficient batch processing
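The shape of the result (one label-to-similarity dictionary per query) can be seen in a small stand-alone sketch. It uses nested loops and cosine similarity on plain lists purely for clarity; `batch_similarity_sketch` is a hypothetical name, and a vectorized implementation would more likely stack the codebook into a matrix.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def batch_similarity_sketch(
    queries: List[Sequence[float]], codebook: Dict[str, Sequence[float]]
) -> List[Dict[str, float]]:
    """Return one {label: similarity} dictionary per query."""
    if not queries:
        raise ValueError("queries is empty")
    if not codebook:
        raise ValueError("codebook is empty")
    return [
        {label: cosine(query, vec) for label, vec in codebook.items()}
        for query in queries
    ]


codebook = {"up": [0.0, 1.0], "right": [1.0, 0.0]}
results = batch_similarity_sketch([[0.1, 1.0], [1.0, 0.2]], codebook)
# Pick the best-scoring label for each query.
best = [max(sims, key=sims.get) for sims in results]
```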

holovec.utils.search.segment_pattern(vec: Any, space: SparseSegmentSpace) -> List[int]

Return per-segment argmax indices (length S) for a vector.

Projects vec to the nearest valid segment pattern via space.normalize(), then returns the index of the active bit per segment.
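A rough picture of the per-segment argmax, assuming the vector is a flat list split into S contiguous, equal-length segments (this layout is an assumption, as is the helper name `segment_pattern_sketch`; the real function delegates to space.normalize()):

```python
from typing import List, Sequence


def segment_pattern_sketch(vec: Sequence[float], n_segments: int) -> List[int]:
    """Return the index of the largest value within each segment."""
    if n_segments < 1 or len(vec) % n_segments != 0:
        raise ValueError("vector length must be a multiple of n_segments")
    seg_len = len(vec) // n_segments
    pattern = []
    for s in range(n_segments):
        segment = vec[s * seg_len:(s + 1) * seg_len]
        # Index of the maximum within this segment; ties resolve to the first.
        pattern.append(max(range(seg_len), key=segment.__getitem__))
    return pattern


# Nine dimensions split into three segments of length three.
vec = [0.1, 0.9, 0.0, 0.7, 0.2, 0.1, 0.0, 0.0, 1.0]
pattern = segment_pattern_sketch(vec, n_segments=3)
```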

holovec.utils.search.find_by_segment_pattern(codebook: Dict[str, Any], space: SparseSegmentSpace, pattern: List[int | None], match_mode: str = 'exact', min_fraction: float = 1.0) -> List[Tuple[str, float]]

Find entries whose segment pattern matches the query pattern.

  • pattern: list of length S with segment indices or None/-1 as wildcards.

  • match_mode:
    • ‘exact’: all specified segments must match; returns [(label, 1.0), …]

    • ‘fraction’: return fraction of matching specified segments, filter by min_fraction

Returns a list of (label, score) tuples sorted by score in descending order.
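The two match modes and the wildcard handling can be sketched on precomputed patterns. The sketch below assumes each codebook entry has already been reduced to a list of per-segment indices; `match_patterns_sketch` is a hypothetical helper, not the library function.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def match_patterns_sketch(
    patterns: Dict[str, Sequence[int]],
    query: Sequence[Optional[int]],
    match_mode: str = "exact",
    min_fraction: float = 1.0,
) -> List[Tuple[str, float]]:
    """Score stored patterns against a query with None/-1 wildcards."""
    if match_mode not in ("exact", "fraction"):
        raise ValueError("match_mode must be 'exact' or 'fraction'")
    # Only positions that are not wildcards take part in matching.
    specified = [i for i, idx in enumerate(query) if idx is not None and idx != -1]
    results = []
    for label, pattern in patterns.items():
        hits = sum(1 for i in specified if pattern[i] == query[i])
        fraction = hits / len(specified) if specified else 1.0
        if match_mode == "exact":
            if hits == len(specified):
                results.append((label, 1.0))
        elif fraction >= min_fraction:
            results.append((label, fraction))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


patterns = {"a": [1, 0, 2], "b": [1, 3, 2], "c": [0, 3, 1]}
# Segment 1 is a wildcard, so both "a" and "b" match exactly.
exact = match_patterns_sketch(patterns, [1, None, 2])
# Fraction mode keeps entries matching at least half of the segments.
partial = match_patterns_sketch(patterns, [1, 3, 2], "fraction", min_fraction=0.5)
```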

Operations

General utility operations for VSA systems.

This module provides general-purpose operations for hypervector manipulation and analysis, including top-k selection, noise injection, and similarity matrix computation.

Key Features:
  • Top-k selection from scored collections

  • Controlled noise injection for robustness testing

  • Pairwise similarity matrix computation

  • Support for various VSA operations

References

Kanerva (2009): Hyperdimensional Computing

Plate (2003): Holographic Reduced Representations

holovec.utils.operations.select_top_k(items: Dict[str, float], k: int = 5) -> List[Tuple[str, float]]

Select top-k items by score.

Sorts items by score (descending) and returns the top k items as (label, score) tuples.

Parameters:
  • items – Dictionary mapping labels to scores

  • k – Number of items to select (default: 5)

Returns:

List of (label, score) tuples sorted by score (highest first)

Raises:
  • TypeError – If arguments are not correct types

  • ValueError – If k < 1, k > items size, or items is empty

Examples

>>> # Select top 3 by similarity
>>> scores = {'a': 0.95, 'b': 0.87, 'c': 0.92, 'd': 0.75}
>>> top = select_top_k(scores, k=3)
>>> print(top)
[('a', 0.95), ('c', 0.92), ('b', 0.87)]
>>>
>>> # Get just the labels
>>> labels = [label for label, _ in select_top_k(scores, k=2)]
>>> print(labels)
['a', 'c']

References

Standard selection operation for ranked retrieval
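As a point of comparison, the same selection can be written with the standard library's heapq.nlargest, which avoids a full sort when k is much smaller than the number of items. `top_k_sketch` is a hypothetical name; whether holovec sorts or uses a heap internally is not stated here.

```python
import heapq
from typing import Dict, List, Tuple


def top_k_sketch(items: Dict[str, float], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (label, score) pairs, best first."""
    if not items:
        raise ValueError("items is empty")
    if not 1 <= k <= len(items):
        raise ValueError("k must be between 1 and the number of items")
    # nlargest returns results already ordered from largest to smallest.
    return heapq.nlargest(k, items.items(), key=lambda pair: pair[1])


scores = {"a": 0.95, "b": 0.87, "c": 0.92, "d": 0.75}
top = top_k_sketch(scores, k=3)
```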

holovec.utils.operations.add_noise(vector: Any, model: VSAModel, noise_level: float = 0.1, seed: int = None) -> Any

Add controlled noise to a hypervector.

Adds noise by bundling the original vector with a random vector, weighted by noise_level. Useful for testing robustness and approximate matching.

Parameters:
  • vector – Original hypervector

  • model – VSA model for random generation and bundling

  • noise_level – Proportion of noise to add (0.0 = none, 1.0 = full) (default: 0.1)

  • seed – Random seed for reproducibility (default: None)

Returns:

Noisy hypervector

Raises:
  • TypeError – If arguments are not correct types

  • ValueError – If noise_level not in [0.0, 1.0]

Examples

>>> # Add 10% noise
>>> noisy = add_noise(original, model, noise_level=0.1)
>>> sim = model.similarity(original, noisy)
>>> print(f"Similarity after noise: {sim:.3f}")
>>>
>>> # Heavy noise for stress testing
>>> very_noisy = add_noise(original, model, noise_level=0.5)
>>>
>>> # Reproducible noise
>>> noisy1 = add_noise(original, model, noise_level=0.2, seed=42)
>>> noisy2 = add_noise(original, model, noise_level=0.2, seed=42)
>>> # noisy1 and noisy2 are identical

References

Robustness testing in Kanerva (2009) and related work
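One way to picture weighted noise injection is a convex blend of the original vector with a seeded random vector. The weighting below, (1 - noise_level) * original + noise_level * noise, is an assumption made for illustration; the library bundles through the model, and the exact weighting may differ. `add_noise_sketch` and `cosine` are hypothetical helpers.

```python
import math
import random
from typing import List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def add_noise_sketch(
    vector: Sequence[float], noise_level: float = 0.1, seed: Optional[int] = None
) -> List[float]:
    """Blend a vector with seeded Gaussian noise (illustrative weighting)."""
    if not 0.0 <= noise_level <= 1.0:
        raise ValueError("noise_level must be in [0.0, 1.0]")
    rng = random.Random(seed)  # seed=None falls back to system entropy
    noise = [rng.gauss(0.0, 1.0) for _ in vector]
    return [(1.0 - noise_level) * v + noise_level * n for v, n in zip(vector, noise)]


# Random bipolar vector standing in for a hypervector.
source_rng = random.Random(0)
original = [source_rng.choice([-1.0, 1.0]) for _ in range(1000)]
light = add_noise_sketch(original, noise_level=0.1, seed=42)
heavy = add_noise_sketch(original, noise_level=0.5, seed=42)
repeat = add_noise_sketch(original, noise_level=0.1, seed=42)
sim_light = cosine(original, light)
sim_heavy = cosine(original, heavy)
```

With a fixed seed the output is reproducible, and similarity to the original drops as noise_level grows.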

holovec.utils.operations.similarity_matrix(vectors: List[Any], model: VSAModel, labels: List[str] = None) -> ndarray

Compute pairwise similarity matrix.

Computes similarity between all pairs of vectors, returning an n×n similarity matrix where entry (i,j) is similarity(vectors[i], vectors[j]).

Parameters:
  • vectors – List of hypervectors

  • model – VSA model for similarity computation

  • labels – Optional labels for vectors (for reference, not used in computation)

Returns:

NxN numpy array of pairwise similarities

Raises:
  • TypeError – If arguments are not correct types

  • ValueError – If vectors is empty or labels length doesn’t match

Examples

>>> import numpy as np
>>> # Compute similarity matrix
>>> vectors = [model.random(seed=i) for i in range(5)]
>>> sim_matrix = similarity_matrix(vectors, model)
>>> print(f"Shape: {sim_matrix.shape}")
Shape: (5, 5)
>>> print(f"Diagonal (self-similarity): {np.diag(sim_matrix)}")
>>>
>>> # With labels for interpretation
>>> labels = ['cat', 'dog', 'bird', 'fish', 'snake']
>>> sim_matrix = similarity_matrix(vectors, model, labels)
>>> # Most similar pair (excluding self-similarity)
>>> np.fill_diagonal(sim_matrix, -np.inf)
>>> i, j = np.unravel_index(np.argmax(sim_matrix), sim_matrix.shape)
>>> print(f"Most similar: {labels[i]} - {labels[j]}")

References

Standard analysis tool for VSA systems
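The pairwise computation can be sketched in pure Python. This version assumes a symmetric similarity measure (cosine on plain lists), so it fills only the upper triangle and mirrors it; `similarity_matrix_sketch` is a hypothetical helper that returns nested lists rather than a NumPy array.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_matrix_sketch(vectors: List[Sequence[float]]) -> List[List[float]]:
    """Return an n x n matrix where entry (i, j) is cosine(vectors[i], vectors[j])."""
    n = len(vectors)
    if n == 0:
        raise ValueError("vectors is empty")
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sim = cosine(vectors[i], vectors[j])
            matrix[i][j] = sim
            matrix[j][i] = sim  # symmetry: sim(i, j) == sim(j, i)
    return matrix


vectors = [[1.0, 1.0, -1.0, -1.0], [1.0, -1.0, 1.0, -1.0], [1.0, 1.0, 1.0, -1.0]]
matrix = similarity_matrix_sketch(vectors)
```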

CPSE/CPSD

Context-preserving encoding and decoding.

CPSE/CPSD utilities for context-preserving compositional encoding.

This module provides utilities for Context-Preserving SDR Encoding (CPSE) and Context-Preserving SDR Decoding (CPSD), which build on Context-Dependent Thinning (CDT) and are intended to improve on it.

Key Features:
  • Order preservation via position permutations

  • Stable convergence (1.95% ± 0.15% error)

  • Fast convergence (4-5 iterations for M≥4 components)

  • Practical decoding methods (basic CPSD + Triadic Memory)

Based on:

Malits & Mendelson (2025) “Context-Preserving Encoding/Decoding of Compositional Structures”

References

Paper: Malits & Mendelson (2025) - CPSE/CPSD specifications

GitHub: https://github.com/PeterOvermann/TriadicMemory

Mathematical Foundation:
  • Additive iterations: K ≈ log(1 - 1/M) / log(1 - M·p) [Eq. 8]

  • Subtractive iterations: a more involved expression, see [Eq. 15]

  • Total: 4-5 iterations for M ≥ 4 (near-constant)
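The additive-iteration estimate from Eq. 8 is easy to evaluate numerically. The sketch below assumes natural logarithms (the ratio is base-independent) and treats p as a parameter satisfying 0 < M·p < 1; the meaning of p is not defined in this section, so the value used here is purely illustrative, and `additive_iterations` is a hypothetical helper.

```python
import math


def additive_iterations(m: int, p: float) -> float:
    """Evaluate K = log(1 - 1/M) / log(1 - M*p) from Eq. 8."""
    if m < 2:
        raise ValueError("M must be at least 2")
    if not 0.0 < m * p < 1.0:
        raise ValueError("M*p must lie strictly between 0 and 1")
    return math.log(1.0 - 1.0 / m) / math.log(1.0 - m * p)


# Illustrative values only: M = 4 components, p = 0.02.
k_estimate = additive_iterations(4, 0.02)  # about 3.45
```

In practice the estimate would be rounded up to a whole number of iterations.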

class holovec.utils.cpse.CPSEMetadata(n_components: int, permutation_seeds: List[int], base_seed: int = 42)

Bases: object

Metadata for CPSE encoding operations.

Tracks permutation patterns, component structure, and encoding parameters for context-preserving operations. This metadata is essential for decoding and should be stored alongside encoded vectors.

The metadata enables:
  • Reconstruction of position permutations for decoding

  • Validation of convergence in encoding/decoding cycles

  • Deterministic reproduction of encoding operations

n_components

Number of components in composition (M)

permutation_seeds

Seeds for generating position-specific permutations

base_seed

Base seed for reproducibility

Examples

>>> # Create metadata for 5-component composition
>>> metadata = CPSEMetadata(
...     n_components=5,
...     permutation_seeds=[42, 43, 44, 45, 46],
...     base_seed=42
... )
>>>
>>> # Serialize for storage
>>> metadata.to_json('cpse_metadata.json')
>>>
>>> # Later, reload for decoding
>>> metadata = CPSEMetadata.from_json('cpse_metadata.json')

References

Malits & Mendelson (2025), Section 3.1: Position Encoding


__init__(n_components: int, permutation_seeds: List[int], base_seed: int = 42)

Initialize CPSE metadata.

Parameters:
  • n_components – Number of components in composition (must be >= 2)

  • permutation_seeds – Seed for each position permutation (must have length == n_components)

  • base_seed – Base seed for reproducibility (default: 42)

Raises:
  • TypeError – If arguments are not correct types

  • ValueError – If n_components < 2 or permutation_seeds length mismatch

Examples

>>> # Minimal valid metadata
>>> metadata = CPSEMetadata(2, [42, 43])
>>>
>>> # Typical usage with 5 components
>>> seeds = generate_permutation_patterns(n_patterns=5)
>>> metadata = CPSEMetadata(5, seeds, base_seed=42)

to_dict() -> Dict[str, Any]

Serialize metadata to dictionary.

Returns:

Dictionary with all metadata fields

Examples

>>> metadata = CPSEMetadata(3, [42, 43, 44])
>>> data = metadata.to_dict()
>>> print(data)
{'n_components': 3, 'permutation_seeds': [42, 43, 44], 'base_seed': 42}

classmethod from_dict(data: Dict[str, Any]) -> CPSEMetadata

Deserialize metadata from dictionary.

Parameters:

data – Dictionary with metadata fields

Returns:

CPSEMetadata instance

Raises:
  • KeyError – If required fields are missing

  • TypeError/ValueError – If field values are invalid

Examples

>>> data = {'n_components': 3, 'permutation_seeds': [42, 43, 44], 'base_seed': 42}
>>> metadata = CPSEMetadata.from_dict(data)
>>> print(metadata.n_components)
3

to_json(path: str)

Save metadata to JSON file.

Parameters:

path – File path for saving

Examples

>>> metadata = CPSEMetadata(3, [42, 43, 44])
>>> metadata.to_json('my_cpse_metadata.json')

classmethod from_json(path: str) -> CPSEMetadata

Load metadata from JSON file.

Parameters:

path – File path for loading

Returns:

CPSEMetadata instance

Raises:

Examples

>>> metadata = CPSEMetadata.from_json('my_cpse_metadata.json')
>>> print(metadata.n_components)
3

__repr__() -> str

String representation of metadata.

__eq__(other: object) -> bool

Check equality with another CPSEMetadata instance.
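The serialization round trip described for this class can be mimicked with a small dataclass. `MetadataSketch` is a hypothetical stand-in that mirrors the documented fields and validation rules; it is not the library class, and the JSON layout is assumed to match the to_dict() output shown above.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from typing import Any, Dict, List


@dataclass
class MetadataSketch:
    """Stand-in for CPSE metadata; dataclass supplies __eq__ and __repr__."""

    n_components: int
    permutation_seeds: List[int]
    base_seed: int = 42

    def __post_init__(self) -> None:
        if self.n_components < 2:
            raise ValueError("n_components must be >= 2")
        if len(self.permutation_seeds) != self.n_components:
            raise ValueError("need one permutation seed per component")

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "MetadataSketch":
        # Missing keys raise KeyError, as documented for from_dict.
        return cls(data["n_components"], list(data["permutation_seeds"]), data["base_seed"])

    def to_json(self, path: str) -> None:
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(self.to_dict(), handle)

    @classmethod
    def from_json(cls, path: str) -> "MetadataSketch":
        with open(path, "r", encoding="utf-8") as handle:
            return cls.from_dict(json.load(handle))


original = MetadataSketch(3, [42, 43, 44])
# Write to a temporary directory and read the metadata back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cpse_metadata.json")
    original.to_json(path)
    restored = MetadataSketch.from_json(path)
```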

holovec.utils.cpse.generate_permutation_patterns(n_patterns: int, base_seed: int = 42) -> List[int]

Generate permutation seeds for CPSE encoding.

Creates deterministic permutation seeds for position-dependent thinning operations. Each seed generates a unique permutation matrix used to encode position information.

The seeds are generated as: [base_seed, base_seed+1, …, base_seed+n-1]

Parameters:
  • n_patterns – Number of permutation patterns to generate

  • base_seed – Base random seed (default: 42)

Returns:

List of permutation seeds (length == n_patterns)

Raises:

Examples

>>> # Generate seeds for 5-component composition
>>> seeds = generate_permutation_patterns(n_patterns=5)
>>> print(seeds)
[42, 43, 44, 45, 46]
>>>
>>> # Generate with custom base seed
>>> seeds = generate_permutation_patterns(n_patterns=3, base_seed=100)
>>> print(seeds)
[100, 101, 102]

References

Malits & Mendelson (2025), Section 3.1: Position Encoding
  • Each position i gets permutation p̃ᵢ derived from seed[i]

  • Deterministic generation ensures reproducibility
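To make the link between seeds and position permutations concrete, the sketch below builds the consecutive seed list and derives one index permutation per seed with Python's random.Random. How holovec actually turns a seed into a permutation is not documented here, so the shuffle-based derivation and all helper names are assumptions.

```python
import random
from typing import List, Sequence


def seeds_sketch(n_patterns: int, base_seed: int = 42) -> List[int]:
    """Consecutive seeds: [base_seed, base_seed + 1, ...]."""
    return [base_seed + i for i in range(n_patterns)]


def permutation_from_seed(seed: int, dim: int) -> List[int]:
    """Derive a deterministic index permutation from a seed (assumed scheme)."""
    indices = list(range(dim))
    random.Random(seed).shuffle(indices)
    return indices


def apply_permutation(vec: Sequence[float], perm: Sequence[int]) -> List[float]:
    """Output position j takes the value at input position perm[j]."""
    return [vec[i] for i in perm]


def invert_permutation(perm: Sequence[int]) -> List[int]:
    """Build the permutation that undoes apply_permutation(_, perm)."""
    inverse = [0] * len(perm)
    for new_pos, old_pos in enumerate(perm):
        inverse[old_pos] = new_pos
    return inverse


seeds = seeds_sketch(3)
perm = permutation_from_seed(seeds[0], dim=8)
vector = [float(i) for i in range(8)]
shuffled = apply_permutation(vector, perm)
restored = apply_permutation(shuffled, invert_permutation(perm))
```

Because each permutation is a pure function of its seed, storing the seeds is enough to rebuild and invert the permutations at decode time.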

holovec.utils.cpse.validate_cpse_convergence(original_components: List[Any], decoded_components: List[Any], model: VSAModel, threshold: float = 0.95) -> Tuple[bool, List[float]]

Validate CPSE decoding convergence.

Checks if decoded components are sufficiently similar to originals by computing pairwise similarities and comparing against a threshold. This is essential for verifying that the encoding-decoding cycle preserves information.

Typical convergence rates (Malits & Mendelson 2025, Table 1):
  • Basic CPSD: 95-98% similarity for M=2-5 components

  • With Triadic Memory: 97-99% similarity

  • Target threshold: 0.95 (95%) is conservative

Parameters:
  • original_components – Original component hypervectors (length M)

  • decoded_components – Decoded component hypervectors (length M)

  • model – VSA model for similarity computation

  • threshold – Minimum acceptable similarity (default: 0.95)

Returns:

Tuple of (converged, similarities):

  • converged (bool): True if all similarities >= threshold

  • similarities (List[float]): Similarity for each component pair

Raises:
  • TypeError – If arguments are not correct types

  • ValueError – If component lists have different lengths

Examples

>>> import numpy as np
>>> # Validate decoding with strict threshold
>>> converged, sims = validate_cpse_convergence(
...     original_components=originals,
...     decoded_components=decoded,
...     model=model,
...     threshold=0.95
... )
>>> if converged:
...     print(f"Converged! Avg similarity: {np.mean(sims):.3f}")
... else:
...     print(f"Failed to converge. Min similarity: {min(sims):.3f}")
>>>
>>> # More lenient threshold for noisy conditions
>>> converged, sims = validate_cpse_convergence(
...     originals, decoded, model, threshold=0.90
... )

References

Malits & Mendelson (2025), Section 4: Experimental Results
  • Table 1 shows typical convergence rates for different M

  • Figure 3 demonstrates convergence behavior
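The convergence check itself reduces to comparing each original with its decoded counterpart and requiring every similarity to reach the threshold. The sketch assumes cosine similarity on plain lists; `validate_sketch` is a hypothetical helper, not the library function.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def validate_sketch(
    originals: List[Sequence[float]],
    decoded: List[Sequence[float]],
    threshold: float = 0.95,
) -> Tuple[bool, List[float]]:
    """Return (converged, per-component similarities)."""
    if len(originals) != len(decoded):
        raise ValueError("component lists must have the same length")
    sims = [cosine(o, d) for o, d in zip(originals, decoded)]
    # Converged only if every component meets the threshold.
    return all(sim >= threshold for sim in sims), sims


originals = [[1.0, 1.0, -1.0, -1.0], [1.0, -1.0, 1.0, -1.0]]
# Second component has one flipped element, simulating a decoding error.
decoded = [[1.0, 1.0, -1.0, -1.0], [1.0, -1.0, 1.0, 1.0]]
strict_ok, sims = validate_sketch(originals, decoded, threshold=0.95)
lenient_ok, _ = validate_sketch(originals, decoded, threshold=0.5)
```

A single poorly decoded component is enough to fail the check, which is why the per-component similarities are returned alongside the boolean.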

See Also