Utilities
Helper functions and utilities.
Search
Search utilities for VSA codebook operations.
This module provides search functions for finding hypervectors in codebooks, including k-nearest neighbors, threshold-based search, and batch similarity computation.
- Key Features:
K-nearest neighbors (K-NN) search
Threshold-based retrieval
Vectorized batch similarity computation
Efficient codebook operations
- Based on:
Standard VSA search operations for associative memory and content-addressable storage.
References
Kanerva (2009): Hyperdimensional Computing
Plate (2003): Holographic Reduced Representations
- holovec.utils.search.nearest_neighbors(query: Any, codebook: Dict[str, Any], model: VSAModel, k: int = 5, return_similarities: bool = True) → Tuple[List[str], List[float] | None]
Find k-nearest neighbors in codebook.
Computes similarity between query and all codebook entries, returning the k entries with highest similarity.
- Parameters:
query – Query hypervector
codebook – Dictionary mapping labels to hypervectors
model – VSA model for similarity computation
k – Number of neighbors to return (default: 5)
return_similarities – If True, return similarities (default: True)
- Returns:
Tuple of:
labels: List of k labels sorted by similarity (highest first)
similarities: List of k similarities (if return_similarities=True), otherwise None
- Return type:
Tuple[List[str], List[float] | None]
- Raises:
TypeError – If arguments are not correct types
ValueError – If k < 1, k > codebook size, or codebook is empty
Examples
>>> # Find 5 nearest neighbors
>>> labels, sims = nearest_neighbors(query, codebook, model, k=5)
>>> for label, sim in zip(labels, sims):
...     print(f"{label}: {sim:.3f}")
>>>
>>> # Get only labels
>>> labels, _ = nearest_neighbors(
...     query, codebook, model, k=3, return_similarities=False
... )
References
Kanerva (2009): Hyperdimensional computing and associative memory
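The ranking step described above can be sketched without holovec. This is a minimal illustration, not the library's implementation: `knn_sketch` and `cosine` are hypothetical names, plain Python lists stand in for hypervectors, and cosine similarity stands in for `model.similarity`.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; an assumed stand-in for model.similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_sketch(
    query: Sequence[float], codebook: Dict[str, Sequence[float]], k: int = 5
) -> Tuple[List[str], List[float]]:
    """Score every codebook entry against the query and keep the k best."""
    if not codebook:
        raise ValueError("codebook is empty")
    if k < 1 or k > len(codebook):
        raise ValueError("k must be between 1 and the codebook size")
    scored = sorted(
        ((label, cosine(query, vec)) for label, vec in codebook.items()),
        key=lambda pair: pair[1],
        reverse=True,  # highest similarity first
    )[:k]
    return [label for label, _ in scored], [sim for _, sim in scored]


codebook = {"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0], "c": [1.0, 1.0, 0.0]}
labels, sims = knn_sketch([1.0, 0.2, 0.0], codebook, k=2)
print(labels)  # ['a', 'c']
```

The cost is one similarity evaluation per codebook entry plus a sort, which is usually acceptable for codebooks of modest size.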
- holovec.utils.search.threshold_search(query: Any, codebook: Dict[str, Any], model: VSAModel, threshold: float = 0.8, return_similarities: bool = True) → Tuple[List[str], List[float] | None]
Find all codebook entries above similarity threshold.
Returns all entries where similarity(query, entry) >= threshold, sorted by similarity (highest first).
- Parameters:
query – Query hypervector
codebook – Dictionary mapping labels to hypervectors
model – VSA model for similarity computation
threshold – Minimum similarity threshold (default: 0.8)
return_similarities – If True, return similarities (default: True)
- Returns:
Tuple of:
labels: List of labels above threshold, sorted by similarity
similarities: List of similarities (if return_similarities=True), otherwise None
- Return type:
Tuple[List[str], List[float] | None]
- Raises:
TypeError – If arguments are not correct types
ValueError – If threshold not in [0.0, 1.0] or codebook is empty
Examples
>>> # Find all matches above 0.9 similarity
>>> labels, sims = threshold_search(
...     query, codebook, model, threshold=0.9
... )
>>> print(f"Found {len(labels)} matches")
>>>
>>> # Lenient threshold
>>> labels, _ = threshold_search(
...     query, codebook, model, threshold=0.5,
...     return_similarities=False
... )
References
Standard associative memory retrieval operation
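The filter-then-sort behaviour can be illustrated with a small standalone sketch. It assumes bipolar (+1/-1) vectors and a normalized dot product in place of `model.similarity`; `threshold_sketch` and `bipolar_similarity` are hypothetical names, not part of holovec.

```python
from typing import Dict, List, Sequence, Tuple


def bipolar_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Normalized dot product of two +1/-1 vectors (assumed similarity)."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def threshold_sketch(
    query: Sequence[int], codebook: Dict[str, Sequence[int]], threshold: float = 0.8
) -> Tuple[List[str], List[float]]:
    """Keep every entry whose similarity is >= threshold, best first."""
    if not codebook:
        raise ValueError("codebook is empty")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0.0, 1.0]")
    hits = []
    for label, vec in codebook.items():
        sim = bipolar_similarity(query, vec)
        if sim >= threshold:
            hits.append((label, sim))
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in hits], [sim for _, sim in hits]


codebook = {"x": [1, 1, 1, 1], "y": [1, 1, 1, -1], "z": [-1, -1, 1, 1]}
labels, sims = threshold_sketch([1, 1, 1, 1], codebook, threshold=0.5)
print(labels, sims)  # ['x', 'y'] [1.0, 0.5]
```

Unlike k-NN search, the result length depends on the data: a strict threshold can return empty lists.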
- holovec.utils.search.batch_similarity(queries: List[Any], codebook: Dict[str, Any], model: VSAModel) → List[Dict[str, float]]
Compute similarities between multiple queries and codebook.
Efficiently computes similarity between each query and all codebook entries, returning results as a list of dictionaries.
- Parameters:
queries – List of query hypervectors
codebook – Dictionary mapping labels to hypervectors
model – VSA model for similarity computation
- Returns:
List of dictionaries, one per query, mapping labels to similarities
- Raises:
TypeError – If arguments are not correct types
ValueError – If queries is empty or codebook is empty
Examples
>>> # Batch process multiple queries
>>> results = batch_similarity([q1, q2, q3], codebook, model)
>>> for i, sims in enumerate(results):
...     print(f"Query {i}:")
...     best_label = max(sims, key=sims.get)
...     print(f" Best: {best_label} ({sims[best_label]:.3f})")
>>>
>>> # Find best match for each query
>>> for query_sims in results:
...     best = max(query_sims.items(), key=lambda x: x[1])
...     print(f"Best: {best[0]} with similarity {best[1]:.3f}")
References
Vectorized operations for efficient batch processing
- holovec.utils.search.segment_pattern(vec: Any, space: SparseSegmentSpace) → List[int]
Return per-segment argmax indices (length S) for a vector.
Projects vec to the nearest valid segment pattern via space.normalize(), then returns the index of the active bit per segment.
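As a rough illustration of the per-segment argmax, the sketch below assumes the vector is a flat sequence split into S equal-length segments and skips the `space.normalize()` projection. `segment_argmax` is a hypothetical helper, not the holovec function.

```python
from typing import List, Sequence


def segment_argmax(vec: Sequence[float], n_segments: int) -> List[int]:
    """Return the index of the largest entry within each equal-length segment."""
    if n_segments < 1 or len(vec) % n_segments != 0:
        raise ValueError("vector length must be a multiple of n_segments")
    seg_len = len(vec) // n_segments
    pattern = []
    for s in range(n_segments):
        segment = vec[s * seg_len:(s + 1) * seg_len]
        # max() returns the first index on ties
        pattern.append(max(range(seg_len), key=segment.__getitem__))
    return pattern


# Three segments of length 4, one active bit each
vec = [0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1]
print(segment_argmax(vec, n_segments=3))  # [2, 0, 3]
```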
- holovec.utils.search.find_by_segment_pattern(codebook: Dict[str, Any], space: SparseSegmentSpace, pattern: List[int | None], match_mode: str = 'exact', min_fraction: float = 1.0) → List[Tuple[str, float]]
Find entries whose segment pattern matches the query pattern.
pattern: list of length S with segment indices or None/-1 as wildcards.
- match_mode:
‘exact’: all specified segments must match; returns [(label, 1.0), …]
‘fraction’: score each entry by the fraction of specified (non-wildcard) segments that match, and keep entries scoring at least min_fraction
Returns a list of (label, score) tuples sorted by score in descending order.
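The two match modes can be sketched on precomputed segment patterns. This is an illustration only: `match_patterns` is a hypothetical helper that takes per-entry patterns directly instead of a codebook and a space, and its handling of an all-wildcard query (score 1.0 for every entry) is an assumption.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def match_patterns(
    patterns: Dict[str, Sequence[int]],
    query: Sequence[Optional[int]],
    match_mode: str = "exact",
    min_fraction: float = 1.0,
) -> List[Tuple[str, float]]:
    """Compare stored patterns with a query; None or -1 is a wildcard."""
    if match_mode not in ("exact", "fraction"):
        raise ValueError("match_mode must be 'exact' or 'fraction'")
    # Positions the query actually constrains
    specified = [i for i, q in enumerate(query) if q is not None and q != -1]
    results = []
    for label, pattern in patterns.items():
        hits = sum(1 for i in specified if pattern[i] == query[i])
        # Assumption: an all-wildcard query matches everything with score 1.0
        score = hits / len(specified) if specified else 1.0
        if match_mode == "exact":
            if hits == len(specified):
                results.append((label, 1.0))
        elif score >= min_fraction:
            results.append((label, score))
    return sorted(results, key=lambda item: item[1], reverse=True)


patterns = {"cat": [2, 0, 3], "dog": [2, 1, 3], "eel": [0, 1, 1]}
exact = match_patterns(patterns, [2, None, 3])
partial = match_patterns(patterns, [2, 1, 3], match_mode="fraction", min_fraction=0.5)
print(exact)                              # [('cat', 1.0), ('dog', 1.0)]
print([label for label, _ in partial])    # ['dog', 'cat']
```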
Operations
General utility operations for VSA systems.
This module provides general-purpose operations for hypervector manipulation and analysis, including top-k selection, noise injection, and similarity matrix computation.
- Key Features:
Top-k selection from scored collections
Controlled noise injection for robustness testing
Pairwise similarity matrix computation
Support for various VSA operations
References
Kanerva (2009): Hyperdimensional Computing
Plate (2003): Holographic Reduced Representations
- holovec.utils.operations.select_top_k(items: Dict[str, float], k: int = 5) → List[Tuple[str, float]]
Select top-k items by score.
Sorts items by score (descending) and returns the top k items as (label, score) tuples.
- Parameters:
items – Dictionary mapping labels to scores
k – Number of items to select (default: 5)
- Returns:
List of (label, score) tuples sorted by score (highest first)
- Raises:
TypeError – If arguments are not correct types
ValueError – If k < 1, k > items size, or items is empty
Examples
>>> # Select top 3 by similarity
>>> scores = {'a': 0.95, 'b': 0.87, 'c': 0.92, 'd': 0.75}
>>> top = select_top_k(scores, k=3)
>>> print(top)
[('a', 0.95), ('c', 0.92), ('b', 0.87)]
>>>
>>> # Get just the labels
>>> labels = [label for label, _ in select_top_k(scores, k=2)]
>>> print(labels)
['a', 'c']
References
Standard selection operation for ranked retrieval
- holovec.utils.operations.add_noise(vector: Any, model: VSAModel, noise_level: float = 0.1, seed: int = None) → Any
Add controlled noise to a hypervector.
Adds noise by bundling the original vector with a random vector, weighted by noise_level. Useful for testing robustness and approximate matching.
- Parameters:
vector – Original hypervector
model – VSA model for random generation and bundling
noise_level – Proportion of noise to add (0.0 = none, 1.0 = full) (default: 0.1)
seed – Random seed for reproducibility (default: None)
- Returns:
Noisy hypervector
- Raises:
TypeError – If arguments are not correct types
ValueError – If noise_level not in [0.0, 1.0]
Examples
>>> # Add 10% noise
>>> noisy = add_noise(original, model, noise_level=0.1)
>>> sim = model.similarity(original, noisy)
>>> print(f"Similarity after noise: {sim:.3f}")
>>>
>>> # Heavy noise for stress testing
>>> very_noisy = add_noise(original, model, noise_level=0.5)
>>>
>>> # Reproducible noise
>>> noisy1 = add_noise(original, model, noise_level=0.2, seed=42)
>>> noisy2 = add_noise(original, model, noise_level=0.2, seed=42)
>>> # noisy1 and noisy2 are identical
References
Robustness testing in Kanerva (2009) and related work
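One way to picture weighted bundling with a random vector is a linear mix. The sketch below assumes real-valued vectors and the weighting (1 - noise_level) * original + noise_level * random; holovec delegates bundling to the model, so the actual arithmetic may differ. `add_noise_sketch` is a hypothetical name.

```python
import random
from typing import List, Optional, Sequence


def add_noise_sketch(
    vector: Sequence[float], noise_level: float = 0.1, seed: Optional[int] = None
) -> List[float]:
    """Mix a vector with a seeded random +1/-1 vector (assumed linear weighting)."""
    if not 0.0 <= noise_level <= 1.0:
        raise ValueError("noise_level must be in [0.0, 1.0]")
    rng = random.Random(seed)  # a private generator keeps the result reproducible
    noise = [rng.choice((-1.0, 1.0)) for _ in vector]
    return [(1.0 - noise_level) * v + noise_level * r for v, r in zip(vector, noise)]


original = [1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, -1.0]
noisy1 = add_noise_sketch(original, noise_level=0.2, seed=42)
noisy2 = add_noise_sketch(original, noise_level=0.2, seed=42)
print(noisy1 == noisy2)  # True
```

With noise_level=0.0 the sketch returns the original values unchanged, and with 1.0 it returns the random vector alone.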
- holovec.utils.operations.similarity_matrix(vectors: List[Any], model: VSAModel, labels: List[str] = None) → ndarray
Compute pairwise similarity matrix.
Computes similarity between all pairs of vectors, returning an n×n similarity matrix where entry (i,j) is similarity(vectors[i], vectors[j]).
- Parameters:
vectors – List of hypervectors
model – VSA model for similarity computation
labels – Optional labels for vectors (for reference, not used in computation)
- Returns:
NxN numpy array of pairwise similarities
- Raises:
TypeError – If arguments are not correct types
ValueError – If vectors is empty or labels length doesn’t match
Examples
>>> import numpy as np
>>> # Compute similarity matrix
>>> vectors = [model.random(seed=i) for i in range(5)]
>>> sim_matrix = similarity_matrix(vectors, model)
>>> print(f"Shape: {sim_matrix.shape}")
Shape: (5, 5)
>>> print(f"Diagonal (self-similarity): {np.diag(sim_matrix)}")
>>>
>>> # With labels for interpretation
>>> labels = ['cat', 'dog', 'bird', 'fish', 'snake']
>>> sim_matrix = similarity_matrix(vectors, model, labels)
>>> # Most similar pair (excluding self-similarity)
>>> np.fill_diagonal(sim_matrix, -np.inf)
>>> i, j = np.unravel_index(np.argmax(sim_matrix), sim_matrix.shape)
>>> print(f"Most similar: {labels[i]} - {labels[j]}")
References
Standard analysis tool for VSA systems
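The pairwise computation can be sketched in plain Python. This is a hypothetical stand-in that returns nested lists rather than a NumPy array, uses cosine similarity in place of `model.similarity`, and assumes the similarity is symmetric so only the upper triangle is evaluated.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; an assumed stand-in for model.similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_matrix_sketch(vectors: List[Sequence[float]]) -> List[List[float]]:
    """Build an n x n matrix where entry (i, j) is the similarity of vectors i and j."""
    if not vectors:
        raise ValueError("vectors is empty")
    n = len(vectors)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sim = cosine(vectors[i], vectors[j])
            matrix[i][j] = sim
            matrix[j][i] = sim  # mirror into the lower triangle
    return matrix


matrix = similarity_matrix_sketch([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
for row in matrix:
    print([round(value, 3) for value in row])
```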
CPSE/CPSD
Context-preserving encoding and decoding.
CPSE/CPSD utilities for context-preserving compositional encoding.
This module provides utilities for Context-Preserving SDR Encoding (CPSE) and Context-Preserving SDR Decoding (CPSD), which the module presents as an improved successor to Context-Dependent Thinning (CDT).
- Key Features:
Order preservation via position permutations
Stable convergence (1.95% ± 0.15% error)
Fast convergence (4-5 iterations for M≥4 components)
Practical decoding methods (basic CPSD + Triadic Memory)
- Based on:
Malits & Mendelson (2025) “Context-Preserving Encoding/Decoding of Compositional Structures”
References
Paper: Malits & Mendelson (2025) - CPSE/CPSD specifications
GitHub: https://github.com/PeterOvermann/TriadicMemory
- Mathematical Foundation:
Additive iterations: K ≈ log(1 - 1/M) / log(1 - M·p) [Eq. 8]
Subtractive iterations: Complex formula [Eq. 15]
Total: 4-5 iterations for M ≥ 4 (near-constant)
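The additive-iteration estimate can be evaluated directly. The sketch below assumes that M is the number of components and that p is the density of each component vector (fraction of active bits); the text above does not define p, so treat that reading, and the example value p=0.02, as assumptions.

```python
import math


def additive_iterations(m: int, p: float) -> float:
    """Evaluate K ≈ log(1 - 1/M) / log(1 - M·p) for M components of density p."""
    if m < 2:
        raise ValueError("m must be at least 2 so that log(1 - 1/m) is defined")
    if not 0.0 < m * p < 1.0:
        raise ValueError("m * p must lie strictly between 0 and 1")
    return math.log(1.0 - 1.0 / m) / math.log(1.0 - m * p)


# Example: 4 components, each with an assumed density of 2%
k_estimate = additive_iterations(m=4, p=0.02)
print(f"Estimated additive iterations: {k_estimate:.2f}")
```

The estimate covers only the additive phase; the subtractive phase [Eq. 15] is not modelled here.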
- class holovec.utils.cpse.CPSEMetadata(n_components: int, permutation_seeds: List[int], base_seed: int = 42)
Bases: object
Metadata for CPSE encoding operations.
Tracks permutation patterns, component structure, and encoding parameters for context-preserving operations. This metadata is essential for decoding and should be stored alongside encoded vectors.
- The metadata enables:
Reconstruction of position permutations for decoding
Validation of convergence in encoding/decoding cycles
Deterministic reproduction of encoding operations
- n_components¶
Number of components in composition (M)
- permutation_seeds¶
Seeds for generating position-specific permutations
- base_seed¶
Base seed for reproducibility
Examples
>>> # Create metadata for 5-component composition
>>> metadata = CPSEMetadata(
...     n_components=5,
...     permutation_seeds=[42, 43, 44, 45, 46],
...     base_seed=42
... )
>>>
>>> # Serialize for storage
>>> metadata.to_json('cpse_metadata.json')
>>>
>>> # Later, reload for decoding
>>> metadata = CPSEMetadata.from_json('cpse_metadata.json')
References
Malits & Mendelson (2025), Section 3.1: Position Encoding
- __init__(n_components: int, permutation_seeds: List[int], base_seed: int = 42)
Initialize CPSE metadata.
- Parameters:
n_components – Number of components in composition (must be >= 2)
permutation_seeds – Seed for each position permutation (must have length == n_components)
base_seed – Base seed for reproducibility (default: 42)
- Raises:
TypeError – If arguments are not correct types
ValueError – If n_components < 2 or permutation_seeds length mismatch
Examples
>>> # Minimal valid metadata
>>> metadata = CPSEMetadata(2, [42, 43])
>>>
>>> # Typical usage with 5 components
>>> seeds = generate_permutation_patterns(n_patterns=5)
>>> metadata = CPSEMetadata(5, seeds, base_seed=42)
- to_dict() → Dict[str, Any]
Serialize metadata to dictionary.
- Returns:
Dictionary with all metadata fields
Examples
>>> metadata = CPSEMetadata(3, [42, 43, 44])
>>> data = metadata.to_dict()
>>> print(data)
{'n_components': 3, 'permutation_seeds': [42, 43, 44], 'base_seed': 42}
- classmethod from_dict(data: Dict[str, Any]) → CPSEMetadata
Deserialize metadata from dictionary.
- Parameters:
data – Dictionary with metadata fields
- Returns:
CPSEMetadata instance
- Raises:
KeyError – If required fields are missing
TypeError/ValueError – If field values are invalid
Examples
>>> data = {'n_components': 3, 'permutation_seeds': [42, 43, 44], 'base_seed': 42}
>>> metadata = CPSEMetadata.from_dict(data)
>>> print(metadata.n_components)
3
- to_json(path: str)
Save metadata to JSON file.
- Parameters:
path – File path for saving
Examples
>>> metadata = CPSEMetadata(3, [42, 43, 44])
>>> metadata.to_json('my_cpse_metadata.json')
- classmethod from_json(path: str) → CPSEMetadata
Load metadata from JSON file.
- Parameters:
path – File path for loading
- Returns:
CPSEMetadata instance
- Raises:
FileNotFoundError – If file doesn’t exist
json.JSONDecodeError – If file is not valid JSON
KeyError – If required fields are missing
Examples
>>> metadata = CPSEMetadata.from_json('my_cpse_metadata.json')
>>> print(metadata.n_components)
3
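The save-and-reload cycle can be tried without holovec using a stand-in class. `MetadataSketch` below is hypothetical: it carries the same three fields and validation rules documented for CPSEMetadata, but its JSON layout is an assumption and may not match what holovec writes.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class MetadataSketch:
    """Hypothetical stand-in with the same three fields as CPSEMetadata."""

    n_components: int
    permutation_seeds: List[int]
    base_seed: int = 42

    def __post_init__(self) -> None:
        if self.n_components < 2:
            raise ValueError("n_components must be >= 2")
        if len(self.permutation_seeds) != self.n_components:
            raise ValueError("permutation_seeds length must equal n_components")

    def to_json(self, path: str) -> None:
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(asdict(self), fh)

    @classmethod
    def from_json(cls, path: str) -> "MetadataSketch":
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
        # Missing fields raise KeyError, mirroring the documented behaviour
        return cls(data["n_components"], data["permutation_seeds"], data["base_seed"])


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cpse_metadata.json")
    MetadataSketch(3, [42, 43, 44]).to_json(path)
    restored = MetadataSketch.from_json(path)

print(restored == MetadataSketch(3, [42, 43, 44]))  # True
```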
- holovec.utils.cpse.generate_permutation_patterns(n_patterns: int, base_seed: int = 42) → List[int]
Generate permutation seeds for CPSE encoding.
Creates deterministic permutation seeds for position-dependent thinning operations. Each seed generates a unique permutation matrix used to encode position information.
The seeds are generated as: [base_seed, base_seed+1, …, base_seed+n-1]
- Parameters:
n_patterns – Number of permutation patterns to generate
base_seed – Base random seed (default: 42)
- Returns:
List of permutation seeds (length == n_patterns)
- Raises:
TypeError – If arguments are not correct types
ValueError – If n_patterns < 1
Examples
>>> # Generate seeds for 5-component composition
>>> seeds = generate_permutation_patterns(n_patterns=5)
>>> print(seeds)
[42, 43, 44, 45, 46]
>>>
>>> # Generate with custom base seed
>>> seeds = generate_permutation_patterns(n_patterns=3, base_seed=100)
>>> print(seeds)
[100, 101, 102]
References
Malits & Mendelson (2025), Section 3.1: Position Encoding
Each position i gets permutation p̃ᵢ derived from seed[i]
Deterministic generation ensures reproducibility
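To make the seed-to-permutation idea concrete, the sketch below generates consecutive seeds as documented and then derives one index permutation per seed. The derivation shown (shuffling with a seeded `random.Random`) is an assumption for illustration; the source does not say how holovec turns a seed into a permutation. All function names are hypothetical.

```python
import random
from typing import List, Sequence


def seeds_sketch(n_patterns: int, base_seed: int = 42) -> List[int]:
    """Consecutive seeds: [base_seed, base_seed + 1, ...]."""
    if n_patterns < 1:
        raise ValueError("n_patterns must be >= 1")
    return [base_seed + i for i in range(n_patterns)]


def permutation_from_seed(seed: int, dim: int) -> List[int]:
    """Assumed derivation: shuffle range(dim) with a seeded generator."""
    indices = list(range(dim))
    random.Random(seed).shuffle(indices)
    return indices


def permute(vec: Sequence[int], perm: Sequence[int]) -> List[int]:
    """Reorder vec so that output position j takes vec[perm[j]]."""
    return [vec[i] for i in perm]


seeds = seeds_sketch(n_patterns=3, base_seed=100)
perms = [permutation_from_seed(seed, dim=8) for seed in seeds]
component = [1, 0, 0, 1, 0, 0, 0, 1]
encoded_positions = [permute(component, perm) for perm in perms]
print(seeds)  # [100, 101, 102]
```

Because each permutation depends only on its seed, storing the seeds is enough to rebuild the same permutations at decode time.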
- holovec.utils.cpse.validate_cpse_convergence(original_components: List[Any], decoded_components: List[Any], model: VSAModel, threshold: float = 0.95) → Tuple[bool, List[float]]
Validate CPSE decoding convergence.
Checks if decoded components are sufficiently similar to originals by computing pairwise similarities and comparing against a threshold. This is essential for verifying that the encoding-decoding cycle preserves information.
- Typical convergence rates (Malits & Mendelson 2025, Table 1):
Basic CPSD: 95-98% similarity for M=2-5 components
With Triadic Memory: 97-99% similarity
Target threshold: 0.95 (95%) is conservative
- Parameters:
original_components – Original component hypervectors (length M)
decoded_components – Decoded component hypervectors (length M)
model – VSA model for similarity computation
threshold – Minimum acceptable similarity (default: 0.95)
- Returns:
Tuple of:
converged (bool): True if all similarities >= threshold
similarities (List[float]): Similarity for each component pair
- Return type:
Tuple[bool, List[float]]
- Raises:
TypeError – If arguments are not correct types
ValueError – If component lists have different lengths
Examples
>>> import numpy as np
>>> # Validate decoding with strict threshold
>>> converged, sims = validate_cpse_convergence(
...     original_components=originals,
...     decoded_components=decoded,
...     model=model,
...     threshold=0.95
... )
>>> if converged:
...     print(f"Converged! Avg similarity: {np.mean(sims):.3f}")
... else:
...     print(f"Failed to converge. Min similarity: {min(sims):.3f}")
>>>
>>> # More lenient threshold for noisy conditions
>>> converged, sims = validate_cpse_convergence(
...     originals, decoded, model, threshold=0.90
... )
References
Malits & Mendelson (2025), Section 4: Experimental Results
Table 1 shows typical convergence rates for different M
Figure 3 demonstrates convergence behavior
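The pass/fail rule (every component must reach the threshold) can be sketched on sparse vectors stored as sets of active indices. The overlap measure used here is an assumed stand-in for `model.similarity`, and `check_convergence` is a hypothetical name.

```python
from typing import List, Sequence, Set, Tuple


def overlap(a: Set[int], b: Set[int]) -> float:
    """Shared active bits divided by the larger active-bit count (assumed measure)."""
    largest = max(len(a), len(b))
    return len(a & b) / largest if largest else 0.0


def check_convergence(
    originals: Sequence[Set[int]], decoded: Sequence[Set[int]], threshold: float = 0.95
) -> Tuple[bool, List[float]]:
    """Compare components position by position; all must reach the threshold."""
    if len(originals) != len(decoded):
        raise ValueError("component lists have different lengths")
    sims = [overlap(o, d) for o, d in zip(originals, decoded)]
    return all(s >= threshold for s in sims), sims


originals = [{1, 5, 9, 12}, {2, 6, 9, 30}]
decoded = [{1, 5, 9, 12}, {2, 6, 9, 31}]  # second component has one wrong bit
converged, sims = check_convergence(originals, decoded)
print(converged, sims)  # False [1.0, 0.75]
```

A single weak component fails the whole check, which is why the per-component similarities are returned alongside the boolean.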
See Also
API Reference - Complete API reference