Utilities¶

Helper functions and utilities.

Search¶

Search utilities for VSA codebook operations.

This module provides search functions for finding hypervectors in codebooks, including k-nearest neighbors, threshold-based search, and batch similarity computation.

Key Features:

K-nearest neighbors (K-NN) search
Threshold-based retrieval
Vectorized batch similarity computation
Efficient codebook operations

Based on:

Standard VSA search operations for associative memory and content-addressable storage.

References

Kanerva (2009): Hyperdimensional Computing
Plate (2003): Holographic Reduced Representations

holovec.utils.search.nearest_neighbors(query: Any, codebook: Dict[str, Any], model: VSAModel, k: int = 5, return_similarities: bool = True) → Tuple[List[str], List[float] | None][source]¶

Find k-nearest neighbors in codebook.

Computes similarity between query and all codebook entries, returning the k entries with highest similarity.

Parameters:

query – Query hypervector
codebook – Dictionary mapping labels to hypervectors
model – VSA model for similarity computation
k – Number of neighbors to return (default: 5)
return_similarities – If True, return similarities (default: True)

Returns:

labels: List of k labels sorted by similarity (highest first)
similarities: List of k similarities (if return_similarities=True),
otherwise None

Return type:

Tuple of (labels, similarities)

Raises:

TypeError – If arguments are not correct types
ValueError – If k < 1, k > codebook size, or codebook is empty

Examples

>>> # Find 5 nearest neighbors
>>> labels, sims = nearest_neighbors(query, codebook, model, k=5)
>>> for label, sim in zip(labels, sims):
...     print(f"{label}: {sim:.3f}")
>>>
>>> # Get only labels
>>> labels, _ = nearest_neighbors(
...     query, codebook, model, k=3, return_similarities=False
... )
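
As a rough illustration of the ranking step described above, the following standalone sketch scores a query against every codebook entry and keeps the k best. It does not call holovec: plain Python lists stand in for hypervectors, and `cosine` and `knn_sketch` are hypothetical helpers that assume cosine similarity, which a given VSAModel may or may not use.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; stands in for model.similarity (an assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_sketch(
    query: Sequence[float], codebook: Dict[str, Sequence[float]], k: int = 5
) -> Tuple[List[str], List[float]]:
    """Return the k labels with the highest similarity, best first."""
    if not codebook:
        raise ValueError("codebook is empty")
    if k < 1 or k > len(codebook):
        raise ValueError("k must be between 1 and the codebook size")
    scored = [(cosine(query, vec), label) for label, vec in codebook.items()]
    top = heapq.nlargest(k, scored)  # ordered by similarity, highest first
    return [label for _, label in top], [sim for sim, _ in top]


# Toy codebook of bipolar vectors (illustrative values only).
codebook = {"a": [1, 1, -1, -1], "b": [1, -1, 1, -1], "c": [1, 1, 1, -1]}
labels, sims = knn_sketch([1, 1, -1, -1], codebook, k=2)
```

The sketch mirrors the documented error conditions (empty codebook, k out of range) so the control flow is easy to compare with the description.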

References

Kanerva (2009): Hyperdimensional computing and associative memory

holovec.utils.search.threshold_search(query: Any, codebook: Dict[str, Any], model: VSAModel, threshold: float = 0.8, return_similarities: bool = True) → Tuple[List[str], List[float] | None][source]¶

Find all codebook entries above similarity threshold.

Returns all entries where similarity(query, entry) >= threshold, sorted by similarity (highest first).

Parameters:

query – Query hypervector
codebook – Dictionary mapping labels to hypervectors
model – VSA model for similarity computation
threshold – Minimum similarity threshold (default: 0.8)
return_similarities – If True, return similarities (default: True)

Returns:

labels: List of labels above threshold, sorted by similarity
similarities: List of similarities (if return_similarities=True),
otherwise None

Return type:

Tuple of (labels, similarities)

Raises:

TypeError – If arguments are not correct types
ValueError – If threshold not in [0.0, 1.0] or codebook is empty

Examples

>>> # Find all matches above 0.9 similarity
>>> labels, sims = threshold_search(
...     query, codebook, model, threshold=0.9
... )
>>> print(f"Found {len(labels)} matches")
>>>
>>> # Lenient threshold
>>> labels, _ = threshold_search(
...     query, codebook, model, threshold=0.5,
...     return_similarities=False
... )
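
The filtering rule above (keep entries with similarity >= threshold, then sort) can be sketched without the library as follows. The helper names and the normalized dot product used as the similarity measure are assumptions made for illustration, not holovec's implementation.

```python
from typing import Dict, List, Sequence, Tuple


def bipolar_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Normalized dot product of two +1/-1 vectors (an assumed measure)."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def threshold_sketch(
    query: Sequence[int], codebook: Dict[str, Sequence[int]], threshold: float = 0.8
) -> Tuple[List[str], List[float]]:
    """Keep entries with similarity >= threshold, best first."""
    hits = [
        (label, bipolar_similarity(query, vec)) for label, vec in codebook.items()
    ]
    hits = [(label, sim) for label, sim in hits if sim >= threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in hits], [sim for _, sim in hits]


# Toy data: one exact copy, one copy with a single flipped entry, one other.
codebook = {
    "exact": [1, 1, -1, -1, 1, -1, 1, 1],
    "one_flip": [1, 1, -1, -1, 1, -1, 1, -1],
    "unrelated": [-1, 1, 1, -1, -1, 1, 1, -1],
}
query = [1, 1, -1, -1, 1, -1, 1, 1]
labels, sims = threshold_sketch(query, codebook, threshold=0.7)
```

Unlike k-NN, the number of results is not fixed in advance: in this sketch a threshold that nothing reaches yields two empty lists.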

References

Standard associative memory retrieval operation

holovec.utils.search.batch_similarity(queries: List[Any], codebook: Dict[str, Any], model: VSAModel) → List[Dict[str, float]][source]¶

Compute similarities between multiple queries and codebook.

Efficiently computes similarity between each query and all codebook entries, returning results as a list of dictionaries.

Parameters:

queries – List of query hypervectors
codebook – Dictionary mapping labels to hypervectors
model – VSA model for similarity computation

Returns:

List of dictionaries, one per query, mapping labels to similarities

Raises:

TypeError – If arguments are not correct types
ValueError – If queries is empty or codebook is empty

Examples

>>> # Batch process multiple queries
>>> results = batch_similarity([q1, q2, q3], codebook, model)
>>> for i, sims in enumerate(results):
...     print(f"Query {i}:")
...     best_label = max(sims, key=sims.get)
...     print(f"  Best: {best_label} ({sims[best_label]:.3f})")
>>>
>>> # Find best match for each query
>>> for query_sims in results:
...     best = max(query_sims.items(), key=lambda x: x[1])
...     print(f"Best: {best[0]} with similarity {best[1]:.3f}")
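
To make the return shape concrete, here is a minimal sketch that builds one label-to-similarity dictionary per query. It uses plain Python loops rather than vectorized operations, and `dot_similarity` is a hypothetical stand-in for the model's similarity function.

```python
from typing import Dict, List, Sequence


def dot_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Normalized dot product; stands in for model.similarity (an assumption)."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def batch_similarity_sketch(
    queries: List[Sequence[int]], codebook: Dict[str, Sequence[int]]
) -> List[Dict[str, float]]:
    """One {label: similarity} dictionary per query, in query order."""
    if not queries or not codebook:
        raise ValueError("queries and codebook must be non-empty")
    return [
        {label: dot_similarity(q, vec) for label, vec in codebook.items()}
        for q in queries
    ]


codebook = {"x": [1, 1, 1, 1], "y": [1, -1, 1, -1]}
results = batch_similarity_sketch([[1, 1, 1, 1], [1, -1, 1, -1]], codebook)
best = [max(sims, key=sims.get) for sims in results]
```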

References

Vectorized operations for efficient batch processing

holovec.utils.search.segment_pattern(vec: Any, space: SparseSegmentSpace) → List[int][source]¶

Return per-segment argmax indices (length S) for a vector.

Projects vec to the nearest valid segment pattern via space.normalize(), then returns the index of the active bit per segment.
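
The per-segment argmax can be sketched as below. This assumes the vector is laid out as S contiguous, equal-length segments, which is a guess about SparseSegmentSpace rather than something the documentation states; `segment_argmax` is a hypothetical helper and skips the `space.normalize()` projection.

```python
from typing import List, Sequence


def segment_argmax(vec: Sequence[float], n_segments: int) -> List[int]:
    """Index of the largest entry within each of n_segments equal blocks."""
    if n_segments < 1 or len(vec) % n_segments != 0:
        raise ValueError("vector length must be a multiple of n_segments")
    seg_len = len(vec) // n_segments
    pattern = []
    for s in range(n_segments):
        segment = vec[s * seg_len:(s + 1) * seg_len]
        # Ties resolve to the lowest index because max keeps the first maximum.
        pattern.append(max(range(seg_len), key=segment.__getitem__))
    return pattern


# Three segments of length 4; the values need not be one-hot.
vec = [0.1, 0.9, 0.0, 0.2, 1.0, 0.0, 0.0, 0.0, 0.0, 0.3, 0.2, 0.7]
pattern = segment_argmax(vec, n_segments=3)
```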

holovec.utils.search.find_by_segment_pattern(codebook: Dict[str, Any], space: SparseSegmentSpace, pattern: List[int | None], match_mode: str = 'exact', min_fraction: float = 1.0) → List[Tuple[str, float]][source]¶

Find entries whose segment pattern matches the query pattern.

Parameters:

pattern – List of length S holding one segment index per segment, with None or -1 as a wildcard
match_mode – 'exact': all specified segments must match; returns [(label, 1.0), …]. 'fraction': score is the fraction of specified segments that match, filtered by min_fraction

Returns:

List of (label, score) tuples sorted by score (highest first)
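
The two match modes can be illustrated on precomputed patterns. The sketch below works on a plain dictionary of label-to-pattern lists instead of a codebook and space, and it assumes that 'fraction' mode keeps entries whose score is at least min_fraction; `match_patterns` is a hypothetical helper.

```python
from typing import Dict, List, Optional, Tuple


def match_patterns(
    patterns: Dict[str, List[int]],
    query: List[Optional[int]],
    match_mode: str = "exact",
    min_fraction: float = 1.0,
) -> List[Tuple[str, float]]:
    """Score stored patterns against a query; None or -1 is a wildcard."""
    specified = [i for i, q in enumerate(query) if q is not None and q != -1]
    results = []
    for label, pattern in patterns.items():
        hits = sum(1 for i in specified if pattern[i] == query[i])
        # With no specified segments, every entry trivially matches.
        score = hits / len(specified) if specified else 1.0
        if match_mode == "exact":
            if hits == len(specified):
                results.append((label, 1.0))
        elif match_mode == "fraction":
            if score >= min_fraction:
                results.append((label, score))
        else:
            raise ValueError(f"unknown match_mode: {match_mode!r}")
    return sorted(results, key=lambda pair: pair[1], reverse=True)


patterns = {"a": [1, 0, 3], "b": [1, 2, 3], "c": [0, 2, 1]}
exact = match_patterns(patterns, [1, None, 3])
partial = match_patterns(patterns, [1, 2, 3], "fraction", min_fraction=0.5)
```

Here the wildcard in the middle position lets both "a" and "b" match exactly, while the fraction query ranks "b" (all three segments) ahead of "a" (two of three) and drops "c".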

Operations¶

General utility operations for VSA systems.

This module provides general-purpose operations for hypervector manipulation and analysis, including top-k selection, noise injection, and similarity matrix computation.

Key Features:

Top-k selection from scored collections
Controlled noise injection for robustness testing
Pairwise similarity matrix computation
Support for various VSA operations

References

Kanerva (2009): Hyperdimensional Computing
Plate (2003): Holographic Reduced Representations

holovec.utils.operations.select_top_k(items: Dict[str, float], k: int = 5) → List[Tuple[str, float]][source]¶

Select top-k items by score.

Sorts items by score (descending) and returns the top k items as (label, score) tuples.

Parameters:

items – Dictionary mapping labels to scores
k – Number of items to select (default: 5)

Returns:

List of (label, score) tuples sorted by score (highest first)

Raises:

TypeError – If arguments are not correct types
ValueError – If k < 1, k > items size, or items is empty

Examples

>>> # Select top 3 by similarity
>>> scores = {'a': 0.95, 'b': 0.87, 'c': 0.92, 'd': 0.75}
>>> top = select_top_k(scores, k=3)
>>> print(top)
[('a', 0.95), ('c', 0.92), ('b', 0.87)]
>>>
>>> # Get just the labels
>>> labels = [label for label, _ in select_top_k(scores, k=2)]
>>> print(labels)
['a', 'c']
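
For comparison, an equivalent selection can be sketched with the standard library. This version uses `heapq.nlargest` instead of a full sort, which is a substitution made here for illustration and not necessarily how select_top_k is implemented; `top_k_sketch` is a hypothetical name.

```python
import heapq
from typing import Dict, List, Tuple


def top_k_sketch(items: Dict[str, float], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (label, score) pairs, best first."""
    if not items:
        raise ValueError("items is empty")
    if k < 1 or k > len(items):
        raise ValueError("k must be between 1 and len(items)")
    return heapq.nlargest(k, items.items(), key=lambda pair: pair[1])


scores = {"a": 0.95, "b": 0.87, "c": 0.92, "d": 0.75}
top = top_k_sketch(scores, k=3)
```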

References

Standard selection operation for ranked retrieval

holovec.utils.operations.add_noise(vector: Any, model: VSAModel, noise_level: float = 0.1, seed: int = None) → Any[source]¶

Add controlled noise to a hypervector.

Adds noise by bundling the original vector with a random vector, weighted by noise_level. Useful for testing robustness and approximate matching.

Parameters:

vector – Original hypervector
model – VSA model for random generation and bundling
noise_level – Proportion of noise to add (0.0 = none, 1.0 = full) (default: 0.1)
seed – Random seed for reproducibility (default: None)

Returns:

Noisy hypervector

Raises:

TypeError – If arguments are not correct types
ValueError – If noise_level not in [0.0, 1.0]

Examples

>>> # Add 10% noise
>>> noisy = add_noise(original, model, noise_level=0.1)
>>> sim = model.similarity(original, noisy)
>>> print(f"Similarity after noise: {sim:.3f}")
>>>
>>> # Heavy noise for stress testing
>>> very_noisy = add_noise(original, model, noise_level=0.5)
>>>
>>> # Reproducible noise
>>> noisy1 = add_noise(original, model, noise_level=0.2, seed=42)
>>> noisy2 = add_noise(original, model, noise_level=0.2, seed=42)
>>> # noisy1 and noisy2 are identical

References

Robustness testing in Kanerva (2009) and related work
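
One way to picture the weighted bundling is a convex blend of the original vector with a random one. The sketch below assumes real-valued vectors and Gaussian noise, and it omits any model-specific normalization; the actual behaviour depends on the VSAModel, and `add_noise_sketch` is a hypothetical helper.

```python
import random
from typing import List, Optional, Sequence


def add_noise_sketch(
    vector: Sequence[float], noise_level: float = 0.1, seed: Optional[int] = None
) -> List[float]:
    """Blend a vector with Gaussian noise: (1 - a) * v + a * r."""
    if not 0.0 <= noise_level <= 1.0:
        raise ValueError("noise_level must be in [0.0, 1.0]")
    rng = random.Random(seed)  # a fixed seed reproduces the same noise
    return [
        (1.0 - noise_level) * x + noise_level * rng.gauss(0.0, 1.0)
        for x in vector
    ]


original = [1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, -1.0]
noisy1 = add_noise_sketch(original, noise_level=0.2, seed=42)
noisy2 = add_noise_sketch(original, noise_level=0.2, seed=42)
clean = add_noise_sketch(original, noise_level=0.0)
```

With the same seed the two noisy vectors are identical, and a noise level of 0.0 returns the input values unchanged.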

holovec.utils.operations.similarity_matrix(vectors: List[Any], model: VSAModel, labels: List[str] = None) → ndarray[source]¶

Compute pairwise similarity matrix.

Computes similarity between all pairs of vectors, returning an n×n similarity matrix where entry (i,j) is similarity(vectors[i], vectors[j]).

Parameters:

vectors – List of hypervectors
model – VSA model for similarity computation
labels – Optional labels for vectors (for reference, not used in computation)

Returns:

n×n NumPy array of pairwise similarities

Raises:

TypeError – If arguments are not correct types
ValueError – If vectors is empty or labels length doesn’t match

Examples

>>> # Compute similarity matrix
>>> vectors = [model.random(seed=i) for i in range(5)]
>>> sim_matrix = similarity_matrix(vectors, model)
>>> print(f"Shape: {sim_matrix.shape}")
Shape: (5, 5)
>>> print(f"Diagonal (self-similarity): {np.diag(sim_matrix)}")
>>>
>>> # With labels for interpretation
>>> labels = ['cat', 'dog', 'bird', 'fish', 'snake']
>>> sim_matrix = similarity_matrix(vectors, model, labels)
>>> # Most similar pair (excluding self-similarity)
>>> np.fill_diagonal(sim_matrix, -np.inf)
>>> i, j = np.unravel_index(np.argmax(sim_matrix), sim_matrix.shape)
>>> print(f"Most similar: {labels[i]} - {labels[j]}")

References

Standard analysis tool for VSA systems
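
The pairwise computation can be sketched in pure Python as follows. It returns a nested list rather than a NumPy array and assumes cosine similarity, which is symmetric; both choices are simplifications for illustration, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; stands in for model.similarity (an assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_matrix_sketch(vectors: List[Sequence[float]]) -> List[List[float]]:
    """n x n nested list with entry [i][j] = cosine(vectors[i], vectors[j])."""
    if not vectors:
        raise ValueError("vectors is empty")
    n = len(vectors)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Cosine similarity is symmetric, so fill both halves at once.
            matrix[i][j] = matrix[j][i] = cosine(vectors[i], vectors[j])
    return matrix


vectors = [[1, 1, -1, -1], [1, -1, 1, -1], [1, 1, 1, -1]]
matrix = similarity_matrix_sketch(vectors)
```

Exploiting symmetry roughly halves the number of similarity calls; that shortcut would not be valid for an asymmetric measure.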

CPSE/CPSD¶

Context-preserving encoding and decoding.

CPSE/CPSD utilities for context-preserving compositional encoding.

This module provides utilities for Context-Preserving SDR Encoding (CPSE) and Context-Preserving SDR Decoding (CPSD), which build on Context-Dependent Thinning (CDT).

Key Features:

Order preservation via position permutations
Stable convergence (1.95% ± 0.15% error)
Fast convergence (4-5 iterations for M≥4 components)
Practical decoding methods (basic CPSD + Triadic Memory)

Based on:

Malits & Mendelson (2025) “Context-Preserving Encoding/Decoding of Compositional Structures”

References

Paper: Malits & Mendelson (2025) - CPSE/CPSD specifications
GitHub: https://github.com/PeterOvermann/TriadicMemory

Mathematical Foundation:

Additive iterations: K ≈ log(1 - 1/M) / log(1 - M·p) [Eq. 8]
Subtractive iterations: Complex formula [Eq. 15]
Total: 4-5 iterations for M ≥ 4 (near-constant)
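
The additive-iteration estimate can be evaluated directly. The sketch below only transcribes the expression for K as written above; the documentation does not define p, so the code treats it as a probability-like parameter with 0 < M·p < 1, and the value p = 0.02 is an arbitrary illustration rather than a figure from the paper.

```python
import math


def additive_iterations(m: int, p: float) -> float:
    """Evaluate K = log(1 - 1/M) / log(1 - M*p) as written above."""
    if m < 2:
        raise ValueError("M must be at least 2 so that log(1 - 1/M) is finite")
    if not 0.0 < m * p < 1.0:
        raise ValueError("M*p must lie strictly between 0 and 1")
    return math.log(1.0 - 1.0 / m) / math.log(1.0 - m * p)


# p = 0.02 is an assumed, illustrative value (not taken from the paper).
for m in (4, 5, 8):
    print(m, round(additive_iterations(m, p=0.02), 2))
```

This covers only the additive term; the subtractive term from Eq. 15 is not reproduced here because the formula is not given in this documentation.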

class holovec.utils.cpse.CPSEMetadata(n_components: int, permutation_seeds: List[int], base_seed: int = 42)[source]¶

Bases: object

Metadata for CPSE encoding operations.

Tracks permutation patterns, component structure, and encoding parameters for context-preserving operations. This metadata is essential for decoding and should be stored alongside encoded vectors.

The metadata enables:

Reconstruction of position permutations for decoding
Validation of convergence in encoding/decoding cycles
Deterministic reproduction of encoding operations

n_components¶: Number of components in composition (M)

permutation_seeds¶: Seeds for generating position-specific permutations

base_seed¶: Base seed for reproducibility

Examples

>>> # Create metadata for 5-component composition
>>> metadata = CPSEMetadata(
...     n_components=5,
...     permutation_seeds=[42, 43, 44, 45, 46],
...     base_seed=42
... )
>>>
>>> # Serialize for storage
>>> metadata.to_json('cpse_metadata.json')
>>>
>>> # Later, reload for decoding
>>> metadata = CPSEMetadata.from_json('cpse_metadata.json')

References

Malits & Mendelson (2025), Section 3.1: Position Encoding

__init__(n_components: int, permutation_seeds: List[int], base_seed: int = 42)[source]¶

Initialize CPSE metadata.

Parameters:

n_components – Number of components in composition (must be >= 2)
permutation_seeds – Seed for each position permutation (must have length == n_components)
base_seed – Base seed for reproducibility (default: 42)

Raises:

TypeError – If arguments are not correct types
ValueError – If n_components < 2 or permutation_seeds length mismatch

Examples

>>> # Minimal valid metadata
>>> metadata = CPSEMetadata(2, [42, 43])
>>>
>>> # Typical usage with 5 components
>>> seeds = generate_permutation_patterns(n_patterns=5)
>>> metadata = CPSEMetadata(5, seeds, base_seed=42)

to_dict() → Dict[str, Any][source]¶

Serialize metadata to dictionary.

Returns:

Dictionary with all metadata fields

Examples

>>> metadata = CPSEMetadata(3, [42, 43, 44])
>>> data = metadata.to_dict()
>>> print(data)
{'n_components': 3, 'permutation_seeds': [42, 43, 44], 'base_seed': 42}

classmethod from_dict(data: Dict[str, Any]) → CPSEMetadata[source]¶

Deserialize metadata from dictionary.

Parameters:

data – Dictionary with metadata fields

Returns:

CPSEMetadata instance

Raises:

KeyError – If required fields are missing
TypeError/ValueError – If field values are invalid

Examples

>>> data = {'n_components': 3, 'permutation_seeds': [42, 43, 44], 'base_seed': 42}
>>> metadata = CPSEMetadata.from_dict(data)
>>> print(metadata.n_components)
3

to_json(path: str)[source]¶

Save metadata to JSON file.

Parameters:

path – File path for saving

Examples

>>> metadata = CPSEMetadata(3, [42, 43, 44])
>>> metadata.to_json('my_cpse_metadata.json')

classmethod from_json(path: str) → CPSEMetadata[source]¶

Load metadata from JSON file.

Parameters:

path – File path for loading

Returns:

CPSEMetadata instance

Raises:

FileNotFoundError – If file doesn’t exist
json.JSONDecodeError – If file is not valid JSON
KeyError – If required fields are missing

Examples

>>> metadata = CPSEMetadata.from_json('my_cpse_metadata.json')
>>> print(metadata.n_components)
3
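
The save and reload cycle can be sketched with a small stand-in class. `MetadataSketch` below is hypothetical: it mirrors the three documented fields and the documented validation rules, but it is not the CPSEMetadata implementation, and the on-disk layout it writes is an assumption.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class MetadataSketch:
    """Hypothetical stand-in with the three documented fields."""

    n_components: int
    permutation_seeds: List[int]
    base_seed: int = 42

    def __post_init__(self) -> None:
        if self.n_components < 2:
            raise ValueError("n_components must be >= 2")
        if len(self.permutation_seeds) != self.n_components:
            raise ValueError("need one permutation seed per component")

    def to_json(self, path: str) -> None:
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(asdict(self), fh)

    @classmethod
    def from_json(cls, path: str) -> "MetadataSketch":
        with open(path, encoding="utf-8") as fh:
            return cls(**json.load(fh))


# Round trip through a temporary file so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cpse_metadata.json")
    original = MetadataSketch(3, [42, 43, 44])
    original.to_json(path)
    restored = MetadataSketch.from_json(path)
```

Because the dataclass generates `__eq__`, comparing `restored` with `original` checks every field, which is a convenient way to test a round trip.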

__repr__() → str[source]¶: String representation of metadata.

__eq__(other: object) → bool[source]¶: Check equality with another CPSEMetadata instance.

holovec.utils.cpse.generate_permutation_patterns(n_patterns: int, base_seed: int = 42) → List[int][source]¶

Generate permutation seeds for CPSE encoding.

Creates deterministic permutation seeds for position-dependent thinning operations. Each seed generates a unique permutation matrix used to encode position information.

The seeds are generated as: [base_seed, base_seed+1, …, base_seed+n-1]

Parameters:

n_patterns – Number of permutation patterns to generate
base_seed – Base random seed (default: 42)

Returns:

List of permutation seeds (length == n_patterns)

Raises:

TypeError – If arguments are not correct types
ValueError – If n_patterns < 1

Examples

>>> # Generate seeds for 5-component composition
>>> seeds = generate_permutation_patterns(n_patterns=5)
>>> print(seeds)
[42, 43, 44, 45, 46]
>>>
>>> # Generate with custom base seed
>>> seeds = generate_permutation_patterns(n_patterns=3, base_seed=100)
>>> print(seeds)
[100, 101, 102]
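
To show how consecutive seeds could drive position permutations, the sketch below generates the seed list as described and expands each seed into a permutation with a seeded shuffle. The expansion step is an assumption made for illustration; the documentation does not say how holovec turns a seed into a permutation, and all three helper names are hypothetical.

```python
import random
from typing import List


def seeds_sketch(n_patterns: int, base_seed: int = 42) -> List[int]:
    """Consecutive seeds: base_seed, base_seed + 1, ..."""
    if n_patterns < 1:
        raise ValueError("n_patterns must be >= 1")
    return [base_seed + i for i in range(n_patterns)]


def permutation_from_seed(seed: int, dim: int) -> List[int]:
    """Deterministic permutation of range(dim) drawn from a seeded RNG."""
    order = list(range(dim))
    random.Random(seed).shuffle(order)
    return order


def permute(vec: List[int], order: List[int]) -> List[int]:
    """Reorder vec so that output[i] = vec[order[i]]."""
    return [vec[i] for i in order]


seeds = seeds_sketch(3)
perms = [permutation_from_seed(s, dim=8) for s in seeds]
shuffled = permute([10, 11, 12, 13, 14, 15, 16, 17], perms[0])
```

Re-running `permutation_from_seed` with the same seed and dimension yields the same ordering, which is what makes storing only the seeds sufficient for decoding in this sketch.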

References

Malits & Mendelson (2025), Section 3.1: Position Encoding
- Each position i gets permutation p̃ᵢ derived from seed[i]
- Deterministic generation ensures reproducibility

holovec.utils.cpse.validate_cpse_convergence(original_components: List[Any], decoded_components: List[Any], model: VSAModel, threshold: float = 0.95) → Tuple[bool, List[float]][source]¶

Validate CPSE decoding convergence.

Checks whether decoded components are sufficiently similar to the originals by computing the similarity between each original component and its decoded counterpart and comparing it against a threshold. This is essential for verifying that the encoding-decoding cycle preserves information.

Typical convergence rates (Malits & Mendelson 2025, Table 1):

Basic CPSD: 95-98% similarity for M=2-5 components
With Triadic Memory: 97-99% similarity
Target threshold: 0.95 (95%) is conservative

Parameters:

original_components – Original component hypervectors (length M)
decoded_components – Decoded component hypervectors (length M)
model – VSA model for similarity computation
threshold – Minimum acceptable similarity (default: 0.95)

Returns:

converged (bool): True if all similarities >= threshold
similarities (List[float]): Similarity for each component pair

Return type:

Tuple of (converged, similarities)

Raises:

TypeError – If arguments are not correct types
ValueError – If component lists have different lengths

Examples

>>> # Validate decoding with strict threshold
>>> converged, sims = validate_cpse_convergence(
...     original_components=originals,
...     decoded_components=decoded,
...     model=model,
...     threshold=0.95
... )
>>> if converged:
...     print(f"Converged! Avg similarity: {np.mean(sims):.3f}")
... else:
...     print(f"Failed to converge. Min similarity: {min(sims):.3f}")
>>>
>>> # More lenient threshold for noisy conditions
>>> converged, sims = validate_cpse_convergence(
...     originals, decoded, model, threshold=0.90
... )
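
The check itself reduces to an element-wise comparison, sketched below on toy data. Cosine similarity stands in for the model's similarity function, and `validate_sketch` is a hypothetical helper rather than the library routine.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; stands in for model.similarity (an assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0


def validate_sketch(
    originals: List[Sequence[float]],
    decoded: List[Sequence[float]],
    threshold: float = 0.95,
) -> Tuple[bool, List[float]]:
    """Compare component i of each list; converged if all reach threshold."""
    if len(originals) != len(decoded):
        raise ValueError("component lists have different lengths")
    sims = [cosine(o, d) for o, d in zip(originals, decoded)]
    return all(s >= threshold for s in sims), sims


originals = [[1, 1, -1, -1], [1, -1, 1, -1]]
# The second decoded component has one flipped entry.
decoded = [[1, 1, -1, -1], [1, -1, 1, 1]]
converged, sims = validate_sketch(originals, decoded)
```

A single component below the threshold is enough to report failure, so inspecting the minimum of `sims` is usually more informative than the mean.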

References

Malits & Mendelson (2025), Section 4: Experimental Results
- Table 1 shows typical convergence rates for different M
- Figure 3 demonstrates convergence behavior
