Encoders¶

Encoders for different data types.

Base Encoder¶

class holovec.encoders.Encoder(model: VSAModel)[source]¶

Bases: ABC

Abstract base class for all encoders.

Encoders transform data into hypervectors compatible with VSA models. They follow the principle of locality preservation: similar inputs should map to similar hypervectors.

model¶: VSA model instance used for vector operations

backend¶: Backend instance (inherited from model)

dimension¶: Dimensionality of hypervectors (inherited from model)

Initialize encoder with a VSA model.

Parameters:: model – VSA model instance to use for operations
Raises:: ValueError – If model is not compatible with this encoder

__init__(model: VSAModel)[source]¶

Initialize encoder with a VSA model.

abstractmethod encode(data: Any) → Any[source]¶

Encode data into hypervector.

Parameters:: data – Input data (type depends on encoder)
Returns:: Hypervector representation of shape (dimension,)
Raises:: ValueError – If data is invalid for this encoder

encode_batch(data_list: List[Any]) → List[Any][source]¶

Encode multiple data points.

Default implementation encodes each item individually. Subclasses may override for more efficient batch encoding.

Parameters:: data_list – List of data items to encode
Returns:: List of encoded hypervectors
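
The default batch behaviour described above amounts to a loop over encode(). The sketch below illustrates that contract with a stand-in base class; `SketchEncoder` and `SignEncoder` are hypothetical names used only for illustration and are not part of holovec.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class SketchEncoder(ABC):
    """Hypothetical stand-in mirroring the documented interface (not holovec code)."""

    @abstractmethod
    def encode(self, data: Any) -> Any:
        """Encode one item."""

    def encode_batch(self, data_list: List[Any]) -> List[Any]:
        # Default: encode each item individually; subclasses may vectorize.
        return [self.encode(item) for item in data_list]


class SignEncoder(SketchEncoder):
    """Toy subclass: maps a number to a 4-element bipolar vector."""

    def encode(self, data: float) -> List[int]:
        sign = 1 if data >= 0 else -1
        return [sign, -sign, sign, -sign]


batch = SignEncoder().encode_batch([2.5, -1.0])
```

A subclass that can encode many items in one vectorized call would override encode_batch() and keep the same return shape.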

abstractmethod decode(hypervector: Any) → Any[source]¶

Decode hypervector back to data (if possible).

Parameters:: hypervector – Hypervector to decode, shape (dimension,)
Returns:: Decoded data, or None if encoder is not reversible
Raises:: NotImplementedError – If encoder does not support decoding

abstract property is_reversible: bool¶

Whether this encoder supports decoding.

Returns:: True if decode() is implemented and functional, False otherwise

abstract property compatible_models: List[str]¶

List of compatible VSA model names.

Returns:: List of model names (e.g., [‘FHRR’, ‘HRR’])

abstract property input_type: str¶

Description of expected input type.

Returns:: Human-readable string describing input type (e.g., “scalar float”, “sequence of symbols”, “2D array”)

__repr__() → str[source]¶: String representation of encoder.

Scalar Encoders¶

class holovec.encoders.FractionalPowerEncoder(model: VSAModel, min_val: float, max_val: float, bandwidth: float = 1.0, seed: int | None = None, phase_dist: str = 'uniform', mixture_bandwidths: List[float] | None = None, mixture_weights: List[float] | None = None)[source]¶

Bases: ScalarEncoder

Fractional Power Encoding (FPE) for continuous scalars.

Based on Frady et al. (2021) “Computing on Functions Using Randomized Vector Representations”. Encodes scalars by exponentiating a random phasor base vector: encode(x) = φ^x.

The inner product between encoded vectors approximates a similarity kernel (sinc for a uniform phase distribution). For element-wise (FHRR-style) binding, adding values corresponds to binding their encodings, φ^(x+y) = φ^x ⊙ φ^y, and an encoded value can be decoded by locating the peak of the sinc kernel.

Works best with FHRR (complex domain) but also supports HRR (real domain).

References

Frady et al. (2021): https://arxiv.org/abs/2109.03429
Verges et al. (2025): Learning encoding phasors with FPE

bandwidth¶: Controls kernel width (lower = wider kernel)

base_phasor¶: Random phasor vector φ = [e^(iθ₁), …, e^(iθ_D)], where θⱼ are the random phases and D is the model dimension

Initialize FractionalPowerEncoder.

Parameters:

model (VSAModel) – VSA model (FHRR or HRR). FHRR (complex-valued) is preferred for exact fractional powers. HRR (real-valued) uses cosine projection.
min_val (float) – Minimum value of encoding range. Values below this will be clipped.
max_val (float) – Maximum value of encoding range. Values above this will be clipped.
bandwidth (float, optional) –
Bandwidth parameter β controlling kernel width (default: 1.0).

Mathematical Role:
- Encoding: z(x) = φ^(β·x_normalized)
- Kernel: K(x₁, x₂) ≈ sinc(β·π·|x₁ - x₂|) for uniform phase distribution
- Smaller β → wider kernel → more generalization
- Larger β → narrower kernel → more discrimination

Typical Values:
- β = 0.01: Wide kernel, high generalization (classification)
- β = 1.0: Medium kernel (default)
- β = 10.0: Narrow kernel, low generalization (regression)
seed (int or None, optional) – Random seed for generating base phasor (for reproducibility). Different seeds produce different random frequency vectors θ.
phase_dist (str, optional) –
Distribution for sampling frequency vector θ (default: ‘uniform’).

Available Distributions (d = β·|x₁ - x₂| is the scaled distance):
- ‘uniform’: θⱼ ~ Uniform[-π, π] → sinc kernel (default)
- ‘gaussian’: θⱼ ~ N(0, 1) → Gaussian kernel approximation
- ‘laplace’: θⱼ ~ Laplace(0, 1) → Cauchy-shaped kernel 1/(1 + d²), heavy-tailed frequencies
- ‘cauchy’: θⱼ ~ Cauchy(0, 1) → Exponential (Laplacian) kernel e^(-d), very heavy-tailed frequencies, long-range
- ‘student’: θⱼ ~ Student-t(df=3) → Moderate tails, robust

Different distributions induce different similarity kernels, affecting generalization properties.
mixture_bandwidths (List[float] or None, optional) –
List of K bandwidth values [β₁, β₂, …, βₖ] for mixture encoding.

Mixture Encoding: Instead of single bandwidth β, use weighted combination:

z_mix(x) = Σₖ αₖ · φ^(βₖ·x)

where αₖ are mixture_weights. This creates multi-scale representation combining coarse (small β) and fine (large β) kernels.

Example:
mixture_bandwidths = [0.01, 0.1, 1.0, 10.0]  # 4 scales
This creates an encoding with both local and global similarity.
mixture_weights (List[float] or None, optional) –
Weights αₖ for each bandwidth in mixture (must sum to 1).

If None and mixture_bandwidths is provided, uses uniform weights:
αₖ = 1/K for all k

Weights can be:
1. Hand-crafted (domain knowledge)
2. Learned via learn_mixture_weights() (ridge regression)
3. Uniform (default)

Raises:

ValueError – If phase_dist not in valid set, or if mixture_weights/mixture_bandwidths have mismatched lengths.

Notes

Mathematical Foundation:

Fractional Power Encoding maps a scalar x to a hypervector via:

z(x) = φ^(β·x_normalized)

where:
- φ = [e^(iθ₁), e^(iθ₂), …, e^(iθ_D)] is the base phasor (D dimensions)
- θⱼ are random frequencies sampled from phase_dist
- x_normalized ∈ [0, 1] is x mapped to the unit interval
- β is the bandwidth parameter

Inner Product Kernel:

For uniform phase distribution θⱼ ~ Uniform[-π, π]:

⟨z(x₁), z(x₂)⟩ / D ≈ sinc(β·π·|x₁ - x₂|), where sinc(u) = sin(u)/u

This sinc kernel has important properties:
- Smooth interpolation between similar values
- Exact at x₁ = x₂ (similarity = 1)
- Decreases monotonically with distance within the main lobe (|x₁ - x₂| < 1/β), then oscillates with decaying side lobes
- Zero-crossings at nonzero integer multiples of 1/β
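
The kernel relation above can be checked numerically. The following is a minimal sketch in plain Python, not holovec's implementation: the helper names (`sample_phases`, `fpe_encode`, `similarity`, `sinc`) are hypothetical, and the similarity is assumed to be the real part of the normalized inner product.

```python
import cmath
import math
import random


def sample_phases(dim, seed):
    """Draw θ_j ~ Uniform[-π, π] (the 'uniform' phase distribution)."""
    rng = random.Random(seed)
    return [rng.uniform(-math.pi, math.pi) for _ in range(dim)]


def fpe_encode(x_norm, phases, beta=1.0):
    """z_j(x) = exp(i · θ_j · β · x_norm)."""
    return [cmath.exp(1j * theta * beta * x_norm) for theta in phases]


def similarity(a, b):
    """Real part of the normalized inner product ⟨a, b⟩ / D."""
    return sum((u * v.conjugate()).real for u, v in zip(a, b)) / len(a)


def sinc(u):
    """Unnormalized sinc: sin(u) / u, with sinc(0) = 1."""
    return 1.0 if u == 0 else math.sin(u) / u


phases = sample_phases(dim=4096, seed=0)
x1, x2, beta = 0.25, 0.55, 2.0
empirical = similarity(fpe_encode(x1, phases, beta), fpe_encode(x2, phases, beta))
predicted = sinc(beta * math.pi * abs(x1 - x2))
self_similarity = similarity(fpe_encode(x1, phases, beta), fpe_encode(x1, phases, beta))
```

The gap between `empirical` and `predicted` shrinks roughly as 1/√D, which is why larger dimensions give a cleaner kernel.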

Comparison to Random Fourier Features:

FPE is equivalent to Random Fourier Features (Rahimi & Recht, 2007) for kernel approximation:

k(x₁, x₂) ≈ φ(x₁)ᵀφ(x₂) / D

where φ(x) = [cos(θ₁·x), sin(θ₁·x), …, cos(θ_D·x), sin(θ_D·x)]

For complex hypervectors, FPE uses complex exponentials instead:

φ(x) = [e^(iθ₁·x), e^(iθ₂·x), …, e^(iθ_D·x)]

which provides more compact representation and supports exact fractional power operations in frequency domain.
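
The equivalence between the real cos/sin features and the complex phasor features follows from cos(a)cos(b) + sin(a)sin(b) = cos(a - b) = Re(e^(ia)·e^(-ib)). A small sketch of that identity, with hypothetical helper names and plain Python in place of the library backend:

```python
import cmath
import math
import random

rng = random.Random(1)
thetas = [rng.uniform(-math.pi, math.pi) for _ in range(512)]


def rff_features(x, thetas):
    """Real features [cos(θ_1 x), sin(θ_1 x), ..., cos(θ_D x), sin(θ_D x)]."""
    feats = []
    for theta in thetas:
        feats.extend((math.cos(theta * x), math.sin(theta * x)))
    return feats


def phasor_features(x, thetas):
    """Complex features [exp(i θ_1 x), ..., exp(i θ_D x)]."""
    return [cmath.exp(1j * theta * x) for theta in thetas]


x1, x2 = 0.3, 0.8
d = len(thetas)
real_kernel = sum(
    a * b for a, b in zip(rff_features(x1, thetas), rff_features(x2, thetas))
) / d
complex_kernel = sum(
    (u * v.conjugate()).real
    for u, v in zip(phasor_features(x1, thetas), phasor_features(x2, thetas))
) / d
```

Both kernels agree up to floating-point error, while the complex form stores D values instead of 2D.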

References

Frady et al. (2021): “Computing on Functions Using Randomized Vector Representations” - Original FPE paper
Rahimi & Recht (2007): “Random Features for Large-Scale Kernel Machines”
Sutherland & Schneider (2015): “On the Error of Random Fourier Features”
Verges et al. (2025): “Learning Encoding Phasors with Fractional Power Encoding”

Examples

>>> # Basic FPE for temperature encoding
>>> model = VSA.create('FHRR', dim=10000)
>>> encoder = FractionalPowerEncoder(model, min_val=0, max_val=100)
>>> temp_25 = encoder.encode(25.0)
>>> temp_26 = encoder.encode(26.0)
>>> similarity = model.similarity(temp_25, temp_26)  # ≈ 0.95

>>> # Multi-scale mixture encoding
>>> encoder_mix = FractionalPowerEncoder(
...     model, min_val=0, max_val=100,
...     mixture_bandwidths=[0.01, 0.1, 1.0, 10.0],
...     mixture_weights=[0.4, 0.3, 0.2, 0.1]  # Emphasize coarse scales
... )

>>> # Alternative kernel via phase distribution
>>> encoder_gauss = FractionalPowerEncoder(
...     model, min_val=0, max_val=100,
...     phase_dist='gaussian'  # Gaussian kernel instead of sinc
... )
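
To illustrate why the phase distribution determines the kernel, the sketch below samples Gaussian phases and compares the empirical similarity with exp(-t²/2), the characteristic function of N(0, 1). It is a plain-Python approximation under stated assumptions (hypothetical helper names, similarity taken as the real part of the normalized inner product), not the library's code.

```python
import cmath
import math
import random

rng = random.Random(0)
dim = 4096
phases = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # θ_j ~ N(0, 1)


def encode(x_norm, beta):
    """z_j(x) = exp(i · θ_j · β · x_norm) with Gaussian phases."""
    return [cmath.exp(1j * theta * beta * x_norm) for theta in phases]


def similarity(a, b):
    """Real part of the normalized inner product ⟨a, b⟩ / D."""
    return sum((u * v.conjugate()).real for u, v in zip(a, b)) / len(a)


x1, x2, beta = 0.2, 0.9, 2.0
empirical = similarity(encode(x1, beta), encode(x2, beta))
t = beta * (x1 - x2)
predicted = math.exp(-t * t / 2.0)  # Gaussian kernel in the scaled distance t
```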

__init__(model: VSAModel, min_val: float, max_val: float, bandwidth: float = 1.0, seed: int | None = None, phase_dist: str = 'uniform', mixture_bandwidths: List[float] | None = None, mixture_weights: List[float] | None = None)[source]¶

Initialize FractionalPowerEncoder.

encode(value: float) → Any[source]¶

Encode scalar value to hypervector using fractional power.

Parameters:: value (float) – Scalar value to encode. Will be clipped to [min_val, max_val].
Returns:: Encoded hypervector of shape (dimension,) in backend format.
Return type:: Array

Notes

Single Bandwidth Encoding:

For single bandwidth β, implements:

z(x) = φ^(β·x_normalized)

where:
- x_normalized = (value - min_val) / (max_val - min_val) ∈ [0, 1]
- φ = [e^(iθ₁), …, e^(iθ_D)] is the base phasor with random frequencies θⱼ
- Result is normalized according to the model’s space

Element-wise computation:

z_j(x) = e^(i·θⱼ·β·x_normalized)  (complex models)
z_j(x) = cos(θⱼ·β·x_normalized)  (real models)

Mixture Encoding:

When mixture_bandwidths = [β₁, …, βₖ] is provided, uses weighted sum:

z_mix(x) = Σₖ αₖ · φ^(βₖ·x_normalized)

where αₖ are mixture_weights (default: uniform αₖ = 1/K).

Advantages of Mixture Encoding:

Multi-Scale Representation: Combines coarse (small β) and fine (large β) similarity kernels in single hypervector
Improved Generalization: Coarse scales provide robustness, fine scales provide discrimination
Learned Weights: Weights αₖ can be learned via learn_mixture_weights() to optimize for specific task
Kernel Combination: To a first approximation, the mixture's similarity is a weighted combination of the per-bandwidth kernels K_βₖ(d); cross-terms between different bandwidths also contribute
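
A mixture encoding can be sketched directly from the weighted-sum formula. The code below is an illustrative plain-Python version with hypothetical helper names; it assumes L2 normalization after mixing and uniform weights αₖ = 1/K.

```python
import cmath
import math
import random


def mixture_encode(x_norm, phases, bandwidths, weights):
    """z_mix(x) = Σ_k α_k · φ^(β_k · x), followed by L2 normalization."""
    mixed = [
        sum(w * cmath.exp(1j * theta * b * x_norm) for b, w in zip(bandwidths, weights))
        for theta in phases
    ]
    norm = math.sqrt(sum(abs(z) ** 2 for z in mixed))
    return [z / norm for z in mixed]


def cosine(a, b):
    """Real part of ⟨a, b⟩ for unit-norm complex vectors."""
    return sum((u * v.conjugate()).real for u, v in zip(a, b))


rng = random.Random(0)
phases = [rng.uniform(-math.pi, math.pi) for _ in range(2048)]
bandwidths = [0.01, 0.1, 1.0, 10.0]
weights = [1.0 / len(bandwidths)] * len(bandwidths)  # uniform α_k = 1/K

hv_25 = mixture_encode(0.25, phases, bandwidths, weights)
hv_26 = mixture_encode(0.26, phases, bandwidths, weights)
hv_75 = mixture_encode(0.75, phases, bandwidths, weights)
sim_near = cosine(hv_25, hv_26)
sim_far = cosine(hv_25, hv_75)
```

The coarse components keep `sim_far` well above zero, while the fine components make it clearly lower than `sim_near`.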

Computational Complexity:

Single bandwidth: O(D) operations (element-wise exponential)
Mixture with K bandwidths: O(K·D) operations
Backend operations (exp, multiply) are vectorized/GPU-accelerated

Normalization:

Output is normalized using the model’s normalization scheme:
- FHRR/HRR: L2 normalization (unit norm)
- MAP: Element-wise normalization
- BSC/BSDC: No normalization (binary)

This ensures hypervectors are in valid space for subsequent binding/bundling operations.

Examples

>>> # Basic encoding
>>> model = VSA.create('FHRR', dim=10000)
>>> encoder = FractionalPowerEncoder(model, min_val=0, max_val=100)
>>> hv_25 = encoder.encode(25.0)  # Encode temperature 25°C
>>> hv_26 = encoder.encode(26.0)
>>> similarity = model.similarity(hv_25, hv_26)
>>> print(f"Similarity: {similarity:.3f}")  # ≈ 0.950 (close values)

>>> # Mixture encoding for multi-scale representation
>>> encoder_mix = FractionalPowerEncoder(
...     model, min_val=0, max_val=100,
...     mixture_bandwidths=[0.01, 1.0, 100.0]
... )
>>> hv_mix = encoder_mix.encode(25.0)  # Combines 3 scales

>>> # Effect of bandwidth on similarity
>>> enc_wide = FractionalPowerEncoder(model, 0, 100, bandwidth=0.1)
>>> enc_narrow = FractionalPowerEncoder(model, 0, 100, bandwidth=10.0)
>>> sim_wide = model.similarity(enc_wide.encode(25), enc_wide.encode(30))
>>> sim_narrow = model.similarity(enc_narrow.encode(25), enc_narrow.encode(30))
>>> # sim_wide > sim_narrow (wider kernel → more generalization)

decode(hypervector: Any, resolution: int = 1000, max_iterations: int = 100, tolerance: float = 1e-06) → float[source]¶

Decode hypervector back to scalar value using two-stage optimization.

Parameters:

hypervector (Array) – Hypervector to decode (typically a noisy/bundled encoding).
resolution (int, optional) – Number of grid points for coarse search (default: 1000). Higher resolution improves initial guess but increases cost.
max_iterations (int, optional) – Maximum gradient descent iterations (default: 100). Typical convergence: 20-50 iterations.
tolerance (float, optional) – Convergence tolerance for gradient descent (default: 1e-6). Stop when |Δx| < tolerance.

Returns:

Decoded scalar value in [min_val, max_val].

Return type:

float

Notes

Decoding Algorithm:

Uses two-stage optimization to find the value x maximizing similarity:

x* = argmax_x ⟨encode(x), hypervector⟩

Stage 1: Coarse Grid Search (O(resolution · D))
- Evaluate similarity at resolution uniformly-spaced points
- Find x₀ with highest similarity
- Provides good initialization for the gradient stage

Stage 2: Gradient Ascent (O(max_iterations · D))
- Starting from x₀, perform gradient ascent on the similarity:

x_{t+1} = x_t + η_t · ∇_x ⟨encode(x_t), hypervector⟩

- Gradient computed via finite differences: ∇_x ≈ (sim(x + ε) - sim(x)) / ε
- Step size η_t decays: η_t = η_0 · 0.95^t (prevents oscillation)
- Clips updates to [0, 1] normalized range

Why This Works:

For FPE with sinc kernel K(x₁, x₂) = sinc(β·π·|x₁ - x₂|):
- Similarity has a single dominant peak (the main lobe) with lower side lobes
- Peak occurs at x = x_true (encoded value)
- Gradient ascent started inside the main lobe converges to the global maximum

However, for noisy hypervectors (e.g., bundled encodings):
- Multiple local maxima may exist
- Coarse search reduces the chance of getting trapped in a local maximum
- Wider kernels (small β) → smoother objective → easier optimization
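
The two stages can be sketched as follows. This is a simplified plain-Python illustration rather than the library's decoder: the function names, the initial step size `eta0 = 0.2`, and the finite-difference step `eps = 1e-4` are assumptions chosen for the example.

```python
import cmath
import math
import random


def encode(x_norm, phases, beta=1.0):
    """z_j(x) = exp(i · θ_j · β · x_norm)."""
    return [cmath.exp(1j * theta * beta * x_norm) for theta in phases]


def similarity(a, b):
    """Real part of the normalized inner product ⟨a, b⟩ / D."""
    return sum((u * v.conjugate()).real for u, v in zip(a, b)) / len(a)


def decode(hv, phases, beta=1.0, resolution=200, max_iterations=50,
           tolerance=1e-6, eta0=0.2, eps=1e-4):
    def sim(x):
        return similarity(encode(x, phases, beta), hv)

    # Stage 1: coarse grid search over the normalized range [0, 1].
    grid = [i / (resolution - 1) for i in range(resolution)]
    x = max(grid, key=sim)

    # Stage 2: gradient ascent with a forward-difference gradient.
    for t in range(max_iterations):
        grad = (sim(x + eps) - sim(x)) / eps
        step = eta0 * (0.95 ** t) * grad        # decaying step size
        x_new = min(1.0, max(0.0, x + step))    # clip to [0, 1]
        converged = abs(x_new - x) < tolerance
        x = x_new
        if converged:
            break
    return x


rng = random.Random(0)
phases = [rng.uniform(-math.pi, math.pi) for _ in range(512)]
min_val, max_val = 0.0, 100.0
true_value = 25.0
hv = encode((true_value - min_val) / (max_val - min_val), phases)
decoded = min_val + decode(hv, phases) * (max_val - min_val)
```

The forward difference introduces a bias of about ε/2 in normalized units, so a central difference would be a reasonable refinement when higher precision matters.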

Approximation Quality:

Decoding accuracy depends on several factors:

Dimension D: Higher D → more accurate encoding → better decoding
- D = 1000: Moderate accuracy (similarity ≈ 0.85)
- D = 10000: High accuracy (similarity ≈ 0.99)
Signal-to-Noise Ratio: Clean encoding vs bundled/noisy
- Clean: Near-perfect recovery (error < 1%)
- Bundled (10 items): Good recovery (error ≈ 5-10%)
- Bundled (100 items): Degraded (error ≈ 20-30%)
Bandwidth β: Wider kernels → smoother similarity landscape
- β = 0.01: Very smooth, easy to optimize
- β = 10.0: Narrow kernel, may have local maxima
Mixture Encoding: Multiple bandwidths complicate landscape
- May require finer grid search (higher resolution)
- May need more gradient iterations

Computational Cost:

Total operations: O(resolution · D + max_iterations · D)

Typical values:
- resolution = 1000, max_iterations = 100, D = 10000
- Total: on the order of 1,100 similarity evaluations of O(D) each (≈ 1.1 × 10⁷ element-wise operations)
- Runtime: ~0.1-1.0 seconds (CPU), ~0.01-0.1 seconds (GPU)

For real-time applications, reduce resolution or max_iterations:
- resolution = 100 (coarser search)
- max_iterations = 20 (early stopping)

Comparison to Other Decoders:

Codebook Lookup (LevelEncoder): O(K · D) for K levels. Faster but discrete, no interpolation.
Resonator Network (cleanup): O(iterations · M · D) for M items. Better for structured/compositional decoding.
FPE Gradient Descent: O(resolution · D + iterations · D). Best for continuous scalar recovery.

References

Frady et al. (2021): “Computing on Functions Using Randomized Vector Representations” - Section on FPE decoding
Nocedal & Wright (2006): “Numerical Optimization” - Gradient descent methods and convergence analysis

Examples

>>> # Basic decoding
>>> model = VSA.create('FHRR', dim=10000)
>>> encoder = FractionalPowerEncoder(model, min_val=0, max_val=100)
>>> hv = encoder.encode(25.0)
>>> decoded = encoder.decode(hv)
>>> print(f"Decoded: {decoded:.2f}")  # ≈ 25.00

>>> # Decoding noisy hypervector (bundled encoding)
>>> hv_bundle = model.bundle([encoder.encode(25.0), encoder.encode(26.0)])
>>> decoded_bundle = encoder.decode(hv_bundle)
>>> print(f"Decoded bundle: {decoded_bundle:.2f}")  # ≈ 25.5

>>> # Fast decoding (lower resolution/iterations)
>>> decoded_fast = encoder.decode(hv, resolution=100, max_iterations=20)

property is_reversible: bool¶: FPE supports approximate decoding.

property compatible_models: List[str]¶: FPE works best with FHRR, also compatible with HRR.

__repr__() → str[source]¶: String representation.

learn_mixture_weights(values: List[float], labels: List[int], reg: float = 0.001) → List[float][source]¶

Learn mixture weights (alphas) for fixed mixture_bandwidths using a simple ridge-style objective that aligns encoded mixtures to per-class prototypes.

Approach:

Build class prototypes p_c as the mean of current encodings (using current weights)
For each sample i, compute per-band encodings E_i = [e_{i1},…,e_{iK}] (shape d×K)
Solve (Σ E_i^T E_i + reg I) α = Σ E_i^T p_{y_i}
Project α onto simplex (nonnegative, sum=1)
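
The final projection step can be illustrated with the standard sort-based Euclidean projection onto the probability simplex. Whether holovec uses this particular projection is an assumption; `project_to_simplex` is a hypothetical helper name.

```python
from typing import List


def project_to_simplex(v: List[float]) -> List[float]:
    """Euclidean projection of v onto {a : a_k >= 0, sum(a) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        # Keep the threshold from the last index where the entry stays positive.
        if u_j - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


raw_alpha = [0.6, -0.2, 0.9]  # e.g. an unconstrained ridge solution
alpha = project_to_simplex(raw_alpha)
```

Negative or oversized raw weights are shifted and clipped so the result is nonnegative and sums to 1, as the return value of learn_mixture_weights() is documented to do.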

Parameters:

values – list of scalar inputs
labels – list of integer class labels (same length as values)
reg – L2 regularization strength (default 1e-3)

Returns:

Learned mixture weights (list of floats summing to 1)

Notes

Requires mixture_bandwidths to be set (K>=2)
Uses numpy for solving normal equations; backend remains unchanged

class holovec.encoders.ThermometerEncoder(model: VSAModel, min_val: float, max_val: float, n_bins: int = 100, seed: int | None = None)[source]¶

Bases: ScalarEncoder

Thermometer encoding for scalar values.

Divides value range into N bins and encodes a value as the bundle of all bins it exceeds. Creates monotonic similarity profile.

Simpler and more robust than FPE, but with coarser granularity. Works with all VSA models.
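
The monotonic similarity profile can be seen in a small sketch. It assumes random bipolar bin vectors, bundling as an element-wise sum, and at least one active bin; these choices and all helper names are illustrative assumptions, not the library's implementation.

```python
import math
import random


def make_bins(n_bins, dim, seed):
    """One random bipolar vector per bin."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(dim)] for _ in range(n_bins)]


def thermometer_encode(value, min_val, max_val, bins):
    n_bins = len(bins)
    x_norm = (min(max(value, min_val), max_val) - min_val) / (max_val - min_val)
    active = max(1, min(n_bins, int(x_norm * n_bins)))  # number of bins reached
    # Bundle (element-wise sum) the vectors of all active bins.
    return [sum(column) for column in zip(*bins[:active])]


def cosine(a, b):
    dot = sum(u * v for u, v in zip(a, b))
    return dot / math.sqrt(sum(u * u for u in a) * sum(v * v for v in b))


bins = make_bins(n_bins=20, dim=8000, seed=0)
hv_50 = thermometer_encode(50, 0, 100, bins)
sims = [cosine(hv_50, thermometer_encode(v, 0, 100, bins)) for v in (55, 75, 100)]
```

Because encodings of nearby values share most of their active bins, similarity to the encoding of 50 falls off steadily as the second value moves away.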

References

Kanerva (2009): “Hyperdimensional Computing”

Initialize ThermometerEncoder.

Parameters:

model – VSA model (any)
min_val – Minimum value of encoding range
max_val – Maximum value of encoding range
n_bins – Number of bins to divide range into (default 100)
seed – Random seed for generating bin vectors

Raises:

ValueError – If n_bins < 2

__init__(model: VSAModel, min_val: float, max_val: float, n_bins: int = 100, seed: int | None = None)[source]¶

Initialize ThermometerEncoder.

encode(value: float) → Any[source]¶

Encode scalar as bundle of all bins it exceeds.

Parameters:: value – Scalar value to encode
Returns:: Encoded hypervector (bundle of activated bins)

decode(hypervector: Any) → float[source]¶

Decode is not implemented for ThermometerEncoder.

Thermometer encoding is not easily reversible without storing additional information.

Raises:: NotImplementedError – Always raises

property is_reversible: bool¶: Thermometer encoding is not reversible.

property compatible_models: List[str]¶: Works with all VSA models.

__repr__() → str[source]¶: String representation.

class holovec.encoders.LevelEncoder(model: VSAModel, min_val: float, max_val: float, n_levels: int, seed: int | None = None)[source]¶

Bases: ScalarEncoder

Level (codebook) encoding for discrete scalar values.

Maps discrete levels to random, nearly orthogonal vectors via lookup table. Encoding is an O(1) table lookup; decoding is a nearest-level search over the K = n_levels codebook vectors (O(K · D)). Exact for discrete values.

Best used when you have a small number of discrete values rather than continuous range.

Example

>>> # Encode weekdays (7 discrete values)
>>> model = VSA.create('FHRR', dim=10000)
>>> encoder = LevelEncoder(model, min_val=0, max_val=6, n_levels=7)
>>> monday = encoder.encode(0)  # Exact encoding
>>> friday = encoder.encode(4)
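
The lookup-and-nearest-level behaviour can be sketched without the library. The helper names below are hypothetical, and the use of random bipolar vectors and dot-product scoring is an assumption made for the illustration.

```python
import random


def make_levels(n_levels, dim, seed):
    """One random bipolar vector per discrete level."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(dim)] for _ in range(n_levels)]


def level_index(value, min_val, max_val, n_levels):
    """Clip the value and round it to the nearest level index."""
    clipped = min(max(value, min_val), max_val)
    return round((clipped - min_val) / (max_val - min_val) * (n_levels - 1))


def level_encode(value, min_val, max_val, codebook):
    return codebook[level_index(value, min_val, max_val, len(codebook))]


def level_decode(hv, min_val, max_val, codebook):
    # Nearest-level search: one dot product per codebook entry, O(K · D).
    scores = [sum(u * v for u, v in zip(hv, level)) for level in codebook]
    best = scores.index(max(scores))
    return min_val + best * (max_val - min_val) / (len(codebook) - 1)


codebook = make_levels(n_levels=7, dim=1000, seed=0)
decoded = level_decode(level_encode(3.2, 0, 6, codebook), 0, 6, codebook)
```

Here 3.2 is quantized to level 3 on encoding, so decoding returns the level value rather than the original input.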

Initialize LevelEncoder.

Parameters:

model – VSA model (any)
min_val – Minimum value (corresponds to level 0)
max_val – Maximum value (corresponds to level n_levels-1)
n_levels – Number of discrete levels
seed – Random seed for generating level vectors

Raises:

ValueError – If n_levels < 2

__init__(model: VSAModel, min_val: float, max_val: float, n_levels: int, seed: int | None = None)[source]¶

Initialize LevelEncoder.

encode(value: float) → Any[source]¶

Encode scalar to nearest level’s hypervector.

Parameters:: value – Scalar value to encode
Returns:: Hypervector corresponding to nearest level

decode(hypervector: Any) → float[source]¶

Decode hypervector to nearest level value.

Parameters:: hypervector – Hypervector to decode
Returns:: Decoded scalar value (will be one of the discrete levels)

property is_reversible: bool¶: Level encoding is reversible (to nearest level).

property compatible_models: List[str]¶: Works with all VSA models.

__repr__() → str[source]¶: String representation.

Sequence Encoders¶

class holovec.encoders.PositionBindingEncoder(model: VSAModel, codebook: Dict[str, Any] | None = None, max_length: int | None = None, auto_generate: bool = True, seed: int | None = None)[source]¶

Bases: SequenceEncoder

Position binding encoder for sequences using permutation-based positions.

Based on Plate (2003) “Holographic Reduced Representations” and Schlegel et al. (2021) “A comparison of vector symbolic architectures”.

Encodes sequences by binding each element with a position-specific permutation of a base position vector:

encode([A, B, C]) = bind(A, ρ¹(p)) + bind(B, ρ²(p)) + bind(C, ρ³(p))

where p is the base position vector, ρ is the permutation operation, and ρⁱ denotes i applications of ρ.

This encoding is:
- Order-sensitive: Different positions create different bindings
- Variable-length: Works with any sequence length
- Partial-match capable: Similar sequences have similar encodings

codebook¶: Dictionary mapping symbols to hypervectors

auto_generate¶: Whether to auto-generate vectors for unknown symbols

seed_offset¶: Offset for generating consistent symbol vectors

Example

>>> model = VSA.create('MAP', dim=10000)
>>> encoder = PositionBindingEncoder(model)
>>>
>>> # Encode a sequence of symbols
>>> seq = ['hello', 'world', '!']
>>> hv = encoder.encode(seq)
>>>
>>> # Similar sequences have high similarity
>>> seq2 = ['hello', 'world']
>>> hv2 = encoder.encode(seq2)
>>> model.similarity(hv, hv2)  # High (shared prefix)

Initialize position binding encoder.

Parameters:

model – VSA model instance
codebook – Pre-defined symbol → hypervector mapping (optional)
max_length – Maximum sequence length (None for unlimited)
auto_generate – Auto-generate vectors for unknown symbols (default: True)
seed – Random seed for generating symbol vectors

Raises:

ValueError – If model is not compatible

__init__(model: VSAModel, codebook: Dict[str, Any] | None = None, max_length: int | None = None, auto_generate: bool = True, seed: int | None = None)[source]¶

Initialize position binding encoder.

encode(sequence: List[str | int]) → Any[source]¶

Encode sequence using position binding.

Each element is bound with a position-specific permutation and all bound pairs are bundled:

result = Σᵢ bind(element_i, permute(position_vector, i))

Parameters:

sequence – List of symbols (strings or integers) to encode

Returns:

Hypervector representing the sequence

Raises:

ValueError – If sequence is empty
ValueError – If sequence exceeds max_length
ValueError – If symbol not in codebook and auto_generate=False

Example

>>> encoder.encode(['cat', 'sat', 'on', 'mat'])

decode(hypervector: Any, max_positions: int = 10, threshold: float = 0.3) → List[str][source]¶

Decode sequence hypervector to recover symbols.

Uses cleanup memory approach: for each position, unpermute and find most similar symbol in codebook.

Parameters:

hypervector – Sequence hypervector to decode
max_positions – Maximum positions to try decoding (default: 10)
threshold – Minimum similarity threshold for valid symbols (default: 0.3)

Returns:

List of decoded symbols (may be shorter than original)

Raises:

RuntimeError – If codebook is empty

Note

Decoding is approximate and works best for sequences shorter than max_positions with high SNR.

Example

>>> encoded = encoder.encode(['a', 'b', 'c'])
>>> decoded = encoder.decode(encoded, max_positions=5)
>>> decoded  # ['a', 'b', 'c'] (approximate)
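
The encode/decode round trip can be sketched with MAP-style bipolar vectors. This is an illustration under assumptions, not holovec's code: binding is element-wise multiplication, the permutation is modeled as a cyclic shift, positions start at 1, and all function names are hypothetical.

```python
import random


def random_bipolar(dim, rng):
    return [rng.choice((-1, 1)) for _ in range(dim)]


def permute(vec, shifts):
    """ρ applied `shifts` times, modeled here as a cyclic shift."""
    k = shifts % len(vec)
    return vec[-k:] + vec[:-k] if k else list(vec)


def bind(a, b):
    """Element-wise multiplication (self-inverse for ±1 vectors)."""
    return [u * v for u, v in zip(a, b)]


def encode_sequence(seq, codebook, position):
    total = [0] * len(position)
    for i, symbol in enumerate(seq, start=1):
        bound = bind(codebook[symbol], permute(position, i))
        total = [t + b for t, b in zip(total, bound)]
    return total


def decode_sequence(hv, codebook, position, max_positions=10, threshold=0.3):
    dim = len(position)
    decoded = []
    for i in range(1, max_positions + 1):
        probe = bind(hv, permute(position, i))  # unbind position i
        scores = {
            symbol: sum(u * v for u, v in zip(probe, vec)) / dim
            for symbol, vec in codebook.items()
        }
        best = max(scores, key=scores.get)
        if scores[best] < threshold:
            break  # no symbol is similar enough: treat as end of sequence
        decoded.append(best)
    return decoded


rng = random.Random(0)
dim = 2000
position = random_bipolar(dim, rng)
codebook = {symbol: random_bipolar(dim, rng) for symbol in "abcd"}
hv = encode_sequence(["a", "b", "c"], codebook, position)
decoded = decode_sequence(hv, codebook, position, max_positions=5)
```

Unbinding a position that holds no symbol leaves only cross-talk noise, which is why the threshold check ends decoding after the third position.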

add_symbol(symbol: str | int, vector: Any | None = None)[source]¶

Add a symbol to the codebook.

Parameters:

symbol – Symbol to add
vector – Hypervector to associate (generated if None)

Example

>>> # Pre-define a vector for a special symbol
>>> special_vec = model.random(seed=42)
>>> encoder.add_symbol('<START>', special_vec)

get_codebook_size() → int[source]¶

Get number of symbols in codebook.

Returns:: Number of symbols stored

property is_reversible: bool¶

PositionBindingEncoder supports approximate decoding.

Returns:: True (approximate decoding available)

property compatible_models: List[str]¶

Works with all VSA models that support permutation.

Returns:: List of all model names

__repr__() → str[source]¶: String representation.

class holovec.encoders.NGramEncoder(model: VSAModel, n: int = 2, stride: int = 1, mode: str = 'bundling', codebook: Dict[str, Any] | None = None, auto_generate: bool = True, seed: int | None = None)[source]¶

Bases: SequenceEncoder

N-gram encoder for capturing local sequence patterns using sliding windows.

Based on Plate (2003), Rachkovskij (1996), and Kleyko et al. (2023) Section 3.3.4.

Encodes sequences by extracting n-grams (sliding windows of n consecutive symbols) and encoding each n-gram compositionally:

For sequence [A, B, C, D] with n=2, stride=1:
- Extract n-grams: [A,B], [B,C], [C,D]
- Encode each n-gram using position binding
- Combine via bundling or chaining

Two encoding modes:

Bundling mode (bag-of-ngrams): encode(seq) = bundle([encode_ngram([A,B]), encode_ngram([B,C]), …])
- Order-invariant across n-grams (but preserves order within each n-gram)
- Good for classification (e.g., text categorization)
- Similar to bag-of-words but with local context
Chaining mode (ordered n-grams): encode(seq) = Σᵢ bind(encode_ngram(ngramᵢ), ρⁱ)
- Order-sensitive across n-grams
- Good for sequence matching
- Enables partial decoding

n¶: Size of n-grams (1=unigrams, 2=bigrams, 3=trigrams, etc.)

stride¶: Step size between n-grams (1=overlapping, n=non-overlapping)

mode¶: ‘bundling’ or ‘chaining’

ngram_encoder¶: Internal PositionBindingEncoder for individual n-grams

Example

>>> model = VSA.create('MAP', dim=10000)
>>> encoder = NGramEncoder(model, n=2, stride=1, mode='bundling')
>>>
>>> # Encode text as bigrams
>>> seq = ['the', 'cat', 'sat', 'on', 'mat']
>>> hv = encoder.encode(seq)  # Bigrams: [the,cat], [cat,sat], [sat,on], [on,mat]
>>>
>>> # Similar text has high similarity
>>> seq2 = ['the', 'cat', 'sat', 'on', 'hat']
>>> hv2 = encoder.encode(seq2)  # Shares 3/4 bigrams
>>> model.similarity(hv, hv2)  # High similarity

Initialize n-gram encoder.

Parameters:

model – VSA model instance
n – Size of n-grams (must be >= 1)
stride – Step between n-grams (must be >= 1)
mode – ‘bundling’ for bag-of-ngrams or ‘chaining’ for ordered n-grams
codebook – Optional pre-defined symbol → hypervector mapping
auto_generate – Auto-generate vectors for unknown symbols
seed – Random seed for symbol vector generation

Raises:

ValueError – If n < 1, stride < 1, or mode is invalid

__init__(model: VSAModel, n: int = 2, stride: int = 1, mode: str = 'bundling', codebook: Dict[str, Any] | None = None, auto_generate: bool = True, seed: int | None = None)[source]¶

Initialize n-gram encoder.

encode(sequence: List[str | int]) → Any[source]¶

Encode sequence using n-gram representation.

Extracts all n-grams using sliding window with specified stride, encodes each n-gram, then combines via bundling or chaining.

Parameters:: sequence – List of symbols to encode
Returns:: Hypervector representing the sequence as n-grams
Raises:: ValueError – If sequence is too short (length < n)

Example

>>> # Bigrams with stride=1 (overlapping)
>>> encoder = NGramEncoder(model, n=2, stride=1)
>>> encoder.encode(['A', 'B', 'C'])  # N-grams: AB, BC
>>>
>>> # Trigrams with stride=2 (partial overlap)
>>> encoder = NGramEncoder(model, n=3, stride=2)
>>> encoder.encode(['A', 'B', 'C', 'D', 'E'])  # N-grams: ABC, CDE
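
The sliding-window extraction described above can be sketched in plain Python. `extract_ngrams` is a hypothetical helper written for illustration, not the library's internal function. It assumes that a trailing window that does not fit is dropped, which the source does not state explicitly.

```python
def extract_ngrams(sequence, n=2, stride=1):
    """Return the n-grams of `sequence`, stepping by `stride`.

    Hypothetical helper; mirrors the documented validation rules.
    """
    if n < 1 or stride < 1:
        raise ValueError("n and stride must be >= 1")
    if len(sequence) < n:
        raise ValueError("sequence is too short (length < n)")
    # Window start positions: 0, stride, 2*stride, ... while a full window fits.
    return [list(sequence[i:i + n])
            for i in range(0, len(sequence) - n + 1, stride)]

print(extract_ngrams(['A', 'B', 'C'], n=2, stride=1))
# [['A', 'B'], ['B', 'C']]
print(extract_ngrams(['A', 'B', 'C', 'D', 'E'], n=3, stride=2))
# [['A', 'B', 'C'], ['C', 'D', 'E']]
```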

decode(hypervector: Any, max_ngrams: int = 10, threshold: float = 0.3) → List[List[str | int]][source]¶

Decode n-gram hypervector to recover n-grams.

Only supported for ‘chaining’ mode. For ‘bundling’ mode, n-grams are order-invariant and cannot be sequentially decoded.

Parameters:

hypervector – Encoded sequence hypervector
max_ngrams – Maximum number of n-grams to decode
threshold – Minimum similarity threshold for valid n-grams

Returns:

List of decoded n-grams, each as a list of symbols

Raises:

NotImplementedError – If mode is ‘bundling’ (not decodable)
RuntimeError – If codebook is empty

Example

>>> encoder = NGramEncoder(model, n=2, mode='chaining')
>>> hv = encoder.encode(['A', 'B', 'C'])
>>> encoder.decode(hv, max_ngrams=3)  # [['A', 'B'], ['B', 'C']]

get_codebook() → Dict[str, Any][source]¶

Get the internal symbol codebook.

Returns:: Dictionary mapping symbols to hypervectors

get_codebook_size() → int[source]¶

Get number of unique symbols in codebook.

Returns:: Number of symbols

property is_reversible: bool¶

NGramEncoder supports decoding only in ‘chaining’ mode.

Returns:: True if mode is ‘chaining’, False if ‘bundling’

property compatible_models: List[str]¶

Works with all VSA models.

Returns:: List of all model names

__repr__() → str[source]¶: String representation.

class holovec.encoders.TrajectoryEncoder(model: VSAModel, scalar_encoder: ScalarEncoder, n_dimensions: int = 1, time_range: Tuple[float, float] | None = None, seed: int | None = None)[source]¶

Bases: SequenceEncoder

Trajectory encoder for continuous sequences (time series, paths, motion).

Based on Frady et al. (2021) “Computing on Functions” and position binding from Plate (2003), encoding trajectories by binding temporal information with spatial positions.

A trajectory is a sequence of positions over time:

- 1D: time series [v₁, v₂, v₃, …]
- 2D: path [(x₁,y₁), (x₂,y₂), …]
- 3D: motion [(x₁,y₁,z₁), (x₂,y₂,z₂), …]

Encoding strategy:

For each time step tᵢ with position pᵢ:

1. Encode time: time_hv = scalar_encode(tᵢ)
2. Encode position coords: coord_hvs = [scalar_encode(c) for c in pᵢ]
3. Bind coords to dimensions: pos_hv = Σⱼ bind(Dⱼ, coord_hv_j)
4. Bind time with position: point_hv = bind(time_hv, pos_hv)
5. Permute by index: indexed_hv = permute(point_hv, i)

The trajectory hypervector is the bundle of all indexed points:

trajectory_hv = Σᵢ indexed_hv

This creates an encoding that:

- Preserves temporal ordering (via permutation)
- Captures smooth trajectories (via continuous scalar encoding)
- Enables partial matching and interpolation
- Supports multi-dimensional paths
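
The five encoding steps can be sketched end to end in plain Python. This is an illustration under stated assumptions, not holovec's implementation: it uses bipolar (+1/-1) vectors with elementwise multiplication as `bind`, elementwise addition as bundling, cyclic rotation as `permute`, and a toy thermometer code as the scalar encoder. All function names here are hypothetical.

```python
import math
import random

DIM = 1024  # small dimension to keep the sketch fast

def random_bipolar(rng):
    """Random +1/-1 hypervector (assumed MAP-style representation)."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]

def thermometer(value, lo, hi):
    """Toy scalar code: the first k components are +1, the rest -1."""
    frac = min(max((value - lo) / (hi - lo), 0.0), 1.0)
    k = round(frac * DIM)
    return [1] * k + [-1] * (DIM - k)

def bind(a, b):
    return [x * y for x, y in zip(a, b)]

def permute(v, shift):
    """Cyclic rotation by `shift` positions."""
    shift %= len(v)
    return v[-shift:] + v[:-shift]

def encode_trajectory(points, dim_vectors, value_range=(0.0, 100.0)):
    lo, hi = value_range
    last = max(len(points) - 1, 1)
    total = [0] * DIM
    for i, point in enumerate(points):
        coords = point if isinstance(point, (tuple, list)) else (point,)
        time_hv = thermometer(i, 0, last)              # 1. encode time
        pos_hv = [0] * DIM
        for d_vec, c in zip(dim_vectors, coords):      # 2-3. encode and bind coords
            bound = bind(d_vec, thermometer(c, lo, hi))
            pos_hv = [p + b for p, b in zip(pos_hv, bound)]
        point_hv = bind(time_hv, pos_hv)               # 4. bind time with position
        indexed_hv = permute(point_hv, i)              # 5. permute by index
        total = [t + x for t, x in zip(total, indexed_hv)]
    return total

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

rng = random.Random(0)
dim_vectors = [random_bipolar(rng) for _ in range(2)]  # D_x, D_y

path = [(10, 20), (15, 25), (20, 30), (25, 35)]
close = [(10, 20), (15, 25), (20, 30), (25, 40)]       # last point nudged
far = [(90, 80), (60, 10), (5, 95), (70, 0)]           # unrelated path

hv = encode_trajectory(path, dim_vectors)
sim_close = cosine(hv, encode_trajectory(close, dim_vectors))
sim_far = cosine(hv, encode_trajectory(far, dim_vectors))
print(sim_close > sim_far)
```

Because the thermometer code changes gradually with the value, nudging one coordinate alters only a small fraction of one point's components, so the nearby path stays far more similar than the unrelated one.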

scalar_encoder¶: Encoder for continuous values (FPE or Thermometer)

n_dimensions¶: Dimensionality of trajectory (1D, 2D, or 3D)

time_range¶: (min_time, max_time) for temporal normalization

dim_vectors¶: Hypervectors for spatial dimensions (x, y, z)

Example

>>> from holovec import VSA
>>> from holovec.encoders import FractionalPowerEncoder, TrajectoryEncoder
>>>
>>> model = VSA.create('FHRR', dim=10000)
>>> scalar_enc = FractionalPowerEncoder(model, min_val=0, max_val=100)
>>> encoder = TrajectoryEncoder(model, scalar_encoder=scalar_enc, n_dimensions=2)
>>>
>>> # Encode a 2D path
>>> path = [(10, 20), (15, 25), (20, 30), (25, 35)]
>>> hv = encoder.encode(path)
>>>
>>> # Similar paths have high similarity
>>> path2 = [(10, 20), (15, 25), (20, 30), (25, 40)]  # Slightly different
>>> hv2 = encoder.encode(path2)
>>> model.similarity(hv, hv2)  # High similarity

__init__(model: VSAModel, scalar_encoder: ScalarEncoder, n_dimensions: int = 1, time_range: Tuple[float, float] | None = None, seed: int | None = None)[source]¶

Initialize trajectory encoder.

Parameters:

model – VSA model instance
scalar_encoder – Encoder for continuous values (FPE or Thermometer recommended)
n_dimensions – Trajectory dimensionality (1, 2, or 3)
time_range – (min, max) time values for normalization (optional)
seed – Random seed for dimension vector generation

Raises:

ValueError – If n_dimensions not in {1, 2, 3}
TypeError – If scalar_encoder is not reversible

encode(trajectory: List[float | Tuple[float, ...]]) → Any[source]¶

Encode a trajectory as a hypervector.

Each point in the trajectory is encoded with temporal information, then all points are combined with position-based permutation.

Parameters:: trajectory – List of points

- 1D: List[float] e.g., [1.0, 2.5, 3.7, …]
- 2D: List[Tuple[float, float]] e.g., [(1,2), (3,4), …]
- 3D: List[Tuple[float, float, float]] e.g., [(1,2,3), …]

Returns:: Hypervector representing the trajectory
Raises:: ValueError – If trajectory is empty or points have wrong dimensionality

Example

>>> # 1D time series
>>> encoder_1d = TrajectoryEncoder(model, scalar_enc, n_dimensions=1)
>>> hv = encoder_1d.encode([1.0, 2.5, 3.7, 5.2])
>>>
>>> # 2D path
>>> encoder_2d = TrajectoryEncoder(model, scalar_enc, n_dimensions=2)
>>> hv = encoder_2d.encode([(0,0), (1,1), (2,2)])

decode(hypervector: Any, max_points: int = 10) → List[Tuple[float, ...]][source]¶

Decode trajectory hypervector to recover approximate points.

Note: Trajectory decoding is not yet implemented. It requires:

1. Unpermuting each position
2. Unbinding time from position
3. Unbinding each coordinate from dimension vectors
4. Decoding scalar values
5. Interpolation for smooth trajectories

Parameters:

hypervector – Encoded trajectory hypervector
max_points – Maximum points to decode

Returns:

List of decoded points (not implemented yet)

Raises:

NotImplementedError – Trajectory decoding requires solving nested binding inverse problem.

Notes

Trajectory decoding is not implemented because it requires multi-level unbinding with cascading error accumulation:

Mathematical Challenge:

The encoding process creates nested bindings:

trajectory_hv = bundle([bind(time(t), bind(dimension(d), scalar(coord[t,d]))) for all t, d])

To decode a single point at time t:

1. Unbind time: point_hv[t] = unbind(trajectory_hv, time(t))
2. For each dimension d:
   - Unbind dimension: coord_hv[d] = unbind(point_hv[t], dimension(d))
   - Decode scalar: coord[t,d] = scalar_decode(coord_hv[d])

Why This Is Intractable:

Two-level unbinding: Time then dimension (or vice versa)
Error compounding: Each unbind adds noise
No known time points: Must search over possible time values
Interpolation complexity: Smooth trajectory requires dense sampling
Computational cost: For T time points and D dimensions, decoding requires T × D × (decode_iterations) evaluations. Example: 100 points × 3D × 100 iterations = 30,000 evals

Additional Challenges:

Order Ambiguity: Don’t know which time point comes first
Density Unknown: Don’t know temporal sampling rate
Dimension Count: Must know dimensionality a priori
Coordinate Ranges: Scalar decoder needs value bounds

Possible Approaches (Future Work):

Constrained Decoding: If time points are known:
- Unbind each known time point
- Decode coordinates independently
- Complexity: O(T × D × decode_cost)

Template Matching: Pre-encode common trajectory patterns
- Create codebook of canonical trajectories
- Use cleanup to find nearest match
- Works for classification, not reconstruction

Learned Decoder: Train neural network trajectory_hv → points
- Requires large training dataset
- Can learn to handle noise and ambiguity
- See: Imani et al. (2019) for similar approach

Iterative Resonator: Use resonator cleanup at each level
- Unbind time with resonator cleanup
- Unbind dimension with resonator cleanup
- Requires codebooks for both time and coordinates

Current Recommendation:

Use TrajectoryEncoder for one-way encoding in applications like:

- Trajectory classification (gesture recognition, motion analysis)
- Trajectory similarity search (find similar paths)
- Trajectory clustering (group similar motions)

For reconstruction, consider storing original trajectories separately and using hypervector encoding only for similarity queries.
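
One way to follow this recommendation is to keep each original trajectory next to its hypervector and use the hypervector only for ranking. The sketch below is a minimal illustration in plain Python; `TrajectoryIndex` is a hypothetical class, not part of holovec, and the three-component vectors stand in for real hypervectors produced by `encoder.encode`.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0

class TrajectoryIndex:
    """Stores (hypervector, original trajectory) pairs for similarity lookup."""

    def __init__(self):
        self._entries = []

    def add(self, hv, trajectory):
        self._entries.append((list(hv), trajectory))

    def query(self, hv, top_k=1):
        """Return the top_k (trajectory, similarity) pairs, best match first."""
        scored = [(traj, cosine(hv, stored)) for stored, traj in self._entries]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]

# Toy stand-ins for encoded trajectories.
index = TrajectoryIndex()
index.add([1.0, 0.0, 0.0], [(0, 0), (1, 1)])
index.add([0.0, 1.0, 0.0], [(5, 5), (4, 3)])

best_trajectory, best_score = index.query([0.9, 0.1, 0.0])[0]
print(best_trajectory)  # [(0, 0), (1, 1)]
```

The query returns the stored original exactly, so no unbinding or scalar decoding is needed. The linear scan costs O(N) per query.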

References

Plate (2003): “Holographic Reduced Representations” - Section 4.3 on error accumulation in multi-level binding
Räsänen & Saarinen (2016): “Sequence prediction with sparse distributed hyperdimensional coding” - Analysis of temporal binding

property is_reversible: bool¶

TrajectoryEncoder does not yet support decoding.

Returns:: False (decoding not implemented)

Note

Decoding requires multi-level unbinding and interpolation, which will be implemented in a future version.

property compatible_models: List[str]¶

Works with all VSA models.

Returns:: List of all model names

property input_type: str¶: Input type description.

__repr__() → str[source]¶: String representation.

Spatial Encoders¶

class holovec.encoders.ImageEncoder(model: VSAModel, scalar_encoder: ScalarEncoder, normalize_pixels: bool = True, seed: int | None = None)[source]¶

Bases: Encoder

Image encoder for 2D images (grayscale, RGB, or RGBA).

Encodes images by binding spatial positions (x, y) with pixel values. For color images, each channel is bound to a channel dimension vector before being combined with position information.

Encoding strategy:

For each pixel at position (x, y) with value v:

1. Encode position: pos_hv = bundle([bind(X, enc(x)), bind(Y, enc(y))])
2. Encode value(s):
   - Grayscale: val_hv = enc(v)
   - RGB: val_hv = bundle([bind(R, enc(r)), bind(G, enc(g)), bind(B, enc(b))])
3. Bind position with value: pixel_hv = bind(pos_hv, val_hv)
4. Bundle all pixels: image_hv = bundle([all pixel_hvs])

This creates a distributed representation that preserves both spatial structure and pixel values, enabling similarity-based image comparison.

Parameters:

model (VSAModel) – The VSA model to use for encoding operations.
scalar_encoder (ScalarEncoder) – Encoder for continuous pixel values (0-255 typically).
normalize_pixels (bool, optional) – Whether to normalize pixel values to [0, 1] before encoding. Default is True.
seed (int, optional) – Random seed for reproducibility. Default is None.

n_channels¶

Number of channels in the last encoded image (1, 3, or 4).

Type:: int

image_shape¶

Shape (height, width, channels) of the last encoded image.

Type:: tuple

Examples

>>> from holovec import VSA
>>> from holovec.encoders import ImageEncoder, ThermometerEncoder
>>> import numpy as np
>>>
>>> model = VSA.create('MAP', dim=10000, seed=42)
>>> scalar_enc = ThermometerEncoder(model, min_val=0, max_val=1, n_bins=256, seed=42)
>>> encoder = ImageEncoder(model, scalar_enc, normalize_pixels=True, seed=42)
>>>
>>> # Encode a small grayscale image
>>> image = np.array([[100, 150], [200, 250]], dtype=np.uint8)
>>> hv = encoder.encode(image)
>>> print(hv.shape)  # (10000,)
>>>
>>> # Encode RGB image
>>> rgb_image = np.random.randint(0, 256, (28, 28, 3), dtype=np.uint8)
>>> hv_rgb = encoder.encode(rgb_image)

__init__(model: VSAModel, scalar_encoder: ScalarEncoder, normalize_pixels: bool = True, seed: int | None = None)[source]¶: Initialize ImageEncoder.

encode(image: Any | numpy.ndarray) → Any[source]¶

Encode an image into a hypervector.

Parameters:: image (array-like) – Image array with shape (height, width) for grayscale or (height, width, channels) for color images. Pixel values should be in range [0, 255] for uint8 or [0, 1] for float. Typically a NumPy array from PIL, OpenCV, or similar libraries.
Returns:: Hypervector encoding of the image.
Return type:: Array
Raises:: ValueError – If image has invalid shape or number of channels.

Notes

This encoder accepts images as NumPy arrays (the standard format from image libraries like PIL, OpenCV, scikit-image) and processes them using the configured backend. While input must be NumPy, internal VSA operations use the model’s backend (NumPy/PyTorch/JAX).

decode(hypervector: Any, height: int, width: int, n_channels: int = 1) → numpy.ndarray[source]¶

Decode a hypervector to reconstruct an approximate image.

Note: Image decoding is approximate and requires knowing the target image dimensions. Reconstruction quality depends on the scalar encoder’s decoding capabilities and may require candidate value search.

Parameters:

hypervector (Array) – The hypervector to decode.
height (int) – Target image height.
width (int) – Target image width.
n_channels (int, optional) – Number of channels (1, 3, or 4). Default is 1.

Returns:

Reconstructed image with shape (height, width) for grayscale or (height, width, n_channels) for color.

Return type:

np.ndarray

Raises:

NotImplementedError – Image decoding is computationally intractable without additional constraints.

Notes

Image decoding is not implemented because it requires solving a high-dimensional inverse problem that is fundamentally ill-posed:

Mathematical Challenge:

The encoding process binds pixel values with position vectors:: image_hv = bundle([bind(position(i,j), scalar(pixel[i,j])) for all i,j])

To decode, we must:

1. Unbind each position: pixel_hv[i,j] = unbind(image_hv, position(i,j))
2. Decode each scalar: pixel[i,j] = scalar_decode(pixel_hv[i,j])

Why This Is Intractable:

Unbinding is approximate (except for FHRR with exact inverse)
Each unbind operation introduces noise
For H×W image: H×W unbind operations compound errors
Scalar decoding via optimization (1000 evals × 100 iterations = 100,000 evaluations per pixel)
Total: ~1 billion evaluations for a 100×100 image (10,000 pixels × 100,000 evaluations)
No gradient available for joint optimization

Alternative Approaches:

Database Retrieval: Encode query image, find nearest match in database
- Complexity: O(N) for N known images
- Works well for classification/recognition tasks

Iterative Resonator: Use resonator cleanup with pixel codebook
- Requires pre-built codebook of common pixel patterns
- May reconstruct coarse structure but not fine details

Neural Decoder: Train neural network image_hv → image
- Requires supervised training data
- Can learn inverse mapping empirically
- See: Imani et al. (2019) “VoiceHD” for similar approach

For practical applications, use ImageEncoder for one-way encoding (e.g., image→hypervector→classifier) rather than reconstruction.
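
A common form of the image→hypervector→classifier pipeline is a prototype classifier: bundle the hypervectors of each class into one prototype and assign a query to the most similar prototype. The sketch below is illustrative plain Python; `train_prototypes` and `classify` are hypothetical helpers, not holovec functions, and the four-component vectors stand in for the output of `encoder.encode(image)`.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0

def train_prototypes(samples):
    """Bundle (elementwise sum) the hypervectors of each label."""
    prototypes = {}
    for hv, label in samples:
        current = prototypes.get(label, [0.0] * len(hv))
        prototypes[label] = [c + x for c, x in zip(current, hv)]
    return prototypes

def classify(hv, prototypes):
    """Return the label whose prototype is most similar to hv."""
    return max(prototypes, key=lambda label: cosine(hv, prototypes[label]))

# Toy stand-ins for encoded images.
training = [
    ([1, 1, -1, -1], "dark"),
    ([1, 1, 1, -1], "dark"),
    ([-1, -1, 1, 1], "bright"),
    ([-1, 1, 1, 1], "bright"),
]
prototypes = train_prototypes(training)
label = classify([1, 1, -1, 1], prototypes)
print(label)  # dark
```

Classification needs one similarity computation per class and never decodes the hypervector back to pixels.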

References

Imani et al. (2019): “VoiceHD: Hyperdimensional Computing for Efficient Speech Recognition”
Plate (2003): “Holographic Reduced Representations” - Chapter 4 on approximate unbinding and error accumulation

property is_reversible: bool¶

Whether the encoder supports decoding.

Returns:: False - image decoding not yet implemented.
Return type:: bool

property compatible_models: List[str]¶

List of compatible VSA model names.

Returns:: All VSA models supported (depends on scalar encoder compatibility).
Return type:: list of str

property input_type: str¶

Description of expected input type.

Returns:: Description of input format.
Return type:: str

__repr__() → str[source]¶: Return string representation.

class holovec.encoders.VectorEncoder(model: VSAModel, scalar_encoder: ScalarEncoder, n_dimensions: int, normalize_input: bool = False, seed: int | None = None)[source]¶

Bases: StructuredEncoder

Vector encoder for multi-dimensional numeric data using role-filler binding.

Encodes vectors by binding each dimension with its scalar-encoded value:

encode([v₁, v₂, …, vₙ]) = Σᵢ bind(Dᵢ, scalar_encode(vᵢ))

where:

- Dᵢ is a random hypervector for dimension i
- scalar_encode(vᵢ) encodes the scalar value using FPE/Thermometer/Level
- bind() creates a role-filler binding
- Σ bundles all dimension-value pairs

This creates a compositional encoding where:

- Each dimension has explicit representation (Dᵢ)
- Similar values in corresponding dimensions → higher similarity
- Supports partial matching across dimensions
- Enables approximate decoding via unbinding

scalar_encoder¶: Encoder for individual scalar values

n_dimensions¶: Number of dimensions in input vectors

dim_vectors¶: List of dimension hypervectors (Dᵢ)

normalize_input¶: Whether to normalize input vectors

Example

>>> from holovec import VSA
>>> from holovec.encoders import FractionalPowerEncoder, VectorEncoder
>>>
>>> model = VSA.create('FHRR', dim=10000)
>>> scalar_enc = FractionalPowerEncoder(model, min_val=0, max_val=1)
>>> encoder = VectorEncoder(model, scalar_encoder=scalar_enc, n_dimensions=128)
>>>
>>> # Encode a feature vector (list or any backend array)
>>> features = [0.5] * 128  # Can also use numpy/torch/jax arrays
>>> hv = encoder.encode(features)
>>>
>>> # Similar vectors have high similarity
>>> features2 = [0.51] * 128  # Slightly different
>>> hv2 = encoder.encode(features2)
>>> model.similarity(hv, hv2)  # High similarity
>>>
>>> # Decode to recover approximate values
>>> recovered = encoder.decode(hv)
>>> # Verify approximate recovery via similarity
>>> model.similarity(encoder.encode(recovered), hv) > 0.9

__init__(model: VSAModel, scalar_encoder: ScalarEncoder, n_dimensions: int, normalize_input: bool = False, seed: int | None = None)[source]¶

Initialize vector encoder.

Parameters:

model – VSA model instance
scalar_encoder – Encoder for individual scalar values
n_dimensions – Number of dimensions in input vectors
normalize_input – Whether to normalize input vectors to unit length
seed – Random seed for dimension vector generation

Raises:

ValueError – If n_dimensions < 1
TypeError – If scalar_encoder is not a ScalarEncoder

encode(vector: Any) → Any[source]¶

Encode a vector using dimension binding.

Each element is bound with its corresponding dimension vector:

result = Σᵢ bind(Dᵢ, scalar_encode(vector[i]))

Parameters:: vector – Input vector to encode, shape (n_dimensions,)
Returns:: Hypervector representing the vector
Raises:: ValueError – If vector shape doesn’t match n_dimensions

Example

>>> encoder = VectorEncoder(model, scalar_enc, n_dimensions=3)
>>> vector = [1.0, 2.0, 3.0]  # Can also be numpy/torch/jax array
>>> hv = encoder.encode(vector)

decode(hypervector: Any) → Any[source]¶

Decode vector hypervector to recover approximate values.

For each dimension i:

1. Unbind dimension: value_hv = unbind(hypervector, Dᵢ)
2. Decode scalar: value ≈ scalar_encoder.decode(value_hv)

Parameters:: hypervector – Vector hypervector to decode, shape (dimension,)
Returns:: Decoded vector, shape (n_dimensions,) (backend array type)
Raises:: NotImplementedError – If scalar_encoder doesn’t support decoding

Note

Decoding is approximate and quality depends on:

- VSA model (exact vs. approximate binding)
- Scalar encoder precision
- Number of dimensions (more dims → more noise)

Example

>>> original = [1.0, 2.0, 3.0]
>>> encoded = encoder.encode(original)
>>> decoded = encoder.decode(encoded)
>>> # Check approximate recovery (using backend operations)
>>> model.similarity(encoder.encode(decoded), encoded) > 0.9
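
The unbind-then-decode procedure can be illustrated without the library. The sketch below is a toy under stated assumptions, not holovec's implementation: bipolar vectors with elementwise multiplication as binding (so each dimension vector is its own inverse), a thermometer code as the scalar encoder, and scalar decoding by searching a fixed grid of candidate values. All names are hypothetical.

```python
import random

DIM = 2048

def random_bipolar(rng):
    return [rng.choice((-1, 1)) for _ in range(DIM)]

def thermometer(value, lo=0.0, hi=1.0):
    """Toy scalar code: the first k components are +1, the rest -1."""
    frac = min(max((value - lo) / (hi - lo), 0.0), 1.0)
    k = round(frac * DIM)
    return [1] * k + [-1] * (DIM - k)

def bind(a, b):
    return [x * y for x, y in zip(a, b)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def encode_vector(values, dim_vectors):
    """result = sum_i bind(D_i, scalar_encode(values[i]))"""
    total = [0] * DIM
    for d_vec, v in zip(dim_vectors, values):
        bound = bind(d_vec, thermometer(v))
        total = [t + b for t, b in zip(total, bound)]
    return total

def decode_vector(hv, dim_vectors, candidates):
    decoded = []
    for d_vec in dim_vectors:
        # 1. Unbind: multiplying by D_i again cancels it (D_i * D_i = 1),
        #    leaving the scalar code plus noise from the other dimensions.
        value_hv = bind(hv, d_vec)
        # 2. Decode: pick the candidate whose code matches best.
        best = max(candidates, key=lambda c: dot(value_hv, thermometer(c)))
        decoded.append(best)
    return decoded

rng = random.Random(0)
dim_vectors = [random_bipolar(rng) for _ in range(3)]
candidates = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0

original = [0.2, 0.7, 0.5]
recovered = decode_vector(encode_vector(original, dim_vectors),
                          dim_vectors, candidates)
print(recovered)
```

Each unbound vector carries crosstalk from the other bundled dimensions, which is why recovery degrades as the number of dimensions grows relative to the hypervector size. The precision of the result is also limited by the spacing of the candidate grid.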

property is_reversible: bool¶

VectorEncoder supports approximate decoding if scalar_encoder does.

Returns:: True if scalar_encoder supports decoding, False otherwise

property compatible_models: List[str]¶

Works with all VSA models.

Decoding quality varies:

- Exact models (FHRR, MAP): High accuracy
- Approximate models (HRR, BSC): Moderate accuracy

Returns:: List of all model names

property input_type: str¶: Input type description.

__repr__() → str[source]¶: String representation.


See Also¶