Encoders¶
Encoders for different data types.
Base Encoder¶
- class holovec.encoders.Encoder(model: VSAModel)[source]¶
Bases: ABC
Abstract base class for all encoders.
Encoders transform data into hypervectors compatible with VSA models. They follow the principle of locality preservation: similar inputs should map to similar hypervectors.
- model¶
VSA model instance used for vector operations
- backend¶
Backend instance (inherited from model)
- dimension¶
Dimensionality of hypervectors (inherited from model)
Initialize encoder with a VSA model.
- Parameters:
model – VSA model instance to use for operations
- Raises:
ValueError – If model is not compatible with this encoder
- abstractmethod encode(data: Any) Any[source]¶
Encode data into hypervector.
- Parameters:
data – Input data (type depends on encoder)
- Returns:
Hypervector representation of shape (dimension,)
- Raises:
ValueError – If data is invalid for this encoder
- encode_batch(data_list: List[Any]) List[Any][source]¶
Encode multiple data points.
Default implementation encodes each item individually. Subclasses may override for more efficient batch encoding.
- Parameters:
data_list – List of data items to encode
- Returns:
List of encoded hypervectors
- abstractmethod decode(hypervector: Any) Any[source]¶
Decode hypervector back to data (if possible).
- Parameters:
hypervector – Hypervector to decode, shape (dimension,)
- Returns:
Decoded data, or None if encoder is not reversible
- Raises:
NotImplementedError – If encoder does not support decoding
- abstract property is_reversible: bool¶
Whether this encoder supports decoding.
- Returns:
True if decode() is implemented and functional, False otherwise
- abstract property compatible_models: List[str]¶
List of compatible VSA model names.
- Returns:
List of model names (e.g., ['FHRR', 'HRR'])
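The contract described above can be illustrated with a small stand-in. This is only a sketch: `SketchEncoder` and `ParityEncoder` are hypothetical names, the stand-in omits the `model` argument and `compatible_models`, and it is not holovec's actual base class.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class SketchEncoder(ABC):
    """Hypothetical stand-in mirroring the documented interface."""

    @abstractmethod
    def encode(self, data: Any) -> Any: ...

    def encode_batch(self, data_list: List[Any]) -> List[Any]:
        # Default behaviour described above: encode items one at a time.
        return [self.encode(item) for item in data_list]

    @abstractmethod
    def decode(self, hypervector: Any) -> Any: ...

    @property
    @abstractmethod
    def is_reversible(self) -> bool: ...


class ParityEncoder(SketchEncoder):
    """Toy subclass: maps an int to a 2-element one-hot list by parity."""

    def encode(self, data: int) -> List[int]:
        if not isinstance(data, int):
            raise ValueError("ParityEncoder expects an int")
        return [1, 0] if data % 2 == 0 else [0, 1]

    def decode(self, hypervector: List[int]) -> str:
        return "even" if hypervector[0] >= hypervector[1] else "odd"

    @property
    def is_reversible(self) -> bool:
        return True


enc = ParityEncoder()
batch = enc.encode_batch([2, 3, 10])
```

A subclass that cannot decode would instead raise NotImplementedError from `decode` and return False from `is_reversible`.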
Scalar Encoders¶
- class holovec.encoders.FractionalPowerEncoder(model: VSAModel, min_val: float, max_val: float, bandwidth: float = 1.0, seed: int | None = None, phase_dist: str = 'uniform', mixture_bandwidths: List[float] | None = None, mixture_weights: List[float] | None = None)[source]¶
Bases: ScalarEncoder
Fractional Power Encoding (FPE) for continuous scalars.
Based on Frady et al. (2021) “Computing on Functions Using Randomized Vector Representations”. Encodes scalars by exponentiating a random phasor base vector: encode(x) = φ^x.
The inner product between encoded vectors approximates a similarity kernel (sinc for uniform phase distribution). This encoding preserves linearity and enables precise decoding via sinc kernel reconstruction.
Works best with FHRR (complex domain) but also supports HRR (real domain).
References
Frady et al. (2021): https://arxiv.org/abs/2109.03429
Verges et al. (2025): Learning encoding phasors with FPE
- bandwidth¶
Controls kernel width (lower = wider kernel)
- base_phasor¶
Random phasor vector φ = [e^(iφ₁), …, e^(iφₙ)]
Initialize FractionalPowerEncoder.
- Parameters:
model (VSAModel) – VSA model (FHRR or HRR). FHRR (complex-valued) is preferred for exact fractional powers. HRR (real-valued) uses cosine projection.
min_val (float) – Minimum value of encoding range. Values below this will be clipped.
max_val (float) – Maximum value of encoding range. Values above this will be clipped.
bandwidth (float, optional) –
Bandwidth parameter β controlling kernel width (default: 1.0).
Mathematical Role:
- Encoding: z(x) = φ^(β·x_normalized)
- Kernel: K(x₁, x₂) ≈ sinc(β·π·|x₁ - x₂|) for uniform phase distribution
- Smaller β → wider kernel → more generalization
- Larger β → narrower kernel → more discrimination
Typical Values:
- β = 0.01: Wide kernel, high generalization (classification)
- β = 1.0: Medium kernel (default)
- β = 10.0: Narrow kernel, low generalization (regression)
seed (int or None, optional) – Random seed for generating base phasor (for reproducibility). Different seeds produce different random frequency vectors θ.
phase_dist (str, optional) –
Distribution for sampling frequency vector θ (default: 'uniform').
Available Distributions:
- 'uniform': θⱼ ~ Uniform[-π, π] → sinc kernel (default)
- 'gaussian': θⱼ ~ N(0, 1) → Gaussian kernel approximation
- 'laplace': θⱼ ~ Laplace(0, 1) → Exponential kernel, heavy tails
- 'cauchy': θⱼ ~ Cauchy(0, 1) → Very heavy tails, long-range
- 'student': θⱼ ~ Student-t(df=3) → Moderate tails, robust
Different distributions induce different similarity kernels, affecting generalization properties.
mixture_bandwidths (List[float] or None, optional) –
List of K bandwidth values [β₁, β₂, …, βₖ] for mixture encoding.
Mixture Encoding: Instead of single bandwidth β, use weighted combination:
z_mix(x) = Σₖ αₖ · φ^(βₖ·x)
where αₖ are mixture_weights. This creates multi-scale representation combining coarse (small β) and fine (large β) kernels.
Example:
mixture_bandwidths = [0.01, 0.1, 1.0, 10.0]  # 4 scales
Creates encoding with both local and global similarity.
mixture_weights (List[float] or None, optional) –
Weights αₖ for each bandwidth in mixture (must sum to 1).
- If None and mixture_bandwidths is provided, uses uniform weights:
αₖ = 1/K for all k
Weights can be:
1. Hand-crafted (domain knowledge)
2. Learned via learn_mixture_weights() (ridge regression)
3. Uniform (default)
- Raises:
ValueError – If phase_dist not in valid set, or if mixture_weights/mixture_bandwidths have mismatched lengths.
Notes
Mathematical Foundation:
- Fractional Power Encoding maps scalar x to hypervector via:
z(x) = φ^(β·x_normalized)
where:
- φ = [e^(iθ₁), e^(iθ₂), …, e^(iθ_D)] is the base phasor (D dimensions)
- θⱼ are random frequencies sampled from phase_dist
- x_normalized ∈ [0, 1] is x mapped to the unit interval
- β is the bandwidth parameter
Inner Product Kernel:
- For uniform phase distribution θⱼ ~ Uniform[-π, π]:
⟨z(x₁), z(x₂)⟩ / D ≈ sinc(β·π·|x₁ - x₂|)
This sinc kernel has important properties:
- Smooth interpolation between similar values
- Exact at x₁ = x₂ (similarity = 1)
- Decreases monotonically with distance up to the first zero-crossing, then oscillates with decaying amplitude
- Zero-crossings at integer multiples of 1/β
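The sinc relationship can be checked numerically. The following is a minimal standard-library sketch, not holovec's implementation: the names `fpe_encode`, `similarity`, and `sinc` are illustrative, and the dimension and seed are arbitrary choices.

```python
import cmath
import math
import random

D = 4096                      # hypervector dimension (illustrative)
BETA = 1.0                    # bandwidth
rng = random.Random(0)
# Random frequencies theta_j ~ Uniform[-pi, pi]
theta = [rng.uniform(-math.pi, math.pi) for _ in range(D)]


def fpe_encode(x_norm: float) -> list:
    """z_j(x) = exp(i * theta_j * beta * x_norm) for x_norm in [0, 1]."""
    return [cmath.exp(1j * t * BETA * x_norm) for t in theta]


def similarity(a: list, b: list) -> float:
    """Real part of the normalized inner product <a, b> / D."""
    return sum((u * v.conjugate()).real for u, v in zip(a, b)) / len(a)


def sinc(x: float) -> float:
    """Unnormalized sinc, sin(x)/x, with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(x) / x


# Compare the empirical similarity with the predicted kernel value.
for d in (0.0, 0.25, 0.5, 1.0):
    empirical = similarity(fpe_encode(0.0), fpe_encode(d))
    print(f"d={d:.2f}  empirical={empirical:+.3f}  sinc={sinc(BETA * math.pi * d):+.3f}")
```

The two columns should agree up to sampling noise on the order of 1/√D, and the value at d = 1.0 sits near the first zero-crossing for β = 1.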
Comparison to Random Fourier Features:
FPE is equivalent to Random Fourier Features (Rahimi & Recht, 2007) for kernel approximation:
k(x₁, x₂) ≈ φ(x₁)ᵀφ(x₂) / D
where φ(x) = [cos(θ₁x), sin(θ₁x), …, cos(θ_D x), sin(θ_D x)]
- For complex hypervectors, FPE uses complex exponentials instead:
φ(x) = [e^(iθ₁x), e^(iθ₂x), …, e^(iθ_D x)]
which provides more compact representation and supports exact fractional power operations in frequency domain.
References
Frady et al. (2021): “Computing on Functions Using Randomized Vector Representations” - Original FPE paper
Rahimi & Recht (2007): “Random Features for Large-Scale Kernel Machines”
Sutherland & Schneider (2015): “On the Error of Random Fourier Features”
Verges et al. (2025): “Learning Encoding Phasors with Fractional Power Encoding”
Examples
>>> # Basic FPE for temperature encoding
>>> from holovec import VSA
>>> from holovec.encoders import FractionalPowerEncoder
>>> model = VSA.create('FHRR', dim=10000)
>>> encoder = FractionalPowerEncoder(model, min_val=0, max_val=100)
>>> temp_25 = encoder.encode(25.0)
>>> temp_26 = encoder.encode(26.0)
>>> similarity = model.similarity(temp_25, temp_26)  # ≈ 0.95
>>> # Multi-scale mixture encoding
>>> encoder_mix = FractionalPowerEncoder(
...     model, min_val=0, max_val=100,
...     mixture_bandwidths=[0.01, 0.1, 1.0, 10.0],
...     mixture_weights=[0.4, 0.3, 0.2, 0.1]  # Emphasize coarse scales
... )
>>> # Alternative kernel via phase distribution
>>> encoder_gauss = FractionalPowerEncoder(
...     model, min_val=0, max_val=100,
...     phase_dist='gaussian'  # Gaussian kernel instead of sinc
... )
- encode(value: float) Any[source]¶
Encode scalar value to hypervector using fractional power.
- Parameters:
value (float) – Scalar value to encode. Will be clipped to [min_val, max_val].
- Returns:
Encoded hypervector of shape (dimension,) in backend format.
- Return type:
Array
Notes
Single Bandwidth Encoding:
- For single bandwidth β, implements:
z(x) = φ^(β·x_normalized)
where:
- x_normalized = (value - min_val) / (max_val - min_val) ∈ [0, 1]
- φ = [e^(iθ₁), …, e^(iθ_D)] is the base phasor with random frequencies θⱼ
- Result is normalized according to model's space
- Element-wise computation:
z_j(x) = e^(i·θⱼ·β·x_normalized) (complex models)
z_j(x) = cos(θⱼ·β·x_normalized) (real models)
Mixture Encoding:
- When mixture_bandwidths = [β₁, …, βₖ] is provided, uses weighted sum:
z_mix(x) = Σₖ αₖ · φ^(βₖ·x_normalized)
where αₖ are mixture_weights (default: uniform αₖ = 1/K).
Advantages of Mixture Encoding:
Multi-Scale Representation: Combines coarse (small β) and fine (large β) similarity kernels in single hypervector
Improved Generalization: Coarse scales provide robustness, fine scales provide discrimination
Learned Weights: Weights αₖ can be learned via learn_mixture_weights() to optimize for specific task
Kernel Combination: Mixture is equivalent to combining multiple kernels: K_mix(d) = Σₖ αₖ·K_βₖ(d)
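The mixture formula z_mix(x) = Σₖ αₖ · φ^(βₖ·x_normalized) can be sketched with the standard library alone. This is an illustration under assumptions, not the library's code: `fpe` and `mixture_encode` are hypothetical names, and the final L2 normalization follows the normalization note for FHRR rather than a confirmed implementation detail.

```python
import cmath
import math
import random

D = 2048                      # illustrative dimension
rng = random.Random(1)
theta = [rng.uniform(-math.pi, math.pi) for _ in range(D)]


def fpe(x_norm, beta):
    """Single-bandwidth encoding phi^(beta * x_norm)."""
    return [cmath.exp(1j * t * beta * x_norm) for t in theta]


def mixture_encode(x_norm, bandwidths, weights=None):
    """z_mix(x) = sum_k alpha_k * phi^(beta_k * x_norm), then L2-normalized."""
    k = len(bandwidths)
    if weights is None:
        weights = [1.0 / k] * k          # uniform default, alpha_k = 1/K
    if len(weights) != k:
        raise ValueError("weights and bandwidths must have the same length")
    z = [0j] * D
    for alpha, beta in zip(weights, bandwidths):
        component = fpe(x_norm, beta)    # O(D) work per bandwidth
        z = [acc + alpha * c for acc, c in zip(z, component)]
    norm = math.sqrt(sum(abs(c) ** 2 for c in z))
    return [c / norm for c in z]


hv = mixture_encode(0.25, [0.01, 1.0, 100.0])
```

The loop makes the O(K·D) cost visible: each additional bandwidth adds one more pass over the D components.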
Computational Complexity:
Single bandwidth: O(D) operations (element-wise exponential)
Mixture with K bandwidths: O(K·D) operations
Backend operations (exp, multiply) are vectorized/GPU-accelerated
Normalization:
Output is normalized using model's normalization scheme:
- FHRR/HRR: L2 normalization (unit norm)
- MAP: Element-wise normalization
- BSC/BSDC: No normalization (binary)
This ensures hypervectors are in valid space for subsequent binding/bundling operations.
Examples
>>> # Basic encoding
>>> model = VSA.create('FHRR', dim=10000)
>>> encoder = FractionalPowerEncoder(model, min_val=0, max_val=100)
>>> hv_25 = encoder.encode(25.0)  # Encode temperature 25°C
>>> hv_26 = encoder.encode(26.0)
>>> similarity = model.similarity(hv_25, hv_26)
>>> print(f"Similarity: {similarity:.3f}")  # ≈ 0.950 (close values)
>>> # Mixture encoding for multi-scale representation
>>> encoder_mix = FractionalPowerEncoder(
...     model, min_val=0, max_val=100,
...     mixture_bandwidths=[0.01, 1.0, 100.0]
... )
>>> hv_mix = encoder_mix.encode(25.0)  # Combines 3 scales
>>> # Effect of bandwidth on similarity
>>> enc_wide = FractionalPowerEncoder(model, 0, 100, bandwidth=0.1)
>>> enc_narrow = FractionalPowerEncoder(model, 0, 100, bandwidth=10.0)
>>> sim_wide = model.similarity(enc_wide.encode(25), enc_wide.encode(30))
>>> sim_narrow = model.similarity(enc_narrow.encode(25), enc_narrow.encode(30))
>>> # sim_wide > sim_narrow (wider kernel → more generalization)
- decode(hypervector: Any, resolution: int = 1000, max_iterations: int = 100, tolerance: float = 1e-06) float[source]¶
Decode hypervector back to scalar value using two-stage optimization.
- Parameters:
hypervector (Array) – Hypervector to decode (typically a noisy/bundled encoding).
resolution (int, optional) – Number of grid points for coarse search (default: 1000). Higher resolution improves initial guess but increases cost.
max_iterations (int, optional) – Maximum gradient descent iterations (default: 100). Typical convergence: 20-50 iterations.
tolerance (float, optional) – Convergence tolerance for gradient descent (default: 1e-6). Stop when |Δx| < tolerance.
- Returns:
Decoded scalar value in [min_val, max_val].
- Return type:
float
Notes
Decoding Algorithm:
- Uses two-stage optimization to find value x maximizing similarity:
x* = argmax_x ⟨encode(x), hypervector⟩
Stage 1: Coarse Grid Search (O(resolution · D))
- Evaluate similarity at resolution uniformly-spaced points
- Find x₀ with highest similarity
- Provides good initialization for gradient descent
Stage 2: Gradient Descent (O(max_iterations · D))
- Starting from x₀, perform gradient ascent:
x_{t+1} = x_t + η_t · ∇_x ⟨encode(x_t), hypervector⟩
- Gradient computed via finite differences:
∇_x ≈ (sim(x + ε) - sim(x)) / ε
- Step size η_t decays: η_t = η_0 · 0.95^t (prevents oscillation)
- Clips updates to [0, 1] normalized range
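The two stages can be sketched in plain Python as follows. This is a hedged illustration rather than the library's decoder: the initial step size `eta0`, the finite-difference width `eps`, and the small dimension and resolution are assumptions chosen to keep the example fast.

```python
import cmath
import math
import random

D = 1024                      # illustrative dimension
rng = random.Random(2)
theta = [rng.uniform(-math.pi, math.pi) for _ in range(D)]


def encode(x_norm):
    """FPE with bandwidth 1 on the normalized range [0, 1]."""
    return [cmath.exp(1j * t * x_norm) for t in theta]


def sim(x_norm, target):
    """Normalized similarity between encode(x_norm) and a target hypervector."""
    z = encode(x_norm)
    return sum((a * b.conjugate()).real for a, b in zip(z, target)) / D


def decode(target, resolution=200, max_iterations=50, tolerance=1e-6,
           eta0=0.2, eps=1e-4):
    # Stage 1: coarse grid search over the normalized range [0, 1].
    grid = [i / (resolution - 1) for i in range(resolution)]
    x = max(grid, key=lambda g: sim(g, target))
    # Stage 2: gradient ascent with a forward finite-difference gradient.
    for t in range(max_iterations):
        grad = (sim(x + eps, target) - sim(x, target)) / eps
        step = eta0 * (0.95 ** t) * grad        # decaying step size
        x_new = min(1.0, max(0.0, x + step))    # clip to [0, 1]
        if abs(x_new - x) < tolerance:
            x = x_new
            break
        x = x_new
    return x


min_val, max_val = 0.0, 100.0
true_value = 25.0
hv = encode((true_value - min_val) / (max_val - min_val))
decoded = min_val + decode(hv) * (max_val - min_val)
print(f"decoded ≈ {decoded:.2f}")
```

The grid search alone limits the error to about half a grid spacing; the ascent stage then refines the estimate inside the main lobe of the kernel.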
Why This Works:
For FPE with sinc kernel K(x₁, x₂) = sinc(β·π·|x₁ - x₂|):
- The similarity function has a single dominant peak (the main lobe), with smaller side lobes
- The peak occurs at x = x_true (the encoded value)
- Gradient ascent started inside the main lobe converges to the global maximum
However, for noisy hypervectors (e.g., bundled encodings):
- Multiple local maxima may exist
- The coarse search reduces the chance of getting trapped in a local maximum
- Wider kernels (small β) → smoother objective → easier optimization
Approximation Quality:
Decoding accuracy depends on several factors:
Dimension D: Higher D → more accurate encoding → better decoding
- D = 1000: Moderate accuracy (similarity ≈ 0.85)
- D = 10000: High accuracy (similarity ≈ 0.99)
Signal-to-Noise Ratio: Clean encoding vs bundled/noisy
- Clean: Near-perfect recovery (error < 1%)
- Bundled (10 items): Good recovery (error ≈ 5-10%)
- Bundled (100 items): Degraded (error ≈ 20-30%)
Bandwidth β: Wider kernels → smoother similarity landscape
- β = 0.01: Very smooth, easy to optimize
- β = 10.0: Narrow kernel, may have local maxima
Mixture Encoding: Multiple bandwidths complicate landscape
- May require finer grid search (higher resolution)
- May need more gradient descent iterations
Computational Cost:
Total operations: O(resolution · D + max_iterations · D)
Typical values:
- resolution = 1000, max_iterations = 100, D = 10000
- Total: roughly 1,100 similarity evaluations of O(D) each (~1.1 × 10⁷ element-wise operations)
- Runtime: ~0.1-1.0 seconds (CPU), ~0.01-0.1 seconds (GPU)
For real-time applications, reduce resolution or max_iterations:
- resolution = 100 (coarser search)
- max_iterations = 20 (early stopping)
Comparison to Other Decoders:
- Codebook Lookup (LevelEncoder): O(K · D) for K levels. Faster but discrete, no interpolation.
- Resonator Network (cleanup): O(iterations · M · D) for M items. Better for structured/compositional decoding.
- FPE Gradient Descent: O(resolution · D + iterations · D). Best for continuous scalar recovery.
References
Frady et al. (2021): “Computing on Functions Using Randomized Vector Representations” - Section on FPE decoding
Nocedal & Wright (2006): “Numerical Optimization” - Gradient descent methods and convergence analysis
Examples
>>> # Basic decoding
>>> model = VSA.create('FHRR', dim=10000)
>>> encoder = FractionalPowerEncoder(model, min_val=0, max_val=100)
>>> hv = encoder.encode(25.0)
>>> decoded = encoder.decode(hv)
>>> print(f"Decoded: {decoded:.2f}")  # ≈ 25.00
>>> # Decoding noisy hypervector (bundled encoding)
>>> hv_bundle = model.bundle([encoder.encode(25.0), encoder.encode(26.0)])
>>> decoded_bundle = encoder.decode(hv_bundle)
>>> print(f"Decoded bundle: {decoded_bundle:.2f}")  # ≈ 25.5
>>> # Fast decoding (lower resolution/iterations)
>>> decoded_fast = encoder.decode(hv, resolution=100, max_iterations=20)
- learn_mixture_weights(values: List[float], labels: List[int], reg: float = 0.001) List[float][source]¶
Learn mixture weights (alphas) for fixed mixture_bandwidths using a simple ridge-style objective that aligns encoded mixtures to per-class prototypes.
- Approach:
Build class prototypes p_c as the mean of current encodings (using current weights)
For each sample i, compute per-band encodings E_i = [e_{i1},…,e_{iK}] (shape d×K)
Solve (Σ E_i^T E_i + reg I) α = Σ E_i^T p_{y_i}
Project α onto simplex (nonnegative, sum=1)
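The last step, projecting α onto the simplex, can be sketched as below. This uses the common sort-based Euclidean projection; whether the library uses this exact procedure is an assumption, and `project_to_simplex` is a hypothetical name.

```python
from typing import List


def project_to_simplex(v: List[float]) -> List[float]:
    """Euclidean projection of v onto {w : w_k >= 0, sum_k w_k = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        if u_j - candidate > 0:
            theta = candidate       # the last j satisfying the condition wins
    # Shift every entry by theta and clamp negatives to zero.
    return [max(x - theta, 0.0) for x in v]


raw_alpha = [0.5, 1.5, -0.2]        # e.g. an unconstrained ridge solution
alpha = project_to_simplex(raw_alpha)
```

A vector that already lies on the simplex is returned unchanged, so the projection only alters weights that violate the constraints.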
- Parameters:
values – list of scalar inputs
labels – list of integer class labels (same length as values)
reg – L2 regularization strength (default 1e-3)
- Returns:
Learned mixture weights (list of floats summing to 1)
Notes
Requires mixture_bandwidths to be set (K>=2)
Uses numpy for solving normal equations; backend remains unchanged
- class holovec.encoders.ThermometerEncoder(model: VSAModel, min_val: float, max_val: float, n_bins: int = 100, seed: int | None = None)[source]¶
Bases: ScalarEncoder
Thermometer encoding for scalar values.
Divides value range into N bins and encodes a value as the bundle of all bins it exceeds. Creates monotonic similarity profile.
Simpler and more robust than FPE, but with coarser granularity. Works with all VSA models.
References
Kanerva (2009): “Hyperdimensional Computing”
Initialize ThermometerEncoder.
- Parameters:
model – VSA model (any)
min_val – Minimum value of encoding range
max_val – Maximum value of encoding range
n_bins – Number of bins to divide range into (default 100)
seed – Random seed for generating bin vectors
- Raises:
ValueError – If n_bins < 2
- encode(value: float) Any[source]¶
Encode scalar as bundle of all bins it exceeds.
- Parameters:
value – Scalar value to encode
- Returns:
Encoded hypervector (bundle of activated bins)
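A minimal sketch of the idea, assuming random bipolar bin vectors and bundling by element-wise sum. The bin-count rule (`active`), the clipping, and the names are illustrative choices, not the library's implementation.

```python
import math
import random

D = 2000                      # illustrative dimension
N_BINS = 100
rng = random.Random(3)
# One random bipolar (+1/-1) vector per bin.
bin_vectors = [[rng.choice((-1, 1)) for _ in range(D)] for _ in range(N_BINS)]


def thermometer_encode(value, min_val=0.0, max_val=100.0):
    """Sum the vectors of every bin up to and including the value's bin."""
    clipped = min(max_val, max(min_val, value))
    x_norm = (clipped - min_val) / (max_val - min_val)
    active = 1 + int(x_norm * (N_BINS - 1))     # at least one bin is active
    hv = [0] * D
    for vec in bin_vectors[:active]:
        hv = [h + v for h, v in zip(hv, vec)]
    return hv


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


base = thermometer_encode(20.0)
for other in (25.0, 50.0, 80.0):
    print(other, round(cosine(base, thermometer_encode(other)), 3))
```

Because two encodings share exactly the bins below the smaller value, the printed similarities fall as the second value moves away from 20, which is the monotonic profile described above.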
- decode(hypervector: Any) float[source]¶
Decode is not implemented for ThermometerEncoder.
Thermometer encoding is not easily reversible without storing additional information.
- Raises:
NotImplementedError – Always raises
- class holovec.encoders.LevelEncoder(model: VSAModel, min_val: float, max_val: float, n_levels: int, seed: int | None = None)[source]¶
Bases: ScalarEncoder
Level (codebook) encoding for discrete scalar values.
Maps discrete levels to random orthogonal vectors via lookup table. Fast (O(1) encode/decode) and exact for discrete values.
Best used when you have a small number of discrete values rather than continuous range.
Example
>>> # Encode weekdays (7 discrete values)
>>> model = VSA.create('FHRR', dim=10000)
>>> encoder = LevelEncoder(model, min_val=0, max_val=6, n_levels=7)
>>> monday = encoder.encode(0)  # Exact encoding
>>> friday = encoder.encode(4)
Initialize LevelEncoder.
- Parameters:
model – VSA model (any)
min_val – Minimum value (corresponds to level 0)
max_val – Maximum value (corresponds to level n_levels-1)
n_levels – Number of discrete levels
seed – Random seed for generating level vectors
- Raises:
ValueError – If n_levels < 2
- encode(value: float) Any[source]¶
Encode scalar to nearest level’s hypervector.
- Parameters:
value – Scalar value to encode
- Returns:
Hypervector corresponding to nearest level
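The nearest-level mapping can be sketched as a small index computation. Evenly spaced levels and clipping of out-of-range values are assumptions here, and `nearest_level` is a hypothetical helper, not part of the documented API.

```python
def nearest_level(value: float, min_val: float, max_val: float, n_levels: int) -> int:
    """Index of the level closest to value, with levels evenly spaced on [min_val, max_val]."""
    if n_levels < 2:
        raise ValueError("n_levels must be at least 2")
    clipped = min(max_val, max(min_val, value))
    step = (max_val - min_val) / (n_levels - 1)   # distance between adjacent levels
    return int(round((clipped - min_val) / step))


friday = nearest_level(4, 0, 6, 7)       # an exact level
between = nearest_level(2.6, 0, 6, 7)    # falls between levels 2 and 3
```

The resulting index would then select one row of the level codebook.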
Sequence Encoders¶
- class holovec.encoders.PositionBindingEncoder(model: VSAModel, codebook: Dict[str, Any] | None = None, max_length: int | None = None, auto_generate: bool = True, seed: int | None = None)[source]¶
Bases: SequenceEncoder
Position binding encoder for sequences using permutation-based positions.
Based on Plate (2003) “Holographic Reduced Representations” and Schlegel et al. (2021) “A comparison of vector symbolic architectures”.
Encodes sequences by binding each element with a position-specific permutation of a base position vector:
encode([A, B, C]) = bind(A, ρ¹) + bind(B, ρ²) + bind(C, ρ³)
where ρ is the permutation operation and ρⁱ represents i applications.
This encoding is:
- Order-sensitive: Different positions create different bindings
- Variable-length: Works with any sequence length
- Partial-match capable: Similar sequences have similar encodings
- codebook¶
Dictionary mapping symbols to hypervectors
- auto_generate¶
Whether to auto-generate vectors for unknown symbols
- seed_offset¶
Offset for generating consistent symbol vectors
Example
>>> model = VSA.create('MAP', dim=10000)
>>> encoder = PositionBindingEncoder(model)
>>>
>>> # Encode a sequence of symbols
>>> seq = ['hello', 'world', '!']
>>> hv = encoder.encode(seq)
>>>
>>> # Similar sequences have high similarity
>>> seq2 = ['hello', 'world']
>>> hv2 = encoder.encode(seq2)
>>> model.similarity(hv, hv2)  # High (shared prefix)
Initialize position binding encoder.
- Parameters:
model – VSA model instance
codebook – Pre-defined symbol → hypervector mapping (optional)
max_length – Maximum sequence length (None for unlimited)
auto_generate – Auto-generate vectors for unknown symbols (default: True)
seed – Random seed for generating symbol vectors
- Raises:
ValueError – If model is not compatible
- encode(sequence: List[str | int]) Any[source]¶
Encode sequence using position binding.
Each element is bound with a position-specific permutation and all bound pairs are bundled:
result = Σᵢ bind(element_i, permute(position_vector, i))
- Parameters:
sequence – List of symbols (strings or integers) to encode
- Returns:
Hypervector representing the sequence
- Raises:
ValueError – If sequence is empty
ValueError – If sequence exceeds max_length
ValueError – If symbol not in codebook and auto_generate=False
Example
>>> encoder.encode(['cat', 'sat', 'on', 'mat'])
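The formula result = Σᵢ bind(element_i, permute(position_vector, i)) can be sketched with bipolar vectors, where binding is element-wise multiplication and the permutation is a cyclic shift. These operator choices, the dimension, and all names are assumptions for illustration; the library's operations depend on the chosen model.

```python
import math
import random

D = 4000                      # illustrative dimension
rng = random.Random(4)


def random_bipolar():
    return [rng.choice((-1, 1)) for _ in range(D)]


def permute(vec, shifts):
    """Cyclic shift applied `shifts` times (a simple permutation rho)."""
    k = shifts % len(vec)
    return vec[-k:] + vec[:-k] if k else list(vec)


def bind(a, b):
    return [x * y for x, y in zip(a, b)]


position_vector = random_bipolar()
codebook = {}


def encode(sequence):
    if not sequence:
        raise ValueError("sequence is empty")
    total = [0] * D
    for i, symbol in enumerate(sequence, start=1):
        if symbol not in codebook:
            codebook[symbol] = random_bipolar()     # auto-generate unknown symbols
        bound = bind(codebook[symbol], permute(position_vector, i))
        total = [t + b for t, b in zip(total, bound)]
    return total


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


same_prefix = cosine(encode(["cat", "sat", "on"]), encode(["cat", "sat", "mat"]))
rotated = cosine(encode(["cat", "sat", "on"]), encode(["sat", "on", "cat"]))
```

Sequences that share symbols at the same positions stay similar (`same_prefix`), while the same symbols in different positions do not (`rotated`), which is the order sensitivity described above.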
- decode(hypervector: Any, max_positions: int = 10, threshold: float = 0.3) List[str][source]¶
Decode sequence hypervector to recover symbols.
Uses cleanup memory approach: for each position, unpermute and find most similar symbol in codebook.
- Parameters:
hypervector – Sequence hypervector to decode
max_positions – Maximum positions to try decoding (default: 10)
threshold – Minimum similarity threshold for valid symbols (default: 0.3)
- Returns:
List of decoded symbols (may be shorter than original)
- Raises:
RuntimeError – If codebook is empty
Note
Decoding is approximate and works best for sequences shorter than max_positions with high SNR.
Example
>>> encoded = encoder.encode(['a', 'b', 'c'])
>>> decoded = encoder.decode(encoded, max_positions=5)
>>> decoded  # ['a', 'b', 'c'] (approximate)
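The cleanup-style decoding loop can be sketched as follows, again assuming bipolar vectors with multiplicative binding and cyclic-shift positions. The scoring (dot product divided by the dimension) and the stop-at-first-miss rule are illustrative assumptions rather than the library's exact behaviour.

```python
import random

D = 4000                      # illustrative dimension
rng = random.Random(5)


def random_bipolar():
    return [rng.choice((-1, 1)) for _ in range(D)]


def permute(vec, shifts):
    k = shifts % len(vec)
    return vec[-k:] + vec[:-k] if k else list(vec)


position_vector = random_bipolar()
codebook = {s: random_bipolar() for s in ("a", "b", "c", "d")}


def encode(sequence):
    total = [0] * D
    for i, symbol in enumerate(sequence, start=1):
        pos = permute(position_vector, i)
        total = [t + c * p for t, c, p in zip(total, codebook[symbol], pos)]
    return total


def decode(hv, max_positions=10, threshold=0.3):
    if not codebook:
        raise RuntimeError("codebook is empty")
    decoded = []
    for i in range(1, max_positions + 1):
        pos = permute(position_vector, i)
        # Bipolar binding is its own inverse, so multiplying again unbinds.
        unbound = [h * p for h, p in zip(hv, pos)]
        scores = {s: sum(u * v for u, v in zip(unbound, vec)) / D
                  for s, vec in codebook.items()}
        best = max(scores, key=scores.get)
        if scores[best] < threshold:
            break                       # no symbol is similar enough: stop
        decoded.append(best)
    return decoded


recovered = decode(encode(["a", "b", "c"]), max_positions=5)
```

Positions beyond the end of the sequence score near zero against every codebook entry, so the threshold ends the loop there.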
- add_symbol(symbol: str | int, vector: Any | None = None)[source]¶
Add a symbol to the codebook.
- Parameters:
symbol – Symbol to add
vector – Hypervector to associate (generated if None)
Example
>>> # Pre-define a vector for a special symbol
>>> special_vec = model.random(seed=42)
>>> encoder.add_symbol('<START>', special_vec)
- get_codebook_size() int[source]¶
Get number of symbols in codebook.
- Returns:
Number of symbols stored
- property is_reversible: bool¶
PositionBindingEncoder supports approximate decoding.
- Returns:
True (approximate decoding available)
- class holovec.encoders.NGramEncoder(model: VSAModel, n: int = 2, stride: int = 1, mode: str = 'bundling', codebook: Dict[str, Any] | None = None, auto_generate: bool = True, seed: int | None = None)[source]¶
Bases: SequenceEncoder
N-gram encoder for capturing local sequence patterns using sliding windows.
Based on Plate (2003), Rachkovskij (1996), and Kleyko et al. (2023) Section 3.3.4.
Encodes sequences by extracting n-grams (sliding windows of n consecutive symbols) and encoding each n-gram compositionally:
For sequence [A, B, C, D] with n=2, stride=1:
- Extract n-grams: [A,B], [B,C], [C,D]
- Encode each n-gram using position binding
- Combine via bundling or chaining
Two encoding modes:
Bundling mode (bag-of-ngrams): encode(seq) = bundle([encode_ngram([A,B]), encode_ngram([B,C]), …])
- Order-invariant across n-grams (but preserves within n-gram)
- Good for classification (e.g., text categorization)
- Similar to bag-of-words but with local context
Chaining mode (ordered n-grams): encode(seq) = Σᵢ bind(encode_ngram(ngramᵢ), ρⁱ)
- Order-sensitive across n-grams
- Good for sequence matching
- Enables partial decoding
- n¶
Size of n-grams (1=unigrams, 2=bigrams, 3=trigrams, etc.)
- stride¶
Step size between n-grams (1=overlapping, n=non-overlapping)
- mode¶
'bundling' or 'chaining'
- ngram_encoder¶
Internal PositionBindingEncoder for individual n-grams
Example
>>> model = VSA.create('MAP', dim=10000)
>>> encoder = NGramEncoder(model, n=2, stride=1, mode='bundling')
>>>
>>> # Encode text as bigrams
>>> seq = ['the', 'cat', 'sat', 'on', 'mat']
>>> hv = encoder.encode(seq)  # Bigrams: [the,cat], [cat,sat], [sat,on], [on,mat]
>>>
>>> # Similar text has high similarity
>>> seq2 = ['the', 'cat', 'sat', 'on', 'hat']
>>> hv2 = encoder.encode(seq2)  # Shares 3/4 bigrams
>>> model.similarity(hv, hv2)  # High similarity
Initialize n-gram encoder.
- Parameters:
model – VSA model instance
n – Size of n-grams (must be >= 1)
stride – Step between n-grams (must be >= 1)
mode – 'bundling' for bag-of-ngrams or 'chaining' for ordered n-grams
codebook – Optional pre-defined symbol → hypervector mapping
auto_generate – Auto-generate vectors for unknown symbols
seed – Random seed for symbol vector generation
- Raises:
ValueError – If n < 1, stride < 1, or mode is invalid
- encode(sequence: List[str | int]) Any[source]¶
Encode sequence using n-gram representation.
Extracts all n-grams using sliding window with specified stride, encodes each n-gram, then combines via bundling or chaining.
- Parameters:
sequence – List of symbols to encode
- Returns:
Hypervector representing the sequence as n-grams
- Raises:
ValueError – If sequence is too short (length < n)
Example
>>> # Bigrams with stride=1 (overlapping)
>>> encoder = NGramEncoder(model, n=2, stride=1)
>>> encoder.encode(['A', 'B', 'C'])  # N-grams: AB, BC
>>>
>>> # Trigrams with stride=2 (partial overlap)
>>> encoder = NGramEncoder(model, n=3, stride=2)
>>> encoder.encode(['A', 'B', 'C', 'D', 'E'])  # N-grams: ABC, CDE
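The sliding-window extraction itself is independent of any VSA model and can be sketched in a few lines. `extract_ngrams` is a hypothetical helper shown for illustration, not part of the documented API.

```python
from typing import List, Sequence


def extract_ngrams(sequence: Sequence, n: int = 2, stride: int = 1) -> List[list]:
    """Return windows of n consecutive symbols, advancing by stride each time."""
    if n < 1 or stride < 1:
        raise ValueError("n and stride must both be >= 1")
    if len(sequence) < n:
        raise ValueError("sequence is too short (length < n)")
    return [list(sequence[i:i + n]) for i in range(0, len(sequence) - n + 1, stride)]


bigrams = extract_ngrams(["A", "B", "C"], n=2, stride=1)
trigrams = extract_ngrams(["A", "B", "C", "D", "E"], n=3, stride=2)
```

With stride equal to n the windows do not overlap, and trailing symbols that do not fill a complete window are dropped.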
- decode(hypervector: Any, max_ngrams: int = 10, threshold: float = 0.3) List[List[str | int]][source]¶
Decode n-gram hypervector to recover n-grams.
Only supported for 'chaining' mode. For 'bundling' mode, n-grams are order-invariant and cannot be sequentially decoded.
- Parameters:
hypervector – Encoded sequence hypervector
max_ngrams – Maximum number of n-grams to decode
threshold – Minimum similarity threshold for valid n-grams
- Returns:
List of decoded n-grams, each as a list of symbols
- Raises:
NotImplementedError – If mode is 'bundling' (not decodable)
RuntimeError – If codebook is empty
Example
>>> encoder = NGramEncoder(model, n=2, mode='chaining')
>>> hv = encoder.encode(['A', 'B', 'C'])
>>> encoder.decode(hv, max_ngrams=3)  # [['A', 'B'], ['B', 'C']]
- get_codebook() Dict[str, Any][source]¶
Get the internal symbol codebook.
- Returns:
Dictionary mapping symbols to hypervectors
- get_codebook_size() int[source]¶
Get number of unique symbols in codebook.
- Returns:
Number of symbols
- class holovec.encoders.TrajectoryEncoder(model: VSAModel, scalar_encoder: ScalarEncoder, n_dimensions: int = 1, time_range: Tuple[float, float] | None = None, seed: int | None = None)[source]¶
Bases: SequenceEncoder
Trajectory encoder for continuous sequences (time series, paths, motion).
Based on Frady et al. (2021) “Computing on Functions” and position binding from Plate (2003), encoding trajectories by binding temporal information with spatial positions.
A trajectory is a sequence of positions over time:
- 1D: time series [v₁, v₂, v₃, …]
- 2D: path [(x₁,y₁), (x₂,y₂), …]
- 3D: motion [(x₁,y₁,z₁), (x₂,y₂,z₂), …]
- Encoding strategy:
For each time step tᵢ with position pᵢ:
1. Encode time: time_hv = scalar_encode(tᵢ)
2. Encode position coords: coord_hvs = [scalar_encode(c) for c in pᵢ]
3. Bind coords to dimensions: pos_hv = Σⱼ bind(Dⱼ, coord_hv_j)
4. Bind time with position: point_hv = bind(time_hv, pos_hv)
5. Permute by index: indexed_hv = permute(point_hv, i)
trajectory_hv = Σᵢ indexed_hv
This creates an encoding that:
- Preserves temporal ordering (via permutation)
- Captures smooth trajectories (via continuous scalar encoding)
- Enables partial matching and interpolation
- Supports multi-dimensional paths
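The five steps of the encoding strategy above can be traced in a small pure-Python sketch. It does not use holovec. Everything in it is an assumption made for illustration: bipolar (+1/-1) vectors, elementwise multiplication as `bind`, summation as `bundle`, a cyclic shift as `permute`, the step index standing in for tᵢ, and a toy `level_encode` in place of a real scalar encoder.

```python
import math
import random
from typing import List, Sequence, Tuple

DIM = 2048  # toy size; the examples in this reference use 10,000


def random_bipolar(rng: random.Random) -> List[int]:
    """Random vector of +1/-1 entries."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def level_encode(value: float, lo: float, hi: float, base: Sequence[int]) -> List[int]:
    """Flip a prefix of `base` whose length grows with `value`, so nearby values stay similar."""
    flips = round((value - lo) / (hi - lo) * DIM)
    return [-b if k < flips else b for k, b in enumerate(base)]


def bind(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Elementwise multiplication (assumed MAP-style binding)."""
    return [x * y for x, y in zip(a, b)]


def bundle(vectors: Sequence[Sequence[int]]) -> List[int]:
    """Elementwise sum."""
    return [sum(column) for column in zip(*vectors)]


def permute(v: Sequence[int], shift: int) -> List[int]:
    """Cyclic shift to the right by `shift` positions."""
    shift %= len(v)
    return list(v[-shift:]) + list(v[:-shift]) if shift else list(v)


def cosine(a: Sequence[int], b: Sequence[int]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


def encode_trajectory(
    points: Sequence[Tuple[float, ...]],
    dim_vectors: Sequence[Sequence[int]],
    base: Sequence[int],
    lo: float = 0.0,
    hi: float = 100.0,
) -> List[int]:
    indexed = []
    for i, point in enumerate(points):
        time_hv = level_encode(i, lo, hi, base)                      # step 1 (index used as time)
        coord_hvs = [level_encode(c, lo, hi, base) for c in point]   # step 2
        pos_hv = bundle([bind(d, c) for d, c in zip(dim_vectors, coord_hvs)])  # step 3
        point_hv = bind(time_hv, pos_hv)                             # step 4
        indexed.append(permute(point_hv, i))                         # step 5
    return bundle(indexed)                                           # sum over all steps


rng = random.Random(0)
base = random_bipolar(rng)
dim_vectors = [random_bipolar(rng), random_bipolar(rng)]  # D_x, D_y

path_a = [(10, 20), (15, 25), (20, 30), (25, 35)]
path_b = [(10, 20), (15, 25), (20, 30), (25, 40)]  # last point nudged
path_c = [(90, 80), (85, 75), (80, 70), (75, 65)]  # far from path_a

hv_a, hv_b, hv_c = (encode_trajectory(p, dim_vectors, base) for p in (path_a, path_b, path_c))
sim_close = cosine(hv_a, hv_b)
sim_far = cosine(hv_a, hv_c)
print(f"close: {sim_close:.3f}  far: {sim_far:.3f}")
```

In this toy setting the nudged path stays close to the original while the distant path does not, which is the locality property the list above describes. The actual similarity values depend on the VSA model and scalar encoder in use.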
- scalar_encoder¶
Encoder for continuous values (FPE or Thermometer)
- n_dimensions¶
Dimensionality of trajectory (1D, 2D, or 3D)
- time_range¶
(min_time, max_time) for temporal normalization
- dim_vectors¶
Hypervectors for spatial dimensions (x, y, z)
Example
>>> from holovec import VSA
>>> from holovec.encoders import FractionalPowerEncoder, TrajectoryEncoder
>>>
>>> model = VSA.create('FHRR', dim=10000)
>>> scalar_enc = FractionalPowerEncoder(model, min_val=0, max_val=100)
>>> encoder = TrajectoryEncoder(model, scalar_encoder=scalar_enc, n_dimensions=2)
>>>
>>> # Encode a 2D path
>>> path = [(10, 20), (15, 25), (20, 30), (25, 35)]
>>> hv = encoder.encode(path)
>>>
>>> # Similar paths have high similarity
>>> path2 = [(10, 20), (15, 25), (20, 30), (25, 40)]  # Slightly different
>>> hv2 = encoder.encode(path2)
>>> model.similarity(hv, hv2)  # High similarity
- __init__(model: VSAModel, scalar_encoder: ScalarEncoder, n_dimensions: int = 1, time_range: Tuple[float, float] | None = None, seed: int | None = None)[source]¶
Initialize trajectory encoder.
- Parameters:
model – VSA model instance
scalar_encoder – Encoder for continuous values (FPE or Thermometer recommended)
n_dimensions – Trajectory dimensionality (1, 2, or 3)
time_range – (min, max) time values for normalization (optional)
seed – Random seed for dimension vector generation
- Raises:
ValueError – If n_dimensions not in {1, 2, 3}
TypeError – If scalar_encoder is not reversible
- encode(trajectory: List[float | Tuple[float, ...]]) Any[source]¶
Encode a trajectory as a hypervector.
Each point in the trajectory is encoded with temporal information, then all points are combined with position-based permutation.
- Parameters:
trajectory – List of points
- 1D: List[float] e.g., [1.0, 2.5, 3.7, …]
- 2D: List[Tuple[float, float]] e.g., [(1,2), (3,4), …]
- 3D: List[Tuple[float, float, float]] e.g., [(1,2,3), …]
- Returns:
Hypervector representing the trajectory
- Raises:
ValueError – If trajectory is empty or points have wrong dimensionality
Example
>>> # 1D time series
>>> encoder_1d = TrajectoryEncoder(model, scalar_enc, n_dimensions=1)
>>> hv = encoder_1d.encode([1.0, 2.5, 3.7, 5.2])
>>>
>>> # 2D path
>>> encoder_2d = TrajectoryEncoder(model, scalar_enc, n_dimensions=2)
>>> hv = encoder_2d.encode([(0,0), (1,1), (2,2)])
- decode(hypervector: Any, max_points: int = 10) List[Tuple[float, ...]][source]¶
Decode trajectory hypervector to recover approximate points.
Note: Trajectory decoding is not yet implemented. It requires:
1. Unpermuting each position
2. Unbinding time from position
3. Unbinding each coordinate from dimension vectors
4. Decoding scalar values
5. Interpolation for smooth trajectories
- Parameters:
hypervector – Encoded trajectory hypervector
max_points – Maximum points to decode
- Returns:
List of decoded points (not implemented yet)
- Raises:
NotImplementedError – Trajectory decoding requires solving a nested-binding inverse problem.
Notes
Trajectory decoding is not implemented because it requires multi-level unbinding with cascading error accumulation:
Mathematical Challenge:
The encoding process creates nested bindings:
trajectory_hv = bundle([bind(time(t), bind(dimension(d), scalar(coord[t,d]))) for all t, d])
To decode a single point at time t:
1. Unbind time: point_hv[t] = unbind(trajectory_hv, time(t))
2. For each dimension d:
- Unbind dimension: coord_hv[d] = unbind(point_hv[t], dimension(d))
- Decode scalar: coord[t,d] = scalar_decode(coord_hv[d])
Why This Is Intractable:
Two-level unbinding: Time then dimension (or vice versa)
Error compounding: Each unbind adds noise
No known time points: Must search over possible time values
Interpolation complexity: Smooth trajectory requires dense sampling
Computational cost:
- For T time points, D dimensions
- Requires: T × D × (decode_iterations) evaluations
- Example: 100 points × 3D × 100 iterations = 30,000 evals
Additional Challenges:
Order Ambiguity: Don’t know which time point comes first
Density Unknown: Don’t know temporal sampling rate
Dimension Count: Must know dimensionality a priori
Coordinate Ranges: Scalar decoder needs value bounds
Possible Approaches (Future Work):
Constrained Decoding: If time points are known:
- Unbind each known time point
- Decode coordinates independently
- Complexity: O(T × D × decode_cost)
Template Matching: Pre-encode common trajectory patterns
- Create codebook of canonical trajectories
- Use cleanup to find nearest match
- Works for classification, not reconstruction
Learned Decoder: Train neural network trajectory_hv → points
- Requires large training dataset
- Can learn to handle noise and ambiguity
- See: Imani et al. (2019) for similar approach
Iterative Resonator: Use resonator cleanup at each level
- Unbind time with resonator cleanup
- Unbind dimension with resonator cleanup
- Requires codebooks for both time and coordinates
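As a rough illustration of the constrained-decoding idea listed above, the following pure-Python sketch decodes a 1D toy trajectory whose time points are known. It is not part of holovec and simplifies the encoding considerably. The assumptions are: bipolar vectors with self-inverse multiplicative binding, one random ID vector per known time point instead of scalar-encoded time, no permutation step, and a fixed grid of candidate values for scalar decoding.

```python
import random
from typing import Dict, List, Sequence

DIM = 4096


def random_bipolar(rng: random.Random) -> List[int]:
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def level_encode(value: float, lo: float, hi: float, base: Sequence[int]) -> List[int]:
    """Toy scalar encoder: flip a prefix of `base` proportional to `value`."""
    flips = round((value - lo) / (hi - lo) * DIM)
    return [-b if k < flips else b for k, b in enumerate(base)]


def bind(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Elementwise product; for bipolar vectors this is its own inverse."""
    return [x * y for x, y in zip(a, b)]


def bundle(vectors: Sequence[Sequence[int]]) -> List[int]:
    return [sum(column) for column in zip(*vectors)]


def dot(a: Sequence[int], b: Sequence[int]) -> int:
    return sum(x * y for x, y in zip(a, b))


rng = random.Random(1)
base = random_bipolar(rng)
times = [0, 1, 2, 3]
values = [10.0, 40.0, 70.0, 20.0]

# Assumption: each known time point gets its own random ID vector.
time_ids: Dict[int, List[int]] = {t: random_bipolar(rng) for t in times}

# Encode: trajectory_hv = bundle([bind(time(t), scalar(value[t])) for all t])
trajectory_hv = bundle(
    [bind(time_ids[t], level_encode(v, 0.0, 100.0, base)) for t, v in zip(times, values)]
)

# Scalar decoding by search over a fixed grid of candidate values.
candidates = [float(c) for c in range(0, 101, 10)]
candidate_hvs = {c: level_encode(c, 0.0, 100.0, base) for c in candidates}


def decode_at(t: int) -> float:
    """Unbind one known time point, then pick the best-matching candidate value."""
    unbound = bind(trajectory_hv, time_ids[t])  # true value's encoding plus crosstalk noise
    return max(candidates, key=lambda c: dot(unbound, candidate_hvs[c]))


decoded = [decode_at(t) for t in times]
print(decoded)
```

The cost matches the O(T × D × decode_cost) estimate: one unbind per time point, then one similarity evaluation per candidate. Each extra bundled point adds crosstalk, so the grid must stay coarse enough for neighbouring candidates to remain distinguishable.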
Current Recommendation:
Use TrajectoryEncoder for one-way encoding in applications like:
- Trajectory classification (gesture recognition, motion analysis)
- Trajectory similarity search (find similar paths)
- Trajectory clustering (group similar motions)
For reconstruction, consider storing original trajectories separately and using hypervector encoding only for similarity queries.
References
Plate (2003): “Holographic Reduced Representations” - Section 4.3 on error accumulation in multi-level binding
Räsänen & Saarinen (2016): “Sequence prediction with sparse distributed hyperdimensional coding” - Analysis of temporal binding
Spatial Encoders¶
- class holovec.encoders.ImageEncoder(model: VSAModel, scalar_encoder: ScalarEncoder, normalize_pixels: bool = True, seed: int | None = None)[source]¶
Bases: Encoder
Image encoder for 2D images (grayscale, RGB, or RGBA).
Encodes images by binding spatial positions (x, y) with pixel values. For color images, each channel is bound to a channel dimension vector before being combined with position information.
- Encoding strategy:
For each pixel at position (x, y) with value v:
1. Encode position: pos_hv = bundle([bind(X, enc(x)), bind(Y, enc(y))])
2. Encode value(s):
- Grayscale: val_hv = enc(v)
- RGB: val_hv = bundle([bind(R, enc(r)), bind(G, enc(g)), bind(B, enc(b))])
3. Bind position with value: pixel_hv = bind(pos_hv, val_hv)
4. Bundle all pixels: image_hv = bundle([all pixel_hvs])
This creates a distributed representation that preserves both spatial structure and pixel values, enabling similarity-based image comparison.
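The grayscale branch of the encoding strategy above can be sketched without holovec. The code below is a toy under stated assumptions: bipolar vectors, elementwise multiplication as `bind`, summation as `bundle`, and a hypothetical `LevelEncoder` class (not a library class) that flips up to half of a random base vector in a private random order.

```python
import math
import random
from typing import List, Sequence

DIM = 2048


class LevelEncoder:
    """Toy scalar encoder (hypothetical): nearby values share most entries."""

    def __init__(self, lo: float, hi: float, rng: random.Random) -> None:
        self.lo, self.hi = lo, hi
        self.base = [rng.choice((-1, 1)) for _ in range(DIM)]
        self.order = list(range(DIM))
        rng.shuffle(self.order)  # private flip order keeps separate encoders decorrelated

    def encode(self, value: float) -> List[int]:
        flips = round((value - self.lo) / (self.hi - self.lo) * DIM / 2)
        hv = list(self.base)
        for k in self.order[:flips]:
            hv[k] = -hv[k]
        return hv


def bind(a: Sequence[int], b: Sequence[int]) -> List[int]:
    return [x * y for x, y in zip(a, b)]


def bundle(vectors: Sequence[Sequence[int]]) -> List[int]:
    return [sum(column) for column in zip(*vectors)]


def cosine(a: Sequence[int], b: Sequence[int]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


def encode_image(image, enc_x, enc_y, enc_v, X, Y) -> List[int]:
    pixel_hvs = []
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            # 1. position: bundle of role-bound column and row encodings
            pos_hv = bundle([bind(X, enc_x.encode(x)), bind(Y, enc_y.encode(y))])
            # 2. value: pixel normalized to [0, 1] before scalar encoding
            val_hv = enc_v.encode(v / 255.0)
            # 3. bind position with value
            pixel_hvs.append(bind(pos_hv, val_hv))
    # 4. bundle all pixels
    return bundle(pixel_hvs)


rng = random.Random(42)
X = [rng.choice((-1, 1)) for _ in range(DIM)]  # role vector for the x axis
Y = [rng.choice((-1, 1)) for _ in range(DIM)]  # role vector for the y axis
enc_x = LevelEncoder(0, 1, rng)   # column indices of a 2-pixel-wide image
enc_y = LevelEncoder(0, 1, rng)   # row indices of a 2-pixel-high image
enc_v = LevelEncoder(0.0, 1.0, rng)

image = [[100, 150], [200, 250]]
nudged = [[105, 150], [200, 250]]                      # one pixel changed slightly
inverted = [[255 - v for v in row] for row in image]   # every pixel changed strongly

hv = encode_image(image, enc_x, enc_y, enc_v, X, Y)
sim_close = cosine(hv, encode_image(nudged, enc_x, enc_y, enc_v, X, Y))
sim_far = cosine(hv, encode_image(inverted, enc_x, enc_y, enc_v, X, Y))
print(len(hv), round(sim_close, 3), round(sim_far, 3))
```

Because neighbouring positions share part of their position vector, even the inverted image keeps some similarity to the original in this toy. Only the ordering of the two similarities is meaningful here, not their absolute values.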
- Parameters:
model (VSAModel) – The VSA model to use for encoding operations.
scalar_encoder (ScalarEncoder) – Encoder for continuous pixel values (0-255 typically).
normalize_pixels (bool, optional) – Whether to normalize pixel values to [0, 1] before encoding. Default is True.
seed (int, optional) – Random seed for reproducibility. Default is None.
Examples
>>> from holovec import VSA
>>> from holovec.encoders import ImageEncoder, ThermometerEncoder
>>> import numpy as np
>>>
>>> model = VSA.create('MAP', dim=10000, seed=42)
>>> scalar_enc = ThermometerEncoder(model, min_val=0, max_val=1, n_bins=256, seed=42)
>>> encoder = ImageEncoder(model, scalar_enc, normalize_pixels=True, seed=42)
>>>
>>> # Encode a small grayscale image
>>> image = np.array([[100, 150], [200, 250]], dtype=np.uint8)
>>> hv = encoder.encode(image)
>>> print(hv.shape)  # (10000,)
>>>
>>> # Encode RGB image
>>> rgb_image = np.random.randint(0, 256, (28, 28, 3), dtype=np.uint8)
>>> hv_rgb = encoder.encode(rgb_image)
- __init__(model: VSAModel, scalar_encoder: ScalarEncoder, normalize_pixels: bool = True, seed: int | None = None)[source]¶
Initialize ImageEncoder.
- encode(image: Any | numpy.ndarray) Any[source]¶
Encode an image into a hypervector.
- Parameters:
image (array-like) – Image array with shape (height, width) for grayscale or (height, width, channels) for color images. Pixel values should be in range [0, 255] for uint8 or [0, 1] for float. Typically a NumPy array from PIL, OpenCV, or similar libraries.
- Returns:
Hypervector encoding of the image.
- Return type:
Array
- Raises:
ValueError – If image has invalid shape or number of channels.
Notes
This encoder accepts images as NumPy arrays (the standard format from image libraries like PIL, OpenCV, scikit-image) and processes them using the configured backend. While input must be NumPy, internal VSA operations use the model’s backend (NumPy/PyTorch/JAX).
- decode(hypervector: Any, height: int, width: int, n_channels: int = 1) numpy.ndarray[source]¶
Decode a hypervector to reconstruct an approximate image.
Note: Image decoding is approximate and requires knowing the target image dimensions. Reconstruction quality depends on the scalar encoder’s decoding capabilities and may require candidate value search.
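- Parameters:
hypervector (Array) – Hypervector to decode.
height (int) – Height of the target image.
width (int) – Width of the target image.
n_channels (int, optional) – Number of channels in the target image. Default is 1 (grayscale).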
- Returns:
Reconstructed image with shape (height, width) for grayscale or (height, width, n_channels) for color.
- Return type:
np.ndarray
- Raises:
NotImplementedError – Image decoding is computationally intractable without additional constraints.
Notes
Image decoding is not implemented because it requires solving a high-dimensional inverse problem that is fundamentally ill-posed:
Mathematical Challenge:
The encoding process binds pixel values with position vectors:
image_hv = bundle([bind(position(i,j), scalar(pixel[i,j])) for all i,j])
To decode, we must:
1. Unbind each position: pixel_hv[i,j] = unbind(image_hv, position(i,j))
2. Decode each scalar: pixel[i,j] = scalar_decode(pixel_hv[i,j])
Why This Is Intractable:
Unbinding is approximate (except for FHRR with exact inverse)
Each unbind operation introduces noise
For H×W image: H×W unbind operations compound errors
Scalar decoding via optimization (1000 evals × 100 iterations per pixel)
Total: ~1 billion evaluations for a 100×100 image (10,000 pixels × 100,000 evaluations each)
No gradient available for joint optimization
Alternative Approaches:
Database Retrieval: Encode query image, find nearest match in database
- Complexity: O(N) for N known images
- Works well for classification/recognition tasks
Iterative Resonator: Use resonator cleanup with pixel codebook
- Requires pre-built codebook of common pixel patterns
- May reconstruct coarse structure but not fine details
Neural Decoder: Train neural network image_hv → image
- Requires supervised training data
- Can learn inverse mapping empirically
- See: Imani et al. (2019) “VoiceHD” for similar approach
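The database-retrieval alternative listed above amounts to a linear nearest-neighbour scan over stored hypervectors. The sketch below shows that scan in plain Python. Random bipolar vectors stand in for encoded images, and the labels and the `nearest` helper are hypothetical names, not part of holovec.

```python
import math
import random
from typing import Dict, List, Sequence

DIM = 1024
rng = random.Random(7)


def random_bipolar() -> List[int]:
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def cosine(a: Sequence[int], b: Sequence[int]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


# Stand-ins for hypervectors of previously encoded images (labels are made up).
database: Dict[str, List[int]] = {label: random_bipolar() for label in ("cat", "dog", "bird")}


def nearest(query: Sequence[int], db: Dict[str, List[int]]) -> str:
    """O(N) scan: return the label whose stored hypervector is most similar to the query."""
    return max(db, key=lambda label: cosine(query, db[label]))


# Query: a corrupted copy of the "dog" vector with 10% of its entries flipped.
query = list(database["dog"])
for k in rng.sample(range(DIM), DIM // 10):
    query[k] = -query[k]

best = nearest(query, database)
print(best, round(cosine(query, database["dog"]), 3))
```

Flipping a fraction p of the entries of a bipolar vector leaves a cosine similarity of 1 - 2p with the original, while unrelated random vectors sit near 0, so the scan tolerates substantial corruption of the query.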
For practical applications, use ImageEncoder for one-way encoding (e.g., image→hypervector→classifier) rather than reconstruction.
References
Imani et al. (2019): “VoiceHD: Hyperdimensional Computing for Efficient Speech Recognition”
Plate (2003): “Holographic Reduced Representations” - Chapter 4 on approximate unbinding and error accumulation
- property is_reversible: bool¶
Whether the encoder supports decoding.
- Returns:
False - image decoding not yet implemented.
- Return type:
bool
- class holovec.encoders.VectorEncoder(model: VSAModel, scalar_encoder: ScalarEncoder, n_dimensions: int, normalize_input: bool = False, seed: int | None = None)[source]¶
Bases: StructuredEncoder
Vector encoder for multi-dimensional numeric data using role-filler binding.
Encodes vectors by binding each dimension with its scalar-encoded value:
encode([v₁, v₂, …, vₙ]) = Σᵢ bind(Dᵢ, scalar_encode(vᵢ))
where:
- Dᵢ is a random hypervector for dimension i
- scalar_encode(vᵢ) encodes the scalar value using FPE/Thermometer/Level
- bind() creates a role-filler binding
- Σ bundles all dimension-value pairs
This creates a compositional encoding where:
- Each dimension has explicit representation (Dᵢ)
- Similar values in corresponding dimensions → higher similarity
- Supports partial matching across dimensions
- Enables approximate decoding via unbinding
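The role-filler formula and the partial-matching property above can be illustrated with a small pure-Python toy. It is not the holovec implementation. It assumes bipolar vectors, elementwise multiplication as `bind`, summation as the bundle, and a toy `level_encode` under which two values 0.5 apart (on a [0, 1] range) map to orthogonal vectors.

```python
import math
import random
from typing import List, Sequence

DIM = 2048


def random_bipolar(rng: random.Random) -> List[int]:
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def level_encode(value: float, base: Sequence[int]) -> List[int]:
    """Toy scalar encoder on [0, 1]: flip a prefix of `base` proportional to `value`."""
    flips = round(value * DIM)
    return [-b if k < flips else b for k, b in enumerate(base)]


def bind(a: Sequence[int], b: Sequence[int]) -> List[int]:
    return [x * y for x, y in zip(a, b)]


def cosine(a: Sequence[int], b: Sequence[int]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


def encode_vector(values: Sequence[float], dim_vectors, base) -> List[int]:
    """Sum over i of bind(D_i, scalar_encode(v_i))."""
    bound = [bind(d, level_encode(v, base)) for d, v in zip(dim_vectors, values)]
    return [sum(column) for column in zip(*bound)]


rng = random.Random(3)
base = random_bipolar(rng)
dim_vectors = [random_bipolar(rng) for _ in range(4)]  # D_1 .. D_4

u = [0.20, 0.40, 0.60, 0.80]
w = [0.21, 0.41, 0.61, 0.81]  # every dimension slightly different
v = [0.20, 0.40, 0.10, 0.30]  # two dimensions identical, two very different

hv_u = encode_vector(u, dim_vectors, base)
sim_near = cosine(hv_u, encode_vector(w, dim_vectors, base))
sim_partial = cosine(hv_u, encode_vector(v, dim_vectors, base))
print(f"near: {sim_near:.3f}  partial: {sim_partial:.3f}")
```

Since each dimension contributes its own bound term, agreeing on two of four dimensions yields a similarity of roughly one half in this toy, which is what partial matching across dimensions means in practice.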
- scalar_encoder¶
Encoder for individual scalar values
- n_dimensions¶
Number of dimensions in input vectors
- dim_vectors¶
List of dimension hypervectors (Dᵢ)
- normalize_input¶
Whether to normalize input vectors
Example
>>> from holovec import VSA
>>> from holovec.encoders import FractionalPowerEncoder, VectorEncoder
>>>
>>> model = VSA.create('FHRR', dim=10000)
>>> scalar_enc = FractionalPowerEncoder(model, min_val=0, max_val=1)
>>> encoder = VectorEncoder(model, scalar_encoder=scalar_enc, n_dimensions=128)
>>>
>>> # Encode a feature vector (list or any backend array)
>>> features = [0.5] * 128  # Can also use numpy/torch/jax arrays
>>> hv = encoder.encode(features)
>>>
>>> # Similar vectors have high similarity
>>> features2 = [0.51] * 128  # Slightly different
>>> hv2 = encoder.encode(features2)
>>> model.similarity(hv, hv2)  # High similarity
>>>
>>> # Decode to recover approximate values
>>> recovered = encoder.decode(hv)
>>> # Verify approximate recovery via similarity
>>> model.similarity(encoder.encode(recovered), hv) > 0.9
- __init__(model: VSAModel, scalar_encoder: ScalarEncoder, n_dimensions: int, normalize_input: bool = False, seed: int | None = None)[source]¶
Initialize vector encoder.
- Parameters:
model – VSA model instance
scalar_encoder – Encoder for individual scalar values
n_dimensions – Number of dimensions in input vectors
normalize_input – Whether to normalize input vectors to unit length
seed – Random seed for dimension vector generation
- Raises:
ValueError – If n_dimensions < 1
TypeError – If scalar_encoder is not a ScalarEncoder
- encode(vector: Any) Any[source]¶
Encode a vector using dimension binding.
Each element is bound with its corresponding dimension vector:
result = Σᵢ bind(Dᵢ, scalar_encode(vector[i]))
- Parameters:
vector – Input vector to encode, shape (n_dimensions,)
- Returns:
Hypervector representing the vector
- Raises:
ValueError – If vector shape doesn’t match n_dimensions
Example
>>> encoder = VectorEncoder(model, scalar_enc, n_dimensions=3)
>>> vector = [1.0, 2.0, 3.0]  # Can also be numpy/torch/jax array
>>> hv = encoder.encode(vector)
- decode(hypervector: Any) Any[source]¶
Decode vector hypervector to recover approximate values.
For each dimension i:
1. Unbind dimension: value_hv = unbind(hypervector, Dᵢ)
2. Decode scalar: value ≈ scalar_encoder.decode(value_hv)
- Parameters:
hypervector – Vector hypervector to decode, shape (dimension,)
- Returns:
Decoded vector, shape (n_dimensions,) (backend array type)
- Raises:
NotImplementedError – If scalar_encoder doesn’t support decoding
Note
Decoding is approximate and quality depends on:
- VSA model (exact vs. approximate binding)
- Scalar encoder precision
- Number of dimensions (more dims → more noise)
Example
>>> original = [1.0, 2.0, 3.0]
>>> encoded = encoder.encode(original)
>>> decoded = encoder.decode(encoded)
>>> # Check approximate recovery (using backend operations)
>>> model.similarity(encoder.encode(decoded), encoded) > 0.9
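The note above that more dimensions mean more noise can be checked numerically in a toy setting. The sketch below does not use holovec. It assumes bipolar vectors, self-inverse multiplicative binding, and a toy `level_encode`. `unbinding_quality` is a hypothetical helper that measures how close the unbound vector for dimension 0 is to the true scalar encoding.

```python
import math
import random
from typing import List, Sequence

DIM = 2048


def random_bipolar(rng: random.Random) -> List[int]:
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def level_encode(value: float, base: Sequence[int]) -> List[int]:
    """Toy scalar encoder on [0, 1]: flip a prefix of `base` proportional to `value`."""
    flips = round(value * DIM)
    return [-b if k < flips else b for k, b in enumerate(base)]


def bind(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Elementwise product; for bipolar vectors, binding again with the same vector unbinds."""
    return [x * y for x, y in zip(a, b)]


def cosine(a: Sequence[int], b: Sequence[int]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


def unbinding_quality(n_dims: int, rng: random.Random) -> float:
    """Encode a random n_dims vector, unbind dimension 0, compare with the true encoding."""
    base = random_bipolar(rng)
    dim_vectors = [random_bipolar(rng) for _ in range(n_dims)]
    values = [rng.random() for _ in range(n_dims)]
    bound = [bind(d, level_encode(v, base)) for d, v in zip(dim_vectors, values)]
    encoded = [sum(column) for column in zip(*bound)]
    value_hv = bind(encoded, dim_vectors[0])  # true encoding plus crosstalk from other dims
    return cosine(value_hv, level_encode(values[0], base))


rng = random.Random(11)
q_small = unbinding_quality(2, rng)
q_large = unbinding_quality(32, rng)
print(f"2 dims: {q_small:.3f}  32 dims: {q_large:.3f}")
```

In this setting the similarity between the unbound vector and the true scalar encoding falls off roughly as 1/√n for n bundled dimensions, so a scalar decoder has less signal to work with as the input vector grows.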
- property is_reversible: bool¶
VectorEncoder supports approximate decoding if scalar_encoder does.
- Returns:
True if scalar_encoder supports decoding, False otherwise
See Also¶
Encoding Data - Encoder selection guide
Encoder Theory: From Scalars to Sequences - Encoder theory
HoloVec Examples - Encoder examples