Version: 1.1
Date: 2025
Status: Draft
- Overview
- Format Basics
- Tag System
- Data Type Specifications
- Struct and Enum Encoding
- Schema Evolution
- Implementation Notes
The senax-encoder binary format is designed for efficient, compact serialization with a focus on forward and backward compatibility. Each value is tagged with a type identifier, enabling schema evolution and version compatibility.
- Compact Representation: Variable-length encoding for common values
- Self-describing: Each value includes type information
- Version Resilience: Unknown fields/types can be safely skipped
- Little Endian: Consistent byte order across platforms
All multi-byte integers are encoded in little-endian format.
All encoded values follow this pattern:
[TAG:u8] [DATA:variable]
Where:
- TAG is a single byte identifying the type and encoding method
- DATA is the encoded value; its format depends on the tag
For optimal space efficiency, integers use variable-length encoding:
- Values 0-127: Encoded directly in the tag byte
- Larger values: Use dedicated tag + payload encoding
- Signed integers: Negative values use bit-inverted encoding (not ZigZag)
Field IDs and variant IDs use an optimized encoding scheme for space efficiency:
Encoding Rules:
- Field IDs 1-250: Encoded as a single u8 byte
- Field IDs 251+: Encoded as a 0xFF marker byte followed by a u64 in little-endian
- Terminator: Encoded as a 0x00 byte to mark the end of fields
Format:
// Small field ID (1-250)
[field_id:u8] [field_value]
// Large field ID (251+)
[0xFF] [field_id:u64_le] [field_value]
// Terminator
[0x00]
Size Benefits:
- Most field IDs (1-250) use only 1 byte instead of 8 bytes
- Terminator uses 1 byte instead of 8 bytes
- Large field IDs (rare) use 9 bytes (1 marker + 8 data)
Examples:
field_id=1 -> [0x01] // Direct u8 encoding
field_id=250 -> [0xFA] // Direct u8 encoding (250 = 0xFA)
field_id=251 -> [0xFF, 0xFB, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00] // Marker + u64_le
terminator -> [0x00] // End of fields
This optimization significantly reduces binary size for typical structs and enums while maintaining full u64 field ID range support.
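The ID rules above can be sketched as a pair of helper functions. This is a minimal illustration, not the crate's actual API: the names `encode_id` and `decode_id` are hypothetical, and rejecting ID 0 and the bytes 251-254 is an assumption, since the rules above do not assign them.

```rust
/// Writes a field or variant ID in the optimized form.
/// ID 0 is rejected here because 0x00 is the terminator byte (assumption).
pub fn encode_id(id: u64, out: &mut Vec<u8>) -> Result<(), &'static str> {
    match id {
        0 => Err("ID 0 collides with the terminator"),
        1..=250 => {
            out.push(id as u8); // small ID: a single byte
            Ok(())
        }
        _ => {
            out.push(0xFF); // marker byte
            out.extend_from_slice(&id.to_le_bytes()); // u64, little-endian
            Ok(())
        }
    }
}

/// Reads one ID. Returns (None, 1) for the terminator,
/// otherwise (Some(id), bytes_consumed).
pub fn decode_id(input: &[u8]) -> Result<(Option<u64>, usize), &'static str> {
    match input.first().copied() {
        None => Err("unexpected EOF"),
        Some(0x00) => Ok((None, 1)),
        Some(0xFF) => {
            let raw = input.get(1..9).ok_or("unexpected EOF")?;
            let mut buf = [0u8; 8];
            buf.copy_from_slice(raw);
            Ok((Some(u64::from_le_bytes(buf)), 9))
        }
        Some(b @ 1..=250) => Ok((Some(b as u64), 1)),
        // Bytes 251-254 are not assigned by the rules above; treating them
        // as invalid is an assumption of this sketch.
        Some(_) => Err("unassigned ID byte"),
    }
}
```

Calling `encode_id(251, &mut buf)` appends the nine bytes shown in the example above, and `decode_id` on that buffer reports nine bytes consumed.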
Tags are assigned in ranges for semantic grouping:
pub const TAG_ZERO: u8 = 0;
pub const TAG_ONE: u8 = 1;
// 2-127: Direct encoding for values 2-127
pub const TAG_U8_127: u8 = 127; // Value 127
// Extended integer types
pub const TAG_NONE: u8 = 128;
pub const TAG_SOME: u8 = 129;
pub const TAG_U8: u8 = 131;
pub const TAG_U16: u8 = 132;
pub const TAG_U32: u8 = 133;
pub const TAG_U64: u8 = 134;
pub const TAG_U128: u8 = 135;
pub const TAG_NEGATIVE: u8 = 136;
// Floating point
pub const TAG_F32: u8 = 137;
pub const TAG_F64: u8 = 138;
// Strings
pub const TAG_STRING_BASE: u8 = 139; // 139-179: Short strings (0-40 bytes)
pub const TAG_STRING_LONG: u8 = 180;
// Collections and containers
pub const TAG_BINARY: u8 = 181;
pub const TAG_STRUCT_UNIT: u8 = 182;
pub const TAG_STRUCT_NAMED: u8 = 183;
pub const TAG_STRUCT_UNNAMED: u8 = 184;
pub const TAG_ENUM: u8 = 185;
pub const TAG_ENUM_NAMED: u8 = 186;
pub const TAG_ENUM_UNNAMED: u8 = 187;
pub const TAG_ARRAY_VEC_SET_BASE: u8 = 188; // 188-193: Short arrays (0-5 elements)
pub const TAG_ARRAY_VEC_SET_LONG: u8 = 194;
pub const TAG_TUPLE: u8 = 195;
pub const TAG_MAP: u8 = 196;
// Extended types (optional features)
pub const TAG_CHRONO_DATETIME: u8 = 197;
pub const TAG_CHRONO_NAIVE_DATE: u8 = 198;
pub const TAG_CHRONO_NAIVE_TIME: u8 = 199;
pub const TAG_DECIMAL: u8 = 200;
pub const TAG_UUID: u8 = 201; // Shared by UUID and ULID
Encoding:
- false: TAG_ZERO (0x00)
- true: TAG_ONE (0x01)
Example:
true -> [0x01]
false -> [0x00]
Compact Encoding (0-127):
value -> [TAG_ZERO + value]
Extended Encoding:
u8 -> [TAG_U8] [value-128:u8] (range: 128-383)
u16 -> [TAG_U16] [value:u16_le] (range: 384-65535)
u32 -> [TAG_U32] [value:u32_le] (range: 65536-4294967295)
u64 -> [TAG_U64] [value:u64_le] (range: 4294967296-18446744073709551615)
u128 -> [TAG_U128] [value:u128_le] (range: 18446744073709551616+)
Size Selection:
- 0-127: Direct encoding (1 byte total)
- 128-383: u8 encoding (2 bytes total) - stores value-128
- 384-65535: u16 encoding (3 bytes total)
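- 65536-4294967295: u32 encoding (5 bytes total)
- 4294967296-18446744073709551615: u64 encoding (9 bytes total)
- Larger values: u128 encoding (17 bytes total)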
Examples:
42 -> [0x2A] // TAG_ZERO + 42 = 0 + 42 = 42 = 0x2A
128 -> [0x83, 0x00] // TAG_U8, 128-128=0
255 -> [0x83, 0x7F] // TAG_U8, 255-128=127
383 -> [0x83, 0xFF] // TAG_U8, 383-128=255
384 -> [0x84, 0x80, 0x01] // TAG_U16, 384 in LE
Special Cases:
- 0: TAG_ZERO (0x00)
- 1: TAG_ONE (0x01)
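The size selection for unsigned integers can be sketched as a single match on the value. The function name `encode_uint` is hypothetical; the tag constants and ranges are taken from the tables above.

```rust
pub const TAG_U8: u8 = 131;
pub const TAG_U16: u8 = 132;
pub const TAG_U32: u8 = 133;
pub const TAG_U64: u8 = 134;
pub const TAG_U128: u8 = 135;

/// Appends the smallest encoding that can hold `v`.
pub fn encode_uint(v: u128, out: &mut Vec<u8>) {
    match v {
        // 0-127: the value itself is the tag byte.
        0..=127 => out.push(v as u8),
        // 128-383: TAG_U8 followed by (value - 128) in one byte.
        128..=383 => {
            out.push(TAG_U8);
            out.push((v - 128) as u8);
        }
        384..=0xFFFF => {
            out.push(TAG_U16);
            out.extend_from_slice(&(v as u16).to_le_bytes());
        }
        0x1_0000..=0xFFFF_FFFF => {
            out.push(TAG_U32);
            out.extend_from_slice(&(v as u32).to_le_bytes());
        }
        0x1_0000_0000..=0xFFFF_FFFF_FFFF_FFFF => {
            out.push(TAG_U64);
            out.extend_from_slice(&(v as u64).to_le_bytes());
        }
        _ => {
            out.push(TAG_U128);
            out.extend_from_slice(&v.to_le_bytes());
        }
    }
}
```

The offset of 128 in the TAG_U8 case is what lets a two-byte encoding reach 383 rather than stopping at 255.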
Encoding Rule:
- 0 and positive values: Encoded as unsigned integers
- Negative values: TAG_NEGATIVE (0x88) + bit-inverted encoding
Format:
// 0, positive values
[value:variable_uint]
// Negative values
[TAG_NEGATIVE] [(!n):variable_uint]
Examples:
0 -> [0x00] // TAG_ZERO
1 -> [0x01] // TAG_ONE
2 -> [0x02] // TAG_ZERO+2
-1 -> [0x88, 0x00] // TAG_NEGATIVE, !(-1)=0 -> TAG_ZERO
-2 -> [0x88, 0x01] // TAG_NEGATIVE, !(-2)=1 -> TAG_ONE
-128 -> [0x88, 0x7F] // TAG_NEGATIVE, !(-128)=127 -> TAG_ZERO+127
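A minimal sketch of the signed rule for i64, assuming a helper `encode_uint` that implements the unsigned size selection (both function names are hypothetical). Bit inversion maps -1 to 0, -2 to 1, and so on, so small negative numbers stay as compact as small positive ones.

```rust
pub const TAG_U8: u8 = 131;
pub const TAG_U16: u8 = 132;
pub const TAG_U32: u8 = 133;
pub const TAG_U64: u8 = 134;
pub const TAG_NEGATIVE: u8 = 136;

/// Variable-length unsigned encoding, limited to u64 for this sketch.
fn encode_uint(v: u64, out: &mut Vec<u8>) {
    match v {
        0..=127 => out.push(v as u8),
        128..=383 => {
            out.push(TAG_U8);
            out.push((v - 128) as u8);
        }
        384..=0xFFFF => {
            out.push(TAG_U16);
            out.extend_from_slice(&(v as u16).to_le_bytes());
        }
        0x1_0000..=0xFFFF_FFFF => {
            out.push(TAG_U32);
            out.extend_from_slice(&(v as u32).to_le_bytes());
        }
        _ => {
            out.push(TAG_U64);
            out.extend_from_slice(&v.to_le_bytes());
        }
    }
}

/// Non-negative values use the unsigned encoding directly.
/// Negative values are written as TAG_NEGATIVE followed by the
/// bit-inverted value (!n), which is always non-negative.
pub fn encode_int(v: i64, out: &mut Vec<u8>) {
    if v >= 0 {
        encode_uint(v as u64, out);
    } else {
        out.push(TAG_NEGATIVE);
        encode_uint((!v) as u64, out);
    }
}
```

A decoder reverses the step by reading the unsigned value `u` after TAG_NEGATIVE and computing `!(u as i64)`.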
Format:
f32 -> [TAG_F32] [value:f32_le]
f64 -> [TAG_F64] [value:f64_le]
Cross-Type Decoding:
- f64 can be decoded as f32 (with potential precision loss)
- f32 to f64 cross-decoding is not supported due to precision ambiguity
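The float format and the f64-to-f32 cross-decoding rule can be sketched as follows. The function names are hypothetical, and the narrowing is done with a plain `as f32` cast, which is an assumption about how precision loss is handled.

```rust
pub const TAG_F32: u8 = 137;
pub const TAG_F64: u8 = 138;

pub fn encode_f32(v: f32, out: &mut Vec<u8>) {
    out.push(TAG_F32);
    out.extend_from_slice(&v.to_le_bytes());
}

pub fn encode_f64(v: f64, out: &mut Vec<u8>) {
    out.push(TAG_F64);
    out.extend_from_slice(&v.to_le_bytes());
}

/// Decodes an f32, also accepting an f64 payload and narrowing it.
pub fn decode_as_f32(input: &[u8]) -> Result<f32, &'static str> {
    match input.first().copied() {
        Some(TAG_F32) => {
            let raw = input.get(1..5).ok_or("unexpected EOF")?;
            let mut buf = [0u8; 4];
            buf.copy_from_slice(raw);
            Ok(f32::from_le_bytes(buf))
        }
        Some(TAG_F64) => {
            let raw = input.get(1..9).ok_or("unexpected EOF")?;
            let mut buf = [0u8; 8];
            buf.copy_from_slice(raw);
            // Narrowing cast: may lose precision, as noted above.
            Ok(f64::from_le_bytes(buf) as f32)
        }
        Some(_) => Err("not a float tag"),
        None => Err("unexpected EOF"),
    }
}
```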
Short Strings (0-40 bytes):
[TAG_STRING_BASE + length] [utf8_bytes]
Long Strings:
[TAG_STRING_LONG] [length:variable_uint] [utf8_bytes]
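The choice between the two string forms depends only on the UTF-8 byte length. A minimal sketch, with hypothetical function names and a u64-only `encode_uint` helper for the long-form length:

```rust
pub const TAG_U8: u8 = 131;
pub const TAG_U16: u8 = 132;
pub const TAG_U32: u8 = 133;
pub const TAG_U64: u8 = 134;
pub const TAG_STRING_BASE: u8 = 139;
pub const TAG_STRING_LONG: u8 = 180;

/// Variable-length unsigned encoding, limited to u64 for this sketch.
fn encode_uint(v: u64, out: &mut Vec<u8>) {
    match v {
        0..=127 => out.push(v as u8),
        128..=383 => {
            out.push(TAG_U8);
            out.push((v - 128) as u8);
        }
        384..=0xFFFF => {
            out.push(TAG_U16);
            out.extend_from_slice(&(v as u16).to_le_bytes());
        }
        0x1_0000..=0xFFFF_FFFF => {
            out.push(TAG_U32);
            out.extend_from_slice(&(v as u32).to_le_bytes());
        }
        _ => {
            out.push(TAG_U64);
            out.extend_from_slice(&v.to_le_bytes());
        }
    }
}

/// Strings of 0-40 bytes fold the length into the tag;
/// longer strings carry an explicit variable-length byte count.
pub fn encode_str(s: &str, out: &mut Vec<u8>) {
    let bytes = s.as_bytes();
    if bytes.len() <= 40 {
        out.push(TAG_STRING_BASE + bytes.len() as u8);
    } else {
        out.push(TAG_STRING_LONG);
        encode_uint(bytes.len() as u64, out);
    }
    out.extend_from_slice(bytes);
}
```

Note that the length is counted in bytes, so a string of 20 three-byte characters takes the long form.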
Examples:
"" -> [0x8B] // TAG_STRING_BASE + 0
"hi" -> [0x8D, 0x68, 0x69] // TAG_STRING_BASE + 2, "hi"
"long" -> [0xB4, 0x04, 0x6C, 0x6F, 0x6E, 0x67] // TAG_STRING_LONG, length=4, "long"
Format:
None -> [TAG_NONE] // 0x80 (128)
Some(v) -> [TAG_SOME] [encoded_value] // 0x81 (129) + value
Short Collections (0-5 elements):
[TAG_ARRAY_VEC_SET_BASE + count] [element1] [element2] ...
Long Collections:
[TAG_ARRAY_VEC_SET_LONG] [count:variable_uint] [element1] [element2] ...
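As an illustration of the collection header, the sketch below encodes a slice of unsigned integers. The function names are hypothetical, and the element type is fixed to u64 only to keep the example short; any encodable element type follows the same header rule.

```rust
pub const TAG_U8: u8 = 131;
pub const TAG_U16: u8 = 132;
pub const TAG_U32: u8 = 133;
pub const TAG_U64: u8 = 134;
pub const TAG_ARRAY_VEC_SET_BASE: u8 = 188;
pub const TAG_ARRAY_VEC_SET_LONG: u8 = 194;

/// Variable-length unsigned encoding, limited to u64 for this sketch.
fn encode_uint(v: u64, out: &mut Vec<u8>) {
    match v {
        0..=127 => out.push(v as u8),
        128..=383 => {
            out.push(TAG_U8);
            out.push((v - 128) as u8);
        }
        384..=0xFFFF => {
            out.push(TAG_U16);
            out.extend_from_slice(&(v as u16).to_le_bytes());
        }
        0x1_0000..=0xFFFF_FFFF => {
            out.push(TAG_U32);
            out.extend_from_slice(&(v as u32).to_le_bytes());
        }
        _ => {
            out.push(TAG_U64);
            out.extend_from_slice(&v.to_le_bytes());
        }
    }
}

/// Writes the collection header, then each element in order.
pub fn encode_u64_slice(items: &[u64], out: &mut Vec<u8>) {
    if items.len() <= 5 {
        // 0-5 elements: the count is folded into the tag byte.
        out.push(TAG_ARRAY_VEC_SET_BASE + items.len() as u8);
    } else {
        out.push(TAG_ARRAY_VEC_SET_LONG);
        encode_uint(items.len() as u64, out);
    }
    for &item in items {
        encode_uint(item, out);
    }
}
```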
Format:
[TAG_MAP] [count:variable_uint] [key1] [value1] [key2] [value2] ...
Format:
[TAG_TUPLE] [element_count:variable_uint] [element1] [element2] ...
Vec<u8> and Bytes:
[TAG_BINARY] [length:variable_uint] [raw_bytes]
Format:
[TAG_CHRONO_DATETIME] [seconds:i64] [nanos:u32]
All DateTime types (UTC, Local) are normalized to UTC for storage.
Format:
[TAG_CHRONO_NAIVE_DATE] [days_from_epoch:i64]
Epoch: 1970-01-01
Format:
[TAG_CHRONO_NAIVE_TIME] [seconds_from_midnight:u32] [nanoseconds:u32]
Format:
[TAG_CHRONO_NAIVE_DATETIME] [seconds:i64] [nanos:u32]
Stored as seconds and nanoseconds since the Unix epoch (1970-01-01 00:00:00 UTC).
Format:
[TAG_DECIMAL] [mantissa:i128] [scale:u32]
Format:
[TAG_UUID] [value:u128_le]
Note: UUID and ULID share the same tag and are binary compatible at the encoding level.
Dynamic JSON values are supported when the serde_json feature is enabled. Each JSON value variant has its own tag:
- TAG_JSON_NULL (202): JSON null value
- TAG_JSON_BOOL (203): JSON boolean (uses existing boolean encoding)
- TAG_JSON_NUMBER (204): JSON number with type preservation
- TAG_JSON_STRING (205): JSON string (uses existing string encoding)
- TAG_JSON_ARRAY (206): JSON array
- TAG_JSON_OBJECT (207): JSON object
JSON numbers are encoded with type preservation to maintain integer/float distinction:
Format: TAG_JSON_NUMBER + type_marker + value
- type_marker = 0: Unsigned integer, followed by u64 encoding
- type_marker = 1: Signed integer, followed by i64 encoding
- type_marker = 2: Float, followed by f64 encoding
Examples:
- 42 (integer) → [204, 0, ...] (TAG_JSON_NUMBER, unsigned integer marker, u64 encoding)
- 3.14159 (float) → [204, 2, ...] (TAG_JSON_NUMBER, float marker, f64 encoding)
Format: TAG_JSON_ARRAY + length + elements...
Format: TAG_JSON_OBJECT + length + (key, value)...
Keys are encoded as strings, values are recursively encoded as JSON values.
Examples:
- null → [202]
- true → [203, 1] (TAG_JSON_BOOL, TAG_ONE)
- "hello" → [205, 144, ...] (TAG_JSON_STRING, string encoding: TAG_STRING_BASE + 5, then the UTF-8 bytes)
- [] → [206, 0] (TAG_JSON_ARRAY, length 0)
- {} → [207, 0] (TAG_JSON_OBJECT, length 0)
Format:
[TAG_STRUCT_UNIT]
Format:
[TAG_STRUCT_NAMED] [field_id_optimized] [field_value] ... [0x00]
Field Encoding Rules:
- Each field is encoded as [field_id_optimized] [field_value]
- Field IDs are derived from field names (CRC64(ECMA-182) hash) or custom #[senax(id=n)] attributes
- Field IDs 1-250 are encoded as single u8 bytes
- Field IDs 251+ are encoded as 0xFF marker + u64 little-endian
- Optional fields with None values are omitted entirely
- Terminator: a single zero byte (0x00) marks the end of fields
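To make the named-struct layout concrete, the sketch below hand-encodes a hypothetical `User` struct whose fields are assumed to carry explicit IDs 1, 2 and 3 (as if annotated with #[senax(id=n)]). The struct, the function names, and the choice to write a present optional field as TAG_SOME followed by the value are all assumptions of this example, not confirmed behavior of the derive macro.

```rust
pub const TAG_SOME: u8 = 129;
pub const TAG_U8: u8 = 131;
pub const TAG_U16: u8 = 132;
pub const TAG_U32: u8 = 133;
pub const TAG_STRING_BASE: u8 = 139;
pub const TAG_STRING_LONG: u8 = 180;
pub const TAG_STRUCT_NAMED: u8 = 183;

/// Variable-length unsigned encoding, limited to u32 for this sketch.
fn encode_uint(v: u32, out: &mut Vec<u8>) {
    match v {
        0..=127 => out.push(v as u8),
        128..=383 => {
            out.push(TAG_U8);
            out.push((v - 128) as u8);
        }
        384..=0xFFFF => {
            out.push(TAG_U16);
            out.extend_from_slice(&(v as u16).to_le_bytes());
        }
        _ => {
            out.push(TAG_U32);
            out.extend_from_slice(&v.to_le_bytes());
        }
    }
}

fn encode_str(s: &str, out: &mut Vec<u8>) {
    if s.len() <= 40 {
        out.push(TAG_STRING_BASE + s.len() as u8);
    } else {
        out.push(TAG_STRING_LONG);
        encode_uint(s.len() as u32, out); // assumes length fits in u32
    }
    out.extend_from_slice(s.as_bytes());
}

/// Hypothetical struct with assumed field IDs.
pub struct User {
    pub id: u32,                  // field ID 1
    pub name: String,             // field ID 2
    pub nickname: Option<String>, // field ID 3
}

pub fn encode_user(u: &User, out: &mut Vec<u8>) {
    out.push(TAG_STRUCT_NAMED);
    out.push(1); // field ID 1 (small ID, one byte)
    encode_uint(u.id, out);
    out.push(2); // field ID 2
    encode_str(&u.name, out);
    // A None field is omitted entirely, ID included.
    if let Some(nick) = &u.nickname {
        out.push(3); // field ID 3
        out.push(TAG_SOME); // assumption: present value uses the Option format
        encode_str(nick, out);
    }
    out.push(0x00); // terminator
}
```

Because each field is prefixed by its ID and the list ends with a terminator, a decoder can read fields in any order and stop without knowing the field count in advance.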
Format:
[TAG_STRUCT_UNNAMED] [field_count:variable_uint] [field1] [field2] ...
Format:
[TAG_ENUM] [variant_id_optimized]
Format:
[TAG_ENUM_NAMED] [variant_id_optimized] [field_id_optimized] [field_value] ... [0x00]
Format:
[TAG_ENUM_UNNAMED] [variant_id_optimized] [field_count:variable_uint] [field1] [field2] ...
Variant ID Assignment:
- Derived from variant name (CRC64 hash) or custom #[senax(id=n)] attributes
- Variant IDs 1-250 are encoded as single u8 bytes
- Variant IDs 251+ are encoded as 0xFF marker + u64 little-endian
- Must be stable across versions for compatibility
Adding Fields:
- New optional fields: Automatically handled (default to None)
- New required fields: Must have defaults or be made optional
  - In addition to having a Rust default value, you must explicitly annotate the field with #[senax(default)] to ensure forward/backward compatibility.
- Fields with #[senax(skip_default)]: Only encoded when the value differs from the default; the default value is used automatically when the field is missing during decode
Adding Enum Variants:
- Use custom #[senax(id=n)] for stable IDs
- Unknown variants cause decode errors
Removing Fields:
- Unknown field IDs are automatically skipped during decoding
- No decoder changes required
Removing Enum Variants:
- May cause decode errors if old data contains removed variants
- Consider deprecation strategy
Field order changes are automatically handled due to ID-based encoding.
Compatible Changes:
- u32 ↔ i64 (if values fit)
- f32 ↔ f64
- u32 → Option<u32>
Incompatible Changes:
- String → u32
- Vec<T> → HashMap<K,V>
- None → Required
Decoders must implement a skip_value() function that can skip unknown tagged values without parsing them. This enables forward compatibility.
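A minimal sketch of such a function follows. It covers only the tags whose payload layout is fully determined by the formats above (integers, floats, strings, binary, Option, arrays, tuples, maps, unit and named structs, UUID) and returns an error for every other tag. The helper names `read_uint`, `skip_many` and `skip_fields` are hypothetical, and the recursion depth is not bounded here.

```rust
pub const TAG_NONE: u8 = 128;
pub const TAG_SOME: u8 = 129;
pub const TAG_U8: u8 = 131;
pub const TAG_U16: u8 = 132;
pub const TAG_U32: u8 = 133;
pub const TAG_U64: u8 = 134;
pub const TAG_U128: u8 = 135;
pub const TAG_NEGATIVE: u8 = 136;
pub const TAG_F32: u8 = 137;
pub const TAG_F64: u8 = 138;
pub const TAG_STRING_BASE: u8 = 139;
pub const TAG_STRING_LONG: u8 = 180;
pub const TAG_BINARY: u8 = 181;
pub const TAG_STRUCT_UNIT: u8 = 182;
pub const TAG_STRUCT_NAMED: u8 = 183;
pub const TAG_ARRAY_VEC_SET_BASE: u8 = 188;
pub const TAG_ARRAY_VEC_SET_LONG: u8 = 194;
pub const TAG_TUPLE: u8 = 195;
pub const TAG_MAP: u8 = 196;
pub const TAG_UUID: u8 = 201;

type Res<T> = Result<T, &'static str>;

/// Reads a variable-length unsigned integer: (value, bytes consumed).
fn read_uint(b: &[u8]) -> Res<(u64, usize)> {
    let tag = *b.first().ok_or("unexpected EOF")?;
    let width = match tag {
        0..=127 => return Ok((tag as u64, 1)),
        TAG_U8 => 1,
        TAG_U16 => 2,
        TAG_U32 => 4,
        TAG_U64 => 8,
        _ => return Err("not a variable-length unsigned integer"),
    };
    let raw = b.get(1..1 + width).ok_or("unexpected EOF")?;
    let mut buf = [0u8; 8];
    buf[..width].copy_from_slice(raw);
    let v = u64::from_le_bytes(buf);
    Ok((if tag == TAG_U8 { v + 128 } else { v }, 1 + width))
}

/// Skips `count` consecutive tagged values; returns bytes consumed.
fn skip_many(b: &[u8], count: u64) -> Res<usize> {
    let mut pos = 0;
    for _ in 0..count {
        pos += skip_value(b.get(pos..).ok_or("unexpected EOF")?)?;
    }
    Ok(pos)
}

/// Skips [field_id] [value] pairs up to and including the 0x00 terminator.
fn skip_fields(b: &[u8]) -> Res<usize> {
    let mut pos = 0;
    loop {
        match *b.get(pos).ok_or("unexpected EOF")? {
            0x00 => return Ok(pos + 1),
            0xFF => pos += 9, // marker + u64 field ID
            _ => pos += 1,    // single-byte field ID
        }
        pos += skip_value(b.get(pos..).ok_or("unexpected EOF")?)?;
    }
}

/// Returns the total encoded length of the value at the start of `b`.
pub fn skip_value(b: &[u8]) -> Res<usize> {
    let tag = *b.first().ok_or("unexpected EOF")?;
    let rest = &b[1..];
    let payload = match tag {
        0..=127 | TAG_NONE | TAG_STRUCT_UNIT => 0,
        TAG_SOME | TAG_NEGATIVE => skip_value(rest)?,
        TAG_U8 => 1,
        TAG_U16 => 2,
        TAG_U32 | TAG_F32 => 4,
        TAG_U64 | TAG_F64 => 8,
        TAG_U128 | TAG_UUID => 16,
        TAG_STRING_BASE..=179 => (tag - TAG_STRING_BASE) as usize,
        TAG_STRING_LONG | TAG_BINARY => {
            let (len, used) = read_uint(rest)?;
            used.saturating_add(len as usize)
        }
        TAG_ARRAY_VEC_SET_BASE..=193 => skip_many(rest, (tag - TAG_ARRAY_VEC_SET_BASE) as u64)?,
        TAG_ARRAY_VEC_SET_LONG | TAG_TUPLE | TAG_MAP => {
            let (count, used) = read_uint(rest)?;
            // A map stores a key and a value per entry.
            let items = if tag == TAG_MAP { count.saturating_mul(2) } else { count };
            used + skip_many(&rest[used..], items)?
        }
        TAG_STRUCT_NAMED => skip_fields(rest)?,
        _ => return Err("tag not covered by this sketch"),
    };
    if payload > rest.len() {
        return Err("unexpected EOF");
    }
    Ok(1 + payload)
}
```

A struct decoder that meets an unrecognized field ID can call `skip_value` on the bytes that follow and advance by the returned length, which is what makes removed or newly added fields harmless.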
Decode Errors:
- Invalid UTF-8 in strings
- Unknown enum variants
- Malformed data (unexpected EOF, invalid tags)
- Type conversion failures
All multi-byte values use little-endian encoding for consistency across platforms.
This specification defines the complete binary format for senax-encoder. Implementations should follow these rules exactly to ensure cross-version and cross-platform compatibility.