XUASTC LDR ‐ Arithmetic Profile
Note: This is a preliminary and in-progress specification, and is subject to change.
Copyright (C) 2025-2026 Binomial LLC. All rights reserved except as granted under the Apache 2.0 LICENSE.
The arithmetic/range coding profile supports two modes:
- Full arithmetic: Block metadata, endpoint data, and weight data are arithmetic coded (Zstd is not used at all).
- Hybrid arithmetic/Zstd: All weight data is coded with Zstd (just like the full Zstd profile) for faster transcoding. All other data is encoded the same way as full arithmetic mode.
Said-style arithmetic/range coding (paper here) is used by the arithmetic/range profile due to its flexibility, excellent documentation, optional optimizations, the availability of a fully independent implementation (FastAC), and its ability to efficiently support hundreds of contexts. The low-level entropy decoder is fully described here.
The range coder supports binary and multi-alphabet symbols with adaptive context modeling, unsigned integers using Truncated Binary Encoding, and plain bits or unsigned integers of various bit lengths.
All data (including raw bits) in the full arithmetic profile is sent via the arithmetic coder in a single large sequential stream (fully within the "arithmetic" section). In the hybrid profile the weight-related data is sent in numerous additional Zstd compressed or uncompressed sections, very similar to the full Zstd profile.
For readers familiar with video codecs, this profile is conceptually similar to CABAC-style decoding operating purely in intra mode, except that the output consists of ASTC logical blocks rather than pixels.
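The per-block decoding pseudocode later in this document calls a small set of decoder operations. A minimal interface sketch is given below; the class layout and return types are assumptions, but the operation names (decode_sym, decode_bit, get_bit, get_bits, decode_truncated_binary, decode_gamma) match the calls used throughout this specification.
// Minimal sketch of the range-decoder interface assumed by the pseudocode in this
// document. Only the operation names are taken from the specification; the exact
// class layout and return types are assumptions.
namespace arith
{
    class arith_data_model;     // adaptive multi-symbol (data) model
    class arith_bit_model;      // adaptive binary model
    class arith_gamma_contexts; // contexts used for gamma-coded unsigned integers

    class arith_dec
    {
    public:
        uint32_t decode_sym(arith_data_model& model);             // adaptive multi-symbol decode
        uint32_t decode_bit(arith_bit_model& model);              // adaptive binary decode
        uint32_t get_bit();                                       // single raw bit (through the range coder)
        uint32_t get_bits(uint32_t num_bits);                     // raw unsigned integer of num_bits bits
        uint32_t decode_truncated_binary(uint32_t total_values);  // Truncated Binary Encoding
        uint32_t decode_gamma(arith_gamma_contexts& contexts);    // gamma-coded unsigned integer
    };
}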
The first byte of the compressed mipmap data indicates the profile (or syntax) used to compress the level's texture data:
enum xuastc_ldr_syntax
{
cFullArith = 0,
cHybridArithZstd = 1,
cFullZstd = 2
};
If the first byte is cFullArith (0), the full arithmetic profile is used; if it is cHybridArithZstd (1), the hybrid arithmetic+Zstd profile is used. This document covers these two cases. The remaining case (cFullZstd, which doesn't use Said-style arithmetic coding at all) is covered in the main specification here.
In the full arithmetic profile, there is no profile header after the first byte (cFullArith, or 0). The mipmap level's arithmetic coded data immediately follows this byte.
Top-Level Layout:
+----------------------------------------+
| syntax byte (cFullArith or 0) |
+----------------------------------------+
| arithmetic coded data |
+----------------------------------------+
In the hybrid profile, the mipmap level's data is prefixed by a 45-byte little-endian header and is then followed by various sections of arithmetic coded, Zstd compressed, or uncompressed data:
Top-Level Layout:
+----------------------------------------+
| syntax byte (cHybridArithZstd or 1) |
+----------------------------------------+
| arithmetic coded data |
+----------------------------------------+
| mean0 (Zstd) |
| mean1 (Zstd) |
| run (Zstd) |
| coeff (Zstd) |
| sign (uncompressed/raw bits) |
+----------------------------------------+
| weight2_bits (Zstd) |
| weight3_bits (Zstd) |
| weight4_bits (Zstd) |
| weight8_bytes (Zstd) |
+----------------------------------------+
There is one section of arithmetic coded data (always present), up to eight sections of Zstd compressed data, and one optional uncompressed section (the sign bits). The minimum size of the arithmetic coded section is 5 bytes. A section length of 0 bytes indicates that the section is not present.
The first byte must be cHybridArithZstd (1), which is also the first byte of the 45-byte header.
#pragma pack(push, 1)
struct xuastc_ldr_arith_header
{
uint8_t m_flags; // must be cHybridArithZstd (1)
uint32_t m_arith_bytes_len; // byte size of arithmetic/range coded section
uint32_t m_mean0_bits_len; // Zstd, byte size of mean0_bits section
uint32_t m_mean1_bytes_len; // Zstd, byte size of mean1_bytes section
uint32_t m_run_bytes_len; // Zstd, byte size of run_bytes section
uint32_t m_coeff_bytes_len; // Zstd, byte size of coeff_bytes section
uint32_t m_sign_bits_len; // always uncompressed
uint32_t m_weight2_bits_len; // Zstd, 2-bit weights (4 per byte), up to BISE_4_LEVELS
uint32_t m_weight3_bits_len; // Zstd, 3-bit weights (2 per byte), up to BISE_8_LEVELS
uint32_t m_weight4_bits_len; // Zstd, 4-bit weights (2 per byte), up to BISE_16_LEVELS
uint32_t m_weight8_bytes_len; // Zstd, 8-bit weights (1 per byte), up to BISE_32_LEVELS
uint32_t m_unused; // Future expansion
};
#pragma pack(pop)
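The sections follow the header sequentially, in the same order as the length fields above (and the layout diagram). A minimal sketch of computing each section's byte offset within the mipmap level's compressed data, assuming the sections are stored back to back immediately after the 45-byte header:
// Sketch: compute section offsets from the header lengths. Offsets are relative to
// the start of the mipmap level's compressed data. Assumes the sections are packed
// back to back directly after the header, in the order of the length fields above.
struct section_span { uint32_t m_ofs; uint32_t m_len; };

void compute_section_spans(const xuastc_ldr_arith_header& hdr, section_span spans[10])
{
    const uint32_t lens[10] =
    {
        hdr.m_arith_bytes_len,
        hdr.m_mean0_bits_len,   hdr.m_mean1_bytes_len,
        hdr.m_run_bytes_len,    hdr.m_coeff_bytes_len,
        hdr.m_sign_bits_len,
        hdr.m_weight2_bits_len, hdr.m_weight3_bits_len,
        hdr.m_weight4_bits_len, hdr.m_weight8_bytes_len
    };

    uint32_t cur_ofs = (uint32_t)sizeof(xuastc_ldr_arith_header); // 45 bytes
    for (uint32_t i = 0; i < 10; i++)
    {
        spans[i].m_ofs = cur_ofs;
        spans[i].m_len = lens[i]; // a length of 0 means this section is not present
        cur_ofs += lens[i];
    }
}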
The following small System Header is always present at the beginning of the arithmetic coded section. It's coded using raw bits via arith_dec::get_bits() (i.e. through the arithmetic/range coder, not directly as individual bits):
| Field | Bits |
|---|---|
| Stream Version | 5 |
| ASTC Block Size Index | 4 |
| sRGB Decode Flag | 1 |
| Actual Width | 16 |
| Actual Height | 16 |
| Has Alpha | 1 |
| Use DCT | 1 |
| DCT Q (dct_q)×2 | 8 (optional) |
This header is interpreted in the same way as the Zstd profile's System Header. The Stream Version must be 1.
DCT Q is only present if Use DCT is 1. After dividing the decoded value by 2.0, the resulting Q value must be in the range [1.0, 100.0].
The number of logical ASTC blocks to decode is num_blocks_x = floor((actual_width + block_width - 1) / block_width) horizontally and num_blocks_y = floor((actual_height + block_height - 1) / block_height) vertically.
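A minimal sketch of decoding and validating these fields through the range coder (the fail() handler and the block size lookup are illustrative; the field order, widths, and valid ranges come from the table and text above):
// Sketch: reading the System Header with raw bit reads through the range coder.
uint32_t stream_version = dec.get_bits(5);
if (stream_version != 1)
    fail();

uint32_t astc_block_size_index = dec.get_bits(4);
bool srgb_decode = (dec.get_bits(1) != 0);
uint32_t actual_width = dec.get_bits(16);
uint32_t actual_height = dec.get_bits(16);
bool has_alpha = (dec.get_bits(1) != 0);
bool use_dct = (dec.get_bits(1) != 0);

float dct_q = 0.0f;
if (use_dct)
{
    dct_q = dec.get_bits(8) / 2.0f; // stored as dct_q * 2
    if ((dct_q < 1.0f) || (dct_q > 100.0f))
        fail();
}

// block_width/block_height are determined by astc_block_size_index (lookup not shown)
uint32_t num_blocks_x = (actual_width + block_width - 1) / block_width;
uint32_t num_blocks_y = (actual_height + block_height - 1) / block_height;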
The following sections reference these additional resources:
- XUASTC LDR Arithmetic/Range Decoding
- The astc_helpers namespace, directly based on the ASTC specification
- The log_astc_block structure, also directly based on the ASTC specification
- The Reuse XY Delta Table
The grouped_trial_modes class (or struct in the C++ transcoder source code) is used during XUASTC LDR configuration (or "trial mode") decoding. It is initialized once, for each block size, at library init time. This class organizes the XUASTC LDR configuration (or latent) space into a set of buckets in a multi-dimensional array, where each bucket contains an array of trial mode indices. This table is consulted when decoding trial mode indices.
// Trial Mode (XUASTC configuration) description structure
struct trial_mode
{
// Grid dimension
uint32_t m_grid_width;
uint32_t m_grid_height;
// ASTC CEM index
uint32_t m_cem;
// dual plane CCS index, or -1
int m_ccs_index;
// endpoint and weight ISE range
uint32_t m_endpoint_ise_range;
uint32_t m_weight_ise_range;
// number of partitions: [1,3]
uint32_t m_num_parts;
};
grouped_trial_modes g_grouped_encoder_trial_modes[astc_helpers::cTOTAL_BLOCK_SIZES];
const OTM_NUM_CEMS = 14; // 0-13 (13=highest valid LDR CEM)
const OTM_NUM_SUBSETS = 3; // 1-3
const OTM_NUM_CCS = 5; // -1 to 3
const OTM_NUM_GRID_SIZES = 2; // 0=small or 1=large (grid_w>=block_w-1 and grid_h>=block_h-1)
const OTM_NUM_GRID_ANISOS = 3; // 0=W=H, 1=W>H, 2=W<H
// Calculate weight grid anisotropy index from grid and block dimensions using integer math
uint32_t calc_grid_aniso_val(uint32_t gw, uint32_t gh, uint32_t bw, uint32_t bh)
{
assert((gw > 0) && (gh > 0));
assert((bw > 0) && (bh > 0));
assert((gw <= 12) && (gh <= 12) && (bw <= 12) && (bh <= 12));
assert((gw <= bw) && (gh <= bh));
// Compare gw/bw vs. gh/bh using integer math:
// gw*bh >= gh*bw -> X-dominant (1), else Y-dominant (2)
const uint32_t lhs = gw * bh;
const uint32_t rhs = gh * bw;
// Equal (isotropic), X=Y
if (lhs == rhs)
return 0;
// Anisotropic - 1=X, 2=Y
return (lhs >= rhs) ? 1 : 2;
}
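A few worked examples, using a 6x6 block size (the return values follow directly from the integer comparison above):
// calc_grid_aniso_val() worked examples for a 6x6 block:
assert(calc_grid_aniso_val(4, 4, 6, 6) == 0); // 4*6 == 4*6 -> isotropic
assert(calc_grid_aniso_val(6, 3, 6, 6) == 1); // 6*6 >  3*6 -> X-dominant
assert(calc_grid_aniso_val(3, 5, 6, 6) == 2); // 3*6 <  5*6 -> Y-dominant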
class grouped_trial_modes
{
public:
std::vector<uint32_t> m_tm_groups[OTM_NUM_CEMS][OTM_NUM_SUBSETS][OTM_NUM_CCS][OTM_NUM_GRID_SIZES][OTM_NUM_GRID_ANISOS]; // indices of encoder trial modes in each bucket
// Add a new XUASTC trial mode description to the correct bucket
void add(uint32_t block_width, uint32_t block_height,
const trial_mode& tm, uint32_t tm_index)
{
const uint32_t cem_index = tm.m_cem;
assert(cem_index < OTM_NUM_CEMS);
const uint32_t subset_index = tm.m_num_parts - 1;
assert(subset_index < OTM_NUM_SUBSETS);
const uint32_t ccs_index = tm.m_ccs_index + 1;
assert(ccs_index < OTM_NUM_CCS);
const uint32_t grid_size = (tm.m_grid_width >= (block_width - 1)) && (tm.m_grid_height >= (block_height - 1));
const uint32_t grid_aniso = calc_grid_aniso_val(tm.m_grid_width, tm.m_grid_height, block_width, block_height);
// Get the bucket reference
auto &v = m_tm_groups[cem_index][subset_index][ccs_index][grid_size][grid_aniso];
// Add the trial mode index to the bucket
v.push_back(tm_index);
}
};
To initialize this table: for each block size, iterate through all trial modes (defined in the XUASTC LDR specification - Appendix 2: Encoded XUASTC LDR Block Configuration Table), and add each one using grouped_trial_modes::add(). During iteration, reject any trial modes whose weight grid is larger than the current block size, as sketched below.
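A minimal sketch of this initialization loop (the per-index block dimension helpers get_block_width()/get_block_height() are illustrative; encoder_trial_modes is the Appendix 2 table referenced above):
// Sketch: build g_grouped_encoder_trial_modes[] once at library init time.
void init_grouped_trial_modes()
{
    for (uint32_t bs = 0; bs < astc_helpers::cTOTAL_BLOCK_SIZES; bs++)
    {
        const uint32_t block_width = get_block_width(bs);   // illustrative helper
        const uint32_t block_height = get_block_height(bs); // illustrative helper

        for (uint32_t tm_index = 0; tm_index < encoder_trial_modes.size(); tm_index++)
        {
            const trial_mode& tm = encoder_trial_modes[tm_index];

            // Reject trial modes whose weight grid doesn't fit within this block size.
            if ((tm.m_grid_width > block_width) || (tm.m_grid_height > block_height))
                continue;

            g_grouped_encoder_trial_modes[bs].add(block_width, block_height, tm, tm_index);
        }
    }
}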
The get_tm_candidates() function below, which is used by the decoder to determine trial mode indices, accesses this array.
// Shared constants/enums
const REUSE_XY_DELTA_BITS = 5;
const NUM_REUSE_XY_DELTAS = 1 << REUSE_XY_DELTA_BITS;
const PART_HASH_BITS = 6u;
const PART_HASH_SIZE = 1u << PART_HASH_BITS;
const TM_HASH_BITS = 7u;
const TM_HASH_SIZE = 1u << TM_HASH_BITS;
enum xuastc_mode
{
cMODE_SOLID = 0,
cMODE_RAW = 1,
// Full cfg, partition ID, and all endpoint value reuse.
cMODE_REUSE_CFG_ENDPOINTS_LEFT = 2,
cMODE_REUSE_CFG_ENDPOINTS_UP = 3,
cMODE_REUSE_CFG_ENDPOINTS_DIAG = 4,
cMODE_RUN = 5,
cMODE_TOTAL,
};
const ARITH_HEADER_MARKER_BITS = 5;
const ARITH_HEADER_MARKER = 0x01;
const FINAL_SYNC_MARKER_BITS = 8;
const FINAL_SYNC_MARKER = 0xAF;
const DCT_RUN_LEN_EOB_SYM_INDEX = 64;
const DCT_MEAN_LEVELS0 = 9;
const DCT_MEAN_LEVELS1 = 33;
const DCT_MAX_ARITH_COEFF_MAG = 255;
The following arithmetic models must be initialized before decoding.
// Declare models used during decompression
arith::arith_data_model mode_model((uint32_t)xuastc_mode::cMODE_TOTAL);
arith::arith_data_model solid_color_dpcm_model[4];
for (uint32_t i = 0; i < 4; i++)
solid_color_dpcm_model[i].init(256, true);
arith::arith_data_model raw_endpoint_models[astc_helpers::TOTAL_ENDPOINT_ISE_RANGES];
for (uint32_t i = 0; i < astc_helpers::TOTAL_ENDPOINT_ISE_RANGES; i++)
raw_endpoint_models[i].init(astc_helpers::get_ise_levels(astc_helpers::FIRST_VALID_ENDPOINT_ISE_RANGE + i));
arith::arith_data_model dpcm_endpoint_models[astc_helpers::TOTAL_ENDPOINT_ISE_RANGES];
for (uint32_t i = 0; i < astc_helpers::TOTAL_ENDPOINT_ISE_RANGES; i++)
dpcm_endpoint_models[i].init(astc_helpers::get_ise_levels(astc_helpers::FIRST_VALID_ENDPOINT_ISE_RANGE + i));
arith::arith_bit_model is_base_ofs_model;
arith::arith_bit_model use_dct_model[4];
arith::arith_bit_model use_dpcm_endpoints_model;
arith::arith_data_model cem_index_model[8];
for (uint32_t i = 0; i < 8; i++)
cem_index_model[i].init(OTM_NUM_CEMS);
arith::arith_data_model subset_index_model[OTM_NUM_SUBSETS];
for (uint32_t i = 0; i < OTM_NUM_SUBSETS; i++)
subset_index_model[i].init(OTM_NUM_SUBSETS);
arith::arith_data_model ccs_index_model[OTM_NUM_CCS];
for (uint32_t i = 0; i < OTM_NUM_CCS; i++)
ccs_index_model[i].init(OTM_NUM_CCS);
arith::arith_data_model grid_size_model[OTM_NUM_GRID_SIZES];
for (uint32_t i = 0; i < OTM_NUM_GRID_SIZES; i++)
grid_size_model[i].init(OTM_NUM_GRID_SIZES);
arith::arith_data_model grid_aniso_model[OTM_NUM_GRID_ANISOS];
for (uint32_t i = 0; i < OTM_NUM_GRID_ANISOS; i++)
grid_aniso_model[i].init(OTM_NUM_GRID_ANISOS);
arith::arith_data_model dct_run_len_model; // [0,63] or 64=EOB
arith::arith_data_model dct_coeff_mag; // [1,255] (blocks with larger mags go DPCM)
arith::arith_data_model weight_mean_models[2];
arith::arith_data_model raw_weight_models[astc_helpers::TOTAL_WEIGHT_ISE_RANGES];
// use_fast_decoding is true only in the hybrid profile (cHybridArithZstd). In full arithmetic profile (cFullArith) it is false.
if (!use_fast_decoding)
{
// Models used for weight decompression in pure arithmetic mode.
dct_run_len_model.init(65);
dct_coeff_mag.init(255);
weight_mean_models[0].init(DCT_MEAN_LEVELS0);
weight_mean_models[1].init(DCT_MEAN_LEVELS1);
for (uint32_t i = 0; i < astc_helpers::TOTAL_WEIGHT_ISE_RANGES; i++)
raw_weight_models[i].init(astc_helpers::get_ise_levels(astc_helpers::FIRST_VALID_WEIGHT_ISE_RANGE + i));
}
arith::arith_data_model submode_models[OTM_NUM_CEMS][OTM_NUM_SUBSETS][OTM_NUM_CCS][OTM_NUM_GRID_SIZES][OTM_NUM_GRID_ANISOS];
arith::arith_bit_model endpoints_use_bc_models[4];
arith::arith_data_model endpoint_reuse_delta_model(NUM_REUSE_XY_DELTAS); // reuse XY delta symbol, shared with ASTC HDR 6x6 and the XUASTC LDR Zstd profile
arith::arith_data_model config_reuse_model[4];
for (uint32_t i = 0; i < 4; i++)
config_reuse_model[i].init(4);
arith::arith_gamma_contexts m_run_len_contexts;
arith::arith_bit_model use_part_hash_model[4];
arith::arith_data_model part2_hash_index_model(PART_HASH_SIZE, true);
arith::arith_data_model part3_hash_index_model(PART_HASH_SIZE, true);
The following temporary state is used during decompression. num_blocks_x and num_blocks_y are computed from the System Header fields.
// The decoded ASTC logical blocks (only the most recent 8 block rows are kept, as a ring buffer)
vector2D<astc_helpers::log_astc_block> log_blocks;
log_blocks.resize(num_blocks_x, 8);
// Ensure all logical blocks are cleared
memset(log_blocks.get_ptr(), 0, log_blocks.size_in_bytes());
// Additional block state, used for prediction purposes
struct prev_block_state
{
bool m_was_solid_color;
bool m_used_weight_dct;
bool m_first_endpoint_uses_bc;
bool m_reused_full_cfg;
bool m_used_part_hash;
int m_tm_index; // -1=invalid (solid color block)
uint32_t m_base_cem_index; // doesn't include base+ofs
uint32_t m_subset_index, m_ccs_index, m_grid_size, m_grid_aniso;
prev_block_state()
{
clear();
}
void clear()
{
// The decoder explicitly sets m_tm_index = -1 for solid-color blocks.
memset(this, 0, sizeof(*this));
}
};
vector2D<prev_block_state> prev_block_states;
// We need 2 rows of per-block prediction state
prev_block_states.resize(num_blocks_x, 2);
// Current run length
uint32_t cur_run_len = 0;
// 2 and 3 subset partition hash tables
int part2_hash[PART_HASH_SIZE];
std::fill(part2_hash, part2_hash + PART_HASH_SIZE, -1);
int part3_hash[PART_HASH_SIZE];
std::fill(part3_hash, part3_hash + PART_HASH_SIZE, -1);
This section describes the top-level process for reconstructing a mipmap level into a 2D array of logical ASTC blocks, which can be immediately stored as physical ASTC blocks, or transcoded to another GPU texture format such as BC7. After the System Header is decoded and the working buffers and temporary state are initialized, blocks are processed in raster order (left to right, top to bottom) using small rolling histories of previously decoded blocks and predictor state (log_blocks and prev_block_states).
For each block, the decoder first checks for an active run and, if present, replicates the previous block and associated predictor state. Otherwise it decodes a per-block mode symbol from the arithmetic stream and dispatches to the appropriate decoding path: solid-color blocks, run commands, or non-solid blocks (raw or reuse-based). Each decoded block updates predictor state fields that influence subsequent context modeling, and decoding continues until all blocks are produced and the final stream sync marker is verified.
// Inputs from the previously decoded System Header:
block_width, block_height
actual_width, actual_height
has_alpha
use_dct (global flag), dct_q
num_blocks_x = (actual_width + block_width - 1) / block_width // integer division
num_blocks_y = (actual_height + block_height - 1) / block_height // integer division
// Previously defined output logical blocks, and temporary state:
// log_blocks, prev_block_states
cur_run_len = 0
for (by = 0; by < num_blocks_y; by++)
{
for (bx = 0; bx < num_blocks_x; bx++)
{
prev_block_state& new_state = prev_block_states(bx, by & 1)
new_state.clear()
left_state = (bx > 0) ? &prev_block_states(bx-1, by & 1) : null
upper_state = (by > 0) ? &prev_block_states(bx, (by-1) & 1) : null
diag_state = (bx > 0 && by > 0) ? &prev_block_states(bx-1, (by-1) & 1) : null
pred_state = left_state ? left_state : upper_state
log_astc_block& log_blk = log_blocks(bx, by & 7)
log_blk.clear()
// ------------------------
// Active run replication
// ------------------------
if (cur_run_len > 0)
{
// previous block is left if available, else upper (run is always valid here)
prev_log = (bx > 0) ? log_blocks(bx-1, by & 7)
: log_blocks(bx, (by-1) & 7)
log_blk = prev_log
emit_block(bx, by, log_blk) // callback if used
// Copy predictor metadata so future contexts match the stream's modeling
prev_state = (bx > 0) ? left_state : upper_state
new_state.copy_from(*prev_state)
cur_run_len--
continue
}
// ------------------------
// Decode per-block mode
// ------------------------
mode_index = dec.decode_sym(mode_model)
switch (mode_index)
{
case cMODE_SOLID:
decode_solid_block(dec, bx, by, log_blk, new_state, left/upper context...)
emit_block(bx, by, log_blk)
break
case cMODE_RUN:
// illegal for (0,0)
if (bx == 0 && by == 0) fail()
cur_run_len = dec.decode_gamma(m_run_len_contexts)
if (cur_run_len == 0) fail()
if (cur_run_len > (num_blocks_x - bx)) fail()
// replicate previous block into current block
prev_log = (bx > 0) ? log_blocks(bx-1, by & 7)
: log_blocks(bx, (by-1) & 7)
log_blk = prev_log
emit_block(bx, by, log_blk)
prev_state = (bx > 0) ? left_state : upper_state
new_state.copy_from(*prev_state)
cur_run_len-- // current block counts as first repeated block
break
case cMODE_REUSE_CFG_ENDPOINTS_LEFT:
case cMODE_REUSE_CFG_ENDPOINTS_UP:
case cMODE_REUSE_CFG_ENDPOINTS_DIAG:
case cMODE_RAW:
decode_non_solid_block(dec, bx, by, log_blk, new_state,
left_state, upper_state, diag_state, pred_state,
log_blocks ring buffer, global flags...)
emit_block(bx, by, log_blk)
break
default:
fail()
}
}
assert(cur_run_len == 0) // runs do not cross scanlines
}
// End-of-stream sync marker (see prior section)
final = dec.get_bits(FINAL_SYNC_MARKER_BITS)
if (final != FINAL_SYNC_MARKER) fail()
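The emit_block() callback in the loop above is where each decoded logical block leaves the decoder. One possible minimal implementation packs each block to its 16-byte physical ASTC form; the output buffer pOutput_blocks and the exact astc_helpers::pack_astc_block() signature are assumptions, and a BC7 or other transcode path would hook in here instead.
// Sketch: emit_block() callback writing physical ASTC blocks in raster order.
void emit_block(uint32_t bx, uint32_t by, const astc_helpers::log_astc_block& log_blk)
{
    astc_helpers::astc_block phys_blk;
    if (!astc_helpers::pack_astc_block(phys_blk, log_blk))
        fail();

    // pOutput_blocks is an illustrative num_blocks_x * num_blocks_y output array
    memcpy(&pOutput_blocks[bx + by * num_blocks_x], &phys_blk, sizeof(phys_blk));
}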
Solid Block Decoding handles the case where an ASTC block represents a single constant color. In this mode, the decoder predicts an 8-bit RGBA color from the immediately preceding block (preferring the left neighbor if available, otherwise the upper neighbor). If the previous block was also solid, its solid color is used directly as the predictor; otherwise, the midpoint of the previous block’s first color endpoint pair is used. The arithmetic decoder then reconstructs the block’s color by decoding per-channel DPCM deltas relative to this predictor, with values wrapping modulo 256. If the texture has no alpha channel, the alpha value is forced to 255. The resulting 8-bit components are expanded to 16-bit form and stored in the logical block, and the predictor state is updated to reflect that a solid-color block was decoded, biasing future context modeling accordingly.
// Note "local" functions inherit local variables from the block decompression loop above
local function decode_solid_block()
{
// Prefer left predictor, else upper, else none
pPrev_log_blk = null
if (bx != 0)
pPrev_log_blk = &log_blocks(bx - 1, by & 7)
else if (by != 0)
pPrev_log_blk = &log_blocks(bx, (by - 1) & 7)
prev_solid_color[4] = { 0, 0, 0, 0 }
if (pPrev_log_blk != null)
{
if (pPrev_log_blk->m_solid_color_flag_ldr)
{
prev_solid_color[0] = pPrev_log_blk->m_solid_color[0] >> 8
prev_solid_color[1] = pPrev_log_blk->m_solid_color[1] >> 8
prev_solid_color[2] = pPrev_log_blk->m_solid_color[2] >> 8
prev_solid_color[3] = pPrev_log_blk->m_solid_color[3] >> 8
}
else
{
// Decode previous block's first CEM to 8-bit RGBA colors following the ASTC standard, use the halfway point as the predictor.
prev_l, prev_h = astc_helpers::decode_endpoints(
pPrev_log_blk->m_color_endpoint_modes[0],
pPrev_log_blk->m_endpoints,
pPrev_log_blk->m_endpoint_ise_range)
prev_solid_color[0] = (prev_l[0] + prev_h[0] + 1) >> 1
prev_solid_color[1] = (prev_l[1] + prev_h[1] + 1) >> 1
prev_solid_color[2] = (prev_l[2] + prev_h[2] + 1) >> 1
prev_solid_color[3] = (prev_l[3] + prev_h[3] + 1) >> 1
}
}
// Decode DPCM symbols, apply with wrapping
r = (prev_solid_color[0] + dec.decode_sym(solid_color_dpcm_model[0])) & 0xFF
g = (prev_solid_color[1] + dec.decode_sym(solid_color_dpcm_model[1])) & 0xFF
b = (prev_solid_color[2] + dec.decode_sym(solid_color_dpcm_model[2])) & 0xFF
a = 255
if (has_alpha)
a = (prev_solid_color[3] + dec.decode_sym(solid_color_dpcm_model[3])) & 0xFF
log_blk.m_solid_color_flag_ldr = true
log_blk.m_solid_color[0] = (uint16_t)(r | (r << 8))
log_blk.m_solid_color[1] = (uint16_t)(g | (g << 8))
log_blk.m_solid_color[2] = (uint16_t)(b | (b << 8))
log_blk.m_solid_color[3] = (uint16_t)(a | (a << 8))
// Bias the statistics towards using DCT (most common case).
if (use_dct)
new_state.m_used_weight_dct = true
new_state.m_first_endpoint_uses_bc = true
new_state.m_was_solid_color = true
new_state.m_tm_index = -1
new_state.m_base_cem_index = astc_helpers::CEM_LDR_RGB_DIRECT
new_state.m_subset_index = 0
new_state.m_ccs_index = 0
new_state.m_grid_size = 0
new_state.m_grid_aniso = 0
new_state.m_reused_full_cfg = false
new_state.m_used_part_hash = true // bias to true
}
Non-Solid block decoding handles all ASTC blocks that are not encoded as a single constant color, including blocks that reuse configuration and/or endpoint data from neighboring blocks as well as fully specified blocks. In this mode, the decoder first determines whether the block can reuse a complete configuration (and possibly endpoints) from the left, upper, or upper-left neighbor, or whether it must decode a new configuration from the arithmetic stream. For newly decoded configurations, the decoder selects a trial mode using context-adaptive symbols, optionally applies base-plus-offset endpoint modes, and decodes a partition identifier if the block uses multiple partitions.
Once the block’s configuration is established, color endpoints are reconstructed either directly or via DPCM prediction from a nearby block using reuse deltas, with optional blue-contraction decisions decoded per partition. Finally, the block’s weight grid is decoded, using either DCT-based reconstruction or weight DPCM depending on per-block decisions and global settings. Throughout this process, the decoder updates predictor state to reflect the decoded configuration, endpoint usage, and weight coding choices, ensuring that subsequent blocks are decoded with matching context models.
local function decode_non_solid_block()
{
tm_index = 0
actual_cem = 0
// -------------------------------------------------------
// 1. Handle full cfg+partition+endpoint reuse from neighbor
// (the cMODE_REUSE_CFG_ENDPOINTS_* modes)
// -------------------------------------------------------
if (mode_index != cMODE_RAW)
{
// Select the reuse neighbor: left, upper, or upper-left (diag)
cfg_dx = 0
cfg_dy = 0
pCfg_state = null
switch (mode_index)
{
case cMODE_REUSE_CFG_ENDPOINTS_LEFT:
cfg_dx = -1
pCfg_state = pLeft_state
break
case cMODE_REUSE_CFG_ENDPOINTS_UP:
cfg_dx = 0
cfg_dy = -1
pCfg_state = pUpper_state
break
case cMODE_REUSE_CFG_ENDPOINTS_DIAG:
cfg_dx = -1
cfg_dy = -1
pCfg_state = pDiag_state
break
default:
fail()
}
if (((cfg_dx + bx) < 0) or ((cfg_dy + by) < 0) or (pCfg_state == null))
fail()
if (pCfg_state->m_tm_index < 0)
fail()
cfg_log_blk = log_blocks(bx + cfg_dx, (by + cfg_dy) & 7)
tm_index = pCfg_state->m_tm_index
actual_cem = cfg_log_blk.m_color_endpoint_modes[0]
// Copy full ASTC config fields from cfg_log_blk into log_blk
for i = 0 to cfg_log_blk.m_num_partitions - 1:
log_blk.m_color_endpoint_modes[i] = actual_cem
log_blk.m_dual_plane = cfg_log_blk.m_dual_plane
log_blk.m_color_component_selector = cfg_log_blk.m_color_component_selector
log_blk.m_num_partitions = cfg_log_blk.m_num_partitions
log_blk.m_partition_id = cfg_log_blk.m_partition_id
log_blk.m_endpoint_ise_range = cfg_log_blk.m_endpoint_ise_range
log_blk.m_weight_ise_range = cfg_log_blk.m_weight_ise_range
log_blk.m_grid_width = cfg_log_blk.m_grid_width
log_blk.m_grid_height = cfg_log_blk.m_grid_height
// Copy endpoint payload (all endpoint values)
total_endpoint_vals = astc_helpers::get_num_cem_values(actual_cem) * log_blk.m_num_partitions
memcpy(log_blk.m_endpoints, cfg_log_blk.m_endpoints, total_endpoint_vals)
// Copy predictor metadata from reused neighbor
new_state.m_tm_index = pCfg_state->m_tm_index
new_state.m_base_cem_index = pCfg_state->m_base_cem_index
new_state.m_subset_index = pCfg_state->m_subset_index
new_state.m_ccs_index = pCfg_state->m_ccs_index
new_state.m_grid_size = pCfg_state->m_grid_size
new_state.m_grid_aniso = pCfg_state->m_grid_aniso
new_state.m_used_part_hash = pCfg_state->m_used_part_hash
new_state.m_reused_full_cfg = true
// Recompute blue contraction usage if the CEM supports it
actual_cem_supports_bc = astc_helpers::cem_supports_bc(actual_cem)
if (actual_cem_supports_bc)
{
new_state.m_first_endpoint_uses_bc =
astc_helpers::used_blue_contraction(actual_cem, log_blk.m_endpoints, log_blk.m_endpoint_ise_range)
}
// Endpoints are already present (copied), weights still need decoding.
decode_weights_for_block() // see below
return
}
// -------------------------------------------------------
// 2. RAW mode: may reuse only config, or decode full config
// -------------------------------------------------------
// This matches the reused_full_cfg_model_index logic:
reused_full_cfg_model_index = 0
if (pLeft_state != null) reused_full_cfg_model_index = pLeft_state->m_reused_full_cfg
else reused_full_cfg_model_index = 1
if (pUpper_state != null)
reused_full_cfg_model_index |= (pUpper_state->m_reused_full_cfg ? 2 : 0)
else
reused_full_cfg_model_index |= 2
config_reuse_index = dec.decode_sym(config_reuse_model[reused_full_cfg_model_index])
if (config_reuse_index < cMaxConfigReuseNeighbors) // cMaxConfigReuseNeighbors = 3: symbols 0-2 select a neighbor, symbol 3 means decode a full config
{
// config reuse from neighbor (0=left, 1=upper, 2=diag)
cfg_dx = 0
cfg_dy = 0
pCfg_state = null
switch (config_reuse_index)
{
case 0: cfg_dx = -1; pCfg_state = pLeft_state; break
case 1: cfg_dx = 0; cfg_dy = -1; pCfg_state = pUpper_state; break
case 2: cfg_dx = -1; cfg_dy = -1; pCfg_state = pDiag_state; break
default: fail()
}
if (((cfg_dx + bx) < 0) or ((cfg_dy + by) < 0) or (pCfg_state == null))
fail()
if (pCfg_state->m_tm_index < 0)
fail()
cfg_log_blk = log_blocks(bx + cfg_dx, (by + cfg_dy) & 7)
tm_index = pCfg_state->m_tm_index
log_blk.m_partition_id = cfg_log_blk.m_partition_id
actual_cem = cfg_log_blk.m_color_endpoint_modes[0]
new_state.m_tm_index = pCfg_state->m_tm_index
new_state.m_base_cem_index = pCfg_state->m_base_cem_index
new_state.m_subset_index = pCfg_state->m_subset_index
new_state.m_ccs_index = pCfg_state->m_ccs_index
new_state.m_grid_size = pCfg_state->m_grid_size
new_state.m_grid_aniso = pCfg_state->m_grid_aniso
new_state.m_used_part_hash = pCfg_state->m_used_part_hash
new_state.m_reused_full_cfg = true
// IMPORTANT: In this path, only tm_index + partition_id are reused here.
// The remaining ASTC config fields are filled in from tm_index below.
}
else
{
// Decode full ASTC config (trial mode selection + optional partition decode)
decode_full_astc_config() // see below
// Must set: tm_index, actual_cem, log_blk.m_partition_id (if needed),
// and update new_state.* fields as in the C++.
}
// -----------------------------------------
// 3. Fill log_blk config fields from tm_index
// -----------------------------------------
tm = encoder_trial_modes[tm_index]
// actual_cem already computed (may be base+ofs adjusted in full config path)
// Set per-partition CEMs:
for part_iter = 0 to tm.m_num_parts - 1:
log_blk.m_color_endpoint_modes[part_iter] = actual_cem
log_blk.m_num_partitions = tm.m_num_parts
log_blk.m_dual_plane = (tm.m_ccs_index >= 0)
if (log_blk.m_dual_plane)
log_blk.m_color_component_selector = tm.m_ccs_index
log_blk.m_weight_ise_range = tm.m_weight_ise_range
log_blk.m_endpoint_ise_range = tm.m_endpoint_ise_range
log_blk.m_grid_width = tm.m_grid_width
log_blk.m_grid_height = tm.m_grid_height
// -----------------------------
// 4. Decode endpoints (raw/DPCM)
// -----------------------------
decode_endpoints_for_block() // (see below; sets log_blk.m_endpoints and new_state.m_first_endpoint_uses_bc)
// -----------------------------
// 5. Decode weights
// -----------------------------
decode_weights_for_block() // (see below; sets log_blk.m_weights and new_state.m_used_weight_dct)
}
decode_full_astc_config() decodes a complete ASTC block configuration from the arithmetic stream when the current block cannot reuse a prior configuration. It uses context-adaptive models derived from the predicted neighbor state (left preferred, otherwise upper) to decode grouped trial-mode descriptors (CEM class, subset count, dual-plane component selector, and grid shape categories), selects a specific trial mode from the resulting candidate set (optionally decoding a submode index), and records the chosen mode metadata into the per-block predictor state.
If the selected mode is a DIRECT endpoint mode, the function may promote it to the corresponding BASE+OFFSET mode by decoding the is_base_ofs bit. For multi-partition modes, it then decodes the partition pattern identifier, choosing between direct truncated-binary decoding or reuse via a small hash table based on a contexted use_part_hash flag, and writes the resulting partition seed into log_blk.m_partition_id. The function establishes tm_index, actual_cem, and (when needed) the partition id; the caller uses these results to populate the remaining logical block configuration fields before decoding endpoints and weights.
function uint32_t cem_to_ldrcem_index(uint32_t cem)
{
switch (cem)
{
case astc_helpers::CEM_LDR_LUM_DIRECT: return 0;
case astc_helpers::CEM_LDR_LUM_ALPHA_DIRECT: return 1;
case astc_helpers::CEM_LDR_RGB_BASE_SCALE: return 2;
case astc_helpers::CEM_LDR_RGB_DIRECT: return 3;
case astc_helpers::CEM_LDR_RGB_BASE_PLUS_OFFSET: return 4;
case astc_helpers::CEM_LDR_RGB_BASE_SCALE_PLUS_TWO_A: return 5;
case astc_helpers::CEM_LDR_RGBA_DIRECT: return 6;
case astc_helpers::CEM_LDR_RGBA_BASE_PLUS_OFFSET: return 7;
default:
assert(0);
break;
}
return 0;
}
function const uint_vec& get_tm_candidates(const grouped_trial_modes& grouped_enc_trial_modes,
uint32_t cem_index, uint32_t subset_index, uint32_t ccs_index, uint32_t grid_size, uint32_t grid_aniso)
{
assert(cem_index < OTM_NUM_CEMS);
assert(subset_index < OTM_NUM_SUBSETS);
assert(ccs_index < OTM_NUM_CCS);
assert(grid_size < OTM_NUM_GRID_SIZES);
assert(grid_aniso < OTM_NUM_GRID_ANISOS);
const uint_vec& modes = grouped_enc_trial_modes.m_tm_groups[cem_index][subset_index][ccs_index][grid_size][grid_aniso];
return modes;
}
local function decode_full_astc_config()
{
// ------------------------------------------
// 1. Decode grouped “mode descriptors” (OTM)
// This selects a candidate list of trial modes (TM’s),
// then an optional submode index selects the final tm_index.
// ------------------------------------------
prev_cem_index = astc_helpers::CEM_LDR_RGB_DIRECT
prev_subset_index = 0
prev_ccs_index = 0
prev_grid_size = 0
prev_grid_aniso = 0
if (pPred_state != null)
{
prev_cem_index = pPred_state->m_base_cem_index
prev_subset_index = pPred_state->m_subset_index
prev_ccs_index = pPred_state->m_ccs_index
prev_grid_size = pPred_state->m_grid_size
prev_grid_aniso = pPred_state->m_grid_aniso
}
ldrcem_index = cem_to_ldrcem_index(prev_cem_index)
cem_index = dec.decode_sym(cem_index_model[ldrcem_index])
subset_index = dec.decode_sym(subset_index_model[prev_subset_index])
ccs_index = dec.decode_sym(ccs_index_model[prev_ccs_index])
grid_size_index = dec.decode_sym(grid_size_model[prev_grid_size])
grid_aniso_index = dec.decode_sym(grid_aniso_model[prev_grid_aniso])
// Given a bucket ID, get the array of trial modes associated with it
modes = get_tm_candidates(
g_grouped_encoder_trial_modes[astc_block_size_index],
cem_index, subset_index, ccs_index, grid_size_index, grid_aniso_index)
submode_index = 0
if (modes.size() > 1)
{
submode_model = submode_models[cem_index][subset_index][ccs_index][grid_size_index][grid_aniso_index]
if (submode_model.get_num_data_syms() == 0)
submode_model.init(modes.size_u32(), true)
submode_index = dec.decode_sym(submode_model)
}
// Ensure the index isn't too high for this bucket
if (submode_index >= modes.size())
fail()
tm_index = modes[submode_index]
// Update predictor state (this is the “base config” for future contexts)
new_state.m_tm_index = tm_index
new_state.m_base_cem_index = cem_index
new_state.m_subset_index = subset_index
new_state.m_ccs_index = ccs_index
new_state.m_grid_size = grid_size_index
new_state.m_grid_aniso = grid_aniso_index
new_state.m_reused_full_cfg = false
// ------------------------------------------
// 2. Validate tm_index and determine actual_cem
// ------------------------------------------
if (tm_index >= encoder_trial_modes.size())
fail()
tm = encoder_trial_modes[tm_index]
actual_cem = tm.m_cem
// Optional base+offset promotion for DIRECT CEMs
if ((tm.m_cem == astc_helpers::CEM_LDR_RGB_DIRECT) or (tm.m_cem == astc_helpers::CEM_LDR_RGBA_DIRECT))
{
is_base_ofs = dec.decode_bit(is_base_ofs_model)
if (is_base_ofs)
{
if (actual_cem == astc_helpers::CEM_LDR_RGB_DIRECT)
actual_cem = astc_helpers::CEM_LDR_RGB_BASE_PLUS_OFFSET
else if (actual_cem == astc_helpers::CEM_LDR_RGBA_DIRECT)
actual_cem = astc_helpers::CEM_LDR_RGBA_BASE_PLUS_OFFSET
}
}
// ------------------------------------------
// 3. Decode partition id (if needed)
// ------------------------------------------
if (tm.m_num_parts > 1)
{
total_unique_indices = get_total_unique_patterns(astc_block_size_index, tm.m_num_parts)
// Context index based on whether left/upper used part hash
use_part_model_index = 0
if (pLeft_state != null) use_part_model_index = pLeft_state->m_used_part_hash
else use_part_model_index = 1
if (pUpper_state != null)
use_part_model_index |= (pUpper_state->m_used_part_hash ? 2 : 0)
else
use_part_model_index |= 2
pPart_hash = (tm.m_num_parts == 2) ? part2_hash : part3_hash
use_part_hash_flag = dec.decode_bit(use_part_hash_model[use_part_model_index])
unique_pat_index = 0
if (!use_part_hash_flag)
{
unique_pat_index = dec.decode_truncated_binary(total_unique_indices)
// Store into hash table
pPart_hash[part_hash_index(unique_pat_index)] = unique_pat_index
new_state.m_used_part_hash = false
}
else
{
hash_index = dec.decode_sym((tm.m_num_parts == 2) ? part2_hash_index_model : part3_hash_index_model)
unique_pat_index = pPart_hash[hash_index]
if ((int)unique_pat_index < 0)
fail()
new_state.m_used_part_hash = true
}
if (unique_pat_index >= get_total_unique_patterns(astc_block_size_index, tm.m_num_parts))
fail()
log_blk.m_partition_id =
unique_pat_index_to_part_seed(astc_block_size_index, tm.m_num_parts, unique_pat_index)
}
else
{
// bias to true
new_state.m_used_part_hash = true
}
// NOTE: decode_full_astc_config() only establishes:
// - tm_index
// - actual_cem
// - log_blk.m_partition_id (if tm.m_num_parts > 1)
// - new_state fields shown above
//
// The caller fills the remaining log_blk ASTC config fields from tm_index,
// then decodes endpoints and weights.
}
This function reconstructs the color endpoint values for the current logical ASTC block after its configuration has been established (trial mode, endpoint ISE range, partition count, and actual CEM). It first decodes a flag selecting either direct endpoint coding or endpoint DPCM. In the direct path, it decodes each endpoint value for each partition from the appropriate arithmetic model for the current endpoint ISE range. In the DPCM path, it selects a predictor block using a decoded reuse-delta index, verifies the predictor is in-bounds and not a solid-color block, and (when the active CEM supports it) decodes per-partition blue-contraction usage using context derived from neighboring blocks.
The predictor’s endpoints are then converted across CEM domains using convert_endpoints_across_cems(), and per-endpoint deltas are decoded and applied in the rank domain with modular wrapping before converting back to ISE values via the dequantization tables. Finally, when supported, the function updates the predictor state to record whether the decoded endpoints use blue contraction, which is used to condition subsequent context modeling.
local function decode_endpoints_for_block()
{
// Preconditions (established by caller):
// - tm_index is valid
// - tm = encoder_trial_modes[tm_index]
// - actual_cem has been determined (including optional base+ofs promotion)
// - log_blk.m_num_partitions, log_blk.m_endpoint_ise_range, and log_blk.m_color_endpoint_modes[]
// have been set by the caller (or cfg+endpoint reuse path)
//
// Postconditions:
// - log_blk.m_endpoints[] is filled in
// - if astc_helpers::cem_supports_bc(actual_cem), then new_state.m_first_endpoint_uses_bc is updated
tm = encoder_trial_modes[tm_index]
actual_cem_supports_bc = astc_helpers::cem_supports_bc(actual_cem)
total_endpoint_vals = astc_helpers::get_num_cem_values(actual_cem)
used_dpcm_endpoints_flag = dec.decode_bit(use_dpcm_endpoints_model)
if (!used_dpcm_endpoints_flag)
{
// Raw endpoint (no DPCM)
raw_model = raw_endpoint_models[log_blk.m_endpoint_ise_range - astc_helpers::FIRST_VALID_ENDPOINT_ISE_RANGE]
for (part_iter = 0; part_iter < tm.m_num_parts; part_iter++)
{
for (val_iter = 0; val_iter < total_endpoint_vals; val_iter++)
{
log_blk.m_endpoints[part_iter * total_endpoint_vals + val_iter] =
(uint8_t)dec.decode_sym(raw_model)
}
}
}
else
{
// Endpoint DPCM
num_endpoint_levels = astc_helpers::get_ise_levels(log_blk.m_endpoint_ise_range)
endpoint_rank_to_ise = astc_helpers::g_dequant_tables.get_endpoint_tab(log_blk.m_endpoint_ise_range).m_rank_to_ISE
endpoint_ise_to_rank = astc_helpers::g_dequant_tables.get_endpoint_tab(log_blk.m_endpoint_ise_range).m_ISE_to_rank
reuse_delta_index = dec.decode_sym(endpoint_reuse_delta_model)
reuse_bx = bx + basist::astc_6x6_hdr::g_reuse_xy_deltas[reuse_delta_index].m_x
reuse_by = by + basist::astc_6x6_hdr::g_reuse_xy_deltas[reuse_delta_index].m_y
if ((reuse_bx < 0) or (reuse_by < 0) or (reuse_bx >= num_blocks_x) or (reuse_by >= num_blocks_y))
fail()
pEndpoint_pred_log_blk = &log_blocks(reuse_bx, reuse_by & 7)
if (pEndpoint_pred_log_blk->m_solid_color_flag_ldr)
fail()
// Determine context index for endpoints_use_bc_models[]
bc_model_index = 0
if (pLeft_state != null) bc_model_index = pLeft_state->m_first_endpoint_uses_bc
else bc_model_index = 1
if (pUpper_state != null)
bc_model_index |= (pUpper_state->m_first_endpoint_uses_bc ? 2 : 0)
else
bc_model_index |= 2
endpoints_use_bc[astc_helpers::MAX_PARTITIONS] = { false }
if (actual_cem_supports_bc)
{
for (part_iter = 0; part_iter < log_blk.m_num_partitions; part_iter++)
{
endpoints_use_bc[part_iter] = dec.decode_bit(endpoints_use_bc_models[bc_model_index])
}
}
predicted_endpoints[astc_helpers::MAX_PARTITIONS][astc_helpers::MAX_CEM_ENDPOINT_VALS] = { 0 }
for (part_iter = 0; part_iter < log_blk.m_num_partitions; part_iter++)
{
always_repack_flag = false
blue_contraction_clamped_flag = false
base_ofs_clamped_flag = false
// Convert predictor endpoints across CEM domains (mini-CEM encoder).
conv_status = convert_endpoints_across_cems(
pEndpoint_pred_log_blk->m_color_endpoint_modes[0],
pEndpoint_pred_log_blk->m_endpoint_ise_range,
pEndpoint_pred_log_blk->m_endpoints,
log_blk.m_color_endpoint_modes[0],
log_blk.m_endpoint_ise_range,
predicted_endpoints[part_iter],
always_repack_flag,
endpoints_use_bc[part_iter],
false, // is_hdr
blue_contraction_clamped_flag,
base_ofs_clamped_flag)
if (!conv_status)
fail()
}
// Apply decoded DPCM values to CEM/BISE encoded endpoint values
dpcm_model = dpcm_endpoint_models[log_blk.m_endpoint_ise_range - astc_helpers::FIRST_VALID_ENDPOINT_ISE_RANGE]
for (part_iter = 0; part_iter < tm.m_num_parts; part_iter++)
{
for (val_iter = 0; val_iter < total_endpoint_vals; val_iter++)
{
endpoint_idx = part_iter * total_endpoint_vals + val_iter
delta = (uint8_t)dec.decode_sym(dpcm_model)
e_val =
imod(delta + endpoint_ise_to_rank[predicted_endpoints[part_iter][val_iter]], num_endpoint_levels)
log_blk.m_endpoints[endpoint_idx] = endpoint_rank_to_ise[e_val]
}
}
}
if (actual_cem_supports_bc)
{
new_state.m_first_endpoint_uses_bc =
astc_helpers::used_blue_contraction(actual_cem, log_blk.m_endpoints, log_blk.m_endpoint_ise_range)
}
}
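The imod() helper used above (and again in the weight DPCM paths below) denotes modular reduction. In this document it is only ever applied to non-negative operands, where it reduces to the plain % operator; a suitable definition (the actual helper in the transcoder source may differ) is:
// Sketch: imod() - modular reduction that also stays correct for negative inputs.
static inline int imod(int a, int n)
{
    assert(n > 0);
    const int r = a % n;
    return (r < 0) ? (r + n) : r;
}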
decode_weights_for_block() reconstructs the block’s weight grid after the block configuration and endpoints are known. It determines the number of weight planes (one for single-plane blocks, two for dual-plane blocks) and the total number of weights from the block’s grid dimensions, then, if the global use_dct flag is set, decodes a per-block block_used_dct decision using context derived from neighboring blocks. If DCT is selected, the decoder reads a DC symbol using the appropriate mean model (chosen by the number of DC levels for the current weight ISE range), then decodes a sequence of run-length/coefficient pairs in zigzag order until an end-of-block symbol is reached; these symbols are passed to grid_dct.decode_block_weights() to perform the IDCT reconstruction of the weight grid for each plane.
If DCT is not used, the decoder performs weight DPCM: it decodes per-weight residuals from the arithmetic model for the current weight ISE range, accumulates them modulo the number of weight levels, and maps the resulting ranks back to ISE values using the ASTC dequantization tables. The function stores the final weights into log_blk.m_weights and updates predictor state to reflect whether DCT was used, ensuring consistent context modeling for subsequent blocks.
local function decode_weights_for_block()
{
// Preconditions (established by caller):
// - tm_index is valid
// - tm = encoder_trial_modes[tm_index]
// - log_blk.m_grid_width, log_blk.m_grid_height
// - log_blk.m_weight_ise_range
// - log_blk.m_dual_plane (and selector if dual plane)
// - use_fast_decoding is true only in the hybrid profile (cHybridArithZstd). In full arithmetic profile (cFullArith) it is false.
//
// Postconditions:
// - log_blk.m_weights[] is filled in (all planes)
// - new_state.m_used_weight_dct is updated (if global use_dct is enabled)
tm = encoder_trial_modes[tm_index]
total_planes = (tm.m_ccs_index >= 0) ? 2 : 1
total_weights = tm.m_grid_width * tm.m_grid_height
// -----------------------------
// Decide if this block uses DCT
// -----------------------------
use_dct_model_index = 0
if (use_dct)
{
if (pLeft_state != null)
use_dct_model_index = pLeft_state->m_used_weight_dct
else
use_dct_model_index = 1
if (pUpper_state != null)
use_dct_model_index |= (pUpper_state->m_used_weight_dct ? 2 : 0)
else
use_dct_model_index |= 2
}
block_used_dct = false
if (use_dct) // mipmap global flag read from system header
block_used_dct = dec.decode_bit(use_dct_model[use_dct_model_index])
// ============================================================
// Hybrid profile (use_fast_decoding == true)
// ============================================================
if (use_fast_decoding)
{
// Hybrid profile note:
// The mean*, run, coeff, sign, and weight*_bits/bytes sections are each decoded
// as independent, monotonically advancing bit/byte streams with a single global
// read cursor per section. These cursors are NOT reset per block or per plane.
// Symbols are consumed strictly in raster block order, then by plane, then by
// weight (or zigzag position for DCT), exactly matching the nested decoding loops.
if (block_used_dct)
{
new_state.m_used_weight_dct = true
num_dc_levels = grid_weight_dct::get_num_weight_dc_levels(
log_blk.m_weight_ise_range)
syms.m_num_dc_levels = num_dc_levels
for (plane_iter = 0; plane_iter < total_planes; plane_iter++)
{
syms.m_coeffs.resize(0)
// DC symbol from external sections:
// - 33-level DC: 8-bit from mean1_bytes
// - 9-level DC: 4-bit from mean0_bits
if (num_dc_levels == DCT_MEAN_LEVELS1)
syms.m_dc_sym = mean1_bytes.get_bits8()
else
syms.m_dc_sym = mean0_bits.get_bits4()
cur_zig_ofs = 1
while (cur_zig_ofs < total_weights)
{
run_len = run_bytes.get_bits8()
if (run_len == DCT_RUN_LEN_EOB_SYM_INDEX)
break
cur_zig_ofs += run_len
if (cur_zig_ofs >= total_weights)
fail()
sign = sign_bits.get_bits1()
coeff = coeff_bytes.get_bits8() + 1
if (sign)
coeff = -coeff
syms.m_coeffs.push_back(
dct_syms::coeff(run_len, coeff))
cur_zig_ofs++
}
// weight grid IDCT
if (!grid_dct.decode_block_weights(
dct_q, plane_iter, log_blk, &syms))
fail()
}
}
else
{
// Weight grid DPCM (hybrid)
num_weight_levels =
astc_helpers::get_ise_levels(log_blk.m_weight_ise_range)
weight_rank_to_ise =
astc_helpers::g_dequant_tables
.get_weight_tab(log_blk.m_weight_ise_range)
.m_rank_to_ISE
for (plane_iter = 0; plane_iter < total_planes; plane_iter++)
{
prev_w = num_weight_levels / 2
for (weight_iter = 0; weight_iter < total_weights; weight_iter++)
{
if (num_weight_levels <= 4)
r = weight2_bits.get_bits2()
else if (num_weight_levels <= 8)
r = weight3_bits.get_bits4()
else if (num_weight_levels <= 16)
r = weight4_bits.get_bits4()
else
r = weight8_bytes.get_bits8()
w = imod(prev_w + r, num_weight_levels)
prev_w = w
log_blk.m_weights[
plane_iter + weight_iter * total_planes] =
(uint8_t)weight_rank_to_ise[w]
}
}
}
return
}
// ============================================================
// Pure arithmetic profile (use_fast_decoding == false)
// ============================================================
if (block_used_dct)
{
new_state.m_used_weight_dct = true
num_dc_levels =
grid_weight_dct::get_num_weight_dc_levels(
log_blk.m_weight_ise_range)
syms.m_num_dc_levels = num_dc_levels
for (plane_iter = 0; plane_iter < total_planes; plane_iter++)
{
syms.m_coeffs.resize(0)
syms.m_dc_sym =
dec.decode_sym(
weight_mean_models[
(num_dc_levels == DCT_MEAN_LEVELS1) ? 1 : 0])
cur_zig_ofs = 1
while (cur_zig_ofs < total_weights)
{
run_len = dec.decode_sym(dct_run_len_model)
if (run_len == DCT_RUN_LEN_EOB_SYM_INDEX)
break
cur_zig_ofs += run_len
if (cur_zig_ofs >= total_weights)
fail()
sign = dec.get_bit()
coeff = dec.decode_sym(dct_coeff_mag) + 1
if (sign)
coeff = -coeff
syms.m_coeffs.push_back(
dct_syms::coeff(run_len, coeff))
cur_zig_ofs++
}
// weight grid IDCT
if (!grid_dct.decode_block_weights(
dct_q, plane_iter, log_blk, &syms))
fail()
}
}
else
{
// Weight grid DPCM (pure arithmetic)
num_weight_levels =
astc_helpers::get_ise_levels(log_blk.m_weight_ise_range)
weight_rank_to_ise =
astc_helpers::g_dequant_tables
.get_weight_tab(log_blk.m_weight_ise_range)
.m_rank_to_ISE
for (plane_iter = 0; plane_iter < total_planes; plane_iter++)
{
prev_w = num_weight_levels / 2
for (weight_iter = 0; weight_iter < total_weights; weight_iter++)
{
r = dec.decode_sym(
raw_weight_models[
log_blk.m_weight_ise_range -
astc_helpers::FIRST_VALID_WEIGHT_ISE_RANGE])
w = imod(prev_w + r, num_weight_levels)
prev_w = w
log_blk.m_weights[
plane_iter + weight_iter * total_planes] =
(uint8_t)weight_rank_to_ise[w]
}
}
}
}