[Bug]: pgai-vectorizer-worker 0.9.2 crashes on datetime JSON, retries indefinitely #783

@Ahmad-mtos

Stack

  • DB image: timescale/timescaledb-ha:pg17-ts2.18-all (self-hosted, Docker)
  • Worker image: timescale/pgai-vectorizer-worker:latest (pulled 2025-04-12)
  • OpenAI model: text-embedding-3-large
  • A second environment on Timescale Cloud (same schema & migration) doesn't show the same error.

docs table

 developer_id:uuid
 doc_id:uuid
 title:text
 content:text
 index:integer
 modality:text
 embedding_model:text
 embedding_dimensions:integer
 language:text
 created_at:timestamptz
 updated_at:timestamptz
 metadata:jsonb
 search_tsv:tsvector
 content_hash:text

Vectorizer migration

SELECT ai.create_vectorizer (
    source        => 'docs',
    destination   => 'docs_embeddings',
    embedding     => ai.embedding_openai('text-embedding-3-large', 1024, 'document'),
    chunking      => ai.chunking_recursive_character_text_splitter(…),
    scheduling    => ai.scheduling_timescaledb(),
    indexing      => ai.indexing_diskann(),
    formatting    => ai.formatting_python_template(E'Title: $title\n\n$chunk'),
    processing    => ai.processing_default(),
    enqueue_existing => TRUE
);

What we see

TypeError: Object of type datetime is not JSON serializable
  • Logged ~472k times in ai.vectorizer_errors; the worker restarts roughly every 5 s.
  • Accumulated ≈ 18 billion input tokens on text-embedding-3-large (~US $2.3k).
  • The issue began shortly after the 2025-04-12 image pull.
  • The Cloud deployment (managed worker) embeds the same data without errors.
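
For readers unfamiliar with this error class, here is a minimal standalone reproduction in plain Python. It is only a sketch: the row dictionary and the encode_extra helper are hypothetical and are not taken from the worker's source. It shows that json.dumps rejects datetime values by default and that a default= fallback avoids the exception.

```python
import json
from datetime import datetime, timezone

# Hypothetical row loosely shaped like the docs table; not the worker's code.
row = {
    "title": "Example",
    "created_at": datetime(2025, 4, 12, 9, 30, tzinfo=timezone.utc),
}


def try_dump(payload):
    """Return the JSON string, or the error text if serialization fails."""
    try:
        return json.dumps(payload)
    except TypeError as exc:
        return f"TypeError: {exc}"


def encode_extra(value):
    """Fallback for json.dumps: render datetimes as ISO 8601 strings."""
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


failure = try_dump(row)                        # default encoder rejects datetime
fixed = json.dumps(row, default=encode_extra)  # fallback encoder handles it

print(failure)
print(fixed)
```

Whether the worker hits this while serializing source rows, an error payload, or something else is not established in this report; the snippet only illustrates the exception itself.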

Update (2025-06-03) — follow-up from Discord

  • Discussed this incident with @JamesGuthrie on the pgAI Discord.
  • He confirmed that the underlying cause of our crash has already been removed in pgAI vectorizer worker ≥ v0.10.0 (see PR #567, "feat!: truncate inputs to OpenAI"). Timescale Cloud runs this version, which is why we don’t observe the issue there.
  • A separate improvement to the worker’s retry logic is still being worked on.
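
To illustrate the kind of retry behaviour that would have limited the damage here, the following is a hypothetical sketch, not the worker's implementation or the planned change. It assumes a wrapper named run_with_retries (an invented name) that caps attempts, backs off exponentially, and refuses to retry a deterministic failure such as TypeError.

```python
import time


class PermanentError(Exception):
    """Hypothetical marker for failures that retrying cannot fix."""


def run_with_retries(task, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run task(), retrying transient failures a bounded number of times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except TypeError as exc:
            # Deterministic: the same input fails the same way, so stop now.
            raise PermanentError(str(exc)) from exc
        except Exception:
            if attempt == max_attempts:
                raise  # give up instead of looping forever
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))


def broken_task():
    """Stand-in for a job that always fails on serialization."""
    raise TypeError("Object of type datetime is not JSON serializable")


try:
    run_with_retries(broken_task, sleep=lambda seconds: None)
except PermanentError as exc:
    print(f"not retried: {exc}")
```

The design point is the split between error classes: a serialization failure costs one attempt and is surfaced, while transient failures are retried with growing delays up to a fixed limit.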

Questions

  1. Upgrade path: Is it considered safe to migrate a self-hosted deployment from timescale/pgai-vectorizer-worker:0.9.2 straight to the latest 0.10.x by following the official migration guide?
  2. Patch option: If an immediate major upgrade isn’t feasible, is a 0.9.x patch (e.g. 0.9.2-p1) planned that back-ports the fix preventing this crash?

Thanks for any guidance. It will help us schedule downtime or decide on a stop-gap solution.
