3.7.0
# 🚀 Transformers.js v3.7 – Voxtral, LFM2, ModernBERT Decoder
## 🤗 New models
This update adds support for 3 new architectures:
### Voxtral
Voxtral Mini is an enhancement of Ministral 3B, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation and audio understanding. ONNX weights for Voxtral-Mini-3B-2507 can be found here. Learn more about Voxtral in the release blog post.
Try it out with our online demo:
Voxtral.WebGPU.demo.mp4
Example: Audio transcription
```js
import { VoxtralForConditionalGeneration, VoxtralProcessor, TextStreamer, read_audio } from "@huggingface/transformers";

// Load the processor and model
const model_id = "onnx-community/Voxtral-Mini-3B-2507-ONNX";
const processor = await VoxtralProcessor.from_pretrained(model_id);
const model = await VoxtralForConditionalGeneration.from_pretrained(
  model_id,
  {
    dtype: {
      embed_tokens: "fp16", // "fp32", "fp16", "q8", "q4"
      audio_encoder: "q4", // "fp32", "fp16", "q8", "q4", "q4f16"
      decoder_model_merged: "q4", // "q4", "q4f16"
    },
    device: "webgpu",
  },
);

// Prepare the conversation
const conversation = [
  {
    "role": "user",
    "content": [
      { "type": "audio" },
      { "type": "text", "text": "lang:en [TRANSCRIBE]" },
    ],
  },
];
const text = processor.apply_chat_template(conversation, { tokenize: false });
const audio = await read_audio("https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/mlk.wav", 16000);
const inputs = await processor(text, audio);

// Generate the response
const generated_ids = await model.generate({
  ...inputs,
  max_new_tokens: 256,
  streamer: new TextStreamer(processor.tokenizer, { skip_special_tokens: true, skip_prompt: true }),
});

// Decode the generated tokens
const new_tokens = generated_ids.slice(null, [inputs.input_ids.dims.at(-1), null]);
const generated_texts = processor.batch_decode(
  new_tokens,
  { skip_special_tokens: true },
);
console.log(generated_texts[0]);
// I have a dream that one day this nation will rise up and live out the true meaning of its creed.
```
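The same setup also works for audio understanding: instead of the `lang:en [TRANSCRIBE]` prompt, pass a free-form question about the clip. A minimal sketch, reusing the `processor`, `model`, and `audio` loaded above (the question text is only an illustration):

```js
// Ask a question about the audio instead of transcribing it
// (reuses the processor, model, and audio from the example above)
const qa_conversation = [
  {
    "role": "user",
    "content": [
      { "type": "audio" },
      { "type": "text", "text": "What is this speech about, in one sentence?" },
    ],
  },
];
const qa_text = processor.apply_chat_template(qa_conversation, { tokenize: false });
const qa_inputs = await processor(qa_text, audio);
const qa_ids = await model.generate({ ...qa_inputs, max_new_tokens: 128 });
console.log(processor.batch_decode(
  qa_ids.slice(null, [qa_inputs.input_ids.dims.at(-1), null]),
  { skip_special_tokens: true },
)[0]);
```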
### LFM2
LFM2 is a new generation of hybrid models developed by Liquid AI, specifically designed for edge AI and on-device deployment. It sets a new standard in terms of quality, speed, and memory efficiency.
The models, which we have converted to ONNX, come in three different sizes: 350M, 700M, and 1.2B parameters.
Example: Text-generation with LFM2-350M:
```js
import { pipeline, TextStreamer } from "@huggingface/transformers";

// Create a text generation pipeline
const generator = await pipeline(
  "text-generation",
  "onnx-community/LFM2-350M-ONNX",
  { dtype: "q4" },
);

// Define the list of messages
const messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "What is the capital of France?" },
];

// Generate a response
const output = await generator(messages, {
  max_new_tokens: 512,
  do_sample: false,
  streamer: new TextStreamer(generator.tokenizer, { skip_prompt: true, skip_special_tokens: true }),
});
console.log(output[0].generated_text.at(-1).content);
// The capital of France is Paris. It is a vibrant city known for its historical landmarks, art, fashion, and gastronomy.
```
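The larger variants can be used the same way by swapping the model id. A sketch, assuming the ONNX conversions follow the same naming pattern under the onnx-community organization:

```js
// Same pipeline, larger checkpoint (repo names assumed to follow the 350M naming pattern)
const generator_700m = await pipeline(
  "text-generation",
  "onnx-community/LFM2-700M-ONNX", // or "onnx-community/LFM2-1.2B-ONNX"
  { dtype: "q4" },
);
```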
### ModernBERT Decoder
These models form part of the Ettin suite: the first collection of paired encoder-only and decoder-only models trained with identical data, architecture, and training recipes. Ettin enables fair comparisons between encoder and decoder architectures across multiple scales, providing state-of-the-art performance for open-data models in their respective size categories.
The list of supported models can be found here.
```js
import { pipeline, TextStreamer } from "@huggingface/transformers";

// Create a text generation pipeline
const generator = await pipeline(
  "text-generation",
  "onnx-community/ettin-decoder-150m-ONNX",
  { dtype: "fp32" },
);

// Generate a response
const text = "Q: What is the capital of France?\nA:";
const output = await generator(text, {
  max_new_tokens: 128,
  streamer: new TextStreamer(generator.tokenizer, { skip_prompt: true, skip_special_tokens: true }),
});
console.log(output[0].generated_text);
```
Added in #1371.
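Since every Ettin decoder has a paired encoder trained on the same data, the comparison the suite was built for can also be run in Transformers.js (ModernBERT encoders were already supported). A minimal sketch, assuming ONNX weights for the paired encoder are published under an analogous repo name (onnx-community/ettin-encoder-150m-ONNX, an assumption):

```js
import { pipeline } from "@huggingface/transformers";

// Paired encoder-only checkpoint (repo name assumed to mirror the decoder's)
const extractor = await pipeline(
  "feature-extraction",
  "onnx-community/ettin-encoder-150m-ONNX",
  { dtype: "fp32" },
);
const embeddings = await extractor("What is the capital of France?", {
  pooling: "mean",
  normalize: true,
});
console.log(embeddings.dims); // e.g. [1, 768]
```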
## 🛠️ Other improvements
- Add special tokens in the text-generation pipeline when the tokenizer requires them, in #1370
Full Changelog: 3.6.3...3.7.0