A video-centric learning platform powered by cutting-edge AI.
- Auto captioning — Automatically generate accurate captions to improve comprehension and accessibility.
- Synthesized audio (multiple accents) — Choose voice and accent for playback to personalize the listening experience.
- Visual question answering — Ask questions about slides or frames and receive concise, context-aware answers.
- Summarization — Produce short, structured summaries and highlights of video content.
- Content extraction — Extract text, math equations, timestamps, and other structured data from video.
- Voice mode — Interact using voice input and receive spoken responses for hands-free use.
- Export — Export subtitles, transcripts, conversations, and annotations in common formats.
- Subtitle correction — Edit and refine generated subtitles through an intuitive UI.
- And more...
To get your development environment up and running, follow these simple steps:
You need a local Ollama server running on the default port, 11434. The LLM components use ministral-3:8b for reasoning and nomic-embed-text:latest for text embeddings, so pull both models into your local environment beforehand.
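As a quick sanity check for this prerequisite, the following is a minimal sketch (not part of the project) that asks the local Ollama server which models it has and reports any that still need pulling. It assumes Ollama's `/api/tags` endpoint on the default port; the helper names `missing_models`, `installed_models`, and `main` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default Ollama port named in the prerequisites
REQUIRED_MODELS = ["ministral-3:8b", "nomic-embed-text:latest"]


def missing_models(installed, required=REQUIRED_MODELS):
    """Return the required model names that are absent from `installed`."""
    installed = set(installed)
    return [name for name in required if name not in installed]


def installed_models(base_url=OLLAMA_URL, timeout=5):
    """Ask the Ollama server which models are available locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        payload = json.load(resp)
    return [model["name"] for model in payload.get("models", [])]


def main():
    """Print what, if anything, is still missing (hypothetical helper)."""
    try:
        missing = missing_models(installed_models())
    except OSError as exc:  # urllib's URLError is a subclass of OSError
        print(f"Ollama is not reachable at {OLLAMA_URL}: {exc}")
        return
    for name in missing:
        print(f"Missing model, run: ollama pull {name}")
    if not missing:
        print("Ollama is running and both models are available.")
```

Calling `main()` from a Python shell prints either the `ollama pull` commands still needed or a confirmation that both models are present.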
Also make sure you can reach Hugging Face, or have the Kokoro TTS and Whisper models cached locally. These models are used for features such as auto captioning, voice input, spoken responses, and synthesized audio tracks.
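If you plan to work offline, it may help to see what is already in your Hugging Face cache. The sketch below lists cached model repositories using only the standard library. It assumes the usual hub cache layout (folders named `models--<org>--<name>` under `~/.cache/huggingface/hub`, overridable with `HF_HOME` or `HF_HUB_CACHE`) and ignores other overrides such as `XDG_CACHE_HOME`. The function names are hypothetical, and the exact Kokoro and Whisper repository names the project downloads are not stated here, so the sketch only lists what it finds.

```python
import os
from pathlib import Path


def default_hub_cache():
    """Resolve the hub cache directory (simplified: HF_HUB_CACHE, then HF_HOME)."""
    if os.environ.get("HF_HUB_CACHE"):
        return Path(os.environ["HF_HUB_CACHE"])
    hf_home = os.environ.get("HF_HOME", str(Path.home() / ".cache" / "huggingface"))
    return Path(hf_home) / "hub"


def cached_model_ids(cache_dir):
    """List repo ids for cached model folders named `models--<org>--<name>`."""
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return []
    prefix = "models--"
    ids = []
    for entry in sorted(cache_dir.iterdir()):
        if entry.is_dir() and entry.name.startswith(prefix):
            # Folder names encode "/" in the repo id as "--".
            ids.append(entry.name[len(prefix):].replace("--", "/"))
    return ids
```

Running `print(cached_model_ids(default_hub_cache()))` and looking for Kokoro and Whisper entries gives a rough indication of whether the models are already available locally.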
Run `make backend`, then `make frontend`.

- Browse the folder: demos
- Here's an example of the VQA demo:

© 2026 Kinsight Labs. All rights reserved.