A web application that transcribes speech from uploaded audio files or microphone recordings with OpenAI's Whisper and translates the transcript into seven languages.
- Overview
- Application Interface
- Features
- Technologies
- Installation
- Usage
- Supported Languages
- How It Works
- Links
## Overview

This application provides an intuitive interface for audio transcription and translation. Upload an audio file or record audio, select a target language, and get both the transcribed text and its translation in a single step. The app uses a modern dark theme and a responsive layout that works on both desktop and mobile devices.
## Application Interface

Application interface in laptop mode, showing the full desktop layout.

Mobile interface, showing the responsive design for smaller screens.
## Features

- Audio Transcription: Convert speech to text using OpenAI's Whisper model
- Multi-language Translation: Translate transcribed text into seven languages
- Real-time Processing: Transcription and translation start automatically once audio is provided
- Multiple Input Methods: Upload audio files or record directly through the interface
- Responsive Design: Layout adapts to desktop and mobile screens
- Dark Theme Interface: Modern, eye-friendly design
- Format Support: WAV, MP3, M4A, and FLAC audio files
## Technologies

- OpenAI Whisper: Open-source speech recognition model; the app uses the "base" size
- Gradio: Web interface framework for machine learning applications
- Deep Translator: Python translation library; the app uses its Google Translate backend
- PyTorch: Deep learning framework that runs the Whisper model
- Python 3.7+: Core programming language
## Installation

Prerequisites:

- Python 3.7 or higher
- Git
1. Clone the repository

   ```bash
   git clone https://huggingface.co/spaces/malimalikayesha/Transcription_and_Translation_App
   cd Transcription_and_Translation_App
   ```

2. Create and activate a virtual environment

   ```bash
   python -m venv venv
   venv\Scripts\activate          # On Windows
   # source venv/bin/activate     # On macOS/Linux
   ```

3. Install dependencies

   ```bash
   pip install -r requirements.txt
   ```

4. Run the application

   ```bash
   python app.py
   ```

5. Access the application

   - Open the link printed in the terminal (typically http://127.0.0.1:7860); the interface loads in your browser.
   - Note: Transcription and translation can be slow when running on a local CPU.
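Before running `python app.py`, it can help to confirm the environment is ready. The sketch below is a hypothetical helper, not part of the repository: it checks the Python version stated above and whether the libraries listed under Technologies are importable. The import names (`whisper`, `gradio`, `deep_translator`, `torch`) belong to the `openai-whisper`, `gradio`, `deep-translator`, and `torch` pip packages. The `ffmpeg` check is an added assumption: Whisper's `load_audio()` runs the `ffmpeg` command-line tool, and the sketch assumes the app decodes audio that way.

```python
import importlib.util
import shutil
import sys

# Import names of the libraries listed under Technologies
# (pip packages: openai-whisper, gradio, deep-translator, torch).
REQUIRED_MODULES = ("whisper", "gradio", "deep_translator", "torch")


def check_environment(min_python=(3, 7)):
    """Return a dict mapping each requirement to True (found) or False (missing).

    Hypothetical helper for illustration; it only locates the top-level
    modules and never imports the heavy libraries themselves.
    """
    report = {"python>=%d.%d" % min_python: sys.version_info[:2] >= min_python}
    for name in REQUIRED_MODULES:
        # find_spec returns None when the top-level module is not installed
        report[name] = importlib.util.find_spec(name) is not None
    # Assumption: audio is decoded via Whisper's load_audio(), which calls ffmpeg
    report["ffmpeg"] = shutil.which("ffmpeg") is not None
    return report


if __name__ == "__main__":
    for requirement, ok in check_environment().items():
        print(f"{'OK     ' if ok else 'MISSING'} {requirement}")
```

Saved as, say, `check_env.py` (a hypothetical file name) and run inside the activated virtual environment, it prints one `OK` or `MISSING` line per requirement.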
## Usage

1. Upload or record audio
   - Click "Drop Audio Here" or drag and drop an audio file onto it
   - Or click the microphone icon to record directly in the browser
   - Supported formats: WAV, MP3, M4A, FLAC
2. Select the target language
   - Choose the translation language from the dropdown
   - Available options: English, Spanish, French, German, Chinese, Japanese, Urdu
3. Process and view results
   - The audio is processed automatically
   - The original transcription appears in the left panel
   - The translation appears in the right panel
   - Click "Clear" to reset and start over
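The workflow above (audio in, a language picked from a dropdown, two text outputs, a Clear button) matches what Gradio's `gr.Interface` provides. The sketch below is an assumption about how such an interface could be wired, not the code in `app.py`, and its panel arrangement may differ from the real app. `make_handler`, `build_demo`, and the injected `transcribe`/`translate` callables are hypothetical names. Gradio is imported inside `build_demo`, so the handler logic can be tried without Gradio installed.

```python
from pathlib import Path

# Formats listed under Usage; recorded clips are assumed to arrive as .wav files
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac"}


def make_handler(transcribe, translate, language_codes):
    """Build the function the UI calls with (audio_path, language_name).

    `transcribe(path) -> str` and `translate(text, code) -> str` are injected
    so the speech and translation backends can be swapped or faked in tests.
    """

    def handle(audio_path, language_name):
        if not audio_path:
            return "", ""  # nothing uploaded or recorded yet
        if Path(audio_path).suffix.lower() not in SUPPORTED_EXTENSIONS:
            return "Unsupported file type. Please use WAV, MP3, M4A, or FLAC.", ""
        text = transcribe(audio_path)
        return text, translate(text, language_codes[language_name])

    return handle


def build_demo(handler, language_names):
    """Create a Gradio interface around `handler` (requires `pip install gradio`)."""
    import gradio as gr

    return gr.Interface(
        fn=handler,
        inputs=[
            # type="filepath" hands the handler a file path for uploads and recordings
            gr.Audio(type="filepath", label="Audio"),
            gr.Dropdown(choices=language_names, value=language_names[0], label="Target Language"),
        ],
        outputs=[
            gr.Textbox(label="Transcribed Text"),
            gr.Textbox(label="Translated Text"),
        ],
    )
```

For example, `build_demo(make_handler(my_transcribe, my_translate, {"Spanish": "es"}), ["Spanish"]).launch()` would serve the interface locally, where `my_transcribe` and `my_translate` are your own backend functions. `gr.Interface` adds a Clear button by default.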
## Supported Languages

| Language | Code | Language | Code |
|---|---|---|---|
| English 🇺🇸 | en | German 🇩🇪 | de |
| Spanish 🇪🇸 | es | Chinese (Simplified) 🇨🇳 | zh-cn |
| French 🇫🇷 | fr | Japanese 🇯🇵 | ja |
| Urdu 🇵🇰 | ur | | |
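The dropdown shows language names, while the translator needs the codes in the table. A minimal sketch of that lookup, assuming a plain dictionary; `LANGUAGE_CODES` and `language_code` are illustrative names, not taken from `app.py`:

```python
# Display name -> translation code, copied from the table above
LANGUAGE_CODES = {
    "English": "en",
    "Spanish": "es",
    "French": "fr",
    "German": "de",
    "Chinese (Simplified)": "zh-cn",
    "Japanese": "ja",
    "Urdu": "ur",
}


def language_code(name):
    """Return the code for a dropdown choice, rejecting unsupported names."""
    try:
        return LANGUAGE_CODES[name]
    except KeyError:
        supported = ", ".join(sorted(LANGUAGE_CODES))
        raise ValueError(f"Unsupported language {name!r}; choose one of: {supported}") from None
```

Raising a `ValueError` that lists the valid names makes a mismatch between the dropdown labels and the table easy to spot.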
## How It Works

1. Audio processing
   - The audio file is loaded and normalized to 30-second segments
   - Each segment is converted to a log-Mel spectrogram, the input format Whisper expects
2. Speech recognition
   - The OpenAI Whisper "base" model transcribes the spectrogram into text
3. Language translation
   - Google Translate (through Deep Translator) automatically detects the source language
   - The transcribed text is translated into the selected target language
4. Results display
   - The transcribed and translated texts are shown side by side
   - Both appear as soon as processing completes
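The steps above map onto Whisper's lower-level Python API and Deep Translator's `GoogleTranslator`. The sketch below is an assumption about how they fit together, not a copy of `app.py`. `transcribe_and_translate` and `pad_or_trim_samples` are hypothetical names. `pad_or_trim_samples` is a dependency-free stand-in that mimics what `whisper.pad_or_trim` does to the 16 kHz sample array. The heavy libraries are imported inside the function so the helper can be tried on its own.

```python
SAMPLE_RATE = 16_000                       # Whisper works on 16 kHz mono audio
SEGMENT_SECONDS = 30                       # the encoder's fixed input window
N_SAMPLES = SAMPLE_RATE * SEGMENT_SECONDS  # 480,000 samples


def pad_or_trim_samples(samples, length=N_SAMPLES):
    """Zero-pad or truncate a sequence of samples to exactly `length` items.

    Pure-Python illustration of step 1; whisper.pad_or_trim does the same
    on NumPy arrays and PyTorch tensors.
    """
    samples = list(samples)
    if len(samples) >= length:
        return samples[:length]
    return samples + [0.0] * (length - len(samples))


def transcribe_and_translate(audio_path, target_code, model_name="base"):
    """Run steps 1-3 on one file and return (transcript, translation).

    Requires `pip install openai-whisper deep-translator`, ffmpeg on PATH,
    and network access for Google Translate.
    """
    import whisper
    from deep_translator import GoogleTranslator

    # Loaded per call for brevity; a real app would load the model once at startup
    model = whisper.load_model(model_name)

    # Step 1: decode to 16 kHz mono, fit to 30 s, build the log-Mel spectrogram
    audio = whisper.pad_or_trim(whisper.load_audio(audio_path))
    mel = whisper.log_mel_spectrogram(audio).to(model.device)

    # Step 2: decode text (language is auto-detected); FP16 only on GPU,
    # since Whisper does not support FP16 on CPU
    options = whisper.DecodingOptions(fp16=model.device.type == "cuda")
    transcript = whisper.decode(model, mel, options).text

    # Step 3: let Google Translate auto-detect the source language
    if not transcript.strip():
        return transcript, ""
    translation = GoogleTranslator(source="auto", target=target_code).translate(transcript)
    return transcript, translation
```

Because this path decodes a single 30-second window, speech after the first 30 seconds is dropped. `model.transcribe(audio_path)` instead slides the window across the whole file and returns a dict whose `"text"` key holds the full transcript.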
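## Links

- Live Demo: [Hugging Face Space](https://huggingface.co/spaces/malimalikayesha/Transcription_and_Translation_App)
- Whisper Documentation: [OpenAI Whisper](https://github.com/openai/whisper)
- Gradio Documentation: [Gradio Docs](https://www.gradio.app/docs)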
Made with ❤️ by malimalikayesha


