Script

Tip: Want your speakers to sing? Add ♪ or (singing) to mark musical moments. Example: "Speaker 0: ♪ Happy birthday to you! ♪"


Voice Input
Hold Space or click to record
About Live Transcribe: Uses local faster-whisper (free, no API key needed) with ~3–5 second latency. If faster-whisper isn't available, it falls back to the OpenAI Whisper API (add your key in Settings).
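
The fallback order described above can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the function name `choose_backend`, its parameters, and the returned labels are hypothetical. It only checks whether the `faster_whisper` package is importable and whether an OpenAI key was supplied.

```python
import importlib.util


def choose_backend(openai_key=None, local_available=None):
    """Pick a transcription backend (hypothetical helper, not the app's API).

    local_available can be passed explicitly; when left as None, the
    function checks whether the faster_whisper package can be imported.
    """
    if local_available is None:
        local_available = importlib.util.find_spec("faster_whisper") is not None

    if local_available:
        # Preferred path: local model, no API key needed.
        return "faster-whisper"
    if openai_key:
        # Fallback path: requires a key added in Settings.
        return "openai-whisper-api"
    raise RuntimeError(
        "No transcription backend: install faster-whisper or add an OpenAI key"
    )


if __name__ == "__main__":
    print(choose_backend(openai_key=None, local_available=True))
    print(choose_backend(openai_key="sk-example", local_available=False))
```

Passing `local_available` explicitly keeps the sketch easy to try on a machine where faster-whisper is not installed.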

Teams / Zoom integration: Browsers can only capture your microphone, not system audio. To transcribe a meeting, use one of these options:
  1. Use the Call Recording tab — upload meeting files for best diarization.
  2. On macOS, use BlackHole to route system audio.
  3. On Windows, enable Stereo Mix in sound settings.
Transcript
Upload Recording
Drop audio/video file here or click to browse
.mp3, .mp4, .wav, .m4a, .webm supported
Live Meeting
Generate Summary

Audio Library

Help & How To

Quick Start Guides

Podcast Generation

  1. Enter a topic or write your own script using Speaker 0:, Speaker 1:, etc.
  2. Choose a model: Realtime-0.5B (fast, free) or TTS-1.5B, TTS-Large, or TTS-7B (Pro, higher quality)
  3. Set the number of speakers (1-4) and select a voice for each one
  4. Click Generate and wait for your podcast to be created
  5. Singing tip: Add ♪ or (singing) to make speakers sing: "Speaker 0: ♪ Happy birthday! ♪"
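
To make the script format from the steps above concrete, here is a small sketch that splits a script into speaker turns and flags singing lines. It is an illustration only: `parse_script` and `is_singing` are hypothetical names, and treating an unlabeled line as a continuation of the previous turn is an assumption, not documented app behavior.

```python
import re

# Matches lines such as "Speaker 0: Hello there"
LINE_RE = re.compile(r"^Speaker\s+(\d+):\s*(.*)$")


def is_singing(text):
    """A turn counts as sung if it contains ♪ or the (singing) marker."""
    return "♪" in text or "(singing)" in text.lower()


def parse_script(script):
    """Return a list of (speaker_index, text, singing) tuples."""
    turns = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = LINE_RE.match(line)
        if match is None:
            # Assumption: an unlabeled line continues the previous turn.
            if turns:
                speaker, text, _ = turns[-1]
                merged = text + " " + line
                turns[-1] = (speaker, merged, is_singing(merged))
            continue
        text = match.group(2)
        turns.append((int(match.group(1)), text, is_singing(text)))
    return turns


if __name__ == "__main__":
    demo = "Speaker 0: ♪ Happy birthday! ♪\nSpeaker 1: Thanks, everyone."
    for speaker, text, singing in parse_script(demo):
        print(speaker, singing, text)
```

Counting the distinct speaker indices in the result is a quick way to check that a script stays within the 1-4 speaker range before generating.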

Live Transcription

  1. Click the microphone button or press and hold the space bar
  2. Speak clearly into your microphone
  3. Release to stop recording. The transcription appears after a few seconds (about 3–5 with local faster-whisper)
  4. Copy or clear the transcript using the buttons below

Call & Meeting Recording

  1. Upload an audio file (.wav, .mp3, .m4a, .flac, .ogg)
  2. Click Transcribe to convert speech to text
  3. Click Generate Summary to get key points, action items, and decisions
  4. Download transcript (.txt) or summary (.md)
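
Before uploading, it can help to check a file name against the extensions listed in step 1. The sketch below is a hypothetical helper, not part of the app; the set of extensions is copied from step 1 of this guide, and the upload panel may accept a different set.

```python
from pathlib import Path

# Extensions taken from step 1 of the guide above (an assumption about
# what the upload actually accepts).
SUPPORTED = {".wav", ".mp3", ".m4a", ".flac", ".ogg"}


def is_supported(filename):
    """Return True if the file extension is in the listed set (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED


if __name__ == "__main__":
    for name in ["standup.MP3", "interview.flac", "notes.txt"]:
        print(name, is_supported(name))
```
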
Frequently Asked Questions

What's the difference between the TTS models?

Realtime-0.5B: Fast, runs locally on your device, free tier. Great for quick single-voice TTS and simple podcasts.
TTS-1.5B: Cloud-powered, better quality, handles multi-speaker well. Requires Pro plan.
TTS-Large: Higher quality voices, more natural intonation. Uses more GPU (1.5x tokens). Requires Pro.
TTS-7B: Highest quality, most natural speech, best for professional content. Uses most GPU (2x tokens). Requires Pro.

How does pricing work?

Free ($0): 10 min podcast, 20 min transcription, 5 scripts/month
Bring Your Own ($9): 30 min podcast, 60 min transcription, 20 scripts/month (use your own API keys)
Starter ($14.99): 60 min podcast, 120 min transcription, 50 scripts/month
Pro ($29): 300 min podcast, 500 min transcription, 200 scripts/month + cloud TTS models

Note: TTS-Large uses 1.5x tokens, TTS-7B uses 2x tokens. More speakers also increase usage.
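
As a rough worked example of the note above, the sketch below estimates how much of the Pro podcast quota a job would consume. It rests on assumptions that are not confirmed here: that the multiplier scales charged minutes linearly, and that Realtime-0.5B and TTS-1.5B count at 1x. The extra usage from additional speakers is not modeled because no figure is given for it. All names are hypothetical.

```python
# Multipliers: 1.5x and 2x come from the note above; the 1.0 values
# are assumptions for models with no stated multiplier.
MULTIPLIERS = {
    "Realtime-0.5B": 1.0,
    "TTS-1.5B": 1.0,
    "TTS-Large": 1.5,
    "TTS-7B": 2.0,
}

# Pro plan podcast allowance per month, from the pricing list.
PRO_PODCAST_MINUTES = 300


def charged_minutes(model, minutes):
    """Estimated quota minutes consumed (assumes linear scaling)."""
    return minutes * MULTIPLIERS[model]


def fits_pro_quota(model, minutes, already_used=0.0):
    """True if the job fits in what is left of the Pro podcast quota."""
    return already_used + charged_minutes(model, minutes) <= PRO_PODCAST_MINUTES


if __name__ == "__main__":
    print(charged_minutes("TTS-Large", 20))
    print(charged_minutes("TTS-7B", 30))
    print(fits_pro_quota("TTS-7B", 200))
```

Under these assumptions, 30 minutes of TTS-7B audio would count as 60 quota minutes, so the 300-minute Pro allowance would cover about 150 minutes of TTS-7B output.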

Can I use my own API keys?

Yes! Go to Settings (click your profile) and add your own:
Hugging Face Token - for cloud transcription fallback
OpenAI API Key - optional for Whisper API transcription
Ollama API Key - for cloud script generation fallback

The Bring Your Own plan ($9/month) is perfect if you want to use your own keys with higher limits.

Why is my podcast generation slow?

Generation time depends on:
Model size: TTS-7B is slower but highest quality
Script length: Longer scripts take more time
Number of speakers: More speakers = longer processing
Cloud vs Local: Cloud models may sleep after inactivity (~30s wake time)

Tip: Use Realtime-0.5B for fastest results (runs on your device).

Where are my generated files stored?

All generated podcasts and TTS clips are automatically saved to your Library tab. You can:
• Play audio files directly in the browser
• Download files in WAV format
• Delete files you no longer need

Files are stored securely on the server and linked to your account.

Need More Help?

Can't find what you're looking for? Contact us:
Email: support@unicornai.studio
Website: unicornai.studio