Tip: Want your speakers to sing? Add ♪ or (singing) to mark musical moments. Example: "Speaker 0: ♪ Happy birthday to you! ♪"
Teams / Zoom integration: Browsers can only capture your microphone, not system audio.
- Use the Call Recording tab — upload meeting files for best diarization.
- On macOS, use BlackHole to route system audio.
- On Windows, enable Stereo Mix in sound settings.
Audio Library
Help & How To
Podcast Generation
- Enter a topic or write your own script using Speaker 0:, Speaker 1:, etc.
- Choose a model: Realtime-0.5B (fast, free) or TTS-1.5B/Large/7B (Pro, higher quality)
- Set the number of speakers (1-4) and select a voice for each one
- Click Generate and wait for your podcast to be created
- Singing tip: Add ♪ or (singing) to make speakers sing: "Speaker 0: ♪ Happy birthday! ♪"
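If you write scripts outside the app, it can help to check the Speaker N: labels before pasting them in. The sketch below is only an illustration of the format described above, not the app's own parser; the function names parse_script and check_speaker_count are hypothetical, and treating an unlabeled line as a continuation of the previous turn is an assumption.

```python
import re

# Matches a line such as "Speaker 0: Hello" and captures the index and the text.
SPEAKER_LINE = re.compile(r"^Speaker (\d+):\s*(.*)$")


def parse_script(script):
    """Split a script into a list of (speaker_index, text) turns."""
    turns = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        match = SPEAKER_LINE.match(line)
        if match:
            turns.append((int(match.group(1)), match.group(2)))
        elif turns:
            # Assumption: an unlabeled line continues the previous turn.
            speaker, text = turns[-1]
            turns[-1] = (speaker, f"{text} {line}".strip())
        else:
            raise ValueError(f"Script must start with a 'Speaker N:' label: {line!r}")
    return turns


def check_speaker_count(turns, max_speakers=4):
    """Return the sorted speaker indices, or raise if outside the 1-4 range."""
    speakers = sorted({speaker for speaker, _ in turns})
    if not 1 <= len(speakers) <= max_speakers:
        raise ValueError(f"Expected 1-{max_speakers} speakers, found {len(speakers)}")
    return speakers


if __name__ == "__main__":
    demo = "Speaker 0: Welcome to the show.\nSpeaker 1: ♪ Happy birthday! ♪"
    print(check_speaker_count(parse_script(demo)))
```

Singing markers such as ♪ or (singing) are kept as part of the turn text, so they pass through unchanged.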
Live Transcription
- Click the microphone button or press and hold the space bar
- Speak clearly into your microphone
- Release the space bar (or click the button again) to stop recording; the transcription appears instantly
- Copy or clear the transcript using the buttons below
Call & Meeting Recording
- Upload an audio file (.wav, .mp3, .m4a, .flac, .ogg)
- Click Transcribe to convert speech to text
- Click Generate Summary to get key points, action items, and decisions
- Download transcript (.txt) or summary (.md)
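When preparing many recordings for upload, a quick local check against the supported extensions listed above can save a failed upload. This is a minimal sketch; is_supported_audio is a hypothetical helper, and it only looks at the file name, not the actual audio encoding.

```python
from pathlib import Path

# Extensions listed in the upload step above.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".ogg"}


def is_supported_audio(filename):
    """Return True if the file name ends in a supported audio extension."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ["standup.m4a", "notes.txt"]:
        print(name, is_supported_audio(name))
```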
What's the difference between the TTS models?
Realtime-0.5B: Fast, runs locally on your device, free tier. Great for quick single-voice TTS and simple podcasts.
TTS-1.5B: Cloud-powered, better quality, handles multi-speaker well. Requires Pro plan.
TTS-Large: Higher quality voices, more natural intonation. Uses more GPU (1.5x tokens). Requires Pro.
TTS-7B: Highest quality, most natural speech, best for professional content. Uses most GPU (2x tokens). Requires Pro.
How does pricing work?
Free ($0): 10 min podcast, 20 min transcription, 5 scripts/month
Bring Your Own ($9): 30 min podcast, 60 min transcription, 20 scripts/month (use your own API keys)
Starter ($14.99): 60 min podcast, 120 min transcription, 50 scripts/month
Pro ($29): 300 min podcast, 500 min transcription, 200 scripts/month + cloud TTS models
Note: TTS-Large uses 1.5x tokens, TTS-7B uses 2x tokens. More speakers also increase usage.
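As a rough illustration of the multipliers in the note above, the sketch below estimates how much of a podcast allowance a generation might consume. It rests on several assumptions that the text does not confirm: that the multiplier applies directly to podcast minutes, and that Realtime-0.5B and TTS-1.5B count at 1x. The extra cost of additional speakers is not specified, so it is left out. The name effective_minutes is hypothetical.

```python
# Multipliers for TTS-Large and TTS-7B come from the note above.
# Assumption: the other two models count at 1x.
MODEL_MULTIPLIER = {
    "Realtime-0.5B": 1.0,
    "TTS-1.5B": 1.0,
    "TTS-Large": 1.5,
    "TTS-7B": 2.0,
}


def effective_minutes(audio_minutes, model):
    """Estimate allowance consumed, assuming usage scales linearly with the multiplier."""
    if model not in MODEL_MULTIPLIER:
        raise ValueError(f"Unknown model: {model}")
    return audio_minutes * MODEL_MULTIPLIER[model]


if __name__ == "__main__":
    pro_allowance = 300  # podcast minutes on the Pro plan
    used = effective_minutes(10, "TTS-7B")
    print(f"Estimated usage: {used} min, remaining: {pro_allowance - used} min")
```

Under these assumptions, a 10-minute podcast on TTS-7B would count as 20 minutes against the allowance; actual usage may differ.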
Can I use my own API keys?
Yes! Go to Settings (click your profile) and add your own:
• Hugging Face Token - for cloud transcription fallback
• OpenAI API Key - optional for Whisper API transcription
• Ollama API Key - for cloud script generation fallback
The Bring Your Own plan ($9/month) is perfect if you want to use your own keys with higher limits.
Why is my podcast generation slow?
Generation time depends on:
• Model size: TTS-7B is slower but highest quality
• Script length: Longer scripts take more time
• Number of speakers: More speakers = longer processing
• Cloud vs Local: Cloud models may sleep after inactivity (~30s wake time)
Tip: Use Realtime-0.5B for fastest results (runs on your device).
Where are my generated files stored?
All generated podcasts and TTS clips are automatically saved to your Library tab. You can:
• Play audio files directly in the browser
• Download files in WAV format
• Delete files you no longer need
Files are stored securely on the server and linked to your account.
Can't find what you're looking for? Contact us:
Email: support@unicornai.studio
Website: unicornai.studio