A web-based interface for LLM agents with full conversational voice capabilities, featuring a separate window for code/script display. This project allows users to interact with advanced language models using both voice and text, with real-time responses in both modalities.
## Features

- **Voice Interaction:** Seamless voice input and output with high-quality, low-latency synthesis using Piper (offline TTS)
- **Speech Recognition:** Offline speech recognition powered by Mozilla DeepSpeech
- **Text Conversation:** Real-time text display with proper formatting for messages
- **Code Display:** A separate window for viewing and copying generated code, with syntax highlighting
- **API Key Management:** Store and manage your API keys for different LLM providers (see the Security Notes section)
- **Multiple LLM Support:** Connect to different LLM providers, including OpenAI (GPT-4, GPT-3.5) and Anthropic (Claude)
- **Customizable Voice:** Select from multiple voice options, including male, female, British, Australian, Indian, and Spanish accents
- **Responsive Design:** Works on desktop and mobile devices with an adaptive layout
- **Resizable Interface:** Drag to resize the conversation and code panels to your preference
## Tech Stack

- **Frontend:** HTML5, CSS3, JavaScript (vanilla)
- **Backend:** Node.js, Express
- **Real-time Communication:** Socket.io
- **Speech Recognition:** Mozilla DeepSpeech (offline)
- **Text-to-Speech:** Piper (offline, high-quality synthesis)
- **Code Highlighting:** highlight.js
- **LLM Integration:** OpenAI API, Anthropic API
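
These pieces fit together in a conventional Express + Socket.io layout. The sketch below is a minimal, hypothetical illustration of that wiring, not the project's actual server code; the event names `message` and `response` are assumptions.

```javascript
// Minimal Express + Socket.io wiring sketch (hypothetical, not the real server.js).
const express = require('express');
const http = require('http');
const { Server } = require('socket.io');

const app = express();
app.use(express.static('public')); // serve the client-side files

const server = http.createServer(app);
const io = new Server(server);

io.on('connection', (socket) => {
  // Event names here are illustrative assumptions.
  socket.on('message', (text) => {
    // In the real app this would call the LLM service and the TTS pipeline.
    socket.emit('response', { text: `You said: ${text}` });
  });
});

server.listen(3000, () => console.log('Listening on http://localhost:3000'));
```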
## Prerequisites

- Node.js (v16 or higher)
- npm or yarn
- 2GB+ free disk space for voice models
## Installation

1. **Clone the repository**

   ```bash
   git clone https://github.com/Tecknomancer/super-agent-interface.git
   cd super-agent-interface
   ```

2. **Install dependencies and download models**

   ```bash
   npm run setup
   ```

   This will install all required dependencies and download the voice models needed for speech recognition and synthesis.

3. **Set up environment variables**

   ```bash
   cp .env.example .env
   ```

   Then edit the `.env` file to add your API keys for OpenAI and/or Anthropic (a sample sketch follows these steps).

4. **Start the server**

   ```bash
   npm start
   ```

5. **Open your browser**

   Navigate to `http://localhost:3000` to start using the interface.
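
As referenced in step 3, a populated `.env` might look like the sketch below. The variable names are assumptions based on the providers mentioned above; check the bundled `.env.example` for the authoritative names.

```
# Hypothetical variable names -- confirm against .env.example
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
PORT=3000
```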
## Usage

### Voice Input

- Click the microphone button to start voice recording
- Speak clearly into your microphone
- Recording will automatically stop after you finish speaking
- The system will process your speech, send it to the LLM, and respond both in text and voice
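
How the automatic stop works is implementation-specific; as a rough illustration of the capture flow, something like the client-side sketch below could record from the microphone and ship audio to the server over Socket.io. All names are illustrative, and the fixed timeout is a stand-in for real silence detection.

```javascript
// Hypothetical client-side capture sketch -- not the project's actual code.
async function recordAndSend(socket) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);
  const chunks = [];

  recorder.ondataavailable = (event) => chunks.push(event.data);
  recorder.onstop = async () => {
    const blob = new Blob(chunks, { type: recorder.mimeType });
    socket.emit('audio', await blob.arrayBuffer()); // event name is an assumption
    stream.getTracks().forEach((track) => track.stop()); // release the mic
  };

  recorder.start();
  // Stand-in for silence detection: stop after a fixed 5 seconds.
  setTimeout(() => recorder.stop(), 5000);
}
```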
### Text Input

- Type your message in the input field
- Press Enter or click the send button
- The system will process your message and respond in text (and voice if enabled)
### Code Display

- Code snippets detected in the LLM response automatically appear in the code window
- Use the "Copy" button to copy code to your clipboard
- The language is automatically detected and syntax highlighting is applied
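
One plausible way to implement this detection is to scan the response for fenced code blocks and fall back to highlight.js auto-detection when no language tag is present. The sketch below is illustrative, assuming the `hljs` global loaded from the highlight.js script tag; the project's actual parsing code may differ.

```javascript
// Hypothetical sketch of code-block extraction from an LLM response.
function extractCodeBlocks(responseText) {
  // Match triple-backtick fenced blocks; the language tag is optional.
  // Written as `{3} to avoid nesting literal fences inside this README.
  const fence = /`{3}(\w+)?\n([\s\S]*?)`{3}/g;
  const blocks = [];
  let match;
  while ((match = fence.exec(responseText)) !== null) {
    const code = match[2];
    // Use the declared language if present, otherwise let highlight.js guess.
    const language = match[1] || hljs.highlightAuto(code).language || 'plaintext';
    blocks.push({ language, code });
  }
  return blocks;
}
```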
### Voice Settings

- Use the voice selector dropdown to choose your preferred voice
- Toggle voice output on/off using the speaker button
- Voice settings are remembered between sessions
- Click the "API Keys" button to open the management modal
- Add new API keys with a name, provider type, and the key itself
- Select and use any stored API key
- Delete keys you no longer need
## Voice Options

The system includes multiple voice options:
- Male (US English)
- Female (US English)
- British English
- Australian English
- Indian English
- Spanish
## Offline Processing

Both speech recognition and text-to-speech are processed locally, providing:
- Enhanced privacy (no audio data sent to external services)
- Lower latency for voice interactions
- No usage limits or API costs
- Ability to function without internet connectivity
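
As an illustration of the local speech-to-text path, a minimal use of the `deepspeech` npm package (0.9.x API) looks roughly like this. The model file names and paths are assumptions, not this project's actual layout.

```javascript
// Minimal local transcription sketch using the deepspeech npm package.
const DeepSpeech = require('deepspeech');
const fs = require('fs');

// Paths are hypothetical; this project stores its models elsewhere.
const model = new DeepSpeech.Model('models/deepspeech-0.9.3-models.pbmm');
model.enableExternalScorer('models/deepspeech-0.9.3-models.scorer');

// DeepSpeech expects 16 kHz, 16-bit, mono PCM audio.
const audio = fs.readFileSync('recording.raw');
console.log('Transcript:', model.stt(audio));
```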
## Project Structure

```
llm-voice-interface/
├── .github/                  # GitHub workflow configurations
├── public/                   # Client-side files
│   ├── css/                  # Styling
│   ├── js/                   # Client JavaScript
│   ├── assets/               # Images and other assets
│   └── index.html            # Main HTML file
├── server/                   # Server-side code
│   ├── server.js             # Express server
│   └── services/             # Service modules
│       ├── llmService.js     # LLM API integration
│       ├── piperService.js   # Text-to-speech
│       └── speechRecognitionService.js  # Speech-to-text
├── scripts/                  # Utility scripts
└── README.md                 # This file
```
## Supported LLM Models

- **OpenAI:** GPT-4o, GPT-3.5-turbo
- **Anthropic:** Claude 3 Opus, Claude 3 Sonnet
## Adding New Providers

To add support for additional LLM providers (a hedged sketch follows this list):

1. Extend `llmService.js` with the appropriate API calls
2. Add the new provider to the model selector in `index.html`
3. Implement the necessary authentication in `apiKeyManager.js`
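
As a hedged sketch of step 1, a new provider entry in `llmService.js` might look like the following. The `PROVIDERS` map, the `callMistral` function, and Mistral itself are illustrative assumptions; only the general shape (one call function per API, dispatched by provider name) is implied by the text above.

```javascript
// Hypothetical sketch of adding a provider to llmService.js -- names are
// illustrative assumptions, not the project's real module structure.

// One call function per provider API. Mistral's chat endpoint is
// OpenAI-compatible; this uses the global fetch available in Node 18+.
async function callMistral(apiKey, model, messages) {
  const response = await fetch('https://api.mistral.ai/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, messages }),
  });
  if (!response.ok) throw new Error(`Mistral API error: ${response.status}`);
  const data = await response.json();
  return data.choices[0].message.content;
}

// Register it so the rest of the service can dispatch by provider name.
const PROVIDERS = { mistral: callMistral };

module.exports = { PROVIDERS, callMistral };
```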
## Security Notes

- API keys are stored in browser localStorage with basic encoding
- For production use, consider implementing a more secure key management system
- The project is designed for personal/local use by default
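
The "basic encoding" mentioned above is plausibly nothing stronger than base64; a browser-side sketch of that pattern follows. The function names are hypothetical, and note that this merely obfuscates keys.

```javascript
// Hypothetical sketch of base64-"encoded" key storage in localStorage.
// This is obfuscation, NOT encryption -- anyone with page access can decode it.
function saveApiKey(name, key) {
  localStorage.setItem(`apiKey:${name}`, btoa(key));
}

function loadApiKey(name) {
  const stored = localStorage.getItem(`apiKey:${name}`);
  return stored ? atob(stored) : null;
}
```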
## Troubleshooting

### Microphone Issues

- Ensure your microphone is properly connected and has the necessary permissions
- Check that the DeepSpeech models were downloaded correctly
- Consider using a USB microphone for better quality in noisy environments

### Voice Synthesis Issues

- Verify that Piper was installed correctly during setup
- Check the console for any errors related to voice synthesis
- Ensure the selected voice model exists in the models directory

### Connection Issues

- Verify that the server is running on the expected port
- Check for CORS issues if accessing from a different domain
- Ensure WebSocket connections are not blocked by firewalls
## Development

```bash
npm run dev
```

This starts the server with hot reloading enabled.
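
Hot reloading for a Node/Express server is typically wired up with a watcher such as nodemon. The `scripts` block below is a guess at what this project's `package.json` might contain; every entry, including the `scripts/download-models.js` helper, is an assumption.

```json
{
  "scripts": {
    "start": "node server/server.js",
    "dev": "nodemon server/server.js",
    "setup": "npm install && node scripts/download-models.js"
  }
}
```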
### Adding Custom Voices

Custom voice models compatible with Piper can be added to the `server/services/voice/models` directory. After adding a model, update the `VOICE_MODELS` object in `piperService.js` (an illustrative sketch follows).
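
For orientation, the `VOICE_MODELS` object might map voice IDs to Piper model files along these lines. The keys and file names below are illustrative assumptions; Piper voices ship as an `.onnx` model plus an `.onnx.json` config.

```javascript
// Illustrative shape only -- the real VOICE_MODELS in piperService.js may differ.
const VOICE_MODELS = {
  'female-us': {
    model: 'voice/models/en_US-amy-medium.onnx',
    config: 'voice/models/en_US-amy-medium.onnx.json',
  },
  // New entry for a custom voice dropped into server/services/voice/models:
  'my-custom-voice': {
    model: 'voice/models/my_custom-voice.onnx',
    config: 'voice/models/my_custom-voice.onnx.json',
  },
};
```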
## Mobile Support

The interface is fully responsive and works on mobile devices, with the following considerations:
- The panels stack vertically on small screens
- Microphone access requires HTTPS on most mobile browsers
- Performance may vary depending on the device's capabilities
## License

MIT License - see the `LICENSE` file for details.
## Acknowledgments

- Mozilla DeepSpeech for the speech recognition engine
- Piper for the high-quality text-to-speech synthesis
- Socket.io for real-time communication
- highlight.js for code syntax highlighting
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the project
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## Contact

Project Link: https://github.com/Tecknomancer/super-agent-interface