A web-based interface for LLM agents with full conversational voice capabilities, featuring a separate window for code/script display. This project allows users to interact with advanced language models using both voice and text, with real-time responses in both modalities.
## Features

- **Voice Interaction:** Seamless voice input and output with high-quality, low-latency synthesis using Piper (offline TTS)
- **Speech Recognition:** Offline speech recognition powered by Mozilla DeepSpeech
- **Text Conversation:** Real-time text display with proper formatting for messages
- **Code Display:** A separate window for viewing and copying generated code, with syntax highlighting
- **API Key Management:** Store and manage your API keys for different LLM providers (see the Security Notes section)
- **Multiple LLM Support:** Connect to different LLM providers, including OpenAI (GPT-4, GPT-3.5) and Anthropic (Claude)
- **Customizable Voice:** Select from multiple voice options, including male, female, British, Australian, Indian, and Spanish accents
- **Responsive Design:** Works on desktop and mobile devices with an adaptive layout
- **Resizable Interface:** Drag to resize the conversation and code panels to your preference
## Tech Stack

- **Frontend:** HTML5, CSS3, JavaScript (vanilla)
- **Backend:** Node.js, Express
- **Real-time Communication:** Socket.io
- **Speech Recognition:** Mozilla DeepSpeech (offline)
- **Text-to-Speech:** Piper (offline, high-quality synthesis)
- **Code Highlighting:** highlight.js
- **LLM Integration:** OpenAI API, Anthropic API
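
These pieces fit together in a conventional Express + Socket.io layout. The sketch below is a minimal, hypothetical illustration of that wiring, not the project's actual server code; the event names `message` and `response` are assumptions.

```javascript
// Minimal Express + Socket.io wiring sketch (hypothetical, not the real server.js).
const express = require('express');
const http = require('http');
const { Server } = require('socket.io');

const app = express();
app.use(express.static('public')); // serve the client-side files

const server = http.createServer(app);
const io = new Server(server);

io.on('connection', (socket) => {
  // Event names here are illustrative assumptions.
  socket.on('message', (text) => {
    // In the real app this would call the LLM service and the TTS pipeline.
    socket.emit('response', { text: `You said: ${text}` });
  });
});

server.listen(3000, () => console.log('Listening on http://localhost:3000'));
```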
## Prerequisites

- Node.js (v16 or higher)
- npm or yarn
- 2GB+ free disk space for voice models
## Installation

1. **Clone the repository**

   ```bash
   git clone https://github.com/Tecknomancer/super-agent-interface.git
   cd super-agent-interface
   ```

2. **Install dependencies and download models**

   ```bash
   npm run setup
   ```

   This will install all required dependencies and download the voice models needed for speech recognition and synthesis.

3. **Set up environment variables**

   ```bash
   cp .env.example .env
   ```

   Then edit the `.env` file to add your API keys for OpenAI and/or Anthropic (a sample sketch follows these steps).

4. **Start the server**

   ```bash
   npm start
   ```

5. **Open your browser**

   Navigate to `http://localhost:3000` to start using the interface.
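
As referenced in step 3, a populated `.env` might look like the sketch below. The variable names are assumptions based on the providers mentioned above; check the bundled `.env.example` for the authoritative names.

```
# Hypothetical variable names -- confirm against .env.example
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
PORT=3000
```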
## Usage

### Voice Input

- Click the microphone button to start voice recording
- Speak clearly into your microphone
- Recording will automatically stop after you finish speaking
- The system will process your speech, send it to the LLM, and respond both in text and voice
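
How the automatic stop works is implementation-specific; as a rough illustration of the capture flow, something like the client-side sketch below could record from the microphone and ship audio to the server over Socket.io. All names are illustrative, and the fixed timeout is a stand-in for real silence detection.

```javascript
// Hypothetical client-side capture sketch -- not the project's actual code.
async function recordAndSend(socket) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);
  const chunks = [];

  recorder.ondataavailable = (event) => chunks.push(event.data);
  recorder.onstop = async () => {
    const blob = new Blob(chunks, { type: recorder.mimeType });
    socket.emit('audio', await blob.arrayBuffer()); // event name is an assumption
    stream.getTracks().forEach((track) => track.stop()); // release the mic
  };

  recorder.start();
  // Stand-in for silence detection: stop after a fixed 5 seconds.
  setTimeout(() => recorder.stop(), 5000);
}
```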
### Text Input

- Type your message in the input field
- Press Enter or click the send button
- The system will process your message and respond in text (and voice if enabled)
### Code Display

- Code snippets detected in the LLM response automatically appear in the code window
- Use the "Copy" button to copy code to your clipboard
- The language is automatically detected and syntax highlighting is applied
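
One plausible way to implement this detection is to scan the response for fenced code blocks and fall back to highlight.js auto-detection when no language tag is present. The sketch below is illustrative, assuming the `hljs` global loaded from the highlight.js script tag; the project's actual parsing code may differ.

```javascript
// Hypothetical sketch of code-block extraction from an LLM response.
function extractCodeBlocks(responseText) {
  // Match triple-backtick fenced blocks; the language tag is optional.
  // Written as `{3} to avoid nesting literal fences inside this README.
  const fence = /`{3}(\w+)?\n([\s\S]*?)`{3}/g;
  const blocks = [];
  let match;
  while ((match = fence.exec(responseText)) !== null) {
    const code = match[2];
    // Use the declared language if present, otherwise let highlight.js guess.
    const language = match[1] || hljs.highlightAuto(code).language || 'plaintext';
    blocks.push({ language, code });
  }
  return blocks;
}
```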
### Voice Settings

- Use the voice selector dropdown to choose your preferred voice
- Toggle voice output on/off using the speaker button
- Voice settings are remembered between sessions
- Click the "API Keys" button to open the management modal
- Add new API keys with a name, provider type, and the key itself
- Select and use any stored API key
- Delete keys you no longer need
## Voice Options

The system includes multiple voice options:
- Male (US English)
- Female (US English)
- British English
- Australian English
- Indian English
- Spanish
## Offline Processing

Both speech recognition and text-to-speech are processed locally, providing:
- Enhanced privacy (no audio data sent to external services)
- Lower latency for voice interactions
- No usage limits or API costs
- Ability to function without internet connectivity
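
As an illustration of the local speech-to-text path, a minimal use of the `deepspeech` npm package (0.9.x API) looks roughly like this. The model file names and paths are assumptions, not this project's actual layout.

```javascript
// Minimal local transcription sketch using the deepspeech npm package.
const DeepSpeech = require('deepspeech');
const fs = require('fs');

// Paths are hypothetical; this project stores its models elsewhere.
const model = new DeepSpeech.Model('models/deepspeech-0.9.3-models.pbmm');
model.enableExternalScorer('models/deepspeech-0.9.3-models.scorer');

// DeepSpeech expects 16 kHz, 16-bit, mono PCM audio.
const audio = fs.readFileSync('recording.raw');
console.log('Transcript:', model.stt(audio));
```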
## Project Structure

```
llm-voice-interface/
├── .github/                  # GitHub workflow configurations
├── public/                   # Client-side files
│   ├── css/                  # Styling
│   ├── js/                   # Client JavaScript
│   ├── assets/               # Images and other assets
│   └── index.html            # Main HTML file
├── server/                   # Server-side code
│   ├── server.js             # Express server
│   └── services/             # Service modules
│       ├── llmService.js     # LLM API integration
│       ├── piperService.js   # Text-to-speech
│       └── speechRecognitionService.js  # Speech-to-text
├── scripts/                  # Utility scripts
└── README.md                 # This file
```
## Supported LLM Models

- **OpenAI:** GPT-4o, GPT-3.5-turbo
- **Anthropic:** Claude 3 Opus, Claude 3 Sonnet
## Adding New Providers

To add support for additional LLM providers (a hedged sketch follows this list):

1. Extend `llmService.js` with the appropriate API calls
2. Add the new provider to the model selector in `index.html`
3. Implement the necessary authentication in `apiKeyManager.js`
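
As a hedged sketch of step 1, a new provider entry in `llmService.js` might look like the following. The `PROVIDERS` map, the `callMistral` function, and Mistral itself are illustrative assumptions; only the general shape (one call function per API, dispatched by provider name) is implied by the text above.

```javascript
// Hypothetical sketch of adding a provider to llmService.js -- names are
// illustrative assumptions, not the project's real module structure.

// One call function per provider API. Mistral's chat endpoint is
// OpenAI-compatible; this uses the global fetch available in Node 18+.
async function callMistral(apiKey, model, messages) {
  const response = await fetch('https://api.mistral.ai/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, messages }),
  });
  if (!response.ok) throw new Error(`Mistral API error: ${response.status}`);
  const data = await response.json();
  return data.choices[0].message.content;
}

// Register it so the rest of the service can dispatch by provider name.
const PROVIDERS = { mistral: callMistral };

module.exports = { PROVIDERS, callMistral };
```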
## Security Notes

- API keys are stored in browser localStorage with basic encoding
- For production use, consider implementing a more secure key management system
- The project is designed for personal/local use by default
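
The "basic encoding" mentioned above is plausibly nothing stronger than base64; a browser-side sketch of that pattern follows. The function names are hypothetical, and note that this merely obfuscates keys.

```javascript
// Hypothetical sketch of base64-"encoded" key storage in localStorage.
// This is obfuscation, NOT encryption -- anyone with page access can decode it.
function saveApiKey(name, key) {
  localStorage.setItem(`apiKey:${name}`, btoa(key));
}

function loadApiKey(name) {
  const stored = localStorage.getItem(`apiKey:${name}`);
  return stored ? atob(stored) : null;
}
```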
## Troubleshooting

### Microphone Issues

- Ensure your microphone is properly connected and has the necessary permissions
- Check that the DeepSpeech models were downloaded correctly
- Consider using a USB microphone for better quality in noisy environments

### Voice Synthesis Issues

- Verify that Piper was installed correctly during setup
- Check the console for any errors related to voice synthesis
- Ensure the selected voice model exists in the models directory

### Connection Issues

- Verify that the server is running on the expected port
- Check for CORS issues if accessing from a different domain
- Ensure WebSocket connections are not blocked by firewalls
## Development

```bash
npm run dev
```

This starts the server with hot reloading enabled.
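
Hot reloading for a Node/Express server is typically wired up with a watcher such as nodemon. The `scripts` block below is a guess at what this project's `package.json` might contain; every entry, including the `scripts/download-models.js` helper, is an assumption.

```json
{
  "scripts": {
    "start": "node server/server.js",
    "dev": "nodemon server/server.js",
    "setup": "npm install && node scripts/download-models.js"
  }
}
```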
### Adding Custom Voices

Custom voice models compatible with Piper can be added to the `server/services/voice/models` directory. After adding a model, update the `VOICE_MODELS` object in `piperService.js` (an illustrative sketch follows).
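
For orientation, the `VOICE_MODELS` object might map voice IDs to Piper model files along these lines. The keys and file names below are illustrative assumptions; Piper voices ship as an `.onnx` model plus an `.onnx.json` config.

```javascript
// Illustrative shape only -- the real VOICE_MODELS in piperService.js may differ.
const VOICE_MODELS = {
  'female-us': {
    model: 'voice/models/en_US-amy-medium.onnx',
    config: 'voice/models/en_US-amy-medium.onnx.json',
  },
  // New entry for a custom voice dropped into server/services/voice/models:
  'my-custom-voice': {
    model: 'voice/models/my_custom-voice.onnx',
    config: 'voice/models/my_custom-voice.onnx.json',
  },
};
```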
## Mobile Support

The interface is fully responsive and works on mobile devices, with the following considerations:
- The panels stack vertically on small screens
- Microphone access requires HTTPS on most mobile browsers
- Performance may vary depending on the device's capabilities
## License

MIT License - see the `LICENSE` file for details.
## Acknowledgments

- Mozilla DeepSpeech for the speech recognition engine
- Piper for the high-quality text-to-speech synthesis
- Socket.io for real-time communication
- highlight.js for code syntax highlighting
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the project
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## Contact

Project Link: https://github.com/Tecknomancer/super-agent-interface