
LLM Voice Interface

A web-based interface for LLM agents with full conversational voice capabilities, featuring a separate window for code/script display. This project allows users to interact with advanced language models using both voice and text, with real-time responses in both modalities.


🌟 Features

  • Voice Interaction: Seamless voice input and output with high-quality, low-latency synthesis using Piper (offline TTS)
  • Speech Recognition: Offline speech recognition powered by Mozilla DeepSpeech
  • Text Conversation: Real-time text display with proper formatting for messages
  • Code Display: Separate window for viewing and copying generated code, with syntax highlighting
  • API Key Management: Store and manage API keys for different LLM providers in the browser (see Security Notes below)
  • Multiple LLM Support: Connect to different LLM providers including OpenAI (GPT-4, GPT-3.5) and Anthropic (Claude)
  • Customizable Voice: Select from multiple voices, including US male and female, British, Australian, and Indian English, plus Spanish
  • Responsive Design: Works on desktop and mobile devices with adaptive layout
  • Resizable Interface: Drag to resize conversation and code panels to your preference

🔧 Technology Stack

  • Frontend: HTML5, CSS3, JavaScript (Vanilla)
  • Backend: Node.js, Express
  • Real-time Communication: Socket.io
  • Speech Recognition: Mozilla DeepSpeech (offline)
  • Text-to-Speech: Piper (offline, high-quality synthesis)
  • Code Highlighting: highlight.js
  • LLM Integration: OpenAI API, Anthropic API

🚀 Quick Start

Prerequisites

  • Node.js (v16 or higher)
  • npm or yarn
  • 2GB+ free disk space for voice models

Installation

  1. Clone the repository

    git clone https://github.com/Tecknomancer/super-agent-interface.git llm-voice-interface
    cd llm-voice-interface
  2. Install dependencies and download models

    npm run setup

    This will install all required dependencies and download the voice models needed for speech recognition and synthesis.

  3. Set up environment variables

    cp .env.example .env

    Then edit the .env file to add your API keys for OpenAI and/or Anthropic (a sample .env follows these steps).

  4. Start the server

    npm start
  5. Open your browser and navigate to http://localhost:3000 to start using the interface.
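
The variable names below are illustrative assumptions of what a configured .env might look like; confirm the exact names in .env.example:

# Illustrative .env contents; confirm variable names against .env.example
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
PORT=3000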

📝 Usage Guide

Voice Interaction

  1. Click the microphone button to start voice recording
  2. Speak clearly into your microphone
  3. Recording will automatically stop after you finish speaking
  4. The system will process your speech, send it to the LLM, and respond both in text and voice
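
A rough sketch of how this capture step is commonly wired up in the browser; the function and the 'audio' socket event are illustrative assumptions, not the project's actual client code:

// Illustrative browser-side capture with the MediaRecorder API.
// The 'audio' event name is an assumption, not the project's actual code.
async function startRecording(socket) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);
  const chunks = [];
  recorder.ondataavailable = (e) => chunks.push(e.data);
  recorder.onstop = async () => {
    // Send the captured audio to the server for DeepSpeech transcription
    const blob = new Blob(chunks, { type: recorder.mimeType });
    socket.emit('audio', await blob.arrayBuffer());
    stream.getTracks().forEach((t) => t.stop());
  };
  recorder.start();
  return recorder; // call recorder.stop() when the user finishes speaking
}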

Text Interaction

  1. Type your message in the input field
  2. Press Enter or click the send button
  3. The system will process your message and respond in text (and voice if enabled)
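
The round-trip is a Socket.io exchange; a minimal sketch, assuming event names ('chat-message', 'llm-response') that may differ from the ones defined in server/server.js:

// Illustrative Socket.io round-trip; event names are assumptions.
const socket = io(); // connects back to the Express/Socket.io server
socket.emit('chat-message', { text: 'Explain promises in JavaScript' });
socket.on('llm-response', (msg) => {
  renderMessage(msg.text); // hypothetical UI helper that appends to the conversation panel
});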

Code Display

  • Code snippets detected in the LLM response will automatically appear in the code window
  • Use the "Copy" button to copy code to your clipboard
  • The language is automatically detected and syntax highlighting applied
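
Assuming the LLM wraps code in Markdown fences, detection can work along these lines (an illustrative sketch, not the project's actual client code):

// Illustrative extraction of fenced code blocks from an LLM response.
function extractCodeBlocks(responseText) {
  const blocks = [];
  const fence = /```(\w*)\n([\s\S]*?)```/g;
  let match;
  while ((match = fence.exec(responseText)) !== null) {
    blocks.push({ language: match[1], code: match[2] });
  }
  return blocks;
}
// With highlight.js, hljs.highlightAuto(code).value handles untagged blocks.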

Customizing Voice

  1. Use the voice selector dropdown to choose your preferred voice
  2. Toggle voice output on/off using the speaker button
  3. Voice settings are remembered between sessions
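
Session persistence like this usually comes down to localStorage; a minimal sketch, assuming a 'voiceSettings' key that may not match the project's actual client code:

// Illustrative voice-settings persistence; the storage key and default
// voice ID are assumptions.
function saveVoiceSettings(settings) {
  localStorage.setItem('voiceSettings', JSON.stringify(settings));
}
function loadVoiceSettings() {
  const saved = localStorage.getItem('voiceSettings');
  return saved ? JSON.parse(saved) : { voice: 'en-us-female', enabled: true };
}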

Managing API Keys

  1. Click the "API Keys" button to open the management modal
  2. Add new API keys with a name, provider type, and the key itself
  3. Select and use any stored API key
  4. Delete keys you no longer need
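
A sketch of what apiKeyManager.js-style storage can look like, assuming the "basic encoding" mentioned under Security Notes is base64 (illustrative only, not the file's actual contents):

// Illustrative key storage; base64 obscures keys but is NOT encryption.
function saveApiKey(name, provider, key) {
  const keys = JSON.parse(localStorage.getItem('apiKeys') || '[]');
  keys.push({ name, provider, key: btoa(key) });
  localStorage.setItem('apiKeys', JSON.stringify(keys));
}
function getApiKey(name) {
  const keys = JSON.parse(localStorage.getItem('apiKeys') || '[]');
  const entry = keys.find((k) => k.name === name);
  return entry ? atob(entry.key) : null;
}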

🔊 Voice Features

Available Voices

The system includes multiple voice options:

  • Male (US English)
  • Female (US English)
  • British English
  • Australian English
  • Indian English
  • Spanish

Offline Processing

Both speech recognition and text-to-speech are processed locally, providing:

  • Enhanced privacy (no audio data sent to external services)
  • Lower latency for voice interactions
  • No usage limits or API costs
  • Ability to function without internet connectivity
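
On the server, synthesis plausibly shells out to the Piper CLI; a minimal sketch, assuming a 'piper' binary on the PATH (the actual piperService.js may differ):

// Illustrative offline synthesis via the Piper CLI; binary location and
// model path are assumptions to check against your install.
const { spawn } = require('child_process');

function synthesize(text, modelPath, outputPath) {
  return new Promise((resolve, reject) => {
    const piper = spawn('piper', ['--model', modelPath, '--output_file', outputPath]);
    piper.stdin.end(text); // Piper reads the text to speak from stdin
    piper.on('error', reject);
    piper.on('close', (code) =>
      code === 0 ? resolve(outputPath) : reject(new Error('piper exited with code ' + code)));
  });
}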

🛠️ Project Structure

llm-voice-interface/
├── .github/               # GitHub workflow configurations
├── public/                # Client-side files
│   ├── css/               # Styling
│   ├── js/                # Client JavaScript
│   ├── assets/            # Images and other assets
│   └── index.html         # Main HTML file
├── server/                # Server-side code
│   ├── server.js          # Express server
│   └── services/          # Service modules
│       ├── llmService.js  # LLM API integration
│       ├── piperService.js # Text-to-speech
│       └── speechRecognitionService.js # Speech-to-text
├── scripts/               # Utility scripts
└── README.md              # This file

🧩 API Integration

Supported LLM Providers

  • OpenAI: GPT-4o, GPT-3.5-turbo
  • Anthropic: Claude 3 Opus, Claude 3 Sonnet

Adding New LLM Providers

To add support for additional LLM providers:

  1. Extend the llmService.js file with appropriate API calls
  2. Add the new provider to the model selector in index.html
  3. Implement the necessary authentication in apiKeyManager.js
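
As a hedged sketch, the dispatch inside llmService.js might take this shape (the helper names and the new-provider entry are hypothetical, not the file's actual code):

// Illustrative provider dispatch; callOpenAI/callAnthropic/callMyNewProvider
// are hypothetical helper names.
async function sendToLLM(provider, apiKey, messages) {
  switch (provider) {
    case 'openai':
      return callOpenAI(apiKey, messages); // existing integration
    case 'anthropic':
      return callAnthropic(apiKey, messages); // existing integration
    case 'mynewprovider':
      return callMyNewProvider(apiKey, messages); // your addition
    default:
      throw new Error('Unknown provider: ' + provider);
  }
}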

🔒 Security Notes

  • API keys are stored in browser localStorage with basic encoding
  • For production use, consider implementing a more secure key management system
  • The project is designed for personal/local use by default

🔍 Troubleshooting

Voice Recognition Issues

  • Ensure your microphone is properly connected and has necessary permissions
  • Check that the DeepSpeech models were downloaded correctly
  • Consider using a USB microphone for better quality in noisy environments

Voice Synthesis Issues

  • Verify that Piper was installed correctly during setup
  • Check console for any errors related to voice synthesis
  • Ensure the selected voice model exists in the models directory

Connection Issues

  • Verify that the server is running on the expected port
  • Check for any CORS issues if accessing from a different domain
  • Ensure WebSocket connections are not blocked by firewalls
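
If cross-origin access is actually needed, Socket.io accepts CORS options at server construction; a minimal sketch (the origin is a placeholder, and server.js may already configure the equivalent):

// Illustrative Socket.io CORS setup; adjust the placeholder origin.
const http = require('http');
const express = require('express');
const { Server } = require('socket.io');

const app = express();
const httpServer = http.createServer(app);
const io = new Server(httpServer, {
  cors: { origin: 'http://localhost:3000', methods: ['GET', 'POST'] },
});
httpServer.listen(3000);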

📚 Development

Running in Development Mode

npm run dev

This starts the server with hot reloading enabled.
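
Hot reloading in Express projects is typically provided by a watcher such as nodemon; a package.json scripts section along these lines is an assumption, not the project's actual file (scripts/setup.js is a hypothetical name):

"scripts": {
  "start": "node server/server.js",
  "dev": "nodemon server/server.js",
  "setup": "npm install && node scripts/setup.js"
}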

Adding Custom Voice Models

Custom voice models compatible with Piper can be added to the server/services/voice/models directory. After adding, update the VOICE_MODELS object in piperService.js.
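
The exact shape of VOICE_MODELS is project-specific; as a hedged sketch, it might map voice IDs to model files like this (every name below is an assumption):

// Illustrative VOICE_MODELS map; keys and .onnx filenames are assumptions.
const VOICE_MODELS = {
  'en-us-male': 'en_US-ryan-medium.onnx',
  'en-us-female': 'en_US-amy-medium.onnx',
  'en-gb': 'en_GB-alan-medium.onnx',
  'my-custom': 'my_custom_voice.onnx', // the model file you added
};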

📱 Mobile Support

The interface is fully responsive and works on mobile devices with the following considerations:

  • The panels stack vertically on small screens
  • Microphone access requires HTTPS on most mobile browsers
  • Performance may vary depending on the device's capabilities

📄 License

MIT License - See LICENSE file for details.

🙏 Acknowledgments

This project builds on the open-source work behind Mozilla DeepSpeech, Piper, Socket.io, and highlight.js.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the project
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📬 Contact

Project Link: https://github.com/Tecknomancer/super-agent-interface
