Integrations.
AI model routing across 13 providers, NVIDIA NIM voice, YouTube and SoundCloud playback.
AI model routing
Jarvis rotates through a pool of AI providers per request. There's no single hard-coded model. Whichever provider responds fastest and isn't rate-limited wins that turn.
The current pool includes:
- OpenRouter - gateway access to dozens of frontier models (Claude, GPT, Llama, DeepSeek, etc.)
- Mistral - Mistral Medium/Large, low latency, persona-friendly
- Google Gemini - 2.5 Pro and 2.0 Flash, BLOCK_NONE safety
- Groq - Llama 3.3 and Qwen on custom silicon, sub-second responses
- Cerebras - Qwen 3 235B on wafer-scale inference
- DeepSeek - V3.2 direct, includes "thinking" reasoning mode
Persona compliance varies by provider. Roughly, looser-safety models stay in character longer for long roleplay sessions; stricter-RLHF models occasionally break and lecture. The router weighs both latency and successful completion.
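As a rough illustration of weighing latency against successful completion, here is a minimal sketch. It is not Jarvis's actual router: `ProviderStats`, `pick_provider`, the smoothing factor, and the scoring formula are all assumptions made for this example. It skips rate-limited providers and prefers the one with the lowest latency after penalizing failed turns.

```python
import time
from dataclasses import dataclass


@dataclass
class ProviderStats:
    """Rolling health numbers for one provider (hypothetical structure)."""
    name: str
    avg_latency_s: float = 1.0        # smoothed response time in seconds
    success_rate: float = 1.0         # smoothed share of turns that completed
    rate_limited_until: float = 0.0   # timestamp until which the provider is skipped

    def record(self, latency_s: float, ok: bool, alpha: float = 0.2) -> None:
        """Fold one finished request into the rolling averages."""
        self.avg_latency_s = (1 - alpha) * self.avg_latency_s + alpha * latency_s
        self.success_rate = (1 - alpha) * self.success_rate + alpha * (1.0 if ok else 0.0)


def pick_provider(pool, now=None):
    """Return the best available provider, or None if all are rate-limited."""
    now = time.monotonic() if now is None else now
    available = [p for p in pool if p.rate_limited_until <= now]
    if not available:
        return None
    # Lower score is better: latency inflated by the failure rate (assumed formula).
    return min(available, key=lambda p: p.avg_latency_s / max(p.success_rate, 0.01))
```

With this scoring, a fast provider that often breaks character or errors out can lose to a slightly slower one that completes reliably.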
Voice stack - NVIDIA NIM
Voice is its own pipeline. When you run /voice, Jarvis joins the channel and starts streaming audio chunks through:
- Wake-word detection - local model listens for "jarvis" / "garmin" / your custom word
- Speech-to-text - NVIDIA NIM Parakeet, post wake-word audio only
- AI response - same provider pool as text chat
- Text-to-speech - NVIDIA NIM TTS, streamed back into the voice channel
Raw audio is never persisted. Only post-wake-word transcripts ever leave the box, and those follow the same 30-day memory retention as text chat.
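The "post wake-word audio only" rule can be pictured as a gate in front of speech-to-text. The sketch below is an assumption about how such a gate could look, not the bot's real code: `post_wake_chunks`, the `is_wake_word` callback, and the fixed-size listening window are hypothetical.

```python
from typing import Callable, Iterable, Iterator


def post_wake_chunks(
    chunks: Iterable[bytes],
    is_wake_word: Callable[[bytes], bool],
    max_chunks: int = 50,
) -> Iterator[bytes]:
    """Yield only the audio chunks that follow a wake-word hit.

    Everything heard before the wake word is dropped and never stored.
    `max_chunks` is an assumed cap on how long the window stays open.
    """
    remaining = 0
    for chunk in chunks:
        if remaining > 0:
            remaining -= 1
            yield chunk              # inside the window: forward to speech-to-text
        elif is_wake_word(chunk):
            remaining = max_chunks   # open the window; the wake chunk itself is dropped
        # any other chunk is discarded without being kept anywhere
```

Only the chunks this generator yields would be handed to the speech-to-text step; the rest never reach it.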
Self-hosters: NIM access is optional. The bot runs without voice if you don't configure NIM credentials - it just won't respond to /voice.
Music sources
- YouTube - searches via the YouTube Data API; playback via yt-dlp → ffmpeg. Both single tracks and playlists. Long-form videos auto-stream.
- SoundCloud - public tracks resolve via the SoundCloud API. Private/login-walled tracks won't work.
- Direct file upload - attach up to 9 audio files (10 MB each) to a single /play call. Supported formats include MP3, WAV, OGG, FLAC, M4A, OPUS.
Streams run through ffmpeg on the host.
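The upload limits above can be checked before anything is handed to ffmpeg. This is a minimal sketch under stated assumptions: `validate_attachments` is a hypothetical helper, "10 MB" is read as 10 MiB, and the extension set only covers the formats named above, so the real list may be longer.

```python
from pathlib import PurePath

MAX_FILES = 9
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" means 10 MiB
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".ogg", ".flac", ".m4a", ".opus"}


def validate_attachments(attachments):
    """Check (filename, size_in_bytes) pairs and return a list of error strings.

    An empty list means every attachment passed.
    """
    errors = []
    if len(attachments) > MAX_FILES:
        errors.append(f"too many files: {len(attachments)} (max {MAX_FILES})")
    for name, size in attachments:
        extension = PurePath(name).suffix.lower()
        if extension not in ALLOWED_EXTENSIONS:
            errors.append(f"{name}: unsupported format")
        if size > MAX_BYTES:
            errors.append(f"{name}: larger than 10 MB")
    return errors
```

Returning every problem at once, rather than failing on the first, lets the bot tell the user about all rejected files in a single reply.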
Storage & portal
Per-server config, memory, blacklists, and warning logs live in MongoDB. Data at rest is encrypted with a master key; rotating the key re-encrypts on access.
The web portal uses Discord OAuth for sign-in. Sessions are HTTP-only cookies. Once signed in, you can manage automod lists, role/channel restrictions, and feature toggles from a UI instead of slash commands.
Webhooks & API
Jarvis exposes a small JSON API on /api/stats (public guild/user counts, 30-day reliability, and 24h request volume) and an internal webhook receiver. The webhook endpoint checks the signature on incoming bodies and is intended for internal release automation, not third-party integrations.
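The signing scheme for the webhook receiver is not documented here. A common way to handle signed bodies is HMAC-SHA256 over the raw request body with a shared secret, and the sketch below assumes exactly that. `sign_body` and `verify_signature` are hypothetical names, and the hex encoding is also an assumption.

```python
import hashlib
import hmac


def sign_body(secret: bytes, body: bytes) -> str:
    """Return the hex HMAC-SHA256 of the raw body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Check a received signature against the body in constant time."""
    expected = sign_body(secret, body)
    # Compare as bytes so unexpected non-ASCII input is rejected, not raised on.
    return hmac.compare_digest(expected.encode(), received_signature.encode())
```

The comparison uses `hmac.compare_digest` rather than `==` so that timing differences do not leak how much of a forged signature matched.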
Endpoints worth knowing:
- GET /api/stats - public, returns { guildCount, userCount, reliability30d, requests24h }
- GET /health - bearer-gated liveness
- GET /metrics/commands - bearer-gated command usage breakdown
- GET /sitemap.xml, GET /robots.txt, GET /blog/feed.xml - SEO surface
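For anyone consuming the public stats endpoint, a small parser for the documented response shape might look like the sketch below. The four field names come from the endpoint description; the Python types, the `Stats` class, and `parse_stats` are assumptions, since the value types are not specified here.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Stats:
    """Typed view of the /api/stats payload (field types are assumed)."""
    guild_count: int
    user_count: int
    reliability_30d: float
    requests_24h: int


def parse_stats(payload: str) -> Stats:
    """Parse the JSON body of GET /api/stats; raises KeyError on a missing field."""
    data = json.loads(payload)
    return Stats(
        guild_count=int(data["guildCount"]),
        user_count=int(data["userCount"]),
        reliability_30d=float(data["reliability30d"]),
        requests_24h=int(data["requests24h"]),
    )
```

The body can come from any HTTP client; the bearer-gated endpoints would additionally need an `Authorization: Bearer <token>` header.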
Drop a question in the AGIS support channel. Most things get answered within the day - the maintainer reads every message.