Integrations.

AI model routing across 13 providers, NVIDIA NIM voice, YouTube and SoundCloud playback.

AI model routing

Jarvis rotates through a pool of AI providers per request. There's no single hard-coded model. Whichever provider responds fastest and isn't rate-limited wins that turn.

The current pool includes:

  • OpenRouter - gateway access to dozens of frontier models (Claude, GPT, Llama, DeepSeek, etc.)
  • Mistral - Mistral Medium/Large, low latency, persona-friendly
  • Google Gemini - 2.5 Pro and 2.0 Flash, BLOCK_NONE safety
  • Groq - Llama 3.3 and Qwen on custom silicon, sub-second responses
  • Cerebras - Qwen 3 235B on wafer-scale inference
  • DeepSeek - V3.2 direct, includes "thinking" reasoning mode

Persona compliance varies by provider. Roughly, models with looser safety tuning stay in character through long roleplay sessions, while more strictly RLHF-tuned models occasionally break character and lecture. The router weighs both latency and successful completion.
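A minimal sketch of how such a router might rank providers, assuming it tracks a rolling latency and completion rate per provider. The ProviderStats fields, the weights, and the scoring formula are illustrative assumptions, not Jarvis's actual implementation.

```python
from dataclasses import dataclass
import time


@dataclass
class ProviderStats:
    """Rolling health figures for one provider (all fields are hypothetical)."""
    name: str
    avg_latency_s: float             # recent average response time
    completion_rate: float           # share of recent turns completed, 0..1
    rate_limited_until: float = 0.0  # UNIX time; 0 means not rate-limited


def rank_providers(pool, now=None, latency_weight=1.0, failure_weight=5.0):
    """Return the providers that are not rate-limited, best first.

    score = latency_weight * latency + failure_weight * (1 - completion_rate)
    Lower is better. The weights are made up for illustration.
    """
    now = time.time() if now is None else now
    available = [p for p in pool if p.rate_limited_until <= now]
    return sorted(
        available,
        key=lambda p: latency_weight * p.avg_latency_s
        + failure_weight * (1.0 - p.completion_rate),
    )


# Example figures are invented, not measurements.
pool = [
    ProviderStats("groq", avg_latency_s=0.4, completion_rate=0.90),
    ProviderStats("gemini", avg_latency_s=1.2, completion_rate=0.97),
    ProviderStats("cerebras", avg_latency_s=0.3, completion_rate=0.95,
                  rate_limited_until=2_000.0),
]
ranked = rank_providers(pool, now=1_000.0)
print([p.name for p in ranked])  # ['groq', 'gemini']
```

In this example the fastest provider is skipped because it is still rate-limited at the time of the request, so the turn goes to the next-best score.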

Voice stack - NVIDIA NIM

Voice is its own pipeline. When you run /voice, Jarvis joins the channel and starts streaming audio chunks through:

  1. Wake-word detection - local model listens for "jarvis" / "garmin" / your custom word
  2. Speech-to-text - NVIDIA NIM Parakeet, post wake-word audio only
  3. AI response - same provider pool as text chat
  4. Text-to-speech - NVIDIA NIM TTS, streamed back into the voice channel
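The four steps above can be sketched as a small generator that drops everything heard before the wake word. Every stage is passed in as a plain callable, so nothing here reflects the real NIM client API. The end-of-utterance check is an added assumption, since the source does not say how a spoken request is delimited.

```python
from typing import Callable, Iterable, Iterator


def voice_turns(
    chunks: Iterable[bytes],
    detect_wake_word: Callable[[bytes], bool],
    is_end_of_utterance: Callable[[bytes], bool],  # assumption: e.g. silence detection
    speech_to_text: Callable[[bytes], str],
    ai_response: Callable[[str], str],
    text_to_speech: Callable[[str], bytes],
) -> Iterator[bytes]:
    """Yield synthesized replies; audio before the wake word is never buffered."""
    buffer = bytearray()
    awake = False
    for chunk in chunks:
        if not awake:
            awake = detect_wake_word(chunk)  # step 1, runs locally
            continue                         # pre-wake audio is discarded
        if is_end_of_utterance(chunk):
            transcript = speech_to_text(bytes(buffer))  # step 2
            reply = ai_response(transcript)             # step 3
            yield text_to_speech(reply)                 # step 4
            buffer.clear()
            awake = False
        else:
            buffer.extend(chunk)


# Stub stages so the flow can be exercised without any audio hardware.
chunks = [b"noise", b"jarvis", b"play ", b"jazz", b"<silence>", b"ignored"]
replies = list(voice_turns(
    chunks,
    detect_wake_word=lambda c: c == b"jarvis",
    is_end_of_utterance=lambda c: c == b"<silence>",
    speech_to_text=lambda audio: audio.decode(),
    ai_response=lambda text: f"Playing: {text}",
    text_to_speech=lambda text: text.encode(),
))
print(replies)  # [b'Playing: play jazz']
```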

Raw audio is never persisted. Only post-wake-word transcripts ever leave the box, and those follow the same 30-day memory retention as text chat.

Self-hosters: NIM access is optional. The bot runs without voice if you don't configure NIM credentials - it just won't respond to /voice.
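A sketch of how the optional voice feature might be gated on configuration. The environment variable name NIM_API_KEY is hypothetical; check your deployment's own configuration reference for the real setting.

```python
import os


def voice_enabled(env=None) -> bool:
    """True only when a NIM credential is present (variable name is hypothetical)."""
    env = os.environ if env is None else env
    return bool(env.get("NIM_API_KEY", "").strip())


def available_commands(env=None):
    """List the commands the bot would respond to under this configuration."""
    commands = ["/play"]
    if voice_enabled(env):
        commands.append("/voice")
    return commands


print(available_commands({}))                         # ['/play']
print(available_commands({"NIM_API_KEY": "example"}))  # ['/play', '/voice']
```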

Music sources

  • YouTube - searches via the YouTube Data API; playback via yt-dlp + ffmpeg. Supports both single tracks and playlists. Long-form videos auto-stream.
  • SoundCloud - public tracks resolve via the SoundCloud API. Private/login-walled tracks won't work.
  • Direct file upload - attach up to 9 audio files (10 MB each) to a single /play call. Supported formats include MP3, WAV, OGG, FLAC, M4A, OPUS.

Streams run through ffmpeg on the host.
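The direct-upload limits listed above could be checked with something like the following. This is a sketch: it assumes "10 MB" means 10 MiB and treats the listed formats as the full allow-list, although the source says only that supported formats "include" them.

```python
from pathlib import PurePath

MAX_FILES = 9
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" read as 10 MiB
# Assumption: the documented formats are treated as the complete allow-list.
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".ogg", ".flac", ".m4a", ".opus"}


def validate_attachments(attachments):
    """Return a list of problems; an empty list means the /play call is acceptable.

    `attachments` is a list of (filename, size_in_bytes) tuples.
    """
    problems = []
    if len(attachments) > MAX_FILES:
        problems.append(f"too many files: {len(attachments)} > {MAX_FILES}")
    for filename, size in attachments:
        ext = PurePath(filename).suffix.lower()
        if ext not in ALLOWED_EXTENSIONS:
            problems.append(f"{filename}: unsupported format {ext or '(none)'}")
        if size > MAX_BYTES:
            problems.append(f"{filename}: {size} bytes exceeds {MAX_BYTES}")
    return problems


print(validate_attachments([("intro.MP3", 4_000_000), ("notes.txt", 120)]))
# ['notes.txt: unsupported format .txt']
```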

Storage & portal

Per-server config, memory, blacklists, and warning logs live in MongoDB. Data at rest is encrypted with a master key; after the key is rotated, each record is re-encrypted with the new key the next time it is accessed.

The web portal uses Discord OAuth for sign-in. Sessions are HTTP-only cookies. Once signed in, you can manage automod lists, role/channel restrictions, and feature toggles from a UI instead of slash commands.
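For readers unfamiliar with HTTP-only cookies, here is a sketch of what setting one looks like with Python's standard library. The cookie name and the Secure, SameSite, and Path attributes are assumptions; the source states only that sessions are HTTP-only.

```python
from http.cookies import SimpleCookie
import secrets


def session_cookie_header(session_id: str) -> str:
    """Build a Set-Cookie value for a session that page scripts cannot read."""
    cookie = SimpleCookie()
    cookie["session"] = session_id          # cookie name is hypothetical
    cookie["session"]["httponly"] = True    # hidden from document.cookie
    cookie["session"]["secure"] = True      # assumption: HTTPS only
    cookie["session"]["samesite"] = "Lax"   # assumption
    cookie["session"]["path"] = "/"
    return cookie["session"].OutputString()


header = session_cookie_header(secrets.token_urlsafe(32))
print(header)
```

Because the cookie is HTTP-only, a script injected into the portal page cannot read the session identifier; the browser attaches it to requests on its own.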

Webhooks & API

Jarvis exposes a small JSON API at /api/stats (public guild/user counts, 30-day reliability, and 24h request volume) and an internal webhook receiver. The webhook endpoint verifies the signature on incoming bodies and is intended for internal release automation, not third-party integrations.
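The source does not say which signing scheme the webhook receiver uses. A common pattern is an HMAC-SHA256 over the raw request body with a shared secret, sketched below as an assumption rather than a description of Jarvis's endpoint.

```python
import hashlib
import hmac


def sign_body(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body, as a sender would compute it."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Check a received signature using a constant-time comparison."""
    expected = sign_body(secret, body)
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


secret = b"example-shared-secret"        # hypothetical value
body = b'{"event": "release"}'           # hypothetical payload
signature = sign_body(secret, body)

print(verify_signature(secret, body, signature))         # True
print(verify_signature(secret, body + b" ", signature))  # False
```

The comparison uses hmac.compare_digest rather than == so that the time taken does not reveal how many leading characters matched.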

Endpoints worth knowing:

  • GET /api/stats - public, returns { guildCount, userCount, reliability30d, requests24h }
  • GET /health - bearer-gated liveness
  • GET /metrics/commands - bearer-gated command usage breakdown
  • GET /sitemap.xml, GET /robots.txt, GET /blog/feed.xml - SEO surface
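A small client sketch for the public stats endpoint. The field names come from the list above; the base URL is a placeholder, and the value types are not documented, so the parser only checks that the fields are present.

```python
import json
from urllib.request import urlopen

STATS_FIELDS = ("guildCount", "userCount", "reliability30d", "requests24h")


def parse_stats(payload: str) -> dict:
    """Parse an /api/stats body and keep only the documented fields."""
    data = json.loads(payload)
    missing = [field for field in STATS_FIELDS if field not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return {field: data[field] for field in STATS_FIELDS}


def fetch_stats(base_url: str, timeout: float = 5.0) -> dict:
    """GET {base_url}/api/stats; base_url is supplied by the caller."""
    with urlopen(f"{base_url}/api/stats", timeout=timeout) as resp:
        return parse_stats(resp.read().decode("utf-8"))
```

Usage would look like fetch_stats("https://your-jarvis-host.example"), with the host replaced by your own deployment.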

Still stuck?

Drop a question in the AGIS support channel. Most things get answered within the day - the maintainer reads every message.