Thank you for the feedback; we very much appreciate it. Local models are now supported in the new v0.2.0 release.
We have implemented local models through two main approaches:
1. Ollama Integration
The easiest way to get started with local models. ADK-Rust has native Ollama support, so you can run any model available through Ollama (Llama, Mistral, Gemma, etc.) locally.
See: Ollama Documentation
Ollama requires no API key; just run `ollama serve`. A minimal project looks like this:
```rust
use adk_rust::prelude::*;
use adk_rust::Launcher;
use std::sync::Arc;

#[tokio::main]
async fn main() -> AnyhowResult<()> {
    dotenvy::dotenv().ok();

    // Requires: ollama serve && ollama pull llama3.2
    let model = OllamaModel::new(OllamaConfig::new("llama3.2"))?;

    let agent = LlmAgentBuilder::new("assistant")
        .instruction("You are a helpful assistant.")
        .model(Arc::new(model))
        .build()?;

    Launcher::new(Arc::new(agent)).run().await?;
    Ok(())
}
```
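For reference, a minimal Cargo.toml for the Ollama example might look like the sketch below. The `adk-rust` version is inferred from the v0.2.0 release mentioned above; the package name and the `tokio`/`dotenvy` versions are illustrative assumptions:

```toml
[package]
name = "ollama-agent"   # illustrative project name
version = "0.1.0"
edition = "2021"

[dependencies]
adk-rust = "0.2"        # local model support landed in v0.2.0
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
dotenvy = "0.15"
```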
2. mistral.rs Integration
For high-performance local inference, we've integrated mistral.rs directly. This gives you native Rust performance for local model execution without external dependencies.
See: mistral.rs Documentation
To use it, add the adk-mistralrs crate:
```rust
use adk_mistralrs::{MistralRsModel, MistralRsConfig, ModelSource, QuantizationLevel};
use adk_agent::LlmAgentBuilder;
use std::sync::Arc;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Load the model with ISQ quantization to reduce memory usage
    let config = MistralRsConfig::builder()
        .model_source(ModelSource::huggingface("microsoft/Phi-3.5-mini-instruct"))
        .isq(QuantizationLevel::Q4_0)
        .paged_attention(true)
        .build();

    let model = MistralRsModel::new(config).await?;

    let agent = LlmAgentBuilder::new("local-assistant")
        .instruction("You are a helpful assistant running locally.")
        .model(Arc::new(model))
        .build()?;

    // `agent` is now ready to run, as in the Ollama example above.
    Ok(())
}
```
Note: adk-mistralrs is not published on crates.io because it has git dependencies; add it as a git dependency instead.
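Since the crate is not on crates.io, it can be pulled in as a git dependency in Cargo.toml. A sketch; the repository URL below is a placeholder, so substitute the actual ADK-Rust repository:

```toml
[dependencies]
# Placeholder URL: point these at the actual ADK-Rust repository
adk-mistralrs = { git = "https://github.com/your-org/adk-rust" }
adk-agent = { git = "https://github.com/your-org/adk-rust" }
```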
Features: ISQ quantization, PagedAttention, multi-GPU splitting, LoRA/X-LoRA adapters, vision/speech/diffusion models, MCP integration.