Back
Instant Guru
2026

Overview
Multimodal AI tutoring agent for Indian students (Class 10–12), live in production on the Arivihan app. Accepts image, text, and audio inputs and produces text or image responses. Uses chain-of-thought reasoning and an advanced subagent architecture with specialists for doubt solving, content search, and guidance — with multilingual understanding (Hindi, English, Hinglish) and real-time streaming.
Technologies
PythonFastAPIOpenAI APIGoogle GeminiRedisDockerSSE
Detailed Features
Multimodal I/O
- Accepts image, text, and audio inputs — students can snap a problem, type a doubt, or speak naturally.
- Generates text and image outputs depending on what best answers the query.
- Image and audio paths flow through the same agentic pipeline as text — single source of truth for reasoning.
Multilingual Understanding
- Native handling of Hindi, English, and Hinglish queries with code-mixed responses.
- Fuzzy keyword-to-chapter resolution across 11+ search tools (lectures, topper notes, PPT notes, PYQ papers/questions, important questions, chapterwise/full-length tests, NCERT solutions).
- Curriculum-aware: chapter and activity resolution backed by structured course datasets.
Real-Time Streaming
- Server-Sent Events (SSE) pipeline emitting structured events: status, response deltas, cards, buttons, follow-ups, and usage stats.
- Streaming response assembler merges parallel subagent outputs into a single unified reply.
- Pluggable LLM provider layer: OpenAI primary with a Gemini adapter.
Infrastructure & Sessions
- Two-tier session storage: Redis hot cache (48h TTL) backed by DynamoDB for persistence; async fire-and-forget writes.
- Context compaction strategy preserves recent turns while summarizing older history to control token cost.
- Dockerized deployment with FastAPI + Uvicorn; lifespan-managed Redis and DynamoDB clients.