Skip to main content

SakuraSensei: Japanese Conversational AI Tutor

·1 min

Read the full blog post about building SakuraSensei

Context-aware Japanese Telegram bot with LangChain, custom persona, memory persistence, multi-dataset RAG (JLPT, JMDICT, Tatoeba, JaSquad), multi-agent news explanation, cloze-question generation from YouTube via Whisper + VAD.

Technologies Used #

AI & Language Models:

  • LangChain for conversation orchestration
  • Custom persona and memory persistence
  • Multi-agent architecture

Data Sources & RAG:

  • JLPT vocabulary and grammar datasets
  • JMDICT (Japanese-English dictionary)
  • Tatoeba example sentences
  • JaSquad question-answering dataset

Audio Processing:

  • Whisper for speech recognition
  • Voice Activity Detection (VAD)
  • YouTube audio extraction

Platform:

  • Telegram Bot API
  • Python backend

Key Features #

Conversational Learning:

  • Context-aware conversations in Japanese
  • Personalized learning experience with memory
  • Custom AI persona for engaging interactions

Multi-Dataset RAG:

  • Retrieval-augmented generation from multiple Japanese learning resources
  • JLPT-level appropriate content
  • Example sentences and definitions

News Explanation:

  • Multi-agent system for explaining Japanese news
  • Breaking down complex articles into learnable content

Interactive Quizzes:

  • Cloze-question generation from YouTube videos
  • Automated question creation using Whisper transcription
  • Real-time practice materials

Memory & Persistence:

  • Conversation history tracking
  • User progress monitoring
  • Personalized learning paths