AI · RAG · Next.js · MySQL · Gemini 2.0
Building my Digital Twin: A Deep Dive into Free RAG Architecture

How I used Gemini 2.0 and MySQL to build an interactive AI version of myself trained on PDF documentation.
# Building a 100% Free RAG Digital Twin with Gemini 2.0 & MySQL
Have you ever wanted to talk to a PDF? Or better yet, have an AI version of yourself talk to your recruiters based on your actual career documentation?
In this guide, I'll walk you through how I built my "Digital Twin" chatbot using a **RAG (Retrieval-Augmented Generation)** architecture, all while keeping it 100% free by relying on the free tier of Google's Gemini 2.0 Flash and a standard MySQL database.
---
## The Architecture: How it Works
Most AI chatbots "guess" based on general training. A Digital Twin must be grounded in facts. Here is the 4-step pipeline I implemented:
### 1. The Ingestion Engine (PDF to Knowledge)
We don't just "upload" a PDF. We use a custom Node.js script and `pdfjs-dist` to extract raw text. But raw text is messy. To make it AI-ready, I implemented **Semantic Chunking**:
* **Sentence-Aware Splitting**: We break the text at sentence boundaries into "nuggets" of roughly 1,000 characters, so no sentence is cut in half.
* **Overlapping Context**: Each nugget starts with the last sentence of the previous nugget so the AI never loses the thread of a thought.
* **1st-Person Transformation**: During ingestion, we use Regex to convert 3rd-person mentions ("Ashish did X") into 1st-person facts ("I did X").
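
A minimal sketch of these three steps might look like this. The function names (`splitSentences`, `chunkText`, `toFirstPerson`), the sentence-splitting regex, and the small verb map are illustrative assumptions, not the exact ingestion script.

```javascript
// Split text into sentences on ., ! or ? followed by whitespace.
// Rough heuristic: abbreviations such as "e.g." will split early.
function splitSentences(text) {
  return text
    .replace(/\s+/g, " ")
    .trim()
    .split(/(?<=[.!?])\s+/)
    .filter((s) => s.length > 0);
}

// Pack sentences into chunks that target maxChars characters.
// Each new chunk starts with the last sentence of the previous chunk (the overlap).
// A chunk can exceed the target when one sentence plus its overlap is longer than maxChars.
function chunkText(text, maxChars = 1000) {
  const chunks = [];
  let current = [];
  let length = 0;

  for (const sentence of splitSentences(text)) {
    const added = sentence.length + (current.length > 0 ? 1 : 0);
    if (current.length > 0 && length + added > maxChars) {
      chunks.push(current.join(" "));
      const overlap = current[current.length - 1];
      current = [overlap];
      length = overlap.length;
    }
    length += sentence.length + (current.length > 0 ? 1 : 0);
    current.push(sentence);
  }
  if (current.length > 0) chunks.push(current.join(" "));
  return chunks;
}

// Hypothetical verb fixes; a real script would need a longer list.
const VERB_FIXES = { is: "am", has: "have" };

// Rewrite third-person mentions of `name` into first person.
function toFirstPerson(text, name) {
  const escaped = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return text
    // "Ashish has" -> "I have", "Ashish is" -> "I am"
    .replace(new RegExp(`\\b${escaped} (is|has)\\b`, "g"), (_, verb) => `I ${VERB_FIXES[verb]}`)
    // "Ashish's" -> "My" at the start of a sentence, "my" elsewhere
    .replace(new RegExp(`\\b${escaped}'s\\b`, "g"), (match, offset, whole) =>
      offset === 0 || /[.!?]\s+$/.test(whole.slice(0, offset)) ? "My" : "my")
    // Any remaining "Ashish" -> "I"
    .replace(new RegExp(`\\b${escaped}\\b`, "g"), "I");
}
```

Running `toFirstPerson` before `chunkText` keeps the overlap sentence consistent across nuggets. Regex rewriting like this is a heuristic, so it is worth spot-checking the stored nuggets for grammar slips.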
### 2. The Storage (MySQL Full-Text Search)
Instead of a paid vector database like Pinecone, I used **MySQL Full-Text Indexing**. With a `FULLTEXT` index on the `content` column, MySQL can run "Natural Language" queries that rank rows by keyword relevance, with no embeddings or extra services to pay for.
```sql
SELECT content FROM ai_knowledge
WHERE MATCH(content) AGAINST ('How much experience does Ashish have?' IN NATURAL LANGUAGE MODE);
```
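
From the Node.js side, the same query can be parameterized so the user's question is never concatenated into the SQL. The sketch below is one way to do it; `buildSearchQuery`, the `score` alias, and the 1 to 50 limit range are assumptions for illustration.

```javascript
// Build a parameterized full-text query for the ai_knowledge table.
// The question is passed as a bound parameter. The limit is validated and inlined,
// because some drivers reject a placeholder inside LIMIT in prepared statements.
function buildSearchQuery(question, limit = 3) {
  if (!Number.isInteger(limit) || limit < 1 || limit > 50) {
    throw new RangeError("limit must be an integer between 1 and 50");
  }
  const sql = [
    "SELECT content,",
    "       MATCH(content) AGAINST (? IN NATURAL LANGUAGE MODE) AS score",
    "FROM ai_knowledge",
    "WHERE MATCH(content) AGAINST (? IN NATURAL LANGUAGE MODE)",
    "ORDER BY score DESC",
    `LIMIT ${limit}`,
  ].join("\n");
  // The question appears twice: once for the score column, once for the filter.
  return { sql, params: [question, question] };
}
```

With a client such as `mysql2/promise`, the result could be run as `const [rows] = await pool.execute(sql, params)`, where `pool` is a hypothetical connection pool created elsewhere.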
### 3. The Brain (Gemini 2.0 Flash)
When a user asks a question, we don't send the whole PDF to the AI (that would be slow and expensive). Instead:
1. We find the top 3 most relevant "nuggets" in the DB.
2. We send **only those nuggets** + the user question to **Gemini 2.0 Flash**.
3. We use a "Persona-Locked" prompt to ensure the AI speaks only as me.
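
The three steps above can be sketched as a small request builder. The prompt wording and the `buildPersonaRequest` name are assumptions, and the object shape (`system_instruction`, `contents`) is assumed to follow the Gemini `generateContent` REST request body, so check it against the current API reference before relying on it.

```javascript
// Combine the retrieved nuggets and the user's question into one request body.
// ownerName defaults to the name used in this article's examples.
function buildPersonaRequest(question, nuggets, ownerName = "Ashish") {
  // Number each nugget so the model can tell the facts apart.
  const context = nuggets.map((n, i) => `[${i + 1}] ${n}`).join("\n");

  // "Persona-locked" instructions: first person only, grounded in the context only.
  const systemText = [
    `You are ${ownerName}. Always answer in the first person, as ${ownerName}.`,
    "Use only the facts in the CONTEXT section.",
    "If the context does not contain the answer, say you don't have that information.",
  ].join(" ");

  return {
    system_instruction: { parts: [{ text: systemText }] },
    contents: [
      {
        role: "user",
        parts: [{ text: `CONTEXT:\n${context}\n\nQUESTION: ${question}` }],
      },
    ],
  };
}
```

Keeping the persona rules in the system instruction and the nuggets in the user turn means the rules stay fixed while the context changes with every question.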
### 4. The Voice (Web Speech API)
To make the twin truly "alive," I integrated the browser's Web Speech API. As soon as the AI returns an answer, it is wrapped in a `SpeechSynthesisUtterance` and the browser reads it aloud using one of the voices installed on the user's device.
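
A minimal sketch of the speaking step is below. The `speak` helper and its `env` parameter are assumptions added so the function fails quietly where speech synthesis is unavailable (for example during server-side rendering); in the browser, `env` defaults to the global object.

```javascript
// Read `text` aloud with the Web Speech API.
// Returns false when speech synthesis is not available or there is nothing to say.
function speak(text, env = globalThis) {
  const synth = env.speechSynthesis;
  const Utterance = env.SpeechSynthesisUtterance;
  if (!synth || !Utterance || !text) return false;

  synth.cancel(); // stop any previous answer that is still being read
  const utterance = new Utterance(text);
  utterance.lang = "en-US"; // assumed language; voice quality depends on the device
  utterance.rate = 1;
  synth.speak(utterance);
  return true;
}
```

Calling `speak(answer)` right after the chat response arrives gives the "talks back" effect. Some browsers block speech until the user has interacted with the page, so triggering it from the send button's click handler is the safer choice.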
---
## Why this is a Game Changer for Portfolios
* **Zero Cost**: Uses Gemini's generous free tier and your existing hosting DB.
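* **Highly Scalable**: You can grow from 100 PDFs to 100,000 nuggets, and lookups stay fast because MySQL searches the `FULLTEXT` index instead of scanning every row.
* **Persona Integrity**: Because the data is transformed during ingestion, every retrieved fact is already in the first person, which makes it much harder for the AI to break character.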
This project proves that you don't need a massive budget to build state-of-the-art AI features. You just need the right architecture.