Building an Email Support Agent with AI: Behind the Scenes
When you're running a lean operation, customer support can become a bottleneck fast. That's why we built Nova — an AI email support agent that monitors our inbox, classifies incoming messages, drafts responses, and knows when to escalate to a human.
This post is a behind-the-scenes look at how Nova works, the technical decisions we made, and the lessons we learned along the way.
Why Email Support (Not Chat)?
We considered building a chatbot first, but email support made more sense for several reasons:
- Asynchronous — No pressure to respond in real-time, which gives AI more time to think
- Documented — Every interaction is automatically logged
- Structured — Emails have clear starts and ends, unlike rambling chat conversations
- Universal — Every customer has email; not everyone wants to use chat
- Error-tolerant — A slight delay in email response is normal; a frozen chatbot is frustrating
Architecture Overview
Nova's architecture has four core components:
1. IMAP Monitor
The first component watches our support inbox for new emails. It connects via IMAP (Internet Message Access Protocol) and checks for new mail at least every 60 seconds.
IMAP Inbox → Poll every 60s → New email detected → Parse → Send to classifier
Key technical decisions:
- We use Node.js with the imapflow library for reliable IMAP connections
- Emails are parsed with mailparser to extract the body, attachments, and metadata
- We maintain an IMAP IDLE connection for near-instant detection, falling back to polling if IDLE disconnects
- Each processed email gets flagged so we never process it twice
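Flagging is what makes it safe to run both the IDLE listener and the fallback poll: whichever path fires, the same handler skips anything already marked. Here is a minimal sketch of that idea using an in-memory stand-in for the mailbox. The `$NovaProcessed` keyword, the `createMailbox` and `processNewMessages` helpers, and the message shape are all hypothetical; in a real deployment the fetch and flag-setting steps would go through imapflow rather than a plain array.

```javascript
// Hypothetical custom IMAP keyword used to mark messages Nova has handled.
const PROCESSED_FLAG = "$NovaProcessed";

// In-memory stand-in for a mailbox: each message has a UID, a set of flags,
// and raw content. A real monitor would fetch these over IMAP instead.
function createMailbox(messages) {
  return messages.map((m) => ({ ...m, flags: new Set(m.flags || []) }));
}

// Handle every message that is not yet flagged, then flag it so a later
// poll (or an IDLE notification for the same message) skips it.
async function processNewMessages(mailbox, handle) {
  const processed = [];
  for (const msg of mailbox) {
    if (msg.flags.has(PROCESSED_FLAG)) continue; // already handled
    // Parse + classify. If handle() throws, the error propagates and this
    // message stays unflagged, so the next cycle retries it.
    await handle(msg);
    msg.flags.add(PROCESSED_FLAG); // mark only after successful handling
    processed.push(msg.uid);
  }
  return processed;
}
```

Setting the flag only after the handler succeeds gives at-least-once behavior: a crash between handling and flagging can cause one duplicate, but a failed message is never silently dropped.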
2. AI Classifier
When a new email arrives, the classifier determines:
- Category — Is this a support request, sales inquiry, spam, or something else?
- Priority — Is this urgent (account locked, payment issue) or routine (feature request, general question)?
- Intent — What specifically does the customer need?
- Sentiment — Is the customer frustrated, neutral, or happy?
The classifier uses an LLM with a carefully crafted system prompt. We found that providing 5-10 example classifications dramatically improved accuracy compared to just describing the categories.
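A minimal sketch of this few-shot setup, assuming a chat-style API that accepts an array of role/content messages and a model instructed to reply with JSON. The category values, the example emails, the JSON fields, and the `buildClassifierMessages` and `parseClassification` helpers are hypothetical. The idea is that labeled examples go in as earlier user/assistant turns, and the model's reply is validated before anything downstream trusts it.

```javascript
const CATEGORIES = ["support", "sales", "spam", "other"];
const PRIORITIES = ["urgent", "routine"];
const SENTIMENTS = ["frustrated", "neutral", "happy"];

const SYSTEM_PROMPT = [
  "You classify customer emails for a support inbox.",
  `Reply with JSON only: {"category": one of ${CATEGORIES.join("|")},`,
  ` "priority": one of ${PRIORITIES.join("|")}, "intent": short string,`,
  ` "sentiment": one of ${SENTIMENTS.join("|")}, "confidence": number 0-1}`,
].join("\n");

// Hypothetical labeled examples; in practice 5-10 of these, drawn from real mail.
const EXAMPLES = [
  {
    email: "I can't log in, my account says locked. Please help ASAP!",
    label: { category: "support", priority: "urgent", intent: "unlock account", sentiment: "frustrated", confidence: 0.95 },
  },
  {
    email: "Do you offer discounts for annual billing?",
    label: { category: "sales", priority: "routine", intent: "pricing question", sentiment: "neutral", confidence: 0.9 },
  },
];

// Few-shot prompting: each example becomes a user turn followed by the
// assistant turn we want the model to imitate.
function buildClassifierMessages(emailText, examples = EXAMPLES) {
  const messages = [{ role: "system", content: SYSTEM_PROMPT }];
  for (const ex of examples) {
    messages.push({ role: "user", content: ex.email });
    messages.push({ role: "assistant", content: JSON.stringify(ex.label) });
  }
  messages.push({ role: "user", content: emailText });
  return messages;
}

// Validate the model's reply; return null on anything malformed so the
// caller can escalate instead of acting on a bad classification.
function parseClassification(raw) {
  let obj;
  try {
    obj = JSON.parse(raw);
  } catch {
    return null;
  }
  if (!obj || typeof obj !== "object") return null;
  const ok =
    CATEGORIES.includes(obj.category) &&
    PRIORITIES.includes(obj.priority) &&
    SENTIMENTS.includes(obj.sentiment) &&
    typeof obj.intent === "string" &&
    typeof obj.confidence === "number" &&
    obj.confidence >= 0 &&
    obj.confidence <= 1;
  return ok ? obj : null;
}
```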
Classification accuracy over time:
- Week 1: ~75% (frequent misclassification of edge cases)
- Week 4: ~88% (after refining examples and adding edge case handling)
- Week 12: ~94% (with ongoing prompt refinement and feedback loops)
3. Response Engine
Based on the classification, Nova takes one of three actions:
Auto-respond (60% of emails):
For common questions with clear answers — password resets, pricing inquiries, feature explanations. Nova drafts a response using the classification context and our knowledge base, then sends it directly.
Draft for review (25% of emails):
For less straightforward requests — custom quotes, technical troubleshooting, partnership inquiries. Nova drafts a response but flags it for human review before sending.
Escalate immediately (15% of emails):
For situations requiring human judgment — angry customers, legal issues, account security, anything the classifier is uncertain about. These go straight to a human with Nova's classification attached.
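The three-way split can be expressed as a small routing function. This is a sketch, not Nova's actual code. It assumes the escalation decision is computed separately (by rules like the ones listed under Escalation Layer) and passed in, and the `AUTO_RESPOND_INTENTS` allowlist, its intent names, and the `draftOnly` switch are hypothetical.

```javascript
// Hypothetical allowlist of intents considered safe to answer without review.
const AUTO_RESPOND_INTENTS = new Set(["password reset", "pricing question", "feature explanation"]);

/**
 * Decide what to do with a classified email.
 * @param {{intent: string}} classification - output of the classifier
 * @param {boolean} shouldEscalate - result of the escalation rules
 * @param {boolean} draftOnly - when true, never send without human review
 * @returns {"escalate" | "auto_respond" | "draft_for_review"}
 */
function routeEmail(classification, shouldEscalate, draftOnly = false) {
  if (shouldEscalate) return "escalate"; // a human sees it first, classification attached
  if (!draftOnly && AUTO_RESPOND_INTENTS.has(classification.intent)) {
    return "auto_respond";
  }
  return "draft_for_review"; // default: Nova drafts, a human approves
}
```

Keeping auto-send behind an explicit allowlist means a new or unexpected intent falls through to human review by default, and widening automation is a deliberate change to one set.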
4. Escalation Layer
The escalation system is the most important part of the entire setup. Getting it wrong means either:
- Customers get bad AI responses (too little escalation)
- Humans get overwhelmed with trivial requests (too much escalation)
Our escalation rules:
- Confidence score below 0.7 → Escalate
- Negative sentiment + high priority → Escalate
- Customer has emailed 3+ times on the same issue → Escalate
- Email mentions "lawyer," "legal," "sue," "cancel subscription" → Escalate
- Attachment is present and category is unclear → Escalate
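The rules above translate almost directly into code. This sketch assumes the classifier output carries `confidence`, `sentiment`, `priority`, and `category` fields, treats "negative sentiment" as `frustrated`, "high priority" as `urgent`, and "unclear category" as `other`, and assumes a hypothetical `threadCount` field with the number of emails from this customer on the same issue. One detail worth copying: match keywords on word boundaries, because a naive substring check for "sue" also fires on "issue".

```javascript
// Keywords from the escalation list, matched on word boundaries so that
// "sue" does not fire on "issue". Variants like "sued" or "cancel my
// subscription" would need extra patterns.
const ESCALATION_PATTERNS = [/\blawyer\b/i, /\blegal\b/i, /\bsue\b/i, /\bcancel subscription\b/i];

const CONFIDENCE_THRESHOLD = 0.7;

/**
 * Apply the escalation rules and return the reasons that fired.
 * An empty array means Nova may handle the email itself.
 * @param {{confidence: number, sentiment: string, priority: string, category: string}} c
 * @param {{subject: string, body: string, attachmentCount: number, threadCount: number}} email
 *   threadCount: emails from this customer on this issue, including this one (hypothetical)
 */
function escalationReasons(c, email) {
  const reasons = [];
  if (c.confidence < CONFIDENCE_THRESHOLD) reasons.push("low confidence");
  if (c.sentiment === "frustrated" && c.priority === "urgent") {
    reasons.push("negative sentiment + high priority");
  }
  if (email.threadCount >= 3) reasons.push("repeat contact on same issue");
  const text = `${email.subject} ${email.body}`;
  if (ESCALATION_PATTERNS.some((re) => re.test(text))) {
    reasons.push("legal or cancellation keyword");
  }
  if (email.attachmentCount > 0 && c.category === "other") {
    reasons.push("attachment with unclear category");
  }
  return reasons;
}
```

Returning the list of reasons rather than a bare boolean lets the person who picks up the escalation see at a glance which rule fired.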
The Knowledge Base
Nova's responses are only as good as the knowledge it has access to. We maintain a structured knowledge base with:
- Product documentation — Features, pricing, how-to guides
- FAQ entries — Common questions and canonical answers
- Policy documents — Refund policy, privacy policy, terms of service
- Troubleshooting guides — Step-by-step fixes for common issues
- Response templates — Pre-approved language for sensitive topics
The knowledge base is stored as structured markdown files, loaded into Nova's context when drafting responses. We update it weekly based on new questions that come in.
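A minimal sketch of assembling that context for one email, assuming each knowledge-base file is tagged with the categories it applies to and that policy documents and response templates are always included. The document shape, the tags, the `buildKnowledgeContext` helper, and the character budget are hypothetical, and the sketch works on in-memory strings; reading the markdown files from disk (for example with Node's `fs.readFileSync`) is left out.

```javascript
/**
 * @typedef {{name: string, kind: "product"|"faq"|"policy"|"troubleshooting"|"template",
 *            categories: string[], text: string}} KbDoc
 */

// Policies and templates are always included so sensitive answers use approved wording.
const ALWAYS_INCLUDE = new Set(["policy", "template"]);

// Select the docs relevant to this email's category and concatenate them,
// stopping short of a character budget so the prompt stays bounded.
function buildKnowledgeContext(docs, category, maxChars = 20000) {
  const relevant = docs.filter(
    (d) => ALWAYS_INCLUDE.has(d.kind) || d.categories.includes(category)
  );
  // Always-included docs go first so they claim the budget before anything else.
  relevant.sort((a, b) => Number(ALWAYS_INCLUDE.has(b.kind)) - Number(ALWAYS_INCLUDE.has(a.kind)));
  const parts = [];
  let used = 0;
  for (const d of relevant) {
    const section = `## ${d.name}\n\n${d.text.trim()}\n`;
    if (used + section.length > maxChars) continue; // skip docs that don't fit
    parts.push(section);
    used += section.length;
  }
  return parts.join("\n");
}
```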
Lessons Learned
1. The 80/20 Rule Applies Perfectly
Roughly 80% of support emails fall into 5-6 common categories. If you nail those categories, you've automated the bulk of your support workload. Don't try to handle every edge case from day one.
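To check whether the pattern holds for your own inbox, count categories in a hand-labeled sample and see how many of the most common ones it takes to reach a target share. A small sketch; the `categoriesToCover` helper and the 80% default are illustrative, and the labels would come from your own manual classification.

```javascript
// Given hand-assigned category labels, return the smallest set of most-frequent
// categories whose combined share reaches `target` (e.g. 0.8 for 80%).
function categoriesToCover(labels, target = 0.8) {
  const counts = new Map();
  for (const label of labels) counts.set(label, (counts.get(label) || 0) + 1);
  const sorted = [...counts.entries()].sort((a, b) => b[1] - a[1]);
  const picked = [];
  let covered = 0;
  for (const [category, n] of sorted) {
    if (covered / labels.length >= target) break; // target already reached
    picked.push(category);
    covered += n;
  }
  return { categories: picked, share: labels.length ? covered / labels.length : 0 };
}
```

For a sample of 10 emails with 5 password resets, 3 pricing questions, 1 refund, and 1 bug report, it returns `{ categories: ["password reset", "pricing"], share: 0.8 }`: two categories already cover 80% of the sample.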
2. Confidence Scores Are Essential
Every AI classification should include a confidence score. Without it, you can't set meaningful escalation thresholds. We use the LLM's own confidence assessment plus a secondary check based on keyword matching.
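One plausible way to combine the two signals, sketched under assumptions and not necessarily how Nova does it: a keyword check maps a few telltale words to a category, and when it points somewhere different from the LLM, the confidence is capped below the 0.7 escalation threshold. The keyword table, the 0.6 cap, and the `keywordCategory` and `combinedConfidence` names are hypothetical.

```javascript
// Hypothetical keyword hints: pattern -> category it strongly suggests.
const KEYWORD_HINTS = [
  [/\b(refund|invoice|charged)\b/i, "support"],
  [/\b(pricing|quote|demo)\b/i, "sales"],
  [/\b(lottery|winner|crypto)\b/i, "spam"],
];

// Return the first category whose pattern matches, or null if none do.
function keywordCategory(text) {
  for (const [re, category] of KEYWORD_HINTS) {
    if (re.test(text)) return category;
  }
  return null; // no strong keyword signal
}

// Start from the LLM's self-reported confidence; if the keyword check
// disagrees with the LLM's category, cap the score so the email escalates.
function combinedConfidence(llm, text, disagreementCap = 0.6) {
  const hint = keywordCategory(text);
  if (hint !== null && hint !== llm.category) {
    return Math.min(llm.confidence, disagreementCap);
  }
  return llm.confidence;
}
```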
3. Tone Matters More Than You Think
Early on, Nova's responses were accurate but felt robotic. We spent significant time refining the tone — making responses warm, helpful, and human-like without being fake. The key was including tone guidelines in the system prompt with specific examples.
4. Feedback Loops Drive Improvement
Every time a human corrects Nova's classification or rewrites a response, that feedback gets incorporated into the next iteration. This continuous improvement loop is what took us from 75% to 94% accuracy.
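The same human corrections that improve Nova also measure it. A minimal sketch, assuming each review is logged in chronological order as a record of Nova's category and the reviewer's final category (the record shape and the `classificationAccuracy` and `exampleCandidates` helpers are hypothetical): accuracy is the share of reviews where the two agree, and the disagreements are natural candidates for new few-shot examples.

```javascript
/** @typedef {{emailText: string, predicted: string, corrected: string}} Review */

// Share of reviewed emails where the human kept Nova's category.
function classificationAccuracy(reviews) {
  if (reviews.length === 0) return null; // nothing reviewed yet
  const agreed = reviews.filter((r) => r.predicted === r.corrected).length;
  return agreed / reviews.length;
}

// Corrections become candidate classifier examples, most recent first,
// capped so the prompt stays small. Assumes `reviews` is in time order.
function exampleCandidates(reviews, limit = 10) {
  return reviews
    .filter((r) => r.predicted !== r.corrected)
    .slice(-limit)
    .reverse()
    .map((r) => ({ email: r.emailText, label: r.corrected }));
}
```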
5. Start with Human-in-the-Loop
We ran Nova in "draft only" mode for the first three weeks. Every response was reviewed by a human before sending. This built our confidence in the system and generated the training data we needed to improve.
6. Monitor, Monitor, Monitor
We track response time, classification accuracy, customer satisfaction scores, and escalation rates daily. Any sudden change triggers an alert. AI systems can degrade silently — monitoring catches issues before customers do.
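"Sudden change" needs a concrete definition before it can trigger anything. One simple sketch, assuming daily values for each metric: compare today's value with the mean of the preceding days and alert when the relative change exceeds a threshold. The 7-day window, the 25% threshold, and the `checkMetric` name are hypothetical starting points, not tuned values.

```javascript
// Compare today's value against the trailing mean of earlier days.
// Returns an alert object when the relative change exceeds `threshold`,
// or null when there is nothing to report.
function checkMetric(name, history, today, { window = 7, threshold = 0.25 } = {}) {
  const recent = history.slice(-window);
  if (recent.length === 0) return null; // no baseline yet
  const mean = recent.reduce((sum, v) => sum + v, 0) / recent.length;
  if (mean === 0) return today === 0 ? null : { name, mean, today, change: Infinity };
  const change = (today - mean) / mean; // relative change, e.g. 0.5 = +50%
  return Math.abs(change) > threshold ? { name, mean, today, change } : null;
}
```

For example, `checkMetric("responseMinutes", [12, 12, 12, 12], 30)` reports a change of 1.5 (a 150% jump), while a reading of 12 returns `null`. Because it alerts in both directions, a sudden drop in escalation rate gets flagged just like a spike, which matters when a system can degrade silently.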
Results After 3 Months
- Average response time: From 4 hours to 12 minutes
- Support volume handled by AI: 60% fully automated
- Customer satisfaction: Maintained at 4.6/5 (no decline from before Nova)
- Human support time saved: ~25 hours per week
- Cost: LLM API costs of approximately $30/month (using efficient model routing)
Should You Build One?
If your business handles more than 20 support emails per day with repetitive questions, an AI email agent pays for itself quickly. Here's our recommendation:
- Start by categorizing — Manually classify 100 recent emails to understand your patterns
- Build the classifier first — Get classification working before auto-responses
- Run in draft mode and have a human review everything for at least 2 weeks
- Gradually release — Auto-respond to the easiest category first, then expand
- Never stop monitoring — Weekly accuracy reviews are non-negotiable
Building Nova was one of the best investments we've made at AuditX. It's not about replacing human support — it's about ensuring every customer gets a fast, accurate response, whether from AI or a person.