HELPDESK.AI

AI-Powered Ticket Creation and Categorization from User Input

Infosys Springboard Virtual Internship 6.0 | Group 2

AI & Machine Learning Track | March 2026

Problem Statement

The Human-Process Bottleneck in ITSM

Manual Intervention

  • Heavy reliance on human agents to understand and interpret complex, unstructured user complaints.
  • Extensive manual effort required to categorize issues into correct support domains and teams.
  • Subjective assignment of priority levels, leading to inconsistent and delayed routing.

Operational Impact

  • Significant Delays in ticket resolution times and frequent breaches of agreed SLAs.
  • Critical Human Errors in classification resulting in misrouting and "ticket ping-pong."
  • Inability to Scale Capacity to meet growing enterprise support volumes efficiently.

A paradigm shift is required to automate intelligence at the point of entry.

Project Objectives

AI Ticket Automation

Completely automate ticket creation and multi-label triage using advanced Deep Learning models.

Metadata Extraction

Eliminate manual forms by dynamically extracting structured metadata via Named Entity Recognition.

Flood Prevention

Automatically block identical or redundant IT incidents using semantic similarity vector checks.
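
A minimal sketch of the similarity check, assuming each ticket has already been embedded as a fixed-length vector. The function names and the 0.9 threshold are illustrative assumptions, not values taken from the project.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero embedding is treated as similar to nothing
    return dot / (norm_a * norm_b)


def find_duplicate(
    new_vec: Sequence[float],
    open_ticket_vecs: Dict[str, Sequence[float]],
    threshold: float = 0.9,  # assumed cut-off, to be tuned on real data
) -> Optional[str]:
    """Return the id of the most similar open ticket at or above the threshold."""
    best_id, best_score = None, threshold
    for ticket_id, vec in open_ticket_vecs.items():
        score = cosine_similarity(new_vec, vec)
        if score >= best_score:
            best_id, best_score = ticket_id, score
    return best_id
```

A new ticket whose best match returns an id could be linked to that incident instead of opening a fresh one.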

L1 Self-Service

Empower users to resolve basic issues instantly through a real-time conversational AI chat interface.

Visionary OCR Analysis

Leverage Tesseract and Gemini to pull technical data and error codes directly from user screenshots.

Enterprise Scalability

Deploy a robust, multi-tenant serverless architecture capable of handling high-volume operational loads.

System Architecture

3-Layer Decoupled SaaS

L1
☁ Vercel Edge Network

Presentation Layer

React 19  ·  Vite  ·  TailwindCSS  ·  Zustand  ·  React Router v6

AI-Powered Ticket Submit  ·  AI Processing Simulator  ·  Auto-Resolve Chat  ·  Admin Analytics Dashboard  ·  Master Admin Portal  ·  Voice Input  ·  Stripe Subscriptions
POST /ai/analyze_ticket  ·  async JSON REST
L2
☁ Hugging Face Spaces

Intelligence & API Layer

FastAPI  ·  Python 3.12  ·  PyTorch  ·  HuggingFace Transformers

DistilBERT v3 Classifier  ·  NER Entity Harvesting  ·  Cosine Similarity Deduplication  ·  Tesseract OCR  ·  Gemini Reasoning  ·  < 400ms Inference  ·  Adversarial Retraining
Supabase Client  ·  JWT Bearer Token  ·  RLS Policies
L3
☁ Supabase Cloud

Data & Auth Layer

Supabase  ·  PostgreSQL  ·  JWT  ·  Row-Level Security (RLS)

profiles table (RBAC)  ·  tickets table (JSONB metadata)  ·  Email + Magic Link Auth  ·  Multi-Tenant Isolation  ·  Real-Time Sync  ·  4-Layer Permission Matrix

Dataset Overview

Total Population

19,008 Tickets

ACCESS

Identity & permissions focus (MFA, Password Resets, Role Changes).

HARDWARE

Physical equipment issues (Battery, Monitor, Printers).

NETWORK

Connectivity and remote access (VPN, DNS, WiFi).

SOFTWARE

Application-level errors, installations, and license issues.

Charts: Category Distribution  ·  Distribution of Ticket Priorities  ·  Auto-Resolved Tickets

Data Preprocessing & Pipeline

01

Data Deduplication

Removing 386 duplicate support tickets for evaluation integrity.
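
A rough sketch of this step. It assumes a duplicate means the same text after collapsing whitespace and ignoring case; the project's exact matching rule is not stated in the slides.

```python
from typing import Iterable, List, Tuple


def drop_exact_duplicates(tickets: Iterable[str]) -> Tuple[List[str], int]:
    """Keep the first occurrence of each ticket text and count the rest."""
    seen = set()
    unique: List[str] = []
    dropped = 0
    for text in tickets:
        # Assumed comparison key: whitespace-collapsed, case-folded text.
        key = " ".join(text.split()).casefold()
        if key in seen:
            dropped += 1
            continue
        seen.add(key)
        unique.append(text)
    return unique, dropped
```

Running this before the train/test split keeps identical tickets from landing on both sides of the split.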

02

Noise Reduction

Cleaning IT slang and standardizing mixed-case inputs.
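
One way this cleaning could look, as a sketch only. The slang table below is a hypothetical stand-in; the project's real vocabulary is not given.

```python
import re

# Hypothetical slang map used purely for illustration.
SLANG = {"pls": "please", "pwd": "password", "asap": "as soon as possible"}

# Match any slang key as a whole word.
_SLANG_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, SLANG)) + r")\b")


def normalize_ticket(text: str) -> str:
    """Lowercase, collapse whitespace, and expand known slang terms."""
    text = " ".join(text.lower().split())
    return _SLANG_PATTERN.sub(lambda m: SLANG[m.group(1)], text)
```

Lowercasing is harmless here because the base model, distilbert-base-uncased, ignores case anyway.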

03

BIO Tag Alignment

Mapping word-level NER tags to DistilBERT subword offsets.
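
A sketch of the alignment, assuming the common convention that only the first subword of each word keeps its tag while special tokens and later subwords get -100. The `word_ids` list has the shape returned by a Hugging Face fast tokenizer's `word_ids()` method; whether the project masks continuation subwords this way is an assumption.

```python
from typing import List, Optional

IGNORE_INDEX = -100  # skipped by PyTorch's cross-entropy loss by default


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Expand word-level tag ids to one label per subword token.

    word_ids holds, for each token, the index of the word it came from,
    or None for special tokens such as [CLS] and [SEP].
    """
    aligned: List[int] = []
    previous: Optional[int] = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(IGNORE_INDEX)          # special token
        elif word_id != previous:
            aligned.append(word_labels[word_id])  # first subword carries the tag
        else:
            aligned.append(IGNORE_INDEX)          # later subwords are masked
        previous = word_id
    return aligned
```

For a three-word input whose first word splits into two subwords, `align_labels([None, 0, 0, 1, 2, None], [3, 0, 7])` yields one label per token with -100 at both ends.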

04

Label Multi-Mapping

Encoding Category, Sub-Category, and Priority labels to unique IDs.
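
A small sketch of the encoding, assuming each distinct label string (for example a combined "CATEGORY | SUB-CATEGORY" value) receives one consecutive integer id in sorted order. The function name and the ordering rule are illustrative.

```python
from typing import Dict, Iterable, Tuple


def build_label_map(labels: Iterable[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Assign consecutive ids to the sorted set of distinct labels."""
    label2id = {label: idx for idx, label in enumerate(sorted(set(labels)))}
    id2label = {idx: label for label, idx in label2id.items()}
    return label2id, id2label
```

Keeping both directions lets the API turn a predicted id back into a readable label.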

05

Stratified 80/20 Split

Balanced training/test sets locked by label distribution.
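
A dependency-free sketch of a stratified split. In practice scikit-learn's `train_test_split` with its `stratify` argument is the usual tool; the seed and rounding rule below are assumptions.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[Hashable],
    test_fraction: float = 0.2,
    seed: int = 42,  # assumed seed, only for reproducibility
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices), splitting each label in the same ratio."""
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for label in sorted(by_label, key=str):
        indices = by_label[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)
```

Each label contributes roughly 20% of its rows to the test set, so rare categories are not lost from either side.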

06

Tensor Conversion

Input formatting into PyTorch Tensors for GPU fine-tuning.

Output Representation View

// Original Input
"wifi is entirely dead in conference room B"

// Tokenized State
input_ids: [101, 7523, 2003, ... 3154, 102]
attention_mask: [1, 1, 1, ... 1, 1]

// Resulting Labels
label_id: 8                            // NETWORK | WIFI
ner_labels: [-100, 3, 0, ... 7, -100]  // BIO-tagged

Model Development & Training Pipeline

Trained on Google Colab

Full GPU-accelerated distilbert-base-uncased fine-tuning

Methodology & Execution

  • Strict Deduplication: Removed 386 exact duplicates to prevent data leakage and ensure an honest, unbiased evaluation.

  • Stratified Splitting: Performed an 80/20 train-test split stratified by label_id for balanced category representation.

  • Baseline Selection: Chose distilbert-base-uncased for its strong speed-to-performance ratio in dual classification networks.

  • Fine-Tuning Architecture: Used PyTorch and the HuggingFace Trainer with the AdamW optimizer, fp16 acceleration, and per-epoch logging.

Final Evaluation Results

Text Classification Model

99.25%

Accuracy

99.26%

Weighted F1

Trained on 6,946 deduplicated samples (Batch: 32, LR: 2e-5).
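
For reference, a sketch of how a weighted F1 score is computed: per-class F1 values are averaged with weights proportional to each class's true count. This mirrors the standard definition (as in scikit-learn's `f1_score` with `average="weighted"`) and is not the project's evaluation code.

```python
from collections import Counter
from typing import Hashable, Sequence


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Per-class F1 averaged with weights equal to each class's support."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        return 0.0
    support = Counter(y_true)
    total = 0.0
    for cls, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n_true - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * n_true
    return total / len(y_true)
```

Because it averages per-class scores, weighted F1 need not equal accuracy, which is why the two figures above differ slightly.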

Named Entity Recognition (NER)

90.00%

F1-Score

92.31%

Precision

11-tag BIO scheme extraction (Batch: 16, LR: 5e-5).

Named Entity Recognition (NER)

The Approach

Transformer-based entity harvesting running in parallel with classification.

What it Extracts

Device Names, Hostnames, IP Addresses, OS Versions, Lab IDs, and Error Codes.

Why it Matters

Eliminates the need for users to manually fill out 10 different dropdown menus.

// Extracted Entities

{
  "Device": "MacBook Pro M2",
  "IP": "192.168.1.14",
  "OS": "macOS 14.2",
  "Error": "ERR_CONNECTION..."
}
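
A sketch of how per-token BIO tags could be merged into a dictionary like the one above. The tag names (B-Device, I-OS, and so on) are hypothetical; the slides do not list the actual tag set.

```python
from typing import Dict, List, Optional


def collect_entities(tokens: List[str], tags: List[str]) -> Dict[str, List[str]]:
    """Merge B-/I- tagged tokens into entity strings grouped by type."""
    entities: Dict[str, List[str]] = {}
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        nonlocal current_type, current_tokens
        if current_type is not None:
            entities.setdefault(current_type, []).append(" ".join(current_tokens))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()                      # close any open span, start a new one
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)  # continue the open span
        else:
            flush()                      # "O" or an I- tag with no matching span
    flush()
    return entities
```

Each entity type maps to a list, since one ticket may mention several devices or error codes.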

User Architecture Flow

How the System Works

Steps: Submits Issue  ·  Neural Processing  ·  Classification  ·  Deduplication  ·  Real-time State Sync
Components: User (Frontend)  ·  FastAPI Backend  ·  AI Inference Engine  ·  DistilBERT v3  ·  NER Extraction  ·  Cosine Similarity  ·  Supabase DB  ·  Admin Portal

Integration & Deployment

Complete Technology Stack

Frontend Experience

React 19
Vite
Tailwind CSS
Zustand

Backend & AI Inference

Python 3.12
FastAPI
PyTorch & HF
Google Gemini

Database & BaaS

Supabase
PostgreSQL

Hosting & CI/CD

Vercel
Hugging Face
GitHub Actions

Live Platform Demo

~3.5 Minutes

https://helpdesk-ai-app.vercel.app

Chaos to Clarity Flow

AI Visual Simulator

4-Layer Authorization

Challenges & Resiliency

[SYSTEM_LOG: STABLE]

Challenge 01

Semantic Distribution Disparity

$ analysis --weights
> ERR: High-frequency "General" noise distorting minority class vectors.
> Bias detected in specific hardware partitions.
Solution

Applied Synthetic Augmentation & stratified sampling to ensure robust detection of rare failure modes.

Challenge 02

Serverless Inference Latency

$ ping api.inference.hf
> 66M Params loading...
> TIMEOUT: Cold-start delay > 12s on sporadic requests.
Optimized

Integrated FastAPI lifespan hooks to pre-load the model into RAM at startup, slashing cold-start latency by 95%.
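
A sketch of the lifespan pattern using only the standard library. FastAPI accepts an async context manager like this through its `lifespan` parameter; `load_classifier` and the `MODELS` cache are hypothetical placeholders for the real model loading code.

```python
import asyncio
from contextlib import asynccontextmanager

MODELS = {}  # process-wide cache shared by all request handlers


def load_classifier():
    """Placeholder for the expensive model load (hypothetical)."""
    return object()


@asynccontextmanager
async def lifespan(app):
    # Runs once at startup, before the first request is served.
    MODELS["classifier"] = load_classifier()
    yield
    # Runs once at shutdown.
    MODELS.clear()


# With FastAPI installed, this would be wired up as:
#   app = FastAPI(lifespan=lifespan)


async def demo() -> bool:
    """Simulate one startup/shutdown cycle without a web server."""
    async with lifespan(app=None):
        return "classifier" in MODELS
```

Request handlers then read the model from the cache instead of loading it on every call.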

Challenge 03

Colloquial Context Ambiguity

$ process "jira dead"
> UnknownToken: "Hinglish" mix & enterprise jargon.
> Confidence Score: 0.42 [WEAK]
Normalized

Fine-tuned on a Technical Colloquial Corpus to resolve mixed-language slang and organizational jargon.

Conclusion & Future Scope

Conclusion

Project Success

Sub-500ms

AI Neural Pipeline

Replacing the 4-8 minute manual triage bottleneck with instant routing.

Future Scope

Active Learning Loops

Using corrections_log.json to auto-retrain based on Admin feedback.
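
A sketch of the first step in such a loop: turning the correction log into fresh training pairs. The schema assumed here (a JSON list of objects with "text", "predicted", and "corrected" keys) is hypothetical; the slides do not describe the file's format.

```python
import json
from typing import Dict, List


def load_corrections(raw_json: str) -> List[Dict[str, str]]:
    """Return training pairs for entries where the admin changed the label."""
    entries = json.loads(raw_json)
    return [
        {"text": entry["text"], "label": entry["corrected"]}
        for entry in entries
        if entry["corrected"] != entry["predicted"]
    ]
```

Only entries where the admin disagreed with the model are kept, since those carry new information for retraining.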

Mobile Application

A native companion app with push-notification SLA alerts for immediate response.

Thank You!

Questions & Answers

Live Tech Demo

helpdeskaiv1.vercel.app

HELPDESK.AI • INFOSYS SPRINGBOARD INTERNSHIP 6.0