HELPDESK.AI

AI-Powered Ticket Creation and Categorization from User Input

Infosys Springboard Virtual Internship 6.0 | Group 2

AI & Machine Learning Track | March 2026

Problem Statement

The Human-Process Bottleneck in ITSM

Manual Intervention

  • Heavy reliance on human agents to understand and interpret complex, unstructured user complaints.
  • Extensive manual effort required to categorize issues into correct support domains and teams.
  • Subjective assignment of priority levels, leading to inconsistent and delayed routing.

Operational Impact

  • Significant delays in ticket resolution times and frequent breaches of agreed SLAs.
  • Critical human errors in classification, resulting in misrouting and "ticket ping-pong."
  • Inability to scale capacity to meet growing enterprise support volumes efficiently.

Ticket interpretation, categorization, and prioritization must be automated at the point of entry.

Project Objectives

AI Ticket Automation

Completely automate ticket creation and multi-label triage using advanced Deep Learning models.

Metadata Extraction

Eliminate manual forms by dynamically extracting structured metadata via Named Entity Recognition.

Flood Prevention

Automatically block identical or redundant IT incidents using semantic similarity vector checks.

L1 Self-Service

Let users resolve basic issues instantly through a real-time conversational AI chat interface.

Visionary OCR Analysis

Leverage Tesseract and Gemini to pull technical data and error codes directly from user screenshots.

Enterprise Scalability

Deploy a robust, multi-tenant serverless architecture capable of handling high-volume operational loads.
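
The Flood Prevention objective above depends on comparing a new ticket against open tickets by vector similarity. The following is a minimal sketch of that check, not the project's implementation: the bag-of-words `embed` function is a toy stand-in for a real sentence embedding, and the `find_duplicate` name and the 0.8 threshold are assumptions made for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_duplicate(new_ticket: str, open_tickets: list[str], threshold: float = 0.8):
    """Return the most similar open ticket if it meets the threshold, else None."""
    new_vec = embed(new_ticket)
    best_text, best_score = None, 0.0
    for text in open_tickets:
        score = cosine_similarity(new_vec, embed(text))
        if score > best_score:
            best_text, best_score = text, score
    return best_text if best_score >= threshold else None


if __name__ == "__main__":
    tickets = ["vpn not connecting from home", "printer out of toner"]
    print(find_duplicate("vpn not connecting from home office", tickets))
    print(find_duplicate("monitor flickering", tickets))
```

With a learned embedding in place of `embed`, the same threshold test would also catch paraphrased duplicates that share few exact words.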

System Architecture

How the System Works

Diagram steps: Submits Issue, Neural Processing, Classification, Deduplication, Real-time State Sync
Diagram components: User (Frontend), FastAPI Backend, AI Inference Engine, DistilBERT v3, NER Extraction, Cosine Similarity, Supabase DB, Admin Portal

Dataset Overview

Total Population

19,008 Tickets

ACCESS

Identity & permissions focus (MFA, Password Resets, Role Changes).

HARDWARE

Physical equipment issues (Battery, Monitor, Printers).

NETWORK

Connectivity and remote access (VPN, DNS, WiFi).

SOFTWARE

Application-level errors, installations, and license issues.

[Charts: Category Distribution; Distribution of Ticket Priorities; Auto-Resolved Tickets]

Data Preprocessing & Pipeline

01

Data Deduplication

Removing 386 duplicate support tickets for evaluation integrity.

02

Noise Reduction

Normalizing IT slang and standardizing mixed-case inputs.

03

BIO Tag Alignment

Mapping word-level NER tags to DistilBERT subword offsets.

04

Label Multi-Mapping

Encoding Category, Sub-Category, and Priority labels to unique IDs.

05

Stratified 80/20 Split

Training and test sets stratified by label so that both preserve the label distribution.

06

Tensor Conversion

Input formatting into PyTorch Tensors for GPU fine-tuning.
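
Steps 01, 04, and 05 above can be sketched in plain Python. This is an illustrative outline under stated assumptions, not the project's pipeline: the function names, the whitespace and case normalization used as the duplicate key, the combined label strings, and the fixed seed are all hypothetical.

```python
import random
from collections import defaultdict


def deduplicate(rows):
    """Drop rows whose normalized text was already seen (keeps the first occurrence)."""
    seen, unique = set(), []
    for text, label in rows:
        key = " ".join(text.lower().split())  # case- and whitespace-insensitive key
        if key not in seen:
            seen.add(key)
            unique.append((text, label))
    return unique


def encode_labels(rows):
    """Map each distinct label string to an integer ID, assigned in sorted order."""
    names = sorted({label for _, label in rows})
    label2id = {name: i for i, name in enumerate(names)}
    return [(text, label2id[label]) for text, label in rows], label2id


def stratified_split(rows, test_fraction=0.2, seed=42):
    """Split label by label so train and test keep similar label proportions."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        # At least one test example per label; a label with one row gets no train row.
        n_test = max(1, round(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```

Deduplicating before the split matters for the evaluation: a duplicate that lands in both sets would let the model be scored on text it was trained on.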

Output Representation View

// Original Input
"wifi is entirely dead in conference room B"

// Tokenized State
input_ids: [101, 7523, 2003, ... 3154, 102]
attention_mask: [1, 1, 1, ... 1, 1]

// Resulting Labels
label_id: 8 (NETWORK | WIFI)
ner_labels: [-100, 3, 0, ... 7, -100] (BIO-tagged)
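
The -100 entries in ner_labels come from aligning word-level BIO tags to subword tokens (step 03). The sketch below shows one common way to do this, assuming a list of word indices per token such as the one Hugging Face fast tokenizers return from `word_ids()`, with `None` for special tokens. The `align_labels` name, the `label_all_subwords` flag, and the tag IDs in the example are hypothetical.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def align_labels(word_ids, word_labels, label_all_subwords=False):
    """Expand word-level tag IDs to one label per subword token.

    word_ids:    word index for each token, None for special tokens.
    word_labels: one tag ID per original word.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            # Special tokens such as [CLS] and [SEP] carry no tag.
            aligned.append(IGNORE_INDEX)
        elif word_id != previous:
            # First subword of a word takes the word's tag.
            aligned.append(word_labels[word_id])
        elif label_all_subwords:
            aligned.append(word_labels[word_id])
        else:
            # Later subwords are masked so each word is scored once.
            aligned.append(IGNORE_INDEX)
        previous = word_id
    return aligned


if __name__ == "__main__":
    # Hypothetical case: word 0 is split into two subwords, words 1 and 2 are not.
    print(align_labels([None, 0, 0, 1, 2, None], [3, 0, 0]))
```

Masking the continuation subwords keeps the loss and the metrics at word level, which matches how the tags were annotated.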

Model Development & Training Pipeline

Trained on Google Colab

Full GPU-accelerated distilbert-base-uncased fine-tuning

Methodology & Execution

  • Strict Deduplication: Removed 386 exact duplicates to prevent train/test leakage and keep the evaluation unbiased.

  • Stratified Splitting: Performed an 80/20 train-test split stratified by label_id for balanced category representation.

  • Baseline Selection: Chose distilbert-base-uncased for its speed-to-performance ratio in both tasks, ticket classification and NER.

  • Fine-Tuning Architecture: Used PyTorch and the Hugging Face Trainer with the AdamW optimizer, fp16 mixed precision, and per-epoch logging.

Final Evaluation Results

Text Classification Model

99.25%

Accuracy

99.26%

Weighted F1

Trained on 6,946 deduplicated samples (Batch: 32, LR: 2e-5).

Named Entity Recognition (NER)

90.00%

F1-Score

92.31%

Precision

11-tag BIO extraction scheme (Batch: 16, LR: 5e-5).
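
For reference, the metrics reported above can be computed as in the sketch below. It assumes the standard definitions: weighted F1 is the support-weighted mean of per-class F1, and F1 is the harmonic mean of precision and recall. The function names are illustrative, and the averaging mode used for the NER scores is not stated in the slides.

```python
from collections import Counter


def per_class_f1(y_true, y_pred, label):
    """F1 for one class, treating it as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1, weighted by each class's share of the true labels."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(per_class_f1(y_true, y_pred, label) * n / total
               for label, n in support.items())


def implied_recall(precision, f1):
    """Solve F1 = 2PR / (P + R) for R."""
    return f1 * precision / (2 * precision - f1)
```

If the NER precision (92.31%) and F1 (90.00%) were computed under the same averaging, `implied_recall(0.9231, 0.90)` gives a recall of roughly 87.8%.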

Named Entity Recognition (NER)

The Approach

Transformer-based entity harvesting running in parallel with classification.

What it Extracts

Device Names, Hostnames, IP Addresses, OS Versions, Lab IDs, and Error Codes.

Why it Matters

Eliminates the need for users to manually fill out 10 different dropdown menus.

// Extracted Entities

{
  "Device": "MacBook Pro M2",
  "IP": "192.168.1.14",
  "OS": "macOS 14.2",
  "Error": "ERR_CONNECTION..."
}
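
A structure like the one above can be produced by grouping BIO-tagged tokens into spans. The following is a minimal sketch: the `decode_entities` name, the tag strings such as "B-Device", and the choice to collect values in lists are assumptions, since the slides do not show the project's tag names or decoding code.

```python
def decode_entities(tokens, tags):
    """Group BIO-tagged tokens into {entity_type: [span_text, ...]}."""
    entities = {}
    current_type, current_tokens = None, []

    def flush():
        # Store the span collected so far, if any.
        if current_type is not None:
            entities.setdefault(current_type, []).append(" ".join(current_tokens))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new span.
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag extends the open span of the same type.
            current_tokens.append(token)
        else:
            # "O" or a stray I- tag closes the open span.
            flush()
            current_type, current_tokens = None, []
    flush()
    return entities


if __name__ == "__main__":
    tokens = ["My", "MacBook", "Pro", "M2", "on", "192.168.1.14", "fails"]
    tags = ["O", "B-Device", "I-Device", "I-Device", "O", "B-IP", "O"]
    print(decode_entities(tokens, tags))
```

Values are kept in lists because one ticket can mention several entities of the same type, for example two IP addresses.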

Integration & Deployment

Complete Technology Stack

Frontend Experience

React 19
Vite
Tailwind CSS
Zustand

Backend & AI Inference

Python 3.12
FastAPI
PyTorch & HF
Google Gemini

Database & BaaS

Supabase
PostgreSQL

Hosting & CI/CD

Vercel
Hugging Face
GitHub Actions

Live Platform Demo

~3.5 Minutes

https://helpdesk-ai-app.vercel.app

Chaos to Clarity Flow

AI Visual Simulator

4-Layer Authorization

Challenges & Resiliency

[SYSTEM_LOG: STABLE]

Challenge 01

Semantic Distribution Disparity

$ analysis --weights
> ERR: High-frequency "General" noise distorting minority class vectors.
> Bias detected in specific hardware partitions.
Solution

Applied synthetic data augmentation and stratified sampling to ensure robust detection of rare failure modes.

Challenge 02

Serverless Inference Latency

$ ping api.inference.hf
> 110M Params loading...
> TIMEOUT: Cold-start delay > 12s on sporadic requests.
Optimized

Integrated FastAPI lifespan hooks to preload the model into RAM at startup, cutting cold-start latency by 95%.
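
The load-once-at-startup pattern behind this fix can be sketched with the standard library alone. FastAPI accepts an async context manager of this shape through its `lifespan` parameter, as in `FastAPI(lifespan=lifespan)`. Everything else here is a stand-in: `load_model`, `classify`, `simulate_server`, and the `MODELS` cache are hypothetical names, and no real model is loaded.

```python
import asyncio
from contextlib import asynccontextmanager

MODELS = {}  # process-wide cache, filled once at startup


def load_model(name: str):
    """Hypothetical loader; a real one would read model weights from disk."""
    return {"name": name, "ready": True}


@asynccontextmanager
async def lifespan(app):
    # Startup: runs once, before any request is served.
    MODELS["classifier"] = load_model("distilbert-base-uncased")
    yield
    # Shutdown: release the cached model.
    MODELS.clear()


def classify(text: str) -> str:
    """Request handler body: reuses the cached model instead of reloading it."""
    model = MODELS["classifier"]
    return f"{model['name']} handled: {text}"


async def simulate_server():
    # Stand-in for the web framework entering and leaving the lifespan context.
    async with lifespan(app=None):
        return classify("wifi is entirely dead in conference room B")


if __name__ == "__main__":
    print(asyncio.run(simulate_server()))
```

Because the model is loaded before the first request, no individual request pays the load time. The process still has to stay warm for the cache to help.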

Challenge 03

Colloquial Context Ambiguity

$ process "jira dead"
> UnknownToken: "Hinglish" mix & enterprise jargon.
> Confidence Score: 0.42 [WEAK]
Normalized

Fine-tuned on a Technical Colloquial Corpus to resolve mixed-language slang and organizational jargon.

Conclusion and Future Scope

Conclusion

End-to-End Triage Automation

Successfully transformed unstructured natural language input into structured, actionable service tickets.

High-Precision Intelligence

Achieved accurate categorization and priority assignment using fine-tuned transformer models.

Operational Efficiency

Streamlined workflows with multimodal OCR and semantic duplicate detection, with inference latency under 400 ms.

Future Scope

Adaptive Retraining

Establish continuous learning loops powered by the integrated admin feedback and correction system.

Complex Multimodal Support

Expand vision capabilities to interpret technical diagrams and video-based issue reports.

Autonomous Remediation

Integrate with enterprise APIs to enable the AI to perform direct technical fixes like account resets.

Thank You!

Questions & Answers

Live Tech Demo

helpdeskaiv1.vercel.app
