Project 01

AI Doc2XML
Dual-Agent System

Containerised · Open Source

An agentic AI system that automates extraction of structured data from pharmaceutical regulatory documents and converts it into XML format. Built using LangChain to orchestrate two specialised agents working in sequence — directly inspired by document processing challenges observed at GSK.

Supports both local Ollama models for privacy-sensitive environments and Azure OpenAI for production-grade accuracy. Containerised with Docker and deployed live on Azure Container Apps via Azure Container Registry.

Manual extraction of structured data from pharmaceutical regulatory certificates is slow, error-prone, and does not scale. Using two specialised agents rather than one general-purpose agent improves accuracy and auditability.

Python · LangChain · Multi-Agent Architecture · Azure OpenAI · Ollama (Local LLM) · Docker · Azure Container Apps · Azure Container Registry · Gradio
Architecture — Agent Flow
01 — Document uploaded via Gradio UI
02 — Extractor Agent: reads document, identifies fields, extracts structured data
03 — Reviewer Agent: validates extraction, flags gaps or anomalies
04 — XML output generated from validated structured data
05 — User downloads XML file
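A minimal sketch of this flow, assuming LangChain's LCEL chaining; ChatOpenAI stands in for the Azure OpenAI / Ollama backends, and the prompts, model name, and field names are illustrative rather than the production configuration:

```python
# Sketch of the extractor -> reviewer hand-off. ChatOpenAI stands in for the
# Azure OpenAI / Ollama backends; prompts and field names are illustrative.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

extractor = (
    ChatPromptTemplate.from_template(
        "Extract product name, batch number and expiry date from this "
        "regulatory certificate as JSON:\n\n{document}"
    )
    | llm
    | StrOutputParser()
)

reviewer = (
    ChatPromptTemplate.from_template(
        "Review this extraction against the source document. Flag missing or "
        "anomalous fields, then return corrected JSON.\n\n"
        "Document:\n{document}\n\nExtraction:\n{extraction}"
    )
    | llm
    | StrOutputParser()
)

def run_pipeline(document: str) -> str:
    """Extract first, then let the reviewer validate before XML generation."""
    draft = extractor.invoke({"document": document})
    return reviewer.invoke({"document": document, "extraction": draft})
```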
Key Technical Decisions
  • Two agents not one: A reviewer agent catches extraction errors before output — more reliable for complex documents.
  • Local + Cloud LLM: Ollama for environments where data cannot leave premises; Azure OpenAI for production accuracy.
  • Docker containerisation: Solved the cross-environment Python dependency conflicts that prevented direct Azure deployment.
  • Azure Container Apps: Serverless scaling — zero cost when idle, scales on demand without managing infrastructure.
What I Learned
  • Dependency management is the biggest practical barrier to Python app deployment — Docker eliminates it
  • Multi-agent systems need clear boundaries — agents that try to do too much become unreliable
  • Environment variable management via Azure secrets is essential for API key security in production
Project 02

GxP Quality Intelligence
NLP Classifier + XAI

GitHub · Notebook

An end-to-end NLP classification system for pharmaceutical deviation reports. Classifies each deviation as critical, major, or minor — a task currently done manually by quality teams. Includes an Explainable AI layer using SHAP to make model decisions auditable for GxP regulatory acceptance.

In a GxP environment, a model that cannot explain its decisions will not pass regulatory review. SHAP values show exactly which words drove the classification — making the model auditable by quality and regulatory teams.

Manual deviation classification is subjective, slow, and inconsistent across reviewers. An automated system with explainability reduces classification time and provides a consistent, auditable decision trail.

Python · Scikit-learn · Logistic Regression · TF-IDF · FAISS · SHAP · Explainable AI · NLP · Pandas · Matplotlib
Pipeline Architecture
01 — Raw deviation text input
02 — Text preprocessing: tokenisation, stopwords, lemmatisation
03 — TF-IDF feature extraction
04 — Logistic Regression classification
05 — FAISS similarity search for related historical deviations
06 — SHAP explainability: word-level decision attribution
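A minimal sketch of the classification and explainability core, shown as a binary critical / non-critical split for brevity (the project classifies three severities); the report texts and labels are invented examples:

```python
# Classification + explainability core. Binary critical / non-critical split
# for brevity; report texts and labels are invented examples.
import shap
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

reports = [
    "temperature excursion in cold storage during shipment",
    "sterility failure detected on the filling line",
    "minor typo on secondary packaging label",
    "delayed signature on batch record page",
]
labels = [1, 1, 0, 0]  # 1 = critical, 0 = non-critical (illustrative)

vectoriser = TfidfVectorizer(stop_words="english")
X = vectoriser.fit_transform(reports).toarray()
clf = LogisticRegression(max_iter=1000).fit(X, labels)

# Word-level attribution: which TF-IDF terms pushed the decision, and how far.
explainer = shap.LinearExplainer(clf, X)
shap_values = explainer.shap_values(X)
terms = vectoriser.get_feature_names_out()
```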
Key Technical Decisions
  • TF-IDF over embeddings: With a small dataset, TF-IDF generalises better than large embedding models that would overfit.
  • Logistic Regression over XGBoost: More interpretable and works well with TF-IDF sparse features — explainability was the priority.
  • SHAP for XAI: Shows which specific words drove each classification — essential for GxP audit acceptance.
  • FAISS retrieval: Surfaces similar historical deviations to give reviewers context alongside the classification.
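The FAISS retrieval step might look like the sketch below; random vectors stand in for the real vectorised reports so the snippet runs on its own:

```python
# FAISS lookup for similar historical deviations. Random vectors stand in for
# the real vectorised reports so this runs standalone.
import faiss
import numpy as np

vectors = np.random.rand(100, 256).astype(np.float32)  # 100 historical deviations
faiss.normalize_L2(vectors)  # unit vectors: inner product == cosine similarity

index = faiss.IndexFlatIP(vectors.shape[1])
index.add(vectors)

query = vectors[:1]                          # the "new" deviation to contextualise
scores, neighbours = index.search(query, 3)  # top-3 most similar historical cases
```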
Why Explainability Matters in GxP
  • GxP regulations require documented evidence for decisions affecting product quality
  • A black-box model cannot be validated under GAMP 5 principles
  • SHAP word-level attribution provides the audit trail regulators require
  • Human-in-the-loop design — model assists, quality expert approves
Project 03

Customer Churn Prediction
End-to-End ML

Live on HuggingFace · Open Source

A complete end-to-end machine learning project predicting customer churn probability. The core challenge — and the most important technical decision — is handling the severe class imbalance in churn datasets where non-churners vastly outnumber churners.

Most naive approaches achieve high accuracy by predicting everything as non-churn. This project evaluates on F1 score and AUC-ROC on the minority class — the correct methodology for imbalanced classification problems.

Python · Scikit-learn · Pandas · Streamlit · FastAPI · HuggingFace Spaces · SMOTE · EDA · Git
ML Pipeline
01 — EDA: distributions, correlations, missing values, class balance
02 — Feature engineering: encoding, scaling, feature selection
03 — Class imbalance handling: SMOTE oversampling
04 — Model training: multiple classifiers compared
05 — Evaluation: F1 and AUC-ROC on the minority class
06 — FastAPI serving + Streamlit UI + HuggingFace deployment
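A minimal sketch of steps 03 to 05 on a synthetic imbalanced dataset; RandomForest is a stand-in since the project compares several classifiers. The key detail is that SMOTE touches only the training split:

```python
# Steps 03-05 on a synthetic imbalanced dataset (~95/5 split). RandomForest is
# a stand-in; the project compares multiple classifiers.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.95], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=42
)

# SMOTE on the training split only - oversampling before the split would leak
# synthetic minority samples into the evaluation.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

model = RandomForestClassifier(random_state=42).fit(X_res, y_res)
proba = model.predict_proba(X_test)[:, 1]

print("F1 (churn class):", f1_score(y_test, model.predict(X_test), pos_label=1))
print("AUC-ROC:", roc_auc_score(y_test, proba))
```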
Key Technical Decisions
  • SMOTE not class weights: Minority class too small for weights alone — SMOTE generates synthetic minority samples for better balance.
  • F1 and AUC-ROC not accuracy: A model predicting all non-churn achieves 95% accuracy but is completely useless.
  • FastAPI backend: Separates model serving from UI — API can be consumed independently of the Streamlit frontend.
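A stripped-down sketch of that serving split; the model path and feature set are placeholders, not the deployed schema:

```python
# Minimal FastAPI serving layer, decoupled from the Streamlit UI.
# Model path and feature set are placeholders.
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("churn_model.joblib")  # placeholder artifact

class Customer(BaseModel):
    tenure: float
    monthly_charges: float
    total_charges: float

@app.post("/predict")
def predict(customer: Customer) -> dict:
    row = [[customer.tenure, customer.monthly_charges, customer.total_charges]]
    return {"churn_probability": float(model.predict_proba(row)[0][1])}
```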
Project 04

Jira Sprint Reporter
Python Automation

Deployed at GSK · Open Source

A Python automation tool connecting to the Jira API, extracting sprint data, performing analytics, generating visual reports, and delivering them automatically via email. Built to eliminate manual sprint reporting for a 25+ member cross-functional team at GSK.

Python · Jira REST API · Pandas · Matplotlib · Automation · SMTP Email · Git
How It Works
01 — Connects to the Jira REST API with authentication
02 — Extracts sprint issues, statuses, assignees, story points
03 — Pandas transformation: velocity, completion rate, blockers
04 — Matplotlib visualisations: burndown, status breakdown, velocity trend
05 — HTML report generated and delivered via automated email
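A sketch of the extraction step against Jira's standard search endpoint; the base URL, JQL, and credentials are placeholders, and story points (an instance-specific custom field) are omitted:

```python
# Extraction step against Jira's search endpoint. Base URL, JQL and credentials
# are placeholders; story points live in an instance-specific custom field.
import os
import pandas as pd
import requests

BASE = "https://your-domain.atlassian.net"
auth = (os.environ["JIRA_USER"], os.environ["JIRA_API_TOKEN"])

resp = requests.get(
    f"{BASE}/rest/api/2/search",
    params={"jql": "sprint in openSprints()", "maxResults": 100},
    auth=auth,
    timeout=30,
)
resp.raise_for_status()

df = pd.DataFrame(
    [
        {
            "key": issue["key"],
            "status": issue["fields"]["status"]["name"],
            "assignee": (issue["fields"]["assignee"] or {}).get("displayName"),
        }
        for issue in resp.json()["issues"]
    ]
)
completion_rate = (df["status"] == "Done").mean()
```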
Business Impact
  • Eliminated hours of manual data extraction every sprint
  • Consistent report format across all sprints — no human variation
  • Stakeholders receive reports automatically without chasing anyone
  • Deployed and used live at GSK for 25+ member team
Research Project 05 — In Progress

VATSA
Unified Five-Modality AI Architecture

V-Module Complete · Preprint Published · Open Source

A unified five-modality AI architecture for human-level perception and action — Video, Audio, Text, Sensory, Action. Each modality encoder projects into a shared 512-dimensional latent space for cross-modal fusion. Long-term mission: a safe embodied AI that can operate alongside humans.

The Visual Module (V-Module) is complete. Audio, Text, Sensory, and Action modules are in the roadmap across the DBA research timeline (2025–2028). Architecture published as a preprint on Zenodo, April 2026.

EfficientNet-B0 trained with three-stage transfer learning on CIFAR-10: 79% with the backbone frozen → 94% after fine-tuning → 96.31% after a deep unfreeze (4 layers, 40 epochs). Integrated with YOLOv8 for real-time object detection at 22 FPS on a live stream, and generates 512-dim embeddings at 1,336 embeddings/sec at batch size 16. GPU footprint: 63.7 MB.
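A sketch of how the staged unfreezing might be wired in PyTorch; the stage boundaries, hyperparameters, and the 512-dim projection head are illustrative, not the trained recipe:

```python
# Staged unfreezing on EfficientNet-B0. Stage boundaries, hyperparameters and
# the 512-dim projection head are illustrative, not the trained recipe.
import torch.nn as nn
from torchvision.models import efficientnet_b0, EfficientNet_B0_Weights

model = efficientnet_b0(weights=EfficientNet_B0_Weights.IMAGENET1K_V1)
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 10)  # CIFAR-10

def set_stage(stage: int) -> None:
    """Stage 1: head only. Stage 2: last blocks. Stage 3: deep unfreeze."""
    for p in model.parameters():
        p.requires_grad = False
    for p in model.classifier.parameters():
        p.requires_grad = True
    n_blocks = {1: 0, 2: 2, 3: 4}[stage]  # "4 layers" at the deep-unfreeze stage
    if n_blocks:
        for p in model.features[-n_blocks:].parameters():
            p.requires_grad = True

set_stage(1)  # retrain at each stage: set_stage(2), then set_stage(3)

# 512-dim embeddings via a projection head on the 1280-dim pooled features.
embedder = nn.Sequential(
    model.features, model.avgpool, nn.Flatten(), nn.Linear(1280, 512)
)
```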

SAMOS — Safety-Aware Multi-Output Selector

A proposed novel output routing mechanism for safe parallel multi-modal output generation in physically embodied AI systems. Instead of a softmax that forces one winner, SAMOS uses independent sigmoid activations per output head — allowing Text, Audio, Action, Feeling, and Video outputs to activate simultaneously when appropriate.

Three core components: (1) Learnable per-modality thresholds — not fixed at 0.5, learned from safety-weighted loss; (2) Asymmetric safety-weighted loss function — false activation of the Action head (physical harm risk) is penalised far more heavily than false activation of the Feeling head; (3) Uncertainty-aware gating — when uncertain, the system defaults to the safer option per modality.
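An illustrative reading of those three components in PyTorch; the penalty weights, threshold initialisation, and gating rule are hypothetical, since SAMOS is still a proposal:

```python
# One reading of the three SAMOS components. Penalty weights, threshold
# initialisation and the gating rule are hypothetical - SAMOS is a proposal.
import torch
import torch.nn as nn

HEADS = ["text", "audio", "action", "feeling", "video"]
# Asymmetric cost of *false activation* per head; Action dominates (harm risk).
FALSE_ACT_WEIGHT = torch.tensor([1.0, 1.0, 10.0, 0.5, 1.0])

class SAMOS(nn.Module):
    def __init__(self, latent_dim: int = 512):
        super().__init__()
        self.gates = nn.Linear(latent_dim, len(HEADS))
        # Learnable per-modality thresholds, initialised at 0.5 but trained.
        self.thresholds = nn.Parameter(torch.full((len(HEADS),), 0.5))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # Independent sigmoids, not softmax: heads can fire simultaneously.
        return torch.sigmoid(self.gates(z))

    def activate(self, probs: torch.Tensor) -> torch.Tensor:
        # Hard gate: a probability below the learned threshold stays off,
        # the safer default for physically consequential heads.
        return probs > self.thresholds

def safety_weighted_loss(probs: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """BCE where false activations are penalised per modality."""
    bce = nn.functional.binary_cross_entropy(probs, target, reduction="none")
    weights = torch.where(target == 0, FALSE_ACT_WEIGHT, torch.ones_like(FALSE_ACT_WEIGHT))
    return (weights * bce).mean()
```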

The ethical principle — the robot must never harm a human, even accidentally, even when uncertain — is encoded directly into the mathematical architecture, not bolted on as a filter afterward.

PyTorch · EfficientNet-B0 · Transfer Learning · YOLOv8 · Computer Vision · Multimodal AI · Embeddings · Mixed Precision · SAMOS · VLA Models
V-Module Performance
  • 96.31% — CIFAR-10 Accuracy
  • 22 FPS — Real-time YOLOv8 Stream
  • 1,336 — Embeddings/sec @ Batch 16
  • 63.7 MB — GPU Footprint
Full VATSA Pipeline (Target Architecture)
IN — Video · Audio · Text · Sensory inputs
01 — Five modality encoders → 512-dim embeddings each
02 — Cross-modal fusion transformer
03 — Unified situational representation
04 — SAMOS: Safety-Aware Multi-Output Selector
OUT — Text · Audio · Action · Feeling · Video (parallel, async)
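One plausible shape for the fusion stage, sketched under the stated assumptions (five encoders emitting into a shared 512-dim space); none of this reflects a trained implementation:

```python
# One plausible shape for the fusion stage: five per-modality tokens in a
# shared 512-dim space, fused by a small transformer. Purely illustrative.
import torch
import torch.nn as nn

class VATSAFusion(nn.Module):
    def __init__(self, dim: int = 512, n_modalities: int = 5):
        super().__init__()
        self.modality_emb = nn.Parameter(torch.randn(n_modalities, dim) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, 5, 512), one embedding per modality encoder
        fused = self.fusion(tokens + self.modality_emb)
        return fused.mean(dim=1)  # unified situational representation -> SAMOS

z = VATSAFusion()(torch.randn(2, 5, 512))
```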
Research Roadmap
  • 2025–2026 (Year 1): Complete Audio and Text encoders. Medical RAG assistant as an applied GenAI project.
  • 2026–2027 (Year 2): Sensory encoder. Cross-modal fusion transformer. SAMOS prototype.
  • 2027–2028 (Year 3): Action module. VLA integration. DBA dissertation on VATSA and SAMOS in safety-critical environments.
Open Research Problems
  • Can SAMOS thresholds adapt dynamically based on environment risk level?
  • Can output heads coordinate naturally through the shared latent space without explicit synchronisation?
  • How should the feeling output head be formally defined — affective state vector, physiological simulation, or social signal generator?
// Coming Soon
Project 06 — In Progress
Medical Assistant RAG

A RAG-based medical assistant built on open-source LLMs (Mistral / Llama via Ollama) and public medical datasets (PubMedQA / MedQuAD). Applying embeddings and transformer knowledge from the PGP programme to build and deploy a live clinical Q&A assistant on HuggingFace Spaces. Framed as a pharmaceutical/clinical document assistant — directly connecting GSK regulated AI experience to hands-on GenAI engineering.
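A minimal RAG sketch under assumed tooling: sentence-transformers for embeddings, FAISS for retrieval, and Mistral served locally through the ollama client; the documents and model names are placeholders:

```python
# RAG sketch under assumed tooling: sentence-transformers embeddings, FAISS
# retrieval, and Mistral via the local ollama client. Documents are invented.
import faiss
import numpy as np
import ollama
from sentence_transformers import SentenceTransformer

docs = [
    "Metformin is a first-line treatment for type 2 diabetes.",
    "ACE inhibitors are commonly used to manage hypertension.",
]
embedder = SentenceTransformer("all-MiniLM-L6-v2")
vecs = embedder.encode(docs, normalize_embeddings=True).astype(np.float32)

index = faiss.IndexFlatIP(vecs.shape[1])
index.add(vecs)

def answer(question: str) -> str:
    q = embedder.encode([question], normalize_embeddings=True).astype(np.float32)
    _, idx = index.search(q, 1)
    context = docs[idx[0][0]]
    reply = ollama.chat(model="mistral", messages=[{
        "role": "user",
        "content": f"Answer using only this context:\n{context}\n\nQ: {question}",
    }])
    return reply["message"]["content"]
```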

LangChain · RAG · Mistral / Llama · PubMedQA · FAISS · HuggingFace
Project 07 — Planned
LangGraph Multi-Agent Compliance System

Stateful multi-agent system using LangGraph — three specialised agents (document reader, classifier, compliance reporter) with conditional routing and state management. Direct application of agentic AI to pharmaceutical regulatory compliance workflows. Extension of the AI Doc2XML architecture into a more complex stateful pipeline.
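A sketch of how the planned three-agent graph could be expressed in LangGraph; the node logic is stubbed and the routing condition is invented:

```python
# Planned three-agent graph with conditional routing. Node logic is stubbed;
# state keys and the routing condition are placeholders.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class ComplianceState(TypedDict):
    document: str
    category: str
    report: str

def read_document(state: ComplianceState) -> dict:
    return {"document": state["document"].strip()}

def classify(state: ComplianceState) -> dict:
    is_gxp = "deviation" in state["document"].lower()
    return {"category": "gxp" if is_gxp else "non_gxp"}

def write_report(state: ComplianceState) -> dict:
    return {"report": f"Compliance report for {state['category']} document."}

graph = StateGraph(ComplianceState)
graph.add_node("reader", read_document)
graph.add_node("classifier", classify)
graph.add_node("reporter", write_report)
graph.set_entry_point("reader")
graph.add_edge("reader", "classifier")
# Conditional routing: only GxP-relevant documents reach the reporter.
graph.add_conditional_edges(
    "classifier",
    lambda s: s["category"],
    {"gxp": "reporter", "non_gxp": END},
)
graph.add_edge("reporter", END)
app = graph.compile()

result = app.invoke(
    {"document": "Deviation: temperature excursion", "category": "", "report": ""}
)
```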

LangGraph · Multi-Agent · Azure OpenAI · State Management