Available with 3 Months Notice Period — London, UK

Vinay Kumar K V

Data Scientist & AI Engineer
GenAI · Agentic AI · Pharma & Regulated AI

I build production AI systems in regulated environments — and I have been doing it at GSK. Over the past two years I led AI product development processing 500+ regulatory documents monthly, cutting manual review time by 70%. Currently completing a DBA in AI & ML (Walsh College) and PGP at Texas McCombs. Long-term research focus: AI in Robotics & VLA models.


Years IT & consulting experience

500+

Regulatory docs processed monthly by GSK AI system

70%

Reduction in manual review time — production AI at GSK

AI & ML projects with public links

// 01 — About

Building AI that works
in the real world

I am a Data Scientist and AI Engineer with 8+ years of enterprise experience, specialising in end-to-end GenAI and Agentic AI solution delivery in regulated environments. My background spans electronics engineering, enterprise systems, an MBA, and now a doctoral programme in AI & ML — giving me an unusual combination of technical depth, business thinking, and research capability.

At GSK, I led the development of an AI certificate automation system — from architecture selection through RAG pipeline design to production deployment. I ran structured LLM experiments, made the architecture decision between Azure Document Intelligence and GPT-4, and coordinated delivery across 25+ stakeholders in a GxP-regulated environment where errors have compliance consequences.

That experience made one thing clear: I did not want to keep specifying AI systems. I wanted to build them. So I invested deliberately — a DBA in AI & ML at Walsh College (2025–2028) and a PGP in AI & ML at Texas McCombs (2025–2026), running in parallel with hands-on project building.

My long-term research focus through the DBA is AI in Robotics — specifically Vision-Language-Action (VLA) models for autonomous systems in safety-critical environments. The intersection of pharmaceutical regulated AI experience and embodied AI research is where I am heading.

"What I bring that most AI Engineer candidates cannot: eight years of IT and consulting experience, an understanding of how regulated businesses make decisions, the ability to frame the right problem before touching a line of code, and the ability to communicate technical trade-offs to a board-level audience."

ROLE Data Scientist & AI Engineer
GenAI · Agentic AI · Regulated AI

BASED London, United Kingdom
UK Sponsored · Open to relocation

STUDY DBA in AI & ML, Walsh College
PGP AI & ML, Texas McCombs

FOCUS Gen AI, Agentic AI, AI Robotics & VLA Models
Long-term DBA research direction

DOMAIN Pharma · Regulated AI · GxP
6+ years GSK pharmaceutical systems

OPEN Data Scientist · AI Engineer · GenAI roles
Pharma · Regulated industries · Deep tech

// 02 — Skills

Technical Stack

GenAI & Agentic AI

LLMs & Prompt Engineering; RAG Architecture; LangChain; Agentic AI & AI Agents; FAISS · Vector Databases; Gemini API; Azure OpenAI; LangGraph (learning); Multi-Agent Orchestration (learning); HuggingFace Transformers (learning); Fine-tuning (learning)

Core DS & ML

Python; Pandas · NumPy · Scikit-learn; Machine Learning; NLP & Text Classification; EDA & Feature Engineering; XGBoost · Random Forest; Model Evaluation & Benchmarking; Explainable AI (XAI / SHAP); Statistical Analysis; SQL · PostgreSQL; Streamlit · Gradio · FastAPI; Matplotlib · Seaborn

Cloud & MLOps

Azure (OpenAI, Document Intelligence); Git · GitHub · CI/CD; HuggingFace Hub; Docker (learning); MLflow (learning); Azure AI Foundry · Prompt Flow (learning)

Consulting & Delivery

GxP / Regulated AI; Responsible AI & AI Governance; Workshop Facilitation; Business Value Articulation; Stakeholder Management; Solution Roadmaps; Agile · Scrum · Validation

// 03 — Experience

Where I've built things

May 2023 – Present

TCS · GSK plc
London, UK

Data Scientist & Technical Business Analyst

Led end-to-end AI product development for regulatory certificate automation — architecture selection, RAG pipeline design, production deployment, and live monitoring on Azure
Designed and executed structured LLM experimentation framework: comparative evaluation of Azure Document Intelligence vs GPT-4 against defined performance criteria (95%+ accuracy) — made architecture recommendation based on evidence
Built RAG pipeline POC — document chunking, FAISS vector indexing, prompt template design, and factuality evaluation in a GxP-regulated environment
Open-sourced production Python data utilities at GSK: XML digital signature tools and Excel-to-XML converters that help suppliers integrate with GSK applications and processes
Conducted AI adoption workshops with QA, Operations, IT, and Regulatory stakeholders, translating AI capabilities into measurable business outcomes and securing executive buy-in for production implementation of LLM-based PDF extraction
Built Python + Jira API sprint analytics pipeline — automated reporting for 25+ member cross-functional team

Sep 2019 – May 2023

TCS · GSK plc
Bangalore, India

Business Analyst & Scrum Master

Introduced hypothesis-driven delivery — converted vague business requests into structured problem statements with measurable success criteria before committing development resources
Performed data-driven root cause analysis using Azure DevOps data — identifying patterns in sprint velocity, defect rates, and cycle times
Delivered solutions across 8 global pharmaceutical manufacturing sites — 8+ major product releases, 30% efficiency improvement
Managed multiple concurrent roles — Scrum Master, Validation Lead, BA, PM — for cross-functional teams of 8–12

Oct 2015 – Oct 2017

TCS · Telefonica
Chennai, India

Systems Engineer

Maintained 100% SLA compliance on enterprise backup infrastructure across Linux and Windows environments

// 04 — Projects

Things I've shipped

PROJECT 01

AI Doc2XML
Dual-Agent System

Agentic AI system using LangChain to orchestrate two specialised agents, an Extractor and a Reviewer, for pharmaceutical regulatory document processing. Supports both local Ollama models and the cloud-based Anthropic Claude API.

Python · LangChain · Multi-Agent · Local LLM · Gradio

GitHub →
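The Extractor and Reviewer hand-off in AI Doc2XML can be pictured with a minimal sketch. This is not the repository code: the two agents are plain Python stand-ins for the LLM calls, and the names `run_dual_agent`, `stub_extractor`, `stub_reviewer`, `REQUIRED_FIELDS`, and the sample certificate values are all hypothetical, chosen only for illustration.

```python
from typing import Callable, Dict, List, Tuple

Fields = Dict[str, str]

# Hypothetical required fields; a real schema would come from the target XML format.
REQUIRED_FIELDS = ("certificate_no", "expiry_date")


def stub_extractor(document: str, issues: List[str]) -> Fields:
    """Stand-in for the Extractor agent: parse 'Key: value' lines.

    A real extractor would send the document plus the reviewer's issues to an LLM.
    """
    fields: Fields = {}
    for line in document.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip().lower().replace(" ", "_")] = value.strip()
    return fields


def stub_reviewer(document: str, fields: Fields) -> List[str]:
    """Stand-in for the Reviewer agent: report required fields that are missing."""
    return [f"missing field: {name}" for name in REQUIRED_FIELDS if not fields.get(name)]


def run_dual_agent(
    document: str,
    extractor: Callable[[str, List[str]], Fields],
    reviewer: Callable[[str, Fields], List[str]],
    max_rounds: int = 3,
) -> Tuple[Fields, List[str], int]:
    """Alternate extraction and review until the reviewer raises no issues."""
    fields: Fields = {}
    issues: List[str] = []
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        fields = extractor(document, issues)   # extractor sees previous feedback
        issues = reviewer(document, fields)    # reviewer checks the new attempt
        if not issues:
            break
    return fields, issues, rounds


if __name__ == "__main__":
    sample = "Certificate No: ABC-123\nExpiry Date: 2026-01-31"  # made-up values
    print(run_dual_agent(sample, stub_extractor, stub_reviewer))
```

Returning the remaining issues instead of raising keeps the decision about unresolved documents with the caller, which suits a workflow where a human reviews anything the agents cannot settle.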

PROJECT 02

GxP Quality Intelligence
NLP Classifier + XAI

End-to-end NLP classifier for pharmaceutical deviation categorisation (critical/major/minor). Includes TF-IDF feature engineering, Logistic Regression, baseline comparisons, FAISS cosine similarity retrieval, and an Explainable AI layer using SHAP for GxP regulatory audit acceptance.

Python · Scikit-learn · TF-IDF · FAISS · SHAP / XAI · NLP

GitHub →
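The retrieval idea behind GxP Quality Intelligence, TF-IDF vectors compared by cosine similarity, can be sketched in pure Python. This is an illustration only: the project itself uses Scikit-learn and FAISS, whereas this sketch swaps in dictionaries and a brute-force search so it runs without dependencies. The function names and the three deviation texts are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def tokenize(text: str) -> List[str]:
    """Whitespace tokenizer; a real pipeline would also strip punctuation."""
    return text.lower().split()


def vectorize(tokens: List[str], idf: Dict[str, float]) -> Vector:
    """Build a unit-length TF-IDF vector, ignoring terms unseen at fit time."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def fit_tfidf(docs: List[str]) -> Tuple[Dict[str, float], List[Vector]]:
    """Compute smoothed IDF weights, ln((1 + n) / (1 + df)) + 1, and document vectors."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return idf, [vectorize(toks, idf) for toks in tokenized]


def cosine(a: Vector, b: Vector) -> float:
    """Dot product of two unit-length sparse vectors equals their cosine similarity."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def most_similar(query: str, docs: List[str], idf: Dict[str, float],
                 vectors: List[Vector]) -> Tuple[str, float]:
    """Return the stored document closest to the query and its similarity score."""
    q = vectorize(tokenize(query), idf)
    scores = [cosine(q, v) for v in vectors]
    best = max(range(len(docs)), key=scores.__getitem__)
    return docs[best], scores[best]


if __name__ == "__main__":
    deviations = [  # made-up example texts
        "temperature excursion in cold storage room",
        "label misprint on secondary packaging",
        "missing signature on batch record",
    ]
    idf, vectors = fit_tfidf(deviations)
    print(most_similar("cold storage temperature out of range", deviations, idf, vectors))
```

A vector index such as FAISS replaces the brute-force loop in `most_similar` once the number of stored deviations makes exhaustive comparison too slow.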

PROJECT 03

Customer Churn Prediction
Live on HuggingFace

End-to-end ML project addressing class imbalance in churn prediction. Deployed live on HuggingFace Spaces with Streamlit UI and FastAPI backend. Evaluated using F1 and AUC-ROC on minority class — not accuracy — demonstrating correct evaluation methodology for imbalanced datasets.

Python · Scikit-learn · Streamlit · FastAPI · HuggingFace · EDA

Live Demo → GitHub →
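Why the churn project reports minority-class F1 rather than accuracy can be shown with a small worked example. This is a sketch on made-up labels (95 retained customers, 5 churners), not the project's data or code, and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[int, int, int]:
    """Count true positives, false positives, and false negatives for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, fp, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of predictions that match the label, regardless of class."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the chosen (minority) class."""
    tp, fp, fn = confusion_counts(y_true, y_pred, positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up imbalanced labels: 1 = churned, 0 = retained.
    y_true: List[int] = [0] * 95 + [1] * 5

    # A "model" that never predicts churn looks strong on accuracy alone.
    never_churn = [0] * 100
    print("never-churn accuracy:", accuracy(y_true, never_churn))
    print("never-churn F1:", f1_score(y_true, never_churn))

    # A model that finds 4 of 5 churners at the cost of 6 false alarms.
    useful = [1] * 6 + [0] * 89 + [1] * 4 + [0]
    print("useful accuracy:", accuracy(y_true, useful))
    print("useful F1:", f1_score(y_true, useful))
```

The never-churn baseline scores higher on accuracy than the useful model while identifying no churners at all, which is the failure mode minority-class F1 exposes.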

PROJECT 04

Jira Sprint Reporter
Python Automation

Comprehensive sprint reporting tool with automated email delivery and visual analytics. Built using Python and Jira API — eliminates manual data extraction and report generation for cross-functional delivery teams. Deployed live at GSK for a 25+ member team.

Python · Jira API · Data Processing · Automation

GitHub →

COMING SOON

RAG Pipeline
with Evaluation Framework

Production-style RAG application with formal evaluation layer — baseline vs RAG comparison, hallucination rate measurement, chunking strategy experiments (256/512/1024 token), and FAISS retrieval quality analysis. Inspired directly by GSK production work.

LangChain · FAISS · RAG Evaluation · Streamlit
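The chunking strategy experiments planned for this project (256/512/1024 token) could start from a sketch like the one below. It rests on assumptions: whitespace-separated words stand in for model tokens, the overlap of one eighth of the chunk size is an arbitrary illustrative choice, and `chunk_tokens` is a hypothetical helper rather than a LangChain API.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk_tokens(tokens: Sequence[T], chunk_size: int, overlap: int = 0) -> List[List[T]]:
    """Split a token sequence into fixed-size chunks that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks: List[List[T]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reaches the end of the document
    return chunks


if __name__ == "__main__":
    # Made-up document: whitespace words approximate tokens here (an assumption);
    # a real experiment would count tokens with the embedding model's tokenizer.
    words = ("regulatory certificate text " * 700).split()
    for size in (256, 512, 1024):
        chunks = chunk_tokens(words, size, overlap=size // 8)
        print(f"chunk_size={size}: {len(chunks)} chunks")
```

Holding the overlap ratio constant across the three sizes keeps chunk size as the only variable when comparing retrieval quality.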

COMING SOON

LangGraph Multi-Agent
Compliance System

Stateful multi-agent system using LangGraph — three specialised agents (document reader, classifier, compliance reporter) with conditional routing and state management. Direct application of agentic AI to pharmaceutical regulatory workflows.

LangGraph · Multi-Agent · Azure OpenAI · State Management

// 05 — Education

Academic foundation

Doctor of Business Administration — AI & Machine Learning

Walsh College

2025 – 2028

3-year doctoral programme: Year 1 PGP in AI & ML, Year 2 MS in AI & ML, Year 3 Research & Dissertation. Research direction: AI in Robotics — Vision-Language-Action (VLA) models for autonomous systems in safety-critical regulated environments.

Doctoral Research → Gen AI, Agentic AI, AI Robotics & VLA Models

Post Graduate Program — AI & ML, Business Applications

Texas McCombs School of Business

2025 – 2026

Applied AI and ML programme covering machine learning, NLP, deep learning, computer vision, and AI strategy. Running in parallel with DBA — applied technical depth alongside research-level theory.

Running in parallel with DBA

Master of Business Administration

MSRIT, Bangalore

2017 – 2019

Business understanding, ROI thinking, and domain skills.

82nd rank — Karnataka PGCET

Bachelor of Engineering — Electronics & Communication

Sir M Visvesvaraya Institute of Technology

2012 – 2015

Foundation in control systems, sensors, and signal processing

Directly relevant to the robotics research direction.


Let's connect
