Available with 3 Months Notice Period — London, UK

Vinay Kumar K V

Data Scientist & AI Engineer
GenAI · Agentic AI · Pharma & Regulated AI

I build production AI systems in regulated environments, most recently at GSK. Over the past two years I led development of an AI product that processes 500+ regulatory documents monthly and cut manual review time by 70%. I am currently completing a DBA in AI & ML (Walsh College) and a PGP in AI & ML (Texas McCombs). Long-term research focus: AI in Robotics & VLA models.

8+
Years IT & consulting experience
500+
Regulatory docs processed monthly by GSK AI system
70%
Reduction in manual review time — production AI at GSK
3
AI & ML projects with public links
// 01 — About

Building AI that works
in the real world

I am a Data Scientist and AI Engineer with 8+ years of enterprise experience, specialising in end-to-end GenAI and Agentic AI solution delivery in regulated environments. My background spans electronics engineering, enterprise systems, an MBA, and now a doctoral programme in AI & ML — giving me an unusual combination of technical depth, business thinking, and research capability.

At GSK, I led the development of an AI certificate automation system — from architecture selection through RAG pipeline design to production deployment. I ran structured LLM experiments, made the architecture decision between Azure Document Intelligence and GPT-4, and coordinated delivery across 25+ stakeholders in a GxP-regulated environment where errors have compliance consequences.

That experience made one thing clear: I did not want to keep specifying AI systems. I wanted to build them. So I invested deliberately — a DBA in AI & ML at Walsh College (2025–2028) and a PGP in AI & ML at Texas McCombs (2025–2026), running in parallel with hands-on project building.

My long-term research focus through the DBA is AI in Robotics — specifically Vision-Language-Action (VLA) models for autonomous systems in safety-critical environments. The intersection of pharmaceutical regulated AI experience and embodied AI research is where I am heading.

"What I bring that most AI Engineer candidates cannot: eight years of IT experience, an understanding of how regulated businesses make decisions, the discipline to frame the right problem before touching a line of code, and the ability to communicate technical trade-offs to a board-level audience."

ROLE Data Scientist & AI Engineer
GenAI · Agentic AI · Regulated AI
BASED London, United Kingdom
UK Sponsored · Open to relocation
STUDY DBA in AI & ML (Walsh College)
PGP in AI & ML (Texas McCombs)
FOCUS GenAI, Agentic AI, AI Robotics & VLA Models
Long-term DBA research direction
DOMAIN Pharma · Regulated AI · GxP
6+ years GSK pharmaceutical systems
OPEN Data Scientist · AI Engineer · GenAI roles
Pharma · Regulated industries · Deep tech
// 02 — Skills

Technical Stack

GenAI & Agentic AI
LLMs & Prompt Engineering | RAG Architecture | LangChain | Agentic AI & AI Agents | FAISS · Vector Databases | Gemini API | Azure OpenAI | LangGraph (learning) | Multi-Agent Orchestration (learning) | HuggingFace Transformers (learning) | Fine-tuning (learning)
Core DS & ML
Python | Pandas · NumPy · Scikit-learn | Machine Learning | NLP & Text Classification | EDA & Feature Engineering | XGBoost · Random Forest | Model Evaluation & Benchmarking | Explainable AI (XAI / SHAP) | Statistical Analysis | SQL · PostgreSQL | Streamlit · Gradio · FastAPI | Matplotlib · Seaborn
Cloud & MLOps
Azure (OpenAI, Document Intelligence) | Git · GitHub · CI/CD | HuggingFace Hub | Docker (learning) | MLflow (learning) | Azure AI Foundry · Prompt Flow (learning)
Consulting & Delivery
GxP / Regulated AI | Responsible AI & AI Governance | Workshop Facilitation | Business Value Articulation | Stakeholder Management | Solution Roadmaps | Agile · Scrum · Validation
// 03 — Experience

Where I've built things

May 2023 – Present
TCS · GSK plc
London, UK
Data Scientist & Technical Business Analyst
  • Led end-to-end AI product development for regulatory certificate automation — architecture selection, RAG pipeline design, production deployment, and live monitoring on Azure
  • Designed and executed structured LLM experimentation framework: comparative evaluation of Azure Document Intelligence vs GPT-4 against defined performance criteria (95%+ accuracy) — made architecture recommendation based on evidence
  • Built RAG pipeline POC — document chunking, FAISS vector indexing, prompt template design, and factuality evaluation in a GxP-regulated environment
  • Open-sourced production Python data utilities at GSK: XML digital signature tools and Excel-to-XML converters that help suppliers integrate with GSK applications and processes
  • Conducted AI adoption workshops with QA, Operations, IT, and Regulatory stakeholders, translating AI capabilities into measurable business outcomes and securing executive buy-in for production implementation of LLM-based PDF extraction
  • Built Python + Jira API sprint analytics pipeline — automated reporting for 25+ member cross-functional team
Sep 2019 – May 2023
TCS · GSK plc
Bangalore, India
Business Analyst & Scrum Master
  • Introduced hypothesis-driven delivery — converted vague business requests into structured problem statements with measurable success criteria before committing development resources
  • Performed data-driven root cause analysis using Azure DevOps data — identifying patterns in sprint velocity, defect rates, and cycle times
  • Delivered solutions across 8 global pharmaceutical manufacturing sites — 8+ major product releases, 30% efficiency improvement
  • Managed multiple concurrent roles — Scrum Master, Validation Lead, BA, PM — for cross-functional teams of 8–12
Oct 2015 – Oct 2017
TCS · Telefonica
Chennai, India
Systems Engineer
  • Maintained 100% SLA compliance on enterprise backup infrastructure across Linux and Windows environments
// 04 — Projects

Things I've shipped

PROJECT 01
AI Doc2XML
Dual-Agent System

Agentic AI system using LangChain to orchestrate two specialised agents (an Extractor and a Reviewer) for pharmaceutical regulatory document processing. Supports both local Ollama models and the cloud-based Anthropic Claude API.

Python | LangChain | Multi-Agent | Local LLM | Gradio
PROJECT 02
GxP Quality Intelligence
NLP Classifier + XAI

End-to-end NLP classifier for pharmaceutical deviation categorisation (critical/major/minor). Includes TF-IDF feature engineering, Logistic Regression, baseline comparisons, FAISS cosine similarity retrieval, and an Explainable AI layer using SHAP for GxP regulatory audit acceptance.

Python | Scikit-learn | TF-IDF | FAISS | SHAP / XAI | NLP
PROJECT 03
Customer Churn Prediction
Live on HuggingFace

End-to-end ML project addressing class imbalance in churn prediction. Deployed live on HuggingFace Spaces with Streamlit UI and FastAPI backend. Evaluated using F1 and AUC-ROC on minority class — not accuracy — demonstrating correct evaluation methodology for imbalanced datasets.

Python | Scikit-learn | Streamlit | FastAPI | HuggingFace | EDA
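
To illustrate why accuracy is the wrong headline metric for this kind of problem, here is a minimal sketch in plain Python. It is not the project's code: the 90/10 class split, both sets of predictions, and the helper name `minority_class_metrics` are made up for the example.

```python
def minority_class_metrics(y_true, y_pred, positive=1):
    """Return overall accuracy plus precision, recall and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy data (hypothetical): 90 retained customers (0) and 10 churners (1).
y_true = [0] * 90 + [1] * 10

# Baseline that never predicts churn.
always_retained = [0] * 100

# A model that raises 4 false alarms and catches 6 of the 10 churners.
y_pred = [1] * 4 + [0] * 86 + [1] * 6 + [0] * 4

baseline = minority_class_metrics(y_true, always_retained)
model = minority_class_metrics(y_true, y_pred)
print("baseline:", baseline)
print("model:   ", model)
```

On this toy data the never-churn baseline scores 90% accuracy with an F1 of 0 on churners, while the second model reaches an F1 of 0.6 at 92% accuracy, so accuracy barely separates the two.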
PROJECT 04
Jira Sprint Reporter
Python Automation

Comprehensive sprint reporting tool with automated email delivery and visual analytics. Built using Python and Jira API — eliminates manual data extraction and report generation for cross-functional delivery teams. Deployed live at GSK for a 25+ member team.

Python | Jira API | Data Processing | Automation
COMING SOON
RAG Pipeline
with Evaluation Framework

Production-style RAG application with formal evaluation layer — baseline vs RAG comparison, hallucination rate measurement, chunking strategy experiments (256/512/1024 token), and FAISS retrieval quality analysis. Inspired directly by GSK production work.

LangChain | FAISS | RAG | Evaluation | Streamlit
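
The chunking experiment can be pictured with a small sketch. It assumes whitespace-separated words stand in for tokens and that consecutive chunks overlap by a fixed amount; the function name `chunk_tokens`, the overlap of 32, and the synthetic document are illustrative only and are not taken from the GSK work or from this planned project.

```python
def chunk_tokens(tokens, chunk_size, overlap=0):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Hypothetical 2,000-"token" document; a real run would use a model tokenizer.
doc = ("lorem " * 2000).split()
for size in (256, 512, 1024):
    chunks = chunk_tokens(doc, size, overlap=32)
    print(f"chunk_size={size}: {len(chunks)} chunks")
```

Smaller chunks give the retriever more, narrower candidates; larger chunks keep more context per hit. Comparing retrieval quality across the three sizes is the point of the experiment.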
COMING SOON
LangGraph Multi-Agent
Compliance System

Stateful multi-agent system using LangGraph — three specialised agents (document reader, classifier, compliance reporter) with conditional routing and state management. Direct application of agentic AI to pharmaceutical regulatory workflows.

LangGraph | Multi-Agent | Azure OpenAI | State Management
// 05 — Education

Academic foundation

Master of Business Administration
MSRIT, Bangalore
2017 – 2019
Business understanding, ROI thinking, and domain skills.
82nd rank — Karnataka PGCET
Bachelor of Engineering — Electronics & Communication
Sir M Visvesvaraya Institute of Technology
2012 – 2015
Foundation in control systems, sensors, and signal processing
Directly relevant to the robotics research direction.
// 06 — Contact

Let's connect

Open to Data Scientist, AI Engineer, and GenAI roles in pharma, regulated industries, and deep tech. Available in approximately 6 months.

Email
vinaykumar.kv@outlook.com
LinkedIn
vinay-kumar-k-v
GitHub
vinaykumarkv