Data Scientist & AI Engineer
GenAI · Agentic AI · Pharma & Regulated AI
I build production AI systems in regulated environments, and I have been doing it at GSK. Over the past two years I led AI product development processing 500+ regulatory documents monthly, cutting manual review time by 70%. Currently completing a DBA in AI & ML (Walsh College) and a PGP in AI & ML at Texas McCombs. Long-term research focus: AI in Robotics and VLA models.
I am a Data Scientist and AI Engineer with 8+ years of enterprise experience, specialising in end-to-end GenAI and Agentic AI solution delivery in regulated environments. My background spans electronics engineering, enterprise systems, an MBA, and now a doctoral programme in AI & ML — giving me an unusual combination of technical depth, business thinking, and research capability.
At GSK, I led the development of an AI certificate automation system — from architecture selection through RAG pipeline design to production deployment. I ran structured LLM experiments, made the architecture decision between Azure Document Intelligence and GPT-4, and coordinated delivery across 25+ stakeholders in a GxP-regulated environment where errors have compliance consequences.
That experience made one thing clear: I did not want to keep specifying AI systems. I wanted to build them. So I invested deliberately — a DBA in AI & ML at Walsh College (2025–2028) and a PGP in AI & ML at Texas McCombs (2025–2026), running in parallel with hands-on project building.
My long-term research focus through the DBA is AI in Robotics — specifically Vision-Language-Action (VLA) models for autonomous systems in safety-critical environments. The intersection of pharmaceutical regulated AI experience and embodied AI research is where I am heading.
"What I bring that most AI Engineer candidates cannot: eight years in enterprise IT, an understanding of how regulated businesses make decisions, the discipline to frame the right problem before touching a line of code, and the ability to communicate technical trade-offs to a board-level audience."
Agentic AI system using LangChain to orchestrate two specialised agents — an Extractor and a Reviewer — for pharmaceutical regulatory document processing. Supports both local Ollama models and cloud-based Anthropic Claude API.
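The Extractor and Reviewer hand-off can be sketched in plain Python; the project itself uses LangChain with Ollama or the Anthropic Claude API, so the rule-based function bodies below are illustrative stand-ins for the LLM calls, and the field names are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    document: str
    extracted: dict = field(default_factory=dict)
    review_notes: list = field(default_factory=list)
    approved: bool = False

def extractor(state: AgentState) -> AgentState:
    # Stand-in for an LLM call that pulls structured fields from the document.
    for line in state.document.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            state.extracted[key.strip()] = value.strip()
    return state

def reviewer(state: AgentState) -> AgentState:
    # Stand-in for a second LLM pass that checks the extraction for completeness.
    required = {"batch_id", "expiry_date"}  # hypothetical required fields
    missing = required - state.extracted.keys()
    if missing:
        state.review_notes.append(f"missing fields: {sorted(missing)}")
    state.approved = not missing
    return state

def run_pipeline(document: str) -> AgentState:
    # Sequential orchestration: extract first, then review the extraction.
    return reviewer(extractor(AgentState(document=document)))
```

The separation matters in a GxP context: the Reviewer never trusts the Extractor's output, so every document gets an independent completeness check before approval.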
End-to-end NLP classifier for pharmaceutical deviation categorisation (critical/major/minor). Includes TF-IDF feature engineering, Logistic Regression, baseline comparisons, FAISS cosine similarity retrieval, and an Explainable AI layer using SHAP for GxP regulatory audit acceptance.
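The retrieval step can be sketched from scratch; the project uses scikit-learn and FAISS in practice, but the arithmetic behind TF-IDF weighting and cosine similarity is simply this:

```python
import math
from collections import Counter

def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    tokenised = [doc.lower().split() for doc in docs]
    n = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        # Term frequency scaled by inverse document frequency.
        vectors.append({
            term: (count / len(tokens)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return vectors

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0
```

Deviation records sharing distinctive vocabulary score higher than unrelated ones, which is the basis for retrieving similar historical deviations at classification time.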
End-to-end ML project addressing class imbalance in churn prediction. Deployed live on HuggingFace Spaces with a Streamlit UI and FastAPI backend. Evaluated using F1 and AUC-ROC on the minority class rather than accuracy, demonstrating correct evaluation methodology for imbalanced datasets.
Comprehensive sprint reporting tool with automated email delivery and visual analytics. Built using Python and Jira API — eliminates manual data extraction and report generation for cross-functional delivery teams. Deployed live at GSK for a 25+ member team.
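A sketch of the aggregation step only; the deployed tool fetches issues via the Jira REST API and emails the rendered report, and the field names below mirror typical Jira issue JSON but are illustrative (real story points live under an instance-specific custom field):

```python
from collections import Counter

def sprint_summary(issues: list[dict]) -> dict:
    # Count issues per workflow status and total the points already delivered.
    by_status = Counter(i["fields"]["status"]["name"] for i in issues)
    points = sum(i["fields"].get("story_points", 0)
                 for i in issues if i["fields"]["status"]["name"] == "Done")
    return {"by_status": dict(by_status), "points_done": points}
```

This replaces the manual step of exporting the sprint board to a spreadsheet and tallying it by hand.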
Production-style RAG application with a formal evaluation layer: baseline vs RAG comparison, hallucination rate measurement, chunking strategy experiments (256/512/1024 tokens), and FAISS retrieval quality analysis. Inspired directly by GSK production work.
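The chunking experiment reduces to one function; whitespace words stand in here for real tokenizer output, and the overlap value is illustrative:

```python
def chunk(text: str, size: int, overlap: int = 32) -> list[list[str]]:
    # Sliding window over the token stream: each chunk repeats the last
    # `overlap` tokens of the previous one so no boundary context is lost.
    tokens = text.split()
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]

corpus = " ".join(f"t{i}" for i in range(2000))  # synthetic 2000-token corpus
variants = {size: chunk(corpus, size) for size in (256, 512, 1024)}
```

Indexing each variant separately and comparing retrieval hit rates per chunk size is what turns "chunking strategy" from a guess into a measured decision.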
Stateful multi-agent system using LangGraph — three specialised agents (document reader, classifier, compliance reporter) with conditional routing and state management. Direct application of agentic AI to pharmaceutical regulatory workflows.
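The routing pattern, stripped of LangGraph itself into plain Python: node names follow the description above, the shared dict plays the role of graph state, and the rule-based bodies are placeholders for LLM calls:

```python
def document_reader(state: dict) -> dict:
    state["text"] = state["raw"].strip().lower()
    return state

def classifier(state: dict) -> dict:
    state["category"] = "deviation" if "deviation" in state["text"] else "routine"
    return state

def compliance_reporter(state: dict) -> dict:
    state["report"] = f"COMPLIANCE REVIEW REQUIRED: {state['text'][:40]}"
    return state

def route_after_classify(state: dict) -> str:
    # Conditional edge: only deviations proceed to the compliance reporter.
    return "compliance_reporter" if state["category"] == "deviation" else "END"

def run_graph(raw: str) -> dict:
    state = classifier(document_reader({"raw": raw}))
    if route_after_classify(state) == "compliance_reporter":
        state = compliance_reporter(state)
    return state
```

In LangGraph proper the same shape is expressed with `StateGraph`, `add_node`, and `add_conditional_edges`; the value of the framework is that the routing logic and state schema stay explicit and testable.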
Open to Data Scientist, AI Engineer, and GenAI roles in pharma, regulated industries, and deep tech. Available in approximately 6 months.