riyanshi bohra :)

about me

"without data, you're just another person with an opinion." - w. edwards deming

i'm a data scientist and ml engineer with a master's in data science from the university of arizona. my work spans healthcare and fintech, building ml pipelines, llm-powered systems, and ai-driven solutions that people can actually rely on.

i specialize in predictive analytics and machine learning, with deep experience in python, pytorch, sql, langgraph, and aws cloud. from neural networks for computer vision to retrieval-augmented generation for intelligent ai systems, i care about the last mile, where a good model becomes a dependable system.

tunekit.app went viral | 130+ github stars & counting

ms in data science @ univ of arizona '25

3 open-source tools. all live. 1 year.

onsetlab.app | published on pypi.

built an on-device ai browser tool. team at google noticed.

2 peer-reviewed publications in international journals

data engineer @ ibexlabs

fine-tuning language models on-device

pytorch · langchain · langgraph

agentic systems that actually work

ml for healthcare & fintech

rag pipelines in production

on-device inference, no cloud required

open-source contributor

3+ years academic research

15+ projects completed

8+ certifications

2,600+ contributions in the last year

the journey

scroll through the years.

2019

Jul'19

Manipal University Jaipur

Bachelor's in Information Technology · Minor in Data Science

Where it all started

2022

Jul'22 - Sep'22

PricewaterhouseCoopers

Data Science Intern (Technology Consulting SBU)

Built predictive models for pharma supply chains
Shipped a model to production in a 3-month internship
$75K in annual savings · 32% reduction in manufacturing variability
Presented findings to senior leadership · Secured company-wide rollout

2023

Jan'23 - May'23

University of Florida

Exchange Student - Semester Abroad

Perfect 4.0 GPA
Senior Certificate Program
Selected on merit (top 1% of applicants)

May'23

Manipal University Jaipur

Graduated with a B.Tech

Top 3% of a batch of ~200
Published sentiment analysis research at ICCIS 2022
Dean's List: perfect 10/10 GPA in final year (2 semesters)

Aug'23

University of Arizona

Started MS in Data Science

Selected to present at the first MS DS Lightning Talks in my first semester
Took courses in AI, ML, and Data Mining

2024

Jan'24 - Jun'24

Zuckerman College of Public Health

Applied ML Engineer (Research)

Led geospatial research on urban heat equity & shade access
Presented findings at the Southwest Urban Integrated Field Laboratory (SW-IFL) conference
One of the few grad roles where the work left the lab
Co-authored paper published in Springer Nature, December 2025

2024

University of Arizona

MS in Data Science (Year 2)

Selected from 100+ students to represent department at Grace Hopper Conference '24
Featured on the College of Information Science website
Presented research on AI-driven distracted driving detection at iShowcase 2024.
MSDS Lightning Talks presenter (2nd consecutive year)

2025

May'25

University of Arizona

MS in Data Science, Graduated

CGPA: 3.8/4.0
Recipient of the MSDS Scholarship
Specialized in Deep Learning, Generative AI, and LLMs

Nov'25 - Feb'26

Protexo Inc

Machine Learning Engineer

Fine-tuned SLMs that run fully on-device, offline on Android
740ms inference · 4 languages · no cloud dependency
Reduced manual labeling by 80% with a multi-agent data pipeline

Jul'25 - Feb'26

UGenome AI

Data Scientist

Built an MCP server powering automated genomic report generation
Architected an A/B testing framework across 4 deployment cycles
Working at the intersection of frontier biology and applied AI

2026

Mar'26 - Present

Ibexlabs

Data Engineer (Full-Time)

Starting a new chapter in data engineering

featured projects

i've worked on lots of side projects over the years; here are some recent ones. many are open-source, so if something piques your interest, check out the code.

01 · featured

tunekit.app

llm tooling · open source · unsloth-powered · zero infra

fine-tune slms 2x faster, for free. upload your data, get a production-ready model in under 15 minutes. no coding, no guesswork.

145+ github stars

22 forks

#19 on product hunt · launch day

onsetlab.app

agents · open source · mcp support · pypi published

local-first framework that turns slms into tool-calling ai agents. mcp support. pip install and you're running.

86 product hunt upvotes

published on PyPI

overtab

chrome extension · on-device ai

highlight, hover, or speak to get instant insights, all processed locally with gemini nano. zero cloud, no network round-trips.

pii-ner extraction

nlp · huggingface · bert · multilingual

fine-tuned a 177M-parameter BERT on 20K WikiANN samples for multilingual NER tagging. published on huggingface with 120+ community downloads and production-ready inference at 264 ms on average.

177M parameters

120+ hf downloads

264 ms avg inference

more projects

data · nlp · agentic pipeline

nexora: data exploration agent

ask a question. it finds the right dataset, profiles it, and generates the first useful charts automatically.

LangGraph · React · FastAPI

ai · visualization · tool-calling

datalens: intelligent chart generator

turn queries into charts with a multi-agent workflow that selects metrics, writes code, and explains results.

LangGraph · OpenAI · Tavily

rag · information retrieval

VideoMind AI: Videos to Insights

paste a link. get summaries, highlights, and searchable answers built on transcripts and embeddings.

Whisper · Pinecone · LangChain

rag · voice · multi-modal

DocTalk: Your AI Document Assistant

talk to your pdfs. voice-enabled research with cited answers.

LangChain · ElevenLabs

deep learning · experiment design

Biometric Predictors of Emotional States

emotional state classification from biometric signals.

TensorFlow · Keras · scikit-learn

fintech · flask

portfolio+: virtual trading platform

paper trading with real-time market data, performance tracking, and portfolio analytics.

Flask · PostgreSQL · Python

computer vision · road safety

safedrive-AI: distracted driving detection

real-time driver behavior classification from in-cabin video for distraction and safety events.

OpenCV · CNN · TensorFlow

data science · environment

metropolitan climate profiling

urban heat island analysis with feature engineering, spatial signals, and interpretable modeling.

pandas · Plotly · scikit-learn

nlp · sentiment analysis

Mental Health Analysis on Social Media

springer conference chapter on sentiment analysis for early signals of mental health risk in social media text.

NLP · BERT · Springer

supervised ml · healthcare

fetal health classification

ctg-based signal classification to detect fetal risk states with supervised ml and calibrated metrics.

XGBoost · pandas · matplotlib

my resume

where i studied

three campuses, two countries, one obsession with data.

August 2023 - May 2025

University of Arizona

M.S. Data Science

GPA: 3.78 / 4.0 · Tucson, AZ

Deep Learning · Applied NLP · Cloud Data Warehousing · Artificial Intelligence

January 2023 - May 2023

University of Florida

Senior Certificate · Computer Science

GPA: 4.0 / 4.0 · Gainesville, FL

Advanced Data Structures · Algorithm Design · Software Engineering · Database Systems

July 2019 - May 2023

Manipal University Jaipur

B.Tech · Information Technology

GPA: 3.8 / 4.0 · Jaipur, India

Data Mining · Big Data Analytics · Object-Oriented Programming · Natural Language Processing

where i've worked

from startups to big four, always building with data.

Data Engineer

Ibexlabs

Mar 2026 - Present · Full-Time

Just getting started, more to come.

Data Scientist

UGenome AI

Jul 2025 - Feb 2026 · Remote

Developed an MCP server exposing 9 API endpoints as tools for an AI agent, creating an automated insight generation pipeline that produces personalized genomic reports analyzing 86 genes in 8-10 seconds.
Architected model versioning and A/B testing framework across 4 deployment cycles, maintaining <3-point F1 deviation in production.

Stack: Python · SQL · AWS · Docker · FastAPI · NGINX · Auth0

Machine Learning Engineer

Protexo Inc

Nov 2025 - Feb 2026 · San Francisco, CA

Built a multi-agent system to autonomously scrape, synthesize, augment, and validate scam data across 4 languages, reducing manual labeling by 80%.
Fine-tuned Gemma 270M and LFM2 350M on scam detection tasks, deploying quantized models (INT8) using MediaPipe on Android with 740ms inference and 78MB memory footprint.

Stack: Python · PyTorch · TensorFlow Lite · MediaPipe · ONNX Runtime · Android SDK

Applied ML Engineer (Research)

Mel and Enid Zuckerman College of Public Health

Jan 2024 - Jun 2025 · Tucson, AZ

Deployed computer vision models achieving 87% accuracy on 500+ satellite imagery samples, processing 15 years of geospatial data across 3 district health initiatives.
Modeled site-level disparities using regression and time-series analysis across 1,200 school sites, identifying 25% resource disparity patterns that informed a $625K district funding allocation.

Stack: R · Python · Google Earth Engine · ArcGIS · SQL

Data Science Intern

PwC (PricewaterhouseCoopers)

Jul 2022 - Sep 2022 · Mumbai, India

Automated ETL workflows in Python, SQL, and Apache Spark for 15,000+ pharma manufacturing records/month, cutting manual effort by 350 man-hours/week.
Trained supervised learning models (XGBoost, Random Forest) on 7+ years of production data, projecting 20% throughput gain and $50K in annualized savings.

Stack: Python · SQL · Apache Spark · Power BI · Docker · ETL

build. deploy. scale.

a simple way to describe what i do. pick a lane if you want the tool list.

build · from ideas to v1

models, agents, and products. the fun part.

agent workflows · llm fine-tuning · rag pipelines · tool calling · on-device ai · model optimization

deploy · making it real

latency, memory, reliability. making it run outside the notebook.

on-device inference · export · quantization · secure endpoints · runtime optimization · monitoring

scale · systems and pipelines

pipelines that stay correct at 2am. automation that does not break.

data pipelines · data quality · cloud · monitoring · testing · cost and performance

published research

peer-reviewed papers i've co-authored across ai, nlp, and public health.

Journal Article

Assessing the Efficacy of Low-Cost Air Pollution Monitoring Devices

Environmental Monitoring and Assessment · 2025

Volume 198, Article 40 · Springer Nature

Evaluated the accuracy and reliability of low-cost particulate matter (PM) sensors against industry-standard devices for occupational exposure monitoring. Developed calibration models and breakpoint analyses to identify performance thresholds.

Environmental Health · Data Analysis · Sensor Calibration

Conference Paper

Mental Health Analysis on Social Media Using Sentiment Analysis

ICCIS 2022 · Springer, Singapore

International Conference on Communication and Intelligent Systems

Developed sentiment analysis techniques for early detection of mental illness indicators in social media data, contributing to proactive mental health interventions.

Sentiment Analysis · NLP · Machine Learning

certified knowledge

activeloop

feb 2025

langchain and vector databases in production

rag patterns, vector db choices, and production concerns.

rag · vector db · langchain

crewai

feb 2025

multi ai agent systems with crewai

multi-agent workflows and task planning patterns.

agents · tools · workflows

ibm

sep 2021

machine learning with python

supervised learning, evaluation, and practical ml workflows.

ml · sklearn · evaluation

ibm

jul 2021

python for data science, ai & development

data wrangling with numpy, pandas, and api integration.

pandas · numpy · apis

deeplearning.ai

jun 2021

ai for everyone

ai strategy, ethics, and building an ai-first organization.

strategy · ethics · neural nets

google

jun 2021

crash course on python

python fundamentals, oop, and automation scripting.

python · oop · automation

u of michigan

jun 2021

programming for everybody

intro to python and core programming concepts.

python basics

hackerrank

jun 2021

python (basic)

problem solving, algorithms, and data structures in python.

algorithms · data structures

ibm

jun 2021

building ai powered chatbots

nlp, watson assistant, and conversational dialog design.

nlp · watson · dialog

field notes

thoughts on data science, ai, and whatever i'm tinkering with.

writing · 4 entries

what a month of failing taught me about small language models

lessons from building and breaking ai agents, and why the small models might still surprise you.

small language models · ai agents · MCP · tool use · mar 5, 2026

data collection and analysis with r

a practical guide to pulling US census data with tidycensus, cleaning it fast, and turning it into maps and insights.

data visualization · census data · statistics · data analysis · r · jul 16, 2024

beyond words: the evolution of nlp with transformers

a high-level map of how transformers work, why attention matters, and where zero-shot actually helps in real systems.

nlp · transformers · zero-shot learning · explainable ai · jul 5, 2024

qubits vs. bits: unleashing quantum data science

a beginner-friendly way to think about qubits, why they matter, and what quantum could change for optimization and ml.

quantum computing · algorithms · data science · emerging trends · jul 1, 2024

featured talks

presented at the MS DS lightning talks at the college of information science, university of arizona, in two consecutive years (2023 and 2024).

2023

2024

what people say

from those who've worked with me directly.

"she's incredibly hardworking and dependable. i'd happily work with her again."

I worked with Riyanshi on Protexo, where she played a key role in building our scam detection models. She put in serious effort collecting real-world scam and legitimate message data from across the internet, cleaning messy datasets, and extracting meaningful signals from noisy text. That foundation made a big difference in our model quality. I highly recommend her to any team looking for a strong data scientist.

Arjun Singri · Founder & CEO, Protexo Inc

"she makes all of us better."

Riyanshi joined UGenome shortly after she finished her master's degree. Our immediate need didn't 100% fit with Riyanshi's training. This is a hard ask for many professionals yet Riyanshi has taken on this challenge with aplomb. Her work ethic, her professional communication, her desire to learn and produce high quality technical content in the complex field of genomics has been impressive. The entire team enjoys working with her.

Zachary S. Brooks, PhD, EMBA · CEO, UGenome AI

"riyanshi is one of those who stood out from the rest."

She worked with me on a research project centered around Sentiment Analysis with NLP and Machine Learning. She's an exceptional student who is passionate about learning and self-improvement. Her critical thinking, teamwork, and communication skills are outstanding. She has the strength to break the comfort zone and move forward to achieve something new. She's a valuable asset to any team.

Dr. Smaranika Mohapatra · Assistant Professor, Manipal University Jaipur

say hello