hi, i'm riyanshi.

ree · yaan · shi

i'm a data scientist and ml engineer who builds things people actually use.

open to interesting conversations

about me

"without data, you're just another person with an opinion." - w. edwards deming
Riyanshi Bohra

a data scientist and ml engineer with a master's in data science from the university of arizona. my work spans healthcare to fintech, building ml pipelines, llm-powered systems, and ai-driven solutions that people can actually rely on.

i specialize in predictive analytics and machine learning, with deep experience in python, pytorch, sql, langgraph, and aws cloud. from neural networks for computer vision to retrieval-augmented generation for intelligent ai systems, i care about the last mile, where a good model becomes a dependable system.

tunekit.app went viral | 130+ github stars & counting
ms in data science @ univ of arizona '25
3 open-source tools. all live. 1 year.
onsetlab.app | published on pypi.
built an on-device ai browser tool. team at google noticed.
2 peer-reviewed publications in international journals
data engineer @ ibexlabs
fine-tuning language models on-device
pytorch · langchain · langgraph
agentic systems that actually work
ml for healthcare & fintech
rag pipelines in production
on-device inference, no cloud required
open-source contributor
3+ years academic research
15+ projects completed
8+ certifications
2,600+ contributions in the last year

the journey

scroll through the years.

2019
Jul'19
Manipal University Jaipur
Bachelors in Information Technology · Minor in Data Science
  • Where it all started
2022
Jul'22 - Sep'22
PricewaterhouseCoopers
Data Science Intern (Technology Consulting SBU)
  • Built predictive models for pharma supply chains
  • Shipped a model to production in a 3-month internship
  • $75K in annual savings · 32% reduction in manufacturing variability
  • Presented findings to senior leadership · Secured company-wide rollout
2023
Jan'23 - May'23
University of Florida
Exchange Student - Semester Abroad
  • Perfect 4.0 GPA
  • Senior Certificate Program
  • Selected on merit (top 1% of applicants)
May'23
Manipal University
BTech Graduated
  • Top 3% of a batch of ~200
  • Published sentiment analysis research at ICCIS 2022
  • Dean's List: perfect 10/10 GPA in final year (2 semesters)
Aug'23
University of Arizona
Started MS in Data Science
  • Selected to present at the first MS DS Lightning Talks in my first semester
  • Took courses in AI, ML, and data mining
2024
Jan'24 - Jun'24
Zuckerman College of Public Health
Applied ML Engineer (Research)
2024
University of Arizona
MS in Data Science (Year 2)
  • Selected from 100+ students to represent department at Grace Hopper Conference '24
  • Featured on the College of Information Science website
  • Presented research on AI-driven distracted driving detection at iShowcase 2024
  • MSDS Lightning Talks presenter (2nd consecutive year)
2025
May'25
University of Arizona
MS in Data Science, Graduated
  • CGPA: 3.8/4.0
  • Recipient of the MSDS Scholarship
  • Specialized in Deep Learning, Generative AI, LLMs
Nov'25 - Feb'26
Protexo Inc
Machine Learning Engineer
  • Fine-tuned SLMs on-device · runs offline on Android
  • 740ms inference · 4 languages · no cloud dependency
  • Reduced manual labeling by 80% with a multi-agent data pipeline
Jul'25 - Feb'26
UGenome AI
Data Scientist
  • Built MCP server powering automated genomic report generation
  • A/B testing framework across 4 deployment cycles
  • Working at the intersection of frontier biology and applied AI
2026
Mar'26 - Present
Ibexlabs
Data Engineer (Full-Time)
  • Starting a new chapter in data engineering

featured projects

i've worked on lots of side projects over the years; here are some recent ones. many are open-source, so if something piques your interest, check out the code.


where i studied

three campuses, two countries, one obsession with data.

August 2023 - May 2025

University of Arizona

M.S. Data Science

GPA: 3.78 / 4.0 · Tucson, AZ
Deep Learning · Applied NLP · Cloud Data Warehousing · Artificial Intelligence
January 2023 - May 2023

University of Florida

Senior Certificate · Computer Science

GPA: 4.0 / 4.0 · Gainesville, FL
Advanced Data Structures · Algorithm Design · Software Engineering · Database Systems
July 2019 - May 2023

Manipal University Jaipur

B.Tech · Information Technology

GPA: 3.8 / 4.0 · Jaipur, India
Data Mining · Big Data Analytics · Object-Oriented Programming · Natural Language Processing

where i've worked

from startups to big four, always building with data.


Data Engineer

Ibexlabs
Mar 2026 - Present · Full-Time
  • Just getting started, more to come.

Data Scientist

UGenome AI
Jul 2025 - Feb 2026 · Remote
  • Developed an MCP server exposing 9 API endpoints as tools for an AI agent, creating an automated insight generation pipeline that produces personalized genomic reports analyzing 86 genes in 8-10 seconds.
  • Architected a model versioning and A/B testing framework across 4 deployment cycles, maintaining <3-point F1 deviation in production.
Stack: Python · SQL · AWS · Docker · FastAPI · NGINX · Auth0

Machine Learning Engineer

Protexo Inc
Nov 2025 - Feb 2026 · San Francisco, CA
  • Built a multi-agent system to autonomously scrape, synthesize, augment, and validate scam data across 4 languages, reducing manual labeling by 80%.
  • Fine-tuned Gemma 270M and LFM2 350M on scam detection tasks, deploying quantized models (INT8) using MediaPipe on Android with 740ms inference and 78MB memory footprint.
Stack: Python · PyTorch · TensorFlow Lite · MediaPipe · ONNX Runtime · Android SDK

Applied ML Engineer (Research)

Zuckerman College of Public Health
Jan 2024 - Jun 2025 · Tucson, AZ
  • Deployed computer vision models achieving 87% accuracy on 500+ satellite imagery samples, processing 15 years of geospatial data across 3 district health initiatives.
  • Modeled site-level disparities using regression and time-series analysis across 1,200 school sites, identifying 25% resource disparity patterns that informed a $625K district funding allocation.
Stack: R · Python · Google Earth Engine · ArcGIS · SQL

Data Science Intern

PwC (PricewaterhouseCoopers)
Jul 2022 - Sep 2022 · Mumbai, India
  • Automated ETL workflows in Python, SQL, and Apache Spark for 15,000+ pharma manufacturing records/month, cutting manual effort by 350 man hours/week.
  • Trained supervised learning models (XGBoost, Random Forest) on 7+ years of production data, projecting 20% throughput gain and $50K in annualized savings.
Stack: Python · SQL · Apache Spark · PowerBI · Docker · ETL

build. deploy. scale.

a simple way to describe what i do. pick a lane if you want the tool list.

build · from ideas to v1

models, agents, and products. the fun part.

agent workflows · llm fine-tuning · rag pipelines · tool calling · on-device ai · model optimization

deploy · making it real

latency, memory, reliability. making it run outside the notebook.

on-device inference · export · quantization · secure endpoints · runtime optimization · monitoring

scale · systems and pipelines

pipelines that stay correct at 2am. automation that does not break.

data pipelines · data quality · cloud · monitoring · testing · cost and performance

    published research

    peer-reviewed papers i've co-authored across ai, nlp, and public health.

    Assessing the Efficacy of Low-Cost Air Pollution Monitoring Devices

    Journal Article · Environmental Monitoring and Assessment · 2025
    Volume 198, Article 40 · Springer Nature

    Evaluated the accuracy and reliability of low-cost PM sensors against industry-standard devices for occupational exposure monitoring. Developed calibration models and breakpoint analyses to identify performance thresholds.

    Environmental Health · Data Analysis · Sensor Calibration

    Mental Health Analysis on Social Media Using Sentiment Analysis

    Conference Paper · ICCIS 2022 · Springer, Singapore
    International Conference on Communication and Intelligent Systems

    Developed sentiment analysis techniques for early detection of mental illness indicators in social media data, contributing to proactive mental health interventions.

    Sentiment Analysis · NLP · Machine Learning

    certified knowledge


    activeloop
    feb 2025

    langchain and vector databases in production

    rag patterns, vector db choices, and production concerns.

    rag · vector db · langchain
    crewai
    feb 2025

    multi ai agent systems with crewai

    multi-agent workflows and task planning patterns.

    agents · tools · workflows
    ibm
    sep 2021

    machine learning with python

    supervised learning, evaluation, and practical ml workflows.

    ml · sklearn · evaluation
    ibm
    jul 2021

    python for data science, ai & development

    data wrangling with numpy, pandas, and api integration.

    pandas · numpy · apis
    deeplearning.ai
    jun 2021

    ai for everyone

    ai strategy, ethics, and building an ai-first organization.

    strategy · ethics · neural nets
    google
    jun 2021

    crash course on python

    python fundamentals, oop, and automation scripting.

    python · oop · automation
    u of michigan
    jun 2021

    programming for everybody

    intro to python and core programming concepts.

    python basics
    hackerrank
    jun 2021

    python (basic)

    problem solving, algorithms, and data structures in python.

    algorithms · data structures
    ibm
    jun 2021

    building ai powered chatbots

    nlp, watson assistant, and conversational dialog design.

    nlp · watson · dialog

    field notes

    thoughts on data science, ai, and whatever i'm tinkering with.


    what a month of failing taught me about small language models

    lessons from building and breaking ai agents, and why the small models might still surprise you.

    data collection and analysis with r

    a practical guide to pulling US census data with tidycensus, cleaning it fast, and turning it into maps and insights.

    beyond words: the evolution of nlp with transformers

    a high-level map of how transformers work, why attention matters, and where zero-shot actually helps in real systems.

    qubits vs. bits: unleashing quantum data science

    a beginner-friendly way to think about qubits, why they matter, and what quantum could change for optimization and ml.

    featured talks

    presented at the MS DS lightning talks at the college of information science, university of arizona for two consecutive years (2023-2024).


    what people say

    from those who've worked with me directly.


    "she's incredibly hardworking and dependable. i'd happily work with her again."

    I worked with Riyanshi on Protexo, where she played a key role in building our scam detection models. She put in serious effort collecting real-world scam and legitimate message data from across the internet, cleaning messy datasets, and extracting meaningful signals from noisy text. That foundation made a big difference in our model quality. I highly recommend her to any team looking for a strong data scientist.

    Arjun Singri · Founder & CEO, Protexo Inc

    "she makes all of us better."

    Riyanshi joined UGenome shortly after she finished her masters degree. Our immediate need didn't 100% fit with Riyanshi's training. This is a hard ask for many professionals yet Riyanshi has taken on this challenge with aplomb. Her work ethic, her professional communication, her desire to learn and produce high quality technical content in the complex field of genomics has been impressive. The entire team enjoys working with her.

    Zachary S. Brooks, PhD, EMBA · CEO, UGenome AI

    "riyanshi is one of those who stood out from the rest."

    She worked with me on a research project centered around Sentimental Analysis with NLP and Machine Learning. She's an exceptional student who is passionate about learning and self improvement. Her critical thinking, teamwork, and communication skills are outstanding. She has the strength to break the comfort zone and move forward to achieve something new. She's a valuable asset to any team.

    Dr. Smaranika Mohapatra · Assistant Professor, Manipal University Jaipur