About Me

Hi! Thanks for visiting my site. Here you'll find a few projects I've worked on recently. I particularly enjoy projects that involve every part of the data science stack, from data collection to modeling to software engineering and web development. Right now, I'm a Master's student in Data Science at the University of San Francisco. I'm also a Data Scientist at Accountability Counsel, a human rights non-profit, where I create web-scraping tools and build document search capabilities to better track global human rights complaints. My background is in data analytics and visualization, building business intelligence platforms. I'm always keen to discuss data science or collaborate on new projects, so reach out!

Projects

Click the icon links for the project website and/or code repo.

Predicting Triathlon Times

When I was training for my first triathlon, I had no previous experience swimming competitively, but I wanted to know what kind of time I should be aiming for. So I was curious: could I build a machine learning model to predict my swim time based on my age, gender, and expected bike/run times? The resulting project involved scraping the web for Olympic-distance triathlon race results, building a model, and deploying it on AWS.

Prostate Cancer Detection

In this Kaggle competition, our goal is to predict the presence and severity of prostate cancer from biopsy slides. Better models will lead to better tools for pathologists as they treat patients. The model is built using PyTorch.

Using Data to Address Accountability in Global Development Projects

As part of my research fellowship with Accountability Counsel, I gave a presentation on my work building data-driven tools that help users track human rights cases related to global development projects. The audience included members of the Schmidt Family Foundation, the University of San Francisco Data Institute, and the general data science community.

Making Music with Machine Learning

The Indigo web app uses ML to let users easily interpolate between music samples and generate accompanying drum tracks. It is built on top of Google Magenta, a neural network package for music.

Beer Style Classification

Using data on homebrewed beer recipes, this project predicts the style of each beer from physical characteristics of the recipe (ABV, IBUs, etc.). This is a classification problem with a large number of classes, and several different models are used to predict the style (logistic regression, kNN, random forests). The models and pipelines are built using scikit-learn.

Visualization and Application of Clustering Methods

  • How does basic unsupervised clustering work?
  • What are some issues with clustering that can be solved by smarter strategies?
  • How can clustering be applied to image compression? (Cute dog pictures involved)

Click in to find out!

Exploration of Feature Importance Methods via Housing Data

Understanding feature importance is key to building better, simpler, and more generalizable machine learning models. In this notebook, we use housing data to compare different feature importance methods, understand the features' predictive capability, and highlight potential pitfalls.