Projects

A selection of things I've built, shipped, and experimented with.

HOA CC&R PDF Parser + RAG Q&A Pipeline

Built an API-triggered document knowledge-extraction service for a client who needed to ingest large (50 to 100 page) HOA/CC&R PDFs and answer dozens of structured questions automatically. The service accepts a webhook payload, fetches the PDF from the provided URL or path, converts pages to images, runs OCR locally (Tesseract), and splits the text into chunks. SentenceTransformer embeddings of those chunks are stored in a FAISS index for fast, token-efficient retrieval. To answer each question, the service passes the relevant snippets to an OpenAI model for a grounded answer, with quality controls to ensure answers match the expected JSON format.
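A minimal sketch of the chunk-and-retrieve step described above, not the production code. To stay dependency-free it swaps in a toy bag-of-words vector and a brute-force cosine search where the real service uses SentenceTransformer embeddings and a FAISS index; the function names, chunk sizes, and sample text are all hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=40, overlap=10):
    """Split OCR text into overlapping word-window chunks."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def toy_embed(text):
    """Stand-in embedding: bag-of-words counts (NOT a real sentence embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k_chunks(question, chunks, k=2):
    """Return the k chunks most similar to the question, best first."""
    q = toy_embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)
    return ranked[:k]


# Hypothetical CC&R excerpt, invented for illustration only.
document = (
    "Section 4.2 Pets. Each owner may keep up to two dogs or cats on a lot. "
    "Section 7.1 Fences. No fence may exceed six feet in height without approval. "
    "Section 9.3 Parking. Recreational vehicles may not be parked on any street overnight."
)
chunks = chunk_text(document, max_words=14, overlap=0)
best = top_k_chunks("How tall can a fence be in feet?", chunks, k=1)
```

Only the top-ranked chunks would be sent to the language model, which is what keeps the per-question token cost low on a 50 to 100 page document.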

PDF-to-Image (Poppler), OCR (Tesseract / pytesseract), SentenceTransformers Embeddings, FAISS Vector Index / Similarity Search, Retrieval-Augmented Generation (RAG), OpenAI API

Music-Theoretic Attention Biases for Symbolic Music Generation

Injected music-theory inductive biases into transformer attention for symbolic MIDI generation, encoding circle-of-fifths harmonic distance and temporal onset distance as learned bias embeddings added directly to attention logits. Trained a decoder-only transformer on GigaMIDI (~382K files, ~565M notes) using octuple tokenization across four experimental conditions (no bias, harmonic-only, temporal-only, combined) with data-fraction ablations to measure sample efficiency gains.
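As a rough illustration of the harmonic bias, the sketch below computes circle-of-fifths distance between MIDI pitches and adds a per-distance bias to attention logits before the softmax. It is written in plain Python rather than PyTorch, omits the causal mask a decoder-only model would apply, and uses a fixed `bias_table` with made-up values where the actual model learns its bias embeddings; all names here are assumptions.

```python
import math


def fifths_distance(pitch_a, pitch_b):
    """Steps between two MIDI pitches around the circle of fifths (0 to 6)."""
    # Multiplying a pitch class by 7 (a perfect fifth in semitones) mod 12
    # maps it to its position on the circle of fifths.
    pos_a = (pitch_a % 12) * 7 % 12
    pos_b = (pitch_b % 12) * 7 % 12
    diff = abs(pos_a - pos_b)
    return min(diff, 12 - diff)


def biased_attention_weights(logits, pitches, bias_table):
    """Add bias_table[distance] to each logit, then softmax each row."""
    n = len(pitches)
    weights = []
    for i in range(n):
        row = [
            logits[i][j] + bias_table[fifths_distance(pitches[i], pitches[j])]
            for j in range(n)
        ]
        peak = max(row)  # subtract the max for numerical stability
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Hypothetical bias values, one per distance 0..6; a trained model would learn these.
bias_table = [0.0, -0.5, -1.0, -1.5, -2.0, -2.5, -3.0]
pitches = [60, 67, 66]  # C4, G4, F#4
logits = [[0.0] * 3 for _ in range(3)]
weights = biased_attention_weights(logits, pitches, bias_table)
```

With equal raw logits, the C4 query puts more weight on G4 (one step away on the circle) than on F#4 (six steps away), which is the kind of prior the bias is meant to encode.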

PyTorch, Transformers, Music Information Retrieval, Attention Mechanism Design, Symbolic Music (MIDI), Weights & Biases

Reducing Reward Variance in Continuous-Control DRL via Genetic Pre-Training

Investigated whether pre-training policy weights with a genetic algorithm (GA) could improve stability, reduce reward variance, and potentially increase rewards in deep reinforcement learning (DRL) continuous-control problems, using CleanRL with Weights & Biases tracking. Designed two GA variants that evolve actor networks quickly in parallel, then benchmarked further RL training of the GA-initialized agents against agents with randomly initialized weights.
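The two GA variants are not detailed here, so the following is only a generic sketch of the idea: an elitist GA with truncation selection and Gaussian mutation that evolves a flat weight vector. `toy_return` is a hypothetical stand-in for an episode return; in a real setup the fitness would come from environment rollouts and the best individual would seed the actor network before RL training.

```python
import random

TARGET = [0.5, -1.0, 2.0]  # hypothetical "ideal" weights for the toy fitness


def toy_return(weights):
    """Stand-in for an episode return: higher is better, maximum 0 at TARGET."""
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))


def evolve(fitness, n_params, pop_size=32, n_elite=8, sigma=0.1,
           generations=50, seed=0):
    """Evolve weight vectors; return the best one and the best fitness per generation."""
    rng = random.Random(seed)
    population = [[rng.gauss(0.0, 1.0) for _ in range(n_params)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        elites = population[:n_elite]  # truncation selection; elites survive unchanged
        children = []
        while len(elites) + len(children) < pop_size:
            parent = rng.choice(elites)
            # Gaussian mutation of a randomly chosen elite parent
            children.append([w + rng.gauss(0.0, sigma) for w in parent])
        population = elites + children
    population.sort(key=fitness, reverse=True)
    history.append(fitness(population[0]))
    return population[0], history


best_weights, history = evolve(toy_return, n_params=len(TARGET))
```

Because elites are carried over unchanged, the best fitness never decreases from one generation to the next, which gives a stable starting point to hand off to PPO or SAC.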

PyTorch, Reinforcement Learning, PPO, Soft Actor-Critic (SAC), Genetic Algorithms, Weights & Biases

Monte-Carlo Battleship Solver

An interactive Battleship game driven by frequentist probability. Each turn the solver samples thousands of random legal board configurations, building a hit-probability distribution over every cell to pick optimal shots. Board generation is accelerated with Cython-compiled routines and bitboard representations, packing entire ship layouts into single integers for fast collision checks and placement validation. Play against it, get suggested moves, or tune the sampling budget to trade speed for accuracy in real time.
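A small pure-Python sketch of the sampling loop, assuming a 5x5 board and two ships to keep it short. Each layout is packed into one integer so an overlap check is a single bitwise AND; the real solver runs Cython-compiled routines, and the function names and rejection-sampling strategy below are illustrative assumptions.

```python
import random

N = 5  # board side length for the sketch (standard Battleship uses 10)


def placements(length):
    """All bitmasks for one ship of the given length, horizontal and vertical."""
    masks = []
    for r in range(N):
        for c in range(N):
            if c + length <= N:
                masks.append(sum(1 << (r * N + c + i) for i in range(length)))
            if r + length <= N:
                masks.append(sum(1 << ((r + i) * N + c) for i in range(length)))
    return masks


def sample_board(ship_masks, misses, rng, max_tries=1000):
    """Rejection-sample one legal layout as a single integer, or None."""
    for _ in range(max_tries):
        occupied = 0
        for masks in ship_masks:
            m = rng.choice(masks)
            if m & (occupied | misses):  # collides with a ship or a known miss
                break  # discard the whole layout and start over
            occupied |= m
        else:
            return occupied
    return None


def hit_probabilities(ship_lengths, misses=0, hits=0, samples=2000, seed=0):
    """Estimate, per cell, the fraction of sampled legal boards with a ship there."""
    rng = random.Random(seed)
    ship_masks = [placements(n) for n in ship_lengths]
    counts = [0] * (N * N)
    accepted = 0
    for _ in range(samples):
        board = sample_board(ship_masks, misses, rng)
        if board is None or (hits & ~board):  # must cover every known hit
            continue
        accepted += 1
        for cell in range(N * N):
            counts[cell] += (board >> cell) & 1
    return [c / accepted for c in counts] if accepted else counts


miss_center = 1 << (2 * N + 2)  # a known miss at row 2, column 2
probs = hit_probabilities([3, 2], misses=miss_center)
best_cell = max(range(N * N), key=probs.__getitem__)
```

Raising `samples` tightens the estimate at the cost of time per turn, which is the speed-versus-accuracy trade-off the sampling budget controls.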

Monte Carlo Simulation, Frequentist Probability, Cython, Bitboards, FastAPI, Interactive Game, Iterative Optimization

Real-Time Web Lead Qualification and Calendar Routing

Real-time, ML-driven lead scoring with dynamic rerouting, built for clients at Modulus Partners alongside A/B testing programs and CRM integrations with HubSpot, OnceHub, and Zapier. The models cut low-quality sales meetings by 20–40% and boosted top-salesperson conversion rates 2–3x, reducing the need for large sales headcounts.

FastAPI, Scikit-Learn, A/B Testing, Site Tracking Data Collection, Campaign Management, HubSpot API, Custom Zapier Webhook

Census Record-Linking Research

ML research within the BYU Economics department applying boosted-tree classifiers to census data to identify and track individuals who appear in multiple census years. Strategically choosing record pairs for training and inference, guided by graph-theoretic relationships between documents, reduced the record-linking error rate by 80%.

Graph Theory, XGBoost, Cross-Discipline, MS SQL Server, Research