Projects
A selection of things I've built, shipped, and experimented with.
HOA CC&R PDF Parser + RAG Q&A Pipeline
Built an API-triggered document knowledge extraction service for a client who needed to ingest large (50 to 100 page) HOA/CC&R PDFs and answer dozens of structured questions automatically. The service accepts a webhook payload, fetches the PDF from a provided URL or path, converts pages to images, and runs OCR locally with Tesseract. It then splits the text into chunks, embeds each chunk with a SentenceTransformer model, and stores the embeddings in a FAISS index for fast, token-efficient retrieval. To answer each question, it passes the most relevant snippets to an OpenAI model for grounded answers, with quality controls that ensure responses match the expected JSON format.
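The chunk, embed, and retrieve steps can be sketched roughly as follows. This is a minimal illustration, not the production service: the chunk sizes, the `all-MiniLM-L6-v2` model name, and the helper names `chunk_text`, `build_index`, and `retrieve` are assumptions chosen for the example.

```python
def chunk_text(text: str, chunk_words: int = 200, overlap: int = 40) -> list[str]:
    """Split OCR text into overlapping word windows (sizes are assumed)."""
    if not 0 <= overlap < chunk_words:
        raise ValueError("overlap must be in [0, chunk_words)")
    words = text.split()
    step = chunk_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_words]))
        if start + chunk_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def build_index(chunks: list[str], model_name: str = "all-MiniLM-L6-v2"):
    """Embed chunks and store them in a flat inner-product FAISS index."""
    # Heavy dependencies are imported lazily so chunk_text works without them.
    import faiss
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    # With normalized vectors, inner product equals cosine similarity.
    vectors = model.encode(chunks, normalize_embeddings=True, convert_to_numpy=True)
    index = faiss.IndexFlatIP(vectors.shape[1])
    index.add(vectors)
    return model, index


def retrieve(question: str, chunks: list[str], model, index, k: int = 4) -> list[str]:
    """Return the k chunks most similar to the question."""
    query = model.encode([question], normalize_embeddings=True, convert_to_numpy=True)
    _scores, ids = index.search(query, min(k, len(chunks)))
    return [chunks[i] for i in ids[0] if i != -1]
```

Only the retrieved chunks are sent to the language model, which is what keeps token usage low on a 100-page document.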
Music-Theoretic Attention Biases for Symbolic Music Generation
Injected music-theory inductive biases into transformer attention for symbolic MIDI generation, encoding circle-of-fifths harmonic distance and temporal onset distance as learned bias embeddings added directly to attention logits. Trained a decoder-only transformer on GigaMIDI (~382K files, ~565M notes) using octuple tokenization across four experimental conditions (no bias, harmonic-only, temporal-only, combined) with data-fraction ablations to measure sample efficiency gains.
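As a rough illustration of the harmonic bias, the sketch below computes circle-of-fifths distance between pitch classes and adds a per-distance scalar to one query's attention logits before the softmax. The `HARMONIC_BIAS` values are fixed placeholders standing in for learned parameters, and the function names are hypothetical; the trained model's actual parameterization may differ.

```python
import math


def fifths_distance(pc_a: int, pc_b: int) -> int:
    """Steps between two pitch classes around the circle of fifths (0 to 6)."""
    # Multiplying by 7 (a perfect fifth in semitones) maps each chromatic
    # pitch class to its position on the circle of fifths.
    pos_a, pos_b = (pc_a * 7) % 12, (pc_b * 7) % 12
    d = abs(pos_a - pos_b)
    return min(d, 12 - d)


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention(logits, query_pc, key_pcs, harmonic_bias):
    """Add a distance-indexed bias to each logit, then normalize."""
    biased = [
        logit + harmonic_bias[fifths_distance(query_pc, key_pc)]
        for logit, key_pc in zip(logits, key_pcs)
    ]
    return softmax(biased)


# Placeholder table: one scalar per distance 0..6. In a trained model these
# would be learned; the values here are made up for illustration.
HARMONIC_BIAS = [0.0, -0.2, -0.4, -0.6, -0.8, -1.0, -1.2]

# Query note C (0) attending to C (0), G (7), and F# (6) with equal raw logits.
weights = biased_attention([0.0, 0.0, 0.0], 0, [0, 7, 6], HARMONIC_BIAS)
```

With equal raw logits, the harmonically closest key receives the most attention. A temporal bias could presumably be added the same way, indexed by a bucketed onset difference instead of harmonic distance.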
Reducing Reward Variance in Continuous-Control DRL via Genetic Pre-Training
Investigated whether pre-training agent policy weights with a genetic algorithm (GA) could improve stability, reduce reward variance, and potentially increase rewards in deep reinforcement learning (DRL) continuous-control problems, using CleanRL with Weights & Biases tracking. Designed two GA variants that evolve actor networks quickly in parallel, then benchmarked further RL training from the evolved weights against training from randomly initialized weights.
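The general idea of evolving weights before RL fine-tuning can be sketched with a generic elitist GA. This is not either of the two variants from the project, whose details are not given here; the `evolve` function, its hyperparameters, and the toy fitness are all assumptions for illustration. In practice the fitness would be an average episodic return from environment rollouts, and the best weight vector would initialize the actor network.

```python
import random


def evolve(fitness, n_params, pop_size=32, n_elite=4, generations=50,
           sigma=0.1, seed=0):
    """Elitist GA: keep the top individuals, refill with mutated copies."""
    rng = random.Random(seed)
    population = [
        [rng.gauss(0.0, 1.0) for _ in range(n_params)] for _ in range(pop_size)
    ]
    best_per_generation = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elites = ranked[:n_elite]
        best_per_generation.append(fitness(elites[0]))
        # Each child is a randomly chosen elite plus Gaussian noise.
        children = [
            [w + rng.gauss(0.0, sigma) for w in rng.choice(elites)]
            for _ in range(pop_size - n_elite)
        ]
        population = elites + children
    best = max(population, key=fitness)
    return best, best_per_generation


# Toy stand-in for episodic return: closer to TARGET is better.
TARGET = [0.5, -1.0, 2.0]


def toy_fitness(weights):
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))


best_weights, history = evolve(toy_fitness, n_params=3)
```

Because elites survive unchanged and the toy fitness is deterministic, the best score never decreases between generations. With noisy rollout returns that guarantee would not hold, so re-evaluating elites over several episodes would be a sensible precaution.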
Monte-Carlo Battleship Solver
An interactive Battleship game driven by frequentist probability. Each turn the solver samples thousands of random legal board configurations, building a hit-probability distribution over every cell and choosing the highest-probability shot. Board generation is accelerated with Cython-compiled routines and bitboard representations, packing entire ship layouts into single integers for fast collision checks and placement validation. Play against it, get suggested moves, or tune the sampling budget to trade speed for accuracy in real time.
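A simplified pure-Python sketch of the sampling loop is shown below, using Python integers as bitboards. It assumes a 10x10 board, ships that may touch, and ship lengths of at least 2; the function names are hypothetical. Placing ships one at a time with rejection is only approximately uniform over legal layouts, and the actual solver relies on Cython for speed.

```python
import random

SIZE = 10  # board is SIZE x SIZE; cell (r, c) maps to bit r * SIZE + c


def placements(length: int) -> list[int]:
    """Every horizontal and vertical placement of one ship, as a bitmask."""
    masks = []
    for r in range(SIZE):
        for c in range(SIZE):
            if c + length <= SIZE:
                masks.append(sum(1 << (r * SIZE + c + i) for i in range(length)))
            if r + length <= SIZE:
                masks.append(sum(1 << ((r + i) * SIZE + c) for i in range(length)))
    return masks


def sample_board(ship_masks, misses, rng, max_tries=100):
    """Place ships one by one; return the combined mask, or None on failure."""
    board = 0
    for masks in ship_masks:
        for _ in range(max_tries):
            mask = rng.choice(masks)
            # A single AND checks overlap with other ships and known misses.
            if mask & (board | misses) == 0:
                board |= mask
                break
        else:
            return None
    return board


def hit_probabilities(ship_lengths, misses=0, hits=0, samples=2000, seed=0):
    """Estimate per-cell occupancy from sampled boards consistent with shots."""
    rng = random.Random(seed)
    ship_masks = [placements(n) for n in ship_lengths]
    counts = [0] * (SIZE * SIZE)
    accepted = 0
    for _ in range(samples):
        board = sample_board(ship_masks, misses, rng)
        if board is None or board & hits != hits:
            continue  # reject boards that do not cover every known hit
        accepted += 1
        for cell in range(SIZE * SIZE):
            counts[cell] += (board >> cell) & 1
    if accepted == 0:
        raise ValueError("no sampled board matched the observed shots")
    return [count / accepted for count in counts]


# Standard fleet, with a known miss at the top-left corner (bit 0).
probs = hit_probabilities([5, 4, 3, 3, 2], misses=1, samples=500)
```

The next shot would be the untargeted cell with the largest estimate. Raising `samples` gives smoother estimates at the cost of time, which is the speed and accuracy trade-off the sampling budget controls.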
Real-Time Web Lead Qualification and Calendar Routing
Developed ML-driven lead scoring models that run in real time with dynamic rerouting, cutting low-quality sales meetings by 20–40% and boosting top salesperson conversion rates 2–3x, which reduced the need for large sales headcounts. Delivered for clients at Modulus Partners alongside A/B testing programs and CRM integrations with HubSpot, OnceHub, and Zapier.
Census Record-Linking Research
ML research within the BYU Economics department applying boosted tree classifiers to census data to identify and track individuals who appear in multiple census years. Strategically selecting record pairs for training and inference, guided by graph-theoretic relationships between documents, reduced the record-linking error rate by 80%.