
NLP Master's Student at UC Santa Cruz

Dom Marhoefer

Hi! I'm Dom, an NLP Engineer & Computational Neuroscience Researcher.

I've built natural language pipelines for national nonprofits, engineered real-time streaming and stimulation infrastructure for neural organoids, and developed energy demand forecasting models for the PJM grid.

Experience
  1. Neural Systems Interface & Development

    Sep 2025 — Present
    ZMQ · NumPy · Data streaming
    • Engineered a Python wrapper for the 3Brain BioCam MEA system, enabling real-time programmatic control over 4,096 electrode channels.
    • Developed asynchronous data streaming pipelines using ZMQ to handle high-bandwidth raw neural signals from brain organoids, facilitating downstream real-time spike sorting and stimulation triggers.
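The streaming pipeline above can be sketched in miniature. This is an illustrative sketch of multipart message framing for raw MEA frames, not the actual wrapper code: the topic name, header layout, and an in-memory deque standing in for the ZMQ PUB/SUB transport are all hypothetical, so the example runs without pyzmq or hardware.

```python
import struct
from collections import deque

# Hypothetical framing for one streamed frame: [topic, header, payload].
# Header: frame counter (uint32), timestamp in seconds (double),
# channel count (uint16), packed little-endian with no padding.
HEADER_FMT = "<IdH"

def pack_frame(frame_id, timestamp, samples):
    """Serialize one frame of per-channel float samples into multipart bytes."""
    header = struct.pack(HEADER_FMT, frame_id, timestamp, len(samples))
    payload = struct.pack(f"<{len(samples)}f", *samples)
    return [b"mea.raw", header, payload]

def unpack_frame(parts):
    """Inverse of pack_frame: recover topic, metadata, and samples."""
    topic, header, payload = parts
    frame_id, timestamp, n = struct.unpack(HEADER_FMT, header)
    samples = list(struct.unpack(f"<{n}f", payload))
    return topic, frame_id, timestamp, samples

wire = deque()  # stand-in for the PUB/SUB socket pair
wire.append(pack_frame(0, 12.5, [0.1, -0.2, 0.3]))
topic, fid, ts, data = unpack_frame(wire.popleft())
```

With real pyzmq, the same three-part list would go out via a PUB socket's multipart send and be filtered by topic on the SUB side.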
  2. Data Operations Intern, Remote

    Jun — Dec 2024
    Pandas · scikit-learn · NLTK · VADER · LDA
    • Applied Latent Dirichlet Allocation (LDA) to decompose high-dimensional survey text into latent thematic distributions, identifying key drivers of positive and negative sentiment.
    • Trained Random Forest ensembles to predict participant retention; performed feature importance ranking and data visualization to validate model decision boundaries and interpret patterns.
    • Built a custom preprocessing pipeline using NLTK for lemmatization and stop-word filtering, followed by VADER for lexicon-based sentiment polarity scoring on longitudinal survey data.
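The preprocessing-then-scoring pattern in the last bullet looks roughly like this toy sketch. The stop list, suffix-stripping "lemmatizer", and polarity lexicon below are tiny illustrative stand-ins for the real NLTK resources and the VADER lexicon, so the example is self-contained.

```python
import re

# Placeholder resources; the real pipeline used NLTK's lemmatizer,
# stopword corpus, and VADER's SentimentIntensityAnalyzer.
STOPWORDS = {"the", "a", "an", "was", "is", "and", "to", "of"}
LEXICON = {"helpful": 1.0, "great": 1.5, "confusing": -1.0, "slow": -0.5}

def normalize(token):
    """Crude lemmatizer stand-in: lowercase, strip common suffixes."""
    token = token.lower()
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Tokenize, drop stopwords, normalize the rest."""
    tokens = re.findall(r"[a-zA-Z']+", text)
    return [normalize(t) for t in tokens if t.lower() not in STOPWORDS]

def polarity(tokens):
    """Mean lexicon score over matched tokens (VADER-style stand-in)."""
    scores = [LEXICON[t] for t in tokens if t in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0
```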
  3. R&D Consulting Intern, New York, NY

    Jun — Dec 2023
    Regular expressions · python-docx
    • Architected a regex-based Named Entity Recognition (NER) system to extract structured insights from highly heterogeneous, unstructured donor prospect data.
    • Developed python-docx scripts to automate the ingestion and normalization of semi-structured text, reducing manual data entry latency and improving downstream relational database integrity.
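A regex-based extractor of this kind can be sketched as a small pattern registry. The field names and patterns here are hypothetical examples for donor-prospect-style text, not the production rule set.

```python
import re

# Illustrative patterns; a real system would maintain many more,
# tuned to the source documents' formats.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "amount": re.compile(r"\$\d[\d,]*(?:\.\d{2})?"),
    "year": re.compile(r"\b(?:19|20)\d{2}\b"),
}

def extract_entities(text):
    """Return {field: [matches]} for every pattern that fires on text."""
    out = {}
    for field, pattern in PATTERNS.items():
        hits = [m.group(0) for m in pattern.finditer(text)]
        if hits:
            out[field] = hits
    return out
```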
  4. Data Operations & AI Intern, Remote

    May — Aug 2023
    scikit-learn · Pandas · Meteostat
    • Engineered predictive models for PJM COMED hourly load using 10 years of historical data; integrated weather variables via Meteostat to capture seasonal sensitivities.
    • Achieved an RMSE within ~0.5 MW of PJM institutional benchmarks by tuning hyperparameters with RandomizedSearchCV.
    • Developed a tail-weighted evaluation pipeline aligned with Energy Price Thresholds (EPT), prioritizing model accuracy during peak volatility periods to optimize battery arbitrage logic.
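A tail-weighted metric of the kind described can be sketched as an RMSE where peak-load hours get extra weight. The threshold and weight values below are illustrative, not the EPT-aligned parameters used in the project.

```python
import math

def tail_weighted_rmse(actual, predicted, peak_threshold, peak_weight=3.0):
    """RMSE where squared errors on hours at or above peak_threshold
    are up-weighted, so accuracy during volatile peaks dominates."""
    num, den = 0.0, 0.0
    for a, p in zip(actual, predicted):
        w = peak_weight if a >= peak_threshold else 1.0
        num += w * (a - p) ** 2
        den += w
    return math.sqrt(num / den)
```

Setting `peak_weight=1.0` recovers the ordinary RMSE, which makes the weighting easy to sanity-check.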
Education

UC Santa Cruz

M.S. Natural Language Processing

Silicon Valley Campus

Expected Mar 2027

New York University

B.A. Economic Policy & Language and Mind

New York, NY

Jul 2024

Involvement

  • Camp Kesem·Development Coordinator
    • Led multi-year fundraising initiatives raising $50K+ annually, coordinating 50 counselors and supporting 100+ campers across recurring programs.
    2020 — 2024
  • Zeta Beta Tau·Co-President and Philanthropy Chair
    • Directed operations for a 70-member chapter, managing budgets, university relations, and executive board decisions.
    2022 — 2024
  • The B+ Foundation·Team Captain
    • Ran campus-wide fundraising campaigns for pediatric cancer research, coordinating volunteers and raising $15K+.
    2022 — 2024
Selected Work
PyTorch · Transformers · RoBERTa

SemEval 2026, Task 10: Span Extraction & Conspiracy Classification (2026)

Multi-role semantic span extraction and document-level conspiracy classification

Subtask 1: Span-based Multi-Role Information Extraction

  • Built a RoBERTa-large encoder with pooled span representations from boundary, width, and contextual features.
  • Designed role-specific IoU thresholds for Action, Effect, and Evidence and implemented a custom decoding pipeline with containment-based NMS and span merging.
  • Tuned decision thresholds on held-out validation data.
  • Achieved 0.23 decoded micro-F1 under IoU-based span matching.
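The IoU matching and containment-based NMS described above can be sketched on plain `(start, end, score)` tuples. The thresholds and tuples here are illustrative, not the tuned role-specific values from the system.

```python
def span_iou(a, b):
    """Token-level IoU of two [start, end) spans."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0

def contains(outer, inner):
    """True if inner lies entirely within outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def containment_nms(spans):
    """Greedy NMS: visit spans by descending score, dropping any span
    fully contained in an already-kept span."""
    kept = []
    for span in sorted(spans, key=lambda s: s[2], reverse=True):
        if not any(contains(k[:2], span[:2]) for k in kept):
            kept.append(span)
    return kept
```

In the full pipeline, a predicted span counts as correct when its IoU with a gold span clears the role's threshold; the decoder applies the NMS step before that matching.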

Subtask 2: Document-level Conspiracy Classification

  • Trained a RoBERTa-large 3-class classifier (Yes / No / Can't tell) with stratified train–validation split, label smoothing, and early stopping.
  • Ran multi-seed training and selected the best checkpoint by weighted F1.
  • Reached 0.77 weighted F1 on validation.
ZMQ · NumPy · Real-time Systems

ThreeBrain Wrapper (2026)

Python API wrapper and ZMQ bridge for real-time brain organoid recordings

  • Built a Python API wrapper for the 3Brain BioCam system.
  • Implemented a modular backend (dummy, ZMQ, hardware) for offline testing and live acquisition.
  • Engineered real-time packet streaming and timestamp alignment.
  • Created test harnesses for downstream neural analysis workflows.
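The modular-backend design can be sketched as a shared acquisition interface with interchangeable implementations. Class and method names below are hypothetical, not the wrapper's real API; only the dummy backend is shown so the sketch runs offline.

```python
from abc import ABC, abstractmethod

class AcquisitionBackend(ABC):
    """Common interface that dummy, ZMQ, and hardware backends implement."""
    @abstractmethod
    def read_frame(self):
        """Return (timestamp, samples) for the next frame."""

class DummyBackend(AcquisitionBackend):
    """Deterministic synthetic frames for offline tests."""
    def __init__(self, n_channels=8):
        self.n_channels = n_channels
        self.t = 0

    def read_frame(self):
        frame = (self.t, [0.0] * self.n_channels)
        self.t += 1
        return frame

class Recorder:
    """Device-agnostic consumer: works unchanged with any backend."""
    def __init__(self, backend):
        self.backend = backend

    def record(self, n_frames):
        return [self.backend.read_frame() for _ in range(n_frames)]
```

Swapping in a ZMQ or hardware backend then requires no change to the consumer code, which is what makes offline test harnesses practical.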
PyTorch · MNE · Transformers

Multimodal Neural Semantic Decoding Research (2026)

Multimodal alignment research for silent speech BCI using EEG/fMRI and language models

  • Aligning brain activity (fMRI/EEG) with LLM embeddings (e.g., Llama-3) to test whether a language model's learned representations can help translate neural signals into meaning.
  • Evaluating on standard benchmarks, including the Huth fMRI and ZuCo EEG datasets, to measure how well models reconstruct what a participant is reading or thinking.
  • Going beyond raw decoding accuracy with more rigorous methods such as representational similarity analysis (RSA), which test whether the geometry of the model's representations matches that of the brain's.
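RSA, mentioned in the last bullet, compares two representations at second order: correlate their pairwise-dissimilarity structures rather than the raw vectors. This is a minimal stdlib sketch on toy vectors, not the project's analysis code.

```python
def pairwise_dissim(vectors):
    """Upper-triangle Euclidean dissimilarities between all item pairs."""
    dissim = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            d = sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[j])) ** 0.5
            dissim.append(d)
    return dissim

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

def rsa_score(brain_vectors, model_vectors):
    """Second-order similarity: correlate the two dissimilarity matrices."""
    return pearson(pairwise_dissim(brain_vectors), pairwise_dissim(model_vectors))
```

Because only the relational structure is compared, a model whose embeddings are a scaled copy of the neural geometry still scores 1.0.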
Let's Connect
Reach out for
  • Collaborations in NLP, computational linguistics, or neurotechnology
  • ML/DL engineering or applied research opportunities
  • Open-source projects at the intersection of language and neural systems
  • General questions about my work or experience
Learn more about me
Contact
Based in Santa Clara, CA

Open to remote opportunities and relocating for the right role.

© 2026 Dom Marhoefer