Debra Capadona

Senior Technical Program Manager | Data Engineering & ML Systems

Edge Condition Analysis is a personal research and delivery framework for exploring complex data and ML systems.

A demonstration of my ability to deliver modern end-to-end systems: from architecture and data pipelines to scalable ML inference and validation.

📈
Linguistic Dimensions
9 BERT models analyzing emotional valence, temporal urgency, certainty collapse, and 6 other dimensions across 2 years of forum discourse
→ Explore Interactive Timeline
💬
Word Burst Analysis
Interactive explorer revealing which topics dominated Hacker News discussions month-by-month throughout 2024-2025
→ Explore Word Bursts
⚙️
Modern Tech Stack
Production-grade data engineering: PyTorch, PostgreSQL, Docker, GPU acceleration, interactive Plotly dashboards
→ View Technologies
🗺️
Project Roadmap
Future enhancements and planned features for expanding the platform's analytical capabilities
→ View Roadmap
Project Context & Design Intent
System Demonstration · Ongoing Exploratory Work

This project is an evolving, scalable linguistic analysis platform designed to explore how collective language shifts under changing conditions. It currently analyzes ~200,000 Hacker News discussions (2024–2025) using nine custom BERT models trained with a weakly supervised approach, combining small curated seed labels with unsupervised pattern discovery. The system is built to scale across larger corpora as additional data sources and analytical dimensions are introduced.
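The weakly supervised training approach can be sketched as follows. This is a minimal illustration of the idea of combining small curated seed labels with automatic labeling; the seed lexicons, dimension names, and thresholds here are hypothetical, not the project's actual curated labels.

```python
import re

# Illustrative seed lexicons -- NOT the project's actual curated labels.
SEED_TERMS = {
    "urgency": {"deadline", "now", "immediately", "asap", "urgent"},
    "certainty": {"definitely", "clearly", "proven", "obviously"},
}

def weak_label(text):
    """Assign a weak label per dimension: 1 if any seed term appears, else 0.

    In a weakly supervised setup, cheap labels like these bootstrap a first
    training round; the BERT model then generalizes past the seed terms.
    """
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {dim: int(bool(tokens & seeds)) for dim, seeds in SEED_TERMS.items()}

labels = weak_label("Fix this now, the deadline is tomorrow")
# labels -> {"urgency": 1, "certainty": 0}
```

Seed labeling like this trades precision for coverage: it produces noisy but abundant training signal, which a transformer can smooth over during fine-tuning.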

The work focuses on surfacing interpretable linguistic signals, including emotional valence, urgency, certainty decay, and topic dominance, rather than producing finalized predictive outputs. The architecture supports ongoing experimentation, model iteration, and methodological refinement, allowing analytical assumptions and measurement strategies to evolve alongside the data.

Interactive Linguistic Dimensions Timeline

Explore 9 BERT-based linguistic dimensions across 2024-2025. Hover for details, toggle dimensions on/off, use the slider to zoom into specific timeframes. Event markers show major 2024-2025 events for context.

Word Burst Explorer

Interactive visualization showing which words "burst" (appear significantly more than baseline) in Hacker News discussions each month. Use the slider to navigate through 24 months of data.
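A simple way to operationalize "appears significantly more than baseline" is a smoothed rate ratio. The sketch below is one plausible scoring rule, not necessarily the statistic the explorer uses; the thresholds are illustrative.

```python
from collections import Counter

def burst_words(month_tokens, baseline_tokens, min_ratio=3.0, min_count=5):
    """Flag words whose monthly rate exceeds the baseline rate by min_ratio.

    Add-one smoothing on the baseline keeps words unseen in the baseline
    from dividing by zero while still letting them score as strong bursts.
    """
    month, base = Counter(month_tokens), Counter(baseline_tokens)
    m_total, b_total = len(month_tokens), len(baseline_tokens)
    bursts = {}
    for word, count in month.items():
        if count < min_count:
            continue  # ignore rare words: tiny counts make noisy ratios
        month_rate = count / m_total
        base_rate = (base[word] + 1) / (b_total + 1)
        ratio = month_rate / base_rate
        if ratio >= min_ratio:
            bursts[word] = round(ratio, 2)
    return bursts

month = ["gpt"] * 6 + ["the"] * 14
baseline = ["the"] * 50 + ["data"] * 50
result = burst_words(month, baseline)  # "gpt" bursts; "the" does not
```

The `min_count` floor matters in practice: without it, a word that appears once in a slow month can post an enormous ratio purely by chance.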

Technology Stack

🐍
Python 3.11
πŸ”₯
PyTorch
πŸ€–
BERT Transformers
🐘
PostgreSQL 16
🐳
Docker
⚑
GPU/CUDA
πŸ“Š
Plotly
πŸ“ˆ
BERTopic
πŸ”„
Alembic
πŸ“‰
SciPy Stats
🎨
Matplotlib
πŸ”’
NumPy/Pandas

System Architecture

Data Pipeline:

1. Collection → Scraped 197K HN stories via API with incremental checkpointing
2. Processing → Tokenized and normalized text, created word-level indexes
3. ML Inference → Applied 9 BERT models using GPU acceleration (~300 stories/sec)
4. Storage → PostgreSQL with optimized schema, indexes, and foreign keys
5. Analysis → Statistical validation, baseline establishment, coherence scoring
6. Visualization → Interactive Plotly dashboards for exploration

Infrastructure: Docker containerization, Alembic migrations, GPU-optimized PyTorch models
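The six pipeline stages can be stubbed end to end as below. Every heavy dependency (HN API, BERT models, PostgreSQL, Plotly) is replaced with a placeholder, and all function and field names are illustrative, not the project's actual module layout.

```python
def collect(ids):
    # Stage 1: stand-in for the HN API scraper
    return [{"id": i, "text": f"story {i} text"} for i in ids]

def process(stories):
    # Stage 2: tokenize and normalize
    for s in stories:
        s["tokens"] = s["text"].lower().split()
    return stories

def infer(stories, model):
    # Stage 3: `model` stands in for a batched BERT forward pass
    for s in stories:
        s["scores"] = model(s["tokens"])
    return stories

def store(stories):
    # Stage 4: a dict keyed by story ID stands in for PostgreSQL
    return {s["id"]: s for s in stories}

stub_model = lambda tokens: {"urgency": len(tokens) / 10}
db = store(infer(process(collect(range(3))), stub_model))
# Stages 5-6 (statistical analysis, Plotly dashboards) read from `db`.
```

Keeping each stage a pure function of the previous stage's output is what makes checkpointing and reruns straightforward in the real pipeline.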

System Architecture & Methodology

Project Scope

Built an end-to-end linguistic analysis platform to demonstrate technical program management capabilities for senior roles at companies like Coinbase. The system showcases:

  • Large-scale data engineering - Processing millions of records with proper database design
  • ML model deployment - Training and deploying 9 BERT models with GPU acceleration
  • Statistical rigor - Establishing baselines, running t-tests, calculating effect sizes
  • Modern tech stack - Docker, PostgreSQL, PyTorch, interactive visualizations
  • Project delivery - Complete system from architecture to deployment in 6 weeks
01

Data Collection

Designed and implemented an incremental data pipeline scraping the Hacker News API. Collected 197,496 stories across 2024-2025 with error handling, rate limiting, and resume capability.
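The resume capability hinges on checkpointing the last processed item ID. A minimal sketch, with the fetcher injected as a callable (the real system would wrap the Hacker News API) and the checkpoint format assumed:

```python
import json
from pathlib import Path

def collect_incremental(fetch_item, max_id, checkpoint=Path("checkpoint.json")):
    """Resume-safe collection loop.

    The checkpoint file records the last ID processed, so an interrupted
    run picks up where it left off instead of re-fetching everything.
    """
    start = 0
    if checkpoint.exists():
        start = json.loads(checkpoint.read_text())["last_id"] + 1
    collected = []
    for item_id in range(start, max_id + 1):
        collected.append(fetch_item(item_id))
        checkpoint.write_text(json.dumps({"last_id": item_id}))
    return collected
```

Writing the checkpoint after each item is the simplest correct scheme; a production run would batch checkpoint writes and add rate limiting and retry logic around `fetch_item`.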

02

ML Model Training

Trained 9 custom BERT models for linguistic dimension analysis, optimized for GPU inference at ~300 stories/second. Models are saved and versioned for reproducibility.
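Throughput figures like ~300 stories/second come from batching: scoring many stories per forward pass instead of one at a time. The sketch below shows only the batching loop; `score_batch` is a placeholder for a PyTorch model's batched forward call, and the batch size is an assumption.

```python
def batched_inference(stories, score_batch, batch_size=64):
    """Run a batch scorer over stories in fixed-size chunks.

    On a GPU, `score_batch` would tokenize the chunk and run one BERT
    forward pass; larger batches amortize per-call overhead until
    memory becomes the limit.
    """
    results = []
    for i in range(0, len(stories), batch_size):
        batch = stories[i : i + batch_size]
        results.extend(score_batch(batch))
    return results

# Placeholder scorer: one number per story (text length as a dummy score).
scores = batched_inference([f"story {i}" for i in range(200)],
                           lambda batch: [len(s) for s in batch])
```

The last chunk is allowed to be short, so the loop covers every story exactly once regardless of whether the total divides evenly by the batch size.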

03

Database Architecture

Designed a normalized PostgreSQL schema handling millions of word-level associations. Implemented indexing, foreign keys, and Alembic migrations, and optimized queries for analytical workloads.
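A normalized word-association layout typically means a many-to-many join table between stories and a deduplicated word list. The sketch below uses Python's built-in sqlite3 as a stand-in for PostgreSQL; the table and column names are assumptions, not the project's actual Alembic-managed schema.

```python
import sqlite3

# Illustrative schema: stories, a deduplicated word table, and a
# many-to-many association table carrying per-story word counts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stories (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        posted_at TEXT NOT NULL
    );
    CREATE TABLE words (
        id INTEGER PRIMARY KEY,
        word TEXT UNIQUE NOT NULL
    );
    CREATE TABLE story_words (
        story_id INTEGER NOT NULL REFERENCES stories(id),
        word_id INTEGER NOT NULL REFERENCES words(id),
        count INTEGER NOT NULL,
        PRIMARY KEY (story_id, word_id)
    );
    -- index for the analytical access pattern: "all stories for a word"
    CREATE INDEX idx_story_words_word ON story_words(word_id);
""")
conn.execute("INSERT INTO stories VALUES (1, 'Show HN: demo', '2024-01-01')")
conn.execute("INSERT INTO words VALUES (1, 'demo')")
conn.execute("INSERT INTO story_words VALUES (1, 1, 2)")
row = conn.execute(
    "SELECT s.title, w.word, sw.count FROM story_words sw "
    "JOIN stories s ON s.id = sw.story_id "
    "JOIN words w ON w.id = sw.word_id"
).fetchone()
```

The composite primary key on `(story_id, word_id)` prevents duplicate associations, while the secondary index on `word_id` serves word-first analytical queries like burst detection.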

04

Statistical Analysis

Established baselines across the full dataset, performed t-tests for significance, and calculated effect sizes. Developed an Event Coherence Index measuring cross-dimensional synchronization.
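The two standard calculations named above, a t statistic and an effect size, look like this when written out. The project's pipeline uses SciPy; this sketch uses only the standard library so the formulas are visible, and the sample data is invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(a, b):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2)
                  / (na + nb - 2))
    return (mean(b) - mean(a)) / pooled

def welch_t(a, b):
    """Welch's t statistic (SciPy's ttest_ind would also return a p-value)."""
    va, vb = stdev(a) ** 2 / len(a), stdev(b) ** 2 / len(b)
    return (mean(b) - mean(a)) / sqrt(va + vb)

baseline = [1, 2, 3, 4, 5]   # invented dimension scores, baseline period
event    = [3, 4, 5, 6, 7]   # invented scores around an event
d = cohens_d(baseline, event)   # ~1.26, conventionally a large effect
t = welch_t(baseline, event)
```

Reporting d alongside the t-test matters at this corpus size: with ~200K stories, tiny shifts reach significance, so the effect size tells you whether a shift is also meaningful.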

05

Visualization & Delivery

Created interactive Plotly dashboards enabling exploration of the two-year dataset. Built a responsive portfolio site demonstrating technical execution. Documented the methodology and prepared the project for GitHub publication.