Debra Capadona
Senior Technical Program Manager | Data Engineering & ML Systems
Edge Condition Analysis is a personal research and delivery framework for exploring complex data and ML systems.
A demonstration of my ability to deliver modern end-to-end systems: from architecture and data pipelines to scalable ML inference and validation.
This project is an evolving, scalable linguistic analysis platform designed to explore how collective language shifts under changing conditions. It currently analyzes ~200,000 Hacker News discussions (2024-2025) using nine custom BERT models trained with a weakly supervised approach, combining small curated seed labels with unsupervised pattern discovery. The system is built to scale across larger corpora as additional data sources and analytical dimensions are introduced.
The work focuses on surfacing interpretable linguistic signals (emotional valence, urgency, certainty decay, and topic dominance) rather than producing finalized predictive outputs. The architecture supports ongoing experimentation, model iteration, and methodological refinement, allowing analytical assumptions and measurement strategies to evolve alongside the data.
Interactive Linguistic Dimensions Timeline
Explore 9 BERT-based linguistic dimensions across 2024-2025. Hover for details, toggle dimensions on/off, use the slider to zoom into specific timeframes. Event markers show major 2024-2025 events for context.
Word Burst Explorer
Interactive visualization showing which words "burst" (appear significantly more than baseline) in Hacker News discussions each month. Use the slider to navigate through 24 months of data.
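The "burst" idea above can be sketched as a simple frequency-ratio test: a word bursts in a month when its per-token frequency substantially exceeds its corpus-wide baseline frequency. This is a minimal illustration, not the production scoring; the function name, smoothing, and thresholds are assumptions.

```python
from collections import Counter

def find_bursts(month_tokens, baseline_tokens, min_count=5, ratio_threshold=3.0):
    """Return {word: ratio} for words whose monthly frequency exceeds the
    baseline frequency by at least ratio_threshold."""
    month = Counter(month_tokens)
    baseline = Counter(baseline_tokens)
    m_total = sum(month.values())
    b_total = sum(baseline.values())
    bursts = {}
    for word, count in month.items():
        if count < min_count:
            continue  # ignore rare words that can't support a claim
        month_freq = count / m_total
        # Add-one smoothing so words unseen in the baseline don't divide by zero.
        base_freq = (baseline[word] + 1) / (b_total + 1)
        ratio = month_freq / base_freq
        if ratio >= ratio_threshold:
            bursts[word] = round(ratio, 2)
    return bursts
```

A real system would also correct for document length and month size, but the ratio-over-baseline core is the same.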
Technology Stack
System Architecture
Data Pipeline:
1. Collection → Scraped 197K HN stories via API with incremental checkpointing
2. Processing → Tokenized and normalized text, created word-level indexes
3. ML Inference → Applied 9 BERT models using GPU acceleration (~300 stories/sec)
4. Storage → PostgreSQL with optimized schema, indexes, and foreign keys
5. Analysis → Statistical validation, baseline establishment, coherence scoring
6. Visualization → Interactive Plotly dashboards for exploration
Infrastructure: Docker containerization, Alembic migrations, GPU-optimized PyTorch models
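A pipeline of this shape can be wired as a chain of stage functions, each consuming the previous stage's output, which keeps every stage independently testable and replaceable. This is a schematic sketch only: the stage names come from the list above, and the bodies are placeholders for the real collection, tokenization, and inference code.

```python
def collect():
    """Stage 1 (placeholder): fetch raw stories."""
    return [{"id": 1, "title": "Show HN: a demo"}]

def process(stories):
    """Stage 2 (placeholder): tokenize and normalize titles."""
    for s in stories:
        s["tokens"] = s["title"].lower().split()
    return stories

def infer(stories):
    """Stage 3 (placeholder): attach per-dimension model scores."""
    for s in stories:
        s["scores"] = {"urgency": 0.1}
    return stories

def run_pipeline():
    """Run the stages in order; later stages see earlier stages' output."""
    return infer(process(collect()))
```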
System Architecture & Methodology
Project Scope
Built an end-to-end linguistic analysis platform to demonstrate technical program management capabilities for senior roles at companies like Coinbase. The system showcases:
- Large-scale data engineering - Processing millions of records with proper database design
- ML model deployment - Training and deploying 9 BERT models with GPU acceleration
- Statistical rigor - Establishing baselines, running t-tests, calculating effect sizes
- Modern tech stack - Docker, PostgreSQL, PyTorch, interactive visualizations
- Project delivery - Complete system from architecture to deployment in 6 weeks
Data Collection
Designed and implemented an incremental data pipeline scraping the Hacker News API. Collected 197,496 stories across 2024-2025 with proper error handling, rate limiting, and resume capability.
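The checkpoint-and-resume pattern can be sketched as below. The checkpoint file name is an assumption, and fetching is stubbed as an injected callable (the real pipeline would hit the Hacker News Firebase API and add rate limiting and error handling around that call).

```python
import json, os

CHECKPOINT = "scrape_checkpoint.json"  # hypothetical path

def load_checkpoint():
    """Resume from the last saved item id, or start fresh at 0."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)["last_id"]
    return 0

def save_checkpoint(last_id):
    with open(CHECKPOINT, "w") as f:
        json.dump({"last_id": last_id}, f)

def scrape(fetch_item, start_id, end_id, batch=100):
    """Fetch items in [start_id, end_id), checkpointing every `batch` ids
    so an interrupted run resumes where it left off instead of restarting."""
    stories = []
    for item_id in range(max(start_id, load_checkpoint()), end_id):
        item = fetch_item(item_id)  # real code: GET the item from the HN API
        if item is not None:
            stories.append(item)
        if item_id % batch == 0:
            save_checkpoint(item_id)
    save_checkpoint(end_id)
    return stories
```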
ML Model Training
Trained 9 custom BERT models for linguistic dimension analysis. Optimized for GPU inference, achieving ~300 stories/second processing speed. Models saved and versioned for reproducibility.
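Throughput at that level comes largely from batching, and the batching logic can be sketched independently of the model. Here the scorer is an injected stub standing in for a real BERT forward pass; in the actual system it would tokenize a list of strings and run the model on the GPU under `torch.no_grad()`. Names and the batch size are assumptions.

```python
def batched(items, batch_size):
    """Yield successive fixed-size chunks; the last chunk may be smaller."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def run_inference(texts, score_batch, batch_size=64):
    """Score texts batch by batch instead of one at a time, so the
    (stubbed) model call amortizes its per-call overhead."""
    results = []
    for chunk in batched(texts, batch_size):
        results.extend(score_batch(chunk))
    return results
```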
Database Architecture
Designed normalized PostgreSQL schema handling millions of word-level associations. Implemented proper indexing, foreign keys, and Alembic migrations. Optimized queries for analytical workloads.
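A normalized schema of that shape, stories, words, and a word-association join table with foreign keys plus an index for the analytical access path, can be sketched as below. SQLite is used here only so the snippet runs anywhere; the production system uses PostgreSQL with Alembic-managed migrations, and the table and column names are assumptions.

```python
import sqlite3

DDL = """
CREATE TABLE stories (
    id        INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    posted_at TEXT NOT NULL            -- ISO-8601 timestamp
);
CREATE TABLE words (
    id   INTEGER PRIMARY KEY,
    word TEXT NOT NULL UNIQUE
);
-- Join table holding the word-level associations per story.
CREATE TABLE story_words (
    story_id INTEGER NOT NULL REFERENCES stories(id),
    word_id  INTEGER NOT NULL REFERENCES words(id),
    count    INTEGER NOT NULL DEFAULT 1,
    PRIMARY KEY (story_id, word_id)
);
-- Index for the common analytical query: which stories contain word X?
CREATE INDEX idx_story_words_word ON story_words(word_id);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
```

The composite primary key on `story_words` both deduplicates associations and serves as the story-to-words index, so only the reverse direction needs an explicit index.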
Statistical Analysis
Established baselines across the full dataset, performed t-tests for significance, and calculated effect sizes. Developed an Event Coherence Index measuring cross-dimensional synchronization.
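The significance and effect-size machinery can be illustrated with Welch's t-statistic and Cohen's d computed directly; the production analysis presumably uses a library such as SciPy, but a standalone version keeps the arithmetic visible.

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic: mean difference scaled by the combined
    standard error, without assuming equal variances."""
    va, vb = variance(a), variance(b)
    return (mean(a) - mean(b)) / ((va / len(a) + vb / len(b)) ** 0.5)

def cohens_d(a, b):
    """Effect size: mean difference over the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled ** 0.5
```

The t-statistic still needs degrees of freedom (Welch-Satterthwaite) and a t-distribution lookup to become a p-value, which is where a stats library earns its keep.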
Visualization & Delivery
Created interactive Plotly dashboards enabling exploration of the 2-year dataset. Built a responsive portfolio site demonstrating technical execution. Documented the methodology and prepared the codebase for GitHub publication.