Purpose
Demonstrate product thinking, text analytics, interpretable AI-assisted analysis,
and dashboard storytelling. This project is a research portfolio piece — not financial
advice, legal guidance, or an academic regression model ready for peer review.
The underlying research question: do companies that experience negative cyclical
financial performance, or top management team turnover, exhibit measurably different
disclosure language patterns in their annual reports?
Out of Scope
The following are intentionally excluded from this demo version:
User authentication · Live SEC/EDGAR scraping · Paid data integrations (Bloomberg, Refinitiv) ·
Real-time financial data · Multi-user workspaces · Full academic regression engine with
bootstrapped confidence intervals · Production-grade MLOps pipeline.
Tech Stack
Analysis: Python · pandas · scikit-learn (TF-IDF, cosine similarity) ·
NLTK (tokenization, Gunning Fog readability index) · Loughran–McDonald financial dictionary ·
FinBERT-compatible sentiment schema.
Demo interface: originally a Streamlit (Python) app, ported to static HTML/JS
for integration into this blog. Chart.js for interactive visualizations.
Data: EDGAR 10-K filings (Item 7 MD&A sections) ·
S&P Global financial data · BoardEx leadership events.
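As a rough illustration of how the TF-IDF and cosine-similarity pieces listed above might fit together, the sketch below scores how much a company's MD&A text changes from one fiscal year to the next. The function name `yoy_similarity`, the `mdna_by_year` variable, and the placeholder strings are assumptions made for this example, not the project's actual code or data; only the scikit-learn calls are real library API.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def yoy_similarity(sections):
    """Cosine similarity between each consecutive pair of texts.

    `sections` is a list of MD&A texts for one company, ordered by
    fiscal year. Returns len(sections) - 1 scores between 0 and 1.
    (Hypothetical helper for illustration.)
    """
    # Fit one vocabulary across all years so the vectors are comparable.
    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform(sections)
    return [
        float(cosine_similarity(tfidf[i], tfidf[i + 1])[0, 0])
        for i in range(len(sections) - 1)
    ]


# Toy placeholder strings, not real filing text.
mdna_by_year = [
    "Revenue grew on strong demand and improved margins.",
    "Revenue grew on strong demand and improved margins.",
    "Impairment charges and restructuring followed the executive departure.",
]
scores = yoy_similarity(mdna_by_year)
print(scores)
```

A low score between two adjacent years would flag a filing whose language shifted sharply, which is the kind of signal the research question is after. Whether the real pipeline fits the vocabulary per company or across the whole corpus is not stated here, so treat that choice as an open assumption.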
Extensions
Natural extensions of the current architecture:
· Replace the LM lexical scorer with a fine-tuned FinBERT model once labeled examples exist
· Add entity extraction to auto-classify leadership events from 8-K press releases
· Extend to 10-K Business Description (Item 1) for strategy-level language changes
· Build a regression model to test whether narrative shifts are significantly associated with financial outcomes
· Expand to multiple sectors (energy, financial services, consumer)
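To make the first extension more concrete, here is a minimal sketch of a lexicon-based tone scorer placed behind a swappable function argument, so that a model-based scorer could later be substituted without touching the calling code. Everything in it is an assumption for illustration: the word sets are tiny hypothetical stand-ins (not the actual Loughran–McDonald lists), the regex tokenizer stands in for NLTK, and the names `lexical_tone` and `score_sections` are invented for this example.

```python
import re

# Hypothetical mini word lists for illustration only; the real
# financial dictionary is far larger and is not reproduced here.
NEGATIVE = {"loss", "impairment", "decline", "adverse", "litigation"}
POSITIVE = {"gain", "improved", "growth", "strong", "achieved"}


def lexical_tone(text):
    """Return (positive - negative) / total tokens, or 0.0 for empty text."""
    # Simple regex tokenizer used as a stand-in for an NLTK tokenizer.
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)


def score_sections(sections, scorer=lexical_tone):
    """Score each text with whichever scorer is plugged in.

    Any callable that maps a string to a float can replace the default,
    which is where a fine-tuned model could slot in later.
    """
    return [scorer(s) for s in sections]


print(score_sections(["Strong growth this year.", "Impairment and loss."]))
```

Keeping the scorer as a plain `str -> float` callable is one possible design; a sentence-level model that returns class probabilities would need a thin adapter to fit the same interface.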