Cosine Similarity
TF-IDF vector representation of each annual filing, compared year-over-year using cosine similarity.
A score of 1.0 indicates identical term distributions; scores below 0.75 are treated as potentially
significant narrative changes. This is the primary stability signal — other signals are used to
characterize why the text changed.
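
The comparison described above can be sketched in plain Python as follows. This is a minimal illustration, not the project's actual code: the whitespace tokenizer, the smoothed IDF formula, the choice to fit IDF over one company's full filing history, and the names `tfidf_vectors`, `yoy_similarity`, and `years_below_threshold` are all assumptions. Only the 0.75 threshold comes from the description above.

```python
import math
from collections import Counter

SIMILARITY_THRESHOLD = 0.75  # from the text: below this counts as a potential change


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using a smoothed IDF."""
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer (assumption)
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def yoy_similarity(filings_by_year):
    """Map each year (except the first) to its similarity with the prior year."""
    years = sorted(filings_by_year)
    vectors = tfidf_vectors([filings_by_year[y] for y in years])
    return {
        years[i]: cosine_similarity(vectors[i - 1], vectors[i])
        for i in range(1, len(years))
    }


def years_below_threshold(similarities, threshold=SIMILARITY_THRESHOLD):
    """Years whose similarity to the prior year falls below the threshold."""
    return [year for year, score in sorted(similarities.items()) if score < threshold]
```

Calling `yoy_similarity` with a `{year: text}` mapping returns one score per adjacent pair of years, keyed by the later year.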
Sentiment Analysis
Finance-oriented positive, negative, and uncertainty word ratios derived from the
Loughran–McDonald (LM) financial sentiment dictionary. Net tone = positive ratio − negative ratio.
A FinBERT-compatible output schema is included as a placeholder for supervised model integration
once labeled data is available. The deterministic LM model runs fully offline.
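
A minimal sketch of the dictionary-based ratios follows. The three word sets are tiny illustrative stand-ins, not the Loughran–McDonald lists, which would need to be loaded separately. Dividing by the total token count and the name `lm_tone` are also assumptions.

```python
import re

# Illustrative stand-ins only; the real LM word lists are far larger.
POSITIVE = {"gain", "improved", "strong"}
NEGATIVE = {"loss", "impairment", "litigation"}
UNCERTAINTY = {"may", "could", "uncertain"}


def lm_tone(text):
    """Return positive, negative, and uncertainty ratios plus net tone."""
    tokens = re.findall(r"[a-z]+", text.lower())
    total = len(tokens)
    if total == 0:
        return {"positive": 0.0, "negative": 0.0, "uncertainty": 0.0, "net_tone": 0.0}
    positive = sum(t in POSITIVE for t in tokens) / total
    negative = sum(t in NEGATIVE for t in tokens) / total
    uncertainty = sum(t in UNCERTAINTY for t in tokens) / total
    return {
        "positive": positive,
        "negative": negative,
        "uncertainty": uncertainty,
        "net_tone": positive - negative,  # net tone = positive ratio - negative ratio
    }
```

Because the function is a pure lookup over fixed word sets, it is deterministic and needs no network access, which matches the offline behavior described above.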
Readability
Gunning Fog index, which estimates the years of formal education required to read a passage on
first encounter. An index of 12 corresponds roughly to a U.S. high school senior, 13 to 16 to college level, and 17 or above to graduate level. Year-over-year increases
in Fog index are associated with defensive or legalistic disclosure language — a secondary signal
in abnormal change scoring.
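
The standard Gunning Fog formula is 0.4 × (words per sentence + 100 × complex words / words), where a complex word has three or more syllables. The sketch below uses a crude vowel-group syllable counter and skips the usual exclusions for proper nouns, compound words, and common suffixes, so its values only approximate a reference implementation. The names `count_syllables` and `gunning_fog` are hypothetical.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: the number of vowel groups (heuristic assumption)."""
    return max(len(re.findall(r"[aeiouy]+", word.lower())), 1)


def gunning_fog(text):
    """Approximate Gunning Fog index for a passage of English text."""
    # Naive sentence split; decimals such as "3.5" would be miscounted.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    complex_words = sum(count_syllables(w) >= 3 for w in words)
    average_sentence_length = len(words) / len(sentences)
    percent_complex = 100.0 * complex_words / len(words)
    return 0.4 * (average_sentence_length + percent_complex)


def fog_delta(previous_text, current_text):
    """Year-over-year change in Fog index; positive means harder to read."""
    return gunning_fog(current_text) - gunning_fog(previous_text)
```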
Topic Shift
TF-IDF keyword extraction identifies the dominant vocabulary cluster for each filing year.
Topic shift score is computed as the Jaccard distance between the top-N keyword sets of
adjacent years. High topic shift alongside a similarity drop suggests strategic repositioning
or an external event introducing new vocabulary (e.g., a product launch, regulatory action,
or acquisition).
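
The Jaccard distance between two keyword sets is 1 − |A ∩ B| / |A ∪ B|. A short sketch follows, assuming each year's keywords arrive as a `{term: weight}` mapping. The default of N = 20 and the function names are illustrative assumptions, since the text does not state a value for N.

```python
def top_n_keywords(weights, n=20):
    """Return the n highest-weighted terms, breaking ties alphabetically."""
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return {term for term, _ in ranked[:n]}


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def topic_shift(previous_weights, current_weights, n=20):
    """Topic shift score between two adjacent filing years."""
    return jaccard_distance(
        top_n_keywords(previous_weights, n),
        top_n_keywords(current_weights, n),
    )
```

A score of 0.0 means the two years share the same top keywords, and 1.0 means they share none.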
Abnormal Detection
A weighted rule model combining four narrative signals (similarity drop, tone delta,
readability delta, topic shift) with two contextual signals (ROA movement, leadership-event
proximity). A year is flagged when several moderate signals occur together or when a single
narrative signal is very large. Thresholds are intentionally transparent and can be replaced
with a statistical or supervised model once labeled data exists.
Scoring weights (current defaults):
similarity drop 35% · tone delta 20% · readability delta 15% · topic shift 20% ·
ROA movement 5% · leadership-event proximity 5%.
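
One way the weighted rule could look in code is sketched below. The weights are the defaults listed above. Everything else is an assumption made for illustration: that each signal is already normalized to the range 0 to 1, the flag threshold of 0.5, the single-signal override of 0.9, and the names `abnormal_score` and `is_flagged`.

```python
WEIGHTS = {
    "similarity_drop": 0.35,
    "tone_delta": 0.20,
    "readability_delta": 0.15,
    "topic_shift": 0.20,
    "roa_movement": 0.05,
    "leadership_proximity": 0.05,
}
NARRATIVE_SIGNALS = ("similarity_drop", "tone_delta", "readability_delta", "topic_shift")

FLAG_THRESHOLD = 0.5   # hypothetical value, not taken from the text
LARGE_SIGNAL = 0.9     # hypothetical value, not taken from the text


def abnormal_score(signals):
    """Weighted sum of signals, each clamped to [0, 1]; missing signals count as 0."""
    return sum(
        weight * min(max(signals.get(name, 0.0), 0.0), 1.0)
        for name, weight in WEIGHTS.items()
    )


def is_flagged(signals):
    """Flag on a high combined score or on one very large narrative signal."""
    if abnormal_score(signals) >= FLAG_THRESHOLD:
        return True
    return any(signals.get(name, 0.0) >= LARGE_SIGNAL for name in NARRATIVE_SIGNALS)
```

Keeping the weights and thresholds as module-level constants keeps the rule easy to inspect and easy to swap for a fitted model later.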
Event Classification
Leadership events (CEO turnover, CFO succession, board changes) are classified by type
(planned succession, forced departure, lateral hire, external appointment) and linked to
the fiscal year in which they occurred. The model tests whether co-occurrence with
narrative abnormality exceeds what would be expected by chance across the sample.
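
The text does not name the statistical test used. One plausible choice, shown here purely as an assumption, is a one-sided Fisher-style test: given the counts of firm-years with a leadership event and firm-years flagged as abnormal, compute the hypergeometric probability of seeing at least the observed overlap by chance. The function name and argument names are hypothetical.

```python
from math import comb


def cooccurrence_p_value(n_total, n_events, n_flagged, n_both):
    """P(overlap >= n_both) if flags were assigned independently of events.

    n_total   -- firm-years in the sample
    n_events  -- firm-years with a leadership event
    n_flagged -- firm-years flagged as narratively abnormal
    n_both    -- firm-years with both an event and a flag
    """
    if n_flagged == 0 or n_events == 0:
        return 1.0
    upper = min(n_events, n_flagged)
    # Hypergeometric upper tail: sum the probabilities of each overlap size k.
    favorable = sum(
        comb(n_events, k) * comb(n_total - n_events, n_flagged - k)
        for k in range(n_both, upper + 1)
    )
    return favorable / comb(n_total, n_flagged)
```

A small p-value would suggest that leadership events and flagged years coincide more often than chance alone explains. It says nothing about the direction of causation.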