Methodology — Corporate Narrative Intelligence Lab

Cosine Similarity

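TF-IDF vector representation of each annual filing, compared year-over-year using cosine similarity. A score of 1.0 means the two filings have proportional TF-IDF vectors (the same relative term weighting). Because TF-IDF weights are non-negative, the score cannot fall below 0, which means the filings share no weighted terms. Scores below 0.75 are treated as potentially significant narrative changes. This is the primary stability signal; the other signals characterize why the text changed.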
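
A minimal, dependency-free sketch of the year-over-year comparison follows. It assumes lowercase alphabetic tokens, raw term counts as TF, and a smoothed IDF of ln((1 + n) / (1 + df)) + 1 fitted over the filings passed in. The tokenizer, the IDF variant, and all function names are illustrative assumptions, not the lab's pipeline. A production version would more likely use a library vectorizer with stop-word handling.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase alphabetic tokens only; no stop-word removal or stemming.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for c in counts for term in c)
    # Smoothed IDF (an assumed variant): ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty filing shares nothing with any other
    return dot / (norm_a * norm_b)


def year_over_year_similarity(filings: dict[int, str]) -> dict[int, float]:
    """Map each fiscal year (after the first) to its similarity with the prior filing."""
    years = sorted(filings)
    vectors = dict(zip(years, tfidf_vectors([filings[y] for y in years])))
    return {
        cur: cosine_similarity(vectors[prev], vectors[cur])
        for prev, cur in zip(years, years[1:])
    }


def flag_significant_changes(sims: dict[int, float], threshold: float = 0.75) -> list[int]:
    """Years whose similarity to the prior filing falls below the threshold."""
    return [year for year, score in sims.items() if score < threshold]
```

For example, `year_over_year_similarity({2020: text_2020, 2021: text_2021})` returns `{2021: score}`, and `flag_significant_changes` keeps only the years scoring below 0.75.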

Sentiment Analysis

Finance-oriented positive, negative, and uncertainty word ratios derived from the Loughran–McDonald (LM) financial sentiment dictionary. Net tone = positive ratio − negative ratio. A FinBERT-compatible output schema is included as a placeholder for supervised model integration once labeled data is available. The LM dictionary scoring is deterministic and runs fully offline.
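
A small sketch of dictionary-based tone scoring follows. It assumes each ratio is the number of dictionary hits divided by the total word count, and it borrows the `lm_pos`, `lm_neg`, `lm_unc`, and `net_tone` names from the data model. The word sets are hypothetical placeholders, not the published LM lists, which would be loaded in their place.

```python
import re
from dataclasses import dataclass

# Placeholder word sets for illustration only; these are NOT the published
# Loughran-McDonald lists, which should be loaded in their place.
POSITIVE = {"strong", "improved", "achieve", "gain"}
NEGATIVE = {"loss", "impairment", "decline", "adverse"}
UNCERTAINTY = {"uncertain", "approximately", "possibly", "depend"}


@dataclass(frozen=True)
class ToneScores:
    lm_pos: float
    lm_neg: float
    lm_unc: float
    net_tone: float


def lm_tone(text: str) -> ToneScores:
    # Assumption: each ratio is dictionary hits divided by total word count.
    words = re.findall(r"[a-z]+", text.lower())
    total = len(words)
    if total == 0:
        return ToneScores(0.0, 0.0, 0.0, 0.0)
    pos = sum(w in POSITIVE for w in words) / total
    neg = sum(w in NEGATIVE for w in words) / total
    unc = sum(w in UNCERTAINTY for w in words) / total
    # Net tone = positive ratio - negative ratio, as defined above.
    return ToneScores(lm_pos=pos, lm_neg=neg, lm_unc=unc, net_tone=pos - neg)
```

Because the scorer is a pure function of the text and the word lists, rerunning it on the same filing always yields the same scores.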

Readability

Gunning Fog index, which estimates the years of formal education a reader needs to understand a passage on first reading. An index of 12 corresponds roughly to a high-school senior, 13 to 16 to college level, and 17 or above to a college graduate. Year-over-year increases in the Fog index are associated with defensive or legalistic disclosure language and enter abnormal change scoring as a secondary signal.
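
The standard formula is Fog = 0.4 × (words / sentences + 100 × complex words / words), where complex words have three or more syllables. The sketch below implements it with a rough vowel-group syllable counter. That counter, the regex sentence splitter, and the `fog_delta` helper are assumptions for illustration; they are not a validated readability implementation.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, discounting a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def gunning_fog(text: str) -> float:
    """Fog = 0.4 * (words per sentence + 100 * complex words / words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    # Simplification: every word of three or more syllables counts as complex;
    # Gunning's refinements for proper nouns, compounds, and suffixes are skipped.
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences) + 100 * len(complex_words) / len(words))


def fog_delta(prior_text: str, current_text: str) -> float:
    """Year-over-year change; positive values mean the current filing reads harder."""
    return gunning_fog(current_text) - gunning_fog(prior_text)
```

The naive splitter also breaks on decimal points such as "3.5", so a real pipeline would need a proper sentence tokenizer.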

Topic Shift

TF-IDF keyword extraction identifies the dominant vocabulary (the top-N highest-weighted terms) for each filing year. Topic shift score is the Jaccard distance between the top-N keyword sets A and B of adjacent years, 1 − |A ∩ B| / |A ∪ B|, which is 0 when the sets are identical and 1 when they share no terms. High topic shift alongside a similarity drop suggests strategic repositioning or an external event introducing new vocabulary (e.g., a product launch, regulatory action, or acquisition).
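
The sketch below computes the topic shift score from two per-year term-weight maps, such as TF-IDF vectors. The default N = 20 and the alphabetical tie-break are hypothetical choices, since the source does not state N.

```python
def top_n_keywords(weights: dict[str, float], n: int = 20) -> set[str]:
    """Top-n terms by TF-IDF weight; ties broken alphabetically for determinism."""
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return {term for term, _ in ranked[:n]}


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B|; 0 for identical sets, 1 for disjoint sets."""
    union = a | b
    if not union:
        return 0.0  # two empty keyword sets are treated as unchanged
    return 1.0 - len(a & b) / len(union)


def topic_shift(prior_weights: dict[str, float],
                current_weights: dict[str, float], n: int = 20) -> float:
    """Jaccard distance between the top-n keyword sets of adjacent years."""
    return jaccard_distance(top_n_keywords(prior_weights, n),
                            top_n_keywords(current_weights, n))
```

For example, the sets {cloud, ai, margin} and {cloud, ai, tariff} share 2 of 4 distinct terms, giving a distance of 0.5.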

Abnormal Detection

A weighted rule model combining four narrative signals (similarity drop, tone delta, readability delta, topic shift) with two contextual signals (ROA movement, leadership-event proximity). A year is flagged when several signals are moderately elevated at the same time, or when a single narrative signal is very large. Thresholds are intentionally transparent and can be replaced with a statistical or supervised model once labeled data exists.

Scoring weights (current defaults): similarity drop 35% · tone delta 20% · readability delta 15% · topic shift 20% · ROA movement 5% · leadership-event proximity 5%.
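
A sketch of the weighted rule using the default weights above follows. It assumes every signal has already been normalized to [0, 1], with tone and readability deltas taken as magnitudes. The two thresholds are hypothetical placeholders, since the source does not publish its cut-offs. The composite score captures "several moderate signals", and the single-signal check captures "one very large narrative signal". Contextual signals only add weight to the composite score; they never trigger a flag on their own.

```python
WEIGHTS = {
    "similarity_drop": 0.35,
    "tone_delta": 0.20,
    "readability_delta": 0.15,
    "topic_shift": 0.20,
    "roa_movement": 0.05,
    "leadership_proximity": 0.05,
}
NARRATIVE_SIGNALS = ("similarity_drop", "tone_delta", "readability_delta", "topic_shift")

# Hypothetical thresholds for illustration; not the lab's published values.
COMPOSITE_THRESHOLD = 0.45
SINGLE_SIGNAL_THRESHOLD = 0.90


def _clip(value: float) -> float:
    # Assumption: signals are pre-normalized; clipping guards against outliers.
    return min(max(value, 0.0), 1.0)


def abnormal_score(signals: dict[str, float]) -> float:
    """Weighted sum of normalized signals; missing signals count as 0."""
    return sum(w * _clip(signals.get(name, 0.0)) for name, w in WEIGHTS.items())


def is_abnormal(signals: dict[str, float]) -> bool:
    # Several moderate signals together push the composite over its threshold.
    if abnormal_score(signals) >= COMPOSITE_THRESHOLD:
        return True
    # Or one very large narrative signal suffices on its own.
    return any(_clip(signals.get(name, 0.0)) >= SINGLE_SIGNAL_THRESHOLD
               for name in NARRATIVE_SIGNALS)
```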

Event Classification

Leadership events (CEO turnover, CFO succession, board changes) are classified by type (planned succession, forced departure, lateral hire, external appointment) and linked to the fiscal year in which they occurred. The model tests whether co-occurrence with narrative abnormality exceeds what would be expected by chance across the sample.
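
The source does not name the statistical test. One common choice is a permutation test. It shuffles the abnormal flags across firm-years and recounts how many land on leadership-event years. It then reports the share of shuffles whose overlap is at least as large as the observed overlap. The sketch assumes one (abnormal, event) pair per firm-year and shuffles across the whole sample. Shuffling within each firm would better control for firm-level base rates. The function names are illustrative.

```python
import random


def cooccurrences(abnormal: list[bool], event: list[bool]) -> int:
    """Number of firm-years that are both narratively abnormal and event years."""
    return sum(1 for a, e in zip(abnormal, event) if a and e)


def permutation_p_value(abnormal: list[bool], event: list[bool],
                        n_permutations: int = 10_000, seed: int = 0) -> float:
    """One-sided p-value: chance of at least the observed overlap under random assignment."""
    if len(abnormal) != len(event):
        raise ValueError("abnormal and event must cover the same firm-years")
    rng = random.Random(seed)
    observed = cooccurrences(abnormal, event)
    shuffled = list(abnormal)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any link between flags and events
        if cooccurrences(shuffled, event) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the Monte Carlo estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)
```

A small p-value indicates that abnormal years coincide with leadership events more often than random placement of the same number of flags would produce.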

Entity	Key Fields	Source
filings	ticker · fiscal_year · section · raw_text	EDGAR 10-K
textual_metrics	word_count · net_tone · lm_pos/neg/unc · fog_index · topic_label	Computed
narrative_change	cosine_similarity · similarity_drop · Δtone · Δfog · topic_shift · abnormal_flag	Computed
financial_metrics	total_assets · roa · revenue_growth	S&P Global
leadership_events	event_date · event_type · turnover_type · successor_origin	BoardEx / Manual
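
The entities above can be sketched as typed records. The sketch assumes each computed and financial entity is keyed by `ticker` and `fiscal_year`, although the table lists those keys only for `filings`. It also renames Δtone and Δfog to the hypothetical identifiers `tone_delta` and `fog_delta`, and all field types are assumptions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Filing:                # source: EDGAR 10-K
    ticker: str
    fiscal_year: int
    section: str
    raw_text: str


@dataclass(frozen=True)
class TextualMetrics:        # computed
    ticker: str              # assumed join key
    fiscal_year: int         # assumed join key
    word_count: int
    net_tone: float
    lm_pos: float
    lm_neg: float
    lm_unc: float
    fog_index: float
    topic_label: str


@dataclass(frozen=True)
class NarrativeChange:       # computed, one row per adjacent-year comparison
    ticker: str
    fiscal_year: int
    cosine_similarity: float
    similarity_drop: float
    tone_delta: float        # Δtone in the table (hypothetical name)
    fog_delta: float         # Δfog in the table (hypothetical name)
    topic_shift: float
    abnormal_flag: bool


@dataclass(frozen=True)
class FinancialMetrics:      # source: S&P Global
    ticker: str
    fiscal_year: int
    total_assets: float
    roa: float
    revenue_growth: float


@dataclass(frozen=True)
class LeadershipEvent:       # source: BoardEx / manual
    ticker: str
    event_date: date
    event_type: str
    turnover_type: str
    successor_origin: str
```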

Hybrid Textual Analysis Model