System Architecture
From raw news ingestion across four global data sources to confidence-scored, verified risk signals delivered in under five seconds.
01
Real-time data collection from global feeds
The ingestion layer continuously monitors four global data sources for financial news and regulatory filings. Each source runs on independent polling cycles with adaptive throttling to respect rate limits while maintaining sub-minute latency on breaking stories.
GDELT Global News
Context API with configurable lookback windows. Monitors news across 65 languages. Adaptive throttling with retry/backoff on HTTP 429.
Google News RSS
Entity-specific RSS feeds filtered by ticker symbol. Sub-minute latency on breaking stories. XML parsing with fallback encoding.
Yahoo Finance RSS
Market-focused financial headlines: earnings, analyst ratings, dividends. Pre-market and post-market coverage.
SEC EDGAR
EFTS API for 8-K (material events), 10-K (annual), and 10-Q (quarterly) filings. Filing-type-specific severity mapping.
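The adaptive throttling described above can be sketched as exponential backoff with jitter. This is a minimal illustration, not the production poller: the retry ceiling, base delay, and cap values here are assumptions, and the real system presumably tunes them per source.

```python
import random

def next_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay (seconds) before retry `attempt` (0-based): exponential
    growth capped at `cap`, with jitter so parallel feed pollers don't
    retry in lockstep after a shared rate-limit event."""
    exp = min(cap, base * (2 ** attempt))
    return exp / 2 + random.uniform(0, exp / 2)

def should_retry(status: int, attempt: int, max_retries: int = 5) -> bool:
    """Retry on HTTP 429 (rate limited) and 5xx, up to max_retries."""
    return attempt < max_retries and (status == 429 or 500 <= status < 600)
```

A poller would call `should_retry` after each fetch and sleep for `next_delay(attempt)` before trying again, which keeps each source within its rate limits without stalling the others.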
02
Classification, deduplication, and enrichment
Raw articles are classified into 10 event types using keyword-pattern matching and NLP. A three-layer deduplication system ensures zero duplicate signals. Each signal is enriched with entity resolution, severity scoring, and source quality metrics.
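In spirit, keyword-pattern classification can be as simple as a regex per category. The patterns below are illustrative stand-ins; the production lexicons for the 10 categories are not enumerated in this overview, and the real classifier also layers NLP on top.

```python
import re

# Illustrative keyword patterns for three of the ten categories.
PATTERNS = {
    "M&A":   re.compile(r"\b(acquir\w+|merger|takeover|buyout)\b", re.I),
    "Legal": re.compile(r"\b(lawsuit|litigation|settlement|subpoena)\b", re.I),
    "Cyber": re.compile(r"\b(breach|ransomware|data leak|hack\w*)\b", re.I),
}

def classify(headline: str) -> list[str]:
    """Return every matching category; multi-label articles are allowed."""
    return [cat for cat, pat in PATTERNS.items() if pat.search(headline)]
```

Returning a list rather than a single label lets one article feed multiple downstream signals, which matters for multi-entity stories.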
Event Classifier
Keyword-pattern matching across 10 categories: M&A, Legal, Regulatory, Supply Chain, Management, Financial, Market, Opportunity, Cyber, Macro.
Deduplication Engine
Three layers: local SHA-256 hash, HTTP 409 conflict detection, SQLite primary key constraint. Zero duplicates at rest.
Entity Resolver
Maps article mentions to tracked tickers. Handles aliases, subsidiaries, and multi-entity articles.
Severity Scorer
Maps event type + keyword intensity to severity levels: Critical, High, Medium, Low.
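The first dedup layer, the local SHA-256 hash, might look like the sketch below. The choice of hash-input fields is an assumption for illustration; layers two and three (HTTP 409 detection and the SQLite primary key) catch duplicates that cross process or restart boundaries.

```python
import hashlib

def signal_id(source_url: str, headline: str) -> str:
    """Deterministic content hash for dedup layer 1. The field choice
    (URL + headline, normalized) is illustrative, not the real key."""
    canon = f"{source_url.strip().lower()}|{headline.strip().lower()}"
    return hashlib.sha256(canon.encode("utf-8")).hexdigest()

class LocalDedup:
    """In-process seen-set keyed on the content hash."""
    def __init__(self):
        self._seen = set()

    def is_new(self, sig_id: str) -> bool:
        if sig_id in self._seen:
            return False
        self._seen.add(sig_id)
        return True
```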
03
Durable storage with versioning and audit trail
Signals are persisted to SQLite with full audit trails including ingestion timestamp, source URL, classification metadata, and scoring parameters. The storage layer supports schema versioning and provides the foundation for historical accuracy tracking.
Signal Store
SQLite database with unique constraint on signal_id. Stores full signal metadata, sources, and classification data.
Outcome Tracker
Records price movements post-signal for accuracy measurement. Configurable time windows (default 24h).
Audit Log
Full pipeline trace: ingestion time, source, dedup decisions, classification, scoring parameters.
Schema Migration
Versioned schema with forward-compatible migrations. Zero-downtime upgrades.
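A minimal sketch of the storage layer's dedup-by-constraint, assuming an illustrative column set (the real schema is not spelled out here): with `signal_id` as the primary key, `INSERT OR IGNORE` makes the database itself the final deduplication layer.

```python
import sqlite3

# Illustrative schema; the production column set isn't specified above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS signals (
    signal_id      TEXT PRIMARY KEY,   -- layer-3 dedup: PK rejects repeats
    ingested_at    TEXT NOT NULL,
    source_url     TEXT NOT NULL,
    event_type     TEXT NOT NULL,
    severity       TEXT NOT NULL,
    schema_version INTEGER NOT NULL DEFAULT 1
);
"""

def store_signal(conn: sqlite3.Connection, sig: dict) -> bool:
    """INSERT OR IGNORE drops duplicate signal_ids silently; the return
    value distinguishes a fresh insert from an ignored duplicate."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO signals VALUES (:signal_id, :ingested_at,"
        " :source_url, :event_type, :severity, :schema_version)",
        sig,
    )
    return cur.rowcount == 1
```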
04
Multi-factor confidence and impact scoring
The inference layer applies a multi-factor scoring model to generate confidence and impact predictions. Factors include source diversity, verification status, severity-to-move alignment, source quality, and event-type-specific accuracy modifiers learned from historical outcomes.
Confidence Scorer
5-factor weighted model: signal confidence (40%), source diversity (20%), verification status (20%), severity alignment (10%), source quality (10%).
Impact Predictor
Predicts likely price direction and magnitude based on event type, severity, and historical accuracy data.
Accuracy Tracker
Auto-tunes scoring weights based on prediction vs. outcome comparisons. Runs daily calibration.
Quality Gates
Verification checks: minimum source count, confidence floor, cross-source corroboration.
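The 5-factor weighted model reduces to a weighted sum once each factor is normalized. The weights below come from the breakdown above; the assumption that every factor is pre-scaled to [0, 1] is ours, not stated in the overview.

```python
# Weights from the 5-factor model described above; each factor is
# assumed to be normalized to [0, 1] before weighting.
WEIGHTS = {
    "signal_confidence":  0.40,
    "source_diversity":   0.20,
    "verification":       0.20,
    "severity_alignment": 0.10,
    "source_quality":     0.10,
}

def confidence_score(factors: dict) -> float:
    """Weighted sum of the five factors; a missing factor scores zero."""
    return round(sum(WEIGHTS[k] * factors.get(k, 0.0) for k in WEIGHTS), 4)
```

With all factors at 1.0 the score is 1.0; a signal with strong classifier confidence but no corroboration caps out at 0.4, which is where the quality gates' confidence floor would typically cut in.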
05
Multi-channel distribution to end users
Scored signals are distributed through five delivery channels with sub-5-second latency. The REST API authenticates requests with HMAC signatures, while the live dashboard auto-refreshes every 5 seconds. Daily digests, webhooks, and CSV exports complete the channel mix.
REST API
Authenticated endpoints with HMAC signing, pagination, entity filtering, and severity thresholds. Rate limited per API key.
Live Dashboard
Real-time signal feed with 5-second auto-refresh. Severity filtering, investor mode, signal grouping, and AI chat.
Daily Digest
Automated summary emails with top signals, severity breakdown, trend analysis, and accuracy metrics.
Webhooks
HMAC-signed push notifications to downstream systems on new high-severity signals.
CSV Export
Full signal history export with all metadata for offline analysis and audit compliance.
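HMAC signing on the webhook and API paths can be sketched with the standard library. The header name and exact wire format are not pinned down in this overview, so treat this as a shape, not the spec; the constant-time comparison on the receiving side is the important detail.

```python
import hashlib
import hmac

def sign_payload(secret: bytes, body: bytes) -> str:
    """Sender side: HMAC-SHA256 over the raw request body, sent
    alongside the payload (header name is an implementation detail)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify_signature(secret: bytes, body: bytes, signature: str) -> bool:
    """Receiver side: recompute and compare in constant time, so a
    timing side channel can't leak the expected signature."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected, signature)
```

A downstream consumer verifies before parsing: any payload whose signature fails the check is discarded, which protects webhook endpoints from spoofed high-severity signals.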
Pipeline Status: Operational
All 5 stages running. Average end-to-end latency: 4.2 seconds.