weekly-prediction-cycle.py
| Company | Weekly Δ | Tier | Score | Driver |
|---|---|---|---|---|
| JioHotstar | +3.95 | 2 | 62.25 | IPL launch imminent, 300M subscriber leverage |
| COL/BeLive | +3.15 | 3 | 44.55 | FILMART launch converts to execution, SaaS provable |
| Disney | +2.3 | 1 | 76.55 | Locker Diaries #1, DramaBox Accelerator investment |
| ReelShort | -2.05 | 1 | 82.0 | Production head defection, ShortMax 3888% growth eroding position |
| Netflix | -2.0 | 2 | 60.8 | No production activity, mobile engagement gap widening |
| Amazon | -2.6 | 3 | 50.2 | ONLY major platform with zero microdrama strategy |
| KLIP | -2.65 | 4 | 22.35 | Structural squeeze from JioHotstar |
Predictive Signals:
BULLISH: JioHotstar (+9.45), COL/BeLive (+7.25), Disney (+5.55), DramaBox (+5.25), GoodShort (+4.5)
BEARISH: Amazon (-5.8), Netflix (-5.0), ReelShort (-2.6)
| Method | Dir. Accuracy | MAE | Brier Score | Notes |
|---|---|---|---|---|
| Persistence | 23.5% | 1.803 | 0.250 | Predicts no change. Floor performance. |
| Naive Momentum | 23.5% | 1.803 | 0.279 | Extends last-week direction. No improvement over persistence. |
| Mean Reversion | 47.1% | 2.107 | 0.250 | Best baseline. Companies tend to revert toward tier mean. |
| KG-Augmented (defaults) | 23.5% | 1.803 | 0.250 | Default parameters leave significant accuracy on the table. |
| Parameter | Exp 1 Default | Exp 2 Optimized | Delta | Interpretation |
|---|---|---|---|---|
| direction_threshold | 0.500 | 1.295 | +159% | Higher bar for calling a directional move. Reduces false positives. |
| confidence_base | 0.600 | 0.443 | -26% | Lower base confidence. System is more cautious by default. |
| magnitude_thresh_1 | 3.000 | 3.020 | +1% | Near-default. Magnitude thresholds were already reasonable. |
| magnitude_thresh_2 | 5.000 | 5.076 | +2% | Near-default. |
| consistency_thresh | 2.000 | 1.980 | -1% | Near-default. |
| magnitude_bonus_1 | 0.100 | 0.120 | +20% | Slightly rewards larger moves. |
| magnitude_bonus_2 | 0.100 | 0.136 | +36% | Larger bonus for big moves. System learns big moves are informative. |
| consistency_bonus | 0.050 | 0.040 | -20% | Consistency signal matters less than expected. |
| mean_reversion_rate | 0.100 | 0.257 | +157% | Strong mean reversion signal. Companies tend to revert toward tier means. |
| anomaly_contributes | False | True | changed | Anomaly signal activated. Dimension-composite gaps are predictive. |
| divergence_weight | 0.000 | 0.180 | new signal | Inter-dimension divergence is informative (18% weight). |
| tier_proximity_weight | 0.000 | 0.096 | new signal | Proximity to tier boundaries is predictive (9.6% weight). |
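To make the bonus parameters concrete, here is a minimal sketch of how a confidence score might be assembled from them. The parameter names and values come from the table above; the combination rule (additive bonuses on a base, capped at 1.0) is an assumption for illustration, not the engine's documented formula.

```python
# Sketch: confidence assembly from the Exp 2 optimized parameters.
# Values mirror the table above; the additive rule is an assumption.

PARAMS = {
    "confidence_base": 0.443,
    "magnitude_thresh_1": 3.020,
    "magnitude_thresh_2": 5.076,
    "magnitude_bonus_1": 0.120,
    "magnitude_bonus_2": 0.136,
    "consistency_thresh": 1.980,
    "consistency_bonus": 0.040,
}

def confidence(predicted_move: float, consistency: float,
               p: dict = PARAMS) -> float:
    """Base confidence plus bonuses for large and consistent moves."""
    c = p["confidence_base"]
    if abs(predicted_move) >= p["magnitude_thresh_1"]:
        c += p["magnitude_bonus_1"]
    if abs(predicted_move) >= p["magnitude_thresh_2"]:
        c += p["magnitude_bonus_2"]
    if consistency >= p["consistency_thresh"]:
        c += p["consistency_bonus"]
    return min(c, 1.0)

print(round(confidence(3.95, 2.5), 3))  # → 0.603 (magnitude + consistency bonuses)
print(round(confidence(0.5, 0.0), 3))   # → 0.443 (base confidence only)
```

Note how the optimized values make bonuses hard to earn: a move must clear 3.02 points before any bonus applies, consistent with the system's post-optimization caution.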
ETL Load
Loads new week's SBPI scoring data into the Oxigraph RDF store. Validates triples against the SBPI ontology (sbpi.ttl). Currently processing 2,588 triples across 17 companies, 5 dimensions, 3 weekly snapshots.
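A minimal sketch of the scoring-to-triples step. The namespace and predicate names below are hypothetical stand-ins (the real mapping lives in `sbpi_to_rdf.py` and `sbpi.ttl`); it only shows the shape of the transformation.

```python
# Sketch: turn one company-week of SBPI scores into N-Triples lines.
# Namespace and predicate names are hypothetical; see sbpi_to_rdf.py
# for the real ontology mapping.

SBPI = "http://example.org/sbpi#"   # hypothetical namespace

def score_triples(company: str, week: str, scores: dict) -> list:
    """One snapshot node per company-week, one triple per dimension."""
    snap = f"<{SBPI}{company}/{week}>"
    triples = [f'{snap} <{SBPI}company> "{company}" .']
    for dimension, value in scores.items():
        triples.append(f'{snap} <{SBPI}{dimension}> "{value}" .')
    return triples

triples = score_triples("JioHotstar", "2025-W07",
                        {"distribution": 71.0, "community": 55.5})
for t in triples:
    print(t)
```

In production these lines would be bulk-loaded into the Oxigraph server on port 7878 and validated against the `sbpi.ttl` ontology.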
Prediction Accuracy Check
Compares previous week's predictions against actual outcomes. Feeds accuracy metrics into the optimization loop. Skipped if no prior predictions exist.
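The three metrics fed into the optimization loop match the baseline table above. A self-contained sketch, where the record layout (predicted delta, actual delta, confidence) is an assumption:

```python
# Sketch: directional accuracy, MAE, and Brier score for one week of
# predictions. Record layout is assumed: (predicted, actual, confidence).

def sign(x: float) -> int:
    return (x > 0) - (x < 0)

def accuracy_metrics(records: list) -> dict:
    """records: (predicted_delta, actual_delta, confidence) tuples."""
    n = len(records)
    hits = sum(sign(p) == sign(a) for p, a, _ in records)
    mae = sum(abs(p - a) for p, a, _ in records) / n
    # Brier: confidence as P(direction correct) vs. the 0/1 outcome.
    brier = sum((c - (sign(p) == sign(a))) ** 2 for p, a, c in records) / n
    return {"dir_accuracy": hits / n, "mae": mae, "brier": brier}

m = accuracy_metrics([(3.95, 2.1, 0.6), (-2.0, 1.0, 0.5), (-2.6, -3.0, 0.7)])
print(m)  # dir_accuracy ≈ 0.667, mae = 1.75, brier ≈ 0.167
```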
Prediction Generation
Multi-signal prediction engine using the 12 optimized parameters from best-config.json. Generates directional predictions (up/down/stable) with confidence scores and magnitude estimates for each company.
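An illustrative sketch of the directional call. The signal names follow the optimized-parameter table; how the engine actually combines them is not documented here, so the weighted sum below is an assumption.

```python
# Sketch: multi-signal direction call. Signal names follow the
# optimized-parameter table; the weighted-sum combination is assumed.

PARAMS = {
    "direction_threshold": 1.295,
    "mean_reversion_rate": 0.257,
    "divergence_weight": 0.180,
    "tier_proximity_weight": 0.096,
    "anomaly_contributes": True,
}

def predict_direction(score, tier_mean, divergence, tier_boundary_dist,
                      anomaly_gap, p=PARAMS):
    """Return 'up', 'down', or 'stable' from a weighted signal sum."""
    signal = p["mean_reversion_rate"] * (tier_mean - score)  # revert to tier mean
    signal += p["divergence_weight"] * divergence            # inter-dimension gap
    signal += p["tier_proximity_weight"] * tier_boundary_dist
    if p["anomaly_contributes"]:
        signal += anomaly_gap                                # dimension-composite gap
    if signal > p["direction_threshold"]:
        return "up"
    if signal < -p["direction_threshold"]:
        return "down"
    return "stable"

print(predict_direction(score=50.2, tier_mean=44.0, divergence=-1.0,
                        tier_boundary_dist=0.0, anomaly_gap=-0.5))  # → down
```

The high `direction_threshold` (1.295) means weak signal sums resolve to "stable", matching the optimizer's preference for fewer false directional calls.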
Attestation Upgrade
Evaluates evidence quality backing each score. Upgrades attestation metadata based on source diversity, recency, and corroboration. Tracks the provenance chain from raw source to scored assertion.
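A hedged sketch of the upgrade heuristic. The tier names, weights, and 90-day recency window are illustrative assumptions; only the three criteria (source diversity, recency, corroboration) come from the description above.

```python
# Sketch: score evidence by source diversity, recency, and corroboration,
# then map to an attestation tier. Tier names, weights, and the 90-day
# decay window are illustrative assumptions.

from datetime import date

def attestation_level(sources: list, corroborations: int,
                      today: date = date(2025, 2, 1)) -> str:
    """sources: (domain, published) pairs backing one scored assertion."""
    diversity = len({domain for domain, _ in sources})
    newest = max(published for _, published in sources)
    recency = max(0.0, 1.0 - (today - newest).days / 90)  # decays over 90 days
    score = diversity + 2 * recency + corroborations
    if score >= 5:
        return "corroborated"
    if score >= 3:
        return "supported"
    return "single-source"

print(attestation_level(
    [("variety.com", date(2025, 1, 28)), ("techcrunch.com", date(2025, 1, 20))],
    corroborations=2))  # → corroborated
```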
Nightly Insights
Runs 7 SPARQL queries (weekly movers, tier transitions, dimension anomalies, distribution-community gaps, predictive signals, attestation coverage, platform vs pure-play) against the Oxigraph store. Produces a timestamped markdown insight digest.
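For flavor, a sketch of what one query in the library might look like: weekly movers above a delta threshold. The predicate names are hypothetical (the real ones are defined in `sbpi.ttl`); in production the string is POSTed to the Oxigraph SPARQL endpoint on port 7878.

```python
# Sketch: building the weekly-movers query. Predicate names are
# hypothetical stand-ins for the sbpi.ttl vocabulary.

def weekly_movers_query(min_delta: float) -> str:
    return f"""PREFIX sbpi: <http://example.org/sbpi#>
SELECT ?company ?delta WHERE {{
  ?snap sbpi:company ?company ;
        sbpi:weeklyDelta ?delta .
  FILTER(ABS(?delta) >= {min_delta})
}}
ORDER BY DESC(ABS(?delta))"""

print(weekly_movers_query(2.0))
```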
KG Interface Optimization (Exp 2)
Re-runs 30-trial TPE optimization against expanded historical data. Writes improved parameters to best-config.json if a better configuration is found. This is the core autoresearch loop from Markovick et al.
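The loop's shape can be sketched without the real backtest. Production uses Optuna's TPESampler (see the stack table); the stand-in below uses random search to stay dependency-free, and both the parameter bounds and the toy objective are assumptions.

```python
# Stand-in for the nightly 30-trial optimization loop. Production uses
# Optuna's TPESampler; random search here keeps the sketch dependency-
# free. Bounds and the toy objective are assumptions.

import json, random

SPACE = {  # hypothetical search bounds per interface parameter
    "direction_threshold": (0.3, 2.0),
    "confidence_base": (0.3, 0.8),
    "mean_reversion_rate": (0.0, 0.5),
}

def objective(cfg: dict) -> float:
    """Placeholder for backtested directional accuracy on history."""
    target = {"direction_threshold": 1.295, "confidence_base": 0.443,
              "mean_reversion_rate": 0.257}
    return -sum((cfg[k] - target[k]) ** 2 for k in target)

random.seed(7)
best_cfg, best_val = None, float("-inf")
for _ in range(30):  # 30 trials/night
    cfg = {k: random.uniform(lo, hi) for k, (lo, hi) in SPACE.items()}
    val = objective(cfg)
    if val > best_val:
        best_cfg, best_val = cfg, val

print(json.dumps(best_cfg, indent=2))  # in production: written to best-config.json
```

The "write only if better" guard in the real phase corresponds to comparing `best_val` against the incumbent configuration's score before overwriting `best-config.json`.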
Event Impact Analysis (Track A)
Per-company event impact reports. Researches news, deals, and app store movements. Scores impact across 5 SBPI dimensions. Classifies events as MATERIAL, MONITORING, or NOISE. Last run analyzed 22 companies with 3 material events detected.
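A sketch of the classification gate. The dimension names and the cutoff values below are illustrative assumptions; only the three buckets come from the description above.

```python
# Sketch: bucket an event by mean absolute impact across the 5 SBPI
# dimensions. Dimension names and cutoffs are illustrative assumptions.

DIMENSIONS = ("content", "distribution", "community", "monetization", "tech")

def classify_event(impact: dict) -> str:
    """impact: per-dimension scores for one event; missing dims count as 0."""
    magnitude = sum(abs(impact.get(d, 0.0)) for d in DIMENSIONS) / len(DIMENSIONS)
    if magnitude >= 3.0:
        return "MATERIAL"
    if magnitude >= 1.0:
        return "MONITORING"
    return "NOISE"

print(classify_event({"distribution": 8, "monetization": 6, "community": 4}))
# → MATERIAL
```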
Defensive BI Recommendations (Track B)
Generates mitigation strategies for MATERIAL impact events from Track A. Filters for strategic relevance to prevent reactive noise. Only triggers when Track A identifies events worth defending against.
Signal Weight Optimization (Track C)
TPE autoresearch loop specifically for signal weighting in the BI agent output. Re-optimizes only when new accuracy labels are available. Prevents reactive noise from accumulating in the BI recommendations.
SerpAPI / Manual Research
        |
        v
SBPI Scoring (5 dimensions x 17 companies)
        | sbpi_to_rdf.py
        v
RDF Triples (sbpi.ttl ontology) ----------> Oxigraph Store (2,588 triples)
        |                                            |
        v                                            v
SPARQL Queries (7-query library)           KG Interface (12 params)
        |                                            |
        v                                            v
Insight Digest (nightly-insights.py)       Prediction Engine (multi-signal)
        |                                            |
        v                                            v
Markdown Reports                           TPE Optimization (30 trials/night)
        |                                            |
        v                                            v
insights/ directory                        best-config.json
        |                                            |
        +------------- Weekly Editorial -------------+
                             |
                             v
                   Event Impact (SerpAPI)
                             |
                             v
                   Defensive BI Agent
| Component | Technology | Role |
|---|---|---|
| RDF Store | Oxigraph (local, port 7878) | SPARQL endpoint for knowledge graph queries |
| Ontology | sbpi.ttl (Turtle/RDF) | 5-dimension scoring schema + attestation model |
| Optimizer | Optuna TPE (Python) | Tree-structured Parzen Estimator for parameter search |
| ETL | Python (sbpi_to_rdf.py) | Scoring data → RDF triples → Oxigraph |
| Research | SerpAPI + Claude CLI | Event research and impact scoring |
| Query Library | SPARQL (.rq files) | 7 analytical queries (movers, anomalies, signals, etc.) |
| Scheduler | Python (weekly-prediction-cycle.py) | 9-phase orchestrator |
| Reporting | Cloudflare Pages | Static editorial sites (sbpi-semantic-layer.pages.dev) |
The Gemini brainstorming session identified three new experiment concepts that extend the current 5-experiment autoresearch expansion plan. These proposals target the "hyper-scale" thesis: proving that ShurIQ's knowledge graph, grown via automated research at $7/report, becomes a moat that compounds independently of client revenue.
Recursive Triple Expansion
Deploy a locally hosted quantized model (Llama-3-70B class) to crawl Common Crawl, Semantic Scholar, and industry-specific feeds 24/7. Each $7 processing run extracts entities, relationships, and ontological tags based on the ShurIQ schema. An "Ontological Referee" agent checks extractions against the existing graph for redundancy or contradictions before committing to the permanent store.
Economics: $7/report × 2,000/month = $14K/month for 200K new nodes/month
Variable: at 100 nodes/briefing extraction density, the $14K/month baseline yields ~2.4M nodes in the first year; the billion-node "full scale" targets assume crawl volume orders of magnitude beyond that baseline
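The baseline economics reduce to simple arithmetic, worked through below. The 100 nodes/briefing extraction density is the stated assumption; larger node counts require scaling the crawl beyond this $14K/month baseline.

```python
# Checking the stated economics: 2,000 reports/month at $7 each, with
# the assumed extraction density of 100 nodes per briefing.

reports_per_month = 2_000
cost_per_report = 7          # USD
nodes_per_report = 100       # assumed extraction density

monthly_cost = reports_per_month * cost_per_report
nodes_per_month = reports_per_month * nodes_per_report
nodes_per_year = nodes_per_month * 12

print(monthly_cost)     # → 14000   (the $14K/month figure)
print(nodes_per_month)  # → 200000  (200K new nodes/month)
print(nodes_per_year)   # → 2400000 (~2.4M nodes/year at this baseline)
```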
Self-Consistency Validation
Run thousands of "Self-Consistency" tests: ask the system to solve a problem using its KG (non-parametric, curated) vs. its parametric memory (raw LLM). The delta in accuracy is the "Ontological Premium" — the measurable value of the curated knowledge graph over vanilla LLM output. This turns the knowledge graph from an abstract asset into a quantifiable competitive advantage.
Verification: Karpathy-style self-consistency testing across domains
Output: "Ontological Premium" metric — the measurable lift in report quality when using the KG vs. the raw LLM
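The premium itself is just an accuracy delta over a shared question set. A toy sketch of the harness, where the answer functions stand in for the KG-backed and parametric-only systems:

```python
# Sketch: the premium is accuracy(KG-backed) minus accuracy(parametric
# memory alone) over one question set. Answer functions are stand-ins.

def ontological_premium(questions, kg_answer, parametric_answer) -> float:
    """questions: (question, ground_truth) pairs."""
    kg_hits = sum(kg_answer(q) == truth for q, truth in questions)
    raw_hits = sum(parametric_answer(q) == truth for q, truth in questions)
    return (kg_hits - raw_hits) / len(questions)

# Toy stand-ins: the KG answers 3/4 correctly, parametric memory 1/4.
qs = [("q1", "a"), ("q2", "b"), ("q3", "c"), ("q4", "d")]
kg = {"q1": "a", "q2": "b", "q3": "c", "q4": "x"}.get
raw = {"q1": "a", "q2": "x", "q3": "x", "q4": "x"}.get
print(ontological_premium(qs, kg, raw))  # → 0.5, a 50-point premium
```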
Ontological Referee Loop
A two-agent quality gate for the Recursive Triple Expansion pipeline. Agent 1 (the "Slow Processor") extracts entities and relationships. Agent 2 (the "Referee") checks extractions against the existing graph across three dimensions: redundancy (already known), contradiction (conflicts with existing triples), and bridge value (connects previously disconnected clusters). Only high-value nodes that score above a bridge-value threshold get committed to the permanent graph.
Gate Logic: Propose → Cross-Reference → Stack Rank by bridge value → Human approval for top nodes → Commit
Goal: Ensure billions of nodes are signal, not noise
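The gate logic above can be sketched as a single referee function over candidate triples. The scoring rules below (naive single-valued contradiction check, bridge-value tiers) are assumptions; only the three check dimensions come from the description.

```python
# Sketch: the Referee's three checks against the existing graph.
# Scoring rules are illustrative assumptions.

def referee(candidate, graph, bridge_threshold=0.5):
    """graph: set of (subject, predicate, object) triples already stored."""
    s, p, o = candidate
    if candidate in graph:
        return "reject: redundant"
    # Naive contradiction check: same subject+predicate already asserted
    # with a different object (assumes single-valued predicates).
    if any(gs == s and gp == p and go != o for gs, gp, go in graph):
        return "reject: contradiction"
    known = {gs for gs, _, _ in graph} | {go for _, _, go in graph}
    linked = {frozenset((gs, go)) for gs, _, go in graph}
    if s in known and o in known and frozenset((s, o)) not in linked:
        bridge_value = 1.0   # joins two known, previously unlinked entities
    elif (s in known) != (o in known):
        bridge_value = 0.7   # extends the frontier from a known entity
    else:
        bridge_value = 0.2   # floating pair with no anchor in the graph
    return "commit" if bridge_value >= bridge_threshold else "reject: low value"

g = {("ReelShort", "competesWith", "DramaBox"),
     ("JioHotstar", "competesWith", "KLIP")}
print(referee(("Disney", "investedIn", "DramaBox"), g))       # → commit
print(referee(("ReelShort", "competesWith", "DramaBox"), g))  # → reject: redundant
```

A real implementation would measure bridge value over graph components rather than direct links, and insert the human-approval stage between stack ranking and commit.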
| Metric | Current (Issue 3) | Hyper-Scale Goal | How We Get There |
|---|---|---|---|
| Node Count | 96 nodes / 268 edges | 1B+ nodes | Automated crawling via $7 local slow-model extraction |
| Accuracy | 69.9% (directional) | 85%+ | Experiments 2-5: MOTPE, temporal decay, dimension weights |
| Verticals | 1 (micro-drama) | 10-20 verticals | Experiment 6: cross-vertical transfer (K-Pop next) |
| Internal Reports | ~3/week (manual) | 24,000+/year | $7 per report, fully automated pipeline |
What Transfers (from Micro-Drama)
- direction_threshold: 1.295 — bar for calling directional moves
- confidence_base: 0.443 — default confidence calibration
- mean_reversion_rate: 0.257 — reversion signal strength
- divergence_weight: 0.180 — inter-dimension gap signal
- tier_proximity_weight: 0.096 — boundary effects
- anomaly_contributes: True — anomaly signal activation
- + 6 magnitude/consistency thresholds and bonuses
What's New (K-Pop Specific)
- K-Pop-specific edge types: Fandom metrics, comeback cycles, group/agency relationships
- DART financial data: Korean financial disclosure system for agency revenue
- Sentiment layer: Fan community sentiment from Weverse, Twitter/X, Naver
- Dimension semantics differ: "Distribution" maps to multi-platform presence differently in K-Pop
- Dimension weights will NOT transfer (0.25/0.20/0.20/0.20/0.15 are micro-drama specific)
1. Copy the micro-drama best-config.json
2. Generate 5-10 neighboring configurations
3. Seed the K-Pop Optuna study via enqueue_trial()
4. Run TPE on K-Pop data
5. Ablation study: identify which parameter subsets transferred
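Step 2 above can be sketched as a jitter over the source configuration. In production each resulting dict would be passed to the K-Pop study via Optuna's `study.enqueue_trial()`; the ±10% jitter scale is an assumption.

```python
# Sketch: generate neighboring configurations around the micro-drama
# best config for warm-start seeding. The ±10% jitter scale is assumed;
# in production each dict is fed to study.enqueue_trial().

import random

SOURCE_CONFIG = {  # micro-drama best-config.json (interface params)
    "direction_threshold": 1.295,
    "confidence_base": 0.443,
    "mean_reversion_rate": 0.257,
    "divergence_weight": 0.180,
    "tier_proximity_weight": 0.096,
}

def neighbors(config: dict, n: int = 8, scale: float = 0.1, seed: int = 0):
    """n configs: the source itself, then n-1 jittered by up to ±scale."""
    rng = random.Random(seed)
    out = [dict(config)]
    for _ in range(n - 1):
        out.append({k: v * (1 + rng.uniform(-scale, scale))
                    for k, v in config.items()})
    return out

seeds = neighbors(SOURCE_CONFIG)
print(len(seeds))  # → 8 enqueued trials, source config first
```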
| Criterion | Threshold | Why It Matters |
|---|---|---|
| Trials-to-convergence reduction | ≥ 40% fewer trials than cold-start | Proves the optimizer "remembers" across verticals |
| Ceiling accuracy delta | ≥ 5% higher than cold-start ceiling | Warm-start reaches a better optimum, not just faster |
| Interface param stability | ≤ 20% drift from micro-drama values | Confirms structural parameters are domain-agnostic |
| Dimension weight divergence | Significant divergence expected | Confirms weights are domain-specific (validates the split) |
| Risk | Impact | Mitigation |
|---|---|---|
| Source config is overtuned | Propagates degeneracy to K-Pop | Exp 1 (Goodhart Guard) must clear source config first |
| K-Pop dimension semantics too different | Interface params don't transfer | Ablation study separates interface from dimension parameters |
| Insufficient K-Pop data | Can't evaluate predictions | Need 4+ weeks of K-Pop scoring data before starting |
| Experiments 1-4 not stable on source vertical | Transferring from a moving target | Sequential execution order: Exp 1 → 3 → 4 → 5 → 6 |