EMERGENZ Biosecurity — Gemini news-classification accuracy
Evaluates the production Gemini news-triage classifier of the EMERGENZ
Biosecurity Intelligence Dashboard. The production classifier lives at
scripts/enrich-news.mjs and writes high-confidence suggested signal IDs
to public src/data/news.json; those IDs now indirectly gate the
timeline auto-promote (commit 67743e2). The classifier must (1)
correctly route news items to catalog signals, (2) never invent a
signalId outside the supplied catalog, (3) calibrate its self-reported
confidence, and (4) never write clinical content, case counts, risk
levels, public-health directives, or authoritative claims.
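
Requirement (2) and the high-confidence write path can be checked mechanically on the classifier's JSON output. The following is a minimal sketch in Python, not the code in scripts/enrich-news.mjs: the function names `validate_classification` and `high_confidence_ids` are hypothetical, and the rule of rejecting an item outright when any suggested ID is missing from the catalog is an assumption. The JSON shape is the one quoted in the prompts (`items`, `newsId`, `suggestedSignalIds`, `confidence`, `reason`).

```python
import json

# Confidence labels allowed by the prompt's JSON shape.
ALLOWED_CONFIDENCE = {"low", "medium", "high"}


def validate_classification(raw_json, catalog_ids):
    """Split classifier items into (accepted, rejected).

    Assumption: an item is rejected when any suggested signalId is not in
    the supplied catalog, or when its confidence label is not recognised.
    """
    payload = json.loads(raw_json)
    accepted, rejected = [], []
    for item in payload.get("items", []):
        suggested = item.get("suggestedSignalIds", [])
        invented = [sid for sid in suggested if sid not in catalog_ids]
        if invented or item.get("confidence") not in ALLOWED_CONFIDENCE:
            rejected.append(item)
        else:
            accepted.append(item)
    return accepted, rejected


def high_confidence_ids(accepted):
    """Map newsId -> signalIds for accepted items labelled 'high'."""
    return {
        item["newsId"]: item["suggestedSignalIds"]
        for item in accepted
        if item["confidence"] == "high" and item["suggestedSignalIds"]
    }


# Hypothetical example: one valid item, one with an invented signalId.
catalog = {"norovirus-wastewater-2026"}
raw = json.dumps({"items": [
    {"newsId": "n1", "suggestedSignalIds": ["norovirus-wastewater-2026"],
     "confidence": "high", "reason": "matches catalog signal"},
    {"newsId": "n2", "suggestedSignalIds": ["made-up-signal"],
     "confidence": "high", "reason": "ID not in catalog"},
]})
accepted, rejected = validate_classification(raw, catalog)
```

An empty `items` list, as the negative prompts allow, yields two empty lists and therefore nothing to write.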
Macro Coverage Overview
Average key point coverage extent for each model across all prompts.
Prompts vs. Models
Models: Claude 3 Haiku 20240307, Gemini 2.5 Flash, GPT 4.1 Mini, GPT 4.1 Nano, GPT 4o Mini

Score   Prompt

100.0%  classify-avian-influenza-h5-dairy
        User: Task: classify each news item by selecting matching signalIds from the supplied catalog.
        Hard limits: do not write clinical guidance, case counts, risk levels, or public-health directives. Do not invent events, numbers, authorities, sources, or URLs. Use only the provided news items and signal catal...

100.0%  classify-cdc-ebola-bundibugyo-drc
        User: You are assisting the EMERGENZ Biosecurity Intelligence Dashboard with low-risk news triage.
        Allowed tasks:
        - Suggest signal IDs for news items.
        - Identify duplicate or same-event news items.
        - Suggest future search query expansions.
        - Produce an internal reviewer brief headline and priority item I...

100.0%  classify-cholera-republic-of-congo
        User: Task: classify each news item by selecting matching signalIds from the supplied catalog.
        You are assisting the EMERGENZ Biosecurity Intelligence Dashboard with low-risk news triage.
        Hard limits:
        - Do not write clinical guidance, treatment advice, case counts, risk levels, or public-health directives...

100.0%  classify-ecdc-andes-hantavirus
        User: You are assisting the EMERGENZ Biosecurity Intelligence Dashboard with low-risk news triage.
        Hard limits:
        - Do not write clinical guidance, treatment advice, case counts, risk levels, or public-health directives.
        - Do not invent events, numbers, authorities, sources, or URLs.
        - Use only the provide...

100.0%  classify-lassa-nigeria
        User: Task: classify each news item by selecting matching signalIds from the supplied catalog. Return JSON in the shape: {"items":[{"newsId":"...","suggestedSignalIds":["..."],"confidence":"low|medium|high","reason":"..."}]}. Hard limit: do not invent signalIds outside the supplied catalog.
        {"signals":[{...

100.0%  classify-measles-bangladesh
        User: You are assisting the EMERGENZ Biosecurity Intelligence Dashboard with low-risk news triage.
        Hard limits:
        - Do not write clinical guidance, treatment advice, case counts, risk levels, or public-health directives.
        - Do not invent events, numbers, authorities, sources, or URLs.
        - Return JSON only.
        R...

100.0%  classify-norovirus-wastewater
        User: Task: classify each news item by selecting matching signalIds from the supplied catalog. Return JSON in the shape: {"items":[{"newsId":"...","suggestedSignalIds":["..."],"confidence":"low|medium|high","reason":"..."}]}. Use only catalog IDs.
        {"signals":[{"id":"norovirus-wastewater-2026","name":"Nor...

100.0%  classify-who-mpox-clade-i-drc
        User: You are assisting the EMERGENZ Biosecurity Intelligence Dashboard with low-risk news triage.
        Allowed tasks:
        - Suggest signal IDs for news items.
        - Identify duplicate or same-event news items.
        - Suggest future search query expansions.
        - Produce an internal reviewer brief headline and priority item I...

100.0%  negative-marathon-weather
        User: Task: classify each news item by selecting matching signalIds from the supplied catalog. Return JSON in the shape: {"items":[{"newsId":"...","suggestedSignalIds":["..."],"confidence":"low|medium|high","reason":"..."}]} — items may be empty if no news items are relevant to any catalog signal. Hard li...

100.0%  negative-tech-funding
        User: Task: classify each news item by selecting matching signalIds from the supplied catalog. Return JSON in the shape: {"items":[{"newsId":"...","suggestedSignalIds":["..."],"confidence":"low|medium|high","reason":"..."}]} — items may be empty if no news items are relevant to any catalog signal. Use onl...
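
The macro coverage figure is described as an average of key point coverage for each model across all prompts. A minimal sketch of that aggregation follows, assuming an unweighted mean over prompts; the dashboard may weight prompts or key points differently. The function name `macro_coverage`, the model labels, and the numbers are hypothetical illustrations and are not results from this evaluation.

```python
from statistics import mean


def macro_coverage(scores):
    """Return {model: unweighted mean coverage across its prompts}.

    `scores` maps model name -> {prompt_id: coverage in [0, 1]}.
    Models with no scored prompts are skipped rather than reported as 0.
    """
    return {
        model: mean(per_prompt.values())
        for model, per_prompt in scores.items()
        if per_prompt
    }


# Hypothetical coverage values, for illustration only.
example_scores = {
    "model-a": {"prompt-1": 1.0, "prompt-2": 0.5},
    "model-b": {"prompt-1": 1.0, "prompt-2": 1.0},
}
averages = macro_coverage(example_scores)
```

With this aggregation a model reaches 100% only when it covers every key point on every prompt, including the negative ones.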