Six stages, in this order.
The sequence is intentional. UX designer first, engineer second. The surface and its differentiated value are decided BEFORE the data pipeline.
Deciding "how it looks" before "how it is stored" changes the kind of product that comes out the other end. Atalaya is born as a visual experience of memory; the data plumbing comes after and obeys.
Cultural framing and opportunity
The underlying problem is not technical. It is cultural: Colombia operates with institutional amnesia. The records of candidates and officials are forgotten between electoral cycles — the same mistakes repeat, the same actors return.
Living in Japan reframed the perspective: there, collective memory (historical, civic, institutional) is treated as a social asset. It lets a community avoid repeating mistakes, identify those responsible for past harm, and make better decisions over time.
Atalaya is the bounded response: a tool for the Colombian public to recognise the people connected to corruption cases, build awareness across recent periods, and understand that corruption is not the property of one political wing — it is a systemic, interconnected network.
UI sketch and competitive differentiation
The visual-network format is a deliberate UX decision: the force graph is the only surface that reveals patterns and relationships between cases, people, and institutions in a way tables and headlines cannot.
The concept was benchmarked against PACO and international peers (OCCRP Aleph, OpenCorporates). The differentiated value is graph navigation as the primary mode, not table navigation.
Features and data flow
Decision D-06: the data source is the traditional Colombian press (El Tiempo, El Espectador, Semana, La Silla Vacía, Vorágine, Cuestión Pública, regional press). NO SECOP. NO Procuraduría. NO judicial bulletins.
Decision D-08: cross-source confrontation replaces the human reviewer. A claim is published with full confidence when two or more distinct outlets agree on the same (people + institution + status) combination. Wire copies (Colprensa, AP, EFE) are deduplicated first, so reprints of one dispatch count as a single source.
Single-source claims are also published but with a "Single source" badge that signals lower confidence to the user.
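The D-08 rule can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the `Report` class, the `wire_origin` field, and the "Confirmed" label are hypothetical names introduced here, and the sketch assumes the pipeline already knows which agency an article was copied from.

```python
from dataclasses import dataclass
from typing import Optional

# Wire agencies named in D-08; reprints of their dispatches count as one source.
WIRE_AGENCIES = {"Colprensa", "AP", "EFE"}


@dataclass(frozen=True)
class Report:
    """One article's extracted claim (hypothetical structure)."""
    outlet: str
    people: frozenset
    institution: str
    status: str
    wire_origin: Optional[str] = None  # assumption: detected upstream


def independent_source(report: Report) -> str:
    """Map every reprint of an agency dispatch to the agency itself."""
    if report.wire_origin in WIRE_AGENCIES:
        return f"wire:{report.wire_origin}"
    return report.outlet


def confidence_badges(reports: list) -> dict:
    """Group reports by (people, institution, status) and count independent sources."""
    sources_by_claim: dict = {}
    for r in reports:
        key = (r.people, r.institution, r.status)
        sources_by_claim.setdefault(key, set()).add(independent_source(r))
    return {
        key: "Confirmed" if len(sources) >= 2 else "Single source"
        for key, sources in sources_by_claim.items()
    }
```

Two outlets that both reprint the same Colprensa dispatch therefore yield a "Single source" badge, while a third outlet with its own reporting lifts the claim to two independent sources.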
AI classification model
Cascade: Haiku 4.5 as the cheap classifier (does this article cover a Colombian corruption case? yes/no, ~USD 0.0001 per article) -> Sonnet 4.6 as the robust extractor returning Pydantic-validated JSON.
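A rough sketch of the cascade's control flow follows. The two model calls are passed in as plain callables (`is_corruption_case`, `extract_json`), which are hypothetical stand-ins and not real SDK calls. To stay self-contained, the sketch swaps Pydantic for a hand-written standard-library check; the field names `people`, `institution`, and `status` are assumed from the D-08 agreement rule.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ExtractedCase:
    people: list
    institution: str
    status: str


def parse_case(raw_json: str) -> ExtractedCase:
    """Validate extractor output; raise ValueError on malformed or incomplete JSON."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        raise ValueError(f"extractor returned invalid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("extractor output must be a JSON object")
    people = data.get("people")
    if not isinstance(people, list) or not all(isinstance(p, str) for p in people):
        raise ValueError("'people' must be a list of strings")
    for field in ("institution", "status"):
        if not isinstance(data.get(field), str):
            raise ValueError(f"'{field}' must be a string")
    return ExtractedCase(people, data["institution"], data["status"])


def run_cascade(
    article: str,
    is_corruption_case: Callable[[str], bool],  # cheap yes/no classifier
    extract_json: Callable[[str], str],         # robust extractor, returns JSON text
) -> Optional[ExtractedCase]:
    """Call the expensive extractor only when the cheap classifier says yes."""
    if not is_corruption_case(article):
        return None
    return parse_case(extract_json(article))
```

The point of the ordering is cost: articles rejected by the classifier never reach the extractor, so the more expensive model only runs on relevant coverage.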
Embeddings come from OpenAI text-embedding-3-small (1536 dimensions) and are stored in Pinecone. A cosine similarity search decides whether an article describes a new case: a score >= 0.85 against an existing case enriches that case instead of creating a new node.
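The 0.85 rule amounts to a nearest-neighbour check with a cutoff. The sketch below does the comparison in memory with plain Python lists, purely to show the decision. In the described stack the search itself would run inside Pinecone, and the function names here are hypothetical.

```python
import math
from typing import Optional

SIMILARITY_THRESHOLD = 0.85  # cutoff stated in the text


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def match_existing_case(new_vector: list, existing: dict) -> Optional[str]:
    """Return the id of the closest existing case if it clears the threshold.

    A returned id means "enrich that case"; None means "create a new node".
    """
    best_id, best_score = None, -1.0
    for case_id, vector in existing.items():
        score = cosine_similarity(new_vector, vector)
        if score > best_score:
            best_id, best_score = case_id, score
    return best_id if best_score >= SIMILARITY_THRESHOLD else None
```

With toy two-dimensional vectors, `match_existing_case([0.9, 0.1], {"case-1": [1.0, 0.0]})` clears the cutoff and returns `"case-1"`, while `[1.0, 1.0]` scores about 0.71 against the same case and falls through to a new node.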
Periodic re-scrape: every active case is reviewed once every six months to capture acquittals, new charges, and closed (archived) proceedings. The most recent judicial outcome overrides the status, regardless of the source count.
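The override rule is easy to state in code. In this minimal sketch the `JudicialOutcome` class and its fields are hypothetical. The `source_count` field is carried only to make visible that the rule ignores it.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class JudicialOutcome:
    """A judicial outcome found during a re-scrape (hypothetical structure)."""
    status: str        # e.g. "acquitted", "charged", "archived"
    decided_on: date
    source_count: int  # deliberately not used by the rule below


def resolve_status(current_status: str, outcomes: list) -> str:
    """Return the status a case should show after a re-scrape.

    The most recent judicial outcome wins regardless of how many outlets
    reported it. With no outcomes, the current status is kept.
    """
    if not outcomes:
        return current_status
    latest = max(outcomes, key=lambda o: o.decided_on)
    return latest.status
```

An acquittal reported by a single outlet last month therefore replaces a "charged" status that four outlets agreed on a year earlier.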
Tech stack selection
Each layer was chosen for operational simplicity and a low cost ceiling (~USD 315/year at MVP volume).
Database: Neon (managed Postgres). Vectors: Pinecone. Queue: Redis + Celery. LLMs: Anthropic + OpenAI. Frontend: Next.js + react-force-graph-3d. Hosting: Vercel.
Brand isolation: Atalaya runs on its own domain (atalaya.com / atalaya.co) with infrastructure separated from B1G Digital — for brand protection and legal isolation.
Visual references and display exploration
References: editorial / visual dossier register · graph exploration tools (Aleph, Linkurious) · technical typography (JetBrains Mono + Poppins).
Visual tone: dark editorial, 1px technical lines, barely perceptible scanlines, signal red for urgency + amber for single source + teal/blue for relationship types.
The initial HTML prototype (`docs/atalaya/demo/index.html`) remains the canonical reference for the visual register; this Next.js MVP ports it to the App Router.