Technical Deep Dive

Vectors: From Central Servers to Phone Swarms

Vector databases power Spotify, ChatGPT, and image search. QIS just changes the question from "what song next?" to "what worked for people exactly like me?"

By Christopher Thomas Trevethan · January 16, 2026

Vectors aren't sci-fi. They're just coordinates in high-dimensional space.

The pipeline:

1. An expert team curates similarity and encodes it as a fixed vector V = [0.213, 0.654, 0.012, ..., 0.987] (128–1024 dimensions).
2. Every device embeds its local data into the same space (on-device; raw data never leaves).
3. Query: range search finds ALL entries within a tiny radius of V (not top-k, which caps results).
4. Pull the attached metadata (the outcome packet).
5. Synthesize locally.
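The distance math behind the range-search step is plain arithmetic. A toy sketch with made-up 4-dimensional vectors (real templates use 128–1024 dimensions); the profiles and radius here are illustrative, not from any real template:

```python
import numpy as np

# Toy 4-dimensional embeddings. Similar profiles land near each
# other in the space; dissimilar ones land far away.
me       = np.array([0.213, 0.654, 0.012, 0.987])
like_me  = np.array([0.210, 0.660, 0.015, 0.980])   # near-identical profile
not_like = np.array([0.900, 0.100, 0.750, 0.050])   # very different profile

def l2(a, b):
    """Euclidean distance between two embeddings."""
    return float(np.linalg.norm(a - b))

radius = 0.05
print(l2(me, like_me) < radius)   # inside the radius: a match
print(l2(me, not_like) < radius)  # outside the radius: ignored
```

Everything within the radius is "people exactly like me"; everything outside is noise.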

QIS doesn't care whether the index is central, distributed, or hybrid. All three teleport a query exactly to the right neighborhood.

No one uses them for medicine. No one uses them to route insights to the exact cohorts that need them. They use vectors for song recommendations, image search, chat retrieval.

QIS just changes the question.

Vector Databases at Scale—Right Now

- 1.5T: vectors indexed at Meta (FAISS)
- 1.4B: vectors at 5,700 QPS (Pinecone)
- 40K+: GitHub stars (Milvus)
- 8.5×: faster than the previous best method (FAISS)

Who Uses Vector Search Right Now?

This isn't experimental technology. It's production infrastructure.

- 🎵 Spotify: song recommendations
- 🤖 ChatGPT: RAG retrieval
- 📷 Google Photos: image similarity
- 🛒 Amazon: product search
- 📺 Netflix: content matching
- 💼 Salesforce: enterprise AI (Milvus)

FAISS embeds billions of faces and images at Meta. Pinecone routes retrieval for LLMs. Milvus powers Salesforce, PayPal, eBay, NVIDIA, and 10,000+ production deployments.

No one embeds expert-curated health templates for outcome routing.

Because health pays for silos. Because "AI doctor" means a central model. Because no one noticed the coordinates were already on your phone.

QIS noticed.

The Three Architectures

Here's every implementation path, with conservative numbers (real-world 2026 5G: ~100–300 Mbps), limits, and backups. All exact-capable. All scalable.

| Architecture | Latency | Privacy | Best For |
|---|---|---|---|
| Central (Pinecone, Milvus) | Fastest (1–3 s for 1K packets) | Lower (trust the vendor) | Speed, massive scale |
| Distributed (FAISS on P2P) | Moderate (3–5 s for 1K packets) | Highest (no central party) | Sovereignty, privacy |
| Hybrid (on-device + cloud) | Best of both | Configurable | Transition, pragmatism |

1. Central Vector Databases

One cluster holds the full index. Range search finds all vectors within a tiny radius of V. The engine (an IVF or Flat index with range_search) returns ALL matching IDs—not limited to top-k. The server batches or CDN-streams the metadata (outcome packets).

- 1,000 packets: 1–3 seconds
- 100,000 packets: 4–7 seconds
- 1 million packets: 20–40 seconds

Production proof: Pinecone handles 1.4 billion vectors at 5,700 QPS with 26ms P50 latency (December 2025). Milvus powers tens of billions of vectors across 300+ enterprises including Salesforce, PayPal, eBay, and NVIDIA.

Index types: For range search (finding ALL matches), use IVF-Flat or IndexFlat. HNSW is optimized for top-k but some implementations support radius queries. DiskANN for SSD-optimized workloads.

Backups: Built-in replication + multi-region. Fork clusters: ping both, fastest wins.

Privacy note: Outcomes are anonymized. Rare buckets (<50 matches) risk re-identification—add a min-N guard (return results only when ≥50 match).

Limit: Single vendor/trust. But fastest for massive scale.

2. Distributed Vector Index

No central server. Every phone/device runs its own FAISS shard. Range search finds ALL vectors within the threshold across the P2P network. Matching vectors cluster at the same address—query that address, get every outcome.

- 1,000 packets: 3–5 seconds
- 100,000 packets: 7–10 seconds
- 1 million packets: 90 s–2 min (phone) / 30 s (laptop)
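The per-shard fan-out can be sketched with a brute-force scan standing in for each peer's local FAISS range search (toy data; `shard_range_search` is a hypothetical helper, not a library API):

```python
import numpy as np

rng = np.random.default_rng(1)

def shard_range_search(shard_vectors, shard_packets, query, radius):
    """What each peer runs locally: a range search over its own shard
    (brute force here; FAISS range_search in practice)."""
    dists = np.linalg.norm(shard_vectors - query, axis=1)
    return [shard_packets[i] for i in np.flatnonzero(dists <= radius)]

d = 8
query = rng.standard_normal(d).astype("float32")

# Three peers, each holding a private shard of the index.
shards = []
for peer in range(3):
    vecs = rng.standard_normal((100, d)).astype("float32")
    vecs[0] = query + 0.001                    # plant one near-exact match per shard
    packets = [f"peer{peer}-outcome{i}" for i in range(100)]
    shards.append((vecs, packets))

# Fan out the query, then merge: the union of every shard's matches.
results = []
for vecs, packets in shards:
    results.extend(shard_range_search(vecs, packets, query, radius=0.05))
```

No shard ever ships its raw vectors; each returns only the outcome packets inside the radius, and the querying device takes the union.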

The library behind it: FAISS (Meta) is a library, not a database. It's 8.5× faster than previous best methods, indexes 1.5 trillion vectors at Meta, and powers Milvus and OpenSearch under the hood. GPU acceleration with NVIDIA cuVS achieves 12× faster index builds and 8× faster search.

On-device capability: FAISS supports ARM NEON (mobile CPUs). With product quantization, a phone can index ~2 million vectors in ~500MB-1GB RAM. Beyond that: shard across devices or use laptop.
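The memory claim is easy to sanity-check. Back-of-envelope arithmetic, assuming 768-dimensional float32 vectors and a hypothetical 64-byte-per-vector product quantizer (codebooks and index overhead add more, which is why the article budgets 500 MB–1 GB rather than the bare code size):

```python
# Illustrative numbers only: 2M vectors, 768 dims, PQ at 64 bytes/vector.
n_vectors = 2_000_000
d = 768

raw_bytes = n_vectors * d * 4        # float32: 4 bytes per dimension
pq_bytes_per_vec = 64                # e.g. 64 sub-quantizers, 1 byte each
pq_bytes = n_vectors * pq_bytes_per_vec

print(f"raw float32: {raw_bytes / 1e9:.1f} GB")   # far too big for a phone
print(f"PQ codes:    {pq_bytes / 1e6:.0f} MB")    # fits in the phone budget
```

Raw float32 storage lands around 6 GB; the quantized codes land around 128 MB, comfortably inside the 500 MB–1 GB budget.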

Backups: 5 leader nodes cache full cluster metadata. Gossip heartbeat sync. Partial results if leaders partial.

Range search support: FAISS range_search on IndexFlat/IVF returns ALL matches within threshold—critical for getting every outcome, not just top-k.

3. Hybrid (On-Device + Cloud Assist)

The phone holds a local FAISS index for recent/relevant data. A query hits the local index first; if it returns fewer matches than the threshold, it fans out to the central or distributed backup.

- Common patterns: instant (local)
- Rare patterns: cloud fallback (seconds)
- Privacy: configurable per query
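The local-first logic fits in a few lines. A sketch with hypothetical stand-ins for the two tiers (`hybrid_query` and the 50-match threshold are illustrative, not a fixed API):

```python
def hybrid_query(vector, radius, local_search, cloud_search, min_matches=50):
    """Local-first routing: answer from the on-device index when it has
    enough matches; otherwise fan out to the cloud/distributed backup."""
    matches = local_search(vector, radius)
    if len(matches) >= min_matches:
        return matches, "local"                    # common pattern: instant, fully private
    return cloud_search(vector, radius), "cloud"   # rare pattern: fallback

# Toy stand-ins for the two tiers (made-up data):
local = lambda v, r: ["local-packet"] * 3     # the phone only knows 3 outcomes
cloud = lambda v, r: ["cloud-packet"] * 120   # the backup knows many more

packets, tier = hybrid_query([0.1, 0.2], 0.01, local, cloud)
```

Common patterns never leave the device; only queries the local index can't satisfy touch the backup, which is what makes privacy configurable per query.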

Real-world example: Apple Photos uses on-device ML for face clustering + iCloud sync for cross-device search. Same pattern.

Backups: Local + cloud mirror. Auto-fallback.

Limit: Trust split. But practical for transition.

How Vector Search Actually Works

Most vector search uses top-k (return K nearest). QIS needs range search—return ALL vectors within a distance threshold. FAISS supports this via range_search() on Flat and IVF indexes:

```python
# Expert-curated template → vector
patient_vector = embed({
    "disease": "colorectal_cancer",
    "stage": 3,
    "kras": 1,
    "msi": 0,
    "cea": 42.3,
    "age": 67,
})
# → [0.82, 0.15, 0.91, 0.03, ...] (768 dimensions)

# Range search: find ALL vectors within the distance threshold.
# Unlike top-k search (which returns a fixed k results),
# range_search returns EVERY match—crucial for outcomes.
lims, D, I = index.range_search(patient_vector, radius=0.01)

# What comes back with each match:
{
    "vector": [0.82, 0.15, 0.91, ...],
    "metadata": {
        "treatment": "FOLFOX + Bevacizumab",
        "outcome": "progression_free",
        "duration_months": 18,
        "confidence": 0.94,
    },
}
```

The insight is in the metadata. The metadata isn't a pointer to go fetch something. It IS the outcome packet. Range search returns ALL matches—not limited to top-k. Your device collects every matching outcome and synthesizes locally. No second round-trip. No streaming raw data. The bucket is a mailbox with sealed envelopes already inside.
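Local synthesis really is this small. A sketch over hypothetical outcome packets, using only the standard library (the packet fields mirror the example above):

```python
from collections import Counter
from statistics import median

# Hypothetical outcome packets, as returned in range-search metadata.
packets = [
    {"treatment": "FOLFOX + Bevacizumab", "outcome": "progression_free", "duration_months": 18},
    {"treatment": "FOLFOX + Bevacizumab", "outcome": "progression_free", "duration_months": 22},
    {"treatment": "FOLFIRI",              "outcome": "progression",      "duration_months": 7},
    {"treatment": "FOLFOX + Bevacizumab", "outcome": "progression_free", "duration_months": 15},
]

# Synthesis is counting and aggregating, nothing more: no second
# round-trip, no raw data, just the sealed envelopes already in hand.
by_treatment = Counter(p["treatment"] for p in packets)
best, n = by_treatment.most_common(1)[0]
durations = [p["duration_months"] for p in packets if p["treatment"] == best]

print(best, n, median(durations))
```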

Phone Reality: Compute, Bandwidth, Battery

Can a phone actually do this? Conservative 2026 numbers:

Phone Performance—Conservative Estimates

| Operation | Performance |
|---|---|
| Pull 100,000 packets (~48 MB) | 8–12 seconds (5G) |
| Pull 1 million packets (~488 MB) | 2–3 min (phone) / 20–40 s (hospital Wi-Fi) |
| Range search (radius ≈ 0) | Brute force on 10K: 2 ms; HNSW on 1M: 10 ms |
| Synthesis (voting over metadata) | Median + count on 1M: ~100 ms |
| Battery (500 MB transfer) | ~1–2% drain (less than a TikTok scroll) |

Phone hits a wall at ~500 MB sustained? Plug in laptop/tablet. Same code. Ethernet/Wi-Fi. Seconds.

Index size limit: With quantization, ~2M vectors in 500MB-1GB RAM on phone. Beyond that, shard across devices or offload to laptop. FAISS handles this gracefully.

Receipts and Immutability

Same as DHT—no blockchain theater required:

```python
# Per-packet signature
metadata["signature"] = Ed25519.sign(outcome_packet, private_key)

# Local merkle log (append-only)
merkle_root = hash(packet_1 || packet_2 || ... || packet_n)

# Gossip to peers (distributed) or a notary (central)
broadcast(merkle_root)

# A chain is optional, for legal audit only
```

Tamper-proof without gas fees. The signature is unforgeable. The merkle log is append-only. That's enough for audit.

Why Vectors Aren't Saving Lives Already

What Vectors Are Used For

Everyone Else

"What song sounds like this?" → Spotify

"What image looks like this?" → Google Photos

"What document answers this?" → ChatGPT RAG

QIS

"What worked for patients exactly like me?"

"What yield did farms with my exact conditions achieve?"

"What maintenance prevented failure in equipment like mine?"

The technology is identical. The question is different.

No one noticed you could attach outcome packets to vectors instead of song IDs. No one noticed the same infrastructure that recommends your next playlist could route life-saving treatment insights.

Until now.

The Challenge

Show me the vector that can't attach outcome metadata.
Show me the phone that chokes on 1M range-search matches (or just plug in a laptop—same code).
Show me the index that can't do range search on-device (FAISS range_search runs locally).
Show me the pull too big for hospital Wi-Fi.

Can't?

Then vectors aren't the future either. They're the present. Every component exists.

Central for speed. Distributed for sovereignty. Hybrid for today.

Pick one (or combine—see Routing by Similarity for more methods).

The more QIS networks exist, the tighter similarity gets—clusters shrink as networks get more precise. Pulls drop to blinks. The phone—or the laptop—is waiting.
