Vectors aren't sci-fi. They're just coordinates in high-dimensional space.
An expert team curates a similarity template → encodes it as a fixed vector V = [0.213, 0.654, 0.012, ..., 0.987] (128–1024 dimensions). Every device embeds its local data into the same space (on-device; raw data never leaves). Query: use range search to find ALL entries within a tiny radius of V (not top-k, which caps the result count). Pull the attached metadata (the outcome packet). Synthesize locally.
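The end-to-end flow (embed, range-query, pull packets, synthesize) fits in a few lines. This is a toy brute-force sketch: the vectors, radius, and `outcome` packet field are all illustrative, not a production index or a real schema:

```python
import math
from statistics import median

def l2(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy index: each entry is (embedding, outcome_packet).
index = [
    ([0.21, 0.65, 0.01], {"outcome": 0.82}),
    ([0.22, 0.64, 0.02], {"outcome": 0.79}),
    ([0.90, 0.10, 0.50], {"outcome": 0.11}),  # far from V: excluded
]

def range_query(V, radius):
    """Return ALL packets within `radius` of V (not top-k)."""
    return [pkt for vec, pkt in index if l2(vec, V) <= radius]

V = [0.213, 0.654, 0.012]          # the curated template vector
packets = range_query(V, radius=0.05)
print(len(packets))                # 2 (every match, not the nearest k)
print(median(p["outcome"] for p in packets))
```

The synthesis step here is just a median over the pulled packets; a real deployment would replace the list with an index structure, but the query semantics stay the same.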
QIS doesn't care if the index is central, distributed, or hybrid. All do exact teleport to the right neighborhood.
No one uses it for medicine. No one uses it to route insights to the exact cohorts that need them. They use vectors for song recommendations, image search, chat retrieval.
QIS just changes the question.
Vector Databases at Scale—Right Now
Who Uses Vector Search Right Now?
This isn't experimental technology. It's production infrastructure.
FAISS embeds billions of faces and images at Meta. Pinecone routes retrieval for LLMs. Milvus powers Salesforce, PayPal, eBay, NVIDIA, and 10,000+ production deployments.
No one embeds expert-curated health templates for outcome routing.
Because health pays for silos. Because "AI doctor" means central model. Because no one noticed the coordinates were already on your phone.
QIS noticed.
The Three Architectures
Here's every implementation path. Conservative numbers (2026 5G ~100-300 Mbps real-world). Limits. Backups. All exact-capable. All scalable.
| Architecture | Latency | Privacy | Best For |
|---|---|---|---|
| Central (Pinecone, Milvus) | Fastest (1-3s for 1K packets) | Lower (trust vendor) | Speed, massive scale |
| Distributed (FAISS on P2P) | Moderate (3-5s for 1K packets) | Highest (no central) | Sovereignty, privacy |
| Hybrid (On-device + cloud) | Best of both | Flexible | Transition, pragmatism |
1. Central Vector Databases
One cluster holds the full index. Range search for all vectors within tiny radius of V. Engine (IVF or Flat index with range_search) returns ALL matching IDs—not limited to top-k. Server batches or CDN-streams metadata (outcome packets).
Production proof: Pinecone handles 1.4 billion vectors at 5,700 QPS with 26ms P50 latency (December 2025). Milvus powers tens of billions of vectors across 300+ enterprises including Salesforce, PayPal, eBay, and NVIDIA.
Index types: For range search (finding ALL matches), use IVF-Flat or IndexFlat. HNSW is optimized for top-k but some implementations support radius queries. DiskANN for SSD-optimized workloads.
Backups: Built-in replication + multi-region. Fork clusters: ping both, fastest wins.
Privacy note: Outcomes anonymized. Rare buckets (<50 matches) risk re-identification—add min-N guard (return only if ≥50 matches).
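A minimal sketch of the min-N guard; `MIN_N = 50` mirrors the threshold above, and the return shape is an assumption for illustration, not a fixed API:

```python
MIN_N = 50  # threshold from the privacy note; tune per deployment

def guarded_results(matches):
    """Suppress rare buckets: return outcome packets only when the
    match count is large enough to resist re-identification."""
    if len(matches) < MIN_N:
        return {"count": None, "packets": []}  # bucket suppressed
    return {"count": len(matches), "packets": matches}

# A bucket with 3 matches is suppressed; one with 120 passes through.
print(guarded_results([{"outcome": 1}] * 3)["count"])    # None
print(guarded_results([{"outcome": 1}] * 120)["count"])  # 120
```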
Limit: Single vendor/trust. But fastest for massive scale.
2. Distributed Vector Index
No central server. Every phone/device runs its own FAISS shard. Range search finds ALL vectors within threshold across the P2P network. Matching vectors cluster at the same address—query that address, get every outcome.
The library behind it: FAISS (Meta) is a library, not a database. It's 8.5× faster than previous best methods, indexes 1.5 trillion vectors at Meta, and powers Milvus and OpenSearch under the hood. GPU acceleration with NVIDIA cuVS achieves 12× faster index builds and 8× faster search.
On-device capability: FAISS supports ARM NEON (mobile CPUs). With product quantization, a phone can index ~2 million vectors in ~500MB-1GB RAM. Beyond that: shard across devices or use laptop.
Backups: 5 leader nodes cache full cluster metadata. Gossip heartbeat sync. If only some leaders respond, results degrade to partial instead of failing.
Range search support: FAISS range_search on IndexFlat/IVF returns ALL matches within threshold—critical for getting every outcome, not just top-k.
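What merging range results across shards might look like. The shard dictionaries and entry IDs below are invented; a real deployment would produce each shard's result via a per-shard FAISS range_search call, and this sketch shows only the merge step:

```python
def merge_shard_results(shard_results):
    """Union results from every shard, deduplicating by entry ID
    (an entry replicated on two shards should count once)."""
    merged = {}
    for result in shard_results:
        merged.update(result)
    return merged

# Hypothetical per-shard output: {entry_id: outcome_packet}.
shard_a = {"e1": {"outcome": 0.8}, "e2": {"outcome": 0.7}}
shard_b = {"e2": {"outcome": 0.7}, "e3": {"outcome": 0.9}}  # e2 replicated
merged = merge_shard_results([shard_a, shard_b])
print(sorted(merged))  # ['e1', 'e2', 'e3']
```

Deduplication by ID is the important part: with replication for fault tolerance, the same outcome must not be double-counted in synthesis.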
3. Hybrid (On-Device + Cloud Assist)
Phone holds a local FAISS index for recent/relevant data. Query first hits local → if it returns fewer than a threshold number of matches, fan out to the central or distributed backup.
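A minimal local-first sketch under assumed names (`hybrid_query`, `FALLBACK_THRESHOLD`); the lambdas stand in for an on-device FAISS shard and a cloud index:

```python
FALLBACK_THRESHOLD = 10  # assumed minimum match count before fanning out

def hybrid_query(V, radius, local, backup):
    """Local-first: query the on-device shard; if it returns fewer
    matches than the threshold, fan out to the backup and merge."""
    matches = local(V, radius)
    if len(matches) < FALLBACK_THRESHOLD:
        matches = matches + backup(V, radius)
    return matches

# Stubs standing in for an on-device shard and a cloud index.
local = lambda V, r: [{"outcome": 0.8}] * 3     # only 3 local hits
backup = lambda V, r: [{"outcome": 0.75}] * 40  # backup fills in
print(len(hybrid_query([0.2], 0.05, local, backup)))  # 43
```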
Real-world example: Apple Photos uses on-device ML for face clustering + iCloud sync for cross-device search. Same pattern.
Backups: Local + cloud mirror. Auto-fallback.
Limit: Trust split. But practical for transition.
How Vector Search Actually Works
Most vector search uses top-k (return the K nearest). QIS needs range search—return ALL vectors within a distance threshold. FAISS supports this via range_search() on Flat and IVF indexes.
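In FAISS, index.range_search(x, radius) returns a triple (lims, D, I): the matches for query i sit at positions lims[i]:lims[i+1] of the flat D (distances) and I (IDs) arrays, and Flat L2 indexes compare squared distances. The helper below mimics that result layout in pure Python, without the faiss dependency, to show the semantics:

```python
def range_search(vectors, queries, radius):
    """Mimic the FAISS range_search output shape: (lims, D, I),
    where results for query q live at lims[q]:lims[q+1] of D and I."""
    lims, D, I = [0], [], []
    for q in queries:
        for idx, v in enumerate(vectors):
            d = sum((a - b) ** 2 for a, b in zip(v, q))  # squared L2
            if d <= radius:
                D.append(d)
                I.append(idx)
        lims.append(len(I))
    return lims, D, I

vectors = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
lims, D, I = range_search(vectors, [[0.0, 0.0]], radius=0.5)
print(I[lims[0]:lims[1]])  # ALL matches for the query: [0, 1]
```

Unlike top-k, the result count is data-dependent: a dense bucket returns hundreds of IDs, a sparse one returns two, and nothing is silently dropped.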
The insight is in the metadata. The metadata isn't a pointer to go fetch something. It IS the outcome packet. Range search returns ALL matches—not limited to top-k. Your device collects every matching outcome and synthesizes locally. No second round-trip. No streaming raw data. The bucket is a mailbox with sealed envelopes already inside.
Phone Reality: Compute, Bandwidth, Battery
Can a phone actually do this? Conservative 2026 numbers:
Phone Performance—Conservative Estimates
| Operation | Performance |
|---|---|
| 100,000 packets (~48 MB) | 8–12 seconds (5G) |
| 1 million packets (~488 MB) | 2–3 min (phone) / 20–40s (hospital WiFi) |
| Range search (radius ≈ 0) | Brute force on 10K vectors ≈ 2 ms; HNSW on 1M ≈ 10 ms |
| Synthesis (voting metadata) | Median + count on 1M = ~100ms |
| Battery (500 MB transfer) | ~1–2% drain (less than TikTok scroll) |
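The "synthesis" row above is just counting votes and taking a median. A sketch, with an invented packet schema (`intervention`, `outcome` are illustrative field names, not a fixed QIS format):

```python
from collections import Counter
from statistics import median

def synthesize(packets):
    """Local synthesis sketch: tally which intervention each packet
    reports and take the median outcome per intervention."""
    by_intervention = {}
    for p in packets:
        by_intervention.setdefault(p["intervention"], []).append(p["outcome"])
    counts = Counter({k: len(v) for k, v in by_intervention.items()})
    medians = {k: median(v) for k, v in by_intervention.items()}
    return counts, medians

packets = [
    {"intervention": "A", "outcome": 0.5},
    {"intervention": "A", "outcome": 1.0},
    {"intervention": "B", "outcome": 0.4},
]
counts, medians = synthesize(packets)
print(counts.most_common(1)[0])  # ('A', 2)
print(medians["A"])              # 0.75
```

A single linear pass over the packets, which is why a million-packet synthesis lands in the ~100 ms range on phone hardware.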
Phone hits a wall at ~500 MB sustained? Plug in laptop/tablet. Same code. Ethernet/Wi-Fi. Seconds.
Index size limit: With quantization, ~2M vectors in 500MB-1GB RAM on phone. Beyond that, shard across devices or offload to laptop. FAISS handles this gracefully.
Receipts and Immutability
Same as DHT—no blockchain theater required:
Tamper-proof without gas fees. The signature is unforgeable. The merkle log is append-only. That's enough for audit.
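What the receipt log might look like in code. This sketch uses a hash chain as a minimal stand-in for the merkle log and an HMAC as a stand-in for a real asymmetric signature; the key, class name, and payloads are all illustrative:

```python
import hashlib
import hmac

SECRET = b"device-signing-key"  # stand-in for a real signing keypair

def sign(payload: bytes) -> str:
    # HMAC-SHA256 stands in for an asymmetric signature in this sketch.
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()

class AppendOnlyLog:
    """Hash-chained receipt log: each entry commits to the previous
    head, so tampering with any entry changes every later head."""
    def __init__(self):
        self.head = hashlib.sha256(b"genesis").hexdigest()
        self.entries = []

    def append(self, receipt: bytes):
        entry = {"receipt": receipt, "sig": sign(receipt), "prev": self.head}
        self.head = hashlib.sha256((self.head + entry["sig"]).encode()).hexdigest()
        self.entries.append(entry)
        return self.head

log = AppendOnlyLog()
h1 = log.append(b"query:V,radius=0.01,matches=128")
h2 = log.append(b"pull:packet-batch-7")
print(h1 != h2, len(log.entries))  # True 2
```

Verifying the log means replaying the chain and recomputing each head; any edited entry breaks the replay, which is all an audit needs.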
Why Vectors Aren't Saving Lives Already
What Vectors Are Used For
Everyone Else
"What song sounds like this?" → Spotify
"What image looks like this?" → Google Photos
"What document answers this?" → ChatGPT RAG
QIS
"What worked for patients exactly like me?"
"What yield did farms with my exact conditions achieve?"
"What maintenance prevented failure in equipment like mine?"
The technology is identical. The question is different.
No one noticed you could attach outcome packets to vectors instead of song IDs. No one noticed the same infrastructure that recommends your next playlist could route life-saving treatment insights.
Until now.
The Challenge
Can't?
Then vectors aren't the future either. They're the present. Every component exists.
Central for speed. Distributed for sovereignty. Hybrid for today.
Pick one (or combine—see Routing by Similarity for more methods).
The more QIS networks exist, the tighter similarity gets—clusters shrink as networks get more precise. Pulls drop to blinks. The phone—or the laptop—is waiting.