QIS Component Series — Step 2 of 5
Experts don't need AI to invent similarity.
They already live it.
Every diagnosis is a bucket. Every treatment plan is a template. Every trial inclusion criterion is a semantic space.
The best oncologist on Earth doesn't sit in silence when a patient walks in. They start grouping—quietly, instantly.
Stage III NSCLC, EGFR-positive, female age 55-65, smoker, no cardiac history, good performance status.
That grouping is similarity. And it's defined every single day. In every clinic. In every hospital. In every research protocol.
They just don't call it "similarity." They call it medicine.
How It's Done — No New Tools Needed
Step 1: Template Creation
Oncologist or panel writes it once. Fields that matter for matching:
| Category | Example Fields |
|---|---|
| Condition | Disease type, stage, mutation status |
| Demographics | Age bin, sex, race/ethnicity |
| Lifestyle | Smoking status, diet, exercise level |
| Comorbidities | Diabetes, CKD, cardiac history, mental health |
| Lab Markers | EGFR, ALK, PD-L1, CRP, eGFR |
| Treatment | Drug name, dose, duration, line of therapy |
Output: a string, a hash, a vector. The encoding doesn't matter. Each one carries the same expert-defined template.
// Template output → routing key
nsclc-3a-egfr+-f-55to65-smoker-ecog01-osi80
// Same bucket. Same insight pool.
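As a minimal sketch of how a filled template could be turned into the key above: the field names, the 10-year age bins, and the `FilledTemplate` and `routing_key` helpers are illustrative assumptions, not part of any published QIS specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilledTemplate:
    """Hypothetical field set; a real template is written by the expert panel."""
    condition: str    # e.g. "nsclc"
    stage: str        # e.g. "3a"
    mutation: str     # e.g. "egfr+"
    sex: str          # e.g. "f"
    age: int          # raw age; binned before it enters the key
    smoking: str      # e.g. "smoker"
    performance: str  # e.g. "ecog01"
    treatment: str    # e.g. "osi80"


def age_bin(age: int, width: int = 10, offset: int = 5) -> str:
    """Map a raw age to a fixed-width bin label such as '55to65' (assumed binning)."""
    low = ((age - offset) // width) * width + offset
    return f"{low}to{low + width}"


def routing_key(t: FilledTemplate) -> str:
    """Join the normalized fields in a fixed order so equal cases yield equal keys."""
    parts = [t.condition, t.stage, t.mutation, t.sex,
             age_bin(t.age), t.smoking, t.performance, t.treatment]
    return "-".join(p.strip().lower() for p in parts)


# Two different patients, one bucket: ages 58 and 63 fall in the same bin.
patient_a = FilledTemplate("NSCLC", "3a", "EGFR+", "F", 58, "smoker", "ecog01", "osi80")
patient_b = FilledTemplate("nsclc", "3a", "egfr+", "f", 63, "smoker", "ecog01", "osi80")
print(routing_key(patient_a))
print(routing_key(patient_a) == routing_key(patient_b))
```

The fixed field order and the normalization (lowercasing, binning) are what make the key deterministic. Any two nodes that fill the same template the same way arrive at the same string.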
Step 2: The Filled Template IS the Routing Key
The filled template string—like nsclc-3a-egfr+-f-55to65-smoker-ecog01-osi80—is itself the semantic fingerprint. This is the routing key. Query with it, and you find outcome packets from cases with the same expert-defined similarity.
Different routing mechanisms can use this same key in different ways:
| Routing Method | How It Uses the Key | Result |
|---|---|---|
| DHT Hash | Template string → SHA-256 → DHT key | Exact-match bucket lookup |
| Vector Embedding | Template string → MedCPT/BERT → 768-dim vector | Exact or approximate similarity search |
| Registry Lookup | Template string → Registry ID | Human-readable bucket mapping |
Same template. Same expert-defined similarity. Different routing mechanisms—all leading to the same insight pool.
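Two of the routing methods in the table can be sketched with the standard library alone. The `dht_key` function and the `Registry` class below are hypothetical names for illustration. The embedding route is left out because it needs a model download.

```python
import hashlib
from typing import Dict, Optional


def dht_key(template_key: str) -> str:
    """Hash the template string with SHA-256; the hex digest serves as an exact-match key."""
    return hashlib.sha256(template_key.encode("utf-8")).hexdigest()


class Registry:
    """Toy in-memory registry (an assumption, not a real network service)."""

    def __init__(self) -> None:
        self._ids: Dict[str, str] = {}

    def register(self, template_key: str) -> str:
        """Assign a readable bucket ID on first sight; return the same ID afterwards."""
        if template_key not in self._ids:
            self._ids[template_key] = f"bucket-{len(self._ids) + 1:04d}"
        return self._ids[template_key]

    def lookup(self, template_key: str) -> Optional[str]:
        """Return the bucket ID, or None if the template was never registered."""
        return self._ids.get(template_key)


key = "nsclc-3a-egfr+-f-55to65-smoker-ecog01-osi80"
registry = Registry()

print(dht_key(key))            # 64 hex characters, identical on every node
print(registry.register(key))  # bucket-0001
print(registry.lookup("unknown-template"))  # None
```

Hashing gives exact-match buckets only: changing a single field produces an unrelated digest. That is why the table pairs it with an embedding option for approximate matches.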
Step 3: Publish Once
Push the template to the network registry (see Routing article).
Nodes sync. Done.
Updated? Re-push. Live in 5 seconds.
No AI training loop. No trillion-dollar model. No 5-year study.
Just the same logic doctors use when they open a chart.
The Numbers: Doctors Already Define Millions of Buckets
Proof: Doctors Group Every Day
📋 NCCN Guidelines
60+ tumor types with detailed treatment pathways. Each pathway = a similarity bucket. Stage, biomarker, performance status → recommendation.
🏥 ICD-10 Codes
74,044 ICD-10-CM diagnosis codes (FY 2024). Every code = a patient cluster. J18.9 (pneumonia, unspecified organism) vs J18.1 (lobar pneumonia) = different buckets.
🔬 Clinical Trial Criteria
500,000+ registered trials on ClinicalTrials.gov. Every inclusion/exclusion criterion = expert-defined similarity filter. "Age 18-65, ECOG 0-1, no prior immunotherapy."
⚙️ EHR Clinical Rules
If age > 65 AND eGFR < 60, flag CKD. That's similarity. Every clinical decision support rule = a bucket definition.
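The rule above and the trial criteria quoted earlier are both plain predicates over patient fields, which is what makes them bucket definitions. A minimal sketch follows; the function names and the patient records are made up for illustration. Note that `egfr` here is the kidney-function lab value (eGFR), not the EGFR mutation.

```python
from typing import Dict, List


def ckd_flag(age: int, egfr: float) -> bool:
    """EHR-style rule from the text: age > 65 AND eGFR < 60."""
    return age > 65 and egfr < 60


def trial_eligible(age: int, ecog: int, prior_immunotherapy: bool) -> bool:
    """Trial-style filter from the text: age 18-65, ECOG 0-1, no prior immunotherapy."""
    return 18 <= age <= 65 and ecog in (0, 1) and not prior_immunotherapy


# Illustrative records only; not real patient data.
patients: List[Dict] = [
    {"id": "p1", "age": 70, "egfr": 52.0, "ecog": 1, "prior_immunotherapy": False},
    {"id": "p2", "age": 58, "egfr": 85.0, "ecog": 0, "prior_immunotherapy": False},
    {"id": "p3", "age": 44, "egfr": 95.0, "ecog": 2, "prior_immunotherapy": False},
    {"id": "p4", "age": 61, "egfr": 55.0, "ecog": 1, "prior_immunotherapy": True},
]

# Each predicate carves the same population into a different bucket.
flagged = [p["id"] for p in patients if ckd_flag(p["age"], p["egfr"])]
eligible = [
    p["id"] for p in patients
    if trial_eligible(p["age"], p["ecog"], p["prior_immunotherapy"])
]

print(flagged)   # ['p1']
print(eligible)  # ['p2']
```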
🔍 Patient Matchmaking
"Find 5 patients like this one." Research coordinators do this in spreadsheets every day. Manual, slow, siloed—but the logic exists.
The Tech Already Exists
| Tool | What It Does | Status |
|---|---|---|
| MedCPT | Biomedical retrieval embeddings from NIH/NLM, trained on 255M query-article pairs from PubMed search logs | ✓ Open source, production-ready |
| PubMedBERT | Biomedical language model for clinical text | ✓ Hugging Face, free |
| MedEmbed | Fine-tuned embedding models for medical retrieval | ✓ Open source |
| SHA-256 | Deterministic hash for exact-match routing | ✓ Every device on Earth |
Why No Network Does This — Yet
Google doesn't. Apple doesn't. Epic doesn't.
They could. They have the data. They have the doctors.
But they don't publish the buckets. They monetize the silos.
| Company | What They Could Do | What They Do Instead |
|---|---|---|
| Google | Publish clinical similarity templates | Sells ads on cancer search queries |
| Apple | Share HealthKit similarity definitions | Sells API access to pharma |
| Epic | Open patient matching across systems | Sells EHR upgrades, not insight |
Nobody turns diagnosis into a live, shareable template.
Nobody lets a kid in Ghana get the same insight as a banker in Boston—because the bucket is already defined, already filled, already voted on.
Every expert already defines similarity. Every day. In their head. On paper. In EHR flowsheets.
They just don't open the door.
But if they did—if Google hired the best oncologist on earth, had them write 500 templates, publish them once—then:
• Second opinion? Free. Instant. Real-time.
• Third-world patient? No doctor? Doesn't matter. The bucket answers.
• Rare mutation? Bucket of 5? Still bigger than zero.
• Woman in Nairobi? Same EGFR+ insight as the guy in Tokyo.
Same template. Same aggregation. Same routing. Same vote.
Same doctor. Same mind. Same bucket. Different room.
Now the room is the network.
Ties to the Chain
Aggregation
Expert says: "Grab PFS months, side-effect level, treatment name." → Data Aggregation
Similarity (This Article)
Expert says: "Use this template. These fields define 'like me.'"
Routing
Expert says: "Use this ID to find peers." → Routing by Similarity
Packets
Outcome fits the template. Returns to querying node.
Synthesis
Expert says: "Vote on survival, average side effects." Local consensus.
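To make the last link concrete, here is one way the synthesis step could look once packets for a bucket have come back. This is a sketch under assumptions: the packet fields mirror the aggregation line above (PFS months, side-effect level, treatment name), "vote" is read as picking the treatment with the highest median PFS, and the packets themselves are invented numbers.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Dict, List


def synthesize(packets: List[Dict]) -> Dict[str, Dict]:
    """Group outcome packets by treatment, then summarize each group locally."""
    by_treatment: Dict[str, List[Dict]] = defaultdict(list)
    for packet in packets:
        by_treatment[packet["treatment"]].append(packet)

    summary = {}
    for name, group in by_treatment.items():
        summary[name] = {
            "n": len(group),
            "median_pfs_months": median([p["pfs_months"] for p in group]),
            "mean_side_effect_level": mean([p["side_effect_level"] for p in group]),
        }
    return summary


# Invented packets for one bucket; real packets would arrive from peer nodes.
packets = [
    {"treatment": "treatment-a", "pfs_months": 18, "side_effect_level": 1},
    {"treatment": "treatment-a", "pfs_months": 20, "side_effect_level": 2},
    {"treatment": "treatment-a", "pfs_months": 16, "side_effect_level": 3},
    {"treatment": "treatment-b", "pfs_months": 6, "side_effect_level": 3},
    {"treatment": "treatment-b", "pfs_months": 8, "side_effect_level": 3},
]

summary = synthesize(packets)
# Assumed voting rule: the treatment with the highest median PFS wins.
best = max(summary, key=lambda name: summary[name]["median_pfs_months"])
print(summary)
print(best)  # treatment-a
```

Everything runs on the querying node, so the consensus stays local. The voting rule is the part an expert would define, just as they define the template.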
Same mind. Same bucket. Different room.
Now the room is the network.
Show me the doctor who hasn't defined a bucket.
Show me the guideline that isn't a similarity group.
Show me the trial that didn't exclude people unlike you.
Can't?
Then defining similarity isn't unsolved.
It's ignored.
Time to stop ignoring it.