Defining Similarity: The Doctor's Daily Math • QIS Protocol

QIS Component Series — Step 2 of 5

Experts don't need AI to invent similarity.
They already live it.

Every diagnosis is a bucket. Every treatment plan is a template. Every trial inclusion criterion is a semantic space.

The best oncologist on Earth doesn't sit in silence when a patient walks in. They start grouping—quietly, instantly.

Stage III NSCLC, EGFR-positive, female age 55-65, smoker, no cardiac history, good performance status.

— That's not a guess. That's a group. That's the bucket. That's the similarity.

And it's defined every single day. In every clinic. In every hospital. In every research protocol.

They just don't call it "similarity." They call it medicine.

How It's Done — No New Tools Needed

Step 1: Template Creation

An oncologist or expert panel writes the template once. Fields that matter for matching:

Category	Example Fields
Condition	Disease type, stage, mutation status
Demographics	Age bin, sex, race/ethnicity
Lifestyle	Smoking status, diet, exercise level
Comorbidities	Diabetes, CKD, cardiac history, mental health
Lab Markers	EGFR, ALK, PD-L1, CRP, eGFR
Treatment	Drug name, dose, duration, line of therapy

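Output: a string, a hash, or a vector. The form doesn't matter: all three are derived from the same filled template, so they name the same bucket. They differ only in how the lookup is done.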

// Template output → routing key
nsclc-3a-egfr+-f-55to65-smoker-ecog01-osi80

// Same bucket. Same insight pool.
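As a minimal sketch of how a filled template could become the key above: the field names, their order, and the age-bin edges (10-year bins starting at 5) are assumptions made for illustration, not part of any published QIS template.

```python
from dataclasses import dataclass


def age_bin(age: int, width: int = 10, offset: int = 5) -> str:
    """Map an exact age to a bin label such as '55to65' (bin edges are assumed)."""
    low = ((age - offset) // width) * width + offset
    return f"{low}to{low + width}"


@dataclass(frozen=True)
class FilledTemplate:
    condition: str    # e.g. "nsclc"
    stage: str        # e.g. "3a"
    mutation: str     # e.g. "egfr+"
    sex: str          # e.g. "f"
    age: int          # exact age; binned before it enters the key
    smoking: str      # e.g. "smoker"
    performance: str  # e.g. "ecog01" for ECOG 0-1
    treatment: str    # e.g. "osi80"

    def routing_key(self) -> str:
        # Field order is fixed by the template, so equal inputs give equal keys.
        parts = [self.condition, self.stage, self.mutation, self.sex,
                 age_bin(self.age), self.smoking, self.performance, self.treatment]
        return "-".join(p.strip().lower() for p in parts)


# Two different patients, entered with different casing and different exact ages.
patient_a = FilledTemplate("NSCLC", "3a", "EGFR+", "F", 58, "smoker", "ecog01", "osi80")
patient_b = FilledTemplate("nsclc", "3A", "egfr+", "f", 63, "smoker", "ecog01", "osi80")

print(patient_a.routing_key())
print(patient_a.routing_key() == patient_b.routing_key())
```

Normalizing case and binning age before joining is what lets two different charts land in one bucket. A real template would also need a controlled vocabulary for each field.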

Step 2: The Filled Template IS the Routing Key

The filled template string—like nsclc-3a-egfr+-f-55to65-smoker-ecog01-osi80—is itself the semantic fingerprint. This is the routing key. Query with it, and you find outcome packets from cases with the same expert-defined similarity.

Different routing mechanisms can use this same key in different ways:

Routing Method	How It Uses the Key	Result
DHT Hash	`Template string → SHA-256 → DHT key`	Exact-match bucket lookup
Vector Embedding	`Template string → MedCPT/BERT → 768-dim vector`	Exact or approximate similarity search
Registry Lookup	`Template string → Registry ID`	Human-readable bucket mapping

Same template. Same expert-defined similarity. Different routing mechanisms—all leading to the same insight pool.
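The hash and registry rows of the table can be sketched with the Python standard library. The registry contents and the ID "NSCLC-EGFR-001" are hypothetical. The embedding route is left out because it needs a model download.

```python
import hashlib
from typing import Optional


def dht_key(template_key: str) -> str:
    """SHA-256 of the UTF-8 template string, as a 64-character hex digest."""
    return hashlib.sha256(template_key.encode("utf-8")).hexdigest()


# Hypothetical registry: human-readable bucket ID -> template string.
REGISTRY = {
    "NSCLC-EGFR-001": "nsclc-3a-egfr+-f-55to65-smoker-ecog01-osi80",
}


def registry_id(template_key: str) -> Optional[str]:
    """Reverse lookup: find the registry ID for a template string, if any."""
    for bucket_id, key in REGISTRY.items():
        if key == template_key:
            return bucket_id
    return None


key = "nsclc-3a-egfr+-f-55to65-smoker-ecog01-osi80"
neighbor = "nsclc-3b-egfr+-f-55to65-smoker-ecog01-osi80"  # one field differs

print(dht_key(key) == dht_key(key))       # True: same string, same DHT key
print(dht_key(key) == dht_key(neighbor))  # False: one changed field, unrelated key
print(registry_id(key))                   # NSCLC-EGFR-001
```

The second comparison shows why the hash route is exact-match only: a neighboring bucket hashes to an unrelated key. Approximate lookup has to go through the embedding route.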

Step 3: Publish Once

Push the template to the network registry (see Routing article).

Nodes sync. Done.

Updated? Re-push. Live in 5 seconds.

That's it.

No AI training loop. No trillion-dollar model. No 5-year study.

Just the same logic doctors use when they open a chart.
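The publish and re-push cycle from Step 3 might look like the sketch below. TemplateRegistry is a hypothetical in-memory stand-in. It models versioning only, not node sync or the network registry itself.

```python
class TemplateRegistry:
    """In-memory stand-in for a network registry (hypothetical; no real sync)."""

    def __init__(self):
        self._store = {}  # template_id -> (version, field list)

    def publish(self, template_id, fields):
        """Store the field list under template_id and return its new version."""
        version = self._store.get(template_id, (0, []))[0] + 1
        self._store[template_id] = (version, list(fields))
        return version

    def latest(self, template_id):
        """Return (version, fields) for the most recent publish."""
        return self._store[template_id]


registry = TemplateRegistry()
v1 = registry.publish("nsclc-template", ["condition", "stage", "mutation"])
# Updated? Re-push under the same ID; the newer version replaces the older one.
v2 = registry.publish("nsclc-template", ["condition", "stage", "mutation", "age_bin"])

print(v1, v2)
print(registry.latest("nsclc-template"))
```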

The Numbers: Doctors Already Define Millions of Buckets

Count	What It Counts
74K+	ICD-10-CM diagnosis codes
60+	NCCN cancer types
500K+	Clinical trials (ClinicalTrials.gov)
97%	Cancer patients covered by NCCN

Proof: Doctors Group Every Day

📋 NCCN Guidelines

60+ tumor types with detailed treatment pathways. Each pathway = a similarity bucket. Stage, biomarker, performance status → recommendation.

✓ Each guideline pathway is expert-defined similarity

🏥 ICD-10 Codes

74,044 diagnosis codes (FY 2024). Every code = a patient cluster. J18.9 (pneumonia, unspecified organism) vs J18.1 (lobar pneumonia) = different buckets.

✓ 74K+ expert-defined similarity groups

🔬 Clinical Trial Criteria

500,000+ registered trials on ClinicalTrials.gov. Every inclusion/exclusion criterion = expert-defined similarity filter. "Age 18-65, ECOG 0-1, no prior immunotherapy."

✓ Half a million expert-curated similarity definitions
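The quoted criterion reads directly as a filter. In this small sketch the Patient record and its field names are assumptions, and the three patients are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    age: int
    ecog: int
    prior_immunotherapy: bool


def is_eligible(p: Patient) -> bool:
    """Inclusion filter for 'Age 18-65, ECOG 0-1, no prior immunotherapy'."""
    return 18 <= p.age <= 65 and p.ecog in (0, 1) and not p.prior_immunotherapy


cohort = [
    Patient(age=58, ecog=1, prior_immunotherapy=False),  # meets all three criteria
    Patient(age=70, ecog=0, prior_immunotherapy=False),  # too old
    Patient(age=44, ecog=0, prior_immunotherapy=True),   # prior immunotherapy
]
print([is_eligible(p) for p in cohort])  # [True, False, False]
```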

⚙️ EHR Clinical Rules

If age > 65 AND eGFR < 60, flag CKD. That's similarity. Every clinical decision support rule = a bucket definition.

✓ Millions of active similarity rules in production
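One way to see a decision-support rule as a bucket definition is to store each rule as a label plus a predicate. The "ckd-flag" rule is the one quoted above. The dictionary layout and record fields are assumptions for this sketch.

```python
# Each rule pairs a bucket label with a predicate over a patient record.
RULES = {
    "ckd-flag": lambda p: p["age"] > 65 and p["egfr"] < 60,
}


def buckets_for(patient: dict) -> list:
    """Return the labels of every rule the patient satisfies."""
    return [label for label, rule in RULES.items() if rule(patient)]


print(buckets_for({"age": 72, "egfr": 48}))  # ['ckd-flag']
print(buckets_for({"age": 72, "egfr": 75}))  # []: eGFR is not below 60
```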

🔍 Patient Matchmaking

"Find 5 patients like this one." Research coordinators do this in spreadsheets every day. Manual, slow, siloed—but the logic exists.

✓ Already happens, just not networked
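Once every case carries a routing key, "find 5 patients like this one" becomes a dictionary lookup. The case IDs and the second key below are made up for illustration.

```python
from collections import defaultdict


def build_index(cases):
    """Group case IDs by routing key (one bucket per distinct key)."""
    index = defaultdict(list)
    for case_id, key in cases:
        index[key].append(case_id)
    return index


def find_similar(index, key, exclude=None, limit=5):
    """Return up to `limit` case IDs from the same bucket, skipping `exclude`."""
    return [c for c in index.get(key, []) if c != exclude][:limit]


KEY = "nsclc-3a-egfr+-f-55to65-smoker-ecog01-osi80"
OTHER = "nsclc-3b-egfr+-f-55to65-smoker-ecog01-osi80"

# Seven cases share KEY; one case sits in a different bucket.
cases = [(f"case-{i}", KEY) for i in range(1, 8)] + [("case-8", OTHER)]
index = build_index(cases)

print(find_similar(index, KEY, exclude="case-1"))
# ['case-2', 'case-3', 'case-4', 'case-5', 'case-6']
```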

The Tech Already Exists

Tool	What It Does	Status
MedCPT	Clinical embeddings from NIH/NLM, trained on 255M query-article pairs from PubMed search logs	✓ Open source, production-ready
PubMedBERT	Biomedical language model for clinical text	✓ Hugging Face, free
MedEmbed	Fine-tuned embedding models for medical retrieval	✓ Open source
SHA-256	Deterministic hash for exact-match routing	✓ Every device on Earth

Part of the QIS Component Series: This article covers Step 2 (Defining Similarity). See also: Data Aggregation, Routing by Similarity, Synthesis, and the capstone: Every Component Exists.

Why No Network Does This — Yet

Google doesn't. Apple doesn't. Epic doesn't.

They could. They have the data. They have the doctors.

But they don't publish the buckets. They monetize the silos.

Company	What They Could Do	What They Do Instead
Google	Publish clinical similarity templates	Sells ads on cancer search queries
Apple	Share HealthKit similarity definitions	Sells API access to pharma
Epic	Open patient matching across systems	Sells EHR upgrades, not insight

Nobody turns diagnosis into a live, shareable template.

Nobody lets a kid in Ghana get the same insight as a banker in Boston, even though the bucket is already defined, already filled, already voted on.

The Real Gap Isn't Tech — It's Willingness

Every expert already defines similarity. Every day. In their head. On paper. In EHR flowsheets.

They just don't open the door.

But if they did—if Google hired the best oncologist on earth, had them write 500 templates, publish them once—then:

• Second opinion? Free. Instant. Real-time.
• Third-world patient? No doctor? Doesn't matter. The bucket answers.
• Rare mutation? Bucket of 5? Still bigger than zero.
• Woman in Nairobi? Same EGFR+ insight as the guy in Tokyo.

Same template. Same aggregation. Same routing. Same vote.

Same doctor. Same mind. Same bucket. Different room.

Now the room is the network.

Ties to the Chain

Aggregation

Expert says: "Grab PFS months, side effects level, treatment name." → Data Aggregation

Similarity (This Article)

Expert says: "Use this template. These fields define 'like me.'"

Routing

Expert says: "Use this ID to find peers." → Routing by Similarity

Packets

Outcome fits the template. Returns to querying node.

Synthesis

Expert says: "Vote on survival, average side effects." Local consensus.
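To make the last link concrete, here is one possible reading of "vote on survival, average side effects" over the three aggregated fields. The packets, their values, and the label "gef250" are invented for this sketch. Treating the highest mean PFS as the winning vote is an assumption, not the protocol's definition.

```python
from collections import defaultdict
from statistics import mean

# Illustrative outcome packets; the values are made up for this sketch.
packets = [
    {"treatment": "osi80", "pfs_months": 18, "side_effects_level": 2},
    {"treatment": "osi80", "pfs_months": 20, "side_effects_level": 1},
    {"treatment": "osi80", "pfs_months": 16, "side_effects_level": 3},
    {"treatment": "gef250", "pfs_months": 10, "side_effects_level": 2},
    {"treatment": "gef250", "pfs_months": 12, "side_effects_level": 3},
]


def synthesize(packets):
    """Summarize packets per treatment: count, mean PFS, mean side-effect level."""
    by_treatment = defaultdict(list)
    for p in packets:
        by_treatment[p["treatment"]].append(p)
    return {
        name: {
            "n": len(group),
            "mean_pfs_months": mean(p["pfs_months"] for p in group),
            "mean_side_effects": mean(p["side_effects_level"] for p in group),
        }
        for name, group in by_treatment.items()
    }


summary = synthesize(packets)
# "Vote" (assumed rule): the treatment with the highest mean PFS in this bucket.
winner = max(summary, key=lambda name: summary[name]["mean_pfs_months"])
print(winner, summary[winner])
```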

Same mind. Same bucket. Different room.

Now the room is the network.

Show me the doctor who hasn't defined a bucket.

Show me the guideline that isn't a similarity group.

Show me the trial that didn't exclude people unlike you.

Can't?

Then defining similarity isn't unsolved.
It's ignored.
Time to stop ignoring it.
