QIS Component Deep Dive

Defining Similarity: The Doctor's Daily Math

Already Done, Just Not Shared

By Christopher Thomas Trevethan · January 15, 2026

QIS Component Series — Step 2 of 5
Step 1: Data Aggregation · Step 2: Defining Similarity · Steps 3 & 4: Routing + Outcome Packets · Step 5: Synthesis · Capstone: Every Component Exists

Experts don't need AI to invent similarity.
They already live it.

Every diagnosis is a bucket. Every treatment plan is a template. Every trial inclusion criterion is a semantic space.

The best oncologist on Earth doesn't sit in silence when a patient walks in. They start grouping—quietly, instantly.

Stage III NSCLC, EGFR-positive, female, age 55-65, smoker, no cardiac history, good performance status.

— That's not a guess. That's a group. That's the bucket. That's the similarity.

And it's defined every single day. In every clinic. In every hospital. In every research protocol.

They just don't call it "similarity." They call it medicine.

How It's Done — No New Tools Needed

Step 1: Template Creation

An oncologist or panel writes it once. The fields that matter for matching:

Category | Example Fields
Condition | Disease type, stage, mutation status
Demographics | Age bin, sex, race/ethnicity
Lifestyle | Smoking status, diet, exercise level
Comorbidities | Diabetes, CKD, cardiac history, mental health
Lab Markers | EGFR, ALK, PD-L1, CRP, eGFR
Treatment | Drug name, dose, duration, line of therapy

Output: a string, a hash, or a vector. The format doesn't matter; each is just an encoding of the same filled template.

// Template output → routing key
nsclc-3a-egfr+-f-55to65-smoker-ecog01-osi80

// Same bucket. Same insight pool.
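A minimal Python sketch of Step 1, assuming the template is simply an ordered list of field names and each value is normalized (lowercased, internal spaces turned into dashes) before joining. The field names, `NSCLC_TEMPLATE`, and `build_routing_key` are hypothetical illustrations, not a published QIS schema.

```python
from typing import Mapping, Sequence

# Hypothetical template: the expert panel fixes the field order once, so every
# node that fills it in produces the same key for the same kind of case.
NSCLC_TEMPLATE: Sequence[str] = (
    "condition", "stage", "egfr", "sex", "age_bin", "smoking", "ecog", "treatment",
)


def normalize(value: str) -> str:
    """Lowercase and join internal whitespace with dashes so formatting quirks don't split buckets."""
    return "-".join(str(value).lower().split())


def build_routing_key(template: Sequence[str], case: Mapping[str, str]) -> str:
    """Join the case's values, in template order, into a single routing key."""
    missing = [name for name in template if name not in case]
    if missing:
        raise ValueError(f"case is missing template fields: {missing}")
    return "-".join(normalize(case[name]) for name in template)


patient = {
    "condition": "NSCLC", "stage": "3a", "egfr": "egfr+", "sex": "F",
    "age_bin": "55to65", "smoking": "smoker", "ecog": "ecog01", "treatment": "osi80",
}
print(build_routing_key(NSCLC_TEMPLATE, patient))
# nsclc-3a-egfr+-f-55to65-smoker-ecog01-osi80
```

Because the key is only ever recomputed and compared, never parsed back into fields, all it needs is determinism. A stricter version would escape the dash separator so that two different cases can never join into the same string.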

Step 2: The Filled Template IS the Routing Key

The filled template string—like nsclc-3a-egfr+-f-55to65-smoker-ecog01-osi80—is itself the semantic fingerprint. This is the routing key. Query with it, and you find outcome packets from cases with the same expert-defined similarity.

Different routing mechanisms can use this same key in different ways:

Routing Method | How It Uses the Key | Result
DHT Hash | Template string → SHA-256 → DHT key | Exact-match bucket lookup
Vector Embedding | Template string → MedCPT/BERT → 768-dim vector | Exact or approximate similarity search
Registry Lookup | Template string → Registry ID | Human-readable bucket mapping

Same template. Same expert-defined similarity. Different routing mechanisms—all leading to the same insight pool.
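To make the DHT and registry rows concrete, here is a small sketch. It assumes a Kademlia-style DHT, where the node whose ID has the smallest XOR distance to the key is responsible for storing it; the node names, the `REGISTRY` dictionary, and the bucket ID are hypothetical stand-ins for a real DHT library and registry service. The vector-embedding row is left out because it needs a trained model such as MedCPT.

```python
import hashlib

ROUTING_KEY = "nsclc-3a-egfr+-f-55to65-smoker-ecog01-osi80"


def sha256_int(text: str) -> int:
    """SHA-256 a string and read the 32-byte digest as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")


def responsible_node(routing_key: str, nodes: list[str]) -> str:
    """Kademlia-style rule: the node whose ID is XOR-closest to the key stores the bucket."""
    target = sha256_int(routing_key)  # DHT key for the template string
    # Node IDs are hashed into the same 256-bit space as the keys.
    return min(nodes, key=lambda name: sha256_int(name) ^ target)


# Registry lookup: a plain mapping from template string to a human-readable bucket ID.
REGISTRY = {ROUTING_KEY: "NSCLC-IIIA-EGFR-OSI-001"}

nodes = ["clinic-accra", "clinic-boston", "clinic-tokyo", "clinic-nairobi"]
print(hashlib.sha256(ROUTING_KEY.encode("utf-8")).hexdigest()[:16])  # start of the DHT key
print(responsible_node(ROUTING_KEY, nodes))  # identical on every machine running this code
print(REGISTRY[ROUTING_KEY])
```

The hash only supports exact matches: changing a single field (say `ecog01` to `ecog2`) produces an unrelated key, which is why the table pairs it with vector embeddings for approximate search.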

Step 3: Publish Once

Push the template to the network registry (see Routing article).

Nodes sync. Done.

Updated? Re-push. Live in 5 seconds.

That's it.

No AI training loop. No trillion-dollar model. No 5-year study.

Just the same logic doctors use when they open a chart.
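The publish-and-sync loop from Step 3 can be sketched as a versioned registry: each re-push bumps a version number, and a node keeps whichever copy of a template is newest. This is an in-memory illustration under that assumption only; `TemplateRegistry` and `Node` are hypothetical classes, and a real network would sync over the wire rather than through a method call.

```python
from dataclasses import dataclass, field

# Hypothetical storage shape: template name -> (version, ordered field names).
Templates = dict[str, tuple[int, tuple[str, ...]]]


@dataclass
class TemplateRegistry:
    """Hypothetical network registry that experts publish templates to."""
    templates: Templates = field(default_factory=dict)

    def publish(self, name: str, fields: tuple[str, ...]) -> int:
        """Publish a new template or re-push an update; returns the new version."""
        version = self.templates.get(name, (0, ()))[0] + 1
        self.templates[name] = (version, fields)
        return version


@dataclass
class Node:
    """A participating node that keeps a local copy of every template."""
    local: Templates = field(default_factory=dict)

    def sync(self, registry: TemplateRegistry) -> list[str]:
        """Pull every template whose registry version is newer than the local copy."""
        updated = []
        for name, (version, fields) in registry.templates.items():
            if version > self.local.get(name, (0, ()))[0]:
                self.local[name] = (version, fields)
                updated.append(name)
        return updated


registry = TemplateRegistry()
node = Node()
registry.publish("nsclc-egfr", ("condition", "stage", "egfr", "sex", "age_bin"))
print(node.sync(registry))  # ['nsclc-egfr']
registry.publish("nsclc-egfr", ("condition", "stage", "egfr", "sex", "age_bin", "ecog"))
print(node.sync(registry))  # ['nsclc-egfr']  (version 2 replaces version 1)
print(node.sync(registry))  # []  (already current)
```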

The Numbers: Doctors Already Define Millions of Buckets

74K+ ICD-10-CM diagnosis codes
60+ NCCN cancer types
500K+ clinical trials (ClinicalTrials.gov)
97% of cancer patients covered by NCCN

Proof: Doctors Group Every Day

📋 NCCN Guidelines

60+ tumor types with detailed treatment pathways. Each pathway = a similarity bucket. Stage, biomarker, performance status → recommendation.

✓ Each guideline pathway is expert-defined similarity

🏥 ICD-10 Codes

74,044 diagnosis codes (FY 2025). Every code = a patient cluster. J18.9 (pneumonia, unspecified organism) vs J18.1 (lobar pneumonia, unspecified organism) = different buckets.

✓ 74K+ expert-defined similarity groups
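ICD-10-CM codes already nest: the first three characters name a category, and the characters after the dot refine it. A small sketch with made-up patient records (the IDs and the `bucket_by` helper are hypothetical), showing that the same codes give finer or coarser buckets depending on how much of the code you match on:

```python
from collections import defaultdict

# Made-up records for illustration only: (patient_id, ICD-10-CM code).
records = [("p1", "J18.9"), ("p2", "J18.1"), ("p3", "J18.9"), ("p4", "E11.9")]


def bucket_by(records: list[tuple[str, str]], level: str) -> dict[str, list[str]]:
    """Group patients by full code ('code') or by 3-character category ('category')."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for patient_id, code in records:
        key = code if level == "code" else code[:3]
        buckets[key].append(patient_id)
    return dict(buckets)


print(bucket_by(records, "code"))      # {'J18.9': ['p1', 'p3'], 'J18.1': ['p2'], 'E11.9': ['p4']}
print(bucket_by(records, "category"))  # {'J18': ['p1', 'p2', 'p3'], 'E11': ['p4']}
```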

🔬 Clinical Trial Criteria

500,000+ registered trials on ClinicalTrials.gov. Every inclusion/exclusion criterion = expert-defined similarity filter. "Age 18-65, ECOG 0-1, no prior immunotherapy."

✓ Half a million expert-curated similarity definitions
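The example criterion ("Age 18-65, ECOG 0-1, no prior immunotherapy") translates directly into a filter. A sketch, assuming a hypothetical minimal patient record; real eligibility criteria on ClinicalTrials.gov are largely free text and would first need to be structured into fields like these.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical minimal record; field names are illustrative only."""
    age: int
    ecog: int
    prior_immunotherapy: bool


def eligible(p: Patient) -> bool:
    """Inclusion: age 18-65 and ECOG 0-1. Exclusion: any prior immunotherapy."""
    return 18 <= p.age <= 65 and p.ecog in (0, 1) and not p.prior_immunotherapy


cohort = [Patient(58, 1, False), Patient(70, 0, False), Patient(44, 2, False), Patient(39, 0, True)]
print([eligible(p) for p in cohort])  # [True, False, False, False]
```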

⚙️ EHR Clinical Rules

If age > 65 AND eGFR < 60, flag CKD. That's similarity. Every clinical decision support rule = a bucket definition.

✓ Millions of active similarity rules in production
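The CKD rule above is already a bucket definition. Written as data instead of hard-coded logic, it becomes something a registry could publish and any node could evaluate. A sketch, assuming a hypothetical rule format of `(field, operator, threshold)` conditions joined by AND; `OPS`, `CKD_RULE`, and `evaluate` are illustrative names.

```python
import operator
from typing import Optional

# Hypothetical rule format: a flag plus conditions that must all hold (logical AND).
OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le, "==": operator.eq}
CKD_RULE = {"flag": "CKD", "all": [("age", ">", 65), ("egfr", "<", 60)]}


def evaluate(rule: dict, record: dict) -> Optional[str]:
    """Return the rule's flag if every condition holds for the record, else None."""
    for name, op, threshold in rule["all"]:
        # A missing value means the rule cannot fire.
        if name not in record or not OPS[op](record[name], threshold):
            return None
    return rule["flag"]


print(evaluate(CKD_RULE, {"age": 72, "egfr": 48}))  # CKD
print(evaluate(CKD_RULE, {"age": 72, "egfr": 75}))  # None
print(evaluate(CKD_RULE, {"age": 50, "egfr": 48}))  # None
```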

🔍 Patient Matchmaking

"Find 5 patients like this one." Research coordinators do this in spreadsheets every day. Manual, slow, siloed—but the logic exists.

✓ Already happens, just not networked
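"Find 5 patients like this one" is a nearest-neighbor query over the template fields. A minimal sketch, assuming the simplest possible distance (how many template fields differ) and made-up records; an embedding-based system would swap in a vector distance but keep the same shape. `FIELDS`, `distance`, and `most_similar` are hypothetical names.

```python
FIELDS = ("stage", "egfr", "sex", "age_bin", "smoking", "ecog")


def distance(a: dict, b: dict) -> int:
    """Hamming distance: how many template fields differ between two patients."""
    return sum(a[f] != b[f] for f in FIELDS)


def most_similar(query: dict, pool: list[dict], k: int = 5) -> list[dict]:
    """Return the k closest patients; sorted() is stable, so ties keep pool order."""
    return sorted(pool, key=lambda p: distance(query, p))[:k]


# Made-up records: one exact match and a few near misses.
query = {"stage": "3a", "egfr": "+", "sex": "f", "age_bin": "55to65", "smoking": "smoker", "ecog": "01"}
pool = [
    {**query, "id": "a"},                             # distance 0
    {**query, "id": "b", "sex": "m"},                 # distance 1
    {**query, "id": "c", "stage": "4", "egfr": "-"},  # distance 2
    {**query, "id": "d", "ecog": "2"},                # distance 1
]
print([p["id"] for p in most_similar(query, pool, k=3)])  # ['a', 'b', 'd']
```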

The Tech Already Exists

Tool | What It Does | Status
MedCPT | Biomedical embeddings from NIH/NLM, trained on 255M PubMed query-article pairs | ✓ Open source, production-ready
PubMedBERT | Biomedical language model pretrained on PubMed literature | ✓ Hugging Face, free
MedEmbed | Fine-tuned embedding models for medical retrieval | ✓ Open source
SHA-256 | Deterministic hash for exact-match routing | ✓ Every device on Earth

Part of the QIS Component Series: This article covers Step 2 (Defining Similarity). See also: Data Aggregation, Routing by Similarity, Synthesis, and the capstone: Every Component Exists.

Why No Network Does This — Yet

Google doesn't. Apple doesn't. Epic doesn't.

They could. They have the data. They have the doctors.

But they don't publish the buckets. They monetize the silos.

Company | What They Could Do | What They Do Instead
Google | Publish clinical similarity templates | Sells ads on cancer search queries
Apple | Share HealthKit similarity definitions | Sells API access to pharma
Epic | Open patient matching across systems | Sells EHR upgrades, not insight

Nobody turns diagnosis into a live, shareable template.

Nobody lets a kid in Ghana get the same insight as a banker in Boston by drawing on a bucket that is already defined, already filled, already voted on.

The Real Gap Isn't Tech — It's Willingness

Every expert already defines similarity. Every day. In their head. On paper. In EHR flowsheets.

They just don't open the door.

But if they did (if Google hired the best oncologist on Earth, had them write 500 templates, and published them once), then:

• Second opinion? Free. Instant. Real-time.
• Third-world patient? No doctor? Doesn't matter. The bucket answers.
• Rare mutation? Bucket of 5? Still bigger than zero.
• Woman in Nairobi? Same EGFR+ insight as the guy in Tokyo.

Same template. Same aggregation. Same routing. Same vote.

Same doctor. Same mind. Same bucket. Different room.

Now the room is the network.

Ties to the Chain

1. Aggregation
Expert says: "Grab PFS months, side-effect level, treatment name." → Data Aggregation

2. Similarity (This Article)
Expert says: "Use this template. These fields define 'like me.'"

3. Routing
Expert says: "Use this ID to find peers." → Routing by Similarity

4. Packets
The outcome fits the template and returns to the querying node.

5. Synthesis
Expert says: "Vote on survival, average side effects." Local consensus.
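Step 5's "vote on survival, average side effects" could look roughly like this on the querying node. A sketch, assuming each returned outcome packet carries a treatment name, PFS in months, and a side-effect grade from 0 to 4; the packet shape, the made-up numbers (not clinical data), and the choice of median PFS as the "vote" are all illustrative assumptions rather than the series' defined method.

```python
from collections import defaultdict
from statistics import mean, median

# Made-up outcome packets from one bucket; not real clinical results.
packets = [
    {"treatment": "osimertinib", "pfs_months": 18.0, "side_effects": 1},
    {"treatment": "osimertinib", "pfs_months": 21.0, "side_effects": 2},
    {"treatment": "erlotinib", "pfs_months": 10.0, "side_effects": 2},
    {"treatment": "osimertinib", "pfs_months": 15.0, "side_effects": 1},
    {"treatment": "erlotinib", "pfs_months": 12.0, "side_effects": 3},
]


def synthesize(packets: list[dict]) -> list[dict]:
    """Summarize outcomes per treatment, ranked by median PFS (highest first)."""
    by_treatment: dict[str, list[dict]] = defaultdict(list)
    for p in packets:
        by_treatment[p["treatment"]].append(p)
    summary = [
        {
            "treatment": name,
            "n": len(group),
            "median_pfs": median(p["pfs_months"] for p in group),
            "mean_side_effects": round(mean(p["side_effects"] for p in group), 2),
        }
        for name, group in by_treatment.items()
    ]
    return sorted(summary, key=lambda s: s["median_pfs"], reverse=True)


for row in synthesize(packets):
    print(row)
```

Everything here runs locally on the packets that came back, which matches the "local consensus" framing: no central server has to see the raw outcomes.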

Same mind. Same bucket. Different room.

Now the room is the network.

Show me the doctor who hasn't defined a bucket.

Show me the guideline that isn't a similarity group.

Show me the trial that didn't exclude people unlike you.

Can't?

Then defining similarity isn't unsolved.
It's ignored.
Time to stop ignoring it.

Next: Steps 3 & 4 — Routing + Outcome Packets →
