Genome Decoder | Sean Francis Kelley

Why this exists

I read that sequencing a genome, which once took over a decade and billions of dollars, now costs a couple hundred dollars, and the price is still dropping. That made me want to understand what a sequenced genome is actually good for, now that anyone curious about their own biology and health can get one. It turns out the data is genuinely useful, and that a layperson can analyze the raw file themselves. That’s what led me to this project.

What it does

Drop in a DNA file and it reports how your genes shape your response to three common drugs:

Accepts a 23andMe / AncestryDNA genotyping export (.txt) or a whole-genome sequencing file (.vcf / .vcf.gz)
Streams the file entirely in the browser. Nothing uploads, so even a multi-gigabyte genome stays on your own machine
Runs three CPIC drug-gene guidelines: clopidogrel (CYP2C19), warfarin (CYP2C9 + VKORC1), and simvastatin (SLCO1B1)
Shows the whole chain for each gene: the variants read, the diplotype, the phenotype, and the dosing guidance
Prints a clean one-page report you can save as a PDF
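
To make the variants, diplotype, phenotype chain concrete, here is a minimal Python sketch for CYP2C19. It is an illustration under stated assumptions, not the project's actual module: the function name call_cyp2c19 and the tables are hypothetical, only three star alleles are modelled, genotypes are assumed to be on the same strand as the table, a missing genotype is treated as reference, and two different variants are assumed to sit on opposite chromosomes. The dosing-guidance lookup is left out. Requires Python 3.9+.

```python
from dataclasses import dataclass

# Simplified subset: star allele -> (defining rsID, variant base, allele function).
# Real guidelines cover many more alleles than these three.
STAR_ALLELES = {
    "*2": ("rs4244285", "A", "no function"),
    "*3": ("rs4986893", "A", "no function"),
    "*17": ("rs12248560", "T", "increased function"),
}
FUNCTION = {"*1": "normal function"}
FUNCTION.update({star: func for star, (_, _, func) in STAR_ALLELES.items()})

# Keys are the two allele functions in sorted order.
PHENOTYPES = {
    ("normal function", "normal function"): "Normal metabolizer",
    ("increased function", "normal function"): "Rapid metabolizer",
    ("increased function", "increased function"): "Ultrarapid metabolizer",
    ("no function", "normal function"): "Intermediate metabolizer",
    ("increased function", "no function"): "Intermediate metabolizer",
    ("no function", "no function"): "Poor metabolizer",
}


@dataclass
class Call:
    gene: str
    variants: dict[str, str]  # rsID -> genotype as read from the file
    diplotype: str
    phenotype: str


def call_cyp2c19(genotypes: dict[str, str]) -> Call:
    """Turn rsID -> genotype (for example 'GA') into a diplotype and phenotype."""
    alleles: list[str] = []
    for star, (rsid, variant_base, _) in STAR_ALLELES.items():
        # Assumption: a missing or no-call genotype counts as zero variant copies.
        copies = genotypes.get(rsid, "").count(variant_base)
        alleles += [star] * copies
    if len(alleles) > 2:
        raise ValueError("more than two variant alleles; phasing is ambiguous")
    alleles += ["*1"] * (2 - len(alleles))  # fill the rest with the reference allele
    alleles.sort(key=lambda star: int(star[1:]))
    functions = tuple(sorted(FUNCTION[a] for a in alleles))
    return Call("CYP2C19", dict(genotypes), "/".join(alleles), PHENOTYPES[functions])
```

Calling call_cyp2c19({"rs4244285": "GA"}) yields the diplotype *1/*2 and an intermediate-metabolizer phenotype, which is the point where a guideline lookup would attach the dosing text.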

How it works

Read: It detects whether the file is a genotyping array or a VCF and streams it line by line, so a whole genome never has to fit in memory
Decompress: Gzipped genomes are unpacked on the fly. Real ones are typically BGZF, a gzip variant made of many concatenated members, which the browser’s built-in decompressor chokes on, so the tool falls back to a streaming library that handles them
Match: It keeps only the handful of variants the drug modules need, matching each first by rsID, then by chromosome position plus reference allele if the file isn’t annotated with rsIDs
Call: Each gene module turns the genotypes into a diplotype, a phenotype, and the matching CPIC guidance
Report: The result renders to the neon console and to a printable summary
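
The read and match steps above can be sketched roughly like this. It is a Python illustration rather than the browser code, and everything named here is an assumption: the Target tuple, the scan, match_vcf, and match_array helpers, and the placeholder rsIDs and coordinates, which are made up. Python's gzip module already reads concatenated gzip members, so the browser-specific fallback is not needed in this sketch. Requires Python 3.9+.

```python
import gzip
from typing import Iterable, NamedTuple, Optional


class Target(NamedTuple):
    rsid: str
    chrom: str
    pos: int
    ref: str


# Placeholder targets: these IDs and coordinates are made up for illustration.
TARGETS = [Target("rs0000001", "10", 1000, "G"), Target("rs0000002", "12", 2000, "T")]
BY_RSID = {t.rsid: t for t in TARGETS}
BY_LOCUS = {(t.chrom, t.pos, t.ref): t for t in TARGETS}


def open_text(path: str):
    """Open a plain or gzipped file as text, chosen by the gzip magic bytes."""
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzip else open
    return opener(path, "rt", encoding="utf-8", errors="replace")


def match_array(cols: list[str]) -> Optional[tuple[str, str]]:
    """Array rows: rsid, chromosome, position, then one or two allele columns."""
    if len(cols) < 4 or cols[0] not in BY_RSID:
        return None
    return cols[0], "".join(cols[3:5])


def match_vcf(cols: list[str]) -> Optional[tuple[str, str]]:
    """VCF rows: try the ID column first, then chromosome + position + REF."""
    if len(cols) < 10:
        return None
    chrom, pos, ids, ref, alt = cols[:5]
    target = next((BY_RSID[i] for i in ids.split(";") if i in BY_RSID), None)
    if target is None and pos.isdigit():
        target = BY_LOCUS.get((chrom.removeprefix("chr"), int(pos), ref))
    if target is None:
        return None
    alleles = [ref] + alt.split(",")
    indexes = cols[9].split(":")[0].replace("|", "/").split("/")
    if not all(i.isdigit() and int(i) < len(alleles) for i in indexes):
        return None  # missing or malformed call such as "./."
    return target.rsid, "".join(alleles[int(i)] for i in indexes)


def scan(lines: Iterable[str]) -> dict[str, str]:
    """Stream lines once, keeping only genotypes for the target variants."""
    found: dict[str, str] = {}
    matcher = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if matcher is None:  # decide the format from the very first line
            matcher = match_vcf if line.startswith("##fileformat=VCF") else match_array
        if not line or line.startswith("#"):
            continue
        hit = matcher(line.split("\t"))
        if hit:
            found[hit[0]] = hit[1]
    return found
```

Because scan only ever holds one line and a small dictionary, passing it the handle from open_text on a hypothetical genome.vcf.gz would read the whole file without loading it into memory.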

One detail that mattered: the same variant can be reported on either DNA strand, so a 23andMe file and a reference genome can write it with opposite letters. Counting either the variant base or its complement keeps the call correct either way.
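
A small Python sketch of that strand-tolerant count, assuming a hypothetical helper named count_variant. One caveat I am adding here: the trick cannot work for A/T or C/G variants, where the complement of the variant base is the reference base itself, so the sketch refuses those instead of guessing.

```python
# Map each base to its complement on the opposite strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def count_variant(genotype: str, ref: str, alt: str) -> int:
    """Count copies of the variant allele in a genotype like 'GA', on either strand."""
    alt_complement = alt.translate(COMPLEMENT)
    if alt_complement == ref:
        # A/T and C/G variants read the same on both strands, so strand is ambiguous.
        raise ValueError("strand-ambiguous variant; needs explicit strand information")
    return sum(base in (alt, alt_complement) for base in genotype.upper())
```

For a G to A variant, count_variant("GA", "G", "A") and count_variant("CT", "G", "A") both report one copy, whichever strand the file used.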

What’s next

Cover more drug-gene pairs. There are dozens of CPIC guidelines beyond the three here, and the module registry is built to take them
Make the language more accessible. I had to define a few terms to follow my own output
Keep the jargon-heavy, science-fiction console look, though. I actually like it as it stands

What I learned

This was a good crash course in what matters when you’re reading the presence or absence of a variant. I’m still a beginner in this domain, and a lot of the value came from asking naive questions and watching the answers explain why a particular detail mattered. The other surprise was how far a vibecoded Python script gets you. It’s powerful enough for a layperson to analyze a genome, which would have been unthinkable two years ago.

Status

Shipped. Live and running in the browser at the link below. The next pass is about breadth and clearer language, not a rebuild.