Why this exists
I read that the cost of sequencing a genome has fallen from more than a decade of work and billions of dollars to a couple hundred dollars, and it is still dropping. That made me want to understand what a sequenced genome is actually good for, now that anyone curious about their own biology and health can get one. It turns out a sequenced genome is genuinely useful, and a layperson can analyze the raw data themselves. That’s what led me to this project.
What it does
Drop in a DNA file and it reports how your genes shape your response to three common drugs:
- Accepts a 23andMe / AncestryDNA genotyping export (.txt) or a whole-genome sequencing file (.vcf/.vcf.gz)
- Streams the file entirely in the browser. Nothing uploads, so even a multi-gigabyte genome stays on your own machine
- Runs three CPIC drug-gene guidelines: clopidogrel (CYP2C19), warfarin (CYP2C9 + VKORC1), and simvastatin (SLCO1B1)
- Shows the whole chain for each gene: the variants read, the diplotype, the phenotype, and the dosing guidance
- Prints a clean one-page report you can save as a PDF
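
To make the per-gene chain in the list above concrete, here is a minimal sketch of how observed variants could become a diplotype and a phenotype for CYP2C19. Everything named here (`GeneCall`, `call_cyp2c19`, the two lookup tables) is hypothetical and not taken from the project. The tables cover only a small subset of star alleles, and the guidance step is left out because that text comes from the CPIC guideline itself.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed, simplified allele-function table (a small subset of CYP2C19 star alleles).
FUNCTION = {"*1": "normal", "*2": "none", "*3": "none", "*17": "increased"}

# Phenotype keyed by the sorted pair of allele functions.
PHENOTYPE = {
    ("normal", "normal"): "Normal metabolizer",
    ("increased", "normal"): "Rapid metabolizer",
    ("increased", "increased"): "Ultrarapid metabolizer",
    ("none", "normal"): "Intermediate metabolizer",
    ("increased", "none"): "Intermediate metabolizer",
    ("none", "none"): "Poor metabolizer",
}


@dataclass
class GeneCall:
    gene: str
    variants: dict[str, int]  # star allele -> copies observed (0, 1, or 2)
    diplotype: str
    phenotype: str


def call_cyp2c19(copies: dict[str, int]) -> GeneCall:
    """Turn per-allele copy counts into a diplotype and phenotype."""
    # Expand the counts into a flat list of star alleles.
    alleles = [star for star, n in copies.items() for _ in range(n)]
    if len(alleles) > 2:
        raise ValueError("more than two variant alleles; phasing is needed")
    # Any chromosome copy with no detected variant defaults to *1.
    alleles += ["*1"] * (2 - len(alleles))
    alleles.sort(key=lambda star: int(star[1:]))
    key = tuple(sorted(FUNCTION[a] for a in alleles))
    return GeneCall("CYP2C19", dict(copies), "/".join(alleles), PHENOTYPE[key])
```

For example, `call_cyp2c19({"*2": 1, "*17": 0})` yields the diplotype `*1/*2` and the phenotype `Intermediate metabolizer`.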
How it works
- Read: It detects whether the file is a genotyping array or a VCF and streams it line by line, so a whole genome never has to fit in memory
- Decompress: Gzipped genomes are unpacked on the fly. Real ones are multi-member BGZF, which the browser’s built-in decompressor chokes on, so it falls back to a streaming library that handles them
- Match: It keeps only the handful of variants the drug modules need, matching each first by rsID, then by chromosome position plus reference allele if the file isn’t annotated
- Call: Each gene module turns the genotypes into a diplotype, a phenotype, and the matching CPIC guidance
- Report: The result renders to the neon console and to a printable summary
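
The Read, Decompress, and Match steps above can be sketched in a few lines of Python. This is an illustration of the logic rather than the project's browser code: `Target`, `Hit`, `open_text`, and `scan` are hypothetical names, and the coordinate in the usage note assumes the GRCh37 build. Python's `gzip` module reads multi-member files, so here it stands in for the streaming library mentioned above.

```python
from __future__ import annotations

import gzip
import re
from typing import Iterable, Iterator, NamedTuple, Optional


class Target(NamedTuple):
    rsid: str
    chrom: str  # without a "chr" prefix
    pos: int
    ref: str


class Hit(NamedTuple):
    rsid: str
    genotype: str  # e.g. "AG"


def open_text(path: str):
    """Open plain or gzipped text; gzip handles multi-member files such as BGZF."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def _from_array(cols: list[str], by_rsid: dict) -> Optional[Hit]:
    # 23andMe: rsid, chrom, pos, genotype. AncestryDNA: rsid, chrom, pos, allele1, allele2.
    if len(cols) < 4 or cols[0] not in by_rsid:
        return None
    return Hit(cols[0], "".join(cols[3:5]))


def _from_vcf(cols: list[str], by_rsid: dict, by_locus: dict) -> Optional[Hit]:
    if len(cols) < 10:
        return None
    chrom = cols[0][3:] if cols[0].startswith("chr") else cols[0]
    ref, alt = cols[3], cols[4]
    # Match by rsID first, then fall back to position plus reference allele.
    target = by_rsid.get(cols[2]) or by_locus.get((chrom, int(cols[1]), ref))
    fmt = cols[8].split(":")
    if target is None or "GT" not in fmt:
        return None
    idx = re.split(r"[/|]", cols[9].split(":")[fmt.index("GT")])
    if not all(i.isdigit() for i in idx):
        return None  # missing call such as "./."
    alleles = [ref] + alt.split(",")
    return Hit(target.rsid, "".join(alleles[int(i)] for i in idx))


def scan(lines: Iterable[str], targets: list[Target]) -> Iterator[Hit]:
    """Stream lines one at a time and yield only the variants of interest."""
    by_rsid = {t.rsid: t for t in targets}
    by_locus = {(t.chrom, t.pos, t.ref): t for t in targets}
    is_vcf = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("##fileformat=VCF"):
            is_vcf = True
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        hit = _from_vcf(cols, by_rsid, by_locus) if is_vcf else _from_array(cols, by_rsid)
        if hit is not None:
            yield hit
```

A call such as `list(scan(open_text("genome.vcf.gz"), [Target("rs4244285", "10", 96541616, "G")]))` would keep only the matching rows. Because `scan` is a generator over lines, memory use stays flat regardless of file size.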
One detail that mattered: the same variant can be reported on either DNA strand, so a 23andMe file and a reference genome can write it with opposite letters. Counting a base or its complement keeps the call correct either way.
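
A minimal sketch of that strand-tolerant count, assuming single-base variants. The function name `count_variant` is hypothetical. The guard reflects a known limit of this trick: when the reference and variant bases are complements of each other (A/T or C/G), the strand cannot be inferred from the letters alone.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def count_variant(genotype: str, ref: str, alt: str) -> int:
    """Count copies of `alt` in a genotype like "AG", accepting either strand."""
    if COMPLEMENT[ref] == alt:
        # A/T and C/G variants read the same on both strands, so counting
        # a base or its complement would be ambiguous here.
        raise ValueError("strand-ambiguous variant; letters alone cannot resolve it")
    accepted = {alt, COMPLEMENT[alt]}
    return sum(1 for base in genotype.upper() if base in accepted)
```

For a G>A variant, `count_variant("AG", "G", "A")` and `count_variant("CT", "G", "A")` both return 1, because `CT` is the same genotype written on the opposite strand.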
What’s next
- Cover more drug-gene pairs. There are dozens of CPIC guidelines beyond the three here, and the module registry is built to take them
- Make the language more accessible. I had to define a few terms to follow my own output
- Keep the jargon-heavy, science-fiction console look, though. I actually like it as it stands
What I learned
This was a good crash course in what matters when you’re reading the presence or absence of a variant. I’m still a beginner in this domain, and a lot of the value came from asking naive questions and watching the answer explain why a particular detail mattered. The other surprise was how far a vibecoded Python script gets you. It’s powerful enough for a layperson to analyze a genome, which would have been unthinkable two years ago.
Status
Shipped. Live and running in the browser at the link below. The next pass is about breadth and clearer language, not a rebuild.