Hacker News | past | comments | ask | show | jobs | submit | eweitz's comments

"RP11" is the man from Buffalo whose DNA comprises 74% of the human reference genome [1].

[1] https://undark.org/2024/07/09/informed-consent-human-genome-...


Yes. For folks looking for more:

* Celera genome, first published 2004: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000002115.1...

* Human reference genome, first published 2001 and most recently updated in 2022: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001405.4...


User interfaces for biology have drastically improved over the last 10 years.

Domain-specific tools like genome browsers, protein viewers, or phylogenetic explorers [1-3] almost all look and feel a lot better than they did in 2012.

The biggest exception here is the UCSC Genome Browser, which has an old-school design and web technology stack. That said, it has steadily added features over the years, substantially streamlined the UX in its periphery, and remains widely used.

There are also bespoke visual design resources for biology applications that are good and getting better, like BioRender and PhyloPic [4-5]. There are multi-component packages like Dash Bio that bundle biology visualizations together [6]. There's a Blender biology community, too!

---

1. Genome browsers and components: https://jbrowse.org/jb2/, https://www.ncbi.nlm.nih.gov/genome/gdv, https://igv.org/app, https://eweitz.github.io/ideogram

2. Protein viewers: https://pymol.org/, https://nglviewer.org/ngl/

3. Phylogenetic explorers: https://clades.nextstrain.org/

4. https://biorender.com/

5. http://phylopic.org/

6. https://github.com/plotly/dash-bio, https://dash.gallery/Portal/?search=[Pharma]


https://eweitz.github.io/ideogram/related-genes - gene search recommendation engine paired with a web component for genome visualization


I'm more interested in read speed than write speed. I have about 2 MB of data that I fetch, parse, and transform into a nested object for easy lookup by various types of keys. The transformed object consists of six sub-objects, and I'd guess it's < 50 MB in total size.

In my brief experiment, it was 12% faster to read from the web Cache API [1], re-parse and re-transform that nested object than to read the fully transformed object using IndexedDB via idb-keyval [2]. That surprised me! I went on to learn that IndexedDB does a structured clone as part of such reads, which I suspect is the main cause of slowness in my use case.

Related commits to reproduce that finding are in [3], specifically [4].
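
The comparison can be sketched like this. The cache name, request key, and transform below are hypothetical stand-ins for the real ones in the linked PR:

```javascript
// Pure helper: re-transform parsed JSON into a nested lookup object.
function buildIndex(records) {
  const byId = {};
  for (const record of records) {
    byId[record.id] = record; // index by one of several possible keys
  }
  return byId;
}

// Path A: web Cache API -- read raw text, then re-parse and re-transform.
async function readViaCache() {
  const cache = await caches.open('app-data');      // hypothetical cache name
  const response = await cache.match('/data.json'); // hypothetical request key
  const records = await response.json();
  return buildIndex(records); // parse + transform happens on every read
}

// Path B: idb-keyval -- the fully transformed object was stored, but
// IndexedDB structured-clones it on the way out, which in my use case
// appeared slower than Path A's re-parse.
async function readViaIdb() {
  const { get } = await import('idb-keyval');
  return await get('transformed-index'); // hypothetical key
}
```

Path B avoids re-doing the transform, so the surprise was that the structured clone inside the IndexedDB read cost more than parsing and transforming from scratch.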

[1] https://developer.mozilla.org/en-US/docs/Web/API/Cache

[2] https://github.com/jakearchibald/idb-keyval

[3] https://github.com/eweitz/ideogram/pull/285

[4] https://github.com/eweitz/ideogram/pull/285/commits/90e374a0...


Some notes towards those ends:

WikiPathways supports advanced queries via their SPARQL API and UI. See [1] and [2]. I find WikiPathways nice because it lets logged-in users create and edit pathways, with a low barrier to entry.

I've been building a way to find related genes using biochemical pathways [3]. The source code linked there includes practical examples for fetching information on genes in those pathways, which you rightly note is needed for something compelling. That and other code there might help spark ideas for you on how to glue together various biochemistry and molecular biology APIs to achieve your vision.

I'm currently working on a way to drastically expand the set of organisms and pathways covered by WikiPathways. Yeast has 66 pathways there, compared to 1319 for human. By doing fast ortholog detection at runtime (using another SPARQL API, provided by OrthoDB [4]) I'm hoping to be able to convert relevant annotated pathways across organisms, e.g. human to yeast, mouse to rat, Arabidopsis to rice -- and vice versa.
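
As a starting point, here's a minimal sketch of querying the WikiPathways SPARQL endpoint for all pathways in one organism. Treat the predicate names as assumptions; check them against the endpoint's documentation in [2] before relying on this:

```javascript
// Build a SPARQL query for pathways annotated to a given organism.
// Predicates (wp:Pathway, wp:organismName) are assumed from the
// WikiPathways vocabulary; verify against the endpoint's docs.
function buildPathwayQuery(organismName) {
  return `
    PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    SELECT ?pathway ?title WHERE {
      ?pathway a wp:Pathway ;
               dc:title ?title ;
               wp:organismName "${organismName}" .
    }`;
}

// Performs the actual HTTP request in a browser or Node 18+; not invoked here.
async function fetchPathways(organismName) {
  const url = 'https://sparql.wikipathways.org/sparql' +
    '?query=' + encodeURIComponent(buildPathwayQuery(organismName)) +
    '&format=json';
  const response = await fetch(url);
  const { results } = await response.json();
  return results.bindings; // one binding per pathway/title pair
}
```

The same pattern works against the OrthoDB SPARQL endpoint [4], just with that service's vocabulary.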

[1] http://sparql.wikipathways.org

[2] https://www.wikipathways.org/index.php/Help:WikiPathways_Spa...

[3] https://eweitz.github.io/ideogram/related-genes?q=RAD51&org=...

[4] https://sparql.orthodb.org


I created Ideogram.js, a JavaScript library for chromosome visualization [1].

Ideogram supports genomic views to research and report findings on cancer, clinical variants, gene expression, evolution, agriculture, and more [2]. What previously existed for genome visualization was either focused on short genomic regions (e.g. genes) or complex to set up and maintain.
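
A minimal usage sketch, based on the project's README; the container selector and annotation coordinates below are illustrative, not real data:

```javascript
// Configuration is the entry point: pick an organism, a container
// element, and optionally annotations to mark on the chromosomes.
const config = {
  organism: 'human',                // any supported organism name
  container: '#ideogram-container', // hypothetical DOM selector
  annotations: [
    // Illustrative annotation; coordinates are made up for this sketch.
    { name: 'myGene', chr: '13', start: 32315000, stop: 32400000 }
  ]
};

// In a page that has loaded ideogram.js:
if (typeof Ideogram !== 'undefined') {
  new Ideogram(config); // renders chromosomes into the container
}
```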

[1] https://github.com/eweitz/ideogram

[2] https://eweitz.github.io/ideogram


Large genetic datasets yield medical progress by increasing statistical power of tests. These better tests enable earlier and more targeted treatment.

Take heart disease. It has a significant but complex genetic component. Many genetic variants each contribute a small amount to risk for heart disease. If a given person has many small risk variants, the sum total risk -- often called "polygenic score" -- can be relatively high.

People in the top 8% of polygenic scores had a 3x higher risk for heart disease than the general population [1][2]. Through techniques like polygenic scoring, large genetic datasets enable uniquely early detection of high risk for the world's leading cause of death.
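
In its simplest form, a polygenic score is just a weighted sum over variants: each weight is the variant's estimated effect size, and each dosage is how many risk alleles (0, 1, or 2) the person carries. A toy sketch, with entirely made-up weights:

```javascript
// Compute a polygenic score as the dot product of risk-allele
// dosages and per-variant effect weights.
function polygenicScore(dosages, weights) {
  return dosages.reduce((sum, dosage, i) => sum + dosage * weights[i], 0);
}

const weights = [0.12, 0.05, 0.30, 0.08]; // hypothetical per-variant effects
const person = [2, 1, 0, 2];              // risk-allele counts at each variant

// 2*0.12 + 1*0.05 + 0*0.30 + 2*0.08 = 0.45
const score = polygenicScore(person, weights);
```

Real scores aggregate thousands to millions of variants, with weights estimated from genome-wide association studies, but the arithmetic is the same.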

[1] https://www.nature.com/articles/s41588-018-0183-z

[2] https://www.vox.com/science-and-health/2018/8/24/17759772/ge...


Ah, yes, polygenic scoring and the increased heart disease rate. Did you happen to catch this[0] refutation of the single source's work?

[0] - https://twitter.com/cecilejanssens/status/103135930540723404...

Also, you're positing prediction and targeted treatment, but you haven't explained how Bob's mapped genome sitting in 23andMe will be used for medical treatment.

As we know, genes are not an emphatic "this will happen to you" but an increase in likelihood, which still doesn't translate to any emphatic treatments from the genes themselves, yeah?


> refutation of the single source's work

Janssens seems less skeptical of 23andMe's paper on polygenic score for type 2 diabetes [1][2], which -- interestingly -- positively cites the Khera 2018 paper on polygenic score for heart disease that she critiqued. Some researchers are skeptical, but the medical community generally seems to consider polygenic scores promising for tests [3][4].

> you haven't posited how Bob's mapped genome sitting in 23andme will be used for medical treatment.

Early intervention. Polygenic scores could be used for medical treatment by motivating earlier intervention. That could include stronger recommendations for better diet and exercise, closer monitoring programs, or more precise prescriptions. That, in turn, could reduce disease burden.

[1] https://twitter.com/cecilejanssens/status/113707970323438797...

[2] https://permalinks.23andme.com/pdf/23_19-Type2Diabetes_March...

[3] https://twitter.com/EricTopol/status/1129780543434964993

[4] https://journals.plos.org/plosmedicine/article?id=10.1371/jo...


>...could be used for medical treatment by motivating earlier intervention...

and

>...could include stronger recommendations for better diet and exercise, closer monitoring programs, or more precise prescriptions...

and

>...could reduce disease burden...

This is where the problem lies for me: we're being massively assumptive in moving from "could" to "is" and "will".

I will generally concede the "could" portion, but to assert that it is emphatically happening, or going to happen, is still far from fruition. To label this science as such, just yet, is overreaching and gives false hope where none should really be given, because then you'll taint its benefits with its drawbacks.

Remember: Anonymised data (e.g.: 23andme) only allows a survey of what's relatively known or can be inferred from the anonymised dataset.

To arrive at what you're suggesting, it would have to move into a different realm (I believe), like UK Biobank or GEDmatch. But even then, we're still basing things on speculative science: gambles of percentages that aren't emphatically true or false, but a muddied "maybe, kind of, sort of, in a way, definitely could or definitely could not."

That, to me, is a far stretch from saying that the data in 23andMe is actually helping medicine, which I believe is what the OC I replied to emphatically said.


UK Biobank has a smaller cohort but deeper and more reliable data.

All UK Biobank participants were tested for blood pressure, bone mineral density, grip strength, BMI, etc. The last 200,000 participants underwent detailed tests of cognitive function [1].

23andMe asks customers for health information, but self-reports are not usually as reliable as the clinician-administered tests done in UK Biobank.

[1] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6451771/


23andMe is CLIA certified and FDA authorized, and makes many medical claims. From [1]:

"CLIA certification and CAP accreditation 23andMe laboratory testing is done in U.S. laboratories certified to meet CLIA (Clinical Laboratory Improvement Amendments of 1988) standards, including qualifications for individuals performing testing and other standards to ensure the accuracy and reliability of results. The laboratory is also accredited by the College of American Pathologists (CAP), which has served as a model for various federal, state, and private laboratory accreditation programs throughout the world."

FDA authorizations for 23andMe's personal genome service are available online, e.g. [2] for Alzheimer's disease risk reporting based on the E4 variant of the APOE gene.

The company also offers ancestry reports, which are not clinical and thus not covered by CLIA. But medical claims in 23andMe's health reports do comply with CLIA and other regulations.

1. https://medical.23andme.com/dna-kits/#clia

2. https://www.accessdata.fda.gov/cdrh_docs/pdf16/DEN160026.pdf


Hmm, news to me. In any case the point still stands that you can do SNP sequencing without being CLIA certified.

