Typefully

Major human genetic discoveries made with small sample sizes


3 years ago


While I appreciate the sentiment, I am not a fan of such blind rules. Let me start with a 🧵 of some major human genetic discoveries (many translated to therapeutics) made with very few samples. twitter.com/ewanbirney/status/1628340273594961920?s=20
1. The association of complement factor H with AMD was discovered with a mere 96 cases and 50 controls. Today many drug pipelines are in development based on this discovery. science.org/doi/10.1126/science.1109557
2. The association of the BCL11A locus with fetal hemoglobin fraction in blood was discovered in 179 individuals. To date this is the most successful translation of a GWAS discovery into therapeutics. nature.com/articles/ng2108 twitter.com/doctorveera/status/1335323219151163393?s=20
3. The APOL1 association with kidney disease in African Americans was discovered by studying 205 cases and 180 controls. Today many drug programs exist for APOL1 kidney disease in African Americans. science.org/doi/10.1126/science.1193032 twitter.com/doctorveera/status/1530624682847657985?s=20
4. The association of a CCR5 loss-of-function variant with HIV was discovered by screening ~600 HIV-exposed seronegative individuals. twitter.com/doctorveera/status/1409189802168123392?s=20
While I support discouraging underpowered genetic association studies, imposing a hard minimum sample size on all genetic association studies isn't fair.
Genetic discoveries in non-European ancestries are only now picking up, and we have barely touched the low-hanging fruit of non-European studies (from developing and underdeveloped countries), much of which is likely to show up even in small sample sizes.
Also, there are still many rare(ish) or even common diseases that we haven't GWASed yet. So it's possible there is low-hanging fruit for such unexplored diseases/phenotypes that will show up even in small sample sizes. twitter.com/doctorveera/status/1531333656882475011?s=20
Apart from sample sizes, there are other important factors that strongly influence statistical power: effect size, proximity of the phenotype to DNA, phenotype measurement error.
When effect sizes are extremely large, genetic discoveries can be made in small sample sizes (which was the case in the examples highlighted above).
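To make the effect-size point concrete, here is a rough back-of-the-envelope power calculation in Python. It is only a sketch: it assumes a single-variant additive test on a quantitative trait, approximated as a 1-df chi-square test with non-centrality N·q²/(1−q²), where q² is the fraction of trait variance the variant explains. The function name `gwas_power` and the q² values in the loop are illustrative assumptions, not figures taken from the studies above.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0, sd 1


def gwas_power(n, q2, alpha=5e-8):
    """Approximate power of a 1-df association test.

    n     : sample size
    q2    : fraction of phenotypic variance explained by the variant (assumed)
    alpha : significance threshold (5e-8 is the usual genome-wide level)
    """
    if not 0.0 <= q2 < 1.0:
        raise ValueError("q2 must be in [0, 1)")
    ncp = n * q2 / (1.0 - q2)                   # non-centrality parameter
    z_crit = -_STD_NORMAL.inv_cdf(alpha / 2.0)  # two-sided critical value
    shift = sqrt(ncp)
    # A 1-df non-central chi-square is (Z + shift)^2, so the rejection
    # probability is P(|Z + shift| > z_crit).
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical variance-explained values, from "typical" to "very large".
    for q2 in (0.001, 0.01, 0.05, 0.15):
        print(f"N=200, q2={q2}: power={gwas_power(200, q2):.3g}")
```

Under these assumptions, a variant explaining a tenth of a percent of variance is essentially undetectable at N=200, while one explaining 10% or more has a realistic chance of reaching genome-wide significance.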
Molecular traits (e.g. proteins, metabolites) often lie proximal to DNA and have small measurement errors, and so can yield meaningful results even in very small sample sizes.
Some examples. 1. GWAS of cell-type composition in the brain. A neurodegeneration locus shows up beautifully in a GWAS of just 403 individuals. twitter.com/doctorveera/status/1352680473198268419?s=20
2. GWAS of neural progenitor cell proliferation. Just N=80, GWAS reveals a strong locus. twitter.com/doctorveera/status/1603596231070175236?s=20
3. GWAS of cellular morphological traits in N=207 reveals impressive and interpretable results. twitter.com/doctorveera/status/1613363164007260160?s=20
As we make progress in molecular phenotyping, we will see more such GWAS leading to important discoveries even in sample sizes <1000.
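The phenotype measurement error mentioned earlier can be sketched the same way. Assuming classical measurement error (observed phenotype = true phenotype + independent noise), the variance a variant explains in the observed trait shrinks by the reliability r = var(true)/var(observed), so the sample size needed for a given power grows roughly as 1/r. The function `required_n` and its inputs are hypothetical, and the formula is the usual normal approximation that ignores the negligible opposite tail.

```python
from math import ceil
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0, sd 1


def required_n(q2_true, reliability=1.0, power=0.8, alpha=5e-8):
    """Approximate N needed to detect a variant at the given power.

    q2_true     : variance explained in the error-free phenotype (assumed)
    reliability : var(true) / var(observed), in (0, 1]; 1.0 means no noise
    """
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    if not 0.0 < q2_true < 1.0:
        raise ValueError("q2_true must be in (0, 1)")
    q2_obs = q2_true * reliability               # attenuated effect
    z_alpha = -_STD_NORMAL.inv_cdf(alpha / 2.0)  # two-sided critical value
    z_power = _STD_NORMAL.inv_cdf(power)
    ncp_needed = (z_alpha + z_power) ** 2        # target non-centrality
    return ceil(ncp_needed * (1.0 - q2_obs) / q2_obs)


if __name__ == "__main__":
    # Hypothetical variant explaining 5% of the true trait variance.
    for r in (1.0, 0.8, 0.5, 0.25):
        print(f"reliability={r}: N needed ~ {required_n(0.05, r)}")
```

In this sketch, halving the reliability roughly doubles the required sample size, which is one way to see why a cleanly measured molecular trait can succeed at an N where a noisy clinical phenotype would not.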
Instead of putting hard thresholds on sample size, I'd recommend advocating for good study design, statistical power considerations, replication efforts and better phenotyping.

Veera Rajagopal

@doctorveera

🇮🇳 MBBS, MD, 🇩🇰 PhD | 🧬 Scientist @ 🇺🇸 Regeneron | Translating genetic insights into life-saving medicines | Weekly thoughts @ gwasstories.com