Exaggerating your number of controls or being precise? “Variant not found in over 10,000 chromosomes from EVS…”

Reading through the latest issue of AJHG, I saw a couple of papers mention that the putative rare variant they were investigating was “not present in over 10,000 control chromosomes from the EVS”.

My first reaction was, “What? Do they mean the NHLBI 5400 Exome Sequencing Project? They only have 5,400 exomes, not over 10,000! I wonder if there is some new control database I wasn’t aware of?” But being in journal-reading mode, I didn’t bother to look it up and moved on.

I was pouring orange juice for my two-year-old this morning, and pictures of perfect karyograms with paired diploid chromosomes came into my head for some reason. Aha! They are technically correct, and I suppose being very precise. They could also have said their variant wasn’t heterozygous or homozygous in any of the diploid chromosomes from the 5,400 exomes of the NHLBI 5400ESP project. Or simply, “It wasn’t present in over 5,400 exomes from the Exome Variant Server.” With the variant being on an autosomal chromosome, there were indeed two chromosomes per sample where the mutation could have occurred, which gives 2 × 5,400 = 10,800 chromosomes.
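To make the arithmetic concrete, here is a minimal Python sketch. The function names are made up for illustration, and it assumes an autosomal variant with every one of the 5,400 samples successfully called at that position, which real EVS data may not guarantee.

```python
def control_chromosomes(n_samples: int, copies_per_sample: int = 2) -> int:
    """Chromosome copies observed at one locus across n_samples.

    copies_per_sample defaults to 2 (autosomal locus, diploid samples).
    """
    return n_samples * copies_per_sample


def allele_frequency(allele_count: int, n_samples: int,
                     copies_per_sample: int = 2) -> float:
    """Observed allele frequency: variant copies seen / chromosomes examined."""
    return allele_count / control_chromosomes(n_samples, copies_per_sample)


n_exomes = 5400  # sample count quoted in the post

# Same control set, counted two ways.
print("samples:    ", n_exomes)
print("chromosomes:", control_chromosomes(n_exomes))

# A single heterozygous carrier contributes one variant copy,
# so the denominator for its frequency is chromosomes, not samples.
print("frequency with one het carrier:", allele_frequency(1, n_exomes))
```

Seen this way, the chromosome count is simply the denominator you would use for an allele frequency, which may be why papers report it.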

I don’t know. It seems a bit weird to claim your number of controls in terms of chromosomes rather than samples.

If you’ve seen this precedent before, I’d love to hear it.

Gabe Rudy

About Gabe Rudy

Meet Gabe Rudy, GHI’s Vice President of Product Development and team member since 2002. Gabe thrives in the dynamic and fast-changing field of bioinformatics and genetic analysis. Leading a killer team of Computer Scientists and Statisticians in building powerful products and providing world-class support, Gabe puts his passion into enabling Golden Helix’s customers to accelerate their research. When not reading or blogging, Gabe enjoys the outdoor Montana lifestyle. But most importantly, Gabe truly loves spending time with his sons, daughter, and wife. Follow Gabe on Twitter @gabeinformatics.

Responses

  1. Bryce Christensen says:

    I’ve often seen similar language, particularly in the context of imputation where the size of the reference dataset is quantified by the number of reference haplotypes, rather than the number of individual subjects used.

  2. R Segurado says:

    Given that chromosomes are the unit of observation in the usual case-control test for genetic association, maybe it’s not that peculiar a way of phrasing it. But it is cheeky.

  3. Jeffrey Rosenfeld says:

    I think this is just another part of the general push to exaggerate the number of samples that were really used in a GWAS-type study. You initially start with 1000 cases and 2000 controls, but because of various problems (genotyping error rate, duplication of samples, cryptic relatedness…) you reduce your numbers down to 800 cases and 1900 controls. Do you report that the study used 3000 samples, or 2700 samples? Either number can technically be termed correct, since you did initially investigate 3000 samples, but I would think that the 2700 number is more honest.

  4. Dan Gaston says:

    From reading lots of clinical genetics papers recently (mainly to determine what the proper number of statistical controls is), I’ve found it is quite common there to report the number of control chromosomes, as it is the true sample number for determining the frequency of an observed variant.

