Highly variable penetrance of abnormal phenotypes in embryonic lethal knockout mice

Background: Identifying genes that are essential for mouse embryonic development and survival through term is a powerful and unbiased way to discover possible genetic determinants of human developmental disorders. Characterising the changes in mouse embryos that result from ablation of lethal genes is a necessary first step towards uncovering their role in normal embryonic development and establishing any correlates amongst human congenital abnormalities. Methods: Here we present results gathered to date in the Deciphering the Mechanisms of Developmental Disorders (DMDD) programme, cataloguing the morphological defects identified from comprehensive imaging of 220 homozygous mutant and 114 wild type embryos from 42 lethal and subviable lines, analysed at E14.5. Results: Virtually all mutant embryos show multiple abnormal phenotypes and amongst the 42 lines these affect most organ systems. Within each mutant line, the phenotypes of individual embryos form distinct but overlapping sets. Subcutaneous edema, malformations of the heart or great vessels, abnormalities in forebrain morphology and the musculature of the eyes are all prevalent phenotypes, as is loss or abnormal size of the hypoglossal nerve. Conclusions: Overall, the most striking finding is that no matter how profound the malformation, each phenotype shows highly variable penetrance within a mutant line. These findings have challenging implications for efforts to identify human disease correlates.


Introduction
Animal models have long been used as experimental surrogates for investigating the role of individual genes in human development and disease. The remarkable degree of conservation in gene sequence and role that we now know exists across species confirms the validity of this approach and genetic manipulation in the mouse provides a commonly used way to explore gene function. The most ambitious example of this is the attempt coordinated by the International Mouse Phenotyping Consortium (IMPC) to generate a catalogue of gene function, using a systematic approach to phenotyping of individual gene knockouts (KO) that cover the entire mouse genome.
In generating KO lines from about one quarter of the total mouse genome so far, these studies have revealed that around one third of all mammalian genes are essential for life [1][2][3], their removal resulting in embryonic or perinatal lethality. The study of such mutant lines provides a unique opportunity to gain a comprehensive overview of the genetic components regulating normal embryo development and, by inference, the identity of genes whose mutation may cause congenital abnormalities or developmental disease.
Deciphering the Mechanisms of Developmental Disorders (DMDD) is a five year, UK-based programme funded by the Wellcome Trust with the goal of studying 240 embryonic lethal KO lines 3 . By applying systematic phenotyping methods for homozygous mutant embryos with parallel efforts to identify placental abnormalities and changes in early embryo transcriptome profiles, DMDD offers a foundation for identifying novel genes important for developmental or clinical studies. Here we summarise results to date from detailed examination of homozygous mutant embryos at E14.5 for structural abnormalities.

Materials and methods
Embryos
All embryos were produced by the Wellcome Trust Sanger Institute (https://www.sanger.ac.uk/mouseportal/) as part of the DMDD project 3 . Gene knockout lines produced as part of a systematic programme coordinated by the International Mouse Phenotyping Consortium (http://www.mousephenotype.org) were designated lethal if no homozygous mutants were present amongst a minimum of 28 pups at P14 and sub-viable if their proportion fell below 13% of total offspring 2 . All embryos were obtained from heterozygous intercrosses, independently of the P14 viability call. Embryos were harvested from one or more litters at E14.5, fixed in Bouin's fixative for 24 hours and stored at 4°C in phosphate buffered saline.
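The viability call described above can be sketched as a small function. This is an illustration only: the thresholds (a minimum of 28 pups at P14, a 13% cut-off) come from the text, but the function name `viability_call`, the "viable" label and the decision to return no call when fewer than 28 pups were scored are assumptions, not the consortium's actual implementation.

```python
from typing import Optional

MIN_PUPS = 28               # minimum number of pups scored at P14 (from the text)
SUBVIABLE_FRACTION = 0.13   # homozygote proportion below which a line is sub-viable

def viability_call(n_homozygous: int, n_total: int) -> Optional[str]:
    """Classify a knockout line from P14 genotype counts.

    Returns 'lethal', 'subviable' or 'viable', or None when fewer than
    MIN_PUPS pups were scored (assumption: no call is made in that case).
    """
    if n_total < MIN_PUPS:
        return None
    if n_homozygous == 0:
        return "lethal"
    if n_homozygous / n_total < SUBVIABLE_FRACTION:
        return "subviable"
    return "viable"
```

For example, a line with 3 homozygotes among 40 pups (7.5%) would be called sub-viable under these assumptions.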

Generation of digital volume data
Embryos were initially scored for gross abnormalities under a dissection microscope before preparation for 3D imaging. Briefly, embryos were dehydrated in methanol (10% steps until 90%, followed by 95% and 100%; at least 2 hours each) and embedded in methacrylate resin (JB-4, PolySciences) containing eosin B and acridine orange, as previously described 4-6 . Within each resin block, the embryo was oriented to ensure transverse sectioning along its longitudinal axis. Resin blocks were allowed to polymerise overnight at room temperature, baked at 90°C for 24-48 hours and then subjected to digital volume data generation using high-resolution episcopic microscopy (HREM) 7 . HREM data was downsized as appropriate to provide an isotropic voxel size of 2.5-3 µm, depending on original section thickness.

Data processing and annotation
12 bit raw greyscale image data was adjusted to optimise tissue visualisation using Photoshop 6 (Adobe). Data visualisation and analysis was performed using software packages Amira 5 (ThermoFisher Scientific) and Osirix, versions 6-8 (Pixmeo). Phenotypes were identified by establishing the precise developmental sub-stage of each embryo and comparing it with stage-matched controls 8 . Phenotyping was performed according to a standardised and sequential procedure using actual and virtual 2D section stacks, essentially as recently described 9 . Data from each embryo was independently reviewed by a second anatomist, and any discrepancies resolved by joint agreement. Each phenotype call was assigned to a 3D point within the embryo image data stack. Abnormalities were classified with the Mammalian Phenotype (MP) ontology 10 , using the most specific MP term that described each defect. 3D volume rendered models were employed for developmental staging from external morphology 8 .

Data analysis
In order to facilitate summarising of detailed phenotype annotation data, two subsets of the MP terms closer to the root of the ontology were chosen to provide structured "high" and "intermediate" level overviews of DMDD phenotype data. These MP ontology slims are shown in Table 5 and Table 6 (Supplementary Table 2 and Supplementary Table 3 for download). The MP terms assigned during annotation of the embryos were summarised into the categories defined by the DMDD slims using the Map2Slim algorithm (https://metacpan.org/pod/distribution/go-perl/scripts/map2slim). All the terms of the DMDD slims that map to terms used to annotate mutant and wild type embryo phenotypes are listed in Supplementary Table 1A and Supplementary Table 1B, respectively. MP annotation terms used to describe the phenotypes of each embryo of a line were normalised to remove duplicate terms, and the terms for each embryo were mapped onto the ontology slims. For each line, a set of the unique slim terms observed for the line was generated and lists were produced of all the embryos from the line falling into each of these high or intermediate level categories. This enabled calculation of a penetrance score for each of the broad slim terms, calculated as a ratio of the number of embryos listed for the slim category to the number of homozygous mutant embryos analysed for the line.
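The penetrance score defined above can be illustrated with a short sketch for a single line. It is not the code used in the study: the function name `penetrance_by_slim` and the `term_to_slims` dictionary (a hypothetical stand-in for the output of Map2Slim) are assumptions made for illustration.

```python
from collections import defaultdict

def penetrance_by_slim(annotations, term_to_slims, n_mutants):
    """Penetrance of each slim term within one mutant line.

    annotations   -- iterable of (embryo_id, mp_term) pairs for the line
    term_to_slims -- dict: MP term -> set of slim terms it maps to
                     (hypothetical stand-in for Map2Slim output)
    n_mutants     -- homozygous mutant embryos analysed for the line
    """
    embryos_per_slim = defaultdict(set)
    # set() removes duplicate terms recorded for the same embryo
    for embryo_id, mp_term in set(annotations):
        for slim in term_to_slims.get(mp_term, ()):
            embryos_per_slim[slim].add(embryo_id)
    # penetrance = embryos listed for the slim / mutant embryos analysed
    return {slim: len(embryos) / n_mutants
            for slim, embryos in embryos_per_slim.items()}
```

Because embryos are collected into sets, an embryo carrying several annotations that map to the same slim term is counted only once for that term.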
To obtain a global view of the phenotypes detected, the frequency of lines showing each of the broad category slim terms was counted across all the lines analysed. In addition, the incidence of embryos scored for every phenotype category described by the slim terms, and the total number of embryos analysed in lines exhibiting each individual phenotype category, were counted.
The total number of lines for each slim term that had a penetrance score between 0-0.24, 0.25-0.49, 0.50-0.74 and 0.75-1.00 was recorded. We calculated the cumulative penetrance score for each slim term as the overall sum of the penetrance scores of every line showing this broad category phenotype. In addition, for each of the penetrance intervals listed above, the sum of the penetrance scores was calculated for the lines falling into these categories.
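The binning and cumulative penetrance calculations can be sketched as follows. This is a minimal illustration under stated assumptions: the names `penetrance_bin` and `summarise` are hypothetical, and the four intervals are treated as half-open ranges (a score of exactly 0.25 falls into "0.25-0.49"), which the text does not specify.

```python
from collections import Counter, defaultdict

# Upper bounds (exclusive) of the first three intervals; boundary handling
# is an assumption, the text only lists the four intervals.
BINS = [(0.25, "0-0.24"), (0.50, "0.25-0.49"), (0.75, "0.50-0.74")]

def penetrance_bin(score):
    """Return the label of the penetrance interval containing score."""
    for upper, label in BINS:
        if score < upper:
            return label
    return "0.75-1.00"

def summarise(line_scores):
    """line_scores: dict line -> {slim term: penetrance score}.

    Returns, per slim term: the number of lines in each interval, the sum
    of penetrance scores in each interval, and the cumulative penetrance
    (sum of scores over every line showing the slim term).
    """
    lines_per_bin = defaultdict(Counter)
    sum_per_bin = defaultdict(lambda: defaultdict(float))
    cumulative = defaultdict(float)
    for scores in line_scores.values():
        for slim, p in scores.items():
            label = penetrance_bin(p)
            lines_per_bin[slim][label] += 1
            sum_per_bin[slim][label] += p
            cumulative[slim] += p
    return lines_per_bin, sum_per_bin, cumulative
```

A slim term scored in many lines at low penetrance can therefore have a smaller cumulative penetrance than one scored in fewer lines at high penetrance.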
All plots showing analysis of the data were produced using the R software package, version 3.2.1 (2015-06-18) (The R Foundation for Statistical Computing).

Use of animals
The care and use of all mice in this study were in accordance with UK Home Office regulations, UK Animals (Scientific Procedures) Act of 1986 (PPL 80/2485) and were approved by the Wellcome Trust Sanger Institute's Animal Welfare and Ethical Review Body.

Size of the study
The data for this study comprises 220 homozygous mutant and 114 wild type E14.5 embryos analysed by the DMDD programme. All data is presented in Supplementary Table 4 and Supplementary Table 5 and is also available on the DMDD web site (https://dmdd.org.uk). Embryos were obtained from 42 novel gene knockout lines, 31 classified as lethal and 11 as sub-viable (Table 1; see also Materials and methods). This corresponds to an average of approximately 5 homozygous mutant embryos for each mutant line, although in practice numbers ranged widely from 1 to 11 as a result of variable breeding efficiency and cost limitations inherent in a large scale screening programme (Supplementary Figure 1). In total, 1,128,247 transverse section images obtained from the 334 embryos formed the basis for examining embryo structure and, with the addition of digital resection of datasets in coronal and sagittal planes, scoring of phenotypes was based on examination of 2,536,659 images.

Incidence of structural abnormalities in homozygous mutant embryos.
Almost all mutant embryos studied (209/220) showed structural abnormalities that could be identified by a phenotyping procedure previously refined from pilot studies 9 . The remaining 11 apparently normal embryos were obtained from 9 different lines, each of which yielded several other homozygous mutants bearing detectable morphological abnormalities. We have previously reported that the resolution afforded by 3D datasets obtained by HREM imaging allowed the detection of phenotypic abnormalities ranging in size from individual nerves and blood vessels to gross organ and tissue malformations 9 . In the present study, a total of 398 different MP terms were employed to record a total of 2,939 detected mutant embryo phenotypes (Table 2A and Supplementary Table 1A and Supplementary Table 4). Multiple abnormalities were scored in virtually all homozygous mutant embryos. Most showed up to 10, but in some embryos as many as 50 phenotypes were recorded (Figure 1A). Whilst a few phenotypes (for example those affecting different parts of vertebrae or different regions of the vertebral column) were often scored repeatedly within affected embryos, their incidence was insufficient to have a significant impact on the overall distribution of phenotype numbers scored per embryo across the whole study. When analysed by individual mutant line, the incidence of detectable abnormalities is more broadly distributed, with more than half of the 42 lines showing between 10 and 49 different phenotypes (Figure 1B).

Incidence of structural abnormalities in wild type embryos
To establish the possible impact of "background" abnormalities present within embryos irrespective of mutation, we also analysed a total of 114 wild type embryos, obtained from 41 of the 42 mutant lines (Table 1). Previous large-scale studies of wild type E14.5 embryos from the same genetic background have enabled us to distinguish normal variation in structure from definite abnormalities, using careful stage-specific comparisons combined with statistical and morphometric analysis 8 . This formed the basis for identifying phenotypes in the wild type embryos (Table 2B and  Supplementary Table 1B and Supplementary Table 5).
In total, 56 phenotype calls were made, affecting 32 of the wild type embryos and 28 of the 41 lines. Only 6 embryos account for 21 of the 56 phenotype calls (38%), indicating the skewing effect of a small number of abnormal embryos. Most affected embryos showed only a single phenotype. This is in marked contrast to the finding of many different phenotypes in individual mutant embryos.
The phenotypes of wild types vary in character, ranging from apparently minor differences (e.g. in blood vessel morphology) to a few major abnormalities (e.g. absent kidney). Each one is rare amongst the population of wild type embryos analysed and affects only a single wild type embryo within the line. Only 10 phenotypes (15 phenotype calls) overlap between mutant embryos and their wild type siblings and these affect only 10 of the 41 lines for which wild type embryos have been assessed (Table 3).

Prevalence of individual abnormalities in mutant embryos
Supplementary Table 1A presents the frequency of individual abnormalities that were identified amongst the mutant embryos.
Since some phenotypes (such as vertebral abnormalities) are often present multiply in affected embryos, the data is normalised for occurrence by embryo. Interestingly, the most common phenotype detected in this study was subcutaneous edema. This was evident from macroscopic observation of embryos at harvest and confirmed by subsequent HREM imaging (Figure 3, panels A-C). In total, subcutaneous edema and edema in other body regions (scored with four distinct MP terms) affected one third (72/220) of the embryos and was observed in a little over half (24/42) of the mutant lines.
Other prevalent phenotypes included defects affecting the vertebral arches, the ventricular septum of the heart, forebrain morphology and musculature of the developing eyes (Table 2A and Figure 3). Of particular note is the frequency with which mutant embryos showed abnormalities affecting the architecture or presence of the hypoglossal nerve (Figure 4, panels A and B). Complete absence of the nerve occurred in 37 embryos, obtained from 12 different mutant lines, with some embryos from a similar number of lines showing abnormal topology or unusual thinness of the nerve (13 and 9 lines respectively). Overall, scored phenotypes affected all the major organ systems at E14.5 (Figure 5A) and multiple organs or tissues were frequently affected within individual embryos, or collectively within a mutant line (Table 4, organised according to the MP ontology slims adopted by the DMDD, with data ranked according to prevalence in mutant lines).

Individual phenotypes show highly variable penetrance
Perhaps the most striking finding of the DMDD study is the almost complete absence of any fully penetrant abnormalities. Amongst lines for which more than a single embryo was analysed, only three phenotypes showed 100% penetrance: abnormal perichondrial ossification (1 line; 10 mutant embryos), small nodose ganglion (1 line; 4 embryos) and small trigeminal ganglion (1 line, 3 embryos). Furthermore, most defects showed surprisingly low penetrance. A penetrance greater than 75% within the line was only found for 7% of detected phenotypes. In contrast, over half (55%) of the scored abnormalities had a penetrance of 25% or less (Table 4). This is graphically illustrated in Figure 5A, in which the scored phenotypes are clustered according to high level MP ontology terms (broadly reflecting distinct organ systems, tissues or body regions) and the prevalence of each in the 42 mutant lines categorised by penetrance. All phenotypes show a broad range of penetrance, about half showing roughly symmetrical distribution of penetrance, with similar numbers of lines both above and below 50%. Interestingly, it is possible also to distinguish several phenotypes where penetrance is noticeably skewed. Abnormalities affecting the cardiovascular system, nervous system and skeleton all affected a relatively large number of lines and each showed a striking bias towards higher penetrance values. A second group of abnormalities encompassing liver/biliary, respiratory, renal and hearing systems showed a converse bias to penetrance values below 50% ( Figure 5A).
When grouped into such high level MP ontology terms, the most common group of abnormalities are those affecting the cardiovascular system, examples of which affect embryos in every single mutant line studied. Almost as prevalent are nervous system phenotypes, which are detected in 80% of the lines studied. Re-plotting the data summarised by intermediate level MP term slim provides a more detailed view of the prevalence and variability in penetrance of phenotypes ( Figure 5B). At this level of resolution, for example, cardiovascular defects are subdivided into two broad categories; those encompassing abnormalities in blood vessel morphology or topology ("abnormal blood vessel morphology" and most phenotypes within "abnormal cardiovascular development") and those affecting the heart and its great vessels ("abnormal heart morphology"). Viewed in this way, it is clear that detection of cardiovascular defects in all lines examined results from the presence of phenotypes in the vasculature. These range from relatively major defects such as absence of the ductus venosus, interrupted aortic arch or arterial stenosis, to more minor alterations in vascular topology in different regions of the embryo. Cardiac abnormalities nevertheless remain prevalent, affecting almost two thirds (27/42) of the  mutant lines. These encompass malformations in all regions of the four-chambered heart and its great vessels, including both atrial and ventricular septal defects, atrioventricular septal defects, common arterial trunk, double outlet right ventricle, transposition of the great arteries, bicuspid aortic valve, common truncal valve and abnormally thin myocardium. After blood vessel and cardiac abnormalities, the third most prevalent group of phenotypes detected were those affecting brain morphology ( Figure 5B), most commonly the forebrain ( Figure 6 and Supplementary Table 1A).
In order to assess the relative significance of each phenotype in the context of variable penetrance, we re-examined their ranking distribution after weighting each phenotype according to its individual penetrance. This provides a plot of cumulative line penetrance for each of the 70 intermediate level MP slim terms (Figure 7). Whilst abnormalities in blood vessel morphology and structure of the heart remain amongst the most prevalent phenotypes, weighting by penetrance has a significant impact on the ranking of other phenotypes. Notably, the relative ranking of "abnormal brain morphology" and "abnormal somatic nervous system morphology" is increased, with both now lying in the five most prevalent abnormalities scored. This change is largely driven by the relatively high prevalence associated with abnormalities in forebrain morphology and hypoglossal nerve structure or presence, respectively.

Phenotype penetrance is affected by allele type
Of the 42 mutant lines studied, 22 contained the tm1a insertion allele, compared with 20 containing exon deletions (19 tm1b and 1 CRISPR). With either group, blood vessel, heart and brain morphology remain amongst the most commonly observed abnormalities. There is however a clear difference in phenotype penetrance between the two groups: phenotypes are significantly less penetrant with tm1a alleles (compare Figure 5B with Figure 8A and B).

Phenotyping embryos required new MP terms
Adoption of a formal, standardised ontology for scoring abnormalities provides an essential framework for analysing the data and facilitating structured search enquiries. However, during the course of the DMDD programme and its pilot study 9 , it became clear that additional terms were required in order to adequately describe abnormalities in embryo, as opposed to adult, structures. A further outcome of the DMDD study has therefore been the creation of 142 new MP terms to accommodate the range of abnormalities we have observed (Figure 9).

Discussion
Since approximately one third of gene knockouts in the mouse prove to be embryonic or perinatal lethal 1-3 , further study of such lines offers a unique opportunity to better understand the genetic regulation of embryo development and identify genetic determinants of congenital abnormalities. The data accumulated during three years of the DMDD programme provide the first opportunity to study in detail the identity, range and prevalence of morphological abnormalities in such mutants and offer a window on the opportunities (and pitfalls) such systematic studies present.
The current analysis is restricted to a single developmental stage (E14.5) when most organ systems of the embryo have developed their definitive fetal appearance and the body plan is broadly similar to that of the adult mouse. Whilst this provides obvious practical advantages for a systematic, high throughput phenotyping programme, it is of course an arbitrary choice with respect to the time course of individual gene function and the consequences of gene ablation. Indeed, about 60% of the lethal lines entering  Our finding that some manifestation of edema (generally subcutaneous) is the most common phenotype could indicate an unappreciated complexity in the genetic controls regulating fluid balance or tissue integrity of vascular or lymphatic components. Edema may also represent a common outcome for a wide range of pathophysiological perturbations, as has been proposed for the association of non-immune hydrops fetalis with human fetal loss 11,12 . The prevalence of cardiovascular defects is also consistent with the well established finding that cardiac abnormalities are the most common congenital defect in human newborns 13 . Some caution is necessary in considering the mouse data, since as we have shown, a significant proportion of cardiovascular phenotypes comprise apparently minor alterations in blood vessel topology, the impact of which on normal development remains unclear. However, in addition to these, the lines we have studied show a range of severe abnormalities in cardiac structure that are both relatively prevalent and mirror the range of congenital abnormalities seen in humans. Despite the largely random selection of genes studied in screens such as DMDD, their identification as embryonic lethal therefore provides a dramatic enrichment for potential cardiac developmental disease alleles.
Phenotypes affecting neural tissue also prove to be relatively prevalent in mutant embryos. We are limited in the present analysis to identifying a subset of neural deficits readily identified from HREM imaging. This restricts identifiable phenotypes to relatively gross alterations in brain and neural tube morphology, or changes affecting major nerves. Amongst the latter, the frequency with which abnormalities affecting the hypoglossal nerve have been detected is perhaps not so surprising, since these (like abnormalities detected in the motoric portion of the trigeminal nerve) may compromise suckling and lead to perinatal lethality.
The multiplicity of phenotypes frequently detected in individual mutant embryos is not unexpected, given the nature of a single time point screening procedure, combined with the likely pleiotropic effects of individual gene loss. However, the most striking and surprising finding to emerge from the DMDD phenotype data is that virtually all phenotypes are incompletely (and frequently poorly) penetrant, despite the use of the isogenic C57BL/6N mouse strain. Combined with the observation of overlapping but distinct spectra of phenotypes between individual embryos from a single line, these findings are challenging to understand, and at a minimum point towards unknown stochastic components affecting the etiology of each phenotype or the compensatory responses they elicit 2 . They also demonstrate that efforts to identify linkage between mouse embryo phenotypes and human developmental disease are likely to require sophisticated bioinformatic analysis beyond the obvious issues raised by species differences in anatomy and physiology.
The observation of a small number of phenotypes amongst the wild type litter mates of the homozygous mutants raises the important question: why are phenotypes detected in genetically wild type embryos? We think there are several possible explanations. One possibility is that the C57BL/6N mouse strain used for engineering knockout lines carries a previously unappreciated "background load" of abnormalities. Ours is the first systematic study on a sufficiently large scale and employing sufficiently high-resolution imaging to detect such abnormalities. None of the phenotypes we have identified shows a high penetrance across both mutants and wild types of a mutant line, and they do not therefore suggest themselves as strain-specific abnormalities. Another possible explanation is that abnormalities arise as a consequence of de novo mutation. Lastly, at least with the less profound abnormalities, it is possible that some phenotypes may prove to be outliers on the spectrum of normal morphological variation and should not be considered genuine abnormalities. This highlights an important issue confronting phenotyping studies: the dearth of large-scale and systematic studies examining normal embryo morphology that can set a reliable benchmark for distinguishing abnormalities from normal variation. In this light, phenotype data may need revision as cumulative experience with the C57BL/6N and other mouse strains improves our ability to distinguish abnormalities from normal variation amongst wild types.
Our study has identified a small number of apparent abnormalities common to both homozygous mutant embryos and wild type controls from the C57BL/6N mouse strain, which have therefore been excluded from the phenotyping procedure. These include splitting of the tail tip, persistence of the craniopharyngeal duct with associated fenestration of head bones and the presence of vesicles in the lens of the eye (Figure 4, panels C-E). Apart from these, our data offers no clear evidence for other "background" phenotypes associated with either the C57BL/6N genetic background or with individual mutant lines. Overall, we consider that neither the frequency, prevalence nor nature of the phenotypes identified in wild type embryos impacts significantly on the assignation of phenotypes amongst the homozygous mutant embryos.
Two other factors in our study might affect interpretation of the mutant phenotype data. 11 of the 42 lines examined in our study were judged subviable at weaning, rather than lethal. This number is too small to support meaningful comparison of the phenotypic spectrum between subviables and lethals. It is tempting to speculate that a difference in phenotype penetrance might underlie the difference in viability between the two groups, but there is no evidence to support this from the DMDD study so far (see Supplementary Figure 4). Even if a difference in penetrance was detected between lethal and subviable lines, interpreting its significance is far from simple as it raises an important and unresolved question: which phenotypes are responsible for embryo death? Many profound abnormalities that we detect may be compatible with life; equally, lethality may result from subtle structural changes. Without knowing which of the scored phenotypes are likely to cause lethality, it will be difficult, if not impossible, to establish if differences in their penetrance distinguish subviable from lethal lines. Add to this the additional difficulty that dams have a propensity to eat newborns that are not thriving well and there is a further complication in interpreting the data. The lines we have studied fall roughly equally between those containing an insertion into the targeted gene (tm1a alleles) and those in which recombination has removed both a gene exon and the neomycin selection cassette (tm1b alleles). Interestingly, our data clearly reveals that tm1b alleles show greater penetrance of phenotypes than those containing the tm1a insertion. This may reflect the potential of tm1a alleles to be hypomorphic, and might also be influenced by their retention of the neo selection cassette.
It is also worth noting the several practical lessons which have become evident through the course of DMDD studies and which may be of value for similar embryo phenotyping programmes.
The most pressing of these is basing phenotype detection on comparison of each mutant embryo with an appropriately staged normal counterpart 14 . Embryos harvested at E14.5 vary markedly in their developmental progress and many tissues and organs are actively remodelled during this period. This is most obvious for the topology of the intestine, the position of the palatal shelves and the interventricular communication between left and right sides of the heart. Only with precise developmental staging is accurate phenotyping of these features possible 8 .
Whilst the precise range and detail of phenotypes that can be scored will necessarily be dictated by the nature of the imaging modality and the method of phenotype identification (compare, for example 15,16, with the manual annotation used in the present study), a common challenge is the development of protocols to minimise occurrence or subsequent scoring of apparent abnormalities that are more likely artefacts of sample preparation or processing. These can range from the more obvious ruptures of the embryo skin or damaged external features during dissection, to tissue shrinkage or swelling (causing organ deformation) as a result of dehydration, fixation or embedding. Finally, the power of phenotypic screens such as DMDD to inform our understanding of developmental disease rests heavily on the detail with which abnormalities are scored. However, the very complexity we have seen this generates makes it all the more urgent to distinguish phenotypes not just through the nature of the morphological abnormality, but through its capacity, individually or in concert with others, to compromise subsequent fetal survival.

Data availability
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Acknowledgements
We are grateful for the contributions made by past and present members of the DMDD consortium and the support of their institutions, without which the DMDD programme would not be possible.

Supplementary Figure 1: Homozygous mutant embryo numbers analysed for each mutant line.
The number of annotated embryos scored for each of the 42 lines was used to plot the variation in numbers of embryos analysed per line.

Supplementary Figure 2: Distribution of homozygous mutant embryo phenotypes amongst DMDD mutant lines (high level MP ontology slim).
Data from the global analysis of the frequency of phenotype terms (see Materials and Methods) is plotted to show the penetrance of phenotypes scored for each line, indicated by a colour gradient from light yellow (no penetrance) to dark red (100% penetrance). Each line is labelled after the symbol of the gene disrupted (see also Table 1), and the number of homozygous mutant embryos analysed for each line is shown. Phenotype annotations are summarised using the high level DMDD ontology slim.
A list of the Mammalian Phenotype Ontology IDs and names of terms selected as the high level ontology slim.

Supplementary Tables 4 and 5: All embryo phenotypes from lethal and sub-viable lines scored by DMDD to date.
The tables list the annotation data that is the basis of the study. For every annotation the gene symbol, MGI_ID, allele symbol, DMDD_ID, MP term, ID and name is listed. In some cases the same MP term is listed more than once for a specific embryo (DMDD_ID), indicating the phenotypic abnormality was observed more than once in that embryo.
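The authors made the unusual choice of not presenting baseline data on the morphology of wild-type mutants (littermates) produced in the study. Such data, surveying significant groups of control embryos, would be essential to establish the link between mutations and described phenotypes. In the absence of this data, any reference to a causal link between phenotypes and mutation should be removed from the article.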
Both targeted traps (tm1a) and null (tm1b and CRISPR induced deletions) alleles are employed in the study. Both the presence of a selection cassette and the unpredictability of efficiency of trapping cassette(s) could form the basis of at least some of the variability shown in this study. An evaluation of variability (particularly using slim terms) within each of these 2 groups of alleles would help to address this point.
Subviable lines show by definition a partially penetrant phenotype and contribute a quarter of the mutants studied. An evaluation of variability (particularly using slim terms) within lethal and subviable lines as separate allele groups would discriminate whether variability of morphology occurs particularly among subviable lines.
Minor points:
Methods should detail information that permits the appraisal of materials used in the study, detailing the genetic background of stem cells and animals employed for germline transmission and further breeding, including whether homozygotes were used to produce embryos to analyse subviable lines.
Methods should outline the steps taken to limit manual annotation variability (i.e. secondary calling or benchmarking between annotators).
All titles and text should precisely detail when lethal or both lethal and subviable mutations are presented.
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
Competing Interests: No competing interests were disclosed.

Author Response 06 Mar 2017
Tim Mohun, The Francis Crick Institute, UK
The authors made the unusual choice of not presenting baseline data on the morphology of wild-type mutants (littermates) produced in the study. Such data, surveying significant groups of control embryos, would be essential to establish the link between mutations and described phenotypes. In the absence of this data, any reference to a causal link between phenotypes and mutation should be removed from the article.
We have included the wild type phenotype data in the revised version of the manuscript (see the detailed response to Rosenthal/Murray for more details).
Both targeted traps (tm1a) and null (tm1b and CRISPR induced deletions) alleles are employed in the study. Both the presence of a selection cassette and the unpredictability of efficiency of trapping cassette(s) could form the basis of at least some of the variability shown in this study. An evaluation of variability (particularly using slim terms) within each of these 2 groups of alleles would help to address this point.
The revised manuscript now includes separate analysis of phenotypes for the 22 tm1a alleles compared with 20 complete nulls (19 tm1b and 1 CRISPR). With either allele, blood vessel, heart and brain morphology remain amongst the most commonly observed abnormalities. However, with such relatively small numbers, we feel there is little more that can usefully be concluded from comparison of individual phenotype prevalence, since this will be heavily influenced by the distinct gene identities within each allele group. In contrast, there is a clear difference in phenotype penetrance between the two groups: phenotypes are clearly more penetrant from tm1b alleles (see new Figure 8A and 8B). We presume that this reflects the fact that whilst mutations based on tm1a alleles have the potential to be hypomorphic, those converted from tm1a to tm1b contain an exon deletion (and no longer carry the neo selection cassette).
Subviable lines show by definition a partially penetrant phenotype and contribute a quarter of the mutants studied. An evaluation of variability (particularly using slim terms) within lethal and subviable lines as separate allele groups would discriminate whether variability of morphology occurs particularly among subviable lines.
We presume that the reviewer is wondering whether the difference between lethal and subviable lines is a result of differing degrees of penetrance of phenotypes that result in embryo death. Answering this point is not as simple as it might appear, as it touches on a much more profound issue raised by studies such as ours. Whilst we are able to distinguish a remarkable number of different structural abnormalities by virtue of the resolution HREM imaging affords, it may not be at all clear which of these result in embryo lethality. Many profound abnormalities may be compatible with life, and lethality may also result from structurally subtle changes. Without knowing which of the scored phenotypes are likely to cause lethality, it will be difficult if not impossible to establish whether differences in their penetrance distinguish subviable from lethal lines. Add to this the difficulty that dams have a propensity to eat newborns that are not thriving, and there is a further complication in interpreting the data.
We have nevertheless reexamined the phenotype data in order to compare the results separately for lethal and subviable lines (new Supplementary Figure 4). From this it is clear that there is insufficient data from subviable lines to draw unequivocal conclusions. Overall, the approximate prevalence of particular phenotype terms (using the intermediate slim) appears broadly similar to that of lethals, but for most of these, the numbers of affected lines are too few to make useful estimates of penetrance.

Minor points:
1. Full details of genetic background and mutant allele are now provided for each line (revised Table 1).
2. All phenotyping was performed according to a standardised and sequential procedure, as mentioned in Material and methods. The data from each embryo was independently reviewed by a second anatomist and any discrepancies resolved by joint agreement.
3. We have amended titles and text to ensure that the distinction between lethal and subviable lines is clear where necessary.

This manuscript describes the findings of the DMDD consortium, analyzing 42 lethal and subviable genes at E14.5 using high-resolution 3D imaging (HREM) coupled with detailed annotation of the specific phenotypes revealed. The level of granularity in the scoring of the phenotypes is a major strength of the paper, and reflects the deep and unique expertise of the team. This has facilitated the discovery of widespread variable penetrance in mutant embryos at a level of detail not previously described. Furthermore, the effort to organize the MP into a series of "slims" is quite useful for organizing the calls into easier to analyze groups, and such work will likely benefit other groups such as the IMPC.
The manuscript is clearly written and, importantly, goes to great lengths to ensure full access to all data. In addition to minor issues detailed below, there are two major gaps, however, that must be addressed.
There is no description of the number of control embryos screened or the incidental rate of hits for each phenotype in the DMDD list. Given the focus of the paper on the variability of phenotype penetrance and the number of phenotypes with an "n=1", it is impossible to draw conclusions without this information. While the authors allude to a manuscript in preparation, it is actually essential data for this paper.
Similarly, there is no description of how the authors account for global developmental delay in mutants, which can lead to many "phenotypes" that are merely the result of slowed/retarded development or variability in developmental timing between litters. For example, at E14.5, one would expect a high rate of cleft palate in mutants that have some level of overall delay, or in entire delayed litters, as the palate is elevating and fusing at that time point. This raises the following questions: are controls from each litter collected? How is uniform staging assured? Are "delayed" embryos compared to a stage-matched control? Again, the authors allude to another manuscript, but some of this information needs to be included here to assure the MP calls do not have trivial explanations.
Minor points:
While the brief description of the animal resource and use of website citation is acceptable, given the main finding of variable penetrance, the authors should make a point of describing the isogenic genetic background and the nature of the alleles (tm1a or tm1b) in the methods and results.
It's not entirely clear if this was a set of 42 genes that were lethal/subviable at wean, or if this was a select set of lethal genes that were viable/subviable (present) at E14.5. Given the comments in the discussion about lines lethal at E9.5 or earlier, I assume the latter. This should be spelled out.
Mouse gene symbols should be italicized.

Apart from Table 1, the tables are too large and make reading a PDF a somewhat painful process. These might not be easily compressed, so most of the information should be moved to a supplemental file.
We have read this submission. We believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however we have significant reservations, as outlined above.
No competing interests were disclosed.

There is no description of the number of control embryos screened or the incidental rate of hits for each phenotype in the DMDD list. Given the focus of the paper on the variability of phenotype penetrance and the number of phenotypes with an "n=1", it is impossible to draw conclusions without this information. While the authors allude to a manuscript in preparation, it is actually essential data for this paper.
The revised manuscript now includes the complete phenotype data obtained for 114 wild type embryos. This comprises 56 phenotype calls, affecting 32 embryos, originating from 28 lines (revised Tables 1, 2B and Supplementary Table 5). 21 of the 56 phenotype calls (38%) are accounted for by only 6 embryos (indicating the skewing effect of a small number of abnormal embryos), with most affected embryos showing only a single phenotype. This is in marked contrast to the finding of many different phenotypes in individual mutant embryos. The phenotypes of wild types vary in character, ranging from apparently minor differences (e.g. in blood vessel morphology) to a few major abnormalities (e.g. absent kidney). Each one is rare amongst the population of wild type embryos analysed and affects only a single wild type embryo within the line. Only 10 phenotypes (15 phenotype calls) overlap between mutant embryos and their wild type siblings and these affect only 10 of the 41 lines for which wild type embryos have been assessed (Table 3). As we discuss in the revised Results and Discussion sections, these data raise 3 related questions: Why are phenotypes detected in genetically wild type embryos? Are there "background" phenotypes associated with the C57BL/6N line that contribute to the mutant phenotypes scored? Is there any evidence for "background" phenotypes associated with an individual knockout line?
We think there are several possible explanations for finding phenotypes amongst wild type embryos. One possibility is that the mouse strain that has been used for engineering knockout lines carries a previously unappreciated "background load" of abnormalities. Ours is the first systematic study on a sufficiently large scale and employing sufficiently high resolution imaging to detect such abnormalities. Amongst the phenotypes identified, none shows the significant prevalence that might be expected if it were a strain-specific abnormality. Another possible explanation is that abnormalities arise as a consequence of de novo mutation and the frequency we detect reflects the high sensitivity that results from HREM imaging. Lastly, at least with the less apparently severe abnormalities, it is possible that some of these in fact represent outliers on a spectrum of normal morphological variation and should not be considered genuine abnormalities. This highlights an important issue confronting phenotyping studies: the dearth of large-scale and systematic studies examining normal embryo morphology that can set a reliable benchmark for distinguishing abnormalities from normal variation. In this light, phenotype data may need revision as cumulative experience improves our ability to distinguish abnormalities from variation amongst wild types.
Whatever the explanation, it is clear that neither the frequency, prevalence nor nature of the phenotypes identified in wild type embryos impact significantly on the assignation of phenotypes amongst the homozygous mutant embryos.
Similarly, there is no description of how the authors account for global developmental delay in mutants, which can lead to many "phenotypes" that are merely the result of slowed/retarded development or variability in developmental timing between litters. For example, at E14.5, one would expect a high rate of cleft palate in mutants that have some level of overall delay, or in entire delayed litters, as the palate is elevating and fusing at that time point. This raises the following questions: are controls from each litter collected? How is uniform staging assured? Are "delayed" embryos compared to a stage-matched control? Again, the authors allude to another manuscript, but some of this information needs to be included here to assure the MP calls do not have trivial explanations.
We believe it is important to distinguish between the effect of precise developmental stage of phenotyping and the issue of developmental retardation or delay. We can now reference the published study we mentioned that addresses these very questions (Geyer et al. 2017, J. Anat. in press). We do indeed collect wild type controls from each litter but our experience has demonstrated that precise stage matching of mutants with controls is essential to underpin accurate phenotyping. To facilitate this, we have analysed a large number of wild type embryos from the same genetic background as that used for engineering of mutant lines. We have developed a system that can reliably distinguish five sub-stages within the span of Theiler stages 21 to 22 that are collected during E14.5, enabling us to compare each mutant embryo against precise, developmental stage-matched controls. Careful study and comparison of these has identified those changes (such as fusion of palatal shelves) which occur during the window of development that we observe. By combining qualitative comparisons with quantitative morphometry and statistical analysis, we are able to distinguish what can be considered genuine abnormalities from features that show either rapid developmental change or significant variability in the developmental timing of their appearance.
A more precise staging system also allows us to phenotype homozygous mutant embryos accurately, even though they frequently show some developmental delay, since we are able to compare them to controls at the equivalent stage of development. It also allows us to score instances of heterochrony where this affects individual (or a limited subset of) organs or tissues. By analysing a large number of wild type embryos harvested at E14.5, we have identified the spread and distribution of individual developmental sub stages that might be expected, and on this basis have a robust, statistical definition for global developmental retardation. Our studies do not allow us to identify why such retardation is relatively common amongst mutant embryos, but do offer some interesting pointers that we have commented upon. Retardation is, for example, much more common in mutant embryos showing cardiovascular defects (Geyer et al. 2017, J. Anat. in press). Furthermore, a surprisingly large proportion of mutants show abnormalities in their placental structure, and this may perhaps impact on their overall growth and development (unpublished data).

phenotypes were 100% penetrant and over half of the abnormalities had a penetrance score under 25%.
Approximately one third of mouse gene knockouts are lethal and 60% of lethal lines entering the DMDD programme fail to provide homozygous mutant offspring by E14.5 with half of those being lethal prior to E9.5. Thus, as the authors point out, the data presented are from a subset of lethal lines. However, the most striking aspect of this study is the variability in penetrance of virtually all of the phenotypes analysed.
Recent studies sequencing human exome DNA have identified a high frequency of loss of function mutations. A study by Lek et al 2016 examined more than 60,000 human exomes and reported predicted homozygous loss of function genotypes in 1775 genes. On average there are 35 homozygous gene deletions in each human. Thus the comment by Wilson et al in the present paper is particularly pertinent; relating these findings to human developmental disease will require further sophisticated analysis. It would appear that homozygous loss of function mutations are more common than previously realised and, furthermore, that the consequences of loss of function mutations are much more variable than previously realised. It will not be trivial to unmask the causes of this variability. We are only just beginning to scratch the surface of understanding the consequences of loss of function mutations in both mice and humans.
I have only one minor suggestion. On p4, 3 lines from the bottom, the sentence starting "The Brd2 and Tcf7l2 alleles showed a similar, but less pronounced, conservation of phenotype.." requires clarification. Do they mean similar to Atp11a, to each other, or to both?