Tutorial

The complete tutorial is available in the following formats:

Video tutorials

Quick start

Get started with the Genonets Server in less than 2 minutes with the sample input file.

Deep dives

Learn the input file format, how to use the input form, format of the results files, and how to use the visualization features to explore the genotype networks.

Terminology

We refer to a set of genotypes with a given phenotype as a genotype set, which is typically a very small subset of genotype space – the set of all possible genotypes. Each genotype set therefore corresponds to a single phenotype. A genotype set may comprise one or more genotype networks. In such networks, vertices represent genotypes and edges connect vertices if their corresponding genotypes are separated by a single small mutation. Vertices that share an edge are referred to as neighbors. Since individual genotypes may belong to multiple genotype sets, genotype networks may overlap.
We also construct and visualize phenotype networks. In such networks, vertices represent genotype sets, and edges connect vertices if any genotypes in the corresponding genotype sets can be interconverted via a single small mutation. Vertices that share an edge are referred to as adjacent. We refer to mutations that lead to a change in phenotype as non-neutral. For further information on these and related concepts, the reader is referred to [4].

Analyses

Please review the following important information that applies to analyses in general:
  • All analyses are performed on the dominant genotype network. That is, if the genotype set is fragmented into several genotype networks, analyses will be performed on the largest of these networks. All other components are ignored.
  • Genotypes with a score below the score threshold tau are not considered in any analysis.
  • The noise threshold delta is only used in landscape related analyses, i.e., Paths, Peaks, and Epistasis (see below).

Evolvability

We define evolvability as the ability of mutation to bring forth a novel phenotype. Evolvability can be measured at the scale of an individual genotype or of a genotype set. In the Genonets Server, evolvability analyses are always enabled, resulting in the following:
  • Calculation of genotype evolvability
  • Calculation of phenotype evolvability
  • Construction of the phenotype network
These computational routines are based on [1], and are described in detail below.

Genotype evolvability

For a genotype g in a genotype set S, evolvability is the ratio of the number of genotype sets to which g can evolve via a single mutation, to the total number of genotype sets in the input data.
The higher the evolvability of a genotype g, the higher the number of genotype sets that can be reached by a single mutation from g.
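
The definition above can be illustrated with a minimal Python sketch. It assumes that a "single mutation" is a point mutation between equal-length sequences and that the home genotype set is not counted as a target; the function names and the toy data are hypothetical and are not part of the Genonets Server.

```python
def is_neighbor(g, h):
    """True if two equal-length sequences differ at exactly one position."""
    return len(g) == len(h) and sum(a != b for a, b in zip(g, h)) == 1


def genotype_evolvability(g, home, genotype_sets):
    """Return (evolvability, target set names) for genotype g.

    genotype_sets maps a genotype set name to its sequences; home is the
    name of the set in which g is evaluated (assumed excluded from targets).
    """
    targets = {
        name
        for name, members in genotype_sets.items()
        if name != home and any(is_neighbor(g, h) for h in members)
    }
    return len(targets) / len(genotype_sets), targets


# Hypothetical toy data: three genotype sets over a two-letter alphabet.
genotype_sets = {
    "P1": {"AAA", "AAB"},
    "P2": {"ABA", "BBB"},
    "P3": {"BBA"},
}
value, targets = genotype_evolvability("AAA", "P1", genotype_sets)
print(value, targets)
```

In this toy example, AAA has a single-mutation neighbor only in P2, so it reaches one of the three genotype sets.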

Phenotype evolvability

For a genotype set S, phenotype evolvability is the ratio of the number of unique genotype sets to which genotypes in S can evolve via single mutations, to the total number of genotype sets available in the input data.
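
As a rough illustration of this ratio, the following Python sketch pools the targets of all genotypes in S. It assumes point mutations between equal-length sequences and uses hypothetical names and toy data; it is not the server's implementation.

```python
def is_neighbor(g, h):
    """True if two equal-length sequences differ at exactly one position."""
    return len(g) == len(h) and sum(a != b for a, b in zip(g, h)) == 1


def phenotype_evolvability(home, genotype_sets):
    """Return (evolvability, target set names) for the genotype set `home`."""
    members = genotype_sets[home]
    targets = {
        name
        for name, others in genotype_sets.items()
        if name != home
        and any(is_neighbor(g, h) for g in members for h in others)
    }
    return len(targets) / len(genotype_sets), targets


# Hypothetical toy data: three genotype sets over a two-letter alphabet.
genotype_sets = {
    "P1": {"AAA", "AAB"},
    "P2": {"ABA", "BBB"},
    "P3": {"BBA"},
}
for name in sorted(genotype_sets):
    print(name, phenotype_evolvability(name, genotype_sets))
```

Each target set is counted once, no matter how many genotypes in S can reach it, which is what "unique genotype sets" refers to in the definition.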

Phenotype network

  • Each vertex in the phenotype network represents a genotype set.
  • The size of a vertex corresponds to the phenotype evolvability of the corresponding genotype set, i.e., the higher the phenotype evolvability, the larger the vertex size.
  • The higher the out-degree of a vertex, the higher the number of genotype sets to which genotypes in this genotype set can evolve.
  • The higher the in-degree of a vertex, the higher the number of genotype sets from which this genotype set is accessible.
  • The network is a directed graph, which captures the possibly asymmetric relation between vertices. This means that it is possible for a genotype set to be accessible from other genotype sets, despite having zero phenotype evolvability itself.
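
The relation between evolvability targets, in-degree, and out-degree can be sketched in a few lines of Python. The dictionary below is hypothetical input: it maps each genotype set to the list of genotype sets reachable from it, and the variable names are illustrative only.

```python
from collections import Counter

# Hypothetical evolvability targets: set name -> sets reachable from it.
evolvability_targets = {
    "P1": ["P2", "P3"],
    "P2": ["P1"],
    "P3": [],  # reachable from P1, yet reaches no other set itself
}

# One directed edge per (source set, target set) pair.
edges = [(src, dst) for src, dsts in evolvability_targets.items() for dst in dsts]

out_degree = {name: len(dsts) for name, dsts in evolvability_targets.items()}
in_counts = Counter(dst for _, dst in edges)
in_degree = {name: in_counts.get(name, 0) for name in evolvability_targets}

# Phenotype evolvability as out-degree over the total number of sets.
n_sets = len(evolvability_targets)
phenotype_evolvability = {name: out_degree[name] / n_sets for name in out_degree}

print(edges)
print(in_degree, out_degree, phenotype_evolvability)
```

Here P3 has in-degree 1 and out-degree 0, which is the asymmetric case described in the last bullet.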

Robustness

We define robustness as the invariance of a phenotype in the face of genetic perturbation. Like evolvability, robustness can be measured at the scale of an individual genotype or of a genotype set. In the Genonets Server, robustness analyses can be selected from the list of analyses in the input form. These analyses result in the following computations:
  • Genotype robustness
  • Average robustness of the genotype set
These computational routines are based on [1], and are described in detail below.

Genotype robustness

For a genotype g in a genotype set S, robustness is the fraction of all possible mutational neighbors that are also in S. Thus, g is maximally robust if all possible neighbors are members of S.

Phenotype robustness

Phenotype robustness is the arithmetic mean of the genotype robustness values for all genotypes in the genotype set S.
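
Both robustness measures can be illustrated with a short Python sketch. It assumes that the mutational neighbors of a genotype are all sequences obtained by a single point mutation over a known alphabet; the function names, the alphabet, and the toy genotype set are hypothetical.

```python
from statistics import mean


def point_mutants(g, alphabet):
    """All sequences that differ from g at exactly one position."""
    return [
        g[:i] + c + g[i + 1:]
        for i in range(len(g))
        for c in alphabet
        if c != g[i]
    ]


def genotype_robustness(g, members, alphabet):
    """Fraction of all possible point mutants of g that are also in the set."""
    mutants = point_mutants(g, alphabet)
    return sum(m in members for m in mutants) / len(mutants)


# Hypothetical genotype set over the alphabet {A, B}.
members = {"AAA", "AAB", "ABA"}
alphabet = "AB"

per_genotype = {g: genotype_robustness(g, members, alphabet) for g in sorted(members)}
phenotype_robustness = mean(per_genotype.values())
print(per_genotype, phenotype_robustness)
```

AAA has three possible mutants, two of which are in the set, so its robustness is 2/3; the other two genotypes each score 1/3, giving a mean of 4/9.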

Accessibility

The accessibility of a genotype set S measures the potential for mutation to generate a genotype in S from genotypes in different genotype sets. In the Genonets Server, accessibility can be selected from the list of analyses in the input form, and is measured for all genotype sets in the input data. Specifically, for a genotype set S, accessibility is computed as follows:
  • For each pair of genotype sets (S, T), calculate the ratio of the number of genotypes in S that are separated by a single mutation from any genotype in T, to the total number of genotypes that are separated by a single mutation from genotypes in T.
  • Then calculate the accessibility of S as the sum of these ratios for all pairs (S, T).
Computational routines for accessibility are based on [2].
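
The two steps above can be sketched in Python as follows. This is a simplified reading under stated assumptions: mutations are point mutations between equal-length sequences, the denominator counts genotypes from every set other than T, and genotype sets do not overlap. All names and data are hypothetical.

```python
def is_neighbor(g, h):
    """True if two equal-length sequences differ at exactly one position."""
    return len(g) == len(h) and sum(a != b for a, b in zip(g, h)) == 1


def bordering(source, target):
    """Genotypes in source that are one mutation away from some genotype in target."""
    return {g for g in source if any(is_neighbor(g, h) for h in target)}


def accessibility(home, genotype_sets):
    """Sum over all other sets T of the share of T's neighbors that lie in `home`."""
    total = 0.0
    for t_name, t_members in genotype_sets.items():
        if t_name == home:
            continue
        # Number of genotypes bordering T, counted per genotype set.
        counts = {
            name: len(bordering(members, t_members))
            for name, members in genotype_sets.items()
            if name != t_name
        }
        all_bordering = sum(counts.values())
        if all_bordering:
            total += counts[home] / all_bordering
    return total


# Hypothetical toy data.
genotype_sets = {
    "P1": {"AAA", "AAB"},
    "P2": {"ABA", "BBB"},
    "P3": {"BBA"},
}
for name in sorted(genotype_sets):
    print(name, accessibility(name, genotype_sets))
```

In this toy data P2 receives every mutation that leaves P1 or P3, so it has the highest accessibility.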

Neighbor abundance

The neighbor abundance of a genotype set S measures the size of adjacent genotype sets, in proportion to the probability that a mutation will generate a genotype in these adjacent genotype sets. In the Genonets Server, neighbor abundance can be selected from the list of analyses in the input form, and is measured for all genotype sets in the input data. Specifically, for a genotype set S, neighbor abundance is computed as follows:
  • For each genotype set T other than S, calculate the ratio of the number of genotypes in T that are accessible from S, to the total number of genotypes that are accessible from S.
  • Multiply this ratio by the number of genotypes in T.
  • Repeat this process for all genotype set pairs (S, T), taking the sum as the neighbor abundance of S.
Computational routines for neighbor abundance are based on [2].
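
A minimal Python sketch of this calculation is given below. It weights each ratio by the number of genotypes in the adjacent set T, an assumption taken from the opening definition (the size of adjacent genotype sets, weighted by the probability of mutating into them). It also assumes point mutations and non-overlapping genotype sets; names and data are hypothetical.

```python
def is_neighbor(g, h):
    """True if two equal-length sequences differ at exactly one position."""
    return len(g) == len(h) and sum(a != b for a, b in zip(g, h)) == 1


def bordering(source, target):
    """Genotypes in source that are one mutation away from some genotype in target."""
    return {g for g in source if any(is_neighbor(g, h) for h in target)}


def neighbor_abundance(home, genotype_sets):
    """Sizes of adjacent sets, weighted by the share of accessible genotypes in each."""
    s_members = genotype_sets[home]
    # Genotypes of every other set T that are accessible from S.
    reachable = {
        name: bordering(members, s_members)
        for name, members in genotype_sets.items()
        if name != home
    }
    total = sum(len(found) for found in reachable.values())
    if total == 0:
        return 0.0  # nothing is accessible from S
    return sum(
        len(found) / total * len(genotype_sets[name])  # assumed weight: size of T
        for name, found in reachable.items()
    )


# Hypothetical toy data.
genotype_sets = {
    "P1": {"AAA", "AAB"},
    "P2": {"ABA", "BBB"},
    "P3": {"BBA"},
}
for name in sorted(genotype_sets):
    print(name, neighbor_abundance(name, genotype_sets))
```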

Diversity index

The diversity index of a genotype set S gives the probability that two randomly chosen non-neutral mutations to genotypes in S yield genotypes that belong to different genotype sets. In the Genonets Server, the diversity index can be selected from the list of analyses in the input form, and is measured for all genotype sets in the input data. Specifically, the diversity index of a genotype set S is computed as follows:
  • For each genotype set T other than S, calculate the ratio of the number of genotypes in T that are accessible from S, to the total number of genotypes that are accessible from S.
  • Square this ratio.
  • Repeat this process for all genotype set pairs (S, T), summing up along the way.
  • The diversity index of S is one minus this sum.
Computational routines for diversity index are based on [2].
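
The calculation above, one minus the sum of squared ratios, can be sketched in Python. The sketch assumes point mutations between equal-length sequences and non-overlapping genotype sets, and returns 0.0 when nothing is accessible from S; names and data are hypothetical.

```python
def is_neighbor(g, h):
    """True if two equal-length sequences differ at exactly one position."""
    return len(g) == len(h) and sum(a != b for a, b in zip(g, h)) == 1


def bordering(source, target):
    """Genotypes in source that are one mutation away from some genotype in target."""
    return {g for g in source if any(is_neighbor(g, h) for h in target)}


def diversity_index(home, genotype_sets):
    """One minus the sum of squared shares of accessible genotypes per target set."""
    s_members = genotype_sets[home]
    counts = [
        len(bordering(members, s_members))
        for name, members in genotype_sets.items()
        if name != home
    ]
    total = sum(counts)
    if total == 0:
        return 0.0  # assumed convention when no genotype is accessible from S
    return 1.0 - sum((count / total) ** 2 for count in counts)


# Hypothetical toy data.
genotype_sets = {
    "P1": {"AAA", "AAB"},
    "P2": {"ABA", "BBB"},
    "P3": {"BBA"},
}
for name in sorted(genotype_sets):
    print(name, diversity_index(name, genotype_sets))
```

P1 and P3 can only reach P2, so their index is 0; P2 reaches P1 and P3 in equal shares, giving 1 - (0.25 + 0.25) = 0.5.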

Structure

The last two decades of research in network science have produced a wealth of measures for describing the structure of networks. The Genonets Server includes many of these analyses. They can be selected from the list of analyses in the input form, resulting in measures at the level of individual genotypes and genotype sets.
Computations performed at the level of the genotype set are:
  • Number of connected components, i.e., number of genotype networks within a single genotype set
  • Sizes of all connected components
  • Size of the giant component, i.e., size of the dominant genotype network
  • Proportional size of the dominant genotype network
  • Diameter of the dominant genotype network
  • Edge density of the dominant genotype network
  • Average clustering coefficient for the dominant genotype network
Computations performed at the level of genotypes are:
  • Coreness
  • Clustering coefficient
Computational routines for structural analysis are described in [5].
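
Several of the genotype-set-level measures listed above can be illustrated with a small, dependency-free Python sketch. It assumes point mutations between equal-length sequences and uses the standard undirected edge density 2E / (n(n - 1)); the function names and the toy genotype set are hypothetical.

```python
from collections import deque


def is_neighbor(g, h):
    """True if two equal-length sequences differ at exactly one position."""
    return len(g) == len(h) and sum(a != b for a, b in zip(g, h)) == 1


def bfs_distances(start, adjacency):
    """Shortest path length (in edges) from start to every reachable vertex."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in adjacency[current]:
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist


def structure_summary(members):
    ordered = sorted(members)
    adjacency = {g: [h for h in ordered if is_neighbor(g, h)] for g in ordered}
    # Connected components, i.e. the genotype networks of the genotype set.
    components, seen = [], set()
    for g in ordered:
        if g not in seen:
            component = set(bfs_distances(g, adjacency))
            seen |= component
            components.append(component)
    components.sort(key=len, reverse=True)
    dominant = components[0]
    n = len(dominant)
    n_edges = sum(len(adjacency[g]) for g in dominant) // 2
    return {
        "number_of_components": len(components),
        "component_sizes": [len(c) for c in components],
        "dominant_size": n,
        "proportional_size": n / len(ordered),
        "edge_density": 2 * n_edges / (n * (n - 1)) if n > 1 else 0.0,
        "diameter": max(max(bfs_distances(g, adjacency).values()) for g in dominant),
    }


summary = structure_summary({"AAA", "AAB", "ABA", "BBB"})
print(summary)
```

In this toy set, BBB has no single-mutation neighbor and forms its own component, so the dominant genotype network contains three of the four genotypes.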

Overlap

Since some genotypes belong to more than one genotype set, genotype networks sometimes overlap. By selecting overlap from the list of analyses in the input form, the Genonets Server will characterize these regions of overlap for all pairs of genotype sets. Specifically, for each pair of genotype sets (S, T) available in the input data, this analysis calculates the number of genotypes that are common to both genotype sets S and T.
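
The pairwise count described above amounts to a set intersection per pair of genotype sets. A minimal Python sketch with hypothetical toy data:

```python
from itertools import combinations

# Hypothetical genotype sets; AAB belongs to both P1 and P2.
genotype_sets = {
    "P1": {"AAA", "AAB"},
    "P2": {"AAB", "ABA"},
    "P3": {"BBB"},
}

# Number of shared genotypes for every unordered pair (S, T).
overlap = {
    (s, t): len(genotype_sets[s] & genotype_sets[t])
    for s, t in combinations(sorted(genotype_sets), 2)
}
print(overlap)
```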

Epistasis

Epistasis – non-additive interactions between mutations – can impose severe constraints on molecular evolution because mutations that are beneficial in one genetic background may be deleterious in another. Epistasis can be classified as magnitude, simple sign, or reciprocal sign epistasis depending on the sign (i.e., positive or negative) of the effects of the individual mutations and of the mutations in combination (please see [3] for details). In the Genonets Server, epistasis can be selected from the list of analyses in the input form, resulting in the following calculations:
  • Identify all squares in the dominant genotype network, as these represent pairs of mutations.
  • For each square, determine the class of epistasis (magnitude, simple sign, reciprocal sign).
  • For each epistasis class, calculate the proportion of all squares in the dominant genotype network that belong to this class.
Computational routines for epistasis are based on [3].
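
The three steps above can be sketched in Python using the standard textbook classification: a square shows epistasis when the double mutant deviates from additivity, and the class depends on how many of the two mutations change the sign of their effect between backgrounds. How the server applies delta is not specified here, so its use as a tolerance on the deviation is an assumption, as are all function names and the toy scores.

```python
from collections import Counter
from itertools import combinations


def find_squares(scores):
    """Squares (g, single mutant, single mutant, double mutant) among scored genotypes."""
    squares = {}
    for g, h in combinations(sorted(scores), 2):
        if len(g) != len(h):
            continue
        diff = [i for i, (a, b) in enumerate(zip(g, h)) if a != b]
        if len(diff) != 2:
            continue
        i, j = diff
        m1 = g[:i] + h[i] + g[i + 1:]
        m2 = g[:j] + h[j] + g[j + 1:]
        if m1 in scores and m2 in scores:
            # Each square is found from both diagonals; keep it only once.
            squares.setdefault(frozenset((g, m1, m2, h)), (g, m1, m2, h))
    return list(squares.values())


def classify_square(f_ab, f_Ab, f_aB, f_AB, delta=0.0):
    """Classify a square from the scores of its four corners."""
    if abs(f_AB + f_ab - f_Ab - f_aB) <= delta:
        return "no epistasis"
    # Effect of each mutation in the two possible backgrounds.
    effects = [(f_Ab - f_ab, f_AB - f_aB), (f_aB - f_ab, f_AB - f_Ab)]
    sign_changes = sum(first * second < 0 for first, second in effects)
    return ["magnitude", "simple sign", "reciprocal sign"][sign_changes]


# Hypothetical scores for a two-site, two-letter landscape.
scores = {"AA": 1.0, "AB": 0.0, "BA": 0.0, "BB": 1.0}
squares = find_squares(scores)
classes = Counter(classify_square(*(scores[g] for g in sq)) for sq in squares)
proportions = {name: count / len(squares) for name, count in classes.items()}
print(squares, proportions)
```

In the toy landscape each single mutation lowers the score in one background and raises it in the other, so the only square is classified as reciprocal sign epistasis.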

Peaks

In the input data, the user is required to provide a score for each genotype. Since these scores may reflect a quantitative phenotype that is related to organismal fitness, and because these scores vary amongst the genotypes in a genotype network, one may think of a genotype network as an adaptive landscape [6]. This opens the door to a slew of analyses that characterize the potential for mutation and selection to explore these landscapes. One of these analyses comprises determination of peaks in the landscape.
Peaks can be selected from the list of analyses in the input form, resulting in the determination of the global and all local peaks in the landscape. We refer to the genotype with the highest score in the genotype network as the summit. Please note that even though there can be multiple genotypes within a peak, when referring to the global peak within the Genonets Server documentation, we are in fact referring to the summit.
Computational routines for peaks are based on [3].
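
As a simplified illustration, the Python sketch below treats a peak as a single genotype whose score is strictly higher than that of every neighbor, and the summit as the genotype with the highest score overall. It ignores peaks made up of several genotypes and the noise threshold delta, and it assumes point mutations within a connected network; names and scores are hypothetical.

```python
def is_neighbor(g, h):
    """True if two equal-length sequences differ at exactly one position."""
    return len(g) == len(h) and sum(a != b for a, b in zip(g, h)) == 1


def local_peaks(scores):
    """Genotypes that score strictly higher than all of their neighbors."""
    peaks = []
    for g in sorted(scores):
        neighbor_scores = [scores[h] for h in scores if is_neighbor(g, h)]
        if all(scores[g] > s for s in neighbor_scores):
            peaks.append(g)
    return peaks


# Hypothetical scores for a connected genotype network.
scores = {"AAA": 0.2, "AAB": 0.9, "ABA": 0.5, "BBA": 0.7}
summit = max(scores, key=scores.get)
peaks = local_peaks(scores)
print(summit, peaks)
```

Here AAB is the summit, and BBA is a local peak because its only neighbor, ABA, has a lower score.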

Paths

Another analysis in which the genotype network is considered an adaptive landscape [6] (see the introduction to Peaks analysis above) is the computation of accessible mutational paths.
Paths can be selected from the list of analyses in the input form. This analysis involves computing all accessible mutational paths from each genotype in the network to the summit. A path is accessible if and only if the scores of the genotypes on the path increase monotonically (plus or minus the user-supplied parameter delta) from the source genotype to the target genotype.
Computational routines for paths are based on [3].
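
The accessibility condition can be sketched in Python as a depth-first enumeration of simple paths. The sketch assumes that a step is allowed when the next score exceeds the current score minus delta, which is only one possible reading of "plus or minus delta", and that mutations are point mutations; function names and scores are hypothetical.

```python
def is_neighbor(g, h):
    """True if two equal-length sequences differ at exactly one position."""
    return len(g) == len(h) and sum(a != b for a, b in zip(g, h)) == 1


def accessible_paths(source, scores, delta=0.0):
    """All simple paths from source to the summit along which scores keep rising."""
    ordered = sorted(scores)
    adjacency = {g: [h for h in ordered if is_neighbor(g, h)] for g in ordered}
    summit = max(scores, key=scores.get)
    paths = []

    def extend(path):
        current = path[-1]
        if current == summit:
            paths.append(path)
            return
        for nxt in adjacency[current]:
            # Assumed rule: a drop smaller than delta is tolerated.
            if nxt not in path and scores[nxt] + delta > scores[current]:
                extend(path + [nxt])

    extend([source])
    return paths


# Hypothetical scores; ABB is the summit.
scores = {"AAA": 0.5, "AAB": 0.4, "ABA": 0.6, "ABB": 0.9}
strict = accessible_paths("AAA", scores)
tolerant = accessible_paths("AAA", scores, delta=0.2)
print(strict)
print(tolerant)
```

With delta set to 0, the step from AAA to AAB is blocked because the score decreases, leaving a single accessible path through ABA. With delta set to 0.2 the small decrease is tolerated and a second path opens up. Enumerating all simple paths grows quickly with network size, so this is only suitable for small examples.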

Mapping of visualization features to analysis types

The following table is provided to help the user determine which analysis types are pre-requisite for which visualization features.
Visualization feature              | Required analysis type
Diameter path                      | Structure
Landscape view                     | Paths, Peaks
Path epistasis                     | Paths, Epistasis
Squares: All                       | Epistasis
Squares: No epistasis              | Epistasis
Squares: Magnitude epistasis       | Epistasis
Squares: Simple sign epistasis     | Epistasis
Squares: Reciprocal sign epistasis | Epistasis
Overlap target sets                | Overlap
Epistasis types: bar plot          | Epistasis
Paths to summit                    | Paths
Highlight in landscape view        | Paths, Epistasis

Genotype set parameters

The following table describes the columns of the phenotype network table in the visualization, as well as the attributes in the Genotype_set_measures.txt results file.
Attribute (visualization) | Attribute (Genotype_set_measures.txt) | Description
Name | Genotype_set | Name of the genotype set.
Accessibility | Accessibility | Accessibility value computed for the genotype set. Please refer to the accessibility analysis description for further details.
Neighbor abundance | Neighbor_abundance | Neighbor abundance value computed for the genotype set. Please refer to the neighbor abundance analysis description for further details.
Diversity index | Diversity_index | Diversity index value computed for the genotype set. Please refer to the diversity index analysis description for further details.
Robustness | Robustness | Average robustness value computed for the genotype set. Please refer to the robustness analysis description for further details.
Evolvability | Evolvability | Phenotype evolvability value computed for the genotype set. Please refer to the evolvability analysis description for further details.
Evolvability targets | Evolvability_targets | List of genotype sets accessible from this genotype set by a single mutation.
Summit | n/a | ID of the vertex in the corresponding genotype network that represents the genotype with the highest score.
n/a | Peaks | Dictionary, where the key is the peak ID and the value is a list of genotypes in the peak. Peak ID '0' is always the global peak.
Number of peaks | Number_of_peaks | Total number of peaks in the genotype network, including the global peak.
Number of squares | Number_of_squares | Total number of squares in the genotype network.
Magnitude epistasis | Magnitude_epistasis | Ratio of the number of squares characterized by magnitude epistasis, to the total number of squares in the genotype network.
Simple sign epistasis | Simple_sign_epistasis | Ratio of the number of squares characterized by simple sign epistasis, to the total number of squares in the genotype network.
Reciprocal sign epistasis | Reciprocal_sign_epistasis | Ratio of the number of squares characterized by reciprocal sign epistasis, to the total number of squares in the genotype network.
Diameter | Diameter | Diameter of the genotype network.
Assortativity | Assortativity | Assortativity of the genotype network.
Edge density | Edge_density | Edge density of the genotype network.
Number of genotype networks | Number_of_genotype_networks | Number of genotype networks in the genotype set.
Genotype network sizes | Genotype_network_sizes | A list of sizes of all genotype networks in the genotype set.
Size of dominant genotype network | Size_of_dominant_genotype_network | Size of the dominant genotype network.
Proportional size of dominant network | Proportional_size_of_dominant_genotype_network | Proportion of the total number of genotypes in the genotype set that are in the dominant network.
Average clustering coefficient | Average_clustering_coefficient_of_dominant_genotype_network | Average clustering coefficient computed for the dominant genotype network.
Ratio of overlapping genotype sets | Ratio_of_overlapping_genotype_sets | Ratio of the number of overlapping genotype sets to the total number of genotype sets under consideration.
Overlapping genotype sets | Overlapping_genotype_sets | List of names of genotype sets, where each genotype set has at least one genotype in common with this genotype set.

Genotype parameters

The following table describes the columns of the genotype network table in the visualization, as well as the attributes in the <>_genotype_measures.txt results file.
Attribute (visualization) | Attribute (<>_genotype_measures.txt) | Description
Vertex ID | n/a | ID of the vertex that corresponds to the genotype.
Genotype | Sequence | The genotype.
Score | n/a | Score value corresponding to the genotype, as read from the input file.
Robustness | Robustness | Genotype robustness. Please refer to the robustness analysis description for further details.
Evolvability | Evolvability | Genotype evolvability. Please refer to the evolvability analysis description for further details.
Evolvability targets | Evolvability_targets | Dictionary, where the key is the genotype set name and the value is a list of genotypes in the genotype set to which the focal genotype can evolve.
Evolves to genotypes in | n/a | List of names of the genotype sets to which this genotype can evolve.
Overlap with genotypes in | Overlaps_with_genotypes_in | List of names of the genotype sets which also contain this genotype.
Coreness | Coreness | Coreness is an alternative measure of mutational robustness.
Clustering coefficient | Clustering_coefficient | The clustering coefficient measures the proportion of a vertex’s neighbors that are neighbors themselves.
Distance from summit | Distance from Summit | The number of edges between this genotype and the summit.
Accessible paths through | Accessible_paths_through | The number of accessible mutational paths that pass through this genotype. Please note that this includes the paths of which the genotype is a starting or ending vertex.

References

  1. Andreas Wagner. Robustness and evolvability: a paradox resolved. Proc. R. Soc. B 2008 275 91-100; DOI: 10.1098/rspb.2007.1137. Published 7 January 2008.
  2. Cowperthwaite MC, Economo EP, Harcombe WR, Miller EL, Meyers LA (2008) The Ascent of the Abundant: How Mutational Networks Constrain Evolution. PLoS Comput Biol 4(7): e1000110. doi:10.1371/journal.pcbi.1000110
  3. Jose Aguilar Rodriguez, Joshua L. Payne, Andreas Wagner. One thousand adaptive landscapes and their navigability. In review.
  4. Andreas Wagner. Neutralism and selectionism: a network-based reconciliation. Nature Reviews Genetics 9, 965-974 (December 2008).
  5. Mark Newman. Networks: An Introduction. Oxford University Press, Inc., New York, NY, USA. (2010).
  6. Sewall Wright. The roles of mutation, inbreeding, crossbreeding and selection in evolution. In Proc. Sixth Int. Congr. Genet. 356–366 (1932).