The Genonets package for Python

Introduction

This package provides a high-level interface for the construction and analysis of genotype networks from data.

Key features are:

  • Parsing of genotype-phenotype maps from an input file provided in the Genonets input file format.
  • Creation of genotype networks from input data.
  • Various analyses on the constructed genotype networks.
  • Generation of result files with attributes from genotype network level analyses, as well as genotype level analyses.
  • Creation of a phenotype network that shows evolvability and accessibility relationships between the genotype sets.
  • Generation of GML files corresponding to the created genotype networks and the phenotype network.

All available analyses are described here.

License information

Author:

Fahad Khalid

License:

The MIT License (MIT)

Copyright (c) 2016 Fahad Khalid and Joshua L. Payne

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Installation instructions

Installation instructions are available on PyPI and GitHub.

Code samples

Click here to download all code samples, including the sample input file.

Tutorial I: The one-liner

The following code snippet shows the simplest possible way of using Genonets:

from genonets.cmdl_handler import CmdParser  # For parsing command line arguments
from genonets.genonets_interface import Genonets  # Interface to Genonets API

Genonets(CmdParser().getArgs(), process=True)

Yes, that’s it. Two import statements and just one line of code create, analyze, and save all the genotype networks in the input file, as well as the phenotype network. In fact, these are the contents of the 'genonets_exmpl_minimal.py' sample file included in the package.

If you have downloaded the sample code and changed directory to ‘sample/’, you can run 'genonets_exmpl_minimal.py' from the command line as follows:

python genonets_exmpl_minimal.py DNA true data/genonets_sample_input.txt 0.35 results_minimal

The command line arguments specified above are all mandatory positional arguments, i.e., one must specify each one of these arguments, and in the correct order. Here’s the ordered list of arguments and the corresponding descriptions:

  1. Alphabet type: The type of alphabet used in the input file. Supported values are:
       • RNA
       • DNA
       • Protein
       • Binary
  2. Include indels: Whether or not indels should be considered as mutations; details are available here.
  3. Input file name: Path to, and name of, the input file.
  4. Tau: The minimum score value to consider when reading genotypes from the input file; details are available here.
  5. Result directory: Path to the directory in which the result files should be created.

In addition to the above listed mandatory arguments, the following optional arguments are also available:

  • ‘-rc’ or ‘--use_reverse_complements’: This argument can be specified without a value to enable consideration of reverse complements during the construction and analysis of genotype networks. This argument can only be used with alphabet type 'DNA'. More details are available here.

  • ‘-np’ or ‘--num_processes’: The number of processes to use when working in the parallel processing mode. The value must be an integer greater than ‘0’.

  • ‘-v’ or ‘--verbose’: This argument can be specified without a value to enable detailed printing of the processing steps. Please note that this argument should be used in conjunction with the ‘-u’ option of the ‘python’ command, e.g.,

    python -u genonets_exmpl_minimal.py DNA true genonets_sample_input.txt 0.35 results_minimal -v
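
To make the argument order and the constraints described above concrete, here is a minimal sketch built on Python's standard 'argparse' module. It is an illustrative stand-in, not the parser shipped with Genonets: the function names 'str_to_bool', 'positive_int', 'build_parser' and 'parse_documented_args', as well as the attribute names on the parsed result, are assumptions made for this example only.

```python
import argparse


def str_to_bool(value):
    """Convert 'true'/'false' (any case) to a bool; reject anything else."""
    lowered = value.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    raise argparse.ArgumentTypeError("expected 'true' or 'false', got %r" % value)


def positive_int(value):
    """Accept only integers greater than zero, as required for '-np'."""
    number = int(value)
    if number <= 0:
        raise argparse.ArgumentTypeError("expected an integer > 0, got %r" % value)
    return number


def build_parser():
    # Illustrative stand-in only; this is NOT the parser shipped with Genonets.
    parser = argparse.ArgumentParser(description="Sketch of the documented arguments")
    # The five mandatory positional arguments, in the documented order.
    parser.add_argument("alphabet", choices=["RNA", "DNA", "Protein", "Binary"])
    parser.add_argument("include_indels", type=str_to_bool)
    parser.add_argument("input_file")
    parser.add_argument("tau", type=float)
    parser.add_argument("result_dir")
    # The three optional arguments.
    parser.add_argument("-rc", "--use_reverse_complements", action="store_true")
    parser.add_argument("-np", "--num_processes", type=positive_int, default=None)
    parser.add_argument("-v", "--verbose", action="store_true")
    return parser


def parse_documented_args(argv):
    """Parse argv and enforce that '-rc' is only combined with 'DNA'."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.use_reverse_complements and args.alphabet != "DNA":
        parser.error("-rc can only be used with alphabet type 'DNA'")
    return args


example = parse_documented_args(
    ["DNA", "true", "data/genonets_sample_input.txt", "0.35",
     "results_minimal", "-np", "4", "-v"]
)
print(example)
```

The sketch only mirrors what is stated in the list above; how Genonets itself validates its arguments may differ.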
    

Tutorial II: Step by step processing

Instead of using a single call to perform all the processing steps, one can split these steps into multiple function calls. Here’s a code snippet from 'genonets_exmpl_simple.py', included in the sample sources:

# Parse the command line arguments
args = CmdParser().getArgs()

# Create the Genonets object. This will load the input file into memory.
gn = Genonets(args)

# Use 'gn' to create genotype networks for all genotype sets.
gn.create()

# Perform all available analyses on all genotype networks.
gn.analyze()

# Write all genotype networks to files in GML format. For a genotype network
# with two or more components, two files are generated: One corresponds to the
# entire network with all components, and the other corresponds to the dominant
# component only.
gn.save()

# Save all genotype network level measures to 'Genotype_set_measures.txt'.
gn.save_network_results()

# Save all genotype level measures to '<genotypeSetName>_genotype_measures.txt'
# files. One file per genotype set is generated.
gn.save_genotype_results()
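
If the same sequence of calls is needed in several scripts, it can be wrapped in a small helper. The sketch below uses only the method names shown above; the helper name 'run_all_steps' and the 'RecordingStub' class are hypothetical and exist only so that the example runs without Genonets installed. In real use, a Genonets object would be passed instead of the stub.

```python
def run_all_steps(gn):
    """Call the documented methods in the order used by the tutorial."""
    gn.create()
    gn.analyze()
    gn.save()
    gn.save_network_results()
    gn.save_genotype_results()


class RecordingStub:
    """Hypothetical stand-in that only records which methods were called."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Invoked only for attributes that do not exist, i.e. the method names.
        def method(*args, **kwargs):
            self.calls.append(name)
        return method


stub = RecordingStub()
run_all_steps(stub)
print(stub.calls)
# ['create', 'analyze', 'save', 'save_network_results', 'save_genotype_results']
```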

Tutorial III: Parallel processing

Parallel processing can be used independently in network creation and network analysis. Here’s how parallel processing can be enabled:

# Use 'gn' to create genotype networks for all genotype sets in parallel.
gn.create(parallel=True)

# Perform all available analyses on all genotype networks in parallel.
gn.analyze(parallel=True)

Tutorial IV: Selective processing

It is also possible to process only a selection of genotype sets from the input data, and to perform only a selection of the available analyses. Here’s a code snippet from 'genonets_exmpl_selective.py', included in the sample sources:

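# 'ac' is the alias under which the analysis type constants are used below
from genonets.genonets_constants import AnalysisConstants as ac

# Parse the command line arguments
args = CmdParser().getArgs()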

# Create the Genonets object. This will load the input file into
# memory.
gn = Genonets(args)

# Use 'gn' to create genotype networks for all genotype sets.
gn.create()

# Perform only 'Robustness' and 'Evolvability' analyses on just two of
# the genotype sets available in the input file, i.e., 'Foxa2' and 'Bbx'.
gn.analyze(["Foxa2", "Bbx"], analyses=[ac.ROBUSTNESS, ac.EVOLVABILITY])

# Write the given genotype networks to files in GML format.
# For a genotype network with two or more components, two files are generated:
# One corresponds to the entire network with all components, and the other
# corresponds to the dominant component only.
gn.save(["Foxa2", "Bbx"])

# Save genotype network level measures for the given genotype sets to
# 'Genotype_set_measures.txt'.
gn.save_network_results(["Foxa2", "Bbx"])

# Save all genotype level measures for the given genotype sets to
# 'Foxa2_genotype_measures.txt' and 'Bbx_genotype_measures.txt' files.
gn.save_genotype_results(["Foxa2", "Bbx"])
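
When names are passed explicitly, a misspelled genotype set name is easy to miss. The following sketch shows one way to check the requested names before calling the methods above. The helper 'split_known_unknown' is hypothetical and not part of Genonets, and the list of available names is made up for illustration; in real use it would come from 'gn.genotype_sets()' after 'gn.create()'.

```python
def split_known_unknown(requested, available):
    """Partition requested genotype set names into present and missing ones."""
    available_set = set(available)
    known = [name for name in requested if name in available_set]
    unknown = [name for name in requested if name not in available_set]
    return known, unknown


# Made-up names; with Genonets this list would be gn.genotype_sets().
available = ["Foxa2", "Bbx", "SetC"]

known, unknown = split_known_unknown(["Foxa2", "Bbx", "Fxoa2"], available)
print(known)    # ['Foxa2', 'Bbx']
print(unknown)  # ['Fxoa2']
```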

Tutorial V: Customizing results

This tutorial illustrates the process of customizing the output by adding information to the result file that would not be added by Genonets by default.

The ‘Peaks’ analysis is used as an example. By default, the ‘Peaks’ analysis stores results in a dictionary of the form:

{key=peakID : value=[genotypes in the peak]}.

The sample code that follows (taken from 'genonets_exmpl_custom.py') customizes this dictionary by adding the score value corresponding to each genotype in the list. The resulting dictionary is in the format:

{key=peakID : value=[(genotype1, score1), ..., (genotypeN, scoreN)]}

i.e., each value is a list of tuples:

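# 'ac' is the alias under which the analysis type constants are used below
from genonets.genonets_constants import AnalysisConstants as ac

# Parse the command line arguments
args = CmdParser().getArgs()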

# Create the Genonets object. This will load the input file into
# memory.
gn = Genonets(args)

# Use 'gn' to create genotype networks for all genotype sets.
gn.create()

# Perform 'Peaks' analysis
gn.analyze(analyses=[ac.PEAKS])

# At this point, the analysis is done. We now need to extract the
# data we need, i.e., the peaks dictionary.

# For each genotype set,
for genotypeSet in gn.genotype_sets():
    # Get the igraph object for the giant
    giant = gn.dominant_network(genotypeSet)

    # Get the dict of peaks {key=peakId : value=[list of sequences in the peak]}
    peaks = giant["Peaks"]

    # Update the dict of peaks, so that instead of just a list of
    # genotypes, we have a list of tuples (sequence, escore).
    newPeaks = {}

    # For each peak,
    for peak in peaks:
        # Initialize the list of tuples
        seqScrTuples = []

        # For each sequence in this peak,
        for sequence in peaks[peak]:
            # Find the corresponding vertex in the giant
            try:
                vertex = giant.vs.find(sequences=sequence)
            except ValueError:
                # Skip this sequence; without a vertex there is no score to read
                print("Oops! can't find " + sequence + " in giant.")
                continue

            # Get the escore
            score = vertex["escores"]

            # Add the tuple to the list of tuples
            seqScrTuples.append((sequence, score))

        # Add the peak and the corresponding list of tuples to the
        # new peaks dict
        newPeaks[peak] = seqScrTuples

    # Replace the peaks dict in giant with the new dict. This is
    # useful because now if you use the genonets functions below to
    # save the network and results, the updated peaks dict with tuples
    # will be written automatically to file.
    giant["Peaks"] = newPeaks

# Save networks to file in GML format
gn.save()

# Save the results of the network level analyses to file
gn.save_network_results()
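
The core of the customization is the change of dictionary shape. The sketch below reduces it to plain dictionaries so that it runs without Genonets or igraph. The function name 'attach_scores', the sequences, and the score values are all made up for illustration.

```python
def attach_scores(peaks, score_of):
    """Return {peak_id: [(genotype, score), ...]} from {peak_id: [genotype, ...]}."""
    return {
        peak_id: [(genotype, score_of[genotype]) for genotype in genotypes]
        for peak_id, genotypes in peaks.items()
    }


# Made-up data in the default 'Peaks' format: {peakID: [genotypes in the peak]}
peaks = {0: ["AACG", "AACT"], 1: ["GGTA"]}

# Made-up score lookup; in the tutorial the scores come from the giant's vertices.
score_of = {"AACG": 0.41, "AACT": 0.39, "GGTA": 0.47}

new_peaks = attach_scores(peaks, score_of)
print(new_peaks)
# {0: [('AACG', 0.41), ('AACT', 0.39)], 1: [('GGTA', 0.47)]}
```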

API documentation

genonets.genonets_interface module

Public interface to Genonets functions.

author: Fahad Khalid
license: MIT, see License information for details.

class genonets.genonets_interface.Genonets(args, process=False, parallel=False)

Encapsulates the Genonets public API.

__init__(args, process=False, parallel=False)

Initiate parsing of the input file, and load the parsed data into a Genonets object.

A simple way to create a Genonets object is the following:

gn = Genonets(CmdParser(args).getArgs())

where CmdParser can be imported as follows:

from genonets.cmdl_handler import CmdParser

The args variable is a list of command line arguments. Click here to see the list and descriptions of all available command line arguments.

Parameters:
  • args – A populated CmdArgs object.
  • process – If 'True', in addition to creating the object, initiates complete processing, i.e., creates genotype networks for all genotype sets in the input data, performs all available analyses on all genotype networks, and generates all result files.
  • parallel – Flag to indicate whether or not parallel processing should be used. This parameter is only useful with 'process=True'.

analyze(genotype_sets=0, analyses=0, parallel=False)

Performs all analyses provided in the list of analysis types, on the given genotype sets.

This method can only be used if create has already been called on the same Genonets object.

Parameters:
  • genotype_sets – List of names of the genotype sets whose genotype networks should be analyzed. If a value is not explicitly specified for this parameter, the genotype networks of all genotype sets available in the parsed data are analyzed.
  • analyses – List of analysis type constants. The available values are:
    • ALL
    • ROBUSTNESS
    • EVOLVABILITY
    • ACCESSIBILITY
    • NEIGHBOR_ABUNDANCE
    • PHENOTYPIC_DIVERSITY
    • STRUCTURE
    • OVERLAP
    • PATHS
    • PEAKS
    • EPISTASIS
    • PATHS_RATIOS
    All analysis types except 'PATHS_RATIOS' are described in detail here. 'PATHS_RATIOS' is used to calculate the ratio of accessible mutational paths to the total number of paths in the genotype network.
    If the value for this parameter is not explicitly set, all available analyses are performed.
  • parallel – Flag to indicate whether or not parallel processing should be used.
Returns:

No return value.

create(genotype_sets=0, parallel=False)

Create genotype networks for the given list of genotype set names.

Parameters:
  • genotype_sets – List of names of the genotype sets for which the genotype networks should be created. If a value is not explicitly specified for this parameter, genotype networks are constructed for all genotype sets available in the parsed data.
  • parallel – Flag to indicate whether or not parallel processing should be used.
Returns:

No return value
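
As a rough illustration of what network creation means conceptually, the sketch below connects genotypes that are a single point mutation apart. It is a simplified picture under stated assumptions: only substitutions between equal-length sequences are considered, indels and reverse complements are ignored, and it is not a description of how Genonets implements 'create'. The function names and the sequences are made up.

```python
from itertools import combinations


def differs_by_one(a, b):
    """True if equal-length sequences a and b differ at exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def build_edges(genotypes):
    """Return the genotype pairs that are one point mutation apart."""
    return [(a, b) for a, b in combinations(genotypes, 2) if differs_by_one(a, b)]


# Made-up genotypes; 'CCC' has no single-mutation neighbor in this set.
genotypes = ["AAC", "AAG", "ATG", "CCC"]

edges = build_edges(genotypes)
print(edges)
# [('AAC', 'AAG'), ('AAG', 'ATG')]
```

The pairwise comparison is quadratic in the number of genotypes, which is acceptable for a sketch but would be slow for large genotype sets.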

dominant_network(genotype_set)

Get the igraph object for the dominant network corresponding to the given genotype set name.

The dominant network refers to the giant component in the network.

Note: This method can only be used if the genotype network corresponding to the requested genotype set name has already been created.

Parameters: genotype_set – Name of the genotype set for which the genotype network is requested.
Returns: Object of type igraph.Graph.
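
To illustrate what "giant component" means, the following sketch finds the largest connected component of a small network using a breadth-first search over a plain adjacency dictionary. It does not use igraph and is not the Genonets implementation; the function names and the example network are assumptions for illustration.

```python
from collections import deque


def connected_components(adjacency):
    """Return a list of components, each a set of nodes, via breadth-first search."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        components.append(component)
    return components


def giant_component(adjacency):
    """Return the component with the most nodes (first one wins on ties)."""
    return max(connected_components(adjacency), key=len)


# Made-up undirected network with two components.
adjacency = {
    "AAC": ["AAG"],
    "AAG": ["AAC", "ATG"],
    "ATG": ["AAG"],
    "CCC": [],
}

print(sorted(giant_component(adjacency)))
# ['AAC', 'AAG', 'ATG']
```
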
genotype_network(genotype_set)

Get the igraph object for the network corresponding to the given genotype set name.

The igraph object in this case refers to the entire network, i.e., all connected components.

Note: This method can only be used if the genotype network corresponding to the requested genotype set name has already been created.

Parameters: genotype_set – Name of the genotype set for which the genotype network is requested.
Returns: Object of type igraph.Graph.

genotype_sets()

Get a list of names of all genotype sets for which genotype networks have been created.

Returns: List of names of genotype sets.

phenotype_network(collection_name='phenotype_network', genotype_sets=0)

Create the phenotype network from the given list of genotype sets.

Parameters:
  • collection_name – The name to be assigned to the phenotype network.
  • genotype_sets – List of names of the genotype sets for which the phenotype network should be created. If a value is not explicitly specified for this parameter, all available genotype sets are considered.
Returns:

igraph.Graph object representing the phenotype network.

save(genotype_sets=0)

Write the genotype networks corresponding to the given genotype sets to file.

The networks are saved in GML format. For networks with more than one component, separate files are generated for the entire network and the dominant network.

Note: This method can be used only after analyze() has been called on the given genotype sets.

Parameters: genotype_sets – List of names of genotype sets for which the genotype networks should be written to file. If a value is not explicitly specified for this parameter, the networks are written for all genotype sets.
Returns: No return value.
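
For readers unfamiliar with GML, the sketch below serializes a tiny undirected network into GML text. It only shows the general shape of the format; the attributes that Genonets actually writes to its GML files are not listed in this document, and the function name 'to_gml' and the example data are made up.

```python
def to_gml(nodes, edges):
    """Serialize an undirected network as a minimal GML string."""
    index = {name: i for i, name in enumerate(nodes)}
    lines = ["graph [", "  directed 0"]
    for name, node_id in index.items():
        lines.append('  node [ id %d label "%s" ]' % (node_id, name))
    for source, target in edges:
        lines.append("  edge [ source %d target %d ]" % (index[source], index[target]))
    lines.append("]")
    return "\n".join(lines)


# Made-up network: three genotypes connected in a chain.
gml_text = to_gml(["AAC", "AAG", "ATG"], [("AAC", "AAG"), ("AAG", "ATG")])
print(gml_text)
```
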
save_genotype_results(genotype_sets=0)

Write the genotype level results to files.

A results file is generated for each genotype set.

Note: This method can be used only after analyze() has been called on the given genotype sets.

Parameters: genotype_sets – List of names of genotype sets for which to generate the result files. If a value is not explicitly specified for this parameter, result files are written for all genotype sets.
Returns: No return value.

save_network_results(genotype_sets=0)

Write the genotype set level results to file.

A file named ‘Genotype_set_measures.txt’ is generated in the output directory specified at the time of the Genonets object creation.

Note: This method can be used only after analyze() has been called on the given genotype sets.

Parameters: genotype_sets – List of names of genotype sets for which to generate the result files. If a value is not explicitly specified for this parameter, result files are written for all genotype sets.
Returns: No return value.

save_phenotype_network()

Write the phenotype network to file in GML format.

Note: This method can only be used after the phenotype network has been created.

Returns: No return value.