pydna package

Submodules

pydna.amplicon module

This module provides functions for PCR. Primers with 5’ tails as well as inverse PCR on circular templates are handled correctly.

class pydna.amplicon.Amplicon(record, *args, template=None, forward_primer=None, reverse_primer=None, saltc=50.0, fprimerc=1000.0, rprimerc=1000.0, **kwargs)[source]

Bases: pydna.dseqrecord.Dseqrecord

The Amplicon class holds information about a PCR reaction involving two primers and one template. This class is used by the Anneal class and is not meant to be instantiated directly.

Parameters:

forward_primer : SeqRecord(Biopython)

SeqRecord object holding the forward (sense) primer

reverse_primer : SeqRecord(Biopython)

SeqRecord object holding the reverse (antisense) primer

template : Dseqrecord

Dseqrecord object holding the template (circular or linear)

saltc : float, optional

saltc = monovalent cations (mM) (Na,K..) default value is 50mM This is used for Tm calculations.

forward_primer_concentration : float, optional

primer concentration (nM) default set to 1000nM = 1µM This is used for Tm calculations.

rc : float, optional

primer concentration (nM) default set to 1000nM = 1µM This is used for Tm calculations.

dbd_program()[source]

Returns a string containing a text representation of a proposed PCR program using a polymerase with a DNA binding domain such as Pfu-Sso7d.

Pfu-Sso7d (rate 15s/kb)             |{size}bp
Three-step|          30 cycles   |      |Tm formula: Pydna tmbresluc
98.0°C    |98.0°C                |      |SaltC 50mM
__________|_____          72.0°C |72.0°C|Primer1C   1µM
00min30s  |10s  \ 61.0°C ________|______|Primer2C   1µM
          |      \______/ 0min 0s|10min |
          |        10s           |      |4-12°C
figure()[source]

This method returns a simple figure of the two primers binding to a part of the template.

5gctactacacacgtactgactg3
 |||||||||||||||||||||| tm 52.6 (dbd) 58.3
5gctactacacacgtactgactg...caagatagagtcagtaaccaca3
3cgatgatgtgtgcatgactgac...gttctatctcagtcattggtgt5
                          |||||||||||||||||||||| tm 49.1 (dbd) 57.7
                         3gttctatctcagtcattggtgt5
Returns:

figure:string

A string containing a text representation of the primers annealing on the template (see example above).

Notes

tm in the figure above is the melting temperature (tm) for each primer calculated according to SantaLucia 1998 [1].

dbd is the tm calculation for enzymes with dsDNA binding domains like Pfu-Sso7d [2]. See [3] for more information.

References

[1]
  1. SantaLucia, “A Unified View of Polymer, Dumbbell, and Oligonucleotide DNA Nearest-neighbor Thermodynamics,” Proceedings of the National Academy of Sciences 95, no. 4 (1998): 1460.
[2]
  1. Nørholm, “A Mutant Pfu DNA Polymerprimerase Designed for Advanced Uracil-excision DNA Engineering,” BMC Biotechnology 10, no. 1 (2010): 21, doi:10.1186/1472-6750-10-21.
[3]http://www.thermoscientificbio.com/webtools/tmc/
flankdn(flankdnlength=50)[source]

Returns a Dseqrecord object containing flankdnlength bases downstream of the reverse primer footprint. Truncated if the template is not long enough.

                               <---- flankdn ------>

                3actactgactatctTAATAA5
                 ||||||||||||||
acgcattcagctactgtactactgactatctatcgtacatgtactatcgtat
flankup(flankuplength=50)[source]

Returns a Dseqrecord object containing flankuplength bases upstream of the forward primer footprint, Truncated if the template is not long enough.

<--- flankup --->

          5TAATAAactactgactatct3
                 ||||||||||||||
acgcattcagctactgtactactgactatctatcg
pfu_sso7d_program()[source]
program()[source]

Returns a string containing a text representation of a proposed PCR program using Taq or similar polymerase.

Taq (rate 30 nt/s)
Three-step|         30 cycles     |      |Tm formula: Biopython Tm_NN
94.0°C    |94.0°C                 |      |SaltC 50mM
__________|_____          72.0°C  |72.0°C|
04min00s  |30s  \         ________|______|
          |      \ 46.0°C/ 0min 1s|10min |
          |       \_____/         |      |
          |         30s           |      |4-8°C
taq_program()[source]

pydna.amplify module

This module provides functions for PCR. Primers with 5’ tails as well as inverse PCR on circular templates are handled correctly.

class pydna.amplify.Anneal(primers, template, limit=13, primerc=1000.0, saltc=50, **kwargs)[source]

Bases: object

Parameters:

primers : iterable containing SeqRecord objects

Primer sequences 5’-3’.

template : Dseqrecord object

The template sequence 5’-3’.

limit : int, optional

limit length of the annealing part of the primers.

fprimerc : float, optional

Concentration of forward primer in nM, set to 1000.0 nM by default

rprimerc : float, optional

Concentration of reverse primer in nM, set to 1000.0 nM by default

saltc : float, optional

Salt concentration (monovalet cations) tmbresluc set to 50.0 mM by default

Examples

>>> from pydna.readers import read
>>> from pydna.amplify import Anneal
>>> from pydna.dseqrecord import Dseqrecord
>>> template = Dseqrecord("tacactcaccgtctatcattatctactatcgactgtatcatctgatagcac")
>>> from Bio.SeqRecord import SeqRecord
>>> p1 = read(">p1\ntacactcaccgtctatcattatc", ds = False)
>>> p2 = read(">p2\ngtgctatcagatgatacagtcg", ds = False)
>>> ann = Anneal((p1, p2), template)
>>> print(ann.report())
Template name? 51 nt linear:
Primer p1 anneals forward at position 23

Primer p2 anneals reverse at position 29
>>> ann.products
[Amplicon(51)]
>>> amplicon_list = ann.products
>>> amplicon = amplicon_list.pop()
>>> amplicon
Amplicon(51)
>>> print(amplicon.figure())
5tacactcaccgtctatcattatc...cgactgtatcatctgatagcac3
                           |||||||||||||||||||||| tm 55.9 (dbd) 60.5
                          3gctgacatagtagactatcgtg5
5tacactcaccgtctatcattatc3
 ||||||||||||||||||||||| tm 54.6 (dbd) 58.8
3atgtgagtggcagatagtaatag...gctgacatagtagactatcgtg5
>>> amplicon.annotations['date'] = '02-FEB-2013'   # Set the date for this example to pass the doctest
>>> print(amplicon)
Dseqrecord
circular: False
size: 51
ID: 51bp U96-TO06Y6pFs74SQx8M1IVTBiY
Name: 51bp_PCR_prod
Description: Product_p1_p2
Number of features: 4
/date=02-FEB-2013
Dseq(-51)
taca..gcac
atgt..cgtg
>>> print(amplicon.program())

Taq (rate 30 nt/s) 35 cycles             |51bp
95.0°C    |95.0°C                 |      |Tm formula: Biopython Tm_NN
|_________|_____          72.0°C  |72.0°C|SaltC 50mM
| 03min00s|30s  \         ________|______|Primer1C 1.0µM
|         |      \ 45.4°C/ 0min 2s| 5min |Primer2C 1.0µM
|         |       \_____/         |      |GC 39%
|         |         30s           |      |4-12°C
>>>

Attributes

products: list A list of Amplicon objects, one for each primer pair that may form a PCR product.
products
report()[source]

This method is an alias of str(Annealobj). Returns a short string representation.

pydna.amplify.nopcr(*args, **kwargs)[source]

no pcr is a convenience function for Anneal to simplify its usage, especially from the command line. If one or more PCR products are formed, an exception is raised.

args is any iterable of sequences or an iterable of iterables of sequences. args will be greedily flattened.

Parameters:

args : iterable containing sequence objects

Several arguments are also accepted.

limit : int = 13, optional

limit length of the annealing part of the primers.

Returns:

product : Dseqrecord

a Dseqrecord object representing the PCR product. The direction of the PCR product will be the same as for the template sequence.

Notes

sequences in args could be of type:

  • string
  • Seq
  • SeqRecord
  • Dseqrecord

The last sequence will be interpreted as the template all preceeding sequences as primers.

This is a powerful function, use with care!

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> from pydna.amplify    import nopcr
>>> template = Dseqrecord("tacactcaccgtctatcattatctactatcgactgtatcatctgatagcac")
>>> from Bio.SeqRecord import SeqRecord
>>> from Bio.Seq import Seq
>>> p1 = SeqRecord(Seq("tacactcaccgtctatcattatc"))
>>> p2 = SeqRecord(Seq("gtgctatcagatgatacagtG")) # This primer does not anneal
>>> nopcr(p1, p2, template)
True
pydna.amplify.pcr(*args, **kwargs)[source]

pcr is a convenience function for Anneal to simplify its usage, especially from the command line. If more than one PCR product is formed, an exception is raised.

args is any iterable of sequences or an iterable of iterables of sequences. args will be greedily flattened.

Parameters:

args : iterable containing sequence objects

Several arguments are also accepted.

limit : int = 13, optional

limit length of the annealing part of the primers.

Returns:

product : Dseqrecord

a Dseqrecord object representing the PCR product. The direction of the PCR product will be the same as for the template sequence.

Notes

sequences in args could be of type:

  • string
  • Seq
  • SeqRecord
  • Dseqrecord

The last sequence will be interpreted as the template all preceeding sequences as primers.

This is a powerful function, use with care!

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> from pydna.readers import read
>>> from pydna.amplify import pcr
>>> template = Dseqrecord("tacactcaccgtctatcattatctactatcgactgtatcatctgatagcac")
>>> from Bio.SeqRecord import SeqRecord
>>> p1 = read(">p1\ntacactcaccgtctatcattatc", ds = False)
>>> p2 = read(">p2\ncgactgtatcatctgatagcac", ds = False).reverse_complement()
>>> pcr(p1, p2, template)
Amplicon(51)
>>> pcr([p1, p2], template)
Amplicon(51)
>>> pcr((p1,p2,), template)
Amplicon(51)
>>>

pydna.assembly module

This module provides functions for assembly of sequences by homologous recombination and other related techniques. Given a list of sequences (Dseqrecords), all sequences will be analyzed for

The assembly algorithm is based on graph theory where each overlapping region forms a node and sequences separating the overlapping regions form edges.

class pydna.assembly.Assembly(dsrecs, limit=25, only_terminal_overlaps=False, max_nodes=None)[source]

Bases: object

Assembly of a list of linear DNA fragments into linear or circular constructs. The Assembly is meant to replace the Assembly method as it is easier to use. Accepts a list of Dseqrecords (source fragments) to initiate an Assembly object. Several methods are available for analysis of overlapping sequences, graph construction and assembly.

Parameters:

dsrecs : list

a list of Dseqrecord objects.

Examples

>>> from pydna.assembly import Assembly
>>> from pydna.dseqrecord import Dseqrecord
>>> a = Dseqrecord("acgatgctatactgCCCCCtgtgctgtgctcta")
>>> b = Dseqrecord("tgtgctgtgctctaTTTTTtattctggctgtatc")
>>> c = Dseqrecord("tattctggctgtatcGGGGGtacgatgctatactg")
>>> x = Assembly((a,b,c), limit=14)
>>> x
Assembly:
Sequences........................: [33] [34] [35]
Sequences with shared homologies.: [33] [34] [35]
Homology limit (bp)..............: 14
Number of overlaps...............: 3
Nodes in graph(incl. 5' & 3')....: 5
Only terminal overlaps...........: No
Circular products................: [59]
Linear products..................: [74] [73] [73] [54] [54] [53] [15] [14] [14]
>>> x.circular_products
[Contig(o59)]
>>> x.circular_products[0].seq.watson
'CCCCCtgtgctgtgctctaTTTTTtattctggctgtatcGGGGGtacgatgctatactg'
dsrecs = None

Sequences fed to this class is stored in this property

limit = None

The shortest common sub strings to be considered

max_nodes = None

The max number of nodes allowed. This can be reset to some other value

only_terminal_overlaps = None

Consider only terminal overlaps?

pydna.common_sub_strings module

pydna.common_sub_strings.common_sub_strings(stringx, stringy, limit=25)[source]

Finds all the common substrings between stringx and stringy longer than limit. This function is case sensitive.

returns a list of tuples describing the substrings The list is sorted longest -> shortest.

Examples

[(startx1,starty1,length1),(startx2,starty2,length2), ...]

startx1 = position in x where substring 1 starts starty1 = position in y where substring 1 starts length = lenght of substring

pydna.common_sub_strings.terminal_overlap(stringx, stringy, limit=15)[source]

pydna.contig module

class pydna.contig.Contig(record, *args, source_fragments=[], **kwargs)[source]

Bases: pydna.dseqrecord.Dseqrecord

This class holds information about a DNA assembly. This class is instantiated by the Assembly class and is not meant to be instantiated directly.

detailed_fig()[source]
detailed_figure()[source]

Synonym of detailed_fig()

figure()[source]

Synonym of small_fig()

small_fig()[source]

Returns a small ascii representation of the assembled fragments. Each fragment is represented by:

Size of common 5' substring|Name and size of DNA fragment| Size of common 5' substring

Linear:

frag20| 6
       \/
       /\
        6|frag23| 6
                 \/
                 /\
                  6|frag14

Circular:

 -|2577|61
|       \/
|       /\
|       61|5681|98
|               \/
|               /\
|               98|2389|557
|                       \/
|                       /\
|                       557-
|                          |
 --------------------------
small_figure()[source]

Synonym of small_fig()

pydna.design module

This module contain functions for primer design.

pydna.design.assembly_fragments(f, overlap=35, maxlink=40)[source]

This function return a list of pydna.amplicon.Amplicon objects where primers have been modified with tails so that the fragments can be fused in the order they appear in the list by for example Gibson assembly or homologous recombination.

Given that we have two linear pydna.amplicon.Amplicon objects a and b

we can modify the reverse primer of a and forward primer of b with tails to allow fusion by fusion PCR, Gibson assembly or in-vivo homologous recombination. The basic requirements for the primers for the three techniques are the same.

                          <-->

 _________ a _________           __________ b ________
/                     \         /                     \
agcctatcatcttggtctctgca         TTTATATCGCATGACTCTTCTTT
|||||||||||||||||||||||         |||||||||||||||||||||||
                 <gacgt                          <AGAAA
agcct>                          TTTAT>
|||||||||||||||||||||||         |||||||||||||||||||||||
tcggatagtagaaccagagacgt         AAATATAGCGTACTGAGAAGAAA

     agcctatcatcttggtctctgcaTTTATATCGCATGACTCTTCTTT
     ||||||||||||||||||||||||||||||||||||||||||||||
     tcggatagtagaaccagagacgtAAATATAGCGTACTGAGAAGAAA
     \___________________ c ______________________/

Design tailed primers incorporating a part of the next or previous fragment to be assembled.

agcctatcatcttggtctctgca
|||||||||||||||||||||||
                gagacgtAAATATA

|||||||||||||||||||||||
tcggatagtagaaccagagacgt

                       TTTATATCGCATGACTCTTCTTT
                       |||||||||||||||||||||||

                ctctgcaTTTATAT
                       |||||||||||||||||||||||
                       AAATATAGCGTACTGAGAAGAAA

PCR products with flanking sequences are formed in the PCR process.

agcctatcatcttggtctctgcaTTTATAT
||||||||||||||||||||||||||||||
tcggatagtagaaccagagacgtAAATATA
                \____________/

                   identical
                   sequences
                 ____________
                /            \
                ctctgcaTTTATATCGCATGACTCTTCTTT
                ||||||||||||||||||||||||||||||
                gagacgtAAATATAGCGTACTGAGAAGAAA

The fragments can be fused by any of the techniques mentioned earlier to form c:

agcctatcatcttggtctctgcaTTTATATCGCATGACTCTTCTTT
||||||||||||||||||||||||||||||||||||||||||||||
tcggatagtagaaccagagacgtAAATATAGCGTACTGAGAAGAAA

The first argument of this function is a list of sequence objects containing Amplicons and other similar objects.

At least every second sequence object needs to be an Amplicon

This rule exists because if a sequence object is that is not a PCR product is to be fused with another fragment, that other fragment needs to be an Amplicon so that the primer of the other object can be modified to include the whole stretch of sequence homology needed for the fusion. See the example below where a is a non-amplicon (a linear plasmid vector for instance)

 _________ a _________           __________ b ________
/                     \         /                     \
agcctatcatcttggtctctgca   <-->  TTTATATCGCATGACTCTTCTTT
|||||||||||||||||||||||         |||||||||||||||||||||||
tcggatagtagaaccagagacgt                          <AGAAA
                                TTTAT>
                                |||||||||||||||||||||||
                          <-->  AAATATAGCGTACTGAGAAGAAA

     agcctatcatcttggtctctgcaTTTATATCGCATGACTCTTCTTT
     ||||||||||||||||||||||||||||||||||||||||||||||
     tcggatagtagaaccagagacgtAAATATAGCGTACTGAGAAGAAA
     \___________________ c ______________________/

In this case only the forward primer of b is fitted with a tail with a part a:

agcctatcatcttggtctctgca
|||||||||||||||||||||||
tcggatagtagaaccagagacgt

                       TTTATATCGCATGACTCTTCTTT
                       |||||||||||||||||||||||
                                        <AGAAA
         tcttggtctctgcaTTTATAT
                       |||||||||||||||||||||||
                       AAATATAGCGTACTGAGAAGAAA

PCR products with flanking sequences are formed in the PCR process.

agcctatcatcttggtctctgcaTTTATAT
||||||||||||||||||||||||||||||
tcggatagtagaaccagagacgtAAATATA
                \____________/

                   identical
                   sequences
                 ____________
                /            \
                ctctgcaTTTATATCGCATGACTCTTCTTT
                ||||||||||||||||||||||||||||||
                gagacgtAAATATAGCGTACTGAGAAGAAA

The fragments can be fused by for example Gibson assembly:

agcctatcatcttggtctctgcaTTTATAT
||||||||||||||||||||||||||||||
tcggatagtagaacca

                             TCGCATGACTCTTCTTT
                ||||||||||||||||||||||||||||||
                gagacgtAAATATAGCGTACTGAGAAGAAA 

to form c:

agcctatcatcttggtctctgcaTTTATATCGCATGACTCTTCTTT
||||||||||||||||||||||||||||||||||||||||||||||
tcggatagtagaaccagagacgtAAATATAGCGTACTGAGAAGAAA

The first argument of this function is a list of sequence objects containing Amplicons and other similar objects.

The overlap argument controls how many base pairs of overlap required between adjacent sequence fragments. In the junction between Amplicons, tails with the length of about half of this value is added to the two primers closest to the junction.

>       <
Amplicon1
         Amplicon2
         >       <

         ⇣

>       <-
Amplicon1
         Amplicon2
        ->       <                     

In the case of an Amplicon adjacent to a Dseqrecord object, the tail will be twice as long (1*overlap) since the recombining sequence is present entirely on this primer:

Dseqrecd1
         Amplicon1
         >       <

         ⇣

Dseqrecd1
         Amplicon1
       -->       <

Note that if the sequence of DNA fragments starts or stops with an Amplicon, the very first and very last prinmer will not be modified i.e. assembles are always assumed to be linear. There are simple tricks around that for circular assemblies depicted in the last two examples below.

The maxlink arguments controls the cut off length for sequences that will be synhtesized by adding them to primers for the adjacent fragment(s). The argument list may contain short spacers (such as spacers between fusion proteins).

Example 1: Linear assembly of PCR products (pydna.amplicon.Amplicon class objects) ------

>       <         >       <
Amplicon1         Amplicon3
         Amplicon2         Amplicon4
         >       <         >       <

                     ⇣
                     pydna.design.assembly_fragments
                     ⇣ 

>       <-       ->       <-                      pydna.assembly.Assembly
Amplicon1         Amplicon3                         
         Amplicon2         Amplicon4     ➤  Amplicon1Amplicon2Amplicon3Amplicon4
        ->       <-       ->       <

Example 2: Linear assembly of alternating Amplicons and other fragments

>       <         >       <
Amplicon1         Amplicon2
         Dseqrecd1         Dseqrecd2

                     ⇣
                     pydna.design.assembly_fragments
                     ⇣ 

>       <--     -->       <--                     pydna.assembly.Assembly
Amplicon1         Amplicon2
         Dseqrecd1         Dseqrecd2     ➤  Amplicon1Dseqrecd1Amplicon2Dseqrecd2

Example 3: Linear assembly of alternating Amplicons and other fragments

Dseqrecd1         Dseqrecd2
         Amplicon1         Amplicon2
         >       <       -->       <

                     ⇣
             pydna.design.assembly_fragments
                     ⇣
                                                  pydna.assembly.Assembly
Dseqrecd1         Dseqrecd2
         Amplicon1         Amplicon2     ➤  Dseqrecd1Amplicon1Dseqrecd2Amplicon2
       -->       <--     -->       <

Example 4: Circular assembly of alternating Amplicons and other fragments

                 ->       <==
Dseqrecd1         Amplicon2
         Amplicon1         Dseqrecd1
       -->       <-
                     ⇣
                     pydna.design.assembly_fragments
                     ⇣ 
                                                   pydna.assembly.Assembly
                 ->       <==
Dseqrecd1         Amplicon2                    -Dseqrecd1Amplicon1Amplicon2-  
         Amplicon1                       ➤    |                             |
       -->       <-                            -----------------------------

------ Example 5: Circular assembly of Amplicons

>       <         >       <
Amplicon1         Amplicon3
         Amplicon2         Amplicon1
         >       <         >       <

                     ⇣
                     pydna.design.assembly_fragments
                     ⇣ 

>       <=       ->       <-        
Amplicon1         Amplicon3                  
         Amplicon2         Amplicon1
        ->       <-       +>       <

                     ⇣
             make new Amplicon using the Amplicon1.template and 
             the last fwd primer and the first rev primer.
                     ⇣
                                                   pydna.assembly.Assembly
+>       <=       ->       <-        
 Amplicon1         Amplicon3                  -Amplicon1Amplicon2Amplicon3-
          Amplicon2                      ➤   |                             |
         ->       <-                          -----------------------------
Parameters:

f : list of pydna.amplicon.Amplicon and other Dseqrecord like objects

list Amplicon and Dseqrecord object for which fusion primers should be constructed.

overlap : int, optional

Length of required overlap between fragments.

maxlink : int, optional

Maximum length of spacer sequences that may be present in f. These will be included in tails for designed primers.

Returns:

seqs : list of pydna.amplicon.Amplicon and other Dseqrecord like objects pydna.amplicon.Amplicon objects

[Amplicon1,
 Amplicon2, ...]

Examples

>>> from pydna.dseqrecord import Dseqrecord    
>>> from pydna.design import primer_design
>>> a=primer_design(Dseqrecord("atgactgctaacccttccttggtgttgaacaagatcgacgacatttcgttcgaaacttacgatg"))
>>> b=primer_design(Dseqrecord("ccaaacccaccaggtaccttatgtaagtacttcaagtcgccagaagacttcttggtcaagttgcc"))
>>> c=primer_design(Dseqrecord("tgtactggtgctgaaccttgtatcaagttgggtgttgacgccattgccccaggtggtcgtttcgtt"))
>>> from pydna.design import assembly_fragments
>>> # We would like a circular recombination, so the first sequence has to be repeated
>>> fa1,fb,fc,fa2 = assembly_fragments([a,b,c,a])
>>> # Since all fragments are Amplicons, we need to extract the rp of the 1st and fp of the last fragments.
>>> from pydna.amplify import pcr
>>> fa = pcr(fa2.forward_primer, fa1.reverse_primer, a)
>>> [fa,fb,fc]
[Amplicon(100), Amplicon(101), Amplicon(102)]
>>> from pydna.assembly import Assembly
>>> assemblyobj = Assembly([fa,fb,fc])
>>> assemblyobj
Assembly:
Sequences........................: [100] [101] [102]
Sequences with shared homologies.: [100] [101] [102]
Homology limit (bp)..............: 25
Number of overlaps...............: 3
Nodes in graph(incl. 5' & 3')....: 5
Only terminal overlaps...........: No
Circular products................: [195]
Linear products..................: [231] [231] [231] [167] [166] [165] [36] [36] [36]
>>> assemblyobj.linear_products
[Contig(-231), Contig(-231), Contig(-231), Contig(-167), Contig(-166), Contig(-165), Contig(-36), Contig(-36), Contig(-36)]
>>> assemblyobj.circular_products[0].cseguid()
'V3Mi8zilejgyoH833UbjJOtDMbc'
>>> (a+b+c).looped().cseguid()
'V3Mi8zilejgyoH833UbjJOtDMbc'
>>> print(assemblyobj.circular_products[0].small_fig())
 -|100bp_PCR_prod|36
|                 \/
|                 /\
|                 36|101bp_PCR_prod|36
|                                   \/
|                                   /\
|                                   36|102bp_PCR_prod|36
|                                                     \/
|                                                     /\
|                                                     36-
|                                                        |
 --------------------------------------------------------
>>>
pydna.design.primer_design(template, fp=None, rp=None, target_tm=55.0, fprimerc=1000.0, rprimerc=1000.0, saltc=50.0, limit=13, formula=<function tmbresluc>)[source]

This function designs a forward primer and a reverse primer for PCR amplification of a given template sequence.

The template argument is a Dseqrecord object or equivalent containing the template sequence.

The optional fp and rp arguments can contain an existing primer for the sequence (either the forward or reverse primer). One or the other primers can be specified, not both (since then there is nothing to design!, use the pydna.amplify.pcr function instead).

If one ofthe primers is given, the other primer is designed to match in terms of Tm. If both primers are designed, they will be designed to target_tm

fprimerc, rprimerc and saltc are formward and reverse primer concentration (nM). Saltc is the salt concentration. These arguments might affect how Tm is calculated.

formula is a function that can take at least three arguments f( str, primerc=float, saltc=float). There are several of these in the pydna.tm module.

The function returns a pydna.amplicon.Amplicon class instance. This object has the object.forward_primer and object.reverse_primer properties which contain the designed primers.

Parameters:

template : pydna.dseqrecord.Dseqrecord

a Dseqrecord object. The only required argument.

fp, rp : pydna.primer.Primer, optional

optional pydna.primer.Primer objects containing one primer each.

target_tm : float, optional

target tm for the primers, set to 55°C by default.

fprimerc : float, optional

Concentration of forward primer in nM, set to 1000.0 nM by default.

rprimerc : float, optional

Concentration of reverse primer in nM, set to 1000.0 nM by default.

saltc : float, optional

Salt concentration (monovalet cations) tmbresluc set to 50.0 mM by default

formula : function

formula used for tm calculation this is the name of a function. built in options are:

  1. pydna.amplify.tmbresluc() (default)
  2. pydna.amplify.basictm()
  3. pydna.amplify.tmstaluc98()
  4. pydna.amplify.tmbreslauer86()

These functions are imported from the pydna.amplify module, but can be substituted for some other custom made function.

Returns:

result : Amplicon

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> t=Dseqrecord("atgactgctaacccttccttggtgttgaacaagatcgacgacatttcgttcgaaacttacgatg")
>>> t
Dseqrecord(-64)
>>> from pydna.design import primer_design
>>> ampl = primer_design(t)
>>> ampl
Amplicon(64)
>>> ampl.forward_primer
fw64 18-mer:5'-atgactgctaacccttcc-3'
>>> ampl.reverse_primer
rv64 19-mer:5'-catcgtaagtttcgaacga-3'
>>> print(ampl.figure())
5atgactgctaacccttcc...tcgttcgaaacttacgatg3
                      ||||||||||||||||||| tm 53.8 (dbd) 60.6
                     3agcaagctttgaatgctac5
5atgactgctaacccttcc3
 |||||||||||||||||| tm 54.4 (dbd) 58.4
3tactgacgattgggaagg...agcaagctttgaatgctac5
>>> pf = "GGATCC" + ampl.forward_primer
>>> pr = "GGATCC" + ampl.reverse_primer  
>>> pf
fw64 24-mer:5'-GGATCCatgactgct..tcc-3'
>>> pr
rv64 25-mer:5'-GGATCCcatcgtaag..cga-3'
>>> from pydna.amplify import pcr
>>> pcr_prod = pcr(pf, pr, t)
>>> print(pcr_prod.figure())
      5atgactgctaacccttcc...tcgttcgaaacttacgatg3
                            ||||||||||||||||||| tm 53.8 (dbd) 60.6
                           3agcaagctttgaatgctacCCTAGG5
5GGATCCatgactgctaacccttcc3
       |||||||||||||||||| tm 54.4 (dbd) 58.4
      3tactgacgattgggaagg...agcaagctttgaatgctac5
>>> print(pcr_prod.seq)
GGATCCatgactgctaacccttccttggtgttgaacaagatcgacgacatttcgttcgaaacttacgatgGGATCC
>>> from pydna.primer import Primer
>>> pf = Primer("atgactgctaacccttccttggtgttg", id="myprimer")
>>> ampl = primer_design(t, fp = pf)
>>> ampl.forward_primer
myprimer 27-mer:5'-atgactgctaaccct..ttg-3'
>>> ampl.reverse_primer
rv64 28-mer:5'-catcgtaagtttcga..gtc-3'

pydna.download module

Provides a class for downloading sequences from genbank.

pydna.dseq module

Provides two classes, Dseq and Dseqrecord, for handling double stranded DNA sequences. Dseq and Dseqrecord are subclasses of Biopythons Seq and SeqRecord classes, respectively. These classes support the notion of circular and linear DNA.

class pydna.dseq.Dseq(watson, crick=None, ovhg=None, linear=None, circular=None, alphabet=IUPACAmbiguousDNA())[source]

Bases: Bio.Seq.Seq

Dseq is a class designed to hold information for a double stranded DNA fragment. Dseq also holds information describing the topology of the DNA fragment (linear or circular).

Parameters:

watson : str

a string representing the watson (sense) DNA strand.

crick : str, optional

a string representing the crick (antisense) DNA strand.

ovhg : int, optional

A positive or negative number to describe the stagger between the watson and crick strands. see below for a detailed explanation.

linear : bool, optional

True indicates that sequence is linear, False that it is circular.

circular : bool, optional

True indicates that sequence is circular, False that it is linear.

alphabet : Bio.Alphabet, optional

Bio.Alphabet.IUPAC.IUPACAmbiguousDNA from the Biopython package is set as default.

Examples

Dseq is a subclass of the Biopython Seq object. It stores two strings representing the watson (sense) and crick(antisense) strands. two properties called linear and circular, and a numeric value ovhg (overhang) describing the stagger for the watson and crick strand in the 5’ end of the fragment.

The most common usage is probably to create a Dseq object as a part of a Dseqrecord object (see pydna.dseqrecord.Dseqrecord).

There are three ways of creating a Dseq object directly:

Only one argument (string):

>>> from pydna.dseq import Dseq
>>> Dseq("aaa")
Dseq(-3)
aaa
ttt

The given string will be interpreted as the watson strand of a blunt, linear double stranded sequence object. The crick strand is created automatically from the watson strand.

Two arguments (string, string):

>>> from pydna.dseq import Dseq
>>> Dseq("gggaaat","ttt")
Dseq(-7)
gggaaat
   ttt

If both watson and crick are given, but not ovhg an attempt will be made to find the best annealing between the strands. There are limitations to this! For long fragments it is quite slow. The length of the annealing sequences have to be at least half the length of the shortest of the strands.

Three arguments (string, string, ovhg=int):

The ovhg parameter controls the stagger at the five prime end:

ovhg=-2

XXXXX
  XXXXX

ovhg=-1

XXXXX
 XXXXX

ovhg=0

XXXXX
XXXXX

ovhg=1

 XXXXX
XXXXX

ovhg=2

  XXXXX
XXXXX

Example of creating Dseq objects with different amounts of stagger:

>>> Dseq(watson="agt",crick="actta",ovhg=-2)
Dseq(-7)
agt
  attca
>>> Dseq(watson="agt",crick="actta",ovhg=-1)
Dseq(-6)
agt
 attca
>>> Dseq(watson="agt",crick="actta",ovhg=0)
Dseq(-5)
agt
attca
>>> Dseq(watson="agt",crick="actta",ovhg=1)
Dseq(-5)
 agt
attca
>>> Dseq(watson="agt",crick="actta",ovhg=2)
Dseq(-5)
  agt
attca

If the ovhg parameter is psecified a crick strand also needs to be supplied, otherwise an exception is raised.

>>> Dseq(watson="agt",ovhg=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pydna_/dsdna.py", line 169, in __init__
    else:
Exception: ovhg defined without crick strand!

The default alphabet is set to Biopython IUPACAmbiguousDNA

The shape of the fragment is set by either:

  1. linear = False, True
  2. circular = True, False

Note that both ends of the DNA fragment has to be compatible to set circular = True (or linear = False).

>>> Dseq("aaa","ttt")
Dseq(-3)
aaa
ttt
>>> Dseq("aaa","ttt",ovhg=0)
Dseq(-3)
aaa
ttt
>>> Dseq("aaa", "ttt", linear = False ,ovhg=0)
Dseq(o3)
aaa
ttt
>>> Dseq("aaa", "ttt", circular = True , ovhg=0)
Dseq(o3)
aaa
ttt

Coercing to string

>>> a=Dseq("tttcccc","aaacccc")
>>> a
Dseq(-11)
    tttcccc
ccccaaa
>>> str(a)
'ggggtttcccc'

The double stranded part is accessible through the dsdata property:

>>> a.dsdata
'ttt'

A Dseq object can be longer that either the watson or crick strands.

<-- length -->
GATCCTTT
     AAAGCCTAG

The slicing of a linear Dseq object works mostly as it does for a string.

>>> s="ggatcc"
>>> s[2:3]
'a'
>>> s[2:4]
'at'
>>> s[2:4:-1]
''
>>> s[::2]
'gac'
>>> from pydna.dseq import Dseq
>>> d=Dseq(s, linear=True)
>>> d[2:3]
Dseq(-1)
a
t
>>> d[2:4]
Dseq(-2)
at
ta
>>> d[2:4:-1]
Dseq(-0)


>>> d[::2]
Dseq(-3)
gac
ctg

The slicing of a circular Dseq object has a slightly different meaning.

>>> s="ggAtCc"
>>> d=Dseq(s, circular=True)
>>> d
Dseq(o6)
ggAtCc
ccTaGg
>>> d[4:3]
Dseq(-5)
CcggA
GgccT

The slice [X:X] produces an empty slice for a string, while this will return the linearized sequence starting at X:

>>> s="ggatcc"
>>> d=Dseq(s, circular=True)
>>> d
Dseq(o6)
ggatcc
cctagg
>>> d[3:3]
Dseq(-6)
tccgga
aggcct
>>>
T4(nucleotides=None)[source]

Fill in of five prime protruding ends and chewing back of three prime protruding ends by a DNA polymerase providing both 5’-3’ DNA polymerase activity and 3’-5’ nuclease acitivty (such as T4 DNA polymerase). This can be done in presence of any combination of the four A, G, C or T. Default are all four nucleotides together.

Alias for the t4() method

Parameters:nucleotides : str

Examples

>>> from pydna.dseq import Dseq
>>> a=Dseq("gatcgatc")
>>> a
Dseq(-8)
gatcgatc
ctagctag
>>> a.T4()
Dseq(-8)
gatcgatc
ctagctag
>>> a.T4("t")
Dseq(-8)
gatcgat
 tagctag
>>> a.T4("a")
Dseq(-8)
gatcga
  agctag
>>> a.T4("g")
Dseq(-8)
gatcg
   gctag
>>>
circular

The circular property

cut(*enzymes)[source]

Returns a list of linear Dseq fragments produced in the digestion. If there are no cuts, an empty list is returned.

Parameters:

enzymes : enzyme object or iterable of such objects

A Bio.Restriction.XXX restriction objects or iterable.

Returns:

frags : list

list of Dseq objects formed by the digestion

Examples

>>> from pydna.dseq import Dseq
>>> seq=Dseq("ggatccnnngaattc")
>>> seq
Dseq(-15)
ggatccnnngaattc
cctaggnnncttaag
>>> from Bio.Restriction import BamHI,EcoRI
>>> type(seq.cut(BamHI))
<class 'list'>
>>> for frag in seq.cut(BamHI): print(frag.fig())
Dseq(-5)
g
cctag
Dseq(-14)
gatccnnngaattc
    gnnncttaag
>>> seq.cut(EcoRI, BamHI) ==  seq.cut(BamHI, EcoRI)
True
>>> a,b,c = seq.cut(EcoRI, BamHI)
>>> a+b+c
Dseq(-15)
ggatccnnngaattc
cctaggnnncttaag
>>>
fig()[source]

Returns a representation of the sequence, truncated if longer than 30 bp:

Examples

>>> from pydna.dseq import Dseq
>>> a=Dseq("atcgcttactagcgtactgatcatctgact")
>>> a
Dseq(-30)
atcgcttactagcgtactgatcatctgact
tagcgaatgatcgcatgactagtagactga
>>> a+=Dseq("A")
>>> a
Dseq(-31)
atcg..actA
tagc..tgaT
fill_in(nucleotides=None)[source]

Fill in of five prime protruding end with a DNA polymerase that has only DNA polymerase activity (such as exo-klenow [4]) and any combination of A, G, C or T. Default are all four nucleotides together.

Parameters:nucleotides : str

References

[4]http://en.wikipedia.org/wiki/Klenow_fragment#The_exo-_Klenow_fragment

Examples

>>> from pydna.dseq import Dseq
>>> a=Dseq("aaa", "ttt")
>>> a
Dseq(-3)
aaa
ttt
>>> a.fill_in()
Dseq(-3)
aaa
ttt
>>> b=Dseq("caaa", "cttt")
>>> b
Dseq(-5)
caaa
 tttc
>>> b.fill_in()
Dseq(-5)
caaag
gtttc
>>> b.fill_in("g")
Dseq(-5)
caaag
gtttc
>>> b.fill_in("tac")
Dseq(-5)
caaa
 tttc
>>> c=Dseq("aaac", "tttg")
>>> c
Dseq(-5)
 aaac
gttt
>>> c.fill_in()
Dseq(-5)
 aaac
gttt
>>>
find(sub, start=0, end=9223372036854775807)[source]

Find method, like that of a python string.

This behaves like the python string method of the same name.

Returns an integer, the index of the first occurrence of substring argument sub in the (sub)sequence given by [start:end].

Returns -1 if the subsequence is NOT found.

Parameters:

sub : string or Seq object

a string or another Seq object to look for.

start : int, optional

slice start.

end : int, optional

slice end.

Examples

>>> from pydna.dseq import Dseq
>>> seq = Dseq("atcgactgacgtgtt")
>>> seq
Dseq(-15)
atcgactgacgtgtt
tagctgactgcacaa
>>> seq.find("gac")
3
>>> seq = Dseq(watson="agt",crick="actta",ovhg=-2)
>>> seq
Dseq(-7)
agt
  attca
>>> seq.find("taa")
2
five_prime_end()[source]

Returns a tuple describing the structure of the 5’ end of the DNA fragment

Examples

>>> from pydna.dseq import Dseq
>>> a=Dseq("aaa", "ttt")
>>> a
Dseq(-3)
aaa
ttt
>>> a.five_prime_end()
('blunt', '')
>>> a=Dseq("aaa", "ttt", ovhg=1)
>>> a
Dseq(-4)
 aaa
ttt
>>> a.five_prime_end()
("3'", 't')
>>> a=Dseq("aaa", "ttt", ovhg=-1)
>>> a
Dseq(-4)
aaa
 ttt
>>> a.five_prime_end()
("5'", 'a')
>>>
linear

The linear property

looped()[source]

Returns a circularized Dseq object. This can only be done if the two ends are compatible, otherwise a TypeError is raised.

Examples

>>> from pydna.dseq import Dseq
>>> a=Dseq("catcgatc")
>>> a
Dseq(-8)
catcgatc
gtagctag
>>> a.looped()
Dseq(o8)
catcgatc
gtagctag
>>> a.T4("t")
Dseq(-8)
catcgat
 tagctag
>>> a.T4("t").looped()
Dseq(o7)
catcgat
gtagcta
>>> a.T4("a")
Dseq(-8)
catcga
  agctag
>>> a.T4("a").looped()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pydna/dsdna.py", line 357, in looped
    if type5 == type3 and str(sticky5) == str(rc(sticky3)):
TypeError: DNA cannot be circularized.
5' and 3' sticky ends not compatible!
>>>
mung()[source]

Simulates treatment a nuclease with 5’-3’ and 3’-5’ single strand specific exonuclease activity (such as mung bean nuclease [5])

    ggatcc    ->     gatcc
     ctaggg          ctagg

     ggatcc   ->      ggatc
    tcctag            cctag

>>> from pydna.dseq import Dseq
>>> b=Dseq("caaa", "cttt")
>>> b
Dseq(-5)
caaa
 tttc
>>> b.mung()
Dseq(-3)
aaa
ttt
>>> c=Dseq("aaac", "tttg")
>>> c
Dseq(-5)
 aaac
gttt
>>> c.mung()
Dseq(-3)
aaa
ttt

References

[5]http://en.wikipedia.org/wiki/Mung_bean_nuclease
mw()[source]
ovhg

The ovhg property

rc()[source]

Returns a Dseq object where watson and crick have switched places. This represents the same double stranded sequence.

Examples

>>> from pydna.dseq import Dseq
>>> a=Dseq("catcgatc")
>>> a
Dseq(-8)
catcgatc
gtagctag
>>> b=a.reverse_complement()
>>> b
Dseq(-8)
gatcgatg
ctagctac
>>>
reverse_complement()[source]

Alias of the rc method

seguid()[source]
t4(*args, **kwargs)[source]

Alias for the T4() method

three_prime_end()[source]

Returns a tuple describing the structure of the 5’ end of the DNA fragment

>>> from pydna.dseq import Dseq
>>> a=Dseq("aaa", "ttt")
>>> a
Dseq(-3)
aaa
ttt
>>> a.three_prime_end()
('blunt', '')
>>> a=Dseq("aaa", "ttt", ovhg=1)
>>> a
Dseq(-4)
 aaa
ttt
>>> a.three_prime_end()
("3'", 'a')
>>> a=Dseq("aaa", "ttt", ovhg=-1)
>>> a
Dseq(-4)
aaa
 ttt
>>> a.three_prime_end()
("5'", 't')
>>>
tolinear()[source]

Returns a blunt, linear copy of a circular Dseq object. This can only be done if the Dseq object is circular, otherwise a TypeError is raised.

Examples

>>> from pydna.dseq import Dseq
>>> a=Dseq("catcgatc", circular=True)
>>> a
Dseq(o8)
catcgatc
gtagctag
>>> a.tolinear()
Dseq(-8)
catcgatc
gtagctag
>>>

pydna.dseqrecord module

Provides Dseqrecord, for handling double stranded DNA sequences. Dseq and Dseqrecord are subclasses of Biopythons Seq and SeqRecord classes, respectively. These classes support the notion of circular and linear DNA.

class pydna.dseqrecord.Dseqrecord(record, *args, circular=None, linear=None, n=5e-14, **kwargs)[source]

Bases: Bio.SeqRecord.SeqRecord

Dseqrecord is a double stranded version of the Biopython SeqRecord [6] class. The Dseqrecord object holds a Dseq object describing the sequence. Additionally, Dseqrecord hold meta information about the sequence in the from of a list of SeqFeatures, in the same way as the SeqRecord does. The Dseqrecord can be initialized with a string, Seq, Dseq, SeqRecord or another Dseqrecord. The sequence information will be stored in a Dseq object in all cases. Dseqrecord objects can be read or parsed from sequences in Fasta, Embl or Genbank format.

There is a short representation associated with the Dseqrecord. Dseqrecord(-3) represents a linear sequence of length 2 while Dseqrecord(o7) represents a circular sequence of length 7.

Dseqrecord and Dseq share the same concept of length

<-- length -->
GATCCTTT
     AAAGCCTAG
Parameters:

record : string, Seq, SeqRecord, Dseq or other Dseqrecord object

This data will be used to form the seq property

circular : bool, optional

True or False reflecting the shape of the DNA molecule

linear : bool, optional

True or False reflecting the shape of the DNA molecule

References

[6]http://biopython.org/wiki/SeqRecord

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("aaa")
>>> a
Dseqrecord(-3)
>>> a.seq
Dseq(-3)
aaa
ttt
>>> from Bio.Seq import Seq
>>> b=Dseqrecord(Seq("aaa"))
>>> b
Dseqrecord(-3)
>>> b.seq
Dseq(-3)
aaa
ttt
>>> from Bio.SeqRecord import SeqRecord
>>> c=Dseqrecord(SeqRecord(Seq("aaa")))
>>> c
Dseqrecord(-3)
>>> c.seq
Dseq(-3)
aaa
ttt
>>> a.seq.alphabet
IUPACAmbiguousDNA()
>>> b.seq.alphabet
IUPACAmbiguousDNA()
>>> c.seq.alphabet
IUPACAmbiguousDNA()
>>>
accession

alias for id property

add_feature(x=None, y=None, seq=None, label=None, type='misc', **kwargs)[source]

Adds a feature of type misc to the feature list of the sequence.

Parameters:

x : int

Indicates start of the feature

y : int

Indicates end of the feature

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("atgtaa")
>>> a.features
[]
>>> a.add_feature(2,4)
>>> a.features
[SeqFeature(FeatureLocation(ExactPosition(2), ExactPosition(4)), type='misc')]
circular

The circular property

cseguid()[source]

Returns the url safe cSEGUID for the sequence.

Only defined for circular sequences.

The cSEGUID checksum uniquely identifies a circular sequence regardless of where the origin is set. The two Dseqrecord objects below are circular permutations.

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("agtatcgtacatg", circular=True)
>>> a.cseguid() # cseguid is CTJbs6Fat8kLQxHj+/SC0kGEiYs
'CTJbs6Fat8kLQxHj-_SC0kGEiYs'
>>> a=Dseqrecord("gagtatcgtacat", circular=True)
>>> a.cseguid()
'CTJbs6Fat8kLQxHj-_SC0kGEiYs'
cut(*enzymes)[source]

Digest the Dseqrecord object with one or more restriction enzymes. returns a list of linear Dseqrecords. If there are no cuts, an empty list is returned.

See also Dseq.cut()

Parameters:

enzymes : enzyme object or iterable of such objects

A Bio.Restriction.XXX restriction object or iterable of such.

Returns:

Dseqrecord_frags : list

list of Dseqrecord objects formed by the digestion

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("ggatcc")
>>> from Bio.Restriction import BamHI
>>> a.cut(BamHI)
[Dseqrecord(-5), Dseqrecord(-5)]
>>> frag1, frag2 = a.cut(BamHI)
>>> frag1.seq
Dseq(-5)
g
cctag
>>> frag2.seq
Dseq(-5)
gatcc
    g
definition

alias for description property

extract_feature(n)[source]

Extracts a feature and creates a new Dseqrecord object.

Parameters:

n : int

Indicates the feature to extract

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("atgtaa")
>>> a.add_feature(2,4)
>>> b=a.extract_feature(0)
>>> b
Dseqrecord(-2)
>>> b.seq
Dseq(-2)
gt
ca
find(other)[source]
find_aa(other)[source]
find_aminoacids(other)[source]
>>> from pydna.dseqrecord import Dseqrecord
>>> s=Dseqrecord("atgtacgatcgtatgctggttatattttag")
>>> s.seq.translate()
Seq('MYDRMLVIF*', HasStopCodon(ExtendedIUPACProtein(), '*'))
>>> "RML" in s
True
>>> "MMM" in s
False
>>> s.seq.rc().translate()
Seq('LKYNQHTIVH', ExtendedIUPACProtein())
>>> "QHT" in s.rc()
True
>>> "QHT" in s
False
>>> slc = s.find_aa("RML")
>>> slc
slice(9, 18, None)
>>> s[slc]
Dseqrecord(-9)
>>> code = s[slc].seq
>>> code
Dseq(-9)
cgtatgctg
gcatacgac
>>> code.translate()
Seq('RML', ExtendedIUPACProtein())
format(f='gb')[source]

Returns the sequence as a string using a format supported by Biopython SeqIO [7]. Default is “gb” which is short for Genbank.

References

[7]http://biopython.org/wiki/SeqIO

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> x=Dseqrecord("aaa")
>>> x.annotations['date'] = '02-FEB-2013'
>>> x
Dseqrecord(-3)
>>> print(x.format("gb"))
LOCUS       name?                      3 bp    DNA     linear   UNK 02-FEB-2013
DEFINITION  description?
ACCESSION   id?
VERSION     id?
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
ORIGIN
        1 aaa
//
gc()[source]

Returns GC content

isorf(table=1)[source]

Detects if sequence is an open reading frame (orf) in the 5’-3’ direction. Translation tables are numbers according to the NCBI numbering [8].

Parameters:

table : int

Sets the translation table, default is 1 (standard code)

Returns:

bool

True if sequence is an orf, False otherwise.

References

[8]http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("atgtaa")
>>> a.isorf()
True
>>> b=Dseqrecord("atgaaa")
>>> b.isorf()
False
>>> c=Dseqrecord("atttaa")
>>> c.isorf()
False
linear

The linear property

linearize(*enzymes)[source]

This method is similar to cut() but throws an exception if there is not excactly on cut i.e. none or more than one digestion products.

list_features()[source]

Prints an ASCII table with all features.

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("atgtaa")
>>> a.add_feature(2,4)
>>> print(a.list_features())
+----------+-----------+-------+-----+--------+--------------+------+------+
| Feature# | Direction | Start | End | Length | id           | type | orf? |
+----------+-----------+-------+-----+--------+--------------+------+------+
| 0        |    None   |   2   |  4  |      2 | <unknown id> | misc |  no  |
+----------+-----------+-------+-----+--------+--------------+------+------+
>>>
locus

alias for name property

looped()[source]

Returns a circular version of the Dseqrecord object. The underlying Dseq object has to have compatible ends.

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("aaa")
>>> a
Dseqrecord(-3)
>>> b=a.looped()
>>> b
Dseqrecord(o3)
>>>
lseguid()[source]

Returns the url safe lSEGUID for the sequence.

Only defined for linear double stranded sequences.

The lSEGUID checksum uniquely identifies a linear sequence independent of the direction. The two Dseqrecord objects below are each others reverse complements, so they do in fact refer to the same molecule.

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("agtatcgtacatg")
>>> a.lseguid()
'DPshMN4KeAjMovEjGEV4Kzj18lU'
>>> b=Dseqrecord("catgtacgatact")
>>> a.lseguid()
'DPshMN4KeAjMovEjGEV4Kzj18lU'
m()[source]
map_trace_files(pth)[source]
number_of_cuts(*enzymes)[source]
olaps(other, *args, **kwargs)[source]
rc()[source]

alias of the reverse_complement method

reverse_complement()[source]

Returns the reverse complement.

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("ggaatt")
>>> a
Dseqrecord(-6)
>>> a.seq
Dseq(-6)
ggaatt
ccttaa
>>> a.reverse_complement().seq
Dseq(-6)
aattcc
ttaagg
>>>
seguid()[source]
Returns the url safe SEGUID [9] for the sequence.
This checksum is the same as seguid but with base64.urlsafe encoding [10] instead of the normal base 64. This means that the characters + and / are replaced with - and _ so that the checksum can be a pert of and URL or a filename.

References

[9]http://wiki.christophchamp.com/index.php/SEGUID

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("aaaaaaa")
>>> a.seguid() # original seguid is +bKGnebMkia5kNg/gF7IORXMnIU
'-bKGnebMkia5kNg_gF7IORXMnIU'
shifted(shift)[source]

Returns a circular Dseqrecord with a new origin <shift>. This only works on circular Dseqrecords. If we consider the following circular sequence:

GAAAT   <-- watson strand
CTTTA   <-- crick strand

The T and the G on the watson strand are linked together as well as the A and the C of the of the crick strand.

if shift is 1, this indicates a new origin at position 1:

new origin at the | symbol:

G|AAAT
C|TTTA

new sequence:

AAATG
TTTAC

Shift is always positive and 0<shift<length, so in the example below, permissible values of shift are 1,2 and 3

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("aaat",circular=True)
>>> a
Dseqrecord(o4)
>>> a.seq
Dseq(o4)
aaat
ttta
>>> b=a.shifted(1)
>>> b
Dseqrecord(o4)
>>> b.seq
Dseq(o4)
aata
ttat
spread_ape_colors()[source]

This method assigns random colors compatible with the ApE editor to features.

stamp()[source]

Adds a SEGUID or cSEGUID checksum and a datestring to the description property. This will show in the genbank format.

For linear sequences:

SEGUID_<seguid>_<datestring>

For circular sequences:

cSEGUID_<seguid>_<datestring>

If there is already a stamp,

The stamp can be verified with verify_stamp()

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("aaa")
>>> a.stamp()
'SEGUID_YG7G6b2Kj_KtFOX63j8mRHHoIlE...'
>>> a.description
'SEGUID_YG7G6b2Kj_KtFOX63j8mRHHoIlE...'
>>> a.verify_stamp()
'SEGUID_YG7G6b2Kj_KtFOX63j8mRHHoIlE'
synced(*args, **kwargs)
tolinear()[source]

Returns a linear, blunt copy of a circular Dseqrecord object. The underlying Dseq object has to be circular.

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("aaa", circular = True)
>>> a
Dseqrecord(o3)
>>> b=a.tolinear()
>>> b
Dseqrecord(-3)
>>>
verify_stamp()[source]

Verifies the SEGUID stamp in the description property is valid. True if stamp match the sequid calculated from the sequence. Exception raised if no stamp can be found.

>>> from pydna.dseqrecord import Dseqrecord
>>> from pydna.readers import read
>>> b=read(">a\naaa")
>>> b.annotations['date'] = '02-FEB-2013'
>>> b.seguid()
'YG7G6b2Kj_KtFOX63j8mRHHoIlE'
>>> print(b.format("gb"))
LOCUS       a                          3 bp    DNA     linear   UNK 02-FEB-2013
DEFINITION  a
ACCESSION   a
VERSION     a
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
ORIGIN
        1 aaa
//
>>> b.stamp()
'SEGUID_YG7G6b2Kj_KtFOX63j8mRHHoIlE'
>>> b
Dseqrecord(-3)
>>> print(b.format("gb"))
LOCUS       a                          3 bp    DNA     linear   UNK 02-FEB-2013
DEFINITION  a SEGUID_YG7G6b2Kj_KtFOX63j8mRHHoIlE_...
ACCESSION   a
VERSION     a
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
ORIGIN
        1 aaa
//
>>> b.verify_stamp()
'SEGUID_YG7G6b2Kj_KtFOX63j8mRHHoIlE'
>>>
write(filename=None, f='gb')[source]

Writes the Dseqrecord to a file using the format f, which must be a format supported by Biopython SeqIO for writing [11]. Default is “gb” which is short for Genbank. Note that Biopython SeqIO reads more formats than it writes.

Filename is the path to the file where the sequece is to be written. The filename is optional, if it is not given, the description property (string) is used together with the format.

If obj is the Dseqrecord object, the default file name will be:

<obj.locus>.<f>

Where <f> is “gb” by default. If the filename already exists and AND the sequence it contains is different, a new file name will be used so that the old file is not lost:

<obj.locus>_NEW.<f>

References

[10]http://biopython.org/wiki/SeqIO

pydna.editor module

This module provides a class for opening a sequence using an editor that accepts a file as a command line argument.

ApE - A plasmid Editor [12] is and excellent editor for this purpose.

References

[11]http://biologylabs.utah.edu/jorgensen/wayned/ape/
class pydna.editor.Editor(shell_command_for_editor, tmpdir=None)[source]

Bases: object

The Editor class needs to be instantiated before use.

Parameters:

shell_command_for_editor : str

String containing the path to the editor

tmpdir : str, optional

String containing path to the temprary directory where sequence files are stored before opening.

Examples

>>> import pydna
>>> #ape = pydna.Editor("tclsh8.6 /home/bjorn/.ApE/apeextractor/ApE.vfs/lib/app-AppMain/AppMain.tcl")
>>> #ape.open("aaa") # This command opens the sequence in the ApE editor
open(seq)[source]

Open a sequence for editing in an external (DNA) editor.

Parameters:args : SeqRecord or Dseqrecord object
pydna.editor.ape(*args, **kwargs)[source]

pydna.gel module

Provides the class Gel for the simulation of agarose slab-gel electrophoresis of DNA at constant electric field.

Note

This code is at an early stage of development and documentation.

pydna.gel.DKuhn(kB, T, eta, l)
pydna.gel.Dblob(kB, T, eta, a)
class pydna.gel.Fakeseq(l, n=1e-11)[source]

Bases: object

pydna.gel.Gauss_FWHM(FWTM)
pydna.gel.Gauss_dev(FWHM)
pydna.gel.Gauss_hgt(auc, dev)
pydna.gel.Gaussian(x, hgt, ctr, dev)
class pydna.gel.Gel(samples, names=None, percentgel=<Quantity(1.0, 'gram / milliliter')>, electrfield=<Quantity(5.0, 'volt / centimeter')>, temperature=<Quantity(295.15, 'kelvin')>, gel_len=<Quantity(8, 'centimeter')>, wellxy=<Quantity([7 2], 'millimeter')>, wellsep=<Quantity(2, 'millimeter')>)[source]

Bases: object

Gel object for DNA slab-gel electrophoresis simulation.

Class designed to store information regarding an electrophoresis experimental setup. Includes gel’s and well’s dimensions, DNA samples loaded, agarose concentration, electric field intensity and temperature.

Parameters:

samples : list of lists of pydna.Dseqrecord objects

List of samples with the DNA fragments to separate. Each sample corresponds to a different well. Accepts lists of pydna.Dseq objects and converts them to Dseqrecords with the default quantity (n parameter).

names : list of str, optional

List with the identifiers of the samples. Defaults to None, in which case the identifiers are of the form “lane i” where i is the index.

percentgel : pint.unit.Quantity object, float or int, optional

Agarose concentration in the gel. Defaults to Q_(1.0, ‘(g/(100 mL))*100’). If a float or int is given assumes grams of agarose per milliliter of buffer.

electrfield : pint.unit.Quantity object, float or int, optional

Electric field intensity. Defaults to Q_(5.0, ‘V/cm’). If a float or int is given assumes Volts per centimeter.

temperature : pint.unit.Quantity object, float or int, optional

Absolute temperature. Defaults to Q_(295.15, ‘K’). If a float or int is given assumes Kelvin. The temperature is only considered in the calculation of the diffusional component of the band width.

gel_len : pint.unit.Quantity object, float or int, optional

Gel length (as measured from the bottom of the well in the direction of the DNA migration). Defaults to Q_(8, ‘cm’). If a float or int is given assumes centimeters.

wellxy : iterable of pint.unit.Quantity object, float or int, optional

Well’s dimensions. Well’s width (perpendicular to the migration direction) and well’s height (migration direction). Defaults to Q_([7, 2], ‘mm’). If a float or int is given assumes millimeters. The well’s width is merely used for aesthetic purposes. The well’s height is used to compute the initial band width.

wellsep : pint.unit.Quantity object, float or int, optional

Separation between wells. Defaults to Q_(2, ‘mm’). Merelly for aestetic purposes.

Notes

  • The DNA is diluted in the full volume of the well.
  • The previous implies that the injection size, which relates to

the initial bandwidth, is assumed to be well height (welly). * The DNA electrophoretic mobility in the well is given by the free solution mobility and is equal for every fragment.

run(till_len=0.75, till_time=None, exposure=0.5, plot=True, res=<Quantity(500, 'pixel / inch')>, cursor_ovr={'hover': False}, back_col=0.3, band_col=1, well_col=0.05, noise=0.015, detectlim=0.04, interpol='linear', dset_name='vertical', replNANs=True)[source]

Run electrophoresis.

Run the electrophoresis procedure until a specified time or until the quickest band reaches a specified gel length. If both conditions are given the most stringent will be respected.

Parameters:

till_len : float (0 to 1), optional

Fraction of the gel length (measured from the well bottom) to serve as finish line. Defaults to 0.75.

till_time : pint.unit.Quantity object, float or int, optional

Time at which to stop the electrophoresis. Defaults to None. If float or int is given assumes hours.

exposure : float (0 to 1)

Fraction of signal saturation. If exposure is set to 0 every band’s light intensity will be given by a perfect Gaussian curve. Wider bands with lower amounts of DNA might be hard to see. The closer to 1 the exposure the higher the saturation, which favors the weaker bands visualization. If exposure is set to 1 only the weakest band will be a Gaussian, all others will be saturated. Defaults to 0.5.

plot : {True, False}, optional

Whether to draw and return the gel picture or not. The electrophoresis data (such as migrated distances) will be stored in the Gel object either way.

res : pint.unit.Quantity object, float or int, optional

Resolution used to construct an array of light intensities corresponding to the gel picture. The dimensions of the intensities array are given by the gel’s dimensions multiplied by the resolution. A higher resolution implies more calculations and might cause the method to become slow. A lower resolution might result in a very pixelated image. Defaults to Q_(500, ‘px/in’). If float or int is given assumes pixels per inch. This resolution won’t necessarily convey to the final gel picture since the intensity array is processed by matplotlib.pyplot.imshow.

cursor_ovr : dict, optional

Key arguments to be passed to mpldatacursor.datacursor which provides interactive cursor functions to the resulting plot. Defaults to dict(hover=False).

back_col : float or int, optional

Background color (light intensity). Defaults to 0.3. Solid black (0) allows the maximum contrast.

band_col : float or int, optional

Band color (maximum light intensity). Defaults to 1. White (1) allows for maximum contrast.

well_col : float or int, optional

Well color (light intensity). Defaults to 0.05.

noise : float or int, optional

Standard deviation used to generate a Normal distribution of noise centered in the background color. This effect is purely aesthetic. Defaults to 0.015.

detectlim : float, optional

Minimal light intensity difference between the center of the band and the background color (back_col) below which the band is considered to be indistinguishable and a white doted outline is drawn around it. Defaults to 0.04.

interpol : {‘linear’, ‘cubic’, ‘nearest’}, optional

Interpolation method. This is passed to scipy.interpolate.griddata for the interpolation of the vWBR’s parameters from the experimental datasets (mu_S, mu_L, gamma = f(field, agarose concentration) [1-3]. Defaults to ‘linear’ which is more conservative. ‘nearest’ is not expected to provide good results.

dset_name : {‘vertical’, ‘horizontal’}, optional

Dataset identifier (str). Identifies the dataset to use. Currently two datasets are available (‘vertical’ and ‘horizontal’) [1-3]. The names alude to the geometry of the experimental setup. The geometry however is not expected to cause significant differences on the results.

replNANs : {True, False}, optional

Whether to replace NANs (not a number) by ‘nearest’ interpolation. NANs will only be produced if the conditions of electric field intensity and agarose concentration go beyond the concave space provided by the dataset.

References

[R1]Van Winkle, D.H., Beheshti, A., Rill, R.L.: DNA

electrophoresis in agarose gels: A simple relation describing the length dependence of mobility. ELECTROPHORESIS 23(1), 15–19 (2002)

[R2]Rill, R.L., Beheshti, A., Van Winkle, D.H.: DNA

electrophoresis in agarose gels: Effects of field and gel concentration on the exponential dependence of reciprocal mobility on DNA length. ELECTROPHORESIS 23(16), 2710–2719 (2002)

[R3]Beheshti, A.: DNA Electrophoresis in th Agarose Gels: A New

Mobility vs. DNA Length Dependence. Electronic Theses, Treatises and Dissertations (Paper 1207) (2002)

set_DNAspace_for_mu0(DNA_space)[source]
set_DNAspace_for_vWBRfit(DNA_space)[source]
set_Tvals_for_mu0(Tvals)[source]
set_field(electrfield)[source]
set_gelength(gel_len)[source]
set_percentgel(percentgel)[source]
set_temperature(temperature)[source]
set_wellsep(wellsep)[source]
set_wellx(wellx)[source]
set_welly(welly)[source]
pydna.gel.H2Oviscosity(T)
pydna.gel.NKuhn_to_N(NKuhn, l, a)
pydna.gel.NKuhn_to_Nbp(NKuhn, b, l)
pydna.gel.N_to_NKuhn(N, a, l)
pydna.gel.N_to_Nbp(N, a, b, l)
pydna.gel.Nbp_to_N(Nbp, a, b, l)
pydna.gel.Nbp_to_NKuhn(Nbp, b, l)
pydna.gel.Ogston_Rouse(Nbp, kB, T, a, eta, b, l)
pydna.gel.Ogston_Zimm(D0, g)
pydna.gel.Zimm_Rouse(x0, args)
pydna.gel.Zimm_g(Nbp, DRouse, qeff, mu0, kB, T)
pydna.gel.accel_plateau(epsilon)
pydna.gel.assign_quantitiesB(samples, maxdef=<Quantity(150, 'nanogram')>)[source]

Assigns quantities (masses in nanograms) to the DNA fragments without corresponding quantity assuming a linear relationship between the DNA length (in basepairs) and its mass. As if the fragments originated in a restriction procedure. For each sample takes the maximum quantity (either from the other samples or from the default) and assigns it to the fragment with greater length.

pydna.gel.bandbroadening(D, time)
pydna.gel.contour_length(Nbp, b)
pydna.gel.diff_Zimm_Rouse(Nbp, args)[source]
pydna.gel.diffusion_coefficient(Nbp, N_lim1, N_lim2, N_lim3, args)[source]
pydna.gel.dim_or_units(quantity, reference)[source]

If <quantity> is not an instance of <Quantity>, instantiate a <Quantity> object with <quantity> as magnitude and as <reference.units> as units. If <quantity> is an instance of <Quantity>, check whether it has the same dimensionality as <reference>.

pydna.gel.equil_accel(epsilon)
pydna.gel.ferguson_to_mu0(field, Tvals, DNAvals, dataset, mu_func, adjmethod='linear', replNANs=True, plot=True)[source]

This function extrapolates the free solution mobility (mu0) for a specified electric field intensity (field), via Ferguson plot (ln(mobility) vs. %agarose).

Mobiliy calculation method: [E,T,muS,muL,gamma] -> [E,T,mu(L)] -(L*)-> [E,T,mu] -(E*,T*,interp.)-> mu*

pydna.gel.flatten(List)[source]
pydna.gel.free_solution(kB, T, eta, Rh)
pydna.gel.gelplot_imshow(distances, bandwidths, intensities, lanes, names, gel_len, wellx, welly, wellsep, res, cursor_ovr, back_col, band_col, well_col, noise, Itol, detectlim, title, FWTM, show=True)[source]
pydna.gel.gen_sample(sizes, quantities)[source]

Return list of pydna Dseqrecords of given size and quantity.

If a single quantity is given it is divided by the DNA fragments in proportion to their length.

Parameters:

sizes : iterable of pint.unit.Quantity objects or ints

List of DNA sizes in base pairs (bp).

quantities : iterable of pint.unit.Quantity objects, floats or ints

List of DNA weights in nanograms (ng). If a single quantity is given (pint.unit.Quantity object, float or int) it is divided linearly by the DNA fragments length.

Examples

Direct quantity assignment without declared units.

>>> sizes = [3000, 500, 1500]  # bp
>>> qts = [70.0, 11.7, 35.0]  # ng
>>> sample = gen_sample(sizes, qts)
>>> sample
[Dseqrecord(-3000), Dseqrecord(-500), Dseqrecord(-1500)]
>>> sample[0].m()  # g
7.0000000000000005e-08
>>> Q_([dna.m() for dna in sample], 'g').to('ng')
<Quantity([ 70.   11.7  35. ], 'nanogram')>
>>> Q_([dna.n for dna in sample], 'mol').to('pmol')
<Quantity([ 0.03776682  0.03786665  0.03776521], 'picomole')>

Direct quantity assignment with declared units.

>>> sizes = Q_([3000, 500, 1500], 'bp')
>>> qts = Q_([70.0, 11.7, 35.0],  'ng')
>>> sample = gen_sample(sizes, qts)
>>> sample
[Dseqrecord(-3000), Dseqrecord(-500), Dseqrecord(-1500)]
>>> Q_([dna.m() for dna in sample], 'g').to('ng')
<Quantity([ 70.   11.7  35. ], 'nanogram')>
>>> Q_([dna.n for dna in sample], 'mol').to('pmol')
<Quantity([ 0.03776682  0.03786665  0.03776521], 'picomole')>

Assignment of total quantity (with declared units).

>>> sample = gen_sample(Q_([3000, 500, 1500], 'bp'), Q_(200, 'ng'))
>>> sample
[Dseqrecord(-3000), Dseqrecord(-500), Dseqrecord(-1500)]
>>> Q_([dna.m() for dna in sample], 'g').to('ng')
<Quantity([ 120.   20.   60.], 'nanogram')>
>>> Q_([dna.n for dna in sample], 'mol').to('pmol')
<Quantity([ 0.06474311  0.06472932  0.06474035], 'picomole')>
pydna.gel.lin_div_Qty(sample, quantity, criteria=<built-in function len>)[source]

Linearly divides a quantity by the elements of sample considering a criteria. Criteria must by a function that returns a number upon being applied to each element of sample.

pydna.gel.logspace_int(minimum, maximum, divs)[source]
pydna.gel.pore_size(gamma, muL, mu0, lp, b)
pydna.gel.pore_size_fit(C)
pydna.gel.radius_gyration(L, lp)
pydna.gel.random_Dseqs(sizes)[source]

Return a list of pydna Dseqs of given sizes and random sequences.

pydna.gel.reduced_field(eta, a, mu0, E, kB, T)
pydna.gel.reduced_field_Kuhn(eta, l, mu0, E, kB, T)
pydna.gel.reptation_accelerated(Dblob, epsilon, N)
pydna.gel.reptation_equilibrium(Dblob, N)
pydna.gel.reptation_plateau(Dblob, epsilon)
pydna.gel.rundistance(time, mobility, field)
pydna.gel.runtime(distance, mobility, field)
pydna.gel.size_to_mobility(dna_len, field, percentage, mu_func=<function vWBR.<locals>.<lambda>>, dataset={'muL': <Quantity([ 2.10000000e-09 3.40000000e-09 8.10000000e-09 7.70000000e-09 7.80000000e-09 8.90000000e-09 9.00000000e-09 9.80000000e-09 1.22000000e-08 1.20000000e-08 8.36000000e-14 9.00000000e-10 4.90000000e-09 5.30000000e-09 6.20000000e-09 5.50000000e-09 7.20000000e-09 6.90000000e-09 9.30000000e-09 1.03000000e-08 1.82000000e-22 1.37000000e-21 1.70000000e-09 3.30000000e-09 4.70000000e-09 3.80000000e-09 5.20000000e-09 6.10000000e-09 7.70000000e-09 8.80000000e-09 1.87000000e-18 3.80000000e-14 2.00000000e-10 2.70000000e-09 3.60000000e-09 2.10000000e-09 4.50000000e-09 5.00000000e-09 4.90000000e-09 5.90000000e-09], 'meter ** 2 / second / volt')>, 'T': <Quantity([ 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.3 1.3 1.3 1.3 1.3 1.3 1.3 1.3 1.3 1.3], 'gram / milliliter')>, 'E': <Quantity([ 0.62 0.93 1.24 1.55 1.86 2.17 2.48 3.1 4.97 6.21 0.62 0.93 1.24 1.55 1.86 2.17 2.48 3.1 4.97 6.21 0.62 0.93 1.24 1.55 1.86 2.17 2.48 3.1 4.97 6.21 0.62 0.93 1.24 1.55 1.86 2.17 2.48 3.1 4.97 6.21], 'volt / centimeter')>, 'gamma': <Quantity([ 5.27990000e+01 2.53040000e+01 1.49220000e+01 1.44880000e+01 1.26030000e+01 1.19110000e+01 1.26480000e+01 9.42300000e+00 9.63000000e+00 9.21400000e+00 7.32000000e+05 5.42640000e+01 1.52550000e+01 1.31250000e+01 9.83400000e+00 1.01510000e+01 9.03600000e+00 7.71300000e+00 6.18800000e+00 6.37100000e+00 1.83000000e+14 2.33000000e+13 2.58550000e+01 1.27730000e+01 8.50200000e+00 9.37200000e+00 7.20900000e+00 5.85200000e+00 4.74900000e+00 4.41700000e+00 1.11000000e+10 6.10000000e+05 1.79189000e+02 1.19200000e+01 8.59400000e+00 1.24120000e+01 7.46400000e+00 5.15900000e+00 4.04800000e+00 3.55500000e+00], 'kilobase_pair')>, 'muS': <Quantity([ 2.67000000e-08 2.54000000e-08 3.67000000e-08 3.18000000e-08 3.12000000e-08 3.07000000e-08 3.13000000e-08 3.14000000e-08 3.34000000e-08 3.11000000e-08 2.86000000e-08 2.38000000e-08 3.71000000e-08 3.61000000e-08 3.61000000e-08 3.07000000e-08 3.42000000e-08 3.14000000e-08 3.08000000e-08 3.24000000e-08 2.48000000e-08 2.41000000e-08 3.34000000e-08 3.25000000e-08 3.69000000e-08 2.87000000e-08 3.39000000e-08 3.36000000e-08 3.04000000e-08 3.13000000e-08 2.12000000e-08 1.96000000e-08 3.32000000e-08 3.47000000e-08 3.97000000e-08 2.76000000e-08 3.26000000e-08 3.37000000e-08 2.68000000e-08 2.70000000e-08], 'meter ** 2 / second / volt')>}, method='linear', replNANs=True)[source]
pydna.gel.to_units(quantity, units, var_name=None)[source]

Asserts that the quantity has the proper dimensions (inferred from the default units) if the quantity is an instance of pint.unit.Quantity or assigns the default units if it’s not.

pydna.gel.vWBR(muS, muL, gamma)[source]

vWBR equation

pydna.gel.vWBRfit(field, percentage, DNAvals=array([ 100. , 604.04040404, 1108.08080808, 1612.12121212, 2116.16161616, 2620.2020202 , 3124.24242424, 3628.28282828, 4132.32323232, 4636.36363636, 5140.4040404 , 5644.44444444, 6148.48484848, 6652.52525253, 7156.56565657, 7660.60606061, 8164.64646465, 8668.68686869, 9172.72727273, 9676.76767677, 10180.80808081, 10684.84848485, 11188.88888889, 11692.92929293, 12196.96969697, 12701.01010101, 13205.05050505, 13709.09090909, 14213.13131313, 14717.17171717, 15221.21212121, 15725.25252525, 16229.29292929, 16733.33333333, 17237.37373737, 17741.41414141, 18245.45454545, 18749.49494949, 19253.53535354, 19757.57575758, 20261.61616162, 20765.65656566, 21269.6969697 , 21773.73737374, 22277.77777778, 22781.81818182, 23285.85858586, 23789.8989899 , 24293.93939394, 24797.97979798, 25302.02020202, 25806.06060606, 26310.1010101 , 26814.14141414, 27318.18181818, 27822.22222222, 28326.26262626, 28830.3030303 , 29334.34343434, 29838.38383838, 30342.42424242, 30846.46464646, 31350.50505051, 31854.54545455, 32358.58585859, 32862.62626263, 33366.66666667, 33870.70707071, 34374.74747475, 34878.78787879, 35382.82828283, 35886.86868687, 36390.90909091, 36894.94949495, 37398.98989899, 37903.03030303, 38407.07070707, 38911.11111111, 39415.15151515, 39919.19191919, 40423.23232323, 40927.27272727, 41431.31313131, 41935.35353535, 42439.39393939, 42943.43434343, 43447.47474747, 43951.51515152, 44455.55555556, 44959.5959596 , 45463.63636364, 45967.67676768, 46471.71717172, 46975.75757576, 47479.7979798 , 47983.83838384, 48487.87878788, 48991.91919192, 49495.95959596, 50000. ]), dataset={'muL': <Quantity([ 2.10000000e-09 3.40000000e-09 8.10000000e-09 7.70000000e-09 7.80000000e-09 8.90000000e-09 9.00000000e-09 9.80000000e-09 1.22000000e-08 1.20000000e-08 8.36000000e-14 9.00000000e-10 4.90000000e-09 5.30000000e-09 6.20000000e-09 5.50000000e-09 7.20000000e-09 6.90000000e-09 9.30000000e-09 1.03000000e-08 1.82000000e-22 1.37000000e-21 1.70000000e-09 3.30000000e-09 4.70000000e-09 3.80000000e-09 5.20000000e-09 6.10000000e-09 7.70000000e-09 8.80000000e-09 1.87000000e-18 3.80000000e-14 2.00000000e-10 2.70000000e-09 3.60000000e-09 2.10000000e-09 4.50000000e-09 5.00000000e-09 4.90000000e-09 5.90000000e-09], 'meter ** 2 / second / volt')>, 'T': <Quantity([ 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.3 1.3 1.3 1.3 1.3 1.3 1.3 1.3 1.3 1.3], 'gram / milliliter')>, 'E': <Quantity([ 0.62 0.93 1.24 1.55 1.86 2.17 2.48 3.1 4.97 6.21 0.62 0.93 1.24 1.55 1.86 2.17 2.48 3.1 4.97 6.21 0.62 0.93 1.24 1.55 1.86 2.17 2.48 3.1 4.97 6.21 0.62 0.93 1.24 1.55 1.86 2.17 2.48 3.1 4.97 6.21], 'volt / centimeter')>, 'gamma': <Quantity([ 5.27990000e+01 2.53040000e+01 1.49220000e+01 1.44880000e+01 1.26030000e+01 1.19110000e+01 1.26480000e+01 9.42300000e+00 9.63000000e+00 9.21400000e+00 7.32000000e+05 5.42640000e+01 1.52550000e+01 1.31250000e+01 9.83400000e+00 1.01510000e+01 9.03600000e+00 7.71300000e+00 6.18800000e+00 6.37100000e+00 1.83000000e+14 2.33000000e+13 2.58550000e+01 1.27730000e+01 8.50200000e+00 9.37200000e+00 7.20900000e+00 5.85200000e+00 4.74900000e+00 4.41700000e+00 1.11000000e+10 6.10000000e+05 1.79189000e+02 1.19200000e+01 8.59400000e+00 1.24120000e+01 7.46400000e+00 5.15900000e+00 4.04800000e+00 3.55500000e+00], 'kilobase_pair')>, 'muS': <Quantity([ 2.67000000e-08 2.54000000e-08 3.67000000e-08 3.18000000e-08 3.12000000e-08 3.07000000e-08 3.13000000e-08 3.14000000e-08 3.34000000e-08 3.11000000e-08 2.86000000e-08 2.38000000e-08 3.71000000e-08 3.61000000e-08 3.61000000e-08 3.07000000e-08 3.42000000e-08 3.14000000e-08 3.08000000e-08 3.24000000e-08 2.48000000e-08 2.41000000e-08 3.34000000e-08 3.25000000e-08 3.69000000e-08 2.87000000e-08 3.39000000e-08 3.36000000e-08 3.04000000e-08 3.13000000e-08 2.12000000e-08 1.96000000e-08 3.32000000e-08 3.47000000e-08 3.97000000e-08 2.76000000e-08 3.26000000e-08 3.37000000e-08 2.68000000e-08 2.70000000e-08], 'meter ** 2 / second / volt')>}, mu_func=<function vWBR.<locals>.<lambda>>, method='linear', replNANs=True, plot=True)[source]
pydna.gel.weight_standard_sample(key, qty=<Quantity(500, 'nanogram')>)[source]

pydna.genbank module

Provides a class for downloading sequences from genbank.

class pydna.genbank.Genbank(users_email, *args, tool='pydna', **kwargs)[source]

Bases: object

Class to facilitate download from genbank.

Parameters:

users_email : string

Has to be a valid email address. You should always tell Genbanks who you are, so that they can contact you.

Examples

>>> from pydna.genbank import Genbank                                                           
>>> gb=Genbank("bjornjobb@gmail.com")                               
>>> rec = gb.nucleotide("AJ515731")                 # <- entry from genbank                  
>>> print(len(rec))                                                        
19
nucleotide(*args, **kwargs)
pydna.genbank.genbank(accession)[source]

pydna.genbankfile module

class pydna.genbankfile.GenbankFile(record, *args, path=None, **kwargs)[source]

Bases: pydna.dseqrecord.Dseqrecord

pydna.genbankfixer module

This module provides a way to clean up broken Genbank files enough to pass the BioPython Genbank parser This parser is based on pyparsing. Almost all of this code was lifted from BioJSON (https://github.com/levskaya/BioJSON) by Anselm Levskaya. The code put up was not accompanied by any software licence.

There are some modifications to deal with fringe cases.

The parser first produces JSON as an intermediate format which is then formatted back into a string in Genbank format.

The parser is not complete, so some fields do not survive the roundtrip (see below). This should not be a difficult fix, though. The returned result has two properties, .jseq which is the intermediate JSON produced by the parser and .gbtext which is the formatted genbank string.

pydna.genbankfixer.concat_dict(dlist)[source]

more or less dict(list of string pairs) but merges vals with the same keys so no duplicates occur

pydna.genbankfixer.gbtext_clean(gbtext)[source]
pydna.genbankfixer.locstr(locs, strand)[source]

genbank formatted location string, assumes no join’d combo of rev and fwd seqs

pydna.genbankfixer.originstr(sequence)[source]

formats dna sequence as broken, numbered lines ala genbank

pydna.genbankfixer.parseGBLoc(s, l, t)[source]

retwingles parsed genbank location strings, assumes no joins of RC and FWD sequences

pydna.genbankfixer.strip_indent(str)[source]
pydna.genbankfixer.strip_multiline(s, l, t)[source]
pydna.genbankfixer.toGB(jseqs)[source]

parses json jseq data and prints out ApE compatible genbank

pydna.genbankfixer.toInt(s, l, t)[source]
pydna.genbankfixer.toJSON(gbkstring)[source]
pydna.genbankfixer.wrapstring(str_, rowstart, rowend, padfirst=True)[source]

wraps the provided string in lines of length rowend-rowstart and padded on the left by rowstart. -> if padfirst is false the first line is not padded

pydna.genbankrecord module

class pydna.genbankrecord.GenbankRecord(record, *args, item='accession', start=None, stop=None, strand=1, **kwargs)[source]

Bases: pydna.dseqrecord.Dseqrecord

pydna.ipynb_importer module

class pydna.ipynb_importer.NotebookFinder[source]

Bases: object

Module finder that locates IPython Notebooks

find_module(fullname, path=None)[source]
class pydna.ipynb_importer.NotebookLoader(path=None)[source]

Bases: object

Module Loader for IPython Notebooks

load_module(fullname)[source]

import a notebook as a module

pydna.myprimers module

This module provides three ways to access a primer list specified in the pydna.ini file.

pydna.myprimers.append_primer_list(primers)[source]

pydna.parsers module

Provides two classes, Dseq and Dseqrecord, for handling double stranded DNA sequences. Dseq and Dseqrecord are subclasses of Biopythons Seq and SeqRecord classes, respectively. These classes support the notion of circular and linear DNA.

pydna.parsers.parse(data, ds=True)[source]

This function returns all DNA sequences found in data. If no sequences are found, an empty list is returned. This is a greedy function, use carefully.

Parameters:

data : string or iterable

The data parameter is a string containing:

  1. an absolute path to a local file. The file will be read in text mode and parsed for EMBL, FASTA and Genbank sequences.
  2. a string containing one or more sequences in EMBL, GENBANK, or FASTA format. Mixed formats are allowed.
  3. data can be a list or other iterable where the elements are 1 or 2

ds : bool

If True double stranded Dseqrecord objects are returned. If False single stranded Bio.SeqRecord [13] objects are returned.

Returns:

list

contains Dseqrecord or SeqRecord objects

See also

read

References

[12]http://biopython.org/wiki/SeqRecord
pydna.parsers.parse2(data, ds=True)[source]

experimental

pydna.parsers.parse_primers(data)[source]

pydna.primer module

This module contain functions for primer design.

class pydna.primer.Primer(record, *args, template=None, position=None, footprint=None, concentration=1000.0, **kwargs)[source]

Bases: Bio.SeqRecord.SeqRecord

This class can hold information about a primer and its position on a template footprint and tail.

footprint
tail
tm(saltc=50.0, formula=<function tmbresluc>)[source]

pydna.readers module

Provides two classes, Dseq and Dseqrecord, for handling double stranded DNA sequences. Dseq and Dseqrecord are subclasses of Biopythons Seq and SeqRecord classes, respectively. These classes support the notion of circular and linear DNA.

pydna.readers.read(data, ds=True)[source]

This function is similar the parse() function but expects one and only one sequence or and exception is thrown.

Parameters:

data : string

see below

ds : bool

Double stranded or single stranded DNA, if True return Dseqrecord objects, else Bio.SeqRecord objects.

Returns:

Dseqrecord

contains the first Dseqrecord or SeqRecord object parsed.

See also

parse

Notes

The data parameter is similar to the data parameter for parse().

pydna.readers.read_primer(data)[source]

pydna.seqrecord module

class pydna.seqrecord.SeqRecord(*args, **kwargs)[source]

Bases: Bio.SeqRecord.SeqRecord

accession

alias for id property

definition

alias for description property

gc()[source]

Returns GC content

isorf(table=1)[source]

Detects if sequence is an open reading frame (orf) in the 5’-3’ direction. Translation tables are numbers according to the NCBI numbering [14].

Parameters:

table : int

Sets the translation table, default is 1 (standard code)

Returns:

bool

True if sequence is an orf, False otherwise.

References

[13]http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("atgtaa")
>>> a.isorf()
True
>>> b=Dseqrecord("atgaaa")
>>> b.isorf()
False
>>> c=Dseqrecord("atttaa")
>>> c.isorf()
False
list_features()[source]

Prints an ASCII table with all features.

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("atgtaa")
>>> a.add_feature(2,4)
>>> print(a.list_features())
+----------+-----------+-------+-----+--------+--------------+------+------+
| Feature# | Direction | Start | End | Length | id           | type | orf? |
+----------+-----------+-------+-----+--------+--------------+------+------+
| 0        |    None   |   2   |  4  |      2 | <unknown id> | misc |  no  |
+----------+-----------+-------+-----+--------+--------------+------+------+
>>>
locus

alias for name property

spread_ape_colors()[source]

This method assigns random colors compatible with the ApE editor to features.

pydna.tm module

pydna.tm.Q5()[source]

for Q5 Ta they take the lower of the two Tms and add 1C (up to 72C). For Phusion they take the lower of the two and add 3C (up to 72C).

pydna.tm.basictm(primer, *args, **kwargs)[source]

Returns the melting temperature (Tm) of the primer using the basic formula. This function returns the same value as the Biopython Bio.SeqUtils.MeltingTemp.Tm_Wallace

Tm = (wA+xT)*2 + (yG+zC)*4 assumed 50mM monovalent cations

w = number of A in primer
x = number of T in primer
y = number of G in primer
z = number of C in primer
Parameters:

primer : string

Primer sequence 5’-3’

Returns:

tm : int

Examples

>>> from pydna.tm import basictm
>>> basictm("ggatcc")
20
>>>
pydna.tm.tmbreslauer86(primer, *args, dnac=500.0, saltc=50, thermodynamics=False, **kwargs)[source]

Returns the melting temperature (Tm) of the primer using the nearest neighbour algorithm. Formula and thermodynamic data is taken from Breslauer 1986.

These data are no longer widely used.

Breslauer 1986, table 2 [#]_

pair dH dS dG
AA/TT 9.1 24.0 1.9
AT/TA 8.6 23.9 1.5
TA/AT 6.0 16.9 0.9
CA/GT 5.8 12.9 1.9
GT/CA 6.5 17.3 1.3
CT/GA 7.8 20.8 1.6
GA/CT 5.6 13.5 1.6
CG/GC 11.9 27.8 3.6
GC/CG 11.1 26.7 3.1
GG/CC 11.0 26.6 3.1
Parameters:

primer : string

Primer sequence 5’-3’

Returns:

tm : float

References

[14]K.J. Breslauer et al., “Predicting DNA Duplex Stability from the Base Sequence,” Proceedings of the National Academy of Sciences 83, no. 11 (1986): 3746.

Examples

>>> from pydna.tm import tmbreslauer86
>>> tmbreslauer86("ACGTCATCGACACTATCATCGAC")
64.28863985851899
pydna.tm.tmbresluc(primer, *args, primerc=500.0, saltc=50, thermodynamics=False, **kwargs)[source]

Returns the tm for a primer using a formula adapted to polymerases with a DNA binding domain, such as the Phusion polymerase.

Parameters:

primer : string

primer sequence 5’-3’

primerc : float

primer concentration in nM), set to 500.0 nm by default.

saltc : float, optional

Monovalent cation concentration in mM, set to 50.0 mM by default.

thermodynamics : bool, optional

prints details of the thermodynamic data to stdout. For debugging only.

Returns:

tm : float

the tm of the primer

tm,dH,dS : tuple

tm and dH and dS used for the calculation

pydna.tm.tmstaluc98(primer, *args, dnac=50, saltc=50, **kwargs)[source]

Returns the melting temperature (Tm) of the primer using the nearest neighbour algorithm. Formula and thermodynamic data is taken from SantaLucia 1998 [1]. This implementation gives the same answer as the one provided by Biopython (See Examples).

Thermodynamic data used:

pair dH dS
AA/TT 7.9 22.2
AT/TA 7.2 20.4
TA/AT 7.2 21.3
CA/GT 8.5 22.7
GT/CA 8.4 22.4
CT/GA 7.8 21.0
GA/CT 8.2 22.2
CG/GC 10.6 27.2
GC/CG 9.8 24.4
GG/CC 8.0 19.9
Parameters:

primer : string

Primer sequence 5’-3’

Returns:

tm : float

tm of the primer

Examples

>>> from pydna.tm import tmstaluc98
>>> from Bio.SeqUtils.MeltingTemp import Tm_staluc
>>> tmstaluc98("ACGTCATCGACACTATCATCGAC")
54.55597724052518
>>> Tm_staluc("ACGTCATCGACACTATCATCGAC")
54.55597724052518

pydna.utils module

This module provides miscellaneous functions.

pydna.utils.ChenFoxLyndon(s)[source]

Decompose s into Lyndon words according to the Chen-Fox-Lyndon theorem. The arguments are the same as for ChenFoxLyndonBreakpoints but the return values are subsequences of s rather than indices of breakpoints.

Algorithms on strings and sequences based on Lyndon words. David Eppstein, October 2011.

pydna.utils.ChenFoxLyndonBreakpoints(s)[source]

Find starting positions of Chen-Fox-Lyndon decomposition of s. The decomposition is a set of Lyndon words that start at 0 and continue until the next position. 0 itself is not output, but the final breakpoint at the end of s is. The argument s must be of a type that can be indexed (e.g. a list, tuple, or string). The algorithm follows Duval, J. Algorithms 1983, but uses 0-based indexing rather than Duval’s choice of 1-based indexing.

Algorithms on strings and sequences based on Lyndon words. David Eppstein, October 2011.

pydna.utils.SmallestRotation(s)[source]

Find the rotation of s that is smallest in lexicographic order. Duval 1983 describes how to modify his algorithm to do so but I think it’s cleaner and more general to work from the ChenFoxLyndon output.

Algorithms on strings and sequences based on Lyndon words. David Eppstein, October 2011.

pydna.utils.copy_features(source_sr, target_sr, limit=10)[source]

This function tries to copy all features in source_seq and copy them to target_seq. Source_sr and target_sr are objects with a features property, such as Dseqrecord or Biopython SeqRecord.

Parameters:

source_seq : SeqRecord or Dseqrecord

The sequence to copy features from

target_seq : SeqRecord or Dseqrecord

The sequence to copy features to

Returns:

bool : True

This function acts on target_seq in place. No data is returned.

pydna.utils.cseguid(seq)[source]

Returns the cSEGUID for the sequence. The cSEGUID is the url safe SEGUID checksum calculated for the lexicographically minimal string rotation of a DNA sequence. Only defined for circular sequences.

pydna.utils.eq(*args, **kwargs)[source]

Compares two or more DNA sequences for equality i.e. they represent the same DNA molecule. Comparisons are case insensitive.

Parameters:

args : iterable

iterable containing sequences args can be strings, Biopython Seq or SeqRecord, Dseqrecord or dsDNA objects.

circular : bool, optional

Consider all molecules circular or linear

linear : bool, optional

Consider all molecules circular or linear

Returns:

eq : bool

Returns True or False

Notes

Compares two or more DNA sequences for equality i.e. if they represent the same DNA molecule.

Two linear sequences are considiered equal if either:

  • They have the same sequence (case insensitive)
  • One sequence is the reverse complement of the other (case insensitive)

Two circular sequences are considered equal if they are circular permutations:

  1. They have the same lengt, AND
  2. One sequence or can be found in the concatenation of the other sequence with itself, OR
  3. The reverse complement can be found in the concatenation of the other sequence with itself.

The topology for the comparison can be set using one of the keywords linear or circular to True or False.

If circular or linear is not set, it will be deduced from the topology of each sequence for sequences that have a linear or circular attribute (like Dseq and Dseqrecord).

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> from pydna.utils import eq
>>> eq("aaa","AAA")
True
>>> eq("aaa","AAA","TTT")
True
>>> eq("aaa","AAA","TTT","tTt")
True
>>> eq("aaa","AAA","TTT","tTt", linear=True)
True
>>> eq("Taaa","aTaa", linear = True)
False
>>> eq("Taaa","aTaa", circular = True)
True
>>> a=Dseqrecord("Taaa")
>>> b=Dseqrecord("aTaa")
>>> eq(a,b)
False
>>> eq(a,b,circular=True)
True
>>> a=a.looped()
>>> b=b.looped()
>>> eq(a,b)
True
>>> eq(a,b,circular=False)
False
>>> eq(a,b,linear=True)
False
>>> eq(a,b,linear=False)
True
>>> eq("ggatcc","GGATCC")
True
>>> eq("ggatcca","GGATCCa")
True
>>> eq("ggatcca","tGGATCC")
True
pydna.utils.memorize(filename)[source]
pydna.utils.pairwise(iterable)[source]

s -> (s0,s1), (s1,s2), (s2, s3), ...

pydna.utils.rc(sequence)[source]

returns the reverse complement of sequence (string) accepts mixed DNA/RNA

pydna.utils.seguid(seq)[source]

Returns the url safe SEGUID checksum for the sequence. This is the SEGUID checksum with the ‘+’ and ‘/’ characters of standard Base64 encoding are respectively replaced by ‘-‘ and ‘_’.

pydna.utils.shift_origin(seq, shift)[source]

Shift the origin of seq which is assumed to be a circular sequence.

Parameters:

seq : string, Biopython Seq, Biopython SeqRecord, Dseq or Dseqrecord

sequence to be shifted.

Returns:

new_seq : string, Biopython Seq, Biopython SeqRecord, Dseq or Dseqrecord

sequence with a new origin.

Examples

>>> from pydna.utils import shift_origin
>>> shift_origin("taaa",1)
'aaat'
>>> shift_origin("taaa",0)
'taaa'
>>> shift_origin("taaa",2)
'aata'
>>> shift_origin("gatc",2)
'tcga'

Module contents

pydna.get_env()[source]
pydna.open_cache_folder()[source]
pydna.open_config_folder()[source]
pydna.open_current_folder()[source]
pydna.open_log_folder()[source]