Primer design in pydna
You can use pydna
for primer design in different contexts, let’s start with some basic primer functionalities.
Checking the Tm of a primer
Primer design in pydna is very flexible, and supports different methods. For typical use-cases, we recommend using tm_default
, which uses the method Bio.SeqUtils.MeltingTemp
from biopython, nearest neighbor thermodynamics values from SantaLucia & Hicks (2004) and common values for nucleotide concentration, salt concentration, etc. You can of course change those settings. For a full dive check src/pydna/tm.py
.
from pydna.tm import tm_default
# The primers from the readme example
print(tm_default("ATGCAAACAGTAATGATGGA"))
print(tm_default("ATTATCTTTTTCAGCAATAGAATCA"))
55.047602641480864
54.55481807340169
Using NEB Tm calculator
If you are used to the NEB Tm calculator, and you want to use it programmatically, you can do so as well. The function takes three arguments:
primer
: The primer sequence.conc
: The primer concentration.prodcode
: The product code, which you can find on NEB’s website.
NOTE: When you call the function, it will make a request to the NEB server. This makes it much slower than using the builtin methods. In addition, we cannot guarantee that the NEB server will always be available, nor that the calculations that they use will not change in the future, since the code is not available.
from pydna.tm import tm_neb
print(tm_neb("ATGCAAACAGTAATGATGGA", 0.5, "q5-0"))
print(tm_neb("ATTATCTTTTTCAGCAATAGAATCA", 0.5, "q5-0"))
59
57
Designing primers for PCR
Let’s use pydna to amplify a region from a DNA sequence. You can use the primer_design
function to design primers for a given target Tm (target_tm
) and indicate a minimum primer hybridization length in basepairs (limit
).
from pydna.dseqrecord import Dseqrecord
from Bio.SeqFeature import SeqFeature, SimpleLocation
from pydna.design import primer_design
dna = Dseqrecord("ggttcaATGCAAACAGTAATGATGGATGACATTCAAAGCACTGATTCTATTGCTGAATAAcatttacatca")
# Let's add a feature representing the CDS
dna.features.append(SeqFeature(SimpleLocation(start=6, end=60), type="CDS"))
# To design the primer, we extract the template sequence we want to amplify, and use the `primer_design` method.
template = dna.features[0].location.extract(dna)
# We get an amplicon object (a subclass of Dseqrecord), that also contains extra info
# of where the primers align etc.
amplicon = primer_design(template, target_tm=60.0, limit=15)
# We extract the primers
fwd_primer, rvs_primer = amplicon.primers()
# We print the Tms
print("Forward primer Tm:", tm_default(fwd_primer.seq))
print("Forward primer sequence:", fwd_primer.seq)
print()
print("Reverse primer Tm:", tm_default(rvs_primer.seq))
print("Reverse primer sequence:", rvs_primer.seq)
Forward primer Tm: 59.71997924024873
Forward primer sequence: ATGCAAACAGTAATGATGGATGAC
Reverse primer Tm: 60.22377911083646
Reverse primer sequence: TTATTCAGCAATAGAATCAGTGCTTTG
Special primers
We saw an example where we simply want to amplify a region of DNA. But what if we want to design primers for a specific restriction enzyme, or for Gibson Assembly? That’s also easy.
Restriction enzyme
Simply append the sequence you want at the 5’ end of the primers.
from Bio.Restriction import EcoRI
fwd_primer_EcoRI = 'ttGAATTC' + fwd_primer
# You can also do it like this!
rvs_primer_EcoRI = 'tt' + EcoRI.site + rvs_primer
print(fwd_primer_EcoRI.seq)
print(rvs_primer_EcoRI.seq)
ttGAATTCATGCAAACAGTAATGATGGATGAC
ttGAATTCTTATTCAGCAATAGAATCAGTGCTTTG
Edge case: Some recognition sites contain ambiguous bases, for instance
Bst4CI
cuts at siteACNGT
, whereN
can be any nucleotide. In that case, you can use the dictionary provided by biopython (from Bio.Data.IUPACData import ambiguous_dna_values
) to produce a concrete DNA sequence you can use in real life.
Gibson Assembly
To design primers for Gibson Assembly, you can use the assembly_fragments
function.
Linear Gibson Assembly
from pydna.design import assembly_fragments
# Let's imagine we want to join these two sequences together linearly with Gibson Assembly
seq1 = Dseqrecord('ATGCAAACAGTAATGATGGATGACATTCAAAGCACTGATTCTATTGCTGAAAAAGATAAT')
seq2 = Dseqrecord('CACTCTAATAATGAATCTAACTTTACTTGGAAAGCGTTTCGTGAACAAGTGGAAAAGCAT')
# First, we design primers for each fragment, as before:
pre_amplicon1 = primer_design(seq1, target_tm=60.0, limit=15)
pre_amplicon2 = primer_design(seq2, target_tm=60.0, limit=15)
# Then, we use the `assembly_fragments` function to design primers for Gibson Assembly
amplicon1, amplicon2 = assembly_fragments([pre_amplicon1, pre_amplicon2], overlap=10)
# We print the primers:
fwd_1, rvs_1 = amplicon1.primers()
fwd_2, rvs_2 = amplicon2.primers()
print('Primers for fragment 1:')
print(fwd_1.seq)
print(rvs_1.seq)
print()
print('Primers for fragment 2:')
print(fwd_2.seq)
print(rvs_2.seq)
print()
# The amplicons contain the PCR products (note the overlap between the two fragments)
print('PCR product 1:')
print(amplicon1.seq)
print()
print('PCR product 2:')
print(amplicon2.seq)
print()
print('Overlap')
print(amplicon1.seq)
print(' '*55,amplicon2.seq, sep='')
Primers for fragment 1:
ATGCAAACAGTAATGATGGATGAC
GAGTGATTATCTTTTTCAGCAATAGAATCAGTGC
Primers for fragment 2:
ATAATCACTCTAATAATGAATCTAACTTTACTTGGAAA
ATGCTTTTCCACTTGTTCACG
PCR product 1:
ATGCAAACAGTAATGATGGATGACATTCAAAGCACTGATTCTATTGCTGAAAAAGATAATCACTC
PCR product 2:
ATAATCACTCTAATAATGAATCTAACTTTACTTGGAAAGCGTTTCGTGAACAAGTGGAAAAGCAT
Overlap
ATGCAAACAGTAATGATGGATGACATTCAAAGCACTGATTCTATTGCTGAAAAAGATAATCACTC
ATAATCACTCTAATAATGAATCTAACTTTACTTGGAAAGCGTTTCGTGAACAAGTGGAAAAGCAT
Once you have the amplicons, you can use Assembly
to join them together (see the Gibson
notebook for more details)
from pydna.assembly import Assembly
from pydna.common_sub_strings import terminal_overlap
assembly = Assembly([amplicon1, amplicon2], limit=10, algorithm=terminal_overlap)
product = assembly.assemble_linear()[0]
print(product.figure())
print()
print(Dseqrecord(product).figure())
65bp_PCR_prod|10
\/
/\
10|65bp_PCR_prod
Dseqrecord(-120)
[48;5;11mATGCAAACAGTAATGATGGATGAC[0mATTCAAAGCACTGATTCTATTGCTGAAAAAGATAATCACTCTAATAATGAATCTAACTTTACTTGGAAAGCGTTTCGTGAACAAGTGGAAAAGCAT
TACGTTTGTCATTACTACCTACTGTAAGTTTCGTGACTAAGATAACGACTTTTTCTATTAGTGAGATTATTACTTAGATTGAAATGAACCTTTCGCAAAGCACTTGTTCACCTTTTCGTA
Circular Gibson Assembly
# We use the `assembly_fragments` function with `circular=True`
amplicon1, amplicon2 = assembly_fragments([pre_amplicon1, pre_amplicon2], overlap=10, circular=True)
# We print the primers:
fwd_1, rvs_1 = amplicon1.primers()
fwd_2, rvs_2 = amplicon2.primers()
print('Primers for fragment 1:')
print(fwd_1.seq)
print(rvs_1.seq)
print()
print('Primers for fragment 2:')
print(fwd_2.seq)
print(rvs_2.seq)
print()
assembly = Assembly([amplicon1, amplicon2], limit=10, algorithm=terminal_overlap)
# Here we use assemble_circular!
product = assembly.assemble_circular()[0]
print(product.figure())
print()
print(Dseqrecord(product).figure())
Primers for fragment 1:
AGCATATGCAAACAGTAATGATGGATGAC
GAGTGATTATCTTTTTCAGCAATAGAATCAGTGC
Primers for fragment 2:
ATAATCACTCTAATAATGAATCTAACTTTACTTGGAAA
TGCATATGCTTTTCCACTTGTTCACG
-|70bp_PCR_prod|10
| \/
| /\
| 10|70bp_PCR_prod|10
| \/
| /\
| 10-
| |
------------------------------------
Dseqrecord(o120)
AGCAT[48;5;11mATGCAAACAGTAATGATGGATGAC[0mATTCAAAGCACTGATTCTATTGCTGAAAAAGATAATCACTCTAATAATGAATCTAACTTTACTTGGAAAGCGTTTCGTGAACAAGTGGAAA
TCGTATACGTTTGTCATTACTACCTACTGTAAGTTTCGTGACTAAGATAACGACTTTTTCTATTAGTGAGATTATTACTTAGATTGAAATGAACCTTTCGCAAAGCACTTGTTCACCTTT
Adding spacers / linkers to Gibson Assembly primers
In this case, you can pass a list of Amplicons
and Dseqrecords
to the assembly_fragments
function. The amplicons
will be used as the fragments to assemble, and the dseqrecords
will be used as spacers between the fragments, as long as they are shorter than the argument maxlink
.
# We create two spacers as dseqrecords
spacer1 = Dseqrecord('aaa')
spacer2 = Dseqrecord('ttt')
amplicon1, amplicon2 = assembly_fragments([pre_amplicon1, spacer1, pre_amplicon2, spacer2], overlap=10, circular=True)
assembly = Assembly([amplicon1, amplicon2], limit=10, algorithm=terminal_overlap)
# Here we use assemble_circular!
product = assembly.assemble_circular()[0]
# See the linkers that have been added
print()
print(Dseqrecord(product).seq)
print(4*' ', '^^^', 60*' ', '^^^', sep='')
GCATtttATGCAAACAGTAATGATGGATGACATTCAAAGCACTGATTCTATTGCTGAAAAAGATAATaaaCACTCTAATAATGAATCTAACTTTACTTGGAAAGCGTTTCGTGAACAAGTGGAAAA
^^^ ^^^
Summary of assembly_fragments behaviour
The behaviour is summarised in the following graphics for linear and circular assembly.