Patent Analysis of: Methods and compositions for RNA-directed target DNA modification and for RNA-directed modulation of transcription

Updated Time 12 June 2019

Patent Registration Data

Publication Number

US10000772

Application Number

US14/685502

Application Date

13 April 2015

Publication Date

19 June 2018

Current Assignee

UNIVERSITY OF VIENNA; THE REGENTS OF THE UNIVERSITY OF CALIFORNIA

Original Assignee (Applicant)

CHARPENTIER, EMMANUELLE; THE REGENTS OF THE UNIVERSITY OF CALIFORNIA; UNIVERSITY OF VIENNA

International Classification

C12N15/90, C12N15/70, C12N15/10, C12N9/22, C12N15/11

Cooperative Classification

C12N15/907, C12N9/22, C12N15/102, C12N15/111, C12N15/113

Inventor

DOUDNA, JENNIFER A.; JINEK, MARTIN; CHARPENTIER, EMMANUELLE; CHYLINSKI, KRZYSZTOF

Patent Images

This patent contains figures and images illustrating the invention and its embodiment.


Abstract

The present disclosure provides a DNA-targeting RNA that comprises a targeting sequence and, together with a modifying polypeptide, provides for site-specific modification of a target DNA and/or a polypeptide associated with the target DNA. The present disclosure further provides site-specific modifying polypeptides. The present disclosure further provides methods of site-specific modification of a target DNA and/or a polypeptide associated with the target DNA. The present disclosure also provides methods of modulating transcription of a target nucleic acid in a target cell, generally involving contacting the target nucleic acid with an enzymatically inactive Cas9 polypeptide and a DNA-targeting RNA. Kits and compositions for carrying out the methods are also provided. The present disclosure also provides genetically modified cells that produce Cas9, and Cas9 transgenic non-human multicellular organisms.


Claims

1. A method of modifying a target DNA molecule, the method comprising: contacting a target DNA molecule having a target sequence with a complex comprising: (a) a Cas9 protein; and (b) a DNA-targeting RNA comprising: (i) a targeter-RNA that hybridizes with the target sequence, and (ii) an activator-RNA that hybridizes with the targeter-RNA to form a double-stranded RNA (dsRNA) duplex of a protein-binding segment, wherein the activator-RNA hybridizes with the targeter-RNA to form a total of 10 to 15 base pairs, wherein said contacting takes place outside of a bacterial cell and outside of an archaeal cell, thereby resulting in modification of the target DNA molecule.

2. The method of claim 1, wherein said modification of the target DNA molecule is cleavage of the target DNA molecule.

3. The method of claim 1, wherein the target sequence is 15 nucleotides (nt) to 18 nt long.

4. The method of claim 1, wherein the target sequence is 18 nucleotides (nt) to 25 nt long.

5. The method of claim 1, wherein the target DNA molecule is chromosomal DNA.

6. The method of claim 1, wherein the activator-RNA comprises the 26 nucleotide tracrRNA sequence set forth in SEQ ID NO: 441.

7. The method of claim 1, wherein the targeter-RNA and/or the activator-RNA comprises one or more of: a non-natural internucleoside linkage, a nucleic acid mimetic, a modified sugar moiety, and a modified nucleobase.

8. The method of claim 1, wherein the targeter-RNA and/or the activator-RNA comprises one or more of: (i) a non-natural internucleoside linkage selected from a phosphorothioate, an inverted polarity linkage, and an abasic nucleoside linkage; (ii) a locked nucleic acid (LNA); and (iii) a modified sugar moiety selected from 2′-O-methoxyethyl, 2′-O-methyl, and 2′-fluoro.

9. The method of claim 1, wherein the targeter-RNA and/or the activator-RNA comprises one or more of: a peptide nucleic acid (PNA), a morpholino nucleic acid, a cyclohexenyl nucleic acid (CeNA), and/or a locked nucleic acid (LNA).

10. The method of claim 1, wherein the targeter-RNA and/or the activator-RNA comprises one or more modified sugar moieties selected from: 2′-O-(2-methoxyethyl), 2′-dimethylaminooxyethoxy, 2′-dimethylaminoethoxyethoxy, 2′-O-methyl, and 2′-fluoro.

11. The method of claim 1, wherein the targeter-RNA and/or the activator-RNA is conjugated to a moiety selected from: a polyamine; a polyamide; a polyethylene glycol; a polyether; a cholesterol moiety; a cholic acid; a thioether; a thiocholesterol; an aliphatic chain; a phospholipid; an adamantane acetic acid; a palmityl moiety; an octadecylamine or hexylamino-carbonyl-oxycholesterol moiety; a biotin; a phenazine; a folate; a phenanthridine; an anthraquinone; an acridine; a fluorescein; a rhodamine; a fluor; and a coumarin.
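Claim 1 turns on how many base pairs the activator-RNA forms with the targeter-RNA (a total of 10 to 15). The Python sketch below is a rough screen for that count, not a thermodynamic model. It assumes an ungapped antiparallel alignment with Watson-Crick pairs only (no G-U wobble, bulges, or loops), and the function names are hypothetical.

```python
# Watson-Crick pairs only; G-U wobble pairs are deliberately ignored in this sketch.
WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def count_duplex_pairs(targeter: str, activator: str) -> int:
    """Best number of Watson-Crick pairs over all ungapped antiparallel alignments.

    Both inputs are written 5'->3'. Reversing the activator lines its 3' end up
    against the targeter's 5' end, which is how two antiparallel strands pair.
    DNA-style 'T' is normalized to 'U'.
    """
    t = targeter.upper().replace("T", "U")
    a = activator.upper().replace("T", "U")[::-1]  # activator now read 3'->5'
    best = 0
    # Slide the reversed activator across the targeter at every possible offset.
    for shift in range(-(len(a) - 1), len(t)):
        pairs = 0
        for i, base in enumerate(a):
            j = i + shift
            if 0 <= j < len(t) and (t[j], base) in WC_PAIRS:
                pairs += 1
        best = max(best, pairs)
    return best


def in_claimed_range(targeter: str, activator: str, lo: int = 10, hi: int = 15) -> bool:
    """True if the best pair count falls inside the 10-15 bp window recited in claim 1."""
    return lo <= count_duplex_pairs(targeter, activator) <= hi
```

For example, `count_duplex_pairs("GUUUUAGAGCUA", "UAGCUCUAAAAC")` returns 12, because the second sequence is the exact reverse complement of the first. A 6-nt self-complementary pair such as `"GCGCGC"`/`"GCGCGC"` yields only 6 pairs and falls outside the range.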

Description

INCORPORATION BY REFERENCE OF SEQUENCE LISTING PROVIDED AS A TEXT FILE

A Sequence Listing is provided herewith as a text file, “BERK-187-SeqList_ST25.txt” created on Mar. 14, 2013 and having a size of 7645 KB. The contents of the text file are incorporated by reference herein in their entirety.

BACKGROUND

About 60% of bacteria and 90% of archaea possess CRISPR (clustered regularly interspaced short palindromic repeats)/CRISPR-associated (Cas) systems that confer resistance to foreign DNA elements. The type II CRISPR system from Streptococcus pyogenes involves only a single gene encoding the Cas9 protein and two RNAs, a mature CRISPR RNA (crRNA) and a partially complementary trans-acting RNA (tracrRNA), which are necessary and sufficient for RNA-guided silencing of foreign DNAs.

In recent years, engineered nuclease enzymes designed to target specific DNA sequences have attracted considerable attention as powerful tools for the genetic manipulation of cells and whole organisms, allowing targeted gene deletion, replacement, and repair, as well as the insertion of exogenous sequences (transgenes) into the genome. Two major technologies for engineering site-specific DNA nucleases have emerged, zinc-finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs), both of which are based on the construction of chimeric endonuclease enzymes in which a sequence-nonspecific DNA endonuclease domain is fused to an engineered DNA-binding domain. However, targeting each new genomic locus requires the design of a novel nuclease enzyme, making these approaches both time-consuming and costly. In addition, both technologies suffer from limited precision, which can lead to unpredictable off-target effects.

The systematic interrogation of genomes and the genetic reprogramming of cells involve targeting sets of genes for expression or repression. Currently, the most common approach for targeting arbitrary genes for regulation is RNA interference (RNAi). This approach has limitations; for example, RNAi can exhibit significant off-target effects and toxicity.

There is a need in the field for a technology that allows precise targeting of nuclease activity (or other protein activities) to distinct locations within a target DNA without requiring the design of a new protein for each new target sequence. In addition, there is a need in the art for methods of controlling gene expression with minimal off-target effects.

SUMMARY

The present disclosure provides a DNA-targeting RNA that comprises a targeting sequence and, together with a modifying polypeptide, provides for site-specific modification of a target DNA and/or a polypeptide associated with the target DNA. The present disclosure further provides site-specific modifying polypeptides. The present disclosure further provides methods of site-specific modification of a target DNA and/or a polypeptide associated with the target DNA. The present disclosure also provides methods of modulating transcription of a target nucleic acid in a target cell, generally involving contacting the target nucleic acid with an enzymatically inactive Cas9 polypeptide and a DNA-targeting RNA. Kits and compositions for carrying out the methods are also provided. The present disclosure also provides genetically modified cells that produce Cas9, and Cas9 transgenic non-human multicellular organisms.

Features

Features of the present disclosure include a DNA-targeting RNA comprising: (i) a first segment comprising a nucleotide sequence that is complementary to a sequence in a target DNA; and (ii) a second segment that interacts with a site-directed modifying polypeptide. In some cases, the first segment comprises 8 nucleotides that have 100% complementarity to a sequence in the target DNA. In some cases, the second segment comprises a nucleotide sequence with at least 60% identity over a stretch of at least 8 contiguous nucleotides to any one of the nucleotide sequences set forth in SEQ ID NOs:431-682 (e.g., 431-562). In some cases, the second segment comprises a nucleotide sequence with at least 60% identity over a stretch of at least 8 contiguous nucleotides to any one of the nucleotide sequences set forth in SEQ ID NOs:563-682. In some cases, the site-directed modifying polypeptide comprises an amino acid sequence having at least about 75% amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9/Csn1 amino acid sequence depicted in FIG. 3A and FIG. 3B, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs:1-256 and 795-1346.
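The recurring criterion "at least 60% identity over a stretch of at least 8 contiguous nucleotides" can be screened with a sliding window. This is a minimal sketch under stated assumptions: comparison is ungapped, only fixed 8-nt windows are tested (a simplification of "at least 8"), and the reference strings are placeholders for SEQ ID NOs:431-682, which are not reproduced here. The function names are hypothetical.

```python
def max_window_identity(query: str, reference: str, window: int = 8) -> float:
    """Highest fraction of identical positions over any ungapped window of `window` nt.

    Every window of the query is compared with every window of the reference;
    T is normalized to U so DNA- and RNA-style strings compare equally.
    """
    q = query.upper().replace("T", "U")
    r = reference.upper().replace("T", "U")
    if len(q) < window or len(r) < window:
        return 0.0  # no stretch of the required length exists
    best = 0.0
    for i in range(len(q) - window + 1):
        for j in range(len(r) - window + 1):
            matches = sum(q[i + k] == r[j + k] for k in range(window))
            best = max(best, matches / window)
    return best


def meets_identity_feature(query: str, reference: str,
                           min_identity: float = 0.60, window: int = 8) -> bool:
    """True if some 8-nt window reaches the 60% identity threshold."""
    return max_window_identity(query, reference, window) >= min_identity
```

With this reading, `"GUUUUAGA"` against `"GUUUUCCC"` shares 5 of 8 positions (62.5%) and passes, whereas `"GUUUCCCC"` against `"GUUUAAAA"` shares only 4 of 8 (50%) and fails. A real comparison would more likely use a proper alignment tool.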

Features of the present disclosure include a DNA polynucleotide comprising a nucleotide sequence that encodes the DNA-targeting RNA. In some cases, a recombinant expression vector comprises the DNA polynucleotide. In some cases, the nucleotide sequence encoding the DNA-targeting RNA is operably linked to a promoter. In some cases, the promoter is an inducible promoter. In some cases, the nucleotide sequence encoding the DNA-targeting RNA further comprises a multiple cloning site. Features of the present disclosure include an in vitro genetically modified host cell comprising the DNA polynucleotide.

Features of the present disclosure include a recombinant expression vector comprising: (i) a nucleotide sequence encoding a DNA-targeting RNA, wherein the DNA-targeting RNA comprises: (a) a first segment comprising a nucleotide sequence that is complementary to a sequence in a target DNA; and (b) a second segment that interacts with a site-directed modifying polypeptide; and (ii) a nucleotide sequence encoding the site-directed modifying polypeptide comprising: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that exhibits site-directed enzymatic activity, wherein the site of enzymatic activity is determined by the DNA-targeting RNA.

Features of the present disclosure include a recombinant expression vector comprising: (i) a nucleotide sequence encoding a DNA-targeting RNA, wherein the DNA-targeting RNA comprises: (a) a first segment comprising a nucleotide sequence that is complementary to a sequence in a target DNA; and (b) a second segment that interacts with a site-directed modifying polypeptide; and (ii) a nucleotide sequence encoding the site-directed modifying polypeptide, where the site-directed modifying polypeptide comprises: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that modulates transcription within the target DNA, wherein the site of modulated transcription within the target DNA is determined by the DNA-targeting RNA.

Features of the present disclosure include a variant site-directed modifying polypeptide comprising: (i) an RNA-binding portion that interacts with a DNA-targeting RNA, wherein the DNA-targeting RNA comprises a nucleotide sequence that is complementary to a sequence in a target DNA; and (ii) an activity portion that exhibits reduced site-directed enzymatic activity, wherein the site of enzymatic activity is determined by the DNA-targeting RNA. In some cases, the variant site-directed modifying polypeptide comprises an H840A mutation of the S. pyogenes sequence SEQ ID NO:8 or the corresponding mutation in any of the amino acid sequences set forth as SEQ ID NOs:1-256 and 795-1346. In some cases, the variant site-directed modifying polypeptide comprises a D10A mutation of the S. pyogenes sequence SEQ ID NO:8 or the corresponding mutation in any of the amino acid sequences set forth as SEQ ID NOs:1-256 and 795-1346. In some cases, the variant site-directed modifying polypeptide comprises both (i) a D10A mutation of the S. pyogenes sequence SEQ ID NO:8 or the corresponding mutation in any of the amino acid sequences set forth as SEQ ID NOs:1-256 and 795-1346; and (ii) an H840A mutation of the S. pyogenes sequence SEQ ID NO:8 or the corresponding mutation in any of the amino acid sequences set forth as SEQ ID NOs:1-256 and 795-1346.
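D10A and H840A follow the usual wild-type residue / 1-based position / substitute residue notation, numbered against the S. pyogenes reference (SEQ ID NO:8). The sketch below applies such substitutions and refuses to proceed if the wild-type residue does not match, which catches numbering against the wrong reference. The helper names and the toy sequence in the test are hypothetical. Mapping a "corresponding mutation" onto the other Cas9 sequences would additionally require a sequence alignment, which is not shown.

```python
import re

# Matches substitutions such as "D10A": wild-type letter, 1-based position, new letter.
_SUB = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitution(protein: str, mutation: str) -> str:
    """Return `protein` with one substitution applied, checking the wild-type residue."""
    m = _SUB.match(mutation)
    if m is None:
        raise ValueError(f"unrecognized mutation: {mutation!r}")
    wt, pos, new = m.group(1), int(m.group(2)), m.group(3)
    if not 1 <= pos <= len(protein):
        raise ValueError(f"position {pos} outside sequence of length {len(protein)}")
    if protein[pos - 1] != wt:
        raise ValueError(f"expected {wt} at {pos}, found {protein[pos - 1]}")
    return protein[: pos - 1] + new + protein[pos:]


def apply_all(protein: str, mutations: list) -> str:
    """Apply substitutions in order, for example ["D10A", "H840A"] for the double mutant."""
    for mut in mutations:
        protein = apply_substitution(protein, mut)
    return protein
```

`apply_all(seq, ["D10A", "H840A"])` returns the doubly substituted sequence. Passing a sequence whose residue 10 is not D raises `ValueError` instead of silently mutating the wrong residue.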

Features of the present disclosure include a chimeric site-directed modifying polypeptide comprising: (i) an RNA-binding portion that interacts with a DNA-targeting RNA, wherein the DNA-targeting RNA comprises a nucleotide sequence that is complementary to a sequence in a target DNA; and (ii) an activity portion that exhibits site-directed enzymatic activity, wherein the site of enzymatic activity is determined by the DNA-targeting RNA. In some cases, the chimeric site-directed modifying polypeptide comprises an amino acid sequence having at least about 75% amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9/Csn1 amino acid sequence depicted in FIG. 3A and FIG. 3B, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs:1-256 and 795-1346. In some cases, the DNA-targeting RNA further comprises a nucleotide sequence with at least 60% identity over a stretch of at least 8 contiguous nucleotides to any one of the nucleotide sequences set forth in SEQ ID NOs:431-682 (e.g., SEQ ID NOs:563-682). In some cases, the DNA-targeting RNA further comprises a nucleotide sequence with at least 60% identity over a stretch of at least 8 contiguous nucleotides to any one of the nucleotide sequences set forth in SEQ ID NOs:431-562. In some cases, the enzymatic activity of the chimeric site-directed modifying polypeptide modifies the target DNA. In some cases, the enzymatic activity of the chimeric site-directed modifying polypeptide is nuclease activity, methyltransferase activity, demethylase activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer forming activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity or glycosylase activity.
In some cases, the enzymatic activity of the chimeric site-directed modifying polypeptide is nuclease activity. In some cases, the nuclease activity introduces a double strand break in the target DNA. In some cases, the enzymatic activity of the chimeric site-directed modifying polypeptide modifies a target polypeptide associated with the target DNA. In some cases, the enzymatic activity of the chimeric site-directed modifying polypeptide is methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity or demyristoylation activity.
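The two activity lists above differ by substrate: one set acts on the target DNA itself, the other on a polypeptide associated with the target DNA (for example, a histone). A small lookup table makes that split explicit. The set contents are copied from the text, while the function name and the substrate labels are hypothetical. Methyltransferase and demethylase activities appear in both lists, so they map to both substrates.

```python
# Activities recited as modifying the target DNA itself.
DNA_ACTIVITIES = frozenset({
    "nuclease", "methyltransferase", "demethylase", "dna repair", "dna damage",
    "deamination", "dismutase", "alkylation", "depurination", "oxidation",
    "pyrimidine dimer forming", "integrase", "transposase", "recombinase",
    "polymerase", "ligase", "helicase", "photolyase", "glycosylase",
})

# Activities recited as modifying a polypeptide associated with the target DNA.
POLYPEPTIDE_ACTIVITIES = frozenset({
    "methyltransferase", "demethylase", "acetyltransferase", "deacetylase",
    "kinase", "phosphatase", "ubiquitin ligase", "deubiquitinating",
    "adenylation", "deadenylation", "sumoylating", "desumoylating",
    "ribosylation", "deribosylation", "myristoylation", "demyristoylation",
})


def substrates_for(activity: str) -> set:
    """Return which substrate(s) a named activity is recited for; empty if not listed."""
    key = " ".join(activity.lower().split())  # case- and whitespace-insensitive
    found = set()
    if key in DNA_ACTIVITIES:
        found.add("target DNA")
    if key in POLYPEPTIDE_ACTIVITIES:
        found.add("target-associated polypeptide")
    return found
```

`DNA_ACTIVITIES & POLYPEPTIDE_ACTIVITIES` recovers exactly the two shared activities, methyltransferase and demethylase.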

Features of the present disclosure include a polynucleotide comprising a nucleotide sequence encoding a chimeric site-directed modifying polypeptide. In some cases, the polynucleotide is an RNA polynucleotide. In some cases, the polynucleotide is a DNA polynucleotide. Features of the present disclosure include a recombinant expression vector comprising the polynucleotide. In some cases, the polynucleotide is operably linked to a promoter. In some cases, the promoter is an inducible promoter. Features of the present disclosure include an in vitro genetically modified host cell comprising the polynucleotide.

Features of the present disclosure include a chimeric site-directed modifying polypeptide comprising: (i) an RNA-binding portion that interacts with a DNA-targeting RNA, wherein the DNA-targeting RNA comprises a nucleotide sequence that is complementary to a sequence in a target DNA; and (ii) an activity portion that modulates transcription within the target DNA, wherein the site of modulated transcription within the target DNA is determined by the DNA-targeting RNA. In some cases, the activity portion increases transcription within the target DNA. In some cases, the activity portion decreases transcription within the target DNA.

Features of the present disclosure include a genetically modified cell comprising a recombinant site-directed modifying polypeptide comprising an RNA-binding portion that interacts with a DNA-targeting RNA; and an activity portion that exhibits site-directed enzymatic activity, wherein the site of enzymatic activity is determined by the DNA-targeting RNA. In some cases, the site-directed modifying polypeptide comprises an amino acid sequence having at least about 75% amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9/Csn1 amino acid sequence depicted in FIG. 3A and FIG. 3B, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs:1-256 and 795-1346. In some cases, the cell is selected from the group consisting of: an archaeal cell, a bacterial cell, a eukaryotic cell, a eukaryotic single-cell organism, a somatic cell, a germ cell, a stem cell, a plant cell, an algal cell, an animal cell, an invertebrate cell, a vertebrate cell, a fish cell, a frog cell, a bird cell, a mammalian cell, a pig cell, a cow cell, a goat cell, a sheep cell, a rodent cell, a rat cell, a mouse cell, a non-human primate cell, and a human cell.

Features of the present disclosure include a transgenic non-human organism whose genome comprises a transgene comprising a nucleotide sequence encoding a recombinant site-directed modifying polypeptide comprising: (i) an RNA-binding portion that interacts with a DNA-targeting RNA; and (ii) an activity portion that exhibits site-directed enzymatic activity, wherein the site of enzymatic activity is determined by the DNA-targeting RNA. In some cases, the site-directed modifying polypeptide comprises an amino acid sequence having at least about 75% amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9/Csn1 amino acid sequence depicted in FIG. 3A and FIG. 3B, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs:1-256 and 795-1346. In some cases, the organism is selected from the group consisting of: an archaeon, a bacterium, a eukaryotic single-cell organism, an alga, a plant, an animal, an invertebrate, a fly, a worm, a cnidarian, a vertebrate, a fish, a frog, a bird, a mammal, an ungulate, a rodent, a rat, a mouse, and a non-human primate.

Features of the present disclosure include a composition comprising: (i) a DNA-targeting RNA, or a DNA polynucleotide encoding the same, the DNA-targeting RNA comprising: (a) a first segment comprising a nucleotide sequence that is complementary to a sequence in a target DNA; and (b) a second segment that interacts with a site-directed modifying polypeptide; and (ii) the site-directed modifying polypeptide, or a polynucleotide encoding the same, the site-directed modifying polypeptide comprising: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that exhibits site-directed enzymatic activity, wherein the site of enzymatic activity is determined by the DNA-targeting RNA. In some cases, the first segment of the DNA-targeting RNA comprises 8 nucleotides that have 100% complementarity to a sequence in the target DNA. In some cases, the second segment of the DNA-targeting RNA comprises a nucleotide sequence with at least 60% identity over a stretch of at least 8 contiguous nucleotides to any one of the nucleotide sequences set forth in SEQ ID NOs:431-682 (e.g., SEQ ID NOs:563-682). In some cases, the second segment of the DNA-targeting RNA comprises a nucleotide sequence with at least 60% identity over a stretch of at least 8 contiguous nucleotides to any one of the nucleotide sequences set forth in SEQ ID NOs:431-562. In some cases, the site-directed modifying polypeptide comprises an amino acid sequence having at least about 75% amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9/Csn1 amino acid sequence depicted in FIG. 3A and FIG. 3B, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs:1-256 and 795-1346. In some cases, the enzymatic activity modifies the target DNA.
In some cases, the enzymatic activity is nuclease activity, methyltransferase activity, demethylase activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer forming activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity or glycosylase activity. In some cases, the enzymatic activity is nuclease activity. In some cases, the nuclease activity introduces a double strand break in the target DNA. In some cases, the enzymatic activity modifies a target polypeptide associated with the target DNA. In some cases, the enzymatic activity is methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity or demyristoylation activity. In some cases, the target polypeptide is a histone and the enzymatic activity is methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity or deubiquitinating activity. In some cases, the DNA-targeting RNA is a double-molecule DNA-targeting RNA and the composition comprises both a targeter-RNA and an activator-RNA, the duplex-forming segments of which are complementary and hybridize to form the second segment of the DNA-targeting RNA. In some cases, the duplex-forming segment of the activator-RNA comprises a nucleotide sequence with at least 60% identity over a stretch of at least 8 contiguous nucleotides to any one of the nucleotide sequences set forth in SEQ ID NOs:431-682.

Features of the present disclosure include a composition comprising: (i) a DNA-targeting RNA of the present disclosure, or a DNA polynucleotide encoding the same; and (ii) a buffer for stabilizing nucleic acids. Features of the present disclosure include a composition comprising: (i) a site-directed modifying polypeptide of the present disclosure, or a polynucleotide encoding the same; and (ii) a buffer for stabilizing nucleic acids and/or proteins. Features of the present disclosure include a composition comprising: (i) a DNA-targeting RNA, or a DNA polynucleotide encoding the same, the DNA-targeting RNA comprising: (a) a first segment comprising a nucleotide sequence that is complementary to a sequence in a target DNA; and (b) a second segment that interacts with a site-directed modifying polypeptide; and (ii) the site-directed modifying polypeptide, or a polynucleotide encoding the same, the site-directed modifying polypeptide comprising: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that modulates transcription within the target DNA, wherein the site of modulated transcription within the target DNA is determined by the DNA-targeting RNA. In some cases, the activity portion increases transcription within the target DNA. In some cases, the activity portion decreases transcription within the target DNA. Features of the present disclosure include a composition comprising: (i) a site-directed modifying polypeptide, or a polynucleotide encoding the same; and (ii) a buffer for stabilizing nucleic acids and/or proteins.

Features of the present disclosure include a method of site-specific modification of a target DNA, the method comprising: contacting the target DNA with: (i) a DNA-targeting RNA, or a DNA polynucleotide encoding the same, wherein the DNA-targeting RNA comprises: (a) a first segment comprising a nucleotide sequence that is complementary to a sequence in the target DNA; and (b) a second segment that interacts with a site-directed modifying polypeptide; and (ii) a site-directed modifying polypeptide, or a polynucleotide encoding the same, wherein the site-directed modifying polypeptide comprises: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that exhibits site-directed enzymatic activity. In some cases, the target DNA is extrachromosomal. In some cases, the target DNA comprises a PAM sequence of the complementary strand that is 5′-CCY-3′, wherein Y is any DNA nucleotide and Y is immediately 5′ of the target sequence of the complementary strand of the target DNA. In some cases, the target DNA is part of a chromosome in vitro. In some cases, the target DNA is part of a chromosome in vivo. In some cases, the target DNA is part of a chromosome in a cell. In some cases, the cell is selected from the group consisting of: an archaeal cell, a bacterial cell, a eukaryotic cell, a eukaryotic single-cell organism, a somatic cell, a germ cell, a stem cell, a plant cell, an algal cell, an animal cell, an invertebrate cell, a vertebrate cell, a fish cell, a frog cell, a bird cell, a mammalian cell, a pig cell, a cow cell, a goat cell, a sheep cell, a rodent cell, a rat cell, a mouse cell, a non-human primate cell, and a human cell. In some cases, the DNA-targeting RNA comprises a nucleotide sequence with at least 60% identity over a stretch of at least 8 contiguous nucleotides to any one of the nucleotide sequences set forth in SEQ ID NOs:431-682 (e.g., SEQ ID NOs:563-682).
In some cases, the DNA-targeting RNA comprises a nucleotide sequence with at least 60% identity over a stretch of at least 8 contiguous nucleotides to any one of the nucleotide sequences set forth in SEQ ID NOs:431-562. In some cases, the DNA-modifying polypeptide comprises an amino acid sequence having at least about 75% amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9/Csn1 amino acid sequence depicted in FIG. 3A and FIG. 3B, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs:1-256 and 795-1346. In some cases, the enzymatic activity modifies the target DNA. In some cases, the enzymatic activity is nuclease activity, methyltransferase activity, demethylase activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer forming activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity or glycosylase activity. In some cases, the DNA-modifying enzymatic activity is nuclease activity. In some cases, the nuclease activity introduces a double strand break in the target DNA. In some cases, the contacting occurs under conditions that are permissive for nonhomologous end joining or homology-directed repair. In some cases, the method further comprises contacting the target DNA with a donor polynucleotide, wherein the donor polynucleotide, a portion of the donor polynucleotide, a copy of the donor polynucleotide, or a portion of a copy of the donor polynucleotide integrates into the target DNA. In some cases, the method does not comprise contacting the cell with a donor polynucleotide, wherein the target DNA is modified such that nucleotides within the target DNA are deleted. In some cases, the enzymatic activity modifies a target polypeptide associated with the target DNA.
In some cases, the enzymatic activity is methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity or demyristoylation activity. In some cases, the target polypeptide is a histone and the enzymatic activity is methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity or deubiquitinating activity. In some cases, the complex further comprises an activator-RNA. In some cases, the activator-RNA comprises a nucleotide sequence with at least 60% identity over a stretch of at least 8 contiguous nucleotides to any one of the nucleotide sequences set forth in SEQ ID NOs:431-682.
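The PAM condition above can be checked by scanning the complementary strand (written 5′ to 3′) for 5′-CCY-3′ and taking the nucleotides immediately 3′ of Y as the candidate target sequence. This is a sketch under two stated assumptions. First, Y is treated as any nucleotide, as the text defines it, rather than as the IUPAC pyrimidine code. Second, the 20-nt default length is a hypothetical choice within the 18 to 25 nt range of claim 4. Read on the opposite strand, the same motif is its reverse complement, 5′-NGG-3′.

```python
def find_targets_after_ccy(strand: str, target_len: int = 20) -> list:
    """Return (start, sequence) for every target lying immediately 3' of a 5'-CCY-3' motif.

    `strand` is the complementary strand written 5'->3'. Y is any of A, C, G, T here,
    following the text's definition. `start` is the 0-based index of the target.
    """
    s = strand.upper()
    hits = []
    # Stop early enough that a full-length target still fits after the motif.
    for i in range(len(s) - 2 - target_len):
        if s[i] == "C" and s[i + 1] == "C" and s[i + 2] in "ACGT":
            start = i + 3  # first nucleotide 3' of Y
            hits.append((start, s[start:start + target_len]))
    return hits
```

For example, scanning `"AACCA" + "GATTACAGATTACAGATTAC" + "TT"` finds a single hit at index 5, the 20-nt stretch following `CCA`. Overlapping motifs such as `CCC` yield several candidates, and each candidate would still need to be judged on its own merits.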

Features of the present disclosure include a method of modulating site-specific transcription within a target DNA, the method comprising contacting the target DNA with: (i) a DNA-targeting RNA, or a DNA polynucleotide encoding the same, wherein the DNA-targeting RNA comprises: (a) a first segment comprising a nucleotide sequence that is complementary to a sequence in the target DNA; and (b) a second segment that interacts with a site-directed modifying polypeptide; and (ii) a site-directed modifying polypeptide, or a polynucleotide encoding the same, wherein the site-directed modifying polypeptide comprises: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that modulates transcription, wherein said contacting results in modulating transcription within the target DNA. In some cases, transcription within the target DNA is increased. In some cases, transcription within the target DNA is decreased.

Features of the present disclosure include a method of site-specific modification at target DNA, the method comprising: contacting the target DNA with: (i) a DNA-targeting RNA, or a DNA polynucleotide encoding the same, wherein the DNA-targeting RNA comprises: (a) a first segment comprising a nucleotide sequence that is complementary to a sequence in the target DNA; and (b) a second segment that interacts with a site-directed modifying polypeptide; and (ii) a site-directed modifying polypeptide, or a polynucleotide encoding the same, wherein the site-directed modifying polypeptide comprises: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that modulates transcription within the target DNA. In some cases, the site-directed modifying polypeptide increases transcription within the target DNA. In some cases, the site-directed modifying polypeptide decreases transcription within the target DNA.

Features of the present disclosure include a method of promoting site-specific cleavage and modification of a target DNA in a cell, the method comprising introducing into the cell: (i) a DNA-targeting RNA, or a DNA polynucleotide encoding the same, wherein the DNA-targeting RNA comprises: (a) a first segment comprising a nucleotide sequence that is complementary to a sequence in the target DNA; and (b) a second segment that interacts with a site-directed modifying polypeptide; and (ii) a site-directed modifying polypeptide, or a polynucleotide encoding the same, wherein the site-directed modifying polypeptide comprises: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that exhibits nuclease activity that creates a double strand break in the target DNA; wherein the site of the double strand break is determined by the DNA-targeting RNA, the contacting occurs under conditions that are permissive for nonhomologous end joining or homology-directed repair, and the target DNA is cleaved and rejoined to produce a modified DNA sequence. In some cases, the method further comprises contacting the target DNA with a donor polynucleotide, wherein the donor polynucleotide, a portion of the donor polynucleotide, a copy of the donor polynucleotide, or a portion of a copy of the donor polynucleotide integrates into the target DNA. In some cases, the method does not comprise contacting the cell with a donor polynucleotide, wherein the target DNA is modified such that nucleotides within the target DNA are deleted. 
In some cases, the cell is selected from the group consisting of: an archaeal cell, a bacterial cell, a eukaryotic cell, a eukaryotic single-cell organism, a somatic cell, a germ cell, a stem cell, a plant cell, an algal cell, an animal cell, an invertebrate cell, a vertebrate cell, a fish cell, a frog cell, a bird cell, a mammalian cell, a pig cell, a cow cell, a goat cell, a sheep cell, a rodent cell, a rat cell, a mouse cell, a non-human primate cell, and a human cell. In some cases, the cell is in vitro. In some cases, the cell is in vivo.
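In the methods above, the site of the double strand break is determined by the DNA-targeting RNA. As an illustration only, the sketch below locates candidate break sites for one widely characterized case, under assumptions that are not stated in this passage: a Streptococcus pyogenes Cas9, an NGG PAM immediately 3′ of the matched sequence (the PAM requirement itself is shown in FIG. 13A-13C), and a blunt cut 3 bp upstream of the PAM. Only the given strand is scanned, and the function name is hypothetical.

```python
import re


def spcas9_cut_sites(top_strand: str, spacer: str) -> list[int]:
    """Return cut indices on `top_strand` for each match of `spacer`
    immediately followed by an NGG PAM. An index i means the break falls
    between top_strand[i - 1] and top_strand[i]. Only this strand is scanned."""
    genome = top_strand.upper()
    # The targeting segment has the sequence of the PAM-bearing strand (U read as T).
    protospacer = spacer.upper().replace("U", "T")
    # Zero-width lookahead so overlapping matches are all reported.
    pattern = re.compile("(?=" + re.escape(protospacer) + "[ACGT]GG)")
    sites = []
    for match in pattern.finditer(genome):
        pam_start = match.start() + len(protospacer)
        # Assumed SpCas9 behaviour: blunt cut 3 bp upstream (5') of the PAM.
        sites.append(pam_start - 3)
    return sites
```

With a 20-nt match starting at index 4 and followed by TGG, the PAM starts at index 24 and the sketch reports a cut at index 21; without a PAM, no site is reported.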

Features of the present disclosure include a method of producing a genetically modified cell in a subject, the method comprising: (I) introducing into a cell: (i) a DNA-targeting RNA, or a DNA polynucleotide encoding the same, wherein the DNA-targeting RNA comprises: (a) a first segment comprising a nucleotide sequence that is complementary to a sequence in the target DNA; and (b) a second segment that interacts with a site-directed modifying polypeptide; and (ii) a site-directed modifying polypeptide, or a polynucleotide encoding the same, wherein the site-directed modifying polypeptide comprises: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that exhibits nuclease activity that creates a double strand break in the target DNA; wherein the site of the double strand break is determined by the DNA-targeting RNA, the contacting occurs under conditions that are permissive for nonhomologous end joining or homology-directed repair, and the target DNA is cleaved and rejoined to produce a modified DNA sequence; thereby producing the genetically modified cell; and (II) transplanting the genetically modified cell into the subject. In some cases, the method further comprises contacting the cell with a donor polynucleotide, wherein the donor polynucleotide, a portion of the donor polynucleotide, a copy of the donor polynucleotide, or a portion of a copy of the donor polynucleotide integrates into the target DNA. In some cases, the method does not comprise contacting the cell with a donor polynucleotide, wherein the target DNA is modified such that nucleotides within the target DNA are deleted. 
In some cases, the cell is selected from the group consisting of: an archaeal cell, a bacterial cell, a eukaryotic cell, a eukaryotic single-cell organism, a somatic cell, a germ cell, a stem cell, a plant cell, an algal cell, an animal cell, an invertebrate cell, a vertebrate cell, a fish cell, an amphibian cell, a bird cell, a mammalian cell, an ungulate cell, a rodent cell, a non-human primate cell, and a human cell.

Features of the present disclosure include a method of modifying target DNA in a genetically modified cell that comprises a nucleotide sequence encoding an exogenous site-directed modifying polypeptide, the method comprising introducing into the genetically modified cell a DNA-targeting RNA, or a DNA polynucleotide encoding the same, wherein: (i) the DNA-targeting RNA comprises: (a) a first segment comprising a nucleotide sequence that is complementary to a sequence in the target DNA; and (b) a second segment that interacts with a site-directed modifying polypeptide; and (ii) the site-directed modifying polypeptide comprises: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that exhibits nuclease activity. In some cases, the site-directed modifying polypeptide comprises an amino acid sequence having at least about 75% amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9/Csn1 amino acid sequence depicted in FIG. 3A and FIG. 3B, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs:1-256 and 795-1346. In some cases, the cell is selected from the group consisting of: an archaeal cell, a bacterial cell, a eukaryotic cell, a eukaryotic single-cell organism, a somatic cell, a germ cell, a stem cell, a plant cell, an algal cell, an animal cell, an invertebrate cell, a vertebrate cell, a fish cell, an amphibian cell, a bird cell, a mammalian cell, an ungulate cell, a rodent cell, a non-human primate cell, and a human cell. In some cases, the cell is in vivo. In some cases, the cell is in vitro. In some cases, the expression of the site-directed modifying polypeptide is under the control of an inducible promoter. In some cases, the expression of the site-directed modifying polypeptide is under the control of a cell type-specific promoter.

Features of the present disclosure include a kit comprising: the DNA-targeting RNA, or a DNA polynucleotide encoding the same; and a reagent for reconstitution and/or dilution. In some cases, the kit further comprises a reagent selected from the group consisting of: a buffer for introducing into cells the DNA-targeting RNA, a wash buffer, a control reagent, a control expression vector or RNA polynucleotide, a reagent for transcribing the DNA-targeting RNA from DNA, and combinations thereof.

Features of the present disclosure include a kit comprising: a site-directed modifying polypeptide of the present disclosure, or a polynucleotide encoding the same; and a reagent for reconstitution and/or dilution. In some cases, the kit further comprises a reagent selected from the group consisting of: a buffer for introducing into cells the site-directed modifying polypeptide, a wash buffer, a control reagent, a control expression vector or RNA polynucleotide, a reagent for in vitro production of the site-directed modifying polypeptide from DNA, and combinations thereof.

Features of the present disclosure include a kit comprising: (i) a DNA-targeting RNA, or a DNA polynucleotide encoding the same, the DNA-targeting RNA comprising: (a) a first segment comprising a nucleotide sequence that is complementary to a sequence in a target DNA; and (b) a second segment that interacts with a site-directed modifying polypeptide; and (ii) the site-directed modifying polypeptide, or a polynucleotide encoding the same, the site-directed modifying polypeptide comprising: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that exhibits site-directed enzymatic activity, wherein the site of enzymatic activity is determined by the DNA-targeting RNA.

Features of the present disclosure include a kit comprising: (i) a DNA-targeting RNA, or a DNA polynucleotide encoding the same, comprising: (a) a first segment comprising a nucleotide sequence that is complementary to a sequence in a target DNA; and (b) a second segment that interacts with a site-directed modifying polypeptide; and (ii) the site-directed modifying polypeptide, or a polynucleotide encoding the same, comprising: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that modulates transcription within the target DNA, wherein the site of modulated transcription within the target DNA is determined by the DNA-targeting RNA.

Features of the present disclosure include a kit comprising: (i) any of the recombinant expression vectors above; and (ii) a reagent for reconstitution and/or dilution. Features of the present disclosure include a kit comprising: (i) any of the recombinant expression vectors above; and (ii) a recombinant expression vector comprising a nucleotide sequence that encodes a site-directed modifying polypeptide, wherein the site-directed modifying polypeptide comprises: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that exhibits site-directed enzymatic activity, wherein the site of enzymatic activity is determined by the DNA-targeting RNA. Features of the present disclosure include a kit comprising: (i) any of the recombinant expression vectors above; and (ii) a recombinant expression vector comprising a nucleotide sequence that encodes a site-directed modifying polypeptide, wherein the site-directed modifying polypeptide comprises: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that modulates transcription within the target DNA, wherein the site of modulated transcription within the target DNA is determined by the DNA-targeting RNA.

Features of the present disclosure include a kit for targeting target DNA comprising: two or more DNA-targeting RNAs, or DNA polynucleotides encoding the same, wherein the first segment of at least one of the two or more DNA-targeting RNAs differs by at least one nucleotide from the first segment of at least one other of the two or more DNA-targeting RNAs.
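The condition on this kit, that the first segment of at least one DNA-targeting RNA differs by at least one nucleotide from that of at least one other, amounts to saying the first segments are not all identical. A minimal sketch of that check follows; the function names are hypothetical, and T and U are assumed to be interchangeable spellings.

```python
def nucleotide_differences(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length first segments."""
    a_norm = a.upper().replace("T", "U")
    b_norm = b.upper().replace("T", "U")
    if len(a_norm) != len(b_norm):
        raise ValueError("segments must have equal length")
    return sum(x != y for x, y in zip(a_norm, b_norm))


def satisfies_multi_guide_condition(first_segments: list[str]) -> bool:
    """True if there are two or more segments and at least two of them differ
    by at least one nucleotide (i.e. they are not all identical)."""
    if len(first_segments) < 2:
        return False
    distinct = {s.upper().replace("T", "U") for s in first_segments}
    return len(distinct) >= 2
```

Two 20-nt first segments differing only at the last position satisfy the condition; a set containing one segment, or two copies of the same segment, does not.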

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A-1B provide a schematic drawing of two exemplary subject DNA-targeting RNAs, each associated with a site-directed modifying polypeptide and with a target DNA.

FIG. 2 depicts target DNA editing through double-stranded DNA breaks introduced using a Cas9/Csn1 site-directed modifying polypeptide and a DNA-targeting RNA.

FIG. 3A-3B depict the amino acid sequence of a Cas9/Csn1 protein from Streptococcus pyogenes (SEQ ID NO:8). Cas9 has domains homologous to both HNH and RuvC endonucleases. (FIG. 3A) Motifs 1-4 are overlined. (FIG. 3B) Domains 1 and 2 are overlined.

FIG. 4A-4B depict the percent identity between the Cas9/Csn1 proteins from multiple species. (FIG. 4A) Sequence identity relative to Streptococcus pyogenes. For example, Domain 1 is amino acids 7-166 and Domain 2 is amino acids 731-1003 of Cas9/Csn1 from Streptococcus pyogenes as depicted in FIG. 3B. (FIG. 4B) Sequence identity relative to Neisseria meningitidis. For example, Domain 1 is amino acids 13-139 and Domain 2 is amino acids 475-750 of Cas9/Csn1 from Neisseria meningitidis (SEQ ID NO:79).
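Domain coordinates such as "amino acids 7-166" are 1-based and inclusive, and percent identity is computed over an alignment. The sketch below shows both steps under stated assumptions: the sequences are already aligned to equal length with "-" for gaps, columns that are gaps in both strings are ignored, and gap-versus-residue columns count as non-identical (one of several conventions in use). The function names and toy sequences are hypothetical; no real Cas9 sequence is reproduced here.

```python
def domain(sequence: str, start: int, end: int) -> str:
    """Extract residues start..end using 1-based, inclusive numbering
    (the convention used for e.g. amino acids 7-166)."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("range outside sequence")
    return sequence[start - 1:end]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned, equal-length strings ('-' = gap).
    Double-gap columns are ignored; gap-vs-residue columns are non-identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("no aligned columns")
    identical = sum(x == y and x != "-" for x, y in columns)
    return 100.0 * identical / len(columns)
```

Under this numbering, Domain 2 (amino acids 731-1003) spans 273 residues, and a domain pair scoring 75.0 by `percent_identity` would sit exactly at the "at least about 75%" level used in the claims summary.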

FIG. 5 depicts a multiple sequence alignment of motifs 1-4 of Cas9/Csn1 proteins from various diverse species selected from the phylogenetic table in FIG. 32 (see FIG. 32, FIG. 3A, and Table 1) (Streptococcus pyogenes (Motifs 1-4: SEQ ID NOs.: 1361-1364), Legionella pneumophila (Motifs 1-4: SEQ ID NOs.: 1365-1368), Gamma proteobacterium (Motifs 1-4: SEQ ID NOs.: 1369-1372), Listeria innocua (Motifs 1-4: SEQ ID NOs.: 1373-1376), Lactobacillus gasseri (Motifs 1-4: SEQ ID NOs.: 1377-1380), Eubacterium rectale (Motifs 1-4: SEQ ID NOs.: 1381-1384), Staphylococcus lugdunensis (Motifs 1-4: SEQ ID NOs.: 1385-1388), Mycoplasma synoviae (Motifs 1-4: SEQ ID NOs.: 1389-1392), Mycoplasma mobile (Motifs 1-4: SEQ ID NOs.: 1393-1396), Wolinella succinogenes (Motifs 1-4: SEQ ID NOs.: 1397-1400), Flavobacterium columnare (Motifs 1-4: SEQ ID NOs.: 1401-1404), Fibrobacter succinogenes (Motifs 1-4: SEQ ID NOs.: 1405-1408), Bacteroides fragilis (Motifs 1-4: SEQ ID NOs.: 1409-1412), Acidothermus cellulolyticus (Motifs 1-4: SEQ ID NOs.: 1413-1416), and Bifidobacterium dentium (Motifs 1-4: SEQ ID NOs.: 1417-1420)).

FIG. 6A-6B provide alignments of naturally occurring tracrRNA (“activator-RNA”) sequences from various species (L. innocua (SEQ ID NO: 434); S. pyogenes (SEQ ID NO: 433); S. mutans (SEQ ID NO: 435); S. thermophilus1 (SEQ ID NO: 436); M. mobile (SEQ ID NO: 440); N. meningitidis (SEQ ID NO: 438); P. multocida (SEQ ID NO: 439); S. thermophilus2 (SEQ ID NO: 437); and S. pyogenes (SEQ ID NO: 433)). (FIG. 6A) Multiple sequence alignment of selected tracrRNA orthologues (AlignX, VectorNTI package, Invitrogen) associated with CRISPR/Cas loci of similar architecture and highly similar Cas9/Csn1 sequences. Black boxes represent shared nucleotides. (FIG. 6B) Multiple sequence alignment of selected tracrRNA orthologues (AlignX, VectorNTI package, Invitrogen) associated with CRISPR/Cas loci of different architecture and non-closely related Cas9/Csn1 sequences. Note the sequence similarity of N. meningitidis and P. multocida tracrRNA orthologues. Black boxes represent shared nucleotides. For more exemplary activator-RNA sequences, see SEQ ID NOs:431-562.

FIG. 7A-7B provide alignments of naturally occurring duplex-forming segments of crRNA (“targeter-RNA”) sequences from various species (L. innocua (SEQ ID NO:577); S. pyogenes (SEQ ID NO:569); S. mutans (SEQ ID NO:574); S. thermophilus1 (SEQ ID NO:575); C. jejuni (SEQ ID NO:597); S. pyogenes (SEQ ID NO:569); F. novicida (SEQ ID NO:572); M. mobile (SEQ ID NO:571); N. meningitidis (SEQ ID NO:579); P. multocida (SEQ ID NO:570); and S. thermophilus2 (SEQ ID NO:576)). (FIG. 7A) Multiple sequence alignments of exemplary duplex-forming segments of targeter-RNA sequences (AlignX, VectorNTI package, Invitrogen) associated with the loci of similar architecture and highly similar Cas9/Csn1 sequences. (FIG. 7B) Multiple sequence alignments of exemplary duplex-forming segments of targeter-RNA sequences (AlignX, VectorNTI package, Invitrogen) associated with the loci of different architecture and diverse Cas9 sequences. Black boxes represent shared nucleotides. For more exemplary duplex-forming segments of targeter-RNA sequences, see SEQ ID NOs:563-679.

FIG. 8 provides a schematic of hybridization for naturally occurring duplex-forming segments of the crRNA (“targeter-RNA”) with the duplex-forming segment of the corresponding tracrRNA orthologue (“activator-RNA”). Upper sequence, targeter-RNA; lower sequence, duplex-forming segment of the corresponding activator-RNA. The CRISPR loci belong to the Type II (Nmeni/CASS4) CRISPR/Cas system. Nomenclature is according to the CRISPR database (CRISPR DB). SEQ ID numbers are listed top to bottom: S. pyogenes (SEQ ID NOs:569 and 442); S. mutans (SEQ ID NOs:574 and 443); S. thermophilus1 (SEQ ID NOs:575 and 444); S. thermophilus2 (SEQ ID NOs:576 and 445); L. innocua (SEQ ID NOs:577 and 446); T. denticola (SEQ ID NOs:578 and 448); N. meningitidis (SEQ ID NOs:579 and 449); S. gordonii (SEQ ID NOs:580 and 451); B. bifidum (SEQ ID NOs:581 and 452); L. salivarius (SEQ ID NOs:582 and 453); F. tularensis (SEQ ID NOs:583, 454, 584, and 455); and L. pneumophila (SEQ ID NOs:585 and 456). Note that some species each contain two Type II CRISPR loci. For more exemplary activator-RNA sequences, see SEQ ID NOs:431-562. For more exemplary duplex-forming segments of targeter-RNA sequences, see SEQ ID NOs:563-679.

FIG. 9 depicts example tracrRNA (activator-RNA) and crRNA (targeter-RNA) sequences from two species. A degree of interchangeability exists; for example, the S. pyogenes Cas9/Csn1 protein is functional with tracrRNA and crRNA derived from L. innocua. (|) denotes a canonical Watson-Crick base pair while (•) denotes a G-U wobble base pair. “Variable 20 nt” or “20 nt” represents the DNA-targeting segment that is complementary to a target DNA (this region can be up to about 100 nt in length). Also shown is the design of single-molecule DNA-targeting RNA that incorporates features of the targeter-RNA and the activator-RNA. (Cas9/Csn1 protein sequences from a wide variety of species are depicted in FIG. 3A and FIG. 3B and set forth as SEQ ID NOs:1-256 and 795-1346.) Streptococcus pyogenes: top to bottom: (SEQ ID NO: 1421, 478, 1423); Listeria innocua: top to bottom: (SEQ ID NO: 1422, 479, 1424). The sequences provided are non-limiting examples and are meant to illustrate how single-molecule DNA-targeting RNAs and two-molecule DNA-targeting RNAs can be designed based on naturally existing sequences from a wide variety of species. Various examples of suitable sequences from a wide variety of species are set forth as follows (Cas9 protein: SEQ ID NOs:1-259; tracrRNAs: SEQ ID NOs:431-562, or the complements thereof; crRNAs: SEQ ID NOs:563-679, or the complements thereof; and example single-molecule DNA-targeting RNAs: SEQ ID NOs:680-682).
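The single-molecule design in FIG. 9 joins a variable DNA-targeting segment to a fixed protein-binding portion. A minimal sketch of that assembly follows, under stated assumptions: the targeting segment is placed at the 5′ end (as in the S. pyogenes-style design), it is written with the same sequence as the protospacer on the non-target strand (T replaced by U) so that it is complementary to the target strand, and the scaffold is supplied by the caller. The function names and the toy scaffold in the usage are hypothetical placeholders, not sequences from this disclosure.

```python
DNA_BASES = set("ACGT")


def spacer_from_protospacer(protospacer_dna: str) -> str:
    """Return the RNA targeting segment for a protospacer given 5'->3' on the
    non-target strand: same sequence with T replaced by U, which makes it
    complementary to the target strand."""
    dna = protospacer_dna.upper()
    if not dna or set(dna) - DNA_BASES:
        raise ValueError("protospacer must be a non-empty A/C/G/T string")
    return dna.replace("T", "U")


def assemble_single_molecule_guide(protospacer_dna: str, scaffold_rna: str,
                                   max_spacer_len: int = 100) -> str:
    """Concatenate a 5' targeting segment with a caller-supplied scaffold.
    `scaffold_rna` is a placeholder for the protein-binding portion."""
    spacer = spacer_from_protospacer(protospacer_dna)
    # FIG. 9 notes the targeting region can be up to about 100 nt.
    if len(spacer) > max_spacer_len:
        raise ValueError("spacer longer than max_spacer_len")
    return spacer + scaffold_rna.upper()
```

For example, the 20-nt protospacer GACGCATAAAGATGAGACGC yields the targeting segment GACGCAUAAAGAUGAGACGC, to which the chosen scaffold is appended.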

FIG. 10A-10E show that Cas9 is a DNA endonuclease guided by two RNA molecules. FIG. 10E (top to bottom, SEQ ID NOs: 278-280, and 431).

FIG. 11A-11B demonstrate that Cas9 uses two nuclease domains to cleave the two strands in the target DNA.

FIG. 12A-12E illustrate that Cas9-catalyzed cleavage of target DNA requires an activating domain in tracrRNA and is governed by a seed sequence in the crRNA. FIG. 12C (top to bottom, SEQ ID NOs:278-280, and 431); FIG. 12D (top to bottom, SEQ ID NOs: 281-290); and FIG. 12E (top to bottom, SEQ ID NOs: 291-292, 283, 293-298).

FIG. 13A-13C show that a PAM is required to license target DNA cleavage by the Cas9-tracrRNA:crRNA complex.

FIG. 14A-14C illustrate that Cas9 can be programmed using a single engineered RNA molecule combining tracrRNA and crRNA features. Chimera A (SEQ ID NO:299); Chimera B (SEQ ID NO:300).

FIG. 15 depicts the type II RNA-mediated CRISPR/Cas immune pathway.

FIG. 16A-16B depict purification of Cas9 nucleases.

FIG. 17A-17C show that Cas9 guided by dual-tracrRNA:crRNA cleaves protospacer plasmid and oligonucleotide DNA. FIG. 17B (top to bottom, SEQ ID NOs: 301-303, and 487); and FIG. 17C (top to bottom, SEQ ID NOs:304-306, and 431).

FIG. 18A-18B show that Cas9 is a Mg2+-dependent endonuclease with 3′-5′ exonuclease activity.

FIG. 19A-19C illustrate that dual-tracrRNA:crRNA directed Cas9 cleavage of target DNA is site specific. FIG. 19A (top to bottom, SEQ ID NOs: 1350 and 1351); FIG. 19C (top to bottom, SEQ ID NOs: 307-309, 487, 337-339, and 431).

FIG. 20A-20B show that dual-tracrRNA:crRNA directed Cas9 cleavage of target DNA is fast and efficient.

FIG. 21A-21B show that the HNH and RuvC-like domains of Cas9 direct cleavage of the complementary and noncomplementary DNA strand, respectively.

FIG. 22 demonstrates that tracrRNA is required for target DNA recognition.

FIG. 23A-23B show that a minimal region of tracrRNA is capable of guiding dual-tracrRNA:crRNA directed cleavage of target DNA.

FIG. 24A-24D demonstrate that dual-tracrRNA:crRNA guided target DNA cleavage by Cas9 can be species specific.

FIG. 25A-25C show that a seed sequence in the crRNA governs dual tracrRNA:crRNA directed cleavage of target DNA by Cas9. FIG. 25A: target DNA probe 1 (SEQ ID NO:310); spacer 4 crRNA (1-42) (SEQ ID NO:311); tracrRNA (15-89) (SEQ ID NO: 1352). FIG. 25B left panel (SEQ ID NO:310).

FIG. 26A-26C demonstrate that the PAM sequence is essential for protospacer plasmid DNA cleavage by Cas9-tracrRNA:crRNA and for Cas9-mediated plasmid DNA interference in bacterial cells. FIG. 26B (top to bottom, SEQ ID NOs:312-314); and FIG. 26C (top to bottom, SEQ ID NOs:315-320).

FIG. 27A-27C show that Cas9 guided by a single chimeric RNA mimicking dual tracrRNA:crRNA cleaves protospacer DNA. FIG. 27C (top to bottom, SEQ ID NOs:321-324).

FIG. 28A-28D depict de novo design of chimeric RNAs targeting the Green Fluorescent Protein (GFP) gene sequence. FIG. 28B (top to bottom, SEQ ID NOs:325-326). FIG. 28C: GFP1 target sequence (SEQ ID NO:327); GFP2 target sequence (SEQ ID NO:328); GFP3 target sequence (SEQ ID NO:329); GFP4 target sequence (SEQ ID NO:330); GFP5 target sequence (SEQ ID NO:331); GFP1 chimeric RNA (SEQ ID NO:332); GFP2 chimeric RNA (SEQ ID NO:333); GFP3 chimeric RNA (SEQ ID NO:334); GFP4 chimeric RNA (SEQ ID NO:335); GFP5 chimeric RNA (SEQ ID NO:336).

FIG. 29A-29E demonstrate that co-expression of Cas9 and guide RNA in human cells generates double-strand DNA breaks at the target locus. FIG. 29C (top to bottom, SEQ ID NOs:425-428).

FIG. 30A-30B demonstrate that cell lysates contain active Cas9:sgRNA and support site-specific DNA cleavage.

FIG. 31A-31B demonstrate that 3′ extension of sgRNA constructs enhances site-specific NHEJ-mediated mutagenesis. FIG. 31A (top to bottom, SEQ ID NOs:428-430).

FIG. 32A-32B depict a phylogenetic tree of representative Cas9 sequences from various organisms (FIG. 32A) as well as Cas9 locus architectures for the main groups of the tree (FIG. 32B).

FIG. 33A-33E depict the architecture of type II CRISPR-Cas from selected bacterial species.

FIG. 34A-34B depict tracrRNA and pre-crRNA co-processing in selected type II CRISPR-Cas systems. FIG. 34A (top to bottom, SEQ ID NOs: 618, 442, 574, 443, 577, 447, 573, 481); FIG. 34B (top to bottom, SEQ ID NOs: 598, 470, 579, 450).

FIG. 35 depicts a sequence alignment of tracrRNA orthologues demonstrating the diversity of tracrRNA sequences.

FIG. 36A-36F depict the expression of bacterial tracrRNA orthologues and crRNAs revealed by deep RNA sequencing.

FIG. 37A-37O list all tracrRNA orthologues and mature crRNAs retrieved by sequencing for the bacterial species studied, including coordinates (region of interest) and corresponding cDNA sequences (5′ to 3′).

FIG. 38A-38B present a table of bacterial species containing type II CRISPR-Cas loci characterized by the presence of the signature gene cas9. These sequences were used for phylogenetic analyses.

FIG. 39A-39B depict the design of the CRISPR interference (CRISPRi) system.

FIG. 40A-40E demonstrate that CRISPRi effectively silences transcription elongation and initiation.

FIG. 41A-41B demonstrate that CRISPRi functions by blocking transcription elongation.

FIG. 42A-42C demonstrate the targeting specificity of the CRISPRi system.

FIG. 43A-43F depict the characterization of factors that affect silencing efficiency.

FIG. 44A-44C depict functional profiling of a complex regulatory network using CRISPRi gene knockdown.

FIG. 45A-45B demonstrate gene silencing using CRISPRi in mammalian cells.

FIG. 46 depicts the mechanism of the type II CRISPR system from S. pyogenes.

FIG. 47A-47B depict the growth curves of E. coli cell cultures co-transformed with dCas9 and sgRNA.

FIG. 48 shows that CRISPRi could silence expression of a reporter gene on a multiple-copy plasmid.

FIG. 49A-49C depict the RNA-seq data of cells with sgRNAs that target different genes.

FIG. 50A-50E depict the silencing effects of sgRNAs with adjacent double mismatches.

FIG. 51A-51C depict the combinatorial silencing effects of using two sgRNAs to regulate a single gene.

FIG. 52 shows that sgRNA repression is dependent on the target loci and their relative distance from the transcription start site.

FIG. 53A-53C depict experimental results demonstrating that a variant Cas9 site-directed polypeptide (dCas9) works for the subject methods when dCas9 has reduced activity in the RuvC1 domain only (e.g., D10A), the HNH domain only (e.g., H840A), or both domains (e.g., D10A and H840A).

FIG. 54A-54C list examples of suitable fusion partners (or fragments thereof) for a subject variant Cas9 site-directed polypeptide. Examples include, but are not limited to, those listed.

FIG. 55A-55D demonstrate that a chimeric site-directed polypeptide can be used to activate (increase) transcription in human cells.

FIG. 56 demonstrates that a chimeric site-directed polypeptide can be used to repress (decrease) transcription in human cells.

FIG. 57A-57B demonstrate that artificial sequences that share roughly 50% identity with naturally occurring tracrRNAs and crRNAs can function with Cas9 to cleave target DNA as long as the structure of the protein-binding domain of the DNA-targeting RNA is conserved.

DEFINITIONS—Part I

The terms “polynucleotide” and “nucleic acid,” used interchangeably herein, refer to a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. Thus, these terms include, but are not limited to, single-, double-, or multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, or a polymer comprising purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases. “Oligonucleotide” generally refers to polynucleotides of between about 5 and about 100 nucleotides of single- or double-stranded DNA. However, for the purposes of this disclosure, there is no upper limit to the length of an oligonucleotide. Oligonucleotides are also known as “oligomers” or “oligos” and may be isolated from genes, or chemically synthesized by methods known in the art. The terms “polynucleotide” and “nucleic acid” should be understood to include, as applicable to the embodiments being described, single-stranded (such as sense or antisense) and double-stranded polynucleotides.

A “stem-loop structure” refers to a nucleic acid having a secondary structure that includes a region of nucleotides which are known or predicted to form a double strand (stem portion) that is linked on one side by a region of predominantly single-stranded nucleotides (loop portion). The terms “hairpin” and “fold-back” structures are also used herein to refer to stem-loop structures. Such structures are well known in the art and these terms are used consistently with their known meanings in the art. As is known in the art, a stem-loop structure does not require exact base-pairing. Thus, the stem may include one or more base mismatches. Alternatively, the base-pairing may be exact, i.e. not include any mismatches.

By “hybridizable” or “complementary” or “substantially complementary” it is meant that a nucleic acid (e.g. RNA) comprises a sequence of nucleotides that enables it to non-covalently bind, i.e. form Watson-Crick base pairs and/or G/U base pairs, “anneal”, or “hybridize,” to another nucleic acid in a sequence-specific, antiparallel, manner (i.e., a nucleic acid specifically binds to a complementary nucleic acid) under the appropriate in vitro and/or in vivo conditions of temperature and solution ionic strength. As is known in the art, standard Watson-Crick base-pairing includes: adenine (A) pairing with thymine (T), adenine (A) pairing with uracil (U), and guanine (G) pairing with cytosine (C) [DNA, RNA]. In addition, it is also known in the art that for hybridization between two RNA molecules (e.g., dsRNA), guanine (G) base pairs with uracil (U). For example, G/U base-pairing is partially responsible for the degeneracy (i.e., redundancy) of the genetic code in the context of tRNA anti-codon base-pairing with codons in mRNA. In the context of this disclosure, a guanine (G) of a protein-binding segment (dsRNA duplex) of a subject DNA-targeting RNA molecule is considered complementary to a uracil (U), and vice versa. As such, when a G/U base-pair can be made at a given nucleotide position of a protein-binding segment (dsRNA duplex) of a subject DNA-targeting RNA molecule, the position is not considered to be non-complementary, but is instead considered to be complementary.
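The rule above, that a G/U pair in the protein-binding duplex counts as complementary, can be expressed as a simple antiparallel check. In this sketch both RNA strands are written 5′ to 3′, so position i of one strand faces position -1-i of the other; the function and constant names are hypothetical, and the `allow_gu` switch exists only to contrast the two conventions.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def is_complementary(strand_a: str, strand_b: str, allow_gu: bool = True) -> bool:
    """Check antiparallel complementarity of two RNA strands, both given 5'->3'.
    Position i of strand_a pairs with position -1-i of strand_b."""
    a, b = strand_a.upper(), strand_b.upper()
    if len(a) != len(b):
        return False
    # G/U wobble pairs count as complementary unless explicitly disallowed.
    allowed = (WATSON_CRICK | WOBBLE) if allow_gu else WATSON_CRICK
    return all((x, y) in allowed for x, y in zip(a, reversed(b)))
```

For instance, GAGU and GCUC pair as G-C, A-U, G-C and U-G; they are complementary under the convention above, but not if only Watson-Crick pairs are accepted.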

Hybridization and washing conditions are well known and exemplified in Sambrook, J., Fritsch, E. F. and Maniatis, T., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor (1989), particularly Chapter 11 and Table 11.1 therein; and Sambrook, J. and Russell, D. W., Molecular Cloning: A Laboratory Manual, Third Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor (2001). The conditions of temperature and ionic strength determine the “stringency” of the hybridization.

Hybridization requires that the two nucleic acids contain complementary sequences, although mismatches between bases are possible. The conditions appropriate for hybridization between two nucleic acids depend on the length of the nucleic acids and the degree of complementarity, variables well known in the art. The greater the degree of complementarity between two nucleotide sequences, the greater the value of the melting temperature (Tm) for hybrids of nucleic acids having those sequences. For hybridizations between nucleic acids with short stretches of complementarity (e.g. complementarity over 35 or fewer, 30 or fewer, 25 or fewer, 22 or fewer, 20 or fewer, or 18 or fewer nucleotides) the position of mismatches becomes important (see Sambrook et al., supra, 11.7-11.8). Typically, the length for a hybridizable nucleic acid is at least about 10 nucleotides. Illustrative minimum lengths for a hybridizable nucleic acid are: at least about 15 nucleotides; at least about 20 nucleotides; at least about 22 nucleotides; at least about 25 nucleotides; and at least about 30 nucleotides. Furthermore, the skilled artisan will recognize that the temperature and wash solution salt concentration may be adjusted as necessary according to factors such as length of the region of complementarity and the degree of complementarity.
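To make the dependence of Tm on length and composition concrete, the sketch below uses the Wallace rule, a common rule of thumb for short, perfectly matched DNA oligonucleotides (about 2 °C per A·T pair plus 4 °C per G·C pair). This rule is not part of this disclosure and is swapped in only for illustration: it ignores salt concentration and mismatches, so it does not capture the mismatch-position effects described above. The function name is hypothetical.

```python
def wallace_tm(oligo: str) -> int:
    """Rough Tm in degrees C for a short, perfectly matched DNA duplex:
    2 degrees per A/T plus 4 degrees per G/C (Wallace rule of thumb)."""
    seq = oligo.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("expected an A/C/G/T sequence")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc
```

A 16-mer with 8 G/C and 8 A/T bases scores 48 °C by this rule, and lengthening the duplex or raising its G/C content raises the estimate.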

It is understood in the art that the sequence of a polynucleotide need not be 100% complementary to that of its target nucleic acid to be specifically hybridizable or hybridizable. Moreover, a polynucleotide may hybridize over one or more segments such that intervening or adjacent segments are not involved in the hybridization event (e.g., a loop structure or hairpin structure). A polynucleotide can comprise at least 70%, at least 80%, at least 90%, at least 95%, at least 99%, or 100% sequence complementarity to a target region within the target nucleic acid sequence to which it is targeted. For example, an antisense nucleic acid in which 18 of 20 nucleotides of the antisense compound are complementary to a target region, and would therefore specifically hybridize, would represent 90 percent complementarity. In this example, the remaining noncomplementary nucleotides may be clustered or interspersed with complementary nucleotides and need not be contiguous to each other or to complementary nucleotides. Percent complementarity between particular stretches of nucleic acid sequences within nucleic acids can be determined routinely using BLAST programs (basic local alignment search tools) and PowerBLAST programs known in the art (Altschul et al., J. Mol. Biol., 1990, 215, 403-410; Zhang and Madden, Genome Res., 1997, 7, 649-656) or by using the Gap program (Wisconsin Sequence Analysis Package, Version 8 for Unix, Genetics Computer Group, University Research Park, Madison, Wis.), using default settings, which uses the algorithm of Smith and Waterman (Adv. Appl. Math., 1981, 2, 482-489).
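The 18-of-20 example above can be reproduced with a short ungapped calculation. This sketch assumes both strands are the same length, are written 5′ to 3′ (so pairing is antiparallel), and that U may be read as T; it is a simplification of what the cited alignment programs do, since it allows no gaps. The function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def percent_complementarity(probe: str, target_region: str) -> float:
    """Ungapped percent complementarity of two equal-length strands, both
    written 5'->3' (U is read as T). Mismatches may be clustered or scattered."""
    p = probe.upper().replace("U", "T")
    t = target_region.upper().replace("U", "T")
    if not p or len(p) != len(t):
        raise ValueError("strands must be non-empty and of equal length")
    # Antiparallel pairing: probe[i] faces target_region[-1 - i].
    paired = sum(COMPLEMENT[x] == y for x, y in zip(p, reversed(t)))
    return 100.0 * paired / len(p)
```

Taking the exact reverse complement of a 20-nt target gives 100.0; introducing two mismatches anywhere in it gives 90.0, matching the example in the text.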

The terms “peptide,” “polypeptide,” and “protein” are used interchangeably herein, and refer to a polymeric form of amino acids of any length, which can include coded and non-coded amino acids, chemically or biochemically modified or derivatized amino acids, and polypeptides having modified peptide backbones.

“Binding” as used herein (e.g., with reference to an RNA-binding domain of a polypeptide) refers to a non-covalent interaction between macromolecules (e.g., between a protein and a nucleic acid). While in a state of non-covalent interaction, the macromolecules are said to be “associated” or “interacting” or “binding” (e.g., when a molecule X is said to interact with a molecule Y, it is meant the molecule X binds to molecule Y in a non-covalent manner). Not all components of a binding interaction need be sequence-specific (e.g., contacts with phosphate residues in a DNA backbone), but some portions of a binding interaction may be sequence-specific. Binding interactions are generally characterized by a dissociation constant (Kd) of less than 10⁻⁶ M, less than 10⁻⁷ M, less than 10⁻⁸ M, less than 10⁻⁹ M, less than 10⁻¹⁰ M, less than 10⁻¹¹ M, less than 10⁻¹² M, less than 10⁻¹³ M, less than 10⁻¹⁴ M, or less than 10⁻¹⁵ M. “Affinity” refers to the strength of binding, increased binding affinity being correlated with a lower Kd.
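The inverse relationship between Kd and affinity can be made concrete with the standard single-site binding isotherm, fraction bound = [L] / (Kd + [L]). The Python sketch below assumes a simple 1:1 interaction at equilibrium with the ligand in large excess, so that free ligand concentration is approximately the total. The function name and the concentrations are illustrative only. At a fixed ligand concentration, a lower Kd gives a larger bound fraction, and half of the binding sites are occupied when [L] equals Kd.

```python
def fraction_bound(ligand_molar: float, kd_molar: float) -> float:
    """Equilibrium fraction of binding sites occupied for a 1:1 interaction.

    Assumes free ligand concentration ~= total ligand concentration.
    """
    if ligand_molar < 0 or kd_molar <= 0:
        raise ValueError("ligand must be non-negative and Kd positive")
    return ligand_molar / (kd_molar + ligand_molar)


ligand = 1e-8  # 10 nM ligand (illustrative value)
for kd in (1e-6, 1e-8, 1e-10):
    print(f"Kd = {kd:.0e} M -> fraction bound = {fraction_bound(ligand, kd):.3f}")
```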

By “binding domain” it is meant a protein domain that is able to bind non-covalently to another molecule. A binding domain can bind to, for example, a DNA molecule (a DNA-binding protein), an RNA molecule (an RNA-binding protein), and/or a protein molecule (a protein-binding protein). In the case of a protein-binding protein, it can bind to itself (to form homodimers, homotrimers, etc.) and/or it can bind to one or more molecules of a different protein or proteins.

The term “conservative amino acid substitution” refers to the interchangeability in proteins of amino acid residues having similar side chains. For example, a group of amino acids having aliphatic side chains consists of glycine, alanine, valine, leucine, and isoleucine; a group of amino acids having aliphatic-hydroxyl side chains consists of serine and threonine; a group of amino acids having amide-containing side chains consists of asparagine and glutamine; a group of amino acids having aromatic side chains consists of phenylalanine, tyrosine, and tryptophan; a group of amino acids having basic side chains consists of lysine, arginine, and histidine; a group of amino acids having acidic side chains consists of glutamate and aspartate; and a group of amino acids having sulfur-containing side chains consists of cysteine and methionine. Exemplary conservative amino acid substitution groups are: valine-leucine-isoleucine, phenylalanine-tyrosine, lysine-arginine, alanine-valine, and asparagine-glutamine.
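The side-chain classes and the narrower list of exemplary substitution groups above can be encoded as simple lookup tables. The Python sketch below uses one-letter amino acid codes; the helper names are hypothetical. It shows that two residues can share a side-chain class (e.g., aspartate and glutamate) without forming one of the exemplary pairs listed.

```python
# Side-chain classes named in the text (one-letter amino acid codes).
SIDE_CHAIN_CLASSES = {
    "aliphatic": set("GAVLI"),
    "aliphatic-hydroxyl": set("ST"),
    "amide-containing": set("NQ"),
    "aromatic": set("FYW"),
    "basic": set("KRH"),
    "acidic": set("ED"),
    "sulfur-containing": set("CM"),
}

# Exemplary conservative substitution groups listed in the text.
EXEMPLARY_GROUPS = [set("VLI"), set("FY"), set("KR"), set("AV"), set("NQ")]


def shares_side_chain_class(a: str, b: str) -> bool:
    """True if residues a and b fall in the same side-chain class."""
    return any(a in members and b in members for members in SIDE_CHAIN_CLASSES.values())


def is_exemplary_conservative(a: str, b: str) -> bool:
    """True if a -> b is a substitution within one of the exemplary groups."""
    return a != b and any(a in group and b in group for group in EXEMPLARY_GROUPS)


print(is_exemplary_conservative("K", "R"))  # True
print(shares_side_chain_class("D", "E"), is_exemplary_conservative("D", "E"))  # True False
```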

A polynucleotide or polypeptide has a certain percent “sequence identity” to another polynucleotide or polypeptide, meaning that, when aligned, that percentage of bases or amino acids are the same, and in the same relative position, when comparing the two sequences. Sequence identity can be determined in a number of different manners. To determine sequence identity, sequences can be aligned using various methods and computer programs (e.g., BLAST, T-COFFEE, MUSCLE, MAFFT, etc.), available over the world wide web at sites including ncbi.nlm.nih.gov/BLAST, ebi.ac.uk/Tools/msa/tcoffee/, ebi.ac.uk/Tools/msa/muscle/, mafft.cbrc.jp/alignment/software/. See, e.g., Altschul et al. (1990), J. Mol. Biol. 215:403-10.
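Once two sequences have been aligned by a tool such as those listed, percent identity reduces to counting identical columns. The Python sketch below assumes a gapped alignment is already in hand (equal-length strings, gaps written as `-`). Its denominator, every column except those where both sequences have a gap, is only one of several conventions, and alignment programs differ on this point. The function name and example alignment are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; every other column
    (including residue-versus-gap columns) counts toward the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == gap and y == gap)
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    # A column is identical when both residues match and neither is a gap.
    identical = sum(x == y and x != gap for x, y in columns)
    return 100.0 * identical / len(columns)


# Hypothetical alignment: 7 identical columns out of 9.
print(round(percent_identity("ACGT-ACGT", "ACGTTACCT"), 1))  # 77.8
```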

A DNA sequence that “encodes” a particular RNA is a DNA sequence that is transcribed into RNA. A DNA polynucleotide may encode an RNA (mRNA) that is translated into protein, or a DNA polynucleotide may encode an RNA that is not translated into protein (e.g., tRNA, rRNA, or a DNA-targeting RNA; also called “non-coding” RNA or “ncRNA”).

A “protein coding sequence” or a sequence that encodes a particular protein or polypeptide, is a nucleic acid sequence that is transcribed into mRNA (in the case of DNA) and is translated (in the case of mRNA) into a polypeptide in vitro or in vivo when placed under the control of appropriate regulatory sequences. The boundaries of the coding sequence are determined by a start codon at the 5′ terminus (N-terminus) and a translation stop nonsense codon at the 3′ terminus (C-terminus). A coding sequence can include, but is not limited to, cDNA from prokaryotic or eukaryotic mRNA, genomic DNA sequences from prokaryotic or eukaryotic DNA, and synthetic nucleic acids. A transcription termination sequence will usually be located 3′ to the coding sequence.
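The boundary rule above, a start codon at the 5′ end and an in-frame stop codon at the 3′ end, can be sketched as a scan over a DNA string. This is a minimal Python illustration assuming the standard genetic code (ATG start; TAA, TAG, TGA stops), a single strand read 5′ to 3′, and no introns or alternative start codons. The function name and the example sequence are made up.

```python
from typing import Optional, Tuple

START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_coding_sequence(dna: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first ATG that has an in-frame stop codon.

    `end` is exclusive and includes the stop codon. Only the given strand is
    scanned, 5'->3'; returns None if no such reading frame exists.
    """
    dna = dna.upper()
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != START_CODON:
            continue
        # Walk codon by codon in the frame set by this start codon.
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                return start, pos + 3
    return None


seq = "CCATGAAATTTTAAGG"  # made-up sequence
bounds = find_coding_sequence(seq)
print(bounds, seq[bounds[0]:bounds[1]])  # (2, 14) ATGAAATTTTAA
```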

As used herein, a “promoter sequence” is a DNA regulatory region capable of binding RNA polymerase and initiating transcription of a downstream (3′ direction) coding or non-coding sequence. For purposes of defining the present invention, the promoter sequence is bounded at its 3′ terminus by the transcription initiation site and extends upstream (5′ direction) to include the minimum number of bases or elements necessary to initiate transcription at levels detectable above background. Within the promoter sequence will be found a transcription initiation site, as well as protein binding domains responsible for the binding of RNA polymerase. Eukaryotic promoters will often, but not always, contain “TATA” boxes and “CAT” boxes. Various promoters, including inducible promoters, may be used to drive the various vectors of the present invention.

A promoter can be a constitutively active promoter (i.e., a promoter that is constitutively in an active/“ON” state); it may be an inducible promoter (i.e., a promoter whose state, active/“ON” or inactive/“OFF”, is controlled by an external stimulus, e.g., a particular temperature or the presence of a particular compound or protein); it may be a spatially restricted promoter (i.e., transcriptional control element, enhancer, etc.) (e.g., tissue specific promoter, cell type specific promoter, etc.); and it may be a temporally restricted promoter (i.e., the promoter is in the “ON” state or “OFF” state during specific stages of embryonic development or during specific stages of a biological process, e.g., hair follicle cycle in mice).
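The four promoter categories above differ only in what decides the ON/OFF state. As a conceptual sketch (not a biological simulation), each can be modelled in Python as a function from a cellular context to ON (`True`) or OFF (`False`). Every class and function name here is hypothetical. The stage labels "anagen" and "telogen" (hair follicle cycle phases) are used only as illustrative inputs. Combining a spatial and a temporal rule mirrors promoters that are both spatially and temporally restricted.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class CellContext:
    """Hypothetical snapshot of the conditions a promoter could respond to."""
    stimuli: FrozenSet[str] = frozenset()
    cell_type: str = ""
    stage: str = ""


# A promoter is modelled as a function from context to ON (True) or OFF (False).
Promoter = Callable[[CellContext], bool]


def constitutive() -> Promoter:
    return lambda ctx: True


def inducible(stimulus: str) -> Promoter:
    return lambda ctx: stimulus in ctx.stimuli


def spatially_restricted(cell_types: FrozenSet[str]) -> Promoter:
    return lambda ctx: ctx.cell_type in cell_types


def temporally_restricted(stages: FrozenSet[str]) -> Promoter:
    return lambda ctx: ctx.stage in stages


def all_of(*promoters: Promoter) -> Promoter:
    """ON only when every component is ON (e.g., spatially and temporally restricted)."""
    return lambda ctx: all(p(ctx) for p in promoters)


dox_on = inducible("doxycycline")
follicle_anagen = all_of(
    spatially_restricted(frozenset({"hair follicle"})),
    temporally_restricted(frozenset({"anagen"})),
)

print(dox_on(CellContext(stimuli=frozenset({"doxycycline"}))))  # True
print(follicle_anagen(CellContext(cell_type="hair follicle", stage="telogen")))  # False
```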

Suitable promoters can be derived from viruses and can therefore be referred to as viral promoters, or they can be derived from any organism, including prokaryotic or eukaryotic organisms. Suitable promoters can be used to drive expression by any RNA polymerase (e.g., pol I, pol II, pol III). Exemplary promoters include, but are not limited to, the SV40 early promoter; a mouse mammary tumor virus long terminal repeat (LTR) promoter; an adenovirus major late promoter (Ad MLP); a herpes simplex virus (HSV) promoter; a cytomegalovirus (CMV) promoter such as the CMV immediate early promoter region (CMVIE); a Rous sarcoma virus (RSV) promoter; a human U6 small nuclear promoter (U6) (Miyagishi et al., Nature Biotechnology 20, 497-500 (2002)); an enhanced U6 promoter (e.g., Xia et al., Nucleic Acids Res. 2003 Sep. 1; 31(17)); a human H1 promoter (H1); and the like.

Examples of inducible promoters include, but are not limited to, a T7 RNA polymerase promoter, a T3 RNA polymerase promoter, an isopropyl-beta-D-thiogalactopyranoside (IPTG)-regulated promoter, a lactose-induced promoter, a heat shock promoter, a tetracycline-regulated promoter, a steroid-regulated promoter, a metal-regulated promoter, an estrogen receptor-regulated promoter, etc. Inducible promoters can therefore be regulated by molecules including, but not limited to, doxycycline; RNA polymerase, e.g., T7 RNA polymerase; an estrogen receptor; an estrogen receptor fusion; etc.

In some embodiments, the promoter is a spatially restricted promoter (i.e., cell type specific promoter, tissue specific promoter, etc.) such that in a multi-cellular organism, the promoter is active (i.e., “ON”) in a subset of specific cells. Spatially restricted promoters may also be referred to as enhancers, transcriptional control elements, control sequences, etc. Any convenient spatially restricted promoter may be used and the choice of suitable promoter (e.g., a brain specific promoter, a promoter that drives expression in a subset of neurons, a promoter that drives expression in the germline, a promoter that drives expression in the lungs, a promoter that drives expression in muscles, a promoter that drives expression in islet cells of the pancreas, etc.) will depend on the organism. For example, various spatially restricted promoters are known for plants, flies, worms, mammals, mice, etc. Thus, a spatially restricted promoter can be used to regulate the expression of a nucleic acid encoding a subject site-directed modifying polypeptide in a wide variety of different tissues and cell types, depending on the organism. Some spatially restricted promoters are also temporally restricted such that the promoter is in the “ON” state or “OFF” state during specific stages of embryonic development or during specific stages of a biological process (e.g., hair follicle cycle in mice).

For illustration purposes, examples of spatially restricted promoters include, but are not limited to, neuron-specific promoters, adipocyte-specific promoters, cardiomyocyte-specific promoters, smooth muscle-specific promoters, photoreceptor-specific promoters, etc. Neuron-specific spatially restricted promoters include, but are not limited to, a neuron-specific enolase (NSE) promoter (see, e.g., EMBL HSENO2, X51956); an aromatic amino acid decarboxylase (AADC) promoter; a neurofilament promoter (see, e.g., GenBank HUMNFL, L04147); a synapsin promoter (see, e.g., GenBank HUMSYNIB, M55301); a thy-1 promoter (see, e.g., Chen et al. (1987) Cell 51:7-19; and Llewellyn et al. (2010) Nat. Med. 16(10):1161-1166); a serotonin receptor promoter (see, e.g., GenBank S62283); a tyrosine hydroxylase promoter (TH) (see, e.g., Oh et al. (2009) Gene Ther 16:437; Sasaoka et al. (1992) Mol. Brain Res. 16:274; Boundy et al. (1998) J. Neurosci. 18:9989; and Kaneda et al. (1991) Neuron 6:583-594); a GnRH promoter (see, e.g., Radovick et al. (1991) Proc. Natl. Acad. Sci. USA 88:3402-3406); an L7 promoter (see, e.g., Oberdick et al. (1990) Science 248:223-226); a DNMT promoter (see, e.g., Bartge et al. (1988) Proc. Natl. Acad. Sci. USA 85:3648-3652); an enkephalin promoter (see, e.g., Comb et al. (1988) EMBO J. 7:3793-3805); a myelin basic protein (MBP) promoter; a Ca²⁺/calmodulin-dependent protein kinase II-alpha (CamKIIα) promoter (see, e.g., Mayford et al. (1996) Proc. Natl. Acad. Sci. USA 93:13250; and Casanova et al. (2001) Genesis 31:37); a CMV enhancer/platelet-derived growth factor-β promoter (see, e.g., Liu et al. (2004) Gene Therapy 11:52-60); and the like.

Adipocyte-specific spatially restricted promoters include, but are not limited to, an aP2 gene promoter/enhancer, e.g., a region from −5.4 kb to +21 bp of a human aP2 gene (see, e.g., Tozzo et al. (1997) Endocrinol. 138:1604; Ross et al. (1990) Proc. Natl. Acad. Sci. USA 87:9590; and Pavjani et al. (2005) Nat. Med. 11:797); a glucose transporter-4 (GLUT4) promoter (see, e.g., Knight et al. (2003) Proc. Natl. Acad. Sci. USA 100:14725); a fatty acid translocase (FAT/CD36) promoter (see, e.g., Kuriki et al. (2002) Biol. Pharm. Bull. 25:1476; and Sato et al. (2002) J. Biol. Chem. 277:15703); a stearoyl-CoA desaturase-1 (SCD1) promoter (Tabor et al. (1999) J. Biol. Chem. 274:20603); a leptin promoter (see, e.g., Mason et al. (1998) Endocrinol. 139:1013; and Chen et al. (1999) Biochem. Biophys. Res. Commun. 262:187); an adiponectin promoter (see, e.g., Kita et al. (2005) Biochem. Biophys. Res. Commun. 331:484; and Chakrabarti (2010) Endocrinol. 151:2408); an adipsin promoter (see, e.g., Platt et al. (1989) Proc. Natl. Acad. Sci. USA 86:7490); a resistin promoter (see, e.g., Seo et al. (2003) Molec. Endocrinol. 17:1522); and the like.

Cardiomyocyte-specific spatially restricted promoters include, but are not limited to, control sequences derived from the following genes: myosin light chain-2, α-myosin heavy chain, AE3, cardiac troponin C, cardiac actin, and the like. See, e.g., Franz et al. (1997) Cardiovasc. Res. 35:560-566; Robbins et al. (1995) Ann. N.Y. Acad. Sci. 752:492-505; Linn et al. (1995) Circ. Res. 76:584-591; Parmacek et al. (1994) Mol. Cell. Biol. 14:1870-1885; Hunter et al. (1993) Hypertension 22:608-617; and Sartorelli et al. (1992) Proc. Natl. Acad. Sci. USA 89:4047-4051.

Smooth muscle-specific spatially restricted promoters include, but are not limited to, an SM22α promoter (see, e.g., Akyürek et al. (2000) Mol. Med. 6:983; and U.S. Pat. No. 7,169,874); a smoothelin promoter (see, e.g., WO 2001/018048); an α-smooth muscle actin promoter; and the like. For example, a 0.4 kb region of the SM22α promoter, within which lie two CArG elements, has been shown to mediate vascular smooth muscle cell-specific expression (see, e.g., Kim et al. (1997) Mol. Cell. Biol. 17, 2266-2278; Li et al. (1996) J. Cell Biol. 132, 849-859; and Moessler et al. (1996) Development 122, 2415-2425).

Photoreceptor-specific spatially restricted promoters include, but are not limited to, a rhodopsin promoter; a rhodopsin kinase promoter (Young et al. (2003) Invest. Ophthalmol. Vis. Sci. 44:4076); a beta phosphodiesterase gene promoter (Nicoud et al. (2007) J. Gene Med. 9:1015); a retinitis pigmentosa gene promoter (Nicoud et al. (2007) supra); an interphotoreceptor retinoid-binding protein (IRBP) gene enhancer (Nicoud et al. (2007) supra); an IRBP gene promoter (Yokoyama et al. (1992) Exp Eye Res. 55:225); and the like.

The terms “DNA regulatory sequences,” “control elements,” and “regulatory elements,” used interchangeably herein, refer to transcriptional and translational control sequences, such as promoters, enhancers, polyadenylation signals, terminators, protein degradation signals, and the like, that provide for and/or regulate transcription of a non-coding sequence (e.g., DNA-targeting RNA) or a coding sequence (e.g., site-directed modifying polypeptide, or Cas9/Csn1 polypeptide) and/or regulate translation of an encoded polypeptide.

The term “naturally-occurring” or “unmodified” as used herein as applied to a nucleic acid, a polypeptide, a cell, or an organism, refers to a nucleic acid, polypeptide, cell, or organism that is found in nature. For example, a polypeptide or polynucleotide sequence that is present in an organism (including viruses) that can be isolated from a source in nature and which has not been intentionally modified by a human in the laboratory is naturally occurring.

The term “chimeric” as used herein as applied to a nucleic acid or polypeptide refers to two components that are defined by structures derived from different sources. For example, where “chimeric” is used in the context of a chimeric polypeptide (e.g., a chimeric Cas9/Csn1 protein), the chimeric polypeptide includes amino acid sequences that are derived from different polypeptides. A chimeric polypeptide may comprise either modified or naturally-occurring polypeptide sequences (e.g., a first amino acid sequence from a modified or unmodified Cas9/Csn1 protein; and a second amino acid sequence other than the Cas9/Csn1 protein). Similarly, “chimeric” in the context of a polynucleotide encoding a chimeric polypeptide includes nucleotide sequences derived from different coding regions (e.g., a first nucleotide sequence encoding a modified or unmodified Cas9/Csn1 protein; and a second nucleotide sequence encoding a polypeptide other than a Cas9/Csn1 protein).

The term “chimeric polypeptide” refers to a polypeptide that is made by the combination (i.e., “fusion”) of two otherwise separated segments of amino acid sequence, usually through human intervention. A polypeptide that comprises a chimeric amino acid sequence is a chimeric polypeptide. Some chimeric polypeptides can be referred to as “fusion variants.”

“Heterologous,” as used herein, means a nucleotide or polypeptide sequence that is not found in the native nucleic acid or protein, respectively. For example, in a chimeric Cas9/Csn1 protein, the RNA-binding domain of a naturally-occurring bacterial Cas9/Csn1 polypeptide (or a variant thereof) may be fused to a heterologous polypeptide sequence (i.e. a polypeptide sequence from a protein other than Cas9/Csn1 or a polypeptide sequence from another organism). The heterologous polypeptide sequence may exhibit an activity (e.g., enzymatic activity) that will also be exhibited by the chimeric Cas9/Csn1 protein (e.g., methyltransferase activity, acetyltransferase activity, kinase activity, ubiquitinating activity, etc.). A heterologous nucleic acid sequence may be linked to a naturally-occurring nucleic acid sequence (or a variant thereof) (e.g., by genetic engineering) to generate a chimeric nucleotide sequence encoding a chimeric polypeptide. As another example, in a fusion variant Cas9 site-directed polypeptide, a variant Cas9 site-directed polypeptide may be fused to a heterologous polypeptide (i.e. a polypeptide other than Cas9), which exhibits an activity that will also be exhibited by the fusion variant Cas9 site-directed polypeptide. A heterologous nucleic acid sequence may be linked to a variant Cas9 site-directed polypeptide (e.g., by genetic engineering) to generate a nucleotide sequence encoding a fusion variant Cas9 site-directed polypeptide.

“Recombinant,” as used herein, means that a particular nucleic acid (DNA or RNA) is the product of various combinations of cloning, restriction, polymerase chain reaction (PCR), and/or ligation steps resulting in a construct having a structural coding or non-coding sequence distinguishable from endogenous nucleic acids found in natural systems. DNA sequences encoding polypeptides can be assembled from cDNA fragments or from a series of synthetic oligonucleotides to provide a synthetic nucleic acid that is capable of being expressed from a recombinant transcriptional unit contained in a cell or in a cell-free transcription and translation system. Genomic DNA comprising the relevant sequences can also be used in the formation of a recombinant gene or transcriptional unit. Sequences of non-translated DNA may be present 5′ or 3′ from the open reading frame, where such sequences do not interfere with manipulation or expression of the coding regions, and may indeed act to modulate production of a desired product by various mechanisms (see “DNA regulatory sequences,” above). Alternatively, DNA sequences encoding RNA (e.g., DNA-targeting RNA) that is not translated may also be considered recombinant. Thus, e.g., the term “recombinant” nucleic acid refers to one which is not naturally occurring, e.g., is made by the artificial combination of two otherwise separated segments of sequence through human intervention. This artificial combination is often accomplished either by chemical synthesis or by the artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques. This is usually done to replace a codon with a codon encoding the same amino acid, a conservative amino acid, or a non-conservative amino acid. Alternatively, it is performed to join together nucleic acid segments of desired functions to generate a desired combination of functions.
When a recombinant polynucleotide encodes a polypeptide, the sequence of the encoded polypeptide can be naturally occurring (“wild type”) or can be a variant (e.g., a mutant) of the naturally occurring sequence. Thus, the term “recombinant” polypeptide does not necessarily refer to a polypeptide whose sequence does not naturally occur. Instead, a “recombinant” polypeptide is encoded by a recombinant DNA sequence, but the sequence of the polypeptide can be naturally occurring (“wild type”) or non-naturally occurring (e.g., a variant, a mutant, etc.). Thus, a “recombinant” polypeptide is the result of human intervention, but may be a naturally occurring amino acid sequence.

A “vector” or “expression vector” is a replicon, such as a plasmid, phage, virus, or cosmid, to which another DNA segment, i.e., an “insert,” may be attached so as to bring about the replication of the attached segment in a cell.

An “expression cassette” comprises a DNA coding sequence operably linked to a promoter. “Operably linked” refers to a juxtaposition wherein the components so described are in a relationship permitting them to function in their intended manner. For instance, a promoter is operably linked to a coding sequence if the promoter affects its transcription or expression.

The terms “recombinant expression vector” and “DNA construct” are used interchangeably herein to refer to a DNA molecule comprising a vector and at least one insert. Recombinant expression vectors are usually generated for the purpose of expressing and/or propagating the insert(s), or for the construction of other recombinant nucleotide sequences. The insert(s) may or may not be operably linked to a promoter sequence and may or may not be operably linked to DNA regulatory sequences.

A cell has been “genetically modified” or “transformed” or “transfected” by exogenous DNA, e.g., a recombinant expression vector, when such DNA has been introduced inside the cell. The presence of the exogenous DNA results in permanent or transient genetic change. The transforming DNA may or may not be integrated (covalently linked) into the genome of the cell. In prokaryotes, yeast, and mammalian cells, for example, the transforming DNA may be maintained on an episomal element such as a plasmid. With respect to eukaryotic cells, a stably transformed cell is one in which the transforming DNA has become integrated into a chromosome so that it is inherited by daughter cells through chromosome replication. This stability is demonstrated by the ability of the eukaryotic cell to establish cell lines or clones that comprise a population of daughter cells containing the transforming DNA. A “clone” is a population of cells derived from a single cell or common ancestor by mitosis. A “cell line” is a clone of a primary cell that is capable of stable growth in vitro for many generations.

Suitable methods of genetic modification (also referred to as “transformation”) include, e.g., viral or bacteriophage infection, transfection, conjugation, protoplast fusion, lipofection, electroporation, calcium phosphate precipitation, polyethyleneimine (PEI)-mediated transfection, DEAE-dextran mediated transfection, liposome-mediated transfection, particle gun technology, direct micro injection, nanoparticle-mediated nucleic acid delivery (see, e.g., Panyam et al. Adv Drug Deliv Rev. 2012 Sep. 13. pii: S0169-409X(12)00283-9. doi:10.1016/j.addr.2012.09.023), and the like.

The choice of method of genetic modification is generally dependent on the type of cell being transformed and the circumstances under which the transformation is taking place (e.g., in vitro, ex vivo, or in vivo). A general discussion of these methods can be found in Ausubel, et al., Short Protocols in Molecular Biology, 3rd ed., Wiley & Sons, 1995.

A “target DNA” as used herein is a DNA polynucleotide that comprises a “target site” or “target sequence.” The terms “target site” or “target sequence” or “target protospacer DNA” are used interchangeably herein to refer to a nucleic acid sequence present in a target DNA to which a DNA-targeting segment of a subject DNA-targeting RNA will bind (see FIGS. 1A-1B and FIGS. 39A-39B), provided sufficient conditions for binding exist. For example, the target site (or target sequence) 5′-GAGCATATC-3′ within a target DNA is targeted by (or is bound by, or hybridizes with, or is complementary to) the RNA sequence 5′-GAUAUGCUC-3′. Suitable DNA/RNA binding conditions include physiological conditions normally present in a cell. Other suitable DNA/RNA binding conditions (e.g., conditions in a cell-free system) are known in the art; see, e.g., Sambrook, supra. The strand of the target DNA that is complementary to and hybridizes with the DNA-targeting RNA is referred to as the “complementary strand” and the strand of the target DNA that is complementary to the “complementary strand” (and is therefore not complementary to the DNA-targeting RNA) is referred to as the “noncomplementary strand” or “non-complementary strand” (see FIGS. 12A-12E).
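The worked example above (target site 5′-GAGCATATC-3′ bound by 5′-GAUAUGCUC-3′) follows from antiparallel base pairing: the RNA is the reverse complement of the target site, written with U in place of T. The Python sketch below reproduces it, assuming Watson-Crick pairing only and treating the given target site as lying on the complementary strand as defined above. The function names are hypothetical. It also derives the noncomplementary strand, which carries the RNA's sequence with T in place of U.

```python
_DNA_TO_RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}
_DNA_PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G"}


def targeting_rna(target_site: str) -> str:
    """RNA (5'->3') that base-pairs with a DNA target site given 5'->3'."""
    return "".join(_DNA_TO_RNA_PARTNER[b] for b in reversed(target_site.upper()))


def noncomplementary_strand(target_site: str) -> str:
    """Opposite DNA strand (5'->3'); it has the RNA's sequence with T for U."""
    return "".join(_DNA_PARTNER[b] for b in reversed(target_site.upper()))


site = "GAGCATATC"
print(targeting_rna(site))            # GAUAUGCUC
print(noncomplementary_strand(site))  # GATATGCTC
```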

By “site-directed modifying polypeptide” or “RNA-binding site-directed polypeptide” or “RNA-binding site-directed modifying polypeptide” or “site-directed polypeptide” it is meant a polypeptide that binds RNA and is targeted to a specific DNA sequence. A site-directed modifying polypeptide as described herein is targeted to a specific DNA sequence by the RNA molecule to which it is bound. The RNA molecule comprises a sequence that is complementary to a target sequence within the target DNA, thus targeting the bound polypeptide to a specific location within the target DNA (the target sequence).

By “cleavage” it is meant the breakage of the covalent backbone of a DNA molecule. Cleavage can be initiated by a variety of methods including, but not limited to, enzymatic or chemical hydrolysis of a phosphodiester bond. Both single-stranded cleavage and double-stranded cleavage are possible, and double-stranded cleavage can occur as a result of two distinct single-stranded cleavage events. DNA cleavage can result in the production of either blunt ends or staggered ends. In certain embodiments, a complex comprising a DNA-targeting RNA and a site-directed modifying polypeptide is used for targeted double-stranded DNA cleavage.
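Whether double-stranded cleavage produced by two single-strand cuts leaves blunt or staggered ends depends only on how far apart the two cuts fall. The Python sketch below assumes both cut positions are expressed in top-strand coordinates, with a cut at position k breaking the bond between nucleotides k-1 and k. The function name is hypothetical. The EcoRI and PstI comments are familiar restriction-enzyme examples used only to check the convention, not examples from this disclosure.

```python
def describe_ends(top_cut: int, bottom_cut: int) -> str:
    """Classify the ends left by two single-strand cuts in a DNA duplex.

    Both cut positions are given in top-strand coordinates (top strand read
    5'->3', left to right); a cut at position k breaks the bond between
    nucleotides k-1 and k of that strand.
    """
    overhang = bottom_cut - top_cut
    if overhang == 0:
        return "blunt"
    if overhang > 0:
        # The bottom strand runs 3'->5' left to right, so its extra nucleotides
        # on the left-hand fragment end at a 5' terminus.
        return f"5' overhang of {overhang} nt"
    return f"3' overhang of {-overhang} nt"


print(describe_ends(3, 3))  # blunt
print(describe_ends(1, 5))  # 5' overhang of 4 nt (EcoRI-like, G^AATTC)
print(describe_ends(5, 1))  # 3' overhang of 4 nt (PstI-like, CTGCA^G)
```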

“Nuclease” and “endonuclease” are used interchangeably herein to mean an enzyme which possesses catalytic activity for DNA cleavage.

By “cleavage domain” or “active domain” or “nuclease domain” of a nuclease it is meant the polypeptide sequence or domain within the nuclease which possesses the catalytic activity for DNA cleavage. A cleavage domain can be contained in a single polypeptide chain or cleavage activity can result from the association of two (or more) polypeptides. A single nuclease domain may consist of more than one isolated stretch of amino acids within a given polypeptide.

The RNA molecule that binds to the site-directed modifying polypeptide and targets the polypeptide to a specific location within the target DNA is referred to herein as the “DNA-targeting RNA” or “DNA-targeting RNA polynucleotide” (also referred to herein as a “guide RNA” or “gRNA”). A subject DNA-targeting RNA comprises two segments, a “DNA-targeting segment” and a “protein-binding segment.” By “segment” it is meant a segment/section/region of a molecule, e.g., a contiguous stretch of nucleotides in an RNA. A segment can also mean a region/section of a complex such that a segment may comprise regions of more than one molecule. For example, in some cases the protein-binding segment (described below) of a DNA-targeting RNA is one RNA molecule and the protein-binding segment therefore comprises a region of that RNA molecule. In other cases, the protein-binding segment (described below) of a DNA-targeting RNA comprises two separate molecules that are hybridized along a region of complementarity. As an illustrative, non-limiting example, a protein-binding segment of a DNA-targeting RNA that comprises two separate molecules can comprise (i) nucleotides 40-75 of a first RNA molecule that is 100 nucleotides in length; and (ii) nucleotides 10-25 of a second RNA molecule that is 50 nucleotides in length. The definition of “segment,” unless otherwise specifically defined in a particular context, is not limited to a specific number of total nucleotides, is not limited to any particular number of nucleotides from a given RNA molecule, is not limited to a particular number of separate molecules within a complex, and may include regions of RNA molecules that are of any total length and may or may not include regions with complementarity to other molecules.
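Because a "segment" may span regions of more than one molecule, it can be represented as a collection of (molecule, start, end) regions. The Python sketch below encodes the two-molecule example above (positions 40-75 of a 100-nt RNA and positions 10-25 of a 50-nt RNA). It assumes 1-based, inclusive positions, a convention the text does not specify. The class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Region:
    """A stretch of one RNA molecule; 1-based, inclusive positions (assumed)."""
    molecule: str
    start: int
    end: int

    def __len__(self) -> int:
        return self.end - self.start + 1


@dataclass(frozen=True)
class Segment:
    """A segment may be made of regions from one or more molecules."""
    regions: Tuple[Region, ...]

    def total_nucleotides(self) -> int:
        return sum(len(r) for r in self.regions)

    def molecule_count(self) -> int:
        return len({r.molecule for r in self.regions})


# The two-molecule protein-binding segment example from the text.
protein_binding = Segment((
    Region("first RNA (100 nt)", 40, 75),
    Region("second RNA (50 nt)", 10, 25),
))
print(protein_binding.total_nucleotides(), protein_binding.molecule_count())  # 52 2
```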

The DNA-targeting segment (or “DNA-targeting sequence”) comprises a nucleotide sequence that is complementary to a specific sequence within a target DNA (the complementary strand of the target DNA). The protein-binding segment (or “protein-binding sequence”) interacts with a site-directed modifying polypeptide. When the site-directed modifying polypeptide is a Cas9 or Cas9 related polypeptide (described in more detail below), site-specific cleavage of the target DNA occurs at locations determined by both (i) base-pairing complementarity between the DNA-targeting RNA and the target DNA; and (ii) a short motif (referred to as the protospacer adjacent motif (PAM)) in the target DNA.
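The two determinants of cleavage location named above, complementarity to the DNA-targeting segment and an adjacent PAM, can be illustrated by scanning a DNA strand for PAM-adjacent stretches. The Python sketch below is an assumption-laden illustration: it defaults to an NGG PAM and a 20-nt targeting segment (values commonly used for Streptococcus pyogenes Cas9, not stated in this passage), scans only the strand it is given, and does no off-target assessment. The function name is hypothetical. Under these assumptions, each returned 20-nt stretch lies on the strand carrying the PAM (the noncomplementary strand), so a matching DNA-targeting segment would have the same sequence with U in place of T.

```python
import re
from typing import List, Tuple


def find_pam_adjacent_sites(dna: str, pam_regex: str = "[ACGT]GG",
                            spacer_len: int = 20) -> List[Tuple[int, str, str]]:
    """Return (start, protospacer, pam) for each PAM on the given strand that
    has at least `spacer_len` nucleotides 5' of it.

    Defaults assume an NGG PAM and a 20-nt targeting segment.
    """
    dna = dna.upper()
    hits = []
    # A zero-width lookahead lets overlapping PAMs (e.g., in "AGGG") all match.
    for match in re.finditer(f"(?=({pam_regex}))", dna):
        pam_start = match.start()
        if pam_start >= spacer_len:
            protospacer = dna[pam_start - spacer_len:pam_start]
            hits.append((pam_start - spacer_len, protospacer, match.group(1)))
    return hits


# Made-up sequence with two overlapping NGG motifs (AGG and GGG).
for site in find_pam_adjacent_sites("ACGT" * 5 + "AGGG"):
    print(site)
```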

The protein-binding segment of a subject DNA-targeting RNA comprises two complementary stretches of nucleotides that hybridize to one another to form a double stranded RNA duplex (dsRNA duplex).

In some embodiments, a subject nucleic acid (e.g., a DNA-targeting RNA, a nucleic acid comprising a nucleotide sequence encoding a DNA-targeting RNA; a nucleic acid encoding a site-directed polypeptide; etc.) comprises a modification or sequence that provides for an additional desirable feature (e.g., modified or regulated stability; subcellular targeting; tracking, e.g., a fluorescent label; a binding site for a protein or protein complex; etc.). Non-limiting examples include: a 5′ cap (e.g., a 7-methylguanylate cap (m7G)); a 3′ polyadenylated tail (i.e., a 3′ poly(A) tail); a riboswitch sequence (e.g., to allow for regulated stability and/or regulated accessibility by proteins and/or protein complexes); a stability control sequence; a sequence that forms a dsRNA duplex (i.e., a hairpin); a modification or sequence that targets the RNA to a subcellular location (e.g., nucleus, mitochondria, chloroplasts, and the like); a modification or sequence that provides for tracking (e.g., direct conjugation to a fluorescent molecule, conjugation to a moiety that facilitates fluorescent detection, a sequence that allows for fluorescent detection, etc.); a modification or sequence that provides a binding site for proteins (e.g., proteins that act on DNA, including transcriptional activators, transcriptional repressors, DNA methyltransferases, DNA demethylases, histone acetyltransferases, histone deacetylases, and the like); and combinations thereof.

In some embodiments, a DNA-targeting RNA comprises an additional segment at either the 5′ or 3′ end that provides for any of the features described above. For example, a suitable third segment can comprise a 5′ cap (e.g., a 7-methylguanylate cap (m7G)); a 3′ polyadenylated tail (i.e., a 3′ poly(A) tail); a riboswitch sequence (e.g., to allow for regulated stability and/or regulated accessibility by proteins and protein complexes); a stability control sequence; a sequence that forms a dsRNA duplex (i.e., a hairpin); a sequence that targets the RNA to a subcellular location (e.g., nucleus, mitochondria, chloroplasts, and the like); a modification or sequence that provides for tracking (e.g., direct conjugation to a fluorescent molecule, conjugation to a moiety that facilitates fluorescent detection, a sequence that allows for fluorescent detection, etc.); a modification or sequence that provides a binding site for proteins (e.g., proteins that act on DNA, including transcriptional activators, transcriptional repressors, DNA methyltransferases, DNA demethylases, histone acetyltransferases, histone deacetylases, and the like); and combinations thereof.

A subject DNA-targeting RNA and a subject site-directed modifying polypeptide (i.e., site-directed polypeptide) form a complex (i.e., bind via non-covalent interactions). The DNA-targeting RNA provides target specificity to the complex by comprising a nucleotide sequence that is complementary to a sequence of a target DNA. The site-directed modifying polypeptide of the complex provides the site-specific activity. In other words, the site-directed modifying polypeptide is guided to a target DNA sequence (e.g. a target sequence in a chromosomal nucleic acid; a target sequence in an extrachromosomal nucleic acid, e.g. an episomal nucleic acid, a minicircle, etc.; a target sequence in a mitochondrial nucleic acid; a target sequence in a chloroplast nucleic acid; a target sequence in a plasmid; etc.) by virtue of its association with the protein-binding segment of the DNA-targeting RNA.

In some embodiments, a subject DNA-targeting RNA comprises two separate RNA molecules (RNA polynucleotides: an “activator-RNA” and a “targeter-RNA”, see below) and is referred to herein as a “double-molecule DNA-targeting RNA” or a “two-molecule DNA-targeting RNA.” In other embodiments, the subject DNA-targeting RNA is a single RNA molecule (single RNA polynucleotide) and is referred to herein as a “single-molecule DNA-targeting RNA,” a “single-guide RNA,” or an “sgRNA.” The term “DNA-targeting RNA” or “gRNA” is inclusive, referring both to double-molecule DNA-targeting RNAs and to single-molecule DNA-targeting RNAs (i.e., sgRNAs).

An exemplary two-molecule DNA-targeting RNA comprises a crRNA-like (“CRISPR RNA” or “targeter-RNA” or “crRNA” or “crRNA repeat”) molecule and a corresponding tracrRNA-like (“trans-acting CRISPR RNA” or “activator-RNA” or “tracrRNA”) molecule. A crRNA-like molecule (targeter-RNA) comprises both the DNA-targeting segment (single stranded) of the DNA-targeting RNA and a stretch (“duplex-forming segment”) of nucleotides that forms one half of the dsRNA duplex of the protein-binding segment of the DNA-targeting RNA. A corresponding tracrRNA-like molecule (activator-RNA) comprises a stretch of nucleotides (duplex-forming segment) that forms the other half of the dsRNA duplex of the protein-binding segment of the DNA-targeting RNA. In other words, a stretch of nucleotides of a crRNA-like molecule is complementary to, and hybridizes with, a stretch of nucleotides of a tracrRNA-like molecule to form the dsRNA duplex of the protein-binding domain of the DNA-targeting RNA. As such, each crRNA-like molecule can be said to have a corresponding tracrRNA-like molecule. The crRNA-like molecule additionally provides the single stranded DNA-targeting segment. Thus, a crRNA-like and a tracrRNA-like molecule (as a corresponding pair) hybridize to form a DNA-targeting RNA. The exact sequence of a given crRNA or tracrRNA molecule is characteristic of the species in which the RNA molecules are found. Various crRNAs and tracrRNAs are depicted in corresponding complementary pairs in FIG. 8. A subject double-molecule DNA-targeting RNA can comprise any corresponding crRNA and tracrRNA pair.

The term “activator-RNA” is used herein to mean a tracrRNA-like molecule of a double-molecule DNA-targeting RNA. The term “targeter-RNA” is used herein to mean a crRNA-like molecule of a double-molecule DNA-targeting RNA. The term “duplex-forming segment” is used herein to mean the stretch of nucleotides of an activator-RNA or a targeter-RNA that contributes to the formation of the dsRNA duplex by hybridizing to a stretch of nucleotides of a corresponding activator-RNA or targeter-RNA molecule. In other words, an activator-RNA comprises a duplex-forming segment that is complementary to the duplex-forming segment of the corresponding targeter-RNA. As such, an activator-RNA comprises a duplex-forming segment while a targeter-RNA comprises both a duplex-forming segment and the DNA-targeting segment of the DNA-targeting RNA. Therefore, a subject double-molecule DNA-targeting RNA can be comprised of any corresponding activator-RNA and targeter-RNA pair.
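The relationship between the two duplex-forming segments can be illustrated with a minimal Python sketch. It assumes both segments are written 5′→3′, have equal length, and pair without bulges or internal loops; the function names, the optional G·U wobble handling, and the 6-nt example sequences are hypothetical and are not taken from FIG. 8 or the sequence listing. Natural crRNA/tracrRNA duplexes need not be perfectly base-paired, so this is a simplification.

```python
"""Sketch: does a targeter-RNA duplex-forming segment pair with an activator-RNA one?"""

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}  # counting G-U pairs is optional (an assumption)


def duplex_pairs(targeter_segment: str, activator_segment: str,
                 allow_wobble: bool = False) -> list:
    """Per-position pairing of two segments (each written 5'->3') aligned antiparallel.

    Assumes equal lengths and no bulges or internal loops (hypothetical simplification).
    """
    if len(targeter_segment) != len(activator_segment):
        raise ValueError("this sketch assumes equal-length duplex-forming segments")
    allowed = (WATSON_CRICK | WOBBLE) if allow_wobble else WATSON_CRICK
    targeter = targeter_segment.upper()
    # Reverse the activator so that position i of each string faces its pairing partner.
    activator_3_to_5 = activator_segment.upper()[::-1]
    return [(t, a) in allowed for t, a in zip(targeter, activator_3_to_5)]


def forms_full_duplex(targeter_segment: str, activator_segment: str,
                      allow_wobble: bool = False) -> bool:
    """True only if every position of the two segments is paired."""
    return all(duplex_pairs(targeter_segment, activator_segment, allow_wobble))


if __name__ == "__main__":
    # Hypothetical 6-nt segments chosen only for illustration.
    print(forms_full_duplex("GAUUCG", "CGAAUC"))  # True
    print(duplex_pairs("GAUUCG", "CGAAUA"))       # [False, True, True, True, True, True]
```

Reporting per-position pairing rather than a single yes/no keeps mismatches visible, which matters when the two halves are only partially complementary.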

A “host cell,” as used herein, denotes an in vivo or in vitro eukaryotic cell, a prokaryotic cell (e.g., bacterial or archaeal cell), or a cell from a multicellular organism (e.g., a cell line) cultured as a unicellular entity, which eukaryotic or prokaryotic cells can be, or have been, used as recipients for a nucleic acid, and include the progeny of the original cell which has been transformed by the nucleic acid. It is understood that the progeny of a single cell may not necessarily be completely identical in morphology or in genomic or total DNA complement to the original parent, due to natural, accidental, or deliberate mutation. A “recombinant host cell” (also referred to as a “genetically modified host cell”) is a host cell into which a heterologous nucleic acid, e.g., an expression vector, has been introduced. For example, a subject bacterial host cell is a genetically modified bacterial host cell by virtue of introduction into a suitable bacterial host cell of an exogenous nucleic acid (e.g., a plasmid or recombinant expression vector), and a subject eukaryotic host cell is a genetically modified eukaryotic host cell (e.g., a mammalian germ cell) by virtue of introduction into a suitable eukaryotic host cell of an exogenous nucleic acid.

The term “stem cell” is used herein to refer to a cell (e.g., plant stem cell, vertebrate stem cell) that has the ability both to self-renew and to generate a differentiated cell type (see Morrison et al. (1997) Cell 88:287-298). In the context of cell ontogeny, the adjective “differentiated”, or “differentiating” is a relative term. A “differentiated cell” is a cell that has progressed further down the developmental pathway than the cell it is being compared with. Thus, pluripotent stem cells (described below) can differentiate into lineage-restricted progenitor cells (e.g., mesodermal stem cells), which in turn can differentiate into cells that are further restricted (e.g., neuron progenitors), which can differentiate into end-stage cells (i.e., terminally differentiated cells, e.g., neurons, cardiomyocytes, etc.), which play a characteristic role in a certain tissue type, and may or may not retain the capacity to proliferate further. Stem cells may be characterized by both the presence of specific markers (e.g., proteins, RNAs, etc.) and the absence of specific markers. Stem cells may also be identified by functional assays both in vitro and in vivo, particularly assays relating to the ability of stem cells to give rise to multiple differentiated progeny.

Stem cells of interest include pluripotent stem cells (PSCs). The term “pluripotent stem cell” or “PSC” is used herein to mean a stem cell capable of producing all cell types of the organism. Therefore, a PSC can give rise to cells of all germ layers of the organism (e.g., the endoderm, mesoderm, and ectoderm of a vertebrate). Pluripotent cells are capable of forming teratomas and of contributing to ectoderm, mesoderm, or endoderm tissues in a living organism. Pluripotent stem cells of plants are capable of giving rise to all cell types of the plant (e.g., cells of the root, stem, leaves, etc.).

PSCs of animals can be derived in a number of different ways. For example, embryonic stem cells (ESCs) are derived from the inner cell mass of an embryo (Thomson et al., Science. 1998 Nov. 6; 282(5391):1145-7) whereas induced pluripotent stem cells (iPSCs) are derived from somatic cells (Takahashi et al., Cell. 2007 Nov. 30; 131(5):861-72; Takahashi et al., Nat Protoc. 2007; 2(12):3081-9; Yu et al., Science. 2007 Dec. 21; 318(5858):1917-20. Epub 2007 Nov. 20). Because the term PSC refers to pluripotent stem cells regardless of their derivation, the term PSC encompasses the terms ESC and iPSC, as well as the term embryonic germ stem cells (EGSC), which are another example of a PSC. PSCs may be in the form of an established cell line, they may be obtained directly from primary embryonic tissue, or they may be derived from a somatic cell. PSCs can be target cells of the methods described herein.

By “embryonic stem cell” (ESC) is meant a PSC that was isolated from an embryo, typically from the inner cell mass of the blastocyst. ESC lines are listed in the NIH Human Embryonic Stem Cell Registry, e.g. hESBGN-01, hESBGN-02, hESBGN-03, hESBGN-04 (BresaGen, Inc.); HES-1, HES-2, HES-3, HES-4, HES-5, HES-6 (ES Cell International); Miz-hES1 (MizMedi Hospital-Seoul National University); HSF-1, HSF-6 (University of California at San Francisco); and H1, H7, H9, H13, H14 (Wisconsin Alumni Research Foundation (WiCell Research Institute)). Stem cells of interest also include embryonic stem cells from other primates, such as Rhesus stem cells and marmoset stem cells. The stem cells may be obtained from any mammalian species, e.g. human, equine, bovine, porcine, canine, feline, rodent, e.g. mice, rats, hamster, primate, etc. (Thomson et al. (1998) Science 282:1145; Thomson et al. (1995) Proc. Natl. Acad. Sci USA 92:7844; Thomson et al. (1996) Biol. Reprod. 55:254; Shamblott et al., Proc. Natl. Acad. Sci. USA 95:13726, 1998). In culture, ESCs typically grow as flat colonies with large nucleo-cytoplasmic ratios, defined borders and prominent nucleoli. In addition, ESCs express SSEA-3, SSEA-4, TRA-1-60, TRA-1-81, and Alkaline Phosphatase, but not SSEA-1. Examples of methods of generating and characterizing ESCs may be found in, for example, U.S. Pat. No. 7,029,913, U.S. Pat. No. 5,843,780, and U.S. Pat. No. 6,200,806, the disclosures of which are incorporated herein by reference. Methods for proliferating hESCs in the undifferentiated form are described in WO 99/20741, WO 01/51616, and WO 03/020920.

By “embryonic germ stem cell” (EGSC) or “embryonic germ cell” or “EG cell” is meant a PSC that is derived from germ cells and/or germ cell progenitors, e.g. primordial germ cells, i.e. those that would become sperm and eggs. Embryonic germ cells (EG cells) are thought to have properties similar to embryonic stem cells as described above. Examples of methods of generating and characterizing EG cells may be found in, for example, U.S. Pat. No. 7,153,684; Matsui, Y., et al., (1992) Cell 70:841; Shamblott, M., et al. (2001) Proc. Natl. Acad. Sci. USA 98: 113; Shamblott, M., et al. (1998) Proc. Natl. Acad. Sci. USA, 95:13726; and Koshimizu, U., et al. (1996) Development, 122:1235, the disclosures of which are incorporated herein by reference.

By “induced pluripotent stem cell” or “iPSC” it is meant a PSC that is derived from a cell that is not a PSC (i.e., from a cell that is differentiated relative to a PSC). iPSCs can be derived from multiple different cell types, including terminally differentiated cells. iPSCs have an ES cell-like morphology, growing as flat colonies with large nucleo-cytoplasmic ratios, defined borders and prominent nucleoli. In addition, iPSCs express one or more key pluripotency markers known by one of ordinary skill in the art, including but not limited to Alkaline Phosphatase, SSEA3, SSEA4, Sox2, Oct3/4, Nanog, TRA-1-60, TRA-1-81, TDGF1, Dnmt3b, FoxD3, GDF3, Cyp26a1, TERT, and zfp42. Examples of methods of generating and characterizing iPSCs may be found in, for example, U.S. Patent Publication Nos. US20090047263, US20090068742, US20090191159, US20090227032, US20090246875, and US20090304646, the disclosures of which are incorporated herein by reference. Generally, to generate iPSCs, somatic cells are provided with reprogramming factors (e.g., Oct4, SOX2, KLF4, MYC, Nanog, Lin28, etc.) known in the art to reprogram the somatic cells to become pluripotent stem cells.

By “somatic cell” it is meant any cell in an organism that, in the absence of experimental manipulation, does not ordinarily give rise to all types of cells in an organism. In other words, somatic cells are cells that have differentiated sufficiently that they will not naturally generate cells of all three germ layers of the body, i.e. ectoderm, mesoderm and endoderm. For example, somatic cells would include both neurons and neural progenitors, the latter of which may be able to naturally give rise to all or some cell types of the central nervous system but cannot give rise to cells of the mesoderm or endoderm lineages.

By “mitotic cell” it is meant a cell undergoing mitosis. Mitosis is the process by which a eukaryotic cell separates the chromosomes in its nucleus into two identical sets in two separate nuclei. It is generally followed immediately by cytokinesis, which divides the nuclei, cytoplasm, organelles and cell membrane into two cells containing roughly equal shares of these cellular components.

By “post-mitotic cell” it is meant a cell that has exited from mitosis, i.e., it is “quiescent” and is no longer undergoing divisions. This quiescent state may be temporary, i.e., reversible, or it may be permanent.

By “meiotic cell” it is meant a cell that is undergoing meiosis. Meiosis is the process by which a cell divides its nuclear material for the purpose of producing gametes or spores. Unlike mitosis, in meiosis, the chromosomes undergo a recombination step which shuffles genetic material between chromosomes. Additionally, the outcome of meiosis is four (genetically unique) haploid cells, as compared with the two (genetically identical) diploid cells produced from mitosis.

By “recombination” it is meant a process of exchange of genetic information between two polynucleotides. As used herein, “homology-directed repair (HDR)” refers to the specialized form of DNA repair that takes place, for example, during repair of double-strand breaks in cells. This process requires nucleotide sequence homology, uses a “donor” molecule to template repair of a “target” molecule (i.e., the one that experienced the double-strand break), and leads to the transfer of genetic information from the donor to the target. Homology-directed repair may result in an alteration of the sequence of the target molecule (e.g., insertion, deletion, mutation), if the donor polynucleotide differs from the target molecule and part or all of the sequence of the donor polynucleotide is incorporated into the target DNA. In some embodiments, the donor polynucleotide, a portion of the donor polynucleotide, a copy of the donor polynucleotide, or a portion of a copy of the donor polynucleotide integrates into the target DNA.
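The donor-templated transfer of information described for HDR can be pictured with a deliberately simplified string model. This Python sketch is an illustration only, not a model of the cellular repair machinery. It assumes a single-strand representation of the target, a donor laid out as left homology arm + payload + right homology arm, and exact arm matches. It covers only the insertion outcome. The function and parameter names (`repair_by_hdr`, `arm_len`) are hypothetical.

```python
def repair_by_hdr(target: str, break_pos: int, donor: str, arm_len: int) -> str:
    """Toy string model of homology-directed repair (insertion case only).

    target    -- target DNA, one strand, 5'->3'; the double-strand break lies
                 between target[break_pos - 1] and target[break_pos]
    donor     -- left homology arm + payload + right homology arm
    arm_len   -- length of each homology arm (hypothetical parameter)
    """
    if arm_len < 1 or len(donor) < 2 * arm_len:
        raise ValueError("donor must contain two homology arms of length arm_len")
    if break_pos < arm_len or break_pos + arm_len > len(target):
        raise ValueError("break is too close to an end of the target for these arms")

    left_arm = donor[:arm_len]
    right_arm = donor[len(donor) - arm_len:]
    payload = donor[arm_len:len(donor) - arm_len]

    # HDR requires homology: each arm must match the sequence flanking the break.
    if target[break_pos - arm_len:break_pos] != left_arm:
        raise ValueError("left homology arm does not match the target left of the break")
    if target[break_pos:break_pos + arm_len] != right_arm:
        raise ValueError("right homology arm does not match the target right of the break")

    # The donor templates the repair, so its payload is copied into the target at the break.
    return target[:break_pos] + payload + target[break_pos:]


if __name__ == "__main__":
    # Break between "CCC" and "GGG"; the donor carries a 3-nt payload "TAG".
    print(repair_by_hdr("AAACCCGGGTTT", 6, "CCCTAGGGG", 3))  # AAACCCTAGGGGTTT
```

With an empty payload the sketch simply restores the original sequence, which mirrors the case where the donor does not differ from the target.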

By “non-homologous end joining (NHEJ)” it is meant the repair of double-strand breaks in DNA by direct ligation of the break ends to one another without the need for a homologous template (in contrast to homology-directed repair, which requires a homologous sequence to guide repair). NHEJ often results in the loss (deletion) of nucleotide sequence near the site of the double-strand break.

The terms “treatment”, “treating” and the like are used herein to generally mean obtaining a desired pharmacologic and/or physiologic effect. The effect may be prophylactic in terms of completely or partially preventing a disease or symptom thereof and/or may be therapeutic in terms of a partial or complete cure for a disease and/or adverse effect attributable to the disease. “Treatment” as used herein covers any treatment of a disease or symptom in a mammal, and includes: (a) preventing the disease or symptom from occurring in a subject which may be predisposed to acquiring the disease or symptom but has not yet been diagnosed as having it; (b) inhibiting the disease or symptom, i.e., arresting its development; or (c) relieving the disease, i.e., causing regression of the disease. The therapeutic agent may be administered before, during or after the onset of disease or injury. The treatment of ongoing disease, where the treatment stabilizes or reduces the undesirable clinical symptoms of the patient, is of particular interest. Such treatment is desirably performed prior to complete loss of function in the affected tissues. The subject therapy will desirably be administered during the symptomatic stage of the disease, and in some cases after the symptomatic stage of the disease.

The terms “individual,” “subject,” “host,” and “patient” are used interchangeably herein and refer to any mammalian subject for whom diagnosis, treatment, or therapy is desired, particularly humans.

General methods in molecular and cellular biochemistry can be found in such standard textbooks as Molecular Cloning: A Laboratory Manual, 3rd Ed. (Sambrook et al., Cold Spring Harbor Laboratory Press 2001); Short Protocols in Molecular Biology, 4th Ed. (Ausubel et al. eds., John Wiley & Sons 1999); Protein Methods (Bollag et al., John Wiley & Sons 1996); Nonviral Vectors for Gene Therapy (Wagner et al. eds., Academic Press 1999); Viral Vectors (Kaplitt & Loewy eds., Academic Press 1995); Immunology Methods Manual (I. Lefkovits ed., Academic Press 1997); and Cell and Tissue Culture: Laboratory Procedures in Biotechnology (Doyle & Griffiths, John Wiley & Sons 1998), the disclosures of which are incorporated herein by reference.

Before the present invention is further described, it is to be understood that this invention is not limited to particular embodiments described, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present invention will be limited only by the appended claims.

Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range and any other stated or intervening value in that stated range, is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges, and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.

Certain ranges are presented herein with numerical values being preceded by the term “about.” The term “about” is used herein to provide literal support for the exact number that it precedes, as well as a number that is near to or approximately the number that the term precedes. In determining whether a number is near to or approximately a specifically recited number, the near or approximating unrecited number may be a number which, in the context in which it is presented, provides the substantial equivalent of the specifically recited number.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present invention, the preferred methods and materials are now described. All publications mentioned herein are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited.

All publications and patents cited in this specification are herein incorporated by reference as if each individual publication or patent were specifically and individually indicated to be incorporated by reference and are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited. The citation of any publication is for its disclosure prior to the filing date and should not be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided may be different from the actual publication dates which may need to be independently confirmed.

It is noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “a polynucleotide” includes a plurality of such polynucleotides and reference to “the polypeptide” includes reference to one or more polypeptides and equivalents thereof known to those skilled in the art, and so forth. It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely,” “only” and the like in connection with the recitation of claim elements, or use of a “negative” limitation.

It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable sub-combination. All combinations of the embodiments pertaining to the invention are specifically embraced by the present invention and are disclosed herein just as if each and every combination was individually and explicitly disclosed. In addition, all sub-combinations of the various embodiments and elements thereof are also specifically embraced by the present invention and are disclosed herein just as if each and every such sub-combination was individually and explicitly disclosed herein.

The publications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided may be different from the actual publication dates which may need to be independently confirmed.

Detailed Description—Part I

The present disclosure provides a DNA-targeting RNA that comprises a targeting sequence and, together with a modifying polypeptide, provides for site-specific modification of a target DNA and/or a polypeptide associated with the target DNA. The present disclosure further provides site-specific modifying polypeptides. The present disclosure further provides methods of site-specific modification of a target DNA and/or a polypeptide associated with the target DNA. The present disclosure provides methods of modulating transcription of a target nucleic acid in a target cell, generally involving contacting the target nucleic acid with an enzymatically inactive Cas9 polypeptide and a DNA-targeting RNA. Kits and compositions for carrying out the methods are also provided. The present disclosure provides genetically modified cells that produce Cas9; and Cas9 transgenic non-human multicellular organisms.

Nucleic Acids

DNA-Targeting RNA

The present disclosure provides a DNA-targeting RNA that directs the activities of an associated polypeptide (e.g., a site-directed modifying polypeptide) to a specific target sequence within a target DNA. A subject DNA-targeting RNA comprises: a first segment (also referred to herein as a “DNA-targeting segment” or a “DNA-targeting sequence”) and a second segment (also referred to herein as a “protein-binding segment” or a “protein-binding sequence”).

DNA-Targeting Segment of a DNA-Targeting RNA

The DNA-targeting segment of a subject DNA-targeting RNA comprises a nucleotide sequence that is complementary to a sequence in a target DNA. In other words, the DNA-targeting segment of a subject DNA-targeting RNA interacts with a target DNA in a sequence-specific manner via hybridization (i.e., base pairing). As such, the nucleotide sequence of the DNA-targeting segment may vary and determines the location within the target DNA at which the DNA-targeting RNA and the target DNA will interact. The DNA-targeting segment of a subject DNA-targeting RNA can be modified (e.g., by genetic engineering) to hybridize to any desired sequence within a target DNA.
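Because the DNA-targeting segment hybridizes by base pairing, its sequence follows directly from the DNA strand it is meant to bind. The following minimal Python sketch assumes an unambiguous DNA input written 5′→3′ and standard Watson-Crick pairing. It returns the complementary RNA, also 5′→3′, which is the reverse complement with U in place of T. The function name is hypothetical, and the sketch ignores any other design constraints on the target site.

```python
# Complement of each DNA base, written as the RNA base that pairs with it.
_DNA_TO_RNA_COMPLEMENT = str.maketrans("ACGT", "UGCA")


def dna_targeting_sequence(bound_strand: str) -> str:
    """Return the RNA (5'->3') that base-pairs with `bound_strand` (DNA, 5'->3').

    Equivalent to the sequence of the opposite DNA strand with T replaced by U.
    """
    seq = bound_strand.upper()
    invalid = set(seq) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected characters in DNA sequence: {sorted(invalid)}")
    # Complement each base, then reverse so the RNA reads 5'->3' (antiparallel pairing).
    return seq.translate(_DNA_TO_RNA_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(dna_targeting_sequence("GATTACA"))  # UGUAAUC
```

Reversing after complementing accounts for the antiparallel orientation of the RNA:DNA hybrid. Without that reversal, the result would be written 3′→5′.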

The DNA-targeting segment can have a length of from about 12 nucleotides to about 100 nucleotides. For example, the DNA-targeting segment can have a length of from about 12 nucleotides (nt) to about 80 nt, from about 12 nt to about 50 nt, from about 12 nt to about 40 nt, from about 12 nt to about 30 nt, from about 12 nt to about 25 nt, from about 12 nt to about 20 nt, or from about 12 nt to about 19 nt. For example, the DNA-targeting segment can have a length of from about 19 nt to about 20 nt, from about 19 nt to about 25 nt, from about 19 nt to about 30 nt, from about 19 nt to about 35 nt, from about 19 nt to about 40 nt, from about 19 nt to about 45 nt, from about 19 nt to about 50 nt, from about 19 nt to about 60 nt, from about 19 nt to about 70 nt, from about 19 nt to about 80 nt, from about 19 nt to about 90 nt, from about 19 nt to about 100 nt, from about 20 nt to about 25 nt, from about 20 nt to about 30 nt, from about 20 nt to about 35 nt, from about 20 nt to about 40 nt, from about 20 nt to about 45 nt, from about 20 nt to about 50 nt, from about 20 nt to about 60 nt, from about 20 nt to about 70 nt, from about 20 nt to about 80 nt, from about 20 nt to about 90 nt, or from about 20 nt to about 100 nt. The nucleotide sequence (the DNA-targeting sequence) of the DNA-targeting segment that is complementary to a nucleotide sequence (target sequence) of the target DNA can have a length of at least about 12 nt. For example, the DNA-targeting sequence of the DNA-targeting segment that is complementary to a target sequence of the target DNA can have a length of at least about 12 nt, at least about 15 nt, at least about 18 nt, at least about 19 nt, at least about 20 nt, at least about 25 nt, at least about 30 nt, at least about 35 nt, or at least about 40 nt.
For example, the DNA-targeting sequence of the DNA-targeting segment that is complementary to a target sequence of the target DNA can have a length of from about 12 nucleotides (nt) to about 80 nt, from about 12 nt to about 50 nt, from about 12 nt to about 45 nt, from about 12 nt to about 40 nt, from about 12 nt to about 35 nt, from about 12 nt to about 30 nt, from about 12 nt to about 25 nt, from about 12 nt to about 20 nt, from about 12 nt to about 19 nt, from about 19 nt to about 20 nt, from about 19 nt to about 25 nt, from about 19 nt to about 30 nt, from about 19 nt to about 35 nt, from about 19 nt to about 40 nt, from about 19 nt to about 45 nt, from about 19 nt to about 50 nt, from about 19 nt to about 60 nt, from about 20 nt to about 25 nt, from about 20 nt to about 30 nt, from about 20 nt to about 35 nt, from about 20 nt to about 40 nt, from about 20 nt to about 45 nt, from about 20 nt to about 50 nt, or from about 20 nt to about 60 nt.

In some cases, the DNA-targeting sequence of the DNA-targeting segment that is complementary to a target sequence of the target DNA is 20 nucleotides in length. In some cases, the DNA-targeting sequence of the DNA-targeting segment that is complementary to a target sequence of the target DNA is 19 nucleotides in length.

The percent complementarity between the DNA-targeting sequence of the DNA-targeting segment and the target sequence of the target DNA can be at least 60% (e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100%). In some cases, the percent complementarity between the DNA-targeting sequence of the DNA-targeting segment and the target sequence of the target DNA is 100% over the seven contiguous 5′-most nucleotides of the target sequence of the complementary strand of the target DNA. In some cases, the percent complementarity between the DNA-targeting sequence of the DNA-targeting segment and the target sequence of the target DNA is at least 60% over about 20 contiguous nucleotides. In some cases, the percent complementarity between the DNA-targeting sequence of the DNA-targeting segment and the target sequence of the target DNA is 100% over the fourteen contiguous 5′-most nucleotides of the target sequence of the complementary strand of the target DNA and as low as 0% over the remainder. In such a case, the DNA-targeting sequence can be considered to be 14 nucleotides in length (see FIG. 12D-12E). In some cases, the percent complementarity between the DNA-targeting sequence of the DNA-targeting segment and the target sequence of the target DNA is 100% over the seven contiguous 5′-most nucleotides of the target sequence of the complementary strand of the target DNA and as low as 0% over the remainder. In such a case, the DNA-targeting sequence can be considered to be 7 nucleotides in length.
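Percent complementarity, and the "100% over the N contiguous 5′-most nucleotides" criterion, can be computed as in this minimal Python sketch. It assumes a gapless, antiparallel alignment of an equal-length guide RNA and target DNA strand, both written 5′→3′, and counts only Watson-Crick RNA:DNA pairs. Reading "5′-most nucleotides of the target sequence" as the 5′-most positions of the DNA strand to which the guide hybridizes is an interpretation made for this illustration. All function names are hypothetical.

```python
# (RNA base, DNA base) combinations that form Watson-Crick pairs.
RNA_DNA_PAIRS = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}


def pairing_mask(guide_rna: str, target_strand: str) -> list:
    """Entry i is True if nucleotide i of the target strand (counted from its 5' end)
    is paired with the guide. Both inputs are written 5'->3'; gapless alignment assumed."""
    if not guide_rna or len(guide_rna) != len(target_strand):
        raise ValueError("this sketch assumes non-empty, equal-length sequences")
    # Read the guide 3'->5' so that it lines up with the target read 5'->3'.
    guide_3_to_5 = guide_rna.upper()[::-1]
    return [(r, d) in RNA_DNA_PAIRS for r, d in zip(guide_3_to_5, target_strand.upper())]


def percent_complementarity(guide_rna: str, target_strand: str) -> float:
    mask = pairing_mask(guide_rna, target_strand)
    return 100.0 * sum(mask) / len(mask)


def five_prime_fully_complementary(guide_rna: str, target_strand: str, k: int) -> bool:
    """True if the k 5'-most nucleotides of the target strand are all paired (e.g., k = 7 or 14)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return all(pairing_mask(guide_rna, target_strand)[:k])


if __name__ == "__main__":
    # One mismatch at the guide's 5' end, which faces the target strand's 3' end.
    print(percent_complementarity("AGUAAUC", "GATTACA"))            # about 85.7
    print(five_prime_fully_complementary("AGUAAUC", "GATTACA", 6))  # True
```

Keeping the per-position mask separate lets both the overall percentage and the contiguous 5′-region test reuse the same alignment logic.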

Protein-Binding Segment of a DNA-Targeting RNA

The protein-binding segment of a subject DNA-targeting RNA interacts with a site-directed modifying polypeptide. The subject DNA-targeting RNA guides the bound polypeptide to a specific nucleotide sequence within a target DNA via the above-mentioned DNA-targeting segment. The protein-binding segment of a subject DNA-targeting RNA comprises two stretches of nucleotides that are complementary to one another. The complementary nucleotides of the protein-binding segment hybridize to form a double stranded RNA duplex (dsRNA) (see FIGS. 1A and 1B).

A subject double-molecule DNA-targeting RNA comprises two separate RNA molecules. Each of the two RNA molecules of a subject double-molecule DNA-targeting RNA comprises a stretch of nucleotides that are complementary to one another such that the complementary nucleotides of the two RNA molecules hybridize to form the double stranded RNA duplex of the protein-binding segment (FIG. 1A).

In some embodiments, the duplex-forming segment of the activator-RNA is at least about 60% identical to one of the activator-RNA (tracrRNA) molecules set forth in SEQ ID NOs:431-562, or a complement thereof, over a stretch of at least 8 contiguous nucleotides. For example, the duplex-forming segment of the activator-RNA (or the DNA encoding the duplex-forming segment of the activator-RNA) is at least about 60% identical, at least about 65% identical, at least about 70% identical, at least about 75% identical, at least about 80% identical, at least about 85% identical, at least about 90% identical, at least about 95% identical, at least about 98% identical, at least about 99% identical, or 100% identical, to one of the tracrRNA sequences set forth in SEQ ID NOs:431-562, or a complement thereof, over a stretch of at least 8 contiguous nucleotides.

In some embodiments, the duplex-forming segment of the targeter-RNA is at least about 60% identical to one of the targeter-RNA (crRNA) sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of at least 8 contiguous nucleotides. For example, the duplex-forming segment of the targeter-RNA (or the DNA encoding the duplex-forming segment of the targeter-RNA) is at least about 65% identical, at least about 70% identical, at least about 75% identical, at least about 80% identical, at least about 85% identical, at least about 90% identical, at least about 95% identical, at least about 98% identical, at least about 99% identical or 100% identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of at least 8 contiguous nucleotides.
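The "at least about X% identical … over a stretch of at least 8 contiguous nucleotides" criterion can be approximated with a minimal Python sketch. It uses a gapless sliding window rather than a full alignment program such as BLAST. It treats T and U as equivalent so that a DNA encoding a segment can be compared with an RNA sequence, and it checks the reverse complement to cover "or a complement thereof." The sequences in SEQ ID NOs:431-679 are not reproduced here. The example inputs are placeholders, and the function names and window-based scoring are assumptions of this illustration.

```python
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def _normalize(seq: str) -> str:
    # Treat DNA (T) and RNA (U) as equivalent for identity purposes.
    return seq.upper().replace("T", "U")


def best_window_identity(query: str, reference: str, window: int = 8) -> float:
    """Highest percent identity between any `window`-nt stretch of `query` and any
    `window`-nt stretch of `reference` (gapless comparison)."""
    q, r = _normalize(query), _normalize(reference)
    if window < 1 or len(q) < window or len(r) < window:
        raise ValueError("both sequences must be at least `window` nucleotides long")
    best = 0
    for i in range(len(q) - window + 1):
        for j in range(len(r) - window + 1):
            matches = sum(a == b for a, b in zip(q[i:i + window], r[j:j + window]))
            best = max(best, matches)
    return 100.0 * best / window


def meets_identity_threshold(query: str, reference: str,
                             threshold: float = 60.0, window: int = 8) -> bool:
    """Check the reference and its reverse complement ("or a complement thereof")."""
    r = _normalize(reference)
    reverse_complement = r.translate(_RNA_COMPLEMENT)[::-1]
    return max(best_window_identity(query, r, window),
               best_window_identity(query, reverse_complement, window)) >= threshold


if __name__ == "__main__":
    print(best_window_identity("AAAAGGGGCCCC", "GGGGCCCCUUUU"))  # 100.0
    print(meets_identity_threshold("AAAAAAAA", "UUUUUUUU"))      # True (via the complement)
```

The exhaustive double loop is quadratic in sequence length. That cost is acceptable for short duplex-forming segments but would be replaced by a proper alignment tool for larger comparisons.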

A two-molecule DNA-targeting RNA can be designed to allow for controlled (i.e., conditional) binding of a targeter-RNA with an activator-RNA. Because a two-molecule DNA-targeting RNA is not functional unless both the activator-RNA and the targeter-RNA are bound in a functional complex with dCas9, a two-molecule DNA-targeting RNA can be made inducible (e.g., drug inducible) by making the binding between the activator-RNA and the targeter-RNA inducible. As one non-limiting example, RNA aptamers can be used to regulate (i.e., control) the binding of the activator-RNA with the targeter-RNA. Accordingly, the activator-RNA and/or the targeter-RNA can comprise an RNA aptamer sequence.

RNA aptamers are known in the art and are generally a synthetic version of a riboswitch. The terms “RNA aptamer” and “riboswitch” are used interchangeably herein to encompass both synthetic and natural nucleic acid sequences that provide for inducible regulation of the structure (and therefore the availability of specific sequences) of the RNA molecule of which they are part. RNA aptamers usually comprise a sequence that folds into a particular structure (e.g., a hairpin), which specifically binds a particular drug (e.g., a small molecule). Binding of the drug causes a structural change in the folding of the RNA, which changes a feature of the nucleic acid of which the aptamer is a part. As non-limiting examples: (i) an activator-RNA with an aptamer may not be able to bind to the cognate targeter-RNA unless the aptamer is bound by the appropriate drug; (ii) a targeter-RNA with an aptamer may not be able to bind to the cognate activator-RNA unless the aptamer is bound by the appropriate drug; and (iii) a targeter-RNA and an activator-RNA, each comprising a different aptamer that binds a different drug, may not be able to bind to each other unless both drugs are present. As illustrated by these examples, a two-molecule DNA-targeting RNA can be designed to be inducible.
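The three aptamer scenarios (i)-(iii) amount to simple conditional logic, which this minimal Python sketch makes explicit. It models each RNA as either free to hybridize or gated by one drug, and it assumes all-or-none switching, whereas real aptamer responses depend on ligand concentration and are generally graded. The class, the function, and the ligand names `drug_A` and `drug_B` are hypothetical.

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable, Optional


@dataclass(frozen=True)
class GuideRNAComponent:
    """A targeter-RNA or activator-RNA, optionally carrying an aptamer (hypothetical model)."""
    role: str                             # "targeter" or "activator"
    aptamer_ligand: Optional[str] = None  # drug whose binding frees the duplex-forming segment

    def duplex_segment_available(self, drugs: AbstractSet[str]) -> bool:
        # Without an aptamer the segment is always available; with one, only if its drug is present.
        return self.aptamer_ligand is None or self.aptamer_ligand in drugs


def can_hybridize(targeter: GuideRNAComponent, activator: GuideRNAComponent,
                  drugs_present: Iterable[str]) -> bool:
    """Both duplex-forming segments must be available for the two RNAs to pair."""
    drugs = frozenset(drugs_present)
    return (targeter.duplex_segment_available(drugs)
            and activator.duplex_segment_available(drugs))


if __name__ == "__main__":
    plain_targeter = GuideRNAComponent("targeter")
    gated_activator = GuideRNAComponent("activator", aptamer_ligand="drug_A")
    gated_targeter = GuideRNAComponent("targeter", aptamer_ligand="drug_B")

    # Case (i): activator-RNA with an aptamer pairs only when its drug is present.
    print(can_hybridize(plain_targeter, gated_activator, []))          # False
    print(can_hybridize(plain_targeter, gated_activator, ["drug_A"]))  # True
    # Case (iii): two different aptamers require both drugs.
    print(can_hybridize(gated_targeter, gated_activator, ["drug_A"]))            # False
    print(can_hybridize(gated_targeter, gated_activator, ["drug_A", "drug_B"]))  # True
```

Case (ii) is the mirror image of case (i), with the aptamer placed on the targeter-RNA instead of the activator-RNA.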

Examples of aptamers and riboswitches can be found, for example, in: Nakamura et al., Genes Cells. 2012 May; 17(5):344-64; Vavalle et al., Future Cardiol. 2012 May; 8(3):371-82; Citartan et al., Biosens Bioelectron. 2012 Apr. 15; 34(1):1-11; and Liberman et al., Wiley Interdiscip Rev RNA. 2012 May-June; 3(3):369-84; all of which are herein incorporated by reference in their entirety.

Non-limiting examples of nucleotide sequences that can be included in a two-molecule DNA-targeting RNA include any of the sequences set forth in SEQ ID NOs:431-562, or complements thereof, paired with any of the sequences set forth in SEQ ID NOs:563-679, or complements thereof, such that the pair can hybridize to form a protein-binding segment.

A subject single-molecule DNA-targeting RNA comprises two stretches of nucleotides (a targeter-RNA and an activator-RNA) that are complementary to one another, are covalently linked by intervening nucleotides (“linkers” or “linker nucleotides”), and hybridize to form the double stranded RNA duplex (dsRNA duplex) of the protein-binding segment, thus resulting in a stem-loop structure (FIG. 1B). The targeter-RNA and the activator-RNA can be covalently linked via the 3′ end of the targeter-RNA and the 5′ end of the activator-RNA. Alternatively, the targeter-RNA and the activator-RNA can be covalently linked via the 5′ end of the targeter-RNA and the 3′ end of the activator-RNA.

The linker of a single-molecule DNA-targeting RNA can have a length of from about 3 nucleotides (nt) to about 100 nt. For example, the linker can have a length of from about 3 nt to about 90 nt, from about 3 nt to about 80 nt, from about 3 nt to about 70 nt, from about 3 nt to about 60 nt, from about 3 nt to about 50 nt, from about 3 nt to about 40 nt, from about 3 nt to about 30 nt, from about 3 nt to about 20 nt or from about 3 nt to about 10 nt. For example, the linker can have a length of from about 3 nt to about 5 nt, from about 5 nt to about 10 nt, from about 10 nt to about 15 nt, from about 15 nt to about 20 nt, from about 20 nt to about 25 nt, from about 25 nt to about 30 nt, from about 30 nt to about 35 nt, from about 35 nt to about 40 nt, from about 40 nt to about 50 nt, from about 50 nt to about 60 nt, from about 60 nt to about 70 nt, from about 70 nt to about 80 nt, from about 80 nt to about 90 nt, or from about 90 nt to about 100 nt. In some embodiments, the linker of a single-molecule DNA-targeting RNA is 4 nt.
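The two linkage orientations and the linker length range can be expressed as a small assembly function. This is a sketch only. The function name, the placeholder sequences, and the 4-nt placeholder linker "GAAA" are hypothetical and are not taken from the SEQ ID listing. Treating "about 3 nt to about 100 nt" as hard bounds is also an assumption.

```python
VALID_RNA = set("ACGU")


def build_single_molecule_rna(targeter: str, activator: str, linker: str,
                              targeter_first: bool = True) -> str:
    """Return a single-molecule guide written 5'->3'.

    targeter_first=True:  3' end of targeter -> linker -> 5' end of activator.
    targeter_first=False: 3' end of activator -> linker -> 5' end of targeter.
    """
    parts = [s.upper() for s in (targeter, activator, linker)]
    for part in parts:
        if not part or set(part) - VALID_RNA:
            raise ValueError("expected non-empty RNA sequences over A, C, G, U")
    targeter, activator, linker = parts
    # The text gives "about 3 nt to about 100 nt"; hard bounds are an assumption.
    if not 3 <= len(linker) <= 100:
        raise ValueError(f"linker of {len(linker)} nt is outside 3-100 nt")
    if targeter_first:
        return targeter + linker + activator
    return activator + linker + targeter


guide = build_single_molecule_rna("GUUUUAGAGCUA", "UAGCAAGUUAAAAU", "GAAA")
print(guide, len(guide))  # GUUUUAGAGCUAGAAAUAGCAAGUUAAAAU 30
```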

An exemplary single-molecule DNA-targeting RNA comprises two complementary stretches of nucleotides that hybridize to form a dsRNA duplex. In some embodiments, one of the two complementary stretches of nucleotides of the single-molecule DNA-targeting RNA (or the DNA encoding the stretch) is at least about 60% identical to one of the activator-RNA (tracrRNA) molecules set forth in SEQ ID NOs:431-562, or a complement thereof, over a stretch of at least 8 contiguous nucleotides. For example, one of the two complementary stretches of nucleotides of the single-molecule DNA-targeting RNA (or the DNA encoding the stretch) is at least about 65% identical, at least about 70% identical, at least about 75% identical, at least about 80% identical, at least about 85% identical, at least about 90% identical, at least about 95% identical, at least about 98% identical, at least about 99% identical or 100% identical to one of the tracrRNA sequences set forth in SEQ ID NOs:431-562, or a complement thereof, over a stretch of at least 8 contiguous nucleotides.

In some embodiments, one of the two complementary stretches of nucleotides of the single-molecule DNA-targeting RNA (or the DNA encoding the stretch) is at least about 60% identical to one of the targeter-RNA (crRNA) sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of at least 8 contiguous nucleotides. For example, one of the two complementary stretches of nucleotides of the single-molecule DNA-targeting RNA (or the DNA encoding the stretch) is at least about 65% identical, at least about 70% identical, at least about 75% identical, at least about 80% identical, at least about 85% identical, at least about 90% identical, at least about 95% identical, at least about 98% identical, at least about 99% identical or 100% identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of at least 8 contiguous nucleotides.

Appropriate naturally occurring cognate pairs of crRNAs and tracrRNAs can be routinely determined from SEQ ID NOs:431-679 by taking into account the species name and the base-pairing that forms the dsRNA duplex of the protein-binding domain (see FIG. 8 as a non-limiting example).
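The two criteria named above, species and base-pairing, can be combined in a simple matching routine. In the sketch below, candidates are first restricted to tracrRNAs from the same species. Among those, the tracrRNA that can form the longest uninterrupted Watson-Crick duplex with the crRNA is chosen. The scoring rule is an assumption that ignores G·U wobble pairs and bulges, so FIG. 8 remains the reference for the actual pairing. The record layout, names, and species labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, str, str]  # (species, name, RNA sequence 5'->3')

_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    return rna.upper().translate(_COMPLEMENT)[::-1]


def longest_shared_run(a: str, b: str) -> int:
    """Length of the longest common substring of a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def cognate_pairs(crrnas: List[Record], tracrrnas: List[Record]) -> Dict[str, str]:
    """Pair each crRNA with the same-species tracrRNA that can form the longest
    uninterrupted Watson-Crick duplex with it (shared run with its reverse complement)."""
    by_species = defaultdict(list)
    for species, name, seq in tracrrnas:
        by_species[species].append((name, seq))
    pairs = {}
    for species, name, seq in crrnas:
        candidates = by_species.get(species)
        if not candidates:
            continue  # no tracrRNA listed for this species
        best_name, _ = max(
            candidates,
            key=lambda c: longest_shared_run(seq.upper(), reverse_complement(c[1])),
        )
        pairs[name] = best_name
    return pairs
```

A substring shared by the crRNA and the reverse complement of a tracrRNA is exactly a stretch that can base-pair antiparallel with that tracrRNA, which is why this comparison measures duplex-forming potential.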

With regard to both a subject single-molecule DNA-targeting RNA and a subject double-molecule DNA-targeting RNA, FIG. 57A-57B demonstrates that artificial sequences that share little sequence identity (roughly 50%) with naturally occurring tracrRNAs and crRNAs can function with Cas9 to cleave target DNA as long as the structure of the protein-binding domain of the DNA-targeting RNA is conserved. Thus, the RNA folding structure of a naturally occurring protein-binding domain of a DNA-targeting RNA can be taken into account in order to design artificial protein-binding domains (either two-molecule or single-molecule versions). As a non-limiting example, the functional artificial DNA-targeting RNA of FIG. 57A-57B was designed based on the structure of the protein-binding segment of the naturally occurring DNA-targeting RNA (e.g., including the same number of base pairs along the RNA duplex and including the same “bulge” region as present in the naturally occurring RNA). Because one of ordinary skill in the art can readily produce structures for any naturally occurring crRNA:tracrRNA pair from any species (see SEQ ID NOs:431-679 for crRNA and tracrRNA sequences from a wide variety of species), an artificial DNA-targeting RNA can be designed to mimic the natural structure for a given species when using the Cas9 (or a related Cas9, see FIG. 32A) from that species (see FIG. 24D and related details in Example 1). Thus, a suitable DNA-targeting RNA can be an artificially designed (non-naturally occurring) RNA comprising a protein-binding domain that was designed to mimic the structure of a protein-binding domain of a naturally occurring DNA-targeting RNA (see SEQ ID NOs:431-679, taking into account the species name when determining appropriate cognate pairs).
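One way to check that an artificial sequence preserves a natural structure is to write that structure in dot-bracket notation and verify that every required base pair can form. The sketch below does this. Matched parentheses mark paired positions and dots mark unpaired positions, so a bulge appears as dots on one side of a stem. Compatibility in this sense is necessary but not sufficient: confirming that a design actually folds into the intended structure would require an RNA folding program. The structure strings, sequences, and function names are illustrative placeholders.

```python
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
                 ("C", "G"), ("G", "U"), ("U", "G")}


def pair_table(structure: str) -> dict:
    """Map each '(' position to its matching ')' position in dot-bracket notation."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs[stack.pop()] = i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError("unmatched '(' in structure")
    return pairs


def compatible(sequence: str, structure: str) -> bool:
    """True if every base pair required by `structure` is Watson-Crick or G-U."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    seq = sequence.upper()
    return all((seq[i], seq[j]) in ALLOWED_PAIRS
               for i, j in pair_table(structure).items())


hairpin = "((((....))))"
print(compatible("GGGAGAAAUCCC", hairpin))  # True  ("natural" placeholder)
print(compatible("CCUGGAAACAGG", hairpin))  # True  (different sequence, same pairs)
print(compatible("GGGAGAAAGCCC", hairpin))  # False (A-G cannot pair)
```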

The protein-binding segment can have a length of from about 10 nucleotides to about 100 nucleotides. For example, the protein-binding segment can have a length of from about 15 nucleotides (nt) to about 80 nt, from about 15 nt to about 50 nt, from about 15 nt to about 40 nt, from about 15 nt to about 30 nt or from about 15 nt to about 25 nt.

Also with regard to both a subject single-molecule DNA-targeting RNA and a subject double-molecule DNA-targeting RNA, the dsRNA duplex of the protein-binding segment can have a length of from about 6 base pairs (bp) to about 50 bp. For example, the dsRNA duplex of the protein-binding segment can have a length of from about 6 bp to about 40 bp, from about 6 bp to about 30 bp, from about 6 bp to about 25 bp, from about 6 bp to about 20 bp, from about 6 bp to about 15 bp, from about 8 bp to about 40 bp, from about 8 bp to about 30 bp, from about 8 bp to about 25 bp, from about 8 bp to about 20 bp or from about 8 bp to about 15 bp. For example, the dsRNA duplex of the protein-binding segment can have a length of from about 8 bp to about 10 bp, from about 10 bp to about 15 bp, from about 15 bp to about 18 bp, from about 18 bp to about 20 bp, from about 20 bp to about 25 bp, from about 25 bp to about 30 bp, from about 30 bp to about 35 bp, from about 35 bp to about 40 bp, or from about 40 bp to about 50 bp. In some embodiments, the dsRNA duplex of the protein-binding segment has a length of 36 base pairs. The percent complementarity between the nucleotide sequences that hybridize to form the dsRNA duplex of the protein-binding segment can be at least about 60%. For example, the percent complementarity between the nucleotide sequences that hybridize to form the dsRNA duplex of the protein-binding segment can be at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or at least about 99%. In some cases, the percent complementarity between the nucleotide sequences that hybridize to form the dsRNA duplex of the protein-binding segment is 100%.
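The percent complementarity of the two duplex-forming stretches can be computed by laying one strand antiparallel against the other and counting positions that pair. This is a minimal sketch. It assumes equal-length, ungapped strands. The text does not say whether G·U wobble pairs count toward complementarity, so wobble counting is left as an option. The function name and sequences are hypothetical.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def percent_complementarity(strand_a: str, strand_b: str,
                            allow_wobble: bool = False) -> float:
    """Percent of positions that pair when strand_a (5'->3') is laid
    antiparallel against strand_b (5'->3'); equal lengths assumed."""
    a, b = strand_a.upper(), strand_b.upper()
    if len(a) != len(b) or not a:
        raise ValueError("strands must be non-empty and of equal length")
    allowed = WATSON_CRICK | WOBBLE if allow_wobble else WATSON_CRICK
    paired = sum((x, y) in allowed for x, y in zip(a, reversed(b)))
    return 100.0 * paired / len(a)


print(percent_complementarity("GUUUUAGAGCUA", "UAGCUCUAAAAC"))  # 100.0
# One terminal G.U pair: 11/12 Watson-Crick, about 91.7%; 100% if wobble counts.
print(percent_complementarity("GUUUUAGAGCUA", "UAGCUCUAAAAU"))
```

Either result would satisfy the "at least about 60%" floor stated above.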

Site-Directed Modifying Polypeptide

A subject DNA-targeting RNA and a subject site-directed modifying polypeptide form a complex. The DNA-targeting RNA provides target specificity to the complex by comprising a nucleotide sequence that is complementary to a sequence of a target DNA (as noted above). The site-directed modifying polypeptide of the complex provides the site-specific activity. In other words, the site-directed modifying polypeptide is guided to a DNA sequence (e.g. a chromosomal sequence or an extrachromosomal sequence, e.g. an episomal sequence, a minicircle sequence, a mitochondrial sequence, a chloroplast sequence, etc.) by virtue of its association with at least the protein-binding segment of the DNA-targeting RNA (described above).

A subject site-directed modifying polypeptide modifies target DNA (e.g., cleavage or methylation of target DNA) and/or a polypeptide associated with target DNA (e.g., methylation or acetylation of a histone tail). A site-directed modifying polypeptide is also referred to herein as a “site-directed polypeptide” or an “RNA binding site-directed modifying polypeptide.”

In some cases, the site-directed modifying polypeptide is a naturally-occurring modifying polypeptide. In other cases, the site-directed modifying polypeptide is not a naturally-occurring polypeptide (e.g., a chimeric polypeptide as discussed below, or a naturally-occurring polypeptide that is modified, e.g., by mutation, deletion, or insertion).

Exemplary naturally-occurring site-directed modifying polypeptides are set forth in SEQ ID NOs:1-255 as a non-limiting and non-exhaustive list of naturally occurring Cas9/Csn1 endonucleases. These naturally occurring polypeptides, as disclosed herein, bind a DNA-targeting RNA, are thereby directed to a specific sequence within a target DNA, and cleave the target DNA to generate a double strand break. A subject site-directed modifying polypeptide comprises two portions, an RNA-binding portion and an activity portion. In some embodiments, a subject site-directed modifying polypeptide comprises: (i) an RNA-binding portion that interacts with a DNA-targeting RNA, wherein the DNA-targeting RNA comprises a nucleotide sequence that is complementary to a sequence in a target DNA; and (ii) an activity portion that exhibits site-directed enzymatic activity (e.g., activity for DNA methylation, activity for DNA cleavage, activity for histone acetylation, activity for histone methylation, etc.), wherein the site of enzymatic activity is determined by the DNA-targeting RNA.
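The statement that the site of activity is determined by the DNA-targeting RNA can be illustrated by locating where a targeting sequence matches a DNA molecule. In the sketch below, the guide base-pairs with one DNA strand, so the opposite strand at that site reads the same as the guide, with T in place of U. The search therefore scans both strands for that sequence. This sketch assumes exact matches only and deliberately does not model the protospacer adjacent motif (PAM) that Cas9 also requires next to a target. The function name and sequences are hypothetical.

```python
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_target_sites(targeting_rna: str, dna: str) -> list:
    """Exact-match sites for a DNA-targeting segment on either strand of `dna`.

    Returns (strand, start) tuples; `start` is a 0-based coordinate on the
    given (top) strand. The guide base-pairs with the complementary strand,
    so the matched stretch reads the same as the guide (with T for U).
    """
    protospacer = targeting_rna.upper().replace("U", "T")
    top = dna.upper()
    bottom = top.translate(_DNA_COMPLEMENT)[::-1]  # reverse complement, 5'->3'
    k = len(protospacer)
    hits = []
    for i in range(len(top) - k + 1):
        if top[i:i + k] == protospacer:
            hits.append(("+", i))
    for i in range(len(bottom) - k + 1):
        if bottom[i:i + k] == protospacer:
            hits.append(("-", len(top) - i - k))  # map back to top-strand coordinates
    return hits


print(find_target_sites("ACCGGUUA", "TTACCGGTTAGG"))  # [('+', 2)]
print(find_target_sites("ACCGGUUA", "CCTAACCGGTAA"))  # [('-', 2)]
```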

In other embodiments, a subject site-directed modifying polypeptide comprises: (i) an RNA-binding portion that interacts with a DNA-targeting RNA, wherein the DNA-targeting RNA comprises a nucleotide sequence that is complementary to a sequence in a target DNA; and (ii) an activity portion that modulates transcription within the target DNA (e.g., to increase or decrease transcription), wherein the site of modulated transcription within the target DNA is determined by the DNA-targeting RNA.

In some cases, a subject site-directed modifying polypeptide has enzymatic activity that modifies target DNA (e.g., nuclease activity, methyltransferase activity, demethylase activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer forming activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity or glycosylase activity).

In other cases, a subject site-directed modifying polypeptide has enzymatic activity that modifies a polypeptide (e.g., a histone) associated with target DNA (e.g., methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity or demyristoylation activity).

Exemplary Site-Directed Modifying Polypeptides

In some cases, the site-directed modifying polypeptide comprises an amino acid sequence having at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 99%, or 100% amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9/Csn1 amino acid sequence depicted in FIG. 3A and FIG. 3B, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs:1-256 and 795-1346.
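Residue ranges such as "amino acids 7-166" are numbered from 1 and include both endpoints. This numbering is easy to get wrong when it is converted to zero-based string slicing. The sketch below makes the conversion explicit and computes identity for two regions that are assumed to be already aligned without gaps. In practice, identity to the "corresponding portions" of another Cas9 would be computed from a pairwise alignment, for example with BLAST or a Needleman-Wunsch aligner. The function names and the placeholder sequence are hypothetical.

```python
def residue_region(protein: str, start: int, end: int) -> str:
    """Residues start..end, counted from 1 and inclusive at both ends."""
    if not 1 <= start <= end <= len(protein):
        raise ValueError(f"range {start}-{end} is outside a {len(protein)}-residue sequence")
    return protein[start - 1:end]


def ungapped_identity(a: str, b: str) -> float:
    """Percent identical positions between two already-aligned, equal-length regions."""
    if len(a) != len(b) or not a:
        raise ValueError("regions must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Placeholder 1,368-residue string (not a real Cas9 sequence).
placeholder = "".join("ACDEFGHIKLMNPQRSTVWY"[i % 20] for i in range(1368))
first = residue_region(placeholder, 7, 166)
second = residue_region(placeholder, 731, 1003)
print(len(first), len(second))  # 160 273
```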

Nucleic Acid Modifications

In some embodiments, a subject nucleic acid (e.g., a DNA-targeting RNA) comprises one or more modifications, e.g., a base modification, a backbone modification, etc., to provide the nucleic acid with a new or enhanced feature (e.g., improved stability). As is known in the art, a nucleoside is a base-sugar combination. The base portion of the nucleoside is normally a heterocyclic base. The two most common classes of such heterocyclic bases are the purines and the pyrimidines. Nucleotides are nucleosides that further include a phosphate group covalently linked to the sugar portion of the nucleoside. For those nucleosides that include a pentofuranosyl sugar, the phosphate group can be linked to the 2′, the 3′, or the 5′ hydroxyl moiety of the sugar. In forming oligonucleotides, the phosphate groups covalently link adjacent nucleosides to one another to form a linear polymeric compound. In turn, the respective ends of this linear polymeric compound can be further joined to form a circular compound; however, linear compounds are generally suitable. In addition, linear compounds may have internal nucleotide base complementarity and may therefore fold so as to produce a fully or partially double-stranded compound. Within oligonucleotides, the phosphate groups are commonly referred to as forming the internucleoside backbone of the oligonucleotide. The normal linkage or backbone of RNA and DNA is a 3′ to 5′ phosphodiester linkage.

Modified Backbones and Modified Internucleoside Linkages

Examples of suitable nucleic acids containing modifications include nucleic acids containing modified backbones or non-natural internucleoside linkages. Nucleic acids having modified backbones include those that retain a phosphorus atom in the backbone and those that do not have a phosphorus atom in the backbone.

Suitable modified oligonucleotide backbones containing a phosphorus atom therein include, for example, phosphorothioates, chiral phosphorothioates, phosphorodithioates, phosphotriesters, aminoalkylphosphotriesters, methyl and other alkyl phosphonates including 3′-alkylene phosphonates, 5′-alkylene phosphonates and chiral phosphonates, phosphinates, phosphoramidates including 3′-amino phosphoramidate and aminoalkylphosphoramidates, phosphorodiamidates, thionophosphoramidates, thionoalkylphosphonates, thionoalkylphosphotriesters, selenophosphates and boranophosphates having normal 3′-5′ linkages, 2′-5′ linked analogs of these, and those having inverted polarity wherein one or more internucleotide linkages is a 3′ to 3′, 5′ to 5′ or 2′ to 2′ linkage. Suitable oligonucleotides having inverted polarity comprise a single 3′ to 3′ linkage at the 3′-most internucleotide linkage, i.e., a single inverted nucleoside residue, which may be abasic (the nucleobase is missing or has a hydroxyl group in place thereof). Various salts (such as, for example, potassium or sodium), mixed salts and free acid forms are also included.

In some embodiments, a subject nucleic acid comprises one or more phosphorothioate and/or heteroatom internucleoside linkages, in particular —CH2—NH—O—CH2—, —CH2—N(CH3)—O—CH2— (known as a methylene(methylimino) or MMI backbone), —CH2—O—N(CH3)—CH2—, —CH2—N(CH3)—N(CH3)—CH2— and —O—N(CH3)—CH2—CH2— (wherein the native phosphodiester internucleotide linkage is represented as —O—P(═O)(OH)—O—CH2—). MMI type internucleoside linkages are disclosed in the above referenced U.S. Pat. No. 5,489,677. Suitable amide internucleoside linkages are disclosed in U.S. Pat. No. 5,602,240.
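A linear oligonucleotide of n nucleosides has n - 1 internucleoside linkages, each of which can independently be native phosphodiester or a modified linkage such as phosphorothioate. The sketch below records the linkage type per position. It also builds one hypothetical design with phosphorothioates at the three terminal linkages of each end. The design, the class and function names, and the "*" display notation are illustrative assumptions only, and other linkage types listed above, including inverted 3′ to 3′ linkages, are not modelled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Oligo:
    """Linear oligonucleotide: one linkage per adjacent nucleoside pair, 5'->3'."""
    bases: str
    linkages: List[str]  # "PO" = phosphodiester, "PS" = phosphorothioate

    def __post_init__(self):
        if len(self.linkages) != len(self.bases) - 1:
            raise ValueError("a linear oligo of n nucleosides has n - 1 linkages")
        if set(self.linkages) - {"PO", "PS"}:
            raise ValueError("only PO and PS linkages are modelled here")

    def render(self) -> str:
        """Write the sequence with '*' marking each phosphorothioate linkage."""
        out = [self.bases[0]]
        for link, base in zip(self.linkages, self.bases[1:]):
            out.append(("*" if link == "PS" else "") + base)
        return "".join(out)


def end_protected(bases: str, k: int = 3) -> Oligo:
    """Hypothetical design: PS at the k terminal linkages of each end, PO elsewhere."""
    n_links = len(bases) - 1
    if k < 0 or 2 * k > n_links:
        raise ValueError("k is too large for this sequence")
    links = ["PS" if i < k or i >= n_links - k else "PO" for i in range(n_links)]
    return Oligo(bases, links)


print(end_protected("ACGUACGUAC").render())  # A*C*G*UACG*U*A*C
```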

Also suitable are nucleic acids having morpholino backbone structures as described in, e.g., U.S. Pat. No. 5,034,506. For example, in some embodiments, a subject nucleic acid comprises a 6-membered morpholino ring in place of a ribose ring. In some of these embodiments, a phosphorodiamidate or other non-phosphodiester internucleoside linkage replaces a phosphodiester linkage.

Suitable modified polynucleotide backbones that do not include a phosphorus atom therein have backbones that are formed by short chain alkyl or cycloalkyl internucleoside linkages, mixed heteroatom and alkyl or cycloalkyl internucleoside linkages, or one or more short chain heteroatomic or heterocyclic internucleoside linkages. These include those having morpholino linkages (formed in part from the sugar portion of a nucleoside); siloxane backbones; sulfide, sulfoxide and sulfone backbones; formacetyl and thioformacetyl backbones; methylene formacetyl and thioformacetyl backbones; riboacetyl backbones; alkene containing backbones; sulfamate backbones; methyleneimino and methylenehydrazino backbones; sulfonate and sulfonamide backbones; amide backbones; and others having mixed N, O, S and CH2 component parts.

Mimetics

A subject nucleic acid can be a nucleic acid mimetic. The term “mimetic” as it is applied to polynucleotides is intended to include polynucleotides wherein only the furanose ring or both the furanose ring and the internucleotide linkage are replaced with non-furanose groups; replacement of only the furanose ring is also referred to in the art as a sugar surrogate. The heterocyclic base moiety or a modified heterocyclic base moiety is maintained for hybridization with an appropriate target nucleic acid. One such nucleic acid, a polynucleotide mimetic that has been shown to have excellent hybridization properties, is referred to as a peptide nucleic acid (PNA). In PNA, the sugar-backbone of a polynucleotide is replaced with an amide-containing backbone, in particular an aminoethylglycine backbone. The nucleobases are retained and are bound directly or indirectly to aza nitrogen atoms of the amide portion of the backbone.

The backbone in PNA compounds consists of two or more linked aminoethylglycine units, which give PNA an amide-containing backbone, and the heterocyclic base moieties are bound directly or indirectly to aza nitrogen atoms of the amide portion of the backbone. Representative U.S. patents that describe the preparation of PNA compounds include, but are not limited to: U.S. Pat. Nos. 5,539,082; 5,714,331; and 5,719,262.

Another class of polynucleotide mimetic that has been studied is based on linked morpholino units (morpholino nucleic acid) having heterocyclic bases attached to the morpholino ring. A number of linking groups have been reported that link the morpholino monomeric units in a morpholino nucleic acid. One class of linking groups has been selected to give a non-ionic oligomeric compound. Such non-ionic morpholino-based polynucleotides mimic oligonucleotides but are less likely to form undesired interactions with cellular proteins (Dwaine A. Braasch and David R. Corey, Biochemistry, 2002, 41(14), 4503-4510). Morpholino-based polynucleotides are disclosed in U.S. Pat. No. 5,034,506. A variety of compounds within the morpholino class of polynucleotides have been prepared, having a variety of different linking groups joining the monomeric subunits.

A further class of polynucleotide mimetic is referred to as cyclohexenyl nucleic acids (CeNA), in which the furanose ring normally present in a DNA/RNA molecule is replaced with a cyclohexenyl ring. CeNA DMT-protected phosphoramidite monomers have been prepared and used for oligomeric compound synthesis following classical phosphoramidite chemistry. Fully modified CeNA oligomeric compounds and oligonucleotides having specific positions modified with CeNA have been prepared and studied (see Wang et al., J. Am. Chem. Soc., 2000, 122, 8595-8602). In general, the incorporation of CeNA monomers into a DNA chain increases the stability of a DNA/RNA hybrid. CeNA oligoadenylates formed complexes with RNA and DNA complements with stability similar to that of the native complexes. NMR and circular dichroism studies showed that CeNA structures are incorporated into natural nucleic acid structures with easy conformational adaptation.

A further modification includes Locked Nucleic Acids (LNAs), in which the 2′-hydroxyl group is linked to the 4′ carbon atom of the sugar ring, thereby forming a 2′-C,4′-C-oxymethylene linkage and hence a bicyclic sugar moiety. The linkage can be a methylene (—CH2—)n group bridging the 2′ oxygen atom and the 4′ carbon atom, wherein n is 1 or 2 (Singh et al., Chem. Commun., 1998, 4, 455-456). LNA and LNA analogs display very high duplex thermal stabilities with complementary DNA and RNA (ΔTm = +3 to +10 °C), stability towards 3′-exonucleolytic degradation and good solubility properties. Potent and nontoxic antisense oligonucleotides containing LNAs have been described (Wahlestedt et al., Proc. Natl. Acad. Sci. U.S.A., 2000, 97, 5633-5638).

The synthesis and preparation of the LNA monomers adenine, cytosine, guanine, 5-methylcytosine, thymine and uracil, along with their oligomerization, and nucleic acid recognition properties have been described (Koshkin et al., Tetrahedron, 1998, 54, 3607-3630). LNAs and preparation thereof are also described in WO 98/39352 and WO 99/14226.

Modified Sugar Moieties

A subject nucleic acid can also include one or more substituted sugar moieties. Suitable polynucleotides comprise a sugar substituent group selected from: OH; F; O-, S-, or N-alkyl; O-, S-, or N-alkenyl; O-, S- or N-alkynyl; or O-alkyl-O-alkyl, wherein the alkyl, alkenyl and alkynyl may be substituted or unsubstituted C1 to C10 alkyl or C2 to C10 alkenyl and alkynyl. Particularly suitable are O((CH2)nO)mCH3, O(CH2)nOCH3, O(CH2)nNH2, O(CH2)nCH3, O(CH2)nONH2, and O(CH2)nON((CH2)nCH3)2, where n and m are from 1 to about 10. Other suitable polynucleotides comprise a sugar substituent group selected from: C1 to C10 lower alkyl, substituted lower alkyl, alkenyl, alkynyl, alkaryl, aralkyl, O-alkaryl or O-aralkyl, SH, SCH3, OCN, Cl, Br, CN, CF3, OCF3, SOCH3, SO2CH3, ONO2, NO2, N3, NH2, heterocycloalkyl, heterocycloalkaryl, aminoalkylamino, polyalkylamino, substituted silyl, an RNA cleaving group, a reporter group, an intercalator, a group for improving the pharmacokinetic properties of an oligonucleotide, or a group for improving the pharmacodynamic properties of an oligonucleotide, and other substituents having similar properties. A suitable modification includes 2′-methoxyethoxy (2′-O—CH2CH2OCH3, also known as 2′-O-(2-methoxyethyl) or 2′-MOE) (Martin et al., Helv. Chim. Acta, 1995, 78, 486-504), i.e., an alkoxyalkoxy group. A further suitable modification includes 2′-dimethylaminooxyethoxy, i.e., an O(CH2)2ON(CH3)2 group, also known as 2′-DMAOE, as described in examples hereinbelow, and 2′-dimethylaminoethoxyethoxy (also known in the art as 2′-O-dimethyl-amino-ethoxy-ethyl or 2′-DMAEOE), i.e., 2′-O—CH2—O—CH2—N(CH3)2.
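The generic substituent O((CH2)nO)mCH3 can be made concrete by computing its molecular formula for given n and m. The fragment consists of a leading O, then m repeats of (CH2)nO, then a terminal CH3. It therefore contains n·m + 1 carbons, 2·n·m + 3 hydrogens, and m + 1 oxygens. With n = 2 and m = 1 it reduces to O(CH2)2OCH3, the 2′-MOE group. The function name is hypothetical, and treating "about 10" as a hard upper bound is an assumption.

```python
def substituent_formula(n: int, m: int = 1) -> str:
    """Molecular formula (C, H, O order) of the O((CH2)nO)mCH3 fragment as written:
    a leading O, m repeats of (CH2)nO, and a terminal CH3."""
    # The text says "from 1 to about 10"; hard bounds here are an assumption.
    if not (1 <= n <= 10 and 1 <= m <= 10):
        raise ValueError("n and m are expected to lie between 1 and about 10")
    counts = {"C": n * m + 1, "H": 2 * n * m + 3, "O": m + 1}
    return "".join(el + (str(c) if c > 1 else "") for el, c in counts.items())


print(substituent_formula(2))     # C3H7O2  (O-CH2CH2-O-CH3, i.e. 2'-MOE)
print(substituent_formula(2, 2))  # C5H11O3
```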

Other suitable sugar substituent groups include methoxy (—O—CH3), aminopropoxy (—OCH2CH2CH2NH2), allyl (—CH2—CH═CH2), —O-allyl (—O—CH2—CH═CH2) and fluoro (F). 2′-sugar substituent groups may be in the arabino (up) position or ribo (down) position. A suitable 2′-arabino modification is 2′-F. Similar modifications may also be made at other positions on the oligomeric compound, particularly the 3′ position of the sugar on the 3′ terminal nucleoside or in 2′-5′ linked oligonucleotides and the 5′ position of the 5′ terminal nucleotide. Oligomeric compounds may also have sugar mimetics such as cyclobutyl moieties in place of the pentofuranosyl sugar.

Base Modifications and Substitutions

A subject nucleic acid may also include nucleobase (often referred to in the art simply as “base”) modifications or substitutions. As used herein, “unmodified” or “natural” nucleobases include the purine bases adenine (A) and guanine (G), and the pyrimidine bases thymine (T), cytosine (C) and uracil (U). Modified nucleobases include other synthetic and natural nucleobases such as 5-methylcytosine (5-me-C), 5-hydroxymethyl cytosine, xanthine, hypoxanthine, 2-aminoadenine, 6-methyl and other alkyl derivatives of adenine and guanine, 2-propyl and other alkyl derivatives of adenine and guanine, 2-thiouracil, 2-thiothymine and 2-thiocytosine, 5-halouracil and cytosine, 5-propynyl (—C≡C—CH3) uracil and cytosine and other alkynyl derivatives of pyrimidine bases, 6-azo uracil, cytosine and thymine, 5-uracil (pseudouracil), 4-thiouracil, 8-halo, 8-amino, 8-thiol, 8-thioalkyl, 8-hydroxyl and other 8-substituted adenines and guanines, 5-halo, particularly 5-bromo, 5-trifluoromethyl and other 5-substituted uracils and cytosines, 7-methylguanine and 7-methyladenine, 2-F-adenine, 2-aminoadenine, 8-azaguanine and 8-azaadenine, 7-deazaguanine and 7-deazaadenine and 3-deazaguanine and 3-deazaadenine. Further modified nucleobases include tricyclic pyrimidines such as phenoxazine cytidine (1H-pyrimido(5,4-b)(1,4)benzoxazin-2(3H)-one), phenothiazine cytidine (1H-pyrimido(5,4-b)(1,4)benzothiazin-2(3H)-one), G-clamps such as a substituted phenoxazine cytidine (e.g., 9-(2-aminoethoxy)-H-pyrimido(5,4-b)(1,4)benzoxazin-2(3H)-one), carbazole cytidine (2H-pyrimido(4,5-b)indol-2-one), and pyridoindole cytidine (H-pyrido(3′,2′:4,5)pyrrolo(2,3-d)pyrimidin-2-one).

Heterocyclic base moieties may also include those in which the purine or pyrimidine base is replaced with other heterocycles, for example 7-deaza-adenine, 7-deazaguanosine, 2-aminopyridine and 2-pyridone. Further nucleobases include those disclosed in U.S. Pat. No. 3,687,808, those disclosed in The Concise Encyclopedia Of Polymer Science And Engineering, pages 858-859, Kroschwitz, J. I., ed. John Wiley & Sons, 1990, those disclosed by Englisch et al., Angewandte Chemie, International Edition, 1991, 30, 613, and those disclosed by Sanghvi, Y. S., Chapter 15, Antisense Research and Applications, pages 289-302, Crooke, S. T. and Lebleu, B., ed., CRC Press, 1993. Certain of these nucleobases are useful for increasing the binding affinity of an oligomeric compound. These include 5-substituted pyrimidines, 6-azapyrimidines and N-2, N-6 and O-6 substituted purines, including 2-aminopropyladenine, 5-propynyluracil and 5-propynylcytosine. 5-methylcytosine substitutions have been shown to increase nucleic acid duplex stability by 0.6-1.2° C. (Sanghvi et al., eds., Antisense Research and Applications, CRC Press, Boca Raton, 1993, pp. 276-278) and are suitable base substitutions, e.g., when combined with 2′-O-methoxyethyl sugar modifications.
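The 0.6-1.2 °C figure for 5-methylcytosine lends itself to a back-of-the-envelope estimate. The sketch below makes two assumptions that the text does not state. First, it reads the figure as a per-substitution increment. Second, it assumes the contributions of separate substitutions simply add. Under those assumptions it reports a rough range of duplex Tm gain when every C in a sequence is replaced by 5-me-C. The result is an arithmetic illustration, not a thermodynamic prediction, and the function name and sequence are hypothetical.

```python
def methyl_c_tm_gain(sequence: str, per_substitution=(0.6, 1.2)):
    """Rough (count, low, high) duplex Tm gain in deg C if every C becomes 5-me-C.

    Assumes each substitution contributes independently and additively,
    which is an approximation, not a prediction.
    """
    n = sequence.upper().count("C")
    low, high = per_substitution
    return n, round(n * low, 2), round(n * high, 2)


print(methyl_c_tm_gain("ACGCTTCGA"))  # (3, 1.8, 3.6)
```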

Conjugates

Another possible modification of a subject nucleic acid involves chemically linking to the polynucleotide one or more moieties or conjugates which enhance the activity, cellular distribution or cellular uptake of the oligonucleotide. These moieties or conjugates can include conjugate groups covalently bound to functional groups such as primary or secondary hydroxyl groups. Conjugate groups include, but are not limited to, intercalators, reporter molecules, polyamines, polyamides, polyethylene glycols, polyethers, groups that enhance the pharmacodynamic properties of oligomers, and groups that enhance the pharmacokinetic properties of oligomers. Suitable conjugate groups include, but are not limited to, cholesterols, lipids, phospholipids, biotin, phenazine, folate, phenanthridine, anthraquinone, acridine, fluoresceins, rhodamines, coumarins, and dyes. Groups that enhance the pharmacodynamic properties include groups that improve uptake, enhance resistance to degradation, and/or strengthen sequence-specific hybridization with the target nucleic acid. Groups that enhance the pharmacokinetic properties include groups that improve uptake, distribution, metabolism or excretion of a subject nucleic acid.

Conjugate moieties include but are not limited to lipid moieties such as a cholesterol moiety (Letsinger et al., Proc. Natl. Acad. Sci. USA, 1989, 86, 6553-6556), cholic acid (Manoharan et al., Bioorg. Med. Chem. Let., 1994, 4, 1053-1060), a thioether, e.g., hexyl-S-tritylthiol (Manoharan et al., Ann. N.Y. Acad. Sci., 1992, 660, 306-309; Manoharan et al., Bioorg. Med. Chem. Let., 1993, 3, 2765-2770), a thiocholesterol (Oberhauser et al., Nucl. Acids Res., 1992, 20, 533-538), an aliphatic chain, e.g., dodecandiol or undecyl residues (Saison-Behmoaras et al., EMBO J., 1991, 10, 1111-1118; Kabanov et al., FEBS Lett., 1990, 259, 327-330; Svinarchuk et al., Biochimie, 1993, 75, 49-54), a phospholipid, e.g., di-hexadecyl-rac-glycerol or triethylammonium 1,2-di-O-hexadecyl-rac-glycero-3-H-phosphonate (Manoharan et al., Tetrahedron Lett., 1995, 36, 3651-3654; Shea et al., Nucl. Acids Res., 1990, 18, 3777-3783), a polyamine or a polyethylene glycol chain (Manoharan et al., Nucleosides & Nucleotides, 1995, 14, 969-973), or adamantane acetic acid (Manoharan et al., Tetrahedron Lett., 1995, 36, 3651-3654), a palmityl moiety (Mishra et al., Biochim. Biophys. Acta, 1995, 1264, 229-237), or an octadecylamine or hexylamino-carbonyl-oxycholesterol moiety (Crooke et al., J. Pharmacol. Exp. Ther., 1996, 277, 923-937).

A conjugate may include a “Protein Transduction Domain” or PTD (also known as a CPP—cell penetrating peptide), which may refer to a polypeptide, polynucleotide, carbohydrate, or organic or inorganic compound that facilitates traversing a lipid bilayer, micelle, cell membrane, organelle membrane, or vesicle membrane. A PTD attached to another molecule, which can range from a small polar molecule to a large macromolecule and/or a nanoparticle, facilitates the molecule traversing a membrane, for example going from extracellular space to intracellular space, or cytosol to within an organelle. In some embodiments, a PTD is covalently linked to the amino terminus of an exogenous polypeptide (e.g., a site-directed modifying polypeptide). In some embodiments, a PTD is covalently linked to the carboxyl terminus of an exogenous polypeptide (e.g., a site-directed modifying polypeptide). In some embodiments, a PTD is covalently linked to a nucleic acid (e.g., a DNA-targeting RNA, a polynucleotide encoding a DNA-targeting RNA, a polynucleotide encoding a site-directed modifying polypeptide, etc.). Exemplary PTDs include but are not limited to a minimal undecapeptide protein transduction domain (corresponding to residues 47-57 of HIV-1 TAT comprising YGRKKRRQRRR; SEQ ID NO:264); a polyarginine sequence comprising a number of arginines sufficient to direct entry into a cell (e.g., 3, 4, 5, 6, 7, 8, 9, 10, or 10-50 arginines); a VP22 domain (Zender et al. (2002) Cancer Gene Ther. 9(6):489-96); a Drosophila Antennapedia protein transduction domain (Noguchi et al. (2003) Diabetes 52(7):1732-1737); a truncated human calcitonin peptide (Trehin et al. (2004) Pharm. Research 21:1248-1256); polylysine (Wender et al. (2000) Proc. Natl. Acad. Sci. USA 97:13003-13008); RRQRRTSKLMKR (SEQ ID NO:265); Transportan GWTLNSAGYLLGKINLKALAALAKKIL (SEQ ID NO:266); KALAWEAKLAKALAKALAKHLAKALAKALKCEA (SEQ ID NO:267); and RQIKIWFQNRRMKWKK (SEQ ID NO:268).
Exemplary PTDs include, but are not limited to, YGRKKRRQRRR (SEQ ID NO:264), RKKRRQRRR (SEQ ID NO:269), and an arginine homopolymer of from 3 to 50 arginine residues. Exemplary PTD amino acid sequences include, but are not limited to, any of the following: YGRKKRRQRRR (SEQ ID NO:264); RKKRRQRR (SEQ ID NO:270); YARAAARQARA (SEQ ID NO:271); THRLPRRRRRR (SEQ ID NO:272); and GGRRARRRRRR (SEQ ID NO:273). In some embodiments, the PTD is an activatable CPP (ACPP) (Aguilera et al. (2009) Integr Biol (Camb) June; 1(5-6): 371-381). ACPPs comprise a polycationic CPP (e.g., Arg9 or “R9”) connected via a cleavable linker to a matching polyanion (e.g., Glu9 or “E9”), which reduces the net charge to nearly zero and thereby inhibits adhesion and uptake into cells. Upon cleavage of the linker, the polyanion is released, locally unmasking the polyarginine and its inherent adhesiveness, thus “activating” the ACPP to traverse the membrane.
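As a rough illustration of the charge-masking idea behind ACPPs, the sketch below counts charges in a peptide under a deliberately simplified model: each K or R contributes +1 and each D or E contributes -1 at roughly neutral pH, while histidine, the termini and local pKa shifts are ignored. The linker sequence PLGLAG and the helper names are hypothetical choices for illustration only.

```python
# Crude charge model (assumption): K/R = +1, D/E = -1, everything else 0.
# Histidine, the N- and C-termini, and pKa shifts are deliberately ignored.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def approx_net_charge(peptide: str) -> int:
    """Return a rough net charge for a one-letter amino acid sequence."""
    return sum(CHARGE.get(residue, 0) for residue in peptide.upper())


def is_polyarginine_ptd(peptide: str, min_len: int = 3, max_len: int = 50) -> bool:
    """True if `peptide` is an arginine homopolymer of min_len..max_len residues."""
    return min_len <= len(peptide) <= max_len and set(peptide.upper()) == {"R"}


tat_ptd = "YGRKKRRQRRR"   # SEQ ID NO:264
polycation = "R" * 9      # Arg9 ("R9")
polyanion = "E" * 9       # Glu9 ("E9")
linker = "PLGLAG"         # hypothetical cleavable linker
acpp = polycation + linker + polyanion

print(approx_net_charge(tat_ptd))       # 8
print(approx_net_charge(acpp))          # 0: masked, net charge near zero
print(approx_net_charge(polycation))    # 9: unmasked after linker cleavage
print(is_polyarginine_ptd(polycation))  # True
```

Under this model the TAT undecapeptide carries about +8, while the R9/E9 construct sits near zero until the linker is cut and the polyarginine is exposed.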

Exemplary DNA-Targeting RNAs

In some embodiments, a suitable DNA-targeting RNA comprises two separate RNA polynucleotide molecules. The first of the two separate RNA polynucleotide molecules (the activator-RNA) comprises a nucleotide sequence having at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, at least about 99%, or 100% nucleotide sequence identity over a stretch of at least 8 contiguous nucleotides to any one of the nucleotide sequences set forth in SEQ ID NOs:431-562, or complements thereof. The second of the two separate RNA polynucleotide molecules (the targeter-RNA) comprises a nucleotide sequence having at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, at least about 99%, or 100% nucleotide sequence identity over a stretch of at least 8 contiguous nucleotides to any one of the nucleotide sequences set forth in SEQ ID NOs:563-679, or complements thereof.

In some embodiments, a suitable DNA-targeting RNA is a single RNA polynucleotide and comprises a first nucleotide sequence having at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, at least about 99%, or 100% nucleotide sequence identity over a stretch of at least 8 contiguous nucleotides to any one of the nucleotide sequences set forth in SEQ ID NOs:431-562 and a second nucleotide sequence having at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, at least about 99%, or 100% nucleotide sequence identity over a stretch of at least 8 contiguous nucleotides to any one of the nucleotide sequences set forth in SEQ ID NOs:563-679.
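The identity thresholds above are stated over a stretch of at least 8 contiguous nucleotides. One simple way to make that concrete, assuming an ungapped comparison (no insertions or deletions) and treating T and U as the same base, is to slide an 8-nt window along both sequences and keep the best score. The function name and the gap-free model are assumptions; the text does not specify an alignment method.

```python
def best_window_identity(query: str, reference: str, window: int = 8) -> float:
    """Best percent identity between any ungapped `window`-nt stretch of
    `query` and any equally long stretch of `reference` (both 5'->3').

    Assumption: no gaps are allowed and T/U are treated as the same base.
    """
    q = query.upper().replace("T", "U")
    r = reference.upper().replace("T", "U")
    if window <= 0 or window > min(len(q), len(r)):
        raise ValueError("window must be between 1 and the shorter sequence length")
    best = 0.0
    for i in range(len(q) - window + 1):
        q_stretch = q[i:i + window]
        for j in range(len(r) - window + 1):
            # Count position-by-position matches between the two stretches.
            matches = sum(a == b for a, b in zip(q_stretch, r[j:j + window]))
            best = max(best, 100.0 * matches / window)
    return best


# Compare a hypothetical variant (one U->A change) with GUUUUAGAGCUA.
print(best_window_identity("GUUUAAGAGCUA", "GUUUUAGAGCUA"))  # 87.5
```

A real analysis would more likely use a gapped alignment tool; the windowed count is only meant to show what "identity over a stretch of at least 8 contiguous nucleotides" measures.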

In some embodiments, the DNA-targeting RNA is a double-molecule DNA-targeting RNA and the targeter-RNA comprises the sequence 5′-GUUUUAGAGCUA-3′ (SEQ ID NO:679) linked at its 5′ end to a stretch of nucleotides that are complementary to a target DNA. In some embodiments, the DNA-targeting RNA is a double-molecule DNA-targeting RNA and the activator-RNA comprises the sequence 5′-UAGCAAGUUAAAAUAAGGCUAGUCCG-3′ (SEQ ID NO: 397).
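Because the targeter-RNA carries a stretch complementary to the target DNA at the 5′ end of 5′-GUUUUAGAGCUA-3′, its sequence can be derived from the DNA strand it is meant to base-pair with. The sketch below assumes the target strand is supplied 5′→3′ and uses standard Watson-Crick pairing; the helper names are hypothetical, and the length of the complementary stretch is left to the caller.

```python
TARGETER_REPEAT = "GUUUUAGAGCUA"  # SEQ ID NO:679, from the text above

# Watson-Crick partner in RNA for each DNA base.
RNA_PARTNER = {"A": "U", "C": "G", "G": "C", "T": "A"}


def complementary_rna(target_strand: str) -> str:
    """RNA (5'->3') that base-pairs with a DNA strand given 5'->3'.

    Reading the DNA strand backwards yields the antiparallel partner 5'->3'.
    Raises KeyError for characters other than A/C/G/T.
    """
    return "".join(RNA_PARTNER[base] for base in reversed(target_strand.upper()))


def targeter_rna(target_strand: str) -> str:
    """Complementary stretch linked to the 5' end of GUUUUAGAGCUA."""
    return complementary_rna(target_strand) + TARGETER_REPEAT


print(complementary_rna("ACGGT"))  # ACCGU
print(targeter_rna("ACGGT"))       # ACCGUGUUUUAGAGCUA
```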

In some embodiments, the DNA-targeting RNA is a single-molecule DNA-targeting RNA and comprises the sequence 5′-GUUUUAGAGCUA-linker-UAGCAAGUUAAAAUAAGGCUAGUCCG-3′ linked at its 5′ end to a stretch of nucleotides that are complementary to a target DNA (where “linker” denotes a linker nucleotide sequence that can comprise any nucleotide sequence) (SEQ ID NO: 680). Other exemplary single-molecule DNA-targeting RNAs include those set forth in SEQ ID NOs: 680-682.
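The single-molecule layout above is a concatenation of four parts, 5′ to 3′: the target-complementary stretch, GUUUUAGAGCUA, a linker, and UAGCAAGUUAAAAUAAGGCUAGUCCG. A minimal sketch of that assembly follows, assuming the caller supplies both the complementary stretch and the linker; the example linker GAAA and the 20-nt spacer are hypothetical placeholders, not sequences taken from the disclosure.

```python
import re

TARGETER_REPEAT = "GUUUUAGAGCUA"                  # from the text above
ACTIVATOR_SEGMENT = "UAGCAAGUUAAAAUAAGGCUAGUCCG"  # SEQ ID NO: 397
RNA = re.compile(r"[ACGU]+")


def single_molecule_rna(spacer: str, linker: str) -> str:
    """Assemble spacer + GUUUUAGAGCUA + linker + UAGCAAGUUAAAAUAAGGCUAGUCCG."""
    for label, seq in (("spacer", spacer), ("linker", linker)):
        if not RNA.fullmatch(seq):
            raise ValueError(f"{label} must be a non-empty A/C/G/U sequence")
    return spacer + TARGETER_REPEAT + linker + ACTIVATOR_SEGMENT


spacer = "GACGCAUAAAGAUGAGACGC"                   # hypothetical 20-nt stretch
rna = single_molecule_rna(spacer, linker="GAAA")  # hypothetical linker
print(len(rna))  # 62
```

Keeping the fixed segments as constants makes it easy to swap in a different linker without touching the targeter or activator portions.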

Nucleic Acids Encoding a Subject DNA-Targeting RNA and/or a Subject Site-Directed Modifying Polypeptide

The present disclosure provides a nucleic acid comprising a nucleotide sequence encoding a subject DNA-targeting RNA and/or a subject site-directed modifying polypeptide. In some embodiments, a subject DNA-targeting RNA-encoding nucleic acid is an expression vector, e.g., a recombinant expression vector.

In some embodiments, a subject method involves contacting a target DNA or introducing into a cell (or a population of cells) one or more nucleic acids comprising nucleotide sequences encoding a DNA-targeting RNA and/or a site-directed modifying polypeptide. In some embodiments, a cell comprising a target DNA is in vitro. In some embodiments, a cell comprising a target DNA is in vivo. Suitable nucleic acids comprising nucleotide sequences encoding a DNA-targeting RNA and/or a site-directed modifying polypeptide include expression vectors, where an expression vector comprising a nucleotide sequence encoding a DNA-targeting RNA and/or a site-directed modifying polypeptide is a “recombinant expression vector.”

In some embodiments, the recombinant expression vector is a viral construct, e.g., a recombinant adeno-associated virus construct (see, e.g., U.S. Pat. No. 7,078,387), a recombinant adenoviral construct, a recombinant lentiviral construct, a recombinant retroviral construct, etc.

Suitable expression vectors include, but are not limited to, viral vectors (e.g., viral vectors based on vaccinia virus; poliovirus; adenovirus (see, e.g., Li et al., Invest Ophthalmol Vis Sci 35:2543-2549, 1994; Borras et al., Gene Ther 6:515-524, 1999; Li and Davidson, PNAS 92:7700-7704, 1995; Sakamoto et al., Hum Gene Ther 5:1088-1097, 1999; WO 94/12649, WO 93/03769; WO 93/19191; WO 94/28938; WO 95/11984 and WO 95/00655); adeno-associated virus (see, e.g., Ali et al., Hum Gene Ther 9:81-86, 1998; Flannery et al., PNAS 94:6916-6921, 1997; Bennett et al., Invest Ophthalmol Vis Sci 38:2857-2863, 1997; Jomary et al., Gene Ther 4:683-690, 1997; Rolling et al., Hum Gene Ther 10:641-648, 1999; Ali et al., Hum Mol Genet 5:591-594, 1996; Srivastava in WO 93/09239; Samulski et al., J. Vir. (1989) 63:3822-3828; Mendelson et al., Virol. (1988) 166:154-165; and Flotte et al., PNAS (1993) 90:10613-10617); SV40; herpes simplex virus; human immunodeficiency virus (see, e.g., Miyoshi et al., PNAS 94:10319-23, 1997; Takahashi et al., J Virol 73:7812-7816, 1999); a retroviral vector (e.g., Murine Leukemia Virus, spleen necrosis virus, and vectors derived from retroviruses such as Rous Sarcoma Virus, Harvey Sarcoma Virus, avian leukosis virus, a lentivirus, human immunodeficiency virus, myeloproliferative sarcoma virus, and mammary tumor virus); and the like).

Numerous suitable expression vectors are known to those of skill in the art, and many are commercially available. The following vectors are provided by way of example for eukaryotic host cells: pXT1, pSG5 (Stratagene), pSVK3, pBPV, pMSG, and pSVLSV40 (Pharmacia). However, any other vector may be used so long as it is compatible with the host cell.

Depending on the host/vector system utilized, any of a number of suitable transcription and translation control elements, including constitutive and inducible promoters, transcription enhancer elements, transcription terminators, etc. may be used in the expression vector (see e.g., Bitter et al. (1987) Methods in Enzymology, 153:516-544).

In some embodiments, a nucleotide sequence encoding a DNA-targeting RNA and/or a site-directed modifying polypeptide is operably linked to a control element, e.g., a transcriptional control element, such as a promoter. The transcriptional control element may be functional in either a eukaryotic cell, e.g., a mammalian cell; or a prokaryotic cell (e.g., bacterial or archaeal cell). In some embodiments, a nucleotide sequence encoding a DNA-targeting RNA and/or a site-directed modifying polypeptide is operably linked to multiple control elements that allow expression of the nucleotide sequence encoding a DNA-targeting RNA and/or a site-directed modifying polypeptide in both prokaryotic and eukaryotic cells.

Non-limiting examples of suitable eukaryotic promoters (promoters functional in a eukaryotic cell) include those from cytomegalovirus (CMV) immediate early, herpes simplex virus (HSV) thymidine kinase, early and late SV40, long terminal repeats (LTRs) from retrovirus, and mouse metallothionein-I. Selection of the appropriate vector and promoter is well within the level of ordinary skill in the art. The expression vector may also contain a ribosome binding site for translation initiation and a transcription terminator. The expression vector may also include appropriate sequences for amplifying expression. The expression vector may also include nucleotide sequences encoding protein tags (e.g., 6×His tag, hemagglutinin tag, green fluorescent protein, etc.) that are fused to the site-directed modifying polypeptide, thus resulting in a chimeric polypeptide.

In some embodiments, a nucleotide sequence encoding a DNA-targeting RNA and/or a site-directed modifying polypeptide is operably linked to an inducible promoter. In some embodiments, a nucleotide sequence encoding a DNA-targeting RNA and/or a site-directed modifying polypeptide is operably linked to a constitutive promoter.

Methods of introducing a nucleic acid into a host cell are known in the art, and any known method can be used to introduce a nucleic acid (e.g., an expression construct) into a cell. Suitable methods include, e.g., viral or bacteriophage infection, transfection, conjugation, protoplast fusion, lipofection, electroporation, calcium phosphate precipitation, polyethyleneimine (PEI)-mediated transfection, DEAE-dextran mediated transfection, liposome-mediated transfection, particle gun technology, direct micro injection, nanoparticle-mediated nucleic acid delivery (see, e.g., Panyam et al., Adv Drug Deliv Rev. 2012 Sep. 13. pii: S0169-409X(12)00283-9. doi: 10.1016/j.addr.2012.09.023), and the like.

Chimeric Polypeptides

The present disclosure provides a chimeric site-directed modifying polypeptide. A subject chimeric site-directed modifying polypeptide interacts with (e.g., binds to) a subject DNA-targeting RNA (described above). The DNA-targeting RNA guides the chimeric site-directed modifying polypeptide to a target sequence within target DNA (e.g. a chromosomal sequence or an extrachromosomal sequence, e.g. an episomal sequence, a minicircle sequence, a mitochondrial sequence, a chloroplast sequence, etc.). A subject chimeric site-directed modifying polypeptide modifies target DNA (e.g., cleavage or methylation of target DNA) and/or a polypeptide associated with target DNA (e.g., methylation or acetylation of a histone tail).

A chimeric site-directed modifying polypeptide is also referred to herein as a “chimeric site-directed polypeptide” or a “chimeric RNA binding site-directed modifying polypeptide.”

A subject chimeric site-directed modifying polypeptide comprises two portions, an RNA-binding portion and an activity portion. A subject chimeric site-directed modifying polypeptide comprises amino acid sequences that are derived from at least two different polypeptides. A subject chimeric site-directed modifying polypeptide can comprise modified and/or naturally-occurring polypeptide sequences (e.g., a first amino acid sequence from a modified or unmodified Cas9/Csn1 protein; and a second amino acid sequence other than the Cas9/Csn1 protein).

RNA-Binding Portion

In some cases, the RNA-binding portion of a subject chimeric site-directed modifying polypeptide is a naturally-occurring polypeptide. In other cases, the RNA-binding portion of a subject chimeric site-directed modifying polypeptide is not a naturally-occurring molecule (modified, e.g., mutation, deletion, insertion). Naturally-occurring RNA-binding portions of interest are derived from site-directed modifying polypeptides known in the art. For example, SEQ ID NOs:1-256 and 795-1346 provide a non-limiting and non-exhaustive list of naturally occurring Cas9/Csn1 endonucleases that can be used as site-directed modifying polypeptides. In some cases, the RNA-binding portion of a subject chimeric site-directed modifying polypeptide comprises an amino acid sequence having at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, at least about 99%, or 100% amino acid sequence identity to the RNA-binding portion of a polypeptide having any of the amino acid sequences set forth as SEQ ID NOs:1-256 and 795-1346.

In some cases, the site-directed modifying polypeptide comprises an amino acid sequence having at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 99%, or 100%, amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9/Csn1 amino acid sequence depicted in FIG. 3A and FIG. 3B, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs:1-256 and 795-1346.

Activity Portion

In addition to the RNA-binding portion, the chimeric site-directed modifying polypeptide comprises an “activity portion.” In some embodiments, the activity portion of a subject chimeric site-directed modifying polypeptide comprises the naturally-occurring activity portion of a site-directed modifying polypeptide (e.g., Cas9/Csn1 endonuclease). In other embodiments, the activity portion of a subject chimeric site-directed modifying polypeptide comprises a modified amino acid sequence (e.g., substitution, deletion, insertion) of a naturally-occurring activity portion of a site-directed modifying polypeptide. Naturally-occurring activity portions of interest are derived from site-directed modifying polypeptides known in the art. For example, SEQ ID NOs:1-256 and 795-1346 provide a non-limiting and non-exhaustive list of naturally occurring Cas9/Csn1 endonucleases that can be used as site-directed modifying polypeptides. The activity portion of a subject chimeric site-directed modifying polypeptide is variable and may comprise any heterologous polypeptide sequence that may be useful in the methods disclosed herein.

In some embodiments, a subject chimeric site-directed modifying polypeptide comprises: (i) an RNA-binding portion that interacts with a DNA-targeting RNA, wherein the DNA-targeting RNA comprises a nucleotide sequence that is complementary to a sequence in a target DNA; and (ii) an activity portion that exhibits site-directed enzymatic activity (e.g., activity for DNA methylation, activity for DNA cleavage, activity for histone acetylation, activity for histone methylation, etc.), wherein the site of enzymatic activity is determined by the DNA-targeting RNA.

In other embodiments, a subject chimeric site-directed modifying polypeptide comprises: (i) an RNA-binding portion that interacts with a DNA-targeting RNA, wherein the DNA-targeting RNA comprises a nucleotide sequence that is complementary to a sequence in a target DNA; and (ii) an activity portion that modulates transcription within the target DNA (e.g., to increase or decrease transcription), wherein the site of modulated transcription within the target DNA is determined by the DNA-targeting RNA.

In some cases, the activity portion of a subject chimeric site-directed modifying polypeptide has enzymatic activity that modifies target DNA (e.g., nuclease activity, methyltransferase activity, demethylase activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer forming activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity or glycosylase activity).

In other cases, the activity portion of a subject chimeric site-directed modifying polypeptide has enzymatic activity (e.g., methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity or demyristoylation activity) that modifies a polypeptide associated with target DNA (e.g., a histone).

In some cases, the activity portion of a subject chimeric site-directed modifying polypeptide exhibits enzymatic activity (described above). In other cases, the activity portion of a subject chimeric site-directed modifying polypeptide modulates transcription of the target DNA (described above).

Exemplary Chimeric Site-Directed Modifying Polypeptides

In some embodiments, the activity portion of the chimeric site-directed modifying polypeptide comprises a modified form of the Cas9/Csn1 protein. In some instances, the modified form of the Cas9/Csn1 protein comprises an amino acid change (e.g., deletion, insertion, or substitution) that reduces the naturally-occurring nuclease activity of the Cas9/Csn1 protein. For example, in some instances, the modified form of the Cas9/Csn1 protein has less than 50%, less than 40%, less than 30%, less than 20%, less than 10%, less than 5%, or less than 1% of the nuclease activity of the corresponding wild-type Cas9/Csn1 polypeptide. In some cases, the modified form of the Cas9/Csn1 polypeptide has no substantial nuclease activity.

In some embodiments, the modified form of the Cas9/Csn1 polypeptide is a D10A (aspartate to alanine at amino acid position 10 of SEQ ID NO:8) mutation (or the corresponding mutation of any of the proteins presented in SEQ ID NOs:1-256 and 795-1346) that can cleave the complementary strand of the target DNA but has reduced ability to cleave the non-complementary strand of the target DNA (see FIG. 11A-11B). In some embodiments, the modified form of the Cas9/Csn1 polypeptide is an H840A (histidine to alanine at amino acid position 840 of SEQ ID NO:8) mutation (or the corresponding mutation of any of the proteins set forth as SEQ ID NOs:1-256 and 795-1346) that can cleave the non-complementary strand of the target DNA but has reduced ability to cleave the complementary strand of the target DNA (see FIG. 11A-11B). In some embodiments, the modified form of the Cas9/Csn1 polypeptide harbors both the D10A and the H840A mutations (or the corresponding mutations of any of the proteins set forth as SEQ ID NOs:1-256 and 795-1346) such that the polypeptide has a reduced ability to cleave both the complementary and the non-complementary strands of the target DNA. Other residues can be mutated to achieve the above effects (i.e., to inactivate one or the other nuclease portion). As non-limiting examples, residues D10, G12, G17, E762, H840, N854, N863, H982, H983, A984, D986, and/or A987 (or the corresponding residues of any of the proteins set forth as SEQ ID NOs:1-256 and 795-1346) can be altered (i.e., substituted) (see FIGS. 3A-3B, FIG. 5, FIG. 11A, and Table 1 for more information regarding the conservation of Cas9 amino acid residues). Also, mutations other than alanine substitutions are suitable. For more information of important


TABLE 1
Table 1 lists 4 motifs that are present in Cas9 sequences from various species (see also FIG. 3A-3B and FIG. 5). The amino acids listed here are from the Cas9 from S. pyogenes (SEQ ID NO: 8).

Motif # | Motif | Amino acids (residue #s) | Highly conserved
1 | RuvC-like I | IGLDIGTNSVGWAVI (7-21) (SEQ ID NO: 260) | D10, G12, G17
2 | RuvC-like II | IVIEMARE (759-766) (SEQ ID NO: 261) | E762
3 | HNH-motif | DVDHIVPQSFLKDDSIDNKVLTRSDKN (837-863) (SEQ ID NO: 262) | H840, N854, N863
4 | RuvC-like II | HHAHDAYL (982-989) (SEQ ID NO: 263) | H982, H983, A984, D986, A987
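Because Table 1 gives each motif together with its residue range, the listed conserved positions can be checked mechanically against the motif strings. The sketch below copies the motif sequences, ranges and conserved residues from Table 1 (S. pyogenes Cas9, SEQ ID NO: 8); the data layout and function name are hypothetical.

```python
# Motif sequence, first residue, last residue, conserved residues (Table 1).
TABLE_1 = {
    1: ("IGLDIGTNSVGWAVI", 7, 21, ["D10", "G12", "G17"]),
    2: ("IVIEMARE", 759, 766, ["E762"]),
    3: ("DVDHIVPQSFLKDDSIDNKVLTRSDKN", 837, 863, ["H840", "N854", "N863"]),
    4: ("HHAHDAYL", 982, 989, ["H982", "H983", "A984", "D986", "A987"]),
}


def table_problems() -> list[str]:
    """List any range or conserved-residue entry that disagrees with its motif."""
    problems = []
    for number, (motif, first, last, conserved) in TABLE_1.items():
        if last - first + 1 != len(motif):
            problems.append(f"motif {number}: range {first}-{last} vs length {len(motif)}")
        for label in conserved:
            residue, position = label[0], int(label[1:])
            offset = position - first  # 1-based residue number -> index in motif
            if not 0 <= offset < len(motif) or motif[offset] != residue:
                problems.append(f"motif {number}: {label} not found")
    return problems


print(table_problems())  # []
```

An empty list means every range length and every highly conserved residue in the table agrees with the motif sequence shown.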

In some cases, the chimeric site-directed modifying polypeptide comprises an amino acid sequence having at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 99% or 100% amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9/Csn1 amino acid sequence depicted in FIG. 3A and FIG. 3B, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs:1-256 and 795-1346. In some cases, the chimeric site-directed modifying polypeptide comprises 4 motifs (as listed in Table 1 and depicted in FIG. 3A and FIG. 5), each with amino acid sequences having at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 99% or 100% amino acid sequence identity to each of the 4 motifs listed in Table 1 (SEQ ID NOs:260-263), or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs:1-256 and 795-1346. In some cases, the chimeric site-directed modifying polypeptide comprises amino acid sequences having at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 99% or 100% amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9/Csn1 amino acid sequence depicted in FIG. 3A and FIG. 3B, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs:1-256 and 795-1346.
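The D10A and H840A substitutions described for the modified Cas9/Csn1 polypeptide use the usual wild-type residue, position, replacement notation. The sketch below applies such substitutions while checking the wild-type residue first, shown on the Table 1 motif fragments rather than on a full-length Cas9 sequence; the function and its `first_residue` offset parameter are hypothetical conveniences.

```python
import re

SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")  # e.g. "D10A"


def apply_substitutions(seq: str, changes: list[str], first_residue: int = 1) -> str:
    """Apply substitutions such as "D10A" to a protein sequence.

    Positions are full-length residue numbers; `first_residue` is the number
    of seq[0], so a fragment can be edited with its original numbering.
    """
    residues = list(seq.upper())
    for change in changes:
        match = SUBSTITUTION.fullmatch(change)
        if match is None:
            raise ValueError(f"unrecognized substitution: {change}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        index = position - first_residue
        if not 0 <= index < len(residues):
            raise IndexError(f"{change} lies outside the supplied sequence")
        if residues[index] != wild_type:
            raise ValueError(f"{change}: expected {wild_type}, found {residues[index]}")
        residues[index] = new
    return "".join(residues)


# RuvC-like I motif (residues 7-21) and HNH motif (residues 837-863) from Table 1.
print(apply_substitutions("IGLDIGTNSVGWAVI", ["D10A"], first_residue=7))
print(apply_substitutions("DVDHIVPQSFLKDDSIDNKVLTRSDKN", ["H840A"], first_residue=837))
```

Checking the wild-type residue before substituting guards against an off-by-one numbering error, which is easy to make when moving between 1-based residue numbers and 0-based string indices.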

In some embodiments, the activity portion of the site-directed modifying polypeptide comprises a heterologous polypeptide that has DNA-modifying activity and/or transcription factor activity and/or DNA-associated polypeptide-modifying activity. In some cases, a heterologous polypeptide replaces a portion of the Cas9/Csn1 polypeptide that provides nuclease activity. In other embodiments, a subject site-directed modifying polypeptide comprises both a portion of the Cas9/Csn1 polypeptide that normally provides nuclease activity (and that portion can be fully active or can instead be modified to have less than 100% of the corresponding wild-type activity) and a heterologous polypeptide. In other words, in some cases, a subject chimeric site-directed modifying polypeptide is a fusion polypeptide comprising both the portion of the Cas9/Csn1 polypeptide that normally provides nuclease activity and the heterologous polypeptide. In other cases, a subject chimeric site-directed modifying polypeptide is a fusion polypeptide comprising a modified variant of the activity portion of the Cas9/Csn1 polypeptide (e.g., amino acid change, deletion, insertion) and a heterologous polypeptide. In yet other cases, a subject chimeric site-directed modifying polypeptide is a fusion polypeptide comprising a heterologous polypeptide and the RNA-binding portion of a naturally-occurring or a modified site-directed modifying polypeptide.

For example, in a chimeric Cas9/Csn1 protein, a naturally-occurring (or modified, e.g., mutation, deletion, insertion) bacterial Cas9/Csn1 polypeptide may be fused to a heterologous polypeptide sequence (i.e., a polypeptide sequence from a protein other than Cas9/Csn1 or a polypeptide sequence from another organism). The heterologous polypeptide sequence may exhibit an activity (e.g., enzymatic activity) that will also be exhibited by the chimeric Cas9/Csn1 protein (e.g., methyltransferase activity, acetyltransferase activity, kinase activity, ubiquitinating activity, etc.). A heterologous nucleic acid sequence may be linked to another nucleic acid sequence (e.g., by genetic engineering) to generate a chimeric nucleotide sequence encoding a chimeric polypeptide. In some embodiments, a chimeric Cas9/Csn1 polypeptide is generated by fusing a Cas9/Csn1 polypeptide (e.g., wild type Cas9 or a Cas9 variant, e.g., a Cas9 with reduced or inactivated nuclease activity) with a heterologous sequence that provides for subcellular localization (e.g., a nuclear localization signal (NLS) for targeting to the nucleus; a mitochondrial localization signal for targeting to the mitochondria; a chloroplast localization signal for targeting to a chloroplast; an ER retention signal; and the like). In some embodiments, the heterologous sequence can provide a tag for ease of tracking or purification (e.g., a fluorescent protein, e.g., green fluorescent protein (GFP), YFP, RFP, CFP, mCherry, tdTomato, and the like; a HIS tag, e.g., a 6×His tag; a hemagglutinin (HA) tag; a FLAG tag; a Myc tag; and the like). In some embodiments, the heterologous sequence can provide for increased or decreased stability.
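At the sequence level, building such a fusion amounts to joining the parts N- to C-terminus. The sketch below is a hypothetical illustration: the GGGGS linker is an assumed flexible spacer, the SV40 nuclear localization signal (PKKKRKV) and a 6×His tag stand in for the localization and purification sequences mentioned above, and the Table 1 RuvC-like I motif (residues 7-21) stands in for a full-length Cas9/Csn1 polypeptide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    name: str
    sequence: str  # one-letter amino acid code


def fuse(parts: list[Part], linker: str = "GGGGS") -> tuple[str, dict[str, tuple[int, int]]]:
    """Join parts N- to C-terminus with `linker` between neighbours.

    Returns the fused sequence and the 1-based (start, end) span of each part.
    """
    sequence, spans = "", {}
    for i, part in enumerate(parts):
        if i > 0:
            sequence += linker
        start = len(sequence) + 1
        sequence += part.sequence
        spans[part.name] = (start, len(sequence))
    return sequence, spans


fusion, spans = fuse([
    Part("6xHis", "HHHHHH"),
    Part("Cas9 fragment", "IGLDIGTNSVGWAVI"),  # stand-in for full-length Cas9
    Part("SV40 NLS", "PKKKRKV"),
])
print(spans)
```

Recording each part's span makes it straightforward to map a residue number in the fusion back to the domain it came from, for example when placing a tag at the amino or carboxyl terminus.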
In some embodiments, the heterologous sequence can provide a binding domain (e.g., to provide the ability of a chimeric Cas9 polypeptide to bind to another protein of interest, e.g., a DNA or histone modifying protein, a transcription factor or transcription repressor, a recruiting protein, etc.).

Examples of various additional suitable fusion partners (or fragments thereof) for a subject variant Cas9 site-directed polypeptide include, but are not limited to those listed in FIG. 54A-54C.

Nucleic Acid Encoding a Subject Chimeric Site-Directed Modifying Polypeptide

The present disclosure provides a nucleic acid comprising a nucleotide sequence encoding a subject chimeric site-directed modifying polypeptide. In some embodiments, the nucleic acid comprising a nucleotide sequence encoding a subject chimeric site-directed modifying polypeptide is an expression vector, e.g., a recombinant expression vector.

In some embodiments, a subject method involves contacting a target DNA or introducing into a cell (or a population of cells) one or more nucleic acids comprising a nucleotide sequence encoding a chimeric site-directed modifying polypeptide. Suitable nucleic acids comprising nucleotide sequences encoding a chimeric site-directed modifying polypeptide include expression vectors, where an expression vector comprising a nucleotide sequence encoding a chimeric site-directed modifying polypeptide is a “recombinant expression vector.”

In some embodiments, the recombinant expression vector is a viral construct, e.g., a recombinant adeno-associated virus construct (see, e.g., U.S. Pat. No. 7,078,387), a recombinant adenoviral construct, a recombinant lentiviral construct, etc.

Suitable expression vectors include, but are not limited to, viral vectors (e.g., viral vectors based on vaccinia virus; poliovirus; adenovirus (see, e.g., Li et al., Invest Ophthalmol Vis Sci 35:2543-2549, 1994; Borras et al., Gene Ther 6:515-524, 1999; Li and Davidson, PNAS 92:7700-7704, 1995; Sakamoto et al., Hum Gene Ther 5:1088-1097, 1999; WO 94/12649, WO 93/03769; WO 93/19191; WO 94/28938; WO 95/11984 and WO 95/00655); adeno-associated virus (see, e.g., Ali et al., Hum Gene Ther 9:81-86, 1998; Flannery et al., PNAS 94:6916-6921, 1997; Bennett et al., Invest Ophthalmol Vis Sci 38:2857-2863, 1997; Jomary et al., Gene Ther 4:683-690, 1997; Rolling et al., Hum Gene Ther 10:641-648, 1999; Ali et al., Hum Mol Genet 5:591-594, 1996; Srivastava in WO 93/09239; Samulski et al., J. Vir. (1989) 63:3822-3828; Mendelson et al., Virol. (1988) 166:154-165; and Flotte et al., PNAS (1993) 90:10613-10617); SV40; herpes simplex virus; human immunodeficiency virus (see, e.g., Miyoshi et al., PNAS 94:10319-23, 1997; Takahashi et al., J Virol 73:7812-7816, 1999); a retroviral vector (e.g., Murine Leukemia Virus, spleen necrosis virus, and vectors derived from retroviruses such as Rous Sarcoma Virus, Harvey Sarcoma Virus, avian leukosis virus, a lentivirus, human immunodeficiency virus, myeloproliferative sarcoma virus, and mammary tumor virus); and the like).

Numerous suitable expression vectors are known to those of skill in the art, and many are commercially available. The following vectors are provided by way of example for eukaryotic host cells: pXT1, pSG5 (Stratagene), pSVK3, pBPV, pMSG, and pSVLSV40 (Pharmacia). However, any other vector may be used so long as it is compatible with the host cell.

Depending on the host/vector system utilized, any of a number of suitable transcription and translation control elements, including constitutive and inducible promoters, transcription enhancer elements, transcription terminators, etc. may be used in the expression vector (see e.g., Bitter et al. (1987) Methods in Enzymology, 153:516-544).

In some embodiments, a nucleotide sequence encoding a chimeric site-directed modifying polypeptide is operably linked to a control element, e.g., a transcriptional control element, such as a promoter. The transcriptional control element may be functional in either a eukaryotic cell, e.g., a mammalian cell; or a prokaryotic cell (e.g., bacterial or archaeal cell). In some embodiments, a nucleotide sequence encoding a chimeric site-directed modifying polypeptide is operably linked to multiple control elements that allow expression of the nucleotide sequence encoding a chimeric site-directed modifying polypeptide in both prokaryotic and eukaryotic cells.

Non-limiting examples of suitable eukaryotic promoters (promoters functional in a eukaryotic cell) include those from cytomegalovirus (CMV) immediate early, herpes simplex virus (HSV) thymidine kinase, early and late SV40, long terminal repeats (LTRs) from retrovirus, and mouse metallothionein-I. Selection of the appropriate vector and promoter is well within the level of ordinary skill in the art. The expression vector may also contain a ribosome binding site for translation initiation and a transcription terminator. The expression vector may also include appropriate sequences for amplifying expression. The expression vector may also include nucleotide sequences encoding protein tags (e.g., 6×His tag, hemagglutinin (HA) tag, a fluorescent protein (e.g., a green fluorescent protein; a yellow fluorescent protein, etc.), etc.) that are fused to the chimeric site-directed modifying polypeptide.

In some embodiments, a nucleotide sequence encoding a chimeric site-directed modifying polypeptide is operably linked to an inducible promoter (e.g., heat shock promoter, tetracycline-regulated promoter, steroid-regulated promoter, metal-regulated promoter, estrogen receptor-regulated promoter, etc.). In some embodiments, a nucleotide sequence encoding a chimeric site-directed modifying polypeptide is operably linked to a spatially restricted and/or temporally restricted promoter (e.g., a tissue specific promoter, a cell type specific promoter, etc.). In some embodiments, a nucleotide sequence encoding a chimeric site-directed modifying polypeptide is operably linked to a constitutive promoter.

Methods of introducing a nucleic acid into a host cell are known in the art, and any known method can be used to introduce a nucleic acid (e.g., an expression construct) into a stem cell or progenitor cell. Suitable methods include, e.g., viral or bacteriophage infection, transfection, conjugation, protoplast fusion, lipofection, electroporation, calcium phosphate precipitation, polyethyleneimine (PEI)-mediated transfection, DEAE-dextran mediated transfection, liposome-mediated transfection, particle gun technology, direct micro injection, nanoparticle-mediated nucleic acid delivery (see, e.g., Panyam et al., Adv Drug Deliv Rev. 2012 Sep. 13. pii: S0169-409X(12)00283-9. doi: 10.1016/j.addr.2012.09.023), and the like.

Methods

The present disclosure provides methods for modifying a target DNA and/or a target DNA-associated polypeptide. Generally, a subject method involves contacting a target DNA with a complex (a “targeting complex”), which complex comprises a DNA-targeting RNA and a site-directed modifying polypeptide.

As discussed above, a subject DNA-targeting RNA and a subject site-directed modifying polypeptide form a complex. The DNA-targeting RNA provides target specificity to the complex by comprising a nucleotide sequence that is complementary to a sequence of a target DNA. The site-directed modifying polypeptide of the complex provides the site-specific activity. In some embodiments, a subject complex modifies a target DNA, leading to, for example, DNA cleavage, DNA methylation, DNA damage, DNA repair, etc. In other embodiments, a subject complex modifies a target polypeptide associated with target DNA (e.g., a histone, a DNA-binding protein, etc.), leading to, for example, histone methylation, histone acetylation, histone ubiquitination, and the like. The target DNA may be, for example, naked DNA in vitro, chromosomal DNA in cells in vitro, chromosomal DNA in cells in vivo, etc.

In some cases, the site-directed modifying polypeptide exhibits nuclease activity that cleaves target DNA at a target DNA sequence defined by the region of complementarity between the DNA-targeting RNA and the target DNA. In some cases, when the site-directed modifying polypeptide is a Cas9 or Cas9-related polypeptide, site-specific cleavage of the target DNA occurs at locations determined by both (i) base-pairing complementarity between the DNA-targeting RNA and the target DNA; and (ii) a short motif [referred to as the protospacer adjacent motif (PAM)] in the target DNA. In some embodiments (e.g., when Cas9 from S. pyogenes, or a closely related Cas9, is used (see SEQ ID NOs:1-256 and 795-1346)), the PAM sequence of the non-complementary strand is 5′-XGG-3′, where X is any DNA nucleotide and X is immediately 3′ of the target sequence of the non-complementary strand of the target DNA (see FIG. 10A-10E). As such, the PAM sequence of the complementary strand is 5′-CCY-3′, where Y is any DNA nucleotide and Y is immediately 5′ of the target sequence of the complementary strand of the target DNA (see FIG. 10A-10E, where the PAM of the non-complementary strand is 5′-GGG-3′ and the PAM of the complementary strand is 5′-CCC-3′). In some such embodiments, X and Y can be complementary and the X-Y base pair can be any base pair (e.g., X=C and Y=G; X=G and Y=C; X=A and Y=T; X=T and Y=A).
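
As a rough illustration of the 5′-XGG-3′ rule, the Python sketch below scans both strands of a DNA string for candidate target sites. It assumes an S. pyogenes-type Cas9, an uppercase A/C/G/T input, and a default target length of 20 nucleotides; the target length and the function names are illustrative assumptions rather than values stated in this passage, and the sketch does not apply to Cas9 orthologs with different PAM requirements.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_pam_sites(seq: str, target_len: int = 20) -> list:
    """List (strand, pam_start, target, pam) for every 5'-XGG-3' PAM.

    Only PAMs with at least `target_len` nucleotides on their 5' side are
    reported. For strand "-" the scan runs on the reverse complement, so
    pam_start is a reverse-complement coordinate.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # i is the index of X in X-G-G; the last usable i is len(s) - 3.
        for i in range(target_len, len(s) - 2):
            pam = s[i:i + 3]
            if pam[1:] == "GG":
                # The target sequence lies immediately 5' of X on the same strand.
                hits.append((strand, i, s[i - target_len:i], pam))
    return hits
```

Scanning the reverse complement rather than searching for 5′-CCY-3′ on the input keeps a single code path for both strands, at the cost of reporting minus-strand positions in reverse-complement coordinates.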

In some cases, different Cas9 proteins (i.e., Cas9 proteins from various species) may be advantageous to use in the various provided methods in order to capitalize on various enzymatic characteristics of the different Cas9 proteins (e.g., for different PAM sequence preferences; for increased or decreased enzymatic activity; for an increased or decreased level of cellular toxicity; to change the balance between NHEJ, homology-directed repair, single strand breaks, double strand breaks, etc.). Cas9 proteins from various species (see SEQ ID NOs:1-256 and 795-1346) may require different PAM sequences in the target DNA. Thus, for a particular Cas9 protein of choice, the PAM sequence requirement may be different than the 5′-XGG-3′ sequence described above.

Many Cas9 orthologs from a wide variety of species have been identified herein, and the proteins share only a few identical amino acids. All identified Cas9 orthologs have the same domain architecture with a central HNH endonuclease domain and a split RuvC/RNaseH domain (See FIG. 3A, FIG. 3B, FIG. 5, and Table 1). Cas9 proteins share 4 key motifs with a conserved architecture. Motifs 1, 2, and 4 are RuvC-like motifs while motif 3 is an HNH motif. In some cases, a suitable site-directed modifying polypeptide comprises an amino acid sequence having 4 motifs, each of motifs 1-4 having at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 99% or 100% amino acid sequence identity to the motifs 1-4 of the Cas9/Csn1 amino acid sequence depicted in FIG. 3A (SEQ ID NOs:260-263, respectively, as depicted in Table 1), or to the corresponding portions in any of the amino acid sequences set forth in SEQ ID NOs:1-256 and 795-1346 (see FIG. 5 for an alignment of motifs 1-4 from divergent Cas9 sequences). In some cases, a suitable site-directed modifying polypeptide comprises an amino acid sequence having at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 99% or 100% amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9/Csn1 amino acid sequence depicted in FIG. 3A and FIG. 3B, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs:1-256 and 795-1346. Any Cas9 protein as defined above can be used as a site-directed modifying polypeptide or as part of a chimeric site-directed modifying polypeptide of the subject methods.
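
The identity thresholds above presuppose a way of scoring two aligned sequences. The sketch below uses one common convention (identical, non-gap positions divided by alignment length) applied to toy motif strings; the convention, the function names, and the example sequences are assumptions for illustration only and are not the motifs of SEQ ID NOs:260-263.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Gap characters ('-') count as mismatches and the denominator is the
    alignment length; other conventions (e.g., excluding gaps) also exist.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("empty alignment")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def meets_threshold(query_motifs: list, reference_motifs: list,
                    threshold: float = 75.0) -> bool:
    """True if every query motif reaches `threshold` percent identity to its reference."""
    if len(query_motifs) != len(reference_motifs):
        raise ValueError("need one reference motif per query motif")
    return all(percent_identity(q, r) >= threshold
               for q, r in zip(query_motifs, reference_motifs))
```

For example, `percent_identity("ACDEFGHIKL", "ACDEFGHIKM")` scores 9 matches out of 10 positions, i.e., 90%.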

The nuclease activity cleaves target DNA to produce double strand breaks. These breaks are then repaired by the cell in one of two ways: non-homologous end joining, and homology-directed repair (FIG. 2). In non-homologous end joining (NHEJ), the double-strand breaks are repaired by direct ligation of the break ends to one another. As such, no new nucleic acid material is inserted into the site, although some nucleic acid material may be lost, resulting in a deletion. In homology-directed repair, a donor polynucleotide with homology to the cleaved target DNA sequence is used as a template for repair of the cleaved target DNA sequence, resulting in the transfer of genetic information from the donor polynucleotide to the target DNA. As such, new nucleic acid material may be inserted/copied into the site. In some cases, a target DNA is contacted with a subject donor polynucleotide. In some cases, a subject donor polynucleotide is introduced into a subject cell. The modifications of the target DNA due to NHEJ and/or homology-directed repair lead to, for example, gene correction, gene replacement, gene tagging, transgene insertion, nucleotide deletion, gene disruption, gene mutation, etc.

Accordingly, cleavage of DNA by a site-directed modifying polypeptide may be used to delete nucleic acid material from a target DNA sequence (e.g., to disrupt a gene that makes cells susceptible to infection (e.g., the CCR5 or CXCR4 genes, which make T cells susceptible to HIV infection), to remove disease-causing trinucleotide repeat sequences in neurons, to create gene knockouts and mutations as disease models in research, etc.) by cleaving the target DNA sequence and allowing the cell to repair the sequence in the absence of an exogenously provided donor polynucleotide. Thus, the subject methods can be used to knock out a gene (resulting in complete lack of transcription or altered transcription) or to knock in genetic material into a locus of choice in the target DNA.

Alternatively, if a DNA-targeting RNA and a site-directed modifying polypeptide are coadministered to cells with a donor polynucleotide sequence that includes at least a segment with homology to the target DNA sequence, the subject methods may be used to add, i.e. insert or replace, nucleic acid material to a target DNA sequence (e.g. to “knock in” a nucleic acid that encodes for a protein, an siRNA, an miRNA, etc.), to add a tag (e.g., 6×His, a fluorescent protein (e.g., a green fluorescent protein; a yellow fluorescent protein, etc.), hemagglutinin (HA), FLAG, etc.), to add a regulatory sequence to a gene (e.g. promoter, polyadenylation signal, internal ribosome entry sequence (IRES), 2A peptide, start codon, stop codon, splice signal, localization signal, etc.), to modify a nucleic acid sequence (e.g., introduce a mutation), and the like. As such, a complex comprising a DNA-targeting RNA and a site-directed modifying polypeptide is useful in any in vitro or in vivo application in which it is desirable to modify DNA in a site-specific, i.e. “targeted”, way, for example gene knock-out, gene knock-in, gene editing, gene tagging, etc., as used in, for example, gene therapy, e.g. to treat a disease or as an antiviral, antipathogenic, or anticancer therapeutic, the production of genetically modified organisms in agriculture, the large scale production of proteins by cells for therapeutic, diagnostic, or research purposes, the induction of iPS cells, biological research, the targeting of genes of pathogens for deletion or replacement, etc.
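
To make the donor-polynucleotide route concrete, the following toy Python sketch builds a donor with homology arms flanking an insert (for example, a tag-coding sequence) and models homology-directed repair as copying the donor into the target. The arm length, cut position, sequences, and function names are hypothetical, and real repair outcomes are far less deterministic than this string substitution.

```python
def build_donor(target: str, cut: int, insert: str, arm_len: int) -> str:
    """Donor = left homology arm + insert + right homology arm around `cut`."""
    if not arm_len <= cut <= len(target) - arm_len:
        raise ValueError("not enough flanking sequence for the requested arms")
    return target[cut - arm_len:cut] + insert + target[cut:cut + arm_len]


def simulate_hdr(target: str, cut: int, donor: str, arm_len: int) -> str:
    """Toy HDR: the region spanned by the arms is replaced by the donor copy."""
    left, right = target[cut - arm_len:cut], target[cut:cut + arm_len]
    if not (donor.startswith(left) and donor.endswith(right)):
        raise ValueError("donor arms do not match the sequence flanking the cut")
    return target[:cut - arm_len] + donor + target[cut + arm_len:]
```

With `target = "AAAACCCCGGGGTTTT"`, a cut at index 8, 4-nt arms, and the hypothetical insert `"CATCAT"`, the edited sequence is the original with the insert placed exactly at the cut site.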

In some embodiments, the site-directed modifying polypeptide comprises a modified form of the Cas9/Csn1 protein. In some instances, the modified form of the Cas9/Csn1 protein comprises an amino acid change (e.g., deletion, insertion, or substitution) that reduces the naturally-occurring nuclease activity of the Cas9/Csn1 protein. For example, in some instances, the modified form of the Cas9/Csn1 protein has less than 50%, less than 40%, less than 30%, less than 20%, less than 10%, less than 5%, or less than 1% of the nuclease activity of the corresponding wild-type Cas9/Csn1 polypeptide. In some cases, the modified form of the Cas9/Csn1 polypeptide has no substantial nuclease activity. When a subject site-directed modifying polypeptide is a modified form of the Cas9/Csn1 polypeptide that has no substantial nuclease activity, it can be referred to as “dCas9.”

In some embodiments, the modified form of the Cas9/Csn1 polypeptide is a D10A (aspartate to alanine at amino acid position 10 of SEQ ID NO:8) mutation (or the corresponding mutation of any of the proteins set forth as SEQ ID NOs:1-256 and 795-1346) that can cleave the complementary strand of the target DNA but has reduced ability to cleave the non-complementary strand of the target DNA (thus resulting in a single strand break (SSB) instead of a DSB; see FIG. 11A-11B). In some embodiments, the modified form of the Cas9/Csn1 polypeptide is a H840A (histidine to alanine at amino acid position 840 of SEQ ID NO:8) mutation (or the corresponding mutation of any of the proteins set forth as SEQ ID NOs:1-256 and 795-1346) that can cleave the non-complementary strand of the target DNA but has reduced ability to cleave the complementary strand of the target DNA (thus resulting in a single strand break (SSB) instead of a DSB; see FIG. 11A-11B). The use of the D10A or H840A variant of Cas9 (or the corresponding mutations in any of the proteins set forth as SEQ ID NOs:1-256 and 795-1346) can alter the expected biological outcome because non-homologous end joining (NHEJ) is much more likely to occur when DSBs are present as opposed to SSBs. Thus, in some cases where one wishes to reduce the likelihood of DSB (and therefore reduce the likelihood of NHEJ), a D10A or H840A variant of Cas9 can be used. Other residues can be mutated to achieve the same effect (i.e., inactivate one or the other nuclease portion). As non-limiting examples, residues D10, G12, G17, E762, H840, N854, N863, H982, H983, A984, D986, and/or A987 (or the corresponding mutations of any of the proteins set forth as SEQ ID NOs:1-256 and 795-1346) can be altered (i.e., substituted) (see FIG. 3A-3B, FIG. 5, FIG. 11A, and Table 1 for more information regarding the conservation of Cas9 amino acid residues). Also, mutations other than alanine substitutions are suitable.
In some embodiments when a site-directed polypeptide (e.g., site-directed modifying polypeptide) has reduced catalytic activity (e.g., when a Cas9 protein has a D10, G12, G17, E762, H840, N854, N863, H982, H983, A984, D986, and/or an A987 mutation, e.g., D10A, G12A, G17A, E762A, H840A, N854A, N863A, H982A, H983A, A984A, and/or D986A), the polypeptide can still bind to target DNA in a site-specific manner (because it is still guided to a target DNA sequence by a DNA-targeting RNA) as long as it retains the ability to interact with the DNA-targeting RNA.

In some embodiments, the modified form of the Cas9/Csn1 polypeptide harbors both the D10A and the H840A mutations (or the corresponding mutations of any of the proteins set forth as SEQ ID NOs:1-256 and 795-1346) such that the polypeptide has a reduced ability to cleave both the complementary and the non-complementary strands of the target DNA (i.e., the variant can have no substantial nuclease activity). Other residues can be mutated to achieve the same effect (i.e., inactivate both nuclease portions). As non-limiting examples, residues D10, G12, G17, E762, H840, N854, N863, H982, H983, A984, D986, and/or A987 (or the corresponding mutations of any of the proteins set forth as SEQ ID NOs:1-256 and 795-1346) can be altered (i.e., substituted) (see FIG. 3A-3B, FIG. 5, FIG. 11A, and Table 1 for more information regarding the conservation of Cas9 amino acid residues). Also, mutations other than alanine substitutions are suitable.
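
The strand logic for the D10A, H840A, and D10A/H840A variants described above can be summarized as a small lookup. This Python sketch assumes SpCas9 numbering (SEQ ID NO:8), maps D10 to the RuvC-like domain (which, consistent with the D10A behavior described, cleaves the non-complementary strand) and H840 to the HNH domain (which cleaves the complementary strand), and deliberately leaves the other listed residues unmapped; the table and function names are hypothetical.

```python
# Hypothetical lookup tables; only the two residues discussed in the text are mapped.
DOMAIN_OF_RESIDUE = {"D10": "RuvC", "H840": "HNH"}
STRAND_CUT_BY = {"RuvC": "non-complementary", "HNH": "complementary"}


def predicted_break(mutations: set) -> str:
    """Classify the expected cut from substitutions written like 'D10A' or 'H840A'."""
    # Strip the substituted residue (last letter) so non-alanine changes also count.
    dead = {DOMAIN_OF_RESIDUE[m[:-1]] for m in mutations if m[:-1] in DOMAIN_OF_RESIDUE}
    active = set(STRAND_CUT_BY) - dead
    if len(active) == 2:
        return "DSB"
    if not active:
        return "no substantial nuclease activity (dCas9)"
    (domain,) = active
    return f"SSB ({STRAND_CUT_BY[domain]} strand)"
```

For instance, `predicted_break({"D10A"})` reports a single strand break on the complementary strand, and `predicted_break({"D10A", "H840A"})` reports a dCas9-like variant.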

In some embodiments, the site-directed modifying polypeptide comprises a heterologous sequence (e.g., a fusion). In some embodiments, a heterologous sequence can provide for subcellular localization of the site-directed modifying polypeptide (e.g., a nuclear localization signal (NLS) for targeting to the nucleus; a mitochondrial localization signal for targeting to the mitochondria; a chloroplast localization signal for targeting to a chloroplast; an ER retention signal; and the like). In some embodiments, a heterologous sequence can provide a tag for ease of tracking or purification (e.g., a fluorescent protein, e.g., green fluorescent protein (GFP), YFP, RFP, CFP, mCherry, tdTomato, and the like; a his tag, e.g., a 6×His tag; a hemagglutinin (HA) tag; a FLAG tag; a Myc tag; and the like). In some embodiments, the heterologous sequence can provide for increased or decreased stability.

In some embodiments, a subject site-directed modifying polypeptide can be codon-optimized. This type of optimization is known in the art and entails the mutation of foreign-derived DNA to mimic the codon preferences of the intended host organism or cell while encoding the same protein. Thus, the codons are changed, but the encoded protein remains unchanged. For example, if the intended target cell were a human cell, a human codon-optimized Cas9 (or variant, e.g., enzymatically inactive variant) would be a suitable site-directed modifying polypeptide (see SEQ ID NO:256 for an example). Any suitable site-directed modifying polypeptide (e.g., any Cas9 such as any of the sequences set forth in SEQ ID NOs:1-256 and 795-1346) can be codon optimized. As another non-limiting example, if the intended host cell were a mouse cell, then a mouse codon-optimized Cas9 (or variant, e.g., enzymatically inactive variant) would be a suitable site-directed modifying polypeptide. While codon optimization is not required, it is acceptable and may be preferable in certain cases.
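
A minimal Python sketch of the principle that the codons change while the encoded protein does not: it translates with the standard genetic code, swaps each codon for a preferred synonymous codon, and refuses any swap that would alter the amino acid. The preference table passed in is an illustrative assumption, not real human or mouse codon-usage data, and practical optimization tools weigh many more factors.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence with the standard genetic code."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def codon_optimize(dna: str, preferred: dict) -> str:
    """Replace each codon with the preferred codon for its amino acid, if one is given."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("length must be a multiple of 3")
    out = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        aa = CODON_TABLE[codon]
        choice = preferred.get(aa, codon)
        # Guard: a substitution must be synonymous, so the protein is unchanged.
        if CODON_TABLE[choice] != aa:
            raise ValueError(f"{choice} does not encode {aa}")
        out.append(choice)
    return "".join(out)
```

With the hypothetical preferences `{"L": "CTG", "K": "AAG"}`, the sequence `TTAAAAAGG` becomes `CTGAAGAGG`; both encode the same peptide (Leu-Lys-Arg).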

In some embodiments, a subject DNA-targeting RNA and a subject site-directed modifying polypeptide are used as an inducible system for shutting off gene expression in bacterial cells. In some cases, nucleic acids encoding an appropriate DNA-targeting RNA and/or an appropriate site-directed polypeptide are incorporated into the chromosome of a target cell and are under control of an inducible promoter. When the DNA-targeting RNA and/or the site-directed polypeptide are induced, the target DNA is cleaved (or otherwise modified) at the location of interest (e.g., a target gene on a separate plasmid), when both the DNA-targeting RNA and the site-directed modifying polypeptide are present and form a complex. As such, in some cases, bacterial expression strains are engineered to include nucleic acid sequences encoding an appropriate site-directed modifying polypeptide in the bacterial genome and/or an appropriate DNA-targeting RNA on a plasmid (e.g., under control of an inducible promoter), allowing experiments in which the expression of any targeted gene (expressed from a separate plasmid introduced into the strain) could be controlled by inducing expression of the DNA-targeting RNA and the site-directed polypeptide.

In some cases, the site-directed modifying polypeptide has enzymatic activity that modifies target DNA in ways other than introducing double strand breaks. Enzymatic activity of interest that may be used to modify target DNA (e.g., by fusing a heterologous polypeptide with enzymatic activity to a site-directed modifying polypeptide, thereby generating a chimeric site-directed modifying polypeptide) includes, but is not limited to, methyltransferase activity, demethylase activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer forming activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity, or glycosylase activity. Methylation and demethylation are recognized in the art as an important mode of epigenetic gene regulation, while DNA damage and repair activity is essential for cell survival and for proper genome maintenance in response to environmental stresses.

As such, the methods herein find use in the epigenetic modification of target DNA and may be employed to control epigenetic modification of target DNA at any location in a target DNA by genetically engineering the desired complementary nucleic acid sequence into the DNA-targeting segment of a DNA-targeting RNA. The methods herein also find use in the intentional and controlled damage of DNA at any desired location within the target DNA. The methods herein also find use in the sequence-specific and controlled repair of DNA at any desired location within the target DNA. Methods to target DNA-modifying enzymatic activities to specific locations in target DNA find use in both research and clinical applications.
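
Retargeting therefore comes down to choosing the sequence of the DNA-targeting segment. As a hedged sketch (the helper names and the 20-nucleotide default are assumptions, not values given in this passage), the targeting segment can be read off the non-complementary strand immediately 5′ of a 5′-XGG-3′ PAM, with U in place of T; it then base-pairs with the complementary strand.

```python
# Watson-Crick pairs between an RNA base (first) and a DNA base (second).
RNA_DNA_PAIRS = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}


def design_spacer(seq: str, pam_start: int, length: int = 20) -> str:
    """Return the RNA targeting segment for the site just 5' of a 5'-XGG-3' PAM.

    `seq` is the non-complementary strand (5'->3'); `pam_start` indexes X.
    The targeting segment has the same sequence as the non-complementary
    strand, with U in place of T, so it pairs with the complementary strand.
    """
    seq = seq.upper()
    if pam_start < length or seq[pam_start + 1:pam_start + 3] != "GG":
        raise ValueError("no XGG PAM with a full-length target at this position")
    return seq[pam_start - length:pam_start].replace("T", "U")


def pairs_with(rna: str, dna: str) -> bool:
    """True if `rna` (5'->3') pairs fully, antiparallel, with `dna` (5'->3')."""
    return len(rna) == len(dna) and all(
        (r, d) in RNA_DNA_PAIRS for r, d in zip(rna, reversed(dna))
    )
```

For a toy site `ACGTT` followed by the PAM `AGG`, `design_spacer("ACGTTAGG", 5, length=5)` gives `ACGUU`, which pairs with the complementary-strand sequence `AACGT`.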

In some cases, the site-directed modifying polypeptide has activity that modulates the transcription of target DNA (e.g., in the case of a chimeric site-directed modifying polypeptide, etc.). In some cases, a chimeric site-directed modifying polypeptide comprising a heterologous polypeptide that exhibits the ability to increase or decrease transcription (e.g., transcriptional activator or transcription repressor polypeptides) is used to increase or decrease the transcription of target DNA at a specific location in a target DNA, which is guided by the DNA-targeting segment of the DNA-targeting RNA. Examples of source polypeptides for providing a chimeric site-directed modifying polypeptide with transcription modulatory activity include, but are not limited to, light-inducible transcription regulators, small molecule/drug-responsive transcription regulators, transcription factors, transcription repressors, etc. In some cases, the subject method is used to control the expression of a targeted coding RNA (protein-encoding gene) and/or a targeted non-coding RNA (e.g., tRNA, rRNA, snoRNA, siRNA, miRNA, long ncRNA, etc.).

In some cases, the site-directed modifying polypeptide has enzymatic activity that modifies a polypeptide associated with DNA (e.g., a histone). In some embodiments, the enzymatic activity is methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity (i.e., ubiquitination activity), deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, demyristoylation activity, glycosylation activity (e.g., from O-GlcNAc transferase), or deglycosylation activity. The enzymatic activities listed herein catalyze covalent modifications to proteins. Such modifications are known in the art to alter the stability or activity of the target protein (e.g., phosphorylation due to kinase activity can stimulate or silence protein activity depending on the target protein). Of particular interest as protein targets are histones. Histone proteins are known in the art to bind DNA and form complexes known as nucleosomes. Histones can be modified (e.g., by methylation, acetylation, ubiquitination, phosphorylation) to elicit structural changes in the surrounding DNA, thus controlling the accessibility of potentially large portions of DNA to interacting factors such as transcription factors, polymerases and the like. A single histone can be modified in many different ways and in many different combinations (e.g., trimethylation of lysine 27 of histone H3 (H3K27) is associated with DNA regions of repressed transcription, while trimethylation of lysine 4 of histone H3 (H3K4) is associated with DNA regions of active transcription). Thus, a site-directed modifying polypeptide with histone-modifying activity finds use in the site-specific control of DNA structure and can be used to alter the histone modification pattern in a selected region of target DNA. Such methods find use in both research and clinical applications.

In some embodiments, multiple DNA-targeting RNAs are used simultaneously to modify different locations on the same target DNA or on different target DNAs. In some embodiments, two or more DNA-targeting RNAs target the same gene or transcript or locus. In some embodiments, two or more DNA-targeting RNAs target different unrelated loci. In some embodiments, two or more DNA-targeting RNAs target different, but related loci.

In some cases, the site-directed modifying polypeptide is provided directly as a protein. As one non-limiting example, fungi (e.g., yeast) can be transformed with exogenous protein and/or nucleic acid using spheroplast transformation (see Kawai et al., Bioeng Bugs. 2010 November-December; 1(6):395-403: "Transformation of Saccharomyces cerevisiae and other fungi: methods and possible underlying mechanism"; and Tanaka et al., Nature. 2004 Mar. 18; 428(6980):323-8: "Conformational variations in an infectious protein determine prion strain differences"; both of which are herein incorporated by reference in their entirety). Thus, a site-directed modifying polypeptide (e.g., Cas9) can be incorporated into a spheroplast (with or without nucleic acid encoding a DNA-targeting RNA and with or without a donor polynucleotide), and the spheroplast can be used to introduce the content into a yeast cell. A site-directed modifying polypeptide can be introduced into a cell (provided to the cell) by any convenient method; such methods are known to those of ordinary skill in the art. As another non-limiting example, a site-directed modifying polypeptide can be injected directly into a cell (e.g., with or without nucleic acid encoding a DNA-targeting RNA and with or without a donor polynucleotide), e.g., a cell of a zebrafish embryo, the pronucleus of a fertilized mouse oocyte, etc.

Target Cells of Interest

In some of the above applications, the subject methods may be employed to induce DNA cleavage, DNA modification, and/or transcriptional modulation in mitotic or post-mitotic cells in vivo and/or ex vivo and/or in vitro (e.g., to produce genetically modified cells that can be reintroduced into an individual). Because the DNA-targeting RNA provides specificity by hybridizing to target DNA, a mitotic and/or post-mitotic cell of interest in the disclosed methods may include a cell from any organism (e.g., a bacterial cell, an archaeal cell, a cell of a single-cell eukaryotic organism, a plant cell, an algal cell, e.g., Botryococcus braunii, Chlamydomonas reinhardtii, Nannochloropsis gaditana, Chlorella pyrenoidosa, Sargassum patens C. Agardh, and the like, a fungal cell (e.g., a yeast cell), an animal cell, a cell from an invertebrate animal (e.g., fruit fly, cnidarian, echinoderm, nematode, etc.), a cell from a vertebrate animal (e.g., fish, amphibian, reptile, bird, mammal), a cell from a mammal, a cell from a rodent, a cell from a human, etc.).

Any type of cell may be of interest (e.g., a stem cell, e.g., an embryonic stem (ES) cell, an induced pluripotent stem (iPS) cell, a germ cell; a somatic cell, e.g., a fibroblast, a hematopoietic cell, a neuron, a muscle cell, a bone cell, a hepatocyte, a pancreatic cell; an in vitro or in vivo embryonic cell of an embryo at any stage, e.g., a 1-cell, 2-cell, 4-cell, 8-cell, etc. stage zebrafish embryo; etc.). Cells may be from established cell lines or they may be primary cells, where "primary cells", "primary cell lines", and "primary cultures" are used interchangeably herein to refer to cells and cell cultures that have been derived from a subject and allowed to grow in vitro for a limited number of passages, i.e., splittings, of the culture. For example, primary cultures are cultures that may have been passaged 0 times, 1 time, 2 times, 4 times, 5 times, 10 times, or 15 times, but not enough times to go through the crisis stage. Typically, the primary cell lines of the present invention are maintained for fewer than 10 passages in vitro. Target cells are in many embodiments unicellular organisms, or are grown in culture.

If the cells are primary cells, they may be harvested from an individual by any convenient method. For example, leukocytes may be conveniently harvested by apheresis, leukocytapheresis, density gradient separation, etc., while cells from tissues such as skin, muscle, bone marrow, spleen, liver, pancreas, lung, intestine, stomach, etc. are most conveniently harvested by biopsy. An appropriate solution may be used for dispersion or suspension of the harvested cells. Such solution will generally be a balanced salt solution, e.g., normal saline, phosphate-buffered saline (PBS), Hank's balanced salt solution, etc., conveniently supplemented with fetal calf serum or other naturally occurring factors, in conjunction with an acceptable buffer at low concentration, generally from 5-25 mM. Convenient buffers include HEPES, phosphate buffers, lactate buffers, etc. The cells may be used immediately, or they may be stored frozen for long periods of time and then thawed and reused. In such cases, the cells will usually be frozen in 10% DMSO, 50% serum, 40% buffered medium, or some other such solution as is commonly used in the art to preserve cells at such freezing temperatures, and thawed in a manner as commonly known in the art for thawing frozen cultured cells.
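
The volumes behind the 10% DMSO / 50% serum / 40% buffered medium recipe and the 5-25 mM buffer range are simple proportions. The Python sketch below works them out; the function names and the example 1 M (1000 mM) buffer stock are hypothetical choices for illustration.

```python
def freezing_medium_volumes(total_ml: float, dmso_pct: float = 10.0,
                            serum_pct: float = 50.0, medium_pct: float = 40.0) -> dict:
    """Split a freezing-medium volume by percentage (default 10/50/40)."""
    if abs(dmso_pct + serum_pct + medium_pct - 100.0) > 1e-9:
        raise ValueError("percentages must sum to 100")
    return {
        "DMSO": total_ml * dmso_pct / 100.0,
        "serum": total_ml * serum_pct / 100.0,
        "buffered medium": total_ml * medium_pct / 100.0,
    }


def stock_volume_ml(final_mM: float, total_ml: float, stock_mM: float) -> float:
    """C1*V1 = C2*V2: volume of buffer stock giving `final_mM` in `total_ml`."""
    if not 5.0 <= final_mM <= 25.0:
        raise ValueError("text suggests a final buffer concentration of 5-25 mM")
    return final_mM * total_ml / stock_mM
```

For 10 mL of freezing medium this gives 1 mL DMSO, 5 mL serum, and 4 mL buffered medium; reaching 20 mM buffer in 50 mL from a hypothetical 1 M stock takes 1 mL of stock.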

Nucleic Acids Encoding a Subject DNA-Targeting RNA and/or a Subject Site-Directed Modifying Polypeptide

In some embodiments, a subject method involves contacting a target DNA or introducing into a cell (or a population of cells) one or more nucleic acids comprising nucleotide sequences encoding a DNA-targeting RNA and/or a site-directed modifying polypeptide and/or a donor polynucleotide. Suitable nucleic acids comprising nucleotide sequences encoding a DNA-targeting RNA and/or a site-directed modifying polypeptide include expression vectors, where an expression vector comprising a nucleotide sequence encoding a DNA-targeting RNA and/or a site-directed modifying polypeptide is a “recombinant expression vector.”

In some embodiments, the recombinant expression vector is a viral construct, e.g., a recombinant adeno-associated virus construct (see, e.g., U.S. Pat. No. 7,078,387), a recombinant adenoviral construct, a recombinant lentiviral construct, etc.

Suitable expression vectors include, but are not limited to, viral vectors (e.g., viral vectors based on vaccinia virus; poliovirus; adenovirus (see, e.g., Li et al., Invest Ophthalmol Vis Sci 35:2543-2549, 1994; Borras et al., Gene Ther 6:515-524, 1999; Li and Davidson, PNAS 92:7700-7704, 1995; Sakamoto et al., Hum Gene Ther 5:1088-1097, 1999; WO 94/12649, WO 93/03769; WO 93/19191; WO 94/28938; WO 95/11984 and WO 95/00655); adeno-associated virus (see, e.g., Ali et al., Hum Gene Ther 9:81-86, 1998; Flannery et al., PNAS 94:6916-6921, 1997; Bennett et al., Invest Ophthalmol Vis Sci 38:2857-2863, 1997; Jomary et al., Gene Ther 4:683-690, 1997; Rolling et al., Hum Gene Ther 10:641-648, 1999; Ali et al., Hum Mol Genet 5:591-594, 1996; Srivastava in WO 93/09239; Samulski et al., J. Vir. (1989) 63:3822-3828; Mendelson et al., Virol. (1988) 166:154-165; and Flotte et al., PNAS (1993) 90:10613-10617); SV40; herpes simplex virus; human immunodeficiency virus (see, e.g., Miyoshi et al., PNAS 94:10319-23, 1997; Takahashi et al., J Virol 73:7812-7816, 1999); a retroviral vector (e.g., Murine Leukemia Virus, spleen necrosis virus, and vectors derived from retroviruses such as Rous Sarcoma Virus, Harvey Sarcoma Virus, avian leukosis virus, a lentivirus, human immunodeficiency virus, myeloproliferative sarcoma virus, and mammary tumor virus); and the like.

Numerous suitable expression vectors are known to those of skill in the art, and many are commercially available. The following vectors are provided by way of example; for eukaryotic host cells: pXT1, pSG5 (Stratagene), pSVK3, pBPV, pMSG, and pSVLSV40 (Pharmacia). However, any other vector may be used so long as it is compatible with the host cell.

In some embodiments, a nucleotide sequence encoding a DNA-targeting RNA and/or a site-directed modifying polypeptide is operably linked to a control element, e.g., a transcriptional control element, such as a promoter. The transcriptional control element may be functional in either a eukaryotic cell, e.g., a mammalian cell, or a prokaryotic cell (e.g., bacterial or archaeal cell). In some embodiments, a nucleotide sequence encoding a DNA-targeting RNA and/or a site-directed modifying polypeptide is operably linked to multiple control elements that allow expression of the nucleotide sequence encoding a DNA-targeting RNA and/or a site-directed modifying polypeptide in both prokaryotic and eukaryotic cells.

Depending on the host/vector system utilized, any of a number of suitable transcription and translation control elements, including constitutive and inducible promoters, transcription enhancer elements, transcription terminators, etc. may be used in the expression vector (e.g., U6 promoter, H1 promoter, etc.; see above) (see e.g., Bitter et al. (1987) Methods in Enzymology, 153:516-544).

In some embodiments, a DNA-targeting RNA and/or a site-directed modifying polypeptide can be provided as RNA. In such cases, the DNA-targeting RNA and/or the RNA encoding the site-directed modifying polypeptide can be produced by direct chemical synthesis or may be transcribed in vitro from a DNA template encoding it. Methods of synthesizing RNA from a DNA template are well known in the art. In some cases, the DNA-targeting RNA and/or the RNA encoding the site-directed modifying polypeptide will be synthesized in vitro using an RNA polymerase enzyme (e.g., T7 polymerase, T3 polymerase, SP6 polymerase, etc.). Once synthesized, the RNA may directly contact a target DNA or may be introduced into a cell by any of the well-known techniques for introducing nucleic acids into cells (e.g., microinjection, electroporation, transfection, etc.).
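
For the T7 route, a common template layout places the T7 promoter (5′-TAATACGACTCACTATAG-3′, transcription starting at the final G) directly upstream of the sequence to be transcribed. The Python sketch below assembles the sense strand of such a template and the expected run-off transcript. Prepending an extra G when the desired RNA does not start with one is a common practice but an assumption here, since whether an extra 5′ G is tolerated depends on the RNA; the function name is hypothetical, and the sketch ignores the need for a double-stranded, linearized template.

```python
# T7 promoter sense strand up to (not including) the +1 G, which starts the transcript.
T7_PROMOTER = "TAATACGACTCACTATA"


def t7_template(rna: str, add_g: bool = True) -> tuple:
    """Return (sense-strand DNA template, expected transcript) for a desired RNA."""
    dna = rna.upper().replace("U", "T")
    if not dna.startswith("G"):
        if not add_g:
            raise ValueError("T7 transcripts normally begin with G at +1")
        # Assumption: an extra 5' G is acceptable for this RNA.
        dna = "G" + dna
    return T7_PROMOTER + dna, dna.replace("T", "U")
```

For example, the RNA `GGAUC` needs no extra G, while `ACGUU` is transcribed as `GACGUU`.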

Nucleotides encoding a DNA-targeting RNA (introduced either as DNA or RNA) and/or a site-directed modifying polypeptide (introduced as DNA or RNA) and/or a donor polynucleotide may be provided to the cells using well-developed transfection techniques; see, e.g., Angel and Yanik (2010) PLoS ONE 5(7): e11756, and the commercially available TransMessenger® reagents from Qiagen, Stemfect™ RNA Transfection Kit from Stemgent, and TransIT®-mRNA Transfection Kit from Mirus Bio LLC. See also Beumer et al. (2008) Efficient gene targeting in Drosophila by direct embryo injection with zinc-finger nucleases. PNAS 105(50):19821-19826. Alternatively, nucleic acids encoding a DNA-targeting RNA and/or a site-directed modifying polypeptide and/or a chimeric site-directed modifying polypeptide and/or a donor polynucleotide may be provided on DNA vectors. Many vectors, e.g., plasmids, cosmids, minicircles, phage, viruses, etc., useful for transferring nucleic acids into target cells are available. The vectors comprising the nucleic acid(s) may be maintained episomally, e.g., as plasmids, minicircle DNAs, viruses such as cytomegalovirus, adenovirus, etc., or they may be integrated into the target cell genome, through homologous recombination or random integration, e.g., retrovirus-derived vectors such as MMLV, HIV-1, ALV, etc.

Vectors may be provided directly to the subject cells. In other words, the cells are contacted with vectors comprising the nucleic acid encoding DNA-targeting RNA and/or a site-directed modifying polypeptide and/or a chimeric site-directed modifying polypeptide and/or a donor polynucleotide such that the vectors are taken up by the cells. Methods for contacting cells with nucleic acid vectors that are plasmids, including electroporation, calcium chloride transfection, microinjection, and lipofection are well known in the art. For viral vector delivery, the cells are contacted with viral particles comprising the nucleic acid encoding a DNA-targeting RNA and/or a site-directed modifying polypeptide and/or a chimeric site-directed modifying polypeptide and/or a donor polynucleotide. Retroviruses, for example, lentiviruses, are particularly suitable to the method of the invention. Commonly used retroviral vectors are “defective”, i.e. unable to produce viral proteins required for productive infection. Rather, replication of the vector requires growth in a packaging cell line. To generate viral particles comprising nucleic acids of interest, the retroviral nucleic acids comprising the nucleic acid are packaged into viral capsids by a packaging cell line. Different packaging cell lines provide a different envelope protein (ecotropic, amphotropic or xenotropic) to be incorporated into the capsid, this envelope protein determining the specificity of the viral particle for the cells (ecotropic for murine and rat; amphotropic for most mammalian cell types including human, dog and mouse; and xenotropic for most mammalian cell types except murine cells). The appropriate packaging cell line may be used to ensure that the cells are targeted by the packaged viral particles. 
Methods of introducing retroviral vectors comprising the nucleic acid of interest into packaging cell lines, and of collecting the viral particles generated by the packaging lines, are well known in the art. Nucleic acids can also be introduced by direct microinjection (e.g., injection of RNA into a zebrafish embryo).
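The envelope host ranges stated above amount to a small lookup table. The Python sketch below is illustrative only: the function name `candidate_envelopes` and the species labels are hypothetical, “murine” is read as mouse (the text lists rat separately), and “most mammalian cell types” is treated, as a simplifying assumption, as any mammalian species not explicitly excluded.

```python
# Hypothetical encoding of the envelope host ranges described in the text.
# Ecotropic: murine (mouse) and rat cells.
# Amphotropic: most mammalian cell types, including human, dog, and mouse.
# Xenotropic: most mammalian cell types except murine (mouse) cells.
ECOTROPIC_HOSTS = {"mouse", "rat"}
XENOTROPIC_EXCLUDED = {"mouse"}


def candidate_envelopes(species: str, is_mammal: bool = True) -> list[str]:
    """Return the envelope types whose stated host range covers `species`."""
    species = species.lower()
    envelopes = []
    if species in ECOTROPIC_HOSTS:
        envelopes.append("ecotropic")
    if is_mammal:
        # "Most mammalian cell types" is simplified here to all mammals (assumption).
        envelopes.append("amphotropic")
        if species not in XENOTROPIC_EXCLUDED:
            envelopes.append("xenotropic")
    return envelopes


print(candidate_envelopes("human"))
print(candidate_envelopes("mouse"))
```

Under these assumptions, human target cells would call for an amphotropic or xenotropic packaging line, while an ecotropic line restricts infection to mouse and rat cells.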

Vectors used for providing the nucleic acids encoding a DNA-targeting RNA and/or a site-directed modifying polypeptide and/or a chimeric site-directed modifying polypeptide and/or a donor polynucleotide to the subject cells will typically comprise suitable promoters for driving the expression, that is, transcriptional activation, of the nucleic acid of interest. In other words, the nucleic acid of interest will be operably linked to a promoter. This may include ubiquitously acting promoters, for example, the CMV-β-actin promoter, or inducible promoters, such as promoters that are active in particular cell populations or that respond to the presence of drugs such as tetracycline. By transcriptional activation, it is intended that transcription will be increased above basal levels in the target cell by at least about 10-fold, by at least about 100-fold, or, more usually, by at least about 1000-fold. In addition, vectors used for providing a DNA-targeting RNA and/or a site-directed modifying polypeptide and/or a chimeric site-directed modifying polypeptide and/or a donor polynucleotide to the subject cells may include nucleic acid sequences that encode selectable markers in the target cells, so as to identify cells that have taken up the DNA-targeting RNA and/or a site-directed modifying polypeptide and/or a chimeric site-directed modifying polypeptide and/or a donor polynucleotide.

A subject DNA-targeting RNA and/or a site-directed modifying polypeptide and/or a chimeric site-directed modifying polypeptide may instead be used to contact DNA, or be introduced into cells, as RNA. Methods of introducing RNA into cells are known in the art and may include, for example, direct injection, transfection, or any other method used for the introduction of DNA.

A subject site-directed modifying polypeptide may instead be provided to cells as a polypeptide. Such a polypeptide may optionally be fused to a polypeptide domain that increases solubility of the product. The domain may be linked to the polypeptide through a defined protease cleavage site, e.g. a TEV sequence, which is cleaved by TEV protease. The linker may also include one or more flexible sequences, e.g. from 1 to 10 glycine residues. In some embodiments, the cleavage of the fusion protein is performed in a buffer that maintains solubility of the product, e.g. in the presence of from 0.5 to 2 M urea, in the presence of polypeptides and/or polynucleotides that increase solubility, and the like. Domains of interest include endosomolytic domains, e.g. influenza HA domain; and other polypeptides that aid in production, e.g. IF2 domain, GST domain, GRPE domain, and the like. The polypeptide may be formulated for improved stability. For example, the peptides may be PEGylated, where the polyethyleneoxy group provides for enhanced lifetime in the blood stream.
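The fusion layout described above (solubility domain, glycine linker, protease cleavage site, polypeptide) can be sketched as string operations on one-letter amino acid sequences. This is a hypothetical illustration: the helper names and example sequences are placeholders, the linker is assumed to sit between the domain and the cleavage site, and the TEV recognition sequence ENLYFQ followed by G (cleaved between Q and G) is the commonly used canonical site rather than a sequence specified in the text.

```python
# Canonical TEV protease site: ENLYFQ|G, cleaved between Q and G.
TEV_RECOGNITION = "ENLYFQ"
TEV_P1_PRIME = "G"  # residue left on the released product after cleavage


def build_fusion(solubility_domain: str, polypeptide: str, n_gly: int = 3) -> str:
    """Hypothetical layout: domain + Gly linker + TEV site + polypeptide."""
    if not 1 <= n_gly <= 10:
        raise ValueError("linker outside the 1-10 glycine range given in the text")
    return solubility_domain + "G" * n_gly + TEV_RECOGNITION + TEV_P1_PRIME + polypeptide


def tev_cleave(fusion: str) -> tuple[str, str]:
    """Cut after the first ENLYFQ; return (domain-side fragment, released product)."""
    idx = fusion.find(TEV_RECOGNITION)
    if idx == -1:
        raise ValueError("no TEV recognition sequence found")
    cut = idx + len(TEV_RECOGNITION)
    return fusion[:cut], fusion[cut:]


# Placeholder sequences, not real proteins.
fusion = build_fusion("MSKLW", "MDTKQ", n_gly=2)
domain_part, product = tev_cleave(fusion)
print(product)  # GMDTKQ: one glycine from the TEV site stays on the product
```

Placing the glycine linker on the domain side of the site means only the single P1′ glycine remains on the released polypeptide; placing it on the other side would leave the whole linker attached.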

Additionally or alternatively, the subject site-directed modifying polypeptide may be fused to a polypeptide permeant domain to promote uptake by the cell. A number of permeant domains are known in the art and may be used in the non-integrating polypeptides of the present invention, including peptides, peptidomimetics, and non-peptide carriers. For example, a permeant peptide may be derived from the third alpha helix of the Drosophila melanogaster transcription factor Antennapedia, referred to as penetratin, which comprises the amino acid sequence RQIKIWFQNRRMKWKK (SEQ ID NO: 268). As another example, the permeant peptide may comprise the HIV-1 tat basic region amino acid sequence, which may include, for example, amino acids 49-57 of the naturally occurring tat protein. Other permeant domains include polyarginine motifs, for example, the region of amino acids 34-56 of the HIV-1 rev protein, nona-arginine, octa-arginine, and the like. (See, for example, Futaki et al. (2003) Curr Protein Pept Sci 4(2):87-9 and 446; Wender et al. (2000) Proc. Natl. Acad. Sci. USA 97(24):13003-8; and published U.S. Patent applications 20030220334, 20030083256, 20030032593, and 20030022831, herein specifically incorporated by reference for the teachings of translocation peptides and peptoids.) The nona-arginine (R9) sequence is one of the more efficient protein transduction domains (PTDs) that have been characterized (Wender et al. 2000; Uemura et al. 2002). The site at which the fusion is made may be selected in order to optimize the biological activity, secretion, or binding characteristics of the polypeptide; the optimal site will be determined by routine experimentation.
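Because the optimal fusion site is found by routine experimentation, one practical step is to enumerate candidate constructs for screening. The sketch below is a hypothetical helper, not a method from the text: it uses only the permeant sequences named above (penetratin, SEQ ID NO: 268, and the arginine motifs), ignores start-methionine handling, and uses the Arg plus Lys count only as a rough indicator of cationic character.

```python
# Permeant peptides named in the text; penetratin is SEQ ID NO: 268.
PERMEANT_PEPTIDES = {
    "penetratin": "RQIKIWFQNRRMKWKK",
    "nona-arginine": "R" * 9,
    "octa-arginine": "R" * 8,
}


def permeant_fusions(polypeptide: str) -> dict[tuple[str, str], str]:
    """Return every N- and C-terminal fusion of each permeant peptide."""
    constructs = {}
    for name, ptd in PERMEANT_PEPTIDES.items():
        constructs[(name, "N")] = ptd + polypeptide
        constructs[(name, "C")] = polypeptide + ptd
    return constructs


def basic_residue_count(seq: str) -> int:
    """Number of Arg (R) and Lys (K) residues in a one-letter sequence."""
    return seq.count("R") + seq.count("K")


candidates = permeant_fusions("MDTKQ")  # placeholder polypeptide
print(len(candidates))  # 6 constructs: 3 peptides x 2 termini
```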

A subject site-directed modifying polypeptide may be produced in vitro or by eukaryotic cells or by prokaryotic cells, and it may be further processed by unfolding, e.g. heat denaturation, DTT reduction, etc. and may be further refolded, using methods known in the art.

Modifications of interest that do not alter primary sequence include chemical derivatization of polypeptides, e.g., acylation, acetylation, carboxylation, amidation, etc. Also included are modifications of glycosylation, e.g. those made by modifying the glycosylation patterns of a polypeptide during its synthesis and processing or in further processing steps; e.g. by exposing the polypeptide to enzymes which affect glycosylation, such as mammalian glycosylating or deglycosylating enzymes. Also embraced are sequences that have phosphorylated amino acid residues, e.g. phosphotyrosine, phosphoserine, or phosphothreonine.

Also included in the subject invention are DNA-targeting RNAs and site-directed modifying polypeptides that have been modified using ordinary molecular biological techniques and synthetic chemistry so as to improve their resistance to proteolytic degradation, to change the target sequence specificity, to optimize solubility properties, to alter protein activity (e.g., transcription modulatory activity, enzymatic activity, etc.), or to render them more suitable as a therapeutic agent. Analogs of such polypeptides include those containing residues other than naturally occurring L-amino acids, e.g., D-amino acids or non-naturally occurring synthetic amino acids. D-amino acids may be substituted for some or all of the amino acid residues.

The site-directed modifying polypeptides may be prepared by in vitro synthesis, using conventional methods as known in the art. Various commercial synthetic apparatuses are available, for example, automated synthesizers by Applied Biosystems, Inc., Beckman, etc. By using synthesizers, naturally occurring amino acids may be substituted with unnatural amino acids. The particular sequence and the manner of preparation will be determined by convenience, economics, purity required, and the like.

If desired, various groups may be introduced into the peptide during synthesis or during expression, which allow for linking to other molecules or to a surface. Thus cysteines can be used to make thioethers, histidines for linking to a metal ion complex, carboxyl groups for forming amides or esters, amino groups for forming amides, and the like.

The site-directed modifying polypeptides may also be isolated and purified in accordance with conventional methods of recombinant synthesis. A lysate of the expression host may be prepared and purified using HPLC, exclusion chromatography, gel electrophoresis, affinity chromatography, or another purification technique. For the most part, the compositions used will comprise at least 20% by weight of the desired product, more usually at least about 75% by weight, preferably at least about 95% by weight, and, for therapeutic purposes, usually at least about 99.5% by weight, in relation to contaminants related to the method of preparation of the product and its purification. Usually, the percentages will be based upon total protein.
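The purity tiers above reduce to a percentage of total protein compared against fixed thresholds. A minimal sketch, assuming the mass of desired product and of total protein have already been measured; the helper names and the example masses are hypothetical.

```python
# Minimum purities (% by weight of total protein) named in the text, highest first.
PURITY_TIERS = [
    (99.5, "therapeutic use (usually at least about 99.5%)"),
    (95.0, "preferred (at least about 95%)"),
    (75.0, "more usual (at least about 75%)"),
    (20.0, "minimum (at least 20%)"),
]


def purity_percent(product_mg: float, total_protein_mg: float) -> float:
    """Desired product as a percentage by weight of total protein."""
    if total_protein_mg <= 0:
        raise ValueError("total protein must be positive")
    return 100.0 * product_mg / total_protein_mg


def purity_tier(percent: float) -> str:
    """Highest tier from the text that a preparation meets."""
    for threshold, label in PURITY_TIERS:
        if percent >= threshold:
            return label
    return "below the 20% minimum"


# Hypothetical preparation: 9.6 mg of product in 10 mg of total protein.
print(purity_tier(purity_percent(9.6, 10.0)))  # preferred (at least about 95%)
```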

To induce DNA cleavage and recombination, or any desired modification to a target DNA or to a polypeptide associated with the target DNA, the DNA-targeting RNA and/or the site-directed modifying polypeptide and/or the donor polynucleotide, whether introduced as nucleic acids or polypeptides, are provided to the cells for about 30 minutes to about 24 hours, e.g., 1 hour, 1.5 hours, 2 hours, 2.5 hours, 3 hours, 3.5 hours, 4 hours, 5 hours, 6 hours, 7 hours, 8 hours, 12 hours, 16 hours, 18 hours, 20 hours, or any other period from about 30 minutes to about 24 hours. This may be repeated with a frequency of about every day to about every 4 days, e.g., every 1.5 days, every 2 days, every 3 days, or any other frequency from about every day to about every 4 days. The agent(s) may be provided to the subject cells one or more times, e.g., once, twice, three times, or more than three times, and the cells are allowed to incubate with the agent(s) for some amount of time following each contacting event, e.g., 16-24 hours, after which the medium is replaced with fresh medium and the cells are cultured further.
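A repeated-contact protocol within the stated ranges can be laid out as a list of time windows. The sketch below is hypothetical: `contact_schedule` is an illustrative name, and it assumes the medium is changed exactly at the end of each exposure window.

```python
from datetime import timedelta


def contact_schedule(n_contacts: int, exposure_hours: float,
                     interval_days: float) -> list[tuple[timedelta, timedelta]]:
    """(start, medium change) offsets from the first contact for each exposure."""
    if not 0.5 <= exposure_hours <= 24:
        raise ValueError("exposure outside the ~30 min to ~24 h range")
    if not 1 <= interval_days <= 4:
        raise ValueError("interval outside the ~1 to ~4 day range")
    windows = []
    for i in range(n_contacts):
        start = timedelta(days=interval_days * i)
        # Medium is replaced with fresh medium at the end of each exposure window.
        windows.append((start, start + timedelta(hours=exposure_hours)))
    return windows


# Hypothetical plan: three 16-hour exposures, one every 2 days.
for start, change in contact_schedule(3, 16, 2):
    print(f"add agents at {start}, replace medium at {change}")
```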

In cases in which two or more different targeting complexes are provided to the cell (e.g., two different DNA-targeting RNAs that are complementary to different sequences within the same or different target DNA), the complexes may be provided simultaneously (e.g., as two polypeptides and/or nucleic acids delivered together). Alternatively, they may be provided consecutively, e.g., the first targeting complex being provided first, followed by the second targeting complex, etc., or vice versa.

Typically, an effective amount of the DNA-targeting RNA and/or site-directed modifying polypeptide and/or donor polynucleotide is provided to the target DNA or cells to induce cleavage. An effective amount of the DNA-targeting RNA and/or site-directed modifying polypeptide and/or donor polynucleotide is the amount to induce a 2-fold increase or more in the amount of target modification observed between two homologous sequences relative to a negative control, e.g. a cell contacted with an empty vector or irrelevant polypeptide. That is to say, an effective amount or dose of the DNA-targeting RNA and/or site-directed modifying polypeptide and/or donor polynucleotide will induce a 2-fold increase, a 3-fold increase, a 4-fold increase or more in the amount of target modification observed at a target DNA region, in some instances a 5-fold increase, a 6-fold increase or more, sometimes a 7-fold or 8-fold increase or more in the amount of recombination observed, e.g. an increase of 10-fold, 50-fold, or 100-fold or more, in some instances, an increase of 200-fold, 500-fold, 700-fold, or 1000-fold or more, e.g. a 5000-fold, or 10,000-fold increase in the amount of recombination observed. The amount of target modification may be measured by any convenient method. For example, a silent reporter construct comprising complementary sequence to the targeting segment (targeting sequence) of the DNA-targeting RNA flanked by repeat sequences that, when recombined, will reconstitute a nucleic acid encoding an active reporter may be cotransfected into the cells, and the amount of reporter protein assessed after contact with the DNA-targeting RNA and/or site-directed modifying polypeptide and/or donor polynucleotide, e.g. 2 hours, 4 hours, 8 hours, 12 hours, 24 hours, 36 hours, 48 hours, 72 hours or more after contact with the DNA-targeting RNA and/or site-directed modifying polypeptide and/or donor polynucleotide. 
In another, more sensitive assay, the extent of recombination at a genomic DNA region of interest comprising target DNA sequences may be assessed by PCR or Southern hybridization of the region after contact with a DNA-targeting RNA and/or site-directed modifying polypeptide and/or donor polynucleotide, e.g., 2 hours, 4 hours, 8 hours, 12 hours, 24 hours, 36 hours, 48 hours, 72 hours or more after contact.
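The effective-amount criterion is a ratio against a negative control. A minimal sketch, assuming each condition yields a single non-negative readout (for example, the percentage of reporter-positive cells in the silent reporter assay); the function names and values are hypothetical.

```python
def fold_increase(treated: float, control: float) -> float:
    """Target modification in treated cells relative to the negative control."""
    if control <= 0:
        raise ValueError("control signal must be positive to form a ratio")
    return treated / control


def is_effective(treated: float, control: float, min_fold: float = 2.0) -> bool:
    """Apply the 2-fold-or-more criterion for an effective amount."""
    return fold_increase(treated, control) >= min_fold


# Hypothetical readout: % reporter-positive cells, treated vs. empty-vector control.
print(fold_increase(3.0, 0.5))  # 6.0
print(is_effective(3.0, 0.5))   # True
```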

Contacting the cells with a DNA-targeting RNA and/or site-directed modifying polypeptide and/or donor polynucleotide may occur in any culture media and under any culture conditions that promote the survival of the cells. For example, cells may be suspended in any appropriate nutrient medium that is convenient, such as Iscove's modified DMEM or RPMI 1640, supplemented with fetal calf serum or heat inactivated goat serum (about 5-10%), L-glutamine, a thiol, particularly 2-mercaptoethanol, and antibiotics, e.g. penicillin and streptomycin. The culture may contain growth factors to which the cells are responsive. Growth factors, as defined herein, are molecules capable of promoting survival, growth and/or differentiation of cells, either in culture or in the intact tissue, through specific effects on a transmembrane receptor. Growth factors include polypeptides and non-polypeptide factors. Conditions that promote the survival of cells are typically permissive of nonhomologous end joining and homology-directed repair.

In applications in which it is desirable to insert a polynucleotide sequence into a target DNA sequence, a polynucleotide comprising a donor sequence to be inserted is also provided to the cell. By a “donor sequence” or “donor polynucleotide” it is meant a nucleic acid sequence to be inserted at the cleavage site induced by a site-directed modifying polypeptide. The donor polynucleotide will contain sufficient homology to a genomic sequence at the cleavage site, e.g. 70%, 80%, 85%, 90%, 95%, or 100% homology with the nucleotide sequences flanking the cleavage site, e.g. within about 50 bases or less of the cleavage site, e.g. within about 30 bases, within about 15 bases, within about 10 bases, within about 5 bases, or immediately flanking the cleavage site, to support homology-directed repair between it and the genomic sequence to which it bears homology. Approximately 25, 50, 100, or 200 nucleotides, or more than 200 nucleotides, of sequence homology between a donor and a genomic sequence (or any integral value between 10 and 200 nucleotides, or more) will support homology-directed repair. Donor sequences can be of any length, e.g. 10 nucleotides or more, 50 nucleotides or more, 100 nucleotides or more, 250 nucleotides or more, 500 nucleotides or more, 1000 nucleotides or more, 5000 nucleotides or more, etc.
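The homology requirement can be checked by comparing each donor arm with the genomic sequence beside the cleavage site. A minimal sketch, assuming an ungapped comparison against the sequence immediately flanking the cut (one of the options listed) and the lowest example threshold of 70%; a real design tool would use a proper alignment. Function names and sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def arms_match_flanks(genome: str, cut: int, left_arm: str, right_arm: str,
                      min_identity: float = 70.0) -> bool:
    """Compare donor arms with the genomic sequence immediately flanking `cut`."""
    if cut < len(left_arm) or cut + len(right_arm) > len(genome):
        raise ValueError("arms extend beyond the genomic sequence")
    left_flank = genome[cut - len(left_arm):cut]
    right_flank = genome[cut:cut + len(right_arm)]
    return (percent_identity(left_arm, left_flank) >= min_identity
            and percent_identity(right_arm, right_flank) >= min_identity)


# Toy locus; the cut falls before index 8 (0-based).
genome = "AAAACCCCGGGGTTTT"
print(percent_identity("GGGGTTTA", genome[8:16]))  # 87.5
print(arms_match_flanks(genome, 8, "AAAACCCC", "GGGGTTTA"))  # True
```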

The donor sequence is typically not identical to the genomic sequence that it replaces. Rather, the donor sequence may contain one or more single base changes, insertions, deletions, inversions, or rearrangements with respect to the genomic sequence, so long as sufficient homology is present to support homology-directed repair. In some embodiments, the donor sequence comprises a non-homologous sequence flanked by two regions of homology, such that homology-directed repair between the target DNA region and the two flanking sequences results in insertion of the non-homologous sequence at the target region. Donor sequences may also comprise a vector backbone containing sequences that are not homologous to the DNA region of interest and that are not intended for insertion into the DNA region of interest. Generally, the homologous region(s) of a donor sequence will have at least 50% sequence identity to a genomic sequence with which recombination is desired. In certain embodiments, 60%, 70%, 80%, 90%, 95%, 98%, 99%, or 99.9% sequence identity is present. Any value between 1% and 100% sequence identity can be present, depending upon the length of the donor polynucleotide.
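For the embodiment in which a non-homologous sequence is flanked by two regions of homology, the donor layout and the intended edited locus can be written out directly. This sketch assumes the simplest case of arms that exactly match the sequences flanking the cut, and ignores any vector backbone; the function names, toy locus, and insert are hypothetical.

```python
def assemble_donor(left_arm: str, insert: str, right_arm: str) -> str:
    """Donor = left homology arm + non-homologous insert + right homology arm."""
    return left_arm + insert + right_arm


def expected_edited_locus(genome: str, cut: int, left_arm: str,
                          insert: str, right_arm: str) -> str:
    """Locus expected after HDR when both arms exactly match the cut-site flanks."""
    if cut < len(left_arm) or genome[cut - len(left_arm):cut] != left_arm:
        raise ValueError("left arm does not match the sequence 5' of the cut")
    if genome[cut:cut + len(right_arm)] != right_arm:
        raise ValueError("right arm does not match the sequence 3' of the cut")
    return genome[:cut] + insert + genome[cut:]


genome = "AAAACCCCGGGGTTTT"  # toy locus, cut at index 8
donor = assemble_donor("CCCC", "TAG", "GGGG")  # "TAG" is a placeholder insert
print(donor)  # CCCCTAGGGGG
print(expected_edited_locus(genome, 8, "CCCC", "TAG", "GGGG"))  # AAAACCCCTAGGGGGTTTT
```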

The donor sequence may comprise certain sequence differences as compared to the genomic sequence, e.g., restriction sites, nucleotide polymorphisms, selectable markers (e.g., drug resistance genes, fluorescent proteins, enzymes, etc.), etc., which may be used to assess successful insertion of the donor sequence at the cleavage site or, in some cases, may be used for other purposes (e.g., to signify expression at the targeted genomic locus). In some cases, if located in a coding region, such nucleotide sequence differences will not change the amino acid sequence, or will make silent amino acid changes (i.e., changes which do not affect the structure or function of the protein). Alternatively, these sequence differences may include flanking recombination sequences, such as FLP recognition target (FRT) sites, loxP sites, or the like, that can be activated at a later time for removal of the marker sequence.

The donor sequence may be provided to the cell as single-stranded DNA, single-stranded RNA, double-stranded DNA, or double-stranded RNA. It may be introduced into a cell in linear or circular form. If introduced in linear form, the ends of the donor sequence may be protected (e.g., from exonucleolytic degradation) by methods known to those of skill in the art. For example, one or more dideoxynucleotide residues are added to the 3′ terminus of a linear molecule and/or self-complementary oligonucleotides are ligated to one or both ends. See, for example, Chang et al. (1987) Proc. Natl. Acad. Sci. USA 84:4959-4963; Nehls et al. (1996) Science 272:886-889. Additional methods for protecting exogenous polynucleotides from degradation include, but are not limited to, addition of terminal amino group(s) and the use of modified internucleotide linkages such as, for example, phosphorothioates, phosphoramidates, and O-methyl ribose or deoxyribose residues. As an alternative to protecting the termini of a linear donor sequence, additional lengths of sequence that can be degraded without impacting recombination may be included outside of the regions of homology. A donor sequence can be introduced into a cell as part of a vector molecule having additional sequences such as, for example, replication origins, promoters, and genes encoding antibiotic resistance. Moreover, donor sequences can be introduced as naked nucleic acid, as nucleic acid complexed with an agent such as a liposome or poloxamer, or can be delivered by viruses (e.g., adenovirus, AAV), as described above for nucleic acids encoding a DNA-targeting RNA and/or site-directed modifying polypeptide and/or donor polynucleotide.

Following the methods described above, a DNA region of interest may be cleaved and modified, i.e., “genetically modified”, ex vivo. In some embodiments, as when a selectable marker has been inserted into the DNA region of interest, the population of cells may be enriched for those comprising the genetic modification by separating the genetically modified cells from the remaining population. Prior to enriching, the “genetically modified” cells may make up only about 1% or more (e.g., 2% or more, 3% or more, 4% or more, 5% or more, 6% or more, 7% or more, 8% or more, 9% or more, 10% or more, 15% or more, or 20% or more) of the cellular population. Separation of “genetically modified” cells may be achieved by any convenient separation technique appropriate for the selectable marker used. For example, if a fluorescent marker has been inserted, cells may be separated by fluorescence activated cell sorting, whereas if a cell surface marker has been inserted, cells may be separated from the heterogeneous population by affinity separation techniques, e.g., magnetic separation, affinity chromatography, “panning” with an affinity reagent attached to a solid matrix, or other convenient technique. Techniques providing accurate separation include fluorescence activated cell sorters, which can have varying degrees of sophistication, such as multiple color channels, low angle and obtuse light scattering detecting channels, impedance channels, etc. Dead cells may be selected against by employing dyes associated with dead cells (e.g., propidium iodide). Any technique may be employed that is not unduly detrimental to the viability of the genetically modified cells. Cell compositions that are highly enriched for cells comprising modified DNA are achieved in this manner.
By “highly enriched”, it is meant that the genetically modified cells will be 70% or more, 75% or more, 80% or more, 85% or more, 90% or more of the cell composition, for example, about 95% or more, or 98% or more of the cell composition. In other words, the composition may be a substantially pure composition of genetically modified cells.
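The enrichment thresholds above are simple fractions of the cell population. A minimal sketch, assuming modified and total cell counts are available (for example, from sorting on an inserted fluorescent marker); the function names and counts are hypothetical.

```python
def modified_fraction(modified_cells: int, total_cells: int) -> float:
    """Fraction of the population carrying the genetic modification."""
    if total_cells <= 0:
        raise ValueError("total cell count must be positive")
    return modified_cells / total_cells


def is_highly_enriched(modified_cells: int, total_cells: int,
                       threshold: float = 0.70) -> bool:
    """'Highly enriched': modified cells make up 70% or more of the composition."""
    return modified_fraction(modified_cells, total_cells) >= threshold


# Hypothetical counts before and after sorting.
before = modified_fraction(2_000, 100_000)  # 2% of the starting population
after = modified_fraction(9_300, 10_000)    # 93% of the sorted population
print(f"enrichment: {after / before:.1f}-fold")
print(is_highly_enriched(9_300, 10_000))    # True
```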

Genetically modified cells produced by the methods described herein may be used immediately. Alternatively, the cells may be frozen at liquid nitrogen temperatures, stored for long periods of time, and then thawed for reuse. In such cases, the cells will usually be frozen in 10% dimethylsulfoxide (DMSO), 50% serum, and 40% buffered medium, or some other such solution as is commonly used in the art to preserve cells at such freezing temperatures, and thawed in a manner commonly known in the art for thawing frozen cultured cells.
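Preparing the freezing solution is proportion arithmetic. This sketch assumes the stated percentages are volume/volume (the text does not specify); the helper name and the 10 mL batch size are hypothetical.

```python
# Composition named in the text; interpreted here as volume/volume (an assumption).
FREEZING_MEDIUM = {"DMSO": 0.10, "serum": 0.50, "buffered medium": 0.40}


def freezing_medium_volumes(total_ml: float) -> dict[str, float]:
    """Volume (mL) of each component needed for `total_ml` of freezing medium."""
    if total_ml <= 0:
        raise ValueError("total volume must be positive")
    return {name: round(frac * total_ml, 3) for name, frac in FREEZING_MEDIUM.items()}


print(freezing_medium_volumes(10.0))
# {'DMSO': 1.0, 'serum': 5.0, 'buffered medium': 4.0}
```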

The genetically modified cells may be cultured in vitro under various culture conditions. The cells may be expanded in culture, i.e., grown under conditions that promote their proliferation. Culture medium may be liquid or semi-solid, e.g., containing agar, methylcellulose, etc. The cell population may be suspended in an appropriate nutrient medium, such as Iscove's modified DMEM or RPMI 1640, normally supplemented with fetal calf serum (about 5-10%), L-glutamine, a thiol, particularly 2-mercaptoethanol, and antibiotics, e.g., penicillin and streptomycin. The culture may contain growth factors to which the cells are responsive. Growth factors, as defined herein, are molecules capable of promoting survival, growth and/or differentiation of cells, either in culture or in the intact tissue, through specific effects on a transmembrane receptor. Growth factors include polypeptides and non-polypeptide factors.

Cells that have been genetically modified in this way may be transplanted to a subject for purposes such as gene therapy, e.g. to treat a disease or as an antiviral, antipathogenic, or anticancer therapeutic, for the production of genetically modified organisms in agriculture, or for biological research. The subject may be a neonate, a juvenile, or an adult. Of particular interest are mammalian subjects. Mammalian species that may be treated with the present methods include canines and felines; equines; bovines; ovines; etc. and primates, particularly humans. Animal models, particularly small mammals (e.g. mouse, rat, guinea pig, hamster, lagomorpha (e.g., rabbit), etc.) may be used for experimental investigations.

Cells may be provided to the subject alone or with a suitable substrate or matrix, e.g., to support their growth and/or organization in the tissue to which they are being transplanted. Usually, at least 1×10³ cells will be administered, for example 5×10³ cells, 1×10⁴ cells, 5×10⁴ cells, 1×10⁵ cells, 1×10⁶ cells or more. The cells may be introduced to the subject via any of the following routes: parenteral, subcutaneous, intravenous, intracranial, intraspinal, intraocular, or into spinal fluid. The cells may be introduced by injection, catheter, or the like. Examples of methods for local delivery, that is, delivery to the site of injury, include, e.g., through an Ommaya reservoir, e.g., for intrathecal delivery (see e.g. U.S. Pat. Nos. 5,222,982 and 5,385,582, incorporated herein by reference); by bolus injection, e.g., by a syringe, e.g., into a joint; by continuous infusion, e.g., by cannulation, e.g., with convection (see e.g. US Application No. 20070254842, incorporated here by reference); or by implanting a device upon which the cells have been reversibly affixed (see e.g. US Application Nos. 20080081064 and 20090196903, incorporated herein by reference). Cells may also be introduced into an embryo (e.g., a blastocyst) for the purpose of generating a transgenic animal (e.g., a transgenic mouse).
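Delivering one of the example doses means drawing a volume of cell suspension that holds the required number of cells. A short sketch; the helper name and the stock concentration of 1×10⁷ cells/mL are hypothetical choices, not values from the text.

```python
# Example doses listed in the text (number of cells).
EXAMPLE_DOSES = [1e3, 5e3, 1e4, 5e4, 1e5, 1e6]


def suspension_volume_ul(cells_required: float, cells_per_ml: float) -> float:
    """Volume (µL) of a suspension at `cells_per_ml` that holds `cells_required` cells."""
    if cells_per_ml <= 0:
        raise ValueError("concentration must be positive")
    return cells_required / cells_per_ml * 1000.0


# Hypothetical stock of 1x10^7 cells/mL.
for dose in EXAMPLE_DOSES:
    print(f"{dose:.0e} cells -> {suspension_volume_ul(dose, 1e7):g} µL")
```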

The number of administrations of treatment to a subject may vary. Introducing the genetically modified cells into the subject may be a one-time event; but in certain situations, such treatment may elicit improvement for a limited period of time and require an on-going series of repeated treatments. In other situations, multiple administrations of the genetically modified cells may be required before an effect is observed. The exact protocols depend upon the disease or condition, the stage of the disease and parameters of the individual subject being treated.

In other aspects of the invention, the DNA-targeting RNA and/or site-directed modifying polypeptide and/or donor polynucleotide are employed to modify cellular DNA in vivo, again for purposes such as gene therapy, e.g. to treat a disease or as an antiviral, antipathogenic, or anticancer therapeutic, for the production of genetically modified organisms in agriculture, or for biological research. In these in vivo embodiments, a DNA-targeting RNA and/or site-directed modifying polypeptide and/or donor polynucleotide are administered directly to the individual. A DNA-targeting RNA and/or site-directed modifying polypeptide and/or donor polynucleotide may be administered by any of a number of well-known methods in the art for the administration of peptides, small molecules and nucleic acids to a subject. A DNA-targeting RNA and/or site-directed modifying polypeptide and/or donor polynucleotide can be incorporated into a variety of formulations. More particularly, a DNA-targeting RNA and/or site-directed modifying polypeptide and/or donor polynucleotide of the present invention can be formulated into pharmaceutical compositions by combination with appropriate pharmaceutically acceptable carriers or diluents.

Pharmaceutical preparations are compositions that include one or more of a DNA-targeting RNA, a site-directed modifying polypeptide, and a donor polynucleotide present in a pharmaceutically acceptable vehicle. “Pharmaceutically acceptable vehicles” may be vehicles approved by a regulatory agency of the Federal or a state government or listed in the U.S. Pharmacopeia or other generally recognized pharmacopeia for use in mammals, such as humans. The term “vehicle” refers to a diluent, adjuvant, excipient, or carrier with which a compound of the invention is formulated for administration to a mammal. Such pharmaceutical vehicles can be lipids, e.g., liposomes, e.g., liposome dendrimers; liquids, such as water and oils, including those of petroleum, animal, vegetable or synthetic origin, such as peanut oil, soybean oil, mineral oil, sesame oil and the like; saline; gum acacia, gelatin, starch paste, talc, keratin, colloidal silica, urea, and the like. In addition, auxiliary, stabilizing, thickening, lubricating and coloring agents may be used. Pharmaceutical compositions may be formulated into preparations in solid, semi-solid, liquid or gaseous forms, such as tablets, capsules, powders, granules, ointments, solutions, suppositories, injections, inhalants, gels, microspheres, and aerosols. As such, administration of the DNA-targeting RNA and/or site-directed modifying polypeptide and/or donor polynucleotide can be achieved in various ways, including oral, buccal, rectal, parenteral, intraperitoneal, intradermal, transdermal, intratracheal, intraocular, etc., administration. The active agent may be systemic after administration or may be localized by the use of regional administration, intramural administration, or use of an implant that acts to retain the active dose at the site of implantation. The active agent may be formulated for immediate activity or it may be formulated for sustained release.

For some conditions, particularly central nervous system conditions, it may be necessary to formulate agents to cross the blood-brain barrier (BBB). One strategy for drug delivery through the BBB entails disruption of the BBB, either by osmotic means such as mannitol or leukotrienes, or biochemically by the use of vasoactive substances such as bradykinin. The potential for using BBB opening to target specific agents to brain tumors is also an option. A BBB disrupting agent can be co-administered with the therapeutic compositions of the invention when the compositions are administered by intravascular injection. Other strategies to go through the BBB may entail the use of endogenous transport systems, including Caveolin-1 mediated transcytosis, carrier-mediated transporters such as glucose and amino acid carriers, receptor-mediated transcytosis for insulin or transferrin, and active efflux transporters such as p-glycoprotein. Active transport moieties may also be conjugated to the therapeutic compounds for use in the invention to facilitate transport across the endothelial wall of the blood vessel. Alternatively, drug delivery of therapeutic agents behind the BBB may be by local delivery, for example by intrathecal delivery, e.g., through an Ommaya reservoir (see e.g. U.S. Pat. Nos. 5,222,982 and 5,385,582, incorporated herein by reference); by bolus injection, e.g., by a syringe, e.g., intravitreally or intracranially; by continuous infusion, e.g., by cannulation, e.g., with convection (see e.g. US Application No. 20070254842, incorporated here by reference); or by implanting a device upon which the agent has been reversibly affixed (see e.g. US Application Nos. 20080081064 and 20090196903, incorporated herein by reference).

Typically, an effective amount of a DNA-targeting RNA and/or site-directed modifying polypeptide and/or donor polynucleotide is provided. As discussed above with regard to ex vivo methods, an effective amount or effective dose of a DNA-targeting RNA and/or site-directed modifying polypeptide and/or donor polynucleotide in vivo is the amount to induce a 2-fold increase or more in the amount of recombination observed between two homologous sequences relative to a negative control, e.g., a cell contacted with an empty vector or irrelevant polypeptide. The amount of recombination may be measured by any convenient method, e.g., as described above and known in the art. Calculating the effective amount or effective dose of a DNA-targeting RNA and/or site-directed modifying polypeptide and/or donor polynucleotide to be administered is routine for one of ordinary skill in the art. The final amount to be administered will depend upon the route of administration and upon the nature of the disorder or condition to be treated.

The effective amount given to a particular patient will depend on a variety of factors, several of which will differ from patient to patient. A competent clinician will be able to determine an effective amount of a therapeutic agent to administer to a patient to halt or reverse the progression of the disease condition as required. Utilizing LD50 animal data, and other information available for the agent, a clinician can determine the maximum safe dose for an individual, depending on the route of administration. For instance, an intravenously administered dose may be larger than an intrathecally administered dose, given the greater volume of body fluid into which the therapeutic composition is being administered. Similarly, compositions which are rapidly cleared from the body may be administered at higher doses, or in repeated doses, in order to maintain a therapeutic concentration. Utilizing ordinary skill, the competent clinician will be able to optimize the dosage of a particular therapeutic in the course of routine clinical trials.

For inclusion in a medicament, a DNA-targeting RNA and/or site-directed modifying polypeptide and/or donor polynucleotide may be obtained from a suitable commercial source. As a general proposition, the total pharmaceutically effective amount of the DNA-targeting RNA and/or site-directed modifying polypeptide and/or donor polynucleotide administered parenterally per dose will be in a range that can be measured by a dose response curve.

Therapies based on a DNA-targeting RNA and/or site-directed modifying polypeptide and/or donor polynucleotide, i.e., preparations of a DNA-targeting RNA and/or site-directed modifying polypeptide and/or donor polynucleotide to be used for therapeutic administration, must be sterile. Sterility is readily accomplished by filtration through sterile filtration membranes (e.g., 0.2 μm membranes). Therapeutic compositions generally are placed into a container having a sterile access port, for example, an intravenous solution bag or a vial having a stopper pierceable by a hypodermic injection needle. The therapies based on a DNA-targeting RNA and/or site-directed modifying polypeptide and/or donor polynucleotide may be stored in unit-dose or multi-dose containers, for example, sealed ampules or vials, as an aqueous solution or as a lyophilized formulation for reconstitution. As an example of a lyophilized formulation, 10-mL vials are filled with 5 mL of a sterile-filtered 1% (w/v) aqueous solution of the compound, and the resulting mixture is lyophilized. The infusion solution is prepared by reconstituting the lyophilized compound using bacteriostatic Water-for-Injection.
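The lyophilization example implies a fixed amount of compound per vial. Since X% (w/v) means X g per 100 mL, a 1% solution holds 10 mg/mL, so 5 mL holds 50 mg, filling half of a 10-mL vial. A minimal sketch of that arithmetic follows; the function name is illustrative only.

```python
def solute_mass_mg(fill_volume_ml: float, percent_w_v: float) -> float:
    """Solute mass (mg) in a fill volume; X% (w/v) = X g per 100 mL = 10*X mg per mL."""
    mg_per_ml = percent_w_v * 10.0
    return fill_volume_ml * mg_per_ml


vial_capacity_ml = 10.0
fill_volume_ml = 5.0
mass = solute_mass_mg(fill_volume_ml, 1.0)      # 1% (w/v) solution, as in the text
fill_fraction = fill_volume_ml / vial_capacity_ml
print(mass, fill_fraction)  # 50.0 0.5
```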

Pharmaceutical compositions can include, depending on the formulation desired, pharmaceutically acceptable, non-toxic carriers or diluents, which are defined as vehicles commonly used to formulate pharmaceutical compositions for animal or human administration. The diluent is selected so as not to affect the biological activity of the combination. Examples of such diluents are distilled water, buffered water, physiological saline, PBS, Ringer's solution, dextrose solution, and Hank's solution. In addition, the pharmaceutical composition or formulation can include other carriers, adjuvants, or non-toxic, nontherapeutic, nonimmunogenic stabilizers, excipients and the like. The compositions can also include additional substances to approximate physiological conditions, such as pH-adjusting and buffering agents, tonicity-adjusting agents, wetting agents and detergents.

The composition can also include any of a variety of stabilizing agents, such as an antioxidant. When the pharmaceutical composition includes a polypeptide, the polypeptide can be complexed with various well-known compounds that enhance the in vivo stability of the polypeptide, or otherwise enhance its pharmacological properties (e.g., increase the half-life of the polypeptide, reduce its toxicity, enhance solubility or uptake). Examples of such modifications or complexing agents include sulfate, gluconate, citrate and phosphate. The nucleic acids or polypeptides of a composition can also be complexed with molecules that enhance their in vivo attributes. Such molecules include, for example, carbohydrates, polyamines, amino acids, other peptides, ions (e.g., sodium, potassium, calcium, magnesium, manganese), and lipids.

Further guidance regarding formulations that are suitable for various types of administration can be found in Remington's Pharmaceutical Sciences, Mack Publishing Company, Philadelphia, Pa., 17th ed. (1985). For a brief review of methods for drug delivery, see Langer, Science 249:1527-1533 (1990).

The pharmaceutical compositions can be administered for prophylactic and/or therapeutic treatments. Toxicity and therapeutic efficacy of the active ingredient can be determined according to standard pharmaceutical procedures in cell cultures and/or experimental animals, including, for example, determining the LD50 (the dose lethal to 50% of the population) and the ED50 (the dose therapeutically effective in 50% of the population). The dose ratio between toxic and therapeutic effects is the therapeutic index and it can be expressed as the ratio LD50/ED50. Therapies that exhibit large therapeutic indices are preferred.
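The therapeutic index defined above is the ratio LD50/ED50, and a larger value is preferred. A minimal sketch comparing two candidates follows; the candidate names and the LD50/ED50 values (in mg/kg) are hypothetical, and both doses must be expressed in the same units for the ratio to be meaningful.

```python
def therapeutic_index(ld50: float, ed50: float) -> float:
    """Therapeutic index as defined in the text: LD50 / ED50 (same dose units for both)."""
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive")
    return ld50 / ed50


# Hypothetical (LD50, ED50) pairs in mg/kg; not data from the disclosure.
candidates = {"formulation_A": (200.0, 4.0), "formulation_B": (90.0, 9.0)}
indices = {name: therapeutic_index(ld, ed) for name, (ld, ed) in candidates.items()}
preferred = max(indices, key=indices.get)  # larger therapeutic index is preferred
print(indices, preferred)  # {'formulation_A': 50.0, 'formulation_B': 10.0} formulation_A
```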

The data obtained from cell culture and/or animal studies can be used in formulating a range of dosages for humans. The dosage of the active ingredient typically lies within a range of circulating concentrations that includes the ED50 with low toxicity. The dosage can vary within this range depending upon the dosage form employed and the route of administration utilized.

The components used to formulate the pharmaceutical compositions are preferably of high purity and are substantially free of potentially harmful contaminants (e.g., at least National Formulary (NF) grade, generally at least analytical grade, and more typically at least pharmaceutical grade). Moreover, compositions intended for in vivo use are usually sterile. To the extent that a given compound must be synthesized prior to use, the resulting product is typically substantially free of any potentially toxic agents, particularly any endotoxins, which may be present during the synthesis or purification process. Compositions for parenteral administration are also sterile, substantially isotonic and made under GMP conditions.

Genetically Modified Host Cells

The present disclosure provides genetically modified host cells, including isolated genetically modified host cells, where a subject genetically modified host cell comprises (i.e., has been genetically modified with): 1) an exogenous DNA-targeting RNA; 2) an exogenous nucleic acid comprising a nucleotide sequence encoding a DNA-targeting RNA; 3) an exogenous site-directed modifying polypeptide (e.g., a naturally occurring Cas9; a modified, i.e., mutated or variant, Cas9; a chimeric Cas9; etc.); 4) an exogenous nucleic acid comprising a nucleotide sequence encoding a site-directed modifying polypeptide; or 5) any combination of the above. A subject genetically modified cell is generated by genetically modifying a host cell with, for example: 1) an exogenous DNA-targeting RNA; 2) an exogenous nucleic acid comprising a nucleotide sequence encoding a DNA-targeting RNA; 3) an exogenous site-directed modifying polypeptide; 4) an exogenous nucleic acid comprising a nucleotide sequence encoding a site-directed modifying polypeptide; or 5) any combination of the above.

All cells suitable to be a target cell are also suitable to be a genetically modified host cell. For example, a genetically modified host cell of interest can be a cell from any organism (e.g., a bacterial cell; an archaeal cell; a cell of a single-cell eukaryotic organism; a plant cell; an algal cell (e.g., Botryococcus braunii, Chlamydomonas reinhardtii, Nannochloropsis gaditana, Chlorella pyrenoidosa, Sargassum patens C. Agardh, and the like); a fungal cell (e.g., a yeast cell); an animal cell; a cell from an invertebrate animal (e.g., fruit fly, cnidarian, echinoderm, nematode, etc.); a cell from a vertebrate animal (e.g., fish, amphibian, reptile, bird, mammal); a cell from a mammal (e.g., a pig, a cow, a goat, a sheep, a rodent, a rat, a mouse, a non-human primate, a human, etc.); etc.).

In some embodiments, a genetically modified host cell has been genetically modified with an exogenous nucleic acid comprising a nucleotide sequence encoding a site-directed modifying polypeptide (e.g., a naturally occurring Cas9; a modified, i.e., mutated or variant, Cas9; a chimeric Cas9; etc.). The DNA of a genetically modified host cell can be targeted for modification by introducing into the cell a DNA-targeting RNA (or a DNA encoding a DNA-targeting RNA, which determines the genomic location/sequence to be modified) and optionally a donor nucleic acid. In some embodiments, the nucleotide sequence encoding a site-directed modifying polypeptide is operably linked to an inducible promoter (e.g., a heat shock promoter, tetracycline-regulated promoter, steroid-regulated promoter, metal-regulated promoter, estrogen receptor-regulated promoter, etc.). In some embodiments, the nucleotide sequence encoding a site-directed modifying polypeptide is operably linked to a spatially restricted and/or temporally restricted promoter (e.g., a tissue-specific promoter, a cell type-specific promoter, etc.). In some embodiments, the nucleotide sequence encoding a site-directed modifying polypeptide is operably linked to a constitutive promoter.

In some embodiments, a subject genetically modified host cell is in vitro. In some embodiments, a subject genetically modified host cell is in vivo. In some embodiments, a subject genetically modified host cell is a prokaryotic cell or is derived from a prokaryotic cell. In some embodiments, a subject genetically modified host cell is a bacterial cell or is derived from a bacterial cell. In some embodiments, a subject genetically modified host cell is an archaeal cell or is derived from an archaeal cell. In some embodiments, a subject genetically modified host cell is a eukaryotic cell or is derived from a eukaryotic cell. In some embodiments, a subject genetically modified host cell is a plant cell or is derived from a plant cell. In some embodiments, a subject genetically modified host cell is an animal cell or is derived from an animal cell. In some embodiments, a subject genetically modified host cell is an invertebrate cell or is derived from an invertebrate cell. In some embodiments, a subject genetically modified host cell is a vertebrate cell or is derived from a vertebrate cell. In some embodiments, a subject genetically modified host cell is a mammalian cell or is derived from a mammalian cell. In some embodiments, a subject genetically modified host cell is a rodent cell or is derived from a rodent cell. In some embodiments, a subject genetically modified host cell is a human cell or is derived from a human cell.

The present disclosure further provides progeny of a subject genetically modified cell, where the progeny can comprise the same exogenous nucleic acid or polypeptide as the subject genetically modified cell from which it was derived. The present disclosure further provides a composition comprising a subject genetically modified host cell.

Genetically Modified Stem Cells and Genetically Modified Progenitor Cells

In some embodiments, a subject genetically modified host cell is a genetically modified stem cell or progenitor cell. Suitable host cells include, e.g., stem cells (adult stem cells, embryonic stem cells, iPS cells, etc.) and progenitor cells (e.g., cardiac progenitor cells, neural progenitor cells, etc.). Suitable host cells include mammalian stem cells and progenitor cells, including, e.g., rodent stem cells, rodent progenitor cells, human stem cells, human progenitor cells, etc. Suitable host cells include in vitro host cells, e.g., isolated host cells.

In some embodiments, a subject genetically modified host cell comprises an exogenous DNA-targeting RNA. In some embodiments, a subject genetically modified host cell comprises an exogenous nucleic acid comprising a nucleotide sequence encoding a DNA-targeting RNA. In some embodiments, a subject genetically modified host cell comprises an exogenous site-directed modifying polypeptide (e.g., a naturally occurring Cas9; a modified, i.e., mutated or variant, Cas9; a chimeric Cas9; etc.). In some embodiments, a subject genetically modified host cell comprises an exogenous nucleic acid comprising a nucleotide sequence encoding a site-directed modifying polypeptide. In some embodiments, a subject genetically modified host cell comprises an exogenous nucleic acid comprising a nucleotide sequence encoding 1) a DNA-targeting RNA and 2) a site-directed modifying polypeptide.

In some cases, the site-directed modifying polypeptide comprises an amino acid sequence having at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 99%, or 100%, amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9/Csn1 amino acid sequence depicted in FIG. 3A and FIG. 3B, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs:1-256 and 795-1346.
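The identity thresholds above apply to particular residue ranges (amino acids 7-166 or 731-1003), and the "or" means either range may satisfy the threshold. The sketch below is a simplification under stated assumptions: the two sequences are already aligned without gaps and share the same numbering, so identity is just the fraction of matching positions in the 1-based, inclusive range. Real comparisons would first use a proper alignment tool. The function names and the toy 10-residue sequences are hypothetical and are not the Cas9 sequence of FIG. 3A-3B.

```python
# Residue ranges recited in the text (1-based, inclusive) for the Cas9/Csn1 sequence.
RECITED_REGIONS = ((7, 166), (731, 1003))


def region_identity(reference: str, query: str, start: int, end: int) -> float:
    """Percent identity over residues start..end (1-based, inclusive) of two pre-aligned sequences."""
    if len(reference) != len(query):
        raise ValueError("sketch expects an ungapped, pre-aligned pair of equal length")
    if not 1 <= start <= end <= len(reference):
        raise ValueError("region lies outside the sequence")
    ref = reference[start - 1:end]  # convert 1-based inclusive coordinates to a Python slice
    qry = query[start - 1:end]
    matches = sum(r == q for r, q in zip(ref, qry))
    return 100.0 * matches / len(ref)


def meets_identity(reference: str, query: str, threshold_pct: float, regions=RECITED_REGIONS) -> bool:
    """True if ANY listed region reaches the threshold (the text recites 7-166 *or* 731-1003)."""
    return any(region_identity(reference, query, s, e) >= threshold_pct for s, e in regions)


# Toy sequences (hypothetical, not Cas9) differing only at position 10.
ref = "ACDEFGHIKL"
qry = "ACDEFGHIKM"
print(region_identity(ref, qry, 1, 10), meets_identity(ref, qry, 95.0, regions=((1, 5),)))  # 90.0 True
```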

Compositions

The present invention provides a composition comprising a subject DNA-targeting RNA and/or a site-directed modifying polypeptide. In some cases, the site-directed modifying polypeptide is a subject chimeric polypeptide. A subject composition is useful for carrying out a method of the present disclosure, e.g., a method for site-specific modification of a target DNA; a method for site-specific modification of a polypeptide associated with a target DNA; etc.

Compositions Comprising a DNA-Targeting RNA

The present invention provides a composition comprising a subject DNA-targeting RNA. The composition can comprise, in addition to the DNA-targeting RNA, one or more of: a salt, e.g., NaCl, MgCl2, KCl, MgSO4, etc.; a buffering agent, e.g., a Tris buffer, N-(2-Hydroxyethyl)piperazine-N′-(2-ethanesulfonic acid) (HEPES), 2-(N-Morpholino)ethanesulfonic acid (MES), MES sodium salt, 3-(N-Morpholino)propanesulfonic acid (MOPS), N-tris[Hydroxymethyl]methyl-3-aminopropanesulfonic acid (TAPS), etc.; a solubilizing agent; a detergent, e.g., a non-ionic detergent such as Tween-20, etc.; a nuclease inhibitor; and the like. For example, in some cases, a subject composition comprises a subject DNA-targeting RNA and a buffer for stabilizing nucleic acids.

In some embodiments, a DNA-targeting RNA present in a subject composition is pure, e.g., at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, at least about 99%, or more than 99% pure, where “% purity” means that the DNA-targeting RNA is the recited percent free from other macromolecules, or contaminants that may be present during the production of the DNA-targeting RNA.
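The recited purity levels form a ladder of thresholds. As a minimal sketch, assuming purity is measured as the target molecule's share of total macromolecular mass (for example, from a gel or chromatography trace), the following reports the strictest recited tier that a measurement meets. The word "about" in the text is ignored here, and the function names and the 49 mg / 50 mg measurement are hypothetical.

```python
# Purity thresholds recited in the text, in percent.
RECITED_THRESHOLDS = (75, 80, 85, 90, 95, 98, 99)


def percent_purity(target_mass: float, total_mass: float) -> float:
    """Share of total macromolecular mass contributed by the target molecule, in percent."""
    if total_mass <= 0 or not 0 <= target_mass <= total_mass:
        raise ValueError("need total_mass > 0 and 0 <= target_mass <= total_mass")
    return 100.0 * target_mass / total_mass


def highest_threshold_met(purity_pct: float):
    """Return the strictest recited tier met: '>99', '>=N', or None below 75%."""
    if purity_pct > 99:
        return ">99"
    met = [t for t in RECITED_THRESHOLDS if purity_pct >= t]
    return f">={met[-1]}" if met else None


# Hypothetical measurement: 49 mg of DNA-targeting RNA in 50 mg of total macromolecule.
p = percent_purity(49.0, 50.0)
print(p, highest_threshold_met(p))  # 98.0 >=98
```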

Compositions Comprising a Subject Chimeric Polypeptide

The present invention provides a composition comprising a subject chimeric polypeptide. The composition can comprise, in addition to the chimeric polypeptide, one or more of: a salt, e.g., NaCl, MgCl2, KCl, MgSO4, etc.; a buffering agent, e.g., a Tris buffer, HEPES, MES, MES sodium salt, MOPS, TAPS, etc.; a solubilizing agent; a detergent, e.g., a non-ionic detergent such as Tween-20, etc.; a protease inhibitor; a reducing agent (e.g., dithiothreitol); and the like.

In some embodiments, a subject chimeric polypeptide present in a subject composition is pure, e.g., at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, at least about 99%, or more than 99% pure, where “% purity” means that the site-directed modifying polypeptide is the recited percent free from other proteins, other macromolecules, or contaminants that may be present during the production of the chimeric polypeptide.

Compositions Comprising a DNA-Targeting RNA and a Site-Directed Modifying Polypeptide

The present invention provides a composition comprising: (i) a DNA-targeting RNA or a DNA polynucleotide encoding the same; and (ii) a site-directed modifying polypeptide, or a polynucleotide encoding the same. In some cases, the site-directed modifying polypeptide is a subject chimeric site-directed modifying polypeptide. In other cases, the site-directed modifying polypeptide is a naturally occurring site-directed modifying polypeptide. In some instances, the site-directed modifying polypeptide exhibits enzymatic activity that modifies a target DNA. In other cases, the site-directed modifying polypeptide exhibits enzymatic activity that modifies a polypeptide that is associated with a target DNA. In still other cases, the site-directed modifying polypeptide modulates transcription of the target DNA.

The present invention provides a composition comprising: (i) a DNA-targeting RNA, as described above, or a DNA polynucleotide encoding the same, the DNA-targeting RNA comprising: (a) a first segment comprising a nucleotide sequence that is complementary to a sequence in a target DNA; and (b) a second segment that interacts with a site-directed modifying polypeptide; and (ii) the site-directed modifying polypeptide, or a polynucleotide encoding the same, the site-directed modifying polypeptide comprising: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that exhibits site-directed enzymatic activity, wherein the site of enzymatic activity is determined by the DNA-targeting RNA.
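The two-part composition just described can be pictured as a small data model in which the RNA, not the polypeptide, decides where activity occurs. The sketch below is a simplified illustration under stated assumptions: the class and function names are hypothetical, the polypeptide portions are descriptive labels only, and a "site" is taken to be any place where the first segment (with U read as T) appears on either strand of a double-stranded DNA given as its top strand. Any additional sequence requirements a real polypeptide may impose are ignored.

```python
from dataclasses import dataclass

_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return dna.translate(_DNA_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class DNATargetingRNA:
    targeting_segment: str        # (a) first segment: complementary to a sequence in the target DNA
    protein_binding_segment: str  # (b) second segment: interacts with the polypeptide


@dataclass(frozen=True)
class SiteDirectedModifyingPolypeptide:
    rna_binding_portion: str  # (a) descriptive label only in this sketch
    activity_portion: str     # (b) e.g. "nuclease" or "transcription modulation"


def _all_starts(haystack: str, needle: str):
    """0-based start positions of every (possibly overlapping) occurrence of needle."""
    starts, i = [], haystack.find(needle)
    while i != -1:
        starts.append(i)
        i = haystack.find(needle, i + 1)
    return starts


def find_target_sites(guide: DNATargetingRNA, top_strand: str):
    """Sites where the targeting segment can base-pair with one strand of the duplex.

    '+' means the top strand carries the guide's sequence (the guide pairs with the bottom
    strand); '-' means the bottom strand carries it. Positions are 0-based on the top strand.
    """
    protospacer = guide.targeting_segment.upper().replace("U", "T")
    top = top_strand.upper()
    hits = [("+", i) for i in _all_starts(top, protospacer)]
    hits += [("-", i) for i in _all_starts(top, reverse_complement(protospacer))]
    return hits


# Hypothetical components; the activity happens wherever the RNA directs it.
guide = DNATargetingRNA(targeting_segment="ACGGAU", protein_binding_segment="(hypothetical)")
enzyme = SiteDirectedModifyingPolypeptide("RNA-binding", "nuclease")
print(enzyme.activity_portion, find_target_sites(guide, "TTACGGATTT"))  # nuclease [('+', 2)]
```

Swapping in a different `targeting_segment` moves the site without changing the polypeptide, which mirrors the statement that the site of enzymatic activity is determined by the DNA-targeting RNA.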

In some instances, a subject composition comprises: (i) a subject DNA-targeting RNA, the DNA-targeting RNA comprising: (a) a first segment comprising a nucleotide sequence that is complementary to a sequence in a target DNA; and (b) a second segment that interacts with a site-directed modifying polypeptide; and (ii) the site-directed modifying polypeptide, the site-directed modifying polypeptide comprising: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that exhibits site-directed enzymatic activity, wherein the site of enzymatic activity is determined by the DNA-targeting RNA.

In other embodiments, a subject composition comprises: (i) a polynucleotide encoding a subject DNA-targeting RNA, the DNA-targeting RNA comprising: (a) a first segment comprising a nucleotide sequence that is complementary to a sequence in a target DNA; and (b) a second segment that interacts with a site-directed modifying polypeptide; and (ii) a polynucleotide encoding the site-directed modifying polypeptide, the site-directed modifying polypeptide comprising: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that exhibits site-directed enzymatic activity, wherein the site of enzymatic activity is determined by the DNA-targeting RNA.

In some embodiments, a subject composition includes both RNA molecules of a double-molecule DNA-targeting RNA. As such, in some embodiments, a subject composition includes an activator-RNA that comprises a duplex-forming segment that is complementary to the duplex-forming segment of a targeter-RNA (see FIG. 1A). The duplex-forming segments of the activator-RNA and the targeter-RNA hybridize to form the dsRNA duplex of the protein-binding segment of the DNA-targeting RNA. The targeter-RNA further provides the (single-stranded) DNA-targeting segment of the DNA-targeting RNA and therefore targets the DNA-targeting RNA to a specific sequence within the target DNA. As one non-limiting example, the duplex-forming segment of the activator-RNA comprises a nucleotide sequence that has at least about 70%, at least about 80%, at least about 90%, at least about 95%, at least about 98%, or 100% identity with the sequence 5′-UAGCAAGUUAAAAU-3′ (SEQ ID NO:562). As another non-limiting example, the duplex-forming segment of the targeter-RNA comprises a nucleotide sequence that has at least about 70%, at least about 80%, at least about 90%, at least about 95%, at least about 98%, or 100% identity with the sequence 5′-GUUUUAGAGCUA-3′ (SEQ ID NO:679).
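Because the two reference segments are short (14 and 12 nucleotides), each identity threshold corresponds to a small whole number of matching positions. The sketch below assumes an ungapped, position-by-position comparison of equal-length sequences, which is a simplification of how sequence identity is normally computed; "about" is ignored, and the variant sequence is hypothetical.

```python
ACTIVATOR_DUPLEX_SEGMENT = "UAGCAAGUUAAAAU"  # SEQ ID NO:562 (14 nt)
TARGETER_DUPLEX_SEGMENT = "GUUUUAGAGCUA"     # SEQ ID NO:679 (12 nt)


def ungapped_identity(reference: str, candidate: str) -> float:
    """Percent of positions identical between two equal-length sequences (no gaps allowed)."""
    if len(reference) != len(candidate):
        raise ValueError("sketch compares equal-length sequences only")
    matches = sum(a == b for a, b in zip(reference.upper(), candidate.upper()))
    return 100.0 * matches / len(reference)


def min_matches(length: int, percent: int) -> int:
    """Smallest whole number of identical positions giving at least `percent` identity."""
    return -(-length * percent // 100)  # integer ceiling division


variant = "UAGCAAGUUAAAAC"  # hypothetical single substitution at the 3' end
print(round(ungapped_identity(ACTIVATOR_DUPLEX_SEGMENT, variant), 1))  # 92.9
print(min_matches(len(ACTIVATOR_DUPLEX_SEGMENT), 70), min_matches(len(TARGETER_DUPLEX_SEGMENT), 70))  # 10 9
```

Under these assumptions, the hypothetical variant meets the 90% level but not the 95% level, and 70% identity requires 10 of 14 matching positions for the activator-RNA segment and 9 of 12 for the targeter-RNA segment.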

The present disclosure provides a composition comprising: (i) a DNA-targeting RNA, or a DNA polynucleotide encoding the same, the DNA-targeting RNA comprising: (a) a first segment comprising a nucleotide sequence that is complementary to a sequence in a target DNA; and (b) a second segment that interacts with a site-directed modifying polypeptide; and (ii) the site-directed modifying polypeptide, or a polynucleotide encoding the same, the site-directed modifying polypeptide comprising: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that modulates transcription within the target DNA, wherein the site of modulated transcription within the target DNA is determined by the DNA-targeting RNA.

For example, in some cases, a subject composition comprises: (i) a DNA-targeting RNA, the DNA-targeting RNA comprising: (a) a first segment comprising a nucleotide sequence that is complementary to a sequence in a target DNA; and (b) a second segment that interacts with a site-directed modifying polypeptide; and (ii) the site-directed modifying polypeptide, the site-directed modifying polypeptide comprising: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that modulates transcription within the target DNA, wherein the site of modulated transcription within the target DNA is determined by the DNA-targeting RNA.

As another example, in some cases, a subject composition comprises: (i) a DNA polynucleotide encoding a DNA-targeting RNA, the DNA-targeting RNA comprising: (a) a first segment comprising a nucleotide sequence that is complementary to a sequence in a target DNA; and (b) a second segment that interacts with a site-directed modifying polypeptide; and (ii) a polynucleotide encoding the site-directed modifying polypeptide, the site-directed modifying polypeptide comprising: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that modulates transcription within the target DNA, wherein the site of modulated transcription within the target DNA is determined by the DNA-targeting RNA.

A subject composition can comprise, in addition to (i) a subject DNA-targeting RNA, or a DNA polynucleotide encoding the same; and (ii) a site-directed modifying polypeptide, or a polynucleotide encoding the same, one or more of: a salt, e.g., NaCl, MgCl2, KCl, MgSO4, etc.; a buffering agent, e.g., a Tris buffer, HEPES, MES, MES sodium salt, MOPS, TAPS, etc.; a solubilizing agent; a detergent, e.g., a non-ionic detergent such as Tween-20, etc.; a protease inhibitor; a reducing agent (e.g., dithiothreitol); and the like.

In some cases, the components of the composition are individually pure, e.g., each of the components is at least about 75%, at least about 80%, at least about 90%, at least about 95%, at least about 98%, at least about 99%, or more than 99% pure. In some cases, the individual components of a subject composition are pure before being added to the composition.

For example, in some embodiments, a site-directed modifying polypeptide present in a subject composition is pure, e.g., at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, at least about 99%, or more than 99% pure, where “% purity” means that the site-directed modifying polypeptide is the recited percent free from other proteins (e.g., proteins other than the site-directed modifying polypeptide), other macromolecules, or contaminants that may be present during the production of the site-directed modifying polypeptide.

Kits

The present disclosure provides kits for carrying out a subject method. A subject kit can include one or more of: a site-directed modifying polypeptide; a nucleic acid comprising a nucleotide sequence encoding a site-directed modifying polypeptide; a DNA-targeting RNA; a nucleic acid comprising a nucleotide sequence encoding a DNA-targeting RNA; an activator-RNA; a nucleic acid comprising a nucleotide sequence encoding an activator-RNA; a targeter-RNA; and a nucleic acid comprising a nucleotide sequence encoding a targeter-RNA. Each of these components is described in detail above. A kit may comprise a complex that comprises two or more of these components.

In some embodiments, a subject kit comprises a site-directed modifying polypeptide, or a polynucleotide encoding the same. In some embodiments, the site-directed modifying polypeptide comprises: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that modulates transcription within the target DNA, wherein the site of modulated transcription within the target DNA is determined by the DNA-targeting RNA. In some cases, the activity portion of the site-directed modifying polypeptide exhibits reduced or inactivated nuclease activity. In some cases, the site-directed modifying polypeptide is a chimeric site-directed modifying polypeptide.

In some embodiments, a subject kit comprises: a site-directed modifying polypeptide, or a polynucleotide encoding the same, and a reagent for reconstituting and/or diluting the site-directed modifying polypeptide. In other embodiments, a subject kit comprises a nucleic acid (e.g., DNA, RNA) comprising a nucleotide sequence encoding a site-directed modifying polypeptide. In some embodiments, a subject kit comprises: a nucleic acid (e.g., DNA, RNA) comprising a nucleotide sequence encoding a site-directed modifying polypeptide; and a reagent for reconstituting and/or diluting the site-directed modifying polypeptide.

A subject kit comprising a site-directed modifying polypeptide, or a polynucleotide encoding the same, can further include one or more additional reagents, where such additional reagents can be selected from: a buffer for introducing the site-directed modifying polypeptide into a cell; a wash buffer; a control reagent; a control expression vector or RNA polynucleotide; a reagent for in vitro production of the site-directed modifying polypeptide from DNA, and the like. In some cases, the site-directed modifying polypeptide included in a subject kit is a chimeric site-directed modifying polypeptide, as described above.

In some embodiments, a subject kit comprises a DNA-targeting RNA, or a DNA polynucleotide encoding the same, the DNA-targeting RNA comprising: (a) a first segment comprising a nucleotide sequence that is complementary to a sequence in a target DNA; and (b) a second segment that interacts with a site-directed modifying polypeptide. In some embodiments, the DNA-targeting RNA further comprises a third segment (as described above). In some embodiments, a subject kit comprises: (i) a DNA-targeting RNA, or a DNA polynucleotide encoding the same, the DNA-targeting RNA comprising: (a) a first segment comprising a nucleotide sequence that is complementary to a sequence in a target DNA; and (b) a second segment that interacts with a site-directed modifying polypeptide; and (ii) a site-directed modifying polypeptide, or a polynucleotide encoding the same, the site-directed modifying polypeptide comprising: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that exhibits site-directed enzymatic activity, wherein the site of enzymatic activity is determined by the DNA-targeting RNA. In some embodiments, the activity portion of the site-directed modifying polypeptide does not exhibit enzymatic activity (i.e., it comprises an inactivated nuclease, e.g., one inactivated by mutation). In some cases, the kit comprises a DNA-targeting RNA and a site-directed modifying polypeptide. In other cases, the kit comprises: (i) a nucleic acid comprising a nucleotide sequence encoding a DNA-targeting RNA; and (ii) a nucleic acid comprising a nucleotide sequence encoding a site-directed modifying polypeptide.

As another example, a subject kit can include: (i) a DNA-targeting RNA, or a DNA polynucleotide encoding the same, comprising: (a) a first segment comprising a nucleotide sequence that is complementary to a sequence in a target DNA; and (b) a second segment that interacts with a site-directed modifying polypeptide; and (ii) the site-directed modifying polypeptide, or a polynucleotide encoding the same, comprising: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that modulates transcription within the target DNA, wherein the site of modulated transcription within the target DNA is determined by the DNA-targeting RNA. In some cases, the kit comprises: (i) a DNA-targeting RNA; and (ii) a site-directed modifying polypeptide. In other cases, the kit comprises: (i) a nucleic acid comprising a nucleotide sequence encoding a DNA-targeting RNA; and (ii) a nucleic acid comprising a nucleotide sequence encoding a site-directed modifying polypeptide.

The present disclosure provides a kit comprising: (1) a recombinant expression vector comprising (i) a nucleotide sequence encoding a DNA-targeting RNA, wherein the DNA-targeting RNA comprises: (a) a first segment comprising a nucleotide sequence that is complementary to a sequence in a target DNA; and (b) a second segment that interacts with a site-directed modifying polypeptide; and (ii) a nucleotide sequence encoding the site-directed modifying polypeptide, wherein the site-directed modifying polypeptide comprises: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that exhibits site-directed enzymatic activity, wherein the site of enzymatic activity is determined by the DNA-targeting RNA; and (2) a reagent for reconstitution and/or dilution of the expression vector.

The present disclosure provides a kit comprising: (1) a recombinant expression vector comprising: (i) a nucleotide sequence encoding a DNA-targeting RNA, wherein the DNA-targeting RNA comprises: (a) a first segment comprising a nucleotide sequence that is complementary to a sequence in a target DNA; and (b) a second segment that interacts with a site-directed modifying polypeptide; and (ii) a nucleotide sequence encoding the site-directed modifying polypeptide, wherein the site-directed modifying polypeptide comprises: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that modulates transcription within the target DNA, wherein the site of modulated transcription within the target DNA is determined by the DNA-targeting RNA; and (2) a reagent for reconstitution and/or dilution of the recombinant expression vector.

The present disclosure provides a kit comprising: (1) a recombinant expression vector comprising a nucleic acid comprising a nucleotide sequence that encodes a DNA-targeting RNA comprising: (i) a first segment comprising a nucleotide sequence that is complementary to a sequence in a target DNA; and (ii) a second segment that interacts with a site-directed modifying polypeptide; and (2) a reagent for reconstitution and/or dilution of the recombinant expression vector. In some embodiments of this kit, the kit comprises: a recombinant expression vector comprising a nucleotide sequence that encodes a site-directed modifying polypeptide, wherein the site-directed modifying polypeptide comprises: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that exhibits site-directed enzymatic activity, wherein the site of enzymatic activity is determined by the DNA-targeting RNA. In other embodiments of this kit, the kit comprises: a recombinant expression vector comprising a nucleotide sequence that encodes a site-directed modifying polypeptide, wherein the site-directed modifying polypeptide comprises: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that modulates transcription within the target DNA, wherein the site of modulated transcription within the target DNA is determined by the DNA-targeting RNA.

In some embodiments of any of the above kits, the kit comprises an activator-RNA or a targeter-RNA. In some embodiments of any of the above kits, the kit comprises a single-molecule DNA-targeting RNA. In some embodiments of any of the above kits, the kit comprises two or more double-molecule or single-molecule DNA-targeting RNAs. In some embodiments of any of the above kits, a DNA-targeting RNA (e.g., including two or more DNA-targeting RNAs) can be provided as an array (e.g., an array of RNA molecules, an array of DNA molecules encoding the DNA-targeting RNA(s), etc.). Such kits can be useful, for example, in conjunction with the above-described genetically modified host cells that comprise a subject site-directed modifying polypeptide. In some embodiments of any of the above kits, the kit further comprises a donor polynucleotide to effect the desired genetic modification. Components of a subject kit can be in separate containers, or can be combined in a single container.

Any of the above-described kits can further include one or more additional reagents, where such additional reagents can be selected from: a dilution buffer; a reconstitution solution; a wash buffer; a control reagent; a control expression vector or RNA polynucleotide; a reagent for in vitro production of the site-directed modifying polypeptide from DNA, and the like.

In addition to above-mentioned components, a subject kit can further include instructions for using the components of the kit to practice the subject methods. The instructions for practicing the subject methods are generally recorded on a suitable recording medium. For example, the instructions may be printed on a substrate, such as paper or plastic, etc. As such, the instructions may be present in the kits as a package insert, in the labeling of the container of the kit or components thereof (i.e., associated with the packaging or subpackaging) etc. In other embodiments, the instructions are present as an electronic storage data file present on a suitable computer readable storage medium, e.g. CD-ROM, diskette, flash drive, etc. In yet other embodiments, the actual instructions are not present in the kit, but means for obtaining the instructions from a remote source, e.g. via the internet, are provided. An example of this embodiment is a kit that includes a web address where the instructions can be viewed and/or from which the instructions can be downloaded. As with the instructions, this means for obtaining the instructions is recorded on a suitable substrate.

Non-Human Genetically Modified Organisms

In some embodiments, a genetically modified host cell has been genetically modified with an exogenous nucleic acid comprising a nucleotide sequence encoding a site-directed modifying polypeptide (e.g., a naturally occurring Cas9; a modified, i.e., mutated or variant, Cas9; a chimeric Cas9; etc.). If such a cell is a eukaryotic single-cell organism, then the modified cell can be considered a genetically modified organism. In some embodiments, a subject non-human genetically modified organism is a Cas9 transgenic multicellular organism.

In some embodiments, a subject genetically modified non-human host cell (e.g., a cell that has been genetically modified with an exogenous nucleic acid comprising a nucleotide sequence encoding a site-directed modifying polypeptide, e.g., a naturally occurring Cas9; a modified, i.e., mutated or variant, Cas9; a chimeric Cas9; etc.) can generate a subject genetically modified non-human organism (e.g., a mouse, a fish, a frog, a fly, a worm, etc.). For example, if the genetically modified host cell is a pluripotent stem cell (i.e., PSC) or a germ cell (e.g., sperm, oocyte, etc.), an entire genetically modified organism can be derived from the genetically modified host cell. In some embodiments, the genetically modified host cell is a pluripotent stem cell (e.g., ESC, iPSC, pluripotent plant stem cell, etc.) or a germ cell (e.g., sperm cell, oocyte, etc.), either in vivo or in vitro, that can give rise to a genetically modified organism. In some embodiments, the genetically modified host cell is a vertebrate PSC (e.g., ESC, iPSC, etc.) and is used to generate a genetically modified organism (e.g., by injecting a PSC into a blastocyst to produce a chimeric/mosaic animal, which could then be mated to generate non-chimeric/non-mosaic genetically modified organisms; grafting in the case of plants; etc.). Any convenient method/protocol for producing a genetically modified organism, including the methods described herein, is suitable for producing a genetically modified host cell comprising an exogenous nucleic acid comprising a nucleotide sequence encoding a site-directed modifying polypeptide (e.g., a naturally occurring Cas9; a modified, i.e., mutated or variant, Cas9; a chimeric Cas9; etc.). Methods of producing genetically modified organisms are known in the art. For example, see Cho et al., Curr Protoc Cell Biol. 2009 March; Chapter 19:Unit 19.11: Generation of transgenic mice; Gama et al., Brain Struct Funct. 2010 March; 214(2-3):91-109. Epub 2009 Nov. 25: Animal transgenesis: an overview; Husaini et al., GM Crops. 2011 June-December; 2(3):150-62. Epub 2011 Jun. 1: Approaches for gene targeting and targeted gene expression in plants.

In some embodiments, a genetically modified organism comprises a target cell for methods of the invention, and thus can be considered a source for target cells. For example, if a genetically modified cell comprising an exogenous nucleic acid comprising a nucleotide sequence encoding a site-directed modifying polypeptide (e.g., a naturally occurring Cas9; a modified, i.e., mutated or variant, Cas9; a chimeric Cas9; etc.) is used to generate a genetically modified organism, then the cells of the genetically modified organism comprise the exogenous nucleic acid comprising a nucleotide sequence encoding a site-directed modifying polypeptide (e.g., a naturally occurring Cas9; a modified, i.e., mutated or variant, Cas9; a chimeric Cas9; etc.). In some such embodiments, the DNA of a cell or cells of the genetically modified organism can be targeted for modification by introducing into the cell or cells a DNA-targeting RNA (or a DNA encoding a DNA-targeting RNA) and optionally a donor nucleic acid. For example, the introduction of a DNA-targeting RNA (or a DNA encoding a DNA-targeting RNA) into a subset of cells (e.g., brain cells, intestinal cells, kidney cells, lung cells, blood cells, etc.) of the genetically modified organism can target the DNA of such cells for modification, the genomic location of which will depend on the DNA-targeting sequence of the introduced DNA-targeting RNA.

In some embodiments, a genetically modified organism is a source of target cells for methods of the invention. For example, a genetically modified organism comprising cells that are genetically modified with an exogenous nucleic acid comprising a nucleotide sequence encoding a site-directed modifying polypeptide (e.g., a naturally occurring Cas9; a modified, i.e., mutated or variant, Cas9; a chimeric Cas9; etc.) can provide a source of genetically modified cells, for example PSCs (e.g., ESCs, iPSCs, etc.), germ cells (e.g., sperm, oocytes, etc.), neurons, progenitor cells, cardiomyocytes, etc.

In some embodiments, a genetically modified cell is a PSC comprising an exogenous nucleic acid comprising a nucleotide sequence encoding a site-directed modifying polypeptide (e.g., a naturally occurring Cas9; a modified, i.e., mutated or variant, Cas9; a chimeric Cas9; etc.). As such, the PSC can be a target cell such that the DNA of the PSC can be targeted for modification by introducing into the PSC a DNA-targeting RNA (or a DNA encoding a DNA-targeting RNA) and optionally a donor nucleic acid, and the genomic location of the modification will depend on the DNA-targeting sequence of the introduced DNA-targeting RNA. Thus, in some embodiments, the methods described herein can be used to modify the DNA (e.g., delete and/or replace any desired genomic location) of PSCs derived from a subject genetically modified organism. Such modified PSCs can then be used to generate organisms having both (i) an exogenous nucleic acid comprising a nucleotide sequence encoding a site-directed modifying polypeptide (e.g., a naturally occurring Cas9; a modified, i.e., mutated or variant, Cas9; a chimeric Cas9; etc.) and (ii) a DNA modification that was introduced into the PSC.

An exogenous nucleic acid comprising a nucleotide sequence encoding a site-directed modifying polypeptide (e.g., a naturally occurring Cas9; a modified, i.e., mutated or variant, Cas9; a chimeric Cas9; etc.) can be under the control of (i.e., operably linked to) an unknown promoter (e.g., when the nucleic acid randomly integrates into a host cell genome) or can be under the control of (i.e., operably linked to) a known promoter. Suitable known promoters can be any known promoter and include constitutively active promoters (e.g., CMV promoter), inducible promoters (e.g., heat shock promoter, tetracycline-regulated promoter, steroid-regulated promoter, metal-regulated promoter, estrogen receptor-regulated promoter, etc.), spatially restricted and/or temporally restricted promoters (e.g., a tissue specific promoter, a cell type specific promoter, etc.), etc.

A subject genetically modified organism (e.g., an organism whose cells comprise a nucleotide sequence encoding a site-directed modifying polypeptide, e.g., a naturally occurring Cas9; a modified, i.e., mutated or variant, Cas9; a chimeric Cas9; etc.) can be any organism, including, for example, a plant; algae; an invertebrate (e.g., a cnidarian, an echinoderm, a worm, a fly, etc.); a vertebrate (e.g., a fish (e.g., zebrafish, puffer fish, goldfish, etc.), an amphibian (e.g., salamander, frog, etc.), a reptile, a bird, a mammal, etc.); an ungulate (e.g., a goat, a pig, a sheep, a cow, etc.); a rodent (e.g., a mouse, a rat, a hamster, a guinea pig); a lagomorph (e.g., a rabbit); etc.

In some cases, the site-directed modifying polypeptide comprises an amino acid sequence having at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 99%, or 100% amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9/Csn1 amino acid sequence depicted in FIG. 3A and FIG. 3B, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs:1-256 and 795-1346.
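The identity thresholds above are stated over residue ranges numbered 1-based and inclusive (amino acids 7-166 and 731-1003). The Python sketch below illustrates only the arithmetic, under a loudly stated assumption: the query has already been aligned to the reference without gaps, so identity is the fraction of matching positions within the region. The helper names (`region`, `percent_identity`, `meets_threshold`) and the toy sequences are hypothetical and are not the Cas9/Csn1 sequence of FIG. 3A and FIG. 3B; real comparisons would normally rely on a gapped alignment tool.

```python
def region(seq: str, start: int, end: int) -> str:
    """Return residues start..end of seq, using 1-based, inclusive numbering."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("residue range falls outside the sequence")
    return seq[start - 1:end]


def percent_identity(a: str, b: str) -> float:
    """Percent of positions at which two equal-length, pre-aligned sequences agree."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def meets_threshold(reference: str, query: str, start: int, end: int,
                    threshold: float) -> bool:
    """True if the query's region start..end is at least `threshold` % identical.

    Assumes query and reference are already aligned position-for-position.
    """
    return percent_identity(region(reference, start, end),
                            region(query, start, end)) >= threshold


# Toy 1,200-residue sequences (NOT Cas9): the variant differs at residues 102-120.
ref = "ACDEFGHIKLMNPQRSTVWY" * 60
query = ref[:100] + "A" * 20 + ref[120:]

print(len(region(ref, 7, 166)), len(region(ref, 731, 1003)))  # 160 273
print(meets_threshold(ref, query, 7, 166, 85))     # True (141/160, about 88.1%)
print(meets_threshold(ref, query, 731, 1003, 99))  # True (region unchanged)
```

Converting the 1-based, inclusive range to a Python slice (`start - 1:end`) is the step most easily gotten wrong; under this convention residues 7-166 span 160 positions and residues 731-1003 span 273.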

Transgenic Non-Human Animals

As described above, in some embodiments, a subject nucleic acid (e.g., a nucleotide sequence encoding a site-directed modifying polypeptide, e.g., a naturally occurring Cas9; a modified, i.e., mutated or variant, Cas9; a chimeric Cas9; etc.) or a subject recombinant expression vector is used as a transgene to generate a transgenic animal that produces a site-directed modifying polypeptide. Thus, the present invention further provides a transgenic non-human animal, which animal comprises a transgene comprising a subject nucleic acid comprising a nucleotide sequence encoding a site-directed modifying polypeptide, e.g., a naturally occurring Cas9; a modified, i.e., mutated or variant, Cas9; a chimeric Cas9; etc., as described above. In some embodiments, the genome of the transgenic non-human animal comprises a subject nucleotide sequence encoding a site-directed modifying polypeptide. In some embodiments, the transgenic non-human animal is homozygous for the genetic modification. In some embodiments, the transgenic non-human animal is heterozygous for the genetic modification. In some embodiments, the transgenic non-human animal is a vertebrate, for example, a fish (e.g., zebrafish, goldfish, puffer fish, cave fish, etc.), an amphibian (e.g., frog, salamander, etc.), a bird (e.g., chicken, turkey, etc.), a reptile (e.g., snake, lizard, etc.), a mammal (e.g., an ungulate, e.g., a pig, a cow, a goat, a sheep, etc.; a lagomorph (e.g., a rabbit); a rodent (e.g., a rat, a mouse); a non-human primate; etc.), etc.

An exogenous nucleic acid comprising a nucleotide sequence encoding a site-directed modifying polypeptide (e.g., a naturally occurring Cas9; a modified, i.e., mutated or variant, Cas9; a chimeric Cas9; etc.) can be under the control of (i.e., operably linked to) an unknown promoter (e.g., when the nucleic acid randomly integrates into a host cell genome) or can be under the control of (i.e., operably linked to) a known promoter. Suitable known promoters can be any known promoter and include constitutively active promoters (e.g., CMV promoter), inducible promoters (e.g., heat shock promoter, tetracycline-regulated promoter, steroid-regulated promoter, metal-regulated promoter, estrogen receptor-regulated promoter, etc.), spatially restricted and/or temporally restricted promoters (e.g., a tissue specific promoter, a cell type specific promoter, etc.), etc.

In some cases, the site-directed modifying polypeptide comprises an amino acid sequence having at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 99%, or 100% amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9/Csn1 amino acid sequence depicted in FIG. 3A and FIG. 3B, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs:1-256 and 795-1346.

Transgenic Plants

As described above, in some embodiments, a subject nucleic acid (e.g., a nucleotide sequence encoding a site-directed modifying polypeptide, e.g., a naturally occurring Cas9; a modified, i.e., mutated or variant, Cas9; a chimeric Cas9; etc.) or a subject recombinant expression vector is used as a transgene to generate a transgenic plant that produces a site-directed modifying polypeptide. Thus, the present invention further provides a transgenic plant, which plant comprises a transgene comprising a subject nucleic acid comprising a nucleotide sequence encoding a site-directed modifying polypeptide, e.g., a naturally occurring Cas9; a modified, i.e., mutated or variant, Cas9; a chimeric Cas9; etc., as described above. In some embodiments, the genome of the transgenic plant comprises a subject nucleic acid. In some embodiments, the transgenic plant is homozygous for the genetic modification. In some embodiments, the transgenic plant is heterozygous for the genetic modification.

Methods of introducing exogenous nucleic acids into plant cells are well known in the art. Such plant cells are considered “transformed,” as defined above. Suitable methods include viral infection (such as double stranded DNA viruses), transfection, conjugation, protoplast fusion, electroporation, particle gun technology, calcium phosphate precipitation, direct microinjection, silicon carbide whiskers technology, Agrobacterium-mediated transformation and the like. The choice of method is generally dependent on the type of cell being transformed and the circumstances under which the transformation is taking place (i.e. in vitro, ex vivo, or in vivo).

Transformation methods based upon the soil bacterium Agrobacterium tumefaciens are particularly useful for introducing an exogenous nucleic acid molecule into a vascular plant. The wild type form of Agrobacterium contains a Ti (tumor-inducing) plasmid that directs production of tumorigenic crown gall growth on host plants. Transfer of the tumor-inducing T-DNA region of the Ti plasmid to a plant genome requires the Ti plasmid-encoded virulence genes as well as T-DNA borders, which are a set of direct DNA repeats that delineate the region to be transferred. An Agrobacterium-based vector is a modified form of a Ti plasmid, in which the tumor inducing functions are replaced by the nucleic acid sequence of interest to be introduced into the plant host.

Agrobacterium-mediated transformation generally employs cointegrate vectors or binary vector systems, in which the components of the Ti plasmid are divided between a helper vector, which resides permanently in the Agrobacterium host and carries the virulence genes, and a shuttle vector, which contains the gene of interest bounded by T-DNA sequences. A variety of binary vectors are well known in the art and are commercially available, for example, from Clontech (Palo Alto, Calif.). Methods of coculturing Agrobacterium with cultured plant cells or wounded tissue such as leaf tissue, root explants, hypocotyledons, stem pieces or tubers, for example, also are well known in the art. See, e.g., Glick and Thompson (eds.), Methods in Plant Molecular Biology and Biotechnology, Boca Raton, Fla.: CRC Press (1993).

Microprojectile-mediated transformation also can be used to produce a subject transgenic plant. This method, first described by Klein et al. (Nature 327:70-73 (1987)), relies on gold or tungsten microprojectiles that are coated with the desired nucleic acid molecule by precipitation with calcium chloride, spermidine or polyethylene glycol. The microprojectile particles are accelerated at high speed into an angiosperm tissue using a device such as the BIOLISTIC PDS-1000 (Bio-Rad; Hercules, Calif.).

A subject nucleic acid may be introduced into a plant in a manner such that the nucleic acid is able to enter a plant cell(s), e.g., via an in vivo or ex vivo protocol. By “in vivo,” it is meant that the nucleic acid is administered to a living body of a plant, e.g., by infiltration. By “ex vivo,” it is meant that cells or explants are modified outside of the plant, and then such cells or organs are regenerated to a plant. A number of vectors suitable for stable transformation of plant cells or for the establishment of transgenic plants have been described, including those described in Weissbach and Weissbach (1989) Methods for Plant Molecular Biology, Academic Press, and Gelvin et al. (1990) Plant Molecular Biology Manual, Kluwer Academic Publishers. Specific examples include those derived from a Ti plasmid of Agrobacterium tumefaciens, as well as those disclosed by Herrera-Estrella et al. (1983) Nature 303: 209, Bevan (1984) Nucl Acid Res. 12: 8711-8721, and Klee (1985) Bio/Technology 3: 637-642. Alternatively, non-Ti vectors can be used to transfer the DNA into plants and cells by using free DNA delivery techniques. By using these methods, transgenic plants such as wheat, rice (Christou (1991) Bio/Technology 9:957-9 and 4462) and corn (Gordon-Kamm (1990) Plant Cell 2: 603-618) can be produced. An immature embryo can also be a good target tissue for monocots for direct DNA delivery techniques by using the particle gun (Weeks et al. (1993) Plant Physiol 102: 1077-1084; Vasil (1993) Bio/Technology 10: 667-674; Wan and Lemaux (1994) Plant Physiol 104: 37-48) and for Agrobacterium-mediated DNA transfer (Ishida et al. (1996) Nature Biotech 14: 745-750). Exemplary methods for introduction of DNA into chloroplasts are biolistic bombardment, polyethylene glycol transformation of protoplasts, and microinjection (Danieli et al. Nat. Biotechnol 16:345-348, 1998; Staub et al. Nat. Biotechnol 18: 333-338, 2000; O'Neill et al. Plant J. 3:729-738, 1993; Knoblauch et al. Nat. Biotechnol 17: 906-909; U.S. Pat. Nos. 5,451,513, 5,545,817, 5,545,818, and 5,576,198; Intl. Application No. WO 95/16783; Boynton et al., Methods in Enzymology 217: 510-536 (1993); Svab et al., Proc. Natl. Acad. Sci. USA 90: 913-917 (1993); and McBride et al., Proc. Natl. Acad. Sci. USA 91: 7301-7305 (1994)). Any vector suitable for the methods of biolistic bombardment, polyethylene glycol transformation of protoplasts and microinjection will be suitable as a targeting vector for chloroplast transformation. Any double-stranded DNA vector may be used as a transformation vector, especially when the method of introduction does not utilize Agrobacterium.

Plants which can be genetically modified include grains, forage crops, fruits, vegetables, oil seed crops, palms, forestry, and vines. Specific examples of plants which can be modified follow: maize, banana, peanut, field peas, sunflower, tomato, canola, tobacco, wheat, barley, oats, potato, soybeans, cotton, carnations, sorghum, lupin and rice.

Also provided by the subject invention are transformed plant cells, tissues, plants and products that contain the transformed plant cells. A feature of the subject transformed cells, and of tissues and products that include the same, is the presence of a subject nucleic acid integrated into the genome, and production by plant cells of a site-directed modifying polypeptide, e.g., a naturally occurring Cas9; a modified, i.e., mutated or variant, Cas9; a chimeric Cas9; etc. Recombinant plant cells of the present invention are useful as populations of recombinant cells, or as a tissue, seed, whole plant, stem, fruit, leaf, root, flower, tuber, grain, animal feed, a field of plants, and the like.

A nucleic acid comprising a nucleotide sequence encoding a site-directed modifying polypeptide (e.g., a naturally occurring Cas9; a modified, i.e., mutated or variant, Cas9; a chimeric Cas9; etc.) can be under the control of (i.e., operably linked to) an unknown promoter (e.g., when the nucleic acid randomly integrates into a host cell genome) or can be under the control of (i.e., operably linked to) a known promoter. Suitable known promoters can be any known promoter and include constitutively active promoters, inducible promoters, spatially restricted and/or temporally restricted promoters, etc.

In some cases, the site-directed modifying polypeptide comprises an amino acid sequence having at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 99%, or 100% amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9/Csn1 amino acid sequence depicted in FIG. 3A and FIG. 3B, or to the corresponding portions in any of the amino acid sequences set forth as SEQ ID NOs:1-256 and 795-1346.

Also provided by the subject invention is reproductive material of a subject transgenic plant, where reproductive material includes seeds, progeny plants and clonal material.

Definitions—Part II

The term “naturally-occurring” or “unmodified” as used herein as applied to a nucleic acid, a polypeptide, a cell, or an organism, refers to a nucleic acid, polypeptide, cell, or organism that is found in nature. For example, a polypeptide or polynucleotide sequence that is present in an organism (including viruses) that can be isolated from a source in nature and which has not been intentionally modified by a human in the laboratory is naturally occurring.

“Heterologous,” as used herein, means a nucleotide or polypeptide sequence that is not found in the native nucleic acid or protein, respectively. For example, in a fusion variant Cas9 site-directed polypeptide, a variant Cas9 site-directed polypeptide may be fused to a heterologous polypeptide (i.e., a polypeptide other than Cas9). The heterologous polypeptide may exhibit an activity (e.g., enzymatic activity) that will also be exhibited by the fusion variant Cas9 site-directed polypeptide. A heterologous nucleic acid sequence may be linked to a nucleotide sequence encoding a variant Cas9 site-directed polypeptide (e.g., by genetic engineering) to generate a nucleotide sequence encoding a fusion variant Cas9 site-directed polypeptide.

The term “chimeric polypeptide” refers to a polypeptide which is not naturally occurring, e.g., is made by the artificial combination of two otherwise separated segments of amino acid sequence through human intervention. A chimeric polypeptide is thus the result of human intervention, and a polypeptide that comprises a chimeric amino acid sequence is a chimeric polypeptide.

By “site-directed polypeptide” or “RNA-binding site-directed polypeptide” it is meant a polypeptide that binds RNA and is targeted to a specific DNA sequence. A site-directed polypeptide as described herein is targeted to a specific DNA sequence by the RNA molecule to which it is bound. The RNA molecule comprises a sequence that is complementary to a target sequence within the target DNA, thus targeting the bound polypeptide to a specific location within the target DNA (the target sequence).

In some embodiments, a subject nucleic acid (e.g., a DNA-targeting RNA, a nucleic acid comprising a nucleotide sequence encoding a DNA-targeting RNA; a nucleic acid encoding a site-directed polypeptide; etc.) comprises a modification or sequence that provides for an additional desirable feature (e.g., modified or regulated stability; subcellular targeting; tracking, e.g., a fluorescent label; a binding site for a protein or protein complex; etc.). Non-limiting examples include: a 5′ cap (e.g., a 7-methylguanylate cap (m7G)); a 3′ polyadenylated tail (i.e., a 3′ poly(A) tail); a riboswitch sequence (e.g., to allow for regulated stability and/or regulated accessibility by proteins and/or protein complexes); a modification or sequence that targets the RNA to a subcellular location (e.g., nucleus, mitochondria, chloroplasts, and the like); a modification or sequence that provides for tracking (e.g., direct conjugation to a fluorescent molecule, conjugation to a moiety that facilitates fluorescent detection, a sequence that allows for fluorescent detection, etc.); a modification or sequence that provides a binding site for proteins (e.g., proteins that act on DNA, including transcriptional activators, transcriptional repressors, DNA methyltransferases, DNA demethylases, histone acetyltransferases, histone deacetylases, and the like); and combinations thereof.

In some embodiments, a DNA-targeting RNA comprises an additional segment at either the 5′ or 3′ end that provides for any of the features described above. For example, a suitable third segment can comprise a 5′ cap (e.g., a 7-methylguanylate cap (m7G)); a 3′ polyadenylated tail (i.e., a 3′ poly(A) tail); a riboswitch sequence (e.g., to allow for regulated stability and/or regulated accessibility by proteins and protein complexes); a sequence that targets the RNA to a subcellular location (e.g., nucleus, mitochondria, chloroplasts, and the like); a modification or sequence that provides for tracking (e.g., direct conjugation to a fluorescent molecule, conjugation to a moiety that facilitates fluorescent detection, a sequence that allows for fluorescent detection, etc.); a modification or sequence that provides a binding site for proteins (e.g., proteins that act on DNA, including transcriptional activators, transcriptional repressors, DNA methyltransferases, DNA demethylases, histone acetyltransferases, histone deacetylases, and the like); and combinations thereof.

A subject DNA-targeting RNA and a subject site-directed polypeptide form a complex (i.e., bind via non-covalent interactions). The DNA-targeting RNA provides target specificity to the complex by comprising a nucleotide sequence that is complementary to a sequence of a target DNA. The site-directed polypeptide of the complex provides the site-specific activity. In other words, the site-directed polypeptide is guided to a target DNA sequence (e.g. a target sequence in a chromosomal nucleic acid; a target sequence in an extrachromosomal nucleic acid, e.g. an episomal nucleic acid, a minicircle, etc.; a target sequence in a mitochondrial nucleic acid; a target sequence in a chloroplast nucleic acid; a target sequence in a plasmid; etc.) by virtue of its association with the protein-binding segment of the DNA-targeting RNA.
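Because the DNA-targeting RNA supplies specificity through complementarity to a sequence of the target DNA, target selection can be pictured as a string search over both strands of a DNA sequence. The Python sketch below is illustrative only: it assumes perfect complementarity over the whole DNA-targeting segment, considers base pairing alone (any other factors that may influence binding are ignored), and uses a hypothetical toy segment and toy DNA. The function names are likewise hypothetical.

```python
# Complement table for DNA bases.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA string (5'->3' in, 5'->3' out)."""
    return dna.translate(DNA_COMPLEMENT)[::-1]


def find_target_sites(targeting_segment: str, top_strand: str) -> list[tuple[int, str]]:
    """Return (0-based start on the top strand, strand the RNA base-pairs with)
    for every perfect match of the DNA-targeting segment (simplified model)."""
    guide_as_dna = targeting_segment.upper().replace("U", "T")
    if not guide_as_dna:
        raise ValueError("targeting segment must be non-empty")
    guide_rc = reverse_complement(guide_as_dna)
    top = top_strand.upper()
    n = len(guide_as_dna)
    sites = []
    for i in range(len(top) - n + 1):
        window = top[i:i + n]
        if window == guide_as_dna:
            # Top strand reads like the RNA, so the RNA pairs with the bottom strand.
            sites.append((i, "bottom"))
        if window == guide_rc:
            # Top strand is the reverse complement of the RNA, so the RNA pairs with it.
            sites.append((i, "top"))
    return sites


# Hypothetical toy DNA-targeting segment (real segments are longer) and toy DNA.
print(find_target_sites("GAUCCA", "TTGATCCAAATGGATCAA"))
# [(2, 'bottom'), (10, 'top')]
```

Changing the DNA-targeting segment changes the reported location while the rest of the sketch stays the same, which mirrors the point in the text that the RNA, not the polypeptide, determines where the complex acts.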

In some embodiments, a subject DNA-targeting RNA comprises two separate RNA molecules (RNA polynucleotides) and is referred to herein as a “double-molecule DNA-targeting RNA” or a “two-molecule DNA-targeting RNA.” In other embodiments, a subject DNA-targeting RNA is a single RNA molecule (single RNA polynucleotide) and is referred to herein as a “single-molecule DNA-targeting RNA.” If not otherwise specified, the term “DNA-targeting RNA” is inclusive, referring to both single-molecule DNA-targeting RNAs and double-molecule DNA-targeting RNAs.

A subject two-molecule DNA-targeting RNA comprises two separate RNA molecules (a “targeter-RNA” and an “activator-RNA”). Each of the two RNA molecules of a subject two-molecule DNA-targeting RNA comprises a stretch of nucleotides that are complementary to one another such that the complementary nucleotides of the two RNA molecules hybridize to form the double stranded RNA duplex of the protein-binding segment.
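The two duplex-forming stretches hybridize antiparallel: the 5′ end of one faces the 3′ end of the other. The Python sketch below checks that pairing for two segments written 5′ to 3′. It is a simplified, hypothetical model: it requires every position to pair, which is stricter than the text demands, treats G-U wobble pairs as an opt-in assumption, and uses toy sequences rather than any SEQ ID NO.

```python
# Base pairs accepted by this sketch. Watson-Crick pairs are always allowed;
# G-U wobble pairs are an optional assumption (off by default).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def forms_duplex(targeter_segment: str, activator_segment: str,
                 allow_wobble: bool = False) -> bool:
    """True if every base of the two duplex-forming segments pairs when the
    strands are laid antiparallel (5' end of one facing the 3' end of the other)."""
    t = targeter_segment.upper()
    a = activator_segment.upper()
    if not t or len(t) != len(a):
        return False
    allowed = (WATSON_CRICK | WOBBLE) if allow_wobble else WATSON_CRICK
    # Reading the activator segment backwards aligns its 3' end with the targeter's 5' end.
    return all((x, y) in allowed for x, y in zip(t, reversed(a)))


# Hypothetical toy duplex-forming segments, written 5'->3'.
print(forms_duplex("GCAUCG", "CGAUGC"))                     # True
print(forms_duplex("GCAUCG", "UGAUGC"))                     # False: one G-U pair
print(forms_duplex("GCAUCG", "UGAUGC", allow_wobble=True))  # True
```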

A subject single-molecule DNA-targeting RNA comprises two stretches of nucleotides (a targeter-RNA and an activator-RNA) that are complementary to one another, are covalently linked by intervening nucleotides (“linkers” or “linker nucleotides”), and hybridize to form the double stranded RNA duplex (dsRNA duplex) of the protein-binding segment, thus resulting in a stem-loop structure. The targeter-RNA and the activator-RNA can be covalently linked via the 3′ end of the targeter-RNA and the 5′ end of the activator-RNA. Alternatively, the targeter-RNA and the activator-RNA can be covalently linked via the 5′ end of the targeter-RNA and the 3′ end of the activator-RNA.
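The two linkage orientations described above differ only in which RNA comes first when the molecule is written 5′ to 3′. The Python sketch below makes that explicit. The function name, the toy sequences, and the default four-nucleotide linker are hypothetical placeholders, not sequences taken from this disclosure, and the sketch does not model folding of the resulting stem-loop.

```python
VALID_RNA = set("ACGU")


def fuse_single_molecule(targeter: str, activator: str, linker: str = "GAAA",
                         targeter_first: bool = True) -> str:
    """Return a single-molecule guide, written 5'->3', joining two RNAs via a linker.

    targeter_first=True links the 3' end of the targeter-RNA to the 5' end of the
    activator-RNA; False links the 3' end of the activator-RNA to the 5' end of
    the targeter-RNA. The default linker is a hypothetical placeholder.
    """
    for name, seq in (("targeter", targeter), ("activator", activator), ("linker", linker)):
        if not seq or set(seq.upper()) - VALID_RNA:
            raise ValueError(f"{name} must be a non-empty RNA sequence (A, C, G, U)")
    if targeter_first:
        return (targeter + linker + activator).upper()
    return (activator + linker + targeter).upper()


print(fuse_single_molecule("AAAC", "GGGU", linker="UU"))                        # AAACUUGGGU
print(fuse_single_molecule("AAAC", "GGGU", linker="UU", targeter_first=False))  # GGGUUUAAAC
```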

An exemplary two-molecule DNA-targeting RNA comprises a crRNA-like (“CRISPR RNA” or “targeter-RNA” or “crRNA” or “crRNA repeat”) molecule and a corresponding tracrRNA-like (“trans-acting CRISPR RNA” or “activator-RNA” or “tracrRNA”) molecule. A crRNA-like molecule (targeter-RNA) comprises both the DNA-targeting segment (single stranded) of the DNA-targeting RNA and a stretch (“duplex-forming segment”) of nucleotides that forms one half of the dsRNA duplex of the protein-binding segment of the DNA-targeting RNA. A corresponding tracrRNA-like molecule (activator-RNA) comprises a stretch of nucleotides (duplex-forming segment) that forms the other half of the dsRNA duplex of the protein-binding segment of the DNA-targeting RNA. In other words, a stretch of nucleotides of a crRNA-like molecule is complementary to and hybridizes with a stretch of nucleotides of a tracrRNA-like molecule to form the dsRNA duplex of the protein-binding segment of the DNA-targeting RNA. As such, each crRNA-like molecule can be said to have a corresponding tracrRNA-like molecule. The crRNA-like molecule additionally provides the single stranded DNA-targeting segment. Thus, a crRNA-like and a tracrRNA-like molecule (as a corresponding pair) hybridize to form a DNA-targeting RNA. The exact sequence of a given crRNA or tracrRNA molecule is characteristic of the species in which the RNA molecules are found.

The term “activator-RNA” is used herein to mean a tracrRNA-like molecule of a double-molecule DNA-targeting RNA. The term “targeter-RNA” is used herein to mean a crRNA-like molecule of a double-molecule DNA-targeting RNA. The term “duplex-forming segment” is used herein to mean the stretch of nucleotides of an activator-RNA or a targeter-RNA that contributes to the formation of the dsRNA duplex by hybridizing to a stretch of nucleotides of a corresponding activator-RNA or targeter-RNA molecule. In other words, an activator-RNA comprises a duplex-forming segment that is complementary to the duplex-forming segment of the corresponding targeter-RNA. As such, an activator-RNA comprises a duplex-forming segment while a targeter-RNA comprises both a duplex-forming segment and the DNA-targeting segment of the DNA-targeting RNA. Therefore, a subject double-molecule DNA-targeting RNA can comprise any corresponding activator-RNA and targeter-RNA pair.

A two-molecule DNA-targeting RNA can be designed to allow for controlled (i.e., conditional) binding of a targeter-RNA with an activator-RNA. Because a two-molecule DNA-targeting RNA is not functional unless both the activator-RNA and the targeter-RNA are bound in a functional complex with dCas9, a two-molecule DNA-targeting RNA can be made inducible (e.g., drug inducible) by rendering the binding between the activator-RNA and the targeter-RNA inducible. As one non-limiting example, RNA aptamers can be used to regulate (i.e., control) the binding of the activator-RNA with the targeter-RNA. Accordingly, the activator-RNA and/or the targeter-RNA can comprise an RNA aptamer sequence.

RNA aptamers are known in the art and are generally a synthetic version of a riboswitch. The terms “RNA aptamer” and “riboswitch” are used interchangeably herein to encompass both synthetic and natural nucleic acid sequences that provide for inducible regulation of the structure (and therefore the availability of specific sequences) of the RNA molecule of which they are part. RNA aptamers usually comprise a sequence that folds into a particular structure (e.g., a hairpin), which specifically binds a particular drug (e.g., a small molecule). Binding of the drug causes a structural change in the folding of the RNA, which changes a feature of the nucleic acid of which the aptamer is a part. As non-limiting examples: (i) an activator-RNA with an aptamer may not be able to bind to the cognate targeter-RNA unless the aptamer is bound by the appropriate drug; (ii) a targeter-RNA with an aptamer may not be able to bind to the cognate activator-RNA unless the aptamer is bound by the appropriate drug; and (iii) a targeter-RNA and an activator-RNA, each comprising a different aptamer that binds a different drug, may not be able to bind to each other unless both drugs are present. As illustrated by these examples, a two-molecule DNA-targeting RNA can be designed to be inducible.
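The three inducible configurations above behave like simple logic gates. The following Python sketch models them under an assumption made only for illustration: that an aptamer-bearing strand can hybridize if and only if its cognate drug is present. The class `Strand`, the function `guide_assembles`, and the drug names "A" and "B" are hypothetical and do not describe any particular aptamer.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Strand:
    """One RNA of a two-molecule guide (hypothetical model)."""
    name: str
    aptamer_drug: Optional[str] = None  # drug the strand's aptamer must bind; None = no aptamer

    def can_pair(self, drugs_present: FrozenSet[str]) -> bool:
        # Assumption: with no aptamer the duplex-forming segment is always available;
        # with an aptamer it is available only when the cognate drug is present.
        return self.aptamer_drug is None or self.aptamer_drug in drugs_present


def guide_assembles(targeter: Strand, activator: Strand,
                    drugs_present: FrozenSet[str]) -> bool:
    """The two-molecule guide forms only if both strands can hybridize."""
    return targeter.can_pair(drugs_present) and activator.can_pair(drugs_present)


# Example (iii): each strand carries a different aptamer, so both drugs are needed.
targeter = Strand("targeter-RNA", aptamer_drug="A")
activator = Strand("activator-RNA", aptamer_drug="B")
print(guide_assembles(targeter, activator, frozenset({"A"})))       # False
print(guide_assembles(targeter, activator, frozenset({"A", "B"})))  # True
```

Cases (i) and (ii) correspond to giving only one of the two strands an `aptamer_drug`.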

Examples of aptamers and riboswitches can be found, for example, in: Nakamura et al., Genes Cells. 2012 May; 17(5):344-64; Vavalle et al., Future Cardiol. 2012 May; 8(3):371-82; Citartan et al., Biosens Bioelectron. 2012 Apr. 15; 34(1):1-11; and Liberman et al., Wiley Interdiscip Rev RNA. 2012 May-June; 3(3):369-84; all of which are herein incorporated by reference in their entirety.

Non-limiting examples of nucleotide sequences that can be included in a two-molecule DNA-targeting RNA include targeter-RNAs (e.g., SEQ ID NOs:566-567) that can pair with the duplex-forming segment of any one of the activator-RNAs set forth in SEQ ID NOs:671-678.

An exemplary single-molecule DNA-targeting RNA comprises two complementary stretches of nucleotides that hybridize to form a dsRNA duplex. In some embodiments, one of the two complementary stretches of nucleotides of the single-molecule DNA-targeting RNA (or the DNA encoding the stretch) is at least about 60% identical to one of the activator-RNA (tracrRNA) sequences set forth in SEQ ID NOs:431-562 over a stretch of at least 8 contiguous nucleotides. For example, one of the two complementary stretches of nucleotides of the single-molecule DNA-targeting RNA (or the DNA encoding the stretch) is at least about 65% identical, at least about 70% identical, at least about 75% identical, at least about 80% identical, at least about 85% identical, at least about 90% identical, at least about 95% identical, at least about 98% identical, at least about 99% identical or 100% identical to one of the tracrRNA sequences set forth in SEQ ID NOs:431-562 over a stretch of at least 8 contiguous nucleotides.

In some embodiments, one of the two complementary stretches of nucleotides of the single-molecule DNA-targeting RNA (or the DNA encoding the stretch) is at least about 60% identical to one of the targeter-RNA (crRNA) sequences set forth in SEQ ID NOs:563-679 over a stretch of at least 8 contiguous nucleotides. For example, one of the two complementary stretches of nucleotides of the single-molecule DNA-targeting RNA (or the DNA encoding the stretch) is at least about 65% identical, at least about 70% identical, at least about 75% identical, at least about 80% identical, at least about 85% identical, at least about 90% identical, at least about 95% identical, at least about 98% identical, at least about 99% identical or 100% identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679 over a stretch of at least 8 contiguous nucleotides.
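Neither paragraph specifies how identity "over a stretch of at least 8 contiguous nucleotides" is to be computed. One plausible reading, assumed here for illustration, is the best ungapped match between any 8-nt window of the candidate stretch and any 8-nt window of a listed sequence. The Python sketch below implements that reading; the function names and example sequences are placeholders rather than sequences from the SEQ ID listing, and a gapped alignment tool could give different numbers.

```python
def _normalize(seq: str) -> str:
    # The text covers the RNA "or the DNA encoding the stretch", so T and U are treated alike.
    return seq.upper().replace("T", "U")


def best_window_identity(candidate: str, reference: str, window: int = 8) -> float:
    """Highest fraction of identical positions between any `window`-nt stretch of
    `candidate` and any `window`-nt stretch of `reference` (ungapped)."""
    if window < 1:
        raise ValueError("window must be at least 1 nt")
    c, r = _normalize(candidate), _normalize(reference)
    if len(c) < window or len(r) < window:
        raise ValueError("both sequences must be at least `window` nt long")
    best = 0.0
    for i in range(len(c) - window + 1):
        for j in range(len(r) - window + 1):
            matches = sum(a == b for a, b in zip(c[i:i + window], r[j:j + window]))
            best = max(best, matches / window)
    return best


def meets_identity(candidate: str, reference: str,
                   threshold: float = 0.60, window: int = 8) -> bool:
    """True if some window reaches the identity threshold (60% by default)."""
    return best_window_identity(candidate, reference, window) >= threshold


print(best_window_identity("AAAAACCC", "AAAAAGGG"))  # 0.625 (5 of 8 identical)
print(meets_identity("GTTTTAGA", "GUUUUAGA"))        # True (DNA vs. RNA, identical)
```

Raising `threshold` to 0.65, 0.70, and so on reproduces the stricter embodiments listed above.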

As above, a “host cell,” as used herein, denotes an in vivo or in vitro eukaryotic cell, a prokaryotic cell (e.g., bacterial or archaeal cell), or a cell from a multicellular organism (e.g., a cell line) cultured as a unicellular entity, which eukaryotic or prokaryotic cells can be, or have been, used as recipients for a nucleic acid, and include the progeny of the original cell which has been transformed by the nucleic acid. It is understood that the progeny of a single cell may not necessarily be completely identical in morphology or in genomic or total DNA complement to the original parent, due to natural, accidental, or deliberate mutation. A “recombinant host cell” (also referred to as a “genetically modified host cell”) is a host cell into which a heterologous nucleic acid, e.g., an expression vector, has been introduced. For example, a subject bacterial host cell is a genetically modified bacterial host cell by virtue of introduction into a suitable bacterial host cell of an exogenous nucleic acid (e.g., a plasmid or recombinant expression vector) and a subject eukaryotic host cell is a genetically modified eukaryotic host cell (e.g., a mammalian germ cell), by virtue of introduction into a suitable eukaryotic host cell of an exogenous nucleic acid.

Definitions provided in “Definitions—Part I” are also applicable to the instant section; see “Definitions—Part I” for additional clarification of terms.

Before the present invention is further described, it is to be understood that this invention is not limited to particular embodiments described, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present invention will be limited only by the appended claims.

Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range and any other stated or intervening value in that stated range, is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges, and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present invention, the preferred methods and materials are now described. All publications mentioned herein are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited.

It must be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “an enzymatically inactive Cas9 polypeptide” includes a plurality of such polypeptides and reference to “the target nucleic acid” includes reference to one or more target nucleic acids and equivalents thereof known to those skilled in the art, and so forth. It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely,” “only” and the like in connection with the recitation of claim elements, or use of a “negative” limitation.

It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable sub-combination. All combinations of the embodiments pertaining to the invention are specifically embraced by the present invention and are disclosed herein just as if each and every combination was individually and explicitly disclosed. In addition, all sub-combinations of the various embodiments and elements thereof are also specifically embraced by the present invention and are disclosed herein just as if each and every such sub-combination was individually and explicitly disclosed herein.

The publications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided may be different from the actual publication dates which may need to be independently confirmed.

Detailed Description—Part II

The present disclosure provides methods of modulating transcription of a target nucleic acid in a host cell. The methods generally involve contacting the target nucleic acid with an enzymatically inactive Cas9 polypeptide and a single-guide RNA. The methods are useful in a variety of applications, which are also provided.

A transcriptional modulation method of the present disclosure overcomes some of the drawbacks of methods involving RNAi. A transcriptional modulation method of the present disclosure finds use in a wide variety of applications, including research applications, drug discovery (e.g., high throughput screening), target validation, industrial applications (e.g., crop engineering; microbial engineering, etc.), diagnostic applications, therapeutic applications, and imaging techniques.

Methods of Modulating Transcription

The present disclosure provides a method of selectively modulating transcription of a target DNA in a host cell. The method generally involves: a) introducing into the host cell: i) a DNA-targeting RNA, or a nucleic acid comprising a nucleotide sequence encoding the DNA-targeting RNA; and ii) a variant Cas9 site-directed polypeptide (“variant Cas9 polypeptide”), or a nucleic acid comprising a nucleotide sequence encoding the variant Cas9 polypeptide, where the variant Cas9 polypeptide exhibits reduced endodeoxyribonuclease activity.

The DNA-targeting RNA (also referred to herein as “crRNA”; or “guide RNA”; or “gRNA”) comprises: i) a first segment comprising a nucleotide sequence that is complementary to a target sequence in a target DNA; ii) a second segment that interacts with a site-directed polypeptide; and iii) a transcriptional terminator. The first segment, comprising a nucleotide sequence that is complementary to a target sequence in a target DNA, is referred to herein as a “targeting segment”. The second segment, which interacts with a site-directed polypeptide, is also referred to herein as a “protein-binding sequence” or “dCas9-binding hairpin,” or “dCas9 handle.” By “segment” it is meant a segment/section/region of a molecule, e.g., a contiguous stretch of nucleotides in an RNA. The definition of “segment,” unless otherwise specifically defined in a particular context, is not limited to a specific number of total base pairs, and may include regions of RNA molecules that are of any total length and may or may not include regions with complementarity to other molecules. A DNA-targeting RNA according to the present disclosure can be a single RNA molecule (single RNA polynucleotide), which can be referred to herein as a “single-molecule DNA-targeting RNA,” a “single-guide RNA,” or an “sgRNA.” A DNA-targeting RNA according to the present disclosure can comprise two RNA molecules. The term “DNA-targeting RNA” or “gRNA” is inclusive, referring both to two-molecule DNA-targeting RNAs and to single-molecule DNA-targeting RNAs (i.e., sgRNAs).
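The three-segment layout of a single-molecule DNA-targeting RNA can be represented as a small data structure. In the Python sketch below, the class name `SingleGuideRNA`, the 5′-to-3′ ordering of the segments, and the toy sequences are assumptions made for illustration; the paragraph above names the segments but does not fix their order.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

RNA_BASES = frozenset("ACGU")
SEGMENTS = ("targeting_segment", "protein_binding_segment", "terminator")


@dataclass(frozen=True)
class SingleGuideRNA:
    """Hypothetical model of a single-molecule DNA-targeting RNA (sgRNA)."""
    targeting_segment: str        # i) complementary to the target sequence
    protein_binding_segment: str  # ii) "dCas9 handle" / dCas9-binding hairpin
    terminator: str               # iii) transcriptional terminator

    def __post_init__(self) -> None:
        for name in SEGMENTS:
            seq = getattr(self, name)
            if not seq or not set(seq) <= RNA_BASES:
                raise ValueError(f"{name} must be a non-empty A/C/G/U string")

    @property
    def sequence(self) -> str:
        # Assumed order, 5' -> 3': targeting segment, handle, terminator.
        return "".join(getattr(self, name) for name in SEGMENTS)

    def spans(self) -> Dict[str, Tuple[int, int]]:
        """0-based, end-exclusive coordinates of each segment within `sequence`."""
        out, start = {}, 0
        for name in SEGMENTS:
            end = start + len(getattr(self, name))
            out[name] = (start, end)
            start = end
        return out


guide = SingleGuideRNA("ACGU" * 5, "GGGGAAACCCC", "UUUU")  # toy sequences
print(guide.spans())
```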

The variant Cas9 site-directed polypeptide comprises: i) an RNA-binding portion that interacts with the DNA-targeting RNA; and ii) an activity portion that exhibits reduced endodeoxyribonuclease activity.

The DNA-targeting RNA and the variant Cas9 polypeptide form a complex in the host cell; the complex selectively modulates transcription of a target DNA in the host cell.

In some cases, a transcription modulation method of the present disclosure provides for selective modulation (e.g., reduction or increase) of transcription of a target nucleic acid in a host cell. For example, “selective” reduction of transcription of a target nucleic acid reduces transcription of the target nucleic acid by at least about 10%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, or greater than 90%, compared to the level of transcription of the target nucleic acid in the absence of a DNA-targeting RNA/variant Cas9 polypeptide complex. Selective reduction of transcription of a target nucleic acid reduces transcription of the target nucleic acid, but does not substantially reduce transcription of a non-target nucleic acid, e.g., transcription of a non-target nucleic acid is reduced, if at all, by less than 10% compared to the level of transcription of the non-target nucleic acid in the absence of the DNA-targeting RNA/variant Cas9 polypeptide complex.
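The percentage criteria for selective reduction amount to simple arithmetic on transcription levels measured with and without the complex. The Python sketch below assumes, for illustration, that transcription is quantified as one positive level per nucleic acid (for example, a normalized transcript abundance); the function names and numbers are hypothetical. The defaults use the loosest thresholds recited: at least about 10% for the target and less than 10% for a non-target.

```python
def percent_reduction(baseline: float, treated: float) -> float:
    """Percent decrease in transcription relative to the no-complex baseline."""
    if baseline <= 0:
        raise ValueError("baseline transcription level must be positive")
    return 100.0 * (baseline - treated) / baseline


def is_selective_reduction(target_baseline: float, target_treated: float,
                           nontarget_baseline: float, nontarget_treated: float,
                           min_target_pct: float = 10.0,
                           max_nontarget_pct: float = 10.0) -> bool:
    """Target reduced by at least `min_target_pct`; non-target by less than `max_nontarget_pct`."""
    target_drop = percent_reduction(target_baseline, target_treated)
    nontarget_drop = percent_reduction(nontarget_baseline, nontarget_treated)
    return target_drop >= min_target_pct and nontarget_drop < max_nontarget_pct


# Hypothetical normalized transcript levels (arbitrary units), not data from this disclosure.
print(percent_reduction(100.0, 25.0))                   # 75.0
print(is_selective_reduction(100.0, 25.0, 50.0, 48.0))  # True: 75% vs. 4%
```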

Increased Transcription

“Selective” increased transcription of a target DNA can increase transcription of the target DNA by at least about 1.1 fold (e.g., at least about 1.2 fold, at least about 1.3 fold, at least about 1.4 fold, at least about 1.5 fold, at least about 1.6 fold, at least about 1.7 fold, at least about 1.8 fold, at least about 1.9 fold, at least about 2 fold, at least about 2.5 fold, at least about 3 fold, at least about 3.5 fold, at least about 4 fold, at least about 4.5 fold, at least about 5 fold, at least about 6 fold, at least about 7 fold, at least about 8 fold, at least about 9 fold, at least about 10 fold, at least about 12 fold, at least about 15 fold, or at least about 20-fold) compared to the level of transcription of the target DNA in the absence of a DNA-targeting RNA/variant Cas9 polypeptide complex. Selective increase of transcription of a target DNA increases transcription of the target DNA, but does not substantially increase transcription of a non-target DNA, e.g., transcription of a non-target DNA is increased, if at all, by less than about 5-fold (e.g., less than about 4-fold, less than about 3-fold, less than about 2-fold, less than about 1.8-fold, less than about 1.6-fold, less than about 1.4-fold, less than about 1.2-fold, or less than about 1.1-fold) compared to the level of transcription of the non-targeted DNA in the absence of the DNA-targeting RNA/variant Cas9 polypeptide complex.
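The fold-change criteria for selective increase can be checked with a few lines of arithmetic. In this Python sketch, the function names, the use of a single positive transcription level per DNA, and the example values are assumptions; the defaults are the loosest thresholds recited (at least about 1.1-fold for the target, less than about 5-fold for a non-target).

```python
def fold_change(baseline: float, treated: float) -> float:
    """Transcription with the complex divided by transcription without it."""
    if baseline <= 0:
        raise ValueError("baseline transcription level must be positive")
    return treated / baseline


def is_selective_increase(target_baseline: float, target_treated: float,
                          nontarget_baseline: float, nontarget_treated: float,
                          min_target_fold: float = 1.1,
                          max_nontarget_fold: float = 5.0) -> bool:
    """Target up at least `min_target_fold`-fold; non-target below `max_nontarget_fold`-fold."""
    return (fold_change(target_baseline, target_treated) >= min_target_fold
            and fold_change(nontarget_baseline, nontarget_treated) < max_nontarget_fold)


# Hypothetical levels: target rises 3.5-fold, non-target 1.2-fold.
print(fold_change(10.0, 35.0))                        # 3.5
print(is_selective_increase(10.0, 35.0, 10.0, 12.0))  # True
```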

As a non-limiting example, increased transcription can be achieved by fusing dCas9 to a heterologous sequence. Suitable fusion partners include, but are not limited to, a polypeptide that provides an activity that indirectly increases transcription by acting directly on the target DNA or on a polypeptide (e.g., a histone or other DNA-binding protein) associated with the target DNA. Suitable fusion partners include, but are not limited to, a polypeptide that provides for methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, or demyristoylation activity.

Additional suitable fusion partners include, but are not limited to, a polypeptide that directly provides for increased transcription of the target nucleic acid (e.g., a transcription activator or a fragment thereof, a protein or fragment thereof that recruits a transcription activator, a small molecule/drug-responsive transcription regulator, etc.).

A non-limiting example of a subject method using a dCas9 fusion protein to increase transcription in a prokaryote includes a modification of the bacterial one-hybrid (B1H) or two-hybrid (B2H) system. In the B1H system, a DNA binding domain (BD) is fused to a bacterial transcription activation domain (AD, e.g., the alpha subunit of the Escherichia coli RNA polymerase (RNAPα)). Thus, a subject dCas9 can be fused to a heterologous sequence comprising an AD. When the subject dCas9 fusion protein arrives at the upstream region of a promoter (targeted there by the DNA-targeting RNA), the AD (e.g., RNAPα) of the dCas9 fusion protein recruits the RNAP holoenzyme, leading to transcription activation. In the B2H system, the BD is not directly fused to the AD; instead, their interaction is mediated by a protein-protein interaction (e.g., GAL11P-GAL4 interaction). To modify such a system for use in the subject methods, dCas9 can be fused to a first protein sequence that provides for protein-protein interaction (e.g., the yeast GAL11P and/or GAL4 protein) and RNAPα can be fused to a second protein sequence that completes the protein-protein interaction (e.g., GAL4 if GAL11P is fused to dCas9, GAL11P if GAL4 is fused to dCas9, etc.). The binding affinity between GAL11P and GAL4 increases the efficiency of binding and the transcription firing rate.

A non-limiting example of a subject method using a dCas9 fusion protein to increase transcription in a eukaryote includes fusion of dCas9 to an activation domain (AD) (e.g., GAL4, herpesvirus activation protein VP16 or VP64, human nuclear factor NF-κB p65 subunit, etc.). To render the system inducible, expression of the dCas9 fusion protein can be controlled by an inducible promoter (e.g., Tet-ON, Tet-OFF, etc.). The DNA-targeting RNA can be designed to target known transcription response elements (e.g., promoters, enhancers, etc.), known upstream activating sequences (UAS), sequences of unknown or known function that are suspected of being able to control expression of the target DNA, etc.

Additional Fusion Partners

Non-limiting examples of fusion partners to accomplish increased or decreased transcription are listed in FIGS. 54A-54C and include transcription activator and transcription repressor domains (e.g., the Krüppel associated box (KRAB or SKD); the Mad mSIN3 interaction domain (SID); the ERF repressor domain (ERD), etc.). In some such cases, the dCas9 fusion protein is targeted by the DNA-targeting RNA to a specific location (i.e., sequence) in the target DNA and exerts locus-specific regulation such as blocking RNA polymerase binding to a promoter (which selectively inhibits transcription activator function), and/or modifying the local chromatin status (e.g., when a fusion sequence is used that modifies the target DNA or modifies a polypeptide associated with the target DNA). In some cases, the changes are transient (e.g., transcription repression or activation). In some cases, the changes are inheritable (e.g., when epigenetic modifications are made to the target DNA or to proteins associated with the target DNA, e.g., nucleosomal histones).

In some embodiments, the heterologous sequence can be fused to the C-terminus of the dCas9 polypeptide. In some embodiments, the heterologous sequence can be fused to the N-terminus of the dCas9 polypeptide. In some embodiments, the heterologous sequence can be fused to an internal portion (i.e., a portion other than the N- or C-terminus) of the dCas9 polypeptide.

The biological effects of a method using a subject dCas9 fusion protein can be detected by any convenient method (e.g., gene expression assays; chromatin-based assays, e.g., chromatin immunoprecipitation (ChIP), Chromatin in vivo Assay (CiA), etc.; and the like).

In some cases, a subject method involves use of two or more different DNA-targeting RNAs. For example, two different DNA-targeting RNAs can be used in a single host cell, where the two different DNA-targeting RNAs target two different target sequences in the same target nucleic acid.

Thus, for example, a subject transcriptional modulation method can further comprise introducing into the host cell a second DNA-targeting RNA, or a nucleic acid comprising a nucleotide sequence encoding the second DNA-targeting RNA, where the second DNA-targeting RNA comprises: i) a first segment comprising a nucleotide sequence that is complementary to a second target sequence in the target DNA; ii) a second segment that interacts with the site-directed polypeptide; and iii) a transcriptional terminator. In some cases, use of two different DNA-targeting RNAs targeting two different targeting sequences in the same target nucleic acid provides for increased modulation (e.g., reduction or increase) in transcription of the target nucleic acid.

As another example, two different DNA-targeting RNAs can be used in a single host cell, where the two different DNA-targeting RNAs target two different target nucleic acids. Thus, for example, a subject transcriptional modulation method can further comprise introducing into the host cell a second DNA-targeting RNA, or a nucleic acid comprising a nucleotide sequence encoding the second DNA-targeting RNA, where the second DNA-targeting RNA comprises: i) a first segment comprising a nucleotide sequence that is complementary to a target sequence in at least a second target DNA; ii) a second segment that interacts with the site-directed polypeptide; and iii) a transcriptional terminator.

In some embodiments, a subject nucleic acid (e.g., a DNA-targeting RNA, e.g., a single-molecule DNA-targeting RNA, an activator-RNA, a targeter-RNA, etc.; a donor polynucleotide; a nucleic acid encoding a site-directed modifying polypeptide; etc.) comprises a modification or sequence that provides for an additional desirable feature (e.g., modified or regulated stability; subcellular targeting; tracking, e.g., a fluorescent label; a binding site for a protein or protein complex; etc.). Non-limiting examples include: a 5′ cap (e.g., a 7-methylguanylate cap (m7G)); a 3′ polyadenylated tail (i.e., a 3′ poly(A) tail); a riboswitch sequence or an aptamer sequence (e.g., to allow for regulated stability and/or regulated accessibility by proteins and/or protein complexes); a terminator sequence; a sequence that forms a dsRNA duplex (i.e., a hairpin); a modification or sequence that targets the RNA to a subcellular location (e.g., nucleus, mitochondria, chloroplasts, and the like); a modification or sequence that provides for tracking (e.g., direct conjugation to a fluorescent molecule, conjugation to a moiety that facilitates fluorescent detection, a sequence that allows for fluorescent detection, etc.); a modification or sequence that provides a binding site for proteins (e.g., proteins that act on DNA, including transcriptional activators, transcriptional repressors, DNA methyltransferases, DNA demethylases, histone acetyltransferases, histone deacetylases, and the like); and combinations thereof.

DNA-Targeting Segment

The DNA-targeting segment (or “DNA-targeting sequence”) of a DNA-targeting RNA (“crRNA”) comprises a nucleotide sequence that is complementary to a specific sequence within a target DNA (the complementary strand of the target DNA).

In other words, the DNA-targeting segment of a subject DNA-targeting RNA interacts with a target DNA in a sequence-specific manner via hybridization (i.e., base pairing). As such, the nucleotide sequence of the DNA-targeting segment may vary and determines the location within the target DNA at which the DNA-targeting RNA and the target DNA will interact. The DNA-targeting segment of a subject DNA-targeting RNA can be modified (e.g., by genetic engineering) to hybridize to any desired sequence within a target DNA.
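Because the DNA-targeting segment base-pairs with the complementary strand of the target DNA, a fully complementary segment can be derived by reverse-complementing the target sequence and writing U in place of T. The Python sketch below shows this for a toy input; the function name is hypothetical, both sequences are written 5′ to 3′, and the sketch does not check any other requirement a real target site may have.

```python
# DNA base -> RNA base that pairs with it (Watson-Crick).
_PAIRING_RNA_BASE = str.maketrans("ACGT", "UGCA")


def targeting_segment_for(target_strand: str) -> str:
    """RNA (5'->3') that hybridizes antiparallel to `target_strand` (DNA, 5'->3')."""
    seq = target_strand.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("target_strand must be a non-empty A/C/G/T sequence")
    # Complement each base, then reverse so the RNA also reads 5'->3'.
    return seq.translate(_PAIRING_RNA_BASE)[::-1]


print(targeting_segment_for("AACGTTGCAT"))  # AUGCAACGUU
```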

The DNA-targeting segment can have a length of from about 12 nucleotides to about 100 nucleotides. For example, the DNA-targeting segment can have a length of from about 12 nucleotides (nt) to about 80 nt, from about 12 nt to about 50 nt, from about 12 nt to about 40 nt, from about 12 nt to about 30 nt, from about 12 nt to about 25 nt, from about 12 nt to about 20 nt, or from about 12 nt to about 19 nt. For example, the DNA-targeting segment can have a length of from about 19 nt to about 20 nt, from about 19 nt to about 25 nt, from about 19 nt to about 30 nt, from about 19 nt to about 35 nt, from about 19 nt to about 40 nt, from about 19 nt to about 45 nt, from about 19 nt to about 50 nt, from about 19 nt to about 60 nt, from about 19 nt to about 70 nt, from about 19 nt to about 80 nt, from about 19 nt to about 90 nt, from about 19 nt to about 100 nt, from about 20 nt to about 25 nt, from about 20 nt to about 30 nt, from about 20 nt to about 35 nt, from about 20 nt to about 40 nt, from about 20 nt to about 45 nt, from about 20 nt to about 50 nt, from about 20 nt to about 60 nt, from about 20 nt to about 70 nt, from about 20 nt to about 80 nt, from about 20 nt to about 90 nt, or from about 20 nt to about 100 nt.

The nucleotide sequence (the DNA-targeting sequence) of the DNA-targeting segment that is complementary to a nucleotide sequence (target sequence) of the target DNA can have a length of at least about 12 nt. For example, the DNA-targeting sequence of the DNA-targeting segment that is complementary to a target sequence of the target DNA can have a length of at least about 12 nt, at least about 15 nt, at least about 18 nt, at least about 19 nt, at least about 20 nt, at least about 25 nt, at least about 30 nt, at least about 35 nt or at least about 40 nt. For example, the DNA-targeting sequence of the DNA-targeting segment that is complementary to a target sequence of the target DNA can have a length of from about 12 nucleotides (nt) to about 80 nt, from about 12 nt to about 50 nt, from about 12 nt to about 45 nt, from about 12 nt to about 40 nt, from about 12 nt to about 35 nt, from about 12 nt to about 30 nt, from about 12 nt to about 25 nt, from about 12 nt to about 20 nt, from about 12 nt to about 19 nt, from about 19 nt to about 20 nt, from about 19 nt to about 25 nt, from about 19 nt to about 30 nt, from about 19 nt to about 35 nt, from about 19 nt to about 40 nt, from about 19 nt to about 45 nt, from about 19 nt to about 50 nt, from about 19 nt to about 60 nt, from about 20 nt to about 25 nt, from about 20 nt to about 30 nt, from about 20 nt to about 35 nt, from about 20 nt to about 40 nt, from about 20 nt to about 45 nt, from about 20 nt to about 50 nt, or from about 20 nt to about 60 nt.

In some cases, the DNA-targeting sequence of the DNA-targeting segment that is complementary to a target sequence of the target DNA is 20 nucleotides in length. In some cases, the DNA-targeting sequence of the DNA-targeting segment that is complementary to a target sequence of the target DNA is 19 nucleotides in length.

The percent complementarity between the DNA-targeting sequence of the DNA-targeting segment and the target sequence of the target DNA can be at least 60% (e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100%). In some cases, the percent complementarity between the DNA-targeting sequence of the DNA-targeting segment and the target sequence of the target DNA is 100% over the seven contiguous 5′-most nucleotides of the target sequence of the complementary strand of the target DNA. In some cases, the percent complementarity between the DNA-targeting sequence of the DNA-targeting segment and the target sequence of the target DNA is at least 60% over about 20 contiguous nucleotides. In some cases, the percent complementarity between the DNA-targeting sequence of the DNA-targeting segment and the target sequence of the target DNA is 100% over the fourteen contiguous 5′-most nucleotides of the target sequence of the complementary strand of the target DNA and as low as 0% over the remainder. In such a case, the DNA-targeting sequence can be considered to be 14 nucleotides in length. In some cases, the percent complementarity between the DNA-targeting sequence of the DNA-targeting segment and the target sequence of the target DNA is 100% over the seven contiguous 5′-most nucleotides of the target sequence of the complementary strand of the target DNA and as low as 0% over the remainder. In such a case, the DNA-targeting sequence can be considered to be 7 nucleotides in length.
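The percent-complementarity and 5′-most-nucleotide criteria above can be evaluated position by position once the two sequences are aligned antiparallel. The Python sketch below assumes equal-length, ungapped sequences written 5′ to 3′, counts only Watson-Crick pairs (no G·U wobble), and uses hypothetical function names and toy sequences.

```python
# (RNA base, DNA base) combinations that form Watson-Crick pairs.
_WATSON_CRICK = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}


def paired_flags(guide_rna: str, target_dna: str) -> list:
    """For each target nucleotide (5'->3'), whether the guide pairs with it."""
    g, t = guide_rna.upper(), target_dna.upper()
    if not t or len(g) != len(t):
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(t)
    # Antiparallel alignment: target position k faces guide position n-1-k.
    return [(g[n - 1 - k], t[k]) in _WATSON_CRICK for k in range(n)]


def percent_complementarity(guide_rna: str, target_dna: str) -> float:
    flags = paired_flags(guide_rna, target_dna)
    return 100.0 * sum(flags) / len(flags)


def contiguous_5prime_pairs(guide_rna: str, target_dna: str) -> int:
    """Length of the fully paired run starting at the target sequence's 5' end."""
    run = 0
    for paired in paired_flags(guide_rna, target_dna):
        if not paired:
            break
        run += 1
    return run


# Toy example: one mismatch at the 3'-most target nucleotide.
print(percent_complementarity("GGUU", "AACG"))  # 75.0
print(contiguous_5prime_pairs("GGUU", "AACG"))  # 3
```

Under the seven-nucleotide embodiment, for instance, a guide would satisfy the criterion when `contiguous_5prime_pairs(...)` is at least 7.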

Protein-Binding Segment

The protein-binding segment (i.e., “protein-binding sequence”) of a DNA-targeting RNA interacts with a variant site-directed polypeptide. When the variant Cas9 site-directed polypeptide, together with the DNA-targeting RNA, binds to a target DNA, transcription of the target DNA is reduced.

The protein-binding segment of a DNA-targeting RNA comprises two complementary stretches of nucleotides that hybridize to one another to form a double stranded RNA duplex (dsRNA duplex).

The protein-binding segment of a DNA-targeting RNA of the present disclosure comprises two stretches of nucleotides (a targeter-RNA and an activator-RNA) that are complementary to one another, are covalently linked by intervening nucleotides (“linkers” or “linker nucleotides”; e.g., in the case of a single-molecule DNA-targeting RNA), and hybridize to form the double stranded RNA duplex (dsRNA duplex, or “dCas9-binding hairpin”) of the protein-binding segment, thus resulting in a stem-loop structure. This stem-loop structure is shown schematically in FIG. 39A. The targeter-RNA and the activator-RNA can be covalently linked via the 3′ end of the targeter-RNA and the 5′ end of the activator-RNA. Alternatively, the targeter-RNA and the activator-RNA can be covalently linked via the 5′ end of the targeter-RNA and the 3′ end of the activator-RNA.

The protein-binding segment can have a length of from about 10 nucleotides to about 100 nucleotides, e.g., from about 10 nucleotides (nt) to about 20 nt, from about 20 nt to about 30 nt, from about 30 nt to about 40 nt, from about 40 nt to about 50 nt, from about 50 nt to about 60 nt, from about 60 nt to about 70 nt, from about 70 nt to about 80 nt, from about 80 nt to about 90 nt, or from about 90 nt to about 100 nt. For example, the protein-binding segment can have a length of from about 15 nucleotides (nt) to about 80 nt, from about 15 nt to about 50 nt, from about 15 nt to about 40 nt, from about 15 nt to about 30 nt or from about 15 nt to about 25 nt.

The dsRNA duplex of the protein-binding segment can have a length from about 6 base pairs (bp) to about 50 bp. For example, the dsRNA duplex of the protein-binding segment can have a length from about 6 bp to about 40 bp, from about 6 bp to about 30 bp, from about 6 bp to about 25 bp, from about 6 bp to about 20 bp, from about 6 bp to about 15 bp, from about 8 bp to about 40 bp, from about 8 bp to about 30 bp, from about 8 bp to about 25 bp, from about 8 bp to about 20 bp or from about 8 bp to about 15 bp. For example, the dsRNA duplex of the protein-binding segment can have a length from about 8 bp to about 10 bp, from about 10 bp to about 15 bp, from about 15 bp to about 18 bp, from about 18 bp to about 20 bp, from about 20 bp to about 25 bp, from about 25 bp to about 30 bp, from about 30 bp to about 35 bp, from about 35 bp to about 40 bp, or from about 40 bp to about 50 bp. In some embodiments, the dsRNA duplex of the protein-binding segment has a length of 36 base pairs. The percent complementarity between the nucleotide sequences that hybridize to form the dsRNA duplex of the protein-binding segment can be at least about 60%. For example, the percent complementarity between the nucleotide sequences that hybridize to form the dsRNA duplex of the protein-binding segment can be at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or at least about 99%. In some cases, the percent complementarity between the nucleotide sequences that hybridize to form the dsRNA duplex of the protein-binding segment is 100%.

The linker can have a length of from about 3 nucleotides (nt) to about 100 nt. For example, the linker can have a length of from about 3 nt to about 90 nt, from about 3 nt to about 80 nt, from about 3 nt to about 70 nt, from about 3 nt to about 60 nt, from about 3 nt to about 50 nt, from about 3 nt to about 40 nt, from about 3 nt to about 30 nt, from about 3 nt to about 20 nt, or from about 3 nt to about 10 nt. For example, the linker can have a length of from about 3 nt to about 5 nt, from about 5 nt to about 10 nt, from about 10 nt to about 15 nt, from about 15 nt to about 20 nt, from about 20 nt to about 25 nt, from about 25 nt to about 30 nt, from about 30 nt to about 35 nt, from about 35 nt to about 40 nt, from about 40 nt to about 50 nt, from about 50 nt to about 60 nt, from about 60 nt to about 70 nt, from about 70 nt to about 80 nt, from about 80 nt to about 90 nt, or from about 90 nt to about 100 nt. In some embodiments, the linker of a DNA-targeting RNA is 4 nt.

Non-limiting examples of nucleotide sequences that can be included in a suitable protein-binding segment (i.e., dCas9 handle) are set forth in SEQ ID NOs:563-682 (for examples, see FIG. 8 and FIG. 9).

In some cases, a suitable protein-binding segment comprises a nucleotide sequence that differs by 1, 2, 3, 4, or 5 nucleotides from any one of the above-listed sequences.

Stability Control Sequence (e.g., Transcriptional Terminator Segment)

A stability control sequence influences the stability of an RNA (e.g., a DNA-targeting RNA, a targeter-RNA, an activator-RNA, etc.). One example of a suitable stability control sequence is a transcriptional terminator segment (i.e., a transcription termination sequence). A transcriptional terminator segment of a subject DNA-targeting RNA can have a total length of from about 10 nucleotides to about 100 nucleotides, e.g., from about 10 nucleotides (nt) to about 20 nt, from about 20 nt to about 30 nt, from about 30 nt to about 40 nt, from about 40 nt to about 50 nt, from about 50 nt to about 60 nt, from about 60 nt to about 70 nt, from about 70 nt to about 80 nt, from about 80 nt to about 90 nt, or from about 90 nt to about 100 nt. For example, the transcriptional terminator segment can have a length of from about 15 nucleotides (nt) to about 80 nt, from about 15 nt to about 50 nt, from about 15 nt to about 40 nt, from about 15 nt to about 30 nt or from about 15 nt to about 25 nt.

In some cases, the transcription termination sequence is one that is functional in a eukaryotic cell. In some cases, the transcription termination sequence is one that is functional in a prokaryotic cell.

Non-limiting examples of nucleotide sequences that can be included in a stability control sequence (e.g., a transcriptional terminator segment, or any segment of the DNA-targeting RNA, to provide for increased stability) include sequences set forth in SEQ ID NOs:683-696 and, for example, 5′-UAAUCCCACAGCCGCCAGUUCCGCUGGCGGCAUUUU-3′ (SEQ ID NO: 1349) (a Rho-independent trp termination site).
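A Rho-independent terminator such as the trp site above typically consists of a GC-rich hairpin followed by a run of uridines. The toy scan below makes that structure visible in SEQ ID NO: 1349 (read 5′ to 3′) by searching for the longest perfect Watson-Crick stem-loop and then counting uridines just downstream. It is a sketch under simplifying assumptions (no G-U pairs, no bulges, no free-energy model, loops of 3 to 8 nt), not a terminator predictor; real design work would rely on an RNA secondary-structure folding tool.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
TRP = "UAAUCCCACAGCCGCCAGUUCCGCUGGCGGCAUUUU"  # SEQ ID NO: 1349, 5' -> 3'


def longest_hairpin(rna: str, min_loop: int = 3, max_loop: int = 8):
    """Find the longest perfect Watson-Crick stem closing a loop of min_loop..max_loop nt.

    Returns (stem_length, five_arm, loop, three_arm, end), where `end` is the
    0-based index just past the 3' arm, or None if no base pair is found.
    Ties keep the first hairpin encountered.
    """
    rna = rna.upper()
    best = None
    for loop_len in range(min_loop, max_loop + 1):
        for loop_start in range(1, len(rna) - loop_len):
            i, j = loop_start - 1, loop_start + loop_len  # innermost closing pair
            stem = 0
            # Extend the stem outward while the flanking bases keep pairing.
            while (i - stem >= 0 and j + stem < len(rna)
                   and (rna[i - stem], rna[j + stem]) in WATSON_CRICK):
                stem += 1
            if stem and (best is None or stem > best[0]):
                best = (
                    stem,
                    rna[i - stem + 1:i + 1],                # 5' arm
                    rna[loop_start:loop_start + loop_len],  # loop
                    rna[j:j + stem],                        # 3' arm
                    j + stem,                               # index past the 3' arm
                )
    return best


stem, five_arm, loop, three_arm, end = longest_hairpin(TRP)
u_count = TRP[end:end + 8].count("U")  # uridines in the 8 nt after the stem
```

On this sequence the scan reports an 8-bp stem (GCCGCCAG paired with CUGGCGGC) closing a 5-nt loop (UUCCG), followed by A and a 4-nt U-tract, consistent with the hairpin-plus-U-tract architecture described above.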

Additional Sequences

In some embodiments, a DNA-targeting RNA comprises at least one additional segment at either the 5′ or 3′ end. For example, a suitable additional segment can comprise a 5′ cap (e.g., a 7-methylguanylate cap (m7G)); a 3′ polyadenylated tail (i.e., a 3′ poly(A) tail); a riboswitch sequence (e.g., to allow for regulated stability and/or regulated accessibility by proteins and protein complexes); a sequence that forms a dsRNA duplex (i.e., a hairpin); a sequence that targets the RNA to a subcellular location (e.g., nucleus, mitochondria, chloroplasts, and the like); a modification or sequence that provides for tracking (e.g., direct conjugation to a fluorescent molecule, conjugation to a moiety that facilitates fluorescent detection, a sequence that allows for fluorescent detection, etc.); a modification or sequence that provides a binding site for proteins (e.g., proteins that act on DNA, including transcriptional activators, transcriptional repressors, DNA methyltransferases, DNA demethylases, histone acetyltransferases, histone deacetylases, and the like); a modification or sequence that provides for increased, decreased, and/or controllable stability; and combinations thereof.

Multiple Simultaneous DNA-Targeting RNAs

In some embodiments, multiple DNA-targeting RNAs are used simultaneously in the same cell to modulate transcription at different locations on the same target DNA or on different target DNAs. In some embodiments, two or more DNA-targeting RNAs target the same gene, transcript, or locus. In some embodiments, two or more DNA-targeting RNAs target different unrelated loci. In some embodiments, two or more DNA-targeting RNAs target different but related loci.

Because the DNA-targeting RNAs are small and robust, they can be simultaneously present on the same expression vector and can even be under the same transcriptional control if so desired. In some embodiments, two or more (e.g., 3 or more, 4 or more, 5 or more, 10 or more, 15 or more, 20 or more, 25 or more, 30 or more, 35 or more, 40 or more, 45 or more, or 50 or more) DNA-targeting RNAs are simultaneously expressed in a target cell (from the same or different vectors). The expressed DNA-targeting RNAs can be differentially recognized by dCas9 proteins from different bacteria, such as S. pyogenes, S. thermophilus, L. innocua, and N. meningitidis.

To express multiple DNA-targeting RNAs, an artificial RNA processing system mediated by the Csy4 endoribonuclease can be used. Multiple DNA-targeting RNAs can be concatenated into a tandem array on a precursor transcript (e.g., expressed from a U6 promoter), separated by Csy4-specific RNA sequences. Co-expressed Csy4 protein cleaves the precursor transcript into multiple DNA-targeting RNAs. Using an RNA processing system has two advantages: first, there is no need to use multiple promoters; second, because all DNA-targeting RNAs are processed from a single precursor transcript, their concentrations are normalized, allowing for similar dCas9 binding.

Csy4 is a small endoribonuclease (RNase) protein derived from the bacterium Pseudomonas aeruginosa. Csy4 specifically recognizes a minimal 17-bp RNA hairpin and exhibits rapid (<1 min) and highly efficient (>99.9%) RNA cleavage. Unlike the products of most RNases, the cleaved RNA fragments remain stable and functionally active. Csy4-based RNA cleavage can be repurposed into an artificial RNA processing system, in which the 17-bp RNA hairpins are inserted between multiple RNA fragments that are transcribed as a precursor transcript from a single promoter. Co-expression of Csy4 is effective in generating individual RNA fragments.
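The Csy4 processing scheme can be sketched as string operations: build one tandem precursor with a recognition site after each DNA-targeting RNA, then cut at every site. Several details here are assumptions for illustration: `CSY4_SITE` is a hypothetical placeholder (lowercase letters so it can never collide with an A/C/G/U guide), the guide sequences are made up, and cleavage is assumed to occur immediately 3′ of each hairpin so that every upstream product keeps its hairpin. The exact cut position should be checked against the Csy4 literature before relying on it.

```python
from typing import List

# Hypothetical placeholder for the Csy4 recognition hairpin; substitute the
# real hairpin sequence in an actual construct.
CSY4_SITE = "hhhhhhhhhhhhhhhhh"


def build_precursor(guides: List[str], site: str = CSY4_SITE) -> str:
    """Concatenate DNA-targeting RNAs into one tandem array, each followed by a site."""
    return "".join(guide + site for guide in guides)


def csy4_process(precursor: str, site: str = CSY4_SITE) -> List[str]:
    """Simulate Csy4 cleavage, assuming a cut immediately 3' of every site copy.

    Under that assumption each upstream product retains the hairpin at its 3' end.
    """
    products, start = [], 0
    while True:
        hit = precursor.find(site, start)
        if hit == -1:
            break
        cut = hit + len(site)
        products.append(precursor[start:cut])
        start = cut
    if start < len(precursor):
        products.append(precursor[start:])  # trailing RNA with no downstream site
    return products


# Hypothetical toy guides, not real targeting sequences.
guides = ["ACGUACGUAC", "GGGAAACCCU", "UUGCAAUGCA"]
products = csy4_process(build_precursor(guides))
```

Because one precursor molecule yields exactly one copy of each product, the processed RNAs are produced in equimolar amounts, which is the normalization argument made in the text.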

Site-Directed Polypeptide

As noted above, a subject DNA-targeting RNA and a variant Cas9 site-directed polypeptide form a complex. The DNA-targeting RNA provides target specificity to the complex by comprising a nucleotide sequence that is complementary to a sequence of a target DNA.

The variant Cas9 site-directed polypeptide has reduced endodeoxyribonuclease activity. For example, a variant Cas9 site-directed polypeptide suitable for use in a transcription modulation method of the present disclosure exhibits less than about 20%, less than about 15%, less than about 10%, less than about 5%, less than about 1%, or less than about 0.1% of the endodeoxyribonuclease activity of a wild-type Cas9 polypeptide, e.g., a wild-type Cas9 polypeptide comprising an amino acid sequence as depicted in FIG. 3A and FIG. 3B (SEQ ID NO:8). In some embodiments, the variant Cas9 site-directed polypeptide has substantially no detectable endodeoxyribonuclease activity. In some embodiments, when a site-directed polypeptide has reduced catalytic activity (e.g., when a Cas9 protein has a D10, G12, G17, E762, H840, N854, N863, H982, H983, A984, D986, and/or an A987 mutation, e.g., D10A, G12A, G17A, E762A, H840A, N854A, N863A, H982A, H983A, A984A, and/or D986A), the polypeptide can still bind to target DNA in a site-specific manner (because it is still guided to a target DNA sequence by a DNA-targeting RNA) as long as it retains the ability to interact with the DNA-targeting RNA.

In some cases, a suitable variant Cas9 site-directed polypeptide comprises an amino acid sequence having at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 99% or 100% amino acid sequence identity to amino acids 7-166 or 731-1003 of the Cas9/Csn1 amino acid sequence depicted in FIG. 3A and FIG. 3B (SEQ ID NO:8), or to the corresponding portions in any one of the amino acid sequences SEQ ID NOs:1-256 and 795-1346.
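Identity "to amino acids 7-166 or 731-1003" involves two steps that are easy to get wrong: extracting a 1-based, inclusive residue range, and scoring identity over an alignment. The sketch below shows both under stated assumptions: the sequences are already aligned to the same length (gaps written as `-`), and the denominator is the alignment length. Other conventions exist (for example, dividing by the reference length), and producing the alignment itself would require an alignment tool such as BLAST or a Smith-Waterman implementation, which is not shown.

```python
def segment(seq: str, first: int, last: int) -> str:
    """Residues first..last using 1-based, inclusive numbering (e.g. 7-166)."""
    if not 1 <= first <= last <= len(seq):
        raise ValueError("range outside sequence")
    return seq[first - 1:last]  # Python slices are 0-based and end-exclusive


def percent_identity(query: str, reference: str) -> float:
    """Percent identical columns between two pre-aligned, equal-length sequences.

    Gap characters ('-') never count as matches; the denominator is the
    alignment length (an assumption; other conventions exist).
    """
    if len(query) != len(reference) or not query:
        raise ValueError("sequences must be pre-aligned to the same non-zero length")
    matches = sum(q == r and q != "-" for q, r in zip(query, reference))
    return 100.0 * matches / len(query)
```

For instance, `segment(seq, 7, 166)` returns a 160-residue window, and two aligned toy fragments that differ at one of five columns score 80%.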

In some cases, the variant Cas9 site-directed polypeptide can cleave the complementary strand of the target DNA but has reduced ability to cleave the non-complementary strand of the target DNA. For example, the variant Cas9 site-directed polypeptide can have a mutation (amino acid substitution) that reduces the function of the RuvC domain (e.g., “domain 1” of FIG. 3B). As a non-limiting example, in some cases, the variant Cas9 site-directed polypeptide harbors a D10A (aspartate to alanine) mutation of the amino acid sequence depicted in FIG. 3A and FIG. 3B (or the corresponding mutation of any of the amino acid sequences set forth in SEQ ID NOs:1-256 and 795-1346).

In some cases, the variant Cas9 site-directed polypeptide can cleave the non-complementary strand of the target DNA but has reduced ability to cleave the complementary strand of the target DNA. For example, the variant Cas9 site-directed polypeptide can have a mutation (amino acid substitution) that reduces the function of the HNH domain (RuvC/HNH/RuvC domain motifs, “domain 2” of FIG. 3B). As a non-limiting example, in some cases, the variant Cas9 site-directed polypeptide harbors an H840A mutation (histidine to alanine at amino acid position 840 of SEQ ID NO:8) or the corresponding mutation of any of the amino acid sequences set forth in SEQ ID NOs:1-256 and 795-1346.

In some cases, the variant Cas9 site-directed polypeptide has a reduced ability to cleave both the complementary and the non-complementary strands of the target DNA. As a non-limiting example, in some cases, the variant Cas9 site-directed polypeptide harbors both D10A and H840A mutations of the amino acid sequence depicted in FIG. 3A and FIG. 3B (or the corresponding mutations of any of the amino acid sequences set forth in SEQ ID NOs:1-256 and 795-1346).

Other residues can be mutated to achieve the same effect (i.e., to inactivate one or the other nuclease portion). As non-limiting examples, residues D10, G12, G17, E762, H840, N854, N863, H982, H983, A984, D986, and/or A987 (or the corresponding mutations of any of the proteins set forth as SEQ ID NOs:1-256 and 795-1346) can be altered (i.e., substituted) (see FIG. 3A-3B, FIG. 5, FIG. 11A, and Table 1 for more information regarding the conservation of Cas9 amino acid residues). Also, mutations other than alanine substitutions are suitable.
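Substitutions such as D10A and H840A follow the convention wild-type residue, 1-based position, new residue. A small helper that checks the wild-type residue before substituting guards against applying a mutation to the wrong position or the wrong reference sequence. In the sketch below, `FRAGMENT` is assumed to be residues 1-20 of S. pyogenes Cas9 (it places D at 10 and G at 12 and 17, matching the residues listed above), but it should be verified against SEQ ID NO:8 before use; full-length work such as H840A needs the complete sequence.

```python
import re
from typing import Iterable

# Assumed N-terminal residues 1-20 of S. pyogenes Cas9; verify against SEQ ID NO:8.
FRAGMENT = "MDKKYSIGLDIGTNSVGWAV"

_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")  # e.g. "D10A"


def apply_substitutions(protein: str, mutations: Iterable[str]) -> str:
    """Apply point substitutions like 'D10A' (1-based positions) to a protein.

    The wild-type residue at each position is checked first, so a mismatch
    between the mutation name and the sequence raises ValueError.
    """
    residues = list(protein)
    for mutation in mutations:
        match = _MUTATION.match(mutation)
        if match is None:
            raise ValueError(f"unrecognized mutation: {mutation!r}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{mutation}: position outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"{mutation}: expected {wild_type}, found {residues[position - 1]}"
            )
        residues[position - 1] = new
    return "".join(residues)
```

Applying `["D10A", "G12A", "G17A"]` to the fragment yields MDKKYSIGLAIATNSVAWAV, while a mislabeled mutation such as E10A raises an error instead of silently editing the sequence.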

In some cases, the variant Cas9 site-directed polypeptide is a fusion polypeptide (a “variant Cas9 fusion polypeptide”), i.e., a fusion polypeptide comprising: a) a variant Cas9 site-directed polypeptide; and b) a covalently linked heterologous polypeptide (also referred to as a “fusion partner”).

The heterologous polypeptide may exhibit an activity (e.g., enzymatic activity) that will also be exhibited by the variant Cas9 fusion polypeptide (e.g., methyltransferase activity, acetyltransferase activity, kinase activity, ubiquitinating activity, etc.). A heterologous nucleic acid sequence may be linked to another nucleic acid sequence (e.g., by genetic engineering) to generate a chimeric nucleotide sequence encoding a chimeric polypeptide. In some embodiments, a variant Cas9 fusion polypeptide is generated by fusing a variant Cas9 polypeptide with a heterologous sequence that provides for subcellular localization (i.e., the heterologous sequence is a subcellular localization sequence, e.g., a nuclear localization signal (NLS) for targeting to the nucleus; a mitochondrial localization signal for targeting to the mitochondria; a chloroplast localization signal for targeting to a chloroplast; an ER retention signal; and the like). In some embodiments, the heterologous sequence can provide a tag (i.e., the heterologous sequence is a detectable label) for ease of tracking and/or purification (e.g., a fluorescent protein, e.g., green fluorescent protein (GFP), YFP, RFP, CFP, mCherry, tdTomato, and the like; a histidine tag, e.g., a 6×His tag; a hemagglutinin (HA) tag; a FLAG tag; a Myc tag; and the like). In some embodiments, the heterologous sequence can provide for increased or decreased stability (i.e., the heterologous sequence is a stability control peptide, e.g., a degron, which in some cases is controllable, e.g., a temperature sensitive or drug controllable degron sequence; see below).
In some embodiments, the heterologous sequence can provide for increased or decreased transcription from the target DNA (i.e., the heterologous sequence is a transcription modulation sequence, e.g., a transcription factor/activator or a fragment thereof, a protein or fragment thereof that recruits a transcription factor/activator, a transcription repressor or a fragment thereof, a protein or fragment thereof that recruits a transcription repressor, a small molecule/drug-responsive transcription regulator, etc.). In some embodiments, the heterologous sequence can provide a binding domain (i.e., the heterologous sequence is a protein binding sequence, e.g., to provide the ability of a chimeric dCas9 polypeptide to bind to another protein of interest, e.g., a DNA or histone modifying protein, a transcription factor or transcription repressor, a recruiting protein, etc.).

Suitable fusion partners that provide for increased or decreased stability include, but are not limited to, degron sequences. Degrons are readily understood by one of ordinary skill in the art to be amino acid sequences that control the stability of the protein of which they are part. That is, the stability of a protein comprising a degron sequence is controlled at least in part by the degron sequence. In some cases, a suitable degron is constitutive such that the degron exerts its influence on protein stability independent of experimental control (i.e., the degron is not drug inducible, temperature inducible, etc.). In some cases, the degron provides the variant Cas9 polypeptide with controllable stability such that the variant Cas9 polypeptide can be turned “on” (i.e., stable) or “off” (i.e., unstable, degraded) depending on the desired conditions. For example, if the degron is a temperature sensitive degron, the variant Cas9 polypeptide may be functional (i.e., “on”, stable) below a threshold temperature (e.g., 42° C., 41° C., 40° C., 39° C., 38° C., 37° C., 36° C., 35° C., 34° C., 33° C., 32° C., 31° C., 30° C., etc.) but non-functional (i.e., “off”, degraded) above the threshold temperature. As another example, if the degron is a drug inducible degron, the presence or absence of drug can switch the protein from an “off” (i.e., unstable) state to an “on” (i.e., stable) state or vice versa. An exemplary drug inducible degron is derived from the FKBP12 protein. The stability of the degron is controlled by the presence or absence of a small molecule that binds to the degron.
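The on/off behaviour of controllable degrons reduces to simple boolean logic, sketched below as an illustration rather than a kinetic model (real degradation is gradual, not instantaneous). Two assumptions are made explicit: behaviour exactly at the threshold temperature is unspecified in the text and is treated as "off" here, and the `drug_stabilizes` flag distinguishes a ligand-stabilized domain (the behaviour commonly described for FKBP12-derived destabilizing domains with Shield-1) from the reverse arrangement covered by "or vice versa".

```python
def temperature_degron_on(temp_c: float, threshold_c: float) -> bool:
    """Temperature-sensitive degron: stable ('on') below the threshold,
    degraded ('off') above it. Exactly at the threshold is treated as 'off'
    (an assumption; the text does not specify this case)."""
    return temp_c < threshold_c


def drug_degron_on(drug_present: bool, drug_stabilizes: bool = True) -> bool:
    """Drug-controllable degron: the small molecule switches the protein state.

    drug_stabilizes=True models a domain that is degraded unless its ligand is
    bound; drug_stabilizes=False models the reverse arrangement.
    """
    return drug_present == drug_stabilizes
```

For a hypothetical dCas9 fusion with a 37° C threshold, culturing at 30° C keeps the fusion "on" and shifting to 39° C turns it "off"; with a ligand-stabilized domain, adding the drug turns the fusion "on".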

Examples of suitable degrons include, but are not limited to, those degrons controlled by Shield-1, DHFR, auxins, and/or temperature. Non-limiting examples of suitable degrons are known in the art (e.g., Dohmen et al., Science, 1994. 263(5151): p. 1273-1276: Heat-inducible degron: a method for constructing temperature-sensitive mutants; Schoeber et al., Am J Physiol Renal Physiol. 2009 January; 296(1):F204-11: Conditional fast expression and function of multimeric TRPV5 channels using Shield-1; Chu et al., Bioorg Med Chem Lett. 2008 Nov. 15; 18(22):5941-4: Recent progress with FKBP-derived destabilizing domains; Kanemaki, Pflugers Arch. 2012 Dec. 28: Frontiers of protein expression control with conditional degrons; Yang et al., Mol Cell. 2012 Nov. 30; 48(4):487-8: Titivated for destruction: the methyl degron; Barbour et al., Biosci Rep. 2013 Jan. 18; 33(1): Characterization of the bipartite degron that regulates ubiquitin-independent degradation of thymidylate synthase; and Greussing et al., J Vis Exp. 2012 Nov. 10; (69): Monitoring of ubiquitin-proteasome activity in living cells using a Degron (dgn)-destabilized green fluorescent protein (GFP)-based reporter protein; all of which are hereby incorporated in their entirety by reference).

Exemplary degron sequences have been well-characterized and tested in both cells and animals. Thus, fusing dCas9 to a degron sequence produces a “tunable” and “inducible” dCas9 polypeptide. Any of the fusion partners described herein can be used in any desirable combination. As one non-limiting example to illustrate this point, a dCas9 fusion protein can comprise a YFP sequence for detection, a degron sequence for stability, and a transcription activator sequence to increase transcription of the target DNA. Furthermore, the number of fusion partners that can be used in a dCas9 fusion protein is not limited. In some cases, a dCas9 fusion protein comprises one or more (e.g., two or more, three or more, four or more, or five or more) heterologous sequences.
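Combining several fusion partners amounts to concatenating protein parts in N- to C-terminal order, usually with a flexible linker between neighbours, while keeping track of where each domain lands. The sketch below illustrates this at the sequence level. PKKKRKV (the SV40 large T antigen NLS) and DYKDDDDK (FLAG) are standard tag sequences; the GGGGS linker is a common choice but an assumption not taken from the text; and `DCAS9` is a hypothetical placeholder of unknown residues (`X`), not the real protein.

```python
from typing import Dict, List, Tuple

PARTS: Dict[str, str] = {
    "NLS": "PKKKRKV",   # SV40 large T antigen nuclear localization signal
    "FLAG": "DYKDDDDK",  # FLAG epitope tag
    "HIS6": "HHHHHH",    # 6xHis tag
    "DCAS9": "X" * 10,   # hypothetical placeholder for the dCas9 sequence
}
LINKER = "GGGGS"  # assumed flexible linker


def assemble_fusion(order: List[str], linker: str = LINKER
                    ) -> Tuple[str, List[Tuple[str, int, int]]]:
    """Join parts N- to C-terminally with a linker between neighbouring parts.

    Returns the fused sequence and (part, start, end) coordinates, 1-based and
    inclusive, so each domain can be located in the product.
    """
    pieces: List[str] = []
    layout: List[Tuple[str, int, int]] = []
    position = 0  # number of residues emitted so far
    for index, name in enumerate(order):
        if index > 0:
            pieces.append(linker)
            position += len(linker)
        part = PARTS[name]
        layout.append((name, position + 1, position + len(part)))
        pieces.append(part)
        position += len(part)
    return "".join(pieces), layout


fusion, layout = assemble_fusion(["NLS", "DCAS9", "FLAG"])
```

Here the NLS occupies residues 1-7, the placeholder dCas9 residues 13-22, and FLAG residues 28-35. Reordering the list or adding partners changes only the `order` argument, which mirrors the point that partners can be combined freely.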

Suitable fusion partners include, but are not limited to, a polypeptide that provides for methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, or demyristoylation activity, any of which can be directed at modifying the DNA directly (e.g., methylation of DNA) or at modifying a DNA-associated polypeptide (e.g., a histone or DNA binding protein). Further suitable fusion partners include, but are not limited to, boundary elements (e.g., CTCF), proteins and fragments thereof that provide periphery recruitment (e.g., Lamin A, Lamin B, etc.), and protein docking elements (e.g., FKBP/FRB, Pil1/Aby1, etc.).

Examples of various additional suitable fusion partners (or fragments thereof) for a subject variant Cas9 site-directed polypeptide include, but are not limited to, those listed in FIG. 54A-54C.

In some embodiments, a subject site-directed modifying polypeptide can be codon-optimized. This type of optimization is known in the art and entails mutating foreign-derived DNA to mimic the codon preferences of the intended host organism or cell while encoding the same protein. Thus, the codons are changed, but the encoded protein remains unchanged. For example, if the intended target cell were a human cell, a human codon-optimized dCas9 (or dCas9 variant) would be a suitable site-directed modifying polypeptide. As another non-limiting example, if the intended host cell were a mouse cell, then a mouse codon-optimized Cas9 (or variant, e.g., enzymatically inactive variant) would be a suitable Cas9 site-directed polypeptide. While codon optimization is not required, it is acceptable and may be preferable in certain cases.
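The simplest form of codon optimization back-translates each amino acid using the host's most preferred synonymous codon, then confirms that the new DNA still encodes the same protein. The sketch below shows that "most frequent codon" strategy only; practical tools also weigh GC content, repeats, and mRNA structure. The weights in `CODON_WEIGHTS` are illustrative placeholders, not measured codon-usage data, and the table covers only the few amino acids used in the example.

```python
from typing import Dict

# Illustrative codon preferences for a hypothetical host (placeholder weights,
# NOT real usage data). Each codon listed does encode the amino acid it is under.
CODON_WEIGHTS: Dict[str, Dict[str, float]] = {
    "M": {"ATG": 1.0},
    "D": {"GAT": 0.4, "GAC": 0.6},
    "K": {"AAA": 0.4, "AAG": 0.6},
    "Y": {"TAT": 0.4, "TAC": 0.6},
    "S": {"TCT": 0.2, "TCC": 0.2, "TCA": 0.1, "TCG": 0.05, "AGT": 0.15, "AGC": 0.3},
    "I": {"ATT": 0.35, "ATC": 0.5, "ATA": 0.15},
    "G": {"GGT": 0.15, "GGC": 0.35, "GGA": 0.25, "GGG": 0.25},
    "L": {"TTA": 0.05, "TTG": 0.1, "CTT": 0.1, "CTC": 0.2, "CTA": 0.05, "CTG": 0.5},
}


def optimize(protein: str) -> str:
    """Back-translate using the highest-weight codon for each amino acid."""
    return "".join(
        max(CODON_WEIGHTS[aa], key=CODON_WEIGHTS[aa].get) for aa in protein
    )


def translate(dna: str) -> str:
    """Translate using only the codons in CODON_WEIGHTS (enough for a round trip)."""
    codon_to_aa = {c: aa for aa, table in CODON_WEIGHTS.items() for c in table}
    return "".join(codon_to_aa[dna[i:i + 3]] for i in range(0, len(dna), 3))


optimized = optimize("MDKKYSIGL")
```

The round trip `translate(optimize(p)) == p` captures the defining property stated in the text: the codons change, the encoded protein does not.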

Host Cells

A method of the present disclosure to modulate transcription may be employed to induce transcriptional modulation in mitotic or post-mitotic cells in vivo and/or ex vivo and/or in vitro. Because the DNA-targeting RNA provides specificity by hybridizing to target DNA, a mitotic and/or post-mitotic cell can be any of a variety of host cells, where suitable host cells include, but are not limited to, a bacterial cell; an archaeal cell; a single-celled eukaryotic organism; a plant cell; an algal cell, e.g., Botryococcus braunii, Chlamydomonas reinhardtii, Nannochloropsis gaditana, Chlorella pyrenoidosa, Sargassum patens, C. agardh, and the like; a fungal cell; an animal cell; a cell from an invertebrate animal (e.g., an insect, a cnidarian, an echinoderm, a nematode, etc.); a eukaryotic parasite (e.g., a malarial parasite, e.g., Plasmodium falciparum; a helminth; etc.); a cell from a vertebrate animal (e.g., fish, amphibian, reptile, bird, mammal); a mammalian cell, e.g., a rodent cell, a human cell, a non-human primate cell, etc. Suitable host cells include naturally-occurring cells; genetically modified cells (e.g., cells genetically modified in a laboratory, e.g., by the “hand of man”); and cells manipulated in vitro in any way. In some cases, a host cell is isolated.

Any type of cell may be of interest (e.g., a stem cell, e.g., an embryonic stem (ES) cell, an induced pluripotent stem (iPS) cell, a germ cell; a somatic cell, e.g., a fibroblast, a hematopoietic cell, a neuron, a muscle cell, a bone cell, a hepatocyte, a pancreatic cell; an in vitro or in vivo embryonic cell of an embryo at any stage, e.g., a 1-cell, 2-cell, 4-cell, 8-cell, etc. stage zebrafish embryo; etc.). Cells may be from established cell lines or they may be primary cells, where “primary cells”, “primary cell lines”, and “primary cultures” are used interchangeably herein to refer to cells and cell cultures that have been derived from a subject and allowed to grow in vitro for a limited number of passages, i.e., splittings, of the culture. For example, primary cultures include cultures that may have been passaged 0 times, 1 time, 2 times, 4 times, 5 times, 10 times, or 15 times, but not enough times to go through the crisis stage. Primary cell lines can be maintained for fewer than 10 passages in vitro. Target cells are in many embodiments unicellular organisms, or are grown in culture.

If the cells are primary cells, such cells may be harvested from an individual by any convenient method. For example, leukocytes may be conveniently harvested by apheresis, leukocytapheresis, density gradient separation, etc., while cells from tissues such as skin, muscle, bone marrow, spleen, liver, pancreas, lung, intestine, stomach, etc. are most conveniently harvested by biopsy. An appropriate solution may be used for dispersion or suspension of the harvested cells. Such solution will generally be a balanced salt solution, e.g., normal saline, phosphate-buffered saline (PBS), Hank's balanced salt solution, etc., conveniently supplemented with fetal calf serum or other naturally occurring factors, in conjunction with an acceptable buffer at low concentration, e.g., from 5-25 mM. Convenient buffers include HEPES, phosphate buffers, lactate buffers, etc. The cells may be used immediately, or they may be stored frozen for long periods of time and thawed for reuse. In such cases, the cells will usually be frozen in 10% dimethyl sulfoxide (DMSO), 50% serum, 40% buffered medium, or some other such solution as is commonly used in the art to preserve cells at such freezing temperatures, and thawed in a manner as commonly known in the art for thawing frozen cultured cells.
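The buffer and freezing-medium figures above translate into simple volume arithmetic. The sketch below assumes the 10%/50%/40% freezing-medium composition is by volume (v/v), which the text does not state, and uses the dilution relation C1·V1 = C2·V2 for the buffer; the 1 M stock concentration in the example is hypothetical.

```python
from typing import Dict


def freezing_medium(total_ml: float, dmso_pct: float = 10, serum_pct: float = 50,
                    medium_pct: float = 40) -> Dict[str, float]:
    """Component volumes (mL) for a freezing medium, assuming v/v percentages."""
    if abs(dmso_pct + serum_pct + medium_pct - 100) > 1e-9:
        raise ValueError("percentages must sum to 100")
    return {
        "DMSO": total_ml * dmso_pct / 100,
        "serum": total_ml * serum_pct / 100,
        "buffered medium": total_ml * medium_pct / 100,
    }


def stock_volume_ml(stock_mm: float, final_mm: float, final_ml: float) -> float:
    """Volume of buffer stock to add, from C1*V1 = C2*V2 solved for V1."""
    if final_mm > stock_mm:
        raise ValueError("final concentration cannot exceed the stock concentration")
    return final_mm * final_ml / stock_mm
```

For example, 10 mL of freezing medium calls for 1 mL DMSO, 5 mL serum, and 4 mL buffered medium, and bringing 500 mL of solution to 25 mM (the top of the 5-25 mM range) from a hypothetical 1 M (1000 mM) HEPES stock requires 12.5 mL of stock.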

Introducing Nucleic Acid into a Host Cell

A DNA-targeting RNA, or a nucleic acid comprising a nucleotide sequence encoding same, can be introduced into a host cell by any of a variety of well-known methods. Similarly, where a subject method involves introducing into a host cell a nucleic acid comprising a nucleotide sequence encoding a variant Cas9 site-directed polypeptide, such a nucleic acid can be introduced into a host cell by any of a variety of well-known methods.

Methods of introducing a nucleic acid into a host cell are known in the art, and any known method can be used to introduce a nucleic acid (e.g., an expression construct) into a stem cell or progenitor cell. Suitable methods include, e.g., viral or bacteriophage infection, transfection, conjugation, protoplast fusion, lipofection, electroporation, calcium phosphate precipitation, polyethyleneimine (PEI)-mediated transfection, DEAE-dextran mediated transfection, liposome-mediated transfection, particle gun technology, direct microinjection, nanoparticle-mediated nucleic acid delivery (see, e.g., Panyam et al., Adv Drug Deliv Rev. 2012 Sep. 13. pii: S0169-409X(12)00283-9. doi: 10.1016/j.addr.2012.09.023), and the like.

Nucleic Acids

The present disclosure provides an isolated nucleic acid comprising a nucleotide sequence encoding a subject DNA-targeting RNA. In some cases, a subject nucleic acid also comprises a nucleotide sequence encoding a variant Cas9 site-directed polypeptide.

In some embodiments, a subject method involves introducing into a host cell (or a population of host cells) one or more nucleic acids comprising nucleotide sequences encoding a DNA-targeting RNA and/or a variant Cas9 site-directed polypeptide. In some embodiments, a cell comprising a target DNA is in vitro. In some embodiments, a cell comprising a target DNA is in vivo. Suitable nucleic acids comprising nucleotide sequences encoding a DNA-targeting RNA and/or a site-directed polypeptide include expression vectors, where an expression vector comprising a nucleotide sequence encoding a DNA-targeting RNA and/or a site-directed polypeptide is a “recombinant expression vector.”

In some embodiments, the recombinant expression vector is a viral construct, e.g., a recombinant adeno-associated virus construct (see, e.g., U.S. Pat. No. 7,078,387), a recombinant adenoviral construct, a recombinant lentiviral construct, a recombinant retroviral construct, etc.

Suitable expression vectors include, but are not limited to, viral vectors (e.g., viral vectors based on vaccinia virus; poliovirus; adenovirus (see, e.g., Li et al., Invest Ophthalmol Vis Sci 35:2543-2549, 1994; Borras et al., Gene Ther 6:515-524, 1999; Li and Davidson, PNAS 92:7700-7704, 1995; Sakamoto et al., H Gene Ther 5:1088-1097, 1999; WO 94/12649; WO 93/03769; WO 93/19191; WO 94/28938; WO 95/11984 and WO 95/00655); adeno-associated virus (see, e.g., Ali et al., Hum Gene Ther 9:81-86, 1998; Flannery et al., PNAS 94:6916-6921, 1997; Bennett et al., Invest Ophthalmol Vis Sci 38:2857-2863, 1997; Jomary et al., Gene Ther 4:683-690, 1997; Rolling et al., Hum Gene Ther 10:641-648, 1999; Ali et al., Hum Mol Genet 5:591-594, 1996; Srivastava in WO 93/09239; Samulski et al., J. Vir. (1989) 63:3822-3828; Mendelson et al., Virol. (1988) 166:154-165; and Flotte et al., PNAS (1993) 90:10613-10617); SV40; herpes simplex virus; human immunodeficiency virus (see, e.g., Miyoshi et al., PNAS 94:10319-23, 1997; Takahashi et al., J Virol 73:7812-7816, 1999); a retroviral vector (e.g., Murine Leukemia Virus, spleen necrosis virus, and vectors derived from retroviruses such as Rous Sarcoma Virus, Harvey Sarcoma Virus, avian leukosis virus, a lentivirus, human immunodeficiency virus, myeloproliferative sarcoma virus, and mammary tumor virus); and the like.

Numerous suitable expression vectors are known to those of skill in the art, and many are commercially available. The following vectors are provided by way of example; for eukaryotic host cells: pXT1, pSG5 (Stratagene), pSVK3, pBPV, pMSG, and pSVLSV40 (Pharmacia). However, any other vector may be used so long as it is compatible with the host cell.

Depending on the host/vector system utilized, any of a number of suitable transcription and translation control elements, including constitutive and inducible promoters, transcription enhancer elements, transcription terminators, etc. may be used in the expression vector (see e.g., Bitter et al. (1987) Methods in Enzymology, 153:516-544).

In some embodiments, a nucleotide sequence encoding a DNA-targeting RNA and/or a variant Cas9 site-directed polypeptide is operably linked to a control element, e.g., a transcriptional control element, such as a promoter. The transcriptional control element may be functional in either a eukaryotic cell, e.g., a mammalian cell; or a prokaryotic cell (e.g., bacterial or archaeal cell). In some embodiments, a nucleotide sequence encoding a DNA-targeting RNA and/or a variant Cas9 site-directed polypeptide is operably linked to multiple control elements that allow expression of the nucleotide sequence encoding a DNA-targeting RNA and/or a variant Cas9 site-directed polypeptide in both prokaryotic and eukaryotic cells.

A promoter can be a constitutively active promoter (i.e., a promoter that is constitutively in an active/“ON” state); it may be an inducible promoter (i.e., a promoter whose state, active/“ON” or inactive/“OFF”, is controlled by an external stimulus, e.g., the presence of a particular temperature, compound, or protein); it may be a spatially restricted promoter (i.e., transcriptional control element, enhancer, etc.) (e.g., tissue specific promoter, cell type specific promoter, etc.); or it may be a temporally restricted promoter (i.e., the promoter is in the “ON” state or “OFF” state during specific stages of embryonic development or during specific stages of a biological process, e.g., hair follicle cycle in mice).

Suitable promoters can be derived from viruses and can therefore be referred to as viral promoters, or they can be derived from any organism, including prokaryotic or eukaryotic organisms. Suitable promoters can be used to drive expression by any RNA polymerase (e.g., pol I, pol II, pol III). Exemplary promoters include, but are not limited to, the SV40 early promoter, mouse mammary tumor virus long terminal repeat (LTR) promoter; adenovirus major late promoter (Ad MLP); a herpes simplex virus (HSV) promoter, a cytomegalovirus (CMV) promoter such as the CMV immediate early promoter region (CMVIE), a Rous sarcoma virus (RSV) promoter, a human U6 small nuclear promoter (U6) (Miyagishi et al., Nature Biotechnology 20, 497-500 (2002)), an enhanced U6 promoter (e.g., Xia et al., Nucleic Acids Res. 2003 Sep. 1; 31(17)), a human H1 promoter (H1), and the like.

Examples of inducible promoters include, but are not limited to, T7 RNA polymerase promoter, T3 RNA polymerase promoter, isopropyl-beta-D-thiogalactopyranoside (IPTG)-regulated promoter, lactose induced promoter, heat shock promoter, tetracycline-regulated promoter (e.g., Tet-ON, Tet-OFF, etc.), steroid-regulated promoter, metal-regulated promoter, estrogen receptor-regulated promoter, etc. Inducible promoters can therefore be regulated by molecules including, but not limited to, doxycycline; RNA polymerase, e.g., T7 RNA polymerase; an estrogen receptor; an estrogen receptor fusion; etc.
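The difference between constitutive and tetracycline-regulated promoters can be captured as simple on/off logic: a Tet-ON system transcribes when doxycycline is present, and a Tet-OFF system transcribes when it is absent. The sketch below is a binary simplification (real induction is graded and dose-dependent) and assumes the required transactivator protein is expressed in the cell; the `kind` labels are hypothetical names chosen for illustration.

```python
def promoter_active(kind: str, doxycycline: bool = False) -> bool:
    """Simplified ON/OFF state for constitutive and tetracycline-regulated promoters.

    Assumes the tetracycline-responsive transactivator is present for the
    'tet-on' and 'tet-off' cases.
    """
    if kind == "constitutive":
        return True              # always ON, independent of external stimuli
    if kind == "tet-on":
        return doxycycline       # induced by doxycycline
    if kind == "tet-off":
        return not doxycycline   # repressed by doxycycline
    raise ValueError(f"unknown promoter kind: {kind!r}")
```

In this model, expressing a DNA-targeting RNA or dCas9 from a Tet-ON promoter means adding doxycycline switches transcription modulation on, while a Tet-OFF promoter gives the opposite behaviour.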

In some embodiments, the promoter is a spatially restricted promoter (i.e., cell type specific promoter, tissue specific promoter, etc.) such that in a multi-cellular organism, the promoter is active (i.e., “ON”) in a subset of specific cells. Spatially restricted promoters may also be referred to as enhancers, transcriptional control elements, control sequences, etc. Any convenient spatially restricted promoter may be used and the choice of suitable promoter (e.g., a brain specific promoter, a promoter that drives expression in a subset of neurons, a promoter that drives expression in the germline, a promoter that drives expression in the lungs, a promoter that drives expression in muscles, a promoter that drives expression in islet cells of the pancreas, etc.) will depend on the organism. For example, various spatially restricted promoters are known for plants, flies, worms, mammals, mice, etc. Thus, a spatially restricted promoter can be used to regulate the expression of a nucleic acid encoding a subject site-directed polypeptide in a wide variety of different tissues and cell types, depending on the organism. Some spatially restricted promoters are also temporally restricted such that the promoter is in the “ON” state or “OFF” state during specific stages of embryonic development or during specific stages of a biological process (e.g., hair follicle cycle in mice).

For illustration purposes, examples of spatially restricted promoters include, but are not limited to, neuron-specific promoters, adipocyte-specific promoters, cardiomyocyte-specific promoters, smooth muscle-specific promoters, photoreceptor-specific promoters, etc. Neuron-specific spatially restricted promoters include, but are not limited to, a neuron-specific enolase (NSE) promoter (see, e.g., EMBL HSENO2, X51956); an aromatic amino acid decarboxylase (AADC) promoter; a neurofilament promoter (see, e.g., GenBank HUMNFL, L04147); a synapsin promoter (see, e.g., GenBank HUMSYNIB, M55301); a thy-1 promoter (see, e.g., Chen et al. (1987) Cell 51:7-19; and Llewellyn, et al. (2010) Nat. Med. 16(10):1161-1166); a serotonin receptor promoter (see, e.g., GenBank S62283); a tyrosine hydroxylase promoter (TH) (see, e.g., Oh et al. (2009) Gene Ther 16:437; Sasaoka et al. (1992) Mol. Brain Res. 16:274; Boundy et al. (1998) J. Neurosci. 18:9989; and Kaneda et al. (1991) Neuron 6:583-594); a GnRH promoter (see, e.g., Radovick et al. (1991) Proc. Natl. Acad. Sci. USA 88:3402-3406); an L7 promoter (see, e.g., Oberdick et al. (1990) Science 248:223-226); a DNMT promoter (see, e.g., Bartge et al. (1988) Proc. Natl. Acad. Sci. USA 85:3648-3652); an enkephalin promoter (see, e.g., Comb et al. (1988) EMBO J. 17:3793-3805); a myelin basic protein (MBP) promoter; a Ca2+-calmodulin-dependent protein kinase II-alpha (CamKIIα) promoter (see, e.g., Mayford et al. (1996) Proc. Natl. Acad. Sci. USA 93:13250; and Casanova et al. (2001) Genesis 31:37); a CMV enhancer/platelet-derived growth factor-β promoter (see, e.g., Liu et al. (2004) Gene Therapy 11:52-60); and the like.

Adipocyte-specific spatially restricted promoters include, but are not limited to, an aP2 gene promoter/enhancer, e.g., a region from −5.4 kb to +21 bp of a human aP2 gene (see, e.g., Tozzo et al. (1997) Endocrinol. 138:1604; Ross et al. (1990) Proc. Natl. Acad. Sci. USA 87:9590; and Pavjani et al. (2005) Nat. Med. 11:797); a glucose transporter-4 (GLUT4) promoter (see, e.g., Knight et al. (2003) Proc. Natl. Acad. Sci. USA 100:14725); a fatty acid translocase (FAT/CD36) promoter (see, e.g., Kuriki et al. (2002) Biol. Pharm. Bull. 25:1476; and Sato et al. (2002) J. Biol. Chem. 277:15703); a stearoyl-CoA desaturase-1 (SCD1) promoter (Tabor et al. (1999) J. Biol. Chem. 274:20603); a leptin promoter (see, e.g., Mason et al. (1998) Endocrinol. 139:1013; and Chen et al. (1999) Biochem. Biophys. Res. Comm. 262:187); an adiponectin promoter (see, e.g., Kita et al. (2005) Biochem. Biophys. Res. Comm. 331:484; and Chakrabarti (2010) Endocrinol. 151:2408); an adipsin promoter (see, e.g., Platt et al. (1989) Proc. Natl. Acad. Sci. USA 86:7490); a resistin promoter (see, e.g., Seo et al. (2003) Molec. Endocrinol. 17:1522); and the like.

Cardiomyocyte-specific spatially restricted promoters include, but are not limited to, control sequences derived from the following genes: myosin light chain-2, α-myosin heavy chain, AE3, cardiac troponin C, cardiac actin, and the like. See, e.g., Franz et al. (1997) Cardiovasc. Res. 35:560-566; Robbins et al. (1995) Ann. N.Y. Acad. Sci. 752:492-505; Linn et al. (1995) Circ. Res. 76:584-591; Parmacek et al. (1994) Mol. Cell. Biol. 14:1870-1885; Hunter et al. (1993) Hypertension 22:608-617; and Sartorelli et al. (1992) Proc. Natl. Acad. Sci. USA 89:4047-4051.

Smooth muscle-specific spatially restricted promoters include, but are not limited to, an SM22α promoter (see, e.g., Akyürek et al. (2000) Mol. Med. 6:983; and U.S. Pat. No. 7,169,874); a smoothelin promoter (see, e.g., WO 2001/018048); an α-smooth muscle actin promoter; and the like. For example, a 0.4 kb region of the SM22α promoter, within which lie two CArG elements, has been shown to mediate vascular smooth muscle cell-specific expression (see, e.g., Kim, et al. (1997) Mol. Cell. Biol. 17, 2266-2278; Li, et al., (1996) J. Cell Biol. 132, 849-859; and Moessler, et al. (1996) Development 122, 2415-2425).

Photoreceptor-specific spatially restricted promoters include, but are not limited to, a rhodopsin promoter; a rhodopsin kinase promoter (Young et al. (2003) Ophthalmol. Vis. Sci. 44:4076); a beta phosphodiesterase gene promoter (Nicoud et al. (2007) J. Gene Med. 9:1015); a retinitis pigmentosa gene promoter (Nicoud et al. (2007) supra); an interphotoreceptor retinoid-binding protein (IRBP) gene enhancer (Nicoud et al. (2007) supra); an IRBP gene promoter (Yokoyama et al. (1992) Exp Eye Res. 55:225); and the like.

Libraries

The present disclosure provides a library of DNA-targeting RNAs. The present disclosure provides a library of nucleic acids comprising nucleotides encoding DNA-targeting RNAs. A subject library of nucleic acids comprising nucleotides encoding DNA-targeting RNAs can comprise a library of recombinant expression vectors comprising nucleotides encoding the DNA-targeting RNAs.

A subject library can comprise from about 10 individual members to about 10¹² individual members; e.g., a subject library can comprise from about 10 individual members to about 10² individual members, from about 10² individual members to about 10³ individual members, from about 10³ individual members to about 10⁵ individual members, from about 10⁵ individual members to about 10⁷ individual members, from about 10⁷ individual members to about 10⁹ individual members, or from about 10⁹ individual members to about 10¹² individual members.

An “individual member” of a subject library differs from other members of the library in the nucleotide sequence of the DNA-targeting segment of the DNA-targeting RNA. Thus, e.g., each individual member of a subject library can comprise the same or substantially the same nucleotide sequence of the protein-binding segment as all other members of the library, and can comprise the same or substantially the same nucleotide sequence of the transcriptional termination segment as all other members of the library, but differs from other members of the library in the nucleotide sequence of the DNA-targeting segment of the DNA-targeting RNA. In this way, the library can comprise members that bind to different target nucleic acids.
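The member structure described above (one variable DNA-targeting segment joined to a shared protein-binding segment and a shared termination segment) can be sketched as follows. This is an illustrative sketch only: the two constant part sequences are hypothetical placeholders, not the protein-binding or terminator sequences of this disclosure, and the function name build_library is likewise hypothetical.

```python
# Hypothetical placeholder parts; NOT the sequences used in this disclosure.
PROTEIN_BINDING_SEGMENT = "GGGAAACCC"
TERMINATOR_SEGMENT = "UUUUUU"


def build_library(targeting_segments):
    """Return {DNA-targeting segment: full DNA-targeting RNA} for each unique segment.

    Every member shares the same protein-binding and termination segments,
    so members differ only in their DNA-targeting segment.
    """
    library = {}
    for segment in targeting_segments:
        rna_segment = segment.upper().replace("T", "U")  # write the segment as RNA
        if rna_segment in library:
            continue  # an identical segment would not be a distinct member
        library[rna_segment] = rna_segment + PROTEIN_BINDING_SEGMENT + TERMINATOR_SEGMENT
    return library
```

For example, supplying two different 20-nt segments plus a lowercase duplicate of the first yields a two-member library, because duplicates do not count as distinct individual members.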

Utility

A method for modulating transcription according to the present disclosure finds use in a variety of applications, which are also provided. Applications include research applications; diagnostic applications; industrial applications; and treatment applications.

Research applications include, e.g., determining the effect of reducing or increasing transcription of a target nucleic acid on, e.g., development, metabolism, expression of a downstream gene, and the like.

High-throughput genomic analysis can be carried out using a subject transcription modulation method, in which only the DNA-targeting segment of the DNA-targeting RNA needs to be varied, while the protein-binding segment and the transcription termination segment can (in some cases) be held constant. A library (e.g., a subject library) comprising a plurality of nucleic acids used in the genomic analysis would include a promoter operably linked to a DNA-targeting RNA-encoding nucleotide sequence, where each nucleic acid would include a different DNA-targeting segment, a common protein-binding segment, and a common transcription termination segment. A chip could contain over 5×10⁴ unique DNA-targeting RNAs. Applications would include large-scale phenotyping, gene-to-function mapping, and metagenomic analysis.
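Because only the DNA-targeting segment varies between library members, the first design step for such a screen is enumerating candidate targeting segments in the sequences of interest. The sketch below assumes S. pyogenes Cas9, a 20-nt targeting segment, and a 5′-NGG-3′ PAM immediately 3′ of the protospacer on the same strand; other Cas9 orthologs recognize different PAMs, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_targeting_segments(dna, spacer_len=20):
    """List (strand, start, protospacer) for every protospacer followed by NGG.

    Assumes the S. pyogenes Cas9 PAM (5'-NGG-3') directly 3' of the
    protospacer. start is the 0-based position on the scanned strand.
    """
    hits = []
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        # the last start position must leave room for spacer_len + 3 PAM bases
        for start in range(len(seq) - spacer_len - 2):
            pam = seq[start + spacer_len : start + spacer_len + 3]
            if pam[1:] == "GG":
                hits.append((strand, start, seq[start : start + spacer_len]))
    return hits
```

Each protospacer returned could then serve (written as RNA) as the variable DNA-targeting segment of one library member.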

The subject methods disclosed herein find use in the field of metabolic engineering. Because transcription levels can be efficiently and predictably controlled by designing an appropriate DNA-targeting RNA, as disclosed herein, the activity of metabolic pathways (e.g., biosynthetic pathways) can be precisely controlled and tuned by controlling the level of specific enzymes (e.g., via increased or decreased transcription) within a metabolic pathway of interest. Metabolic pathways of interest include those used for chemical (fine chemicals, fuel, antibiotics, toxins, agonists, antagonists, etc.) and/or drug production.

Biosynthetic pathways of interest include, but are not limited to, (1) the mevalonate pathway (e.g., HMG-CoA reductase pathway) (converts acetyl-CoA to dimethylallyl pyrophosphate (DMAPP) and isopentenyl pyrophosphate (IPP), which are used for the biosynthesis of a wide variety of biomolecules including terpenoids/isoprenoids), (2) the non-mevalonate pathway (i.e., the “2-C-methyl-D-erythritol 4-phosphate/1-deoxy-D-xylulose 5-phosphate pathway” or “MEP/DOXP pathway” or “DXP pathway”) (also produces DMAPP and IPP, but by converting pyruvate and glyceraldehyde 3-phosphate into DMAPP and IPP via a route alternative to the mevalonate pathway), (3) the polyketide synthesis pathway (produces a variety of polyketides via a variety of polyketide synthase enzymes; polyketides include naturally occurring small molecules used for chemotherapy (e.g., tetracycline and macrolides), and industrially important polyketides include rapamycin (immunosuppressant), erythromycin (antibiotic), lovastatin (anticholesterol drug), and epothilone B (anticancer drug)), (4) fatty acid synthesis pathways, (5) the DAHP (3-deoxy-D-arabino-heptulosonate 7-phosphate) synthesis pathway, (6) pathways that produce potential biofuels (such as short-chain alcohols and alkanes, fatty acid methyl esters and fatty alcohols, isoprenoids, etc.), etc.

Networks and Cascades

The methods disclosed herein can be used to design integrated networks (i.e., a cascade or cascades) of control. For example, a subject DNA-targeting RNA/variant Cas9 site-directed polypeptide may be used to control (i.e., modulate, e.g., increase, decrease) the expression of another DNA-targeting RNA or another subject variant Cas9 site-directed polypeptide. As another example, a first DNA-targeting RNA may be designed to modulate transcription of a second chimeric dCas9 polypeptide whose function differs from that of the first variant Cas9 site-directed polypeptide (e.g., methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, etc.). In addition, because different dCas9 proteins (e.g., derived from different species) may require a different Cas9 handle (i.e., protein-binding segment), the second chimeric dCas9 polypeptide can be derived from a different species than the first dCas9 polypeptide above. Thus, in some cases, the second chimeric dCas9 polypeptide can be selected such that it does not interact with the first DNA-targeting RNA. In other cases, the second chimeric dCas9 polypeptide can be selected such that it does interact with the first DNA-targeting RNA. In some such cases, the activities of the two (or more) dCas9 proteins may compete (e.g., if the polypeptides have opposing activities) or may synergize (e.g., if the polypeptides have similar or synergistic activities). Likewise, as noted above, any of the complexes (i.e., DNA-targeting RNA/dCas9 polypeptide) in the network can be designed to control other DNA-targeting RNAs or dCas9 polypeptides. Because a subject DNA-targeting RNA and subject variant Cas9 site-directed polypeptide can be targeted to any desired DNA sequence, the methods described herein can be used to control and regulate the expression of any desired target.
The integrated networks (i.e., cascades of interactions) that can be designed range from very simple to very complex, and are without limit.

In a network wherein two or more components (e.g., DNA-targeting RNAs, activator-RNAs, targeter-RNAs, or dCas9 polypeptides) are each under regulatory control of another DNA-targeting RNA/dCas9 polypeptide complex, the level of expression of one component of the network may affect the level of expression (e.g., may increase or decrease the expression) of another component of the network. Through this mechanism, the expression of one component may affect the expression of a different component in the same network, and the network may include a mix of components that increase the expression of other components, as well as components that decrease the expression of other components. As would be readily understood by one of skill in the art, the above examples whereby the level of expression of one component may affect the level of expression of one or more different component(s) are for illustrative purposes, and are not limiting. An additional layer of complexity may be optionally introduced into a network when one or more components are modified (as described above) to be manipulable (i.e., under experimental control, e.g., temperature control; drug control, i.e., drug inducible control; light control; etc.).

As one non-limiting example, a first DNA-targeting RNA can bind to the promoter of a second DNA-targeting RNA, which controls the expression of a target therapeutic/metabolic gene. In such a case, conditional expression of the first DNA-targeting RNA indirectly activates the therapeutic/metabolic gene. RNA cascades of this type are useful, for example, for easily converting a repressor into an activator, and can be used to control the logics or dynamics of expression of a target gene.
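The repressor-to-activator conversion in the example above can be illustrated with a simple Boolean model. This is a sketch under stated assumptions, not a kinetic model: it assumes each DNA-targeting RNA/dCas9 complex fully represses the promoter it binds and that the first DNA-targeting RNA is expressed only when a hypothetical inducer is present.

```python
def cascade(inducer_present: bool) -> dict:
    """Boolean sketch of a two-stage dCas9 repression cascade (hypothetical).

    Assumptions: each DNA-targeting RNA/dCas9 complex fully represses its
    target promoter; the first DNA-targeting RNA is inducer-dependent.
    """
    rna1 = inducer_present  # conditionally expressed first DNA-targeting RNA
    rna2 = not rna1         # rna1/dCas9 represses the promoter of rna2
    target_gene = not rna2  # rna2/dCas9 represses the therapeutic/metabolic gene
    return {"rna1": rna1, "rna2": rna2, "target_gene": target_gene}
```

Two repression steps in series invert twice, so inducing the first DNA-targeting RNA switches the target gene on even though every complex in the network acts as a repressor.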

A subject transcription modulation method can also be used for drug discovery and target validation.

Kits

The present disclosure provides a kit for carrying out a subject method. A subject kit comprises: a) a DNA-targeting RNA of the present disclosure, or a nucleic acid comprising a nucleotide sequence encoding the DNA-targeting RNA, wherein the DNA-targeting RNA comprises: i) a first segment comprising a nucleotide sequence that is complementary to a target sequence in the target DNA; ii) a second segment that interacts with a site-directed polypeptide; and iii) a transcriptional terminator; and b) a buffer. In some cases, the nucleic acid comprising a nucleotide sequence encoding the DNA-targeting RNA further comprises a nucleotide sequence encoding a variant Cas9 site-directed polypeptide that exhibits reduced endodeoxyribonuclease activity relative to wild-type Cas9.

In some cases, a subject kit further comprises a variant Cas9 site-directed polypeptide that exhibits reduced endodeoxyribonuclease activity relative to wild-type Cas9.

In some cases, a subject kit further comprises a nucleic acid comprising a nucleotide sequence encoding a variant Cas9 site-directed polypeptide that exhibits reduced endodeoxyribonuclease activity relative to wild-type Cas9.

A subject kit can further include one or more additional reagents, where such additional reagents can be selected from: a buffer; a wash buffer; a control reagent; a control expression vector or RNA polynucleotide; a reagent for in vitro production of the variant Cas9 site-directed polypeptide from DNA; and the like. In some cases, the variant Cas9 site-directed polypeptide included in a subject kit is a fusion variant Cas9 site-directed polypeptide, as described above.

Components of a subject kit can be in separate containers; or can be combined in a single container.

In addition to above-mentioned components, a subject kit can further include instructions for using the components of the kit to practice the subject methods. The instructions for practicing the subject methods are generally recorded on a suitable recording medium. For example, the instructions may be printed on a substrate, such as paper or plastic, etc. As such, the instructions may be present in the kits as a package insert, in the labeling of the container of the kit or components thereof (i.e., associated with the packaging or subpackaging) etc. In other embodiments, the instructions are present as an electronic storage data file present on a suitable computer readable storage medium, e.g. CD-ROM, diskette, flash drive, etc. In yet other embodiments, the actual instructions are not present in the kit, but means for obtaining the instructions from a remote source, e.g. via the internet, are provided. An example of this embodiment is a kit that includes a web address where the instructions can be viewed and/or from which the instructions can be downloaded. As with the instructions, this means for obtaining the instructions is recorded on a suitable substrate.

EXAMPLES

The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the present invention, and are not intended to limit the scope of what the inventors regard as their invention nor are they intended to represent that the experiments below are all or the only experiments performed. Efforts have been made to ensure accuracy with respect to numbers used (e.g., amounts, temperature, etc.), but some experimental errors and deviations should be accounted for. Unless indicated otherwise, parts are parts by weight, molecular weight is weight average molecular weight, temperature is in degrees Celsius, and pressure is at or near atmospheric. Standard abbreviations may be used, e.g., bp, base pair(s); kb, kilobase(s); pl, picoliter(s); s or sec, second(s); min, minute(s); h or hr, hour(s); aa, amino acid(s); nt, nucleotide(s); i.m., intramuscular(ly); i.p., intraperitoneal(ly); s.c., subcutaneous(ly); and the like.

Example 1: Use of Cas9 to Generate Modifications in Target DNA

Materials and Methods

Bacterial Strains and Culture Conditions

Streptococcus pyogenes, cultured in THY medium (Todd Hewitt Broth (THB, Bacto, Becton Dickinson) supplemented with 0.2% yeast extract (Oxoid)) or on TSA (trypticase soy agar, BBL, Becton Dickinson) supplemented with 3% sheep blood, was incubated at 37° C. in an atmosphere supplemented with 5% CO2 without shaking. Escherichia coli, cultured in Luria-Bertani (LB) medium and agar, was incubated at 37° C. with shaking. When required, suitable antibiotics were added to the medium at the following final concentrations: ampicillin, 100 μg/ml for E. coli; chloramphenicol, 33 μg/ml for E. coli; kanamycin, 25 μg/ml for E. coli and 300 μg/ml for S. pyogenes. Bacterial cell growth was monitored periodically by measuring the optical density of culture aliquots at 620 nm using a microplate reader (SLT Spectra Reader).

Transformation of Bacterial Cells

Plasmid DNA transformation into E. coli cells was performed according to a standard heat shock protocol. Transformation of S. pyogenes was performed as previously described with some modifications. The transformation assay performed to monitor in vivo CRISPR/Cas activity on plasmid maintenance was essentially carried out as described previously. Briefly, electrocompetent cells of S. pyogenes were equalized to the same cell density and electroporated with 500 ng of plasmid DNA. Every transformation was plated two to three times and the experiment was performed three times independently with different batches of competent cells for statistical analysis. Transformation efficiencies were calculated as CFU (colony forming units) per μg of DNA. Control transformations were performed with sterile water and backbone vector pEC85.
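Transformation efficiency, expressed as CFU per μg of DNA, follows from the colony counts by simple arithmetic. The sketch below averages replicate plates of one transformation; the fraction of the recovery culture spread per plate and any pre-plating dilution are hypothetical parameters not stated in the text, and the function name is likewise illustrative.

```python
def transformation_efficiency(plate_counts, dna_ng=500.0, fraction_plated=1.0, dilution_factor=1.0):
    """Mean transformation efficiency in CFU per microgram of plasmid DNA.

    plate_counts: colony counts from replicate plates of one transformation
    dna_ng: plasmid DNA electroporated (500 ng in the assay described above)
    fraction_plated: hypothetical fraction of the recovery culture spread per plate
    dilution_factor: hypothetical fold-dilution applied before plating
    """
    if not plate_counts or dna_ng <= 0 or not 0 < fraction_plated <= 1:
        raise ValueError("need >= 1 plate, positive DNA mass and 0 < fraction_plated <= 1")
    mean_colonies = sum(plate_counts) / len(plate_counts)
    total_cfu = mean_colonies * dilution_factor / fraction_plated  # CFU in whole transformation
    return total_cfu / (dna_ng / 1000.0)  # convert ng to μg
```

For example, three replicate plates with 140, 150 and 160 colonies, each receiving 10% of a transformation with 500 ng of DNA, correspond to 1,500 CFU per 0.5 μg, i.e., 3,000 CFU/μg.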

DNA Manipulations

DNA manipulations, including DNA preparation, amplification, digestion, ligation, purification and agarose gel electrophoresis, were performed according to standard techniques with minor modifications. Protospacer plasmids for the in vitro cleavage and S. pyogenes transformation assays were constructed as described previously (4). Additional pUC19-based protospacer plasmids for in vitro cleavage assays were generated by ligating annealed oligonucleotides between digested EcoRI and BamHI sites in pUC19. The GFP gene-containing plasmid has been described previously (41). Kits (Qiagen) were used for DNA purification and plasmid preparation. Plasmid mutagenesis was performed using the QuikChange® II XL kit (Stratagene) or the QuikChange site-directed mutagenesis kit (Agilent). VBC-Biotech Services, Sigma-Aldrich and Integrated DNA Technologies supplied the synthetic oligonucleotides and RNAs.

Oligonucleotides for In Vitro Transcription Templates

Templates for In Vitro Transcribed CRISPR Type II-A tracrRNA and crRNAs of S. pyogenes (for tracrRNA—PCR on Chr. DNA SF370; for crRNA—Annealing of Two Oligonucleotides)

T7-tracrRNA (75 nt)

OLEC1521 (F 5′ tracrRNA): SEQ ID NO:340

OLEC1522 (R 3′ tracrRNA): SEQ ID NO:341

T7-crRNA (template)

OLEC2176 (F crRNA-sp1): SEQ ID NO:342

OLEC2178 (R crRNA-sp1): SEQ ID NO:343

OLEC2177 (F crRNA-sp2): SEQ ID NO:344

OLEC2179 (R crRNA-sp2): SEQ ID NO:345

Templates for In Vitro Transcribed N. meningitidis tracrRNA and Engineered crRNA-Sp2 (for tracrRNA—PCR on Chr. DNA Z2491; for crRNA—Annealing of Two Oligonucleotides)

T7-tracrRNA

OLEC2205 (F predicted 5′): SEQ ID NO:346

OLEC2206 (R predicted 3′): SEQ ID NO:347

T7-crRNA (Template)

OLEC2209 (F sp2(speM)+N.m. repeat): SEQ ID NO:348

OLEC2214 (R sp2(speM)+N.m. repeat): SEQ ID NO:349

Templates for In Vitro Transcribed L. innocua tracrRNA and Engineered crRNA-Sp2 (for tracrRNA—PCR on Chr. DNA Clip11262; for crRNA—Annealing of Two Oligonucleotides)

T7-tracrRNA

OLEC2203 (F predicted 5′): SEQ ID NO:350

OLEC2204 (R predicted 3′): SEQ ID NO:351

T7-crRNA (template)

OLEC2207 (F sp2(speM)+L.in. repeat): SEQ ID NO:352

OLEC2212 (R sp2(speM)+L.in. repeat): SEQ ID NO:353

Oligonucleotides for Constructing Plasmids with Protospacer for In Vitro and In Vivo Studies

Plasmids for speM (Spacer 2 (CRISPR Type II-A, SF370); Protospacer Prophage ø8232.3 from MGAS8232) Analysis In Vitro and in S. pyogenes (Template: Chr. DNA MGAS8232 or Plasmids Containing speM Fragments)

pEC287

OLEC1555 (F speM): SEQ ID NO:354

OLEC1556 (R speM): SEQ ID NO:355

pEC488

OLEC2145 (F speM): SEQ ID NO:356

OLEC2146 (R speM): SEQ ID NO:357

pEC370

OLEC1593 (F pEC488 protospacer 2 A22G): SEQ ID NO:358

OLEC1594 (R pEC488 protospacer 2 A22G): SEQ ID NO:359

pEC371

OLEC1595 (F pEC488 protospacer 2 T10C): SEQ ID NO:360

OLEC1596 (R pEC488 protospacer 2 T10C): SEQ ID NO:361

pEC372

OLEC2185 (F pEC488 protospacer 2 T7A): SEQ ID NO:362

OLEC2186 (R pEC488 protospacer 2 T7A): SEQ ID NO:363

pEC373

OLEC2187 (F pEC488 protospacer 2 A6T): SEQ ID NO:364

OLEC2188 (R pEC488 protospacer 2 A6T): SEQ ID NO:365

pEC374

OLEC2235 (F pEC488 protospacer 2 A5T): SEQ ID NO:366

OLEC2236 (R pEC488 protospacer 2 A5T): SEQ ID NO:367

pEC375

OLEC2233 (F pEC488 protospacer 2 A4T): SEQ ID NO:368

OLEC2234 (R pEC488 protospacer 2 A4T): SEQ ID NO:369

pEC376

OLEC2189 (F pEC488 protospacer 2 A3T): SEQ ID NO:370

OLEC2190 (R pEC488 protospacer 2 A3T): SEQ ID NO:371

pEC377

OLEC2191 (F pEC488 protospacer 2 PAM G1C): SEQ ID NO:372

OLEC2192 (R pEC488 protospacer 2 PAM G1C): SEQ ID NO:373

pEC378

OLEC2237 (F pEC488 protospacer 2 PAM GG1, 2CC): SEQ ID NO:374

OLEC2238 (R pEC488 protospacer 2 PAM GG1, 2CC): SEQ ID NO:375

Plasmids for SPy_0700 (Spacer 1 (CRISPR Type II-A, SF370); Protospacer Prophage ø370.1 from SF370) Analysis In Vitro and in S. pyogenes (Template: Chr. DNA SF370 or Plasmids Containing SPy_0700 Fragments)

pEC489

OLEC2106 (F Spy_0700): SEQ ID NO:376

OLEC2107 (R Spy_0700): SEQ ID NO:377

pEC573

OLEC2941 (F PAM TG1, 2GG): SEQ ID NO:378

OLEC2942 (R PAM TG1, 2GG): SEQ ID NO:379

Oligonucleotides for Verification of Plasmid Constructs and Cutting Sites by Sequencing Analysis

ColE1 (pEC85)

oliRN228 (R sequencing): SEQ ID NO:380

speM (pEC287)

OLEC1557 (F sequencing): SEQ ID NO:381

OLEC1556 (R sequencing): SEQ ID NO:382

repDEG-pAMbeta1 (pEC85)

OLEC787 (F sequencing): SEQ ID NO:383

Oligonucleotides for In Vitro Cleavage Assays

crRNA

Spacer 1 crRNA (1-42): SEQ ID NO:384

Spacer 2 crRNA (1-42): SEQ ID NO:385

Spacer 4 crRNA (1-42): SEQ ID NO:386

Spacer 2 crRNA (1-36): SEQ ID NO:387

Spacer 2 crRNA (1-32): SEQ ID NO:388

Spacer 2 crRNA (11-42): SEQ ID NO:389

tracrRNA

(4-89): SEQ ID NO:390

(15-89): SEQ ID NO:391

(23-89): SEQ ID NO:392

(15-53): SEQ ID NO:393

(15-44): SEQ ID NO:394

(15-36): SEQ ID NO:395

(23-53): SEQ ID NO:396

(23-48): SEQ ID NO:397

(23-44): SEQ ID NO:398

(1-26): SEQ ID NO:399

chimeric RNAs

Spacer 1—chimera A: SEQ ID NO:400

Spacer 1—chimera B: SEQ ID NO:401

Spacer 2—chimera A: SEQ ID NO:402

Spacer 2—chimera B: SEQ ID NO:403

Spacer 4—chimera A: SEQ ID NO:404

Spacer 4—chimera B: SEQ ID NO:405

GFP1: SEQ ID NO:406

GFP2: SEQ ID NO:407

GFP3: SEQ ID NO:408

GFP4: SEQ ID NO:409

GFP5: SEQ ID NO:410

DNA Oligonucleotides as Substrates for Cleavage Assays (Protospacer in Bold, PAM Underlined)

protospacer 1—complementary—WT: SEQ ID NO:411

protospacer 1—noncomplementary—WT: SEQ ID NO:412

protospacer 2—complementary—WT: SEQ ID NO:413

protospacer 2—noncomplementary—WT: SEQ ID NO:414

protospacer 4—complementary—WT: SEQ ID NO:415

protospacer 4—noncomplementary—WT: SEQ ID NO:416

protospacer 2—complementary—PAM1: SEQ ID NO:417

protospacer 2—noncomplementary—PAM1: SEQ ID NO:418

protospacer 2—complementary—PAM2: SEQ ID NO:419

protospacer 2—noncomplementary—PAM2: SEQ ID NO:420

protospacer 4—complementary—PAM1: SEQ ID NO:421

protospacer 4—noncomplementary—PAM1: SEQ ID NO:422

protospacer 4—complementary—PAM2: SEQ ID NO:423

protospacer 4—noncomplementary—PAM2: SEQ ID NO:424

In Vitro Transcription and Purification of RNA

RNA was in vitro transcribed using the T7 Flash in vitro Transcription Kit (Epicentre, an Illumina company) and PCR-generated DNA templates carrying a T7 promoter sequence. RNA was gel-purified and quality-checked prior to use. The primers used for the preparation of RNA templates from S. pyogenes SF370, Listeria innocua Clip 11262 and Neisseria meningitidis A Z2491 are described above.

Protein Purification

The sequence encoding Cas9 (residues 1-1368) was PCR-amplified from the genomic DNA of S. pyogenes SF370 and inserted into a custom pET-based expression vector using ligation-independent cloning (LIC). The resulting fusion construct contained an N-terminal hexahistidine-maltose binding protein (His6-MBP) tag, followed by a peptide sequence containing a tobacco etch virus (TEV) protease cleavage site. The protein was expressed in E. coli strain BL21 Rosetta 2 (DE3) (EMD Biosciences), grown in 2×TY medium at 18° C. for 16 h following induction with 0.2 mM IPTG. The protein was purified by a combination of affinity, ion exchange and size exclusion chromatographic steps. Briefly, cells were lysed in 20 mM Tris pH 8.0, 500 mM NaCl, 1 mM TCEP (supplemented with protease inhibitor cocktail (Roche)) in a homogenizer (Avestin). Clarified lysate was bound in batch to Ni-NTA agarose (Qiagen). The resin was washed extensively with 20 mM Tris pH 8.0, 500 mM NaCl, and the bound protein was eluted in 20 mM Tris pH 8.0, 250 mM NaCl, 10% glycerol. The His6-MBP affinity tag was removed by cleavage with TEV protease while the protein was dialyzed overnight against 20 mM HEPES pH 7.5, 150 mM KCl, 1 mM TCEP, 10% glycerol. The cleaved Cas9 protein was separated from the fusion tag by purification on a 5 ml SP Sepharose HiTrap column (GE Life Sciences), eluting with a linear gradient of 100 mM-1 M KCl. The protein was further purified by size exclusion chromatography on a Superdex 200 16/60 column in 20 mM HEPES pH 7.5, 150 mM KCl and 1 mM TCEP. Eluted protein was concentrated to ~8 mg/ml, flash-frozen in liquid nitrogen and stored at −80° C. Cas9 D10A, H840A and D10A/H840A point mutants were generated using the QuikChange site-directed mutagenesis kit (Agilent) and confirmed by DNA sequencing. The proteins were purified following the same procedure as for the wild-type Cas9 protein.

Cas9 orthologs from Streptococcus thermophilus (LMD-9, YP_820832.1), L. innocua (Clip11262, NP_472073.1), Campylobacter jejuni (subsp. jejuni NCTC 11168, YP_002344900.1) and N. meningitidis (Z2491, YP_002342100.1) were expressed in BL21 Rosetta (DE3) pLysS cells (Novagen) as His6-MBP (N. meningitidis and C. jejuni), His6-Thioredoxin (L. innocua) and His6-GST (S. thermophilus) fusion proteins, and purified essentially as for S. pyogenes Cas9 with the following modifications. Due to large amounts of co-purifying nucleic acids, all four Cas9 proteins were purified by an additional heparin sepharose step prior to gel filtration, eluting the bound protein with a linear gradient of 100 mM-2 M KCl. This successfully removed nucleic acid contamination from the C. jejuni, N. meningitidis and L. innocua proteins, but failed to remove co-purifying nucleic acids from the S. thermophilus Cas9 preparation. All proteins were concentrated to 1-8 mg/ml in 20 mM HEPES pH 7.5, 150 mM KCl and 1 mM TCEP, flash-frozen in liquid N2 and stored at −80° C.
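Setting up the nanomolar reactions below requires converting the mg/ml protein stocks into molar concentrations. A minimal sketch, assuming an average residue mass of about 110 Da (a rough rule of thumb, not a measured molecular weight), so the result is only an estimate; the reaction volume in the example is hypothetical because it is not stated in the text.

```python
AVG_RESIDUE_MASS_DA = 110.0  # rough average residue mass; an assumption, not a measured value


def approx_stock_uM(mg_per_ml, n_residues):
    """Approximate molar concentration (μM) of a protein stock from its length."""
    mw_g_per_mol = n_residues * AVG_RESIDUE_MASS_DA
    # mg/ml equals g/L; g/L divided by g/mol gives mol/L, then scale to μM
    return mg_per_ml / mw_g_per_mol * 1e6


def stock_volume_ul(stock_uM, final_nM, reaction_ul):
    """Volume of stock that gives final_nM in reaction_ul (C1*V1 = C2*V2)."""
    return (final_nM / 1000.0) * reaction_ul / stock_uM
```

With 1368 residues, an 8 mg/ml stock comes out at roughly 53 μM, so reaching 500 nM in a hypothetical 20 μl reaction would take only about 0.19 μl of stock, which is why an intermediate dilution is practical.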

Plasmid DNA Cleavage Assay

Synthetic or in vitro-transcribed tracrRNA and crRNA were pre-annealed prior to the reaction by heating to 95° C. and slowly cooling down to room temperature. Native or restriction digest-linearized plasmid DNA (300 ng (~8 nM)) was incubated for 60 min at 37° C. with purified Cas9 protein (50-500 nM) and tracrRNA:crRNA duplex (50-500 nM, 1:1) in a Cas9 plasmid cleavage buffer (20 mM HEPES pH 7.5, 150 mM KCl, 0.5 mM DTT, 0.1 mM EDTA) with or without 10 mM MgCl2. The reactions were stopped with 5×DNA loading buffer containing 250 mM EDTA, resolved by 0.8 or 1% agarose gel electrophoresis and visualized by ethidium bromide staining. For the Cas9 mutant cleavage assays, the reactions were stopped with 5×SDS loading buffer (30% glycerol, 1.2% SDS, 250 mM EDTA) prior to loading on the agarose gel.
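The relation between 300 ng of plasmid and a ~8 nM concentration can be checked with a mass-to-molarity conversion. The sketch assumes about 650 g/mol per base pair of dsDNA, a pUC19-sized plasmid (2,686 bp) and a 20 μl reaction; the plasmid length and reaction volume are assumptions, since neither is stated in the text.

```python
BP_MASS_G_PER_MOL = 650.0  # approximate mass of one dsDNA base pair (assumption)


def plasmid_nM(mass_ng, length_bp, volume_ul):
    """Molar concentration (nM) of a plasmid given its mass, length and the reaction volume."""
    moles = (mass_ng * 1e-9) / (length_bp * BP_MASS_G_PER_MOL)  # g / (g/mol)
    return moles / (volume_ul * 1e-6) * 1e9  # mol / L, expressed in nM
```

Under these assumptions plasmid_nM(300, 2686, 20) is about 8.6 nM, consistent with the ~8 nM stated above, and 50-500 nM Cas9 then corresponds to roughly a 6- to 60-fold molar excess of protein over plasmid.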

Metal-Dependent Cleavage Assay

Protospacer 2 plasmid DNA (5 nM) was incubated for 1 h at 37° C. with Cas9 (50 nM) pre-incubated with 50 nM tracrRNA:crRNA-sp2 in cleavage buffer (20 mM HEPES pH 7.5, 150 mM KCl, 0.5 mM DTT, 0.1 mM EDTA) supplemented with 1, 5 or 10 mM MgCl2, 1 or 10 mM of MnCl2, CaCl2, ZnCl2, CoCl2, NiSO4 or CuSO4. The reaction was stopped by adding 5×SDS loading buffer (30% glycerol, 1.2% SDS, 250 mM EDTA), resolved by 1% agarose gel electrophoresis and visualized by ethidium bromide staining.

Single-Turnover Assay

Cas9 (25 nM) was pre-incubated for 15 min at 37° C. in cleavage buffer (20 mM HEPES pH 7.5, 150 mM KCl, 10 mM MgCl2, 0.5 mM DTT, 0.1 mM EDTA) with duplexed tracrRNA:crRNA-sp2 (25 nM, 1:1) or with both RNAs (25 nM) not pre-annealed, and the reaction was started by adding protospacer 2 plasmid DNA (5 nM). The reaction mix was incubated at 37° C. At defined time intervals, samples were withdrawn from the reaction, 5×SDS loading buffer (30% glycerol, 1.2% SDS, 250 mM EDTA) was added to stop the reaction, and the cleavage was monitored by 1% agarose gel electrophoresis and ethidium bromide staining. The same procedure was used for single-turnover kinetics without pre-incubation of Cas9 and RNA, in which protospacer 2 plasmid DNA (5 nM) was mixed in cleavage buffer with duplexed tracrRNA:crRNA-sp2 (25 nM) or both RNAs (25 nM) not pre-annealed, and the reaction was started by addition of Cas9 (25 nM). The percentage of cleavage was analyzed by densitometry, and the average of three independent experiments was plotted against time. The data were fit by nonlinear regression analysis, and the cleavage rates (k_obs [min⁻¹]) were calculated.
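The text does not state the regression model used. A common choice for single-turnover data is a single exponential, fraction cleaved = A(1 − exp(−k_obs·t)); the sketch below assumes that model and fits it in pure Python. Because the best amplitude A has a closed form for any trial k, only k is searched (a log-spaced grid followed by golden-section refinement). This is an illustrative substitute for a standard fitting package, not the inventors' analysis.

```python
import math


def fit_single_exponential(times, fractions):
    """Fit fraction = A * (1 - exp(-k * t)); return (A, k_obs).

    Assumed model; for each trial k the least-squares A is solved exactly,
    so the search is one-dimensional in k.
    """
    def amp_and_sse(k):
        f = [1.0 - math.exp(-k * t) for t in times]
        denom = sum(fi * fi for fi in f)
        amp = sum(fi * yi for fi, yi in zip(f, fractions)) / denom if denom > 0 else 0.0
        sse = sum((yi - amp * fi) ** 2 for fi, yi in zip(f, fractions))
        return amp, sse

    # coarse log-spaced grid from 1e-4 to 1e3 per minute
    grid = [10 ** (e / 20.0) for e in range(-80, 61)]
    sses = [amp_and_sse(k)[1] for k in grid]
    best = min(range(len(grid)), key=sses.__getitem__)
    lo = grid[max(best - 1, 0)]
    hi = grid[min(best + 1, len(grid) - 1)]

    # golden-section refinement inside the bracket around the best grid point
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(100):
        a = hi - ratio * (hi - lo)
        b = lo + ratio * (hi - lo)
        if amp_and_sse(a)[1] < amp_and_sse(b)[1]:
            hi = b
        else:
            lo = a
    k_obs = (lo + hi) / 2.0
    return amp_and_sse(k_obs)[0], k_obs
```

Time points in minutes give k_obs in min⁻¹, matching the units above; averaged densitometry values would be passed as fractions between 0 and 1.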

Multiple-Turnover Assay

Cas9 (1 nM) was pre-incubated for 15 min at 37° C. in cleavage buffer (20 mM HEPES pH 7.5, 150 mM KCl, 10 mM MgCl2, 0.5 mM DTT, 0.1 mM EDTA) with pre-annealed tracrRNA:crRNA-sp2 (1 nM, 1:1). The reaction was started by addition of protospacer 2 plasmid DNA (5 nM). At defined time intervals, samples were withdrawn and the reaction was stopped by adding 5×SDS loading buffer (30% glycerol, 1.2% SDS, 250 mM EDTA). The cleavage reaction was resolved by 1% agarose gel electrophoresis, stained with ethidium bromide and the percentage of cleavage was analyzed by densitometry. The results of four independent experiments were plotted against time (min).
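With 1 nM Cas9:RNA complex and 5 nM plasmid, a single round of cleavage could account for at most 20% of the substrate, so cleavage beyond that fraction indicates multiple turnover. The sketch below makes the arithmetic explicit; it assumes every Cas9 molecule is RNA-loaded and active, so the turnover value it returns is a lower bound on the turnover per active complex.

```python
def turnovers(fraction_cleaved, substrate_nM=5.0, enzyme_nM=1.0):
    """Mean number of plasmids cleaved per Cas9:tracrRNA:crRNA complex.

    Assumes all Cas9 is loaded and active; if some is inactive, the true
    turnover per active complex is higher than this value.
    """
    if not 0.0 <= fraction_cleaved <= 1.0:
        raise ValueError("fraction_cleaved must be between 0 and 1")
    return fraction_cleaved * substrate_nM / enzyme_nM


def single_turnover_ceiling(substrate_nM=5.0, enzyme_nM=1.0):
    """Largest fraction of substrate that one round of cleavage could account for."""
    return min(1.0, enzyme_nM / substrate_nM)
```

For example, 60% cleavage of the 5 nM plasmid corresponds to an average of three cleavage events per complex.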

Oligonucleotide DNA Cleavage Assay

DNA oligonucleotides (10 pmol) were radiolabeled by incubating with 5 units T4 polynucleotide kinase (New England Biolabs) and ~3-6 pmol (~20-40 mCi) [γ-32P]-ATP (Promega) in 1× T4 polynucleotide kinase reaction buffer at 37° C. for 30 min, in a 50 μL reaction. After heat inactivation (65° C. for 20 min), reactions were purified through an Illustra MicroSpin G-25 column (GE Healthcare) to remove unincorporated label. Duplex substrates (100 nM) were generated by annealing labeled oligonucleotides with equimolar amounts of unlabeled complementary oligonucleotide at 95° C. for 3 min, followed by slow cooling to room temperature. For cleavage assays, tracrRNA and crRNA were annealed by heating to 95° C. for 30 s, followed by slow cooling to room temperature. Cas9 (500 nM final concentration) was pre-incubated with the annealed tracrRNA:crRNA duplex (500 nM) in cleavage assay buffer (20 mM HEPES pH 7.5, 100 mM KCl, 5 mM MgCl2, 1 mM DTT, 5% glycerol) in a total volume of 9 μl. Reactions were initiated by the addition of 1 μl target DNA (10 nM) and incubated for 1 h at 37° C. Reactions were quenched by the addition of 20 μl of loading dye (5 mM EDTA, 0.025% SDS, 5% glycerol in formamide) and heated to 95° C. for 5 min. Cleavage products were resolved on 12% denaturing polyacrylamide gels containing 7 M urea and visualized by phosphorimaging (Storm, GE Life Sciences). Cleavage assays testing PAM requirements (FIG. 13B) were carried out using DNA duplex substrates that had been pre-annealed and purified on an 8% native acrylamide gel, and subsequently radiolabeled at both 5′ ends. The reactions were set up and analyzed as above.
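The volumes above follow the dilution relation C1·V1 = C2·V2. The sketch assumes that "10 nM" denotes the final substrate concentration in the 10 μl reaction, which is consistent with adding 1 μl of the 100 nM duplex stock; this reading is an interpretation, not a statement from the text. Likewise, a "500 nM final" Cas9 concentration implies a somewhat higher concentration in the 9 μl pre-incubation mix.

```python
def final_conc(stock_conc, added_ul, total_ul):
    """Concentration after diluting added_ul of stock into total_ul (C1*V1 = C2*V2)."""
    return stock_conc * added_ul / total_ul


def premix_conc(final, premix_ul, total_ul):
    """Concentration a premix of premix_ul needs so it reaches `final` once topped up to total_ul."""
    return final * total_ul / premix_ul
```

Under that reading, 1 μl of the 100 nM duplex in 10 μl gives 10 nM, and Cas9 in the 9 μl premix would be about 556 nM before substrate addition.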

Electrophoretic Mobility Shift Assays

Target DNA duplexes were formed by mixing each strand (10 nmol) in deionized water, heating to 95° C. for 3 min and slowly cooling to room temperature. All DNAs were purified on 8% native gels containing 1×TBE. DNA bands were visualized by UV shadowing, excised, and eluted by soaking gel pieces in DEPC-treated H2O. Eluted DNA was ethanol precipitated and dissolved in DEPC-treated H2O. DNA samples were 5′ end-labeled with [γ-32P]-ATP using T4 polynucleotide kinase (PNK; New England Biolabs) for 30 min at 37° C. PNK was heat-denatured at 65° C. for 20 min, and unincorporated radiolabel was removed using an Illustra MicroSpin G-25 column (GE Healthcare). Binding assays were performed in buffer containing 20 mM HEPES pH 7.5, 100 mM KCl, 5 mM MgCl2, 1 mM DTT and 10% glycerol in a total volume of 10 μl. Cas9 D10A/H840A double mutant was programmed with equimolar amounts of pre-annealed tracrRNA:crRNA duplex and titrated from 100 pM to 1 μM. Radiolabeled DNA was added to a final concentration of 20 pM. Samples were incubated for 1 h at 37° C. and resolved at 4° C. on an 8% native polyacrylamide gel containing 1×TBE and 5 mM MgCl2. Gels were dried and DNA visualized by phosphorimaging.
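The binding titration spans four orders of magnitude (100 pM to 1 μM), while the labeled DNA is held at 20 pM, far below most of the protein range. Under that condition the free protein concentration is close to the total, so a single-site hyperbolic isotherm is a common first approximation for the fraction of DNA shifted. The Python sketch below generates a log-spaced titration and evaluates that model. The isotherm, the number of points, and the Kd value are illustrative assumptions, not the analysis or results reported here.

```python
import math


def titration_series(low_nM: float, high_nM: float, points: int) -> list[float]:
    """Log-spaced concentrations from low_nM to high_nM, both ends included."""
    if points < 2 or low_nM <= 0 or high_nM <= low_nM:
        raise ValueError("need points >= 2 and 0 < low_nM < high_nM")
    lo, hi = math.log10(low_nM), math.log10(high_nM)
    step = (hi - lo) / (points - 1)
    return [10 ** (lo + i * step) for i in range(points)]


def fraction_bound(protein_nM: float, kd_nM: float) -> float:
    """Single-site isotherm; assumes DNA << protein so free protein ~ total."""
    return protein_nM / (kd_nM + protein_nM)


series = titration_series(0.1, 1000.0, 5)  # 100 pM to 1 uM, one point per decade
HYPOTHETICAL_KD_NM = 10.0                   # illustrative value, not a measured Kd
for conc in series:
    print(f"{conc:8.1f} nM -> {fraction_bound(conc, HYPOTHETICAL_KD_NM):.2f} bound")
```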

In Silico Analysis of DNA and Protein Sequences

Vector NTI package (Invitrogen) was used for DNA sequence analysis (Vector NTI) and comparative sequence analysis of proteins (AlignX).

In Silico Modeling of RNA Structure and Co-Folding

In silico predictions were performed using the Vienna RNA package algorithms (42, 43). RNA secondary structures and co-folding models were predicted with RNAfold and RNAcofold, respectively, and visualized with VARNA (44).
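RNAfold and RNAcofold predict minimum-free-energy structures using nearest-neighbor thermodynamic parameters. The Python sketch below is a much simpler, self-contained illustration of the dynamic-programming idea behind such predictors: Nussinov base-pair maximization, returning a dot-bracket string. It is a stand-in chosen for illustration, not the Vienna algorithm, and its predictions will generally differ from RNAfold's. The function name and the minimum hairpin loop of three nucleotides are assumptions. With the ViennaRNA Python bindings installed, `RNA.fold(seq)` and `RNA.cofold(seq1 + "&" + seq2)` provide the actual energy-based predictions.

```python
# Watson-Crick and G-U wobble pairs allowed in this simplified model.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> tuple[int, str]:
    """Maximize nested base pairs; return (pair count, dot-bracket string)."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0, ""
    # dp[i][j] = maximum number of nested pairs within seq[i..j] (inclusive).
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]                  # case 1: j is unpaired
            for k in range(i, j - min_loop):     # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best
    # Traceback to recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = dp[i][k - 1] if k > i else 0
                if left + dp[k + 1][j - 1] + 1 == dp[i][j]:
                    structure[k], structure[j] = "(", ")"
                    stack.extend([(i, k - 1), (k + 1, j - 1)])
                    break
    return dp[0][n - 1], "".join(structure)


print(nussinov("GGGAAAUCC"))  # (3, '(((...)))')
```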

Results

Bacteria and archaea have evolved RNA-mediated adaptive defense systems called clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) that protect organisms from invading viruses and plasmids (1-3). We show that in a subset of these systems, the mature crRNA that is base-paired to trans-activating crRNA (tracrRNA) forms a two-RNA structure that directs the CRISPR-associated protein Cas9 to introduce double-stranded (ds) breaks in target DNA. At sites complementary to the crRNA-guide sequence, the Cas9 HNH nuclease domain cleaves the complementary strand, whereas the Cas9 RuvC-like domain cleaves the noncomplementary strand. The dual-tracrRNA:crRNA, when engineered as a single RNA chimera, also directs sequence-specific Cas9 dsDNA cleavage. These studies reveal a family of endonucleases that use dual-RNAs for site-specific DNA cleavage and highlight the ability to exploit the system for RNA-programmable genome editing.

CRISPR/Cas defense systems rely on small RNAs for sequence-specific detection and silencing of foreign nucleic acids. CRISPR/Cas systems are composed of cas genes organized in operon(s) and CRISPR array(s) consisting of genome-targeting sequences (called spacers) interspersed with identical repeats (1-3). CRISPR/Cas-mediated immunity occurs in three steps. In the adaptive phase, bacteria and archaea harboring one or more CRISPR loci respond to viral or plasmid challenge by integrating short fragments of foreign sequence (protospacers) into the host chromosome at the proximal end of the CRISPR array (1-3). In the expression and interference phases, transcription of the repeat-spacer element into precursor CRISPR RNA (pre-crRNA) molecules, followed by enzymatic cleavage, yields the short crRNAs that can pair with complementary protospacer sequences of invading viral or plasmid targets (4-11). Target recognition by crRNAs directs the silencing of the foreign sequences by means of Cas proteins that function in complex with the crRNAs (10, 12-20).
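The array layout described above, with unique spacers separated by identical repeats, can be illustrated by recovering the spacers from a toy array string. The Python sketch below assumes exact, identical repeat copies and uses short made-up sequences. It models only the layout of the array, not transcription or the enzymatic processing of pre-crRNA.

```python
def split_crispr_array(array: str, repeat: str) -> list[str]:
    """Return the spacers found between consecutive exact copies of `repeat`."""
    if not repeat:
        raise ValueError("repeat must be non-empty")
    spacers = []
    start = array.find(repeat)
    while start != -1:
        after_repeat = start + len(repeat)
        next_repeat = array.find(repeat, after_repeat)
        if next_repeat == -1:
            break  # the last repeat is not followed by another copy
        spacers.append(array[after_repeat:next_repeat])
        start = next_repeat
    return spacers


# Toy array (hypothetical sequences): leader, then repeat-spacer-repeat-spacer-repeat.
REPEAT = "GTTTTAG"
toy_array = "ACGT" + REPEAT + "AAACCC" + REPEAT + "TTTGGG" + REPEAT
print(split_crispr_array(toy_array, REPEAT))  # ['AAACCC', 'TTTGGG']
```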

There are three types of CRISPR/Cas systems (21-23). The type I and III systems share some overarching features: specialized Cas endonucleases process the pre-crRNAs, and once mature, each crRNA assembles into a large multi-Cas protein complex capable of recognizing and cleaving nucleic acids complementary to the crRNA. In contrast, type II systems process pre-crRNAs by a different mechanism in which a trans-activating crRNA (tracrRNA) complementary to the repeat sequences in pre-crRNA triggers processing by the double-stranded (ds) RNA-specific ribonuclease RNase III in the presence of the Cas9 (formerly Csn1) protein (FIG. 15) (4, 24). Cas9 is thought to be the sole protein responsible for crRNA-guided silencing of foreign DNA (25-27).

We show that in type II systems, Cas9 proteins constitute a family of enzymes that require a base-paired structure formed between the activating tracrRNA and the targeting crRNA to cleave target dsDNA. Site-specific cleavage occurs at locations determined by both base-pairing complementarity between the crRNA and the target protospacer DNA and a short motif [referred to as the protospacer adjacent motif (PAM)] juxtaposed to the complementary region in the target DNA. Our study further demonstrates that the Cas9 endonuclease family can be programmed with single RNA molecules to cleave specific DNA sites, thereby facilitating the development of a simple and versatile RNA-directed system to generate dsDNA breaks for genome targeting and editing.

Cas9 is a DNA Endonuclease Guided by Two RNAs

Cas9, the hallmark protein of type II systems, has been hypothesized to be involved in both crRNA maturation and crRNA-guided DNA interference (FIG. 15) (4, 25-27). Cas9 is involved in crRNA maturation (4), but its direct participation in target DNA destruction has not been investigated. To test whether and how Cas9 might be capable of target DNA cleavage, we used an overexpression system to purify Cas9 protein derived from the pathogen Streptococcus pyogenes (FIGS. 16A-16B, see supplementary materials and methods) and tested its ability to cleave a plasmid DNA or an oligonucleotide duplex bearing a protospacer sequence complementary to a mature crRNA, and a bona fide PAM. We found that mature crRNA alone was incapable of directing Cas9-catalyzed plasmid DNA cleavage (FIG. 10A and FIG. 17A). However, addition of tracrRNA, which can pair with the repeat sequence of crRNA and is essential to crRNA maturation in this system, triggered Cas9 to cleave plasmid DNA (FIG. 10A and FIG. 17A). The cleavage reaction required both magnesium and the presence of a crRNA sequence complementary to the DNA; a crRNA capable of tracrRNA base pairing but containing a noncognate target DNA-binding sequence did not support Cas9-catalyzed plasmid cleavage (FIG. 10A; FIG. 17A, compare crRNA-sp2 to crRNA-sp1; and FIG. 18A). We obtained similar results with a short linear dsDNA substrate (FIG. 10B and FIGS. 17B and 17C). Thus, the trans-activating tracrRNA is a small noncoding RNA with two critical functions: triggering pre-crRNA processing by the enzyme RNase III (4) and subsequently activating crRNA-guided DNA cleavage by Cas9.

Cleavage of both plasmid and short linear dsDNA by tracrRNA:crRNA-guided Cas9 is site-specific (FIGS. 10C to 10E, and FIGS. 19A and 19B). Plasmid DNA cleavage produced blunt ends at a position three base pairs upstream of the PAM sequence (FIGS. 10C and 10E, and FIGS. 19A and 19C) (26). Similarly, within short dsDNA duplexes, the DNA strand that is complementary to the target-binding sequence in the crRNA (the complementary strand) is cleaved at a site three base pairs upstream of the PAM (FIGS. 10D and 10E, and FIGS. 19B and 19C). The noncomplementary DNA strand is cleaved at one or more sites within three to eight base pairs upstream of the PAM. Further investigation revealed that the noncomplementary strand is first cleaved endonucleolytically and subsequently trimmed by a 3′-5′ exonuclease activity (FIG. 18B). The cleavage rates by Cas9 under single-turnover conditions ranged from 0.3 to 1 min⁻¹, comparable to those of restriction endonucleases (FIG. 20A), whereas incubation of wild-type (WT) Cas9-tracrRNA:crRNA complex with a fivefold molar excess of substrate DNA provided evidence that the dual-RNA-guided Cas9 is a multiple-turnover enzyme (FIG. 20B). In contrast to the CRISPR type I Cascade complex (18), Cas9 cleaves both linearized and supercoiled plasmids (FIGS. 10A and 11A). Therefore, an invading plasmid can, in principle, be cleaved multiple times by Cas9 proteins programmed with different crRNAs.
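For the plasmid substrate, the cleavage-site mapping above reduces to a simple coordinate rule: a blunt break three base pairs upstream (5′) of the PAM, with the PAM read on the noncomplementary strand. The Python sketch below applies that rule to a noncomplementary-strand sequence. It assumes a 20-nt protospacer, matching the 20-nt crRNA target-binding region described in this section, and an NGG PAM. The sequence is made up, the function name is hypothetical, and the sketch ignores the additional 3′-5′ trimming of the noncomplementary strand seen with short duplexes.

```python
def blunt_cut_index(noncomplementary: str, protospacer_start: int,
                    protospacer_len: int = 20) -> int:
    """0-based index where the right-hand fragment begins after a blunt cut
    3 bp upstream of an NGG PAM (noncomplementary strand, 5'->3')."""
    pam_start = protospacer_start + protospacer_len
    pam = noncomplementary[pam_start:pam_start + 3].upper()
    if len(pam) != 3 or pam[1:] != "GG":
        raise ValueError(f"no NGG PAM after the protospacer (found {pam!r})")
    return pam_start - 3


protospacer = "GATTACAGATTACAGATTAC"       # made-up 20-nt protospacer
target = "CC" + protospacer + "TGG" + "A"   # noncomplementary strand, 5'->3'
cut = blunt_cut_index(target, protospacer_start=2)
print(target[:cut], target[cut:])  # CCGATTACAGATTACAGAT TACTGGA
```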

(FIG. 10A) Cas9 was programmed with a 42-nucleotide crRNA-sp2 (crRNA containing a spacer 2 sequence) in the presence or absence of 75-nucleotide tracrRNA. The complex was added to circular or XhoI-linearized plasmid DNA bearing a sequence complementary to spacer 2 and a functional PAM. crRNA-sp1, specificity control; M, DNA marker; kbp, kilobase pairs. See FIG. 17A. (FIG. 10B) Cas9 was programmed with crRNA-sp2 and tracrRNA (nucleotides 4 to 89). The complex was incubated with double- or single-stranded DNAs harboring a sequence complementary to spacer 2 and a functional PAM (4). The complementary or noncomplementary strands of the DNA were 5′-radiolabeled and annealed with a nonlabeled partner strand. nt, nucleotides. See FIGS. 17B and 17C. (FIG. 10C) Sequencing analysis of cleavage products from FIG. 10A. Termination of primer extension in the sequencing reaction indicates the position of the cleavage site. The 3′ terminal A overhang (asterisks) is an artifact of the sequencing reaction. See FIGS. 19A and 19C. (FIG. 10D) The cleavage products from FIG. 10B were analyzed alongside 5′ end-labeled size markers derived from the complementary and noncomplementary strands of the target DNA duplex. M, marker; P, cleavage product. See FIGS. 19B and 19C. (FIG. 10E) Schematic representation of tracrRNA, crRNA-sp2, and protospacer 2 DNA sequences. Regions of crRNA complementarity to tracrRNA (overline) and the protospacer DNA (underline) are represented. The PAM sequence is labeled; cleavage sites mapped in (FIG. 10C) and (FIG. 10D) are represented by white-filled arrows (FIG. 10C), a black-filled arrow [(FIG. 10D), complementary strand], and a black bar [(FIG. 10D), noncomplementary strand].

FIG. 15 depicts the type II RNA-mediated CRISPR/Cas immune pathway. The expression and interference steps are represented in the drawing. The type II CRISPR/Cas loci are composed of an operon of four genes encoding the proteins Cas9, Cas1, Cas2 and Csn2, a CRISPR array consisting of a leader sequence followed by identical repeats (black rectangles) interspersed with unique genome-targeting spacers (diamonds), and a sequence encoding the trans-activating tracrRNA. Represented here is the type II CRISPR/Cas locus of S. pyogenes SF370 (Accession number NC_002737) (4). Experimentally confirmed promoters and the transcriptional terminator in this locus are indicated (4). The CRISPR array is transcribed as a precursor CRISPR RNA (pre-crRNA) molecule that undergoes a maturation process specific to the type II systems (4). In S. pyogenes SF370, tracrRNA is transcribed as two primary transcripts of 171 and 89 nt in length that have complementarity to each repeat of the pre-crRNA. The first processing event involves pairing of tracrRNA to pre-crRNA, forming a duplex RNA that is recognized and cleaved by the housekeeping endoribonuclease RNase III in the presence of the Cas9 protein. RNase III-mediated cleavage of the duplex RNA generates a 75-nt processed tracrRNA and 66-nt intermediate crRNAs, each consisting of a central region containing the sequence of one spacer flanked by portions of the repeat sequence. A second processing event, mediated by unknown ribonuclease(s), leads to the formation of mature crRNAs of 39 to 42 nt in length, consisting of a 5′-terminal spacer-derived guide sequence and a repeat-derived 3′-terminal sequence. Following the first and second processing events, mature tracrRNA remains paired to the mature crRNAs and bound to the Cas9 protein. In this ternary complex, the dual tracrRNA:crRNA structure acts as a guide RNA that directs the endonuclease Cas9 to the cognate target DNA.
Target recognition by the Cas9-tracrRNA:crRNA complex is initiated by scanning the invading DNA molecule for homology between the protospacer sequence in the target DNA and the spacer-derived sequence in the crRNA. In addition to the DNA protospacer-crRNA spacer complementarity, DNA targeting requires the presence of a short motif (NGG, where N can be any nucleotide) adjacent to the protospacer (protospacer adjacent motif, PAM). Following pairing between the dual-RNA and the protospacer sequence, an R-loop is formed and Cas9 subsequently introduces a double-stranded break (DSB) in the DNA. Cleavage of target DNA by Cas9 requires two catalytic domains in the protein. At a specific site relative to the PAM, the HNH domain cleaves the complementary strand of the DNA, while the RuvC-like domain cleaves the noncomplementary strand.
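The two recognition requirements described here, complementarity between the crRNA spacer and the protospacer plus an adjacent NGG PAM, can be sketched as a naive scan of a DNA sequence. Because the crRNA guide pairs with the complementary strand, the protospacer on the noncomplementary strand has the same sequence as the guide, with T in place of U. The sketch therefore searches both strands for that sequence followed by NGG. It is an exact-match illustration with hypothetical function names and a made-up sequence. It does not model mismatch tolerance, R-loop formation, or how Cas9 actually scans DNA.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(_COMPLEMENT)[::-1]


def find_targets(dna: str, guide_rna: str) -> list[tuple[str, int]]:
    """Return (strand, start) for each exact protospacer match followed by NGG.

    Start positions are 0-based in the coordinates of the strand searched.
    """
    protospacer = guide_rna.upper().replace("U", "T")
    n = len(protospacer)
    top = dna.upper()
    hits = []
    for strand, seq in (("+", top), ("-", reverse_complement(top))):
        for i in range(len(seq) - n - 2):
            # protospacer match, then N at i+n, then GG at i+n+1 and i+n+2
            if seq[i:i + n] == protospacer and seq[i + n + 1:i + n + 3] == "GG":
                hits.append((strand, i))
    return hits


guide = "GAUUACAGAUUACAGAUUAC"  # made-up 20-nt crRNA guide
target = "AAA" + "GATTACAGATTACAGATTAC" + "AGG" + "TTT"
print(find_targets(target, guide))  # [('+', 3)]
```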

(FIG. 16A) S. pyogenes Cas9 was expressed in E. coli as a fusion protein containing an N-terminal His6-MBP tag and purified by a combination of affinity, ion exchange and size exclusion chromatographic steps. The affinity tag was removed by TEV protease cleavage following the affinity purification step. Shown is a chromatogram of the final size exclusion chromatography step on a Superdex 200 (16/60) column. Cas9 elutes as a single monomeric peak devoid of contaminating nucleic acids, as judged by the ratio of absorbances at 280 and 260 nm. Inset: eluted fractions were resolved by SDS-PAGE on a 10% polyacrylamide gel and stained with SimplyBlue Safe Stain (Invitrogen). (FIG. 16B) SDS-PAGE analysis of purified Cas9 orthologs. Cas9 orthologs were purified as described in Supplementary Materials and Methods. Each purified Cas9 (2.5 μg) was analyzed on a 4-20% gradient polyacrylamide gel and stained with SimplyBlue Safe Stain.

FIG. 17A-17C (also see FIG. 10A-10E). The protospacer 1 sequence originates from S. pyogenes SF370 (M1) SPy_0700, target of S. pyogenes SF370 crRNA-sp1 (4). Here, the protospacer 1 sequence was manipulated by changing the PAM from a nonfunctional sequence (TTG) to a functional one (TGG). The protospacer 4 sequence originates from S. pyogenes MGAS10750 (M4) MGAS10750_Spy1285, target of S. pyogenes SF370 crRNA-sp4 (4). (FIG. 17A) Protospacer 1 plasmid DNA cleavage guided by cognate tracrRNA:crRNA duplexes. The cleavage products were resolved by agarose gel electrophoresis and visualized by ethidium bromide staining. M, DNA marker; fragment sizes in base pairs are indicated. (FIG. 17B) Protospacer 1 oligonucleotide DNA cleavage guided by cognate tracrRNA:crRNA-sp1 duplex. The cleavage products were resolved by denaturing polyacrylamide gel electrophoresis and visualized by phosphorimaging. Fragment sizes in nucleotides are indicated. (FIG. 17C) Protospacer 4 oligonucleotide DNA cleavage guided by cognate tracrRNA:crRNA-sp4 duplex. The cleavage products were resolved by denaturing polyacrylamide gel electrophoresis and visualized by phosphorimaging. Fragment sizes in nucleotides are indicated. (FIG. 17A, FIG. 17B, FIG. 17C) Experiments in FIG. 17A were performed as in FIG. 10A; in FIG. 17B and in FIG. 17C as in FIG. 10B. (FIG. 17B, FIG. 17C) A schematic of the tracrRNA:crRNA-target DNA interaction is shown below. The regions of crRNA complementarity to tracrRNA and the protospacer DNA are overlined and underlined, respectively. The PAM sequence is labeled.

FIG. 18 (also see FIG. 10A-10E). (FIG. 18A) Protospacer 2 plasmid DNA was incubated with Cas9 complexed with tracrRNA:crRNA-sp2 in the presence of different concentrations of Mg2+, Mn2+, Ca2+, Zn2+, Co2+, Ni2+ or Cu2+. The cleavage products were resolved by agarose gel electrophoresis and visualized by ethidium bromide staining. Plasmid forms are indicated. (FIG. 18B) A protospacer 4 oligonucleotide DNA duplex containing a PAM motif was annealed and gel-purified prior to radiolabeling at both 5′ ends. The duplex (10 nM final concentration) was incubated with Cas9 programmed with tracrRNA (nucleotides 23-89) and crRNA-sp4 (500 nM final concentration, 1:1). At indicated time points (min), 10 μl aliquots of the cleavage reaction were quenched with formamide buffer containing 0.025% SDS and 5 mM EDTA, and analyzed by denaturing polyacrylamide gel electrophoresis as in FIG. 10B. Sizes in nucleotides are indicated.

(FIG. 19A) Mapping of protospacer 1 plasmid DNA cleavage. Cleavage products from FIG. 17A were analyzed by sequencing as in FIG. 10C. Note that the 3′ terminal A overhang (asterisk) is an artifact of the sequencing reaction. (FIG. 19B) Mapping of protospacer 4 oligonucleotide DNA cleavage. Cleavage products from FIG. 17C were analyzed by denaturing polyacrylamide gel electrophoresis alongside 5′ end-labeled oligonucleotide size markers derived from the complementary and noncomplementary strands of the protospacer 4 duplex DNA. M, marker; P, cleavage product. Lanes 1-2: complementary strand. Lanes 3-4: noncomplementary strand. Fragment sizes in nucleotides are indicated. (FIG. 19C) Schematic representations of tracrRNA, crRNA-sp1 and protospacer 1 DNA sequences (top) and tracrRNA, crRNA-sp4 and protospacer 4 DNA sequences (bottom). tracrRNA:crRNA forms a dual-RNA structure directed to complementary protospacer DNA through crRNA-protospacer DNA pairing. The regions of crRNA complementary to tracrRNA and the protospacer DNA are overlined and underlined, respectively. The cleavage sites in the complementary and noncomplementary DNA strands mapped in (FIG. 19A) (top) and (FIG. 19B) (bottom) are represented above the sequences with arrows (FIG. 19A and FIG. 19B, complementary strand) and a black bar (FIG. 19B, noncomplementary strand), respectively.

(FIG. 20A) Single-turnover kinetics of Cas9 under different RNA pre-annealing and protein-RNA pre-incubation conditions. Protospacer 2 plasmid DNA was incubated with either Cas9 pre-incubated with pre-annealed tracrRNA:crRNA-sp2 (○), Cas9 not pre-incubated with pre-annealed tracrRNA:crRNA-sp2 (●), Cas9 pre-incubated with non-pre-annealed tracrRNA and crRNA-sp2 (□), or Cas9 not pre-incubated with non-pre-annealed RNAs (▪). The cleavage activity was monitored over time and analyzed by agarose gel electrophoresis followed by ethidium bromide staining. The average percentage of cleavage from three independent experiments is plotted against time (min) and fitted by nonlinear regression. The calculated cleavage rates (kobs) are shown in the table. The results suggest that binding of Cas9 to the RNAs is not rate-limiting under the conditions tested. Plasmid forms are indicated. The obtained kobs values are comparable to those of restriction endonucleases, which are typically on the order of 1-10 per min (45-47). (FIG. 20B) Cas9 is a multiple-turnover endonuclease. Cas9 loaded with duplexed tracrRNA:crRNA-sp2 (1 nM, 1:1:1; indicated with a gray line on the graph) was incubated with a 5-fold excess of native protospacer 2 plasmid DNA. Cleavage was monitored by withdrawing samples from the reaction at defined time intervals (0 to 120 min), followed by agarose gel electrophoresis analysis (top) and determination of the amount of cleavage product (nM) (bottom). Standard deviations of three independent experiments are indicated. In the time interval investigated, 1 nM Cas9 was able to cleave ~2.5 nM plasmid DNA.
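Under single-turnover conditions, the fraction of plasmid cleaved is commonly described by a first-order approach to completion, f(t) = 1 - exp(-k_obs t). The half-time is then ln 2 / k_obs, roughly 2.3 to 0.7 min for the 0.3 to 1 per min range quoted in the Results. FIG. 20A fits the data by nonlinear regression. The Python sketch below instead fits the linearized form, -ln(1 - f) = k_obs t, through the origin. This is a simpler alternative that assumes the reaction proceeds to 100% cleavage. The data are synthetic and the function name is hypothetical.

```python
import math


def fit_kobs(times_min: list[float], fractions: list[float]) -> float:
    """Least-squares slope of -ln(1 - f) against t, constrained through the origin."""
    num = den = 0.0
    for t, f in zip(times_min, fractions):
        if not 0.0 <= f < 1.0:
            raise ValueError("cleaved fractions must lie in [0, 1)")
        y = -math.log(1.0 - f)  # linearized first-order progress
        num += t * y
        den += t * t
    if den == 0.0:
        raise ValueError("need at least one nonzero time point")
    return num / den


# Synthetic time course generated with k_obs = 0.5 per min (illustrative only).
times = [0.5, 1.0, 2.0, 4.0, 8.0]
fractions = [1.0 - math.exp(-0.5 * t) for t in times]
k = fit_kobs(times, fractions)
print(f"k_obs = {k:.2f} per min, half-time = {math.log(2) / k:.2f} min")
# prints "k_obs = 0.50 per min, half-time = 1.39 min"
```

If the plateau in a real time course stays below 100%, the linearized fit is biased, which is one reason to prefer the nonlinear fit used in the figure.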

Each Cas9 Nuclease Domain Cleaves One DNA Strand

Cas9 contains domains homologous to both HNH and RuvC endonucleases (FIG. 11A and FIG. 3A and FIG. 3B) (21-23, 27, 28). We designed and purified Cas9 variants containing inactivating point mutations in the catalytic residues of either the HNH or RuvC-like domains (FIG. 11A and FIG. 3A and FIG. 3B) (23, 27). Incubation of these variant Cas9 proteins with native plasmid DNA showed that dual-RNA-guided mutant Cas9 proteins yielded nicked open circular plasmids, whereas the WT Cas9 protein-tracrRNA:crRNA complex produced a linear DNA product (FIG. 10A and FIG. 11A and FIG. 17A and FIG. 21A). This result indicates that the Cas9 HNH and RuvC-like domains each cleave one plasmid DNA strand. To determine which strand of the target DNA is cleaved by each Cas9 catalytic domain, we incubated the mutant Cas9-tracrRNA:crRNA complexes with short dsDNA substrates in which either the complementary or noncomplementary strand was radiolabeled at its 5′ end. The resulting cleavage products indicated that the Cas9 HNH domain cleaves the complementary DNA strand, whereas the Cas9 RuvC-like domain cleaves the noncomplementary DNA strand (FIG. 11B and FIG. 21B).
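The strand assignments above, together with the mutations listed in FIG. 11A (D10A in the RuvC-like domain, H840A in the HNH domain), predict which plasmid form each Cas9 variant should produce from supercoiled DNA. Nicking one strand gives an open circle, and cutting both gives a linear product. The Python sketch below encodes that logic as a lookup. It is a bookkeeping illustration with hypothetical names, and it assumes each point mutation fully inactivates its own domain without affecting the other.

```python
STRAND_CUT_BY = {"HNH": "complementary", "RuvC": "noncomplementary"}
DOMAIN_INACTIVATED_BY = {"D10A": "RuvC", "H840A": "HNH"}


def predicted_plasmid_form(mutations: frozenset[str]) -> tuple[frozenset[str], str]:
    """Return the strands cut and the expected product from supercoiled plasmid."""
    dead = {DOMAIN_INACTIVATED_BY[m] for m in mutations}
    strands = frozenset(s for d, s in STRAND_CUT_BY.items() if d not in dead)
    if len(strands) == 2:
        form = "linear"
    elif len(strands) == 1:
        form = "nicked open circular"
    else:
        form = "uncleaved"
    return strands, form


for variant in (frozenset(), frozenset({"D10A"}), frozenset({"H840A"}),
                frozenset({"D10A", "H840A"})):
    strands, form = predicted_plasmid_form(variant)
    label = "/".join(sorted(variant)) or "WT"
    print(f"{label:10s} {form:22s} {sorted(strands)}")
```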

(FIG. 11A) (Top) Schematic representation of Cas9 domain structure showing the positions of domain mutations. D10A, Asp10→Ala10; H840A, His840→Ala840. Complexes of WT or nuclease mutant Cas9 proteins with tracrRNA:crRNA-sp2 were assayed for endonuclease activity as in FIG. 10A. (FIG. 11B) Complexes of WT Cas9 or nuclease domain mutants with tracrRNA and crRNA-sp2 were tested for activity as in FIG. 10B.

FIG. 3A and FIG. 3B The amino-acid sequence of Cas9 from S. pyogenes (SEQ ID NO:8) is represented. Cas9/Csn1 proteins from diverse species have two domains that include motifs homologous to both HNH and RuvC endonucleases. (FIG. 3A) Motifs 1-4 (motif numbers are marked on the left side of the sequence) are shown for S. pyogenes Cas9/Csn1. The three predicted RuvC-like motifs (1, 2, 4) and the predicted HNH motif (3) are overlined. Residues Asp10 and His840, which were substituted by Ala in this study, are highlighted by an asterisk above the sequence. Underlined residues are highly conserved among Cas9 proteins from different species. Mutations in underlined residues are likely to have functional consequences on Cas9 activity. Note that in the present study coupling of the two nuclease-like activities is experimentally demonstrated (FIGS. 11A-11B and FIGS. 21A-21B). (FIG. 3B) Domains 1 (amino acids 7-166) and 2 (amino acids 731-1003), which include motifs 1-4, are depicted for S. pyogenes Cas9/Csn1. Refer to Table 1 and FIG. 5 for additional information.

FIG. 21A-21B Protospacer DNA cleavage by cognate tracrRNA:crRNA-directed Cas9 mutants containing mutations in the HNH or RuvC-like domain. (FIG. 21A) Protospacer 1 plasmid DNA cleavage. The experiment was performed as in FIG. 11A. Plasmid DNA conformations and sizes in base pairs are indicated. (FIG. 21B) Protospacer 4 oligonucleotide DNA cleavage. The experiment was performed as in FIG. 11B. Sizes in nucleotides are indicated.

Dual-RNA Requirements for Target DNA Binding and Cleavage

tracrRNA might be required for target DNA binding and/or to stimulate the nuclease activity of Cas9 downstream of target recognition. To distinguish between these possibilities, we used an electrophoretic mobility shift assay to monitor target DNA binding by catalytically inactive Cas9 in the presence or absence of crRNA and/or tracrRNA. Addition of tracrRNA substantially enhanced target DNA binding by Cas9, whereas we observed little specific DNA binding with Cas9 alone or Cas9-crRNA (FIG. 22). This indicates that tracrRNA is required for target DNA recognition, possibly by properly orienting the crRNA for interaction with the complementary strand of target DNA. The predicted tracrRNA:crRNA secondary structure includes base pairing between the 22 nucleotides at the 3′ terminus of the crRNA and a segment near the 5′ end of the mature tracrRNA (FIG. 10E). This interaction creates a structure in which the 5′-terminal 20 nucleotides of the crRNA, which vary in sequence in different crRNAs, are available for target DNA binding. The bulk of the tracrRNA downstream of the crRNA base-pairing region is free to form additional RNA structure(s) and/or to interact with Cas9 or the target DNA site. To determine whether the entire length of the tracrRNA is necessary for site-specific Cas9-catalyzed DNA cleavage, we tested Cas9-tracrRNA:crRNA complexes reconstituted using full-length mature (42-nucleotide) crRNA and various truncated forms of tracrRNA lacking sequences at their 5′ or 3′ ends. These complexes were tested for cleavage using a short target dsDNA. A substantially truncated version of the tracrRNA retaining nucleotides 23 to 48 of the native sequence was capable of supporting robust dual-RNA-guided Cas9-catalyzed DNA cleavage (FIG. 12A and FIG. 12C, and FIG. 23A and FIG. 23B). Truncation of the crRNA from either end showed that Cas9-catalyzed cleavage in the presence of tracrRNA could be triggered with crRNAs missing the 3′-terminal 10 nucleotides (FIG. 12B and FIG. 12C). In contrast, a 10-nucleotide deletion from the 5′ end of crRNA abolished DNA cleavage by Cas9 (FIG. 12B). We also analyzed Cas9 orthologs from various bacterial species for their ability to support S. pyogenes tracrRNA:crRNA-guided DNA cleavage. In contrast to closely related S. pyogenes Cas9 orthologs, more distantly related orthologs were not functional in the cleavage reaction (FIG. 24A-24D). Similarly, S. pyogenes Cas9 guided by tracrRNA:crRNA duplexes originating from more distant systems was unable to cleave DNA efficiently (FIG. 24A-24D). Species specificity of dual-RNA-guided cleavage of DNA indicates coevolution of Cas9, tracrRNA, and the crRNA repeat, as well as the existence of a still unknown structure and/or sequence in the dual-RNA that is critical for the formation of the ternary complex with specific Cas9 orthologs.
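The 42-nucleotide mature crRNA described above divides into a 5′-terminal 20-nt target-binding region and a 3′-terminal 22-nt region that pairs with tracrRNA. The Python sketch below tracks how many nucleotides of each region survive a given end truncation, which makes the contrast in FIG. 12B concrete. Removing 10 nt from the 3′ end leaves the guide intact and shortens only the tracrRNA-pairing region, whereas removing 10 nt from the 5′ end deletes half of the guide. The helper is hypothetical bookkeeping and does not by itself predict cleavage activity.

```python
GUIDE_LEN = 20                    # 5'-terminal nucleotides available for target DNA binding
PAIR_LEN = 22                     # 3'-terminal nucleotides base-paired to tracrRNA
CRRNA_LEN = GUIDE_LEN + PAIR_LEN  # 42-nt mature crRNA


def regions_after_trim(trim_5p: int, trim_3p: int) -> tuple[int, int]:
    """Return (guide nt kept, tracrRNA-pairing nt kept) after end truncation."""
    if trim_5p < 0 or trim_3p < 0 or trim_5p + trim_3p > CRRNA_LEN:
        raise ValueError("invalid truncation")
    kept_start, kept_end = trim_5p, CRRNA_LEN - trim_3p  # half-open [start, end)
    guide_kept = max(0, min(kept_end, GUIDE_LEN) - kept_start)
    pair_kept = max(0, kept_end - max(kept_start, GUIDE_LEN))
    return guide_kept, pair_kept


print(regions_after_trim(0, 10))  # (20, 12): 3' truncation, guide intact
print(regions_after_trim(10, 0))  # (10, 22): 5' truncation, half the guide lost
```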

To investigate the protospacer sequence requirements for type II CRISPR/Cas immunity in bacterial cells, we analyzed a series of protospacer-containing plasmid DNAs harboring single-nucleotide mutations for their maintenance following transformation in S. pyogenes and their ability to be cleaved by Cas9 in vitro. In contrast to point mutations introduced at the 5′ end of the protospacer, mutations in the region close to the PAM and the Cas9 cleavage sites were not tolerated in vivo and resulted in decreased plasmid cleavage efficiency in vitro (FIG. 12D). Our results are in agreement with a previous report of protospacer escape mutants selected in the type II CRISPR system from S. thermophilus in vivo (27, 29). Furthermore, the plasmid maintenance and cleavage results hint at the existence of a “seed” region located at the 3′ end of the protospacer sequence that is crucial for the interaction with crRNA and subsequent cleavage by Cas9. In support of this notion, Cas9 enhanced complementary DNA strand hybridization to the crRNA; this enhancement was the strongest in the 3′-terminal region of the crRNA targeting sequence (FIG. 25A-25C). Corroborating this finding, a contiguous stretch of at least 13 base pairs between the crRNA and the target DNA site proximal to the PAM is required for efficient target cleavage, whereas up to six contiguous mismatches in the 5′-terminal region of the protospacer are tolerated (FIG. 12E). These findings are reminiscent of the previously observed seed-sequence requirements for target nucleic acid recognition in Argonaute proteins (30, 31) and the Cascade and Csy CRISPR complexes (13, 14).
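The mismatch data suggest a simple rule of thumb. Count how many contiguous positions, starting at the PAM-proximal end, match between the crRNA guide and the protospacer, and expect efficient cleavage only when that run reaches about 13 base pairs. The Python sketch below encodes that heuristic. The threshold comes from FIG. 12E as summarized above, the sequences are made up, and both sequences are written in the same sense as the noncomplementary strand. Real tolerance also depends on the position and identity of each mismatch, which the sketch ignores.

```python
def pam_proximal_match(guide: str, protospacer: str) -> int:
    """Length of the contiguous match counted inward from the PAM-proximal (3') end."""
    if len(guide) != len(protospacer):
        raise ValueError("guide and protospacer must have the same length")
    g_seq = guide.upper().replace("U", "T")
    run = 0
    for g, p in zip(reversed(g_seq), reversed(protospacer.upper())):
        if g != p:
            break
        run += 1
    return run


def likely_efficient(guide: str, protospacer: str, seed_min: int = 13) -> bool:
    """Heuristic only: require at least `seed_min` contiguous PAM-proximal matches."""
    return pam_proximal_match(guide, protospacer) >= seed_min


target = "GATTACAGATTACAGATTAC"                     # made-up protospacer
five_prime_mismatches = "CTAATG" + target[6:]       # 6 contiguous 5' mismatches
central_mismatch = target[:10] + "C" + target[11:]  # one mismatch at position 11
print(likely_efficient(five_prime_mismatches, target))  # True (14-bp run)
print(likely_efficient(central_mismatch, target))       # False (9-bp run)
```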

(FIG. 12A) Cas9-tracrRNA:crRNA complexes were reconstituted using 42-nucleotide crRNA-sp2 and truncated tracrRNA constructs and were assayed for cleavage activity as in FIG. 10B. (FIG. 12B) Cas9 programmed with full-length tracrRNA and crRNA-sp2 truncations was assayed for activity as in (FIG. 12A). (FIG. 12C) Minimal regions of tracrRNA and crRNA capable of guiding Cas9-mediated DNA cleavage (shaded region). (FIG. 12D) Plasmids containing WT or mutant protospacer 2 sequences with indicated point mutations were cleaved in vitro by programmed Cas9 as in FIG. 10A and used for transformation assays of WT or pre-crRNA-deficient S. pyogenes. The transformation efficiency was calculated as colony-forming units (CFU) per microgram of plasmid DNA. Error bars represent SDs for three biological replicates. (FIG. 12E) Plasmids containing WT and mutant protospacer 2 inserts with varying extent of crRNA-target DNA mismatches (bottom) were cleaved in vitro by programmed Cas9 (top). The cleavage reactions were further digested with XmnI. The 1880- and 800-bp fragments are Cas9-generated cleavage products. M, DNA marker.

FIG. 22 Electrophoretic mobility shift assays were performed using protospacer 4 target DNA duplex and Cas9 (containing the nuclease-domain inactivating mutations D10A and H840A) alone or in the presence of crRNA-sp4, tracrRNA (75 nt), or both. The target DNA duplex was radiolabeled at both 5′ ends. Cas9 (D10A/H840A) and complexes were titrated from 1 nM to 1 μM. Binding was analyzed by 8% native polyacrylamide gel electrophoresis and visualized by phosphorimaging. Note that Cas9 alone binds target DNA with moderate affinity. This binding is unaffected by the addition of crRNA, suggesting that it represents a sequence-nonspecific interaction with the dsDNA. Furthermore, this interaction can be outcompeted by tracrRNA alone in the absence of crRNA. In the presence of both crRNA and tracrRNA, target DNA binding is substantially enhanced and yields a species with distinct electrophoretic mobility, indicative of specific target DNA recognition.

FIG. 23A-23B A fragment of tracrRNA encompassing a part of the crRNA paired region and a portion of the downstream region is sufficient to direct cleavage of protospacer oligonucleotide DNA by Cas9. (FIG. 23A) Protospacer 1 oligonucleotide DNA cleavage and (FIG. 23B) Protospacer 4 oligonucleotide DNA cleavage by Cas9 guided with a mature cognate crRNA and various tracrRNA fragments. (FIG. 23A, FIG. 23B) Sizes in nucleotides are indicated.

FIG. 24A-24D Like Cas9 from S. pyogenes, the closely related Cas9 orthologs from the Gram-positive bacteria L. innocua and S. thermophilus cleave protospacer DNA when targeted by tracrRNA:crRNA from S. pyogenes. However, under the same conditions, DNA cleavage by the less closely related Cas9 orthologs from the Gram-negative bacteria C. jejuni and N. meningitidis is not observed. Spy, S. pyogenes SF370 (Accession Number NC_002737); Sth, S. thermophilus LMD-9 (STER_1477 Cas9 ortholog; Accession Number NC_008532); Lin, L. innocua Clip11262 (Accession Number NC_003212); Cje, C. jejuni NCTC 11168 (Accession Number NC_002163); Nme, N. meningitidis A Z2491 (Accession Number NC_003116). (FIG. 24A) Cleavage of protospacer plasmid DNA. Protospacer 2 plasmid DNA (300 ng) was subjected to cleavage by different Cas9 orthologs (500 nM) guided by hybrid tracrRNA:crRNA-sp2 duplexes (500 nM, 1:1) from different species. To design the RNA duplexes, we predicted tracrRNA sequences from L. innocua and N. meningitidis based on previously published Northern blot data (4). The dual-hybrid RNA duplexes consist of species-specific tracrRNA and a heterologous crRNA. The heterologous crRNA sequence was engineered to contain the S. pyogenes DNA-targeting sp2 sequence at the 5′ end fused to the L. innocua or N. meningitidis tracrRNA-binding repeat sequence at the 3′ end. Cas9 orthologs from S. thermophilus and L. innocua, but not from N. meningitidis or C. jejuni, can be guided by S. pyogenes tracrRNA:crRNA-sp2 to cleave protospacer 2 plasmid DNA, albeit with slightly decreased efficiency. Similarly, the hybrid L. innocua tracrRNA:crRNA-sp2 can guide S. pyogenes Cas9 to cleave the target DNA with high efficiency, whereas the hybrid N. meningitidis tracrRNA:crRNA-sp2 triggers only slight DNA cleavage activity by S. pyogenes Cas9. As controls, N. meningitidis and L. innocua Cas9 orthologs cleave protospacer 2 plasmid DNA when guided by the cognate hybrid tracrRNA:crRNA-sp2.
Note that, as mentioned above, the tracrRNA sequence of N. meningitidis is predicted only and has not yet been confirmed by RNA sequencing. Therefore, the low cleavage efficiency could result either from low activity of the Cas9 ortholog or from a non-optimally designed tracrRNA sequence. (FIG. 24B) Cleavage of protospacer oligonucleotide DNA. A 5′-end radioactively labeled complementary strand oligonucleotide (10 nM) pre-annealed with unlabeled noncomplementary strand oligonucleotide (10 nM) (left), or a 5′-end radioactively labeled noncomplementary strand oligonucleotide (10 nM) pre-annealed with unlabeled complementary strand oligonucleotide (10 nM) (right), in both cases of protospacer 1, was subjected to cleavage by various Cas9 orthologs (500 nM) guided by the tracrRNA:crRNA-sp1 duplex from S. pyogenes (500 nM, 1:1). Cas9 orthologs from S. thermophilus and L. innocua, but not from N. meningitidis or C. jejuni, can be guided by the S. pyogenes cognate dual-RNA to cleave the protospacer oligonucleotide DNA, albeit with decreased efficiency. Note that the cleavage site on the complementary DNA strand is identical for all three orthologs, whereas cleavage of the noncomplementary strand occurs at distinct positions. (FIG. 24C) Amino acid sequence identity of Cas9 orthologs. The S. pyogenes, S. thermophilus and L. innocua Cas9 orthologs share a high percentage of amino acid identity. In contrast, the C. jejuni and N. meningitidis Cas9 proteins differ in sequence and length (~300-400 amino acids shorter). (FIG. 24D) Co-folding of engineered species-specific heterologous crRNA sequences with the corresponding tracrRNA orthologs from S. pyogenes (experimentally confirmed (4)), L. innocua (predicted) or N. meningitidis (predicted). tracrRNAs, crRNA spacer 2 fragments and crRNA repeat fragments are traced and labeled. The L. innocua and S. pyogenes hybrid tracrRNA:crRNA-sp2 duplexes share very similar structural characteristics, which are distinct from those of the N. meningitidis hybrid tracrRNA:crRNA. Together with the cleavage data described above in FIG. 24A and FIG. 24B, the co-folding predictions suggest that species-specific cleavage of target DNA by Cas9-tracrRNA:crRNA is dictated by a still unknown structural feature in the tracrRNA:crRNA duplex that is recognized specifically by a cognate Cas9 ortholog. It was predicted that the species specificity of cleavage observed in FIG. 24A and FIG. 24B occurs at the level of Cas9 binding to the dual tracrRNA:crRNA. Dual-RNA-guided Cas9 cleavage of target DNA can be species-specific. Depending on the degree of diversity/evolution among Cas9 proteins and tracrRNA:crRNA duplexes, Cas9 and dual-RNA orthologs are partially interchangeable.
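FIG. 24C compares Cas9 orthologs by percent amino acid identity, but the legend does not say how identity was computed. The minimal sketch below shows one common convention, assuming the two proteins are already pairwise-aligned with "-" marking gaps and that gapped columns are excluded from the denominator. The function name and the toy fragments are hypothetical; other conventions (dividing by the full alignment length or by the shorter sequence) give different numbers.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap.

    Assumes both strings come from one pairwise alignment ('-' marks a gap).
    Excluding gapped columns from the denominator is one convention among several.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped column: not counted
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("alignment has no ungapped columns")
    return 100.0 * identical / compared


# Hypothetical toy fragments, not real Cas9 sequences.
print(round(percent_identity("ACDEF-GHIK", "ACDEYLGHIK"), 1))  # prints 88.9 (8 of 9 ungapped columns match)
```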

FIG. 25A-25C A series of 8-nucleotide DNA probes complementary to regions in the crRNA encompassing the DNA-targeting region and tracrRNA-binding region were analyzed for their ability to hybridize to the crRNA in the context of a tracrRNA:crRNA duplex and the Cas9-tracrRNA:crRNA ternary complex. (FIG. 25A) Schematic representation of the sequences of DNA probes used in the assay and their binding sites in crRNA-sp4. (FIG. 25B-FIG. 25C) Electrophoretic mobility shift assays of target DNA probes with tracrRNA:crRNA-sp4 or Cas9-tracrRNA:crRNA-sp4. The tracrRNA(15-89) construct was used in the experiment. Binding of the duplexes or complexes to target oligonucleotide DNAs was analyzed on a 16% native polyacrylamide gel and visualized by phosphorimaging.
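The probes in FIG. 25A are 8-nucleotide DNA oligonucleotides complementary to successive regions of crRNA-sp4. The sketch below shows how such a tiling could be generated. It assumes a hypothetical input RNA (the crRNA-sp4 sequence is not reproduced here) and a user-chosen step size, and writes each probe 5′-to-3′ as the reverse complement of its RNA window so that it pairs antiparallel with the crRNA. The function name is illustrative only.

```python
# Watson-Crick partner (as DNA) for each RNA base.
RNA_TO_DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}


def tile_dna_probes(crrna: str, probe_len: int = 8, step: int = 1) -> list[tuple[int, str]]:
    """Return (window_start, probe) pairs for DNA probes complementary to crRNA windows.

    window_start is the 0-based offset of the window in the crRNA (read 5'->3').
    Each probe is written 5'->3', i.e. the reverse complement of its window.
    """
    crrna = crrna.upper()
    probes = []
    for start in range(0, len(crrna) - probe_len + 1, step):
        window = crrna[start:start + probe_len]
        probe = "".join(RNA_TO_DNA_PARTNER[base] for base in reversed(window))
        probes.append((start, probe))
    return probes


# Hypothetical 10-nt RNA, tiled every 2 nt.
for start, probe in tile_dna_probes("GGAUCCAUGC", probe_len=8, step=2):
    print(start, probe)
```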

A Short Sequence Motif Dictates R-Loop Formation

In multiple CRISPR/Cas systems, recognition of self versus nonself has been shown to involve a short sequence motif that is preserved in the foreign genome, referred to as the PAM (protospacer adjacent motif) (27, 29, 32-34). PAM motifs are only a few base pairs in length, and their precise sequence and position vary according to the CRISPR/Cas system type (32). In the S. pyogenes type II system, the PAM conforms to an NGG consensus sequence, containing two G:C base pairs that occur one base pair downstream of the crRNA-binding sequence within the target DNA (4). Transformation assays demonstrated that the GG motif is essential for protospacer plasmid DNA elimination by CRISPR/Cas in bacterial cells (FIG. 26A), consistent with previous observations in S. thermophilus (27). The motif is also essential for in vitro protospacer plasmid cleavage by tracrRNA:crRNA-guided Cas9 (FIG. 26B). To determine the role of the PAM in target DNA cleavage by the Cas9-tracrRNA:crRNA complex, we tested a series of dsDNA duplexes containing mutations in the PAM sequence on the complementary strand, the noncomplementary strand, or both (FIG. 13A). Cleavage assays using these substrates showed that Cas9-catalyzed DNA cleavage was particularly sensitive to mutations in the PAM sequence on the noncomplementary strand of the DNA, in contrast to complementary-strand PAM recognition by type I CRISPR/Cas systems (18, 34). Cleavage of single-stranded target DNAs was unaffected by mutations of the PAM motif. This observation suggests that the PAM motif is required only in the context of double-stranded target DNA and may thus be needed to license duplex unwinding, strand invasion, and R-loop formation. When we used a different crRNA-target DNA pair (crRNA-sp4 and protospacer 4 DNA), selected because it contains a canonical PAM that is absent from the protospacer 2 target DNA, we found that both G nucleotides of the PAM were required for efficient Cas9-catalyzed DNA cleavage (FIG. 13B and FIG. 26C).
To determine whether the PAM plays a direct role in recruiting the Cas9-tracrRNA:crRNA complex to the correct target DNA site, we analyzed binding affinities of the complex for target DNA sequences by native gel mobility shift assays (FIG. 13C). Mutation of either G in the PAM sequence substantially reduced the affinity of Cas9-tracrRNA:crRNA for the target DNA. This finding illustrates a role for the PAM sequence in target DNA binding by Cas9.
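Because the S. pyogenes PAM follows an NGG consensus located immediately 3′ of the crRNA-binding sequence on the noncomplementary strand, candidate target sites on a given strand can be listed by scanning for NGG and taking the nucleotides upstream of it. The sketch below is illustrative only: it assumes a 20-nucleotide target length (the guide length used for the sgRNAs in Example 2), scans only the strand supplied, and uses a hypothetical input sequence; `find_ngg_targets` is not a published tool.

```python
import re


def find_ngg_targets(dna: str, target_len: int = 20) -> list[tuple[int, str, str]]:
    """List (start, target, pam) for every NGG on this strand with target_len nt 5' of it.

    Only the supplied strand (read 5'->3') is scanned; pass the reverse
    complement to search the other strand.
    """
    dna = dna.upper()
    hits = []
    # Zero-width lookahead so overlapping PAMs (e.g. within 'CGGG') are all reported.
    for match in re.finditer(r"(?=([ACGT]GG))", dna):
        pam_start = match.start()
        start = pam_start - target_len
        if start < 0:
            continue  # too close to the 5' end for a full-length target
        hits.append((start, dna[start:pam_start], match.group(1)))
    return hits


# Hypothetical sequence: 20 nt followed by a TGG PAM.
print(find_ngg_targets("ACGT" * 5 + "TGGA"))  # [(0, 'ACGTACGTACGTACGTACGT', 'TGG')]
```

A regular-expression lookahead is used rather than a plain pattern so that runs such as GGG, which contain two overlapping NGG motifs, yield one candidate per motif.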

(FIG. 13A) Dual-RNA-programmed Cas9 was tested for activity as in FIG. 10B. WT and mutant PAM sequences in target DNAs are indicated with lines. (FIG. 13B) Protospacer 4 target DNA duplexes (labeled at both 5′ ends) containing WT and mutant PAM motifs were incubated with Cas9 programmed with tracrRNA:crRNA-sp4 (nucleotides 23 to 89). At the indicated time points (in minutes), aliquots of the cleavage reaction were taken and analyzed as in FIG. 10B. (FIG. 13C) Electrophoretic mobility shift assays were performed using RNA-programmed Cas9 (D10A/H840A) and protospacer 4 target DNA duplexes [same as in FIG. 13B] containing WT and mutated PAM motifs. The Cas9 (D10A/H840A)-RNA complex was titrated from 100 pM to 1 μM.

(FIG. 26A) Mutations of the PAM sequence in protospacer 2 plasmid DNA abolish interference of plasmid maintenance by the Type II CRISPR/Cas system in bacterial cells. Wild-type protospacer 2 plasmids with a functional or mutated PAM were transformed into wild-type (strain SF370, also named EC904) and pre-crRNA-deficient mutant (EC1479) S. pyogenes as in FIG. 12D. PAM mutations are not tolerated by the Type II CRISPR/Cas system in vivo. The mean values and standard deviations of three biological replicates are shown. (FIG. 26B) Mutations of the PAM sequence in protospacer plasmid DNA abolish cleavage by Cas9-tracrRNA:crRNA. Wild-type protospacer 2 plasmids with a functional or mutated PAM were subjected to Cas9 cleavage as in FIG. 10A. The PAM mutant plasmids are not cleaved by the Cas9-tracrRNA:crRNA complex. (FIG. 26C) Mutations of the canonical PAM sequence abolish interference of plasmid maintenance by the Type II CRISPR/Cas system in bacterial cells. Wild-type protospacer 4 plasmids with a functional or mutated PAM were cleaved with Cas9 programmed with tracrRNA and crRNA-sp2. The cleavage reactions were carried out in the presence of the XmnI restriction endonuclease to visualize the Cas9 cleavage products as two fragments (~1880 and ~800 bp). Fragment sizes in base pairs are indicated.

Cas9 can be Programmed with a Single Chimeric RNA

Examination of the likely secondary structure of the tracrRNA:crRNA duplex (FIGS. 10E and 12C) suggested that the features required for site-specific Cas9-catalyzed DNA cleavage could be captured in a single chimeric RNA. Although the tracrRNA:crRNA target-selection mechanism works efficiently in nature, a single RNA-guided Cas9 is appealing because of its potential utility for programmed DNA cleavage and genome editing (FIG. 1A-1B). We designed two versions of a chimeric RNA containing a target recognition sequence at the 5′ end followed by a hairpin structure retaining the base-pairing interactions that occur between the tracrRNA and the crRNA (FIG. 14A). This single transcript effectively fuses the 3′ end of crRNA to the 5′ end of tracrRNA, thereby mimicking the dual-RNA structure required to guide site-specific DNA cleavage by Cas9. In cleavage assays using plasmid DNA, we observed that the longer chimeric RNA was able to guide Cas9-catalyzed DNA cleavage in a manner similar to that observed for the truncated tracrRNA:crRNA duplex (FIG. 14A and FIG. 27A and FIG. 27C). The shorter chimeric RNA did not work efficiently in this assay, confirming that nucleotides 5 to 12 positions beyond the tracrRNA:crRNA base-pairing interaction are important for efficient Cas9 binding and/or target recognition. We obtained similar results in cleavage assays using short dsDNA as a substrate, which further showed that the position of the cleavage site in the target DNA is identical to that observed using the dual tracrRNA:crRNA as a guide (FIG. 14B and FIG. 27B and FIG. 27C). Finally, to establish whether the chimeric RNA design might be universally applicable, we engineered five different chimeric guide RNAs to target a portion of the gene encoding green fluorescent protein (GFP) (FIG. 28A-28C) and tested their efficacy in vitro against a plasmid carrying the GFP coding sequence.
In all five cases, Cas9 programmed with these chimeric RNAs efficiently cleaved the plasmid at the correct target site (FIG. 14C and FIG. 28D), indicating that rational design of chimeric RNAs is robust and could, in principle, enable targeting of any DNA sequence of interest with few constraints beyond the presence of a GG dinucleotide adjacent to the targeted sequence.

FIG. 1A-1B A DNA-targeting RNA comprises a single stranded “DNA-targeting segment” and a “protein-binding segment,” which comprises a stretch of double stranded RNA. (FIG. 1A) A DNA-targeting RNA can comprise two separate RNA molecules (referred to as a “double-molecule” or “two-molecule” DNA-targeting RNA). A double-molecule DNA-targeting RNA comprises a “targeter-RNA” and an “activator-RNA.” (FIG. 1B) A DNA-targeting RNA can comprise a single RNA molecule (referred to as a “single-molecule” DNA-targeting RNA). A single-molecule DNA-targeting RNA comprises “linker nucleotides.”

(FIG. 14A) A plasmid harboring the protospacer 4 target sequence and a WT PAM was subjected to cleavage by Cas9 programmed with the tracrRNA(4-89):crRNA-sp4 duplex or with in vitro-transcribed chimeric RNAs constructed by joining the 3′ end of crRNA to the 5′ end of tracrRNA with a GAAA tetraloop. Cleavage reactions were analyzed by restriction mapping with XmnI. Sequences of chimeric RNAs A and B are shown with DNA-targeting (underlined), crRNA repeat-derived (overlined), and tracrRNA-derived (dashed underlined) sequences. (FIG. 14B) Protospacer 4 DNA duplex cleavage reactions were performed as in FIG. 10B. (FIG. 14C) Five chimeric RNAs designed to target the GFP gene were used to program Cas9 to cleave a GFP gene-containing plasmid. Plasmid cleavage reactions were performed as in FIG. 12E, except that the plasmid DNA was restriction mapped with AvrII after Cas9 cleavage.
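The chimeric RNAs in FIG. 14A place the DNA-targeting sequence at the 5′ end and join the 3′ end of the crRNA to the 5′ end of the tracrRNA through a GAAA tetraloop. The sketch below only assembles such a transcript and records where each segment lies. The segment strings in the example are short hypothetical placeholders, not the chimera A or B sequences, and a working guide also depends on the base-pairing and additional tracrRNA nucleotides discussed above. The function names are illustrative.

```python
def to_rna(seq: str) -> str:
    """Write a nucleotide sequence as RNA (T -> U)."""
    return seq.upper().replace("T", "U")


def assemble_chimeric_rna(target_dna: str, crrna_repeat: str, tracrrna: str,
                          loop: str = "GAAA") -> tuple[str, list[tuple[str, int, int]]]:
    """Join segments 5'->3' and report each segment's [start, end) coordinates.

    target_dna is the target written as the noncomplementary (PAM-bearing) strand,
    so the guide segment has the same sequence with T replaced by U.
    """
    segments = [
        ("DNA-targeting", to_rna(target_dna)),
        ("crRNA repeat-derived", to_rna(crrna_repeat)),
        ("tetraloop", to_rna(loop)),  # links the crRNA 3' end to the tracrRNA 5' end
        ("tracrRNA-derived", to_rna(tracrrna)),
    ]
    layout, pos = [], 0
    for name, seq in segments:
        layout.append((name, pos, pos + len(seq)))
        pos += len(seq)
    return "".join(seq for _, seq in segments), layout


# Placeholder segments only; the actual chimera sequences are those shown in FIG. 14A.
rna, layout = assemble_chimeric_rna("ACGTACGTACGTACGTACGT", "GCUCCA", "UGGAGCAAAA")
print(rna)
print(layout)
```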

(FIG. 27A) A single chimeric RNA guides Cas9-catalyzed cleavage of cognate protospacer plasmid DNA (protospacer 1 and protospacer 2). The cleavage reactions were carried out in the presence of the XmnI restriction endonuclease to visualize the Cas9 cleavage products as two fragments (~1880 and ~800 bp). Fragment sizes in base pairs are indicated. (FIG. 27B) A single chimeric RNA guides Cas9-catalyzed cleavage of cognate protospacer oligonucleotide DNA (protospacer 1 and protospacer 2). Fragment sizes in nucleotides are indicated. (FIG. 27C) Schematic representations of the chimeric RNAs used in the experiment. Sequences of chimeric RNAs A and B are shown with the 5′ protospacer DNA-targeting sequence of crRNA (underlined), the tracrRNA-binding sequence of crRNA (overlined) and tracrRNA-derived sequence (dashed underlined).

(FIG. 28A) Schematic representation of the GFP expression plasmid pCFJ127. The targeted portion of the GFP open reading frame is indicated with a black arrowhead. (FIG. 28B) Close-up of the sequence of the targeted region. Sequences targeted by the chimeric RNAs are shown with gray bars. PAM dinucleotides are boxed. A unique SalI restriction site is located 60 bp upstream of the target locus. (FIG. 28C) Left: Target DNA sequences are shown together with their adjacent PAM motifs. Right: Sequences of the chimeric guide RNAs. (FIG. 28D) pCFJ127 was cleaved by Cas9 programmed with chimeric RNAs GFP1-5, as indicated. The plasmid was additionally digested with SalI and the reactions were analyzed by electrophoresis on a 3% agarose gel and visualized by staining with SYBR Safe.

Conclusions

A DNA interference mechanism was identified, involving a dual-RNA structure that directs a Cas9 endonuclease to introduce site-specific double-stranded breaks in target DNA. The tracrRNA:crRNA-guided Cas9 protein makes use of distinct endonuclease domains (HNH and RuvC-like domains) to cleave the two strands in the target DNA. Target recognition by Cas9 requires both a seed sequence in the crRNA and a GG dinucleotide-containing PAM sequence adjacent to the crRNA-binding region in the DNA target. We further show that the Cas9 endonuclease can be programmed with guide RNA engineered as a single transcript to target and cleave any dsDNA sequence of interest. The system is efficient, versatile, and programmable by changing the DNA target-binding sequence in the guide chimeric RNA. Zinc-finger nucleases and transcription-activator-like effector nucleases have attracted considerable interest as artificial enzymes engineered to manipulate genomes (35-38). RNA-programmed Cas9 offers an alternative methodology for such gene-targeting and genome-editing applications.

REFERENCES CITED

  • 1. B. Wiedenheft, S. H. Sternberg, J. A. Doudna, Nature 482, 331 (2012).
  • 2. D. Bhaya, M. Davison, R. Barrangou, Annu. Rev. Genet. 45, 273 (2011).
  • 3. M. P. Terns, R. M. Terns, Curr. Opin. Microbiol. 14, 321 (2011).
  • 4. E. Deltcheva et al., Nature 471, 602 (2011).
  • 5. J. Carte, R. Wang, H. Li, R. M. Terns, M. P. Terns, Genes Dev. 22, 3489 (2008).
  • 6. R. E. Haurwitz, M. Jinek, B. Wiedenheft, K. Zhou, J. A. Doudna, Science 329, 1355 (2010).
  • 7. R. Wang, G. Preamplume, M. P. Terns, R. M. Terns, H. Li, Structure 19, 257 (2011).
  • 8. E. M. Gesner, M. J. Schellenberg, E. L. Garside, M. M. George, A. M. Macmillan, Nat. Struct. Mol. Biol. 18, 688 (2011).
  • 9. A. Hatoum-Aslan, I. Maniv, L. A. Marraffini, Proc. Natl. Acad. Sci. U.S.A. 108, 21218 (2011).
  • 10. S. J. J. Brouns et al., Science 321, 960 (2008).
  • 11. D. G. Sashital, M. Jinek, J. A. Doudna, Nat. Struct. Mol. Biol. 18, 680 (2011).
  • 12. N. G. Lintner et al., J. Biol. Chem. 286, 21643 (2011).
  • 13. E. Semenova et al., Proc. Natl. Acad. Sci. U.S.A. 108, 10098 (2011).
  • 14. B. Wiedenheft et al., Proc. Natl. Acad. Sci. U.S.A. 108, 10092 (2011).
  • 15. B. Wiedenheft et al., Nature 477, 486 (2011).
  • 16. C. R. Hale et al., Cell 139, 945 (2009).
  • 17. J. A. L. Howard, S. Delmas, I. Ivančić-Baće, E. L. Bolt, Biochem. J. 439, 85 (2011).
  • 18. E. R. Westra et al., Mol. Cell 46, 595 (2012).
  • 19. C. R. Hale et al., Mol. Cell 45, 292 (2012).
  • 20. J. Zhang et al., Mol. Cell 45, 303 (2012).
  • 21. K. S. Makarova et al., Nat. Rev. Microbiol. 9, 467 (2011).
  • 22. K. S. Makarova, N. V. Grishin, S. A. Shabalina, Y. I. Wolf, E. V. Koonin, Biol. Direct 1, 7 (2006).
  • 23. K. S. Makarova, L. Aravind, Y. I. Wolf, E. V. Koonin, Biol. Direct 6, 38 (2011).
  • 24. S. Gottesman, Nature 471, 588 (2011).
  • 25. R. Barrangou et al., Science 315, 1709 (2007).
  • 26. J. E. Garneau et al., Nature 468, 67 (2010).
  • 27. R. Sapranauskas et al., Nucleic Acids Res. 39, 9275 (2011).
  • 28. G. K. Taylor, D. F. Heiter, S. Pietrokovski, B. L. Stoddard, Nucleic Acids Res. 39, 9705 (2011).
  • 29. H. Deveau et al., J. Bacteriol. 190, 1390 (2008).
  • 30. B. P. Lewis, C. B. Burge, D. P. Bartel, Cell 120, 15 (2005).
  • 31. G. Hutvagner, M. J. Simard, Nat. Rev. Mol. Cell Biol. 9, 22 (2008).
  • 32. F. J. M. Mojica, C. Díez-Villaseñor, J. Garcia-Martinez, C. Almendros, Microbiology 155, 733 (2009).
  • 33. L. A. Marraffini, E. J. Sontheimer, Nature 463, 568 (2010).
  • 34. D. G. Sashital, B. Wiedenheft, J. A. Doudna, Mol. Cell 46, 606 (2012).
  • 35. M. Christian et al., Genetics 186, 757 (2010).
  • 36. J. C. Miller et al., Nat. Biotechnol. 29, 143 (2011).
  • 37. F. D. Urnov, E. J. Rebar, M. C. Holmes, H. S. Zhang, P. D. Gregory, Nat. Rev. Genet. 11, 636 (2010).
  • 38. D. Carroll, Gene Ther. 15, 1463 (2008).
  • 39. J. Sambrook, E. F. Fritsch, T. Maniatis, Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., ed. 2, 1989).
  • 40. M. G. Caparon, J. R. Scott, Genetic manipulation of pathogenic streptococci. Methods Enzymol. 204, 556 (1991). doi:10.1016/0076-6879(91)04028-M
  • 41. C. Frøkjær-Jensen et al., Single-copy insertion of transgenes in Caenorhabditis elegans. Nat. Genet. 40, 1375 (2008). doi:10.1038/ng.248
  • 42. R. B. Denman, Using RNAFOLD to predict the activity of small catalytic RNAs. Biotechniques 15, 1090 (1993).
  • 43. I. L. Hofacker, P. F. Stadler, Memory efficient folding algorithms for circular RNA secondary structures. Bioinformatics 22, 1172 (2006). doi:10.1093/bioinformatics/btl023
  • 44. K. Darty, A. Denise, Y. Ponty, VARNA: Interactive drawing and editing of the RNA secondary structure. Bioinformatics 25, 1974 (2009). doi:10.1093/bioinformatics/btp250

Example 2: RNA-Programmed Genome Editing in Human Cells

Data provided below demonstrate that Cas9 can be expressed and localized to the nucleus of human cells, and that it assembles with single-guide RNA (“sgRNA”; encompassing the features required for both Cas9 binding and DNA target site recognition) in a human cell. These complexes can generate double stranded breaks and stimulate non-homologous end joining (NHEJ) repair in genomic DNA at a site complementary to the sgRNA sequence, an activity that requires both Cas9 and the sgRNA. Extension of the RNA sequence at its 3′ end enhances DNA targeting activity in living cells. Further, experiments using extracts from transfected cells show that sgRNA assembly into Cas9 is the limiting factor for Cas9-mediated DNA cleavage. These results demonstrate that RNA-programmed genome editing works in living cells and in vivo.

Materials and Methods

Plasmid Design and Construction

The sequence encoding Streptococcus pyogenes Cas9 (residues 1-1368) fused to an HA epitope (amino acid sequence DAYPYDVPDYASL (SEQ ID NO:274)) and a nuclear localization signal (amino acid sequence PKKKRKVEDPKKKRKVD (SEQ ID NO:275)) was codon-optimized for human expression and synthesized by GeneArt. The DNA sequence is SEQ ID NO:276 and the protein sequence is SEQ ID NO:277. Ligation-independent cloning (LIC) was used to insert this sequence into pcDNA3.1-derived GFP and mCherry LIC vectors (vectors 6D and 6B, respectively, obtained from the UC Berkeley MacroLab), resulting in Cas9-HA-NLS-GFP and Cas9-HA-NLS-mCherry fusions expressed under the control of the CMV promoter. Guide sgRNAs were expressed using the expression vectors pSilencer 2.1-U6 puro (Life Technologies) and pSuper (Oligoengine). RNA expression constructs were generated by annealing complementary oligonucleotides to form the RNA-coding DNA sequence and ligating the annealed DNA fragment between the BamHI and HindIII sites in pSilencer 2.1-U6 puro and between the BglII and HindIII sites in pSuper.

Cell Culture Conditions and DNA Transfections

HEK293T cells were maintained in Dulbecco's modified Eagle's medium (DMEM) supplemented with 10% fetal bovine serum (FBS) in a 37° C. humidified incubator with 5% CO2. Cells were transiently transfected with plasmid DNA using either X-tremeGENE DNA Transfection Reagent (Roche) or Turbofect Transfection Reagent (Thermo Scientific) according to the recommended protocols. Briefly, HEK293T cells were transfected at 60-80% confluency in 6-well plates using 0.5 μg of the Cas9 expression plasmid and 2.0 μg of the RNA expression plasmid. Transfection efficiencies were estimated to be 30-50% for Turbofect (FIG. 29E and FIGS. 37A-37B) and 80-90% for X-tremeGENE (FIG. 31B), based on the fraction of GFP-positive cells observed by fluorescence microscopy. At 48 hours post-transfection, cells were washed with phosphate-buffered saline (PBS), lysed in 250 μl lysis buffer (20 mM Hepes pH 7.5, 100 mM potassium chloride (KCl), 5 mM magnesium chloride (MgCl2), 1 mM dithiothreitol (DTT), 5% glycerol, 0.1% Triton X-100, supplemented with Roche Protease Inhibitor cocktail) and rocked for 10 min at 4° C. The resulting cell lysate was divided into aliquots for further analysis. Genomic DNA was isolated from 200 μl of cell lysate using the DNeasy Blood and Tissue Kit (Qiagen) according to the manufacturer's protocol.

Western Blot Analysis of Cas9 Expression

HEK293T cells transfected with the Cas9-HA-NLS-GFP expression plasmid were harvested and lysed 48 hours post-transfection as above. 5 μl of lysate was electrophoresed on a 10% SDS polyacrylamide gel, blotted onto a PVDF membrane and probed with HRP-conjugated anti-HA antibody (Sigma, 1:1000 dilution in 1×PBS).

Surveyor Assay

The Surveyor assay was performed as previously described [10,12,13]. Briefly, the human clathrin light chain A (CLTA) locus was PCR-amplified from 200 ng of genomic DNA using a high-fidelity polymerase, Herculase II Fusion DNA Polymerase (Agilent Technologies), with forward primer 5′-GCAGCAGAAGAAGCCTTTGT-3′ (SEQ ID NO: 1353) and reverse primer 5′-TTCCTCCTCTCCCTCCTCTC-3′ (SEQ ID NO: 1354). 300 ng of the 360 bp amplicon was then denatured by heating to 95° C. and slowly reannealed using a heat block to randomly rehybridize wild-type and mutant DNA strands. Samples were then incubated with Cel-1 nuclease (Surveyor Kit, Transgenomic) for 1 hour at 42° C. Cel-1 recognizes and cleaves DNA helices containing mismatches (wild-type:mutant heteroduplexes). Cel-1 nuclease digestion products were separated on a 10% acrylamide gel and visualized by staining with SYBR Safe (Life Technologies). Cleavage bands were quantified using ImageLab software (Bio-Rad). The percent cleavage was determined by dividing the average intensity of the cleavage products (160-200 bp) by the sum of the intensities of the uncleaved PCR product (360 bp) and the cleavage product.
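The quantification rule above can be written as a small function. This is a sketch assuming that "the cleavage product" in the denominator means the same averaged cleavage-band intensity used in the numerator, and that band intensities have already been background-corrected (for example, after export from ImageLab). The function name and the example intensities are hypothetical.

```python
from statistics import mean


def surveyor_percent_cleavage(cleavage_band_intensities: list[float],
                              uncleaved_intensity: float) -> float:
    """Percent cleavage = mean(cleavage bands) / (uncleaved band + mean(cleavage bands)) * 100.

    cleavage_band_intensities: background-corrected intensities of the
    ~160-200 bp Cel-1 products; uncleaved_intensity: the 360 bp band.
    """
    if not cleavage_band_intensities:
        raise ValueError("at least one cleavage-band intensity is required")
    cleaved = mean(cleavage_band_intensities)
    total = uncleaved_intensity + cleaved
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return 100.0 * cleaved / total


# Hypothetical intensities (arbitrary units).
print(surveyor_percent_cleavage([150, 250], 800))  # 20.0
```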

In Vitro Transcription

Guide RNA was in vitro transcribed using recombinant T7 RNA polymerase and a DNA template generated by annealing complementary synthetic oligonucleotides as previously described [14]. RNAs were purified by electrophoresis on 7 M urea denaturing acrylamide gels, ethanol precipitated, and dissolved in DEPC-treated water.

Northern Blot Analysis

RNA was purified from HEK293T cells using the mirVana small-RNA isolation kit (Ambion). For each sample, 800 ng of RNA was separated on a 10% urea-PAGE gel after denaturation for 10 min at 70° C. in RNA loading buffer (0.5×TBE (pH 7.5), 0.5 mg/ml bromophenol blue, 0.5 mg xylene cyanol and 47% formamide). After electrophoresis at 10 W in 0.5×TBE buffer until the bromophenol blue dye reached the bottom of the gel, samples were electroblotted onto a Nytran membrane at 20 volts for 1.5 hours in 0.5×TBE. The transferred RNAs were cross-linked onto the Nytran membrane in a UV crosslinker (Stratagene) and pre-hybridized at 45° C. for 3 hours in a buffer containing 40% formamide, 5×SSC, 3× Denhardt's solution (0.1% each of Ficoll, polyvinylpyrrolidone and BSA) and 200 μg/ml salmon sperm DNA. The pre-hybridized membranes were incubated overnight in the prehybridization buffer supplemented with 5′-32P-labeled antisense DNA oligo probe at 1 million cpm/ml. After several washes in SSC buffer (final wash in 0.2×SSC), the membranes were imaged by phosphorimaging.

In Vitro Cleavage Assay

Cell lysates were prepared as described above and incubated with CLTA-RFP donor plasmid [10]. Cleavage reactions were carried out in a total volume of 20 μl and contained 10 μl lysate, 2 μl of 5× cleavage buffer (100 mM HEPES pH 7.5, 500 mM KCl, 25 mM MgCl2, 5 mM DTT, 25% glycerol) and 300 ng plasmid. Where indicated, reactions were supplemented with 10 pmol of in vitro-transcribed CLTA1 sgRNA. Reactions were incubated at 37° C. for one hour and subsequently digested with 10 U of XhoI (NEB) for an additional 30 min at 37° C. The reactions were stopped by the addition of Proteinase K (Thermo Scientific) and incubated at 37° C. for 15 min. Cleavage products were analyzed by electrophoresis on a 1% agarose gel and stained with SYBR Safe. The presence of ~2230 and ~3100 bp fragments is indicative of Cas9-mediated cleavage.
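Because the 20 μl reaction combines 10 μl of lysate with 2 μl of 5× cleavage buffer, the final buffer composition is worth checking. The sketch below applies C_final = Σ (C_stock × V_added) / V_total per solute. It assumes the lysate contributes its lysis-buffer components (HEPES, KCl, MgCl2, DTT, glycerol) at full strength, that the plasmid and the remaining volume add no further solutes, and it ignores Triton X-100, protease inhibitors and cell-derived material. The dictionary keys and function name are hypothetical.

```python
# Stock compositions from the text (mM, except glycerol in % v/v).
LYSIS_BUFFER = {"HEPES_mM": 20, "KCl_mM": 100, "MgCl2_mM": 5, "DTT_mM": 1, "glycerol_pct": 5}
CLEAVAGE_BUFFER_5X = {"HEPES_mM": 100, "KCl_mM": 500, "MgCl2_mM": 25, "DTT_mM": 5, "glycerol_pct": 25}


def final_concentrations(additions: list[tuple[dict[str, float], float]],
                         total_volume_ul: float) -> dict[str, float]:
    """Sum C_stock * V_added / V_total over every addition, per solute."""
    totals: dict[str, float] = {}
    for stock, volume_ul in additions:
        for solute, conc in stock.items():
            totals[solute] = totals.get(solute, 0.0) + conc * volume_ul / total_volume_ul
    return totals


# 10 μl lysate (treated as lysis buffer) + 2 μl 5x cleavage buffer, 20 μl total.
mix = final_concentrations([(LYSIS_BUFFER, 10), (CLEAVAGE_BUFFER_5X, 2)], total_volume_ul=20)
print(mix)
```

Under these assumptions the mixture works out to 20 mM HEPES, 100 mM KCl, 5 mM MgCl2, 1 mM DTT and 5% glycerol, i.e. the same composition as the lysis buffer: the lysate and the 5× buffer each supply half of every component.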

Results

To test whether Cas9 could be programmed to cleave genomic DNA in living cells, Cas9 was co-expressed together with an sgRNA designed to target the human clathrin light chain A (CLTA) gene. The CLTA genomic locus has previously been targeted and edited using ZFNs [10]. We first tested the expression of a human-codon-optimized version of the Streptococcus pyogenes Cas9 protein and sgRNA in human HEK293T cells. The 160 kDa Cas9 protein was expressed as a fusion protein bearing an HA epitope, a nuclear localization signal (NLS), and green fluorescent protein (GFP) attached to the C-terminus of Cas9 (FIG. 29A). Analysis of cells transfected with a vector encoding the GFP-fused Cas9 revealed abundant Cas9 expression and nuclear localization (FIG. 29B). Western blotting confirmed that the Cas9 protein is expressed largely intact in extracts from these cells (FIG. 29A). To program Cas9, we expressed an sgRNA bearing a 5′-terminal 20-nucleotide sequence complementary to the target DNA sequence and a 42-nucleotide 3′-terminal stem-loop structure required for Cas9 binding (FIG. 29C). This 3′-terminal sequence corresponds to the minimal stem-loop structure that has previously been used to program Cas9 in vitro [8]. Expression of this sgRNA was driven by the human U6 (RNA polymerase III) promoter [11]. Northern blot analysis of RNA extracted from cells transfected with the U6 promoter-driven sgRNA expression plasmid showed that the sgRNA is indeed expressed and that its stability is enhanced by the presence of Cas9 (FIG. 29D).

FIG. 29A-29E demonstrates that co-expression of Cas9 and guide RNA in human cells generates double-strand DNA breaks at the target locus. (FIG. 29A) Top: schematic diagram of the Cas9-HA-NLS-GFP expression construct. Bottom: lysate from HEK293T cells transfected with the Cas9 expression plasmid was analyzed by Western blotting using an anti-HA antibody. (FIG. 29B) Fluorescence microscopy of HEK293T cells expressing Cas9-HA-NLS-GFP. (FIG. 29C) Design of a single-guide RNA (sgRNA, i.e., a single-molecule DNA-targeting RNA) targeting the human CLTA locus. Top: schematic diagram of the sgRNA target site in exon 7 of the human CLTA gene. The target sequence that hybridizes to the guide segment of CLTA1 sgRNA is indicated by “CLTA1 sgRNA.” The GG dinucleotide protospacer adjacent motif (PAM) is marked by an arrow. Black lines denote the DNA-binding regions of the control ZFN protein. The translation stop codon of the CLTA open reading frame is marked with a dotted line for reference. Middle: schematic diagram of the sgRNA expression construct. The RNA is expressed under the control of the U6 Pol III promoter and a poly(T) tract that serves as a Pol III transcriptional terminator signal. Bottom: sgRNA-guided cleavage of target DNA by Cas9. The sgRNA consists of a 20-nt 5′-terminal guide segment followed by a 42-nt stem-loop structure required for Cas9 binding. Cas9-mediated cleavage of the two target DNA strands occurs upon unwinding of the target DNA and formation of a duplex between the guide segment of the sgRNA and the target DNA. Cleavage depends on the presence of a PAM motif (appropriate for the Cas9 being used, e.g., a GG dinucleotide; see Example 1 above) downstream of the target sequence in the target DNA. Note that the target sequence is inverted relative to the upper diagram. (FIG. 29D) Northern blot analysis of sgRNA expression in HEK293T cells. (FIG. 29E) Surveyor nuclease assay of genomic DNA isolated from HEK293T cells expressing Cas9 and/or CLTA sgRNA.
A ZFN construct previously used to target the CLTA locus [10] was used as a positive control for detecting DSB-induced DNA repair by non-homologous end joining.

Next we investigated whether site-specific DSBs are generated in HEK293T cells transfected with Cas9-HA-NLS-mCherry and the CLTA1 sgRNA. To do this, we used the Surveyor nuclease assay [12] to probe for minor insertions and deletions in the locus resulting from imperfect repair of DSBs by NHEJ. The region of genomic DNA targeted by Cas9:sgRNA was amplified by PCR, and the resulting products were denatured and reannealed. The rehybridized PCR products were incubated with the mismatch-recognition endonuclease Cel-1 and resolved on an acrylamide gel to identify Cel-1 cleavage bands. Because DNA repair by NHEJ is typically induced by a DSB, a positive signal in the Surveyor assay indicates that genomic DNA cleavage has occurred. Using this assay, we detected cleavage of the CLTA locus at the position targeted by the CLTA1 sgRNA (FIG. 29E). A pair of ZFNs that target a neighboring site in the CLTA locus provided a positive control in these experiments [10].

To determine if either Cas9 or sgRNA expression is a limiting factor in the observed genome editing reactions, lysates prepared from the transfected cells were incubated with plasmid DNA harboring a fragment of the CLTA gene targeted by the CLTA1 sgRNA. Plasmid DNA cleavage was not observed upon incubation with lysate prepared from cells transfected with the Cas9-HA-NLS-GFP expression vector alone, consistent with the Surveyor assay results. However, robust plasmid cleavage was detected when the lysate was supplemented with in vitro transcribed CLTA1 sgRNA (FIG. 30A). Furthermore, lysate prepared from cells transfected with both Cas9 and sgRNA expression vectors supported plasmid cleavage, while lysates from cells transfected with the sgRNA-encoding vector alone did not (FIG. 30A). These results suggest that a limiting factor for Cas9 function in human cells could be assembly with the sgRNA. We tested this possibility directly by analyzing plasmid cleavage in lysates from cells transfected as before in the presence and absence of added exogenous sgRNA. Notably, when exogenous sgRNA was added to lysate from cells transfected with both the Cas9 and sgRNA expression vectors, a substantial increase in DNA cleavage activity was observed (FIG. 30B). This result indicates that the limiting factor for Cas9 function in HEK293T cells is the expression of the sgRNA or its loading into Cas9.

FIG. 30A-30B demonstrates that cell lysates contain active Cas9:sgRNA and support site-specific DNA cleavage. (FIG. 30A) Lysates from cells transfected with the plasmid(s) indicated at left were incubated with plasmid DNA containing a PAM and the target sequence complementary to the CLTA1 sgRNA; where indicated, the reaction was supplemented with 10 pmol of in vitro-transcribed CLTA1 sgRNA. Secondary cleavage with XhoI generated ~2230 and ~3100 bp fragments indicative of Cas9-mediated cleavage. A control reaction using lysate from cells transfected with a ZFN expression construct shows fragments of slightly different size, reflecting the offset of the ZFN target site relative to the CLTA1 target site. (FIG. 30B) Lysates from cells transfected with the Cas9-GFP expression plasmid and, where indicated, the CLTA1 sgRNA expression plasmid were incubated with target plasmid DNA as in FIG. 30A in the absence or presence of in vitro-transcribed CLTA1 sgRNA.

As a means of enhancing Cas9:sgRNA assembly in living cells, we next tested the effect of extending the presumed Cas9-binding region of the guide RNA. Two new versions of the CLTA1 sgRNA were designed to include an additional six or twelve base pairs in the helix that mimics the base-pairing interactions between the crRNA and tracrRNA (FIG. 31A). Additionally, the 3′ end of the guide RNA was extended by five nucleotides based on the native sequence of the S. pyogenes tracrRNA [9]. Vectors encoding these 3′-extended sgRNAs under the control of either the U6 or H1 Pol III promoter were transfected into cells along with the Cas9-HA-NLS-GFP expression vector, and site-specific genome cleavage was tested using the Surveyor assay (FIG. 31B). The results confirmed that cleavage required both Cas9 and the CLTA1 sgRNA and did not occur when either Cas9 or the sgRNA was expressed alone. Furthermore, we observed substantially increased frequencies of NHEJ, as detected by Cel-1 nuclease cleavage, while the frequency of NHEJ mutagenesis obtained with the control ZFN pair was largely unchanged.

FIG. 31A-31B demonstrates that 3′ extension of sgRNA constructs enhances site-specific NHEJ-mediated mutagenesis. (FIG. 31A) The construct for CLTA1 sgRNA expression (top) was designed to generate transcripts containing the original Cas9-binding sequence (v1.0), or dsRNA duplexes extended by 4 base pairs (v2.1) or 10 base pairs (v2.2). (FIG. 31B) Surveyor nuclease assay of genomic DNA isolated from HEK293T cells expressing Cas9 and/or CLTA sgRNA v1.0, v2.1 or v2.2. A ZFN construct previously used to target the CLTA locus [10] was used as a positive control for detecting DSB-induced DNA repair by non-homologous end joining.

The results thus provide the framework for implementing Cas9 as a facile molecular tool for diverse genome editing applications. A powerful feature of this system is the potential to program Cas9 with multiple sgRNAs in the same cell, either to increase the efficiency of targeting at a single locus, or as a means of targeting several loci simultaneously. Such strategies would find broad application in genome-wide experiments and large-scale research efforts such as the development of multigenic disease models.

Example 3: The tracrRNA and Cas9 Families of Type II CRISPR-Cas Immunity Systems

We searched publicly available bacterial genomes for all putative type II CRISPR-Cas loci by screening for sequences homologous to Cas9, the hallmark protein of the type II system. We constructed a phylogenetic tree from a multiple sequence alignment of the identified Cas9 orthologues. The CRISPR repeat length and the gene organization of the cas operons of the associated type II systems were analyzed in the different Cas9 subclusters. A subclassification of type II loci, further divided into subgroups, was proposed based on a selection of 75 representative Cas9 orthologues. We then predicted tracrRNA sequences, mainly by retrieving CRISPR repeat sequences and screening for anti-repeats within or in the vicinity of the cas genes and CRISPR arrays of selected type II loci. A comparative analysis of the sequences and predicted structures of chosen tracrRNA orthologues was performed. Finally, we determined the expression and processing profiles of tracrRNAs and crRNAs from five bacterial species.

Materials and Methods

Bacterial Strains and Culture Conditions

The following media were used to grow bacteria on plates: TSA (trypticase soy agar; Trypticase™ Soy Agar (TSA II), BD BBL, Becton Dickinson) supplemented with 3% sheep blood for S. mutans (UA159), and BHI (brain heart infusion; BD Bacto™ Brain Heart Infusion, Becton Dickinson) agar for L. innocua (Clip11262). For liquid cultures, THY medium (Todd Hewitt Broth (THB, Bacto, Becton Dickinson) supplemented with 0.2% yeast extract (Servabacter®)) was used for S. mutans, BHI broth for L. innocua, BHI liquid medium containing 1% vitamin-mix VX (Difco, Becton Dickinson) for N. meningitidis (A Z2491), MH broth (Mueller Hinton Broth, Oxoid) containing 1% vitamin-mix VX for C. jejuni (NCTC 11168; ATCC 700819) and TSB (Tryptic Soy Broth; BD BBL™ Trypticase™ Soy Broth) for F. novicida (U112). S. mutans was incubated at 37° C. with 5% CO2 without shaking. Strains of L. innocua, N. meningitidis and F. novicida were grown aerobically at 37° C. with shaking. C. jejuni was grown at 37° C. under microaerophilic conditions using a CampyGen (Oxoid) atmosphere. Bacterial growth was followed by measuring the optical density of cultures at 620 nm (OD620 nm) at regular time intervals using a microplate reader (BioTek PowerWave™).

Sequencing of Bacterial Small RNA Libraries.

C. jejuni NCTC 11168 (ATCC 700819), F. novicida U112, L. innocua Clip11262, N. meningitidis A Z2491 and S. mutans UA159 were cultivated to mid-logarithmic growth phase, and total RNA was extracted with TRIzol (Sigma-Aldrich). Ten μg of total RNA from each strain was treated with TURBO™ DNase (Ambion) to remove residual genomic DNA. Ribosomal RNAs were removed using the Ribo-Zero™ rRNA Removal Kits® for Gram-positive or Gram-negative bacteria (Epicentre) according to the manufacturer's instructions. Following purification with the RNA Clean & Concentrator™-5 kit (Zymo Research), the libraries were prepared using the ScriptMiner™ Small RNA-Seq Library Preparation Kit (Multiplex, Illumina® compatible) following the manufacturer's instructions. RNAs were treated with Tobacco Acid Pyrophosphatase (TAP) (Epicentre). Columns from the RNA Clean & Concentrator™-5 kit (Zymo Research) were used for subsequent RNA purification, and Phusion® High-Fidelity DNA Polymerase (New England Biolabs) was used for PCR amplification. Specific user-defined barcodes were added to each library (RNA-Seq Barcode Primers (Illumina®-compatible), Epicentre), and the samples were sequenced (Illumina single-end sequencing) at the Next Generation Sequencing facility (CSF NGS Unit; on the web at “csf.” followed by “ac.at”) of the Vienna Biocenter, Vienna, Austria.

Analysis of tracrRNA and crRNA Sequencing Data

The RNA sequencing reads were split using the illumina2bam tool and trimmed by (i) removal of Illumina adapter sequences (cutadapt 1.0) and (ii) removal of 15 nt at the 3′ end to improve read quality. After removal of reads shorter than 15 nt, the cDNA reads were aligned to their respective genomes using Bowtie, allowing up to 2 mismatches: C. jejuni (GenBank: NC_002163), F. novicida (GenBank: NC_008601), N. meningitidis (GenBank: NC_003116), L. innocua (GenBank: NC_003212) and S. mutans (GenBank: NC_004350). Read coverage was calculated at each nucleotide position separately for both DNA strands using BEDTools-Version-2.15.0. A normalized wiggle file containing coverage in reads per million (rpm) was created and visualized using the Integrative Genomics Viewer (IGV) tool (“www.” followed by “broadinstitute.org/igv/”) (FIG. 36A-36F). Using SAMtools flagstat [80], the proportion of mapped reads was calculated on a total of 9914184 mapped reads for C. jejuni, 48205 reads for F. novicida, 13110087 reads for N. meningitidis, 161865 reads for L. innocua and 1542239 reads for S. mutans. A file containing the number of reads starting (5′) and ending (3′) at each single nucleotide position was created and visualized in IGV. For each tracrRNA orthologue and crRNA, the total number of reads retrieved was calculated using SAMtools.
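The read-processing and normalization steps above can be sketched in Python. This is a minimal illustration, assuming reads are plain sequence strings already stripped of adapters and alignments are given as (strand, start, end) tuples with 0-based, end-exclusive coordinates; the helper names `trim_and_filter` and `strand_coverage_rpm` are hypothetical and do not reproduce illumina2bam, cutadapt, Bowtie or BEDTools. Only the 15-nt trim, the 15-nt length cutoff and the reads-per-million scaling come from the text.

```python
from collections import defaultdict

TRIM_3PRIME = 15  # nt removed from the 3' end of every read (value from the text)
MIN_LENGTH = 15   # reads shorter than this are discarded (value from the text)


def trim_and_filter(reads):
    """Trim 15 nt from each read's 3' end, then drop reads shorter than 15 nt.

    Assumes adapters were already removed. Reads of 15 nt or less become empty
    strings after slicing and are therefore discarded by the length check.
    """
    trimmed = (seq[:-TRIM_3PRIME] for seq in reads)
    return [seq for seq in trimmed if len(seq) >= MIN_LENGTH]


def strand_coverage_rpm(alignments, total_mapped_reads):
    """Per-nucleotide, per-strand coverage scaled to reads per million (rpm).

    alignments: iterable of (strand, start, end), strand '+' or '-',
    0-based start, end-exclusive (hypothetical input format).
    """
    raw = {"+": defaultdict(int), "-": defaultdict(int)}
    for strand, start, end in alignments:
        for pos in range(start, end):
            raw[strand][pos] += 1
    scale = 1_000_000 / total_mapped_reads
    return {strand: {pos: count * scale for pos, count in sorted(cov.items())}
            for strand, cov in raw.items()}


if __name__ == "__main__":
    reads = ["A" * 40, "C" * 25, "G" * 10]
    print(trim_and_filter(reads))  # only the 40-nt read survives, as a 25-nt read
    cov = strand_coverage_rpm([("+", 0, 3), ("+", 1, 2), ("-", 5, 6)], total_mapped_reads=4)
    print(cov["+"])  # {0: 250000.0, 1: 500000.0, 2: 250000.0}
```

Keeping the two strands in separate tables mirrors the strand-specific coverage described above, so that sense and antisense transcripts (for example a tracrRNA and an oppositely transcribed array) are never summed together.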

Cas9 Sequence Analysis, Multiple Sequence Alignment and Guide Tree Construction

The Position-Specific Iterated BLAST (PSI-BLAST) program was used to retrieve homologues of the Cas9 family from the NCBI non-redundant database. Sequences shorter than 800 amino acids were discarded. The remaining sequences were clustered using the BLASTClust program with a length coverage cutoff of 0.8 and a score coverage threshold (bit score divided by alignment length) of 0.8 (FIG. 38A-38B). This procedure produced 78 clusters, 48 of which were represented by a single sequence. One representative (or, rarely, a few) was selected from each cluster, and a multiple alignment of these sequences was constructed using the MUSCLE program with default parameters, followed by manual correction on the basis of local alignments obtained using the PSI-BLAST and HHpred programs. A few sequences that could not be aligned were also excluded from the final alignment. The confidently aligned blocks, comprising 272 informative positions, were used for maximum likelihood tree reconstruction with the FastTree program using default parameters (JTT evolutionary model, discrete gamma model with 20 rate categories). The same program was used to calculate the bootstrap values.
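The length filter and the two clustering thresholds can be illustrated with a simplified single-linkage sketch. It assumes pairwise hits have already been computed (for example by an all-against-all BLAST search) and are supplied as hypothetical (id_a, id_b, alignment_length, bit_score) tuples; it approximates, but is not, the BLASTClust implementation. Requiring the alignment to cover at least 80% of both sequences is an assumption about how the length coverage cutoff is applied. The sequence lengths in the usage example are invented.

```python
MIN_LENGTH_AA = 800     # shorter sequences were discarded (from the text)
LENGTH_COVERAGE = 0.8   # assumed: alignment must cover >= 80% of both sequences
SCORE_COVERAGE = 0.8    # bit score divided by alignment length (from the text)


def cluster_cas9(lengths, hits):
    """Single-linkage clustering of sequences joined by hits passing both thresholds.

    lengths: {seq_id: length_in_aa}
    hits: iterable of (id_a, id_b, alignment_length, bit_score)
    Returns a sorted list of sorted clusters (singletons included).
    """
    kept = [s for s, n in lengths.items() if n >= MIN_LENGTH_AA]
    parent = {s: s for s in kept}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, aln_len, bits in hits:
        if a not in parent or b not in parent:
            continue  # one partner was removed by the length filter
        covers_both = (aln_len >= LENGTH_COVERAGE * lengths[a]
                       and aln_len >= LENGTH_COVERAGE * lengths[b])
        if covers_both and bits / aln_len >= SCORE_COVERAGE:
            parent[find(a)] = find(b)

    clusters = {}
    for s in kept:
        clusters.setdefault(find(s), []).append(s)
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    lengths = {"A": 1000, "B": 1100, "C": 1400, "D": 500}
    hits = [("A", "B", 950, 900.0), ("B", "C", 900, 800.0), ("A", "D", 450, 400.0)]
    print(cluster_cas9(lengths, hits))  # [['A', 'B'], ['C']]
```

In the example, B and C are not joined because the 900-residue alignment covers less than 80% of the 1400-residue sequence C, and D never enters the analysis because it is shorter than 800 amino acids; C ends up as a singleton cluster, like the 48 single-sequence clusters reported above.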

FIG. 38A-38B depict sequences that were grouped by the BLASTClust clustering program. Only sequences longer than 800 amino acids were selected for the BLASTClust analysis (see Materials and Methods). Representative strains harboring cas9 orthologue genes were used. Some sequences did not cluster but were verified as Cas9 sequences by the presence of conserved motifs and/or other cas genes in their immediate vicinity.

Analysis of CRISPR-Cas Loci

The CRISPR repeat sequences were retrieved from the CRISPRdb database or predicted using the CRISPRFinder tool (Grissa I et al., BMC Bioinformatics 2007; 8:172; Grissa I et al., Nucleic Acids Res 2007). The cas genes were identified using the BLASTp algorithm and/or verified with the KEGG database (on the web at “www.” followed by kegg.jp/).

In Silico Prediction and Analysis of tracrRNA Orthologues

The putative anti-repeats were identified using the Vector NTI® software (Invitrogen) by screening both strands of the respective genomes for additional, degenerate repeat sequences that did not belong to the repeat-spacer array, allowing up to 15 mismatches. Transcriptional promoters and rho-independent terminators were predicted using the BDGP Neural Network Promoter Prediction program (“www.” followed by fruitfly.org/seq_tools/promoter.html) and the TransTermHP software, respectively. Multiple sequence alignments were performed using the MUSCLE program with default parameters. The alignments were analyzed for the presence of conserved structural motifs using the RNAalifold algorithm of the Vienna RNA package 2.0.
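The degenerate-repeat screen can be approximated by an ungapped, mismatch-tolerant scan of both strands. This sketch is an assumption-laden stand-in for the Vector NTI® search: it counts Hamming mismatches only (no insertions or deletions), uses 0-based, end-exclusive coordinates, and the names `scan_repeat_like` and `exclude` are hypothetical. In practice the tolerance would be set to the 15 mismatches stated above and `exclude` would hold the coordinates of the repeat-spacer array; the toy sequences in the usage example are invented.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def scan_repeat_like(genome, repeat, max_mismatches, exclude=()):
    """Report every window resembling `repeat` on either strand.

    exclude: (start, end) intervals, e.g. the repeat-spacer array, that are skipped.
    Returns (start, strand, mismatch_count) tuples; strand '-' means the window
    matches the reverse complement of the repeat.
    """
    k = len(repeat)
    queries = {"+": repeat.upper(), "-": revcomp(repeat.upper())}
    genome = genome.upper()
    hits = []
    for start in range(len(genome) - k + 1):
        if any(s < start + k and start < e for s, e in exclude):
            continue  # window overlaps an excluded interval
        window = genome[start:start + k]
        for strand, query in queries.items():
            mm = mismatches(window, query)
            if mm <= max_mismatches:
                hits.append((start, strand, mm))
    return hits


if __name__ == "__main__":
    genome = "GGGGGG" + "AACCGA" + "GGGGGG"  # toy sequence with one degenerate copy
    print(scan_repeat_like(genome, "AACCGG", max_mismatches=1))      # [(6, '+', 1)]
    print(scan_repeat_like(genome, "AACCGG", 1, exclude=[(6, 12)]))  # []
    print(scan_repeat_like("GGCCGGTTGG", "AACCGG", 0))               # [(2, '-', 0)]
```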

Results

Type II CRISPR-Cas Systems are Widespread in Bacteria.

In addition to the tracrRNA-encoding DNA and the repeat-spacer array, type II CRISPR-Cas loci typically comprise three to four cas genes organized in an operon (FIG. 32A-32B). Cas9 is the signature protein of type II systems and is involved in the expression and interference steps. Cas1 and Cas2 are core proteins shared by all CRISPR-Cas systems and are implicated in spacer acquisition. Csn2 and Cas4 are present in only a subset of type II systems and have been suggested to play a role in adaptation. To retrieve the maximum number of tracrRNA-containing type II CRISPR-Cas loci, we first screened publicly available genomes for sequences homologous to already annotated Cas9 proteins. In total, 235 Cas9 orthologues were identified in 203 bacterial species. A set of 75 diverse sequences representative of all retrieved Cas9 orthologues was selected for further analysis (FIGS. 32A-32B, FIGS. 38A-38B, and Materials and Methods).

FIG. 32A-32B depict (FIG. 32A) a phylogenetic tree of representative Cas9 sequences from various organisms and (FIG. 32B) representative Cas9 locus architectures. Bootstrap values calculated for each node are indicated. Branches of the same color represent selected subclusters of similar Cas9 orthologues. CRISPR repeat length in nucleotides, average Cas9 protein size in amino acids (aa) and consensus locus architecture are shown for every subcluster. *-gi|116628213 **-gi|116627542 †-gi|34557790 ‡-gi|34557932. Type II-A is characterized by cas9-csx12, cas1, cas2 and cas4. Type II-B is characterized by cas9, cas1 and cas2 followed by a csn2 variant. Type II-C is characterized by a conserved cas9, cas1, cas2 operon (see also FIG. 38A-38B).

Next, we performed a multiple sequence alignment of the selected Cas9 orthologues. The comparative analysis revealed high diversity in amino acid composition and protein size. The Cas9 orthologues share only a few identical amino acids, but all retrieved sequences have the same domain architecture, with a central HNH endonuclease domain and a split RuvC/RNaseH domain. Cas9 proteins range in length from 984 (Campylobacter jejuni) to 1629 (Francisella novicida) amino acids, with typical sizes of ~1100 or ~1400 amino acids. Because of the high diversity of Cas9 sequences, especially in the length of the inter-domain regions, we selected only well-aligned, informative positions of the alignment to reconstruct a phylogenetic tree of the analyzed sequences (FIGS. 32A-32B and Materials and Methods). Cas9 orthologues grouped into three major monophyletic clusters with some outlier sequences. The observed topology of the Cas9 tree agrees well with the current classification of type II loci, with the previously defined type II-A and type II-B forming separate monophyletic clusters. To further characterize the clusters, we examined in detail the cas operon compositions and CRISPR repeat sequences of all listed strains.

Cas9 Subclustering Reflects Diversity in Type II CRISPR-Cas Loci Architecture

A deeper analysis of selected type II loci revealed that the clustering of Cas9 orthologue sequences correlates with the diversity in CRISPR repeat length. For most type II CRISPR-Cas systems, the repeat length is 36 nucleotides (nt), with some variation in two of the Cas9 tree subclusters. In the type II-A cluster (FIG. 32A-32B), which comprises loci encoding the long Cas9 orthologue previously named Csx12, the CRISPR repeats are 37 nt long. The small subcluster composed of sequences from bacteria of the Bacteroidetes phylum (FIG. 32A-32B) is characterized by unusually long CRISPR repeats, up to 48 nt in size. Furthermore, we noticed that the subclustering of Cas9 sequences correlates with distinct cas operon architectures, as depicted in FIG. 32A-32B. The third major cluster (FIGS. 32A-32B) and the outlier loci (FIG. 32A-32B) consist mainly of the minimal operon composed of the cas9, cas1 and cas2 genes, with the exception of some incomplete loci that are discussed later. All other loci of the first two major clusters are associated with a fourth gene, mainly cas4, specific to type II-A, or csn2-like, specific to type II-B (FIGS. 32A-32B). We identified genes encoding shorter variants of the Csn2 protein, Csn2a, within loci similar to the type II-B S. pyogenes CRISPR01 and S. thermophilus CRISPR3 loci (FIGS. 32A-32B). The longer variant of Csn2, Csn2b, was found associated with loci similar to type II-B S. thermophilus CRISPR1 (FIGS. 32A-32B). Interestingly, we identified additional putative cas genes encoding proteins with no obvious sequence similarity to previously described Csn2 variants. One of these uncharacterized proteins is exclusively associated with type II-B loci of Mycoplasma species (FIGS. 32A-32B and FIGS. 33A-33E). Two others were found encoded in type II-B loci of Staphylococcus species (FIGS. 33A-33E). In all cases, the diversity of cas operon architecture is thus consistent with the subclustering of Cas9 sequences.
These characteristics, together with the general topology of the Cas9 tree, divided into three major, distinct, monophyletic clusters, led us to propose a further division of the type II CRISPR-Cas system into three subtypes. Type II-A is associated with Csx12-like Cas9 and Cas4, type II-B is associated with Csn2-like proteins, and type II-C contains only the minimal set of cas9, cas1 and cas2 genes, as depicted in FIG. 32A-32B.
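The gene-content criteria for the three proposed subtypes can be written as a simple rule. This is a deliberately simplified sketch: the subtypes were defined from Cas9 phylogeny together with operon architecture, whereas the hypothetical `assign_subtype` function below looks only at gene names (lowercase strings such as "cas9", "cas4" or "csn2a"). Treating "csx12" as an alias of "cas9" is an assumption, and loci with genes outside these rules, such as the uncharacterized Mycoplasma and Staphylococcus proteins, return None.

```python
CORE_GENES = {"cas9", "cas1", "cas2"}
ALIASES = {"csx12": "cas9", "cas9-csx12": "cas9"}  # assumed naming of Csx12-like Cas9


def assign_subtype(cas_genes):
    """Assign a type II subtype from the cas genes of one locus (gene content only)."""
    genes = {ALIASES.get(g.lower(), g.lower()) for g in cas_genes}
    if not CORE_GENES <= genes:
        return None  # incomplete locus, e.g. cas9 found as a single gene
    extra = genes - CORE_GENES
    if "cas4" in extra:
        return "II-A"  # Csx12-like Cas9 with Cas4
    if any(g.startswith("csn2") for g in extra):
        return "II-B"  # Csn2-like protein (e.g. Csn2a or Csn2b variants)
    if not extra:
        return "II-C"  # minimal cas9-cas1-cas2 operon
    return None        # additional genes not covered by these rules


if __name__ == "__main__":
    print(assign_subtype(["csx12", "cas1", "cas2", "cas4"]))  # II-A
    print(assign_subtype(["cas9", "cas1", "cas2", "csn2a"]))  # II-B
    print(assign_subtype(["cas9", "cas1", "cas2"]))           # II-C
    print(assign_subtype(["cas9"]))                           # None
```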

FIG. 33A-33E depicts the architecture of type II CRISPR-Cas loci from selected bacterial species. The vertical bars group the loci that code for Cas9 orthologues belonging to the same tree subcluster (compare with FIG. 32A-32B). Horizontal black bar, leader sequence; black rectangles and diamonds, repeat-spacer array. Predicted anti-repeats are represented by arrows indicating the direction of putative tracrRNA orthologue transcription. Note that for the loci that were not verified experimentally, the CRISPR repeat-spacer array is considered here to be transcribed from the same strand as the cas operon. The transcription direction of the putative tracrRNA orthologue is indicated accordingly.

In Silico Predictions of Novel tracrRNA Orthologues

Type II loci selected earlier based on the 75 representative Cas9 orthologues were screened for the presence of putative tracrRNA orthologues. Our previous analysis of a restricted number of tracrRNA sequences revealed that neither the sequences of tracrRNAs nor their localization within the CRISPR-Cas loci seemed to be conserved. However, as mentioned above, tracrRNAs are also characterized by an anti-repeat sequence capable of base-pairing with each of the pre-crRNA repeats to form tracrRNA:pre-crRNA repeat duplexes that are cleaved by RNase III in the presence of Cas9. To predict novel tracrRNAs, we took advantage of this characteristic and used the following workflow: (i) screen for potential anti-repeats (sequences base-pairing with CRISPR repeats) within the CRISPR-Cas loci, (ii) select anti-repeats located in intergenic regions, (iii) validate CRISPR anti-repeat:repeat base-pairing, and (iv) predict promoters and Rho-independent transcriptional terminators associated with the identified tracrRNAs.

To screen for putative anti-repeats, we retrieved repeat sequences from the CRISPRdb database or, when this information was not available, predicted the repeat sequences using the CRISPRFinder software. In our previous study, we showed experimentally that the transcription direction of the repeat-spacer array relative to that of the cas operon varied among loci. The RNA sequencing analysis performed here confirmed this observation. In some of the analyzed loci, namely in F. novicida, N. meningitidis and C. jejuni, the repeat-spacer array is transcribed in the direction opposite to that of the cas operon (see paragraph ‘Deep RNA sequencing validates expression of novel tracrRNA orthologues’ and FIGS. 33A-33E and FIGS. 34A-34B), while in S. pyogenes, S. mutans, S. thermophilus and L. innocua, the array and the cas operon are transcribed in the same direction. These are the only type II repeat-spacer array expression data available to date. To predict the transcription direction of other repeat-spacer arrays, we considered the previous observation that the last repeats of arrays are usually mutated. This observation is consistent with the current spacer acquisition model, in which the first repeat of the array is typically duplicated upon insertion of a spacer sequence during the adaptation phase. For 37 repeat-spacer arrays, we were able to identify the mutated repeat at the putative end of the array. We observed that the predicted orientation of transcription for the N. meningitidis and C. jejuni repeat-spacer arrays would be opposite to the orientation determined experimentally (RNA sequencing and Northern blot analysis). Because the predicted orientation was not consistent within clusters, and because in most cases we could detect potential promoters at both ends of the arrays, we considered the repeat-spacer arrays to be transcribed in the same direction as the cas operon unless validated otherwise.
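The terminal-repeat heuristic examined above, and the rule finally adopted, can be sketched as follows. The sketch assumes repeats are listed in plus-strand genomic order and compared with the array consensus by simple Hamming distance; both are assumptions, and the function names are hypothetical. Under the acquisition model, new repeats are added at the leader end, so a degenerate terminal repeat marks the distal end of the array and transcription is predicted to run toward it. Because this prediction proved inconsistent, `assumed_array_strand` encodes the rule actually used: the cas operon direction, unless experimentally validated otherwise. The repeats in the usage example are invented.

```python
def hamming(a, b):
    """Mismatches between two repeats, counting any length difference as mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def orientation_from_terminal_repeat(repeats, consensus):
    """Predict the transcribed strand from which terminal repeat is degenerate.

    repeats: repeat sequences in plus-strand genomic order (left to right).
    Returns '+' if the rightmost repeat is the more mutated one (leader on the left),
    '-' if the leftmost one is, or None if neither end stands out.
    """
    left = hamming(repeats[0], consensus)
    right = hamming(repeats[-1], consensus)
    if right > left:
        return "+"
    if left > right:
        return "-"
    return None


def assumed_array_strand(cas_operon_strand, validated_strand=None):
    """Rule adopted in the text: same direction as the cas operon unless validated otherwise."""
    return validated_strand if validated_strand is not None else cas_operon_strand


if __name__ == "__main__":
    consensus = "ACGTACGTAC"
    array = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGAAC"]  # mutated repeat at the right end
    print(orientation_from_terminal_repeat(array, consensus))        # +
    print(orientation_from_terminal_repeat(array[::-1], consensus))  # -
    print(assumed_array_strand("+"))                                 # +
    print(assumed_array_strand("+", validated_strand="-"))           # -
```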

FIG. 34A-34B depicts tracrRNA and pre-crRNA co-processing in selected type II CRISPR-Cas systems. CRISPR loci architectures with verified positions and directions of tracrRNA and pre-crRNA transcription are shown. Top sequences, pre-crRNA repeats; bottom sequences, tracrRNA sequences base-pairing with crRNA repeats. Putative RNA processing sites as revealed by RNA sequencing are indicated with arrowheads. For each locus, arrowhead sizes represent relative amounts of the retrieved 5′ and 3′ ends (see also FIG. 37A-37O).

FIG. 37A-37O lists all tracrRNA orthologues and mature crRNAs retrieved by sequencing for the bacterial species studied, including coordinates (region of interest) and corresponding cDNA sequences (5′ to 3′). The arrows represent the transcriptional direction (strand). The number of cDNA reads (calculated using SAMtools), coverage (percentage of mapped reads) and predominant ends associated with each transcript are indicated. Numbers of reads starting or stopping at each nucleotide position around the 5′ and 3′ ends of each transcript are displayed. The sizes of the mature crRNA forms are indicated. The number allocated to each crRNA species corresponds to the position of its spacer sequence in the pre-crRNA, according to CRISPRdb. The numbers allocated to tracrRNA species distinguish different forms of the same transcript.

We then screened the selected CRISPR-Cas loci, including sequences located 1 kb upstream and downstream, on both strands for possible repeat sequences that did not belong to the repeat-spacer array, allowing up to 15 mismatches. On average, we found one to three degenerate repeat sequences per locus that would correspond to anti-repeats of tracrRNA orthologues, and we selected the sequences located within intergenic regions. The putative anti-repeats were found in four typical locations: upstream of the cas9 gene, in the region between cas9 and cas1, and upstream or downstream of the repeat-spacer array (FIG. 33A-33E). For every retrieved sequence, we validated the extent of base-pairing between the repeat and the anti-repeat (FIG. 44A-44C) by predicting the possible RNA:RNA interaction, focusing especially on candidates with longer, perfectly complementary regions that form an optimal double-stranded structure for RNase III processing. To predict promoters and transcriptional terminators flanking the anti-repeat, we required the putative transcription start and termination sites to lie within a region located at most 200 nt upstream and 100 nt downstream of the anti-repeat sequence, respectively, based on our previous observations [26]. As mentioned above, experimental information on the transcriptional direction of most repeat-spacer arrays of type II systems is lacking. In silico promoter prediction algorithms often give false-positive results and point to putative promoters that would lead to transcription of repeat-spacer arrays from both strands. In some cases we could not predict transcriptional terminators, even though tracrRNA orthologue expression could be validated experimentally, as exemplified by the C. jejuni locus (see paragraph ‘Deep RNA sequencing validates expression of novel tracrRNA orthologues’). We therefore suggest treating promoter and transcriptional terminator predictions only as a supportive, but not essential, step of the workflow described above.
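Steps (ii) and (iv) of the workflow, the intergenic filter and the strand-aware search windows for promoters and terminators, can be sketched as below. The 1-kb flank and the 200-nt upstream and 100-nt downstream limits come from the text; the `AntiRepeat` record, the 0-based end-exclusive coordinates and the gene-interval format are assumptions for illustration. "Upstream" is taken relative to the direction of putative tracrRNA transcription, so on the minus strand it lies at higher coordinates. All coordinates in the usage example are invented.

```python
from dataclasses import dataclass

FLANK = 1000             # nt screened on each side of the CRISPR-Cas locus
PROMOTER_WINDOW = 200    # maximal distance of the start site upstream of the anti-repeat
TERMINATOR_WINDOW = 100  # maximal distance of the terminator downstream of it


@dataclass
class AntiRepeat:
    start: int   # 0-based, inclusive
    end: int     # end-exclusive
    strand: str  # putative direction of tracrRNA transcription, '+' or '-'


def screened_region(locus_start, locus_end, genome_length):
    """Locus coordinates extended by 1 kb on both sides, clipped to the genome."""
    return max(0, locus_start - FLANK), min(genome_length, locus_end + FLANK)


def is_intergenic(hit, genes):
    """True if the anti-repeat overlaps none of the (start, end) gene intervals."""
    return all(hit.end <= g_start or hit.start >= g_end for g_start, g_end in genes)


def search_windows(hit):
    """(promoter_window, terminator_window) intervals flanking the anti-repeat."""
    if hit.strand == "+":
        promoter = (max(0, hit.start - PROMOTER_WINDOW), hit.start)
        terminator = (hit.end, hit.end + TERMINATOR_WINDOW)
    else:
        promoter = (hit.end, hit.end + PROMOTER_WINDOW)
        terminator = (max(0, hit.start - TERMINATOR_WINDOW), hit.start)
    return promoter, terminator


if __name__ == "__main__":
    hit = AntiRepeat(500, 536, "+")
    print(is_intergenic(hit, [(0, 400), (700, 2000)]))  # True
    print(search_windows(hit))                           # ((300, 500), (536, 636))
    print(search_windows(AntiRepeat(500, 536, "-")))     # ((536, 736), (400, 500))
    print(screened_region(500, 6000, 6500))              # (0, 6500)
```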

FIG. 44A-44C depicts predicted pre-crRNA repeat:tracrRNA anti-repeat base-pairing in selected bacterial species. (b) The CRISPR loci belong to the type II (Nmeni/CASS4) CRISPR-Cas system. Nomenclature is according to the CRISPR database (CRISPRdb). Note that S. thermophilus LMD-9 and W. succinogenes contain two type II loci. (c) Upper sequence, pre-crRNA repeat consensus sequence (5′ to 3′); lower sequence, tracrRNA homologue sequence annealing to the repeat (anti-repeat; 3′ to 5′). Note that the repeat sequence given is based on the assumption that the CRISPR repeat-spacer array is transcribed from the same strand as the cas operon. For the sequences that were validated experimentally in this study, RNA sequencing data were taken into account to determine the base-pairing. See FIG. 33A-33E. (d) Two possible anti-repeats were identified in the F. tularensis subsp. novicida, W. succinogenes and gamma proteobacterium HTCC5015 type II-A loci. Upper sequence pairing, anti-repeat within the putative leader sequence; lower sequence pairing, anti-repeat downstream of the repeat-spacer array. See FIG. 33A-33E. (e) Two possible anti-repeats were identified in the S. wadsworthensis type II-A locus. Upper sequence pairing, anti-repeat; lower sequence pairing, anti-repeat within the putative leader sequence. See FIG. 33A-33E. (f) Two possible anti-repeats were identified in the L. gasseri type II-B locus. Upper sequence pairing, anti-repeat upstream of cas9; lower sequence pairing, anti-repeat between the cas9 and cas1 genes. See FIG. 33A-33E. (g) Two possible anti-repeats were identified in the C. jejuni type II-C loci. Upper sequence pairing, anti-repeat upstream of cas9; lower sequence pairing, anti-repeat downstream of the repeat-spacer array. See FIG. 33A-33E. (h) Two possible anti-repeats were identified in the R. rubrum type II-C locus. Upper sequence pairing, anti-repeat downstream of the repeat-spacer array; lower sequence pairing, anti-repeat upstream of cas1. See FIG. 33A-33E.

A Plethora of tracrRNA Orthologues

We predicted putative tracrRNA orthologues for 56 of the 75 loci selected earlier. The results of the predictions are depicted in FIG. 33A-33E. As already mentioned, the direction of tracrRNA transcription indicated in this Figure is hypothetical and based on the indicated direction of repeat-spacer array transcription. Sequences encoding putative tracrRNA orthologues were identified upstream of, within and downstream of the cas operon, as well as downstream of the repeat-spacer arrays, including the putative leader sequences commonly found in type II-A loci (FIG. 33A-33E). However, we observed that anti-repeats with similar locations within CRISPR-Cas loci can be transcribed in different directions (as observed when comparing, e.g., Lactobacillus rhamnosus and Eubacterium rectale, or Mycoplasma mobile and S. pyogenes or N. meningitidis) (FIG. 33A-33E). Notably, loci grouped within the same subcluster of the Cas9 guide tree share a common architecture with respect to the position of the tracrRNA-encoding gene. We identified anti-repeats around the repeat-spacer array in type II-A loci, and mostly upstream of the cas9 gene in types II-B and II-C, with several notable exceptions in three distinct subclusters of type II-B, in which the putative tracrRNA is located between cas9 and cas1.

Some Type II CRISPR-Cas Loci have Defective Repeat-Spacer Arrays and/or tracrRNA Orthologues

For six type II loci (Fusobacterium nucleatum, Aminomonas paucivorans, Helicobacter mustelae, Azospirillum sp., Prevotella ruminicola and Akkermansia muciniphila), we identified potential anti-repeats with weak base-pairing to the repeat sequence or located within open reading frames. Notably, among these loci, a weak anti-repeat within the open reading frame of the gene encoding a putative ATPase in A. paucivorans, a strong anti-repeat within the first 100 nt of the cas9 gene in Azospirillum sp. B510 and a strong anti-repeat overlapping both cas9 and cas1 in A. muciniphila were identified (FIG. 33A-33E). For twelve additional loci (Peptoniphilus duerdenii, Coprococcus catus, Acidaminococcus intestini, Catenibacterium mitsuokai, Staphylococcus pseudintermedius, Ilyobacter polytropus, Elusimicrobium minutum, Bacteroides fragilis, Acidothermus cellulolyticus, Corynebacterium diphtheriae, Bifidobacterium longum and Bifidobacterium dentium), we could not detect any putative anti-repeat. No information is available on pre-crRNA expression and processing in these CRISPR-Cas loci. Thus, the functionality of type II systems in the absence of a clearly defined tracrRNA orthologue remains to be addressed. For seven analyzed loci we could not identify any repeat-spacer array (Parasutterella excrementihominis, Bacillus cereus, Ruminococcus albus, Rhodopseudomonas palustris, Nitrobacter hamburgensis, Bradyrhizobium sp. and Prevotella micans) (FIGS. 33A-33E), and in three of those (Bradyrhizobium sp. BTAi1, N. hamburgensis and B. cereus) we detected cas9 as a single gene with no other cas genes in its vicinity. For these three loci, we failed to predict any small RNA sequence upstream or downstream of the cas9 gene. In the case of R. albus and P. excrementihominis, the genomic contig containing cas9 is too short to allow prediction of the repeat-spacer array.

Deep RNA Sequencing Validates Expression of Novel tracrRNA Orthologues

To verify the in silico tracrRNA predictions and determine tracrRNA:pre-crRNA co-processing patterns, RNAs from selected Gram-positive (S. mutans and L. innocua) and Gram-negative (N. meningitidis, C. jejuni and F. novicida) bacteria were analyzed by deep sequencing. Sequences of tracrRNA orthologues and processed crRNAs were retrieved (FIGS. 36A-36F and FIGS. 37A-37O). Consistent with previously published differential tracrRNA sequencing data in S. pyogenes [26], tracrRNA orthologues were highly represented in the libraries, accounting for 0.08 to 6.2% of total mapped reads. Processed tracrRNAs were also more abundant than primary transcripts, accounting for 66% to more than 95% of the total tracrRNA reads (FIGS. 36A-36F and FIGS. 37A-37O).

FIGS. 36A-36F depict the expression of bacterial tracrRNA orthologues and crRNAs revealed by deep RNA sequencing. Expression profiles of tracrRNA orthologues and crRNAs of selected bacterial strains are represented along the corresponding genomes by bar charts (images captured from the Integrative Genomics Viewer (IGV) tool): Campylobacter jejuni (GenBank: NC_002163), Francisella novicida (GenBank: NC_008601), Neisseria meningitidis (GenBank: NC_003116), Listeria innocua (GenBank: NC_003212) and Streptococcus mutans (GenBank: NC_004350). Genomic coordinates are given. (a) Sequence coverage calculated using BEDTools-Version-2.15.0 (scale given in reads per million). (b) Distribution of reads starting (5′) and ending (3′) at each nucleotide position (scale given in numbers of reads). Upper panels correspond to transcripts from the positive strand and lower panels to transcripts from the negative strand. Negative coverage values and peaks presented below the axes indicate transcription from the negative strand of the genome. Predominant 5′ and 3′ ends of the reads are plotted for all RNAs. Note that, given the low quality of the L. innocua cDNA library, the crRNA reads are shortened, and an accumulation of reads at the 3′ end of tracrRNA is observed, presumably due to RNA degradation.

To determine the 5′ ends of tracrRNA primary transcripts, we analyzed the abundance of all tracrRNA 5′-end reads and retrieved the most prominent reads upstream or in the vicinity of the 5′ end of the predicted anti-repeat sequence. The 5′ ends of tracrRNA orthologues were further confirmed using the promoter prediction algorithm. The identified 5′ ends of tracrRNAs from S. mutans, L. innocua and N. meningitidis correlated with both in silico predictions and Northern blot analysis of tracrRNA expression [26]. The most prominent 5′ end of C. jejuni tracrRNA was identified in the middle of the anti-repeat sequence. Five nucleotides upstream, we detected an additional putative 5′ end that correlated with the in silico prediction and provided a longer region of interaction with the CRISPR repeat sequence. We retrieved a relatively low number of reads from the F. novicida library, and these corresponded almost exclusively to processed transcripts. Analysis of the very small number of reads of primary transcripts provided a 5′ end that corresponded to the strong in silico promoter predictions. Northern blot probing of F. novicida tracrRNA further confirmed the validity of the predictions, showing low abundance of transcripts around 90 nt in length. The results are listed in Table 2. For all examined species except N. meningitidis, primary tracrRNA transcripts were identified as single small RNA species of 75 to 100 nt in length. In the case of N. meningitidis, we found a predominant primary tracrRNA form of ~110 nt and a putative longer transcript of ~170 nt, represented by a very low number of reads and detected previously as a weak band by Northern blot analysis.
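The selection of primary-transcript 5′ ends can be illustrated with a small tally of read start positions. The sketch below is hypothetical: it assumes the 5′-end positions of reads on the tracrRNA strand have already been extracted, and the ±50-nt search window around the anti-repeat 5′ end is an arbitrary choice, not a value from the text. It returns the most upstream position carrying reads (analogous to a "first read" 5′ end) and the most abundant one (a "most prominent" 5′ end). The positions in the usage example are invented.

```python
from collections import Counter


def summarize_5prime_ends(read_5prime_ends, antirepeat_5prime, strand, window=50):
    """Return (first_read, most_prominent) 5' ends near the anti-repeat, or None.

    read_5prime_ends: genomic 5'-end positions of reads on the tracrRNA strand.
    strand: '+' or '-'; on the minus strand "upstream" means higher coordinates.
    window: hypothetical search distance (nt) on either side of the anti-repeat 5' end.
    """
    counts = Counter(p for p in read_5prime_ends
                     if abs(p - antirepeat_5prime) <= window)
    if not counts:
        return None
    # Most upstream position that has at least one read.
    first_read = min(counts) if strand == "+" else max(counts)
    # Most abundant position; ties go to the one closest to the anti-repeat.
    most_prominent = max(counts, key=lambda p: (counts[p], -abs(p - antirepeat_5prime)))
    return first_read, most_prominent


if __name__ == "__main__":
    plus_ends = [100, 105, 105, 105, 110, 300]  # invented; 300 lies outside the window
    print(summarize_5prime_ends(plus_ends, antirepeat_5prime=120, strand="+"))   # (100, 105)
    minus_ends = [200, 195, 195, 190]
    print(summarize_5prime_ends(minus_ends, antirepeat_5prime=180, strand="-"))  # (200, 195)
```

As the C. jejuni case shows, the most abundant 5′ end need not be the biologically relevant start site, so in the text the tallied positions were cross-checked against promoter predictions rather than accepted automatically.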


TABLE 2
Selected tracrRNA orthologues
Strain (a) / transcript: 5′ end (b); 3′ end (c); length

S. pyogenes SF370
  primary: 5′ end 854 546; 3′ end 854 376; 171 nt
  primary: 5′ end 854 464; 89 nt
  processed: 5′ end 854 450; ~75 nt
C. jejuni NCTC 11168
  primary: 5′ end first read 1 455 497, most prominent 1 455 502, predicted 1 455 497; 3′ end 1 455 570; ~75 nt
  processed: 5′ end 1 455 509; ~60 nt
L. innocua Clip11262
  primary: 5′ end first read 2 774 774, most prominent 2 774 774, predicted 2 774 773; 3′ end 2 774 863; ~90 nt
  processed: 5′ end 2 774 788; ~75 nt
S. mutans UA159
  primary: 5′ end first read 1 335 040, most prominent 1 335 040, predicted 1 335 039; 3′ end 1 335 141; ~100 nt
  processed: 5′ end 1 335 054; ~85 nt
  processed: 5′ end 1 335 062; ~80 nt
N. meningitidis A Z2491
  primary: 5′ end first read 614 158, most prominent 614 162, predicted 614 154; 3′ end 614 333; ~175 nt
  primary: 5′ end first read 614 223, most prominent 614 225, predicted 614 223; ~110 nt
  processed: 5′ end 614 240; ~90 nt
F. novicida U112
  primary: 5′ end first read 817 144, most prominent 817 145, predicted 817 154; 3′ end 817 065; ~80 nt
  processed: 5′ end 817 138; ~75 nt
  processed: 5′ end 817 128; ~85 nt
S. thermophilus LMD-9
  primary: predicted 5′ end 1 384 330; 3′ end 1 384 425; ~95 nt
  primary: predicted 5′ end 646 654; 3′ end 646 762; ~110 nt
P. multocida Pm70
  primary: predicted 5′ end 1 327 287; 3′ end 1 327 396; ~110 nt
M. mobile 163K
  primary: predicted 5′ end 49 470; 3′ end 49 361; ~110 nt

(a) tracrRNA orthologues of S. thermophilus, P. multocida and M. mobile were predicted in silico.
(b) RNA-seq, revealed by RNA sequencing (Table S3); first read, first 5′-end position retrieved by sequencing; most prominent, most abundant 5′ end according to RNA-seq data; predicted, in silico prediction of the transcription start site; underlined, 5′ end chosen for the primary tracrRNA to be aligned.
(c) Estimated 3′ end according to RNA-seq data and transcriptional terminator prediction.
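The lengths in Table 2 follow from the 5′ and 3′ coordinates. Assuming inclusive coordinates, a transcript's length is the absolute difference between its ends plus one, which works for both strands (for minus-strand transcripts such as the S. pyogenes and F. novicida tracrRNAs, the 5′ coordinate is the larger one). The check below uses only coordinate pairs listed explicitly in the table; the helper name is hypothetical.

```python
def transcript_length(five_prime, three_prime):
    """Inclusive length (nt) of a transcript given its 5' and 3' genomic coordinates."""
    return abs(three_prime - five_prime) + 1


# (5' end, 3' end) pairs taken from Table 2, with the length listed there
examples = {
    "S. pyogenes SF370 primary (171 nt)": (854_546, 854_376),
    "L. innocua Clip11262 primary (~90 nt)": (2_774_774, 2_774_863),
    "S. mutans UA159 primary (~100 nt)": (1_335_040, 1_335_141),
    "F. novicida U112 primary (~80 nt)": (817_144, 817_065),
    "P. multocida Pm70 primary (~110 nt)": (1_327_287, 1_327_396),
    "M. mobile 163K primary (~110 nt)": (49_470, 49_361),
}

for label, (five, three) in examples.items():
    print(f"{label}: {transcript_length(five, three)} nt")
```

The computed values (171, 90, 102, 80, 110 and 110 nt) agree with the listed lengths exactly or, for S. mutans, within a few nucleotides of the rounded value.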

tracrRNA and Pre-crRNA Co-Processing Sites Lie in the Anti-Repeat:Repeat Region.

We examined the processed tracrRNA transcripts by analyzing abundant tracrRNA 5′ ends within the predicted anti-repeat sequence and abundant mature crRNA 3′ ends (FIGS. 34A-34B and FIGS. 45A-45B). In all species, we identified prominent 5′ ends of tracrRNA orthologues that could result from co-processing of the tracrRNA:pre-crRNA repeat duplexes by RNase III. We also identified processed crRNA 5′ ends that most probably result from a second maturation event by putative trimming, consistent with previous observations. Notably, in the closely related RNA pairs of S. pyogenes, S. mutans and L. innocua, we observed the same processing site around the G:C base pair in the middle of the anti-repeat sequence. In both S. mutans and L. innocua, we detected additional prominent tracrRNA 5′ ends and crRNA 3′ ends that could suggest further trimming of the tracrRNA:crRNA duplex following the first, RNase III-catalyzed processing event, with the crRNA 3′ end being shortened in addition to the 5′-end trimming mentioned above. Similarly, in C. jejuni we found only a small number of crRNA 3′ ends that would fit the RNase III processing patterns, and retrieved the corresponding 5′ ends of processed tracrRNA. Thus, putative trimming of tracrRNA:crRNA duplexes after initial cleavage by RNase III would result in a shorter repeat-derived part in mature crRNAs, producing shorter tracrRNA:crRNA duplexes stabilized by a triple G:C base-pairing for interaction with the endonuclease Cas9 and subsequent cleavage of target DNAs. The N. meningitidis RNA duplex seems to be processed at two primary sites further toward the 3′ end of the CRISPR repeat, resulting in a long repeat-derived part in mature crRNA and a stable RNA:RNA interaction despite the central bulge within the duplex. Interestingly, the tracrRNA:pre-crRNA duplex of F. novicida seems to be cleaved within the region of low complementarity, and some of the retrieved abundant tracrRNA 5′ ends suggest its further trimming without concomitant trimming of crRNA. Differences in primary transcript sizes and in the location of processing sites result in processed tracrRNAs of various lengths, ranging from ~65 to ~85 nt. The coordinates and sizes of the prominent processed tracrRNA transcripts are shown in Table 2 and FIG. 37A-37O. The observed processing patterns of tracrRNA and crRNA agree well with the previously proposed model of two maturation events. The putative further trimming of some tracrRNA 5′ ends and crRNA 3′ ends could stem from the second maturation event or, alternatively, be an artifact of cDNA library preparation or RNA sequencing. The nature of these processing events remains to be investigated.

Sequences of tracrRNA Orthologues are Highly Diverse

Sequence similarities of selected tracrRNA orthologues were also determined. We performed multiple sequence alignments of primary tracrRNA transcripts of S. pyogenes (89-nt form only), S. mutans, L. innocua and N. meningitidis (110-nt form only), S. thermophilus, P. multocida and M. mobile (Table 2, FIG. 35). We observed high diversity in tracrRNA sequences but significant conservation of sequences from closely related CRISPR-Cas loci. According to pairwise alignments, tracrRNAs from L. innocua, S. pyogenes, S. mutans and S. thermophilus share on average 77% identity, and tracrRNAs from N. meningitidis and P. multocida share 82% identity. The average identity of the analyzed tracrRNA sequences is 56%, comparable to the identity of random RNA sequences. This observation further confirms that tracrRNA orthologues can be predicted from sequence similarity only for closely related loci. We also searched for conservation of tracrRNA structure but found no significant similarity except for one covariation and a conserved transcriptional terminator structure (FIG. 35).
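The identity figures above come from pairwise comparisons of aligned tracrRNA sequences. As a minimal sketch only, the code below computes percent identity for two already-aligned sequences and averages it over all pairs. The gap convention (columns that are gaps in both sequences are skipped) and the toy sequences are assumptions for illustration, not the alignment settings used for FIG. 35.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Assumed convention: columns that are gaps in both sequences are skipped;
    every other column counts toward the denominator, and a column is
    identical only if both sequences carry the same non-gap residue.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    columns = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        columns += 1
        if x == y:  # both-gap columns were skipped, so this is a residue match
            matches += 1
    return 100.0 * matches / columns if columns else 0.0


def mean_pairwise_identity(aligned: dict[str, str]) -> float:
    """Average percent identity over all unordered pairs of aligned sequences."""
    scores = [percent_identity(aligned[p], aligned[q])
              for p, q in combinations(aligned, 2)]
    return sum(scores) / len(scores)


# Toy alignment (hypothetical sequences, not real tracrRNAs).
toy = {"seq1": "ACGU-A", "seq2": "ACGUUA", "seq3": "ACCUUA"}
print(round(percent_identity(toy["seq1"], toy["seq2"]), 1))  # 83.3
print(round(mean_pairwise_identity(toy), 1))                 # 77.8
```

Reported identities depend on the aligner and on how gaps are scored, so values from this sketch are only comparable with each other, not with the percentages in the text.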

FIG. 35 depicts the sequence diversity of tracrRNA orthologues as a multiple sequence alignment of tracrRNAs. S. thermophilus and S. thermophilus2 denote the tracrRNAs associated with the SEQ ID NO:41 and SEQ ID NO:40 Cas9 orthologues, respectively. Black, highly conserved; dark grey, conserved; light grey, weakly conserved. The predicted consensus structure is depicted at the top of the alignment. Arrows indicate nucleotide covariations. S. pyogenes SF370, S. mutans UA159, L. innocua Clip11262, C. jejuni NCTC 11168, F. novicida U112 and N. meningitidis A Z2491 tracrRNAs were validated by RNA sequencing and Northern blot analysis. S. thermophilus LMD-9 tracrRNA was validated by Northern blot analysis. P. multocida Pm70 tracrRNA was predicted from the high similarity of its CRISPR-Cas locus to that of N. meningitidis A Z2491. M. mobile 163K tracrRNA was predicted in silico from strong predictions of a transcriptional promoter and terminator.

Example 4: Repurposing CRISPR as an RNA-Guided Platform for Sequence-Specific Control of Gene Expression

Targeted gene regulation on a genome-wide scale is a powerful strategy for interrogating, perturbing and engineering cellular systems. The inventors have developed a new method for controlling gene expression, based on Cas9, an RNA-guided DNA endonuclease from a Type II CRISPR system. This example demonstrates that a catalytically dead Cas9, lacking endonuclease activity, when co-expressed with a guide RNA, generates a DNA recognition complex that can specifically interfere with transcriptional elongation, RNA polymerase binding, or transcription factor binding. This system, called CRISPR interference (CRISPRi), can efficiently repress expression of targeted genes in Escherichia coli with no detectable off-target effects. CRISPRi can be used to repress multiple target genes simultaneously, and its effects are reversible. In addition, the system can be adapted for gene repression in mammalian cells. This RNA-guided DNA recognition platform provides a simple approach for selectively perturbing gene expression on a genome-wide scale.

Materials and Methods

Strains and Media

The Escherichia coli K-12 strain MG1655 was used as the host strain for the in vivo fluorescence measurements. An E. coli MG1655-derived strain that endogenously expresses a variant of RNAP with a 3×-FLAG epitope tag attached to the C-terminal end of the RpoC subunit was used for all sequencing experiments. EZ rich defined medium (EZ-RDM, Teknova) was used as the growth medium for in vivo fluorescence assays. Genetic transformation and its verification were performed using standard protocols, with AmpR, CmR or KanR genes as selectable markers.

Plasmid Construction and E. coli Genome Cloning

The Cas9 and dCas9 genes were cloned from the previously described vectors pMJ806 and pMJ841, respectively. The genes were PCR amplified and inserted into a vector containing the anhydrotetracycline (aTc)-inducible promoter PLtetO-1, a chloramphenicol selectable marker and a p15A replication origin. The sgRNA template was cloned into a vector containing a minimal synthetic promoter (J23119) with an annotated transcription start site, an ampicillin selectable marker and a ColE1 replication origin. Inverse PCR was used to generate sgRNA cassettes with new 20-bp complementary regions. To insert fluorescent reporter genes into the E. coli genome, the fluorescent gene was first cloned onto an entry vector, which was then PCR amplified to generate linearized DNA fragments containing nsfA 5′/3′ UTR sequences, the fluorescent gene and a KanR selectable marker. The E. coli MG1655 strain was transformed with the temperature-sensitive plasmid pKD46, which carries the λ-Red recombination proteins (Exo, Beta and Gam). Cell cultures were grown at 30° C. to an OD (600 nm) of ~0.5, and 0.2% arabinose was added to induce expression of the λ-Red recombination proteins for 1 h. The cells were harvested at 4° C. and used for transformation of the linearized DNA fragments by electroporation. Cells containing correct genome insertions were selected with 50 μg/mL kanamycin.

Flow Cytometry and Analysis

Strains were cultivated overnight in EZ-RDM containing 100 μg/mL carbenicillin and 34 μg/mL chloramphenicol in 2 mL 96-well deep-well plates (Costar 3960) at 37° C. and 1200 r.p.m. One microliter of this overnight culture was then added to 249 μL of fresh EZ-RDM with the same antibiotic concentrations, supplemented with 2 μM aTc to induce production of the dCas9 protein. When cells had grown to mid-log phase (~4 h), fluorescent protein levels were determined using an LSRII flow cytometer (BD Biosciences) equipped with a high-throughput sampler. Cells were sampled at a low flow rate until at least 20,000 cells had been collected. Data were analyzed using FCS Express (De Novo Software) by gating on a polygonal region containing 60% of the cell population in the forward scatter-side scatter plot. For each experiment, triplicate cultures were measured, and their standard deviation is shown as the error bar.
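The analysis above (a polygonal forward/side-scatter gate followed by a summary over triplicate cultures) can be sketched as below. This is an illustration under stated assumptions, not the FCS Express workflow: events are assumed to be (FSC, SSC, fluorescence) tuples, each culture is assumed to be summarized by the mean fluorescence of its gated cells, and the gate polygon and event values are hypothetical. The gate test is the standard ray-casting point-in-polygon algorithm.

```python
import statistics

Point = tuple[float, float]


def in_polygon(x: float, y: float, polygon: list[Point]) -> bool:
    """Ray-casting test: True if (x, y) lies inside the polygon's vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count edges crossed by a horizontal ray running right from (x, y).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def gated_mean_fluorescence(events: list[tuple[float, float, float]],
                            gate: list[Point]) -> float:
    """Mean fluorescence of (FSC, SSC, fluorescence) events inside the gate."""
    return statistics.mean([fl for fsc, ssc, fl in events if in_polygon(fsc, ssc, gate)])


def summarize_replicates(replicates: list[list[tuple[float, float, float]]],
                         gate: list[Point]) -> tuple[float, float]:
    """Mean and sample standard deviation across replicate cultures."""
    per_culture = [gated_mean_fluorescence(events, gate) for events in replicates]
    return statistics.mean(per_culture), statistics.stdev(per_culture)


gate = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]  # hypothetical gate
triplicate = [
    [(5, 5, 1.0), (5, 5, 3.0), (50, 50, 1000.0)],  # last event falls outside the gate
    [(2, 2, 2.0), (8, 8, 4.0)],
    [(1, 9, 3.0), (9, 1, 5.0)],
]
print(summarize_replicates(triplicate, gate))  # (3.0, 1.0)
```

The out-of-gate event (FSC 50, SSC 50) is excluded, which is the purpose of scatter gating: debris and aggregates would otherwise skew the per-culture value.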

β-Galactosidase Assay

To perform the β-galactosidase assay, 1 μL of overnight culture prepared as above was added to 249 μL of fresh EZ-RDM with the same antibiotic concentrations and 2 μM aTc, with or without 1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG). Cells were grown to mid-log phase. The LacZ activity of 100 μL of this culture was measured using the yeast β-galactosidase assay kit (Pierce) following the manufacturer's instructions.
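The kit's own instructions define how LacZ activity was calculated here. Purely as a general illustration, the sketch below implements the classic Miller-unit formula, 1000 × (A420 − 1.75 × A550) / (t × V × OD600), with t in minutes and V in mL. The parameter names and readings are hypothetical, and the Pierce kit may use a different normalization.

```python
def miller_units(a420: float, od600: float, minutes: float,
                 volume_ml: float, a550: float = 0.0) -> float:
    """Classic Miller units for a beta-galactosidase (LacZ) assay.

    a420: absorbance of the o-nitrophenol product at 420 nm
    a550: optional light-scattering correction reading at 550 nm
    od600: culture density; minutes: reaction time; volume_ml: culture assayed.
    """
    if minutes <= 0 or volume_ml <= 0 or od600 <= 0:
        raise ValueError("time, volume and OD600 must be positive")
    return 1000.0 * (a420 - 1.75 * a550) / (minutes * volume_ml * od600)


# Hypothetical readings for 100 uL (0.1 mL) of culture, with and without IPTG.
induced = miller_units(a420=0.80, od600=0.5, minutes=10, volume_ml=0.1)
uninduced = miller_units(a420=0.02, od600=0.5, minutes=10, volume_ml=0.1)
print(round(induced), round(uninduced), round(induced / uninduced))  # 1600 40 40
```

Normalizing by OD600 and reaction time lets activities from cultures of slightly different density be compared, which matters when comparing the ± IPTG conditions described above.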

Extraction and Purification of Total RNA

For each sample, a monoclonal culture of E. coli was grown at 37° C. in 500 mL of EZ-RDM from an OD (600 nm) of 0.1 to early log phase (OD 0.45±0.05), at which point the cells were harvested by filtration over 0.22 μm nitrocellulose filters (GE) and frozen in liquid nitrogen to halt all transcriptional progress simultaneously. Frozen cells (100 μg) were pulverized on a Qiagen TissueLyser II mixer mill 6 times at 15 Hz for 3 min in the presence of 500 μL of frozen lysis buffer (20 mM Tris pH 8, 0.4% Triton X-100, 0.1% NP-40, 100 mM NH4Cl, 50 U/mL SUPERase•In [Ambion] and 1× protease inhibitor cocktail [Complete, EDTA-free, Roche]) supplemented with 10 mM MnCl2 and 15 μM Tagetin transcriptional inhibitor (Epicentre).

The lysate was resuspended on ice by pipetting. RQ1 DNase I (110 U total, Promega) was added, and the mixture was incubated for 20 min on ice. The reaction was quenched with EDTA (25 mM final), and the lysate was clarified at 4° C. by centrifugation at 20,000 × g for 10 min. The lysate was loaded onto a PD MiniTrap G-25 column (GE Healthcare) and eluted with lysis buffer supplemented with 1 mM EDTA.

Total mRNA Purification

Total RNA was purified from the clarified lysate using the miRNeasy kit (Qiagen). One microgram of RNA in 20 μL of 10 mM Tris pH 7 was mixed with an equal volume of 2× alkaline fragmentation solution (2 mM EDTA, 10 mM Na2CO3, 90 mM NaHCO3, pH 9.3) and incubated for ~25 min at 95° C. to generate fragments of 30-100 nt. The fragmentation reaction was stopped by adding 0.56 mL of ice-cold precipitation solution (300 mM NaOAc pH 5.5 plus GlycoBlue (Ambion)), and the RNA was purified by standard isopropanol precipitation. The fragmented mRNA was then dephosphorylated in a 50 μL reaction with 25 U T4 PNK (NEB) in 1× PNK buffer (without ATP) plus 0.5 U SUPERase•In, and precipitated with GlycoBlue by standard isopropanol precipitation.

Nascent RNA Purification

For nascent RNA purification, the clarified lysate was added to 0.5 mL of anti-FLAG M2 affinity gel (Sigma Aldrich) as described previously. The affinity gel was washed twice with lysis buffer supplemented with 1 mM EDTA before incubation with the clarified lysate at 4° C. for 2.5 h with nutation. The immunoprecipitate was washed four times with 10 mL of lysis buffer supplemented with 300 mM KCl, and bound RNAP was eluted twice with lysis buffer supplemented with 1 mM EDTA and 2 mg/mL 3×-FLAG peptide (Sigma Aldrich). Nascent RNA was purified from the eluate using the miRNeasy kit (Qiagen) and converted to DNA using a previously established library generation protocol.

DNA Library Preparation and DNA Sequencing

The DNA library was sequenced on an Illumina HiSeq 2000. Reads were processed using the HTSeq Python package and other custom software written in Python. The 3′ end of each sequenced transcript was aligned to the reference genome using Bowtie (“bowtie-bio” preceding “.sourceforge.net”), and RNAP profiles were generated in MochiView (“johnsonlab.ucsf” preceding “.edu/mochi.html”).
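In NET-seq, the 3′ end of each nascent transcript marks the position of the RNAP that was synthesizing it, so an RNAP occupancy profile is a per-position count of transcript 3′ ends. The sketch below assumes each alignment has already been converted into the transcript's genomic span (0-based leftmost coordinate, aligned length, transcript strand). Depending on library chemistry, real reads may need their strand flipped first, and the `Alignment` type here is hypothetical, not an HTSeq or Bowtie API.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Alignment(NamedTuple):
    start: int    # 0-based leftmost genomic coordinate of the transcript span
    length: int   # number of reference bases covered
    strand: str   # "+" or "-": strand the nascent transcript came from


def three_prime_end(aln: Alignment) -> int:
    """Genomic coordinate of the transcript 3' end (the RNAP position)."""
    if aln.strand == "+":
        return aln.start + aln.length - 1
    if aln.strand == "-":
        return aln.start  # minus-strand transcripts end at the leftmost base
    raise ValueError(f"unknown strand {aln.strand!r}")


def rnap_profile(alignments: Iterable[Alignment]) -> dict[str, Counter]:
    """Count transcript 3' ends per genomic position, separately per strand."""
    profile: dict[str, Counter] = {"+": Counter(), "-": Counter()}
    for aln in alignments:
        position = three_prime_end(aln)  # validates the strand before indexing
        profile[aln.strand][position] += 1
    return profile


# Hypothetical alignments: two plus-strand transcripts end at position 119.
reads = [Alignment(100, 20, "+"), Alignment(105, 15, "+"), Alignment(200, 30, "-")]
profile = rnap_profile(reads)
pause_site, depth = profile["+"].most_common(1)[0]
print(pause_site, depth)  # 119 2
```

A position where 3′ ends accumulate, as in the dCas9-induced pause described in the Results, shows up as a sharp peak in such a per-strand count.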

Plasmid Design and Construction for CRISPRi in Human Cells

The sequence encoding mammalian codon-optimized Streptococcus pyogenes Cas9 (DNA 2.0) was fused either to three C-terminal SV40 nuclear localization sequences (NLS) or to tagBFP flanked by two NLS. These two fusion proteins were cloned into MSCV-Puro (Clontech) by standard ligation-independent cloning. sgRNAs were expressed using a lentiviral U6-based expression vector derived from pSico, which co-expresses mCherry from a CMV promoter. The sgRNA expression plasmids were cloned by inserting annealed primers into the lentiviral U6-based expression vector digested with BstXI and XhoI.

Cell Culture, DNA Transfections and Fluorescence Measurements for CRISPRi in Human Cells

HEK293 cells were maintained in Dulbecco's modified Eagle medium (DMEM) supplemented with 10% FBS, 2 mM glutamine, 100 units/mL streptomycin and 100 μg/mL penicillin. HEK293 cells were infected with a GFP-expressing MSCV retrovirus using standard protocols and sorted by flow cytometry on a BD FACS Aria2 for stable GFP expression. GFP-expressing HEK293 cells were transiently transfected in 24-well plates using TransIT-LT1 transfection reagent (Mirus) according to the manufacturer's recommended protocol, with 0.5 μg of the dCas9 expression plasmid and 0.5 μg of the RNA expression plasmid (plus 0.25 μg of GFP reporter plasmid for FIG. 45B). Seventy-two hours after transfection, cells were trypsinized to a single-cell suspension. The U6 vector contains a constitutive CMV promoter driving an mCherry gene. GFP expression was analyzed on a BD LSRII FACS machine by gating on the mCherry-positive population (>10-fold brighter mCherry than negative control cells).
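The mCherry gate restricts the GFP readout to cells that actually received the sgRNA vector. A minimal sketch of that gate follows, assuming events are (mCherry, GFP) intensity pairs and that "10-fold brighter" is measured against the negative control's median mCherry intensity; both the statistic and the example values are assumptions made for illustration.

```python
import statistics


def mcherry_positive_gfp(cells: list[tuple[float, float]],
                         negative_mcherry: list[float],
                         fold: float = 10.0) -> tuple[float, list[float]]:
    """Return the mCherry threshold and the GFP values of mCherry-positive cells.

    cells: (mCherry, GFP) intensity pairs for one transfected sample.
    negative_mcherry: mCherry intensities of untransfected control cells.
    Assumption: the control is summarized by its median mCherry intensity.
    """
    threshold = fold * statistics.median(negative_mcherry)
    gfp = [g for m, g in cells if m > threshold]
    return threshold, gfp


# Hypothetical events: the middle cell is below 10x the control median.
threshold, gfp = mcherry_positive_gfp(
    cells=[(25.0, 100.0), (15.0, 500.0), (40.0, 80.0)],
    negative_mcherry=[1.0, 2.0, 3.0],
)
print(threshold, statistics.median(gfp))  # 20.0 90.0
```

Gating this way keeps untransfected cells, which would still be brightly GFP-positive, from diluting the measured knockdown.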

Designed RNAs

sgRNA designs used in the Figures are listed below; unless otherwise noted, only the 20-nucleotide matching region (DNA-targeting segment) is given:

The mRFP-targeting sgRNAs used in FIG. 40C (SEQ ID NOs:741-746);

The promoter-targeting sgRNAs used in FIG. 40D (SEQ ID NOs:747-751);

Target promoter sequence in FIG. 40D (SEQ ID NO:752);

The mRFP-targeting sgRNAs used in FIG. 43B (SEQ ID NOs:753-760);

The sfGFP-targeting sgRNA (gfp) used in FIG. 42B (SEQ ID NO:761);

The sfGFP-targeting sgRNAs used in FIG. 43B (SEQ ID NOs:762-769);

The double-sgRNA targeting experiments in FIG. 43F and FIGS. 51A-51C (SEQ ID NOs:770-778);

The lac operon-targeting sgRNAs used in FIG. 44B (SEQ ID NOs:779-787); and

The EGFP-targeting sgRNAs used in FIG. 45A-45B (SEQ ID NOs:788-794).


TABLE 3
Sequences used in the FIGURES of Example 4 (listed above)
Sequence    SEQ ID NO:
T1          741
T2          742
T3          743
NT1         744
NT2         745
NT3         746
P1          747
P2          748
P3          749
P4          750
P5          751
R1          770
R2          771
R3          772
R4          773
R5          774
R6          775
R7          776
R8          777
R9          778
lacZ        779
lacI        780
lacY        781
lacA        782
crp         783
cya         784
A site      785
O site      786
P site      787
eT1         788
eT2         789
eNT1        790
eNT2        791
eNT3        792
eNT4        793
eNT5        794

Results

The CRISPR (clustered regularly interspaced short palindromic repeats) system provides a new potential platform for targeted gene regulation. About 40% of bacteria and 90% of archaea possess CRISPR/CRISPR-associated (Cas) systems that confer resistance to foreign DNA elements. CRISPR systems use small base-pairing RNAs to target and cleave foreign DNA elements in a sequence-specific manner. There are diverse CRISPR systems in different organisms, and one of the simplest is the type II CRISPR system from Streptococcus pyogenes: only a single gene encoding the Cas9 protein and two RNAs, a mature CRISPR RNA (crRNA) and a partially complementary trans-acting RNA (tracrRNA), are necessary and sufficient for RNA-guided silencing of foreign DNAs (FIG. 46). Maturation of crRNA requires tracrRNA and RNase III. However, this requirement can be bypassed by using an engineered small guide RNA (sgRNA) containing a designed hairpin that mimics the tracrRNA-crRNA complex. Base pairing between the sgRNA and target DNA causes double-strand breaks (DSBs) through the endonuclease activity of Cas9. Binding specificity is determined both by sgRNA-DNA base pairing and by a short DNA motif (protospacer adjacent motif or PAM, sequence: NGG) juxtaposed to the DNA complementary region. Thus, the CRISPR system requires only a minimal set of two molecules, the Cas9 protein and the sgRNA, and therefore has the potential to serve as a host-independent gene-targeting platform. It has been demonstrated that Cas9/CRISPR can be harnessed for site-selective RNA-guided genome editing (FIG. 39A).
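Because targeting requires both a 20-nt complementary region and an adjacent NGG PAM, candidate target sites in a DNA sequence can be enumerated by scanning both strands for NGG and taking the 20 nt immediately 5′ of each PAM. The sketch below is a minimal illustration of that rule only. It ignores off-target scoring and other design criteria, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_protospacers(dna: str, spacer_len: int = 20) -> list[tuple[str, int, str, str]]:
    """List (strand, start, protospacer, PAM) for every NGG-adjacent site.

    start is the 0-based position of the protospacer's leftmost base on the
    input (forward) strand, so hits on both strands share one coordinate system.
    """
    dna = dna.upper()
    n = len(dna)
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        # i indexes the N of NGG; the protospacer is seq[i - spacer_len:i].
        for i in range(spacer_len, n - 2):
            if seq[i + 1:i + 3] == "GG":
                start = i - spacer_len if strand == "+" else n - i
                hits.append((strand, start, seq[i - spacer_len:i], seq[i:i + 3]))
    return hits


# One forward-strand site: 20 A's followed by the PAM TGG.
print(find_protospacers("A" * 20 + "TGG"))
```

Reporting minus-strand hits in forward-strand coordinates makes it easy to check whether a site lies on the template or non-template strand of a gene, which the Results below show matters for elongation blocking.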

FIG. 46 depicts the mechanism of the type II CRISPR system from S. pyogenes. The system consists of a set of CRISPR-associated (Cas) proteins and a CRISPR locus that contains an array of repeat-spacer sequences. All repeats are identical, whereas the spacers differ from one another and are complementary to target DNA sequences. When the cell is infected by foreign DNA elements, the CRISPR locus is transcribed into a long precursor transcript, which is cleaved into smaller fragments. The cleavage is mediated by a trans-acting antisense RNA (tracrRNA) and the host RNase III. After cleavage, a single protein, Cas9, recognizes and binds to the cleaved form of the crRNA. Cas9 guides the crRNA to DNA and scans the DNA molecule. The complex is stabilized by base pairing between the crRNA and the DNA target, and Cas9 then causes double-stranded DNA breaks through its nuclease activity. This usually removes the cognate DNA molecules, conferring on cells immunity to certain DNA populations.

FIG. 39A-39B depicts the design of the CRISPR interference (CRISPRi) system. (FIG. 39A) The minimal interference system consists of a single protein and a designed sgRNA chimera. The sgRNA chimera consists of three domains (boxed region): a 20-nucleotide (nt) complementary region for specific DNA binding, a 42-nt hairpin for Cas9 binding (Cas9 handle), and a 40-nt transcription terminator derived from S. pyogenes. The wild-type Cas9 protein has nuclease activity; the dCas9 protein is defective in nuclease activity. (FIG. 39B) The wild-type Cas9 protein binds to the sgRNA and forms a protein-RNA complex. The complex binds to specific DNA targets by Watson-Crick base pairing between the sgRNA and the DNA target. In the case of wild-type Cas9, the DNA is cleaved by the nuclease activity of the Cas9 protein. In the case of nuclease-defective Cas9, the complex disrupts transcription. A minimal CRISPRi system consists of a single protein and an RNA and can effectively silence transcription initiation and elongation.

To implement such a CRISPRi platform in E. coli, the wild-type S. pyogenes Cas9 gene and an sgRNA were expressed from bacterial vectors to determine whether the system could perturb gene expression at a targeted locus (FIG. 40A). The S. pyogenes CRISPR system is orthogonal to the native E. coli system. The Cas9 protein is expressed from an anhydrotetracycline (aTc)-inducible promoter on a plasmid containing a p15A replication origin, and the sgRNA is expressed from a minimal constitutive promoter on a plasmid containing a ColE1 replication origin. As an alternative strategy, a catalytically dead Cas9 mutant (dCas9), which is defective in DNA cleavage, was used; this form of Cas9 was shown to still act as a simple RNA-guided DNA-binding complex.

FIG. 40A-40E demonstrates that CRISPRi effectively silences transcription elongation and initiation. (FIG. 40A) The CRISPRi system consists of an inducible Cas9 protein and a designed sgRNA chimera. The dCas9 contains mutations in the RuvC1 and HNH nuclease domains. The sgRNA chimera contains three functional domains as described in FIG. 39A-39B. (FIG. 40B) Sequence of the designed sgRNA (NT1) and the DNA target. NT1 targets the non-template DNA strand of the mRFP coding region. Only the region surrounding the 20-nt base-pairing motif is shown. Base-pairing nucleotides are numbered, and the dCas9-binding hairpin is overlined. The PAM sequence is underlined. (FIG. 40C) CRISPRi blocked transcription elongation in a strand-specific manner. A synthetic fluorescence-based reporter system containing an mRFP-coding gene was inserted into the E. coli MG1655 genome (the nsfA locus). Six sgRNAs that bind to either the template or the non-template DNA strand were co-expressed with the dCas9 protein, and their effects on the target mRFP were measured by in vivo fluorescence assay. Only sgRNAs that bind to the non-template DNA strand showed silencing (10- to 300-fold). The control shows fluorescence of cells with the dCas9 protein but without sgRNA. (FIG. 40D) CRISPRi blocked transcription initiation. Five sgRNAs were designed to bind to different regions around an E. coli promoter (J23119). The transcription start site is labeled +1. The dotted oval shows the initial RNAP complex, which covers a 75-bp region from −55 to +20. Only sgRNAs targeting regions inside the initial RNAP complex showed repression (P1-P4). Unlike the transcription elongation block, silencing was independent of the targeted DNA strand. (FIG. 40E) CRISPRi regulation was reversible. Both dCas9 and sgRNA (NT1) were under the control of an aTc-inducible promoter. The cell culture was maintained in exponential phase. At time T=0, 1 μM aTc was added to cells at OD=0.001.
Repression of target mRFP started within 10 min. The fluorescence signal decayed in a way consistent with cell growth, suggesting that the decay was due to cell division. By 240 min, fluorescence reached the fully repressed level. At T=370 min, aTc was washed away from the growth medium, and cells were diluted back to OD=0.001. Fluorescence started to increase after 50 min and took about 300 min to rise to the level of the positive control. Positive control: always without the inducer; negative control: always with 1 μM aTc inducer. Fluorescence results in FIGS. 40C, 40D and 40E represent the average and SEM of at least three biological replicates. See also FIGS. 47A-47B and FIG. 48.

The sgRNA molecules co-expressed with Cas9 each consist of three segments: a 20-nucleotide (nt) target-specific complementary region, a 42-nt Cas9-binding hairpin (Cas9 handle) and a 40-nt transcription terminator derived from S. pyogenes (FIG. 40B). A red fluorescent protein (mRFP)-based reporter system was inserted into the E. coli MG1655 genome.

Co-expression of the wild-type Cas9 protein and an sgRNA (NT1) targeted to the mRFP coding sequence dramatically decreased transformation efficiency, likely due to Cas9-induced double-stranded breaks in the genome (FIG. 47A). Sequencing of a few surviving colonies showed that they all had sequence rearrangements around the target mRFP site in the genome, suggesting strong selection against co-expression of wild-type Cas9 and an sgRNA targeted to a host sequence. The non-cleaving dCas9 mutant gene, which contains two inactivating point mutations in the RuvC1 and HNH nuclease domains (D10A and H841A), alleviated this lethality, as verified by transformation efficiency and E. coli growth rates (FIGS. 47A and 47B).

FIG. 47A-47B is related to FIGS. 40A-40E and shows transformation efficiency and growth curves of E. coli cell cultures co-transformed with dCas9 and sgRNA. (FIG. 47A) Transformation efficiency for transforming E. coli cells with two plasmids. One plasmid contains an sgRNA that targets a genomic copy of mRFP, and the other contains wild-type Cas9 or dCas9. Co-transformation of wild-type Cas9 and sgRNA is highly toxic, which can be alleviated by using dCas9. (FIG. 47B) The sgRNA (NT1) is designed to target the coding sequence of mRFP. Co-expression of dCas9 and sgRNA has almost no effect on cellular growth rates, suggesting that the dCas9-sgRNA interaction with DNA is strong enough to block RNA polymerase but not DNA polymerase or cell replication. The results represent the average and SEM of at least three independent experiments.

To test whether the dCas9:sgRNA complex could yield highly efficient repression of gene expression, sgRNAs complementary to different regions of the mRFP coding sequence were designed, binding either the template or the non-template DNA strand. sgRNAs targeting the non-template DNA strand produced effective gene silencing (10- to 300-fold repression), whereas those targeting the template strand showed little effect (FIG. 40C). The system exhibited similar repression for genes within the E. coli genome and on a multiple-copy plasmid (FIG. 48). Furthermore, targeting the promoter region also led to effective gene silencing (FIG. 40D). Targeting the sgRNA to the −35 box significantly knocked down gene expression (P1, ~100-fold repression), whereas targeting other adjacent regions showed a dampened effect (P2-P4). Targeting sequences about 100 bp upstream of the promoter had no effect (P5). Unlike targeting of the coding sequence, silencing at the promoter is independent of the DNA strand: targeting the template or the non-template strand is equally effective (P2 and P3).

FIG. 48 is related to FIG. 40C and shows that CRISPRi could silence expression of a reporter gene on a multiple-copy plasmid. The mRFP gene was cloned onto a p15A plasmid. The presence of dCas9 and an mRFP-specific sgRNA (NT1) strongly represses mRFP (~300-fold). The repression effect is similar to that observed for the genomic mRFP (FIG. 40C). Silencing is effective only when the sgRNA acts on the non-template DNA strand, not the template DNA strand (T1). Silencing is also highly specific, as a GFP-specific sgRNA (gfp) has no effect on mRFP expression. Fluorescence results represent the average and SEM of at least three biological replicates.

CRISPRi Gene Knockdown is Inducible and Reversible

Unlike gene knockout methods, CRISPRi-based knockdown of gene expression has the advantage that the perturbation should be reversible. To test whether CRISPRi regulation could be induced and subsequently reversed, both dCas9 and the mRFP-specific sgRNA (NT1) were placed under the control of the aTc-inducible promoter, and time-course measurements of CRISPRi-mediated regulation of mRFP in response to inducers were performed (FIG. 40E). At time zero, a cell culture grown to early exponential phase without inducers was supplemented with 1 μM aTc. The system responded quickly to the inducer: the fluorescent reporter signal started to decrease within 10 min of adding the inducer molecule. Because the mRFP protein is stable, the rate of fluorescence decrease is limited by protein dilution due to cell growth, as shown by the similar cell doubling time and fluorescence-loss half-time (both ~36 min). At 240 min, all cells were uniformly repressed to the same level as the negative control. At 420 min, the inducer was washed away from the growth medium, and cells were diluted back to a lower OD. After a delay of 50 min, mRFP fluorescence started to increase. It took a total of 300 min for single-cell fluorescence to reach the level of the positive control. The 50-min delay is most likely determined by the dCas9/sgRNA turnover rate offset by dilution through cell growth and division. In summary, these results demonstrate that the silencing effects of dCas9-sgRNA can be induced and reversed.
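The dilution argument can be checked with a simple model. If new mRFP synthesis stops at induction and the existing protein is perfectly stable, the excess fluorescence halves once per cell doubling, so its half-time equals the doubling time. The sketch below assumes exactly that (instant shutoff, no degradation) and uses the ~36-min doubling time from the text; the function names and the 1000-unit starting signal are illustrative.

```python
def diluted_fluorescence(t_min: float, f_initial: float, f_floor: float = 0.0,
                         doubling_time_min: float = 36.0) -> float:
    """Fluorescence after synthesis stops, if loss is due only to growth dilution.

    The excess over the repressed floor halves once per cell doubling, so the
    signal's half-time equals the doubling time.
    """
    return f_floor + (f_initial - f_floor) * 2.0 ** (-t_min / doubling_time_min)


def fraction_remaining(t_min: float, doubling_time_min: float = 36.0) -> float:
    """Fraction of the initial excess fluorescence left after t_min minutes."""
    return 2.0 ** (-t_min / doubling_time_min)


print(diluted_fluorescence(36, 1000.0))   # one doubling: 500.0
print(round(fraction_remaining(240), 4))  # after 240 min
```

Under these assumptions, roughly 1% of the initial excess signal remains after 240 min (about 6.7 doublings), consistent with full repression being reached by that time point.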

Native Elongating Transcript Sequencing (NET-Seq) Confirms that CRISPRi Functions by Blocking Transcription

dCas9 appeared to be functioning as an RNA-guided DNA-binding complex that could block RNA polymerase (RNAP) binding during transcription elongation. Because the non-template DNA strand has the same sequence as the transcribed mRNA, and only sgRNAs that bind to the non-template strand exhibited silencing, it remained possible that the dCas9:sgRNA complex instead interacts with the mRNA and alters its translation or stability. To distinguish these possibilities, a recently described native elongating transcript sequencing (NET-seq) approach, which globally profiles the positions of elongating RNA polymerases, was applied to E. coli to monitor the effect of the dCas9:sgRNA complex on transcription. For NET-seq, the CRISPRi system was transformed into an E. coli MG1655-derived strain that contained a FLAG-tagged RNAP. The CRISPRi system contained an sgRNA (NT1) that binds to the mRFP coding region. In vitro immunopurification of the tagged RNAP, followed by sequencing of the nascent transcripts associated with elongating RNAPs, allowed RNAP pause sites to be identified.

These experiments demonstrated that the sgRNA induced strong transcriptional pausing upstream of the sgRNA target locus (FIG. 41A). The distance between the pause site and the target site is 19 bp, in close agreement with the previously reported ~18-bp distance between the nucleotide incorporation site of RNAP and its front edge. This finding is consistent with a mechanism of CRISPRi in which the transcription block is due to physical collision between the elongating RNAP and the dCas9:sgRNA complex (FIG. 41B). Binding of the dCas9:sgRNA complex to the template strand had little repressive effect, suggesting that RNAP was able to read through the complex in this orientation. In this case, the sgRNA faces the RNAP and might be unzipped by the helicase activity of RNAP. These experiments demonstrate that CRISPRi uses RNAs to block transcription directly. This mechanism is distinct from that of RNAi, in which knockdown of gene expression requires destruction of already transcribed messenger RNAs before their translation.

FIG. 41A-41B demonstrates that CRISPRi functions by blocking transcription elongation. (FIG. 41A) FLAG-tagged RNAP molecules were immunoprecipitated, and the associated nascent mRNA transcripts were sequenced. The top panel shows sequencing results for the nascent mRFP transcript in cells without sgRNA, and the bottom panel shows results in cells with sgRNA. In the presence of sgRNA, a strong transcriptional pause was observed 19 bp upstream of the target site, after which the number of sequencing reads drops precipitously. (FIG. 41B) A proposed CRISPRi mechanism based on physical collision between RNAP and dCas9-sgRNA. The distance from the center of RNAP to its front edge is ~19 bp, which matches well with our measured distance between the transcription pause site and the 3′ end of the sgRNA base-pairing region. The paused RNAP aborts transcription elongation upon encountering the dCas9-sgRNA roadblock.

CRISPRi sgRNA-Guided Gene Silencing is Highly Specific

To evaluate the specificity of CRISPRi on a genome-wide scale, whole-transcriptome shotgun sequencing (RNA-seq) of dCas9-transformed cells with and without sgRNA co-expression was performed (FIG. 42A). In the presence of the sgRNA targeted to mRFP (NT1), the mRFP transcript was the only gene showing a decrease in abundance. No other gene showed a significant change in expression upon addition of the sgRNA, within sequencing error. We also performed RNA-seq on cells expressing different sgRNAs that target different genes. None of these experiments showed significant changes in genes other than the targeted gene (FIG. 49A-49C). Thus, sgRNA-guided gene targeting and regulation is highly specific, with no significant off-target effects detected.
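The comparison above amounts to asking which genes change in abundance between the ± sgRNA libraries. The sketch below shows only the simplest form of that comparison: counts-per-million normalization, a log2 fold change with a pseudocount, and a fold-change cutoff. The cutoff, the pseudocount and the counts are hypothetical, and a real analysis would use replicate-aware statistics (for example, DESeq2 or edgeR) rather than a bare threshold.

```python
import math


def cpm(counts: dict[str, int]) -> dict[str, float]:
    """Counts per million: scale each gene's read count by library size."""
    total = sum(counts.values())
    return {gene: 1e6 * c / total for gene, c in counts.items()}


def log2_fold_changes(control: dict[str, int], treated: dict[str, int],
                      pseudocount: float = 1.0) -> dict[str, float]:
    """log2(treated / control) on CPM values, with a pseudocount for zeros."""
    a, b = cpm(control), cpm(treated)
    genes = set(a) | set(b)
    return {g: math.log2((b.get(g, 0.0) + pseudocount) / (a.get(g, 0.0) + pseudocount))
            for g in genes}


def changed_genes(control: dict[str, int], treated: dict[str, int],
                  min_abs_log2fc: float = 1.0) -> list[str]:
    """Genes whose absolute log2 fold change meets a (hypothetical) cutoff."""
    lfc = log2_fold_changes(control, treated)
    return sorted(g for g, v in lfc.items() if abs(v) >= min_abs_log2fc)


# Hypothetical counts: only the targeted reporter drops after adding the sgRNA.
without_sgrna = {"mRFP": 5000, "sfGFP": 5000, "geneX": 990000}
with_sgrna = {"mRFP": 50, "sfGFP": 5000, "geneX": 994950}
print(changed_genes(without_sgrna, with_sgrna))  # ['mRFP']
```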

FIG. 42A-42C demonstrates the targeting specificity of the CRISPRi system. (FIG. 42A) Genome-scale mRNA sequencing (RNA-seq) confirmed that CRISPRi targeting has no detectable off-target effects. The sgRNA NT1, which binds to the mRFP coding region, was used. The dCas9, mRFP and sfGFP genes are highlighted. (FIG. 42B) Multiple sgRNAs can independently silence two fluorescent protein reporters in the same cell. Each sgRNA specifically repressed its cognate gene but not the other gene; when both sgRNAs were present, both genes were silenced. Error bars represent SEM from at least three biological replicates. (FIG. 42C) Microscopic images for using two sgRNAs to control two fluorescent proteins. The top panel shows bright-field images of the E. coli cells, the middle panel shows the RFP channel, and the bottom panel shows the GFP channel. Co-expression of one sgRNA and dCas9 silences only the cognate fluorescent protein, not the other. The knockdown effect was strong, as almost no fluorescence was observed from cells in which a given fluorescent protein was silenced. Scale bar, 10 μm. Control shows cells without any fluorescent protein reporters. Fluorescence results represent the average and SEM of at least three biological replicates. See also FIG. 49A-49C.

FIG. 49A-49C is related to FIG. 42A and depicts RNA-seq data from cells with sgRNAs that target different genes. (FIG. 49A) (+/−) sgRNA that targets the promoter of the endogenous lacI gene in E. coli. The same lacI-targeting sgRNA was used as in FIG. 44A. (FIG. 49B) (+/−) 1 mM IPTG for cells without auto-inhibited sgRNA (sgRNA repressed its own promoter). (FIG. 49C) (+/−) sgRNA that targets the endogenous lacZ gene in E. coli. The same lacZ-targeting sgRNA was used as in FIG. 44A. 1 mM IPTG was also added to cells with the lacZ-targeting sgRNA.

CRISPRi can be Used to Simultaneously Regulate Multiple Genes

The CRISPRi system allows independent control of multiple genes without crosstalk. A dual-color fluorescence reporter system based on mRFP and sfGFP was devised, and two sgRNAs with distinct complementary regions, one for each gene, were designed. Expression of each sgRNA silenced only its cognate gene and had no effect on the other. Co-expression of the two sgRNAs knocked down both genes (FIGS. 42B and 42C). These results suggest that sgRNA-guided targeting is specific, with specificity dictated by sequence identity and unaffected by the presence of other sgRNAs. This behavior should enable simultaneous, multiplexed control of several genes by CRISPRi.

Factors that Determine CRISPRi Silencing Efficiency

To find determinants of CRISPRi silencing efficiency, the effects of basepairing length, sequence complementarity, and target position were investigated (FIG. 43A). As suggested in FIG. 40C, the location of the sgRNA target sequence along the gene was important for efficiency. Additional sgRNAs were designed to cover the full length of the coding regions of both mRFP and sfGFP (see Supplemental Data for sgRNA sequences). In all cases, repression was inversely correlated with the distance of the target from the transcription start site (FIG. 43B). A strong linear correlation was observed for mRFP. A similar but slightly weaker correlation was observed when sfGFP was the target, perhaps reflecting varying RNA polymerase kinetics at different points during elongation of this gene.
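The normalization used for FIG. 43B (each sgRNA's fold repression divided by that of the strongest sgRNA) and a check for an inverse distance trend can be sketched as follows. This is a minimal illustration, not the analysis used for the figure; the function names are hypothetical, and the example values are invented placeholders rather than measured data.

```python
import math


def relative_repression(fold_changes):
    """Normalize each sgRNA's fold repression to the strongest sgRNA (strongest -> 1.0)."""
    top = max(fold_changes)
    return [f / top for f in fold_changes]


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient; r < 0 indicates an inverse relationship."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical values for illustration only (NOT data from FIG. 43B).
distances_bp = [50, 200, 400, 600]           # target distance from the start site
fold_repression = [300.0, 150.0, 60.0, 10.0]

rel = relative_repression(fold_repression)
print(rel)                                    # strongest sgRNA normalizes to 1.0
print(pearson_r(distances_bp, rel))           # negative: repression falls with distance
```

A linear correlation is used here only because the text describes a linear trend for mRFP; other fits could be substituted.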

The sgRNA contains a 20-bp region complementary to the target. To determine the importance of this basepairing region, its length in sgRNA NT1 was varied (FIG. 43C). While extending the region at the 5′ end did not affect silencing, truncating it severely decreased repression. The minimal length of the basepairing region needed for gene silencing was 12 bp; further truncation led to complete loss of function. Single mutations were then introduced into the basepairing region of sgRNA NT1, and their effects on silencing were tested. Three sub-regions could be discerned from the results, each with a distinct contribution to overall binding and silencing (FIG. 43D). Any single mutation in the first 7 nucleotides dramatically decreased repression, suggesting that this sequence constitutes a “seed region” for binding, as noted previously for both the type I and type II CRISPR systems. Adjacent nucleotides were also mutated in pairs (FIG. 43E and FIGS. 50A-50E). In most cases, the relative repression activity of a double mutant was the product of the activities of the corresponding single mutants, suggesting that the mismatches act independently. Furthermore, in agreement with previous results on the importance of the PAM sequence, an incorrect PAM completely abolished silencing even with a perfectly matched 20-bp basepairing region (FIG. 43E). Thus, the specificity of the CRISPRi system is determined jointly by the PAM (2 bp) and an sgRNA-DNA basepairing stretch of at least 12 bp, which together specify a sequence long enough to provide unique target sites in most bacterial genomes.
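The targeting rule above (a 20-nt basepairing region immediately followed by an NGG PAM) and the size of the 14-bp sequence space (12 bp of basepairing plus the 2 fixed PAM bases) can be sketched as follows. The helper names are hypothetical; minus-strand hits are reported in reverse-complement coordinates, the 5-Mb genome size is an assumed round number, and the chance-match estimate assumes random sequence.

```python
import re

# Lookahead so that overlapping candidate sites are all reported.
SITE = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq):
    """(start, protospacer, PAM) for every 20-nt stretch followed by NGG on this strand."""
    return [(m.start(), m.group(1), m.group(2)) for m in SITE.finditer(seq.upper())]


def scan_both_strands(seq):
    """Scan both strands; minus-strand positions are in reverse-complement coordinates."""
    plus = [("+",) + hit for hit in find_sites(seq)]
    minus = [("-",) + hit for hit in find_sites(revcomp(seq.upper()))]
    return plus + minus


def expected_random_hits(genome_bp, specified_bp=12 + 2):
    """Expected chance matches of one specified sequence over both strands of a random genome."""
    return 2 * genome_bp / 4 ** specified_bp


print(scan_both_strands("ACGTACGTACGTACGTACGTAGGTTT"))
print(4 ** 14)                           # 268435456 distinct 14-bp sequences
print(expected_random_hits(5_000_000))   # assumed 5-Mb genome: ~0.037 chance matches
```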

Two sgRNAs targeting the same gene were also tested together (FIG. 43F and FIGS. 51A-51C). Depending on the relative positions of the two sgRNAs, distinct combinatorial effects were observed. Combining two sgRNAs, each giving about 300-fold repression, increased overall silencing up to about 1,000-fold. Two weaker sgRNAs (~5-fold each) showed multiplicative effects when used together. Suppressive combinatorial effects were observed when the targets of the two sgRNAs overlapped, probably because both sgRNAs compete for binding to the same region.
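The positional rule in the paragraph above can be written as a small classifier: two target intervals either overlap (competition, suppressive) or do not (augmentative), and a naive independent-action model multiplies the two fold repressions. The function names and the 23-bp intervals (20 bp plus PAM) are hypothetical. The multiplicative estimate fits the weak (~5-fold) pair described above, but for strong pairs it is only an idealized upper bound: 300 × 300 would predict 90,000-fold, whereas two ~300-fold sgRNAs together reached about 1,000-fold.

```python
def overlaps(site_a, site_b):
    """Half-open (start, end) target intervals, in bp along the gene."""
    return site_a[0] < site_b[1] and site_b[0] < site_a[1]


def expected_interaction(site_a, site_b):
    """Qualitative expectation for two sgRNAs on one gene."""
    if overlaps(site_a, site_b):
        return "suppressive (sgRNAs compete for the same region)"
    return "augmentative"


def independent_fold(fold_a, fold_b):
    """Naive independent-action model: fold repressions multiply."""
    return fold_a * fold_b


print(expected_interaction((0, 23), (10, 33)))    # suppressive
print(expected_interaction((0, 23), (150, 173)))  # augmentative
print(independent_fold(5, 5))                     # 25
print(independent_fold(300, 300))                 # 90000 (upper bound, not observed)
```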

FIG. 43A-43F depicts the characterization of factors that affect silencing efficiency. (FIG. 43A) Silencing was measured for sgRNAs targeting different loci on the same gene (distance from the translation start codon) and for sgRNAs with different lengths of the basepairing region at the same target locus (based on NT1). (FIG. 43B) Silencing efficiency was inversely correlated with the target distance from the translation start codon (orange, mRFP; green, sfGFP). Relative repression activity was calculated by normalizing the repression of each sgRNA to that of the sgRNA with the highest fold repression. Error bars represent SEM from three biological replicates. (FIG. 43C) The length of the Watson-Crick basepairing region between the sgRNA and the target DNA affects repression efficiency. All extensions of the basepairing region exhibited strong silencing, and truncations dramatically decreased repression. The minimal length of the basepairing region for detectable repression is 12 bp. Error bars represent SEM from three biological replicates. (FIG. 43D) Single mismatches were introduced at every nucleotide of the sgRNA (NT1, FIG. 40B), and the effect of each mismatch on repression efficiency was measured. Three sub-regions with distinct importance to overall silencing can be discerned, and their effects follow a step function. The first 7-nucleotide region was critical for silencing and likely constitutes a “seed” region for sgRNA binding to the DNA target. The PAM sequence (NGG) was indispensable for silencing. Error bars represent SEM from three biological replicates. (FIG. 43E) Silencing effects of sgRNAs with adjacent double mismatches. The relative repression activity of single-mismatched sgRNAs is shown, with the mismatch position labeled at the bottom. The experimentally measured activity of double-mismatched sgRNAs is shown. The activity calculated by multiplying the effects of the two single-mismatched sgRNAs is shown in white and labeled “Com.” In most cases, the silencing activity of a double-mismatched sgRNA was simply the product of the activities of the corresponding single-mismatched sgRNAs (except FIG. 50B), suggesting that single mismatches act independently. Error bars represent SEM from three biological replicates. (FIG. 43F) Combinatorial silencing effects of using two sgRNAs to target a single mRFP gene. Using two sgRNAs that target the same gene, the overall knockdown can be improved to almost 1,000-fold. When two sgRNAs bound non-overlapping sequences of the same gene, repression was augmented; when they targeted overlapping regions, repression was suppressed. Error bars represent SEM from three biological replicates.
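The “Com.” values in FIG. 43E are the product of the two single-mismatch relative activities. A minimal sketch of that comparison follows. The activities are invented placeholders, and the two-fold tolerance used to call a pair independent is an assumption, not a criterion stated in the source.

```python
def com_table(single, observed_doubles, tol=2.0):
    """Compare observed double-mismatch activity with the multiplicative ('Com.') prediction.

    single: {mismatch position: relative activity of the single mutant}
    observed_doubles: {(pos_i, pos_j): measured relative activity of the double mutant}
    tol: hypothetical fold tolerance for calling a pair independent
    """
    rows = []
    for (i, j), observed in sorted(observed_doubles.items()):
        com = single[i] * single[j]
        if com == 0:
            independent = observed == 0
        else:
            ratio = observed / com
            independent = 1 / tol <= ratio <= tol
        rows.append(((i, j), observed, com, independent))
    return rows


# Placeholder activities, NOT values from FIG. 43E or FIGS. 50A-50E.
single = {11: 0.5, 12: 0.4, 13: 0.9}
observed = {(11, 12): 0.21, (12, 13): 0.05}
for row in com_table(single, observed):
    print(row)
```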

FIG. 50A-50E is related to FIG. 43E and depicts the silencing effects of sgRNAs with adjacent double mismatches. The relative repression activity of single-mismatched sgRNAs is shown with the mismatch position labeled on the bottom. Experimentally measured activity of double-mismatched sgRNAs is also shown. Activity calculated by multiplying the effects of two single-mismatched sgRNAs is shown in white and labeled with “Com”. Fluorescence results represent average and SEM of three biological replicates.

FIG. 51A-51C is related to FIG. 43F and depicts the combinatorial silencing effects of using two sgRNAs to regulate a single gene. In all cases, non-overlapping sgRNAs showed augmentative silencing effects, and overlapping sgRNAs showed suppressive effects. The combinatorial effect was independent of whether the sgRNA was targeting the template or non-template DNA strands. Fluorescence results represent average and SEM of three biological replicates.

Interrogating an Endogenous Regulatory Network Using CRISPRi Gene Knockdown

The CRISPRi system was next used as a gene knockdown platform to interrogate endogenous gene networks. Previous methods to interrogate microbial gene networks have mostly relied on laborious and costly genome engineering and knockout procedures. By contrast, gene knockdown with CRISPRi requires only the design and synthesis of a small sgRNA bearing a 20-bp region complementary to the desired gene. To demonstrate this, CRISPRi was used to create E. coli knockdown strains by designing sgRNAs to systematically perturb genes that are part of the well-characterized E. coli lactose regulatory pathway (FIG. 44A). β-galactosidase assays were performed to measure LacZ expression from the knockdown strains, with and without isopropyl β-D-1-thiogalactopyranoside (IPTG), a chemical that inhibits the lac repressor (LacI). In wild-type cells, addition of IPTG induced LacZ expression. The results showed that a lacZ-specific sgRNA could strongly repress LacZ expression (FIG. 44B). Conversely, an sgRNA targeting the lacI gene led to activation of LacZ expression, even in the absence of IPTG, as would be expected when silencing a direct repressor of LacZ expression.

It is known that cAMP-CRP is an essential activator of LacZ expression, acting by binding to a cis-regulatory site upstream of the promoter (A site). Consistent with this, sgRNAs targeting the crp gene or the A site in the LacZ promoter led to repression, demonstrating a means to link a regulator to its cis-regulatory sequence using CRISPRi experiments. Targeting the adenylate cyclase gene (cya), which is necessary to produce the cAMP that makes CRP more effective at the LacZ promoter, led to only partial repression. Addition of 1 mM cAMP to the growth medium complemented the effects of cya knockdown but not of crp knockdown, suggesting that cya is an indirect regulator of LacZ. Furthermore, targeting the LacI cis-regulatory site (O site) with an sgRNA led to inhibition, presumably because binding of the dCas9 complex at this site sterically blocks RNA polymerase, mimicking the behavior of the LacI transcriptional repressor. Targeting the known RNAP binding site (P site) also blocked expression. In summary, these studies demonstrate that CRISPRi-based gene knockdown provides a rapid and effective approach for interrogating the regulatory functions (activating or repressing) of genes and cis elements in a complex regulatory network (FIG. 44C).
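The qualitative logic of these knockdown experiments can be captured in a Boolean sketch. The function below is a hypothetical abstraction written only from the relationships stated above: dCas9 on the lacZ coding region or on the A, P, or O site blocks expression; cAMP requires cya unless cAMP is added; CRP activity requires cAMP; and LacI represses unless it is knocked down or inhibited by IPTG. It reduces the partial repression seen for cya knockdown to a simple on/off call and does not model lacY or lacA.

```python
TARGETS_THAT_BLOCK = {"lacZ", "A", "P", "O"}


def lacz_expressed(knockdown=(), iptg=False, camp_added=False):
    """Boolean ON/OFF call for LacZ under a set of CRISPRi targets (hypothetical model)."""
    kd = set(knockdown)
    if kd & TARGETS_THAT_BLOCK:
        return False                            # dCas9 physically blocks transcription
    camp = "cya" not in kd or camp_added        # adenylate cyclase makes cAMP
    crp_active = "crp" not in kd and camp       # cAMP-CRP activates the promoter
    repressed = "lacI" not in kd and not iptg   # LacI bound at O unless removed or inhibited
    return crp_active and not repressed


for label, kwargs in [
    ("wild type", {}),
    ("wild type + IPTG", {"iptg": True}),
    ("lacI knockdown", {"knockdown": ["lacI"]}),
    ("crp knockdown + IPTG + cAMP", {"knockdown": ["crp"], "iptg": True, "camp_added": True}),
    ("cya knockdown + IPTG + cAMP", {"knockdown": ["cya"], "iptg": True, "camp_added": True}),
]:
    print(label, lacz_expressed(**kwargs))
```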

FIG. 44A-44C demonstrates functional profiling of a complex regulatory network using CRISPRi gene knockdown. (FIG. 44A) sgRNAs were designed and used to knock down genes (cya, crp, lacI, lacZ, lacY, lacA) in the lac regulatory pathway or to block transcriptional operator sites (A/P/O). LacI represses the lacZYA operon by binding to a transcription operator site (O site). The lacZ gene encodes β-galactosidase, an enzyme that hydrolyzes lactose into glucose and galactose. A few trans-acting host genes, such as cya and crp, are involved in activation of the lacZYA operon. The cAMP-CRP complex binds to a transcription operator site (A site) and recruits RNA polymerase to the P site, which initiates transcription of lacZYA. IPTG, a chemical that inhibits LacI function, induces LacZ expression. (FIG. 44B) β-galactosidase assay of the knockdown strains without (white) and with (grey) IPTG. Control shows that wild-type cells without CRISPRi perturbation could be induced by addition of IPTG. The sgRNA that targets lacZ strongly repressed LacZ expression, even in the presence of IPTG. When lacI was targeted, LacZ expression was high, even without IPTG. Targeting the cya and crp genes decreased LacZ expression in the presence of IPTG. Addition of 1 mM cAMP rescued cya knockdown but not crp knockdown. Blocking the transcription operator sites resulted in LacZ repression, suggesting that these are important cis-acting regulatory sites for LacZ. Decreased (down arrows) and increased (up arrows) LacZ expression upon perturbation is indicated. Error bars represent SEM from three biological replicates. (FIG. 44C) The knockdown experiments allowed us to profile the roles of regulators in the lac regulatory circuit. The data are shown on a 2-D graph, with the x-axis showing LacZ activity without IPTG and the y-axis showing its activity with IPTG. The spread of each oval along each axis shows the standard deviation. The β-galactosidase assay results represent average and SEM of three biological replicates. For RNA-seq data on lacI and lacZ targeting, see also FIG. 49A-49C.

CRISPRi can Knock Down Targeted Gene Expression in Human Cells

To test the generality of the CRISPRi approach of using the dCas9-sgRNA complex to repress transcription, the system was tested in HEK293 mammalian cells. The dCas9 protein was codon-optimized, fused to three copies of a nuclear localization sequence (NLS), and expressed from a Murine Stem Cell Virus (MSCV) retroviral vector. sgRNAs of the design shown in FIG. 40B were expressed from the RNA polymerase III U6 promoter. A reporter HEK293 cell line expressing EGFP under the SV40 promoter was created by viral infection. Using an sgRNA (eNT2) that targets the non-template DNA strand of the EGFP coding region, a moderate but reproducible knockdown of gene expression was observed (46% repression, FIG. 45A). The repression depended on both the dCas9 protein and the sgRNA, implying that it was due to the dCas9-sgRNA complex and RNA-guided targeting. The same sgRNA repressed the same gene more strongly when the gene was transiently expressed from a plasmid (63% repression, FIG. 52). Consistent with the bacterial system, only sgRNAs targeted to the non-template strand exhibited repression. Factors such as the distance from the transcription start site and the local chromatin state may be critical parameters determining repression efficiency (FIG. 52). Optimization of dCas9 and sgRNA expression, stability, nuclear localization, and interaction will allow further improvement of CRISPRi efficiency in mammalian cells.
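Because the bacterial results in this section are reported as fold repression and the HEK293 results as percent repression, a small conversion helps compare them. This sketch assumes that percent repression means 1 − (signal with sgRNA / control signal); that definition is our assumption, since it is not spelled out here, and the function names are hypothetical.

```python
def fold_from_percent(percent_repression):
    """Fold repression (control / repressed) from percent repression."""
    remaining = 1.0 - percent_repression / 100.0
    if remaining <= 0:
        raise ValueError("100% repression corresponds to infinite fold repression")
    return 1.0 / remaining


def percent_from_fold(fold):
    """Percent repression from fold repression."""
    return 100.0 * (1.0 - 1.0 / fold)


print(round(fold_from_percent(46), 2))   # 1.85 (genomic EGFP reporter)
print(round(fold_from_percent(63), 2))   # 2.7  (plasmid reporter)
print(round(percent_from_fold(300), 2))  # 99.67
```

Under this definition, the 46% and 63% repression seen in HEK293 cells correspond to roughly 2- to 3-fold repression, well below the ~300-fold values reported for single sgRNAs in E. coli.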

FIG. 45A-45B demonstrates that CRISPRi can repress gene expression in human cells. (FIG. 45A) A CRISPRi system in HEK293 cells. The SV40-EGFP expression cassette was inserted into the genome via retroviral infection. The dCas9 protein was codon-optimized and fused to three copies of an NLS. The sgRNA was expressed from an RNA polymerase III U6 vector. Co-transfection of dCas9 and an sgRNA (eNT2) that targets the non-template strand of EGFP decreased fluorescence (~46%), while expression of either dCas9 or the sgRNA alone had no effect. (FIG. 45B) dCas9:sgRNA-mediated repression depended on the target locus. Seven sgRNAs were designed to target different regions of the EGFP coding sequence on the template or non-template strand. Only eNT2 and eNT5 showed moderate repression. Fluorescence results in FIGS. 45A and 45B represent average and error of two biological replicates.

FIG. 52 is related to FIGS. 45A-45B and shows that sgRNA repression depends on the target locus and its relative distance from the transcription start site. The same sgRNA was used to repress the same EGFP gene driven by different promoters. Cas9/sgRNA complexes repressed transcription from transiently transfected plasmid DNA. The level of transcriptional repression was slightly better (63%) than that observed for the genomically integrated gene, and the percentage of GFP-negative cells increased in the presence of sgRNA. The target locus lies at a different distance from the transcription start site under each promoter. While SV40-EGFP showed repression, LTR-EGFP showed none. Fluorescence results represent average and error of two biological replicates.

CRISPRi Efficiently and Selectively Represses Transcription of Target Genes

The CRISPRi system is a relatively simple platform for targeted gene regulation. CRISPRi does not rely on complex host factors; it requires only the dCas9 protein and guide RNAs, and is thus flexible and highly designable. The system can efficiently silence genes in bacteria, and the silencing is highly specific, as no off-target effects were detected. Furthermore, the efficiency of the knockdown can be tuned by changing the target locus and the degree of basepairing between the sgRNA and the target gene. This will make it possible to create allelic series of hypomorphs, a feature that is especially useful for the study of essential genes. The system functions by directly blocking transcription, in a manner that can be easily programmed by designing sgRNAs. Mechanistically, this is distinct from RNAi-based silencing, which requires the destruction of already transcribed mRNAs.

In addition, these dCas9:sgRNA complexes can also modulate transcription by targeting key cis-acting motifs within any promoter, sterically blocking the association of their cognate trans-acting transcription factors. Thus, in addition to its use as a gene knockdown tool, CRISPRi could be used for functional mapping of promoters and other genomic regulatory modules.

CRISPRi is Amenable to Genome-Scale Analysis and Regulation

The CRISPRi method is based on DNA-targeting RNAs, and only the DNA-targeting segment needs to be designed for specific gene targets. With advances in large-scale DNA oligonucleotide synthesis technology, generating large sets of oligonucleotides that contain unique 20-bp regions for genome targeting is fast and inexpensive. These oligonucleotide libraries could allow us to target large numbers of individual genes to infer gene function, or to target gene pairs to map genetic interactions. Furthermore, CRISPRi could be used to simultaneously modulate the expression of large sets of genes, as the small size of sgRNAs allows multiple elements to be concatenated into the same expression vector.
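One step in building such a library, keeping only candidate 20-bp targeting regions that occur exactly once in the genome and enumerating gene pairs for interaction mapping, can be sketched as follows. The function names are hypothetical. Counting every 20-mer on both strands is memory-hungry for a real genome, and a k-mer that is its own reverse complement is counted twice at one locus and therefore excluded (a conservative choice).

```python
from collections import Counter
from itertools import combinations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def kmer_counts(genome, k=20):
    """Count every k-mer on both strands of the genome."""
    counts = Counter()
    for strand in (genome, revcomp(genome)):
        for i in range(len(strand) - k + 1):
            counts[strand[i:i + k]] += 1
    return counts


def unique_spacers(candidates, genome, k=20):
    """Keep only candidate spacers that occur exactly once in the genome (either strand)."""
    counts = kmer_counts(genome, k)
    return {gene: [s for s in spacers if counts[s] == 1]
            for gene, spacers in candidates.items()}


def pairwise_library(genes):
    """All unordered gene pairs, e.g. for mapping genetic interactions."""
    return list(combinations(sorted(genes), 2))


# Toy example with k=4 so the behaviour is easy to follow.
print(unique_spacers({"g1": ["AAAC", "ACGT"]}, "AAAACCCC", k=4))  # {'g1': ['AAAC']}
print(pairwise_library(["geneB", "geneA", "geneC"]))
```

For n genes the pairwise library has n(n − 1)/2 members, which is why cheap, large-scale oligonucleotide synthesis matters for this use.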

CRISPRi Provides New Tools for Manipulating Microbial Genomes

Because the CRISPRi platform is compact and self-contained, it can be adapted for different organisms. CRISPRi is a powerful tool for studying non-model organisms for which genetic engineering methods are not well developed, including pathogens or industrially useful organisms. Unlike most eukaryotes, most bacteria lack the RNAi machinery. As a consequence, regulation of endogenous genes using designed synthetic RNAs is currently limited. CRISPRi could provide an RNAi-like method for gene perturbation in microbes.

CRISPRi as a Platform for Engineering Transcriptional Regulatory Networks

CRISPRi can be used as a flexible framework for engineering transcriptional regulatory networks. Because the CRISPRi platform is essentially an RNA-guided DNA-binding complex, it also provides a flexible scaffold for directing diverse regulatory machinery to specific sites in the genome. Beyond simply blocking transcription of target genes, it is possible to couple the dCas9 protein with numerous regulatory domains to modulate different biological processes and to generate different functional outcomes (e.g., transcriptional activation, chromatin modifications).

In the CRISPRi system, it is possible to link multiple sgRNAs into transcriptional circuits in which one upstream sgRNA controls the expression of a different downstream sgRNA. As RNA molecules in microorganisms tend to be short-lived, we suspect that the genetic programs regulated by sgRNAs might show rapid kinetics distinct from circuits that involve slow processes such as protein expression and degradation. In summary, the CRISPRi system is a general genetic programming platform suitable for a variety of biomedical research and clinical applications including genome-scale functional profiling, microbial metabolic engineering, and cell reprogramming.
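The suggestion that sgRNA circuits could respond quickly because RNA turns over rapidly can be illustrated with a toy kinetic model of a two-sgRNA cascade. Everything in this sketch is an assumption for illustration, not a model from the source: first-order RNA decay at rate gamma, Hill-type repression of the downstream sgRNA promoter (K = 0.1, n = 2), production rates scaled so steady states are identical for every gamma, and simple Euler integration.

```python
def simulate_cascade(gamma, t_end=40.0, dt=0.001, K=0.1, n=2):
    """Euler integration of a hypothetical two-sgRNA cascade.

    At t = 0 the upstream sgRNA (u) is switched on; with dCas9 it represses
    the promoter of the downstream sgRNA (d). gamma is the RNA decay rate.
    Production is scaled by gamma so steady states do not depend on it
    (u -> 1, unrepressed d = 1).
    """
    u, d, t = 0.0, 1.0, 0.0
    traj = []
    while t < t_end:
        du = gamma * (1.0 - u)
        dd = gamma * (1.0 / (1.0 + (u / K) ** n) - d)
        u, d, t = u + du * dt, d + dd * dt, t + dt
        traj.append((t, u, d))
    return traj


def half_response_time(gamma):
    """Time for d to fall halfway from 1.0 to its final level."""
    traj = simulate_cascade(gamma)
    target = (1.0 + traj[-1][2]) / 2.0
    return next(t for t, _, d in traj if d <= target)


fast = half_response_time(1.0)   # short-lived RNAs
slow = half_response_time(0.2)   # five-fold longer-lived RNAs
print(fast, slow, slow / fast)   # response time scales roughly as 1 / gamma
```

In this toy model the response time is set by RNA turnover, so a five-fold faster decay gives an approximately five-fold faster response; real sgRNA and dCas9 kinetics would need to be measured.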

Example 5: A Chimeric Site-Directed Polypeptide can be Used to Modulate (Activate or Repress) Transcription in Human Cells

We have demonstrated that in human cells, a fusion protein comprising a catalytically inactive Cas9 and an activator domain or a repressor domain can increase or decrease transcription from a target DNA, respectively.

FIG. 55A-55D. We fused the humanized catalytically inactive Cas9 to the transcription activator domain VP64. (FIG. 55A) To test the efficiency of gene activation with this system, we inserted a GFP reporter controlled by a GAL4 UAS inducible promoter into the genome of HEK293 cells (a human tissue culture line). (FIG. 55B) The GAL4 UAS promoter can be induced in the presence of the yeast-derived protein GAL4. The dCas9-VP64 fusion effectively activated the GAL4 UAS promoter 20-fold in the presence of a cognate guide RNA that binds to the GAL4 UAS region. (FIG. 55C) Microscopic images of dCas9-VP64 activation. (FIG. 55D) Flow cytometry data for dCas9-VP64 activation.

FIG. 56. We fused the humanized catalytically inactive Cas9 to the transcription repressor domain KRAB. (Top) We designed 10 guide RNAs that target a well-characterized promoter, the SV40 early promoter, and one guide RNA that targets the EGFP coding region. (Bottom) Using a non-chimeric dCas9, we observed 2- to 3-fold repression with gRNAs P9 and NT2. This efficiency was greatly improved with the dCas9-KRAB fusion: for example, P9 and NT2 showed 20-fold and 15-fold repression, respectively. In addition, P1-P6 significantly reduced expression when the fusion protein was used, but gave limited repression with a non-chimeric dCas9.

Example 6: Cas9 can Use Artificial Guide RNAs, not Existing in Nature, to Perform Target DNA Cleavage

An artificial crRNA and an artificial tracrRNA were designed based on the protein-binding segment of naturally occurring S. pyogenes crRNA and tracrRNA transcripts, modified to mimic the asymmetric bulge within the natural S. pyogenes crRNA:tracrRNA duplex (see the bulge in the protein-binding domain of both the artificial (top) and natural (bottom) RNA molecules depicted in FIG. 57A). The artificial tracrRNA sequence shares less than 50% identity with the natural tracrRNA. The predicted secondary structure of the crRNA:tracrRNA protein-binding duplex is the same for both RNA pairs, but the predicted structures of the rest of the RNAs differ substantially.

FIG. 57A-57B demonstrates that artificial sequences sharing little sequence identity (roughly 50%) with naturally occurring tracrRNAs and crRNAs can function with Cas9 to cleave target DNA, as long as the structure of the protein-binding domain of the DNA-targeting RNA is conserved. (FIG. 57A) Co-folding of S. pyogenes tracrRNA and crRNA, and of the artificial tracrRNA and crRNA. (FIG. 57B) Combinations of S. pyogenes Cas9 and tracrRNA:crRNA orthologs were used to perform plasmid DNA cleavage assays. Spy, S. pyogenes; Lin, L. innocua; Nme, N. meningitidis; Pmu, P. multocida. S. pyogenes Cas9 can be guided by some, but not all, tracrRNA:crRNA orthologs naturally occurring in selected bacterial species. Notably, S. pyogenes Cas9 can be guided by the artificial tracrRNA:crRNA pair, which was designed based on the structure of the protein-binding segment of the naturally occurring DNA-targeting RNA using sequence completely unrelated to the CRISPR system.

The artificial “tracrRNA” (activator RNA) used was 5′-GUUUUCCCUUUUCAAAGAAAUCUCCUGGGCACCUAUCUUCUUAGGUGCCCUCCCUUGUUUAAACCUGACCAGUUAACCGGCUGGUUAGGUUUUU-3′ (SEQ ID NO:1347). The artificial “crRNA” (targeter RNA) used was: 5′-GAGAUUUAUGAAAAGGGAAAAC-3′ (SEQ ID NO:1348).
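As a rough check on the duplex described above, the artificial crRNA can be compared with the 5′ end of the artificial tracrRNA by simple string alignment. This is only a sketch: it looks for Watson-Crick complementarity between the reverse complement of the crRNA and the tracrRNA 5′ end, ignores G·U wobble pairs and folding energetics, and is not the co-folding prediction used for FIG. 57A. It assumes the tracrRNA sequence is continuous across the line break in the printed listing. With these sequences it finds 14 contiguous complementary nucleotides, then a short stretch in which the tracrRNA carries extra nucleotides, then 7 further complementary nucleotides, consistent with a bulged duplex.

```python
# Artificial tracrRNA (SEQ ID NO:1347) and crRNA (SEQ ID NO:1348), 5' -> 3'.
TRACR = ("GUUUUCCCUUUUCAAAGAAAUCUCCUGGGCACCUAUCUUCUUAGGUGCCCUCCCU"
         "UGUUUAAACCUGACCAGUUAACCGGCUGGUUAGGUUUUU")
CRRNA = "GAGAUUUAUGAAAAGGGAAAAC"

RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp_rna(seq):
    """Reverse complement with Watson-Crick pairs only (no G-U wobble)."""
    return seq.translate(RNA_COMPLEMENT)[::-1]


def matching_run(a, b):
    """Number of identical leading characters in a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


rc = revcomp_rna(CRRNA)          # crRNA in the orientation that pairs with the tracrRNA 5' end
stem = matching_run(rc, TRACR)   # run pairing the crRNA 3' end with the tracrRNA 5' end
print(stem)                      # 14
print(rc[stem:], TRACR[stem:stem + 10])
print(rc[15:] == TRACR[17:24])   # True: 7 more complementary nt after the bulged stretch
```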

Example 7: Generation of Non-Human Transgenic Organisms

A transgenic mouse expressing Cas9 (either unmodified, modified to have reduced enzymatic activity, or modified as a fusion protein for any of the purposes outlined above) is generated using a convenient method known to one of ordinary skill in the art (e.g., (i) gene knock-in at a targeted locus (e.g., ROSA 26) of a mouse embryonic stem cell (ES cell), followed by blastocyst injection and the generation of chimeric mice; (ii) injection of a randomly integrating transgene into the pronucleus of a fertilized mouse oocyte, followed by ovum implantation into a pseudopregnant female; etc.). The Cas9 protein is under the control of a promoter that is active at least in embryonic stem cells, and may additionally be under temporal or tissue-specific control (e.g., drug inducible, controlled by a Cre/Lox-based promoter system, etc.). Once a line of transgenic Cas9-expressing mice is generated, embryonic stem cells are isolated and cultured, and in some cases ES cells are frozen for future use. Because the isolated ES cells express Cas9 (in some cases under temporal control, e.g., drug inducible), new knock-out or knock-in cells (and therefore mice) are rapidly generated at any desired locus in the genome by introducing an appropriately designed DNA-targeting RNA that targets Cas9 to a particular locus of choice. Such a system, and many variations thereof, is used to generate new genetically modified organisms at any locus of choice. When modified Cas9 is used to modulate transcription and/or modify DNA and/or modify polypeptides associated with DNA, the ES cells themselves (or any differentiated cells derived from them, e.g., an entire mouse, a differentiated cell line, etc.) are used to study the properties of any gene of choice (or any expression product of choice, or any genomic locus of choice) simply by introducing an appropriate DNA-targeting RNA into a desired Cas9-expressing cell.

While the present invention has been described with reference to the specific embodiments thereof, it should be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the true spirit and scope of the invention. In addition, many modifications may be made to adapt a particular situation, material, composition of matter, process, process step or steps, to the objective, spirit and scope of the present invention. All such modifications are intended to be within the scope of the claims appended hereto.


SEQUENCE LISTING
The patent contains a lengthy “Sequence Listing” section. A copy of the “Sequence Listing” is available in electronic form from the USPTO web site (). An electronic copy of the “Sequence Listing” will also be available from the USPTO upon request and payment of the fee set forth in 37 CFR 1.19(b)(3).

<160> NUMBER OF SEQ ID NOS: 1424

<140> CURRENT APPLICATION NUMBER: US/14/685,502D

<210> SEQ ID NO 1

<211> LENGTH: 1056

<212> TYPE: PRT

<213> ORGANISM: Pasteurella multocida

<400> SEQUENCE: 1

Met Gln Thr Thr Asn Leu Ser Tyr Ile Leu Gly Leu Asp Leu Gly Ile

1 5 10 15

Ala Ser Val Gly Trp Ala Val Val Glu Ile Asn Glu Asn Glu Asp Pro

20 25 30

Ile Gly Leu Ile Asp Val Gly Val Arg Ile Phe Glu Arg Ala Glu Val

35 40 45

Pro Lys Thr Gly Glu Ser Leu Ala Leu Ser Arg Arg Leu Ala Arg Ser

50 55 60

Thr Arg Arg Leu Ile Arg Arg Arg Ala His Arg Leu Leu Leu Ala Lys

65 70 75 80

Arg Phe Leu Lys Arg Glu Gly Ile Leu Ser Thr Ile Asp Leu Glu Lys

85 90 95

Gly Leu Pro Asn Gln Ala Trp Glu Leu Arg Val Ala Gly Leu Glu Arg

100 105 110

Arg Leu Ser Ala Ile Glu Trp Gly Ala Val Leu Leu His Leu Ile Lys

115 120 125

His Arg Gly Tyr Leu Ser Lys Arg Lys Asn Glu Ser Gln Thr Asn Asn

130 135 140

Lys Glu Leu Gly Ala Leu Leu Ser Gly Val Ala Gln Asn His Gln Leu

145 150 155 160

Leu Gln Ser Asp Asp Tyr Arg Thr Pro Ala Glu Leu Ala Leu Lys Lys

165 170 175

Phe Ala Lys Glu Glu Gly His Ile Arg Asn Gln Arg Gly Ala Tyr Thr

180 185 190

His Thr Phe Asn Arg Leu Asp Leu Leu Ala Glu Leu Asn Leu Leu Phe

195 200 205

Ala Gln Gln His Gln Phe Gly Asn Pro His Cys Lys Glu His Ile Gln

210 215 220

Gln Tyr Met Thr Glu Leu Leu Met Trp Gln Lys Pro Ala Leu Ser Gly

225 230 235 240

Glu Ala Ile Leu Lys Met Leu Gly Lys Cys Thr His Glu Lys Asn Glu

245 250 255

Phe Lys Ala Ala Lys His Thr Tyr Ser Ala Glu Arg Phe Val Trp Leu

260 265 270

Thr Lys Leu Asn Asn Leu Arg Ile Leu Glu Asp Gly Ala Glu Arg Ala

275 280 285

Leu Asn Glu Glu Glu Arg Gln Leu Leu Ile Asn His Pro Tyr Glu Lys

290 295 300

Ser Lys Leu Thr Tyr Ala Gln Val Arg Lys Leu Leu Gly Leu Ser Glu

305 310 315 320

Gln Ala Ile Phe Lys His Leu Arg Tyr Ser Lys Glu Asn Ala Glu Ser

325 330 335

Ala Thr Phe Met Glu Leu Lys Ala Trp His Ala Ile Arg Lys Ala Leu

340 345 350

Glu Asn Gln Gly Leu Lys Asp Thr Trp Gln Asp Leu Ala Lys Lys Pro

355 360 365

Asp Leu Leu Asp Glu Ile Gly Thr Ala Phe Ser Leu Tyr Lys Thr Asp

370 375 380

Glu Asp Ile Gln Gln Tyr Leu Thr Asn Lys Val Pro Asn Ser Val Ile

385 390 395 400

Asn Ala Leu Leu Val Ser Leu Asn Phe Asp Lys Phe Ile Glu Leu Ser

405 410 415

Leu Lys Ser Leu Arg Lys Ile Leu Pro Leu Met Glu Gln Gly Lys Arg

420 425 430

Tyr Asp Gln Ala Cys Arg Glu Ile Tyr Gly His His Tyr Gly Glu Ala

435 440 445

Asn Gln Lys Thr Ser Gln Leu Leu Pro Ala Ile Pro Ala Gln Glu Ile

450 455 460

Arg Asn Pro Val Val Leu Arg Thr Leu Ser Gln Ala Arg Lys Val Ile

465 470 475 480

Asn Ala Ile Ile Arg Gln Tyr Gly Ser Pro Ala Arg Val His Ile Glu

485 490 495

Thr Gly Arg Glu Leu Gly Lys Ser Phe Lys Glu Arg Arg Glu Ile Gln

500 505 510

Lys Gln Gln Glu Asp Asn Arg Thr Lys Arg Glu Ser Ala Val Gln Lys

515 520 525

Phe Lys Glu Leu Phe Ser Asp Phe Ser Ser Glu Pro Lys Ser Lys Asp

530 535 540

Ile Leu Lys Phe Arg Leu Tyr Glu Gln Gln His Gly Lys Cys Leu Tyr

545 550 555 560

Ser Gly Lys Glu Ile Asn Ile His Arg Leu Asn Glu Lys Gly Tyr Val

565 570 575

Glu Ile Asp His Ala Leu Pro Phe Ser Arg Thr Trp Asp Asp Ser Phe

580 585 590

Asn Asn Lys Val Leu Val Leu Ala Ser Glu Asn Gln Asn Lys Gly Asn

595 600 605

Gln Thr Pro Tyr Glu Trp Leu Gln Gly Lys Ile Asn Ser Glu Arg Trp

610 615 620

Lys Asn Phe Val Ala Leu Val Leu Gly Ser Gln Cys Ser Ala Ala Lys

625 630 635 640

Lys Gln Arg Leu Leu Thr Gln Val Ile Asp Asp Asn Lys Phe Ile Asp

645 650 655

Arg Asn Leu Asn Asp Thr Arg Tyr Ile Ala Arg Phe Leu Ser Asn Tyr

660 665 670

Ile Gln Glu Asn Leu Leu Leu Val Gly Lys Asn Lys Lys Asn Val Phe

675 680 685

Thr Pro Asn Gly Gln Ile Thr Ala Leu Leu Arg Ser Arg Trp Gly Leu

690 695 700

Ile Lys Ala Arg Glu Asn Asn Asn Arg His His Ala Leu Asp Ala Ile

705 710 715 720

Val Val Ala Cys Ala Thr Pro Ser Met Gln Gln Lys Ile Thr Arg Phe

725 730 735

Ile Arg Phe Lys Glu Val His Pro Tyr Lys Ile Glu Asn Arg Tyr Glu

740 745 750

Met Val Asp Gln Glu Ser Gly Glu Ile Ile Ser Pro His Phe Pro Glu

755 760 765

Pro Trp Ala Tyr Phe Arg Gln Glu Val Asn Ile Arg Val Phe Asp Asn

770 775 780

His Pro Asp Thr Val Leu Lys Glu Met Leu Pro Asp Arg Pro Gln Ala

785 790 795 800

Asn His Gln Phe Val Gln Pro Leu Phe Val Ser Arg Ala Pro Thr Arg

805 810 815

Lys Met Ser Gly Gln Gly His Met Glu Thr Ile Lys Ser Ala Lys Arg

820 825 830

Leu Ala Glu Gly Ile Ser Val Leu Arg Ile Pro Leu Thr Gln Leu Lys

835 840 845

Pro Asn Leu Leu Glu Asn Met Val Asn Lys Glu Arg Glu Pro Ala Leu

850 855 860

Tyr Ala Gly Leu Lys Ala Arg Leu Ala Glu Phe Asn Gln Asp Pro Ala

865 870 875 880

Lys Ala Phe Ala Thr Pro Phe Tyr Lys Gln Gly Gly Gln Gln Val Lys

885 890 895

Ala Ile Arg Val Glu Gln Val Gln Lys Ser Gly Val Leu Val Arg Glu

900 905 910

Asn Asn Gly Val Ala Asp Asn Ala Ser Ile Val Arg Thr Asp Val Phe

915 920 925

Ile Lys Asn Asn Lys Phe Phe Leu Val Pro Ile Tyr Thr Trp Gln Val

930 935 940

Ala Lys Gly Ile Leu Pro Asn Lys Ala Ile Val Ala His Lys Asn Glu

945 950 955 960

Asp Glu Trp Glu Glu Met Asp Glu Gly Ala Lys Phe Lys Phe Ser Leu

965 970 975

Phe Pro Asn Asp Leu Val Glu Leu Lys Thr Lys Lys Glu Tyr Phe Phe

980 985 990

Gly Tyr Tyr Ile Gly Leu Asp Arg Ala Thr Gly Asn Ile Ser Leu Lys

995 1000 1005

Glu His Asp Gly Glu Ile Ser Lys Gly Lys Asp Gly Val Tyr Arg

1010 1015 1020

Val Gly Val Lys Leu Ala Leu Ser Phe Glu Lys Tyr Gln Val Asp

1025 1030 1035

Glu Leu Gly Lys Asn Arg Gln Ile Cys Arg Pro Gln Gln Arg Gln

1040 1045 1050

Pro Val Arg

1055

<210> SEQ ID NO 2

<211> LENGTH: 1368

<212> TYPE: PRT

<213> ORGANISM: Streptococcus pyogenes

<400> SEQUENCE: 2

Met Asp Lys Lys Tyr Ser Ile Gly Leu Asp Ile Gly Thr Asn Ser Val

1 5 10 15

Gly Trp Ala Val Ile Thr Asp Glu Tyr Lys Val Pro Ser Lys Lys Phe

20 25 30

Lys Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile

35 40 45

Gly Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr Arg Leu

50 55 60

Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys

65 70 75 80

Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser

85 90 95

Phe Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys

100 105 110

His Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr

115 120 125

His Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val Asp

130 135 140

Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His

145 150 155 160

Met Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro

165 170 175

Asp Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr

180 185 190

Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala

195 200 205

Lys Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn

210 215 220

Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn

225 230 235 240

Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe

245 250 255

Asp Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp

260 265 270

Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp

275 280 285

Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp

290 295 300

Ile Leu Arg Val Asn Thr Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser

305 310 315 320

Met Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys

325 330 335

Ala Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe

340 345 350

Asp Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser

355 360 365

Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp

370 375 380

Gly Thr Glu Glu Leu Leu Val Lys Leu Asn Arg Glu Asp Leu Leu Arg

385 390 395 400

Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile His Leu

405 410 415

Gly Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe

420 425 430

Leu Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile

435 440 445

Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp

450 455 460

Met Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu

465 470 475 480

Val Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr

485 490 495

Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser

500 505 510

Leu Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys

515 520 525

Tyr Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu Gln

530 535 540

Lys Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr

545 550 555 560

Val Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp

565 570 575

Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly

580 585 590

Thr Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp

595 600 605

Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr

610 615 620

Leu Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala

625 630 635 640

His Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr

645 650 655

Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp

660 665 670

Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe

675 680 685

Ala Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe

690 695 700

Lys Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu

705 710 715 720

His Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly

725 730 735

Ile Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly

740 745 750

Arg His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln

755 760 765

Thr Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile

770 775 780

Glu Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro

785 790 795 800

Val Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu

805 810 815

Gln Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg

820 825 830

Leu Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Leu Lys

835 840 845

Asp Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg

850 855 860

Gly Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys

865 870 875 880

Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys

885 890 895

Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp

900 905 910

Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr

915 920 925

Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp

930 935 940

Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys Ser

945 950 955 960

Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg

965 970 975

Glu Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val

980 985 990

Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe

995 1000 1005

Val Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala

1010 1015 1020

Lys Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe

1025 1030 1035

Tyr Ser Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala

1040 1045 1050

Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu

1055 1060 1065

Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val

1070 1075 1080

Arg Lys Val Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr

1085 1090 1095

Glu Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys

1100 1105 1110

Arg Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro

1115 1120 1125

Lys Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val

1130 1135 1140

Leu Val Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys

1145 1150 1155

Ser Val Lys Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser

1160 1165 1170

Phe Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys

1175 1180 1185

Glu Val Lys Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu

1190 1195 1200

Phe Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly

1205 1210 1215

Glu Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val

1220 1225 1230

Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser

1235 1240 1245

Pro Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys

1250 1255 1260

His Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys

1265 1270 1275

Arg Val Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala

1280 1285 1290

Tyr Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn

1295 1300 1305

Ile Ile His Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala

1310 1315 1320

Phe Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser

1325 1330 1335

Thr Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr

1340 1345 1350

Gly Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp

1355 1360 1365

<210> SEQ ID NO 3

<211> LENGTH: 1334

<212> TYPE: PRT

<213> ORGANISM: Listeria innocua

<400> SEQUENCE: 3

Met Lys Lys Pro Tyr Thr Ile Gly Leu Asp Ile Gly Thr Asn Ser Val

1 5 10 15

Gly Trp Ala Val Leu Thr Asp Gln Tyr Asp Leu Val Lys Arg Lys Met

20 25 30

Lys Ile Ala Gly Asp Ser Glu Lys Lys Gln Ile Lys Lys Asn Phe Trp

35 40 45

Gly Val Arg Leu Phe Asp Glu Gly Gln Thr Ala Ala Asp Arg Arg Met

50 55 60

Ala Arg Thr Ala Arg Arg Arg Ile Glu Arg Arg Arg Asn Arg Ile Ser

65 70 75 80

Tyr Leu Gln Gly Ile Phe Ala Glu Glu Met Ser Lys Thr Asp Ala Asn

85 90 95

Phe Phe Cys Arg Leu Ser Asp Ser Phe Tyr Val Asp Asn Glu Lys Arg

100 105 110

Asn Ser Arg His Pro Phe Phe Ala Thr Ile Glu Glu Glu Val Glu Tyr

115 120 125

His Lys Asn Tyr Pro Thr Ile Tyr His Leu Arg Glu Glu Leu Val Asn

130 135 140

Ser Ser Glu Lys Ala Asp Leu Arg Leu Val Tyr Leu Ala Leu Ala His

145 150 155 160

Ile Ile Lys Tyr Arg Gly Asn Phe Leu Ile Glu Gly Ala Leu Asp Thr

165 170 175

Gln Asn Thr Ser Val Asp Gly Ile Tyr Lys Gln Phe Ile Gln Thr Tyr

180 185 190

Asn Gln Val Phe Ala Ser Gly Ile Glu Asp Gly Ser Leu Lys Lys Leu

195 200 205

Glu Asp Asn Lys Asp Val Ala Lys Ile Leu Val Glu Lys Val Thr Arg

210 215 220

Lys Glu Lys Leu Glu Arg Ile Leu Lys Leu Tyr Pro Gly Glu Lys Ser

225 230 235 240

Ala Gly Met Phe Ala Gln Phe Ile Ser Leu Ile Val Gly Ser Lys Gly

245 250 255

Asn Phe Gln Lys Pro Phe Asp Leu Ile Glu Lys Ser Asp Ile Glu Cys

260 265 270

Ala Lys Asp Ser Tyr Glu Glu Asp Leu Glu Ser Leu Leu Ala Leu Ile

275 280 285

Gly Asp Glu Tyr Ala Glu Leu Phe Val Ala Ala Lys Asn Ala Tyr Ser

290 295 300

Ala Val Val Leu Ser Ser Ile Ile Thr Val Ala Glu Thr Glu Thr Asn

305 310 315 320

Ala Lys Leu Ser Ala Ser Met Ile Glu Arg Phe Asp Thr His Glu Glu

325 330 335

Asp Leu Gly Glu Leu Lys Ala Phe Ile Lys Leu His Leu Pro Lys His

340 345 350

Tyr Glu Glu Ile Phe Ser Asn Thr Glu Lys His Gly Tyr Ala Gly Tyr

355 360 365

Ile Asp Gly Lys Thr Lys Gln Ala Asp Phe Tyr Lys Tyr Met Lys Met

370 375 380

Thr Leu Glu Asn Ile Glu Gly Ala Asp Tyr Phe Ile Ala Lys Ile Glu

385 390 395 400

Lys Glu Asn Phe Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly Ala Ile

405 410 415

Pro His Gln Leu His Leu Glu Glu Leu Glu Ala Ile Leu His Gln Gln

420 425 430

Ala Lys Tyr Tyr Pro Phe Leu Lys Glu Asn Tyr Asp Lys Ile Lys Ser

435 440 445

Leu Val Thr Phe Arg Ile Pro Tyr Phe Val Gly Pro Leu Ala Asn Gly

450 455 460

Gln Ser Glu Phe Ala Trp Leu Thr Arg Lys Ala Asp Gly Glu Ile Arg

465 470 475 480

Pro Trp Asn Ile Glu Glu Lys Val Asp Phe Gly Lys Ser Ala Val Asp

485 490 495

Phe Ile Glu Lys Met Thr Asn Lys Asp Thr Tyr Leu Pro Lys Glu Asn

500 505 510

Val Leu Pro Lys His Ser Leu Cys Tyr Gln Lys Tyr Leu Val Tyr Asn

515 520 525

Glu Leu Thr Lys Val Arg Tyr Ile Asn Asp Gln Gly Lys Thr Ser Tyr

530 535 540

Phe Ser Gly Gln Glu Lys Glu Gln Ile Phe Asn Asp Leu Phe Lys Gln

545 550 555 560

Lys Arg Lys Val Lys Lys Lys Asp Leu Glu Leu Phe Leu Arg Asn Met

565 570 575

Ser His Val Glu Ser Pro Thr Ile Glu Gly Leu Glu Asp Ser Phe Asn

580 585 590

Ser Ser Tyr Ser Thr Tyr His Asp Leu Leu Lys Val Gly Ile Lys Gln

595 600 605

Glu Ile Leu Asp Asn Pro Val Asn Thr Glu Met Leu Glu Asn Ile Val

610 615 620

Lys Ile Leu Thr Val Phe Glu Asp Lys Arg Met Ile Lys Glu Gln Leu

625 630 635 640

Gln Gln Phe Ser Asp Val Leu Asp Gly Val Val Leu Lys Lys Leu Glu

645 650 655

Arg Arg His Tyr Thr Gly Trp Gly Arg Leu Ser Ala Lys Leu Leu Met

660 665 670

Gly Ile Arg Asp Lys Gln Ser His Leu Thr Ile Leu Asp Tyr Leu Met

675 680 685

Asn Asp Asp Gly Leu Asn Arg Asn Leu Met Gln Leu Ile Asn Asp Ser

690 695 700

Asn Leu Ser Phe Lys Ser Ile Ile Glu Lys Glu Gln Val Thr Thr Ala

705 710 715 720

Asp Lys Asp Ile Gln Ser Ile Val Ala Asp Leu Ala Gly Ser Pro Ala

725 730 735

Ile Lys Lys Gly Ile Leu Gln Ser Leu Lys Ile Val Asp Glu Leu Val

740 745 750

Ser Val Met Gly Tyr Pro Pro Gln Thr Ile Val Val Glu Met Ala Arg

755 760 765

Glu Asn Gln Thr Thr Gly Lys Gly Lys Asn Asn Ser Arg Pro Arg Tyr

770 775 780

Lys Ser Leu Glu Lys Ala Ile Lys Glu Phe Gly Ser Gln Ile Leu Lys

785 790 795 800

Glu His Pro Thr Asp Asn Gln Glu Leu Arg Asn Asn Arg Leu Tyr Leu

805 810 815

Tyr Tyr Leu Gln Asn Gly Lys Asp Met Tyr Thr Gly Gln Asp Leu Asp

820 825 830

Ile His Asn Leu Ser Asn Tyr Asp Ile Asp His Ile Val Pro Gln Ser

835 840 845

Phe Ile Thr Asp Asn Ser Ile Asp Asn Leu Val Leu Thr Ser Ser Ala

850 855 860

Gly Asn Arg Glu Lys Gly Asp Asp Val Pro Pro Leu Glu Ile Val Arg

865 870 875 880

Lys Arg Lys Val Phe Trp Glu Lys Leu Tyr Gln Gly Asn Leu Met Ser

885 890 895

Lys Arg Lys Phe Asp Tyr Leu Thr Lys Ala Glu Arg Gly Gly Leu Thr

900 905 910

Glu Ala Asp Lys Ala Arg Phe Ile His Arg Gln Leu Val Glu Thr Arg

915 920 925

Gln Ile Thr Lys Asn Val Ala Asn Ile Leu His Gln Arg Phe Asn Tyr

930 935 940

Glu Lys Asp Asp His Gly Asn Thr Met Lys Gln Val Arg Ile Val Thr

945 950 955 960

Leu Lys Ser Ala Leu Val Ser Gln Phe Arg Lys Gln Phe Gln Leu Tyr

965 970 975

Lys Val Arg Asp Val Asn Asp Tyr His His Ala His Asp Ala Tyr Leu

980 985 990

Asn Gly Val Val Ala Asn Thr Leu Leu Lys Val Tyr Pro Gln Leu Glu

995 1000 1005

Pro Glu Phe Val Tyr Gly Asp Tyr His Gln Phe Asp Trp Phe Lys

1010 1015 1020

Ala Asn Lys Ala Thr Ala Lys Lys Gln Phe Tyr Thr Asn Ile Met

1025 1030 1035

Leu Phe Phe Ala Gln Lys Asp Arg Ile Ile Asp Glu Asn Gly Glu

1040 1045 1050

Ile Leu Trp Asp Lys Lys Tyr Leu Asp Thr Val Lys Lys Val Met

1055 1060 1065

Ser Tyr Arg Gln Met Asn Ile Val Lys Lys Thr Glu Ile Gln Lys

1070 1075 1080

Gly Glu Phe Ser Lys Ala Thr Ile Lys Pro Lys Gly Asn Ser Ser

1085 1090 1095

Lys Leu Ile Pro Arg Lys Thr Asn Trp Asp Pro Met Lys Tyr Gly

1100 1105 1110

Gly Leu Asp Ser Pro Asn Met Ala Tyr Ala Val Val Ile Glu Tyr

1115 1120 1125

Ala Lys Gly Lys Asn Lys Leu Val Phe Glu Lys Lys Ile Ile Arg

1130 1135 1140

Val Thr Ile Met Glu Arg Lys Ala Phe Glu Lys Asp Glu Lys Ala

1145 1150 1155

Phe Leu Glu Glu Gln Gly Tyr Arg Gln Pro Lys Val Leu Ala Lys

1160 1165 1170

Leu Pro Lys Tyr Thr Leu Tyr Glu Cys Glu Glu Gly Arg Arg Arg

1175 1180 1185

Met Leu Ala Ser Ala Asn Glu Ala Gln Lys Gly Asn Gln Gln Val

1190 1195 1200

Leu Pro Asn His Leu Val Thr Leu Leu His His Ala Ala Asn Cys

1205 1210 1215

Glu Val Ser Asp Gly Lys Ser Leu Asp Tyr Ile Glu Ser Asn Arg

1220 1225 1230

Glu Met Phe Ala Glu Leu Leu Ala His Val Ser Glu Phe Ala Lys

1235 1240 1245

Arg Tyr Thr Leu Ala Glu Ala Asn Leu Asn Lys Ile Asn Gln Leu

1250 1255 1260

Phe Glu Gln Asn Lys Glu Gly Asp Ile Lys Ala Ile Ala Gln Ser

1265 1270 1275

Phe Val Asp Leu Met Ala Phe Asn Ala Met Gly Ala Pro Ala Ser

1280 1285 1290

Phe Lys Phe Phe Glu Thr Thr Ile Glu Arg Lys Arg Tyr Asn Asn

1295 1300 1305

Leu Lys Glu Leu Leu Asn Ser Thr Ile Ile Tyr Gln Ser Ile Thr

1310 1315 1320

Gly Leu Tyr Glu Ser Arg Lys Arg Leu Asp Asp

1325 1330

<210> SEQ ID NO 4

<211> LENGTH: 1368

<212> TYPE: PRT

<213> ORGANISM: Streptococcus pyogenes

<400> SEQUENCE: 4

Met Asp Lys Lys Tyr Ser Ile Gly Leu Asp Ile Gly Thr Asn Ser Val

1 5 10 15

Gly Trp Ala Val Ile Thr Asp Asp Tyr Lys Val Pro Ser Lys Lys Leu

20 25 30

Lys Gly Leu Gly Asn Thr Asp Arg His Gly Ile Lys Lys Asn Leu Ile

35 40 45

Gly Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr Arg Leu

50 55 60

Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys

65 70 75 80

Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser

85 90 95

Phe Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys

100 105 110

His Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr

115 120 125

His Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Ala Asp

130 135 140

Ser Thr Asp Lys Val Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His

145 150 155 160

Met Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro

165 170 175

Asp Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr

180 185 190

Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Arg Val Asp Ala

195 200 205

Lys Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn

210 215 220

Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn

225 230 235 240

Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe

245 250 255

Asp Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp

260 265 270

Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp

275 280 285

Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Thr Leu Leu Ser Asp

290 295 300

Ile Leu Arg Val Asn Ser Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser

305 310 315 320

Met Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys

325 330 335

Ala Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe

340 345 350

Asp Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser

355 360 365

Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp

370 375 380

Gly Thr Glu Glu Leu Leu Ala Lys Leu Asn Arg Glu Asp Leu Leu Arg

385 390 395 400

Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro Tyr Gln Ile His Leu

405 410 415

Gly Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe

420 425 430

Leu Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile

435 440 445

Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp

450 455 460

Met Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu

465 470 475 480

Val Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr

485 490 495

Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser

500 505 510

Leu Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys

515 520 525

Tyr Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu Gln

530 535 540

Lys Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr

545 550 555 560

Val Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp

565 570 575

Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly

580 585 590

Thr Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp

595 600 605

Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr

610 615 620

Leu Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala

625 630 635 640

His Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr

645 650 655

Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp

660 665 670

Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe

675 680 685

Ala Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe

690 695 700

Lys Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu

705 710 715 720

His Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly

725 730 735

Ile Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly

740 745 750

Arg His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln

755 760 765

Thr Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile

770 775 780

Glu Glu Gly Ile Lys Glu Leu Gly Ser Asp Ile Leu Lys Glu Tyr Pro

785 790 795 800

Val Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu

805 810 815

Gln Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg

820 825 830

Leu Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Leu Lys

835 840 845

Asp Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg

850 855 860

Gly Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys

865 870 875 880

Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys

885 890 895

Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp

900 905 910

Lys Val Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr

915 920 925

Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp

930 935 940

Glu Asn Asp Lys Leu Ile Arg Glu Val Arg Val Ile Thr Leu Lys Ser

945 950 955 960

Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg

965 970 975

Glu Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val

980 985 990

Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe

995 1000 1005

Val Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala

1010 1015 1020

Lys Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe

1025 1030 1035

Tyr Ser Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala

1040 1045 1050

Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu

1055 1060 1065

Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val

1070 1075 1080

Arg Lys Val Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr

1085 1090 1095

Glu Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys

1100 1105 1110

Arg Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro

1115 1120 1125

Lys Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val

1130 1135 1140

Leu Val Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys

1145 1150 1155

Ser Val Lys Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser

1160 1165 1170

Phe Glu Lys Asp Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys

1175 1180 1185

Glu Val Arg Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu

1190 1195 1200

Phe Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly

1205 1210 1215

Glu Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val

1220 1225 1230

Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser

1235 1240 1245

Pro Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys

1250 1255 1260

His Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys

1265 1270 1275

Arg Val Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala

1280 1285 1290

Tyr Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn

1295 1300 1305

Ile Ile His Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala

1310 1315 1320

Phe Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser

1325 1330 1335

Thr Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr

1340 1345 1350

Gly Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp

1355 1360 1365

<210> SEQ ID NO 5

<211> LENGTH: 1370

<212> TYPE: PRT

<213> ORGANISM: Streptococcus agalactiae

<400> SEQUENCE: 5

Met Asn Lys Pro Tyr Ser Ile Gly Leu Asp Ile Gly Thr Asn Ser Val

1 5 10 15

Gly Trp Ser Ile Ile Thr Asp Asp Tyr Lys Val Pro Ala Lys Lys Met

20 25 30

Arg Val Leu Gly Asn Thr Asp Lys Glu Tyr Ile Lys Lys Asn Leu Ile

35 40 45

Gly Ala Leu Leu Phe Asp Gly Gly Asn Thr Ala Ala Asp Arg Arg Leu

50 55 60

Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Arg Asn Arg Ile Leu

65 70 75 80

Tyr Leu Gln Glu Ile Phe Ala Glu Glu Met Ser Lys Val Asp Asp Ser

85 90 95

Phe Phe His Arg Leu Glu Asp Ser Phe Leu Val Glu Glu Asp Lys Arg

100 105 110

Gly Ser Lys Tyr Pro Ile Phe Ala Thr Leu Gln Glu Glu Lys Asp Tyr

115 120 125

His Glu Lys Phe Ser Thr Ile Tyr His Leu Arg Lys Glu Leu Ala Asp

130 135 140

Lys Lys Glu Lys Ala Asp Leu Arg Leu Ile Tyr Ile Ala Leu Ala His

145 150 155 160

Ile Ile Lys Phe Arg Gly His Phe Leu Ile Glu Asp Asp Ser Phe Asp

165 170 175

Val Arg Asn Thr Asp Ile Ser Lys Gln Tyr Gln Asp Phe Leu Glu Ile

180 185 190

Phe Asn Thr Thr Phe Glu Asn Asn Asp Leu Leu Ser Gln Asn Val Asp

195 200 205

Val Glu Ala Ile Leu Thr Asp Lys Ile Ser Lys Ser Ala Lys Lys Asp

210 215 220

Arg Ile Leu Ala Gln Tyr Pro Asn Gln Lys Ser Thr Gly Ile Phe Ala

225 230 235 240

Glu Phe Leu Lys Leu Ile Val Gly Asn Gln Ala Asp Phe Lys Lys Tyr

245 250 255

Phe Asn Leu Glu Asp Lys Thr Pro Leu Gln Phe Ala Lys Asp Ser Tyr

260 265 270

Asp Glu Asp Leu Glu Asn Leu Leu Gly Gln Ile Gly Asp Glu Phe Ala

275 280 285

Asp Leu Phe Ser Ala Ala Lys Lys Leu Tyr Asp Ser Val Leu Leu Ser

290 295 300

Gly Ile Leu Thr Val Ile Asp Leu Ser Thr Lys Ala Pro Leu Ser Ala

305 310 315 320

Ser Met Ile Gln Arg Tyr Asp Glu His Arg Glu Asp Leu Lys Gln Leu

325 330 335

Lys Gln Phe Val Lys Ala Ser Leu Pro Glu Lys Tyr Gln Glu Ile Phe

340 345 350

Ala Asp Ser Ser Lys Asp Gly Tyr Ala Gly Tyr Ile Glu Gly Lys Thr

355 360 365

Asn Gln Glu Ala Phe Tyr Lys Tyr Leu Ser Lys Leu Leu Thr Lys Gln

370 375 380

Glu Asp Ser Glu Asn Phe Leu Glu Lys Ile Lys Asn Glu Asp Phe Leu

385 390 395 400

Arg Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Val His

405 410 415

Leu Thr Glu Leu Lys Ala Ile Ile Arg Arg Gln Ser Glu Tyr Tyr Pro

420 425 430

Phe Leu Lys Glu Asn Gln Asp Arg Ile Glu Lys Ile Leu Thr Phe Arg

435 440 445

Ile Pro Tyr Tyr Ile Gly Pro Leu Ala Arg Glu Lys Ser Asp Phe Ala

450 455 460

Trp Met Thr Arg Lys Thr Asp Asp Ser Ile Arg Pro Trp Asn Phe Glu

465 470 475 480

Asp Leu Val Asp Lys Glu Lys Ser Ala Glu Ala Phe Ile His Arg Met

485 490 495

Thr Asn Asn Asp Phe Tyr Leu Pro Glu Glu Lys Val Leu Pro Lys His

500 505 510

Ser Leu Ile Tyr Glu Lys Phe Thr Val Tyr Asn Glu Leu Thr Lys Val

515 520 525

Arg Tyr Lys Asn Glu Gln Gly Glu Thr Tyr Phe Phe Asp Ser Asn Ile

530 535 540

Lys Gln Glu Ile Phe Asp Gly Val Phe Lys Glu His Arg Lys Val Ser

545 550 555 560

Lys Lys Lys Leu Leu Asp Phe Leu Ala Lys Glu Tyr Glu Glu Phe Arg

565 570 575

Ile Val Asp Val Ile Gly Leu Asp Lys Glu Asn Lys Ala Phe Asn Ala

580 585 590

Ser Leu Gly Thr Tyr His Asp Leu Glu Lys Ile Leu Asp Lys Asp Phe

595 600 605

Leu Asp Asn Pro Asp Asn Glu Ser Ile Leu Glu Asp Ile Val Gln Thr

610 615 620

Leu Thr Leu Phe Glu Asp Arg Glu Met Ile Lys Lys Arg Leu Glu Asn

625 630 635 640

Tyr Lys Asp Leu Phe Thr Glu Ser Gln Leu Lys Lys Leu Tyr Arg Arg

645 650 655

His Tyr Thr Gly Trp Gly Arg Leu Ser Ala Lys Leu Ile Asn Gly Ile

660 665 670

Arg Asp Lys Glu Ser Gln Lys Thr Ile Leu Asp Tyr Leu Ile Asp Asp

675 680 685

Gly Arg Ser Asn Arg Asn Phe Met Gln Leu Ile Asn Asp Asp Gly Leu

690 695 700

Ser Phe Lys Ser Ile Ile Ser Lys Ala Gln Ala Gly Ser His Ser Asp

705 710 715 720

Asn Leu Lys Glu Val Val Gly Glu Leu Ala Gly Ser Pro Ala Ile Lys

725 730 735

Lys Gly Ile Leu Gln Ser Leu Lys Ile Val Asp Glu Leu Val Lys Val

740 745 750

Met Gly Tyr Glu Pro Glu Gln Ile Val Val Glu Met Ala Arg Glu Asn

755 760 765

Gln Thr Thr Asn Gln Gly Arg Arg Asn Ser Arg Gln Arg Tyr Lys Leu

770 775 780

Leu Asp Asp Gly Val Lys Asn Leu Ala Ser Asp Leu Asn Gly Asn Ile

785 790 795 800

Leu Lys Glu Tyr Pro Thr Asp Asn Gln Ala Leu Gln Asn Glu Arg Leu

805 810 815

Phe Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met Tyr Thr Gly Glu Ala

820 825 830

Leu Asp Ile Asp Asn Leu Ser Gln Tyr Asp Ile Asp His Ile Ile Pro

835 840 845

Gln Ala Phe Ile Lys Asp Asp Ser Ile Asp Asn Arg Val Leu Val Ser

850 855 860

Ser Ala Lys Asn Arg Gly Lys Ser Asp Asp Val Pro Ser Leu Glu Ile

865 870 875 880

Val Lys Asp Cys Lys Val Phe Trp Lys Lys Leu Leu Asp Ala Lys Leu

885 890 895

Met Ser Gln Arg Lys Tyr Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly

900 905 910

Leu Thr Ser Asp Asp Lys Ala Arg Phe Ile Gln Arg Gln Leu Val Glu

915 920 925

Thr Arg Gln Ile Thr Lys His Val Ala Arg Ile Leu Asp Glu Arg Phe

930 935 940

Asn Asn Glu Leu Asp Ser Lys Gly Arg Arg Ile Arg Lys Val Lys Ile

945 950 955 960

Val Thr Leu Lys Ser Asn Leu Val Ser Asn Phe Arg Lys Glu Phe Gly

965 970 975

Phe Tyr Lys Ile Arg Glu Val Asn Asn Tyr His His Ala His Asp Ala

980 985 990

Tyr Leu Asn Ala Val Val Ala Lys Ala Ile Leu Thr Lys Tyr Pro Gln

995 1000 1005

Leu Glu Pro Glu Phe Val Tyr Gly Asp Tyr Pro Lys Tyr Asn Ser

1010 1015 1020

Tyr Lys Thr Arg Lys Ser Ala Thr Glu Lys Leu Phe Phe Tyr Ser

1025 1030 1035

Asn Ile Met Asn Phe Phe Lys Thr Lys Val Thr Leu Ala Asp Gly

1040 1045 1050

Thr Val Val Val Lys Asp Asp Ile Glu Val Asn Asn Asp Thr Gly

1055 1060 1065

Glu Ile Val Trp Asp Lys Lys Lys His Phe Ala Thr Val Arg Lys

1070 1075 1080

Val Leu Ser Tyr Pro Gln Asn Asn Ile Val Lys Lys Thr Glu Ile

1085 1090 1095

Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Ala His Gly Asn

1100 1105 1110

Ser Asp Lys Leu Ile Pro Arg Lys Thr Lys Asp Ile Tyr Leu Asp

1115 1120 1125

Pro Lys Lys Tyr Gly Gly Phe Asp Ser Pro Ile Val Ala Tyr Ser

1130 1135 1140

Val Leu Val Val Ala Asp Ile Lys Lys Gly Lys Ala Gln Lys Leu

1145 1150 1155

Lys Thr Val Thr Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser

1160 1165 1170

Arg Phe Glu Lys Asn Pro Ser Ala Phe Leu Glu Ser Lys Gly Tyr

1175 1180 1185

Leu Asn Ile Arg Ala Asp Lys Leu Ile Ile Leu Pro Lys Tyr Ser

1190 1195 1200

Leu Phe Glu Leu Glu Asn Gly Arg Arg Arg Leu Leu Ala Ser Ala

1205 1210 1215

Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Thr Gln Phe

1220 1225 1230

Met Lys Phe Leu Tyr Leu Ala Ser Arg Tyr Asn Glu Ser Lys Gly

1235 1240 1245

Lys Pro Glu Glu Ile Glu Lys Lys Gln Glu Phe Val Asn Gln His

1250 1255 1260

Val Ser Tyr Phe Asp Asp Ile Leu Gln Leu Ile Asn Asp Phe Ser

1265 1270 1275

Lys Arg Val Ile Leu Ala Asp Ala Asn Leu Glu Lys Ile Asn Lys

1280 1285 1290

Leu Tyr Gln Asp Asn Lys Glu Asn Ile Ser Val Asp Glu Leu Ala

1295 1300 1305

Asn Asn Ile Ile Asn Leu Phe Thr Phe Thr Ser Leu Gly Ala Pro

1310 1315 1320

Ala Ala Phe Lys Phe Phe Asp Lys Ile Val Asp Arg Lys Arg Tyr

1325 1330 1335

Thr Ser Thr Lys Glu Val Leu Asn Ser Thr Leu Ile His Gln Ser

1340 1345 1350

Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp Leu Gly Lys Leu Gly

1355 1360 1365

Glu Asp

1370

<210> SEQ ID NO 6

<211> LENGTH: 1345

<212> TYPE: PRT

<213> ORGANISM: Streptococcus mutans

<400> SEQUENCE: 6

Met Lys Lys Pro Tyr Ser Ile Gly Leu Asp Ile Gly Thr Asn Ser Val

1 5 10 15

Gly Trp Ala Val Val Thr Asp Asp Tyr Lys Val Pro Ala Lys Lys Met

20 25 30

Lys Val Leu Gly Asn Thr Asp Lys Ser His Ile Glu Lys Asn Leu Leu

35 40 45

Gly Ala Leu Leu Phe Asp Ser Gly Asn Thr Ala Glu Asp Arg Arg Leu

50 55 60

Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Arg Asn Arg Ile Leu

65 70 75 80

Tyr Leu Gln Glu Ile Phe Ser Glu Glu Met Gly Lys Val Asp Asp Ser

85 90 95

Phe Phe His Arg Leu Glu Asp Ser Phe Leu Val Thr Glu Asp Lys Arg

100 105 110

Gly Glu Arg His Pro Ile Phe Gly Asn Leu Glu Glu Glu Val Lys Tyr

115 120 125

His Glu Asn Phe Pro Thr Ile Tyr His Leu Arg Gln Tyr Leu Ala Asp

130 135 140

Asn Pro Glu Lys Val Asp Leu Arg Leu Val Tyr Leu Ala Leu Ala His

145 150 155 160

Ile Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Lys Phe Asp Thr

165 170 175

Arg Asn Asn Asp Val Gln Arg Leu Phe Gln Glu Phe Leu Ala Val Tyr

180 185 190

Asp Asn Thr Phe Glu Asn Ser Ser Leu Gln Glu Gln Asn Val Gln Val

195 200 205

Glu Glu Ile Leu Thr Asp Lys Ile Ser Lys Ser Ala Lys Lys Asp Arg

210 215 220

Val Leu Lys Leu Phe Pro Asn Glu Lys Ser Asn Gly Arg Phe Ala Glu

225 230 235 240

Phe Leu Lys Leu Ile Val Gly Asn Gln Ala Asp Phe Lys Lys His Phe

245 250 255

Glu Leu Glu Glu Lys Ala Pro Leu Gln Phe Ser Lys Asp Thr Tyr Glu

260 265 270

Glu Glu Leu Glu Val Leu Leu Ala Gln Ile Gly Asp Asn Tyr Ala Glu

275 280 285

Leu Phe Leu Ser Ala Lys Lys Leu Tyr Asp Ser Ile Leu Leu Ser Gly

290 295 300

Ile Leu Thr Val Thr Asp Val Gly Thr Lys Ala Pro Leu Ser Ala Ser

305 310 315 320

Met Ile Gln Arg Tyr Asn Glu His Gln Met Asp Leu Ala Gln Leu Lys

325 330 335

Gln Phe Ile Arg Gln Lys Leu Ser Asp Lys Tyr Asn Glu Val Phe Ser

340 345 350

Asp Val Ser Lys Asp Gly Tyr Ala Gly Tyr Ile Asp Gly Lys Thr Asn

355 360 365

Gln Glu Ala Phe Tyr Lys Tyr Leu Lys Gly Leu Leu Asn Lys Ile Glu

370 375 380

Gly Ser Gly Tyr Phe Leu Asp Lys Ile Glu Arg Glu Asp Phe Leu Arg

385 390 395 400

Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile His Leu

405 410 415

Gln Glu Met Arg Ala Ile Ile Arg Arg Gln Ala Glu Phe Tyr Pro Phe

420 425 430

Leu Ala Asp Asn Gln Asp Arg Ile Glu Lys Leu Leu Thr Phe Arg Ile

435 440 445

Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Lys Ser Asp Phe Ala Trp

450 455 460

Leu Ser Arg Lys Ser Ala Asp Lys Ile Thr Pro Trp Asn Phe Asp Glu

465 470 475 480

Ile Val Asp Lys Glu Ser Ser Ala Glu Ala Phe Ile Asn Arg Met Thr

485 490 495

Asn Tyr Asp Leu Tyr Leu Pro Asn Gln Lys Val Leu Pro Lys His Ser

500 505 510

Leu Leu Tyr Glu Lys Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys

515 520 525

Tyr Lys Thr Glu Gln Gly Lys Thr Ala Phe Phe Asp Ala Asn Met Lys

530 535 540

Gln Glu Ile Phe Asp Gly Val Phe Lys Val Tyr Arg Lys Val Thr Lys

545 550 555 560

Asp Lys Leu Met Asp Phe Leu Glu Lys Glu Phe Asp Glu Phe Arg Ile

565 570 575

Val Asp Leu Thr Gly Leu Asp Lys Glu Asn Lys Val Phe Asn Ala Ser

580 585 590

Tyr Gly Thr Tyr His Asp Leu Cys Lys Ile Leu Asp Lys Asp Phe Leu

595 600 605

Asp Asn Ser Lys Asn Glu Lys Ile Leu Glu Asp Ile Val Leu Thr Leu

610 615 620

Thr Leu Phe Glu Asp Arg Glu Met Ile Arg Lys Arg Leu Glu Asn Tyr

625 630 635 640

Ser Asp Leu Leu Thr Lys Glu Gln Val Lys Lys Leu Glu Arg Arg His

645 650 655

Tyr Thr Gly Trp Gly Arg Leu Ser Ala Glu Leu Ile His Gly Ile Arg

660 665 670

Asn Lys Glu Ser Arg Lys Thr Ile Leu Asp Tyr Leu Ile Asp Asp Gly

675 680 685

Asn Ser Asn Arg Asn Phe Met Gln Leu Ile Asn Asp Asp Ala Leu Ser

690 695 700

Phe Lys Glu Glu Ile Ala Lys Ala Gln Val Ile Gly Glu Thr Asp Asn

705 710 715 720

Leu Asn Gln Val Val Ser Asp Ile Ala Gly Ser Pro Ala Ile Lys Lys

725 730 735

Gly Ile Leu Gln Ser Leu Lys Ile Val Asp Glu Leu Val Lys Ile Met

740 745 750

Gly His Gln Pro Glu Asn Ile Val Val Glu Met Ala Arg Glu Asn Gln

755 760 765

Phe Thr Asn Gln Gly Arg Arg Asn Ser Gln Gln Arg Leu Lys Gly Leu

770 775 780

Thr Asp Ser Ile Lys Glu Phe Gly Ser Gln Ile Leu Lys Glu His Pro

785 790 795 800

Val Glu Asn Ser Gln Leu Gln Asn Asp Arg Leu Phe Leu Tyr Tyr Leu

805 810 815

Gln Asn Gly Arg Asp Met Tyr Thr Gly Glu Glu Leu Asp Ile Asp Tyr

820 825 830

Leu Ser Gln Tyr Asp Ile Asp His Ile Ile Pro Gln Ala Phe Ile Lys

835 840 845

Asp Asn Ser Ile Asp Asn Arg Val Leu Thr Ser Ser Lys Glu Asn Arg

850 855 860

Gly Lys Ser Asp Asp Val Pro Ser Lys Asp Val Val Arg Lys Met Lys

865 870 875 880

Ser Tyr Trp Ser Lys Leu Leu Ser Ala Lys Leu Ile Thr Gln Arg Lys

885 890 895

Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Thr Asp Asp Asp

900 905 910

Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr

915 920 925

Lys His Val Ala Arg Ile Leu Asp Glu Arg Phe Asn Thr Glu Thr Asp

930 935 940

Glu Asn Asn Lys Lys Ile Arg Gln Val Lys Ile Val Thr Leu Lys Ser

945 950 955 960

Asn Leu Val Ser Asn Phe Arg Lys Glu Phe Glu Leu Tyr Lys Val Arg

965 970 975

Glu Ile Asn Asp Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val

980 985 990

Ile Gly Lys Ala Leu Leu Gly Val Tyr Pro Gln Leu Glu Pro Glu Phe

995 1000 1005

Val Tyr Gly Asp Tyr Pro His Phe His Gly His Lys Glu Asn Lys

1010 1015 1020

Ala Thr Ala Lys Lys Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe

1025 1030 1035

Lys Lys Asp Asp Val Arg Thr Asp Lys Asn Gly Glu Ile Ile Trp

1040 1045 1050

Lys Lys Asp Glu His Ile Ser Asn Ile Lys Lys Val Leu Ser Tyr

1055 1060 1065

Pro Gln Val Asn Ile Val Lys Lys Val Glu Glu Gln Thr Gly Gly

1070 1075 1080

Phe Ser Lys Glu Ser Ile Leu Pro Lys Gly Asn Ser Asp Lys Leu

1085 1090 1095

Ile Pro Arg Lys Thr Lys Lys Phe Tyr Trp Asp Thr Lys Lys Tyr

1100 1105 1110

Gly Gly Phe Asp Ser Pro Ile Val Ala Tyr Ser Ile Leu Val Ile

1115 1120 1125

Ala Asp Ile Glu Lys Gly Lys Ser Lys Lys Leu Lys Thr Val Lys

1130 1135 1140

Ala Leu Val Gly Val Thr Ile Met Glu Lys Met Thr Phe Glu Arg

1145 1150 1155

Asp Pro Val Ala Phe Leu Glu Arg Lys Gly Tyr Arg Asn Val Gln

1160 1165 1170

Glu Glu Asn Ile Ile Lys Leu Pro Lys Tyr Ser Leu Phe Lys Leu

1175 1180 1185

Glu Asn Gly Arg Lys Arg Leu Leu Ala Ser Ala Arg Glu Leu Gln

1190 1195 1200

Lys Gly Asn Glu Ile Val Leu Pro Asn His Leu Gly Thr Leu Leu

1205 1210 1215

Tyr His Ala Lys Asn Ile His Lys Val Asp Glu Pro Lys His Leu

1220 1225 1230

Asp Tyr Val Asp Lys His Lys Asp Glu Phe Lys Glu Leu Leu Asp

1235 1240 1245

Val Val Ser Asn Phe Ser Lys Lys Tyr Thr Leu Ala Glu Gly Asn

1250 1255 1260

Leu Glu Lys Ile Lys Glu Leu Tyr Ala Gln Asn Asn Gly Glu Asp

1265 1270 1275

Leu Lys Glu Leu Ala Ser Ser Phe Ile Asn Leu Leu Thr Phe Thr

1280 1285 1290

Ala Ile Gly Ala Pro Ala Thr Phe Lys Phe Phe Asp Lys Asn Ile

1295 1300 1305

Asp Arg Lys Arg Tyr Thr Ser Thr Thr Glu Ile Leu Asn Ala Thr

1310 1315 1320

Leu Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp

1325 1330 1335

Leu Asn Lys Leu Gly Gly Asp

1340 1345

<210> SEQ ID NO 7

<211> LENGTH: 1377

<212> TYPE: PRT

<213> ORGANISM: Streptococcus agalactiae

<400> SEQUENCE: 7

Met Asn Lys Pro Tyr Ser Ile Gly Leu Asp Ile Gly Thr Asn Ser Val

1 5 10 15

Gly Trp Ser Ile Ile Thr Asp Asp Tyr Lys Val Pro Ala Lys Lys Met

20 25 30

Arg Val Leu Gly Asn Thr Asp Lys Glu Tyr Ile Lys Lys Asn Leu Ile

35 40 45

Gly Ala Leu Leu Phe Asp Gly Gly Asn Thr Ala Ala Asp Arg Arg Leu

50 55 60

Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Arg Asn Arg Ile Leu

65 70 75 80

Tyr Leu Gln Glu Ile Phe Ala Glu Glu Met Ser Lys Val Asp Asp Ser

85 90 95

Phe Phe His Arg Leu Glu Asp Ser Phe Leu Val Glu Glu Asp Lys Arg

100 105 110

Gly Ser Lys Tyr Pro Ile Phe Ala Thr Met Gln Glu Glu Lys Tyr Tyr

115 120 125

His Glu Lys Phe Pro Thr Ile Tyr His Leu Arg Lys Glu Leu Ala Asp

130 135 140

Lys Lys Glu Lys Ala Asp Leu Arg Leu Val Tyr Leu Ala Leu Ala His

145 150 155 160

Ile Ile Lys Phe Arg Gly His Phe Leu Ile Glu Asp Asp Arg Phe Asp

165 170 175

Val Arg Asn Thr Asp Ile Gln Lys Gln Tyr Gln Ala Phe Leu Glu Ile

180 185 190

Phe Asp Thr Thr Phe Glu Asn Asn His Leu Leu Ser Gln Asn Val Asp

195 200 205

Val Glu Ala Ile Leu Thr Asp Lys Ile Ser Lys Ser Ala Lys Lys Asp

210 215 220

Arg Ile Leu Ala Gln Tyr Pro Asn Gln Lys Ser Thr Gly Ile Phe Ala

225 230 235 240

Glu Phe Leu Lys Leu Ile Val Gly Asn Gln Ala Asp Phe Lys Lys His

245 250 255

Phe Asn Leu Glu Asp Lys Thr Pro Leu Gln Phe Ala Lys Asp Ser Tyr

260 265 270

Asp Glu Asp Leu Glu Asn Leu Leu Gly Gln Ile Gly Asp Glu Phe Ala

275 280 285

Asp Leu Phe Ser Val Ala Lys Lys Leu Tyr Asp Ser Val Leu Leu Ser

290 295 300

Gly Ile Leu Thr Val Thr Asp Leu Ser Thr Lys Ala Pro Leu Ser Ala

305 310 315 320

Ser Met Ile Gln Arg Tyr Asp Glu His His Glu Asp Leu Lys His Leu

325 330 335

Lys Gln Phe Val Lys Ala Ser Leu Pro Glu Asn Tyr Arg Glu Val Phe

340 345 350

Ala Asp Ser Ser Lys Asp Gly Tyr Ala Gly Tyr Ile Glu Gly Lys Thr

355 360 365

Asn Gln Glu Ala Phe Tyr Lys Tyr Leu Leu Lys Leu Leu Thr Lys Gln

370 375 380

Glu Gly Ser Glu Tyr Phe Leu Glu Lys Ile Lys Asn Glu Asp Phe Leu

385 390 395 400

Arg Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Val His

405 410 415

Leu Thr Glu Leu Arg Ala Ile Ile Arg Arg Gln Ser Glu Tyr Tyr Pro

420 425 430

Phe Leu Lys Glu Asn Gln Asp Arg Ile Glu Lys Ile Leu Thr Phe Arg

435 440 445

Ile Pro Tyr Tyr Val Gly Pro Leu Ala Arg Glu Lys Ser Asp Phe Ala

450 455 460

Trp Met Thr Arg Lys Thr Asp Asp Ser Ile Arg Pro Trp Asn Phe Glu

465 470 475 480

Asp Leu Val Asp Lys Glu Lys Ser Ala Glu Ala Phe Ile His Arg Met

485 490 495

Thr Asn Asn Asp Leu Tyr Leu Pro Glu Glu Lys Val Leu Pro Lys His

500 505 510

Ser Leu Ile Tyr Glu Lys Phe Thr Val Tyr Asn Glu Leu Thr Lys Val

515 520 525

Arg Phe Leu Ala Glu Gly Phe Lys Asp Phe Gln Phe Leu Asn Arg Lys

530 535 540

Gln Lys Glu Thr Ile Phe Asn Ser Leu Phe Lys Glu Lys Arg Lys Val

545 550 555 560

Thr Glu Lys Asp Ile Ile Ser Phe Leu Asn Lys Val Asp Gly Tyr Glu

565 570 575

Gly Ile Ala Ile Lys Gly Ile Glu Lys Gln Phe Asn Ala Ser Leu Ser

580 585 590

Thr Tyr His Asp Leu Lys Lys Ile Leu Gly Lys Asp Phe Leu Asp Asn

595 600 605

Thr Asp Asn Glu Leu Ile Leu Glu Asp Ile Val Gln Thr Leu Thr Leu

610 615 620

Phe Glu Asp Arg Glu Met Ile Lys Lys Cys Leu Asp Ile Tyr Lys Asp

625 630 635 640

Phe Phe Thr Glu Ser Gln Leu Lys Lys Leu Tyr Arg Arg His Tyr Thr

645 650 655

Gly Trp Gly Arg Leu Ser Ala Lys Leu Ile Asn Gly Ile Arg Asn Lys

660 665 670

Glu Asn Gln Lys Thr Ile Leu Asp Tyr Leu Ile Asp Asp Gly Ser Ala

675 680 685

Asn Arg Asn Phe Met Gln Leu Ile Asn Asp Asp Asp Leu Ser Phe Lys

690 695 700

Pro Ile Ile Asp Lys Ala Arg Thr Gly Ser His Ser Asp Asn Leu Lys

705 710 715 720

Glu Val Val Gly Glu Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly Ile

725 730 735

Leu Gln Ser Leu Lys Ile Val Asp Glu Leu Val Lys Val Met Gly Tyr

740 745 750

Glu Pro Glu Gln Ile Val Val Glu Met Ala Arg Glu Asn Gln Thr Thr

755 760 765

Ala Lys Gly Leu Ser Arg Ser Arg Gln Arg Leu Thr Thr Leu Arg Glu

770 775 780

Ser Leu Ala Asn Leu Lys Ser Asn Ile Leu Glu Glu Lys Lys Pro Lys

785 790 795 800

Tyr Val Lys Asp Gln Val Glu Asn His His Leu Ser Asp Asp Arg Leu

805 810 815

Phe Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met Tyr Thr Lys Lys Ala

820 825 830

Leu Asp Ile Asp Asn Leu Ser Gln Tyr Asp Ile Asp His Ile Ile Pro

835 840 845

Gln Ala Phe Ile Lys Asp Asp Ser Ile Asp Asn Arg Val Leu Val Ser

850 855 860

Ser Ala Lys Asn Arg Gly Lys Ser Asp Asp Val Pro Ser Ile Glu Ile

865 870 875 880

Val Lys Ala Arg Lys Met Phe Trp Lys Asn Leu Leu Asp Ala Lys Leu

885 890 895

Met Ser Gln Arg Lys Tyr Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly

900 905 910

Leu Thr Ser Asp Asp Lys Ala Arg Phe Ile Gln Arg Gln Leu Val Glu

915 920 925

Thr Arg Gln Ile Thr Lys His Val Ala Arg Ile Leu Asp Glu Arg Phe

930 935 940

Asn Asn Glu Val Asp Asn Gly Lys Lys Ile Cys Lys Val Lys Ile Val

945 950 955 960

Thr Leu Lys Ser Asn Leu Val Ser Asn Phe Arg Lys Glu Phe Gly Phe

965 970 975

Tyr Lys Ile Arg Glu Val Asn Asp Tyr His His Ala His Asp Ala Tyr

980 985 990

Leu Asn Ala Val Val Ala Lys Ala Ile Leu Thr Lys Tyr Pro Gln Leu

995 1000 1005

Glu Pro Glu Phe Val Tyr Gly Met Tyr Arg Gln Lys Lys Leu Ser

1010 1015 1020

Lys Ile Val His Glu Asp Lys Glu Glu Lys Tyr Ser Glu Ala Thr

1025 1030 1035

Arg Lys Met Phe Phe Tyr Ser Asn Leu Met Asn Met Phe Lys Arg

1040 1045 1050

Val Val Arg Leu Ala Asp Gly Ser Ile Val Val Arg Pro Val Ile

1055 1060 1065

Glu Thr Gly Arg Tyr Met Arg Lys Thr Ala Trp Asp Lys Lys Lys

1070 1075 1080

His Phe Ala Thr Val Arg Lys Val Leu Ser Tyr Pro Gln Asn Asn

1085 1090 1095

Ile Val Lys Lys Thr Glu Ile Gln Thr Gly Gly Phe Ser Lys Glu

1100 1105 1110

Ser Ile Leu Ala His Gly Asn Ser Asp Lys Leu Ile Pro Arg Lys

1115 1120 1125

Thr Lys Asp Ile Tyr Leu Asp Pro Lys Lys Tyr Gly Gly Phe Asp

1130 1135 1140

Ser Pro Ile Val Ala Tyr Ser Val Leu Val Val Ala Asp Ile Lys

1145 1150 1155

Lys Gly Lys Ala Gln Lys Leu Lys Thr Val Thr Glu Leu Leu Gly

1160 1165 1170

Ile Thr Ile Met Glu Arg Ser Arg Phe Glu Lys Asn Pro Ser Ala

1175 1180 1185

Phe Leu Glu Ser Lys Gly Tyr Leu Asn Ile Arg Asp Asp Lys Leu

1190 1195 1200

Met Ile Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu Asn Gly Arg

1205 1210 1215

Arg Arg Leu Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly Asn Glu

1220 1225 1230

Leu Ala Leu Pro Thr Gln Phe Met Lys Phe Leu Tyr Leu Ala Ser

1235 1240 1245

Arg Tyr Asn Glu Ser Lys Gly Lys Pro Glu Glu Ile Glu Lys Lys

1250 1255 1260

Gln Glu Phe Val Asn Gln His Val Ser Tyr Phe Asp Asp Ile Leu

1265 1270 1275

Gln Leu Ile Asn Asp Phe Ser Lys Arg Val Ile Leu Ala Asp Ala

1280 1285 1290

Asn Leu Glu Lys Ile Asn Lys Leu Tyr Gln Asp Asn Lys Glu Asn

1295 1300 1305

Ile Pro Val Asp Glu Leu Ala Asn Asn Ile Ile Asn Leu Phe Thr

1310 1315 1320

Phe Thr Ser Leu Gly Ala Pro Ala Ala Phe Lys Phe Phe Asp Lys

1325 1330 1335

Ile Val Asp Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val Leu Asn

1340 1345 1350

Ser Thr Leu Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr Arg

1355 1360 1365

Ile Asp Leu Gly Lys Leu Gly Glu Asp

1370 1375

<210> SEQ ID NO 8

<211> LENGTH: 1368

<212> TYPE: PRT

<213> ORGANISM: Streptococcus pyogenes

<400> SEQUENCE: 8

Met Asp Lys Lys Tyr Ser Ile Gly Leu Asp Ile Gly Thr Asn Ser Val

1 5 10 15

Gly Trp Ala Val Ile Thr Asp Asp Tyr Lys Val Pro Ser Lys Lys Leu

20 25 30

Lys Gly Leu Gly Asn Thr Asp Arg His Gly Ile Lys Lys Asn Leu Ile

35 40 45

Gly Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr Arg Leu

50 55 60

Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys

65 70 75 80

Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser

85 90 95

Phe Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys

100 105 110

His Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr

115 120 125

His Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Ala Asp

130 135 140

Ser Thr Asp Lys Val Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His

145 150 155 160

Met Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro

165 170 175

Asp Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr

180 185 190

Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Arg Val Asp Ala

195 200 205

Lys Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn

210 215 220

Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn

225 230 235 240

Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe

245 250 255

Asp Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp

260 265 270

Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp

275 280 285

Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Thr Leu Leu Ser Asp

290 295 300

Ile Leu Arg Val Asn Ser Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser

305 310 315 320

Met Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys

325 330 335

Ala Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe

340 345 350

Asp Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser

355 360 365

Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp

370 375 380

Gly Thr Glu Glu Leu Leu Ala Lys Leu Asn Arg Glu Asp Leu Leu Arg

385 390 395 400

Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro Tyr Gln Ile His Leu

405 410 415

Gly Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe

420 425 430

Leu Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile

435 440 445

Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp

450 455 460

Met Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu

465 470 475 480

Val Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr

485 490 495

Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser

500 505 510

Leu Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys

515 520 525

Tyr Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu Gln

530 535 540

Lys Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr

545 550 555 560

Val Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp

565 570 575

Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly

580 585 590

Thr Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp

595 600 605

Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr

610 615 620

Leu Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala

625 630 635 640

His Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr

645 650 655

Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp

660 665 670

Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe

675 680 685

Ala Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe

690 695 700

Lys Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu

705 710 715 720

His Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly

725 730 735

Ile Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly

740 745 750

Arg His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln

755 760 765

Thr Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile

770 775 780

Glu Glu Gly Ile Lys Glu Leu Gly Ser Asp Ile Leu Lys Glu Tyr Pro

785 790 795 800

Val Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu

805 810 815

Gln Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg

820 825 830

Leu Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Leu Lys

835 840 845

Asp Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg

850 855 860

Gly Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys

865 870 875 880

Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys

885 890 895

Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp

900 905 910

Lys Val Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr

915 920 925

Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp

930 935 940

Glu Asn Asp Lys Leu Ile Arg Glu Val Arg Val Ile Thr Leu Lys Ser

945 950 955 960

Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg

965 970 975

Glu Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val

980 985 990

Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe

995 1000 1005

Val Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala

1010 1015 1020

Lys Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe

1025 1030 1035

Tyr Ser Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala

1040 1045 1050

Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu

1055 1060 1065

Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val

1070 1075 1080

Arg Lys Val Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr

1085 1090 1095

Glu Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys

1100 1105 1110

Arg Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro

1115 1120 1125

Lys Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val

1130 1135 1140

Leu Val Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys

1145 1150 1155

Ser Val Lys Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser

1160 1165 1170

Phe Glu Lys Asp Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys

1175 1180 1185

Glu Val Arg Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu

1190 1195 1200

Phe Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly

1205 1210 1215

Glu Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val

1220 1225 1230

Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser

1235 1240 1245

Pro Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys

1250 1255 1260

His Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys

1265 1270 1275

Arg Val Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala

1280 1285 1290

Tyr Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn

1295 1300 1305

Ile Ile His Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala

1310 1315 1320

Phe Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser

1325 1330 1335

Thr Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr

1340 1345 1350

Gly Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp

1355 1360 1365

<210> SEQ ID NO 9

<211> LENGTH: 131

<212> TYPE: PRT

<213> ORGANISM: Helicobacter hepaticus

<400> SEQUENCE: 9

Met Arg Ile Leu Gly Phe Asp Ile Gly Ile Thr Ser Ile Gly Trp Ala

1 5 10 15

Tyr Val Glu Ser Asn Glu Leu Lys Asp Cys Gly Val Arg Ile Phe Thr

20 25 30

Lys Ala Glu Asn Pro Lys Asn Gly Asp Ser Leu Ala Ala Pro Arg Arg

35 40 45

Glu Ala Arg Gly Ala Arg Arg Arg Leu Ala Arg Arg Lys Ala Arg Leu

50 55 60

Asn Ala Ile Lys Arg Leu Leu Cys Lys Glu Phe Glu Leu Asn Leu Asn

65 70 75 80

Asp Tyr Leu Ala Asn Asp Gly Glu Leu Pro Lys Ala Tyr Gln Thr Ser

85 90 95

Lys Asp Thr Lys Ser Pro Tyr Glu Leu Tyr Thr Ala Phe His Trp Ile

100 105 110

Ile Phe Ala Phe Cys Ser Ile Ala Ser Ser Leu Ser Asn Arg Gln Met

115 120 125

Leu Pro Ile

130
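Each entry in this listing follows the numeric-tag layout of WIPO Standard ST.25. The tags are:

- <210>: the sequence identifier.
- <211>: the length in residues.
- <212>: the molecule type (PRT for protein).
- <213>: the source organism.
- <400>: introduces the residues.

The residues are written as three-letter codes, with a numeric ruler line under each row giving residue positions.

As a reader's aid only, the following is a minimal Python sketch (not part of the patent) that converts one entry to one-letter codes and reads the declared <211> length so it can be compared with the parsed residue count. The names `parse_entry` and `declared_length` are hypothetical. The parser assumes the plain-text layout shown here: one tag per line, and ruler lines containing only digits and spaces.

```python
import re

# Standard three-letter -> one-letter amino acid codes (20 canonical residues,
# plus "Xaa" for an unspecified residue as used in sequence listings).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Xaa": "X",
}


def parse_entry(text):
    """Parse one ST.25-style protein entry (hypothetical helper).

    Returns (fields, sequence): fields maps numeric tags such as "211" to the
    text after the tag; sequence is the one-letter residue string. Lines made
    only of digits and spaces (position rulers) are skipped.
    """
    fields = {}
    residues = []
    in_sequence = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        tag = re.match(r"<(\d{3})>\s*(.*)", line)
        if tag:
            fields[tag.group(1)] = tag.group(2)
            # Residue rows only follow the <400> tag.
            in_sequence = tag.group(1) == "400"
            continue
        if in_sequence and not re.fullmatch(r"[\d\s]+", line):
            residues.extend(THREE_TO_ONE[code] for code in line.split())
    return fields, "".join(residues)


def declared_length(fields):
    """Return the integer from a <211> field such as 'LENGTH: 131'."""
    return int(fields["211"].split(":")[1])
```

For example, feeding in only the header and first row of SEQ ID NO 9 yields "MRILGFDIGITSIGWA" (16 residues). Because this is an excerpt, the count falls short of the declared length of 131, which is exactly the kind of mismatch the length comparison is meant to flag.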

<210> SEQ ID NO 10

<211> LENGTH: 1059

<212> TYPE: PRT

<213> ORGANISM: Wolinella succinogenes

<400> SEQUENCE: 10

Met Ile Glu Arg Ile Leu Gly Val Asp Leu Gly Ile Ser Ser Leu Gly

1 5 10 15

Trp Ala Ile Val Glu Tyr Asp Lys Asp Asp Glu Ala Ala Asn Arg Ile

20 25 30

Ile Asp Cys Gly Val Arg Leu Phe Thr Ala Ala Glu Thr Pro Lys Lys

35 40 45

Lys Glu Ser Pro Asn Lys Ala Arg Arg Glu Ala Arg Gly Ile Arg Arg

50 55 60

Val Leu Asn Arg Arg Arg Val Arg Met Asn Met Ile Lys Lys Leu Phe

65 70 75 80

Leu Arg Ala Gly Leu Ile Gln Asp Val Asp Leu Asp Gly Glu Gly Gly

85 90 95

Met Phe Tyr Ser Lys Ala Asn Arg Ala Asp Val Trp Glu Leu Arg His

100 105 110

Asp Gly Leu Tyr Arg Leu Leu Lys Gly Asp Glu Leu Ala Arg Val Leu

115 120 125

Ile His Ile Ala Lys His Arg Gly Tyr Lys Phe Ile Gly Asp Asp Glu

130 135 140

Ala Asp Glu Glu Ser Gly Lys Val Lys Lys Ala Gly Val Val Leu Arg

145 150 155 160

Gln Asn Phe Glu Ala Ala Gly Cys Arg Thr Val Gly Glu Trp Leu Trp

165 170 175

Arg Glu Arg Gly Ala Asn Gly Lys Lys Arg Asn Lys His Gly Asp Tyr

180 185 190

Glu Ile Ser Ile His Arg Asp Leu Leu Val Glu Glu Val Glu Ala Ile

195 200 205

Phe Val Ala Gln Gln Glu Met Arg Ser Thr Ile Ala Thr Asp Ala Leu

210 215 220

Lys Ala Ala Tyr Arg Glu Ile Ala Phe Phe Val Arg Pro Met Gln Arg

225 230 235 240

Ile Glu Lys Met Val Gly His Cys Thr Tyr Phe Pro Glu Glu Arg Arg

245 250 255

Ala Pro Lys Ser Ala Pro Thr Ala Glu Lys Phe Ile Ala Ile Ser Lys

260 265 270

Phe Phe Ser Thr Val Ile Ile Asp Asn Glu Gly Trp Glu Gln Lys Ile

275 280 285

Ile Glu Arg Lys Thr Leu Glu Glu Leu Leu Asp Phe Ala Val Ser Arg

290 295 300

Glu Lys Val Glu Phe Arg His Leu Arg Lys Phe Leu Asp Leu Ser Asp

305 310 315 320

Asn Glu Ile Phe Lys Gly Leu His Tyr Lys Gly Lys Pro Lys Thr Ala

325 330 335

Lys Lys Arg Glu Ala Thr Leu Phe Asp Pro Asn Glu Pro Thr Glu Leu

340 345 350

Glu Phe Asp Lys Val Glu Ala Glu Lys Lys Ala Trp Ile Ser Leu Arg

355 360 365

Gly Ala Ala Lys Leu Arg Glu Ala Leu Gly Asn Glu Phe Tyr Gly Arg

370 375 380

Phe Val Ala Leu Gly Lys His Ala Asp Glu Ala Thr Lys Ile Leu Thr

385 390 395 400

Tyr Tyr Lys Asp Glu Gly Gln Lys Arg Arg Glu Leu Thr Lys Leu Pro

405 410 415

Leu Glu Ala Glu Met Val Glu Arg Leu Val Lys Ile Gly Phe Ser Asp

420 425 430

Phe Leu Lys Leu Ser Leu Lys Ala Ile Arg Asp Ile Leu Pro Ala Met

435 440 445

Glu Ser Gly Ala Arg Tyr Asp Glu Ala Val Leu Met Leu Gly Val Pro

450 455 460

His Lys Glu Lys Ser Ala Ile Leu Pro Pro Leu Asn Lys Thr Asp Ile

465 470 475 480

Asp Ile Leu Asn Pro Thr Val Ile Arg Ala Phe Ala Gln Phe Arg Lys

485 490 495

Val Ala Asn Ala Leu Val Arg Lys Tyr Gly Ala Phe Asp Arg Val His

500 505 510

Phe Glu Leu Ala Arg Glu Ile Asn Thr Lys Gly Glu Ile Glu Asp Ile

515 520 525

Lys Glu Ser Gln Arg Lys Asn Glu Lys Glu Arg Lys Glu Ala Ala Asp

530 535 540

Trp Ile Ala Glu Thr Ser Phe Gln Val Pro Leu Thr Arg Lys Asn Ile

545 550 555 560

Leu Lys Lys Arg Leu Tyr Ile Gln Gln Asp Gly Arg Cys Ala Tyr Thr

565 570 575

Gly Asp Val Ile Glu Leu Glu Arg Leu Phe Asp Glu Gly Tyr Cys Glu

580 585 590

Ile Asp His Ile Leu Pro Arg Ser Arg Ser Ala Asp Asp Ser Phe Ala

595 600 605

Asn Lys Val Leu Cys Leu Ala Arg Ala Asn Gln Gln Lys Thr Asp Arg

610 615 620

Thr Pro Tyr Glu Trp Phe Gly His Asp Ala Ala Arg Trp Asn Ala Phe

625 630 635 640

Glu Thr Arg Thr Ser Ala Pro Ser Asn Arg Val Arg Thr Gly Lys Gly

645 650 655

Lys Ile Asp Arg Leu Leu Lys Lys Asn Phe Asp Glu Asn Ser Glu Met

660 665 670

Ala Phe Lys Asp Arg Asn Leu Asn Asp Thr Arg Tyr Met Ala Arg Ala

675 680 685

Ile Lys Thr Tyr Cys Glu Gln Tyr Trp Val Phe Lys Asn Ser His Thr

690 695 700

Lys Ala Pro Val Gln Val Arg Ser Gly Lys Leu Thr Ser Val Leu Arg

705 710 715 720

Tyr Gln Trp Gly Leu Glu Ser Lys Asp Arg Glu Ser His Thr His His

725 730 735

Ala Val Asp Ala Ile Ile Ile Ala Phe Ser Thr Gln Gly Met Val Gln

740 745 750

Lys Leu Ser Glu Tyr Tyr Arg Phe Lys Glu Thr His Arg Glu Lys Glu

755 760 765

Arg Pro Lys Leu Ala Val Pro Leu Ala Asn Phe Arg Asp Ala Val Glu

770 775 780

Glu Ala Thr Arg Ile Glu Asn Thr Glu Thr Val Lys Glu Gly Val Glu

785 790 795 800

Val Lys Arg Leu Leu Ile Ser Arg Pro Pro Arg Ala Arg Val Thr Gly

805 810 815

Gln Ala His Glu Gln Thr Ala Lys Pro Tyr Pro Arg Ile Lys Gln Val

820 825 830

Lys Asn Lys Lys Lys Trp Arg Leu Ala Pro Ile Asp Glu Glu Lys Phe

835 840 845

Glu Ser Phe Lys Ala Asp Arg Val Ala Ser Ala Asn Gln Lys Asn Phe

850 855 860

Tyr Glu Thr Ser Thr Ile Pro Arg Val Asp Val Tyr His Lys Lys Gly

865 870 875 880

Lys Phe His Leu Val Pro Ile Tyr Leu His Glu Met Val Leu Asn Glu

885 890 895

Leu Pro Asn Leu Ser Leu Gly Thr Asn Pro Glu Ala Met Asp Glu Asn

900 905 910

Phe Phe Lys Phe Ser Ile Phe Lys Asp Asp Leu Ile Ser Ile Gln Thr

915 920 925

Gln Gly Thr Pro Lys Lys Pro Ala Lys Ile Ile Met Gly Tyr Phe Lys

930 935 940

Asn Met His Gly Ala Asn Met Val Leu Ser Ser Ile Asn Asn Ser Pro

945 950 955 960

Cys Glu Gly Phe Thr Cys Thr Pro Val Ser Met Asp Lys Lys His Lys

965 970 975

Asp Lys Cys Lys Leu Cys Pro Glu Glu Asn Arg Ile Ala Gly Arg Cys

980 985 990

Leu Gln Gly Phe Leu Asp Tyr Trp Ser Gln Glu Gly Leu Arg Pro Pro

995 1000 1005

Arg Lys Glu Phe Glu Cys Asp Gln Gly Val Lys Phe Ala Leu Asp

1010 1015 1020

Val Lys Lys Tyr Gln Ile Asp Pro Leu Gly Tyr Tyr Tyr Glu Val

1025 1030 1035

Lys Gln Glu Lys Arg Leu Gly Thr Ile Pro Gln Met Arg Ser Ala

1040 1045 1050

Lys Lys Leu Val Lys Lys

1055

<210> SEQ ID NO 11

<211> LENGTH: 1409

<212> TYPE: PRT

<213> ORGANISM: Wolinella succinogenes

<400> SEQUENCE: 11

Met Leu Val Ser Pro Ile Ser Val Asp Leu Gly Gly Lys Asn Thr Gly

1 5 10 15

Phe Phe Ser Phe Thr Asp Ser Leu Asp Asn Ser Gln Ser Gly Thr Val

20 25 30

Ile Tyr Asp Glu Ser Phe Val Leu Ser Gln Val Gly Arg Arg Ser Lys

35 40 45

Arg His Ser Lys Arg Asn Asn Leu Arg Asn Lys Leu Val Lys Arg Leu

50 55 60

Phe Leu Leu Ile Leu Gln Glu His His Gly Leu Ser Ile Asp Val Leu

65 70 75 80

Pro Asp Glu Ile Arg Gly Leu Phe Asn Lys Arg Gly Tyr Thr Tyr Ala

85 90 95

Gly Phe Glu Leu Asp Glu Lys Lys Lys Asp Ala Leu Glu Ser Asp Thr

100 105 110

Leu Lys Glu Phe Leu Ser Glu Lys Leu Gln Ser Ile Asp Arg Asp Ser

115 120 125

Asp Val Glu Asp Phe Leu Asn Gln Ile Ala Ser Asn Ala Glu Ser Phe

130 135 140

Lys Asp Tyr Lys Lys Gly Phe Glu Ala Val Phe Ala Ser Ala Thr His

145 150 155 160

Ser Pro Asn Lys Lys Leu Glu Leu Lys Asp Glu Leu Lys Ser Glu Tyr

165 170 175

Gly Glu Asn Ala Lys Glu Leu Leu Ala Gly Leu Arg Val Thr Lys Glu

180 185 190

Ile Leu Asp Glu Phe Asp Lys Gln Glu Asn Gln Gly Asn Leu Pro Arg

195 200 205

Ala Lys Tyr Phe Glu Glu Leu Gly Glu Tyr Ile Ala Thr Asn Glu Lys

210 215 220

Val Lys Ser Phe Phe Asp Ser Asn Ser Leu Lys Leu Thr Asp Met Thr

225 230 235 240

Lys Leu Ile Gly Asn Ile Ser Asn Tyr Gln Leu Lys Glu Leu Arg Arg

245 250 255

Tyr Phe Asn Asp Lys Glu Met Glu Lys Gly Asp Ile Trp Ile Pro Asn

260 265 270

Lys Leu His Lys Ile Thr Glu Arg Phe Val Arg Ser Trp His Pro Lys

275 280 285

Asn Asp Ala Asp Arg Gln Arg Arg Ala Glu Leu Met Lys Asp Leu Lys

290 295 300

Ser Lys Glu Ile Met Glu Leu Leu Thr Thr Thr Glu Pro Val Met Thr

305 310 315 320

Ile Pro Pro Tyr Asp Asp Met Asn Asn Arg Gly Ala Val Lys Cys Gln

325 330 335

Thr Leu Arg Leu Asn Glu Glu Tyr Leu Asp Lys His Leu Pro Asn Trp

340 345 350

Arg Asp Ile Ala Lys Arg Leu Asn His Gly Lys Phe Asn Asp Asp Leu

355 360 365

Ala Asp Ser Thr Val Lys Gly Tyr Ser Glu Asp Ser Thr Leu Leu His

370 375 380

Arg Leu Leu Asp Thr Ser Lys Glu Ile Asp Ile Tyr Glu Leu Arg Gly

385 390 395 400

Lys Lys Pro Asn Glu Leu Leu Val Lys Thr Leu Gly Gln Ser Asp Ala

405 410 415

Asn Arg Leu Tyr Gly Phe Ala Gln Asn Tyr Tyr Glu Leu Ile Arg Gln

420 425 430

Lys Val Arg Ala Gly Ile Trp Val Pro Val Lys Asn Lys Asp Asp Ser

435 440 445

Leu Asn Leu Glu Asp Asn Ser Asn Met Leu Lys Arg Cys Asn His Asn

450 455 460

Pro Pro His Lys Lys Asn Gln Ile His Asn Leu Val Ala Gly Ile Leu

465 470 475 480

Gly Val Lys Leu Asp Glu Ala Lys Phe Ala Glu Phe Glu Lys Glu Leu

485 490 495

Trp Ser Ala Lys Val Gly Asn Lys Lys Leu Ser Ala Tyr Cys Lys Asn

500 505 510

Ile Glu Glu Leu Arg Lys Thr His Gly Asn Thr Phe Lys Ile Asp Ile

515 520 525

Glu Glu Leu Arg Lys Lys Asp Pro Ala Glu Leu Ser Lys Glu Glu Lys

530 535 540

Ala Lys Leu Arg Leu Thr Asp Asp Val Ile Leu Asn Glu Trp Ser Gln

545 550 555 560

Lys Ile Ala Asn Phe Phe Asp Ile Asp Asp Lys His Arg Gln Arg Phe

565 570 575

Asn Asn Leu Phe Ser Met Ala Gln Leu His Thr Val Ile Asp Thr Pro

580 585 590

Arg Ser Gly Phe Ser Ser Thr Cys Lys Arg Cys Thr Ala Glu Asn Arg

595 600 605

Phe Arg Ser Glu Thr Ala Phe Tyr Asn Asp Glu Thr Gly Glu Phe His

610 615 620

Lys Lys Ala Thr Ala Thr Cys Gln Arg Leu Pro Ala Asp Thr Gln Arg

625 630 635 640

Pro Phe Ser Gly Lys Ile Glu Arg Tyr Ile Asp Lys Leu Gly Tyr Glu

645 650 655

Leu Ala Lys Ile Lys Ala Lys Glu Leu Glu Gly Met Glu Ala Lys Glu

660 665 670

Ile Lys Val Pro Ile Ile Leu Glu Gln Asn Ala Phe Glu Tyr Glu Glu

675 680 685

Ser Leu Arg Lys Ser Lys Thr Gly Ser Asn Asp Arg Val Ile Asn Ser

690 695 700

Lys Lys Asp Arg Asp Gly Lys Lys Leu Ala Lys Ala Lys Glu Asn Ala

705 710 715 720

Glu Asp Arg Leu Lys Asp Lys Asp Lys Arg Ile Lys Ala Phe Ser Ser

725 730 735

Gly Ile Cys Pro Tyr Cys Gly Asp Thr Ile Gly Asp Asp Gly Glu Ile

740 745 750

Asp His Ile Leu Pro Arg Ser His Thr Leu Lys Ile Tyr Gly Thr Val

755 760 765

Phe Asn Pro Glu Gly Asn Leu Ile Tyr Val His Gln Lys Cys Asn Gln

770 775 780

Ala Lys Ala Asp Ser Ile Tyr Lys Leu Ser Asp Ile Lys Ala Gly Val

785 790 795 800

Ser Ala Gln Trp Ile Glu Glu Gln Val Ala Asn Ile Lys Gly Tyr Lys

805 810 815

Thr Phe Ser Val Leu Ser Ala Glu Gln Gln Lys Ala Phe Arg Tyr Ala

820 825 830

Leu Phe Leu Gln Asn Asp Asn Glu Ala Tyr Lys Lys Val Val Asp Trp

835 840 845

Leu Arg Thr Asp Gln Ser Ala Arg Val Asn Gly Thr Gln Lys Tyr Leu

850 855 860

Ala Lys Lys Ile Gln Glu Lys Leu Thr Lys Met Leu Pro Asn Lys His

865 870 875 880

Leu Ser Phe Glu Phe Ile Leu Ala Asp Ala Thr Glu Val Ser Glu Leu

885 890 895

Arg Arg Gln Tyr Ala Arg Gln Asn Pro Leu Leu Ala Lys Ala Glu Lys

900 905 910

Gln Ala Pro Ser Ser His Ala Ile Asp Ala Val Met Ala Phe Val Ala

915 920 925

Arg Tyr Gln Lys Val Phe Lys Asp Gly Thr Pro Pro Asn Ala Asp Glu

930 935 940

Val Ala Lys Leu Ala Met Leu Asp Ser Trp Asn Pro Ala Ser Asn Glu

945 950 955 960

Pro Leu Thr Lys Gly Leu Ser Thr Asn Gln Lys Ile Glu Lys Met Ile

965 970 975

Lys Ser Gly Asp Tyr Gly Gln Lys Asn Met Arg Glu Val Phe Gly Lys

980 985 990

Ser Ile Phe Gly Glu Asn Ala Ile Gly Glu Arg Tyr Lys Pro Ile Val

995 1000 1005

Val Gln Glu Gly Gly Tyr Tyr Ile Gly Tyr Pro Ala Thr Val Lys

1010 1015 1020

Lys Gly Tyr Glu Leu Lys Asn Cys Lys Val Val Thr Ser Lys Asn

1025 1030 1035

Asp Ile Ala Lys Leu Glu Lys Ile Ile Lys Asn Gln Asp Leu Ile

1040 1045 1050

Ser Leu Lys Glu Asn Gln Tyr Ile Lys Ile Phe Ser Ile Asn Lys

1055 1060 1065

Gln Thr Ile Ser Glu Leu Ser Asn Arg Tyr Phe Asn Met Asn Tyr

1070 1075 1080

Lys Asn Leu Val Glu Arg Asp Lys Glu Ile Val Gly Leu Leu Glu

1085 1090 1095

Phe Ile Val Glu Asn Cys Arg Tyr Tyr Thr Lys Lys Val Asp Val

1100 1105 1110

Lys Phe Ala Pro Lys Tyr Ile His Glu Thr Lys Tyr Pro Phe Tyr

1115 1120 1125

Asp Asp Trp Arg Arg Phe Asp Glu Ala Trp Arg Tyr Leu Gln Glu

1130 1135 1140

Asn Gln Asn Lys Thr Ser Ser Lys Asp Arg Phe Val Ile Asp Lys

1145 1150 1155

Ser Ser Leu Asn Glu Tyr Tyr Gln Pro Asp Lys Asn Glu Tyr Lys

1160 1165 1170

Leu Asp Val Asp Thr Gln Pro Ile Trp Asp Asp Phe Cys Arg Trp

1175 1180 1185

Tyr Phe Leu Asp Arg Tyr Lys Thr Ala Asn Asp Lys Lys Ser Ile

1190 1195 1200

Arg Ile Lys Ala Arg Lys Thr Phe Ser Leu Leu Ala Glu Ser Gly

1205 1210 1215

Val Gln Gly Lys Val Phe Arg Ala Lys Arg Lys Ile Pro Thr Gly

1220 1225 1230

Tyr Ala Tyr Gln Ala Leu Pro Met Asp Asn Asn Val Ile Ala Gly

1235 1240 1245

Asp Tyr Ala Asn Ile Leu Leu Glu Ala Asn Ser Lys Thr Leu Ser

1250 1255 1260

Leu Val Pro Lys Ser Gly Ile Ser Ile Glu Lys Gln Leu Asp Lys

1265 1270 1275

Lys Leu Asp Val Ile Lys Lys Thr Asp Val Arg Gly Leu Ala Ile

1280 1285 1290

Asp Asn Asn Ser Phe Phe Asn Ala Asp Phe Asp Thr His Gly Ile

1295 1300 1305

Arg Leu Ile Val Glu Asn Thr Ser Val Lys Val Gly Asn Phe Pro

1310 1315 1320

Ile Ser Ala Ile Asp Lys Ser Ala Lys Arg Met Ile Phe Arg Ala

1325 1330 1335

Leu Phe Glu Lys Glu Lys Gly Lys Arg Lys Lys Lys Thr Thr Ile

1340 1345 1350

Ser Phe Lys Glu Ser Gly Pro Val Gln Asp Tyr Leu Lys Val Phe

1355 1360 1365

Leu Lys Lys Ile Val Lys Ile Gln Leu Arg Thr Asp Gly Ser Ile

1370 1375 1380

Ser Asn Ile Val Val Arg Lys Asn Ala Ala Asp Phe Thr Leu Ser

1385 1390 1395

Phe Arg Ser Glu His Ile Gln Lys Leu Leu Lys

1400 1405

<210> SEQ ID NO 12

<211> LENGTH: 1374

<212> TYPE: PRT

<213> ORGANISM: Fusobacterium nucleatum

<400> SEQUENCE: 12

Met Lys Lys Gln Lys Phe Ser Asp Tyr Tyr Leu Gly Phe Asp Ile Gly

1 5 10 15

Thr Asn Ser Val Gly Trp Cys Val Thr Asp Leu Asp Tyr Asn Val Leu

20 25 30

Arg Phe Asn Lys Lys Asp Met Trp Gly Ser Arg Leu Phe Asp Glu Ala

35 40 45

Lys Thr Ala Ala Glu Arg Arg Val Gln Arg Asn Ser Arg Arg Arg Leu

50 55 60

Lys Arg Arg Lys Trp Arg Leu Asn Leu Leu Glu Glu Ile Phe Ser Asp

65 70 75 80

Glu Ile Met Lys Ile Asp Ser Asn Phe Phe Arg Arg Leu Lys Glu Ser

85 90 95

Ser Leu Trp Leu Glu Asp Lys Asn Ser Lys Glu Lys Phe Thr Leu Phe

100 105 110

Asn Asp Asp Asn Tyr Lys Asp Tyr Asp Phe Tyr Lys Gln Tyr Pro Thr

115 120 125

Ile Phe His Leu Arg Asp Glu Leu Ile Lys Asn Pro Glu Lys Lys Asp

130 135 140

Ile Arg Leu Ile Tyr Leu Ala Leu His Ser Ile Phe Lys Ser Arg Gly

145 150 155 160

His Phe Leu Phe Glu Gly Gln Asn Leu Lys Glu Ile Lys Asn Phe Glu

165 170 175

Thr Leu Tyr Asn Asn Leu Ile Ser Phe Leu Glu Asp Asn Gly Ile Asn

180 185 190

Lys Ser Ile Asp Lys Asp Asn Ile Glu Lys Leu Glu Lys Ile Ile Cys

195 200 205

Asp Ser Gly Lys Gly Leu Lys Asp Lys Glu Lys Glu Phe Lys Gly Ile

210 215 220

Phe Asn Ser Asp Lys Gln Leu Val Ala Ile Phe Lys Leu Ser Val Gly

225 230 235 240

Ser Ser Val Ser Leu Asn Asp Leu Phe Asp Thr Asp Glu Tyr Lys Lys

245 250 255

Glu Glu Val Glu Lys Glu Lys Ile Ser Phe Arg Glu Gln Ile Tyr Glu

260 265 270

Asp Asp Lys Pro Ile Tyr Tyr Ser Ile Leu Gly Glu Lys Ile Glu Leu

275 280 285

Leu Asp Ile Ala Lys Ser Phe Tyr Asp Phe Met Val Leu Asn Asn Ile

290 295 300

Leu Ser Asp Ser Asn Tyr Ile Ser Glu Ala Lys Val Lys Leu Tyr Glu

305 310 315 320

Glu His Lys Lys Asp Leu Lys Asn Leu Lys Tyr Ile Ile Arg Lys Tyr

325 330 335

Asn Lys Glu Asn Tyr Asp Lys Leu Phe Lys Asp Lys Asn Glu Asn Asn

340 345 350

Tyr Pro Ala Tyr Ile Gly Leu Asn Lys Glu Lys Asp Lys Lys Glu Val

355 360 365

Val Glu Lys Ser Arg Leu Lys Ile Asp Asp Leu Ile Lys Val Ile Lys

370 375 380

Gly Tyr Leu Pro Lys Pro Glu Arg Ile Glu Glu Lys Asp Lys Thr Ile

385 390 395 400

Phe Asn Glu Ile Leu Asn Lys Ile Glu Leu Lys Thr Ile Leu Pro Lys

405 410 415

Gln Arg Ile Ser Asp Asn Gly Thr Leu Pro Tyr Gln Ile His Glu Val

420 425 430

Glu Leu Glu Lys Ile Leu Glu Asn Gln Ser Lys Tyr Tyr Asp Phe Leu

435 440 445

Asn Tyr Glu Glu Asn Gly Val Ser Thr Lys Asp Lys Leu Leu Lys Thr

450 455 460

Phe Lys Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Asn Ser Tyr His

465 470 475 480

Lys Asp Lys Gly Gly Asn Ser Trp Ile Val Arg Lys Glu Glu Gly Lys

485 490 495

Ile Leu Pro Trp Asn Phe Glu Gln Lys Val Asp Ile Glu Lys Ser Ala

500 505 510

Glu Glu Phe Ile Lys Arg Met Thr Asn Lys Cys Thr Tyr Leu Asn Gly

515 520 525

Glu Asp Val Ile Pro Lys Asp Ser Phe Leu Tyr Ser Glu Tyr Ile Ile

530 535 540

Leu Asn Glu Leu Asn Lys Val Gln Val Asn Asp Glu Phe Leu Asn Glu

545 550 555 560

Glu Asn Lys Arg Lys Ile Ile Asp Glu Leu Phe Lys Glu Asn Lys Lys

565 570 575

Val Ser Glu Lys Lys Phe Lys Glu Tyr Leu Leu Val Asn Gln Ile Ala

580 585 590

Asn Arg Thr Val Glu Leu Lys Gly Ile Lys Asp Ser Phe Asn Ser Asn

595 600 605

Tyr Val Ser Tyr Ile Lys Phe Lys Asp Ile Phe Gly Glu Lys Leu Asn

610 615 620

Leu Asp Ile Tyr Lys Glu Ile Ser Glu Lys Ser Ile Leu Trp Lys Cys

625 630 635 640

Leu Tyr Gly Asp Asp Lys Lys Ile Phe Glu Lys Lys Ile Lys Asn Glu

645 650 655

Tyr Gly Asp Ile Leu Asn Lys Asp Glu Ile Lys Lys Ile Asn Ser Phe

660 665 670

Lys Phe Asn Thr Trp Gly Arg Leu Ser Glu Lys Leu Leu Thr Gly Ile

675 680 685

Glu Phe Ile Asn Leu Glu Thr Gly Glu Cys Tyr Ser Ser Val Met Glu

690 695 700

Ala Leu Arg Arg Thr Asn Tyr Asn Leu Met Glu Leu Leu Ser Ser Lys

705 710 715 720

Phe Thr Leu Gln Glu Ser Ile Asp Asn Glu Asn Lys Glu Met Asn Glu

725 730 735

Val Ser Tyr Arg Asp Leu Ile Glu Glu Ser Tyr Val Ser Pro Ser Leu

740 745 750

Lys Arg Ala Ile Leu Gln Thr Leu Lys Ile Tyr Glu Glu Ile Lys Lys

755 760 765

Ile Thr Gly Arg Val Pro Lys Lys Val Phe Ile Glu Met Ala Arg Gly

770 775 780

Gly Asp Glu Ser Met Lys Asn Lys Lys Ile Pro Ala Arg Gln Glu Gln

785 790 795 800

Leu Lys Lys Leu Tyr Asp Ser Cys Gly Asn Asp Ile Ala Asn Phe Ser

805 810 815

Ile Asp Ile Lys Glu Met Lys Asn Ser Leu Ser Ser Tyr Asp Asn Asn

820 825 830

Ser Leu Arg Gln Lys Lys Leu Tyr Leu Tyr Tyr Leu Gln Phe Gly Lys

835 840 845

Cys Met Tyr Thr Gly Arg Glu Ile Asp Leu Asp Arg Leu Leu Gln Asn

850 855 860

Asn Asp Thr Tyr Asp Ile Asp His Ile Tyr Pro Arg Ser Lys Val Ile

865 870 875 880

Lys Asp Asp Ser Phe Asp Asn Leu Val Leu Val Leu Lys Asn Glu Asn

885 890 895

Ala Glu Lys Ser Asn Glu Tyr Pro Val Lys Lys Glu Ile Gln Glu Lys

900 905 910

Met Lys Ser Phe Trp Arg Phe Leu Lys Glu Lys Asn Phe Ile Ser Asp

915 920 925

Glu Lys Tyr Lys Arg Leu Thr Gly Lys Asp Asp Phe Glu Leu Arg Gly

930 935 940

Phe Met Ala Arg Gln Leu Val Asn Val Arg Gln Thr Thr Lys Glu Val

945 950 955 960

Gly Lys Ile Leu Gln Gln Ile Glu Pro Glu Ile Lys Ile Val Tyr Ser

965 970 975

Lys Ala Glu Ile Ala Ser Ser Phe Arg Glu Met Phe Asp Phe Ile Lys

980 985 990

Val Arg Glu Leu Asn Asp Thr His His Ala Lys Asp Ala Tyr Leu Asn

995 1000 1005

Ile Val Ala Gly Asn Val Tyr Asn Thr Lys Phe Thr Glu Lys Pro

1010 1015 1020

Tyr Arg Tyr Leu Gln Glu Ile Lys Glu Asn Tyr Asp Val Lys Lys

1025 1030 1035

Ile Tyr Asn Tyr Asp Ile Lys Asn Ala Trp Asp Lys Glu Asn Ser

1040 1045 1050

Leu Glu Ile Val Lys Lys Asn Met Glu Lys Asn Thr Val Asn Ile

1055 1060 1065

Thr Arg Phe Ile Lys Glu Glu Lys Gly Glu Leu Phe Asn Leu Asn

1070 1075 1080

Pro Ile Lys Lys Gly Glu Thr Ser Asn Glu Ile Ile Ser Ile Lys

1085 1090 1095

Pro Lys Leu Tyr Asp Gly Lys Asp Asn Lys Leu Asn Glu Lys Tyr

1100 1105 1110

Gly Tyr Tyr Thr Ser Leu Lys Ala Ala Tyr Phe Ile Tyr Val Glu

1115 1120 1125

His Glu Lys Lys Asn Lys Lys Val Lys Thr Phe Glu Arg Ile Thr

1130 1135 1140

Arg Ile Asp Ser Thr Leu Ile Lys Asn Glu Lys Asn Leu Ile Lys

1145 1150 1155

Tyr Leu Val Ser Gln Lys Lys Leu Leu Asn Pro Lys Ile Ile Lys

1160 1165 1170

Lys Ile Tyr Lys Glu Gln Thr Leu Ile Ile Asp Ser Tyr Pro Tyr

1175 1180 1185

Thr Phe Thr Gly Val Asp Ser Asn Lys Lys Val Glu Leu Lys Asn

1190 1195 1200

Lys Lys Gln Leu Tyr Leu Glu Lys Lys Tyr Glu Gln Ile Leu Lys

1205 1210 1215

Asn Ala Leu Lys Phe Val Glu Asp Asn Gln Gly Glu Thr Glu Glu

1220 1225 1230

Asn Tyr Lys Phe Ile Tyr Leu Lys Lys Arg Asn Asn Asn Glu Lys

1235 1240 1245

Asn Glu Thr Ile Asp Ala Val Lys Glu Arg Tyr Asn Ile Glu Phe

1250 1255 1260

Asn Glu Met Tyr Asp Lys Phe Leu Glu Lys Leu Ser Ser Lys Asp

1265 1270 1275

Tyr Lys Asn Tyr Ile Asn Asn Lys Leu Tyr Thr Asn Phe Leu Asn

1280 1285 1290

Ser Lys Glu Lys Phe Lys Lys Leu Lys Leu Trp Glu Lys Ser Leu

1295 1300 1305

Ile Leu Arg Glu Phe Leu Lys Ile Phe Asn Lys Asn Thr Tyr Gly

1310 1315 1320

Lys Tyr Glu Ile Lys Asp Ser Gln Thr Lys Glu Lys Leu Phe Ser

1325 1330 1335

Phe Pro Glu Asp Thr Gly Arg Ile Arg Leu Gly Gln Ser Ser Leu

1340 1345 1350

Gly Asn Asn Lys Glu Leu Leu Glu Glu Ser Val Thr Gly Leu Phe

1355 1360 1365

Val Lys Lys Ile Lys Leu

1370

<210> SEQ ID NO 13

<211> LENGTH: 1084

<212> TYPE: PRT

<213> ORGANISM: Corynebacterium diphtheriae

<400> SEQUENCE: 13

Met Lys Tyr His Val Gly Ile Asp Val Gly Thr Phe Ser Val Gly Leu

1 5 10 15

Ala Ala Ile Glu Val Asp Asp Ala Gly Met Pro Ile Lys Thr Leu Ser

20 25 30

Leu Val Ser His Ile His Asp Ser Gly Leu Asp Pro Asp Glu Ile Lys

35 40 45

Ser Ala Val Thr Arg Leu Ala Ser Ser Gly Ile Ala Arg Arg Thr Arg

50 55 60

Arg Leu Tyr Arg Arg Lys Arg Arg Arg Leu Gln Gln Leu Asp Lys Phe

65 70 75 80

Ile Gln Arg Gln Gly Trp Pro Val Ile Glu Leu Glu Asp Tyr Ser Asp

85 90 95

Pro Leu Tyr Pro Trp Lys Val Arg Ala Glu Leu Ala Ala Ser Tyr Ile

100 105 110

Ala Asp Glu Lys Glu Arg Gly Glu Lys Leu Ser Val Ala Leu Arg His

115 120 125

Ile Ala Arg His Arg Gly Trp Arg Asn Pro Tyr Ala Lys Val Ser Ser

130 135 140

Leu Tyr Leu Pro Asp Gly Pro Ser Asp Ala Phe Lys Ala Ile Arg Glu

145 150 155 160

Glu Ile Lys Arg Ala Ser Gly Gln Pro Val Pro Glu Thr Ala Thr Val

165 170 175

Gly Gln Met Val Thr Leu Cys Glu Leu Gly Thr Leu Lys Leu Arg Gly

180 185 190

Glu Gly Gly Val Leu Ser Ala Arg Leu Gln Gln Ser Asp Tyr Ala Arg

195 200 205

Glu Ile Gln Glu Ile Cys Arg Met Gln Glu Ile Gly Gln Glu Leu Tyr

210 215 220

Arg Lys Ile Ile Asp Val Val Phe Ala Ala Glu Ser Pro Lys Gly Ser

225 230 235 240

Ala Ser Ser Arg Val Gly Lys Asp Pro Leu Gln Pro Gly Lys Asn Arg

245 250 255

Ala Leu Lys Ala Ser Asp Ala Phe Gln Arg Tyr Arg Ile Ala Ala Leu

260 265 270

Ile Gly Asn Leu Arg Val Arg Val Asp Gly Glu Lys Arg Ile Leu Ser

275 280 285

Val Glu Glu Lys Asn Leu Val Phe Asp His Leu Val Asn Leu Thr Pro

290 295 300

Lys Lys Glu Pro Glu Trp Val Thr Ile Ala Glu Ile Leu Gly Ile Asp

305 310 315 320

Arg Gly Gln Leu Ile Gly Thr Ala Thr Met Thr Asp Asp Gly Glu Arg

325 330 335

Ala Gly Ala Arg Pro Pro Thr His Asp Thr Asn Arg Ser Ile Val Asn

340 345 350

Ser Arg Ile Ala Pro Leu Val Asp Trp Trp Lys Thr Ala Ser Ala Leu

355 360 365

Glu Gln His Ala Met Val Lys Ala Leu Ser Asn Ala Glu Val Asp Asp

370 375 380

Phe Asp Ser Pro Glu Gly Ala Lys Val Gln Ala Phe Phe Ala Asp Leu

385 390 395 400

Asp Asp Asp Val His Ala Lys Leu Asp Ser Leu His Leu Pro Val Gly

405 410 415

Arg Ala Ala Tyr Ser Glu Asp Thr Leu Val Arg Leu Thr Arg Arg Met

420 425 430

Leu Ser Asp Gly Val Asp Leu Tyr Thr Ala Arg Leu Gln Glu Phe Gly

435 440 445

Ile Glu Pro Ser Trp Thr Pro Pro Thr Pro Arg Ile Gly Glu Pro Val

450 455 460

Gly Asn Pro Ala Val Asp Arg Val Leu Lys Thr Val Ser Arg Trp Leu

465 470 475 480

Glu Ser Ala Thr Lys Thr Trp Gly Ala Pro Glu Arg Val Ile Ile Glu

485 490 495

His Val Arg Glu Gly Phe Val Thr Glu Lys Arg Ala Arg Glu Met Asp

500 505 510

Gly Asp Met Arg Arg Arg Ala Ala Arg Asn Ala Lys Leu Phe Gln Glu

515 520 525

Met Gln Glu Lys Leu Asn Val Gln Gly Lys Pro Ser Arg Ala Asp Leu

530 535 540

Trp Arg Tyr Gln Ser Val Gln Arg Gln Asn Cys Gln Cys Ala Tyr Cys

545 550 555 560

Gly Ser Pro Ile Thr Phe Ser Asn Ser Glu Met Asp His Ile Val Pro

565 570 575

Arg Ala Gly Gln Gly Ser Thr Asn Thr Arg Glu Asn Leu Val Ala Val

580 585 590

Cys His Arg Cys Asn Gln Ser Lys Gly Asn Thr Pro Phe Ala Ile Trp

595 600 605

Ala Lys Asn Thr Ser Ile Glu Gly Val Ser Val Lys Glu Ala Val Glu

610 615 620

Arg Thr Arg His Trp Val Thr Asp Thr Gly Met Arg Ser Thr Asp Phe

625 630 635 640

Lys Lys Phe Thr Lys Ala Val Val Glu Arg Phe Gln Arg Ala Thr Met

645 650 655

Asp Glu Glu Ile Asp Ala Arg Ser Met Glu Ser Val Ala Trp Met Ala

660 665 670

Asn Glu Leu Arg Ser Arg Val Ala Gln His Phe Ala Ser His Gly Thr

675 680 685

Thr Val Arg Val Tyr Arg Gly Ser Leu Thr Ala Glu Ala Arg Arg Ala

690 695 700

Ser Gly Ile Ser Gly Lys Leu Lys Phe Phe Asp Gly Val Gly Lys Ser

705 710 715 720

Arg Leu Asp Arg Arg His His Ala Ile Asp Ala Ala Val Ile Ala Phe

725 730 735

Thr Ser Asp Tyr Val Ala Glu Thr Leu Ala Val Arg Ser Asn Leu Lys

740 745 750

Gln Ser Gln Ala His Arg Gln Glu Ala Pro Gln Trp Arg Glu Phe Thr

755 760 765

Gly Lys Asp Ala Glu His Arg Ala Ala Trp Arg Val Trp Cys Gln Lys

770 775 780

Met Glu Lys Leu Ser Ala Leu Leu Thr Glu Asp Leu Arg Asp Asp Arg

785 790 795 800

Val Val Val Met Ser Asn Val Arg Leu Arg Leu Gly Asn Gly Ser Ala

805 810 815

His Lys Glu Thr Ile Gly Lys Leu Ser Lys Val Lys Leu Ser Ser Gln

820 825 830

Leu Ser Val Ser Asp Ile Asp Lys Ala Ser Ser Glu Ala Leu Trp Cys

835 840 845

Ala Leu Thr Arg Glu Pro Gly Phe Asp Pro Lys Glu Gly Leu Pro Ala

850 855 860

Asn Pro Glu Arg His Ile Arg Val Asn Gly Thr His Val Tyr Ala Gly

865 870 875 880

Asp Asn Ile Gly Leu Phe Pro Val Ser Ala Gly Ser Ile Ala Leu Arg

885 890 895

Gly Gly Tyr Ala Glu Leu Gly Ser Ser Phe His His Ala Arg Val Tyr

900 905 910

Lys Ile Thr Ser Gly Lys Lys Pro Ala Phe Ala Met Leu Arg Val Tyr

915 920 925

Thr Ile Asp Leu Leu Pro Tyr Arg Asn Gln Asp Leu Phe Ser Val Glu

930 935 940

Leu Lys Pro Gln Thr Met Ser Met Arg Gln Ala Glu Lys Lys Leu Arg

945 950 955 960

Asp Ala Leu Ala Thr Gly Asn Ala Glu Tyr Leu Gly Trp Leu Val Val

965 970 975

Asp Asp Glu Leu Val Val Asp Thr Ser Lys Ile Ala Thr Asp Gln Val

980 985 990

Lys Ala Val Glu Ala Glu Leu Gly Thr Ile Arg Arg Trp Arg Val Asp

995 1000 1005

Gly Phe Phe Ser Pro Ser Lys Leu Arg Leu Arg Pro Leu Gln Met

1010 1015 1020

Ser Lys Glu Gly Ile Lys Lys Glu Ser Ala Pro Glu Leu Ser Lys

1025 1030 1035

Ile Ile Asp Arg Pro Gly Trp Leu Pro Ala Val Asn Lys Leu Phe

1040 1045 1050

Ser Asp Gly Asn Val Thr Val Val Arg Arg Asp Ser Leu Gly Arg

1055 1060 1065

Val Arg Leu Glu Ser Thr Ala His Leu Pro Val Thr Trp Lys Val

1070 1075 1080

Gln

<210> SEQ ID NO 14

<211> LENGTH: 1395

<212> TYPE: PRT

<213> ORGANISM: Treponema denticola

<400> SEQUENCE: 14

Met Lys Lys Glu Ile Lys Asp Tyr Phe Leu Gly Leu Asp Val Gly Thr

1 5 10 15

Gly Ser Val Gly Trp Ala Val Thr Asp Thr Asp Tyr Lys Leu Leu Lys

20 25 30

Ala Asn Arg Lys Asp Leu Trp Gly Met Arg Cys Phe Glu Thr Ala Glu

35 40 45

Thr Ala Glu Val Arg Arg Leu His Arg Gly Ala Arg Arg Arg Ile Glu

50 55 60

Arg Arg Lys Lys Arg Ile Lys Leu Leu Gln Glu Leu Phe Ser Gln Glu

65 70 75 80

Ile Ala Lys Thr Asp Glu Gly Phe Phe Gln Arg Met Lys Glu Ser Pro

85 90 95

Phe Tyr Ala Glu Asp Lys Thr Ile Leu Gln Glu Asn Thr Leu Phe Asn

100 105 110

Asp Lys Asp Phe Ala Asp Lys Thr Tyr His Lys Ala Tyr Pro Thr Ile

115 120 125

Asn His Leu Ile Lys Ala Trp Ile Glu Asn Lys Val Lys Pro Asp Pro

130 135 140

Arg Leu Leu Tyr Leu Ala Cys His Asn Ile Ile Lys Lys Arg Gly His

145 150 155 160

Phe Leu Phe Glu Gly Asp Phe Asp Ser Glu Asn Gln Phe Asp Thr Ser

165 170 175

Ile Gln Ala Leu Phe Glu Tyr Leu Arg Glu Asp Met Glu Val Asp Ile

180 185 190

Asp Ala Asp Ser Gln Lys Val Lys Glu Ile Leu Lys Asp Ser Ser Leu

195 200 205

Lys Asn Ser Glu Lys Gln Ser Arg Leu Asn Lys Ile Leu Gly Leu Lys

210 215 220

Pro Ser Asp Lys Gln Lys Lys Ala Ile Thr Asn Leu Ile Ser Gly Asn

225 230 235 240

Lys Ile Asn Phe Ala Asp Leu Tyr Asp Asn Pro Asp Leu Lys Asp Ala

245 250 255

Glu Lys Asn Ser Ile Ser Phe Ser Lys Asp Asp Phe Asp Ala Leu Ser

260 265 270

Asp Asp Leu Ala Ser Ile Leu Gly Asp Ser Phe Glu Leu Leu Leu Lys

275 280 285

Ala Lys Ala Val Tyr Asn Cys Ser Val Leu Ser Lys Val Ile Gly Asp

290 295 300

Glu Gln Tyr Leu Ser Phe Ala Lys Val Lys Ile Tyr Glu Lys His Lys

305 310 315 320

Thr Asp Leu Thr Lys Leu Lys Asn Val Ile Lys Lys His Phe Pro Lys

325 330 335

Asp Tyr Lys Lys Val Phe Gly Tyr Asn Lys Asn Glu Lys Asn Asn Asn

340 345 350

Asn Tyr Ser Gly Tyr Val Gly Val Cys Lys Thr Lys Ser Lys Lys Leu

355 360 365

Ile Ile Asn Asn Ser Val Asn Gln Glu Asp Phe Tyr Lys Phe Leu Lys

370 375 380

Thr Ile Leu Ser Ala Lys Ser Glu Ile Lys Glu Val Asn Asp Ile Leu

385 390 395 400

Thr Glu Ile Glu Thr Gly Thr Phe Leu Pro Lys Gln Ile Ser Lys Ser

405 410 415

Asn Ala Glu Ile Pro Tyr Gln Leu Arg Lys Met Glu Leu Glu Lys Ile

420 425 430

Leu Ser Asn Ala Glu Lys His Phe Ser Phe Leu Lys Gln Lys Asp Glu

435 440 445

Lys Gly Leu Ser His Ser Glu Lys Ile Ile Met Leu Leu Thr Phe Lys

450 455 460

Ile Pro Tyr Tyr Ile Gly Pro Ile Asn Asp Asn His Lys Lys Phe Phe

465 470 475 480

Pro Asp Arg Cys Trp Val Val Lys Lys Glu Lys Ser Pro Ser Gly Lys

485 490 495

Thr Thr Pro Trp Asn Phe Phe Asp His Ile Asp Lys Glu Lys Thr Ala

500 505 510

Glu Ala Phe Ile Thr Ser Arg Thr Asn Phe Cys Thr Tyr Leu Val Gly

515 520 525

Glu Ser Val Leu Pro Lys Ser Ser Leu Leu Tyr Ser Glu Tyr Thr Val

530 535 540

Leu Asn Glu Ile Asn Asn Leu Gln Ile Ile Ile Asp Gly Lys Asn Ile

545 550 555 560

Cys Asp Ile Lys Leu Lys Gln Lys Ile Tyr Glu Asp Leu Phe Lys Lys

565 570 575

Tyr Lys Lys Ile Thr Gln Lys Gln Ile Ser Thr Phe Ile Lys His Glu

580 585 590

Gly Ile Cys Asn Lys Thr Asp Glu Val Ile Ile Leu Gly Ile Asp Lys

595 600 605

Glu Cys Thr Ser Ser Leu Lys Ser Tyr Ile Glu Leu Lys Asn Ile Phe

610 615 620

Gly Lys Gln Val Asp Glu Ile Ser Thr Lys Asn Met Leu Glu Glu Ile

625 630 635 640

Ile Arg Trp Ala Thr Ile Tyr Asp Glu Gly Glu Gly Lys Thr Ile Leu

645 650 655

Lys Thr Lys Ile Lys Ala Glu Tyr Gly Lys Tyr Cys Ser Asp Glu Gln

660 665 670

Ile Lys Lys Ile Leu Asn Leu Lys Phe Ser Gly Trp Gly Arg Leu Ser

675 680 685

Arg Lys Phe Leu Glu Thr Val Thr Ser Glu Met Pro Gly Phe Ser Glu

690 695 700

Pro Val Asn Ile Ile Thr Ala Met Arg Glu Thr Gln Asn Asn Leu Met

705 710 715 720

Glu Leu Leu Ser Ser Glu Phe Thr Phe Thr Glu Asn Ile Lys Lys Ile

725 730 735

Asn Ser Gly Phe Glu Asp Ala Glu Lys Gln Phe Ser Tyr Asp Gly Leu

740 745 750

Val Lys Pro Leu Phe Leu Ser Pro Ser Val Lys Lys Met Leu Trp Gln

755 760 765

Thr Leu Lys Leu Val Lys Glu Ile Ser His Ile Thr Gln Ala Pro Pro

770 775 780

Lys Lys Ile Phe Ile Glu Met Ala Lys Gly Ala Glu Leu Glu Pro Ala

785 790 795 800

Arg Thr Lys Thr Arg Leu Lys Ile Leu Gln Asp Leu Tyr Asn Asn Cys

805 810 815

Lys Asn Asp Ala Asp Ala Phe Ser Ser Glu Ile Lys Asp Leu Ser Gly

820 825 830

Lys Ile Glu Asn Glu Asp Asn Leu Arg Leu Arg Ser Asp Lys Leu Tyr

835 840 845

Leu Tyr Tyr Thr Gln Leu Gly Lys Cys Met Tyr Cys Gly Lys Pro Ile

850 855 860

Glu Ile Gly His Val Phe Asp Thr Ser Asn Tyr Asp Ile Asp His Ile

865 870 875 880

Tyr Pro Gln Ser Lys Ile Lys Asp Asp Ser Ile Ser Asn Arg Val Leu

885 890 895

Val Cys Ser Ser Cys Asn Lys Asn Lys Glu Asp Lys Tyr Pro Leu Lys

900 905 910

Ser Glu Ile Gln Ser Lys Gln Arg Gly Phe Trp Asn Phe Leu Gln Arg

915 920 925

Asn Asn Phe Ile Ser Leu Glu Lys Leu Asn Arg Leu Thr Arg Ala Thr

930 935 940

Pro Ile Ser Asp Asp Glu Thr Ala Lys Phe Ile Ala Arg Gln Leu Val

945 950 955 960

Glu Thr Arg Gln Ala Thr Lys Val Ala Ala Lys Val Leu Glu Lys Met

965 970 975

Phe Pro Glu Thr Lys Ile Val Tyr Ser Lys Ala Glu Thr Val Ser Met

980 985 990

Phe Arg Asn Lys Phe Asp Ile Val Lys Cys Arg Glu Ile Asn Asp Phe

995 1000 1005

His His Ala His Asp Ala Tyr Leu Asn Ile Val Val Gly Asn Val

1010 1015 1020

Tyr Asn Thr Lys Phe Thr Asn Asn Pro Trp Asn Phe Ile Lys Glu

1025 1030 1035

Lys Arg Asp Asn Pro Lys Ile Ala Asp Thr Tyr Asn Tyr Tyr Lys

1040 1045 1050

Val Phe Asp Tyr Asp Val Lys Arg Asn Asn Ile Thr Ala Trp Glu

1055 1060 1065

Lys Gly Lys Thr Ile Ile Thr Val Lys Asp Met Leu Lys Arg Asn

1070 1075 1080

Thr Pro Ile Tyr Thr Arg Gln Ala Ala Cys Lys Lys Gly Glu Leu

1085 1090 1095

Phe Asn Gln Thr Ile Met Lys Lys Gly Leu Gly Gln His Pro Leu

1100 1105 1110

Lys Lys Glu Gly Pro Phe Ser Asn Ile Ser Lys Tyr Gly Gly Tyr

1115 1120 1125

Asn Lys Val Ser Ala Ala Tyr Tyr Thr Leu Ile Glu Tyr Glu Glu

1130 1135 1140

Lys Gly Asn Lys Ile Arg Ser Leu Glu Thr Ile Pro Leu Tyr Leu

1145 1150 1155

Val Lys Asp Ile Gln Lys Asp Gln Asp Val Leu Lys Ser Tyr Leu

1160 1165 1170

Thr Asp Leu Leu Gly Lys Lys Glu Phe Lys Ile Leu Val Pro Lys

1175 1180 1185

Ile Lys Ile Asn Ser Leu Leu Lys Ile Asn Gly Phe Pro Cys His

1190 1195 1200

Ile Thr Gly Lys Thr Asn Asp Ser Phe Leu Leu Arg Pro Ala Val

1205 1210 1215

Gln Phe Cys Cys Ser Asn Asn Glu Val Leu Tyr Phe Lys Lys Ile

1220 1225 1230

Ile Arg Phe Ser Glu Ile Arg Ser Gln Arg Glu Lys Ile Gly Lys

1235 1240 1245

Thr Ile Ser Pro Tyr Glu Asp Leu Ser Phe Arg Ser Tyr Ile Lys

1250 1255 1260

Glu Asn Leu Trp Lys Lys Thr Lys Asn Asp Glu Ile Gly Glu Lys

1265 1270 1275

Glu Phe Tyr Asp Leu Leu Gln Lys Lys Asn Leu Glu Ile Tyr Asp

1280 1285 1290

Met Leu Leu Thr Lys His Lys Asp Thr Ile Tyr Lys Lys Arg Pro

1295 1300 1305

Asn Ser Ala Thr Ile Asp Ile Leu Val Lys Gly Lys Glu Lys Phe

1310 1315 1320

Lys Ser Leu Ile Ile Glu Asn Gln Phe Glu Val Ile Leu Glu Ile

1325 1330 1335

Leu Lys Leu Phe Ser Ala Thr Arg Asn Val Ser Asp Leu Gln His

1340 1345 1350

Ile Gly Gly Ser Lys Tyr Ser Gly Val Ala Lys Ile Gly Asn Lys

1355 1360 1365

Ile Ser Ser Leu Asp Asn Cys Ile Leu Ile Tyr Gln Ser Ile Thr

1370 1375 1380

Gly Ile Phe Glu Lys Arg Ile Asp Leu Leu Lys Val

1385 1390 1395

<210> SEQ ID NO 15

<211> LENGTH: 1334

<212> TYPE: PRT

<213> ORGANISM: Listeria monocytogenes

<400> SEQUENCE: 15

Met Lys Asn Pro Tyr Thr Ile Gly Leu Asp Ile Gly Thr Asn Ser Val

1 5 10 15

Gly Trp Ala Val Leu Thr Asn Gln Tyr Asp Leu Val Lys Arg Lys Met

20 25 30

Lys Val Ala Gly Asn Ser Asp Lys Lys Gln Ile Lys Lys Asn Phe Trp

35 40 45

Gly Val Arg Leu Phe Asp Asp Gly Gln Thr Ala Val Asp Arg Arg Met

50 55 60

Asn Arg Thr Ala Arg Arg Arg Ile Glu Arg Arg Arg Asn Arg Ile Ser

65 70 75 80

Tyr Leu Gln Glu Ile Phe Ala Val Glu Met Ala Asn Ile Asp Ala Asn

85 90 95

Phe Phe Cys Arg Leu Asn Asp Ser Phe Tyr Val Asp Ser Glu Lys Arg

100 105 110

Asn Ser Arg His Pro Phe Phe Ala Thr Ile Glu Glu Glu Val Ala Tyr

115 120 125

His Asp Asn Tyr Arg Thr Ile Tyr His Leu Arg Glu Lys Leu Val Asn

130 135 140

Ser Ser Glu Lys Ala Asp Leu Arg Leu Val Tyr Leu Ala Leu Ala His

145 150 155 160

Ile Ile Lys Tyr Arg Gly Asn Phe Leu Ile Glu Gly Ala Leu Asp Thr

165 170 175

Lys Asn Thr Ser Val Asp Glu Val Tyr Lys Gln Phe Ile Glu Thr Tyr

180 185 190

Asn Gln Val Phe Met Ser Asn Ile Glu Glu Gly Ala Leu Ala Lys Val

195 200 205

Glu Glu Asn Ile Glu Val Ala Asn Ile Leu Ala Gly Lys Phe Thr Arg

210 215 220

Arg Glu Lys Phe Glu Arg Ile Leu Gln Leu Tyr Pro Gly Glu Lys Ser

225 230 235 240

Thr Gly Met Phe Ala Gln Phe Ile Ser Leu Ile Val Gly Ser Lys Gly

245 250 255

Asn Phe Gln Lys Val Phe Asp Leu Ile Glu Lys Thr Asp Ile Glu Cys

260 265 270

Ala Lys Asp Ser Tyr Glu Glu Asp Leu Glu Thr Leu Leu Ala Ile Ile

275 280 285

Gly Asp Glu Tyr Ala Glu Leu Phe Val Ala Ala Lys Asn Thr Tyr Asn

290 295 300

Ala Val Val Leu Ser Ser Ile Ile Thr Val Thr Asp Thr Glu Thr Asn

305 310 315 320

Ala Lys Leu Ser Ala Ser Met Ile Glu Arg Phe Asp Ala His Glu Lys

325 330 335

Asp Leu Val Glu Leu Lys Ala Phe Ile Lys Leu Asn Leu Pro Lys Gln

340 345 350

Tyr Glu Glu Ile Phe Ser Asn Ala Ala Ile Asp Gly Tyr Ala Gly Tyr

355 360 365

Ile Asp Gly Lys Thr Lys Gln Val Asp Phe Tyr Lys Tyr Leu Lys Thr

370 375 380

Ile Leu Glu Asn Ile Glu Gly Ser Asp Tyr Phe Ile Ala Lys Ile Glu

385 390 395 400

Glu Glu Asn Phe Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly Ala Ile

405 410 415

Pro His Gln Leu His Leu Glu Glu Leu Glu Ala Ile Ile His Gln Gln

420 425 430

Ala Lys Tyr Tyr Pro Phe Leu Lys Glu Asp Tyr Asp Lys Ile Lys Ser

435 440 445

Leu Val Thr Phe Arg Ile Pro Tyr Phe Val Gly Pro Leu Ala Asn Gly

450 455 460

Gln Ser Glu Phe Ala Trp Leu Thr Arg Lys Ala Asp Gly Glu Ile Arg

465 470 475 480

Pro Trp Asn Ile Glu Glu Lys Val Asp Phe Gly Lys Ser Ala Val Asp

485 490 495

Phe Ile Glu Lys Met Thr Asn Lys Asp Thr Tyr Leu Pro Lys Glu Asn

500 505 510

Val Leu Pro Lys His Ser Leu Cys Tyr Gln Lys Tyr Met Val Tyr Asn

515 520 525

Glu Leu Thr Lys Ile Arg Tyr Ile Asp Asp Gln Gly Lys Thr Asn Tyr

530 535 540

Phe Ser Gly Arg Glu Lys Gln Gln Val Phe Asn Asp Leu Phe Lys Gln

545 550 555 560

Lys Arg Lys Val Lys Lys Lys Asp Leu Glu Leu Phe Leu Arg Asn Ile

565 570 575

Asn His Ile Glu Ser Pro Thr Ile Glu Gly Leu Glu Asp Ser Phe Asn

580 585 590

Ala Ser Tyr Ala Thr Tyr His Asp Leu Leu Lys Val Gly Met Lys Gln

595 600 605

Glu Ile Leu Asp Asn Pro Leu Asn Thr Glu Met Leu Glu Asp Ile Val

610 615 620

Lys Ile Leu Thr Val Phe Glu Asp Lys Pro Met Ile Lys Glu Gln Leu

625 630 635 640

Gln Gln Phe Ser Asp Val Leu Asp Gly Gly Val Leu Lys Lys Leu Glu

645 650 655

Arg Arg His Tyr Thr Gly Trp Gly Arg Leu Ser Ala Lys Leu Leu Val

660 665 670

Gly Ile Arg Glu Lys Gln Ser His Leu Thr Ile Leu Asp Tyr Leu Met

675 680 685

Asn Asp Asp Gly Leu Asn Arg Asn Leu Met Gln Leu Ile Asn Asp Ser

690 695 700

Asn Leu Ser Phe Lys Ser Ile Ile Glu Lys Glu Gln Val Ser Thr Thr

705 710 715 720

Asp Lys Asp Leu Gln Ser Ile Val Ala Glu Leu Ala Gly Ser Pro Ala

725 730 735

Ile Lys Lys Gly Ile Leu Gln Ser Leu Lys Ile Val Asp Glu Leu Val

740 745 750

Ser Ile Met Gly Tyr Pro Pro Gln Thr Ile Val Val Glu Met Ala Arg

755 760 765

Glu Asn Gln Thr Thr Gly Lys Gly Lys Asn Asn Ser Lys Pro Arg Tyr

770 775 780

Lys Ser Leu Glu Lys Ala Ile Lys Glu Phe Gly Ser Gln Ile Leu Lys

785 790 795 800

Glu His Pro Thr Asp Asn Gln Glu Leu Lys Asn Asn Arg Leu Tyr Leu

805 810 815

Tyr Tyr Leu Gln Asn Gly Lys Asp Met Tyr Thr Gly Gln Glu Leu Asp

820 825 830

Ile His Asn Leu Ser Asn Tyr Asp Ile Asp His Ile Val Pro Gln Ser

835 840 845

Phe Ile Thr Asp Asn Ser Ile Asp Asn Leu Val Leu Thr Ser Ser Ala

850 855 860

Gly Asn Arg Glu Lys Gly Gly Asp Val Pro Pro Leu Glu Ile Val Arg

865 870 875 880

Lys Arg Lys Val Phe Trp Glu Lys Leu Tyr Gln Gly Asn Leu Met Ser

885 890 895

Lys Arg Lys Phe Asp Tyr Leu Thr Lys Ala Glu Arg Gly Gly Leu Thr

900 905 910

Glu Ala Asp Lys Ala Arg Phe Ile His Arg Gln Leu Val Glu Thr Arg

915 920 925

Gln Ile Thr Lys Asn Val Ala Asn Ile Leu Tyr Gln Arg Phe Asn Lys

930 935 940

Glu Thr Asp Asn His Gly Asn Thr Met Glu Gln Val Arg Ile Val Thr

945 950 955 960

Leu Lys Ser Ala Leu Val Ser Gln Phe Arg Lys Gln Phe Gln Leu Tyr

965 970 975

Lys Val Arg Glu Val Asn Gly Tyr His His Ala His Asp Ala Tyr Leu

980 985 990

Asn Gly Val Val Ala Asn Thr Leu Leu Lys Val Tyr Pro Gln Leu Glu

995 1000 1005

Pro Glu Phe Val Tyr Gly Glu Tyr His Gln Phe Asp Trp Phe Lys

1010 1015 1020

Ala Asn Lys Ala Thr Ala Lys Lys Gln Phe Tyr Thr Asn Ile Met

1025 1030 1035

Leu Phe Phe Ala Gln Lys Glu Arg Ile Ile Asp Glu Asn Gly Glu

1040 1045 1050

Ile Leu Trp Asp Lys Lys Tyr Leu Glu Thr Ile Lys Lys Val Leu

1055 1060 1065

Asp Tyr Arg Gln Met Asn Ile Val Lys Lys Thr Glu Ile Gln Lys

1070 1075 1080

Gly Glu Phe Ser Lys Ala Thr Ile Lys Pro Lys Gly Asn Ser Ser

1085 1090 1095

Lys Leu Ile Pro Arg Lys Glu Asn Trp Asp Pro Met Lys Tyr Gly

1100 1105 1110

Gly Leu Asp Ser Pro Asn Met Ala Tyr Ala Val Ile Ile Glu His

1115 1120 1125

Ala Lys Gly Lys Lys Lys Ile Val Ile Glu Lys Lys Leu Ile Gln

1130 1135 1140

Ile Asn Ile Met Glu Arg Lys Met Phe Glu Lys Asp Glu Glu Ala

1145 1150 1155

Phe Leu Glu Glu Lys Gly Tyr Arg His Pro Lys Val Leu Thr Lys

1160 1165 1170

Leu Pro Lys Tyr Thr Leu Tyr Glu Cys Glu Lys Gly Arg Arg Arg

1175 1180 1185

Met Leu Ala Ser Ala Asn Glu Ala Gln Lys Gly Asn Gln Leu Val

1190 1195 1200

Leu Ser Asn His Leu Val Ser Leu Leu Tyr His Ala Lys Asn Cys

1205 1210 1215

Glu Ala Ser Asp Gly Lys Ser Leu Lys Tyr Ile Glu Ala His Arg

1220 1225 1230

Glu Thr Phe Ser Glu Leu Leu Ala Gln Val Ser Glu Phe Ala Thr

1235 1240 1245

Arg Tyr Thr Leu Ala Asp Ala Asn Leu Ser Lys Ile Asn Asn Leu

1250 1255 1260

Phe Glu Gln Asn Lys Glu Gly Asp Ile Lys Ala Ile Ala Gln Ser

1265 1270 1275

Phe Val Asp Leu Met Ala Phe Asn Ala Met Gly Ala Pro Ala Ser

1280 1285 1290

Phe Lys Phe Phe Glu Ala Thr Ile Asp Arg Lys Arg Tyr Thr Asn

1295 1300 1305

Leu Lys Glu Leu Leu Ser Ser Thr Ile Ile Tyr Gln Ser Ile Thr

1310 1315 1320

Gly Leu Tyr Glu Ser Arg Lys Arg Leu Asp Asp

1325 1330

<210> SEQ ID NO 16

<211> LENGTH: 1236

<212> TYPE: PRT

<213> ORGANISM: Mycoplasma mobile

<400> SEQUENCE: 16

Met Tyr Phe Tyr Lys Asn Lys Glu Asn Lys Leu Asn Lys Lys Val Val

1 5 10 15

Leu Gly Leu Asp Leu Gly Ile Ala Ser Val Gly Trp Cys Leu Thr Asp

20 25 30

Ile Ser Gln Lys Glu Asp Asn Lys Phe Pro Ile Ile Leu His Gly Val

35 40 45

Arg Leu Phe Glu Thr Val Asp Asp Ser Asp Asp Lys Leu Leu Asn Glu

50 55 60

Thr Arg Arg Lys Lys Arg Gly Gln Arg Arg Arg Asn Arg Arg Leu Phe

65 70 75 80

Thr Arg Lys Arg Asp Phe Ile Lys Tyr Leu Ile Asp Asn Asn Ile Ile

85 90 95

Glu Leu Glu Phe Asp Lys Asn Pro Lys Ile Leu Val Arg Asn Phe Ile

100 105 110

Glu Lys Tyr Ile Asn Pro Phe Ser Lys Asn Leu Glu Leu Lys Tyr Lys

115 120 125

Ser Val Thr Asn Leu Pro Ile Gly Phe His Asn Leu Arg Lys Ala Ala

130 135 140

Ile Asn Glu Lys Tyr Lys Leu Asp Lys Ser Glu Leu Ile Val Leu Leu

145 150 155 160

Tyr Phe Tyr Leu Ser Leu Arg Gly Ala Phe Phe Asp Asn Pro Glu Asp

165 170 175

Thr Lys Ser Lys Glu Met Asn Lys Asn Glu Ile Glu Ile Phe Asp Lys

180 185 190

Asn Glu Ser Ile Lys Asn Ala Glu Phe Pro Ile Asp Lys Ile Ile Glu

195 200 205

Phe Tyr Lys Ile Ser Gly Lys Ile Arg Ser Thr Ile Asn Leu Lys Phe

210 215 220

Gly His Gln Asp Tyr Leu Lys Glu Ile Lys Gln Val Phe Glu Lys Gln

225 230 235 240

Asn Ile Asp Phe Met Asn Tyr Glu Lys Phe Ala Met Glu Glu Lys Ser

245 250 255

Phe Phe Ser Arg Ile Arg Asn Tyr Ser Glu Gly Pro Gly Asn Glu Lys

260 265 270

Ser Phe Ser Lys Tyr Gly Leu Tyr Ala Asn Glu Asn Gly Asn Pro Glu

275 280 285

Leu Ile Ile Asn Glu Lys Gly Gln Lys Ile Tyr Thr Lys Ile Phe Lys

290 295 300

Thr Leu Trp Glu Ser Lys Ile Gly Lys Cys Ser Tyr Asp Lys Lys Leu

305 310 315 320

Tyr Arg Ala Pro Lys Asn Ser Phe Ser Ala Lys Val Phe Asp Ile Thr

325 330 335

Asn Lys Leu Thr Asp Trp Lys His Lys Asn Glu Tyr Ile Ser Glu Arg

340 345 350

Leu Lys Arg Lys Ile Leu Leu Ser Arg Phe Leu Asn Lys Asp Ser Lys

355 360 365

Ser Ala Val Glu Lys Ile Leu Lys Glu Glu Asn Ile Lys Phe Glu Asn

370 375 380

Leu Ser Glu Ile Ala Tyr Asn Lys Asp Asp Asn Lys Ile Asn Leu Pro

385 390 395 400

Ile Ile Asn Ala Tyr His Ser Leu Thr Thr Ile Phe Lys Lys His Leu

405 410 415

Ile Asn Phe Glu Asn Tyr Leu Ile Ser Asn Glu Asn Asp Leu Ser Lys

420 425 430

Leu Met Ser Phe Tyr Lys Gln Gln Ser Glu Lys Leu Phe Val Pro Asn

435 440 445

Glu Lys Gly Ser Tyr Glu Ile Asn Gln Asn Asn Asn Val Leu His Ile

450 455 460

Phe Asp Ala Ile Ser Asn Ile Leu Asn Lys Phe Ser Thr Ile Gln Asp

465 470 475 480

Arg Ile Arg Ile Leu Glu Gly Tyr Phe Glu Phe Ser Asn Leu Lys Lys

485 490 495

Asp Val Lys Ser Ser Glu Ile Tyr Ser Glu Ile Ala Lys Leu Arg Glu

500 505 510

Phe Ser Gly Thr Ser Ser Leu Ser Phe Gly Ala Tyr Tyr Lys Phe Ile

515 520 525

Pro Asn Leu Ile Ser Glu Gly Ser Lys Asn Tyr Ser Thr Ile Ser Tyr

530 535 540

Glu Glu Lys Ala Leu Gln Asn Gln Lys Asn Asn Phe Ser His Ser Asn

545 550 555 560

Leu Phe Glu Lys Thr Trp Val Glu Asp Leu Ile Ala Ser Pro Thr Val

565 570 575

Lys Arg Ser Leu Arg Gln Thr Met Asn Leu Leu Lys Glu Ile Phe Lys

580 585 590

Tyr Ser Glu Lys Asn Asn Leu Glu Ile Glu Lys Ile Val Val Glu Val

595 600 605

Thr Arg Ser Ser Asn Asn Lys His Glu Arg Lys Lys Ile Glu Gly Ile

610 615 620

Asn Lys Tyr Arg Lys Glu Lys Tyr Glu Glu Leu Lys Lys Val Tyr Asp

625 630 635 640

Leu Pro Asn Glu Asn Thr Thr Leu Leu Lys Lys Leu Trp Leu Leu Arg

645 650 655

Gln Gln Gln Gly Tyr Asp Ala Tyr Ser Leu Arg Lys Ile Glu Ala Asn

660 665 670

Asp Val Ile Asn Lys Pro Trp Asn Tyr Asp Ile Asp His Ile Val Pro

675 680 685

Arg Ser Ile Ser Phe Asp Asp Ser Phe Ser Asn Leu Val Ile Val Asn

690 695 700

Lys Leu Asp Asn Ala Lys Lys Ser Asn Asp Leu Ser Ala Lys Gln Phe

705 710 715 720

Ile Glu Lys Ile Tyr Gly Ile Glu Lys Leu Lys Glu Ala Lys Glu Asn

725 730 735

Trp Gly Asn Trp Tyr Leu Arg Asn Ala Asn Gly Lys Ala Phe Asn Asp

740 745 750

Lys Gly Lys Phe Ile Lys Leu Tyr Thr Ile Asp Asn Leu Asp Glu Phe

755 760 765

Asp Asn Ser Asp Phe Ile Asn Arg Asn Leu Ser Asp Thr Ser Tyr Ile

770 775 780

Thr Asn Ala Leu Val Asn His Leu Thr Phe Ser Asn Ser Lys Tyr Lys

785 790 795 800

Tyr Ser Val Val Ser Val Asn Gly Lys Gln Thr Ser Asn Leu Arg Asn

805 810 815

Gln Ile Ala Phe Val Gly Ile Lys Asn Asn Lys Glu Thr Glu Arg Glu

820 825 830

Trp Lys Arg Pro Glu Gly Phe Lys Ser Ile Asn Ser Asn Asp Phe Leu

835 840 845

Ile Arg Glu Glu Gly Lys Asn Asp Val Lys Asp Asp Val Leu Ile Lys

850 855 860

Asp Arg Ser Phe Asn Gly His His Ala Glu Asp Ala Tyr Phe Ile Thr

865 870 875 880

Ile Ile Ser Gln Tyr Phe Arg Ser Phe Lys Arg Ile Glu Arg Leu Asn

885 890 895

Val Asn Tyr Arg Lys Glu Thr Arg Glu Leu Asp Asp Leu Glu Lys Asn

900 905 910

Asn Ile Lys Phe Lys Glu Lys Ala Ser Phe Asp Asn Phe Leu Leu Ile

915 920 925

Asn Ala Leu Asp Glu Leu Asn Glu Lys Leu Asn Gln Met Arg Phe Ser

930 935 940

Arg Met Val Ile Thr Lys Lys Asn Thr Gln Leu Phe Asn Glu Thr Leu

945 950 955 960

Tyr Ser Gly Lys Tyr Asp Lys Gly Lys Asn Thr Ile Lys Lys Val Glu

965 970 975

Lys Leu Asn Leu Leu Asp Asn Arg Thr Asp Lys Ile Lys Lys Ile Glu

980 985 990

Glu Phe Phe Asp Glu Asp Lys Leu Lys Glu Asn Glu Leu Thr Lys Leu

995 1000 1005

His Ile Phe Asn His Asp Lys Asn Leu Tyr Glu Thr Leu Lys Ile

1010 1015 1020

Ile Trp Asn Glu Val Lys Ile Glu Ile Lys Asn Lys Asn Leu Asn

1025 1030 1035

Glu Lys Asn Tyr Phe Lys Tyr Phe Val Asn Lys Lys Leu Gln Glu

1040 1045 1050

Gly Lys Ile Ser Phe Asn Glu Trp Val Pro Ile Leu Asp Asn Asp

1055 1060 1065

Phe Lys Ile Ile Arg Lys Ile Arg Tyr Ile Lys Phe Ser Ser Glu

1070 1075 1080

Glu Lys Glu Thr Asp Glu Ile Ile Phe Ser Gln Ser Asn Phe Leu

1085 1090 1095

Lys Ile Asp Gln Arg Gln Asn Phe Ser Phe His Asn Thr Leu Tyr

1100 1105 1110

Trp Val Gln Ile Trp Val Tyr Lys Asn Gln Lys Asp Gln Tyr Cys

1115 1120 1125

Phe Ile Ser Ile Asp Ala Arg Asn Ser Lys Phe Glu Lys Asp Glu

1130 1135 1140

Ile Lys Ile Asn Tyr Glu Lys Leu Lys Thr Gln Lys Glu Lys Leu

1145 1150 1155

Gln Ile Ile Asn Glu Glu Pro Ile Leu Lys Ile Asn Lys Gly Asp

1160 1165 1170

Leu Phe Glu Asn Glu Glu Lys Glu Leu Phe Tyr Ile Val Gly Arg

1175 1180 1185

Asp Glu Lys Pro Gln Lys Leu Glu Ile Lys Tyr Ile Leu Gly Lys

1190 1195 1200

Lys Ile Lys Asp Gln Lys Gln Ile Gln Lys Pro Val Lys Lys Tyr

1205 1210 1215

Phe Pro Asn Trp Lys Lys Val Asn Leu Thr Tyr Met Gly Glu Ile

1220 1225 1230

Phe Lys Lys

1235

<210> SEQ ID NO 17

<211> LENGTH: 1372

<212> TYPE: PRT

<213> ORGANISM: Legionella pneumophila

<400> SEQUENCE: 17

Met Glu Ser Ser Gln Ile Leu Ser Pro Ile Gly Ile Asp Leu Gly Gly

1 5 10 15

Lys Phe Thr Gly Val Cys Leu Ser His Leu Glu Ala Phe Ala Glu Leu

20 25 30

Pro Asn His Ala Asn Thr Lys Tyr Ser Val Ile Leu Ile Asp His Asn

35 40 45

Asn Phe Gln Leu Ser Gln Ala Gln Arg Arg Ala Thr Arg His Arg Val

50 55 60

Arg Asn Lys Lys Arg Asn Gln Phe Val Lys Arg Val Ala Leu Gln Leu

65 70 75 80

Phe Gln His Ile Leu Ser Arg Asp Leu Asn Ala Lys Glu Glu Thr Ala

85 90 95

Leu Cys His Tyr Leu Asn Asn Arg Gly Tyr Thr Tyr Val Asp Thr Asp

100 105 110

Leu Asp Glu Tyr Ile Lys Asp Glu Thr Thr Ile Asn Leu Leu Lys Glu

115 120 125

Leu Leu Pro Ser Glu Ser Glu His Asn Phe Ile Asp Trp Phe Leu Gln

130 135 140

Lys Met Gln Ser Ser Glu Phe Arg Lys Ile Leu Val Ser Lys Val Glu

145 150 155 160

Glu Lys Lys Asp Asp Lys Glu Leu Lys Asn Ala Val Lys Asn Ile Lys

165 170 175

Asn Phe Ile Thr Gly Phe Glu Lys Asn Ser Val Glu Gly His Arg His

180 185 190

Arg Lys Val Tyr Phe Glu Asn Ile Lys Ser Asp Ile Thr Lys Asp Asn

195 200 205

Gln Leu Asp Ser Ile Lys Lys Lys Ile Pro Ser Val Cys Leu Ser Asn

210 215 220

Leu Leu Gly His Leu Ser Asn Leu Gln Trp Lys Asn Leu His Arg Tyr

225 230 235 240

Leu Ala Lys Asn Pro Lys Gln Phe Asp Glu Gln Thr Phe Gly Asn Glu

245 250 255

Phe Leu Arg Met Leu Lys Asn Phe Arg His Leu Lys Gly Ser Gln Glu

260 265 270

Ser Leu Ala Val Arg Asn Leu Ile Gln Gln Leu Glu Gln Ser Gln Asp

275 280 285

Tyr Ile Ser Ile Leu Glu Lys Thr Pro Pro Glu Ile Thr Ile Pro Pro

290 295 300

Tyr Glu Ala Arg Thr Asn Thr Gly Met Glu Lys Asp Gln Ser Leu Leu

305 310 315 320

Leu Asn Pro Glu Lys Leu Asn Asn Leu Tyr Pro Asn Trp Arg Asn Leu

325 330 335

Ile Pro Gly Ile Ile Asp Ala His Pro Phe Leu Glu Lys Asp Leu Glu

340 345 350

His Thr Lys Leu Arg Asp Arg Lys Arg Ile Ile Ser Pro Ser Lys Gln

355 360 365

Asp Glu Lys Arg Asp Ser Tyr Ile Leu Gln Arg Tyr Leu Asp Leu Asn

370 375 380

Lys Lys Ile Asp Lys Phe Lys Ile Lys Lys Gln Leu Ser Phe Leu Gly

385 390 395 400

Gln Gly Lys Gln Leu Pro Ala Asn Leu Ile Glu Thr Gln Lys Glu Met

405 410 415

Glu Thr His Phe Asn Ser Ser Leu Val Ser Val Leu Ile Gln Ile Ala

420 425 430

Ser Ala Tyr Asn Lys Glu Arg Glu Asp Ala Ala Gln Gly Ile Trp Phe

435 440 445

Asp Asn Ala Phe Ser Leu Cys Glu Leu Ser Asn Ile Asn Pro Pro Arg

450 455 460

Lys Gln Lys Ile Leu Pro Leu Leu Val Gly Ala Ile Leu Ser Glu Asp

465 470 475 480

Phe Ile Asn Asn Lys Asp Lys Trp Ala Lys Phe Lys Ile Phe Trp Asn

485 490 495

Thr His Lys Ile Gly Arg Thr Ser Leu Lys Ser Lys Cys Lys Glu Ile

500 505 510

Glu Glu Ala Arg Lys Asn Ser Gly Asn Ala Phe Lys Ile Asp Tyr Glu

515 520 525

Glu Ala Leu Asn His Pro Glu His Ser Asn Asn Lys Ala Leu Ile Lys

530 535 540

Ile Ile Gln Thr Ile Pro Asp Ile Ile Gln Ala Ile Gln Ser His Leu

545 550 555 560

Gly His Asn Asp Ser Gln Ala Leu Ile Tyr His Asn Pro Phe Ser Leu

565 570 575

Ser Gln Leu Tyr Thr Ile Leu Glu Thr Lys Arg Asp Gly Phe His Lys

580 585 590

Asn Cys Val Ala Val Thr Cys Glu Asn Tyr Trp Arg Ser Gln Lys Thr

595 600 605

Glu Ile Asp Pro Glu Ile Ser Tyr Ala Ser Arg Leu Pro Ala Asp Ser

610 615 620

Val Arg Pro Phe Asp Gly Val Leu Ala Arg Met Met Gln Arg Leu Ala

625 630 635 640

Tyr Glu Ile Ala Met Ala Lys Trp Glu Gln Ile Lys His Ile Pro Asp

645 650 655

Asn Ser Ser Leu Leu Ile Pro Ile Tyr Leu Glu Gln Asn Arg Phe Glu

660 665 670

Phe Glu Glu Ser Phe Lys Lys Ile Lys Gly Ser Ser Ser Asp Lys Thr

675 680 685

Leu Glu Gln Ala Ile Glu Lys Gln Asn Ile Gln Trp Glu Glu Lys Phe

690 695 700

Gln Arg Ile Ile Asn Ala Ser Met Asn Ile Cys Pro Tyr Lys Gly Ala

705 710 715 720

Ser Ile Gly Gly Gln Gly Glu Ile Asp His Ile Tyr Pro Arg Ser Leu

725 730 735

Ser Lys Lys His Phe Gly Val Ile Phe Asn Ser Glu Val Asn Leu Ile

740 745 750

Tyr Cys Ser Ser Gln Gly Asn Arg Glu Lys Lys Glu Glu His Tyr Leu

755 760 765

Leu Glu His Leu Ser Pro Leu Tyr Leu Lys His Gln Phe Gly Thr Asp

770 775 780

Asn Val Ser Asp Ile Lys Asn Phe Ile Ser Gln Asn Val Ala Asn Ile

785 790 795 800

Lys Lys Tyr Ile Ser Phe His Leu Leu Thr Pro Glu Gln Gln Lys Ala

805 810 815

Ala Arg His Ala Leu Phe Leu Asp Tyr Asp Asp Glu Ala Phe Lys Thr

820 825 830

Ile Thr Lys Phe Leu Met Ser Gln Gln Lys Ala Arg Val Asn Gly Thr

835 840 845

Gln Lys Phe Leu Gly Lys Gln Ile Met Glu Phe Leu Ser Thr Leu Ala

850 855 860

Asp Ser Lys Gln Leu Gln Leu Glu Phe Ser Ile Lys Gln Ile Thr Ala

865 870 875 880

Glu Glu Val His Asp His Arg Glu Leu Leu Ser Lys Gln Glu Pro Lys

885 890 895

Leu Val Lys Ser Arg Gln Gln Ser Phe Pro Ser His Ala Ile Asp Ala

900 905 910

Thr Leu Thr Met Ser Ile Gly Leu Lys Glu Phe Pro Gln Phe Ser Gln

915 920 925

Glu Leu Asp Asn Ser Trp Phe Ile Asn His Leu Met Pro Asp Glu Val

930 935 940

His Leu Asn Pro Val Arg Ser Lys Glu Lys Tyr Asn Lys Pro Asn Ile

945 950 955 960

Ser Ser Thr Pro Leu Phe Lys Asp Ser Leu Tyr Ala Glu Arg Phe Ile

965 970 975

Pro Val Trp Val Lys Gly Glu Thr Phe Ala Ile Gly Phe Ser Glu Lys

980 985 990

Asp Leu Phe Glu Ile Lys Pro Ser Asn Lys Glu Lys Leu Phe Thr Leu

995 1000 1005

Leu Lys Thr Tyr Ser Thr Lys Asn Pro Gly Glu Ser Leu Gln Glu

1010 1015 1020

Leu Gln Ala Lys Ser Lys Ala Lys Trp Leu Tyr Phe Pro Ile Asn

1025 1030 1035

Lys Thr Leu Ala Leu Glu Phe Leu His His Tyr Phe His Lys Glu

1040 1045 1050

Ile Val Thr Pro Asp Asp Thr Thr Val Cys His Phe Ile Asn Ser

1055 1060 1065

Leu Arg Tyr Tyr Thr Lys Lys Glu Ser Ile Thr Val Lys Ile Leu

1070 1075 1080

Lys Glu Pro Met Pro Val Leu Ser Val Lys Phe Glu Ser Ser Lys

1085 1090 1095

Lys Asn Val Leu Gly Ser Phe Lys His Thr Ile Ala Leu Pro Ala

1100 1105 1110

Thr Lys Asp Trp Glu Arg Leu Phe Asn His Pro Asn Phe Leu Ala

1115 1120 1125

Leu Lys Ala Asn Pro Ala Pro Asn Pro Lys Glu Phe Asn Glu Phe

1130 1135 1140

Ile Arg Lys Tyr Phe Leu Ser Asp Asn Asn Pro Asn Ser Asp Ile

1145 1150 1155

Pro Asn Asn Gly His Asn Ile Lys Pro Gln Lys His Lys Ala Val

1160 1165 1170

Arg Lys Val Phe Ser Leu Pro Val Ile Pro Gly Asn Ala Gly Thr

1175 1180 1185

Met Met Arg Ile Arg Arg Lys Asp Asn Lys Gly Gln Pro Leu Tyr

1190 1195 1200

Gln Leu Gln Thr Ile Asp Asp Thr Pro Ser Met Gly Ile Gln Ile

1205 1210 1215

Asn Glu Asp Arg Leu Val Lys Gln Glu Val Leu Met Asp Ala Tyr

1220 1225 1230

Lys Thr Arg Asn Leu Ser Thr Ile Asp Gly Ile Asn Asn Ser Glu

1235 1240 1245

Gly Gln Ala Tyr Ala Thr Phe Asp Asn Trp Leu Thr Leu Pro Val

1250 1255 1260

Ser Thr Phe Lys Pro Glu Ile Ile Lys Leu Glu Met Lys Pro His

1265 1270 1275

Ser Lys Thr Arg Arg Tyr Ile Arg Ile Thr Gln Ser Leu Ala Asp

1280 1285 1290

Phe Ile Lys Thr Ile Asp Glu Ala Leu Met Ile Lys Pro Ser Asp

1295 1300 1305

Ser Ile Asp Asp Pro Leu Asn Met Pro Asn Glu Ile Val Cys Lys

1310 1315 1320

Asn Lys Leu Phe Gly Asn Glu Leu Lys Pro Arg Asp Gly Lys Met

1325 1330 1335

Lys Ile Val Ser Thr Gly Lys Ile Val Thr Tyr Glu Phe Glu Ser

1340 1345 1350

Asp Ser Thr Pro Gln Trp Ile Gln Thr Leu Tyr Val Thr Gln Leu

1355 1360 1365

Lys Lys Gln Pro

1370

<210> SEQ ID NO 18

<211> LENGTH: 1122

<212> TYPE: PRT

<213> ORGANISM: Streptococcus thermophilus

<400> SEQUENCE: 18

Met Ser Asp Leu Val Leu Gly Leu Asp Ile Gly Ile Gly Ser Val Gly

1 5 10 15

Val Gly Ile Leu Asn Lys Val Thr Gly Glu Ile Ile His Lys Asn Ser

20 25 30

Arg Ile Phe Pro Ala Ala Gln Ala Glu Asn Asn Leu Val Arg Arg Thr

35 40 45

Asn Arg Gln Gly Arg Arg Leu Thr Arg Arg Lys Lys His Arg Ile Val

50 55 60

Arg Leu Asn Arg Leu Phe Glu Glu Ser Gly Leu Ile Thr Asp Phe Thr

65 70 75 80

Lys Ile Ser Ile Asn Leu Asn Pro Tyr Gln Leu Arg Val Lys Gly Leu

85 90 95

Thr Asp Glu Leu Ser Asn Glu Glu Leu Phe Ile Ala Leu Lys Asn Met

100 105 110

Val Lys His Arg Gly Ile Ser Tyr Leu Asp Asp Ala Ser Asp Asp Gly

115 120 125

Asn Ser Ser Val Gly Asp Tyr Ala Gln Ile Val Lys Glu Asn Ser Lys

130 135 140

Gln Leu Glu Thr Lys Thr Pro Gly Gln Ile Gln Leu Glu Arg Tyr Gln

145 150 155 160

Thr Tyr Gly Gln Leu Arg Gly Asp Phe Thr Val Glu Lys Asp Gly Lys

165 170 175

Lys His Arg Leu Ile Asn Val Phe Pro Thr Ser Ala Tyr Arg Ser Glu

180 185 190

Ala Leu Arg Ile Leu Gln Thr Gln Gln Glu Phe Asn Ser Gln Ile Thr

195 200 205

Asp Glu Phe Ile Asn Arg Tyr Leu Glu Ile Leu Thr Gly Lys Arg Lys

210 215 220

Tyr Tyr His Gly Pro Gly Asn Glu Lys Ser Arg Thr Asp Tyr Gly Arg

225 230 235 240

Tyr Arg Thr Asn Gly Glu Thr Leu Asp Asn Ile Phe Gly Ile Leu Ile

245 250 255

Gly Lys Cys Thr Phe Tyr Pro Asp Glu Phe Arg Ala Ala Lys Ala Ser

260 265 270

Tyr Thr Ala Gln Glu Phe Asn Leu Leu Asn Asp Leu Asn Asn Leu Thr

275 280 285

Val Pro Thr Glu Thr Lys Lys Leu Ser Lys Glu Gln Lys Asn Gln Ile

290 295 300

Ile Asn Tyr Val Lys Asn Glu Lys Val Met Gly Pro Ala Lys Leu Phe

305 310 315 320

Lys Tyr Ile Ala Lys Leu Leu Ser Cys Asp Val Ala Asp Ile Lys Gly

325 330 335

His Arg Ile Asp Lys Ser Gly Lys Ala Glu Ile His Thr Phe Glu Ala

340 345 350

Tyr Arg Lys Met Lys Thr Leu Glu Thr Leu Asp Ile Glu Gln Met Asp

355 360 365

Arg Glu Thr Leu Asp Lys Leu Ala Tyr Val Leu Thr Leu Asn Thr Glu

370 375 380

Arg Glu Gly Ile Gln Glu Ala Leu Glu His Glu Phe Ala Asp Gly Ser

385 390 395 400

Phe Ser Gln Lys Gln Val Asp Glu Leu Val Gln Phe Arg Lys Ala Asn

405 410 415

Ser Ser Ile Phe Gly Lys Gly Trp His Asn Phe Ser Val Lys Leu Met

420 425 430

Met Glu Leu Ile Pro Glu Leu Tyr Glu Thr Ser Glu Glu Gln Met Thr

435 440 445

Ile Leu Thr Arg Leu Gly Lys Gln Lys Thr Thr Ser Ser Ser Asn Lys

450 455 460

Thr Lys Tyr Ile Asp Glu Lys Leu Leu Thr Glu Glu Ile Tyr Asn Pro

465 470 475 480

Val Val Ala Lys Ser Val Arg Gln Ala Ile Lys Ile Val Asn Ala Ala

485 490 495

Ile Lys Glu Tyr Gly Asp Phe Asp Asn Ile Val Ile Glu Met Ala Arg

500 505 510

Glu Thr Asn Glu Asp Asp Glu Lys Lys Ala Ile Gln Lys Ile Gln Lys

515 520 525

Ala Asn Lys Asp Glu Lys Asp Ala Ala Met Leu Lys Ala Ala Asn Gln

530 535 540

Tyr Asn Gly Lys Ala Glu Leu Pro His Ser Val Phe His Gly His Lys

545 550 555 560

Gln Leu Ala Thr Lys Ile Arg Leu Trp His Gln Gln Gly Glu Arg Cys

565 570 575

Leu Tyr Thr Gly Lys Thr Ile Ser Ile His Asp Leu Ile Asn Asn Pro

580 585 590

Asn Gln Phe Glu Val Asp His Ile Leu Pro Leu Ser Ile Thr Phe Asp

595 600 605

Asp Ser Leu Ala Asn Lys Val Leu Val Tyr Ala Thr Ala Asn Gln Glu

610 615 620

Lys Gly Gln Arg Thr Pro Tyr Gln Ala Leu Asp Ser Met Asp Asp Ala

625 630 635 640

Trp Ser Phe Arg Glu Leu Lys Ala Phe Val Arg Glu Ser Lys Thr Leu

645 650 655

Ser Asn Lys Lys Lys Glu Tyr Leu Leu Thr Glu Glu Asp Ile Ser Lys

660 665 670

Phe Asp Val Arg Lys Lys Phe Ile Glu Arg Asn Leu Val Asp Thr Arg

675 680 685

Tyr Ala Ser Arg Val Val Leu Asn Ala Leu Gln Glu His Phe Arg Ala

690 695 700

His Lys Ile Asp Thr Lys Val Ser Val Val Arg Gly Gln Phe Thr Ser

705 710 715 720

Gln Leu Arg Arg His Trp Gly Ile Glu Lys Thr Arg Asp Thr Tyr His

725 730 735

His His Ala Val Asp Ala Leu Ile Ile Ala Ala Ser Ser Gln Leu Asn

740 745 750

Leu Trp Lys Lys Gln Lys Asn Thr Leu Val Ser Tyr Ser Glu Glu Gln

755 760 765

Leu Leu Asp Ile Glu Thr Gly Glu Leu Ile Ser Asp Asp Glu Tyr Lys

770 775 780

Glu Ser Val Phe Lys Ala Pro Tyr Gln His Phe Val Asp Thr Leu Lys

785 790 795 800

Ser Lys Glu Phe Glu Asp Ser Ile Leu Phe Ser Tyr Gln Val Asp Ser

805 810 815

Lys Phe Asn Arg Lys Ile Ser Asp Ala Thr Ile Tyr Ala Thr Arg Gln

820 825 830

Ala Lys Val Gly Lys Asp Lys Lys Asp Glu Thr Tyr Val Leu Gly Lys

835 840 845

Ile Lys Asp Ile Tyr Thr Gln Asp Gly Tyr Asp Ala Phe Met Lys Ile

850 855 860

Tyr Lys Lys Asp Lys Ser Lys Phe Leu Met Tyr Arg His Asp Pro Gln

865 870 875 880

Thr Phe Glu Lys Val Ile Glu Pro Ile Leu Glu Asn Tyr Pro Asn Lys

885 890 895

Glu Met Asn Glu Lys Gly Lys Glu Val Pro Cys Asn Pro Phe Leu Lys

900 905 910

Tyr Lys Glu Glu His Gly Tyr Ile Arg Lys Tyr Ser Lys Lys Gly Asn

915 920 925

Gly Pro Glu Ile Lys Ser Leu Lys Tyr Tyr Asp Ser Lys Leu Leu Gly

930 935 940

Asn Pro Ile Asp Ile Thr Pro Glu Asn Ser Lys Asn Lys Val Val Leu

945 950 955 960

Gln Ser Leu Lys Pro Trp Arg Thr Asp Val Tyr Phe Asn Lys Asn Thr

965 970 975

Gly Lys Tyr Glu Ile Leu Gly Leu Lys Tyr Ala Asp Leu Gln Phe Glu

980 985 990

Lys Lys Thr Gly Thr Tyr Lys Ile Ser Gln Glu Lys Tyr Asn Gly Ile

995 1000 1005

Met Lys Glu Glu Gly Val Asp Ser Asp Ser Glu Phe Lys Phe Thr

1010 1015 1020

Leu Tyr Lys Asn Asp Leu Leu Leu Val Lys Asp Thr Glu Thr Lys

1025 1030 1035

Glu Gln Gln Leu Phe Arg Phe Leu Ser Arg Thr Met Pro Asn Val

1040 1045 1050

Lys Tyr Tyr Val Glu Leu Lys Pro Tyr Ser Lys Asp Lys Phe Glu

1055 1060 1065

Lys Asn Glu Ser Leu Ile Glu Ile Leu Gly Ser Ala Asp Lys Ser

1070 1075 1080

Gly Arg Cys Ile Lys Gly Leu Gly Lys Ser Asn Ile Ser Ile Tyr

1085 1090 1095

Lys Val Arg Thr Asp Val Leu Gly Asn Gln His Ile Ile Lys Asn

1100 1105 1110

Glu Gly Asp Lys Pro Lys Leu Asp Phe

1115 1120

<210> SEQ ID NO 19

<211> LENGTH: 1128

<212> TYPE: PRT

<213> ORGANISM: Streptococcus thermophilus

<400> SEQUENCE: 19

Met Ser Asp Leu Val Leu Gly Leu Asp Ile Gly Ile Gly Ser Val Gly

1 5 10 15

Val Gly Ile Leu Asn Lys Val Thr Gly Glu Ile Ile His Lys Asn Ser

20 25 30

Arg Ile Phe Pro Ala Ala Gln Ala Glu Asn Asn Leu Val Arg Arg Thr

35 40 45

Asn Arg Gln Gly Arg Arg Leu Thr Arg Arg Lys Lys His Arg Ile Val

50 55 60

Arg Leu Asn Arg Leu Phe Glu Glu Ser Gly Leu Ile Thr Asp Phe Thr

65 70 75 80

Lys Ile Ser Ile Asn Leu Asn Pro Tyr Gln Leu Arg Val Lys Gly Leu

85 90 95

Thr Asp Glu Leu Ser Asn Glu Glu Leu Phe Ile Ala Leu Lys Asn Met

100 105 110

Val Lys His Arg Gly Ile Ser Tyr Leu Asp Asp Ala Ser Asp Asp Gly

115 120 125

Asn Ser Ser Val Gly Asp Tyr Ala Gln Ile Val Lys Glu Asn Ser Lys

130 135 140

Gln Leu Glu Thr Lys Thr Pro Gly Gln Ile Gln Leu Glu Arg Tyr Gln

145 150 155 160

Thr Tyr Gly Gln Leu Arg Gly Asp Phe Thr Val Glu Lys Asp Gly Lys

165 170 175

Lys His Arg Leu Ile Asn Val Phe Pro Thr Ser Ala Tyr Arg Ser Glu

180 185 190

Ala Leu Arg Ile Leu Gln Thr Gln Gln Glu Phe Asn Pro Gln Ile Thr

195 200 205

Asp Glu Phe Ile Asn Arg Tyr Leu Glu Ile Leu Thr Gly Lys Arg Lys

210 215 220

Tyr Tyr His Gly Pro Gly Asn Glu Lys Ser Arg Thr Asp Tyr Gly Arg

225 230 235 240

Tyr Arg Thr Ser Gly Glu Thr Leu Asp Asn Ile Phe Gly Ile Leu Ile

245 250 255

Gly Lys Cys Thr Phe Tyr Pro Glu Glu Phe Arg Ala Ala Lys Ala Ser

260 265 270

Tyr Thr Ala Gln Glu Phe Asn Leu Leu Asn Asp Leu Asn Asn Leu Thr

275 280 285

Val Pro Thr Glu Thr Lys Lys Leu Ser Lys Glu Gln Lys Asn Gln Ile

290 295 300

Ile Asn Tyr Val Lys Asn Glu Lys Ala Met Gly Pro Ala Lys Leu Phe

305 310 315 320

Lys Tyr Ile Ala Lys Leu Leu Ser Cys Asp Val Ala Asp Ile Lys Gly

325 330 335

Tyr Arg Ile Asp Lys Ser Gly Lys Ala Glu Ile His Thr Phe Glu Ala

340 345 350

Tyr Arg Lys Met Lys Thr Leu Glu Thr Leu Asp Ile Glu Gln Met Asp

355 360 365

Arg Glu Thr Leu Asp Lys Leu Ala Tyr Val Leu Thr Leu Asn Thr Glu

370 375 380

Arg Glu Gly Ile Gln Glu Ala Leu Glu His Glu Phe Ala Asp Gly Ser

385 390 395 400

Phe Ser Gln Lys Gln Val Asp Glu Leu Val Gln Phe Arg Lys Ala Asn

405 410 415

Ser Ser Ile Phe Gly Lys Gly Trp His Asn Phe Ser Val Lys Leu Met

420 425 430

Met Glu Leu Ile Pro Glu Leu Tyr Glu Thr Ser Glu Glu Gln Met Thr

435 440 445

Ile Leu Thr Arg Leu Gly Lys Gln Lys Arg Leu Arg Leu Gln Ile Lys

450 455 460

Gln Asn Ile Ser Asn Lys Thr Lys Tyr Ile Asp Glu Lys Leu Leu Thr

465 470 475 480

Glu Glu Ile Tyr Asn Pro Val Val Ala Lys Ser Val Arg Gln Ala Ile

485 490 495

Lys Ile Val Asn Ala Ala Ile Lys Glu Tyr Gly Asp Phe Asp Asn Ile

500 505 510

Val Ile Glu Met Ala Arg Glu Thr Asn Glu Asp Asp Glu Lys Lys Ala

515 520 525

Ile Gln Lys Ile Gln Lys Ala Asn Lys Asp Glu Lys Asp Ala Ala Met

530 535 540

Leu Lys Ala Ala Asn Gln Tyr Asn Gly Lys Ala Glu Leu Pro His Ser

545 550 555 560

Val Phe His Gly His Lys Gln Leu Ala Thr Lys Ile Arg Leu Trp His

565 570 575

Gln Gln Gly Glu Arg Cys Leu Tyr Thr Gly Lys Thr Ile Ser Ile His

580 585 590

Asp Leu Ile Asn Asn Pro Asn Gln Phe Glu Val Asp His Ile Leu Pro

595 600 605

Leu Ser Ile Thr Phe Asp Asp Ser Leu Ala Asn Lys Val Leu Val Tyr

610 615 620

Ala Thr Ala Asn Gln Glu Lys Gly Gln Arg Thr Pro Tyr Gln Ala Leu

625 630 635 640

Asp Ser Met Asp Asp Ala Trp Ser Phe Arg Glu Leu Lys Ala Phe Val

645 650 655

Arg Glu Ser Lys Thr Leu Ser Asn Lys Lys Lys Glu Tyr Leu Leu Thr

660 665 670

Glu Glu Asp Ile Ser Lys Phe Asp Val Arg Lys Lys Phe Ile Glu Arg

675 680 685

Asn Leu Val Asp Thr Arg Tyr Ala Ser Arg Val Val Leu Asn Ala Leu

690 695 700

Gln Glu His Phe Arg Ala His Lys Ile Asp Thr Lys Val Ser Val Val

705 710 715 720

Arg Gly Gln Phe Thr Ser Gln Leu Arg Arg His Trp Gly Ile Glu Lys

725 730 735

Thr Arg Asp Thr Tyr His His His Ala Val Asp Ala Leu Ile Ile Ala

740 745 750

Ala Ser Ser Gln Leu Asn Leu Trp Lys Lys Gln Lys Asn Thr Leu Val

755 760 765

Ser Tyr Ser Glu Glu Gln Leu Leu Asp Ile Glu Thr Gly Glu Leu Ile

770 775 780

Ser Asp Asp Glu Tyr Lys Glu Ser Val Phe Lys Ala Pro Tyr Gln His

785 790 795 800

Phe Val Asp Thr Leu Lys Ser Lys Glu Phe Glu Asp Ser Ile Leu Phe

805 810 815

Ser Tyr Gln Val Asp Ser Lys Phe Asn Arg Lys Ile Ser Asp Ala Thr

820 825 830

Ile Tyr Ala Thr Arg Gln Ala Lys Val Gly Lys Asp Lys Lys Asp Glu

835 840 845

Thr Tyr Val Leu Gly Lys Ile Lys Asp Ile Tyr Thr Gln Asp Gly Tyr

850 855 860

Asp Ala Phe Met Lys Ile Tyr Lys Lys Asp Lys Ser Lys Phe Leu Met

865 870 875 880

Tyr Arg His Asp Pro Gln Thr Phe Glu Lys Val Ile Glu Pro Ile Leu

885 890 895

Glu Asn Tyr Pro Asn Lys Gln Met Asn Glu Lys Gly Lys Glu Val Pro

900 905 910

Cys Asn Pro Phe Leu Lys Tyr Lys Glu Glu His Gly Tyr Ile Arg Lys

915 920 925

Tyr Ser Lys Lys Gly Asn Gly Pro Glu Ile Lys Ser Leu Lys Tyr Tyr

930 935 940

Asp Ser Lys Leu Leu Gly Asn Pro Ile Asp Ile Thr Pro Glu Asn Ser

945 950 955 960

Lys Asn Lys Val Val Leu Gln Ser Leu Lys Pro Trp Arg Thr Asp Val

965 970 975

Tyr Phe Asn Lys Ala Thr Gly Lys Tyr Glu Ile Leu Gly Leu Lys Tyr

980 985 990

Ala Asp Leu Gln Phe Glu Lys Gly Thr Gly Thr Tyr Lys Ile Ser Gln

995 1000 1005

Glu Lys Tyr Asn Asp Ile Lys Lys Lys Glu Gly Val Asp Ser Asp

1010 1015 1020

Ser Glu Phe Lys Phe Thr Leu Tyr Lys Asn Asp Leu Leu Leu Val

1025 1030 1035

Lys Asp Thr Glu Thr Lys Glu Gln Gln Leu Phe Arg Phe Leu Ser

1040 1045 1050

Arg Thr Leu Pro Lys Gln Lys His Tyr Val Glu Leu Lys Pro Tyr

1055 1060 1065

Asp Lys Gln Lys Phe Glu Gly Gly Glu Ala Leu Ile Lys Val Leu

1070 1075 1080

Gly Asn Val Ala Asn Gly Gly Gln Cys Ile Lys Gly Leu Ala Lys

1085 1090 1095

Ser Asn Ile Ser Ile Tyr Lys Val Arg Thr Asp Val Leu Gly Asn

1100 1105 1110

Gln His Ile Ile Lys Asn Glu Gly Asp Lys Pro Lys Leu Asp Phe

1115 1120 1125

<210> SEQ ID NO 20

<211> LENGTH: 1123

<212> TYPE: PRT

<213> ORGANISM: Francisella tularensis

<400> SEQUENCE: 20

Met Asn Val Lys Ile Leu Pro Ile Ala Ile Asp Leu Asp Val Lys Asn

1 5 10 15

Thr Gly Val Phe Ser Ala Phe Tyr Gln Lys Gly Thr Ser Leu Glu Lys

20 25 30

Leu Asp Asn Lys Asn Gly Lys Val Tyr Glu Leu Ser Lys Asp Ser Tyr

35 40 45

Thr Leu Leu Met Asn Asn Arg Thr Ala Arg Arg His Lys Arg Arg Gly

50 55 60

Ile Asp Arg Lys Gln Leu Val Lys Arg Leu Phe Lys Leu Val Trp Thr

65 70 75 80

Glu Gln Leu Asn Leu Glu Trp Asp Lys Asp Thr Gln Gln Ala Ile Ser

85 90 95

Phe Leu Phe Asn Arg Arg Gly Phe Ser Phe Ile Thr Asp Gly Tyr Ser

100 105 110

Thr Glu Tyr Leu Asn Ile Val Pro Glu Gln Val Lys Ala Ile Leu Met

115 120 125

Asp Ile Phe Asp Asp Tyr Asn Gly Glu Asp Asp Leu Asp Ser Tyr Leu

130 135 140

Lys Leu Ala Thr Glu Gln Glu Ser Lys Ile Ser Glu Ile Tyr Asn Lys

145 150 155 160

Leu Met Gln Lys Ile Leu Glu Phe Lys Leu Arg Lys Leu Cys Thr Asp

165 170 175

Ile Lys Asp Asp Lys Val Ser Thr Lys Thr Leu Lys Glu Ile Thr Ser

180 185 190

Tyr Glu Phe Glu Leu Leu Ala Asp Tyr Leu Ala Asn Tyr Ser Glu Ser

195 200 205

Leu Lys Thr Gln Lys Phe Ser Tyr Thr Asp Lys Gln Gly Asn Leu Lys

210 215 220

Glu Leu Ser Tyr Tyr His His Asp Lys Tyr Asn Ile Gln Glu Phe Leu

225 230 235 240

Lys Arg His Ala Thr Ile Asn Asp Glu Ile Leu Gly Thr Leu Leu Thr

245 250 255

Asp Asp Phe Asp Ile Trp Asn Phe Asn Phe Glu Lys Phe Asp Phe Asp

260 265 270

Lys Asn Glu Glu Lys Leu Gln Asn Gln Glu Asp Lys Asp His Thr Gln

275 280 285

Ala His Leu His His Phe Val Phe Val Val Asn Lys Ile Lys Ser Glu

290 295 300

Met Ala Ser Gly Gly Arg His Arg Ser Gln Tyr Phe Gln Glu Ile Thr

305 310 315 320

Asn Val Leu Asp Glu Asn Asn His Gln Glu Gly Tyr Leu Lys Asn Phe

325 330 335

Cys Glu Asn Leu His Asn Lys Lys Tyr Ser Asn Leu Ser Val Lys Asn

340 345 350

Leu Val Asn Leu Val Gly Asn Leu Ser Asn Leu Glu Leu Lys Pro Leu

355 360 365

Arg Lys Tyr Phe Asn Asp Lys Ile His Ala Lys Ala Asp His Trp Asp

370 375 380

Glu Gln Lys Phe Thr Glu Thr Tyr Cys His Trp Ile Leu Gly Glu Trp

385 390 395 400

Arg Val Gly Val Lys Asp Gln Asp Lys Lys Asp Gly Ala Lys Tyr Ser

405 410 415

Tyr Lys Asp Leu Cys Asn Glu Leu Lys Gln Lys Val Thr Lys Ala Gly

420 425 430

Leu Ile Asp Phe Leu Leu Glu Leu Asp Pro Cys Arg Thr Ile Pro Pro

435 440 445

Tyr Leu Asp Asn Asn Asn Arg Lys Pro Pro Lys Cys Gln Ser Leu Ile

450 455 460

Leu Asn Pro Lys Phe Leu Asp Asn Gln Tyr Pro Asn Trp Gln Gln Tyr

465 470 475 480

Leu Gln Glu Leu Lys Lys Leu Gln Ser Ile Gln Asp Tyr Leu Asp Ser

485 490 495

Phe Glu Thr Asp Leu Lys Asp Leu Lys Ser Ser Lys Asp Gln Pro Tyr

500 505 510

Phe Val Glu Tyr Lys Ser Ser Asn Gln Gln Met Ala Ser Gly Gln Arg

515 520 525

Asp Tyr Lys Asp Leu Asp Ala Arg Ile Leu Gln Phe Ile Phe Asp Arg

530 535 540

Val Lys Ala Ser Asp Glu Leu Leu Leu Asn Glu Ile Tyr Phe Gln Ala

545 550 555 560

Lys Lys Leu Lys Gln Lys Ala Ser Ser Glu Leu Glu Lys Leu Glu Ser

565 570 575

Ser Lys Lys Leu Asp Glu Val Ile Ala Asn Ser Gln Leu Ser Gln Ile

580 585 590

Leu Lys Ser Gln His Thr Asn Gly Ile Phe Glu Gln Gly Thr Phe Leu

595 600 605

His Leu Val Cys Lys Tyr Tyr Lys Gln Arg Gln Arg Ala Arg Asp Ser

610 615 620

Arg Leu Tyr Ile Met Pro Glu Tyr Arg Tyr Asp Lys Lys Leu Asp Lys

625 630 635 640

Tyr Asn Asn Thr Gly Arg Phe Asp Asp Asn Asn Gln Leu Leu Thr Tyr

645 650 655

Cys Asn His Lys Pro Arg Gln Lys Arg Tyr Gln Leu Leu Asn Asp Leu

660 665 670

Ala Gly Val Leu Gln Val Ser Arg Asn Gln Leu Leu Ser Ser Val Glu

675 680 685

Glu Trp Phe Gln Gln Ala Gln Arg Val Gly Glu Ile Ser Lys Ser Gln

690 695 700

Asp Glu Gln Ile Phe Glu Trp Leu Lys Ser Phe Lys Ile Ala Ser Tyr

705 710 715 720

Cys Lys Ala Ala Val Glu Met Gln Lys Gln Tyr Arg Gly Thr Leu Lys

725 730 735

Asn Ala Ile Gln Thr Ala Ile Phe Arg Gln Ser Glu Asn Ile Asn Lys

740 745 750

Asn Lys Asn Thr Gly Asn Gln Gln Gln Ala Leu Ser Glu Asn Ser Lys

755 760 765

Asp Val Lys Ser Leu Thr Ala Asp Glu Lys Lys Leu Leu Lys Leu Ile

770 775 780

Glu Asn Ile Ala Lys Ala Ser Gln Lys Ile Gly Glu Ser Leu Gly Leu

785 790 795 800

Asn Asp Lys Gln Ile Lys Lys Phe Asn Ser Ile Tyr Ser Phe Ala Gln

805 810 815

Ile Gln Gln Ile Ala Phe Ala Lys Arg Lys Gly Asn Ala Asn Thr Cys

820 825 830

Ala Val Cys Ser Ala Asp Asn Ala His Arg Met Gln Gln Ile Lys Ile

835 840 845

Thr Glu Leu Val Glu Asp Asn Lys Asp Asn Ile Ile Leu Ser Ala Lys

850 855 860

Ala Gln Arg Leu Pro Ala Ile Pro Thr Arg Ile Val Asp Gly Ala Val

865 870 875 880

Lys Lys Met Ala Thr Ile Leu Ala Lys Asn Ile Val Asp Asp Asn Trp

885 890 895

Gln Asn Ile Lys Gln Val Leu Ser Ala Lys His Gln Leu His Ile Pro

900 905 910

Ile Ile Thr Glu Ser Asn Ala Phe Glu Phe Glu Pro Ala Leu Ala Asp

915 920 925

Val Lys Gly Lys Ser Leu Lys Asp Arg Arg Lys Lys Ala Leu Glu Arg

930 935 940

Ile Ser Pro Glu Asn Ile Phe Lys Asp Lys Asn Asn Arg Ile Lys Glu

945 950 955 960

Phe Ala Lys Gly Ile Ser Ala Tyr Ser Gly Ala Asn Leu Thr Asp Gly

965 970 975

Asp Phe Asp Gly Ala Lys Glu Glu Leu Asp His Ile Ile Pro Arg Ser

980 985 990

His Lys Lys Tyr Gly Thr Leu Asn Asp Glu Ala Asn Leu Ile Cys Val

995 1000 1005

Thr Arg Asp Asp Asn Lys Asn Ile Phe Ala Ile Asp Thr Ser Lys

1010 1015 1020

Trp Phe Glu Ile Glu Thr Pro Ser Asp Leu Arg Asp Ile Gly Val

1025 1030 1035

Ala Thr Ile Gln Tyr Lys Ile Asp Asn Asn Ser Arg Pro Lys Val

1040 1045 1050

Arg Val Lys Leu Asp Tyr Val Ile Asp Asp Asp Ser Lys Ile Asn

1055 1060 1065

Tyr Phe Met Asn His Ser Leu Leu Lys Ser Arg Tyr Pro Asp Lys

1070 1075 1080

Val Leu Glu Ile Leu Lys Gln Ser Thr Ile Ile Glu Phe Glu Ser

1085 1090 1095

Ser Gly Phe Asn Lys Thr Ile Lys Glu Met Leu Gly Met Thr Leu

1100 1105 1110

Ala Gly Ile Tyr Asn Glu Thr Ser Asn Asn

1115 1120

<210> SEQ ID NO 21

<211> LENGTH: 1436

<212> TYPE: PRT

<213> ORGANISM: Bacteroides fragilis

<400> SEQUENCE: 21

Met Lys Arg Ile Leu Gly Leu Asp Leu Gly Thr Asn Ser Ile Gly Trp

1 5 10 15

Ala Leu Val Asn Glu Ala Glu Asn Lys Asp Glu Arg Ser Ser Ile Val

20 25 30

Lys Leu Gly Val Arg Val Asn Pro Leu Thr Val Asp Glu Leu Thr Asn

35 40 45

Phe Glu Lys Gly Lys Ser Ile Thr Thr Asn Ala Asp Arg Thr Leu Lys

50 55 60

Arg Gly Met Arg Arg Asn Leu Gln Arg Tyr Lys Leu Arg Arg Glu Thr

65 70 75 80

Leu Thr Glu Val Leu Lys Glu His Lys Leu Ile Thr Glu Asp Thr Ile

85 90 95

Leu Ser Glu Asn Gly Asn Arg Thr Thr Phe Glu Thr Tyr Arg Leu Arg

100 105 110

Ala Lys Ala Val Thr Glu Glu Ile Ser Leu Glu Glu Phe Ala Arg Val

115 120 125

Leu Leu Met Ile Asn Lys Lys Arg Gly Tyr Lys Ser Ser Arg Lys Ala

130 135 140

Lys Gly Val Glu Glu Gly Thr Leu Ile Asp Gly Met Asp Ile Ala Arg

145 150 155 160

Glu Leu Tyr Asn Asn Asn Leu Thr Pro Gly Glu Leu Cys Leu Gln Leu

165 170 175

Leu Asp Ala Gly Lys Lys Phe Leu Pro Asp Phe Tyr Arg Ser Asp Leu

180 185 190

Gln Asn Glu Leu Asp Arg Ile Trp Glu Lys Gln Lys Glu Tyr Tyr Pro

195 200 205

Glu Ile Leu Thr Asp Val Leu Lys Glu Glu Leu Arg Gly Lys Lys Arg

210 215 220

Asp Ala Val Trp Ala Ile Cys Ala Lys Tyr Phe Val Trp Lys Glu Asn

225 230 235 240

Tyr Thr Glu Trp Asn Lys Glu Lys Gly Lys Thr Glu Gln Gln Glu Arg

245 250 255

Glu His Lys Leu Glu Gly Ile Tyr Ser Lys Arg Lys Arg Asp Glu Ala

260 265 270

Lys Arg Glu Asn Leu Gln Trp Arg Val Asn Gly Leu Lys Glu Lys Leu

275 280 285

Ser Leu Glu Gln Leu Val Ile Val Phe Gln Glu Met Asn Thr Gln Ile

290 295 300

Asn Asn Ser Ser Gly Tyr Leu Gly Ala Ile Ser Asp Arg Ser Lys Glu

305 310 315 320

Leu Tyr Phe Asn Lys Gln Thr Val Gly Gln Tyr Gln Met Glu Met Leu

325 330 335

Asp Lys Asn Pro Asn Ala Ser Leu Arg Asn Met Val Phe Tyr Arg Gln

340 345 350

Asp Tyr Leu Asp Glu Phe Asn Met Leu Trp Glu Lys Gln Ala Val Tyr

355 360 365

His Lys Glu Leu Thr Glu Glu Leu Lys Lys Glu Ile Arg Asp Ile Ile

370 375 380

Ile Phe Tyr Gln Arg Arg Leu Lys Ser Gln Lys Gly Leu Ile Gly Phe

385 390 395 400

Cys Glu Phe Glu Ser Arg Gln Ile Glu Val Asp Ile Asp Gly Lys Lys

405 410 415

Lys Ile Lys Thr Val Gly Asn Arg Val Ile Ser Arg Ser Ser Pro Leu

420 425 430

Phe Gln Glu Phe Lys Ile Trp Gln Ile Leu Asn Asn Ile Glu Val Thr

435 440 445

Val Val Gly Lys Lys Arg Lys Arg Arg Lys Leu Lys Glu Asn Tyr Ser

450 455 460

Ala Leu Phe Glu Glu Leu Asn Asp Ala Glu Gln Leu Glu Leu Asn Gly

465 470 475 480

Ser Arg Arg Leu Cys Gln Glu Glu Lys Glu Leu Leu Ala Gln Glu Leu

485 490 495

Phe Ile Arg Asp Lys Met Thr Lys Ser Glu Val Leu Lys Leu Leu Phe

500 505 510

Asp Asn Pro Gln Glu Leu Asp Leu Asn Phe Lys Thr Ile Asp Gly Asn

515 520 525

Lys Thr Gly Tyr Ala Leu Phe Gln Ala Tyr Ser Lys Met Ile Glu Met

530 535 540

Ser Gly His Glu Pro Val Asp Phe Lys Lys Pro Val Glu Lys Val Val

545 550 555 560

Glu Tyr Ile Lys Ala Val Phe Asp Leu Leu Asn Trp Asn Thr Asp Ile

565 570 575

Leu Gly Phe Asn Ser Asn Glu Glu Leu Asp Asn Gln Pro Tyr Tyr Lys

580 585 590

Leu Trp His Leu Leu Tyr Ser Phe Glu Gly Asp Asn Thr Pro Thr Gly

595 600 605

Asn Gly Arg Leu Ile Gln Lys Met Thr Glu Leu Tyr Gly Phe Glu Lys

610 615 620

Glu Tyr Ala Thr Ile Leu Ala Asn Val Ser Phe Gln Asp Asp Tyr Gly

625 630 635 640

Ser Leu Ser Ala Lys Ala Ile His Lys Ile Leu Pro His Leu Lys Glu

645 650 655

Gly Asn Arg Tyr Asp Val Ala Cys Val Tyr Ala Gly Tyr Arg His Ser

660 665 670

Glu Ser Ser Leu Thr Arg Glu Glu Ile Ala Asn Lys Val Leu Lys Asp

675 680 685

Arg Leu Met Leu Leu Pro Lys Asn Ser Leu His Asn Pro Val Val Glu

690 695 700

Lys Ile Leu Asn Gln Met Val Asn Val Ile Asn Val Ile Ile Asp Ile

705 710 715 720

Tyr Gly Lys Pro Asp Glu Ile Arg Val Glu Leu Ala Arg Glu Leu Lys

725 730 735

Lys Asn Ala Lys Glu Arg Glu Glu Leu Thr Lys Ser Ile Ala Gln Thr

740 745 750

Thr Lys Ala His Glu Glu Tyr Lys Thr Leu Leu Gln Thr Glu Phe Gly

755 760 765

Leu Thr Asn Val Ser Arg Thr Asp Ile Leu Arg Tyr Lys Leu Tyr Lys

770 775 780

Glu Leu Glu Ser Cys Gly Tyr Lys Thr Leu Tyr Ser Asn Thr Tyr Ile

785 790 795 800

Ser Arg Glu Lys Leu Phe Ser Lys Glu Phe Asp Ile Glu His Ile Ile

805 810 815

Pro Gln Ala Arg Leu Phe Asp Asp Ser Phe Ser Asn Lys Thr Leu Glu

820 825 830

Ala Arg Ser Val Asn Ile Glu Lys Gly Asn Lys Thr Ala Tyr Asp Phe

835 840 845

Val Lys Glu Lys Phe Gly Glu Ser Gly Ala Asp Asn Ser Leu Glu His

850 855 860

Tyr Leu Asn Asn Ile Glu Asp Leu Phe Lys Ser Gly Lys Ile Ser Lys

865 870 875 880

Thr Lys Tyr Asn Lys Leu Lys Met Ala Glu Gln Asp Ile Pro Asp Gly

885 890 895

Phe Ile Glu Arg Asp Leu Arg Asn Thr Gln Tyr Ile Ala Lys Lys Ala

900 905 910

Leu Ser Met Leu Asn Glu Ile Ser His Arg Val Val Ala Thr Ser Gly

915 920 925

Ser Val Thr Asp Lys Leu Arg Glu Asp Trp Gln Leu Ile Asp Val Met

930 935 940

Lys Glu Leu Asn Trp Glu Lys Tyr Lys Ala Leu Gly Leu Val Glu Tyr

945 950 955 960

Phe Glu Asp Arg Asp Gly Arg Gln Ile Gly Arg Ile Lys Asp Trp Thr

965 970 975

Lys Arg Asn Asp His Arg His His Ala Met Asp Ala Leu Thr Val Ala

980 985 990

Phe Thr Lys Asp Val Phe Ile Gln Tyr Phe Asn Asn Lys Asn Ala Ser

995 1000 1005

Leu Asp Pro Asn Ala Asn Glu His Ala Ile Lys Asn Lys Tyr Phe

1010 1015 1020

Gln Asn Gly Arg Ala Ile Ala Pro Met Pro Leu Arg Glu Phe Arg

1025 1030 1035

Ala Glu Ala Lys Lys His Leu Glu Asn Thr Leu Ile Ser Ile Lys

1040 1045 1050

Ala Lys Asn Lys Val Ile Thr Gly Asn Ile Asn Lys Thr Arg Lys

1055 1060 1065

Lys Gly Gly Val Asn Lys Asn Met Gln Gln Thr Pro Arg Gly Gln

1070 1075 1080

Leu His Leu Glu Thr Ile Tyr Gly Ser Gly Lys Gln Tyr Leu Thr

1085 1090 1095

Lys Glu Glu Lys Val Asn Ala Ser Phe Asp Met Arg Lys Ile Gly

1100 1105 1110

Thr Val Ser Lys Ser Ala Tyr Arg Asp Ala Leu Leu Lys Arg Leu

1115 1120 1125

Tyr Glu Asn Asp Asn Asp Pro Lys Lys Ala Phe Ala Gly Lys Asn

1130 1135 1140

Ser Leu Asp Lys Gln Pro Ile Trp Leu Asp Lys Glu Gln Met Arg

1145 1150 1155

Lys Val Pro Glu Lys Val Lys Ile Val Thr Leu Glu Ala Ile Tyr

1160 1165 1170

Thr Ile Arg Lys Glu Ile Ser Pro Asp Leu Lys Val Asp Lys Val

1175 1180 1185

Ile Asp Val Gly Val Arg Lys Ile Leu Ile Asp Arg Leu Asn Glu

1190 1195 1200

Tyr Gly Asn Asp Ala Lys Lys Ala Phe Ser Asn Leu Asp Lys Asn

1205 1210 1215

Pro Ile Trp Leu Asn Lys Glu Lys Gly Ile Ser Ile Lys Arg Val

1220 1225 1230

Thr Ile Ser Gly Ile Ser Asn Ala Gln Ser Leu His Val Lys Lys

1235 1240 1245

Asp Lys Asp Gly Lys Pro Ile Leu Asp Glu Asn Gly Arg Asn Ile

1250 1255 1260

Pro Val Asp Phe Val Asn Thr Gly Asn Asn His His Val Ala Val

1265 1270 1275

Tyr Tyr Arg Pro Val Ile Asp Lys Arg Gly Gln Leu Val Val Asp

1280 1285 1290

Glu Ala Gly Asn Pro Lys Tyr Glu Leu Glu Glu Val Val Val Ser

1295 1300 1305

Phe Phe Glu Ala Val Thr Arg Ala Asn Leu Gly Leu Pro Ile Ile

1310 1315 1320

Asp Lys Asp Tyr Lys Thr Thr Glu Gly Trp Gln Phe Leu Phe Ser

1325 1330 1335

Met Lys Gln Asn Glu Tyr Phe Val Phe Pro Asn Glu Lys Thr Gly

1340 1345 1350

Phe Asn Pro Lys Glu Ile Asp Leu Leu Asp Val Glu Asn Tyr Gly

1355 1360 1365

Leu Ile Ser Pro Asn Leu Phe Arg Val Gln Lys Phe Ser Leu Lys

1370 1375 1380

Asn Tyr Val Phe Arg His His Leu Glu Thr Thr Ile Lys Asp Thr

1385 1390 1395

Ser Ser Ile Leu Arg Gly Ile Thr Trp Ile Asp Phe Arg Ser Ser

1400 1405 1410

Lys Gly Leu Asp Thr Ile Val Lys Val Arg Val Asn His Ile Gly

1415 1420 1425

Gln Ile Val Ser Val Gly Glu Tyr

1430 1435

<210> SEQ ID NO 22

<211> LENGTH: 1314

<212> TYPE: PRT

<213> ORGANISM: Mycoplasma synoviae

<400> SEQUENCE: 22

Met Leu Arg Leu Tyr Cys Ala Asn Asn Leu Val Leu Asn Asn Val Gln

1 5 10 15

Asn Leu Trp Lys Tyr Leu Leu Leu Leu Ile Phe Asp Lys Lys Ile Ile

20 25 30

Phe Leu Phe Lys Ile Lys Val Ile Leu Ile Arg Arg Tyr Met Glu Asn

35 40 45

Asn Asn Lys Glu Lys Ile Val Ile Gly Phe Asp Leu Gly Val Ala Ser

50 55 60

Val Gly Trp Ser Ile Val Asn Ala Glu Thr Lys Glu Val Ile Asp Leu

65 70 75 80

Gly Val Arg Leu Phe Ser Glu Pro Glu Lys Ala Asp Tyr Arg Arg Ala

85 90 95

Lys Arg Thr Thr Arg Arg Leu Leu Arg Arg Lys Lys Phe Lys Arg Glu

100 105 110

Lys Phe His Lys Leu Ile Leu Lys Asn Ala Glu Ile Phe Gly Leu Gln

115 120 125

Ser Arg Asn Glu Ile Leu Asn Val Tyr Lys Asp Gln Ser Ser Lys Tyr

130 135 140

Arg Asn Ile Leu Lys Leu Lys Ile Asn Ala Leu Lys Glu Glu Ile Lys

145 150 155 160

Pro Ser Glu Leu Val Trp Ile Leu Arg Asp Tyr Leu Gln Asn Arg Gly

165 170 175

Tyr Phe Tyr Lys Asn Glu Lys Leu Thr Asp Glu Phe Val Ser Asn Ser

180 185 190

Phe Pro Ser Lys Lys Leu His Glu His Tyr Glu Lys Tyr Gly Phe Phe

195 200 205

Arg Gly Ser Val Lys Leu Asp Asn Lys Leu Asp Asn Lys Lys Asp Lys

210 215 220

Ala Lys Glu Lys Asp Glu Glu Glu Glu Ser Asp Ala Lys Lys Glu Ser

225 230 235 240

Glu Glu Leu Ile Phe Ser Asn Lys Gln Trp Ile Asn Glu Ile Val Lys

245 250 255

Val Phe Glu Asn Gln Ser Tyr Leu Thr Glu Ser Phe Lys Glu Glu Tyr

260 265 270

Leu Lys Leu Phe Asn Tyr Val Arg Pro Phe Asn Lys Gly Pro Gly Ser

275 280 285

Lys Asn Ser Arg Thr Ala Tyr Gly Val Phe Ser Thr Asp Ile Asp Pro

290 295 300

Glu Thr Asn Lys Phe Lys Asp Tyr Ser Asn Ile Trp Asp Lys Thr Ile

305 310 315 320

Gly Lys Cys Ser Leu Phe Glu Glu Glu Ile Arg Ala Pro Lys Asn Leu

325 330 335

Pro Ser Ala Leu Ile Phe Asn Leu Gln Asn Glu Ile Cys Thr Ile Lys

340 345 350

Asn Glu Phe Thr Glu Phe Lys Asn Trp Trp Leu Asn Ala Glu Gln Lys

355 360 365

Ser Glu Ile Leu Lys Phe Val Phe Thr Glu Leu Phe Asn Trp Lys Asp

370 375 380

Lys Lys Tyr Ser Asp Lys Lys Phe Asn Lys Asn Leu Gln Asp Lys Ile

385 390 395 400

Lys Lys Tyr Leu Leu Asn Phe Ala Leu Glu Asn Phe Asn Leu Asn Glu

405 410 415

Glu Ile Leu Lys Asn Arg Asp Leu Glu Asn Asp Thr Val Leu Gly Leu

420 425 430

Lys Gly Val Lys Tyr Tyr Glu Lys Ser Asn Ala Thr Ala Asp Ala Ala

435 440 445

Leu Glu Phe Ser Ser Leu Lys Pro Leu Tyr Val Phe Ile Lys Phe Leu

450 455 460

Lys Glu Lys Lys Leu Asp Leu Asn Tyr Leu Leu Gly Leu Glu Asn Thr

465 470 475 480

Glu Ile Leu Tyr Phe Leu Asp Ser Ile Tyr Leu Ala Ile Ser Tyr Ser

485 490 495

Ser Asp Leu Lys Glu Arg Asn Glu Trp Phe Lys Lys Leu Leu Lys Glu

500 505 510

Leu Tyr Pro Lys Ile Lys Asn Asn Asn Leu Glu Ile Ile Glu Asn Val

515 520 525

Glu Asp Ile Phe Glu Ile Thr Asp Gln Glu Lys Phe Glu Ser Phe Ser

530 535 540

Lys Thr His Ser Leu Ser Arg Glu Ala Phe Asn His Ile Ile Pro Leu

545 550 555 560

Leu Leu Ser Asn Asn Glu Gly Lys Asn Tyr Glu Ser Leu Lys His Ser

565 570 575

Asn Glu Glu Leu Lys Lys Arg Thr Glu Lys Ala Glu Leu Lys Ala Gln

580 585 590

Gln Asn Gln Lys Tyr Leu Lys Asp Asn Phe Leu Lys Glu Ala Leu Val

595 600 605

Pro Leu Ser Val Lys Thr Ser Val Leu Gln Ala Ile Lys Ile Phe Asn

610 615 620

Gln Ile Ile Lys Asn Phe Gly Lys Lys Tyr Glu Ile Ser Gln Val Val

625 630 635 640

Ile Glu Met Ala Arg Glu Leu Thr Lys Pro Asn Leu Glu Lys Leu Leu

645 650 655

Asn Asn Ala Thr Asn Ser Asn Ile Lys Ile Leu Lys Glu Lys Leu Asp

660 665 670

Gln Thr Glu Lys Phe Asp Asp Phe Thr Lys Lys Lys Phe Ile Asp Lys

675 680 685

Ile Glu Asn Ser Val Val Phe Arg Asn Lys Leu Phe Leu Trp Phe Glu

690 695 700

Gln Asp Arg Lys Asp Pro Tyr Thr Gln Leu Asp Ile Lys Ile Asn Glu

705 710 715 720

Ile Glu Asp Glu Thr Glu Ile Asp His Val Ile Pro Tyr Ser Lys Ser

725 730 735

Ala Asp Asp Ser Trp Phe Asn Lys Leu Leu Val Lys Lys Ser Thr Asn

740 745 750

Gln Leu Lys Lys Asn Lys Thr Val Trp Glu Tyr Tyr Gln Asn Glu Ser

755 760 765

Asp Pro Glu Ala Lys Trp Asn Lys Phe Val Ala Trp Ala Lys Arg Ile

770 775 780

Tyr Leu Val Gln Lys Ser Asp Lys Glu Ser Lys Asp Asn Ser Glu Lys

785 790 795 800

Asn Ser Ile Phe Lys Asn Lys Lys Pro Asn Leu Lys Phe Lys Asn Ile

805 810 815

Thr Lys Lys Leu Phe Asp Pro Tyr Lys Asp Leu Gly Phe Leu Ala Arg

820 825 830

Asn Leu Asn Asp Thr Arg Tyr Ala Thr Lys Val Phe Arg Asp Gln Leu

835 840 845

Asn Asn Tyr Ser Lys His His Ser Lys Asp Asp Glu Asn Lys Leu Phe

850 855 860

Lys Val Val Cys Met Asn Gly Ser Ile Thr Ser Phe Leu Arg Lys Ser

865 870 875 880

Met Trp Arg Lys Asn Glu Glu Gln Val Tyr Arg Phe Asn Phe Trp Lys

885 890 895

Lys Asp Arg Asp Gln Phe Phe His His Ala Val Asp Ala Ser Ile Ile

900 905 910

Ala Ile Phe Ser Leu Leu Thr Lys Thr Leu Tyr Asn Lys Leu Arg Val

915 920 925

Tyr Glu Ser Tyr Asp Val Gln Arg Arg Glu Asp Gly Val Tyr Leu Ile

930 935 940

Asn Lys Glu Thr Gly Glu Val Lys Lys Ala Asp Lys Asp Tyr Trp Lys

945 950 955 960

Asp Gln His Asn Phe Leu Lys Ile Arg Glu Asn Ala Ile Glu Ile Lys

965 970 975

Asn Val Leu Asn Asn Val Asp Phe Gln Asn Gln Val Arg Tyr Ser Arg

980 985 990

Lys Ala Asn Thr Lys Leu Asn Thr Gln Leu Phe Asn Glu Thr Leu Tyr

995 1000 1005

Gly Val Lys Glu Phe Glu Asn Asn Phe Tyr Lys Leu Glu Lys Val

1010 1015 1020

Asn Leu Phe Ser Arg Lys Asp Leu Arg Lys Phe Ile Leu Glu Asp

1025 1030 1035

Leu Asn Glu Glu Ser Glu Lys Asn Lys Lys Asn Glu Asn Gly Ser

1040 1045 1050

Arg Lys Arg Ile Leu Thr Glu Lys Tyr Ile Val Asp Glu Ile Leu

1055 1060 1065

Gln Ile Leu Glu Asn Glu Glu Phe Lys Asp Ser Lys Ser Asp Ile

1070 1075 1080

Asn Ala Leu Asn Lys Tyr Met Asp Ser Leu Pro Ser Lys Phe Ser

1085 1090 1095

Glu Phe Phe Ser Gln Asp Phe Ile Asn Lys Cys Lys Lys Glu Asn

1100 1105 1110

Ser Leu Ile Leu Thr Phe Asp Ala Ile Lys His Asn Asp Pro Lys

1115 1120 1125

Lys Val Ile Lys Ile Lys Asn Leu Lys Phe Phe Arg Glu Asp Ala

1130 1135 1140

Thr Leu Lys Asn Lys Gln Ala Val His Lys Asp Ser Lys Asn Gln

1145 1150 1155

Ile Lys Ser Phe Tyr Glu Ser Tyr Lys Cys Val Gly Phe Ile Trp

1160 1165 1170

Leu Lys Asn Lys Asn Asp Leu Glu Glu Ser Ile Phe Val Pro Ile

1175 1180 1185

Asn Ser Arg Val Ile His Phe Gly Asp Lys Asp Lys Asp Ile Phe

1190 1195 1200

Asp Phe Asp Ser Tyr Asn Lys Glu Lys Leu Leu Asn Glu Ile Asn

1205 1210 1215

Leu Lys Arg Pro Glu Asn Lys Lys Phe Asn Ser Ile Asn Glu Ile

1220 1225 1230

Glu Phe Val Lys Phe Val Lys Pro Gly Ala Leu Leu Leu Asn Phe

1235 1240 1245

Glu Asn Gln Gln Ile Tyr Tyr Ile Ser Thr Leu Glu Ser Ser Ser

1250 1255 1260

Leu Arg Ala Lys Ile Lys Leu Leu Asn Lys Met Asp Lys Gly Lys

1265 1270 1275

Ala Val Ser Met Lys Lys Ile Thr Asn Pro Asp Glu Tyr Lys Ile

1280 1285 1290

Ile Glu His Val Asn Pro Leu Gly Ile Asn Leu Asn Trp Thr Lys

1295 1300 1305

Lys Leu Glu Asn Asn Asn

1310

<210> SEQ ID NO 23

<211> LENGTH: 1368

<212> TYPE: PRT

<213> ORGANISM: Streptococcus pyogenes

<400> SEQUENCE: 23

Met Asp Lys Lys Tyr Ser Ile Gly Leu Asp Ile Gly Thr Asn Ser Val

1 5 10 15

Gly Trp Ala Val Ile Thr Asp Asp Tyr Lys Val Pro Ser Lys Lys Phe

20 25 30

Lys Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile

35 40 45

Gly Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr Arg Leu

50 55 60

Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys

65 70 75 80

Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser

85 90 95

Phe Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys

100 105 110

His Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr

115 120 125

His Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val Asp

130 135 140

Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His

145 150 155 160

Met Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro

165 170 175

Asp Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr

180 185 190

Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala

195 200 205

Lys Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn

210 215 220

Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn

225 230 235 240

Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe

245 250 255

Asp Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp

260 265 270

Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp

275 280 285

Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp

290 295 300

Ile Leu Arg Leu Asn Ser Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser

305 310 315 320

Met Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys

325 330 335

Ala Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe

340 345 350

Asp Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser

355 360 365

Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp

370 375 380

Gly Thr Glu Glu Leu Leu Ala Lys Leu Asn Arg Glu Asp Leu Leu Arg

385 390 395 400

Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile His Leu

405 410 415

Gly Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe

420 425 430

Leu Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile

435 440 445

Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp

450 455 460

Met Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu

465 470 475 480

Val Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr

485 490 495

Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser

500 505 510

Leu Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys

515 520 525

Tyr Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu Gln

530 535 540

Lys Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr

545 550 555 560

Val Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp

565 570 575

Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly

580 585 590

Thr Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp

595 600 605

Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr

610 615 620

Leu Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala

625 630 635 640

His Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr

645 650 655

Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp

660 665 670

Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe

675 680 685

Ala Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe

690 695 700

Lys Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu

705 710 715 720

His Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly

725 730 735

Ile Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly

740 745 750

Arg His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln

755 760 765

Thr Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile

770 775 780

Glu Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro

785 790 795 800

Val Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu

805 810 815

Gln Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg

820 825 830

Leu Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Ile Lys

835 840 845

Asp Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg

850 855 860

Gly Lys Ser Asn Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys

865 870 875 880

Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys

885 890 895

Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp

900 905 910

Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr

915 920 925

Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp

930 935 940

Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys Ser

945 950 955 960

Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg

965 970 975

Glu Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val

980 985 990

Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe

995 1000 1005

Val Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala

1010 1015 1020

Lys Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe

1025 1030 1035

Tyr Ser Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala

1040 1045 1050

Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu

1055 1060 1065

Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val

1070 1075 1080

Arg Lys Val Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr

1085 1090 1095

Glu Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys

1100 1105 1110

Arg Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro

1115 1120 1125

Lys Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val

1130 1135 1140

Leu Val Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys

1145 1150 1155

Ser Val Lys Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser

1160 1165 1170

Phe Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys

1175 1180 1185

Glu Val Arg Lys Asp Leu Ile Val Lys Leu Pro Lys Tyr Ser Leu

1190 1195 1200

Phe Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly

1205 1210 1215

Glu Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val

1220 1225 1230

Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser

1235 1240 1245

Pro Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys

1250 1255 1260

His Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys

1265 1270 1275

Arg Val Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala

1280 1285 1290

Tyr Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn

1295 1300 1305

Ile Ile His Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala

1310 1315 1320

Phe Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser

1325 1330 1335

Thr Lys Glu Val Leu Asp Ala Thr Phe Ile His Gln Ser Ile Thr

1340 1345 1350

Gly Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp

1355 1360 1365

<210> SEQ ID NO 24

<211> LENGTH: 1368

<212> TYPE: PRT

<213> ORGANISM: Streptococcus pyogenes

<400> SEQUENCE: 24

Met Asp Lys Lys Tyr Ser Ile Gly Leu Asp Ile Gly Thr Asn Ser Val

1 5 10 15

Gly Trp Ala Val Ile Thr Asp Asp Tyr Lys Val Pro Ser Lys Lys Phe

20 25 30

Lys Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile

35 40 45

Gly Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr Arg Leu

50 55 60

Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys

65 70 75 80

Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser

85 90 95

Phe Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys

100 105 110

His Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr

115 120 125

His Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val Asp

130 135 140

Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His

145 150 155 160

Met Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro

165 170 175

Asp Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr

180 185 190

Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala

195 200 205

Lys Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn

210 215 220

Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn

225 230 235 240

Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe

245 250 255

Asp Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp

260 265 270

Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp

275 280 285

Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp

290 295 300

Ile Leu Arg Val Asn Thr Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser

305 310 315 320

Met Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys

325 330 335

Ala Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe

340 345 350

Asp Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser

355 360 365

Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp

370 375 380

Gly Thr Glu Glu Leu Leu Val Lys Leu Asn Arg Glu Asp Leu Leu Arg

385 390 395 400

Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile His Leu

405 410 415

Gly Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe

420 425 430

Leu Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile

435 440 445

Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp

450 455 460

Met Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu

465 470 475 480

Val Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr

485 490 495

Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser

500 505 510

Leu Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys

515 520 525

Tyr Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu Gln

530 535 540

Lys Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr

545 550 555 560

Val Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp

565 570 575

Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly

580 585 590

Thr Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp

595 600 605

Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr

610 615 620

Leu Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala

625 630 635 640

His Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr

645 650 655

Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp

660 665 670

Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe

675 680 685

Ala Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe

690 695 700

Lys Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu

705 710 715 720

His Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly

725 730 735

Ile Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly

740 745 750

Arg His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln

755 760 765

Thr Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile

770 775 780

Glu Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro

785 790 795 800

Val Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu

805 810 815

Gln Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg

820 825 830

Leu Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Leu Lys

835 840 845

Asp Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg

850 855 860

Gly Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys

865 870 875 880

Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys

885 890 895

Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp

900 905 910

Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr

915 920 925

Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp

930 935 940

Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys Ser

945 950 955 960

Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg

965 970 975

Glu Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val

980 985 990

Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe

995 1000 1005

Val Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala

1010 1015 1020

Lys Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe

1025 1030 1035

Tyr Ser Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala

1040 1045 1050

Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu

1055 1060 1065

Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val

1070 1075 1080

Arg Lys Val Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr

1085 1090 1095

Glu Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys

1100 1105 1110

Arg Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro

1115 1120 1125

Lys Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val

1130 1135 1140

Leu Val Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys

1145 1150 1155

Ser Val Lys Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser

1160 1165 1170

Phe Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys

1175 1180 1185

Glu Val Lys Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu

1190 1195 1200

Phe Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly

1205 1210 1215

Glu Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val

1220 1225 1230

Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser

1235 1240 1245

Pro Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys

1250 1255 1260

His Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys

1265 1270 1275

Arg Val Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala

1280 1285 1290

Tyr Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn

1295 1300 1305

Ile Ile His Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala

1310 1315 1320

Phe Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser

1325 1330 1335

Thr Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr

1340 1345 1350

Gly Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp

1355 1360 1365

<210> SEQ ID NO 25

<211> LENGTH: 1370

<212> TYPE: PRT

<213> ORGANISM: Streptococcus agalactiae

<400> SEQUENCE: 25

Met Asn Lys Pro Tyr Ser Ile Gly Leu Asp Ile Gly Thr Asn Ser Val

1 5 10 15

Gly Trp Ser Ile Ile Thr Asp Asp Tyr Lys Val Pro Ala Lys Lys Met

20 25 30

Arg Val Leu Gly Asn Thr Asp Lys Glu Tyr Ile Lys Lys Asn Leu Ile

35 40 45

Gly Ala Leu Leu Phe Asp Gly Gly Asn Thr Ala Ala Asp Arg Arg Leu

50 55 60

Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Arg Asn Arg Ile Leu

65 70 75 80

Tyr Leu Gln Glu Ile Phe Ala Glu Glu Met Ser Lys Val Asp Asp Ser

85 90 95

Phe Phe His Arg Leu Glu Asp Ser Phe Leu Val Glu Glu Asp Lys Arg

100 105 110

Gly Ser Lys Tyr Pro Ile Phe Ala Thr Leu Gln Glu Glu Lys Asp Tyr

115 120 125

His Glu Lys Phe Ser Thr Ile Tyr His Leu Arg Lys Glu Leu Ala Asp

130 135 140

Lys Lys Glu Lys Ala Asp Leu Arg Leu Ile Tyr Ile Ala Leu Ala His

145 150 155 160

Ile Ile Lys Phe Arg Gly His Phe Leu Ile Glu Asp Asp Ser Phe Asp

165 170 175

Val Arg Asn Thr Asp Ile Ser Lys Gln Tyr Gln Asp Phe Leu Glu Ile

180 185 190

Phe Asn Thr Thr Phe Glu Asn Asn Asp Leu Leu Ser Gln Asn Val Asp

195 200 205

Val Glu Ala Ile Leu Thr Asp Lys Ile Ser Lys Ser Ala Lys Lys Asp

210 215 220

Arg Ile Leu Ala Gln Tyr Pro Asn Gln Lys Ser Thr Gly Ile Phe Ala

225 230 235 240

Glu Phe Leu Lys Leu Ile Val Gly Asn Gln Ala Asp Phe Lys Lys Tyr

245 250 255

Phe Asn Leu Glu Asp Lys Thr Pro Leu Gln Phe Ala Lys Asp Ser Tyr

260 265 270

Asp Glu Asp Leu Glu Asn Leu Leu Gly Gln Ile Gly Asp Glu Phe Ala

275 280 285

Asp Leu Phe Ser Ala Ala Lys Lys Leu Tyr Asp Ser Val Leu Leu Ser

290 295 300

Gly Ile Leu Thr Val Ile Asp Leu Ser Thr Lys Ala Pro Leu Ser Ala

305 310 315 320

Ser Met Ile Gln Arg Tyr Asp Glu His Arg Glu Asp Leu Lys Gln Leu

325 330 335

Lys Gln Phe Val Lys Ala Ser Leu Pro Glu Lys Tyr Gln Glu Ile Phe

340 345 350

Ala Asp Ser Ser Lys Asp Gly Tyr Ala Gly Tyr Ile Glu Gly Lys Thr

355 360 365

Asn Gln Glu Ala Phe Tyr Lys Tyr Leu Ser Lys Leu Leu Thr Lys Gln

370 375 380

Glu Asp Ser Glu Asn Phe Leu Glu Lys Ile Lys Asn Glu Asp Phe Leu

385 390 395 400

Arg Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Val His

405 410 415

Leu Thr Glu Leu Lys Ala Ile Ile Arg Arg Gln Ser Glu Tyr Tyr Pro

420 425 430

Phe Leu Lys Glu Asn Gln Asp Arg Ile Glu Lys Ile Leu Thr Phe Arg

435 440 445

Ile Pro Tyr Tyr Ile Gly Pro Leu Ala Arg Glu Lys Ser Asp Phe Ala

450 455 460

Trp Met Thr Arg Lys Thr Asp Asp Ser Ile Arg Pro Trp Asn Phe Glu

465 470 475 480

Asp Leu Val Asp Lys Glu Lys Ser Ala Glu Ala Phe Ile His Arg Met

485 490 495

Thr Asn Asn Asp Phe Tyr Leu Pro Glu Glu Lys Val Leu Pro Lys His

500 505 510

Ser Leu Ile Tyr Glu Lys Phe Thr Val Tyr Asn Glu Leu Thr Lys Val

515 520 525

Arg Tyr Lys Asn Glu Gln Gly Glu Thr Tyr Phe Phe Asp Ser Asn Ile

530 535 540

Lys Gln Glu Ile Phe Asp Gly Val Phe Lys Glu His Arg Lys Val Ser

545 550 555 560

Lys Lys Lys Leu Leu Asp Phe Leu Ala Lys Glu Tyr Glu Glu Phe Arg

565 570 575

Ile Val Asp Val Ile Gly Leu Asp Lys Glu Asn Lys Ala Phe Asn Ala

580 585 590

Ser Leu Gly Thr Tyr His Asp Leu Glu Lys Ile Leu Asp Lys Asp Phe

595 600 605

Leu Asp Asn Pro Asp Asn Glu Ser Ile Leu Glu Asp Ile Val Gln Thr

610 615 620

Leu Thr Leu Phe Glu Asp Arg Glu Met Ile Lys Lys Arg Leu Glu Asn

625 630 635 640

Tyr Lys Asp Leu Phe Thr Glu Ser Gln Leu Lys Lys Leu Tyr Arg Arg

645 650 655

His Tyr Thr Gly Trp Gly Arg Leu Ser Ala Lys Leu Ile Asn Gly Ile

660 665 670

Arg Asp Lys Glu Ser Gln Lys Thr Ile Leu Asp Tyr Leu Ile Asp Asp

675 680 685

Gly Arg Ser Asn Arg Asn Phe Met Gln Leu Ile Asn Asp Asp Gly Leu

690 695 700

Ser Phe Lys Ser Ile Ile Ser Lys Ala Gln Ala Gly Ser His Ser Asp

705 710 715 720

Asn Leu Lys Glu Val Val Gly Glu Leu Ala Gly Ser Pro Ala Ile Lys

725 730 735

Lys Gly Ile Leu Gln Ser Leu Lys Ile Val Asp Glu Leu Val Lys Val

740 745 750

Met Gly Tyr Glu Pro Glu Gln Ile Val Val Glu Met Ala Arg Glu Asn

755 760 765

Gln Thr Thr Asn Gln Gly Arg Arg Asn Ser Arg Gln Arg Tyr Lys Leu

770 775 780

Leu Asp Asp Gly Val Lys Asn Leu Ala Ser Asp Leu Asn Gly Asn Ile

785 790 795 800

Leu Lys Glu Tyr Pro Thr Asp Asn Gln Ala Leu Gln Asn Glu Arg Leu

805 810 815

Phe Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met Tyr Thr Gly Glu Ala

820 825 830

Leu Asp Ile Asp Asn Leu Ser Gln Tyr Asp Ile Asp His Ile Ile Pro

835 840 845

Gln Ala Phe Ile Lys Asp Asp Ser Ile Asp Asn Arg Val Leu Val Ser

850 855 860

Ser Ala Lys Asn Arg Gly Lys Ser Asp Asp Val Pro Ser Leu Glu Ile

865 870 875 880

Val Lys Asp Cys Lys Val Phe Trp Lys Lys Leu Leu Asp Ala Lys Leu

885 890 895

Met Ser Gln Arg Lys Tyr Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly

900 905 910

Leu Thr Ser Asp Asp Lys Ala Arg Phe Ile Gln Arg Gln Leu Val Glu

915 920 925

Thr Arg Gln Ile Thr Lys His Val Ala Arg Ile Leu Asp Glu Arg Phe

930 935 940

Asn Asn Glu Leu Asp Ser Lys Gly Arg Arg Ile Arg Lys Val Lys Ile

945 950 955 960

Val Thr Leu Lys Ser Asn Leu Val Ser Asn Phe Arg Lys Glu Phe Gly

965 970 975

Phe Tyr Lys Ile Arg Glu Val Asn Asn Tyr His His Ala His Asp Ala

980 985 990

Tyr Leu Asn Ala Val Val Ala Lys Ala Ile Leu Thr Lys Tyr Pro Gln

995 1000 1005

Leu Glu Pro Glu Phe Val Tyr Gly Asp Tyr Pro Lys Tyr Asn Ser

1010 1015 1020

Tyr Lys Thr Arg Lys Ser Ala Thr Glu Lys Leu Phe Phe Tyr Ser

1025 1030 1035

Asn Ile Met Asn Phe Phe Lys Thr Lys Val Thr Leu Ala Asp Gly

1040 1045 1050

Thr Val Val Val Lys Asp Asp Ile Glu Val Asn Asn Asp Thr Gly

1055 1060 1065

Glu Ile Val Trp Asp Lys Lys Lys His Phe Ala Thr Val Arg Lys

1070 1075 1080

Val Leu Ser Tyr Pro Gln Asn Asn Ile Val Lys Lys Thr Glu Ile

1085 1090 1095

Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Ala His Gly Asn

1100 1105 1110

Ser Asp Lys Leu Ile Pro Arg Lys Thr Lys Asp Ile Tyr Leu Asp

1115 1120 1125

Pro Lys Lys Tyr Gly Gly Phe Asp Ser Pro Ile Val Ala Tyr Ser

1130 1135 1140

Val Leu Val Val Ala Asp Ile Lys Lys Gly Lys Ala Gln Lys Leu

1145 1150 1155

Lys Thr Val Thr Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser

1160 1165 1170

Arg Phe Glu Lys Asn Pro Ser Ala Phe Leu Glu Ser Lys Gly Tyr

1175 1180 1185

Leu Asn Ile Arg Ala Asp Lys Leu Ile Ile Leu Pro Lys Tyr Ser

1190 1195 1200

Leu Phe Glu Leu Glu Asn Gly Arg Arg Arg Leu Leu Ala Ser Ala

1205 1210 1215

Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Thr Gln Phe

1220 1225 1230

Met Lys Phe Leu Tyr Leu Ala Ser Arg Tyr Asn Glu Ser Lys Gly

1235 1240 1245

Lys Pro Glu Glu Ile Glu Lys Lys Gln Glu Phe Val Asn Gln His

1250 1255 1260

Val Ser Tyr Phe Asp Asp Ile Leu Gln Leu Ile Asn Asp Phe Ser

1265 1270 1275

Lys Arg Val Ile Leu Ala Asp Ala Asn Leu Glu Lys Ile Asn Lys

1280 1285 1290

Leu Tyr Gln Asp Asn Lys Glu Asn Ile Ser Val Asp Glu Leu Ala

1295 1300 1305

Asn Asn Ile Ile Asn Leu Phe Thr Phe Thr Ser Leu Gly Ala Pro

1310 1315 1320

Ala Ala Phe Lys Phe Phe Asp Lys Ile Val Asp Arg Lys Arg Tyr

1325 1330 1335

Thr Ser Thr Lys Glu Val Leu Asn Ser Thr Leu Ile His Gln Ser

1340 1345 1350

Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp Leu Gly Lys Leu Gly

1355 1360 1365

Glu Asp

1370

<210> SEQ ID NO 26

<211> LENGTH: 1173

<212> TYPE: PRT

<213> ORGANISM: Rhodospirillum rubrum

<400> SEQUENCE: 26

Met Arg Pro Ile Glu Pro Trp Ile Leu Gly Leu Asp Ile Gly Thr Asp

1 5 10 15

Ser Leu Gly Trp Ala Val Phe Ser Cys Glu Glu Lys Gly Pro Pro Thr

20 25 30

Ala Lys Glu Leu Leu Gly Gly Gly Val Arg Leu Phe Asp Ser Gly Arg

35 40 45

Asp Ala Lys Asp His Thr Ser Arg Gln Ala Glu Arg Gly Ala Phe Arg

50 55 60

Arg Ala Arg Arg Gln Thr Arg Thr Trp Pro Trp Arg Arg Asp Arg Leu

65 70 75 80

Ile Ala Leu Phe Gln Ala Ala Gly Leu Thr Pro Pro Ala Ala Glu Thr

85 90 95

Arg Gln Ile Ala Leu Ala Leu Arg Arg Glu Ala Val Ser Arg Pro Leu

100 105 110

Ala Pro Asp Ala Leu Trp Ala Ala Leu Leu His Leu Ala His His Arg

115 120 125

Gly Phe Arg Ser Asn Arg Ile Asp Lys Arg Glu Arg Ala Ala Ala Lys

130 135 140

Ala Leu Ala Lys Ala Lys Pro Ala Lys Ala Thr Ala Lys Ala Thr Ala

145 150 155 160

Pro Ala Lys Glu Ala Asp Asp Glu Ala Gly Phe Trp Glu Gly Ala Glu

165 170 175

Ala Ala Leu Arg Gln Arg Met Ala Ala Ser Gly Ala Pro Thr Val Gly

180 185 190

Ala Leu Leu Ala Asp Asp Leu Asp Arg Gly Gln Pro Val Arg Met Arg

195 200 205

Tyr Asn Gln Ser Asp Arg Asp Gly Val Val Ala Pro Thr Arg Ala Leu

210 215 220

Ile Ala Glu Glu Leu Ala Glu Ile Val Ala Arg Gln Ser Ser Ala Tyr

225 230 235 240

Pro Gly Leu Asp Trp Pro Ala Val Thr Arg Leu Val Leu Asp Gln Arg

245 250 255

Pro Leu Arg Ser Lys Gly Ala Gly Pro Cys Ala Phe Leu Pro Gly Glu

260 265 270

Asp Arg Ala Leu Arg Ala Leu Pro Thr Val Gln Asp Phe Ile Ile Arg

275 280 285

Gln Thr Leu Ala Asn Leu Arg Leu Pro Ser Thr Ser Ala Asp Glu Pro

290 295 300

Arg Pro Leu Thr Asp Glu Glu His Ala Lys Ala Leu Ala Leu Leu Ser

305 310 315 320

Thr Ala Arg Phe Val Glu Trp Pro Ala Leu Arg Arg Ala Leu Gly Leu

325 330 335

Lys Arg Gly Val Lys Phe Thr Ala Glu Thr Glu Arg Asn Gly Ala Lys

340 345 350

Gln Ala Ala Arg Gly Thr Ala Gly Asn Leu Thr Glu Ala Ile Leu Ala

355 360 365

Pro Leu Ile Pro Gly Trp Ser Gly Trp Asp Leu Asp Arg Lys Asp Arg

370 375 380

Val Phe Ser Asp Leu Trp Ala Ala Arg Gln Asp Arg Ser Ala Leu Leu

385 390 395 400

Ala Leu Ile Gly Asp Pro Arg Gly Pro Thr Arg Val Thr Glu Asp Glu

405 410 415

Thr Ala Glu Ala Val Ala Asp Ala Ile Gln Ile Val Leu Pro Thr Gly

420 425 430

Arg Ala Ser Leu Ser Ala Lys Ala Ala Arg Ala Ile Ala Gln Ala Met

435 440 445

Ala Pro Gly Ile Gly Tyr Asp Glu Ala Val Thr Leu Ala Leu Gly Leu

450 455 460

His His Ser His Arg Pro Arg Gln Glu Arg Leu Ala Arg Leu Pro Tyr

465 470 475 480

Tyr Ala Ala Ala Leu Pro Asp Val Gly Leu Asp Gly Asp Pro Val Gly

485 490 495

Pro Pro Pro Ala Glu Asp Asp Gly Ala Ala Ala Glu Ala Tyr Tyr Gly

500 505 510

Arg Ile Gly Asn Ile Ser Val His Ile Ala Leu Asn Glu Thr Arg Lys

515 520 525

Ile Val Asn Ala Leu Leu His Arg His Gly Pro Ile Leu Arg Leu Val

530 535 540

Met Val Glu Thr Thr Arg Glu Leu Lys Ala Gly Ala Asp Glu Arg Lys

545 550 555 560

Arg Met Ile Ala Glu Gln Ala Glu Arg Glu Arg Glu Asn Ala Glu Ile

565 570 575

Asp Val Glu Leu Arg Lys Ser Asp Arg Trp Met Ala Asn Ala Arg Glu

580 585 590

Arg Arg Gln Arg Val Arg Leu Ala Arg Arg Gln Asn Asn Leu Cys Pro

595 600 605

Tyr Thr Ser Thr Pro Ile Gly His Ala Asp Leu Leu Gly Asp Ala Tyr

610 615 620

Asp Ile Asp His Val Ile Pro Leu Ala Arg Gly Gly Arg Asp Ser Leu

625 630 635 640

Asp Asn Met Val Leu Cys Gln Ser Asp Ala Asn Lys Thr Lys Gly Asp

645 650 655

Lys Thr Pro Trp Glu Ala Phe His Asp Lys Pro Gly Trp Ile Ala Gln

660 665 670

Arg Asp Asp Phe Leu Ala Arg Leu Asp Pro Gln Thr Ala Lys Ala Leu

675 680 685

Ala Trp Arg Phe Ala Asp Asp Ala Gly Glu Arg Val Ala Arg Lys Ser

690 695 700

Ala Glu Asp Glu Asp Gln Gly Phe Leu Pro Arg Gln Leu Thr Asp Thr

705 710 715 720

Gly Tyr Ile Ala Arg Val Ala Leu Arg Tyr Leu Ser Leu Val Thr Asn

725 730 735

Glu Pro Asn Ala Val Val Ala Thr Asn Gly Arg Leu Thr Gly Leu Leu

740 745 750

Arg Leu Ala Trp Asp Ile Thr Pro Gly Pro Ala Pro Arg Asp Leu Leu

755 760 765

Pro Thr Pro Arg Asp Ala Leu Arg Asp Asp Thr Ala Ala Arg Arg Phe

770 775 780

Leu Asp Gly Leu Thr Pro Pro Pro Leu Ala Lys Ala Val Glu Gly Ala

785 790 795 800

Val Gln Ala Arg Leu Ala Ala Leu Gly Arg Ser Arg Val Ala Asp Ala

805 810 815

Gly Leu Ala Asp Ala Leu Gly Leu Thr Leu Ala Ser Leu Gly Gly Gly

820 825 830

Gly Lys Asn Arg Ala Asp His Arg His His Phe Ile Asp Ala Ala Met

835 840 845

Ile Ala Val Thr Thr Arg Gly Leu Ile Asn Gln Ile Asn Gln Ala Ser

850 855 860

Gly Ala Gly Arg Ile Leu Asp Leu Arg Lys Trp Pro Arg Thr Asn Phe

865 870 875 880

Glu Pro Pro Tyr Pro Thr Phe Arg Ala Glu Val Met Lys Gln Trp Asp

885 890 895

His Ile His Pro Ser Ile Arg Pro Ala His Arg Asp Gly Gly Ser Leu

900 905 910

His Ala Ala Thr Val Phe Gly Val Arg Asn Arg Pro Asp Ala Arg Val

915 920 925

Leu Val Gln Arg Lys Pro Val Glu Lys Leu Phe Leu Asp Ala Asn Ala

930 935 940

Lys Pro Leu Pro Ala Asp Lys Ile Ala Glu Ile Ile Asp Gly Phe Ala

945 950 955 960

Ser Pro Arg Met Ala Lys Arg Phe Lys Ala Leu Leu Ala Arg Tyr Gln

965 970 975

Ala Ala His Pro Glu Val Pro Pro Ala Leu Ala Ala Leu Ala Val Ala

980 985 990

Arg Asp Pro Ala Phe Gly Pro Arg Gly Met Thr Ala Asn Thr Val Ile

995 1000 1005

Ala Gly Arg Ser Asp Gly Asp Gly Glu Asp Ala Gly Leu Ile Thr

1010 1015 1020

Pro Phe Arg Ala Asn Pro Lys Ala Ala Val Arg Thr Met Gly Asn

1025 1030 1035

Ala Val Tyr Glu Val Trp Glu Ile Gln Val Lys Gly Arg Pro Arg

1040 1045 1050

Trp Thr His Arg Val Leu Thr Arg Phe Asp Arg Thr Gln Pro Ala

1055 1060 1065

Pro Pro Pro Pro Pro Glu Asn Ala Arg Leu Val Met Arg Leu Arg

1070 1075 1080

Arg Gly Asp Leu Val Tyr Trp Pro Leu Glu Ser Gly Asp Arg Leu

1085 1090 1095

Phe Leu Val Lys Lys Met Ala Val Asp Gly Arg Leu Ala Leu Trp

1100 1105 1110

Pro Ala Arg Leu Ala Thr Gly Lys Ala Thr Ala Leu Tyr Ala Gln

1115 1120 1125

Leu Ser Cys Pro Asn Ile Asn Leu Asn Gly Asp Gln Gly Tyr Cys

1130 1135 1140

Val Gln Ser Ala Glu Gly Ile Arg Lys Glu Lys Ile Arg Thr Thr

1145 1150 1155

Ser Cys Thr Ala Leu Gly Arg Leu Arg Leu Ser Lys Lys Ala Thr

1160 1165 1170

<210> SEQ ID NO 27

<211> LENGTH: 997

<212> TYPE: PRT

<213> ORGANISM: Maritimibacter alkaliphilus

<400> SEQUENCE: 27

Met Lys Val Leu Val Asp Ala Gly Leu Met Pro Ser Asp Pro Ala Ala

1 5 10 15

Ala Lys Ala Leu Glu Thr Leu Asp Pro Tyr Glu Leu Arg Ala Thr Gly

20 25 30

Leu Asp Val Met Leu Pro Leu Thr His Leu Gly Arg Ala Val Phe His

35 40 45

Leu Asn Gln Arg Arg Gly Phe Lys Ser Asn Arg Lys Thr Asp Arg Gly

50 55 60

Asp Asn Glu Ser Gly Lys Ile Lys Asp Ala Thr Lys Arg Leu Thr Trp

65 70 75 80

Ala Met Gln Glu Ala Gly Ala Arg Thr Tyr Gly Glu Phe Leu His Met

85 90 95

Arg Arg Ala Arg Ala Ser Asp Pro Arg Gln Val Pro Ser Val Arg Thr

100 105 110

Arg Leu Ser Val Ala Ser Arg Gly Gly Pro Asp Ser Lys Glu Glu Ala

115 120 125

Gly Tyr Asp Phe Tyr Pro Glu Arg Ala His Leu Glu Asp Glu Phe His

130 135 140

Ala Leu Trp Ser Ala Gln Ala Ala Phe His Pro Asp Leu Thr Cys Glu

145 150 155 160

Leu Arg Asp Val Val Phe Glu Lys Ile Phe Tyr Gln Arg Pro Leu Lys

165 170 175

Glu Pro Lys Val Gly Leu Cys Leu Phe Thr Ala Glu Glu Arg Leu Pro

180 185 190

Lys Ala His Pro Leu Thr Gly Arg Arg Val Leu Tyr Glu Thr Val Asn

195 200 205

Gln Leu Arg Val Thr Ala Asp Gly Arg Thr Thr Arg Pro Leu Thr Leu

210 215 220

Glu Glu Arg Asp Met Ile Val Tyr Ala Leu Asp Asn Lys Lys Pro Val

225 230 235 240

Lys Ser Leu Ser Ser Met His Leu Lys Leu Val Ala Leu Gly Lys Leu

245 250 255

Val Lys Leu Lys Asp Gly Glu Arg Phe Thr Leu Glu Ser Gly Val Arg

260 265 270

Asp Ala Ile Ala Cys Asp Pro Ile Arg Ala Thr Met Gly His Pro Asp

275 280 285

Gly Phe Gly Pro Ala Trp Ser Arg Leu Asp Trp Gln Ala Gln Trp Ala

290 295 300

Leu Ile Glu Lys Leu Arg Gln Val Gln Ser Asp Asp Asp Phe Ala Ser

305 310 315 320

Leu Val Ala Trp Leu Gly Ala Thr His Asn Leu Ser Glu Asp His Ala

325 330 335

Lys Arg Val Ala Asn Leu His Leu Pro Glu Gly Tyr Gly Arg Leu Gly

340 345 350

Leu Thr Ala Thr Glu Ala Ile Leu Cys Asn Leu Lys Ala Glu Val Cys

355 360 365

Thr Tyr Ser Gln Ala Val Glu Arg Trp Gly Lys His His Ser Asn Gln

370 375 380

Arg Thr Gly Glu Cys Leu Glu Ala Leu Pro Tyr Tyr Gly Glu Val Leu

385 390 395 400

Asp Arg His Val Ile Pro Gly Thr Tyr Asn Glu Asp Asp Asp Asp Val

405 410 415

Thr Arg Tyr Gly Arg Ile Thr Asn Pro Thr Val His Ile Gly Leu Asn

420 425 430

Gln Leu Arg Arg Leu Val Asn Arg Ile Ile Ala Thr Tyr Gly Lys Pro

435 440 445

Asp Gln Ile Val Val Glu Leu Ala Arg Glu Leu Lys Gln Ser Glu Ala

450 455 460

Gln Lys Ser Glu Ala Leu Lys Lys Ile Arg Asp Thr Thr Ala Asp Ala

465 470 475 480

Ile Arg Arg Gly Arg Gln Leu Glu Glu Ile Gly Gln Lys Asn Thr Gly

485 490 495

Ala Asn Arg Leu Leu Leu Arg Leu Trp Glu Asp Leu Gly Pro Ala Val

500 505 510

Gly Pro Arg Cys Cys Pro Tyr Thr Gly Lys Pro Ile Ser Val Thr Met

515 520 525

Leu Phe Asp Gly Ala Cys Asp Val Asp His Ile Leu Pro Tyr Ser Arg

530 535 540

Thr Leu Asp Asp Ser Phe Ala Asn Arg Thr Ile Cys Leu Arg Glu Ala

545 550 555 560

Asn Arg Gln Lys Gly Asp Arg Thr Pro Trp Glu Ala Trp Gly Gly Ser

565 570 575

Asp Gln Trp Asp Ala Ile Glu Ala Asn Leu Lys Asp Leu Pro Glu Asn

580 585 590

Lys Arg Lys Arg Phe Ala Pro Asp Ala Met Glu Arg Phe Glu Gly Glu

595 600 605

Arg Asp Phe Leu Asp Arg Ala Leu Val Asp Thr Gln Tyr Leu Ala Arg

610 615 620

Ile Ser Arg Ala Tyr Leu Asp Thr Leu Phe Thr Glu Gly Gly His Val

625 630 635 640

Trp Val Val Pro Gly Arg Met Thr Glu Met Leu Arg Arg His Trp Gly

645 650 655

Leu Asn Ala Leu Leu Ser Asp Lys Asp Arg Gly Ala Gly Lys Val Lys

660 665 670

Asn Arg Thr Asp His Arg His His Ala Ile Asp Ala Ala Val Ile Ala

675 680 685

Ala Thr Asp His Ser Leu Val Asn Arg Ile Ser Lys Ala Ala Gly Gln

690 695 700

Gly Glu Ser Ala Gly Gln Ser Ala Glu Leu Ile Ala Arg Asp Thr Pro

705 710 715 720

Glu Pro Trp Glu Gly Phe Arg Asn Asp Ile Ser Ala Gln Ile Ala Arg

725 730 735

Ile Val Val Ser His Arg Ala Asp His Gly Arg Ile Asp Val Glu Gly

740 745 750

Arg Lys His Gly Lys Asp Ser Thr Ala Gly Arg Leu His Asn Asp Thr

755 760 765

Ala Tyr Gly Ile Val Asp Asp His Thr Val Val Ser Arg Thr Pro Leu

770 775 780

Leu Ser Leu Lys Pro Gly Asp Ile Glu Val Thr Thr Lys Gly Lys Asn

785 790 795 800

Ile Arg Asp Thr Gln Leu Gln Lys Ala Leu Ser Val Ala Thr Thr Gly

805 810 815

Lys Asp Gly Lys Gly Phe Glu Glu Ala Leu Arg Asn Phe Ala Glu Lys

820 825 830

Glu Gly Pro Tyr Gln Gly Ile Arg Arg Val Arg Leu Ile Glu Thr Leu

835 840 845

Gln Ser Ser Ala Arg Val Glu Val Gly Ser Asp Glu Asp Gly Gln Pro

850 855 860

Leu Lys Ala Tyr Lys Gly Asp Ser Asn His Cys Tyr Asp Leu Trp Lys

865 870 875 880

Leu Pro Asp Gly Lys Val Val Pro His Val Val Ser Thr Tyr Glu Ala

885 890 895

His Ala Gly Thr Asp Lys Arg Pro His Pro Ala Ala Lys Arg Ile Leu

900 905 910

Arg Val Phe Lys Lys Asp Met Val Ala Ile Glu Gln Asp Asp Gln Thr

915 920 925

Ala Ile Phe Tyr Val Gln Lys Leu Asp Arg Ala Asn Gly Leu Phe Leu

930 935 940

Ala Pro His Leu Glu Ala Asn Ala Asp Ala Arg His Arg Asn Pro Asn

945 950 955 960

Asp Ala Phe Lys Phe Leu Gln Met Gly Ala Gly Ser Val Val Arg Asn

965 970 975

Lys Leu Arg Arg Val Tyr Val Asp Glu Ile Gly Arg Ile Lys Asp Pro

980 985 990

Gly Pro Pro Lys His

995

<210> SEQ ID NO 28

<211> LENGTH: 1027

<212> TYPE: PRT

<213> ORGANISM: Blastopirellula marina

<400> SEQUENCE: 28

Met Phe Ser Glu Ser Gly Asp Gln Pro Ser Leu Val Asp Cys Gly Val

1 5 10 15

Arg Ile Phe Pro Glu Gly Val Glu Arg Asp Gln Gln Gly Gly Glu Lys

20 25 30

Ser Lys Ser Gln Ser Arg Arg Val Ala Arg Gly Ile Arg Arg Gln Val

35 40 45

Arg Arg Arg Ala Gln Arg Leu Arg His Leu Lys His Ala Leu Ile Thr

50 55 60

Thr Gly Leu Phe Pro Ala Asp Val Val Asp Gln Gln Glu Val Leu Ala

65 70 75 80

Ser Asn Pro Tyr Glu Leu Arg Ser Arg Ala Leu Ser Glu Lys Leu Glu

85 90 95

Pro Tyr Glu Ile Gly Arg Ile Leu Leu His Leu Ala Lys Arg Arg Gly

100 105 110

Phe Leu Ser Asn Arg Thr Thr Asp Arg Ser Gly Gly Asn Glu Leu Lys

115 120 125

Gly Ile Leu Ala Glu Met Thr Gln Leu Gln Gly Glu Ile Ala Ala His

130 135 140

Gly Cys Gln Thr Leu Gly Gln Tyr Leu His Gln Leu Gly Asp Asp Gly

145 150 155 160

Glu Thr Phe Ala Gln Arg Leu Arg Gly Arg His Thr Leu Arg Ala Met

165 170 175

Tyr Glu Asp Glu Phe Glu Lys Ile Trp Glu Arg Gln Ala Ser Tyr Tyr

180 185 190

Pro Ala Leu Leu Thr Glu Glu Leu Arg Gly Gly Glu Glu Gly Lys Gln

195 200 205

Ser Tyr Pro Leu Val Pro Thr Ala Arg Arg Lys Ser Glu Ser Leu Leu

210 215 220

Lys Arg Phe Gly Leu His Gly Leu Leu Phe Phe Gln Arg Lys Met Tyr

225 230 235 240

Trp Pro Lys Ser Val Ile Gly Arg Cys Asp Leu Glu Pro Lys Glu Lys

245 250 255

Arg Cys Pro Lys Ala Asp Arg Leu Ala Gln Arg Phe Arg Ile Leu Gln

260 265 270

Glu Val Asn Asn Leu Arg Leu Ile Ser Pro Ala Asp Arg Lys Glu Tyr

275 280 285

Thr Phe Ala Glu Phe Val Gly Ala Gly Ala Asp Lys Lys Leu Val Asp

290 295 300

Tyr Leu Cys Glu Ala Lys Glu Arg Thr Phe Pro Gln Ile Arg Lys Lys

305 310 315 320

Phe Lys Leu Pro Glu Thr Ile Ser Phe Asn Tyr Glu Arg Gly Glu Arg

325 330 335

Ser Lys Leu Gln Gly His Glu Thr Asp Ala His Phe Asn Gly Lys Lys

340 345 350

Gly Leu Gly Lys Arg Arg Trp Ser Glu Ile Ala Glu Asp Val Lys Asp

355 360 365

His Ile Ile Gln Ile Val Leu Glu Glu Asp Arg Glu Asp Val Ala Leu

370 375 380

Arg Lys Leu Thr Gln Asp Cys Ser Leu Thr Val Glu Glu Ala His Lys

385 390 395 400

Ala Met Ala Leu His Leu Pro Ala Gly Tyr Ser Gln Tyr Ser Arg Val

405 410 415

Ala Ile Ser Arg Leu Leu Pro His Leu Glu Leu Gly Lys Leu Leu Met

420 425 430

Gly Asn Asp Ala Ser Asp Ser Ala Met His Ala Ala Gly Tyr Leu Arg

435 440 445

Pro Asp Glu Arg Glu Val Arg Gln Tyr Asp Leu Leu Pro Glu Pro Pro

450 455 460

Asp Ile Pro Asn Pro Leu Val Arg Gln Ala Leu Tyr Glu Val Arg Lys

465 470 475 480

Val Ile Asn Ala Ile Ile Arg Glu Tyr Gly Ala Leu Gly Thr Ile Arg

485 490 495

Val Glu Leu Ala Arg Glu Ala Lys Lys Ser Ala Glu Gln Arg Thr Gln

500 505 510

Ile Arg Ile Ala Asn Ala Lys Arg Glu Arg Glu Asn Ala Ala Ile Ala

515 520 525

Lys Thr Leu Gln Glu Met Ser Pro Ala Ile Arg Pro Thr Arg Arg Asn

530 535 540

Ile Gln Arg Tyr Leu Leu Trp Lys Asp Gln Gly Gly Val Cys Ile Tyr

545 550 555 560

Thr Gly Lys Val Ile Ser Gln Ala Gln Leu Phe Asp Ser Gly Glu Val

565 570 575

Asp Val Asp His Ile Leu Pro Arg Trp Arg Ser Leu Asp Asp Ser Met

580 585 590

Ala Asn Lys Val Ile Ala His Arg Ser Ala Asn Asn Glu Lys Gly Asp

595 600 605

Arg Thr Pro Trp Glu Trp Leu Gly His Asp Lys Pro Arg Phe Asp Glu

610 615 620

Leu Leu Leu Arg Ala Gln Arg Leu Pro Tyr Gly Lys Arg Gln Arg Phe

625 630 635 640

Ile Gln Lys Glu Val Glu Leu Thr Asn Phe Val Glu Arg Gln Leu Arg

645 650 655

Asp Thr Thr Tyr Val Ser Arg Leu Val Val Gln Tyr Leu Lys Gly Leu

660 665 670

Gly Val Pro Ile Thr Thr Val Lys Gly Pro Met Thr Ala Glu Leu Arg

675 680 685

His His Trp Gly Leu Asn Ala Leu Leu Asn Glu Asp Gly Ser Gln Lys

690 695 700

Lys Asn Arg Ser Asp His Arg His His Ala Ile Asp Ala Ile Val Ile

705 710 715 720

Gly Leu Thr Asp Ala Lys Arg Leu His Ala Leu Ala Asn Ala Arg Gly

725 730 735

Lys Asp Leu Gln Pro Pro Trp Ser Glu Phe Arg Ser Asp Val Ala Thr

740 745 750

Ala Leu Gly Arg Met Ala Val Ser His Arg Val Arg Arg Arg Ile Gln

755 760 765

Gly Ala Leu His Glu Glu Thr Ile Tyr Gly Pro Thr Gln Lys Ala Ala

770 775 780

Gln Pro Thr Ser Thr Asp Gln Arg Pro Trp Ala Lys Asp Trp Ile Glu

785 790 795 800

Asp Ser Gln Ile Val Val Arg Arg Lys Ser Val Ser Asp Leu Thr Asn

805 810 815

Thr Lys His Leu Ala Lys Ile Arg Asp Val Thr Ile Arg Lys Ile Leu

820 825 830

Glu Asp His Leu Arg Arg Gln Lys Val Asp Pro Thr Lys Pro Gly Val

835 840 845

Ile Pro Lys Asp Ala Phe Ala Gly Ser Asn Ala Pro Gln Met Pro Ser

850 855 860

Gly Val Pro Ile Lys Lys Val Arg Leu Val Glu Arg Gly Glu Thr Trp

865 870 875 880

Arg Lys Ile Gly Ser Gly Glu Ala Ser Lys Phe Ile Lys Pro Gly Ser

885 890 895

Asn His His Ile Ser Tyr Phe Ala Ile Ser Arg Lys Gly Arg Glu Ser

900 905 910

Trp Lys Ala His Val Thr Ile Met Leu Asp Ala Ala Cys Ile Ala Lys

915 920 925

Glu Ser Ile Asn Asn Gly Lys Ile Val Asn Arg Ser Ile Gly Asp Gln

930 935 940

Gly Arg Phe Leu Met Ser Leu Ser Ile Gly Glu Met Phe Glu Ile Thr

945 950 955 960

Asn Asn Gly Gly Glu Ile Leu Leu Cys Val Val Arg Lys Ile Asp Gln

965 970 975

Ser Gly Arg Ile Tyr Tyr Lys Ile His Asp Asp Ala Arg Glu Ser Met

980 985 990

Asp Leu Lys Lys Asp Asn Leu Tyr Met Ser Pro Ser Lys Met Gln Gln

995 1000 1005

Phe Asp Ala Arg Lys Val Thr Val Thr Pro Leu Gly Lys Ile Arg

1010 1015 1020

Asn Ala Asn Asp

1025

<210> SEQ ID NO 29

<211> LENGTH: 158

<212> TYPE: PRT

<213> ORGANISM: Francisella tularensis

<400> SEQUENCE: 29

Met Ile Asp Phe Leu Leu Glu Leu Asp Pro Cys Ile Thr Ile Pro Pro

1 5 10 15

Tyr Leu Asp Asn Asn Asn Arg Lys Pro Pro Lys Cys Gln Ser Leu Ile

20 25 30

Leu Asn Pro Lys Phe Leu Asp Asn Gln Tyr Pro Asn Trp Gln Gln Tyr

35 40 45

Leu Gln Glu Leu Lys Lys Leu Gln Ser Ile Gln Asp Tyr Leu Asp Ser

50 55 60

Phe Glu Thr Asp Leu Lys Asp Leu Lys Ser Ser Lys Asp Gln Pro Tyr

65 70 75 80

Phe Val Glu Tyr Lys Ser Ser Asn Gln Gln Met Ala Ser Gly Gln Arg

85 90 95

Asp Tyr Lys Asp Leu Asp Ala Arg Ile Leu Gln Phe Ile Phe Asp Arg

100 105 110

Val Lys Ala Ser Asp Glu Leu Leu Leu Ser Glu Ile Tyr Phe Gln Ala

115 120 125

Lys Lys Leu Lys Gln Lys Ala Ser Ser Glu Leu Glu Lys Leu Glu Ser

130 135 140

Ser Lys Lys Leu Asp Glu Val Ile Ala Asn Ser Gln Leu Ser

145 150 155

<210> SEQ ID NO 30

<211> LENGTH: 393

<212> TYPE: PRT

<213> ORGANISM: Francisella tularensis

<400> SEQUENCE: 30

Met Asn Val Lys Ile Leu Pro Ile Ala Ile Asp Leu Asp Val Lys Asn

1 5 10 15

Thr Gly Val Phe Ser Ala Phe Tyr Gln Lys Gly Thr Ser Leu Glu Lys

20 25 30

Leu Asp Asn Lys Asn Gly Lys Val Tyr Glu Leu Ser Lys Asp Ser Tyr

35 40 45

Thr Leu Leu Met Asn Asn Arg Thr Ala Gln Arg His Gln Arg Arg Gly

50 55 60

Ile Asp Arg Lys Gln Leu Val Lys Arg Leu Phe Lys Leu Val Trp Thr

65 70 75 80

Glu Gln Leu Asn Leu Glu Trp Asp Lys Asp Thr Gln Gln Ala Ile Ser

85 90 95

Phe Leu Phe Asn Arg Arg Gly Phe Ser Phe Ile Thr Asp Gly Tyr Ser

100 105 110

Thr Glu Tyr Leu Asn Ile Val Pro Glu Gln Val Lys Ala Ile Leu Met

115 120 125

Asp Ile Phe Asp Asp Tyr Asn Gly Glu Asp Asp Leu Asp Ser Tyr Leu

130 135 140

Lys Leu Ala Thr Glu Gln Glu Ser Lys Ile Ser Glu Ile Tyr Asn Lys

145 150 155 160

Leu Met Gln Lys Ile Leu Glu Phe Lys Leu Arg Lys Leu Cys Thr Asp

165 170 175

Ile Lys Asp Asp Lys Val Ser Thr Lys Thr Leu Lys Glu Ile Thr Ser

180 185 190

Tyr Glu Phe Glu Leu Leu Ala Asp Tyr Leu Ala Asn Tyr Ser Glu Ser

195 200 205

Leu Lys Thr Gln Lys Phe Ser Tyr Thr Asp Lys Gln Gly Asn Leu Lys

210 215 220

Glu Leu Ser Tyr Tyr His His Asp Lys Tyr Asn Ile Gln Glu Phe Leu

225 230 235 240

Lys Arg His Ala Thr Ile Asn Asp Glu Ile Leu Asp Thr Leu Leu Thr

245 250 255

Asp Asp Phe Asp Ile Trp Asn Phe Asn Phe Glu Lys Phe Asp Phe Asp

260 265 270

Lys Asn Glu Glu Lys Leu Gln Ser Gln Glu Asp Lys Asp His Thr Gln

275 280 285

Ala Tyr Phe His His Phe Val Phe Ala Val Asn Lys Ile Lys Ser Glu

290 295 300

Met Ala Ser Gly Gly Arg His Arg Ser Gln Tyr Phe Gln Glu Ile Thr

305 310 315 320

Asn Val Leu Asp Glu Asn Asn His Gln Glu Gly Tyr Leu Lys Asn Phe

325 330 335

Cys Glu Asn Leu His Asn Lys Lys Tyr Ser Asn Leu Ser Val Lys Asn

340 345 350

Leu Val Asn Leu Val Gly Asn Leu Ser Asn Leu Glu Leu Lys Pro Leu

355 360 365

Arg Lys Tyr Phe Asn Asp Lys Asn Leu Ile Ile Gly Met Ser Lys Ser

370 375 380

Leu Gln Lys Leu Ile Ala Thr Gly Tyr

385 390

<210> SEQ ID NO 31

<211> LENGTH: 1066

<212> TYPE: PRT

<213> ORGANISM: Rhodopseudomonas palustris

<400> SEQUENCE: 31

Met Ser Glu Arg Val Val Arg Arg Ile Leu Gly Ile Asp Leu Gly Ile

1 5 10 15

Ala Ser Cys Gly Trp Gly Val Val Asp Ile Ser Gly Ala Gly Gly Gly

20 25 30

Ile Ile Ala Thr Gly Val Arg Cys Phe Asp Ala Pro Leu Ile Asp Lys

35 40 45

Thr Gly Glu Pro Lys Ser Ala Thr Arg Arg Thr Ala Arg Gly Gln Arg

50 55 60

Arg Ile Val Arg Arg Arg Arg Gln Arg Met Asn Gly Val Arg Arg Leu

65 70 75 80

Leu Cys Glu Phe Gly Leu Leu Pro Asp Pro Arg Pro Asp Ala Leu Asn

85 90 95

Gln Ala Met Arg Arg Ile Ser Thr Ala Ser Ala Ala Ala Gln Val Thr

100 105 110

Pro Trp Thr Leu Arg Ala Ala Ala His Gln Arg Leu Leu Ser Asn Glu

115 120 125

Glu Leu Ala Val Val Leu Gly His Ile Ala Arg His Arg Gly Phe Arg

130 135 140

Ser Asn Ala Lys Asn Glu Ala Gly Ala Asn Ala Ala Asp Glu Thr Ser

145 150 155 160

Lys Met Lys Lys Ala Met Glu Ala Thr Arg Glu Gly Leu Ala Lys Tyr

165 170 175

His Ala Phe Gly Asp Met Ile Ala Asn Asp Pro Lys Phe Ala Asn Arg

180 185 190

Lys Arg Asn Arg Asp Lys Asp Tyr Ser His Thr Ala Lys Arg Ser Asp

195 200 205

Leu Glu Asp Glu Val Arg Ala Ile Leu Arg Ala Gln Leu Arg Phe Gly

210 215 220

Ser Ala Ala Ala Thr Glu Thr Leu Ala Gln Thr Phe Ala Asp Val Ala

225 230 235 240

Phe Phe Gln Arg Pro Leu Gln Asp Ser Glu Asp Arg Val Gly Asp Cys

245 250 255

Pro Phe Glu Pro Gly Gln Lys Arg Ala Ala Arg Arg Ala Pro Ser Phe

260 265 270

Glu Leu Phe Arg Phe Leu Ser Arg Leu Ala Asn Leu Lys Leu Ala Val

275 280 285

Gly Arg Ser Pro Glu Arg Arg Leu Thr Ala Glu Glu Ile Ala Leu Ala

290 295 300

Ala Lys Gly Phe Gly Glu Thr Lys Lys Thr Ile Thr Phe Lys Ser Leu

305 310 315 320

Arg Glu Ala Leu Asp Leu Asp Pro Asn Ala Arg Phe Ser Gly Ile Gly

325 330 335

Lys Asp Lys Glu Ala Ser Leu Asp Val Val Ala Arg Thr Gly Gly Ala

340 345 350

Ala Tyr Gly Thr Lys Thr Leu Lys Asp Ala Leu Gly Asp Ala Pro Trp

355 360 365

Arg Ser Leu Ser Arg Thr Pro Glu Thr Leu Asp Arg Ile Ala Glu Ile

370 375 380

Leu Ser Phe Arg Glu Asp Ile Thr Ser Ile Arg Asn Gly Leu Glu Asp

385 390 395 400

Leu Gly Leu Asp Ser Leu Val Val Glu Ala Leu Met Gln Ala Ala Ala

405 410 415

Asn Gly Asp Phe Lys Glu Phe Thr Arg Ala Gly His Ile Ser Ala Arg

420 425 430

Ala Ala Arg Asn Ile Ile Pro Gly Leu Arg Glu Gly Leu Val Tyr Ser

435 440 445

Glu Ala Cys Ala Arg Val Gly Tyr Asp His Ala Ala Arg Leu Ser Val

450 455 460

Pro Leu Asp Gln Val Gly Ser Pro Val Thr Arg Lys Ala Leu Ser Glu

465 470 475 480

Ala Leu Lys Gln Val Arg Ala Val Ala Arg Glu Tyr Gly Pro Ile Asp

485 490 495

Tyr Phe His Ile Glu Leu Ala Arg Ser Val Gly Lys Ser Ala Glu Glu

500 505 510

Arg Lys Gln Leu Thr Asp Gly Ile Glu Ala Arg Asn Val Glu Lys Ala

515 520 525

Lys Arg Arg Lys Gln Ala Glu Glu His Leu Gly Arg Ala Pro Ser Asp

530 535 540

Asp Glu Leu Leu Arg Tyr Glu Leu Ala Lys Glu Gln Asn Phe Lys Cys

545 550 555 560

Ile Tyr Ser Gly Asp Ala Ile Asp Pro Ala Gly Val Ala Ala Asn Asp

565 570 575

Thr Arg Tyr Gln Val Asp His Ile Leu Pro Trp Ser Arg Phe Gly Asp

580 585 590

Asp Ser Tyr Leu Asn Lys Thr Leu Cys Thr Ala Arg Ser Asn Gln Asn

595 600 605

Lys Arg Gly Arg Thr Pro Phe Glu Trp Phe Glu Ala Asp Lys Thr Asp

610 615 620

Val Glu Trp Met Glu Tyr Val Ala Arg Val Glu Asn Leu Gln Glu Val

625 630 635 640

Lys Gly Arg Lys Lys Arg Asn Tyr Ser Ile Lys Asp Ala Ala Ala Ile

645 650 655

Glu Asp Lys Phe Lys Ala Arg Asn Leu Thr Asp Thr Gln Trp Ala Thr

660 665 670

Arg Leu Leu Ala Asp Glu Leu Ser Arg Met Phe Pro Pro Arg Glu Cys

675 680 685

Glu Arg Ala Ile Cys Gly Arg Ala Asp Gly Gly Asn Asp Gly Leu Thr

690 695 700

Ile Val Glu Glu Arg Arg Val Phe Thr Arg Pro Gly Ala Ile Thr Ser

705 710 715 720

Lys Leu Arg Arg Ala Trp Gly Leu Glu Gly Leu Lys Lys Gln Asp Gly

725 730 735

Lys Arg Val Glu Asp Asp Arg His His Ala Val Asp Ala Leu Val Leu

740 745 750

Ala Ala Thr Thr Glu Ser Leu Leu Gln Arg Leu Thr Val Glu Val Gln

755 760 765

Arg Arg Glu Arg Glu Gly Arg Pro Asp Asp Ile Phe His Cys Ala Glu

770 775 780

Pro Trp Arg Gly Phe Arg Ala Asp Val Arg Arg Thr Val Tyr Gly Ser

785 790 795 800

Glu Thr Met Pro Gly Ile Phe Val Ser Arg Ala Glu Arg Arg Arg Ala

805 810 815

Arg Gly Lys Ala His Asp Ala Thr Ile Lys Gln Ile Arg Glu Ile Glu

820 825 830

Gly Glu Arg Leu Val Phe Glu Arg Lys Pro Val Glu Lys Leu Thr Asp

835 840 845

Lys Asp Leu Glu Lys Ile Pro Ile Pro Lys Pro Tyr Gly Gln Val Ser

850 855 860

Asp Pro Lys Arg Leu Arg Asp Glu Leu Val Glu Ser Leu Arg Ala Trp

865 870 875 880

Ile Ala Ala Gly Lys Pro Lys Asp Arg Pro Pro Leu Ser Pro Lys Gly

885 890 895

Asp Val Ile Arg Lys Val Arg Ile Gln Thr Asn Asp Lys Val Ser Val

900 905 910

Glu Ile Asn Gly Gly Thr Val Asp Arg Gly Asp Met Ala Arg Val Asp

915 920 925

Val Phe Arg Lys Lys Asn Lys Lys Ala Val Trp Glu Tyr Tyr Val Val

930 935 940

Pro Ile Tyr Pro His Gln Ile Val Ala Leu Asn Asp Pro Pro Asp Arg

945 950 955 960

Ala Val Ile Ala Tyr Ala Glu Asp Lys Asp Trp Lys Glu Ile Asp Ser

965 970 975

Ser Tyr Glu Phe Leu Trp Ser Leu Phe Gly Leu Ser Tyr Val Glu Ile

980 985 990

Ser Lys Ala Asn Gly Glu Cys Ile Asp Gly Tyr Phe Arg Gly Leu His

995 1000 1005

Arg Gly Thr Gly Ala Ala Ser Val Cys Lys His Ile Ser Leu Gly

1010 1015 1020

Lys Asp Ala Thr Val Ser Gly Ile Gly Leu Lys Thr Leu Ala Ser

1025 1030 1035

Phe Lys Lys Phe Thr Ile Asp Arg Leu Gly Arg Lys Phe Glu Ile

1040 1045 1050

Pro Arg Glu Val Arg Thr Trp Arg Gly Glu Ala Cys Thr

1055 1060 1065

<210> SEQ ID NO 32

<211> LENGTH: 1149

<212> TYPE: PRT

<213> ORGANISM: Lactobacillus salivarius

<400> SEQUENCE: 32

Met Glu Arg Tyr His Ile Gly Leu Asp Ile Gly Thr Ser Ser Ile Gly

1 5 10 15

Trp Ala Val Ile Gly Asp Asp Phe Lys Ile Lys Arg Lys Lys Gly Lys

20 25 30

Asn Leu Ile Gly Val Arg Leu Phe Lys Glu Gly Asp Thr Ala Ala Glu

35 40 45

Arg Arg Gly Phe Arg Thr Gln Arg Arg Arg Leu Asn Arg Arg Lys Trp

50 55 60

Arg Leu Lys Leu Leu Glu Glu Ile Phe Asp Pro Tyr Met Ala Glu Val

65 70 75 80

Asp Lys Tyr Phe Phe Ala Arg Leu Lys Glu Ser Asn Leu Ser Pro Lys

85 90 95

Asp Ser Asn Lys Lys Tyr Leu Gly Ser Leu Leu Phe Pro Asp Ile Ser

100 105 110

Asp Ser Asn Phe Tyr Asp Lys Tyr Pro Thr Ile Tyr His Leu Arg Arg

115 120 125

Asp Leu Met Glu Lys Asp Lys Lys Phe Asp Leu Arg Glu Ile Tyr Leu

130 135 140

Ala Ile His His Ile Val Lys Tyr Arg Gly Asn Phe Leu Glu Lys Val

145 150 155 160

Pro Ala Lys Asn Tyr Lys Asn Ser Gly Ala Ser Ile Gly Phe Leu Leu

165 170 175

Glu Glu Val Asn Asp Leu Tyr Gly Asn Ile Ile Gly Asn Glu Asp Val

180 185 190

Ala Ile Leu Asp Asn Asp Lys Phe Glu Asp Val Glu Lys Ile Ile Leu

195 200 205

Asn Asp Glu Ile Arg Asn Ile Asp Lys Gln Lys Asn Val Gly Arg Leu

210 215 220

Leu Val Lys Asp Lys Lys Glu Lys Asn Ile Val Thr Ala Phe Ser Lys

225 230 235 240

Ala Ile Phe Gly Tyr Lys Phe Asn Leu Glu Asp Leu Leu Leu Ile Glu

245 250 255

Ser Asp Glu Lys Asn Lys Leu Thr Phe Asn Asp Glu Asn Ile Asp Asp

260 265 270

Ile Phe Asn Glu Leu Ser His Ser Leu Asn Asp Asn Gln Met Asp Leu

275 280 285

Leu Thr Lys Thr Arg Glu Ile Tyr Phe Lys Phe Lys Leu Asn Met Ile

290 295 300

Val Pro Thr Gly Tyr Thr Ile Ser Glu Ser Met Ile Glu Lys Tyr Glu

305 310 315 320

Met His Lys Ala His Leu Lys Met Tyr Lys Glu Phe Ile Asn Thr Leu

325 330 335

Asn Ala Lys Asp Arg Lys Ile Leu Lys Asn Ala Tyr Ser Asp Tyr Ile

340 345 350

Asn Asn Glu Lys Ala Lys Ala Ala Asn Ala Gln Glu Asn Phe Tyr Lys

355 360 365

Thr Val Lys Lys Thr Ile Lys Glu Asn Asp Ser Asp Thr Ala Lys Lys

370 375 380

Ile Ile Gly Leu Ile Asp Glu Gly Asn Phe Met Pro Lys Gln Arg Thr

385 390 395 400

Gly Glu Asn Gly Val Ile Pro His Gln Leu His Gln Ile Glu Leu Asp

405 410 415

Arg Ile Ile Glu Asn Gln Ala Lys Tyr Tyr Pro Trp Leu Ala Glu Glu

420 425 430

Asn Pro Val Glu Lys Asn Arg Lys Phe Ala Lys Tyr Lys Leu Asp Glu

435 440 445

Leu Val Thr Phe Arg Val Pro Tyr Tyr Val Gly Pro Leu Ile Asp Lys

450 455 460

Thr Glu Ser Asn Lys Asn Glu Lys Glu Thr Lys Phe Ala Trp Met Val

465 470 475 480

Arg Lys Ala Lys Gly Thr Ile Thr Pro Trp Asn Phe Glu Asn Leu Val

485 490 495

Asp Arg Thr Glu Ser Ala Asn Arg Phe Ile Lys Arg Met Thr Ser Lys

500 505 510

Asp Thr Tyr Ile Ile Gly Glu Asp Val Leu Pro Ala Ser Ser Leu Leu

515 520 525

Tyr Glu Lys Tyr Lys Val Leu Asn Glu Leu Asn Asn Ile Lys Val Asn

530 535 540

Lys Lys Lys Leu Asp Val Glu Gln Lys Gln His Val Tyr Leu Asp Leu

545 550 555 560

Phe Thr Thr Arg Lys Asn Val Thr Lys Asp Asp Leu Ala Thr Ser Leu

565 570 575

Asn Cys Asp Val Glu Ser Ile Thr Gly Leu Thr Asp Asn Lys Lys Phe

580 585 590

Asn Ser Ser Leu Ser Ser Tyr Ile Asp Leu Lys Ala Ile Leu Gly Asn

595 600 605

Ile Val Asp Asp Tyr Ser Lys Asn Glu Asp Leu Glu Lys Ile Ile Glu

610 615 620

Tyr Ser Thr Ile Phe Glu Asp Gly Asn Ile Tyr Lys Glu Lys Leu Ser

625 630 635 640

Glu Ile Ser Trp Leu Thr Asp Glu Gln Ile Glu Lys Leu Ser Asn Ile

645 650 655

His Phe Lys Gly Trp Gly Arg Leu Ser Lys Lys Leu Leu Thr Gln Ile

660 665 670

Thr Asn Glu Asn Gly Glu Arg Ile Ile Asp Thr Leu Trp Asn Thr Ser

675 680 685

Asn Asn Phe Ile Gln Val Ile Ser Asp Glu Ser Ile Gln Ala Lys Leu

690 695 700

Ala Glu Ile Asn Gly Glu Tyr Ala Asn Lys Tyr Asn Leu Glu Asp Ile

705 710 715 720

Leu Asp Glu Ala Tyr Thr Ser Pro Gln Asn Lys Lys Ala Ile Arg Gln

725 730 735

Val Met Lys Val Val Glu Asp Ile Glu Lys Ala Met Lys Cys Glu Pro

740 745 750

Thr Ser Ile Ala Ile Glu Phe Thr Arg Glu Lys Arg Lys Ser Lys Leu

755 760 765

Thr Asn Thr Arg Tyr Lys Lys Ile Ser Glu Thr Tyr Glu Lys Ile Thr

770 775 780

Asp Glu Leu Ile Ser Glu Tyr Glu Leu Gly Lys Leu Gln Ser Glu Leu

785 790 795 800

Asp Ser Lys Val Asn Asn Met Arg Asp Arg Tyr Tyr Leu Tyr Phe Met

805 810 815

Gln Leu Gly Arg Asp Met Tyr Thr Gly Glu Lys Met Asn Ile Asp Glu

820 825 830

Leu His Gln Lys Tyr Asp Ile Asp His Ile Leu Pro Gln Ser Phe Ile

835 840 845

Lys Asp Asp Ser Leu Asn Asn Arg Val Leu Thr Ser Lys Ser Val Asn

850 855 860

Ile Lys Glu Lys Ser Asp Lys Thr Ala Ala Asp Leu Tyr Ala Ala Lys

865 870 875 880

Met Gly Asp Phe Trp Arg Lys Leu Arg Lys Gln Gly Leu Met Thr Glu

885 890 895

Gln Lys Tyr Lys Asn Leu Leu Thr Arg Thr Asp Ser Ile Asn Lys Tyr

900 905 910

Thr Lys Gln Ser Phe Ile Lys Arg Gln Leu Val Glu Thr Ser Gln Val

915 920 925

Val Lys Leu Ala Ala Asn Ile Leu Gln Asp Lys Tyr Arg Asn Thr Lys

930 935 940

Ile Ile Glu Ile Arg Ala Arg Leu Asn Ser Asp Leu Arg Lys Lys Tyr

945 950 955 960

Glu Leu Ile Lys Asn Arg Glu Val Asn Asp Tyr His His Ala Ile Asp

965 970 975

Gly Tyr Leu Thr Thr Phe Val Gly Gln Tyr Leu Tyr Lys Val Tyr Pro

980 985 990

Lys Leu Arg Ser Tyr Phe Val Tyr Asp Asp Phe Lys Lys Leu Asp Ser

995 1000 1005

Asn Tyr Leu Lys His Met Asp Lys Phe Asn Phe Ile Trp Lys Leu

1010 1015 1020

Glu Asp Lys Lys Ala Glu Asp Val Tyr Asp Lys Val Asn Asp Glu

1025 1030 1035

Phe Val Leu Asn Val Pro Glu Met Lys Glu Tyr Ile Arg Lys Ile

1040 1045 1050

Tyr Asn Tyr Lys Tyr Met Leu Val Ser Lys Glu Val Thr Thr Lys

1055 1060 1065

Asn Gly Ala Phe Tyr Asp Gln Thr Lys Tyr Asn Ala Lys Thr Ile

1070 1075 1080

Asn Leu Ile Pro Ile Lys Lys Asp Lys Pro Thr Asn Ile Tyr Gly

1085 1090 1095

Gly Tyr Lys Gly Lys Val Ser Ser Tyr Met Met Leu Val Lys Ile

1100 1105 1110

Gln Lys Lys Lys Glu Val Ile Tyr Lys Phe Val Gly Val Pro Arg

1115 1120 1125

Leu Trp Thr Asp Glu Leu Asp Arg Leu Ile Asp Thr Asp Glu Lys

1130 1135 1140

Lys Ala Leu Leu Lys Lys

1145

<210> SEQ ID NO 33

<211> LENGTH: 1064

<212> TYPE: PRT

<213> ORGANISM: Rhodopseudomonas palustris

<400> SEQUENCE: 33

Met Ser Glu Cys Val Thr Arg Arg Ile Leu Gly Ile Asp Leu Gly Ile

1 5 10 15

Ala Ser Cys Gly Trp Gly Val Ile Glu Val Gly Glu Ala Ser Gly Ser

20 25 30

Ile Ile Ala Ser Gly Val Arg Cys Phe Asp Ala Pro Leu Ile Asp Lys

35 40 45

Thr Gly Glu Pro Lys Ser Ala Thr Arg Arg Thr Ala Arg Gly Gln Arg

50 55 60

Arg Ile Ile Arg Arg Arg Arg Gln Arg Met Asn Ala Val Arg Arg Leu

65 70 75 80

Leu Ala Glu Phe Gly Val Leu Thr Gly Arg Ser Pro Asp Ala Leu His

85 90 95

Gln Ala Leu Leu Arg Leu Ser Gln Ser Val Ala Gly Ser Gln Val Thr

100 105 110

Pro Trp Thr Leu Arg Ala Ala Ala His Glu Arg Lys Leu Thr Asn Asp

115 120 125

Glu Leu Ala Val Val Leu Gly His Ile Ala Arg His Arg Gly Phe Arg

130 135 140

Ser Asn Ser Lys Asn Asp Gly Gly Ala Asn Ala Ala Asp Glu Thr Ser

145 150 155 160

Lys Met Lys Lys Ala Met Glu Thr Thr Arg Glu Gly Leu Ala Arg Tyr

165 170 175

His Ser Phe Gly Ala Met Ile Ala Ser Asp Pro Lys Phe Ala Asp Arg

180 185 190

Lys Arg Asn Arg Asp Lys Asp Tyr Ser His Thr Ala Lys Arg Ser Asp

195 200 205

Leu Glu Asp Glu Val Arg Thr Ile Phe Arg Ser Gln Thr Arg Phe Gly

210 215 220

Ser Leu Val Ala Ser Glu Lys Leu Ser Gln Ala Phe Ala Asp Ala Ala

225 230 235 240

Phe Phe Gln Arg Pro Leu Gln Asp Ser Glu Asp Met Val Gly Ser Cys

245 250 255

Pro Phe Glu Pro Gly Gln Lys Arg Thr Ala Arg Arg Ala Pro Ser Phe

260 265 270

Glu Leu Phe Arg Phe Leu Ser Arg Leu Ala Asn Leu Lys Leu Thr Val

275 280 285

Gly Arg Ala Pro Glu Arg Arg Leu Thr Pro Asp Glu Ile Ala Leu Ala

290 295 300

Ala Lys Gly Phe Gly Glu Thr Lys Lys Ser Ile Thr Phe Lys Ser Leu

305 310 315 320

Arg Glu Ala Leu Asp Leu Asp Pro Asn Ala Arg Phe Ser Gly Val Ala

325 330 335

Lys Glu Lys Glu Ser Thr Leu Asp Val Ala Ala Arg Thr Gly Gly Ala

340 345 350

Ala Tyr Gly Thr Lys Thr Leu Lys Asp Ala Leu Gly Asp Ala Pro Trp

355 360 365

Arg Ser Leu Ser Arg Met Pro Glu Lys Leu Asp Arg Ile Ala Glu Ile

370 375 380

Leu Ser Phe Arg Glu Asp Met Lys Ala Ile Arg Asn Gly Leu Glu Glu

385 390 395 400

Val Gly Leu Asp Gly Leu Val Val Asp Ala Leu Met Gln Ala Thr Ala

405 410 415

Asn Gly Asp Phe Lys Asp Phe Thr Arg Ala Ala His Ile Ser Ala Leu

420 425 430

Ala Ala Arg Asn Ile Ile Pro Gly Leu Arg Glu Gly Leu Val Tyr Ser

435 440 445

Asp Ala Cys Thr Arg Val Gly Tyr Asp His Ala Ala Arg Pro Ala Val

450 455 460

Pro Leu Ser Gln Ile Gly Ser Pro Val Thr Arg Lys Ala Leu Ser Glu

465 470 475 480

Ala Leu Lys Gln Val Arg Ala Val Ala Arg Glu Tyr Gly Pro Ile Asp

485 490 495

Tyr Phe His Ile Glu Leu Ala Arg Ser Ile Gly Lys Ser Ala Glu Glu

500 505 510

Arg Lys Lys Leu Thr Asp Gly Ile Glu Ala Arg Asn Val Glu Lys Glu

515 520 525

Lys Arg Arg Lys Glu Ala Ala Glu His Leu Gly Arg Ala Pro Ser Asp

530 535 540

Asp Glu Leu Leu Arg Tyr Glu Leu Ala Lys Glu Gln Asn Phe Lys Cys

545 550 555 560

Ile Tyr Ser Gly Asp Pro Ile Asp Pro Ala Gly Ile Ser Ala Asn Asp

565 570 575

Thr Arg Tyr Gln Val Asp His Ile Leu Pro Trp Ser Arg Phe Gly Asp

580 585 590

Asp Ser Tyr Val Asn Lys Thr Leu Cys Thr Ala Arg Ser Asn Gln Asn

595 600 605

Lys Arg Gly Arg Thr Pro Phe Glu Trp Phe Asp Ala Asp Lys Thr Glu

610 615 620

Ala Glu Trp Met Glu Tyr Ser Ala Arg Val Glu Asp Leu Lys Glu Val

625 630 635 640

Lys Gly Arg Lys Lys Arg Asn Tyr Ser Ile Lys Asp Ala Ala Ser Val

645 650 655

Glu Asp Lys Phe Lys Ala Arg Asn Leu Thr Asp Thr Gln Trp Ala Thr

660 665 670

Arg Leu Leu Ala Asp Glu Leu Lys Arg Met Phe Pro Pro Arg Glu Cys

675 680 685

Glu Arg Val Val Thr Val Arg Ala Asp Gly Gly Asn Asp Gly Leu Ser

690 695 700

Ile Val Glu Glu Arg Arg Val Phe Thr Arg Pro Gly Ala Ile Thr Ser

705 710 715 720

Lys Leu Arg Arg Ala Trp Gly Leu Glu Gly Leu Lys Lys Gln Asp Gly

725 730 735

Lys Arg Val Glu Asp Asp Arg His His Ala Val Asp Ala Leu Val Leu

740 745 750

Ala Ala Thr Thr Glu Ser Leu Leu Asn Arg Leu Thr Val Glu Val Gln

755 760 765

Gln Arg Glu Arg Glu Gly Arg Gln Asp Asp Ile Phe His Cys Ser Gln

770 775 780

Pro Trp Pro Gly Phe Arg Val Asp Val Gln Arg Thr Val Tyr Gly Ser

785 790 795 800

Glu Thr Met Pro Gly Ile Phe Val Ser Arg Ala Glu Arg Arg Arg Ala

805 810 815

Arg Gly Lys Ala His Asp Ala Thr Val Lys Gln Ile Arg Asp Ile Asp

820 825 830

Gly Glu Arg Ile Val Phe Glu Arg Lys Pro Ile Glu Lys Leu Thr Asp

835 840 845

Lys Asp Leu Glu Arg Ile Pro Val Pro Glu Pro Tyr Gly Lys Ala Ala

850 855 860

Asp Pro Lys Lys Leu Arg Asp Glu Leu Val Glu Asn Leu Arg Ala Trp

865 870 875 880

Ile Ala Ala Gly Lys Pro Lys Asp Lys Pro Pro Arg Ser Pro Lys Gly

885 890 895

Asp Ile Ile Arg Lys Val Arg Ile Glu Thr Lys Asp Lys Val Ala Val

900 905 910

Glu Ile Asn Gly Gly Thr Val Asp Arg Gly Asp Met Ala Arg Val Asp

915 920 925

Val Phe Arg Lys Lys Asn Lys Lys Gly Val Trp Glu Phe Tyr Val Ile

930 935 940

Pro Ile Tyr Pro His Gln Ile Val Ala Ser Ala Leu Pro Pro Asn Arg

945 950 955 960

Ala Val Ile Ala Tyr Lys Ala Glu Ser Glu Trp Thr Ala Ile Asp Gly

965 970 975

Cys Phe Glu Phe Ala Trp Ser Leu Asn Pro Met Ser Tyr Leu Glu Leu

980 985 990

Val Lys Ser Asn Gly Glu Leu Ile Glu Gly Tyr Phe Arg Ser Met Asp

995 1000 1005

Arg Thr Thr Gly Ala Ile Asn Leu Ser Pro Met Ser Thr Asn Ser

1010 1015 1020

Glu Thr Ile Arg Ser Ile Gly Val Lys Thr Leu Ser Ser Phe Arg

1025 1030 1035

Lys Phe Thr Val Asp Arg Leu Gly Arg Lys Phe Glu Ile Pro Arg

1040 1045 1050

Glu Val Arg Thr Trp Arg Gly Glu Ala Cys Thr

1055 1060

<210> SEQ ID NO 34

<211> LENGTH: 1166

<212> TYPE: PRT

<213> ORGANISM: Nitrobacter hamburgensis

<400> SEQUENCE: 34

Met His Val Glu Ile Asp Phe Pro His Phe Ser Arg Gly Asp Ser His

1 5 10 15

Leu Ala Met Asn Lys Asn Glu Ile Leu Arg Gly Ser Ser Val Leu Tyr

20 25 30

Arg Leu Gly Leu Asp Leu Gly Ser Asn Ser Leu Gly Trp Phe Val Thr

35 40 45

His Leu Glu Lys Arg Gly Asp Arg His Glu Pro Val Ala Leu Gly Pro

50 55 60

Gly Gly Val Arg Ile Phe Pro Asp Gly Arg Asp Pro Gln Ser Gly Thr

65 70 75 80

Ser Asn Ala Val Asp Arg Arg Met Ala Arg Gly Ala Arg Lys Arg Arg

85 90 95

Asp Arg Phe Val Glu Arg Arg Lys Glu Leu Ile Ala Ala Leu Ile Lys

100 105 110

Tyr Asn Leu Leu Pro Asp Asp Ala Arg Glu Arg Arg Ala Leu Glu Val

115 120 125

Leu Asp Pro Tyr Ala Leu Arg Lys Thr Ala Leu Thr Asp Thr Leu Pro

130 135 140

Ala His His Val Gly Arg Ala Leu Phe His Leu Asn Gln Arg Arg Gly

145 150 155 160

Phe Gln Ser Asn Arg Lys Thr Asp Ser Lys Gln Ser Glu Asp Gly Ala

165 170 175

Ile Lys Gln Ala Ala Ser Arg Leu Ala Thr Asp Lys Gly Asn Glu Thr

180 185 190

Leu Gly Val Phe Phe Ala Asp Met His Leu Arg Lys Ser Tyr Glu Asp

195 200 205

Arg Gln Thr Ala Ile Arg Ala Glu Leu Val Arg Leu Gly Lys Asp His

210 215 220

Leu Thr Gly Asn Ala Arg Lys Lys Ile Trp Ala Lys Val Arg Lys Arg

225 230 235 240

Leu Phe Gly Asp Glu Val Leu Pro Arg Ala Asp Ala Pro His Gly Val

245 250 255

Arg Ala Arg Ala Thr Ile Thr Gly Thr Lys Ala Ser Tyr Asp Tyr Tyr

260 265 270

Pro Thr Arg Asp Met Leu Arg Asp Glu Phe Asn Ala Ile Trp Ala Gly

275 280 285

Gln Ser Ala His His Ala Thr Ile Thr Asp Glu Ala Arg Thr Glu Ile

290 295 300

Glu His Ile Ile Phe Tyr Gln Arg Pro Leu Lys Pro Ala Ile Val Gly

305 310 315 320

Lys Cys Thr Leu Asp Pro Ala Thr Arg Pro Phe Lys Glu Asp Pro Glu

325 330 335

Gly Tyr Arg Ala Pro Trp Ser His Pro Leu Ala Gln Arg Phe Arg Ile

340 345 350

Leu Ser Glu Ala Arg Asn Leu Glu Ile Arg Asp Thr Gly Lys Gly Ser

355 360 365

Arg Arg Leu Thr Lys Glu Gln Ser Asp Leu Val Val Ala Ala Leu Leu

370 375 380

Ala Asn Arg Glu Val Lys Phe Asp Lys Leu Arg Thr Leu Leu Lys Leu

385 390 395 400

Pro Ala Glu Ala Arg Phe Asn Leu Glu Ser Asp Arg Arg Ala Ala Leu

405 410 415

Asp Gly Asp Gln Thr Ala Ala Arg Leu Ser Asp Lys Lys Gly Phe Asn

420 425 430

Lys Ala Trp Arg Gly Phe Pro Pro Glu Arg Gln Ile Ala Ile Val Ala

435 440 445

Arg Leu Glu Glu Thr Glu Asp Glu Asn Glu Leu Ile Ala Trp Leu Glu

450 455 460

Lys Glu Cys Ala Leu Asp Gly Ala Ala Ala Ala Arg Val Ala Asn Thr

465 470 475 480

Thr Leu Pro Asp Gly His Cys Arg Leu Gly Leu Arg Ala Ile Lys Lys

485 490 495

Ile Val Pro Ile Met Gln Asp Gly Leu Asp Glu Asp Gly Val Ala Gly

500 505 510

Ala Gly Tyr His Ile Ala Ala Lys Arg Ala Gly Tyr Asp His Ala Lys

515 520 525

Leu Pro Thr Gly Glu Gln Leu Gly Arg Leu Pro Tyr Tyr Gly Gln Trp

530 535 540

Leu Gln Asp Ala Val Val Gly Ser Gly Asp Ala Arg Asp Gln Lys Glu

545 550 555 560

Lys Gln Tyr Gly Gln Phe Pro Asn Pro Thr Val His Ile Gly Leu Gly

565 570 575

Gln Leu Arg Arg Val Val Asn Asp Leu Ile Asp Lys Tyr Gly Pro Pro

580 585 590

Thr Glu Ile Ser Ile Glu Phe Thr Arg Ala Leu Lys Leu Ser Glu Gln

595 600 605

Gln Lys Ala Glu Arg Gln Arg Glu Gln Arg Arg Asn Gln Asp Lys Asn

610 615 620

Lys Ala Arg Ala Glu Glu Leu Ala Lys Phe Gly Arg Pro Ala Asn Pro

625 630 635 640

Arg Asn Leu Leu Lys Met Arg Leu Trp Glu Glu Leu Ala His Asp Pro

645 650 655

Leu Asp Arg Lys Cys Val Tyr Thr Gly Glu Gln Ile Ser Ile Glu Arg

660 665 670

Leu Leu Ser Asp Glu Val Asp Ile Asp His Ile Leu Pro Val Ala Met

675 680 685

Thr Leu Asp Asp Ser Pro Ala Asn Lys Ile Ile Cys Met Arg Tyr Ala

690 695 700

Asn Arg His Lys Arg Lys Gln Thr Pro Ser Glu Ala Phe Gly Ser Ser

705 710 715 720

Pro Thr Leu Gln Gly His Arg Tyr Asn Trp Asp Asp Ile Ala Ala Arg

725 730 735

Ala Thr Gly Leu Pro Arg Asn Lys Arg Trp Arg Phe Asp Ala Asn Ala

740 745 750

Arg Glu Glu Phe Asp Lys Arg Gly Gly Phe Leu Ala Arg Gln Leu Asn

755 760 765

Glu Thr Gly Trp Leu Ala Arg Leu Ala Lys Gln Tyr Leu Gly Ala Val

770 775 780

Thr Asp Pro Asn Gln Ile Trp Val Val Pro Gly Arg Leu Thr Ser Met

785 790 795 800

Leu Arg Gly Lys Trp Gly Leu Asn Gly Leu Leu Pro Ser Asp Asn Tyr

805 810 815

Ala Gly Val Gln Asp Lys Ala Glu Glu Phe Leu Ala Ser Thr Asp Asp

820 825 830

Met Glu Phe Ser Gly Val Lys Asn Arg Ala Asp His Arg His His Ala

835 840 845

Ile Asp Gly Leu Val Thr Ala Leu Thr Asp Arg Ser Leu Leu Trp Lys

850 855 860

Met Ala Asn Ala Tyr Asp Glu Glu His Glu Lys Phe Val Ile Glu Pro

865 870 875 880

Pro Trp Pro Thr Met Arg Asp Asp Leu Lys Ala Ala Leu Glu Lys Met

885 890 895

Val Val Ser His Lys Pro Asp His Gly Ile Glu Gly Lys Leu His Glu

900 905 910

Asp Ser Ala Tyr Gly Phe Val Lys Pro Leu Asp Ala Thr Gly Leu Lys

915 920 925

Glu Glu Glu Ala Gly Asn Leu Val Tyr Arg Lys Ala Ile Glu Ser Leu

930 935 940

Asn Glu Asn Glu Val Asp Arg Ile Arg Asp Ile Gln Leu Arg Thr Ile

945 950 955 960

Val Arg Asp His Val Asn Val Glu Lys Thr Lys Gly Val Ala Leu Ala

965 970 975

Asp Ala Leu Arg Gln Leu Gln Ala Pro Ser Asp Asp Tyr Pro Gln Phe

980 985 990

Lys His Gly Leu Arg His Val Arg Ile Leu Lys Lys Glu Lys Gly Asp

995 1000 1005

Tyr Leu Val Pro Ile Ala Asn Arg Ala Ser Gly Val Ala Tyr Lys

1010 1015 1020

Ala Tyr Ser Ala Gly Glu Asn Phe Cys Val Glu Val Phe Glu Thr

1025 1030 1035

Ala Gly Gly Lys Trp Asp Gly Glu Ala Val Arg Arg Phe Asp Ala

1040 1045 1050

Asn Lys Lys Asn Ala Gly Pro Lys Ile Ala His Ala Pro Gln Trp

1055 1060 1065

Arg Asp Ala Asn Glu Gly Ala Lys Leu Val Met Arg Ile His Lys

1070 1075 1080

Gly Asp Leu Ile Arg Leu Asp His Glu Gly Arg Ala Arg Ile Met

1085 1090 1095

Val Val His Arg Leu Asp Ala Ala Ala Gly Arg Phe Lys Leu Ala

1100 1105 1110

Asp His Asn Glu Thr Gly Asn Leu Asp Lys Arg His Ala Thr Asn

1115 1120 1125

Asn Asp Ile Asp Pro Phe Arg Trp Leu Met Ala Ser Tyr Asn Thr

1130 1135 1140

Leu Lys Lys Leu Ala Ala Val Pro Val Arg Val Asp Glu Leu Gly

1145 1150 1155

Arg Val Trp Arg Val Met Pro Asn

1160 1165

<210> SEQ ID NO 35

<211> LENGTH: 641

<212> TYPE: PRT

<213> ORGANISM: Nitrobacter hamburgensis

<400> SEQUENCE: 35

Met His Lys Arg Asn Ser Arg Ser Tyr Arg Leu Gly Leu Asp Leu Gly

1 5 10 15

Ser Asn Ser Leu Gly Trp Phe Val Thr His Leu Glu Lys Arg Gly Asp

20 25 30

Arg Tyr Glu Pro Val Ala Leu Asp Pro Gly Gly Val Arg Ile Phe Pro

35 40 45

Asp Gly Arg Asp Pro Gln Ser Gly Met Ser Asn Ala Val Asp Arg Arg

50 55 60

Met Ala Arg Gly Ala Arg Lys Arg Arg Asp Arg Phe Val Asp Pro Tyr

65 70 75 80

Ala Leu Arg Lys Ala Ala Leu Thr Asn Ala Leu Pro Ala His Arg Val

85 90 95

Gly Arg Ala Ile Phe His Leu Asn Gln Arg Arg Gly Phe Gln Ser Asn

100 105 110

Arg Lys Thr Asp Arg Lys Gln Ser Glu Asp Gly Ala Ile Lys Gln Ala

115 120 125

Ala Ser Lys Leu Lys Ala Arg Met His Glu Glu Ala Ala Pro Thr Leu

130 135 140

Gly Ala Phe Phe Ala Asp Met His Leu Arg Lys Ser Tyr Asp Asp Gln

145 150 155 160

Gln Thr Ala Ile Arg Ala Glu Leu Val Arg Leu Gly Lys Asp His Leu

165 170 175

Thr Gly Asn Ala Arg Lys Lys Val Trp Ala Lys Val Arg Lys Arg Leu

180 185 190

Phe Gly Asp Glu Val Leu Arg Pro Lys Asp Thr Ser Asp Gly Ala Arg

195 200 205

Ala Arg Ala Thr Thr Thr Gly Thr Lys Ala Ser Tyr Asp Phe Tyr Pro

210 215 220

Thr Arg Ala Met Leu Leu Asp Glu Phe Asn Ala Ile Trp Ala Ala Gln

225 230 235 240

Arg Glu His His Leu Ala Met Thr Asp Glu Ala Lys Ala Glu Ile Glu

245 250 255

His Ile Ile Phe Tyr His Arg Pro Leu Lys Pro Ala Ile Val Gly Lys

260 265 270

Cys Thr Leu Asp Pro Ala Thr Arg Pro Phe Lys Glu Asp Pro Glu Gly

275 280 285

Tyr Arg Ala Pro Trp Ser His Pro Leu Ala Gln Arg Phe Arg Ile Leu

290 295 300

Ser Glu Ala Arg Asn Leu Glu Ile Arg Glu Thr Gly Lys Thr Ser Arg

305 310 315 320

Arg Leu Thr Lys Asp Gln Ser Asp Leu Val Val Ala Ala Leu Leu Ala

325 330 335

Asn Arg Glu Val Lys Phe Asp Lys Leu Arg Thr Leu Leu Lys Leu Pro

340 345 350

Ala Glu Ala Arg Phe Asn Leu Glu Ser Asp Arg Arg Ser Thr Ala Thr

355 360 365

Arg Pro Pro His Gly Cys Arg Thr Arg Arg Ala Ser Ile Arg Arg Gly

370 375 380

Ala Gly Phe Leu Arg Asn Gly Arg Ser Arg Ser Ser Pro Asp Trp Lys

385 390 395 400

Lys Pro Lys Thr Lys Thr Asn Cys Ser His Gly Ser Lys Trp Asn Ala

405 410 415

His Leu Thr Ala Gln Arg Pro Arg Ala Ser Pro Ile Arg Arg Cys Arg

420 425 430

Thr Ala Ile Ala Gly Ser Ala Cys Ala Gln Ser Lys Arg Ser Cys Arg

435 440 445

Ser Cys Arg Thr Ser Ser Met Arg Met Ala Leu Arg Ala Pro Ala Ile

450 455 460

Thr Ser Trp Pro Ser Val Pro Ala Thr Thr Ile Pro Val Asp Thr Phe

465 470 475 480

Gln Gln Gly Gly Asn Ala Phe Gly Arg Ile Leu Ser Val Val Thr Ile

485 490 495

Glu Leu Phe Ala Ile Thr Asp Arg Val Asp Pro Val Gln Glu Gln Asp

500 505 510

Ala Arg Cys Phe Ala Leu Arg Ile Ala Glu Cys Arg Ser Asp Gly Leu

515 520 525

Gln His Phe Ala Glu Val Ser Leu Gly Leu Pro Pro Arg Asn Ala Ala

530 535 540

Gln Asp Gln Arg His Ser Ala Gly Leu Cys Gln Ser Ser Cys Val Gly

545 550 555 560

Arg Phe Val Pro Gly Thr Pro Cys Arg Arg Thr Pro Arg Ser Thr Ser

565 570 575

Leu His Gly Lys Glu Pro Ser Ala His Ala Phe Gly Ser Thr Ala Arg

580 585 590

Ser Arg Pro Gln Leu Pro Ala Ser Pro Arg Pro Cys Met Ser Phe Arg

595 600 605

Asp Gly Thr Ser Ser Ser Pro Ala Arg Met Pro Glu Thr Leu Asp Arg

610 615 620

Cys Ala Ile Ser Arg Ser Asn Arg Ser Ser Val Cys Arg Ala Glu Arg

625 630 635 640

Leu

<210> SEQ ID NO 36

<211> LENGTH: 1368

<212> TYPE: PRT

<213> ORGANISM: Streptococcus pyogenes

<400> SEQUENCE: 36

Met Asp Lys Lys Tyr Ser Ile Gly Leu Asp Ile Gly Thr Asn Ser Val

1 5 10 15

Gly Trp Ala Val Ile Thr Asp Asp Tyr Lys Val Pro Ser Lys Lys Phe

20 25 30

Lys Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile

35 40 45

Gly Ala Leu Leu Phe Asp Ser Gly Glu Ile Ala Glu Ala Thr Arg Leu

50 55 60

Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys

65 70 75 80

Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser

85 90 95

Phe Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys

100 105 110

His Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr

115 120 125

His Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val Asp

130 135 140

Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His

145 150 155 160

Met Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro

165 170 175

Asp Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr

180 185 190

Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala

195 200 205

Lys Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn

210 215 220

Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn

225 230 235 240

Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe

245 250 255

Asp Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp

260 265 270

Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp

275 280 285

Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp

290 295 300

Ile Leu Arg Leu Asn Ser Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser

305 310 315 320

Met Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys

325 330 335

Ala Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe

340 345 350

Asp Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser

355 360 365

Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp

370 375 380

Gly Thr Glu Glu Leu Leu Ala Lys Leu Asn Arg Glu Asp Leu Leu Arg

385 390 395 400

Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile His Leu

405 410 415

Gly Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe

420 425 430

Leu Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile

435 440 445

Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp

450 455 460

Met Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu

465 470 475 480

Val Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr

485 490 495

Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser

500 505 510

Leu Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys

515 520 525

Tyr Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu Gln

530 535 540

Lys Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr

545 550 555 560

Val Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp

565 570 575

Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly

580 585 590

Thr Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp

595 600 605

Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr

610 615 620

Leu Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala

625 630 635 640

His Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr

645 650 655

Thr Val Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp

660 665 670

Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe

675 680 685

Ala Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe

690 695 700

Lys Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu

705 710 715 720

His Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly

725 730 735

Ile Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly

740 745 750

Arg His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln

755 760 765

Thr Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile

770 775 780

Glu Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro

785 790 795 800

Val Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu

805 810 815

Gln Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg

820 825 830

Leu Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Ile Lys

835 840 845

Asp Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg

850 855 860

Gly Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys

865 870 875 880

Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys

885 890 895

Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp

900 905 910

Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr

915 920 925

Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp

930 935 940

Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys Ser

945 950 955 960

Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg

965 970 975

Glu Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val

980 985 990

Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe

995 1000 1005

Val Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Leu Ala

1010 1015 1020

Lys Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe

1025 1030 1035

Tyr Ser Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala

1040 1045 1050

Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu

1055 1060 1065

Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val

1070 1075 1080

Arg Lys Val Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr

1085 1090 1095

Glu Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys

1100 1105 1110

Arg Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro

1115 1120 1125

Lys Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val

1130 1135 1140

Leu Val Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys

1145 1150 1155

Ser Val Lys Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser

1160 1165 1170

Phe Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys

1175 1180 1185

Glu Val Arg Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu

1190 1195 1200

Phe Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly

1205 1210 1215

Glu Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val

1220 1225 1230

Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser

1235 1240 1245

Pro Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys

1250 1255 1260

His Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys

1265 1270 1275

Arg Val Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala

1280 1285 1290

Tyr Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn

1295 1300 1305

Ile Ile His Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro Thr Ala

1310 1315 1320

Phe Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser

1325 1330 1335

Thr Lys Glu Val Leu Asp Ala Thr Phe Ile His Gln Ser Ile Thr

1340 1345 1350

Gly Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp

1355 1360 1365

<210> SEQ ID NO 37

<211> LENGTH: 1368

<212> TYPE: PRT

<213> ORGANISM: Streptococcus pyogenes

<400> SEQUENCE: 37

Met Asp Lys Lys Tyr Ser Ile Gly Leu Asp Ile Gly Thr Asn Ser Val

1 5 10 15

Gly Trp Ala Val Ile Thr Asp Asp Tyr Lys Val Pro Ser Lys Lys Phe

20 25 30

Lys Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile

35 40 45

Gly Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr Arg Leu

50 55 60

Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys

65 70 75 80

Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser

85 90 95

Phe Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys

100 105 110

His Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr

115 120 125

His Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val Asp

130 135 140

Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His

145 150 155 160

Met Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro

165 170 175

Asp Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr

180 185 190

Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala

195 200 205

Lys Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn

210 215 220

Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn

225 230 235 240

Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe

245 250 255

Asp Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp

260 265 270

Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp

275 280 285

Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp

290 295 300

Ile Leu Arg Leu Asn Ser Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser

305 310 315 320

Met Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys

325 330 335

Ala Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe

340 345 350

Asp Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser

355 360 365

Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp

370 375 380

Gly Thr Glu Glu Leu Leu Ala Lys Leu Asn Arg Glu Asp Leu Leu Arg

385 390 395 400

Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile His Leu

405 410 415

Gly Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe

420 425 430

Leu Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile

435 440 445

Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp

450 455 460

Met Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu

465 470 475 480

Val Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr

485 490 495

Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser

500 505 510

Leu Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys

515 520 525

Tyr Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu Gln

530 535 540

Lys Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr

545 550 555 560

Val Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp

565 570 575

Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly

580 585 590

Thr Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp

595 600 605

Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr

610 615 620

Leu Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala

625 630 635 640

His Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr

645 650 655

Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp

660 665 670

Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe

675 680 685

Ala Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe

690 695 700

Lys Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu

705 710 715 720

His Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly

725 730 735

Ile Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly

740 745 750

Arg His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln

755 760 765

Thr Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile

770 775 780

Glu Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro

785 790 795 800

Val Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu

805 810 815

Gln Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg

820 825 830

Leu Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Ile Lys

835 840 845

Asp Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg

850 855 860

Gly Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys

865 870 875 880

Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys

885 890 895

Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp

900 905 910

Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr

915 920 925

Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp

930 935 940

Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys Ser

945 950 955 960

Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg

965 970 975

Glu Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val

980 985 990

Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe

995 1000 1005

Val Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala

1010 1015 1020

Lys Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe

1025 1030 1035

Tyr Ser Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala

1040 1045 1050

Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu

1055 1060 1065

Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val

1070 1075 1080

Arg Lys Val Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr

1085 1090 1095

Glu Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys

1100 1105 1110

Arg Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro

1115 1120 1125

Lys Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val

1130 1135 1140

Leu Val Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys

1145 1150 1155

Ser Val Lys Glu Leu Leu Gly Leu Thr Ile Met Glu Arg Ser Ser

1160 1165 1170

Phe Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys

1175 1180 1185

Glu Val Lys Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu

1190 1195 1200

Phe Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly

1205 1210 1215

Glu Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val

1220 1225 1230

Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser

1235 1240 1245

Pro Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys

1250 1255 1260

His Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys

1265 1270 1275

Arg Val Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala

1280 1285 1290

Tyr Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn

1295 1300 1305

Ile Ile His Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala

1310 1315 1320

Phe Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser

1325 1330 1335

Thr Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr

1340 1345 1350

Gly Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp

1355 1360 1365

<210> SEQ ID NO 38

<211> LENGTH: 1368

<212> TYPE: PRT

<213> ORGANISM: Streptococcus pyogenes

<400> SEQUENCE: 38

Met Asp Lys Lys Tyr Ser Ile Gly Leu Asp Ile Gly Thr Asn Ser Val

1 5 10 15

Gly Trp Ala Val Ile Thr Asp Asp Tyr Lys Val Pro Ser Lys Lys Phe

20 25 30

Lys Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile

35 40 45

Gly Ala Leu Leu Phe Asp Ser Gly Glu Ile Ala Glu Ala Thr Arg Leu

50 55 60

Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys

65 70 75 80

Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser

85 90 95

Phe Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys

100 105 110

His Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr

115 120 125

His Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val Asp

130 135 140

Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His

145 150 155 160

Met Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro

165 170 175

Asp Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr

180 185 190

Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala

195 200 205

Lys Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn

210 215 220

Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn

225 230 235 240

Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe

245 250 255

Asp Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp

260 265 270

Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp

275 280 285

Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp

290 295 300

Ile Leu Arg Leu Asn Ser Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser

305 310 315 320

Met Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys

325 330 335

Ala Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe

340 345 350

Asp Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser

355 360 365

Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp

370 375 380

Gly Thr Glu Glu Leu Leu Ala Lys Leu Asn Arg Glu Asp Leu Leu Arg

385 390 395 400

Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile His Leu

405 410 415

Gly Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe

420 425 430

Leu Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile

435 440 445

Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp

450 455 460

Met Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu

465 470 475 480

Val Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr

485 490 495

Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser

500 505 510

Leu Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys

515 520 525

Tyr Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu Gln

530 535 540

Lys Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr

545 550 555 560

Val Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp

565 570 575

Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly

580 585 590

Thr Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp

595 600 605

Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr

610 615 620

Leu Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala

625 630 635 640

His Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr

645 650 655

Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp

660 665 670

Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe

675 680 685

Ala Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe

690 695 700

Lys Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu

705 710 715 720

His Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly

725 730 735

Ile Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly

740 745 750

Arg His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln

755 760 765

Thr Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile

770 775 780

Glu Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro

785 790 795 800

Val Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu

805 810 815

Gln Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg

820 825 830

Leu Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Ile Lys

835 840 845

Asp Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg

850 855 860

Gly Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys

865 870 875 880

Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys

885 890 895

Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp

900 905 910

Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr

915 920 925

Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp

930 935 940

Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys Ser

945 950 955 960

Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg

965 970 975

Glu Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val

980 985 990

Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe

995 1000 1005

Val Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Leu Ala

1010 1015 1020

Lys Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe

1025 1030 1035

Tyr Ser Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala

1040 1045 1050

Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu

1055 1060 1065

Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val

1070 1075 1080

Arg Lys Val Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr

1085 1090 1095

Glu Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys

1100 1105 1110

Arg Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro

1115 1120 1125

Lys Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val

1130 1135 1140

Leu Val Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys

1145 1150 1155

Ser Val Lys Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser

1160 1165 1170

Phe Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys

1175 1180 1185

Glu Val Arg Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu

1190 1195 1200

Phe Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly

1205 1210 1215

Glu