Phylogenetics Homework 1

These labs are designed to introduce you to the tools you need to conduct phylogenetic analyses. The analyses they ask of you have been simplified to make reasonable homework assignments, so they are not realistic, publishable analyses. If you have data you are interested in, you may talk to me to get permission to modify these homeworks to address your interests.

In this first assignment, you will gather data from ten bacteria and two outgroups. You should get the 6-phosphogluconate dehydrogenase, decarboxylating (gnd) protein and DNA sequences for the following species:

Encephalitozoon cuniculi (Fungus)
Dictyostelium discoideum (Slime mold)
Synechococcus elongatus (Cyanobacteria)
Cronobacter dublinensis (Proteobacteria)
Candidatus Solibacter usitatus (Acidobacteria)
Leptotrichia goodfellowii (Fusobacteria)
Borrelia garinii (Spirochaetes)
Phycisphaera mikurensis (Planctomycetes)
Mycobacterium tuberculosis (Actinobacteria)
Anaerolinea thermophila (Chloroflexi)
Chlamydia trachomatis (Chlamydiae)
Lactococcus lactis (Firmicutes)

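I succeeded in doing this using GenBank. For some species, you have to navigate to the complete genome record with DNA sequences, search the page for "gnd (typing the opening quotation mark helps, because it matches annotations such as /gene="gnd" rather than every occurrence of those three letters), and click on CDS to highlight the DNA sequence that encodes the protein. If a record has no gene name, as in the example record in this assignment, search for the product name (6-phosphogluconate dehydrogenase) instead. If the DNA sequence is not loaded, there is a link you can click to load it. I did it all with a slowish internet connection in a reasonable amount of time.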
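If you would rather script part of the retrieval than click through the web pages, NCBI also serves GenBank records through its E-utilities EFetch service. The following is only a minimal sketch, not the procedure I used: the function names efetch_url and fetch_text are made up for illustration, and the protein accession is the protein_id from the example record in this assignment. Building the URL needs no network access; fetch_text does.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities EFetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(db, record_id, rettype="fasta"):
    """Build an EFetch URL for one record (hypothetical helper, no network use)."""
    params = {"db": db, "id": record_id, "rettype": rettype, "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def fetch_text(url):
    """Download the record as text; this call needs an internet connection."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


# Protein FASTA for the Candidatus Solibacter usitatus gnd product.
url = efetch_url("protein", "ABJ87995.1")
print(url)
# To actually download it: print(fetch_text(url))
```

You still need to check each record by eye to confirm that you have the right gene, and you should record the accession you used for every species.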

When gathering these data, keep track of the references you will need to find the data again. As an example, the record I made for Candidatus Solibacter usitatus is the following:


ORGANISM  Candidatus Solibacter usitatus Ellin6076
            Bacteria; Acidobacteria; Solibacteres; Solibacterales;
            Solibacteraceae; Candidatus Solibacter.
REFERENCE   1  (bases 1 to 9965640)
  AUTHORS   Copeland,A., Lucas,S., Lapidus,A., Barry,K., Detter,J.C., Glavina
            del Rio,T., Hammon,N., Israni,S., Dalin,E., Tice,H., Pitluck,S.,
            Thompson,L.S., Brettin,T., Bruce,D., Han,C., Tapia,R., Gilna,P.,
            Schmutz,J., Larimer,F., Land,M., Hauser,L., Kyrpides,N.,
            Mikhailova,N., Janssen,P.H., Kuske,C.R. and Richardson,P.
  CONSRTM   US DOE Joint Genome Institute
  TITLE     Complete sequence of Solibacter usitatus Ellin6076
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 9965640)
  AUTHORS   Copeland,A., Lucas,S., Lapidus,A., Barry,K., Detter,J.C., Glavina
            del Rio,T., Hammon,N., Israni,S., Dalin,E., Tice,H., Pitluck,S.,
            Thompson,L.S., Brettin,T., Bruce,D., Han,C., Tapia,R., Gilna,P.,
            Schmutz,J., Larimer,F., Land,M., Hauser,L., Kyrpides,N.,
            Mikhailova,N., Janssen,P.H., Kuske,C.R. and Richardson,P.
  CONSRTM   US DOE Joint Genome Institute
  TITLE     Direct Submission
  JOURNAL   Submitted (06-OCT-2006) US DOE Joint Genome Institute, 2800
            Mitchell Drive B100, Walnut Creek, CA 94598-1698, USA

     gene            8897542..8898999
                     /locus_tag="Acid_7082"
     CDS             8897542..8898999
                     /locus_tag="Acid_7082"
                     /EC_number="1.1.1.44"
                     /note="KEGG: sde:Sde_0635 6-phosphogluconate
                     dehydrogenase, decarboxylating;
                     TIGRFAM: 6-phosphogluconate dehydrogenase,
                     decarboxylating;
                     PFAM: 6-phosphogluconate dehydrogenase domain protein;
                     6-phosphogluconate dehydrogenase, NAD-binding"
                     /codon_start=1
                     /transl_table=11
                     /product="6-phosphogluconate dehydrogenase
                     (decarboxylating)"
                     /protein_id="ABJ87995.1"
                     /db_xref="GI:116229286"
                     /db_xref="InterPro:IPR006113"
                     /db_xref="InterPro:IPR006114"
                     /db_xref="InterPro:IPR006115"
                     /db_xref="InterPro:IPR006183"
                     /db_xref="InterPro:IPR006184"
                     /translation="MEQTADIALIGLAVMGQNLIMNMNDHGYTVVAYNRTTSKVDEFL
                     NDAAKGSKVIGAHSIEEMVKLLKRPRKIMLMVKAGKPVDEFIETLLPYLEPGDLIIDG
                     GNSHFPDTIRRTQYLESKGLLFVGTGVSGGEEGARFGPSMMPGGTPAAWPLVKDIFQA
                     ICAKTPEGEPCCDWVGRDGAGHFVKMTHNGIEYGDMQLICEAYQLMKEGLGMSNEEMH
                     EVFAEWNKGELDSYLIEITRDILGYKDPATGEQTLDKILDTAGQKGTGKWTSVSSLDL
                     GMPVTLIGEAVYARCLSAMKDDRVKASKILTGPKAKFPGDKKAFVEDIRQALLASKIV
                     SYAQGFMLLAEAAKEYKWDLNYGSIAMMWREGCIIRSVFLGKIKAAFANNPTLANLLL
                     DSYFRGLLDRCQGSWRHTVSEAVLNGVPVPAFTTALAFYDGYRSERLPANLLQAQRDY
                     FGAHTFERVDQPRGKFFHTNWTGKGGNVSAGVYTV"

atggaacaa acggcagaca tcgcattgat cggtctggca
  8897581 gtcatgggcc agaacctgat tatgaatatg aacgaccacg ggtacacggt ggtcgcttat
  8897641 aaccgcacga cctccaaggt cgatgaattc ctgaacgacg ccgccaaagg cagcaaggtc
  8897701 atcggcgcgc actcgatcga ggagatggtc aaacttctca agcgcccccg caagatcatg
  8897761 ctcatggtca aggccggcaa gccggtggac gaattcatcg agaccctgct cccctacctc
  8897821 gagcccggcg acctgatcat cgatggcggc aattcgcatt tcccggatac catccgccgc
  8897881 acccaatacc tcgaaagcaa gggccttctg ttcgtcggca ccggcgtttc cggcggcgag
  8897941 gaaggcgcgc gtttcggccc gtccatgatg cccggaggta cccccgccgc gtggcccctc
  8898001 gtgaaggaca tcttccaggc catctgcgcc aagacacccg agggcgagcc ctgctgcgat
  8898061 tgggtcggcc gcgatggcgc cggccacttc gtcaagatga cccacaacgg catcgagtac
  8898121 ggcgatatgc agctcatctg cgaggcctac caactcatga aggaaggcct cggcatgagc
  8898181 aacgaagaaa tgcacgaagt cttcgccgaa tggaacaagg gcgagctcga tagctacctc
  8898241 atcgaaatca cccgcgacat tctgggctac aaagaccccg ccaccggcga acagaccctc
  8898301 gacaaaatcc tcgataccgc cggccaaaag ggtaccggca agtggaccag cgtcagctcg
  8898361 ctcgatctcg gcatgcccgt taccctgatc ggcgaagccg tctacgcgcg ctgcctcagc
  8898421 gctatgaagg acgatcgcgt caaggcttcc aagatcctca ccggacccaa ggccaagttc
  8898481 cccggtgaca agaaggcctt cgtggaagac atccgccagg cccttctcgc ctccaagatc
  8898541 gtcagctacg cgcagggctt catgctcctc gccgaagccg ccaaggaata taagtgggac
  8898601 ctgaactacg gttccatcgc catgatgtgg cgcgaaggct gcatcatccg cagcgtcttc
  8898661 ctcggcaaaa ttaaggccgc gtttgccaac aacccgacgc tggcgaacct gctgctcgat
  8898721 agctacttca ggggcctgct ggaccgttgc cagggttcct ggcgccacac cgtttccgaa
  8898781 gccgtcctca atggcgtgcc ggtgcccgcc ttcaccaccg ctctcgcctt ctacgatggc
  8898841 taccgcagcg aacgcctgcc tgccaacctg ctccaggcac agcgcgatta cttcggcgcc
  8898901 cacaccttcg agcgcgtcga tcagccgcgc ggcaagttct tccacaccaa ttggaccggc
  8898961 aagggcggca acgtctcggc cggagtctac accgtatga
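
A quick sanity check on a record like this one is to confirm that the DNA you copied really encodes the protein in the /translation field. The sketch below is one way to do that with only the Python standard library; the helper names clean_sequence and translate are made up for illustration. It assumes the standard codon assignments, which translation table 11 (the table named in the record) shares for every codon. Table 11 differs in the start codons it allows, and this sketch does not handle a gene that begins with an alternative start codon such as gtg or ttg.

```python
import re

# Standard codon assignments, with bases ordered t, c, a, g at each position.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def clean_sequence(text):
    """Remove position numbers, spaces, and line breaks from GenBank sequence lines."""
    return re.sub(r"[\d\s]", "", text.lower())


def translate(dna):
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        # Codons containing anything other than a, c, g, t become X.
        amino_acid = CODON_TABLE.get(dna[pos:pos + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# First and last sequence lines of the record above; both start in frame.
first_line = "atggaacaa acggcagaca tcgcattgat cggtctggca"
last_line = "8898961 aagggcggca acgtctcggc cggagtctac accgtatga"

print(translate(clean_sequence(first_line)))  # MEQTADIALIGLA
print(translate(clean_sequence(last_line)))   # KGGNVSAGVYTV
```

The two outputs match the beginning and end of the /translation field. To check a whole gene, paste all of its sequence lines into one string and compare the result with the /translation text after removing its line breaks and spaces.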

Next week you will create an alignment. Later you will use your alignments to create phylogenies. I have kept the number of species small so that you can "look" at the data and get a feel for them. These assignments will highlight some practical difficulties in conducting phylogenetic analyses, but they are kept short enough for homework and so do not constitute a realistic attempt at reconstructing bacterial phylogeny.