In this first assignment, you will gather data from ten bacteria and two outgroups. You should get the 6-phosphogluconate dehydrogenase, decarboxylating (gnd) protein and DNA sequences for the following species:
Encephalitozoon cuniculi (Fungus)
Dictyostelium discoideum (Slime mold)
Synechococcus elongatus (Cyanobacteria)
Cronobacter dublinensis (Proteobacteria)
Candidatus Solibacter usitatus (Acidobacteria)
Leptotrichia goodfellowii (Fusobacteria)
Borrelia garinii (Spirochaetes)
Phycisphaera mikurensis (Planctomycetes)
Mycobacterium tuberculosis (Actinobacteria)
Anaerolinea thermophila (Chloroflexi)
Chlamydia trachomatis (Chlamydiae)
Lactococcus lactis (Firmicutes)
I succeeded in doing this using GenBank. For some species, you have to click through to the entire genome record with its DNA sequence, search the page for "gnd (the leading quotation mark is intentional and helps narrow the matches), and click on CDS to highlight the DNA sequence that encodes the protein. If the DNA sequence is not loaded, there is a place you can click to load it. I did it all with a slowish internet connection in a reasonable amount of time.
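If you prefer a script to clicking, the following is a minimal sketch of an alternative route, not the procedure I used above. It assumes the NCBI E-utilities `efetch` endpoint and that you already know the protein accession from the record (for example ABJ87995.1, the `/protein_id` of the Candidatus Solibacter usitatus record shown further down). The function names `efetch_url` and `fetch_text` are made up for this illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, db="protein", rettype="fasta"):
    """Build an efetch URL for one accession (hypothetical helper)."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def fetch_text(url, timeout=30):
    """Download the URL and return the body as text (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the URL does not touch the network, so it is safe to inspect first.
print(efetch_url("ABJ87995.1"))
```

With a working connection, `fetch_text(efetch_url("ABJ87995.1"))` should return the protein in FASTA format. You still need to find the matching DNA sequence and note its coordinates, so treat this as a convenience rather than a replacement for looking at the record.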
When gathering these data, keep track of the references you need so that you can find the data again. As an example, the record I made for Candidatus Solibacter usitatus is the following:
  ORGANISM  Candidatus Solibacter usitatus Ellin6076
            Bacteria; Acidobacteria; Solibacteres; Solibacterales;
            Solibacteraceae; Candidatus Solibacter.
REFERENCE   1  (bases 1 to 9965640)
  AUTHORS   Copeland,A., Lucas,S., Lapidus,A., Barry,K., Detter,J.C.,
            Glavina del Rio,T., Hammon,N., Israni,S., Dalin,E., Tice,H.,
            Pitluck,S., Thompson,L.S., Brettin,T., Bruce,D., Han,C.,
            Tapia,R., Gilna,P., Schmutz,J., Larimer,F., Land,M., Hauser,L.,
            Kyrpides,N., Mikhailova,N., Janssen,P.H., Kuske,C.R. and
            Richardson,P.
  CONSRTM   US DOE Joint Genome Institute
  TITLE     Complete sequence of Solibacter usitatus Ellin6076
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 9965640)
  AUTHORS   Copeland,A., Lucas,S., Lapidus,A., Barry,K., Detter,J.C.,
            Glavina del Rio,T., Hammon,N., Israni,S., Dalin,E., Tice,H.,
            Pitluck,S., Thompson,L.S., Brettin,T., Bruce,D., Han,C.,
            Tapia,R., Gilna,P., Schmutz,J., Larimer,F., Land,M., Hauser,L.,
            Kyrpides,N., Mikhailova,N., Janssen,P.H., Kuske,C.R. and
            Richardson,P.
  CONSRTM   US DOE Joint Genome Institute
  TITLE     Direct Submission
  JOURNAL   Submitted (06-OCT-2006) US DOE Joint Genome Institute, 2800
            Mitchell Drive B100, Walnut Creek, CA 94598-1698, USA
     gene            8897542..8898999
                     /locus_tag="Acid_7082"
     CDS             8897542..8898999
                     /locus_tag="Acid_7082"
                     /EC_number="1.1.1.44"
                     /note="KEGG: sde:Sde_0635 6-phosphogluconate
                     dehydrogenase, decarboxylating; TIGRFAM:
                     6-phosphogluconate dehydrogenase, decarboxylating; PFAM:
                     6-phosphogluconate dehydrogenase domain protein;
                     6-phosphogluconate dehydrogenase, NAD-binding"
                     /codon_start=1
                     /transl_table=11
                     /product="6-phosphogluconate dehydrogenase
                     (decarboxylating)"
                     /protein_id="ABJ87995.1"
                     /db_xref="GI:116229286"
                     /db_xref="InterPro:IPR006113"
                     /db_xref="InterPro:IPR006114"
                     /db_xref="InterPro:IPR006115"
                     /db_xref="InterPro:IPR006183"
                     /db_xref="InterPro:IPR006184"
                     /translation="MEQTADIALIGLAVMGQNLIMNMNDHGYTVVAYNRTTSKVDEFL
                     NDAAKGSKVIGAHSIEEMVKLLKRPRKIMLMVKAGKPVDEFIETLLPYLEPGDLIIDG
                     GNSHFPDTIRRTQYLESKGLLFVGTGVSGGEEGARFGPSMMPGGTPAAWPLVKDIFQA
                     ICAKTPEGEPCCDWVGRDGAGHFVKMTHNGIEYGDMQLICEAYQLMKEGLGMSNEEMH
                     EVFAEWNKGELDSYLIEITRDILGYKDPATGEQTLDKILDTAGQKGTGKWTSVSSLDL
                     GMPVTLIGEAVYARCLSAMKDDRVKASKILTGPKAKFPGDKKAFVEDIRQALLASKIV
                     SYAQGFMLLAEAAKEYKWDLNYGSIAMMWREGCIIRSVFLGKIKAAFANNPTLANLLL
                     DSYFRGLLDRCQGSWRHTVSEAVLNGVPVPAFTTALAFYDGYRSERLPANLLQAQRDY
                     FGAHTFERVDQPRGKFFHTNWTGKGGNVSAGVYTV"
                                 atggaacaa acggcagaca tcgcattgat cggtctggca
  8897581 gtcatgggcc agaacctgat tatgaatatg aacgaccacg ggtacacggt ggtcgcttat
  8897641 aaccgcacga cctccaaggt cgatgaattc ctgaacgacg ccgccaaagg cagcaaggtc
  8897701 atcggcgcgc actcgatcga ggagatggtc aaacttctca agcgcccccg caagatcatg
  8897761 ctcatggtca aggccggcaa gccggtggac gaattcatcg agaccctgct cccctacctc
  8897821 gagcccggcg acctgatcat cgatggcggc aattcgcatt tcccggatac catccgccgc
  8897881 acccaatacc tcgaaagcaa gggccttctg ttcgtcggca ccggcgtttc cggcggcgag
  8897941 gaaggcgcgc gtttcggccc gtccatgatg cccggaggta cccccgccgc gtggcccctc
  8898001 gtgaaggaca tcttccaggc catctgcgcc aagacacccg agggcgagcc ctgctgcgat
  8898061 tgggtcggcc gcgatggcgc cggccacttc gtcaagatga cccacaacgg catcgagtac
  8898121 ggcgatatgc agctcatctg cgaggcctac caactcatga aggaaggcct cggcatgagc
  8898181 aacgaagaaa tgcacgaagt cttcgccgaa tggaacaagg gcgagctcga tagctacctc
  8898241 atcgaaatca cccgcgacat tctgggctac aaagaccccg ccaccggcga acagaccctc
  8898301 gacaaaatcc tcgataccgc cggccaaaag ggtaccggca agtggaccag cgtcagctcg
  8898361 ctcgatctcg gcatgcccgt taccctgatc ggcgaagccg tctacgcgcg ctgcctcagc
  8898421 gctatgaagg acgatcgcgt caaggcttcc aagatcctca ccggacccaa ggccaagttc
  8898481 cccggtgaca agaaggcctt cgtggaagac atccgccagg cccttctcgc ctccaagatc
  8898541 gtcagctacg cgcagggctt catgctcctc gccgaagccg ccaaggaata taagtgggac
  8898601 ctgaactacg gttccatcgc catgatgtgg cgcgaaggct gcatcatccg cagcgtcttc
  8898661 ctcggcaaaa ttaaggccgc gtttgccaac aacccgacgc tggcgaacct gctgctcgat
  8898721 agctacttca ggggcctgct ggaccgttgc cagggttcct ggcgccacac cgtttccgaa
  8898781 gccgtcctca atggcgtgcc ggtgcccgcc ttcaccaccg ctctcgcctt ctacgatggc
  8898841 taccgcagcg aacgcctgcc tgccaacctg ctccaggcac agcgcgatta cttcggcgcc
  8898901 cacaccttcg agcgcgtcga tcagccgcgc ggcaagttct tccacaccaa ttggaccggc
  8898961 aagggcggca acgtctcggc cggagtctac accgtatga
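A quick way to check that the DNA you copied really encodes the protein you recorded is to strip the coordinates and spaces from the pasted sequence and translate it. The sketch below does this for the first two lines of the sequence above. It is only an illustration: the helper names `clean_sequence` and `translate` are made up here, and it relies on the fact that translation table 11 assigns the same amino acids to codons as the standard code (the two differ in which start codons are allowed), so it does not handle alternative start codons.

```python
import re

# Standard codon table in the usual T, C, A, G ordering. The codon-to-amino-acid
# assignments are the same for translation table 11 (assumption stated above:
# alternative start codons are NOT handled here).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def clean_sequence(text):
    """Remove coordinates, spaces, and line breaks; return upper-case bases."""
    return re.sub(r"[^A-Za-z]", "", text).upper()


def translate(dna):
    """Translate from the first base, stopping at the first stop codon.

    Codons containing anything other than T, C, A, G become 'X'.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# First two lines of the DNA sequence from the record above, as pasted.
pasted = """atggaacaa acggcagaca tcgcattgat cggtctggca
8897581 gtcatgggcc agaacctgat tatgaatatg aacgaccacg ggtacacggt ggtcgcttat"""

dna = clean_sequence(pasted)
print(len(dna), translate(dna))
```

The printed protein fragment should match the start of the /translation field in the record. If it does not, the most likely causes are a sequence copied from the wrong strand or a selection that starts or ends in the wrong place.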
Next week you will create an alignment. Later you will use your alignments to create phylogenies. I have kept the number of species small so that you can "look" at the data and get a feel for them. These assignments will highlight some practical difficulties in conducting phylogenetic analyses, but they are kept short enough to be reasonable as homework, so they do not constitute a realistic attempt at reconstructing bacterial phylogeny.