[Bmi776] Additional datasets for projects

Beverly Seavey seavey at biostat.wisc.edu
Wed Mar 1 17:18:31 CST 2006


The MSU Center for Microbial Ecology
http://cme.msu.edu/
has images that you can use as datasets. They have done some work on supervised
classification of microbes and some computer vision work.  Most of the microbes
on this planet cannot be cultured, meaning that large quantities can't be grown,
so people have to look at pieces of DNA harvested from, e.g., soil
and try to put together a genome that includes sequence data for just one


European ribosomal RNA database
http://www.psb.ugent.be/rRNA/secmodel/index.html secondary structure maps 
http://www.psb.ugent.be/rRNA/varmaps/index.html variability maps
Contains schematic drawings of 16S rRNA folds. These might be amenable to computer
vision and/or learning of features. Closely related organisms have fewer sequence
and fold differences. It is not understood why some organisms have more
distorted/bizarre folds; take a look at a few of the images.
The second address at this site shows the rate of mutation for each nucleotide
in the 16S rRNA for different organisms, mapped onto the fold structure.
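
If you want a quick per-position variability measure of your own to compare
against the variability maps, here is a minimal sketch. It uses Shannon entropy
per alignment column, which is a simpler stand-in and not the method the
database itself uses. The function names and the toy alignment are made up for
illustration.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column, ignoring gaps."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def variability_profile(alignment):
    """Entropy for each column of a list of equal-length aligned sequences."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy(col) for col in zip(*alignment)]


# Hypothetical toy alignment, NOT real 16S rRNA data.
toy_alignment = [
    "ACGU-A",
    "ACGUCA",
    "ACAUCG",
    "ACGUCU",
]
profile = variability_profile(toy_alignment)
```

Columns where every organism has the same nucleotide score 0; higher values
mark positions that vary more across the aligned sequences.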

Go to www.library.wisc.edu and use the E-journal List to access the journal
Bioinformatics.  Do a search for "database". You will receive references to
thousands of articles that refer to databases, many of them "boutique" dbs that
focus on a certain molecule or aspect of molecules.
http://bioinformatics.oxfordjournals.org


http://pfam.wustl.edu/
This database is about protein domains.
Contains trained HMMs for each domain.
If you want to group data differently from the way it is presented,
there is software for creating an HMM from the seed sequences that you choose.
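
To get a feel for what building a model from seed sequences involves, here is
a rough sketch that estimates only the per-column (match state) emission
probabilities from a gapless seed alignment, with a flat pseudocount. It is a
simplification under those assumptions, not the actual HMM-building software:
it has no insert or delete states and no transition probabilities. The
function name and the toy seed are made up.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def match_emissions(seed_alignment, pseudocount=1.0):
    """Per-column amino acid probabilities with add-one style smoothing."""
    columns = []
    for col in zip(*seed_alignment):
        # Count only standard amino acids; anything else is skipped.
        counts = Counter(c for c in col if c in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        columns.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return columns


# Hypothetical three-sequence seed, NOT taken from a real Pfam family.
toy_seed = ["MKV", "MRV", "MKI"]
emissions = match_emissions(toy_seed)
```

Choosing a different set of seed sequences changes these per-column
distributions, which is the sense in which you can regroup the data.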


http://sdmc.lit.org.sg/GEDatasets/Datasets.html
pointers to microarray datasets.
http://visitor.ics.uci.edu/genex/cybert/datasets
http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi


http://metacyc.org/
Contains information about metabolic pathways if you want to do some kind of graph
analysis of interacting molecules.
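
As one example of what graph analysis could look like, here is a small sketch
that stores a pathway as a directed graph (metabolite -> products) and finds
the shortest chain of conversions with breadth-first search. The graph is a
hand-typed toy with a few well-known early glycolysis steps, not something
exported from MetaCyc, and the function name is made up.

```python
from collections import deque

# Hand-written toy graph: each metabolite maps to what it can be converted to.
toy_pathway = {
    "glucose": ["glucose-6-phosphate"],
    "glucose-6-phosphate": ["fructose-6-phosphate", "6-phosphogluconolactone"],
    "fructose-6-phosphate": ["fructose-1,6-bisphosphate"],
    "fructose-1,6-bisphosphate": [],
    "6-phosphogluconolactone": [],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of nodes or None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


route = shortest_path(toy_pathway, "glucose", "fructose-1,6-bisphosphate")
```

The same dictionary-of-lists layout works for other questions too, such as
counting how many reactions each metabolite takes part in.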


http://www.unleashedinformatics.com/index.php?pg=products&refer=bind
A database about molecules, large or small, that bind to other molecules in
cells. 


