See also Datasets for Data Mining Competitions and KDD Cup results and data.
Datasets for Data Mining

-
UCI KDD Database Repository, for large datasets used in machine learning and knowledge discovery research.
-
UCI Machine Learning Repository.
-
AWS (Amazon Web Services) Public Data Sets, a centralized repository of public data sets that can be integrated into AWS cloud-based applications.
-
Delve, Data for Evaluating Learning in Valid Experiments
-
FEDSTATS, a comprehensive source of US statistics and more
-
FIMI repository for frequent itemset mining, implementations and datasets.
-
Financial Data Finder at OSU, a large catalog of financial data sets
-
GeneSifter Data Center, access to microarray datasets through the GeneSifter microarray data analysis system.
-
GEO (Gene Expression Omnibus), a gene expression/molecular abundance repository supporting MIAME-compliant data submissions, and a curated online resource for gene expression data browsing, query, and retrieval.
-
Grain Market Research, financial data including stocks, futures, etc.
-
ICWSM-2009 dataset, containing 44 million blog posts made between August 1st and October 1st, 2008.
-
Investor Links, includes financial data
-
Microsoft's TerraServer, aerial photographs and satellite images you can view and purchase.
-
MIT Cancer Genomics gene expression datasets and publications, from MIT Whitehead Center for Genome Research.
-
National Government Statistical Web Sites, data, reports, statistical yearbooks, press releases, and more from about 70 web sites, including countries from Africa, Europe, Asia, and Latin America.
-
National Space Science Data Center (NSSDC), NASA data sets from planetary exploration, space and solar physics, life sciences, astrophysics, and more.
-
PubGene(TM) Gene Database and Tools, genomic-related publications database
-
SMD: Stanford Microarray Database, stores raw and normalized data from microarray experiments.
-
SourceForge.net Research Data, includes historic and status statistics on approximately 100,000 projects and over 1 million registered users' activities at the project management web site.
-
STATOO Datasets part 1 and part 2
-
UCR Time Series Data Mining Archive, offering datasets, papers, links, and code.
-
United States Census Bureau.