CPS 216: Advanced Database Systems
Shivnath Babu
Minor Change to Course Logistics
• Grading:
– Project 40% → 35%
– Homework Assignments 15%
– Midterm 20% → 25%
– Final 25%
Presentation & Report on “Big Data”
• 6 topics, 2 students per topic.
– Let us try to form groups in class. Otherwise, email your
ranked preferences and Shivnath will form the groups
• Shivnath will give some initial pointers. Find more
information on your own (the Web, books, the library, etc.)
• Do a 10-minute in-class presentation on Thu 9/24
• Submit a detailed report that will be read by all
students
• Presentation and report will be graded as part of
the project
“Big Data” Topics
1. MapReduce vs. Databases, Hive, Hybrid
approaches
2. Parallel Databases: Old (Gamma) and New
(Greenplum, Aster Data, HadoopDB)
3. HBase and databases over HDFS, Google File
System, Google BigTable
4. Pig and other higher-level languages (Scope,
Dryad)
5. Optimization of MapReduce programs: Hadoop
Scheduling, Resource allocation
6. Key-Value stores (Amazon Dynamo, Cassandra)
The Duke CS Hadoop Cluster
• See the project web page for access instructions. I
will try to give an introduction in class
• Programming component of Homework 1 will be
done on the Hadoop cluster
– Implement a MapReduce program to compute the
average temperature per year over the NCDC
data
– Submit sources (Java files and Jar file)
– Due on Tuesday 9/22
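The shape of the average-temperature computation can be sketched in plain Java without the Hadoop API. This is a minimal illustration under stated assumptions, not the expected homework solution: the class name `AverageTemperatureSketch`, the methods `map`, `reduce`, and `run`, and the tab-separated `year<TAB>temperature` input format are all hypothetical. The real NCDC records use a different layout that would need its own parsing, and a real job would run on the cluster through Hadoop's Mapper and Reducer classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process imitation of the map -> group by key -> reduce flow.
class AverageTemperatureSketch {

    // Map step: turn one input line into a (year, temperature) pair.
    // ASSUMPTION: each line is "year<TAB>temperature"; this is NOT the NCDC layout.
    static Map.Entry<String, Double> map(String line) {
        String[] fields = line.split("\t");
        return Map.entry(fields[0], Double.parseDouble(fields[1]));
    }

    // Reduce step: average all temperatures that were grouped under one year.
    static double reduce(List<Double> temperatures) {
        double sum = 0.0;
        for (double t : temperatures) {
            sum += t;
        }
        return sum / temperatures.size();
    }

    // Driver: plays the role of the framework's shuffle by grouping the
    // mapped values by year, then calls reduce once per year.
    static Map<String, Double> run(List<String> lines) {
        Map<String, List<Double>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, Double> pair = map(line);
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        Map<String, Double> averages = new TreeMap<>();
        for (Map.Entry<String, List<Double>> e : grouped.entrySet()) {
            averages.put(e.getKey(), reduce(e.getValue()));
        }
        return averages;
    }

    public static void main(String[] args) {
        // Made-up sample readings, for illustration only.
        List<String> sample = List.of(
                "1949\t10.0", "1950\t0.0", "1949\t20.0", "1950\t5.0");
        run(sample).forEach((year, avg) -> System.out.println(year + "\t" + avg));
    }
}
```

One design point worth keeping in mind when moving to Hadoop: an average of partial averages is not in general the overall average, so a combiner for this job would have to pass along (sum, count) pairs rather than partial averages.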