Privacy-Aware Computing


CEG7380 Cloud Computing Lecture 1

Keke Chen

Outline

• Syllabus
  • Scope of this course
  • Tentative schedule
  • Prerequisites
  • Resources
  • Assignments
• Introduction

Scope of this course

• Understand the basic ideas of cloud computing
• Get familiar with
  • Tools
  • Systems
• Get exposed to some research topics

Two major parts:

• Processing large data with the cloud
• Scaling up/down web applications with the cloud

Note: some programming parts need self-study

Prerequisites

• Some programming skills
  • Java, Python, shell
  • Comfortable with learning new programming frameworks
• Sufficient knowledge about
  • Data structures and databases
  • Operating systems
  • Distributed systems

Assignments and Grading

• Reading papers (~3) (10%)
• Some miniprojects (4~5) (60%)
  • Help you master the concepts
  • Learn to use tools and systems
• Self-motivated research projects are strongly encouraged!
• Final exam (20%)
• Class attendance and discussion (10%)

Resources

• Updated reference list
• In-house Hadoop cluster
• AWS access
  • Coupon code for each student
• Pilot
  • Submitting reading assignments and projects

Tentative Schedule

• Parallel data processing
  • Distributed file systems (GFS, HDFS)
  • MapReduce
  • High-level distributed data management
• Cloud infrastructures
  • Virtualization
  • AWS and Eucalyptus
  • Interactive front-end: Google App Engine
• Cloud security and privacy
• Research topics

In projects, we will learn to use

• Hadoop MapReduce, Pig Latin
• AWS
• Google App Engine

Cloud Computing

Lecture 1-2
Some slides are borrowed from UC Berkeley RAD Lab
Keke Chen

Outline

• What is cloud computing?
• Why now?
• Cloud killer applications
• Cloud economics
• Challenges and opportunities
  • “Above the Clouds”
  • “Claremont Report”

What is Cloud Computing?

• Old idea: Software as a Service (SaaS)
  • Def: delivering applications over the Internet
• Recently: “[Hardware, Infrastructure, Platform] as a Service”
  • Utility computing: pay-as-you-use computing
    • Illusion of infinite resources
    • No up-front cost
    • Fine-grained billing (e.g. hourly)

Cloud computing vs. grid computing

• Cloud computing = virtualization + grid + services + utility computing
  • Grid computing: resource provisioning, load balancing, parallel processing
• Views of different users
  • System admin/Hadoop users: grid
  • Application owners/service users: service, utility

Users and cloud providers

Why Now?

• Experience with very large datacenters: profitable for cloud providers
  • Economies of scale
• Pervasive broadband Internet
• Fast x86 virtualization
• Pay-as-you-go billing model
  • Large user base
  • Online payment
• Online ads
• Content distribution
• Web 2.0 lowers the entry point to e-business
• More small e-business owners
• Large user base of clouds

Spectrum of Clouds

• Instruction Set VM (Amazon EC2, 3Tera)
• Bytecode VM (Microsoft Azure)
• Framework VM
  • Google AppEngine, Force.com

[Figure: spectrum from lower-level, less management to higher-level, more management: EC2, Azure, AppEngine, Force.com]

Cloud Killer Apps

• Mobile and web applications
• Batch processing / MapReduce
  • Data analytics (big data)
  • E.g., OLAP, data mining, machine learning
• Extensions of desktop software
  • Matlab, Mathematica

Cloud Economics

• Pay by use instead of provisioning for peak

[Figure: capacity and demand over time for a static data center (with unused resources) and for a data center in the cloud]
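The gap between the two provisioning models can be made concrete with a little arithmetic. The sketch below is illustrative only: the hourly demand trace and the price of 10 cents per server-hour are made-up assumptions, not figures from the lecture, and it assumes that cloud capacity tracks demand exactly with hourly billing.

```python
def static_cost(demand, cents_per_server_hour):
    """Cost of a static data center sized for peak demand.

    Every server needed at the peak is paid for in every hour,
    whether or not it is used.
    """
    return max(demand) * len(demand) * cents_per_server_hour


def pay_per_use_cost(demand, cents_per_server_hour):
    """Cost when only the servers actually used in each hour are billed."""
    return sum(demand) * cents_per_server_hour


def utilization(demand):
    """Fraction of peak-sized capacity that is actually used."""
    return sum(demand) / (max(demand) * len(demand))


# Hypothetical demand (servers needed per hour) over one day.
demand = [10, 10, 10, 10, 20, 40, 80, 100, 100, 80, 60, 40] * 2

print("static:     ", static_cost(demand, 10), "cents")
print("pay per use:", pay_per_use_cost(demand, 10), "cents")
print("utilization:", round(utilization(demand), 2))
```

Under this simple model, if a cloud server-hour costs more than an owned one, pay per use is still cheaper as long as the price ratio stays below the peak-to-average ratio of the demand.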

Economics of Cloud Users

• Risk of over-provisioning: underutilization

[Figure: static data center, capacity vs. demand over time, with unused resources]

Economics of Cloud Users

• Heavy penalty for under-provisioning

[Figure: three panels of capacity vs. demand over time (days 1 to 3): demand exceeding capacity, lost revenue, lost users]
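The two risks on these slides, idle capacity when over-provisioned and unserved demand when under-provisioned, can be sketched with a small calculation. The demand trace and the capacity values below are hypothetical, and the sketch does not model the longer-term effect from the last panel, where users who were turned away do not come back.

```python
def provisioning_report(demand, capacity):
    """Split a demand trace into served, unserved, and idle capacity."""
    served = sum(min(d, capacity) for d in demand)
    unserved = sum(max(d - capacity, 0) for d in demand)  # lost revenue
    unused = sum(max(capacity - d, 0) for d in demand)    # idle servers
    return {"served": served, "unserved": unserved, "unused": unused}


# Hypothetical demand per time slot, in units of capacity.
demand = [40, 60, 100, 60, 40, 20]

print(provisioning_report(demand, capacity=60))   # under-provisioned at the peak
print(provisioning_report(demand, capacity=100))  # provisioned for the peak
```

Raising the fixed capacity removes the unserved demand but increases the idle capacity, which is the trade-off a static data center cannot escape.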

Economics of Cloud Providers

• 5-7x economies of scale [Hamilton 2008]

Resource         Cost in Medium DC     Cost in Very Large DC    Ratio
Network          $95 / Mbps / month    $13 / Mbps / month       7.1x
Storage          $2.20 / GB / month    $0.40 / GB / month       5.7x
Administration   ≈140 servers/admin    >1000 servers/admin      7.1x

• Extra benefits
  • Amazon: utilize off-peak capacity
  • Microsoft: sell .NET tools
  • Google: reuse existing infrastructure

Adoption Challenges

Challenge                                          Opportunity
Availability                                       Multiple providers & DCs
Data lock-in                                       Standardization
Data confidentiality, auditability, and privacy    Encryption, VLANs, firewalls; geographical data storage; privacy-preserving data outsourcing

Growth Challenges

Challenge                            Opportunity
Data transfer bottlenecks            FedEx-ing disks, data backup/archival
Performance unpredictability         Improved VM support, flash memory, scheduling VMs
Scalable storage                     Invent scalable store
Bugs in large distributed systems    Invent debugger that relies on distributed VMs
Scaling quickly                      Invent auto-scaler that relies on ML; snapshots

Policy and Business Challenges

Challenge                  Opportunity
Reputation fate sharing    Offer reputation-guarding services like those for email
Software licensing         Pay-for-use licenses; bulk use sales

Research Challenges Mentioned by Database Community (Claremont Report)

Functionality and operational cost

• Background: compare massive-scale data-intensive computing systems with today’s DBMSs
• Limited functionality
  • Simple APIs (e.g. MapReduce)
  • Pushes more burden onto developers
• Benefits
  • Easier to manage
  • Lower operational cost
  • Service Level Agreements (SLAs) that are hard to provide for a SQL DBMS

P.S. DB systems are notorious for their installation and maintenance costs.
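To give a feel for what a "simple API" means here, the sketch below runs the classic word count through the three MapReduce steps (map, shuffle, reduce) in a single Python process. It illustrates the programming model only and is not the Hadoop API; the function names are chosen for this example. Even a plain count has to be written by hand as map and reduce functions, which is the extra burden on developers that the slide mentions.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["the cloud", "The grid and the cloud"]))
```

In a real deployment the map and reduce calls would run on many machines and the shuffle would move data across the network; here everything stays in memory to keep the example self-contained.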

Manageability

• Features of cloud systems
  • Limited human intervention
  • High-variance workloads
  • A variety of shared infrastructures
  • No DBAs or administrators to assist developers
• Systems need to do work automatically
  • Self-managing
  • Adaptive (autonomous) computing

Data security and privacy

• Users sharing physical resources in a cloud
  • Protect from each other (security)
  • Protect from curious cloud providers (privacy)
• Success may depend on specific target usage scenarios
  • Examples
    • Query-based services
    • Mining-based services

Datasets over multiple clouds

• Interesting datasets might be available in different clouds
  • Different cloud providers
  • Private or public clouds
• Services mashing up datasets
  • Inevitably crossing clouds
  • Federated cloud architectures

Algorithms on Big data

• Working on “Big Data”
  • Data mining
  • Machine learning
  • Visualization
• Traditionally assume data is in
  • Flat files or relational databases
• Distributed data organization poses new challenges
  • Redesign algorithms
  • Redesign frameworks
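A small example of what "redesign algorithms" can mean: when data is split across machines, even computing a mean has to be restructured so that each partition returns a compact partial result that can be merged. The sketch below is a single-process illustration with made-up partitions and assumes there is at least one value overall; it also shows why the naive approach of averaging the per-partition means is wrong when partitions differ in size.

```python
def partial_aggregate(partition):
    """Run locally on one partition: return (sum, count)."""
    return sum(partition), len(partition)


def combine(partials):
    """Merge the per-partition (sum, count) pairs into a global mean."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


# Hypothetical data spread unevenly over three nodes.
partitions = [[1, 2, 3], [4, 5], [6]]

correct = combine([partial_aggregate(p) for p in partitions])
naive = sum(sum(p) / len(p) for p in partitions) / len(partitions)

print("mean from partial aggregates:", correct)
print("mean of partition means:     ", naive)
```

Only the (sum, count) pairs cross the network, so the amount of data moved is independent of the partition sizes.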