Privacy-Aware Computing


CEG7380 Cloud Computing Lecture 1

Keke Chen

Outline

• Syllabus
  - Scope of this course
  - Tentative schedule
  - Prerequisites
  - Resources
  - Assignments
• Introduction

Scope of this course

• Understand the basic ideas of cloud computing
• Get familiar with
  - Tools
  - Systems
• Get exposed to some research topics

Two major parts:

• Processing large data with the cloud
• Scaling web applications up and down with the cloud

Note: some programming parts require self-study.

Prerequisites

• Some programming skills
  - Java, Python, shell
  - Comfortable with learning new programming frameworks
• Sufficient knowledge of
  - Data structures and databases
  - Operating systems
  - Distributed systems

Assignments and Grading

• Reading papers (~3) (10%)
• Some mini-projects (4~5) (60%)
  - Help you master the concepts
  - Learn to use tools and systems
• Self-motivated research projects are strongly encouraged!
• Final exam (20%)
• Class attendance and discussion (10%)

Resources

• Updated reference list
• In-house Hadoop cluster
• AWS access
  - Coupon code for each student
• Pilot
  - Submitting reading assignments and projects

Tentative Schedule

• Parallel data processing
  - Distributed file systems (GFS, HDFS)
  - MapReduce
  - High-level distributed data management
• Cloud infrastructures
  - Virtualization
  - AWS and Eucalyptus
  - Interactive front-end: Google App Engine
• Cloud security and privacy
• Research topics

In projects, we will learn to use

• Hadoop MapReduce, Pig Latin
• AWS
• Google App Engine

Cloud Computing

Lecture 1-2
Some slides are borrowed from UC Berkeley RAD Lab
Keke Chen

Outline

• What is cloud computing?
• Why now?
• Cloud killer applications
• Cloud economics
• Challenges and opportunities
  - “Above the Clouds”
  - “Claremont Report”

What is Cloud Computing?

• Old idea: Software as a Service (SaaS)
  - Def: delivering applications over the Internet
• Recently: “[Hardware, Infrastructure, Platform] as a service”
  - Utility computing: pay-as-you-use computing
  - Illusion of infinite resources
  - No up-front cost
  - Fine-grained billing (e.g. hourly)

Cloud computing vs. grid computing

• Cloud computing = virtualization + grid + services + utility computing
  - Grid computing: resource provisioning, load balancing, parallel processing
• Views of different users
  - System admin/Hadoop users: grid
  - Application owners/service users: service, utility

Users and cloud providers

Why Now?

• Experience with very large datacenters: profitable for cloud providers
  - Economies of scale
• Pervasive broadband Internet
• Fast x86 virtualization
• Pay-as-you-go billing model
  - Large user base
  - Online payment
• Online ads
• Content distribution
• Web 2.0 lowers the entry point to e-business
  - More small e-business owners
  - Large user base of clouds

Spectrum of Clouds

• Instruction Set VM (Amazon EC2, 3Tera)
• Bytecode VM (Microsoft Azure)
• Framework VM
  - Google AppEngine, Force.com

[Figure: spectrum running from "Lower-level, less management" to "Higher-level, more management", in the order EC2, Azure, AppEngine, Force.com]

Cloud Killer Apps

• Mobile and web applications
• Batch processing / MapReduce
  - Data analytics (big data)
  - E.g., OLAP, data mining, machine learning
• Extensions of desktop software
  - Matlab, Mathematica

Cloud Economics

• Pay by use instead of provisioning for peak

[Figure: two plots of capacity and demand over time, titled "Static data center" and "Data center in the cloud"; the gap between capacity and demand is labeled "Unused resources"]
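The pay-by-use idea can be made concrete with a small calculation. This is only a sketch: the hourly demand values and the price per server-hour are made-up numbers, `peak_provisioned_cost` and `pay_per_use_cost` are hypothetical helper names, and the same price is assumed in both cases, which need not hold in practice.

```python
def peak_provisioned_cost(demand, price_per_server_hour):
    """Cost when capacity is fixed at the peak demand for every hour."""
    peak = max(demand)
    return peak * len(demand) * price_per_server_hour


def pay_per_use_cost(demand, price_per_server_hour):
    """Cost when each hour is billed only for the servers used in that hour."""
    return sum(demand) * price_per_server_hour


# Hypothetical demand: number of servers needed in each hour
hourly_demand = [2, 2, 3, 10, 4, 2]
price = 0.5  # assumed price per server-hour

static_cost = peak_provisioned_cost(hourly_demand, price)
cloud_cost = pay_per_use_cost(hourly_demand, price)
print(static_cost, cloud_cost)
```

With these assumed numbers the static data center pays for 10 servers in all 6 hours, while pay-by-use pays only for the 23 server-hours actually consumed.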

Economics of Cloud Users

• Risk of over-provisioning: underutilization

[Figure: static data center, capacity and demand over time; the area between capacity and demand is labeled "Unused resources"]

Economics of Cloud Users

• Heavy penalty for under-provisioning

[Figure: three plots of capacity and demand over time (days 1 to 3); the second is labeled "Lost revenue" and the third "Lost users"]
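Both risks for a fixed-capacity data center can be quantified from a demand series. The sketch below is illustrative: `provisioning_summary` is a hypothetical helper, and the demand values and capacities are invented for the example.

```python
def provisioning_summary(demand, capacity):
    """Summarize a fixed capacity against a per-period demand series."""
    # Capacity paid for but not used (over-provisioning)
    unused = sum(max(capacity - d, 0) for d in demand)
    # Demand that could not be served (under-provisioning)
    unserved = sum(max(d - capacity, 0) for d in demand)
    served = sum(min(d, capacity) for d in demand)
    utilization = served / (capacity * len(demand))
    return {"unused": unused, "unserved": unserved, "utilization": utilization}


# Hypothetical demand per period, in arbitrary request units
demand = [40, 60, 100, 80, 30]

over = provisioning_summary(demand, capacity=100)   # sized for the peak
under = provisioning_summary(demand, capacity=50)   # sized below the peak
print(over)
print(under)
```

Sizing for the peak leaves no demand unserved but a large amount of capacity idle. Sizing below the peak raises utilization but leaves demand unserved, which is the lost revenue (and eventually lost users) case.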

Economics of Cloud Providers

• 5-7x economies of scale [Hamilton 2008]

Resource         Cost in Medium DC      Cost in Very Large DC    Ratio
Network          $95 / Mbps / month     $13 / Mbps / month       7.1x
Storage          $2.20 / GB / month     $0.40 / GB / month       5.7x
Administration   ≈140 servers/admin     >1000 servers/admin      7.1x

• Extra benefits
  - Amazon: utilize off-peak capacity
  - Microsoft: sell .NET tools
  - Google: reuse existing infrastructure

Adoption Challenges

Challenge                                          Opportunity
Availability                                       Multiple providers & DCs
Data lock-in                                       Standardization
Data confidentiality, auditability, and privacy    Encryption, VLANs, firewalls; geographical data storage; privacy-preserving data outsourcing

Growth Challenges

Challenge                            Opportunity
Data transfer bottlenecks            FedEx-ing disks, data backup/archival
Performance unpredictability         Improved VM support, flash memory, scheduling VMs
Scalable storage                     Invent scalable store
Bugs in large distributed systems    Invent debugger that relies on distributed VMs
Scaling quickly                      Invent auto-scaler that relies on ML; snapshots

Policy and Business Challenges

Challenge                  Opportunity
Reputation fate sharing    Offer reputation-guarding services like those for email
Software licensing         Pay-for-use licenses; bulk use sales

Research Challenges Mentioned by Database Community (Claremont Report)

Functionality and operational cost

• Background: compare massive-scale data-intensive computing systems with today’s DBMS
• Limited functionality
  - Simple APIs (e.g. MapReduce)
  - Pushes more burden onto developers
• Benefits
  - Easier to manage
  - Lower operational cost
  - Service Level Agreement (SLA) that is hard to provide for a SQL DBMS

P.S. DB systems are notorious for their installation and maintenance costs.
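To illustrate what a "simple API" means here, the following is a toy, in-memory word count in the MapReduce style. It is a sketch only: `run_mapreduce`, `map_words`, and `reduce_counts` are hypothetical names, and nothing here is the Hadoop API. The developer supplies just a map function and a reduce function, and everything a DBMS would otherwise offer (query language, schema, optimization) is left to the developer.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_counts(word, counts):
    """Reduce step: sum all counts emitted for one word."""
    return word, sum(counts)


def run_mapreduce(lines, mapper, reducer):
    """Toy in-memory driver: map, group values by key (shuffle), then reduce."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


word_counts = run_mapreduce(
    ["the cloud scales", "the cloud bills hourly"],
    map_words,
    reduce_counts,
)
print(word_counts)
```

A real framework would run the map and reduce calls on many machines and handle the grouping step over the network; the sketch keeps only the programming model.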

Manageability

• Features of cloud systems
  - Limited human intervention
  - High-variance workloads
  - A variety of shared infrastructures
  - No DBAs or administrators to assist developers
• Systems need to do work automatically
  - Self-managing
  - Adaptive (autonomous) computing

Data security and privacy

• Users sharing physical resources in a cloud
  - Protect from each other (security)
  - Protect from curious cloud providers (privacy)
• Successes may depend on specific target usage scenarios
  - Examples
    - Query-based services
    - Mining-based services

Datasets over multiple clouds

• Interesting datasets might be available in different clouds
  - Different cloud providers
  - Private or public clouds
• Services mashing up datasets
  - Inevitably crossing clouds
  - Federated cloud architectures

Algorithms on Big data

• Working on “Big Data”
  - Data mining
  - Machine learning
  - Visualization
• Traditionally assume data is in
  - Flat files or relational databases
• Distributed data organization poses new challenges
  - Redesign algorithms
  - Redesign frameworks
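As one small example of redesigning an algorithm for distributed data, the sketch below computes a mean and variance without ever collecting the data in one place. Each partition produces a small summary, and only the summaries are merged. The function names and sample partitions are hypothetical, and the sum-of-squares formula is chosen for brevity; it can lose precision on large or badly scaled data.

```python
from functools import reduce


def partial_stats(partition):
    """Per-partition summary: (count, sum, sum of squares)."""
    return (len(partition), sum(partition), sum(x * x for x in partition))


def merge_stats(a, b):
    """Combine two summaries; the order of merging does not matter."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def mean_and_variance(stats):
    """Recover the mean and population variance from a merged summary."""
    n, total, total_sq = stats
    mean = total / n
    variance = total_sq / n - mean * mean
    return mean, variance


# Hypothetical data split across two nodes
partitions = [[1.0, 2.0], [3.0, 4.0, 5.0]]

combined = reduce(merge_stats, (partial_stats(p) for p in partitions))
mean, variance = mean_and_variance(combined)
print(mean, variance)
```

Because the merge step is associative and commutative, the per-partition work can run in parallel and the summaries can be combined in any order, which is the property frameworks such as MapReduce rely on.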
