Updating Computer Science Education


Jacques Cohen Brandeis University Waltham, MA USA

January 2007

Topics

- Preliminary remarks
- Present state of affairs and concerns
- Objectives of this talk
- Trends (hardware, software, networks, others)
- Illustrative examples
- Suggestions

Present state of affairs and concerns

Huge increase in PC and internet usage.

Decreasing enrollment (mainly in the USA).

Possible Reasons

- Previous high school preparation
- Bubble burst (2000) + outsourcing
- Widespread usage of computers by lay persons
- Interest in interdisciplinary topics (e.g., biology, business, economics)
- Public perception about: What is Computer Science?

The Nature of Computer Science

- Two main components: theoretical (mathematics) and experimental (engineering)
- What characterizes CS is the notion of algorithms
- Emphasis on the discrete and on logic
- An interdisciplinary approach with other sciences may well revive interest in the continuous (or the use of qualitative reasoning)

Related fields

- Sciences in general (scientific computing)
- Management
- Psychology (human interaction)
- Business, Communications, Journalism, Arts, etc.

The role of Computer Science among other sciences

(How we are perceived by the other sciences)

- In physics, chemistry, and biology, nature is the ultimate umpire; discovery is paramount.
- In math and engineering, aesthetics, ease of use, acceptance, and permanence play key roles.

Uneasy dialogue with biologists

It is not unusual to hear from a physicist, chemist or biologist: “If computer scientists do not get involved in our field, we will do it ourselves!!”

It looks very likely that the biological sciences (including, of course, neuroscience) will dominate the 21st century

Differences in approaches

Most scientific and creative discoveries proceed in a bottom-up manner.

Computer scientists are taught to emphasize top-down approaches.

Polya's "How to Solve It" often mentions: first specialize, then generalize.

Hacking is beautiful (mostly bottom-up).

Objectives

Provide a bird's-eye view of what is happening in CS education (USA) and attempt to make recommendations about possible directions. Hopefully, some of it will be applicable to European universities.

Premise

Changes ought to be gradual and depend on resources and time constraints.

First we have to observe current trends: generality, storage, speed, networks, and others.

Then we can try to make sense of present directions.

It is difficult and risky to foresee the future, e.g., the PC (windows, mouse), the internet, parallelism.

Topics influencing computer science education.

Trends in hardware, software, networks.

Huge volume of data (terabytes and petabytes)

- Statistical nature of data
- Clustering, classification
- Probability and statistics become increasingly important

Trend towards generality

Need to know more about what is going on in related topics. A few examples:

- Robotics and mechanical engineering
- Hardware, electrical engineering, material science, nanotechnology
- Multi-field visualization (e.g., medicine)
- Biophysics and bioinformatics

Nature of data structures

- Sequences (strings), streams
- Trees, DAGs, and graphs
- 3D structures
- Emphasis on discrete structures
- Neglect of the continuous should be corrected (e.g., use of MATLAB)

Trends on data growth: How Much Information Is There in the World?

The 20-terabyte size of the Library of Congress is derived by assuming that LC has 20 million books and each requires 1 MB. Of course, LC has much other material besides printed text, and this other material would take much more space.

From Lesk

http://www.lesk.com/mlesk/ksg97/ksg.html

Library of Congress data (cont)

1. Thirteen million photographs, even if compressed to a 1 MB JPEG each, would be 13 terabytes.

2. The 4 million maps in the Geography Division might scan to 200 TB.

3. LC has over five hundred thousand movies; at 1 GB each they would be 500 terabytes (most are not full-length color features).

4. Bulkiest might be the 3.5 million sound recordings, which at about one audio CD each would be almost 2,000 TB.

This makes the total size of the Library perhaps 3 petabytes (3,000 terabytes).
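Lesk's totals are easy to sanity-check; the sketch below (plain Python, using only the per-item estimates quoted above) reproduces the back-of-envelope arithmetic:

```python
# Lesk's Library of Congress estimates, all expressed in terabytes.
holdings_tb = {
    "printed books (20M at ~1 MB each)": 20,
    "photographs (13M at ~1 MB JPEG each)": 13,
    "maps (4M scanned)": 200,
    "movies (500k at ~1 GB each)": 500,
    "sound recordings (3.5M at ~1 audio CD each)": 2000,
}

total_tb = sum(holdings_tb.values())
print(f"Total: {total_tb} TB, i.e. about {total_tb / 1000:.1f} PB")
# The pieces sum to roughly 2.7 PB, consistent with "perhaps 3 petabytes".
```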

How Much Information Is There In the World?

Lesk's Conclusions

There will be enough disk space and tape storage in the world to store everything people write, say, perform, or photograph. For writing this is true already; for the others it is only a year or two away.

Lesk's Conclusions (cont)

The challenge for librarians and computer scientists is to let us find the information we want in other people's work; and the challenge for the lawyers and economists is to arrange the payment structures so that we are encouraged to use the work of others rather than re-create it.

The huge volume of data implies:

- Linearity of algorithms is a must
- Emphasis on pattern matching
- Increased preprocessing
- Different levels of memory transfer rates
- Algorithmic incrementality (avoid redoing tasks)
- Need for approximate (optimization) algorithms
- Distributed computing
- Centralized parallelism (Blue Gene, Argonne)
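The incrementality point can be made concrete with a classic streaming example: maintaining statistics of a data stream one item at a time, never re-scanning what has already been processed. A minimal sketch (Welford's online algorithm, chosen here purely as an illustration; it is not from the talk):

```python
class RunningStats:
    """Incremental mean/variance over a stream (Welford's algorithm):
    each update is O(1), so nothing is ever recomputed from scratch."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        return self.m2 / self.n if self.n else 0.0

rs = RunningStats()
for x in [2.0, 4.0, 6.0, 8.0]:
    rs.add(x)
print(rs.mean, rs.variance())  # 5.0 5.0
```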

The importance of pattern matching (searches) in a large number of items

Pattern matching has to be "tolerant" (approximate): find closest matches (dynamic programming, optimization) in:

- Sequences
- Pictures
- 3D structures (e.g., proteins)
- Sound
- Photos
- Video

Trends in computer cycles (speed)

Moore's law appears to be applicable until at least 2020.

Use of supercomputers

(2006) Researchers at Los Alamos National Laboratory have set a new world record by performing the first million-atom computer simulation in biology. Using the "Q Machine" supercomputer, Los Alamos computer scientists have created a molecular simulation of the cell's protein-making structure, the ribosome. The project, simulating 2.64 million atoms in motion, is more than six times larger than any biological simulation performed to date.

Graphical visualization of the simulation of a Ribosome at work

Network transmission speed (Lambda Rail Net)

USA backbone

Trends in Transmission Speed

The High Energy Physics team's demonstration achieved a peak throughput of 151 Gbps and an official mark of 131.6 Gbps, beating their previous mark for peak throughput of 101 Gbps by 50 percent.

Trends in Transmission Speed II

The new record data transfer speed is also equivalent to serving 10,000 MPEG2 HDTV movies simultaneously in real time, or transmitting all of the printed content of the Library of Congress in 10 minutes.

Trend in Languages

- Importance of scripting and string processing
- XML, Java, C++; trend towards Python, MATLAB, Mathematica
- No ideal languages
- No agreement on what the first language ought to be

A recently proposed language (Fortress, 2006)

From Guy Steele, The Fortress Programming Language, Sun Microsystems
http://iic.harvard.edu/documents/steeleLecture2006public.pdf

Fortress Language (Sun, Guy Steele)

Meta-level approach to teaching

- Learn 2 or 3 languages and assume that expertise in other languages can be acquired on the fly.
- Hopefully, the same will occur in learning a topic in depth. Once in-depth research is taught using a particular area, it can be extrapolated to other areas.

Increasing usage of canned programs or data banks. Typical examples: WordNet, GraphViz.

Trends in Algorithmic Complexity

- Overcoming the scare of NP problems (it happened before with undecidability)
- 3-SAT lessons
- Mapping polynomial problems within NP
- Optimization, approximate, or random algorithms
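The "random algorithms" lesson has a textbook illustration in 3-SAT itself: a uniformly random assignment satisfies 7/8 of the clauses in expectation (assuming each clause has three distinct variables). A Monte Carlo sketch; the clause set here is invented for the example:

```python
import random

def random_assignment_satisfied(clauses, trials=2000, seed=1):
    """Estimate the expected fraction of 3-SAT clauses satisfied by a
    uniformly random truth assignment (literals are +/- variable numbers)."""
    rng = random.Random(seed)
    n_vars = max(abs(lit) for clause in clauses for lit in clause)
    total = 0.0
    for _ in range(trials):
        # assign[v] is the truth value of variable v (index 0 unused).
        assign = [None] + [rng.random() < 0.5 for _ in range(n_vars)]
        satisfied = sum(
            any(assign[lit] if lit > 0 else not assign[-lit] for lit in clause)
            for clause in clauses
        )
        total += satisfied / len(clauses)
    return total / trials

clauses = [(1, 2, 3), (-1, 2, -3), (1, -2, 3), (-1, -2, -3)]
print(random_assignment_satisfied(clauses))  # hovers near 7/8 = 0.875
```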

Three Examples

Example I

The lessons of BLAST (preprocessing, incrementality, approximation)

Example II

The importance of analyzing very large networks.

(probability, sensors, sociological implications)

Example III

Time Series.

(data mining, pattern searches, classification)

Example I (History of BLAST): sequence alignment

Biologists matched sequences of nucleotides or amino acids empirically using dot matrices.

Dot matrices

No exact matching

Alignment with Gaps

Dynamic Programming Approach

Dynamic programming complexity: O(n²)

Two solutions with gaps

Complexity can be exponential for determining all solutions
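The quadratic dynamic-programming recurrence is short enough to write out. A minimal global-alignment score in Python (Needleman-Wunsch style; the scoring values match=1, mismatch=-1, gap=-1 are illustrative, not parameters from the talk):

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score by dynamic programming:
    O(len(a) * len(b)) time and space."""
    n, m = len(a), len(b)
    # score[i][j] = best score aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[n][m]

print(alignment_score("GATTACA", "GCATGCU"))  # 0 for this classic pair
```

Filling the (n+1) × (m+1) table visits each cell once, which is exactly the quadratic cost noted above; it is recovering all optimal alignments, rather than one score, that can blow up exponentially.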

The BLAST approach's complexity is almost linear.

Equivalent dot matrices would have 3 billion columns (the size of the human genome) and Z rows, where Z is the size of the sequence being matched against the genome (possibly tens of thousands).

BLAST Tricks

Preprocessing:
- Compile the locations in a genome containing all possible "seeds" (combinations of 6 nucleotides or amino acids)

Hacking:
- Follow diagonals as much as possible (BLAST strategy)
- Use dynamic programming as a last resort

Lots of approximations but a very successful outcome:
- No multiple solutions
- BLAST may not find the best matches
- The notion of p-values (probability of matches in random sequences) becomes very important
- Tuning of the BLAST algorithm parameters
- Mixture of hacking and theory
- Advantage: satisfies incrementality
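The preprocessing trick can be sketched in a few lines: index every length-k seed of the genome once, then look up a query's seeds to get candidate diagonals. This is a toy illustration of the idea, not BLAST's actual implementation (real BLAST seeding, scoring, and extension are far more elaborate):

```python
from collections import defaultdict

def build_seed_index(genome, k=6):
    """Preprocessing: record every position of each length-k 'seed' so a
    query can jump straight to candidate hit locations."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index

def seed_hits(index, query, k=6):
    """Return (query_pos, genome_pos) pairs that share a seed; each pair
    defines a diagonal to extend (extension and DP stages omitted)."""
    hits = []
    for j in range(len(query) - k + 1):
        for i in index.get(query[j:j + k], []):
            hits.append((j, i))
    return hits

idx = build_seed_index("ACGTACGTGACGTT", k=4)
print(seed_hits(idx, "TACGTG", k=4))
```

Each (query_pos, genome_pos) pair defines a diagonal to extend; dynamic programming is reserved for the few regions that survive, which is what keeps the overall cost close to linear.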

Example II (Networks and Sociology)

Money travels (bills)

Probabilities P(time,distance)

Money travels

- The entire process could be implemented using sensors.
- Mimics the spread of disease.

The impact of computing will go deeper into the sciences and spread more into the social sciences (Jon Kleinberg, 2006)
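The bill-tracking study estimated probabilities of the form P(time, distance) from reported sightings. One can get the flavor with a toy Monte Carlo model; the 2D lattice walk below is an assumption for illustration, not the model actually fitted to dollar-bill data:

```python
import random

def walk_distance(rng, steps):
    """One 2D lattice random walk; returns Manhattan distance from the start."""
    x = y = 0
    for _ in range(steps):
        dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        x += dx
        y += dy
    return abs(x) + abs(y)

def p_within(steps, radius, trials=2000, seed=0):
    """Monte Carlo estimate of P(distance <= radius after `steps` hops)."""
    rng = random.Random(seed)
    hits = sum(walk_distance(rng, steps) <= radius for _ in range(trials))
    return hits / trials

# Probability a walker is still within 10 blocks after 100 hops.
print(p_within(steps=100, radius=10))
```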

Example III (Time Series)

Illustrates data mining and how much CS can help other sciences.

Slides from Dr. Eamonn Keogh, University of California, Riverside, CA.

Examples of time series

Time Series (cont 1)

Time Series (cont 2)

Time Series (cont 3)

Time Series (cont 4)

Time Series (cont 5)
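A core operation behind these examples is subsequence search: given a long series and a short query pattern, find the best-matching window. A brute-force sketch (Euclidean distance; real systems add indexing and lower-bounding tricks, which this deliberately omits):

```python
import math

def best_match(series, query):
    """Slide the query across the series and return (start_index, distance)
    of the closest window under Euclidean distance. O(len(series)*len(query))."""
    m = len(query)
    best_i, best_d = -1, math.inf
    for i in range(len(series) - m + 1):
        d = math.sqrt(sum((series[i + j] - query[j]) ** 2 for j in range(m)))
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d

series = [0.0, 0.1, 1.0, 2.0, 1.0, 0.1, 0.0, -1.0]
print(best_match(series, [1.0, 2.0, 1.0]))  # exact match at index 2
```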

Using Logic Programming in Multivariate Time Series (Sleep Apnea)

from G. Guimarães and L. Moniz Pereira

[Figure: multivariate sleep recordings of airflow, ribcage movements, abdominal movements, and snoring, segmented into labeled events such as "no airflow without snoring" and "strong airflow with snoring".]

Back to curricula recommendations

Present status (USA) and suggested changes

Current recommended curricula

ACM, SIGCSE 2001 (USA)

1. Discrete Structures (43 core hours)
2. Programming Fundamentals (54 core hours)
3. Algorithms and Complexity (31 core hours)
4. Programming Languages (6 core hours)
5. Architecture and Organization (36 core hours)
6. Operating Systems (18 core hours)
7. Net-Centric Computing (15 core hours)
8. Human-Computer Interaction (6 core hours)
9. Graphics and Visual Computing (5 core hours)
10. Intelligent Systems (10 core hours)
11. Information Management (10 core hours)
12. Software Engineering (30 core hours)
13. Social and Professional Issues (16 core hours)
14. Computational Science (no core hours)

From Domik, G.: Glimpses into the Future of Computer Science Education, University of Paderborn, Germany

Changing Curricula

Two extremes:
- Increased generality and limited depth
- Limited generality and increased depth

The two extremes in graphical form: breadth (generality) versus depth.

The MIT pilot program for freshmen

At MIT there is a unified EECS department. Two choices for the first-year course:

- Robotics using probabilistic Bayesian approaches (CS)
- Study of cell phones inside out (EE)

Concrete suggestions I

- Teaching is inextricably linked to research.
- Time and resources govern curriculum changes.
- Gradual changes are essential.
- Avoid overlap of material among different required courses.
- If possible, introduce an elective course on current trends in computer science.
- Deal with massive data even in intro courses.

Concrete suggestions II

When teaching algorithms, stress the potential of:

- Preprocessing
- Incrementality
- Parallelization
- Approximations
- Taking advantage of sparseness

Concrete suggestions III

- Emphasize probability and statistics
- Bayesian approaches
- Hidden Markov Models
- Random algorithms
- Clustering and classification
- Machine learning and data mining
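Clustering, at least, fits in a few lines, which makes it a good intro-course vehicle for these recommendations. A deliberately minimal 1-D k-means sketch (initial centers are spread across the sorted data so the run is deterministic; illustrative only, not production code):

```python
def kmeans_1d(points, k, iters=10):
    """Minimal 1-D k-means: assign each point to its nearest center,
    recompute centers as cluster means, repeat."""
    data = sorted(points)
    centers = [data[i * len(data) // k] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: abs(p - centers[j]))
            clusters[nearest].append(p)
        # Empty clusters keep their old center.
        centers = [sum(grp) / len(grp) if grp else centers[i]
                   for i, grp in enumerate(clusters)]
    return sorted(centers)

print(kmeans_1d([1.0, 1.2, 0.8, 10.0, 10.4, 9.6], k=2))
```

On the toy data the two centers converge to about 1.0 and 10.0, separating the two obvious groups.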

Finally, …

Encourage interdisciplinary work.

It will inspire new directions in computer science.

Thank you!!

Future of Computer Intensive Science in the U.S. (Daniel Reed, 2006)

Ten years: a geological epoch on the computing time scale. Looking back, a decade brought the web and consumer email, digital cameras and music, broadband networking, multifunction cell phones, WiFi, HDTV, telematics, multiplayer games, electronic commerce, and computational science. It also brought outsourcing and globalization, information warfare and blurred work-life boundaries, spam, phishing, identity theft, and software insecurity. What will a decade of technology advances bring in communications and collaboration, sensors and knowledge management, modeling and discovery, electronic commerce and digital entertainment, critical infrastructure management and security? What will it mean for research and education?

Daniel A. Reed is the director of the Renaissance Computing Institute. He is also Chancellor's Eminent Professor and Vice-Chancellor for Information Technology at the University of North Carolina at Chapel Hill.

Cyberinfrastructure and Economic Curvature: Creating Curvature in a Flat World (Sangtae Kim, Purdue, 2006)

Cyberinfrastructure is central to scientific advancement in the modern, data-intensive research environment. For example, the recent revolution in the life sciences, including the seminal achievement of sequencing the human genome on an accelerated time frame, was made possible by parallel advances in cyberinfrastructure for research in this data-intensive field. Beyond the enablement of basic research, cyberinfrastructure is a driver for global economic growth despite the disruptive 'flattening' effect of IT in the developed economies. Even at the regional level, visionary cyber investments to create smart infrastructures will induce 'economic curvature', a gravitational pull to overcome the dispersive effects of the 'flat' world, and a consequential acceleration in economic growth.

Miscellaneous I

- Claytronics
- Game theory (economics, psychology)
- Other examples in bioinformatics
- Beautiful interaction between sequences (strings) and structures
- Reverse engineering in biology
- Geography and phenotype (external structural appearance) are of paramount importance
- Systems biology

Miscellaneous II

- Crossword puzzle solving using Google (Skiena) and statistical NLP