Updating Computer Science Education
Jacques Cohen, Brandeis University, Waltham, MA, USA
January 2007
Topics
Preliminary remarks
Present state of affairs and concerns
Objectives of this talk
Trends (hardware, software, networks, others)
Illustrative examples
Suggestions
Present state of affairs and concerns
Huge increase in PC and internet usage.
Decreasing enrollment (mainly in the USA).
Possible Reasons
Previous high school preparation
Bubble burst (2000) + outsourcing
Widespread usage of computers by lay persons
Interest in interdisciplinary topics (e.g., biology, business, economics)
Public perception about: what is Computer Science?
The Nature of Computer Science
Two main components: theoretical (mathematics) and experimental (engineering)
What characterizes CS is the notion of algorithms
Emphasis on the discrete and logic
An interdisciplinary approach with other sciences may well revive interest in the continuous (or the use of qualitative reasoning)
Related fields
Sciences in general (scientific computing), Management, Psychology (human interaction), Business, Communications, Journalism, Arts, etc.
The role of Computer Science among other sciences
(How we are perceived by the other sciences)
In physics, chemistry, and biology, nature is the ultimate umpire; discovery is paramount.
In math and engineering: aesthetics, ease of use, acceptance, and permanence play key roles.
Uneasy dialogue with biologists
It is not unusual to hear from a physicist, chemist or biologist: “If computer scientists do not get involved in our field, we will do it ourselves!!”
It looks very likely that the biological sciences (including, of course, neuroscience) will dominate the 21st century
Differences in approaches
Most scientific and creative discoveries proceed in a bottom-up manner
Computer scientists are taught to emphasize top-down approaches
Polya's "How to Solve It" often mentions: first specialize, then generalize.
Hacking is beautiful (mostly bottom-up)
Objectives
Provide a bird's eye view of what is happening in CS education (USA) and attempt to make recommendations about possible directions. Hopefully, some of it will be applicable to European universities.
Premise
Changes ought to be gradual and depend on resources and time constraints
First we have to observe current trends: generality, storage, speed, networks, and others.
Then try to make sense of present directions.
It is difficult and risky to foresee the future, e.g., the PC (windows, mouse), the internet, parallelism.
Topics influencing computer science education.
Trends in hardware, software, networks.
Huge volume of data (terabytes and petabytes)
Statistical nature of data
Clustering, classification (a clustering sketch follows)
Probability and statistics become increasingly important
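The kind of computation this implies fits in a few lines; below is a minimal k-means clustering sketch in Python (the 1-D data and k are purely illustrative):

```python
# Minimal k-means sketch on 1-D data (pure Python; data and k are illustrative).
import random

def kmeans(points, k, iters=20):
    centers = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center.
            nearest = min(range(k), key=lambda j: (p - centers[j]) ** 2)
            clusters[nearest].append(p)
        # Recompute each center as the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return centers

print(kmeans([1.0, 1.2, 0.9, 5.0, 5.3, 4.8], k=2))
```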
Trend towards generality
Need to know more about what is going on in related topics
A few examples:
Robotics and mechanical engineering
Hardware, electrical engineering, material science, nanotechnology
Multi-field visualization (e.g., medicine)
Biophysics and bioinformatics
Nature of data structures
Sequences (strings), streams
Trees, DAGs, and graphs
3D structures
Emphasis on discrete structures
Neglect of the continuous should be corrected (e.g., use of Matlab)
Trends in data growth
How Much Information Is There In the World?
The 20-terabyte size of the Library of Congress is derived by assuming that LC has 20 million books and that each requires 1 MB. Of course, LC holds much else besides printed text, and that other material would take much more space.
From Lesk
http://www.lesk.com/mlesk/ksg97/ksg.html
Library of Congress data (cont)
1. Thirteen million photographs, even if compressed to a 1 MB JPG each, would be 13 terabytes.
2. The 4 million maps in the Geography Division might scan to 200 TB.
3. LC has over five hundred thousand movies; at 1 GB each they would be 500 terabytes (most are not full-length color features).
4. Bulkiest might be the about 3.5 million sound recordings, which at one audio CD each would be almost 2,000 TB.
This makes the total size of the Library perhaps 3 petabytes (3,000 terabytes).
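The arithmetic behind that figure, restated as a sketch (the numbers are Lesk's estimates quoted above):

```python
# Lesk's back-of-the-envelope estimates (in terabytes), from the slides above.
holdings_tb = {
    "printed text (20M books x 1 MB)": 20,
    "photographs (13M x 1 MB JPG)": 13,
    "maps (4M scans)": 200,
    "movies (500K x 1 GB)": 500,
    "sound recordings (3.5M CDs)": 2000,
}
total_tb = sum(holdings_tb.values())
print(f"total: {total_tb} TB = about {total_tb / 1000:.1f} PB")
# -> total: 2733 TB = about 2.7 PB, roughly the "perhaps 3 petabytes" quoted
```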
How Much Information Is There In the World?
Lesk’s Conclusions
There will be enough disk space and tape storage in the world to store everything people write, say, perform, or photograph. For writing this is true already; for the others it is only a year or two away.
Lesk’s Conclusions (cont)
The challenge for librarians and computer scientists is to let us find the information we want in other people's work; and the challenge for the lawyers and economists is to arrange the payment structures so that we are encouraged to use the work of others rather than re-create it.
The huge volume of data implies:
Linearity of algorithms is a must
Emphasis on pattern matching
Increased preprocessing
Different levels of memory transfer rates
Algorithmic incrementality (avoid redoing tasks)
Need of approximate (optimization) algorithms
Distributed computing
Centralized parallelism (Blue Gene, Argonne)
The importance of pattern matching (searches) in large numbers of items
Pattern matching has to be "tolerant" (approximate)
Find closest matches (dynamic programming, optimization)
Sequences
Pictures
3D structures (e.g., proteins)
Sound
Photos
Video
Trends in computer cycles (speed)
Moore’s law appears to be applicable until at least 2020
Use of supercomputers
(2006) Researchers at Los Alamos National Laboratory have set a new world's record by performing the first million-atom computer simulation in biology. Using the "Q Machine" supercomputer, Los Alamos computer scientists have created a molecular simulation of the cell's protein-making structure, the ribosome. The project, simulating 2.64 million atoms in motion, is more than six times larger than any biological simulation performed to date.
Graphical visualization of the simulation of a Ribosome at work
Network transmission speed (Lambda Rail Net)
USA backbone
Trends in Transmission Speed
The High Energy Physics team's demonstration achieved a peak throughput of 151 Gbps and an official mark of 131.6 Gbps, beating their previous mark for peak throughput of 101 Gbps by 50 percent.
Trends in Transmission Speed II
The new record data transfer speed is also equivalent to serving 10,000 MPEG2 HDTV movies simultaneously in real time, or transmitting all of the printed content of the Library of Congress in 10 minutes.
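A rough sanity check of that last equivalence (a sketch; it assumes the printed content is on the order of 10 TB, i.e. half of Lesk's 20 TB book estimate above):

```python
# Rough check of the "Library of Congress in 10 minutes" claim.
printed_bits = 10e12 * 8    # assume ~10 TB of printed text, in bits
rate_bps = 131.6e9          # the official 131.6 Gbps mark
print(f"{printed_bits / rate_bps / 60:.1f} minutes")  # -> about 10.1 minutes
```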
Trend in Languages
Importance of scripting and string processing (a toy example follows this slide)
XML, Java, C++
Trend towards Python, Matlab, Mathematica
No ideal languages
No agreement on what the first language ought to be
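A toy illustration of the scripting-and-strings style stressed above, in Python (the file name is hypothetical):

```python
# Count the ten most frequent words in a text file (file name is hypothetical).
from collections import Counter
import re

with open("lecture_notes.txt") as f:
    words = re.findall(r"[a-z']+", f.read().lower())
print(Counter(words).most_common(10))
```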
A recently proposed language (Fortress, 2006)
From Guy Steele, The Fortress Programming Language, Sun Microsystems http://iic.harvard.edu/documents/steeleLecture2006public.pdf
Fortress Language (Sun, Guy Steele)
Meta-level approach to teaching
Learn 2 or 3 languages and assume that expertise in other languages can be acquired on the fly.
Hopefully, the same will occur in learning a topic in depth. Once in-depth research is taught using a particular area it can be extrapolated to other areas.
Increasing usage of data banks or canned programs. Typical examples: WordNet, GraphViz
Trends in Algorithmic Complexity
Overcoming the scare of NP problems (it happened before with undecidability)
3-SAT lessons
Mapping polynomial problems within NP
Optimization, approximate, or random algorithms (a randomized sketch follows)
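As a minimal illustration of the random-algorithms item, here is a WalkSAT-style local search for SAT (a sketch only; the three-clause instance is a toy):

```python
# Randomized local search for SAT (WalkSAT-style sketch; toy instance).
import random

def walksat(clauses, n_vars, max_flips=10000):
    assign = [random.random() < 0.5 for _ in range(n_vars + 1)]
    sat = lambda lit: assign[abs(lit)] == (lit > 0)
    for _ in range(max_flips):
        unsat = [c for c in clauses if not any(sat(l) for l in c)]
        if not unsat:
            return assign[1:]  # all clauses satisfied
        # Flip a random variable from a random unsatisfied clause.
        v = abs(random.choice(random.choice(unsat)))
        assign[v] = not assign[v]
    return None  # gave up within the flip budget

# (x1 or x2 or not x3) and (not x1 or x3 or x2) and (not x2 or not x3 or x1)
clauses = [(1, 2, -3), (-1, 3, 2), (-2, -3, 1)]
print(walksat(clauses, 3))
```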
Three Examples
Example I
The lessons of BLAST (preprocessing, incrementability, approximation )
Example II
The importance of analyzing very large networks.
(probability, sensors, sociological implications)
Example III
Time Series.
(data mining, pattern searches, classification)
Example I
(History of BLAST): sequence alignment
Biologists matched sequences of nucleotides or amino acids empirically using dot matrices
Dot matrices
No exact matching
Alignment with Gaps
Dynamic Programming Approach
Dynamic programming complexity O(n^2)
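A compact sketch of that quadratic dynamic program (Needleman-Wunsch-style global alignment; scoring values are illustrative):

```python
# Global alignment score by dynamic programming: O(len(a) * len(b)) time.
def align_score(a, b, match=1, mismatch=-1, gap=-2):
    m, n = len(a), len(b)
    # dp[i][j] = best score aligning a[:i] with b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i * gap
    for j in range(1, n + 1):
        dp[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + s,  # align two symbols
                           dp[i - 1][j] + gap,    # gap in b
                           dp[i][j - 1] + gap)    # gap in a
    return dp[m][n]

print(align_score("GATTACA", "GCATGCU"))
```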
Two solutions with gaps
Complexity can be exponential for determining all solutions
The BLAST approach complexity is almost linear
Equivalent dot matrices would have the size of 3 billion columns (human genome) and Z rows, where Z is the size of the sequence being matched against the genome (possibly tens of thousands).
BLAST Tricks
Preprocessing
Compile the locations in a genome containing all possible "seeds" (combinations of 6 nucleotides or amino acids); a sketch follows below
Hacking
Follow diagonals as much as possible (BLAST strategy)
Use dynamic programming as a last resort
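A minimal sketch of that seed-preprocessing idea: index every fixed-length word of a (toy) genome so that query seeds can be located immediately; real BLAST is of course far more elaborate:

```python
# Seed preprocessing sketch: index all length-6 words ("seeds") of a genome.
from collections import defaultdict

def build_seed_index(genome, k=6):
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)  # record every seed's location
    return index

def seed_hits(query, index, k=6):
    # (query position, genome position) pairs where a query seed occurs;
    # BLAST would then extend these hits along diagonals.
    return [(q, g) for q in range(len(query) - k + 1)
                   for g in index.get(query[q:q + k], [])]

genome = "ACGTACGTTAGCACGTACGA"  # toy genome
print(seed_hits("TACGTTAG", build_seed_index(genome)))
```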
Lots of approximations but a very successful outcome
No multiple solutions; BLAST may not find the best matches
The notion of p-values (probability of matches in random sequences) becomes very important
Tuning of the BLAST algorithm parameters
Mixture of hacking and theory
Advantage: satisfies incrementability
Example II (Networks and Sociology)
Money travels (bills)
Probabilities P(time,distance)
Money travels
The entire process could be implemented using sensors.
Mimics spread of disease.
The impact of computing will go deeper into the sciences and spread more into the social sciences (Jon Kleinberg, 2006)
Example III (Time Series)
Illustrates data mining and how much CS can help other sciences
Slides from Dr. Eamonn Keogh, University of California, Riverside, CA
Examples of time series
Time Series (cont 1)
Time Series (cont 2)
Time Series (cont 3)
Time Series (cont 4)
Time Series (cont 5)
Using Logic Programming in Multivariate Time Series (Sleep Apnea)
From G. Guimarães and L. Moniz Pereira. [Figure: multivariate time series for sleep apnea, with channels for airflow, ribcage movements, abdominal movements, and snoring, annotated with events such as "no ribcage and abdominal movements without snoring," "strong airflow with snoring," and tacet periods.]
Back to curricula recommendations
Present status (USA) and suggested changes
Current recommended curricula
ACM, SIGCSE 2001 (USA)
1. Discrete Structures (43 core hours)
2. Programming Fundamentals (54 core hours)
3. Algorithms and Complexity (31 core hours)
4. Programming Languages (6 core hours)
5. Architecture and Organization (36 core hours)
6. Operating Systems (18 core hours)
7. Net-Centric Computing (15 core hours)
8. Human-Computer Interaction (6 core hours)
9. Graphics and Visual Computing (5 core hours)
10. Intelligent Systems (10 core hours)
11. Information Management (10 core hours)
12. Software Engineering (30 core hours)
13. Social and Professional Issues (16 core hours)
14. Computational Science (no core hours)
From Domik, G.: Glimpses into the Future of Computer Science Education, University of Paderborn, Germany
Changing Curricula
Two extremes:
Increased generality and limited depth
Limited generality and increased depth
The two extremes in graphical form [Figure: breadth (generality) versus depth]
The MIT pilot program for freshmen
At MIT there is a unified EECS department. Two choices for the first-year course:
Robotics using probabilistic Bayesian approaches (CS)
Study of cell phones inside out (EE)
Concrete suggestions I
Teaching is inextricably linked to research.
Time and resources govern curriculum changes.
Gradual changes are essential.
Avoid overlap of material among different required courses.
If possible, introduce an elective course on current trends in computer science.
Deal with massive data even in intro courses.
Concrete suggestions II
When teaching algorithms stress the potential of:
Preprocessing
Incrementality (sketched below)
Parallelization
Approximations
Taking advantage of sparseness
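One of these, incrementality, in miniature: a running mean updated in constant time per new value instead of recomputing over all the data (a sketch):

```python
# Incrementality in miniature: O(1) update of a mean per new value,
# instead of redoing the full sum over all data each time.
class RunningMean:
    def __init__(self):
        self.n, self.mean = 0, 0.0

    def add(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n  # incremental update
        return self.mean

rm = RunningMean()
for x in [2.0, 4.0, 9.0]:
    print(rm.add(x))  # 2.0, 3.0, 5.0
```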
Concrete suggestions III
Emphasize probability and statistics (a minimal Bayesian example follows):
Bayesian approaches
Hidden Markov Models
Random algorithms
Clustering and classification
Machine learning and data mining
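The smallest possible Bayesian illustration, as promised above (the probabilities are made up):

```python
# Bayes' rule in one step: P(H|E) = P(E|H) * P(H) / P(E).
prior = 0.01         # P(disease)
sensitivity = 0.95   # P(positive test | disease)
false_pos = 0.05     # P(positive test | no disease)

evidence = sensitivity * prior + false_pos * (1 - prior)  # P(positive test)
posterior = sensitivity * prior / evidence                # P(disease | positive)
print(f"{posterior:.3f}")  # -> 0.161: a positive test is far from conclusive
```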
Finally, …
Encourage interdisciplinary work.
It will inspire new directions in computer science.
Thank you!!
Future of Computer Intensive Science in the U.S. (Daniel Reed 2006)
Ten years – a geological epoch on the computing time scale. Looking back, a decade brought the web and consumer email, digital cameras and music, broadband networking, multifunction cell phones, WiFi, HDTV, telematics, multiplayer games, electronic commerce, and computational science. It also brought outsourcing and globalization, information warfare and blurred work-life boundaries, spam, phishing, identity theft, and software insecurity. What will a decade of technology advances bring in communications and collaboration, sensors and knowledge management, modeling and discovery, electronic commerce and digital entertainment, critical infrastructure management and security? What will it mean for research and education?
Daniel A. Reed is the director of the Renaissance Computing Institute. He also is Chancellor's Eminent Professor and Vice-Chancellor for Information Technology at the University of North Carolina at Chapel Hill.
Cyberinfrastructure and Economic Curvature: Creating Curvature in a Flat World (Sangtae Kim, Purdue, 2006)
Cyberinfrastructure is central to scientific advancement in the modern, data-intensive research environment. For example, the recent revolution in the life sciences, including the seminal achievement of sequencing the human genome on an accelerated time frame, was made possible by parallel advances in cyberinfrastructure for research in this data-intensive field. But beyond the enablement of basic research, cyberinfrastructure is a driver for global economic growth despite the disruptive 'flattening' effect of IT in the developed economies. Even at the regional level, visionary cyber investments to create smart infrastructures will induce 'economic curvature', a gravitational pull to overcome the dispersive effects of the 'flat' world, with a consequent acceleration in economic growth.
Miscellaneous I
Claytronics
Game theory (economics, psychology)
Other examples in bioinformatics
Beautiful interaction between sequences (strings) and structures
Reverse engineering
In biology, geography and phenotype (external structural appearance) are of paramount importance
Systems biology
Miscellaneous II
Crossword puzzle solving using Google (Skiena) and statistical NLP