Information Science

Information Science 2005

Tefko Saracevic, PhD
School of Communication, Information and Library Studies
Rutgers University
New Brunswick, New Jersey, USA
http://www.scils.rutgers.edu/~tefko

© Tefko Saracevic 1

Information science: a short definition

“the collection, classification, storage, retrieval, and dissemination of recorded knowledge treated both as a pure and as an applied science”
Merriam-Webster

Organization of presentation

1. Big picture – problems, solutions, social place
2. Structure – main areas in research & practice
3. Technology – information retrieval – largest part
4. Information – representation; bibliometrics
5. People – users, use, seeking, context
6. Paradigm split – distancing of areas
7. Digital libraries – whose are they anyhow?
8. Conclusions – big questions for the future

Part 1. The big picture

Problems addressed

- Bit of history: Vannevar Bush (1945)
  - Defined problem as “... the massive task of making more accessible of a bewildering store of knowledge.”
  - Problem still with us & growing

… solution

- Bush suggested a machine: “Memex ... association of ideas ... duplicate mental processes artificially.”
- Technological fix to problem
- Still with us: technological determinant

At the base of information science: Problem

Trying to control content in:
- Information explosion
  - exponential growth of information artifacts, if not of information itself
- PLUS today: Communication explosion
  - exponential growth of means and ways by which information is communicated, transmitted, accessed, used

technological solution, BUT …

applying technology to solving problems of effective use of information
BUT: from a HUMAN & SOCIAL and not only TECHNOLOGICAL perspective

or a symbolic model

[Figure: symbolic model relating People, Information, and Technology]

Problems & solutions: SOCIAL CONTEXT

- Professional practice AND scientific inquiry related to: effective communication of knowledge records - ‘literature’ - among humans in the context of social, organizational, & individual need for and use of information
- Taking advantage of modern information technology

Elaboration

- Knowledge records =
  - content-bearing structures
    - texts, sounds, images, multimedia, web ... ‘literature’ in given domains
- Communication =
  - human-computer-literature interface
    - study of information science is the interface between people & literatures
- Information need, seeking, and use =
  - raison d'être
- Effectiveness =
  - relevance, utility

or as White & McCain put it:

“modeling the world of publications with a practical goal of being able to deliver their content to inquirers [users] on demand.”


General characteristics

- Interdisciplinarity - relations with a number of fields, some more or less predominant
- Technological imperative - driving force, as in many modern fields
- Information society - social context and role in evolution shared with many fields

Part 2. Structure

Composition of the field

- Like many fields, information science has different areas of concentration & specialization
- They change, evolve over time
  - grow closer, grow apart
  - ignore each other, less or more

most importantly different areas…

- receive more or less in funding & emphasis
  - producing great imbalances in work & progress
  - attracting different audiences & fields
- this includes
  - vastly different levels of support for research
  - and huge commercial investments & applications

How to view structure?

by decomposing areas & efforts in research & practice emphasizing

Technology or Information or People

Part 3. Technology

- Identified with information retrieval (IR)
  - by far biggest effort and investment
  - international & global
  - commercial interest large & growing

Information Retrieval – definition & objective

“IR: ... intellectual aspects of description of information, ... search, ... & systems, machines...” Calvin Mooers, 1951

- How to provide users with relevant information effectively?

For that objective:
1. How to organize information intellectually?
2. How to specify the search & interaction intellectually?
3. What techniques & systems to use effectively?

Streams in IR Res. & Dev.

1. Information science:
  - Services, users, use
  - Human-computer interaction
  - Cognitive aspects
2. Computer science:
  - Algorithms, techniques
  - Systems aspects
3. Information industry:
  - Products, services, Web
  - Market aspects

- Problem:
  - relative isolation – discussed later

Contemporary IR research

- Now mostly done within computer science
  - e.g. Special Interest Group on IR, Association for Computing Machinery (SIGIR, ACM)
- Spread globally
  - e.g. major IR research communities emerged in China, Korea, Singapore
- Branched outside of information science: “everybody does information retrieval”
  - data mining, machine learning, natural language processing, artificial intelligence, computer graphics …

Text REtrieval Conference (TREC)

- Major research, laboratory effort
- Started in 1992, now probably ending
  - “support research within the IR community by providing the infrastructure necessary for large scale evaluation”
- Methods
  - provides large test beds, queries, relevance judgments, comparative analyses
  - essentially using Cranfield 1960’s methodology
  - organized around tracks
    - various topics – changing over years
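The Cranfield-style setup named above (a test collection, queries, and relevance judgments used to compare systems) can be illustrated with a few standard measures. What follows is a minimal sketch, assuming binary relevance judgments and a single ranked result list; the function names, document identifiers, and data are hypothetical, and this is not the evaluation software TREC itself uses.

```python
from typing import Sequence, Set, Tuple


def precision_recall(ranked: Sequence[str], relevant: Set[str], k: int) -> Tuple[float, float]:
    """Precision and recall computed over the top-k documents of a ranked list."""
    top_k = ranked[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant document appears."""
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


# Hypothetical system output for one query, best match first.
run = ["d3", "d1", "d7", "d2", "d9"]
# Hypothetical relevance judgments for the same query.
qrels = {"d1", "d2", "d5"}

p_at_5, r_at_5 = precision_recall(run, qrels, k=5)
ap = average_precision(run, qrels)
print(f"P@5={p_at_5:.2f} R@5={r_at_5:.2f} AP={ap:.2f}")
```

Running two systems against the same queries and judgments and comparing such scores is the kind of comparative analysis a shared test bed makes possible.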

TREC impact

- International – big impact on creating research communities
- Annual conferences
  - reports, exchange results, foster cooperation
- Results
  - mostly in reports, available at http://trec.nist.gov/pubs.html
  - overviews provided as well
  - but only a fraction published in journals or books

TREC tracks 2004

103 groups from 21 countries

- Genomics with 4 sub tracks
- HARD (High Accuracy Retrieval from Documents)
- Novelty (new, nonredundant information)
- Question answering
- Robust (improving poorly performing topics)
- Terabyte (very large collections)
- Web track
- Previous tracks:
  - ad-hoc (1992-1999)
  - routing (92-97)
  - interactive (94-02)
  - filtering (95-02)
  - cross language (97-02)
  - speech (97-00)
  - Spanish (94-96)
  - video (00-01)
  - Chinese (96-97)
  - query (98-00)
  - and a few more run for two years only

Broadening of IR – ever changing, ever new areas added

- Cross language IR (CLIR)
- Natural language processing (NLP IR)
- Music IR (MIR)
- Image, video, multimedia retrieval
- Spoken language retrieval
- IR for bioinformatics and genomics
- Summarization; text extraction
- Question answering
- Many human-computer interactions
- XML IR
- Web IR; Web search engines
- IR in context – big area for major search engines & newer research

Commercial IR

- Search engines based on IR
- But added many elaborations & significant innovations
  - dealing with HUGE numbers of pages fast
  - countering spamming & page rank games – adversarial IR - combat of algorithms
  - adding context for searching
- Spread & impact worldwide
  - about 2000 engines in over 160 countries
  - English was dominant, but not any more

Commercial IR: brave new world

- Large investments & economic sector
  - hope for big profits, as yet questionable
- Leading to proprietary, secret IR
  - also aggressive hiring of best talent
  - new commercial research centers in different countries (e.g. MS in China)
- Academic research funding is changing
  - brain drain from academe
- Commercial search engines facing many challenges
  - view from: Amit Singhal presentation

IR successfully effected:

- Emergence & growth of the INFORMATION INDUSTRY
- Evolution of IS as a PROFESSION & SCIENCE
- Many APPLICATIONS in many fields
  - including on the Web – search engines
- Improvements in HUMAN - COMPUTER INTERACTION
- Evolution of INTERDISCIPLINARITY

IR has a long, proud history

Part 4. Information

- Several areas of investigation:
  - as basic phenomenon – not much progress
    - measures as Shannon's not successful
    - concentrated on manifestations and effects
    - no recent progress in this basic research
  - information representation
    - large area connected with IR, librarianship
    - metadata
  - bibliometrics
    - structures of literature

What is information?

- Intuitively well understood, but formally not well stated
  - Several viewpoints, models emerged
- Shannon: source-channel-destination
  - signals not content – not really applicable, despite many tries
- Cognitive: changes in cognitive structures
  - content processing & effects
- Social: context, situation
  - information seeking, tasks

Cognitive – basic idea: K(S) + ΔI = K(S + ΔS) (Brookes)

- Information when operating on a knowledge structure produces an effect whereby the knowledge structure is changed
- “Information is differences that make a difference” (Bateson)
- Actually, it states the problem –
  - “unoperational” in information systems
  - involves mental events only
  - constructivists rejected it
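Written out with its symbols glossed, the expression attributed to Brookes reads as follows. The glosses follow the usual reading of Brookes's "fundamental equation" and are offered as an interpretive aid; as the slide notes, the terms are not directly measurable in an information system.

```latex
\[
  K(S) + \Delta I = K(S + \Delta S)
\]
% K(S)            : the existing knowledge structure
% \Delta I        : an increment of information acting on that structure
% K(S + \Delta S) : the modified knowledge structure
% \Delta S        : the effect of the modification
```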

Information manifestations

(Buckland)

- Information as a process
  - what someone knows is changed when informed; “the action of informing” (similar to Brookes)
    - refers to cognitive changes + process of doing it
- Information as knowledge
  - knowledge communicated about x
    - uncertainty removal a special case
    - refers to that which is being communicated - intangible
- Information as a thing
  - data, documents with quality of imparting information - tangible
    - refers to potential information conveyed from objects

Information in information science: Three senses

(from narrowest to broadest)

1. Information in terms of decision involving little or no cognitive processing
  - signals, bits, straightforward data - e.g. inf. theory (Shannon), economics
2. Information involving cognitive processing & understanding
  - understanding, matching texts, Brookes
3. Information also as related to context, situation, problem-at-hand
  - USERS, USE, TASK

For information science (including information retrieval): third, broadest interpretation necessary

Bibliometrics

“… the quantitative treatment of the properties of recorded discourse and behavior pertaining to it.” Fairthorne

- Many quantitative studies & some laws
  - Bradford’s law, Lotka’s law – regularities
    - quantity/yield distributions of journals, authors
- also related areas:
  - Scientometrics
    - covering science in general, not just publications
  - Informetrics
    - all information objects
  - Webmetrics or cybermetrics
    - using bibliometric techniques to study the web
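To make the "quantity/yield" regularities concrete, Lotka's law in its classic form says that the number of authors with n publications falls off roughly as 1/n² relative to the number of authors with one publication. The following is a minimal sketch under that assumption; the function name and the figure of 1000 single-paper authors are hypothetical, and observed exponents differ from field to field.

```python
def lotka_expected_authors(single_paper_authors: float, n_papers: int,
                           exponent: float = 2.0) -> float:
    """Expected number of authors with exactly n_papers publications.

    Uses the inverse-power form of Lotka's law: authors(n) = authors(1) / n**exponent.
    The default exponent of 2 is the classic inverse-square case.
    """
    if n_papers < 1:
        raise ValueError("n_papers must be at least 1")
    return single_paper_authors / (n_papers ** exponent)


# Hypothetical literature in which 1000 authors published exactly one paper.
for n in range(1, 5):
    print(n, round(lotka_expected_authors(1000, n), 1))
```

Under these assumptions a small core of authors accounts for a disproportionate share of the output, which is the kind of regularity bibliometric laws describe.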

Part 5. People

- Professional services
  - in organization – moving toward knowledge management, competitive intelligence
  - in industry – vendors, aggregators, Internet
- Research
  - user & use studies
  - interaction studies
  - broadening to information seeking studies, social context, collaboration
  - relevance studies
  - social informatics

User & use studies

- Oldest area
  - covers many topics, methods, orientations
  - many studies related to IR
    - e.g. searching, multitasking, browsing, navigation
- Branching into Web use studies
  - quantitative & qualitative studies
  - emergence of webmetrics

Interaction

- Traditional IR model concentrates on matching but not on user side & interaction
- Several interaction models suggested
  - Ingwersen’s cognitive, Belkin’s episode, Saracevic’s stratified model
    - hard to get experiments & confirmation
- Considered key to providing
    - basis for better design
    - understanding of use of systems
- Web interactions: a major new area
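The "matching" at the heart of the traditional IR model can be sketched in a few lines. The example below assumes one common technique, cosine similarity over raw term counts, which the slides do not specify; the function names and the tiny document set are hypothetical. Note that the functions see only query text and document text, with nothing about the user, task, or context, which is the limitation the interaction models address.

```python
import math
from collections import Counter
from typing import Dict, List


def cosine(query: str, document: str) -> float:
    """Cosine similarity between the raw term-count vectors of two texts."""
    q = Counter(query.lower().split())
    d = Counter(document.lower().split())
    dot = sum(q[term] * d[term] for term in q)  # missing terms count as 0
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0


def rank(query: str, docs: Dict[str, str]) -> List[str]:
    """Document ids ordered from best to worst match for the query."""
    return sorted(docs, key=lambda doc_id: cosine(query, docs[doc_id]), reverse=True)


# Hypothetical three-document collection.
docs = {
    "a": "information retrieval systems and evaluation",
    "b": "digital libraries in practice",
    "c": "user studies of information seeking",
}
ranking = rank("information retrieval", docs)
print(ranking)
```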

Information seeking

- Concentrates on broader context: not only IR or interaction, but people as they move in life & work
- Number of models provided
  - e.g. Kuhlthau’s stages, Järvelin’s task based
- Includes studies of ‘life in the round,’ making sense, information encountering, work life, information discovery
- Based on concept of social construction of information

Part 6. Paradigm split in technology - people

- Split from early 80’s to date into:
  - System-centered
    - algorithms, TREC, search engines
    - continue traditional IR model
  - Human-(user)-centered
    - cognitive, situational, user studies
    - interaction models, some started in TREC

Human vs. system

- Human (user) side:
  - often highly critical, even one-sided
  - mantra of implications for design
  - but does not deliver concretely
- System side:
  - mostly ignores user side & studies
  - ‘tell us what to do & we will’
- Issue NOT H or S approach
  - even less H vs. S
  - but how can H AND S work together
  - major challenge for the future

Calls vs support

- Many calls for user-centered or human-centered design, approaches & evaluation
- Number of works discussing it, but few proposing concrete solutions
- But: most support for system work
  - in the digital age support is for digital
- Recent attempt at combining two views:
  - Book: Ingwersen, P. and Järvelin, K. (2005). The turn: Integration of information seeking and retrieval in context. Springer.

Part 7. Digital libraries

- LARGE & growing area
- “Hot” area in R&D
  - a number of large grants & projects in the US, European Union, & other countries
  - but “DIGITAL” big & “libraries” small
- “Hot” area in practice
  - building digital collections, hybrid libraries
  - many projects throughout the world

Technical problems

- Substantial - larger & more complex than anticipated:
  - representing, storing & retrieving of library objects
    - particularly if originally designed to be printed & then digitized
  - operationally managing large collections - issues of scale
  - dealing with diverse & distributed collections
    - interoperability
  - assuring preservation & persistence
  - incorporating rights management

US: Digital Library Initiatives

- Consortia under National Science Foundation funding research
  - DLI 1: 1994-98, 3 agencies, $24M, 6 large projects
  - DLI 2: 1999-2006, 8 agencies, $60+M, 77 large & small projects in various categories
  - joint international projects
  - National Science, Mathematics, Engineering, and Technology Education Digital Library
    - some 200 demonstration & development projects
- Funding pretty much over by 2005
  - funding now in related areas

EU: DELOS

- DELOS Network of Excellence on Digital Libraries
  - many projects throughout European Union
    - heavily technological
  - many meetings, workshops
  - to some degree resembles DLIs in the US
  - well funded, long range
  - unlike in the US, support still going on

Research issues

- understanding objects in DL
  - representing in many formats
- metadata, cataloging, indexing
- conversion, digitization
- organizing large collections
- managing collections, scaling
- preservation, archiving
- interoperability, standardization
- accessing, using, searching
  - federated searching of distributed collections
- evaluation of digital libraries

DL projects in practice

- Heavily oriented toward institutions & their missions
  - in libraries, but also others
    - museums, societies, government, commercial
    - come in many varieties
- Spread globally
  - including digitization
- U California, Berkeley’s Libweb “lists over 7300 pages from libraries in over 125 countries”
- Spending increasing significantly
  - often a trade-off for other resources

Agendas

- Most DL research agenda is set from top down
  - from funding agencies to projects
  - imprint of the computer science community's interest & vision
- Most DL practice agendas are set from bottom up
  - from institutions, incl. many libraries
  - imprint of institutional missions, interests & vision
    - providing access to specialized materials and collections from an institution(s) that are otherwise not accessible
    - covering in an integral way a domain with a range of sources

Connection?

- DL research & DL practice presently are conducted
  - mostly independently of each other
  - minimally informing each other
  - and having slight, or no connection
- Parallel universes with little connection & interaction, at present
  - not good for either research or practice

Part 8. Conclusions

IS contributions

- IS effected handling of information in society
- Developed an organized body of knowledge & professional competencies
- Applied interdisciplinarity
- IR reached a mature stage
  - penetrated many fields & human activities
- Stressed HUMAN in human-computer interaction

Challenges

- Adjust to the growing & changing social & organizational role of inf. & related inf. infrastructure
- Play a positive role in globalization of information
- Respond to technological imperative in human terms
- Respond to changes from inf. to communication explosion - bringing own experiences to resolutions, particularly to the web
- Join competition with quality
- Join DIGITAL with LIBRARIES

Juncture

- IS is at a critical juncture in its evolution
- Many fields, groups ... moving into information
  - big competition
  - entrance of powerful players
  - fight for stakes
- To be a major player IS needs to progress in its:
  - research & development
  - professional competencies
  - educational efforts
  - interdisciplinary relations
- Reexamination necessary

Thank you Miró!

Bibliography

Bates, M. J. (1999). Invisible Substrate of Information Science. Journal of the American Society for Information Science, 50, 1043-1050.

Bush, V. (1945). As We May Think. Atlantic Monthly, 176, (11), 101-108. Available: http://www.theatlantic.com/unbound/flashbks/computer/bushf.htm

Hjørland, B. (2000). Library and Information Science: Practice, Theory, and Philosophical Basis. Information Processing & Management, 36 (3), 501-531.

Pettigrew, K.E. & McKechnie, L.E.F. (2001). The use of theory in information science research. Journal of the American Society for Information Science and Technology, 52 (1), 62-73.

Saracevic, T. (1999). Information Science. Journal of the American Society for Information Science, 50 (9), 1051-1063. Available: http://www.scils.rutgers.edu/~tefko/JASIS1999.pdf

Saracevic, T. (2005). How were digital libraries evaluated? Presentation at the course and conference Libraries in the Digital Age (LIDA), 30 May-3 June 2005, Dubrovnik, Croatia. Available: http://www.scils.rutgers.edu/~tefko/DL_evaluation_LIDA.pdf

Webber, S. (2003). Information Science in 2003: A Critique. Journal of Information Science, 29 (4), 311-330.

White, H. and McCain, K. (1998). Visualizing a Discipline: An Author Co-citation Analysis of Information Science 1972-1995. Journal of the American Society for Information Science, 49 (4), 327-355.