Transcript Document

Information Seeking Behavior of Scientists

Brad Hemminger [email protected]

School of Information and Library Science University of North Carolina at Chapel Hill

Contributors


Assisting Researchers

– Jackson Fox (web survey)
– Steph Adams (participant recruiter)
– Dihui Lu (initial descriptive statistical analysis)
– Billy Saelim (continued statistical analysis)
– Chris Weisen (Odum Institute, statistical consultant)

Feedback on Survey Design

– UNC Libraries: Bill Burke (Botany), David Romito (Zoology), Jimmy Dickerson (Chemistry), Zari Kamarei (Math/Physics)
– KT Vaughan (Health Sciences Library)
– Cecy Brown (University of Oklahoma)

Supported by

– UNC Libraries
– Carolina Center for Genome Sciences
– Basic Science Department chairs
– RENCI P20 grant


Why Study Information Seeking Behavior of Scientists

The goal is to improve scholarly communication. Other areas of my research involve presentation aspects (visualization/computer-human interaction) and the storage and communication of scholarly information (digital libraries, institutional repositories, virtual communities of practice).

To do this, we need to understand how people currently seek out and use information, and why. While investigating this, we found that there has been a significant change over the last 5-10 years.

So we are studying information seeking behavior (ISB) both to understand it and to look at recent changes.


How to Study the Information Seeking Behavior of Scientists?

Survey

– Reach many people
– Address common questions
– Produce lots of feedback for libraries
– Quantitative, models of variance (“positivist” approach)

Interviews

– In-depth coverage of selected groups (bioinformatics)
– Use grounded theory and critical incident techniques to capture more qualitative, contextual experiences
– Develop models of information processing and use

Survey – Long-Term Plan


Conduct an initial survey study at UNC. Develop a survey instrument and interview methodologies that work here but could easily be applied on a larger scale. From the results of the initial UNC study, draft a national version (with feedback from national sites).

Run the national study. Set it up so that other sites only have to recruit subjects; the entire survey runs off the UNC website. Hopefully this results in a large number of sites and participants for minimal experimental cost.

Survey Sampling Technique


Census

– Need to be able to reach all members
– Best if you can get responses from a large segment of the population
– Potentially more input from wider audiences, especially for the open comment questions
– Subject to bias (e.g., only computer users take it)

Random sample

– Statistically, generally a better choice
– Higher cost and significantly more work, due to identifying and following up with individual subjects
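A simple random sample starts from a complete sampling frame, and each member is drawn with equal probability. The Python sketch below shows only the drawing step and is not a procedure described in this study. It assumes a hypothetical frame of directory IDs. The costly part, contacting and following up with every drawn subject, is not modeled.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct members from a sampling frame, each equally likely."""
    if n > len(frame):
        raise ValueError("sample size exceeds the sampling frame")
    rng = random.Random(seed)  # seeded generator gives a reproducible draw
    return rng.sample(frame, n)

# Hypothetical sampling frame: IDs standing in for a campus directory of
# faculty, graduate students and research staff.
frame = [f"person-{i:04d}" for i in range(3000)]
sample = simple_random_sample(frame, 300, seed=42)
print(len(sample), len(set(sample)))  # 300 300
```

Seeding the generator lets the same list of subjects be regenerated later, for example when scheduling follow-ups.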

Questions

Questions were based on:

– Prior studies with which we wished to correlate our results. This is facilitated by authors who have published their surveys (as appendices to papers, e.g., Cecy Brown), and especially by those who have put their collections of surveys online (e.g., Carol Tenopir). This allows us to compare results over time, as well as to clarify current practices (for instance, whether print or electronic formats are used, broken out into two questions: retrieval versus reading).
– Issues that our librarians were concerned about.

The questions were developed over several drafts, which were reviewed by representatives from all libraries on campus.

Survey Instrument Choices


– Paper
– Phone
– Email
– Web-based

Web-based surveys can require more effort than anticipated, but if the number of survey respondents is over several hundred they are generally more cost-effective*. This seemed the best choice, since our pilot survey was of several thousand subjects and our national survey was planned for tens of thousands. Since we have web and database expertise, we were able to automate the process with minimal startup costs.

*[Schonlau 2001, “Conducting Research Surveys via E-mail and the Web”].
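The cost argument is a trade-off between fixed and marginal costs. Web surveys cost more to set up but almost nothing per additional respondent, so beyond some break-even count they become cheaper. A small Python sketch of that arithmetic follows. All dollar figures are hypothetical placeholders, not numbers from this study or from Schonlau 2001.

```python
def break_even_respondents(fixed_a, per_a, fixed_b, per_b):
    """Respondent count at which mode B's total cost equals mode A's.

    Each mode's total cost is modeled as fixed + per_respondent * n, so the
    two cost lines cross at n = (fixed_b - fixed_a) / (per_a - per_b).
    """
    if per_a <= per_b:
        raise ValueError("mode B never becomes cheaper per respondent")
    return (fixed_b - fixed_a) / (per_a - per_b)

# Hypothetical costs: paper has low setup but pays printing, postage and
# data entry per respondent; web has higher setup (programming, hosting)
# and a near-zero marginal cost.
paper_fixed, paper_each = 500.0, 10.0
web_fixed, web_each = 5000.0, 1.0
n_star = break_even_respondents(paper_fixed, paper_each, web_fixed, web_each)
print(n_star)  # 500.0
```

With these made-up figures, both modes cost the same at 500 respondents, and each additional respondent favors the web survey. Real setup and per-respondent costs would move the crossover point.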

Data Acquisition Details


PHP Surveyor was used for the web-based survey. Another common choice at our school for simpler surveys is SurveyMonkey. PHP Surveyor allowed us to ask multi-part questions and to constrain answers to specific response formats. PHP Surveyor writes its data directly into a MySQL database. The data are cleaned up and then fed into SAS for analysis. (Data cleaning is still a significant manual effort! Examples included determining Dept/CB, and browsers that did not properly validate datatypes on forms.)
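Some browsers did not enforce the form's datatype constraints, so free text can end up in numeric fields. One way to handle this during cleaning is to re-validate each field after export and set aside anything that fails for manual review. The Python sketch below assumes rows have already been exported from the MySQL database as dictionaries. The field names (`dept`, `library_visits`) are hypothetical and are not PHP Surveyor's actual column names.

```python
import re

INTEGER = re.compile(r"[0-9]+")  # ASCII digits only: a non-negative whole number

def clean_integer_field(raw):
    """Return an int for a well-formed whole-number answer, else None (needs review)."""
    if raw is None:
        return None
    text = str(raw).strip()
    if INTEGER.fullmatch(text):
        return int(text)
    return None

def clean_rows(rows, integer_fields):
    """Return cleaned rows plus a list of (row index, field, raw value) problems."""
    cleaned, problems = [], []
    for i, row in enumerate(rows):
        out = dict(row)  # leave the original exported row untouched
        for field in integer_fields:
            value = clean_integer_field(row.get(field))
            if value is None:
                problems.append((i, field, row.get(field)))
            out[field] = value
        cleaned.append(out)
    return cleaned, problems

rows = [
    {"dept": "Chemistry", "library_visits": "12"},
    {"dept": "Botany", "library_visits": "about 10"},  # browser let free text through
    {"dept": "Zoology", "library_visits": " 3 "},
]
cleaned, problems = clean_rows(rows, ["library_visits"])
print(problems)  # [(1, 'library_visits', 'about 10')]
```

The cleaner does not guess at a value for flagged answers. It marks them so that a person can resolve them, which matches the manual effort described above.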

Subjects and Recruitment


Subjects are university faculty, grad students and research staff.

We approached all science department chairs to get support first.

Contact

– Initial contact was by email, giving the motivation for the study, an indication of support from departments and campus, and a link to the web-based survey.
– Follow-ups by letter, then two emails
– Flyers in departments; pizza party rewards

Look at Survey

902 participants from recruited departments, which were classified as either science or medicine.

Participation rate was 26%.

Participants by Department Survey

Analysis


For the quantitative response variables, standard descriptive statistics (mean, min, max, standard deviation) are computed, and histograms are used to visualize the distributions. Categorical variables are reported as counts and percentages for each category and displayed as frequency tables.
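The study computed these statistics in SAS. As a rough, stand-alone illustration, the Python sketch below computes the same summaries with the standard library. The frequency table uses the counts from the Distance to Library table reported in this survey. The ages passed to `describe` are hypothetical.

```python
import statistics
from collections import Counter

def describe(values):
    """Mean, min, max and sample standard deviation of a quantitative variable."""
    return {
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
        "sd": statistics.stdev(values),  # sample (n - 1) standard deviation
    }

def frequency_table(counts):
    """Count and percentage (rounded to 2 places) for each category."""
    total = sum(counts.values())
    return {cat: (n, round(100 * n / total, 2)) for cat, n in counts.items()}

# Category counts from the "Distance to Library" table in this survey.
distance = Counter({"Same building": 175, "1/4 mile": 570,
                    "1/2 mile": 88, "1 mile or more": 69})
for category, (n, pct) in frequency_table(distance).items():
    print(f"{category:<16}{n:>5}{pct:>8.2f}")

# Hypothetical ages, only to exercise describe().
print(describe([24, 31, 38, 45, 52]))
```

The printed percentages match those in the survey's Distance to Library table (19.40, 63.19, 9.76, 7.65).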

Analysis: Correlations


Categorical vs Categorical

– Chi-square

Categorical vs Quantitative

– Analysis of Variance

Quantitative vs Quantitative

– Correlation

Examples include analyses of other features by department, and of age versus preferred interface (Google or library).
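Each pairing above calls for a different test. Below is a stand-alone Python sketch of two of them; it is not the study's SAS code. The first is a Pearson chi-square test of independence, applied to the gender-by-area counts from the Gender table in this survey. The second is a one-way ANOVA F statistic, applied to hypothetical ages grouped by preferred interface. The closed-form p-value used here is valid only for a 2 × 2 table (1 degree of freedom). Larger tables, and the p-value for F, need the corresponding distribution functions from a statistics package.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def p_value_df1(stat):
    """Upper-tail p-value for 1 degree of freedom only (chi2 = Z**2)."""
    return math.erfc(math.sqrt(stat / 2))

def one_way_anova_f(groups):
    """F statistic for a one-way analysis of variance across groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Categorical vs categorical: gender by area, from the Gender table.
#          Female  Male
gender = [[179, 286],   # Science
          [280, 157]]   # Medicine
stat, df = chi_square(gender)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p_value_df1(stat):.1e}")

# Categorical vs quantitative: hypothetical ages grouped by preferred interface.
google_ages = [25, 30, 28, 35]
library_ages = [45, 50, 38, 42]
print(f"F = {one_way_anova_f([google_ages, library_ages]):.2f}")
```

For the quantitative-versus-quantitative case, Python 3.10 and later provide `statistics.correlation` for Pearson's r.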

Participants

Position                  Science  Science (%)  Medicine  Medicine (%)  Total  Total (%)
professor                      58        12.47        39          8.92     97      10.75
associate professor            23         4.95        41          9.38     64       7.10
assistant professor            40         8.60        46         10.53     86       9.53
research staff/adjunct         15         3.23        17          3.89     32       3.55
post graduate/fellow           46         9.89        37          8.47     83       9.20
others                         19         4.09        48         10.98     67       7.43
doctoral student              246        52.90       179         40.96    425      47.12
masters student                18         3.87        30          6.86     48       5.32

Gender

        Science  Science %  Medicine  Medicine %  Total  Total %
Female      179      38.49       280       64.07    459    50.89
Male        286      61.51       157       35.93    443    49.11

Distance to Library

Distance to Library  Count  Percentage
Same building          175       19.40
1/4 mile               570       63.19
1/2 mile                88        9.76
1 mile or more          69        7.65

Simple Questions


Ninety-one percent of the participants had access to the internet in their office or lab. Asked “Do you maintain a personal article collection?”, nearly all participants (85.4%) responded that they did, while only 14.6% did not. Asked “Do you maintain a personal bibliographic database for print and/or electronic references?”, 52.2% of the participants said they did, while 47.8% did not.

How often do you use…

Source                  Daily  Weekly  Monthly  Quarterly  Annually  Never  Daily or Weekly %
book                       60     157      241        223       148     73                24%
journal                   509     277       72         22         6     16                87%
preprint                   57     105      155        109        72    404                18%
conference proceeding       4      14       37        193       492    162                 2%
(unlabeled)                14      37       79        168       273    331                 5%
webpage                   362     277      132         67        19     45                70%
online database           293     311      119         49        32     98                67%
personal communication    241     228      132        114        64    123                52%
other                       5       7        3          0         2    885                 1%

Most Important Individual Sources

Basic Science Journals                    Count
Science                                      99
Nature                                       90
Cell                                         36
Journal of the American Chemical Society     34
Journal of Cell Biology                      20
Journal of Biological Chemistry              19
Analytical Chemistry                         18
PNAS                                         13
Journal of Neuroscience                      12
Evolution                                    11
Neuron                                       11
Development                                  10
Journal of Organic Chemistry                 10

Medicine Journals                         Count
Science                                      45
Nature                                       39
JAMA                                         38
UpToDate                                     30
New England Journal of Medicine              28
PNAS                                         18
Journal of Immunology                        17
American Journal of Epidemiology             16
Cell                                         15
Lexi-Comp                                    14
Journal of Biological Chemistry              13
Epidemiology                                 12
AIDS                                         12

Important Alerts

Basic Science Alerts   Count
PubMed                    40
Faculty of 1000           27
ISI                       14
ACS Journal Alert         11
Nature                    10
ScienceDirect              9
Science                    7
PubCrawler                 4
Biomail                    3
COS                        3
J Biol Chem                3
ACM                        2
ArXiv                      2
BMC alerts                 2
Cancer Research            2

Medical Alerts         Count
PubMed                    53
Medscape                  11
Nature                    10
Faculty of 1000            9
PubCrawler                 9
ISI                        7
ePocrates                  6
ASHP                       5
NEJM                       5
MDLinx                     4
Science                    4
ScienceDirect              4
ADA Daily Knowledge        3
JAMA                       3
Kaiser listserv            3

Tools for Searching Information

Search tool type            Frequency  Percentage
Citation index database          1084      47.25%
General web search engine         694      30.25%
Fulltext digital library          156       6.80%
Personal search tool              125       5.45%
Knowledgebase web portal           93       4.05%
Others                             69       3.01%
Online or local database           52       2.27%
Library collection                 21       0.92%

Types of Information Sources

Science  Medicine  Total  Source
  20.17     19.89  20.03  (electronic) library subscribed journal
   7.86      9.29   8.57  (electronic) open (free) access journal or institutional repository or digital library
   4.48      3.61   4.05  (print) library subscribed journal
   4.36      3.31   3.89  (electronic) web site (author's website)
   3.44      4.01   3.73  (print) personally subscribed journal
   1.07      5.00   3.00  (print) copy of colleague's print copy
   3.10      2.65   2.88  (electronic) personal subscribed journal
   2.89      1.97   2.43  (electronic) personal digital library
   2.72      1.14   1.97  (electronic) lab subscribed journal
   1.60      1.98   1.79  (electronic) copy of colleague's electronic copy
   2.05      0.79   1.43  (print) lab subscribed journal
   0.59      0.55   0.57  (print) interlibrary loan
   0.13      0.19   0.16  (print) document delivery service
   0.02      0.13   0.07  other

Articles in Personal Collection

Number of Articles  Print  Print %  Electronic  Electronic %
none                   45                  104
1-49                  154   21.24%         259        38.89%
50-99                 160   22.07%         127        19.07%
100-499               280   38.62%         210        31.53%
500-999                81   11.17%          44         6.61%
1000+                  50    6.90%          26         3.90%

Articles in Personal Article Collection that have annotations

Percentage of entries with notes  Total count  Total percentage
<10%                                      327             36.25
11-20%                                     75              8.31
21-30%                                     82              9.09
31-40%                                     30              3.33
41-50%                                    126             13.97
51-60%                                     19              2.11
61-70%                                     26              2.88
71-80%                                    100             11.09
81-90%                                     47              5.21
>90%                                       70              7.76

Preferred Search Method

Method                                         Science  Science %  Medicine  Medicine %  Total  Total %
Electronic versions of databases and journals      443      95.27       429       98.17    872    96.67
Print versions of databases and journals            22       4.73         8        1.83     30     3.33

Preferred Viewing Method

                            Science  Science (%)  Medicine  Medicine (%)  Total  Total (%)
Both/it depends                 292        62.80       260         59.50    552      61.20
electronic (computer) only       63        13.55        52         11.90    115      12.75
print (hard copy) only          110        23.66       125         28.60    235      26.05

Number of Visits to the Library in the past 12 Months

Visits   Science  Science %  Medicine  Medicine %  Total  Total %
0-2          101     21.72%       107      24.49%    208   23.06%
3-5           75     16.13%        99      22.65%    174   19.29%
6-10          77     16.56%        71      16.25%    148   16.41%
11-20         84     18.06%        55      12.59%    139   15.41%
21-50         85     18.28%        67      15.33%    152   16.85%
51-100        34      7.31%        19       4.35%     53    5.88%
101-200        7      1.51%        13       2.97%     20    2.22%
>200           2      0.43%         6       1.37%      8    0.89%

Reasons for Visiting the Library

Reason                                     Science  Science %  Medicine  Medicine %  Total  Total %
photocopy                                      256     22.54%       274      22.81%    530   22.68%
get assistance from a librarian                 65      5.72%        96       7.99%    161    6.89%
use computers                                   59      5.19%       112       9.33%    171    7.32%
perform searches                                81      7.13%       117       9.74%    198    8.47%
read current journals or other materials       161     14.17%       156      12.99%    317   13.56%
quiet reading space                            156     13.73%       179      14.90%    335   14.33%
meeting                                         45      3.96%        73       6.08%    118    5.05%
browse                                          99      8.71%        60       5.00%    159    6.80%
pick up/drop off materials                     214     18.84%       134      11.16%    348   14.89%

Factors Affecting Choice of Journal to Publish In

Factor                                                 Science  Medicine  Total
ability to include links, color, graphics, multimedia     1.38      1.24   2.31
audience                                                  3.52      3.38   4.45
author having to pay cost of publication                  1.51      1.54   2.53
availability on campus                                    1.79      1.88   2.83
editorial board                                           2.11      1.95   3.03
page charges for long articles or color figures           1.40      1.45   2.42
speed of publication                                      2.42      2.27   3.35
standing of journal in your field                         3.77      3.61   4.70
support of open access to journal articles                2.09      2.17   3.13

Google vs Library Search Page

We asked “Which interface would you rather use to begin your search process?”, with the possible responses “Google search page” and “Your library’s home page”. Overall, a slight majority of users preferred Google (53.3%) over the library page (46.7%). The preference for Google was substantially stronger among basic science researchers (Google 58.5% versus library 41.5%) than among medical researchers (Google 52.2% versus library 47.8%).


This difference might have been even larger if the question had asked which style or type of interface users preferred. Many of the survey comments indicated a strong preference for a single “meta” search tool, where the user enters a single search string and all content in all resource collections is searched (as opposed to manually identifying resource collections and searching each one individually).

Summary

We never leave our chairs…


Nearly all information seeking and use interactions occur on the researcher’s computer in their office.

As a result, library visits have declined dramatically, and the reasons for visiting the library have changed. Researchers read in both electronic and print form (61% say it depends), but print-only readers (26%) still outnumber electronic-only readers (13%).

Single Text Box + MetaSearch


Researchers prefer a single text box for initial searching that covers all resources.

This is most evident in the preference for Google over library web page interfaces.
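A single-text-box metasearch of the kind respondents asked for amounts to sending one query to every collection and merging what comes back. The Python sketch below shows that pattern with threads. The three search functions and their identifiers are hypothetical in-memory stand-ins, not real library or vendor APIs. Result ranking is deliberately left out.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for separate resource collections; each takes a query
# string and returns a list of (identifier, title) hits.
def search_catalog(query):
    return [("doi:10.1000/a", f"Catalog hit for {query}")]

def search_citation_index(query):
    return [("doi:10.1000/a", f"Index hit for {query}"),
            ("doi:10.1000/b", f"Second index hit for {query}")]

def search_repository(query):
    return [("hdl:1234/c", f"Repository hit for {query}")]

def metasearch(query, sources):
    """Send one query to every source in parallel, then merge and de-duplicate."""
    if not sources:
        return []
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        # map() returns results in the same order as `sources`
        result_lists = list(pool.map(lambda search: search(query), sources))
    merged, seen = [], set()
    for results in result_lists:
        for identifier, title in results:
            if identifier not in seen:  # keep the first copy of each item
                seen.add(identifier)
                merged.append((identifier, title))
    return merged

hits = metasearch("gene expression",
                  [search_catalog, search_citation_index, search_repository])
for identifier, title in hits:
    print(identifier, title)
```

Running the collections in parallel keeps the wait close to that of the slowest source rather than the sum of all of them. De-duplicating on a shared identifier is what keeps one article from appearing once per collection.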

More than just text

• Researchers are making increasing use of content contained in online databases like GenBank, or web pages of research labs.

• For the scientists in our survey, this type of access has surpassed personal communications and is close to journal articles in frequency of use.

Transformative Changes

Transformative changes in collaborative group communication have already taken place in the consumer marketplace and are finding their way into scholarly communication.

Examples include folksonomies supporting community tagging (Del.icio.us), comment and review systems like Amazon’s rankings, Flickr, etc. Similar changes are in their initial stages in scholarly communities, for instance Faculty of 1000 and Connotea, an application for online sharing of bibliographic databases and annotations by scientists.

What might the future hold?

In the future, researchers may maintain all their scholarly knowledge online and make it accessible to others as they see fit. Having scholars’ descriptions and annotations of digital scholarly materials, as well as the materials themselves, available on the web will allow online communities and community review systems to blossom, just as the availability of online journal articles has transformed the basic information seeking of science scholars today.

Future Work

Upcoming papers from UNC survey

– Correlations: predicting information seeking behavior from demographics
– Comparisons by department/research area
– Review and reflection on major changes (with Cecy Brown, Don King, Carol Tenopir)
– Textual analysis of library comments (Meredith Pulley, KT Vaughan)
  • ICIS tool for visualizing comments within schema
– New work being proposed by other researchers using this data (if you think the data from this study might help your research, come talk to me)

National Study (Florida, Oklahoma, others to start soon)

Interview Studies (labs, individuals)

[email protected]