The Web of Science and Science on the Web: Recent Developments in Science Database Searching
Gary Wiggins
Indiana University School of Informatics
IU-PU, Ft. Wayne
April 1, 2005

ISI’s Web of Knowledge
• Integrates:
  – Web of Science
  – Current Contents Connect
  – Other sources
• Contains journal, patent, proceedings, and other literature, plus Web content

Web of Science

Web of Science: What is it?
• 3 citation indexes in 1:
  – Science Citation Index Expanded, 1900–
  – Social Sciences Citation Index, 1956–
  – Arts & Humanities Citation Index, 1975–
• SCI time coverage for source journals:
  – Previously, 1945–
  – Starting in 2005, 1900–
• Added 262 journals from the first half of the 20th century
• Coverage at IUB for all three: 1977–

What is Citation Indexing?
• Starts from a known relevant document, regardless of when it was published, to find newer journal articles that have cited that document
• Assumption: authors who cite the document must be writing on a related topic
  – Citation indexing lets you find newer articles from an older reference
  – Also found in other tools, e.g., SciFinder Scholar and SCOPUS, but their citation indexing doesn’t go as far back as SCI’s
• Gets around the problems of doing a subject search when you aren’t sure which words to use

Source Journal Coverage
• SCIE: 5,700 titles
• SSCI: 1,735 titles*
• A&HCI: 1,145 titles*
  *also includes selected articles from SCIE
• Weekly updates
  – Lag time: 2–3 weeks
• Journal list: http://www.isinet.com/journals/

17 Document Types in All Files
• Article
• Biographical Item
• Correction, Addition
• Editorial Material
• Letter
• News Item
• Reviews:
  – Book Review
  – Database Review
  – Hardware Review
  – Review
  – Software Review
• Also 13 special types for A&HCI only, e.g., art exhibit, dance performance, film, music performance or score

Cited Reference Searching
• Every reference in the bibliography of each item published in the source journals becomes a potential search key.
• Even second-listed and later authors can be searched, IF you are content to consider only ISI source journals.
• Cautions:
  – References in bibliographies may be incorrect or incomplete.
  – Not every journal in the world is an SCI source journal.

Web of Science Search Screen

Search Example: Cited Reference Searching
• Use the Full General Search and Cited Reference Search option
• Find publications that have cited the works of Donald E. Linn.
  – Dots before his name indicate that he is not the first-listed author on the publication.
  – Links are to ISI source journals.
  – Unlinked items may be incorrect forms of the reference.

SCI Cited Ref Search for DE Linn

Lookup Results for DE Linn Search

DE Linn’s 2003 JACS Article

Newer Articles Citing the 2003 JACS Article by DE Linn et al.

Analysis of All Authors Citing DE Linn

WOS Boolean and Proximity Operators; Truncation
• AND, OR, NOT
• Phrases imply adjacency when no operator is used.
• SAME or SENT
  – Terms in the same “sentence”, in any order
• Precedence: ( ) > SAME or SENT > NOT > AND > OR
• Truncation symbol (wild card): *

Search Example: Use of Truncation and Proximity
• Full General Search for a topic:
  – Mouse pheromones
• Strategy: (mouse or mice or mus musculus) same (pheromone* or sex attractant*)

Option: Place Searches (Full General Search)
• Institution
  – College or university
  – Research institution
  – Company
• Geographic data
  – Country
  – City
  – Postal code

Search Example: Use of Full Search with Address Data
• Find all 1998 WOS source publications by IUB Chemistry Department personnel and arrange them by source title (journal).
• Use the Full General Search option.
• Enter address data with the “same” proximity operator: Indiana same Chem same 47405
• Hint: Check the abbreviations list.

Subject (Topic) Searches: What’s being searched?
• Title words in all years
• Authors’ keywords, Keywords Plus, and words from the author abstracts, from 1991–
• Keywords Plus: words or phrases that frequently appear in the titles of an article’s references but do not appear in the title of the article itself

Related Records
• Articles that cite one or more of the same papers cited by the fully displayed record, sorted by relevance (i.e., by the number of shared references)
• Taken from all databases and all years, not just those selected

Search Example: Need to Use Synonyms in Topic Searches
• No controlled subject vocabulary in WOS
• XPS (ESCA)
  – Since both terms refer to the same technique, use both in an OR search

Search Hints: Spaces and apostrophes included, 1998–
• For an article authored by C. D. O'Brian:
  O'BRIAN C* OR OBRIAN C*
• For an article on Paget's disease:
  PAGET'S DISEASE OR PAGETS DISEASE
• For an article authored by W. de la Rosa:
  DE LA ROSA W* OR DELAROSA W*

Printing, Downloading Results
• Mark records that you want to print or download (or MARK ALL), then SUBMIT them to the MARKED LIST
• Marked records are automatically submitted when you change pages

Customizing the Answer Set
• Basic output: authors, titles, sources
• Optional:
  – cited references, addresses, abstract, language, publisher information, ISSN, document type, keywords, times cited
• Sort by:
  – latest date, first author, source title, times cited

Printing or Saving/Exporting the Results
• The “Format for Print” button must be used if selected records were customized
• Could print from the Web page
• Saving/exporting to a file produces a file named CIW.cgi
  – Can be read or printed with WordPad
  – Includes 2-character field codes
  – Can be imported into database software

Bibliography Manager Software Options
• ProCite
• Reference Manager
• EndNote
• Reference Web Poster
• Bookwhere 2000
• Import filters: http://www.risinc.com/support/rmfilters.asp

Influences on the Current Database Environment
• Increase in interdisciplinary scientific research
• Consolidation of the Scientific-Technical-Medical (STM) publishing world
• Appearance of databases covering different formats: encyclopedias, treatises, review serials
• The Web
  – Move to open access journals and “free” DBs
  – Different cultures in the chemistry publishing environment compared to that in biology

Growth of Articles in CA
Year    Articles Abstracted
1907          7,994
1945         22,824
1960        104,484
1970        230,902
1980        407,342
1990        394,945
2000        573,469
Source: http://www.cas.org/EO/casstats.pdf

Vendors and Publishers
• Partnership between commercial vendors and abstracting/indexing services (and, to some extent, journal publishers)
  – Most activity in online searching started in the 1970s
  – Comparatively little change in the vendors’ search systems until relatively recently
• Aggregation of databases
• Cross-file searching
• Command-driven access

STN International
• Partnership among Chemical Abstracts Service, FIZ Karlsruhe, and the Japan Science and Technology Corporation
• Has over 200 STM databases
  – STN Database Summary Sheets: http://info.cas.org/ONLINE/DBSS/dbsslist.html
  – Includes some databases also available free through other venues (e.g., Medline, GenBank)

Features in Commercial Systems
• Special Boolean operators (proximity, adjacency, etc.)
• Truncation (wild cards; left-hand or right-hand truncation)
• Controlled vocabulary tools (MeSH, CAS’s Index Guide, CA Lexicon)
• Classification of the documents
  – PACS (Physics and Astronomy Classification Scheme)
  – CA Sections/Subsections
• Structure searching (usually ranging from exact-match to full substructure search)
• Searchable numeric and other data
• Data analysis tools
• Current awareness options

Command Language Systems
• Allow field-directed searches
• Incorporate sophisticated Boolean relationships
  – AND, OR, NOT
  – Adjacency, proximity, and logical linking to the same field or sub-field of a record
• The number of intervening words can be specified
• Drawback: the user must learn the commands

User-Oriented Software
• Front-end systems that mask the command language
  – CAS’s SciFinder (& SciFinder Scholar)
  – STN on the Web, STN Easy, STN Express
  – CrossFire Commander and MDL DiscoveryGate
  – Dialog’s DialogLink
  – Questel-ORBIT’s QWeb and Imagination

Main Chemical Databases
• Chemical Abstracts
• Beilstein/Gmelin
• Cambridge Structural Database
• Protein Data Bank
• Many other relevant databases

CAS DBs: CA File
• CA File: a bibliographic database covering journal articles (from ~8,000 journals), technical reports, conference proceedings, dissertations, patents, and other literature
• 1907 to the present; full indexing has been added retrospectively for all records
• Linked through the Registry Number to compound data
• CAplus File: includes CA File data plus e-journals, some preprints, and all articles from ~1,500 key chemical journals within one week of receipt

Relative Contributions of Literature Types to CA
Used with the permission of Chemical Abstracts Service (CAS), a division of the American Chemical Society, from: http://www.cas.org/casdb.html

Old References Recently Added to CA Database
The boiling-point curve for mixtures of ethyl alcohol and water.
Noyes, William A.; Warfel, R. R.
Rose Polytechnic Institute, Terre Haute.
Journal of the American Chemical Society (1901), 23(7), 463-8. CODEN: JACSAT ISSN: 0002-7863. Journal written in English. CAN 0:1311 AN 1906:1311 CAPLUS (Copyright 2004 ACS on SciFinder (R))
Abstract
In the determination with small amounts of alcohol, the readings of the thermometer were taken when the vapors first entered the condenser, as after boiling for a few minutes a relatively large proportion of the alcohol present would be found in the upper layers and in the condenser. The thermometer under these conditions registered about 0.3 higher. An examination of the table and curve revealed that the minimum boiling point is for alcohol of 96% by weight. The curve was steeper on the side toward absolute alcohol. Alcohol of 90.7% had the same boiling point as absolute alcohol.

CAS DBs: Registry File
• “Authority” file that lets indexers and searchers definitively identify a substance as new or find a previous entry
• Contains all types of chemical substances, including biomolecules
• Best file for chemical names
• Many physical properties being added
• Linked to CA and other files through the Registry Number (RN)

Registry File Contents
• Includes synonyms, molecular formulas, alloy composition tables, classes for polymers, nucleic acid and protein sequences, ring analysis data, and structure diagrams
• Also: experimental and calculated property data from various sources, as well as super roles and document type information from CAplus

Registry File Contents
• 72,297,557 substances had an RN in the Registry File as of 9/26/2004
• All substances in CAS files plus others
• Many physical constants now added to the records, most of them calculated
  – Lipinski Rule of Five values
  – BP, MP, density, optical rotatory power, refractive index
  – Data for 3D visualization

Traumatic Acid: SFS eScience

Size of the Registry File
• Date: Sunday, 9/26/2004
• Count: 24,205,177 organic and inorganic substances; 48,092,380 sequences
• CAS Registry Number
  751481-24-0 is the most recent CAS Registry Number

PubChem: A Threat to CAS?
• PubChem, part of the NIH Roadmap plan under the Molecular Libraries and Imaging Initiative
• Eventually planned to have several million compounds in the database
• To be linked to assay data from high-throughput screening analyses
• http://pubchem.ncbi.nlm.nih.gov/

PubChem Opening Screen

Beilstein Database
• Covers organic chemistry back to 1771
• Includes many physical properties
• Includes reaction information
• Structure searchable
• Available on:
  – STN and Dialog vendor systems
  – CrossFire Commander system for academic institutions
  – Elsevier MDL’s DiscoveryGate option

DiscoveryGate for Academics
• CrossFire Beilstein
• CrossFire Gmelin
• MDL® Available Chemicals Directory
• MDL® Screening Compounds Directory
• MDL® Reference Library of Synthetic Methodology
• MDL® Solid-Phase Organic Reactions
• ORGSYN (Organic Syntheses) Database
• Encyclopedia of Reagents for Organic Synthesis
• Comprehensive Organic Functional Group Transformations
• Comprehensive Asymmetric Catalysis
• MDL® Comprehensive Medicinal Chemistry
• MDL® Drug Data Report
• MDL® Metabolite Database
• MDL® Toxicity Database
• ChemInform Reaction Library
• Current Synthetic Methodology
• Derwent Journal of Synthetic Methods
• National Cancer Institute Database
http://www.mdl.com/solutions/solutions_for/academics/dg_academics.jsp

Gmelin Database
• Covers inorganic and organometallic chemistry back to 1772
• Includes many physical and chemical properties
• Not searchable for reactions
• Accessible through the CrossFire Commander system for academic institutions and through commercial systems that offer Beilstein

Reaction Databases
• CASReact
• SPRESI
• Organic Syntheses
  – Free version: http://chemfinder.cambridgesoft.com/reactions/orgsyn.asp
• ISI’s Index Chemicus
• e-EROS (Encyclopedia of Reagents for Organic Synthesis)
• MDL’s Integrated Major Reference Works
  – Reactions indexed with InfoChem’s Reaction Classification Code,
    based on the degree of specificity around the reacting center
    – http://www.infochem.de/eng/index.htm

Cross-Product Approaches
• MDL/InfoChem’s Integrated Major Reference Works
  – Thieme’s Science of Synthesis (successor to Houben–Weyl)
  – Springer’s Comprehensive Asymmetric Catalysis and their Glycoscience
  – Elsevier Science’s Comprehensive Organic Functional Group Transformations
  – Wiley’s Encyclopedia of Reagents for Organic Synthesis
  – Links to primary journal literature

Physical Property Databases
• Beilstein & Gmelin
• CRC Handbook (CHEMnetBASE)
• Ei ChemVillage
• knovel
  – Perry’s Chemical Engineers’ Handbook
  – Lange’s Handbook of Chemistry
• Landolt-Börnstein

Spectral Databases
• Bio-Rad
• Aldrich
• NIST Chemical WebBook
• Some high-quality free databases on the Web, e.g.:
  – SDBS, Integrated Spectral Data System for Organic Compounds: http://www.aist.go.jp/RIODB/SDBS/menu-e.html

SDBS IR Spectrum for Traumatic Acid

Cambridge Structural Database
• Bibliographic, chemical, and crystallographic information for:
  – organic molecules
  – metal-organic compounds
• 3D structures have been determined using:
  – X-ray diffraction
  – neutron diffraction
• The CSD records results of:
• 3D atomic coordinate data for at least all non-H atoms

Isatin on the CSD

Other Structural Databases
• Protein Data Bank, for polypeptides and polysaccharides having more than 24 units (FREE): http://www.rcsb.org/pdb/
• Nucleic Acids Data Bank, for oligonucleotides (FREE): http://ndbserver.rutgers.edu/
• Inorganic Crystal Structure Database: http://www.fizinformationsdienste.de/en/DB/icsd/
• CRYSTMET® for metals and alloys: http://www.tothcanada.com/

Chemical Information System
• 34 environmental databases
  – Originally developed by the US National Institutes of Health and the Environmental Protection Agency
• Covers over 515,000 compounds
  – Toxicological and/or carcinogenic research data
  – Information on handling hazardous materials
  – Chemical/physical property information
  – Regulations
  – Safety and
    health effects information
  – Pharmaceutical data
• http://www.nisc.com/cis/qcis1.asp

Hybrid Links to the Web
• STN’s eScience
  – http://www.escience.org/
• Elsevier Science’s Scirus
  – http://www.scirus.com/srsapp/
  – Incorporated into Elsevier’s Scopus
• http://www.scopus.com/scopus/home.url

Single Publisher Databases
• Elsevier’s ScienceDirect and their encyclopedia DBs
  – Scirus: http://www.scirus.com/srsapp/
• Wiley’s journal, book, and encyclopedia DBs: http://www3.interscience.wiley.com/
• American Chemical Society journals
  – http://pubs.acs.org/

Free Services
• ChemFinder
  – http://chemfinder.cambridgesoft.com/
• ChemIDplus
  – http://chem.sis.nlm.nih.gov/chemidplus/
• Frederick/Bethesda Data and Online Services
  – http://cactus.nci.nih.gov/
• PubMed
  – http://www.ncbi.nlm.nih.gov/entrez/query.fcgi
• DOE’s STI Information Bridge
  – http://www.osti.gov/bridge/

Electronic Journals
• Coverage in some cases goes back to the 17th century
• Most major publishers’ backfiles are now online
• DOI: http://www.doi.org/
  – Turn a DOI into a URL by prepending http://dx.doi.org/ to it
• SFX: http://www.exlibrisgroup.com/sfx.htm
• MDL’s Litlink
• CrossRef: http://www.crossref.org/

Shift from Ownership to Licensing of Journals
• IUB Chemistry Library e-journals
  – http://www.indiana.edu/~libchem/402ejrnl.html
• Archival issues
  – Publisher archives (usually 2–3 locations)
  – LOCKSS: http://lockss.stanford.edu/
  – Libraries often have no archival rights

Budapest Open Access Initiative
• Based on:
  – Self-archiving by authors
  – Open access journals, e.g., BioMed Central
• http://www.soros.org/openaccess/

Open Access
• Institute of Physics: most papers free for 30 days after publication
  – http://www.iop.org/EJ/ and http://www.iop.org/EJ/journal/NJP
• Public Library of Science
  – http://www.publiclibraryofscience.org
• HighWire Press
  – http://www.highwire.org/
• PubMed Central
  – http://www.pubmedcentral.nih.gov/

Opposition to Open Access
• Reacting to NIH’s proposed policy on open access,
  C&EN Editor Rudy Baum says: “[This] action will inflict long-term damage on the communication of scientific results and on maintenance of the archive of scientific knowledge.”
  -- C&EN, September 20, 2004, p. 7

Open Access + Semantic Web
• "Almost all of an author's output (compounds, spectra, reactions, properties, etc.) is nowadays computerised and in principle redistributable to the community for re-use. Few journals actively validate the primary data (e.g. spectra) involved in a publication (chemical crystallography being a clear exception where data are intensively reviewed by machine). We reassert that chemists must now move towards publishing their collective knowledge in a systematic and easily accessible form for re-use and innovation....

Open Access + Semantic Web
• We urge that authors, funders, editors, publishers and readers move further towards the following protocol: [1] All information should be ultimately machine-understandable in XML.... [2] Machine-understandable information for a compound should include a connection table, the IUPAC unique identifier (INChI) which guarantees that the connection table can be checked and regenerated, and a name.... [3] Rights metadata.”
  -- Murray-Rust, Rzepa, Tyrrell, Zhang (2004)

Future
• XML and metadata
  – Dymond (DYnamic Metadata ON Demand)
• Virtual journals (Virtual Journal of Nanoscale Science and Technology)
• Copyright question and open access resolution
• Legal protection of databases
• Impact of INChI and CML
• Demise of abstracting and indexing services?

Conclusion
• “The main challenge is for chemists to recognise the value of making their data machine-understandable, rather than destroying it with traditional paper or slide-focused publication and dissemination processes.”
  -- Murray-Rust, Rzepa, Tyrrell, Zhang (2004)

Parting words . . .
If you're not part of the solution, you're part of the precipitate!
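The Registry File slides identify substances by CAS Registry Number (RN), e.g., 751481-24-0, and the Searches slide lists RNs for Isatin, Moronic Acid, and Traumatic Acid. The last digit of an RN is a check digit, so a mistyped number can be caught before a search is run. The sketch below assumes the commonly published scheme: number the preceding digits from the right starting at 1, multiply each digit by its position, and take the sum modulo 10. The function names and the accepted format (2 to 7 digits, then 2 digits, then the check digit) are illustrative assumptions, not part of any CAS product or API.

```python
import re

# Assumed RN layout: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
RN_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def cas_check_digit(body: str) -> int:
    """Return the check digit for the digits that precede it.

    The rightmost digit gets weight 1, the next weight 2, and so on;
    the check digit is the weighted sum modulo 10.
    """
    total = sum(pos * int(d) for pos, d in enumerate(reversed(body), start=1))
    return total % 10


def is_valid_rn(rn: str) -> bool:
    """True if rn is well-formed and its check digit matches."""
    match = RN_PATTERN.match(rn.strip())
    if match is None:
        return False
    first, second, check = match.groups()
    return cas_check_digit(first + second) == int(check)


if __name__ == "__main__":
    # RNs quoted in the slides, plus one with a deliberately wrong check digit.
    for rn in ["91-56-5", "6713-27-5", "6402-36-4", "751481-24-0", "6402-36-5"]:
        print(rn, is_valid_rn(rn))
```

Run as a script, it prints True for the four RNs taken from the slides and False for 6402-36-5, whose correct check digit would be 4. A check like this only catches typing errors; it says nothing about whether an RN has actually been assigned to a substance.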
Searches
• Isatin (RN 91-56-5)
• Moronic Acid (RN 6713-27-5)
• Traumatic Acid (RN 6402-36-4)
• Others: http://www.chm.bris.ac.uk/sillymolecules/sillymols.htm

Beilstein Structure Search
R1 = O or S
R2 = H, OH, OMe, CH3, or CO2H
X = any halogen
? = any bond value

Beilstein Property Search
• Find the compounds in the Beilstein CrossFire database that have the structure keyword "stereo compound", molecular formula C29H36O8, and melting points in the range 258–271 °C.

Bibliography
• Culp, F. Bartow. “Ten or so things that every chemistry librarian absolutely, positively has to have to keep from being an absolute plonk.” Sci-Tech News, February 2004, 58(1), 9. Also published as: SLA Chemistry Division E-Newsletter, Winter 2004, 18(3), 19-20. http://www.sla.org/division/dche/Newsletters/Feb_2004.pdf
• Gasaway, Laura. “The open archives movement.” Information Outlook, October 2004, 8(10), 36, 39-40.
• Glen, Robert; Aldridge, Susan. “Developing tools and standards in molecular informatics.” Chemical Communications 2002, (23), 2745-2747. DOI: 10.1039/b207793k http://xlink.rsc.org/?DOI=b207793k
• Huber, C.; Porter, K. “Cheap tricks.” http://www.indiana.edu/~cheminfo/workshop/cheap.html
• McLeland, Le-Nhung. What every chemist should know about patents. http://www.chemistry.org/portal/resources/?id=1b41692a6cf811d6f8dd6ed9fe800100
• Murray-Rust, Peter; Rzepa, Henry S.; Tyrrell, Simon M.; Zhang, Y. “Representation and use of chemistry in the global electronic age.” Forthcoming article in: Organic & Biomolecular Chemistry. http://www.ch.ic.ac.uk/rzepa/obc/ (preprint)
• Wagner, A. Ben. “Finding physical properties of chemicals: A practical guide for scientists, engineers, and librarians.” Science & Technology Libraries 2001, 21(3/4), 27-45 (published Fall 2003). Text for personal and professional use available at: http://ublib.buffalo.edu/libraries/asl/staff/documents/wagner_phys_prop_stl_art.pdf
• Wiggins, Gary.
  “Overview of databases/data sources.” In Gasteiger, Johannes, ed. Handbook of Chemoinformatics: From Data to Knowledge in 4 Volumes. Wiley-VCH: 2003, v. 2, pp. 496-506. http://www.indiana.edu/~cheminfo/C571/wiggins_chapter_2003.pdf
• Wiggins, Gary. “Teaching chemical literature, databases, and chemical informatics.” CPT; Committee on Professional Training [newsletter] Spring 2004, 4(1), 1-2. http://www.chemistry.org/portal/resources/ACS/ACSContent/education/cpt/nl_cpt_spring2004.pdf
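The Cited Reference Searching and Related Records slides describe two lookups over the same data. The first finds the newer articles that cite a given reference. The second finds articles that share references with a displayed record and ranks them by the number of shared references. The Python sketch below models both with a simple inverted index. The article IDs and reference lists are invented toy data, and the sketch only illustrates the idea; it is not how ISI actually implements Web of Science.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: article ID -> references in its bibliography.
BIBLIOGRAPHIES = {
    "A": ["R1", "R2", "R3"],
    "B": ["R2", "R3"],
    "C": ["R3", "R4"],
    "D": ["R5"],
}


def build_citation_index(bibs):
    """Invert bibliographies so each cited reference maps to the articles citing it."""
    index = defaultdict(set)
    for article, refs in bibs.items():
        for ref in refs:
            index[ref].add(article)
    return index


def cited_by(index, reference):
    """Cited reference search: the articles that cite `reference`."""
    return sorted(index.get(reference, set()))


def related_records(bibs, index, record):
    """Articles sharing at least one reference with `record`,
    sorted by number of shared references (most first), then by ID."""
    shared = Counter()
    for ref in set(bibs[record]):
        for other in index[ref]:
            if other != record:
                shared[other] += 1
    return sorted(shared.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    idx = build_citation_index(BIBLIOGRAPHIES)
    print(cited_by(idx, "R3"))                        # ['A', 'B', 'C']
    print(related_records(BIBLIOGRAPHIES, idx, "A"))  # [('B', 2), ('C', 1)]
```

Because the same inverted index answers both questions, a single pass over the bibliographies is enough. This is also why, as the slides caution, errors in the cited references themselves carry straight through into both kinds of results.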