The Web of Science and Science
on the Web: Recent Developments
in Science Database Searching
Gary Wiggins
E-mail: [email protected]
Indiana University
School of Informatics
IU-PU, Ft. Wayne
April 1, 2005
ISI’s Web of Knowledge
• Integrates:
– Web of Science
– Current Contents Connect
– Other sources
• Contains journal, patent, proceedings, etc.
and Web content
Web of Science
Web of Science: What is it?
• 3 Citation Indexes in 1:
– Science Citation Index (Expanded), 1900–
– Social Sciences Citation Index, 1956–
– Arts & Humanities Citation Index, 1975–
• SCI Time Coverage for Source Journals:
– Previously, 1945–
– Starting in 2005, 1900–
• Added 262 journals from first half of 20th century
• Coverage at IUB for all three: 1977-
What is Citation Indexing?
• Utilizes a known relevant document regardless
of when published to find newer journal articles
that have cited that document
• Assumption: Authors who are citing the document
must be writing on a related topic
– Citation indexing lets you find newer articles from an
older reference
– Also available in other tools, e.g., SciFinder Scholar and
SCOPUS, but their citation indexing doesn’t go as far back
as SCI’s
• Gets around the problems of doing a subject
search when you aren’t sure of the words to use
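A minimal sketch of the idea, assuming a toy set of records (the identifiers and reference lists below are invented for illustration and are not taken from any ISI file): each paper's bibliography is inverted so that a known older document becomes the lookup key for the newer papers that cite it.

```python
from collections import defaultdict

# Hypothetical records: each source paper maps to the references in its bibliography.
bibliographies = {
    "paper_2004_a": ["known_1995", "other_1985"],
    "paper_2005_b": ["known_1995"],
    "paper_2005_c": ["other_1990"],
}


def build_citation_index(bibliographies):
    """Invert paper -> references into reference -> citing papers."""
    index = defaultdict(list)
    for paper, references in bibliographies.items():
        for ref in references:
            index[ref].append(paper)
    return index


citation_index = build_citation_index(bibliographies)

# Start from the older, known relevant document and move forward in time.
citing = sorted(citation_index["known_1995"])
print(citing)
```

No subject words are needed for the lookup; the only key is the cited document itself.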
Source Journal Coverage
• SCIE: 5700 titles
• SSCI: 1735 titles*
• A&HCI: 1145 titles*
*also includes selected articles from SCIE
• Weekly updates
– Lag time: 2-3 weeks
• Journal List: http://www.isinet.com/journals/
17 Document Types in All Files
• Article
• Biographical Item
• Correction, Addition
• Editorial Material
• Letter
• News Item
• Also 13 special types for A&HCI only
Reviews
• Book Review
• Database Review
• Hardware Review
• Review
• Software Review
• A&HCI: art exhibit, dance performance, film, music performance or score, etc.
Cited Reference Searching
• Every reference in the bibliography of each item
published in the source journals becomes a
potential search key.
• Even second-listed and later authors can be
searched IF you are content to consider just ISI
source journals.
• Cautions:
– References in bibliographies may be incorrect or
incomplete.
– Not every journal in the world is an SCI source journal.
Web of Science Search Screen
Search Example: Cited
Reference Searching
• Use the Full General Search and Cited
Reference Search Option
• Find publications that have cited the works
of Donald E. Linn.
– Dots before his name indicate he is not the
first listed author on the publication.
– Links are to ISI source journals.
– Unlinked items may be incorrect forms of the
reference.
SCI Cited Ref Search for DE Linn
Lookup Results for DE Linn Search
DE Linn’s 2003 JACS Article
Newer Articles Citing the 2003
JACS Article by DE Linn et al.
Analysis of All Authors Citing DE
Linn
WOS Boolean and Proximity
Operators; Truncation
• AND, OR, NOT
• Phrases imply adjacency when no
operator is used.
• SAME or SENT
– Terms in the same “sentence” in any order
• Precedence: ( ) > SAME or SENT >
NOT > AND > OR
• Truncation symbol (wild card): *
Search Example: Use of
Truncation and Proximity
• Full General Search for a Topic:
– Mouse Pheromones
• Strategy:
(mouse or mice or mus musculus) same
(pheromone* or sex attractant*)
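The strategy above can be mimicked on plain text with a rough sketch. It assumes that `*` stands for zero or more trailing word characters and that a "sentence" is text ending in `.`, `!`, or `?`; the slide puts "sentence" in quotation marks, so the actual WOS definition may differ. The function names are hypothetical and this is not how WOS itself is implemented.

```python
import re


def term_to_regex(term):
    """Turn a term with * right-hand truncation into a whole-word regex."""
    escaped = re.escape(term).replace(r"\*", r"\w*")
    return re.compile(r"\b" + escaped + r"\b", re.IGNORECASE)


def any_term_matches(terms, text):
    """OR: true if at least one term occurs in the text."""
    return any(term_to_regex(t).search(text) is not None for t in terms)


def same(group_a, group_b, text):
    """SAME: some sentence contains a term from each OR-group, in any order."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return any(
        any_term_matches(group_a, s) and any_term_matches(group_b, s)
        for s in sentences
    )


animals = ["mouse", "mice", "mus musculus"]
signals = ["pheromone*", "sex attractant*"]

hit = same(animals, signals, "Male mice respond to urinary pheromones. Rats were not tested.")
miss = same(animals, signals, "Mice were housed in pairs. Pheromones from moths were studied.")
print(hit, miss)
```

The second text contains terms from both groups, but in different sentences, so SAME rejects it where a plain AND would accept it.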
Option: Place Searches (Full
General Search)
• Institution
– College or University
– Research Institution
– Company
• Geographic Data
– Country
– City
– Postal Code
Search Example: Use of Full
Search with Address Data
• Find all 1998 WOS source publications by IUB
Chemistry Department personnel and arrange
by source title (journal).
• Use the Full General Search option.
• Enter address data with the “same” proximity
operator:
Indiana same Chem same 47405
• Hint: Check the abbreviations list.
Subject (Topic) Searches:
What’s being searched?
• Title words in all years
• Authors’ Keywords, Keywords Plus, and
Words from the Author Abstracts from
1991–
• Keywords Plus: words or phrases that
frequently appear in the titles of an article's
references, but do not appear in the title of
the article itself
Related Records
• Articles that cite one or more of the same
papers cited by the fully displayed record,
sorted by relevance (i.e., by the number of
shared references)
• Taken from all databases and all years,
not just those selected
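The ranking rule on this slide (order by the number of shared references) can be sketched as follows. The record identifiers and reference lists are invented for illustration, and the tie-breaking rule is an assumption, not something the slide specifies.

```python
def related_records(target_id, references_by_record):
    """Rank other records by how many cited references they share with the target."""
    target_refs = set(references_by_record[target_id])
    scored = []
    for record_id, refs in references_by_record.items():
        if record_id == target_id:
            continue
        shared = len(target_refs & set(refs))
        if shared > 0:
            scored.append((record_id, shared))
    # Most shared references first; ties broken by record id (an assumption).
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical records and their cited references.
records = {
    "A": ["r1", "r2", "r3"],
    "B": ["r2", "r3", "r9"],
    "C": ["r3"],
    "D": ["r7"],
}

ranking = related_records("A", records)
print(ranking)
```

Record D shares nothing with A and is therefore left out of the ranking altogether.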
Search Example: Need to Use
Synonyms in Topic Searches
• No controlled subject vocabulary in WOS
• XPS (ESCA)
• Since both refer to the same technique,
use both in an OR search
Search Hints: Spaces,
apostrophes included 1998–
• For an article authored by C. D. O'Brian
O'BRIAN C* OR OBRIAN C*
• For an article on Paget's disease
PAGET'S DISEASE OR PAGETS DISEASE
• For an article authored by W. de la Rosa
DE LA ROSA W* OR DELAROSA W*
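The three hints follow one pattern: search the form as written plus a form without the apostrophe (and, for surnames, without internal spaces). A small sketch that builds such OR queries is shown below; the helper names are hypothetical and the output is only a query string to paste into the search box.

```python
def author_query(surname, first_initial):
    """OR together a surname as written and with apostrophes/spaces removed."""
    as_written = surname.upper()
    compact = as_written.replace("'", "").replace(" ", "")
    forms = [as_written] if compact == as_written else [as_written, compact]
    return " OR ".join(f"{form} {first_initial.upper()}*" for form in forms)


def topic_query(phrase):
    """OR together a topic phrase with and without its apostrophes."""
    as_written = phrase.upper()
    no_apostrophe = as_written.replace("'", "")
    forms = [as_written] if no_apostrophe == as_written else [as_written, no_apostrophe]
    return " OR ".join(forms)


print(author_query("O'Brian", "C"))
print(author_query("de la Rosa", "W"))
print(topic_query("Paget's disease"))
```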
Printing, Downloading Results
• Mark records that you want to print or
download (or MARK ALL), then SUBMIT
them to MARKED LIST
• Marked records are automatically
submitted when you change pages
Customizing the Answer Set
• Basic output: Authors, titles, sources
• Optional:
– cited references, addresses, abstract,
language, publisher information, ISSN,
document type, keywords, times cited
• Sort by:
– Latest date, First author, Source title, Times
cited
Printing or Saving/Exporting the
Results
• “Format for Print” button must be used if
selected records were customized
• Could print from the Web page
• Saving/Exporting to a file produces a file named
CIW.cgi (filename.filetype)
– Can be read or printed with WordPad
– Includes 2-character field codes
– Can be imported into database software
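A sketch of how a file with 2-character field codes might be read before importing it elsewhere. The slides do not show the file layout, so everything about the format here is an assumption: each field starts with a two-letter tag and a space, continuation lines are indented, and a record ends with a line reading `ER`. The sample text and its tags are invented.

```python
def parse_tagged_records(text):
    """Parse tagged text into a list of {tag: [values]} dictionaries."""
    records, current, tag = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.strip() == "ER":  # assumed end-of-record marker
            records.append(current)
            current, tag = {}, None
        elif len(line[:2].strip()) == 2 and line[2:3] == " ":
            tag = line[:2]  # a new field begins
            current.setdefault(tag, []).append(line[3:].strip())
        elif tag is not None:  # indented continuation of the previous field
            current[tag].append(line.strip())
    return records


# Invented sample in the assumed layout.
sample = (
    "PT J\n"
    "AU Smith, J\n"
    "   Jones, K\n"
    "TI An invented title\n"
    "SO J HYPOTHETICAL CHEM\n"
    "ER\n"
    "\n"
    "PT J\n"
    "AU Doe, A\n"
    "TI Another invented title\n"
    "ER\n"
)

parsed = parse_tagged_records(sample)
print(len(parsed), parsed[0]["AU"])
```

In practice the import filters supplied with bibliography manager software do this job; the sketch only shows why the field codes make the file machine-readable.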
Bibliography Manager Software
Options
• ProCite
• Reference Manager
• EndNote
• Reference Web Poster
• Bookwhere 2000
• Import filters:
http://www.risinc.com/support/rmfilters.asp
Influences on the Current Database
Environment
• Increase in Interdisciplinary scientific research
• Consolidation of the Scientific-Technical-Medical
(STM) publishing world
• Appearance of databases covering different
formats: encyclopedias, treatises, review serials
• The Web
– Move to open access journals and “free” DBs
– Different cultures in the chemistry publishing
environment compared to that in biology
Growth of Articles in CA
Year    Articles Abstracted
1907    7,994
1945    22,824
1960    104,484
1970    230,902
1980    407,342
1990    394,945
2000    573,469
Source: http://www.cas.org/EO/casstats.pdf
Vendors and Publishers
• Partnership between commercial vendors
and abstracting/indexing services (and to
some extent with journal publishers)
– Most activity in online searching started in the
1970s
– Comparatively little change in the vendors’
search systems until relatively recently
• Aggregation of databases
• Cross-file searching
• Command-driven access
STN International
• Partnership among Chemical Abstracts
Service, FIZ Chemie, and the Japan
Science and Technology Corporation
• Has over 200 STM databases
– STN Database Summary Sheets:
http://info.cas.org/ONLINE/DBSS/dbsslist.html
– Includes some databases also available free
through other venues (e.g., Medline,
GenBank)
Features in Commercial Systems
• Special Boolean operators (proximity, adjacency, etc.)
• Truncation (wild cards and left-hand or right-hand
truncation)
• Controlled vocabulary tools (MeSH, CAS’s Index
Guide, CA Lexicon)
• Classification of the documents
– PACS (Physics and Astronomy Classification Scheme)
– CA Sections/Subsections
• Structure searching (usually ranges from exact to full
substructure search)
• Numeric and other data that is searchable
• Data analysis tools
• Current awareness options
Command Language Systems
• Allow field-directed searches
• Incorporate sophisticated Boolean
relationships
– AND, OR, NOT
– Adjacency, Proximity, Logical linking to the
same field or sub-field of a record
• Numbers of intervening words can be specified
• Drawback: User must learn the commands
User-Oriented Software
• Front-end systems to mask command
language
– STN’s SciFinder (& SF Scholar)
– STN on the Web, STNEasy, STN Express
– CrossFire Commander and MDL
DiscoveryGate
– Dialog’s DialogLink
– Questel-ORBIT’s QWeb and Imagination
Main Chemical Databases
• Chemical Abstracts
• Beilstein/Gmelin
• Cambridge Structural Database
• Protein Data Bank
• Many other relevant databases
CAS DBs: CA File
• CA File, a bibliographic database covering
journal articles (from ~8000 journals), technical
reports, conference proceedings, dissertations,
patents and other literature
• 1907 to the present; full indexing has been added for all
records retrospectively
• Linked through the Registry Number to compound data
• CAplus File, includes CA File data plus e-journals, some preprints, and all articles from
~1500 key chemical journals within one week of
receipt
Relative Contributions of Literature
Types to CA
Used with the permission of Chemical Abstracts Service (CAS),
a division of the American Chemical Society, from:
http://www.cas.org/casdb.html
Old References Recently Added to
CA Database
The boiling-point curve for mixtures of ethyl alcohol and water.
Noyes, William A.; Warfel, R. R. Rose Polytechnic Institute, Terre
Haute, Journal of the American Chemical Society (1901), 23(7),
463-8. CODEN: JACSAT ISSN: 0002-7863. Journal written in
English. CAN 0:1311 AN 1906:1311 CAPLUS (Copyright
2004 ACS on SciFinder (R))
Abstract
In the determination with small amounts of alcohol, the readings of the
thermometer were taken when the vapors first entered the
condenser, as after boiling for a few minutes a relatively large
proportion of the alcohol present would be found in the upper layers
and in the condenser. The thermometer under these conditions
registered about 0.3 higher. An examination of the table and curve
revealed that the minimum boiling point is for alcohol of 96% by
weight. The curve was steeper on the side toward absolute alcohol.
Alcohol of 90.7% had the same boiling point as absolute alcohol.
CAS DBs: Registry File
• “Authority” file that lets indexers and
searchers definitively identify a substance
as new or find a previous entry
• Contains all types of chemical substances,
including biomolecules
• Best file for chemical names
• Many physical properties being added
• Linked to CA and other files through the
Registry Number (RN)
Registry File Contents
• Includes synonyms, molecular formulas,
alloy composition tables, classes for
polymers, nucleic acid and protein
sequences, ring analysis data, and
structure diagrams
• Also: experimental and calculated property
data from various sources as well as super
roles and document type information from
CAplus
Registry File Contents
• 72,297,557 substances have an RN in the
Registry File as of 9/26/2004
• All substances in CAS files plus others
• Many physical constants now added to the
records, most of them calculated
– Lipinski Rule of Five values
– BP, MP, Density, Optical Rotatory Power,
Refractive Index
– Data for 3D visualization
Traumatic Acid: SFS to eScience
Size of the Registry File
Date: Sunday, 9/26/2004
Count: 24,205,177 organic and inorganic substances; 48,092,380 sequences
CAS Registry Number: 751481-24-0 is the most recent CAS Registry Number
PubChem: A Threat to CAS?
• PubChem, part of the NIH Roadmap plan
under the Molecular Libraries and Imaging
Initiative
• Eventually planned to have several million
compounds in the database
• To be linked to assay data from High
Throughput Screening analyses
• http://pubchem.ncbi.nlm.nih.gov/
PubChem Opening Screen
Beilstein Database
• Covers organic chemistry back to 1771
• Includes many physical properties
• Includes reaction information
• Structure searchable
• Available on:
– STN and Dialog vendor systems
– CrossFire Commander system for academic
institutions
– Elsevier MDL’s DiscoveryGate option
DiscoveryGate for Academics
• CrossFire Beilstein
• CrossFire Gmelin
• MDL® Available Chemicals Directory
• MDL® Screening Compounds Directory
• MDL® Reference Library of Synthetic Methodology
• MDL® Solid-Phase Organic Reactions
• ORGSYN (Organic Syntheses) Database
• Encyclopedia of Reagents for Organic Synthesis
• Comprehensive Organic Functional Group Transformations
• Comprehensive Asymmetric Catalysis
• MDL® Comprehensive Medicinal Chemistry
• MDL® Drug Data Report
• MDL® Metabolite Database
• MDL® Toxicity Database
• ChemInform Reaction Library
• Current Synthetic Methodology
• Derwent Journal of Synthetic Methods
• National Cancer Institute Database
• http://www.mdl.com/solutions/solutions_for/academics/dg_academics.jsp
Gmelin Database
• Covers inorganic and organometallic
chemistry back to 1772
• Includes many physical and chemical
properties
• Not searchable for reactions
• Accessible through the CrossFire
Commander system for academic
institutions and commercial systems that
offer Beilstein
Reaction Databases
• CASReact
• SPRESI
• Organic Syntheses
– Free version:
http://chemfinder.cambridgesoft.com/reactions/orgsyn.asp
• ISI’s Index Chemicus
• e-EROS (Encyclopedia of Reagents for Organic
Synthesis)
• MDL’s Integrated Major Reference Works
– Reactions indexed with InfoChem’s Reaction Classification
Code, based on the degree of specificity around the reacting
center:
– http://www.infochem.de/eng/index.htm
Cross-Product Approaches
• MDL/InfoChem’s Integrated Major Reference
Works
– Thieme’s Science of Synthesis (successor to
Houben–Weyl)
– Springer’s Comprehensive Asymmetric Synthesis and
their Glycoscience
– Elsevier Science’s Comprehensive Organic
Functional Group Transformations
– Wiley’s Encyclopedia of Reagents for Organic
Synthesis
– Links to primary journal literature.
Physical Property Databases
• Beilstein & Gmelin
• CRC Handbook (CHEMnetBASE)
• Ei ChemVillage
• knovel
– Perry’s Chemical Engineers’ Handbook
– Lange’s Handbook of Chemistry
• Landolt-Börnstein
Spectral Databases
• Bio-Rad
• Aldrich
• NIST Chemistry WebBook
• Some high-quality free databases on the
Web, e.g.,
– SDBS, Integrated Spectral Data System
for Organic Compounds
– http://www.aist.go.jp/RIODB/SDBS/menu-e.html
SDBS IR Spectrum for
Traumatic Acid
Cambridge Structural Database
• Bibliographic, chemical and crystallographic
information for:
– organic molecules
– metal-organic compounds
• 3D structures have been determined using:
– X-ray diffraction
– neutron diffraction
• The CSD records results of:
• 3D atomic coordinate data for at least all non-H
atoms
Isatin on the CSD
Other Structural Databases
• Protein Data Bank for polypeptides and
polysaccharides having more than 24 units
FREE http://www.rcsb.org/pdb/
• Nucleic Acids Data Bank for oligonucleotides
FREE http://ndbserver.rutgers.edu/
• Inorganic Crystal Structure Database
http://www.fizinformationsdienste.de/en/DB/icsd/
• CRYSTMET® for metals and alloys
http://www.tothcanada.com/
Chemical Information System
• 34 environmental databases
– Originally developed by the US National Institutes of
Health and the Environmental Protection Agency
• Covers over 515,000 compounds
– Toxicological and/or carcinogenic research data
– Information on handling hazardous materials
– Chemical/physical property information
– Regulations
– Safety and health effects information
– Pharmaceutical data
• http://www.nisc.com/cis/qcis1.asp
Hybrid Links to the Web
• STN’s eScience
– http://www.escience.org/
• Elsevier Science’s Scirus
– http://www.scirus.com/srsapp/
– Incorporated into Elsevier’s Scopus
• http://www.scopus.com/scopus/home.url
Single Publisher Databases
• Elsevier’s ScienceDirect and their
encyclopedia DBs
– Scirus: http://www.scirus.com/srsapp/
• Wiley’s journal, book, and encyclopedia
DBs: http://www3.interscience.wiley.com/
• American Chemical Society journals
– http://pubs.acs.org/
Free Services
• ChemFinder
– http://chemfinder.cambridgesoft.com/
• ChemIDplus
– http://chem.sis.nlm.nih.gov/chemidplus/
• Frederick/Bethesda Data and Online Services
– http://cactus.nci.nih.gov/
• PubMed
– http://www.ncbi.nlm.nih.gov/entrez/query.fcgi
• DOE’s STI Information Bridge
– http://www.osti.gov/bridge/
Electronic Journals
• Coverage in some cases back to the 17th
century
• Most major publishers’ backfiles are now online
• DOI: http://www.doi.org/
– Turn a DOI into a URL by prepending http://dx.doi.org/
to it
• SFX: http://www.exlibrisgroup.com/sfx.htm
• MDL’s Litlink
• CrossRef: http://www.crossref.org/
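The DOI rule on this slide amounts to putting the resolver address in front of the DOI. A minimal sketch, using the resolver address from the slide and a DOI that appears in the bibliography of this talk (the function name is hypothetical):

```python
def doi_to_url(doi, resolver="http://dx.doi.org/"):
    """Put the resolver address in front of a DOI to get a clickable URL."""
    return resolver + doi.strip()


url = doi_to_url("10.1039/b207793k")
print(url)  # http://dx.doi.org/10.1039/b207793k
```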
Shift from Ownership to Licensing
of Journals
• IUB Chemistry Library e-journals
– http://www.indiana.edu/~libchem/402ejrnl.html
• Archival issues
– Publisher archives (usually 2-3 locations)
– LOCKSS: http://lockss.stanford.edu/
– Libraries often have no archival rights
Budapest Open Access Initiative
• Based on:
– Self archiving by authors
– Open Access journals, e.g., BioMed Central
• http://www.soros.org/openaccess/
Open Access
• Institute of Physics: most papers free for 30 days
after publication
– http://www.iop.org/EJ/ and
http://www.iop.org/EJ/journal/NJP
• Public Library of Science
– http://www.publiclibraryofscience.org
• Highwire Press
– http://www.highwire.org/
• PubMed Central
– http://www.pubmedcentral.nih.gov/
Opposition to Open Access
• Reacting to NIH’s proposed policy on open
access, C&EN Editor Rudy Baum says:
“[This] action will inflict long-term damage on
the communication of scientific results and
on maintenance of the archive of scientific
knowledge.”
-- C&EN, September 20, 2004, p. 7
Open Access + Semantic Web
• "Almost all of an author's output (compounds,
spectra, reactions, properties, etc.) is nowadays
computerised and in principle redistributable to
the community for re-use. Few journals actively
validate the primary data (e.g. spectra) involved
in a publication (chemical crystallography being
a clear exception where data are intensively
reviewed by machine). We reassert that
chemists must now move towards publishing
their collective knowledge in a systematic and
easily accessible form for re-use and
innovation....
Open Access + Semantic Web
• We urge that authors, funders, editors,
publishers and readers move further towards the
following protocol:
[1] All information should be ultimately machine-understandable in XML....
[2] Machine-understandable information for a compound
should include a connection table, the IUPAC unique
identifier (INChI) which guarantees that the
connection table can be checked and regenerated,
and a name....
[3] Rights metadata.”
-- Murray-Rust, Rzepa, Tyrrell, Zhang (2004)
Future
• XML and metadata
– Dymond (DYnamic Metadata ON Demand)
• Virtual journals (Virtual Journal of Nanoscale
Science and Technology)
• Copyright question and open access resolution
• Legal protection of databases
• Impact of INChI and CML
• Demise of Abstracting and Indexing Services?
Conclusion
• “The main challenge is for chemists to
recognise the value of making their data
machine-understandable, rather than
destroying it with traditional paper or slide-focused publication and dissemination
processes.”
-- Murray-Rust, Rzepa, Tyrrell, Zhang (2004)
Parting words . . .
If you're not part of the solution, you're part
of the precipitate!
Searches
• Isatin (RN 91-56-5)
• Moronic Acid (RN 6713-27-5)
• Traumatic Acid (RN 6402-36-4)
• Others:
http://www.chm.bris.ac.uk/sillymolecules/sillymols.htm
Beilstein Structure Search
R1=O or S
R2=H, OH, OMe, CH3, or CO2H
X = any halogen
? = any bond value
Beilstein Property Search
• Find the compounds in the Beilstein
CrossFire database that have structure
keyword "stereo compound" and
molecular formula C29H36O8 and melting
points in the range 258-271 Celsius.
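The logic of this search is a three-way AND: a structure keyword, an exact molecular formula, and a melting-point range. The sketch below applies that logic to a handful of invented records; it is not CrossFire's query language, and the compound names, values, and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Compound:
    name: str
    formula: str
    melting_point_c: float
    structure_keywords: set = field(default_factory=set)


def property_search(compounds, keyword, formula, mp_low, mp_high):
    """Keep compounds matching keyword AND formula AND melting-point range."""
    return [
        c for c in compounds
        if keyword in c.structure_keywords
        and c.formula == formula
        and mp_low <= c.melting_point_c <= mp_high
    ]


# Invented records for illustration only.
compounds = [
    Compound("hypothetical A", "C29H36O8", 262.0, {"stereo compound"}),
    Compound("hypothetical B", "C29H36O8", 240.0, {"stereo compound"}),
    Compound("hypothetical C", "C29H36O8", 265.0),
    Compound("hypothetical D", "C20H30O2", 260.0, {"stereo compound"}),
]

hits = property_search(compounds, "stereo compound", "C29H36O8", 258, 271)
print([c.name for c in hits])
```

Each of the three rejected records fails exactly one criterion, which shows why combining property fields narrows a formula search so sharply.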
Bibliography
• Culp, F. Bartow. "Ten or so things that every chemistry librarian
absolutely, positively has to have to keep from being an absolute
plonk." Sci-Tech News, February 2004, 58(1), 9. Also published in:
SLA Chemistry Division E-Newsletter Winter 2004, 18(3), 19-20.
http://www.sla.org/division/dche/Newsletters/Feb_2004.pdf
• Gasaway, Laura. “The open archives movement.” Information
Outlook October 2004, 8(10), 36, 39-40.
• Glen, Robert; Aldridge, Susan. “Developing tools and standards in
molecular informatics.” Chemical Communications 2002, (23), 2745-2747. DOI: 10.1039/b207793k
http://xlink.rsc.org/?DOI=b207793k
Bibliography
• Huber, C.; Porter, K. “Cheap tricks.”
http://www.indiana.edu/~cheminfo/workshop/cheap.html
• McLeland, Le-Nhung. What every chemist should know about patents.
http://www.chemistry.org/portal/resources/?id=1b41692a6cf811d6f8dd6ed9fe800100
• Murray-Rust, Peter; Rzepa, Henry S.; Tyrrell, Simon M.; Zhang, Y.
“Representation and use of chemistry in the global electronic age.”
forthcoming article in: Organic & Biomolecular Chemistry.
http://www.ch.ic.ac.uk/rzepa/obc/ (preprint)
• Wagner, A. Ben. “Finding physical properties of chemicals: A practical guide
for scientists, engineers, and librarians.” Science & Technology Libraries
2001, 21(3/4), 27-45. (published Fall 2003)
Text for personal and professional use available at:
http://ublib.buffalo.edu/libraries/asl/staff/documents/wagner_phys_prop_stl_art.pdf
Bibliography
• Wiggins, Gary. “Overview of databases/data sources.” in Gasteiger,
Johannes, ed. Handbook of Chemoinformatics: From Data to
Knowledge in 4 Volumes. Wiley-VCH: 2003, v. 2, pp. 496-506.
http://www.indiana.edu/~cheminfo/C571/wiggins_chapter_2003.pdf
• Wiggins, Gary. “Teaching chemical literature, databases, and
chemical informatics.” CPT; Committee on Professional Training
[newsletter] Spring 2004, 4(1), 1-2.
http://www.chemistry.org/portal/resources/ACS/ACSContent/education/
cpt/nl_cpt_spring2004.pdf