Pears, Gwen and JZKit Training

Designing and Building Databases
Topics
A Pears Database Building - Introduction
B Database Description File
C Building Databases
D Configuring and Testing
E Database Utilities and Maintenance
F Advanced Database Description Concepts
Pears Database Building
Introduction
Pears provides tools that allow you to:
• Build databases from structured data such as:
– MARC - that has a defined standard
structure.
– XML – that has loose structure but clearly
identified fields.
• Determine each index for the database.
• Load the records into a database following
your indexing definitions.
Pears Database Building
Exercise Preview
• View the structure of a small set of MARC records.
• Build a small database from those records.
• Look at the setup database description file.
• Build the Database.
• Test database for correctness using testgwen.
• Add the database to the JZKit configuration files,
making it searchable by a Z39.50 Client.
Pears Database Building
The Gwen Search Engine
• The Gwen search engine is a generalized text retrieval
engine.
• Functionality is contained in the Java classes that can be
embedded in Java applications including the JZKit Z39.50
Server.
• The JZKit server allows multiple simultaneous users, each using a
client program that supports the Z39.50 protocol, to
browse, search and display records from Pears databases.
Pears Database Building
Logical and Physical Databases
• A Gwen Database is a logical database
– It provides features for searching and retrieving
records
• A Pears Database is a physical database
– It provides the information that a Gwen database
needs
Pears Database Building
Gwen Database Features
• A Gwen Database has:
– Indexes with numeric IDs
– Index Terms with Postings Lists
– Postings Lists have Record Numbers and
Restrictor Data
Pears Database Building
What is a Pears Database?
• A Pears database is a single physical file
with three main kinds of data
– Record data
– Index data
– Postings data
Pears Database Building
Record Data
• Contains the actual records of your
database.
• Records are stored as BER-encoded
records.
• Each record is identified by a unique logical
record number.
Pears Database Building
Index Data
• Contains a sorted list of all the Index Terms
extracted from your data records.
• Index Terms Contain:
– term/index-id.
– number of records that term appears in
(postings count).
– a list of records that contain that term or a
pointer to such a list.
Understanding the
Database Structure
INDEX
abercrombie: au: postings=2, postings list=r17, r15
anderson: au : postings=102, postings list ID=l21
Pears Database Building
Postings Data
• Contains a list of record IDs for each of the
terms in the index.
• Each record ID may have restrictor and
proximity information associated with it.
Understanding the
Database Structure
INDEX
abercrombie: au: postings=2, postings list=r17, r15
anderson: au : postings=102, postings list ID=l21
POSTINGS
l21: r1024, r1021, r1007, r995, …
Understanding the
Database Structure
INDEX
abercrombie: au: postings=2, postings list=r995, r175
anderson: au : postings=102, postings list ID=l21
POSTINGS
l21: r1024, r1021, r1007, r995, …
RECORDS
r995:
au: Abercrombie & Anderson
ti: Tennis Made Easy
yr: 1905
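The relationship between index entries, postings lists and records shown on these slides can be mimicked with plain Python dictionaries. This is only an illustrative sketch: the names `INLINE_LIMIT`, `build` and `lookup`, and the rule for when a postings list moves out of the index entry, are assumptions and not Pears internals.

```python
from collections import defaultdict

INLINE_LIMIT = 2  # assumed threshold; the slides do not state Pears' real rule


def build(records):
    """records: {record_number: {index_id: text}} -> (index, postings)."""
    term_recs = defaultdict(set)
    for recno, fields in records.items():
        for index_id, text in fields.items():
            for word in text.lower().split():
                term_recs[(word, index_id)].add(recno)

    index, postings = {}, {}
    for key in sorted(term_recs):                  # index terms are kept sorted
        recs = sorted(term_recs[key], reverse=True)
        entry = {"postings": len(recs)}            # postings count
        if len(recs) <= INLINE_LIMIT:
            entry["list"] = recs                   # short list lives in the index entry
        else:
            list_id = f"l{len(postings) + 1}"
            postings[list_id] = recs               # long list lives in the postings data
            entry["list_id"] = list_id             # index entry only points to it
        index[key] = entry
    return index, postings


def lookup(index, postings, term, index_id):
    """Return the record numbers for a term, following the pointer if needed."""
    entry = index.get((term, index_id))
    if entry is None:
        return []
    return entry.get("list") or postings[entry["list_id"]]


records = {
    15: {"au": "Abercrombie"},
    17: {"au": "Abercrombie"},
    995: {"au": "Anderson", "ti": "Tennis Made Easy"},
    1007: {"au": "Anderson"},
    1021: {"au": "Anderson"},
}
index, postings = build(records)
print(lookup(index, postings, "abercrombie", "au"))
print(lookup(index, postings, "anderson", "au"))
```

A lookup first consults the sorted index and only touches the postings data when the entry holds a list ID instead of an inline list.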
Pears Database Building
Data Conversion
• The Bartlett class is responsible for updating a
Pears database.
• Bartlett automatically converts input records to the
Pears internal BER format.
• The class of objects that do the conversion are
called RecordHandlers.
• RecordHandler is a Java Interface class
– You can write your own RecordHandlers!
Pears Database Building
Data Conversion Options
• There are two primary Pears RecordHandlers that
convert your data to BER format.
– HandleUSMARC
– HandleSGML
• There are several others:
– HandleBER, HandleDB, HandlePDB, HandleUnimarc,
HandleChinaMarc
Pears Database Building
Data Conversion
• The RecordHandler class has a main() method that
you can use to test RecordHandlers and/or your
data.
– Usage:
java ORG.oclc.RecordHandler.RecordHandler -c<class> -i<inputFile> -o<outputFile> …
– Example:
java ORG.oclc.RecordHandler.RecordHandler -cUSMARC -iscifi.usmarc -oscifi.ber -n10
Pears Database Building
Data Conversion
• BER (Basic Encoding Rules) is defined by ISO 8825
• It was created to encode ASN.1 records
• Encodes tree-structured data (equivalent to DOM
records)
• Can contain binary data (e.g. .jpeg files) (unlike
DOM records!)
BER Record Structure
Tree view:
tag=1
  tag=2
    tag=1 Ralph
    tag=2 LeVan
  tag=3 Ohio
  tag=4 OCLC
Dump view:
tag=1, Class=1, form=1, count=3
  tag=2, Class=2, form=1, count=2
    tag=1, Class=2, form=0, count=5
      data=Ralph
    tag=2, Class=2, form=0, count=5
      data=LeVan
  tag=3, Class=2, form=0, count=4
    data=Ohio
  tag=4, Class=2, form=0, count=4
    data=OCLC
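To make the tag, class, form and count columns concrete, here is a small sketch that holds the Ralph LeVan record as nested tuples, prints it in the same style as the dump, and encodes it with the standard BER identifier and length octets. It assumes that the dump's Class numbers are the two BER class bits (1 = application, 2 = context-specific) and that count means "number of children" for constructed nodes and "data length" for primitive ones. The helper names are invented for illustration and are not part of the ORG.oclc.ber package.

```python
def ber_identifier(cls, form, tag):
    """Identifier octets: class (2 bits), constructed flag (1 bit), tag number."""
    first = (cls << 6) | (form << 5)
    if tag < 31:
        return bytes([first | tag])
    # High-tag-number form: low five bits all ones, then base-128 digits,
    # every digit except the last carrying a continuation bit.
    digits = []
    while tag:
        digits.insert(0, tag & 0x7F)
        tag >>= 7
    head = [d | 0x80 for d in digits[:-1]]
    return bytes([first | 0x1F] + head + [digits[-1]])


def ber_length(n):
    """Length octets: short form below 128, long form otherwise."""
    if n < 128:
        return bytes([n])
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(raw)]) + raw


def encode(node):
    """node is (class, tag, value); value is a str (primitive) or a list of nodes."""
    cls, tag, value = node
    if isinstance(value, list):
        content = b"".join(encode(child) for child in value)
        form = 1
    else:
        content = value.encode("utf-8")
        form = 0
    return ber_identifier(cls, form, tag) + ber_length(len(content)) + content


def dump(node, depth=0):
    """Render a node in the style of the slide's dump view."""
    cls, tag, value = node
    pad = "  " * depth
    if isinstance(value, list):
        lines = [f"{pad}tag={tag}, Class={cls}, form=1, count={len(value)}"]
        for child in value:
            lines += dump(child, depth + 1)
        return lines
    return [f"{pad}tag={tag}, Class={cls}, form=0, count={len(value)}",
            f"{pad}  data={value}"]


record = (1, 1, [
    (2, 2, [(2, 1, "Ralph"), (2, 2, "LeVan")]),
    (2, 3, "Ohio"),
    (2, 4, "OCLC"),
])
print("\n".join(dump(record)))
print(encode(record).hex())
```

Tags above 30, such as MARC field numbers like 245, do not fit in the first identifier octet, which is why the sketch includes the multi-octet tag form.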
Pears Database Building
MARC Data Example
For USMARC data – (InputRecordtype=USMARC)
000 nmm Ia
001 ocm35003642
003 OCoLC
005 19000000000108.0
008 960628s1995 cau d eng d
040 $aFQM$cFQM
096 $aNTERNET
245 00 $aOphthalmic Anesthesia Society $h[computer file].
256 $aComputer data.
260 $a San Diego, CA : $b Ophthalmic Anesthesia Society, $c1995.
516 $aHtml text and images in GIF and JPeg.
538 $aSystem requirements: Html browser, JPeg compatible browser or image viewer.
538 $aMode of access: Internet. Host: www.iea.com/3dans/OAS/oasDhomepage.html
500 $aTitle from title screen.
521 $aMedical.
520 $aHome page of the Ophthalmic Anesthesia Society with articles, references, e-mail addresses of members, pictures and ophthalmic anesthesia resources.
650 02$aSocieties, Medical.
650 02$aOphthalmology.
650 02$aAnesthesia.
710 02$aOphthalmic Anesthesia Society.
856 07$u http://www.iea.com/3dans/OAS/oasDhomepage.html$2http$zOphthalmic Anesthesia Society home page
HandleUSMARC converts
this...
01981cam220034945000080041000000170024000410220014000650300011000790690020000901000020001
10100001300130110004800143245010800191260002900299500005700328500004600385500031600431520
05340074754600120128165000180129365000400131165000410135165000210139269000440141369000400
145790002301497690001801520690003001538690002501568690001601593773002201609^^000000s1993e
ng^_a0370-2693/93/$06.00^^ ^_a0370-2693^^ ^_aPYLBAJ^^ ^_aA9308-1385K-002^^ ^_aBrandenburg, A.
^^ ^_aMa, J.P.^^ ^_aInst. fur Theor. Phys., Heidelberg, Germany^^ ^_aCP odd observables for the top-antitop
system produced at proton-antiproton and proton-proton colliders^^ ^_aNetherlands^_c7 Jan. 1993^^ ^_a
SOURCE:Physics Letters B, vol.298, no.1-2, p. 211-17^^ ^_aTREATMENT: T; Theoretical or Mathematical^^
^_aCLASS CODES: A1385K (Inclusive reactions, including total cross sections, (energy > 10 GeV))^_aA1110E
(Lagrangian and Hamiltonian approach)^_aA1130E (Charge conjugation, parity, time reversal and other discret
symmetries)^_aA1340F (Electromagnetic form factors; electric and magnetic moments; structure functions)^^ ^_
aThe authors propose some CP odd observables to test CP invariance in the tt system produced at pp and pp colliders.
Using these observables the effects of CP violation from the production and from the decay of the top quarks can be
separated well. The application of their observables to pp collisions, where one has no CP invariant initial state, is
discussed. To parametrize CP violating interactions their use an effective lagrangian for the tt production and a general
form factor approach for the decay of t and t (19 Refs.)^^ ^_aEnglish^^ ^_aCP invariance^^ ^_aform factors
(elementary particles)^^ ^_aproton-proton inclusive interactions^^ ^_aquark production^^ ^_aantiproton+proton
producing antitop+top^^ ^_aproton+proton producing antitop+top^^ ^_aCP odd observables^^ ^_aCP invariance^^
^_aCP violating interactions^^ ^_aeffective lagrangian^^ ^_aform factor
...to this
tag=0, Class=1, form=1, count=22
tag=0, Class=2, form=0, count=8
data=nmm Ia
tag=245, Class=2, form=1, count=3
tag=0, Class=2, form=0, count=2
data=00
tag=1, Class=2, form=0, count=29
data=Ophthalmic Anesthesia Society
tag=8, Class=2, form=0, count=16
data=[computer file].
tag=260, Class=2, form=1, count=4
tag=0, Class=2, form=0, count=2
data=
tag=1, Class=2, form=0, count=15
data=San Diego, CA :
tag=2, Class=2, form=0, count=30
data=Ophthalmic Anesthesia Society,
tag=3, Class=2, form=0, count=5
data=1995.
tag=650, Class=2, form=1, count=2
tag=0, Class=2, form=0, count=2
data= 2
tag=1, Class=2, form=0, count=19
data=Societies, Medical.
tag=650, Class=2, form=1, count=2
tag=0, Class=2, form=0, count=2
data= 2
tag=1, Class=2, form=0, count=14
data=Ophthalmology.
tag=650, Class=2, form=1, count=2
tag=0, Class=2, form=0, count=2
data= 2
tag=1, Class=2, form=0, count=11
data=Anesthesia.
tag=710, Class=2, form=1, count=2
tag=0, Class=2, form=0, count=2
data=2
tag=1, Class=2, form=0, count=30
data=Ophthalmic Anesthesia Society.
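Judging only from the dump above, each MARC field appears to become a constructed node tagged with the field number, with the indicators stored under tag=0 and subfield `$a`, `$b`, `$c`, `$h` stored under tags 1, 2, 3 and 8. The sketch below reproduces that pattern. The letter-to-number rule and the trimming of surrounding spaces are inferred from the example, not taken from the HandleUSMARC source, and the function names are hypothetical.

```python
def subfield_tag(code):
    """Map 'a' -> 1, 'b' -> 2, ... as the dump suggests ($h shows up as tag=8)."""
    if len(code) != 1 or not "a" <= code <= "z":
        # The slides do not show how numeric subfield codes such as $2 are mapped.
        raise ValueError(f"no mapping shown in the slides for subfield ${code}")
    return ord(code) - ord("a") + 1


def field_to_node(tag, indicators, subfields):
    """Build (field_tag, [(child_tag, data), ...]) from one parsed MARC field."""
    children = [(0, indicators)]
    children += [(subfield_tag(code), text.strip()) for code, text in subfields]
    return (int(tag), children)


def dump(node):
    """Render the node in the same flat style as the slide."""
    tag, children = node
    lines = [f"tag={tag}, Class=2, form=1, count={len(children)}"]
    for child_tag, data in children:
        lines.append(f"tag={child_tag}, Class=2, form=0, count={len(data)}")
        lines.append(f"data={data}")
    return lines


title = field_to_node("245", "00", [
    ("a", "Ophthalmic Anesthesia Society "),
    ("h", "[computer file]."),
])
print("\n".join(dump(title)))
```

The resulting tag paths (245/1 for the title proper, 245/8 for the medium) are the values a database description file refers to in its tagpath entries.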
Pears Database Building
SGML Data Example
For SGML data – (InputRecordtype=SGML)
<Rec>
<Title>BEG - PANHANDLE COLOR INFRARED AERIAL PHOTOGRAPHY</Title>
<Abstract>TNRIS file no. 01010422. File consists of original and duplicate positive
transparencies, color-infrared, stereoscopic, 1:80,000, quad centered, aerial
photography of the Texas Panhandle, flown in September, 1977 by Mark Hurd. </Abstract>
<Spatial-Domain>
<Geographic-Coverage>US STATE</Geographic-Coverage>
<Coverage-Description>TEXAS PANHANDLE</Coverage-Description>
<Bounding-Coordinates>
<West-Bounding-Coordinate>-102</West-Bounding-Coordinate>
<East-Bounding-Coordinate>-98</East-Bounding-Coordinate>
<North-Bounding-Coordinate>30</North-Bounding-Coordinate>
<South-Bounding-Coordinate>26</South-Bounding-Coordinate>
</Bounding-Coordinates>
</Spatial-Domain>
<Time-Period>
<Time-Period-Textual>1977-1977</Time-Period-Textual>
</Time-Period>
<Name>BUREAU OF ECONOMIC GEOLOGY</Name>
<Organization>BUREAU OF ECONOMIC GEOLOGY</Organization>
</Rec>
.tags file
Title 1
Local-Subject-Index 2
Abstract 3
Spatial-Domain 4
Geographic-Coverage 1
Coverage-Description 2
Bounding-Coordinates 3
West-Bounding-Coordinate 1
East-Bounding-Coordinate 2
North-Bounding-Coordinate 3
South-Bounding-Coordinate 4
Time-Period 5
Time-Period-Textual 1
Name 6
Organization 7
Converted SGML
tag=0, Class=1, form=1, count=8
tag=1, Class=2, form=1, count=1
tag=1, Class=2, form=0, count=49
data=BEG - PANHANDLE COLOR INFRARED AERIAL PHOTOGRAPHY
tag=2, Class=2, form=1, count=1
tag=1, Class=2, form=0, count=35
data=AERIAL PHOTOGRAPHY; INFRARED; TEXAS
tag=3, Class=2, form=1, count=1
tag=1, Class=2, form=0, count=229
data=TNRIS file no. 01010422. File consists of original and duplicate positive transparencies, color-infrared, stereoscopic, 1:80,000, quad centered, aerial photography of the Texas Panhandle, flown in September, 1977 by Mark Hurd.
tag=4, Class=2, form=1, count=3
tag=1, Class=2, form=1, count=1
tag=1, Class=2, form=0, count=8
data=US STATE
tag=2, Class=2, form=1, count=1
tag=1, Class=2, form=0, count=15
data=TEXAS PANHANDLE
tag=3, Class=2, form=1, count=4
tag=1, Class=2, form=1, count=1
tag=1, Class=2, form=0, count=4
data=-102
tag=2, Class=2, form=1, count=1
tag=1, Class=2, form=0, count=3
data=-98
tag=3, Class=2, form=1, count=1
tag=1, Class=2, form=0, count=2
data=30
tag=4, Class=2, form=1, count=1
tag=1, Class=2, form=0, count=2
data=26
tag=5, Class=2, form=1, count=1
tag=1, Class=2, form=1, count=1
tag=1, Class=2, form=0, count=9
data=1977-1977
tag=6, Class=2, form=1, count=1
tag=1, Class=2, form=0, count=26
data=BUREAU OF ECONOMIC GEOLOGY
tag=7, Class=2, form=1, count=1
tag=1, Class=2, form=0, count=26
data=BUREAU OF ECONOMIC GEOLOGY
Pears Database Building
Viewing a BER record - BufferedBerStream
• BER records are not readable in their encoded form.
• BufferedBerStream is a class that includes main() that
dumps BER records in a human readable format.
usage: BufferedBerStream -i<input file> [-n<numrecs>] [-s<skiprecs>]
To see a page at a time:
BufferedBerStream -i<input file> | more
To dump to a file:
BufferedBerStream -i<input file> > filename
Exercise Configuration Information
• The database is in ~/dbs/scifi
• The jar files are in ~/jars
• Aliases are:
alias Bartlett 'java -Xmx800m ORG.oclc.pears.Bartlett.Bartlett'
alias BufferedBerStream 'java ORG.oclc.ber.BufferedBerStream'
alias IndexLoop 'java ORG.oclc.pears.util.IndexLoop'
alias RecordHandler 'java ORG.oclc.RecordHandler.RecordHandler'
alias testgwen 'java ORG.oclc.os.gwen.testgwen'
alias validate 'java ORG.oclc.pears.util.validate'
alias ZClient 'java com.k_int.z3950.client.ZClient'
alias ZServer 'java com.k_int.z3950.server.ZServer'
Exercise Configuration Information
• The CLASSPATH is:
setenv CLASSPATH
.:/home/levan/java:/home/levan/lib/pears.jar:/home/levan/lib/Dbutils.jar:
/home/levan/lib/ki-jzkit-z3950.jar:/home/levan/lib/ki-util.jar:
/home/levan/lib/log4j.jar:/home/levan/lib/a2jruntime.jar:
/home/levan/lib/ki-jzkit-iface.jar:/home/levan/lib/gwen.jar:
/home/levan/lib/xerces.jar
• All of this is in ~/.tcshrc. Just say “tcsh” at
the command line to get it.
Pears Database Building Exercise
Exercise 1:
Identifying Data in a BER Record
• Using the BER records generated from the MARC
data file:
dbs/scifi/scifi.usmarc
identify the tags used for the data.
(Hint: run RecordHandler to make the BER records
and then BufferedBerStream to look at them)
Designing and Building Databases
Topics
A Pears Database Building - Introduction
B Database Description File
C Building Databases
D Configuring and Testing
E Database Utilities and Maintenance
F Advanced Database Description Concepts
Database Description File
Function
• The database description is a text file that
you set up to determine:
– Database Indexing
– What Indexes support proximity searching
– What Index contains the unique recordID
• Known as the <filename>desc.ini file
Database Description File
File Example
[DB] section (database name, accession index, raw data type):
[DB]
Name=scifi
RecordIDIndex=17
InputRecordType=USMARC
Index definitions (index ID, indexing routine, field to be indexed):
[Title]
index=1
routine=ORG.oclc.pears.IndexRoutines.Words
tagpath*=245/1
tagpath*=245/2
[Author]
index=3
routine=ORG.oclc.pears.IndexRoutines.Words
tagpath*=100/1
tagpath*=100/2
tagpath*=700/1
[Control Number]
index=5
routine=ORG.oclc.pears.IndexRoutines.Words
tagpath=1
Database Description File
General Database Information
• The [DB] section provides the database name,
accession index and input record type
• Syntax:
– [DB]
– Name=<database name>
– RecordIDIndex=<index number>
– InputRecordType=<RecordHandler type>
Database Description File
General Database Information
Example:
[DB]
Name=Test
RecordIDIndex=1
InputRecordType=SGML
Database Description File
Setting up Index Definitions
• Any number of independent indexes can be
defined.
• An index can be made from multiple fields.
– Example: index 1 may include title, author, notes,
etc.
• Indexes can share fields.
– Example: index 2 may also include title
Database Description File
Setting up Index Definitions
• An index section is any section with Index,
Routine and Tagpath
• Syntax:
– [<Index Name>]
– Index=<index number>
– Routine=<index routine>
– Tagpath*=<path to field>
– OccurrenceRoutine=<proximity routine>
Database Description File
Setting up Index Definitions
• index number is any number
• Index routine defines how the term is extracted
- use ORG.oclc.pears.IndexRoutines.Words for basic
keywords
- use ORG.oclc.pears.IndexRoutines.Phrase for
basic bound phrases
• path to field contains a list of BER tags separated by
slashes
• occurrence routine (optional) specifies the routine to
add proximity information to the index
Database Description File
Index Definition
Example:
[Title Words]
Index=2
Routine=ORG.oclc.pears.IndexRoutines.Words
Tagpath*=245/1
Tagpath*=245/2
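Because a section may repeat keys such as `Tagpath*=`, a stock ini parser that rejects duplicate keys is a poor fit for inspecting a description file. The sketch below is a hand-rolled reader, written only for illustration: it treats key names case-insensitively (the slides use both `index=` and `Index=`) and collects starred keys into lists. Both behaviours are assumptions about the format, not documented Pears rules.

```python
def parse_desc(text):
    """Parse desc.ini-style text into {section: {key: value-or-list}}."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
            continue
        if current is None or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        key = key.lower()
        if key.endswith("*"):                    # repeatable key, e.g. Tagpath*=
            current.setdefault(key[:-1], []).append(value)
        else:
            current[key] = value
    return sections


def index_sections(sections):
    """An index section is any section with Index, Routine and Tagpath."""
    required = {"index", "routine", "tagpath"}
    return {name: body for name, body in sections.items()
            if required.issubset(body)}


sample = """
[DB]
Name=scifi
RecordIDIndex=17
InputRecordType=USMARC
[Title Words]
Index=2
Routine=ORG.oclc.pears.IndexRoutines.Words
Tagpath*=245/1
Tagpath*=245/2
"""
sections = parse_desc(sample)
for name, body in index_sections(sections).items():
    print(name, body["index"], body["tagpath"])
```

A reader like this is handy for Exercise 2 style questions, such as listing which indexes a description file will create.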
Database Description File
Term Adjacency (Optional)
• Defines positional information stored with
each indexed term.
• Adjacency information is stored at build time
on a per-record basis, so adjacency applies within
a field, NOT across field boundaries.
• Set by the OccurrenceRoutine.
• ORG.oclc.pears.Bartlett.wordfield is most
commonly used.
Database Description File
Index Definition with Adjacency
Example:
[Title Words]
Index=2
Routine=ORG.oclc.pears.IndexRoutines.Words
OccurrenceRoutine=ORG.oclc.pears.Bartlett.wordfield
Tagpath*=245/1
Tagpath*=245/2
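The effect of storing positional information can be sketched without any Pears code. In the toy below each posting carries a (field number, word number) pair, and two terms count as adjacent only when they sit next to each other in the same field. The data layout and function names are invented for illustration and say nothing about how the wordfield routine actually stores positions.

```python
from collections import defaultdict


def index_with_positions(records):
    """records: {recno: [field_text, ...]} -> {term: {recno: [(field, word), ...]}}."""
    postings = defaultdict(lambda: defaultdict(list))
    for recno, fields in records.items():
        for field_no, text in enumerate(fields):
            for word_no, word in enumerate(text.lower().split()):
                postings[word][recno].append((field_no, word_no))
    return postings


def adjacent(postings, first, second):
    """Records where `second` directly follows `first` inside one field."""
    hits = []
    for recno, places in postings.get(first, {}).items():
        following = set(postings.get(second, {}).get(recno, []))
        if any((field_no, word_no + 1) in following for field_no, word_no in places):
            hits.append(recno)
    return sorted(hits)


records = {
    1: ["Teenage Mutant Ninja Turtles"],
    2: ["The Ninja", "Turtles of the Sea"],   # words meet only across a field boundary
    3: ["Turtles and Ninja"],                 # same field, wrong order
}
postings = index_with_positions(records)
print(adjacent(postings, "ninja", "turtles"))
```

Record 2 contains both words but is not a hit, which mirrors the point that adjacency does not cross field boundaries.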
Database Description File
Global Stopwords
• List of terms NOT indexed
• Syntax:
[Stopwords]
index=0
routine=ORG.oclc.pears.IndexRoutines.StopwordEnforcer
tagpath=none
stopword*=<word>
Database Description File
Global Stopwords
• Example:
[Stopwords]
index=0
routine=ORG.oclc.pears.IndexRoutines.StopwordEnforcer
tagpath=none
stopword*=and
stopword*=the
Database Description File
Index Specific Stopwords
• Syntax:
[<index name>]
Index=<index number>
Routine=<index routine>
Tagpath*=<path to field>
Stopword*=<word>
Database Description File
Index Definition with Stopwords
Example:
[Title Words]
Index=2
Routine=ORG.oclc.pears.IndexRoutines.Words
OccurrenceRoutine=ORG.oclc.pears.Bartlett.wordfield
Tagpath*=245/1
Tagpath*=245/2
Stopword*=and
Stopword*=the
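As a rough picture of what stopwords do at build time, the sketch below extracts keywords from a field and drops any word found in either the global or the index-specific stopword list. The tokenising rule (lowercase runs of letters and digits) is an assumption chosen for the example; the normalisation performed by the real Words routine is not described in these slides.

```python
import re


def words(text, global_stop=(), index_stop=()):
    """Return the keywords of `text` that survive both stopword lists."""
    stop = {w.lower() for w in global_stop} | {w.lower() for w in index_stop}
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in stop]


GLOBAL_STOPWORDS = ["and", "the"]        # as in the [Stopwords] example
TITLE_STOPWORDS = ["of"]                 # hypothetical index-specific addition

print(words("The Cat and the Hat", GLOBAL_STOPWORDS))
print(words("A Tale of Two Cities", GLOBAL_STOPWORDS, TITLE_STOPWORDS))
```

Stopped words never reach the index, so they cannot be searched or browsed later without rebuilding.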
Database Description File
Exercise 2:
Identifying Database Description Indexes
• View the database description file
(dbs/scifi/scifidesc.ini) that has been created
for your student account. Identify what
indexes will be created from this file.
Designing and Building Databases
Topics
A Pears Database Building - Introduction
B Database Description File
C Building A Database
D Configuring and Testing
E Database Utilities and Maintenance
F Advanced Database Description Concepts
Building A Database
Program Steps
1.) Convert Input Data
2.) Store Records and Extract Index Terms
3.) Sort Extracted Terms
4.) Update Index and Postings
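The four program steps can be walked through with a toy pipeline. This is a sketch of the general convert, extract, sort, merge pattern, not Bartlett's implementation: `build_database`, `convert` and `extract` are hypothetical names, and the "database" is just a pair of dictionaries.

```python
from itertools import groupby


def build_database(raw_records, convert, extract):
    store, pairs = {}, []
    for recno, raw in enumerate(raw_records, start=1):
        record = convert(raw)                                  # 1. convert input data
        store[recno] = record                                  # 2. store the record ...
        pairs += [(term, recno) for term in extract(record)]   #    ... and extract index terms
    pairs.sort()                                               # 3. sort extracted terms
    index = {}
    for term, group in groupby(pairs, key=lambda pair: pair[0]):
        recnos = sorted({recno for _, recno in group})         # 4. update index and postings
        index[term] = {"postings": len(recnos), "records": recnos}
    return store, index


store, index = build_database(
    ["Tennis Made Easy", "  Easy Chess "],
    convert=str.strip,
    extract=lambda record: record.lower().split(),
)
for term, entry in index.items():
    print(term, entry)
```

Sorting before the update step means each term's postings can be written in one pass instead of revisiting the index once per record.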
Building a Pears Database
Program Steps - Illustrated
Inputs: desc.ini (Database Description) and Input Data
Program: Bartlett
Output: .pdb file (Database)
Building A Database
Bartlett
usage: Bartlett <dbname> -i<InputFileName> -d<dbdesc.ini>
[-n<numrecs>] [-s<skipnum>] [-t<numThreads>]
[-w<sorted nip filename>] [-fX]
where the -f flags (which turn things on) are:
-fg: guaranteed that all records are adds
-fn: printing to a file / use newlines
-fu: update the stored database description with a new one
All of the arguments are optional, but somehow you must specify an input
file and a database file. If you specify <dbname> then the others
default to -i<dbname>.recordType and -d<dbname>desc.ini
Building A Database
Validate a Database
• Use validate to verify the internal correctness of a database
• usage: java validate <dbname> [-count] [-records]
[-index] [-data] [-postings] [-regions] [-all]
-count means validate the record count
-records means validate the records and implies -count
-index means validate the index structure
-data means validate the data for each index term and implies -index
-postings means validate the postings list for each term and implies -data
-all means validate everything
Building a Database
Exercise 3
Build and validate the scifi database
– cd dbs/scifi
– type: Bartlett scifi
– type: validate scifi -all
Designing and Building Databases Topics
A. Pears Database Building - Introduction
B. Database Description File
C. Building A Database
D. Configuring and Testing
E. Database Utilities and Maintenance
F. Advanced Database Description Concepts
Configuring and Testing
Test using testgwen
testgwen is a command line search engine
that demonstrates how to embed
searching in your java applications
usage: testgwen -p<database.properties>
Configuring and Testing
Test using testgwen
scifi.properties:
database.name=scifi
implementation.class=ORG.oclc.os.pearsgwen.pDatabase
pearsgwen.inifileName=scifi.ini
#CQL Stuff
qualifier.srw.serverChoice= 1=1016
qualifier.dc.title= 1=4
structure.*= 4=6
Configuring and Testing
Test using testgwen
scifi.ini:
[Database]
ZBaseDbType=ORG.oclc.db.DbNewton
class=ORG.oclc.pears.pears
dbName= scifi
LongName = SiteSearch example USMARC database
pdbFile=scifi.pdb
# this allows for more than 1 attribute type BIB1, EXP1, ZDSR
[attributes]
type1=BIB1attributes
Configuring and Testing
Test using testgwen (scifi.ini continued)
[BIB1attributes]
OID=BIB1
default=words
parse_mode = 0
browse_default=0
stopwords= default
operator= 0
index* = titleWords
index* = subjectCategoryCodes
index* = authorWords
index* = titlePhrase
…
Configuring and Testing
Test using testgwen (scifi.ini continued)
[titleWords]
use=4
structure=2
alternateID=1
filter=ORG.oclc.pears.IndexRoutines.Words
[subjectCategoryCodes]
use=20
structure=2
alternateID=2
filter=ORG.oclc.pears.IndexRoutines.Words
Configuring and Testing
Test using testgwen
testgwen commands:
BROWSE
b[rowse] [numberOfTerms] [positionOfSeed] <browseTerm>
numberOfTerms defaults to 10
positionOfSeed defaults to numberOfTerms/2
example: b dc.author=smith
SEARCH
s[earch] <query>
example: s dog
DISPLAY DOCUMENT
d[ocument] [startpoint][-endpoint]
startpoint defaults to 1
endpoint defaults to 1
example: d 1
Configuring and Testing
testgwen testing suggestions
• Test the indexes with the browse command
• Browse the top and bottom of the index;
garbage in the records tends to go there
• Browse all of your indexes to verify your
indexing rules
• Test the postings lists with searches
Configuring and Testing
testgwen testing suggestions
• Test the records with ‘d’isplay commands
e.g. d 1 to view the first record from the latest
search
Configuring and Testing
Exercise 4
Test your scifi database using testgwen
• testgwen -pscifi.properties
• b dog
• b dc.author=smith
• s dc.title="ninja turtles"
• d
• q
Configuring and Testing
Expose your database using JZKit’s ZServer
JZKit is an OpenSource Z39.50 server and client
package
– http://www.k-int.com/products/jzkit/index.php
We have embedded gwen inside of the JZKit Server
through database interfaces provided in JZKit. This
allows the JZKit server to search Pears databases
Configuring and Testing
Expose your database using JZKit’s ZServer
Usage: ZServer <ZServer.PropertiesFile>
ZServer.props:
port=2105
evaluator=ORG.oclc.os.jzkit.GwenSearchable
Gwen.configuration=gwen.properties
#
# Record conversion configuration
#
XSLConverterConfiguratorClassName=com.k_int.IR.Syntaxes.Conversion.XMLConfigurator
ConvertorConfigFile=./SchemaMappings.xml
Configuring and Testing
Expose your database using JZKit’s ZServer
gwen.properties:
gwen.db1=scifi.properties
scifi.properties:
The same as for testgwen!
Configuring and Testing
Expose your database using JZKit’s ZServer
Converting your database records to Z39.50 records:
SchemaMappings.xml:
<SchemaMappings>
<templatesource directory="./mappings"/>
<mapping from="OCLCRecord" to="sutrs" sheet="naiveMarcBerToSutrs.xsl"/>
<mapping from="OCLCRecord" to="meta" sheet="naiveMarcBerToMeta.xsl"/>
<mapping from="meta" to="usmarc" sheet="meta_to_usmarc.xsl"/>
</SchemaMappings>
Configuring and Testing
Search your database using JZKit’s ZClient
usage: ZClient
Commands:
open hostname[:portnum] - Connect to z server on host[:port]
show n[+i] - show i records starting at n
find [rpn-string] - Process the supplied rpn query
base db1 [db2.....] - Search the specified databases
format [ xml|sutrs|grs..] - Ask the server for the specified kind of records
scan [rpn-string]
Configuring and Testing
Search your database using JZKit’s ZClient
usage: ZClient
rpn strings are composed as follows:
rpn-string = @attrset default-attrset expr
expr = [ attr-plus-term | boolean ]
attr-plus-term = attrdef [ attrdef...] { single-term | "quoted string" }
attrdef = @attr [attrset] attrtype=attrval
boolean = { @and | @or | @not } expr expr
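The grammar above is easy to get wrong by hand, so a tiny string builder can help when preparing `find` commands. The sketch below follows the grammar rule by rule; the function names are made up for this example, and it does not escape quote characters inside a term.

```python
def attr_plus_term(attrs, term):
    """attrs is a list of (attrtype, attrval) pairs; multi-word terms are quoted."""
    parts = [f"@attr {attrtype}={attrval}" for attrtype, attrval in attrs]
    if " " in term:
        term = f'"{term}"'
    return " ".join(parts + [term])


def boolean(op, left, right):
    """Prefix form: the operator comes first, then its two operand expressions."""
    if op not in ("and", "or", "not"):
        raise ValueError(f"unknown boolean operator: {op}")
    return f"@{op} {left} {right}"


def rpn(attrset, expr):
    return f"@attrset {attrset} {expr}"


simple = rpn("bib-1", attr_plus_term([(1, 1016), (4, 2)], "dog"))
combined = rpn("bib-1", boolean(
    "and",
    attr_plus_term([(1, 4)], "ninja turtles"),
    attr_plus_term([(1, 1003)], "smith"),
))
print("find " + simple)
print("find " + combined)
```

In the Bib-1 attribute set, Use attribute 4 is title, 1003 is author and 1016 is "any", which is why the examples pair those numbers with attribute type 1.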
Configuring and Testing
Exercise 5
• Start ZServer
– ZServer ZServer.props&
• Test the database files with ZClient
– ZClient
– open localhost:2105
– base scifi
– find @attrset bib-1 @attr 1=1016 @attr 4=2 dog
– quit
Designing and Building Databases
Topics
A. Pears Database Building - Introduction
B. Database Description File
C. Building Databases
D. Configuring and Testing
E. Database Utilities and Maintenance
F. Advanced Database Description Concepts
Database Utilities and Maintenance
General Database Information Report
IndexLoop:
usage: java IndexLoop <dbname> [-b<num>] [-d<num>] [-i<index>] [-n<num>] [-t<num>] [-f]
-b the number of terms from the bottom of the index to be returned
(default is 0)
-d the number of terms distributed through the index to be returned
(default is 0)
-n the number of the most highly posted terms to be returned
(default is 100)
-t the number of terms from the top of the index to be returned
(default is 0)
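To show what the top, bottom and most-posted selections mean, here is a toy report over a dictionary of term to postings count. It imitates only the -t, -b and -n options described above, with the same defaults; the function name and data are invented, and the real IndexLoop output format is not reproduced.

```python
import heapq


def report(index, top=0, bottom=0, most_posted=100):
    """index: {term: postings_count}. Returns the three selections as lists."""
    terms = sorted(index)                       # index terms in sorted order
    return {
        "top": terms[:top],
        "bottom": terms[len(terms) - bottom:] if bottom else [],
        "most_posted": heapq.nlargest(
            most_posted, index.items(), key=lambda item: item[1]),
    }


index = {"anderson": 102, "abercrombie": 2, "zelazny": 7, "asimov": 40}
result = report(index, top=1, bottom=1, most_posted=2)
for name, rows in result.items():
    print(name, rows)
```

The top and bottom of a sorted index are where malformed terms tend to collect, while the most highly posted terms are good stopword candidates.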
Database Utilities and Maintenance
Exercise 6:
Using the Database Utilities
• Run IndexLoop against the scifi database
– IndexLoop scifi
Designing and Building Databases
Topics
A Pears Database Building - Introduction
B Database Description File
C Building Databases
D Configuring and Testing
E Database Utilities and Maintenance
F Advanced Database Description Concepts
Advanced Database Concepts
Topics
• Restrictors
• Replacing and Deleting Records
Advanced Database Concepts
Record Restrictions
• Used to additionally qualify indexes.
• Speeds up Boolean searching.
• Can only be used in combination with another
search term.
• One database can have multiple restrictors
defined.
• Can be linked with a searchable index.
– by shared id
Advanced Database Concepts
Record Restrictions
• Practical with data that has a defined range.
– categories like publication type
– range like publication date
– language
• Binary value
– set on a per-record basis.
– stored in the postings entry for each extracted
term.
Advanced Database Concepts
Defining Record Restrictions
• Syntax:
[docrule<n>]
index=<index number>
routine=ORG.oclc.pears.Bartlett.termrest
parameters=<terms to use as restrictors>
• Example:
[docrule1]
index=24
routine=ORG.oclc.pears.Bartlett.termrest
parameters=english german french
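One way to picture a restrictor is as a small per-record bit mask that travels with every postings entry, so a search can be narrowed without consulting a second index. The sketch below assigns one bit to each term in the `parameters` line and filters a postings list by it. The bit layout and function names are assumptions for illustration; the slides do not describe how the termrest routine encodes its values.

```python
RESTRICTORS = ["english", "german", "french"]        # from parameters=...
BIT = {term: 1 << position for position, term in enumerate(RESTRICTORS)}


def restrictor_bits(record_terms):
    """Combine the bits for every restrictor term found in one record."""
    bits = 0
    for term in record_terms:
        bits |= BIT.get(term.lower(), 0)
    return bits


def search(postings, term, restrict_to=None):
    """postings: {term: [(recno, bits), ...]}; optionally keep only matching records."""
    entries = postings.get(term, [])
    if restrict_to is None:
        return [recno for recno, _ in entries]
    mask = BIT[restrict_to]
    return [recno for recno, bits in entries if bits & mask]


postings = {
    "tennis": [
        (1, restrictor_bits(["English"])),
        (2, restrictor_bits(["german"])),
        (3, restrictor_bits(["french", "english"])),
    ],
}
print(search(postings, "tennis"))
print(search(postings, "tennis", restrict_to="english"))
```

The restriction is applied while the postings list for the search term is being read, which is the sense in which a restrictor is used only in combination with another term.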
Advanced Database Concepts
Defining Record Restrictions
• Link to an index by using the same Id.
• routine - rule used for setting the
restriction.
• parameters - specific to restriction routine.
Advanced Database Concepts
Replace and Delete Records
• Unique record key is in index <RecordIDIndex>.
• If a record is added that has the same unique
record key as a previous record, then the new
record replaces the existing record.
• HandleUSMARC uses record status values
from the MARC fixed fields to delete records.
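The replace and delete rules amount to an upsert keyed on the unique record key. The sketch below models that with a dictionary. Treating a record status of `d` as "delete" follows the MARC 21 leader convention (position 05); whether HandleUSMARC looks at exactly that value is an assumption, and `apply_update` is a hypothetical name.

```python
def apply_update(db, key, record, status="n"):
    """Add, replace or delete one record, keyed by its unique record key."""
    if status == "d":                      # MARC record status 'd' means deleted
        db.pop(key, None)
        return "deleted"
    action = "replaced" if key in db else "added"
    db[key] = record                       # same key: new record replaces the old one
    return action


db = {}
log = [
    apply_update(db, "ocm35003642", {"245": "Ophthalmic Anesthesia Society"}, "n"),
    apply_update(db, "ocm35003642", {"245": "Ophthalmic Anesthesia Society (rev.)"}, "c"),
]
after_replace = dict(db)
log.append(apply_update(db, "ocm35003642", None, "d"))
print(log, db)
```

The same input file can therefore carry additions, corrections and deletions, as long as every record exposes its key in the index named by RecordIDIndex.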
Advanced Class Topics
• A class on Advanced Database Building will cover:
– Building databases with SGML data.
– Advanced restrictor concepts.
– Debugging of data errors.
– and more exciting topics too numerous to
mention.
Pears
Designing and Building Databases
...and that’s how you test your new database.
What questions do you have?