www.miketaylor.org.uk

Advanced CQL and Profiling
1. Esoteric CQL features:
– Word Anchoring
– Proximity
– Relation modifiers
– Boolean modifiers
2. Profiling
3. Prefix mapping
4. Defining relations
Mike Taylor <[email protected]>
CQL features: esoterica
“You are not expected to understand this.”
– comment in the Unix Version 6 source code.
The point is that new users are not required to understand
this, and may happily use CQL for many years – perhaps
forever – without needing to.
CQL esoterica: word anchoring
A word beginning with “^” must occur at the start of its
field. A word ending with “^” must occur at the end of
its field.
● dinosaur
– matches “the complete dinosaur”
● dinosaur^
– also matches
● ^dinosaur
– does not match
● the
– matches “the complete dinosaur”
● ^the
– also matches
● the^
– does not match
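The anchoring rule can be sketched in a few lines of Python. This is only an illustration, assuming a field is a run of whitespace-separated words compared case-insensitively; the function name `matches_anchored` is hypothetical, and real servers tokenise fields in their own ways.

```python
def matches_anchored(term: str, field: str) -> bool:
    """Return True if `term`, optionally anchored with ^, matches `field`.

    Assumption: words are split on whitespace and compared without case.
    """
    words = field.lower().split()
    anchored_start = term.startswith("^")   # ^word: must be the first word
    anchored_end = term.endswith("^")       # word^: must be the last word
    word = term.strip("^").lower()
    if not words or not word:
        return False
    if anchored_start and words[0] != word:
        return False
    if anchored_end and words[-1] != word:
        return False
    return word in words


if __name__ == "__main__":
    field = "the complete dinosaur"
    for term in ["dinosaur", "dinosaur^", "^dinosaur", "the", "^the", "the^"]:
        print(term, matches_anchored(term, field))
```

Run against the field “the complete dinosaur”, the six terms give match, match, no match, match, match, no match, in that order.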
CQL esoterica: proximity
The “prox” boolean, by default, requires its operands
to be next to each other, in either order:
● cervical prox vertebra
– equivalent to "cervical vertebra" or "vertebra cervical"
● (cervical or dorsal) prox vertebra
– equivalent to "cervical vertebra" or "dorsal vertebra" or "vertebra cervical" or "vertebra dorsal"
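The equivalences given for the default “prox” can be generated mechanically. The Python sketch below assumes each operand is a simple list of alternative words; `expand_adjacent` is a hypothetical helper, not part of any CQL toolkit.

```python
from itertools import product
from typing import List


def expand_adjacent(left: List[str], right: List[str]) -> List[str]:
    """List the two-word phrases that an unmodified prox stands for.

    Each operand is given as a list of alternative words (an OR group).
    Both orders are produced, because default prox is unordered.
    """
    forward = [f"{a} {b}" for a, b in product(left, right)]
    backward = [f"{b} {a}" for a, b in product(left, right)]
    return forward + backward


if __name__ == "__main__":
    print(expand_adjacent(["cervical"], ["vertebra"]))
    print(expand_adjacent(["cervical", "dorsal"], ["vertebra"]))
```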
CQL esoterica: proximity II
Modifiers can generalise the semantics of proximity:
● cervical prox/distance<=5 vertebrae
– within five words of each other
● cervical prox/distance=0/unit=sentence vertebrae
– within the same sentence
● cervical prox/distance>0/unit=paragraph vertebrae
– in different paragraphs
● cervical prox/ordered vertebrae
– in the specified order: exactly equivalent to "cervical vertebrae"
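How a server might evaluate the distance and ordering modifiers can be sketched as follows. This Python example handles only the word unit (sentences and paragraphs would need the text to be segmented first), and it assumes that adjacent words are at distance 1. The function `prox_match` and its parameters are hypothetical.

```python
import operator

# Comparison symbols that may follow "distance" in a prox modifier.
COMPARATORS = {
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    ">=": operator.ge,
    ">": operator.gt,
}


def prox_match(text, left, right, comparator="<=", distance=1, ordered=False):
    """Return True if `left` and `right` occur in `text` at a word distance
    satisfying `comparator distance`. Assumption: unit=word only, and
    neighbouring words are at distance 1."""
    words = text.lower().split()
    compare = COMPARATORS[comparator]
    left_positions = [i for i, w in enumerate(words) if w == left]
    right_positions = [i for i, w in enumerate(words) if w == right]
    for i in left_positions:
        for j in right_positions:
            if ordered and j <= i:
                continue  # prox/ordered: left operand must come first
            if compare(abs(j - i), distance):
                return True
    return False


if __name__ == "__main__":
    text = "the cervical and dorsal vertebrae of sauropods"
    print(prox_match(text, "cervical", "vertebrae", "<=", 5))
    print(prox_match(text, "cervical", "vertebrae"))
```

The defaults chosen here (unordered, distance at most 1) mirror the default behaviour described for plain “prox”.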
CQL esoterica: relation modifiers
Modifiers can refine the semantics of relations:
● title =/stem dig
– finds “dig”, “digging”, “dug”, etc.
● title any/relevant "dinosaur bird reptile"
– finds “sauropods”, “avian”, “crocodile”, “snake”, etc.
● author =/fuzzy tailor
– finds “Mike Taylor”
● phoneNumber exact/fuzzy "020 8348 6768"
– finds “020 8348 6769”
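Syntactically, a modified relation is a base relation followed by slash-separated modifiers. The Python sketch below shows one way to pull them apart; it is a simplification that works on a single whitespace-free token, and `split_modifiers` is a hypothetical name rather than a function from any CQL library.

```python
import re

# name, optionally followed by a comparison symbol and a value
MODIFIER = re.compile(r"([A-Za-z][\w.]*)(?:(<=|>=|<>|=|<|>)(.+))?")


def split_modifiers(token):
    """Split e.g. 'any/relevant' into ('any', [('relevant', None, None)]).

    Each modifier becomes a (name, comparator, value) tuple; the last two
    are None for bare modifiers such as 'stem' or 'fuzzy'.
    """
    base, *raw = token.split("/")
    modifiers = []
    for item in raw:
        if not item:
            continue  # tolerate an empty segment, e.g. a trailing slash
        match = MODIFIER.fullmatch(item)
        if match is None:
            raise ValueError(f"cannot parse modifier {item!r}")
        modifiers.append(match.groups())
    return base, modifiers


if __name__ == "__main__":
    for token in ["=/stem", "any/relevant", "exact/fuzzy",
                  "prox/distance<=5/unit=sentence"]:
        print(token, split_modifiers(token))
```

The same splitting applies to boolean operators, which is why the proximity example is included.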
CQL esoterica: relation modifiers II
Relation modifiers can be overloaded to specify extra
information about the term that the relation joins to the
index:
● createdDate >/isoDate "2004-03-12 09:45:00"
– the term is in ISO 8601 format.
● location within/geom.polygon "(12,46) (15,52)"
– the term indicates a polygon of two points (i.e. a straight line) rather than the corners of a rectangle.
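A server can use such modifiers to decide how to read the term before comparing it. The Python sketch below is illustrative only: `interpret_term` is a hypothetical function, and the point syntax it accepts is simply the one used in the example above.

```python
import re
from datetime import datetime

POINT = re.compile(r"\(\s*([-\d.]+)\s*,\s*([-\d.]+)\s*\)")


def interpret_term(term, modifiers):
    """Convert the raw term according to the relation modifiers given.

    Assumption: modifiers is a list of bare modifier names.
    """
    if "isoDate" in modifiers:
        # Accepts 'YYYY-MM-DD HH:MM:SS' as in the example query.
        return datetime.fromisoformat(term)
    if "geom.polygon" in modifiers:
        # Every "(x,y)" pair is one vertex of the polygon.
        return [(float(x), float(y)) for x, y in POINT.findall(term)]
    return term  # no special interpretation: leave as a string


if __name__ == "__main__":
    print(interpret_term("2004-03-12 09:45:00", ["isoDate"]))
    print(interpret_term("(12,46) (15,52)", ["geom.polygon"]))
```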
CQL esoterica: boolean modifiers
Modifiers can refine the semantics of boolean operators.
We've already seen some examples of this in proximity.
● cervical prox/distance<=5 vertebrae
– within five words of each other
● cervical or/exclusive vertebrae
– one or the other, but not both.
● denenberg or/rel.mean "information retrieval"
● denenberg or/rel.sum "information retrieval"
● denenberg or/rel.max "information retrieval"
– average, total or maximum relevance of operands
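The effect of these boolean modifiers can be sketched in Python. The sketch assumes each operand yields a set of record identifiers and a relevance score between 0 and 1; both helper names are hypothetical.

```python
from statistics import mean

# How each relevance modifier combines the scores of the two operands.
COMBINERS = {"rel.mean": mean, "rel.sum": sum, "rel.max": max}


def combine_relevance(modifier, left_score, right_score):
    """Combine two operand scores as the given or/rel.* modifier asks."""
    return COMBINERS[modifier]([left_score, right_score])


def exclusive_or(left_hits, right_hits):
    """or/exclusive: records matched by exactly one of the operands."""
    return set(left_hits) ^ set(right_hits)


if __name__ == "__main__":
    for modifier in ("rel.mean", "rel.sum", "rel.max"):
        print(modifier, combine_relevance(modifier, 0.25, 0.75))
    print(exclusive_or({1, 2, 3}, {3, 4}))
```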
Profiling CQL
For simple searching, it suffices to use common indexes.
Semantic interoperability requires more precisely specified behaviour.
This lesson was learned in the Z39.50 world, and resulted in
the invention of “profiles”: specifications of the subset of the
full standard that is needed to support a given application.
The classic example in Z39.50 is the Bath Profile for
bibliographic searching.
Similarly, we define a Bath Profile for CQL searching.
Profiles and context sets
A profile is not the same thing as a context set!
● A context set is merely a bag of indexes (and relation modifiers and boolean modifiers) that may be used in any application.
● A profile provides a palette of indexes drawn from several context sets.
The distinction is similar to that between XML namespaces and XML Schemas.
● Schemas depend on namespaces, and may use several.
● CQL profiles depend on context sets, and may use several.
Example: the Bath Profile
See http://zing.z3950.org/srw/bath/2.0/
Bath searches may use any of the following indexes:
dc.creator
dc.title
dc.subject
cql.anywhere
dc.identifier
dc.date
bath.keyTitle
dc.format
dc.language
bath.possessingInstitution
bath.name
bath.personalNam
bath.corporateNa
bath.conferenceName
bath.uniformTitle
bath.issn
rec.id
bath.geographicName
bath.notes
bath.topicalSubject
bath.genreForm
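The relationship between a profile and its context sets can be modelled as plain data. In the Python sketch below, the context-set contents are abridged and illustrative (they are not the full definitions), the palette is a small subset of the Bath list above, and all names are hypothetical.

```python
# Abridged, illustrative context sets: prefix -> index names it defines.
CONTEXT_SETS = {
    "dc": {"title", "creator", "subject", "description", "publisher",
           "identifier", "date", "format", "language"},
    "bath": {"keyTitle", "name", "uniformTitle", "issn", "notes"},
    "cql": {"anywhere"},
    "rec": {"id"},
}

# A profile's palette: indexes picked from several context sets.
BATH_PALETTE = {"dc.title", "dc.creator", "dc.subject", "cql.anywhere",
                "bath.keyTitle", "bath.issn", "rec.id"}


def profile_is_consistent(palette, context_sets):
    """Every index in the palette must exist in the context set it names."""
    for qualified in palette:
        prefix, _, name = qualified.partition(".")
        if name not in context_sets.get(prefix, set()):
            return False
    return True


def permitted(index, palette):
    """A query may use an index only if the profile's palette lists it."""
    return index in palette


if __name__ == "__main__":
    print(profile_is_consistent(BATH_PALETTE, CONTEXT_SETS))
    print(permitted("dc.title", BATH_PALETTE))
    print(permitted("dc.publisher", BATH_PALETTE))
```

Note that `dc.publisher` belongs to a context set but not to the palette: being defined somewhere is not the same as being part of the profile.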
Existing and possible profiles
Explicit CQL profiles have been created for some applications:
● Bath Profile for bibliographic data
● Zthes profile for hierarchical thesaurus navigation
Profiles are in development (or “unwritten”) for others:
● Google-like structureless searching
● Simple metadata searching with the Dublin Core
● CCG for collectable card games
● Music – musicalKey, arranger, duration, etc.
● GILS (Global Information Locator Service)
CQL esoterica: prefix mapping
So far, we have been free and easy with index prefixes
such as “dc”. But how do we know what they mean?
Why should “dc” mean Dublin Core rather than Deep
Custard?
● dc.custardDepth <= 20
Why should “bath” mean the Bath Profile for bibliographic
searching instead of plumbing supplies?
● bath.capacityInGallons > 45
CQL esoterica: prefix mapping II
Prefixes are just convenient, easy-to-type abbreviations.
The real identifier of a context set is its URI.
For example, the Dublin Core context set is
info:srw/cql-context-set/1/dc-v1.1
but we map that URI to a prefix for convenience.
This is exactly like XML namespaces: they are identified
by URIs, but the URIs do not appear in the names of
elements or attributes, where short prefixes are used instead.
CQL esoterica: prefix mapping III
In XML, a prefix is associated with a namespace using:
● <element xmlns:prefix="http://example.org/xyz/">
In CQL, a prefix is associated with a context set using:
● >prefix=http://example.org/xyz/
and the rest of the query follows.
The following queries are exactly equivalent:
● >dc=info:srw/cql-context-set/1/dc-v1.1 dc.title=fish
● >yx=info:srw/cql-context-set/1/dc-v1.1 yx.title=fish
Most applications will have established default mappings.
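Why the two queries are equivalent becomes clear once the prefix is resolved to its URI. The Python sketch below handles only the simple shape used here (one mapping, then one `index=term` clause); `resolve` is a hypothetical helper, not a full CQL parser.

```python
def resolve(query):
    """Turn '>prefix=uri prefix.index=term' into (uri, index, term).

    Assumption: exactly one prefix mapping followed by one clause that
    uses '=' as its relation, with no spaces inside either part.
    """
    mapping, clause = query.split(None, 1)
    prefix, _, uri = mapping.lstrip(">").partition("=")
    index, _, term = clause.partition("=")
    used_prefix, _, name = index.partition(".")
    if used_prefix != prefix:
        raise ValueError(f"prefix {used_prefix!r} has no mapping")
    # The prefix itself is discarded: only the URI identifies the set.
    return uri, name, term


if __name__ == "__main__":
    a = resolve(">dc=info:srw/cql-context-set/1/dc-v1.1 dc.title=fish")
    b = resolve(">yx=info:srw/cql-context-set/1/dc-v1.1 yx.title=fish")
    print(a)
    print(a == b)
```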
CQL esoterica: prefix mapping IV
It is possible to establish the context set from which
indexes with no explicit prefix are taken by omitting the
“prefix=” part from the mapping:
● >http://example.org/heraldry/
  title=baron and side=sinister
So the following queries are exactly equivalent:
● >info:srw/cql-context-set/1/dc-v1.1 title=fish
● >yx=info:srw/cql-context-set/1/dc-v1.1 yx.title=fish
CQL esoterica: prefix mapping V
Finally ... Finally! :-)
Prefix mappings can be stacked up:
● >dc=info:srw/cql-context-set/1/dc-v1.1
  >bath=http://zing.z3950.org/cql/bath/2.0/
  >rec=info:srw/cql-context-set/2/rec-1.0
  rec.created < 2004-10-09 and
  dc.title=ecology and
  bath.conferenceName=dinosaur
(Yes, this is all one query.)
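Reading a stack of mappings off the front of a query can be sketched as a loop. This Python example also covers the default mapping with no “prefix=” part, which it stores under the key `None`. The names `read_mappings` and `resolve_index` are hypothetical, and the pattern is a simplification, not the real CQL grammar.

```python
import re

# '>' then an optional 'prefix =' and finally the URI (no spaces inside).
MAPPING = re.compile(r"\s*>\s*(?:(\w+)\s*=\s*)?(\S+)")


def read_mappings(query):
    """Return ({prefix: uri}, rest_of_query); a default mapping has key None."""
    mappings = {}
    position = 0
    while True:
        match = MAPPING.match(query, position)
        if match is None:
            break  # no further '>' mapping: the query proper starts here
        prefix, uri = match.groups()
        mappings[prefix] = uri
        position = match.end()
    return mappings, query[position:].strip()


def resolve_index(index, mappings):
    """Map 'prefix.name' (or a bare 'name') to (context-set URI, name)."""
    if "." in index:
        prefix, _, name = index.partition(".")
        return mappings[prefix], name
    return mappings.get(None), index


if __name__ == "__main__":
    query = (">dc=info:srw/cql-context-set/1/dc-v1.1 "
             ">bath=http://zing.z3950.org/cql/bath/2.0/ "
             ">rec=info:srw/cql-context-set/2/rec-1.0 "
             "rec.created < 2004-10-09 and dc.title=ecology and "
             "bath.conferenceName=dinosaur")
    mappings, rest = read_mappings(query)
    print(mappings)
    print(rest)
    print(resolve_index("bath.conferenceName", mappings))
```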
CQL esoterica: prefix mapping VI
Don't try this at home.
Defining relations
CQL has a “feature” where any word can act as a relation.
For example, the query:
foo bar baz
is interpreted as index-name “foo”, relation “bar”, term
“baz” – even though there is no relation “bar”.
This is a misfeature: it prevents the obvious interpretation
of this query as a phrase search or AND search.
If your profile needs a new relation, consider defining it as
a relation modifier on one of the existing relations instead.
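The three-word interpretation can be illustrated with a toy parser in Python. It is a sketch only: `SLIDE_RELATIONS` lists just the relations that appear in these slides, not the full CQL set, and `parse_search_clause` is a hypothetical name.

```python
# Assumption: only the relations used in this talk, not the complete list.
SLIDE_RELATIONS = {"=", "<", ">", "<=", "exact", "any", "within"}


def parse_search_clause(query):
    """Read a three-word query as index, relation, term, as CQL does."""
    parts = query.split()
    if len(parts) != 3:
        raise ValueError("this sketch handles three-word queries only")
    index, relation, term = parts
    base = relation.split("/")[0]  # ignore any relation modifiers
    return {
        "index": index,
        "relation": relation,
        "term": term,
        "known_relation": base in SLIDE_RELATIONS,
    }


if __name__ == "__main__":
    print(parse_search_clause("foo bar baz"))
    print(parse_search_clause("title any/relevant dinosaur"))
```

For “foo bar baz” the middle word lands in the relation slot even though no such relation exists, which is the misfeature described above.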
Thanks for listening!