Mark Logic Overview August 25, 2004



Mark Logic Introduction
September 14, 2005
Copyright © 2005 MarkLogic Corporation. All rights reserved.
Slide 1
Topics
Introducing MarkLogic Corporation
How MarkLogic helps information providers
Introducing MarkLogic Server
Search functionality in depth
Conclusion
Slide 2
MarkLogic History
Founders foresaw a perfect storm for content
Emergence of XML as dominant markup standard
General dissatisfaction with search engine inflexibility
Limited support for content in RDBMSs
XML content landing in files with no way to process it
Why did we tag all this stuff in the first place?
Saw XQuery as an open, standard solution
Oracle, SQL, data → MarkLogic, XQuery, content
Founders joined forces in February 2001 to create MarkLogic
Slide 3
MarkLogic Corporate Overview
Founded by Chris Lindblad (InfoSeek) and Paul Pedersen (Google)
Approximately 50 employees
Headquartered in San Mateo, California
Rapidly growing: > 300% sales growth in 2004
Privately held: $18M raised in two financing rounds
Top-tier backers: Sequoia Capital and Lehman Brothers
Apple, Documentum, Google, Informix, Oracle, Symantec, Yahoo
Experienced management
Business Objects, Google, InfoSeek, Oracle, PayPal, US Web
Four patents pending
Slide 4
MarkLogic Customers
Adexs
Alberta Learning
Air Force Research Laboratory
Cedars Sinai Medical Center
Defense Technical Information Center
Defense Information Systems Agency
Elsevier
New England Journal of Medicine
Nerac
O’Reilly Media
Thomson Findlaw
United States Army
University of Virginia Press
Wolters Kluwer
Slide 5
Topics
Introducing MarkLogic Corporation
How MarkLogic helps information providers
Introducing MarkLogic Server
Search functionality in depth
Conclusion
Slide 6
Information Provider Industry Challenges
Internet-driven industry transformation
What’s next after print?
Lack of agility in new product development
Changing times demand rapid experimentation
Inability to repurpose and/or integrate content
Want to leverage existing content to make new products
Differentiation from Google search
Commodity basic search: Internet, journals, books (coming)
Loss of brand/site – content provider for Google
The open access movement in scholarly publishing
Seismic changes driven by tight budgets and academic pushback
“It’s about the content, not the container!”
Slide 7
How MarkLogic Helps Information Providers
We accelerate the creation of new information products
Content repurposing
Using the same content in multiple products
Content integration
Building products with content from different sources
Content delivery
Delivering content to multiple output formats and devices
Custom publishing
User-driven creation of unique information products
Content mining
Information discovery
Slide 8
Topics
Introducing MarkLogic Corporation
How MarkLogic helps information providers
Introducing MarkLogic Server
Search functionality in depth
Conclusion
Slide 9
Introducing MarkLogic Server
The industry’s leading XML content server
Query
Standard text search
Element-level XML search
Native XQuery interface
Manipulate
Navigate within content
Modify content programmatically
Combine content from multiple sources
Render
Transform content between XML schemas or DTDs
Output to various formats and devices
Slide 10
Native XQuery Support
MarkLogic supports XQuery as its native interface
Query language designed for querying XML data and content
An open, W3C standard
Example content query: quality assurance
Find all medical procedures in which the first incision is not preceded by anesthesia
for $proc in /book/section[title = "Procedure"]
where exists($proc//incision)
  and not(some $a in $proc//anesthesia
          satisfies $a << ($proc//incision)[1])
return $proc
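The quality-assurance query depends on XQuery's document-order operator `<<`. As a rough illustration outside MarkLogic, the same check can be sketched in Python with the standard-library ElementTree. The sketch assumes a toy document whose element names (book, section, title, incision, anesthesia) mirror the query, and the sample data is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical sample content; element names mirror the XQuery example.
SAMPLE = """
<book>
  <section><title>Procedure</title>
    <step><anesthesia/></step><step><incision/></step>
  </section>
  <section><title>Procedure</title>
    <step><incision/></step><step><anesthesia/></step>
  </section>
  <section><title>Procedure</title><step><anesthesia/></step></section>
  <section><title>Notes</title><step><incision/></step></section>
</book>
"""


def flagged_procedures(book: ET.Element) -> List[ET.Element]:
    """Return Procedure sections whose first incision has no anesthesia before it."""
    flagged = []
    for proc in book.findall("section"):
        # Like [title = "Procedure"]: true if any title child equals "Procedure".
        if not any(t.text == "Procedure" for t in proc.findall("title")):
            continue
        # iter() yields the subtree in document order, which is what << compares.
        in_order = list(proc.iter())
        incision_at = [i for i, el in enumerate(in_order) if el.tag == "incision"]
        if not incision_at:
            continue  # corresponds to exists($proc//incision)
        before_first_incision = in_order[: incision_at[0]]
        if not any(el.tag == "anesthesia" for el in before_first_incision):
            flagged.append(proc)
    return flagged


book = ET.fromstring(SAMPLE.strip())
print(len(flagged_procedures(book)))  # 1: only the second section is flagged
```

A procedure with no incision at all is skipped rather than reported. That matches the `exists($proc//incision)` guard in the query.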
Slide 12
Manipulate Content
Navigate within content
Walk the document's tree structure, for example to:
Create a breadcrumb trail to the top of the document
Move to adjacent paragraphs, illustrations, tables, or captions
Modify content programmatically
Translate content to different languages
Alphabetize index terms and produce new index sheet
Summarize by returning lead paragraphs or topic sentences
Combine content from multiple sources
Nested queries across content sources
Create common index across content from multiple sources
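Navigation such as a breadcrumb trail needs a way to move from a node up to its ancestors. Python's ElementTree keeps no parent pointers, so this hypothetical sketch first builds a child-to-parent map. It then uses that map to produce a breadcrumb and to step to the adjacent element. The `title` attributes and element names are assumptions for illustration and do not describe MarkLogic behavior.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

ParentMap = Dict[ET.Element, ET.Element]


def parent_map(root: ET.Element) -> ParentMap:
    """Map every element to its parent (ElementTree stores only child links)."""
    return {child: parent for parent in root.iter() for child in parent}


def breadcrumb(node: ET.Element, parents: ParentMap) -> List[str]:
    """Labels from the root down to `node`, using a title attribute if present."""
    trail = []
    current: Optional[ET.Element] = node
    while current is not None:
        trail.append(current.get("title") or current.tag)
        current = parents.get(current)
    return list(reversed(trail))


def next_sibling(node: ET.Element, parents: ParentMap) -> Optional[ET.Element]:
    """The adjacent following element (e.g. the next paragraph), or None."""
    parent = parents.get(node)
    if parent is None:
        return None
    siblings = list(parent)
    i = siblings.index(node)
    return siblings[i + 1] if i + 1 < len(siblings) else None


doc = ET.fromstring(
    '<book title="Surgery"><chapter title="Hand">'
    '<section title="Wrist"><p>First</p><p>Second</p></section>'
    "</chapter></book>"
)
parents = parent_map(doc)
first_p = doc.find(".//p")
print(" > ".join(breadcrumb(first_p, parents)))  # Surgery > Hand > Wrist > p
print(next_sibling(first_p, parents).text)       # Second
```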
Slide 13
Render Content
Flexibly output content for multi-channel delivery
XHTML for web browsers
XSL-FO for PDF generation, custom publishing
WML for mobile devices
Office XML for Microsoft Office documents
High-performance, server-based transformations
Performed close to the content
Faster than XSLT
Slide 14
Topics
Introducing MarkLogic Corporation
How MarkLogic helps information providers
Introducing MarkLogic Server
Search functionality in depth
Conclusion
Slide 15
Search Processing Model
(1) Specify a search using composable constructors
    cts:and-query(("wrist", "injury"))
    cts:or-query((cts:and-query(("cat", "scratch")),
                  cts:and-query(("dog", "bite"))))
    cts:and-not-query("United States", "Texas")
    cts:element-query(xs:QName("Year"),
                      cts:or-query(("1980", "1981")))
(2) Define a searchable set of nodes
    //MedlineCitation[Journal/JournalIssue/PubDate/Year = "1980"]
(3) Apply the search query to the nodes
    cts:search(//MedlineCitation,
               cts:and-query(("wrist", "injury")))
(4) Return the results in relevance order
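The four steps can be mimicked in a few lines of Python to show how composable constructors nest and how matches come back ranked. This is a toy model only. The class names are hypothetical stand-ins for the cts constructors, and the term-frequency scoring is an assumption, not MarkLogic's relevance formula.

```python
import re
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union


# (1) Hypothetical composable constructors, loosely modeled on cts:*-query.
@dataclass(frozen=True)
class Word:
    text: str


@dataclass(frozen=True)
class And:
    queries: Tuple["Query", ...]


@dataclass(frozen=True)
class Or:
    queries: Tuple["Query", ...]


@dataclass(frozen=True)
class AndNot:
    positive: "Query"
    negative: "Query"


Query = Union[Word, And, Or, AndNot]


def tokens(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


def score(q: Query, toks: Sequence[str]) -> float:
    """0 means no match; otherwise a toy term-frequency relevance score."""
    if isinstance(q, Word):
        phrase = tokens(q.text)
        n = len(phrase)
        if n == 0:
            return 0.0
        return float(sum(1 for i in range(len(toks) - n + 1)
                         if list(toks[i:i + n]) == phrase))
    if isinstance(q, And):
        parts = [score(s, toks) for s in q.queries]
        return sum(parts) if all(parts) else 0.0
    if isinstance(q, Or):
        return sum(score(s, toks) for s in q.queries)
    if isinstance(q, AndNot):
        return 0.0 if score(q.negative, toks) else score(q.positive, toks)
    raise TypeError(f"unknown query: {q!r}")


def search(nodes: Sequence[str], q: Query) -> List[str]:
    # (3) Apply the query to each node; (4) return matches by descending score.
    scored = [(score(q, tokens(n)), n) for n in nodes]
    hits = [(s, n) for s, n in scored if s > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [n for _, n in hits]


# (2) A hypothetical searchable set of nodes.
citations = [
    "Wrist pain without injury.",
    "Wrist injury after a fall; wrist fracture suspected.",
    "Knee injury in athletes.",
]
print(search(citations, And((Word("wrist"), Word("injury")))))
```

In this model, the citation that mentions "wrist" twice outranks the one that mentions it once. The knee citation fails the conjunction and is dropped.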
Slide 16
Free Text Search
cts:word-query($text as xs:string,
               [$options as xs:string*],
               [$weight as xs:double])
  as cts:word-query
Options include:
"case-sensitive": specifies a case-sensitive query
"case-insensitive": specifies a case-insensitive query
"punctuation-sensitive": specifies a punctuation-sensitive query
"punctuation-insensitive": specifies a punctuation-insensitive query
"stemmed": specifies a stemmed query
"unstemmed": specifies an unstemmed query
"wildcarded": specifies a wildcarded query
"unwildcarded": specifies an unwildcarded query
"lang=en": specifies the query language, e.g. English
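To make the sensitivity options concrete, here is a small Python sketch of how a single word query might compare text under different option strings. It is a hypothetical model, not MarkLogic's tokenizer, and its tokenization rules are assumptions. When neither option of a pair is given, the sketch simply falls back to case-insensitive and punctuation-insensitive matching; that fallback is also an assumption.

```python
import re
from typing import Iterable, List


def normalize(text: str, options: Iterable[str]) -> List[str]:
    """Tokenize text for comparison under the given option strings."""
    opts = set(options)
    if "punctuation-sensitive" in opts:
        # Keep each punctuation character as its own token so it must match.
        toks = re.findall(r"\w+|[^\w\s]", text)
    else:
        # Punctuation-insensitive (the fallback here): keep word characters only.
        toks = re.findall(r"\w+", text)
    if "case-sensitive" not in opts:
        toks = [t.lower() for t in toks]
    return toks


def word_query_matches(query: str, text: str, options: Iterable[str] = ()) -> bool:
    """True if the query's tokens occur as a contiguous run in the text's tokens."""
    opts = list(options)
    q = normalize(query, opts)
    t = normalize(text, opts)
    n = len(q)
    return n > 0 and any(t[i:i + n] == q for i in range(len(t) - n + 1))


print(word_query_matches("Genetic-Engineering", "genetic engineering",
                         ["punctuation-insensitive"]))  # True
print(word_query_matches("Genetic Engineering", "genetic engineering",
                         ["case-sensitive"]))           # False
```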
Slide 17
Boolean Queries
cts:and-query()
conjunction of an arbitrary list of sub-queries
cts:or-query()
disjunction of an arbitrary list of sub-queries
cts:and-not-query()
relative complement of two sub-queries: matches the first but not the second
cts:not-query()
complement of a single sub-query
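Over an inverted index, these four Boolean forms reduce to set operations on the document IDs each sub-query matches. The Python sketch below assumes that model, using a hypothetical toy corpus and whitespace tokenization. It illustrates the set algebra, not how MarkLogic stores or evaluates its indexes.

```python
from typing import Dict, Set

# Hypothetical toy corpus: document id -> text.
DOCS: Dict[int, str] = {
    1: "cat scratch fever",
    2: "dog bite first aid",
    3: "cat and dog care",
    4: "united states map",
}


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Inverted index: each word maps to the set of document ids containing it."""
    index: Dict[str, Set[int]] = {}
    for doc_id, text in docs.items():
        for w in text.lower().split():
            index.setdefault(w, set()).add(doc_id)
    return index


INDEX = build_index(DOCS)
ALL_IDS: Set[int] = set(DOCS)


def word(w: str) -> Set[int]:
    return set(INDEX.get(w.lower(), set()))


def and_query(*subs: Set[int]) -> Set[int]:
    """Conjunction: documents matching every sub-query."""
    return ALL_IDS.intersection(*subs)


def or_query(*subs: Set[int]) -> Set[int]:
    """Disjunction: documents matching any sub-query."""
    return set().union(*subs)


def and_not_query(positive: Set[int], negative: Set[int]) -> Set[int]:
    """Relative complement: matches the first sub-query but not the second."""
    return positive - negative


def not_query(sub: Set[int]) -> Set[int]:
    """Complement: every document that does not match the sub-query."""
    return ALL_IDS - sub


print(sorted(or_query(and_query(word("cat"), word("scratch")),
                      and_query(word("dog"), word("bite")))))  # [1, 2]
```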
Slide 18
Linguistic Controls
"case-sensitive" and "case-insensitive" query options
Configuration option to add case-sensitive index terms
cts:word-query("Genetic Engineering", "case-insensitive")
"punctuation-sensitive" and "punctuation-insensitive" query options
Configuration option to add punctuation-sensitive index terms
cts:word-query("Genetic-Engineering", "punctuation-insensitive")
Stemming: "stemmed" and "unstemmed" query options
Stemming does not cross different parts of speech
Thesaurus: XML Schema, query expansion
Slide 19
Spelling
Double-metaphone
Spelling suggestions
spell:suggest("/mySpell/spell.xml","alfabet")
Spell checking
spell:is-correct("/mySpell/spell.xml","alfabet")
Dictionary load and management
spell:load("c:\dictionaries\spell.xml", "/mySpell/spell.xml")
spell:add-word("/mySpell/spell.xml", "uxorious")
spell:remove-word("/mySpell/spell.xml", "atomise")
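According to the slide, MarkLogic's suggestions use Double Metaphone, a phonetic encoding. The Python sketch below swaps in a different and simpler technique, Levenshtein edit distance, purely to illustrate the suggest / is-correct / add / remove workflow. The `Dictionary` class and its method names are hypothetical; they are not the `spell:` module API.

```python
from typing import Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute if different
        prev = cur
    return prev[-1]


class Dictionary:
    """Hypothetical in-memory word list (not MarkLogic's spell dictionary)."""

    def __init__(self, words: Iterable[str]):
        self.words = {w.lower() for w in words}

    def add_word(self, w: str) -> None:
        self.words.add(w.lower())

    def remove_word(self, w: str) -> None:
        self.words.discard(w.lower())

    def is_correct(self, w: str) -> bool:
        return w.lower() in self.words

    def suggest(self, w: str, max_distance: int = 2, limit: int = 5) -> List[str]:
        """Closest dictionary words, nearest first, within max_distance edits."""
        w = w.lower()
        ranked = sorted((edit_distance(w, cand), cand) for cand in self.words)
        return [cand for d, cand in ranked if d <= max_distance][:limit]


d = Dictionary(["alphabet", "alpha", "bet", "atomise"])
print(d.is_correct("alfabet"))  # False
print(d.suggest("alfabet"))     # ['alphabet']
```

A phonetic key such as Double Metaphone groups words that sound alike, which edit distance does not capture. A production speller might combine both signals.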
Slide 20
Wildcards
A*, *B, A*B, A?, ?B, A?B, A*B*C, A*B?C*, etc.
Regular expression optimization
For example:
cts:search(input(), cts:word-query("he*"))
will result in a wildcard search
Character indexing provides optimization for fn:contains(), fn:matches(), fn:starts-with(), and fn:ends-with()
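One way to picture wildcard expansion is to turn the pattern into a regular expression and test it against the terms in a word lexicon. The Python sketch assumes that model and case-insensitive matching. It does not describe MarkLogic's internal wildcard or character indexes, and the term list is hypothetical.

```python
import re
from typing import Iterable, List


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """'*' matches any run of characters (including none), '?' exactly one."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.IGNORECASE)


def expand(pattern: str, lexicon: Iterable[str]) -> List[str]:
    """Return the lexicon terms that the whole pattern matches, sorted."""
    rx = wildcard_to_regex(pattern)
    return sorted(term for term in lexicon if rx.fullmatch(term))


terms = ["he", "help", "hello", "the", "shell", "herb"]
print(expand("he*", terms))  # ['he', 'hello', 'help', 'herb']
```

The expanded terms can then be OR-ed together as ordinary word queries.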
Slide 21
Proximity Queries
cts:near-query($queries, $distance, $ordered, $weight)
A result matches if all of the sub-queries match and the matches lie within the specified distance (in words) of each other. A distance of 0 matches only when the matching text overlaps. The default distance is 100.
For example,
cts:search(//p,
  cts:near-query(
    (cts:word-query("James"),
     cts:word-query("Maxwell")), 2))
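The distance rule can be sketched over word positions. In this hypothetical Python model, distance is the difference between word positions, so adjacent words are at distance 1 and a distance of 0 requires the same position. The `ordered` flag and the brute-force search are illustrative assumptions, not MarkLogic's implementation.

```python
import re
from itertools import product
from typing import List, Sequence


def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


def near(tokens: Sequence[str], words: Sequence[str],
         distance: int = 100, ordered: bool = False) -> bool:
    """True if every word occurs and one occurrence of each can be chosen so
    that all chosen positions lie within `distance` words of each other (and,
    when `ordered` is set, appear in the same order as `words`)."""
    occurrences = [[i for i, t in enumerate(tokens) if t == w.lower()]
                   for w in words]
    if any(not occ for occ in occurrences):
        return False
    # Brute force over every combination: clear, but exponential in len(words).
    for combo in product(*occurrences):
        if ordered and list(combo) != sorted(combo):
            continue
        if max(combo) - min(combo) <= distance:
            return True
    return False


toks = tokenize("James Clerk Maxwell unified electricity and magnetism")
print(near(toks, ["James", "Maxwell"], 2))  # True: positions 0 and 2
print(near(toks, ["James", "Maxwell"], 1))  # False
```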
Slide 22
Proximity Queries
For example,
cts:search(//p,
  cts:near-query((
    cts:near-query(("James", "Maxwell"), 2),
    cts:near-query(("Albert", "Einstein"), 2),
    cts:near-query(("Lorentz", "Contraction"), 2)
  ), 50, "unordered"))
Slide 23
Beyond Free Text Search
XML Query / search integration
XML granular search
XPath constraints
Rich interaction between text and structural constraints
Free access to all fields and combinations of constraints
XML searchable database
Integrate data, metadata, search and update
Slide 24
Range Queries
Numeric range queries are optimized with range indexes
//article[date <= xs:dateTime("2002-10-10T17:00:00Z")]
Lexicographic range queries, likewise
//article[("A" <= name) and (name < "B")]
Sort optimization uses range indexes to return results already in order, eliminating a separate post-retrieval sort
for $x in //article
order by $x/last/name, $x/first/name
return <li>{ $x/date }</li>
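Both ideas on this slide, range lookups and sorting without a post-sort, follow from keeping values in sorted order. The hypothetical Python sketch below models a range index as sorted (value, document id) pairs searched with `bisect`. It is an illustration of the principle, not MarkLogic's index structure, and the names and IDs are made up.

```python
import bisect
from typing import Any, Iterable, List, Optional, Tuple


class RangeIndex:
    """Hypothetical range index: (value, doc_id) pairs kept sorted by value.

    A range query becomes two binary searches, and the hits come back already
    ordered by value, so an order-by on that value needs no separate sort.
    """

    def __init__(self, pairs: Iterable[Tuple[Any, int]]):
        self.entries = sorted(pairs)
        self.values = [value for value, _ in self.entries]

    def scan(self, lo: Optional[Any] = None, hi: Optional[Any] = None,
             hi_inclusive: bool = False) -> List[int]:
        """Doc ids with lo <= value < hi (or value <= hi when hi_inclusive)."""
        start = 0 if lo is None else bisect.bisect_left(self.values, lo)
        if hi is None:
            end = len(self.values)
        elif hi_inclusive:
            end = bisect.bisect_right(self.values, hi)
        else:
            end = bisect.bisect_left(self.values, hi)
        return [doc_id for _, doc_id in self.entries[start:end]]


names = RangeIndex([("Baker", 2), ("Adams", 1), ("Cole", 4), ("Abbott", 3)])
print(names.scan("A", "B"))  # [3, 1]: like ("A" <= name) and (name < "B")
print(names.scan())          # [3, 1, 2, 4]: every doc, already in name order
```

The same structure works for the date comparison if the index holds `datetime` values and the query uses `hi_inclusive=True` for `<=`.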
Slide 26
Structured Highlighting
Embed a hyperlink to the commercial equivalent of a generic drug at each place the generic name appears:
define function lookup-drug-name($name as xs:string)
{
  <xhtml:selection>
  {
    doc("drug-list.xml")/name[. = $name]/variants
  }
  </xhtml:selection>
}
for $a in cts:search(//articles, "ibuprofen")
return
  cts:highlight($a, cts:word-query("ibuprofen"),
                lookup-drug-name("ibuprofen"))
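cts:highlight walks a node and substitutes an expression for each match of the query. A rough plain-text analogue in Python is a regular-expression substitution with a callback. The drug table, the `/drugs/` link scheme, and the function names are hypothetical. Unlike cts:highlight, this sketch works on a plain string rather than an XML node.

```python
import re
from html import escape
from typing import Callable, Dict, List

# Illustrative stand-in for drug-list.xml: generic name -> brand-name variants.
DRUG_VARIANTS: Dict[str, List[str]] = {"ibuprofen": ["Advil", "Motrin"]}


def highlight(text: str, term: str, replace: Callable[[str], str]) -> str:
    """Replace each whole-word, case-insensitive occurrence of `term` with
    replace(matched_text), leaving the rest of the plain text untouched."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return pattern.sub(lambda m: replace(m.group(0)), text)


def link_to_brands(matched: str) -> str:
    """Wrap a match in a link; the href scheme here is hypothetical."""
    brands = ", ".join(DRUG_VARIANTS.get(matched.lower(), []))
    href = "/drugs/" + matched.lower()
    return (f'<a href="{escape(href)}" title="{escape(brands)}">'
            f"{escape(matched)}</a>")


print(highlight("Ibuprofen eases pain; take ibuprofen with food.",
                "ibuprofen", link_to_brands))
```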
Slide 27
Topics
Introducing MarkLogic Corporation
How MarkLogic helps information providers
Introducing MarkLogic Server
Search functionality in depth
Conclusion
Slide 28
Conclusion
A leading company
Leading provider of XML content servers
Top-tier backers, including Sequoia Capital
Experienced management team
An innovative product
Search, manipulate, and render content
Scales to multi-terabyte content bases
Complementary to existing RDBMS and ECM systems
Exciting solutions
Accelerate new product development
Differentiate services from Internet search engines
Integrate and repurpose content
Slide 29