Introduction to Lucene and Solr

Introduction to Apache Lucene/Solr

CSCI 572: Information Retrieval and Search Engines Summer 2010

Outline

• What is Lucene/Solr?
• Where did it come from?
• What are the current versions of Lucene/Solr?
• What can it do?

CAM-2 May-20-10 CS572-Summer2010

Apache Lucene

• The brainchild of Doug Cutting
• Free-text indexing library that implements most of the functionality I’ve talked to you about
  – Query models, ranking, indexing
• Core API is implemented in Java
  – C++/C, Ruby, and Python APIs exist as well, but they have small communities or are automatically generated
• Initially hosted on SourceForge; moved to Apache in 2001
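To make "indexing, query models, ranking" concrete, here is a toy sketch in Python. It is not Lucene's API and not Lucene's scoring formula: the `ToyIndex` class, its methods, and the simple TF-IDF weighting are assumptions chosen only to show the shape of an inverted index.

```python
import math
from collections import Counter, defaultdict


class ToyIndex:
    """A toy inverted index; illustrative only, not Lucene's API."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.doc_count = 0

    def add(self, doc_id, text):
        """Index one document: lowercase, split on whitespace, count terms."""
        self.doc_count += 1
        for term, tf in Counter(text.lower().split()).items():
            self.postings[term][doc_id] = tf

    def search(self, query):
        """Return ids of docs matching any query term, best score first."""
        scores = defaultdict(float)
        for term in query.lower().split():
            docs = self.postings.get(term, {})
            if not docs:
                continue
            # Rarer terms get a higher weight (a simple IDF variant).
            idf = math.log(1 + self.doc_count / len(docs))
            for doc_id, tf in docs.items():
                scores[doc_id] += tf * idf
        # Ties are broken by doc id so the order is deterministic.
        return sorted(scores, key=lambda d: (-scores[d], d))


index = ToyIndex()
index.add("d1", "Lucene is a search library")
index.add("d2", "Solr is a search server built on Lucene")
index.add("d3", "Tomcat is an application server")
print(index.search("server tomcat"))
```

The query is an OR over its terms; a document that matches the rarer term "tomcat" outranks one that only matches "server".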

Apache Solr

• Originally developed at CNET
• Web service layer built on top of the Lucene library
• Provides a schema and an understanding of field types, with conversion to and from their indexed representation
• Provides huge-scale scalability; deployed on top of an application server like Tomcat or Jetty
• Programming-language-independent APIs
• Sharding, replication, faceting, highlighting, explain, more-like-this, and other functionality provided easily

How to get started

Lucene (2.9.2 and 3.0.1 stable)

– Put your Java hat on
– Have Eclipse or your favorite IDE ready
– Download lucene-core-.jar from http://repo1.maven.org/maven2/org/apache/lucene/
– Or download the source and build it, from http://www.apache.org/dyn/closer.cgi/lucene/java/
– Check out some example Java code from Otis Gospodnetic that demonstrates indexing and querying: http://onjava.com/pub/a/onjava/2003/01/15/lucene.html

How to get started

Solr

• Grab a release of Solr (1.4.0 stable) from http://www.apache.org/dyn/closer.cgi/lucene/solr/
  – Unpack into e.g. /usr/local/solr
  – Deploy onto Tomcat
    • Install Tomcat into /usr/local/tomcat
    • Create a solr.xml context file and drop it into /usr/local/tomcat/conf/Catalina/localhost/
    • Create the solr/home JNDI property and point it to /usr/local/solr/solr
    • Start Tomcat
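As a sketch of the Tomcat context file described in the deployment steps, the Python snippet below renders a context fragment that sets the solr/home JNDI entry. The WAR location (`/usr/local/solr/dist/apache-solr-1.4.0.war`) and the helper name `solr_context_xml` are assumptions; point `docBase` at wherever the Solr WAR actually lives on your machine.

```python
from xml.sax.saxutils import quoteattr


def solr_context_xml(war_path, solr_home):
    """Render a Tomcat context fragment that exposes solr/home via JNDI."""
    return (
        '<Context docBase={war} debug="0" crossContext="true">\n'
        '  <Environment name="solr/home" type="java.lang.String"\n'
        '               value={home} override="true"/>\n'
        '</Context>\n'
    ).format(war=quoteattr(war_path), home=quoteattr(solr_home))


# Assumed WAR location; adjust to wherever the release was unpacked.
context = solr_context_xml("/usr/local/solr/dist/apache-solr-1.4.0.war",
                           "/usr/local/solr/solr")
print(context)  # save as /usr/local/tomcat/conf/Catalina/localhost/solr.xml
```

Tomcat derives the web application's context path from the file name, so a file called solr.xml is served under /solr.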

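Head over to $solr/example/exampledocs and post a document to the update handler. The URL below uses port 8983, the default for the Jetty server bundled with the Solr example; a stock Tomcat install listens on 8080 instead.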

curl http://localhost:8983/solr/update -H 'Content-type:text/xml; charset=utf-8' --data-binary @artists.xml
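The same POST can be sketched in Python with only the standard library. The helper names and the one-document payload are hypothetical, and the field names in a real payload must exist in your schema.xml. Posted documents only become searchable after a commit, so the sketch builds a `<commit/>` request as well.

```python
import urllib.request

SOLR_UPDATE_URL = "http://localhost:8983/solr/update"  # same URL as the curl call


def build_update_request(xml_bytes, url=SOLR_UPDATE_URL):
    """Build (but do not send) a POST carrying an XML update message."""
    return urllib.request.Request(
        url,
        data=xml_bytes,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def send(request):
    """Send a request to a running Solr and return the response body."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# A hypothetical one-document update message.
add_doc = b'<add><doc><field name="id">artist-1</field></doc></add>'
add_request = build_update_request(add_doc)
commit_request = build_update_request(b"<commit/>")
# With Solr running: send(add_request); send(commit_request)
```

Building the request separately from sending it keeps the example usable without a live server.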

Modifying your schema.xml

• Field Types
• Analyzers
• Tokenizers

http://wiki.apache.org/solr/SchemaXml
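In schema.xml, a field type can declare an analyzer made of one tokenizer followed by a chain of token filters. The Python sketch below mimics that pipeline to show what happens to text at index and query time. The function names and the stop-word list are illustrative assumptions, not Solr classes.

```python
import re

STOP_WORDS = {"a", "an", "and", "of", "the", "to"}  # illustrative stop list


def tokenize(text):
    """Tokenizer: split raw text into word tokens."""
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    """Token filter: normalize case so 'Lucene' matches 'lucene'."""
    return [t.lower() for t in tokens]


def stop_filter(tokens):
    """Token filter: drop very common words."""
    return [t for t in tokens if t not in STOP_WORDS]


def analyze(text, filters=(lowercase_filter, stop_filter)):
    """Analyzer: one tokenizer followed by a chain of token filters."""
    tokens = tokenize(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


print(analyze("The Brainchild of Doug Cutting"))
# → ['brainchild', 'doug', 'cutting']
```

Filter order matters: the stop filter here only sees lowercase tokens because it runs after the lowercase filter.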

Solr Faceting

• facet=on&facet.field=&facet.field=…
• http://wiki.apache.org/solr/SimpleFacetParameters
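A facet request is an ordinary select query with `facet=on` and one `facet.field` parameter per field to count. The Python sketch below assembles such a URL; the local Solr address and the field names `genre` and `country` are assumptions for illustration.

```python
from urllib.parse import urlencode

SELECT_URL = "http://localhost:8983/solr/select"  # assumed local Solr instance


def facet_query_url(query, facet_fields, base_url=SELECT_URL):
    """Build a select URL that asks for facet counts on the given fields."""
    params = [("q", query), ("facet", "on")]
    # facet.field is repeated once per field rather than comma-joined.
    params += [("facet.field", field) for field in facet_fields]
    return base_url + "?" + urlencode(params)


# "genre" and "country" are hypothetical field names.
url = facet_query_url("*:*", ["genre", "country"])
print(url)
```

A list of pairs is used instead of a dict because the same parameter name has to appear more than once.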

Advanced Topics

• Standing up cores
• Sharding
• Replication
• Zookeeper and Cloud
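As a rough sketch of sharding: in Solr's distributed search, a query names the shards to consult in the `shards` request parameter, while deciding which shard each document goes to at index time is left to the client. The shard addresses and both helper functions below are hypothetical.

```python
import zlib
from urllib.parse import urlencode

# Hypothetical shard addresses (host:port/path, without the scheme).
SHARDS = ["localhost:8983/solr", "localhost:7574/solr"]


def shard_for(doc_id, shards=SHARDS):
    """Pick a shard for a document by hashing its id (stable across runs)."""
    return shards[zlib.crc32(doc_id.encode("utf-8")) % len(shards)]


def distributed_query_url(query, shards=SHARDS):
    """Build a select URL that fans the query out via the shards parameter."""
    params = urlencode({"q": query, "shards": ",".join(shards)})
    # Any shard can act as the coordinator; the first one is used here.
    return "http://" + shards[0] + "/select?" + params


print(shard_for("artist-1"))
print(distributed_query_url("beatles"))
```

`zlib.crc32` is used instead of the built-in `hash` because string hashes in Python vary between interpreter runs, which would scatter the same document across different shards.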

Development currently in flux

• Stick with release versions
• Depending on trunk won’t really help
• Lucene and Solr have merged

Wrapup

• Lots more information at
  – http://lucene.apache.org
  – http://lucene.apache.org/solr/
  – http://lucene.apache.org/java/
• Possible projects
  – Geospatial search: improving existing code and contributing back to Apache SIS and to Apache Solr
  – Improving date faceting
  – Rewriting the ResponseWriter framework

Acknowledgements

Material inspired by discussions and talks on the Apache mailing lists for Solr and Lucene, and by discussions with the rest of the Lucene community
