Introduction to Apache Lucene/Solr
CSCI 572: Information Retrieval and Search Engines Summer 2010
Outline
• What is Lucene/Solr?
• Where did it come from?
• What are the current versions of Lucene/Solr?
• What can it do?
CAM-2 May-20-10 CS572-Summer2010
Apache Lucene
• The brainchild of Doug Cutting
• Free-text indexing library that implements most of the functionality I’ve talked to you about
  – Query models, ranking, indexing
• Core API is implemented in Java
  – C/C++, Ruby, and Python APIs exist as well, but they have small communities or are automatically generated
• Initially hosted on SourceForge; moved to Apache in 2001
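The following plain-Java sketch makes "free-text indexing library" concrete with a toy inverted index. Each term maps to the sorted set of document IDs that contain it, and a query intersects those sets. It illustrates the core idea only. It does not use Lucene's API or data structures, which add analyzers, on-disk segments, and scoring on top of this idea. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy inverted index illustrating the idea behind Lucene (not Lucene's API).
final class InvertedIndexSketch {
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>(); // term -> doc IDs
    private final List<String> docs = new ArrayList<>();                     // stored text

    // Naive analysis: lowercase, then split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Indexing: assign the next document ID and add it to each term's posting list.
    int add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    // Boolean AND query: intersect the posting lists of every query term.
    List<Integer> searchAll(String query) {
        TreeSet<Integer> result = null;
        for (String term : tokenize(query)) {
            TreeSet<Integer> ids = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(ids);
            } else {
                result.retainAll(ids);
            }
        }
        if (result == null) {
            return List.of(); // empty query matches nothing
        }
        return new ArrayList<>(result);
    }

    String doc(int id) {
        return docs.get(id);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add("Lucene is a free-text indexing library");
        index.add("Solr is a web service layer on top of Lucene");
        index.add("Tomcat is an application server");
        for (int id : index.searchAll("lucene library")) {
            System.out.println(id + ": " + index.doc(id));
        }
    }
}
```

Running `main` prints `0: Lucene is a free-text indexing library`, because only document 0 contains both query terms.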
Apache Solr
• Originally developed at CNET
• Web service layer built on top of the Lucene library
• Provides a schema that understands field types and converts values to and from their indexed representation
• Scales to very large deployments; runs inside an application server such as Tomcat or Jetty
• Programming-language-independent APIs (clients talk to Solr over HTTP)
• Sharding, replication, faceting, highlighting, explain, "more like this", and other functionality provided out of the box
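Solr is a web service, so any language that can issue HTTP requests can query it. The plain-Java sketch below builds a query URL for Solr's standard `/select` handler. The host and port (8983, the default of the bundled Jetty example), the field name `artist`, and the helper name are assumptions for illustration. `wt=json` asks Solr for a JSON response instead of the default XML.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds a Solr /select URL from query parameters.
final class SolrSelectUrl {
    static String build(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            // URL-encode names and values so characters such as ':' and spaces survive.
            query.add(enc(e.getKey()) + "=" + enc(e.getValue()));
        }
        return baseUrl + "/select?" + query;
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("q", "artist:beatles"); // assumed field name
        params.put("rows", "10");
        params.put("wt", "json");
        System.out.println(build("http://localhost:8983/solr", params));
    }
}
```

This prints `http://localhost:8983/solr/select?q=artist%3Abeatles&rows=10&wt=json`. You can paste the URL into a browser or fetch it with any HTTP client.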
How to get started
• Lucene (2.9.2 and 3.0.1 stable)
  – Put your Java hat on
  – Have Eclipse or your favorite IDE ready
  – Download lucene-core- from
    • http://repo1.maven.org/maven2/org/apache/lucene/
  – Download the source and build it from
    • http://www.apache.org/dyn/closer.cgi/lucene/java/
  – Check out Otis Gospodnetic's example Java code that demonstrates indexing and querying
    • http://onjava.com/pub/a/onjava/2003/01/15/lucene.html
How to get started
• Solr
  – Grab a release of Solr (1.4.0 stable)
    • http://www.apache.org/dyn/closer.cgi/lucene/solr/
  – Unpack into, e.g., /usr/local/solr
  – Deploy onto Tomcat
    • Install Tomcat into /usr/local/tomcat
    • Create a solr.xml context file and drop it into /usr/local/tomcat/conf/Catalina/localhost/
      – Create a solr/home JNDI property and point it to /usr/local/solr/solr
    • Start Tomcat
  – Head over to $solr/example/example-docs and post the example documents
    • curl http://localhost:8983/solr/update -H 'Content-type:text/xml; charset=utf-8' --data-binary @artists.xml
    • Port 8983 is the default of Solr's bundled Jetty example; a default Tomcat install listens on port 8080 instead
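You can send the same update from Java instead of curl. The sketch below assumes Java 11+ (`java.net.http`), the base URL from the curl command (adjust host and port for your deployment), and an `artists.xml` file in the working directory. The class and method names are hypothetical. Solr does not make newly added documents searchable until a commit, so the sketch follows the post with a `<commit/>` message to the same `/update` handler.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper mirroring:
//   curl <base>/update -H 'Content-type:text/xml; charset=utf-8' --data-binary @artists.xml
final class SolrXmlPoster {
    // Builds (but does not send) a POST of an XML update message to <base>/update.
    static HttpRequest updateRequest(String baseUrl, String xmlBody) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/update"))
                .header("Content-Type", "text/xml; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(xmlBody)) // UTF-8 by default
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String base = "http://localhost:8983/solr"; // assumption: adjust to your deployment
        String docs = Files.readString(Path.of("artists.xml"));
        HttpClient client = HttpClient.newHttpClient();
        // Post the documents first, then commit so they become visible to searches.
        for (String body : new String[] {docs, "<commit/>"}) {
            HttpResponse<String> resp =
                    client.send(updateRequest(base, body), HttpResponse.BodyHandlers.ofString());
            System.out.println(resp.statusCode() + " " + resp.body());
        }
    }
}
```

The request builder is separate from the network call, so you can inspect a request without a running Solr instance.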
Modifying your schema.xml
• Field types
• Analyzers
• Tokenizers
http://wiki.apache.org/solr/SchemaXml
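In schema.xml, a field type's analyzer is one tokenizer followed by a chain of token filters. The plain-Java sketch below mimics the shape of such a chain: a whitespace tokenizer, then a lowercase filter, then a stop-word filter. It does not use Solr's or Lucene's analyzer API. The stop-word list and all names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical analyzer chain mirroring the <tokenizer> + <filter> structure of a field type.
final class AnalyzerChainSketch {
    // Tokenizer: split on runs of whitespace.
    static List<String> whitespaceTokenizer(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Filter: lowercase every token.
    static final UnaryOperator<List<String>> LOWERCASE = tokens -> {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    };

    // Filter: drop tokens that appear in the given stop-word set.
    static UnaryOperator<List<String>> stopFilter(Set<String> stopWords) {
        return tokens -> {
            List<String> out = new ArrayList<>();
            for (String t : tokens) {
                if (!stopWords.contains(t)) {
                    out.add(t);
                }
            }
            return out;
        };
    }

    // Run the tokenizer, then apply each filter in order (order matters).
    static List<String> analyze(String text, List<UnaryOperator<List<String>>> filters) {
        List<String> tokens = whitespaceTokenizer(text);
        for (UnaryOperator<List<String>> f : filters) {
            tokens = f.apply(tokens);
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<UnaryOperator<List<String>>> chain =
                List.of(LOWERCASE, stopFilter(Set.of("the", "a", "an")));
        System.out.println(analyze("The Beatles played  A concert", chain));
    }
}
```

This prints `[beatles, played, concert]`. The lowercase filter runs before the stop filter, so "The" and "A" are already lowercase when the stop list is checked.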
Solr Faceting
• facet=on&facet.field=&facet.field=…
• http://wiki.apache.org/solr/SimpleFacetParameters
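Faceting returns the matching documents together with a count of how many matches carry each value of the requested fields. The sketch below has two hypothetical helpers. The first builds the facet parameters shown on the slide, with one `facet.field` per field. The second computes the same kind of counts locally over in-memory records, to show what Solr reports. The field names `genre` and `country` and the sample records are assumptions.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FacetSketch {
    // Builds "facet=on" plus one facet.field parameter per field name.
    static String facetParams(List<String> fields) {
        StringBuilder sb = new StringBuilder("facet=on");
        for (String f : fields) {
            sb.append("&facet.field=").append(URLEncoder.encode(f, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Local illustration of a field facet: count matching documents per value,
    // ordered by descending count, ties broken alphabetically.
    static Map<String, Integer> countFacet(List<Map<String, String>> matches, String field) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map<String, String> doc : matches) {
            String value = doc.get(field);
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        Map<String, Integer> ordered = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : entries) {
            ordered.put(e.getKey(), e.getValue());
        }
        return ordered;
    }

    public static void main(String[] args) {
        System.out.println(facetParams(List.of("genre", "country")));
        List<Map<String, String>> matches = List.of(
                Map.of("genre", "rock", "country", "UK"),
                Map.of("genre", "rock", "country", "US"),
                Map.of("genre", "jazz", "country", "US"));
        System.out.println(countFacet(matches, "genre"));
    }
}
```

This prints `facet=on&facet.field=genre&facet.field=country` and then `{rock=2, jazz=1}`.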
Advanced Topics
• Standing up cores
• Sharding
• Replication
• ZooKeeper and Cloud
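In Solr 1.4, you request distributed search by listing the shards in a `shards` parameter as comma-separated `host:port/path` entries without `http://`. The client decides which shard each document is indexed into. The sketch below covers both halves. The host names and the hash-modulo routing rule are assumptions for illustration; Solr does not prescribe them.

```java
import java.util.List;

// Hypothetical client-side helpers for a sharded Solr 1.4-style setup.
final class ShardingSketch {
    // Client-side routing: pick a shard index from a document's unique key.
    // floorMod keeps the result non-negative even when hashCode() is negative.
    static int shardFor(String uniqueKey, int numShards) {
        return Math.floorMod(uniqueKey.hashCode(), numShards);
    }

    // Distributed search: the shards parameter lists every shard as host:port/path.
    static String shardsParam(List<String> shards) {
        return "shards=" + String.join(",", shards);
    }

    public static void main(String[] args) {
        List<String> shards = List.of("solr1:8983/solr", "solr2:8983/solr"); // assumed hosts
        for (String id : List.of("artist-1", "artist-2", "artist-3")) {
            System.out.println(id + " -> " + shards.get(shardFor(id, shards.size())));
        }
        System.out.println(shardsParam(shards));
    }
}
```

Hash-modulo routing is the simplest choice, but it reassigns most documents if the number of shards changes.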
Development currently in flux
• Stick with release versions
• Depending on trunk won’t really help while its code is still changing
• Lucene and Solr development have merged
Wrapup
• Lots more information at
  – http://lucene.apache.org
  – http://lucene.apache.org/solr/
  – http://lucene.apache.org/java/
• Possible projects
  – Geospatial search
    • Improving existing code and contributing back to Apache SIS and to Apache Solr
  – Improving date faceting
  – Rewriting the ResponseWriter framework
Acknowledgements
• Material inspired by talks and discussions on the Apache mailing lists for Solr and Lucene, and by discussions with the rest of the Lucene community