${just go with the flo(w)}


Getting Started with Solr

I recently wrote that I am working on a new project using some exciting technologies like the Spark View Engine and nHibernate. I was planning to write about pitfalls along the way, or anything else from the development process that seemed worth sharing.

Well, I didn’t!

That’s not because there wasn’t much exciting stuff going on; quite the opposite. I also didn’t feel like blogging about a lot of the stuff that was happening in the background. There are a million great blog posts out there on how to implement a repository pattern using nHibernate and ASP.NET MVC, how to do nice things with queries, and all the rest of it.

What happened

That said, I recently started working on a search front end for the new site and ran into a few problems. Let me explain. The website basically consists of a backend part and a front-page search facility. The domain model behind the application is quite complex. Because I am taking the DDD approach together with nHibernate, I was able to create a clean model with POCOs, abstract nHibernate away using the repository pattern, and end up with a reasonably consistent and clean “infrastructure”. I still love how easy it is to map inheritance with nHibernate using discriminators. That all works reasonably well, although I wish I had started the project with a NoSQL database rather than a relational one.

The Problem

As I said, I have quite a complex domain model and thus an even more complex database. I am using SQL Server 2005, which so far has served me well for all the backend stuff. Simple searches against the complex data model worked fine too, but as soon as the client came up with some more complex search requirements, it all fell down. In the end it was the “multiple-word search against everything, plus an auto-complete function to assist the user while searching” requirement that brought SQL Server to its knees. I simply wasn’t able to produce really fast results using nHibernate. Another requirement combined the search features just described with spatial search: we take the location of the user (using Google Maps to get the user’s lat/long) and attach a radius search to the general search functionality.
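To make the spatial requirement concrete: the radius part boils down to a great-circle (“arc”) distance check between the user’s position and each record. Here is a minimal Python sketch of that idea (my own illustration of what the search has to compute, not how Solr or the plug-in implements it internally; the function names are mine):

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle ("arc") distance between two lat/long points, in km."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def within_radius(user_lat, user_lon, doc_lat, doc_lon, radius_km):
    """Radius filter: keep a document only if it lies inside the search circle."""
    return haversine_km(user_lat, user_lon, doc_lat, doc_lon) <= radius_km
```

Doing this per row in SQL on every request is exactly the kind of thing that gets slow; a search index that understands the geometry up front does not have that problem.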

Now, I was able to do most of that with “OK performance”, but that was on my local machine, with me as a single user. As soon as I simulated more realistic user numbers, performance turned really bad.

The Solution

Now, the first obvious things to look at would be the SQL Server full-text catalogue or MS Search Server (Express), but to be honest, neither really convinced me. The SQL Server full-text catalogue seems quite simple to implement and use at first, but as soon as you need something more “special” you run into a wall. Don’t even try.

A colleague of mine used Lucene in another project and really liked it. He then pointed me to a product called Solr. What is Solr?

Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world’s largest internet sites.

So three days ago I started having a look at Solr and implementing it. Mauricio Scheffer did an amazing job creating SolrNet, a Solr client for .NET which really simplifies querying Solr from within my MVC application.

I would really like to write some more detailed posts about how exactly I implemented Solr, what the difficult bits were, and how to solve some if not all of the problems described above. But let me start by giving you a brief overview of what I have done.

Part of my project is to migrate a legacy database and transform it into my new data model. That part is done. What I needed to do next was get the legacy data into Solr in order to make it searchable. Solr is supposed to be lightning fast when you query against it, but I have to say I was also surprised how bloody fast it is when you batch-import data into it, particularly compared with a relational database. So after some experimenting with the SolrNet client, and after I got reasonably familiar with the Solr query syntax and the way you model the Solr schema, I was able to put data into Solr and query against it as well. Overall it took me about two days to get all of that up and running.
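For the curious, the batch import essentially boils down to chunking the documents and sending each chunk as one update message instead of posting them one by one; SolrNet handles that plumbing from .NET, but here is a rough Python sketch of the idea (the function names are mine; the XML shape follows Solr’s standard &lt;add&gt; update message):

```python
def chunked(docs, size):
    """Split a document stream into fixed-size batches."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def to_update_xml(batch):
    """Render one batch as a single Solr <add> update message."""
    docs_xml = "".join(
        "<doc>%s</doc>" % "".join(
            '<field name="%s">%s</field>' % (k, v) for k, v in doc.items()
        )
        for doc in batch
    )
    return "<add>%s</add>" % docs_xml
```

Each rendered batch would then go to the core’s update handler in one HTTP POST, followed by a single commit at the end, which is where most of the import speed comes from.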

The second challenge was to set up a second core against which I would do my auto-complete lookups. I kept the schema for this core fairly simple: an ID field, so I can update the index from my relational database data; a type field, a simple text field describing the type of the entity within the index, which also comes in handy when I need to boost certain results depending on the type or query; and finally a text field containing the actual suggestion text. The only challenging part was finding a good tokenizer to allow multi-word searches and handle strings like “some-auto-complete-text-in-london”. Again, I hope I will find some time to blog about the details later on.
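In Solr this is all configured in the field’s analyzer chain, but the combination I ended up with conceptually does something like the following sketch (plain Python, names are mine, and the exact minimum/maximum gram lengths are illustrative): split on whitespace and hyphens, then index front-anchored n-grams of each token so that prefixes match as the user types.

```python
import re

def tokenize(text):
    """Lowercase and split on whitespace and hyphens, so a string like
    'some-auto-complete-text-in-london' yields individual searchable words."""
    return [t for t in re.split(r"[\s\-]+", text.lower()) if t]

def edge_ngrams(token, min_len=2, max_len=10):
    """Front-anchored n-grams of one token: the trick behind prefix autocomplete."""
    return [token[:n] for n in range(min_len, min(len(token), max_len) + 1)]

def index_terms(text):
    """All searchable prefixes to index for one autocomplete entry."""
    terms = []
    for token in tokenize(text):
        terms.extend(edge_ngrams(token))
    return terms
```

With terms indexed like this, a lookup for “lond” is a plain term match rather than an expensive wildcard query.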

So far the last challenge has been to implement spatial search. Unfortunately it is Solr 1.5 that will introduce built-in spatial search functionality. However, there is a free plug-in called Spatial Solr which addresses the issue. Now, I have to say I was really disappointed by the lack of documentation. Let me give you a quick example. In an example query which can be found in the plug-in zip file (PDF), they state that in order to query Solr you can do this:


q={!spatial lat=4.32 lng=54.32 radius=30 unit=km calc=arc threadCount=2}title:pizza

But the correct query should look more like this:

q={!spatial lat=4.32 long=54.32 radius=30 unit=km calc=arc threadCount=2}title:pizza
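Since that one parameter name cost me some time, I now build the prefix with a small helper instead of typing it by hand. This is my own convenience function, not part of the plug-in:

```python
def spatial_query(lat, lng, radius, query, unit="km", calc="arc", thread_count=2):
    """Build a Spatial Solr localparams query string.

    Note the parameter is spelled 'long', not 'lng' as the bundled PDF claims.
    """
    return (
        "{!spatial lat=%s long=%s radius=%s unit=%s calc=%s threadCount=%d}%s"
        % (lat, lng, radius, unit, calc, thread_count, query)
    )
```

Anything that comes after the closing brace is just a normal Solr query, so the spatial part composes with the rest of the search.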

Another challenge was to actually get the plug-in running. I am using Jetty on my Windows machine, and according to the documentation it is as simple as dropping the plug-in into your lib folder and adding the update processor and query parser to your config file. Well, it wasn’t. When I started the server, it failed with an error.

First I thought it had to do with my multi-core setup, but after some research and googling I came across a blog post by Phillip, who had exactly the same problem and luckily had found the solution.

You need to put the plug-in into the example/work/Jetty_0_0_0_0_8983_solr.war__solr__k1kf17/solr/WEB-INF/lib folder. Obvious, isn’t it? 🙂

There are a couple of little gotchas like this which made it really hard to integrate the plug-in and get queries running without syntax exceptions, but in the end it all worked out fine. And since the plug-in is free, you can’t really complain.

© 2020 florianb.net. All rights reserved.
