${just go with the flo(w)}


Tag: geocode

Conversion of the Gauss Krueger notation into latitude/longitude

From the archives

A while ago I wanted to integrate Microsoft Virtual Earth (now Bing Maps) into one of my projects (a web application). Unfortunately the existing geo-coordinates were formatted using the Gauss Krüger notation. The problem is that both Virtual Earth and Google Maps work with latitude/longitude, so I had to convert the Gauss Krüger coordinates.
After some research I found an article by Wolfgang Back. He wrote a little PDA application to convert Gauss Krüger into latitude/longitude. Perfect! Well, almost. Mr. Back likes his VB, so the code was in Visual Basic, which I had to translate into C#.

Convert Gauss Krueger into latitude/longitude

First we need to convert the given Gauss Krueger coordinates into lat/long:
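The original VB/C# code isn't reproduced here, but as a rough illustration, here is a Python sketch of the usual series expansion for the Bessel ellipsoid. The coefficients are the commonly published ones for Gauss Krüger on the Bessel ellipsoid; treat them as an assumption and verify against a known reference point before relying on them:

```python
import math

def gauss_krueger_to_bessel(rechts, hoch):
    """Convert Gauss Krueger coordinates (Rechtswert/Hochwert) to
    latitude/longitude on the Bessel ellipsoid (i.e. still Potsdam
    datum, before the Helmert transformation to WGS84)."""
    rho = 180.0 / math.pi
    e2 = 0.0067192188   # second eccentricity squared (Bessel)
    c = 6398786.849     # polar radius of curvature (Bessel), metres

    zone = int(rechts / 1_000_000)            # Gauss Krueger zone number
    lon0 = zone * 3.0                         # central meridian in degrees
    dy = rechts - zone * 1_000_000 - 500_000  # offset from central meridian

    # footpoint latitude from the meridian arc length (series expansion)
    bI = hoch / 10000855.7646
    bII = bI * bI
    bf = 325632.08677 * bI * ((((((0.00000562025 * bII - 0.0000436398)
         * bII + 0.00022976983) * bII - 0.00113566119)
         * bII + 0.00424914906) * bII - 0.00831729565) * bII + 1)
    bf = bf / 3600.0 / rho                    # footpoint latitude in radians

    co = math.cos(bf)
    g2 = e2 * co * co
    g1 = c / math.sqrt(1 + g2)
    t = math.tan(bf)
    fa = dy / g1

    lat = (bf - fa * fa * t * (1 + g2) / 2
           + fa ** 4 * t * (5 + 3 * t * t + 6 * g2 - 6 * g2 * t * t) / 24) * rho
    dl = (fa - fa ** 3 * (1 + 2 * t * t + g2) / 6
          + fa ** 5 * (1 + 28 * t * t + 24 * t ** 4) / 120)
    lon = lon0 + dl * rho / co
    return lat, lon
```

Note that the result is still relative to the Bessel ellipsoid; the Helmert step below is what finally gets you to WGS84.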

7-Parameter-Helmert Transformation

After the conversion you have to apply the 7-Parameter Helmert Transformation to avoid the distortion which occurs when you convert coordinates from one 3-dimensional geodetic system to another. (To be honest I don’t understand it in every detail, but it works :-))
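The transformation chain is: geodetic (Bessel) → earth-centred cartesian → 7-parameter Bursa-Wolf shift → back to geodetic on WGS84. A Python sketch follows; the DHDN→WGS84 parameter set below is one commonly cited for Germany and is an assumption, not the exact values from the original code:

```python
import math

# One commonly cited parameter set for Potsdam/DHDN -> WGS84.
# Assumption: such parameter sets vary by region and source.
DX, DY, DZ = 598.1, 73.7, 418.2      # translations in metres
RX = math.radians(0.202 / 3600)      # rotations, arc seconds -> radians
RY = math.radians(0.045 / 3600)
RZ = math.radians(-2.455 / 3600)
S = 6.7e-6                           # scale correction (ppm)

def _to_ecef(lat, lon, a, e2, h=0.0):
    """Geodetic -> earth-centred cartesian coordinates."""
    lat, lon = math.radians(lat), math.radians(lon)
    n = a / math.sqrt(1 - e2 * math.sin(lat) ** 2)
    x = (n + h) * math.cos(lat) * math.cos(lon)
    y = (n + h) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - e2) + h) * math.sin(lat)
    return x, y, z

def _to_geodetic(x, y, z, a, e2):
    """Earth-centred cartesian -> geodetic (simple fixed-point iteration)."""
    lon = math.atan2(y, x)
    p = math.hypot(x, y)
    lat = math.atan2(z, p * (1 - e2))
    for _ in range(10):  # converges quickly for surface points
        n = a / math.sqrt(1 - e2 * math.sin(lat) ** 2)
        lat = math.atan2(z + e2 * n * math.sin(lat), p)
    return math.degrees(lat), math.degrees(lon)

def potsdam_to_wgs84(lat, lon):
    # source: Bessel ellipsoid; target: WGS84 ellipsoid
    x, y, z = _to_ecef(lat, lon, 6377397.155, 0.006674372230614)
    # 7-parameter Helmert / Bursa-Wolf, position-vector convention
    xw = DX + (1 + S) * (x - RZ * y + RY * z)
    yw = DY + (1 + S) * (RZ * x + y - RX * z)
    zw = DZ + (1 + S) * (-RY * x + RX * y + z)
    return _to_geodetic(xw, yw, zw, 6378137.0, 0.00669437999014)
```

The resulting shift is small in degrees (the datums differ by a couple of hundred metres on the ground), which is exactly the distortion the transformation corrects.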

Getting Started with Solr

I recently wrote that I am working on a new project using some exciting technologies like the Spark View Engine and nHibernate. I was planning to write about some pitfalls along the development process or stuff in general which I think could be interesting to write about.

Well, I didn’t!

That’s not because there wasn’t much exciting stuff going on; quite the opposite. I also didn’t feel like blogging about a lot of the stuff that was going on in the background. There are a million great blog posts out there on how to implement a repository pattern using nHibernate and ASP.NET MVC, on how to do nice stuff with queries, and all the rest of it.

What happened

That said, I recently started working on a search front end for the new site and ran into a few problems. Let me explain. The website basically consists of a backend part and a front-page search facility. The domain model behind the application is quite complex. Because I am taking the DDD approach together with nHibernate, I was able to create a clean model with POCOs, abstract nHibernate away using the repository pattern, and end up with a reasonably consistent and clean “infrastructure”. I still love how easy it is to map inheritance with nHibernate using discriminators. That all works reasonably well, although I wish I had started the project using a NoSQL database rather than a relational one.

The Problem

As I said, I have quite a complex domain model and thus an even more complex database. I am using SQL Server 2005, which so far has served me well for all the backend stuff. Doing simple searches against the complex data model worked fine as well, but as soon as the client came up with some more complex search requirements, it all fell down. In the end it was the “multiple word search against everything + an auto-complete function to assist the user while searching” requirement which brought SQL Server to its knees. I simply wasn’t able to produce really fast results using nHibernate. Another requirement combined the search features just described with a spatial search, whereby we take the location of the user (using Google Maps to get the user’s lat/long) and attach a radius search to the general search functionality.

Now, I was able to do most of that with “OK performance”, but that was on my local machine with me as a single user. As soon as I simulated more realistic user numbers, performance turned really bad.

The Solution

Now, the first obvious things to look at would be the SQL Server full-text catalogue or MS Search Server (Express), but to be honest neither of those really convinced me. The SQL Server full-text catalogue seems quite simple to implement and use at first, but as soon as you need something more “special” you hit a wall. Don’t even try.

A colleague of mine was using Lucene in another project and really liked it. He then pointed me to a product called Solr. What is Solr?

Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world’s largest internet sites.

So three days ago I started having a look at Solr and also began implementing it. Mauricio Scheffer did an amazing job creating SolrNet, a Solr client for .NET which really simplifies querying Solr from within my MVC application.

I really would like to write some more detailed blog posts about how exactly I implemented Solr, what the difficult bits are, and how to solve some if not all of the problems described above. But let me start by giving you a brief overview of what I have done.

Part of my project is to migrate a legacy database and transform it into my new data model. That part is now done. What I needed to do was get the legacy data into Solr in order to make it searchable. Now, Solr is supposed to be lightning fast when you query against it, but I have to say I am also surprised how bloody fast it is when you batch import data into it, particularly if you compare it with a relational database. So after some experimenting with the SolrNet client, and after I got reasonably familiar with the Solr query syntax and the way you model the Solr schema, I was able to put data into Solr and query against it as well. Overall it took me about two days to get all that up and running.
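SolrNet takes care of this from .NET; at the HTTP level a batch import is just an `<add>` message POSTed to Solr’s `/update` handler. As an illustration (in Python rather than C#, and with the default example/Jetty URL assumed), building such a message could look like this:

```python
import xml.etree.ElementTree as ET

def build_add_message(docs):
    """Build a Solr <add> update message for a batch of documents.
    `docs` is a list of dicts mapping Solr field names to values."""
    add = ET.Element("add")
    for doc in docs:
        d = ET.SubElement(add, "doc")
        for name, value in doc.items():
            f = ET.SubElement(d, "field", name=name)
            f.text = str(value)
    return ET.tostring(add, encoding="unicode")

# POSTing the batch (default Jetty example URL; adjust for your setup):
# import urllib.request
# urllib.request.urlopen(urllib.request.Request(
#     "http://localhost:8983/solr/update?commit=true",
#     data=build_add_message(batch).encode("utf-8"),
#     headers={"Content-Type": "text/xml; charset=utf-8"}))
```

Sending documents in large batches like this, with a single commit at the end, is a big part of why the import is so much faster than row-by-row inserts into a relational database.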

The second challenge was to set up a second core against which I would do my auto-complete lookups. I kept the schema for my second core fairly simple: an ID field, so I can perform updates against the index using my relational database data; a type field, a simple text field describing the type of the entity within the index, which also comes in handy when I need to boost certain results depending on the type/query; and finally a text field containing the actual auto-complete text. The only challenging thing was to find a good tokenizer to allow multi-word searches and handle stuff like “some-auto-complete-text-in-london”. Again, I hope I will find some time to blog about the details later on.
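My exact schema isn’t reproduced here, but a sketch of what such an auto-complete field type could look like in schema.xml follows. The analyzer choices below are an assumption on my part, not the settings from the actual project; the word-delimiter filter splits the hyphenated tokens and the edge n-gram filter indexes prefixes so partial input matches while the user is still typing:

```xml
<!-- auto-complete core schema.xml fragment (sketch, settings assumed) -->
<fieldType name="autocomplete" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- splits "some-auto-complete-text-in-london" on the hyphens -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" preserveOriginal="1"/>
    <!-- index prefixes so partial input matches while typing -->
    <filter class="solr.EdgeNGramFilterFactory"
            minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fields>
  <field name="id"   type="string" indexed="true" stored="true" required="true"/>
  <field name="type" type="string" indexed="true" stored="true"/>
  <field name="text" type="autocomplete" indexed="true" stored="true"/>
</fields>
```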

So far the last challenge has been to implement a spatial search functionality. Unfortunately it is Solr 1.5 which will introduce a built-in spatial search; however, there is a free plug-in called Spatial Solr which addresses the issue. Now, I have to say I was really disappointed by the lack of documentation. Let me give you a quick example. In an example query for Solr, which can be found in the plug-in zip file (PDF), they state that in order to query Solr you can do this:


q={!spatial lat=4.32 lng=54.32 radius=30 unit=km calc=arc threadCount=2}title:pizza

But the correct query should look more like this:

q={!spatial lat=4.32 long=54.32 radius=30 unit=km calc=arc threadCount=2}title:pizza

Another challenge was to actually get the plug-in running. I am using Jetty on my Windows machine, and according to the documentation it is as simple as putting the plug-in into your lib folder and adding the update processor and query parser to your config file. Well, it was not. When I started the server I got an error message.

First I thought it had to do with my multi-core setup, but after some research and googling I came across a blog post by Phillip, who had exactly the same problem and luckily found the solution.

You need to put the plug-in into the example/work/Jetty_0_0_0_0_8983_solr.war__solr__k1kf17/solr/WEB-INF/lib folder. Obvious isn’t it? 🙂

There were a couple of little issues like this which made it really hard to integrate the plug-in and get some queries running without syntax exceptions, but in the end it all worked out fine. And again, since the plug-in is free, you can’t really complain.

Batch geocode your address data

I recently faced the challenge of geocoding existing locations. The location table contains about 10k entries, around 75% of which have useful address data attached. The table contains mainly addresses from the UK, but also from the US, Canada and Germany.

I was looking for a tool which would take my 10k entries and attach lat/long values wherever possible. If the address data wasn’t accurate enough, I wanted at least the next possible level of accuracy in terms of the lat/long values. So if an entry didn’t have a street and house number but did have a valid city/postcode, I was happy with just getting the lat/long values for, let’s say, the West End in Glasgow. If the city were the only thing available, then the lat/long of Glasgow city centre would do as well.
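That fallback logic can be sketched as a small helper that yields geocoding queries from most to least specific, so a batch job can retry with less detail when the full address fails. This is purely illustrative and not part of any of the tools mentioned here:

```python
def geocode_queries(street=None, postcode=None, city=None, country=None):
    """Yield geocoding query strings from most to least specific,
    dropping the most detailed component each time (street, then
    postcode, then city), skipping duplicates and missing parts."""
    parts = [street, postcode, city, country]
    seen = set()
    for start in range(len(parts)):
        candidate = ", ".join(p for p in parts[start:] if p)
        if candidate and candidate not in seen:
            seen.add(candidate)
            yield candidate
```

A batch geocoder would try each query in turn and record the accuracy level of whichever one first returns a result.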

One reason why I am so desperate to get at least some degree of geocoding is that I will be working on a radius-based search. Basically something like “I am in Glasgow on Byres Road and I would like to see all entries within 5 miles”. Get the idea?

There are a couple of commercial tools out there which at least look quite nice. I could also have developed my own little application to geocode the data, for example by using the Yahoo geocoder. However, I didn’t want to spend any money or too much time writing a custom tool for the task.

After loads of research I came across two tools which I would like to recommend. One of them is batchgeocode.com. One way of geocoding your data is using the website, where you can copy/paste your data into a text box and click a button. That’s it. Really simple, and it also gives you information/feedback about the accuracy, which I think is really handy.

These indicate how accurate your geocode was. If you find a large number of your geocodes end up as APPROXIMATE, check the formatting and completeness of your addresses.

  • ROOFTOP (most accurate) – indicates that the returned result reflects a precise geocode.
  • RANGE_INTERPOLATED – indicates that the returned result reflects an approximation (usually on a road) interpolated between two precise points (such as intersections). Interpolated results are generally returned when rooftop geocodes are unavailable for a street address.
  • GEOMETRIC_CENTER – indicates that the returned result is the geometric center of a result such as a polyline (for example, a street) or polygon (region).
  • APPROXIMATE (least accurate) – indicates that the returned result is approximate, usually the center of the zip code.

They also provide an Excel template which contains some VBScript voodoo talking to the Google Maps API. So you can export your data into that file, add an extra column with your primary key, geocode all your data and then import it back into your database. The limit here is 15k requests per day (that’s if you keep your IP address 🙂).
A very similar service comes from the guys at Juice Analytics, called “Excel Geocoding Tool v2”. Again, they provide you with an Excel template into which you can export your existing data. They are using the Yahoo geocoder, and according to their site the current request limit is 5k per day. batchgeocode.com was offline during the day, or at least I couldn’t connect to their site, so I used the Juice Analytics Excel file. I’ve used the batchgeocode service before, but I will definitely give it another go tomorrow and compare the accuracy and the ability to handle dodgy address data.

© 2020 florianb.net. All rights reserved.
