Tuesday, May 29, 2007

Play nice with Zend Framework

Glad to announce that I managed to integrate Zend_Search_Lucene with the Seagull framework. I used the bridge pattern to close the gap between the two, and wrote a little IndexBuilder class that uses strategies to harvest search targets. I made a strategy for articles in the database, so they are now searchable.
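To illustrate the strategy idea, here is a minimal sketch in Python. It is not the actual Seagull/PHP code: the names HarvestStrategy, ArticleStrategy, InMemoryIndex and the build method are all made up for the example, and the in-memory index only stands in for whatever the real bridge writes to.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List

Document = Dict[str, str]


class HarvestStrategy(ABC):
    """Hypothetical interface: one strategy per kind of searchable content."""

    @abstractmethod
    def harvest(self) -> Iterable[Document]:
        """Yield documents (field name -> text) to be indexed."""


class ArticleStrategy(HarvestStrategy):
    """Hypothetical strategy that turns article rows into documents."""

    def __init__(self, rows: List[dict]) -> None:
        # In a real setup the rows would come from a database query.
        self._rows = rows

    def harvest(self) -> Iterable[Document]:
        for row in self._rows:
            yield {
                "id": str(row["id"]),
                "title": row["title"],
                "body": row["body"],
            }


class InMemoryIndex:
    """Stand-in for the real search index; it just collects documents."""

    def __init__(self) -> None:
        self.documents: List[Document] = []

    def add_document(self, doc: Document) -> None:
        self.documents.append(doc)


class IndexBuilder:
    """Runs every registered strategy and feeds the results to the index."""

    def __init__(self, index: InMemoryIndex) -> None:
        self._index = index
        self._strategies: List[HarvestStrategy] = []

    def add_strategy(self, strategy: HarvestStrategy) -> None:
        self._strategies.append(strategy)

    def build(self) -> int:
        """Index everything the strategies harvest; return the document count."""
        count = 0
        for strategy in self._strategies:
            for doc in strategy.harvest():
                self._index.add_document(doc)
                count += 1
        return count
```

The point of the design is that the builder never needs to know where documents come from: supporting a new content type means writing one more strategy class and registering it.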

I was a bit concerned about the performance side of things: Zend_Search_Lucene is a PHP port of Apache Lucene's Java-based search engine, and I made a wrapper around that, which sounds pretty silly btw :) On my Windows box I needed quite a few seconds to generate/search even small indices. The biggest surprise came when I got it working on a Linux (Debian/vserver) server: it's lightning fast, so the slowness on Windows is probably a local environment issue. The index size is really acceptable (if you design the document fields correctly), so it scales well.

Most people use SQL LIKE queries for search. That is enough for simple cases, but it has serious limitations, for example when you need to search across multiple fields or give different fields different boost factors.
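As a rough illustration of what searching multiple fields with boost factors means, here is a small Python sketch. It is a toy, not Lucene's actual scoring formula: the score is simply the number of times the term occurs in each field, multiplied by that field's boost. The function names and the sample boost values are assumptions made up for the example.

```python
from typing import Dict, List, Tuple

Document = Dict[str, str]


def score(doc: Document, term: str, boosts: Dict[str, float]) -> float:
    """Toy score: occurrences of the term per field, weighted by field boost."""
    term = term.lower()
    total = 0.0
    for field, boost in boosts.items():
        words = doc.get(field, "").lower().split()
        total += boost * words.count(term)
    return total


def search(
    docs: List[Document], term: str, boosts: Dict[str, float]
) -> List[Tuple[float, Document]]:
    """Return (score, document) pairs for matching documents, best first."""
    hits = [(score(doc, term, boosts), doc) for doc in docs]
    hits = [hit for hit in hits if hit[0] > 0]
    # Sort on the score only, so documents themselves are never compared.
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits
```

With boosts such as {"title": 2.0, "body": 1.0}, a match in the title counts twice as much as a match in the body, which is the kind of ranking a plain LIKE query cannot express.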

More still to come (release/wiki). If somebody needs it badly, leave a message here.

2 comments:

demianturner@gmail.com said...

Sounds really interesting, would love to hear more how you did it.

pentarim said...

Hi Demian,

I have made a class SGL_Search that holds the bridge to Zend_Search_Lucene. It adds some functionality such as SGL path information, plus useful functions such as drop, getPagedData and isIndexDefined. ZF can't implement this itself because it works with full paths. I have also added some rules for where indices are stored:

"vardir"/search/index/"indexname"/"language_encoding".
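A storage rule like this can be captured in a small helper. The Python sketch below is only an illustration, not the SGL_Search code: the function name index_path and the example values are hypothetical, and the exact format of the language/encoding component is an assumption.

```python
from pathlib import PurePosixPath


def index_path(var_dir: str, index_name: str, language_encoding: str) -> PurePosixPath:
    """Build <vardir>/search/index/<indexname>/<language_encoding>."""
    for part in (index_name, language_encoding):
        # Reject empty names and path separators so the index stays under var_dir.
        if not part or "/" in part or part in (".", ".."):
            raise ValueError(f"invalid path component: {part!r}")
    return PurePosixPath(var_dir) / "search" / "index" / index_name / language_encoding
```

Keeping the rule in one place means callers only pass an index name and a language, and never have to build full paths themselves.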

The SGL_Search_IndexBuilder class supports multilingual index generation using the strategy pattern, so a strategy can be deployed for whatever type of content SGL has (cms, custom content).

I am preparing the documentation (will make a wiki entry of it), along with some improvements, for the alpha release of this thingie.

The admin backend is still on my todo list because it is not really serious for now: it is just a simple manager, without reindexing ranges, updating indices without a drop, etc. But when it is done it will be published.