Mika Heinonen's Blog

New search engine for Domino
Friday, 05 August 2005 22:07:00 EET

I've been working on a new full-text search engine for Domino that works like the built-in FullTextSearch engine. The core of the engine is a C++ program that searches a FullTextIndex text file. An FTI.txt file is created for each database that has the new search engine enabled.

There is a separate configuration database where you list all the databases on which you want to run the new search engine. It's just an add-on to Domino, and the default search engine still works for each database that is full-text indexed.

The new engine has some essential benefits compared to Domino's built-in engine, which were also the main reason for this project:
• Improved pattern-matching algorithm: word-break characters like "-" can be included in the pattern, and all combinations of * and ?, as well as brackets, work correctly. I am also planning to add exclude patterns, like ~*abc* (= do not find fields that contain *abc*).
• Better control over the fields you want to include in the search. Domino always includes unwanted fields in the search, which can produce wrong results. With the new engine you can get a list of the existing fields in the target database and select which fields to include or exclude.
• Faster search speed: first tests show that the C++ search is quite fast; a database with 300,000 documents is searched in a few seconds.
• Unlimited search results: the new search engine does not use any in-memory lists, but rather streams directly from the input file to the output file. In Domino 6.0.2 you were limited to 65534 results. This has been fixed in later Domino versions, but due to slowness problems in those versions, many companies are forced to use Domino 6.0.2CF2.
• Sorted search results: while Domino needs a separate view for each sorted search, the new engine does the sorting itself using a linear optimized quicksort, so you can get rid of a few big views and improve your database performance.
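
To make the pattern-matching and streaming points above a bit more concrete, here is a minimal sketch of the general idea. It is not the engine's actual code: the function names (wildcardMatch, streamFilter, filterText) are made up for illustration, and it assumes a simplified index layout with one text record per line, which may not be how FTI.txt is really laid out. It only covers * and ?; brackets and the planned ~ exclude patterns are left out.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Returns true if the whole of `text` matches `pattern`.
// '*' matches any run of characters (including none), '?' matches exactly
// one character, and everything else (including '-') must match literally.
bool wildcardMatch(const std::string& pattern, const std::string& text) {
    std::size_t p = 0, t = 0;
    std::size_t starPos = std::string::npos;  // index of the last '*' seen
    std::size_t starText = 0;                 // text index where that '*' began

    while (t < text.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            starPos = p++;        // remember the star, try matching nothing first
            starText = t;
        } else if (p < pattern.size() &&
                   (pattern[p] == '?' || pattern[p] == text[t])) {
            ++p;
            ++t;
        } else if (starPos != std::string::npos) {
            p = starPos + 1;      // backtrack: let the last '*' eat one more char
            t = ++starText;
        } else {
            return false;
        }
    }
    // Any trailing '*' in the pattern can match the empty remainder.
    while (p < pattern.size() && pattern[p] == '*') ++p;
    return p == pattern.size();
}

// Streams records (assumed: one per line) from `in` to `out`, writing only
// the matching ones. Only the current line is held in memory, so the number
// of results is not bounded by any in-memory list.
std::size_t streamFilter(std::istream& in, std::ostream& out,
                         const std::string& pattern) {
    std::size_t hits = 0;
    std::string line;
    while (std::getline(in, line)) {
        if (wildcardMatch(pattern, line)) {
            out << line << '\n';
            ++hits;
        }
    }
    return hits;
}

// Convenience wrapper for trying the filter on an in-memory string.
std::string filterText(const std::string& records, const std::string& pattern) {
    std::istringstream in(records);
    std::ostringstream out;
    streamFilter(in, out, pattern);
    return out.str();
}
```

For real files you would pass a std::ifstream opened on the index file and a std::ofstream for the results to streamFilter. A pattern like *full-text* then matches any record containing "full-text", with the "-" treated as an ordinary character rather than a word break.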


J Conradt
Thursday, 25 August 2005 23:45:16 EET
I would be interested to see what your results will be, and will check your blog from time to time.

All the best,
J

Gerald Mengisen
Wednesday, 31 August 2005 11:23:34 EET
Are you available by any chance for a few days of consulting work? We would like to tap into your search engine expertise. Please write to gerald-dot-mengisen-at-yahoo-dot-com (simply replace the -dot- and -at- with the respective characters). Thanks!

Mika Heinonen
Wednesday, 31 August 2005 18:17:40 EET
Hi Gerald,

Sure, I'm always open to discuss and improve the technology. I'll drop you a mail in a few moments.

Mika Heinonen
Wednesday, 31 August 2005 19:26:18 EET
Gerald,

Your yahoo mail account does not exist.
Can you write to me at mika.heinonen@siipi.com?