Free Republic
Browse · Search
News/Activism
Topics · Post Article

To: WIMom
You reminded me of the one horrible limitation of this search engine: it only indexes words that are at least four characters long. Not one of those TLAs will be indexed. :-(

I will have to document that somewhere.

66 posted on 12/07/2001 8:16:07 PM PST by John Robinson
[ Post Reply | Private Reply | To 65 | View Replies ]


To: John Robinson
Dang, it doesn't like spaces either. I tried INS_, where _ = the space character. Too bad, it leaves out a lot of 3-letter acronym searches. Otherwise it's great! Can you see my question in #58? What determines the relevance? If I search for Immigration, what makes one article more relevant than another? Your message says by word search; is that the number of times the word you are searching for appears in the article?
69 posted on 12/07/2001 8:31:41 PM PST by WIMom
[ Post Reply | Private Reply | To 66 | View Replies ]

To: John Robinson
We'll just have to make sure that Sheila Jackson Lee never rises to prominence then, since searching on her last name wouldn't work.

It's all about us. ;-)

87 posted on 12/08/2001 4:24:53 AM PST by Hugh Akston
[ Post Reply | Private Reply | To 66 | View Replies ]

To: John Robinson; WIMom
Well, this search engine is a hybrid.
The workhorse is the FULLTEXT index facility included with recent versions of MySQL. This workhorse ignores words shorter than four characters, so three-letter words are skipped. If you give the workhorse only three-letter words, it will return nothing.

However, if you give the workhorse something it can find, like "song" and "parody", and use "match all" or "match exact", my code will further restrict what the workhorse finds: it takes the results from the workhorse and then filters them again. So if you are looking for "song", "parody", and "you", the workhorse ignores "you", finds many records with "song" and/or "parody", and gives those results to my code, which then makes sure each has "song" AND "parody" AND "you".

And my code is not invoked if you do a "match any"; so "match any" will never find a three-letter word.

This might sound amazingly complicated, but it really isn't that much code. :-)
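For anyone who wants to see the two-stage idea spelled out, here is a minimal sketch in Python. It is not the site's actual code: the in-memory `fulltext_stage` is only a stand-in for the MySQL FULLTEXT lookup, and every name in it (`MIN_WORD_LEN`, `words`, `fulltext_stage`, `search`) is hypothetical. It assumes the four-character cutoff described in this thread.

```python
import re

# Assumed cutoff, taken from this thread: shorter words are never indexed.
MIN_WORD_LEN = 4


def words(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fulltext_stage(docs, terms):
    """Stand-in for the FULLTEXT index: return docs containing ANY indexed term.

    Terms shorter than MIN_WORD_LEN are silently dropped, so a query made
    only of short words finds nothing.
    """
    indexed = {t for t in terms if len(t) >= MIN_WORD_LEN}
    return [d for d in docs if indexed & set(words(d))]


def search(docs, query, mode="all"):
    """Two-stage search: index lookup first, then an optional second filter."""
    terms = words(query)
    candidates = fulltext_stage(docs, terms)
    if mode == "any":
        # No second pass, so short words play no part in the result.
        return candidates
    if mode == "all":
        # Second pass: every term, including short ones, must be present.
        return [d for d in candidates if set(terms) <= set(words(d))]
    if mode == "exact":
        # Second pass: the terms must appear as one whole-word phrase.
        phrase = " " + " ".join(terms) + " "
        return [d for d in candidates if phrase in " " + " ".join(words(d)) + " "]
    raise ValueError("mode must be 'any', 'all', or 'exact'")


docs = [
    "A song parody for you",
    "A song about rain",
    "FBI files released",
    "FBI memo",
]
print(search(docs, "song parody you", "all"))
print(search(docs, "fbi", "all"))
print(search(docs, "fbi files", "all"))
```

In this sketch, searching for "fbi" alone comes back empty because the first stage has nothing to look up, while "fbi files" gets through on "files" and the second pass then checks for "fbi".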

I see. So a "work-around" for the "ignore three-letter words" problem is to include in the search SOMETHING ELSE that is at least four characters long, and to use "match all" or "match exact", like:
fbi files - rather than just FBI
irs taxes - rather than just IRS
fox news - rather than just FOX
freep cnn - rather than just CNN
Also, FOUR-letter words - by themselves - seem to be O.K.:
waco
news
gore

108 posted on 12/08/2001 2:36:26 PM PST by RonDog
[ Post Reply | Private Reply | To 66 | View Replies ]



FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson