Free Republic
Browse · Search
News/Activism
Topics · Post Article


The search engine that would outdo Google
MarketWatch | Nov 7, 2006 | Bambi Francisco

Posted on 11/07/2006 8:42:00 PM PST by annie laurie

I was fortunate enough to be the first journalist to get an in-depth demonstration of Powerset, which is a natural-language search engine. Simply put, the technology analyzes the meaning and relationships of words in context so that it can accommodate questions asked in natural language, such as "How're the Giants doing," rather than questions asked in sketchy "keywordese" inputs, like "scores, Giants."

...

Since Powerset indexes a fraction of a fraction of what Google currently handles, we confined our test of Powerset to searching The New York Times and Wikipedia sites and then checked how Google stacked up on the same searches.

Sample input: "What does News Corp. own?" In Powerset, the top three results were very relevant, and specific. One was a link to a document about Fox TV studios owned by News Corp; another was a link to Balkan News, owned by News Corp; yet another was about Foxtel being owned by News Corp. The same search on Google generated relatively relevant results, such as a link to the Wikipedia page about News Corp. But one result was a link to a Netscape "news" story about President Bush and how the U.S. "does" not torture prisoners.

By returning the Netscape page, Google implied that its technology judged the page relevant because the words "news" and "does" appeared close together. But the result had nothing to do with the company News Corp. In the query, "News" was part of a company name, and "does" referred to an action by that company.

That's the problem with existing search engines, Powerset's founders say. Conventional search engines index words based on how often they occur and how close they are to one another. Where they fall short is that they don't index the relationships between words or the meanings of the words ...

(Excerpt) Read more at marketwatch.com ...
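To make the occurrence-and-proximity idea concrete, here is a minimal sketch, assuming a toy in-memory corpus and a made-up scoring rule: one point per occurrence of a query term, plus a bonus when two different query terms sit within a few tokens of each other. The document texts, the `window` size, and the weights are all hypothetical; real engines are far more elaborate. It shows how a page pairing "news" with "does" can score as well as a page that is actually about News Corp.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/digits ("U.S." becomes "u", "s").
    return re.findall(r"[a-z0-9]+", text.lower())

def build_positional_index(docs: dict[str, str]) -> dict:
    # term -> doc_id -> token positions where the term occurs
    index: dict = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index

def score(index: dict, query: str, window: int = 3) -> dict[str, float]:
    terms = tokenize(query)
    scores: dict[str, float] = defaultdict(float)
    # Occurrence: one point for every time a query term appears in a document.
    for term in terms:
        for doc_id, positions in index.get(term, {}).items():
            scores[doc_id] += len(positions)
    # Proximity: a bonus when two different query terms sit within `window` tokens.
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            if a == b:
                continue
            for doc_id, pos_a in index.get(a, {}).items():
                pos_b = index.get(b, {}).get(doc_id, [])
                if any(abs(x - y) <= window for x in pos_a for y in pos_b):
                    scores[doc_id] += 2.0
    return dict(scores)

# Hypothetical pages standing in for the two results discussed above.
docs = {
    "newscorp": "News Corp owns Fox TV studios and Foxtel",
    "netscape": "Bush news: U.S. does not torture, president says",
}
scores = score(build_positional_index(docs), "What does News Corp. own?")
print(scores["newscorp"], scores["netscape"])  # 4.0 4.0
```

The two pages tie: "news" sitting three tokens from "does" earns the same bonus as "News" next to "Corp", and without stemming "own" does not even match "owns". Nothing in this kind of index records what role each word plays in the sentence.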


TOPICS: Miscellaneous
KEYWORDS: google; internet; powerset; searchengine
To: Blue Highway
Askjeeves.com sucked. No wonder they got rid of the butler. Does anyone still use ask.com?

Askjeeves cheated. They used human editors to precook results for common questions.

However, ask.com is actually somewhat useful. In Bambi's interview with Pell, he cites searching for "books by children" as an example of a search that is hard for regular search engines but easy for Powerset (because it understands the preposition "by" instead of tossing it as noise). Ask.com actually does better on this problem than Google. If you submit the query to Ask, it shows "Narrow Your Search" links over on the right, which are more or less on target.
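Here is why tossing "by" as noise hurts, as a minimal sketch assuming a keyword engine that lowercases the query, drops a small stopword list, and treats what remains as an unordered set of terms. The stopword list is hypothetical. Once "by" and "for" are gone, "books by children" and "books for children" become the same query.

```python
import re

# Hypothetical stopword list; real engines tune their own.
STOPWORDS = frozenset({"a", "an", "and", "by", "for", "of", "the", "to"})

def keyword_bag(query: str, stopwords: frozenset = STOPWORDS) -> frozenset:
    # Lowercase, split into words, drop stopwords, and forget word order.
    words = re.findall(r"[a-z0-9]+", query.lower())
    return frozenset(w for w in words if w not in stopwords)

# With "by" and "for" discarded, both queries collapse to {"books", "children"}.
print(keyword_bag("books by children") == keyword_bag("books for children"))  # True

# Keeping the prepositions preserves the distinction.
no_stop = frozenset()
print(keyword_bag("books by children", no_stop) == keyword_bag("books for children", no_stop))  # False
```

Keeping the words is necessary but not sufficient: a bag still loses word order, so "books by children" and "children by books" look alike. Telling who wrote what requires the relationship itself, which is the point of the "by" example.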

On the other hand, there are limits. Neither Google nor Ask does very well on "journalist named after a disney character", for instance.

I find that the hard queries are the ones in which the mere conjunction of the keywords isn't enough — where the keywords don't have much selectivity, but their relationship does. It's a hard problem. It requires the computer to understand the ideas presented in the text at some level, pick out and index the semantic relationships, then understand the query at a similar level and search the index. It will be interesting to see if Powerset is any more successful than the existing attempts.
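The pipeline in the previous paragraph (pick out relationships, index them, interpret the query the same way, then search the index) can be sketched with a toy triple store. This is a minimal illustration, not Powerset's method: it assumes the relationships have already been extracted as hypothetical (subject, relation, object) triples and that the question has already been turned into a pattern with one unknown slot. The genuinely hard part, the language understanding that produces those triples and patterns, is left out.

```python
from collections import defaultdict

class RelationIndex:
    """Toy index of (subject, relation, object) triples."""

    def __init__(self) -> None:
        # Two lookup tables so either end of a relation can be the unknown.
        self._objects = defaultdict(set)   # (subject, relation) -> objects
        self._subjects = defaultdict(set)  # (relation, object) -> subjects

    def add(self, subject: str, relation: str, obj: str) -> None:
        self._objects[(subject, relation)].add(obj)
        self._subjects[(relation, obj)].add(subject)

    def objects(self, subject: str, relation: str) -> set:
        # Answers "What does <subject> <relation>?"
        return set(self._objects.get((subject, relation), ()))

    def subjects(self, relation: str, obj: str) -> set:
        # Answers "Who <relation> <obj>?"
        return set(self._subjects.get((relation, obj), ()))

index = RelationIndex()
# Hypothetical triples, as if extracted from sentences such as
# "Foxtel is owned by News Corp." (passive voice normalized to "owns").
index.add("News Corp", "owns", "Fox TV Studios")
index.add("News Corp", "owns", "Balkan News")
index.add("News Corp", "owns", "Foxtel")

# "What does News Corp. own?" is assumed to be parsed into (News Corp, owns, ?).
print(sorted(index.objects("News Corp", "owns")))
# ['Balkan News', 'Fox TV Studios', 'Foxtel']
```

A story that merely contains the words "news" and "does" contributes no (News Corp, owns, ...) triple, so it cannot match. Here the selectivity comes from the relationship, not from the individual keywords.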

Powerset's future: (1) forget about it or (2) get acquired by Google or Microsoft or (3) give up on being a public search engine in favor of producing a more powerful intranet search product or (4) displace Google (least likely).

21 posted on 11/07/2006 10:32:36 PM PST by cynwoody
[To 13]

To: rit

"HP Deskjet" "ink cartridge" "next day delivery" and "visa"

Surprisingly, only 107 links. Now if you remove the term 'and', the count goes up to 416.

Moreover, when you include 'and', Google tells you it has no effect, yet clearly it does.
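Adding a required term can only shrink the result set, because conjunctive (implicit AND) retrieval keeps just the documents that contain every term. If 'and' were quietly kept as a required term despite the message, that alone could explain a smaller count; this is a guess, not a description of how Google actually processed the query. A minimal sketch over a hypothetical two-page corpus, ignoring phrase quoting for simplicity:

```python
import re

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def matching_docs(docs: dict[str, str], query: str, ignore: frozenset = frozenset()) -> set[str]:
    # Implicit AND: a document matches only if it contains every query term
    # that is not in `ignore`. Quotes are stripped, so phrases are not enforced.
    required = {t for t in tokenize(query) if t not in ignore}
    return {doc_id for doc_id, text in docs.items() if required <= set(tokenize(text))}

# Hypothetical product pages.
docs = {
    "d1": "HP Deskjet ink cartridge with next day delivery, pay by Visa",
    "d2": "HP Deskjet ink cartridge and next day delivery; we accept Visa and Mastercard",
}
query = '"HP Deskjet" "ink cartridge" "next day delivery" and "visa"'

print(sorted(matching_docs(docs, query)))                             # ['d2']
print(sorted(matching_docs(docs, query, ignore=frozenset({"and"}))))  # ['d1', 'd2']
```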

I tried your query on Froogle, and it gave no results at all until I got rid of the last two terms. One would think Froogle could tackle delivery modes and payment methods, given that it already knows the context is shopping.
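In a shopping context, "next day delivery" and "visa" describe the offer rather than the page text, so they could be matched against structured fields instead of being required as keywords. A minimal sketch, assuming hypothetical offer records with `shipping` and `payment` fields and a hand-written table mapping query phrases to filters; it is not how Froogle worked.

```python
from dataclasses import dataclass, field

@dataclass
class Offer:
    title: str
    shipping: set[str] = field(default_factory=set)
    payment: set[str] = field(default_factory=set)

# Hypothetical mapping from query phrases to (field name, required value).
FACET_PHRASES = {
    "next day delivery": ("shipping", "next_day"),
    "visa": ("payment", "visa"),
}

def search_offers(offers: list[Offer], query_phrases: list[str]) -> list[Offer]:
    keywords: list[str] = []
    filters: list[tuple[str, str]] = []
    # Route known phrases to structured filters; everything else stays a title keyword.
    for phrase in query_phrases:
        key = phrase.lower()
        if key in FACET_PHRASES:
            filters.append(FACET_PHRASES[key])
        else:
            keywords.append(key)
    results = []
    for offer in offers:
        title = offer.title.lower()
        if all(k in title for k in keywords) and all(
            value in getattr(offer, name) for name, value in filters
        ):
            results.append(offer)
    return results

offers = [
    Offer("HP Deskjet ink cartridge, black", {"next_day", "ground"}, {"visa", "paypal"}),
    Offer("HP Deskjet ink cartridge, color", {"ground"}, {"visa"}),
]
hits = search_offers(offers, ["HP Deskjet", "ink cartridge", "next day delivery", "visa"])
print([o.title for o in hits])  # ['HP Deskjet ink cartridge, black']
```

Neither title mentions delivery or payment, so requiring all four phrases as text would return nothing, much like the empty Froogle result. Routing the last two phrases to fields keeps the offer that actually ships next day.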

22 posted on 11/07/2006 10:53:28 PM PST by cynwoody
[To 20]

To: cynwoody
Another thing that frustrates me is that many web sites could handle arbitrary searches in the URL, but do not. Consider, for example:
http://www.verizon.com/cell%20phones

You would think that Verizon would be smart enough to automatically search for pages containing the phrase "cell phones"... but no... they just return "We are not able to process your request."

The same applies to just about any web site out there. Even Google should be able to handle the request. Try typing http://www.google.com/cell%20phones and you get back an error page. Good grief... a search engine site that cannot even be bothered to search for a reasonable answer.
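The suggestion amounts to a fallback route: when a path matches no page, treat the decoded path as a search query instead of returning an error. A minimal sketch, assuming a hypothetical site whose search page lives at `/search?q=`; the helper name is illustrative, and it only computes the redirect target, leaving the web framework out.

```python
from typing import Optional
from urllib.parse import unquote, urlencode

def fallback_search_target(path: str, search_path: str = "/search") -> Optional[str]:
    """Turn an unmatched path like '/cell%20phones' into a search URL, or None if it is empty."""
    # Decode percent-escapes ('%20' -> ' '), treat slashes as word breaks, collapse whitespace.
    query = " ".join(unquote(path).replace("/", " ").split())
    if not query:
        return None  # a bare '/' gives nothing to search for
    return f"{search_path}?{urlencode({'q': query})}"

print(fallback_search_target("/cell%20phones"))  # /search?q=cell+phones
```

Treating slashes as spaces is a design choice: it lets `/phones/cell` become a search for "phones cell" rather than failing on a nested path.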

So much to do... so little time...

23 posted on 11/07/2006 11:01:22 PM PST by rit
[To 22]

To: annie laurie
Here's another natural language search engine: Lexxe. They are out of stealth mode, so you can actually try queries.

Let's try Pell's example, "books by children". Not bad.

Let's see if it can tell us how long a shake is. Oops, make that time, not shingles. Nope, can't get shingles out of its silicon head.

Now let's google it. Bingo! First hit.

24 posted on 11/08/2006 12:11:29 AM PST by cynwoody
[To 1]

To: annie laurie

This is a GOOD THING. Google's politics make me yearn for another option.


25 posted on 11/08/2006 12:16:27 AM PST by glorgau
[To 1]

To: rit
It's all about knowing the format:

http://www.google.com/search?q=d'oh

http://www.google.com/search?q=cell+phones+site%3Averizon.com
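The second URL is a percent-encoded query string: spaces become `+` and the colon in the `site:` operator becomes `%3A`. A small sketch that builds such URLs with Python's standard `urllib.parse.urlencode`; the `build_search_url` helper and its `site` parameter are illustrative, not part of any Google API.

```python
from typing import Optional
from urllib.parse import urlencode

def build_search_url(terms: str, site: Optional[str] = None) -> str:
    # Optionally restrict results to one domain with the site: operator.
    query = f"{terms} site:{site}" if site else terms
    # urlencode quotes with quote_plus: ' ' -> '+', ':' -> '%3A', "'" -> '%27'.
    return "http://www.google.com/search?" + urlencode({"q": query})

print(build_search_url("cell phones", site="verizon.com"))
# http://www.google.com/search?q=cell+phones+site%3Averizon.com
```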

26 posted on 11/08/2006 7:35:52 AM PST by PissAndVinegar
[To 23]

Is anyone using Clusty.com, and how does it compare?
27 posted on 11/08/2006 8:02:48 AM PST by tubebender (Growing old is mandatory...Growing up is optional)
[To 1]

To: cynwoody
I tried the following query on Lexxe
     define operating system service

Their response is
     Sorry, Lexxe is not sure if there is a correct answer for your query. Please check the web and cluster results. Thank you.

28 posted on 11/08/2006 8:10:36 AM PST by rit
[To 24]

To: stainlessbanner

Thailand?


29 posted on 11/08/2006 8:11:25 AM PST by Lazamataz (Thats the spirit.)
[To 4]

To: rit
I should also note that Lexxe did have an answer for the query:
     who makes more money, mom or dad?

The answer is:
     bluesuitmom

30 posted on 11/08/2006 8:14:49 AM PST by rit
[To 28]

To: annie laurie

(Opinion) Google is easy enough to use by putting in keywords.


31 posted on 11/08/2006 8:55:55 AM PST by Jedi Master Pikachu ( Creationists are as smart as Macroevolutionists.)
[To 1]

To: KarlInOhio

Google is clean and uncluttered, and yet it is more successful at getting advertising revenue than its competitors.


32 posted on 11/08/2006 8:57:05 AM PST by Jedi Master Pikachu ( Creationists are as smart as Macroevolutionists.)
[To 3]



Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.


FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson