Document the search engine syntax and bad word list


Schnoodledorfer
12th February 2006, 21:52
I've had a rough time trying to figure out how to use the forum's search engine well. Basically, there are three issues that have caused problems.

1) The search engine isn't documented well.

This is all the FAQ (http://forum.doom9.org/faq.php?faq=vb_board_usage) says: Can I search the forum?

You can search for posts based on username, word(s) in the post or just in the subject, by date, and only in particular forums.

To access the search feature, click on the "search" link at the top of most pages.

You can search any forum that you have permission to search - you will not be allowed to search through private forums unless the administrator has given you the necessary security rights to do so.

That's not much help!

2) The Advanced Search (http://forum.doom9.org/search.php) page uses a syntax that is not used by the search vBMenu popup.

It's obvious that the popup doesn't give you all of the options that the Advanced Search page does, but I never expected that it would ignore the search operators that the advanced search engine accepts. This inconsistency is confusing. [EDIT: Apparently, I was wrong about this. I now think both do a boolean search, but they display the keywords differently among other things. Also, the full forum search popup displays the results as threads, but the match might have been a post far down the thread that I didn't see. I don't think I took that into account.]

3) Frequently used words aren't indexed and that isn't documented.

Even on the Advanced Search page, if you search for +x264 +MP4 +MeGUI , you get some results that don't include "MP4". At first glance, it appears that the + has no effect, so it would be reasonable to conclude that the search engine doesn't support the '+' operator, even though it does. (The real issue is that MP4 is used so often that it apparently isn't indexed, so it is completely ignored, but that's not documented.)
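To illustrate what I think is happening, here is a minimal Python sketch. It is only a model of the behaviour, not the forum's actual code: the UNINDEXED set and both function names are made up for the example, and I don't know what the real list of unindexed words contains.

```python
# Hypothetical illustration only: the forum's real index and word list are unknown.
UNINDEXED = {"mp4", "faq"}  # assumed examples of words too common to be indexed


def effective_terms(query, unindexed=UNINDEXED):
    """Return the required (+) terms that survive after unindexed words are dropped."""
    required = [t[1:] for t in query.split() if t.startswith("+") and len(t) > 1]
    return [t for t in required if t.lower() not in unindexed]


def matches(post_text, query):
    """True if the post contains every required term that was not silently dropped."""
    words = set(post_text.lower().split())
    return all(t.lower() in words for t in effective_terms(query))
```

With this model, a post that never mentions MP4 still matches +x264 +MP4 +MeGUI, because the MP4 term is discarded before matching. That looks exactly like a '+' operator that doesn't work.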

I have some ideas that I think would help. Keep in mind that I'm not a vBulletin expert, though.

1) Please do one of these things to the search popup:
1a) Disable it entirely and replace it with a link directly to the Advanced Search page. (My preference.) I think the fact that the popup causes confusion and causes a person to have to wait and click a second time to get to the Advanced Search page outweighs the benefit of having it.
1b) Make the popup use the same syntax that the Advanced Search page uses. (I doubt this is possible, though.) EDIT: It can be done, supposedly! See http://www.vbulletin.com/forum/showpost.php?p=1007001&postcount=4
1c) Document the fact that the popup doesn't use the syntax modifiers that the Advanced Search page uses and perhaps rename the popup "Simple forum search". Also, I would like you to add a link directly to the Advanced Search page on the basic page in addition to the popup. This at least would not require me to click and wait and click again just to get to the Advanced Search page.

2) Document how the search modifiers work. I don't think you have enabled MySQL fulltext searching (that's a vBulletin option, apparently) [EDIT: Now I think you do], but the Advanced Search page seems to use the same syntax (not that I've tried everything). If the syntax does match, you could probably just copy what it says here (http://dev.mysql.com/doc/refman/5.0/en/fulltext-boolean.html). Again, if the popup remains enabled, but it can't be made to use this syntax, that distinction should be documented. [EDIT: Now I understand that it's not a matter of whether or not you are using fulltext searching, but whether the queries are sent as a boolean fulltext query or a natural language fulltext query. The Advanced Search page apparently uses boolean fulltext queries, while the popup uses natural language fulltext queries (if I've got this figured out).] [Edit: Apparently, I didn't. I now think both are boolean and probably (but not necessarily) fulltext]
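For anyone wondering what the boolean versus natural language distinction looks like in MySQL itself, here is a rough Python sketch that builds both kinds of query. The table and column names (post, postid, title, pagetext) and the function name are placeholders I picked for the example. I have not checked them against vBulletin's schema, and this is not how vBulletin necessarily builds its queries.

```python
# Table and column names are hypothetical, not taken from vBulletin's schema.
def fulltext_sql(boolean_mode):
    """Build a parameterized MySQL fulltext query.

    The search string itself is passed separately as the %s parameter.
    Without a modifier, MySQL runs a natural language search, in which
    + and - are not treated as operators.
    """
    modifier = " IN BOOLEAN MODE" if boolean_mode else ""
    return (
        "SELECT postid FROM post "
        "WHERE MATCH (title, pagetext) AGAINST (%s" + modifier + ")"
    )
```

The only difference between the two queries is the IN BOOLEAN MODE modifier, so the same search string can behave quite differently depending on which form is sent.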

3) Also document which words aren't indexed, or at least the fact that more words than the usual ones aren't indexed. (I don't know if it is a simple matter to generate a list or not.) I think most people would expect "and", "the", etc. to not be indexed, but "MP4" and "FAQ" probably wouldn't occur to them (at least they didn't to me until I thought about it). EDIT: Or does the forum use MySQL's fulltext default of not indexing 3-letter words? @Doom (http://forum.doom9.org/showthread.php?p=563923#post563923) says the limit is 2 letters, but is that still true after the upgrade? In any case, MySQL's default list is here (http://dev.mysql.com/doc/refman/5.0/en/fulltext-stopwords.html).

4) Document how to use Google as an alternative to search the forum. Sometimes Google can find things that the forum search engine can't find. For example, Googling +MP4 +FAQ site:forum.doom9.org returns the MP4 FAQ as the first link, while the forum's search engine returns nothing at all (apparently because neither "MP4" nor "FAQ" is indexed). Also, Google seems to do a better job of calculating relevancy, IMO.
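A site-restricted Google query like the one above is easy to generate. Here is a small Python sketch, where the function name and the plain http://www.google.com/search endpoint are my own choices for the example:

```python
from urllib.parse import urlencode


def google_site_search_url(terms, site="forum.doom9.org"):
    """Build a Google search URL restricted to one site."""
    # Append the site: restriction to the user's search terms.
    query = " ".join(list(terms) + ["site:" + site])
    # urlencode percent-encodes '+' and ':' and turns spaces into '+'.
    return "http://www.google.com/search?" + urlencode({"q": query})
```

For example, google_site_search_url(["+MP4", "+FAQ"]) produces a link for the +MP4 +FAQ search mentioned above.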

One thing that might make the Google searching even better would be to change the "Forum Archive Posts Per Page" setting in vBulletin to 1. Since Google searches on a per-page basis, a search on Google winds up finding matches on a per-thread basis (there seems to be one thread per page as if "Forum Archive Threads Per Page" is set to one). Of course, there might be overriding concerns that I'm not aware of, and I'm just interpreting what I think is happening, so I could be wrong.

5) This is probably a vBulletin bug, but just in case it's not: A (minor) problem with using the search operators is that they aren't removed in the links to the found posts. Example: I just searched for +x264 +MeGUI (using the Advanced Search page, of course). The first item found had a URL that ended with &highlight=%2Bx264+%2BmeGUI#post761504 (http://forum.doom9.org/showthread.php?p=761504&highlight=%2Bx264+%2BmeGUI#post761504). If the link ended with &highlight=x264+meGUI#post761504 (http://forum.doom9.org/showthread.php?p=761504&highlight=x264+meGUI#post761504), instead (no %2B = '+' in the link), each use of 'x264' and 'MeGUI' would be highlighted. That would be nicer.
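The cleanup I have in mind amounts to stripping the leading operators from each word in the highlight parameter. Here is a minimal Python sketch of that idea. The function name is made up, it only handles leading '+' and '-', and it is not a patch for vBulletin itself (which is PHP).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def clean_highlight(url):
    """Remove leading + and - operators from each word in the highlight parameter."""
    parts = urlsplit(url)
    pairs = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == "highlight":
            words = [w.lstrip("+-") for w in value.split()]
            value = " ".join(w for w in words if w)  # skip words that were only operators
        pairs.append((key, value))
    # Rebuild the URL with the cleaned query string; the #post fragment is preserved.
    return urlunsplit(parts._replace(query=urlencode(pairs)))
```

Applied to the first link above, this should give the second one, with highlight=x264+meGUI.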

Anyway, hopefully these ideas would increase the use of the search engine without requiring too much effort on anyone's part.

Schnoodledorfer
13th February 2006, 02:45
One problem with using Google to search the forum is that Google's spiders find the same posts multiple times, but with different parameters in the URL, so Google doesn't recognize the duplication. I think the solution is to use robots.txt to direct the spiders to http://forum.doom9.org/archive/ (vBulletin creates that specifically for spiders) and to block them otherwise.
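Something like the following is what I mean. The robots.txt rules are my guess at what would work, not something I've tested against the forum, and they rely on the Allow directive, which is an extension that Google honors but not every spider does. The Python wrapper just checks the rules with the standard library's robots.txt parser, and the can_crawl name is made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules: allow only the spider-friendly archive, block everything else.
ROBOTS_TXT = """\
User-agent: *
Allow: /archive/
Disallow: /
"""


def can_crawl(url, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """Report whether the given rules would let a spider fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

With these rules, pages under http://forum.doom9.org/archive/ are allowed, while showthread.php links with their varying parameters are blocked.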

Also, it looks like it would be relatively easy to add links like "http://www.google.com/advanced_search?q=+site:forum.doom9.org" to the search popup.