Lynda Moulton is a Senior Analyst and Consultant in enterprise search at Outsell's Gilbane Group. She has over 30 years as a professional search architect. She also consults on metadata management and taxonomies for content behind the firewall. Lynda is a DZone MVB and is not an employee of DZone and has posted 15 posts at DZone. You can read more from them at their website. View Full User Profile

Native Database Search vs. Commercial Search Engines

05.26.2011
| 6014 views |
  • submit to reddit

This topic is random and a short response to a question that popped up recently from a reader seeking technical research on the subject. Since none was available in the Gilbane library of studies, I decided to think about how to answer the subject with some practical suggestions.

The focus is on an enterprise with a substantive amount of content aggregated from a diverse universe of industry specific information, and what to do about searching it. If the information has been parsed and stored in an RDBMS database, is it not better to leverage the SQL query engine native to the RDBMS? Typical database engines might be: DB2, MS Access, MS SQL, MySQL, Oracle or Progress Software.

To be clear, I am not a developer but worked closely with software engineers for 20 years when I owned a software company. We worked with several DBMS products, three of them supported SQL queries and the application we invented and supported was a forerunner of today's content management systems with a variety of retrieval (search) interfaces. The retrievable content our product supported was limited to metadata plus abstracts up to two or three pages in length; the typical database sizes of our customers ranged from 250,000 to a couple of million records.

This is small potatoes compared to what search engines typically traverse and index today but scale was always an issue and we were well aware of the limitations of the SQL engines to support contextual searching, phrase searching and complex Boolean queries. It was essential that indexes be built in real time, when records were added whether manually through screen forms, or through batch loads. The engine needed to support explicit adjacency (phrase) searching as well as key words anywhere in a field, in a record, or in a set. Saving and re-purposing results, storing search strategies, narrowing large sets incrementally, and browsing indexes of terminology (taxonomy navigation) to select unique terms that would enable a Boolean "and" or "or" query were part of the application. When our original text-based DBMS vendor went belly-up, we spent a couple of years test driving numerous RDBMS products to find one that would support the types of searches our customers expected. We settled on Progress Software primarily because of its support for search and experience as an OEM to application software vendors, like us. Development time was minimized because of good application building tools and index building utilities.

So, what does that have to do with the original question, native RDBMS search vs. standalone enterprise search? Based on discussions and observations with developers trying to optimize search for special applications, using generic search tools for database retrieval, I would make the following observations. Search is very hard and advanced search, including concept searching, Boolean operations, and text analytics, is harder still. Developers of enterprise search solutions have grappled with and solved search problems that need to be supported in environments where content is dynamically changing and growing, different user interfaces for diverse audiences and types of queries are needed, and query results require varieties of display formats. Also, in e-commerce applications, interfaces require routine screen face lifts that are best supported by specialized tools for that purpose.

Then you need to consider all these development requirements; they do not come out-of-the-box with SQL search:


  • Full text indexes and database field or metadata indexes require independent development efforts for each database application that needs to be queried.

  • Security databases must be developed to match each application where individual access to specific database elements (records or rows) is required.

  • Natural language queries require integration with taxonomies, thesauri, or ontologies; this means software development independent of the native search tools.

  • Interfaces must be developed for search engine administrators to make routine updates to taxonomies and thesauri, retrieval and results ranking algorithms, adjustments to include/exclude target content in the databases. These content management tasks require substantive content knowledge but should not require programming expertise and must be very efficient to execute.

  • Social features that support interaction among users and personalization options must be built.

  • Connectors need to be built to federate search across other content repositories that are non-native and may even be outside the enterprise.


Any one of these efforts is a multi-person and perpetual activity. The sheer scale of the development tasks mitigate against trying to sustain state-of-the-art search in-house with the relatively minimalist tools provided in most RDBMS suites. The job is never done and in-depth search expertise is hard to come by. Software companies that specialize in search for enterprises are also diverse in what they offer and the vertical markets they support well. Bottom line: identify your business needs and find the search vendor that matches your problem with a solution they will continue to support with regular updates and services. Finally, the issue of search performance and speed of processing are another huge factor to consider. For this you need some serious technical assessment. If the target application is going to be a big revenue generator with heavy loads and huge processing, do not overlook. Do benchmarks to prove the performance and scalability.
Read more: http://gilbane.com/search_blog/2009/02/native_database_search_vs_comm.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+EnterpriseSearchPracticeBlog+%28Enterprise+Search+Practice+Blog%29&utm_content=Google+Reader#ixzz1NSZoNFtQ
References
Published at DZone with permission of Lynda Moulton, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Mark Unknown replied on Thu, 2011/05/26 - 9:40am

Why is this "Native Database Search vs. Commercial Search Engines" and not just "Search Engines"? What also needs to be included are "non-relational databases". Another question: How can some one be a "professional search architect" and "not a developer"? I am not saying she does not have insight, but personal experience goes a long way to truly understanding a problem and the solution. "Full text indexes and database field or metadata indexes require independent development efforts for each database application that needs to be queried". I would say this is not [100%] true. And what is a "database application"? To me, it seems she is looking at the problem from the wrong end (i.e. The sun does not revolve around the earth).

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.