Xerox invents powerful next-generation content discovery software

Researchers at Xerox have developed next-generation content discovery software powerful enough to sift through almost any electronic data source, regardless of the language, location, format or type of document.

The new text-mining tool is tuned to the way humans think, speak and ask questions, and is able to highlight just a handful of relevant answers to search queries instead of returning thousands of unrelated responses.

Developed at Xerox Research Centre Europe (XRCE) in Grenoble, France, Xerox's text-mining software, FactSpotter, combines an advanced linguistic engine, which analyses the meaning of words and the construction of phrases and sentences, with an easy-to-use interface that allows a non-expert to conduct searches using everyday language.

FactSpotter is expected to be available from Xerox Global Services sometime between now and 2008 via the recently announced Xerox Litigation Services, which provides electronic discovery (e-discovery) services, primarily supporting litigation and regulatory compliance. The intelligent document technology complements Xerox's growing portfolio of services-related innovations that differentiate the Xerox Global Services offerings and help customers deal with document-intensive work processes.

FactSpotter promises a significant boost in productivity for data-intensive environments, including electronic legal discovery, risk management, pharmaceutical research, competitive and market intelligence, security intelligence and fraud detection; it will significantly reduce search times and also improve the relevance of results.

"Today's knowledge worker has quite a task in front of them as each and every day they search for specific data, information, or corporate knowledge in order to do their job well," said Mike Maziarka, Director for InfoTrends' Dynamic Content Software and Image Scanning Trends Consulting Services. "We all need tools that will make it easier to search for that needle among the haystack of masses of information that exists in our world today. FactSpotter meets this need because it can make searches easier to conduct, more accurate and more encompassing, ultimately improving the focus of the results and allowing workers to be more productive."

Unlike traditional search engines, which bring back a plethora of complete documents that contain the search term (e.g. a 20-page document with one mention of the Eiffel Tower), Xerox's text-mining software is smart and selective in its search. For instance, it returns only those portions of the document that contain relevant information. What is more, the portions do not even need to contain the actual search terms used: the engine can also track words that are similar in meaning. For example, FactSpotter knows that "Paris's tallest monument" refers to the Eiffel Tower.
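As a rough illustration of the idea of returning only the relevant portions of a document, including portions that refer to the search term in other words, here is a minimal Python sketch. It is not FactSpotter's method: the alias table, the function names and the sample text are all invented for this example, and a simple phrase lookup stands in for the linguistic analysis the article describes.

```python
import re

# Hypothetical alias table (an assumption for illustration only): maps a
# search term to phrases that refer to the same thing.
ALIASES = {
    "eiffel tower": ["eiffel tower", "paris's tallest monument"],
}


def split_passages(document):
    """Split a document into sentences, used here as the unit of retrieval."""
    parts = re.split(r"(?<=[.!?])\s+", document)
    return [p.strip() for p in parts if p.strip()]


def find_passages(document, query):
    """Return only the passages that mention the query or one of its aliases."""
    key = query.lower()
    phrases = ALIASES.get(key, [key])
    return [
        passage
        for passage in split_passages(document)
        if any(phrase in passage.lower() for phrase in phrases)
    ]


# Made-up sample document: the middle sentence never uses the search term.
doc = (
    "The report covers tourism in France. "
    "Paris's tallest monument is closed on Monday. "
    "Hotel prices rose in Lyon."
)

hits = find_passages(doc, "Eiffel Tower")
print(hits)
```

Here the search for "Eiffel Tower" returns only the middle sentence, even though that sentence never contains the search term, while the rest of the document is left out of the result.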

"This next-generation linguistic engine goes beyond today's keyword search and current data mining programs, which typically end up searching only 40 percent of all relevant documents," said Frédérique Segond, area manager of parsing and semantics, XRCE. "Xerox's tool is more accurate because it delves into documents, extracting the concepts and the relationships among them. By understanding the context, it returns the right information to the searcher, and it even highlights exactly where the answer is located within a document."

The new software goes beyond traditional search engines in several ways:

  • Its novel interface lets users express their queries naturally instead of forcing them to adapt their questions to the logic of computers. Traditional systems, on the other hand, split a query into isolated words and return only documents that contain exactly those words in exactly that order.
  • It takes into account the context of the entire document instead of just a cluster of nearby words. It introduces the concept of relation, searching within and across sentences and paragraphs.
  • It recognises abstract concepts, like "people" or "building", and will retrieve all the words that fit within that category.

These advanced capabilities enable Xerox's new software to find in seconds information that would otherwise be very difficult to uncover. Analysing the meaning of both the query and the searched document is critical, for example, during the electronic discovery phase of a legal trial because it allows specific facts to be found quickly and easily among thousands (and often millions) of different documents. In addition to legal applications, FactSpotter is also expected to be valuable in other situations where information must be retrieved from a massive database, including risk management, corporate and governmental searches, drug discovery and fraud detection.


About Xerox

Xerox markets a comprehensive range of Xerox products, solutions and services, as well as associated supplies and software. Its offerings are focused on three main areas: offices from small to large, production print and graphic arts environments, and services that include consulting, systems design and management, and document outsourcing.  Xerox also has manufacturing and logistics operations in Ireland, the UK and Holland, and a research and development facility (Xerox Research Centre Europe) in Grenoble, France.

 

