Search on a local PC holds no mysteries. It has no notable features apart from the choice of file type (media, text, etc.) and of the search location. Just enter the name of the file you are looking for (or a fragment of its text, for example from a Word document) and that’s it. The speed and the result depend entirely on the text entered in the query line. There is no intelligence involved: the system simply goes through the available files to determine their relevance. This is understandable in its own way: what is the use of building a sophisticated system for such uncomplicated needs?
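To show how little is involved, here is a minimal sketch of such a local search in Python. It is an illustration only, not the code of any real desktop search tool: the function name local_search and its parameters are hypothetical, and files are assumed to be readable as plain text.

```python
from pathlib import Path


def local_search(root, query, extensions=None, search_content=False):
    """Return files under `root` whose name (or, optionally, text) contains `query`."""
    query = query.lower()
    matches = []
    # Walk every file below the search location, in a stable order.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Optional file-type filter, e.g. {".txt", ".md"}.
        if extensions is not None and path.suffix.lower() not in extensions:
            continue
        if query in path.name.lower():
            matches.append(path)
        elif search_content:
            try:
                text = path.read_text(encoding="utf-8", errors="ignore")
            except OSError:
                continue  # unreadable file: skip it
            if query in text.lower():
                matches.append(path)
    return matches
```

A call such as local_search("some_folder", "report", extensions={".txt"}, search_content=True) does nothing more than a substring comparison against every file, which is exactly why the outcome depends entirely on the text of the query.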
Global search technologies
Matters are entirely different for search systems operating on the global network. One cannot rely on simply looking through the available data. The huge volume of unstructured information in the global chaos (Yandex, for instance, boasts an index covering more than 11 terabytes of data) makes simple search not only ineffective but also slow and labor-intensive. That is why the focus has lately shifted towards optimizing search and improving its quality. But the scheme is still very simple (apart from the secret innovations of each individual system): phrasal search through an indexed database, with due consideration for morphology and synonyms. Undoubtedly, such an approach works, but it does not solve the problem completely. Reading the dozens of articles devoted to improving search with Google or Yandex, one comes to the conclusion that, without knowing the hidden capabilities of these systems, finding a relevant document for a query takes more than a minute, and sometimes more than an hour. The problem is that this kind of search depends heavily on the query word or phrase entered by the user. The vaguer the query, the worse the search. This has become an axiom, or dogma, whichever you prefer.
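The general scheme (phrasal search over an index, with morphology and synonyms taken into account) can be sketched in a few lines of Python. This is a toy under stated assumptions, not how Google or Yandex work: the suffix-stripping stem function is a crude stand-in for real morphological analysis, the SYNONYMS table is a hypothetical three-word thesaurus, and all names are invented for the illustration.

```python
from collections import defaultdict

# Hypothetical toy thesaurus: each word maps to one canonical synonym.
SYNONYMS = {"automobile": "car", "auto": "car", "quick": "fast"}


def stem(word):
    """Crude suffix stripping; a stand-in for real morphological analysis."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text):
    """Lowercase, split on non-letters, stem, then map synonyms to one term."""
    words = "".join(c if c.isalpha() else " " for c in text.lower()).split()
    terms = []
    for word in words:
        stemmed = stem(word)
        terms.append(SYNONYMS.get(stemmed, stemmed))
    return terms


def build_index(docs):
    """Map each normalized term to {doc_id: [positions in that document]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(normalize(text)):
            index[term][doc_id].append(pos)
    return index


def phrase_search(index, phrase):
    """Return ids of documents that contain the phrase as consecutive terms."""
    terms = normalize(phrase)
    if not terms or any(t not in index for t in terms):
        return []
    hits = []
    for doc_id, starts in index[terms[0]].items():
        for start in starts:
            # Every following term must sit at the next position in this doc.
            if all(start + i in index[t].get(doc_id, ())
                   for i, t in enumerate(terms)):
                hits.append(doc_id)
                break
    return sorted(hits)
```

With documents such as "Fast cars for sale" and "Quick automobiles and slow trains", the query "quick car" matches both, because word forms and synonyms collapse to the same index terms. A document that contains the same words in a different order is not matched, and a query worded with a term absent from the index returns nothing at all, which is the dependence on the user’s phrasing described above.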
Of course, by using the key functions of the search systems intelligently and choosing carefully the phrase by which documents and sites are searched, it is possible to get acceptable results. But this would be the fruit of painstaking mental work and of time wasted looking through irrelevant information in the hope of finding at least some clue on how to improve the query. In general, the scheme is as follows: enter a phrase, look through several results, realize that the query was not the right one, enter a new phrase, and repeat these stages until the relevance of the results reaches the highest possible level. But even then the chances of finding the right document remain slim. No average user will voluntarily take on the sophistication of “advanced search” (although it is equipped with a number of very useful functions such as the choice of language, file format, etc.). The best thing would be simply to enter a word or phrase and get a ready answer, without any particular concern for how it was obtained. Let the horse do the thinking: it has a big head. This may not be exactly to the point, but the name of one Google search function, “I’m Feeling Lucky”, characterizes existing search technologies very well. Nevertheless, the technology works, not ideally and not always living up to expectations, but if you allow for the complexity of searching through the chaos of Internet data, it can be considered acceptable.
Third on the list are the turnkey solutions built on search technologies. They are meant for serious companies and corporations that possess really large databases and are equipped with all sorts of information systems and documents. In principle, the same technologies can also be used at home. For example, a programmer working remotely can make good use of search to find program source code scattered across his hard drive. But these are particulars. The main application of the technology is still to search large data volumes quickly and accurately and to work with various information sources. Such systems usually operate by a very simple scheme (although there are undoubtedly numerous unique methods of indexing and query processing beneath the surface): phrasal search with due consideration for all the stem forms, synonyms, etc., which once again brings us to the human factor. To use such a technology, the user must first formulate the query phrases that will serve as search criteria and that presumably occur in the documents to be retrieved. But there is no guarantee that the user will be able to choose or remember the correct phrase unaided, or that a search by this phrase will be satisfactory.
Another key point is the speed of query processing. Of course, when a whole document is used as the query instead of a couple of words, the accuracy of search increases many times over. But so far this possibility has gone unused because such a process is so resource-hungry. The point is that search by words or phrases does not return results that closely resemble what is being sought, while search by a phrase as long as a whole document consumes a great deal of time and computing resources. Here is an example. When a one-word query is processed, the difference in speed is negligible: whether it takes 0.1 or 0.001 second is of no crucial importance to the user. But take an average-sized document containing about 2000 unique words: searching with consideration for morphology (stem forms) and thesaurus (synonyms), and generating a relevance-ranked list of results as in keyword search, will take several dozen minutes, which is unacceptable for a user.
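A back-of-the-envelope calculation in Python shows why the cost grows this way. Only the figure of 2000 unique words comes from the text; the number of word forms, the number of synonyms and the time per index lookup are placeholder assumptions chosen for illustration, not measurements of any real system.

```python
def expanded_lookups(unique_words, forms_per_word, synonyms_per_word):
    """Count index lookups when each word is expanded to synonyms and word forms."""
    # The word itself plus its synonyms, each taken in every word form.
    variants_per_word = (1 + synonyms_per_word) * forms_per_word
    return unique_words * variants_per_word


def estimated_seconds(lookups, ms_per_lookup):
    """Total time if every lookup costs the same number of milliseconds."""
    return lookups * ms_per_lookup / 1000


# Assumed, illustrative values: 5 forms per word, 3 synonyms, 30 ms per lookup.
one_word = expanded_lookups(1, forms_per_word=5, synonyms_per_word=3)
whole_doc = expanded_lookups(2000, forms_per_word=5, synonyms_per_word=3)

print(one_word, estimated_seconds(one_word, 30))          # 20 lookups, 0.6 s
print(whole_doc, estimated_seconds(whole_doc, 30) / 60)   # 40000 lookups, 20.0 min
```

Under these assumptions a single word stays well under a second, while the whole-document query needs 2000 times as many lookups and lands in the range of tens of minutes. Different assumed values change the totals but not the proportion between the two cases.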
The interim summary
As we can see, the existing search systems and technologies, although they function properly, do not solve the problem of search completely. Where the speed is acceptable, the relevance leaves much to be desired. Where the search is accurate and adequate, it consumes a great deal of time and resources. It is of course possible to solve the problem in a very obvious manner: by increasing computing capacity. But equipping the office with dozens of ultra-fast computers that continuously process phrasal queries consisting of thousands of unique words, struggling through gigabytes of incoming correspondence, technical literature, final reports and other information, is more than irrational and disadvantageous. There is a better way.