Abstract
With the rapid growth of information resources on the network, people are increasingly concerned with how to extract potentially valuable information quickly and efficiently from the vast amount available, so that it can play an effective role in management and decision-making. Search engine technology addresses the difficulty users face in retrieving information from the network, and Web search engine technology is becoming a focus of competing research and development in computer science and the information industry.
A search engine is a class of website on the Internet that provides specialized retrieval services. These sites collect a large number of pages from Internet sites, either through web search software (also known as a Web search robot) or through site registration. After processing, the collected pages are built into a database, which allows the site to respond to users' queries and provide the information they need.
Starting from the traditional approach to in-site search, which relies on LIKE queries against a relational database, this paper analyzes the open-source information retrieval technology Lucene and compares it with relational databases. It then introduces, through an example, a search system built on Lucene that integrates the Spring framework, the Heritrix web crawler, Ajax and other technologies, and finally establishes a Java-based full-text search subsystem using Lucene.