Apache Solr is a web application built around Lucene, a powerful search library that provides full-text indexing. The key feature of Lucene search is its inverted index, a keyword-centric data structure that maps each word to the pages containing it (word -> pages) rather than each page to its words (page -> words).
Solr not only takes advantage of Lucene features such as the inverted index, spellchecking, hit highlighting, and advanced analysis/tokenization; it also exposes them through the Solr API, which makes it a powerful search application in its own right. One of Solr's advanced features is faceting, i.e. grouping search results into categories and showing a numerical count for each key term.
Solr is therefore well suited to building sophisticated and efficient search applications, and it makes scaling and distribution easier.
The DataImportHandler (DIH) is a mechanism for importing structured data from a data store into Solr. It is often used with relational databases, but it can also handle XML with its XPathEntityProcessor. We can pass incoming XML through an XSL stylesheet, as well as parse and transform the XML with built-in DIH transformers. We could translate our arbitrary XML to Solr's standard input XML format via XSL, map/transform the arbitrary XML to the Solr schema fields directly in the DIH config file, or use a combination of both. DIH is flexible.
Here I will discuss how to set up the Solr DIH to index and search XML files.
Ensure that the JDK is installed on your system and JAVA_HOME is set appropriately. Then install Solr 5.0.0 or a later version. Once the installation is complete, go to the Solr root directory and then to the bin folder.
Step 1:
Start Solr using the command
./solr start
Step 2:
Create a collection named Manufactures
./solr create -c Manufactures
The Manufactures core now appears in the core selector, and its statistics are shown in the Solr Admin UI.
Step 3:
Create an XML file and place it in the Solr root directory
sudo vim /home/centos/solr/manufactures.xml
The content of the example file (manufactures.xml)
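As an illustration, a file of this kind might look like the following. This is only a sketch: the element names (manufacturers, manufacturer, id, name, country) and the values are assumptions chosen for the example, not a required layout.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sample data: element names and values are assumptions -->
<manufacturers>
  <!-- Each <manufacturer> element is intended to become one Solr document -->
  <manufacturer>
    <id>1</id>
    <name>Example Motors</name>
    <country>Germany</country>
  </manufacturer>
  <manufacturer>
    <id>2</id>
    <name>Sample Electronics</name>
    <country>Japan</country>
  </manufacturer>
</manufacturers>
```

Any well-formed XML with a repeating record element works in the same way, as long as the XPath expressions in the data import configuration match its structure.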
Step 4:
Configure solrconfig.xml
Open solrconfig.xml (vim solrconfig.xml) and add the following code
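A minimal sketch of what is typically added to solrconfig.xml for DIH is shown here. It assumes that the DIH jars are in the dist folder of the Solr installation and that the data import configuration file is named Manufacturesconfig.xml and sits in the same conf directory; adjust both to your setup.

```xml
<!-- Load the DataImportHandler jars; the dist/ location is an assumption
     based on a standard Solr 5.x installation layout -->
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />

<!-- Register the handler that appears as the data import screen in the Admin UI -->
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <!-- Name of the DIH configuration file, relative to the core's conf directory -->
    <str name="config">Manufacturesconfig.xml</str>
  </lst>
</requestHandler>
```

Both elements go inside the top-level config element of solrconfig.xml.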
Step 5:
Data Import configuration
Create a file named Manufacturesconfig.xml with the following content
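A possible configuration is sketched below. It assumes a source file whose root element is manufacturers with repeating manufacturer records containing id, name and country elements (hypothetical names), and it assumes the file lives in /home/centos/solr. The outer entity lists the files to read and the inner entity extracts one Solr document per record.

```xml
<dataConfig>
  <!-- Read the XML from the local file system -->
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <!-- Outer entity: finds the files to import.
         baseDir and fileName are assumptions; fileName is a regular expression -->
    <entity name="files"
            processor="FileListEntityProcessor"
            baseDir="/home/centos/solr"
            fileName="manufactures\.xml"
            recursive="false"
            rootEntity="false"
            dataSource="null">
      <!-- Inner entity: one Solr document per element matched by forEach -->
      <entity name="manufacturer"
              processor="XPathEntityProcessor"
              url="${files.fileAbsolutePath}"
              forEach="/manufacturers/manufacturer"
              stream="true">
        <!-- column = Solr field name, xpath = location of the value in the file -->
        <field column="id" xpath="/manufacturers/manufacturer/id" />
        <field column="name" xpath="/manufacturers/manufacturer/name" />
        <field column="country" xpath="/manufacturers/manufacturer/country" />
      </entity>
    </entity>
  </document>
</dataConfig>
```

Setting rootEntity="false" on the outer entity tells DIH to create documents from the inner entity rather than one document per file.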
Note 1: The source folder should contain only one file, namely the .xml file to index. To index more than one file, either give the specific path and name of each file, or specify the exact file name or a pattern matching the group of files in the data config file.
Note 2: The base directory should be the Solr root installation directory (<solr_installation_root_dir>)
Note 3: The forEach path has to be changed according to the structure of the .xml file.
Step 6:
Configure managed-schema
sudo vim managed-schema
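The schema needs a field for every column mapped in the data import configuration. A minimal sketch, assuming hypothetical fields id, name and country and the string and text_general field types that ship with the default Solr config sets:

```xml
<!-- Field names are assumptions; they must match the column names used in the DIH config.
     The default managed-schema usually defines id already, so do not declare it twice -->
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<!-- Tokenized text field for full-text search on the name -->
<field name="name" type="text_general" indexed="true" stored="true" />
<!-- Untokenized string field, suitable for exact matching and faceting -->
<field name="country" type="string" indexed="true" stored="true" />
```

These field elements go inside the schema element of managed-schema.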
Step 7:
Restart Solr
Go to the bin folder and use the command
./solr restart
Step 8:
Go to Admin console URL and click on DataImportHandler
Step 9:
Select full-import as the command and then click Execute.
The imported data is now visible in the Admin UI console, and you can run search operations on it.
Note: To import data from a relational database, place the JDBC driver configuration in the dataconfig.xml file.
Ex: for an HSQLDB database
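A sketch of such a configuration follows. The database path, the user, the table name item and its columns are placeholders assumed for the example, and the HSQLDB driver jar has to be available on Solr's classpath.

```xml
<dataConfig>
  <!-- JDBC connection settings; the url path and user are placeholders -->
  <dataSource type="JdbcDataSource"
              driver="org.hsqldb.jdbcDriver"
              url="jdbc:hsqldb:/path/to/hsqldb/ex"
              user="sa" />
  <document>
    <!-- Each row returned by the query becomes one Solr document;
         the table and column names are hypothetical -->
    <entity name="item" query="select id, name from item">
      <field column="id" name="id" />
      <field column="name" name="name" />
    </entity>
  </document>
</dataConfig>
```

For other databases, only the driver class and the JDBC URL normally need to change.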