[Solr TIP] Data Import Handler tip for improving performance

I recently used the Data Import Handler (DIH) to upload data into a SolrCloud cluster. DIH is single-threaded and offers very few performance optimizations. Initially, the import was very slow because of an incorrect batchSize setting. After a few iterations I arrived at an optimum batchSize, and this change resulted in a huge performance improvement for data upload into the cluster.

Choosing batchSize depends on:
1. the size of the data returned by the query
2. network latency

An incorrect setting slows down the data upload. The best way to find a good value is to run a few imports with varying batch sizes and compare them. It's recommended to run each trial with a large number of documents, so that the initial warm-up time for connections and queries is taken into account. In my setup, I tried with 1M entries.
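
The trial-and-error loop described above could be sketched roughly as follows. This is only a minimal illustration, not the script I actually used: `benchmark_batch_sizes` and `run_import` are hypothetical names, and `run_import` is assumed to be a function you write yourself that sets the given batchSize, runs one full import to completion, and returns the number of documents imported.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


def benchmark_batch_sizes(
    candidates: Iterable[int],
    run_import: Callable[[int], int],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[int, Dict[int, float]]:
    """Run one import per candidate batchSize.

    Returns (best_batch_size, {batch_size: documents_per_second}).
    """
    throughput: Dict[int, float] = {}
    for batch_size in candidates:
        start = clock()
        # Hypothetical callback: performs a full import with this batchSize
        # and returns how many documents were imported.
        docs = run_import(batch_size)
        elapsed = clock() - start
        throughput[batch_size] = docs / elapsed if elapsed > 0 else 0.0
    if not throughput:
        raise ValueError("no candidate batch sizes given")
    # The candidate with the highest documents-per-second rate wins.
    best = max(throughput, key=lambda size: throughput[size])
    return best, throughput
```

Each trial should import the same large set of documents, so that warm-up cost does not dominate the measurement and the rates are comparable across batch sizes.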
