I recently used the Data Import Handler (DIH) to upload data into a SolrCloud cluster. The DIH is single-threaded and offers very few options for performance tuning. Initially, the import was very slow because of an incorrect batchSize setting. After a few iterations, I arrived at an optimum batchSize. This change resulted in a huge performance improvement for data upload into the cluster.
The right batchSize depends on:
1. the size of the data returned by the query
2. the network latency
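A rough back-of-the-envelope model shows why both factors matter. The sketch below assumes that batchSize is the number of rows fetched per round trip to the data source and that each round trip costs a fixed latency. Both are simplifying assumptions, and the function name and the numbers are illustrative, not measurements from my setup.

```python
import math

def estimated_latency_overhead(total_rows, batch_size, round_trip_seconds):
    """Rough time spent on round trips alone, ignoring transfer and indexing."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Each batch needs one round trip; a final partial batch still counts as one.
    round_trips = math.ceil(total_rows / batch_size)
    return round_trips * round_trip_seconds

# Hypothetical numbers: 1M rows, 5 ms per round trip.
for size in (10, 1_000, 100_000):
    overhead = estimated_latency_overhead(1_000_000, size, 0.005)
    print(f"batchSize={size}: ~{overhead:.2f}s of round-trip latency")
```

Larger batches cut this overhead, but they also hold more rows in memory at once, which is where the size of the data returned by the query comes in.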
Choosing an incorrect setting slows down the data upload. The best way to find a good value is to run a few imports with varying batch sizes. I recommend testing with a large number of documents, so that the initial warm-up time for connections and queries is accounted for. In my setup, I tested with 1M entries.
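The trial runs can be scripted along the following lines. This is only a minimal sketch: `run_import` is a hypothetical callable that you would write yourself to run one full import with the given batchSize and return the number of documents imported. It is not part of Solr or the DIH.

```python
import time

def benchmark_batch_sizes(run_import, batch_sizes, clock=time.perf_counter):
    """Time run_import(batch_size) for each candidate; return docs/sec per size.

    run_import is a hypothetical, user-supplied callable that performs one
    full import with the given batchSize and returns the document count.
    """
    results = {}
    for size in batch_sizes:
        start = clock()
        docs = run_import(size)
        elapsed = clock() - start
        # Guard against a zero-length measurement on very fast runs.
        results[size] = docs / elapsed if elapsed > 0 else float("inf")
    return results

def best_batch_size(results):
    """Return the batch size with the highest measured throughput."""
    return max(results, key=results.get)
```

Calling `benchmark_batch_sizes(my_import, [100, 1000, 10000])` and then `best_batch_size(...)` on the result gives the candidate with the highest throughput. Keep each run large enough that warm-up time does not dominate the measurement.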