The number of documents CloudSolrClient commits to the index is less than the number submitted.

SolrJ indexes roughly 129,776 documents per run, split across multiple threads with each thread handling at most 10,000 documents. The core code for each thread is as follows.

But after indexing finishes there are only 97,878 documents in the index (the exact count differs on every run). The log shows no errors. The index's primary key is the database primary key, so it should never repeat.

    String ip = PropertiesInit.getPropertiesValue("solrCluster.ip");
    CloudSolrClient solrServer = new CloudSolrClient(ip);
    // set the default collection for this client
    solrServer.setDefaultCollection(coreName);
    // buffer the documents for this thread's batch
    Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
    // build one SolrInputDocument per row in this thread's slice
    for (int i = 0; i < list.size(); i++) {
        newSum++;
        SolrInputDocument document = new SolrInputDocument();
        // (field mapping from list.get(i) onto document omitted in the original)
        docs.add(document);
    }
    // send the whole batch to the cluster
    solrServer.add(docs);
    // hard commit so the batch becomes searchable
    UpdateResponse response = solrServer.commit();
    System.out.println("committed: " + newSum + ":" + response.getResponse());
    solrServer.close();
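
One way to sanity-check the shortfall independently of the admin UI is to count what the index actually holds right after the commit. A minimal sketch reusing the solrServer client from the snippet above (the query itself is my addition, not part of the original code):

    // Count every document currently visible in the collection.
    SolrQuery countQuery = new SolrQuery("*:*");
    countQuery.setRows(0); // only numFound is needed, skip fetching documents
    QueryResponse countRsp = solrServer.query(countQuery);
    System.out.println("visible documents: " + countRsp.getResults().getNumFound());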

printed log (screenshot omitted; it shows no errors)

configuration file (solrconfig.xml)

    <autoCommit>
      <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>

    <autoSoftCommit>
      <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
    </autoSoftCommit>
Mar 11, 2022

Found the reason: there really were duplicate primary keys in the indexed data.

Observing the core's Statistics panel in the Solr admin UI, the Deleted Docs count turned out to be exactly the number of missing documents. That pointed the problem at the data returned by the webservice API. It finally turned out that the Oracle pagination SQL was sorted by a time column; because many rows carry the same timestamp, the order of tied rows is not stable between queries, so the same rows appeared on both the first page and the second page.
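
For context, this is exactly how Solr's uniqueKey semantics behave: a second add with the same id silently overwrites the first, and the replaced copy is counted under Deleted Docs until segments merge, which is why no error ever shows up in the log. A small illustrative sketch (the id and field values are made up):

    // Two adds with the same uniqueKey: the second silently replaces the first.
    SolrInputDocument first = new SolrInputDocument();
    first.addField("id", "row-42");       // same primary key both times
    first.addField("name", "version 1");
    solrServer.add(first);

    SolrInputDocument second = new SolrInputDocument();
    second.addField("id", "row-42");
    second.addField("name", "version 2");
    solrServer.add(second);

    solrServer.commit();
    // The collection now holds one live document for id row-42; the
    // overwritten copy shows up in Deleted Docs until a segment merge.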

The final solution is to sort by the unique id instead, so that pagination never returns duplicate rows.
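
A minimal sketch of what the fixed pagination could look like, using the classic Oracle ROWNUM pattern ordered by the primary key; the table name DATA_TABLE, column ID, and the surrounding JDBC variables (conn, a 1-based pageNo, pageSize) are hypothetical placeholders, not from the original post:

    // Deterministic pagination: ORDER BY the unique ID so page boundaries
    // are stable and no row can land on two different pages.
    String pageSql =
          "SELECT * FROM ("
        + "  SELECT t.*, ROWNUM rn FROM ("
        + "    SELECT * FROM DATA_TABLE ORDER BY ID"   // unique key, not a timestamp
        + "  ) t WHERE ROWNUM <= ?"
        + ") WHERE rn > ?";
    PreparedStatement ps = conn.prepareStatement(pageSql);
    ps.setInt(1, pageNo * pageSize);        // upper bound: end of this page
    ps.setInt(2, (pageNo - 1) * pageSize);  // lower bound: rows already fetched
    ResultSet rs = ps.executeQuery();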
