Lucene/Solr インデクシングの簡単なデバッグログ出力の設定方法

Solrを使っいると、Luceneインデクシングの内部仕様を調べることが出てくると思っています。

Luceneのインデクシングデバッグ

ということでLucene in Action読んでいたところ、「2.12. Debugging indexing」という項目見つけました。

Lucene in Action

Lucene in Action

それによると IndexWriter#setInfoStream()デバッグ情報を出力できるとのこと。

IndexWriter writer = new IndexWriter(dir, analyzer,
                                     IndexWriter.MaxFieldLength.UNLIMITED);
writer.setInfoStream(System.out);

確かにあります。

lucene-solr/IndexWriterConfig.java at releases/lucene-solr/6.6.0 · apache/lucene-solr · GitHub

とはいえSolr使っているのでLuceneのIndexWriterを呼んでるのはSolrの中。少しいじってコンパイルし直しかなと思っていたのですが、少し調べてみたら設定がありました。

[SOLR-4977] info stream in solrconfig should have option for writing to the solr log - ASF JIRA

設定方法

以下設定方法です。

solrconfig.xml に以下の設定を入れます。

<indexConfig>
 <infoStream >true</infoStream>
</indexConfig>

次にSolrのlog4jの設定ファイル log4j.properties に以下の設定を入れます。

# set to INFO to enable infostream log messages
log4j.logger.org.apache.solr.update.LoggingInfoStream=INFO

以上です。簡単です。

どんなデータが出るか

どんなデータが出るか知りたい人もいると思うので、このシンプルなスキーマに、少しデータ入れてみた時のログ貼っておきます。

スキーマ

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="testfluentd-schema" version="1.5">
    <field name="_version_" type="tlongs" indexed="true" stored="true" multiValued="false"/>
   <field name="id"        type="string"   indexed="true" stored="true" required="true"/>
   <field name="flong"     type="tlongs"     indexed="true" stored="true" required="true"/>
   <field name="flong_dv"  type="tlongs"     indexed="true" stored="true" docValues="true" required="true"/>
   <fieldType name="tlongs" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
   <fieldType name="tdoubles" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>
   <fieldType name="uuid" class="solr.UUIDField"  />
   <fieldType name="booleans" class="solr.BoolField"  />
   <fieldType name="string" class="solr.StrField"  />
   <fieldType name="tdates" class="solr.TrieDateField" />

   <uniqueKey>id</uniqueKey>

</schema>

データ

id,flong,flong_dv
1,444,777
2,555,888
3,666,999

ログ

 [DW][commitScheduler-16-thread-1]: anyChanges? numDocsInRam=3 deletes=true hasTickets:false pendingChangesInFullFlush: false
 [IW][commitScheduler-16-thread-1]: nrtIsCurrent: infoVersion matches: false; DW changes: true; BD changes: false
 [IW][commitScheduler-16-thread-1]: flush at getReader
 [DW][commitScheduler-16-thread-1]: startFullFlush
 [DW][commitScheduler-16-thread-1]: anyChanges? numDocsInRam=3 deletes=true hasTickets:false pendingChangesInFullFlush: false
 [DWFC][commitScheduler-16-thread-1]: addFlushableState DocumentsWriterPerThread [pendingDeletes=gen=0 2 deleted terms (unique count=2) bytesUsed=330, segment=_d, aborted=false, numDocsInRAM=3, deleteQueue=DWDQ: [ generation: 8 ]]
 [DWPT][commitScheduler-16-thread-1]: flush postings as segment _d numDocs=3
 [IW][commitScheduler-16-thread-1]: 0 msec to write norms
 [IW][commitScheduler-16-thread-1]: 1 msec to write docValues
 [IW][commitScheduler-16-thread-1]: 0 msec to write points
 [IW][commitScheduler-16-thread-1]: 1 msec to finish stored fields
 [IW][commitScheduler-16-thread-1]: 3 msec to write postings and finish vectors
 [IW][commitScheduler-16-thread-1]: 0 msec to write fieldInfos
 [DWPT][commitScheduler-16-thread-1]: new segment has 0 deleted docs
 [DWPT][commitScheduler-16-thread-1]: new segment has no vectors; no norms; docValues; no prox; no freqs
 [DWPT][commitScheduler-16-thread-1]: flushedFiles=[_d_Lucene50_0.tip, _d_Lucene50_0.doc, _d.fdx, _d_Lucene50_0.tim, _d_Lucene54_0.dvm, _d.fnm, _d.fdt, _d_Lucene54_0.dvd]
 [DWPT][commitScheduler-16-thread-1]: flushed codec=Lucene60
 [DWPT][commitScheduler-16-thread-1]: flushed: segment=_d ramUsed=0.104 MB newFlushedSize=0.002 MB docs/MB=1,923.993
 [DWPT][commitScheduler-16-thread-1]: flush time 9.335589 msec
 [DW][commitScheduler-16-thread-1]: publishFlushedSegment seg-private updates=null
 [IW][commitScheduler-16-thread-1]: publishFlushedSegment
 [BD][commitScheduler-16-thread-1]: push deletes  3 deleted terms (unique count=3) bytesUsed=1072 segmentPrivate?=false delGen=15 packetCount=1 totBytesUsed=1072
 [IW][commitScheduler-16-thread-1]: publish sets newSegment delGen=16 seg=_d(6.0.1):C3
 [IFD][commitScheduler-16-thread-1]: now checkpoint "_c(6.0.1):C3 _d(6.0.1):C3" [2 segments ; isCommit = false]
 [IFD][commitScheduler-16-thread-1]: delete []
 [IFD][commitScheduler-16-thread-1]: 0 msec to checkpoint
 [IW][commitScheduler-16-thread-1]: apply all deletes during flush
 [IW][commitScheduler-16-thread-1]: now apply all deletes for all segments maxDoc=6
 [BD][commitScheduler-16-thread-1]: applyDeletes: open segment readers took 0 msec
 [BD][commitScheduler-16-thread-1]: applyDeletes: infos=[_c(6.0.1):C3, _d(6.0.1):C3] packetCount=1
 [BD][commitScheduler-16-thread-1]: applyTermDeletes took 0.2 msec for 2 segments and 1 packets; 3 del terms visited; 8 seg terms visited
 [BD][commitScheduler-16-thread-1]: applyDeletes: 3 new deleted documents
 [BD][commitScheduler-16-thread-1]: applyDeletes took 2 msec for 2 segments, 0 newly deleted docs (query deletes), 3 visited terms, allDeleted=[_c(6.0.1):C3]
 [IFD][commitScheduler-16-thread-1]: now checkpoint "_c(6.0.1):C3/3 _d(6.0.1):C3" [2 segments ; isCommit = false]
 [IFD][commitScheduler-16-thread-1]: delete []
 [IFD][commitScheduler-16-thread-1]: 0 msec to checkpoint
 [IW][commitScheduler-16-thread-1]: drop 100% deleted segments: _c(6.0.1):C3/3
 [IFD][commitScheduler-16-thread-1]: now checkpoint "_d(6.0.1):C3" [1 segments ; isCommit = false]
 [IFD][commitScheduler-16-thread-1]: delete []
 [IFD][commitScheduler-16-thread-1]: 0 msec to checkpoint
 [BD][commitScheduler-16-thread-1]: prune sis=segments_d: _d(6.0.1):C3 minGen=17 packetCount=1
 [BD][commitScheduler-16-thread-1]: pruneDeletes: prune 1 packets; 0 packets remain
 [IW][commitScheduler-16-thread-1]: incRefDeleter for NRT reader version=80 segments=_d(6.0.1):C3
 [IW][commitScheduler-16-thread-1]: return reader version=80 reader=StandardDirectoryReader(segments_d:80:nrt _d(6.0.1):C3)
 [DW][commitScheduler-16-thread-1]: commitScheduler-16-thread-1 finishFullFlush success=true
 [TMP][commitScheduler-16-thread-1]: findMerges: 1 segments
 [TMP][commitScheduler-16-thread-1]:   seg=_d(6.0.1):C3 size=0.002 MB [floored]
 [TMP][commitScheduler-16-thread-1]:   allowedSegmentCount=1 vs count=1 (eligible count=1) tooBigCount=0
 [MS][commitScheduler-16-thread-1]: now merge
 [MS][commitScheduler-16-thread-1]:   index: _d(6.0.1):C3
 [MS][commitScheduler-16-thread-1]:   no more merges pending; now return
 [IW][commitScheduler-16-thread-1]: getReader took 14 msec
 [IFD][commitScheduler-16-thread-1]: delete []
 [IW][commitScheduler-16-thread-1]: decRefDeleter for NRT reader version=76 segments=_c(6.0.1):C3
.u.LoggingInfoStream [IFD][searcherExecutor-7-thread-1-processing-x:mycore]: delete []
.u.LoggingInfoStream [IW][searcherExecutor-7-thread-1-processing-x:mycore]: decRefDeleter for NRT reader version=74 segments=_c(6.0.1):C3
 [IW][commitScheduler-15-thread-1]: commit: start
 [IW][commitScheduler-15-thread-1]: commit: enter lock
 [IW][commitScheduler-15-thread-1]: commit: now prepare
 [IW][commitScheduler-15-thread-1]: prepareCommit: flush
 [IW][commitScheduler-15-thread-1]:   index before flush _d(6.0.1):C3
 [DW][commitScheduler-15-thread-1]: startFullFlush
 [IW][commitScheduler-15-thread-1]: apply all deletes during flush
 [IW][commitScheduler-15-thread-1]: now apply all deletes for all segments maxDoc=3
 [BD][commitScheduler-15-thread-1]: applyDeletes: open segment readers took 0 msec
 [BD][commitScheduler-15-thread-1]: applyDeletes: no segments; skipping
 [BD][commitScheduler-15-thread-1]: prune sis=segments_d: _d(6.0.1):C3 minGen=17 packetCount=0
 [DW][commitScheduler-15-thread-1]: commitScheduler-15-thread-1 finishFullFlush success=true
 [IW][commitScheduler-15-thread-1]: startCommit(): start
 [IW][commitScheduler-15-thread-1]: startCommit index=_d(6.0.1):C3 changeCount=22
 [IW][commitScheduler-15-thread-1]: startCommit: wrote pending segments file "pending_segments_e"
 [IW][commitScheduler-15-thread-1]: done all syncs: [_d_Lucene50_0.tip, _d_Lucene50_0.doc, _d.fdx, _d_Lucene50_0.tim, _d_Lucene54_0.dvm, _d.si, _d.fnm, _d.fdt, _d_Lucene54_0.dvd]
 [IW][commitScheduler-15-thread-1]: commit: pendingCommit != null
 [IW][commitScheduler-15-thread-1]: commit: done writing segments file "segments_e"
 [IFD][commitScheduler-15-thread-1]: now checkpoint "_d(6.0.1):C3" [1 segments ; isCommit = true]
 [IFD][commitScheduler-15-thread-1]: deleteCommits: now decRef commit "segments_d"