Confluence 2.6 : Extractor Plugins
This page last changed on Jan 18, 2007 by jnolen.
Extractor plugins allow you to hook into the mechanism by which Confluence populates its search index. Each time content is created or updated in Confluence, it is passed through a chain of extractors that assemble the fields and data that will be added to the search index for that content. By writing your own extractor, you can add information to the index. Extractor plugins can also be used to extract the content from attachment types that Confluence does not support.
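The chain described above can be modelled in a few lines. This is a minimal sketch built on made-up stand-in types: `Content`, `IndexDocument`, `SketchExtractor` and `ExtractorChainSketch` are hypothetical and are not Confluence or Lucene classes. It only mirrors the idea that each extractor in turn receives the same document and the same default-text buffer, and adds whatever it knows about the content.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins only; the real chain works on Lucene Documents and Searchables.
public class ExtractorChainSketch {

    /** Stand-in for the content being indexed. */
    public record Content(String title, String body) {}

    /** Stand-in for a Lucene Document: field name -> values. */
    public static final class IndexDocument {
        private final Map<String, List<String>> fields = new LinkedHashMap<>();

        public void add(String name, String value) {
            fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }

        public List<String> get(String name) {
            return fields.getOrDefault(name, List.of());
        }
    }

    /** Same shape as Extractor.addFields(Document, StringBuffer, Searchable). */
    public interface SketchExtractor {
        void addFields(IndexDocument document, StringBuffer defaultSearchableText, Content content);
    }

    /** Runs every extractor, in list order, against one shared document and text buffer. */
    public static IndexDocument index(Content content, List<SketchExtractor> chain,
                                      StringBuffer defaultText) {
        IndexDocument document = new IndexDocument();
        for (SketchExtractor extractor : chain) {
            extractor.addFields(document, defaultText, content);
        }
        return document;
    }

    public static void main(String[] args) {
        // One extractor adds a separate "title" field and default text; the other adds default text only.
        SketchExtractor titleExtractor = (doc, text, c) -> {
            doc.add("title", c.title());
            text.append(c.title()).append(' ');
        };
        SketchExtractor bodyExtractor = (doc, text, c) -> text.append(c.body()).append(' ');

        StringBuffer defaultText = new StringBuffer();
        IndexDocument doc = index(new Content("Release notes", "Fixed the search index"),
                List.of(titleExtractor, bodyExtractor), defaultText);
        System.out.println(doc.get("title"));               // [Release notes]
        System.out.println(defaultText.toString().trim());  // Release notes Fixed the search index
    }
}
```

The point of the shared buffer is that every extractor's contribution ends up in the same default searchable text, while separate fields stay individually searchable.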
Extractor Plugins
Here is an example atlassian-plugin.xml file containing a single search extractor:

```xml
<atlassian-plugin name="Sample Extractor" key="confluence.extra.extractor">
    ...
    <extractor name="Page Metadata Extractor" key="pageMetadataExtractor"
               class="confluence.extra.extractor.PageMetadataExtractor" priority="1000">
        <description>Extracts certain keys from a page's metadata and adds them to the search index.</description>
    </extractor>
    ...
</atlassian-plugin>
```
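The descriptor is plain XML, so one quick sanity check is to read the `<extractor>` element back with the JDK's built-in DOM parser and confirm it carries the attributes shown in the example (`key`, `class`, `priority`). This is a standalone sketch, not something Confluence requires; `DescriptorCheck` and `describeExtractors` are hypothetical names, and treating `priority` as an integer is an assumption based on the example value. The page does not say how priority values are compared, so the sketch does not sort by them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DescriptorCheck {

    // The sample descriptor from this page, without the "..." placeholders.
    public static final String DESCRIPTOR =
          "<atlassian-plugin name=\"Sample Extractor\" key=\"confluence.extra.extractor\">"
        + "<extractor name=\"Page Metadata Extractor\" key=\"pageMetadataExtractor\""
        + " class=\"confluence.extra.extractor.PageMetadataExtractor\" priority=\"1000\">"
        + "<description>Extracts certain keys from a page's metadata and adds them to the search index.</description>"
        + "</extractor></atlassian-plugin>";

    /** Returns one "key -> class (priority N)" line per extractor element. */
    public static List<String> describeExtractors(String xml) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = dom.getElementsByTagName("extractor");
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                // Assumed integer, as in the example; a missing or malformed value fails here.
                int priority = Integer.parseInt(e.getAttribute("priority"));
                out.add(e.getAttribute("key") + " -> " + e.getAttribute("class")
                        + " (priority " + priority + ")");
            }
            return out;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not read plugin descriptor", ex);
        }
    }

    public static void main(String[] args) {
        describeExtractors(DESCRIPTOR).forEach(System.out::println);
        // pageMetadataExtractor -> confluence.extra.extractor.PageMetadataExtractor (priority 1000)
    }
}
```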
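The Extractor Interface
All extractors must implement the following interface:

```java
package bucket.search.lucene;

import bucket.search.Searchable;
import org.apache.lucene.document.Document;

public interface Extractor {
    // Called for each piece of content as it passes through the extractor chain.
    // document: the Lucene document being assembled for this content
    // defaultSearchableText: the regular searchable text; append to it to make values searchable by default
    // searchable: the content being indexed
    public void addFields(Document document, StringBuffer defaultSearchableText, Searchable searchable);
}
```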
Attachment Content Extractors
If you are writing an extractor that indexes the contents of a particular attachment type (for example, OpenOffice documents or Flash files), you should extend the abstract class bucket.search.lucene.extractor.BaseAttachmentContentExtractor. This class ensures that only one attachment content extractor successfully runs against any file; you can adjust the priorities of attachment content extractors to make sure they run in the right order. For more information, see: Attachment Content Extractor Plugins

An Example Extractor
The following example extractor is untested. It associates a set of page-level properties with the page in the index, both as part of the regular searchable text and as Lucene Text fields that can be searched individually, for example in a custom {abstract-search} macro.

```java
package com.example.extras.extractor;

import bucket.search.lucene.Extractor;
import bucket.search.Searchable;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import com.atlassian.confluence.core.ContentEntityObject;
import com.atlassian.confluence.core.ContentPropertyManager;
import com.opensymphony.util.TextUtils;

public class ContentPropertyExtractor implements Extractor {
    // Content property keys whose values are copied into the index
    public static final String[] INDEXABLE_PROPERTIES = {"status", "abstract"};

    // Set via setContentPropertyManager() below
    private ContentPropertyManager contentPropertyManager;

    public void addFields(Document document, StringBuffer defaultSearchableText, Searchable searchable) {
        // Content properties are only available on ContentEntityObject instances
        if (searchable instanceof ContentEntityObject) {
            ContentEntityObject contentEntityObject = (ContentEntityObject) searchable;
            for (int i = 0; i < INDEXABLE_PROPERTIES.length; i++) {
                String key = INDEXABLE_PROPERTIES[i];
                String value = contentPropertyManager.getStringProperty(contentEntityObject, key);
                // Skip properties that are not set on this content
                if (TextUtils.stringSet(value)) {
                    // Make the value part of the regular searchable text...
                    defaultSearchableText.append(value).append(" ");
                    // ...and also index it as its own Text field, named after the property key
                    document.add(Field.Text(key, value));
                }
            }
        }
    }

    public void setContentPropertyManager(ContentPropertyManager contentPropertyManager) {
        this.contentPropertyManager = contentPropertyManager;
    }
}
```

Debugging
Confluence includes a very basic, hidden Lucene index browser that may help when debugging. You will need to give it the filesystem path to your $conf-home/index directory.
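Before turning to the index browser, it can help to exercise an extractor's logic in isolation. The sketch below copies the loop from the example extractor but swaps `ContentPropertyManager` for a plain `Map` of page properties and the Lucene `Document` for a map of field names to values. `PropertyExtractorCheck`, `extract` and `isSet` are hypothetical names, and `isSet` (non-null and non-empty) is only an assumed approximation of `TextUtils.stringSet`.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins; the real extractor uses ContentPropertyManager and a Lucene Document.
public class PropertyExtractorCheck {

    public static final String[] INDEXABLE_PROPERTIES = {"status", "abstract"};

    /** Assumed approximation of TextUtils.stringSet: non-null and non-empty. */
    static boolean isSet(String value) {
        return value != null && !value.isEmpty();
    }

    /** Same loop as ContentPropertyExtractor.addFields, returning the fields it would add. */
    public static Map<String, String> extract(Map<String, String> pageProperties,
                                              StringBuffer defaultSearchableText) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String key : INDEXABLE_PROPERTIES) {
            String value = pageProperties.get(key);
            if (isSet(value)) {
                defaultSearchableText.append(value).append(" ");
                fields.put(key, value);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("status", "draft");
        props.put("owner", "jsmith"); // not in INDEXABLE_PROPERTIES, so ignored
        StringBuffer text = new StringBuffer();
        Map<String, String> fields = extract(props, text);
        System.out.println(fields);            // {status=draft}
        System.out.println("[" + text + "]");  // [draft ]
    }
}
```

If a property shows up here but not in the index browser, the problem is more likely in how the extractor is registered or wired than in the extraction logic itself.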
Document generated by Confluence on Oct 10, 2007 18:49