DLESE Tools
v1.2

org.dlese.dpc.index.writer
Class XMLFileIndexingWriter

java.lang.Object
  extended by org.dlese.dpc.index.writer.FileIndexingServiceWriter
      extended by org.dlese.dpc.index.writer.XMLFileIndexingWriter
All Implemented Interfaces:
DocWriter
Direct Known Subclasses:
DleseAnnoFileIndexingServiceWriter, ItemFileIndexingWriter, SimpleXMLFileIndexingWriter

public abstract class XMLFileIndexingWriter
extends FileIndexingServiceWriter

Creates a Lucene Document from any XML file by stripping the XML tags to extract and index the content. The reader for this type of Document is XMLDocReader.

The Lucene Document fields that are created by this class (in addition to the ones listed for FileIndexingServiceWriter) are:

collection - The collection associated with this resource.

Author:
John Weatherley
See Also:
FileIndexingService, XMLDocReader

Field Summary
protected  RecordDataService recordDataService
          Serves indexable data for a given record, such as recordStatus, annotations, vocab ID mappings and associated IDs.
protected  MetadataVocab vocab
          DESCRIPTION
 
Constructor Summary
XMLFileIndexingWriter(RecordDataService recordDataService)
          Constructor for the XMLFileIndexingWriter.
 
Method Summary
protected  void addCustomFields(Document newDoc, Document existingDoc, File sourceFile)
          Adds the full content of the XML to the default search field.
protected abstract  void addFields(Document newDoc, Document existingDoc, File sourceFile)
          Adds additional fields that are unique to the document format being indexed.
protected abstract  String getCollection()
          Returns unique collection keys for the item being indexed, separated by spaces.
protected  String getFieldContent(String[] values, String useVocabMapping)
          Gets the vocab encoded keys for the given values, separated by the '+' symbol.
protected  String getFieldContent(String value, String useVocabMapping)
          Gets the encoded vocab key for the given content.
protected  String getFieldName(String fieldString)
          Gets the fieldName attribute of the XMLFileIndexingWriter object.
protected abstract  String getId()
          Returns unique IDs for the item being indexed, one for each collection that catalogs the resource, separated by spaces.
static String getOaiModtime(File sourceFile, Document existingDoc)
          Gets the oaiModtime for the given File or Document.
 
Methods inherited from class org.dlese.dpc.index.writer.FileIndexingServiceWriter
abortIndexing, addToAdminDefaultField, addToDefaultField, create, destroy, getDeletedDoc, getDocType, getExistingDoc, getFileIndexingService, getReaderClass, getSourceDir, getSourceFile, getValidationReport, init, isValidationEnabled, prtln, prtlnErr, setDebug, setDefaultFieldName, setFileIndexingService, setValidationEnabled
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

vocab

protected MetadataVocab vocab
DESCRIPTION


recordDataService

protected RecordDataService recordDataService
Serves indexable data for a given record, such as recordStatus, annotations, vocab ID mappings and associated IDs.

Constructor Detail

XMLFileIndexingWriter

public XMLFileIndexingWriter(RecordDataService recordDataService)
Constructor for the XMLFileIndexingWriter.

Parameters:
recordDataService - Used to get data about the file.

Method Detail

getId

protected abstract String getId()
                         throws Exception
Returns unique IDs for the item being indexed, one for each collection that catalogs the resource, separated by spaces. For example, "DLESE-000-000-000-001". The String is not tokenized and is stored and indexed under the field key 'id'.

Returns:
The id String
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
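
As a minimal sketch of the return format described above, the following joins one ID per cataloging collection into a single space-separated String. The class IdListSketch, the helper joinIds and the sample ID "COMET-42" are hypothetical and not part of the DLESE API. Dropping duplicates is an assumption made here to honor the "unique IDs" wording.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of org.dlese.dpc.index.writer.
class IdListSketch {

    // Joins record IDs with single spaces, dropping repeats but keeping first-seen order.
    static String joinIds(List<String> ids) {
        Set<String> unique = new LinkedHashSet<String>(ids);
        return String.join(" ", unique);
    }

    public static void main(String[] args) {
        // "COMET-42" is an invented ID used only for this example.
        List<String> ids = Arrays.asList(
                "DLESE-000-000-000-001", "COMET-42", "DLESE-000-000-000-001");
        System.out.println(joinIds(ids));
    }
}
```

A concrete getId() implementation could return a String built this way once it has gathered the IDs for the record being indexed.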

getCollection

protected abstract String getCollection()
                                 throws Exception
Returns unique collection keys for the item being indexed, separated by spaces. For example, 'dcc', 'comet' or 'dwel'.

Returns:
The collection keys
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.

addFields

protected abstract void addFields(Document newDoc,
                                  Document existingDoc,
                                  File sourceFile)
                           throws Exception
Adds additional fields that are unique to the document format being indexed. When implementing this method, use the add method of the Document class to add a Field.

The following Lucene Field types are available for indexing with the Document:
Field.Text(String name, String value) -- tokenized, indexed, stored
Field.UnStored(String name, String value) -- tokenized, indexed, not stored
Field.Keyword(String name, String value) -- not tokenized, indexed, stored
Field.UnIndexed(String name, String value) -- not tokenized, not indexed, stored
Field(String name, String string, boolean store, boolean index, boolean tokenize) -- gives explicit control over whether the value is stored, indexed and tokenized

Example code:
protected void addFields(Document newDoc, Document existingDoc, File sourceFile) throws Exception {
  // Add a custom value as a tokenized, indexed and stored field
  String customContent = "Some content";
  newDoc.add(Field.Text("mycustomfield", customContent));
}

Parameters:
newDoc - The new Document that is being created for this resource
existingDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present
sourceFile - The sourceFile that is being indexed
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.

addCustomFields

protected void addCustomFields(Document newDoc,
                               Document existingDoc,
                               File sourceFile)
                        throws Exception
Adds the full content of the XML to the default search field. Strips the XML tags to extract the content. Will not work properly if the XML is not well-formed.

Specified by:
addCustomFields in class FileIndexingServiceWriter
Parameters:
newDoc - The new Document that is being created for this resource
existingDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present
sourceFile - The source file that is being indexed
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
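
The tag-stripping behavior described above can be sketched with the JDK's own XML parser. This is not the class's actual implementation, which is not shown on this page. The class XmlTextSketch and its methods are hypothetical names. The sketch collects the text nodes of a parsed document and joins them with spaces, and it fails with an exception when the input is not well-formed, which mirrors the caveat above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;

// Hypothetical illustration only; not part of org.dlese.dpc.index.writer.
class XmlTextSketch {

    // Parses the XML and returns its text content with the tags removed.
    // Throws an Exception (from the parser) if the XML is not well-formed.
    static String extractText(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Note: this is an org.w3c.dom.Document, unrelated to the Lucene Document class.
        org.w3c.dom.Document dom =
                builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        StringBuilder out = new StringBuilder();
        collectText(dom.getDocumentElement(), out);
        // Collapse runs of whitespace inside the text into single spaces.
        return out.toString().replaceAll("\\s+", " ");
    }

    // Depth-first walk that appends every non-blank text or CDATA node, space-separated.
    private static void collectText(Node node, StringBuilder out) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(text);
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collectText(child, out);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extractText("<record><title>Plate tectonics</title></record>"));
    }
}
```

Attribute values are not text nodes, so this sketch leaves them out. Whether the real method indexes attribute values is not stated here.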

getFieldContent

protected String getFieldContent(String[] values,
                                 String useVocabMapping)
                          throws Exception
Gets the vocab encoded keys for the given values, separated by the '+' symbol.

Parameters:
values - The values to encode.
useVocabMapping - The mapping to use, for example "contentStandards".
Returns:
The encoded vocab keys.
Throws:
Exception - If an error occurs.
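
To illustrate the '+'-separated result format, here is a minimal sketch that looks up each value in a mapping and joins the encoded keys with '+'. Everything in it is hypothetical: the class VocabKeySketch, the in-memory map and the keys "0a" and "0b" are invented stand-ins. The real method resolves keys through the vocab mapping named by useVocabMapping, and that lookup is not documented on this page.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of org.dlese.dpc.index.writer.
class VocabKeySketch {

    // Invented stand-in for a vocabulary mapping from metadata values to encoded keys.
    static final Map<String, String> MAPPING = new HashMap<String, String>();
    static {
        MAPPING.put("Earth science", "0a");
        MAPPING.put("Physics", "0b");
    }

    // Encodes a single value, failing loudly when no key is known for it.
    static String encode(String value) {
        String key = MAPPING.get(value);
        if (key == null) {
            throw new IllegalArgumentException("No vocab key for value: " + value);
        }
        return key;
    }

    // Encodes each value and joins the resulting keys with the '+' symbol.
    static String encodeAll(String[] values) {
        StringBuilder joined = new StringBuilder();
        for (String value : values) {
            if (joined.length() > 0) {
                joined.append('+');
            }
            joined.append(encode(value));
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeAll(new String[] {"Earth science", "Physics"}));
    }
}
```

Throwing on an unknown value is a design choice made for this sketch. The real method may handle unmapped values differently.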

getFieldContent

protected String getFieldContent(String value,
                                 String useVocabMapping)
                          throws Exception
Gets the encoded vocab key for the given content.

Parameters:
value - The value to encode.
useVocabMapping - The vocab mapping to use, for example "contentStandard".
Returns:
The encoded value.
Throws:
Exception - If an error occurs.

getFieldName

protected String getFieldName(String fieldString)
                       throws Exception
Gets the fieldName attribute of the XMLFileIndexingWriter object.

Parameters:
fieldString - DESCRIPTION
Returns:
The fieldName value
Throws:
Exception - DESCRIPTION

getOaiModtime

public static final String getOaiModtime(File sourceFile,
                                         Document existingDoc)
Gets the oaiModtime for the given File or Document.

Parameters:
sourceFile - The source file
existingDoc - The existing Doc
Returns:
The oaiModtime value
