DLESE Tools
v1.2

org.dlese.dpc.index.writer
Class ItemFileIndexingWriter

java.lang.Object
  extended by org.dlese.dpc.index.writer.FileIndexingServiceWriter
      extended by org.dlese.dpc.index.writer.XMLFileIndexingWriter
          extended by org.dlese.dpc.index.writer.ItemFileIndexingWriter
All Implemented Interfaces:
DocWriter
Direct Known Subclasses:
ADNFileIndexingWriter, DleseIMSFileIndexingWriter

public abstract class ItemFileIndexingWriter
extends XMLFileIndexingWriter

Abstract class for writing a Lucene Document for a collection of item-level metadata records of a specific format (DLESE IMS, ADN-Item, ADN-Collection, etc). The reader for this type of Document is XMLDocReader or ItemDocReader.


The Lucene Document fields created by this class are (in addition to the ones listed for FileIndexingServiceWriter):

title - The title for the resource. Stored.
description - The description for the resource. Stored.
url - The URL to the resource. Stored.
Stored. Prefixed with a '0' to support wildcard searching.
metadatapfx - The metadata prefix (format) for this record, for example 'adn' or 'oai_dc'. Stored. Prefixed with a '0' to support wildcard searching.
accessionstatus - The accession status for this record. Stored. Prefixed with a '0' to support wildcard searching.
annotypes - Annotation types that refer to this record. Keyword.
annopathways - Annotation pathways that refer to this record. Keyword.
associatedids - A list of record IDs that refer to the same resource. Keyword.
valid - Indicates whether the record is valid [true | false]. Not stored.
validationreport - Text describing an error in the validation of the data for this record. Stored. Only indexed if there was a validation error indicated by the valid field containing false.
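Because some of the fields above are stored with a '0' at the beginning, code that builds query terms against them has to apply the same prefix. The following is a minimal sketch of that convention using plain strings only. The class and method names are hypothetical and are not part of the DLESE API, and whether the reader classes strip the prefix on retrieval is not stated here.

```java
// Hypothetical helper illustrating the leading-'0' convention described above.
// It is not part of org.dlese.dpc; it only manipulates plain strings.
final class WildcardPrefixSketch {
    static final char PREFIX = '0';

    // Value as it would be written to the index, e.g. "adn" becomes "0adn".
    static String encode(String value) {
        return PREFIX + value;
    }

    // Removes the leading '0' again, if it is present.
    static String decode(String stored) {
        if (stored != null && !stored.isEmpty() && stored.charAt(0) == PREFIX) {
            return stored.substring(1);
        }
        return stored;
    }

    // Lucene query-syntax term matching every value in a prefixed field.
    static String matchAllQuery(String field) {
        return field + ":" + PREFIX + "*";
    }
}
```

With this convention, a term such as metadatapfx:0* starts with a literal character rather than with the wildcard itself.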

Author:
John Weatherley
See Also:
ItemDocReader, XMLDocReader, RecordDataService, FileIndexingServiceWriter

Field Summary
 
Fields inherited from class org.dlese.dpc.index.writer.XMLFileIndexingWriter
recordDataService, vocab
 
Constructor Summary
protected ItemFileIndexingWriter(RecordDataService recordDataService)
          Creates an ItemFileIndexingWriter that indexes the given collection in field collection.
 
Method Summary
protected  void addFields(Document newDoc, Document existingDoc, File sourceFile)
          Adds fields to the index that are common to all collection-related documents.
protected abstract  void addFrameworkFields(Document newDoc, Document existingDoc)
          Adds fields to the index that are unique to the given framework.
protected abstract  void destroy()
          This method is called at the conclusion of processing and may be used for tear-down.
protected abstract  String getAccessionStatus()
          Returns the accession status of this record, for example 'accessioned'.
protected abstract  String getCreator()
          Returns the full name of the item's creator.
protected abstract  String getCreatorLastName()
          Returns the last name of the item's creator.
 Document getDeletedDoc(Document existingDoc)
          Creates a Lucene Document from an existing CollectionFileIndexing Document by setting the field "deleted" to "true" and making the modtime equal to current time.
protected abstract  String getDescription()
          Returns a description for the document being indexed.
abstract  String getDocType()
          Returns a unique document type key for this kind of record, corresponding to the format type.
protected abstract  String getKeywords()
          Returns the item's keywords.
abstract  String getReaderClass()
          Gets the fully qualified name of the concrete DocReader class that is used to read this type of Document, for example "org.dlese.dpc.index.reader.ItemDocReader".
protected abstract  String getTitle()
          Returns a title for the document being indexed.
protected abstract  String getUrl()
          Returns the URL to the resource being indexed.
protected abstract  String getValidationReport()
          Gets a report detailing any errors found in the validation of the data, or null if no error was found.
abstract  void init(File source, Document existingDoc)
          This method is called prior to processing and may be used for any necessary set-up.
 
Methods inherited from class org.dlese.dpc.index.writer.XMLFileIndexingWriter
addCustomFields, getCollection, getFieldContent, getFieldContent, getFieldName, getId, getOaiModtime
 
Methods inherited from class org.dlese.dpc.index.writer.FileIndexingServiceWriter
abortIndexing, addToAdminDefaultField, addToDefaultField, create, getExistingDoc, getFileIndexingService, getSourceDir, getSourceFile, isValidationEnabled, prtln, prtlnErr, setDebug, setDefaultFieldName, setFileIndexingService, setValidationEnabled
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ItemFileIndexingWriter

protected ItemFileIndexingWriter(RecordDataService recordDataService)
Creates an ItemFileIndexingWriter that indexes the given collection in field collection. The RecordDataService is used to get indexable data such as recordStatus, annotations, vocab ID mappings and associated IDs.

Parameters:
recordDataService - The RecordDataService used with this writer.
Method Detail

getTitle

protected abstract String getTitle()
                            throws Exception
Returns a title for the document being indexed. An empty String or null is acceptable. The String is tokenized, stored and indexed under the field key 'title' and is also indexed in the 'default' field.

Returns:
The title String
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.

getDescription

protected abstract String getDescription()
                                  throws Exception
Returns a description for the document being indexed. An empty String or null is acceptable. The String is tokenized, stored and indexed under the field key 'description' and is also indexed in the 'default' field.

Returns:
The description String
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.

getUrl

protected abstract String getUrl()
                          throws Exception
Returns the URL to the resource being indexed. An empty String or null is acceptable. The URL String is tokenized and indexed under the field key 'uri' and is also indexed in the 'default' field. It is also stored in the index untokenized under the field key 'url.'

Returns:
The url String
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.

getKeywords

protected abstract String getKeywords()
                               throws Exception
Returns the item's keywords. An empty String or null is acceptable. The String is tokenized, stored and indexed under the field key 'keywords' and is also indexed in the 'default' field.

Returns:
The keywords String
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.

getCreatorLastName

protected abstract String getCreatorLastName()
                                      throws Exception
Returns the last name of the item's creator. An empty String or null is acceptable. The String is tokenized, stored and indexed in the 'default' field only.

Returns:
The creator's last name String
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.

getCreator

protected abstract String getCreator()
                              throws Exception
Returns the full name of the item's creator. An empty String or null is acceptable. The String is tokenized, stored and indexed under the field key 'creator'.

Returns:
Creator's full name
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.

getAccessionStatus

protected abstract String getAccessionStatus()
                                      throws Exception
Returns the accession status of this record, for example 'accessioned'. The String is tokenized, stored and indexed under the field key 'accessionstatus'.

Returns:
The accession status.
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.

addFrameworkFields

protected abstract void addFrameworkFields(Document newDoc,
                                           Document existingDoc)
                                    throws Exception
Adds fields to the index that are unique to the given framework.

The following Lucene Field types are available for indexing with the Document:
Field.Text(String name, String value) -- tokenized, indexed, stored
Field.UnStored(String name, String value) -- tokenized, indexed, not stored
Field.Keyword(String name, String value) -- not tokenized, indexed, stored
Field.UnIndexed(String name, String value) -- not tokenized, not indexed, stored
Field(String name, String string, boolean store, boolean index, boolean tokenize) -- allows control to do anything you want

Example code:
protected void addFrameworkFields(Document newDoc, Document existingDoc) throws Exception {
  String customContent = "Some content";
  newDoc.add(Field.Text("mycustomfield", customContent));
}

Parameters:
newDoc - The new Document that is being created for this resource
existingDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.

getDocType

public abstract String getDocType()
                           throws Exception
Returns a unique document type key for this kind of record, corresponding to the format type, for example "adn", "dlese_ims", or "dlese_anno". The string is parsed using the Lucene StandardAnalyzer, so it must be lowercase and should not contain any stop words.

Specified by:
getDocType in interface DocWriter
Specified by:
getDocType in class FileIndexingServiceWriter
Returns:
The docType String
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
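
The lowercase and stop-word constraints on the docType key can be checked before a key is adopted. The sketch below is a hypothetical helper, not part of the DLESE API. Its stop-word list is an assumed subset of common English stop words rather than the list read from Lucene, and splitting on non-alphanumeric characters only approximates how StandardAnalyzer tokenizes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical pre-check for docType keys; names are illustrative only.
final class DocTypeCheck {
    // Assumed subset of English stop words; not obtained from Lucene itself.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
        "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
        "into", "is", "it", "no", "not", "of", "on", "or", "the", "to", "was", "with"));

    // True if the key is non-empty, all lowercase, and has no stop-word token.
    static boolean isAcceptableDocType(String docType) {
        if (docType == null || docType.isEmpty()) {
            return false;
        }
        if (!docType.equals(docType.toLowerCase(Locale.ROOT))) {
            return false;
        }
        // Simplified tokenization: split on anything that is not a-z or 0-9.
        for (String token : docType.split("[^a-z0-9]+")) {
            if (STOP_WORDS.contains(token)) {
                return false;
            }
        }
        return true;
    }
}
```

Under these assumptions "adn" and "dlese_ims" pass, while "ADN" fails the lowercase check and "list_of_items" fails because of the token "of".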

getReaderClass

public abstract String getReaderClass()
Gets the fully qualified name of the concrete DocReader class that is used to read this type of Document, for example "org.dlese.dpc.index.reader.ItemDocReader".

Specified by:
getReaderClass in interface DocWriter
Specified by:
getReaderClass in class FileIndexingServiceWriter
Returns:
The name of the DocReader.

init

public abstract void init(File source,
                          Document existingDoc)
                   throws Exception
This method is called prior to processing and may be used for any necessary set-up. This method should throw an Exception with an appropriate message if an error occurs.

Specified by:
init in class FileIndexingServiceWriter
Parameters:
source - The source file being indexed
existingDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present
Throws:
Exception - If an error occurred during set-up.

destroy

protected abstract void destroy()
This method is called at the conclusion of processing and may be used for tear-down.

Specified by:
destroy in class FileIndexingServiceWriter
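
The calling order implied by init and destroy (set-up first, data accessors next, tear-down last) can be sketched with a stand-in class. Everything below is hypothetical: the nested Writer type only mimics a few of the callbacks and is not the DLESE class, and running destroy in a finally block is an assumption about the indexing service, not documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the callback order; not the DLESE implementation.
final class LifecycleSketch {
    // Stand-in for a small subset of the writer callbacks.
    abstract static class Writer {
        abstract void init() throws Exception;
        abstract String getTitle() throws Exception;
        abstract String getValidationReport() throws Exception;
        abstract void destroy();
    }

    // Runs set-up, then the data accessors, then tear-down.
    static List<String> process(Writer writer) throws Exception {
        List<String> fields = new ArrayList<>();
        writer.init();
        try {
            fields.add("title=" + writer.getTitle());
            // The validation report is requested after the data accessors.
            fields.add("report=" + writer.getValidationReport());
        } finally {
            // Assumption: tear-down runs even if an accessor throws.
            writer.destroy();
        }
        return fields;
    }
}
```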

getValidationReport

protected abstract String getValidationReport()
                                       throws Exception
Gets a report detailing any errors found in the validation of the data, or null if no error was found. This could be implemented by simply performing XML schema validation on the file, or can involve more customized validation of the data if necessary. This method is called after all other methods that access the data (getTitle(), addFrameworkFields(Document, Document), etc.) so that data verification can be done during those calls, if needed.

Overrides:
getValidationReport in class FileIndexingServiceWriter
Returns:
Null if no data validation errors were found, otherwise a String that details the nature of the error.
Throws:
Exception - If an error occurs while performing the validation.
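
One way to implement the simple case, XML schema validation, is with the standard javax.xml.validation API. The sketch below is not the DLESE implementation: the class and method names are hypothetical, and it takes the XML and schema as strings to stay self-contained, whereas a real writer would presumably validate its source file. It follows the contract above by returning null when the document is valid and a message otherwise.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical sketch of a schema-based validation report.
final class ValidationReportSketch {
    // Returns null if xml is valid against xsd, otherwise the error message.
    static String validationReport(String xml, String xsd) throws Exception {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // A broken schema is a failure to validate at all, so it propagates.
        Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return null; // no validation errors found
        } catch (SAXException e) {
            // The document is malformed or does not conform to the schema.
            String message = e.getMessage();
            return message != null ? message : e.toString();
        }
    }
}
```

A schema that cannot be parsed surfaces as an exception rather than as a report, which matches the distinction between the return value and the Throws clause.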

addFields

protected final void addFields(Document newDoc,
                               Document existingDoc,
                               File sourceFile)
                        throws Exception
Adds fields to the index that are common to all collection-related documents. These include the title, description, id and url as well as collection, accession status, annotation references, and collection(s).

Specified by:
addFields in class XMLFileIndexingWriter
Parameters:
newDoc - The new Document that is being created for this resource
existingDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present
sourceFile - The sourceFile that is being indexed.
Throws:
Exception - If an error occurs

getDeletedDoc

public Document getDeletedDoc(Document existingDoc)
                       throws Throwable
Creates a Lucene Document from an existing CollectionFileIndexing Document by setting the field "deleted" to "true" and making the modtime equal to current time.

Overrides:
getDeletedDoc in class FileIndexingServiceWriter
Parameters:
existingDoc - An existing FileIndexingService Document that currently resides in the index for the given resource.
Returns:
A Lucene FileIndexingService Document with the field "deleted" set to "true" and modtime set to current time.
Throws:
Throwable - If an error occurs
