Veridian digital collection software

Veridian XML Interface

Last updated: March 7, 2019

The XML interface described in this document is designed to provide an alternative method of accessing the data in a Veridian system. Veridian usually returns HTML pages, to provide a user interface via a web browser, but it can also return XML versions of some pages. These allow access to the document data at a low level, making it possible to use Veridian as a "backend" and create alternative interfaces to search, browse and view the documents.


Change history

March 7, 2019 Replaced "PageFeatureCode" field with the more general "PageType".
December 4, 2018 Minor updates to example values.
July 21, 2017 Added "DocumentLastModified" field to the "DocumentContent" type.
July 13, 2017 The "Document", "LogicalSection" and "Page" fields may have a "robots-noindex" attribute to indicate that the document, logical section or page should not be indexed by bots/web crawlers.
June 20, 2017 Start of change history.

Contents

1. Introduction

2. Types
2.1. DocumentContent
2.2. DocumentMetadata
2.3. LogicalSectionContent
2.4. LogicalSectionMetadata
2.5. PageContent
2.6. PageMetadata
2.7. PublicationMetadata

3. Requests
3.1. GetDates
3.2. GetDocumentContent
3.3. GetLogicalSectionContent
3.4. GetPageContent
3.5. GetPublications
3.6. GetPublicationDocuments
3.7. SearchDocuments
3.8. SearchLogicalSections
3.9. SearchPages

4. Text Correction Requests
4.1. GetSectionBlocks
4.2. SubmitBlockText

A. Appendix: Example Requests and Responses


1. Introduction

Veridian runs as a CGI executable through a web server (typically Apache HTTPD). When it receives a request (usually a GET request with a series of parameters in the URL), Veridian searches the built Solr index, looks up information in its metadata databases, and outputs a response. This response is usually HTML, but many requests also support XML output. To access the XML version, add "&f=XML" to the request parameters.
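
As an illustration of the "&f=XML" convention, the following Python sketch builds a request URL from a set of parameters. The host name and the helper name build_xml_url are assumptions made for this example; only the "/cgi-bin/veridian" path and the parameter names come from this document.

```python
from urllib.parse import urlencode

# Hypothetical host; the path mirrors the example URLs in this document.
BASE_URL = "https://example.org/cgi-bin/veridian"


def build_xml_url(params, base_url=BASE_URL):
    """Return a request URL with f=XML added to the given parameters."""
    query = dict(params)
    query["f"] = "XML"  # ask for the XML version instead of the HTML page
    return base_url + "?" + urlencode(query)


url = build_xml_url({"a": "d", "d": "DSC19800118"})
print(url)
```

The same helper could be reused for any of the requests in section 3, since they differ only in their parameters.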

The remainder of this document describes the parameters to the different XML requests, and the format of the XML responses produced.


2. Types

This section describes some common data types used in the XML responses. The data types reduce duplication and inconsistency by grouping related fields so that they always appear together.

2.1. DocumentContent

DocumentContent contains fields related to the contents of a document (rather than its metadata): the pages and logical sections it contains. This type is only returned from the GetDocumentContent request. It contains the following fields:

DocumentContent
Fields
DocumentLastModified The date and time the document was last modified, in ISO 8601 format with UTC timezone.

Type: String
Example: 2017-02-09T02:17:24Z
DocumentNextDocumentID The identifier of the next document within the publication, in date order. This will be empty if no next document exists (i.e. the current document is the last document in the publication).

Type: String
Example: DSC19800119
DocumentPrevDocumentID The identifier of the previous document within the publication, in date order. This will be empty if no previous document exists (i.e. the current document is the first document in the publication).

Type: String
Example: DSC19800117
DocumentPdfURL The URL of the PDF for this document, if one is available.

Type: String
Example: /cgi-bin/imageserver.pl?oid=DSC19800118&getpdf=true
DocumentViewURL The URL of the HTML page in the Veridian delivery system displaying the document.

Type: String
Example: /cgi-bin/veridian?a=d&d=DSC19800118
ArrayOfPage A container object, with zero or more occurrences (one for each page that makes up the document) of the following field:
Page
The "robots-noindex" attribute (if present and set to "true") indicates that this page should not be indexed by bots/web crawlers. A container object, with the following field:
PageMetadata
An instance of PageMetadata.
ArrayOfLogicalSection A container object, with zero or more occurrences (one for each logical section that makes up the document) of the following field:
LogicalSection
The "robots-noindex" attribute (if present and set to "true") indicates that this logical section should not be indexed by bots/web crawlers. A container object, with the following fields:
LogicalSectionMetadata
An instance of LogicalSectionMetadata.
ArrayOfLogicalSection
A container object, with zero or more occurrences (one for each child logical section) of the LogicalSection field (described above).
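
Because ArrayOfLogicalSection can nest inside LogicalSection, a client typically walks it recursively. The Python sketch below shows one way to do this. It assumes that each field appears as an XML element of the same name and that "robots-noindex" is an attribute on the LogicalSection element; the sample XML is invented for illustration and is not real Veridian output.

```python
import xml.etree.ElementTree as ET

# Invented sample data; element layout is an assumption based on the field list.
SAMPLE = """
<ArrayOfLogicalSection>
  <LogicalSection>
    <LogicalSectionMetadata>
      <LogicalSectionID>DSC19800118.2.1</LogicalSectionID>
    </LogicalSectionMetadata>
    <ArrayOfLogicalSection>
      <LogicalSection robots-noindex="true">
        <LogicalSectionMetadata>
          <LogicalSectionID>DSC19800118.2.1.1</LogicalSectionID>
        </LogicalSectionMetadata>
      </LogicalSection>
    </ArrayOfLogicalSection>
  </LogicalSection>
  <LogicalSection>
    <LogicalSectionMetadata>
      <LogicalSectionID>DSC19800118.2.2</LogicalSectionID>
    </LogicalSectionMetadata>
  </LogicalSection>
</ArrayOfLogicalSection>
"""


def walk_sections(array_elem, depth=0):
    """Yield (depth, section ID, noindex flag) for each section, depth first."""
    for section in array_elem.findall("LogicalSection"):
        section_id = section.findtext("LogicalSectionMetadata/LogicalSectionID")
        noindex = section.get("robots-noindex") == "true"
        yield depth, section_id, noindex
        child_array = section.find("ArrayOfLogicalSection")
        if child_array is not None:
            yield from walk_sections(child_array, depth + 1)


rows = list(walk_sections(ET.fromstring(SAMPLE.strip())))
for depth, section_id, noindex in rows:
    print("  " * depth + section_id, "(noindex)" if noindex else "")
```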

2.2. DocumentMetadata

DocumentMetadata contains fields related to the metadata of a document. This type is returned from many requests. It contains the following fields:

DocumentMetadata
Fields
DocumentDate The date of the document, in human-readable "DD Month YYYY" format.

Type: String
Example: 18 January 1980
DocumentFeatureCode Specifies any special information about the document, or a reason why it is not present in the Veridian system. This field is not usually present.

Type: String
Example: Missing document
DocumentID The unique identifier of the document. Document identifiers consist of the document's publication code followed by the document date in YYYYMMDD format.

Type: String
Example: DSC19800118
DocumentNumber The "number" of the document, as specified in the source METS file. This field will not be present if no value exists in the METS file.

Type: String
Example: 5255
DocumentTitle The title of the document, as specified in the source METS file. These titles may be incorrect or inconsistent, so the PublicationTitle field is generally more useful.

Type: String
Example: Daily Southern Cross
DocumentType The type of the document; possible values include BOOK, DOCUMENT, IMAGES, MULTIMEDIA, NEWSPAPER, PERIODICAL, PHOTO.

Type: String
Example: NEWSPAPER
DocumentVolume The "volume" of the document, as specified in the source METS file. This field will not be present if no value exists in the METS file.

Type: String
Example: XXX
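
The DocumentDate and DocumentID fields both carry the document date, in different formats. The Python sketch below converts each to a date object. The function names are hypothetical, the month-name parsing assumes English month names (the default "C" locale), and taking the last eight characters of the identifier relies on the "publication code followed by YYYYMMDD" format described above.

```python
from datetime import date, datetime


def parse_document_date(text):
    """Parse a DocumentDate value such as '18 January 1980'."""
    return datetime.strptime(text, "%d %B %Y").date()


def date_from_document_id(document_id):
    """Extract the trailing YYYYMMDD date from a DocumentID."""
    return datetime.strptime(document_id[-8:], "%Y%m%d").date()


print(parse_document_date("18 January 1980"))
print(date_from_document_id("DSC19800118"))
```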

2.3. LogicalSectionContent

LogicalSectionContent contains fields related to the content of a logical section: its text and images. This type is only returned from the GetLogicalSectionContent request. It contains the following fields:

LogicalSectionContent
Fields
LogicalSectionImagesHTML A block of HTML that displays the series of block images that make up the logical section, including highlighting of query terms (if specified). This is the same HTML that is used in the "clipped article" view in the standard Veridian user interface.

Type: HTML
Example: <div class="imagecontainer" style="width: 283px; height: 18px"><img alt="Block image" src="/cgi-bin/imageserver.pl?oid=DSC18740625.2.4.1&amp;colours=all&amp;ext=jpg&amp;area=1&amp;width=283" style="width: 283px; height: 18px" title="Block image" /></div>
LogicalSectionNextLogicalSectionID The identifier of the next logical section within the document, traversing the hierarchical logical section list. This will be empty if no next logical section exists (i.e. the current logical section is the last logical section in the document).

Type: String
Example: DSC19800118.2.5
LogicalSectionPrevLogicalSectionID The identifier of the previous logical section within the document, traversing the hierarchical logical section list. This will be empty if no previous logical section exists (i.e. the current logical section is the first logical section in the document).

Type: String
Example: DSC19800118.2.3
LogicalSectionPdfURL The URL of the PDF for this logical section, if one is available. This field is not usually present.

Type: String
Example: /cgi-bin/imageserver.pl?oid=DSC19800118.2.4&getpdf=true
LogicalSectionTextHTML The text of the logical section, in HTML format with <p> tags surrounding the blocks, and query terms (if specified) highlighted. This may be empty for some logical section types.

Type: String
Example: <p>'Entrance fee' to clubhouse shock for lunch crowd IRENE NGOO By CURRY lovers to the popular Indian curry stall at the Singapore Civil Service Sports I Council in Dempsey Road</p>
LogicalSectionTextWordCount The number of tokens in the logical section text.

Type: Number
Example: 31
LogicalSectionViewURL The URL of the HTML page in the Veridian delivery system displaying the logical section.

Type: String
Example: /cgi-bin/veridian?a=d&d=DSC19800118.2.4

2.4. LogicalSectionMetadata

LogicalSectionMetadata contains fields related to the metadata of a logical section. This type is returned from the GetLogicalSectionContent, GetDocumentContent and SearchLogicalSections requests. It contains the following fields:

LogicalSectionMetadata
Fields
LogicalSectionFirstPageID The identifier of the page on which the logical section starts. Page identifiers consist of the document identifier followed by ".1" and then the page's position within the document.

Type: String
Example: DSC19800118.1.1
LogicalSectionID The unique identifier of the logical section. Logical section identifiers consist of the document identifier followed by ".2" and then the hierarchical position of the logical section within the document.

Type: String
Example: DSC19800118.2.4
LogicalSectionTitle The title of the logical section.

Type: String
Example: 'Entrance fee' to clubhouse shock for lunch crowd
LogicalSectionType The type of the logical section; possible values are ADVERTISEMENT, ARTICLE, ARTICLE + ILLUSTRATION, GROUPING_NODE, LETTER, MISCELLANEOUS, OBITUARY.

Type: String
Example: ARTICLE
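
The identifier formats described above (".1" followed by a position for pages, ".2" followed by a hierarchical position for logical sections) can be told apart mechanically. The following Python sketch is one possible approach; the function name is hypothetical, and it assumes that document identifiers themselves never contain a "." character.

```python
def split_identifier(identifier):
    """Return (kind, document ID, position) for a Veridian identifier."""
    parts = identifier.split(".")
    document_id = parts[0]
    if len(parts) == 1:
        return ("document", document_id, ())
    # ".1" marks a page, ".2" marks a logical section.
    kind = {"1": "page", "2": "logical_section"}.get(parts[1])
    if kind is None or len(parts) < 3:
        raise ValueError(f"Unrecognised identifier: {identifier!r}")
    position = tuple(int(part) for part in parts[2:])
    return (kind, document_id, position)


print(split_identifier("DSC19800118"))
print(split_identifier("DSC19800118.1.4"))
print(split_identifier("DSC18450819.2.2.4"))
```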

2.5. PageContent

PageContent contains fields related to the content of a page: its image and PDF. This type is only returned from the GetPageContent request. It contains the following fields:

PageContent
Fields
PageImageHTML A block of HTML for displaying the page image, including highlighting of query terms (if specified). This is the same HTML that is used in the "classic" page view in the standard Veridian user interface, but without the information for highlighting the logical section areas (this is encapsulated in the LogicalSectionBlock fields below).

Type: HTML
Example: <div class="imagecontainer" style="width: 1400px; height: 1947px"> <img alt="Page image" src="/cgi-bin/imageserver.pl?oid=DSC19800118.1.4&amp;colours=all&amp;ext=jpg&amp;width=1400" style="width: 1400px; height: 1947px" title="Page image" /> </div>
PageNextPageID The identifier of the next page within the document, traversing the linear page list. This will be empty if no next page exists (i.e. the current page is the last page in the document).

Type: String
Example: DSC19800118.1.5
PagePrevPageID The identifier of the previous page within the document, traversing the linear page list. This will be empty if no previous page exists (i.e. the current page is the first page in the document).

Type: String
Example: DSC19800118.1.3
PagePdfURL The URL of the PDF for this page, if one is available.

Type: String
Example: /cgi-bin/imageserver.pl?oid=DSC19800118.1.4&getpdf=true
PageTextHTML The text of the page, in HTML format with <p> tags surrounding the blocks, and query terms (if specified) highlighted. This may be empty for some page types.

Type: String
Example: <p>'Entrance fee' to clubhouse shock for lunch crowd IRENE NGOO By CURRY lovers to the popular Indian curry stall at the Singapore Civil Service Sports I Council in Dempsey Road</p> [truncated]
PageTextWordCount The number of tokens in the page text.

Type: Number
Example: 31
PageViewURL The URL of the HTML page in the Veridian delivery system displaying the page.

Type: String
Example: /cgi-bin/veridian?a=d&d=DSC19800118.1.4
ArrayOfLogicalSectionBlock A container object, with zero or more occurrences (one for each fragment of logical section that appears on the page) of the following field:
LogicalSectionBlock
A container object, with the following field:
LogicalSectionBlockLocation
The location of the logical section fragment within the page, in "X,Y,Width,Height" format.

Type: String
Example: 498,205,158,810
LogicalSectionID
The identifier of the logical section this fragment is part of.

Type: String
Example: DSC18450819.2.2.4
LogicalSectionTitle
The title of the logical section this fragment is part of.

Type: String
Example: Page 41 Advertisements Column 4
LogicalSectionType
The type of the logical section this fragment is part of.

Type: String
Example: ADVERTISEMENT
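
A client that draws logical section outlines over the page image needs the LogicalSectionBlockLocation value as numbers. The Python sketch below parses the "X,Y,Width,Height" string; the BlockLocation type and function name are hypothetical, and the sketch makes no assumption about which image size the coordinates are relative to.

```python
from typing import NamedTuple


class BlockLocation(NamedTuple):
    x: int
    y: int
    width: int
    height: int

    @property
    def right(self):
        """X coordinate of the right-hand edge."""
        return self.x + self.width

    @property
    def bottom(self):
        """Y coordinate of the bottom edge."""
        return self.y + self.height


def parse_block_location(text):
    """Parse an 'X,Y,Width,Height' string into a BlockLocation."""
    x, y, width, height = (int(part) for part in text.split(","))
    return BlockLocation(x, y, width, height)


location = parse_block_location("498,205,158,810")
print(location, location.right, location.bottom)
```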

2.6. PageMetadata

PageMetadata contains fields related to the metadata of a page. This type is returned from the GetDocumentContent, GetPageContent and SearchPages requests. It contains the following fields:

PageMetadata
Fields
PageID The unique identifier of the page. Page identifiers consist of the document identifier followed by ".1" and then the page's position within the document.

Type: String
Example: DSC19800118.1.1
PageImageHeight The height of the original page image, in pixels.

Type: Number
Example: 9197
PageImageWidth The width of the original page image, in pixels.

Type: Number
Example: 6614
PageOCRAccuracy The estimated accuracy of the page text from the OCR process, if available from the source data. This field will not be present if no accuracy information is available from the source data.

Type: String
Example: 95%
PageTitle The title of the page. Page numbers do not necessarily start at 1 for a document, and some pages may not have a number at all.

Type: String
Example: Page 8
PageType The type of the page; possible values include PAGE, MISSING_PAGE, TOCPAGE, COVER_PAGE and SUPPLEMENT_PAGE.

Type: String
Example: PAGE

2.7. PublicationMetadata

PublicationMetadata contains fields related to the metadata of a publication. This type is returned from many requests. It contains the following fields:

PublicationMetadata
Fields
PublicationID The unique identifier of the publication.

Type: String
Example: DSC
PublicationTitle The human-readable name of the publication.

Type: String
Example: Daily Southern Cross

3. Requests

This section describes each of the nine available requests, with descriptions of their request parameters and response values.

3.1. GetDates

The GetDates request returns the date coverage of the collection: the list of dates containing documents, optionally filtered by publication.

GetDates Request
Required parameters
a Must be set to "cl".
cl Must be set to "CL2".
f Must be set to "XML".
Optional parameters
dafdq Filter by date: day component of range start. Valid values are 1-31.

Type: Number
Example: 1
dafmq Filter by date: month component of range start. Valid values are 1-12.

Type: Number
Example: 1
dafyq Filter by date: year component of range start. Valid values are 1-9999.

Type: Number
Example: 1980
datdq Filter by date: day component of range end. Valid values are 1-31.

Type: Number
Example: 31
datmq Filter by date: month component of range end. Valid values are 1-12.

Type: Number
Example: 12
datyq Filter by date: year component of range end. Valid values are 1-9999.

Type: Number
Example: 1989
sp A publication ID to filter the results by. Only dates with documents in the specified publication will be returned.

Type: String
Example: DSC
GetDates Response
Fields present on failure
Error Contains an error message if the GetDates request failed. The GetDates request will fail if the "sp" parameter does not specify a valid value.

Type: String
Example: Invalid value "The New Zealand Times" for CGI argument "sp".
Fields present on success
ArrayOfDate A container object, with zero or more occurrences (one for each date in the collection/publication) of the following field:
Date
The date value in YYYY, YYYYMM or YYYYMMDD format. The "n" attribute indicates how many documents have this date.
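
The six date filter parameters split each end of the range into day, month and year components. The Python sketch below fills them in from two date objects and produces a GetDates query string. The function name is hypothetical; the parameter names and required values are those listed above.

```python
from datetime import date
from urllib.parse import urlencode


def get_dates_query(publication_id=None, start=None, end=None):
    """Build the query string for a GetDates request."""
    params = {"a": "cl", "cl": "CL2", "f": "XML"}
    if start is not None:
        params.update(dafdq=start.day, dafmq=start.month, dafyq=start.year)
    if end is not None:
        params.update(datdq=end.day, datmq=end.month, datyq=end.year)
    if publication_id is not None:
        params["sp"] = publication_id
    return urlencode(params)


query = get_dates_query("DSC", date(1980, 1, 1), date(1989, 12, 31))
print(query)
```

The same day/month/year split applies to the date filters of the search requests in sections 3.7 to 3.9.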

3.2. GetDocumentContent

The GetDocumentContent request provides access to all the data available for one document: document content (including the list of pages and the hierarchical list of logical sections), and document metadata.

GetDocumentContent Request
Required parameters
a Must be set to "d".
d The identifier of the document being requested. Document identifiers consist of the document's publication code followed by the document date in YYYYMMDD format.

Type: String
Example: DSC19800118
f Must be set to "XML".
Optional parameters
(None)
GetDocumentContent Response
Fields present on failure
Error Contains an error message if the GetDocumentContent request failed. The GetDocumentContent request will fail if the "d" parameter does not specify a valid document identifier.

Type: String
Example: Missing required CGI argument "d".
Fields present on success
Document The "robots-noindex" attribute (if present and set to "true") indicates that this document should not be indexed by bots/web crawlers. A container object, with the following fields:
DocumentContent
An instance of DocumentContent.
DocumentMetadata
An instance of DocumentMetadata.
PublicationMetadata
An instance of PublicationMetadata.
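
Every request reports failure through an Error field, so a client can check for it before reading any other field. The Python sketch below shows one way to do this. The VeridianError class, the function name and the <Response> wrapper element are all assumptions made for this example; the sketch searches for an Error element at any depth so that it does not depend on the name of the real root element.

```python
import xml.etree.ElementTree as ET


class VeridianError(Exception):
    """Raised when a response carries an Error field."""


def parse_response(xml_text):
    """Parse a response, raising VeridianError if an Error field is present."""
    root = ET.fromstring(xml_text)
    error = next(root.iter("Error"), None)
    if error is not None:
        raise VeridianError((error.text or "").strip())
    return root


# Invented sample; <Response> is a placeholder root element.
FAILURE = '<Response><Error>Missing required CGI argument "d".</Error></Response>'

try:
    parse_response(FAILURE)
except VeridianError as exc:
    message = str(exc)
    print("Request failed:", message)
```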

3.3. GetLogicalSectionContent

The GetLogicalSectionContent request provides access to all the data available for one logical section: logical section content (both text and images), and logical section metadata.

GetLogicalSectionContent Request
Required parameters
a Must be set to "d".
d The identifier of the logical section being requested. Logical section identifiers consist of the document identifier followed by ".2" and then the hierarchical position of the logical section within the document.

Type: String
Example: DSC19800118.2.4
f Must be set to "XML".
Optional parameters
hl A comma-separated list of phrases/terms to locate in the logical section images/text returned.

Type: String
Example: the daily,cross
GetLogicalSectionContent Response
Fields present on failure
Error Contains an error message if the GetLogicalSectionContent request failed. The GetLogicalSectionContent request will fail if the "d" parameter does not specify a valid logical section identifier.

Type: String
Example: Missing required CGI argument "d".
Fields present on success
LogicalSection The "robots-noindex" attribute (if present and set to "true") indicates that this logical section should not be indexed by bots/web crawlers. A container object, with the following fields:
LogicalSectionContent
An instance of LogicalSectionContent.
LogicalSectionMetadata
An instance of LogicalSectionMetadata.
DocumentMetadata
An instance of DocumentMetadata.
PublicationMetadata
An instance of PublicationMetadata.
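
The "hl" parameter takes several terms in a single value, which then needs URL encoding like any other parameter. The Python sketch below builds a GetLogicalSectionContent query string with highlighting; the function name is hypothetical, and rejecting terms that contain a comma is a design choice of this sketch, since this document does not describe an escaping mechanism.

```python
from urllib.parse import urlencode


def section_content_query(section_id, highlight_terms=()):
    """Build the query string for a GetLogicalSectionContent request."""
    params = {"a": "d", "d": section_id, "f": "XML"}
    if highlight_terms:
        for term in highlight_terms:
            if "," in term:
                raise ValueError("Highlight terms must not contain commas")
        params["hl"] = ",".join(highlight_terms)
    return urlencode(params)


query = section_content_query("DSC19800118.2.4", ["the daily", "cross"])
print(query)
```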

3.4. GetPageContent

The GetPageContent request provides access to all the data available for one page: page content (image and PDF), and page metadata.

GetPageContent Request
Required parameters
a Must be set to "d".
d The identifier of the page being requested. Page identifiers consist of the document identifier followed by ".1" and then the page's position within the document.

Type: String
Example: DSC19800118.1.4
f Must be set to "XML".
Optional parameters
hl A comma-separated list of phrases/terms to locate in the page image/text returned.

Type: String
Example: the daily,cross
GetPageContent Response
Fields present on failure
Error Contains an error message if the GetPageContent request failed. The GetPageContent request will fail if the "d" parameter does not specify a valid page identifier.

Type: String
Example: Missing required CGI argument "d".
Fields present on success
Page The "robots-noindex" attribute (if present and set to "true") indicates that this page should not be indexed by bots/web crawlers. A container object, with the following fields:
PageContent
An instance of PageContent.
PageMetadata
An instance of PageMetadata.
DocumentMetadata
An instance of DocumentMetadata.
PublicationMetadata
An instance of PublicationMetadata.

3.5. GetPublications

The GetPublications request returns the list of publications in the collection.

GetPublications Request
Required parameters
a Must be set to "cl".
cl Must be set to "CL1".
f Must be set to "XML".
Optional parameters
(None)
GetPublications Response
Fields present on failure
Error Contains an error message if the GetPublications request failed. The GetPublications request should not fail.
Fields present on success
ArrayOfPublication A container object, with zero or more occurrences (one for each publication in the collection) of the following field:
Publication
A container object, with the following field:
PublicationMetadata
An instance of PublicationMetadata.

3.6. GetPublicationDocuments

The GetPublicationDocuments request returns the list of documents in a publication.

GetPublicationDocuments Request
Required parameters
a Must be set to "cl".
cl Must be set to "CL1".
f Must be set to "XML".
sp The identifier of the publication being requested.

Type: String
Example: DSC
Optional parameters
(None)
GetPublicationDocuments Response
Fields present on failure
Error Contains an error message if the GetPublicationDocuments request failed. The GetPublicationDocuments request will fail if the "sp" parameter does not specify a valid publication identifier.

Type: String
Example: Invalid value "The New Zealand Times" for CGI argument "sp".
Fields present on success
ArrayOfDocument A container object, with zero or more occurrences (one for each document in the publication) of the following field:
Document
The "robots-noindex" attribute (if present and set to "true") indicates that this document should not be indexed by bots/web crawlers. A container object, with the following field:
DocumentMetadata
An instance of DocumentMetadata.

3.7. SearchDocuments

The SearchDocuments request allows the thousands of documents in the collection to be listed, and optionally filtered in a number of different ways. It returns a list of matching documents.

SearchDocuments Request
Required parameters
a Must be set to "q".
f Must be set to "XML".
leq Must be set to "Document".
Optional parameters
dafdq Filter by date: day component of range start. Valid values are 1-31.

Type: Number
Example: 1
dafmq Filter by date: month component of range start. Valid values are 1-12.

Type: Number
Example: 1
dafyq Filter by date: year component of range start. Valid values are 1-9999.

Type: Number
Example: 1980
datdq Filter by date: day component of range end. Valid values are 1-31.

Type: Number
Example: 31
datmq Filter by date: month component of range end. Valid values are 1-12.

Type: Number
Example: 12
datyq Filter by date: year component of range end. Valid values are 1-9999.

Type: Number
Example: 1989
deq Filter by decade.

Type: String
Example: 198
o The number of search results to return. Default value is 20 if not specified. Maximum value is 100. Used in conjunction with the "r" parameter, this allows the results for a search to be requested in blocks. This is particularly important when there are many matches for a search.

Type: Number
Example: 20
puq Filter by publication ID.

Type: String
Example: DSC
r The number of the first search result to return. Default value is 1 if not specified. Used in conjunction with the "o" parameter, this allows the results for a search to be requested in blocks. This is particularly important when there are many matches for a search.

Type: Number
Example: 21
sf Allows the search results to be sorted by a document field; valid values are "byDA" (document date), "byPU" (publication ID) and "byTY" (document type). If this parameter is not specified, the search results will be sorted by relevance. To sort in reverse, add ".rev" to the end of the value (e.g. "byDA.rev" to reverse sort by document date).

Type: String
Example: byDA
tyq Filter by document type.

Type: String
Example: BOOK
yeq Filter by year.

Type: String
Example: 1980
SearchDocuments Response
Fields present on failure
Error Contains an error message if the SearchDocuments request failed. The SearchDocuments request will fail if invalid values are specified for the "o" or "sf" parameters. If invalid values are specified for the filter parameters then no search results will be returned.

Type: String
Example: Invalid value "Date" for CGI argument "sf".
Fields present on success
TotalNumberOfSearchResults The number of documents that matched the search criteria. May be 0.

Type: Number
Example: 94851
FirstSearchResultNumberReturned The number of the first search result in the results returned.

Type: Number
Example: 1
LastSearchResultNumberReturned The number of the last search result in the results returned.

Type: Number
Example: 20
ArrayOfDocument A container object, with zero or more occurrences (one for each document returned) of the following field:
Document
The "robots-noindex" attribute (if present and set to "true") indicates that this document should not be indexed by bots/web crawlers. A container object, with the following fields:
SearchResultNumber
The position of the search result amongst all the matches to the query (not just those returned).

Type: Number
Example: 5
DocumentMetadata
An instance of DocumentMetadata.
PublicationMetadata
An instance of PublicationMetadata.
ArrayOfSearchFacet A container object, with zero or more occurrences (one for each search facet returned) of the following field:
SearchFacet
A container object, with the following fields:
SearchFacetField
The field display name and field name (in parentheses) of the search facet.

Type: String
Example: PublicationCode (PU)
SearchFacetValue
The value of the search facet.

Type: String
Example: DSC
SearchFacetCount
The frequency of this search facet within the search results.

Type: Number
Example: 494
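
The "o" and "r" parameters together allow a large result set to be fetched in blocks. As a sketch of the arithmetic, the Python generator below yields the "r" and "o" values for each block once TotalNumberOfSearchResults is known from a first response. The function name is hypothetical; the limits (first result numbered 1, at most 100 results per request) are those stated above.

```python
def result_blocks(total, block_size=20):
    """Yield the 'r' and 'o' parameter values needed to fetch all results."""
    if not 1 <= block_size <= 100:
        raise ValueError("block_size must be between 1 and 100")
    # Result numbering starts at 1, so the blocks start at 1, 1 + o, 1 + 2o, ...
    for first in range(1, total + 1, block_size):
        yield {"r": first, "o": block_size}


blocks = list(result_blocks(45, block_size=20))
print(blocks)
```

For 45 results in blocks of 20, the last block starts at result 41 and would return only five results.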

3.8. SearchLogicalSections

The SearchLogicalSections request allows the thousands/millions of logical sections in the collection to be searched by keyword, and optionally filtered in a number of different ways. It returns a list of matching logical sections.

SearchLogicalSections Request
Required parameters
a Must be set to "q".
f Must be set to "XML".
leq Must be set to "Logical".
txq One or more keywords or phrases to search for in the logical section text. Phrases are specified using double quote characters at the start and end of the phrase.

Type: String
Example: "hamilton farming"
Optional parameters
dafdq Filter by date: day component of range start. Valid values are 1-31.

Type: Number
Example: 1
dafmq Filter by date: month component of range start. Valid values are 1-12.

Type: Number
Example: 1
dafyq Filter by date: year component of range start. Valid values are 1-9999.

Type: Number
Example: 1980
datdq Filter by date: day component of range end. Valid values are 1-31.

Type: Number
Example: 31
datmq Filter by date: month component of range end. Valid values are 1-12.

Type: Number
Example: 12
datyq Filter by date: year component of range end. Valid values are 1-9999.

Type: Number
Example: 1989
deq Filter by decade.

Type: String
Example: 198
o The number of search results to return. Default value is 20 if not specified. Maximum value is 100. Used in conjunction with the "r" parameter, this allows the results for a search to be requested in blocks. This is particularly important when there are many matches for a search.

Type: Number
Example: 20
puq Filter by publication ID.

Type: String
Example: DSC
r The number of the first search result to return. Default value is 1 if not specified. Used in conjunction with the "o" parameter, this allows the results for a search to be requested in blocks. This is particularly important when there are many matches for a search.

Type: Number
Example: 21
sf Allows the search results to be sorted by a logical section field; valid values are "byDA" (document date), "byPU" (publication ID), "byTI" (section title) and "byTY" (section type). If this parameter is not specified, the search results will be sorted by relevance. To sort in reverse, add ".rev" to the end of the value (e.g. "byDA.rev" to reverse sort by document date).

Type: String
Example: byDA
ssnip Specifies the type of search result "snippet" to return. Valid values are "img" for image snippets, "txt" for text snippets, "auto" for an image snippet if possible (METS/ALTO data) or a text snippet otherwise, and "" for no search snippets. Default value is "txt" if not specified.

Type: String
Example: txt
t Set to "1" to perform an "any" search (one or more of the query keywords/phrases specified in "txq" must match), or "0" to perform an "all" search (all of the query keywords/phrases specified in "txq" must match). Default value is "0" if not specified.

Type: Number
Example: 1
tyq Filter by section type.

Type: String
Example: ADVERTISEMENT
wofq Filter by section text word count: range start.

Type: String
Example: 50
wotq Filter by section text word count: range end.

Type: String
Example: 100
yeq Filter by year.

Type: String
Example: 1980
SearchLogicalSections Response
Fields present on failure
Error Contains an error message if the SearchLogicalSections request failed. The SearchLogicalSections request will fail if the "txq" parameter is empty, or invalid values are specified for the "o", "sf" or "ssnip" parameters. If invalid values are specified for the filter parameters then no search results will be returned.

Type: String
Example: Missing required CGI argument "txq".
Fields present on success
TotalNumberOfSearchResults The number of logical sections that matched the search criteria. May be 0.

Type: Number
Example: 94851
FirstSearchResultNumberReturned The number of the first search result in the results returned.

Type: Number
Example: 1
LastSearchResultNumberReturned The number of the last search result in the results returned.

Type: Number
Example: 20
ArrayOfLogicalSection A container object, with zero or more occurrences (one for each logical section returned) of the following field:
LogicalSection
The "robots-noindex" attribute (if present and set to "true") indicates that this logical section should not be indexed by bots/web crawlers. A container object, with the following fields:
SearchResultNumber
The position of the search result amongst all the matches to the query (not just those returned).

Type: Number
Example: 5
SearchResultScore
The relevancy score for the search result, higher scores being better matches (more relevant).

Type: Number
Example: 3708
SearchResultSnippetHTML
A small preview of the search result, showing the first matching term in the section.

Type: HTML
Example: <div class="txtsearchsnippet">... . 80 0 I up ili urn .. ..8 6 <b class="highlightcolor">Hamilton</b> .. 85 0 OT'Jiv .. .. 7 C Cambridge ...</div>
LogicalSectionMetadata
An instance of LogicalSectionMetadata.
DocumentMetadata
An instance of DocumentMetadata.
PublicationMetadata
An instance of PublicationMetadata.
ArrayOfSearchFacet A container object, with zero or more occurrences (one for each search facet returned) of the following field:
SearchFacet
A container object, with the following fields:
SearchFacetField
The field display name and field name (in parentheses) of the search facet.

Type: String
Example: Type (TY)
SearchFacetValue
The value of the search facet.

Type: String
Example: advertisement
SearchFacetCount
The frequency of this search facet within the search results.

Type: Number
Example: 494
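
To illustrate how the required and optional parameters combine, the Python sketch below builds a SearchLogicalSections query string. The function names are hypothetical. Joining keywords and phrases with spaces inside "txq" is an assumption of this sketch, since this document specifies only that phrases are enclosed in double quotes.

```python
from urllib.parse import urlencode


def build_txq(terms):
    """Wrap multi-word terms in double quotes and join all terms with spaces."""
    return " ".join(f'"{term}"' if " " in term else term for term in terms)


def search_sections_query(terms, match_any=False, **filters):
    """Build the query string for a SearchLogicalSections request."""
    params = {
        "a": "q",
        "f": "XML",
        "leq": "Logical",
        "txq": build_txq(terms),
        "t": "1" if match_any else "0",  # 1 = "any" search, 0 = "all" search
    }
    params.update(filters)  # e.g. puq, tyq, yeq, sf, ssnip
    return urlencode(params)


query = search_sections_query(["hamilton farming", "cambridge"], match_any=True, puq="DSC")
print(query)
```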

3.9. SearchPages

The SearchPages request allows the thousands/millions of pages in the collection to be searched by keyword, and optionally filtered in a number of different ways. It returns a list of matching pages.

SearchPages Request
Required parameters
a Must be set to "q".
f Must be set to "XML".
leq Must be set to "Page".
txq One or more keywords or phrases to search for in the page text. Phrases are specified using double quote characters at the start and end of the phrase.

Type: String
Example: "hamilton farming"
Optional parameters
dafdq Filter by date: day component of range start. Valid values are 1-31.

Type: Number
Example: 1
dafmq Filter by date: month component of range start. Valid values are 1-12.

Type: Number
Example: 1
dafyq Filter by date: year component of range start. Valid values are 1-9999.

Type: Number
Example: 1980
datdq Filter by date: day component of range end. Valid values are 1-31.

Type: Number
Example: 31
datmq Filter by date: month component of range end. Valid values are 1-12.

Type: Number
Example: 12
datyq Filter by date: year component of range end. Valid values are 1-9999.

Type: Number
Example: 1989
deq Filter by decade.

Type: String
Example: 198
o The number of search results to return. Default value is 20 if not specified. Maximum value is 100. Used in conjunction with the "r" parameter, this allows the results for a search to be requested in blocks. This is particularly important when there are many matches for a search.

Type: Number
Example: 20
puq Filter by publication ID.

Type: String
Example: DSC
r The number of the first search result to return. Default value is 1 if not specified. Used in conjunction with the "o" parameter, this allows the results for a search to be requested in blocks. This is particularly important when there are many matches for a search.

Type: Number
Example: 21
sf Allows the search results to be sorted by a page field; valid values are "byDA" (document date), "byPU" (publication ID) and "byTI" (page title). If this parameter is not specified, the search results will be sorted by relevance. To sort in reverse, add ".rev" to the end of the value (e.g. "byDA.rev" to reverse sort by document date).

Type: String
Example: byDA
ssnip Specifies the type of search result "snippet" to return. Valid values are "img" for image snippets, "txt" for text snippets, "auto" for an image snippet if possible (METS/ALTO data) or a text snippet otherwise, and "" for no search snippets. Default value is "txt" if not specified.

Type: String
Example: txt
t Set to "1" to perform an "any" search (one or more of the query keywords/phrases specified in "txq" must match), or "0" to perform an "all" search (all of the query keywords/phrases specified in "txq" must match). Default value is "0" if not specified.

Type: Number
Example: 1
wofq Filter by page text word count: range start.

Type: String
Example: 50
wotq Filter by page text word count: range end.

Type: String
Example: 100
yeq Filter by year.

Type: String
Example: 1980
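
As an illustration of how the parameters above combine, the following Python sketch builds a SearchPages URL. It is a sketch under stated assumptions rather than a reference client: BASE_URL is a hypothetical address (use the CGI URL of your own installation), the function name and its argument names are invented for this example, and only a subset of the optional parameters is covered.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the CGI URL of your own Veridian installation.
BASE_URL = "https://example.org/cgi-bin/veridian"

# Sort fields documented for the "sf" parameter (".rev" may be appended).
VALID_SORTS = {"byDA", "byPU", "byTI"}


def build_search_pages_url(txq, year_from=None, year_to=None, puq=None,
                           sf=None, r=1, o=20, any_match=False):
    """Return a SearchPages URL built from the documented parameters."""
    if not txq:
        raise ValueError('"txq" must not be empty')
    if not 1 <= o <= 100:
        raise ValueError('"o" must be between 1 and 100')
    if sf is not None:
        base_sort = sf[:-4] if sf.endswith(".rev") else sf
        if base_sort not in VALID_SORTS:
            raise ValueError('invalid "sf" value: %r' % sf)

    # Required parameters first, then whichever optional ones were supplied.
    params = [("a", "q"), ("f", "XML"), ("leq", "Page"), ("txq", txq)]
    if year_from is not None:
        params.append(("dafyq", year_from))
    if year_to is not None:
        params.append(("datyq", year_to))
    if puq is not None:
        params.append(("puq", puq))
    if sf is not None:
        params.append(("sf", sf))
    params += [("r", r), ("o", o)]
    if any_match:
        params.append(("t", 1))

    # urlencode escapes the double quotes that mark a phrase in "txq".
    return BASE_URL + "?" + urlencode(params)


url = build_search_pages_url('"hamilton farming"', year_from=1980,
                             year_to=1989, puq="DSC", sf="byDA.rev")
```

Validating "o" and "sf" before sending the request mirrors the failure conditions listed for the Error field, so obviously invalid requests never reach the server.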
SearchPages Response
Fields present on failure
Error Contains an error message if the SearchPages request failed. The SearchPages request will fail if the "txq" parameter is empty, or invalid values are specified for the "o", "sf" or "ssnip" parameters. If invalid values are specified for the filter parameters then no search results will be returned.

Type: String
Example: Missing required CGI argument "txq".
Fields present on success
TotalNumberOfSearchResults The number of pages that matched the search criteria. May be 0.

Type: Number
Example: 94851
FirstSearchResultNumberReturned The number of the first search result in the results returned.

Type: Number
Example: 1
LastSearchResultNumberReturned The number of the last search result in the results returned.

Type: Number
Example: 20
ArrayOfPage A container object, with zero or more occurrences (one for each page returned) of the following field:
Page
A container object. The "robots-noindex" attribute (if present and set to "true") indicates that this page should not be indexed by bots/web crawlers. The object has the following fields:
SearchResultNumber
The position of the search result amongst all the matches to the query (not just those returned).

Type: Number
Example: 5
SearchResultScore
The relevancy score for the search result, higher scores being better matches (more relevant).

Type: Number
Example: 3708
SearchResultSnippetHTML
A small preview of the search result, showing the first matching term in the page.

Type: HTML
Example: <div class="txtsearchsnippet">... . 80 0 I up ili urn .. ..8 6 <b class="highlightcolor">Hamilton</b> .. 85 0 OT'Jiv .. .. 7 C Cambridge ...</div>
PageMetadata
An instance of PageMetadata.
DocumentMetadata
An instance of DocumentMetadata.
PublicationMetadata
An instance of PublicationMetadata.
ArrayOfSearchFacet A container object, with zero or more occurrences (one for each search facet returned) of the following field:
SearchFacet
A container object, with the following fields:
SearchFacetField
The field display name and field name (in parentheses) of the search facet.

Type: String
Example: PublicationCode (PU)
SearchFacetValue
The value of the search facet.

Type: String
Example: DSC
SearchFacetCount
The frequency of this search facet within the search results.

Type: Number
Example: 494
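
The response fields above are enough to read one block of results and work out which "r" value requests the next block. The Python sketch below assumes a hand-written sample document: the root element name SearchPagesResponse and the exact nesting are assumptions made for illustration, and only the field names and the "robots-noindex" attribute come from this document. The function name and dictionary keys are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample. The root element name is an assumption; the field
# names and the "robots-noindex" attribute are taken from the list above.
SAMPLE = """<SearchPagesResponse>
  <TotalNumberOfSearchResults>45</TotalNumberOfSearchResults>
  <FirstSearchResultNumberReturned>1</FirstSearchResultNumberReturned>
  <LastSearchResultNumberReturned>20</LastSearchResultNumberReturned>
  <ArrayOfPage>
    <Page robots-noindex="true">
      <SearchResultNumber>1</SearchResultNumber>
      <SearchResultScore>3708</SearchResultScore>
    </Page>
    <Page>
      <SearchResultNumber>2</SearchResultNumber>
      <SearchResultScore>2950</SearchResultScore>
    </Page>
  </ArrayOfPage>
</SearchPagesResponse>"""


def summarise_results(xml_text):
    """Parse a SearchPages response and compute the next "r" value, if any."""
    root = ET.fromstring(xml_text)

    # On failure the response carries an Error field instead of results.
    error = root.findtext(".//Error")
    if error is not None:
        raise RuntimeError(error)

    total = int(root.findtext(".//TotalNumberOfSearchResults", default="0"))
    last = int(root.findtext(".//LastSearchResultNumberReturned", default="0"))

    pages = []
    for page in root.iter("Page"):
        pages.append({
            "number": int(page.findtext("SearchResultNumber", default="0")),
            "score": float(page.findtext("SearchResultScore", default="0")),
            "noindex": page.get("robots-noindex") == "true",
        })

    # More results remain while the last one returned is below the total.
    next_r = last + 1 if last < total else None
    return {"total": total, "pages": pages, "next_r": next_r}


result = summarise_results(SAMPLE)
```

Passing next_r as the "r" parameter of the following request (with the same "o" value) continues through the results until next_r is None.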

4. Text Correction Requests

It is possible to correct the text of the pages and logical sections in the collection through the XML API. There are some limitations to this functionality, however:

This functionality is not enabled by default, and it requires some setup (please contact DL Consulting for assistance).

This section describes the two available requests, with descriptions of their request parameters and response values.

4.1. GetSectionBlocks

The GetSectionBlocks request returns information about the blocks and lines that make up a page or logical section. It returns the list of blocks in the specified page/logical section, along with the text of each line in each block.

GetSectionBlocks Request
Required parameters
a Must be set to "tc".
d The identifier of the logical section or page being requested. Logical section identifiers consist of the document identifier followed by ".2" and then the hierarchical position of the logical section within the document. Page identifiers consist of the document identifier followed by ".1" and then the page's position within the document.

Type: String
Example: DSC19800118.2.4
f Must be set to "XML".
Optional parameters
(None)
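
The identifier rules for the "d" parameter can be expressed as two small helpers. This is an illustrative Python sketch: the function names are hypothetical, and the "." separator between components is inferred from the example identifiers in this document (such as BOOK1856.1.1 and DSC19800118.2.4).

```python
def page_id(document_id, page_position):
    """Document identifier, then ".1", then the page's position."""
    return "%s.1.%d" % (document_id, page_position)


def logical_section_id(document_id, *position):
    """Document identifier, then ".2", then the hierarchical position.

    The position is given as one integer per level, e.g. (4, 1) for the
    first subsection of the fourth top-level section.
    """
    if not position:
        raise ValueError("at least one position component is required")
    return ".".join([document_id, "2"] + [str(p) for p in position])


first_page = page_id("BOOK1856", 1)
section = logical_section_id("DSC19800118", 4)
subsection = logical_section_id("DSC18740625", 4, 1)
```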
GetSectionBlocks Response
Fields present on failure
Error Contains an error message if the GetSectionBlocks request failed. The GetSectionBlocks request will fail if the "d" parameter does not specify a valid value, the document is not editable, or if the text correction functionality has not been configured correctly.

Type: String
Example: Invalid value "BOOK1856.1.1000" for CGI argument "d".
Fields present on success
ArrayOfBlock A container object, with zero or more occurrences (one for each block in the logical section/page) of the following field:
Block
A container object, with the following fields:
PageID
The identifier of the page containing the block.

Type: String
Example: BOOK1856.1.1
BlockID
The identifier of the block on the page.

Type: String
Example: P1_TB00001
BlockImageHTML
A piece of HTML for displaying the block (cut out from the page image).

Type: HTML
Example: <div class="imagecontainer" style="width: 600px; height: 221px"><img alt="Block image" src="/cgi-bin/imageserver.pl?oid=BOOK1856.1.1&amp;colours=all&amp;ext=jpg&amp;area=1&amp;width=600" style="width: 600px; height: 221px" title="Block image" /></div>
ArrayOfLine
A container object, with zero or more occurrences (one for each line in the block) of the following field:
Line
A container object, with the following fields:
LineID
The identifier of the line on the page.

Type: String
Example: P1_TL00001
LineText
The text of the line.

Type: String
Example: ESTABLISIIKIJ 13413 JNo. 5/255. VOL. AAA.
BlockCompletelyCorrect
Whether the block has been marked as 100% complete and correct.

Type: Boolean
Example: true
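
A text corrector needs the identifiers and the current text of every line, in their original order. The Python sketch below shows one way to collect them. The sample document is hand-written: the root element name GetSectionBlocksResponse and the placement of BlockCompletelyCorrect directly under Block are assumptions, and only the field names come from the list above. The function name and dictionary keys are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; root element name and nesting are assumptions.
SAMPLE = """<GetSectionBlocksResponse>
  <ArrayOfBlock>
    <Block>
      <PageID>BOOK1856.1.1</PageID>
      <BlockID>P1_TB00001</BlockID>
      <ArrayOfLine>
        <Line><LineID>P1_TL00001</LineID><LineText>GESCHICHT</LineText></Line>
        <Line><LineID>P1_TL00002</LineID><LineText>DES</LineText></Line>
        <Line><LineID>P1_TL00003</LineID><LineText>ABENDLANDES</LineText></Line>
      </ArrayOfLine>
      <BlockCompletelyCorrect>false</BlockCompletelyCorrect>
    </Block>
  </ArrayOfBlock>
</GetSectionBlocksResponse>"""


def parse_blocks(xml_text):
    """Return the blocks in a GetSectionBlocks response, lines kept in order."""
    root = ET.fromstring(xml_text)
    error = root.findtext(".//Error")
    if error is not None:
        raise RuntimeError(error)

    blocks = []
    for block in root.iter("Block"):
        # Each line becomes a (LineID, LineText) pair, in document order.
        lines = [
            (line.findtext("LineID"), line.findtext("LineText", default=""))
            for line in block.iter("Line")
        ]
        blocks.append({
            "page_id": block.findtext("PageID"),
            "block_id": block.findtext("BlockID"),
            "lines": lines,
            "complete": block.findtext("BlockCompletelyCorrect") == "true",
        })
    return blocks


blocks = parse_blocks(SAMPLE)
```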

4.2. SubmitBlockText

The SubmitBlockText request is used to submit text corrections for one block back to the index. It returns a message summarising the result of the submission.

SubmitBlockText Request
Required parameters
a Must be set to "tc".
blockid The identifier of the block being corrected. This comes from the GetSectionBlocks response <BlockID> tag.
d The identifier of the logical section or page being corrected. Logical section identifiers consist of the document identifier followed by ".2" and then the hierarchical position of the logical section within the document. Page identifiers consist of the document identifier followed by ".1" and then the page's position within the document.

Type: String
Example: DSC19800118.2.4
f Must be set to "XML".
lid [Multiple values] The identifiers of the lines in the block. These come from the GetSectionBlocks response <LineID> tags. It's important that the order is unchanged.
ntv [Multiple values] The new (corrected) text values for the lines in the block. These come from the text corrector, being corrected versions of the GetSectionBlocks response <LineText> tags. It's important that the order is unchanged.
otv [Multiple values] The old (pre-correction) text values for the lines in the block. These come from the GetSectionBlocks response <LineText> tags. It's important that the order is unchanged.
pageoid The identifier of the page containing the block. This comes from the GetSectionBlocks response <PageID> tag.
submit Must be set to "1".
Optional parameters
blockcc Set to "true" to mark the block as 100% complete and correct.
SubmitBlockText Response
Fields present on failure
Error Contains an error message if the SubmitBlockText request failed. The SubmitBlockText request will fail if the "d" parameter does not specify a valid value, the document is not editable, the "otv" values are incorrect or the "ntv" values are unchanged, or if the text correction functionality has not been configured correctly.

Type: String
Example: No lines were changed for (BOOK1856.1.1,BOOK1856.2.1.1)::P1_TB00001
Fields present on success
SubmissionResult A message summarising how many lines in the block were updated.

Type: String
Example: All 2 changed lines were updated successfully for (BOOK1856.1.1,BOOK1856.2.1.1)::P1_TB00001 (2 lines were unchanged)
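
Because "lid", "otv" and "ntv" are multi-valued and their order matters, it helps to build the query from an ordered list of pairs rather than a dictionary. The following Python sketch does this. The function name and its arguments are hypothetical, and it only builds the query string; how the request is sent is left to the caller.

```python
from urllib.parse import urlencode


def build_submit_query(section_id, page_id, block_id, lines, corrections,
                       mark_complete=False):
    """Build a SubmitBlockText query string.

    lines:       (LineID, LineText) pairs in their original order.
    corrections: mapping of LineID to corrected text; other lines are
                 sent unchanged.
    """
    params = [("a", "tc"), ("d", section_id), ("pageoid", page_id),
              ("blockid", block_id)]
    changed = False
    for line_id, old_text in lines:
        new_text = corrections.get(line_id, old_text)
        changed = changed or new_text != old_text
        # One lid/otv/ntv triple per line, preserving the line order.
        params += [("lid", line_id), ("otv", old_text), ("ntv", new_text)]

    # Unchanged "ntv" values are a documented failure cause. Whether
    # "blockcc" alone is accepted is not stated, so this sketch is conservative.
    if not changed:
        raise ValueError("no lines were changed")

    if mark_complete:
        params.append(("blockcc", "true"))
    params += [("submit", "1"), ("f", "XML")]
    return urlencode(params)


query = build_submit_query(
    "BOOK1856.1.1", "BOOK1856.1.1", "P1_TB00001",
    [("P1_TL00001", "GESCHICHT"), ("P1_TL00002", "DES"),
     ("P1_TL00003", "ABENDLANDES")],
    {"P1_TL00001": "GESCHICHTE"},
)
```

With these inputs the sketch produces the same parameter sequence as the SubmitBlockText example in the appendix.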

A. Appendix: Example Requests and Responses

Request
GetDates a=cl&cl=CL2&f=XML
Request list of all the dates with documents in the collection.
GetDocumentContent a=d&d=DSC18740625&f=XML
Request contents of the Daily Southern Cross June 25 1874 issue.
GetLogicalSectionContent a=d&d=DSC18740625.2.4.1&hl=waipu&f=XML
Request contents of the Daily Southern Cross June 25 1874 "PORT OF AUCKLAND" article, with occurrences of "waipu" located.
GetPageContent a=d&d=DSC18740625.1.1&f=XML
Request contents of the Daily Southern Cross June 25 1874 first page.
GetPublications a=cl&cl=CL1&f=XML
Request list of all the publications in the collection.
GetPublicationDocuments a=cl&cl=CL1&sp=DSC&f=XML
Request list of all the issues in the Daily Southern Cross.
SearchDocuments a=q&leq=Document&dafyq=1850&datyq=1900&puq=DSC&sf=byDA&f=XML
Request issues in the Daily Southern Cross between 1850 and 1900, sorted by date.
SearchLogicalSections a=q&leq=Logical&txq=hamilton&dafyq=1850&datyq=1900&puq=DSC&sf=byTI&f=XML
Request articles in the Daily Southern Cross between 1850 and 1900 containing "hamilton", sorted by article title.
SearchPages a=q&leq=Page&txq=hamilton&dafyq=1850&datyq=1900&puq=DSC&sf=byDA&f=XML
Request pages in the Daily Southern Cross between 1850 and 1900 containing "hamilton", sorted by date.
Text Correction Request
GetSectionBlocks a=tc&d=BOOK1856.1.1&f=XML
Request list of all the text blocks on the first page of the BOOK1856 document.
SubmitBlockText a=tc&d=BOOK1856.1.1&pageoid=BOOK1856.1.1&blockid=P1_TB00001&lid=P1_TL00001&otv=GESCHICHT&ntv=GESCHICHTE&lid=P1_TL00002&otv=DES&ntv=DES&lid=P1_TL00003&otv=ABENDLANDES&ntv=ABENDLANDES&submit=1&f=XML
Correct "GESCHICHT" to "GESCHICHTE" in the first line of the first block on the first page of the BOOK1856 document.

© Copyright DL Consulting Ltd., 2008-2017