API

The Sickle Client

class sickle.app.Sickle(endpoint, http_method='GET', protocol_version='2.0', iterator=<class 'sickle.iterator.OAIItemIterator'>, max_retries=5, timeout=None, class_mapping=None, auth=None)

Client for harvesting OAI interfaces.

Use it like this:

>>> sickle = Sickle('http://elis.da.ulcc.ac.uk/cgi/oai2')
>>> records = sickle.ListRecords(metadataPrefix='oai_dc')
>>> records.next()
<Record oai:eprints.rclis.org:3780>
Parameters:
  • endpoint (str) – The endpoint of the OAI interface.
  • http_method (str) – Method used for requests (GET or POST, default: GET).
  • protocol_version (str) – The OAI protocol version.
  • iterator – The type of the returned iterator (default: sickle.iterator.OAIItemIterator)
  • max_retries (int) – Number of retries if HTTP request fails.
  • timeout (int) – Timeout for HTTP requests.
  • class_mapping (dict) – A dictionary that maps OAI verbs to classes representing OAI items. If not provided, sickle.app.DEFAULT_CLASS_MAPPING will be used.
  • auth (tuple) – An optional tuple ('username', 'password') for accessing protected OAI interfaces.
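As a rough sketch of how these constructor arguments fit together, the helpers below assemble keyword arguments for a password-protected endpoint. The names `client_options` and `build_client`, the 30-second timeout, and any credentials passed in are illustrative assumptions, not part of Sickle.

```python
def client_options(username=None, password=None, timeout=30, max_retries=5):
    """Assemble keyword arguments for the Sickle constructor (hypothetical helper)."""
    options = {
        "http_method": "GET",
        "max_retries": max_retries,
        "timeout": timeout,
    }
    # auth is only passed when both parts of the credential pair are given.
    if username is not None and password is not None:
        options["auth"] = (username, password)
    return options


def build_client(endpoint, **kwargs):
    """Create a client; sickle is imported lazily so this module loads without it."""
    from sickle import Sickle
    return Sickle(endpoint, **client_options(**kwargs))
```

Keeping the option assembly separate from the constructor call makes the configuration easy to inspect without contacting a server.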
last_response

Contains the last response that has been received.

GetRecord(**kwargs)

Issue a GetRecord request.
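A GetRecord request in OAI-PMH takes the arguments identifier and metadataPrefix, which are passed through as keyword arguments. The wrapper below is a minimal sketch; `fetch_record` is a hypothetical helper name, and `client` is assumed to be a sickle.app.Sickle instance.

```python
def fetch_record(client, identifier, metadata_prefix="oai_dc"):
    """Fetch a single record by its OAI identifier (hypothetical helper)."""
    # identifier and metadataPrefix are the arguments OAI-PMH requires for GetRecord.
    return client.GetRecord(identifier=identifier, metadataPrefix=metadata_prefix)
```

With the client from the example above, `fetch_record(sickle, 'oai:eprints.rclis.org:3780')` would request that record in Dublin Core.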

Identify()

Issue an Identify request.

Return type: sickle.models.Identify
ListIdentifiers(ignore_deleted=False, **kwargs)

Issue a ListIdentifiers request.

Parameters: ignore_deleted – If set to True, the resulting iterator will skip records flagged as deleted.
Return type: sickle.iterator.BaseOAIIterator
ListMetadataFormats(**kwargs)

Issue a ListMetadataFormats request.

Return type: sickle.iterator.BaseOAIIterator
ListRecords(ignore_deleted=False, **kwargs)

Issue a ListRecords request.

Parameters: ignore_deleted – If set to True, the resulting iterator will skip records flagged as deleted.
Return type: sickle.iterator.BaseOAIIterator
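The remaining keyword arguments are forwarded as OAI arguments, which allows selective harvesting by date or set. Because `from` is a reserved word in Python, it has to be supplied through a dictionary. The sketch below shows one way to do this; `selective_params` and `harvest_since` are hypothetical helper names, and `client` is assumed to be a sickle.app.Sickle instance.

```python
def selective_params(metadata_prefix, from_date=None, until=None, set_spec=None):
    """Build the OAI arguments for a selective ListRecords request."""
    params = {"metadataPrefix": metadata_prefix}
    if from_date is not None:
        # 'from' is a Python keyword, so it cannot be written as from=...
        params["from"] = from_date
    if until is not None:
        params["until"] = until
    if set_spec is not None:
        params["set"] = set_spec
    return params


def harvest_since(client, from_date, metadata_prefix="oai_dc"):
    """Request all non-deleted records changed on or after from_date."""
    params = selective_params(metadata_prefix, from_date=from_date)
    return client.ListRecords(ignore_deleted=True, **params)
```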
ListSets(**kwargs)

Issue a ListSets request.

Return type: sickle.iterator.BaseOAIIterator
harvest(**kwargs)

Make HTTP requests to the OAI server.

Parameters: kwargs – OAI HTTP parameters.
Return type: sickle.response.OAIResponse

Working with OAI Responses

class sickle.response.OAIResponse(http_response, params)

A response from an OAI server.

Provides access to the returned data on different abstraction levels.

Parameters:
  • http_response – The original HTTP response.
  • params (dict) – The OAI parameters for the request.
raw

The server’s response as unicode.

xml

The server’s response as parsed XML.
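To illustrate the two abstraction levels, the sketch below issues a low-level Identify request through harvest() and reads both attributes. `identify_snapshot` is a hypothetical helper, `client` is assumed to be a sickle.app.Sickle instance, and xml is assumed to be the parsed root element of the response.

```python
def identify_snapshot(client):
    """Issue a raw Identify request and summarize both views of the response."""
    response = client.harvest(verb="Identify")
    return {
        # raw: the response body as text
        "characters": len(response.raw),
        # xml: the parsed document; for a real server the root tag carries
        # the OAI-PMH namespace in front of the local name
        "root_tag": response.xml.tag,
    }
```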

Iterating over OAI Items

class sickle.iterator.OAIItemIterator(sickle, params, ignore_deleted=False)

Iterator over OAI records/identifiers/sets transparently aggregated via OAI-PMH.

Can be used to conveniently iterate through the records of a repository.

Parameters:
  • sickle (sickle.app.Sickle) – The Sickle object that issued the first request.
  • params (dict) – The OAI arguments.
  • ignore_deleted (bool) – Flag for whether to ignore deleted records.
sickle

The sickle.app.Sickle instance used for making requests to the server.

verb

The OAI verb used for making requests to the server.

element

The name of the OAI item to iterate on (record, header, set or metadataFormat).

resumption_token

The content of the XML element resumptionToken from the last request.

ignore_deleted

Flag for whether to skip records marked as deleted.

next()

Return the next record/header/set.
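The aggregation described above can be pictured as a loop that follows resumption tokens until the server stops returning one. The following is a standard-library sketch of that control flow only, not Sickle's implementation; `iter_items` and the `fetch_page` callable are hypothetical.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def iter_items(fetch_page, element):
    """Yield every <element> across all pages, following resumption tokens.

    fetch_page(token) is a hypothetical callable that returns one response
    body as a string; token is None for the first request.
    """
    token = None
    while True:
        root = ET.fromstring(fetch_page(token))
        for item in root.iter(OAI_NS + element):
            yield item
        token_element = root.find(".//" + OAI_NS + "resumptionToken")
        # A missing or empty resumptionToken marks the final page.
        if token_element is None or not (token_element.text or "").strip():
            return
        token = token_element.text.strip()
```

The caller sees one continuous stream of items and never handles the token itself, which is the convenience the iterator classes provide.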

Iterating over OAI Responses

class sickle.iterator.OAIResponseIterator(sickle, params, ignore_deleted=False)

Iterator over OAI responses.

next()

Return the next response.
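Iterating over whole responses is useful when the unmodified XML should be archived. The sketch below writes each response's raw attribute to a numbered file; `dump_responses` and the file naming scheme are assumptions made for illustration.

```python
import os


def dump_responses(responses, directory):
    """Write the raw XML of each response to a numbered file; return the paths."""
    os.makedirs(directory, exist_ok=True)
    paths = []
    for number, response in enumerate(responses, start=1):
        path = os.path.join(directory, "response_%04d.xml" % number)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(response.raw)
        paths.append(path)
    return paths
```

With Sickle installed, the responses would come from a client created with `Sickle(endpoint, iterator=OAIResponseIterator)`, for example `dump_responses(client.ListRecords(metadataPrefix='oai_dc'), 'dump')`.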

Classes for OAI Items

The following classes represent OAI-specific items like records, headers, and sets. All items have the attributes raw and xml, which contain their original XML representation as a unicode string and as a parsed XML object, respectively.

Note

Sickle’s automatic mapping of XML to OAI objects only works for Dublin Core encoded record data.

Identify Object

The Identify object is generated from Identify responses and is returned by sickle.app.Sickle.Identify(). It contains general information about the repository.

class sickle.models.Identify(identify_response)

Represents an Identify container.

This object differs from the other entities in that it has to be created from a sickle.response.OAIResponse instead of an XML element.

Parameters: identify_response (sickle.response.OAIResponse) – The response for an Identify request.

Note

As the attributes of this class are auto-generated from the Identify XML elements, some of them may be missing for specific OAI interfaces.

adminEmail

The content of the element adminEmail. Normally the repository’s administrative contact.

baseURL

The content of the element baseURL, which is the URL of the repository’s OAI endpoint.

repositoryName

The content of the element repositoryName, which contains the name of the repository.

deletedRecord

The content of the element deletedRecord, which indicates whether and how the repository keeps track of deleted records.

delimiter

The content of the element delimiter.

description

The content of the element description, which contains a description of the repository.

earliestDatestamp

The content of the element earliestDatestamp, which indicates the datestamp of the oldest record in the repository.

granularity

The content of the element granularity, which indicates the granularity of the used dates.

oai_identifier

The content of the element oai-identifier.

Note

oai-identifier is not a valid identifier in Python, so the attribute is named oai_identifier.

protocolVersion

The content of the element protocolVersion, which indicates the version of the OAI protocol implemented by the repository.

repositoryIdentifier

The content of the element repositoryIdentifier.

sampleIdentifier

The content of the element sampleIdentifier, which usually contains an example of an identifier used by this repository.

scheme

The content of the element scheme.

raw

The original XML as unicode.
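Since some of these attributes may be missing for a given repository, reading them defensively avoids an AttributeError. A minimal sketch, in which `describe_repository` and its default field selection are assumptions:

```python
def describe_repository(
    identify,
    fields=("repositoryName", "baseURL", "adminEmail", "granularity"),
):
    """Collect selected Identify attributes, using None for any that are absent."""
    return {name: getattr(identify, name, None) for name in fields}
```

Here `identify` would be the object returned by sickle.app.Sickle.Identify().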

Record Object

Record objects represent single OAI records.

class sickle.models.Record(record_element, strip_ns=True)

Represents an OAI record.

Parameters:
  • record_element (lxml.etree._Element) – The XML element ‘record’.
  • strip_ns (bool) – Flag for whether to remove the namespaces from the element names.
header

Contains the record header represented as a sickle.models.Header object.

deleted

A boolean flag that indicates whether this record is deleted.

raw

The original XML as unicode.
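When records are harvested without ignore_deleted, the deleted flag can be used to mirror deletions in a local store. The helper below is a sketch; `split_deleted` is a hypothetical name and relies only on the deleted attribute described above.

```python
def split_deleted(records):
    """Separate records into (live, deleted) lists using each record's deleted flag."""
    live, deleted = [], []
    for record in records:
        (deleted if record.deleted else live).append(record)
    return live, deleted
```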

Header Object

Header objects represent OAI headers.

class sickle.models.Header(header_element)

Represents an OAI Header.

Parameters: header_element (lxml.etree._Element) – The XML element ‘header’.
raw

The original XML as unicode.

Set Object

class sickle.models.Set(set_element)

Represents an OAI set.

Parameters: set_element (lxml.etree._Element) – The XML element ‘set’.
setName

The name of the set.

setSpec

The identifier of this set used for querying.

raw

The original XML as unicode.
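A common use of Set objects is to build a lookup from the setSpec used in queries to the human-readable setName. A minimal sketch, assuming `client` is a sickle.app.Sickle instance; `set_catalogue` is a hypothetical helper.

```python
def set_catalogue(client):
    """Map each setSpec to its setName for the sets a repository exposes."""
    return {oai_set.setSpec: oai_set.setName for oai_set in client.ListSets()}
```

A setSpec taken from this mapping can then be passed as the set argument of a ListRecords or ListIdentifiers request.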

MetadataFormat Object

class sickle.models.MetadataFormat(mdf_element)

Represents an OAI MetadataFormat.

Parameters: mdf_element (lxml.etree._Element) – The XML element ‘metadataFormat’.
metadataPrefix

The prefix used to identify this format.

metadataNamespace

The namespace URL for this format.

schema

The URL to the schema file of this format.

raw

The original XML as unicode.
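Before harvesting in a particular format, it can help to check that the repository offers it. The sketch below does this with the metadataPrefix attribute; `supports_prefix` is a hypothetical helper, and `client` is assumed to be a sickle.app.Sickle instance.

```python
def supports_prefix(client, prefix):
    """Return True if the repository lists a metadata format with this prefix."""
    return any(
        fmt.metadataPrefix == prefix for fmt in client.ListMetadataFormats()
    )
```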