LoCloud Hackathon winner!
The hackathon was organised by LoCloud and Europeana in the context of EuropeanaTech 2015.
The Metadata & Object Repository (MoRe) is an easy-to-use and powerful tool for aggregating information and harvesting metadata from multiple sources in multiple schemas. Aggregating across so many sources and schemas often causes problems with the quality of the harvested metadata.
Metadata may pass the standard Europeana XML validity tests and still contain problematic values. For instance:
* a dc:date value could be formatted in the wrong way:
<dc:date>approximately 18th century</dc:date>
This value does not follow an established date format such as ISO 8601.
* an author name could be incomplete according to bibliographic standards.
* a URL may be invalid. E.g.: <ese:isShownAt>http://invalidurl.com/error-url</ese:isShownAt>
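As a minimal sketch of the first case, the snippet below parses a small, well-formed record and flags dc:date values that are not plain calendar dates. The record, the function name find_bad_dates and the simplified pattern (YYYY, YYYY-MM or YYYY-MM-DD only) are assumptions made for illustration; they are not taken from the MoRe Quality source code, and full ISO 8601 allows more forms than this pattern accepts.

```python
import re
import xml.etree.ElementTree as ET

# Dublin Core elements namespace, used by dc:date.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Simplified subset of ISO 8601 calendar dates: YYYY, YYYY-MM or YYYY-MM-DD.
# (Assumption for this sketch; it does not check days per month.)
ISO_DATE = re.compile(r"^\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?$")

# Hypothetical record: well-formed XML, but one date value is free text.
RECORD = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:date>approximately 18th century</dc:date>
  <dc:date>1789-07-14</dc:date>
</record>"""


def find_bad_dates(xml_text):
    """Return the dc:date values that do not match the simplified pattern."""
    root = ET.fromstring(xml_text)
    return [
        element.text
        for element in root.iter("{%s}date" % DC_NS)
        if not ISO_DATE.match((element.text or "").strip())
    ]


bad_dates = find_bad_dates(RECORD)
print(bad_dates)
```

The record parses without error, so an XML validity test alone would not reject it; only the value-level check singles out the free-text date.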
The aim of the MoRe Quality tool is to implement a validation system that catches these errors and produces useful reports for collection administrators.
MoRe Quality communicates with the LoCloud MoRe instance using a user-specific API key, retrieves the ingested metadata and performs evaluations to identify common errors such as:
* Invalid date formats (ISO 8601 standard)
* Invalid hyperlinks
* Invalid language codes (ISO 639 standard)
* Invalid author names
The results are presented to the user in a simple report.
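The evaluate-and-report flow described above could look roughly like the following sketch. Everything in it is hypothetical: the records are hard-coded rather than retrieved from a MoRe instance, the language list is a small illustrative subset of ISO 639-1, and the URL check is purely syntactic. A syntactic check would not catch a well-formed link that points nowhere, so a real hyperlink check would presumably also have to request the URL.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7

# Illustrative subset of ISO 639-1 codes (assumption, not the full standard).
KNOWN_LANGUAGE_CODES = {"en", "el", "fr", "de", "es", "it"}


def check_language(value):
    """True if the value is one of the known two-letter language codes."""
    return value.strip().lower() in KNOWN_LANGUAGE_CODES


def check_url_syntax(value):
    """True if the value looks like an absolute http(s) URL (syntax only)."""
    parts = urlparse(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Hypothetical mapping from metadata field to the check applied to it.
CHECKS = {
    "dc:language": check_language,
    "ese:isShownAt": check_url_syntax,
}


def build_report(records):
    """Return one report line per failed check.

    records is a list of (record_id, {field: value}) pairs.
    """
    lines = []
    for record_id, fields in records:
        for field, value in sorted(fields.items()):
            check = CHECKS.get(field)
            if check is not None and not check(value):
                lines.append("%s: invalid %s value %r" % (record_id, field, value))
    return lines


records = [
    ("rec-1", {"dc:language": "en", "ese:isShownAt": "http://example.org/item/1"}),
    ("rec-2", {"dc:language": "english", "ese:isShownAt": "not a url"}),
]
report = build_report(records)
for line in report:
    print(line)
```

Keeping each check as a function that takes a value and returns True or False makes the report step independent of the individual rules.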
The application is designed so that developers can add extra evaluation rules easily and intuitively by implementing simple functions (plugins).
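One possible way to organise such function plugins is a small registry filled by a decorator, sketched below. The names PLUGINS, plugin and evaluate, as well as the two toy rules, are assumptions made for illustration; the actual plugin mechanism in the MoRe Quality repository may work differently.

```python
# Registry of (field, function) pairs; filled by the decorator below.
PLUGINS = []


def plugin(field):
    """Register the decorated function as an evaluation rule for a field."""
    def register(func):
        PLUGINS.append((field, func))
        return func
    return register


@plugin("dc:creator")
def creator_has_two_name_parts(value):
    """Toy rule: expect at least two name parts, e.g. 'Smith, John'."""
    return len(value.replace(",", " ").split()) >= 2


@plugin("dc:title")
def title_not_empty(value):
    """Toy rule: a title must contain non-whitespace characters."""
    return bool(value.strip())


def evaluate(record):
    """Run every registered rule on a record (a dict of field -> value).

    Returns a list of (field, rule_name) pairs for the rules that failed.
    """
    failures = []
    for field, func in PLUGINS:
        if field in record and not func(record[field]):
            failures.append((field, func.__name__))
    return failures


print(evaluate({"dc:creator": "Smith", "dc:title": "A view of Athens"}))
```

With this layout, adding a rule means writing one more decorated function; the evaluation loop itself does not change.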
MoRe Quality is implemented using Linux and Python 2.7.
Some common Python tools and modules are utilised:
* Virtual environments
* Python Requests
The prototype is not currently running on a production server, but the full source code is freely available at: https://bitbucket.org/vbanos/more-quality/