EPrints Technical Mailing List Archive
See the EPrints wiki for instructions on how to join this mailing list and related information.
Message: #06796
Re: [EP-tech] Fixity Check and EPrints - Digital Preservation
- To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
- Subject: Re: [EP-tech] Fixity Check and EPrints - Digital Preservation
- From: John Salter <J.Salter@leeds.ac.uk>
- Date: Fri, 25 Aug 2017 10:03:40 +0000
Hi Tomasz,

I think we're looking into similar things at the moment :o)

I think there are similarities between 'fixity' and 'probity', so although there isn't integration of fixity, this might be useful info: EPrints does support 'probity' files (http://www.probity.org/), which include a hash of the contents. I don’t think these are generated by default, but the $doc->rehash command should generate them. See the EPrints::Probity module, and the 'rehash' option of bin/epadmin.

Running

    [EPRINTS_ROOT]/bin/epadmin rehash [ARCHIVEID] [docid]

will generate a file in the owning eprint folder, e.g.

    [EPRINTS_ROOT]/archives/[ARCHIVEID]/documents/disk0/00/00/00/01/1.2017-08-25T09=003a55=003a29Z.xsh

(for eprintid = 1 and docid = 1. Note the encoded ':'s (=003a) in the timestamp in the filename.)

The file has the following data:

    <?xml version="1.0" encoding="UTF-8" ?>
    <hashlist xmlns="http://probity.org/XMLprobity">
      <hash>
        <name>wreo.txt</name>
        <algorithm>MD5</algorithm>
        <value>17f861744d77c1d9754fd7ab6f403065</value>
        <date>2017-08-25T09:55:45Z</date>
      </hash>
    </hashlist>

You can create multiple Probity files, but I don't think there's any way to compare one with another, or to check that the current checksum is equal to the most recently stored one (which is the main part of your question).

Cheers,
John

PS I'm also looking into DROID - as you were at some point. The Bazaar package needs an update or three…

From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk]
On Behalf Of Tomasz Neugebauer

I believe that EPrints stores a checksum value for each uploaded file, but as far as I understand, there is no way to monitor whether the checksums still match the current files, and thus no way of checking for bit rot. DSpace has the following:
https://wiki.duraspace.org/display/DSDOC6x/Validating+CheckSums+of+Bitstreams

A periodic fixity check is a part of the lowest level of support for digital preservation, i.e., “bit-level”. See some examples of digital preservation policies, all of which have some variation on this as a requirement: “regularly audit checksums to ensure that no files have corrupted or changed in any way. This practice ensures the ability to provide an exact copy of original files over time”:

· https://www.sfu.ca/content/dam/sfu/archives/DigitalPreservation/FormatPolicyRegistry.pdf
  “Regularly perform fixity checks on AIPs”
· https://digital.library.yorku.ca/documentation/fixity-procedures
  “York University Library are committed to maintaining the integrity of objects in its care. This includes creating checksums for all archival format objects -- plus associated datastreams -- ingested into the repository, and regular fixity checking of those objects”
· https://researchworks.lib.washington.edu/policy-preservation.html
  “Maintains the authenticity of the bitstream through integrity checking”

I understand that EPrints is primarily an open access platform, but I think that we should be able to provide at least the lowest “bit-level” digital preservation support with it, and without a Fixity check, I don’t think we can ensure that no files are corrupted or changed over time.

Preservation Metadata for Institutional Repositories, a report looking at EPrints and digital preservation dating back to 2007, states the following about Fixity checking: “Where is fixity check first performed? Not within EPrints currently, but a script that crawls the archive comparing files with checksums is possible”.

We are now 10 years later, and I am wondering if and how institutions running EPrints are implementing their Fixity checks. Are you using an external tool like this one (https://www.avpreserve.com/tools/fixity/)? Are you using your own custom script? Did you develop something that is integrated with the EPrints Admin interface?
Tomasz

________________________________________________
Tomasz Neugebauer
Tel. / Tél. 514-848-2424 ext. / poste 7738
Mailing address / adresse postale: 1455 De Maisonneuve Blvd. W., LB-540-03, Montreal, Quebec H3G 1M8
http://library.concordia.ca
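
As a rough illustration of the comparison described in the thread as missing (checking the current checksum of a file against the one recorded in a Probity file), here is a minimal Python sketch. It is not part of EPrints and does not use the EPrints API. It assumes the Probity file has the layout shown in John's example, that the recorded algorithm name (e.g. MD5) is one that Python's hashlib accepts once lowercased, and that the caller knows which directory holds the document's files. The function names read_probity, file_digest and check_fixity are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace taken from the example Probity file quoted in the thread.
NS = {"p": "http://probity.org/XMLprobity"}


def read_probity(xsh_path):
    """Return (name, algorithm, value) tuples from one Probity (.xsh) file."""
    root = ET.parse(str(xsh_path)).getroot()
    return [
        (
            h.findtext("p:name", namespaces=NS),
            h.findtext("p:algorithm", namespaces=NS),
            h.findtext("p:value", namespaces=NS),
        )
        for h in root.findall("p:hash", NS)
    ]


def file_digest(path, algorithm):
    """Hash a file in chunks; assumes hashlib knows the lowercased name."""
    digest = hashlib.new(algorithm.lower())
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(xsh_path, files_dir):
    """Compare each recorded hash with the file's current hash.

    files_dir is an assumption: the directory where the document's files
    live. Returns (name, status) pairs, where status is one of
    "ok", "changed", "missing" or "invalid".
    """
    results = []
    for name, algorithm, recorded in read_probity(xsh_path):
        if not (name and algorithm and recorded):
            results.append((name, "invalid"))
            continue
        target = Path(files_dir) / name
        if not target.is_file():
            results.append((name, "missing"))
            continue
        current = file_digest(target, algorithm)
        same = current == recorded.strip().lower()
        results.append((name, "ok" if same else "changed"))
    return results
```

Calling check_fixity with the path of the most recent .xsh file and the directory containing the document's files returns one status per recorded file. A periodic job could run this over every document and report anything that is not "ok".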
- References:
- [EP-tech] Fixity Check and EPrints - Digital Preservation
- From: Tomasz Neugebauer <Tomasz.Neugebauer@concordia.ca>