EPrints Technical Mailing List Archive
Message: #02849
[EP-tech] Re: Thousands of old eprints repropagated via OAI after epadmin redo_thumbnails &co.
- To: eprints-tech@ecs.soton.ac.uk
- Subject: [EP-tech] Re: Thousands of old eprints repropagated via OAI after epadmin redo_thumbnails &co.
- From: Sebastien Francois <sf2@ecs.soton.ac.uk>
- Date: Tue, 08 Apr 2014 10:57:08 +0100
Hi Florian,

Interesting scenario!

On 08/04/14 07:43, Florian Heß wrote:
> Hi, is there an option (am I just missing it? using EPrints v3.3.10) to leave the current lastmod timestamp untouched when processing an epadmin or alike routine automated by EPrints-boxed tools? We had in the past and will still have a need to batch-process plenty of eprints, epadmin redo_thumbnails for instance, which results in e.g. their being renotified via our aggregator for freshly acquired media (RSS-feed and mail channel both are limited to 1000 items per request, thus some really fresh ones might be suppressed in the list). Client OAI harvesters might handle them as new, too, which would be not that user-friendly.

In the OAI scenario, I think that the OAI clients are faulty as an update of the lastmod timestamp doesn't modify the resource's unique identifier which should be used to see if an item is new or being updated.
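The point about identifiers can be illustrated from the harvester's side. What follows is only a minimal sketch in plain Perl: the `%seen` store, the `classify_record` function and the sample identifier are hypothetical and do not come from EPrints or from any OAI client library. It shows a client deciding "new" versus "updated" from the record identifier alone, so a changed datestamp never makes an old record look new.

```perl
use strict;
use warnings;

# Hypothetical harvester-side store: OAI identifier => last datestamp seen.
my %seen = ( 'oai:example.org:42' => '2013-05-01T09:00:00Z' );

# Classify a harvested record by its identifier, not by its datestamp.
# A known identifier is always an update, however recent the datestamp is.
sub classify_record {
    my ( $identifier, $datestamp ) = @_;
    my $status = exists $seen{$identifier} ? 'updated' : 'new';
    $seen{$identifier} = $datestamp;    # remember the latest datestamp
    return $status;
}
```

With this store, a record for `oai:example.org:42` carrying today's datestamp is classified as `updated`, while an identifier that has never been seen is classified as `new`.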
But I agree that certain actions shouldn't update the lastmod field (cf. below).
> Pondering on it, I would even prefer to see EPrints update it only when a non-admin user has acted upon an eprint, when they changed metadata. But sometimes the admin might want to touch eprints "obviously" indeed, e.g. when he changed field values using the regular workflow or when he explicitly opts in that. To put it in a nutshell, I'd wish I could use EPrints API this way:
>
>     use EPrints qw(no_autoupdate_lastmod);
>     $dataobj->commit();                         # stealth update if $dataobj in storage
>     $dataobj->commit({ update_lastmod => 1 });  # opt-in overwrite default {update_lastmod}
>                                                 # = !exists $import_opts{no_autoupdate_lastmod}
>
> In order to ensure that changes made by admin are still obvious in terms of database-level debugging or "forensics", my idea is to have an API-hidden and unprocessed native DATESTAMP field, say "sql_updated", and have it independently update with means of the database engine. (AFAIK, MySQL implies out-of-the-box "ON UPDATE CURRENT_TIMESTAMP()" for any first datestamp field of a table.)

There's a "non_volatile_change" flag you can set (grep for it in DataObj/EPrint.pm), which does pretty much the same as "no_autoupdate_lastmod".
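The opt-in behaviour wished for in the quoted snippet can be modelled without EPrints at all. The sketch below is a self-contained mock: `MockDataObj`, its `value` accessor and the `update_lastmod` option are hypothetical stand-ins and are not the EPrints API, and it does not show how the real "non_volatile_change" flag is implemented. It only demonstrates the intended semantics, where a plain `commit()` leaves lastmod alone and the caller has to ask explicitly for a new timestamp.

```perl
use strict;
use warnings;

package MockDataObj;    # hypothetical stand-in, NOT an EPrints class

use POSIX qw(strftime);

sub new {
    my ( $class, %data ) = @_;
    return bless { data => {%data} }, $class;
}

sub value {
    my ( $self, $name ) = @_;
    return $self->{data}{$name};
}

# commit( \%opts ): lastmod is refreshed only when the caller opts in.
sub commit {
    my ( $self, $opts ) = @_;
    $opts ||= {};
    # (a real implementation would write the changed fields to storage here)
    if ( $opts->{update_lastmod} ) {
        $self->{data}{lastmod} = strftime( '%Y-%m-%dT%H:%M:%SZ', gmtime );
    }
    return 1;
}

package main;

my $obj = MockDataObj->new( lastmod => '2010-01-01T00:00:00Z' );
$obj->commit();                              # stealth update: lastmod untouched
$obj->commit( { update_lastmod => 1 } );     # opt-in: lastmod set to the current time
```

Making the timestamp update opt-in keeps batch jobs silent by default. The trade-off is that every code path representing a real edit must remember to pass the option.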
I don't see a need for another timestamp, but I agree that the behaviour around lastmod could be reviewed. Also I don't think fields should be updated or not depending on which part of the system you're using (workflow etc) or which user is modifying a resource. The behaviour should be consistent and intuitive (and handled at the low-level for such system/internal fields).
What about reviewing which actions should update lastmod and which ones should NOT update lastmod?
I think that lastmod should be updated when the metadata is modified and/or when the file content is changed. Hence, for the available epadmin functions:
- rebuild_triples: no metadata/content change => no lastmod update
- recommit: by definition, this action should touch lastmod
- reorder: re-creates the order values for searching => no lastmod update
- reindex: same as above => no lastmod update
- redo_mime_type: might modify the Document's mime type => update lastmod when the mime type is updated
- redo_thumbnails: generation of volatile files for previewing => no lastmod update
What do you reckon? Which other actions need to be reviewed/included here?
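The proposal in the list above could be written down as a small lookup table, which makes it easy to review and extend. This is only a sketch of the proposed policy, not existing epadmin behaviour: `%UPDATES_LASTMOD` and `should_update_lastmod` are hypothetical names, and the `$changed` argument stands for "the action actually modified something" (for example, the mime type really changed).

```perl
use strict;
use warnings;

# Proposed policy: 1 = touch lastmod, 0 = leave it alone,
# 'if_changed' = touch lastmod only when the action modified the record.
my %UPDATES_LASTMOD = (
    rebuild_triples => 0,
    recommit        => 1,
    reorder         => 0,
    reindex         => 0,
    redo_mime_type  => 'if_changed',
    redo_thumbnails => 0,
);

# Return 1 if the given action should update lastmod, 0 otherwise.
# Dies on actions the policy has not been reviewed for.
sub should_update_lastmod {
    my ( $action, $changed ) = @_;
    die "unknown action: $action\n" unless exists $UPDATES_LASTMOD{$action};
    my $rule = $UPDATES_LASTMOD{$action};
    return $changed ? 1 : 0 if $rule eq 'if_changed';
    return $rule;
}
```

Dying on unknown actions is deliberate in this sketch: any epadmin function missing from the table has to be reviewed before it can run under the policy.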
> By the way, guessing there isn't another way to restore the timestamps but from backup dumps, is there? Is there yet a way to commit an eprint explicitly without updating the lastmod timestamp that I can consider in the future to prevent this?

You might/should be able to recover the timestamps by querying the "history" dataset which keeps records of changes for eprint objects alongside their revision number (which is stored in the eprint).
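The recovery idea could look roughly like this once the history records have been fetched. The sketch works on plain hashes and makes no EPrints calls; the field names `objectid` and `timestamp`, the function `last_timestamp_before` and the cutoff value are assumptions made for illustration. For each object it picks the latest recorded change before the batch run started, which is the candidate value for restoring lastmod.

```perl
use strict;
use warnings;

# Return a hashref: objectid => latest timestamp strictly before $cutoff.
# Timestamps are ISO 8601 strings in the same timezone, so string
# comparison orders them chronologically.
sub last_timestamp_before {
    my ( $cutoff, @records ) = @_;
    my %best;
    for my $r (@records) {
        next unless $r->{timestamp} lt $cutoff;    # skip changes made by the batch run
        my $id = $r->{objectid};
        $best{$id} = $r->{timestamp}
            if !defined $best{$id} || $r->{timestamp} gt $best{$id};
    }
    return \%best;
}
```

Objects whose only history entries fall on or after the cutoff do not appear in the result, so they can be spotted and handled by hand.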
By setting the non_volatile_change flag you should be able to avoid the auto-updating property of lastmod.
> Regards
> Florian

I can create new github issues once we're happy with the revised behaviour.

Seb.
- References:
- [EP-tech] Thousands of old eprints repropagated via OAI after epadmin redo_thumbnails &co.
- From: Florian Heß <hess@ub.uni-heidelberg.de>