EPrints Technical Mailing List Archive
See the EPrints wiki for instructions on how to join this mailing list and related information.
Message: #07412
Re: [EP-tech] Multiple Uploaded Files in One Directory
- To: <eprints-tech@ecs.soton.ac.uk>
- Subject: Re: [EP-tech] Multiple Uploaded Files in One Directory
- From: James Kerwin <jkerwin2101@gmail.com>
- Date: Thu, 16 Aug 2018 13:46:41 +0100
Hi James,
Welcome to EPrints :o)
When EPrints resolves a URL, it uses the eprintid and pos to get the document data object via
EPrints::DataObj::Document::doc_with_eprintid_and_pos
Normally there would only be one object returned - and the document that 'works' is the first one returned by the above call.
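To illustrate why only one document 'works', here is a minimal, self-contained sketch. It models documents as plain hashes rather than EPrints objects, and the first_doc_at helper is hypothetical: it only mimics a lookup that returns the first match, and is not the EPrints implementation.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for rows in the document table.
my @docs = (
    { docid => 10, eprintid => 1, pos => 1, main => 'A.pdf' },
    { docid => 11, eprintid => 1, pos => 1, main => 'B.pdf' },
    { docid => 12, eprintid => 1, pos => 2, main => 'C.pdf' },
);

# Return the first document matching eprintid and pos, or undef.
sub first_doc_at
{
    my( $docs, $eprintid, $pos ) = @_;
    foreach my $doc ( @{$docs} )
    {
        return $doc if $doc->{eprintid} == $eprintid && $doc->{pos} == $pos;
    }
    return undef;
}

my $hit = first_doc_at( \@docs, 1, 1 );
print "pos 1 resolves to $hit->{main}\n";   # A.pdf; B.pdf is never returned
```

In this model, no combination of eprintid and pos ever leads to B.pdf, which matches the 'unreachable document' symptom.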
Onto the question about how items get into this state:
This sounds very similar to an issue we had with our Symplectic connector - and how it merged two EPrints together when the corresponding Symplectic items were merged together. This ends up with two documents attached to the same EPrint existing in the same 'pos'.
EPrints' default behaviour is to remove the 'pos' during a clone *only* when the doc is being cloned to the same parent: https://github.com/eprints/eprints/blob/3.3/perl_lib/EPrints/DataObj/Document.pm#L374
In some circumstances, this is not the correct course of action - EPrints should check that a doc doesn't already exist at that pos for that eprint.
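As a rough sketch of the kind of check meant here (not something EPrints provides): the hypothetical free_pos helper below keeps the requested pos if nothing on the target eprint already uses it, and otherwise falls back to the lowest unused pos. It works on a plain list of pos values, so it runs without EPrints.

```perl
use strict;
use warnings;

# Hypothetical helper: choose a pos for a document moving to a target eprint.
# $used is an array ref of pos values already taken on the target.
# Keeps $wanted if it is free, otherwise returns the lowest unused pos.
sub free_pos
{
    my( $used, $wanted ) = @_;
    my %taken = map { $_ => 1 } @{$used};
    return $wanted if defined $wanted && !$taken{$wanted};
    my $pos = 1;
    $pos++ while $taken{$pos};
    return $pos;
}

print free_pos( [ 1, 2 ], 1 ), "\n";   # pos 1 is taken, so this prints 3
print free_pos( [ 1, 3 ], 2 ), "\n";   # pos 2 is free, so this prints 2
```

Picking the lowest unused pos also avoids leaving 'spaces' in the document structure where a gap already exists.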
I flagged the issue to Symplectic - the ticket reads:
#################
We've discovered an issue with the Elements/EPrints connector:
EPrint ID 1; document: A.pdf with pos=1.
EPrint ID 2; document: B.pdf with pos=1.
If both of these are attached to Elements records, which are then merged, the resulting EPrint ends up with two documents at pos=1.
This is not meant to happen, and will mean that one of the documents is unreachable.
The 'real' bug lies in EPrints - but the connector 'tickles' it when two records are merged - and the $document->clone() method is used (which possibly should be flagged as an 'internal' EPrints method).
#################
I've created a fix for the Symplectic connector - and submitted it to them for review/release as a new version of RT1.
As yet this hasn't been released.
The specific fix I have for the Symplectic connector is below (also saved as: https://gist.github.com/jesusbagpuss/d9e292bd4dd222f5199a36747989f708 in case the code below gets mangled by email transport):
##########################################################################################
#
# Based on EPrints::DataObj::Document::clone
# NB Code duplication with Symplectic::RepoProcess::MergeManager
#
# Cloning documents can result in:
# - two documents with the same 'pos' field - and therefore sharing the same folder
# - 'spaces' in the document structure (e.g. pos=1 and pos=3, but no pos=2)
# this isn't what is needed. The code below manages these scenarios.
# EPrints' default behaviour is to remove the 'pos' during a clone *only* when the doc is being cloned to the same parent.
sub clone_document
{
    my( $self, %args ) = @_;
    my $eprint = $args{'eprint'};       # target eprint
    my $doc = $args{'doc'};             # document to copy
    my $reset_pos = $args{'reset_pos'}; # true to force a new pos on the target

    # Copy the document's data, as EPrints::DataObj::Document::clone does
    my $data = EPrints::Utils::clone( $doc->{data} );

    # cloning within the same eprint, in which case get a new position!
    #if( defined $doc->parent && $eprint->id eq $doc->parent->id )
    if( ( defined $doc->parent && $eprint->id eq $doc->parent->id ) || $reset_pos )
    {
        $data->{pos} = undef;
    }

    $data->{eprintid} = $eprint->get_id;
    $data->{_parent} = $eprint;

    # First create a new doc object
    my $new_doc = $doc->{dataset}->create_object( $doc->{session}, $data );
    return undef if !defined $new_doc;

    my $ok = 1;

    # Copy files
    foreach my $file (@{$doc->get_value( "files" )})
    {
        $file->clone( $new_doc ) or $ok = 0, last;
    }

    # If any file failed to copy, remove the partial document
    if( !$ok )
    {
        $new_doc->remove();
        return undef;
    }

    return $new_doc;
}
##########################################################################################
NB There are also some other changes required in the Symplectic connector to make this work. If you'd like more information about this fix, let me know!
If you want to know how many items in your repository are affected by the 'duplicated pos' issue, try the following SQL on the database:
SELECT
eprintid, pos, count(*) as c
FROM
document
GROUP BY
eprintid, pos
HAVING c > 1;
If there are a few items, you may be able to resolve them by human effort.
If there are lots, then some scripting might be needed…
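As a starting point for that kind of scripting, here is a hedged sketch that only works out a renumbering plan. The plan_pos_fixes function is hypothetical and takes plain rows of docid, eprintid and pos (for example, exported from the document table). It assumes the lowest docid at a clashing pos should keep its place, which may not be the right choice for every repository.

```perl
use strict;
use warnings;

# Given rows of [ docid, eprintid, pos ], return a hash ref of
# docid => new pos for every document that must move. The first
# document seen at each (eprintid, pos) keeps its place; later ones
# get the lowest pos not otherwise used on that eprint.
sub plan_pos_fixes
{
    my( @rows ) = @_;
    my( %taken, @clashes, %plan );

    foreach my $row ( sort { $a->[0] <=> $b->[0] } @rows )
    {
        my( $docid, $eprintid, $pos ) = @{$row};
        if( $taken{$eprintid}{$pos} ) { push @clashes, $row; }
        else { $taken{$eprintid}{$pos} = 1; }
    }
    foreach my $row ( @clashes )
    {
        my( $docid, $eprintid ) = @{$row};
        my $pos = 1;
        $pos++ while $taken{$eprintid}{$pos};
        $taken{$eprintid}{$pos} = 1;
        $plan{$docid} = $pos;
    }
    return \%plan;
}

my $plan = plan_pos_fixes( [ 10, 1, 1 ], [ 11, 1, 1 ], [ 12, 1, 2 ] );
print "doc $_ -> pos $plan->{$_}\n" for sort { $a <=> $b } keys %{$plan};
```

This deliberately stops at the plan: it does not update the database or move anything on disk. Because documents sharing a pos also share a folder, a real fix would need to separate the files as well, and is best tried on a test copy first.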
Does that help at all?
Cheers,
John
From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of James Kerwin
Sent: 15 August 2018 10:20
To: eprints-tech@ecs.soton.ac.uk
Subject: [EP-tech] Multiple Uploaded Files in One Directory
Morning all,
I'm very new to the world of EPrints and I'm still getting to grips with it.
I was alerted to a problem today where a file uploaded to EPrints is giving a "404 File not Found" warning when attempting to view/download the document.
On the repository server the document is present but appears in the same directory as another document (which can be accessed through EPrints). There is then a third document in a second directory that can be accessed.
Looking in the database I can see that all three documents are public and should be accessible.
As I understand it, the URL matches the file structure as:
And on the server are stored somewhere in the Eprints directory as:
[EP/ri/nt/sI/d]/DocPos/document.pdf
As in a one-to-one between DocPos and doc name (I've looked at some other examples with more than 2 documents in one EPrint and each one follows this so far).
Firstly, are my assumptions correct?
Has anybody had a similar thing happen before?
Thanks,
James
*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/
*** EPrints developers Forum: http://forum.eprints.org/