EPrints Technical Mailing List Archive
See the EPrints wiki for instructions on how to join this mailing list and related information.
Message: #09955
Re: [EP-tech] How to check deleted/missing uploaded files?
- To: David R Newman <drn@ecs.soton.ac.uk>
- Subject: Re: [EP-tech] How to check deleted/missing uploaded files?
- From: "Agung Prasetyo W." <prazetyo@gmail.com>
- Date: Mon, 27 Jan 2025 00:05:54 +0700
CAUTION: This e-mail originated outside the University of Southampton.
Hi David,
Thank you for your response.
By the way, should I put the script in the /usr/share/eprints/bin directory?
I will try it and let you know.
Thank you.
Best regards,
Agung PW
On Sun, Jan 26, 2025, 21:39 David R Newman <drn@ecs.soton.ac.uk> wrote:
Hi Agung,
If I interpret you correctly, the files have disappeared from the filesystem but the database still has a reference to them. In that case you could write a script that looks at all the eprint records, iterates over their documents and reports any files that are missing. That script would look something like this:
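#!/usr/bin/perl -w -I/opt/eprints3/perl_lib
# Report the main file of any document that is missing from disk.
use strict;
use EPrints;

my $repoid = "ARCHIVE_ID";    # change this to your archive ID
my $repo = EPrints::Session->new( 1, $repoid );
die "Could not load repository '$repoid'\n" unless defined $repo;

my $ds = $repo->dataset( "eprint" );
my $eprints = $ds->search;    # no filters: every eprint record
$eprints->map( \&find_missing_docs );
$repo->terminate;

sub find_missing_docs
{
    my( $session, $dataset, $eprint ) = @_;

    # skip records with no directory recorded; no path can be built for them
    return unless defined $eprint->get_value( 'dir' );

    my $eprint_path = $session->config( 'documents_path' ) . "/" . $eprint->get_value( 'dir' );
    foreach my $doc ( $eprint->get_all_documents )
    {
        # each document lives in a zero-padded sub-directory named after its position, e.g. 01/
        my $docpath = $eprint_path . "/" . sprintf( '%02d', $doc->get_value( 'pos' ) ) . "/" . $doc->get_value( 'main' );
        unless( -f $docpath )
        {
            print "Missing (eprint: " . $eprint->id . "): $docpath\n";
        }
    }
}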
You will need to change $repoid to your archive ID. Also, if your EPrints installation is not in /opt/eprints3, you will need to adjust the path in the first line. The script checks that the main file of every document is present, across all eprints. If a file is missing, it will print out a line like:
Missing (eprint: 112): /opt/eprints3/archives/ARCHIVE_ID/documents/disk0/00/00/01/12/01/foo.png
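For reference, the path in that output line can be rebuilt by hand. The following is a minimal pure-Perl sketch that needs no EPrints installation. The helper names `eprint_dir` and `doc_path` are hypothetical, and the `disk0` storage directory and the two-digit split of a zero-padded 8-digit eprint ID are assumptions read off the example line above. In the real script, take the directory from `$eprint->get_value( 'dir' )` rather than computing it.

```perl
use strict;
use warnings;

# Hypothetical helper: turn an eprint ID into the directory layout seen in
# the example output (112 -> disk0/00/00/01/12).
# Assumes IDs of at most 8 digits and a single storage directory such as disk0.
sub eprint_dir
{
    my( $eprintid, $disk ) = @_;
    my $padded = sprintf( '%08d', $eprintid );
    my @parts = $padded =~ /(\d\d)/g;    # four two-digit chunks
    return join( '/', $disk, @parts );
}

# Hypothetical helper: documents_path / eprint dir / zero-padded position / main file.
sub doc_path
{
    my( $documents_path, $dir, $pos, $main ) = @_;
    return join( '/', $documents_path, $dir, sprintf( '%02d', $pos ), $main );
}

my $path = doc_path( '/opt/eprints3/archives/ARCHIVE_ID/documents', eprint_dir( 112, 'disk0' ), 1, 'foo.png' );
print "Missing (eprint: 112): $path\n" unless -f $path;
```

If that file does not exist on the machine, this prints the same line as the example output above.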
Regards
David Newman
On 26/01/2025 12:14 pm, Agung Prasetyo W. wrote:
Hi David,
The problem is that we uploaded items with PDF files, but the PDF files have suddenly been lost or deleted.
We have a backup, so we want to re-upload the PDF files.
To do that, we need to know which items have lost their files.
Thank you.
Regards,
Agung PW
On Sun, Jan 26, 2025, 16:56 David R Newman <drn@ecs.soton.ac.uk> wrote:
Hi Agung,
I am not sure what you mean by "missing":
1. An eprint record that has not had any documents uploaded to it.
2. An eprint record that has had one or more documents uploaded to it but the files are not present on the filesystem where they are expected to be.
For 1, finding a negative using the EPrints API is a little tricky. I assume you only care about items in the live archive and the review buffer, as retired items are no longer relevant and items still in a user's inbox may not have had a file uploaded yet. So first I would use a search to find all of these eprints:
$ds = $repo->dataset( "eprint" );
$list = $ds->search( filters => [
    { meta_fields => [qw( eprint_status )], value => "archive buffer", },
] );
I would then run the map function over the list to collect the eprints that have no documents:
sub fn
{
    my( $session, $dataset, $eprint, $eprints ) = @_;
    # get_all_documents returns a list, so count it via an array
    my @docs = $eprint->get_all_documents;
    push @$eprints, $eprint unless @docs;
}
my $eprints = [];
$list->map( \&fn, $eprints );
You can then use the set of eprints in $eprints as you choose. If you want editors/admins to be able to view this, you probably want to install the Generic Reporting Framework plugin [A] and build a custom report. I would advise using the Example report [B] as a template for your own report. Be sure to enable your new report with the following in a configuration file under your archive's cfg/cfg.d/ directory:
$c->{plugins}{"Screen::Report::YOUR_REPORT_NAME"}{params}{disable} = 0;
As there is an issue with searching for a negative, you will need to use the "filters" function just to get all eprints in the live archive and review buffer, and then define an items function like:
sub items
{
    my( $self ) = @_;
    my $items = $self->SUPER::items;
    my $eprint_ids = [];
    $items->map( \&no_documents, $eprint_ids );
    my $order = defined $self->{processor}->{sort} ? $self->{processor}->{sort} : $self->param( 'custom_order' );
    my $new_items = EPrints::List->new(
        repository => $self->repository,
        dataset => $self->{processor}->{dataset},
        ids => $eprint_ids,
        order => $order,    # optional
    );
    return $new_items;
}

sub no_documents
{
    my( $session, $dataset, $eprint, $eprint_ids ) = @_;
    my @docs = $eprint->get_all_documents;
    push @$eprint_ids, $eprint->id unless @docs;
}
I have only written this off the top of my head, so it is completely untested and will likely need a bit of tidying up to make it work.
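The map callback convention used above (the list calls your function with the session, the dataset, the current item and whatever extra argument you passed) can be tried without a repository. The minimal sketch below uses two stand-in classes, `MockList` and `MockEPrint`. Both are hypothetical and only imitate the parts of `EPrints::List` and `EPrints::DataObj::EPrint` that a `no_documents`-style callback relies on.

```perl
use strict;
use warnings;

# Hypothetical stand-in for EPrints::List: map() calls the callback with
# ( $session, $dataset, $item, $info ) for each item, as the code above expects.
package MockList;
sub new { my( $class, @items ) = @_; return bless { items => \@items }, $class; }
sub map
{
    my( $self, $fn, $info ) = @_;
    $fn->( undef, undef, $_, $info ) foreach @{ $self->{items} };
    return;
}

# Hypothetical stand-in for an eprint: an ID plus a list of documents.
package MockEPrint;
sub new { my( $class, $id, @docs ) = @_; return bless { id => $id, docs => \@docs }, $class; }
sub id { return $_[0]->{id}; }
sub get_all_documents { return @{ $_[0]->{docs} }; }    # returns a list, not an array ref

package main;

# Callback of the same shape as no_documents above: collect IDs of eprints with no documents.
sub no_documents
{
    my( $session, $dataset, $eprint, $eprint_ids ) = @_;
    my @docs = $eprint->get_all_documents;
    push @$eprint_ids, $eprint->id unless @docs;
    return;
}

my $list = MockList->new( MockEPrint->new( 1, 'thesis.pdf' ), MockEPrint->new( 2 ), MockEPrint->new( 3 ) );
my $eprint_ids = [];
$list->map( \&no_documents, $eprint_ids );
print "No documents: @$eprint_ids\n";
```

The extra argument to map is the only way the callback can hand results back, which is why an array reference is passed in and filled up.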
If by missing you mean 2, there are a number of different approaches you may need to use depending on the issue. Sometimes the file is recorded in the database but missing on disk; sometimes it can be missing from both. I don't think it is worth going into this until I know what you mean by missing, as what I have explained above may already answer your question.
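For case 2, a first step is to tell those two situations apart. The minimal pure-Perl sketch below assumes you have already gathered, for each document, its directory on disk and the filenames the database records for it (how you collect these depends on your EPrints version). The function name `classify_docs` and the hash layout are hypothetical, and treating "no file recorded" as "missing from both" is one reading of the description above.

```perl
use strict;
use warnings;

# Hypothetical input: a list of hash refs like
#   { eprintid => 112, path => '/.../documents/disk0/00/00/01/12/01', files => [ 'foo.png' ] }
# Returns [ eprintid, problem, path ] triples.
sub classify_docs
{
    my( @docs ) = @_;
    my @problems;
    foreach my $doc ( @docs )
    {
        my @files = @{ $doc->{files} || [] };
        unless( @files )
        {
            # nothing recorded in the database either: "missing from both"
            push @problems, [ $doc->{eprintid}, 'no file recorded', $doc->{path} ];
            next;
        }
        foreach my $file ( @files )
        {
            my $filepath = "$doc->{path}/$file";
            # recorded in the database but not present on disk
            push @problems, [ $doc->{eprintid}, 'missing on disk', $filepath ] unless -f $filepath;
        }
    }
    return @problems;
}

foreach my $p ( classify_docs( { eprintid => 112, path => '/nonexistent/01', files => [ 'foo.png' ] } ) )
{
    print "eprint $p->[0]: $p->[1]: $p->[2]\n";
}
```

Files reported as "missing on disk" can simply be copied back from a backup to the same path, whereas "no file recorded" cases need the file re-uploaded through EPrints so that the database record is recreated.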
Regards
David Newman
[A] https://bazaar.eprints.org/1105/
[B] https://bazaar.eprints.org/1105/1/plugins/EPrints/Plugin/Screen/Report/Example.pm
On 26/01/2025 2:43 am, Agung Prasetyo W. wrote:
Hi,
Is there a way to check for missing uploaded files, so that they can be re-uploaded by the administrator/editor?
Thank you.
Regards,
Agung PW
*** Options: https://wiki.eprints.org/w/Eprints-tech_Mailing_List *** Archive: https://www.eprints.org/tech.php/ *** EPrints community wiki: https://wiki.eprints.org/