EPrints Technical Mailing List Archive

Message: #08874


Re: [EP-tech] apostrophe in file names of uploaded/deposited file


Hi Tomasz,

There are two ways to work around this issue.  One has been in EPrints for quite a while; the other I introduced in 3.4.3 to help deal with the issue retrospectively.

1. https://wiki.eprints.org/w/Optional_filename_sanitise.pl allows you to set characters that should be removed or replaced before a filename is recorded in the database or saved to disk.  I have to admit I did not know about this until fairly recently, so I have not tested how well it works or whether it will solve your problem.  If you look at /opt/eprints3/lib/cfg.d/optional_filename_sanitise.pl, there is a function that can be defined under $c->{optional_filename_sanitise}.  The default (albeit commented-out) function converts white space, brackets and @ signs into underscores.  You could add a line like the one below to deal with apostrophes.

$filepath =~ s!\x27!_!g;
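
For context, here is a rough sketch of how that line might sit inside the function once it is uncommented.  I have not copied this from the shipped file: the ( $repo, $filepath ) argument list and the exact substitutions for white space, brackets and @ signs are assumptions, so please check them against the commented-out default in your own copy of optional_filename_sanitise.pl before using it.

```perl
# Hypothetical sketch for cfg.d/optional_filename_sanitise.pl.
# ASSUMPTION: the callback receives ( $repo, $filepath ) and returns the
# cleaned-up path; verify against the commented-out default in your install.
$c->{optional_filename_sanitise} = sub
{
	my( $repo, $filepath ) = @_;

	$filepath =~ s!\s!_!g;     # white space
	$filepath =~ s![()]!_!g;   # round brackets (assumed meaning of "brackets")
	$filepath =~ s!\@!_!g;     # @ signs
	$filepath =~ s!\x27!_!g;   # apostrophes (U+0027), the line suggested above

	return $filepath;
};
```

With the apostrophe substitution in place, a name such as Services_techniques_a_l'Universite_Concordia.pdf would become Services_techniques_a_l_Universite_Concordia.pdf.  As far as I can tell this only affects files uploaded after the change, not files already on disk.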

2. The new functionality I added for 3.4.3 allows files on disk to be found under the filename <fileid>.bin.  This lets you fix this sort of issue by renaming the file on disk to <fileid>.bin.  You can also enable it so that future files are automatically saved in the format <fileid>.bin by setting:

$c->{generic_filenames} = 1;

I would probably advise against doing this on a live repository, especially if you have unusual uploads such as uploading multiple files at once through "Upload from URL".  If you want to test this on a development repo, then please do, as any real-world-ish feedback on this feature would be useful.

Regards

David Newman

On 20/02/2022 20:32, Tomasz Neugebauer via Eprints-tech wrote:
Good afternoon!

I’m trying to troubleshoot an issue with exporting a deposited file that has an apostrophe in the filename.

This is the issue: https://github.com/eprintsug/EPrintsArchivematica/issues/40

Does EPrints replace apostrophes in filenames on disk with =0027?

Where in the code does that happen?

The URL of the file has the apostrophe, for example:

https://spectrum.library.concordia.ca/id/eprint/7066/1/Services_techniques_a_l'Universite_Concordia.pdf

But unlike other Unicode characters, the apostrophe doesn’t make it into the file name on disk, and is substituted with =0027.

I’m looking for confirmation that this is how it is “supposed” to work, and for an understanding of where this happens in the code, so that I can ultimately find out how many OTHER characters are replaced in this way in the filename.

Tomasz

*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/