EPrints Technical Mailing List Archive
Message: #02617
[EP-tech] Indexing based on case sensitive file extension check?
- To: <eprints-tech@ecs.soton.ac.uk>
- Subject: [EP-tech] Indexing based on case sensitive file extension check?
- From: Rory McNicholl <rory.mcnicholl@london.ac.uk>
- Date: Fri, 7 Feb 2014 14:54:17 +0000
Hello,

Bernard from IOE noticed that if he uploaded a PDF with an uppercase extension (i.e. .PDF) it was never indexed. If he replaced it with the same file with a lowercase extension, it got indexed.

I managed to find the cause in perl_lib/EPrints/Plugin/Convert/PlainText.pm, where the *can_convert* (line 70) and *export* (line 118) subs contain regexes that check the file extension before continuing. These expect lowercase file extensions, so no indexcodes are extracted from .PDFs or .DOCs or .HTMLs etc.

Easy to fix, once found, but it took me ages. Looking in GitHub I can't see where any regression might have occurred, so I'm wondering if it was ever thus?

Cheers,
Rory

--
Rory McNicholl
Lead developer, Research Repositories Team
Academic Research Technologies
University of London Computer Centre
Senate House, Malet Street, London WC1E 7HU
t: +44 (0)20 7863 1344
e: r.mcnicholl@ulcc.ac.uk
w: http://www.ulcc.ac.uk/
b: http://dablog.ulcc.ac.uk/
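For anyone hitting the same problem, here is a rough illustration of the behaviour described in the message above. It is a Python sketch, not the Perl from PlainText.pm: the patterns and the function names `can_convert_strict` and `can_convert_relaxed` are made up for the example, and the real regexes may well look different. The point is only that an extension check without a case-insensitive flag silently skips .PDF, .DOC and .HTML files.

```python
import re

# Hypothetical patterns for illustration only; the actual regexes in
# PlainText.pm are Perl and are not reproduced here.
CASE_SENSITIVE = re.compile(r"\.(pdf|doc|html?)$")
CASE_INSENSITIVE = re.compile(r"\.(pdf|doc|html?)$", re.IGNORECASE)


def can_convert_strict(filename):
    """Extension check that only accepts lowercase extensions."""
    return CASE_SENSITIVE.search(filename) is not None


def can_convert_relaxed(filename):
    """Same check, but ignoring the case of the extension."""
    return CASE_INSENSITIVE.search(filename) is not None


if __name__ == "__main__":
    for name in ("thesis.pdf", "thesis.PDF", "notes.DOC", "page.HTML"):
        print(name, can_convert_strict(name), can_convert_relaxed(name))
```

In Perl the equivalent change would presumably be adding the `i` modifier to the relevant matches, but that is an assumption about how the checks are written.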