EPrints Technical Mailing List Archive

See the EPrints wiki for instructions on how to join this mailing list and related information.

Message: #02636



[EP-tech] Re: Indexing files inside a compressed (zip, rar) document


Hi Andras,

If you're on 3.3, have a look at lib/cfg.d/search_xapian.pl and find where it does the full-text indexing (search for "Convert"). Rather than writing a new indexer plugin, it might be easier to write a Convert plug-in that turns zip/tar archives into "text/plain".
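A minimal sketch of the conversion step such a plug-in would perform, written in Python for illustration only. A real EPrints Convert plug-in is Perl and uses the EPrints plug-in API, which this does not reproduce. The set of "text-like" extensions and the per-member size limit are assumptions; RAR is not covered because Python's standard library has no RAR reader.

```python
import tarfile
import zipfile

# Assumed list of member extensions worth indexing; adjust for your data.
TEXT_EXTENSIONS = (".txt", ".csv", ".tsv", ".md", ".xml", ".json")


def _decode(data: bytes) -> str:
    # Decode as UTF-8, replacing undecodable bytes instead of failing.
    return data.decode("utf-8", errors="replace")


def archive_to_text(path: str, max_member_bytes: int = 10 * 1024 * 1024) -> str:
    """Concatenate the text of text-like members of a zip or tar archive."""
    parts = []
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            for info in zf.infolist():
                # Skip directories and very large members.
                if info.is_dir() or info.file_size > max_member_bytes:
                    continue
                if info.filename.lower().endswith(TEXT_EXTENSIONS):
                    parts.append(_decode(zf.read(info)))
    elif tarfile.is_tarfile(path):
        # tarfile.open() auto-detects gzip/bzip2/xz compression.
        with tarfile.open(path) as tf:
            for member in tf.getmembers():
                if not member.isfile() or member.size > max_member_bytes:
                    continue
                if member.name.lower().endswith(TEXT_EXTENSIONS):
                    f = tf.extractfile(member)
                    if f is not None:
                        parts.append(_decode(f.read()))
    else:
        raise ValueError(f"unsupported archive format: {path}")
    return "\n".join(parts)
```

The returned string is what the plug-in would hand to the indexer as the "text/plain" version of the archive; binary members are skipped rather than guessed at.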

Note that, at the moment, the full-text indexing does not take the documents' "security" settings into account. That might be something you need to address when you're indexing research data.
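One way to address this is to filter documents by their security setting before their extracted text reaches the index. The sketch below is hypothetical: the `Document` record and the `texts_for_index` helper are illustrative names, not EPrints objects, and the security values ("public", "validuser", "staffonly") are assumed to match a default EPrints setup.

```python
from dataclasses import dataclass
from typing import Iterable, List


# Hypothetical stand-in for an EPrints document; only the fields needed here.
@dataclass
class Document:
    docid: int
    security: str  # assumed values, e.g. "public", "validuser", "staffonly"
    text: str      # full text already extracted, e.g. from an archive


# Assumed policy: only fully public documents contribute to the search index.
INDEXABLE_SECURITY = {"public"}


def texts_for_index(documents: Iterable[Document]) -> List[str]:
    """Return extracted text only for documents whose security level allows indexing."""
    return [doc.text for doc in documents if doc.security in INDEXABLE_SECURITY]
```

Restricting indexing to public documents is the conservative choice: if restricted documents were indexed, an anonymous search could still reveal that matching text exists in them.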

If you come up with something, feel free to share your work, e.g. on GitHub (github.com/eprints/eprints). Thanks!

Seb.


On 17.02.2014 14:39, András Micsik wrote:

Hi,

do you have any hints on how to extend the indexer to index the contents of
zip/rar/etc. archives? Is there a ready-made solution for this, or do I have to
write an indexer plugin? The rationale: a large number of the files contain
research data, so they are most easily handled as a zip, but it would still be
nice to be able to search inside them...

thanks,