EPrints Technical Mailing List Archive

Message: #06717



Re: [EP-tech] Making a static copy of an EPrints repo


I would use:

    wget --no-parent \
         --no-check-certificate \
         --html-extension \
         --convert-links \
         --restrict-file-names=windows \
         --recursive \
         --level=inf \
         -N \
         --page-requisites \
         -e robots=off \
         --wait=0 \
         --quota=inf \
         http://my.repo/

I think --convert-links will take care of rewriting the links in the downloaded pages, so the separate sed step should not be needed.
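
To check whether any absolute links survived the conversion, something like the following could be run against the mirror. This is only a minimal sketch: the function name list_absolute_links is my own invention (it is not part of wget), and it assumes the mirror landed in a directory named my.repo and that the original site prefix is http://my.repo/.

```shell
# Hypothetical helper (not part of wget): print every file under DIR that
# still contains the absolute base URL, so leftover links are easy to spot.
list_absolute_links() {
    dir=$1    # mirror directory, assumed to be e.g. my.repo
    base=$2   # original site prefix, assumed to be e.g. http://my.repo/
    # -r: recurse, -l: print file names only, -F: match the URL as a fixed string
    grep -rlF -- "$base" "$dir" | sort
}

# Example call (paths are assumptions):
#   list_absolute_links my.repo http://my.repo/
```

An empty result would suggest every link was converted. As far as I know, --convert-links leaves a link absolute when its target was not downloaded, so any files listed here may also point at pages the crawl missed.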


On 18/07/2017 11:04, Ian Stuart wrote:
I need to make a read-only, static copy of an old repo (the hardware is
dying, the installation was heavily tailored for the environment, and I
don't have the time to re-create it in a new environment).

I can grab all the active pages:

    wget --local-encoding=UTF-8 --remote-encoding=UTF-8 --no-cache \
         --mirror -nc -k http://my.repo/

This is good; however, it doesn't rewrite all the absolute URLs in the view
pages, so we need to modify them:

    find my.repo -type f -exec sed -i 's_http://my.repo/_/_g' {} +

However, this leaves me with the problem that the http://my.repo/nnn/
pages haven't been pulled down!

Any suggestions on how to do this?

Cheers