EPrints Technical Mailing List Archive

Message: #06718


Re: [EP-tech] Making a static copy of an EPrints repo


On 18 July 2017 at 19:04, Ian Stuart <Ian.Stuart@ed.ac.uk> wrote:
> I need to make a read-only, static copy of an old repo (the hardware is
> dying, the installation was heavily tailored to its environment, and I
> don't have the time to re-create it in a new environment).
>
> I can grab all the active pages:
>
>    wget --local-encoding=UTF-8 --remote-encoding=UTF-8 --no-cache
> --mirror -nc -k http://my.repo/
>
> This works, but it doesn't rewrite all the absolute URLs in the view
> pages, so we need to modify them:
>
>    find my.repo -type f -exec sed -i 's_http://my.repo/_/_g' {} +
>
> However, this still leaves me with the problem that the http://my.repo/nnn/
> pages haven't been pulled down!
>
> Any suggestions on how to do this?
>
> Cheers
>
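A side note on the quoted sed step: run over every mirrored file, it will also rewrite URLs inside binary downloads. The sketch below first demonstrates the substitution on a sample line, then shows a variant restricted to HTML files (the `*.html` pattern and paths are assumptions, not something the original poster confirmed):

```shell
# Demonstrate the URL rewrite on a sample line of HTML:
echo '<a href="http://my.repo/view/">Browse</a>' \
  | sed 's_http://my.repo/_/_g'

# Against the mirrored tree itself, restricting to HTML files keeps
# sed -i away from binary downloads (paths here are assumptions):
# find my.repo -type f -name '*.html' \
#   -exec sed -i 's_http://my.repo/_/_g' {} +
```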

It depends on how many records there are, and how sparse the ID space
is.  Do you have a sitemap?  It might be worth parsing that and
fetching the records one by one.
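A minimal sketch of the sitemap approach, assuming the repository publishes a standard sitemap (the `/sitemap.xml` location, the sample content, and the wget flags are all assumptions):

```shell
# Sample sitemap content, standing in for
# `wget -qO- http://my.repo/sitemap.xml` in a live run:
sitemap='<?xml version="1.0"?>
<urlset>
  <url><loc>http://my.repo/1/</loc></url>
  <url><loc>http://my.repo/2/</loc></url>
</urlset>'

# Extract each <loc> URL...
echo "$sitemap" | sed -n 's_.*<loc>\(.*\)</loc>.*_\1_p'

# ...and, against the live site, fetch them one by one:
# echo "$sitemap" | sed -n 's_.*<loc>\(.*\)</loc>.*_\1_p' \
#   | while read -r url; do wget --mirror -nc -k "$url"; done
```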

If you're desperate, there's always:

    for id in {1..12345} ; do wget --etc http://my.repo/$id ; done
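Spelled out as a dry run (the `--mirror -nc -k` flags are carried over from the quoted mirror command and are an assumption; `echo` prints each invocation so the ID range can be sanity-checked before fetching anything):

```shell
# Print each wget invocation instead of executing it; drop the
# `echo` once the range and flags look right.
for id in {1..3}; do
  echo wget --mirror -nc -k "http://my.repo/$id"
done
```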

Cheers
-- 
  Matthew Kerwin
  http://matthew.kerwin.net.au/