EPrints Technical Mailing List Archive

Message: #06382



[EP-tech] Scripted XML download?


Hi,

I do some checking, analysis and visualisation of our repository in a third-party package, which I have set up to ingest EPrints XML. I’d like to refresh the data about once a week, but downloading everything in one go takes about 3 hours, comes to about 1.5 GB, and tends to fail halfway through. I have been doing it manually one year at a time, but that means 17 separate search-and-download operations, each taking ten minutes or so. I don’t have shell access to the server, so I can’t script the export from the command line.

I have looked at the search page, but after a search the download form references a cached search ID, so I can’t simply copy the URL from the download form.

Can anyone give me a template for a URL that would work in a single pass with wget or libwww, so that I could run it from cron to fetch the EPXML? Obviously I would need to be able to authenticate as well.
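
To make the request concrete, here is a minimal sketch of the kind of cron-able script that such a URL template would allow, mirroring the one-year-at-a-time approach described above. Everything specific in it is an assumption: the host name, the URL pattern in URL_TEMPLATE (a placeholder only, since the correct export URL is exactly what is being asked for), the year range, the environment variable names, and the use of HTTP Basic authentication. If the repository uses a login form and a session cookie instead, the make_opener part would need to change.

```python
import os
import time
import urllib.request
from pathlib import Path

# ASSUMPTION: placeholder host and URL pattern, not a confirmed EPrints endpoint.
BASE_URL = "https://repository.example.org/"
URL_TEMPLATE = BASE_URL + "cgi/exportview/year/{year}/XML/{year}.xml"


def year_urls(template, first_year, last_year):
    """Return (year, url) pairs for every year in the inclusive range."""
    return [(y, template.format(year=y)) for y in range(first_year, last_year + 1)]


def make_opener(base_url, user, password):
    """Build an opener that answers HTTP Basic auth challenges (assumed scheme)."""
    mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    mgr.add_password(None, base_url, user, password)
    return urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(mgr))


def fetch(opener, url, dest, retries=3, pause=30):
    """Stream url to dest in chunks, retrying on errors; return True on success."""
    for attempt in range(1, retries + 1):
        try:
            with opener.open(url, timeout=600) as resp, open(dest, "wb") as out:
                while chunk := resp.read(1 << 16):
                    out.write(chunk)
            return True
        except OSError as err:  # URLError and HTTPError are OSError subclasses
            print(f"{url}: attempt {attempt} failed: {err}")
            if attempt < retries:
                time.sleep(pause)
    return False


if __name__ == "__main__":
    # Hypothetical variable names; credentials are kept out of the script itself.
    opener = make_opener(BASE_URL, os.environ["EPRINTS_USER"], os.environ["EPRINTS_PASS"])
    out_dir = Path("epxml")
    out_dir.mkdir(exist_ok=True)
    # Placeholder range covering 17 years, one download per year.
    for year, url in year_urls(URL_TEMPLATE, 2000, 2016):
        if not fetch(opener, url, out_dir / f"{year}.xml"):
            print(f"giving up on {year}")
```

Splitting by year keeps each download small, so a failure costs one retry of one year rather than the whole 3-hour transfer.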

Andy Reid

Research Information Manager

Executive Office, Room G40a

London School of Hygiene and Tropical Medicine

Keppel St, LONDON, WC1E 7HT

0207-927-2618 (Internal/Teleworker x2618)