EPrints Technical Mailing List Archive


Message: #06382

[EP-tech] Scripted XML download?

To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
Subject: [EP-tech] Scripted XML download?
From: Andy Reid <Andy.REID@lshtm.ac.uk>
Date: Mon, 27 Mar 2017 13:51:32 +0000

Hi,

I do some checking, analysis and visualisation of our repository in a third-party package, which I have set up to ingest EPrints XML. I’d like to update this once a week or so, but if I download everything in one go it takes about 3 hours, comes to 1.5GB, and tends to fail halfway through. I have been doing it manually one year at a time, but that means 17 separate search-and-download operations, each taking ten minutes or so. I don’t have shell access to the server, so I can’t script it from the command line.

I have looked at the search page, but after a search the download form references a cached search ID, so I can’t just copy the URL from the download form.

Can anyone give me a template for a URL that would work in a single pass with wget or libwww, which I could then run from cron to fetch the EPXML? Obviously I would have to be able to authenticate as well…
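For illustration only, here is a minimal Python sketch of the kind of cron-able, one-year-at-a-time fetch described above. It is not an EPrints-supplied method. The URL in `URL_TEMPLATE` is a hypothetical placeholder (the real export URL for a given repository is exactly what this message is asking for), and HTTP Basic authentication is an assumption; a repository that uses a cookie-based login form would need a different approach. The function names are likewise made up for this sketch.

```python
import os
import shutil
import time
import urllib.request
from urllib.parse import urlsplit

# HYPOTHETICAL placeholder: replace with a per-year export URL that works
# on your own repository. "{year}" is substituted for each year in turn.
URL_TEMPLATE = "https://repository.example.org/export?year={year}"


def year_urls(template, first_year, last_year):
    """Return (year, url) pairs for every year in the inclusive range."""
    return [(y, template.format(year=y)) for y in range(first_year, last_year + 1)]


def make_opener(template, user, password):
    """Build a URL opener that answers HTTP Basic auth challenges (an assumption)."""
    parts = urlsplit(template)
    base = f"{parts.scheme}://{parts.netloc}/"
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, base, user, password)
    return urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(manager))


def fetch(opener, url, dest, retries=3, wait=30):
    """Stream url to dest, retrying on failure; return True on success."""
    partial = dest + ".part"
    for attempt in range(1, retries + 1):
        try:
            with opener.open(url, timeout=600) as resp, open(partial, "wb") as out:
                shutil.copyfileobj(resp, out)  # stream, do not hold it all in memory
            os.replace(partial, dest)  # only keep files that downloaded completely
            return True
        except OSError as err:  # URLError, HTTPError and timeouts are all OSError
            print(f"attempt {attempt} for {url} failed: {err}")
            time.sleep(wait)
    return False


def download_all(template, first_year, last_year, user, password, out_dir="."):
    """Fetch one file per year; return the list of years that failed."""
    opener = make_opener(template, user, password)
    failed = []
    for year, url in year_urls(template, first_year, last_year):
        dest = os.path.join(out_dir, f"eprints_{year}.xml")
        if not fetch(opener, url, dest):
            failed.append(year)
    return failed
```

A weekly cron entry could run a small wrapper that calls `download_all(URL_TEMPLATE, first_year, last_year, user, password)` and reports the returned list of failed years. Splitting by year keeps each request small, so one failure costs a single retry rather than the whole 3-hour download.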

Andy Reid

Research Information Manager

Executive Office, Room G40a

London School of Hygiene and Tropical Medicine

Keppel St, LONDON, WC1E 7HT

0207-927-2618 (Internal/Teleworker x2618)
