EPrints Technical Mailing List Archive
Message: #06425
[EP-tech] Linkcheck
- To: eprints-tech@ecs.soton.ac.uk
- Subject: [EP-tech] Linkcheck
- From: martin.braendle@id.uzh.ch
- Date: Fri, 7 Apr 2017 18:03:47 +0200
Hi,
I just wrote a linkcheck crawler that checks the remote URLs stored in an EPrints repo and updates the issues list for URLs that have an invalid format or report HTTP status codes other than 200. Please let me know if there is interest in having it available; if so, I will put it on GitHub. There is some more work to do, e.g. moving some of the methods to a plugin so that they can be called from elsewhere.

Please also be aware that by applying a linkcheck crawler your editorial team may come under strain to fix all the dead links. Our initial run revealed that after 10 years of running our repository, about 25% of the URLs (about 7500 in our case) are not working anymore.

The script also produces a report grouped by HTTP status code and sorted either by eprint id or by URL. Sorting by URL makes it possible to identify patterns so that URLs can be replaced or removed in batch.

Best regards,
Martin

--
Dr. Martin Brändle
Zentrale Informatik
Universität Zürich
Stampfenbachstr. 73
CH-8006 Zürich
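For readers who want a feel for the checks described in the message, here is a minimal sketch in Python. It is not the script from the message (which has not been published here) and it does not touch the EPrints API; the function names, the `(eprint_id, url)` input shape, and the "invalid format" / "no response" labels are all assumptions made for illustration. It validates the URL format, records any HTTP status other than 200, and groups the results by status, sorted by eprint id or by URL.

```python
from collections import defaultdict
from urllib.parse import urlsplit
import urllib.error
import urllib.request


def has_valid_format(url):
    """Return True if the URL has an http(s) scheme and a host part."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived.

    Note: urlopen follows redirects, so this reports the final status.
    Some servers reject HEAD requests; a real crawler might retry with GET.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except (urllib.error.URLError, OSError, ValueError):
        return None


def check_links(records, get_status=fetch_status):
    """Check (eprint_id, url) pairs and return the problematic ones.

    The result is a list of (eprint_id, url, status) tuples, where status
    is an HTTP code, "invalid format", or "no response" (hypothetical labels).
    """
    issues = []
    for eprint_id, url in records:
        if not has_valid_format(url):
            issues.append((eprint_id, url, "invalid format"))
            continue
        status = get_status(url)
        if status != 200:
            label = status if status is not None else "no response"
            issues.append((eprint_id, url, label))
    return issues


def report_by_status(issues, sort_by="url"):
    """Group issues by status; sort each group by "url" or "eprint_id"."""
    key_index = 1 if sort_by == "url" else 0
    grouped = defaultdict(list)
    for eprint_id, url, status in issues:
        grouped[str(status)].append((eprint_id, url))
    return {
        status: sorted(rows, key=lambda row: row[key_index])
        for status, rows in sorted(grouped.items())
    }
```

Passing the status lookup in as `get_status` keeps the checking logic testable without network access. Sorting each group by URL places links from the same host next to each other, which is what makes the batch patterns mentioned in the message easy to spot.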