EPrints Technical Mailing List Archive
Message: #09027
Re: [EP-tech] OAI Harvester broken by new security
- To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>, John Salter <J.Salter@leeds.ac.uk>, "James Kerwin" <jkerwin2101@gmail.com>
- Subject: Re: [EP-tech] OAI Harvester broken by new security
- From: John Salter <J.Salter@leeds.ac.uk>
- Date: Tue, 9 Aug 2022 09:52:14 +0000
CAUTION: This e-mail originated outside the University of Southampton.
Just another quick thought: most harvesters present a user-agent string of either:

- something useful, e.g. 'IRUS_metadata_harvesting_bot' or 'Unpaywall (http://unpaywall.org/; mailto:team@impactstory.org)'
- something software-y, e.g. 'Apache-HttpClient/4.5.1 (Java/11.0.15)', 'pyoai' or 'GuzzleHttp/6.5.5 curl/7.58.0 PHP/7.4.29'

These could also be triggering a WAF (or similar mechanism) to say 'no'.

As the requests are currently being blocked, they probably aren't reaching your Apache logs, but you could check older logs with something like this (assuming you're using the combined log format, which records the user-agent) to get a list of user-agents hitting the OAI endpoint, and how many times each has visited:

[command not preserved in the archived message]

The 'use a double-quote as a delimiter' approach feels a bit hacky, but in this case I think it is easier than splitting on whitespace or another character!

Cheers,
John

From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of John Salter via Eprints-tech

Hi James,

The OAI-PMH resumptionToken isn't that complicated: essentially, the parameters that can be passed to the script directly are URL-encoded. I can see how this might trigger some WAF rules. I think the main approaches are:

- whitelist the OAI-PMH endpoint in the WAF
- whitelist harvesters in the WAF (you might not know all harvesters that visit your repo though!)
- create a ruleset for the OAI-PMH vocabulary to be included in the WAF

The nature of an OAI-PMH harvest could look very much like a bad actor probing your server. The nature of the response payload could also mean the harvest creates peaks in server usage, which could make automated tooling connect the OAI-PMH requests to a DoS-style attack.

Without knowing exactly what's at play it's difficult to make more refined suggestions.
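
As a rough illustration of why a resumption request might trip a generic WAF rule, here is a minimal Python sketch. The token value is hypothetical, because the real EPrints token layout is not shown in this thread. It only assumes, as described above, that the token carries request parameters in URL-encoded form. The `verb` and `resumptionToken` argument names come from the OAI-PMH protocol, and the base URL is the endpoint quoted in this thread.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint quoted in this thread.
BASE_URL = "https://livrepository.liverpool.ac.uk/cgi/oai2"

# Hypothetical token: parameters packed into a query-like string.
# The real EPrints token layout may differ.
hypothetical_token = "metadataPrefix=oai_dc&offset=100"


def resume_url(base_url: str, token: str) -> str:
    """Build a follow-up ListRecords request carrying verb + resumptionToken."""
    query = urlencode({"verb": "ListRecords", "resumptionToken": token})
    return f"{base_url}?{query}"


url = resume_url(BASE_URL, hypothetical_token)
print(url)

# The server side can recover the packed parameters by decoding the value.
recovered = parse_qs(urlsplit(url).query)["resumptionToken"][0]
print(recovered == hypothetical_token)
```

With a token like this, the `=` and `&` inside the value are sent as `%3D` and `%26`. An encoded query string nested inside a parameter value is the sort of pattern that some generic WAF rules may treat as suspicious, although which rule (if any) is firing here is not known from the thread.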
Happy to have an off-list discussion about this, seeing as it's security-related.

Cheers,
John

From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of James Kerwin via Eprints-tech

Hello all,

Hope everyone is doing well. This isn't specifically an EPrints problem, but as you all use EPrints there may be some experience...

We've had some security changes at the uni recently. Some of these result in us clicking buttons in EPrints and then being taken to our IT Services security page. So far we've handled this by accessing via the university network (e.g. VPN).

This issue has now hit our OAI harvester, specifically under "ListRecords" when we click the "Resume" button (https://livrepository.liverpool.ac.uk/cgi/oai2?verb=ListRecords&metadataPrefix=oai_dc).
Currently the organisations that usually harvest our content are unable to. I have spoken with our IT Services team to find a solution. Has anybody else experienced similar issues at their organisation, and are there any steps you think I can take to resolve it?

It doesn't help that I don't know how resumption tokens work. I assume they are stored in a database somewhere? Or a file?

The other instances of this in the repository occur when making changes to file metadata, though not EPrint record metadata.

Thanks,
James
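
John's suggestion at the top of this message (tallying the user-agents that hit the OAI endpoint by splitting each log line on double quotes) could be sketched in Python roughly as follows. This is not the command from the original email. It assumes Apache's combined log format, where the user-agent is the last quoted field, and the function name, the `/cgi/oai2` path prefix default, and the sample log lines are illustrative assumptions.

```python
from collections import Counter


def count_oai_user_agents(lines, path_prefix="/cgi/oai2"):
    """Tally user-agent strings for requests whose path starts with path_prefix.

    Assumes Apache 'combined' log lines. Splitting such a line on '"' gives:
    index 1 = request line, index 3 = referer, index 5 = user-agent.
    """
    counts = Counter()
    for line in lines:
        fields = line.split('"')
        if len(fields) < 7:
            continue  # not a combined-format line; skip it
        request = fields[1].split()  # e.g. ['GET', '/cgi/oai2?verb=...', 'HTTP/1.1']
        if len(request) >= 2 and request[1].startswith(path_prefix):
            counts[fields[5]] += 1
    return counts


# Made-up sample lines for illustration only (documentation IP range).
sample = [
    '192.0.2.1 - - [09/Aug/2022:09:00:00 +0000] "GET /cgi/oai2?verb=ListRecords&metadataPrefix=oai_dc HTTP/1.1" 200 5120 "-" "pyoai"',
    '192.0.2.2 - - [09/Aug/2022:09:01:00 +0000] "GET /cgi/oai2?verb=Identify HTTP/1.1" 200 812 "-" "pyoai"',
    '192.0.2.3 - - [09/Aug/2022:09:02:00 +0000] "GET /cgi/oai2?verb=Identify HTTP/1.1" 200 812 "-" "IRUS_metadata_harvesting_bot"',
    '192.0.2.4 - - [09/Aug/2022:09:03:00 +0000] "GET /view/year/ HTTP/1.1" 200 300 "-" "Mozilla/5.0"',
]

result = count_oai_user_agents(sample)
for agent, hits in result.most_common():
    print(hits, agent)
```

To run it against a real log, pass an open file object instead of `sample`. A user-agent that itself contains an escaped double quote would be split incorrectly, which is the price of the "hacky" delimiter John mentions.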