EPrints Technical Mailing List Archive
Message: #10122
[EP-tech] DDoS of EPrints advanced search
- To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
- Subject: [EP-tech] DDoS of EPrints advanced search
- From: David R Newman <drn@ecs.soton.ac.uk>
- Date: Wed, 28 May 2025 10:26:13 +0100
Hi all,

We have been observing that many EPrints repositories are receiving Distributed Denial-of-Service (DDoS) attacks on their advanced search. Because advanced search queries can put considerable load on the server, this can leave the repository unresponsive.
Analysis of the requests shows that these are typically bots working their way through the pages of results for the same search, rather than many individual searches. Each affected repository typically sees only a few distinct searches, perhaps up to a dozen. The following command shows which searches these are (you may need to adapt it if your access log is elsewhere):
grep "GET /cgi/search/archive/advanced" /var/log/httpd/ssl_access_log | grep -v " 403 " | grep -o 'exp=[^&]\+' | sort | uniq -c | sort -n
Typically, /var/log/httpd/ssl_access_log only covers requests since shortly after midnight on the previous Sunday, if you have the default log rotation in place. So, as it is Wednesday now, you should have a decent sample to analyse.
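To see how widely the source addresses vary for each search, the same log lines can be summarised per expression. The following is only a rough sketch: it assumes the client address is the first field of each log line (as in the common and combined log formats), and search_sources is just an illustrative name.

```shell
# Sketch: for each search expression, print the number of requests and the
# number of distinct client addresses, least requested first.
# Assumption: the client address is the first field of every log line.
# Unlike the grep -o command above, the match also stops at a space, so an
# exp= value at the end of the URL does not run on into the rest of the line.
search_sources() {
    grep "GET /cgi/search/archive/advanced" \
        | grep -v " 403 " \
        | awk '{
              if (match($0, /exp=[^& ]+/)) {
                  key = substr($0, RSTART, RLENGTH)
                  hits[key]++
                  # Count each (expression, address) pair only once.
                  if (!((key, $1) in seen)) { seen[key, $1] = 1; ips[key]++ }
              }
          }
          END { for (k in hits) printf "%d %d %s\n", hits[k], ips[k], k }' \
        | sort -n
}

# Usage: search_sources < /var/log/httpd/ssl_access_log
```

Each output line has the request count, then the number of distinct addresses, then the expression. A search with thousands of requests spread over a very large number of addresses is the pattern described above.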
What we have done with these results is add a LocationMatch block inside the VirtualHost block in the EPrints Apache configuration. Typically, adding this to the HTTPS VirtualHost has been sufficient, but you may also need to add it to the HTTP VirtualHost. It may therefore be worth putting the LocationMatch configuration in a separate file and including it in both VirtualHost blocks.
Let's say your command above found one specific search that had been requested thousands of times in the last few days:
exp=0%7C1%7C-date%2Fcreators_name%2Ftitle%7Carchive%7C-%7Ctitle%3Atitle%3AALL%3AIN%3Afibromyalgia+symptoms%7C-%7Ceprint_status%3Aeprint_status%3AANY%3AEQ%3Aarchive%7Cmetadata_visibility%3Ametadata_visibility%3AANY%3AEQ%3Ashow

You should take this and strip off everything before %7Ctitle, keeping the title but removing the %7C, so that it looks like this*:
title%3Atitle%3AALL%3AIN%3Afibromyalgia+symptoms%7C-%7Ceprint_status%3Aeprint_status%3AANY%3AEQ%3Aarchive%7Cmetadata_visibility%3Ametadata_visibility%3AANY%3AEQ%3Ashow

You can then add it to the Apache configuration as follows, being sure to escape any plus (+) symbols with a '\':
<LocationMatch "^/cgi/search/archive/advanced">
  <If "%{QUERY_STRING} =~ /title%3Atitle%3AALL%3AIN%3Afibromyalgia\+symptoms%7C-%7Ceprint_status%3Aeprint_status%3AANY%3AEQ%3Aarchive%7Cmetadata_visibility%3Ametadata_visibility%3AANY%3AEQ%3Ashow/">
    Require all denied
  </If>
</LocationMatch>

If you have multiple searches you want to block, it is clearer to add an additional If block for each search expression (rather than trying to match multiple search expressions in the same regular expression). Once you have finished adding these, run the appropriate Apache commands to check the configuration and reload, e.g.:
apachectl configtest
apachectl graceful

We have found this quite effective in dealing with the problem. However, it does mean that some genuine users performing these same searches legitimately will get the first page of results OK but then receive a 403 Forbidden response if they try to re-order the results or go to the next page (only for these specific searches). This is not ideal, but there is really no other straightforward way to handle the problem, as the IP addresses vary so widely, and doing nothing may mean long periods where no user can access any part of the EPrints repository.
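When there are several searches to block, the configuration step can be scripted. The following is a minimal sketch rather than something we run as-is: make_blocklist is a hypothetical helper name, it assumes each input line is an expression that has already been stripped as described above, and it only escapes plus symbols, so any other regular expression metacharacters in an expression would still need escaping by hand.

```shell
# Sketch: print a LocationMatch block with one <If> per blocked search.
# Reads stripped, URL-encoded search expressions from stdin, one per line.
# Assumption: "+" is the only character that needs escaping in the patterns.
make_blocklist() {
    printf '<LocationMatch "^/cgi/search/archive/advanced">\n'
    # Drop empty lines and escape each "+" as "\+".
    sed -e '/^$/d' -e 's/+/\\+/g' \
        | while IFS= read -r pattern; do
              printf '  <If "%%{QUERY_STRING} =~ /%s/">\n' "$pattern"
              printf '    Require all denied\n'
              printf '  </If>\n'
          done
    printf '</LocationMatch>\n'
}

# Usage (both file names are made up for illustration):
#   make_blocklist < blocked_searches.txt > eprints_blocked_searches.conf
```

The generated file could then be pulled into both VirtualHost blocks with Apache's Include directive, followed by the usual apachectl configtest and apachectl graceful.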
If anyone has any suggestions on how to refine this configuration, then please share.
Thanks and regards

David Newman

*Sometimes requests URL-encode the / symbol as %2F and sometimes they do not, so removing everything up to %7Ctitle ensures that the pattern you are matching on covers both the encoded and un-encoded versions.
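A candidate pattern can be checked against sample query strings before it goes into the Apache configuration. This is only a rough sketch using grep -E rather than Apache itself, on the assumption that for simple patterns like this one the two regular expression flavours behave the same. The name pattern_matches and the shortened sample strings are illustrative only.

```shell
# Sketch: check a candidate pattern against sample query strings.
# Assumption: grep -E (ERE) and Apache's regex engine agree for patterns
# made up only of literal characters and escaped plus symbols.
pattern_matches() {
    # $1 = regular expression, $2 = query string
    printf '%s\n' "$2" | grep -Eq -e "$1"
}

pattern='title%3Atitle%3AALL%3AIN%3Afibromyalgia\+symptoms%7C-%7Ceprint_status'

# The same search with the "/" in the sort order encoded and un-encoded.
suffix='%7Carchive%7C-%7Ctitle%3Atitle%3AALL%3AIN%3Afibromyalgia+symptoms%7C-%7Ceprint_status%3Aeprint_status%3AANY%3AEQ%3Aarchive'
encoded="exp=0%7C1%7C-date%2Fcreators_name%2Ftitle$suffix"
unencoded="exp=0%7C1%7C-date/creators_name/title$suffix"

for query in "$encoded" "$unencoded"; do
    if pattern_matches "$pattern" "$query"; then
        echo "blocked"
    else
        echo "not blocked"
    fi
done
```

Because the pattern starts at title, both sample strings should be reported as blocked.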