EPrints Technical Mailing List Archive


Message: #10124



RE: [EP-tech] DDoS of EPrints advanced search



Hi all

We can confirm David's observation (as my colleague Martin already posted). For some of you it may also help to look for botnets at the class B level, i.e. grouped by the first two octets of the IP address:


$ grep "cgi\/search\/.*advanced" /var/log/httpd/access_log_xxxxxxx | awk '{split($1, ip, "."); class_b=ip[1]"."ip[2]; class_b_count[class_b]++; unique_ips[class_b][$1]=1} END {for (class_b in class_b_count) { printf "%s\t\t%d\t\t%d\n", class_b, class_b_count[class_b], length(unique_ips[class_b]) }}'  | sort -k2,2nr | head -n 10
177.37          8294            5644
179.125         7461            5304
187.19          5980            4061
191.5           5408            3226
130.60          5154            10
177.22          4212            2658
170.254         4181            2538
45.70           4157            2658
179.108         4123            2652
168.232         4105            2635
$

Here the first column is the class B prefix (the first two octets of the IP address), the second is the number of requests, and the third is the number of distinct IPs within that prefix. Replace xxxxxxx with your access_log name extension.
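The one-liner above relies on arrays of arrays (unique_ips[class_b][$1]), which is a GNU awk extension. For systems with a different awk, here is a minimal sketch of the same count using a flat composite key instead. The function name count_by_prefix is made up for illustration, and it assumes the client IPv4 address is the first field of each log line:

```shell
# Summarise advanced-search requests per class B prefix (first two octets).
# Reads access-log lines on stdin; assumes the client IPv4 address is field 1.
# Output columns (tab-separated): prefix, request count, distinct client IPs.
count_by_prefix() {
    grep 'cgi/search/.*advanced' |
    awk '{
        split($1, ip, ".")
        prefix = ip[1] "." ip[2]
        hits[prefix]++
        # Count each client IP only once per prefix via a composite key.
        if (!seen[prefix, $1]++) distinct[prefix]++
    }
    END {
        for (p in hits) printf "%s\t%d\t%d\n", p, hits[p], distinct[p]
    }' |
    sort -k2,2nr |
    head -n 10
}
```

It would be called as: count_by_prefix < /var/log/httpd/access_log_xxxxxxx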

Good luck
Jens

--
Jens Witzel
Universität Zürich
Zentrale Informatik
Pfingstweidstrasse 60B
CH-8005 Zürich

mail:  jens.witzel@uzh.ch
phone: +41 44 63 56777
http://www.zi.uzh.ch/

-----Original Message-----
From: eprints-tech-request@ecs.soton.ac.uk <eprints-tech-request@ecs.soton.ac.uk> On Behalf Of David R Newman
Sent: Wednesday, 28 May 2025 11:26
To: eprints-tech@ecs.soton.ac.uk
Subject: [EP-tech] DDoS of EPrints advanced search

Hi all,

We have been observing that a lot of EPrints repositories have been receiving Distributed Denial-of-Service (DDoS) attacks on their advanced search.  As running advanced search queries can put quite a lot of load on the server, this can lead to the repository becoming unresponsive.

Analysis of the requests has shown that they typically come from bots working their way through the pages of search results for the same search, rather than from lots of individual searches. Typically, each affected repository will only have a few, maybe up to a dozen, different actual searches.  The following command will allow you to see what searches these are. (You may need to adapt this if your access log is elsewhere):

grep "GET /cgi/search/archive/advanced" /var/log/httpd/ssl_access_log | grep -v " 403 " | grep -o 'exp=[^&]\+' | sort | uniq -c | sort -n

Typically, /var/log/httpd/ssl_access_log will only cover the requests since sometime just after midnight on the previous Sunday, if you have the default log rotation in place.  So, as it is Wednesday now, you should have a decent sample to analyse.

What we have done with these results is add a LocationMatch block inside the VirtualHost block in the EPrints Apache configuration. Typically, adding this for the HTTPS VirtualHost has been sufficient, but you may also need to add it to the HTTP VirtualHost.  Therefore, it may be worth putting the LocationMatch configuration in a separate file and then including it under both VirtualHost blocks.

Let's say your command above found one specific search that had been requested thousands of times in the last few days:

exp=0%7C1%7C-date%2Fcreators_name%2Ftitle%7Carchive%7C-%7Ctitle%3Atitle%3AALL%3AIN%3Afibromyalgia+symptoms%7C-%7Ceprint_status%3Aeprint_status%3AANY%3AEQ%3Aarchive%7Cmetadata_visibility%3Ametadata_visibility%3AANY%3AEQ%3Ashow

You should take this and strip off everything before %7Ctitle, keeping the title but removing the %7C, so it would look like*:

title%3Atitle%3AALL%3AIN%3Afibromyalgia+symptoms%7C-%7Ceprint_status%3Aeprint_status%3AANY%3AEQ%3Aarchive%7Cmetadata_visibility%3Ametadata_visibility%3AANY%3AEQ%3Ashow

You can then add it to the Apache configuration as follows, being sure to escape any plus (+) symbols with a backslash (\):

<LocationMatch "^/cgi/search/archive/advanced">
   <If "%{QUERY_STRING} =~ /title%3Atitle%3AALL%3AIN%3Afibromyalgia\+symptoms%7C-%7Ceprint_status%3Aeprint_status%3AANY%3AEQ%3Aarchive%7Cmetadata_visibility%3Ametadata_visibility%3AANY%3AEQ%3Ashow/">
     Require all denied
   </If>
</LocationMatch>
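If there are several searches to block, the stripping and escaping steps described above can be scripted. The following is only a sketch: exp_to_if_block is a hypothetical helper name, and it assumes the first search field is "title" (as in the example) unless another field name is passed as the second argument. Only plus signs are escaped, so any other regular expression metacharacters in a search value would need the same treatment by hand.

```shell
# Print an Apache <If> block that denies one specific advanced search.
# $1: the "exp=..." value as it appears in the access log.
# $2: name of the first search field (assumption: defaults to "title").
exp_to_if_block() {
    exp=$1
    field=${2:-title}
    # Remove the shortest prefix ending in "%7C<field>", then put the field back.
    rest=${exp#*%7C"$field"}
    if [ "$rest" = "$exp" ]; then
        echo "marker %7C$field not found in expression" >&2
        return 1
    fi
    # Escape plus signs so they match literally in the regular expression.
    pattern=$(printf '%s\n' "$field$rest" | sed 's/+/\\+/g')
    printf '   <If "%%{QUERY_STRING} =~ /%s/">\n' "$pattern"
    printf '     Require all denied\n'
    printf '   </If>\n'
}
```

The output is meant to be pasted inside the LocationMatch block and checked by eye before reloading Apache.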

If you have multiple searches you want to block, it is clearer to add an additional If block for each search expression (rather than trying to match multiple search expressions in the same regular expression).  Once you have finished adding these, run the appropriate Apache commands to check the config and reload.  E.g.

apachectl configtest
apachectl graceful
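The two commands can also be chained so that the reload only happens when the configuration test succeeds. A small sketch, where reload_if_valid and the APACHECTL override variable are names invented here for illustration:

```shell
# Reload Apache only when the configuration test passes.
# APACHECTL may be set if the control script has a different name or path.
reload_if_valid() {
    ctl=${APACHECTL:-apachectl}
    if "$ctl" configtest; then
        "$ctl" graceful
    else
        echo "configtest failed; Apache was not reloaded" >&2
        return 1
    fi
}
```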

We have found this to be quite effective in dealing with the problem.  However, it does mean that some genuine users performing these same searches will be affected: the first page should return OK, but if they try to re-order the results or go to the next page they will get a 403 Forbidden response (only for these specific searches). This is not ideal, but there is really no other straightforward way to handle the problem, as the IP addresses vary so widely and doing nothing may mean long periods where no user can access any part of the EPrints repository.
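To see whether the block is taking effect, one option is to count advanced search requests by HTTP status code and watch the 403 share grow after the reload. This is a sketch only: search_status_counts is a made-up name, and it assumes the Common or Combined Log Format, where the status code is the 9th whitespace-separated field.

```shell
# Count advanced-search requests by HTTP status code.
# Reads access-log lines on stdin; assumes Common/Combined Log Format,
# where the status code is the 9th whitespace-separated field.
search_status_counts() {
    grep 'GET /cgi/search/archive/advanced' |
    awk '{ count[$9]++ } END { for (s in count) printf "%s\t%d\n", s, count[s] }' |
    sort
}
```

It would be called as: search_status_counts < /var/log/httpd/ssl_access_log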

If anyone has any suggestions on how to refine this configuration, then please share.

Thanks and regards

David Newman

*Sometimes requests URL-encode the / symbol as %2F and sometimes they don't, so removing everything up to %7Ctitle ensures that the pattern you are matching on covers both the encoded and unencoded versions.