EPrints Technical Mailing List Archive
Message: #09785
RE: [EP-tech] Bots - Server Resources
- To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
- Subject: RE: [EP-tech] Bots - Server Resources
- From: John Salter <J.Salter@leeds.ac.uk>
- Date: Wed, 24 Jul 2024 14:03:33 +0000
CAUTION: This e-mail originated outside the University of Southampton.
Hi James,

It looks like the IP address in your logs is a Liverpool one? It might be worth looking at where those requests are actually originating from (the X-Forwarded-For, or XFF, headers). This might be easiest to ask your friendly IT people about; they should have access to the perimeter logs.
Client -> Liverpool LoadBalancer -> Liverpool Proxy -> Liverpool EPrints Server –
Client -> Some random network stuff -> Liverpool LoadBalancer -> Liverpool Proxy -> Liverpool EPrints Server
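To illustrate the XFF idea, here is a minimal Python sketch, assuming the load balancer or proxy appends addresses to an X-Forwarded-For header and that Apache is configured to log it (for example with `%{X-Forwarded-For}i` in the LogFormat). The function name `original_client`, the trusted-proxy list and the client address 203.0.113.7 (a documentation-only address) are hypothetical; only 138.253.158.16 comes from the logs in this thread.

```python
import ipaddress


def original_client(xff_header, trusted_proxies):
    """Return the right-most XFF address that is not one of our own proxies.

    Walking from the right matters: entries on the left are supplied by the
    client and can be forged, while entries on the right were appended by
    infrastructure we control.
    """
    nets = [ipaddress.ip_network(n) for n in trusted_proxies]
    hops = [h.strip() for h in xff_header.split(",") if h.strip()]
    for hop in reversed(hops):
        try:
            addr = ipaddress.ip_address(hop)
        except ValueError:
            return None  # malformed entry: do not trust anything to its left
        if not any(addr in net for net in nets):
            return str(addr)
    return None  # every hop was one of our own proxies


# Hypothetical header value: an outside client followed by the Liverpool hop.
example = original_client("203.0.113.7, 138.253.158.16", ["138.253.158.16/32"])
print(example)  # 203.0.113.7
```

If the header only ever contains the load balancer's own address, the function returns None, which would match the situation described above where the EPrints server never sees the real client.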
I was going to suggest running something like: but I think that this will only show the connections from your load balancer, rather than the real client IPs.

Cheers,
John

From: eprints-tech-request@ecs.soton.ac.uk <eprints-tech-request@ecs.soton.ac.uk> On Behalf Of James Kerwin
Hi everyone,

I'm having an incredibly rough time with my server. Apache keeps getting killed by the OOM Killer because I'm out of memory. Mostly I can restart Apache, but it also sometimes kills a process that leaves me unable to log on until IT turn the server off and on again (I'm unable to do this myself).

I've been watching Top output all morning and see the memory and CPU usage shoot up for /usr/bin/apach running as the eprints user. Looking in my /var/log/apache2/other_vhosts_access.log file I can see LOADS of requests for stats pages under /cgi/stats/report. I suspect it's a crawler as an army of humans could never submit so many requests.

I do have a robots.txt file in /opt/eprints3/archives/uolrepo/html/en that specifically disallows the /cgi/ directory, but that is either incorrect, being ignored or (likely) I'm not understanding things correctly.

livrepository.liverpool.ac.uk:443 138.253.158.16 - - [24/Jul/2024:13:53:23 +0100] "GET /cgi/stats/report/eprint/3151894?range=2016 HTTP/1.0" 200 70049 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.6478.126 Mobile Safari/537.36 (compatible; GoogleOther)"
livrepository.liverpool.ac.uk:443 138.253.158.16 - - [24/Jul/2024:13:53:23 +0100] "GET /cgi/stats/report/eprint/3108433/requests?range=2019 HTTP/1.0" 200 72878 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.6478.126 Mobile Safari/537.36 (compatible; GoogleOther)"
livrepository.liverpool.ac.uk:443 138.253.158.16 - - [24/Jul/2024:13:53:26 +0100] "GET /cgi/stats/report/eprint/3006551?range=2017 HTTP/1.0" 200 70049 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.6478.126 Mobile Safari/537.36 (compatible; GoogleOther)"
livrepository.liverpool.ac.uk:443 138.253.158.16 - - [24/Jul/2024:13:53:27 +0100] "GET /cgi/stats/report/eprint/3033512/compare_years?range=2021 HTTP/1.0" 200 87158 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.6478.126 Mobile Safari/537.36 (compatible; GoogleOther)"
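One way to check the crawler suspicion is to tally the log by self-declared crawler name and top-level path, and to test each requested path against the robots.txt rule. The sketch below is only an illustration: the regular expression assumes the line layout seen in the excerpts above, the robots.txt content is an assumption (the actual file is not shown in this thread), and the names `summarise`, `LOG_RE` and `BOT_RE` are hypothetical.

```python
import re
from collections import Counter
from urllib.robotparser import RobotFileParser

# Line layout assumed from the other_vhosts_access.log excerpts above.
LOG_RE = re.compile(
    r'(?P<vhost>\S+) (?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
# Many crawlers announce themselves as "(compatible; Name...)".
BOT_RE = re.compile(r'\(compatible; ([^;)/]+)')


def summarise(lines, robots_lines):
    """Count hits per (crawler, top-level path) and hits that robots.txt disallows."""
    robots = RobotFileParser()
    robots.parse(robots_lines)
    hits = Counter()
    disallowed = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip lines that do not fit the assumed layout
        bot = BOT_RE.search(m["agent"])
        name = bot.group(1).strip() if bot else "other"
        prefix = "/" + m["path"].lstrip("/").split("/", 1)[0]
        hits[(name, prefix)] += 1
        if not robots.can_fetch(name, m["path"]):
            disallowed[name] += 1
    return hits, disallowed


UA = ('Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) '
      'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.6478.126 '
      'Mobile Safari/537.36 (compatible; GoogleOther)')
SAMPLE = [
    'livrepository.liverpool.ac.uk:443 138.253.158.16 - - '
    '[24/Jul/2024:13:53:23 +0100] '
    '"GET /cgi/stats/report/eprint/3151894?range=2016 HTTP/1.0" '
    '200 70049 "-" "' + UA + '"',
    'livrepository.liverpool.ac.uk:443 138.253.158.16 - - '
    '[24/Jul/2024:13:53:27 +0100] '
    '"GET /cgi/stats/report/eprint/3033512/compare_years?range=2021 HTTP/1.0" '
    '200 87158 "-" "' + UA + '"',
]
# Assumed robots.txt content; the real file is not shown in the thread.
ASSUMED_ROBOTS = ["User-agent: *", "Disallow: /cgi/"]

hits, disallowed = summarise(SAMPLE, ASSUMED_ROBOTS)
print(hits.most_common(5))
print(disallowed)
```

A non-zero count in `disallowed` means the requests fall under the assumed rule, so either the crawler is not honouring it or the robots.txt actually served at the site root differs from the one assumed here.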