EPrints Technical Mailing List Archive
See the EPrints wiki for instructions on how to join this mailing list and related information.
Message: #09798
Re: [EP-tech] Bots - Server Resources
- To: eprints-tech@ecs.soton.ac.uk
- Subject: Re: [EP-tech] Bots - Server Resources
- From: James Kerwin <jkerwin2101@gmail.com>
- Date: Thu, 25 Jul 2024 15:33:09 +0100
CAUTION: This e-mail originated outside the University of Southampton.
Hi James,
Do you have a max_menu_age or max_list_age set for that view? (see: https://wiki.eprints.org/w/Views.pl)
If you are periodically regenerating that view, it might be worth setting one to a high value – maybe a month, if you are regenerating that view weekly.
It’s also worth checking that you haven’t left the ‘developer mode’ on for that archive, as that will mean the views get regenerated for every request!
Cheers,
John
From: eprints-tech-request@ecs.soton.ac.uk <eprints-tech-request@ecs.soton.ac.uk> On Behalf Of James Kerwin
Sent: Thursday, July 25, 2024 12:40 PM
To: eprints-tech@ecs.soton.ac.uk
Subject: Re: [EP-tech] Bots - Server Resources
CAUTION: External Message. Use caution opening links and attachments.
Hello!
Just to tie this off. It appears that it wasn't bots or harvesting that was causing us problems. I continued having repository problems after the blocking.
It was the people/authors/creators view. I'd re-enabled it after running generate_views for that specific view. I can't explain why, but if someone requested a page from the people view it caused a huge surge in CPU and memory usage, which ultimately led to the OOM Killer doing its job.
I'm sure this is a problem with our setup rather than the view itself, so I don't want to cause any unnecessary concern. Why it's doing this I don't know. Maybe the generate_views didn't run successfully and someone has requested a page that hadn't been created which put a huge strain on things.
If anyone has any thoughts on this I'm all ears, otherwise thank you for all the help yesterday! I am still going to employ the code David sent as bots have caused us grief in the recent past and I'm keen to test it.
Thanks,
James
On Wed, Jul 24, 2024 at 8:37 PM John Salter <J.Salter@leeds.ac.uk> wrote:
Hi James,
It looks like the IP address in your logs is a Liverpool one..?
IIRC you are behind some new proxy/load balancing stuff.
It might be worth looking at where those requests are actually originating from (X-Forwarded-For – XFF headers). This might be easiest to ask your friendly IT people about – they should have access to the perimeter logs.
The X-Forwarded-For header should only be trusted to the last machine you’re in control of e.g.
Client -> Liverpool LoadBalancer -> Liverpool Proxy -> Liverpool EPrints Server
- You can trust the XFF added by the Liverpool proxy.
Client -> Some random network stuff -> Liverpool LoadBalancer -> Liverpool Proxy -> Liverpool EPrints Server
- Don’t trust what ‘some random network stuff’ tells you.
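The trust rule above can be sketched in a few lines of Python. This is only an illustration, not anything EPrints or Apache ships: the function name `client_ip`, the `trusted_networks` argument and the example addresses are all made up. The idea is to walk the chain from the EPrints server backwards and stop at the first hop that is not one of your own machines.

```python
import ipaddress


def client_ip(remote_addr, xff_header, trusted_networks):
    """Return the right-most address in the chain that is not a trusted proxy.

    remote_addr      -- peer address of the TCP connection (the last proxy)
    xff_header       -- raw X-Forwarded-For value, e.g. "a, b, c"
    trusted_networks -- CIDR ranges of machines you control (an assumption here)
    """
    trusted = [ipaddress.ip_network(n) for n in trusted_networks]

    def is_trusted(addr):
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            # Unparsable values are never treated as one of our proxies.
            return False
        return any(ip in net for net in trusted)

    # Each proxy appends the address it saw, so the right-hand end of the
    # chain is the part written by machines closest to the server.
    hops = [h.strip() for h in xff_header.split(",") if h.strip()]
    hops.append(remote_addr)

    for addr in reversed(hops):
        if not is_trusted(addr):
            return addr
    # Every hop was one of ours: fall back to the left-most entry.
    return hops[0]


if __name__ == "__main__":
    # Hypothetical internal range and documentation-only client addresses.
    print(client_ip("10.0.0.2", "198.51.100.9, 203.0.113.7, 10.0.0.1", ["10.0.0.0/8"]))
```

Anything to the left of the returned address was supplied by a machine outside your control, so it can be spoofed and is ignored.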
I was going to suggest running something like:
root> netstat -panto
but I think that this will only show the connections from your load balancer, rather than the real Client IPs.
Cheers,
John
From: eprints-tech-request@ecs.soton.ac.uk <eprints-tech-request@ecs.soton.ac.uk> On Behalf Of James Kerwin
Sent: Wednesday, July 24, 2024 2:28 PM
To: eprints-tech@ecs.soton.ac.uk
Subject: [EP-tech] Bots - Server Resources
Hi everyone,
I'm having an incredibly rough time with my server. Apache keeps getting killed by the OOM Killer because I'm out of memory. Mostly I can restart Apache, but it also sometimes kills a process that leaves me unable to log on until IT turn the server off and on again (I'm unable to do this myself).
I've been watching Top output all morning and see the memory and CPU usage shoot up for /usr/bin/apach running as the eprints user.
Looking in my /var/log/apache2/other_vhosts_access.log file I can see LOADS of requests for stats pages under /cgi/stats/report. I suspect it's a crawler as an army of humans could never submit so many requests.
I do have a robots.txt file in /opt/eprints3/archives/uolrepo/html/en that specifically disallows the /cgi/ directory, but that is either incorrect, being ignored or (likely) I'm not understanding things correctly.
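One way to rule out the "incorrect" possibility is to feed the rules to Python's standard `urllib.robotparser` and ask whether a given URL would be allowed. The robots.txt content below is an assumption (the actual file was not posted), and `is_allowed` is a helper name invented for this sketch. Note that crawlers only look for the file at `/robots.txt` on the host root, so it is also worth confirming the file is actually served there.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules: a blanket disallow on /cgi/ for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    base = "https://livrepository.liverpool.ac.uk"
    for path in ("/cgi/stats/report/eprint/3151894?range=2016", "/view/"):
        print(path, is_allowed(ROBOTS_TXT, "GoogleOther", base + path))
```

This only shows what a well-behaved crawler should do with those rules; it says nothing about whether a particular bot honours them.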
Here is an example of the requests:
livrepository.liverpool.ac.uk:443 138.253.158.16 - - [24/Jul/2024:13:53:23 +0100] "GET /cgi/stats/report/eprint/3151894?range=2016 HTTP/1.0" 200 70049 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.6478.126 Mobile Safari/537.36 (compatible; GoogleOther)"
livrepository.liverpool.ac.uk:443 138.253.158.16 - - [24/Jul/2024:13:53:23 +0100] "GET /cgi/stats/report/eprint/3108433/requests?range=2019 HTTP/1.0" 200 72878 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.6478.126 Mobile Safari/537.36 (compatible; GoogleOther)"
livrepository.liverpool.ac.uk:443 138.253.158.16 - - [24/Jul/2024:13:53:26 +0100] "GET /cgi/stats/report/eprint/3006551?range=2017 HTTP/1.0" 200 70049 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.6478.126 Mobile Safari/537.36 (compatible; GoogleOther)"
livrepository.liverpool.ac.uk:443 138.253.158.16 - - [24/Jul/2024:13:53:27 +0100] "GET /cgi/stats/report/eprint/3033512/compare_years?range=2021 HTTP/1.0" 200 87158 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.6478.126 Mobile Safari/537.36 (compatible; GoogleOther)"
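To get a feel for how much of the traffic is aimed at the stats pages, log lines in this format can be tallied with a short script. This is a rough sketch that assumes every line follows the layout shown above (vhost, IP, two placeholder fields, timestamp, request, status, size, referer, user agent). The names `LOG_LINE` and `summarise` are made up for the example.

```python
import re
from collections import Counter

# Pattern for the "vhost:port ip - - [time] "request" status size "referer" "agent"" layout.
LOG_LINE = re.compile(
    r'^(?P<vhost>\S+) (?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Two of the lines quoted above, used as sample input.
SAMPLE = [
    'livrepository.liverpool.ac.uk:443 138.253.158.16 - - [24/Jul/2024:13:53:23 +0100] '
    '"GET /cgi/stats/report/eprint/3151894?range=2016 HTTP/1.0" 200 70049 "-" '
    '"Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/126.0.6478.126 Mobile Safari/537.36 (compatible; GoogleOther)"',
    'livrepository.liverpool.ac.uk:443 138.253.158.16 - - [24/Jul/2024:13:53:23 +0100] '
    '"GET /cgi/stats/report/eprint/3108433/requests?range=2019 HTTP/1.0" 200 72878 "-" '
    '"Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/126.0.6478.126 Mobile Safari/537.36 (compatible; GoogleOther)"',
]


def summarise(lines, prefix="/cgi/stats/report"):
    """Count requests per user agent, and total bytes sent, for paths under prefix."""
    by_agent = Counter()
    total_bytes = 0
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m or not m.group("path").startswith(prefix):
            continue  # skip unparsable lines and other URLs
        by_agent[m.group("agent")] += 1
        size = m.group("size")
        total_bytes += int(size) if size.isdigit() else 0  # "-" means no body
    return by_agent, total_bytes


if __name__ == "__main__":
    agents, sent = summarise(SAMPLE)
    for agent, count in agents.most_common():
        print(count, agent)
    print("bytes sent:", sent)
```

To run it against the real file, pass an open file handle for other_vhosts_access.log instead of `SAMPLE`. Bear in mind that the IP column here is the proxy's address, so the user agent is the more useful key to group on.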
*** Options: https://wiki.eprints.org/w/Eprints-tech_Mailing_List
*** Archive: https://www.eprints.org/tech.php/
*** EPrints community wiki: https://wiki.eprints.org/