EPrints Technical Mailing List Archive
Message: #05847
Re: [EP-tech] Seeing unusually high downloads in IRStats
- To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
- Subject: Re: [EP-tech] Seeing unusually high downloads in IRStats
- From: "Graham, Clinton T" <ctgraham@pitt.edu>
- Date: Tue, 26 Jul 2016 12:13:37 +0000
The University of Pittsburgh opened ticket UCM000000270852 with Bing Webmaster Support last week regarding this and received the following response:

Thank you for contacting Bing Webmaster Support. The activity you are seeing is most likely caused by one of our bots used for verifying your site rather than indexing your site as Bingbot does. These crawlers do not have the same UA, and are in place to make sure the verification aspects of your site are in place.

Yesterday, we requested additional information on what “verification” really means, and described the problem of conflating user-generated activity with bot-generated activity, especially for the scholarly publication process.

I’ll reply again here if this support request goes anywhere, but perhaps others might be interested in similarly engaging Bing Webmaster Support?

Enjoy,
- Clinton Graham
Systems Developer
University of Pittsburgh | University Library System
412-383-1057

From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk]
On Behalf Of Coles, Elizabeth A. (Betsy)

Forwarding from the JISC-REPOSITORIES list. We’ve been seeing this in California too, and our IRStats2 counts have been through the roof for the last couple of weeks.

Can anyone tell me how to filter out these robots in IRStats2? And how to clean the access file so that our IRStats2 reports are not distorted by this deluge? I assume I’d want to delete all entries with a requester_id in the table below and rerun the IRStats2 setup from scratch.

Thanks,
Betsy Coles
Caltech – Digital Library Development

From: Repositories discussion list [mailto:JISC-REPOSITORIES@JISCMAIL.AC.UK]
On Behalf Of Hilary Jones

Hi everyone,

There was a discussion on the UKCORR mailing list about why exceptionally high download counts are appearing in IRStats this week and what might be causing them. After some investigation we have found that the unusually high downloads are down to four IP ranges:

These IPs have been systematically trawling and downloading files from many UK repositories. Their User Agent strings do not declare them as bots; instead they masquerade as normal users.

Happily, the IRUS-UK ingest has been filtering out these robotic downloads, so you won’t see a massive spike in your IRUS-UK stats.

We hope this is of help.

Best wishes
Hilary
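
As a rough illustration of the cleanup asked about earlier in the thread (dropping access entries whose requester_id falls in the offending IP ranges), here is a minimal Python sketch. It is not an IRStats2 or EPrints API. It assumes access records have been exported as dictionaries with a requester_id field holding a plain IP address string, and the function names is_suspect and split_records are hypothetical. The four ranges reported in the thread are not reproduced in this archive, so the ranges below are placeholders taken from the RFC 5737 documentation blocks.

```python
import ipaddress

# Placeholder ranges only (RFC 5737 documentation blocks). The four ranges
# reported in the thread are not available here; substitute the real ones.
SUSPECT_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/25")
]


def is_suspect(requester_id):
    """Return True if requester_id parses as an IP inside a suspect range."""
    try:
        addr = ipaddress.ip_address(requester_id)
    except ValueError:
        # Not a parseable IP address (assumption: keep such records).
        return False
    return any(addr in net for net in SUSPECT_RANGES)


def split_records(records):
    """Split access records into (kept, dropped) lists by requester_id."""
    kept, dropped = [], []
    for rec in records:
        if is_suspect(rec["requester_id"]):
            dropped.append(rec)
        else:
            kept.append(rec)
    return kept, dropped


if __name__ == "__main__":
    # Hypothetical sample rows, for illustration only.
    sample = [
        {"accessid": 1, "requester_id": "192.0.2.17"},
        {"accessid": 2, "requester_id": "203.0.113.5"},
    ]
    kept, dropped = split_records(sample)
    print("kept:", [r["accessid"] for r in kept])
    print("dropped:", [r["accessid"] for r in dropped])
```

Reviewing the dropped list before deleting anything from the live access table seems prudent, since the statistics would then need to be regenerated from the cleaned data.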