EPrints Technical Mailing List Archive
Message: #05849
Re: [EP-tech] Seeing unusually high downloads in IRStats
- To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
- Subject: Re: [EP-tech] Seeing unusually high downloads in IRStats
- From: "Graham, Clinton T" <ctgraham@pitt.edu>
- Date: Tue, 26 Jul 2016 14:23:01 +0000
What do you propose that User Agent match be? We found each of the following coming from Bing, among others:

Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.122 Safari/537.36 SE 2.X MetaSr 1.0
Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36
Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.122 Safari/537.36 SE 2.X MetaSr 1.0
Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.122 Safari/537.36 SE 2.X MetaSr 1.0
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.122 Safari/537.36 SE 2.X MetaSr 1.0

We asked Bing Support to describe any existing pattern for identification, or else to comply with RFC 2616 section 14.22's use of the From header in such a way that we could recommend to Project COUNTER that this be considered for bot identification.

Enjoy,
- Clinton Graham
Systems Developer
University of Pittsburgh | University Library System
412-383-1057

-----Original Message-----
From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of Yuri
Sent: Tuesday, July 26, 2016 9:21 AM
To: eprints-tech@ecs.soton.ac.uk
Subject: Re: [EP-tech] Seeing unusually high downloads in IRStats

With Apache:

RewriteEngine On
RewriteCond %{HTTP:User-Agent} (?:Yandex|msnbot|Owlinbo|sistrix|genieo|proximic|MJ12bot|AhrefsBot|searchmetrics|SearchmetricsBot|Baidu) [NC]
RewriteRule .? - [F]

Just add the guilty ones. Problem solved :-D

On 26/07/2016 14:13, Graham, Clinton T wrote:
>
> The University of Pittsburgh opened ticket UCM000000270852 with Bing Webmaster Support last week regarding this and received the following response:
>
> Thank you for contacting Bing Webmaster Support.
> The activity you are seeing is most likely caused by one of our bots used for verifying your site rather than indexing your site as Bingbot does. These crawlers do not have the same UA, and are in place to make sure the verification aspects of your site are in place.
>
> Yesterday, we requested additional information on what "verification" really means, and described the problem of conflating user-generated activity with bot-generated activity, especially for the scholarly publication process.
>
> I'll reply again here if this support request goes anywhere, but perhaps others might be interested in similarly engaging Bing Webmaster Support?
>
> Enjoy,
> - Clinton Graham
> Systems Developer
> University of Pittsburgh | University Library System
> 412-383-1057
>
> *From:* eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] *On Behalf Of* Coles, Elizabeth A. (Betsy)
> *Sent:* Monday, July 25, 2016 7:45 PM
> *To:* eprints-tech@ecs.soton.ac.uk
> *Subject:* [EP-tech] Seeing unusually high downloads in IRStats
>
> Forwarding from the JISC-REPOSITORIES list: we've been seeing this in California too, and our IRStats2 counts have been through the roof for the last couple of weeks.
>
> Can anyone tell me how to filter out these robots in IRStats2? And how to clean the access file so that our IRStats2 reports are not distorted by this deluge? I assume I'd want to delete all entries whose requester_id falls in the IP ranges in the table below and rerun the IRStats2 setup from scratch.
>
> Thanks,
> Betsy Coles
> Caltech - Digital Library Development
> bcoles@caltech.edu <mailto:bcoles@caltech.edu>
>
> *From:* Repositories discussion list [mailto:JISC-REPOSITORIES@JISCMAIL.AC.UK] *On Behalf Of* Hilary Jones
> *Sent:* Friday, July 15, 2016 3:43 AM
> *To:* JISC-REPOSITORIES@JISCMAIL.AC.UK <mailto:JISC-REPOSITORIES@JISCMAIL.AC.UK>
> *Subject:* Seeing unusually high downloads in IRStats - IRUS-UK's explanation and why this isn't affecting IRUS-UK stats
>
> Hi everyone,
>
> There was a discussion, via the UKCORR mailing list, on why exceptionally high downloads are being seen in IRStats this week and what might be causing them.
>
> After some investigation we have found that the unusually high downloads are down to four IP ranges:
>
> IP range       Organisation            Location   No. IP addresses
> 103.25.156.*   Microsoft Bingbot       China      128
> 103.36.96.*    Microsoft Corporation   China      216
> 111.221.28.*   Microsoft Bingbot       China      256
> 202.89.235.*   Microsoft Bingbot       China      80
>
> These IPs have been systematically trawling and downloading files from many UK repositories. Judging by their User Agent strings, they do not declare themselves as bots but masquerade as normal users.
>
> Happily, the IRUS-UK ingest has been filtering out these robotic downloads, so you won't see a massive spike in your IRUS-UK stats.
>
> We hope this is of help.
>
> Best wishes
> Hilary
>
> Jisc <http://www.jisc.ac.uk/>
>
> *Hilary Jones*
> Services and Projects Support
>
> 0161 413 7541
> Skype hilary.jones@jisc.ac.uk <mailto:hilary.jones@jisc.ac.uk>
> Twitter @JonesHilaryJ
> 6th Floor Churchgate House, 56 Oxford Street, Manchester, M1 6EU
>
> *jisc.ac.uk <http://www.jisc.ac.uk/>*
>
> Jisc is a registered charity (number 1149740) and a company limited by guarantee which is registered in England under Company No. 5747339, VAT No. GB 882 5529 90. Jisc's registered office is: One Castlepark, Tower Hill, Bristol, BS2 0JA. T 0203 697 5800.
>
> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
> *** Archive: http://www.eprints.org/tech.php/
> *** EPrints community wiki: http://wiki.eprints.org/
> *** EPrints developers Forum: http://forum.eprints.org/
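Clinton's question about what User Agent match to use can be checked directly. The following is a minimal Python sketch that applies the alternation from Yuri's RewriteCond to the six User-Agent strings reported from Bing addresses. It assumes that Python's re.search with re.IGNORECASE behaves like mod_rewrite's unanchored, case-insensitive ([NC]) condition for this pattern, which uses only plain alternation; the function name would_be_blocked is illustrative.

```python
import re

# The alternation from the RewriteCond above; re.IGNORECASE stands in for Apache's [NC] flag.
BOT_PATTERN = re.compile(
    r"(?:Yandex|msnbot|Owlinbo|sistrix|genieo|proximic|MJ12bot|AhrefsBot"
    r"|searchmetrics|SearchmetricsBot|Baidu)",
    re.IGNORECASE,
)

# The six User-Agent strings reported as coming from Bing addresses.
BING_UAS = [
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/38.0.2125.122 Safari/537.36 SE 2.X MetaSr 1.0",
    "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/45.0.2454.101 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/34.0.1847.131 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/38.0.2125.122 Safari/537.36 SE 2.X MetaSr 1.0",
    "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/38.0.2125.122 Safari/537.36 SE 2.X MetaSr 1.0",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/38.0.2125.122 Safari/537.36 SE 2.X MetaSr 1.0",
]


def would_be_blocked(user_agent: str) -> bool:
    """True if the pattern matches anywhere in the string (the condition is not anchored)."""
    return BOT_PATTERN.search(user_agent) is not None


if __name__ == "__main__":
    for ua in BING_UAS:
        print(would_be_blocked(ua), ua)
    # Count how many of the strings carry the "MetaSr" token.
    print("MetaSr token present in", sum("MetaSr" in ua for ua in BING_UAS), "of", len(BING_UAS))
```

Every string comes back False, and the "SE 2.X MetaSr 1.0" token appears in only four of the six, so neither the name list nor that token alone would catch all of this traffic.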
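For Betsy's cleanup question, the identification step might look like the minimal Python sketch below. It tests addresses against the four ranges in the IRUS-UK table, reading each "a.b.c.*" entry as the /24 network a.b.c.0/24. It assumes the access records have been exported as dictionaries whose requester_id field holds the client IP address; that record shape, the split_accesses helper, and the sample addresses are hypothetical. Deleting the matched rows and rebuilding the IRStats2 data are left to the repository's own tools.

```python
import ipaddress

# The four ranges from the IRUS-UK table; "a.b.c.*" is read as the /24 network a.b.c.0/24.
ROBOT_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "103.25.156.0/24",
        "103.36.96.0/24",
        "111.221.28.0/24",
        "202.89.235.0/24",
    )
]


def is_listed_robot(ip: str) -> bool:
    """True if ip lies in one of the reported ranges; unparsable values count as non-matching."""
    try:
        addr = ipaddress.ip_address(str(ip).strip())
    except ValueError:
        return False
    # Membership is simply False (not an error) when an IPv6 address meets an IPv4 network.
    return any(addr in net for net in ROBOT_NETWORKS)


def split_accesses(records):
    """Hypothetical helper: split records (dicts with an IP in 'requester_id') into (kept, robotic)."""
    kept, robotic = [], []
    for rec in records:
        if is_listed_robot(rec["requester_id"]):
            robotic.append(rec)
        else:
            kept.append(rec)
    return kept, robotic


if __name__ == "__main__":
    sample = [
        {"requester_id": "111.221.28.14"},  # inside 111.221.28.0/24
        {"requester_id": "192.0.2.10"},     # documentation address, not listed
    ]
    kept, robotic = split_accesses(sample)
    print(len(kept), len(robotic))  # prints: 1 1
```

Using the ipaddress module keeps the range check exact and lets further ranges be added in CIDR form, while malformed or IPv6 requester values are treated as non-matching rather than raising.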
- References:
- [EP-tech] Seeing unusually high downloads in IRStats
- From: "Coles, Elizabeth A. (Betsy)" <bcoles@caltech.edu>
- Re: [EP-tech] Seeing unusually high downloads in IRStats
- From: "Graham, Clinton T" <ctgraham@pitt.edu>
- Re: [EP-tech] Seeing unusually high downloads in IRStats
- From: Yuri <yurj@alfa.it>
- [EP-tech] Seeing unusually high downloads in IRStats