EPrints Technical Mailing List Archive
See the EPrints wiki for instructions on how to join this mailing list and related information.
Message: #05844
Re: [EP-tech] Seeing unusually high downloads in IRStats
- To: eprints-tech@ecs.soton.ac.uk
- Subject: Re: [EP-tech] Seeing unusually high downloads in IRStats
- From: Yuri <yurj@alfa.it>
- Date: Tue, 26 Jul 2016 11:49:37 +0200
IRStats is simply wrong to count raw HTTP accesses instead of using a JavaScript library (Piwik, Google Analytics). These libraries already know how to fight spammers and bots, and they rely on real interaction with a web browser rather than on an HTTP access.
The added value of IRStats is that it shows simple statistics for every item, views and downloads, over a period of time. Replicating these simple statistics in an existing system (such as Piwik) would be the best solution.
On 26/07/2016 11:16, Enio Carboni wrote:
Hi Betsy,

I wrote an IP filter plugin for IRStats2 a few months ago (to exclude our local admin IPs). You list the single IPs, IP ranges, or CIDR blocks to exclude in a config file.

To use it, add the new filter in cfg/cfg.d/z_irstats2.pl like this:

    $c->{irstats2}->{datasets} = {access => { filters => [ 'Robots', 'Repeat','IP' ] } },

Note the last filter, IP.

You can download the plugin from GitHub and try it: https://github.com/eniocarboni/irstats2-filter-by-ip

There is also a test script, irstats2-filter-by-ip.pl, in archive/<ID>/bin for testing the config file before you process all the stats. You could use it this way:

    ./irstats2-filter-by-ip.pl <ID> 103.25.156.5

or

    ./irstats2-filter-by-ip.pl <ID> 103.25.156.1-103.25.156.19

Of course, do not forget to add the IP ranges to be discarded to cfg/cfg.d/z_irstats2_filter_ipcidr_blocks.pl.

Let me know if it was useful.

Enio Carboni

On Monday 25 July 2016 23:45:16 CEST, Coles, Elizabeth A. (Betsy) wrote:

Forwarding from the JISC-REPOSITORIES list. We’ve been seeing this in California too, and our IRStats2 counts have been through the roof for the last couple of weeks.

Can anyone tell me how to filter out these robots in IRStats2? And how to clean the access file so that our IRStats2 reports are not distorted by this deluge? I assume I’d want to delete all entries with a requester_id in the table below and rerun the IRStats2 setup from scratch.

Thanks,
Betsy Coles
Caltech – Digital Library Development
bcoles@caltech.edu

From: Repositories discussion list [mailto:JISC-REPOSITORIES@JISCMAIL.AC.UK] On Behalf Of Hilary Jones
Sent: Friday, July 15, 2016 3:43 AM
To: JISC-REPOSITORIES@JISCMAIL.AC.UK
Subject: Seeing unusually high downloads in IRStats - IRUS-UK's explanation and why this isn't affecting IRUS-UK stats

Hi everyone,

There was a discussion, via the UKCORR mailing list, about why exceptionally high downloads are being seen in IRStats this week and what might be causing them.

After some investigation we have found that the unusually high downloads are down to four IP ranges:

    IP range        Organisation            Location   No. IP addresses
    103.25.156.*    Microsoft Bingbot       China      128
    103.36.96.*     Microsoft Corporation   China      216
    111.221.28.*    Microsoft Bingbot       China      256
    202.89.235.*    Microsoft Bingbot       China      80

These IPs have been systematically trawling and downloading files from many UK repositories. Their User Agent strings do not declare them as bots; they masquerade as normal users.

Happily, the IRUS-UK ingest has been filtering out these robotic downloads, so you won’t see a massive spike in your IRUS-UK stats.

We hope this is of help.

Best wishes
Hilary

Hilary Jones
Services and Projects Support
0161 413 7541
Skype hilary.jones@jisc.ac.uk
Twitter @JonesHilaryJ
6th Floor Churchgate House, 56 Oxford Street, Manchester, M1 6EU
jisc.ac.uk <http://www.jisc.ac.uk/>

Jisc is a registered charity (number 1149740) and a company limited by guarantee which is registered in England under Company No. 5747339, VAT No. GB 882 5529 90. Jisc’s registered office is: One Castlepark, Tower Hill, Bristol, BS2 0JA. T 0203 697 5800.
*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/
*** EPrints developers Forum: http://forum.eprints.org/
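The thread mentions three ways of naming addresses to exclude: a single IP, a start-end range, or a CIDR block. As a rough illustration of that kind of matching, here is a minimal Python sketch using only the standard library. It is not the plugin's code (the plugin is Perl and lives at the GitHub URL given in the thread). The names SUSPECT_NETWORKS, parse_spec and is_excluded are hypothetical. Treating each "a.b.c.*" range from the IRUS-UK message as a full /24 is an assumption, and it may match more addresses than the counts reported there.

```python
import ipaddress

# The four wildcard ranges from the forwarded IRUS-UK message, written as /24
# networks. Assumption: each "a.b.c.*" is treated as the whole /24, which may
# cover more addresses than the counts given in the message.
SUSPECT_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "103.25.156.0/24",
        "103.36.96.0/24",
        "111.221.28.0/24",
        "202.89.235.0/24",
    )
]


def parse_spec(spec):
    """Turn 'a.b.c.d', 'a.b.c.d-e.f.g.h' or 'a.b.c.d/n' into a list of networks."""
    spec = spec.strip()
    if "-" in spec:
        # A start-end range is split into the smallest set of CIDR blocks
        # that covers exactly those addresses.
        first, last = (ipaddress.ip_address(p.strip()) for p in spec.split("-", 1))
        return list(ipaddress.summarize_address_range(first, last))
    # A bare address becomes a /32; strict=False tolerates host bits in a CIDR.
    return [ipaddress.ip_network(spec, strict=False)]


def is_excluded(ip, networks=SUSPECT_NETWORKS):
    """Return True if the address falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)
```

For example, `is_excluded("103.25.156.5")` checks an address against the four ranges, while `is_excluded(ip, parse_spec("103.25.156.1-103.25.156.19"))` checks it against a single start-end range like the one passed to the test script in the thread.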
- References:
- [EP-tech] Seeing unusually high downloads in IRStats
- From: "Coles, Elizabeth A. (Betsy)" <bcoles@caltech.edu>
- Re: [EP-tech] Seeing unusually high downloads in IRStats
- From: Enio Carboni <enio.carboni@gmail.com>
- [EP-tech] Seeing unusually high downloads in IRStats