EPrints Technical Mailing List Archive
See the EPrints wiki for instructions on how to join this mailing list and related information.
Message: #03720
[EP-tech] Re: How to modify robots.txt and add a new bot?
- To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
- Subject: [EP-tech] Re: How to modify robots.txt and add a new bot?
- From: "Brian D. Gregg" <bdgregg@pitt.edu>
- Date: Fri, 19 Dec 2014 17:10:36 +0000
So I’ve found out how to update an archive’s robots.txt. More reading was needed on my part. :(

Placing a robots.txt file in the archive’s cfg/lang/en/static folder causes it to be served instead of the default robots.txt, which is defined in perl_lib/EPrints/Apache/RobotsTxt.pm.

Answered my own question. Sorry for the chatter.

-Brian.

Brian D. Gregg
Solutions Architect | Manager Systems Development
University of Pittsburgh | University Library System
Address: 7500 Thomas Blvd. Room 129 Pittsburgh, PA 15208
Tel: (412) 648-3264 | Email: bdgregg@pitt.edu | Fax: (412) 648-3585

From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk]
On Behalf Of Brian D. Gregg

As a follow-up, I’ve found that the perl_lib/robots.pm file I mentioned is related to AWStats, so it isn’t going to help here. Please ignore that bit of info.

-Brian.

From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk]
On Behalf Of Brian D. Gregg

All,

I’ve noticed that we are getting crawled by what seems to be a newer robot, “AhrefsBot” (http://ahrefs.com). It also seems to be ignoring the “Disallow: /cgi/” stanza: the logs and the Apache server-status page show it requesting things under /cgi. :(

As a first measure to rein this bot in, I’d like to add the parameter “Crawl-Delay: 2” to the default robots.txt file, per their documentation (https://ahrefs.com/robot/). I could not find a simple way of doing this in EPrints, so I started going through the files and ran across perl_lib/EPrints/Apache/RobotsTxt.pm, which contains what looks like the default definition of the robots.txt file. I’ve updated that file and restarted the web server, but alas the robots.txt file does not change.

So two questions:

1. Does anyone have a hint on what needs to be done to identify a new bot correctly? I’ve also found perl_lib/robots.pm, but I’m not sure where to add AhrefsBot in that file.
2. Does anyone know how to update the robots.txt file? Is it per archive?

Thanks,
Brian Gregg.
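
For anyone landing on this thread later, here is a minimal sketch of what a per-archive robots.txt override might contain, plus a quick check with Python's standard urllib.robotparser before the file is dropped into the static folder. The rule set is an assumption for illustration: only the AhrefsBot name, the delay of 2 and the /cgi/ disallow come from the messages above, while the catch-all group, the check_rules helper and the sample paths are hypothetical. The check only confirms how a well-behaved parser reads the rules; it says nothing about whether a given bot honours them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical contents for the override file. The AhrefsBot group
# reflects the thread; the catch-all group is an assumed default.
ROBOTS_TXT = """\
User-agent: AhrefsBot
Crawl-delay: 2
Disallow: /cgi/

User-agent: *
Disallow: /cgi/
"""


def check_rules(text):
    """Parse robots.txt text and report how two sample agents are treated."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return {
        # Delay requested from AhrefsBot, in seconds.
        "ahrefs_delay": parser.crawl_delay("AhrefsBot"),
        # Paths under /cgi/ should be refused for every agent.
        "ahrefs_cgi_allowed": parser.can_fetch("AhrefsBot", "/cgi/search"),
        "other_cgi_allowed": parser.can_fetch("SomeOtherBot", "/cgi/search"),
        # Paths outside /cgi/ stay crawlable.
        "ahrefs_view_allowed": parser.can_fetch("AhrefsBot", "/view/"),
        # No delay is set in the catch-all group.
        "other_delay": parser.crawl_delay("SomeOtherBot"),
    }


if __name__ == "__main__":
    for key, value in check_rules(ROBOTS_TXT).items():
        print(key, value)
```

Since Crawl-delay is not part of the core robots exclusion standard, a parser-level check like this is only a sanity test of the file's syntax and grouping.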