EPrints Technical Mailing List Archive
See the EPrints wiki for instructions on how to join this mailing list and related information.
Message: #00703
[EP-tech] Re: Sitemap
- To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
- Subject: [EP-tech] Re: Sitemap
- From: Mark Gregson <mark.gregson@qut.edu.au>
- Date: Tue, 12 Jun 2012 09:47:50 +1000
Hi Casey

You're correct that the modification to EPrints::Apache::Rewrite you include below is required when you apply the optional patch. I missed it when creating the package - thanks for spotting it. I'll add this patch to the download when I get a chance.

Cheers
Mark

-----Original Message-----
From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of rchilliard@mun.ca
Sent: Monday, 11 June 2012 10:36 PM
To: eprints-tech@ecs.soton.ac.uk
Subject: [EP-tech] Re: Sitemap

Hi Mark, All

Many thanks for the script, it was a great help in getting a Google-friendly sitemap up and running. Two supplemental notes regarding our local repository (3.3.7):

Note 1 Re: the patch. If you apply the patch component of the sitemap code, I believe it may be necessary to add a clause to the Apache URL handler to permit linking to the original-style sitemap. After applying the patch, the dynamic sitemap is defined as:

{Repository URL}/sitemap-sc.xml

However, we noted 404s at that location after applying the patch. The issue seemed to be that requests to Apache for that page are not, by default, forwarded to the sitemap handler (which would normally generate that page on the fly). We solved the trouble by adding a clause to ~eprints/perl_lib/EPrints/Apache/Rewrite.pm that catches any sitemap-like page request and passes it to the Perl module that handles sitemaps:

-- snip (Rewrite.pm) --
# sitemap.xml (nb. only works if site is in root / of domain.)
if( $uri =~ m! ^$urlpath/sitemap\.xml$ !x )
{
    $r->handler( 'perl-script' );
    $r->set_handlers(PerlResponseHandler => \&EPrints::Apache::SiteMap::handler );
    return OK;
}

# Added modification to handle supplementary sitemaps (sitemap*.xml,
# including sitemap-sc.xml)
# (nb. only works if site is in root / of domain.)
if( $uri =~ m! ^$urlpath/sitemap[-\w]*\.xml$ !x )
{
    $r->handler( 'perl-script' );
    $r->set_handlers(PerlResponseHandler => \&EPrints::Apache::SiteMap::handler );
    return OK;
}
-- snip --

Note 2 Re: the older sitemap. The default EPrints robots.txt excludes access to the cgi directory of EPrints; however, the dynamic sitemap is generated as:

{base repository url}/cgi/export/repository/RDFXML/{repository name}.rdf

This could be an issue for polite crawling robots unless some form of the above URL is added as an Allow rule. E.g. robots.txt, given repository-specific values for {repository URL} and {repository name}:

-- snip --
User-agent: *
Sitemap: http://{repository URL}/sitemap.xml
Allow: /cgi/export/repository/RDFXML/{repository name}.rdf
Disallow: /cgi/
-- snip --

Hopefully the above may prove useful to others working on sitemap bits and pieces.

Cheers,
Casey

________________________________________
From: eprints-tech-bounces@ecs.soton.ac.uk [eprints-tech-bounces@ecs.soton.ac.uk] on behalf of Mark Gregson [mark.gregson@qut.edu.au]
Sent: Wednesday, June 06, 2012 11:12 PM
To: eprints-tech@ecs.soton.ac.uk
Subject: [EP-tech] Re: Sitemap

To all those who asked for the sitemap script I wrote about on the list previously: I'm sorry for the delay in responding, but I've just now published the script to files.eprints.org. It's currently in review and not publicly accessible, but when the review is complete you will be able to get it from http://files.eprints.org/774/.

Please let me know how you get on with it. Any feedback and suggestions (or patches) will be taken on board, but I can't guarantee I'll have time to do anything about them!

Cheers
Mark

Mark Gregson | Applications and Development Team Leader
Library eServices | Queensland University of Technology
Level 2 | R Block | Kelvin Grove Campus | GPO Box 2434 | Brisbane 4001
Phone: +61 7 3138 3782 | Web: http://eprints.qut.edu.au/
ABN: 83 791 724 622
CRICOS No: 00213J

-----Original Message-----
From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of Centro de Documentación
Sent: Tuesday, 5 June 2012 10:43 AM
To: eprints-tech@ecs.soton.ac.uk
Subject: [EP-tech] Sitemap

Hi,

Can anyone please share a sitemap file or give me some tips on how to create it?

Thanks,
Cristian

*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/

This electronic communication is governed by the terms and conditions at http://www.mun.ca/cc/policies/electronic_communications_disclaimer_2012.php
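
Editorial sketch (not part of the original thread): the two Rewrite.pm patterns quoted in Casey's message can be checked offline before touching a live repository. The snippet below is a Python rendering of those Perl regular expressions, not EPrints code. `URLPATH` and `routed_to_sitemap_handler` are hypothetical names introduced for illustration, and the empty `URLPATH` assumes the repository is served from the root of the domain, as the original code comments require.

```python
import re

# Assumption: the repository lives at the domain root, so the prefix that
# Rewrite.pm calls $urlpath is empty here.
URLPATH = ""

# Python renderings of the two Perl patterns quoted in the message.
# The Perl /x flag only ignores whitespace inside the pattern, so it is
# simply left out in this version.
ORIGINAL = re.compile(r"^" + re.escape(URLPATH) + r"/sitemap\.xml$")
SUPPLEMENTARY = re.compile(r"^" + re.escape(URLPATH) + r"/sitemap[-\w]*\.xml$")


def routed_to_sitemap_handler(uri: str) -> bool:
    """Return True if either clause would hand the request to the sitemap handler."""
    return bool(ORIGINAL.match(uri) or SUPPLEMENTARY.match(uri))


if __name__ == "__main__":
    for uri in ("/sitemap.xml", "/sitemap-sc.xml", "/sitemap.xml.gz", "/cgi/sitemap.xml"):
        print(uri, routed_to_sitemap_handler(uri))
```

In this rendering the supplementary pattern also matches plain `/sitemap.xml`, because `[-\w]*` can match nothing. The added clause therefore overlaps with the original one, and since both pass the request to the same handler the overlap should be harmless.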
- References:
- [EP-tech] Sitemap
- From: Centro de Documentación <cendocu@gmail.com>
- [EP-tech] Re: Sitemap
- From: Mark Gregson <mark.gregson@qut.edu.au>
- [EP-tech] Re: Sitemap
- From: <rchilliard@mun.ca>