EPrints Technical Mailing List Archive
See the EPrints wiki for instructions on how to join this mailing list and related information.
Message: #09164
Re: [EP-tech] sitemap.xml
- To: eprints-tech@ecs.soton.ac.uk
- Subject: Re: [EP-tech] sitemap.xml
- From: Yuri <yurj@alfa.it>
- Date: Fri, 13 Jan 2023 11:58:55 +0100
I think there's no trigger, so I also have to add a cron job to refresh the sitemap.

On 13/01/23 11:38, David R Newman via Eprints-tech wrote:
Hi Yuri,

Sitemap generation was partially rewritten in EPrints 3.4 to make sitemaps more compatible and useful when adding them to the Google Search Admin console. I think before that the only sitemap available by default was /sitemap-sc.xml, which was designed for use with sitemaps.org.

I think the standalone script:

https://github.com/eprints/eprints3.4/blob/master/bin/generate_sitemap

could just be copied to the same location in the EPrints 3.3.x codebase and run as a nightly cron job. I don't think any changes will need to be made to perl_lib/EPrints/Apache/Rewrite.pm to make this sitemap the one that is presented when requesting /sitemap.xml, because the sitemap is written to sitemap.xml in your archive's cfg/static/ directory. So it would be treated like any other static page.

I think in 3.3 the assumption was that you might want your own hand-crafted sitemap at /sitemap.xml, as maybe you have specific non-standard pages you want indexed. So you were left to your own devices to either write this by hand or write a script that could regenerate it periodically.

The generate_sitemap script was written very much with search indexing by Google (and other companies) in mind, which I think is probably what most repository owners care about. It only adds the abstract pages of live eprint records to the sitemap. These are the most metadata-rich pages specifically dedicated to individual publications, so it was deemed best to add these to the sitemap rather than the documents. However, when indexing services crawl the abstract pages, the links to the documents should be discovered and subsequently indexed.
Regards

David Newman

On 13/01/2023 10:11 am, Yuri via Eprints-tech wrote:

Hi! In EPrints 3.3, how can I generate a sitemap.xml file? Is it generated automatically? What is perl_lib/EPrints/Apache/SiteMap.pm?

*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/
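
For readers who would rather write their own regeneration script than copy generate_sitemap, the idea described in the thread (a sitemap that lists only the abstract pages of live records) can be sketched roughly as follows. This is an illustrative sketch in Python, not the EPrints script itself. The function name build_sitemap, the example host repository.example.org and the example abstract-page URLs are assumptions made for illustration. Only the urlset/url/loc layout and the namespace come from the sitemaps.org protocol.

```python
from xml.etree import ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(abstract_urls):
    """Return a sitemaps.org-style XML string with one <url> per abstract page.

    Hypothetical helper: the caller is assumed to have already collected
    the abstract-page URLs of the live records it wants indexed.
    """
    # Serialise the sitemap namespace as the default namespace (no prefix).
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in abstract_urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Example URLs are placeholders, not taken from a real repository.
    example_urls = [
        "https://repository.example.org/id/eprint/1/",
        "https://repository.example.org/id/eprint/2/",
    ]
    print(build_sitemap(example_urls))
```

A script along these lines would still need to be scheduled (for example from cron, as discussed above) and would need to write its output to wherever the repository serves static pages from. In the thread that location is sitemap.xml in the archive's cfg/static/ directory.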