EPrints Technical Mailing List Archive

See the EPrints wiki for instructions on how to join this mailing list and related information.

Message: #09164



Re: [EP-tech] sitemap.xml



I think there's no trigger, so I'll also have to add a cron job to
refresh the sitemap.

On 13/01/23 11:38, David R Newman via Eprints-tech wrote:
Hi Yuri,

Sitemap generation was partially rewritten in EPrints 3.4 to make
sitemaps more compatible and useful when adding them to the Google
Search Admin console.  I think before that the only sitemap available by
default was /sitemap-sc.xml, which was designed for use with
sitemaps.org.  I think the standalone script:

https://github.com/eprints/eprints3.4/blob/master/bin/generate_sitemap

could just be copied to the same location in the EPrints 3.3.x codebase
and run as a nightly cron job.  I don't think any changes will need to
be made to perl_lib/EPrints/Apache/Rewrite.pm to make this sitemap the
one that is presented when requesting /sitemap.xml, as the sitemap is
written to sitemap.xml in your archive's cfg/static/ directory, so it
would be treated like any other static page.
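
As a rough illustration of the nightly cron job mentioned above, a
crontab entry might look like the following.  The install path
/opt/eprints3, the repository ID "myrepo", the 02:15 run time, and the
assumption that the script takes the repository ID as its first argument
are all placeholders not confirmed in this thread, so check the script's
own documentation before relying on it.

```
# Assumed: EPrints lives in /opt/eprints3 and the repository ID is "myrepo".
# Installed in the crontab of the user that owns the EPrints files.
# Regenerates the sitemap every night at 02:15.
15 2 * * * /opt/eprints3/bin/generate_sitemap myrepo
```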

I think in 3.3 the assumption was you might want your own hand-crafted
sitemap at /sitemap.xml, as maybe you have specific non-standard pages
you want indexed.  So you were left to your own devices to either write
this by hand or write a script that could regenerate it periodically.
The generate_sitemap script was written very much with Google (and other
companies) search indexing in mind, which I think is probably what most
repository owners care about. It only adds the abstract pages of live
eprint records to the sitemap.  These are the most metadata rich pages
specifically dedicated to individual publications.  So it was deemed
best to add these to the sitemap rather than the documents.  However,
when indexing services crawl the abstract pages, the links to the
documents should be discovered and subsequently indexed.
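
To make the selection rule above concrete, here is a minimal Python
sketch of a sitemap that lists only the abstract pages of live records.
It is not the generate_sitemap script itself (which is Perl); the record
dictionaries, the build_sitemap helper, the example base URL and the
"<base_url>/<eprintid>/" abstract page pattern are assumptions made for
illustration, and "archive" is assumed to be the status of a live record.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(eprints, base_url):
    """Return sitemap XML (as a string) listing abstract pages of live eprints."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for record in eprints:
        # Assumption: "archive" marks a live record; anything else is skipped.
        if record["eprint_status"] != "archive":
            continue
        url = ET.SubElement(urlset, "url")
        # Assumption: abstract pages are served at <base_url>/<eprintid>/.
        ET.SubElement(url, "loc").text = f"{base_url}/{record['eprintid']}/"
        # lastmod is optional in the sitemap protocol.
        if record.get("lastmod"):
            ET.SubElement(url, "lastmod").text = record["lastmod"]
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical records: only IDs 1 and 3 are live.
records = [
    {"eprintid": 1, "eprint_status": "archive", "lastmod": "2023-01-10"},
    {"eprintid": 2, "eprint_status": "buffer"},
    {"eprintid": 3, "eprint_status": "archive"},
]
sitemap_xml = build_sitemap(records, "https://repository.example.org")
print(sitemap_xml)
```

Document URLs are deliberately left out, mirroring the reasoning above
that crawlers reach the documents through the abstract pages.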

Regards

David Newman

On 13/01/2023 10:11 am, Yuri via Eprints-tech wrote:

Hi!

In EPrints 3.3, how can I generate a sitemap.xml file? Is it generated
automatically? What is perl_lib/EPrints/Apache/SiteMap.pm?


*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/