EPrints Technical Mailing List Archive

See the EPrints wiki for instructions on how to join this mailing list and related information.

Message: #09683



Re: [EP-tech] Imported records not searchable in Google Scholar


CAUTION: This e-mail originated outside the University of Southampton.
Hi David,

I am already using generate_sitemap, but when I try to add the sitemap to Google Search Console, I get the error shown in the screenshot below.

[Screenshot: image.png]

It says: Couldn't fetch.
Is there a solution for this error?

Regards,
Agung PW

On Tue, 19 Mar 2024 at 15:17, David R Newman <drn@ecs.soton.ac.uk> wrote:
Hi Ellis,

It is odd that newly submitted items are appearing but not pre-existing ones.  I assume that this migration from an old server did not change the URLs for these items?  Google Scholar is somewhat of a black box.  The only definitive information I have is that the original URLs for items, of the format:

https://HOSTNAME/ID/

were unhelpful to them.  This is because they could not programmatically determine (just from the URLs themselves) that they were links to items hosted on an EPrints repository.  This is why for EPrints 3.4 there is the following configuration option:

$c->{use_long_url_format} = 1;

This ensures URLs of the following format are used:

https://HOSTNAME/id/eprint/ID/

Google Scholar is now programmed to recognise URLs of this format as belonging to items in an EPrints repository.
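
To illustrate why the long format is easier to recognise from the URL alone, here is a minimal Python sketch. The patterns and function names are hypothetical and only for illustration; Google Scholar's actual rules are not public, and this is not part of EPrints.

```python
import re
from urllib.parse import urlparse

# Hypothetical patterns for illustration only; not Google Scholar's real rules.
LONG_FORMAT = re.compile(r"^/id/eprint/(\d+)/?$")   # https://HOSTNAME/id/eprint/ID/
SHORT_FORMAT = re.compile(r"^/(\d+)/?$")            # https://HOSTNAME/ID/


def classify_eprint_url(url: str) -> str:
    """Return 'long', 'short' or 'other' based on the URL path alone."""
    path = urlparse(url).path
    if LONG_FORMAT.match(path):
        return "long"
    if SHORT_FORMAT.match(path):
        return "short"
    return "other"


def to_long_format(url: str) -> str:
    """Rewrite a short-format item URL to the long format; leave others as-is."""
    parts = urlparse(url)
    match = SHORT_FORMAT.match(parts.path)
    if match is None:
        return url
    return f"{parts.scheme}://{parts.netloc}/id/eprint/{match.group(1)}/"
```

A bare numeric path such as /123/ could belong to almost any site, whereas /id/eprint/123/ carries a recognisable marker, which is the point made above.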

For indexing in regular Google I always register a sitemap for the EPrints repository on Google's Search Console [1].  EPrints 3.4.2+ comes with a script that can generate a sitemap for EPrints [2].  This can then be registered (using the URL https://HOSTNAME/sitemap.xml) on Google's Search Console.  This sitemap contains a list of all items publicly available in the EPrints repository the script was run against.  Typically, you would run the generate_sitemap script as a cron job every night to keep it up to date.  It should be possible to run this script on earlier versions of EPrints, possibly even 3.3.x.
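
Before registering the sitemap, it can help to confirm that https://HOSTNAME/sitemap.xml is reachable from outside and parses as XML, since a "Couldn't fetch" report in Search Console gives little detail. The following is a minimal Python sketch using only the standard library. The function names are hypothetical, and it assumes the file follows the standard sitemaps.org schema; it does not reproduce whatever checks Google itself performs.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import List

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_locs(sitemap_xml: bytes) -> List[str]:
    """Parse sitemap XML and return the text of every <loc> element.

    Works for both a <urlset> and a <sitemapindex> document.
    Raises xml.etree.ElementTree.ParseError if the XML is malformed.
    """
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


def check_sitemap(url: str, timeout: float = 30.0) -> List[str]:
    """Fetch the sitemap over HTTP(S) and return the URLs it lists.

    urllib raises HTTPError/URLError if the server responds with an
    error status or cannot be reached at all.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return extract_locs(response.read())
```

For example, calling check_sitemap("https://HOSTNAME/sitemap.xml") with the real hostname, from a machine outside the institution's network, should return a non-empty list if the sitemap is publicly fetchable. An exception or an empty list points to a problem on the repository side rather than in Search Console.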

Regards

David Newman

[1] https://search.google.com/search-console
[2] https://github.com/eprints/eprints3.4/blob/master/bin/generate_sitemap

On 19/03/2024 2:17 am, Eliseo Gatchalian wrote:

Hi all,

We moved to a new server a few months ago and imported our records from the old server.  The imported records are not showing up in Google Scholar or in Google Search.  It looks like only the items submitted after we went live on the new server are being harvested by Google.

How do we force Google to index all the items in the archive again and make them searchable via Google Scholar and Google Search?

Any help is much appreciated.

Many thanks!

Best regards,

Ellis

Ellis Gatchalian
Systems Librarian

Private Bag 3036, Waikato Mail Centre, Hamilton 3240
+64-(0)7-838 6399 ext. 8633
wintec.ac.nz

Ki te Kotahi te kākaho ka whati, ki te kāpuia, e kore e whati | Alone we are vulnerable, together we are invincible




*** Options: https://wiki.eprints.org/w/Eprints-tech_Mailing_List
*** Archive: https://www.eprints.org/tech.php/
*** EPrints community wiki: https://wiki.eprints.org/

