EPrints Technical Mailing List Archive


Message: #02066

[EP-tech] Fwd: exposing metadata elements properly - EPrints

To: "eprints-tech@ecs.soton.ac.uk List" <eprints-tech@ecs.soton.ac.uk>
Subject: [EP-tech] Fwd: exposing metadata elements properly - EPrints
From: Stevan Harnad <harnad@ecs.soton.ac.uk>
Date: Tue, 2 Jul 2013 07:38:09 -0400

Begin forwarded message:

From: "Madhan, M (ICRISAT-IN)" <M.Madhan@cgiar.org>
Subject: exposing metadata elements properly - EPrints
Date: 2 July, 2013 7:25:48 AM EDT
To: "'tdb2@ecs.soton.ac.uk'" <tdb2@ecs.soton.ac.uk>
Cc: "'Stevan Harnad'" <harnad@ecs.soton.ac.uk>, "'lac@ecs.soton.ac.uk'" <lac@ecs.soton.ac.uk>

Hi:

When the AGRIS <http://agris.fao.org/> team tried to harvest records from our repository <http://oar.icrisat.org>, they asked us to expose keywords and the journal name separately in our OAI responses. How can EPrints expose metadata elements the way AGRIS wants? I would be grateful for your help.

Thanks
Madhan

From: Anibaldi, Stefano (OEKC) [mailto:Stefano.Anibaldi@fao.org]
Sent: Tuesday, July 02, 2013 3:15 PM
To: Madhan, M (ICRISAT-IN)
Cc: Keizer, Johannes; Celli, Fabrizio (OEKC)
Subject: RE: ICRISAT REPO

Dear Madhan

We discussed this internally and decided to hold off until the metadata harvested from ICRISAT includes keywords, either uncontrolled or Agrotags.
That will also give you time to see whether it is possible to isolate at least the journal title and ISSN from the merged citation information that ICRISAT is currently putting into dc:identifier.

Thank you and regards,

Stefano

From: Anibaldi, Stefano (OEKC)
Sent: 01 July 2013 14:59
To: 'Madhan, M (ICRISAT-IN)'
Cc: Keizer, Johannes (OEKC)
Subject: RE: ICRISAT REPO

Hi Madhan

I will discuss with my colleagues the feasibility of indexing the data given the issues listed in the email below, which include the lack of keywords and the merged “citation” information in dc:identifier.
I will get back to you as soon as possible.

Cheers

Stefano

From: Madhan, M (ICRISAT-IN) [mailto:M.Madhan@cgiar.org]
Sent: 01 July 2013 06:54
To: Anibaldi, Stefano (OEKC); Johannes Keizer
Subject: ICRISAT REPO

Dear Stefano:

I am trying to tweak the code to expose keywords separately to harvesters, and I am discussing this with forum members. In the meantime, I would request you to harvest the records without keywords. We may have to re-run the harvest once our repository can expose keywords as well.

Many thanks

M Madhan
Manager, Library and Information Services
Knowledge Sharing and Innovation
International Crops Research Institute for the Semi-Arid Tropics
Patancheru, Hyderabad 502 324
M.Madhan@cgiar.org
mu.madhan@gmail.com

From: Anibaldi, Stefano (OEKC)
Sent: 01 July 2013 10:10
To: 'Madhan, M (ICRISAT-IN)'
Cc: Keizer, Johannes (OEKC)
Subject: RE: ICRISAT open access data

Dear Madhan,

Thanks a lot and no problems :)

Yes, as I was writing below, “The main “subjects” are present, but not the keywords, either uncontrolled or Agrotags (taken from AgroPedia). This occurs with all of the metadata formats offered.
I also noticed that one of the search engines (BASE) harvested and indexed your metadata and is completely missing this essential information (especially for AGRIS and its RDF store).”
No problem regarding the publication of the data; I have actually just come back from holidays myself :)

Thanks again,
Stefano

From: Madhan, M (ICRISAT-IN) [mailto:M.Madhan@cgiar.org]
Sent: 23 June 2013 12:06
To: Anibaldi, Stefano (OEKC)
Cc: Keizer, Johannes (OEKC)
Subject: RE: ICRISAT open access data

Stefano:

Sorry again. There was a family emergency, so I had to go on leave at short notice.

By the way, I just noticed that the "uncontrolled keywords" are not exposed. I tried to use "Agrotagger", but I gave up as it was not able to assign proper keywords to a document. Let me find a way to expose the keywords and get back to you. Shall we delay indexing for a couple of days so that I can give it a try?

Madhan

From: Anibaldi, Stefano (OEKC) [Stefano.Anibaldi@fao.org]
Sent: 21 June 2013 14:53:57
To: Madhan, M (ICRISAT-IN)
Cc: Keizer, Johannes
Subject: RE: ICRISAT open access data

Dear Madhan,

Could you please advise whether the OAI data can include the keywords and possibly also part of the merged “citation” information in dc:identifier?

Thank you and regards

Stefano

From: Anibaldi, Stefano (OEKC)
Sent: 18 June 2013 16:11
To: 'Madhan, M (ICRISAT-IN)'
Subject: RE: ICRISAT open access data

No problems Madhan, take your time.
Please also include Johannes in your email, since we had a joint discussion on this specific issue this morning.
Cheers
Stefano

From: Madhan, M (ICRISAT-IN) [mailto:M.Madhan@cgiar.org]
Sent: 18 June 2013 15:01
To: Anibaldi, Stefano (OEKC)
Subject: RE: ICRISAT open access data

Stefano:

Sorry for the belated reply. I was a bit held up.

Give me a day. I will give a detailed note about all the queries. Thanks

Madhan

From: Anibaldi, Stefano (OEKC) [mailto:Stefano.Anibaldi@fao.org]
Sent: Tuesday, June 18, 2013 5:34 PM
To: Madhan, M (ICRISAT-IN)
Cc: Keizer, Johannes
Subject: RE: ICRISAT open access data

Dear Madhan,

This morning I had a brief discussion with Johannes (in copy) and we agreed to accept the metadata with the full citation information merged as is.
We would nevertheless recommend having journal titles, ISSN, ISBN, pagination, vol/no information and more indexed in separate fields.

On another front, please let us know the feasibility of exposing the keywords in the OAI-PMH repository, so that we can index them in AGRIS, too.

Thank you and regards,

Stefano

From: Anibaldi, Stefano (OEKC)
Sent: 14 June 2013 14:23
To: 'Madhan, M (ICRISAT-IN)'
Cc: Keizer, Johannes (OEKC)
Subject: RE: ICRISAT open access data

Hi Madhan,

There are two further issues, separate from the URL link problems listed below, to resolve before we can finalize the harvesting and indexing of the ICRISAT data.

For the subjects, and more generally the indexing part, I had the chance to look in more detail at the data that you sent via FTP. It appears that the ICRISAT OAI-PMH data does not expose the keywords, although they are present in the Open Access Repository of ICRISAT.
The main “subjects” are present, but not the keywords, either uncontrolled or Agrotags (taken from AgroPedia). This occurs with all of the metadata formats offered.
I also noticed that one of the search engines (BASE) harvested and indexed your metadata and is completely missing this essential information (especially for AGRIS and its RDF store).

Then, I found out that a complete set of information (journal title, date of publication, collation, publisher, vol/no, authors, and so on) is included all together in dc:identifier, as well as the URLs. It is essential that this information be separated into its proper metatags.
AGRIS has proper indexes for dates, names, journals, ISSNs and other information. If all of this is merged into one tag, indexing becomes impossible when, as in this case, there is no fixed pattern that would allow us to normalize the text internally.
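
To illustrate the merged dc:identifier problem described above, here is a minimal Python sketch that separates the URL values of dc:identifier from the free-text citation values in a single oai_dc record. The sample record, its values, and the function name `split_identifiers` are hypothetical and only for illustration; they are not taken from the ICRISAT feed. The sketch does not attempt to parse the citation string itself since, as noted above, it follows no fixed pattern.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical record for illustration only; not copied from the ICRISAT feed.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example article</dc:title>
  <dc:identifier>http://repository.example.org/5/1/example.pdf</dc:identifier>
  <dc:identifier>Author, A. (2011) An example article. Example Journal, 51 (5). pp. 1-10.</dc:identifier>
</oai_dc:dc>"""


def split_identifiers(record_xml):
    """Return (urls, free_text) for the dc:identifier values of one record."""
    root = ET.fromstring(record_xml)
    urls, free_text = [], []
    for element in root.iter("{%s}identifier" % DC_NS):
        value = (element.text or "").strip()
        if not value:
            continue  # skip empty elements
        if value.lower().startswith(("http://", "https://")):
            urls.append(value)
        else:
            # Anything that is not a URL is treated as merged citation text.
            free_text.append(value)
    return urls, free_text


urls, free_text = split_identifiers(SAMPLE)
```

A harvester could use the `free_text` list to count how many records carry a merged citation, but reliably recovering journal title or ISSN from it would still need the repository to expose those fields separately.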

Kindly let me know.

Cheers
Stefano

From: Anibaldi, Stefano (OEKC)
Sent: 11 June 2013 16:00
To: 'Madhan, M (ICRISAT-IN)'
Cc: 'Giannis Stoitsis'; 'nikosm@agroknow.gr'; Keizer, Johannes (OEKC)
Subject: RE: ICRISAT open access data

Hello Madhan,

Thanks for your response.
I am sure we’ll find a solution to this issue since, as it stands, we would have problems publishing the metadata that was harvested from your OAI server.
I accessed some of the more than five thousand records harvested by our Agro-Know colleagues, but I will take just one record as an example, with three “links to the full text”.
The main problem is that there are multiple dc:identifier elements, and we are not sure which one is right for AGRIS. AGRIS needs a single URL that leads the user to the full text; in AGRIS Search this link is shown under the label “Full-Text”. With the three URLs offered in the attached XML record (I noticed that most records offer four), this seems difficult to achieve. Take the following three URLs:
1. http://oar.icrisat.org/5/1/cs51_5pp-2011_%282%29.pdf
2. http://dx.doi.org/10.2135/cropsci2010.07.0440
3. http://oar.icrisat.org/5/
No. 1 leads the user to a screen showing that access to the PDF is restricted. Result: the user leaves this page and perhaps goes back to the reference itself and follows link No. 2.
No. 2, labelled “The Official URL”, is the DOI link to the Springer metadata and the option to purchase the publication by subscription (!)
No. 3 is the metadata record as exposed and published in the ICRISAT repository. It contains the widget that you mention below, which should provide the user with the resource itself.

A quick temporary solution would be to index the ICRISAT data including only the URLs that actually lead to the full text, and excluding all the other URLs.

Please let me know what you think and how we can index only the URLs linking directly to the full text.
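
As a rough illustration of filtering the URLs of a record by kind, the Python sketch below sorts URLs into document files, DOI links, and repository landing pages. The path patterns are an assumption inferred only from the three example URLs in this message, and the names `classify_url` and `select_urls` are hypothetical. Note that the shape of a URL cannot show whether the file behind it is openly accessible.

```python
import re
from urllib.parse import urlparse


def classify_url(url):
    """Classify a URL as 'doi', 'landing_page', 'document_file' or 'other'.

    The patterns are assumptions based on the example URLs in this thread:
    /<record id>/ for a landing page, /<record id>/<n>/<filename> for a file.
    """
    parsed = urlparse(url)
    if parsed.netloc.lower() in ("dx.doi.org", "doi.org"):
        return "doi"
    if re.fullmatch(r"/\d+/?", parsed.path):
        return "landing_page"
    if re.fullmatch(r"/\d+/\d+/[^/]+", parsed.path):
        return "document_file"
    return "other"


def select_urls(urls, wanted):
    """Keep only the URLs whose classification equals `wanted`."""
    return [url for url in urls if classify_url(url) == wanted]


example_urls = [
    "http://oar.icrisat.org/5/1/cs51_5pp-2011_%282%29.pdf",
    "http://dx.doi.org/10.2135/cropsci2010.07.0440",
    "http://oar.icrisat.org/5/",
]
files_only = select_urls(example_urls, "document_file")
```

Passing `"landing_page"` instead of `"document_file"` would keep only the record page, which is the other option discussed in this thread.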

Best regards,

Stefano

From: Madhan, M (ICRISAT-IN) [mailto:M.Madhan@cgiar.org]
Sent: 07 June 2013 05:04
To: Anibaldi, Stefano (OEKC)
Cc: 'Giannis Stoitsis'; 'Nikos Manolis'; 'nikosm@agroknow.gr'; Johannes Keizer
Subject: RE: ICRISAT open access data

Stefano:

Greetings!

In our repository, direct download of a few documents is restricted to our local users. For other users, each restricted document has a “request copy” button. Harvesters normally redirect users to the repository to download the full text. Hence, you may consider linking to the persistent URL of the document metadata rather than to the full-text PDF. I don’t have indicators (OA or restricted) readily built into the repository. I would request you to harvest all the metadata of our repository; each record has a provision for reaching the full text.
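
A minimal Python sketch of the suggestion above, deriving the record's persistent URL from a full-text file URL. It assumes that the first path segment is the numeric record ID, which matches the example URLs quoted elsewhere in this thread but is not confirmed for every record; the function name `landing_page_for` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def landing_page_for(file_url):
    """Return the record landing page for a repository file URL.

    Assumes the first path segment is the numeric record ID, e.g.
    /5/1/file.pdf belongs to the record at /5/.
    """
    parts = urlsplit(file_url)
    segments = [segment for segment in parts.path.split("/") if segment]
    if not segments or not segments[0].isdigit():
        raise ValueError("no numeric record ID in path: %r" % file_url)
    # Drop any query string or fragment; keep scheme and host unchanged.
    return urlunsplit((parts.scheme, parts.netloc, "/%s/" % segments[0], "", ""))


page = landing_page_for("http://oar.icrisat.org/5/1/cs51_5pp-2011_%282%29.pdf")
```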

Many thanks.

Madhan

See: http://oar.icrisat.org/6842/

From: Anibaldi, Stefano (OEKC) [mailto:Stefano.Anibaldi@fao.org]
Sent: Thursday, June 06, 2013 8:00 PM
To: Madhan, M (ICRISAT-IN)
Cc: Keizer, Johannes; Subirats, Imma (OEKC)
Subject: RE: ICRISAT open access data

Dear Madhan,

Could you please tell me if there is a way for us to identify the links to the full text of the documents that are open to the entire community and those that are not?
We are close to the publication of the ICRISAT metadata in AGRIS, but we do not want to publish the publications that are restricted.

Thanks and regards

Stefano A.
AGRIS Team

From: Anibaldi, Stefano (OEKC)
Sent: 04 June 2013 17:19
To: 'Madhan, M (ICRISAT-IN)'
Cc: Giannis Stoitsis (stoitsis@ieee.org); Nikos Manolis (manolisn@agroknow.gr); nikosm@agroknow.gr
Subject: ICRISAT open access data

Dear Madhan

Hope things are fine with you. Long time, no see :)

We are now working in collaboration with our Greek colleagues at Agro-Know (cc'd), who are trying to harvest and process the metadata from your repository.
I have a question regarding the full-text information in the ICRISAT OAI-PMH data.
In the harvested data, in both DIDL and METS, I noticed a few URLs (I did not check them all) that point to a PDF that cannot be accessed freely.
For example, in one XML document that I harvested just now, the first three records I accessed all have restricted access, and the user ends up at the ICRISAT login page:
http://oar.icrisat.org/15/1/1606_ftp.pdf
http://oar.icrisat.org/86/1/AsianBiotechDevRev_12_3_17-34_2010.pdf
http://oar.icrisat.org/87/1/BiosystemsEng105_2_198-204_2010.pdf

I am not sure what percentage of the ICRISAT collection is truly open access, but I imagine that when an open repository exposes metadata over OAI-PMH, it would be really helpful to hide the URLs that do not link to the full text of the document. Such links are not useful to aggregators, search engines, or the students and researchers who want to deepen their knowledge directly from the internet without having to send requests to the data owners.
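
One way a harvester could test whether a URL leads to a freely accessible PDF is sketched below in Python. It is a heuristic under stated assumptions: a restricted file is assumed to answer with a non-200 status, a redirect (for example to a login page), or a non-PDF content type. How the ICRISAT server actually signals a restriction is not stated in this thread, and the function names are hypothetical. A plain redirect from http to https would also be reported as not accessible.

```python
import urllib.request


def looks_like_full_text(requested_url, final_url, status, content_type):
    """Heuristic: True only for an unredirected 200 response serving a PDF."""
    if status != 200:
        return False
    if final_url != requested_url:
        # Redirected elsewhere, for example to a login page (assumption).
        return False
    media_type = content_type.split(";")[0].strip().lower()
    return media_type == "application/pdf"


def probe(url, timeout=10):
    """Send a HEAD request and apply the heuristic; False on any error."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return looks_like_full_text(
                url,
                response.geturl(),
                response.status,
                response.headers.get("Content-Type", ""),
            )
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
```

Calling `probe("http://oar.icrisat.org/15/1/1606_ftp.pdf")` would apply the check to the first URL listed above; the result depends on the server's current behaviour.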

What do you think?

Thanks and regards
Stefano A.
