EPrints Technical Mailing List Archive

Message: #02069



[EP-tech] Re: Exposing metadata -EPrints



Dear All:
>
> When the AGRIS team tried to harvest our repository records, they asked us to expose keywords and journal details separately through OAI requests. You can see how our repository currently exposes metadata elements at http://oar.icrisat.org/cgi/oai2 . How can we expose metadata elements the way we want in EPrints?
>
>
>
> Thanks
> Madhan
>


-----Original Message-----
From: franc@library.iisc.ernet.in [mailto:franc@library.iisc.ernet.in]
Sent: Tuesday, July 02, 2013 5:29 PM
To: Madhan, M (ICRISAT-IN)
Subject: Re: Madhan requests help

Hi Madhan, I had a look at both our repositories. The OAI record doesn't
explicitly expose the source and keyword metadata elements. If this
needs to be done, then one has to tweak the EPrints code (oai2.pl). This is
my understanding. You may also check with the EPrints tech list.

Best, Francis
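
For anyone following the thread who wants to confirm what a harvester actually receives, here is a minimal Python sketch. It assumes that keywords would be carried in dc:subject and the journal in dc:source; the thread does not say which elements AGRIS expects, so treat both as assumptions. The sample record and the keyword list are hypothetical and hand-written for illustration. The sketch only inspects an oai_dc record; it does not change anything in EPrints.

```python
import xml.etree.ElementTree as ET

# Namespace of the simple Dublin Core elements used inside oai_dc records.
DC = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical record, written by hand for illustration (not harvested data).
SAMPLE_RECORD = """
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example title</dc:title>
  <dc:creator>Example, A.</dc:creator>
  <dc:subject>Chickpea</dc:subject>
  <dc:identifier>http://example.org/5/</dc:identifier>
</oai_dc:dc>
"""


def unexposed_keywords(record_xml, expected_keywords):
    """Return the expected keywords that appear in no dc:subject element.

    Assumption: keywords, if exposed, would show up as dc:subject values.
    """
    root = ET.fromstring(record_xml.strip())
    exposed = {(el.text or "").strip().lower() for el in root.iter(DC + "subject")}
    return [kw for kw in expected_keywords if kw.lower() not in exposed]


def has_source(record_xml):
    """Return True if the record has at least one non-empty dc:source element."""
    root = ET.fromstring(record_xml.strip())
    return any((el.text or "").strip() for el in root.iter(DC + "source"))


if __name__ == "__main__":
    # "drought tolerance" stands in for a keyword stored in the repository.
    print(unexposed_keywords(SAMPLE_RECORD, ["Chickpea", "drought tolerance"]))
    print(has_source(SAMPLE_RECORD))
```

Running the same two checks on a record fetched from the repository's own OAI interface, before and after any change to the export, would show whether the change had the intended effect.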

On Tue, 2 Jul 2013, Madhan, M (ICRISAT-IN) wrote:

> Dear Francis:
>
> When the AGRIS team tried to harvest our repository records, they asked us to expose keywords and journal details separately through OAI requests. You can see how our repository currently exposes metadata elements at http://oar.icrisat.org/cgi/oai2 ; the IISc repository exposes them the same way. How can we expose metadata elements the way we want in EPrints?
>
>
>
> Thanks
> Madhan
>
> ________________________________
> From: Anibaldi, Stefano (OEKC) [mailto:Stefano.Anibaldi@fao.org]
> Sent: Tuesday, July 02, 2013 3:15 PM
> To: Madhan, M (ICRISAT-IN)
> Cc: Keizer, Johannes; Celli, Fabrizio (OEKC)
> Subject: RE: ICRISAT REPO
>
> Dear Madhan
>
> We discussed this internally and decided to hold off until the metadata harvested from ICRISAT includes keywords, either uncontrolled or Agrotags.
> This will also give you time to see whether it is possible to isolate at least the journal title and ISSN from the merged citation information that ICRISAT is putting into dc:identifier.
>
> Thank you and regards,
>
> Stefano
>
>
>
> From: Anibaldi, Stefano (OEKC)
> Sent: 01 July 2013 14:59
> To: 'Madhan, M (ICRISAT-IN)'
> Cc: Keizer, Johannes (OEKC)
> Subject: RE: ICRISAT REPO
>
> Hi Madhan
>
> I will discuss with my colleagues the feasibility of indexing the data given all the issues listed in the email reproduced below, which include the lack of keywords and the merged "citation" information in dc:identifier.
> I will get back to you as soon as possible.
>
> Cheers
>
> Stefano
>
>
>
> From: Madhan, M (ICRISAT-IN) [mailto:M.Madhan@cgiar.org]
> Sent: 01 July 2013 06:54
> To: Anibaldi, Stefano (OEKC); Johannes Keizer
> Subject: ICRISAT REPO
>
> Dear Stefano:
>
> I am trying to tweak the code to expose keywords separately to harvesters, and I am in discussion with forum members. In the meantime, I would request that you harvest the records without keywords. We may have to re-run the harvest once our repository can expose keywords as well.
>
> Many thanks
>
> M Madhan
> Manager, Library and Information Services
> Knowledge Sharing and Innovation
> International Crops Research Institute for the Semi-Arid Tropics
> Patancheru, Hyderabad 502 324
> M.Madhan@cgiar.org<mailto:M.Madhan@cgiar.org>
> mu.madhan@gmail.com<mailto:mu.madhan@gmail.com>
>
>
>
> From: Anibaldi, Stefano (OEKC)
> Sent: 01 July 2013 10:10
> To: 'Madhan, M (ICRISAT-IN)'
> Cc: Keizer, Johannes (OEKC)
> Subject: RE: ICRISAT open access data
>
> Dear Madhan,
>
> Thanks a lot and no problems :)
>
> Yes, as I wrote below: "The main "subjects" are present, but not the keywords, either uncontrolled or Agrotags (taken from AgroPedia). This occurs with all of the metadata formats offered.
> I also noticed that one of the search engines (BASE) harvested and indexed your metadata, and it is completely missing this essential information (especially for AGRIS and its RDF store)."
> No problem regarding the publication of the data; I have also just come back from holidays. :)
>
> Thanks again,
> Stefano
>
>
> From: Madhan, M (ICRISAT-IN) [mailto:M.Madhan@cgiar.org]
> Sent: 23 June 2013 12:06
> To: Anibaldi, Stefano (OEKC)
> Cc: Keizer, Johannes (OEKC)
> Subject: RE: ICRISAT open access data
>
> Stefano:
>
> Sorry again. There was a family emergency, so I had to go on leave at short notice.
>
> By the way, I just noticed that the "uncontrolled keywords" are not exposed. I tried to use "Agrotagger", but I gave up as it was not able to assign proper keywords to a document. Let me find a way to expose the keywords and get back to you. Shall we delay indexing for a couple of days so that I can give it a try?
>
>
> Madhan
> ________________________________
> From: Anibaldi, Stefano (OEKC) [Stefano.Anibaldi@fao.org]
> Sent: 21 June 2013 14:53:57
> To: Madhan, M (ICRISAT-IN)
> Cc: Keizer, Johannes
> Subject: RE: ICRISAT open access data
> Dear Madhan,
>
> Could you please advise whether the OAI data can include the keywords and possibly also part of the merged "citation" information that is currently in dc:identifier?
>
> Thank you and regards
>
> Stefano
>
>
> From: Anibaldi, Stefano (OEKC)
> Sent: 18 June 2013 16:11
> To: 'Madhan, M (ICRISAT-IN)'
> Subject: RE: ICRISAT open access data
>
> No problems Madhan, take your time.
> Please also include Johannes in your email, since we had a joint discussion on this specific issue this morning.
> Cheers
> Stefano
>
> From: Madhan, M (ICRISAT-IN) [mailto:M.Madhan@cgiar.org]
> Sent: 18 June 2013 15:01
> To: Anibaldi, Stefano (OEKC)
> Subject: RE: ICRISAT open access data
>
> Stefano:
>
> Sorry for the belated reply. I was a bit held up.
>
> Give me a day.  I will give a detailed note about all the queries.  Thanks
>
> Madhan
>
> ________________________________
> From: Anibaldi, Stefano (OEKC) [mailto:Stefano.Anibaldi@fao.org]
> Sent: Tuesday, June 18, 2013 5:34 PM
> To: Madhan, M (ICRISAT-IN)
> Cc: Keizer, Johannes
> Subject: RE: ICRISAT open access data
>
> Dear Madhan,
>
> This morning I had a brief discussion with Johannes (in copy) and we agreed to accept the metadata with the full citation information merged as is.
> We would nevertheless recommend having journal titles, ISSN, ISBN, pagination, vol/no information and more indexed in separate fields.
>
> On another front, please let us know the feasibility of exposing the keywords in the OAI-PMH repository in a way that lets us index them in AGRIS too.
>
> Thank you and regards,
>
> Stefano
>
>
>
>
> From: Anibaldi, Stefano (OEKC)
> Sent: 14 June 2013 14:23
> To: 'Madhan, M (ICRISAT-IN)'
> Cc: Keizer, Johannes (OEKC)
> Subject: RE: ICRISAT open access data
>
> Hi Madhan,
>
> Separate from the URL link problems listed below, there are two further issues to resolve before we can finalize the harvesting and indexing of the ICRISAT data.
> On the subjects, and the indexing more generally, I had the chance to look in more detail at the data that you sent via FTP, and it appears that the ICRISAT OAI-PMH data does not expose the keywords, even though they are present in the ICRISAT Open Access Repository.
> The main "subjects" are present, but not the keywords, either uncontrolled or Agrotags (taken from AgroPedia). This occurs with all of the metadata formats offered.
> I also noticed that one of the search engines (BASE) harvested and indexed your metadata, and it is completely missing this essential information (especially for AGRIS and its RDF store).
>
> I also found that a complete set of information (journal title, date of publication, collation, publisher, vol/no, authors, and so on) is included all together in dc:identifier, alongside the URLs. It is essential that this information be separated into its proper metadata elements.
> AGRIS has dedicated indexes for dates, names, journals, ISSNs and other information. If all of this is merged into one element, indexing becomes impossible when, as in this case, there is no fixed pattern that would allow us to normalize the text internally.
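
The mixing of URLs and citation text in dc:identifier described above can at least be detected on the harvesting side. The following is a minimal Python sketch, assuming that anything not starting with http:// or https:// is free-text citation material. The citation string in the example is hypothetical and hand-written; only the two URLs come from this thread. It separates the values but does not attempt to parse journal title or ISSN out of the citation, since the thread notes there is no fixed pattern.

```python
def split_identifiers(identifiers):
    """Split dc:identifier values into (urls, other).

    Assumption: a value is a URL if it starts with http:// or https://;
    everything else is treated as free-text citation material.
    """
    urls, other = [], []
    for value in identifiers:
        text = value.strip()
        if text.lower().startswith(("http://", "https://")):
            urls.append(text)
        else:
            other.append(text)
    return urls, other


if __name__ == "__main__":
    values = [
        "http://oar.icrisat.org/5/",
        # Hypothetical citation string, for illustration only.
        "Example, A. (2011) Example title. Example Journal, 51 (5). pp. 1-10.",
        "http://dx.doi.org/10.2135/cropsci2010.07.0440",
    ]
    urls, other = split_identifiers(values)
    print(urls)
    print(other)
```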
>
> Kindly let me know.
>
> Cheers
> Stefano
>
>
> From: Anibaldi, Stefano (OEKC)
> Sent: 11 June 2013 16:00
> To: 'Madhan, M (ICRISAT-IN)'
> Cc: 'Giannis Stoitsis'; 'nikosm@agroknow.gr'; Keizer, Johannes (OEKC)
> Subject: RE: ICRISAT open access data
>
> Hello Madhan,
>
> Thanks for your response.
> I am sure we'll find a solution to this issue, since, as is, we would have problems publishing the metadata that was harvested from your OAI server.
> I looked at some of the five thousand or more records harvested by our Agro-Know colleagues, but I will take just one record as an example, with three "links to the full text".
> The main problem is that there are multiple dc:identifier elements, and we are not sure which one is the right one for AGRIS. AGRIS needs a single URL that leads the user to the full text; in the AGRIS Search this link is shown under the label "Full-Text". For the three URLs offered in the attached XML record (I noticed that most of the records offer four), this seems difficult to achieve. Take the following three URLs:
>
> 1.       http://oar.icrisat.org/5/1/cs51_5pp-2011_%282%29.pdf
>
> 2.       http://dx.doi.org/10.2135/cropsci2010.07.0440
>
> 3.       http://oar.icrisat.org/5/
> No. 1 leads the user to a screen showing that access to the PDF is restricted. Result: the user leaves this page, perhaps goes back to the reference itself, and tries link No. 2.
>
> No. 2, labelled "The Official URL", is the DOI link to the Springer metadata, with the option to purchase the publication upon subscription (!).
> No. 3 is the metadata record as exposed and published in the ICRISAT repository. It contains the widget that you mention below, which should provide the user with the resource itself.
>
> A quick temporary solution would be to index the ICRISAT data including only the URLs that actually land on the full text, and excluding all the other URLs.
>
> Please let me know what you think and how we can index only the URLs that link directly to the full text.
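
One way a harvester could tell the three kinds of URL apart is by their shape. This is a minimal Python sketch, assuming the patterns visible in the three example URLs above hold across the repository: /<number>/ for the record page, /<number>/<number>/<filename> for a document file, and a dx.doi.org or doi.org host for the DOI link. These patterns are inferred from this thread only and are not taken from EPrints documentation. The category names are hypothetical, and the sketch cannot tell whether a document file is restricted.

```python
import re
from urllib.parse import urlparse


def classify_url(url):
    """Classify a dc:identifier URL by its shape.

    Returns one of "doi", "landing_page", "document_file", "unknown".
    The path patterns are assumptions based on the example URLs in this thread.
    """
    parsed = urlparse(url.strip())
    host = parsed.netloc.lower()
    if host in ("dx.doi.org", "doi.org"):
        return "doi"
    if re.fullmatch(r"/\d+/?", parsed.path):
        return "landing_page"
    if re.fullmatch(r"/\d+/\d+/[^/]+", parsed.path):
        return "document_file"
    return "unknown"


if __name__ == "__main__":
    for url in (
        "http://oar.icrisat.org/5/1/cs51_5pp-2011_%282%29.pdf",
        "http://dx.doi.org/10.2135/cropsci2010.07.0440",
        "http://oar.icrisat.org/5/",
    ):
        print(classify_url(url), url)
```

With such a classification, a harvester could keep only the "landing_page" URL for each record, so that users always reach a page from which the full text can be downloaded or requested.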
>
> Best regards,
>
> Stefano
>
>
> From: Madhan, M (ICRISAT-IN) [mailto:M.Madhan@cgiar.org]
> Sent: 07 June 2013 05:04
> To: Anibaldi, Stefano (OEKC)
> Cc: 'Giannis Stoitsis'; 'Nikos Manolis'; 'nikosm@agroknow.gr'; Johannes Keizer
> Subject: RE: ICRISAT open access data
>
> Stefano:
>
> Greetings!
>
> In our repository, direct download of a few documents is restricted to our local users. For these, however, we provide a "request copy" button on each restricted document. Harvesters normally redirect users to the repository to download the full text. Hence, you may consider linking to the persistent URL of the document metadata rather than to the full-text PDF. The repository does not have a readily available built-in indicator (OA or restricted). I would request that you harvest all the metadata of our repository; each record has a provision for reaching the full text.
>
> Many thanks.
>
> Madhan
>
> See: http://oar.icrisat.org/6842/
>
>
>
> ________________________________
> From: Anibaldi, Stefano (OEKC) [mailto:Stefano.Anibaldi@fao.org]
> Sent: Thursday, June 06, 2013 8:00 PM
> To: Madhan, M (ICRISAT-IN)
> Cc: Keizer, Johannes; Subirats, Imma (OEKC)
> Subject: RE: ICRISAT open access data
>
> Dear Madhan,
>
> Could you please tell me if there is a way for us to identify the links to the full text of the documents that are open to the entire community and those that are not?
> We are close to the publication of the ICRISAT metadata in AGRIS, but we do not want to publish the publications that are restricted.
>
> Thanks and regards
>
> Stefano A.
> AGRIS Team
>
>
> From: Anibaldi, Stefano (OEKC)
> Sent: 04 June 2013 17:19
> To: 'Madhan, M (ICRISAT-IN)'
> Cc: Giannis Stoitsis (stoitsis@ieee.org<mailto:stoitsis@ieee.org>); Nikos Manolis (manolisn@agroknow.gr<mailto:manolisn@agroknow.gr>); nikosm@agroknow.gr<mailto:nikosm@agroknow.gr>
> Subject: ICRISAT open access data
>
> Dear Madhan
>
> Hope things are fine with you. Long time, no see.. :)
>
> We are now working in collaboration with our Greek colleagues at Agro-Know (cc'd), who are trying to harvest and process the metadata from your repository.
> I have a question regarding the full-text information in the ICRISAT OAI-PMH data.
> From the data harvested, in both DIDL and METS, I noticed a few URLs (I did not check them all) that point to a PDF that cannot be accessed freely.
> In the first three records I accessed, for example, from one XML document that I harvested just now, all have restricted access and the user ends up at the ICRISAT login page:
> http://oar.icrisat.org/15/1/1606_ftp.pdf
> http://oar.icrisat.org/86/1/AsianBiotechDevRev_12_3_17-34_2010.pdf
> http://oar.icrisat.org/87/1/BiosystemsEng105_2_198-204_2010.pdf
>
> I am not sure what percentage of the ICRISAT collection is truly open access, but I imagine that if an open repository exposes metadata over OAI-PMH, it would be really great to hide the URLs that do not lead to the full text of the document. Such URLs are not useful to aggregators, search engines, or the students and researchers who want to deepen their knowledge directly from the internet without having to send requests to the data owners.
>
> What do you think?
>
> Thanks and regards
> Stefano A.
>





--
Madhan, M
Manager, Library and Information Services
International Crops Research Institute for Semi-Arid Tropics (ICRISAT)
Patancheru, Hyderabad
India
www.icrisat.org