EPrints Technical Mailing List Archive

See the EPrints wiki for instructions on how to join this mailing list and related information.

Message: #05895



Re: [EP-tech] Atom.xsl Patch Submission


Hello everybody,

I noticed another issue with the Atom.xsl import stylesheet. The wildcard transformation:

<xsl:apply-templates select="atom:entry/*" />

in combination with the list of templates that follows it and the final ignore whitelist is problematic. If the imported Atom XML file contains nodes that are neither covered by an XSL template nor listed in the ignore whitelist (e.g. atom:published), the resulting EPrints XML file will be malformed, because those nodes are rendered in the file as plain XML literals.
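
One way to avoid this is a low-priority catch-all template that swallows every element without an explicit mapping. The sketch below is written under assumptions and is not taken from the pull request. The skeleton (the root template and the single atom:title mapping) is illustrative only and far smaller than the real Atom.xsl.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns="http://eprints.org/ep2/data/2.0"
    exclude-result-prefixes="atom">

  <xsl:output method="xml" indent="yes"/>

  <!-- Illustrative root template: wraps the mapped fields in one eprint. -->
  <xsl:template match="/">
    <eprints>
      <eprint>
        <xsl:apply-templates select="atom:entry/*"/>
      </eprint>
    </eprints>
  </xsl:template>

  <!-- Explicit mapping for an element the import understands. -->
  <xsl:template match="atom:title">
    <title><xsl:value-of select="."/></title>
  </xsl:template>

  <!-- Catch-all: any element without a more specific template produces
       no output. Without it, the XSLT built-in rules would copy the text
       content of unmatched elements straight into the eprint element. -->
  <xsl:template match="*"/>

</xsl:stylesheet>
```

Because match="*" has a lower default priority (-0.5) than a named match such as atom:title (0), the explicit mappings still win, and an element like atom:published simply produces no output.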

I created a pull request here: https://github.com/eprints/eprints/pull/420

It includes John's fixes, a more robust implementation of the existing Atom transformation, and support for mapping dcterms:type and dcterms:subject.

All the best,

Sebastian

Semiodesk GmbH | Werner-von-Siemens-Str. 6 Geb. 15k, 86159 Augsburg, Germany Phone: +49 821 8854401 | Fax: +49 821 8854410 | www.semiodesk.com


This e-mail message may contain confidential or legally privileged information and is intended only for the use of the intended recipient(s). Any unauthorized disclosure, dissemination, distribution, copying or the taking of any action in reliance on the information herein is prohibited. E-mails are not secure and cannot be guaranteed to be error free as they can be intercepted, amended, or contain viruses. Anyone who communicates with us by e-mail is deemed to have accepted these risks. Semiodesk GmbH is not responsible for errors or omissions in this message and denies any responsibility for any damage arising from the use of e-mail. Any opinion and other statement contained in this message and any attachment are solely those of the author and do not necessarily represent those of the company.


2016-08-31 18:08 GMT+02:00 Sebastian Faubel <sebastian@semiodesk.com>:
Dear John,

Thank you for your quick response. I agree that the standard URI for the eprint_status should be used instead of the solution I proposed. However, I am new to EPrints and do not know what the standard URI actually is.

Concerning the subjects: I understand that one could import terms that are not defined in the local vocabulary. However, this is a general problem with using plain literals as identifiers* and is not specific to the Atom XML import. The same problem exists when importing EPrints XML datasets. Am I wrong here? If not, I would suggest adding support for setting the item type and subjects as I proposed, because it does not break anything. It simply generates the equivalent of an EPrints XML dataset.

From a user perspective, everybody is happy if the terms are aligned upon submission. If they are not, a reviewer has the chance to detect the problem. However, if these terms are left out entirely, reviewers have no way of finding out what was originally meant, which in turn may complicate the reviewing process.

Moreover, installing a plugin to enable this feature does not solve the actual problem. Therefore, I think if this feature is used correctly, then it is a chance to make deposits to EPrints repositories more convenient for end-users and reviewers.

All the best,

Sebastian

* Aside from the problem that the same term may refer to a different concept.


2016-08-31 16:50 GMT+02:00 John Salter <J.Salter@leeds.ac.uk>:

Hi Sebastian,

Thanks for submitting this patch.

 

The ‘yomiko’ part is a good catch.

When I export something as Atom, I get these category elements:

<category term="article" label="Article" scheme="http://eprints.whiterose.ac.uk/data/eprint/type"/>

<category term="archive" label="Live Archive" scheme="http://eprints.org/ep2/data/2.0/eprint/eprint_status"/>

These are generated here: https://github.com/eprints/eprints/blob/3.3/perl_lib/EPrints/Plugin/Export/Atom.pm#L258-L272

 

The ‘eprint_status’ one uses the eprints.org namespace, which I think should probably be used instead of ‘yomiko’ [EPrints Services: how does 3.4 (without a default ‘flavour’) handle this?].

The ‘type’ one uses the repository namespace – I think because these can be configured at the repository level.
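
A minimal sketch of how an import stylesheet could tell these two categories apart: the status by the exact eprints.org scheme, the type by the end of the scheme URI. This is not the content of the pull request mentioned below. It assumes that every repository's type scheme ends in /data/eprint/type (as in the whiterose example) and that the value to import is the term attribute.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns="http://eprints.org/ep2/data/2.0"
    exclude-result-prefixes="atom">

  <xsl:output method="xml" indent="yes"/>

  <!-- Illustrative root template: only the categories are processed. -->
  <xsl:template match="/">
    <eprints>
      <eprint>
        <xsl:apply-templates select="atom:entry/atom:category"/>
      </eprint>
    </eprints>
  </xsl:template>

  <!-- Status: exact match on the shared eprints.org scheme. -->
  <xsl:template match="atom:category[@scheme='http://eprints.org/ep2/data/2.0/eprint/eprint_status']">
    <eprint_status><xsl:value-of select="@term"/></eprint_status>
  </xsl:template>

  <!-- Type: the host part differs per repository, so compare only the
       end of the scheme. XPath 1.0 has no ends-with(), hence substring().
       Assumption: the scheme always ends in /data/eprint/type. -->
  <xsl:template match="atom:category[substring(@scheme, string-length(@scheme) - string-length('/data/eprint/type') + 1) = '/data/eprint/type']">
    <type><xsl:value-of select="@term"/></type>
  </xsl:template>

  <!-- Any other category is ignored. -->
  <xsl:template match="atom:category"/>

</xsl:stylesheet>
```

The literal suffix is repeated inside the pattern because XSLT 1.0 does not allow variable references in match attributes.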

 

I have created this pull request: https://github.com/eprints/eprints/pull/419 for this.

 

For the ‘subjects’ part, in EPrints, the subjects field is normally a controlled-value field, based on the ‘subjects’ dataset.

If values added to the subject field don’t exist in the subjects dataset, EPrints doesn’t break – but they will render like this:

?? value ??

– which isn’t normally what is wanted.

 

By default (in the perl_lib Atom.xsl file), I think it’s safer to *not* map the dcterms:subject into the subjects field (I haven’t done this in the pull request above).

 

To achieve the improved import of data for Artivity, I would either

(i) make a new XSL import mapping (see warning below!):

~/archives/ARCHIVEID/cfg/plugins/EPrints/Plugin/Import/XSLT/ArtivityAtom.xsl (change the attribute to ept:name="Artivity Atom XML")

Or (ii) override the default Atom plugin:

~/archives/ARCHIVEID/cfg/plugins/EPrints/Plugin/Import/XSLT/Atom.xsl

 

!! WARNING !!

I’m not sure how the code here: https://github.com/eprints/eprints/blob/3.3/perl_lib/EPrints/Plugin/Import/AtomMultipart.pm#L96-L113  will behave when there are multiple plugins defined that can handle application/atom+xml imports. If you have two plugins: Atom.xsl and ArtivityAtom.xsl, things might not work. I haven’t tested this (please let us know if you go down this route and it works!).

 

Also, I’ve never overridden an XSL plugin. This is the way to override Perl EPrints plugins:

https://wiki.eprints.org/w/Instructions_for_local_plugins

I’m not sure if you’d need to, or how you would define the ‘plugin_alias_map’ aspect…

 

 

Cheers,

John

 

From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of Sebastian Faubel
Sent: 31 August 2016 11:43
To: eprints-tech@ecs.soton.ac.uk
Subject: [EP-tech] Atom.xsl Patch Submission

 

Hello everyone,

 

I have found the reason why the category id and subjects are not recognized when depositing files in EPrints using the Atom Publishing Protocol. The XSLT stylesheet 'Atom.xsl' [0] in the Import directory does not handle those elements when converting Atom to EPrints XML.

 

Please find attached a version of the file which handles the dcterms:type and dcterms:subject terms and translates them into EPrints XML. The dcterms vocabulary seems to be widely used in SWORD protocol implementations (e.g. [1]).
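
The attachment is not reproduced in this archive. As a rough illustration only, a mapping of this kind might look like the sketch below. It assumes that dcterms:type and dcterms:subject appear as direct children of atom:entry and that their text values are already valid EPrints type and subject identifiers; the surrounding skeleton is hypothetical and much smaller than the real Atom.xsl.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns="http://eprints.org/ep2/data/2.0"
    exclude-result-prefixes="atom dcterms">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <eprints>
      <eprint>
        <!-- Item type: use the first dcterms:type if one is present. -->
        <xsl:if test="atom:entry/dcterms:type">
          <type>
            <xsl:value-of select="normalize-space(atom:entry/dcterms:type[1])"/>
          </type>
        </xsl:if>
        <!-- Subjects: one item per dcterms:subject inside a single
             subjects wrapper, as in an EPrints XML export. -->
        <xsl:if test="atom:entry/dcterms:subject">
          <subjects>
            <xsl:for-each select="atom:entry/dcterms:subject">
              <item><xsl:value-of select="normalize-space(.)"/></item>
            </xsl:for-each>
          </subjects>
        </xsl:if>
      </eprint>
    </eprints>
  </xsl:template>

</xsl:stylesheet>
```

The values are copied as they are, so a subject that does not exist in the repository's subjects dataset would still be imported and would be displayed as ?? value ??.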

 

Additionally, I corrected a line in the stylesheet which transforms a submitted eprint status. The line checked for the status being equal to 'http://yomiko.ecs.soton.ac.uk:8080/data/eprint/status/'. This appears to be the address of one concrete EPrints instance, so the line would not work for any other EPrints instance. I changed the test to: contains(@scheme,'/eprint/status'). This should work for all EPrints instances, including my test server.
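
For illustration, the changed test could sit in a template like the one sketched below. The skeleton around it is hypothetical, and reading the status value from the term attribute is an assumption based on the category elements EPrints exports, not a quote from the patched file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns="http://eprints.org/ep2/data/2.0"
    exclude-result-prefixes="atom">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <eprints>
      <eprint>
        <xsl:apply-templates select="atom:entry/atom:category"/>
      </eprint>
    </eprints>
  </xsl:template>

  <!-- Host-independent test: any scheme URI that contains the path
       fragment /eprint/status is treated as the status category. -->
  <xsl:template match="atom:category[contains(@scheme, '/eprint/status')]">
    <eprint_status><xsl:value-of select="@term"/></eprint_status>
  </xsl:template>

  <!-- Categories with any other scheme are ignored. -->
  <xsl:template match="atom:category"/>

</xsl:stylesheet>
```

Note that a scheme ending in /eprint/eprint_status, such as the one in John's export example, does not contain the substring '/eprint/status', so this test would not match it.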

 

Please let me know if you will be including my patch into the repository.

 

Thank you,

 

Sebastian

 

[0] perl_lib/EPrints/Plugin/Import/XSLT/Atom.xsl

[1] http://guides.dataverse.org/en/latest/api/sword.html


*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/
*** EPrints developers Forum: http://forum.eprints.org/


