EPrints Technical Mailing List Archive
See the EPrints wiki for instructions on how to join this mailing list and related information.
Message: #08386
Re: [EP-tech] Antwort: Re: Antwort: Re: perl module update introduced some trouble with entities
- To: <eprints-tech@ecs.soton.ac.uk>
- Subject: Re: [EP-tech] Antwort: Re: Antwort: Re: perl module update introduced some trouble with entities
- From: David R Newman <drn@ecs.soton.ac.uk>
- Date: Tue, 1 Dec 2020 17:21:08 +0000
Hi all,
I think what was catching me out is that "epadmin test" does pick up genuine issues with undefined entities, but I was not using the correct incantation with xmllint when testing individual XML files. The incantation you need is:
xmllint --path /opt/eprints3/lib/ --loaddtd /opt/eprints3/flavours/pub_lib/lang/en/templates/default.xml --noout
If this produces no output then your XML is valid according to the DTDs it asks to load, e.g. via the following line in the XML file:
<!DOCTYPE html SYSTEM "entities.dtd">
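For anyone who wants to run this check from a script, here is a minimal sketch that wraps the same xmllint call in Python. The function names xmllint_command and check_file are made up for illustration; only the --path, --loaddtd and --noout switches come from the message above, and the sketch assumes xmllint is on the PATH.

```python
import shutil
import subprocess


def xmllint_command(eprints_path, xml_file):
    """Build the xmllint argument list used in the message above."""
    return [
        "xmllint",
        "--path", f"{eprints_path.rstrip('/')}/lib/",  # directory searched for entities.dtd
        "--loaddtd",  # load the DTD named in the DOCTYPE so entities resolve
        "--noout",    # do not print the parsed document, only problems
        xml_file,
    ]


def check_file(eprints_path, xml_file):
    """Return (ok, messages); ok means exit status 0 and nothing on stderr."""
    if shutil.which("xmllint") is None:
        raise RuntimeError("xmllint not found on PATH")
    result = subprocess.run(
        xmllint_command(eprints_path, xml_file),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and result.stderr == "", result.stderr
```

Calling check_file("/opt/eprints3", "/opt/eprints3/flavours/pub_lib/lang/en/templates/default.xml") would then mirror the command line quoted above.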
So for future reference if you see an error like the one Thomas originally described ( /usr/share/eprints/site_lib/lang/en/phrases/modified.xml: Entity: line 226: parser error : Entity 'auml' not defined ), then:
1. Check that the XML file references "entities.dtd" in a <!DOCTYPE declaration after the <?xml line.
2. Check that EPRINTS_PATH/lib/entities.dtd is present.
3. Check that the entity that the error message says is not defined is present in EPRINTS_PATH/lib/entities.dtd. If not, this entity may have been typo-ed.
4. Check that xmllint validates the file that the error message complains about, using the --path, --loaddtd and --noout switches (e.g. the line below should produce no output if valid):
xmllint --path /usr/share/eprints/lib/ --loaddtd /usr/share/eprints/site_lib/lang/en/phrases/modified.xml --noout
5. If (4) shows the XML file to be valid, check which version of LibXML you have installed and, if it is 2.0201+, apply the appropriate patch from:
EPrints 3.3 - https://github.com/eprints/eprints/issues/511
EPrints 3.4 - https://github.com/eprints/eprints3.4/issues/41
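Steps 1 to 3 of the checklist can be roughly automated. The following is only a sketch, not part of EPrints: the function names are hypothetical, and it uses plain regular expressions, so entity-like text inside comments or CDATA sections will be reported too. It assumes entities.dtd lives in EPRINTS_PATH/lib/ as described above.

```python
import re
from pathlib import Path

# The five entities every XML parser predefines; they need no DTD.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def declared_entities(dtd_text):
    """Names declared as <!ENTITY name ...>; parameter entities (%) are skipped."""
    return set(re.findall(r"<!ENTITY\s+([A-Za-z_][\w.-]*)\s", dtd_text))


def referenced_entities(xml_text):
    """Named references such as &auml;; numeric ones like &#228; are ignored."""
    return set(re.findall(r"&([A-Za-z_][\w.-]*);", xml_text))


def diagnose(xml_text, dtd_text):
    """Return a list of problems; an empty list means steps 1-3 pass."""
    problems = []
    if not re.search(r"""<!DOCTYPE\s[^>]*["']entities\.dtd["']""", xml_text):
        problems.append("step 1: no DOCTYPE referencing entities.dtd")
    if dtd_text is None:
        problems.append("step 2: entities.dtd not found")
        return problems
    missing = referenced_entities(xml_text) - declared_entities(dtd_text) - XML_PREDEFINED
    for name in sorted(missing):
        problems.append(f"step 3: entity '{name}' is not declared in entities.dtd")
    return problems


def diagnose_file(xml_file, eprints_path):
    """File-based wrapper: reads EPRINTS_PATH/lib/entities.dtd if it exists."""
    dtd = Path(eprints_path) / "lib" / "entities.dtd"
    dtd_text = dtd.read_text(encoding="utf-8", errors="replace") if dtd.is_file() else None
    xml_text = Path(xml_file).read_text(encoding="utf-8", errors="replace")
    return diagnose(xml_text, dtd_text)
```

For the case in this thread one would call diagnose_file("/usr/share/eprints/site_lib/lang/en/phrases/modified.xml", "/usr/share/eprints"). A clean result here does not replace step 4, since only a real parser such as xmllint confirms the file is well-formed.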
If anyone is still having XML validation issues after this, we will have a new problem on our hands but hopefully that won't come to pass.
Regards
David Newman
This was a bit of a fiddle to make it possible to do things like &pound; &eacute; etc., to make people's lives a little easier when writing the templates, which are XHTML.
The obvious other approach would be to preprocess them with something like this
s/&([a-z]+);/expandentity($1)/ge
but that would break with the wisdom that you should never parse XML with a regular expression.
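For illustration only, here is roughly what that preprocessing approach could look like in Python rather than Perl. It is a sketch, not anything EPrints does: named_to_numeric and expand_entity are made-up names, the lookup table is Python's standard html.entities.name2codepoint (HTML 4.01 names), and the pattern is widened slightly from [a-z]+ so that names such as Auml or frac12 are caught. The usual caveat about regular expressions and XML still applies: comments and CDATA sections are not treated specially.

```python
import re
from html.entities import name2codepoint

# Entities XML already understands; these are left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def expand_entity(match):
    """Replace one named entity with its decimal character reference."""
    name = match.group(1)
    if name in XML_PREDEFINED or name not in name2codepoint:
        return match.group(0)  # unknown or predefined: leave as written
    return f"&#{name2codepoint[name]};"


def named_to_numeric(text):
    """Rewrite every known named entity in text as a numeric reference."""
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", expand_entity, text)
```

Unknown names are deliberately left alone so that a genuine typo still shows up as a parser error instead of being silently dropped.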
On 01/12/2020 14:37, David R Newman via Eprints-tech wrote:
Hi all,
I have been blind. EPrints (at least the latest 3.4) already has an entities.dtd in lib/, and it is already referenced in most of the standard XML template, phrase, etc. files. I think the problem is that it does not get linked in properly in most if not all cases. So I will investigate how that can be done better to avoid encountering undefined entity errors.
Regards
David Newman
On 01/12/2020 14:26, martin.braendle@uzh.ch wrote:
CAUTION: This e-mail originated outside the University of Southampton.
The entities file we have here has the following preamble
<!-- Portions (C) International Organization for Standardization 1986
Permission to copy in any form is granted for use with
conforming SGML systems and applications as defined in
ISO 8879, provided this notice is included in all copies.
-->
and contains more than 500 lines.
It stems most probably from here: https://www.w3.org/TR/REC-html40-971218/sgml/entities.html
Kind regards,
Martin
--
Dr. Martin Brändle
Zentrale Informatik
Universität Zürich
Stampfenbachstr. 73
CH-8006 Zürich
From: "David R Newman" <drn@ecs.soton.ac.uk>
To: eprints-tech@ecs.soton.ac.uk
Cc: th.lauke@arcor.de, martin.braendle@uzh.ch
Date: 01/12/2020 15:19
Subject: Re: Antwort: Re: [EP-tech] perl module update introduced some trouble with entities
Hi all,
EPrints 3.4 has had the patch applied for issues with newer versions of LibXML, and EPrints 3.4.2 onwards should have this particular issue resolved. Regarding special characters, I will look into producing (or hopefully finding) an entities.dtd for all the special characters that EPrints repositories may want to use and then update the standard template and phrase files to use it. In fact it is probably worth doing this for citation and workflow files as well. I have created an issue for EPrints 3.4 to address this:
https://github.com/eprints/eprints3.4/issues/112
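As a rough idea of what producing such a file could involve, the sketch below generates <!ENTITY ...> declarations from Python's standard html.entities.name2codepoint table (the HTML 4.01 entity names). This is an assumption about one possible approach, not how the entities.dtd shipped with EPrints was made, and entity_declarations is a hypothetical name. The XML predefined entities are skipped because redeclaring lt or amp in a DTD needs extra escaping.

```python
from html.entities import name2codepoint

# Already built into XML; redeclaring lt or amp needs double escaping, so skip them.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def entity_declarations():
    """Return one <!ENTITY name "&#NNN;"> line per HTML 4.01 named entity."""
    lines = []
    for name in sorted(name2codepoint):
        if name in XML_PREDEFINED:
            continue
        lines.append(f'<!ENTITY {name} "&#{name2codepoint[name]};">')
    return lines


if __name__ == "__main__":
    print("\n".join(entity_declarations()))
```

The output could be compared against an existing lib/entities.dtd to spot names that are missing from it.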
Regards
David Newman
On 01/12/2020 13:35, martin.braendle@uzh.ch wrote:
Hi Thomas,
there should be an entities.dtd file in [eprints_root]/lib/, maybe this is missing or entries are missing in it?
Also a phrase file should mention that in the
<!DOCTYPE phrases SYSTEM "entities.dtd">
definition right at the beginning after the XML declaration.
Kind regards,
Martin
--
Dr. Martin Brändle
Zentrale Informatik
Universität Zürich
Stampfenbachstr. 73
CH-8006 Zürich
From: "David R Newman via Eprints-tech" <eprints-tech@ecs.soton.ac.uk>
To: <eprints-tech@ecs.soton.ac.uk>, <th.lauke@arcor.de>
Date: 01/12/2020 14:24
Subject: Re: [EP-tech] perl module update introduced some trouble with entities
Sent by: <eprints-tech-bounces@ecs.soton.ac.uk>
Hi Thomas,
Named HTML entities are not supported in XML; you need to use the decimal code XML entity for &auml;, which is &#228;
This is the same as needing to replace things like & and © with their equivalent decimal code XML entities.
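A quick way to see this behaviour, sketched with Python's standard xml.etree.ElementTree parser (which is not what EPrints uses; the helper name parses is made up): a bare named entity is rejected, the decimal reference is accepted, and the named entity is accepted again once a DTD declares it.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """True if the text is accepted by the parser, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


named = "<p>M&auml;rz</p>"        # no DTD, so 'auml' is undefined
numeric = "<p>M&#228;rz</p>"      # decimal character reference, always valid
declared = '<!DOCTYPE p [<!ENTITY auml "&#228;">]><p>M&auml;rz</p>'

if __name__ == "__main__":
    for label, text in [("named", named), ("numeric", numeric), ("declared", declared)]:
        print(label, parses(text))
```

The third case is the in-file equivalent of what an external DTD declaring the entity provides.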
Regards
David Newman
On 01/12/2020 12:00, th.lauke--- via Eprints-tech wrote:
Hi all,
any hints on where to start digging for the cause of the following error?
Failed to parse XML file: /usr/share/eprints/site_lib/lang/en/phrases/modified.xml: Entity: line 226: parser error : Entity 'auml' not defined
This error occurs after updating some perl modules ... :(
Is the 'bad' module already known?
What is more effective: Fixing the module (version) or the phrase file?
Thanks for any idea in advance
Thomas
*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/
--
Christopher Gutteridge <totl@soton.ac.uk>
You should read our team blog at http://blog.soton.ac.uk/webteam/