EPrints Technical Mailing List Archive

See the EPrints wiki for instructions on how to join this mailing list and related information.

Message: #08680



[EP-tech] Antwort: Re: Antwort: Re: Crawler ends up with 404, dont know how to handle MIME subtype wildcard

  • To: David R Newman <drn@ecs.soton.ac.uk>
  • Subject: [EP-tech] Antwort: Re: Antwort: Re: Crawler ends up with 404, dont know how to handle MIME subtype wildcard
  • From: <jens.witzel@uzh.ch>
  • Date: Mon, 26 Jul 2021 17:41:00 +0200

CAUTION: This e-mail originated outside the University of Southampton.

Sorry David,

it was my fault: I was trying to fetch a non-existent link, caused by mixing up the ISSUE host and the testing host #-)
Now I get "HTTP/1.1 302 Found", which should be fine.


Thanks again
Jens


--
Jens Witzel
Zentrale Informatik
Universität Zürich
Stampfenbachstrasse 73
CH-8006 Zürich

mail:  jens.witzel@uzh.ch
phone: +41 44 63 56777
http://www.zi.uzh.ch



From: Jens Witzel/at/UZH
To: "David R Newman" <drn@ecs.soton.ac.uk>
Cc: eprints-tech@ecs.soton.ac.uk, jens.witzel@uzh.ch
Date: 26.07.2021 17:33
Subject: Antwort: Re: Antwort: Re: [EP-tech] Crawler ends up with 404, dont know how to handle MIME subtype wildcard




Hi David

thanks for your quick fix. I just tested it, and unfortunately I still get this ugly 404 :-/

Regards
Jens





From: "David R Newman" <drn@ecs.soton.ac.uk>
To: jens.witzel@uzh.ch
Cc: eprints-tech@ecs.soton.ac.uk
Date: 26.07.2021 16:03
Subject: Re: Antwort: Re: [EP-tech] Crawler ends up with 404, dont know how to handle MIME subtype wildcard




Hi Jens,

To fix your specific problem you need to modify perl_lib/EPrints/Apache/Rewrite.pm on or around line 422:

-                       &&  (index(lc($accept), "text/html") != -1 || index(lc($accept),"*/*") != -1 || $accept eq ""  )   ## header must be text/html, or */*, or undef
+                       &&  (index(lc($accept), "text/html") != -1 || index(lc($accept), "text/*") != -1 || index(lc($accept),"*/*") != -1 || $accept eq ""  )   ## header must be text/html, text/*, */* or undef

I am reviewing the implications of this change and whether any further changes are needed. The Accept MIME type is referenced in several other places, and I want to check whether sending text/* as the Accept type on other requests would still break things.

Regards

David Newman

On 26/07/2021 09:55, jens.witzel@uzh.ch wrote:


    Dear David

    thank you for your support!


    Kind regards
    Jens




    From: "David R Newman" <drn@ecs.soton.ac.uk>
    To: eprints-tech@ecs.soton.ac.uk, jens.witzel@uzh.ch
    Date: 26.07.2021 10:50
    Subject: Re: [EP-tech] Crawler ends up with 404, dont know how to handle MIME subtype wildcard





    Hi Jens,

    I can replicate the same problem on 3.4 GitHub HEAD [1].  I have created a GitHub issue for this [2] and will investigate.

    Regards  

    David Newman

    [1] https://github.com/eprints/eprints3.4 

    [2] https://github.com/eprints/eprints3.4/issues/159 

    On 26/07/2021 09:31, jens.witzel--- via Eprints-tech wrote:


    Dear all

    Unfortunately, one of our partners' crawlers reports a 404 error when downloading. The problem occurs when a wildcard is used as the MIME subtype.

    Here is an example from our repository ZORA, fetching publication no. 143147 with curl:

    HTTP 200 is returned when
    - no Accept header is specified: curl -v https://www.zora.uzh.ch/id/eprint/143147/
    - an exact MIME type is specified: curl -v -H 'Accept: text/html' https://www.zora.uzh.ch/id/eprint/143147/
    - the any-type wildcard is specified: curl -v -H 'Accept: */*' https://www.zora.uzh.ch/id/eprint/143147/

    HTTP 404 is returned if the MIME subtype is a wildcard, e.g. 'text/*':

    ==> curl -v -H 'Accept: text/*,application/*' https://www.zora.uzh.ch/id/eprint/143147/

    [...]
    < HTTP/1.1 404 Not Found
    < Date: Mon, 26 Jul 2021 08:23:04 GMT
    < Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.2k-fips mod_perl/2.0.11 Perl/v5.16.3
    < Cache-Control: no-store, no-cache, must-revalidate
    < Strict-Transport-Security: max-age=15780000
    < Transfer-Encoding: chunked
    < Content-Type: text/html; charset=utf-8

    The header "Accept: text/*,application/*" should be valid, so we think something goes wrong around line 948 of CRUD.pm:

        elsif( $subtype eq '*' ) {}
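For comparison, a wildcard-aware check parses each media range instead of searching for fixed substrings. The sketch below is hypothetical (accept_allows is not an EPrints function) and simplified: it honours type/*, */*, q=0 and an absent header, but ignores RFC 7231's rule that a more specific range overrides a less specific one (e.g. "text/*, text/html;q=0").

```perl
use strict;
use warnings;

# Hypothetical helper, not part of EPrints: true if media type $want
# (e.g. "text/html") is matched by some media range in the Accept header.
sub accept_allows {
    my ( $accept, $want ) = @_;

    # No (or an empty) Accept header means any media type is acceptable.
    return 1 if !defined $accept || $accept =~ /^\s*$/;

    my ( $want_type, $want_sub ) = split m{/}, lc $want, 2;

    for my $range ( split /,/, lc $accept ) {
        $range =~ s/^\s+|\s+$//g;    # trim surrounding whitespace

        # Split the media range from its parameters, e.g. "text/*;q=0.8".
        my ( $media, @params ) = split /\s*;\s*/, $range;
        next unless defined $media && $media =~ m{^([^/\s]+)/([^/\s]+)$};
        my ( $type, $subtype ) = ( $1, $2 );

        # A range with q=0 explicitly marks the type as unacceptable.
        my ($q) = map { /^q=(\d+(?:\.\d*)?)$/ ? $1 : () } @params;
        next if defined $q && $q == 0;

        return 1 if $type eq '*' && $subtype eq '*';
        return 1 if $type eq $want_type
            && ( $subtype eq '*' || $subtype eq $want_sub );
    }
    return 0;
}

print accept_allows( 'text/*,application/*', 'text/html' ) ? "ok\n" : "no\n";
```

With this approach, text/* is handled generically rather than by adding one more substring to the condition.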

    Is this a bug or is there a workaround? Any help is appreciated.

    Have a nice day
    Jens





    *** Options:
    http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
    *** Archive:
    http://www.eprints.org/tech.php/
    *** EPrints community wiki:
    http://wiki.eprints.org/