EPrints Technical Mailing List Archive
See the EPrints wiki for instructions on how to join this mailing list and related information.
Message: #08673
[EP-tech] Crawler ends up with 404, dont know how to handle MIME subtype wildcard
- To: <eprints-tech@ecs.soton.ac.uk>, <jens.witzel@uzh.ch>
- Subject: [EP-tech] Crawler ends up with 404, dont know how to handle MIME subtype wildcard
- From: <jens.witzel@uzh.ch>
- Date: Mon, 26 Jul 2021 10:31:49 +0200
Dear all
Unfortunately, one of our partner crawlers reports a 404 error during download. The problem occurs when a wildcard is used as the MIME subtype.
Here is an example from our repository ZORA. Let us try to fetch publication no. 143147 with curl.
HTTP status 200 is returned when
- no Accept header is specified: curl -v https://www.zora.uzh.ch/id/eprint/143147/
- an exact MIME type is specified: curl -v -H 'Accept: text/html' https://www.zora.uzh.ch/id/eprint/143147/
- any MIME type is specified: curl -v -H 'Accept: */*' https://www.zora.uzh.ch/id/eprint/143147/
HTTP status 404 is returned if only the MIME subtype is a wildcard, e.g. 'text/*':
==> curl -v -H 'Accept: text/*,application/*' https://www.zora.uzh.ch/id/eprint/143147/
[...]
< HTTP/1.1 404 Not Found
< Date: Mon, 26 Jul 2021 08:23:04 GMT
< Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.2k-fips mod_perl/2.0.11 Perl/v5.16.3
< Cache-Control: no-store, no-cache, must-revalidate
< Strict-Transport-Security: max-age=15780000
< Transfer-Encoding: chunked
< Content-Type: text/html; charset=utf-8
The header "Accept: text/*,application/*" should be valid, so we think something is going wrong around
CRUD.pm [line 948]: elsif( $subtype eq '*' ) {}
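To illustrate the behaviour we would expect, here is a minimal Python sketch of Accept-header matching in which a subtype wildcard such as 'text/*' matches any offered type with the same main type. This is not the EPrints CRUD.pm code. The function names and the list of offered types are hypothetical. The matching is also simplified: it tries ranges in order of q value and ignores the specificity precedence rules of the HTTP specification.

```python
def parse_accept(header):
    """Split an Accept header into (type, subtype, q) tuples, highest q first."""
    ranges = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        media, *params = [p.strip() for p in part.split(";")]
        if "/" not in media:
            continue  # skip malformed media ranges
        mtype, subtype = media.lower().split("/", 1)
        q = 1.0  # default quality when no q parameter is given
        for p in params:
            name, _, value = p.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((mtype, subtype, q))
    # sorted() is stable, so ranges with equal q keep their header order
    return sorted(ranges, key=lambda r: -r[2])


def matches(mtype, subtype, offered):
    """True if the media range mtype/subtype covers the offered MIME type."""
    otype, osub = offered.lower().split("/", 1)
    if mtype == "*" and subtype == "*":
        return True           # */* matches anything
    if mtype == otype and subtype == "*":
        return True           # text/* matches text/html, text/plain, ...
    return mtype == otype and subtype == osub


def negotiate(header, offered):
    """Return the first offered type acceptable to the client, else None.

    'offered' is a hypothetical list of MIME types the server can produce.
    """
    if not header.strip():
        return offered[0] if offered else None  # no Accept header: anything goes
    for mtype, subtype, q in parse_accept(header):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        for candidate in offered:
            if matches(mtype, subtype, candidate):
                return candidate
    return None


print(negotiate("text/*,application/*", ["text/html", "application/rdf+xml"]))
```

With matching like this, the header from the failing curl call would select text/html instead of producing a 404.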
Is this a bug or is there a workaround? Any help is appreciated.
Have a nice day
Jens
--
Jens Witzel
Zentrale Informatik
Universität Zürich
Stampfenbachstrasse 73
CH-8006 Zürich
mail: jens.witzel@uzh.ch
phone: +41 44 63 56777
http://www.zi.uzh.ch