EPrints Technical Mailing List Archive

See the EPrints wiki for instructions on how to join this mailing list and related information.

Message: #10063


< Previous (by date) | Next (by date) > | < Previous (in thread) | Next (in thread) > | Messages - Most Recent First | Threads - Most Recent First

Re: [EP-tech] Links dynamically generated lose the "/cgi" part


CAUTION: This e-mail originated outside the University of Southampton.

When we first tried to move to SSL connections, we ran into redirect loops. These turned out to be caused by the routing configuration on an intermediary scaling / firewall server. It had been set to forward all external traffic arriving on port 443 (HTTPS) for the repository to the standard port 80 (HTTP) on the repository server itself. The server therefore replied with a redirect to port 443. The browser followed that redirect, but the new request again arrived on port 80, so another redirect was sent, and so on until the browser gave up.

This is probably unrelated to your issue if you are connecting to your server directly, but it is worth checking whether there are any intermediate network layers in between.

Alan

From: eprints-tech-request@ecs.soton.ac.uk <eprints-tech-request@ecs.soton.ac.uk> on behalf of Gunnar Wolf <gwolf@gwolf.org>
Date: Monday, 24 March 2025 at 16:42
To: eprints-tech@ecs.soton.ac.uk <eprints-tech@ecs.soton.ac.uk>
Subject: Re: [EP-tech] Links dynamically generated lose the "/cgi" part

Hello David,

Thanks for the prompt reply! Let me walk through your answer:

David R Newman wrote [Mon, Mar 24, 2025 at 09:13:10AM +0000]:
> Hi Gunnar,
>
> There are various configuration settings in EPrints, which are mostly
> auto-generated from other settings or defaults.  The ones that generate
> URLs that should typically contain a /cgi/ are as follows:
>
> $c->{http_cgiurl}
> $c->{https_cgiurl}
> $c->{perl_url}
> $c->{rel_cgipath}
> $c->{http_cgiroot}
> $c->{https_cgiroot}
> $c->{userhome}

Getting them all from the running configuration yields:

     'http_cgiurl' => bless( do{\(my $o = '<safelinks_removed>')}, 'URI::http' ),
     'https_cgiurl' => bless( do{\(my $o = '<safelinks_removed>')}, 'URI::https' ),
     'perl_url' => '<safelinks_removed>',
     'rel_cgipath' => bless( do{\(my $o = '/cgi')}, 'URI::http' ),
     'http_cgiroot' => '/cgi',
     'userhome' => '/cgi/users/home',

So, I see https_cgiroot is not defined. I added it to 10_core.pl. OK, things
don't _visibly_ break anymore. Good!
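For reference, the addition could look like the following minimal sketch. The exact line added is not shown in this thread; the '/cgi' value is an assumption that simply mirrors the http_cgiroot value in the running configuration dumped above, and the file path assumes the iiec archive from the cfg.d listing.

```perl
# archives/iiec/cfg/cfg.d/10_core.pl (path assumed from the cfg.d listing)
# Assumed value: mirrors http_cgiroot ('/cgi') from the running configuration.
# This is a sketch; the exact line used in this thread is not shown.
$c->{https_cgiroot} = '/cgi';
```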

I also confirmed (and I think this is relevant for what I write a bit further
on) that the weird redirect cycle I asked about some days ago only happens if I
declare $c->{securehost} in my configuration: I temporarily enabled it and got
the cycles; when I disabled it, they went away.

> Based on your explanation, it looks like $c->{https_cgiurl} is the
> configuration setting at issue.  I would check under your archive's
> configuration (EPRINTS_PATH/archives/ARCHIVE_ID/cfg/cfg.d/) and see if this
> setting is set anywhere other than where you set it.  It may be that this
> configuration setting is not in your archive at all, so if you cannot find
> it there, go to your EPrints path and look for it under the following
> sub-paths (N.B. the use of the wildcard character under flavours and
> ingredients, as there may be multiple sub-directories):
>
> lib/cfg.d/
> flavours/*/cfg.d/
> site_lib/cfg.d/
> ingredients/*/cfg.d/

Thanks a lot for this explanation. I am still running EPrints 3.3, although one
of my projects for this year is to face a migration to 3.4 (is it going to hurt,
Doctor? 😉), so I have neither flavours nor ingredients... in fact, not even
site_lib. From my EPrints root:

     $ find . -name cfg.d
     ./archives/iiec/cfg/cfg.d
     ./lib/cfg.d
     ./lib/epm/irstats2/cfg/cfg.d
     ./lib/defaultcfg/cfg.d

None of them contains the "https_cgiroot" or "https_cgiurl" strings. However, I
took a quick dive into the Perl source looking for those strings, and found
some weird logic (even comments!).

In the _add_http_paths function of eprints/perl_lib/EPrints/Repository.pm,
there is a comment stating "Backwards-compatibility: http is fairly simple,
https may go wrong" (which matches my experience 😕). But, given that I'm not
configuring $c->{securehost}, as explained earlier, my code flow stays in the
"simple" path. Anyway, reading this function helps me understand why some
settings are repeated under different names...

> If you still cannot find the configuration setting, get back to me with
> what you have found and I shall see if I can advise further.  I doubt that
> any default configuration setting under any of these directories is
> missing "cgi", but it is possible it was accidentally or deliberately
> changed to fix another problem at some point.

Thank you very much for your help! There are many bits that make my head itch,
but at least the configuration now seems to be functional 😃

> Also, I would not usually set it the way that you did, although I
> think the way you did it should work.  I would set it like this:
>
> $c->{https_cgiurl} = "https://" . $c->{securehost} . "/cgi";
>
> This assumes that $c->{securehost} is already set in an earlier
> configuration file.  Configuration files load in alphanumeric order.  This
> may explain why you configured $c->{https_cgiurl} but a later
> configuration file then overwrote it.
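A minimal sketch of the load-order point, using a placeholder hostname and a hypothetical second file name (zz_https.pl). Because cfg.d files are read in alphanumeric order, $c->{securehost} must already be set by an earlier file when a later file builds the URL from it, and whichever file assigns $c->{https_cgiurl} last wins.

```perl
# --- cfg.d/10_core.pl (loaded first) ---
# Placeholder hostname; substitute the repository's real HTTPS host.
$c->{securehost} = 'repository.example.org';

# --- cfg.d/zz_https.pl (hypothetical file name; sorts after 10_core.pl) ---
# securehost is already defined at this point, so the URL is built from it.
# Any earlier assignment to https_cgiurl is overwritten by this one.
$c->{https_cgiurl} = "https://" . $c->{securehost} . "/cgi";
```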

OK. As I said, setting $c->{securehost} breaks my site, so I built this URL
using $c->{host} instead, and at least that didn't break 😉
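The host-based variant described above would presumably look something like this. It is a sketch rather than the exact line used, and it assumes $c->{host} is defined in an earlier configuration file (typically 10_core.pl).

```perl
# Sketch: build the HTTPS CGI URL from $c->{host}, leaving
# $c->{securehost} unset (setting it triggered the redirect loop here).
# Assumes $c->{host} has already been set by an earlier cfg.d file.
$c->{https_cgiurl} = "https://" . $c->{host} . "/cgi";
```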

thanks again,

   – Gunnar.