EPrints Technical Mailing List Archive

See the EPrints wiki for instructions on how to join this mailing list and related information.

Message: #08823



Re: [EP-tech] mixed-content warnings


CAUTION: This e-mail originated outside the University of Southampton.
Hi David,

Thank you.  I have also opened a similar issue based on yours in DataCiteDOI on IRStats2:


Would something like this work as a patch [i.e., can we rely on a $self->{session}->config ("securehost") just like a $c->{securehost} ]?

$self->{host} = defined $self->{session}->config( "host" ) ?  $self->{session}->config( "host" ) :  $self->{session}->config( "securehost" );
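
A minimal standalone sketch of this kind of fallback, for illustration only. The helper name referrer_host is hypothetical and is not part of IRStats2 or EPrints; $config is a plain hash reference standing in for the repository configuration. Unlike the one-liner above, it also treats an empty string as "not set", which is an assumption about the desired behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of IRStats2 or EPrints): return the first
# configured hostname, preferring "host" and falling back to "securehost".
# $config is a plain hash reference standing in for the repository config.
sub referrer_host
{
    my( $config ) = @_;

    foreach my $key ( qw( host securehost ) )
    {
        my $value = $config->{$key};
        # Treat undef and the empty string alike as "not set".
        return $value if defined $value && length $value;
    }

    # Neither key is set: undef in scalar context.
    return;
}

# HTTPS-only repository: "host" is undefined, so "securehost" is used.
print referrer_host( { securehost => "repo.example.org" } ), "\n";
```

In the plugin itself the two values would come from $self->{session}->config( ... ) rather than from a hash.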


Tomasz


________________________________________________

Tomasz Neugebauer
Senior Librarian | Bibliothécaire titulaire
Digital Projects & Systems Development Librarian / Bibliothécaire des Projets Numériques & Développement de Systèmes
Concordia University / Université Concordia

Tel. / Tél. 514-848-2424 ext. / poste 7738
Email / courriel: 
tomasz.neugebauer@concordia.ca

Mailing address / adresse postale: 1455 De Maisonneuve Blvd. W., LB-540-03, Montreal, Quebec H3G 1M8
Street address / adresse municipale: 1400 De Maisonneuve Blvd. W., LB-540-03, Montreal, Quebec H3G 1M8

library.concordia.ca


From: David R Newman <drn@ecs.soton.ac.uk>
Sent: Friday, January 7, 2022 10:39 AM
To: Tomasz Neugebauer <Tomasz.Neugebauer@concordia.ca>; eprints-tech@ecs.soton.ac.uk <eprints-tech@ecs.soton.ac.uk>
Subject: Re: [EP-tech] mixed-content warnings
 

Attention This email originates from outside the concordia.ca domain. // Ce courriel provient de l'exterieur du domaine de concordia.ca



Hi Tomasz,


Yes, that one will affect registering the referrer correctly in stats generation to log it as an internal referrer, so that could do with updating.  I am not currently responsible for maintaining the IRStats2 Bazaar plugin.  I will speak to my colleagues and other EPrints developers and see what can be done about maintaining this and making changes like the one required here.  There are several different branches of development that I am aware of for IRStats2 and we really need to see if this can be more linked up.


Regards


David Newman



On 07/01/2022 15:30, Tomasz Neugebauer wrote:
Hi David,

Thank you, much appreciated.  

About the IRStats2, I think maybe I was unclear, one of the references, the one in the config file, is indeed commented out by default, but this one is not:
$self->{host} = $self->{session}->config( "host" );
That is part of an initialization function, so I don't know whether having {host} undefined in the Referrer object in IRStats2 would break anything further along.

Best wishes,

Tomasz



From: David R Newman <drn@ecs.soton.ac.uk>
Sent: Friday, January 7, 2022 7:19 AM
To: Tomasz Neugebauer <Tomasz.Neugebauer@concordia.ca>; eprints-tech@ecs.soton.ac.uk <eprints-tech@ecs.soton.ac.uk>
Subject: Re: [EP-tech] mixed-content warnings
 




Hi Tomasz,


I have just been checking RFC 6265, which is about the HTTP State Management Mechanism (i.e. Cookies).  Point 6 of section 5.3 (see link below) makes clear that if the domain attribute is not set then it is assumed to be the same as the host of the current request:


https://www.rfc-editor.org/rfc/rfc6265#section-5.3


Therefore, I cannot see any scenario where the domain not being set would create a security or functional problem, as there should be no need to share EPrints cookie data between domains (e.g. multiple web sites at an institution).  The only two cookies that EPrints deploys by default are one for maintaining a logged-in user's session and another for the language they have chosen if the repository is multi-language.  Neither of these should need to be shared with other sites.  Even if they did, the effective setting for domain in the cookie will be the same whether $c->{host} is defined or not.  So the repository system administrator would have had to manually change the $c->{cookie_domain} setting to something other than the hostname of the repository, at which point the value for $c->{host} becomes inconsequential as it will no longer be playing a role in setting $c->{cookie_domain}.


This does not mean that the commit I made last night is inappropriate, just that functionally it will make no difference.  However, at some point in the future it may concern someone (like it did us, yesterday) that the value for $c->{cookie_domain} is not being set because $c->{host} is undefined (if a repository has been configured for HTTPS only).  So fixing this now will avoid concern further down the line.


Regards


David Newman


On 07/01/2022 00:47, David R Newman wrote:

Hi Tomasz,


Thanks for doing that review.  Point 2 (cookie_domain) is important to fix: although it does not appear to break anything from a user perspective, setting the domain on cookies matters.  I have fixed this with the latest commit:


https://github.com/eprints/eprints3.4/commit/bde3347551e0424fbbc166e52c9179b6e17b6704#diff-5d51fb282bd5d973fb2de0a82e36cdfb465b9e69b2782c9c923f8a24aeaaad97


I have added a GitHub  issue for the DataCiteDOI plugin:


https://github.com/eprintsug/DataCiteDoi/issues/52


The IRStats2 issue is less of a problem, as the code is commented out by default, so if someone uncomments it, they should spot that it is not working and be able to deal with the issue immediately, rather than not noticing that it has broken after changing their configuration to enable HTTPS only, as would be the case with the other two instances.


Thanks and regards


David Newman


On 06/01/2022 22:49, Tomasz Neugebauer wrote:
I did some grep on our configuration files, and found the following instances:

  1. DataCite DOI Minting

$c->{datacitedoi}{repoid} = $c->{host};

This comes from the DataCite DOI minting plugin in the Bazaar, but it is set in a configuration file, so I simply overrode it with "$c->{securehost}" in our local cfg/cfg.d/z_datacitedoi.pl

  2. Core lib/cfg.d/misc.pl
$c->{cookie_domain} = $c->{host};

Not sure what to do with this one?  Should I change it, or do something else about it, given that {host} is now undefined?

  3. IRStats2
Processor Referrer
$self->{host} = $self->{session}->config( "host" );

Not sure if this would still work now that "host" is undef?

Also, not an issue on our repo, but config file on irstats2 has this (optional code, commented out by default and on our repo):
my $hostname = $session->config( 'host' ) or return 0; 

Tomasz




From: Tomasz Neugebauer <Tomasz.Neugebauer@concordia.ca>
Sent: Thursday, January 6, 2022 3:34 PM
To: David R Newman <drn@ecs.soton.ac.uk>; eprints-tech@ecs.soton.ac.uk <eprints-tech@ecs.soton.ac.uk>
Subject: Re: [EP-tech] mixed-content warnings
 
Hi David,

Thank you for the detailed explanation and all of your work on this.
I did not know whether unsetting the {host} variable is the recommended way going forward, hence I hesitated, but given that our repository is HTTPS-only with HSTS, running on 3.4.3, unsetting {host} seems like the best way forward.  I will do that.  As I wrote, I did notice that this solves the issue on the testing server; it's just that I didn't know whether that setting is "supported".  These wiki help pages have the {host} set in the examples:
and I was also unaware of this page:
Let's add a link to the "Simplified HTTPS Configuration" page from some of these others?

I did track down the same line that you referenced (perl_lib/EPrints/URL.pm) while troubleshooting, so it is reassuring that I was on the right track: if ( EPrints::Utils::is_set( $session->config( "securehost" ) ) && ( $opts{scheme} eq "https" || !EPrints::Utils::is_set( $session->config( "host" ) ) ) )

I will search through our configuration files to make sure that "host" variable isn't used for something without a fallback, but I think that I will not find that.  I was more worried about breaking something in the core by unsetting that {host} variable, so your message was very helpful.

Best wishes,

Tomasz










From: David R Newman <drn@ecs.soton.ac.uk>
Sent: Thursday, December 23, 2021 8:00 PM
To: eprints-tech@ecs.soton.ac.uk <eprints-tech@ecs.soton.ac.uk>; Tomasz Neugebauer <Tomasz.Neugebauer@concordia.ca>
Subject: Re: [EP-tech] mixed-content warnings
 




Hi Tomasz,


Mixed content warnings are something I have been trying to improve in recent versions of EPrints, so new installs should not suffer these problems.  However, upgrades will still be problematic.  This is because old templates, citations, workflows and even CSS and JavaScript files may have http URLs in them.  This means you really need to go through all these files and seek out http URLs.


The main problem I have found is the use of http_url or http_cgiurl in templates, citations and even workflows.  These should ideally use rel_path and rel_cgipath instead, but as this does not give you the full URL it might be better to use base_url and perl_url instead.  However, to make sure that these are https not http, you will need to make sure you have either no 20_baseurls.pl or an up-to-date version of it in your archive's cfg/cfg.d/ (assuming you are running 3.4.1+, which it sounds like you are).  This is because of a change made for 3.4.1 to ensure that base_url and perl_url get configured as https if $c->{securehost} is defined.


It is worth grepping across all of your archive's cfg directory for the string "http:" to root out any hardcoded http URLs.
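
For anyone who prefers a script to grep, a rough equivalent can be sketched in Perl using only core modules. The function name find_http_urls is made up for this example, and the directory to scan is whatever you pass on the command line; nothing here is part of EPrints.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: list every line under $dir that contains "http:".
# Returns strings of the form "path:line_number: text".
sub find_http_urls
{
    my( $dir ) = @_;
    my @hits;

    find( { no_chdir => 1, wanted => sub {
        my $path = $File::Find::name;
        return unless -f $path;
        open( my $fh, "<", $path ) or return;  # skip unreadable files
        while( my $line = <$fh> )
        {
            # "https:" does not match, because "http" must be followed by ":".
            next unless $line =~ /http:/;
            chomp $line;
            push @hits, "$path:$.: $line";
        }
        close( $fh );
    } }, $dir );

    return @hits;
}

# Scan the directory given as the first command-line argument, if any.
if( @ARGV )
{
    print "$_\n" for find_http_urls( $ARGV[0] );
}
```

Saved under a name of your choosing, it would be run with the path to your archive's cfg directory as its only argument.  Some hits will be legitimate (XML namespace URIs, for example), so each one still needs a manual look.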


One of the things I did in recent versions of EPrints is provide a way of reconfiguring 10_core.pl to enable HTTPS everywhere more intuitively [1].  This ensures all http URL requests are redirected to https without needing to have picked up the HSTS header, which requires visiting an https URL at least once (and therefore does not work for stateless bots).  If you deploy HTTPS everywhere, then as well as running generate_apacheconf and reloading the webserver, you will need to make sure all browse views and abstract pages are regenerated.


As you comment in your email below, you are worried about unsetting $c->{host} as it may break things.  I am aware of one issue with this in 3.4.3 core code [2].  However, this is a fairly straightforward fix and is only a problem if you have multiple languages enabled for your repository.  If you use the Repository Links Bazaar plugin [3], that will also require a similar fix.  I think there may be one or two other Bazaar plugins that use $c->{host} but I cannot remember what they are off the top of my head.


If you look at perl_lib/EPrints/URL.pm line 129 [4] you should see the line:


if ( EPrints::Utils::is_set( $session->config( "securehost" ) ) && ( $opts{scheme} eq "https" || !EPrints::Utils::is_set( $session->config( "host" ) ) ) )


If you have HTTPS everywhere configuration enabled this should ensure HTTPS URLs are always used for things like the thumbnail URLs you describe having a problem with.  However, if you are not using HTTPS everywhere configuration you will still get http URLs for thumbnails and similar.  I would therefore recommend enabling this and I will see if I can track down the Bazaar plugins that may be affected by $c->{host} being undefined.
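
The effect of the condition quoted above can be sketched in isolation.  This is a simplified stand-in and not the EPrints source: prefers_https and the local is_set are hypothetical names, is_set only approximates EPrints::Utils::is_set for plain scalars, and $config is a plain hash standing in for the repository configuration.

```perl
use strict;
use warnings;

# Local approximation of EPrints::Utils::is_set for plain scalars only
# (an assumption made for this sketch).
sub is_set
{
    my( $v ) = @_;
    return defined $v && $v ne "";
}

# Returns 1 when the quoted condition would be true, 0 otherwise.
# $config is a hash reference; $scheme is the requested scheme string.
sub prefers_https
{
    my( $config, $scheme ) = @_;
    $scheme = "" unless defined $scheme;

    return ( is_set( $config->{securehost} )
        && ( $scheme eq "https" || !is_set( $config->{host} ) ) ) ? 1 : 0;
}

my %both = ( host => "repo.example.org", securehost => "repo.example.org" );
my %secure_only = ( securehost => "repo.example.org" );

print prefers_https( \%both, "http" ), "\n";         # 0: host set, scheme not https
print prefers_https( \%both, "https" ), "\n";        # 1: scheme explicitly https
print prefers_https( \%secure_only, "http" ), "\n";  # 1: host unset, securehost wins
```

The third case is the one that matters here: with host left undefined, the condition holds whatever scheme was requested, which matches the observation that commenting out $c->{host} made the thumbnail URLs come out as https.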


The problem with EPrints is it has gone through various iterations of HTTP/HTTPS use:


1. No HTTPS

2. HTTP for public pages and HTTPS for back-end admin pages.

3. HTTPS for all pages


This means that, as the code has evolved over time, configuring the appropriate URLs in various situations has got progressively more complicated, as ways of supporting these different approaches to HTTPS have been incorporated into EPrints over the years.  I go into a bit of detail about this in the EPrints 3.4.3 release page [5].  I still don't think this is perfect, as Bazaar plugins or bespoke archive code/configuration may require $c->{host} to be defined.  However, after a lot of consideration, the changes I made for 3.4.3 tried to make the best compromise between fixing the mixed content warnings, simplifying URL config variables and their use, and not seriously breaking existing repositories when they are upgraded.


Regards


David Newman


[1] https://wiki.eprints.org/w/Simplified_HTTPS_Configuration

[2] https://github.com/eprints/eprints3.4/issues/118

[3] http://bazaar.eprints.org/379/

[4] https://github.com/eprints/eprints3.4/blob/master/perl_lib/EPrints/URL.pm#L129

[5] https://wiki.eprints.org/w/EPrints_3.4.3#Configuration_URLs_and_Paths


On 23/12/2021 23:12, Tomasz Neugebauer via Eprints-tech wrote:
I thought that I resolved all of the "mixed content" warnings on our repository a while back, but after a recent upgrade from 3.3.12 to 3.4.3, I noticed that I have some mixed content warnings again, specifically on the thumbnails on the abstract pages.  I might have missed some of these warnings before, though, so this might not be a new issue after the upgrade.

Because I have HSTS headers, the browser redirects those requests to HTTPS, but I would like to fix it.  Both the SRC and the HREF of the thumbnails for PDFs are referenced as HTTP instead of HTTPS.  The only thing that fixed it during my testing was to remove (comment out) the $c->{host} line/variable in 10_core.pl.
That resolves the issue, but I'm worried to apply this change because I don't know if something else might rely on that variable.

I spent a good part of a day trying to follow the code, and I know that the {scheme} variable in URL.pm doesn't get properly set to https in the case of the thumbnails, but the code is so confusing when it comes to the thumbnail URLs that I can't figure out why.  I do have a suspicion that there is a bug in the core code somewhere, but perhaps it is something in our own configuration. 
I know this issue is not new to this list, in fact, I wrote the first drafts of the HSTS page on the Wiki (https://wiki.eprints.org/w/HTTPS-only_and_HSTS), but looking through the updated page there and any recent exchanges that relate to this didn't help me figure it out.  
Let me know if you have any ideas?

Best wishes,
Tomasz




*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/
