EPrints Technical Mailing List Archive

See the EPrints wiki for instructions on how to join this mailing list and related information.

Message: #09289



Re: [EP-tech] IRstats2 "origin of downloads"


CAUTION: This e-mail originated outside the University of Southampton.

Hi David,

 

Thank you.  I can confirm that this was the source of the issue for our repository.  The combination that I had was “open” but no local “lib” version of the GeoIP.dat file, just a global one.  That resulted in “open” being called with 1, which failed during process_stats without throwing an error that I could catch.  The code as it currently stands in the plugin fails when there is a “lib” GeoIP.dat file, as it would try to call “new” with the lib path, which fails.  Your updated code below looks like it would actually work in all cases; it would be a good idea to push that to the code.

 

Now that I have fixed it on our repo, and I can confirm that origin data is once again getting added to the tables, the inevitable question is: could I get process_stats to reprocess just a portion of the dataset, for example, everything after January 1, 2022?  Even better, could I get process_stats to reprocess everything after a certain date, but only modify the country of origin data?  Or are the only two options 1) to regenerate everything (in our case, more than 10 years' worth of stats) or 2) to leave the country of origin data missing for 2022?

 

Tomasz

From: David R Newman <drn@ecs.soton.ac.uk>
Sent: April 21, 2023 3:31 AM
To: eprints-tech@ecs.soton.ac.uk; Tomasz Neugebauer <Tomasz.Neugebauer@concordia.ca>
Subject: Re: [EP-tech] IRstats2 "origin of downloads"

 

Attention This email originates from outside the concordia.ca domain. // Ce courriel provient de l'extérieur du domaine de concordia.ca

 

 

Hi Tomasz,

In fact, I think I found an issue even with that amended code in one scenario, so in a development version of IRStats2 I have changed that to:

    foreach my $pkg ( 'Geo::IP', 'Geo::IP::PurePerl' )
    {
        if( EPrints::Utils::require_if_exists( $pkg ) )
        {
            if( $pkg !~ /PurePerl/ )
            {
                $self->{geoip} = $pkg->new( $dat_file ) if $dat_file eq '1';
                $self->{geoip} = $pkg->open( $dat_file ) if $dat_file ne '1';
            }
            else
            {
                $self->{geoip} = $pkg->new( $dat_file );
            }
            last if( defined $self->{geoip} );
        }
    }


Regards

David Newman

On 20/04/2023 11:23 pm, David R Newman via Eprints-tech wrote:

Hi Tomasz,

Ah, now that you say this, it rings a bell.  We have been looking at a new release of IRstats2 and I remember that there was a (pretty old) change since the last release that uses ->new for one of the GeoIP libraries and ->open for the other:

https://github.com/eprints/irstats2/commit/a84f22bf6d8a7faa9b6593afa97ef1a0cd360fcc

Regards

David Newman

On 20/04/2023 11:03 pm, Tomasz Neugebauer wrote:

Hm… I think that one thing that might have happened here is that when I upgraded to 3.4, I removed the GeoIP.dat file from the “lib” folder, thinking that the library is supposed to use “the global one” instead.  I’m talking about this part of the code:

 

my $dat_file = $self->{session}->config( "lib_path").'/geoip/GeoIP.dat';
    
# alternatively use the global one
$dat_file = 1 if( !-e $dat_file );   

 

However, now that I look at this code, I find myself really confused!  $dat_file is set to 1 if the path doesn’t exist?!  Let’s just say that doesn’t look solid to me.  What is setting that path to 1 supposed to accomplish?  In the loop later on, I needed to use the “open” command rather than the “new” command when I specify a path, here:

 

$self->{geoip} = $pkg->open( $dat_file );

 

However, if that path in lib doesn’t exist, I’m opening “1”, whatever that means – so that fails with an error; “new” works on it, but not sure what it does, then it defaults to the global one? I think that’s where possibly things are just failing.  When the file is there in the lib folder, it needs to be “open” command, not new, but when the file is not there, that’s when it’s “new” with that magic number “1”?

 

Tomasz

From: David R Newman <drn@ecs.soton.ac.uk>
Sent: April 19, 2023 3:23 PM
To: Tomasz Neugebauer <Tomasz.Neugebauer@concordia.ca>; eprints-tech@ecs.soton.ac.uk
Subject: Re: [EP-tech] IRstats2 "origin of downloads"

 

 

 

Hi Tomasz,

Another possibility is that the irstats2_countries table has crashed and is therefore no longer being updated.  However, I think you would likely have noticed this, as I suspect it would have prevented other parts of IRStats2 from updating: after the first failure, the irstats2_internal table would retain the flag saying stats are still being updated, and on subsequent days process_stats would not be able to run.  However, maybe a crashed table does not cause process_stats to fail in a way that leaves the update flag uncleared.

Otherwise, there are various checks in the code that will stop country of origin data being generated.  Here is the whole block of code.  Hopefully, you can work out from this whether something on your repository server does not meet the required criteria:

    # if possible, use the GeoIP data file shipped with EPrints
    my $dat_file = $self->{session}->config( "lib_path").'/geoip/GeoIP.dat';
    
    # alternatively use the global one
    $dat_file = 1 if( !-e $dat_file );   

    #Test Geo::IP first - it's faster!
    foreach my $pkg ( 'Geo::IP', 'Geo::IP::PurePerl' )
    {
        if( EPrints::Utils::require_if_exists( $pkg ) )
        {
            $self->{geoip} = $pkg->new( $dat_file );
            last if( defined $self->{geoip} );
        }
    }

    if( !defined $self->{geoip} )
    {
        $self->{advertise} = 0;
        $self->{disable} = 1;
        $self->{error} = "Failed to load required module for Processor::Access::Country. Country information will not be available.";
        return $self;
    }

This code is in the file EPRINTS_PATH/plugins/EPrints/Plugin/Stats/Processor/Access/Country.pm, so if you cannot work out whether the criteria are met, you could try adding some debug output and then running process_stats (or waiting for it to run overnight).  If this were failing, I would hope that wherever you are logging the output (STDOUT and STDERR) from process_stats would show the $self->{error} message above.
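
One way to do that kind of debugging without touching the plugin might be a small standalone script along these lines.  This is only a sketch and not part of IRStats2: constructor_for is a hypothetical helper, the new/open split simply follows the amended loop discussed elsewhere in this thread, the path to GeoIP.dat is whatever you pass on the command line, and 8.8.8.8 is just an arbitrary test address.

```perl
#!/usr/bin/perl
# Standalone GeoIP check (a sketch; not part of IRStats2).
use strict;
use warnings;

# Assumption: the local GeoIP.dat path is given as the first argument,
# e.g. <lib_path>/geoip/GeoIP.dat.  With no argument, or a missing file,
# fall back to the value 1, as the plugin code does.
my $dat_file = shift @ARGV;
$dat_file = 1 if( !defined $dat_file || !-e $dat_file );

# Hypothetical helper: which constructor to call for a given package.
sub constructor_for
{
    my( $pkg, $dat ) = @_;

    # Geo::IP::PurePerl is always called with new, as in the amended loop
    return 'new' if( $pkg =~ /PurePerl/ );

    # Geo::IP: new when $dat is the value 1, open when it is a file path
    return $dat eq '1' ? 'new' : 'open';
}

foreach my $pkg ( 'Geo::IP', 'Geo::IP::PurePerl' )
{
    # Turn the package name into a file name that require can load
    ( my $file = "$pkg.pm" ) =~ s{::}{/}g;
    if( !eval { require $file; 1 } )
    {
        print "$pkg: not installed\n";
        next;
    }

    my $method = constructor_for( $pkg, $dat_file );
    my $geoip = eval { $pkg->$method( $dat_file ) };
    if( !defined $geoip )
    {
        my $why = $@ || 'constructor returned undef';
        chomp $why;
        print "$pkg: $method( $dat_file ) failed: $why\n";
        next;
    }

    # Look up one address to confirm the database is actually usable
    my $code = $geoip->country_code_by_addr( '8.8.8.8' );
    $code = 'undef' if( !defined $code );
    print "$pkg: $method( $dat_file ) ok, 8.8.8.8 => $code\n";
}
```

Running it once with the path to the local GeoIP.dat and once with no argument should show which package loads and whether a lookup succeeds in each case.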

Regards

David Newman

On 19/04/2023 5:33 pm, Tomasz Neugebauer wrote:

Hi David,

 

Thanks for that, but I do believe that we have those packages installed. We are running on Ubuntu 18.04 if that makes a difference.

Here is some output from apt:

 

 apt list | grep geoip-database

 

geoip-database/bionic,now 20180315-1 all [installed]

geoip-database-contrib/bionic 1.19 all

geoip-database-extra/bionic,now 20180315-1 all [installed,automatic]

 

 apt list | grep libgeo-ip-perl

 

      libgeo-ip-perl/bionic,now 1.51-1 amd64 [installed]

 

Tomasz

From: David R Newman <drn@ecs.soton.ac.uk>
Sent: Wednesday, April 19, 2023 5:42 AM
To: eprints-tech@ecs.soton.ac.uk <eprints-tech@ecs.soton.ac.uk>; Tomasz Neugebauer <Tomasz.Neugebauer@concordia.ca>
Subject: Re: [EP-tech] IRstats2 "origin of downloads"

 

 

 

Hi Tomasz,

You will need an operating system package and a Perl library (which can probably be installed as an OS package) for that to work.  On Ubuntu 20.04 that is:

apt install geoip-database libgeo-ip-perl

On Enterprise Linux (RHEL, CentOS, Fedora, Rocky Linux, Alma Linux, etc.) it is:

yum install GeoIP perl-Geo-IP

I do not think you will be able to get back missing country of origin data without regenerating your stats from scratch but this should start being added in future.

Regards

David Newman

On 18/04/2023 21:29, Tomasz Neugebauer via Eprints-tech wrote:

I’m trying to figure out why the “origin of downloads” country data has stopped working for our repository.

I see much of the old (pre-2022) data is still there, but starting in 2022, “origin of downloads” (the map visualization) contains no data.

Any idea as to what could be causing that or where to look for the issue?

It’s not just the visualization of the map, because that still works for “all” and earlier date ranges.

It looks like the country data is not there starting in 2022?

 

Tomasz

*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/