EPrints Technical Mailing List Archive


Message: #05634



Re: [EP-tech] Eprints-tech Digest, Vol 91, Issue 43


Hi Adam

For robot crawling, we have a separate view that creates no variations. The robots.txt is configured accordingly, so that the other views cannot be crawled.
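
As a rough illustration of such a setup, here is a minimal Python sketch that checks a robots.txt of this kind with the standard-library parser. The view id "crawl", the bot name and the robots.txt content are assumptions made up for the example and are not taken from any real configuration; only the person view path comes from this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one crawlable view ("crawl" is an assumed id),
# all other browse views blocked. Python's parser applies the first rule
# that matches, so the Allow line has to come before the Disallow line.
ROBOTS_TXT = """\
User-agent: *
Allow: /view/crawl/
Disallow: /view/
"""


def allowed(path, agent="ExampleBot"):
    """Return True if the sample robots.txt lets `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


for path in ("/view/crawl/index.html",
             "/view/person/Kaiser=3AMario=3A=3A.html",
             "/15081/"):
    print(path, allowed(path))
```

Real crawlers may resolve Allow/Disallow conflicts differently (for example by longest match), so the actual file should be tested against the robots that matter.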

Best regards

Martin




On 27.04.2016 at 17:20, Adam Field <Adam.Field@jisc.ac.uk> wrote:

Deleting old files is good, but not a full solution: you are likely to be crawled by all sorts of robots, so all the files will be generated again. It will only help if there are lots of files left over from an old configuration that were never deleted.

--generate menus won't help either, I'm afraid. It will only regenerate the menus, which is unrelated to the problem you're having.

I've had a peek at the XML of one of your records and made some assumptions about the nature of your repository. Have you thought of filtering the person_view field so that it only contains institutional authors? For example, https://eref.uni-bayreuth.de/15081/ contains:

    <person_view>
      <item>
        <name>
          <family>Bornkamm</family>
          <given>Joachim</given>
        </name>
      </item>
      <item>
        <name>
          <family>Brömmelmeyer</family>
          <given>Christoph</given>
        </name>
      </item>
      <item>
        <name>
          <family>Brönneke</family>
          <given>Tobias</given>
        </name>
      </item>
      <item>
        <name>
          <family>Bultmann</family>
          <given>Friedrich</given>
        </name>
      </item>
      <item>
        <name>
          <family>Busch</family>
          <given>Dörte</given>
        </name>
      </item>
      <item>
        <name>
          <family>Derleder</family>
          <given>Peter</given>
        </name>
      </item>
      <item>
        <name>
          <family>Ernst</family>
          <given>Stefan</given>
        </name>
      </item>
      <item>
        <name>
          <family>Hirsch</family>
          <given>Günter</given>
        </name>
      </item>
      <item>
        <name>
          <family>Hörmann</family>
          <given>Günter</given>
        </name>
      </item>
      <item>
        <name>
          <family>Kohte</family>
          <given>Wolfhard</given>
        </name>
      </item>
      <item>
        <name>
          <family>Maier</family>
          <given>Arne</given>
        </name>
      </item>
      <item>
        <name>
          <family>Metz</family>
          <given>Rainer</given>
        </name>
      </item>
      <item>
        <name>
          <family>Rott</family>
          <given>Peter</given>
        </name>
      </item>
      <item>
        <name>
          <family>Schmidt-Kessel</family>
          <given>Martin</given>
        </name>
        <ubt>yes</ubt>
      </item>
      <item>
        <name>
          <family>Schwintowski</family>
          <given>Hans-Peter</given>
        </name>
      </item>
      <item>
        <name>
          <family>Stadler</family>
          <given>Astrid</given>
        </name>
      </item>
      <item>
        <name>
          <family>Tamm</family>
          <given>Marina</given>
        </name>
      </item>
      <item>
        <name>
          <family>Tiffe</family>
          <given>Achim</given>
        </name>
      </item>
      <item>
        <name>
          <family>Tonner</family>
          <given>Klaus</given>
        </name>
      </item>
    </person_view>

You'd still get good utility from your repository if it contained only:

    <person_view>
      <item>
        <name>
          <family>Schmidt-Kessel</family>
          <given>Martin</given>
        </name>
        <ubt>yes</ubt>
      </item>
    </person_view>

I'm assuming you really only care deeply about people with the ubt flag set to 'yes'.
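
To make the filtering idea concrete, here is a minimal Python sketch that keeps only the items flagged with ubt = yes. It works on a shortened copy of the XML above and only illustrates the logic; in a real repository the filtering would presumably happen in the EPrints configuration where person_view is populated, which is not shown here. The function name is made up for the example.

```python
import xml.etree.ElementTree as ET

# Shortened sample of the person_view export shown above.
SAMPLE = """<person_view>
  <item><name><family>Rott</family><given>Peter</given></name></item>
  <item>
    <name><family>Schmidt-Kessel</family><given>Martin</given></name>
    <ubt>yes</ubt>
  </item>
  <item><name><family>Tonner</family><given>Klaus</given></name></item>
</person_view>"""


def institutional_only(xml_text):
    """Parse person_view XML and drop every item whose ubt flag is not 'yes'."""
    root = ET.fromstring(xml_text)
    # Iterate over a copy of the list because items are removed while looping.
    for item in list(root.findall("item")):
        if item.findtext("ubt") != "yes":
            root.remove(item)
    return root


filtered = institutional_only(SAMPLE)
names = [item.findtext("name/family") for item in filtered.findall("item")]
print(names)
```

With the field reduced like this, the person view would only need pages for institutional authors, which is where the reduction in file count would come from.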

 


Adam Field
SHERPA services analyst developer


From: <eprints-tech-bounces@ecs.soton.ac.uk> on behalf of Verena Mattes <verena.mattes@ub.uni-bayreuth.de>
Reply-To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
Date: Wednesday, 27 April 2016 14:37
To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
Subject: Re: [EP-tech] Eprints-tech Digest, Vol 91, Issue 43

Hi Martin,

The deletion of older files is definitely a good idea, I'm looking at
that now.
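
A minimal sketch of what such a cleanup could look like, written in Python purely for illustration. The function names and the dry-run default are assumptions for the example; the 24-hour threshold is the one Martin mentioned, and the directory in the usage comment is the one from the error log.

```python
import os
import time
from pathlib import Path


def stale_files(directory, max_age_hours=24, now=None):
    """Return regular files below `directory` last modified more than
    `max_age_hours` ago, sorted by path."""
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    return sorted(
        p for p in Path(directory).rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def delete_stale(directory, max_age_hours=24, dry_run=True):
    """Delete stale files; with dry_run=True only report what would go."""
    victims = stale_files(directory, max_age_hours)
    if not dry_run:
        for path in victims:
            path.unlink()
    return victims


# Possible use from a nightly job (nothing is deleted unless dry_run=False):
# delete_stale("/usr/share/eprints3/archives/ubt_eref/html/de/view/person")
```

Running it with dry_run=True first shows how many files would be removed before anything is actually deleted.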

Not sure about the --generate menus option, but I'm going to try it out
with our Eprints test repository.

Thanks!

Verena


> Date: Wed, 27 Apr 2016 11:18:12 +0200
> Subject: [EP-tech] Reply: Re: Problems with view generation: EPrints
> System Error
> Message-ID:
>
> Content-Type: text/plain; charset="iso-8859-1"
>
>
> Hi Verena,
>
> did you try out the --generate menus  option of generate_views? This
> reduces the number of files considerably.
>
> Also, in our nightly cron job, we delete all files that are older than 24
> hours.
>
> Best regards,
>
> Martin
>
> --
> Dr. Martin Brändle
> Zentrale Informatik
> Universität Zürich
> Stampfenbachstr. 73
> CH-8006 Zürich
>
> phone: +41 44 63 56705
> fax: +41 44 63 54505
>
>
>
> From: Verena Mattes <verena.mattes@ub.uni-bayreuth.de>
> Date: 27/04/2016 10:54
> Subject: Re: [EP-tech] Problems with view generation: EPrints System
>              Error
>
>
>
> Hi Adam,
>
> For now, my colleagues in the IT service centre solved the problem by
> raising the number of files permitted in one directory on our NetApp,
> but that is only a temporary solution. Almost 340,000 files in one
> directory is just too many, so I'll have to find a way to split the
> person view into groups somehow.
>
> Your suggestion of checking the number of variations should be a first
> step in reducing the number of files. Here's our current view
> configuration, which I'm going to change by taking away the DEFAULT
> variation:
>
>>         {
>>                 id => "person",
>>                 hideempty => 0,
>>                 allow_null => 0,
>>                 menus => [
>>                         {
>>                                 fields => [ "person_view_name" ],
>>                                 mode => "sections",
>>                                 grouping_function => "EPrints::Update::Views::group_by_first_character",
>>                                 group_range_function => "EPrints::Update::Views::cluster_ranges_40",
>>                                 new_column_at => [40],
>>                                 open_first_section => 1,
>>                         },
>>                 ],
>>                 order => "-date;res=year/title/publication/book_title",
>>                 hideup => 0,
>>                 nocount => 0,
>>                 notimestamp => 0,
>>                 include => 1,
>>                 variations => [
>>                         "type",
>>                         "date;truncate=4,reverse",
>>                         "DEFAULT",
>>                 ],
>>                 citation => "view",            # views with full-text indicator!
>>         },
>
> Thanks for all your help!
>
> Verena
>
>>
>> Message: 1
>> Date: Tue, 26 Apr 2016 08:35:49 +0000
>> From: Adam Field <Adam.Field@jisc.ac.uk>
>> Subject: Re: [EP-tech] Problems with view generation: EPrints System
>> Error
>> Content-Type: text/plain; charset="utf-8"
>>
>> My previous suggestion wasn't right and won't work exactly as expected on
>> reflection.
>>
>> What happens when you run generate_views from the command line?
>> Just how many files do you have in the directory on the hard disk?
>>
>> Can you paste in the browse view configuration?  It may be that turning
>> off variations will solve this.
>>
>>
>> Adam Field
>> SHERPA services analyst developer
>>
>>
>> From: <eprints-tech-bounces@ecs.soton.ac.uk> on behalf of Verena Mattes
>> Date: Tuesday, 26 April 2016 07:48
>> Subject: Re: [EP-tech] Problems with view generation: EPrints System
>> Error
>>
>> Hi Alan and Adam,
>>
>> Thanks for your suggestions. We are running EPrints 3.3.11 and the free
>> space on our hard disk is not a problem. I checked with a colleague and
>> he confirmed my guess that the problem is caused by the number of files
>> in the directory: evidently the maximum number was reached last week and
>> no further files can be created.
>> Does anybody have a quick idea on how to divide files for one view, the
>> person view, to different directories? Would it be possible to define
>> different views for groups of letters, e.g. A-D or something like that?
>>
>> Thanks!
>>
>> Verena
>>
>>
>>
>> Date: Mon, 25 Apr 2016 10:58:47 +0000
>> Subject: Re: [EP-tech] Problems with view generation: EPrints System
>> Error
>> Content-Type: text/plain; charset="utf-8"
>>
>> What specific version of EPrints are you running?  What is line 1550
>> of /usr/share/eprints3/perl_lib/EPrints/Update/Views.pm?
>>
>> Also, silly question, but how much free space do you have on your hard
>> disk?
>>
>>
>> Adam Field
>> SHERPA services analyst developer
>>
>>
>> From: <eprints-tech-bounces@ecs.soton.ac.uk> on behalf of Verena Mattes
>> Date: Monday, 25 April 2016 09:51
>> Subject: [EP-tech] Problems with view generation: EPrints System Error
>>
>> Hello,
>>
>> since last week, we've had problems with the view generation for our
>> author/person view. These problems specifically concern the views for
>> new author names, which are not generated, while the views for already
>> existing author names are updated.
>>
>> For each author name concerned, there is an entry in the apache error
>> log:
>>
>> Error writing to /usr/share/eprints3/archives/ubt_eref/html/de/view/person/Kaiser=3AMario=3A=3A.export: File too large
>> ------------------------------------------------------------------
>>       at /usr/share/eprints3/perl_lib/EPrints/Update/Views.pm line 1550
>>              EPrints::Update::Views::output_files('EPrints::Repository=HASH(0x7fb0663688d0)', '/usr/share/eprints3/archives/ubt_eref/html/de/view/person/Kai...', 'XML::LibXML::DocumentFragment=SCALAR(0x7fb06a7e5938)', '/usr/share/eprints3/archives/ubt_eref/html/de/view/person/Kai...', 'XML::LibXML::Element=SCALAR(0x7fb06a45f160)', '/usr/share/eprints3/archives/ubt_eref/html/de/view/person/Kai...', 'XML::LibXML::DocumentFragment=SCALAR(0x7fb06a7e5938)', '/usr/share/eprints3/archives/ubt_eref/html/de/view/person/Kai...', 'XML::LibXML::DocumentFragment=SCALAR(0x7fb06a668aa8)', ...) called at /usr/share/eprints3/perl_lib/EPrints/Update/Views.pm line 935
>>              EPrints::Update::Views::update_view_list('EPrints::Repository=HASH(0x7fb0663688d0)', '/usr/share/eprints3/archives/ubt_eref/html/de/view/person/Kai...', 'de', 'EPrints::Update::Views=HASH(0x7fb06a6863f0)', 'ARRAY(0x7fb064b78c70)') called at /usr/share/eprints3/perl_lib/EPrints/Update/Views.pm line 259
>>              EPrints::Update::Views::update_view_file('EPrints::Repository=HASH(0x7fb0663688d0)', 'de', '/view/person/Kaiser=3AMario=3A=3A.html', '/view/person/Kaiser=3AMario=3A=3A.html') called at /usr/share/eprints3/perl_lib/EPrints/Apache/Rewrite.pm line 513
>>              EPrints::Apache::Rewrite::handler('Apache2::RequestRec=SCALAR(0x7fb06a471710)') called at -e line 0
>>              eval {...} called at -e line 0
>>
>> In this specific case, the author's name is linked to only 2 EPrints
>> entries, so it is unlikely that this file would be particularly large.
>>
>> Does anybody have an idea concerning this? I would appreciate any help.
>>
>> Thanks!
>>
>> Verena


