EPrints Technical Mailing List Archive
See the EPrints wiki for instructions on how to join this mailing list and related information.
Message: #03141
[EP-tech] Re: Browse pages
- To: eprints-tech@ecs.soton.ac.uk
- Subject: [EP-tech] Re: Browse pages
- From: Jan Ploski <jpl@plosquare.com>
- Date: Tue, 10 Jun 2014 15:35:08 +0200
You might wish to run generate_views (for one id) under a profiler to see where it spends most of its time, e.g. NYTProf (Devel::NYTProf). Read up on it, but basically you just add some command-line options to collect data and then run nytprofhtml to get the call counts and the time spent in each subroutine/line/block.

I remember managing to cut some 20% off the total generate_views run time with specific optimizations. A lot depends on your views.pl configuration, so there is no panacea; in fact, tweaking views.pl may bring much more improvement than optimizing the code.

However, in the end it was precisely the roundabout way of generating output, building in-memory data structures rather than producing text fragments directly, that accounted for a lot of "distributed" overhead, and that is not easy to fix without redesigning EPrints entirely. In theory, code that just writes lots of files should not be cooking the CPU. But this is Perl, and Perl is just not great at performing millions of memory operations or nested subroutine calls.

On 06/10/2014 03:20 PM, Ian Stuart wrote:
> On 10/06/14 14:08, Yuri wrote:
>> On 10/06/2014 14:57, Ian Stuart wrote:
>>> On 10/06/14 12:39, John Salter wrote:
>>>> If you set up a cron job to regenerate the page twice a day (so it's
>>>> never older than a day), does that help things?
>>> Unfortunately not..... because the generate_views takes several days to
>>> complete (160,000+ records, all multiple-authors - "several" is a BIG
>>> number) then the view is out of date before it's even finished!
>>>
>>> :)
>>>
>>
>> can't you just delete them every day, early in the morning? In this way,
>> they're regenerated upon request, thus always updated at the current day.
>
> Surely that's no different to either letting them auto-generate, or
> doing a "generate_views"?
>
> (the problem is not the regeneration.... but the fact that EPrints seems
> to spend 4 minutes in a tight loop, having spent 9 seconds slurping
> every single author out the database...
>
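The contrast Jan draws between building in-memory data structures and producing text fragments directly can be illustrated with a minimal sketch. This is not EPrints code (EPrints is written in Perl): it is a hypothetical Python example in which `render_tree` and `render_streaming` are illustrative names, and the tree-building style uses the standard library's `xml.etree.ElementTree`. Both functions emit the same HTML list of author names.

```python
import html
import io
import xml.etree.ElementTree as ET
from typing import Iterable, TextIO


def render_tree(names: Iterable[str]) -> str:
    """Build the whole list as an element tree, then serialize it in a second pass."""
    ul = ET.Element("ul")
    for name in names:
        li = ET.SubElement(ul, "li")  # one element object per list item
        li.text = name
    # Second pass: walk the finished tree and turn it into text.
    # short_empty_elements=False writes "<ul></ul>" instead of "<ul />".
    return ET.tostring(ul, encoding="unicode", short_empty_elements=False)


def render_streaming(names: Iterable[str], out: TextIO) -> None:
    """Write each fragment straight to the output; no intermediate tree is built."""
    out.write("<ul>")
    for name in names:
        # quote=False escapes only &, < and >, matching ElementTree's text escaping.
        out.write("<li>" + html.escape(name, quote=False) + "</li>")
    out.write("</ul>")


if __name__ == "__main__":
    buf = io.StringIO()
    render_streaming(["Smith, J.", "O'Brien & Co"], buf)
    print(buf.getvalue())
```

The tree version allocates one element object per item and then walks the finished tree again to serialize it; the streaming version does neither. How much that costs in practice is what a profiler should tell you: running a script with `python -m cProfile -s cumulative` reports per-function call counts and times, much as nytprofhtml does for Perl.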
- References:
  - [EP-tech] Re: Browse pages
    - From: Andrew Beeken <anbeeken@lincoln.ac.uk>
  - [EP-tech] Re: Browse pages
    - From: Ian Stuart <Ian.Stuart@ed.ac.uk>
  - [EP-tech] Re: Browse pages
    - From: John Salter <J.Salter@leeds.ac.uk>
  - [EP-tech] Re: Browse pages
    - From: Ian Stuart <Ian.Stuart@ed.ac.uk>
  - [EP-tech] Re: Browse pages
    - From: Yuri <yurj@alfa.it>
  - [EP-tech] Re: Browse pages
    - From: Ian Stuart <Ian.Stuart@ed.ac.uk>
  - [EP-tech] Re: Browse pages