EPrints Technical Mailing List Archive

See the EPrints wiki for instructions on how to join this mailing list and related information.

Message: #03141



[EP-tech] Re: Browse pages


You might wish to run generate_views (for one id) under the supervision of
a profiler to see where it spends most of its time, e.g. Devel::NYTProf.
Read up on it, but basically you just run the script under
perl -d:NYTProf to collect the data and then run nytprofhtml to get call
counts and the time spent in each subroutine/line/block.
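NYTProf itself is Perl-only. As a rough analogue of the same collect-then-report workflow, here is a sketch using Python's standard cProfile and pstats modules. The functions fetch_authors, group_authors and generate_view are hypothetical stand-ins, not EPrints code.

```python
import cProfile
import io
import pstats


def fetch_authors(n):
    # Hypothetical stand-in for pulling every author out of the database.
    return [f"Author {i % 500}" for i in range(n)]


def group_authors(authors):
    # Hypothetical stand-in for the per-author bookkeeping a view might do.
    counts = {}
    for name in authors:
        counts[name] = counts.get(name, 0) + 1
    return counts


def generate_view(n):
    return group_authors(fetch_authors(n))


def profile(func, *args, top=5):
    """Run func under cProfile; return its result and a text report
    of the `top` entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(top)
    return result, out.getvalue()


if __name__ == "__main__":
    counts, report = profile(generate_view, 200_000)
    print(report)
```

From the command line the rough equivalent is python -m cProfile -s cumulative script.py. Either way, sorting by cumulative time first shows which top-level step dominates, and you can then drill into the calls beneath it.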

I remember managing to cut some 20% off the total generate_views run time
through specific optimizations (a lot depends on your views.pl
configuration, so there's no panacea; in fact, tweaking views.pl may bring
much more improvement than optimizing the code).

However, in the end it was precisely the roundabout way of generating
output, by building in-memory data structures rather than producing text
fragments directly, that accounted for a lot of overhead spread across the
whole code base ("distributed" overhead). That is not easy to fix without
redesigning EPrints entirely.
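To illustrate the two styles, here is a sketch in Python, assuming a DOM-style pipeline; it is not EPrints' own code. Both functions render the same author list as a <ul>. The first builds an in-memory element tree with xml.etree.ElementTree and serializes it at the end. The second appends escaped text fragments directly. The names render_via_tree and render_direct are hypothetical.

```python
import timeit
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def render_via_tree(names):
    # Build a node object for every element, then walk the tree to serialize it.
    ul = ET.Element("ul")
    for name in names:
        li = ET.SubElement(ul, "li")
        li.text = name
    return ET.tostring(ul, encoding="unicode")


def render_direct(names):
    # Write escaped text fragments straight into a list and join once.
    parts = ["<ul>"]
    for name in names:
        parts.append("<li>" + escape(name) + "</li>")
    parts.append("</ul>")
    return "".join(parts)


if __name__ == "__main__":
    names = [f"Author {i}, A. & Co" for i in range(10_000)]
    assert render_via_tree(names) == render_direct(names)
    for fn in (render_via_tree, render_direct):
        seconds = timeit.timeit(lambda: fn(names), number=20)
        print(f"{fn.__name__}: {seconds:.3f}s")
```

For a non-empty list both produce identical markup. For an empty list ElementTree emits <ul /> instead. How large the speed gap is depends on the interpreter and the data, so any timings are indicative only.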

In theory, code that just writes a bunch of files should not be cooking
the CPU. But this is Perl, and Perl is just not great at performing
millions of memory operations or nested subroutine calls.
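The fixed cost per call is easy to see in any interpreted language. Here is a rough Python analogue, not Perl and not EPrints code. It computes the same total twice: once inline, and once through three layers of small helper functions (the hypothetical outer/middle/inner). The results are identical; only the number of calls differs.

```python
import timeit


def inner(x):
    return x * 2


def middle(x):
    return inner(x) + 1


def outer(x):
    return middle(x) - 1


def total_nested(n):
    # Three Python-level function calls per item.
    return sum(outer(i) for i in range(n))


def total_inline(n):
    # Same arithmetic with no helper calls: ((i * 2) + 1) - 1 == i * 2.
    return sum(i * 2 for i in range(n))


if __name__ == "__main__":
    n = 1_000_000
    assert total_nested(n) == total_inline(n)
    for fn in (total_nested, total_inline):
        print(f"{fn.__name__}: {timeit.timeit(lambda: fn(n), number=3):.2f}s")
```

On CPython the nested version is typically noticeably slower, although exact ratios vary by machine and version.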

On 06/10/2014 03:20 PM, Ian Stuart wrote:
> On 10/06/14 14:08, Yuri wrote:
>> On 10/06/2014 14:57, Ian Stuart wrote:
>>> On 10/06/14 12:39, John Salter wrote:
>>>> If you set up a cron job to regenerate the page twice a day (so it's
>>>> never older than a day), does that help things?
>>> Unfortunately not... because generate_views takes several days to
>>> complete (160,000+ records, all multiple-author; "several" is a BIG
>>> number), the view is out of date before it's even finished!
>>>
>>> :)
>>>
>>
>> Can't you just delete them every day, early in the morning? That way
>> they're regenerated on request, so they're always up to date as of the
>> current day.
> 
> Surely that's no different to either letting them auto-generate, or 
> doing a "generate_views"?
> 
> (The problem is not the regeneration... but the fact that EPrints seems
> to spend 4 minutes in a tight loop, having spent 9 seconds slurping
> every single author out of the database...)
>