EPrints Technical Mailing List Archive

See the EPrints wiki for instructions on how to join this mailing list and related information.

Message: #08999



Re: [EP-tech] Limit Export-search-results (max_items for export)


CAUTION: This e-mail originated outside the University of Southampton.

Another option is to disallow /cgi/ in robots.txt (but of course there are always crawlers that don’t respect robots.txt – if we find such a crawler, we ban it in our apachebots configuration).
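
As a rough illustration of the robots.txt idea, here is a minimal sketch using Python's standard urllib.robotparser to check what a well-behaved crawler would conclude from such a rule. Only the "Disallow: /cgi/" rule and the /cgi/exportview path come from this thread; the "User-agent: *" line, the ExampleBot name, the /view/year/ path and the is_allowed helper are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; only "Disallow: /cgi/" is taken from the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi/
"""


def is_allowed(path: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if a crawler that honours robots.txt may fetch the path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # /cgi/exportview is the uncached export script mentioned in the thread;
    # /view/year/ is an illustrative browse-view path.
    for path in ("/cgi/exportview", "/view/year/"):
        print(path, "allowed" if is_allowed(path) else "disallowed")
```

This only describes crawlers that honour robots.txt; the ones that ignore it still have to be blocked by other means.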

Regards,

Martin

 

--

Dr. Martin Brändle
Zentrale Informatik
Universität Zürich
Stampfenbachstr. 73
CH-8006 Zürich

 

 

From: eprints-tech-bounces@ecs.soton.ac.uk <eprints-tech-bounces@ecs.soton.ac.uk> on behalf of Yuri via Eprints-tech <eprints-tech@ecs.soton.ac.uk>
Date: Wednesday, 6 July 2022 at 09:45
To: eprints-tech@ecs.soton.ac.uk <eprints-tech@ecs.soton.ac.uk>
Subject: Re: [EP-tech] Limit Export-search-results (max_items for export)

A solution could be to add some JavaScript to the export button's click handler so that it submits the form. Crawlers will not run it, while browsers will.

On 06/07/22 09:22, Stenger, Avischai via Eprints-tech wrote:

Hi David,

 

You got it. This is exactly our problem. Some crawlers request exports that contain a very large number of records, which brings our hardware to its knees.

 

Regards,

 


Avischai

On 05.07.2022 at 16:58, David R Newman via Eprints-tech <eprints-tech@ecs.soton.ac.uk> wrote:

 

Hi Avischai,

Unfortunately, I don't think there is a way of limiting the number of records that can be exported.  I think the consideration at the time was that browse view web pages with a large number of items can take a long time to load (even when cached) and are not particularly useful to someone using a web browser, as the page will be really long (i.e. take forever to scroll through).  So rather than putting load on the server to generate such a web page, it is easier just to say, "this page has too many items to display".  The opposite is true of exports, which are typically machine-readable and therefore either used for some automated analysis or post-processed (e.g. truncated to only the first n items) before being displayed to a real user.  If an export were truncated or restricted because it had what was deemed "too many items", this would prevent the analysis/post-processing or render it useless.  I am not sure what other people's thoughts are about this?

I think I may appreciate what might be your more general point, which is the high processing cost of generating these large exports.  If a crawler goes through your browse views and requests every export format for some of these really long listings of items, it can put quite some load on the server (/cgi/exportview is not cached).  Sometimes there can be multiple connections (maybe even 20+) from the same IP address requesting view listing exports.  I have observed crawlers doing this on a number of EPrints repositories and have had to resort to blocking the IP addresses, at least temporarily.  For a future version of EPrints, we have been considering whether there is a way of restricting the number of requests that can be made for processor-intensive pages over a set period of time:

https://github.com/eprints/eprints3.4/issues/102
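
To make the idea of restricting requests over a set period of time a bit more concrete, here is a minimal sliding-window sketch in Python. It is not EPrints code (EPrints itself is written in Perl) and it is not taken from the issue above; the class name, the parameters and the example addresses are all hypothetical, and a real deployment would also need shared state across server processes.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds (hypothetical helper)."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Client IP -> timestamps of the requests accepted so far.
        self._hits = defaultdict(deque)

    def allow(self, client_ip: str, now: Optional[float] = None) -> bool:
        """Record the request and return True, or return False if over the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[client_ip]
        # Forget accepted requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    # Example: at most 2 expensive export requests per minute per IP address.
    limiter = SlidingWindowLimiter(max_requests=2, window_seconds=60.0)
    for second in (0.0, 1.0, 2.0):
        print(second, limiter.allow("192.0.2.1", now=second))
```

Rejected requests are not recorded, so a client that keeps hammering the server is let back in as soon as its earlier accepted requests age out of the window; whether that is the right policy for export requests is a design choice.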

Regards

David Newman

On 05/07/2022 3:28 pm, Stenger, Avischai via Eprints-tech wrote:

Hi,
 
I can limit the maximum number of records found with "max_items" in views.pl, but it looks like there is no limit for exporting the records found.
 
So when I search for "roman", I get the message "The number of items (7) for this view has exceeded system limits (6). The system administrator either needs to increase "max_items" or apply additional filters to this view."
 
On this message page I can still click on "export" and get all the records. Is there a way to limit the permitted size (count) of records for the export?
 
 
Regards & Thanks
 
*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/

 


