EPrints Technical Mailing List Archive

See the EPrints wiki for instructions on how to join this mailing list and related information.

Message: #09787



Re: [EP-tech] Ask about search result and reindex


Hi Agung,

I have made some improvements to the script at:

http://files.eprints.org/3065/

Here are the installation/usage instructions:

Download the Bash script and run as follows to check that all eprint records in the live archive have titles, abstracts and creators indexed (if they exist for that record):

  ./find_eprint_rindex_unindexed

If your EPrints installation's archives are not under /opt/eprints3/archives, specify the path with the -p flag:

  ./find_eprint_rindex_unindexed -p /usr/share/eprints/archives

If you want to check a specific archive rather than the first one the script finds, specify it with the -a flag:

  ./find_eprint_rindex_unindexed -a my_archive

Results are written to the following file. Alternatively, run with the -v flag to output to the screen:

  EPRINTS_PATH/archives/ARCHIVE_ID/var/eprint_rindex_unindexed.txt
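Once you have a list of un-indexed eprint IDs, they can be fed back into the epadmin reindex command mentioned at the start of this thread. The following is only a sketch: it assumes you have extracted the IDs into a file with one numeric ID per line (the exact format of the results file is not described here), and the function name reindex_commands is hypothetical. It prints the commands rather than running them, so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the script above): print one
# "epadmin reindex" command per numeric ID found in a file.
# Assumes the file holds one eprint ID per line.
reindex_commands() {
    local archive="$1" id_file="$2" id
    # "|| [ -n "$id" ]" also handles a last line with no trailing newline
    while IFS= read -r id || [ -n "$id" ]; do
        case "$id" in
            ''|*[!0-9]*) continue ;;   # skip blank or non-numeric lines
        esac
        printf 'epadmin reindex %s eprint %s\n' "$archive" "$id"
    done < "$id_file"
}
```

For example, reindex_commands my_archive ids.txt prints the commands for review. To run them, execute the printed lines as the user that owns the EPrints installation, using the full path to epadmin if it is not on your PATH.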

If you have un-indexed results you want to ignore, you can provide a newline-separated list of these in:

  EPRINTS_PATH/archives/ARCHIVE_ID/var/ignore_eprint_rindex_unindexed.txt
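To illustrate how an ignore list of this kind can work, here is a minimal sketch. It is not the script's actual code: it assumes one entry per line in both files, and the function name filter_ignored and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lines of RESULTS that do not appear in IGNORE.
# Assumes one entry per line in both files (an assumption, not checked
# against the real script's output format).
filter_ignored() {
    local results="$1" ignore="$2"
    if [ -s "$ignore" ]; then
        # -v invert the match, -x match whole lines only,
        # -F treat patterns as fixed strings, -f read patterns from a file
        grep -vxFf "$ignore" "$results"
    else
        # Ignore file missing or empty: pass everything through unchanged
        cat "$results"
    fi
}
```

Whole-line matching (-x) matters here: without it, ignoring ID 34 would also hide 340 and 1234.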

Regards

David Newman

On 24/07/2024 12:05, David R Newman wrote:

Did you specify the ARCHIVE ID as a parameter in the command:

./find_eprint_rindex_unindexed ARCHIVE_ID

Did you make sure you update EP_PATH in the script to match your EPrints path if this is not /opt/eprints3?

Did you update USER_PASS to the username and password for your EPrints database? The default assumes that the root user can access the database without needing a password. You will probably need to change:

USER_PASS="-u root"

To something like:

USER_PASS="-u USERNAME -pPASSWORD"

Where USERNAME is $c->{dbuser} and PASSWORD is $c->{dbpass} in your archive's cfg/cfg.d/database.pl.  

I could probably improve the script to pull these out by default when it looks up the database name, which it already does by grabbing dbname from this file.
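As a rough idea of what pulling these values out could look like (a sketch only, not what the script actually does), the function below reads a single setting from database.pl. It assumes each setting is on one line in the simple form $c->{dbuser} = 'value'; with single or double quotes. The function name get_db_setting is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value assigned to $c->{KEY} in a database.pl file.
# Assumes the one-line form:  $c->{dbuser} = 'value';
# Values that themselves contain quotes or span several lines are not handled.
get_db_setting() {
    local file="$1" key="$2"
    # Capture the text between the quotes and print the first match only
    sed -n "s/^[[:space:]]*\\\$c->{$key}[[:space:]]*=[[:space:]]*['\"]\\(.*\\)['\"][[:space:]]*;.*\$/\\1/p" "$file" | head -n 1
}
```

With that in place, USER_PASS could be built along the lines of:

  CFG=EPRINTS_PATH/archives/ARCHIVE_ID/cfg/cfg.d/database.pl
  USER_PASS="-u $(get_db_setting "$CFG" dbuser) -p$(get_db_setting "$CFG" dbpass)"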

Regards

David Newman

On 24/07/2024 11:49, Agung Prasetyo W. wrote:
CAUTION: This e-mail originated outside the University of Southampton.
Hi David,

How do I know whether we use the EPrints database or Xapian? After I ran your script, it showed nothing. When I opened the file /var/eprint_rindex_unindexed.txt, it contained the following:
Copyright (c) 2000, 2021, Oracle and/or its affiliates.

Are my steps wrong?

Thank you.

Regards,
Agung PW





On Wed, 24 Jul 2024 at 17:22, David R Newman <drn@ecs.soton.ac.uk> wrote:

Hi Agung,

If you are using the database table (i.e. eprint__rindex), then I wrote the following (rather hacky) Bash script to test this:

https://files.eprints.org/3065/

The script will ignore items whose metadata visibility is not set to show. It is worth manually checking your database for items you expect to find in search but cannot, to see whether the metadata_visibility field has been changed. If you create a new version of an item, this will automatically set the current (now old) version to hide. (This is far from ideal, but it is quite difficult to determine a better way to ensure users only find the latest versions, especially when the "New Version" button gets used in the wrong circumstances.)
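For the manual check, something like the following sketch may help. It builds a query against the eprint table for a given set of IDs. It assumes the standard eprint table with eprintid, eprint_status and metadata_visibility columns; the function name visibility_query is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a query showing status and metadata visibility
# for the given eprint IDs.
# Assumes the standard "eprint" table and its eprintid, eprint_status and
# metadata_visibility columns.
visibility_query() {
    local ids="" id
    for id in "$@"; do
        # Accept only plain integers so nothing unexpected reaches the SQL
        case "$id" in
            ''|*[!0-9]*) echo "skipping non-numeric ID: $id" >&2; continue ;;
        esac
        ids="${ids:+$ids,}$id"
    done
    # Fail if no usable IDs were given
    [ -n "$ids" ] || return 1
    printf 'SELECT eprintid, eprint_status, metadata_visibility FROM eprint WHERE eprintid IN (%s);\n' "$ids"
}
```

The output can be piped to the MySQL client, for example:

  visibility_query 123 456 | mysql -u USERNAME -p DBNAME

Any item you expect to find in search whose metadata_visibility is not show would explain why it is missing from results.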

If you are using a Xapian index (typically used for simple search), I did write a different script for this, but it is a lot more complex to deploy.

Regards

David Newman

On 24/07/2024 10:51, Agung Prasetyo W. wrote:
Hi,

Sometimes there are items that don't appear when I do a search, even though they are in the repository. After I run the command: epadmin reindex [archive_id] eprint [item_id]
these items do appear in search results.

Is there a way to find out the item IDs that have not been indexed so that we can reindex the item IDs?

Thank you.

Regards,
Agung Prasetyo W.

*** Options: https://wiki.eprints.org/w/Eprints-tech_Mailing_List
*** Archive: https://www.eprints.org/tech.php/
*** EPrints community wiki: https://wiki.eprints.org/

