EPrints Technical Mailing List Archive
See the EPrints wiki for instructions on how to join this mailing list and related information.
Message: #09787
Re: [EP-tech] Ask about search result and reindex
- To: Agung Prasetyo W. <prazetyo@gmail.com>
- Subject: Re: [EP-tech] Ask about search result and reindex
- From: David R Newman <drn@ecs.soton.ac.uk>
- Date: Wed, 24 Jul 2024 17:34:32 +0100
Hi Agung,
I have made some improvements to the script at:
http://files.eprints.org/3065/
Here are the installation/usage instructions:
Download the Bash script and run as
follows to check that all eprint records in the live archive
have titles, abstracts and creators indexed (if they exist for
that record):
./find_eprint_rindex_unindexed
If your EPrints installation's archives are not under
/opt/eprints3/archives then specify with -p flag:
./find_eprint_rindex_unindexed -p /usr/share/eprints/archives
If you want to check a specific archive rather than the first
one the script finds then specify -a flag:
./find_eprint_rindex_unindexed -a my_archive
Results are written to the following file, or run with the -v flag to
output to the screen:
EPRINTS_PATH/archives/ARCHIVE_ID/var/eprint_rindex_unindexed.txt
If you have un-indexed results you want to ignore, you can
provide a newline-separated list of these in:
EPRINTS_PATH/archives/ARCHIVE_ID/var/ignore_eprint_rindex_unindexed.txt
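Once the un-indexed eprint IDs are known, each one could be passed to the epadmin reindex command quoted in the original question further down this thread. The following is only a minimal sketch: the function name reindex_commands is hypothetical, and it assumes an input file containing one numeric eprint ID per line. The format of eprint_rindex_unindexed.txt is not described in this message, so the results may need converting to that form first. The function only prints the commands so they can be reviewed before anything is run.

```shell
#!/bin/bash
# Hypothetical helper (not part of find_eprint_rindex_unindexed).
# Prints one "epadmin reindex" command per numeric ID found in a file.
# ASSUMPTION: the input file holds one eprint ID per line.
reindex_commands() {
    local archive_id="$1" id_file="$2" line
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            ''|*[!0-9]*) continue ;;   # skip blank or non-numeric lines
        esac
        printf 'epadmin reindex %s eprint %s\n' "$archive_id" "$line"
    done < "$id_file"
}
```

For example, reindex_commands my_archive ids.txt would list the commands for the IDs in ids.txt (a placeholder file name), which can then be checked and run by hand.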
Did you specify the ARCHIVE ID as a parameter in the command:
./find_eprint_rindex_unindexed ARCHIVE_ID
Did you make sure you update EP_PATH in the script to match your EPrints path if this is not /opt/eprints3?
Did you update USER_PASS to the username and password for your EPrints database? The default assumes that the root user can access the database without the need for a password. You will probably need to change:
USER_PASS="-u root"
To something like:
USER_PASS="-u USERNAME -pPASSWORD"
Where USERNAME is $c->{dbuser} and PASSWORD is $c->{dbpass} in your archive's cfg/cfg.d/database.pl.
I could probably improve the script to pull this out by default when looking up the database name, which it already does by grabbing dbname from this file.
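As a rough illustration of how the credentials might be pulled from database.pl, here is a minimal sketch. The function name get_db_setting is hypothetical and is not taken from the script itself. It assumes each setting sits on its own line in the form $c->{KEY} = 'VALUE'; (single or double quotes) with no trailing comment, and it will not handle escaped quotes inside a value.

```shell
#!/bin/bash
# Hypothetical helper (not part of find_eprint_rindex_unindexed).
# Prints the value assigned to $c->{KEY} in an EPrints database.pl file.
# ASSUMPTION: one setting per line, e.g.  $c->{dbuser} = 'eprints';
get_db_setting() {
    local key="$1" file="$2"
    grep -v '^[[:space:]]*#' "$file" \
        | grep -F "\$c->{$key}" \
        | head -n 1 \
        | sed -e 's/^[^=]*=[[:space:]]*//' \
              -e 's/[[:space:]]*;[[:space:]]*$//' \
              -e "s/^['\"]//" -e "s/['\"]\$//"
}

# Example usage (the path and archive ID below are placeholders):
# DB_CFG="/opt/eprints3/archives/my_archive/cfg/cfg.d/database.pl"
# USER_PASS="-u $(get_db_setting dbuser "$DB_CFG") -p$(get_db_setting dbpass "$DB_CFG")"
```

Commented-out lines are skipped and only the first matching assignment is used, so an old setting left in a comment should not be picked up by mistake.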
Regards
David Newman
On 24/07/2024 11:49, Agung Prasetyo W. wrote:
Hi David,
How do I know whether we use the EPrints database or Xapian? When I run your script, it shows nothing. When I open the file /var/eprint_rindex_unindexed.txt, it shows the following:
Copyright (c) 2000, 2021, Oracle and/or its affiliates.
Are my steps wrong?
Thank you.
Regards,
Agung PW
On Wed, 24 Jul 2024 at 17:22, David R Newman <drn@ecs.soton.ac.uk> wrote:
Hi Agung,
If you are using the database (i.e. the eprint__rindex table), then I wrote the following (rather hacky) Bash script to test this:
https://files.eprints.org/3065/
The script will ignore items whose metadata visibility is not set to show. It is worth manually checking your database for items you expect to be able to find in search but cannot, to see whether the metadata_visibility field has been changed. If you create new versions of items, this will automatically set the current (now old) version to hide. (This is a far from ideal situation, but it is quite difficult to determine a better way to ensure users only find the latest versions, especially when the "New Version" button gets used in the wrong circumstances.)
If you are using a Xapian index (typically used for simple search), then I did write a different script for this, but it is a lot more complex to deploy.
Regards
David Newman
On 24/07/2024 10:51, Agung Prasetyo W. wrote:
Hi,
Sometimes there are items that don't appear when I do a search, even though they are in the repository. But after I run the command:
epadmin reindex [archive_id] eprint [item_id]
these items appear in search results.
Is there a way to find out the item IDs that have not been indexed so that we can reindex the item IDs?
Thank you.
Regards,
Agung Prasetyo W.
*** Options: https://wiki.eprints.org/w/Eprints-tech_Mailing_List *** Archive: https://www.eprints.org/tech.php/ *** EPrints community wiki: https://wiki.eprints.org/