EPrints Technical Mailing List Archive
Message: #09307
Re: [EP-tech] SQL Problem at EPrints
- To: David R Newman <drn@ecs.soton.ac.uk>
- Subject: Re: [EP-tech] SQL Problem at EPrints
- From: "Agung Prasetyo W." <prazetyo@gmail.com>
- Date: Sat, 6 May 2023 11:12:19 +0700
CAUTION: This e-mail originated outside the University of Southampton.
Hi David,
I implemented the 2 solutions you provided and when I ran the epadmin reindex command the previous error was resolved.
eprints@repo:~$ ./bin/epadmin reindex repos eprint 7039 7040 7041 7042 7043 7044 7045 7046 7047 --force --verbose
Possible attempt to put comments in qw() list at (eval 162) line 36.
Starting EPrints Repository.
Connecting to DB ... done.
Indexed item: eprint/7039
Indexed item: eprint/7040
Indexed item: eprint/7041
Indexed item: eprint/7042
Indexed item: eprint/7043
Indexed item: eprint/7044
Indexed item: eprint/7045
Indexed item: eprint/7046
Indexed item: eprint/7047
Thank you
Regards,
Agung Prasetyo W.
On Mon, Apr 17, 2023 at 2:47 PM David R Newman <drn@ecs.soton.ac.uk> wrote:
Hi Agung PW,
I think this may be similar to the issue that Mario reported recently. The database cannot index certain words found in the indexcodes files, which are generated so that the full text of documents can be indexed.
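To illustrate what the database is objecting to, here is a minimal Perl sketch that decodes the bytes quoted in the first "Incorrect string value" error in this thread. It only uses the core Encode module. The byte string is copied from that error message; the variable names are just for illustration. The result is consistent with a MySQL column whose character set stores at most three bytes per character, although that is an assumption, as the column's character set is not stated anywhere in this thread.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Bytes copied from the first "Incorrect string value" error in this thread.
my $bytes = "\xF0\x9D\x91\x9F13";

# Decode the UTF-8 bytes into Perl characters. FB_CROAK dies on malformed
# input and LEAVE_SRC keeps $bytes unmodified.
my $chars = decode( 'UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC );

my @report;
for my $ch ( split //, $chars )
{
    my $cp = ord( $ch );

    # UTF-8 width by code point: above U+FFFF needs four bytes.
    my $width = $cp > 0xFFFF ? 4 : $cp > 0x7FF ? 3 : $cp > 0x7F ? 2 : 1;
    push @report, sprintf( "U+%04X needs %d byte(s)", $cp, $width );
}

print "$_\n" for @report;
```

The first character decodes to U+1D45F, which lies outside the Basic Multilingual Plane and needs four bytes in UTF-8, whereas the trailing "13" are ordinary single-byte characters.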
Previously, I proposed two solutions. Option 1 below is a stopgap that fixes the issue while you are on the current version of EPrints, but it means certain words will not be indexed. Option 2 is the solution I have implemented for future versions of EPrints, which avoids leaving those words unindexed:
1. Add the following to your archive's cfg/cfg.d/indexing.pl (if this does not exist, copy into place from lib/cfg.d/indexing.pl).
    if( $word =~ m/[^\x20-\xEF]/ )
    {
        $ok = 0;
    }
Add this after the block of code:
    if( $word =~ m/^[A-Z][A-Z0-9]+$/ )
    {
        $ok = 1;
    }
The words that this will stop being indexed are unlikely to be words that would be searched for, as this code should only affect extended characters. The work I did on Mario's issue found these words were mostly Latin or Greek characters using a particular font, as they were part of mathematical equations. One example is: 𝑒𝑥𝑝.
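As a rough sketch of how the two tests in option 1 behave together, the following standalone Perl script applies them in the same order to a few sample words. The helper name word_ok is hypothetical and is not part of EPrints; it contains only the two tests quoted above and none of the other checks in indexing.pl. It also assumes the word is held as a Perl character string and that $ok is already 1 when these tests are reached.

```perl
use strict;
use warnings;

# Hypothetical helper (not an EPrints function): runs only the two tests
# from option 1, in the order described in this email.
sub word_ok
{
    my( $word ) = @_;

    # Assumption: the word has already passed any earlier checks.
    my $ok = 1;

    # Existing block: words that look like upper-case acronyms are accepted.
    if( $word =~ m/^[A-Z][A-Z0-9]+$/ )
    {
        $ok = 1;
    }

    # Stopgap block, placed after the one above: reject any word containing
    # a character outside the range \x20-\xEF.
    if( $word =~ m/[^\x20-\xEF]/ )
    {
        $ok = 0;
    }

    return $ok;
}

# Sample words: the last one is the example from this email, written with
# escapes (U+1D452, U+1D465, U+1D45D) so the script itself stays ASCII.
my @samples = (
    [ 'plain ASCII word',      "repository" ],
    [ 'upper-case acronym',    "NASA" ],
    [ 'mathematical italics',  "\x{1D452}\x{1D465}\x{1D45D}" ],
);

for my $sample ( @samples )
{
    my( $label, $word ) = @{$sample};
    printf "%s: %s\n", $label, word_ok( $word ) ? "indexed" : "skipped";
}
```

Because the stopgap test runs last, it has the final say on $ok, so a word containing any of these characters is skipped regardless of the earlier tests.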
2. Look at https://github.com/eprints/eprints3.4/issues/320 and merge the commit it contains. This should add mappings for the indexer, so these words can now be indexed. However, for full-text indexing, this only occurs when the indexcodes files are regenerated. epadmin has a command to regenerate all of these and reindex, but that could take a very long time with a large repository. Therefore, I have improved the indexer so that the --force flag on "epadmin reindex" will force the indexcodes files to be regenerated and make use of these new mappings (see https://github.com/eprints/eprints3.4/issues/321) if you do not want to use the new version of epadmin. Using the "Reindex Item" button in the web interface should achieve the same thing. If you see my earlier emails to Mario on the EPrints Tech list, you will see I was a little baffled as to why indexcodes files were only regenerated this way and not currently when using epadmin.
Anyway, with either solution, make sure that the indexer is restarted to apply the changes made. (If you intend to use the "Reindex Item" button, I would also reload the webserver just to be sure.) Not restarting will not affect your initial use of "epadmin reindex" for specific eprints you want to test/fix, but it will prevent the changes being applied to future indexing tasks carried out by the indexer.
Regards
David Newman
On 17/04/2023 12:51 am, Agung Prasetyo W. via Eprints-tech wrote:
Hi,
When I run the command: epadmin reindex *repository_id* *dataset_id* [*eprint_id*]
I get an error like this:
Indexed item: eprint/7039
DBD::mysql::st execute failed: Incorrect string value: '\xF0\x9D\x91\x9F13' for column 'word' at row 1 at /usr/share/eprints3/bin/../perl_lib/EPrints/Database.pm line 1287.
Indexed item: eprint/7040
Indexed item: eprint/7041
DBD::mysql::st execute failed: Incorrect string value: '\xF0\x9D\x91\xA6\xF0\x9D...' for column 'word' at row 1 at /usr/share/eprints3/bin/../perl_lib/EPrints/Database.pm line 1287.
Indexed item: eprint/7042
Indexed item: eprint/7043
DBD::mysql::st execute failed: Incorrect string value: '\xF0\x9D\x90\xBF\xF0\x9D...' for column 'word' at row 1 at /usr/share/eprints3/bin/../perl_lib/EPrints/Database.pm line 1287.
Indexed item: eprint/7044
DBD::mysql::st execute failed: Incorrect string value: '\xF0\x9D\x91\xA1\xF0\x9D...' for column 'word' at row 1 at /usr/share/eprints3/bin/../perl_lib/EPrints/Database.pm line 1287.
Indexed item: eprint/7045
Indexed item: eprint/7046
DBD::mysql::st execute failed: Incorrect string value: '\xF0\x9D\x91\x9D\xF0\x9D...' for column 'word' at row 1 at /usr/share/eprints3/bin/../perl_lib/EPrints/Database.pm line 1287.
Indexed item: eprint/7047
Is there any solution for this problem?
Thank you.
Regards,
Agung PW