EPrints Technical Mailing List Archive
Message: #09266
[EP-tech] indexing full text document pdf with tag get empty word in eprint__rindex table
- To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
- Subject: [EP-tech] indexing full text document pdf with tag get empty word in eprint__rindex table
- From: "Beaudoin, Mario" <Mario.Beaudoin@uqtr.ca>
- Date: Wed, 5 Apr 2023 19:45:11 +0000
Hello,

We run EPrints 3.4.3 with 14 repositories, each holding a lot of PDFs, and we have an indexing bug with tagged PDFs. Every eprint__rindex table that covers one of these PDFs contains a row with an empty "word", and only some of the document's words are indexed, not all of them.

This SQL query lists all the affected PDFs:

    select * from eprint__rindex where word='';

Reindexing one of these eprints fails with an error:

    ./epadmin reindex eprints_fra1 --verbose eprint 75;
    DBD::mysql::st execute failed: Duplicate entry 'documents--75' for key 'PRIMARY' at /opt/eprints3/bin/../perl_lib/EPrints/Database.pm line 1289.

I tried to modify indexing.pl, but it already skips empty words. I checked indexcodes.txt for this document: it is complete and contains many words that are missing from eprint__rindex. When I add another PDF that is not tagged to these eprints, the reindex completes with no error.
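The duplicate-entry error and the partially filled index are consistent with a composite primary key rejecting a second empty word. The sketch below illustrates that mechanism only; it is not EPrints code. It assumes the key covers (field, word, eprintid), which is inferred from the 'documents--75' in the error message rather than checked against the real schema, and it uses SQLite in place of MySQL so that it runs standalone. The table name rindex and the function names are hypothetical.

```python
import sqlite3


def build_rindex(conn):
    # Hypothetical stand-in for eprint__rindex; the key layout is an
    # assumption inferred from the error text, not the confirmed schema.
    conn.execute(
        "CREATE TABLE rindex ("
        " field TEXT NOT NULL,"
        " word TEXT NOT NULL,"
        " eprintid INTEGER NOT NULL,"
        " PRIMARY KEY (field, word, eprintid))"
    )


def index_words(conn, eprintid, field, words):
    """Insert words one by one and stop at the first failure.

    Returns (number of rows inserted, error message or None).
    """
    inserted = 0
    for word in words:
        try:
            conn.execute(
                "INSERT INTO rindex (eprintid, field, word) VALUES (?, ?, ?)",
                (eprintid, field, word),
            )
        except sqlite3.IntegrityError as exc:
            # A repeated (field, word, eprintid) triple violates the key.
            return inserted, str(exc)
        inserted += 1
    return inserted, None


conn = sqlite3.connect(":memory:")
build_rindex(conn)

# Two empty words in the stream: the first is stored, the second collides.
words = ["alpha", "", "beta", "", "gamma"]
inserted, error = index_words(conn, 75, "documents", words)
empty_rows = conn.execute(
    "SELECT COUNT(*) FROM rindex WHERE word = ''"
).fetchone()[0]
print(inserted, empty_rows, error)
```

With this sample list, three rows are inserted before the second empty word aborts the loop, so the later words never reach the table and exactly one empty-word row remains. Whether epadmin reindex behaves this way is not confirmed here.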
I think that epadmin reindex stores one empty word and then stops indexing as soon as it gets another empty word, which would explain why it indexes some words but not all. Running the command with --verbose given twice produces this output:
    [eprints_fra1] Database execute debug: SELECT `eprintid`,`pos`,`projects` FROM `eprint_projects` WHERE `eprintid` IN (75)
    Database execute debug: SELECT `eprintid`,`pos`,`skill_areas` FROM `eprint_skill_areas` WHERE `eprintid` IN (75)
    [eprints_fra1] Database execute debug: SELECT `eprintid`,`pos`,`skill_areas` FROM `eprint_skill_areas` WHERE `eprintid` IN (75)
    Database execute debug: INSERT INTO `eprint__rindex` (`eprintid`,`field`,`word`) VALUES (?,?,?)
    [eprints_fra1] Database execute debug: INSERT INTO `eprint__rindex` (`eprintid`,`field`,`word`) VALUES (?,?,?)
    DBD::mysql::st execute failed: Duplicate entry 'documents--75' for key 'PRIMARY' at /opt/eprints3/bin/../perl_lib/EPrints/Database.pm line 1289.
    Database execute debug: INSERT INTO `eprint__rindex` (`eprintid`,`field`,`word`) VALUES (?,?,?)
    [eprints_fra1] Database execute debug: INSERT INTO `eprint__rindex` (`eprintid`,`field`,`word`) VALUES (?,?,?)
    Database execute debug: INSERT INTO `eprint__index_grep` (`eprintid`,`fieldname`,`grepstring`) VALUES (?,?,?)
    [eprints_fra1] Database execute debug: INSERT INTO `eprint__index_grep` (`eprintid`,`fieldname`,`grepstring`) VALUES (?,?,?)
    Database execute debug: INSERT INTO `eprint__rindex` (`eprintid`,`field`,`word`) VALUES (?,?,?)

I noticed that converting an affected PDF to PostScript (.ps) and back to PDF makes it index correctly, but that is not a workable solution for us.

Thank you for your help,
Mario
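Since indexing.pl reportedly already skips empty words, one possibility (not confirmed in this message) is that some tokens extracted from tagged PDFs are non-empty strings made up only of invisible characters, which the database may then store or compare as empty. The sketch below is a hypothetical diagnostic for a list of extracted words, not part of EPrints. The set of Unicode categories treated as invisible and the function names are assumptions chosen for illustration.

```python
import unicodedata

# Unicode general categories treated here as "invisible" (an assumption):
# control, format, and the space/line/paragraph separator classes.
INVISIBLE_CATEGORIES = {"Cc", "Cf", "Zs", "Zl", "Zp"}


def is_effectively_empty(token):
    """True if the token has no character outside the invisible categories.

    The empty string also counts, because all() over nothing is True.
    """
    return all(
        unicodedata.category(ch) in INVISIBLE_CATEGORIES for ch in token
    )


def clean_tokens(tokens):
    """Drop effectively empty tokens and duplicates, keeping first-seen order."""
    seen = set()
    kept = []
    for token in tokens:
        if is_effectively_empty(token) or token in seen:
            continue
        seen.add(token)
        kept.append(token)
    return kept


# Sample with a zero-width space, a soft hyphen, and a NUL byte as "words".
sample = ["thèse", "", "\u200b", "\u00ad", "pdf", "thèse", "\x00"]
suspects = [t for t in sample if t and is_effectively_empty(t)]
print(clean_tokens(sample))
print(len(suspects))
```

Running a check like this over the words listed in indexcodes.txt for an affected document would show whether any such tokens are present; if none are, this explanation can be ruled out.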