EPrints Technical Mailing List Archive

Message: #02924



[EP-tech] Indexing issues


Hi All,

 

We seem to be running into some indexer issues and are wondering whether anyone else is experiencing the same. On a full reindex, all seems to go well for hundreds of eprints, after which we get piles of pdftotext errors (Error 255). After the run, the content of the documents named in the errors is (predictably) not in the index. An example run is at the bottom; any suggestions on what might be happening would be welcome.

 

Cheers,

Casey

 

 

Example run:

 

eprints@puppy:~/bin$ ./epadmin reindex research_eprints eprint --verbose

 

Starting EPrints Repository.

Connecting to DB ... done.

 

You are about to reindex "eprint" in the research_eprints repository.

This can take some time.

 

Number of records in set: 3980

Continue [y/n] ? es

Exception: Unable to get write lock on /usr/share/eprints3/archives/research_eprints/var/xapian: already locked

Indexed item: eprint/1

Indexed item: eprint/3

...

Indexed item: eprint/1488

Error 255 from pdftotext command: \/usr\/bin\/pdftotext -enc UTF-8 -layout \/usr\/share\/eprints3\/archives\/research_eprints\/documents\/disk0\/00\/00\/14\/90\/01\/Whelan_Maudie\.pdf \/tmp\/jdeN7U21yV\/Whelan_Maudie\.txt at /mnt/eprintsdrive/eprints3/bin/../perl_lib/EPrints/BackCompatibility.pm line 463

        EPrints::Platform::exec('EPrints::Repository=HASH(0x37ed110)', 'pdftotext', 'TARGET', '/tmp/jdeN7U21yV/Whelan_Maudie.txt', 'TARGET_DIR', 'File::Temp::Dir=HASH(0x66d6300)', 'SOURCE', 'File::Temp=GLOB(0x6d3ebd0)') called at /mnt/eprintsdrive/eprints3/bin/../perl_lib/EPrints/Repository.pm line 1994

        EPrints::Repository::exec('EPrints::Repository=HASH(0x37ed110)', 'pdftotext', 'SOURCE', 'File::Temp=GLOB(0x6d3ebd0)', 'TARGET_DIR', 'File::Temp::Dir=HASH(0x66d6300)', 'TARGET', '/tmp/jdeN7U21yV/Whelan_Maudie.txt') called at /usr/share/eprints3/perl_lib/EPrints/Plugin/Convert/PlainText.pm line 141

        EPrints::Plugin::Convert::PlainText::export('EPrints::Plugin::Convert::PlainText=HASH(0x6711160)', 'File::Temp::Dir=HASH(0x66d6300)', 'EPrints::DataObj::Document=HASH(0x67d0430)', 'text/plain') called at (eval 102) line 137

        EPrints::Config::research_eprints::__ANON__('repository', 'EPrints::Repository=HASH(0x37ed110)', 'fields', 'ARRAY(0x67c5e40)', 'dataobj', 'EPrints::DataObj::EPrint=HASH(0x6f74250)') called at /mnt/eprintsdrive/eprints3/bin/../perl_lib/EPrints/Repository.pm line 1551

        EPrints::Repository::run_trigger('EPrints::Repository=HASH(0x37ed110)', 11, 'dataobj', 'EPrints::DataObj::EPrint=HASH(0x6f74250)', 'fields', 'ARRAY(0x67c5e40)') called at /usr/share/eprints3/perl_lib/EPrints/Plugin/Event/Indexer.pm line 68

        EPrints::Plugin::Event::Indexer::_index_fields('EPrints::Plugin::Event::Indexer=HASH(0x64f7660)', 'EPrints::DataObj::EPrint=HASH(0x6f74250)', 'ARRAY(0x67c5e40)') called at /usr/share/eprints3/perl_lib/EPrints/Plugin/Event/Indexer.pm line 37

        EPrints::Plugin::Event::Indexer::index_all('EPrints::Plugin::Event::Indexer=HASH(0x64f7660)', 'EPrints::DataObj::EPrint=HASH(0x6f74250)') called at ./epadmin line 1992

        main::__ANON__('EPrints::Repository=HASH(0x37ed110)', 'EPrints::DataSet=HASH(0x3f8d198)', 'EPrints::DataObj::EPrint=HASH(0x6f74250)', undef) called at /mnt/eprintsdrive/eprints3/bin/../perl_lib/EPrints/List.pm line 664

        EPrints::List::map('EPrints::List=HASH(0x2952a48)', 'CODE(0x358c368)') called at ./epadmin line 1998

        main::reindex('research_eprints', 'eprint') called at ./epadmin line 344

 

Error 255 from pdftotext command: \/usr\/bin\/pdftotext -enc UTF-8 -layout \/usr\/share\/eprints3\/archives\/research_eprints\/documents\/disk0\/00\/00\/14\/90\/03\/Whelan_Maudie\.pdf \/tmp\/jdeN7U21yV\/Whelan_Maudie\.txt at /mnt/eprintsdrive/eprints3/bin/../perl_lib/EPrints/BackCompatibility.pm line 463

        EPrints::Platform::exec('EPrints::Repository=HASH(0x37ed110)', 'pdftotext', 'TARGET', '/tmp/jdeN7U21yV/Whelan_Maudie.txt', 'TARGET_DIR', 'File::Temp::Dir=HASH(0x66d6300)', 'SOURCE', 'File::Temp=GLOB(0x7137318)') called at /mnt/eprintsdrive/eprints3/bin/../perl_lib/EPrints/Repository.pm line 1994

        EPrints::Repository::exec('EPrints::Repository=HASH(0x37ed110)', 'pdftotext', 'SOURCE', 'File::Temp=GLOB(0x7137318)', 'TARGET_DIR', 'File::Temp::Dir=HASH(0x66d6300)', 'TARGET', '/tmp/jdeN7U21yV/Whelan_Maudie.txt') called at /usr/share/eprints3/perl_lib/EPrints/Plugin/Convert/PlainText.pm line 141

        EPrints::Plugin::Convert::PlainText::export('EPrints::Plugin::Convert::PlainText=HASH(0x6e40e20)', 'File::Temp::Dir=HASH(0x66d6300)', 'EPrints::DataObj::Document=HASH(0x6c02ec8)', 'text/plain') called at (eval 102) line 137

        EPrints::Config::research_eprints::__ANON__('repository', 'EPrints::Repository=HASH(0x37ed110)', 'fields', 'ARRAY(0x67c5e40)', 'dataobj', 'EPrints::DataObj::EPrint=HASH(0x6f74250)') called at /mnt/eprintsdrive/eprints3/bin/../perl_lib/EPrints/Repository.pm line 1551

        EPrints::Repository::run_trigger('EPrints::Repository=HASH(0x37ed110)', 11, 'dataobj', 'EPrints::DataObj::EPrint=HASH(0x6f74250)', 'fields', 'ARRAY(0x67c5e40)') called at /usr/share/eprints3/perl_lib/EPrints/Plugin/Event/Indexer.pm line 68

        EPrints::Plugin::Event::Indexer::_index_fields('EPrints::Plugin::Event::Indexer=HASH(0x64f7660)', 'EPrints::DataObj::EPrint=HASH(0x6f74250)', 'ARRAY(0x67c5e40)') called at /usr/share/eprints3/perl_lib/EPrints/Plugin/Event/Indexer.pm line 37

        EPrints::Plugin::Event::Indexer::index_all('EPrints::Plugin::Event::Indexer=HASH(0x64f7660)', 'EPrints::DataObj::EPrint=HASH(0x6f74250)') called at ./epadmin line 1992

        main::__ANON__('EPrints::Repository=HASH(0x37ed110)', 'EPrints::DataSet=HASH(0x3f8d198)', 'EPrints::DataObj::EPrint=HASH(0x6f74250)', undef) called at /mnt/eprintsdrive/eprints3/bin/../perl_lib/EPrints/List.pm line 664

        EPrints::List::map('EPrints::List=HASH(0x2952a48)', 'CODE(0x358c368)') called at ./epadmin line 1998

        main::reindex('research_eprints', 'eprint') called at ./epadmin line 344

 

Indexed item: eprint/1503

Error 255 from pdftotext command: \/usr\/bin\/pdftotext -enc UTF-8 -layout \/usr\/share\/eprints3\/archives\/research_eprints\/documents\/disk0\/00\/00\/15\/04\/01\/Emberley\-Burke_Wanda\.pdf \/tmp\/diwwe8oTDs\/Emberley\-Burke_Wanda\.txt at /mnt/eprintsdrive/eprints3/bin/../perl_lib/EPrints/BackCompatibility.pm line 463

        EPrints::Platform::exec('EPrints::Repository=HASH(0x37ed110)', 'pdftotext', 'TARGET', '/tmp/diwwe8oTDs/Emberley-Burke_Wanda.txt', 'TARGET_DIR', 'File::Temp::Dir=HASH(0x6af6f88)', 'SOURCE', 'File::Temp=GLOB(0x70c9260)') called at /mnt/eprintsdrive/eprints3/bin/../perl_lib/EPrints/Repository.pm line 1994

        EPrints::Repository::exec('EPrints::Repository=HASH(0x37ed110)', 'pdftotext', 'SOURCE', 'File::Temp=GLOB(0x70c9260)', 'TARGET_DIR', 'File::Temp::Dir=HASH(0x6af6f88)', 'TARGET', '/tmp/diwwe8oTDs/Emberley-Burke_Wanda.txt') called at /usr/share/eprints3/perl_lib/EPrints/Plugin/Convert/PlainText.pm line 141

        EPrints::Plugin::Convert::PlainText::export('EPrints::Plugin::Convert::PlainText=HASH(0x6993008)', 'File::Temp::Dir=HASH(0x6af6f88)', 'EPrints::DataObj::Document=HASH(0x6b9ae38)', 'text/plain') called at (eval 102) line 137

        EPrints::Config::research_eprints::__ANON__('repository', 'EPrints::Repository=HASH(0x37ed110)', 'fields', 'ARRAY(0x65d2750)', 'dataobj', 'EPrints::DataObj::EPrint=HASH(0x6ed82c8)') called at /mnt/eprintsdrive/eprints3/bin/../perl_lib/EPrints/Repository.pm line 1551

        EPrints::Repository::run_trigger('EPrints::Repository=HASH(0x37ed110)', 11, 'dataobj', 'EPrints::DataObj::EPrint=HASH(0x6ed82c8)', 'fields', 'ARRAY(0x65d2750)') called at /usr/share/eprints3/perl_lib/EPrints/Plugin/Event/Indexer.pm line 68

        EPrints::Plugin::Event::Indexer::_index_fields('EPrints::Plugin::Event::Indexer=HASH(0x64f7660)', 'EPrints::DataObj::EPrint=HASH(0x6ed82c8)', 'ARRAY(0x65d2750)') called at /usr/share/eprints3/perl_lib/EPrints/Plugin/Event/Indexer.pm line 37

        EPrints::Plugin::Event::Indexer::index_all('EPrints::Plugin::Event::Indexer=HASH(0x64f7660)', 'EPrints::DataObj::EPrint=HASH(0x6ed82c8)') called at ./epadmin line 1992

        main::__ANON__('EPrints::Repository=HASH(0x37ed110)', 'EPrints::DataSet=HASH(0x3f8d198)', 'EPrints::DataObj::EPrint=HASH(0x6ed82c8)', undef) called at /mnt/eprintsdrive/eprints3/bin/../perl_lib/EPrints/List.pm line 664

        EPrints::List::map('EPrints::List=HASH(0x2952a48)', 'CODE(0x358c368)') called at ./epadmin line 1998

        main::reindex('research_eprints', 'eprint') called at ./epadmin line 344

 

Error 255 from pdftotext command: \/usr\/bin\/pdftotext -enc UTF-8 -layout \/usr\/share\/eprints3\/archives\/research_eprints\/documents\/disk0\/00\/00\/15\/04\/03\/Emberley\-Burke_Wanda\.pdf \/tmp\/diwwe8oTDs\/Emberley\-Burke_Wanda\.txt at /mnt/eprintsdrive/eprints3/bin/../perl_lib/EPrints/BackCompatibility.pm line 463

        EPrints::Platform::exec('EPrints::Repository=HASH(0x37ed110)', 'pdftotext', 'TARGET', '/tmp/diwwe8oTDs/Emberley-Burke_Wanda.txt', 'TARGET_DIR', 'File::Temp::Dir=HASH(0x6af6f88)', 'SOURCE', 'File::Temp=GLOB(0x71b74a8)') called at /mnt/eprintsdrive/eprints3/bin/../perl_lib/EPrints/Repository.pm line 1994

        EPrints::Repository::exec('EPrints::Repository=HASH(0x37ed110)', 'pdftotext', 'SOURCE', 'File::Temp=GLOB(0x71b74a8)', 'TARGET_DIR', 'File::Temp::Dir=HASH(0x6af6f88)', 'TARGET', '/tmp/diwwe8oTDs/Emberley-Burke_Wanda.txt') called at /usr/share/eprints3/perl_lib/EPrints/Plugin/Convert/PlainText.pm line 141

        EPrints::Plugin::Convert::PlainText::export('EPrints::Plugin::Convert::PlainText=HASH(0x686c748)', 'File::Temp::Dir=HASH(0x6af6f88)', 'EPrints::DataObj::Document=HASH(0x71fefb0)', 'text/plain') called at (eval 102) line 137

        EPrints::Config::research_eprints::__ANON__('repository', 'EPrints::Repository=HASH(0x37ed110)', 'fields', 'ARRAY(0x65d2750)', 'dataobj', 'EPrints::DataObj::EPrint=HASH(0x6ed82c8)') called at /mnt/eprintsdrive/eprints3/bin/../perl_lib/EPrints/Repository.pm line 1551

        EPrints::Repository::run_trigger('EPrints::Repository=HASH(0x37ed110)', 11, 'dataobj', 'EPrints::DataObj::EPrint=HASH(0x6ed82c8)', 'fields', 'ARRAY(0x65d2750)') called at /usr/share/eprints3/perl_lib/EPrints/Plugin/Event/Indexer.pm line 68

        EPrints::Plugin::Event::Indexer::_index_fields('EPrints::Plugin::Event::Indexer=HASH(0x64f7660)', 'EPrints::DataObj::EPrint=HASH(0x6ed82c8)', 'ARRAY(0x65d2750)') called at /usr/share/eprints3/perl_lib/EPrints/Plugin/Event/Indexer.pm line 37

        EPrints::Plugin::Event::Indexer::index_all('EPrints::Plugin::Event::Indexer=HASH(0x64f7660)', 'EPrints::DataObj::EPrint=HASH(0x6ed82c8)') called at ./epadmin line 1992

        main::__ANON__('EPrints::Repository=HASH(0x37ed110)', 'EPrints::DataSet=HASH(0x3f8d198)', 'EPrints::DataObj::EPrint=HASH(0x6ed82c8)', undef) called at /mnt/eprintsdrive/eprints3/bin/../perl_lib/EPrints/List.pm line 664

        EPrints::List::map('EPrints::List=HASH(0x2952a48)', 'CODE(0x358c368)') called at ./epadmin line 1998

        main::reindex('research_eprints', 'eprint') called at ./epadmin line 344
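
To narrow this down, the failing PDF paths can be pulled out of output like the above and then tried against pdftotext by hand. The following is only a minimal sketch: it assumes the error lines keep exactly the "Error N from pdftotext command: ... at FILE line N" shape shown in this run, and the function name failing_pdfs is made up for illustration rather than being part of EPrints.

```python
import re

# Matches the "Error N from pdftotext command: ..." lines as they appear in
# the run above. Assumption: the command is followed by " at FILE line N".
ERROR_RE = re.compile(
    r"^Error (\d+) from pdftotext command: (.+?) at \S+ line \d+\s*$"
)


def failing_pdfs(log_text):
    """Return de-duplicated (error_code, pdf_path) pairs in order of appearance.

    failing_pdfs is a hypothetical helper, not an EPrints function.
    """
    found = []
    for line in log_text.splitlines():
        match = ERROR_RE.match(line.strip())
        if not match:
            continue
        code = int(match.group(1))
        # Split on whitespace that is not backslash-escaped, then undo the
        # backslash escaping seen in the log ("\/" -> "/", "\." -> ".").
        raw_args = re.split(r"(?<!\\)\s+", match.group(2))
        args = [re.sub(r"\\(.)", r"\1", arg) for arg in raw_args]
        for arg in args:
            if arg.lower().endswith(".pdf") and (code, arg) not in found:
                found.append((code, arg))
    return found
```

Given the reindex output saved to a file, failing_pdfs(open(path).read()) would list each source PDF once; each path could then be passed to /usr/bin/pdftotext -enc UTF-8 -layout as the eprints user to see whatever message pdftotext itself prints.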

 

 

-----------------------------------------------

Casey Hilliard

System Administrator

Library Information Technology Services (LITS)

Memorial University of Newfoundland

Ph: (709)864-6267

Ce: (709)699-3041