EPrints Technical Mailing List Archive


Message: #04547



[EP-tech] digital preservation - indexing errors


 

We have just completed an upgrade of our repository, which included re-indexing all of its content.

It was a good opportunity to take note of the errors that come up during indexing.

 

Here is a list of the common errors that occurred during indexing:

1. Error: Illegal entry in bfrange block in ToUnicode CMap

2. Error: Invalid Font Weight

3. Error (##): Illegal character <##> in hex string

4. Error: Can't create transform

5. Error: Couldn't link the profiles

There were also some occurrences of the following:

6. Perl warnings followed by a doc2txt failure:

Use of uninitialized value $data in substr at /opt/eprints3/tools/../perl_lib/Text/Extract/Word.pm line 68.

Use of uninitialized value $magic in numeric eq (==) at /opt/eprints3/tools/../perl_lib/Text/Extract/Word.pm line 69.

Use of uninitialized value $magic in sprintf at /opt/eprints3/tools/../perl_lib/Text/Extract/Word.pm line 69.

This does not seem to be a Word document, but it is pretending to be one: 0 at /opt/eprints3/tools/doc2txt line 68

Error 255 from doc2txt command: […]
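
One way to find the files behind the doc2txt message is to check what they actually contain. The warning about an uninitialized $data may indicate that the extractor received no data at all, for example from a zero-length file, though that is only a guess. The sketch below is a rough illustration, not part of EPrints: it assumes the stored documents are reachable on disk under some root directory, and the function names are made up for this example. It only tests for the OLE2 compound-file signature that legacy .doc files start with, which is a coarser check than whatever Word.pm performs.

```python
from pathlib import Path

# Signature that opens every OLE2 compound file, the container used by legacy .doc files.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def classify_doc(path):
    """Return 'empty', 'ole2', or 'other' for a file that claims to be a .doc."""
    with open(path, "rb") as fh:
        head = fh.read(len(OLE2_MAGIC))
    if not head:
        return "empty"   # zero-length file: nothing to extract
    if head == OLE2_MAGIC:
        return "ole2"    # looks like a real legacy Office container
    return "other"       # e.g. RTF, PDF, or .docx saved with a .doc extension


def scan(root):
    """Yield (path, kind) for every *.doc under root that is not an OLE2 file."""
    for p in sorted(Path(root).rglob("*.doc")):
        if p.is_file():
            kind = classify_doc(p)
            if kind != "ole2":
                yield p, kind
```

Running `scan()` over the document store would give a list of files to inspect by hand; files reported as 'other' may still be perfectly readable in another format and simply carry the wrong extension.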

 

 

Errors #1 and #3 appear to be the most common.
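
To put numbers on which errors are most common, the indexer output can be tallied. This is a minimal sketch under two assumptions: that the output was captured to a plain text file (the name indexer.log below is hypothetical), and that the variable parts of the messages are the numeric offset in parentheses and the character code in angle brackets, as in error #3.

```python
import re
from collections import Counter


def normalize(line):
    """Collapse the variable parts of a message so repeats share one key."""
    line = line.strip()
    line = re.sub(r"\(\d+\)", "(##)", line)             # numeric offset, e.g. "Error (1234):"
    line = re.sub(r"<[0-9A-Fa-f]{1,2}>", "<##>", line)  # offending character code
    return line


def tally(lines):
    """Count normalized messages for every line that starts with 'Error'."""
    counts = Counter()
    for line in lines:
        if line.lstrip().startswith("Error"):
            counts[normalize(line)] += 1
    return counts


# Example use, assuming the log was saved as indexer.log (hypothetical name):
# with open("indexer.log", errors="replace") as fh:
#     for message, n in tally(fh).most_common():
#         print(n, message)
```

Lines that do not start with "Error", such as the Perl warnings, are skipped here and would need their own pattern.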

 

Have you encountered these types of indexing errors? 

How serious are they in terms of digital preservation?

Do you use any specific strategies/workflows for dealing with these?

Do the EPrints preservation plugins (http://files.eprints.org/696/) help with identifying or solving these issues?

 

Thanks for any comments/suggestions about this.

 

Tomasz