EPrints Technical Mailing List Archive

See the EPrints wiki for instructions on how to join this mailing list and related information.

Message: #10115



RE: [EP-tech] Thousands of dataobj.xml files


CAUTION: This e-mail originated outside the University of Southampton.

Hi Fernando,
In addition to David’s reply: for most repositories, the revision files take up far less space than the content itself, and they provide useful data, e.g. to see when embargoes were lifted or what changed in the item data each time it was saved. Combined with the data in the ‘history’ dataset, you can see who changed a record, what was changed, and when.

 

They shouldn’t cause any problems on your system.

 

To get from the data you have to the specific file, run the following queries in your database. They are based on the number in /id/file/[number], which is 136799 in your example:
SELECT * FROM file WHERE fileid = 136799;
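
If you need to do this for more than one file, the ID for that query can be taken from the URI in the Atom export. A minimal sketch in Python; the helper name file_id_from_uri is made up for illustration, and it assumes the URI always ends in /id/file/[number] as in the example:

```python
from urllib.parse import urlparse


def file_id_from_uri(uri: str) -> int:
    """Return the numeric file ID from a URI ending in /id/file/<number>.

    Hypothetical helper, not part of EPrints.
    """
    path = urlparse(uri).path.rstrip("/")
    prefix, _, last = path.rpartition("/")
    if not prefix.endswith("/id/file") or not last.isdigit():
        raise ValueError(f"not an /id/file/<number> URI: {uri!r}")
    return int(last)


# URI taken from the Atom entry quoted in the original question
print(file_id_from_uri("https://mysite-url/id/file/136799"))
```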

 

From the results, the datasetid should be ‘history’, and there should be a number in the ‘objectid’ field e.g. 12345.

 

If you select that ‘objectid’ from the history table, it will give you details of what caused the revision to the data:

SELECT * FROM history WHERE historyid = 12345;

 

The columns of the history table should include datasetid (hopefully ‘eprint’), objectid (the eprint ID, 6789 in the example below) and the revision number, e.g. 10.
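
The two lookups can also be done as a single join. The sketch below is only an illustration: it uses an in-memory SQLite database as a stand-in for the real EPrints database, the tables contain only the columns mentioned in this message, and the column name revision is an assumption about how the revision number is stored. The function name trace_file_to_revision is made up.

```python
import sqlite3


def trace_file_to_revision(conn, fileid):
    """Follow file -> history and return (datasetid, objectid, revision).

    Returns None if the file does not belong to the 'history' dataset.
    """
    return conn.execute(
        """
        SELECT h.datasetid, h.objectid, h.revision
        FROM file AS f
        JOIN history AS h ON h.historyid = f.objectid
        WHERE f.fileid = ? AND f.datasetid = 'history'
        """,
        (fileid,),
    ).fetchone()


# Stand-in tables holding only the example values from this message.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE file (fileid INTEGER PRIMARY KEY, datasetid TEXT, objectid INTEGER)"
)
conn.execute(
    "CREATE TABLE history (historyid INTEGER PRIMARY KEY, datasetid TEXT, "
    "objectid INTEGER, revision INTEGER)"  # 'revision' column name is assumed
)
conn.execute("INSERT INTO file VALUES (136799, 'history', 12345)")
conn.execute("INSERT INTO history VALUES (12345, 'eprint', 6789, 10)")

print(trace_file_to_revision(conn, 136799))
```

Against a real repository you would run the same SELECT with your own database client rather than SQLite.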

This revision number is the filename on disk, within the item’s document folder: archives/ARCHIVEID/documents/disk0/00/00/67/89/revisions/10.xml
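
To build that path for other eprints, here is a minimal sketch. It assumes the layout seen in the example path: the eprint ID zero-padded to eight digits and split into two-digit directories, with everything on disk0. The function name revision_path is made up, and ARCHIVEID remains a placeholder for your own archive ID.

```python
from pathlib import PurePosixPath


def revision_path(archive_id: str, eprint_id: int, revision: int,
                  disk: str = "disk0") -> PurePosixPath:
    """Build the expected location of a revision file, relative to the EPrints root.

    Assumes an 8-digit, zero-padded eprint ID split into pairs (6789 -> 00/00/67/89).
    """
    padded = f"{eprint_id:08d}"
    pairs = [padded[i:i + 2] for i in range(0, len(padded), 2)]
    return PurePosixPath("archives", archive_id, "documents", disk,
                         *pairs, "revisions", f"{revision}.xml")


print(revision_path("ARCHIVEID", 6789, 10))
```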

 

Each eprint should have a ‘revisions’ folder with numbered XML files in it.

I think very old versions of EPrints (v2, maybe v3.1?) stored these revision files differently, so if the repository existed before v3.3.12, early eprints may have other styles of revision files.

 

Cheers,

John

 

From: eprints-tech-request@ecs.soton.ac.uk <eprints-tech-request@ecs.soton.ac.uk> On Behalf Of David Newman
Sent: 08 May 2025 21:13
To: eprints-tech@ecs.soton.ac.uk
Subject: Re: [EP-tech] Thousands of dataobj.xml files

 

CAUTION: External Message. Use caution opening links and attachments.

Hi,

 

dataobj.xml is the placeholder name for history revision files, which are stored in the revisions subdirectory of each EPrints record's document directory. On disk they appear as 1.xml, 2.xml, etc. rather than dataobj.xml, where the number is the revision number of the history record for that eprint.

 

History revision files are a point-in-time snapshot of the metadata of that EPrints record.

 

Regards

 

David Newman

 


From: eprints-tech-request@ecs.soton.ac.uk <eprints-tech-request@ecs.soton.ac.uk> on behalf of kralizeck@gmail.com <kralizeck@gmail.com>
Sent: Thursday, May 8, 2025 7:05:09 PM
To: eprints-tech@ecs.soton.ac.uk <eprints-tech@ecs.soton.ac.uk>
Subject: [EP-tech] Thousands of dataobj.xml files

 

CAUTION: This e-mail originated outside the University of Southampton.


Hi.

I have EPrints 3.4.6 on the latest AlmaLinux and Apache. I upgraded from EPrints 3.3.12 on a very old Ubuntu.

 

I get 77414 files when I go to "Manage records->Files" and filter by name "dataobj.xml" (out of a total of 119968 files without filters).

 

Modification dates range from 2010 (the first EPrints installation, by other people) until now (I took over a few weeks ago to upgrade from 3.3.12 to 3.4.6).

 

I've searched for information, but haven't found anything.

 

All the .xml files have the same content when I export them as Atom (URL edited):

<?xml version="1.0" encoding="utf-8" ?>
<entry>
  <id>https://mysite-url/id/file/136799</id>
  <title>dataobj.xml</title>
  <link rel="alternate"/>
</entry>

 

There is no dataobj.xml in the filesystem, so I assume they are in the database.

 

I would appreciate any help or recommendations for investigating this issue and answering my questions:

  • Why and how are those .xml files generated?
  • Do they serve any purpose?
  • Can I stop them from being generated?
  • Can I delete them? Is there a batch way to delete them?


Thanks and best regards.
Fernando Hdez.