EPrints Technical Mailing List Archive


Message: #04740



[EP-tech] Re: Merging two eprints using 'succeeds'


Just for the record, yes, that bit of XML does work - it really was that simple :-) 

</FamousLastWords>

Andy

Bare-bones TEST script: minimal error checking and no safety checks.
It would need another section to set <metadata_visibility> on the old ID.
================================================================8<
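<?php
$tmpDIR = "/tmp/";
$username='xxxxxxxxxxx';$password='yyyyyyyyyyyy';
# Cast the IDs to integers so they are safe to use in the temp file name and the URL
$oldID = isset($_REQUEST['oldID']) ? (int) $_REQUEST['oldID'] : 0;
$newID = isset($_REQUEST['newID']) ? (int) $_REQUEST['newID'] : 0;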



$XML=<<< EOX
<?xml version="1.0" encoding="UTF-8"?>
<eprints xmlns="http://eprints.org/ep2/data/2.0">
    <eprint id="http://foo.bar.ac.uk/id/eprint/$newID"><!-- not sure this is necessary, but haven't tested without it - I think the CURLOPT_URL may be sufficient -->
        <eprintid>$newID</eprintid>                                <!--  ditto   -->
        <succeeds>$oldID</succeeds>
    </eprint>
</eprints>
EOX;

$ch = curl_init();
curl_setopt($ch, CURLOPT_USERPWD, $username . ":" . $password);
$tmpFILESIZE=strlen($XML);
$tmpFILE=$tmpDIR.$newID.'_succeeds_'.$oldID.'.xml';
file_put_contents($tmpFILE,$XML) or die("unable to write to file $tmpFILE");
curl_setopt($ch, CURLOPT_PUT,1);
$handle = fopen($tmpFILE, "r") or die("unable to open file $tmpFILE");
curl_setopt($ch,CURLOPT_INFILE,$handle);
curl_setopt($ch,CURLOPT_INFILESIZE,$tmpFILESIZE);

curl_setopt($ch, CURLOPT_URL, "http://eprints.foo.bar.ac.uk/id/eprint/$newID");

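curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  # return the response as a string so it is echoed after "RESULT=" below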

if(!($newID && $oldID) ){die( "Needs old and new eprint IDs");}  # but doesn't check that either eprint ID actually exists on the server!

$pkgheader=Array('X-Packaging: http://eprints.org/ep2/data/2.0',
                 'Content-Type: text/xml',
                 'Metadata-Relevant: true',
                 'X-Verbose: true' );
curl_setopt($ch,CURLOPT_HTTPHEADER,$pkgheader);


#########################################################
($result=curl_exec($ch) )|| die( "curl_exec failed: ". curl_error($ch));
#########################################################

echo "RESULT=". $result;


curl_close($ch);
fclose($handle);
unlink($tmpFILE);

?>
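
The "another section" mentioned above, for the old record, might look something like the sketch below. It is untested against a live EPrints server. The function names buildFieldPacket and putPacket are made up for illustration, the value 'no_search' is an assumption based on the description in the quoted message, and the body is sent directly with CURLOPT_POSTFIELDS rather than via a temp file as in the script above.

```php
<?php
// Hypothetical helper (not part of the original script): builds the minimal
// EPrints XML packet that sets one field on one eprint.
function buildFieldPacket($baseUrl, $eprintId, $field, $value)
{
    $eprintId  = (int) $eprintId;
    $safeValue = htmlspecialchars($value, ENT_XML1 | ENT_QUOTES, 'UTF-8');
    return <<<EOX
<?xml version="1.0" encoding="UTF-8"?>
<eprints xmlns="http://eprints.org/ep2/data/2.0">
    <eprint id="$baseUrl/id/eprint/$eprintId">
        <eprintid>$eprintId</eprintid>
        <$field>$safeValue</$field>
    </eprint>
</eprints>
EOX;
}

// Hypothetical helper: PUTs the packet with the same headers as the script
// above and returns the raw response (headers included).
function putPacket($url, $xml, $username, $password)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_USERPWD        => $username . ':' . $password,
        CURLOPT_CUSTOMREQUEST  => 'PUT',
        CURLOPT_POSTFIELDS     => $xml,   // request body sent from memory
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HEADER         => true,
        CURLOPT_HTTPHEADER     => array(
            'X-Packaging: http://eprints.org/ep2/data/2.0',
            'Content-Type: text/xml',
            'Metadata-Relevant: true',
            'X-Verbose: true',
        ),
    ));
    $result = curl_exec($ch);
    if ($result === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("curl_exec failed: $error");
    }
    curl_close($ch);
    return $result;
}

// Example usage (placeholder host and credentials; 'no_search' is assumed):
// $xml = buildFieldPacket('http://eprints.foo.bar.ac.uk', $oldID,
//                         'metadata_visibility', 'no_search');
// echo putPacket("http://eprints.foo.bar.ac.uk/id/eprint/$oldID",
//                $xml, $username, $password);
```

Keeping the packet builder separate from the HTTP call means the generated XML can be inspected before anything is sent to the repository.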

>>> "Andy Reid" <Andy.Reid@lshtm.ac.uk> 21 September 2015 15:11 >>>
Hi,
I'm trying to work out how to thread two eprints so that I can merge the Accepted Manuscript record with the final published version, given that I have two separate systems feeding into the repository: one feeding manuscripts, one harvesting published metadata from PubMed etc. I've looked at what happens when you create a linked new version of a record, with <succeeds> set on the new record and <metadata_visibility> set to no-search on the old one. I'm in the slightly different position of wanting to merge two records once they are in the repository, but it seems like those are the fields I need to tweak. I've looked at the code in https://github.com/eprints/eprints/blob/3.3/perl_lib/EPrints/DataObj/EPrint.pm to see the various in_thread manipulation functions, and can just about follow them.

What I can't find is an admin utility that says 'Merge these two records, making this one the version of record, and retaining that one as an earlier version.'  Am I not looking hard enough?

What I might also want to do is to trigger this from the external systems, via SWORD, sending a couple of minimal XML packages to modify each of the records:

<?xml version="1.0" encoding="UTF-8"?>
<eprints xmlns="http://eprints.org/ep2/data/2.0"><eprint id="http://blah.ac.uk/id/eprint/991329"><eprintid>991329</eprintid><succeeds>991328</succeeds></eprint></eprints>
and likewise for the <metadata_visibility> on the old record**
Does that seem like it ought to work?  It seems too easy.  Am I missing deeper layers of subtlety that are going to get corrupted by my naive approach?
Also, the eprint id is given twice in a standard eprint XML export, as per the edited code above.  Is it necessary to have both to trigger an update correctly?
Andy Reid
** (I say 'old record' but in reality the published versions may arrive before the manuscript versions, which depend on the authors sending them and us having the resources to process them ;-/ )

Andy Reid
Research Information Manager
Room G43, Executive Office
London School of Hygiene & Tropical Medicine
Keppel St
LONDON WC1E 7HT
+44 020-7927-2618