EPrints Technical Mailing List Archive
See the EPrints wiki for instructions on how to join this mailing list and related information.
Message: #05490
Re: [EP-tech] subject dataset - removing subjectid from eprint
- To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
- Subject: Re: [EP-tech] subject dataset - removing subjectid from eprint
- From: Adam Field <Adam.Field@jisc.ac.uk>
- Date: Fri, 11 Mar 2016 14:52:15 +0000
Hi Monica
I'm not saying it would be quick, but I'd be surprised if it really took an infeasible amount of time, even on a large repository. Loading records is fairly lightweight; it's writing that takes time, and writing only happens for records that the script actually changes.
As you've identified, EPrints is trying to be 'clever' with the subject by searching for items at that level or below. Now that the subject in question has been removed from the tree, this may be what's causing the problem. Three solutions I would consider:
* Do a record-by-record iterative search over the whole repository.
* Reinstate the subject id using the subject editor, run the script, then remove the subject from the tree again.
* Identify the eprintids of items that have that subject set using a MySQL query, write them to a file, then write a script to load and modify each of those eprints.
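Adam's third option avoids the ancestor-aware search entirely by reading the multiple-value table directly. The following is a minimal sketch of that first step. It assumes a simplified `eprint_collections` table with only the `eprintid` and `collections` columns visible in the debug query in this thread, and it uses Python's built-in `sqlite3` as a stand-in for the repository's MySQL database. The helper names `find_eprintids` and `write_ids` are hypothetical. The load-and-modify step itself would still be done with an EPrints script.

```python
import sqlite3


def find_eprintids(conn, subjectid):
    """Return sorted, de-duplicated eprintids whose `collections` value is `subjectid`.

    Unlike the search in the thread, this does not join through
    subject_ancestors, so it still works after the subject is unlinked.
    (Simplified schema: eprint_collections(eprintid, collections).)
    """
    rows = conn.execute(
        "SELECT DISTINCT eprintid FROM eprint_collections"
        " WHERE collections = ? ORDER BY eprintid",
        (subjectid,),
    )
    return [eprintid for (eprintid,) in rows]


def write_ids(path, eprintids):
    # One id per line, ready for a separate script to load and edit each eprint.
    with open(path, "w") as fh:
        for eprintid in eprintids:
            fh.write(f"{eprintid}\n")


if __name__ == "__main__":
    # In-memory stand-in for the repository database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE eprint_collections (eprintid INTEGER, collections TEXT)")
    conn.executemany(
        "INSERT INTO eprint_collections VALUES (?, ?)",
        [(1, "theses"), (1, "maps"), (2, "maps"), (3, "theses")],
    )
    print(find_eprintids(conn, "theses"))  # [1, 3]
```

Calling `write_ids("theses_eprintids.txt", ids)` on the result would then produce the file of ids for the follow-up script.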
From: <eprints-tech-bounces@ecs.soton.ac.uk> on behalf of Monica Wood <monica.wood@utas.edu.au>
Reply-To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
Date: Thursday, 10 March 2016 23:12
To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
Subject: Re: [EP-tech] subject dataset - removing subjectid from eprint
Hi Adam,
I believe changing the search that way would return every eprint in the repository?
We have a massive repository, so I don't think this would be a good option.
I have now done a bulk change and set the collections metafield to empty across all thesis item types.
However, to help with debugging the script, I ran it with the arguments FIELDNAME = collections and SUBJECTID = theses. If either of these had been incorrect, the script would have returned an error.
I only did a dry run to see what it would output, but it never reached the part of the script that prints anything, which is why I'm assuming the search returned no results and $list is therefore empty.
As I said in my previous email, I set the noise level to 3 so I could see exactly what was happening, and this was the output:
Starting EPrints Repository.
Connecting to DB ... Database execute debug: SET NAMES 'utf8'
done.
Database execute debug:
SELECT `eprint`.`eprintid`
FROM `eprint`, `eprint_collections` AS `eprint_collections`, `subject_ancestors` AS `127395456subject_ancestors`
WHERE `eprint`.`eprintid`=`eprint_collections`.`eprintid`
AND `eprint_collections`.`collections`=`127395456subject_ancestors`.`subjectid`
AND `127395456subject_ancestors`.`ancestors` = 'theses'
GROUP BY `eprint`.`eprintid`
Ending EPrints Repository.
As you can see, it only returns records where eprint_collections.collections matches subject_ancestors.subjectid. Because I had removed the ‘theses’ node from the subject tree, this query gives back no results.
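The empty result can be reproduced with a toy database. The sketch below runs the logged query against in-memory SQLite tables. It assumes, as the query's `ancestors = 'theses'` condition implies, that each subject's `subject_ancestors` rows include the subject itself. It also assumes, as described in this thread, that unlinking a node removes its `subject_ancestors` rows while leaving `eprint_collections` untouched. The table contents and the helpers `build_demo_db` and `search` are illustrative only.

```python
import sqlite3

# The query as logged at noise level 3, parameterised on the subject id.
ANCESTOR_QUERY = """
SELECT `eprint`.`eprintid`
FROM `eprint`, `eprint_collections` AS `eprint_collections`,
     `subject_ancestors` AS `127395456subject_ancestors`
WHERE `eprint`.`eprintid` = `eprint_collections`.`eprintid`
  AND `eprint_collections`.`collections` = `127395456subject_ancestors`.`subjectid`
  AND `127395456subject_ancestors`.`ancestors` = ?
GROUP BY `eprint`.`eprintid`
"""


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE eprint (eprintid INTEGER PRIMARY KEY);
        CREATE TABLE eprint_collections (eprintid INTEGER, collections TEXT);
        CREATE TABLE subject_ancestors (subjectid TEXT, ancestors TEXT);
        INSERT INTO eprint VALUES (1), (2);
        INSERT INTO eprint_collections VALUES (1, 'theses'), (2, 'maps');
        -- each subject lists itself and every subject above it
        INSERT INTO subject_ancestors VALUES
            ('collections', 'collections'),
            ('theses', 'theses'), ('theses', 'collections'),
            ('maps', 'maps'), ('maps', 'collections');
    """)
    return conn


def search(conn, subjectid):
    # Sorted because GROUP BY alone does not guarantee an order.
    return sorted(row[0] for row in conn.execute(ANCESTOR_QUERY, (subjectid,)))


if __name__ == "__main__":
    conn = build_demo_db()
    print(search(conn, "theses"))  # [1]  (subject still in the tree)
    # Unlinking drops the subject_ancestors rows but not eprint 1's value.
    conn.execute("DELETE FROM subject_ancestors WHERE subjectid = 'theses'")
    print(search(conn, "theses"))  # []  (the join now matches nothing)
    print(conn.execute(
        "SELECT eprintid FROM eprint_collections WHERE collections = 'theses'"
    ).fetchall())  # [(1,)]
```

Searching on `collections` before the deletion returns both eprints. This is the "that level or below" behaviour Adam describes, and it only works while the ancestor rows exist.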
I’m wondering whether something should be added to the UNLINK function in the subject tree so that, when you remove a node from the tree for good, any matching metafield values are also removed from the records?
Monica Wood
From: <eprints-tech-bounces@ecs.soton.ac.uk> on behalf of Adam Field <Adam.Field@jisc.ac.uk>
Reply-To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
Date: Friday, 11 March 2016 at 12:21 AM
To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
Subject: Re: [EP-tech] subject dataset - removing subjectid from eprint
I would suggest running the script over the whole repository.
Looking at John's script, change this:
my $list = $session->get_repository->dataset( 'eprint' )->search( filters => [ { meta_fields => [ $fieldname ], value => $subjectid }] );
To this:
my $list = $session->dataset('eprint')->search();
...and see what happens.
(Though I agree with John that this shouldn't really make a difference.) If it doesn't work, please post exactly what you typed on the command line to invoke the script.
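Running over the whole repository only costs write time for the records that actually change, as Adam notes at the top of this message. The pass can be sketched as follows. `Record`, `strip_subject` and the `save` callback are hypothetical stand-ins for loading an eprint, editing its `collections` value and committing it. They are not EPrints API calls.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical stand-in for an eprint: an id plus a multi-valued field."""
    eprintid: int
    collections: list = field(default_factory=list)


def strip_subject(records, subjectid, save, dry_run=False):
    """Remove `subjectid` from every record's `collections` value.

    Every record is read, but `save` (the expensive write) is only called
    for records whose value actually changed. Returns the changed ids.
    """
    changed = []
    for record in records:
        kept = [value for value in record.collections if value != subjectid]
        if kept == record.collections:
            continue  # value unchanged, so no write for this record
        changed.append(record.eprintid)
        if not dry_run:
            record.collections = kept
            save(record)
    return changed


if __name__ == "__main__":
    repo = [Record(1, ["theses", "maps"]), Record(2, ["maps"]), Record(3, ["theses"])]
    written = []
    print(strip_subject(repo, "theses", save=lambda r: written.append(r.eprintid)))  # [1, 3]
    print(written, repo[0].collections)  # [1, 3] ['maps']
```

Because the test is on the stored value rather than on a search, this pass does not depend on the subject still being in the tree.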
From: <eprints-tech-bounces@ecs.soton.ac.uk> on behalf of John Salter <J.Salter@leeds.ac.uk>
Reply-To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
Date: Wednesday, 9 March 2016 06:36
To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
Subject: Re: [EP-tech] subject dataset - removing subjectid from eprint
Interesting...
You could try adding the subject back into the tree temporarily to see whether it works that way. Using this script should cause the summary pages of any affected eprints to be regenerated; if you alter the database directly, you'd have to do this by running bin/generate_abstracts.
Cheers,
John
From: eprints-tech-bounces@ecs.soton.ac.uk <eprints-tech-bounces@ecs.soton.ac.uk> on behalf of Monica Wood <monica.wood@utas.edu.au>
Sent: 09 March 2016 04:33:34
To: 'eprints-tech@ecs.soton.ac.uk'
Subject: Re: [EP-tech] subject dataset - removing subjectid from eprint
Hi John,
Thanks for linking me to this script.
I’ve had a look through it and tried it out, but it’s not working. I believe this is because I’ve already removed the node from the subject tree (unlinked it from the tree).
Setting the script’s noise level to 3 shows me a query it runs, at what I believe is this line:
my $list = $session->get_repository->dataset( 'eprint' )->search( filters => [ { meta_fields => [ $fieldname ], value => $subjectid } ] );
This is the query (with fieldname set to collections and subjectid set to theses):
Database execute debug:
SELECT `eprint`.`eprintid`
FROM `eprint`, `eprint_collections` AS `eprint_collections`, `subject_ancestors` AS `127395456subject_ancestors`
WHERE `eprint`.`eprintid`=`eprint_collections`.`eprintid`
AND `eprint_collections`.`collections`=`127395456subject_ancestors`.`subjectid`
AND `127395456subject_ancestors`.`ancestors` = 'theses'
GROUP BY `eprint`.`eprintid`
This returns an empty list because the theses subjectid no longer exists in subject_ancestors, although it still exists in eprint_collections. I’ll have a go at bulk changing the records from the GUI; if that doesn’t work out, I’ll do a bulk change directly in the database by removing the entries in eprint_collections that point to the theses subjectid.
Cheers,
Monica Wood
Library Systems Officer
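Monica's fallback of removing the rows directly in the database can be sketched as a single transaction that records which eprints were affected before deleting. This again uses in-memory SQLite in place of MySQL and a simplified two-column `eprint_collections` table; the real table may carry extra columns that this sketch ignores. `delete_subject_links` is a hypothetical helper. As John points out in this thread, changes made directly in the database are not reflected in summary pages until bin/generate_abstracts is run, which is why the affected ids are returned.

```python
import sqlite3


def delete_subject_links(conn, subjectid):
    """Delete every eprint_collections row pointing at `subjectid`.

    Runs inside one transaction and returns the affected eprintids, whose
    summary pages would still need regenerating afterwards.
    """
    with conn:  # commits on success, rolls back if anything raises
        ids = [row[0] for row in conn.execute(
            "SELECT DISTINCT eprintid FROM eprint_collections"
            " WHERE collections = ? ORDER BY eprintid", (subjectid,))]
        conn.execute("DELETE FROM eprint_collections WHERE collections = ?", (subjectid,))
    return ids


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE eprint_collections (eprintid INTEGER, collections TEXT)")
    conn.executemany("INSERT INTO eprint_collections VALUES (?, ?)",
                     [(1, "theses"), (1, "maps"), (3, "theses")])
    print(delete_subject_links(conn, "theses"))  # [1, 3]
    print(conn.execute("SELECT * FROM eprint_collections").fetchall())  # [(1, 'maps')]
```

Collecting the ids before the delete means the list of eprints to regenerate survives even though the rows that identified them are gone.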
From: <eprints-tech-bounces@ecs.soton.ac.uk> on behalf of John Salter <J.Salter@leeds.ac.uk>
Reply-To: "'eprints-tech@ecs.soton.ac.uk'" <eprints-tech@ecs.soton.ac.uk>
Date: Tuesday, 8 March 2016 at 10:06 PM
To: "'eprints-tech@ecs.soton.ac.uk'" <eprints-tech@ecs.soton.ac.uk>
Subject: Re: [EP-tech] subject dataset - remove_field
Hi Monica,
I think your suggestion will remove the field itself, rather than a specific value stored in that field. I’ve done something similar and have just added it to the wiki for you: https://wiki.eprints.org/w/Remove_subjectid_script
Let me know if it doesn’t work for you.
Cheers,
John
From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of Monica Wood
Hi there,
In our repository we have a root subject called ‘Collections’. Under this, I have unlinked (deleted) a child of Collections. The problem now is that all items that were connected to this collection still have metadata saying so, and on our summary page we display the collection an item belongs to. So it’s now showing ‘??collectionName??’ as a link, and that link is dead.
Is there a way to delete these connections without needing to do it directly through the database? I was wondering if epadmin remove_field might do the job on the subject dataset? Something like:
~/bin/epadmin remove_field repoid subject collectionid ??
Thanks in advance,
Monica Wood
Available Times: Tues: 9am – 5pm, Wed: 1pm – 5pm, Fri: 9am – 5pm