EPrints Technical Mailing List Archive
See the EPrints wiki for instructions on how to join this mailing list and related information.
Message: #09573
Re: [EP-tech] Indexing - cleanup indexed terms after mass deletions
- To: <eprints-tech@ecs.soton.ac.uk>
- Subject: Re: [EP-tech] Indexing - cleanup indexed terms after mass deletions
- From: Matthew Kerwin <matthew@kerwin.net.au>
- Date: Tue, 30 Jan 2024 13:46:48 +1000
Hi Matt,

On Tue, 30 Jan 2024 at 09:31, Matthew Brady <Matthew.Brady@unisq.edu.au> wrote:
>
> Hi All,
>
> Our original repo, houses traditional outputs (Articles, conference papers etc.) as well as Theses…
> We have split the Theses into a dedicated repo, cloning the original system (metadata and files), and then removed the non-theses (search->batch edit->remove all records).
>
> I have noticed that there are entries in the various database index tables, referring to eprints that are no longer in the system…
> I have run epadmin reindex over ‘<repo> eprint’ and ‘<repo> document’, but the indexed values persist…
>
> e.g. eprint__index contains a fieldword = ‘title:elephant’ with ids = ‘:12345:’ but there is no eprint 12345 in the system any longer.
>
> I thought the permanent removal of the non-theses items would have cleaned up the index tables as process occurred?
>
> Any thoughts appreciated.
>
> Cheers,
> Matt
>

In this particular case, is the 'title:elephant' associated with any of your theses, or _only_ with deleted records? Because if it's the latter, then the row is orphaned – it has no inward referential links – so any reindexing task that is built around "foreach(eprint)" rather than "foreach(tablerow)" won't even see the row in question, so won't know to clean it up.

We should probably have a look at the remove/delete routines and see how deep they go into cleaning up index tables, filesystem directories, view pages, etc. Off the top of my head I don't know at all, I'm afraid. I assume "not very deep."

For what it's worth, in moments of questionable judgement I have purged our repository's various _index, _rindex, and _orderval tables and triggered the appropriate reindexing/reordering tasks manually. It doesn't seem to have caused any problems after the fact.
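To illustrate the "foreach(tablerow)" idea, here is a minimal Python sketch of a scan that walks the index rows themselves and reports ids that no longer match a live record. It is not EPrints code and touches no database: the (fieldword, ids) rows and the set of live eprint ids are hypothetical sample data, and the colon-delimited ids format for rows holding several ids is an assumption extrapolated from the ':12345:' example above.

```python
def parse_ids(ids_field):
    """Split a ':12345:67:'-style string into a set of integer ids."""
    return {int(tok) for tok in ids_field.split(":") if tok}


def find_stale_entries(index_rows, live_ids):
    """Visit every index row (not every eprint) and collect dead ids.

    Returns {fieldword: sorted list of ids with no live record}.
    """
    stale = {}
    for fieldword, ids_field in index_rows:
        missing = parse_ids(ids_field) - live_ids
        if missing:
            stale.setdefault(fieldword, set()).update(missing)
    return {fw: sorted(ids) for fw, ids in stale.items()}


# Hypothetical sample data: eprint 12345 was removed, 200 and 201 remain.
rows = [
    ("title:elephant", ":12345:"),          # refers only to a removed record
    ("title:thesis", ":200:12345:201:"),    # mixes live and removed ids
    ("title:doctoral", ":200:"),            # fully live
]
live = {200, 201}

report = find_stale_entries(rows, live)
# Rows with no live id at all are the fully orphaned ones.
orphaned = [fw for fw, ids in rows if not (parse_ids(ids) & live)]

print(report)    # {'title:elephant': [12345], 'title:thesis': [12345]}
print(orphaned)  # ['title:elephant']
```

A loop over live eprints would only ever revisit terms belonging to records 200 and 201, so the 'title:elephant' row would never be examined. Scanning the rows directly is what surfaces it.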
Cheers
--
Matthew Kerwin
https://matthew.kerwin.net.au/
- Follow-Ups:
- Re: [EP-tech] Indexing - cleanup indexed terms after mass deletions
- From: David R Newman <drn@ecs.soton.ac.uk>
- Re: [EP-tech] Indexing - cleanup indexed terms after mass deletions
- References:
- [EP-tech] Indexing - cleanup indexed terms after mass deletions
- From: Matthew Brady <Matthew.Brady@unisq.edu.au>
- [EP-tech] Indexing - cleanup indexed terms after mass deletions