EPrints Technical Mailing List Archive

See the EPrints wiki for instructions on how to join this mailing list and related information.

Message: #09862


Re: [EP-tech] Encoding for database tables - utf8mb4


Hi John,

On Fri, 25 Oct 2024 at 22:37, John Salter <J.Salter@leeds.ac.uk> wrote:
>
> Hi,
> Currently my EPrints database still uses utf8, rather than utf8mb4.
>
> There are a couple of fields (title, abstract) that I would like to update to utf8mb4 – before (at some point in the future) updating the whole database to utf8mb4.
> Has anyone done anything similar – changing a limited subset of columns - or should I just try and do the whole database at once?

I definitely updated ours at some point, many years ago. I think I did
the whole database, but if I recall correctly it required a lot of
typing in a terminal because I had to change each column of each
table, and collations, and various other things. I also changed the
code somewhere around EPrints::Database (and possibly ::mysql) to
ensure it connected with the right encoding/charset as well. I can
probably dig out some more details once I'm back at work tomorrow, if
you need them.
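
As a rough illustration of the per-column approach described above, the
following Python sketch generates the SQL statements instead of typing them by
hand. It is not the procedure that was actually used. The table name
(`eprint`), the column types (`LONGTEXT`), and the collation
(`utf8mb4_unicode_ci`) are assumptions; check `SHOW CREATE TABLE` on your own
database first. Note that `MODIFY` replaces the whole column definition, so any
`NOT NULL` or `DEFAULT` attributes must be restated in the type string.

```python
def alter_column_sql(table, column, column_type,
                     charset="utf8mb4", collation="utf8mb4_unicode_ci"):
    """Build one ALTER TABLE statement that changes a single column's charset."""
    return (
        f"ALTER TABLE `{table}` MODIFY `{column}` {column_type} "
        f"CHARACTER SET {charset} COLLATE {collation};"
    )


# Hypothetical column list: table, column, and full column type.
# These names and types are assumptions for illustration only.
COLUMNS = [
    ("eprint", "title", "LONGTEXT"),
    ("eprint", "abstract", "LONGTEXT"),
]


def build_script(columns):
    """Return the statements in order: connection charset first, then columns."""
    statements = ["SET NAMES utf8mb4;"]
    statements += [alter_column_sql(t, c, ty) for t, c, ty in columns]
    return statements


if __name__ == "__main__":
    # Print the script for review; nothing is executed against a database.
    print("\n".join(build_script(COLUMNS)))
```

The script only prints statements, so they can be reviewed and run against a
test copy of the database before touching production.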

> These instructions seem to cover what I need to do for individual columns:
> https://wiki.eprints.org/w/Unicode#Managing_32-Bit_Unicode_Characters
> and this is a useful guide: https://www.eprints.org/eptech/msg07198.html
>
> If the fields are indexed, these changes are probably also needed: https://www.eprints.org/eptech/msg09275.html.
>
> The reasons I want to do it this way are:
> - I have some external systems that want to push utf8mb4 data into those fields
> - If I try to update the whole DB at the moment, I hit the index length issues, so I need more time to investigate, reconfigure, and resolve them
>
> Any details from anyone who's updated part, or all, of their DB provision to utf8mb4 welcomed!
>
> Cheers,
> John
>
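
The index length issue mentioned above can be sketched with some arithmetic,
assuming it is the InnoDB index key prefix limit. MySQL's `utf8` (utf8mb3)
reserves up to 3 bytes per character and `utf8mb4` up to 4, so a column that
fits under the 767-byte limit of the COMPACT and REDUNDANT row formats in
`utf8` may no longer fit in `utf8mb4`. The helper names below are hypothetical,
and the limit that applies depends on the server's row format and version.

```python
# Maximum bytes per character that MySQL reserves for each charset.
BYTES_PER_CHAR = {"utf8": 3, "utf8mb3": 3, "utf8mb4": 4}

# InnoDB index key prefix limit for COMPACT/REDUNDANT row formats.
# Assumption: the server in question uses one of these; DYNAMIC and
# COMPRESSED row formats allow 3072 bytes instead.
MAX_KEY_BYTES = 767


def max_indexable_chars(charset, max_key_bytes=MAX_KEY_BYTES):
    """Longest character length that fits in a single-column index."""
    return max_key_bytes // BYTES_PER_CHAR[charset]


def fits_in_index(length, charset, max_key_bytes=MAX_KEY_BYTES):
    """True if a column of `length` characters can be fully indexed."""
    return length * BYTES_PER_CHAR[charset] <= max_key_bytes


if __name__ == "__main__":
    for charset in ("utf8", "utf8mb4"):
        print(charset, max_indexable_chars(charset), fits_in_index(255, charset))
```

Under the 767-byte limit a `VARCHAR(255)` index fits in `utf8` but not in
`utf8mb4`, where the maximum is 191 characters. That is why indexed columns
usually need either a shorter index prefix or a row format with the larger
limit.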

Cheers
--
  Matthew Kerwin [he/him]
  https://matthew.kerwin.net.au/