  1. MusicBrainz Batch Edits
  2. MBBE-39

Remove invalid characters from existing annotations


    • Type: Task
    • Resolution: Fixed
    • Priority: Normal
    • Annotation

      Since last week's deployment of MBS-10416, invalid characters can no longer be entered. It's time to remove invalid characters from existing annotations. These can be found in the database using the following query, adapted from the remove_invalid_characters subroutine:

      SELECT *
      FROM annotation
      WHERE text ~ '[\x0-\x8\xB-\xC\xE-\x1F\xD800-\xDFFF\xFDD0-\xFDEF\xFEFF\xFFFE-\xFFFF\x1FFFE-\x1FFFF\x2FFFE-\x2FFFF\x3FFFE-\x3FFFF\x4FFFE-\x4FFFF\x5FFFE-\x5FFFF\x6FFFE-\x6FFFF\x7FFFE-\x7FFFF\x8FFFE-\x8FFFF\x9FFFE-\x9FFFF\xAFFFE-\xAFFFF\xBFFFE-\xBFFFF\xCFFFE-\xCFFFF\xDFFFE-\xDFFFF\xEFFFE-\xEFFFF\xF0000-\xFFFFF\x100000-\x10FFFF]';
      

      There are 139 characters to be removed:

      Count Codepoint
      6 U+02
      41 U+03
      4 U+05
      23 U+08
      2 U+0B
      2 U+10
      1 U+18
      6 U+19
      3 U+1C
      2 U+1D
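
A per-codepoint tally like the one above could be reproduced outside the database with a short script. This is a minimal sketch in Python, assuming the annotation texts have already been fetched into a list of strings. The names `RANGES`, `INVALID` and `count_invalid` are illustrative and not taken from the server code; the ranges are transcribed from the query above.

```python
import re
from collections import Counter

# Codepoint ranges transcribed from the query above (assumed equivalent,
# not copied from the server's remove_invalid_characters subroutine).
RANGES = [
    (0x00, 0x08), (0x0B, 0x0C), (0x0E, 0x1F),   # C0 controls except TAB, LF, CR
    (0xD800, 0xDFFF),                           # surrogates
    (0xFDD0, 0xFDEF), (0xFEFF, 0xFEFF), (0xFFFE, 0xFFFF),
]
# Last two codepoints of planes 1 to 14.
RANGES += [(p << 16 | 0xFFFE, p << 16 | 0xFFFF) for p in range(0x1, 0xF)]
# Planes 15 and 16 in full.
RANGES += [(0xF0000, 0x10FFFF)]

# Build one character class using \UXXXXXXXX escapes understood by `re`.
INVALID = re.compile(
    "[" + "".join("\\U%08x-\\U%08x" % r for r in RANGES) + "]"
)


def count_invalid(texts):
    """Count invalid characters across all texts, keyed as 'U+XX'."""
    counts = Counter()
    for text in texts:
        for ch in INVALID.findall(text):
            counts[f"U+{ord(ch):02X}"] += 1
    return counts
```

Calling `count_invalid` on the rows returned by the query would give a mapping from codepoint label to number of occurrences, which can then be sorted by codepoint for display.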

      Among 70 annotations:

      Entity type Count
      area 0
      artist 6
      event 0
      instrument 0
      label 0
      place 0
      recording 14
      release 44
      release-group 0
      series 0
      work 6

      Attached is a dump of annotations where invalid characters have been replaced with their codepoint in the form U+{HEXA}.
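
The substitution used for the dump, and the removal itself, can be sketched as follows. This is an illustrative Python example only: `mark_invalid` and `strip_invalid` are hypothetical helper names, the character class is transcribed from the query above, and how the cleaned text is written back (for example through edits rather than direct updates) is not covered here.

```python
import re

# Codepoint ranges transcribed from the query above (an assumption, not
# the server's own definition).
_RANGES = [
    (0x00, 0x08), (0x0B, 0x0C), (0x0E, 0x1F),
    (0xD800, 0xDFFF), (0xFDD0, 0xFDEF), (0xFEFF, 0xFEFF), (0xFFFE, 0xFFFF),
]
_RANGES += [(p << 16 | 0xFFFE, p << 16 | 0xFFFF) for p in range(0x1, 0xF)]
_RANGES += [(0xF0000, 0x10FFFF)]

INVALID = re.compile(
    "[" + "".join("\\U%08x-\\U%08x" % r for r in _RANGES) + "]"
)


def mark_invalid(text):
    """Replace each invalid character with a visible U+{HEXA} marker."""
    return INVALID.sub(lambda m: f"U+{ord(m.group()):02X}", text)


def strip_invalid(text):
    """Remove invalid characters, leaving everything else untouched."""
    return INVALID.sub("", text)
```

`mark_invalid` is useful for reviewing what would change before anything is modified; `strip_invalid` produces the text that would replace the existing annotation.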

            yvanzo yvanzo