- Task
- Resolution: Fixed
- Normal
Since last week's deployment of MBS-10416, invalid characters can no longer be entered. It's time to remove invalid characters from existing annotations. These can be found in the database using the following query, adapted from the remove_invalid_characters subroutine:
SELECT * FROM annotation WHERE text ~ '[\x0-\x8\xB-\xC\xE-\x1F\xD800-\xDFFF\xFDD0-\xFDEF\xFEFF\xFFFE-\xFFFF\x1FFFE-\x1FFFF\x2FFFE-\x2FFFF\x3FFFE-\x3FFFF\x4FFFE-\x4FFFF\x5FFFE-\x5FFFF\x6FFFE-\x6FFFF\x7FFFE-\x7FFFF\x8FFFE-\x8FFFF\x9FFFE-\x9FFFF\xAFFFE-\xAFFFF\xBFFFE-\xBFFFF\xCFFFE-\xCFFFF\xDFFFE-\xDFFFF\xEFFFE-\xEFFFF\xF0000-\xFFFFF\x100000-\x10FFFF]';
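For illustration only, the character class in the query above can be sketched in Python as a list of inclusive code point ranges. The names `INVALID_RANGES`, `is_invalid` and `strip_invalid` are hypothetical and are not taken from the server code; this is a minimal sketch of the same ranges, not the actual remove_invalid_characters subroutine.

```python
# Inclusive code point ranges copied from the query's character class.
INVALID_RANGES = [
    (0x0, 0x8), (0xB, 0xC), (0xE, 0x1F),   # C0 controls except TAB, LF, CR
    (0xD800, 0xDFFF),                      # surrogates
    (0xFDD0, 0xFDEF),                      # noncharacters
    (0xFEFF, 0xFEFF),                      # byte order mark
    (0xFFFE, 0xFFFF),                      # last two code points of plane 0
]
# Last two code points of planes 1 to 14 (U+1FFFE-U+1FFFF ... U+EFFFE-U+EFFFF).
INVALID_RANGES += [
    ((plane << 16) | 0xFFFE, (plane << 16) | 0xFFFF) for plane in range(1, 15)
]
# Planes 15 and 16 in full, as in the query.
INVALID_RANGES += [(0xF0000, 0xFFFFF), (0x100000, 0x10FFFF)]


def is_invalid(ch):
    """Return True if the single character falls in one of the ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in INVALID_RANGES)


def strip_invalid(text):
    """Return the text with every invalid character removed."""
    return "".join(ch for ch in text if not is_invalid(ch))
```

With this sketch, `strip_invalid("a\x03b\tc\n")` keeps the tab and the newline and drops only U+03.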
There are 139 characters to be removed:
Count | Codepoint |
---|---|
6 | U+02 |
41 | U+03 |
4 | U+05 |
23 | U+08 |
2 | U+0B |
2 | U+10 |
1 | U+18 |
6 | U+19 |
3 | U+1C |
2 | U+1D |
Among 70 annotations:
Entity type | Count |
---|---|
area | 0 |
artist | 6 |
event | 0 |
instrument | 0 |
label | 0 |
place | 0 |
recording | 14 |
release | 44 |
release-group | 0 |
series | 0 |
work | 6 |
Attached is a dump of annotations where invalid characters have been replaced with their codepoint in the form U+{HEXA}.
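A rough sketch of how such a dump and the per-codepoint counts could be produced is given below. It is an assumption, not the script actually used: `CONTROL_CHARS`, `mark_invalid` and `tally_invalid` are hypothetical names, and the pattern covers only the C0 control ranges from the query, since every codepoint listed above falls within them.

```python
import re
from collections import Counter

# Assumption: only the C0 control ranges of the query are matched here.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def mark_invalid(text):
    """Replace each matched character with its codepoint, e.g. U+03."""
    return CONTROL_CHARS.sub(lambda m: f"U+{ord(m.group()):02X}", text)


def tally_invalid(texts):
    """Count matched characters per codepoint across all given texts."""
    counts = Counter()
    for text in texts:
        counts.update(f"U+{ord(ch):02X}" for ch in CONTROL_CHARS.findall(text))
    return counts
```

For example, `mark_invalid("a\x03b")` yields `"aU+03b"`, and `tally_invalid` returns a `Counter` keyed by the same U+ form used in the table.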
- is related to MBS-10416 Prevent entering control character in annotation (Closed)