  1. MusicBrainz Server
  2. MBS-6929

Support coordinates from the Japanese Wikipedia

    • Improvement
    • Resolution: Fixed
    • Normal
    • 2013-11-25
    • None
    • Editing interface
    • None

      Coordinates on the Japanese Wikipedia are written using Japanese characters, rather than the usual degrees, minutes and seconds signs. Some examples:
      北緯43度2分39.22秒 東経141度21分9.77秒
      南緯22度54分30秒 西経43度11分47秒
      北緯35度39分59.81秒東経139度44分29.06秒

      The important characters: 度 is degrees, 分 is minutes and 秒 is seconds. 北 is north, 南 is south, 東 is east and 西 is west.
      Spaces and separators between the parts are optional.

      I wrote a quick bit of Perl to parse them which might be useful:

      use utf8;
      binmode STDOUT, ":utf8";
      
      my @coords = (
          "北緯43度2分39.22秒 東経141度21分9.77秒",
          "南緯22度54分30秒 西経43度11分47秒",
          "北緯35度39分59.81秒東経139度44分29.06秒",
          "北緯35度39分59.81秒 東経139度44分29.06秒",
      );
      
      for my $coord (@coords) {
          $coord =~ tr/　．０-９/ .0-9/; # replace fullwidth characters with normal ASCII
          $coord =~ s/(北|南)緯 *([0-9.]+)度 *([0-9.]+)分 *([0-9.]+)秒 *(東|西)経 *([0-9.]+)度 *([0-9.]+)分 *([0-9.]+)秒/$2° $3' $4" $1, $6° $7' $8" $5/;
          $coord =~ tr/北南東西/NSEW/; # replace direction characters
      
          print "$coord\n";
      }
      

      Alternatively, here is the loop with the non-ASCII characters written as \x{...} escapes:

      for my $coord (@coords) {
          $coord =~ tr/\x{3000}\x{FF0E}\x{FF10}-\x{FF19}/ .0-9/; # replace fullwidth characters with normal ASCII
          $coord =~ s/(\x{5317}|\x{5357})\x{7DEF} *([0-9.]+)\x{5EA6} *([0-9.]+)\x{5206} *([0-9.]+)\x{79D2} *(\x{6771}|\x{897F})\x{7D4C} *([0-9.]+)\x{5EA6} *([0-9.]+)\x{5206} *([0-9.]+)\x{79D2}/$2\x{00B0} $3' $4" $1, $6\x{00B0} $7' $8" $5/;
          $coord =~ tr/\x{5317}\x{5357}\x{6771}\x{897F}/NSEW/; # replace direction characters
      
          print "$coord\n";
      }
      

      I haven't actually come across any coordinates written with fullwidth characters, but I included handling for them anyway, just in case.
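
      If decimal degrees are more convenient than a degrees/minutes/seconds string, the same pattern can feed a small conversion routine. This is only a sketch: the name ja_coords_to_decimal, the stricter number pattern and the sign convention (south and west negative) are assumptions for illustration, not something taken from the server code.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper (not from the ticket): convert one Japanese Wikipedia
# coordinate string to decimal degrees. Returns (latitude, longitude), or an
# empty list if the string does not match.
sub ja_coords_to_decimal {
    my ($coord) = @_;

    # Fold fullwidth space, full stop and digits to their ASCII equivalents.
    $coord =~ tr/\x{3000}\x{FF0E}\x{FF10}-\x{FF19}/ .0-9/;

    # Assumption: a number is digits with an optional decimal part, which is
    # slightly stricter than the [0-9.]+ used in the ticket.
    my $num = qr/([0-9]+(?:\.[0-9]+)?)/;

    my ($ns, $lat_d, $lat_m, $lat_s, $ew, $lon_d, $lon_m, $lon_s) = $coord =~ /
        (北|南)緯 \s* ${num}度 \s* ${num}分 \s* ${num}秒 \s*
        (東|西)経 \s* ${num}度 \s* ${num}分 \s* ${num}秒
    /x or return;

    my $lat = $lat_d + $lat_m / 60 + $lat_s / 3600;
    my $lon = $lon_d + $lon_m / 60 + $lon_s / 3600;

    $lat = -$lat if $ns eq '南';    # south of the equator
    $lon = -$lon if $ew eq '西';    # west of the prime meridian

    return ($lat, $lon);
}

binmode STDOUT, ':encoding(UTF-8)';

for my $input (
    "北緯43度2分39.22秒 東経141度21分9.77秒",
    "南緯２２度５４分３０秒　西経４３度１１分４７秒",    # fullwidth variant
) {
    my ($lat, $lon) = ja_coords_to_decimal($input) or next;
    printf "%.6f, %.6f\n", $lat, $lon;
}
```

      Returning an empty list on a failed match lets the caller skip unparseable input with a plain "or next", rather than printing a half-converted string.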

            ianmcorvidae Ian McEwen
            nikki nikki

                Version Package
                2013-11-25