A review of those Wikimedia Commons URLs in the database that don't follow the standard structure reveals some common issues that the cleanup code should handle:
- The overlay image viewer uses a new URL structure, e.g. https://commons.wikimedia.org/wiki/Category:Natalie_Merchant#/media/File:NatalieMerchant.jpg, which needs to be cleaned up into https://commons.wikimedia.org/wiki/File:NatalieMerchant.jpg.
- Sometimes the File: part of the URL appears percent-encoded as File%3A, which should be normalized back to File:.
- Pages that aren't media pages (such as gallery and category pages) should be blocked.
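
The three rules above could be sketched roughly as follows. This is only an illustration in Python, not the project's actual cleanup code: the function name `clean_commons_url` and the convention of returning `None` for blocked URLs are assumptions made for the example, and it only considers the commons.wikimedia.org host.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

CANONICAL_PREFIX = "https://commons.wikimedia.org/wiki/"


def clean_commons_url(url: str) -> Optional[str]:
    """Return the canonical File: page URL, or None if the URL should be blocked.

    Hypothetical helper: the name and the None-means-blocked convention
    are assumptions for this sketch.
    """
    parts = urlsplit(url)
    if parts.hostname != "commons.wikimedia.org":
        return None

    if parts.fragment.startswith("/media/"):
        # Overlay image viewer: the file title sits in the fragment,
        # e.g. ...Category:Foo#/media/File:Bar.jpg
        title = parts.fragment[len("/media/"):]
    elif parts.path.startswith("/wiki/"):
        title = parts.path[len("/wiki/"):]
    else:
        return None

    # Normalize a percent-encoded namespace separator (File%3A -> File:).
    # Only the prefix is touched; the rest of the title is left as-is.
    title = re.sub(r"^File%3A", "File:", title, flags=re.IGNORECASE)

    # Block anything that is not a media page (galleries, categories, ...).
    if not title.startswith("File:") or title == "File:":
        return None

    return CANONICAL_PREFIX + title
```

With this sketch, the overlay viewer URL from the first item maps to https://commons.wikimedia.org/wiki/File:NatalieMerchant.jpg, while https://commons.wikimedia.org/wiki/Category:Natalie_Merchant on its own is rejected.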