-
Task
-
Resolution: Fixed
-
Normal
In https://github.com/metabrainz/listenbrainz-server/pull/1679 and https://github.com/metabrainz/listenbrainz-server/pull/1693 we made the API prevent the submission of listens from before 2002.
We discovered that last.fm returns many obviously invalid dates from their API. It's not clear whether they are simply passing through the value that was originally submitted to them.
This means that for users who imported their last.fm collection before the above PRs, we have a lot of data with bad dates (ranging from 1900 to 1970 and more). Some users have tens or hundreds of thousands of affected listens.
Many of the users with such listens are actively submitting new data to LB and have also logged in recently. While these listens exist in the database, stats only include data from 2002 onwards.
We should delete this data from the database, but we need to decide on how we want to do this:
- Should we inform users who have this bad data? What options should we offer to help them fix it? Deleting the data and re-importing from last.fm is still going to result in bad data in their account.
- Should we actively reach out to users, or just put a warning on their account that they see when they log in? (A login warning requires new db columns/tables, etc.)
- Should we just make a dump of these listens, delete them, and wait for users to complain about missing data?
- How should we update/reimport this data for users if they want it?
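
Whichever option we pick, the "dump, then delete" step could look roughly like the sketch below. It is only an illustration in plain Python, not the actual listenbrainz-server code: the `user_id` and `listened_at` field names, the helper names, and the exact cutoff (taken here as 2002-01-01 00:00 UTC) are all assumptions, and the real cutoff should match whatever the two PRs above enforce.

```python
import json
from collections import Counter
from datetime import datetime, timezone

# Assumption: "before 2002" is taken to mean before 2002-01-01 00:00 UTC.
CUTOFF_TS = int(datetime(2002, 1, 1, tzinfo=timezone.utc).timestamp())


def split_listens(listens):
    """Split listens into (valid, invalid) lists based on the cutoff.

    Each listen is assumed to be a dict with a Unix timestamp under
    the hypothetical key "listened_at".
    """
    valid, invalid = [], []
    for listen in listens:
        if listen["listened_at"] < CUTOFF_TS:
            invalid.append(listen)
        else:
            valid.append(listen)
    return valid, invalid


def affected_users(invalid):
    """Count invalid listens per user (hypothetical key "user_id")."""
    return Counter(listen["user_id"] for listen in invalid)


def dump_listens(listens, fp):
    """Write listens as JSON lines so they can be restored later."""
    for listen in listens:
        fp.write(json.dumps(listen, sort_keys=True) + "\n")
```

The per-user counts from `affected_users` would tell us who to notify (or who is most likely to complain), and the JSON-lines dump keeps the deleted listens recoverable if a user asks for them back.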