[PICARD-507] After commit ab32813972382822797657b77668cae269811568 tracks are added to wrong release

Type: Bug
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: 1.2
Component/s: File Move & Rename, Lookup & Match
Labels:
- test-on-picard-2
Environment:
FreeBSD 9-STABLE

I've ripped two CD's, Paste Magazine Sampler #48 and #41.

With Picard 1.1 I can load them, cluster them, look-them-up, and
everything works.

With the git master I load and cluster them. Looking up #48 works as
expected, but when I look up #41 its tracks get crammed (poorly) into
Album #48. Similar misbehavior happens if I load #41 first and #48
second.

I used git bisect to walk the commits between 1.1 and the tip of the
git tree and discovered that commit
ab32813972382822797657b77668cae269811568 caused the problem.

The bit that seems to be causing the change in behavior is in
metadata.py:compare_to_release, at the very end of the method, where
the score for the release is given a boost if it is in a release group
that has already been loaded. cluster.py:_compare_to_release calls
metadata.py:compare_to_release when it's trying to figure out where to
put its tracks.

This change is sufficient to cause #48
release: u'90b66a30-878b-4c32-a943-7f87842d2a2c'
release group: u'826bc83c-069b-3e09-aaea-a44ebb2dbcda'
to appear to be a better match for #41 than its real release
release: u'30979257-92ba-455f-8423-7b7509090165'
release group: u'14338bba-34af-3a87-9932-c2c55f96b3ee'

I'll attach three debugging traces ( from./tagger.py -d): good-trace,
bad.trace, and bad-trace-patched-to-work.trace which are for the good
commit, bad commit and (yep...) the bad commit with metadata.py lines
138-140 commented out.

Please ignore the register_save_file_processor error, that's my
transcoding plugin that's only supported in my code.

All three traces have this snippet in cluster.py:_lookup_finished,
around line 186 (right after the commented out debug statement)

for match in matches:
self.log.debug("Track id=%r, score=%r, ", match[1].id, match[0])

so that the tracks and their scores appear in the trace files.

I don't have enough background to understand why the fact that one of
the candidate releases being considered for a cluster has already been
loaded should matter. It certainly messes up this case.

So far, running with metadata.py lines 138-140 commented out is
working for me, but I don't really understand the ramifications.

Anyone have any insights?

- - Sort By Name
  - Sort By Date
  - Ascending
  - Descending
  - Thumbnails
  - List
  - Download All

bad.trace
38 kB
2013-06-29 19:39
bad-patched-to-work.trace
42 kB
2013-06-29 19:39
good.trace
42 kB
2013-06-29 19:39

is related to

PICARD-665 Web service calls to ports 80 and 443 do not need explicit port specification. 443 should be automatically made https.

Closed

George Hartzell added a comment - 2014-06-09 22:10

I'm cleaning up old TODO list items and stumbled across this.

Has it been dealt with?

George Hartzell added a comment - 2014-06-09 22:10 I'm cleaning up old TODO list items and stumbled across this. Has it been dealt with?

Michael Wiencek added a comment - 2013-06-30 00:38

I'm leaning towards a modified version of the first solution: check if the release group of the top match contains a loaded release, AND check if the recording we've matched to appears on the loaded release.

Michael Wiencek added a comment - 2013-06-30 00:38 I'm leaning towards a modified version of the first solution: check if the release group of the top match contains a loaded release, AND check if the recording we've matched to appears on the loaded release.

Michael Wiencek added a comment - 2013-06-30 00:34

Thanks for the detailed bug report.

Giving already-loaded releases more weight was intended to solve one of the most common complaints about the "Lookup" feature, which was that tracks from the same album would often get scattered among 2-4 different versions from the same release group. Commit ab32813972382822797657b77668cae269811568 solved that very effectively (I can easily reproduce the old, bad behavior by commenting out those lines), but you're right that it goes haywire in this case.

The easiest way to solve this is to remove the lines you suggested, so that the "similarity" weighting isn't skewed. But /afterwards/ check if the top match's release group contains an already-loaded release, and use that instead. This has its own problem: one example would be where the recording is a bonus track that doesn't appear on the already-loaded release.

Another solution is to let things match and become scattered as they previously would, but try to automatically merge albums from the same RG /after/ they're loaded, if the combined tracks/recordings all appear on a certain version and don't otherwise conflict. This could prove difficult to implement cleanly, and I guess the user could become confused that loaded albums are disappearing (assuming they even notice).

We can't make things work perfectly in all cases, but a balance needs to be struck here with accuracy vs. usefulness.

Michael Wiencek added a comment - 2013-06-30 00:34 Thanks for the detailed bug report. Giving already-loaded releases more weight was intended to solve one of the most common complaints about the "Lookup" feature, which was that tracks from the same album would often get scattered among 2-4 different versions from the same release group. Commit ab32813972382822797657b77668cae269811568 solved that very effectively (I can easily reproduce the old, bad behavior by commenting out those lines), but you're right that it goes haywire in this case. The easiest way to solve this is to remove the lines you suggested, so that the "similarity" weighting isn't skewed. But /afterwards/ check if the top match's release group contains an already-loaded release, and use that instead. This has its own problem: one example would be where the recording is a bonus track that doesn't appear on the already-loaded release. Another solution is to let things match and become scattered as they previously would, but try to automatically merge albums from the same RG /after/ they're loaded, if the combined tracks/recordings all appear on a certain version and don't otherwise conflict. This could prove difficult to implement cleanly, and I guess the user could become confused that loaded albums are disappearing (assuming they even notice). We can't make things work perfectly in all cases, but a balance needs to be struck here with accuracy vs. usefulness.

Details

Description

Attachments

Attachments

Issue Links

Activity

Collapse comment: George Hartzell added a comment - 2014-06-09 22:10

Expand comment: George Hartzell added a comment - 2014-06-09 22:10

Collapse comment: Michael Wiencek added a comment - 2013-06-30 00:38

Expand comment: Michael Wiencek added a comment - 2013-06-30 00:38

Collapse comment: Michael Wiencek added a comment - 2013-06-30 00:34

Expand comment: Michael Wiencek added a comment - 2013-06-30 00:34

People

Dates

Packages