Bug
Resolution: Fixed
Normal
Stats Sprint -- 2020-06-02
Getting this error when calculating all-time user stats:
20/05/26 17:20:14 ERROR TaskSchedulerImpl: Lost executor 0 on 10.0.1.100: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
20/05/26 17:20:15 ERROR TaskSchedulerImpl: Lost executor 1 on 10.0.1.101: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
20/05/26 17:20:16 ERROR TaskSchedulerImpl: Lost executor 2 on 10.0.1.99: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
I suspect we could fix this by tuning the Spark cluster, but it will become a problem again eventually anyway.
We should do the following:
- Make all-time release stats an independent command so that we can test it without also having to calculate artist stats.
- Limit the results to roughly 100-200 releases per user. The query will become more complicated, but that seems a reasonable compromise.
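One way to cap each user at a fixed number of releases is a per-user top-N query: count listens per (user, release), rank each user's releases with a `ROW_NUMBER()` window partitioned by user, and keep only rows ranked at or below the limit. The sketch below uses Python's built-in `sqlite3` so it runs standalone, and keeps release stats in their own function, separate from any artist stats. The `listens` table, its columns and `all_time_release_stats` are hypothetical, not the actual stats schema or code. The query shape (CTEs plus a window function) should carry over to Spark SQL, with the `?` placeholder replaced by a literal limit.

```python
import sqlite3

# Hypothetical schema: one row per listen. Real table/column names may differ.
SCHEMA = """
CREATE TABLE listens (
    user_name    TEXT NOT NULL,
    release_name TEXT NOT NULL
);
"""

# Count listens per (user, release), number each user's releases from most to
# least listened (ties broken by release name), then keep the first N per user.
ALL_TIME_RELEASES_SQL = """
WITH release_counts AS (
    SELECT user_name, release_name, COUNT(*) AS listen_count
      FROM listens
  GROUP BY user_name, release_name
),
ranked AS (
    SELECT user_name, release_name, listen_count,
           ROW_NUMBER() OVER (
               PARTITION BY user_name
               ORDER BY listen_count DESC, release_name
           ) AS row_num
      FROM release_counts
)
SELECT user_name, release_name, listen_count
  FROM ranked
 WHERE row_num <= ?
 ORDER BY user_name, row_num
"""


def all_time_release_stats(conn, limit=200):
    """Return {user_name: [(release_name, listen_count), ...]}, at most `limit` per user.

    Hypothetical helper: computes release stats only, independently of artist stats.
    """
    stats = {}
    for user_name, release_name, listen_count in conn.execute(ALL_TIME_RELEASES_SQL, (limit,)):
        stats.setdefault(user_name, []).append((release_name, listen_count))
    return stats


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO listens VALUES (?, ?)",
        [("alice", "A"), ("alice", "A"), ("alice", "B"),
         ("alice", "C"), ("alice", "C"), ("alice", "C"), ("bob", "X")],
    )
    print(all_time_release_stats(conn, limit=2))
    # prints {'alice': [('C', 3), ('A', 2)], 'bob': [('X', 1)]}
```

Because the cap is applied per user inside the query, the result grows with the number of users rather than with each user's full listening history, which is the compromise proposed above.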