Evaluating Chain-of-Thought Monitorability With Debugged Pro