Saving SWE-Bench: A Benchmark Mutation Approach for Realistic Agent Evaluation
Spandan Garg, Benjamin Steenhoek, Yufan Huang
ArXiv | October 2025, Vol abs/2510.08996
Benjamin Steenhoek, Michele Tufano, Neel Sundaresan, Alexey Svyatkovskiy
DeepTest (ICSE Workshop) | April 2025
Benjamin Steenhoek, Siva Sivaraman, Renata Saldivar, Yevhen Mohylevskyy, Roshanak Zilouchian Moghaddam, Wei Le
2025 International Conference on Software Engineering | April 2025