I agree completely. All I can say is that I think it's even worse than you make out!
I'm impressed you thought of negatives I didn't, because I'm a real nitpicker, but you missed a big one: code quality.
Sturgeon's Law says that 90% of everything is crap, and it applies with full force to GitHub code.
What happens when you train a machine learning system on a huge corpus, 90% of which is crap? The model learns to reproduce the majority patterns, so the bad code drives out the good.
It isn't just bugs, which you mention - it's antipatterns, slow algorithms, bad variable names, and other quality defects.
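To make that concrete, here's a contrived Python sketch (the function and all the names are invented for illustration): the first version is the sort of thing a scraped corpus is full of, the second is what you'd actually hope a model learns.

    # Contrived example - the kind of code that dominates a scraped corpus:
    # cryptic names and a quadratic algorithm hiding in plain sight.
    def f(l):
        r = []
        for x in l:
            if x not in r:  # list membership is O(n), so the loop is O(n^2)
                r.append(x)
        return r

    # The same logic with descriptive names and O(n) overall via a set.
    def dedupe_preserving_order(items):
        seen = set()
        result = []
        for item in items:
            if item not in seen:  # set membership is O(1) on average
                seen.add(item)
                result.append(item)
        return result

Both versions pass the same tests, so nothing in the training signal tells the model the first one is worse - and the first one is far more common in the wild.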
I have to say that machine learning has been steadily losing its lustre for me after an initial thrill. There are just too many projects like this - "Let's throw a lot of shit and a few pieces of chocolate into this ML vat, and we can make a cake out of it!"