The main issue is actually data. I've hand labeled much of my training set for my website:

For a problem such as argument identification (which I think is totally possible), it would likely take a hundred thousand or more labeled examples, which means the labeling alone would likely cost thousands of dollars.

Now... if you have the people who flag content give a reason, you get the labels generated for you ;)
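A rough sketch of what that could look like, assuming each flag is stored as a (post id, text, reason) record. The record layout, the reason names, and the `min_votes` cutoff are all made up for illustration; a real site would pull these from its own flag table.

```python
from collections import Counter

# Hypothetical flag records: (post_id, post_text, reason picked by the flagger)
flags = [
    (1, "you are all idiots", "trolling"),
    (1, "you are all idiots", "trolling"),
    (1, "you are all idiots", "off-topic"),
    (2, "buy cheap watches here", "spam"),
]

def labels_from_flags(flags, min_votes=2):
    """Return {post_id: (text, label)} using the most common flag reason.

    Posts whose top reason has fewer than min_votes flags are skipped,
    so a single careless flag does not become a training label.
    """
    texts = {}
    votes = {}
    for post_id, text, reason in flags:
        texts[post_id] = text
        votes.setdefault(post_id, Counter())[reason] += 1

    labeled = {}
    for post_id, counter in votes.items():
        reason, count = counter.most_common(1)[0]
        if count >= min_votes:
            labeled[post_id] = (texts[post_id], reason)
    return labeled

training_examples = labels_from_flags(flags)
```

Requiring a couple of agreeing flags per post is just one way to cut down on noisy labels; the threshold would need tuning against how much flagging a given community actually does.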

Good point. Good idea. You might even have people guiding the process, with many folks in decent communities doing it over time. I imagine almost all decent ones, regardless of group norms, will be against a specific subset of behavior such as trolling, obvious words denoting hate, and specific patterns of heated, pointless argument. So even a lowest common denominator across many forums' downvotes and flags might be useful. Then forum-specific methods take over from there.
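To make the lowest-common-denominator idea concrete, here is a minimal sketch that assumes each forum exposes the set of flag reasons its members use. The forum names and reasons are invented; the point is only that the shared set is an intersection and whatever is left over stays forum-specific.

```python
# Hypothetical flag reasons in use on each forum
forum_flag_reasons = {
    "forum_a": {"trolling", "hate speech", "flamewar", "off-topic"},
    "forum_b": {"trolling", "hate speech", "flamewar", "self-promotion"},
    "forum_c": {"trolling", "hate speech", "flamewar", "low effort"},
}

def split_reasons(reasons_by_forum):
    """Split reasons into those every forum uses and those unique to each.

    Assumes at least one forum is given.
    """
    shared = set.intersection(*reasons_by_forum.values())
    specific = {
        forum: reasons - shared
        for forum, reasons in reasons_by_forum.items()
    }
    return shared, specific

shared, specific = split_reasons(forum_flag_reasons)
```

Under this split, a general model would train on posts flagged with a `shared` reason from any forum, and each forum's own model would handle its `specific` reasons.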


