Show laarc: Sentence Classifications with Neural Networks (github.com)
6 points by lettergram to news show 566 days ago | 5 comments


3 points by shawn 566 days ago

Lettergram just showed me https://hnprofile.com/compare which is a ridiculously impressive example of sentence classification in action. I urged him to show off some of it here.

Personally, I've wanted to know: How long have you been working with AI to get to this point? The site's a bit magic to me.

I spent some time studying a few fundamentals, like Mining of Massive Datasets, Introduction to Data Mining, and the classic "Consumer Credit Risk Models via Machine Learning" https://dspace.mit.edu/openaccess-disseminate/1721.1/66301

(That "classic" comment was tongue in cheek. I have no idea what I'm doing in this space and mostly do a random walk through stuff that seems like it might be useful.)

Mining of Massive Datasets: http://infolab.stanford.edu/~ullman/mmds/book.pdf

Intro to data mining: https://www-users.cs.umn.edu/~kumar001/dmbook/sol.pdf

That last one was from 2006, but people have said the new edition is good, so maybe it's still somewhat relevant.

Mostly was just curious if you knew where to start, and about how you started.

-----


> How long have you been working with AI to get to this point?

I've been working on neural networks since 2014, so five years or so. However, at the time I was building them from scratch in OpenCL and CUDA. Now, it would take significantly less time to get up to speed.

I highly recommend Coursera's machine learning course.

-----


Random thought: I wonder if sentence classification could be a low-cost tool for moderators in a system that tries to spot potential arguments. Start with a baseline of what's on most threads. Then a flurry of exclamatory and interrogative sentences might indicate one form of argument.

I'm sure there would be a high false-positive rate. One could always use more reliable, costlier techniques on whatever these quick and dirty methods draw attention to. That gives a few stages in the process before a human sees it. Votes and flags would obviously still be there. This is just an extra, preemptive technique.
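
To make the idea concrete, here is a minimal sketch of that cheap first stage. Everything in it is an assumption for illustration: `sentence_type` is a punctuation-only stand-in for a real trained classifier, the sentence splitter is naive, and the `factor` threshold is arbitrary.

```python
import re

def sentence_type(sentence):
    # Hypothetical stand-in for a trained sentence classifier:
    # it looks only at the final punctuation mark.
    s = sentence.strip()
    if s.endswith("?"):
        return "interrogative"
    if s.endswith("!"):
        return "exclamatory"
    return "declarative"

def split_sentences(text):
    # Naive splitter: break on whitespace that follows ., ! or ?
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def heated_ratio(comments):
    """Fraction of sentences in a thread that are exclamatory or interrogative."""
    types = [sentence_type(s) for c in comments for s in split_sentences(c)]
    if not types:
        return 0.0
    heated = sum(t != "declarative" for t in types)
    return heated / len(types)

def needs_review(comments, baseline_ratio, factor=2.0):
    # Flag a thread whose ratio exceeds the site-wide baseline by `factor`.
    # Both the baseline and the factor are assumed to be tuned by hand.
    return heated_ratio(comments) > baseline_ratio * factor
```

Threads that trip `needs_review` would go on to the costlier checks, not straight to a moderator.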

-----


The main issue is actually data. I've hand labeled much of my training set for my website: https://hnprofile.com/

For a problem such as argument identification (which I think is totally possible), it would likely take a hundred thousand or more labeled pieces of data, meaning it would likely cost thousands of dollars.

Now... if you have people who flag content provide a reason, you have the labels generated for you ;)
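
A rough sketch of how flag reasons could become training labels, assuming flags arrive as `(comment_id, reason)` pairs. That record shape, the function name, and the `min_agreement` cutoff are all hypothetical, not something any particular forum provides.

```python
from collections import Counter

def labels_from_flags(flags, min_agreement=2):
    """Turn (comment_id, reason) flag records into one label per comment.

    A comment is kept only when at least `min_agreement` flaggers gave
    the same reason. Returns {comment_id: reason}.
    """
    by_comment = {}
    for comment_id, reason in flags:
        by_comment.setdefault(comment_id, Counter())[reason] += 1

    labels = {}
    for comment_id, counts in by_comment.items():
        reason, votes = counts.most_common(1)[0]
        if votes >= min_agreement:
            labels[comment_id] = reason
    return labels
```

Requiring agreement between flaggers is one cheap way to filter noisy or malicious flags before they reach the training set.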

-----


Good point. Good idea. You might even have people guiding the process, with many folks in decent communities doing it over time. I imagine almost all decent ones, regardless of group norms, will be against a specific subset of behavior such as trolling, obvious words denoting hate, and specific patterns of heated, pointless argument. So even a lowest common denominator of downvotes and flags across many forums might be useful. Then forum-specific methods take over from there.

-----



