On Aug. 5, the Federal Communications Commission announced the bulk release of the comments from its largest-ever public comment collection. We’ve spent the last three weeks cleaning and preparing the data and applying our experience in machine learning and natural language processing to try to make sense of the hundreds of thousands of comments in the docket. Here is a high-level overview, along with our cleaned version of the full corpus, which is available for download in the hope of making further research easier.
Our first exploration uses natural language processing techniques to identify topical keywords within comments and use those keywords to group comments together. We analyzed a corpus of 800,959 comments. Some key findings:
- We estimate that less than 1 percent of comments were clearly opposed to net neutrality.
- At least 60 percent of comments submitted were form letters written by organized campaigns (484,692 comments); while these make up the majority of comments, this is actually a lower percentage than is common for high-volume regulatory dockets.
- At least 200 comments came from law firms, on behalf of themselves or their clients.
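
To make the keyword-grouping idea described above more concrete, here is a minimal sketch of one way it could work. This is not the pipeline used for the analysis: the TF-IDF-style scoring, the tiny stopword list, the function names (`tokenize`, `top_keywords`, `group_by_keywords`) and the sample comments are all assumptions made for illustration. Each comment is scored for its most distinctive words, and comments that share the same top keywords land in the same group.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption, not taken from the analysis).
STOPWORDS = {"the", "a", "an", "to", "of", "and", "is", "i", "for", "in",
             "should", "be", "not", "we", "it", "that", "on", "as"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def top_keywords(comments, k=2):
    """Return, for each comment, a sorted tuple of its k highest-scoring words."""
    docs = [Counter(tokenize(c)) for c in comments]
    n = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(doc.keys())  # count each word once per comment

    keys = []
    for doc in docs:
        # Smoothed TF-IDF: words that appear in fewer comments score higher.
        scores = {w: tf * math.log((1 + n) / (1 + doc_freq[w]))
                  for w, tf in doc.items()}
        # Break ties alphabetically so the result is deterministic.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keys.append(tuple(sorted(ranked[:k])))
    return keys


def group_by_keywords(comments, k=2):
    """Map each keyword tuple to the indices of the comments that share it."""
    groups = defaultdict(list)
    for idx, key in enumerate(top_keywords(comments, k)):
        groups[key].append(idx)
    return dict(groups)


# Hypothetical sample comments; the first two mimic a form letter.
comments = [
    "Please protect net neutrality and reclassify broadband under Title II.",
    "Please protect net neutrality and reclassify broadband under Title II.",
    "Fast lanes would hurt small startups.",
]
groups = group_by_keywords(comments)
```

In this toy example, the two identical comments end up in one group and the third in another. Exact keyword matching like this would only catch near-verbatim form letters; grouping lightly edited copies would need a fuzzier similarity measure.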
Below is an interactive visualization that lets you explore these groupings and view individual comments within the groups.
Read the full article @ The Sunlight Foundation.