Analysis of a Russian Troll Farm Using Anomaly Detection

Aug 17, 2019

Anomaly detection, in data science, is the identification of irregular patterns, outliers, or unusual instances that raise suspicion because they differ from an established baseline (the majority of the data).

I will be doing a pseudo-analysis of the Russian troll tweets as reconstructed by NBC: https://www.kaggle.com/vikasg/russian-troll-tweets

A One-Class Support Vector Machine will be fitted on the text feature of the dataset to create a baseline, and trending political tweets scraped from the Twitter API will serve as the outliers.

The purpose is to identify the characteristics of these tweets: what they share in common (inliers), what they gravitate towards, and how distinguishable they are from real ones (outliers).

STEP ONE

Import all dependencies.

Pandas for loading the DataFrame, NumPy for handling data arrays, regular expressions for data cleaning, and scikit-learn's TF-IDF vectorizer for vectorization.
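The dependencies listed above could be imported roughly as follows. This is a minimal sketch: the `OneClassSVM` import is added on the assumption that scikit-learn's implementation is the one used for the baseline model.

```python
import re                      # regular expressions for text cleaning

import numpy as np             # array handling
import pandas as pd            # loading the dataset into a DataFrame
from sklearn.feature_extraction.text import TfidfVectorizer  # text -> TF-IDF features
from sklearn.svm import OneClassSVM  # assumed: scikit-learn's One-Class SVM
```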

STEP TWO

Data preparation
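A rough sketch of what this preparation might look like: load the tweet text, clean it with regular expressions, and turn it into TF-IDF features. The file name `tweets.csv`, the column name `text`, the cleaning rules, and the vectorizer settings are all assumptions for illustration; a tiny in-memory DataFrame stands in for the real file so the snippet runs on its own.

```python
import re

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, mentions, the RT marker and punctuation."""
    text = str(text).lower()
    text = re.sub(r"https?://\S+", " ", text)   # links
    text = re.sub(r"@\w+", " ", text)           # user mentions
    text = re.sub(r"\brt\b", " ", text)         # retweet marker
    text = re.sub(r"[^a-z0-9\s]", " ", text)    # punctuation, '#', emoji
    return re.sub(r"\s+", " ", text).strip()    # collapse whitespace


# Assumed file and column names; adjust to the downloaded dataset:
# troll_df = pd.read_csv("tweets.csv", usecols=["text"])
troll_df = pd.DataFrame({"text": [          # made-up stand-in rows
    "RT @someone: Check this out https://t.co/abc #Politics",
    None,
    "The election is RIGGED!!!",
]})

troll_df = troll_df.dropna(subset=["text"]).copy()   # drop rows with no text
troll_df["clean"] = troll_df["text"].map(clean_tweet)
troll_df = troll_df[troll_df["clean"] != ""]         # drop tweets that cleaned to nothing

# Settings here are illustrative, not tuned.
vectorizer = TfidfVectorizer(stop_words="english", max_features=5000)
X_troll = vectorizer.fit_transform(troll_df["clean"])  # sparse matrix, one row per tweet
```

Keeping the fitted `vectorizer` around matters: any tweets compared against the baseline later have to be transformed with the same vocabulary.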

STEP THREE

Prepare new data by mining political tweets from the API. Note: we may observe large differences, given that the zeitgeist of each political period differs. However, since we are trying to learn the story this data tells us, we are bound to gain new insights.
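One way the mining step could be sketched is with the `tweepy` library. This is an assumption, not necessarily the client used originally: `fetch_political_tweets`, `collect_texts`, and the search query are hypothetical names and values, the `api` argument is assumed to be an already authenticated `tweepy.API` object, and method names and access rules have changed across tweepy and Twitter API versions. The de-duplication helper is kept separate so it can be tried offline.

```python
from types import SimpleNamespace


def collect_texts(statuses, limit=500):
    """Return up to `limit` unique tweet texts from an iterable of status objects."""
    seen, texts = set(), []
    for status in statuses:
        # Extended-mode statuses expose `full_text`; others only `text`.
        text = getattr(status, "full_text", None) or getattr(status, "text", "")
        if text and text not in seen:
            seen.add(text)
            texts.append(text)
        if len(texts) >= limit:
            break
    return texts


def fetch_political_tweets(api, query="politics -filter:retweets", limit=500):
    """Search recent English tweets; `api` is an authenticated tweepy.API (assumed)."""
    import tweepy  # imported lazily so the helper above works without it

    cursor = tweepy.Cursor(api.search_tweets, q=query, lang="en",
                           tweet_mode="extended")
    return collect_texts(cursor.items(limit), limit)


# Offline stand-in for API results, so the helper can be tried without credentials.
fake = [SimpleNamespace(full_text="Vote today"),
        SimpleNamespace(text="Vote today"),
        SimpleNamespace(text="Budget debate tonight")]
sample_texts = collect_texts(fake)
```

The search query is the main knob here: it decides which "current political trends" the troll tweets end up being compared against.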

STEP FOUR

The aim here is to observe the characteristics of these tweets based on what they share in common. In basic terms, that is what the algorithm does: it learns what the troll tweets have in common and splits the data into two distinct classes, inliers and outliers. We then compare them with real day-to-day tweets.
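A minimal sketch of this step, assuming scikit-learn's `OneClassSVM`: fit on the troll tweets only, then ask the model whether each new tweet looks like the baseline (`+1`, inlier) or not (`-1`, outlier). The two short text lists are made-up placeholders for the cleaned datasets, and the `kernel`, `gamma`, and `nu` values are illustrative assumptions rather than tuned settings.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import OneClassSVM

# Made-up stand-ins for the cleaned troll tweets and the freshly mined tweets.
troll_texts = [
    "election rigged media lies",
    "media lies about election",
    "rigged vote media hiding truth",
    "truth about rigged election",
]
new_texts = [
    "budget debate tonight in parliament",
    "media lies about the election",
]

vectorizer = TfidfVectorizer(stop_words="english")
X_troll = vectorizer.fit_transform(troll_texts)   # vocabulary comes from the baseline only
X_new = vectorizer.transform(new_texts)           # reuse that vocabulary for new tweets

# nu is an upper bound on the fraction of training points treated as outliers.
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1)
model.fit(X_troll)

pred = model.predict(X_new)                # +1 = inlier, -1 = outlier
scores = model.decision_function(X_new)    # larger = closer to the baseline
outlier_share = float(np.mean(pred == -1))
print(f"{outlier_share:.0%} of the new tweets fall outside the troll baseline")
```

A high outlier share would suggest that the new tweets differ strongly from the troll baseline, though with TF-IDF features this can also reflect nothing more than a shift in vocabulary between political periods.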

This code can be tweaked to suit the state of observation, which will differ with current trends, notably:

The data size
Outlier/Inlier size
Twitter search query

However, using this method we can gauge how far these trolls share a common goal, and how much they differ from today’s political trends.

Written by stillbigjosh