What is Netograph?

Netograph monitors links passing through the social web (Reddit, Hacker News, Delicious, Pinboard and Digg) in real time, and generates privacy snapshots showing what resources are requested and from whom. It also shows you what persistent state is set - whether that is cookies, Flash local storage objects, or HTML5 local storage. All told, this gives you a quick and accurate view of what a site actually does, before you visit it.

The visualization that Netograph provides right now is just the first step - there are a huge number of interesting directions to explore from here. Subscribe to my blog or follow me on Twitter to track updates as they happen.

The graphs

A couple of things to note:

  • The graph is interactive - drag things around and click on nodes for more information.
  • Domains are grouped by registered domain (the name directly under the top-level domain). So "www.google.com", "google.com" and "mail.google.com" will all be grouped in a node called just "google.com".
  • Persistent state can be HTTP cookies, Flash local storage objects, or HTML5 local storage.
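
To make the domain grouping concrete, here is a minimal sketch in Python. It is not Netograph's code: the function names are made up for illustration, and it assumes the naive rule of keeping only the last two labels of a hostname. That rule is wrong for suffixes like "co.uk", where a real implementation would need to consult the Public Suffix List.

```python
from collections import defaultdict


def registered_domain(host: str) -> str:
    """Naive assumption: the registered domain is the last two labels."""
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def group_hosts(hosts):
    """Map each registered domain to the hostnames grouped under it."""
    groups = defaultdict(list)
    for host in hosts:
        groups[registered_domain(host)].append(host)
    return dict(groups)


hosts = ["www.google.com", "google.com", "mail.google.com", "static.example.org"]
groups = group_hosts(hosts)
for domain, members in groups.items():
    print(domain, members)
```

Each key in the result corresponds to one node in the graph, and the list under it holds the hostnames that were folded into that node.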

You will need a modern browser to view the Netograph visualizations - IE9, or a current version of Firefox, Chrome or Safari.

How it works

Here's how Netograph works. First, a browser is used to render a URL. All of the browser's traffic is routed through a specialized version of mitmproxy and captured for later analysis. After the page has rendered, the browser is shut down, and standard browser forensic techniques are used to capture all persistent state it left behind. The result is a very detailed snapshot of a URL's activity.
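
As a rough illustration of the capture step, here is a small sketch written against the addon interface of current mitmproxy releases. It is an assumption about how such a recorder could look, not the specialized version Netograph runs: the class name, the record fields and the file name capture.py are all hypothetical.

```python
# capture.py - a hypothetical addon; load it with: mitmdump -s capture.py
from collections import Counter


class RequestRecorder:
    """Keeps one record per HTTP response that passes through the proxy."""

    def __init__(self):
        self.records = []

    def response(self, flow):
        # mitmproxy calls this hook once a complete response has been read.
        self.records.append({
            "host": flow.request.pretty_host,
            "url": flow.request.pretty_url,
            "method": flow.request.method,
            "status": flow.response.status_code,
        })

    def hosts(self):
        """Count how many responses came back from each host."""
        return Counter(record["host"] for record in self.records)


# mitmproxy looks for a module-level list named "addons".
addons = [RequestRecorder()]
```

With the browser pointed at the proxy, the recorder ends up holding the list of requests a page triggered, which is the raw material for a snapshot. Capturing persistent state is a separate step that happens after the browser exits, and is not shown here.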

Netograph does this "at scale" in real time - this means running many instances of this process in parallel on headless servers, decoupling the stages with queues, and storing the results in a database.
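
The queue-and-workers pattern can be sketched in a few lines of Python. This is a single-machine toy under stated assumptions, not Netograph's architecture: snapshot() is a hypothetical stand-in for the whole render-and-capture process, threads stand in for separate servers, and an in-memory queue stands in for both the job queue and the database.

```python
import queue
import threading


def snapshot(url: str) -> dict:
    """Hypothetical placeholder for rendering a URL and capturing its activity."""
    return {"url": url, "requests": []}


def worker(jobs: queue.Queue, results: queue.Queue) -> None:
    """Pull URLs off the job queue until a None sentinel arrives."""
    while True:
        url = jobs.get()
        if url is None:
            break
        results.put(snapshot(url))


def run(urls, n_workers: int = 4) -> list:
    jobs, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(n_workers)
    ]
    for thread in threads:
        thread.start()
    for url in urls:
        jobs.put(url)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker, queued after the real work
    for thread in threads:
        thread.join()
    # All workers have exited, so the results queue can be drained safely.
    collected = []
    while not results.empty():
        collected.append(results.get())
    return collected


if __name__ == "__main__":
    for result in run(["http://a.example/", "http://b.example/"], n_workers=2):
        print(result["url"])
```

The point of the queue is that the code feeding in URLs never needs to know how many workers exist or how long a snapshot takes, so capacity can be added by starting more workers.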