The Network Structure of Baseball Blogs: Part 1
Earlier in the week I read about the network structure of Twitter employees' accounts, and that got me thinking about the network structure of baseball blogs. Network theory (or graph theory) looks at the structure of objects joined by pairwise connections. It has been used to study the structure of the internet, email networks, the phone and power grids, epidemiological networks, food webs and tons of other things. In this case you can think of baseball blogs as vertices and connect them with edges if they link to one another, then graph out all the connected blogs and see whether there is any structure.
I used the data from BallHype to generate the web. I looked at their top 200 baseball blogs, then went back to each blog's last 100 posts and saw which of the other 200 blogs linked to that post. These are links from posts to posts, not general links from one blog to another. Here are all the blogs with at least one connection to the main component, with an edge drawn whenever one blog links another.
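To make the data step concrete, here is a minimal sketch of how post-to-post links could be rolled up into blog-to-blog link counts. The record format, the blog names and the `count_links` helper are all assumptions for illustration, not the BallHype data or the code actually used here.

```python
from collections import Counter

# Hypothetical records: one (linking_blog, linked_blog) pair per post-to-post link.
post_links = [
    ("Blog A", "Blog B"),
    ("Blog A", "Blog B"),
    ("Blog B", "Blog C"),
    ("Blog A", "Blog A"),  # a blog linking its own post
    ("Blog D", "Blog B"),  # Blog D is not in the tracked set
]

# Stand-in for the top 200 list.
tracked = {"Blog A", "Blog B", "Blog C"}


def count_links(records, tracked_blogs):
    """Count directed links between distinct blogs in the tracked set."""
    counts = Counter()
    for source, target in records:
        if source == target:
            continue  # self-links say nothing about the network
        if source in tracked_blogs and target in tracked_blogs:
            counts[(source, target)] += 1
    return counts


link_counts = count_links(post_links, tracked)
for (source, target), n in sorted(link_counts.items()):
    print(f"{source} -> {target}: {n}")
```

Keeping the counts directed preserves the difference between a blog that links out a lot and a blog that gets linked a lot.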
The algorithm tries to place each vertex close to the blogs that linked it and the blogs it linked. So you can sort of see clusters of blogs which should be similar (linked to and from similar blogs). Here I have labeled the top 15 blogs (a cutoff that conveniently includes Baseball Analysts, or BA).
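The drawing algorithm is not named above. One common family that behaves this way is force-directed layout, where linked vertices pull together and all vertices push apart. The following is a rough sketch in the Fruchterman-Reingold style, offered as an assumption about the general idea rather than the routine that produced the picture; the function name, parameters and toy graph are all made up.

```python
import math
import random


def spring_layout(nodes, edges, iterations=200, seed=0):
    """Toy force-directed layout: edges attract, every pair of nodes repels."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    k = 1.0 / math.sqrt(len(nodes))  # rough ideal distance between neighbors
    for step in range(iterations):
        temperature = 0.1 * (1 - step / iterations)  # cap on movement, cools over time
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                force = k * k / dist
                disp[a][0] += dx / dist * force
                disp[a][1] += dy / dist * force
                disp[b][0] -= dx / dist * force
                disp[b][1] -= dy / dist * force
        # Attraction along each edge.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            force = dist * dist / k
            disp[a][0] -= dx / dist * force
            disp[a][1] -= dy / dist * force
            disp[b][0] += dx / dist * force
            disp[b][1] += dy / dist * force
        # Move each node along its net force, limited by the temperature.
        for n in nodes:
            length = max(math.hypot(disp[n][0], disp[n][1]), 1e-9)
            move = min(length, temperature)
            pos[n][0] += disp[n][0] / length * move
            pos[n][1] += disp[n][1] / length * move
    return pos


# Two made-up triangles of blogs joined by a single link.
nodes = ["A", "B", "C", "X", "Y", "Z"]
edges = [("A", "B"), ("B", "C"), ("A", "C"),
         ("X", "Y"), ("Y", "Z"), ("X", "Z"),
         ("C", "X")]
positions = spring_layout(nodes, edges)
for name in nodes:
    print(name, [round(c, 2) for c in positions[name]])
```

Layouts like this depend on the random starting positions, so the exact picture changes from run to run unless the seed is fixed.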
Next I wanted to see how strongly blogs following the same team clustered together in the network. I should say that the vertices are not all of the blogs: because of the cutoff I am only showing blogs which connect to this strongly connected component (remember, my definition of an edge is three or more links). The Red Sox, Cubs, Cardinals and Angels all have lots of blogs in the top 200, but most of these fell away, presumably because they either did not link out enough or did not have enough links in (I am not saying anything about the quality of these blogs based on that). Some other teams with a lot fewer blogs had more stay in the network.
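Here is a small sketch of that filtering step: keep an edge only when it has three or more links, then keep the largest group of blogs still joined together. Treating edges as undirected when finding the component is my simplifying assumption for the example, and the link counts and the `main_component` helper are made up.

```python
from collections import defaultdict, deque

MIN_LINKS = 3  # an edge needs three or more links

# Hypothetical directed link counts between blogs.
link_counts = {
    ("A", "B"): 5,
    ("B", "C"): 3,
    ("C", "A"): 1,  # below the cutoff, dropped
    ("D", "E"): 4,  # survives, but in a smaller separate component
    ("F", "A"): 2,  # below the cutoff, so F falls away
}


def main_component(counts, min_links=MIN_LINKS):
    """Return the largest connected set of blogs after applying the cutoff."""
    neighbors = defaultdict(set)
    for (source, target), n in counts.items():
        if n >= min_links:
            neighbors[source].add(target)
            neighbors[target].add(source)

    seen = set()
    best = set()
    for start in neighbors:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from start.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for other in neighbors[node]:
                if other not in component:
                    component.add(other)
                    queue.append(other)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


component = main_component(link_counts)
print(sorted(component))
```

A blog can have plenty of links and still fall away if no single pairing reaches the cutoff, which is the situation described for the teams with many blogs.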
Then you have some surprising teams. Who knew there were so many Nats blogs? You can see this is largely driven by one, Federal Baseball, which regularly links a number of other Nats blogs. On the other hand the Pirates section is driven by one blog, PBC blog, which receives links in from a number of other blogs. There is an interesting blog in there, Call to the Pen, which links to Padres, Mariners and Pirates blogs, as well as many others.
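The difference between a blog that links out to many others and a blog that many others link in to is the difference between out-degree and in-degree. A minimal sketch, using made-up blog names rather than the real edges:

```python
from collections import Counter

# Hypothetical directed edges: (linking_blog, linked_blog).
edges = [
    ("Hub Blog", "Blog 1"),
    ("Hub Blog", "Blog 2"),
    ("Hub Blog", "Blog 3"),
    ("Blog 1", "Authority Blog"),
    ("Blog 2", "Authority Blog"),
    ("Blog 3", "Authority Blog"),
]


def degrees(edge_list):
    """Return (in_degree, out_degree) counters for a directed edge list."""
    in_degree = Counter()
    out_degree = Counter()
    for source, target in edge_list:
        out_degree[source] += 1
        in_degree[target] += 1
    return in_degree, out_degree


in_degree, out_degree = degrees(edges)
top_linker = out_degree.most_common(1)[0][0]  # links out the most
top_linked = in_degree.most_common(1)[0][0]   # receives the most links
print("links out most:", top_linker)
print("linked to most:", top_linked)
```

In this toy data the two roles fall to different blogs, much like the Nats and Pirates sections are each driven by one blog but in opposite directions.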
I am not trying to make a value statement that having blogs in this network is better than not (e.g., I am not saying that the Nationals blog community is any better or worse than the Red Sox blog community). I am just showing the network based on my arbitrary way of defining a connection.
This is a first pass at the data and next week I will dig a little deeper into the network structure. How connected is the network? What is the average distance between two random blogs? Do any teams cluster out together?
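As a preview of the kind of calculation involved, the average distance between two blogs can be computed by running a breadth-first search from every vertex and averaging the hop counts. This sketch assumes an undirected, connected toy graph; the adjacency data and function names are illustrative only.

```python
from collections import deque


def bfs_distances(neighbors, start):
    """Hop counts from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for other in neighbors[node]:
            if other not in dist:
                dist[other] = dist[node] + 1
                queue.append(other)
    return dist


def average_distance(neighbors):
    """Mean shortest-path length over all reachable pairs of distinct nodes."""
    total = 0
    pairs = 0
    for start in neighbors:
        for other, d in bfs_distances(neighbors, start).items():
            if other != start:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


# A made-up chain of four blogs: A - B - C - D.
chain = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
}
avg = average_distance(chain)
print(round(avg, 3))
```

Pairs with no path between them are simply left out of the average here, which is one of several possible conventions.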