Presenting a method to estimate the amount of active Subscribers on any YouTube Channel, based on an in-depth analysis of 500,000 Subscribers from 6 Channels from the Most Subscribed List.

I. Introduction

YouTube Dynamics

“My last 5 videos have failed to reach my number of subscribers in views... that's a bad sign. It's time for a change!” - smpfilms

Back in 2006, after the first channels had collected a few thousand subscribers, it quickly became a rule of thumb that the number of views usually matches or exceeds the number of subscribers. Today this rule still seems to apply, at least to channels that are on the most subscribed list and have a decent subscriber growth rate, like smpfilms, which has a subscriber base of over 250,000 and a growth rate of 15,000 new subscribers per month. On channels with a decreasing growth rate, however, the number of views tends to decline steadily. Among the first to experience this was Boh3m3.

“I'm sick to death of seeing a subscriber count that has nothing to do with the view count!” - Boh3m3

With around 4,000 subscribers, Boh3m3 was one of the early YouTubers on the most subscribed list in summer 2006. His subscriber base grew to 40,000 throughout 2007. At that time this was large enough to almost guarantee a spot on the most viewed list, regularly resulting in many more views than subscribers. By the end of 2007 his subscriber growth began to decline and no longer matched the growth rate of many other top 100 channels, and he eventually fell off the most subscribed list in 2008. While a subscriber base of over 40,000 is still considered large today, it no longer guarantees a spot on the most viewed list. As a result, most views now come from subscribers alone, and those numbers hardly reach 30% of the subscriber base on Boh3m3's channel.

This is no isolated case. These days there are plenty of channels with large but old subscriber bases, most of them with views reaching far less than 50% of their subscribers. Some of the "oldest" channels have views as low as 15% of their subscriber bases.

These numbers certainly leave room for interpretation about what happened with the subscribers who aren't watching anymore.

The effects of an aging Subscriber Base

“The most interesting thing about some of my subscribers and other ‘Top’ people is that most of them don’t even exist.” - thatgirlonline

Apart from subscribers who are just not watching anymore, there are two more possibilities. By visiting a few subscriber profiles, we notice that some subscribers haven't signed in for quite a while, which makes them "inactive". Then there are also subscribers with closed or suspended accounts who are still listed.

So in general, a subscriber base can be split into 4 categories:

- Active Accounts (watching)
- Active Accounts (not watching)
- Inactive Accounts
- Closed and Suspended Accounts

As a subscriber base gets older, there will be more closed and inactive accounts. While it's clear what constitutes a closed account, the period of time that has to pass before a subscriber can be labeled inactive has to be defined. One way to do this is to look at the actual distribution of the last sign-in times of all subscribers and define the threshold in relation to those numbers.
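To make the idea of a sign-in distribution concrete, here is a minimal sketch that counts subscribers per timeframe since their last sign-in. The bucket edges and labels are assumptions chosen for illustration, not the exact timeframes used in the analysis, and `signin_histogram` is a hypothetical helper name.

```python
from bisect import bisect_left
from collections import Counter
from datetime import date

# Hypothetical bucket edges in days; the exact timeframes are an assumption.
EDGES = [1, 7, 30, 90, 365]
LABELS = ["<= 1 day", "<= 1 week", "<= 1 month",
          "<= 3 months", "<= 1 year", "> 1 year"]

def signin_histogram(last_signins, today):
    """Count subscribers per timeframe since their last sign-in."""
    counts = Counter()
    for signin in last_signins:
        days = (today - signin).days
        # bisect_left finds the first edge that is >= days
        counts[LABELS[bisect_left(EDGES, days)]] += 1
    return [(label, counts[label]) for label in LABELS]
```

Plotting these counts for each channel is one way to see where a natural cut-off between "active" and "inactive" might lie.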

II. Analysis

Data Acquisition

Most studies of the YouTube community have used crawlers that either sample randomly across the whole site or employ graph search algorithms, depending on their specific research objectives. To measure a subscriber base, however, it is necessary to collect large numbers of subscribers from a handful of channels. This doesn't require any fancy search or randomization algorithms, but it relies on accurate and complete subscriber listings.

These subscriber listings were collected from the subscriber tab on each selected channel, where all subscribers are listed in ascending order of their registration date. I then verified the data via the Google Data API for each item. A bug in the subscriber lists resulted in a sample size of 85%, i.e. 15% of the listed subscribers are double listings. However, from a statistical point of view this sample size is more than enough, and luckily these double listings are spread evenly across the whole subscriber base.
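Removing the double listings can be sketched as follows. This is only an illustration: it assumes each listing entry is a plain username string, and `deduplicate` is a hypothetical helper, not the script used for the collection.

```python
def deduplicate(listing):
    """Drop repeated usernames while keeping the original listing order.

    Returns the unique usernames and the share of entries that were duplicates.
    """
    unique = list(dict.fromkeys(listing))  # dicts preserve insertion order
    if not listing:
        return unique, 0.0
    duplicate_share = 1 - len(unique) / len(listing)
    return unique, duplicate_share
```

For a listing in which 15% of the entries are repeats, `duplicate_share` would come out at 0.15.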

Another bug that was introduced with a site update in January 2009 eventually prevented further collection of subscriber data completely. Since January the listings for channel comments, friends and subscribers are all cut off at 1,000 items. Currently complete subscriber listings are only accessible to the channel owner in the account tab.

In total I collected the data of over 1,000,000 subscribers from over 15 channels in December '08 and January '09. By analyzing this data and revisiting all subscribers at regular intervals over the next year, I hope to be able to give a detailed picture of how a subscriber base develops over time.

The Median Subscriber Age

After looking at the collected data of all the different subscriber bases, a few general characteristics stood out. Unsurprisingly, the data shows that every subscriber base of a reasonable size follows the law of large numbers, i.e. the collective of subscribers doesn't behave erratically. Since a subscriber base is a congregation of users, a reasonably large subscriber base already represents the behavior of users in general. It follows that the subscriber base of another channel will have rather similar characteristics and differ only in certain ways. I discovered that these differences between subscriber bases correlate strongly with the median age of a subscriber base.

So what exactly is the median subscriber age?

The median subscriber age tells us that 50% of all users within a subscriber base subscribed before a certain point in time. It can be determined by measuring the time that has elapsed since the channel reached 50% of its current subscriber count.
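As a rough sketch of this calculation, assume a history of (date, subscriber count) points is available for the channel. The function name, the input format and the average month length of 30.44 days are assumptions made for illustration, and the sketch further assumes that the count only grows over time.

```python
from datetime import date

DAYS_PER_MONTH = 30.44  # average month length (assumption)

def median_subscriber_age_months(history, today):
    """Months since the channel first reached half of its current count.

    history: list of (date, subscriber_count), sorted by date,
             ending with the current subscriber count.
    """
    if not history:
        raise ValueError("history must not be empty")
    half = history[-1][1] / 2
    for day, count in history:
        if count >= half:
            return (today - day).days / DAYS_PER_MONTH
    raise ValueError("history never reaches half of the current count")
```

The resolution of the result is limited by how finely the history is sampled: with monthly data points, the estimate can be off by up to a month.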

Since it is so easy to figure out the median subscriber age without collecting any subscriber data, all we need is a model of how the number of active subscribers develops in relation to the median age of a subscriber base.

Method

The objective is to find out how the distribution of active, inactive and closed accounts differs for subscriber bases with different median ages. For this I'm using the data of 6 subscriber bases.

The following characteristics will be compared:

- Closed accounts
- Shared subscribers
- Year of account registration
- Date of last sign-in

The Channels

- Each pair of channels has subscriber bases of similar age: this allows the results for each age group to be verified.
- Subscriber bases of different sizes: this will show whether the results are independent of the size of a subscriber base.
- All channels produce 'user generated content' specifically for YouTube: this provides a homogeneous group of subscribers with similar interests.

| Channel | A1 | A2 | B1 | B2 | C1 | C2 |
| --- | --- | --- | --- | --- | --- | --- |
| Subscribers | 77,187 | 67,584 | 95,976 | 148,776 | 63,264 | 117,551 |
| Median Age (months) | 21.7 | 19.3 | 14.6 | 13.8 | 9.2 | 8.5 |

table 1: median subscriber age

1. Closed Accounts

2. Shared Subscribers

| Channel | A1 | A2 | B1 | B2 | C1 | C2 |
| --- | --- | --- | --- | --- | --- | --- |
| A1 | x | 9,527 | 4,414 | 10,059 | 4,439 | 5,352 |
| A2 | 9,527 | x | 5,982 | 13,008 | 8,110 | 12,697 |
| B1 | 4,414 | 5,982 | x | 14,810 | 4,583 | 5,422 |
| B2 | 10,059 | 13,008 | 14,810 | x | 8,620 | 12,598 |
| C1 | 4,439 | 8,110 | 4,583 | 8,620 | x | 10,567 |
| C2 | 5,352 | 12,697 | 5,422 | 12,598 | 10,567 | x |

table 2: shared subscribers

Another way of showing the relation between these channels is to compare every subscriber base with the combined subscribers of the remaining 5 channels, as shown in table 3.

| Channel | A1 | A2 | B1 | B2 | C1 | C2 |
| --- | --- | --- | --- | --- | --- | --- |
| Registered subscribers | 69,276 | 61,484 | 87,890 | 136,232 | 58,862 | 111,787 |
| Shared subscribers with other 5 channels | 21,689 (31.3%) | 32,066 (52.1%) | 24,962 (28.4%) | 43,273 (31.8%) | 23,253 (39.5%) | 32,118 (28.7%) |

table 3: shared subscribers

This leads to another question: how many unique users are in this pool of over 500,000 counted subscribers? The cumulative subscriber count (of registered subscribers) of all 6 channels is 525,531. These subscriptions are coming from 419,374 unique user accounts (79.8%).
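The shared and unique counts above boil down to set operations. Here is a minimal sketch, assuming each subscriber base is held as a set of account identifiers in a dictionary keyed by channel; the function names and the data layout are hypothetical.

```python
def shared_pairwise(bases, first, second):
    """Number of accounts subscribed to both channels."""
    return len(bases[first] & bases[second])

def shared_with_others(bases):
    """Per channel: accounts also subscribed to at least one other channel."""
    result = {}
    for name, subscribers in bases.items():
        others = set().union(
            *(s for other, s in bases.items() if other != name))
        result[name] = len(subscribers & others)
    return result

def unique_users(bases):
    """Number of distinct accounts across all channels."""
    return len(set().union(*bases.values()))
```

Dividing `unique_users(bases)` by the sum of all base sizes gives the share of unique accounts, which is the 419,374 / 525,531 ≈ 79.8% figure quoted above.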

3. Year of Account Registration

| Channel | A1 | A2 | B1 | B2 | C1 | C2 |
| --- | --- | --- | --- | --- | --- | --- |
| Median Age (months) | 21.7 | 19.3 | 14.6 | 13.8 | 9.2 | 8.5 |
| Median User Age (months) | 26 | 25 | 21 | 22 | 20 | 20.5 |

table 4: median user age vs. median subscriber age

| Channel | A1 | A2 | B1 | B2 | C1 | C2 |
| --- | --- | --- | --- | --- | --- | --- |
| Subscriber growth 2008 | 14,252 | 13,270 | 34,724 | 69,828 | 37,051 | 75,228 |
| Subscribers registered 2008 | 7,012 (49%) | 5,906 (44%) | 16,112 (46%) | 29,405 (42%) | 13,031 (35%) | 28,207 (61%) |

table 5: 2008 subscriber growth

4. Active vs. Inactive Subscribers

The remaining subscribers spread widely over all other timeframes. We also see the alignment among the 6 channels decreasing in the second timeframe, then diffusing in the third one, and eventually building up in reverse order as time increases. This distribution suggests marking all subscribers within the first 3 timeframes as active and considering everyone who hasn't signed in for over a month as inactive. Being inactive doesn't necessarily mean that those users will never return. I expect quite a few subscribers within the 1-3 month timeframe to become active again, although many of those in the > 1 year timeframe might be gone forever.

By summing up the timeframes according to active and inactive subscribers and adding the closed accounts, we get the final bar chart:
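The summation into active, inactive and closed accounts can be sketched like this. It assumes each subscriber is represented as a (days since last sign-in, closed flag) pair and that "over a month" means more than 30 days; both the representation and the exact cut-off are assumptions for illustration.

```python
from collections import Counter

ACTIVE_MAX_DAYS = 30  # "over a month" approximated as 30 days (assumption)

def classify(days_since_signin, closed=False):
    """Label one subscriber as 'active', 'inactive' or 'closed'."""
    if closed:
        return "closed"
    return "active" if days_since_signin <= ACTIVE_MAX_DAYS else "inactive"

def category_shares(subscribers):
    """Percentage of subscribers per category.

    subscribers: iterable of (days_since_signin, closed) pairs;
                 days_since_signin may be None for closed accounts.
    """
    counts = Counter(classify(days, closed) for days, closed in subscribers)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no subscribers given")
    return {name: 100 * counts[name] / total
            for name in ("active", "inactive", "closed")}
```

Running `category_shares` once per channel yields the three values needed for one bar of such a chart.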

5. Verification

| Channel | T1 | T2 |
| --- | --- | --- |
| Median Subscriber Age | 15.8 Months | 5.4 Months |
| Registered subscribers | 18,054 | 54,097 |
| Shared subscribers | 913 (5%) | 913 (1.6%) |
| Shared subscribers with reference group | 7,452 (41%) | 11,763 (22%) |

table 6

The estimated share of active subscribers is calculated with the linear fitting function, as shown in the table below.

| Channel | T1 | T2 |
| --- | --- | --- |
| Estimated Active Subscribers | 88.7% - 1.52% × 15.8 = 64.6% | 88.7% - 1.52% × 5.4 = 80.5% |

table 7

| Channel | T1 | T2 |
| --- | --- | --- |
| Median Subscriber Age | 15.8 Months | 5.4 Months |
| Median User Age | 24 Months | 16 Months |
| Estimated active subscribers | 64.6% | 80.5% |
| Actual active subscribers | 61.53% | 81.42% |

table 8
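The comparison between estimate and measurement can be reproduced with a few lines. The coefficients are the ones from the linear fit quoted in this post; the function name is hypothetical, and the printed values may differ from the tables in the last digit because of rounding.

```python
INTERCEPT = 88.7  # percent active at a median age of 0 months
SLOPE = -1.52     # percentage points per month of median age

def estimated_active_share(median_age_months):
    """Estimated percentage of active subscribers for a given median age."""
    return INTERCEPT + SLOPE * median_age_months

# Median subscriber age and measured active share of the two test channels
test_channels = [("T1", 15.8, 61.53), ("T2", 5.4, 81.42)]

for channel, age, actual in test_channels:
    estimate = estimated_active_share(age)
    print(f"{channel}: estimated {estimate:.1f}%, actual {actual:.2f}%, "
          f"difference {estimate - actual:+.1f} points")
```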

Conclusion

- The share of active subscribers who log in at least once a month correlates with the median age of a subscriber base and can be modeled with the linear function y = a + bx (a = 88.7%, b = -1.52%/month).
- The majority of active users are regulars who log in on a daily basis.
- The share of active subscribers falls to 50% for a subscriber base with a median age greater than 2 years.
- Subscribers who log in daily account for only roughly 50% of a subscriber base with a low median age (6 months), and for only 25% with a median age of 2 years.
- The combined share of closed accounts and subscribers who last logged in more than 1 year ago reaches up to 25% for a subscriber base with a median age of 2 years.
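The 50% point can be checked by solving the linear model for the median age. This small sketch only inverts y = a + bx with the coefficients given above; the function name is hypothetical.

```python
def median_age_for_share(share_percent, a=88.7, b=-1.52):
    """Median age in months at which the model predicts the given active share."""
    return (share_percent - a) / b

months = median_age_for_share(50)
print(f"The model reaches 50% active subscribers at about {months:.1f} months")
```

The model is a straight line fitted to median ages between roughly 8 and 22 months, so values far outside that range should be read with caution.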

I have quite a few ideas for follow-up posts that might eventually find their way into this blog. What's already planned is an update on how the collected subscribers develop over time. I will revisit all subscribers every 3 to 5 months to see how many of them become active or inactive, and how many of them remove their subscriptions from the respective channels. Another interesting topic might be to take a closer look at the subscriber accounts and see if there are any characteristic differences between active and inactive users.