Half-Life of a Subscriber Base on YouTube



Presenting a method to estimate the amount of active Subscribers on any YouTube Channel, based on an in-depth analysis of 500,000 Subscribers from 6 Channels from the Most Subscribed List.



I. Introduction

YouTube Dynamics

YouTube Subscribers are an interesting phenomenon: What started in 2006 with only a few thousand subscribers for a handful of channels has since been growing exponentially. In 2008 the 50th most subscribed channel gained an average of 7,000 new subscribers per month and reached 100,000 subscribers in August 2008. Nine months later, in May 2009, another 50 channels reached over 100,000 subscribers. All in all there are well over 500 channels with more than 20,000 subscribers on YouTube. Most of these channels, and many more, are revenue sharing partners with YouTube and rely on their subscribers to generate the majority of views on their videos.

Surprisingly enough, there are no statistics about subscribers available other than the subscriber count itself. YouTube provides no tool like YouTube Insight that would help monitor a subscriber base. Other than visiting each subscriber's channel page, the only viable measure for the health of a subscriber base is to compare the subscriber count with the view count and see if these two numbers match up.

“My last 5 videos have failed to reach my number of subscribers in views... that's a bad sign. It's time for a change!” - smpfilms

Just like smpfilms, many video makers are influenced by the feedback they receive. On a YouTube video these can be comments, ratings or just the number of views. Since only a fraction of subscribers interact by commenting or rating, the view counter is used by many to estimate the general interest of their subscribers.

Back in 2006, after the first channels had collected a few thousand subscribers, it quickly became a rule of thumb that the number of views usually matches or exceeds the number of subscribers. Today this rule still seems to apply, at least on channels that are on the most subscribed list and have a decent subscriber growth rate, like smpfilms, who has a subscriber base of over 250,000 and a growth rate of 15,000 new subscribers per month. On channels with a decreasing growth rate, however, the number of views tends to decrease continually. Among those who first experienced this was Boh3m3.

“I'm sick to death of seeing a subscriber count that has nothing to do with the view count!” - Boh3m3

With around 4,000 subscribers, Boh3m3 was one of the early YouTubers on the most subscribed list in summer 2006. His subscriber base grew to 40,000 throughout 2007. At that time this was large enough to almost guarantee a spot on the most viewed list, resulting in many more views than subscribers on a regular basis. By the end of 2007 his subscriber growth began declining, no longer matching the growth rate of many other top 100 channels, and he eventually fell off the most subscribed list in 2008. While a subscriber base of over 40,000 is still considered large today, it doesn't guarantee a spot on the most viewed list anymore. So most views are coming from subscribers only, and those numbers hardly reach 30% of the subscriber base on Boh3m3's channel.

This is no isolated case. These days there are plenty of channels with large but old subscriber bases, most of them with views reaching far less than 50% of their subscribers. Some of the "oldest" channels have views as low as 15% of their subscriber bases.

These numbers certainly leave room for interpretation about what happened with the subscribers who aren't watching anymore.


The effects of an aging Subscriber Base


“The most interesting thing about some of my subscribers and other ‘Top’ people is that most of them don’t even exist.” - thatgirlonline

Apart from subscribers who are just not watching anymore, there are two more options. By visiting a few subscriber profiles, we notice that some subscribers haven't signed in for quite a while, which makes them "inactive". Then there are also subscribers with closed or suspended accounts who are still listed.

So in general, a subscriber-base can be split into 4 categories:

  1. Active Accounts (watching)
  2. Active Accounts (not watching)
  3. Inactive Accounts
  4. Closed and Suspended Accounts

As a subscriber base gets older, there will be more closed and inactive accounts. While it's clear what constitutes a closed account, the period of time that has to pass until a subscriber can be labeled inactive has to be defined. A way to do this is to look at the actual distribution of the last sign-in times of all subscribers and define it in relation to those numbers.


II. Analysis

Data Acquisition

Most studies about the YouTube community have used crawlers that either sample randomly across the whole site or employ graph search algorithms, depending on their specific research objectives. To measure a subscriber base, however, it's necessary to collect possibly large numbers of subscribers from a couple of channels. This doesn't require any fancy search or randomization algorithms, but it relies on accurate and complete subscriber listings.

These subscriber listings were collected from the subscriber tab on each selected channel, where all subscribers are listed in ascending order of their registration date. I then verified the data via the Google Data API for each item. A bug in the subscriber lists reduced the effective sample size to 85%, i.e. 15% of the listed subscribers are double listings. However, from a statistical point of view, this sample size is more than enough, and luckily these double listings are spread evenly across a whole subscriber base.
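Removing the double listings is a simple de-duplication step. The following is only a minimal sketch of how that could look; the user names and the helper `deduplicate` are made up for illustration and are not taken from the original scripts.

```python
def deduplicate(listing):
    """Return the listing without repeated user names, keeping first-seen order."""
    seen = set()
    unique = []
    for name in listing:
        if name not in seen:
            seen.add(name)
            unique.append(name)
    return unique


# Hypothetical scraped listing; the names are invented for this example.
listing = ["anna", "bob", "anna", "carl", "dora", "bob", "eve"]
unique = deduplicate(listing)

# Share of the raw listing that survives de-duplication.
share_unique = len(unique) / len(listing)
print(unique, f"{share_unique:.0%} unique")
```

Keeping the first-seen order preserves the ordering by registration date that the subscriber tab provides.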

Another bug that was introduced with a site update in January 2009 eventually prevented further collection of subscriber data completely. Since January the listings for channel comments, friends and subscribers are all cut off at 1,000 items. Currently complete subscriber listings are only accessible to the channel owner in the account tab.

In total I collected the data of over 1,000,000 subscribers from over 15 channels in December '08 and January '09. By analyzing this data and revisiting all subscribers at regular intervals over the next year, I hope to be able to give a detailed picture of how a subscriber base develops over time.


The Median Subscriber Age

After looking at the collected data of all the different subscriber bases, a few general characteristics stood out. Unsurprisingly, the data shows that every subscriber base of a reasonable size follows the law of large numbers, i.e. the collective of subscribers doesn't behave erratically. Since a subscriber base is a congregation of users, a reasonably large subscriber base already represents the behavior of users in general. It follows that the subscriber base of another channel will have rather similar characteristics and only differ in certain ways. I discovered that these differences between several subscriber bases correlate strongly with the median age of a subscriber base.

So what exactly is the median subscriber age?

The median subscriber age tells us that 50% of all users within a subscriber base subscribed before a certain point in time. It can be determined by measuring the time elapsed since the channel reached 50% of its current subscriber count.

For example, consider the median subscriber ages for Lonelygirl15 and Fred as of December 2008: Lonelygirl15 had about 115,000 subscribers and Fred 720,000 subscribers. Lonelygirl15 reached 50% in October 2006 and Fred in August 2008. The median subscriber ages therefore are 26 months (Lonelygirl15) and 4 months (Fred). These figures are also prime examples of a relatively old and a relatively young subscriber base. It's also interesting to note that their shares of subscribers with old accounts differ drastically: Over 72% of Lonelygirl15's subscribers registered their accounts in 2006. For Fred it's only 7%.
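The calculation can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are my own, months are counted as whole calendar months, and the monthly history at the end is invented to show the lookup. Only the two dates for Lonelygirl15 and Fred come from the example above.

```python
from datetime import date


def months_between(start, end):
    """Whole calendar months from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def median_subscriber_age(history, sample_date):
    """history: list of (date, subscriber_count), sorted by date, ending at the sample date.

    Returns the months elapsed since the count first reached 50% of the latest count.
    """
    half = history[-1][1] / 2
    for when, count in history:
        if count >= half:
            return months_between(when, sample_date)
    raise ValueError("history never reaches half of the current count")


sample = date(2008, 12, 1)

# Dates quoted in the text: when each channel reached 50% of its December 2008 count.
lonelygirl15_age = months_between(date(2006, 10, 1), sample)
fred_age = months_between(date(2008, 8, 1), sample)

# Hypothetical history, made up for illustration only.
history = [
    (date(2008, 6, 1), 10_000),
    (date(2008, 8, 1), 26_000),
    (date(2008, 10, 1), 40_000),
    (date(2008, 12, 1), 50_000),
]
example_age = median_subscriber_age(history, sample)
print(lonelygirl15_age, fred_age, example_age)
```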

Since it is so easy to figure out the median subscriber age without collecting any data, all we need is a model of how the number of active subscribers develops in relation to the median age of a subscriber base.


Method

The objective is to find out how the distribution of active, inactive and closed accounts differs for subscriber bases with different median ages. For this I'm using the data of 6 subscriber bases.

The following characteristics will be compared:

  • Closed accounts
  • Shared subscribers
  • Year of account registration
  • Date of last sign-in

After doing so I will plot the percentage of active subscribers against the median subscriber age, which will give us the fitting function and allow us to estimate the distribution on any YouTube channel. Another two channels will then be used for verification of the results.


The Channels

All 6 channels have certain properties that are relevant:

  • The channels form pairs with similarly aged subscriber bases - This allows for verification of the results for each age group.
  • Subscriber bases of different sizes - This will show if the results are independent of the size of a subscriber base.
  • All channels produce 'user generated content' specifically for YouTube - This will provide for a homogeneous group of subscribers with similar interests.

The following chart shows the subscriber bases of the 6 channels over the last two years as they appeared on the most subscribed list. The subscriber count of the 50th most subscribed channel is shown as well. Markers represent the points in time where each channel reached 50% of its current subscriber count. The chart is plotted with a 1 month resolution, but the exact dates for the median age were calculated with a higher accuracy, as shown in table 1. The sample date was December 2008.



Channel               A1       A2       B1       B2        C1       C2
Subscribers           77,187   67,584   95,976   148,776   63,264   117,551
Median Age (months)   21.7     19.3     14.6     13.8      9.2      8.5
table 1: median subscriber age

The chart shows that channels A1, A2 and B2 already had over 10,000 subscribers by the end of 2006. The other 3 channels appeared in the top 100 later in 2007. The subscriber bases of channels A1 and A2 developed very similarly over time (declining growth) and their subscriber counts don't differ much in absolute numbers. They have the highest median age of the 6 channels, but they also have the largest age gap of all 3 groups (2.4 months). Even though channel B2 is as old as A1 and A2, its subscriber base grew at a higher rate in 2008, which results in a lower median age, similar to that of channel B1. B1 started collecting subscribers in early 2007, but due to a declining growth rate in 2008 its median age matches that of channel B2. The channels in group C show disparities similar to those in group B. Both groups have considerably different subscriber counts and an age gap of less than 1 month. It's interesting to note that all channels except C1 were ranked higher than No. 50 in the Top-100 charts prior to May 2008, and within only 6 months during the second half of 2008, three of the channels fell behind No. 50.


1. Closed Accounts

The first thing to do is to sort out the closed accounts. These provide no further information as their channel pages aren't available anymore. YouTube still keeps them in the subscriber lists and supposedly removes them from time to time. Therefore the numbers for closed accounts aren't expected to be very large.



As expected, the numbers for closed accounts are moderate, ranging from 5% to 11%. Despite differently sized subscriber bases, the numbers align nicely on the percentage scale in accordance with the age of each subscriber base. From now on I will mostly refer to the number of "registered subscribers" of a channel, meaning the total number of subscribers minus the closed accounts.
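The share of closed accounts can be recomputed from figures given in this post: the total subscriber counts from table 1 and the registered subscriber counts listed further down in table 3. The snippet below is only a sketch of that arithmetic; the variable names are my own, and the results may differ slightly from the chart because of rounding.

```python
# Total subscriber counts (table 1) and registered subscribers (table 3).
total = {
    "A1": 77_187, "A2": 67_584, "B1": 95_976,
    "B2": 148_776, "C1": 63_264, "C2": 117_551,
}
registered = {
    "A1": 69_276, "A2": 61_484, "B1": 87_890,
    "B2": 136_232, "C1": 58_862, "C2": 111_787,
}

# Closed accounts are the difference between the two counts.
closed_share = {
    channel: (total[channel] - registered[channel]) / total[channel]
    for channel in total
}

for channel, share in closed_share.items():
    print(f"{channel}: {share:.1%} closed accounts")
```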


2. Shared Subscribers

As mentioned earlier, I was aiming for a relatively homogeneous group of users when I selected the 6 channels. Therefore these channels are expected to have some common subscribers. Table 2 shows how many subscribers are shared between any two of the 6 channels. It turns out that they actually do share quite a notable number of subscribers. In relation to their absolute subscriber counts, these numbers range from 5% to almost 20%.



      A1       A2       B1       B2       C1       C2
A1    x        9,527    4,414    10,059   4,439    5,352
A2    9,527    x        5,982    13,008   8,110    12,697
B1    4,414    5,982    x        14,810   4,583    5,422
B2    10,059   13,008   14,810   x        8,620    12,598
C1    4,439    8,110    4,583    8,620    x        10,567
C2    5,352    12,697   5,422    12,598   10,567   x
table 2: shared subscribers

Another way of showing the relation between these channels is to compare every subscriber base with the cumulative subscribers of the remaining 5 channels, as shown in table 3.


Channel                   A1        A2        B1        B2        C1        C2
Registered subscribers    69,276    61,484    87,890    136,232   58,862    111,787
Shared subscribers        21,689    32,066    24,962    43,273    23,253    32,118
with other 5 channels     (31.3%)   (52.1%)   (28.4%)   (31.8%)   (39.5%)   (28.7%)
table 3: shared subscribers

Four of the 6 channels share about one third of their subscribers with the rest of the group. Channels C1 and A2 share as much as 40% and 52% of their subscribers. These numbers seem massive, but aren't unexpected when considering that these channels are well known among each other and cumulatively contain more than 500,000 subscribers.

This leads to another question: how many unique users are in this pool of over 500,000 counted subscribers? The cumulative subscriber count (of registered subscribers) of all 6 channels is 525,531. These subscriptions are coming from 419,374 unique user accounts (79.8%).
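Counting shared and unique subscribers comes down to set operations. The sketch below uses tiny invented subscriber sets to show the idea; only the two totals in the last lines are the figures quoted above. The helper `shared_with_others` is a name I chose for this example.

```python
# Hypothetical subscriber sets per channel; the user names are invented.
channels = {
    "A1": {"anna", "bob", "carl"},
    "A2": {"bob", "dora"},
    "B1": {"anna", "bob", "eve"},
}


def shared_with_others(name):
    """Subscribers of one channel who also subscribe to any other channel."""
    others = set().union(*(subs for ch, subs in channels.items() if ch != name))
    return channels[name] & others


# Every subscription counted once per channel vs. every account counted once.
subscriptions = sum(len(subs) for subs in channels.values())
unique_users = set().union(*channels.values())
print(subscriptions, len(unique_users), sorted(shared_with_others("A1")))

# Ratio for the figures quoted in the text.
reported_ratio = 419_374 / 525_531
print(f"{reported_ratio:.1%} unique accounts")
```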


3. Year of Account Registration

The year of account registration will help estimate the volatility within a subscriber base, which is actually one thing that needs more than one sampling of a subscriber base. Only by observing a subscriber base over a longer period of time is it possible to tell how many subscribers actually unsubscribe and are replaced by new ones. However, I don't expect to see a high fluctuation of subscribers in general. Therefore, all subscriber bases are expected to show peaks in the years of high popularity and exposure of the respective channel. I also determined the median user age to see how strongly this figure correlates with the median subscriber age.



Channels A1, A2 and B2, which already had many subscribers by the end of 2006, have their peaks with subscribers who registered in 2006. Likewise, channels B1 and C1 have their peaks with subscribers who joined in 2007. This indicates that most users keep their subscriptions for a long time. We also notice a correlation of the median user age with the median subscriber age, as shown in table 4.

Channel                    A1     A2     B1     B2     C1     C2
Median Age (months)        21.7   19.3   14.6   13.8   9.2    8.5
Median User Age (months)   26     25     21     22     20     20.5
table 4: median user age vs. median subscriber age
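To put a number on the correlation in table 4, one option is the Pearson correlation coefficient. The post itself does not state which measure was used here, so treat this as one possible sketch; `pearson_r` is a helper written for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Values from table 4, in months, for channels A1, A2, B1, B2, C1, C2.
median_subscriber_age = [21.7, 19.3, 14.6, 13.8, 9.2, 8.5]
median_user_age = [26, 25, 21, 22, 20, 20.5]

r = pearson_r(median_subscriber_age, median_user_age)
print(f"r = {r:.2f}")
```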

In table 5 a comparison of the subscriber growth in 2008 with the number of subscribers who joined in 2008 leads to the same conclusion and reveals some extra information: On all 6 channels only 35% to 60% of last year's subscriber growth came from users who registered in 2008.

Channel                       A1       A2       B1       B2       C1       C2
Subscriber growth 2008        14,252   13,270   34,724   69,828   37,051   75,228
Subscribers registered 2008   7,012    5,906    16,112   29,405   13,031   28,207
                              (49%)    (44%)    (46%)    (42%)    (35%)    (61%)
table 5: 2008 subscriber growth

And on the topic of new users, another interesting observation is the low number of 2008 subscribers for channels A1, A2 and B1. It turns out that these 3 channels didn't post many videos and had almost no exposure on the most viewed list or weren't featured by YouTube in 2008. This more or less confirms the general idea of new users preferably spending their time on the most viewed list, instead of exploring the site.


4. Active vs. Inactive Subscribers

The numbers for the date of last sign-in will tell how many subscribers are really active users. Besides the closed accounts, this is where a subscriber base loses most users over time. The first chart divides all registered subscribers into six timeframes according to their date of last sign-in. The distribution among these timeframes and the correlation with the respective age of each subscriber base will help determine where to draw the line between active and inactive subscribers.



Apparently the largest group of subscribers signed in within the last 48 hours. This first timeframe also shows a strong alignment between all 6 channels on the percentage scale. Since the number of views within the first 48 hours decides if a video goes on the most viewed list, it's interesting to note that none of the 6 channels can reach more than 48% of their subscribers within these 48 hours. For channel A1 with the oldest subscriber base (21.7 months), this number is down to 28%.

The remaining subscribers spread widely over all other timeframes. We also see the alignment among the 6 channels decreasing in the second timeframe, then diffusing in the third one, and eventually building up in reverse order as time increases. This distribution suggests marking all subscribers within the first 3 timeframes as active, and considering everyone who hasn't signed in for over a month as inactive. Being inactive doesn't necessarily mean that those users will never return. I expect quite a few subscribers within the 1-3 month timeframe to become active again, although many of those in the > 1 year timeframe might be gone forever.
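The resulting classification rule can be sketched as follows. This is an illustration, not the script used for the analysis: "a month" is taken as 30 days, a closed account is represented by a missing sign-in date, and the sample records are invented.

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumption: "a month" is taken as 30 days.
ACTIVE_WINDOW = timedelta(days=30)


def classify(last_sign_in, sample_time):
    """Label a subscriber as 'closed', 'active' or 'inactive'."""
    if last_sign_in is None:  # assumption: closed accounts have no sign-in date
        return "closed"
    if sample_time - last_sign_in <= ACTIVE_WINDOW:
        return "active"
    return "inactive"


sample_time = datetime(2008, 12, 15)

# Hypothetical subscribers with their date of last sign-in.
subscribers = {
    "anna": datetime(2008, 12, 14),
    "bob": datetime(2008, 11, 30),
    "carl": datetime(2008, 9, 1),
    "dora": datetime(2007, 6, 1),
    "eve": None,
}

counts = Counter(classify(t, sample_time) for t in subscribers.values())
print(dict(counts))
```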

By summing up the timeframes according to active and inactive subscribers and adding the closed accounts, we get the final bar chart:



In all three categories the subscribers line up nicely in accordance with their median age. Again, it's notable how the differently sized subscriber bases within groups B and C don't affect the percentage distribution. After calculating the averages for each group, the different distributions can also be visualized with pie charts.



Plotting the percentage of active subscribers against the median age for each channel reveals a strong correlation (r = -0.98). The resulting fitting curve is the linear function y = a + bx with a = 88.7% and b = -1.52%/month, where x is the median subscriber age in months and y is the percentage of active subscribers.
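The fitted line is easy to put into code. The sketch below only wraps the two published coefficients; the function names are my own, and the inverse function is added to show at which median age the model predicts that half of the subscribers are still active.

```python
A = 88.7   # intercept in percent, from the fit above
B = -1.52  # slope in percent per month, from the fit above


def estimated_active_share(median_age_months):
    """Estimated percentage of active subscribers for a given median subscriber age."""
    return A + B * median_age_months


def age_for_share(share_percent):
    """Median subscriber age (months) at which the model predicts the given share."""
    return (share_percent - A) / B


half_life = age_for_share(50.0)
print(f"{estimated_active_share(12):.1f}% active at 12 months")
print(f"50% active at about {half_life:.1f} months")
```

Being a straight line fitted to median ages between roughly 8 and 22 months, the model should be treated with caution far outside that range.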



5. Verification

To verify the resulting fitting function I chose another two channels, calculated the estimated number of active subscribers, and then determined the actual number of active subscribers like before, and compared the results.

Channel                                   T1            T2
Median Subscriber Age                     15.8 Months   5.4 Months
Registered subscribers                    18,054        54,097
Shared subscribers                        913 (5%)      913 (1.6%)
Shared subscribers with reference group   7,452 (41%)   11,763 (22%)
table 6

Channel T1 is from a vlogger, channel T2 is from a comedian, and both were created in 2006. The median subscriber age for channel T1 is 15.8 months, which falls in between that of groups A and B. At 5.4 months, channel T2's subscriber base is the youngest of all. The two channels don't share many subscribers with each other, and with only 22%, channel T2 has the weakest link to all other channels. Channel T1's subscriber base is notably smaller than all the others.

The estimated number of active subscribers is calculated with the linear fitting function, as shown in the table below.


Channel                        T1                             T2
Estimated Active Subscribers   88.7% - 1.52% * 15.8 = 64.6%   88.7% - 1.52% * 5.4 = 80.5%
table 7

The function predicts that 64.6% of T1's and 80.5% of T2's subscribers are active. Now we take a detailed look at their subscribers to find out the actual number of active subscribers.





It turns out that the actual numbers of active subscribers come very close to the estimated numbers.

Channel                        T1            T2
Median Subscriber Age          15.8 Months   5.4 Months
Median User Age                24 Months     16 Months
Estimated active subscribers   64.6%         80.5%
Actual active subscribers      61.53%        81.42%
table 8

The absolute deviation from the fitting curve is as low as 3.1 percentage points (T1) and 0.9 percentage points (T2). Neither the low number of shared subscribers for channel T2, nor the comparably small subscriber base of channel T1 seems to have any drastic effect on the results.
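The deviations follow directly from the estimated and actual values in table 8. A short sketch of that comparison, with variable names chosen for this example:

```python
# (estimated %, actual %) of active subscribers, taken from table 8.
verification = {
    "T1": (64.6, 61.53),
    "T2": (80.5, 81.42),
}

# Absolute deviation in percentage points.
deviation = {
    channel: abs(estimated - actual)
    for channel, (estimated, actual) in verification.items()
}

for channel, points in deviation.items():
    print(f"{channel}: {points:.1f} percentage points")
```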




Conclusion


The results in short:

  • The number of active subscribers who log in at least once a month correlates with the median age of a subscriber base and can be modeled with the linear function y = a + bx (a = 88.7%, b = -1.52%/month).
  • The majority of active users are regulars who log in on a daily basis.
  • The share of active subscribers drops to 50% for a subscriber base with a median age of more than 2 years.
  • Subscribers who log in daily only account for roughly 50% of a subscriber base with a low median age (6 months), and for only 25% with a median age of 2 years.
  • Subscribers whose last sign-in was more than 1 year ago, together with closed accounts, make up as much as 25% of a subscriber base with a median age of 2 years.

Outlook

I have quite a few ideas for follow-up posts that might eventually find their way into this blog. What's already planned is an update on how the collected subscribers develop over time. I will revisit all subscribers every 3 to 5 months to see how many of them become active or inactive, and how many of them remove their subscriptions from the respective channels. Another interesting topic might be to take a closer look at the subscriber accounts and see if there are any characteristic differences between active and inactive users.

