Social data is publicly available information shared by social media users, such as their location, language, and the content they post.
Gathering social data is now common practice for many applications, from marketing to physical and digital security. Social data is typically associated with mainstream networks like Twitter, LinkedIn, and YouTube.
But social activity is diversifying across the surface, deep, and dark web, and valuable social data is widely available from sources you might never have associated with the term "social media."
So is conflating social data with mainstream networks an outdated approach? And what could this approach cost professionals who use social data?
What is considered social media?
Social media and social data are actually harder to define than you might think. According to a 2017 big data research review,
For most people, social media means a collection of popular websites and apps that facilitate social interaction online, including:
- Social networks like Facebook and LinkedIn
- Photo and video-sharing networks like Instagram, YouTube, and Pinterest
- Interactive media networks like Snapchat and TikTok
- Microblogging sites like Twitter and Tumblr
But if we define social media as online technology that enables social interaction, networking, and the exchange of ideas and information, the range of social data sources becomes much broader. Beyond popular social media sites, social data also comes from surface, deep, and dark web networks where users exchange publicly available content, such as:
- Fringe, deep web, and dark web networks, including forums, imageboards, and less-regulated social media sites like 4chan, Gab, and other low-profile forums. Deep and dark web content is not indexed, so standard search engines like Google cannot surface it.
- Paste sites, like Pastebin and DeepPaste. Paste sites are used to publicly share blocks of plain text in posts known as “pastes.”
- Decentralized networks like Mastodon. These social networks run on independently operated servers rather than on one company's infrastructure, so they are not subject to the centralized content policies and takedowns common on mainstream networks.
- Messaging apps like Telegram.
- Regional social networks. If you're consuming social data in the Western world, popular social networks in other regions, like Naver (South Korea) or Sina Weibo (China), are probably not on your radar.
Even though these sources may be harder to access, have a smaller user base, or host different content types than mainstream social media, they still make up an important, yet often overlooked, piece of the social mediaverse.
Why redefine "social data"?
Why does it matter whether you expand your definition of social data?
For an everyday social media consumer, it might not. But if you use public social data in a professional environment, reconsidering which sources fall under the umbrella of social media could make a huge difference to your operations.
Take security applications as an example. If your team protects facilities, social data from networks like Twitter is valuable for detecting physical security threats like natural disasters or shootings. Mainstream social media has a large user base, so it's useful for real-time threat detection as bystanders document events.
Now imagine that users on a chan site are planning a demonstration or attack near one of your facilities. If you also treat chan sites and other fringe social networks as part of your social data intelligence strategy, your security team could spot early warning signals sooner than if it relied only on widely known sources.
Rethinking which sources count as social data is also crucial for global applications like national security. Some of the most widely used social media sites in the world may be unfamiliar to Western intelligence operations. Governments in regions of interest may also block access to mainstream social media, pushing citizens toward alternative communication platforms. Without access to these sources, intelligence teams are likely missing important social data.
Social activity now spans a range of web spaces beyond established social media sites. But search for “social data” in Google, and you’ll find that the term is still synonymous with tech giants.
Holding on to this outdated understanding of social data means that professionals who consume this information, like security teams and the public sector, are probably missing the context they need to protect data, assets, and populations.
Including less visible social media, like deep and dark web sites and decentralized networks, helps security professionals minimize information gaps and build a more accurate picture of online social activity. Whether you're assessing information environments or providing physical security, broadening your view of what constitutes social data is the first step toward a holistic strategy.