I’ve been fortunate enough to travel the world helping people solve their OSINT problems and teaching OSINT. People often share what areas they would like to learn more about, and learning to program (usually in Python) is constantly on that list. There are countless resources to help you learn programming, but a great way to get started is by using Application Programming Interfaces, or APIs.
What is an API?
An API is a way for someone with information or capabilities to make those things accessible to others in an easy-to-use, programmatic way. For example, have you ever noticed that far more third-party sites analyze Twitter data than data from most other social media platforms? One of the primary reasons is that Twitter has an API that is free to use and lets people write code to acquire Twitter data and analyze or visualize it as they wish.
If you’re interested in OSINT and would like to work on your programming skills, working with APIs is a great place to start, since instead of having to create your capability from scratch, you can use data from other sources to provide functionality for your research, and potentially to share with others.
Three useful OSINT APIs
This blog post will look at three different APIs that can be useful for OSINT. One will focus on automatically gathering publicly available information (PAI), one will focus on enriching email addresses, and one will focus on providing analytical capabilities.
We will show code in the Python programming language for each of these examples. Programming skills aren’t necessary to be an OSINT practitioner, but as we’ll see in a few examples here, having basic programming skills can be beneficial in many ways.
If you are interested in learning how to write code, Python is an excellent choice for a variety of reasons, including the fact that it’s so popular and there are a lot of resources available online to help you learn or troubleshoot.
Automatically gathering data for analysis
First, we’ll look at how we can get started using the Twitter API; then, we’ll talk about the why. The first step is registering as a Twitter Developer. This process takes under two minutes and is 100% free; the only requirement is that the Twitter account you’re using has a valid phone number on file with Twitter. Once you’ve completed this step, you will receive a unique, long string of characters called a “Bearer Token.” You put this in your code to prove to Twitter who you are.
With that step complete, let’s look at some basic Python code to view the recent tweets from my friend Nico, whom many in the OSINT community know as dutch_osintguy.
import requests

# Replace this with your own bearer token
bearer_token = 'INSERT_YOUR_BEARER_TOKEN_HERE'

# Set the "Authorization" header of your API request to the bearer token
headers = {'Authorization': f'Bearer {bearer_token}'}

# Use the requests.get() function to make an API request to the Twitter API v2
# For example, you can get the most recent tweets from a specific user like this:
response = requests.get(
    'https://api.twitter.com/2/tweets/search/recent',
    headers=headers,
    params={'query': 'from:dutch_osintguy'}
)

# The API response will include the requested data
tweets = response.json()['data']

# You can then iterate over the tweets and analyze their data
for tweet in tweets:
    print(tweet)
That’s only 22 lines of code, including comments and blank lines!
Now that we’ve seen the how, let’s talk about the why. The above example seems rather… pointless? Why would we write code to look at a user’s recent tweets instead of, you know, just going to the Twitter website? In this case, since we’re only looking at one user, we likely wouldn’t. But while doing something once may be easy, doing it 10,000 or 100,000 (or even more!) times may be very difficult or completely impractical. In cases like these, writing code to automate that process can free you up for tasks that a human excels at, like analysis and providing context to the results.
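To give a sense of what that scale looks like in code: the v2 recent search endpoint returns results one page at a time, and you request the next page by passing back the next_token value from the response’s meta object. Here is a minimal sketch of that loop (the query and the bearer token placeholder are the same assumptions as in the example above):

import requests

# Replace this with your own bearer token
bearer_token = 'INSERT_YOUR_BEARER_TOKEN_HERE'
headers = {'Authorization': f'Bearer {bearer_token}'}

# Request up to 100 tweets per page and keep paging until no next_token remains
params = {'query': 'from:dutch_osintguy', 'max_results': 100}
all_tweets = []

while True:
    response = requests.get(
        'https://api.twitter.com/2/tweets/search/recent',
        headers=headers,
        params=params
    )
    body = response.json()
    all_tweets.extend(body.get('data', []))
    next_token = body.get('meta', {}).get('next_token')
    if not next_token:
        break
    params['next_token'] = next_token

print(f'Collected {len(all_tweets)} tweets')

The same pattern works whether your query targets one account or runs in a loop over thousands of them.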
Another massive benefit of using the API is persistent automated monitoring. Numerous times in my career, I’ve been tasked with: “If the home address of this key person leaks out, we need to know about it ASAP.” I had no desire to stare at my web browser or TweetDeck 24 hours a day to watch for that, so I wrote code to run those searches continuously and email me whenever new results appeared. You can get excellent results without needing to write much code.
If you’re interested in using the Twitter API for persistent monitoring, I talked about my process and released my code at the 2021 SANS Open-Source Intelligence Summit.
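That talk covers the full workflow, but the core of a monitoring script can be surprisingly small. Here is a minimal sketch, not my released code: the search term, email addresses, polling interval, and the assumption of a local SMTP relay are all placeholders to adapt to your own setup.

import smtplib
import time
from email.message import EmailMessage

import requests

bearer_token = 'INSERT_YOUR_BEARER_TOKEN_HERE'
headers = {'Authorization': f'Bearer {bearer_token}'}

query = '"123 Example Street"'  # placeholder search term to watch for
seen_ids = set()

while True:
    response = requests.get(
        'https://api.twitter.com/2/tweets/search/recent',
        headers=headers,
        params={'query': query}
    )
    for tweet in response.json().get('data', []):
        if tweet['id'] in seen_ids:
            continue
        seen_ids.add(tweet['id'])

        # Email an alert for every tweet we haven't seen before
        msg = EmailMessage()
        msg['Subject'] = f'New result for monitored query: {query}'
        msg['From'] = 'alerts@example.com'  # placeholder sender
        msg['To'] = 'you@example.com'       # placeholder recipient
        msg.set_content(tweet['text'])
        with smtplib.SMTP('localhost') as server:  # assumes a local mail relay
            server.send_message(msg)

    time.sleep(900)  # check again every 15 minutes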
Automatically enriching emails
The service Hunter.io is popular with offensive security professionals and OSINT practitioners looking to find email addresses for members of an organization, or to find additional information about an email address. Hunter.io has a free plan with an API that allows 25 searches and 50 verifications per month. Let’s look at some code that uses the verification capability to find additional information about the email address [email protected].
import requests

# Replace this with your own API key
api_key = 'INSERT_YOUR_API_KEY_HERE'

# Use the requests.get() function to make an API request to the Hunter.io API
# For example, you can verify an email address like this:
response = requests.get(
    'https://api.hunter.io/v2/email-verifier',
    params={'email': '[email protected]', 'api_key': api_key}
)

# The API response includes data about the email address and its
# associated domain and organization
data = response.json()
print(data)
This code is even shorter than the Twitter code! Let’s take a look at the results:
[email protected] {'data': {'status': 'accept_all', 'result': 'risky', '_deprecation_notice': 'Using result is deprecated, use status instead', 'score': 71, 'email': '[email protected]', 'regexp': True, 'gibberish': False, 'disposable': False, 'webmail': False, 'mx_records': True, 'smtp_server': True, 'smtp_check': True, 'accept_all': True, 'block': False, 'sources': []}, 'meta': {'params': {'email': '[email protected]'}}}
We didn’t get back a ton of information, but some of it is potentially useful: Hunter.io doesn’t believe the email comes from a disposable domain, and the “accept_all” status tells us the mail server accepts messages for any address, which is why the result is flagged as “risky.”
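Verification is only half of the free plan; the searches refer to Hunter.io’s domain-search endpoint, which lists the addresses the service has seen for a given organization’s domain. Here is a minimal sketch (example.com is a placeholder):

import requests

# Replace this with your own API key
api_key = 'INSERT_YOUR_API_KEY_HERE'

# Ask the domain-search endpoint for addresses seen at a target domain
response = requests.get(
    'https://api.hunter.io/v2/domain-search',
    params={'domain': 'example.com', 'api_key': api_key}
)
data = response.json()

# Each entry in the emails list describes one address Hunter.io has indexed
for email in data['data']['emails']:
    print(email['value'])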
I recently worked with a Fortune 500 company that needed to check a database of over 150,000 emails to see if any looked like they could be tied to copycat domains committing fraud. Using techniques similar to this, we were able to isolate the three emails out of those 150,000+ that appeared likely malicious. Once again, doing something once may be easy, but nobody wants to do anything manually 150,000+ times.
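I can’t share that engagement’s data or exact logic, but the general shape of such a batch check is straightforward. Here is a minimal sketch, assuming one address per line in a hypothetical emails.txt and using Python’s built-in difflib to flag domains that look suspiciously close to the real one:

import difflib

LEGIT_DOMAIN = 'example.com'  # the organization's real domain (placeholder)

# Assumes a hypothetical emails.txt with one address per line
with open('emails.txt') as f:
    emails = [line.strip() for line in f if '@' in line]

for email in emails:
    domain = email.rsplit('@', 1)[1].lower()
    if domain == LEGIT_DOMAIN:
        continue
    # Flag domains that are close to, but not exactly, the legitimate one
    ratio = difflib.SequenceMatcher(None, domain, LEGIT_DOMAIN).ratio()
    if ratio > 0.8:
        print(f'Possible copycat domain: {email} (similarity {ratio:.2f})')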
Adding analytical capabilities
The first two examples we looked at dealt with acquiring data; the last one will cover adding analytical and processing capabilities to analyze data you already possess.
In 2020, I wrote a few blog posts trying to introduce OSINT practitioners to the wonderful world of Amazon’s cloud platform, AWS. Many people know that you can “rent” cloud-based systems from Amazon to use as web servers, virtual machines, etc., but many people don’t know that Amazon has several APIs available to help you analyze, process, or translate data. Most of these APIs are eligible for the “Free Tier,” which lets users access these capabilities free for 12 months, usually with a more than reasonable monthly limit. Several of these APIs are helpful for OSINT, but the one we will cover here is Amazon’s image recognition service, Rekognition.
Rekognition
Rekognition has multiple uses, including identifying objects in images and extracting text from images. We’re going to use it to help solve an age-old OSINT problem.
If you’ve ever walked into an office where people were performing OSINT and seen multiple analysts staring intently at a single monitor, there’s a good chance they were trying to tell whether the person in one picture is the same person in another image. Unfortunately, the manual “it looks like the same person to me” method is far from perfect. There have been numerous instances of individuals being wrongly accused on social media because they looked similar to someone in a photo posted online. Rekognition can help with this problem via its compare_faces function, exposed through boto3, the Python library for using AWS.
Here is the code to compare individuals in two images:
import boto3

# Set up the client
client = boto3.client('rekognition')

# Read in the images to compare
image1 = open('image1.jpg', 'rb')
image2 = open('image2.jpg', 'rb')

# Call the compare_faces function
response = client.compare_faces(
    SourceImage={'Bytes': image1.read()},
    TargetImage={'Bytes': image2.read()}
)

# Get the similarity score (FaceMatches is empty when Rekognition finds no match)
if response['FaceMatches']:
    similarity = response['FaceMatches'][0]['Similarity']
else:
    similarity = 0

# Print the similarity score
print(f"Similarity score: {similarity}")

# Check if the similarity score is above a certain threshold
if similarity > 90:
    print("These are likely the same person")
else:
    print("These are likely not the same person")
When I used this code to compare two different pictures of me, these were the results:
No method is perfect, but it is nice to have an unbiased second opinion when making these determinations. If you’re interested in reading more about using AWS for OSINT, I wrote a two-part blog series available here.
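As a bonus, the text-extraction capability mentioned at the start of this section works through the same client, via Rekognition’s detect_text function. Here is a minimal sketch (the file name is a placeholder):

import boto3

# Set up the client
client = boto3.client('rekognition')

# Read in an image that contains text (placeholder file name)
with open('screenshot.jpg', 'rb') as image:
    response = client.detect_text(Image={'Bytes': image.read()})

# Rekognition returns both whole lines and individual words; print the lines
for detection in response['TextDetections']:
    if detection['Type'] == 'LINE':
        print(detection['DetectedText'])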
Countless other APIs can be helpful for OSINT research, but these three examples can help you start creating custom capabilities without requiring much programming.