Techniques for Web Application Origin Discovery

When attacking a web application, I frequently encounter a web application firewall (WAF) that greatly slows, or outright prevents, whole classes of attacks. There are many guides available that discuss techniques for evading specific WAF protections. What follows is a different take on this common challenge. In this post, I will discuss techniques for discovering the location of the protected web application’s origin server on the internet. If the origin is poorly configured, this can allow a complete bypass of the WAF. If nothing else, this exercise may expose revealing information about an application, and can expand the available attack surface.

The application origin should be isolated so that all public traffic must pass through the application firewall. In practice, this isn’t always the case. I would wager that while WAFs are more common than ever, the percentage of them that are properly configured is lower than ever, for a couple of reasons:

  • The WAF may be incidental. Developers may use Cloudflare specifically for site performance and uptime. The WAF is added value. Security was never the primary purpose, and so the origin configuration is neglected.
  • The barrier to entry has been lowered. Services such as Cloudflare, Sucuri, and the various WAF-as-a-service offerings by most cloud providers have a push-button deployment. Unsophisticated developers or sysadmins may deploy these services without properly configuring the origin. They probably get “good enough” results, anyway.

If the origin is on the internet, it’s worth spending the time to search for it. Even if you don’t get unbridled HTTP/HTTPS access, you may find other services there, or other interesting hosts nearby. If nothing else, it always helps to augment the OSINT database.

Here are some techniques I’ve successfully used to find origin servers.

1. Search Historical DNS Records

A lot of information about an organization can be revealed by searching historical DNS records. SecurityTrails (https://securitytrails.com/dns-trails) has an excellent search interface and API for exploring this data. Sometimes, a WAF is deployed after a site or application has already gone into production. Putting a WAF in front of a site usually requires a DNS change, which may be reflected in the historical DNS records. Check out old A records and see if they offer any clues: an address that predates the WAF may still be the origin. See if there is an old staging or development domain. SecurityTrails is a great OSINT resource, in general.
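Once the historical A records are exported, the first pass is usually just throwing away the addresses that belong to the WAF itself. Here is a minimal sketch of that filter. The `(ip, first_seen, last_seen)` tuple shape is my own assumption rather than any particular API’s response format, and the three CIDR blocks are only an illustrative subset of Cloudflare’s published ranges, so load the provider’s current list before relying on it.

```python
import ipaddress

# Illustrative subset only (assumption): refresh from the WAF provider's
# published address list before using this for real.
WAF_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("104.16.0.0/13", "172.64.0.0/13", "173.245.48.0/20")
]


def origin_candidates(history, waf_ranges=WAF_RANGES):
    """Keep historical A records whose address is outside the WAF ranges.

    `history` is an iterable of (ip, first_seen, last_seen) tuples; this
    shape is hypothetical, adapt it to however you exported the records.
    """
    candidates = []
    for ip, first_seen, last_seen in history:
        address = ipaddress.ip_address(ip)
        if not any(address in network for network in waf_ranges):
            candidates.append((ip, first_seen, last_seen))
    return candidates
```

Whatever survives the filter goes onto the candidate IP list for later probing.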

2. Search Censys

Censys (https://censys.io) is the commercial offshoot of an internet mass-scanning project conducted by researchers at the University of Michigan. In addition to recording specific services running on public-facing hosts, Censys records interesting metadata about these services. This data includes TLS/SSL certificate metadata such as the Common Name field. If the target domain has a public-facing origin server with a CA-signed certificate, then it is probably recorded in Censys. I’ve encountered application servers running with Extended Validation certificates, while the corresponding WAF runs a Let’s Encrypt certificate. Censys is another great general OSINT resource.
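The same certificate check can be reproduced directly against a candidate host, which is handy for confirming a search hit. The following is a rough sketch using only the Python standard library, not anything Censys provides. The function names are mine, and it assumes the certificate is CA-signed (as described above), because `getpeercert()` only returns parsed fields for a certificate that passed chain validation. The wildcard match is a simplification that only handles a single leading `*.` label.

```python
import socket
import ssl


def fetch_cert(ip, server_name, port=443, timeout=3):
    """Return the parsed certificate presented by `ip` for SNI `server_name`."""
    context = ssl.create_default_context()
    # We connect by IP, so skip hostname matching but keep chain validation.
    context.check_hostname = False
    with socket.create_connection((ip, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=server_name) as tls:
            return tls.getpeercert()


def cert_names(cert):
    """Collect the Common Name and DNS subjectAltName entries, lowercased."""
    names = set()
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                names.add(value.lower())
    for kind, value in cert.get("subjectAltName", ()):
        if kind == "DNS":
            names.add(value.lower())
    return names


def cert_covers(cert, domain):
    """True if the certificate names `domain` exactly or via a one-label wildcard."""
    domain = domain.lower()
    parent = domain.partition(".")[2]
    return any(
        name == domain or (parent != "" and name == "*." + parent)
        for name in cert_names(cert)
    )
```

A call such as `cert_covers(fetch_cert("203.0.113.25", "example.com"), "example.com")` (placeholder address and domain) would then flag a host that serves a certificate for the target.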

3. Send GET Requests to Known IPs with a Forged Host Header

Even if you’ve found the right host, the origin server may not reveal itself in response to a generic HTTP/HTTPS GET request pointed at an IP. It’s best to take a list of candidate web server IPs, and make requests including the “Host:” header with the target domain name. These candidate IPs may come from SecurityTrails, Censys, or from other discoveries found during the information gathering phases of the engagement.

This can be done manually by modifying the hosts file on a given system, or by modifying request headers using Burp. Scripting a search is appealing, especially with a long list of candidate IPs. Given a text file with a list of IP addresses (one per line), the following rough Python script (using the Requests library) can be used to rapidly gather and explore HTTP responses to these requests.

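#!/usr/bin/env python3
"""Request each candidate IP with a forged Host header and summarize the response."""
import requests

target_domain = "example.com"

host_header = {"Host": target_domain}
output_format = "{0:<35} {1:<10} {2:<10}"

with open("ip_addresses.txt") as ip_addresses:
    print(output_format.format("IP", "STATUS", "LENGTH"))
    for line in ip_addresses:
        ip_address = line.strip()
        if not ip_address:
            continue  # skip blank lines
        try:
            # Don't follow redirects: they tend to lead back through the WAF.
            r = requests.get("http://" + ip_address, headers=host_header,
                             timeout=2, allow_redirects=False)
        except requests.RequestException:
            continue  # unreachable or misbehaving host; move on
        # Content-Length is absent on chunked responses, so measure the body.
        print(output_format.format(ip_address, r.status_code, len(r.content)))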

Long, non-404 responses can be investigated manually. I should add that a brute-force search with forged Host headers is an excellent information gathering technique, in general. I’ve compromised networks with fresh attack surfaces revealed this way.

4. Make The Origin Talk To You

It’s sometimes possible to get an origin application to reveal itself through direct communication. SMTP is the most obvious service to exploit for this purpose. If the application has a user registration, newsletter sign-up, or other functionality that initiates an email response, fire them off and inspect the SMTP headers of the received email. They may contain IPs, hostnames, or other relevant information about the origin server.
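Reading headers by eye works for one message, but with several sign-up flows it’s quicker to script. Below is a minimal sketch that pulls IPv4 addresses out of the raw source of a received email. The choice of headers to inspect (`Received` and `X-Originating-IP`) is my assumption about where addresses commonly show up, and IPv6 is ignored for brevity.

```python
import ipaddress
import re
from email import message_from_string

# Loose pattern for dotted quads; candidates are validated afterwards.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def relay_ips(raw_email, header_names=("Received", "X-Originating-IP")):
    """Return the unique IPv4 addresses found in the given headers, in order."""
    message = message_from_string(raw_email)
    found = []
    for header_name in header_names:
        for value in message.get_all(header_name, []):
            for candidate in IPV4_PATTERN.findall(value):
                try:
                    ipaddress.ip_address(candidate)  # rejects e.g. 999.1.1.1
                except ValueError:
                    continue
                if candidate not in found:
                    found.append(candidate)
    return found
```

Loopback and private addresses are deliberately kept in the output. They won’t be reachable, but internal hostnames and addressing can still be useful for the OSINT notes.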

There may be other services running that can reveal the origin. For example, I once encountered an e-commerce site with an on-the-fly thumbnail creation service. This service accepted a relative path to an image as a query string parameter, along with optional parameters for thumbnail size. It would then serve that image with the optional transformations. Further testing revealed that the service wasn’t referencing internal file paths (darn!), but rather was making HTTP GET requests to retrieve this content. Furthermore, it readily accepted absolute links to images on the internet. I was able to abuse this behavior in a number of creative ways. I was also able to discover the origin server by asking it to resize an image hosted on a server under my control. Coaxing an application to make an out-of-band connection can be revealing.
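For the “server under my control” half of that trick, almost any web server with access logs will do. As a rough illustration, here is a tiny standard-library listener that records who fetched what. The function name and the `seen` list are my own, and the default bind address is an assumption about how you’d expose it.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_listener(port, bind="0.0.0.0"):
    """Build an HTTP server that records each GET request it receives.

    Returns (server, seen), where `seen` collects one
    (client_ip, path, headers) tuple per request.
    """
    seen = []

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            # client_address[0] is the peer IP: the host that fetched our URL.
            seen.append((self.client_address[0], self.path, dict(self.headers)))
            self.send_response(200)
            self.send_header("Content-Length", "0")
            self.end_headers()

        def log_message(self, format, *args):
            pass  # stay quiet; everything of interest is in `seen`

    return HTTPServer((bind, port), Handler), seen
```

Create it with something like `server, seen = make_listener(8080)`, call `server.serve_forever()`, and then ask the target application to fetch a URL on that host. Keep in mind that the recorded peer may be an outbound proxy or NAT gateway rather than the origin itself, so treat it as another candidate to verify.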

5. Google Dork Novel Strings

Specialized sites may reveal their neighbors through a simple Google search.

In one case, I encountered what appeared to be a niche SaaS CMS for a specific industry. However, the site itself was protected by a WAF. It was one of those WAFs that give you the “benefit” of placing a Security Seal on your site to demonstrate to the world how secure you are. By hitting the proxied site with a browser on a non-standard port, I was presented with an error page, which included the last octet of the origin server IP as a sort of diagnostic. This wasn’t much to go on. But since this SaaS site was clearly niche, I simply looked for site strings and URL paths that appeared specific to the application, and used Google to find similar sites. The search results were all packed within a pretty small set of network addresses. By using that last octet, and forging the “Host:” header in the request, I was able to discover the unprotected origin.
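The last step is easy to mechanize. Here is a small sketch that turns the neighbor addresses and the leaked last octet into a candidate list. Treating each neighborhood as a /24 is my simplifying assumption, and the function name is only illustrative.

```python
import ipaddress


def candidates_with_octet(neighbor_ips, last_octet):
    """For every /24 containing a neighbor, return the address ending in last_octet."""
    if not 0 <= last_octet <= 255:
        raise ValueError("last_octet must be between 0 and 255")
    # strict=False lets us pass a host address and get its enclosing /24.
    networks = {
        ipaddress.ip_network(ip + "/24", strict=False) for ip in neighbor_ips
    }
    return [str(network.network_address + last_octet) for network in sorted(networks)]
```

The output can be written one address per line and fed straight into the forged “Host:” header search from technique 3.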

There are certainly other ways to discover an origin, or coax a WAF to let you speak to an origin server unbridled. These are just some practical techniques that have worked for me. Even when I’m stymied, I still walk away with something to write about in the narrative.
