Flash, the decades-old media platform that once ruled video and animation on the web, was discontinued in December 2020 and was considered a security liability well before its end of life (EOL). Two years later, several browsers still support it and thousands of websites still host it.
Using Google tools and a small bash script, you can find legacy Flash content still hosted on any website (presumably your own, so that you can remove it). A full copy of the script is available here.
This tutorial requires:
- A Linux server with Internet access, plus bash, curl, and standard Unix tools (grep, sed, etc.)
- A Google Programmable Search Engine ID (see here for more information)
- A Google Custom Search API key (see here for more information)
How it Works
Google has indexed trillions of web pages, so it is possible to search them for publicly accessible Flash content. This method won't find Flash on pages that are private, require authorization, or are otherwise inaccessible. It also won't find Flash on pages that block search engine indexing (for example, through robots.txt or a noindex directive).
The search is based on the text "flash player". Embedded Flash files almost always included alternate text such as "you need to upgrade your flash player" or "get adobe flash player", which the page displayed when the browser did not have the Flash player installed and enabled. Because the search matches this text rather than the embedded content itself, it still works even though Google announced that it would stop indexing SWF (Flash) files.
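To illustrate why the text search works, here is a minimal sketch built around a typical `<object>` embed. The markup and the file name `intro.swf` are illustrative assumptions, not taken from any real site. A browser renders the content nested inside `<object>` only when it cannot play the SWF, and that fallback content is where the "flash player" text lives. The hypothetical `has_flash_fallback_text` function simply greps for that text, case-insensitively, the same way a text search would match it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit status 0) if the HTML on stdin
# contains the "flash player" fallback text, in any letter case.
has_flash_fallback_text() {
  grep -qi 'flash player'
}

# Illustrative embed: the <p> inside <object> is fallback content that
# the browser shows only when it cannot play intro.swf.
sample_embed() {
  cat <<'EOF'
<object type="application/x-shockwave-flash" data="intro.swf" width="640" height="480">
  <param name="movie" value="intro.swf">
  <p>You need to upgrade your Flash Player.
     <a href="#">Get Adobe Flash Player</a></p>
</object>
EOF
}
```

For example, `sample_embed | has_flash_fallback_text && echo "fallback text found"` prints the message, while a page without the fallback text does not.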
Combining the flash player search text with the site: search operator returns a list of indexed pages on a specific website that contain Flash content. Type the queries below into a Google search form to see examples of sites with and without Flash content.
site:technicalciso.com flash player
site:sohu.com flash player
To do this from a Unix/Linux shell, run the curl command below. Quotation marks, spaces, and special characters can get tricky in some shells, so set all the variables first. Note the URL encoding (%20) used for the spaces in the flash player search text.
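CXID=""    # your Google Programmable Search Engine ID here
APIKEY=""  # your Google Custom Search API key here
SITE=""    # website to search, e.g. technicalciso.com
FLASHTEXT="%20flash%20player"
# Quote the URL so the shell does not treat each & as "run in background"
curl "https://www.googleapis.com/customsearch/v1?key=$APIKEY&cx=$CXID&q=site:$SITE$FLASHTEXT"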
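Hand-encoding `%20` works for this one phrase, but any other special characters in a query also need encoding. The sketch below assumes ASCII input. The hypothetical `urlencode` function percent-encodes every character outside the unreserved set (letters, digits, `-`, `.`, `_`, `~`), and the hypothetical `search_url` builds the same Custom Search request URL as the command above. As an alternative, curl can do the encoding itself: with `-G`, each `--data-urlencode "name=value"` pair is encoded and appended to the URL as a query parameter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percent-encode a string for a URL query string.
# Assumes ASCII input; multi-byte characters are not handled correctly.
urlencode() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;                  # unreserved: keep as-is
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;    # anything else: %XX
    esac
  done
  printf '%s\n' "$out"
}

# Hypothetical helper: build the Custom Search request URL.
# Arguments: API key, search engine ID, site (e.g. technicalciso.com).
search_url() {
  local apikey="$1" cxid="$2" site="$3"
  printf 'https://www.googleapis.com/customsearch/v1?key=%s&cx=%s&q=%s\n' \
    "$(urlencode "$apikey")" "$(urlencode "$cxid")" \
    "$(urlencode "site:$site flash player")"
}
```

Usage: `curl -s "$(search_url "$APIKEY" "$CXID" technicalciso.com)"`. Encoding the whole query, including the colon in `site:`, is harmless because the server decodes it before running the search.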
Bash the Cache
Google's search index can include pages that have since changed or no longer exist, so some results may be false positives with no actual Flash. To validate the pages in the search results, download each one and look for embedded SWF files. To do this from the command line, use another curl command.
PAGE=""  # page to check
# -s hides curl's progress meter, -L follows redirects; grep -q only sets the exit status
curl -sL "$PAGE" | grep -qi swf && echo "flash confirmed: $PAGE" || echo "listed (flash not found, could be old cache): $PAGE"
To automate the entire process, use a script that searches a website for the Flash player text, then iterates through the results to check whether SWF files actually exist on each page.
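Here is an independent, minimal sketch of that flow. It is not the author's script, and every function name in it is hypothetical. It assumes the Custom Search JSON API format, in which each result's URL is in a `"link"` field and results come back ten per request, paged with the `start` parameter, up to the API's documented limit of the first 100 results. Link extraction uses grep and sed rather than a JSON parser, to stay within the tutorial's tool requirements. Escaped characters in URLs are not decoded, so `jq -r '.items[].link'` is a sturdier choice if jq is installed.

```shell
#!/usr/bin/env bash
# Minimal sketch (not the author's script): search a site for the
# "flash player" text, then fetch each result and grep it for "swf".
# Usage: CXID=... APIKEY=... find_flash technicalciso.com

# Fetch a URL quietly, following redirects, with a 30-second limit.
# Kept separate so it is easy to replace (e.g. to add retries).
fetch() {
  curl -sL --max-time 30 "$1" </dev/null
}

# Print the "link" value of every search result in the JSON on stdin.
# Assumption: result URLs contain no escaped quotes.
extract_links() {
  grep -o '"link": *"[^"]*"' | sed 's/^"link": *"//; s/"$//'
}

# Classify one page, using the same "swf" check as the manual command.
check_page() {
  local page="$1"
  if fetch "$page" | grep -qi swf; then
    echo "flash confirmed: $page"
  else
    echo "listed (flash not found, could be old cache): $page"
  fi
}

# Page through the results (10 per request, start = 1, 11, ..., 91)
# and check every listed page. Stops early when a request returns no links.
find_flash() {
  local site="$1" start links page
  : "${APIKEY:?set APIKEY to your Custom Search API key}"
  : "${CXID:?set CXID to your Programmable Search Engine ID}"
  for (( start = 1; start <= 91; start += 10 )); do
    links=$(fetch "https://www.googleapis.com/customsearch/v1?key=$APIKEY&cx=$CXID&q=site:$site%20flash%20player&start=$start" | extract_links)
    if [ -z "$links" ]; then
      break
    fi
    while IFS= read -r page; do
      check_page "$page"
    done <<< "$links"
  done
}
```

Keeping `fetch` as its own function means the search and the page checks share one place to tune curl options. It also makes the logic easy to test offline by redefining `fetch` to return canned responses.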
A full copy of the script is available here.