White Hat Institute

Collecting metadata with Metagoofil

Metagoofil is an information-gathering tool that extracts metadata from public documents (PDF, DOC, XLS, PPT, DOCX, PPTX, and XLSX files) belonging to a target company. It searches Google for such files, downloads them to the local computer, and then extracts their metadata with libraries such as Hachoir and PdfMiner.
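
To make the kind of metadata involved more concrete, here is a minimal Python sketch that reads the author, last editor, and producing application from an Office Open XML file (DOCX, XLSX, or PPTX). It is not Metagoofil's own code and does not use Hachoir or PdfMiner; it relies only on the fact that these formats are ZIP archives containing docProps/core.xml and docProps/app.xml. The function name office_metadata is hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespaces used by the document-property parts of Office Open XML files.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}


def office_metadata(source):
    """Return author, last editor and application info from a DOCX/XLSX/PPTX.

    `source` may be a file path or a binary file object (hypothetical helper).
    Missing parts or empty fields are simply left out of the result.
    """
    result = {}
    with zipfile.ZipFile(source) as zf:
        names = set(zf.namelist())
        if "docProps/core.xml" in names:
            core = ET.fromstring(zf.read("docProps/core.xml"))
            # dc:creator is the original author, cp:lastModifiedBy the last editor.
            for key, tag in (("creator", "dc:creator"),
                             ("last_modified_by", "cp:lastModifiedBy")):
                value = core.findtext(tag, namespaces=NS)
                if value:
                    result[key] = value
        if "docProps/app.xml" in names:
            app = ET.fromstring(zf.read("docProps/app.xml"))
            # Application and AppVersion reveal the software that produced the file.
            for key, tag in (("application", "ep:Application"),
                             ("app_version", "ep:AppVersion")):
                value = app.findtext(tag, namespaces=NS)
                if value:
                    result[key] = value
    return result
```

For a downloaded file you would call something like office_metadata("/root/Downloads/test/report.docx"), where the file name is only an illustration. The usernames and software versions it returns are the same kinds of fields that end up in the report.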

The tool then compiles the results into a report listing software versions, usernames, and server or machine names, which helps penetration testers during the information-gathering phase.

To install Metagoofil, open a terminal and use the “apt install” command.

Ex: (root@kali:/opt# apt install metagoofil).

After installation, you can run the tool from any directory by typing “metagoofil”.

Ex: (root@kali:/opt# metagoofil -h).

As an example of Metagoofil usage, we will collect DOC and PDF files “-t doc,pdf” from the target domain “-d example.com”, limit the search to 100 results “-l 100”, download at most 50 files “-n 50”, set the location for the downloaded files “-o /root/Downloads/test”, and save the results to an HTML file named “-f test.html”.

Ex: (root@kali:/opt# metagoofil -d example.com -t doc,pdf -l 100 -n 50 -o /root/Downloads/test -f test.html).
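
If you script several runs, it can help to build the argument list programmatically. The Python sketch below reproduces the example command; the helper names build_metagoofil_cmd and run_metagoofil are hypothetical, and the flags are exactly those used in the example. Flag behavior can differ between Metagoofil releases, so check the usage output of the version you have installed. Passing a list to subprocess.run avoids shell quoting problems and makes it explicit that the file types are comma-separated without spaces.

```python
import shlex
import subprocess


def build_metagoofil_cmd(domain, filetypes, search_limit, download_limit, out_dir, report):
    """Return the Metagoofil argument list (hypothetical helper)."""
    return [
        "metagoofil",
        "-d", domain,                 # target domain
        "-t", ",".join(filetypes),    # file types: comma-separated, no spaces
        "-l", str(search_limit),      # number of search results to examine
        "-n", str(download_limit),    # maximum number of files to download
        "-o", out_dir,                # directory for the downloaded files
        "-f", report,                 # HTML report file
    ]


def run_metagoofil(cmd, dry_run=True):
    """Print the command line; execute it only when dry_run is False."""
    print(shlex.join(cmd))  # shlex.join requires Python 3.8+
    if not dry_run:
        subprocess.run(cmd, check=True)


cmd = build_metagoofil_cmd("example.com", ["doc", "pdf"], 100, 50,
                           "/root/Downloads/test", "test.html")
run_metagoofil(cmd)  # dry run: only prints the command
# prints: metagoofil -d example.com -t doc,pdf -l 100 -n 50 -o /root/Downloads/test -f test.html
```

Defaulting to a dry run lets you review the exact command before anything is sent to the search engine.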

Once the information-gathering process is complete, Metagoofil saves all the gathered information to the locations you provided, so you can review it later.
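
Before reviewing the results, it can be useful to check what was actually downloaded. The following minimal Python sketch counts the files below the output directory (the one passed with -o) by extension. The function name inventory is hypothetical, and the sketch assumes the downloaded documents sit somewhere under that directory; if the HTML report is written there too, it will simply appear as an .html entry.

```python
from collections import Counter
from pathlib import Path


def inventory(out_dir):
    """Count files below out_dir (recursively) by lowercase extension."""
    counts = Counter()
    for path in Path(out_dir).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under "(none)".
            counts[path.suffix.lower() or "(none)"] += 1
    return dict(counts)
```

With the example command, you would call inventory("/root/Downloads/test") and compare the counts against the “-n 50” download limit.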

This open-source intelligence (OSINT) tool can also extract MAC addresses from these files, giving the attacker a good picture of the networking equipment used at the target site. Combined with the attacker’s own intuition and judgment, Metagoofil’s output can help estimate the operating systems in use, network names, and similar details.
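
The sketch below illustrates this kind of post-processing on metadata text you have already extracted. It is not part of Metagoofil: the regular expression and the path-based operating-system heuristic are assumptions for illustration (a drive letter or UNC prefix suggests Windows, /Users/ suggests macOS, /home/ suggests Linux or another Unix), and real documents can easily defeat them.

```python
import re

# Six hex octets joined consistently by ":" or "-", e.g. 00:1A:2B:3C:4D:5E.
MAC_RE = re.compile(r"\b[0-9A-Fa-f]{2}([:-])(?:[0-9A-Fa-f]{2}\1){4}[0-9A-Fa-f]{2}\b")


def extract_macs(text):
    """Return unique MAC addresses found in text, normalized to AA:BB:CC:DD:EE:FF."""
    return sorted({m.group(0).upper().replace("-", ":") for m in MAC_RE.finditer(text)})


def guess_os_from_path(path):
    """Rough heuristic: infer the author's OS from a file path stored in metadata."""
    if re.match(r"[A-Za-z]:\\", path) or path.startswith("\\\\"):
        return "Windows"       # drive letter (C:\...) or UNC path (\\server\...)
    if path.startswith("/Users/"):
        return "macOS"
    if path.startswith("/home/"):
        return "Linux/Unix"
    return "unknown"
```

Treat the result as a hint to be confirmed by other evidence, such as the application names recorded in the same documents.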

Once enough information has been gathered from the file metadata, a brute-force attack can be carried out, for example using the harvested usernames as login candidates. Metagoofil can also extract path information from the metadata, which can be used to map the network. The results are presented in HTML format.
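
A minimal Python sketch of the path-mapping step follows. It assumes you already have a list of file paths taken from document metadata; the function name map_paths and the path patterns it recognizes are hypothetical. It collects account names from user-profile paths (a candidate username list) and groups UNC paths by server to outline file servers and their shares.

```python
import re

# Hypothetical patterns for user-profile paths; real metadata may differ.
USER_PATTERNS = [
    re.compile(r"[A-Za-z]:\\(?:Users|Documents and Settings)\\([^\\]+)\\", re.IGNORECASE),
    re.compile(r"/(?:home|Users)/([^/]+)/"),
]
# UNC paths look like \\server\share\...
UNC_RE = re.compile(r"\\\\([^\\]+)\\([^\\]+)")


def map_paths(paths):
    """Return (sorted usernames, {server: sorted shares}) found in metadata paths."""
    users, shares = set(), {}
    for p in paths:
        for pattern in USER_PATTERNS:
            m = pattern.match(p)      # match() anchors at the start of the path
            if m:
                users.add(m.group(1))
                break
        m = UNC_RE.match(p)
        if m:
            # Windows server names are case-insensitive, so normalize them.
            shares.setdefault(m.group(1).lower(), set()).add(m.group(2))
    return sorted(users), {host: sorted(s) for host, s in shares.items()}
```

For example, paths such as C:\Users\jdoe\... and \\FILESRV01\finance\... would yield the username jdoe and the server filesrv01 with the share finance.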