This article is a write-up for the “Google Dorking” room on TryHackMe.
[Task 1] Ye Ol’ Search Engine
#1 Roger dodger!
ANSWER: No answer needed
[Task 2] Let’s Learn About Crawlers
What are Crawlers and how do They Work?
Crawlers discover content through various means. One is pure discovery: the crawler visits a URL and returns information about the website's content type to the search engine. In fact, modern crawlers scrape a lot of information, but we will discuss how this is used later. Another method is to follow any and all URLs found on previously crawled websites, much like a virus in the sense that it wants to traverse and spread to everything it can.
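The URL-following behaviour described above can be sketched as a breadth-first traversal. This is only an illustration, not how any real search engine is implemented: the `LINKS` dictionary is a made-up, in-memory stand-in for pages and the URLs found on them, and the page names are hypothetical.

```python
from collections import deque

# Hypothetical link graph: page URL -> URLs discovered on that page.
LINKS = {
    "http://ablog.com/": ["http://ablog.com/about", "http://ablog.com/posts"],
    "http://ablog.com/about": ["http://ablog.com/"],
    "http://ablog.com/posts": ["http://ablog.com/posts/1", "http://ablog.com/about"],
    "http://ablog.com/posts/1": [],
}


def crawl(seed, links):
    """Visit the seed URL, then every URL reachable from it, once each."""
    seen = {seed}
    queue = deque([seed])
    visited_order = []
    while queue:
        url = queue.popleft()
        visited_order.append(url)
        # Follow every URL found on the page that has not been seen yet.
        for found in links.get(url, []):
            if found not in seen:
                seen.add(found)
                queue.append(found)
    return visited_order


for page in crawl("http://ablog.com/", LINKS):
    print(page)
```

The `seen` set is what stops the crawler from looping forever when two pages link to each other.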
#1 Name the key term of what a “Crawler” is used to do
#2 What is the name of the technique that “Search Engines” use to retrieve this information about websites?
#3 What is an example of the type of contents that could be gathered from a website?
[Task 3] Enter: Search Engine Optimisation
Search Engine Optimisation
Search Engine Optimisation or SEO is a prevalent and lucrative topic in modern-day search engines. So much so that entire businesses capitalise on improving a domain’s SEO “ranking”. Viewed abstractly, search engines will “prioritise” those domains that are easier to index. Many factors determine how “optimal” a domain is, resulting in something similar to a point-scoring system.
A few of the factors that influence how these points are scored:
- How responsive your website is to the different browser types, e.g. Google Chrome, Firefox and Internet Explorer. This includes mobile phones!
- How easy it is to crawl your website (or if crawling is even allowed …but we’ll come to this later) through the use of “Sitemaps”
- What kind of keywords your website has (e.g. in our examples, if the user were to search for a query like “Colours”, no domain would be returned, as the search engine has not (yet) crawled a domain that has any keywords to do with “Colours”)
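The keyword point can be illustrated with a toy keyword index. This is a minimal sketch under stated assumptions: the crawled pages and their text in `CRAWLED` are invented for the example, and real search engines use far more elaborate indexing and ranking.

```python
from collections import defaultdict

# Hypothetical crawl results: page URL -> text found on that page.
CRAWLED = {
    "http://ablog.com/": "apple banana carrot",
    "http://ablog.com/fruit": "apple banana",
}


def build_index(pages):
    """Map each keyword to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs indexed under the query keyword, sorted for stable output."""
    return sorted(index.get(query.lower(), set()))


index = build_index(CRAWLED)
print(search(index, "Apple"))    # both pages contain "apple"
print(search(index, "Colours"))  # nothing crawled mentions "colours", so no results
```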
There is a lot of complexity in how the various search engines individually “point-score” or rank these domains, including vast algorithms. Naturally, the companies running these search engines, such as Google, don’t share exactly how the final ordering of domains is decided. However, as these are businesses at the end of the day, you can pay to advertise or boost the order in which your domain is displayed.
#1 Using the SEO Site Checkup tool on “tryhackme.com”, does TryHackMe pass the “Meta Title Test”? (Yea / Nay)
#2 Does “tryhackme.com” pass the “Keywords Usage Test?” (Yea / Nay)
#3 Use https://neilpatel.com/seo-analyzer/ to analyse http://googledorking.cmnatic.co.uk:
ANSWER: No answer needed
#4 With the same tool and domain in Question#3 (previous): How many pages use “flash”
#5 From a “rating score” perspective alone, what website would list first: tryhackme.com or googledorking.cmnatic.co.uk? Use tryhackme.com’s score of 62/100 as of 31/03/2020 for this question.
[Task 4] Beepboop – Robots.txt
Similar to “Sitemaps”, which we will discuss later, the robots.txt file is the first thing indexed by “Crawlers” when visiting a website.
#1 Where would “robots.txt” be located on the domain “ablog.com”
#2 If a website was to have a sitemap, where would that be located?
#3 How would we only allow “Bingbot” to index the website?
ANSWER: User-agent: Bingbot
#4 How would we prevent a “Crawler” from indexing the directory “/dont-index-me/”?
ANSWER: Disallow: /dont-index-me/
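To see how the `User-agent` and `Disallow` directives from the two answers above behave together, here is a small sketch using Python's standard `urllib.robotparser`. The `ROBOTS_TXT` content is an assumption put together for illustration (it is not taken from the room), and the URLs on ablog.com are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Bingbot may index everything except one
# directory, and every other crawler is blocked from the whole site.
# Python's parser applies the first matching rule, so Disallow comes first.
ROBOTS_TXT = """\
User-agent: Bingbot
Disallow: /dont-index-me/
Allow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("Bingbot", "http://ablog.com/"))                     # True
print(parser.can_fetch("Bingbot", "http://ablog.com/dont-index-me/a.html")) # False
print(parser.can_fetch("Googlebot", "http://ablog.com/"))                   # False
```

Keep in mind that robots.txt is a request to well-behaved crawlers, not an access control mechanism.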
#5 What is the extension of a Unix/Linux system configuration file that we might want to hide from “Crawlers”?
[Task 5] Sitemaps
Comparable to geographical maps in real life, “Sitemaps” are just that – but for websites!
“Sitemaps” are indicative resources that are helpful for crawlers, as they specify the necessary routes to find content on the domain. The below illustration is a good example of the structure of a website, and how it may look on a “Sitemap”.
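Separately from the illustration, the idea of a sitemap listing routes can be shown in code. The sketch below parses a small sitemap with Python's standard `xml.etree.ElementTree` and prints each route. The sitemap content and the pages on ablog.com are made up for the example; only the `urlset`/`url`/`loc` layout and the namespace come from the public sitemap protocol.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical sitemap for an imaginary site.
SITEMAP_XML = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://ablog.com/</loc></url>
  <url><loc>http://ablog.com/products</loc></url>
  <url><loc>http://ablog.com/products/shoes</loc></url>
  <url><loc>http://ablog.com/blog</loc></url>
</urlset>
"""

# Namespace used by the sitemap protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def routes_from_sitemap(xml_text):
    """Return the path part (the route) of every <loc> entry."""
    root = ET.fromstring(xml_text)
    return [urlparse(loc.text.strip()).path for loc in root.findall(".//sm:loc", NS)]


for route in routes_from_sitemap(SITEMAP_XML):
    print(route)
```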
#1 What is the typical file structure of a “Sitemap”?
#2 What real life example can “Sitemaps” be compared to?
#3 Name the keyword for the path taken for content on a website
[Task 6] What is Google Dorking?
Using Google for Advanced Searching
As we have previously discussed, Google has a lot of websites crawled and indexed. Your average Joe uses Google to look up Cat pictures (I’m more of a Dog person myself…). Whilst Google will have many Cat pictures indexed ready to serve to Joe, this is a rather trivial use of the search engine in comparison to what it can be used for.
#1 What would be the format used to query the site bbc.co.uk about flood defences
ANSWER: site:bbc.co.uk flood defences
#2 What term would you use to search by file type?
#3 What term can we use to look for login pages?
ANSWER: intitle:login
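As a small illustration of how such queries are put together, the sketch below builds dork strings from the `site:` and `intitle:` operators and turns them into a Google search URL. The helper names `build_dork` and `search_url` are my own and purely hypothetical. Note that the operator is written without a space after the colon, which is the form Google expects.

```python
from urllib.parse import urlencode


def build_dork(terms="", site=None, intitle=None):
    """Combine optional operators and free-text terms into one query string."""
    parts = []
    if site:
        parts.append(f"site:{site}")
    if intitle:
        parts.append(f"intitle:{intitle}")
    if terms:
        parts.append(terms)
    return " ".join(parts)


def search_url(query):
    """URL-encode the query into Google's `q` search parameter."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_dork("flood defences", site="bbc.co.uk")
print(query)                              # site:bbc.co.uk flood defences
print(search_url(query))
print(build_dork(intitle="login"))        # intitle:login
```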
I hope this write-up explained the subject as a whole and that it is useful to you. For questions, comments and feedback, you can e-mail me at email@example.com.
You can also reach me via LinkedIn. Thank you to everyone who reads this, and I wish you healthy days.
See you in my next write-up…