You are given a startUrl and an interface HtmlParser that can fetch all URLs from a given web page.

Implement a web crawler that returns all URLs reachable from startUrl that share the same hostname as startUrl. The order of URLs in the result does not matter.

Below is the interface for HtmlParser: [Source: darkinterview.com]

    interface HtmlParser {
        // Returns all URLs from a given page URL.
        public List<String> getUrls(String url);
    }

Your function will be called like List<String> crawl(String startUrl, HtmlParser htmlParser). Your crawler should:

- Start from the page startUrl.
- Use HtmlParser.getUrls(url) to obtain all links from a page.
- Crawl only links that are under the same hostname as startUrl.
- Not crawl the same link twice.

Assume that all URLs use the http protocol and do not include a port. URLs may contain fragments (e.g. http://example.com/page#section1). Should these be treated as the same URL or different? Clarify with the interviewer if needed.

After implementing the basic single-threaded version, implement a multithreaded or concurrent version of the web crawler to improve performance.
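A minimal single-threaded sketch is a plain BFS over pages, assuming the HtmlParser interface above, that all URLs start with "http://" and carry no port, and that URLs differing only in fragment are treated as distinct. The class name Crawler and the hostname helper are illustrative, not part of the problem statement:

```java
import java.util.*;

// Interface as given in the problem statement.
interface HtmlParser {
    List<String> getUrls(String url);
}

class Crawler {
    // Extracts the hostname from a URL of the form http://hostname/path.
    // Relies on the stated guarantees: http protocol, no port.
    private static String hostname(String url) {
        String rest = url.substring("http://".length());
        int slash = rest.indexOf('/');
        return slash == -1 ? rest : rest.substring(0, slash);
    }

    public List<String> crawl(String startUrl, HtmlParser htmlParser) {
        String host = hostname(startUrl);
        Set<String> seen = new HashSet<>();      // URLs already discovered
        Deque<String> queue = new ArrayDeque<>(); // BFS frontier
        seen.add(startUrl);
        queue.add(startUrl);
        while (!queue.isEmpty()) {
            String url = queue.poll();
            for (String next : htmlParser.getUrls(url)) {
                // Visit each same-host URL exactly once.
                if (hostname(next).equals(host) && seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return new ArrayList<>(seen);
    }
}
```

The seen set doubles as the result, which guarantees "do not crawl the same link twice" by construction; a DFS with the same set works equally well.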