Web Content Parsing and Traversal Algorithms
You are building a web scraping tool that needs to extract specific HTML elements from raw HTML strings. Your task is to implement a function that finds all occurrences of a particular HTML tag within an HTML document and returns them as a list of strings.
Given an HTML string and a tag name, you need to parse the HTML structure and locate every instance of that tag, returning each match as the complete element (opening tag, inner content, and closing tag). Matches should be returned in the order their opening tags appear in the document, which corresponds to a pre-order depth-first traversal: an outer element is listed before any matching elements nested inside it.
For example, given the HTML <div><p>Hello</p><span>World</span></div> and the tag name "p", your function should return ['<p>Hello</p>'].
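One way to approach the task described above is to build a small element tree and then walk it depth-first. The sketch below is illustrative only, not a reference solution. It assumes well-formed input in which every opening tag has a matching closing tag, with no void elements such as <br>, no self-closing tags, no comments, and no ">" inside attribute values. The names TAG_RE, Node, build_tree, and find_tags are hypothetical and do not come from the problem statement.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Simplified tag tokenizer (an assumption, not a full HTML grammar):
# group 1 is "/" for a closing tag, group 2 is the tag name.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9-]*)[^>]*>")


@dataclass
class Node:
    name: str
    start: int                      # offset of "<" in the opening tag
    end: int = -1                   # offset just past ">" of the closing tag
    children: List["Node"] = field(default_factory=list)


def build_tree(html: str) -> Node:
    """Parse well-formed HTML into a tree rooted at a synthetic node."""
    root = Node(name="#root", start=0, end=len(html))
    stack = [root]
    for match in TAG_RE.finditer(html):
        is_closing = match.group(1) == "/"
        name = match.group(2).lower()
        if not is_closing:
            node = Node(name=name, start=match.start())
            stack[-1].children.append(node)
            stack.append(node)
        elif len(stack) > 1 and stack[-1].name == name:
            stack.pop().end = match.end()
        else:
            raise ValueError(f"unexpected closing tag </{name}>")
    if len(stack) != 1:
        raise ValueError(f"unclosed tag <{stack[-1].name}>")
    return root


def find_tags(html: str, tag_name: str) -> List[str]:
    """Return every <tag_name> element in pre-order (document order)."""
    target = tag_name.lower()
    found: List[str] = []

    def visit(node: Node) -> None:
        # Record the node before its children: pre-order traversal.
        if node.name == target:
            found.append(html[node.start:node.end])
        for child in node.children:
            visit(child)

    for child in build_tree(html).children:
        visit(child)
    return found
```

Calling find_tags('<div><p>Hello</p><span>World</span></div>', 'p') returns ['<p>Hello</p>']. Because each element is sliced out of the original string by its start and end offsets, the returned text is exactly what appeared in the input, attributes included.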
Example 1:
Input: html = "<div><p>Hello</p><span>World</span></div>", tag_name = "p"
Output: ['<p>Hello</p>']
Explanation: There is one paragraph tag containing "Hello".
Example 2:
Input: html = "<div><p>First</p><p>Second</p></div>", tag_name = "p"
Output: ['<p>First</p>', '<p>Second</p>']
Explanation: Both paragraph tags are found in the order they appear.
Example 3:
Input: html = "<div><div><span>Nested</span></div></div>", tag_name = "div"
Output: ['<div><div><span>Nested</span></div></div>', '<div><span>Nested</span></div>']
Explanation: The outer div contains the inner div; both are returned with their complete content, outer first.
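Example 3 highlights a subtlety: a nested element closes before its parent does, so collecting matches at their closing tags would yield the inner div first. A minimal sketch of one way to handle this without building a tree is to track only the target tag on a stack, record each (start, end) span when it closes, and sort the spans by start offset at the end. It assumes well-formed input and a simplified regex tokenizer (no comments, no ">" inside attribute values). The names TAG_RE and find_tags_by_span are hypothetical.

```python
import re
from typing import List, Tuple

# Simplified tag tokenizer (an assumption, not a full HTML grammar):
# group 1 is "/" for a closing tag, group 2 is the tag name.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9-]*)[^>]*>")


def find_tags_by_span(html: str, tag_name: str) -> List[str]:
    """Return all <tag_name> elements ordered by where they open."""
    target = tag_name.lower()
    open_starts: List[int] = []          # offsets of unclosed target tags
    spans: List[Tuple[int, int]] = []    # (start, end) of completed elements

    for match in TAG_RE.finditer(html):
        if match.group(2).lower() != target:
            continue                     # other tags do not affect the result
        if match.group(1):               # closing tag: pair with latest opener
            if not open_starts:
                raise ValueError(f"unexpected closing tag </{target}>")
            spans.append((open_starts.pop(), match.end()))
        else:                            # opening tag
            open_starts.append(match.start())

    if open_starts:
        raise ValueError(f"unclosed tag <{target}>")

    # Elements complete inner-first; sorting by start offset restores
    # document order, which matches a pre-order depth-first traversal.
    spans.sort()
    return [html[start:end] for start, end in spans]
```

For the Example 3 input, the spans are recorded as (5, 35) and then (0, 41); after sorting, the outer div at offset 0 comes first, which matches the expected output.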