Given an array of strings, organize them into clusters where each cluster contains words that have identical letter compositions. Two words belong to the same cluster if they contain exactly the same letters with the same frequencies, regardless of order.
Return the clusters in any order. The words within each cluster can also appear in any order.
Constraints:
- 0 <= words.length <= 10000
- 0 <= words[i].length <= 100
- words[i] consists of lowercase English letters only

Expected time complexity: O(n * k), where n is the number of words and k is the maximum length of a word.
Expected space complexity: O(n * k) for storing the results and intermediate data structures.

Example 1:
Input: words = ["eat", "tea", "tan", "ate", "nat", "bat"]
Output: [["eat", "tea", "ate"], ["tan", "nat"], ["bat"]]
Explanation: "eat", "tea", and "ate" each use the letters a, e, and t exactly once; "tan" and "nat" each use a, n, and t; "bat" shares its letter composition with no other word.
Example 2:
Input: words = [""]
Output: [[""]]
Explanation: An empty string forms its own cluster.
Example 3:
Input: words = ["a"]
Output: [["a"]]
Explanation: A single word forms a single cluster.
Example 4:
Input: words = ["listen", "silent", "enlist"]
Output: [["listen", "silent", "enlist"]]
Explanation: All three words contain the same letters with identical frequencies.
Hint 1: Canonical Representation To determine if two words have the same letter composition, you need a way to represent each word that is identical for all words in the same cluster. Think about how you could create a signature or key that captures the letter composition independent of order.
Hint 2: Sorting Approach One straightforward approach is to sort the characters in each word alphabetically. Words with the same letter composition will produce identical sorted strings. This sorted string can serve as a key in a hash map.
Hint 3: Character Frequency Approach Another approach is to count the frequency of each character. You could create a tuple or string representation of the character counts (like "a1b1c1" for "abc"). This frequency signature uniquely identifies words with the same composition.
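To make the two hints concrete, here is a minimal sketch of both canonical-key constructions; the helper names `sorted_key` and `frequency_key` are illustrative, not part of the required solution:

```python
def sorted_key(word: str) -> str:
    # Sorting the characters gives the same string for any anagram.
    return ''.join(sorted(word))

def frequency_key(word: str) -> tuple:
    # A 26-slot count of 'a'..'z' also identifies the letter composition.
    counts = [0] * 26
    for ch in word:
        counts[ord(ch) - ord('a')] += 1
    return tuple(counts)

print(sorted_key("tea"))                             # "aet", same as sorted_key("eat")
print(frequency_key("tea") == frequency_key("ate"))  # True
print(frequency_key("tea") == frequency_key("tan"))  # False
```

Either value works as a hash-map key; the full solution below uses both in turn.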
Full Solution

Approach Explanation:
The solution uses a hash map (dictionary) to group words with identical letter compositions. The key insight is creating a canonical representation that is the same for all words in a cluster.
Method 1: Sorted Characters (Primary Solution)
- For each word, sort its characters alphabetically
- Use the sorted string as a key in the hash map
- Words with the same sorted form belong to the same cluster
- Time: O(n × k log k) due to sorting each word
- Space: O(n × k) for storing results
Method 2: Character Frequency (Alternative)
- Count frequency of each letter (a-z) using an array of size 26
- Convert the frequency array to a tuple to use as a hash key
- Words with identical frequency patterns belong together
- Time: O(n × k) which is faster for longer words
- Space: O(n × k) for storing results
Both methods correctly group words by their letter composition. The frequency-based approach is asymptotically faster, especially for longer words, while the sorting approach is simpler to implement and understand.
```python
from typing import List
from collections import defaultdict


def clusterWords(words: List[str]) -> List[List[str]]:
    """
    Cluster words by their letter composition using sorted characters as keys.

    Time Complexity: O(n * k * log(k)) where n is the number of words, k is the max word length
    Space Complexity: O(n * k) for storing all words and their keys
    """
    # Dictionary mapping each canonical form to its list of words
    clusters = defaultdict(list)

    # Process each word
    for word in words:
        # Sort characters to create the canonical key:
        # words with the same letters produce identical sorted strings
        key = ''.join(sorted(word))
        # Add the word to the appropriate cluster
        clusters[key].append(word)

    # Return all clusters as a list of lists
    return list(clusters.values())
```
Alternative solution using character frequency counts:

```python
def clusterWordsFrequency(words: List[str]) -> List[List[str]]:
    """
    Cluster words using character frequency as the key.

    Time Complexity: O(n * k) where n is the number of words, k is the max word length
    Space Complexity: O(n * k) for storing all words and their keys
    """
    clusters = defaultdict(list)

    for word in words:
        # Build a 26-slot frequency count for lowercase letters;
        # this avoids the O(k log k) sort and is faster for long words
        count = [0] * 26
        for char in word:
            count[ord(char) - ord('a')] += 1
        # Convert the count array to a tuple (hashable) to use as a dictionary key
        key = tuple(count)
        clusters[key].append(word)

    return list(clusters.values())
```
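As a quick sanity check, either function can be exercised with the input from Example 1. Since clusters (and the words within them) may be returned in any order, the comparison below normalizes the result to a set of frozensets; the sorted-key implementation is repeated compactly so the snippet runs standalone:

```python
from collections import defaultdict
from typing import List


def clusterWords(words: List[str]) -> List[List[str]]:
    # Same sorted-key grouping as the primary solution, condensed for this demo.
    clusters = defaultdict(list)
    for word in words:
        clusters[''.join(sorted(word))].append(word)
    return list(clusters.values())


words = ["eat", "tea", "tan", "ate", "nat", "bat"]
result = clusterWords(words)

# Order is unspecified, so compare as a set of frozensets.
normalized = {frozenset(cluster) for cluster in result}
expected = {frozenset({"eat", "tea", "ate"}), frozenset({"tan", "nat"}), frozenset({"bat"})}
print(normalized == expected)  # True
```

The same normalization works for `clusterWordsFrequency`, since both functions produce identical groupings, just keyed differently.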