# 28 - Implement strStr()

解法一 - 暴力法

走過 haystack 中每個 char，如果跟 needle 的第一個 char 一樣，那就比較 haystack 接下來的 substring 跟 needle，效率是 O(mn)，m 是 haystack 長度，n 是 needle 長度。

class Solution {
public:
    int strStr(string haystack, string needle) {
        if(needle == "") return 0;

        int pos=0;
        for(; pos<haystack.length(); pos++) {
            if(haystack[pos] == needle[0]) {
                string candidate = haystack.substr(pos, needle.length());
                if(candidate == needle) break;
            }
        }

        return pos == haystack.length() ? -1 : pos;
    }
};

// check at each char == first char of needle if the needle is in haystack

這樣實作的效率很差：

Runtime: 56 ms, faster than 23.21% of C++ online submissions for Implement strStr(). Memory Usage: 645.8 MB, less than 5.03% of C++ online submissions for Implement strStr().

如果換個實作方式，可以大幅提高效率：

class Solution {
public:
    int strStr(string haystack, string needle) {
        int m = haystack.size(), n = needle.size();
        for (int i = 0; i <= m - n; i++) {
            int j = 0;
            for (; j < n; j++) {
                if (haystack[i + j] != needle[j]) {
                    break;
                }
            }
            if (j == n) {
                return i;
            }
        }
        return -1;
    }
};

提升的原因主要就是省去了 substr()，因為如果每次看到可能的 candidate 就得取出 needle 長度的子字串，當可能的 candidate 出現很多次時，就會凸顯出效率低落的問題，舉例來說：

haystack = "helololololololololololololololololololololllo" needle = "llllllllllllllllllllllllllllllllllllllllll"

應該不難想像每遇到一個 l 就得取一次 substr，跟上面這種實作一旦遇到 o 就可以跳出內層迴圈的效率差。

解法二 - KMP 演算法(linear time - O(m+n)!!!)

下面這個影片是目前看到講解得最清楚的一個：

演算法核心觀念：

利用 needle 中重複出現的 pattern，讓每次發生不 match 時，i 都能夠不用往回走，只需要移動 j 就可以了，影片中最後的例子講得非常清楚，不過因為這個演算法可能很難在面試中背出來，就先不實作。不過真的是很炫ＸＤ

記錄一下程式碼：

class Solution {
public:
    int strStr(string haystack, string needle) {
        int m = haystack.size(), n = needle.size();
        if (!n) {
            return 0;
        }
        vector<int> lps = kmpProcess(needle);
        for (int i = 0, j = 0; i < m;) {
            if (haystack[i] == needle[j]) { 
                i++, j++;
            }
            if (j == n) {
                return i - j;
            }
            if (i < m && haystack[i] != needle[j]) {
                j ? j = lps[j - 1] : i++;
            }
        }
        return -1;
    }
private:
    vector<int> kmpProcess(string needle) {
        int n = needle.size();
        vector<int> lps(n, 0);
        for (int i = 1, len = 0; i < n;) {
            if (needle[i] == needle[len]) {
                lps[i++] = ++len;
            } else if (len) {
                len = lps[len - 1];
            } else {
                lps[i++] = 0;
            }
        }
        return lps;
    }
};

解法三 - Rabin-Karp algorithm

演算法的核心概念：

把 needle 用 hash table 存起來，然後依序走過 haystack，看有沒有跟 needle 一樣長度的子字串的 hash 值跟 needle 一樣，

這題感覺也有點像一個 pattern，因為我們要看 needle 有沒有出現過，所以我們可以用 hash table 來做到 O(1) 的檢查。

Previous# 14 - Longest Common Prefix Next# 30 - Substring with Concatenation of All Words

Last updated 4 years ago