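_1. Register an account

First, register an account on the Algolia website. You can sign up and log in directly with GitHub or with an email address.

A new account automatically gets an Application; you can also create a new one yourself.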

image-20250824130816421

image-20250824130142558

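After confirming, click the Create Application button on the next page, then click NEXT to create the new Application. Once it has been created, click Skip for now; otherwise an index name is generated from the domain and the crawler configuration will be wrong as well.

image-20250824130516564

_2. Create an index

Create the index by following the steps in the screenshot below.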

image-20250824131152530

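_3. Verify the domain

Verify the domain as prompted.

image-20250824131514561

Then click Skip for now to move on to the next step.

_4. Configure the crawler

Create a new crawler as shown below.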

image-20250824133254539

After creating it, click the crawler's name to open its configuration.

image-20250824135124409

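The crawler configuration is as follows. First look at the auto-generated configuration and note its appId and apiKey, then fill them into the configuration below. Next, change the domain and indexName to your own values and paste the result into the Algolia code editor. When the configuration is done, click Start Crawling in the top-right corner to begin crawling the site.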

      new Crawler({
  appId: "",
  apiKey: "",
  indexPrefix: "",
  rateLimit: 8,
  startUrls: ["https://xiaoying.org.cn/"],
  renderJavaScript: false,
  sitemaps: ["https://xiaoying.org.cn/sitemap.xml"],
  exclusionPatterns: [],
  ignoreCanonicalTo: true,
  discoveryPatterns: ["https://xiaoying.org.cn/**"],
  schedule: "on the first day of the week",
  actions: [
    {
      indexName: "hugos",
      pathsToMatch: ["https://xiaoying.org.cn/**"],
      recordExtractor: ({ url, $, helpers }) => {
        // 1. Extract the title and base content (used by splitContentIntoRecords)
        const baseRecord = {
          url,
          title: $("head title").text().trim(),
        };

        const $bodyClone = $("body").clone();
        // Remove code blocks so that the code inside them is not parsed as body text
        $bodyClone.find("pre, code").remove();

        const splitRecords = helpers.splitContentIntoRecords({
          baseRecord,
          $elements: $bodyClone,
          maxRecordBytes: 1000,
          textAttributeName: "text",
          orderingAttributeName: "part",
        });

        // 2. Extract code snippets
        const code = helpers.codeSnippets({
          tag: "pre",
          languageClassPrefix: "language-",
        });

        // 3. DocSearch-style structured content
        let lvl0 = "Documentation";
        const breadcrumbs = [];
        $("#breadcrumbs li.breadcrumb-item").each((i, el) => {
          const name = $(el).find('[itemprop="name"]').text().trim();
          breadcrumbs.push(name);
        });
        if (breadcrumbs.length >= 2) lvl0 = breadcrumbs[breadcrumbs.length - 2];

        $(".docs-content h2 i").remove();

        const docsearchRecords = helpers.docsearch({
          aggregateContent: true,
          indexHeadings: true,
          recordVersion: "v3",
          recordProps: {
            lvl0: { selectors: "", defaultValue: lvl0 },
            lvl1: ".docs-content h1",
            lvl2: ".docs-content h2",
            lvl3: ".docs-content h3",
            lvl4: ".docs-content h4",
            lvl5: ".docs-content h5",
            content: ".main-content p, .main-content li",
          },
        });

        // Merge all extracted content into a single array
        return [
          ...splitRecords,
          ...docsearchRecords,
          ...(code.code ? [{ code: code.code }] : []),
        ];
      },
    }
  ],
  initialIndexSettings: {
    hugos: {
      attributesForFaceting: ["type", "lang", "chunkIndex", "totalChunks"],
      attributesToRetrieve: [
        "hierarchy.lvl0",
        "hierarchy.lvl1",
        "hierarchy.lvl2",
        "hierarchy.lvl3",
        "hierarchy.lvl4",
        "hierarchy.lvl5",
        "hierarchy.lvl6",
        "content",
        "anchor",
        "url",
        "chunkIndex",
        "totalChunks",
      ],
      attributesToHighlight: ["hierarchy", "content"],
      attributesToSnippet: ["content:20"],
      searchableAttributes: [
        "unordered(hierarchy.lvl0)",
        "unordered(hierarchy.lvl1)",
        "unordered(hierarchy.lvl2)",
        "unordered(hierarchy.lvl3)",
        "unordered(hierarchy.lvl4)",
        "unordered(hierarchy.lvl5)",
        "unordered(hierarchy.lvl6)",
        "unordered(anchor)",
        "content",
      ],
      distinct: true,
      attributeForDistinct: "url",
      customRanking: [
        "desc(weight.pageRank)",
        "desc(weight.level)",
        "asc(weight.position)",
        "asc(chunkIndex)",
      ],
      ranking: [
        "words",
        "filters",
        "typo",
        "attribute",
        "proximity",
        "exact",
        "custom",
      ],
      highlightPreTag: '<span class="algolia-highlight">',
      highlightPostTag: "</span>",
      minWordSizefor1Typo: 3,
      minWordSizefor2Typos: 7,
      allowTyposOnNumericTokens: false,
      minProximity: 1,
      ignorePlurals: true,
      advancedSyntax: true,
      removeWordsIfNoResults: "allOptional",
    }
  },
});
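
The breadcrumb rule inside recordExtractor is easy to misread, so here is a minimal sketch of it in isolation. `pickLvl0` is a hypothetical helper name used only for this illustration; it is not part of the Algolia Crawler API. It reproduces the rule from the configuration above: use the second-to-last breadcrumb as lvl0, and fall back to "Documentation" when fewer than two breadcrumbs exist.

```javascript
// Hypothetical helper: mirrors the breadcrumb rule used inside recordExtractor.
function pickLvl0(breadcrumbs, fallback = "Documentation") {
  // With at least two entries, the second-to-last breadcrumb is the parent section.
  return breadcrumbs.length >= 2 ? breadcrumbs[breadcrumbs.length - 2] : fallback;
}

console.log(pickLvl0(["Home", "Guides", "Algolia search"])); // prints "Guides"
console.log(pickLvl0(["Home"])); // prints "Documentation"
```

If your theme renders breadcrumbs with different markup, the `#breadcrumbs li.breadcrumb-item` selector in the configuration finds nothing and every record ends up with the fallback value.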
    

If you need AI search, add the following configuration, then change the domain and indexName to your own values.

      // Add inside the actions array above

{
      indexName: "hugos-md",
      pathsToMatch: ["https://hugo.xiaoying.org.cn/pages/**"],
      recordExtractor: ({ $, url, helpers }) => {
        // Target only the main content, excluding navigation
        const text = helpers.markdown("main");
        if (text === "") return [];

        const language = $("html").attr("lang") || "en";
        const title = $("head > title").text();

        // Get the main heading for better searchability
        const h1 = $(".docs-content h1").first().text();

        return helpers.splitTextIntoRecords({
          text,
          baseRecord: {
            url,
            objectID: url,
            title: title || h1,
            heading: h1, // Add main heading as separate field
            lang: language,
          },
          maxRecordBytes: 8000,
          orderingAttributeName: "part",
        });
      },
 }
    
      // Add inside initialIndexSettings above
"hugos-md": {
      attributesForFaceting: ["type", "lang"],
      ignorePlurals: true,
      minProximity: 1,
      indexLanguages: ["zh"],
      queryLanguages: ["zh"],
      distinct: true,
      attributeForDistinct: "url",
      removeStopWords: false,
      searchableAttributes: ["title", "heading", "unordered(text)"],
      removeWordsIfNoResults: "lastWords",
      attributesToHighlight: ["title", "text"],
      typoTolerance: false,
      advancedSyntax: false,
 },
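
To give a feel for what `maxRecordBytes` and `orderingAttributeName: "part"` mean, the sketch below splits a string into size-limited records. It is an illustration only, not the implementation of `helpers.splitTextIntoRecords`: the function name `splitTextByBytes` is hypothetical, and this version counts only the bytes of the text itself, not of the whole record. It does show why Chinese text fills a record faster than English text: each Chinese character takes 3 bytes in UTF-8.

```javascript
// Illustrative only: not the implementation of helpers.splitTextIntoRecords.
function splitTextByBytes(text, baseRecord, maxRecordBytes) {
  const encoder = new TextEncoder();
  const records = [];
  let chunk = "";
  let chunkBytes = 0;
  // Iterate by code point so multi-byte characters are never cut in half.
  for (const ch of text) {
    const size = encoder.encode(ch).length;
    if (chunkBytes + size > maxRecordBytes && chunk !== "") {
      records.push({ ...baseRecord, text: chunk, part: records.length });
      chunk = "";
      chunkBytes = 0;
    }
    chunk += ch;
    chunkBytes += size;
  }
  if (chunk !== "") {
    records.push({ ...baseRecord, text: chunk, part: records.length });
  }
  return records;
}

// Six Chinese characters are 18 bytes, so a 9-byte limit yields two records.
const demo = splitTextByBytes("搜索配置说明", { url: "https://example.com/pages/demo" }, 9);
console.log(demo.map((r) => `${r.part}:${r.text}`)); // [ '0:搜索配', '1:置说明' ]
```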
    

The place to add the AI page is shown in the screenshot below; fill it in as prompted.

image-20260425212221275

_5. Index settings

Go back to the search page and check whether the index contains data.

image-20250824134034515
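
Besides the dashboard, the index can also be checked from a script. The sketch below only builds a request for Algolia's REST search endpoint and does not send it. `buildSearchRequest` is a hypothetical helper name, the credentials are placeholders, and `hugos` is the index name from the configuration above.

```javascript
// Sketch: build (but do not send) a search request for Algolia's REST API.
function buildSearchRequest({ appId, apiKey, indexName, query }) {
  const params = new URLSearchParams({ query, hitsPerPage: "3" }).toString();
  return {
    url: `https://${appId}-dsn.algolia.net/1/indexes/${encodeURIComponent(indexName)}/query`,
    options: {
      method: "POST",
      headers: {
        "X-Algolia-Application-Id": appId,
        "X-Algolia-API-Key": apiKey,
        "Content-Type": "application/json",
      },
      // Search parameters travel as a URL-encoded string in the "params" field.
      body: JSON.stringify({ params }),
    },
  };
}

const request = buildSearchRequest({
  appId: "YOUR_APP_ID", // placeholder
  apiKey: "YOUR_SEARCH_API_KEY", // placeholder
  indexName: "hugos",
  query: "",
});
console.log(request.url);

// To send it (Node 18+ or a browser), something like:
// fetch(request.url, request.options)
//   .then((res) => res.json())
//   .then((data) => console.log(data.nbHits));
```

An empty query matches every record, so a non-zero `nbHits` in the response suggests the crawl has populated the index.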

Next, configure the index and choose which attributes should be searchable.

image-20250824134400340

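Next, configure facets. Facets are one of the core features behind advanced search and filtering: they let users narrow the search scope quickly, which improves the search experience. Pay particular attention here: lang must be selected, otherwise search on the website returns no results.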

image-20250824134630115
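
One plausible reason lang matters: if the site's search box sends a facet filter such as `lang:zh-CN` with every query (an assumption about the front end, not something the steps above state), then an index where lang is not a facet has nothing to match against. The sketch below is a simplified model of that behavior, not Algolia's engine; `applyFacetFilter` and the sample records are hypothetical.

```javascript
// Simplified model of a facet filter such as "lang:zh-CN"; not Algolia's engine.
function applyFacetFilter(records, filter, attributesForFaceting) {
  const separator = filter.indexOf(":");
  const attribute = filter.slice(0, separator);
  const value = filter.slice(separator + 1);
  // Assumption of this sketch: filtering on an attribute that is not
  // declared as a facet matches nothing.
  if (!attributesForFaceting.includes(attribute)) return [];
  return records.filter((record) => record[attribute] === value);
}

const sampleRecords = [
  { url: "/guide/", lang: "zh-CN" },
  { url: "/en/guide/", lang: "en" },
];

console.log(applyFacetFilter(sampleRecords, "lang:zh-CN", ["type", "lang"]).length); // 1
console.log(applyFacetFilter(sampleRecords, "lang:zh-CN", ["type"]).length); // 0
```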

_6. Code configuration

Find the settings below as shown in the screenshots.

image-20250824135518386

image-20250824135844855

image-20250824135939156

      export default defineConfig({
   ...
    lang: "zh-CN",
   ...
    themeConfig: ({
      ...
        search: {
            provider: 'algolia',
            options: {
                appId: '...',
                apiKey: '...',
                indexName: '...',
                askAi: {
                  indexName: '', 
                  assistantId: 'the Assistant ID obtained above'
               },
            },
        } 
      ...
    })
})
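
If you would rather not hard-code the three values, one option is to read them from environment variables and fail early when one is missing. This is a minimal sketch under stated assumptions: `algoliaOptionsFromEnv` and the variable names `ALGOLIA_APP_ID`, `ALGOLIA_SEARCH_KEY` and `ALGOLIA_INDEX_NAME` are hypothetical and not defined by Algolia or by the site framework.

```javascript
// Hypothetical helper; the environment variable names are made up for this example.
function algoliaOptionsFromEnv(env) {
  const required = ["ALGOLIA_APP_ID", "ALGOLIA_SEARCH_KEY", "ALGOLIA_INDEX_NAME"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing Algolia settings: ${missing.join(", ")}`);
  }
  return {
    appId: env.ALGOLIA_APP_ID,
    apiKey: env.ALGOLIA_SEARCH_KEY,
    indexName: env.ALGOLIA_INDEX_NAME,
  };
}

// Demo with an inline object instead of process.env.
const searchOptions = algoliaOptionsFromEnv({
  ALGOLIA_APP_ID: "YOUR_APP_ID",
  ALGOLIA_SEARCH_KEY: "YOUR_SEARCH_API_KEY",
  ALGOLIA_INDEX_NAME: "hugos",
});
console.log(searchOptions);
```

In the configuration above, the result could then be passed as `options: algoliaOptionsFromEnv(process.env)`. Because this key is shipped to the browser, it should be a search-only API key rather than an admin key.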
    

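Notice

Author: liyao

Copyright: Unless otherwise stated, all articles on this blog are licensed under CC BY-NC-SA 4.0. Please credit the source when reposting!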