46 changes: 7 additions & 39 deletions .github/workflows/update-search.yml
@@ -9,61 +9,29 @@ on:
paths:
- '**/*.md'
- '**/*.html'
- 'scripts/generate_search_db.rb'
- 'scripts/fetch_github_blog_content.js'
- 'scripts/package.json'
- '.github/workflows/update-search.yml'

# Add permissions needed for the workflow
permissions:
contents: write # This allows the action to commit and push changes
packages: read # This allows the action to read from npm

jobs:
update-search:
runs-on: ubuntu-latest
env:
NODE_AUTH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
NPM_CONFIG_CACHE: ${{ github.workspace }}/.npm

steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0 # Fetch all history for proper git operations

- name: Set up Ruby
uses: ruby/setup-ruby@v1
with:
ruby-version: '3.2'
bundler-cache: true

- name: Set up Node.js
uses: actions/setup-node@v4
with:
node-version: '20'
cache: 'npm'
cache-dependency-path: 'scripts/package.json'
registry-url: 'https://registry.npmjs.org'

- name: Install Ruby dependencies
- name: Clone comphy-search repository
run: |
gem install bundler
bundle config path vendor/bundle
bundle install
gem install nokogiri
git clone https://github.com/comphy-lab/comphy-search.git
mkdir -p assets/js

- name: Install Node.js dependencies
- name: Copy search database
run: |
cd scripts
npm install

- name: Fetch blog content from GitHub repository
run: |
cd scripts
node fetch_github_blog_content.js

- name: Generate search database
run: ruby scripts/generate_search_db.rb

cp comphy-search/search_db.json assets/js/search_db.json

- name: Commit and push changes directly (bypassing branch protection)
run: |
@@ -73,7 +41,7 @@ jobs:
if git diff --staged --quiet; then
echo "No changes to commit"
else
git commit -m "Update search database"
git commit -m "Update search database from comphy-search repository"
# Use PAT to bypass branch protection
git push https://${{ secrets.BYPASS_TOKEN }}@github.com/${GITHUB_REPOSITORY}.git HEAD:main
fi
67 changes: 29 additions & 38 deletions README.md
@@ -254,6 +254,12 @@ Search results are prioritized and filtered as follows:
- Updated automatically every 12 hours via GitHub Actions
5. Regular content (headings and paragraphs)

The search database is maintained in a separate repository [comphy-lab/comphy-search](https://github.com/comphy-lab/comphy-search) and is automatically updated in this website via GitHub Actions. This approach:
- Centralizes search database generation in a dedicated repository
- Ensures consistent search functionality across the website
- Automatically updates the search database daily or when content changes
- Simplifies maintenance by separating search logic from the website code
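
To refresh the search database by hand, the clone-and-copy steps from the workflow can be wrapped in a small function. This is a minimal sketch, not part of the repository: the function name `update_search_db` and its two optional arguments are hypothetical, and the shallow clone (`--depth 1`) is an assumption made here for speed (the workflow uses a plain `git clone`). The repository URL and the `search_db.json` and `assets/js/` paths are the ones used in `update-search.yml`.

```bash
# Sketch: copy search_db.json from comphy-search into a website checkout.
# Hypothetical helper; arguments:
#   $1  repository URL or path (defaults to the upstream comphy-search repo)
#   $2  root of the website checkout (defaults to the current directory)
update_search_db() {
  repo_url="${1:-https://github.com/comphy-lab/comphy-search.git}"
  site_root="${2:-.}"

  # Clone into a throwaway directory so the website tree stays clean.
  workdir="$(mktemp -d)" || return 1

  # Assumption: a shallow clone suffices, since only the latest file is needed.
  if ! git clone --quiet --depth 1 "$repo_url" "$workdir/comphy-search"; then
    rm -rf "$workdir"
    return 1
  fi

  # Same destination the workflow writes to.
  mkdir -p "$site_root/assets/js" &&
    cp "$workdir/comphy-search/search_db.json" \
       "$site_root/assets/js/search_db.json"
  status=$?

  rm -rf "$workdir"
  return "$status"
}
```

Calling `update_search_db` with no arguments from the project root would clone the upstream repository and overwrite `assets/js/search_db.json`; committing the result is left to the caller.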

### Command Palette Functionality
The website includes a command palette feature that provides quick access to actions and navigation through keyboard shortcuts:

@@ -292,20 +298,12 @@ Search behavior and features:
- Search results appear instantly as you type
- Results are ranked by relevance and match percentage

The search database is automatically generated during the build process by `scripts/generate_search_db.rb`. This script:
- Indexes all HTML and markdown content
- Identifies and prioritizes team members, teaching content, and research papers
- Extracts tags from research papers
- Processes teaching pages and course details
- Fetches and indexes blog posts from blogs.comphy-lab.org
- Generates a JSON database used by the search functionality

### External Blog Integration
The search functionality includes content from our external blog at blogs.comphy-lab.org:
- Blog posts are fetched and indexed during build
- Blog posts are fetched and indexed in the comphy-search repository
- Each post's title and content are searchable
- Results link directly to the blog post
- Blog content is refreshed with each build
- Blog content is refreshed with each update to the search database

### Tags System
Research papers can be tagged with multiple topics. Tags are defined in the markdown files using the following format:
@@ -394,17 +392,35 @@ The website uses three GitHub Actions workflows for automation:
3. **Update Search Database** (`.github/workflows/update-search.yml`)
- Maintains site's search functionality
- Triggers:
- Every 4 hours automatically
- Daily at 4:00 UTC automatically
- On content file changes (MD/HTML)
- Manual trigger available
- Generates and updates `search_db.json`
- Fetches the search database from [comphy-lab/comphy-search](https://github.com/comphy-lab/comphy-search)
- Updates `search_db.json` in the website repository
- Commits changes back to repository

These workflows work together to ensure:
- Automated site builds and deployments
- Up-to-date search functionality
- Consistent deployment to GitHub Pages

3. **Blog Content Indexing**
- Blog content from [blogs.comphy-lab.org](https://blogs.comphy-lab.org) is indexed in the [comphy-search](https://github.com/comphy-lab/comphy-search) repository
- Source: [comphy-lab/CoMPhy-Lab-Blogs](https://github.com/comphy-lab/CoMPhy-Lab-Blogs)
- Filtering criteria:
- Only indexes markdown files where `publish: false` is NOT set in frontmatter
- Automatically excludes any files with "todo" in the filename (case-insensitive)
- The search index is automatically updated:
- Daily via GitHub Actions
- When changes are pushed to markdown or HTML files
- Can be manually triggered from the Actions tab
- This approach improves search quality by:
- Centralizing search database generation
- Accessing the raw markdown directly from the source
- Respecting publish status in frontmatter
- Processing content in a more structured way
- Avoiding web scraping issues or rate limits
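
The two filtering criteria above can be illustrated with a small shell predicate. This is a sketch under stated assumptions, not the code used in comphy-search: the function name `should_index` is hypothetical, frontmatter is assumed to be a block delimited by `---` lines starting on the first line of the file, and only the exact lowercase form `publish: false` is matched.

```bash
# Sketch: decide whether a blog markdown file should be indexed.
# Returns 0 (index it) or 1 (skip it). Hypothetical helper.
should_index() {
  file="$1"

  # Rule 1: skip any file whose name contains "todo", ignoring case.
  name="$(basename "$file" | tr '[:upper:]' '[:lower:]')"
  case "$name" in
    *todo*) return 1 ;;
  esac

  # Rule 2: skip files whose frontmatter sets publish: false.
  # awk exits 0 only when that line is found inside the frontmatter block.
  if awk '
      NR == 1 { if ($0 != "---") exit; next }   # file has no frontmatter
      $0 == "---" { exit }                       # end of frontmatter reached
      /^publish:[ \t]*false[ \t]*$/ { found = 1; exit }
      END { exit(found ? 0 : 1) }
    ' "$file"; then
    return 1
  fi

  # Anything else is indexed, including files with no frontmatter at all.
  return 0
}
```

A file is skipped only when `publish: false` is set explicitly, so posts without a `publish` key are indexed, matching the "is NOT set" wording above. A `publish: false` line in the body of a post, outside the frontmatter, is ignored.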

## Contributing

### Issue Templates
@@ -436,29 +452,4 @@ To submit a PR:
3. Test changes locally
4. Create a PR using the template
5. Link any related issues
6. Wait for review

3. **Blog Content Indexing**
- Blog content from [blogs.comphy-lab.org](https://blogs.comphy-lab.org) is now indexed directly from the GitHub repository
- Source: [comphy-lab/CoMPhy-Lab-Blogs](https://github.com/comphy-lab/CoMPhy-Lab-Blogs)
- Filtering criteria:
- Only indexes markdown files where `publish: false` is NOT set in frontmatter
- Automatically excludes any files with "todo" in the filename (case-insensitive)
- The search index is automatically updated:
- Every 12 hours via GitHub Actions
- When changes are pushed to markdown or HTML files
- Can be manually triggered from the Actions tab
- To manually update the search index locally:
```bash
# From the project root directory
cd scripts
npm install
node fetch_github_blog_content.js
cd ..
ruby scripts/generate_search_db.rb
```
- This approach improves search quality by:
- Accessing the raw markdown directly from the source
- Respecting publish status in frontmatter
- Processing content in a more structured way
- Avoiding web scraping issues or rate limits
6. Wait for review