For context, I created a video search engine last year, then shut it down and put the data online. You can read about it here: https://www.bendangelo.me/2024/07/16/failed-attempt-at-creating-a-video-search-engine/

I put that project on hold because of scaling issues. Anyway, I’m back with another idea. I’ve been frustrated with how AI slop is ruining the internet, and recently it’s been hitting YouTube pretty hard with AI videos. I’m brainstorming a tool for people to self-host:

Self-hosted crawler: Pick which sites/videos to index (blogs, forums, YT channels, etc.).
AI chat interface: Ask questions like, “Show me Rust tutorials from 2023” or “Summarize recent posts about homelab backups.”
Optional sharing: Pool indexes with trusted friends/communities.

Why?
No Google/YouTube spam, only content you choose.
Works offline (archive forums, videos, docs).
Local AI (Mistral) or cloud (paid) for smarter searches.

Would this be useful to you? What sites would you crawl? Any killer features I’m missing?

Prototype in progress—just testing interest!

  • Sims@lemmy.ml
    6 days ago

    Absolutely. I’m not as AI-phobic, and I absolutely need both an information-gathering agent that searches, follows RSS/channels, explores/researches topics, and summarizes swaths of daily events, and a filter between me and the corporate internet. Let the AI discard obvious slop, ads/other propaganda, and general informational noise.

    However, you should think about how to share search results so all our agents don’t flood small services/commoners. We really need to get more information out of corporate silos and into public search/knowledge systems. Perhaps you can integrate some distributed search/knowledge properties? (now that you have AI to help build it)