hey wasteof, I’ve been making something I’m quite proud of: a completely independent search engine.
it’s definitely not the best, with just under 6,000 web pages indexed, but it doesn’t take any data from third parties such as Bing or Google, which most other “search engines” (like DuckDuckGo, Ecosia, Yahoo, etc.) do
you can try it out at https://novasearch.xyz. it’s still very much a work in progress, so please feel free to contribute on GitHub, whether by reporting a bug you find or creating a pull request: https://github.com/Nova-Search
idea: check if the website has a sitemap, and use it to crawl every page
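a rough sketch of what the sitemap idea could look like in Python, standard library only. this is not Nova's actual code: the function names `sitemaps_from_robots` and `parse_sitemap` are made up for illustration, and it assumes the site lists its sitemap in robots.txt and follows the standard sitemaps.org XML format. fetching the files is left out on purpose, so it only shows the parsing step.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol for <urlset> and <sitemapindex>.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemaps_from_robots(robots_txt):
    """Collect the URLs from any 'Sitemap:' lines in a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def parse_sitemap(xml_text):
    """Return (page_urls, child_sitemap_urls) from one sitemap document.

    A <sitemapindex> only points at other sitemaps, so its <loc> entries
    go in the second list; a plain <urlset> fills the first list.
    """
    root = ET.fromstring(xml_text)
    locs = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]
    if root.tag == SITEMAP_NS + "sitemapindex":
        return [], locs
    return locs, []
```

the crawler would download robots.txt, pass each sitemap it finds to `parse_sitemap`, queue the page URLs, and repeat for any child sitemaps.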
looks cool! btw, instead of saving icons, you can use DuckDuckGo's icon server. I forget exactly how to use it, but you can figure it out by searching for something and then copying the favicon URLs.
That would be the easy way to do things but I plan on keeping Nova completely independent for the time being - no third parties
Currently I’m crawling manually, so I just give it a site to start crawling from and it finds other sites
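for anyone curious, crawling from a seed like this is basically a breadth-first walk over links. here's a minimal sketch of that idea in Python, not the actual Nova crawler: `LinkParser`, `crawl`, and the `fetch` callback are all hypothetical names, and `fetch` is assumed to return the page HTML as a string (or `None` if the page couldn't be loaded).

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects absolute http(s) links from the <a href> tags of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links and drop the #fragment part.
        absolute, _fragment = urldefrag(urljoin(self.base_url, href))
        if urlparse(absolute).scheme in ("http", "https"):
            self.links.append(absolute)


def crawl(seed, fetch, max_pages=50):
    """Breadth-first crawl from `seed`; returns URLs in the order visited."""
    seen = {seed}
    queue = deque([seed])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page blocked or failed to load, skip it
            continue
        visited.append(url)
        parser = LinkParser(url)
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

the `seen` set stops it from queueing the same URL twice, and `max_pages` keeps a single seed from running forever. a real crawler would also want to check robots.txt and wait between requests.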
Thanks, I tried crawling both fortnite.com and its Epic Games Store page, but both seem to block my crawler. I’ll need a better way to crawl that doesn't get detected as easily
very much in progress I can tell but everything’s got to start somewhere :D
true, got a lot of web scraping to do. I also plan on opening up the API more so other things can use Nova search results without having to pay for something like Bing or Google (and it’ll be easier to use too). things I hope to add in the future range from image search and autocomplete to those useful little search apps Google has, like the timer