I want to create a "social search engine" website. The idea is for people to search for jobs (or articles, or recipes -- I am not necessarily targeting jobs), and be able to view them on my site, comment on them, vote on them, share them, etc. An example of this kind of idea is http://www.yummly.com which targets recipes.
My problem is with copyright. Do I need to get permission from every site I want to crawl before I display their content on my own site -- even if I am linking back to them as the source? If so, how would I go about this?
Also, are there any good ways I can have the social features while still being treated as merely a search engine? (As opposed to "stealing" content.)
Most sites have a file called robots.txt, placed in the site root, that tells crawlers which parts of the site they may, or may not, access with automated (non-human) requests. Also make sure you stay within fair use when you publish content (publish only an abstract, always link to the source site, etc.). The Electronic Frontier Foundation has some good information on the legal side.
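As a rough illustration of the robots.txt check, here is a minimal sketch using Python's standard urllib.robotparser module. The bot name "MyJobBot", the example.com URLs and the sample rules are all made up for the example; a real crawler would download each site's own robots.txt instead of using a hard-coded string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, hard-coded so the sketch runs offline.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a parser that can answer can_fetch()."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(SAMPLE_ROBOTS)
    # "MyJobBot" is an assumed user-agent name for the crawler.
    for url in ("https://example.com/jobs/123", "https://example.com/private/x"):
        allowed = parser.can_fetch("MyJobBot", url)
        print(url, "allowed" if allowed else "disallowed")
```

To check a live site, you could call parser.set_url(...) with the address of its robots.txt and then parser.read() instead of parse(). Keep in mind that robots.txt only tells you what the site wants crawlers to do; it does not settle the copyright question.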
In some cases you may collect the data using the site's API. Just make sure the Terms of Service, the EULA or whatever agreement covers the API allows for your usage.
If you do as Google does, showing just a title and a short description in the search results, you are fine. Just give site owners the option to verify that they own a domain name and opt out.
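One way this could look in code, as a sketch only: build each result from a title, a shortened description and a link back to the source, and skip any domain whose owner has opted out. The function name make_result, the 160-character limit and the OPTED_OUT set are assumptions for the example; how you verify domain ownership is left out entirely.

```python
import textwrap
from typing import Optional
from urllib.parse import urlparse

# Hypothetical set of domains whose verified owners asked to be excluded.
OPTED_OUT = {"optout.example.org"}

MAX_SNIPPET_CHARS = 160  # assumed limit, not a legal threshold


def make_result(title: str, description: str, url: str,
                opted_out: set = OPTED_OUT) -> Optional[dict]:
    """Return a search result with a short snippet, or None if opted out."""
    host = urlparse(url).hostname or ""
    if host in opted_out:
        return None
    return {
        "title": title,
        # Collapse whitespace and cut the text down to the snippet limit.
        "snippet": textwrap.shorten(description, width=MAX_SNIPPET_CHARS,
                                    placeholder="..."),
        "url": url,  # always link back to the source
    }


if __name__ == "__main__":
    print(make_result("Python developer", "We are hiring. " * 30,
                      "https://jobs.example.com/posting/1"))
    print(make_result("Hidden job", "Not shown.",
                      "https://optout.example.org/posting/2"))
```

Comments, votes and shares can then be stored against the result's URL on your side, so the social features never require copying more of the source page than the snippet.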