I work on a website that is backed by a database with a lot of product information. Much of this information is user-generated.
I would like to know if there are any copyright issues around crawling the web for more product information, given that the entries are simply facts.
A typical database entry might look like:
Some Toy name, Some manufacturer name, Release date
Is it possible that any site that has compiled this information (but made it freely available via their website) could claim that it was copyrighted by them? I understand that if the presentation of the data was the same they might claim copyright infringement, but the presentation of that data is unique to the app I am working on.
Facts cannot be copyrighted. Stories about facts can be copyrighted, however.
In the EU there is the concept of database rights, which protect the investment made in compiling a database of facts ("the protection for databases is reserved for those databases where it can be shown that there has been ... a substantial investment in the obtaining, verification or presentation of their contents"). According to the Wikipedia article there is no equivalent in the US (but please don't rely on me or Wikipedia for legal advice).
Google recently came under serious fire for aggregating data such as reviews, ratings, and other information from social rating sites and the like. In the end Google was dealt a major defeat: it was forced to pay hefty infringement fines and also had to pull all the data.
In a nutshell, if you're taking specs directly off a site, that's considered copyright infringement, because when it comes to websites all content is copyrighted by default unless otherwise noted.
Now for specs, going directly to the dealer or manufacturer is typically your best bet, since they are the ones in charge of the info and can license it directly.
As for pulling data from other sites, you'll have to make sure that those sites are authorized to license the info, and if so you can arrange permissions from there. From my experience, though, it's always best to cut the middleman out as much as possible.
In the U.S. facts cannot be copyrighted. This was a big decision that allowed a lot of fantasy sports sites to stay online a few years back. There are trademark issues, however. With brand names it can be a problem. Some companies pursue things aggressively. Others not at all.
Recently GoPro came under fire for issuing DMCA requests against sites that were not legal resellers. They didn't just go after the offending pages; they also went after reviews.
Some companies use seeding strategies to prevent scraping and to identify potential intellectual property infringement cases. Others will hire firms to protect their interests. Sometimes a third party will come into play and offer to attack identified offenders for a commission.
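To make the seeding idea concrete, here is a minimal Python sketch of how a data owner might check a suspect dataset for planted entries. Everything in it is an assumption for illustration: the seed product, the record layout (borrowed from the name, manufacturer, release date example in the question), and the helper names `normalize` and `find_seeded_entries`. Real detection schemes are probably more elaborate than this.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial edits don't hide a match."""
    return " ".join(text.lower().split())


def find_seeded_entries(seeds, records):
    """Return records whose (name, manufacturer) pair matches a planted seed."""
    seed_keys = {(normalize(name), normalize(maker)) for name, maker in seeds}
    return [
        record
        for record in records
        if (normalize(record[0]), normalize(record[1])) in seed_keys
    ]


# Hypothetical planted entry: a product that does not exist anywhere else.
SEEDS = [("Zorbax Wobble Robot", "Fictional Toys Ltd")]

# Hypothetical scraped data: (name, manufacturer, release date).
scraped = [
    ("Some Toy name", "Some manufacturer name", "2010-05-01"),
    ("zorbax  wobble robot", "Fictional Toys Ltd", "2011-01-01"),
]

hits = find_seeded_entries(SEEDS, scraped)
print(hits)  # only the second record matches the planted seed
```

The point for a scraper is the reverse of this check: a fictitious entry showing up in your database is evidence of where the data came from, even if each individual fact would not be copyrightable.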
The more successful your operation, the more susceptible you are to attack. There are a ton of sites out there that do the same thing for SEO. They will generate thousands of worthless web pages to try to generate relevant links. Just because they get away with it doesn't mean you will.
Rather than scrape and republish, I would look for a reliable data source. Manufacturers are sometimes open with information. Sometimes large distributors will provide data sets. There's always Amazon's set of utilities as well.