Robots.txt can help keep your site from being penalized for duplicate content by telling search engine bots not to crawl specific files that are repeated or not worth showing in search results.
What exactly is robots.txt? Robots.txt is a file that sets rules for the search engine robots (“bots”) that crawl the Internet looking for pages to add to their index. If you have content that you do not want the automated bots to index, you upload a robots.txt file to the root directory of your site, so that it is reachable at /robots.txt. Its rules can tell bots not to crawl individual files, whole directories or even your entire website.
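To make the idea concrete, here is a minimal sketch in Python that uses the standard library's urllib.robotparser to read a small robots.txt and ask which URLs a bot may crawl. The rules, the /print/ and /tmp/ directories, and the www.example.com address are made up for illustration and are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every bot is asked to stay out of /print/
# (printer-friendly duplicates of articles) and /tmp/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
Disallow: /tmp/
"""

def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# The duplicate, printer-friendly copy is off limits to the bot...
print_copy_ok = parser.can_fetch("Googlebot", "https://www.example.com/print/article-1.html")
# ...while the original article can still be crawled.
article_ok = parser.can_fetch("Googlebot", "https://www.example.com/article-1.html")

print("print copy crawlable:", print_copy_ok)
print("article crawlable:", article_ok)
```

Each Disallow line is matched as a prefix of the URL path, so one line is enough to cover everything inside a directory.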
You have to be very careful when implementing robots.txt though. If you make a mistake and list files in robots.txt that you actually want search engine robots to crawl to improve your ranking, you could make those pages, or even your entire site, unsearchable. Visitors could still open the pages directly, but the pages may drop out of search results.
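One way to guard against this kind of mistake is to test a robots.txt against a list of pages you need crawled before uploading it. The sketch below assumes a hypothetical helper named blocked_urls and made-up example.com URLs; it relies only on Python's standard urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

def blocked_urls(robots_txt, must_crawl, agent="*"):
    """Return the URLs from must_crawl that the given robots.txt text blocks."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in must_crawl if not parser.can_fetch(agent, url)]

# Hypothetical pages that should stay visible to search engines.
IMPORTANT_PAGES = [
    "https://www.example.com/",
    "https://www.example.com/products/index.html",
]

# A single stray slash asks every bot to skip the entire site.
MISTAKE = "User-agent: *\nDisallow: /\n"
# The intended rule only hides a temporary directory.
INTENDED = "User-agent: *\nDisallow: /tmp/\n"

mistake_result = blocked_urls(MISTAKE, IMPORTANT_PAGES)
intended_result = blocked_urls(INTENDED, IMPORTANT_PAGES)

print("blocked by mistake:", mistake_result)
print("blocked as intended:", intended_result)
```

An empty list means none of the important pages are blocked; anything else is worth fixing before the file goes live.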
To find out more about how robots.txt works and how to decide which parts of your site it should cover, visit The Web Robots Pages.