Disclaimer: OK... Nothing that you are reading is fact; we haven't even begun to scratch the surface in understanding Penguin. The goal of this thread is to provide a place to discuss evidence and begin formulating theories about Penguin.
I've been spending a painful amount of time analyzing Penguin's effects and talking with other bulldog SEOs to work out what factors were implemented in the latest algorithm update. One thing we all instantly agreed on was that the Penguin update is unlike anything we've ever seen before. We expect to see a major revamp and tweaks to the algo soon.
Examples of Low Quality Garbage Rising to the Top:
Credit Card Refinance
5th result on Page 1
The site that's ranked has 7 pages indexed, all of which are default pages for a stock CMS. All of the content on this site is on the homepage, which is a short blurb that was clearly written in a matter of minutes.
Exact Match Domain
Backlinks: Zero Backlinks
Age of domain: 2 yrs old
5th result on Page 1
The site has 140 pages indexed, but most are duplicate, tag, or otherwise thin pages. Overall the site is very thin, with unmasked affiliate links! Not a site you'd expect to be prominently displayed on the first page of Google for the keyword 'paid surveys'.
Not an exact match domain, but includes the keyword 'survey' in domain
Backlinks: 354 (ahrefs)
Notes about backlinks: The great majority of the backlinks to this domain are from BuildMyRank and other networks that have been deindexed entirely. There are some directory links, but overall it's a very thin link profile that is blatantly artificial, with little anchor text diversity.
Age of domain: 4 yrs old
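For anyone who wants to put a number on "little anchor text diversity", here is a rough Python sketch. It is only an illustration: the function name anchor_text_profile and the sample anchors are made up for this post, and nothing here is a metric Google is known to use. It counts the distinct anchors in a backlink export and reports the share held by the most common one.

```python
from collections import Counter


def anchor_text_profile(anchors):
    """Return (distinct anchor count, share of the most common anchor).

    Anchors are compared case-insensitively with surrounding whitespace
    removed. An empty list returns (0, 0.0).
    """
    normalized = [a.strip().lower() for a in anchors if a.strip()]
    if not normalized:
        return 0, 0.0
    counts = Counter(normalized)
    top_count = counts.most_common(1)[0][1]
    return len(counts), top_count / len(normalized)


# Hypothetical anchors, NOT taken from the site discussed above.
sample_anchors = [
    "paid surveys", "paid surveys", "Paid Surveys", "paid surveys",
    "paid surveys online", "click here",
]

distinct, top_share = anchor_text_profile(sample_anchors)
print(f"{distinct} distinct anchors; top anchor share = {top_share:.0%}")
```

A top anchor share close to 100% is the kind of profile described above as blatantly artificial. Where the cutoff sits is anyone's guess.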
Other Crappy Rankings:
5th result is a movie review for a Christian Movie on a popular movie review website
Zero links pointed to this page with any keywords related to "mexico pharmacy"
It's very unlikely that this was a hack, a redirect, a Google mask, or anything more than a ranking mistake.
credit card review
10th result on Page 1 is a Wikipedia link to Amazon's page!
Similarities Among Sites Crushed by Penguin:
Sitewide Above-the-Fold Calls to Action/Forms
I've been spending a god-awful amount of time on Google's Webmaster Forum, which is a great resource for finding sites that were negatively affected by the recent algorithm update. Frustrated webmasters provide their full URL in hopes that someone will point out the horrible mistake(s) they've made so that they can correct them. Don't waste time reading the responses; you'd get better advice almost anywhere else. Regardless, it's a good resource for finding actual URLs of affected sites, including in niches that you may never hear about or think about.
One commonality I'm seeing is that sites with large sitewide calls to action/forms above the fold were hit, and hit hard. Think insurance-related websites, investigative websites, and most lead gen type sites.
Case Studies of the Above-the-Fold Penalty Theory
Age of domain: 16 yrs old (well branded)
Notice the huge orange button? The large form to enter information in? Each and every page on this website has this exact same lead gen form at the top of the page. It's clear that a team of people has poured their hearts and souls into this site, painstakingly optimizing each page with unique, well written content. But there's a lot of overlap in the design of each page, and that big form takes up most of the real estate ABOVE the fold. Viewing the site at 800x600 resolution, there's NO content above the fold.
All other optimization factors could qualify as amazing: great backlinks, good anchor text, and incoming link diversity. It's a very well branded domain, and an old one (16 yrs old!). The SEOMoz team would get their rocks off if they had something to do with this domain; it's that good. If anything, it's astonishing: this domain's optimization may be too good (another Penguin theory of mine).
It has an unbelievable backlink profile, with no signs of artificial linkbuilding. There are links from .edu, .gov, .mil, and just about every authority type of TLD that you can imagine. This is nearly the pinnacle of a backlink profile, the kind SEOMoz would give as an example of what to do for a white-hat, well branded domain.
Age of domain: 12 yrs old
Again, notice the white fields at the top of the page? That's how it is on EVERY page of the domain. The top header section is simply a lead gen form that's sitewide. According to the owner in the Webmaster forum, the site had sustained amazing rankings until the recent Penguin update. It was a leader in this niche, and had been for many years.
All backlinks appear to be natural, with no sign of manually built links.
Age of domain: 8 yrs old
This is a UK-based site that was also hit very hard. According to the owner on the GWT forums, the site has plummeted in the rankings since Penguin was rolled out. Notice the huge header image? When viewing at 800x600 resolution, there is NO visible content on the site. The content itself, and the title tags, could be considered over-optimized, with the primary keyword showing up on just about every page with slight modification.
Major Takeaways:
While it's still too early to determine the actual changes in the algorithm, we can begin to paint a picture and form some hypotheses about potential changes. My gut feeling is that Penguin largely affected on-site factors rather than off-site factors. Sites that would be considered perfectly optimized are some of the best examples of sites that got crushed in the latest Penguin update.
Above the fold penalty
It's very likely that Google has built this into Penguin. Sites with forms, advertisements, or large images that fill the area above the fold sitewide appear to have been hit hardest. If you think this was your problem, try viewing your site at an 800x600 screen resolution. How much unique content is visible in that area? You can use Google's own tool (http://browsersize.googlelabs.com/).
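If you'd rather put a number on the 800x600 check than eyeball it, here's a minimal Python sketch. Everything in it is an assumption made for illustration: the Block class, the "kind" labels, the sample page, and the idea that blocks are full width, stacked vertically, and don't overlap. It is NOT how Google measures anything; it just estimates what share of the first 600 pixels is actual content rather than forms, ads, or header images.

```python
from dataclasses import dataclass


@dataclass
class Block:
    kind: str    # hypothetical labels: "nav", "form", "ad", "image", "content"
    top: int     # y offset of the block's top edge, in pixels
    height: int  # block height, in pixels


def content_share_above_fold(blocks, fold_height=600):
    """Fraction of the above-the-fold area covered by "content" blocks.

    Assumes full-width, non-overlapping blocks, so only vertical
    extent matters.
    """
    visible_content = 0
    for block in blocks:
        if block.kind != "content":
            continue
        visible_top = max(block.top, 0)
        visible_bottom = min(block.top + block.height, fold_height)
        visible_content += max(visible_bottom - visible_top, 0)
    return visible_content / fold_height


# Hypothetical lead gen layout: nav bar, a big sitewide form, then content.
page = [
    Block("nav", 0, 80),
    Block("form", 80, 450),
    Block("content", 530, 900),
]

share = content_share_above_fold(page)
print(f"Content share above the fold: {share:.1%}")
```

Run it against the block heights from a few of your own templates. If the share is near zero on every page, your layout matches the pattern in the case studies above.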
"Bad Backlinks" AREN'T the Reason for Ranking Drops
Like many BH SEOs, I've got a ton of domains that I've done testing with. Many test domains with nothing but massive Xrumer and Scrapebox spam skyrocketed in the SERPs after the recent algorithm update.
A couple of the examples I provided above that rose in the rankings have links from BMR, ALN, and other networks that have been deindexed! The rest are lower quality article directory links and low quality social bookmarks, nothing to really write home about.
If anything, link "penalties" for over-optimization were handed out a few weeks ago, but not as a direct result of Penguin.
How can YOU Help?
Let's make this the best thread on BHW for discussing evidence-based theories about which factors changed in the latest algorithm update!
What trends are you seeing? If you were affected by Penguin, in a good or bad way, please share some details about your site. How is the content on your site? How many pages do you have indexed? On a scale of 1-10, how good is your on-site optimization? What is your gut telling you about Penguin?