Ok, I am about to use Ceon in a rather unusual, off-label way. But I have some questions and I'd like some advice/opinions before I do it.

+++++++++++++++++++
SITUATION
A few years ago, I migrated a client's site to ZC (currently zc154). At some point, I set up Google Webmaster Tools for them. Recently (over the last week), the console has been kicking out tons of 404 Crawl Errors (159 as of just now… was 48 last weekend). A big chunk (80%) of the URLs end with .aspx and go something like this:
  • mydomain.com/contact.aspx
  • mydomain.com/cat-Soldering_Equipment_and_Supplies-17.aspx
  • mydomain.com/cat-Fume_Extraction-76.aspx

Which means they're legacy links that were legitimate a few years ago.

When I first received the alert from Google that there had been an increase in 404s, I was tempted to ignore them. I thought they were just broken incoming links. But when I drilled down on them in the console to find the origin, almost all appeared not to be incoming, but rather links found on their own site.

So last night I downloaded their entire db via phpMyAdmin. I found many .asp and .aspx links, but all were legit outbound links. So then I zipped the entire public_html folder, downloaded it, and grepped it. Again, every instance of asp/aspx was legit.
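
In case it helps anyone repeat that search, a couple of LIKE queries against the usual content tables will surface stray .asp/.aspx references in the db. The table and column names below are the stock Zen Cart ones and are only an assumed example; adjust them for your own prefix/install:

-- find lingering .asp/.aspx references in product descriptions
-- ('%.asp%' catches both .asp and .aspx)
SELECT products_id, language_id
FROM products_description
WHERE products_description LIKE '%.asp%';

-- same check against EZ-Pages content
SELECT pages_id, pages_title
FROM ezpages
WHERE pages_html_text LIKE '%.asp%';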

This left me very confused. I have no flipping clue how Google can say that pages on my site that do NOT exist are linking to other pages on my site that do NOT exist.

Went to bed very late, very frustrated. But I woke up with a pretty good solution (maybe?).


+++++++++++++++++++
CAUSATION
I'm fairly sure that, at some point long ago, I used Ceon to redirect legacy URLs to updated ones. My client recently decided to restructure all URIs using the Ceon mod (the commercial version I bought from Conor shortly before he passed). This rebuild broke all the legacy mappings for some reason.


+++++++++++++++++++
SOLUTION
Ok, as I see it, there are two problems. First, external sites are still linking to old (legacy) pages that no longer exist. Second, Google also seems to be blaming my site as the source of links to these pages that do not exist.

PART 1
The second one seems easy enough to fix: I will use robots.txt to disallow those old extensions (Google's crawler understands the * wildcard and the $ end-of-URL anchor):
User-agent: *
Disallow: /*.asp$
Disallow: /*.aspx$

This will tell SEs not to even try to crawl those old URLs. So even if Google believes the links are coming from me, it will ignore them.

PART 2
If you can't beat 'em, join 'em!

Apparently there are still plenty of external sites supplying old incoming links (I was able to find a few). Rather than try to hunt them all down and beg webmasters to make changes, I'll just match each legacy URL to its current URI and insert the legacy URI into the Ceon db as a redirect to the current URI.

This is what I'll do (NOTE: the following column names may differ from the free mod, but the principle should be the same and thus applicable/usable for others who may visit this forum with the same SEO problem). I will:
  1. Open phpMyAdmin and dump the table 'ceon_uri_mappings' as CSV.
  2. Open the Google webmaster console and export the 404 URLs as CSV.
  3. Line up each legacy URI in a column next to its current Ceon URI to get the correct associated_db_id.
  4. Rename the legacy column to 'uri' and delete the current 'uri' column.
  5. Remove all rows except the legacy URIs that I want to insert.
  6. Change the current_uri entry from 1 to 0 on those rows (so that Ceon redirects instead of treating them as live URIs).
  7. Import the legacy URI file back into the ceon_uri_mappings table (each imported row should end up looking like the SQL sketch below).
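
For clarity, here's roughly what each imported row boils down to in SQL. This is only a sketch: the column list comes from my copy of ceon_uri_mappings and may differ in the free mod, and the URI/ID values are just an example based on one of the 404s above.

-- one legacy category URI, with current_uri = 0 so Ceon treats it as a historical
-- mapping and 301s it to the row for the same main_page/associated_db_id that has
-- current_uri = 1
INSERT INTO ceon_uri_mappings
  (uri, language_id, current_uri, main_page, associated_db_id, date_added)
VALUES
  ('/cat-Soldering_Equipment_and_Supplies-17.aspx', 1, 0, 'index', 17, NOW());

The associated_db_id (17 here) has to match whatever the current (current_uri = 1) row for that category uses, which is exactly what step 3 lines up; in my table, categories use main_page 'index' and products use 'product_info'.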

Now, incoming links will be redirected instead of landing on a 404 Not Found. In theory, external/incoming legacy links will:
  • Regain any SEO strength
  • Not trigger SE 404 errors (????)


PART 3
As soon as the two above are finished, I will:
  1. Go to the Google console and mark all the errors as fixed.
  2. Ping the SEs to request a recrawl.
  3. Pray real hard that it works! lol


+++++++++++++++++++
QUESTIONS
Are there any holes in the PART 2 fix above? Will it cause any problems?

Should I implement the robots.txt fix? Will that undo the Ceon fix? My thought is that my Disallow of the .asp/.aspx extensions will simply stop any following of internal links (just in case something weird is going on that I can't find). But if an SE follows a link in, they will:
  • not get a 404
  • simply get redirected before they can index

The possible problem is that the SE bot, upon arrival, might see the disallow before the redirect. If that happens, any SEO strength gained from external links could be undone, maybe?

Anybody see a different solution to this whole problem? Again, I have searched the entire db and every last text/php/whatever file contained in public_html, finding no internal links to any of the 404 URIs.

But maybe instead of editing the URI table, there is another way to do this?

Any thoughts would be appreciated!