
Use CloudFlare Workers for html-preview-link #3305

Merged
2 commits merged into refined-github:master on Jul 2, 2020

Conversation

@kidonng kidonng changed the title Use CloudFlare Workeres for html-preview-link Use CloudFlare Workers for html-preview-link Jul 1, 2020
@fregante fregante merged commit 2d487c2 into refined-github:master Jul 2, 2020
fregante commented Jul 2, 2020

Excellent! Might be worth redirecting the root (https://refined-github-html-preview.kidonng.workers.dev/) to this PR or to a repo containing the code.

kidonng commented Jul 3, 2020

Great idea, now the root is redirecting to this PR.
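The actual worker code isn't quoted in this thread, but the redirect could be sketched as a small path check ahead of the normal proxy logic (the function name and PR URL below are illustrative assumptions):

```javascript
// Hypothetical sketch, not the actual worker code: redirect the bare
// worker domain to the PR so visitors can find the source/discussion.
function redirectRoot(requestUrl) {
  if (new URL(requestUrl).pathname === '/') {
    // Assumed PR URL based on the repo and number in this thread.
    return Response.redirect(
      'https://github.com/refined-github/refined-github/pull/3305',
      302,
    );
  }
  return null; // any other path falls through to the proxy logic
}
```

Non-root paths return `null` here so the rest of the worker can keep serving previews unchanged.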

@kidonng kidonng deleted the html-preview-link branch July 28, 2020 02:58
@fregante

Some pages are showing on Google: https://www.google.com/search?q=site:refined-github-html-preview.kidonng.workers.dev

Can you add an X-Robots-Tag: noindex, nofollow, noarchive header? https://developers.google.com/search/reference/robots_meta_tag

kidonng commented Feb 19, 2021

Oops, that's a big oversight. Added x-robots-tag and robots.txt, you can refer to the updated code.
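Since the updated code isn't shown here, a minimal sketch of attaching the requested header to every proxied response in a Cloudflare Worker might look like this (the helper name and structure are assumptions, not the actual worker code):

```javascript
// Hypothetical sketch: copy the upstream response headers and add an
// X-Robots-Tag so search engines drop the preview pages from their
// indexes and caches.
function withRobotsHeaders(headers) {
  const out = new Headers(headers);
  out.set('X-Robots-Tag', 'noindex, nofollow, noarchive');
  return out;
}

// Inside the worker's fetch handler, the proxied response would then
// be rebuilt with the extra header, roughly:
//   const upstream = await fetch(upstreamUrl);
//   return new Response(upstream.body, {
//     status: upstream.status,
//     headers: withRobotsHeaders(upstream.headers),
//   });
```

Rebuilding the response is needed because the headers on a fetched `Response` are immutable.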

@fregante

none doesn't include noarchive, which I think means that those 3 links won't be removed.

Also, we should probably drop robots.txt, because it prevents Google from seeing the robots tag. If I remember correctly, when Google isn't allowed to fetch the links it finds, it still lists them on the SERPs as "Google couldn't fetch the description due to the robots.txt file."

kidonng commented Feb 20, 2021

Just noticed how this works:

For the noindex directive to be effective, the page must not be blocked by a robots.txt file. If the page is blocked by a robots.txt file, the crawler will never see the noindex directive, and the page can still appear in search results, for example if other pages link to it.

So I made some changes accordingly. Thanks for the reminder!
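The change described above amounts to letting crawlers fetch the pages (no robots.txt Disallow rules) so they can actually see the noindex header. A hedged sketch of how the worker might serve such a robots.txt, assuming a route check like the one below (not the actual code):

```javascript
// Hypothetical sketch: serve a robots.txt that permits crawling, so
// crawlers can reach each page and honor its X-Robots-Tag noindex
// directive instead of listing blocked-but-linked URLs.
function serveRobotsTxt(pathname) {
  if (pathname === '/robots.txt') {
    // An empty Disallow rule allows crawling of every path.
    return new Response('User-agent: *\nDisallow:\n', {
      headers: { 'Content-Type': 'text/plain' },
    });
  }
  return null; // other paths are handled by the normal proxy logic
}
```

Counterintuitively, allowing the crawl is what gets the pages deindexed: a Disallow rule would hide the noindex directive from the crawler.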

I have also submitted removal requests for existing results:

[screenshot: confirmation of the submitted removal requests]

@mvy-siteimprove

FYI: this feature doesn't work well with private repos. In fact, it looks like a phishing attack: it renders the GitHub 404 screen with refined-github-html-preview.kidonng.workers.dev in the address bar (and the login form is focused 🙈).

@fregante

Good point, private repos will be excluded.
