Recent Awesome
It is similar to the GitHub trending page, but for items in awesome lists.
It is composed of two basic parts:
- Crawler: a job executed daily (using GitHub Actions?) that parses the awesome lists. For each item, the idea is to collect the awesome list, the category within the list, the link to the resource, and the date it was added to the list.
- Webpage: a simple page that fetches the JSON produced by the crawler and shows the latest additions.
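The record the crawler emits, and the "latest additions" view the webpage needs, could be sketched as follows. This is a minimal sketch: the `AwesomeItem` class, its field names and the `latest_additions`/`to_json` helpers are hypothetical, chosen only to mirror the four pieces of information listed above.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AwesomeItem:
    """One bullet point found in an awesome list (hypothetical record shape)."""
    awesome_list: str  # e.g. "TheJambo/awesome-testing"
    category: str      # heading the bullet point sits under
    title: str         # link text of the bullet point
    link: str          # URL of the resource
    date_added: str    # ISO date (YYYY-MM-DD) when the item was first seen


def latest_additions(items: list[AwesomeItem], limit: int = 50) -> list[AwesomeItem]:
    """Return the most recently added items, newest first."""
    # ISO dates sort chronologically when compared as plain strings.
    return sorted(items, key=lambda item: item.date_added, reverse=True)[:limit]


def to_json(items: list[AwesomeItem]) -> str:
    """Serialize the items into the JSON document the webpage would fetch."""
    return json.dumps([asdict(item) for item in items], indent=2)
```

Keeping the date as an ISO string keeps the JSON trivial to consume from the webpage while still sorting correctly.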
Crawler
The first milestone is, for a sample awesome list (for example https://github.com/TheJambo/awesome-testing#readme), to get each bullet point with its text, its link, and its category (if possible).
I am going to give the mistune 2 library a try: apart from generating HTML from markdown, it can give you the AST (Abstract Syntax Tree) of the markdown.
From the AST it should be easy to detect the bullet points that contain a link and extract them.
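Walking the AST to pull out (category, text, link) could look like the sketch below. It assumes the tokens are plain dicts with a `"type"` and optional `"children"`, as mistune's AST renderer produces; the exact key names (`"text"`/`"link"` in mistune 2 versus `"raw"`/`attrs["url"]` in mistune 3) are an assumption, so both are handled. All function names are hypothetical, and taking the closest heading as the category and the first link as the resource are assumed heuristics.

```python
from typing import Any, Iterator

Node = dict[str, Any]  # assumed token shape: {"type": ..., "children": [...], ...}


def children_of(node: Node) -> list[Node]:
    """Return a node's child tokens, or an empty list if it has none."""
    children = node.get("children")
    return children if isinstance(children, list) else []


def node_text(node: Node) -> str:
    """Concatenate all plain text found in a node and its descendants."""
    # mistune 2 keeps text under "text", mistune 3 under "raw" (assumption).
    own = node.get("text") or node.get("raw") or ""
    return own + "".join(node_text(child) for child in children_of(node))


def link_url(link: Node) -> str:
    """Return a link's URL (mistune 2: "link", mistune 3: attrs["url"])."""
    return link.get("link") or (link.get("attrs") or {}).get("url", "")


def find_links(node: Node) -> Iterator[Node]:
    """Yield every link node below `node`, depth first."""
    if node.get("type") == "link":
        yield node
        return
    for child in children_of(node):
        yield from find_links(child)


def iter_list_items(list_node: Node) -> Iterator[Node]:
    """Yield the items of a list, including the items of nested lists."""
    for item in children_of(list_node):
        if item.get("type") != "list_item":
            continue
        yield item
        for child in children_of(item):
            if child.get("type") == "list":
                yield from iter_list_items(child)


def extract_items(tokens: list[Node]) -> list[tuple[str, str, str]]:
    """Return (category, title, url) for every bullet point that contains a link."""
    results = []
    category = ""
    for token in tokens:
        if token.get("type") == "heading":
            # The closest heading above a list is taken as its category.
            category = node_text(token).strip()
        elif token.get("type") == "list":
            for item in iter_list_items(token):
                # Skip nested lists here; their items are visited on their own.
                own_blocks = [c for c in children_of(item) if c.get("type") != "list"]
                links = [link for block in own_blocks for link in find_links(block)]
                if links:
                    first = links[0]  # assume the first link is the resource
                    results.append((category, node_text(first).strip(), link_url(first)))
    return results
```

With mistune installed, the tokens should come from something like `mistune.create_markdown(renderer="ast")(readme_text)`; it is worth printing one token tree first to confirm the key names for the installed version.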
You can see the (ugly) code here: https://github.com/kozko2001/awesome-list-crawler
Infrastructure
I am going to use AWS:
- Create an ECR repository (a place to store Docker images).
- Create a GitHub Action that, on each commit to the repository, pushes the image to ECR. I have to be extra careful to create an IAM user with just enough permissions to push there. Useful tutorial: https://aws.plainenglish.io/build-a-docker-image-and-publish-it-to-aws-ecr-using-github-actions-f20accd774c3
- Have a Lambda function that runs the Docker image once a day and stores the result in a public S3 bucket.
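The notes don't say how the "date added" is obtained. One option, assumed here, is to compare each daily crawl with the JSON stored the day before and stamp any link not seen before with today's date. The sketch below wires that into a Lambda handler; `merge_first_seen`, `run_daily`, `crawl_all_lists`, the `BUCKET_NAME` environment variable and the `latest.json` key are all hypothetical. The S3 calls (`get_object`, `put_object`) are standard boto3 client methods, and boto3 is imported lazily so the daily logic can be exercised without AWS.

```python
import json
import os
from datetime import date
from typing import Callable

Item = dict  # one crawled bullet point, e.g. {"link": ..., "title": ..., "date_added": ...}


def merge_first_seen(previous: list[Item], current: list[Item], today: str) -> list[Item]:
    """Keep the stored date_added for known links; stamp new links with today."""
    first_seen = {item["link"]: item["date_added"] for item in previous}
    return [{**item, "date_added": first_seen.get(item["link"], today)} for item in current]


def run_daily(
    crawl: Callable[[], list[Item]],
    load_previous: Callable[[], list[Item]],
    upload: Callable[[str], None],
    today: str,
) -> list[Item]:
    """One daily run: crawl, merge with the previous snapshot, upload the JSON."""
    items = merge_first_seen(load_previous(), crawl(), today)
    upload(json.dumps(items))
    return items


def crawl_all_lists() -> list[Item]:
    """Hypothetical placeholder: plug the awesome-list crawler in here."""
    raise NotImplementedError


def handler(event, context):
    """Lambda entry point wiring run_daily to S3."""
    import boto3  # imported lazily so run_daily can be tested without AWS

    s3 = boto3.client("s3")
    bucket = os.environ["BUCKET_NAME"]  # hypothetical environment variable
    key = "latest.json"  # hypothetical object key read by the webpage

    def load_previous() -> list[Item]:
        try:
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        except s3.exceptions.NoSuchKey:
            return []  # first run: nothing stored yet
        return json.loads(body)

    def upload(body: str) -> None:
        s3.put_object(Bucket=bucket, Key=key, Body=body, ContentType="application/json")

    items = run_daily(crawl_all_lists, load_previous, upload, date.today().isoformat())
    return {"count": len(items)}
```

Passing `crawl`, `load_previous` and `upload` as functions keeps the first-seen logic independent of S3, so it can be tested locally with plain lists.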
For all the provisioning I will try to use the AWS CDK for Python (first time!).
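A first CDK stack covering the pieces listed above might look like this. It is a sketch assuming CDK v2 (`aws-cdk-lib` and `constructs`); construct IDs such as `CrawlerRepo` and `ResultsBucket`, the five-minute timeout, the open CORS rule and the `BUCKET_NAME` variable are illustrative choices, not part of the original plan. It also assumes an image tagged `latest` is already in the repository before the Lambda is deployed.

```python
from aws_cdk import App, Duration, Stack
from aws_cdk import aws_ecr as ecr
from aws_cdk import aws_events as events
from aws_cdk import aws_events_targets as targets
from aws_cdk import aws_iam as iam
from aws_cdk import aws_lambda as lambda_
from aws_cdk import aws_s3 as s3
from constructs import Construct


class RecentAwesomeStack(Stack):
    """ECR repository, push-only IAM user, daily Lambda and public results bucket."""

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Where the GitHub Action pushes the crawler image.
        repo = ecr.Repository(self, "CrawlerRepo")

        # User for the GitHub Action, limited to pull/push on this repository
        # (grant_pull_push also adds ecr:GetAuthorizationToken for docker login).
        pusher = iam.User(self, "GithubActionsPusher")
        repo.grant_pull_push(pusher)

        # Public bucket for the JSON: only ACLs are blocked, so a public-read policy is allowed.
        bucket = s3.Bucket(
            self,
            "ResultsBucket",
            public_read_access=True,
            block_public_access=s3.BlockPublicAccess.BLOCK_ACLS,
            # Lets a webpage served from another origin fetch the JSON.
            cors=[s3.CorsRule(allowed_methods=[s3.HttpMethods.GET], allowed_origins=["*"])],
        )

        # Lambda running the image tagged "latest" from the repository.
        crawler = lambda_.DockerImageFunction(
            self,
            "CrawlerFunction",
            code=lambda_.DockerImageCode.from_ecr(repo, tag_or_digest="latest"),
            timeout=Duration.minutes(5),  # arbitrary; tune to the crawl time
            environment={"BUCKET_NAME": bucket.bucket_name},
        )
        bucket.grant_read_write(crawler)

        # Trigger the crawler once a day.
        daily = events.Rule(self, "DailySchedule", schedule=events.Schedule.rate(Duration.days(1)))
        daily.add_target(targets.LambdaFunction(crawler))


app = App()
RecentAwesomeStack(app, "RecentAwesomeStack")
app.synth()
```

The access keys for the push-only user are not created by this stack; one option is to create them in the console and store them as GitHub secrets for the Action.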