Creating a Personal Archive of a 12-Week Boot Camp in Only Two Blocks of Code?
“I choose a lazy person to do a hard job. Because a lazy person will find an easy way to do it.” Not Bill Gates?
Become a Data Scientist in 12 weeks, with no prior coding experience required, open to anyone willing to dedicate themselves to a rigorous, challenging experience? Sign me up.
Thinking of all the knowledge we have gained over the past 12 weeks, I know my poor mind will struggle to retain every detail of how to tune the hyperparameters of each model, or a very in-depth understanding of A/B testing. I did keep a small running list of key facts to hold on to after this program.
However, I knew this list was not going to reflect all of the information we had access to, including Appendix material I didn’t personally touch until 7 weeks into the program.
Therefore I looked for a way to preserve all of this information as a personal archive, so that I can reference this material rather than fighting to understand a Stack Overflow post to see if it is what I need. A few problems arose when I started on this project:
- First, this material is owned in full by the company that provided it to me as a student through a Learning Management System (Canvas), and there is no clear way to capture each lesson in a reviewable way.
- Second, some of the lessons and almost all Labs are housed on a secondary platform called IllumiDesk, which provides a SaaS Jupyter notebook through a browser. The root of each lesson can be found on a provided GitHub page, which turned out to be part of the solution.
- Third, if I could somehow come up with a process that did work for archiving this material in a reviewable way, how much personal time would it require, and how annoying would it be? Well, for anyone who has done a repetitive task 100 times in a row, very annoying is an understatement.
So, as all greats do, I tackled each problem piece by piece and found some way to make it lazy, hence the opening quote. I found that I could preserve each lesson as a PDF using Google Chrome's print page feature (“Ctrl” + “P” when using Chrome on Windows), which could then be saved locally to create a personal archive of the material.
This also happened to be the solution to the Lab issue described in problem two, since I could locate each lesson as a README file on GitHub, and some labs even came with Solutions.
Here comes the problem: who is going to go through each phase, capture each lesson as a PDF, and store them all on a local computer?
This is where PyAutoGUI comes into play. This library allows the coder to mimic mouse movements and keyboard input to complete tasks; its documentation page includes a YouTube video of code playing Sushi Go Round. It is worth a watch if you are really interested in what this library can do in the right hands. In my case I just wanted some code to complete this printing task a set number of times. Below I will show the code which worked on my laptop to complete this task and some of the assumptions needed to reproduce it:
- Some prior code needs to function before this will work:
This allows the checks within the code to decide which page it is looking at and how to complete the task at hand.
- The user needs to be using Google Chrome with a pinned tab (Slack in my case) and the Canvas website as the first full tab. This allows the mouse to find the correct tab.
- You have two PNG files located in the same folder as the Jupyter Notebook: an image of the Next button at the bottom of the Canvas page, and a snip of something that is unique to the Lab assignment page. I selected a snip that was always found on these pages but not on any other assignment pages.
- Lastly, Google Chrome’s Print page needs to be set to “Save as PDF”, and the Save button of the print dialog needs to overlap on screen with the Save button of the destination folder dialog, so the clicker can complete two clicks in the same spot to save the PDF in the correct place.
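To make the page check from the list above concrete, here is a minimal sketch of how the two PNG files could be used to decide which kind of page is on screen. This is not my original code: the file names `next_button.png` and `lab_marker.png` and the helper names `find_on_screen` and `classify_page` are made up for illustration, and I am assuming PyAutoGUI's `locateCenterOnScreen` is used for the image search.

```python
from typing import Callable, Optional, Tuple

Point = Tuple[int, int]

# Hypothetical file names for the two screenshots described above.
NEXT_BUTTON_PNG = "next_button.png"
LAB_MARKER_PNG = "lab_marker.png"


def find_on_screen(image_path: str) -> Optional[Point]:
    """Return the (x, y) centre of the image on screen, or None if absent."""
    import pyautogui  # imported lazily; needs a real desktop session

    try:
        location = pyautogui.locateCenterOnScreen(image_path)
    except pyautogui.ImageNotFoundException:
        # Newer PyAutoGUI releases raise instead of returning None.
        return None
    return None if location is None else (location.x, location.y)


def classify_page(locate: Callable[[str], Optional[Point]]) -> str:
    """Return "lab" if the lab marker is visible, otherwise "lesson"."""
    return "lab" if locate(LAB_MARKER_PNG) is not None else "lesson"
```

On a real desktop you would call `classify_page(find_on_screen)`. Passing the locator in as an argument is only a design choice that lets you try the logic with a fake locator before letting it loose on your mouse.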
This can be set to run through all the lessons and labs through the Appendix, so you too can have a copy of this knowledge without having to Google every “simple” question. The code does not take into account the quizzes from the first 3 weeks of the course, so it will simply save a PDF of each quiz page; these would have to either be skipped with an elif statement or deleted from the archive after the collection is complete.
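For anyone who wants a starting point, the print-and-advance loop described above might be sketched roughly as follows. Treat it as an illustration under assumptions rather than the exact code I ran: `archive_pages`, `run_with_pyautogui`, the `next_button.png` file name, the `save_button` coordinates and the two-second pause are all hypothetical, the Next button is assumed to be visible without scrolling, and the special handling for Lab pages is left out.

```python
import time
from typing import Callable, Optional, Tuple

Point = Tuple[int, int]


def archive_pages(
    max_pages: int,
    find_next: Callable[[], Optional[Point]],
    press_print: Callable[[], None],
    click: Callable[[Point], None],
    save_button: Point,
    pause: float = 2.0,
) -> int:
    """Print up to max_pages pages to PDF and return how many were saved."""
    saved = 0
    for _ in range(max_pages):
        press_print()            # open Chrome's print dialog (Ctrl+P)
        time.sleep(pause)        # give the print preview time to render
        click(save_button)       # first click: Save in the print dialog
        time.sleep(pause)
        click(save_button)       # second click: Save in the folder dialog
        time.sleep(pause)
        saved += 1
        next_button = find_next()
        if next_button is None:  # no Next button found: stop here
            break
        click(next_button)
        time.sleep(pause)        # let the next page load
    return saved


def run_with_pyautogui(max_pages: int, save_button: Point) -> int:
    """Wire the loop to real PyAutoGUI calls (needs a desktop session)."""
    import pyautogui

    def find_next() -> Optional[Point]:
        try:
            loc = pyautogui.locateCenterOnScreen("next_button.png")
        except pyautogui.ImageNotFoundException:
            return None
        return None if loc is None else (loc.x, loc.y)

    return archive_pages(
        max_pages,
        find_next,
        lambda: pyautogui.hotkey("ctrl", "p"),
        lambda point: pyautogui.click(point[0], point[1]),
        save_button,
    )
```

A call such as `run_with_pyautogui(100, (1500, 900))` would then try to save up to 100 pages; the coordinates are placeholders that depend entirely on where the overlapping Save buttons sit on your monitor.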
Please note that you may have to adjust some of the code to match your computer’s monitor specifications and processing speed but I will leave that debugging for you to play with.