Texidium
Texidium is a platform primarily designed for enrolled students to access ebooks. However, it feels like a draconian system because it doesn’t allow users to download a local copy of the PDF. The creators seem so obsessed with controlling distribution, probably for monetary gain, that they’ve implemented stringent restrictions. This mindset feels antithetical to the spirit of sharing knowledge. I’m all for intellectual property rights, but there are established licenses that could balance accessibility with fair use. I’m not advocating for piracy or commercial misuse, just the freedom to access knowledge in a more practical way.
Given these restrictions, I’ve started exploring ways to extract the embedded PDFs from their iframe. I’ve previously had success with similar tasks, like extracting font files from websites (since those files must be stored locally for the page to render properly). While experimenting with the browser’s developer tools, I tried a few approaches, but I didn’t document my process well, and most of it didn’t work. So, I’m restarting this effort with a more structured approach.
The first attempt is a script that clicks the next-section button while a screen recorder records each page. My hope is that there is a Python tool that can extract the image displayed in each frame and compile those images into a single PDF, thereby recreating the book.
// Global variable to control the loop
let stopClicking = false;

function autoClickButton() {
  // Select the button icon inside the 'footer-button right' div
  const button = document.querySelector('.footer-button.right .footer-button-icon.desktop');

  // Check if the button exists and the stop condition is not met
  if (button && !stopClicking) {
    // Click the button
    button.click();
    console.log('Button clicked');
    // Call the function again after 1 second
    setTimeout(autoClickButton, 1000);
  } else if (!button) {
    // Stop if the button is not found, possibly indicating the end of all pages
    console.log('Button not found or end reached');
  } else {
    console.log('Script stopped by user.');
  }
}

// Start the auto-clicking process
autoClickButton();

// To stop the script, run the following command in the console.
// It is commented out here so that pasting the whole script does not stop it after the first click:
// stopClicking = true;
Downsides of the current version (besides the poor quality when you want to zoom in):
- Every new page starts scrolled to the top, whereas I want the scroll position in the middle so the recording captures the whole page.
It turns out that the iframe wasn’t the element controlling the scrolling.
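Since the iframe turned out not to be the scrolling element, one way to track down the real one is to look for elements whose content is taller than their visible box and whose computed overflow allows scrolling. The sketch below is only an illustration: isScrollable and centeredScrollTop are hypothetical helper names, not anything Texidium provides. The check relies on computed styles, so it should run while the page's CSS is still enabled, and in the console context of the frame that holds the viewer, because elements inside an iframe belong to a separate document.

```javascript
// Hypothetical helper: true when an element holds more content than it shows
// and its computed overflow-y value lets it scroll.
function isScrollable(scrollHeight, clientHeight, overflowY) {
  return scrollHeight > clientHeight && (overflowY === 'auto' || overflowY === 'scroll');
}

// Hypothetical helper: the scrollTop value that puts the middle of the content
// in the middle of the visible area (never negative).
function centeredScrollTop(scrollHeight, clientHeight) {
  return Math.max(0, Math.round((scrollHeight - clientHeight) / 2));
}

// Browser-only part: list every candidate scroll container in the current document.
if (typeof document !== 'undefined') {
  const candidates = [...document.querySelectorAll('*')].filter((el) =>
    isScrollable(el.scrollHeight, el.clientHeight, getComputedStyle(el).overflowY)
  );
  console.log(candidates);
}
```

Once a candidate is confirmed as the scrolling element, setting its scrollTop to centeredScrollTop(el.scrollHeight, el.clientHeight) would move the view to the middle instead of the top.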
Steps to reproduce:
- Disable all CSS using the web dev extension.
- Delete any tag/div with print in it.
- Change the height of the iframe from 100% to 900px (or change the height of id=DocumentViewer).
- Run the following script.
// Global variable to control the scrolling loop
let stopScrolling = false;

// Function to scroll the DocumentViewer div
function autoScrollDiv() {
  // Select the DocumentViewer div
  const viewer = document.querySelector('#DocumentViewer'); // Ensure this selector matches the correct element

  if (viewer) {
    // Scroll down by a set amount (e.g., 50 pixels)
    viewer.scrollBy(0, 50); // Adjust this value for smoother or faster scrolling
    console.log('Scrolling the DocumentViewer...');

    // Continue scrolling every second unless stopped
    if (!stopScrolling) {
      setTimeout(autoScrollDiv, 1000); // Adjust the delay as needed
    } else {
      console.log('Scrolling stopped by user.');
    }
  } else {
    console.log('DocumentViewer not found.');
  }
}

// Start the scrolling process
autoScrollDiv();

// To stop scrolling, run the following in the console.
// It is commented out here so that pasting the whole script does not stop it after the first scroll:
// stopScrolling = true;
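The loop above keeps scheduling itself even after the viewer has reached the end of the document, so it has to be stopped by hand. A possible refinement, sketched here with the hypothetical helpers hasReachedBottom and scrollToBottom, is to compare the scroll position with the content height and end the loop automatically. The 1px tolerance is an assumption meant to absorb fractional scroll positions.

```javascript
// Hypothetical helper: true once the visible area has reached the end of the content.
function hasReachedBottom(scrollTop, clientHeight, scrollHeight, tolerance = 1) {
  return scrollTop + clientHeight >= scrollHeight - tolerance;
}

// Hypothetical helper: scroll `viewer` down in steps of `stepPx` every `delayMs`
// until the bottom is reached or shouldStop() returns true.
function scrollToBottom(viewer, stepPx, delayMs, shouldStop) {
  if (shouldStop() || hasReachedBottom(viewer.scrollTop, viewer.clientHeight, viewer.scrollHeight)) {
    console.log('Scrolling finished.');
    return;
  }
  viewer.scrollBy(0, stepPx);
  setTimeout(() => scrollToBottom(viewer, stepPx, delayMs, shouldStop), delayMs);
}

// Example call in the browser console, reusing the stopScrolling flag from the script above:
// scrollToBottom(document.querySelector('#DocumentViewer'), 50, 1000, () => stopScrolling);
```

Passing the stop check as a function keeps the helper independent of any particular global, so the same loop could be reused for a different scrolling element if #DocumentViewer turns out not to be the right one.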