feat: the app

2025-11-04 20:11:53 +01:00
commit f712c84a2e
11 changed files with 4307 additions and 0 deletions

12
.dockerignore Normal file

@@ -0,0 +1,12 @@
node_modules
npm-debug.log
data
.env
.git
.gitignore
*.md
.DS_Store
.vscode
.idea
*.log

6
.gitignore vendored Normal file

@@ -0,0 +1,6 @@
node_modules/
data/
.env
*.log
.DS_Store

59
Dockerfile Normal file

@@ -0,0 +1,59 @@
# Use Node.js LTS version
FROM node:20-slim
# Install Chromium and dependencies for Puppeteer
RUN apt-get update && apt-get install -y \
chromium \
chromium-sandbox \
fonts-liberation \
libayatana-appindicator3-1 \
libasound2 \
libatk-bridge2.0-0 \
libatk1.0-0 \
libcups2 \
libdbus-1-3 \
libdrm2 \
libgbm1 \
libgdk-pixbuf2.0-0 \
libglib2.0-0 \
libgtk-3-0 \
libnspr4 \
libnss3 \
libx11-xcb1 \
libxcomposite1 \
libxdamage1 \
libxfixes3 \
libxrandr2 \
xdg-utils \
&& rm -rf /var/lib/apt/lists/*
# Set working directory
WORKDIR /app
# Copy package files
COPY package*.json ./
# Install dependencies
RUN npm ci --omit=dev
# Copy application files
COPY . .
# Create data directory for links storage
RUN mkdir -p /app/data
# Set environment variables
ENV NODE_ENV=production
ENV PORT=3000
ENV CHROME_EXECUTABLE_PATH=/usr/bin/chromium
# Expose port
EXPOSE 3000
# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
CMD node -e "require('http').get('http://localhost:3000/api/links', (r) => {process.exit(r.statusCode === 200 ? 0 : 1)})"
# Start the application
CMD ["node", "server.js"]

144
README.md Normal file

@@ -0,0 +1,144 @@
# LinkDing
LinkDing is a bookmarking application: paste a link, and the app scrapes the page for its title, description, and main product image, then shows the result in a searchable list. It is aimed primarily at keeping lists of products you may want to buy.
## Features
- ✅ Paste links and get a list of links with title, description, and image
- ✅ Automatic metadata extraction (title, description, product images)
- ✅ Search functionality by title, description, and URL
- ✅ Modern, responsive web interface
- ✅ Smart image extraction from product containers
- ✅ Support for JavaScript-heavy sites using Puppeteer
- ✅ Automatic fallback from HTTP scraping to browser rendering
## Tech Stack
- **Backend**: Express.js (Node.js)
- **Frontend**: Vanilla JavaScript, HTML5, CSS3
- **Web Scraping**: Cheerio + Puppeteer (for JavaScript-heavy sites)
- **Data Storage**: JSON file
## Installation
### Prerequisites
- Node.js 18+ (or Docker)
- Chromium/Chrome (for Puppeteer support, optional)
### Local Installation
1. Clone the repository or navigate to the project directory:
```bash
cd linkding
```
2. Install dependencies:
```bash
npm install
```
3. Start the server:
```bash
npm start
```
4. Open your browser to `http://localhost:3000`
### Docker Installation
1. Build the Docker image:
```bash
docker build -t linkding .
```
2. Run the container:
```bash
docker run -d \
--name linkding \
-p 3000:3000 \
-v $(pwd)/data:/app/data \
linkding
```
Or use Docker Compose:
```bash
docker-compose up -d
```
3. Access the application at `http://localhost:3000`
## Usage
1. **Add a Link**: Paste a URL into the input field and click "Add Link"
2. **Search**: Use the search bar to filter links by title, description, or URL
3. **View Links**: Browse your saved links with images, titles, and descriptions
4. **Delete Links**: Click the "Delete" button on any link card to remove it
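The search in step 2 can be pictured as a case-insensitive substring match over the three fields. The sketch below is an assumption about the matching rule rather than the server's actual code, and `filterLinks` is a hypothetical helper name.

```javascript
// Hypothetical helper: case-insensitive substring search over title,
// description, and url. The exact matching rule is an assumption.
function filterLinks(links, query) {
  const needle = String(query || '').trim().toLowerCase();
  if (!needle) return links; // an empty query keeps every link
  return links.filter((link) =>
    [link.title, link.description, link.url].some(
      (field) => typeof field === 'string' && field.toLowerCase().includes(needle)
    )
  );
}

// Made-up sample data for illustration.
const sample = [
  { title: 'USB-C Hub', description: '7-in-1 adapter', url: 'https://example.com/hub' },
  { title: 'Desk Lamp', description: 'Warm white LED', url: 'https://example.com/lamp' },
];
console.log(filterLinks(sample, 'led').map((link) => link.title));
```

Matching on a lowercased copy of each field keeps the stored data untouched while making the search case-insensitive.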
## API Endpoints
- `GET /api/links` - Get all saved links
- `GET /api/links/search?q=query` - Search links
- `POST /api/links` - Add a new link (body: `{ "url": "https://example.com" }`)
- `DELETE /api/links/:id` - Delete a link by ID
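As a rough illustration of how these endpoints might be called from a script, here is a minimal client sketch. `createLinkClient` and `fetchImpl` are hypothetical names, not part of the app; the sketch assumes Node.js 18+ (global `fetch`) and that error responses carry an `error` field, as the bundled frontend expects.

```javascript
// Hypothetical client for the endpoints above. `createLinkClient` and
// `fetchImpl` are illustrative names; the app itself does not ship this.
function createLinkClient(baseUrl, fetchImpl = globalThis.fetch) {
  const root = `${baseUrl}/api/links`;

  // Parse a JSON response and surface the server's `error` field on failure.
  async function json(response) {
    const body = await response.json();
    if (!response.ok) throw new Error(body.error || `HTTP ${response.status}`);
    return body;
  }

  return {
    list: async () => json(await fetchImpl(root)),
    search: async (q) =>
      json(await fetchImpl(`${root}/search?q=${encodeURIComponent(q)}`)),
    add: async (url) =>
      json(await fetchImpl(root, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ url }),
      })),
    // The DELETE response body is not documented here, so only the status is checked.
    remove: async (id) => {
      const response = await fetchImpl(`${root}/${encodeURIComponent(id)}`, {
        method: 'DELETE',
      });
      if (!response.ok) throw new Error(`HTTP ${response.status}`);
    },
  };
}
```

With the server running locally, `await createLinkClient('http://localhost:3000').list()` should resolve to the array of saved links. Passing `fetchImpl` explicitly makes the client easy to exercise without a network.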
## Metadata Extraction
The application automatically extracts:
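- **Title**: From JSON-LD structured data first, then Open Graph / Twitter Card tags, then the HTML `<h1>` or `<title>` tag
- **Description**: From JSON-LD structured data, then meta tags (Open Graph, Twitter Card, `description`), then product description sections or the first substantial paragraph; truncated to 500 characters
- **Images**: Prioritizes product container images, then meta tags, with smart fallbacks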
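The structured-data step can be sketched as follows. This is a simplified illustration under assumptions, not the code in `server.js`: `pickStructuredData` is a hypothetical name, and its input is assumed to be the raw text of each `<script type="application/ld+json">` block already pulled out of the page.

```javascript
// JSON-LD types that server.js looks for.
const WANTED_TYPES = new Set(['Product', 'WebPage', 'Offer']);

// Hypothetical helper: returns the first JSON-LD object with a wanted @type,
// or null when none of the blocks contains one.
function pickStructuredData(scriptContents) {
  for (const raw of scriptContents) {
    let parsed;
    try {
      parsed = JSON.parse(raw);
    } catch {
      continue; // malformed JSON-LD is skipped
    }
    // A block may hold a single object or an array of objects.
    const items = Array.isArray(parsed) ? parsed : [parsed];
    const match = items.find((item) => item && WANTED_TYPES.has(item['@type']));
    if (match) return match;
  }
  return null;
}
```

A caller would then read `name` and `description` from the returned object and fall back to meta tags when the result is `null` or a field is missing.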
### Image Extraction Priority
1. Product container images (`.product-container img`, etc.)
2. Product-specific image containers
3. Open Graph / Twitter Card meta tags
4. JSON-LD structured data
5. Galaxus-specific selectors (only for `galaxus.` URLs)
6. Generic product selectors
7. Fallback to the first image that does not look like a logo, icon, avatar, or tracking pixel
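A priority list like this amounts to a "first strategy that yields a result wins" loop. The sketch below shows that pattern in isolation; the `page` object and its field names are made up for illustration, and only a few of the steps are represented.

```javascript
// Hypothetical sketch: each strategy inspects a pre-parsed `page` object and
// returns an image URL or a falsy value. The field names are invented.
const imageStrategies = [
  ['product container', (page) => page.productContainerImage],
  ['open graph', (page) => page.ogImage],
  ['json-ld', (page) => page.jsonLdImage],
  ['first meaningful image', (page) =>
    (page.images || []).find((src) => !/logo|icon|avatar|spacer|pixel/.test(src))],
];

// Walk the strategies in order and stop at the first one that yields a URL.
function pickImage(page, strategies = imageStrategies) {
  for (const [source, strategy] of strategies) {
    const image = strategy(page);
    if (image) return { source, image };
  }
  return { source: null, image: '' };
}
```

Keeping the strategies in an ordered array makes the priority explicit, so reordering or adding a site-specific step is a one-line change.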
## Environment Variables
- `PORT` - Server port (default: 3000)
- `CHROME_EXECUTABLE_PATH` - Path to Chrome/Chromium executable (for Puppeteer)
- `NODE_ENV` - Environment mode (production/development)
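One way to read these variables in a single place is sketched below. `loadConfig` is a hypothetical helper, and the port validation is an addition for illustration; the app itself simply falls back to 3000 when `PORT` is unset.

```javascript
// Hypothetical helper that mirrors the variables listed above.
function loadConfig(env = process.env) {
  // Fall back to 3000 when PORT is unset or empty.
  const port = Number.parseInt(env.PORT || '3000', 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return {
    port,
    // null means "let the app search for a system Chromium"
    chromeExecutablePath: env.CHROME_EXECUTABLE_PATH || null,
    isProduction: env.NODE_ENV === 'production',
  };
}

console.log(loadConfig({ PORT: '8080', NODE_ENV: 'production' }));
```

Taking `env` as a parameter keeps the function testable without touching the real process environment.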
## Data Storage
Links are stored in `data/links.json`. Make sure this directory exists and is writable. When using Docker, mount the `data` directory as a volume for persistence.
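For readers curious what a JSON-file store involves, here is a minimal sketch. `createLinkStore` is a hypothetical name, and the write-to-temp-then-rename step is an addition that the app may not perform; it is shown because it avoids leaving a half-written file behind if the process stops mid-write.

```javascript
const fs = require('fs/promises');
const path = require('path');

// Hypothetical store factory; the atomic rename is an addition, not app behaviour.
function createLinkStore(file) {
  return {
    async read() {
      try {
        return JSON.parse(await fs.readFile(file, 'utf8'));
      } catch (error) {
        if (error.code === 'ENOENT') return []; // no file yet means no links
        throw error;
      }
    },
    async write(links) {
      await fs.mkdir(path.dirname(file), { recursive: true });
      const tmp = `${file}.tmp`;
      await fs.writeFile(tmp, JSON.stringify(links, null, 2));
      await fs.rename(tmp, file); // replace the old file in one step
    },
  };
}
```

A store for the default location would be created with `createLinkStore(path.join(__dirname, 'data', 'links.json'))`.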
## Troubleshooting
### Puppeteer Issues
If you encounter issues with Puppeteer:
1. **NixOS**: The app uses `puppeteer-core` and automatically detects system Chromium
2. **Docker**: Chromium is included in the Docker image
3. **Manual Setup**: Set `CHROME_EXECUTABLE_PATH` environment variable to your Chromium path
### 403 Errors
Some sites block automated requests. The app automatically:
- First tries a plain HTTP request with realistic browser headers
- Falls back to Puppeteer (using system Chromium) if the site answers with 403 or 429
- Retries once with a simpler header set if Puppeteer is unavailable or fails
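The fallback behaviour can be sketched as a loop over fetch strategies. This is a simplified illustration, not the app's code: `fetchHtmlWithFallback` and the `{ name, run }` strategy shape are hypothetical, and each `run(url)` is assumed to resolve to `{ status, html }`.

```javascript
// Statuses treated as "blocked", which trigger the next strategy.
const BLOCKED_STATUSES = new Set([403, 429]);

// Hypothetical helper: try each strategy in order until one returns HTML.
async function fetchHtmlWithFallback(url, strategies) {
  let lastProblem = 'no strategies configured';
  for (const { name, run } of strategies) {
    let result;
    try {
      result = await run(url);
    } catch (error) {
      lastProblem = `${name} failed: ${error.message}`;
      continue; // e.g. Chromium missing: move on to the next strategy
    }
    if (result.status === 200) return { via: name, html: result.html };
    if (!BLOCKED_STATUSES.has(result.status)) {
      // Other statuses (404, 500, ...) are not worth retrying.
      throw new Error(`Request failed with status code ${result.status}`);
    }
    lastProblem = `${name} was blocked with status ${result.status}`;
  }
  throw new Error(`Could not fetch ${url}: ${lastProblem}`);
}
```

In this picture the strategies would be ordered as plain HTTP, then Puppeteer, then HTTP with simpler headers.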
## Development
```bash
# Install dependencies
npm install
# Run in development mode with auto-reload
npm run dev
# Start production server
npm start
```
## License
ISC

23
docker-compose.yml Normal file

@@ -0,0 +1,23 @@
version: '3.8'
services:
  linkding:
    build: .
    container_name: linkding
    image: wirelos/linkding:latest
    ports:
      - "3000:3000"
    volumes:
      - ./data:/app/data
    environment:
      - NODE_ENV=production
      - PORT=3000
      - CHROME_EXECUTABLE_PATH=/usr/bin/chromium
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "node", "-e", "require('http').get('http://localhost:3000/api/links', (r) => {process.exit(r.statusCode === 200 ? 0 : 1)})"]
      interval: 30s
      timeout: 3s
      retries: 3
      start_period: 5s

2751
package-lock.json generated Normal file

File diff suppressed because it is too large

27
package.json Normal file

@@ -0,0 +1,27 @@
{
"name": "linkding",
"version": "1.0.0",
"description": "A modern link bookmarking app with metadata extraction",
"main": "server.js",
"scripts": {
"start": "node server.js",
"dev": "nodemon server.js"
},
"keywords": [
"bookmark",
"links",
"scraping"
],
"author": "",
"license": "ISC",
"dependencies": {
"axios": "^1.6.0",
"cheerio": "^1.0.0-rc.12",
"cors": "^2.8.5",
"express": "^4.18.2",
"puppeteer-core": "^22.15.0"
},
"devDependencies": {
"nodemon": "^3.0.1"
}
}

208
public/app.js Normal file

@@ -0,0 +1,208 @@
const API_BASE = '/api/links';
// DOM elements
const linkForm = document.getElementById('linkForm');
const linkInput = document.getElementById('linkInput');
const addButton = document.getElementById('addButton');
const searchInput = document.getElementById('searchInput');
const linksContainer = document.getElementById('linksContainer');
const messageDiv = document.getElementById('message');
// State
let allLinks = [];
let searchTimeout = null;
// Initialize app
document.addEventListener('DOMContentLoaded', () => {
loadLinks();
setupEventListeners();
});
// Event listeners
function setupEventListeners() {
linkForm.addEventListener('submit', handleAddLink);
searchInput.addEventListener('input', handleSearch);
}
// Load all links
async function loadLinks() {
try {
const response = await fetch(API_BASE);
if (!response.ok) throw new Error('Failed to load links');
allLinks = await response.json();
displayLinks(allLinks);
} catch (error) {
showMessage('Failed to load links', 'error');
console.error('Error loading links:', error);
}
}
// Handle add link form submission
async function handleAddLink(e) {
e.preventDefault();
const url = linkInput.value.trim();
if (!url) return;
// Disable form
addButton.disabled = true;
const btnText = addButton.querySelector('.btn-text');
const btnLoader = addButton.querySelector('.btn-loader');
btnText.style.display = 'none';
btnLoader.style.display = 'inline';
try {
const response = await fetch(API_BASE, {
method: 'POST',
headers: {
'Content-Type': 'application/json'
},
body: JSON.stringify({ url })
});
const data = await response.json();
if (!response.ok) {
throw new Error(data.error || 'Failed to add link');
}
// Add to beginning of array
allLinks.unshift(data);
displayLinks(allLinks);
// Clear input
linkInput.value = '';
showMessage('Link added successfully!', 'success');
} catch (error) {
showMessage(error.message, 'error');
console.error('Error adding link:', error);
} finally {
// Re-enable form
addButton.disabled = false;
btnText.style.display = 'inline';
btnLoader.style.display = 'none';
}
}
// Handle search input
function handleSearch(e) {
const query = e.target.value.trim();
// Debounce search
clearTimeout(searchTimeout);
searchTimeout = setTimeout(async () => {
if (!query) {
displayLinks(allLinks);
return;
}
try {
const response = await fetch(`${API_BASE}/search?q=${encodeURIComponent(query)}`);
if (!response.ok) throw new Error('Search failed');
const filteredLinks = await response.json();
displayLinks(filteredLinks);
} catch (error) {
showMessage('Search failed', 'error');
console.error('Error searching:', error);
}
}, 300);
}
// Display links
function displayLinks(links) {
if (links.length === 0) {
linksContainer.innerHTML = `
<div class="empty-state">
<p>No links found. ${allLinks.length === 0 ? 'Add your first link to get started!' : 'Try a different search term.'}</p>
</div>
`;
return;
}
linksContainer.innerHTML = links.map(link => createLinkCard(link)).join('');
// Add delete event listeners
document.querySelectorAll('.delete-btn').forEach(btn => {
btn.addEventListener('click', (e) => {
const linkId = e.target.closest('.link-card').dataset.id;
handleDeleteLink(linkId);
});
});
}
// Create link card HTML
function createLinkCard(link) {
const date = new Date(link.createdAt);
const formattedDate = date.toLocaleDateString('en-US', {
year: 'numeric',
month: 'short',
day: 'numeric'
});
const imageHtml = link.image
? `<img src="${escapeHtml(link.image)}" alt="${escapeHtml(link.title)}" onerror="this.parentElement.textContent='No image available'">`
: '<span>No image available</span>';
return `
<div class="link-card" data-id="${link.id}">
<div class="link-image">
${imageHtml}
</div>
<div class="link-content">
<h3 class="link-title">${escapeHtml(link.title)}</h3>
${link.description ? `<p class="link-description">${escapeHtml(link.description)}</p>` : ''}
<a href="${escapeHtml(link.url)}" target="_blank" rel="noopener noreferrer" class="link-url">
${escapeHtml(link.url)}
</a>
<div class="link-footer">
<span class="link-date">${formattedDate}</span>
<button class="delete-btn">Delete</button>
</div>
</div>
</div>
`;
}
// Handle delete link
async function handleDeleteLink(id) {
if (!confirm('Are you sure you want to delete this link?')) {
return;
}
try {
const response = await fetch(`${API_BASE}/${id}`, {
method: 'DELETE'
});
if (!response.ok) throw new Error('Failed to delete link');
// Remove from local array
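// dataset.id is always a string, so compare as strings in case the server uses numeric ids
allLinks = allLinks.filter(link => String(link.id) !== String(id));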
displayLinks(allLinks);
showMessage('Link deleted successfully', 'success');
} catch (error) {
showMessage('Failed to delete link', 'error');
console.error('Error deleting link:', error);
}
}
// Show message
function showMessage(text, type = 'success') {
messageDiv.textContent = text;
messageDiv.className = `message ${type}`;
messageDiv.style.display = 'block';
setTimeout(() => {
messageDiv.style.display = 'none';
}, 3000);
}
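// Escape HTML to prevent XSS. Quotes are escaped as well because the result
// is also interpolated into attribute values (src, alt, href).
function escapeHtml(text) {
  return String(text ?? '')
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}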

55
public/index.html Normal file

@@ -0,0 +1,55 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>LinkDing - Your Link Collection</title>
<link rel="stylesheet" href="styles.css">
</head>
<body>
<div class="container">
<header>
<h1>🔗 LinkDing</h1>
<p class="subtitle">Collect and organize your favorite links</p>
</header>
<div class="input-section">
<form id="linkForm" class="link-form">
<div class="input-group">
<input
type="url"
id="linkInput"
placeholder="Paste a link here..."
required
autocomplete="off"
>
<button type="submit" id="addButton" class="btn-primary">
<span class="btn-text">Add Link</span>
<span class="btn-loader" style="display: none;"></span>
</button>
</div>
</form>
<div class="search-section">
<input
type="text"
id="searchInput"
placeholder="🔍 Search links by title, description, or URL..."
autocomplete="off"
>
</div>
</div>
<div id="message" class="message" style="display: none;"></div>
<div id="linksContainer" class="links-container">
<div class="empty-state">
<p>No links yet. Add your first link to get started!</p>
</div>
</div>
</div>
<script src="app.js"></script>
</body>
</html>

340
public/styles.css Normal file

@@ -0,0 +1,340 @@
* {
margin: 0;
padding: 0;
box-sizing: border-box;
}
:root {
--primary-color: #6366f1;
--primary-hover: #4f46e5;
--secondary-color: #8b5cf6;
--background: #0f172a;
--surface: #1e293b;
--surface-light: #334155;
--text-primary: #f1f5f9;
--text-secondary: #cbd5e1;
--text-muted: #94a3b8;
--border: #334155;
--success: #10b981;
--error: #ef4444;
--shadow: rgba(0, 0, 0, 0.3);
--shadow-lg: rgba(0, 0, 0, 0.5);
}
body {
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, sans-serif;
background: linear-gradient(135deg, var(--background) 0%, #1a1f3a 100%);
color: var(--text-primary);
min-height: 100vh;
padding: 2rem 1rem;
line-height: 1.6;
}
.container {
max-width: 1200px;
margin: 0 auto;
}
header {
text-align: center;
margin-bottom: 3rem;
animation: fadeIn 0.6s ease-out;
}
header h1 {
font-size: 3rem;
font-weight: 700;
background: linear-gradient(135deg, var(--primary-color), var(--secondary-color));
-webkit-background-clip: text;
-webkit-text-fill-color: transparent;
background-clip: text;
margin-bottom: 0.5rem;
}
.subtitle {
color: var(--text-secondary);
font-size: 1.1rem;
}
.input-section {
margin-bottom: 2rem;
animation: fadeIn 0.8s ease-out;
}
.link-form {
margin-bottom: 1.5rem;
}
.input-group {
display: flex;
gap: 1rem;
background: var(--surface);
padding: 0.5rem;
border-radius: 12px;
border: 1px solid var(--border);
box-shadow: 0 4px 6px var(--shadow);
transition: all 0.3s ease;
}
.input-group:focus-within {
border-color: var(--primary-color);
box-shadow: 0 0 0 3px rgba(99, 102, 241, 0.1);
}
#linkInput {
flex: 1;
background: transparent;
border: none;
color: var(--text-primary);
font-size: 1rem;
padding: 1rem;
outline: none;
}
#linkInput::placeholder {
color: var(--text-muted);
}
.btn-primary {
background: linear-gradient(135deg, var(--primary-color), var(--secondary-color));
color: white;
border: none;
padding: 1rem 2rem;
border-radius: 8px;
font-size: 1rem;
font-weight: 600;
cursor: pointer;
transition: all 0.3s ease;
white-space: nowrap;
box-shadow: 0 4px 6px var(--shadow);
}
.btn-primary:hover:not(:disabled) {
transform: translateY(-2px);
box-shadow: 0 6px 12px var(--shadow-lg);
}
.btn-primary:active:not(:disabled) {
transform: translateY(0);
}
.btn-primary:disabled {
opacity: 0.6;
cursor: not-allowed;
}
.search-section {
background: var(--surface);
padding: 0.5rem;
border-radius: 12px;
border: 1px solid var(--border);
box-shadow: 0 4px 6px var(--shadow);
}
#searchInput {
width: 100%;
background: transparent;
border: none;
color: var(--text-primary);
font-size: 1rem;
padding: 1rem;
outline: none;
}
#searchInput::placeholder {
color: var(--text-muted);
}
.message {
padding: 1rem;
border-radius: 8px;
margin-bottom: 1.5rem;
animation: slideDown 0.3s ease-out;
}
.message.success {
background: rgba(16, 185, 129, 0.2);
border: 1px solid var(--success);
color: var(--success);
}
.message.error {
background: rgba(239, 68, 68, 0.2);
border: 1px solid var(--error);
color: var(--error);
}
.links-container {
display: grid;
grid-template-columns: repeat(auto-fill, minmax(320px, 1fr));
gap: 1.5rem;
animation: fadeIn 1s ease-out;
}
.empty-state {
grid-column: 1 / -1;
text-align: center;
padding: 4rem 2rem;
color: var(--text-muted);
font-size: 1.1rem;
}
.link-card {
background: var(--surface);
border: 1px solid var(--border);
border-radius: 16px;
overflow: hidden;
transition: all 0.3s ease;
box-shadow: 0 4px 6px var(--shadow);
display: flex;
flex-direction: column;
animation: fadeInUp 0.5s ease-out;
}
.link-card:hover {
transform: translateY(-4px);
box-shadow: 0 8px 16px var(--shadow-lg);
border-color: var(--primary-color);
}
.link-image {
width: 100%;
height: 200px;
object-fit: cover;
background: var(--surface-light);
display: flex;
align-items: center;
justify-content: center;
color: var(--text-muted);
font-size: 0.9rem;
}
.link-image img {
width: 100%;
height: 100%;
object-fit: cover;
}
.link-content {
padding: 1.5rem;
flex: 1;
display: flex;
flex-direction: column;
}
.link-title {
font-size: 1.25rem;
font-weight: 600;
margin-bottom: 0.75rem;
color: var(--text-primary);
line-height: 1.4;
display: -webkit-box;
-webkit-line-clamp: 2;
-webkit-box-orient: vertical;
overflow: hidden;
}
.link-description {
color: var(--text-secondary);
font-size: 0.95rem;
margin-bottom: 1rem;
flex: 1;
display: -webkit-box;
-webkit-line-clamp: 3;
-webkit-box-orient: vertical;
overflow: hidden;
line-height: 1.5;
}
.link-url {
color: var(--primary-color);
font-size: 0.85rem;
text-decoration: none;
margin-bottom: 1rem;
word-break: break-all;
transition: color 0.3s ease;
}
.link-url:hover {
color: var(--primary-hover);
text-decoration: underline;
}
.link-footer {
display: flex;
justify-content: space-between;
align-items: center;
padding-top: 1rem;
border-top: 1px solid var(--border);
}
.link-date {
color: var(--text-muted);
font-size: 0.85rem;
}
.delete-btn {
background: rgba(239, 68, 68, 0.2);
color: var(--error);
border: 1px solid var(--error);
padding: 0.5rem 1rem;
border-radius: 6px;
font-size: 0.85rem;
cursor: pointer;
transition: all 0.3s ease;
}
.delete-btn:hover {
background: var(--error);
color: white;
}
@keyframes fadeIn {
from {
opacity: 0;
}
to {
opacity: 1;
}
}
@keyframes fadeInUp {
from {
opacity: 0;
transform: translateY(20px);
}
to {
opacity: 1;
transform: translateY(0);
}
}
@keyframes slideDown {
from {
opacity: 0;
transform: translateY(-10px);
}
to {
opacity: 1;
transform: translateY(0);
}
}
@media (max-width: 768px) {
header h1 {
font-size: 2rem;
}
.input-group {
flex-direction: column;
}
.btn-primary {
width: 100%;
}
.links-container {
grid-template-columns: 1fr;
}
}

682
server.js Normal file

@@ -0,0 +1,682 @@
const express = require('express');
const cors = require('cors');
const fs = require('fs').promises;
const path = require('path');
const axios = require('axios');
const cheerio = require('cheerio');
// Lazy load puppeteer (only if needed)
let puppeteer = null;
let puppeteerAvailable = null;
async function getPuppeteer() {
if (puppeteerAvailable === false) {
return null; // Already tried and failed
}
if (!puppeteer) {
try {
puppeteer = require('puppeteer-core');
puppeteerAvailable = true;
console.log('Puppeteer-core loaded successfully');
} catch (e) {
console.warn('Puppeteer-core not available:', e.message);
puppeteerAvailable = false;
return null;
}
}
return puppeteer;
}
// Find system Chromium/Chrome executable
function findChromeExecutable() {
const { execSync } = require('child_process');
// Check environment variable first
if (process.env.CHROME_EXECUTABLE_PATH) {
return process.env.CHROME_EXECUTABLE_PATH;
}
// Try which command for common names
const commands = ['chromium', 'chromium-browser', 'google-chrome', 'google-chrome-stable'];
for (const cmd of commands) {
try {
const result = execSync(`which ${cmd} 2>/dev/null`, { encoding: 'utf8' }).trim();
if (result) {
return result;
}
} catch (e) {
// Continue to next command
}
}
// Try common NixOS paths
try {
const nixPaths = execSync('find /nix/store -name chromium -type f -executable 2>/dev/null | head -1', { encoding: 'utf8' }).trim();
if (nixPaths) return nixPaths;
} catch (e) {
// Ignore
}
return null;
}
const app = express();
const PORT = process.env.PORT || 3000;
const DATA_FILE = path.join(__dirname, 'data', 'links.json');
// Middleware
app.use(cors());
app.use(express.json());
app.use(express.static('public'));
// Ensure data directory exists
async function ensureDataDir() {
const dataDir = path.dirname(DATA_FILE);
try {
await fs.access(dataDir);
} catch {
await fs.mkdir(dataDir, { recursive: true });
}
try {
await fs.access(DATA_FILE);
} catch {
await fs.writeFile(DATA_FILE, JSON.stringify([]));
}
}
// Read links from file
async function readLinks() {
try {
const data = await fs.readFile(DATA_FILE, 'utf8');
return JSON.parse(data);
} catch (error) {
return [];
}
}
// Write links to file
async function writeLinks(links) {
await fs.writeFile(DATA_FILE, JSON.stringify(links, null, 2));
}
// Extract metadata using Puppeteer (for JavaScript-heavy sites)
async function extractMetadataWithPuppeteer(url) {
const pptr = await getPuppeteer();
if (!pptr) {
throw new Error('Puppeteer not available');
}
let browser = null;
try {
console.log('Launching Puppeteer browser...');
// Find system Chrome/Chromium executable
const executablePath = findChromeExecutable();
if (!executablePath) {
throw new Error('Chrome/Chromium not found. Install Chromium or set the CHROME_EXECUTABLE_PATH environment variable.');
}
console.log(`Using Chrome executable: ${executablePath}`);
browser = await pptr.launch({
headless: true, // puppeteer-core 22 uses the new headless mode for `true`
executablePath: executablePath,
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--disable-blink-features=AutomationControlled',
'--disable-features=IsolateOrigins,site-per-process',
'--disable-gpu'
]
});
const page = await browser.newPage();
// Set realistic viewport and user agent
await page.setViewport({ width: 1920, height: 1080 });
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36');
// Add extra headers to look more like a real browser
await page.setExtraHTTPHeaders({
'Accept-Language': 'en-US,en;q=0.9,de;q=0.8',
'Accept-Encoding': 'gzip, deflate, br',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8',
'Connection': 'keep-alive',
'Upgrade-Insecure-Requests': '1'
});
console.log(`Navigating to ${url}...`);
// Navigate to the page with longer timeout
await page.goto(url, {
waitUntil: 'networkidle2',
timeout: 60000
});
// Helper function to wait (replacement for deprecated waitForTimeout)
const wait = (ms) => new Promise(resolve => setTimeout(resolve, ms));
// Wait a bit for any lazy-loaded content and images
console.log('Waiting for content to load...');
await wait(3000);
// Scroll a bit to trigger lazy loading
await page.evaluate(() => {
window.scrollTo(0, 300);
});
await wait(1000);
// Get the rendered HTML
console.log('Extracting HTML content...');
const html = await page.content();
await browser.close();
console.log('Browser closed, processing HTML...');
// Use the same extraction logic as the regular function
return await extractMetadataFromHTML(html, url);
} catch (error) {
if (browser) {
try {
await browser.close();
} catch (e) {
// Ignore close errors
}
}
console.error('Puppeteer extraction error:', error.message);
throw error;
}
}
// Common extraction logic that works with HTML string
async function extractMetadataFromHTML(html, url) {
const $ = cheerio.load(html);
const urlObj = new URL(url);
// Try to extract JSON-LD structured data (common in e-commerce sites)
let jsonLdData = null;
$('script[type="application/ld+json"]').each(function() {
try {
const content = $(this).html();
let jsonData = JSON.parse(content);
// Handle arrays of structured data
if (Array.isArray(jsonData)) {
jsonData = jsonData.find(item =>
item['@type'] === 'Product' ||
item['@type'] === 'WebPage' ||
item['@type'] === 'Offer'
) || jsonData[0];
}
if (jsonData && (jsonData['@type'] === 'Product' || jsonData['@type'] === 'WebPage' || jsonData['@type'] === 'Offer')) {
jsonLdData = jsonData;
return false; // break
}
} catch (e) {
// Ignore parse errors
}
});
// Extract title with priority order
let title = '';
if (jsonLdData) {
title = jsonLdData.name || jsonLdData.headline || jsonLdData.title;
}
if (!title) {
title = $('meta[property="og:title"]').attr('content') ||
$('meta[name="twitter:title"]').attr('content') ||
$('h1').first().text().trim() ||
$('title').text().trim() ||
'';
}
title = title || 'Untitled';
// Extract description with priority order
let description = '';
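if (jsonLdData) {
// `about` can be an object in JSON-LD, so only accept plain strings
const candidate = jsonLdData.description || jsonLdData.about;
description = typeof candidate === 'string' ? candidate : '';
}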
if (!description) {
description = $('meta[property="og:description"]').attr('content') ||
$('meta[name="twitter:description"]').attr('content') ||
$('meta[name="description"]').attr('content') ||
'';
}
// If still no description, try to find product description sections
if (!description) {
// Try common product description selectors
const descSelectors = [
'[data-testid="product-description"]',
'.product-description',
'.description',
'[itemprop="description"]',
'section[aria-label*="description" i]',
'section[aria-label*="beschreibung" i]' // German
];
for (const selector of descSelectors) {
const descText = $(selector).first().text().trim();
if (descText && descText.length > 20) {
description = descText;
break;
}
}
}
// Fallback to first paragraph if still no description
if (!description) {
$('p').each(function() {
const text = $(this).text().trim();
if (text.length > 50 && text.length < 1000) {
description = text;
return false; // break
}
});
}
// Extract image with multiple strategies
let image = '';
// Helper function to extract image source from an img element
const extractImgSrc = (img) => {
return img.attr('src') ||
img.attr('data-src') ||
img.attr('data-lazy-src') ||
img.attr('data-original') ||
img.attr('data-image') ||
img.attr('data-lazy') ||
img.attr('data-url');
};
// Helper function to extract best image from srcset
const extractFromSrcset = (img) => {
if (!img.attr('srcset')) return null;
const srcset = img.attr('srcset');
// Extract the largest image from srcset (usually the last one).
// Descriptors are widths ("800w") or pixel densities ("2x", "1.5x").
const srcsetMatches = srcset.match(/([^\s,]+)\s+(\d+w|\d+(?:\.\d+)?x)/g);
if (srcsetMatches && srcsetMatches.length > 0) {
// Get the last entry which is usually the highest resolution
const lastMatch = srcsetMatches[srcsetMatches.length - 1];
const srcMatch = lastMatch.match(/^([^\s]+)/);
if (srcMatch) {
return srcMatch[1];
}
} else {
// Fallback: just get first URL from srcset
const srcsetMatch = srcset.match(/^([^\s,]+)/);
if (srcsetMatch) {
return srcsetMatch[1];
}
}
return null;
};
// Priority 1: Product container images (most specific - check BEFORE meta tags)
const productContainerSelectors = [
'.product-container img',
'[class*="product-container" i] img',
'#product-container img',
'.product-container picture img',
'[class*="product-container" i] picture img'
];
for (const selector of productContainerSelectors) {
const imgs = $(selector);
if (imgs.length > 0) {
// Try to find the main product image (usually the first one that's not a thumbnail)
for (let i = 0; i < imgs.length; i++) {
const img = $(imgs[i]);
const src = extractImgSrc(img);
if (src && !src.includes('thumb') && !src.includes('thumbnail') && !src.includes('icon')) {
image = extractFromSrcset(img) || src;
break;
}
}
// If no good image found, just take the first one
if (!image && imgs.length > 0) {
const firstImg = $(imgs[0]);
image = extractFromSrcset(firstImg) || extractImgSrc(firstImg);
}
if (image) break;
}
}
// Priority 2: Other product-specific containers (before meta tags)
if (!image) {
const productImageContainers = [
'[data-testid="product-image"] img',
'[data-testid="productImage"] img',
'.product-image img',
'.product-gallery img',
'.product__image img',
'.product-images img',
'[class*="product-image" i] img',
'[class*="product-gallery" i] img',
'[id*="product-image" i] img'
];
for (const selector of productImageContainers) {
const img = $(selector).first();
if (img.length) {
const imgSrc = extractImgSrc(img);
if (imgSrc) {
image = extractFromSrcset(img) || imgSrc;
if (image) break;
}
}
}
}
// Priority 3: Try Open Graph and Twitter Card images (after product containers)
if (!image) {
image = $('meta[property="og:image"]').attr('content') ||
$('meta[name="twitter:image"]').attr('content') ||
$('meta[name="twitter:image:src"]').attr('content');
}
// Priority 4: Try JSON-LD image
if (!image && jsonLdData && jsonLdData.image) {
// `image` may be a string, an ImageObject, or an array of either
const first = Array.isArray(jsonLdData.image) ? jsonLdData.image[0] : jsonLdData.image;
if (typeof first === 'string') {
image = first;
} else if (first && typeof first.url === 'string') {
image = first.url;
}
}
// Priority 5: Galaxus-specific (keep existing logic)
if (!image) {
const isGalaxus = url.includes('galaxus.');
if (isGalaxus) {
const galaxusImg = $('img[alt*="Produktbild" i], img[alt*="Produkt" i]').first();
if (galaxusImg.length) {
image = extractImgSrc(galaxusImg);
}
if (!image) {
const galleryImg = $('[class*="product" i] img, [class*="image" i] img, [class*="gallery" i] img').first();
if (galleryImg.length) {
image = extractImgSrc(galleryImg);
}
}
}
}
// Priority 6: Generic product selectors
if (!image) {
const genericSelectors = [
'[itemprop="image"]',
'picture img',
'figure img',
'main img',
'[role="img"]',
'article img',
'[class*="main-image" i] img',
'[id*="main-image" i] img'
];
for (const selector of genericSelectors) {
const img = $(selector).first();
if (img.length) {
const imgSrc = extractImgSrc(img);
if (imgSrc &&
!imgSrc.includes('logo') &&
!imgSrc.includes('icon') &&
!imgSrc.includes('avatar') &&
!imgSrc.includes('spacer') &&
!imgSrc.includes('pixel')) {
image = extractFromSrcset(img) || imgSrc;
if (image) break;
}
}
}
}
// Fallback to first meaningful image
if (!image) {
$('img').each(function() {
const img = $(this);
const src = img.attr('src') ||
img.attr('data-src') ||
img.attr('data-lazy-src');
// Skip very small images, icons, and logos
if (src &&
!src.includes('logo') &&
!src.includes('icon') &&
!src.includes('avatar') &&
!src.includes('spacer') &&
!src.includes('pixel')) {
image = src;
return false; // break
}
});
}
// Convert relative URLs to absolute
if (image && !image.startsWith('http')) {
if (image.startsWith('//')) {
image = urlObj.protocol + image;
} else if (image.startsWith('/')) {
image = urlObj.origin + image;
} else {
image = new URL(image, url).href;
}
}
// Clean up title and description
title = title.trim().replace(/\s+/g, ' ');
description = description.trim().replace(/\s+/g, ' ').substring(0, 500);
return {
title: title,
description: description,
image: image
};
}
// Extract metadata from URL
async function extractMetadata(url) {
try {
const urlObj = new URL(url);
// More realistic browser headers to avoid 403 errors
const headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
'Accept-Language': 'en-US,en;q=0.9,de;q=0.8',
'Accept-Encoding': 'gzip, deflate, br',
'DNT': '1',
'Connection': 'keep-alive',
'Upgrade-Insecure-Requests': '1',
'Sec-Fetch-Dest': 'document',
'Sec-Fetch-Mode': 'navigate',
'Sec-Fetch-Site': 'none',
'Sec-Fetch-User': '?1',
'Cache-Control': 'max-age=0',
'Referer': urlObj.origin + '/'
};
const response = await axios.get(url, {
headers: headers,
timeout: 20000,
maxRedirects: 5,
validateStatus: function (status) {
return status >= 200 && status < 500; // Don't throw on 403, we'll handle it
}
});
// Check if we got blocked - use Puppeteer as fallback
if (response.status === 403 || response.status === 429) {
console.log(`Received ${response.status} status, trying Puppeteer fallback...`);
const pptr = await getPuppeteer();
if (pptr) {
try {
console.log('Using Puppeteer to extract metadata...');
return await extractMetadataWithPuppeteer(url);
} catch (puppeteerError) {
console.error('Puppeteer extraction failed:', puppeteerError.message);
// Fall through to retry with simpler headers
}
}
// Fallback: try simpler headers if Puppeteer not available or failed
console.log('Trying with simpler headers...');
const retryHeaders = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language': 'en-US,en;q=0.9'
};
const retryResponse = await axios.get(url, {
headers: retryHeaders,
timeout: 20000,
maxRedirects: 5,
validateStatus: function (status) {
return status >= 200 && status < 500;
}
});
if (retryResponse.status === 403 || retryResponse.status === 429) {
throw new Error('Site is blocking requests. Try again later; the site may require JavaScript rendering.');
}
if (retryResponse.status !== 200) {
throw new Error(`Request failed with status code ${retryResponse.status}`);
}
// Use shared extraction function
return await extractMetadataFromHTML(retryResponse.data, url);
} else if (response.status !== 200) {
throw new Error(`Request failed with status code ${response.status}`);
}
// Use shared extraction function
return await extractMetadataFromHTML(response.data, url);
} catch (error) {
console.error('Error extracting metadata:', error.message);
return {
title: 'Error loading page',
description: 'Could not extract metadata from this URL',
image: ''
};
}
}
// API Routes
// Get all links
app.get('/api/links', async (req, res) => {
try {
const links = await readLinks();
res.json(links);
} catch (error) {
res.status(500).json({ error: 'Failed to read links' });
}
});
// Search links
app.get('/api/links/search', async (req, res) => {
try {
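// req.query.q is an array when ?q= is repeated; treat anything but a string as empty
const query = typeof req.query.q === 'string' ? req.query.q.trim().toLowerCase() : '';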
const links = await readLinks();
if (!query) {
return res.json(links);
}
const filtered = links.filter(link => {
const titleMatch = link.title?.toLowerCase().includes(query);
const descMatch = link.description?.toLowerCase().includes(query);
const urlMatch = link.url?.toLowerCase().includes(query);
return titleMatch || descMatch || urlMatch;
});
res.json(filtered);
} catch (error) {
res.status(500).json({ error: 'Failed to search links' });
}
});
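// Sketch only: the search predicate from the route above as a pure function,
// which makes the matching rules testable without Express or the data file.
// `linkMatchesQuery` is a hypothetical name; it assumes links are plain objects
// with optional string `title`, `description`, and `url` fields.
function linkMatchesQuery(link, query) {
  const needle = String(query ?? '').trim().toLowerCase();
  // An empty query matches every link, mirroring the route's early return
  if (!needle) {
    return true;
  }
  return ['title', 'description', 'url'].some(field => {
    const value = link?.[field];
    return typeof value === 'string' && value.toLowerCase().includes(needle);
  });
}
// Possible use in the route: const filtered = links.filter(link => linkMatchesQuery(link, query));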
// Add a new link
app.post('/api/links', async (req, res) => {
try {
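// req.body is undefined when no JSON body was parsed; require a string URL
const { url } = req.body || {};
if (typeof url !== 'string' || !isValidUrl(url)) {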
return res.status(400).json({ error: 'Invalid URL' });
}
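// Check if link already exists before starting the slow metadata fetch
const isDuplicate = link => link.url === url;
if ((await readLinks()).some(isDuplicate)) {
return res.status(409).json({ error: 'Link already exists' });
}
// Extract metadata
const metadata = await extractMetadata(url);
// Create new link
const newLink = {
id: Date.now().toString(),
url: url,
title: metadata.title,
description: metadata.description,
image: metadata.image,
createdAt: new Date().toISOString()
};
// Re-read the list so links added or deleted while the fetch was running are
// not overwritten. This narrows the race window but does not eliminate it.
const links = await readLinks();
if (links.some(isDuplicate)) {
return res.status(409).json({ error: 'Link already exists' });
}
links.unshift(newLink); // Add to beginning
await writeLinks(links);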
res.status(201).json(newLink);
} catch (error) {
console.error('Error adding link:', error);
res.status(500).json({ error: 'Failed to add link' });
}
});
// Delete a link
app.delete('/api/links/:id', async (req, res) => {
try {
const { id } = req.params;
const links = await readLinks();
const filtered = links.filter(link => link.id !== id);
if (filtered.length === links.length) {
return res.status(404).json({ error: 'Link not found' });
}
await writeLinks(filtered);
res.json({ message: 'Link deleted successfully' });
} catch (error) {
res.status(500).json({ error: 'Failed to delete link' });
}
});
// Helper function to validate URL
function isValidUrl(string) {
try {
const url = new URL(string);
return url.protocol === 'http:' || url.protocol === 'https:';
} catch (_) {
return false;
}
}
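// Sketch only: the duplicate check in POST /api/links compares URLs with ===,
// so "https://Example.com/shoes/" and "https://example.com/shoes#reviews" count
// as different links. A normalizing helper along these lines could tighten that.
// `normalizeUrlForComparison` is a hypothetical name, and treating fragments and
// trailing slashes as insignificant is an assumption that may not suit every site.
function normalizeUrlForComparison(input) {
  try {
    const parsed = new URL(input); // Lowercases the scheme and host
    parsed.hash = ''; // Fragments never reach the server
    if (parsed.pathname.length > 1 && parsed.pathname.endsWith('/')) {
      parsed.pathname = parsed.pathname.slice(0, -1);
    }
    return parsed.href;
  } catch (_) {
    return null; // Not a parseable absolute URL
  }
}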
// Initialize server
async function startServer() {
await ensureDataDir();
app.listen(PORT, () => {
console.log(`LinkDing server running on http://localhost:${PORT}`);
});
}
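// Exit with a clear message if startup fails (for example, the data directory
// cannot be created) instead of leaving an unhandled promise rejection
startServer().catch(error => {
console.error('Failed to start server:', error);
process.exit(1);
});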