# AGENTS.md - Project Guide for AI Coding Agents

## Project Overview

This is an **Apify scraper** that monitors the Kaktus mobile operator website and extracts promotional information from it. Specifically, it tracks the "Dobíječka" (top-up double) promotion, a campaign in which customers who top up their prepaid credit receive bonus credit.

**Key Facts:**
- **Project Name:** kaktus-dobijecka
- **Type:** Apify Actor (web scraper)
- **Target:** https://www.mujkaktus.cz/chces-pridat
- **Purpose:** Extract the date and time of the next "Dobíječka" promotion
- **Technology Stack:** Node.js, Apify SDK, Crawlee with Cheerio
- **Operator:** Kaktus (Czech mobile phone operator)

## What This Actor Does

The actor:
1. Scrapes the promotional page on the Kaktus website
2. Extracts the date and time of the next "Dobíječka" promotion
3. Stores the results in the Apify dataset
4. Optionally sends email notifications if the promotion is happening today

### Output Format

The actor produces a JSON object with the following structure:

```json
{
  "Date": "2025-07-09",
  "From": "16:00",
  "To": "18:00"
}
```

**Note:** The `From` and `To` fields are optional and only included when time information is available on the website.
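To make the output contract concrete, here is a minimal sketch of a shape check for one record. The helper name `isValidRecord` is hypothetical and not part of the project. It assumes `Date` is always `YYYY-MM-DD` and that times look like `H:MM` or `HH:MM`.

```javascript
// Hypothetical helper (not part of the project): checks that a record
// matches the documented output shape.
const DATE_RE = /^\d{4}-\d{2}-\d{2}$/;
// Assumption: times are "H:MM" or "HH:MM"; the guide only shows "16:00".
const TIME_RE = /^\d{1,2}:\d{2}$/;

function isValidRecord(record) {
    if (!record || typeof record.Date !== 'string' || !DATE_RE.test(record.Date)) {
        return false;
    }
    // From/To are optional, but must be well-formed when present.
    for (const key of ['From', 'To']) {
        if (key in record && !(typeof record[key] === 'string' && TIME_RE.test(record[key]))) {
            return false;
        }
    }
    return true;
}

console.log(isValidRecord({ Date: '2025-07-09', From: '16:00', To: '18:00' })); // true
console.log(isValidRecord({ Date: '2025-07-09' })); // true (times are optional)
console.log(isValidRecord({ Date: '9.7.2025' })); // false (not YYYY-MM-DD)
```

A check like this could be run against the files in `./storage/datasets/default/` after a local run.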

### Input Format

The actor accepts optional email configuration for notifications:

```json
{
  "email": [
    {
      "to": "your@mail.com"
    },
    {
      "to": "other@mail.com",
      "cc": "cc@mail.com",
      "bcc": "bcc@mail.com"
    }
  ]
}
```

If no email notification is needed, leave the input empty.

## How to Run

### Development Mode
```bash
apify run -p
```

This command:
- Runs the actor locally in purge mode (cleans storage before the run)
- Uses local `INPUT.json` if present
- Stores results in `./storage/datasets/default/`

### Production
The actor runs on the Apify platform according to its scheduled configuration.

### Running Tests
```bash
npm test
```

This executes the Mocha test suite located in `test/test.js`.
## Project Structure

```
.
├── src/
│   ├── main.js          # Entry point, initializes crawler
│   ├── routes.js        # Request handler with scraping logic
│   └── utils.js         # Utility functions for parsing and email
├── test/
│   └── test.js          # Unit tests for parsing functions
├── .actor/
│   └── actor.json       # Apify actor metadata
├── INPUT_SCHEMA.json    # Input validation schema
├── package.json         # Dependencies and scripts
└── README.md            # User-facing documentation
```

## How It Works

### Technical Implementation

The scraper uses a two-phase parsing strategy:

#### Phase 1: Parse from HTML Text (Primary Method)
Located in: `src/routes.js:9-15`

The scraper first attempts to extract date and time directly from the promotional text on the page. It looks for patterns like:
- `9.7.2025 16:00 - 18:00`
- `31. 10. 2025 20:00 - 22:00` (with spaces after dots)

This is handled by `Utils.parseDateTimeFromText()` in `src/utils.js:43-69`.
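As an illustration of Phase 1, the sketch below applies the date/time regex quoted under "Update Date Parsing Regex" to both sample patterns. The function name `parseDateTimeSketch`, its return shape, and the zero-padding are assumptions modelled on the documented output format. The real `Utils.parseDateTimeFromText()` may differ.

```javascript
// Sketch only: the real implementation lives in src/utils.js and may differ.
const DATE_TIME_RE = /(\d{1,2})\.\s*(\d{1,2})\.\s*(\d{4})\s+(\d{1,2}):(\d{2})\s*-\s*(\d{1,2}):(\d{2})/;

// Left-pads single-digit days, months, and hours to two digits.
const pad = (value) => String(value).padStart(2, '0');

// Hypothetical name and return shape; returns null when nothing matches.
function parseDateTimeSketch(text) {
    const match = DATE_TIME_RE.exec(text);
    if (!match) return null;
    const [, day, month, year, fromH, fromM, toH, toM] = match;
    return {
        Date: `${year}-${pad(month)}-${pad(day)}`,
        From: `${pad(fromH)}:${fromM}`,
        To: `${pad(toH)}:${toM}`,
    };
}

console.log(parseDateTimeSketch('Dobíječka 9.7.2025 16:00 - 18:00'));
// → { Date: '2025-07-09', From: '16:00', To: '18:00' }
console.log(parseDateTimeSketch('31. 10. 2025 20:00 - 22:00'));
// → { Date: '2025-10-31', From: '20:00', To: '22:00' }
console.log(parseDateTimeSketch('no promotion announced'));
// → null
```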

#### Phase 2: Parse from PDF Filename (Fallback Method)
Located in: `src/routes.js:17-32`

If Phase 1 fails, the scraper looks for PDF links containing promotional terms and conditions. The filename contains the date in format `DDMMYYYY.pdf` (e.g., `OP-Odmena-za-dobiti-FB_09072025.pdf`).

This is handled by `Utils.parseDate()` in `src/utils.js:5-19`.
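A minimal sketch of the Phase 2 fallback follows, using the filename regex quoted under "Update Date Parsing Regex". The function name `parseDateFromFilenameSketch` and the `YYYY-MM-DD` string return value are assumptions. The calendar sanity check is an extra added for illustration, and nothing in this guide confirms that `Utils.parseDate()` performs it.

```javascript
// Sketch only: the real Utils.parseDate() in src/utils.js may differ.
const FILENAME_RE = /(\d{2})(\d{2})(\d{4})\.pdf/;

// Hypothetical name and return type; returns null when no date is found.
function parseDateFromFilenameSketch(input) {
    const match = FILENAME_RE.exec(input);
    if (!match) return null;
    const [, day, month, year] = match;

    // Extra check (an assumption, not confirmed by the project):
    // reject impossible dates such as 31 February.
    const date = new Date(Date.UTC(Number(year), Number(month) - 1, Number(day)));
    if (date.getUTCMonth() !== Number(month) - 1 || date.getUTCDate() !== Number(day)) {
        return null;
    }
    return `${year}-${month}-${day}`;
}

console.log(parseDateFromFilenameSketch('OP-Odmena-za-dobiti-FB_09072025.pdf')); // 2025-07-09
console.log(parseDateFromFilenameSketch('terms-and-conditions.pdf')); // null
```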

### Email Notifications

When a promotion date matches today's date:
1. The actor checks input for email configuration
2. Sends notification via `apify/send-mail` actor
3. Email subject includes time range if available
4. Email body contains link to the promotional page

Implementation: `src/utils.js:71-103`
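The notification steps above can be sketched as a pure function that builds one mail payload per configured recipient. Everything here is illustrative: the name `buildNotificationsSketch`, the subject wording, and the use of the `subject` and `text` fields are assumptions, so check them against the actual code in `src/utils.js` and the `apify/send-mail` input schema.

```javascript
// Hypothetical helper: builds one apify/send-mail input per configured recipient.
// `result` is the scraped record, `input` the actor input, `todayIso` is "YYYY-MM-DD".
function buildNotificationsSketch(result, input, todayIso) {
    // Step 1: no email configuration means nothing to send.
    const recipients = input && Array.isArray(input.email) ? input.email : [];
    // Only notify when the promotion date is today.
    if (!result || result.Date !== todayIso) return [];

    // Step 3: include the time range in the subject when it is available.
    const timeRange = result.From && result.To ? ` ${result.From} - ${result.To}` : '';
    const subject = `Kaktus Dobíječka today${timeRange}`; // wording is an assumption
    // Step 4: the body links to the promotional page.
    const text = 'Details: https://www.mujkaktus.cz/chces-pridat';

    // Keep each recipient's to/cc/bcc fields as provided in the input.
    return recipients.map((recipient) => ({ ...recipient, subject, text }));
}

console.log(buildNotificationsSketch(
    { Date: '2025-07-09', From: '16:00', To: '18:00' },
    { email: [{ to: 'your@mail.com' }] },
    '2025-07-09',
));
```

In the actor, each payload would presumably be handed to the `apify/send-mail` actor, for example through the SDK's `Actor.call()`. Note that "today" should be evaluated in the time zone the promotion is announced in, which this sketch leaves to the caller.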

## Maintenance Guide

### Key Maintenance Principles

**Backward Compatibility First**

When maintaining this actor, the golden rule is: **NEVER DELETE EXISTING PARSING LOGIC**. The Kaktus website may change formats unpredictably or even alternate between different date formats for different promotional events. Always maintain backward compatibility by:

1. Adding new parsing patterns alongside existing ones
2. Testing that all historical formats remain functional
3. Trying newer formats first, then falling back to older ones
4. Never replacing or removing working regex patterns

This layered approach keeps the scraper working regardless of which of the supported formats the website uses.

### When to Update This Actor

1. **Website Structure Changes**
   - If Kaktus redesigns their promotional page
   - If CSS selectors change (currently uses `div.richTextStyles`)
   - If PDF URL format changes

2. **Date Format Changes**
   - If the date/time format in promotional text changes
   - If PDF filename format changes
   - **Important:** Always ADD new format support, never REPLACE existing formats

3. **New Data Requirements**
   - If you need to extract additional information (e.g., bonus amount, conditions)

### Common Maintenance Tasks

#### 1. Update CSS Selectors

If the website structure changes, update the selector in `src/routes.js:10`:
```javascript
const textResult = Utils.parseDateTimeFromText($('div.richTextStyles').text());
```

#### 2. Update Date Parsing Regex

Date/time parsing regex is in `src/utils.js:47`:
```javascript
const dateTimeMatch = text.match(/(\d{1,2})\.\s*(\d{1,2})\.\s*(\d{4})\s+(\d{1,2}):(\d{2})\s*-\s*(\d{1,2}):(\d{2})/);
```

PDF filename parsing regex is in `src/utils.js:8`:
```javascript
const filenameMatch = input.match(/(\d{2})(\d{2})(\d{4})\.pdf/);
```

**IMPORTANT: When adding support for new date/time formats, ALWAYS keep the existing parsing logic intact.**

The parsing functions should support multiple formats simultaneously. Never delete or replace previous regex patterns - instead, add new patterns as additional attempts. This ensures backward compatibility if the website alternates between different date formats or shows multiple promotional events.

Example approach:
- Try parsing with new format first
- If that fails, try the existing format(s)
- Return the first successful match
- All historical formats should remain functional
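One way to follow the approach above is an ordered list of patterns, newest first, where supporting a new format means adding an entry and never editing or removing an old one. This is a sketch under stated assumptions: the `iso-date` entry is an invented format used only for illustration, and `PARSERS` and `parseWithFallback` are hypothetical names that do not exist in the project.

```javascript
// Ordered newest-first. Add new entries at the top; never remove old ones.
const PARSERS = [
    {
        // Hypothetical newer format, e.g. "2025-07-09 16:00-18:00" (illustration only).
        name: 'iso-date',
        regex: /(\d{4})-(\d{2})-(\d{2})\s+(\d{1,2}):(\d{2})\s*-\s*(\d{1,2}):(\d{2})/,
        toParts: ([, y, m, d, fh, fm, th, tm]) => ({ y, m, d, fh, fm, th, tm }),
    },
    {
        // Existing format documented in this guide, e.g. "31. 10. 2025 20:00 - 22:00".
        name: 'dotted-date',
        regex: /(\d{1,2})\.\s*(\d{1,2})\.\s*(\d{4})\s+(\d{1,2}):(\d{2})\s*-\s*(\d{1,2}):(\d{2})/,
        toParts: ([, d, m, y, fh, fm, th, tm]) => ({ y, m, d, fh, fm, th, tm }),
    },
];

const pad = (value) => String(value).padStart(2, '0');

// Tries each pattern in order and returns the first successful match, or null.
function parseWithFallback(text) {
    for (const { regex, toParts } of PARSERS) {
        const match = regex.exec(text);
        if (match) {
            const { y, m, d, fh, fm, th, tm } = toParts(match);
            return {
                Date: `${y}-${pad(m)}-${pad(d)}`,
                From: `${pad(fh)}:${fm}`,
                To: `${pad(th)}:${tm}`,
            };
        }
    }
    return null;
}

console.log(parseWithFallback('2025-07-09 16:00-18:00')); // matched by the newer pattern
console.log(parseWithFallback('9.7.2025 16:00 - 18:00')); // falls back to the existing pattern
```

Keeping the patterns in data rather than in nested `if` branches makes it harder to delete an old format by accident and makes the order of attempts easy to review.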

#### 3. Update Target URL

If the promotional page URL changes, update `src/main.js:7`:
```javascript
const startUrls = ['https://www.mujkaktus.cz/chces-pridat'];
```

#### 4. Add New Tests

When adding new parsing logic or fixing bugs:
1. Add test cases to `test/test.js`
2. Run `npm test` to verify
3. Ensure all existing tests still pass

### Testing Strategy

The test suite covers:
- **Date parsing from PDF filenames** (`parseDate`)
- **Date/time parsing from text** (`parseDateTimeFromText`)
- **Date comparison** (`isSameDay`)

Test cases include:
- Various date formats (with/without spaces after dots)
- Single and double-digit days/months
- Time range extraction
- Invalid input handling
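The date comparison case lends itself to table-driven checks. The sketch below uses a hypothetical stand-in, `isSameDaySketch`, because the signature of the project's `isSameDay` is not shown in this guide. It assumes two `Date` objects compared in local time.

```javascript
// Hypothetical stand-in for Utils.isSameDay; the project's signature may differ.
function isSameDaySketch(a, b) {
    return a.getFullYear() === b.getFullYear()
        && a.getMonth() === b.getMonth()
        && a.getDate() === b.getDate();
}

// Each row is one scenario: two dates and the expected verdict.
const cases = [
    { a: new Date(2025, 6, 9, 0, 0), b: new Date(2025, 6, 9, 23, 59), expected: true },
    { a: new Date(2025, 6, 9), b: new Date(2025, 6, 10), expected: false },
    { a: new Date(2025, 6, 9), b: new Date(2024, 6, 9), expected: false },
];

for (const { a, b, expected } of cases) {
    const actual = isSameDaySketch(a, b);
    if (actual !== expected) {
        throw new Error(`isSameDaySketch(${a}, ${b}) returned ${actual}`);
    }
}
console.log('all cases passed');
```

In `test/test.js`, each row would typically become its own Mocha `it(...)` block with a Chai assertion, so that a failure names the exact scenario.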

### Debugging Tips

1. **Check scraped content:**
   - Run locally with `apify run -p`
   - Add `console.log($('div.richTextStyles').text())` to see extracted text

2. **Test parsing functions:**
   - Use the test suite to verify regex patterns
   - Add new test cases for edge cases

3. **Verify output:**
   - Check `./storage/datasets/default/` after local run
   - Verify date format is `YYYY-MM-DD`

4. **Email issues:**
   - Verify `INPUT.json` has correct email configuration
   - Check that date matches today for emails to send
   - Review `apify/send-mail` actor logs

### Dependencies

- **apify** (^3.0.0): Apify SDK for actor development
- **crawlee** (^3.0.0): Web scraping and crawling library
- **cheerio**: HTML parsing (included in Crawlee)

Development dependencies:
- **mocha**: Test runner
- **chai**: Assertion library
- **eslint**: Code linting

### Linting

```bash
npm run lint      # Check for issues
npm run lint:fix  # Auto-fix issues
```

## Recent Changes

Based on git history, recent updates include:
- Support for date format with spaces after dots (e.g., `31. 10. 2025`)
- Added time fields (`From`/`To`) to output when available
- Enhanced date/time parsing from promotional text
- Updated for new Kaktus website structure

## Troubleshooting

### No data extracted
1. Verify the target URL is accessible
2. Check if website structure has changed
3. Add debug logging to see what content is being scraped
4. Test parsing functions with current website content

### Wrong dates extracted
1. Check if date format on website has changed
2. Review regex patterns in `utils.js`
3. Add test cases for the new format
4. Update parsing logic accordingly

### Emails not sending
1. Ensure date matches today's date exactly
2. Verify `INPUT.json` has valid email configuration
3. Check Apify platform logs for `apify/send-mail` actor
4. Verify `apify/send-mail` actor is accessible

## Development Workflow

1. **Make changes** to source files
2. **Add tests** for new functionality
3. **Run tests**: `npm test`
4. **Lint code**: `npm run lint:fix`
5. **Test locally**: `apify run -p`
6. **Verify output** in `./storage/datasets/default/`
7. **Commit and push** changes
8. **Deploy** to Apify platform

## Additional Resources

- [Apify SDK Documentation](https://docs.apify.com/sdk/js)
- [Crawlee Documentation](https://crawlee.dev/)
- [Kaktus Website](https://www.mujkaktus.cz/chces-pridat)