Feat/meetup eventalways #23

Merged: hdmGOAT merged 2 commits into development from feat/meetup-eventalways on Jun 18, 2026


JESREAL1JDL7LUSTRE (Collaborator) commented Jun 18, 2026

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for scraping events from ClickTheCity, EventAlways, Eventsize, Meetup, and SISTIC
    • Introduced an automatic cross-source deduplication system for Events, Venues, and Organizers
    • Added venue detail pages with event listings
    • Implemented organizer CSV export functionality
    • Added UI controls for script execution and deduplication
  • Documentation

    • Added comprehensive deduplication system documentation
    • Added feature implementation plans
  • Chores

    • Removed debug/test scripts and files
    • Updated development environment configuration

coderabbitai (Bot) commented Jun 18, 2026

Caution

Review failed

Failed to post review comments

Walkthrough

Removes six debug scripts and HTML artifacts. Adds five new event scrapers (ClickTheCity, SISTIC, EventAlways, Eventsize, Meetup) and registers them in the scraper registry. Introduces a two-layer deduplication system: an inline post-save hook in the scraper base and a standalone CLI script backed by a pure-Python utility module. Adds backend endpoints for venue detail, organizer CSV export, dedup trigger, and script trigger; extends the frontend with a venue detail page, clickable event links, organizer export button, and scraper control buttons.

Changes

Scrapers, Dedup System, and UI Expansion

Layer: Debug artifact removal
File(s): apps/backend/debug_page.html
Summary: Removes the Cloudflare challenge HTML debug artifact (the six debug Python scripts and HTML files are removed across the diff).

Layer: Dedup utility module and standalone CLI script
File(s): apps/backend/scripts/dedup.py, apps/backend/scripts/deduplicate.py
Summary: dedup.py defines pure-Python normalization helpers, two-pass duplicate finders with guards (place_id conflict, date proximity, shared name words), richness-based winner selection, and merge functions that fill fields, remap FKs, and hard-delete losers. deduplicate.py is a CLI entry point that loads .env, connects via psycopg2, runs per-entity dedup with --dry-run/--verbose, manages per-group transactions, and prints a summary.

Layer: Inline dedup hook wired into save functions
File(s): apps/backend/events/scrapers/base.py
Summary: Adds _dedup_after_save() with URL/name normalization, grouping, blank-field fill, protected-field handling, and entity-specific merge strategies; wires it into save_events, save_organizers, and save_venues with full exception swallowing.

Layer: New REST-based scrapers: ClickTheCity and SISTIC
File(s): apps/backend/events/scrapers/clickthecity.py, apps/backend/events/scrapers/sistic.py
Summary: ClickTheCityScraper fetches up to 1000 events from a single unauthenticated JSON REST API. SisticScraper uses a two-step Drupal CMS REST API flow (paginated listings then per-alias detail), parses HTML fields, and converts dates to Asia/Singapore datetimes. Both persist via save_events/save_organizers.

Layer: New stealth scrapers: EventAlways, Eventsize, Meetup; scraper registry
File(s): apps/backend/events/scrapers/eventalways.py, apps/backend/events/scrapers/eventsize.py, apps/backend/events/scrapers/meetup.py, apps/backend/events/scrapers/__init__.py
Summary: EventAlwaysScraper uses StealthyFetcher for paginated category scraping with LD+JSON detail parsing. EventsizeScraper discovers URLs via Google SERP and a public offers API, parses Schema.org JSON-LD with OG fallback, and scrapes organizer profiles. MeetupScraper uses headless Playwright to intercept GraphQL responses and extract event/group nodes. All five new scrapers are registered in the SCRAPERS dict.

Layer: New backend API endpoints: venue detail, organizer export, dedup/script triggers
File(s): apps/backend/events/views.py, apps/backend/events/urls.py, apps/backend/package.json, apps/backend/scripts/classify-neon-venues.py
Summary: api_events gains venue_slug/organizer_slug. api_organizers_export streams a CSV attachment. api_venue_detail returns venue metadata plus up to 50 related events. api_dedup_trigger runs deduplicate.py via subprocess.run with a _DEDUP_LOCK guard. api_script_trigger starts allowlisted scripts in detached subprocesses. URL patterns are wired for all four. Package.json dev script updated to Windows venv path.

Layer: Frontend types, API client, and venue detail page
File(s): apps/frontend/src/lib/types.ts, apps/frontend/src/lib/api.ts, apps/frontend/src/routes/venues/[slug]/+page.svelte, apps/frontend/src/routes/venues/[slug]/+page.ts
Summary: EventRow gains venue_slug/organizer_slug; VenueRow gains agents_primary_types; VenueDetail, DedupResult, and ScriptStartResult interfaces are added. api object gains venue(), deduplicate(), and runScript(). The new venues/[slug] route loads and renders a full venue profile with conditional fields and a related events table.

Layer: Frontend UI updates: event links, organizer export, scrapers controls, venue type display
File(s): apps/frontend/src/routes/events/+page.svelte, apps/frontend/src/routes/organizers/+page.svelte, apps/frontend/src/routes/scrapers/+page.svelte, apps/frontend/src/routes/venues/+page.svelte, apps/frontend/src/lib/components/PageHeader.svelte, apps/frontend/src/lib/components/Sidebar.svelte
Summary: Events page renders conditional slug-based links for venue/organizer cells. Organizers page gains an "Export CSV" button. Scrapers page adds handlers and buttons for AI scripts and deduplication with error/output display. Venues list prefers agents_primary_types for the Type column. PageHeader removes the date fallback; Sidebar rebrands to "VEENT SCRAPER" and removes the admin email.

Layer: Dedup and export tests
File(s): apps/backend/events/tests.py
Summary: Adds NormalizationTests, FindDuplicatesTests, MergeTests, DedupCommandTests (covering _dedup_after_save), and OrganizerExportTests with path-patching for dedup_utils and a RealDictCursor helper.

Layer: Dedup documentation and implementation plans
File(s): docs/README.md, docs/deduplication/*, process/general-plans/active/*
Summary: Adds a full deduplication documentation suite (overview, API reference, protocols, running-the-script, README) and process plan documents for both the deduplication system and the CSV export feature.
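To make the "richness-based winner selection" and "fill fields" steps from the dedup.py summary concrete, here is a minimal in-memory sketch. It is not the code from this PR: the function names (richness, pick_winner, within_days, merge_group), the one-day date window, and the tie-break on lowest id are all assumptions chosen for illustration, and plain dicts stand in for database rows.

```python
from datetime import date

EMPTY = (None, "")


def richness(row: dict) -> int:
    """Count populated fields. Hypothetical stand-in for the PR's richness score."""
    return sum(1 for value in row.values() if value not in EMPTY)


def pick_winner(group: list) -> dict:
    """Pick the row with the most populated fields; ties go to the lowest id (assumed)."""
    return max(group, key=lambda row: (richness(row), -row["id"]))


def within_days(a: date, b: date, max_days: int = 1) -> bool:
    """Date-proximity guard: only treat two events as candidates if their dates are close."""
    return abs((a - b).days) <= max_days


def merge_group(group: list) -> dict:
    """Return a copy of the winner with its blank fields filled from the losers."""
    winner = dict(pick_winner(group))
    for loser in group:
        if loser["id"] == winner["id"]:
            continue
        for key, value in loser.items():
            # Only fill blanks; populated winner fields are never overwritten.
            if winner.get(key) in EMPTY and value not in EMPTY:
                winner[key] = value
    return winner
```

In this sketch the merge is purely additive, so the winner keeps every value it already had. The real merge functions are described as also remapping foreign keys and hard-deleting losers, which is omitted here.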

Sequence Diagram(s)

sequenceDiagram
  participant Frontend as Scrapers Page
  participant BackendView as api_dedup_trigger / api_script_trigger
  participant Subprocess as deduplicate.py / AI script
  participant DB as PostgreSQL

  rect rgba(59, 130, 246, 0.5)
    note over Frontend,DB: Deduplication flow
    Frontend->>BackendView: POST /api/scrapers/dedup/
    BackendView->>BackendView: acquire _DEDUP_LOCK
    BackendView->>Subprocess: subprocess.run deduplicate.py --entity all
    Subprocess->>DB: find_*_duplicates(cursor)
    DB-->>Subprocess: duplicate groups
    Subprocess->>DB: merge_*(cursor, winner_id, loser_ids)
    Subprocess-->>BackendView: stdout (summary table)
    BackendView-->>Frontend: { output, entity }
  end

  rect rgba(16, 185, 129, 0.5)
    note over Frontend,DB: Script trigger flow
    Frontend->>BackendView: POST /api/scripts/classify-events/run/
    BackendView->>BackendView: validate against _ALLOWED_SCRIPTS
    BackendView->>Subprocess: Popen(detached AI script)
    Subprocess-->>BackendView: pid
    BackendView-->>Frontend: { started: true, pid }
  end
sequenceDiagram
  participant Scraper as BaseScraper (save_events/venues/organizers)
  participant Hook as _dedup_after_save
  participant DB as Django ORM

  Scraper->>DB: upsert events/venues/organizers
  Scraper->>Hook: _dedup_after_save("events", event_ids)
  Hook->>DB: query rows by ids
  Hook->>Hook: normalize URLs, group by key
  alt duplicates found
    Hook->>DB: fill missing fields on winner
    Hook->>DB: UPDATE Event FK references
    Hook->>DB: DELETE loser rows
  end
  Hook-->>Scraper: returns (exceptions swallowed)
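The hook flow above (normalize URLs, group by key, merge, swallow exceptions) might look roughly like the following. This is a sketch with in-memory dicts instead of the Django ORM. The normalization rules (lowercasing, dropping the scheme, "www.", query string, and trailing slash) and the function names are assumptions and may differ from what base.py actually does.

```python
import logging
from collections import defaultdict
from urllib.parse import urlsplit

logger = logging.getLogger(__name__)


def normalize_url(url: str) -> str:
    """Reduce a URL to host + path so trivially different links compare equal (assumed rules)."""
    parts = urlsplit(url.strip().lower())
    host = parts.netloc
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def group_duplicates(rows: list) -> list:
    """Group rows by normalized URL and keep only groups with more than one row."""
    groups = defaultdict(list)
    for row in rows:
        if row.get("url"):
            groups[normalize_url(row["url"])].append(row)
    return [group for group in groups.values() if len(group) > 1]


def dedup_after_save(rows: list, merge) -> None:
    """Run merge on each duplicate group; never let a dedup failure break the save."""
    try:
        for group in group_duplicates(rows):
            merge(group)
    except Exception:
        # Mirrors the "exceptions swallowed" step: log and carry on.
        logger.exception("inline dedup failed; saved rows are left as-is")
```

Swallowing every exception keeps a scraper run from failing because of a dedup bug, at the cost of hiding such bugs unless the log is monitored.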

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • hdmGOAT/veent-event-scraper#9: Both PRs update apps/backend/events/scrapers/__init__.py's SCRAPERS registry — the retrieved PR adds allevents/eventbee/ticketmelon scrapers to the same dict this PR extends.
  • hdmGOAT/veent-event-scraper#20: Directly overlaps with this PR's deduplication rollout — same _dedup_after_save hook in base.py, same scripts/dedup.py/deduplicate.py files, and same test classes.
  • hdmGOAT/veent-event-scraper#17: Both implement the same api_organizers_export backend view, events/urls.py route, and OrganizerExportTests coverage for the organizer CSV export feature.

Suggested reviewers

  • potakaaa
  • hdmGOAT

Poem

🐰 Hop, hop! Five scrapers born today,
ClickTheCity, SISTIC join the fray,
Meetup, Eventsize, EventAlways too —
Dedup merges all the doubles askew.
A venue page blooms, CSV flows free,
Clean data at last — just trust the bunny! 🎉

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name: Docstring Coverage
Status: ⚠️ Warning
Explanation: Docstring coverage is 38.97% which is insufficient. The required threshold is 80.00%.
Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.

Check name: Title check
Status: ❓ Inconclusive
Explanation: The title 'Feat/meetup eventalways' is vague and uses a branch naming convention rather than a clear, descriptive commit message.
Resolution: Revise the title to clearly describe the main change (e.g., 'Add Meetup and EventAlways scrapers with deduplication system').

✅ Passed checks (3 passed)

Check name: Description Check
Status: ✅ Passed
Explanation: Check skipped - CodeRabbit’s high-level summary is enabled.

Check name: Linked Issues check
Status: ✅ Passed
Explanation: Check skipped because no linked issues were found for this pull request.

Check name: Out of Scope Changes check
Status: ✅ Passed
Explanation: Check skipped because no linked issues were found for this pull request.


JESREAL1JDL7LUSTRE changed the base branch from main to development on June 18, 2026 03:46
hdmGOAT merged commit 4ba0899 into development on Jun 18, 2026
3 checks passed
2 participants