Update main with development#24

Merged
hdmGOAT merged 44 commits into main from development
Jun 18, 2026

Conversation

hdmGOAT (Owner) commented Jun 18, 2026

Summary by CodeRabbit

  • New Features

    • Added six new event data sources: ClickTheCity, SISTIC, Eventsize, Meetup, EventAlways, and TicketSpice.
    • Organizer profile enrichment from website crawling for better contact data.
    • Cross-source deduplication system reducing event, venue, and organizer duplicates.
    • CSV export for organizers.
    • Individual venue detail pages with associated events.
    • Enhanced event and venue filtering and sorting options.
  • UI/UX Improvements

    • Updated branding to "VEENT SCRAPER."
    • Loading skeletons for improved perceived performance.

hdmGOAT and others added 30 commits June 17, 2026 16:53
- api_venues now serializes agents_primary_types alongside primary_type_display
- VenueRow type gains agents_primary_types: string[]
- Venues table Type column shows AI labels (comma-joined); falls back to
  primary_type_display for venues not yet classified

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
feat(venues): display agents_primary_types in venues UI
- Extend api_events to return organizer_slug and venue_slug
- Add api_venue_detail JSON endpoint and URL
- Add VenueDetail type and api.venue() client method
- Render organizer/venue as links when slug is available, plain text otherwise
- Create new /venues/[slug] detail page mirroring organizer detail layout

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…l page

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
feat(events): clickable organizer and venue links
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…and email

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds GET /api/organizers/export/ that streams a CSV of all organizers
matching the current q/status filters with no pagination. Columns:
Name, Email, Phone, Website, Address, City, Country, Facebook,
Instagram, Source. Frontend organizers page gets an Export CSV button
that passes the active filter state into the download URL.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
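
The export described in the commit above could be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the column order comes from the commit message, while the function name `stream_organizers_csv` and the lowercase dict keys are assumptions made for the example.

```python
import csv
import io
from typing import Iterable, Iterator, Mapping

# Column order as listed in the commit message.
COLUMNS = ["Name", "Email", "Phone", "Website", "Address",
           "City", "Country", "Facebook", "Instagram", "Source"]


def stream_organizers_csv(rows: Iterable[Mapping[str, str]]) -> Iterator[str]:
    """Yield the CSV one line at a time: header first, then one line per organizer.

    Assumption: each row is a mapping keyed by the lowercase column name.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)

    def flush() -> str:
        # Hand back what was just written and reset the buffer for the next row.
        chunk = buffer.getvalue()
        buffer.seek(0)
        buffer.truncate(0)
        return chunk

    writer.writerow(COLUMNS)
    yield flush()
    for row in rows:
        # Missing or None values become empty cells rather than raising.
        writer.writerow([row.get(col.lower()) or "" for col in COLUMNS])
        yield flush()
```

In a Django view, a generator like this could presumably be handed to `StreamingHttpResponse` with a `text/csv` content type, so the full result set never has to be held in memory.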
…root

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
feat(ui): rebrand sidebar, remove notification bell and email
chore: remove debug scripts and HTML/screenshot artifacts
… spam

- Events page: add Source and Category select dropdowns backed by
  existing api.eventsBySource() and api.eventsByCategory() endpoints
- All table pages (events, venues, organizers): disable Previous/Next
  buttons while loading to prevent duplicate requests on rapid clicks

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nizers

Events:
- Sort by name / starts_at via backend ordering param
- Upcoming only toggle filter

Venues:
- Sort by name, city, rating, event_count via backend ordering param
- Type filter dropdown (new api/venues/types/ endpoint)
- ordering resets when switching status tabs

Organizers:
- Add sortable Source column (client-side)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
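
One way a backend `ordering` param like the one in the commit above might be validated is with an allow-list. The sketch below is hypothetical: `parse_ordering` is an invented helper, and the leading `-` for descending order is an assumption borrowed from Django's `order_by()` convention, not something the commit states.

```python
from typing import Optional, Set

# Sort keys listed in the commit message for the venues table.
ALLOWED_VENUE_ORDERING = {"name", "city", "rating", "event_count"}


def parse_ordering(raw: Optional[str], allowed: Set[str], default: str = "name") -> str:
    """Return a safe ordering string, falling back to `default` for anything unexpected.

    Assumption: a leading "-" means descending, as in Django's order_by().
    """
    if not raw:
        return default
    descending = raw.startswith("-")
    field = raw[1:] if descending else raw
    if field not in allowed:
        # Unknown fields are ignored so arbitrary columns cannot be sorted on.
        return default
    return f"-{field}" if descending else field
```

The returned string could then be passed to a queryset's `order_by()`; resetting the ordering when the status tab changes would remain a frontend concern.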
- Add enriched_at and enrichment_source fields to Organizer model
- Fix save_organizers() to merge-not-overwrite existing contact data
- Add DIFFBOT_API_KEY and HUNTER_API_KEY to settings
- New management command: enrich_organizers (--limit, --dry-run, --force, --delay)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
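
The "merge-not-overwrite" behaviour mentioned in the commit above can be illustrated with a small helper. This is a sketch under assumptions: `merge_contact_fields` and the `CONTACT_FIELDS` subset are made up for the example and do not reflect the actual body of `save_organizers()`.

```python
from typing import Dict, Iterable, Optional

# Illustrative subset of contact fields; the real model may have more.
CONTACT_FIELDS = ("email", "phone", "website", "address")


def merge_contact_fields(existing: Dict[str, Optional[str]],
                         scraped: Dict[str, Optional[str]],
                         fields: Iterable[str] = CONTACT_FIELDS) -> Dict[str, str]:
    """Return only the updates that fill a blank on `existing`.

    Values already stored are never replaced, even if the scrape disagrees.
    """
    updates = {}
    for field in fields:
        current = (existing.get(field) or "").strip()
        incoming = (scraped.get(field) or "").strip()
        if not current and incoming:
            updates[field] = incoming
    return updates
```

For example, an organizer that already has an email but no phone would only receive the phone number from a later scrape.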
Removes Diffbot and Hunter.io API dependencies in favor of a self-contained
HTML crawler. Adds contact_extractor.py as a shared extraction helper and
rewrites the enrich_organizers management command to use direct HTTP crawling.
Drops DIFFBOT_API_KEY and HUNTER_API_KEY from settings.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
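
A shared extraction helper of the kind this commit describes might look something like the sketch below. It is not the contents of `contact_extractor.py`: the regular expressions, the `extract_contacts` name, and the returned shape are all assumptions, and phone and postal-address parsing are left out for brevity.

```python
import re
from typing import Dict, List

# Deliberately simple patterns; real-world extraction usually needs more care.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SOCIAL_RE = re.compile(
    r"https?://(?:www\.)?(facebook|instagram)\.com/[A-Za-z0-9_.\-]+",
    re.IGNORECASE,
)


def extract_contacts(html: str) -> Dict[str, List[str]]:
    """Pull unique emails and Facebook/Instagram profile URLs out of raw HTML."""
    emails = sorted({match.lower() for match in EMAIL_RE.findall(html)})
    socials: Dict[str, List[str]] = {"facebook": [], "instagram": []}
    for match in SOCIAL_RE.finditer(html):
        network = match.group(1).lower()
        url = match.group(0)
        if url not in socials[network]:
            socials[network].append(url)
    return {"emails": emails, **socials}
```

Because the helper only takes a string of HTML, it can be applied equally to a homepage or to any other page the crawler fetches.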
…et encoding

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Active Sources StatCard now displays data.scrapers.length as the primary
value (number of scrapers registered in Scraper Center) and
data.stats.active_sources as the sub-label ("N with events"), replacing
the previous inversion.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
hdmGOAT and others added 14 commits June 18, 2026 10:41
…elect, deduplicate organizer source display

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
feat(ui): source/category filters on events table + pagination spam fix
…row DedupResult entity type

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…fix always_update set

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…pdate

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
feat(organizers): crawler-based enrichment (no API keys)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
fix(dashboard): show scraper center count as active sources
coderabbitai Bot commented Jun 18, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 0a4a5405-17d3-41e9-a8bd-bbe143b924fa

📥 Commits

Reviewing files that changed from the base of the PR and between 5a83a4c and 4ba0899.

⛔ Files ignored due to path filters (1)
  • apps/backend/debug_screenshot.png is excluded by !**/*.png
📒 Files selected for processing (47)
  • apps/backend/check_schema.py
  • apps/backend/config/settings.py
  • apps/backend/debug_allevents.py
  • apps/backend/debug_detail_happeningnext.py
  • apps/backend/debug_detail_page.html
  • apps/backend/debug_happeningnext.py
  • apps/backend/debug_page.html
  • apps/backend/events/management/commands/enrich_organizers.py
  • apps/backend/events/migrations/0017_organizer_enrichment_fields.py
  • apps/backend/events/migrations/0018_alter_organizer_enrichment_help_text.py
  • apps/backend/events/models.py
  • apps/backend/events/scrapers/__init__.py
  • apps/backend/events/scrapers/base.py
  • apps/backend/events/scrapers/clickthecity.py
  • apps/backend/events/scrapers/contact_extractor.py
  • apps/backend/events/scrapers/eventalways.py
  • apps/backend/events/scrapers/eventsize.py
  • apps/backend/events/scrapers/meetup.py
  • apps/backend/events/scrapers/sistic.py
  • apps/backend/events/tests.py
  • apps/backend/events/urls.py
  • apps/backend/events/views.py
  • apps/backend/package.json
  • apps/backend/scripts/classify-neon-venues.py
  • apps/backend/scripts/dedup.py
  • apps/backend/scripts/deduplicate.py
  • apps/frontend/src/lib/api.ts
  • apps/frontend/src/lib/components/ChartSkeleton.svelte
  • apps/frontend/src/lib/components/PageHeader.svelte
  • apps/frontend/src/lib/components/Sidebar.svelte
  • apps/frontend/src/lib/components/StatCard.svelte
  • apps/frontend/src/lib/types.ts
  • apps/frontend/src/routes/+page.svelte
  • apps/frontend/src/routes/events/+page.svelte
  • apps/frontend/src/routes/organizers/+page.svelte
  • apps/frontend/src/routes/scrapers/+page.svelte
  • apps/frontend/src/routes/venues/+page.svelte
  • apps/frontend/src/routes/venues/[slug]/+page.svelte
  • apps/frontend/src/routes/venues/[slug]/+page.ts
  • docs/README.md
  • docs/deduplication/README.md
  • docs/deduplication/api-reference.md
  • docs/deduplication/overview.md
  • docs/deduplication/protocols.md
  • docs/deduplication/running-the-script.md
  • process/general-plans/active/csv-export-organizers_PLAN_18-06-26.md
  • process/general-plans/active/deduplication_PLAN_18-06-26.md

📝 Walkthrough

Adds a two-layer cross-source deduplication system (scripts/dedup.py, scripts/deduplicate.py, and a post-save hook in base.py), five new event scrapers (ClickTheCity, SISTIC, Eventsize, Meetup, EventAlways), an organizer contact-enrichment management command, backend API extensions (CSV export, venue detail, dedup/script triggers), and corresponding SvelteKit frontend pages and filter/sort controls. Debug scripts and HTML artifacts are removed.

Changes

Dedup System, New Scrapers, Organizer Enrichment, and Frontend Expansions

Layer / File(s) Summary
Dedup normalization, duplicate finders, and merge functions
apps/backend/scripts/dedup.py
Pure-Python normalization helpers for names, URLs, dates, and cities; richness-based winner selection; two-pass duplicate finders for events, venues, and organizers with guards (place_id conflict, date proximity, shared-words); merge functions that fill blank winner fields, remap FK references, and hard-delete losers.
Standalone dedup CLI runner
apps/backend/scripts/deduplicate.py
Connects to Postgres via DATABASE_URL, dispatches per-entity duplicate detection and transactional merge/rollback per group, supports --entity, --dry-run, and --verbose, and prints a final summary table.
Post-save dedup hook wired into scraper base
apps/backend/events/scrapers/base.py
Adds _dedup_after_save and internal normalization/grouping utilities; wires URL-only dedup into save_events, selective blank-fill semantics into save_organizers, and full dedup into save_venues; all failures swallowed.
Dedup tests and documentation
apps/backend/events/tests.py, docs/deduplication/*, process/general-plans/active/*
Test classes for normalization helpers, duplicate finders, FK remapping, protected-field preservation, _dedup_after_save dispatch, and organizer CSV export; full deduplication docs (overview, api-reference, protocols, running-the-script) and plan documents.
New scrapers: ClickTheCity and SISTIC
apps/backend/events/scrapers/clickthecity.py, apps/backend/events/scrapers/sistic.py, apps/backend/events/scrapers/__init__.py
ClickTheCityScraper fetches a JSON API and constructs venue/event objects. SisticScraper pages through a Drupal CMS REST API, strips HTML, parses timezone-aware datetimes, and builds event/venue/organizer objects. Both registered in SCRAPERS.
New headless scrapers: Eventsize, Meetup, EventAlways, and contact extractor
apps/backend/events/scrapers/eventsize.py, apps/backend/events/scrapers/meetup.py, apps/backend/events/scrapers/eventalways.py, apps/backend/events/scrapers/contact_extractor.py
EventsizeScraper discovers URLs via Google SERP and offers API, parses JSON-LD/OG. MeetupScraper uses Playwright stealth with GraphQL interception and NEXT_DATA extraction. EventAlwaysScraper uses StealthyFetcher with listing pagination and LD+JSON detail parsing. Shared contact_extractor.py parses email, phone, social URLs, and JSON-LD postal addresses.
Organizer enrichment: model, migrations, and command
apps/backend/events/models.py, apps/backend/events/migrations/0017_*, apps/backend/events/migrations/0018_*, apps/backend/events/management/commands/enrich_organizers.py
Adds enriched_at and enrichment_source fields to Organizer with two migrations. The enrich_organizers command validates public URLs, fetches homepages with HTTP/stealth fallback, crawls /contact and /about, and writes only to blank organizer fields.
Backend API extensions: export, venue detail, and trigger endpoints
apps/backend/events/views.py, apps/backend/events/urls.py
Adds api_organizers_export (CSV download), api_venue_detail, api_venue_types, api_dedup_trigger (subprocess with threading.Lock), and api_script_trigger (whitelisted detached subprocess). Extends api_events/api_venues with ordering, upcoming, type params and venue_slug/organizer_slug/agents_primary_types fields.
Frontend types, API client, and venue detail route
apps/frontend/src/lib/types.ts, apps/frontend/src/lib/api.ts, apps/frontend/src/routes/venues/[slug]/+page.ts, apps/frontend/src/routes/venues/[slug]/+page.svelte
Adds VenueDetail, DedupResult, ScriptStartResult interfaces; extends EventRow/VenueRow; adds api.venue, api.venueTypes, api.deduplicate, api.runScript methods; adds venues/[slug] route with full detail page.
Frontend page updates: events, venues, organizers, scrapers, dashboard
apps/frontend/src/routes/events/+page.svelte, apps/frontend/src/routes/venues/+page.svelte, apps/frontend/src/routes/organizers/+page.svelte, apps/frontend/src/routes/scrapers/+page.svelte, apps/frontend/src/routes/+page.svelte, apps/frontend/src/lib/components/*
Events page gains source/category/upcoming filters and column sorting; venues page gains type filter and column sorting; organizers page gains Source column and CSV export button; scrapers page gains AI script trigger buttons and dedup action/output; dashboard gains ChartSkeleton loading states and StatCard href links; shared components updated (Sidebar branding, StatCard href, PageHeader fallback removal).
Debug file removal and minor housekeeping
apps/backend/debug_*.py, apps/backend/debug_*.html, apps/backend/check_schema.py, apps/backend/package.json, apps/backend/scripts/classify-neon-venues.py, apps/backend/config/settings.py
Removes all debug scripts and HTML artifacts; updates package.json dev script to Windows venv path; replaces Unicode arrow with ASCII in classify-neon-venues.py.
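
To make the normalization and richness-based winner selection summarized for scripts/dedup.py more concrete, here is a minimal sketch. The function names, the exact normalization rules, and the lowest-id tie-break are assumptions for illustration; the guards mentioned in the summary (place_id conflict, date proximity, shared words) are not modelled.

```python
import re
from typing import Dict, List
from urllib.parse import urlsplit


def normalize_name(name: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", name.lower())
    return " ".join(cleaned.split())


def normalize_url(url: str) -> str:
    """Reduce a URL to host + path, ignoring scheme, 'www.', query, and trailing slash."""
    parts = urlsplit(url.strip().lower())
    host = parts.netloc[4:] if parts.netloc.startswith("www.") else parts.netloc
    return host + parts.path.rstrip("/")


def richness(row: Dict) -> int:
    """Count populated fields; 'id' is bookkeeping, so it does not count."""
    return sum(1 for key, value in row.items()
               if key != "id" and value not in (None, ""))


def pick_winner(rows: List[Dict]) -> Dict:
    """Richest row wins; ties go to the lowest id (an assumed tie-break)."""
    return max(rows, key=lambda row: (richness(row), -row["id"]))
```

Rows whose normalized keys match would form a duplicate group, and the winner picked here is the row that keeps its id while blank fields are filled from the losers.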

Sequence Diagram(s)

sequenceDiagram
  participant Frontend
  participant DjangoAPI
  participant DedupScript as scripts/deduplicate.py
  participant DeduPy as scripts/dedup.py
  participant PostgresDB

  Frontend->>DjangoAPI: POST /api/scrapers/dedup/ {entity}
  DjangoAPI->>DjangoAPI: acquire _DEDUP_LOCK (non-blocking)
  DjangoAPI->>DedupScript: subprocess.run --entity venues/organizers/events
  DedupScript->>PostgresDB: connect via DATABASE_URL
  DedupScript->>DeduPy: find_X_duplicates(cursor)
  DeduPy->>PostgresDB: SELECT rows, group by normalized key
  DeduPy-->>DedupScript: [[winner_id, loser_ids], ...]
  loop each group
    DedupScript->>DeduPy: merge_X(cursor, winner_id, loser_ids)
    DeduPy->>PostgresDB: UPDATE winner fields
    DeduPy->>PostgresDB: UPDATE FK references on events_event
    DeduPy->>PostgresDB: DELETE loser rows
    DedupScript->>PostgresDB: COMMIT
  end
  DedupScript-->>DjangoAPI: stdout summary
  DjangoAPI-->>Frontend: {output, entity}
sequenceDiagram
  participant BaseScraper
  participant SaveOrganizers as save_organizers
  participant DedupAfterSave as _dedup_after_save
  participant DeduPy as scripts/dedup.py logic (inline)
  participant OrganizerDB

  BaseScraper->>SaveOrganizers: [ScrapedOrganizer, ...]
  loop each organizer
    SaveOrganizers->>OrganizerDB: INSERT or UPDATE (always_update fields + blank-fill)
    SaveOrganizers->>SaveOrganizers: collect organizer_ids
  end
  SaveOrganizers->>DedupAfterSave: ("organizers", organizer_ids)
  DedupAfterSave->>OrganizerDB: SELECT rows WHERE id IN organizer_ids
  DedupAfterSave->>DedupAfterSave: bucket by normalized website/name
  DedupAfterSave->>OrganizerDB: UPDATE winner + remap FK + DELETE losers
  DedupAfterSave-->>SaveOrganizers: (exceptions swallowed)
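
The bucketing and exception-swallowing steps in the diagram above might be approximated like this. The sketch is an assumption-laden illustration: `bucket_by_key`, `dedup_after_save`, and the `merge_fn` callback are invented names, and logging the swallowed exception is a choice made here rather than something the review confirms.

```python
import logging
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

logger = logging.getLogger(__name__)


def bucket_by_key(rows: Iterable[Dict],
                  key_fn: Callable[[Dict], str]) -> List[List[Dict]]:
    """Group rows sharing a non-empty key; only groups with 2+ rows are duplicates."""
    buckets = defaultdict(list)
    for row in rows:
        key = key_fn(row)
        if key:
            buckets[key].append(row)
    return [group for group in buckets.values() if len(group) > 1]


def dedup_after_save(rows: Iterable[Dict],
                     key_fn: Callable[[Dict], str],
                     merge_fn: Callable[[List[Dict]], None]) -> int:
    """Best-effort dedup: returns the number of groups merged and never raises."""
    merged = 0
    try:
        for group in bucket_by_key(rows, key_fn):
            merge_fn(group)
            merged += 1
    except Exception:
        # Swallowed on purpose so a dedup bug cannot fail the scrape itself.
        logger.exception("post-save dedup failed")
    return merged
```

Swallowing failures keeps scraping resilient, at the cost that duplicates may silently remain until the standalone script is run.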

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • hdmGOAT/veent-event-scraper#20: Directly overlaps with the deduplication implementation — _dedup_after_save hook in events/scrapers/base.py, scripts/dedup.py/scripts/deduplicate.py, dedup trigger endpoint in events/views.py, and corresponding frontend API/type wiring.
  • hdmGOAT/veent-event-scraper#21: Directly overlaps with the organizer crawler enrichment work — enrich_organizers.py management command, contact_extractor.py, and the Organizer enrichment fields/migrations.
  • hdmGOAT/veent-event-scraper#23: Overlaps with the MeetupScraper and EventAlwaysScraper implementations and the shared post-save deduplication hook in events/scrapers/base.py.

Suggested reviewers

  • potakaaa

Poem

🐇 Hop, hop, the duplicates flee,
Five new scrapers buzz like a bee,
Winners keep fields, losers depart,
CSV exports — a work of art!
VEENT SCRAPER glows on the side,
Enriched organizers fill with pride. ✨

@hdmGOAT hdmGOAT merged commit 8813b58 into main Jun 18, 2026
4 of 5 checks passed

3 participants