{
  "summary": "EN URL inventory + Semrush real-keyword/volume backbone for a SHARD of hair-transplant domains (selected via args)",
  "agentCount": 2,
  "logs": [
    "Shard: cosmedica, vera",
    "Shard done: 2/2"
  ],
  "result": {
    "domains": [
      {
        "site": "cosmedica",
        "csv_path": "/opt/automator/cinik-rponse/files/raw/cosmedica_master.csv",
        "total_en_urls": 176,
        "ranking_urls": 2,
        "nonranking_urls": 174,
        "patient_case_urls": 5,
        "semrush_rows_pulled": 8,
        "by_type": [
          {
            "type": "post",
            "count": 133
          },
          {
            "type": "faq",
            "count": 20
          },
          {
            "type": "page",
            "count": 11
          },
          {
            "type": "service",
            "count": 11
          },
          {
            "type": "beforeafter",
            "count": 1
          }
        ],
        "theme_preview": [
          {
            "theme": "generic hair-transplant",
            "count": 78
          },
          {
            "theme": "other",
            "count": 73
          },
          {
            "theme": "patient-case",
            "count": 8
          },
          {
            "theme": "price",
            "count": 7
          },
          {
            "theme": "turkey",
            "count": 7
          },
          {
            "theme": "grafts",
            "count": 3
          }
        ],
        "sample_rows": [
          "https://cosmedica.com,page,hair graft turkey,6600,1,8.49,5,0,semrush",
          "https://cosmedica.com/50-famous-bald-celebrities,post,bald men,2900,1,6.0,3,0,semrush",
          "https://cosmedica.com/before-after,beforeafter,,,,,,0,sitemap-only",
          "https://cosmedica.com/afro-hair-transplant,service,,,,,,0,sitemap-only",
          "https://cosmedica.com/fue-hair-transplant,service,,,,,,0,sitemap-only",
          "https://cosmedica.com/10-days-after-hair-transplant,post,,,,,,0,sitemap-only",
          "https://cosmedica.com/can-birth-control-cause-hair-loss,faq,,,,,,0,sitemap-only",
          "https://cosmedica.com/hair-transplant-cost-uk,post,,,,,,0,sitemap-only",
          "https://cosmedica.com/3000-grafts-hair-transplant-keiran-lee,page,,,,,,1,sitemap-only",
          "https://cosmedica.com/3500-grafts-hair-transplant-cafu,page,,,,,,1,sitemap-only",
          "https://cosmedica.com/4000-grafts-hair-transplant-adam22,page,,,,,,1,sitemap-only",
          "https://cosmedica.com/keiran-lee-hair-transplant,post,,,,,,1,sitemap-only"
        ],
        "notes": "CRITICAL — Semrush pull was BLOCKED. execute_report (domain_organic, db=uk) returned \"active subscription but not enough API units\" on both the limit=500 call and a retry; the page-2 offset call also errored (code 605: display_offset must be < display_limit, so true pagination would need offset<500 windows, but units were exhausted regardless). I did NOT fabricate ranking data. The Semrush backbone therefore consists ONLY of 8 genuine rows that had been pulled and saved to disk in a prior interrupted run (cosmedica_semrush_raw1.txt) for this exact domain — verbatim, no invention. Those 8 raw rows aggregate to just 2 EN URLs: the homepage (primary kw \"hair graft turkey\", vol 6600, pos 1, 8.49% traffic, 5 kws) and /50-famous-bald-celebrities (primary kw \"bald men\", vol 2900, pos 1, 6.0%, 3 kws). semrush_rows_pulled=8 reflects this real partial pull, NOT the intended 500-1000. To complete the ranking backbone, the Semrush account needs API units topped up, then re-run domain_organic and re-join.\n\nSITEMAPS: all 3 fetched OK in parallel with browser UA, saved as cosmedica_post-sitemap.xml (968 locs incl images), cosmedica_post-sitemap2.xml (148), cosmedica_page-sitemap.xml (226). All are flat urlset (no nested sitemap indexes). After stripping image locs and de-duping by normalized URL: 1341 total cosmedica page URLs across all languages.\n\nEN FILTER: tabulated first path segments -> 2-letter language buckets discovered were pl(206), de(203), it(192), fr(177), ru(148), nl(95), es(64), br(50), tr(30). br=Brazilian Portuguese, tr=Turkish — both excluded as 2-letter codes. EN = root-level single-segment slug. All 176 EN URLs are depth-1 (no multi-segment folders exist for EN), plus the bare root. Normalization: lowercased host, stripped trailing slash, URL-decoded.\n\nTYPING: page-sitemap -> page, except hair/beard/eyebrow-transplant + dhi/fue/sapphire/afro service slugs -> service (11). post-sitemaps -> post, except question-form slugs (does/do/can/is/what/why/how...) -> faq (20). /before-after -> beforeafter (1). The 3 celebrity graft pages (keiran-lee, cafu, adam22) live in page-sitemap so are typed \"page\" but flagged is_patient_case_url=1.\n\nPATIENT CASES (5): 3000-grafts-hair-transplant-keiran-lee, 3500-grafts-hair-transplant-cafu, 4000-grafts-hair-transplant-adam22 (named individual + graft count), plus keiran-lee-hair-transplant and adam22-hair-transplant. I deliberately EXCLUDED the 1000/2000/3000/4000/5000-grafts-hair-transplant-cost pages (generic price pages, not named testimonials) and 50-famous-bald-celebrities (a listicle, not a single patient case).\n\nJOIN: 2 EN sitemap URLs matched Semrush (source=semrush); 174 are source=sitemap-only with blank kw/vol/pos for later fetch+read enrichment. Both Semrush EN URLs (/ and /50-famous-bald-celebrities) were already present in the sitemap set, so no extra semrush-only rows were appended. No www-vs-nonwww ambiguity (site uses bare cosmedica.com; all sitemap and Semrush URLs are https://cosmedica.com without www). Master CSV verified: 177 lines (1 header + 176 data), every row exactly 9 fields. theme_preview is kw-or-slug-based across all 176 URLs (real keyword used where present, else slug tokens) since only 2 URLs have real keywords. Supporting files: cosmedica_semrush.csv (aggregated map, 2 rows), cosmedica_en_universe.json, cosmedica_all_urls.json. Note: leftover artifacts from a prior run (cosmedica_universe.json) were ignored; I rebuilt fresh."
      },
      {
        "site": "vera",
        "csv_path": "/opt/automator/cinik-rponse/files/raw/vera_master.csv",
        "total_en_urls": 116,
        "ranking_urls": 116,
        "nonranking_urls": 0,
        "patient_case_urls": 0,
        "semrush_rows_pulled": 500,
        "by_type": [
          {
            "type": "page",
            "count": 114
          },
          {
            "type": "post",
            "count": 1
          },
          {
            "type": "beforeafter",
            "count": 1
          }
        ],
        "theme_preview": [
          {
            "theme": "generic hair-transplant",
            "count": 14
          },
          {
            "theme": "turkey",
            "count": 17
          },
          {
            "theme": "price",
            "count": 7
          },
          {
            "theme": "grafts",
            "count": 4
          },
          {
            "theme": "patient-case",
            "count": 0
          },
          {
            "theme": "other",
            "count": 74
          }
        ],
        "sample_rows": [
          "https://www.veraclinic.net/,page,vera clinic turkey,590,1,19.97,50,0,semrush",
          "https://www.veraclinic.net/hair-transplant-turkey-cost,page,fue turkey cost,2900,16,0.54,21,0,semrush",
          "https://www.veraclinic.net/hair-transplant-cost,page,turkey hair transplant cost,1900,16,0.27,5,0,semrush",
          "https://www.veraclinic.net/beard-transplant-in-turkey,page,turkey beard transplant,260,6,0.82,4,0,semrush",
          "https://www.veraclinic.net/citalopram-hair-loss,page,citalopram and hair loss,110,2,0.95,6,0,semrush",
          "https://www.veraclinic.net/hair-transplant-for-women,page,female hair transplant,1600,31,0.13,12,0,semrush",
          "https://www.veraclinic.net/lichen-planopilaris,page,lichen planopilaris,2400,24,0.41,4,0,semrush",
          "https://www.veraclinic.net/hair-transplant-turkey-reviews,page,vera clinic turkey reviews,110,2,0.54,2,0,semrush",
          "https://www.veraclinic.net/hair-transplant-before-after,beforeafter,hair replacement turkey,1900,96,0.00,8,0,semrush",
          "https://www.veraclinic.net/blog/page/2,post,vera clinic london,140,15,0.00,1,0,semrush",
          "https://www.veraclinic.net/tummy-tuck,page,tummy tuck turkey,1600,33,0.13,2,0,semrush",
          "https://www.veraclinic.net/what-is-a-hair-graft,page,hair grafts,390,82,0.00,2,0,semrush"
        ],
        "notes": "Site blocked by Cloudflare (403) so sitemaps were skipped entirely per instructions; the entire URL universe is from Semrush, hence ALL 116 rows are source=semrush and nonranking_urls=0 (no sitemap-only rows). SEMRUSH PAGINATION INCOMPLETE: only the first 500 rows (display_sort=tr_desc, the highest-traffic slice) were retrieved. The offset=500 and offset=1000 calls first failed with API error 605 (display_offset must be a positive integer LESS than display_limit; offset 500/1000 with limit 500 is invalid). On retry with display_limit=1500/display_offset=500 the Semrush API returned an insufficient-API-units message, so deeper pages could not be fetched. semrush_rows_pulled=500 (raw rows before aggregation). EN FILTER: 480 EN rows kept, 20 excluded because first path segment is a 2-letter language code (observed: de, fr, ar, ru). Excluded set includes all celebrity 'journey'/hair-transplant-story pages (Gordon Ramsay, Henry Cavill, Nicolas Cage, David Silva, Kevin Costner) which live only under /de/ and /fr/ — therefore patient_case_urls=0 (no EN named-individual testimonial URL exists in the data). HOST: every URL is www.veraclinic.net; non-www never appeared, no www/non-www conflict. NORMALIZATION: lowercased host, stripped trailing slash (root kept as '/'), URL-decoded %xx (only affected the excluded ar/ru rows). AGGREGATION: primary keyword per URL = row with highest Traffic(%), tie-broken by higher volume then better position. n_keywords = distinct keywords that URL ranks for in the pulled 500 rows (so it undercounts vs full corpus given truncated pagination). TYPE TAGGING: only one /blog/ URL (/blog/page/2 -> post, a paginated archive not a real article) and one before-after gallery (-> beforeafter); everything else is single-segment editorial/service -> page (no /faq/, /patient/, /service/ folder URLs surfaced in EN). The big 'other' theme bucket (74) is mostly hair-loss-cause condition articles (thyroid, vitamin D, iron deficiency, alopecia, medications, PCOS, etc.) plus aesthetic-surgery pages (rhinoplasty, tummy tuck, BBL, liposuction, veneers) which don't match the hair-transplant/turkey/price/grafts buckets. Intermediate files written: vera_semrush_raw.txt (raw faithful rows), vera_semrush.csv (aggregated per-URL map, 116 rows), build_vera.py (parser/aggregator/join). Master verified at 117 lines (1 header + 116 data) via wc -l."
      }
    ]
  }
}