Review plant-id 0299fca — EXIF extraction, iNaturalist enrichment, geo-status layout #11

Open
opened 2026-03-15 21:34:43 +01:00 by timothy · 0 comments

What changed

Commit 0299fca on plant-id main.

Features

1. EXIF GPS + datetime priority (new extract_exif_geo_dt() in app.py)

  • Reads GPSInfo IFD (tag 34853) and DateTimeOriginal (tag 36867) from the first uploaded image using Pillow's Image.getexif() / .get_ifd()
  • If EXIF GPS present → overrides browser lat/lng for the history record
  • If EXIF DateTimeOriginal present → used as created_at instead of time.time()
  • Wrapped in broad try/except Exception with logger.debug on failure; always rewinds file to 0 in finally
  • Coordinate sanity check: not (-90 <= lat <= 90 and -180 <= lng <= 180) → nulled
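
To make the GPS and datetime handling above concrete, here is a minimal sketch of the conversion steps. It is not the code from the commit: the helper names are hypothetical, the inputs are plain dicts standing in for what `get_ifd()` returns, and treating `DateTimeOriginal` as UTC is an assumption (the tag carries no timezone). One thing worth checking in the real function: in Pillow, `DateTimeOriginal` (36867) normally lives in the Exif sub-IFD, i.e. `getexif().get_ifd(0x8769)`, rather than in the top-level dict returned by `getexif()`.

```python
from datetime import datetime, timezone

# Tag ids inside the GPSInfo IFD, as defined by the EXIF specification.
GPS_LAT_REF, GPS_LAT, GPS_LNG_REF, GPS_LNG = 1, 2, 3, 4


def dms_to_decimal(dms, ref):
    """Convert (degrees, minutes, seconds) plus a hemisphere ref to signed degrees."""
    degrees, minutes, seconds = (float(part) for part in dms)
    value = degrees + minutes / 60 + seconds / 3600
    return -value if ref in ("S", "W") else value


def gps_from_ifd(gps_ifd):
    """Return (lat, lng) from a GPSInfo-style dict, or None if missing or implausible."""
    try:
        lat = dms_to_decimal(gps_ifd[GPS_LAT], gps_ifd[GPS_LAT_REF])
        lng = dms_to_decimal(gps_ifd[GPS_LNG], gps_ifd[GPS_LNG_REF])
    except (KeyError, TypeError, ValueError, ZeroDivisionError):
        return None
    # Same sanity check as in the description: out-of-range pairs are nulled.
    # NaN also fails this comparison, so it is rejected as well.
    if not (-90 <= lat <= 90 and -180 <= lng <= 180):
        return None
    return lat, lng


def exif_datetime_to_epoch(raw):
    """Parse 'YYYY:MM:DD HH:MM:SS' into epoch seconds. UTC is an assumption."""
    try:
        parsed = datetime.strptime(raw.strip(), "%Y:%m:%d %H:%M:%S")
    except (AttributeError, ValueError):
        return None
    return parsed.replace(tzinfo=timezone.utc).timestamp()
```

Keeping the conversion separate from the file I/O means the broad `try/except Exception` only has to guard the Pillow calls, and the arithmetic can be unit-tested without image fixtures.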

2. iNaturalist read-only observation counts (new /inat-obs route)

  • Unauthenticated GET to api.inaturalist.org/v1/observations?taxon_name=<name>&quality_grade=research&per_page=1
  • Returns total_results as observation count
  • Cached in new inat_cache(scientific_name PRIMARY KEY, observation_count, cached_at) table, 7-day TTL
  • Frontend renders clickable chip (<a> tag) linking to iNaturalist species page
  • User-Agent header sent as courtesy: PlantID-App/1.0 (https://plants.tblindustries.be)
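
For reviewers who want to reason about the request and the cache without opening `app.py`, the two pieces can be sketched as below. This is an illustration under assumptions, not the committed code: the constant names match the ones listed under "Files changed", but their values, the `https` scheme, the helper names, and storing `cached_at` as epoch seconds are all guesses.

```python
from urllib.parse import urlencode

# Assumed values; the commit defines constants with these names.
INAT_API = "https://api.inaturalist.org/v1/observations"
INAT_CACHE_DAYS = 7


def build_inat_url(scientific_name):
    """Build the observation-count URL; urlencode escapes the taxon name."""
    params = {
        "taxon_name": scientific_name,
        "quality_grade": "research",
        "per_page": 1,  # only total_results is needed, not the records
    }
    return f"{INAT_API}?{urlencode(params)}"


def cache_is_fresh(cached_at, now, ttl_days=INAT_CACHE_DAYS):
    """True if a cache row written at cached_at (epoch seconds) is within the TTL."""
    age = now - cached_at
    # A negative age means the row is dated in the future; treat it as stale.
    return 0 <= age < ttl_days * 86400
```

With `requests.get(..., params=...)` the encoding happens inside the library instead, so the explicit `urlencode` here only serves to show what ends up on the wire.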

3. Geo-status inline with Single/Batch toggle — pure CSS/HTML layout change. Low risk.

Files changed

  • app.py (+114 lines) — extract_exif_geo_dt(), /inat-obs route, inat_cache table init, INAT_API/INAT_CACHE_DAYS constants
  • static/js/identify.js (+25 lines) — fetchInatCounts(), call in renderResults()
  • static/css/main.css (+10 lines) — .form-top-row, a.care-chip styles
  • templates/index.html (+8 lines) — .form-top-row wrapper, #geo-status moved

Areas to review

EXIF extraction

  • Image.getexif() on untrusted upload: Pillow's parsers have had CVEs historically (e.g. Pillow 9.x decompression bombs in certain tag types). Is img.verify() in validate_image() sufficient to catch malformed EXIF, or does it not validate EXIF at all?
  • get_ifd(34853) — does Pillow raise or silently return {} on malformed GPS IFD? The broad except Exception catches it either way, but worth knowing.
  • GPS spoofing: a user can trivially craft EXIF GPS. The app treats EXIF as more authoritative than browser GPS — is there any scenario where this is a problem? (Probably not for a personal plant-ID app, but worth noting.)
  • DateTimeOriginal could be any date — past (photo from 2010) or future (malformed/spoofed). created_at ordering in history would be wrong but not exploitable. Any concern with extreme timestamps in MariaDB FROM_UNIXTIME() queries in stats?
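
If extreme timestamps do turn out to be a concern, one option is to clamp before the value reaches the database. The sketch below is only a suggestion: `choose_created_at`, the year-2000 floor, and the one-day future tolerance are all hypothetical choices, not anything in the commit. The motivation is that `FROM_UNIXTIME()` is tied to the TIMESTAMP range, whose exact bounds depend on the MariaDB version, so out-of-range input may not behave the way the stats queries expect.

```python
import time

MIN_PLAUSIBLE_TS = 946684800  # 2000-01-01 00:00:00 UTC, hypothetical floor
FUTURE_TOLERANCE_S = 86400    # hypothetical: allow a day of clock/timezone skew


def choose_created_at(exif_ts, now=None):
    """Prefer the EXIF timestamp, but fall back to 'now' if it is implausible."""
    now = time.time() if now is None else now
    if exif_ts is None:
        return now
    # Rejects pre-2000 dates, far-future dates, and NaN (comparisons are False).
    if not (MIN_PLAUSIBLE_TS <= exif_ts <= now + FUTURE_TOLERANCE_S):
        return now
    return exif_ts
```

A photo from 2010 still passes this check, so legitimate old uploads keep their original date in history.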

iNaturalist API

  • taxon_name parameter is passed directly from request.args to the external API as a query param (via requests.get params=). requests URL-encodes it automatically — any injection surface here?
  • No rate-limit handling: if iNat returns 429, the endpoint returns 502. DB cache prevents hammering for cached names, but a burst of unique-name lookups could hit rate limits. Acceptable for a personal app?
  • total_results from iNat JSON is trusted as an integer — .get("total_results", 0) with no bounds check before storing in INT column. Max plausible value is ~10M for common species; MariaDB INT max is ~2.1B. Fine.
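
Should the two softer points above (trusting `total_results`, and collapsing a 429 into a 502) be worth tightening, a small amount of code would cover both. This is a sketch with hypothetical helper names; mapping an upstream 429 to a 503 is one possible design choice, not something the commit does.

```python
INT_MAX = 2_147_483_647  # upper bound of a signed MariaDB INT column


def parse_total_results(payload):
    """Return total_results as a bounded non-negative int, or None if unusable."""
    value = payload.get("total_results") if isinstance(payload, dict) else None
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return None
    return value if 0 <= value <= INT_MAX else None


def response_status_for(upstream_status):
    """Map the upstream HTTP status to the status this endpoint would return."""
    if upstream_status == 200:
        return 200
    if upstream_status == 429:
        return 503  # upstream is rate limiting; the client can retry later
    return 502      # any other upstream failure is a bad gateway


# Usage: a count is cached only if it parsed cleanly.
count = parse_total_results({"total_results": 12345})
```

Returning `None` instead of defaulting to 0 lets the route skip the cache write, so a malformed response is not stored as "zero observations" for seven days.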

Not flagged (looks OK)

  • extract_exif_geo_dt always rewinds file_storage to 0 in finally — subsequent convert_image() call gets the full file ✓
  • EXIF extraction runs on the original file before JPEG conversion (Pillow does not write EXIF on save unless exif= is passed) ✓
  • inat_cache uses scientific_name VARCHAR(255) PRIMARY KEY — same pattern as species table ✓
  • a.care-chip href is constructed from encodeURIComponent(r.scientific_name) — XSS safe ✓
  • Auth check via get_current_user() on /inat-obs ✓