Daniel sent us this one — he's asking about map projections at the local scale, those tens to hundreds of kilometers where the globe-is-round problem doesn't quite vanish but gets a lot more manageable. He wants to know what choices people actually make at that scale, and then how those choices show up in the Python geospatial stack — GeoPandas, pyproj, the whole foundation. There's a lot here because the global stuff gets all the attention, the Mercator versus Gall-Peters debates, but the decisions that actually break your analysis happen at the local level.
They break silently, which is the part that never gets enough attention. You can run an entire spatial join in the wrong coordinate reference system and GeoPandas will happily give you results. They'll just be wrong in ways that aren't obvious until you look at the numbers and realize your distances are in decimal degrees, which isn't a unit of length at all.
Right, so before we even touch Python, let's talk about what people actually choose at local scales. Because I think most people who've done any GIS know about UTM, the Universal Transverse Mercator, but the State Plane Coordinate System is a whole other animal, and honestly it's where the interesting engineering decisions live.
State Plane is fascinating because it was designed backward from a performance requirement. The specification says distance distortion must be limited to one part in ten thousand. That means if you measure a two-mile distance on the ground, your measurement on the map will be accurate to within about a foot. That's a survey-grade constraint, and it was set in the nineteen thirties.
One part in ten thousand is absurdly tight. Who was the customer for that?
Surveyors, engineers, mappers who needed to connect land surveys to a common reference system. The US Coast and Geodetic Survey developed it, and the design constraint was basically: we need Euclidean geometry to work. If you're building a bridge or laying out a county road network, you don't want to do spherical trigonometry. You want to use Pythagoras.
Which is the whole point of a projection at local scale. You're trading off the fidelity of the sphere for the convenience of a flat plane where distance formulas are simple and areas are straightforward. And the State Plane system achieves that by breaking the entire US into one hundred twenty-four separate zones, each with its own projection and its own parameters.
Here's the design logic that I think is really elegant. The choice of projection for each zone depends on the shape of the zone. If the zone is taller than it is wide — think Illinois or New Mexico — you use Transverse Mercator. That's the same mathematical basis as UTM. But if the zone is wider than it is tall — Iowa, Pennsylvania, California — you use Lambert Conformal Conic. The projection is matched to the geometry of the region it's serving.
The projection isn't just about location, it's about orientation. A north-south elongated state gets a different mathematical treatment than an east-west elongated state, because the distortion patterns are different.
Transverse Mercator minimizes distortion along a north-south meridian. Lambert Conformal Conic minimizes distortion along one or two standard parallels, east-west lines. So you pick the tool that aligns with the shape of your problem. It's a really clean engineering decision, and it's been baked into the system since the nineteen thirties.
UTM, which is more widely known, takes a cruder approach — just slice the planet into sixty strips, each six degrees of longitude wide, split into northern and southern halves, giving you a hundred twenty zones total. Within a zone, accuracy is one part in twenty-five hundred, so about two feet of error per mile. Less precise than State Plane, but globally consistent and designed originally by the US Army Corps of Engineers in the nineteen forties for land navigation.
One part in twenty-five hundred versus one part in ten thousand. State Plane is about four times more precise, but it's also hyper-local. If your project crosses a State Plane zone boundary, you've got a problem. UTM zones are wider — six degrees of longitude — so you're less likely to cross boundaries, but at the cost of more distortion within the zone.
Then there's the datum problem, which is where the silent errors really start compounding. State Plane has been through multiple datum revisions — NAD twenty-seven, NAD eighty-three, NAD eighty-three HARN, NAD eighty-three CORS ninety-six, NSRS two thousand seven, NAD eighty-three twenty eleven. That's six major variants, and they don't agree with each other.
State Plane zones can use meters, US survey feet, or international feet. The US survey foot was officially deprecated as of December thirty-first, twenty twenty-two, but legacy data doesn't magically update itself. If you pull a shapefile from a county GIS department that was created in the nineteen nineties, it might be in NAD eighty-three with US survey feet, and if you naively combine it with modern data in NAD eighty-three twenty eleven with meters, you'll get a mismatch.
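You can at least catch the units problem from the CRS metadata. A minimal sketch with pyproj, using the Rhode Island State Plane code that comes up later in this episode as the example:

```python
from pyproj import CRS

# Inspect the units recorded in a CRS definition.
# EPSG:3438 is NAD83 / Rhode Island in US survey feet, used here as an example.
crs = CRS.from_epsg(3438)
print(crs.axis_info[0].unit_name)   # expected: 'US survey foot'
```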
How big is the mismatch between a US survey foot and an international foot?
Small enough to be invisible in casual inspection, large enough to matter in precise work. The difference is two parts per million, about a hundredth of a foot over a mile. That sounds trivial, but State Plane coordinate values run into the millions of feet, so the same coordinate interpreted in the wrong foot can shift by several feet. If you're doing cadastral mapping or precision agriculture, suddenly your property boundaries don't quite close or your drone flight paths are slightly offset.
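To make the two-parts-per-million figure concrete, here are the two definitions side by side:

```python
# Exact definitions of the two feet, in meters.
international_foot = 0.3048      # exactly
us_survey_foot = 1200 / 3937     # exactly, about 0.30480061 m

# Two parts per million: tiny per mile, but State Plane coordinates
# are often in the millions of feet, where it becomes several feet.
print((us_survey_foot - international_foot) / international_foot)  # ~2.0e-06
print(2_000_000 * (us_survey_foot - international_foot))           # ~1.2 m of offset
```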
The projection choice at local scale isn't just "pick UTM and move on." It's a whole stack of decisions: which datum, which units, which zone, and whether your data even spans a zone boundary. And if you get any of those wrong, the math still runs. The computer doesn't complain.
That's the silent failure mode. And it brings us directly to the Python stack, because GeoPandas and pyproj are the tools that let you manage this complexity, but they also make it incredibly easy to get wrong. The most common bug in Python GIS is confusing set_crs with to_crs.
Let's unpack that, because I think even people who use GeoPandas regularly might not have internalized the difference. What does set_crs actually do?
set_crs assigns a coordinate reference system to a GeoDataFrame without transforming any coordinates. It's a metadata operation. You're telling GeoPandas: "These numbers you're looking at, interpret them as being in this coordinate system." You use it when you have coordinates that you know are in a specific CRS but the CRS information wasn't stored in the file — which happens constantly with CSV files or poorly-formed shapefiles.
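In code, that labeling step looks roughly like this, assuming a hypothetical stations.csv with lon and lat columns:

```python
import pandas as pd
import geopandas as gpd

# Coordinates arrive as plain numbers with no CRS metadata attached.
df = pd.read_csv("stations.csv")
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df["lon"], df["lat"]))

# set_crs only attaches the label; the coordinate values are untouched.
gdf = gdf.set_crs(epsg=4326)
```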
It's like labeling a box. You're not changing what's inside, you're just attaching a description.
to_crs, on the other hand, actually reprojects the geometries. It runs the mathematical transformation through PROJ to convert coordinates from one CRS to another. So if you have data in WGS eighty-four lat-lon and you want it in UTM zone thirty-six north, to_crs does the actual math — the ellipsoid calculations, the datum transformations, the whole pipeline.
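Continuing the same sketch, with UTM zone thirty-six north as an arbitrary target:

```python
# to_crs runs the actual transformation: degrees in, meters out.
gdf_utm = gdf.to_crs(epsg=32636)   # WGS 84 / UTM zone 36N
print(gdf.crs, gdf_utm.crs)        # EPSG:4326 vs EPSG:32636
```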
The bug is when someone uses set_crs thinking it reprojects?
They'll load a CSV with latitude and longitude columns, create a GeoDataFrame, call set_crs with a UTM EPSG code, and then wonder why their points are plotted somewhere in the Atlantic Ocean off the coast of Africa. Because they've told GeoPandas "these degree values are actually meter values in UTM coordinates," and GeoPandas believes them.
Which is a very trusting library.
It's a library that assumes you know what you're doing, which is both its strength and its weakness. The modern GeoPandas, since version zero point seven, stores CRS as a pyproj CRS object. You can pass CRS information in a bunch of formats — EPSG integer codes, authority strings like EPSG colon four three two six, PROJ strings, WKT strings, even dictionaries of PROJ parameters.
The GeoPandas documentation actually recommends against PROJ strings now, right? Something about information loss?
Yes, the documentation explicitly says conversions between WKT and PROJ strings will in most cases cause a loss of information. They recommend WKT or SRIDs, the spatial reference identifiers. PROJ strings were the old way and they're convenient for quick-and-dirty work, but they don't capture the full complexity of modern datum transformations. You can lose grid shift files and other correction data.
The stack has gotten more sophisticated, but also more intimidating. And then there's this newer convenience function that I think is fascinating: estimate_utm_crs. It's almost a magic function — you pass it a GeoDataFrame and it automatically figures out which UTM zone is appropriate based on the spatial bounds.
This is a really interesting design choice. Under the hood, it uses pyproj's query_utm_crs_info with an area of interest derived from the geometry centroid. The typical workflow is gdf.to_crs(gdf.estimate_utm_crs()): one line and you've automatically reprojected your data into a sensible local metric CRS.
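A minimal version of that one-liner, split in two so you can inspect what the estimated CRS actually is before committing to it:

```python
import geopandas as gpd

gdf = gpd.read_file("observations.geojson")   # hypothetical input, EPSG:4326

# estimate_utm_crs returns a CRS object; to_crs does the reprojection.
utm_crs = gdf.estimate_utm_crs()
print(utm_crs)                                # e.g. WGS 84 / UTM zone 32N for data centered in Germany
gdf_utm = gdf.to_crs(utm_crs)
```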
That's powerful, but I can already see the problem. Someone uses estimate_utm_crs, it works perfectly for their data in Germany, and then they run it on a dataset that spans from Poland to Portugal and it picks one zone and distorts everything east of that zone.
Right, because estimate_utm_crs picks the UTM zone that contains the centroid of your data. If your data spans multiple zones, you get the zone of the center point, and everything at the edges is increasingly distorted. The function doesn't warn you about this. It just returns a single CRS.
It's a footgun wrapped in a convenience. The kind of thing that works ninety percent of the time and then silently fails on the ten percent of cases where you actually needed to think about the problem.
The ten percent cases are often the ones that matter most — cross-border analysis, long linear features like pipelines or rivers, anything that doesn't fit neatly into a single UTM zone. The function is great for quick exploratory work, but if you're doing production analysis, you need to understand why it's picking zone thirty-two north and whether that's actually appropriate.
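One cheap sanity check, sketched here with a hand-rolled zone calculation, is to compare which six-degree zone the western and eastern edges of the data fall into:

```python
import math
import geopandas as gpd

gdf = gpd.read_file("observations.geojson")   # hypothetical input

# Rough check: do the western and eastern edges fall in different six-degree UTM zones?
minx, _, maxx, _ = gdf.to_crs(epsg=4326).total_bounds
zone_west = math.floor((minx + 180) / 6) + 1
zone_east = math.floor((maxx + 180) / 6) + 1
if zone_west != zone_east:
    print(f"Data spans UTM zones {zone_west}-{zone_east}; a single zone will distort the edges.")
```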
Let's talk about the other big silent failure: doing distance or area calculations in unprojected lat-lon. EPSG four three two six, WGS eighty-four. GeoPandas will happily compute areas and distances in decimal degrees, and those numbers are geometrically meaningless.
A degree of longitude at the equator is about a hundred eleven kilometers. At sixty degrees latitude, it's about fifty-five kilometers. So the same decimal degree value means completely different physical distances depending on where you are on the planet. If you compute the area of a polygon in WGS eighty-four, you're getting a number that has no fixed relationship to square meters or square miles.
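The shrinking degree of longitude is easy to see with a back-of-the-envelope spherical calculation:

```python
import math

# Approximate length of one degree of longitude at a given latitude (spherical Earth).
def degree_of_longitude_km(lat_deg, radius_km=6371.0):
    return math.radians(1) * radius_km * math.cos(math.radians(lat_deg))

print(degree_of_longitude_km(0))    # ~111 km at the equator
print(degree_of_longitude_km(60))   # ~56 km at 60 degrees latitude
```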
Beginners do this constantly, because WGS eighty-four is the default CRS for GPS data, for most open data portals, for virtually everything on the web. You load a GeoJSON file, it's in WGS eighty-four, you run geo_df.area, and you get a number. The library doesn't throw an error. Recent GeoPandas versions do at least emit a warning that area results in a geographic CRS are likely incorrect, but it's a warning rather than an error, it's easy to scroll past, and older versions said nothing at all.
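Here's the failure in miniature, assuming a hypothetical parcels.geojson that arrives in WGS eighty-four:

```python
import geopandas as gpd

polygons = gpd.read_file("parcels.geojson")  # hypothetical file, arrives in EPSG:4326

# Square degrees, not a physical unit (recent GeoPandas raises a UserWarning here).
print(polygons.area.head())

# Reproject to a local metric CRS first, then measure in square meters.
print(polygons.to_crs(polygons.estimate_utm_crs()).area.head())
```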
This is where the web mapping ecosystem creates a really unfortunate incentive. Web Mercator — EPSG three eight five seven — is the default tiled projection for Google Maps, OpenStreetMap, Mapbox, basically every web basemap. So if you're building a web application, your basemap expects Web Mercator, and you need your data in Web Mercator for it to align properly.
Web Mercator is officially terrible for measurement. The EPSG geodetic authority initially refused to even include it in their dataset. I saw the quote — they said, "We will not devalue the EPSG dataset by including such inappropriate geodesy and cartography."
That's a real quote. And it's not just snobbery. Web Mercator takes a computational shortcut — it uses spherical formulas instead of ellipsoidal ones. The Earth is not a sphere, it's an ellipsoid, and using spherical math introduces errors of about zero point seven percent in scale and up to forty-three kilometers in northing compared to true Mercator.
Forty-three kilometers is not a rounding error.
It's not. The US National Geospatial-Intelligence Agency issued what was effectively a cease-and-desist notice regarding Web Mercator for navigation and targeting. This is the projection that underlies every Google Maps embed on the internet.
You've got this tension. Web Mercator is computationally fast — that's why Google picked it in two thousand five — and it's become the de facto standard for tiled web maps. But it's officially condemned by the very authorities that maintain geodetic standards. And every web GIS developer lives in that tension.
The practical workflow is: keep your analysis data in a proper local projection — UTM, State Plane, Lambert Conformal Conic — do all your distance and area calculations there, and then use to_crs to reproject to Web Mercator only at the visualization layer, when you need to display on a web map. It's a constant back-and-forth, and it's a hidden complexity in every web-based GIS project.
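In practice the back-and-forth looks something like this sketch; the choice of analysis CRS is a placeholder for whatever local projection fits the data:

```python
import geopandas as gpd

gdf = gpd.read_file("sites.geojson")             # hypothetical input, EPSG:4326

# Measure in a local projected CRS, reproject only for the web map.
analysis = gdf.to_crs(gdf.estimate_utm_crs())    # or a State Plane / LCC code chosen deliberately
analysis["area_m2"] = analysis.area              # meaningful square meters
for_display = analysis.to_crs(epsg=3857)         # Web Mercator, for the tile layer only
```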
Let's get concrete. Say I'm doing a project in Rhode Island. I've geocoded a bunch of addresses against the Census Bureau's NAD eighty-three lat-lon, which is EPSG four two six nine. But my analysis needs to happen in the Rhode Island state plane system, EPSG three four three eight, which is in feet. What does that transformation actually look like in pyproj?
There's a great writeup on this from Frank Donnelly, who did exactly that workflow. You create a transformer object using pyproj's Transformer.from_crs with the source CRS, the target CRS, and the always_xy parameter set to True. That always_xy flag is crucial — it enforces longitude-latitude ordering, which is what most data uses, rather than the traditional GIS latitude-longitude ordering that pyproj inherited from PROJ.
The coordinate order problem. That's another silent failure.
It's a classic. PROJ historically used latitude-longitude order, because that's the cartographic convention. But almost all modern data — GeoJSON, GPS, most APIs — uses longitude-latitude. If you get the order wrong, your points end up in the wrong hemisphere. The always_xy flag is pyproj's way of saying "just use x-y order, which is longitude-latitude, and don't try to be clever about axis swapping."
Donnelly creates the transformer, then runs a loop over his geocoded results, transforms each coordinate pair from EPSG four two six nine to EPSG three four three eight, and suddenly his Census Bureau data and his state plane data are in the same coordinate system and can be merged into a single file.
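The core of that workflow, sketched with an illustrative point near Providence rather than Donnelly's actual data:

```python
from pyproj import Transformer

# NAD83 geographic (EPSG:4269) -> NAD83 / Rhode Island State Plane, US feet (EPSG:3438)
transformer = Transformer.from_crs("EPSG:4269", "EPSG:3438", always_xy=True)

# always_xy=True means coordinates are passed as longitude, latitude in that order.
x_ft, y_ft = transformer.transform(-71.41, 41.82)   # illustrative point near Providence
print(x_ft, y_ft)                                    # easting and northing in feet
```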
The key thing is: this is a one-time transformation. Once the data is in state plane, everything else is easy. Distances are in feet, areas are in square feet, and you can use simple Euclidean geometry. That's the whole point of choosing the right local projection — it makes everything downstream simpler and more reliable.
The projection decision cascades. You make one careful choice at the beginning — which CRS, which datum, which units — and then the rest of your analysis can be straightforward. But if you skip that step or get it wrong, every subsequent calculation is subtly corrupted.
The corruption is often invisible. Let's say you're doing a spatial join to count points within polygons. If both layers are in WGS eighty-four, the join will work — the computational geometry still functions — but the concept of "within" is being computed on a sphere, not on a plane. For small areas, the difference might be negligible. For larger areas or higher precision requirements, it might not be.
What's the rule of thumb for when you need to reproject versus when you can get away with WGS eighty-four?
If you're doing visualization only — plotting points on a map — WGS eighty-four or Web Mercator is fine. If you're doing any measurement — distance, area, buffer operations — you need a projected coordinate system with metric units. If you're doing spatial joins over areas smaller than a few hundred kilometers, WGS eighty-four might be acceptable for topological operations, but I'd still reproject if precision matters.
"if precision matters" covers a lot of ground. Precision agriculture, where you're optimizing fertilizer application over a field. Drone flight planning, where you need to avoid obstacles by specific margins. Cadastral mapping, where property boundaries have legal implications. Any kind of environmental monitoring where you're calculating area of wetland loss or forest cover change.
And the tools now make it easy to do the right thing, if you know what the right thing is. The auto-discovery of UTM zones, the estimate_utm_crs function, the pyproj transformer API — all of this lowers the barrier to correct projection handling. But it also lowers the barrier to incorrect projection handling, because you can now do sophisticated transformations without understanding what's happening under the hood.
That's the classic abstraction trade-off. You gain productivity but lose the intuition for when the abstraction leaks.
Projections are a domain where the abstraction leaks constantly. Zone boundaries, datum shifts, unit conversions, the coordinate order problem — these aren't edge cases, they're the normal state of working with spatial data. Every dataset you encounter might be in a different CRS, and part of the skill of GIS is developing the reflex to check.
The reflex to check — that's the practical takeaway, really. Before you do anything with a new dataset, check the CRS. If it's not set, figure out what it should be and use set_crs. If it is set but it's WGS eighty-four and you need to measure things, use to_crs to get to a local projection. And if you're not sure which local projection, estimate_utm_crs is a reasonable starting point, but verify that your data actually fits in a single UTM zone.
Know your datum. If you're working with US data, you need to know whether you're in NAD eighty-three or something older. If you're combining datasets from different decades, check the metadata for datum information. The US survey foot deprecation is recent enough that there's still a ton of legacy data out there in the old units.
The other thing I'd add: if you're doing web mapping, keep your analysis CRS and your display CRS separate. Do the math in UTM or State Plane, reproject to Web Mercator only for the tiles. Don't let the convenience of Web Mercator alignment tempt you into doing analysis in a projection that the NGA says shouldn't be used for navigation.
The NGA knows a thing or two about navigation.
Alright, we should talk about one more thing before we wrap — the datum zoo. You mentioned State Plane has six major datum variants. That's not just a historical curiosity. If you're working with data from different agencies or different time periods, you can end up with layers that are all nominally in the same State Plane zone but on different datums, and they won't align.
The shift between NAD twenty-seven and NAD eighty-three can be tens of meters in some parts of the US. That's not subtle. If you're doing parcel mapping and your parcels are offset by thirty meters from your aerial imagery, that's a datum mismatch, and it's one of the most common support questions in GIS forums.
The fix is a datum transformation, which is another layer of the pyproj pipeline. When you call to_crs, pyproj doesn't just do the projection math — it also handles the datum transformation if the source and target CRS use different datums. It looks up the appropriate grid shift file or uses a Helmert transformation, depending on what's available and what accuracy you need.
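pyproj will also show you which transformation paths are actually available; a sketch for the NAD twenty-seven to NAD eighty-three case:

```python
from pyproj.transformer import TransformerGroup

# List candidate NAD27 -> NAD83 transformations PROJ knows about,
# including any that need grid files that aren't installed locally.
group = TransformerGroup("EPSG:4267", "EPSG:4269")
for t in group.transformers:
    print(t.description, t.accuracy)
print(group.unavailable_operations)
```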
This is where the WKT versus PROJ string distinction really matters. A PROJ string might capture the projection parameters but lose the datum transformation details. WKT can encode the full chain: datum, ellipsoid, prime meridian, projection, units, and the transformation path to WGS eighty-four. That's why GeoPandas recommends WKT — it's lossless.
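You can see the difference by exporting the same CRS both ways; pyproj itself warns on the lossy direction:

```python
from pyproj import CRS

crs = CRS.from_epsg(3438)   # the Rhode Island State Plane example again
print(crs.to_wkt())         # full definition: datum, ellipsoid, units, axis order
print(crs.to_proj4())       # lossy; pyproj emits a warning about losing information
```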
The stack has gotten more capable, but the complexity hasn't gone away. It's just been pushed into metadata formats and convenience functions. And the people who need to understand it are the ones whose analysis depends on getting the numbers right.
Which is everyone doing any kind of quantitative spatial work. The projection isn't just a cartographic detail — it's the mathematical foundation of every distance, area, and spatial relationship you compute. Get it wrong and everything downstream is suspect.
Now: Hilbert's daily fun fact.
The average cumulus cloud weighs about five hundred metric tons — roughly the same as a fully loaded Airbus A three eighty.
For the practitioner, here's what matters. First, always check your CRS before doing anything. It's the first line of any GIS workflow and skipping it is the root cause of most silent errors. Second, keep analysis and visualization in separate projections — measure in UTM or State Plane, display in Web Mercator. Third, if you're working with US legacy data, know your datum and know your units. The US survey foot is deprecated but not extinct. Fourth, use estimate_utm_crs as a starting point, not a final answer — verify that your data fits in a single zone. And fifth, when you're writing transformation code, use pyproj's modern API with Transformer.from_crs and always_xy set to True. The old PROJ string approach is deprecated for good reason.
I'd add: read the metadata. Before you merge two datasets, check that their CRS, datum, and units all match. If they don't, transform one to match the other explicitly. Don't rely on implicit on-the-fly reprojection — it might work, but when it doesn't, the error is silent and the debugging is painful.
There's a bigger question here that I think is worth sitting with. As tools like estimate_utm_crs get better, as the stack gets smarter about auto-detecting and auto-transforming, does the practitioner eventually stop needing to understand projections at all? Or is there a floor of knowledge that's non-negotiable?
I think there's a floor. Projections are a model of the Earth, and every model has assumptions and limitations. You can abstract away the math, but you can't abstract away the fact that you're making a choice about what to preserve and what to distort. Conformal projections preserve shape. Equal-area projections preserve area. No projection preserves both. Someone has to decide what matters for this particular analysis.
That decision can't be automated, because it's a judgment about what the analysis is trying to accomplish. The tool can suggest, but the analyst has to decide.
Which is why I think learning projections is still worth the effort. Not the spherical trigonometry — the concepts. What you're trading off, why it matters, and how to check that you've made the right choice. That mental model is the floor.
This has been My Weird Prompts. Thanks to our producer Hilbert Flumingtop for the daily fun fact and for keeping us on schedule, which is no small feat. By the way, today's episode is powered by DeepSeek V four Pro.
Oh nice, a new model in the rotation. I'm Herman Poppleberry.
I'm Corn. You can find us at myweirdprompts dot com, and if you've got a projection horror story or a CRS debugging tale, we'd love to hear it. Until next time.