@knaaptime
Member

This starts the restructuring discussed in #4. I think it cuts down substantially on repetition and makes the code much easier to navigate. We also get #104 for free.

@renanxcortes
Collaborator

This looks good to me and I'm +1 for merging it. I'm assuming this is still a WIP, though, since only the single-group measures were refactored, right?

@knaaptime
Member Author

sweet. yeah, still WIP, just wanted to get some feedback/buy-in for this approach. I'll keep going with the multigroup indices

@knaaptime
Member Author

this is ready to go. The main benefit is that we adopt the logic of Reardon & O'Sullivan: we can derive generalized spatial indices by transforming the input data of aspatial indices through a weights matrix (so, e.g., we can now compute multiscalar profiles for any SpatialImplicit index).
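To make the idea concrete, here is a minimal sketch of that transformation using the dissimilarity index. It is not the package's implementation: the function names, the toy counts, and the hand-written row-standardized weights matrix `w` are all assumptions chosen for illustration.

```python
import numpy as np


def dissimilarity(group, total):
    """Aspatial dissimilarity index D for one group versus everyone else."""
    group = np.asarray(group, dtype=float)
    total = np.asarray(total, dtype=float)
    other = total - group
    return 0.5 * np.abs(group / group.sum() - other / other.sum()).sum()


def spatial_dissimilarity(group, total, w):
    """Same index, computed on counts smoothed through an (n x n) weights matrix."""
    w = np.asarray(w, dtype=float)
    # The only spatial step: replace each unit's counts with a weighted
    # combination of its own and its neighbors' counts.
    return dissimilarity(w @ group, w @ total)


# Four units in a row (hypothetical counts).
group = np.array([90, 80, 10, 20])
total = np.array([100, 100, 100, 100])

# Hypothetical row-standardized weights: each unit blends with adjacent units.
w = np.array([
    [0.50, 0.50, 0.00, 0.00],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.00, 0.00, 0.50, 0.50],
])

d_aspatial = dissimilarity(group, total)
d_spatial = spatial_dissimilarity(group, total, w)
d_identity = spatial_dissimilarity(group, total, np.eye(4))
```

With an identity weights matrix the spatial version reduces to the aspatial one, which is why the same index code can serve both cases. Repeating the calculation with weights built at increasing distances would give a multiscalar profile.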

the original tests are unchanged to show that the (aspatial) calculations are still correct, with one exception: MinMaxS has a very different estimate. I think it's correct, but I want to flag that this one did change

aside from the reorganization, there's one important API change, which is that fitted indices have the attribute data now instead of core_data

@codecov-commenter

codecov-commenter commented Apr 20, 2021

Codecov Report

Merging #161 (fea1117) into master (736098b) will decrease coverage by 19.97%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           master     #161       +/-   ##
===========================================
- Coverage   85.75%   65.77%   -19.98%     
===========================================
  Files          65      116       +51     
  Lines        2779     4322     +1543     
===========================================
+ Hits         2383     2843      +460     
- Misses        396     1479     +1083     
Impacted Files Coverage Δ
...egregation/aspatial/multigroup_aspatial_indexes.py 31.44% <0.00%> (-68.56%) ⬇️
segregation/aspatial/aspatial_indexes.py 17.03% <0.00%> (-66.53%) ⬇️
segregation/network/network.py 27.27% <0.00%> (-65.91%) ⬇️
segregation/spatial/spatial_indexes.py 15.07% <0.00%> (-57.32%) ⬇️
segregation/util/util.py 78.72% <0.00%> (-6.72%) ⬇️
segregation/util/__init__.py 100.00% <0.00%> (ø)
segregation/local/__init__.py 100.00% <0.00%> (ø)
segregation/aspatial/__init__.py 100.00% <0.00%> (ø)
segregation/decomposition/decompose_segregation.py 97.53% <0.00%> (ø)
...regation/tests/test_local_multi_local_diversity.py 92.85% <0.00%> (ø)
... and 71 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 736098b...fea1117.

@knaaptime
Member Author

(tests are all passing, there's a CI issue with rtree on windows)

statistics : np.array(n,k)
Local Diversity values for each group and unit

core_data : a pandas DataFrame
Collaborator

The PR looks good to me. But wasn't this supposed to return data rather than core_data?

Member Author

it's OK here because we set the class attribute appropriately (so even though we return the variable core_data above, its contents get assigned to the class attribute named data)
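The pattern described here can be sketched roughly as follows. The helper, the class, and the toy records are hypothetical stand-ins, not the package's code; the only point is that a local variable named core_data can end up exposed as the attribute data.

```python
def compute_statistic(records, group):
    """Hypothetical helper: returns a statistic plus the trimmed data it used."""
    # Keep only the columns the calculation needs.
    core_data = [{group: r[group], "total": r["total"]} for r in records]
    share = sum(r[group] for r in core_data) / sum(r["total"] for r in core_data)
    return share, core_data


class FittedIndex:
    """Hypothetical index class; only the attribute naming matters here."""

    def __init__(self, records, group):
        statistic, core_data = compute_statistic(records, group)
        self.statistic = statistic
        # The helper still calls it core_data, but users see it as .data.
        self.data = core_data


index = FittedIndex(
    [{"a": 10, "b": 5, "total": 15}, {"a": 20, "b": 15, "total": 35}],
    group="a",
)
```

After fitting, `index.data` holds the trimmed records and there is no `index.core_data` attribute.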

Collaborator

Cool, gotcha!

@renanxcortes
Collaborator

renanxcortes commented Apr 21, 2021

Awesome, @knaaptime! By the way, when you say "MinMaxS has a very different estimate", you mean the "MinMax" from https://github.com/knaaptime/segregation/blob/2.0/segregation/singlegroup/minmax.py, right? Any clue why this might be happening? 'Cause it seems like nothing changed in the code/math behind it, right?

@knaaptime
Member Author

yeah, the difference between old and new. I thought it had to do with different values for the spatial params, but now I'm not too sure. The old number looks awfully low relative to the values from the paper

@knaaptime
Member Author

right, the math and the function that actually computes the statistic haven't changed. I think the root is this line in the new version, whereas the old code probably created a Kernel weights object from unprojected data

@renanxcortes
Collaborator

> right, the math and the function that actually computes the statistic haven't changed. I think the root is this line in the new version, whereas the old code probably created a Kernel weights object from unprojected data

Yep, this might be it; perhaps the new version of estimate_utm_crs projects that gdf differently than the original test did. I guess we could just update the test to the new value (even though it is very different from the previous one) and go for the merge.
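A rough illustration of why projection matters for distance-based weights, under loudly stated assumptions: the triangular kernel, the 2000-unit bandwidth, the three points, and the crude degrees-to-meters conversion are all made up for this sketch. It does not use the package's Kernel weights or estimate_utm_crs.

```python
import numpy as np


def triangular_kernel(coords, bandwidth):
    """Pairwise weights that fall linearly from 1 to 0 at the bandwidth."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return np.clip(1 - dist / bandwidth, 0, None)


def rough_meters(lonlat):
    """Crude equirectangular conversion; a stand-in for a real projection."""
    lonlat = np.asarray(lonlat, dtype=float)
    lat0 = np.radians(lonlat[:, 1].mean())
    x = lonlat[:, 0] * 111_320 * np.cos(lat0)
    y = lonlat[:, 1] * 111_320
    return np.column_stack([x, y])


# Three hypothetical points: roughly 1.1 km and 5.6 km north of the first.
lonlat = np.array([[-117.00, 34.00], [-117.00, 34.01], [-117.00, 34.05]])

# Same nominal bandwidth, but the units differ (degrees versus meters).
w_degrees = triangular_kernel(lonlat, bandwidth=2000)
w_meters = triangular_kernel(rough_meters(lonlat), bandwidth=2000)
```

In degrees every pair of points sits far inside the bandwidth, so all weights are close to 1; in meters the farthest pair gets a weight of 0. Any statistic built on those weights would shift accordingly, which is consistent with the explanation proposed above without confirming it.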

@knaaptime changed the title from "starting 2.0 refactor" to "2.0 refactor" on Apr 21, 2021
@sjsrey
Member

sjsrey commented May 9, 2021

mamba update -c conda-forge geopandas

puts geopandas at 0.9.0, and the problem with MISSING is solved.

@knaaptime
Member Author

thanks martin

c = 1-dist.copy()

Pxx = (data.xi.values * data.xi.values * c).sum() / (X ** 2)
Pyy = (data.xi.values * data.yi.values * c).sum() / (X*Y)
Member Author

@renanxcortes one substantive change here: I edited this formula because I think the last version had a mistake (ycy/y^2 instead of ycx/yx)

image

Collaborator

@knaaptime, the original version is the one that follows Massey and Denton. If we look closely at formulas (19) and (20), which are, respectively, Spatial Proximity and Relative Clustering, they both use the definitions of Pxx and Pyy, which match our original implementation.

image

The definition of the "interaction" represented by "Pxy" is just for illustration.
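For reference, a minimal sketch of those quantities as I understand them from Massey and Denton: Pxx, Pyy, and Ptt are within-group average proximities, Spatial Proximity is (X·Pxx + Y·Pyy) / (T·Ptt), and Relative Clustering is Pxx / Pyy − 1. The function names and toy data are assumptions, and the proximity c = exp(−d) is used here for illustration (the diff under review uses 1 − dist instead).

```python
import numpy as np


def mean_proximity(a, b, c):
    """Average proximity between members of a and b: sum_ij a_i b_j c_ij / (A * B)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return (a @ c @ b) / (a.sum() * b.sum())


def proximity_measures(x, y, dist):
    """Return (Pxx, Pyy, SP, RCL) for group counts x and y and a distance matrix."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c = np.exp(-np.asarray(dist, dtype=float))  # assumed proximity function
    t = x + y
    pxx = mean_proximity(x, x, c)  # x with x, normalized by X**2
    pyy = mean_proximity(y, y, c)  # y with y, normalized by Y**2 (not X*Y)
    ptt = mean_proximity(t, t, c)
    sp = (x.sum() * pxx + y.sum() * pyy) / (t.sum() * ptt)
    rcl = pxx / pyy - 1
    return pxx, pyy, sp, rcl


dist = np.array([[0.0, 1.0], [1.0, 0.0]])

# Fully separated groups: each lives in its own unit.
_, _, sp_separated, _ = proximity_measures([100, 0], [0, 100], dist)

# Identically distributed groups: no clustering relative to the total.
_, _, sp_even, rcl_even = proximity_measures([10, 30], [10, 30], dist)
```

Calling `mean_proximity(x, y, c)` with two different groups gives the cross-group Pxy term, which is the "interaction" quantity and does not enter either index.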

Member Author

ah, snap. you're right

Collaborator

No sweat!

6 participants