geograpy package¶
Submodules¶
geograpy.extraction module¶
- class geograpy.extraction.Extractor(text=None, url=None, debug=False)[source]¶
Bases:
object
Extract geo context for text or from url
- find_entities(labels=['GPE', 'GSP', 'PERSON', 'ORGANIZATION'])[source]¶
Find entities with the given labels, set self.places, and return it.
- Args:
labels(list): the labels to filter
- Returns:
list: list of places
geograpy.geograpy_nltk module¶
geograpy.labels module¶
Created on 2020-09-10
@author: wf
geograpy.locator module¶
The locator module provides detailed city information, including the region and country of a city, from a location string.
Examples of location strings are:
Amsterdam, Netherlands
Vienna, Austria
Vienna, IL
Paris - Texas
Paris TX
The locator looks up the cities and tries to disambiguate the result based on the country or region information found.
The results in string representation are:
Amsterdam (NH(North Holland) - NL(Netherlands))
Vienna (9(Vienna) - AT(Austria))
Vienna (IL(Illinois) - US(United States))
Paris (TX(Texas) - US(United States))
Paris (TX(Texas) - US(United States))
Each city returned has a city.region and city.country attribute with the details of the city.
Created on 2020-09-18
@author: wf
- class geograpy.locator.City(**kwargs)[source]¶
Bases:
Location
a single city as an object
- property country¶
- static fromCityLookup(cityLookupRecord: dict)[source]¶
create a city from a cityLookupRecord, setting City, Region and Country along the way
- Args:
cityLookupRecord(dict): a map derived from the CityLookup view
- property region¶
- class geograpy.locator.CityManager(name: str = 'CityManager', config: StorageConfig | None = None, debug=False)[source]¶
Bases:
LocationManager
a list of cities
- class geograpy.locator.Country(lookupSource='sqlDB', **kwargs)[source]¶
Bases:
Location
a country
- class geograpy.locator.CountryManager(name: str = 'CountryManager', config: StorageConfig | None = None, debug=False)[source]¶
Bases:
LocationManager
a list of countries
- classmethod fromErdem()[source]¶
get country list provided by Erdem Ozkol https://github.com/erdem
- class geograpy.locator.Location(**kwargs)[source]¶
Bases:
JSONAble
Represents a Location
- balltreeQueryResultToLocationManager(distances, indices, lookupListOfLocations)[source]¶
convert the given ballTree Query Result to a LocationManager
- Args:
distances(list): array of distances
indices(list): array of indices
lookupListOfLocations(list): a list of valid locations to use for lookup
- Return:
list: a list of result Location/distance tuples
- distance(other) float [source]¶
calculate the distance to another Location
- Args:
other(Location): the other location
- Returns:
the haversine distance in km
- classmethod fromRecord(regionRecord: dict)[source]¶
create a location from a dict record
- Args:
regionRecord(dict): the records as returned from a Query
- Returns:
Location: the corresponding location information
- getLocationsWithinRadius(lookupLocationManager, radiusKm: float)[source]¶
Gives the locations within the given radius of me, taken from the given lookupLocationManager
- Args:
lookupLocationManager(LocationManager): a LocationManager object to use for lookup
radiusKm(float): the radius in which to check (in km)
- Returns:
list: a list of result Location/distance tuples
- getNClosestLocations(lookupLocationManager, n: int)[source]¶
Gives a list of up to n locations which have the shortest distance to me as calculated from the given lookupLocationManager
- Args:
lookupLocationManager(LocationManager): a LocationManager object to use for lookup
n(int): the maximum number of closest locations to return
- Returns:
list: a list of result Location/distance tuples
- static haversine(lon1, lat1, lon2, lat2)[source]¶
Calculate the great circle distance between two points on the earth (specified in decimal degrees)
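The haversine computation and the two distance queries documented above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: it uses a brute-force scan where the library works with a BallTree, the Earth radius constant is an assumption, and `with_distances`, `n_closest`, `within_radius` and the sample coordinates are hypothetical names and data.

```python
import heapq
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius; the library's constant may differ


def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def with_distances(origin, candidates):
    """Pair every candidate (name, lon, lat) with its distance to origin (lon, lat)."""
    lon, lat = origin
    return [
        ((name, clon, clat), haversine(lon, lat, clon, clat))
        for name, clon, clat in candidates
    ]


def n_closest(origin, candidates, n):
    """Up to n (location, distance) tuples, nearest first."""
    return heapq.nsmallest(n, with_distances(origin, candidates), key=lambda t: t[1])


def within_radius(origin, candidates, radius_km):
    """All (location, distance) tuples no farther than radius_km, nearest first."""
    hits = [t for t in with_distances(origin, candidates) if t[1] <= radius_km]
    return sorted(hits, key=lambda t: t[1])


# Hypothetical sample data: (name, longitude, latitude)
cities = [("Vienna", 16.37, 48.21), ("Amsterdam", 4.90, 52.37), ("Paris", 2.35, 48.86)]
berlin = (13.40, 52.52)
nearest = n_closest(berlin, cities, 2)
nearby = within_radius(berlin, cities, 600.0)
```

Note the argument order: longitude comes before latitude, matching the documented `haversine(lon1, lat1, lon2, lat2)` signature.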
- class geograpy.locator.LocationContext(countryManager: CountryManager, regionManager: RegionManager, cityManager: CityManager, config: StorageConfig)[source]¶
Bases:
object
Holds LocationManagers of all hierarchy levels and provides methods to traverse through the levels
- property cities: list¶
- property countries: list¶
- db_filename = 'locations.db'¶
- classmethod fromCache(config: StorageConfig | None = None, forceUpdate: bool = False)[source]¶
Initializes a LocationContext from the cache if it exists; otherwise initializes the cache
- Args:
config(StorageConfig): configuration of the cache; if None, the default config is used
forceUpdate(bool): if True, an existing cache will be overwritten
- interlinkLocations(warnOnDuplicates: bool = True, profile=True)[source]¶
Interlinks locations by adding the hierarchy references to the locations
- Args:
warnOnDuplicates(bool): if there are duplicates warn
- locateLocation(*locations, verbose: bool = False)[source]¶
Get possible locations for the given location names. The current prioritization of the results is city (ordered by population) → region → country.
ToDo: Extend the ranking of the results, e.g. matching of multiple location parts increases the ranking.
- Args:
*locations: the location names to look up
verbose(bool): if True, combinations of location names are used to improve the search results (increases lookup time)
Returns:
- property regions: list¶
- class geograpy.locator.LocationManager(name: str, entityName: str, entityPluralName: str, listName: str | None = None, tableName: str | None = None, clazz=None, primaryKey: str | None = None, config: StorageConfig | None = None, handleInvalidListTypes=True, filterInvalidListTypes=False, debug=False)[source]¶
Bases:
EntityManager
a list of locations
- add(location)[source]¶
add the given location to me
- Args:
location(object): the location to be added and put in my hash map
- classmethod downloadBackupFileFromGitHub(fileName: str, targetDirectory: str | None = None, force: bool = False)[source]¶
download the given fileName from the github data directory
- Args:
fileName(str): the filename to download
targetDirectory(str): download the file to this directory
force(bool): force the overwriting of the existing file
- Return:
str: the local file
- getBallTuple(cache: bool = True)[source]¶
get the BallTuple (BallTree, validList) of this location list
- Args:
cache(bool): if True, calculate and use a cached version; otherwise recalculate on every call of this function
- Returns:
BallTree, list: a sklearn.neighbors.BallTree for the given list of locations, and the valid list of locations
- getByName(*names: str)[source]¶
Get locations matching the given names
- Args:
names(str): names of the location
- Returns:
the locations that match the given names
- getLocationByID(wikidataID: str)[source]¶
Returns the location object that corresponds to the given wikidataID
- Args:
wikidataID: wikidataid of the location that should be returned
- Returns:
Location object
- class geograpy.locator.Locator(db_file=None, correctMisspelling=False, storageConfig: StorageConfig | None = None, debug=False)[source]¶
Bases:
object
location handling
- cities_for_name(cityName)[source]¶
find cities with the given cityName
- Args:
cityName(string): the potential name of a city
- Returns:
a list of city records
- correct_country_misspelling(name)[source]¶
correct potential misspellings
- Args:
name(string): the name of the country, potentially misspelled
- Return:
string: the corrected name, or the name unchanged
- db_has_data()[source]¶
check whether the database has data / is populated
- Returns:
boolean: True if the cities table exists and has more than one record
- db_recordCount(tableList, tableName)[source]¶
count the number of records for the given tableName
- Args:
tableList(list): the list of tables to check
tableName(str): the name of the table to check
- Returns:
int: the number of records found for the table
- disambiguate(country, regions, cities, byPopulation=True)[source]¶
try determining country, regions and city from the potential choices
- Args:
country(Country): a matching country found
regions(list): a list of matching Regions found
cities(list): a list of matching cities found
- Return:
City: the found city or None
- downloadDB(forceUpdate: bool = False)[source]¶
download my database
- Args:
forceUpdate(bool): force the overwriting of the existent file
- getCountry(name)[source]¶
get the country for the given name
- Args:
name(string): the name of the country to lookup
- Returns:
country: the country if one was found or None if not
- static getInstance(correctMisspelling=False, debug=False)[source]¶
get the singleton instance of the Locator. If parameters are changed on further calls the initial parameters will still be in effect since the original instance will be returned!
- Args:
correctMisspelling(bool): if True correct typical misspellings
debug(bool): if True show debug information
- getView()[source]¶
get the view to be used
- Returns:
str: the SQL view to be used for CityLookups e.g. CityLookup
- static isISO(s)[source]¶
check if the given string is an ISO code (ISO 3166-2 code) see https://www.wikidata.org/wiki/Property:P300
- Returns:
bool: True if the string might be an ISO Code as per a regexp check
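A regular expression check of this kind can be sketched as follows. The pattern is an assumption based on the general shape of ISO 3166-2 codes (a two-letter country code, a hyphen, then one to three alphanumeric characters); the expression actually used by `isISO` is not shown here and may be more lenient. The name `looks_like_iso_3166_2` is hypothetical.

```python
import re

# Assumption: two-letter country code, a hyphen, then one to three alphanumerics.
ISO_3166_2 = re.compile(r"[A-Z]{2}-[A-Z0-9]{1,3}")


def looks_like_iso_3166_2(s: str) -> bool:
    """True if the whole string has the shape of an ISO 3166-2 code."""
    return ISO_3166_2.fullmatch(s) is not None
```

Like the documented method, such a check only says that a string might be an ISO code; it does not verify that the code is actually assigned.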
- is_a_country(name)[source]¶
check if the given string name is a country
- Args:
name(string): the string to check
- Returns:
True: if pycountry thinks the string is a country
- locateCity(places: list)[source]¶
locate a city, region, country combination based on the given word token information
- Args:
places(list): a list of places derived by splitting a locality e.g. “San Francisco, CA” leads to “San Francisco”, “CA”
- Returns:
City: a city with country and region details
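A possible way to call `locateCity` is sketched below. The helper names `split_locality` and `locate` are hypothetical, and the comma-based splitting is only one assumed way of deriving the places list. The geograpy import is done lazily because the lookup presumably needs geograpy3 installed together with a populated location database.

```python
def split_locality(locality: str) -> list:
    """Split a locality such as "San Francisco, CA" into its place tokens."""
    return [part.strip() for part in locality.split(",") if part.strip()]


def locate(locality: str):
    """Look up a city for the given locality string (hypothetical helper)."""
    # Imported lazily: requires geograpy3 and, presumably, its location database.
    from geograpy.locator import Locator

    locator = Locator.getInstance()
    city = locator.locateCity(split_locality(locality))
    # Per the module description, a returned city carries city.region and city.country.
    return city
```

Calling `split_locality("San Francisco, CA")` yields the two tokens "San Francisco" and "CA" named in the Args description.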
- locator = None¶
- normalizePlaces(places: list)[source]¶
normalize places
- Args:
places(list): a list of places
- Return:
list: stripped and aliased list of places
- places_by_name(placeName, columnName)[source]¶
get places by name and column
- Args:
placeName(string): the name of the place
columnName(string): the column to look at
- populate_Cities(sqlDB)[source]¶
populate the given sqlDB with the Wikidata Cities
- Args:
sqlDB(SQLDB): target SQL database
- populate_Countries(sqlDB)[source]¶
populate database with countries from wikiData
- Args:
sqlDB(SQLDB): target SQL database
- populate_Regions(sqlDB)[source]¶
populate database with regions from wikiData
- Args:
sqlDB(SQLDB): target SQL database
- populate_db(force=False)[source]¶
populate the cities SQL database which caches the information from the GeoLite2-City-Locations.csv file
- Args:
force(bool): if True force a recreation of the database
- class geograpy.locator.LocatorCmd[source]¶
Bases:
object
command line handling for locator
- cmd_main(argv: None) int [source]¶
main program as an instance
- Args:
argv(list): list of command line arguments
- Returns:
int: exit code: 0 if all went well, 1 for keyboard interrupt and 2 for exceptions
- cmd_parse(argv: list | None = None)[source]¶
parse the argument lists and prepare
- Args:
argv(list): list of command line arguments
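The exit code convention described for `cmd_main` (0, 1, 2) can be illustrated with a small stand-alone sketch. It is not the LocatorCmd implementation: the `--debug` option and the `handler` parameter are hypothetical and only serve to make the example runnable.

```python
import argparse
import sys


def cmd_parse(argv=None):
    """Parse the command line arguments (the option shown is hypothetical)."""
    parser = argparse.ArgumentParser(description="locator sketch")
    parser.add_argument("--debug", action="store_true", help="show debug information")
    return parser.parse_args(argv)


def cmd_main(argv=None, handler=None) -> int:
    """Run handler(args) and map the outcome to an exit code."""
    try:
        args = cmd_parse(argv)
        if handler is not None:
            handler(args)
        return 0  # all went well
    except KeyboardInterrupt:
        return 1  # interrupted by the user
    except Exception as ex:
        print(f"error: {ex}", file=sys.stderr)
        return 2  # any other exception
```

A script entry point would then typically end with `sys.exit(cmd_main(sys.argv[1:]))`.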
- class geograpy.locator.Region(**kwargs)[source]¶
Bases:
Location
a Region (Subdivision)
- property country¶
- class geograpy.locator.RegionManager(name: str = 'RegionManager', config: StorageConfig | None = None, debug=False)[source]¶
Bases:
LocationManager
a list of regions
geograpy.nominatim module¶
Created on 2021-12-27
@author: wf
geograpy.places module¶
- class geograpy.places.PlaceContext(place_names: list, setAll: bool = True, correctMisspelling: bool = False)[source]¶
Bases:
Locator
Adds context information to a place name
- getRegions(countryName: str) list [source]¶
get a list of regions for the given countryName
- Args:
countryName(str): the countryName to check
geograpy.utils module¶
- class geograpy.utils.Download[source]¶
Bases:
object
Utility functions for downloading data
- static downloadBackupFile(url: str, fileName: str, targetDirectory: str, force: bool = False)[source]¶
Downloads the gzip file from the given url and extracts the file corresponding to the given fileName.
- Args:
url: url linking to a downloadable gzip file
fileName: name of the file that should be extracted from the gzip file
targetDirectory(str): download the file to this directory
force(bool): True if the download should be forced
- Returns:
Name of the extracted file with path to the backup directory
- static needsDownload(filePath: str, force: bool = False) bool [source]¶
check if a download of the given filePath is necessary, that is, the file does not exist, has a size of zero, or the download should be forced
- Args:
filePath(str): the path of the file to be checked
force(bool): True if the result should be forced to True
- Return:
bool: True if a download for this file is needed
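The three conditions named above (forced, missing file, empty file) can be sketched with the standard library. This is an illustration of the described check rather than the library's code; the function name `needs_download` is hypothetical.

```python
import os


def needs_download(file_path: str, force: bool = False) -> bool:
    """True if the file should be (re)downloaded."""
    if force:
        return True  # caller insists on a fresh download
    if not os.path.isfile(file_path):
        return True  # nothing has been downloaded yet
    return os.path.getsize(file_path) == 0  # an empty file counts as missing
```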
- geograpy.utils.fuzzy_match(s1, s2, max_dist=0.8)[source]¶
Fuzzy match the given two strings with the given maximum distance, using the jellyfish jaro_winkler_similarity based on https://en.wikipedia.org/wiki/Jaro-Winkler_distance
- Args:
s1(string): first string
s2(string): second string
max_dist(float): the distance - default: 0.8
- Returns:
True if the match is greater than or equal to max_dist, otherwise False
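To show what the comparison does, the following sketch re-implements the Jaro-Winkler similarity in pure Python and applies the same threshold test. The library itself delegates to jellyfish, so this is a stand-in for illustration only. The prefix weight of 0.1 and the prefix limit of four characters are the commonly used values and are assumptions here.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between 0.0 (no match) and 1.0 (identical)."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as half a transposition each.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) / 2
    return (
        matches / len(s1) + matches / len(s2) + (matches - transpositions) / matches
    ) / 3


def jaro_winkler(s1: str, s2: str, prefix_weight: float = 0.1) -> float:
    """Jaro similarity boosted for a common prefix of up to four characters."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * prefix_weight * (1 - sim)


def fuzzy_match(s1: str, s2: str, max_dist: float = 0.8) -> bool:
    """True if the similarity reaches the threshold."""
    return jaro_winkler(s1, s2) >= max_dist
```

Despite its name, `max_dist` acts as a minimum similarity: higher values make the match stricter.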
geograpy.version module¶
Created on 2024-03-29
@author: wf
- class geograpy.version.Version[source]¶
Bases:
object
Version handling for the geograpy3 project.
- authors = 'Somnath Rakshit, Wolfgang Fahl, Tim Holzheim'¶
- chat_url = 'https://github.com/somnathrakshit/geograpy3/discussions'¶
- cm_url = 'https://github.com/somnathrakshit/geograpy3'¶
- date = '2023-09-10'¶
- description = 'Extract countries, regions, and cities from a URL or text'¶
- doc_url = 'https://geograpy3.readthedocs.io'¶
- license = 'Copyright 2023-2024 contributors. All rights reserved.\n\n Licensed under the Apache License 2.0\n http://www.apache.org/licenses/LICENSE-2.0\n\n Distributed on an "AS IS" basis without warranties\n or conditions of any kind, either express or implied.'¶
- longDescription = 'geograpy3 version 0.3.0\nExtract countries, regions, and cities from a URL or text\n\n Created by Somnath Rakshit, Wolfgang Fahl, Tim Holzheim on 2023-09-10 last updated 2024-03-29.\n For more information, visit https://geograpy3.readthedocs.io.'¶
- name = 'geograpy3'¶
- updated = '2024-03-29'¶
- version = '0.3.0'¶
geograpy.wikidata module¶
Created on 2020-09-23
@author: wf
- class geograpy.wikidata.Wikidata(endpoint='https://query.wikidata.org/sparql', profile: bool = True)[source]¶
Bases:
object
Wikidata access
- getCities(limit=1000000)[source]¶
get all human settlements as list of dict with duplicates for label, region, country …
- static getCoordinateComponents(coordinate: str) -> (float, float)[source]¶
Converts the wikidata coordinate representation into its subcomponents longitude and latitude. Example: ‘Point(-118.25 35.05694444)’ results in (‘-118.25’, ‘35.05694444’)
- Args:
coordinate: coordinate value in the format as returned by wikidata queries
- Returns:
Returns the longitude and latitude of the given coordinate as separate values
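Parsing such a point literal can be sketched with a regular expression. The function name `coordinate_components` is hypothetical, and both returning floats (as the signature suggests) and raising `ValueError` on malformed input are assumptions; the library's behaviour in those cases may differ.

```python
import re

# Matches e.g. "Point(-118.25 35.05694444)": longitude first, then latitude.
POINT = re.compile(r"Point\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)")


def coordinate_components(coordinate: str):
    """Return (longitude, latitude) as floats for a Wikidata point literal."""
    match = POINT.fullmatch(coordinate.strip())
    if match is None:
        raise ValueError(f"not a point literal: {coordinate!r}")
    lon, lat = (float(value) for value in match.groups())
    return lon, lat
```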
- static getValuesClause(varName: str, values, wikidataEntities: bool = True)[source]¶
generates the SPARQL value clause for the given variable name containing the given values
- Args:
varName: variable name for the ValuesClause
values: values for the clause
wikidataEntities(bool): if true the wikidata prefix is added to the values, otherwise it is expected that the given values are proper IRIs
- Returns:
str
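A SPARQL VALUES clause of this kind could be assembled as in the following sketch. The exact whitespace and formatting produced by `getValuesClause` are not documented here, so the output shape is an assumption, as is the function name `values_clause`. The `wd:` prefix is the usual Wikidata entity prefix.

```python
def values_clause(var_name: str, values, wikidata_entities: bool = True) -> str:
    """Build a SPARQL VALUES clause binding var_name to the given values."""
    if isinstance(values, str):
        values = [values]  # accept a single value as well as a list
    prefix = "wd:" if wikidata_entities else ""
    items = " ".join(f"{prefix}{value}" for value in values)
    return f"VALUES ?{var_name} {{ {items} }}"
```

For the variable `city` and the ids Q64 and Q1055 this yields `VALUES ?city { wd:Q64 wd:Q1055 }`.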
- static getWikidataId(wikidataURL: str)[source]¶
Extracts the wikidata id from the given wikidata URL
- Args:
wikidataURL: wikidata URL the id should be extracted from
- Returns:
The wikidata id if present in the given wikidata URL otherwise None
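Extracting the id can be sketched with a regular expression that looks for a trailing Q-number. The pattern and the function name `wikidata_id` are assumptions; the library may accept other URL shapes.

```python
import re


def wikidata_id(wikidata_url: str):
    """Return the trailing Q-id of a Wikidata URL, or None if there is none."""
    match = re.search(r"Q\d+$", wikidata_url.strip())
    return match.group(0) if match else None
```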
Module contents¶
main geograpy 3 module
- geograpy.get_geoPlace_context(url=None, text=None, debug=False)[source]¶
Get a place context for a given text with information about country, region, city and other based on NLTK Named Entities having the Geographic(GPE) label.
- Args:
url(String): the url to read text from (if any)
text(String): the text to analyze
debug(boolean): if True show debug information
- Returns:
- places:
PlaceContext: the place context
- geograpy.get_place_context(url=None, text=None, labels=['GPE', 'GSP', 'PERSON', 'ORGANIZATION'], debug=False)[source]¶
Get a place context for a given text with information about country, region, city and other based on NLTK Named Entities in the label set Geographic(GPE), Person(PERSON) and Organization(ORGANIZATION).
- Args:
url(String): the url to read text from (if any)
text(String): the text to analyze
labels(list): the labels to filter
debug(boolean): if True show debug information
- Returns:
- pc:
PlaceContext: the place context
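A typical call might look like the sketch below. The helper names `summarize` and `places_in_text` are hypothetical. The attribute names `countries`, `regions` and `cities` on the returned place context are an assumption not confirmed by this page, which is why they are read defensively with `getattr`. The import is lazy because the call presumably needs geograpy3 with its NLTK data and location database available.

```python
def summarize(place_context) -> dict:
    """Collect the place lists from a place context into a plain dict."""
    # Assumption: the context exposes countries, regions and cities lists.
    return {
        name: list(getattr(place_context, name, None) or [])
        for name in ("countries", "regions", "cities")
    }


def places_in_text(text: str) -> dict:
    """Extract places from free text (hypothetical convenience wrapper)."""
    # Imported lazily: requires geograpy3 and, presumably, its NLTK data and database.
    import geograpy

    return summarize(geograpy.get_place_context(text=text))
```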