geograpy package

Submodules

geograpy.extraction module

class geograpy.extraction.Extractor(text=None, url=None, debug=False)[source]

Bases: object

Extract geo context for text or from url

find_entities(labels=['GPE', 'GSP', 'PERSON', 'ORGANIZATION'])[source]

Find entities with the given labels, set self.places and return it.

Args:

labels: the labels to filter

Returns:
list:

List of places

find_geoEntities()[source]

Find geographic entities

Returns:
list:

List of places

set_text()[source]

Setter for text

split(delimiter=',')[source]

simpler regular expression splitter with no entity check

hat tip: https://stackoverflow.com/a/1059601/1497139
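A minimal sketch of a delimiter-based splitter in the spirit of split(), assuming that surrounding whitespace and empty parts should be dropped. The function name split_places is hypothetical and this is not necessarily how Extractor.split is implemented.

```python
import re


def split_places(text: str, delimiter: str = ",") -> list[str]:
    """Split text on the delimiter, dropping surrounding whitespace and empty parts."""
    # re.escape keeps delimiters such as "." or "|" from being read as regex syntax
    pattern = r"\s*" + re.escape(delimiter) + r"\s*"
    return [part for part in re.split(pattern, text.strip()) if part]


print(split_places("Vienna, Austria"))
```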

geograpy.geograpy_nltk module

geograpy.geograpy_nltk.main()[source]

geograpy.labels module

Created on 2020-09-10

@author: wf

class geograpy.labels.Labels[source]

Bases: object

NLTK labels

default = ['GPE', 'GSP', 'PERSON', 'ORGANIZATION']
geo = ['GPE', 'GSP']

geograpy.locator module

The locator module provides detailed city information, including the region and country of a city, from a location string.

Examples of location strings are:

Amsterdam, Netherlands
Vienna, Austria
Vienna, IL
Paris - Texas
Paris TX

The locator looks up the cities and tries to disambiguate the result based on the country or region information found.

The results in string representation are:

Amsterdam (NH(North Holland) - NL(Netherlands))
Vienna (9(Vienna) - AT(Austria))
Vienna (IL(Illinois) - US(United States))
Paris (TX(Texas) - US(United States))
Paris (TX(Texas) - US(United States))

Each city returned has a city.region and city.country attribute with the details of the city.

Created on 2020-09-18

@author: wf

class geograpy.locator.City(**kwargs)[source]

Bases: Location

a single city as an object

property country
static fromCityLookup(cityLookupRecord: dict)[source]

create a city from a cityLookupRecord, setting City, Region and Country while at it

Args:

cityLookupRecord(dict): a map derived from the CityLookup view

classmethod getSamples()[source]
property region
setValue(name, record)[source]

set the field with the given name to the corresponding entry of the given record dict, or to None if there is no such entry

Args:

name(string): the name of the field
record(dict): the dict to get the value from

class geograpy.locator.CityManager(name: str = 'CityManager', config: StorageConfig | None = None, debug=False)[source]

Bases: LocationManager

a list of cities

classmethod getJsonFiles(config: StorageConfig) list[source]

get the list of the json files that have my data

Return:

list: a list of json file names

class geograpy.locator.Country(lookupSource='sqlDB', **kwargs)[source]

Bases: Location

a country

static fromCountryLookup(countryLookupRecord: dict)[source]

create a country from a countryLookupRecord

Args:

countryLookupRecord(dict): a map derived from the CityLookup view

classmethod getSamples()[source]
class geograpy.locator.CountryManager(name: str = 'CountryManager', config: StorageConfig | None = None, debug=False)[source]

Bases: LocationManager

a list of countries

classmethod fromErdem()[source]

get country list provided by Erdem Ozkol https://github.com/erdem

class geograpy.locator.Earth[source]

Bases: object

radius = 6371.0
class geograpy.locator.Location(**kwargs)[source]

Bases: JSONAble

Represents a Location

balltreeQueryResultToLocationManager(distances, indices, lookupListOfLocations)[source]

convert the given ballTree Query Result to a LocationManager

Args:

distances(list): array of distances
indices(list): array of indices
lookupListOfLocations(list): a list of valid locations to use for lookup

Return:

list: a list of result Location/distance tuples

distance(other) float[source]

calculate the distance to another Location

Args:

other(Location): the other location

Returns:

the haversine distance in km

classmethod fromRecord(regionRecord: dict)[source]

create a location from a dict record

Args:

regionRecord(dict): the records as returned from a Query

Returns:

Region: the corresponding region information

getLocationsWithinRadius(lookupLocationManager, radiusKm: float)[source]

Gives the locations within the given radius of me, looked up in the given lookupLocationManager

Args:

lookupLocationManager(LocationManager): a LocationManager object to use for lookup
radiusKm(float): the radius in which to check (in km)

Returns:

list: a list of result Location/distance tuples

getNClosestLocations(lookupLocationManager, n: int)[source]

Gives a list of up to n locations which have the shortest distance to me, as calculated from the given lookupLocationManager

Args:

lookupLocationManager(LocationManager): a LocationManager object to use for lookup
n(int): the maximum number of closest locations to return

Returns:

list: a list of result Location/distance tuples
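The reference to a sklearn.neighbors.BallTree in getBallTuple suggests that these lookups are tree based. As an illustration only, the shape of the Location/distance tuple results for both the n-closest and the within-radius query can be sketched with a brute-force scan. The names n_closest, within_radius and line_distance are hypothetical, and the scan is a stand-in for a BallTree query, not the library's method.

```python
import heapq
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def n_closest(origin: T, candidates: Iterable[T], n: int,
              distance: Callable[[T, T], float]) -> list[tuple[T, float]]:
    """Return up to n (candidate, distance) tuples, nearest first."""
    scored = ((c, distance(origin, c)) for c in candidates)
    return heapq.nsmallest(n, scored, key=lambda pair: pair[1])


def within_radius(origin: T, candidates: Iterable[T], radius_km: float,
                  distance: Callable[[T, T], float]) -> list[tuple[T, float]]:
    """Return (candidate, distance) tuples no farther away than radius_km, nearest first."""
    scored = ((c, distance(origin, c)) for c in candidates)
    return sorted((p for p in scored if p[1] <= radius_km), key=lambda p: p[1])


# toy data: points on a line, with the absolute difference as distance
points = [12.0, 3.0, 7.5, 40.0]


def line_distance(a: float, b: float) -> float:
    return abs(a - b)


print(n_closest(5.0, points, 2, line_distance))
print(within_radius(5.0, points, 10.0, line_distance))
```

A brute-force scan costs one distance computation per candidate for every query, which is the overhead a spatial index such as a BallTree is meant to avoid on large location lists.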

classmethod getSamples()[source]
static haversine(lon1, lat1, lon2, lat2)[source]

Calculate the great circle distance between two points on the earth (specified in decimal degrees)
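A minimal sketch of the haversine formula with the same argument order (lon1, lat1, lon2, lat2), assuming the Earth.radius value of 6371.0 km listed above. The function name haversine_km is hypothetical and the library's implementation may differ in detail.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumption: same value as Earth.radius


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great circle distance in km between two points given in decimal degrees."""
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# one degree of latitude along a meridian
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 2))
```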

isKnownAs(name) bool[source]

Checks if this location is known under the given name

Args:

name(str): name the location should be checked against

Returns:

True if the given name is either the name of the location or present in the labels of the location

static mappedDict(record, keyMapList: list)[source]
static partialDict(record, clazz, keys=None)[source]
class geograpy.locator.LocationContext(countryManager: CountryManager, regionManager: RegionManager, cityManager: CityManager, config: StorageConfig)[source]

Bases: object

Holds LocationManagers of all hierarchy levels and provides methods to traverse through the levels

property cities: list
property countries: list
db_filename = 'locations.db'
classmethod fromCache(config: StorageConfig | None = None, forceUpdate: bool = False)[source]

Initializes a LocationContext from the cache if it exists, otherwise initializes the cache

Args:

config(StorageConfig): configuration of the cache; if None the default config is used
forceUpdate(bool): if True an existing cache will be overwritten

static getDefaultConfig() StorageConfig[source]

Returns default StorageConfig

interlinkLocations(warnOnDuplicates: bool = True, profile=True)[source]

Interlinks locations by adding the hierarchy references to the locations

Args:

warnOnDuplicates(bool): if there are duplicates warn

load(forceUpdate: bool = False, warnOnDuplicates: bool = False)[source]

load my data

locateLocation(*locations, verbose: bool = False)[source]

Get possible locations for the given location names. The current prioritization of the results is city (ordered by population) → region → country.

ToDo: Extend the ranking of the results, e.g. matching of multiple location parts increases the ranking

Args:

*locations: the location names to look up
verbose(bool): If True combinations of location names are used to improve the search results. (Increases lookup time)

Returns:

property regions: list
class geograpy.locator.LocationManager(name: str, entityName: str, entityPluralName: str, listName: str | None = None, tableName: str | None = None, clazz=None, primaryKey: str | None = None, config: StorageConfig | None = None, handleInvalidListTypes=True, filterInvalidListTypes=False, debug=False)[source]

Bases: EntityManager

a list of locations

add(location)[source]

add the given location to me

Args:

location(object): the location to be added and put in my hash map

classmethod downloadBackupFileFromGitHub(fileName: str, targetDirectory: str | None = None, force: bool = False)[source]

download the given fileName from the github data directory

Args:

fileName(str): the filename to download
targetDirectory(str): download the file to this directory
force(bool): force the overwriting of the existing file

Return:

str: the local file

fromCache(force=False, getListOfDicts=None, sampleRecordCount=-1)[source]

get me from the cache

static getBackupDirectory()[source]
getBallTuple(cache: bool = True)[source]

get the BallTuple=BallTree,validList of this location list

Args:

cache(bool): if True calculate and use a cached version otherwise recalculate on every call of this function

Returns:

BallTree: a sklearn.neighbors.BallTree for the given list of locations
list: the valid list of locations

getByName(*names: str)[source]

Get locations matching the given names

Args:

*names(str): names of the locations

Returns:

Returns locations that match the given name

getLocationByID(wikidataID: str)[source]

Returns the location object that corresponds to the given wikidataID

Args:

wikidataID: wikidataid of the location that should be returned

Returns:

Location object

getLocationByIsoCode(isoCode: str)[source]

Get possible locations matching the given isoCode

Args:

isoCode: isoCode of possible Locations

Returns:

List of wikidata ids of locations matching the given isoCode

getLocationsByWikidataId(*wikidataId: str)[source]

Returns Location objects for the given wikidataids

Args:

*wikidataId(str): wikidataIds of the locations that should be returned

Returns:

Location objects matching the given wikidataids

class geograpy.locator.Locator(db_file=None, correctMisspelling=False, storageConfig: StorageConfig | None = None, debug=False)[source]

Bases: object

location handling

cities_for_name(cityName)[source]

find cities with the given cityName

Args:

cityName(string): the potential name of a city

Returns:

a list of city records

correct_country_misspelling(name)[source]

correct potential misspellings

Args:

name(string): the name of the country, potentially misspelled

Return:

string: the corrected name, or the name unchanged

createViews(sqlDB)[source]
db_has_data()[source]

check whether the database has data / is populated

Returns:

boolean: True if the cities table exists and has more than one record

db_recordCount(tableList, tableName)[source]

count the number of records for the given tableName

Args:

tableList(list): the list of tables to check
tableName(str): the name of the table to check

Returns:

int: the number of records found for the table

disambiguate(country, regions, cities, byPopulation=True)[source]

try determining country, regions and city from the potential choices

Args:

country(Country): a matching country found
regions(list): a list of matching Regions found
cities(list): a list of matching cities found

Return:

City: the found city or None

downloadDB(forceUpdate: bool = False)[source]

download my database

Args:

forceUpdate(bool): force the overwriting of the existent file

getAliases()[source]

get the aliases hashTable

getCountry(name)[source]

get the country for the given name

Args:

name(string): the name of the country to lookup

Returns:

country: the country if one was found or None if not

static getInstance(correctMisspelling=False, debug=False)[source]

get the singleton instance of the Locator. If parameters are changed on further calls the initial parameters will still be in effect since the original instance will be returned!

Args:

correctMisspelling(bool): if True correct typical misspellings
debug(bool): if True show debug information
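The consequence of the singleton behaviour described for getInstance can be illustrated with a small stand-in class. LocatorSketch and its _instance slot are hypothetical, and the behaviour shown for resetInstance is an assumption, since its documentation entry carries no description.

```python
class LocatorSketch:
    """Hypothetical stand-in for a class with a getInstance-style singleton."""

    _instance = None  # shared slot holding the single instance

    def __init__(self, correctMisspelling=False, debug=False):
        self.correctMisspelling = correctMisspelling
        self.debug = debug

    @staticmethod
    def getInstance(correctMisspelling=False, debug=False):
        # only the first call constructs; arguments of later calls are ignored
        if LocatorSketch._instance is None:
            LocatorSketch._instance = LocatorSketch(correctMisspelling, debug)
        return LocatorSketch._instance

    @staticmethod
    def resetInstance():
        # assumption: drop the cached instance so the next call builds a new one
        LocatorSketch._instance = None


first = LocatorSketch.getInstance(correctMisspelling=True)
second = LocatorSketch.getInstance(correctMisspelling=False)
print(first is second, second.correctMisspelling)
```

In this sketch the second call returns the first instance unchanged, so a caller that needs different settings has to reset the instance first.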

getView()[source]

get the view to be used

Returns:

str: the SQL view to be used for CityLookups e.g. CityLookup

static isISO(s)[source]

check if the given string is an ISO code (ISO 3166-2 code) see https://www.wikidata.org/wiki/Property:P300

Returns:

bool: True if the string might be an ISO Code as per a regexp check
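A regular expression check of this kind could look as follows. The pattern is an assumption based on the general shape of ISO 3166-2 codes (two-letter country code, hyphen, one to three letters or digits); the library's own expression may be more permissive, and the function name looks_like_iso_3166_2 is hypothetical.

```python
import re

# assumed pattern: two-letter country code, hyphen, one to three letters or digits
ISO_3166_2 = re.compile(r"[A-Z]{2}-[A-Z0-9]{1,3}")


def looks_like_iso_3166_2(s: str) -> bool:
    """True if the whole string has the shape of an ISO 3166-2 code."""
    return ISO_3166_2.fullmatch(s) is not None


for candidate in ("US-TX", "AT-9", "Texas"):
    print(candidate, looks_like_iso_3166_2(candidate))
```

As with isISO, a positive result only means the string might be an ISO code; the check does not confirm that the code is actually assigned.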

is_a_country(name)[source]

check if the given string name is a country

Args:

name(string): the string to check

Returns:

True: if pycountry thinks the string is a country

loadDB()[source]

loads the database from cache and sets it as sqlDB property

locateCity(places: list)[source]

locate a city, region, country combination based on the given word token information

Args:

places(list): a list of places derived by splitting a locality e.g. “San Francisco, CA” leads to “San Francisco”, “CA”

Returns:

City: a city with country and region details

locator = None
normalizePlaces(places: list)[source]

normalize places

Args:

places(list): a list of places

Return:

list: stripped and aliased list of places
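The "stripped and aliased" result can be sketched as below, assuming that aliasing means replacing a place name with a canonical name from a lookup table. The function normalize_places and the alias table are hypothetical and do not reflect the library's actual alias data.

```python
def normalize_places(places: list[str], aliases: dict[str, str]) -> list[str]:
    """Strip whitespace from each place and replace known aliases."""
    normalized = []
    for place in places:
        place = place.strip()
        # fall back to the stripped name when no alias is known
        normalized.append(aliases.get(place, place))
    return normalized


aliases = {"USA": "United States"}  # hypothetical alias table
tokens = "San Francisco, CA".split(",")
print(normalize_places(tokens, aliases))
```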

places_by_name(placeName, columnName)[source]

get places by name and column

Args:

placeName(string): the name of the place
columnName(string): the column to look at

populate_Cities(sqlDB)[source]

populate the given sqlDB with the Wikidata Cities

Args:

sqlDB(SQLDB): target SQL database

populate_Countries(sqlDB)[source]

populate database with countries from wikiData

Args:

sqlDB(SQLDB): target SQL database

populate_Regions(sqlDB)[source]

populate database with regions from wikiData

Args:

sqlDB(SQLDB): target SQL database

populate_Version(sqlDB)[source]

populate the version table

Args:

sqlDB(SQLDB): target SQL database

populate_db(force=False)[source]

populate the cities SQL database which caches the information from the GeoLite2-City-Locations.csv file

Args:

force(bool): if True force a recreation of the database

readCSV(fileName: str)[source]

read the given CSV file

Args:

fileName(str): the filename to read

recreateDatabase()[source]

recreate my lookup database

regions_for_name(region_name)[source]

get the regions for the given region_name (which might be an ISO code)

Args:

region_name(string): region name

Returns:

list: the list of cities for this region

static resetInstance()[source]
class geograpy.locator.LocatorCmd[source]

Bases: object

command line handling for locator

cmd_main(argv: None) int[source]

main program as an instance

Args:

argv(list): list of command line arguments

Returns:

int: exit code - 0 if all went well, 1 for keyboard interrupt and 2 for exceptions
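The exit code convention described here (0, 1, 2) can be sketched as a small wrapper. The function run_with_exit_code and the action callable are hypothetical; the sketch only illustrates the mapping, not the library's command line handling.

```python
import sys


def run_with_exit_code(action) -> int:
    """Run the given callable and map its outcome to an exit code."""
    try:
        action()
        return 0  # all went well
    except KeyboardInterrupt:
        return 1  # interrupted by the user
    except Exception as ex:
        print(f"error: {ex}", file=sys.stderr)
        return 2  # any other exception


print(run_with_exit_code(lambda: None))
```

KeyboardInterrupt does not derive from Exception, so the two except clauses do not overlap.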

cmd_parse(argv: list | None = None)[source]

parse the argument lists and prepare

Args:

argv(list): list of command line arguments

getArgParser(description: str, version_msg: str) ArgumentParser[source]

Setup command line argument parser

Args:

description(str): the description
version_msg(str): the version message

Returns:

ArgumentParser: the argument parser

handle_args()[source]

handle the arguments

class geograpy.locator.Region(**kwargs)[source]

Bases: Location

a Region (Subdivision)

property country
static fromRegionLookup(regionLookupRecord: dict)[source]

create a region from a regionLookupRecord, setting Region and Country while at it

Args:

regionLookupRecord(dict): a map derived from the CityLookup view

classmethod getSamples()[source]
class geograpy.locator.RegionManager(name: str = 'RegionManager', config: StorageConfig | None = None, debug=False)[source]

Bases: LocationManager

a list of regions

geograpy.locator.main(argv: list | None = None)[source]

main call

geograpy.nominatim module

Created on 2021-12-27

@author: wf

class geograpy.nominatim.NominatimWrapper(cacheDir: str | None = None, user_agent: str = 'ConferenceCorpus')[source]

Bases: object

Nominatim Wrapper to hide technical details of Nominatim interface

lookupWikiDataId(locationText: str)[source]

lookup the Wikidata Identifier for the given locationText (if any)

Args:

locationText(str): the location text to search for

Return:

the wikidata Q identifier most fitting the given location text

geograpy.places module

class geograpy.places.PlaceContext(place_names: list, setAll: bool = True, correctMisspelling: bool = False)[source]

Bases: Locator

Adds context information to a place name

getRegions(countryName: str) list[source]

get a list of regions for the given countryName

Args:

countryName(str): the countryName to check

get_region_names(countryName: str) list[source]

get region names for the given country

Args:

countryName(str): the name of the country

setAll()[source]

Set all context information

set_cities()[source]

set the cities information

set_countries()[source]

get the country information from my places

set_other()[source]
set_regions()[source]

get the region information from my places (limited to the already identified countries)

geograpy.utils module

class geograpy.utils.Download[source]

Bases: object

Utility functions for downloading data

static downloadBackupFile(url: str, fileName: str, targetDirectory: str, force: bool = False)[source]

Downloads the gzip file from the given url and extracts the file corresponding to the given fileName.

Args:

url(str): url linking to a downloadable gzip file
fileName(str): name of the file that should be extracted from the gzip file
targetDirectory(str): download the file to this directory
force(bool): True if the download should be forced

Returns:

Name of the extracted file with path to the backup directory

static getFileContent(path: str)[source]
static getURLContent(url: str)[source]
static needsDownload(filePath: str, force: bool = False) bool[source]

check if a download of the given filePath is necessary, that is, the file does not exist, has a size of zero, or the download should be forced

Args:

filePath(str): the path of the file to be checked
force(bool): True if the result should be forced to True

Return:

bool: True if a download for this file is needed
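The three conditions named in the description translate directly into a few lines of standard library code. This is a sketch under that reading, with the hypothetical name needs_download, not the library's implementation.

```python
import os


def needs_download(file_path: str, force: bool = False) -> bool:
    """True if the file is missing, empty, or the download is forced."""
    if force or not os.path.isfile(file_path):
        return True
    return os.path.getsize(file_path) == 0


print(needs_download("no-such-file.db"))
```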

class geograpy.utils.Profiler(msg, profile=True)[source]

Bases: object

simple profiler

time(extraMsg='')[source]

time the action and print if profile is active
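A simple profiler with this interface might be sketched as follows. ProfilerSketch, the message format and the returned elapsed time are assumptions made for illustration.

```python
import time


class ProfilerSketch:
    """Hypothetical stand-in: measures the time since construction."""

    def __init__(self, msg, profile=True):
        self.msg = msg
        self.profile = profile
        self.start = time.perf_counter()

    def time(self, extraMsg=""):
        """Compute the elapsed seconds and print them if profiling is active."""
        elapsed = time.perf_counter() - self.start
        if self.profile:
            print(f"{self.msg}{extraMsg} took {elapsed:.3f} s")
        return elapsed


profiler = ProfilerSketch("summing numbers")
total = sum(range(100_000))
profiler.time(" (100000 items)")
```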

geograpy.utils.fuzzy_match(s1, s2, max_dist=0.8)[source]

Fuzzy match the given two strings with the given maximum distance, using the jellyfish jaro_winkler_similarity based on https://en.wikipedia.org/wiki/Jaro-Winkler_distance

Args:

s1(string): first string
s2(string): second string
max_dist(float): the distance - default: 0.8

Returns:

True if the match is greater than or equal to max_dist, otherwise False
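The threshold logic can be illustrated without third-party packages. The library relies on jellyfish's jaro_winkler_similarity; the sketch below swaps in difflib.SequenceMatcher.ratio() from the standard library, which is a different similarity measure, so scores and borderline results will differ. The name fuzzy_match_sketch is hypothetical.

```python
from difflib import SequenceMatcher


def fuzzy_match_sketch(s1: str, s2: str, threshold: float = 0.8) -> bool:
    """True if the similarity ratio of s1 and s2 is at least the threshold."""
    # ratio() is 1.0 for identical strings and 0.0 for strings with nothing in common
    return SequenceMatcher(None, s1, s2).ratio() >= threshold


print(fuzzy_match_sketch("Germany", "Germny"))
print(fuzzy_match_sketch("Germany", "Brazil"))
```

Note that the parameter acts as a minimum similarity: higher values make the match stricter.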

geograpy.utils.remove_non_ascii(s)[source]

Remove non-ASCII chars from the given string

Args:

s(string): the string to remove chars from

Returns:

string: the result string with non-ASCII chars removed

Hat tip: http://stackoverflow.com/a/1342373/2367526
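One common way to drop non-ASCII characters is to keep only code points below 128. The sketch below follows that approach under the hypothetical name strip_non_ascii; it may not match the library's code line for line.

```python
def strip_non_ascii(s: str) -> str:
    """Keep only characters whose code point is in the ASCII range."""
    return "".join(ch for ch in s if ord(ch) < 128)


print(strip_non_ascii("Zürich"))
```

Characters are removed rather than transliterated, so "ü" is dropped and not replaced by "u".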

geograpy.version module

Created on 2024-03-29

@author: wf

class geograpy.version.Version[source]

Bases: object

Version handling for the geograpy3 project.

authors = 'Somnath Rakshit, Wolfgang Fahl, Tim Holzheim'
chat_url = 'https://github.com/somnathrakshit/geograpy3/discussions'
cm_url = 'https://github.com/somnathrakshit/geograpy3'
date = '2023-09-10'
description = 'Extract countries, regions, and cities from a URL or text'
doc_url = 'https://geograpy3.readthedocs.io'
license = 'Copyright 2023-2024 contributors. All rights reserved.\n\n    Licensed under the Apache License 2.0\n    http://www.apache.org/licenses/LICENSE-2.0\n\n    Distributed on an "AS IS" basis without warranties\n    or conditions of any kind, either express or implied.'
longDescription = 'geograpy3 version 0.3.0\nExtract countries, regions, and cities from a URL or text\n\n    Created by Somnath Rakshit, Wolfgang Fahl, Tim Holzheim on 2023-09-10 last updated 2024-03-29.\n    For more information, visit https://geograpy3.readthedocs.io.'
name = 'geograpy3'
updated = '2024-03-29'
version = '0.3.0'

geograpy.wikidata module

Created on 2020-09-23

@author: wf

class geograpy.wikidata.Wikidata(endpoint='https://query.wikidata.org/sparql', profile: bool = True)[source]

Bases: object

Wikidata access

getCities(limit=1000000)[source]

get all human settlements as list of dict with duplicates for label, region, country …

getCitiesForRegion(regionId, msg)[source]

get the cities for the given Region

getCityStates(limit=None)[source]

get city states from Wikidata

try query

static getCoordinateComponents(coordinate: str) -> (<class 'float'>, <class 'float'>)[source]

Converts the wikidata coordinate representation into its subcomponents longitude and latitude

Example: ‘Point(-118.25 35.05694444)’ results in (‘-118.25’, ‘35.05694444’)

Args:

coordinate: coordinate value in the format as returned by wikidata queries

Returns:

Returns the longitude and latitude of the given coordinate as separate values
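Parsing the Point(longitude latitude) form can be sketched with a regular expression. The sketch returns floats, following the return annotation in the signature, and returns (None, None) for input it cannot parse, which is an assumption. The name coordinate_components is hypothetical.

```python
import re

# plain decimal numbers only; other number formats are not handled in this sketch
POINT = re.compile(r"Point\((-?\d+(?:\.\d+)?) (-?\d+(?:\.\d+)?)\)")


def coordinate_components(coordinate: str):
    """Split 'Point(lon lat)' into a (longitude, latitude) tuple of floats."""
    match = POINT.fullmatch(coordinate.strip())
    if match is None:
        return None, None
    return float(match.group(1)), float(match.group(2))


print(coordinate_components("Point(-118.25 35.05694444)"))
```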

getCountries(limit=None)[source]

get a list of countries

try query

getRegions(limit=None)[source]

get Regions from Wikidata

try query

static getValuesClause(varName: str, values, wikidataEntities: bool = True)[source]

generates the SPARQL value clause for the given variable name containing the given values

Args:

varName: variable name for the ValuesClause
values: values for the clause
wikidataEntities(bool): if True the wikidata prefix is added to the values, otherwise it is expected that the given values are proper IRIs

Returns:

str
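A SPARQL VALUES clause for a single variable has the form VALUES ?var { term1 term2 }. The sketch below builds such a string, assuming that the "wikidata prefix" is the usual wd: prefix and that IRIs are passed through unchanged. The name values_clause and the exact whitespace are assumptions.

```python
def values_clause(var_name: str, values, wikidata_entities: bool = True) -> str:
    """Build a SPARQL VALUES clause for one variable."""
    if isinstance(values, str):
        values = [values]  # accept a single value as well as a list
    if wikidata_entities:
        terms = [f"wd:{value}" for value in values]
    else:
        terms = list(values)  # assumed to be proper IRIs already
    joined = " ".join(terms)
    return f"VALUES ?{var_name} {{ {joined} }}"


print(values_clause("city", ["Q64", "Q1741"]))
```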

static getWikidataId(wikidataURL: str)[source]

Extracts the wikidata id from the given wikidata URL

Args:

wikidataURL: wikidata URL the id should be extracted from

Returns:

The wikidata id if present in the given wikidata URL otherwise None
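Extracting the identifier can be sketched with a regular expression, assuming that the id is the trailing Q-number of the URL path. The name wikidata_id is hypothetical.

```python
import re

# a Q identifier at the end of the URL, directly after a slash
Q_ID = re.compile(r"(?:^|/)(Q[1-9]\d*)$")


def wikidata_id(wikidata_url: str):
    """Return the trailing Q identifier of the URL, or None if there is none."""
    match = Q_ID.search(wikidata_url.strip())
    return match.group(1) if match else None


print(wikidata_id("https://www.wikidata.org/wiki/Q64"))
```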

query(msg, queryString: str, limit=None) list[source]

get the query result

Args:

msg(str): the profile message to display
queryString(str): the query to execute

Return:

list: the list of dicts with the result

store2DB(lod, tableName: str, primaryKey: str | None = None, sqlDB=None)[source]

store the given list of dicts to the database

Args:

lod(list): the list of dicts
tableName(str): the table name to use
primaryKey(str): primary key (if any)
sqlDB(SQLDB): target SQL database

Module contents

main geograpy 3 module

geograpy.get_geoPlace_context(url=None, text=None, debug=False)[source]

Get a place context for a given text with information about country, region, city and other based on NLTK Named Entities having the Geographic(GPE) label.

Args:

url(String): the url to read text from (if any)
text(String): the text to analyze
debug(boolean): if True show debug information

Returns:
places:

PlaceContext: the place context

geograpy.get_place_context(url=None, text=None, labels=['GPE', 'GSP', 'PERSON', 'ORGANIZATION'], debug=False)[source]

Get a place context for a given text with information about country, region, city and other based on NLTK Named Entities in the label set Geographic(GPE), Person(PERSON) and Organization(ORGANIZATION).

Args:

url(String): the url to read text from (if any)
text(String): the text to analyze
labels(list): the labels to filter
debug(boolean): if True show debug information

Returns:
pc:

PlaceContext: the place context

geograpy.locateCity(location, correctMisspelling=False, debug=False)[source]

locate the given location string

Args:

location(string): the description of the location

Returns:

Locator: the location