https://github.com/astridnielsen-lab/tellmewho



COMS E6111 Project2
============================================================================
*Group members

Di Li - dl2943
Bingjie Sun - bs2888
============================================================================
*List of files

main.py - main code for 3 modes of input
matching.py - configuration that maps Freebase API properties to our program's
data structure
printable.py - print customized table
question.py - MQLquery for part 2
README - readme file
transcript.txt - Sample run results (please view this full screen for proper table views)

============================================================================
*How to run?

- Dependency libraries (Python):
urllib
json
collections (OrderedDict)

- To run the code:

Basic mode : If you want to input a single query as a quoted string "<query>"
-k <API key> -q <query> -t [infobox|question]

File input mode : If you have an input file with one query on each line
-k <API key> -f <file> -t [infobox|question]

Interactive mode : If you want to keep entering queries in the terminal until you are done
-k <API key>

- GoogleAPI Search Account Key

AIzaSyB-LJI6QQ9P_D1WyKuCLT6yABME20lrYwM

- requests per second per user

Same as default, 10

============================================================================
*Internal Project Design

main.py accepts five options (k, q, t, f, h). The check_args() function
extracts the input mode and fields, and the usage() function prints
instructions on how to use the system.
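
The option handling described above can be sketched with the standard getopt
module. This is only an illustration under assumptions: the README does not
show the real check_args(), so the return value (a mode name plus a dict of
fields) and the mode names "basic", "file", "interactive" and "help" are
hypothetical. The sketch is written for Python 3.

```python
import getopt

VALID_TARGETS = ("infobox", "question")


def check_args(argv):
    """Hypothetical sketch: map -k/-q/-t/-f/-h onto one of the input modes."""
    opts, _ = getopt.getopt(argv, "k:q:t:f:h")
    # getopt yields pairs such as ("-k", "KEY"); drop the dash for lookups.
    fields = {flag.lstrip("-"): value for flag, value in opts}
    if "h" in fields or "k" not in fields:
        return "help", fields
    if "q" in fields and fields.get("t") in VALID_TARGETS:
        return "basic", fields
    if "f" in fields and fields.get("t") in VALID_TARGETS:
        return "file", fields
    if len(fields) == 1:
        # Only the API key was given, so fall through to interactive mode.
        return "interactive", fields
    return "help", fields


print(check_args(["-k", "KEY", "-q", "Bill Gates", "-t", "infobox"])[0])  # basic
print(check_args(["-k", "KEY"])[0])  # interactive
```

Any combination that does not match one of the three modes falls back to
"help", which is where a usage() message would be printed.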

For mode 1, we run either infobox.py or question.py depending on the
input parameter -t. If the user wants to query the infobox, the system passes
the API key and query to the run(api_k, query) function inside infobox.py.

For mode 2, similar to mode 1, we run either infobox.py or question.py depending
on the input parameter -t, once for each line of the input file.

For mode 3, we wrote an infinite loop that breaks only on KeyboardInterrupt
(Control+D on mac). In each iteration, we read a raw input line and check whether
it is a "Who created"/"who created" question. If it is, we call question.py;
otherwise we call infobox.py.
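
A minimal sketch of the mode 3 loop, assuming Python 3 (input() instead of
raw_input()). The names route(), interactive_loop() and canned_reader() are
hypothetical and not taken from main.py. In Python, Control+C raises
KeyboardInterrupt while Control+D at an empty prompt raises EOFError, so the
sketch stops on either one.

```python
def route(line):
    """Return which module should handle one line of interactive input."""
    if line.strip().startswith(("Who created", "who created")):
        return "question"
    return "infobox"


def interactive_loop(read_line=input):
    """Read queries until interrupted; return (module, query) pairs."""
    routed = []
    while True:
        try:
            line = read_line()
        except (KeyboardInterrupt, EOFError):
            break
        if line.strip():
            routed.append((route(line), line.strip()))
    return routed


def canned_reader(lines):
    """Stand-in for input() that raises EOFError once the lines run out."""
    pending = iter(lines)

    def read_line():
        try:
            return next(pending)
        except StopIteration:
            raise EOFError from None

    return read_line


# Demo with canned input instead of a real terminal.
print(interactive_loop(canned_reader(["Who created Microsoft?", "Bill Gates"])))
```

A real loop would call the question or infobox code immediately instead of
collecting the pairs; returning them here just keeps the sketch easy to check.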


How infobox.py runs -

First, the search(query) function is called. It queries the Freebase API and
returns a list of mids, which may cover more types than required. Two helper
functions handle this: valid_topic() keeps only the mids that fall into the
6 categories, and cleanup_type() resolves conflicting categories (e.g. Author
and League).
Using the accepted mids, the topic() function queries the Freebase API and gets
the data in Dict format.
Then we call assemble_infobox() to format the raw Dict data into our
own design - a Dict of lists, where each list holds either text values or
a nested layer of dicts. Each nested dict represents one row: its keys are the
sub-fields of the outer property and its values are that row's entries. We used
OrderedDict for this design so that the output always follows a fixed order.
For entities that do not contain all the properties, we put an empty list in
the corresponding position.
Finally, the printable() function is called to display the data.
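
The "Dict of lists" design can be illustrated with a small sketch. Everything
here is an assumption made for illustration: the response shape (a "property"
dict whose entries carry a "values" list, with compound values nesting another
"property" dict), the helper first_text(), the trimmed-down SAMPLE_MAP and the
placeholder data. The real assemble_infobox() in infobox.py may differ.

```python
from collections import OrderedDict

# Trimmed-down stand-in for the mapping kept in matching.py.
SAMPLE_MAP = OrderedDict([
    ("/type/object/name", "Name"),
    ("/people/person/date_of_birth", "Birthday"),
    ("/people/person/spouse_s", {
        "name": "Spouse(s)",
        "children": {
            "/people/marriage/spouse": "Spouse Name",
            "/people/marriage/from": "Marriage From",
        },
    }),
])


def first_text(prop):
    """Return the first 'text' of a property entry, or '' when it is absent."""
    values = (prop or {}).get("values", [])
    return values[0].get("text", "") if values else ""


def assemble_infobox(topic, mapping):
    """Build an OrderedDict of lists from an (assumed) topic response."""
    props = topic.get("property", {})
    infobox = OrderedDict()
    for path, label in mapping.items():
        values = props.get(path, {}).get("values", [])
        if isinstance(label, dict):
            # Compound property: one nested dict (row) per value.
            rows = []
            for value in values:
                nested = value.get("property", {})
                rows.append(OrderedDict(
                    (child_label, first_text(nested.get(child_path)))
                    for child_path, child_label in label["children"].items()
                ))
            infobox[label["name"]] = rows
        else:
            # Plain property: a list of text values, empty when missing.
            infobox[label] = [value.get("text", "") for value in values]
    return infobox


# Placeholder data, not a real API response.
topic = {"property": {
    "/type/object/name": {"values": [{"text": "Example Person"}]},
    "/people/person/spouse_s": {"values": [{"property": {
        "/people/marriage/spouse": {"values": [{"text": "Example Spouse"}]},
        "/people/marriage/from": {"values": [{"text": "2001-02-03"}]},
    }}]},
}}

for field, entries in assemble_infobox(topic, SAMPLE_MAP).items():
    print(field, entries)
```

The missing birthday ends up as an empty list under "Birthday", so every
infobox has the same keys in the same order regardless of what the API returns.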

How question.py runs -

First, the extractX() function is called to get the query word. We only
consider "Who created"/"who created" as valid input, and only query the API
with the two given types ('Author', 'BusinessPerson').
Then we call the MQLquery() function to query the API. Note that, like the
reference program, we provide table-like output for interactive mode and text
output for Basic and File input modes. This distinction is implemented simply
by calling the printable() function for mode 3, whereas for modes 1 and 2 we
just output lines of concatenated strings.
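
The question flow can be sketched as follows. The body of extractX() is a
guess at the behaviour described above, and build_queries() together with the
exact MQL query layout is an assumption; only the two types and their
properties come from the mapping in matching.py. The sketch builds the query
strings and sends nothing over the network.

```python
import json

# (Freebase type, property linking a creator to the created thing)
CREATOR_PROPERTIES = {
    "Author": ("/book/author", "/book/author/works_written"),
    "BusinessPerson": (
        "/organization/organization_founder",
        "/organization/organization_founder/organizations_founded",
    ),
}


def extractX(question):
    """Return X from 'Who created X?', or None for any other input."""
    text = question.strip()
    for prefix in ("Who created ", "who created "):
        if text.startswith(prefix):
            x = text[len(prefix):].rstrip("?").strip()
            return x or None
    return None


def build_queries(x):
    """Build one MQL query string per creator type (assumed layout)."""
    queries = {}
    for label, (type_id, prop) in CREATOR_PROPERTIES.items():
        queries[label] = json.dumps([{
            prop: [{"a:name": None, "name~=": x}],
            "id": None,
            "name": None,
            "type": type_id,
        }])
    return queries


x = extractX("Who created Microsoft?")
print(x)
for label, query in build_queries(x).items():
    print(label, query)
```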

How printable.py works -

This function is written purely from scratch to display well-formatted tables
with nested columns. To make the table wider, just change the whole parameter.
The function first checks whether header data was passed in as a parameter, so
that we can print Name+Categories for infobox and no such header for questions.
Then, for each key and value (a list) in the given Dict, we check the type of
list[0]. If it is a Dict, we divide the columns according to the nested fields.
If not, we simply print out the list values. All of these code paths wrap lines
automatically to prevent the table display from overflowing.
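
A rough sketch of the table logic described above, using the standard textwrap
module for the automatic line breaks. It is not the code in printable.py: the
function signature, the label_width and value_width parameters and the helpers
wrap_cell() and render_row() are hypothetical, the optional header is left out,
and the function returns the lines instead of printing them.

```python
import textwrap


def wrap_cell(text, width):
    """Wrap one cell to `width` columns; an empty cell still takes one line."""
    return textwrap.wrap(str(text), width) or [""]


def render_row(cells, widths):
    """Render one table row, padding every cell to its column width."""
    wrapped = [wrap_cell(cell, width) for cell, width in zip(cells, widths)]
    height = max(len(lines) for lines in wrapped)
    rows = []
    for i in range(height):
        parts = [
            (lines[i] if i < len(lines) else "").ljust(width)
            for lines, width in zip(wrapped, widths)
        ]
        rows.append("| " + " | ".join(parts) + " |")
    return rows


def printable(data, label_width=14, value_width=40):
    """Return table lines for a dict of lists (plain values or nested dicts)."""
    rule = "-" * (label_width + value_width + 7)
    lines = [rule]
    for key, values in data.items():
        if values and isinstance(values[0], dict):
            # Nested field: split the value column among the sub-fields and
            # let the last sub-column absorb the rounding remainder.
            names = list(values[0].keys())
            usable = value_width - 3 * (len(names) - 1)
            sub = usable // len(names)
            widths = [sub] * (len(names) - 1) + [usable - sub * (len(names) - 1)]
            lines += render_row([key] + names, [label_width] + widths)
            for row in values:
                cells = [""] + [row.get(name, "") for name in names]
                lines += render_row(cells, [label_width] + widths)
        else:
            # Plain field: one row per value, label only on the first row.
            for i, value in enumerate(values or [""]):
                label = key if i == 0 else ""
                lines += render_row([label, value], [label_width, value_width])
        lines.append(rule)
    return lines


sample = {
    "Name": ["Example Person"],
    "Spouse(s)": [{"Spouse Name": "Example Spouse", "Marriage From": "2001-02-03"}],
    "Description": ["A long placeholder description that has to wrap " * 2],
}
print("\n".join(printable(sample)))
```

Because every cell is wrapped and padded to its column width, all lines of the
table come out with the same length, which is what keeps long descriptions
from overflowing.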

============================================================================
*Additional Points

mapping list: Convert Freebase Properties to our data structure, from matching.py

- The mapping we used to filter out only six categories of entities:

from collections import OrderedDict

# Freebase type id -> one of the six entity categories we support
accepted_type_list = OrderedDict([
    ('/people/person', 'Person'),
    ('/people/deceased_person', 'Person'),
    ('/book/author', 'Author'),
    ('/film/actor', 'Actor'),
    ('/tv/tv_actor', 'Actor'),
    ('/organization/organization_founder', 'BusinessPerson'),
    ('/business/board_member', 'BusinessPerson'),
    ('/sports/sports_league', 'League'),
    ('/sports/sports_team', 'SportsTeam'),
    ('/sports/professional_sports_team', 'SportsTeam'),
])
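
As a sketch of how such a mapping can be used to filter types, the hypothetical
helper below (categories_for() is not a function from this project) turns the
list of Freebase type ids attached to a topic into the matching category labels.
It repeats a shortened copy of the mapping so that it runs on its own, and it
leaves out the conflict resolution done by cleanup_type(), whose rules are not
described in this README.

```python
from collections import OrderedDict

# Shortened copy of accepted_type_list, repeated so the sketch is runnable.
accepted_types = OrderedDict([
    ('/people/person', 'Person'),
    ('/people/deceased_person', 'Person'),
    ('/book/author', 'Author'),
    ('/film/actor', 'Actor'),
    ('/organization/organization_founder', 'BusinessPerson'),
    ('/sports/sports_league', 'League'),
    ('/sports/sports_team', 'SportsTeam'),
])


def categories_for(type_ids):
    """Return the accepted category labels for a topic, without duplicates."""
    labels = []
    for type_id in type_ids:
        label = accepted_types.get(type_id)
        if label and label not in labels:
            labels.append(label)
    return labels


print(categories_for(['/common/topic', '/people/person',
                      '/people/deceased_person', '/book/author']))
# ['Person', 'Author']
print(categories_for(['/common/topic']))
# []
```

A topic whose types produce an empty list falls outside the six categories and
would be skipped.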

- The mapping we used to match properties with our data structure (OrderedDict of lists)

information_map = OrderedDict([
    ('/people/person', OrderedDict([
        ('/type/object/name', 'Name'),
        ('/people/person/date_of_birth', 'Birthday'),
        ('/people/person/place_of_birth', 'Place of Birth'),
        ('/people/person/sibling_s', {
            'name': 'Siblings',
            'children': {
                '/people/sibling_relationship/sibling': 'Sibling',
            },
        }),
        ('/people/person/spouse_s', {
            'name': 'Spouse(s)',
            'children': {
                '/people/marriage/spouse': 'Spouse Name',
                '/people/marriage/from': 'Marriage From',
                '/people/marriage/to': 'Marriage To',
                '/people/marriage/location_of_ceremony': 'Ceremony Location',
            },
        }),
        ('/common/topic/description', 'Description'),
    ])),
    ('/people/deceased_person', OrderedDict([
        ('/people/deceased_person/date_of_death', 'Death Date'),
        ('/people/deceased_person/place_of_death', 'Death Place'),
        ('/people/deceased_person/cause_of_death', 'Death Cause'),
    ])),
    ('/book/author', OrderedDict([
        ('/book/author/works_written', 'Books'),
        ('/book/book_subject/works', 'Books About The Author'),
        ('/influence/influence_node/influenced', 'Influenced'),
        ('/influence/influence_node/influenced_by', 'Influenced By'),
    ])),
    ('/film/actor', OrderedDict([
        ('/film/actor/film', {
            'name': 'Films',
            'children': {
                '/film/performance/character': 'Character',
                '/film/performance/film': 'Film',
            },
        }),
    ])),
    ('/tv/tv_actor', OrderedDict([
        ('/tv/tv_actor/guest_roles', {
            'name': 'TV Series',
            'children': {
                '/tv/tv_guest_role/character': 'Character',
                '/tv/tv_guest_role/episodes_appeared_in': 'TV Series',
            },
        }),
        ('/tv/tv_actor/starring_roles', {
            'name': 'TV Series',
            'children': {
                '/tv/regular_tv_appearance/character': 'Character',
                '/tv/regular_tv_appearance/series': 'TV Series',
            },
        }),
    ])),
    ('/organization/organization_founder', OrderedDict([
        ('/organization/organization_founder/organizations_founded', 'Founded'),
    ])),
    ('/business/board_member', OrderedDict([
        ('/business/board_member/leader_of', {
            'name': 'Leadership',
            'children': {
                '/organization/leadership/from': 'From',
                '/organization/leadership/to': 'To',
                '/organization/leadership/organization': 'Organization',
                '/organization/leadership/role': 'Role',
                '/organization/leadership/title': 'Title',
            },
        }),
        ('/business/board_member/organization_board_memberships', {
            'name': 'Board Membership',
            'children': {
                '/organization/organization_board_membership/from': 'From',
                '/organization/organization_board_membership/to': 'To',
                '/organization/organization_board_membership/organization': 'Organization',
                '/organization/organization_board_membership/role': 'Role',
                '/organization/organization_board_membership/title': 'Title',
            },
        }),
    ])),
    ('/sports/sports_league', OrderedDict([
        ('/type/object/name', 'Name'),
        ('/sports/sports_league/championship', 'Championship'),
        ('/sports/sports_league/sport', 'Sport'),
        ('/organization/organization/slogan', 'Slogan'),
        ('/common/topic/official_website', 'Website'),
        ('/common/topic/description', 'Description'),
        ('/sports/sports_league/teams', {
            'name': 'Teams',
            'children': {
                '/sports/sports_league_participation/team': 'Team',
            },
        }),
    ])),
    ('/sports/sports_team', OrderedDict([
        ('/type/object/name', 'Name'),
        ('/common/topic/description', 'Description'),
        ('/sports/sports_team/sport', 'Sport'),
        ('/sports/sports_team/arena_stadium', 'Arena'),
        ('/sports/sports_team/championships', 'Championships'),
        ('/sports/sports_team/coaches', {
            'name': 'Coaches',
            'children': {
                '/sports/sports_team_coach_tenure/coach': 'Name',
                '/sports/sports_team_coach_tenure/position': 'Position',
                '/sports/sports_team_coach_tenure/from': 'From',
                '/sports/sports_team_coach_tenure/to': 'To',
            },
        }),
        ('/sports/sports_team/founded', 'Founded'),
        ('/sports/sports_team/league', {
            'name': 'League(s)',
            'children': {
                '/sports/sports_league_participation/league': 'League',
            },
        }),
        ('/sports/sports_team/location', 'Location'),
        ('/sports/sports_team/roster', {
            'name': 'PlayerRoster',
            'children': {
                '/sports/sports_team_roster/player': 'Name',
                '/sports/sports_team_roster/position': 'Position',
                '/sports/sports_team_roster/number': 'Number',
                '/sports/sports_team_roster/from': 'From',
                '/sports/sports_team_roster/to': 'To',
            },
        }),
    ])),
    ('/sports/professional_sports_team', OrderedDict([
        # empty
    ])),
])

============================================================================