PyGoogleDM – Access the Google Analytics API with Python

Extract Data from the Google Analytics API via Python and Insert it into a MySQL database

This Python program extracts data from the Google Analytics API and inserts it into a MySQL database. The utility can be installed by extracting the contents of the ga_data.zip to the desired location and running from the command line. Make sure the pre-requisite Python modules listed below have been installed. This program was originally written on Python v2.6.5.

Pre-requisite Python Modules

Run “ga_data.py” from the command line as follows:

user@host:~/google_analytics/ga_data/src$ ./ga_data.py

“ga_data.py” has a number of run-time options:

Main prompt selection:
  • 1 = Add new records ONLY to MySQL database
  • 2 = Add new records & update existing records in MySQL database
  • 3 = Output CSV files (page,site_stats,visitor) containing data retrieved
  • 4 = Output SQL script containing INSERT/UPDATE statements for bulk load
Extracting Data from the Google Analytics API

Command window for ga_data.py execution

Additional run-time options:
  • Change home directory and reload config
  • Drop & recreate ga_dm database before program execution
  • Truncate all tables before program execution
  • Obfuscate website and page data in MySQL database after program execution
Sourcecode for main() in ga_data.py python program
""" MAIN PROGRAM """
ga = ga_data()
set_paths() # set initial directory paths
ga.load_config() # set up environment and load configuration
ga.main_prompt(action) # prompt user for action
ga.load_ga_elements()
ga.get_accounts() # retrieve google analytics data
db = mysql()
db.connect(ga) # establish connection to mysql server
if ga.prompts['create']: # option to recreate ga_dm database
	ga.log.info("Dropping and recreating ga_dm database")
	db.run_sql_script(ga,sql_files['create'])
if ga.prompts['truncate']: # option to truncate ga_dm database
	ga.log.info("Truncating data in ga_dm database")
	db.run_sql_script(ga,sql_files['truncate'])
for account in ga.accounts: # perform for each account
	''' date table must always be checked and updated if necessary '''
	db.check_dates(ga)
	''' site table must always be checked and updated if necessary '''
	db.update_sites(account,ga)
	ga.get_data(account,'site') # retrieve site stats
	if ga.prompts['action'] is action.csv: # output to csv if desired
		ga.csv_dump(account,'site')
	else: # otherwise update site stats table
		db.update_site_stats(account,ga,action)
	ga.get_data(account,'page') # retrieve page stats
	if ga.prompts['action'] is action.csv: # output to csv if desired
		ga.csv_dump(account,'page')
	else: # otherwise update page table
		db.update_page(account,ga)
	ga.get_data(account,'visitor') # retrieve visitor stats
	if ga.prompts['action'] is action.csv: # output to csv if desired
		ga.csv_dump(account,'visitor')
	else: # otherwise update visitor table
		db.update_visitors(account,ga)
if ga.prompts['obfuscate']: # option to obfuscate ga_dm database
	ga.log.info("Obfuscating website and page data in ga_dm database")
	db.run_sql_script(ga,sql_files['obfuscate'])
	print "Website and page data obfuscated."
ga.finish() # flush writes and close open file handles

The ga_data.zip file download contains a ga_data directory with all required files (except Python pre-requisite modules noted in README file). Follow instructions in README.txt to set up and run.