Django as a Data Tool in the Enterprise - PyData New York 2015
Using Django as a Data Tool in the Enterprise
Trent Oliphant, Continuum Analytics
PyData NYC – November 10, 2015
© 2015 Continuum Analytics- Confidential & Proprietary
NOT ALL DATA IS BIG
Enterprise Reporting
Diagram: Central Data Store, Simple Process, Clear Results
Enterprise Reporting
Diagram: five Business Centers (BC), Complex Processes, Results, Extra
Enterprise Reporting
• Aggregated
• Multiple Business Centers
• Various Size Centers
• Different Data
Business Center Data
Diagram: Multiple Data Sources, Multiple Processes, Results, Corporate
Business Center Reporting
• Needs to feed upstream
• Have their own needs
• Smaller Teams
• Smaller Budgets
• Smaller Data
OPEN SOURCE AS AN OPTION
Advantages
• Cost
• Ease of use
• Community Resources
  – GitHub
  – Stack Overflow
  – Anaconda.org
Disadvantages
• Distribution and Installation
• Support
• Knowledge
  – Lack of internal sharing
  – No external sharing
Anaconda Enterprise
• Package Deployment
• Collaboration
• Support
• Indemnification
WHY DJANGO
What is Django?
• http://djangoproject.com
• Web framework
• Written in Python
• Model-View-Template model
• Version 1.8 or higher
Why Django?
• Easy Install and Setup
• Django ORM
• Built-in Authentication
• Built-in Admin Interface
• Talent Pool
Easy Install and Setup
• Using Anaconda
  – conda install django
  – django-admin startproject myproj
• Built-in development web server
  – python manage.py runserver
Django ORM
• Create Models with fields
• DB Management Handled
• Work with Objects/Properties, not SQL
• Can work with SQL directly
Built-in Authentication
• django.contrib.auth
• Basic Permissions
• Groups
• Sessions
Built-in Admin Interface
• django.contrib.admin
• Register model
• Basic data entry and editing
Talent Pool
• Large Community
• Active Community
• Available Developers
What about ______?
• SQLAlchemy
• Flask
• TurboGears
PROJECT SETUP
Requirements
• Automate forecasting
• Simple User Interface
• Regular Data Update
• Excel “integration”
Team Structure
• Four Groups
  – Modeling
  – Finance
  – Data
  – Development
Other influences
• Corporate Finance
• Corporate IT
• Internal Corporate Audit
• Regulations
Tools used
• SAS
• Oracle
• Teradata
• Excel
• Python
Environments
• Servers
  – Production
  – UAT (User Acceptance Testing)
  – Development
• Workstations
Workstations
• Desktops/Laptops
• Windows 7 Enterprise
• Locked down
Servers
• Linux
• Apache
• Oracle
Data
• Aggregated from Teradata
• 115 tables (including output tables)
• Each run generates ~30 MB of data
• “Future” data becomes real each month
• New future data sets created
SPECIFIC ISSUES
Data Governance and Controls
• Authentication (Single Sign-On)
• Access Control
• Data Validation
Data Sharing
• Excel Files
  – Multiple Copies
  – Modifications
• Database
  – Access Concerns
Data Sharing
• Specialization
Limited Machine Access
• No shell access
SPECIFIC SOLUTIONS
Request Flow
• Apache > SSO Agent > Django
• Request > Middleware > URL resolution > View resolution > Template > Response
• Models can be used anywhere in the chain
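The order of the request chain can be sketched in plain Python. This is a toy model, not Django's implementation: `sso_middleware`, `report_view`, `MIDDLEWARE`, `URLS` and `handle` are hypothetical names, and requests and responses are plain dictionaries.

```python
from string import Template

# Hypothetical stand-ins for Django's request/response machinery.
def sso_middleware(request):
    # Middleware sees every request before URL resolution.
    request.setdefault("user", request["META"].get("STANDARDID", "anonymous"))
    return request

def report_view(request):
    # A view builds a context and hands it to a template.
    template = Template("Report for $user")
    return {"status": 200, "body": template.substitute(user=request["user"])}

MIDDLEWARE = [sso_middleware]
URLS = {"/report/": report_view}

def handle(request):
    for middleware in MIDDLEWARE:        # Request > Middleware
        request = middleware(request)
    view = URLS.get(request["path"])     # URL resolution > View resolution
    if view is None:
        return {"status": 404, "body": "Not found"}
    return view(request)                 # Template > Response

response = handle({"path": "/report/", "META": {"STANDARDID": "123456"}})
```

Because the middleware runs first, the view never has to know where the user identity came from.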
Integrating with Authentication
• Create Custom Authentication
• Create Middleware Class
• Update settings.py file to recognize
  – AUTHENTICATION_BACKENDS
  – MIDDLEWARE_CLASSES
Create Custom Authentication
    from django.contrib.auth.models import User

    class MyBackend(object):
        def authenticate(self, username=None, password=None):
            # Check the username/password and return a User.
            return User.objects.get(username=username)

        def get_user(self, user_id):
            # Return the User for a stored session id, or None if it no longer exists.
            try:
                return User.objects.get(pk=user_id)
            except User.DoesNotExist:
                return None
    class IntegratedBackend(object):
        # A get_user() method like MyBackend's is also required.

        def authenticate(self, **credentials):
            # Credentials arrive as the SSO header fields.
            username = credentials.get('STANDARDID')
            first_name = credentials.get('FIRSTNAME')
            last_name = credentials.get('LASTNAME')
            email = credentials.get('EMAIL')

            try:
                user = User.objects.get(username=username)
            except User.DoesNotExist:
                # First visit: create an inactive account; inactive users are rejected below.
                user = User(username=username,
                            password='Using external login',
                            first_name=first_name,
                            last_name=last_name,
                            email=email,
                            is_active=False)
                user.save()
            if not user.is_active:
                user = None
            return user
Create Middleware Class
    from django.contrib.auth import authenticate, login, logout

    class SSOIntegrationMiddleware(object):
        header_fields = ['STANDARDID', 'FIRSTNAME', 'LASTNAME', 'EMAIL']

        def process_request(self, request):
            headers = {x: request.META.get(x) for x in self.header_fields}
            # Log out if the session user differs from the SSO user.
            if not (request.user.username == request.META.get('STANDARDID')):
                logout(request)
            # is_authenticated() is a method in Django 1.8; it became a property in 1.10.
            if not request.user.is_authenticated():
                user = authenticate(**headers)
                if user is not None:
                    login(request, user)
            return None
Update settings.py file
    AUTHENTICATION_BACKENDS = (
        'auth.IntegratedBackend',
        'django.contrib.auth.backends.ModelBackend',
    )

    MIDDLEWARE_CLASSES = (
        'django.contrib.sessions.middleware.SessionMiddleware',
        'django.middleware.common.CommonMiddleware',
        'django.middleware.csrf.CsrfViewMiddleware',
        'django.contrib.auth.middleware.AuthenticationMiddleware',
        'django.contrib.messages.middleware.MessageMiddleware',
        'django.middleware.clickjacking.XFrameOptionsMiddleware',
        'middleware.SSOIntegrationMiddleware',
    )
Mocking Integration
• Create YAML file
• Create mock function
• Update Middleware Class
Basic ssomock.yaml
    active : trent

    trent :
        STANDARDID : 123456
        FIRSTNAME : Trent
        LASTNAME : Oliphant
        EMAIL : [email protected]

    bob :
        STANDARDID : 987654
        FIRSTNAME : Bob
        LASTNAME : Rumsfield
        EMAIL : [email protected]
Create mock function
    import yaml

    def _get_mocked_headers(self):
        headers = None
        with open('ssomock.yaml', 'r') as f:
            # safe_load parses plain YAML without constructing arbitrary objects.
            raw = yaml.safe_load(f)
        active = raw.get('active')
        if active:
            headers = raw.get(active)
        return headers
Update Middleware class
    # Original line:
    headers = {x: request.META.get(x) for x in self.header_fields}

    # Replaced with:
    if not request.META.get('STANDARDID'):
        headers = self._get_mocked_headers()
        request.META.update(headers)
    else:
        headers = {x: request.META.get(x) for x in self.header_fields}
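The fallback decision can also be isolated in a small pure function, which makes it easy to test without Apache or the SSO agent. This is a sketch under assumptions: `resolve_headers` is a hypothetical helper, the mock configuration is passed in as an already-parsed dictionary rather than read from ssomock.yaml, and the `alice` profile is invented for illustration.

```python
HEADER_FIELDS = ["STANDARDID", "FIRSTNAME", "LASTNAME", "EMAIL"]

def resolve_headers(meta, mock_config):
    """Return SSO headers from the request, or the active mock profile."""
    if meta.get("STANDARDID"):
        # Real SSO headers are present: use them and ignore the mock file.
        return {name: meta.get(name) for name in HEADER_FIELDS}
    active = mock_config.get("active")
    # No active profile (or an unknown one) yields an empty dict, not None.
    return dict(mock_config.get(active) or {}) if active else {}

# Hypothetical parsed mock file; all values are made up.
mock_config = {
    "active": "alice",
    "alice": {"STANDARDID": "000001", "FIRSTNAME": "Alice",
              "LASTNAME": "Example", "EMAIL": "alice@example.com"},
}

mocked = resolve_headers({}, mock_config)
real = resolve_headers({"STANDARDID": "42"}, mock_config)
```

Returning an empty dictionary instead of None keeps a later `request.META.update(headers)` call from raising when no profile is active.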
Access Control
• Normally at the model level
  – delete, change, add
• Uses django_content_type table
• Needed it to be at a view (page) level
Access Control
• Create Custom Content Type
• Custom model manager
• Create Custom Permission model
• Register admin interface
• Add decorator to views
Create Content Type
• Insert into django_content_type table
  – app_label = 'ui'
  – model = 'uipermission'
• Through admin interface or direct to DB
Custom Permission Manager
    from django.db import models

    class UIPermissionManager(models.Manager):
        def get_queryset(self):
            # Limit the default queryset to permissions attached to the custom content type.
            return super(UIPermissionManager, self).get_queryset().filter(
                content_type__model='uipermission'
            )
Custom Permission Model
    from django.contrib.auth.models import Permission
    from django.contrib.contenttypes.models import ContentType

    class UIPermission(Permission):
        objects = UIPermissionManager()

        class Meta:
            proxy = True
            verbose_name = 'ui_permission'

        def save(self, *args, **kwargs):
            # Attach every UI permission to the custom content type before saving.
            ct, created = ContentType.objects.get_or_create(
                model=self._meta.model_name,
                app_label=self._meta.app_label,
            )
            self.content_type = ct
            super(UIPermission, self).save(*args, **kwargs)
Add permission to view
    from django.contrib.auth.decorators import permission_required

    # Permission strings take the form '<app_label>.<codename>'.
    @permission_required('permission_name', login_url='/denied_page')
    def my_view(request):
        ...
Accessing Output
• Output written to database
• Create Excel files
  – Email
  – Download
• Download CSV and log files
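A CSV export of an output table might look like the following. It is a minimal sketch, assuming the results can be read through a DB-API cursor; `query_to_csv` and the `forecast` table are hypothetical, and an in-memory SQLite database stands in for Oracle so the example runs anywhere.

```python
import csv
import io
import sqlite3

def query_to_csv(connection, sql):
    """Run a query and return its result set as CSV text with a header row."""
    cursor = connection.cursor()
    cursor.execute(sql)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # cursor.description holds one tuple per column; the name is the first item.
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor.fetchall())
    return buffer.getvalue()

# In-memory SQLite stands in for the Oracle output tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE forecast (region TEXT, amount REAL)")
conn.executemany("INSERT INTO forecast VALUES (?, ?)", [("east", 1.5), ("west", 2.0)])
csv_text = query_to_csv(conn, "SELECT region, amount FROM forecast ORDER BY region")
```

In a Django view the resulting text could be returned in an `HttpResponse` with `content_type='text/csv'` and a `Content-Disposition` attachment header.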
Create Excel file
• Uses XlsxWriter
• Gets pandas DataFrame from SQL query
• Each query written to its own tab
Download File
    import os
    from wsgiref.util import FileWrapper  # django.core.servers.basehttp.FileWrapper was removed in Django 1.9

    from django.http import HttpResponse

    def download_file(request):
        filepath = 'Newly created file'
        # Stream the file in chunks rather than reading it all into memory.
        wrapper = FileWrapper(open(filepath, 'rb'))
        response = HttpResponse(wrapper, content_type='application/force-download')
        response['Content-Length'] = os.path.getsize(filepath)
        filename = os.path.basename(filepath)
        response['Content-Disposition'] = 'attachment; filename={}'.format(filename)
        return response
Uploading Data
• Simple form
• Tab names must match table/model names
• Column names must match
• Uses xlrd, pandas and cursor (not ORM)
Uploading Data
    import xlrd
    import pandas as pd
    from django.shortcuts import render
    from django.db import connection, IntegrityError, DatabaseError

    def upload_data(request):
        if request.method == 'POST':
            uploaded = request.FILES['uploaded_file']
            workbook = xlrd.open_workbook(file_contents=uploaded.read())
            for sheetname in workbook.sheet_names():
                # Do some error checking
                df = pd.read_excel(workbook, sheetname, engine='xlrd')
                cols = ', '.join(df.columns)
                # Django wrapper of the cx_Oracle connector expects %s format
                val_holder = ', '.join(['%s'] * len(df.columns))
                stmt_text = "INSERT INTO {} ({}) VALUES ({})"
                stmt = stmt_text.format(sheetname, cols, val_holder)
                cursor = connection.cursor()
                cursor.executemany(stmt, df.values.tolist())
        return render(request, 'upload.html')
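Since tab and column names are formatted straight into the SQL text, the "error checking" step deserves care. One possible shape for it is sketched below; `build_insert` is a hypothetical helper, and the identifier pattern is an assumption that should be adjusted to local naming rules.

```python
import re

# Conservative pattern for unquoted SQL identifiers (an assumption, not an Oracle rule set).
_IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")

def build_insert(table, columns):
    """Build a parameterized INSERT suitable for cursor.executemany()."""
    columns = [str(c) for c in columns]
    bad = [name for name in [str(table)] + columns if not _IDENTIFIER.match(name)]
    if bad:
        raise ValueError("Unsafe identifier(s): {}".format(", ".join(bad)))
    # One %s placeholder per column; the values themselves are bound by the driver.
    placeholders = ", ".join(["%s"] * len(columns))
    return "INSERT INTO {} ({}) VALUES ({})".format(table, ", ".join(columns), placeholders)

stmt = build_insert("forecast_input", ["region", "period", "amount"])
```

Only the identifiers are interpolated into the statement; row values still travel as bound parameters.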
Basic Admin Access
• __str__ representation of the object
• No data
    from django.contrib import admin
    from django.apps import apps

    # Register every model in the 'data' app with the default admin.
    for model in apps.get_app_config('data').get_models():
        admin.site.register(model)
Tabular view
• Use list_display as property of class
• Needs a ModelAdmin class
    class ExampleModelAdmin(admin.ModelAdmin):
        list_display = ('field1', 'field2', 'field3')

    admin.site.register(ExampleModel, ExampleModelAdmin)
Tabular Admin View
    for model in apps.get_app_config('data').get_models():
        # Show every concrete field as a column in the admin list view.
        field_names = [f.name for f in model._meta.get_fields() if f.concrete]
        cls_nm = "{}_admin".format(model._meta.model_name)
        options = {'list_display': field_names}
        # Build a ModelAdmin subclass at runtime for this model.
        cls = type(cls_nm, (admin.ModelAdmin,), options)
        admin.site.register(model, cls)
Using a Different Oracle Schema
• Runs check_migrate
  – Reads USER_TABLES
Intercepting Django Logging
• Turn off default logging
  – LOGGING_CONFIG = None
• Use 'django' as the name of the logger
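With `LOGGING_CONFIG = None`, Django skips its own logging setup, so the project has to configure logging itself. A minimal sketch using only the standard library follows; the `LOGGING` dictionary, the console handler and the format string are illustrative choices, not the configuration used in the project.

```python
import logging
import logging.config

# In settings.py: stop Django from applying its default logging setup.
LOGGING_CONFIG = None

# Hypothetical configuration; handler and format are illustrative choices.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "plain": {"format": "%(asctime)s %(name)s %(levelname)s %(message)s"},
    },
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "plain"},
    },
    "loggers": {
        # Django's own loggers (django.request, django.db, ...) live under 'django'.
        "django": {"handlers": ["console"], "level": "INFO", "propagate": False},
    },
}

# With LOGGING_CONFIG = None the project must apply the configuration itself.
logging.config.dictConfig(LOGGING)

logging.getLogger("django.request").info("captured by the 'django' logger")
```

Child loggers such as `django.request` inherit the handler and level from the `django` logger, so a single entry is enough to intercept them.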
Overriding SETTINGS
• settings.py is just a Python file
• Read YAML file
• Update globals() with those from file
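The override step can be sketched as below. To keep the example free of third-party dependencies it reads JSON from the standard library instead of YAML; with PyYAML the parsing line would be the only change. `load_overrides`, the file name and the setting values are hypothetical.

```python
import json
import os
import tempfile

def load_overrides(path):
    """Read overrides from a file, keeping only UPPERCASE setting names."""
    with open(path) as f:
        raw = json.load(f)
    return {key: value for key, value in raw.items() if key.isupper()}

# Defaults as they would appear in settings.py.
DEBUG = True
ALLOWED_HOSTS = []

# Hypothetical override file, written here only so the sketch is self-contained.
override_path = os.path.join(tempfile.mkdtemp(), "overrides.json")
with open(override_path, "w") as f:
    json.dump({"DEBUG": False,
               "ALLOWED_HOSTS": ["reports.example.com"],
               "note": "ignored"}, f)

# settings.py is an ordinary module, so its globals can be updated in place.
globals().update(load_overrides(override_path))
```

Filtering on uppercase names mirrors Django's convention that only uppercase module attributes are treated as settings, and keeps stray keys in the file from leaking into the module.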
Managed = False
• Different team deployed database schema
• No rights for Django to create schema
• manage.py sqlmigrate > output.sql
Things to watch out for
• Meta options
  – db_table
  – managed
• DatabaseError, IntegrityError
  – Django wraps the underlying cx_Oracle exceptions