django as a data tool in the enterprise - pydata new york 2015


Page 1: Django as a Data Tool in the Enterprise - PyData New York 2015

Using Django as a Data Tool in the Enterprise

Trent Oliphant, Continuum Analytics

PyData NYC – November 10, 2015

© 2015 Continuum Analytics- Confidential & Proprietary

Page 2

NOT ALL DATA IS BIG

Page 3

Enterprise Reporting

[Diagram: central data store, simple process, clear results]

Page 4

Enterprise Reporting

[Diagram: five business centers (BC), complex processes, results, and an element labeled "Extra"]

Page 5

Enterprise Reporting
• Aggregated
• Multiple Business Centers
• Various Size Centers
• Different Data

Page 6

Business Center Data

[Diagram: multiple data sources, multiple processes, results, and Corporate]

Page 7

Business Center Reporting
• Needs to feed upstream
• Have their own needs
• Smaller Teams
• Smaller Budgets
• Smaller Data

Page 8

OPEN SOURCE AS AN OPTION

Page 9

Advantages
• Cost
• Ease of use
• Community Resources
  – GitHub
  – Stack Overflow
  – Anaconda.org

Page 10

Disadvantages
• Distribution and Installation
• Support
• Knowledge
  – Lack of internal sharing
  – No external sharing

Page 11

Anaconda Enterprise
• Package Deployment
• Collaboration
• Support
• Indemnification

Page 12

WHY DJANGO

Page 13

What is Django?
• http://djangoproject.com
• Web framework
• Written in Python
• Model-View-Template (MVT) pattern
• Version 1.8 or higher

Page 14

Why Django?
• Easy Install and Setup
• Django ORM
• Built-in Authentication
• Built-in Admin Interface
• Talent Pool

Page 15

Easy Install and Setup
• Using Anaconda
  – conda install django
  – django-admin startproject myproj
• Built-in development web server
  – python manage.py runserver

Page 16

Django ORM
• Create models with fields
• DB management handled
• Work with objects/properties, not SQL
• Can work with SQL directly

Page 17

Built-in Authentication
• django.contrib.auth
• Basic Permissions
• Groups
• Sessions

Page 18

Built-in Admin Interface
• django.contrib.admin
• Register model
• Basic data entry and editing

Page 19

Talent Pool
• Large Community
• Active Community
• Available Developers

Page 20

What about ______?
• SQLAlchemy
• Flask
• TurboGears

Page 21

PROJECT SETUP

Page 22

Requirements
• Automate forecasting
• Simple User Interface
• Regular Data Update
• Excel “integration”

Page 23

Team Structure
• Four Groups
  – Modeling
  – Finance
  – Data
  – Development

Page 24

Other influences
• Corporate Finance
• Corporate IT
• Internal Corporate Audit
• Regulations

Page 25

Tools used
• SAS
• Oracle
• Teradata
• Excel
• Python

Page 26

Environments
• Servers
  – Production
  – UAT (User Acceptance Testing)
  – Development
• Workstations

Page 27

Workstations
• Desktop/Laptops
• Windows 7 Enterprise
• Locked down

Page 28

Servers
• Linux
• Apache
• Oracle

Page 29

Data
• Aggregated from Teradata
• 115 tables (including output tables)
• Each run generates ~30 MB of data
• “Future” data becomes real each month
• New future data sets created

Page 30

SPECIFIC ISSUES

Page 31

Data Governance and Controls
• Authentication (Single Sign-On)
• Access Control
• Data Validation

Page 32

Data Sharing
• Excel Files
  – Multiple Copies
  – Modifications
• Database
  – Access Concerns

Page 33

Data Sharing
• Specialization

Page 34

Limited Machine Access
• No shell access

Page 35

SPECIFIC SOLUTIONS

Page 36

Request Flow
• Apache > SSO Agent > Django
• Request > Middleware > URL resolution > View resolution > Template > Response
• Models can be used anywhere in the chain

Page 37

Integrating with Authentication
• Create custom authentication backend
• Create middleware class
• Update settings.py file to recognize
  – AUTHENTICATION_BACKENDS
  – MIDDLEWARE_CLASSES

Page 38

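Create Custom Authentication
from django.contrib.auth.models import User

class MyBackend(object):

    def authenticate(self, request=None, username=None, password=None):
        # Check the username/password and return a User, or None on failure.
        # `request` is optional so the backend also works on Django 1.11+.
        try:
            user = User.objects.get(username=username)
        except User.DoesNotExist:
            return None
        if user.check_password(password):
            return user
        return None

    def get_user(self, user_id):
        try:
            return User.objects.get(pk=user_id)
        except User.DoesNotExist:
            return None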

Page 39

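from django.contrib.auth.models import User

class IntegratedBackend(object):

    def authenticate(self, request=None, **credentials):
        # Credentials are the values the SSO agent passes in with the request
        username = credentials.get('STANDARDID')
        if not username:
            return None
        first_name = credentials.get('FIRSTNAME')
        last_name = credentials.get('LASTNAME')
        email = credentials.get('EMAIL') or ''

        try:
            user = User.objects.get(username=username)
        except User.DoesNotExist:
            # First visit: create an inactive account that an administrator
            # must activate. SSO handles login, so no usable password is set.
            user = User(username=username, first_name=first_name,
                        last_name=last_name, email=email, is_active=False)
            user.set_unusable_password()
            user.save()
        if not user.is_active:
            user = None
        return user

    def get_user(self, user_id):
        try:
            return User.objects.get(pk=user_id)
        except User.DoesNotExist:
            return None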

Page 40

Create Middleware Class
from django.contrib.auth import authenticate, login, logout

class SSOIntegrationMiddleware(object):

    # Values the SSO agent places in request.META
    header_fields = ['STANDARDID', 'FIRSTNAME', 'LASTNAME', 'EMAIL']

    def process_request(self, request):
        headers = {x: request.META.get(x) for x in self.header_fields}
        # End a session that belongs to a different SSO identity
        if request.user.username != request.META.get('STANDARDID'):
            logout(request)
        # Django 1.x method call; is_authenticated is a property from Django 1.10 on
        if not request.user.is_authenticated():
            user = authenticate(**headers)
            if user is not None:
                login(request, user)

        return None

Page 41

Update settings.py file
AUTHENTICATION_BACKENDS = (
    'auth.IntegratedBackend',
    'django.contrib.auth.backends.ModelBackend',
)

MIDDLEWARE_CLASSES = (
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
    'middleware.SSOIntegrationMiddleware',
)

Page 42

Mocking Integration
• Create YAML file
• Create mock function
• Update Middleware Class

Page 43

Basic ssomock.yaml
# IDs are quoted so they load as strings, like the real SSO header values
active: trent

trent:
    STANDARDID: '123456'
    FIRSTNAME: Trent
    LASTNAME: Oliphant
    EMAIL: [email protected]

bob:
    STANDARDID: '987654'
    FIRSTNAME: Bob
    LASTNAME: Rumsfield
    EMAIL: [email protected]

Page 44

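Create mock function
import yaml

# Method of SSOIntegrationMiddleware: read fake SSO headers for local development
def _get_mocked_headers(self):
    headers = None
    with open('ssomock.yaml', 'r') as f:
        raw = yaml.safe_load(f)  # plain data only; yaml.load without a Loader is deprecated
    active = raw.get('active')
    if active:
        headers = raw.get(active)
    return headers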

Page 45

Update Middleware class
Replace
headers = {x: request.META.get(x) for x in self.header_fields}
with
if not request.META.get('STANDARDID'):
    # No SSO agent in front of Django (e.g. the development server): use mocked headers
    headers = self._get_mocked_headers() or {}
    request.META.update(headers)
else:
    headers = {x: request.META.get(x) for x in self.header_fields}

Page 46

Access Control
• Normally at the model level
  – delete, change, add
• Uses django_content_type table
• Needed it to be at a view (page) level

Page 47

Access Control
• Create Custom Content Type
• Custom model manager
• Create Custom Permission model
• Register admin interface
• Add decorator to views

Page 48

Create Content Type
• Insert into django_content_type table
  – app_label = 'ui'
  – model = 'uipermission'
• Through admin interface or direct to DB

Page 49

Custom Permission Manager
from django.db import models

class UIPermissionManager(models.Manager):
    # Only return permissions attached to the custom 'uipermission' content type
    def get_queryset(self):
        return super(UIPermissionManager, self).get_queryset().filter(
            content_type__model='uipermission'
        )

Page 50

Custom Permission Model
from django.contrib.auth.models import Permission
from django.contrib.contenttypes.models import ContentType

class UIPermission(Permission):

    objects = UIPermissionManager()

    class Meta:
        proxy = True
        verbose_name = 'ui_permission'

    def save(self, *args, **kwargs):
        # Attach every UI permission to the custom 'uipermission' content type
        ct, created = ContentType.objects.get_or_create(
            model=self._meta.model_name,
            app_label=self._meta.app_label,
        )
        self.content_type = ct
        super(UIPermission, self).save(*args, **kwargs)

Page 51

Add permission to view
from django.contrib.auth.decorators import permission_required

# Permission name is in 'app_label.codename' form, e.g. 'ui.<codename>'
@permission_required('permission_name', login_url='/denied_page')
def my_view(request):
    ...

Page 52

Accessing Output
• Output written to database
• Create Excel files
  – Email
  – Download
• Download CSV and log files

Page 53

Create Excel file
• Uses xlsxwriter
• Gets pandas DataFrame from SQL query
• Each query written to its own tab
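A minimal sketch of the Excel export described above, assuming a hypothetical `queries` dict that maps tab names to SQL. An in-memory SQLite connection stands in for Oracle; pandas officially supports SQLAlchemy connectables and `sqlite3` connections in `read_sql`, so passing another DB-API connection (such as Django's) may work but can emit a warning in recent pandas versions. It needs the `xlsxwriter` package installed.

```python
import os
import sqlite3
import tempfile

import pandas as pd


def queries_to_excel(conn, queries, path):
    """Run each SQL query and write its result to its own tab.

    `queries` maps tab name -> SQL text (a hypothetical structure).
    Returns the tab names in the order they were written.
    """
    with pd.ExcelWriter(path, engine='xlsxwriter') as writer:
        for sheet_name, sql in queries.items():
            df = pd.read_sql(sql, conn)  # query result as a DataFrame
            df.to_excel(writer, sheet_name=sheet_name, index=False)
        names = list(writer.sheets)
    return names


# Usage with an in-memory SQLite database standing in for Oracle
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE forecast (region TEXT, amount REAL)')
conn.executemany('INSERT INTO forecast VALUES (?, ?)',
                 [('East', 100.0), ('East', 150.0), ('West', 80.0)])
out_path = os.path.join(tempfile.mkdtemp(), 'forecast.xlsx')
tabs = queries_to_excel(conn, {
    'by_region': 'SELECT region, SUM(amount) AS total FROM forecast GROUP BY region',
    'detail': 'SELECT * FROM forecast',
}, out_path)
```

The resulting file can then be attached to an email or returned through a download view. Excel limits tab names to 31 characters, so long query names would need shortening.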

Page 54

Download File
import os
from wsgiref.util import FileWrapper  # django.core.servers.basehttp.FileWrapper was removed in Django 1.9
from django.http import HttpResponse

def download_file(request):
    filepath = 'Newly created file'  # path of the generated Excel/CSV/log file
    wrapper = FileWrapper(open(filepath, 'rb'))
    # Generic binary type; browsers save it because of Content-Disposition
    response = HttpResponse(wrapper, content_type='application/octet-stream')
    response['Content-Length'] = str(os.path.getsize(filepath))
    filename = os.path.basename(filepath)
    response['Content-Disposition'] = 'attachment; filename="{}"'.format(filename)
    return response

Page 55

Uploading Data
• Simple form
• Tab names must match table/model names
• Column names must match
• Uses xlrd, pandas, and a database cursor (not the ORM)
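Because tab and column names end up inside the SQL text (only values can be bound as parameters), it helps to validate them before building the INSERT. A minimal pure-Python sketch, assuming a hypothetical `allowed` mapping of table name to permitted column names:

```python
def build_insert(table, columns, allowed):
    """Return a parameterized INSERT for `table` after validating names.

    `allowed` maps each permitted table name to its set of column names
    (hypothetical structure). Values are left as %s placeholders, the
    style Django's cursor wrapper expects.
    """
    if table not in allowed:
        raise ValueError('Unknown table: {}'.format(table))
    unknown = [c for c in columns if c not in allowed[table]]
    if unknown:
        raise ValueError('Unknown column(s) for {}: {}'.format(
            table, ', '.join(unknown)))
    placeholders = ', '.join(['%s'] * len(columns))
    return 'INSERT INTO {} ({}) VALUES ({})'.format(
        table, ', '.join(columns), placeholders)


# Usage: one tab named 'forecast' with two columns
ALLOWED = {'forecast': {'region', 'month', 'amount'}}  # hypothetical tables
stmt = build_insert('forecast', ['region', 'amount'], ALLOWED)
```

In a project, the mapping could be built from the models themselves, for example `{m._meta.db_table: {f.column for f in m._meta.concrete_fields} for m in apps.get_app_config('data').get_models()}`, so the upload accepts exactly the tables Django knows about.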

Page 56

Uploading Data
import xlrd
import pandas as pd
from django.shortcuts import render
from django.db import connection, IntegrityError, DatabaseError

def upload_data(request):
    errors = []  # 'errors' is a hypothetical template variable
    if request.method == 'POST':
        # xlrd 2.0+ opens only .xls files
        workbook = xlrd.open_workbook(file_contents=request.FILES['uploaded_file'].read())
        for sheetname in workbook.sheet_names():
            # Do some error checking: tab and column names are formatted into
            # the SQL below, so validate them against known tables first
            df = pd.read_excel(workbook, sheet_name=sheetname, engine='xlrd')
            cols = ', '.join(df.columns)
            # Django wrapper of the cx_Oracle connector expects %s format
            val_holder = ', '.join(['%s'] * len(df.columns))
            stmt = 'INSERT INTO {} ({}) VALUES ({})'.format(sheetname, cols, val_holder)
            try:
                with connection.cursor() as cursor:
                    cursor.executemany(stmt, df.values.tolist())
            except (IntegrityError, DatabaseError) as exc:
                errors.append('{}: {}'.format(sheetname, exc))
    return render(request, 'upload.html', {'errors': errors})

Page 57

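Basic Admin Access
• Change list shows only the __str__ representation of each object
• No field data

from django.contrib import admin
from django.apps import apps

# Register every model in the 'data' app with the default ModelAdmin
for model in apps.get_app_config('data').get_models():
    admin.site.register(model)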

Page 58

Tabular view
• Use list_display as an attribute of the class
• Needs a ModelAdmin class

class ExampleModelAdmin(admin.ModelAdmin):
    list_display = ('field1', 'field2', 'field3')

admin.site.register(ExampleModel, ExampleModelAdmin)

Page 59

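Tabular Admin View
from django.contrib import admin
from django.apps import apps

for model in apps.get_app_config('data').get_models():
    # Every concrete field becomes a column (many-to-many fields are not allowed in list_display)
    field_names = [f.name for f in model._meta.get_fields()
                   if f.concrete and not f.many_to_many]
    cls_nm = "{}_admin".format(model._meta.model_name)
    options = {'list_display': field_names}
    # Build a ModelAdmin subclass on the fly
    cls = type(cls_nm, (admin.ModelAdmin,), options)
    admin.site.register(model, cls)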

Page 60

Using a Different Oracle Schema
• Runs check_migrations
  – Reads USER_TABLES (only tables owned by the connecting user)

Page 61

Intercepting Django Logging
• Turn off default logging
  – LOGGING_CONFIG = None
• Use 'django' as the logger name
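With `LOGGING_CONFIG = None` in settings.py, Django skips its own logging setup and the project configures logging itself. A minimal sketch of that second step using the standard library's `dictConfig`; the log file name and format are hypothetical, and configuring the `'django'` logger is one reading of the slide, not necessarily the talk's exact setup.

```python
import logging
import logging.config
import os
import tempfile


def configure_logging(log_path):
    logging.config.dictConfig({
        'version': 1,
        'disable_existing_loggers': False,
        'formatters': {
            'simple': {'format': '%(asctime)s %(name)s %(levelname)s %(message)s'},
        },
        'handlers': {
            'app_file': {
                'class': 'logging.FileHandler',
                'filename': log_path,
                'formatter': 'simple',
            },
        },
        'loggers': {
            # Django's own loggers (django.request, django.db.backends, ...)
            # are children of 'django', so their records propagate here.
            'django': {'handlers': ['app_file'], 'level': 'INFO'},
        },
    })


log_path = os.path.join(tempfile.mkdtemp(), 'forecast_app.log')  # hypothetical file
configure_logging(log_path)
logging.getLogger('django.request').warning('Forbidden: /forecast/')
with open(log_path) as f:
    logged = f.read()
```

Application code that logs through `logging.getLogger('django')` (or a child such as `'django.forecast'`) lands in the same file as Django's own messages.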

Page 62

Overriding SETTINGS
• settings.py is just a Python file
• Read YAML file
• Update globals() with the values from the file
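A minimal sketch of the settings override, assuming PyYAML is installed; the helper name `load_overrides` and the file name are hypothetical. It only applies uppercase keys, since Django reads only uppercase names from a settings module. In settings.py it would be called with `globals()`; here a plain dict stands in.

```python
import os
import tempfile

import yaml


def load_overrides(path, namespace):
    """Update `namespace` with the UPPERCASE keys of a YAML file.

    In settings.py: load_overrides(path, globals()).
    """
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        data = yaml.safe_load(f) or {}
    overrides = {k: v for k, v in data.items()
                 if isinstance(k, str) and k.isupper()}
    namespace.update(overrides)
    return overrides


# Usage: a plain dict stands in for the globals() of settings.py
settings_ns = {'DEBUG': True, 'ALLOWED_HOSTS': []}
yaml_path = os.path.join(tempfile.mkdtemp(), 'overrides.yaml')  # hypothetical file
with open(yaml_path, 'w') as f:
    f.write('DEBUG: false\nALLOWED_HOSTS:\n  - forecast.example.com\nnote: ignored\n')
applied = load_overrides(yaml_path, settings_ns)
```

Placing the call at the end of settings.py lets each environment (Production, UAT, Development) ship its own YAML file while settings.py stays identical.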

Page 63

managed = False
• A different team deployed the database schema
• No rights for Django to create the schema
• manage.py sqlmigrate <app_label> <migration_name> > output.sql

Page 64

Things to watch out for
• Meta options
  – db_table
  – managed
• DatabaseError, IntegrityError
  – Django wraps the underlying cx_Oracle exceptions
