Identifying third-party software with ScanCode
Identifying open source software with ScanCode
May 2016
Open Source for Open Source
▷ Introduction to ScanCode
○ Toolkit
○ App
▷ Demo
▷ More Details
▷ About nexB
Benefits of an open source scanner
As a developer:
▷ I get comprehensive, normalized data on origin and license
▷ I can find the license immediately when I evaluate a library
▷ I can identify and resolve license issues before a release
▷ I can identify issues for each commit
▷ I can communicate clearly with legal and business teams about the license and origin of third-party code
You can use the Apache-licensed ScanCode Toolkit now!
Participate by contributing code, license rules, bugs or suggestions.
What does ScanCode Toolkit do?
It scans source and binary code to find:
▷ License notices, texts and “mentions”
▷ Copyright notices
▷ Package-level information (RPM, NuGet, npm, JAR, etc.)
▷ Other provenance clues (author, email, etc.)
▷ File-level information (type, name, checksums, etc.)
ScanCode Results are provided as:
▷ JSON file
▷ Dynamic HTML
▷ Static HTML table usable in a spreadsheet
▷ AND
▷ ... the new ScanCode App
▷ ... next, in the ScanCode.io server
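Because the results are plain JSON, they are easy to post-process with a few lines of code. The sketch below assumes a simplified result layout: a top-level `files` list whose entries carry `path`, `licenses` and `copyrights` keys. That layout and the sample data are assumptions made for illustration; check the JSON your version of ScanCode actually produces before relying on any key name.

```python
import json
from collections import Counter

# Hypothetical, simplified scan result. The real ScanCode JSON layout
# may differ between versions; the key names here are assumptions.
SAMPLE_RESULT = """
{
  "files": [
    {"path": "lib/a.c",
     "licenses": [{"key": "mit"}],
     "copyrights": ["Copyright (c) 2015 Example Corp."]},
    {"path": "lib/b.c",
     "licenses": [{"key": "mit"}, {"key": "gpl-2.0"}],
     "copyrights": []},
    {"path": "README", "licenses": [], "copyrights": []}
  ]
}
"""


def summarize_licenses(result_text):
    """Count how many files mention each license key."""
    result = json.loads(result_text)
    counts = Counter()
    for entry in result.get("files", []):
        # A set makes a license detected twice in one file count once.
        keys = {lic["key"] for lic in entry.get("licenses", [])}
        counts.update(keys)
    return dict(counts)


def files_without_license(result_text):
    """List the paths where no license was detected."""
    result = json.loads(result_text)
    return [e["path"] for e in result.get("files", []) if not e.get("licenses")]


if __name__ == "__main__":
    print(summarize_licenses(SAMPLE_RESULT))
    print(files_without_license(SAMPLE_RESULT))
```

A summary like this is one way to spot files that need manual review before a release.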
ScanCode Toolkit Demo
Available on GitHub
▷ Get the code: https://github.com/nexB/scancode-toolkit/
▷ Read more: https://github.com/nexB/scancode-toolkit/wiki
▷ Report an issue or idea: https://github.com/nexB/scancode-toolkit/issues
▷ Commercial support and services available from nexB (ScanCode starter pack): http://www.nexb.com/
ScanCode Licensing
| | License | Notes |
|---|---|---|
| Software | Apache 2.0 | With an acknowledgement in the scan output |
| Reference data | CC0 1.0 | Public domain |
| Third-party components | L/GPL, MIT, BSD, Apache | Various licenses |
ScanCode Toolkit Roadmap
▷ New scans for software packages (RPM, npm, Gems, Java JARs, Debian, NuGet, Python, etc.)
▷ Approximate license detection
▷ SPDX license expressions
▷ Speed improvements
▷ See https://github.com/nexB/scancode-toolkit/wiki/Roadmap
ScanCode App
What we’ve been working on!
ScanCode App
Motivation:
▷ Analyze ScanCode results
▷ Document your conclusion about the provenance and license for a software component
▷ Save conclusions
▷ Share results
ScanCode Conclusions
Document Component-level conclusions such as:
▷ Component Name
▷ Component Version
▷ Component Owner
▷ Concluded License
▷ Concluded Copyright
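A conclusion record of this kind could be modelled as a small data class that round-trips through JSON. This is only a sketch: the class name, the field names and the `path` field (the codebase node the conclusion is attached to) are hypothetical and are not the ScanCode App's actual file format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ComponentConclusion:
    """Hypothetical component-level conclusion attached to a codebase path."""
    path: str                 # node in the scanned codebase hierarchy
    name: str
    version: str
    owner: str
    concluded_license: str
    concluded_copyright: str


def to_json(conclusions):
    """Serialize a list of conclusions to a JSON string."""
    return json.dumps([asdict(c) for c in conclusions], indent=2)


def from_json(text):
    """Rebuild the conclusions from a JSON string produced by to_json."""
    return [ComponentConclusion(**item) for item in json.loads(text)]


if __name__ == "__main__":
    example = ComponentConclusion(
        path="thirdparty/example-lib",
        name="example-lib",
        version="1.0",
        owner="Example Corp.",
        concluded_license="mit",
        concluded_copyright="Copyright (c) 2016 Example Corp.",
    )
    print(to_json([example]))
```

Keeping conclusions separate from the raw scan data means a reviewer's decision can be saved and shared without rerunning the scan.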
Preview of ScanCode App
Summary of Features
▷ View results in tree or tabular view
▷ Add conclusion data at any node of the existing codebase hierarchy
▷ Save Components and conclusions to a JSON file
Thanks!
Any questions?
Credits
Special thanks to all the people who made and released these awesome free resources:
▷ Presentation template by SlidesCarnival
▷ Photographs by Unsplash
▷ And all the software authors who made ScanCode possible
About nexB Inc.
We offer:
▷ DejaCode™ - Open Data Platform for Managing Open Source - http://www.dejacode.com/
▷ Open Source Scanning & Tracking Tools - https://github.com/nexB
▷ Open Source Software Expert Audit Services - http://www.nexb.com/services.html
ScanCode Details
▷ ScanCode by the numbers
▷ What is scanning?
▷ How does ScanCode work?
ScanCode by the numbers
▷ Over 6,000 tests
▷ Over 500 large software products scanned
▷ Over 3,000 licenses, notices and samples
ScanCode Toolkit - Technology
▷ Written primarily in Python
○ also JavaScript, Ruby, Java and C/C++
▷ Tested on Linux, OS X and Windows
▷ Command line tool or library
▷ Simple HTML browser-app (any modern browser) - runs locally
ScanCode App - Technology
▷ Based on Electron and written primarily in JavaScript
▷ D3.js used for data visualizations
What is Scanning?
Detect and discover “evidence” of origin and license in code (source or binary files)
▷ Copyright notice
▷ License notice and/or license text
▷ Software package manifests
▷ Email, URL, author or other names
▷ Other origin and license clues found in the code
Scanning is not Matching
Matching looks for similarities between your code and an index (digital fingerprints) of OSS code
▷ If your code is similar it “may” share a similar origin
▷ Matching may be applied at multiple levels
○ Package
○ File or snippet
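To make the contrast with scanning concrete, here is a toy sketch of matching at the file and snippet levels. It uses a checksum for exact file matches and Jaccard similarity over word shingles for near matches; both are generic techniques chosen for illustration, not a description of any particular matching product. The function names, the index layout and the threshold are assumptions.

```python
import hashlib


def file_fingerprint(data: bytes) -> str:
    """Exact file-level fingerprint: identical bytes give identical digests."""
    return hashlib.sha1(data).hexdigest()


def shingles(text, size=5):
    """Set of overlapping runs of `size` whitespace-separated tokens."""
    tokens = text.split()
    if len(tokens) < size:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def jaccard(a, b):
    """Share of shingles two sets have in common (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(text, index, threshold=0.5):
    """Return (name, score) of the most similar indexed text, or None.

    `index` maps a component name to reference text. This is a
    hypothetical stand-in for a real index of OSS code fingerprints.
    """
    query = shingles(text)
    scored = [(jaccard(query, shingles(ref)), name) for name, ref in index.items()]
    score, name = max(scored, default=(0.0, None))
    return (name, score) if score >= threshold else None
```

Even in this toy form the review burden is visible: a low threshold reports many coincidental overlaps, while a high one misses lightly edited copies.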
Scanning plus Matching
▷ Scanning will identify origin and license in most cases, but
○ Does not detect copying of snippets, or
○ Intentional stripping of notices, etc.
▷ Matching can identify code that was copied and/or stripped, but
○ Typically produces MANY false positives and requires extensive review
○ Especially for the most commonly used OSS projects
How does ScanCode work? (1)
▷ Each file is categorized based on its type
▷ Archives and compressed files are fully extracted
▷ The text of each file is collected (source and binaries)
▷ Each file's text is then "scanned"
▷ Results are formatted and returned as a JSON file
▷ You can view the results in a browser, or
▷ Use the JSON file as you want
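The steps above can be sketched as a tiny pipeline. This is an illustration under loud assumptions, not ScanCode's implementation: files are passed in as an in-memory mapping of path to bytes, type detection is a crude NUL-byte check, text is pulled from binaries the way the Unix `strings` tool does it, archive extraction is left out, and the "scanner" is a placeholder that only looks for the word "copyright".

```python
import json
import re

# Runs of at least four printable ASCII characters, as `strings` would report.
STRINGS_RE = re.compile(rb"[\x20-\x7e]{4,}")


def categorize(data: bytes) -> str:
    """Crude type check: a NUL byte near the start marks the file as binary."""
    return "binary" if b"\x00" in data[:4096] else "text"


def collect_text(data: bytes) -> str:
    """Return the text of a file, extracting printable runs from binaries."""
    if categorize(data) == "text":
        return data.decode("utf-8", errors="replace")
    return "\n".join(run.decode("ascii") for run in STRINGS_RE.findall(data))


def scan_text(text: str) -> dict:
    """Placeholder scanner: only flags whether 'copyright' appears."""
    return {"mentions_copyright": "copyright" in text.lower()}


def scan_files(files: dict) -> str:
    """Scan a mapping of path -> bytes and return the results as JSON."""
    results = []
    for path, data in sorted(files.items()):
        entry = {"path": path, "type": categorize(data)}
        entry.update(scan_text(collect_text(data)))
        results.append(entry)
    return json.dumps({"files": results}, indent=2)


if __name__ == "__main__":
    print(scan_files({
        "a.txt": b"Copyright 2016 Example",
        "b.bin": b"\x00\x01\x02Copyright nexB\x00\xff",
    }))
```

Collecting text from binaries is what lets the same scanners run on compiled files as on source.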
How does ScanCode work? (2)
▷ For licenses, the techniques are similar to DNA analysis with multi-pattern matching
▷ Licenses are found exactly or approximately based on a set of thousands of license texts, notices and examples
▷ For copyrights, a syntax and grammar analyzer captures the many forms of copyright statements
▷ Emails, URLs, authors, person names and other data are captured using similar pattern matching techniques
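A rough feel for these techniques, with simpler stand-ins swapped in: token-sequence alignment from Python's `difflib` replaces ScanCode's multi-pattern matching, and a single regular expression replaces the copyright grammar analyzer. The two reference texts, the threshold and all function names are assumptions made for this sketch.

```python
import re
from difflib import SequenceMatcher

# Tiny stand-in for the thousands of reference texts, notices and examples.
LICENSE_RULES = {
    "mit": "permission is hereby granted free of charge to any person "
           "obtaining a copy of this software",
    "apache-2.0": "licensed under the apache license version 2.0",
}

# Simplified patterns; real copyright statements take many more forms.
COPYRIGHT_RE = re.compile(
    r"copyright\s+(?:\(c\)\s*)?(\d{4}(?:\s*-\s*\d{4})?)\s+(.+)", re.IGNORECASE)
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def tokenize(text):
    """Lowercase words and version-like numbers such as '2.0'."""
    return re.findall(r"[a-z0-9]+(?:\.[0-9]+)*", text.lower())


def rule_coverage(rule_tokens, text_tokens):
    """Fraction of the rule's tokens found, in order, in the text."""
    matcher = SequenceMatcher(None, rule_tokens, text_tokens, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(rule_tokens)


def detect_licenses(text, threshold=0.8):
    """Return (license key, score) pairs; a score of 1.0 is an exact match."""
    text_tokens = tokenize(text)
    hits = []
    for key, rule in LICENSE_RULES.items():
        score = rule_coverage(tokenize(rule), text_tokens)
        if score >= threshold:
            hits.append((key, round(score, 3)))
    return sorted(hits, key=lambda hit: -hit[1])


def find_copyrights(text):
    """Return (years, holder) pairs for simple copyright statements."""
    return [(years, holder.strip()) for years, holder in COPYRIGHT_RE.findall(text)]


def find_emails(text):
    return EMAIL_RE.findall(text)
```

The coverage score is what allows a notice with a few changed words to still be reported, at the cost of needing a threshold to tune.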
Alternatives and complements
▷ Open source such as:
○ FOSSology (C, PHP): regex-based
○ Ninka (Perl): regex and sentence-based
○ OSLC (Java, unmaintained)
▷ Commercial
▷ Complementary:
○ AboutCode: document origin side-by-side with code, collect inventory, generate attribution doc
○ TraceCode (not yet released): traces the source-to-binary transformation to find (static) linking and which subset of the source code is used, either by dynamically tracing a build or by static analysis