Watching the Detectives
Forensic Information in Digital Objects (FIDO)
KCL Facts
• 5 million archives (including artefacts, images, sound recordings and databases)
• 295,000 rare/special books
• Spans 6 centuries (most from 18th century onwards)
• Wide range of subjects, formats and languages
• Internationally and nationally recognised
• Whole collection valued at £81,000,000
• Liddell Hart Centre for Military Archives and Foyle Special Collections Library
Information Management Team
Responsible for advice and support for:
• Content creation
• Active management during business use
• Retention for legal or business purposes
• Digital archiving and preservation
JISC FIDO Project
• 6-month project in 2011
• Investigation of tools to aid data acquisition, file identification & process documentation
• Case study to report findings & lessons learnt
• Mapping of forensic terms to archival terms
• Address ethical issues of the approach
• Establish suitable computer hardware and tools to assist in newly defined digital acquisition process
Why digital forensics?
• Forensic investigation is an emerging profession developing tools that map user activity to legal admissibility standards
• Digital collections can be large and difficult to appraise – forensic tools can provide analysis of file characteristics and document what is done & when
• Forensic tools can provide contextual information such as a timeline or file types for initial appraisal
• Authenticity – Archivists need to capture authentic digital collections - forensic tools can support this process
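A minimal sketch of the "document what is done & when" idea, assuming a simple append-only log with one JSON record per line. The function name and the field names (action, target, operator) are hypothetical and are not taken from any forensic tool used in the project.

```python
import json
from datetime import datetime, timezone


def log_action(log_path, action, target, operator):
    """Append one timestamped record describing a processing step."""
    entry = {
        # UTC timestamp so records from different machines can be compared
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,      # hypothetical field: what was done
        "target": target,      # hypothetical field: which item it was done to
        "operator": operator,  # hypothetical field: who did it
    }
    # Append mode keeps earlier records intact
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry
```

Each call adds one line to the log, so the file can later be read back as a record of the steps taken during acquisition and appraisal.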
Digital forensics vs Digital appraisal
• Different language – terms mean different things to each practitioner
• Confidence & skills – Digital archive skills are much closer to forensics or IT than traditional skills
• Forensic investigators deal with a potential crime scene – archivists work with the co-operation of the depositor
• Forensic investigators want all available information, including deleted documents & browser history, whereas archivists may only have consent to take files defined by the donor
Ethical Issues
• Does the depositor know the collection?
• A forensic image will capture everything!
• Is e-mail included in the deposit?
• Do all family members agree to the deposit?
• Does the depositor own the copyright?
• Is there unpublished work that might be published after deposit?
• Are computers included or just their contents?
Technical Issues
• Data transfer or recovery
• Level of rights required for tasks
• Additional hardware/software familiarisation
• New skills for archives staff
• Redaction
• Finding new software for particular tasks
Data handling workflow
1. Acquire: Obtain data from depositor / donor
2. Analyse: Examine the acquired data to locate user-generated content
3. Appraise: Appraise data to select data of potential value to the institution
4. Archive: Transfer selected data into digital repository for curation & preservation
Data Acquisition Methods
1. File copy: Files are copied/moved from the donor’s media to AIM-owned storage, e.g. FTP, DVD-R, hard disk
2. Disk clone: Bit copy of files on source disk copied to mirror disk
3. Disk image: Bit copy of disk is created and stored as a file on other media.
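The disk image method (3) can be sketched as a byte-for-byte copy of a source into a single file. This is an illustration only, not the tooling used in the project: it assumes the source is readable as a file (for example a raw device path behind a write blocker), and the SHA-256 checksum is an added assumption for confirming that the image matches what was read. Dedicated imaging tools also handle read errors and record metadata, which this sketch does not.

```python
import hashlib


def create_image(source, image_path, chunk_size=1024 * 1024):
    """Copy source byte-for-byte into image_path and return a SHA-256 digest."""
    digest = hashlib.sha256()
    with open(source, "rb") as src, open(image_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # end of source reached
                break
            dst.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()


def verify_image(image_path, expected_digest, chunk_size=1024 * 1024):
    """Re-read the image and check it still matches the digest taken at capture."""
    digest = hashlib.sha256()
    with open(image_path, "rb") as img:
        for chunk in iter(lambda: img.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_digest
```

Storing the digest alongside the image lets later checks confirm that the image has not changed since capture.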
Different Hardware, Different Media
[Flowchart: choosing how to capture a disk image]
Starting question: What type of media do you wish to image?
• Removable media (e.g. floppy, CD-ROM, USB stick, etc.)
• Hard disk
Decision points:
• Is the disk installed in a computer? (Yes/No)
• Do you have permission to remove the disk from the machine & is it physically possible? (Yes/No)
• What type of connectors does it have? (ATA/IDE or SATA; other)
• Are you able to boot from disk/optical media & perform capture? (Yes/No)
• Does the machine possess appropriate ports (e.g. USB/Firewire) to allow connection of an external HD? (Yes/No)
• Are you able to perform a network capture? (Yes/No)
Actions:
• Obtain appropriate reader device
• Locate media reader & create disk image
• Install into portable disk enclosure
• Boot from media & perform imaging
• Capture disk image using network capture
• Perform capture via host system
• Copy files to disk. Notify donor that some content may be missed
Data held on digital media
• Types:
– Operating system files, e.g. Windows has 30,000+ after fresh install
– Software: Applications, utilities, games, etc.
– Log data: Windows Registry, browser cache, cookies, temp files
– User-generated content: Documents, images, sound, emails, etc.
• Data layers:
1. Active data: Information normally seen by Operating System
2. Inactive/residual data: deleted or modified data
• Deleted files located in unallocated space that have yet to be overwritten (retrieved using undelete application)
• Data fragments that contain information from a partially deleted file (retrieved through carving)
Usefulness of inactive data still to be seen
Active Data Analysis
Common techniques:
• Navigate directory structure to get a ‘feel’ for data files held on disk
• Search by:
– File name, e.g. *report*
– File type, e.g. *.doc, *.pdf, etc.
– Creation/modification date
– Content type, e.g. word usage
– File size
• Windows search does not identify everything
• Investigation process leaves artefacts, e.g. thumbs.db, behind
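The search criteria above (name, type, date, size) can be sketched with the Python standard library. This is an illustrative sketch, not the forensic software used in the project; the function name and parameters are hypothetical. Because browsing leaves artefacts behind, it assumes the search is run against a working copy or a write-blocked source.

```python
import fnmatch
from datetime import datetime
from pathlib import Path


def search_files(root, name_pattern="*", modified_after=None, min_size=0):
    """Return files under root that match a name pattern, date and size filter."""
    results = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        # Case-insensitive wildcard match, e.g. "*report*" or "*.doc"
        if not fnmatch.fnmatch(path.name.lower(), name_pattern.lower()):
            continue
        info = path.stat()
        if info.st_size < min_size:
            continue
        # Compare the last-modified time (local time) against the cut-off
        modified = datetime.fromtimestamp(info.st_mtime)
        if modified_after is not None and modified <= modified_after:
            continue
        results.append(path)
    return sorted(results)
```

For example, search_files(root, "*.doc", modified_after=datetime(2005, 1, 1)) would list Word documents modified after that date. Content-based criteria such as word usage are not covered here.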
OS Forensic search interface for active files
Sort by:
• Name
• Folder
• Size
• Type
• Creation date
• Modification date
Recovering deleted files
Recovering partial/complete files
• Undelete/file recovery software searches unallocated space and makes found files available.
Recovering data fragments
• Data carving technique – raw bits of disk analysed to identify recognisable patterns that may indicate a data file, e.g. header/footer, semantic information.
– Carving software designed to take a linear approach to locating data files – ineffective on fragmented disks
– Creates Franken-Files! – incomplete files, large files containing info from multiple sources, extracts embedded images from PowerPoint files, etc.
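Header/footer carving can be illustrated with a deliberately simplified sketch. It assumes JPEG files, which start with the bytes FF D8 FF and end with FF D9, and it scans the raw bytes linearly from each header to the next footer. The function name and the max_size limit are hypothetical, and real carving tools use many more signatures and checks.

```python
JPEG_HEADER = b"\xff\xd8\xff"  # start-of-image marker plus first marker byte
JPEG_FOOTER = b"\xff\xd9"      # end-of-image marker


def carve_jpegs(raw, max_size=10 * 1024 * 1024):
    """Return byte strings running from each JPEG header to the next footer."""
    carved = []
    position = 0
    while True:
        start = raw.find(JPEG_HEADER, position)
        if start == -1:
            break
        end = raw.find(JPEG_FOOTER, start + len(JPEG_HEADER))
        if end == -1:
            break  # header without a footer: incomplete file, stop
        end += len(JPEG_FOOTER)
        if end - start <= max_size:
            carved.append(raw[start:end])
        position = end  # linear approach: continue after this footer
    return carved
```

Because the scan is linear, anything stored between a header and its footer is swept into the carved file. On a fragmented disk this is how data from other files ends up inside a "Franken-File".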
Keyword Search
• Scan the content of a disk, including all emails, documents and other text content, to locate a particular search term
• Commonly used by police to identify illegal content, e.g. bank numbers, telephone numbers
Archival use:
• Does the disk contain reference to topic X?
• What trends may be identified in use of concept – when did term appear and disappear?
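The archival questions above can be approximated with a short sketch that counts, per year, the files containing a term. It makes two loud assumptions: files are readable as plain text (emails, office documents and deleted data would need dedicated parsers or forensic tools), and the last-modified date stands in for when the term was in use. The function name is hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path


def keyword_trend(root, term):
    """Count files containing term, grouped by year of last modification."""
    needle = term.lower()
    hits = Counter()
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            # Undecodable bytes are skipped rather than raising an error
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it
        if needle in text.lower():
            modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
            hits[modified.year] += 1
    # Sorted by year so the first and last keys show when the term appears and disappears
    return dict(sorted(hits.items()))
```

An empty result suggests the disk holds no plain-text reference to the topic. Otherwise the earliest and latest years give a rough indication of when the term was in use.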
Analysis of research behaviour
• Hard disk may contain other information:
– Web sites visited/bookmarked for research
– Chat logs indicating discussion with colleagues
– Other digital media that may have been used to store data
This may be useful for understanding researcher work process, but consider the ethical issues
[Flowchart: choosing an analysis approach]
Starting question: What type of information do you wish to locate on the drive?
• Specific information on a topic
• User-created data files
• Information about other media on which data may be stored
Decision points:
• Do you know what keywords should be used? (Yes/No)
• What level of analysis are you permitted to perform? (Only readily available (active) files; available & deleted files; full search including active, deleted & fragments)
• Do you have any additional criteria for user content? (Data created/modified before/after/between a set date; specific object types/formats; none)
Actions:
• Contact/research donor
• Create & search index
• Perform search of active & inactive (deleted) files
• Perform file search of common file types
• Perform file search of specific file types
• Perform file search with additional date parameters
• Examine event logs for devices connected/disconnected
Forensic Hardware
(1) Desktop PC
Intel Pentium Dual Core E5800 CPU (3.20 GHz)
2GB DDR
500GB HD
Super multi DVD-RW
(2) USB Write Blocker
Prevents OS writing to connected devices
(3) Drive enclosure
Enables connection of internal ATA/SATA disks via USB
(4) Kryoflux USB
Floppy disk controller to enable attachment of disparate disk devices & forensic imaging
Access to digital collections
• Publication of summary guide
• Folder hierarchy to give overview of collection
• Ability of researchers to search across file lists/index to identify information
• Access to whole digital collection?
• Policy on number of files, level of access and copies still to be determined
Next steps
• Working with desktop support to capture images
• Drafting new advice for depositors
• Encouraging depositors to deposit their digital records
• Working with College senior staff to capture their personal papers and research data throughout their career
• Improving skills within the AIM team – especially Mac skills
• Preserving digital records in our collections