yuwu chen wastewater treatment

The urban wastewater treatment

Yuwu Chen

Department of Chemical Engineering

12/4/2014


Post on 18-Jul-2015



Introduction

Wastewater treatment is the process of removing contaminants from wastewater.

Introduction

Water quality indices

Chemical oxygen demand (COD): the amount of oxygen equivalent to the quantity of a strong chemical oxidizing agent consumed in breaking down the organic material present in a given water sample at a specified temperature over a specified time period.

Biochemical (or biological) oxygen demand (BOD): the amount of dissolved oxygen needed by aerobic microorganisms in a body of water to break down the organic material present in a given water sample at a specified temperature over a specified time period (for example, the standard BOD5 test runs for 5 days at 20 °C).

Both indirectly measure the amount of organic compounds in water, so COD and BOD should be correlated. COD is normally higher than BOD, because the chemical oxidant also breaks down organic matter that microorganisms cannot degrade.
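A minimal Python sketch of how the expected COD/BOD correlation could be checked with the Pearson correlation coefficient. The readings below are hypothetical values chosen for illustration, not measurements from the dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical daily readings in mg/L (illustrative only, not from the dataset).
cod = [400.0, 520.0, 310.0, 610.0, 450.0]
bod = [190.0, 250.0, 140.0, 300.0, 210.0]
print(f"r(COD, BOD) = {pearson(cod, bod):.3f}")
```

A value close to 1 would support the expectation that the two indices move together; on real data, the missing values would have to be removed first.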

Suspended solids (SS)

Volatile suspended solids (SSV)

Sediments (SED)

Inorganic elements (N-NH3, P, S, etc.)

pH

These directly measure the amount of a specific contaminant in water (pH measures acidity).

Data Description

The dataset consists of daily sensor measurements from an urban wastewater treatment plant.

The data were collected by Manel Poch at the Universitat Autonoma de Barcelona, Bellaterra (Barcelona), Spain.

The full dataset was donated by Javier Bejar and Ulises Cortes of the Universitat Politecnica de Catalunya, Barcelona, Spain, and is available at:

http://archive.ics.uci.edu/ml/machine-learning-databases/water-treatment/

Data Description

Date

In dd/mm/yy format, from 1/1/90 to 30/10/91. Some days in this period are missing.

Water volume

The daily inflow to the plant, in m³: 10005 to 60081.

Water quality indices (28 variables)

The water quality indices were recorded before and/or after the process steps: BOD, COD, SS, SSV, SED, ...

Performance (9 variables)

The performance variables are calculated directly from the water quality indices and can be used to evaluate the performance of each process unit. Their values range from 0.6% to 100%.
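The slides do not give the formula behind the performance variables. A common definition, assumed here, is the percentage removal of a contaminant between two sampling points, 100 × (in - out) / in. The following Python sketch implements that assumption with hypothetical concentrations.

```python
def removal_efficiency(conc_in, conc_out):
    """Percentage removal between an inlet and an outlet sampling point.

    Assumed definition (not confirmed by the slides or the dataset notes):
    performance = 100 * (in - out) / in.
    """
    if conc_in <= 0:
        raise ValueError("inlet concentration must be positive")
    return 100.0 * (conc_in - conc_out) / conc_in


# Hypothetical example: BOD drops from 200 mg/L at the inlet to 20 mg/L at the outlet.
print(removal_efficiency(200.0, 20.0))  # 90.0
```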


Data Management

Data transformation

The original variable “date” is stored as text and is unwieldy, so I transformed it into an integer day index “day” that counts days from 1/1/1990:

date          day
1/1/1990      1
2/1/1990      2
...
30/10/1991    668

Then set the row names of the data frame to the variable day.
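The mapping from “date” to “day” in the table matches a calendar-day count starting at 1/1/1990: 1990 has 365 days and 30/10/1991 is day 303 of 1991, giving 365 + 303 = 668. A minimal Python sketch of that conversion, assuming the dates are written as dd/mm/yyyy strings. Because the index counts calendar days rather than rows, missing days leave gaps instead of shifting the numbers of later days.

```python
from datetime import date, datetime

ORIGIN = date(1990, 1, 1)  # first day of the study period, mapped to day 1


def day_index(text):
    """Convert a 'dd/mm/yyyy' date string to a day number with 1/1/1990 -> 1."""
    d = datetime.strptime(text, "%d/%m/%Y").date()
    return (d - ORIGIN).days + 1


for s in ["1/1/1990", "2/1/1990", "30/10/1991"]:
    print(s, day_index(s))  # 1, 2 and 668, as in the table
```

For the two-digit years of the raw dd/mm/yy dates, the format string would be "%d/%m/%y"; Python maps the years 90 and 91 to 1990 and 1991.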

Correct the incorrectly formatted values in the variable BOD.in3.

Subset data

In this study, five water quality indices of each influent/effluent stream were used: pH, COD, BOD, SS, and SED.

Omit the observations with missing values in each subset.
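A minimal Python sketch of the subsetting and missing-value step using complete-case deletion (the same idea as R's na.omit() applied to each subset). It assumes the data have been read as rows of strings with missing values coded as "?" or left empty; the column names (such as pH.in1) and the numbers are hypothetical.

```python
import csv
import io


def complete_cases(rows, columns):
    """Keep only `columns` and drop every row with a missing value.

    rows: dicts mapping column name -> raw string (e.g. from csv.DictReader).
    Assumption: missing values are coded as "?" or as an empty field.
    """
    kept = []
    for row in rows:
        values = [row[c].strip() for c in columns]
        if any(v in ("?", "") for v in values):
            continue  # complete-case deletion: skip rows with any gap
        kept.append([float(v) for v in values])
    return kept


# Hypothetical extract; the column names and numbers are made up for illustration.
raw = """day,pH.in1,COD.in1,BOD.in1,SS.in1,SED.in1
1,7.8,407,?,188,5.5
2,7.7,443,229,196,4.5
3,7.6,528,?,?,4.0
4,7.9,588,260,250,5.0
"""
rows = list(csv.DictReader(io.StringIO(raw)))
influent1 = complete_cases(rows, ["pH.in1", "COD.in1", "BOD.in1", "SS.in1", "SED.in1"])
print(len(influent1))  # 2 of the 4 days are complete
```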

Process layout: influent1 >> Pretreatment >> influent2 >> Primary >> influent3 >> Secondary >> effluent

Data Summary

Pairs plot example: influent1 (influent to the pretreatment unit)

Method Description

Step 1: Principal component analysis (PCA) on each influent/effluent subset

Visualize the data to see the relationships among the observations and variables in low dimensions.
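As a hedged illustration of Step 1, the following Python sketch performs PCA on standardized data: it builds the correlation matrix, diagonalizes it with the Jacobi eigenvalue method (chosen here only to keep the example dependency-free), and returns the eigenvalues, loading vectors and scores in decreasing order of variance. It is not the author's code, and the data matrix in the test is made up.

```python
import math


def standardize(X):
    """Center each column of X and scale it to unit sample variance (no constant columns)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1)) for j in range(p)]
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]


def jacobi_eigen(A, tol=1e-20, max_sweeps=100):
    """Eigenvalues and eigenvectors of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(A)
    A = [row[:] for row in A]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [A[i][i] for i in range(n)], [[V[k][i] for k in range(n)] for i in range(n)]


def pca(X):
    """PCA on standardized data: (eigenvalues, loading vectors, scores), largest first."""
    Z = standardize(X)
    n, p = len(Z), len(Z[0])
    R = [[sum(z[i] * z[j] for z in Z) / (n - 1) for j in range(p)] for i in range(p)]
    values, vectors = jacobi_eigen(R)
    order = sorted(range(p), key=lambda i: values[i], reverse=True)
    values = [values[i] for i in order]
    loadings = [vectors[i] for i in order]
    scores = [[sum(z[j] * v[j] for j in range(p)) for v in loadings] for z in Z]
    return values, loadings, scores
```

In R, prcomp(x, scale. = TRUE) performs the same decomposition. In either case the sign of each loading vector is arbitrary, so a biplot may appear mirrored between implementations.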

Step 2: Clustering days based on the daily performance

Identify subgroups of similar days based on the daily performance of each process unit or of the whole plant.

Step 1: Principal component analysis (PCA) on the influent1 subset

Principal component loading vectors of influent1 (influent to the pretreatment unit)

Proportion of variance explained (PVE) by each PC and cumulative PVE

[Figure: two panels plotting the proportion of variance explained and the cumulative proportion of variance explained against principal components 1 to 6; y-axis from 0.0 to 1.0]
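The PVE of the m-th component is its eigenvalue divided by the sum of all eigenvalues, PVE_m = λ_m / Σ_j λ_j. For standardized data that sum equals the number of variables (six for influent1: volume plus five indices). A short Python sketch with hypothetical eigenvalues, not the values behind the figure:

```python
def pve(eigenvalues):
    """Proportion of variance explained by each PC and its cumulative sum."""
    total = sum(eigenvalues)
    props = [lam / total for lam in eigenvalues]
    cumulative, running = [], 0.0
    for prop in props:
        running += prop
        cumulative.append(running)
    return props, cumulative


# Hypothetical eigenvalues for six standardized variables (they sum to 6).
props, cumulative = pve([2.4, 1.5, 0.9, 0.6, 0.4, 0.2])
for m, (p, c) in enumerate(zip(props, cumulative), start=1):
    print(f"PC{m}: PVE = {p:.3f}, cumulative = {c:.3f}")
```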

Step 1: Principal component analysis (PCA) on the influent1 subset

Biplot for influent1

Step 1: Principal component analysis (PCA) on the other three influent/effluent subsets

Biplots for the other three influent/effluent subsets

[Figure: PC1 versus PC2 biplots, with observations labeled by day and arrows for the variables]

Influent2 (pretreatment >> primary): volume, pH.in2, BOD.in2, SS.in2, SED.in2

Influent3 (primary >> secondary): volume, pH.in3, BOD.in3, COD.in3, SS.in3, SED.in3

Effluent (out of plant): volume, pH.out, BOD.out, COD.out, SS.out, SED.out

Step 2: Clustering days based on the daily performance

What dissimilarity measure should be used to cluster the days?

If Euclidean distance is used, days on which a process unit (or the whole plant) has similar overall performance are clustered together. This is the behavior we want.

If correlation-based distance is used, days with similar “preferences” (e.g. days with better BOD and COD performance but worse SS and SED performance) are clustered together, even if some of those days have better overall performance than others.
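A small Python sketch of the difference, using hypothetical performance profiles. Day B has exactly the same pattern as day A but is 20 points worse on every index, while day C has nearly the same level as A with a slightly different pattern. Correlation distance puts A next to B; Euclidean distance puts A next to C.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def correlation_distance(a, b):
    """1 - Pearson correlation between two observation profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return 1.0 - sab / math.sqrt(saa * sbb)


# Hypothetical performance profiles (%) in the order BOD, COD, SS, SED.
day_a = [90.0, 85.0, 60.0, 55.0]
day_b = [70.0, 65.0, 40.0, 35.0]  # same pattern as day_a, 20 points worse everywhere
day_c = [88.0, 86.0, 58.0, 57.0]  # similar level to day_a, slightly different pattern

print("Euclidean:  ", euclidean(day_a, day_b), euclidean(day_a, day_c))
print("Correlation:", correlation_distance(day_a, day_b), correlation_distance(day_a, day_c))
```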

Scale to unit variance or not?

Yes. The data must be scaled; otherwise the water volume, measured in m³ with values in the tens of thousands, will dominate the distances.
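A Python sketch of why scaling matters, with three hypothetical days described by volume and two performance percentages. Without scaling, the volume difference decides which day is "closest"; after scaling each column to zero mean and unit variance, the near-identical performance of days 0 and 2 wins.

```python
import math


def scale_columns(X):
    """Center each column of X and scale it to unit sample variance.

    Assumes every column has a nonzero standard deviation.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1)) for j in range(p)]
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical days: [volume (m3), BOD performance (%), SS performance (%)].
days = [
    [35000.0, 90.0, 80.0],
    [35100.0, 40.0, 30.0],  # similar volume, much worse performance
    [50000.0, 91.0, 81.0],  # larger volume, almost identical performance
]
scaled = scale_columns(days)
print("raw:   ", euclidean(days[0], days[1]), euclidean(days[0], days[2]))
print("scaled:", euclidean(scaled[0], scaled[1]), euclidean(scaled[0], scaled[2]))
```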

Hierarchical clustering will be used.
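A naive agglomerative clustering sketch in Python, for illustration only; it scales poorly with the number of days, and R's hclust(dist(x), method = "average") is the practical route. The linkage argument selects how the distance between two clusters is derived from point distances: the minimum (single), the maximum (complete) or the mean (average), matching the three dendrograms that follow. The points are hypothetical scaled profiles.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# How the distance between two clusters is derived from the pairwise point distances.
LINKAGES = {
    "single": min,
    "complete": max,
    "average": lambda ds: sum(ds) / len(ds),
}


def agglomerate(points, linkage="average", n_clusters=1):
    """Naive agglomerative clustering.

    Repeatedly merges the two closest clusters until n_clusters remain.
    Returns (clusters, merges): clusters is a list of frozensets of point
    indices; merges lists (cluster_a, cluster_b, height) in merge order.
    """
    link = LINKAGES[linkage]
    d = [[euclidean(p, q) for q in points] for p in points]
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                h = link([d[a][b] for a in clusters[i] for b in clusters[j]])
                if best is None or h < best[0]:
                    best = (h, i, j)
        h, i, j = best
        merges.append((clusters[i], clusters[j], h))
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters = rest + [clusters[i] | clusters[j]]
    return clusters, merges


# Hypothetical scaled performance profiles: two well-separated groups.
pts = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0]]
for method in ("single", "complete", "average"):
    groups, _ = agglomerate(pts, method, n_clusters=2)
    print(method, sorted(sorted(g) for g in groups))
```

The merge heights recorded in merges are what a dendrogram draws on its "Height" axis.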

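K-means or K-medoids?

K-medoids is more robust than K-means in the presence of outliers, because each cluster is represented by an actual observation (its medoid) and the objective sums dissimilarities rather than squared Euclidean distances.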
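A simplified Python sketch in the spirit of PAM (a greedy BUILD phase followed by SWAP passes); it is not the cluster package's pam() implementation, which the slides use. The toy data are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def total_cost(d, medoids):
    """Sum over all points of the dissimilarity to the closest medoid."""
    return sum(min(d[i][m] for m in medoids) for i in range(len(d)))


def k_medoids(points, k):
    """Return (medoid indices, cluster labels, total cost)."""
    n = len(points)
    d = [[euclidean(p, q) for q in points] for p in points]
    # BUILD: add, one at a time, the point that most reduces the total cost.
    medoids = []
    for _ in range(k):
        candidates = [i for i in range(n) if i not in medoids]
        medoids.append(min(candidates, key=lambda c: total_cost(d, medoids + [c])))
    # SWAP: exchange a medoid with a non-medoid while that lowers the cost.
    cost = total_cost(d, medoids)
    improved = True
    while improved:
        improved = False
        for mi in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids[:mi] + [h] + medoids[mi + 1:]
                trial_cost = total_cost(d, trial)
                if trial_cost < cost - 1e-12:
                    medoids, cost, improved = trial, trial_cost, True
    labels = [min(range(k), key=lambda j: d[i][medoids[j]]) for i in range(n)]
    return medoids, labels, cost


# One extreme value pulls the mean far away but leaves the medoid among the typical values.
values = [[1.0], [2.0], [3.0], [100.0]]
medoids, _, _ = k_medoids(values, 1)
print(values[medoids[0]], sum(v[0] for v in values) / len(values))  # [2.0] 26.5
```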

Hierarchical clustering: Average linkage

[Figure: dendrogram of the days under average linkage; leaves labeled by day, y-axis: Height]

Hierarchical clustering: Complete linkage

[Figure: dendrogram of the days under complete linkage; leaves labeled by day, y-axis: Height]

Hierarchical clustering: Single linkage

[Figure: dendrogram of the days under single linkage; leaves labeled by day, y-axis: Height]

K-medoids clustering

Clustering 1: pam(x = sdata, k = k, diss = diss), 2 clusters

[Figure: clusplot of the clusters on components 1 and 2; these two components explain 80.33% of the point variability]

Silhouette plot: n = 430, average silhouette width 0.37

cluster j    size nj    average silhouette width
1            149        -0.01
2            281        0.57

Clustering 2: pam(x = globalscale2, k = 3), 3 clusters

[Figure: clusplot of the clusters on components 1 and 2; these two components explain 80.33% of the point variability]

Silhouette plot: n = 430, average silhouette width 0.19

cluster j    size nj    average silhouette width
1            91         -0.17
2            157        0.25
3            182        0.32
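The silhouette width of an observation is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is its mean distance to its own cluster and b(i) the smallest mean distance to another cluster. The overall average is the size-weighted mean of the cluster averages, which matches the reported values: (149 × (-0.01) + 281 × 0.57) / 430 ≈ 0.37 and (91 × (-0.17) + 157 × 0.25 + 182 × 0.32) / 430 ≈ 0.19. A negative cluster average (cluster 1 in both runs) suggests that many of its days are closer to another cluster than to their own. A minimal Python sketch on hypothetical points; R's silhouette() in the cluster package is the library equivalent.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_widths(points, labels):
    """Silhouette width s(i) = (b(i) - a(i)) / max(a(i), b(i)) for each point.

    a(i): mean distance to the other members of its own cluster.
    b(i): smallest mean distance to the members of any other cluster.
    Requires at least two clusters; singletons get s(i) = 0 by convention.
    """
    members = {}
    for i, lab in enumerate(labels):
        members.setdefault(lab, []).append(i)
    widths = []
    for i in range(len(points)):
        own = [j for j in members[labels[i]] if j != i]
        if not own:
            widths.append(0.0)
            continue
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(euclidean(points[i], points[j]) for j in idx) / len(idx)
            for lab, idx in members.items()
            if lab != labels[i]
        )
        widths.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return widths


pts = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
good = silhouette_widths(pts, [0, 0, 1, 1])
bad = silhouette_widths(pts, [0, 1, 1, 1])  # point 1 placed in the wrong cluster
print(sum(good) / len(good), bad)
```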

Conclusion

The water quality indices and flow volumes of the influent/effluent streams were visualized with PCA to show the relationships among the observations and variables in low dimensions.

Clustering methods (hierarchical clustering and K-medoids) were used to identify subgroups of similar days.
