Create schema 311_2020;

#### This statement tells MySQL to use the 311_2020 schema. You can accomplish the same task in MySQL Workbench by right-clicking the schema label ("311_2020") and choosing the "Set as Default Schema" option
#### from the shortcut menu.

use 311_2020;
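
#### A quick check, added here as a small extra step: MySQL's DATABASE() function returns the current default schema, which should now be 311_2020.
select database();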

#### Before you import a file into MySQL, you have to create a table, which we will do here programmatically with the CREATE TABLE statement.
#### You need the following information: the name of the table, which follows the keywords "CREATE TABLE" (the keywords are not case-sensitive), and, underneath CREATE TABLE,
#### the name and definition of each table column, separated by commas.
#### This CREATE TABLE statement has the bare minimum you need to create a table: a table name and column details.
#### Parentheses enclose the entire table definition, that is, all the columns. The column definitions are separated by commas, with no comma after the last one.
#### Each column name is followed by the column's data type, much like formatting values in Excel. You'll find a description of the data types on page 83 in Chapter 5 of The Data Journalist.
#### This query uses two data types. The first is "varchar", short for "variable character". It is the most common data type, in large part because it accommodates text, numbers and many other kinds of characters.
#### The number in parentheses sets the maximum number of characters the column can hold. If, for example, the column holds a one-digit code, the data type would be "varchar(1)".
#### Columns that hold long titles need more characters, as is the case with the Subject, Reason and Type columns.
#### The second data type in this query is "date", which MySQL stores in YYYY-MM-DD format.
#### After the last column definition for the csv file you're about to import, omit the comma, close the parentheses and end the statement with a semicolon.

CREATE TABLE if not exists 311_master(
Subject varchar(200),
Reason varchar(200),
Type varchar(100),
Date_raised date,
Channel varchar(20),
ward varchar(50)
);
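
#### To confirm the table was created as intended, DESCRIBE lists each column with its data type (for example, varchar(200) for Subject and date for Date_raised).
describe 311_master;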

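#### This statement drops (deletes) the table if you make a mistake. It is commented out so that running the whole script doesn't delete the table you just created; remove the leading "#" and run it on its own.
#### After dropping the table, you can recreate it by running the CREATE TABLE query again: place your cursor in it and click the lightning-bolt icon with the "I" in the middle, which executes the statement under the cursor.
# drop table if exists 311_master;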
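
#### If the table's structure is fine but a load went wrong (for example, a file was imported twice), an alternative is TRUNCATE TABLE, which deletes all the rows but keeps the column definitions, so you can simply re-run the LOAD DATA statements.
#### Like the DROP statement, it is commented out here; remove the "#" to run it.
# truncate table 311_master;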

#### We need the LOAD DATA LOCAL INFILE statement to import the data from your hard drive. As we explain on page 88 of The Data Journalist, LOAD DATA LOCAL INFILE is extremely fast and can import a
#### table with millions of records within seconds or minutes, depending on your hard drive's speed. Since we're already using a pre-existing file for this exercise, there's no need to import anything from your hard drive.
#### The syntax for LOAD DATA LOCAL INFILE is available in the MySQL help file, or on page 10 of the "Making Tables and Importing Data into MySQL" tutorial that accompanies Chapter 5.
#### What we've done is load a csv file for each year into the same table. This works because every year's file is structured the same way, with the same columns in the same order. Otherwise, we would have to do some clean-up first.
#### Each statement skips the header row (IGNORE 1 LINES), reads the date into the variable @date_raised, and converts it with STR_TO_DATE(), because the files store dates in a form such as 15-Jan-20 ('%d-%b-%y') rather than MySQL's YYYY-MM-DD.
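
#### LOAD DATA LOCAL INFILE only works if both the MySQL server and your client allow it. In MySQL 8.0 the server setting local_infile is off by default, so a "Loading local data is disabled" error usually means it needs switching on.
#### A minimal check, assuming you have administrator rights on your own MySQL server (remove the "#" on the SET line to turn the setting on):
show global variables like 'local_infile';
# set global local_infile = 1;
#### In MySQL Workbench, the client side is typically enabled by adding OPT_LOCAL_INFILE=1 under Advanced > Others in the connection settings.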

LOAD DATA LOCAL INFILE
'c:\\SR-2020.csv'
into Table 311_master Fields terminated by ',' enclosed by '"' Lines terminated by '\n'
Ignore 1 Lines
(Subject, Reason, Type, @date_raised, Channel, ward)
set date_raised = str_to_date(@date_raised,'%d-%b-%y');

load data local infile
'c:\\SR-2019.csv'
into Table 311_master Fields terminated by ',' enclosed by '"' Lines terminated by '\n'
Ignore 1 Lines
(Subject, Reason, Type, @date_raised, Channel, ward)
set date_raised = str_to_date(@date_raised,'%d-%b-%y');

load data local infile
'c:\\fullsr-2018.csv'
into Table 311_master Fields terminated by ',' enclosed by '"' Lines terminated by '\n'
Ignore 1 Lines
(Subject, Reason, Type, @date_raised, Channel, ward)
set date_raised = str_to_date(@date_raised,'%d-%b-%y');

load data local infile
'c:\\fullsr-2017.csv'
into Table 311_master Fields terminated by ',' enclosed by '"' Lines terminated by '\n'
Ignore 1 Lines
(Subject, Reason, Type, @date_raised, Channel, ward)
set date_raised = str_to_date(@date_raised,'%d-%b-%y');

load data local infile
'c:\\fullsr-2016.csv'
into Table 311_master Fields terminated by ',' enclosed by '"' Lines terminated by '\n'
Ignore 1 Lines
(Subject, Reason, Type, @date_raised, Channel, ward)
set date_raised = str_to_date(@date_raised,'%d-%b-%y');
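
#### Two optional sanity checks on the import, sketched here rather than taken from the original exercise.
#### STR_TO_DATE() returns NULL when a date is missing or doesn't match the '%d-%b-%y' pattern, so this count should be 0 if every date converted.
select count(*)
from 311_master
where Date_raised is null;

#### If the csv files were saved with Windows line endings, LINES TERMINATED BY '\n' leaves a hidden carriage return at the end of the last column, ward. Assuming that might be the case,
#### this count shows how many rows are affected; if it isn't 0, re-import with LINES TERMINATED BY '\r\n'.
select count(*)
from 311_master
where ward like '%\r';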
 

#### Once we have loaded all the csv files into our master table, we can begin running queries, the first being a count to see how many records the table contains.
#### The SELECT query is discussed on pages 90-91 of The Data Journalist. A select query has six main clauses (not case-sensitive), which must appear in this order: SELECT; FROM; WHERE; GROUP BY; HAVING; ORDER BY.
#### Only the first two clauses, SELECT and FROM, are required. Because we may not want to load an entire database with a million-plus records, we can put a LIMIT on the number of rows returned; in this case, 1,000.
#### LIMIT 1000 restricts the output to 1,000 rows, which is enough to check that all the data types are correct and that everything in the table is where it should be.
 
select count(*)
from 311_master;

select *
from  311_master
limit 1000;
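
#### To see all six clauses in their required order in one statement, here is a sketch; the 2019 date cutoff and the 1,000-complaint threshold in HAVING are arbitrary values chosen for illustration.
#### WHERE filters individual rows before they are grouped, while HAVING filters the groups after they are counted. LIMIT, if used, comes last.
select Type, count(*) AS 'Number of Complaints'
from 311_master
where Date_raised >= '2019-01-01'
group by Type
having count(*) > 1000
order by count(*) desc
limit 10;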

#### As we saw in Excel, we rarely want to use the entire table; instead, we drill down to subsets of the data by filtering, sorting or using pivot tables.
#### In MySQL, the WHERE clause is the filter where the heavy lifting is done: it sets the conditions a row must meet to appear in the results. If you're unsure which values from your table to use in the WHERE clause, simply run the query above again.

select *
from 311_master
limit 1000;
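
#### A WHERE condition doesn't have to be a text search. As a sketch, this one keeps only the complaints raised in 2019; dates are written in MySQL's YYYY-MM-DD format and, like text, go inside single quotes.
select *
from 311_master
where Date_raised between '2019-01-01' and '2019-12-31'
limit 1000;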

#### We can see that the most interesting information is contained in the Type column. Clicking the "Type" header in the result grid sorts the column alphabetically, which makes it easier to see
#### the different categories while scrolling down. The term "Animal" is combined with other descriptions such as "Lost Pet" or "Too Many". There are also lots of dog complaints, and more categories the deeper
#### you descend into the alphabet. Since animal complaints seem common in the first 1,000 rows, it might be interesting to query all the animal-related complaints in the entire dataset.
#### To do this, we need a WHERE clause. To grab all the animal complaints, we'll use a wildcard search in the WHERE clause, as you can see in the syntax below, which is also described
#### on page 93 of The Data Journalist. In this case, the condition is WHERE Type LIKE '%animal%'. LIKE makes it possible to search for text within a field using the % sign as a wildcard standing for any sequence of characters;
#### placing it on both sides of your keyword finds the keyword anywhere in the field. With MySQL's default settings LIKE is not case-sensitive, so '%animal%' also matches "Animal".
#### The keyword and the % signs around it must all be within single quotes. Stick to single quotes: they are the SQL standard for text, while double quotes only behave the same way in MySQL's default mode.
#### Now, let's run it. 

select *
from 311_master
where Type like '%animal%';
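
#### Where you put the % sign matters. As an illustration, dropping the leading % returns only the types that begin with "animal", rather than those that contain it anywhere.
select *
from 311_master
where Type like 'animal%';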

#### We get a table with 12,114 rows. Each row represents a complaint.
#### If we were interested in a subset of these complaints, for instance "Lost Pet", we could modify our WHERE clause in a new query.

select *
from 311_master
where Type like '%Lost Pet%';
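
#### Conditions can also be combined with AND. As a sketch, this narrows the lost-pet complaints to those raised in 2019, using the YEAR() function to pull the year out of each date.
select *
from 311_master
where Type like '%Lost Pet%'
and year(Date_raised) = 2019;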

#### The result shrinks to 4,020 records, or complaints, roughly a third of the 12,114 rows the animal query returned. It's important to stress that each record represents a complaint, not a person.
#### Given that the categories in the Type column seem to yield the best results, it might be helpful to group the next query by type of complaint and count the number of complaints in each category.
#### This query allows us to determine the most common complaints and whether animals make the top 10 or top 20.
#### To do this, we will use a GROUP BY clause, covered on pages 98-99 of The Data Journalist. It might help to understand GROUP BY by comparing it to a pivot table in Excel:
#### the same grouping and counting is on display, as you can see in the query below.
#### In it, we count the number of records with count(*) and select the Type column from the 311_master table; then we GROUP BY Type. Every column in the SELECT clause that isn't inside a function such as count() must also appear in the GROUP BY clause.
#### Finally, we order the count in descending (DESC) order.

select count(*), type
from 311_master
group by type
order by count(*) desc;
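
#### To answer the top-10 question directly, add LIMIT after ORDER BY; changing 10 to 20 gives the top 20.
select count(*), type
from 311_master
group by type
order by count(*) desc
limit 10;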

#### We get the result we want, but the count column is awkwardly named. Let's fix that by giving count(*) an alias, 'Number of Complaints', which must be surrounded by quotes because it contains spaces.
#### Aliases are typically assigned with the AS keyword, as you can see in the query below.
#### You can omit the AS keyword, but an alias still needs quotes if it contains spaces.

select count(*) AS 'Number of Complaints', Type
from 311_master
group by type
order by count(*) desc;
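
#### Here is the same query with the AS keyword omitted. To refer to an alias later in the query, for example in ORDER BY, wrap it in backticks (`) rather than single quotes;
#### in single quotes MySQL reads it as a fixed piece of text, and the results would not be sorted.
select count(*) 'Number of Complaints', Type
from 311_master
group by Type
order by `Number of Complaints` desc;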

#### That's a better result. See if you can write another query using the same steps to replace "Type" with a more descriptive name.

#### Grouping and counting by Type makes it possible to identify the most common complaints, say the top 10 or top 20.
#### But you can also use a WHERE clause to narrow that grouping.
#### Let's stick with the animal example, since you may want only the groups that have anything to do with animals.

select count(*) AS 'Number of Complaints', Type
from 311_master
where Type like '%animal%'
group by type
order by count(*) desc;
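
#### To check how many distinct animal categories there are without counting rows of output by eye, COUNT(DISTINCT ...) counts each unique value once; given the result above, it should return 5.
select count(distinct Type) AS 'Animal complaint types'
from 311_master
where Type like '%animal%';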

#### We get five results, with the last category being the most bizarre.

#### Let's try another combination of queries, beginning with a simple select query that produces all the dog-related complaints.

select *
from 311_master
where Type like '%dog%';

#### We get 35,927 results.
#### The query is okay, but it could be more refined so we can see whether there's a trend in dog-related complaints. Again, think pivot tables: we might want to group and count
#### the complaints by year to see whether the numbers are increasing or decreasing.
#### In the SELECT clause, create a column that counts each record, then use the YEAR() function to pull the year out of each date, and give both new columns aliases.

select count(*) AS 'Number of dog complaints', year(Date_raised) AS Year
from 311_master
where Type like '%dog%'
group by Year
order by count(*) desc;  # a quoted alias such as 'Number of dog complaints' here would be read as fixed text and would not sort the results
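
#### Sorting by count shows the busiest years first. To read the trend from one year to the next, a sketch that sorts the same results chronologically instead:
select count(*) AS 'Number of dog complaints', year(Date_raised) AS Year
from 311_master
where Type like '%dog%'
group by Year
order by Year;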

#### Given that 2020 is still an incomplete year, we can see that the numbers dipped from 2016 to 2017, and then increased sharply from 2018 to 2019.
#### Though we should always be suspicious of sharp increases or decreases, we could be looking at a potential story that seeks to find the reasons behind the increase.
#### But let's refine the search a bit more by grouping the complaints by Ward, in addition to Year.
#### To do this, we have to include the Ward column in the SELECT clause and also add it to the GROUP BY clause.
 
select count(*) AS 'Number of dog complaints', year(Date_raised) AS Year, Ward
from 311_master
where Type like '%dog%'
group by Year, Ward
order by count(*) desc;  # as above, sort by count(*) rather than by the alias in single quotes
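
#### To follow each ward's trend over time, one option is to sort by ward and then by year, so that each ward's yearly counts sit together.
select count(*) AS 'Number of dog complaints', year(Date_raised) AS Year, Ward
from 311_master
where Type like '%dog%'
group by Year, Ward
order by Ward, Year;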

#### Now that you've hopefully got the hang of writing simple select queries, try some of your own.
#### If you're unsure of the categories in the columns, re-run the query near the top that returns the first 1,000 rows (select * from 311_master limit 1000;).