Your name: Jerry Liu
Your UID: 404474229
Please upload only this notebook to CCLE by the deadline.
Policy for late submission of solutions: We will use Paul Eggert's Late Policy: $N$ days late $\Leftrightarrow$ $2^N$ points deducted} The number of days late is $N=0$ for the first 24 hrs, $N=1$ for the next 24 hrs, etc., and if you submit an assignment $H$ hours late, $2^{\lfloor \,H\,/\,24\, \rfloor}$ points are deducted.
Remember the $k$-th note in the the equal-tempered scale has frequency $c^k$ times the frequency of C, where $c = 2^{1/12} = 1.059463\ldots$:
note | frequency (Hz) | frequency/C | interval |
---|---|---|---|
Middle C | 261.63 | $c^0 = 1.000$ | ~ |
C# | 277.18 | $c^1 = 1.059$ | minor 2nd |
D | 293.66 | $c^2 = 1.122$ | major 2nd |
D# | 311.13 | $c^3 = 1.189$ | minor 3rd |
E | 329.63 | $c^4 = 1.260$ | major 3rd |
F | 349.23 | $c^5 = 1.334$ | 4th |
F# | 369.99 | $c^6 = 1.414$ | diminished 5th |
G | 392.00 | $c^7 = 1.498$ | 5th |
G# | 415.30 | $c^8 = 1.587$ | minor 6th |
A | 440 | $c^9 = 1.682$ | major 6th |
A# | 466.16 | $c^{10} = 1.782$ | minor 7th |
B | 493.88 | $c^{11} = 1.888$ | major 7th |
High C | 523.25 | $c^{12} = 2.000$ | octave/8th |
The equal-tempered scale can be extended to higher or lower octaves by multiplying the frequencies in it by a power of 2. (For example, the note 'A' occurs at frequencies 220, 440, 880, etc., because 440 = 2 $\times$ 220, 880 = 4 $\times$ 220, etc.
However, it is not right to call the frequency 660 = 3 $\times$ 220 an 'A'; the frequency 660 is an 'E', because 600 = 2 $\times$ 330, and 330 is the frequency for 'E'.) Only the frequencies $2^k \times 440$ (for integer $k$) are called 'A'.
Remember: if a signal is sampled at frequency Fs, then the Nyquist frequency at the end of this first half is Fs/2.
Read in the file mystery.wav, creating a vector y that contains a sound track (audio sequence) for the first few seconds of a mystery tune.
As discussed in class, if Fs is an integer sampling frequency. In Octave and Matlab, executing sound(y, Fs) will play y at the frequency Fs.
Change y to contain only the first of the 2 stereo audio tracks.
[y, Fs] = audioread('mystery.wav');
sound(y, Fs) % play the mystery tune
y = y(:,1); % only keep the first track for this assignment
n = length(y)
The length of y is $n = 86000$. Find the integer factors of $n$ by using the Matlab function factor().
Then find out how much cpu time it takes to compute fft(y);
(To do this, run it 100 times and compute the average time required, using cputime.)
factor(n)
time = zeros(100, 1);
for i = 1 : 100
start = cputime;
fft(y);
time(i) = cputime - start;
end
fprintf('Mean CPU time for fft(y): ')
mean(time)
First, define a new vector z that contains y(1:(n-13)). Find the integer factors of $(n-13)$.
Find out how much cpu time it takes to compute fft(z); using the method above.
Explain the difference in timing; this is discussed on pp.235-236 of the Course Reader.
z = y(1:(n-13));
factor(n - 13)
time = zeros(100, 1);
for i = 1 : 100
start = cputime;
fft(z);
time(i) = cputime - start;
end
fprintf('Mean CPU time for fft(z): ')
mean(time)
%% Explanation:
% Since (n - 13) is a prime number, we cannot use fast fourier transform on this number.
% On n, however, we have many small integer factors
% which permit divide-and-conquer algorithm to faster execution.
% This is why it takes much longer to compute.
Plot the frequency spectrum for y by plotting the power of the 'first half' of the squared power of its Fourier transform, using Fs as the sampling frequency.
Assume for this assignment that the power of a complex value $z$ is its complex absolute value: i.e., $\mbox{power}(z) = |z| = \sqrt{z \, \bar{z}}$.
Here by "first half" of a sequence $s =[s_0, \ldots, s_{n-1}]$ we actually mean $[s_1, \ldots, s_m]$ where $m = \lfloor n/2 \rfloor$. For example, the "first half" of $[0,1,2,3,4,5,6,7]$ is $[1,2,3,4]$.
To increase the information displayed by your plot, omit the very first element in the power vector (which is just the sum of the data).
firstHalf = 2 : (n / 2 + 1);
Y = fft(y);
power = abs(Y(firstHalf)) .^ 2; % the squared power of its Fourier transform
NyquistFrequency = Fs / 2;
frequencies = linspace(1, NyquistFrequency, n / 2);
plot(frequencies, power);
There are altogether about $(5 \times 12)+1 = 61$ notes in the equal-tempered scale with frequencies between low C (with frequency near 131) and the very high C with frequency near $131 \times 2^5 = 4192$.
Suppose we also define dividing lines between notes. For example, C has frequency 261.63 and C# has frequency 271.18. The dividing line between these two notes is $\Delta = (261.63+271.18)/2 = 266.40$.
Just as every note in the equal-tempered scale has frequency $c^k$ times the frequency of C, we can define equal tempered dividing lines between all adjacent notes as having frequency $c^k$ times the frequency $\Delta$.
Write a Matlab function note_power(power,Fs) that distills a power spectrum vector power into an $61 \times 1$ vector that gives, for each equal-tempered note, the maximum value among all entries in the vector within the dividing lines above and below this note. Find the top 10 notes (the 10 notes with highest power).
For the mystery clip, use your function to print the power value for each of these 10 notes.
function notes = note_power(power, Fs)
cVal = 2 ^ (1 / 12);
% I assume 131 is the lowest note, and compute the divide line between low C and low C# as the base delta
delta = (131 + 131 * cVal) / 2;
numNodes = 61;
C = ones(numNodes, 1) * cVal;
for i = 2 : numNodes
C(i) = C(i - 1) * cVal;
end
Delta = [1; delta * C];
len = length(power);
notes = zeros(numNodes, 1);
p = zeros(numNodes, 1);
for i = 1 : numNodes
start = ceil(Delta(i));
divLine = floor(Delta(i + 1));
searchInterval = start : divLine;
maxLocation = find(power(searchInterval) == max(power(searchInterval)));
notes(i) = power(maxLocation + start);
end
end
notes = note_power(power, Fs);
[sp, I] = sort(notes, 'descend');
top10notes = I(1:10) % ith note, 1st (low C), 2nd, ...
highestPower = sp(1:10)
Find the note whose spike in the power spectrum is largest; this is called the key.
For each of the top 10 notes, compute the ratio of their frequency to that of the key. Each ratio will be a power $c^k$; for each note, give the value of $k$.
A major fifth has a ratio $c^7$, which is approximately 3/2 -- the tonic/dominant ratio.
A minor fifth (diminished fifth) has a ratio $c^6$, which is approximately 1.4.
A major third has a ratio $c^4$, which is approximately 5/4 -- the ratio of harmony.
c = 2^(1/12);
% major_third = c^4
% minor_fifth = c^6
% major_fifth = c^7
lowCFreq = 131;
Notes = ['C' 'C#' 'D' 'D#' 'E' 'F' 'F#' 'G' 'G#' 'A' 'A#' 'B' 'B#'];
[val, key] = max(power);
k = round(mod(log(key / lowCFreq) / log(c), 12)) + 1;
fprintf('The key is: %s', Notes(k));
ratio = zeros(10, 1);
for i = 1 : 10
ratio(i) = find(power == highestPower(i)) / lowCFreq;
end
ratio
k = log(ratio) / log(c)
Sunspots have effects on power grids and communications on earth, and have had high intensity recently. They often appear in the news, and there is a sunspot observatory at Big Bear near Los Angeles.
In class we studied a classic example of Fourier analysis: determining the periodicity of sunspot data. The plot of the data shows it is clearly periodic.
% sunspot activity from 1700 to 1987.
years = 1700 : 1987;
sunspots = [ ...
5 11 16 23 36 58 29 20 10 8 3 0 0 2 11 27 ...
47 63 60 39 28 26 22 11 21 40 78 122 103 73 47 35 ...
11 5 16 34 70 81 111 101 73 40 20 16 5 11 22 40 ...
60 81 83 48 48 31 12 10 10 32 48 54 63 86 61 45 ...
36 21 11 38 70 106 101 82 67 35 31 7 20 93 154 126 ...
85 68 39 23 10 24 83 132 131 118 90 67 60 47 41 21 ...
16 6 4 7 15 34 45 43 48 42 28 10 8 3 0 1 ...
5 12 14 35 46 41 30 24 16 7 4 2 9 17 36 50 ...
64 67 71 48 28 9 13 57 122 138 103 86 65 37 24 11 ...
15 40 62 99 125 96 67 65 54 39 21 7 4 23 55 94 ...
96 77 59 44 47 31 16 7 38 74 139 111 102 66 45 17 ...
11 12 3 6 32 54 60 64 64 52 25 13 7 6 7 36 ...
73 85 78 64 42 26 27 12 10 3 5 24 42 64 54 62 ...
49 44 19 6 4 1 10 47 57 104 81 64 38 26 14 6 ...
17 44 64 69 78 65 36 21 11 6 9 36 80 114 110 89 ...
68 48 31 16 10 33 93 152 136 135 84 69 32 14 4 38 ...
142 190 185 159 112 54 38 28 10 15 47 94 106 106 105 67 ...
69 38 35 16 13 28 93 155 155 140 116 67 46 18 13 29 ...
];
plot( years, sunspots, 'b' )
title('Sunspot intensity from 1700 to 1987')
xlabel('year')
We can use Fourier analysis to find the frequency of the cycles shown in the plot above. Based on the script below, sunspots go through a cycle that has frequency (0.091/year) which is (1/(11 years)). Thus the period of sunspot activity is about 11 years long.
figure
plot( years, sunspots, 'b' )
title('Sunspot intensity from 1700 to 1987')
xlabel('year')
power_spectrum = abs(fft(sunspots));
n = length(sunspots);
sampling_frequency = 1; % one sample / year
frequencies = linspace( 0, sampling_frequency, n );
figure
plot( frequencies, power_spectrum, 'b' )
text( sampling_frequency/5, max(power_spectrum)*0.80, 'Note the symmetry (always obtained for real data)', 'fontsize', 6 )
text( sampling_frequency/5, max(power_spectrum)*0.70, 'Sampling frequency = once per year = (1 / year)', 'fontsize', 6 )
title( 'Fourier power spectrum of the sunspot data' )
xlabel('frequency (1/year)')
n_over_2 = floor(n/2);
figure
plot( frequencies(2:n_over_2), power_spectrum(2:n_over_2) )
title( 'Spectrum up to the Nyquist frequency' )
xlabel('frequency (1/year)')
% text( sampling_frequency/8, max(power_spectrum)*0.90, 'Nyquist frequency = sampling frequency / 2', 'fontsize', 6 )
% text( sampling_frequency/8, max(power_spectrum)*0.70, 'We ignore the 1st Fourier coefficient here;', 'fontsize', 6 )
% text( sampling_frequency/8, max(power_spectrum)*0.60, 'it is always just the sum of the input data.', 'fontsize', 6 )
search_interval = 2:floor(n/4); % skip the first coefficient, which is always the sum of the input data
spike_location = 1+find( power_spectrum(search_interval) == max(power_spectrum(search_interval)) );
spike_frequency = frequencies(spike_location);
period_of_sunspot_cycle = 1/spike_frequency;
figure
plot( frequencies(search_interval), power_spectrum(search_interval) )
text( spike_frequency*1.05, power_spectrum(spike_location)*1.00, ...
sprintf('Frequency f of largest spike (units: 1/year) = %6.3f',spike_frequency), 'fontsize', 5 )
text( spike_frequency*1.05, power_spectrum(spike_location)*0.90, ...
sprintf('1/f = period of sunspot cycle (units: year) = %5.2f',period_of_sunspot_cycle), 'fontsize', 5 )
title( 'Spike represents sunspot cycle frequency: once per 11 years' )
xlabel('frequency (1/year)')
hold on
plot( [spike_frequency spike_frequency], [0 power_spectrum(spike_location)], 'r--' )
hold off
Sunspot data since 1700 is available from the Belgian Royal Observatory at http://www.sidc.be/silso/datafiles. Daily, monthly, and yearly sunspot measures are available.
Define a function to read in a .csv file across the network using urlread(), writefile(), and csvread().
Use your function to read in the yearly and monthly data.
% There is no writefile() in MATLAB 2016b
function [yearly, monthly] = urlcsv(yearurl, monthurl)
year = urlread(yearurl);
tmp = textscan(year, repmat('%f',1,5),'delimiter',';');
year = horzcat(tmp{1:end});
dlmwrite('y.csv', year);
yearly = csvread('y.csv');
month = urlread(monthurl);
tmp = textscan(month, repmat('%f',1,5),'delimiter',';');
month = horzcat(tmp{1:end});
dlmwrite('m.csv', month);
monthly = csvread('m.csv');
end
year = 'http://www.sidc.be/silso/DATA/SN_y_tot_V2.0.csv';
month = 'http://www.sidc.be/silso/DATA/SN_m_tot_V2.0.csv';
[yearly, monthly] = urlcsv(year, month);
Is the yearly data close to the sunspots vector analyzed above? (up to 1987) Find the L2-distance between the two vectors.
filtered_data = yearly(:, 2)';
sunspots_belgia = filtered_data(yearly(:, 1) < 1988); % up to 1987
norm(sunspots_belgia - sunspots)
plot( years, sunspots_belgia, 'b' )
title('Belgian Sunspot intensity from 1700 to 1987')
xlabel('year')
Find the period of yearly sunspot intensity, using analysis like that shown above. Do the two values of the period agree?
power_spectrum = abs(fft(sunspots_belgia));
n = length(sunspots_belgia);
sampling_frequency = 1; % one sample / year
frequencies = linspace( 0, sampling_frequency, n );
search_interval = 2:floor(n/4); % skip the first coefficient, which is always the sum of the input data
spike_location = 1+find( power_spectrum(search_interval) == max(power_spectrum(search_interval)) );
spike_frequency = frequencies(spike_location);
period_of_sunspot_cycle = 1/spike_frequency;
figure
plot( frequencies(search_interval), power_spectrum(search_interval) )
text( spike_frequency*1.05, power_spectrum(spike_location)*1.00, ...
sprintf('Frequency f of largest spike (units: 1/year) = %6.3f',spike_frequency), 'fontsize', 5 )
text( spike_frequency*1.05, power_spectrum(spike_location)*0.90, ...
sprintf('1/f = period of sunspot cycle (units: year) = %5.2f',period_of_sunspot_cycle), 'fontsize', 5 )
title( 'Spike represents sunspot cycle freq: once / 11 years' )
xlabel('frequency (1/year)')
hold on
plot( [spike_frequency spike_frequency], [0 power_spectrum(spike_location)], 'r--' )
hold off
fprintf('The two values of the period agree.')
Find the period of sunspot intensity using the monthly data. At which frequencies are there spikes?
Remember, the sampling frequency here is (1/month) = (12/year).
Is there a period at a frequency close to the period for the yearly data?
filtered_data_m = monthly(:, 4);
sunspots_belgia_m = filtered_data_m(monthly(:, 1) < 1988)'; % up to 1987
power_spectrum = abs(fft(sunspots_belgia_m));
n = length(sunspots_belgia_m);
sampling_frequency = 12; % twelve sample / year
frequencies = linspace( 0, sampling_frequency, n );
search_interval = 2:floor(n/4); % skip the first coefficient, which is always the sum of the input data
spike_location = 1+find( power_spectrum(search_interval) == max(power_spectrum(search_interval)) );
spike_frequency = frequencies(spike_location);
period_of_sunspot_cycle = 1/spike_frequency;
figure
plot( frequencies(search_interval), power_spectrum(search_interval) )
text( spike_frequency*1.05, power_spectrum(spike_location)*1.00, ...
sprintf('Frequency f of largest spike (units: 1/year) = %6.3f',spike_frequency), 'fontsize', 5 )
text( spike_frequency*1.05, power_spectrum(spike_location)*0.90, ...
sprintf('1/f = period of sunspot cycle (units: year) = %5.2f',period_of_sunspot_cycle), 'fontsize', 5 )
title( 'Spike represents sunspot cycle freq: once / 11 years' )
xlabel('frequency (1/year)')
hold on
plot( [spike_frequency spike_frequency], [0 power_spectrum(spike_location)], 'r--' )
hold off
fprintf('The period is pretty close to that of the yearly data.')
A simple way to check if a $m \times n$ RGB image has been faked (with Photoshop, say) is to use the reshape function to convert it into a $(m\,n) \times 3$ matrix, compute the 3 principal components of its $3 \times 3$ covariance matrix, project the reshaped image on the second principal component, and then reshape the result back to a $m \times n$ grayscale image. If this grayscale image has bright or dark spots, it is possible that the color distribution in that area (and perhaps also that part of the image) has been altered.
For example, the image LIFE_projected_on_2nd_PC.jpg shows the result of projecting the image LIFE.jpg on the 2nd PC, and treating the result as a grayscale image. Notice the flag on Aldrin's shoulder and some buttons on his suit are bright, suggesting that they have been retouched. The buttons are bright red in LIFE.jpg.
![]() |
![]() |
Write a function that takes the filename of an input image, and returns its 2nd PC image.
As a test, show the output of your program on LIFE.jpg.
function res = secondPCImage(filename)
image = double(imread(filename)) / 255;
[m, n, h] = size(image);
im = reshape(image, m * n, 3);
[coeff, score] = pca(im);
res = reshape(score(:, 2), m, n);
% Same as follows, but shorter
% imCov = cov(im);
% [U, S, V] = svd(imCov);
% PC2 = U(:, 2);
% res = reshape(im * PC2, m, n);
end
figure
colormap('gray');
imagesc(secondPCImage('LIFE.jpg'));
(With Octave, you can include the output image in this notebook with a <img src="..."> HTML tag.)
Show the output of your program on the photo Jennifer_in_Paradise.jpg included in the assignment .zip file. This image is allegedly the first image ever actually photoshopped -- a photo of the wife of the original developer of Photoshop, famous as a photoshopped image.
figure
colormap('gray');
imagesc(secondPCImage('Jennifer_in_Paradise.jpg'));
Using e.g. Google image search, find an image of Hillary or Donald whose 2nd PC has clearly highlighted areas. Comment on the highlighting.
figure
colormap('gray');
imagesc(secondPCImage('hrc.jpg'));
display( 'We see some part of her hair, her left eyebrow, her eye, etc. have clear highlighted areas.\nThis probably mean that this picture has been photoshopped.');
The goal of this problem is to analyze the graph (actually, a numeric matrix $H$). You can read it from one of the attached files hero_social_network_50.tsv, hero_social_network_50.m, hero_social_network_50.py.
This $H$ matrix represent a core part of Marvel's Hero Social Network (a graph of social connections between superheroes. The entry $h_{ij}$ of $H$ is the strength of edges, so higher values mean that heroes $i$ and $j$ appear together more often.
The entire Marvel set of heroes is amazingly large; this dataset includes 206 heroes in the graph, obtained from the subset who appear in at least 50 comics. Despite their frequent appearance, this matrix is still pretty sparse -- apparently superheroes don't have hundreds of friends.
For example, you can read the file with dlmread() in Matlab; in Python either ad hoc parsing or perhaps the csv module.
Notice that the network is undirected, so $H$ is real symmetric, and all entries of $H$ are nonnegative.
load('hero_social_network_50_names.mat');
load('hero_social_network_50_network.mat');
Compute the eigenvalues of $H$ and plot them -- so that the $i$-th eigenvalue $\lambda_i$ is plotted at position $(x,y) = (i, \lambda_i)$.
[V, D] = eig(Hero_network);
eigvals = diag(D);
n = length(eigvals);
x = linspace(1, n, n);
plot(x, eigvals);
Find the largest eigenvalue $r$ of $H$. Is it equal to the spectral norm $||H||_2$?
max(eigvals)
norm(Hero_network)
fprintf('Yes, they are equal');
Find the eigenvector e of $H$ that has this largest eigenvalue. Plot the sequence of entries of the eigenvector, showing that they are nonnegative. Which entry $i$ has the largest value? (Who is the $i$-th hero?)
[val, ind] = max(eigvals);
eigvec = V(:, ind);
plot(eigvec, '.');
[val, i] = max(eigvec);
fprintf('Entry %d has the largest value, corresponds to name %s', i, string(names(i)));
The degree sequence is a vector $f d$ of integer values whose $i$-th entry $d_i$ is the degree of the $i$-th node. That is, $d_i$ is the number of other superheroes to which the $i$-th hero has nonzero edges. Plot the sequence of degrees. What is the name of the hero $i$ who has the largest value?
adjacency = Hero_network ~= 0;
deg = sum(adjacency, 2);
[val, ind] = max(deg);
names(ind)
Compute the graph laplacian $L = D - A$, where $D = diag({\bf d})$ is the diagonal matrix determined by the degree sequence, and $A$ is the zero-one adjacency matrix for $H$ (The entries of $A$ are zero wherever they are zero in $H$, and are one wherever nonzero in $H$.)
What is the number of zero eigenvalues of the graph laplacian $L$, if we assume that all eigenvalues whose absolute value is below 1e-10 are zero? (This is supposed to be the number of connected components in the graph.)
D = diag(deg);
L = D - adjacency;
eigval = eig(L);
sum((eigval < 10^(-10)))