Last update: July 17, 2018 04:47 PM

July 17, 2018


Curtis Miller

Stock Data Analysis with Python (Second Edition)

This is a lecture for MATH 4100/CS 5160: Introduction to Data Science, offered at the University of Utah, introducing time series data analysis applied to finance. This is also an update to my earlier blog posts on the same topic (this one combines them together). I show how to get and visualize stock data in Python, cover some basic stock analytics, and show how to develop a trading system.

July 17, 2018 02:00 PM


Continuum Analytics Blog

New Release of Anaconda Enterprise features Expanded GPU and Container Usage

Anaconda, Inc. is thrilled to announce the latest release of Anaconda Enterprise, our popular AI/ML enablement platform for teams at scale. The release of Anaconda Enterprise 5.2 adds capabilities for GPU-accelerated, scalable machine learning and cloud-native model management, giving enterprises the power to respond at the speed required by today’s digital interactions.  Anaconda Enterprise—An AI/ML …
Read more →

The post New Release of Anaconda Enterprise features Expanded GPU and Container Usage appeared first on Anaconda.

July 17, 2018 12:25 PM


Test and Code

Preparing for Technical Talks with Kelsey Hightower – bonus episode

After I had wrapped up the interview with Kelsey Hightower for episode 43, I asked him one last question.

You see, I admire his presentation style.
So I asked him if he would share with me how he prepared for his presentations.

His answer is so thoughtful and makes so much sense, I couldn’t keep it to myself.

I’m releasing this as a bonus mini-episode so that it’s easy to refer back to the next time you or I have a chance to do a technical talk.

Special Guest: Kelsey Hightower.

July 17, 2018 05:45 AM

July 16, 2018


NumFOCUS

NumFOCUS Projects at SciPy 2018

The post NumFOCUS Projects at SciPy 2018 appeared first on NumFOCUS.

July 16, 2018 09:17 PM


Bhishan Bhandari

Python Assignment Expression – PEP 572 – Python3.8

A recent buzz in the Python community is PEP 572’s acceptance for Python 3.8. PEP stands for Python Enhancement Proposal; each PEP is assigned a number by the PEP editors, and once assigned it is never changed. What exactly is PEP 572? (Directly from PEP 572:) Abstract This is a proposal for creating a way […]
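
For a flavor of what PEP 572 proposes, here is a minimal sketch of an assignment expression (my own illustration based on the PEP, not code from the post). The := operator binds a value to a name inside a larger expression:

data = list(range(15))

# := binds len(data) to n as part of the condition itself (Python 3.8+)
if (n := len(data)) > 10:
    print(f'List is too long ({n} elements)')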

The post Python Assignment Expression – PEP 572 – Python3.8 appeared first on The Tara Nights.

July 16, 2018 05:52 PM


Made With Mu

Mu Release Candidate

The release candidate for Mu 1.0.0 is out!
This is the last step before the final release of Mu 1.0. Apart from a few
minor bug fixes, the biggest change from beta 17 is the inclusion of various
translations for the user interface. Full details can be found in the
changelog.

Many thanks to the following people for their magnificent work on the following
translations:

I would love to include more translations in the final release, especially
if they’re in one of the following languages:

  • Arabic
  • German
  • Greek
  • Hebrew
  • Hindi
  • Italian
  • Russian

(This list reflects both reach and accessibility of languages so Mu is usable
by as many beginner programmers as possible.)

Other highlights include a fix to allow users of Adafruit devices to save a
file called code.py. This was getting erroneously caught by the new “shadow
module” feature which, in this specific case, doesn’t apply. Zander Brown
continues to make extraordinary progress in making the user interface both
great to look at and consistent across all platforms. We had quite a bit of
feedback from teachers who value such UI consistency: it allows them to create
resources that apply to all platforms, thus avoiding all the
complications of, “if you’re on <platform>, then this will look different”
interruptions in the flow of such resources. Finally, Tim Golden and
Jonny Austin have done sterling work testing the various fixes for problematic
edge-cases in the new BBC micro:bit flash functionality.

July 16, 2018 05:25 PM


Real Python

Reading and Writing CSV Files in Python

Let’s face it: you need to get information into and out of your programs through more than just the keyboard and console. Exchanging information through text files is a common way to share info between programs. One of the most popular formats for exchanging data is the CSV format. But how do you use it?

Let’s get one thing clear: you don’t have to (and you won’t) build your own CSV parser from scratch. There are several perfectly acceptable libraries you can use. The Python csv library will work for most cases. If your work requires lots of data or numerical analysis, the pandas library has CSV parsing capabilities as well, which should handle the rest.

In this article, you’ll learn how to read, process, and parse CSV from text files using Python. You’ll see how CSV files work, learn the all-important csv library built into Python, and see how CSV parsing works using the pandas library.

So let’s get started!

What Is a CSV File?

A CSV file (Comma Separated Values file) is a type of plain text file that uses specific structuring to arrange tabular data. Because it’s a plain text file, it can contain only actual text data—in other words, printable ASCII or Unicode characters.

The structure of a CSV file is given away by its name. Normally, CSV files use a comma to separate each specific data value. Here’s what that structure looks like:

column 1 name,column 2 name, column 3 name
first row data 1,first row data 2,first row data 3
second row data 1,second row data 2,second row data 3
...

Notice how each piece of data is separated by a comma. Normally, the first line identifies each piece of data—in other words, the name of a data column. Every subsequent line after that is actual data and is limited only by file size constraints.

In general, the separator character is called a delimiter, and the comma is not the only one used. Other popular delimiters include the tab (\t), colon (:) and semi-colon (;) characters. Properly parsing a CSV file requires us to know which delimiter is being used.

Where Do CSV Files Come From?

CSV files are normally created by programs that handle large amounts of data. They are a convenient way to export data from spreadsheets and databases as well as import or use it in other programs. For example, you might export the results of a data mining program to a CSV file and then import that into a spreadsheet to analyze the data, generate graphs for a presentation, or prepare a report for publication.

CSV files are very easy to work with programmatically. Any language that supports text file input and string manipulation (like Python) can work with CSV files directly.

Parsing CSV Files With Python’s Built-in CSV Library

The csv library provides functionality to both read from and write to CSV files. Designed to work out of the box with Excel-generated CSV files, it is easily adapted to work with a variety of CSV formats. The csv library contains objects and other code to read, write, and process data from and to CSV files.

Reading CSV Files With csv

Reading from a CSV file is done using the reader object. The CSV file is opened as a text file with Python’s built-in open() function, which returns a file object. This is then passed to the reader, which does the heavy lifting.

Here’s the employee_birthday.txt file:

name,department,birthday month
John Smith,Accounting,November
Erica Meyers,IT,March

Here’s code to read it:

import csv

with open('employee_birthday.txt') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        else:
            print(f'\t{row[0]} works in the {row[1]} department, and was born in {row[2]}.')
            line_count += 1
    print(f'Processed {line_count} lines.')

This results in the following output:

Column names are name, department, birthday month
    John Smith works in the Accounting department, and was born in November.
    Erica Meyers works in the IT department, and was born in March.
Processed 3 lines.

Each row returned by the reader is a list of String elements containing the data found by removing the delimiters. The first row returned contains the column names, which is handled in a special way.

Reading CSV Files Into a Dictionary With csv

Rather than deal with a list of individual String elements, you can read CSV data directly into a dictionary (technically, an Ordered Dictionary) as well.

Again, our input file, employee_birthday.txt, is as follows:

name,department,birthday month
John Smith,Accounting,November
Erica Meyers,IT,March

Here’s the code to read it in as a dictionary this time:

import csv

with open('employee_birthday.txt', mode='r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        print(f'\t{row["name"]} works in the {row["department"]} department, and was born in {row["birthday month"]}.')
        line_count += 1
    print(f'Processed {line_count} lines.')

This results in the same output as before:

Column names are name, department, birthday month
    John Smith works in the Accounting department, and was born in November.
    Erica Meyers works in the IT department, and was born in March.
Processed 3 lines.

Where did the dictionary keys come from? The first line of the CSV file is assumed to contain the keys to use to build the dictionary. If you don’t have these in your CSV file, you should specify your own keys by setting the fieldnames optional parameter to a list containing them.
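
If you do need to supply your own keys, a minimal sketch looks like this (the headerless file name here is hypothetical):

import csv

# employee_birthday_no_header.txt: same rows as above, but with no header line
with open('employee_birthday_no_header.txt') as csv_file:
    csv_reader = csv.DictReader(
        csv_file,
        fieldnames=['name', 'department', 'birthday month'],
    )
    for row in csv_reader:
        print(f'{row["name"]} works in the {row["department"]} department.')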

Optional Python CSV reader Parameters

The reader object can handle different styles of CSV files by specifying additional parameters, some of which are shown below:

  • delimiter specifies the character used to separate each field. The default is the comma (',').

  • quotechar specifies the character used to surround fields that contain the delimiter character. The default is a double quote ('"').

  • escapechar specifies the character used to escape the delimiter character, in case quotes aren’t used. The default is no escape character.

These parameters deserve some more explanation. Suppose you’re working with the following employee_addresses.txt file:

name,address,date joined
john smith,1132 Anywhere Lane Hoboken NJ, 07030,Jan 4
erica meyers,1234 Smith Lane Hoboken NJ, 07030,March 2

This CSV file contains three fields: name, address, and date joined, which are delimited by commas. The problem is that the data for the address field also contains a comma to signify the zip code.

There are three different ways to handle this situation (each is sketched in code after this list):

  • Use a different delimiter
    That way, the comma can safely be used in the data itself. You use the delimiter optional parameter to specify the new delimiter.

  • Wrap the data in quotes
    The special nature of your chosen delimiter is ignored in quoted strings. Therefore, you can specify the character used for quoting with the quotechar optional parameter. As long as that character also doesn’t appear in the data, you’re fine.

  • Escape the delimiter characters in the data
    Escape characters work just as they do in format strings, nullifying the interpretation of the character being escaped (in this case, the delimiter). If an escape character is used, it must be specified using the escapechar optional parameter.
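
As a minimal sketch of how each option is passed to the reader (the three variant files are hypothetical: one pipe-delimited, one with quoted addresses, one with backslash-escaped commas):

import csv

# Option 1: a different delimiter, e.g. a pipe
with open('addresses_pipe.txt') as f:
    reader = csv.reader(f, delimiter='|')

# Option 2: fields containing commas are wrapped in quotes
with open('addresses_quoted.txt') as f:
    reader = csv.reader(f, quotechar='"')

# Option 3: embedded commas are escaped with a backslash
with open('addresses_escaped.txt') as f:
    reader = csv.reader(f, escapechar='\\')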

Writing CSV Files With csv

You can also write to a CSV file using a writer object and the .writerow() method:

import csv

with open('employee_file.csv', mode='w') as employee_file:
    employee_writer = csv.writer(employee_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)

    employee_writer.writerow(['John Smith', 'Accounting', 'November'])
    employee_writer.writerow(['Erica Meyers', 'IT', 'March'])

The quotechar optional parameter tells the writer which character to use to quote fields when writing. Whether quoting is used or not, however, is determined by the quoting optional parameter:

  • If quoting is set to csv.QUOTE_MINIMAL, then .writerow() will quote fields only if they contain the delimiter or the quotechar. This is the default case.
  • If quoting is set to csv.QUOTE_ALL, then .writerow() will quote all fields.
  • If quoting is set to csv.QUOTE_NONNUMERIC, then .writerow() will quote all fields containing text data and convert all numeric fields to the float data type.
  • If quoting is set to csv.QUOTE_NONE, then .writerow() will escape delimiters instead of quoting them. In this case, you also must provide a value for the escapechar optional parameter (see the sketch after this list).
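
Here is a sketch of that last case (the output file name is hypothetical): with QUOTE_NONE, an escapechar must be supplied so the writer can escape any delimiter found in the data:

import csv

with open('employee_file3.csv', mode='w') as f:
    writer = csv.writer(f, quoting=csv.QUOTE_NONE, escapechar='\\')
    # The comma inside the address is written as \, rather than being quoted
    writer.writerow(['John Smith', '1132 Anywhere Lane, Hoboken NJ', 'November'])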

Reading the file back in plain text shows that the file is created as follows:

John Smith,Accounting,November
Erica Meyers,IT,March

Writing CSV File From a Dictionary With csv

Since you can read your data into a dictionary, it’s only fair that you should be able to write it out from a dictionary as well:

import csv

with open('employee_file2.csv', mode='w') as csv_file:
    fieldnames = ['emp_name', 'dept', 'birth_month']
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)

    writer.writeheader()
    writer.writerow({'emp_name': 'John Smith', 'dept': 'Accounting', 'birth_month': 'November'})
    writer.writerow({'emp_name': 'Erica Meyers', 'dept': 'IT', 'birth_month': 'March'})

Unlike DictReader, the fieldnames parameter is required when writing a dictionary. This makes sense, when you think about it: without a list of fieldnames, the DictWriter can’t know which keys to use to retrieve values from your dictionaries. It also uses the keys in fieldnames to write out the first row as column names.

The code above generates the following output file:

emp_name,dept,birth_month
John Smith,Accounting,November
Erica Meyers,IT,March

Parsing CSV Files With the pandas Library

Of course, the Python CSV library isn’t the only game in town. Reading CSV files is possible in pandas as well. It is highly recommended if you have a lot of data to analyze.

pandas is an open-source Python library that provides high-performance data analysis tools and easy-to-use data structures. pandas is available for all Python installations, but it is a key part of the Anaconda distribution and works extremely well in Jupyter notebooks to share data, code, analysis results, visualizations, and narrative text.

Installing pandas and its dependencies in Anaconda is easily done:
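
conda install pandas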

As is using pip/pipenv for other Python installations:
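
pip install pandas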

We won’t delve into the specifics of how pandas works or how to use it. For an in-depth treatment on using pandas to read and analyze large data sets, check out Shantnu Tiwari’s superb article on working with large Excel files in pandas.

Reading CSV Files With pandas

To show some of the power of pandas CSV capabilities, I’ve created a slightly more complicated file to read, called hrdata.csv. It contains data on company employees:

Name,Hire Date,Salary,Sick Days remaining
Graham Chapman,03/15/14,50000.00,10
John Cleese,06/01/15,65000.00,8
Eric Idle,05/12/14,45000.00,10
Terry Jones,11/01/13,70000.00,3
Terry Gilliam,08/12/14,48000.00,7
Michael Palin,05/23/13,66000.00,8

Reading the CSV into a pandas DataFrame is quick and straightforward:

import pandas
df = pandas.read_csv('hrdata.csv')
print(df)

That’s it: three lines of code, and only one of them is doing the actual work. pandas.read_csv() opens, analyzes, and reads the CSV file provided, and stores the data in a DataFrame. Printing the DataFrame results in the following output:

             Name Hire Date   Salary  Sick Days remaining
0  Graham Chapman  03/15/14  50000.0                   10
1     John Cleese  06/01/15  65000.0                    8
2       Eric Idle  05/12/14  45000.0                   10
3     Terry Jones  11/01/13  70000.0                    3
4   Terry Gilliam  08/12/14  48000.0                    7
5   Michael Palin  05/23/13  66000.0                    8

Here are a few points worth noting:

  • First, pandas recognized that the first line of the CSV contained column names, and used them automatically. I call this Goodness.
  • However, pandas is also using zero-based integer indices in the DataFrame. That’s because we didn’t tell it what our index should be.
  • Further, if you look at the data types of our columns, you’ll see pandas has properly converted the Salary and Sick Days remaining columns to numbers, but the Hire Date column is still a String. This is easily confirmed in interactive mode:

    >>> print(type(df['Hire Date'][0]))
    <class 'str'>
    

Let’s tackle these issues one at a time. To use a different column as the DataFrame index, add the index_col optional parameter:

import pandas
df = pandas.read_csv('hrdata.csv', index_col='Name')
print(df)

Now the Name field is our DataFrame index:

               Hire Date   Salary  Sick Days remaining
Name                                                  
Graham Chapman  03/15/14  50000.0                   10
John Cleese     06/01/15  65000.0                    8
Eric Idle       05/12/14  45000.0                   10
Terry Jones     11/01/13  70000.0                    3
Terry Gilliam   08/12/14  48000.0                    7
Michael Palin   05/23/13  66000.0                    8

Next, let’s fix the data type of the Hire Date field. You can force pandas to read data as a date with the parse_dates optional parameter, which is defined as a list of column names to treat as dates:

import pandas
df = pandas.read_csv('hrdata.csv', index_col='Name', parse_dates=['Hire Date'])
print(df)

Notice the difference in the output:

                Hire Date   Salary  Sick Days remaining
Name                                                   
Graham Chapman 2014-03-15  50000.0                   10
John Cleese    2015-06-01  65000.0                    8
Eric Idle      2014-05-12  45000.0                   10
Terry Jones    2013-11-01  70000.0                    3
Terry Gilliam  2014-08-12  48000.0                    7
Michael Palin  2013-05-23  66000.0                    8

The date is now formatted properly, which is easily confirmed in interactive mode:

>>> print(type(df['Hire Date'][0]))
<class 'pandas._libs.tslibs.timestamps.Timestamp'>

If your CSV file doesn’t have column names in the first line, you can use the names optional parameter to provide a list of column names. You can also use this if you want to override the column names provided in the first line. In this case, you must also tell pandas.read_csv() to ignore existing column names using the header=0 optional parameter:

import pandas
df = pandas.read_csv('hrdata.csv', 
            index_col='Employee', 
            parse_dates=['Hired'], 
            header=0, 
            names=['Employee', 'Hired','Salary', 'Sick Days'])
print(df)

Notice that, since the column names changed, the columns specified in the index_col and parse_dates optional parameters must also be changed. This now results in the following output:

                    Hired   Salary  Sick Days
Employee                                     
Graham Chapman 2014-03-15  50000.0         10
John Cleese    2015-06-01  65000.0          8
Eric Idle      2014-05-12  45000.0         10
Terry Jones    2013-11-01  70000.0          3
Terry Gilliam  2014-08-12  48000.0          7
Michael Palin  2013-05-23  66000.0          8

Writing CSV Files With pandas

Of course, if you can’t get your data out of pandas again, it doesn’t do you much good. Writing a DataFrame to a CSV file is just as easy as reading one in. Let’s write the data with the new column names to a new CSV file:

import pandas
df = pandas.read_csv('hrdata.csv', 
            index_col='Employee', 
            parse_dates=['Hired'],
            header=0, 
            names=['Employee', 'Hired', 'Salary', 'Sick Days'])
df.to_csv('hrdata_modified.csv')

The only difference between this code and the reading code above is that the print(df) call was replaced with df.to_csv(), providing the file name. The new CSV file looks like this:

Employee,Hired,Salary,Sick Days
Graham Chapman,2014-03-15,50000.0,10
John Cleese,2015-06-01,65000.0,8
Eric Idle,2014-05-12,45000.0,10
Terry Jones,2013-11-01,70000.0,3
Terry Gilliam,2014-08-12,48000.0,7
Michael Palin,2013-05-23,66000.0,8

Conclusion

If you understand the basics of reading CSV files, then you won’t ever be caught flat-footed when you need to deal with importing data. Most CSV reading, processing, and writing tasks can be easily handled by the basic csv Python library. If you have a lot of data to read and process, the pandas library provides quick and easy CSV handling capabilities as well.

Are there other ways to parse text files? Of course! Libraries like ANTLR, PLY, and PlyPlus can all handle heavy-duty parsing, and if simple String manipulation won’t work, there are always regular expressions.

But those are topics for other articles…



July 16, 2018 02:00 PM


Mike Driscoll

ANN: Jupyter Notebook 101 Kickstarter

I am happy to announce my latest Kickstarter which is to raise funds to create a book on Jupyter Notebook!

Jupyter Notebook 101 will teach you all you need to know to create and use Notebooks effectively. You can use Jupyter Notebook to help you learn to code, create presentations, make beautiful documentation and much more!

The Jupyter Notebook is also used by the scientific community to demonstrate research in an easy-to-replicate manner.

You will learn the following in Jupyter Notebook 101:

  • How to create and edit Notebooks
  • How to add styling, images, graphs, etc
  • How to configure Notebooks
  • How to export your Notebooks to other formats
  • Notebook extensions
  • Using Notebooks for presentations
  • and more!

Release Date

I am planning to release the book in November 2018.

You can learn more on Kickstarter!

July 16, 2018 01:00 PM


Will Kahn-Greene

Thoughts on Guido retiring as BDFL of Python

I read the news of Guido van Rossum announcing his retirement as BDFL of
Python and it made me a bit sad.

I’ve been programming in Python for almost 20 years on a myriad of open
source projects, tools for personal use, and work. I helped out with
several PyCon US conferences and attended several others. I met a lot
of amazing people who have influenced me as a person and as a programmer.

I started PyVideo in March 2012. At a PyCon US
after that (maybe 2015?), I found myself in an elevator with Guido and
somehow we got to talking about PyVideo and he asked point-blank, “Why
work on that?” I tried to explain what I was trying to do with it:
create an index of conference videos across video sites, improve the meta-data,
transcriptions, subtitles, feeds, etc. I remember he patiently listened to me
and then said something along the lines of how it was a good thing to
work on. I really appreciated that moment of validation. I think about it
periodically. It was one of the reasons Sheila and I worked hard to
transition PyVideo to a new group after we were burned out.

It wouldn’t be an overstatement to say that through programming in Python,
I’ve done some good things and become a better person.

Thank you, Guido, for everything!

July 16, 2018 12:00 PM


Mike Driscoll

PyDev of the Week: Katharine Jarmul

This week we welcome Katharine Jarmul (@kjam) as our PyDev of the Week! Katharine is the co-author of Data Wrangling with Python. She is also the co-founder of KIProtect. You can catch up with the projects she works on over on GitHub. Let’s take some time to get to know her better!

Can you tell us a little about yourself (hobbies, education, etc):

Sure! I first started working on computers building fan websites for house music in the 90s with my dial-up shared Windows 95 computer. Since then, I have had a love / hate relationship with computers and what is now called data science. I have some formal education with math, statistics and computer science, but also learned most of what I do on my own and therefore am proud to count myself a member of the primarily self-taught folks. For fun, I like to cook and eat with friends, read news or arXiv papers and rant with like-minded folks on and offline. 😂 (I am @kjam on Twitter…)

Why did you start using Python?

I first started using Python in 2007, when I was working at the Washington Post. A mentor (Ryan O’Neil) took a chance on me after seeing a small application I built using JavaScript. He set up a Linux computer and installed the Django application stack along with it — even gave me a commit key! I can’t tell you how many times I broke the server, but 6 months later I launched my first Django app. I was hooked and wanted to build and do more.

What other programming languages do you know and which is your favorite?

I have dabbled in numerous other languages: C++, Java, Go, even Perl, R, PHP and Ruby. I like Python the best, but that’s probably because I know it the best. I am working more regularly in Go now, which is really fun — but also hard for me to do so much typing. Python as my primary language has definitely spoiled me, and for data science and machine learning, there is a reason it has been so widely adopted.

What projects are you working on now?

I recently announced my new company, KIProtect (https://kiprotect.com). We are building solutions for data privacy and security for data science and machine learning. Essentially, we believe data privacy should be a right for everyone, not just those of us lucky enough to live in Europe. For this reason, we want to democratize data privacy — making it easier for data scientists and engineers everywhere to enable secure and private data sharing. Our first offering is a pseudonymization API which is free for limited usage (and paid for larger use). This allows you to send private data and get back properly pseudonymized data via one API call. We will be offering additional tools, solutions and APIs to help increase security and privacy in the coming year.

Which Python libraries are your favorite (core or 3rd party)?

NumPy is pretty much the best thing ever as someone working in machine learning and data science. It is such a useful library and the optimizations the core developers have made to allow for us to do fast, efficient math in Python (ahem, Cython) are fantastic. I am unsure if we would have things like Pandas, Scikit-Learn, even Keras and TensorFlow if it wasn’t for the steady grounding of NumPy to help foster a real data science community within Python.

How did you end up writing a book on Python?

I was approached by my co-author Jacqueline Kazil shortly after I moved to Europe. Ironically, the week before I turned to my partner and said, “you know, I am finally feeling less burnt out. I wonder what I should do next?” The book seemed like a great opportunity to get started with computers again.

What did you learn from that experience?

Writing a book is really hard. I know everyone says it, but it takes quite a lot out of you; and you are likely never fully satisfied with the outcome. That said, I have heard a lot of nice things from folks who used our book as a welcoming introduction to the world of Python and data — and if I even convert one new Pythonista, I can say I have achieved some impact. 🤗

Is there anything else you’d like to say?

Don’t take your website offline to comply with GDPR (the new EU privacy regulation). It is alarming to me to see the blanket blocking of European IPs and other ridiculously clueless reactions and takes I have heard from (primarily) US Americans on the regulation.

First off, the regulation is pretty easy to read — so I recommend reading it. If that’s too hard for you, check out our article covering a lot of what you need to know as a data scientist (https://kiprotect.com/blog/gdpr_for_data_science.html) or this article for software engineers (https://www.smashingmagazine.com/2018/02/gdpr-for-web-developers/).

Secondly, think of it first as a user. Wouldn’t you want more say over your data? Don’t you want to know about data breaches? Is it okay for someone to resell your data without telling you? Treat your users how you want to be treated.

Finally, there are tools to help! At KIProtect, we are building several solutions to help make your life easier. There are also many other companies and projects working to help make our software safer for everyone. Don’t treat privacy and security as nice add-ons, treat them as part of your core product. Protect your data, it might be the most valuable thing you create.

Thanks for doing the interview, Katharine!

July 16, 2018 05:05 AM


Matthew Rocklin

Who uses Dask?

This work is supported by Anaconda Inc

People often ask general questions like “Who uses Dask?” or more specific
questions like the following:

  1. For what applications do people use Dask dataframe?
  2. How many machines do people often use with Dask?
  3. How far does Dask scale?
  4. Does dask get used on imaging data?
  5. Does anyone use Dask with Kubernetes/Yarn/SGE/Mesos/… ?
  6. Does anyone in the insurance industry use Dask?

This yields interesting and productive conversations where new users can dive into historical use cases, which informs their choices about whether and how to use the project in the future.

New users can learn a lot from existing users.

To further enable this conversation we’ve made a new tiny project,
dask-stories. This is a small
documentation page where people can submit how they use Dask and have that
published for others to see.

To seed this site, six generous users have written down how their group uses Dask. You can read about them here:

  1. Sidewalk Labs: Civic Modeling
  2. Genome Sequencing for Mosquitoes
  3. Full Spectrum: Credit and Banking
  4. Ice Cube: Detecting Cosmic Rays
  5. Pangeo: Earth Science
  6. NCAR: Hydrologic Modeling

We’ve focused on a few questions, available in our template, that focus on problems over technology and include negative as well as positive feedback to get a complete picture.

  1. Who am I?
  2. What problem am I trying to solve?
  3. How does Dask help?
  4. What pain points did I run into with Dask?
  5. What technology do I use around Dask?

Easy to Contribute

Contributions to this site are simple Markdown documents submitted as pull
requests to
github.com/dask/dask-stories. The site
is then built with ReadTheDocs and updated immediately. We tried to make this
as smooth and familiar to our existing userbase as possible.

This is important. Sharing real-world experiences like this is probably more valuable than code contributions to the Dask project at this stage. Dask is more technically mature than it is well-known. Users look to other users to help them understand a project (think of every time you’ve Googled for “some tool in some topic”).

If you use Dask today in an interesting way then please share your story.
The world would love to hear your voice.

If you maintain another project you might consider implementing the same model.
I hope that this proves successful enough for other projects in the ecosystem
to reuse.

July 16, 2018 12:00 AM


Michael Foord

The Role of Abstractions in Software Engineering

Abstract Representation of a Concrete Apple

This is a video and text of a lightning talk, a five minute presentation, given at PyCon US 2018 in Cleveland. The image is an abstract representation of a concrete apple.

This is an abstract talk. There isn’t time to give examples, but I hope that the application to the day-to-day challenges of the practice of software engineering is clear. The only theory worth a damn is the theory of the practice. This is a talk about the role of abstractions in software engineering.

Programming is all about the use of abstractions. We often say that the fundamental language spoken by the machine is ones and zeros. Binary. This isn’t true. Ones and zeroes are an abstract representation of the fundamental operation of computers. It’s a way of representing what central processors do in a way that can be understood by people.

The actual language spoken by computers is the electromagnetic dance across wires and etched silicon, choreographed by the beating of a quartz crystal at the heart of the machine.

Ones and zeroes are a representation of that dance, understandable by humans in order for us to reason about the behaviour of the system.

That’s a very low level abstraction. Very close to the actual operation of computers, but very hard to work with. The next step up is assembly language where we use mnemonics, symbolic instructions like JMP for jump, to represent these patterns of ones and zeroes. We can also use human recognisable labels for memory locations instead of numbers and allow the assembler to calculate offsets for us. Much easier.

Next we have languages like C and then right at the very top we have Python where each construct, a print statement for example, may correspond to as many as millions of the lowest level operations.

Computer programming is communication in two directions. Programming provides a language the computer understands, and is able to execute deterministically, whilst also communicating with humans so they can conceptualise the behaviour of the system. A programming language is a set of conceptual tools to facilitate that communication in both directions.

The art and craft of software engineering is taking the conceptual tools that programming languages provide and using them to solve real world problems. This is the difference between science and engineering. Science is the theory, engineering is the application.

In order to be able to do this we have to have an understanding of the problem domain. We conceptualise it. We think about it. Software is easy to understand and maintain when the abstractions you build map well to the problem domain. If the way you think about the problem is close to the way you think about your software then you have to do less mental translation between the problem and your code.

Joel Spolsky talks about the law of leaky abstractions. Any abstraction that maps to lower level operations in the system will leak. At some point something will go wrong and you will only be able to fix it by understanding the lower level operations too.

I’ve heard it said, and it rings true, that a good programmer can hold about ten thousand lines of code in their head. So if your system is less than ten thousand lines of code, even if it’s terrible code, you don’t need to build higher level building blocks to hold it in your head.

An all too common situation is that a system becomes too complex to reason about, so an engineer decides to create abstractions to simplify how they think. So they create black boxes, abstractions, in which to place the complexity. These type of abstractions conceal complexity. So now you don’t have to look at the mess you just made.

You can reason about your system with your abstractions, but in order to understand the actual behaviour (at a lower level) you need to go digging in all that dirt.

Instead of concealing complexity a good abstraction will explain and point you to the lower level operations. Good abstractions simplify and reveal complexity rather than concealing it.

We can also use this kind of reasoning to think about product and system design. What user experience are you providing, what’s the user story? Your users also think about the problem domain using conceptual tools. The closer the abstractions your software presents to your user map to the way they already think about the problem the easier your software will be to use.

And here we come full circle. If the way you build your software maps well to the problem domain then it will be easy to reason about and maintain. If the abstractions you present to the user map well to the problem domain then it will be easier for your users to think within your system and it will be more intuitive to use.

So abstractions matter. They’re the raw stuff of our world.

This post originally appeared on my personal blog Abstractions on Unpolished Musings.

July 16, 2018 12:00 AM

July 15, 2018


Stefan Behnel

A really fast Python web server with Cython

Shortly after I wrote about speeding up Python web frameworks with Cython,
Nexedi posted an article about their attempt to build
a fast multicore web server for Python
that can compete with the performance of compiled coroutines in the Go language.

Their goal is to use Cython to build a web framework around a fast native web server, and to use Cython’s concurrency and coroutine support to gain native performance also in the application code, without sacrificing the readability that Python provides.

Their experiments look very promising so far.
They managed to process 10K requests per second concurrently, requests which actually do real processing work.
That is worth noting, because many web server benchmarks out there content themselves with the blank response time for a “hello world”, thus ignoring any concurrency overhead etc.
For a simple static “Hello world!”, they even got 400K requests per second, which shows how unrealistic such a benchmark is.
Under load, their system seems to scale pretty linearly with the number of threads, which is also not a given among web frameworks.

I might personally get involved in further improving Cython for this kind of concurrent, async application. Stay tuned.

July 15, 2018 07:42 PM


Bhishan Bhandari

Idiomatic Python – Looping Approaches

Python has it’s own unique techniques and guidelines for looping. Through this article, I will present a few examples on bad and better approaches on looping. While the end goal can be achieved using both sets of the codes to follow, the purpose is to highlight on the better approaches and encourage it. Looping over […]

The post Idiomatic Python – Looping Approaches appeared first on The Tara Nights.

July 15, 2018 05:54 PM


EuroPython

EuroPython 2018: Delaying switch to Late Bird Tickets by one day – please use your coupons today!

Since we still have quite a few people with discount coupons who haven’t bought their tickets yet, we are extending the regular ticket sales by one day.

Switch to Late Bird Tickets on July 17, 00:00 CEST

We will now switch to late bird prices, which are about 30% higher than the regular ones, on Tuesday, July 17.

Issued coupons are not valid for Late Bird Tickets

Please note that the coupons we have issued so far are not valid for the late bird tickets, so if you have a coupon for the conference, please order your tickets before we switch to late bird.

This includes coupons for sponsors, speakers, trainers and also the EPS community discount coupons we have given to several user groups.

Please make sure you use your coupon before the switch on Tuesday, 00:00 CEST.

Enjoy,

EuroPython 2018 Team
https://ep2018.europython.eu/
https://www.europython-society.org/

July 15, 2018 11:27 AM


EuroPython Society

Invitation to the EuroPython Society General Assembly 2018

We would like to invite all EuroPython attendees and EuroPython Society
(EPS) members to attend this year’s EPS General Assembly (GA), which we
will run as an in-person meeting at the upcoming EuroPython 2018, held in
Edinburgh, Scotland, UK from July 23 – 29.

We had already sent an invite to the members mailing list on 2018-06-17, but would like to announce this more broadly as well, with the complete agenda.

Place of the General Assembly meeting:

We will meet on Friday, July 27, at 14:15 BST in room Kilsyth of the EICC,
The Exchange, Edinburgh EH3 8EE.

There will be a short talk inviting volunteers to participate in organizing EuroPython 2019, in preparation for next year’s event, at 14:00 BST in the same room, right before the General Assembly. You may want to attend that talk as well. In this talk, we will present the EuroPython Workgroup concept, which we have been using successfully for the past few years.

General Assembly Agenda

The agenda for the assembly is defined by the EPS bylaws. We are planning to use the following structure:

  • Opening of the meeting
  • Selection of meeting chair, secretary and 2 checkers of the minutes
  • Motion establishing the timeliness of the call to the meeting
  • Presentation of the annual report and annual accounts by the board
  • Presentation of the report of the auditor
  • Discharge from liability for the board
  • Presentation of a budget by the outgoing board.
  • Acceptance of budget and decision on membership fees for the upcoming year
  • Election of members of the board
  • Election of chair of the board
  • Election of one auditor and one replacement. The auditor does not
    have to be certified in any way and is normally selected among the
    members of the society.
  • The optional election of a nomination committee for the next annual meeting of the General Assembly
  • Propositions from the board, if any
  • Motions from the members, if any
  • Closing of the meeting

In an effort to reduce the time it takes to go through this long list, which is mandated by the bylaws, we will try to send as much information to the members mailing list before the GA, so that we can limit presentations to a minimum.

Election of the members of the board

The EPS bylaws limit the number of board members to one chair and 2 – 8 directors, at most 9 board members in total. Experience has shown that the board members are the most active organizers of the EuroPython conference, so we try to get as many board members as possible to spread the workload.

All members of the EPS are free to nominate or self-nominate board members. Please write to board@europython.eu no later than Friday, July 20 2018, if you want to run for board. We will then include you in the list we’ll have in the final nomination announcement before the GA, which is scheduled for July 21.

The
following people from the current board have already shown interest in running for board in the next term as well (in alphabetical order):

  • Anders Hammarquist
  • Darya Chyzhyk
  • Marc-André Lemburg

We will post more detailed information about the candidates and any new nominations we receive in a separate blog post.

Propositions from the board

  • We would like to propose to grant CPython Core Developers lifetime free entry to EuroPython conferences, in recognition of their efforts to build the foundation on which our community is built. The details are to be defined by the EPS board.

The bylaws allow for additional propositions to be announced up until
5 days before the GA, so the above list is not necessarily the final
list.

Motions from the members

EPS members are entitled to suggest motions to be voted on at the GA.
The bylaws require any such motions to be announced at least 5 days
before the GA. If you would like to propose a motion, please send it to board@europython.eu no later than Friday, July 20 2018, so we can announce the final list to everyone.

Enjoy,

EuroPython Society

July 15, 2018 10:44 AM


Bhishan Bhandari

Idiomatic Python – Use of Falsy and Truthy Concepts

Out of many, one reason for Python’s popularity is its readability. Python has code style guidelines and idioms, and these allow future readers of the code to comprehend its intentions. It is highly important that the code is readable and concise. One such important tip is to use falsy and truthy concepts. […]
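
For instance, here is a minimal sketch of the idiom the post describes (my own illustration):

items = []

# Verbose: explicit length comparison
if len(items) == 0:
    print('empty')

# Idiomatic: an empty collection is falsy
if not items:
    print('empty')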

The post Idiomatic Python – Use of Falsy and Truthy Concepts appeared first on The Tara Nights.

July 15, 2018 07:20 AM


Django Weblog

DjangoCon AU 2018: Tickets on sale

DjangoCon Australia, the cute little sibling conference to DjangoCons EU
and US, is on again next month in sunny Sydney.

A one-day event packed full of content, DjangoCon AU is run as a
Specialist Track – a dedicated one-day, one track “mini conference” –
inside PyCon AU.

Tickets for DjangoCon AU and PyCon AU are now on sale. If you can only
join us for one day, you can get a ticket for just DjangoCon AU for only
AU$150. But, if you’d like to make a long weekend of it, tickets for the
full event – DjangoCon AU on the Friday, and PyCon AU on the Saturday
and Sunday – are available starting from AU$440. As part of our ongoing
commitment to ensuring as many people as possible can get to PyCon AU,
there are generous discounts for students, and Contributor ✨ Tickets
that directly help fill the financial assistance pool of funds.

The talk lists for DjangoCon AU and all of PyCon AU are already live, so take a look at what we have in store.

Buy your tickets by August 7 2018 to ensure you get a coveted PyCon AU
t-shirt. Shirts for DjangoCon AU will be revealed and details
announced on the day.

We hope to see you in Sydney next month!

Katie McLaughlin, PyCon AU Conference Director, DSF Board

July 15, 2018 01:36 AM

July 14, 2018


Weekly Python StackOverflow Report

(cxxxiv) stackoverflow python report

These are the ten most rated questions at Stack Overflow last week.
Between brackets: [question score / answers count]
Build date: 2018-07-14 20:51:04 GMT


  1. Remove item from list based on the next item in same list – [10/9]
  2. Create dict from list of list – [10/7]
  3. In Python, how to drop into the debugger in an except block and have access to the exception instance? – [8/2]
  4. Python: Extracting lists from list with module or regular expression – [6/6]
  5. How to perform contain operation between a numpy array and a vector row-wise? – [6/5]
  6. How can I replace OrderedDict with dict in a Python AST before literal_eval? – [6/3]
  7. python .exe not working properly – [6/2]
  8. Make python’s compile strip docstrings but not asserts – [6/1]
  9. Numpy performance gap between len(arr) and arr.shape[0] – [6/1]
  10. Python: Why is “~” now included in the set of reserved characters in urllib.parse.quote()? – [6/0]

July 14, 2018 08:51 PM


pgcli

Release v1.10.0

Pgcli is a command line interface for Postgres that does
auto-completion and syntax highlighting. You can install this version using:

$ pip install -U pgcli

This release adds new special commands \ev and \ef, more table formats,
and a --user alias for the --username option, to be compatible with psql. Pgcli
also sets application_name to identify itself within Postgres. Multiple bugs
were fixed.

This release was very special because we had a lot of first-time contributors, thanks
to Amjith leading a sprint on pgcli during PyCon 2018! It’s wonderful to see that
spike of commits in mid-May:

Our huge thanks to all the new contributors!

Features:

  • Add quit commands to the completion menu. (Thanks: Jason Ribeiro)
  • Add table formats to \T completion. (Thanks: Jason Ribeiro)
  • Support \ev, \ef (#754). (Thanks: Catherine Devlin)
  • Add application_name to help identify pgcli connection to database (issue #868) (Thanks: François Pietka)
  • Add --user option, a duplicate of --username, matching the same CLI option in psql (Thanks: Alexandr Korsak)

Internal changes:

  • Mark tests requiring a running database server as dbtest (Thanks: Dick Marinus)
  • Add an is_special command flag to MetaQuery (Thanks: Rishi Ramraj)
  • Ported Destructive Warning from mycli.
  • Refactor Destructive Warning behave tests (Thanks: Dick Marinus)

July 14, 2018 07:00 AM


Justin Mayer

Python Development Environment on macOS High Sierra

While installing Python and Virtualenv on macOS High Sierra can be done several ways, this tutorial will guide you through the process of configuring a stock Mac system into a solid Python development environment.

First steps

This guide assumes that you have already installed Homebrew. For details, please follow the steps in the macOS Configuration Guide.

Python

We are going to install the latest 2.7.x version of Python via Homebrew. Why bother, you ask, when Apple includes Python along with macOS? Here are some reasons:

  • When using the bundled Python, macOS updates can nuke your Python packages, forcing you to re-install them.
  • As new versions of Python are released, the Python bundled with macOS will become out-of-date. Homebrew always has the most recent version.
  • Apple has made significant changes to its bundled Python, potentially resulting in hidden bugs.
  • Homebrew’s Python includes the latest versions of Pip and Setuptools (Python package management tools)

Along the same lines, the version of OpenSSL that comes with macOS is out-of-date, so we’re going to tell Homebrew to download the latest OpenSSL and compile Python with it.

Use the following command to install Python via Homebrew:
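
brew install python@2

(python@2 is assumed here: after Homebrew’s 2018 formula renames, the plain python formula pointed at Python 3, while python@2 provided the 2.7 series.)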

You’ve already modified your PATH as mentioned in the macOS Configuration Guide, right? If not, please do so now.

Since Python 2.7 is deprecated, I highly recommend that you also install Python 3:
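
brew install python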

This makes it easy to test your code on both Python 2.7 and Python 3. More importantly, since Python 3 is the present and future of all things Python, the examples below assume you have installed Python 3.

Pip

Let’s say you want to install a Python package, such as the fantastic Virtualenv environment isolation tool. While nearly every Python-related article for macOS tells the reader to install it via sudo pip install virtualenv, the downsides of this method include:

  1. installs with root permissions
  2. installs into the system /Library
  3. yields a less reliable environment when using Homebrew’s Python

As you might have guessed by now, we’re going to use the tools provided by Homebrew to install the Python packages that we want to be globally available. When installing via Homebrew’s Pip, packages will be installed to /usr/local/lib/python{version}/site-packages, with binaries placed in /usr/local/bin.

Homebrew recently changed the names of Python-related binaries to avoid potential confusion with those bundled with macOS. As a result, pip became pip2, et cetera. Between this change and the many new improvements in Python 3, it seems a good time to start using pip3 for all the examples that will follow below. If you don’t want to install Python 3 or would prefer your global packages to use the older, deprecated Python 2.7, you can replace the relevant invocations below with pip2 instead.

Version control (optional)

The first thing I pip-install is Mercurial, since I have Mercurial repositories that I push to both Bitbucket and GitHub. If you don’t want to install Mercurial, you can skip ahead to the next section.

The following command will install Mercurial and hg-git:

pip3 install Mercurial hg-git

At a minimum, you’ll need to add a few lines to your .hgrc file in order to use Mercurial:

The following lines should get you started; just be sure to change the values to your name and email address, respectively:

[ui]
username = YOUR NAME <[email protected]>

To test whether Mercurial is configured and ready for use, run the following command:
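
hg debuginstall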

If the last line in the response is “no problems detected”, then Mercurial has been installed and configured properly.

Virtualenv

Python packages installed via the steps above are global in the sense that they are available across all of your projects. That can be convenient at times, but it can also create problems. For example, sometimes one project needs the latest version of Django, while another project needs an older Django version to retain compatibility with a critical third-party extension. This is one of many use cases that Virtualenv was designed to solve. On my systems, only a handful of general-purpose Python packages (such as Mercurial and Virtualenv) are globally available — every other package is confined to virtual environments.

With that explanation behind us, let’s install Virtualenv:
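
pip3 install virtualenv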

Create some directories to store our projects, virtual environments, and Pip configuration file, respectively:

mkdir -p ~/Projects ~/Virtualenvs "$HOME/Library/Application Support/pip"

We’ll then open Pip’s configuration file (which may be created if it doesn’t exist yet)…

vim "$HOME/Library/Application Support/pip/pip.conf"

… and add some lines to it:

[install]
require-virtualenv = true

[uninstall]
require-virtualenv = true

Now we have Virtualenv installed and ready to create new virtual environments, which we will store in ~/Virtualenvs. New virtual environments can be created via:

cd ~/Virtualenvs
virtualenv foobar

If you have both Python 2.x and 3.x and want to create a Python 3.x virtualenv:

virtualenv -p python3 foobar-py3

… which makes it easier to switch between Python 2.x and 3.x foobar environments.

Restricting Pip to virtual environments

What happens if we think we are working in an active virtual environment, but there actually is no virtual environment active, and we install something via pip3 install foobar? Well, in that case the foobar package gets installed into our global site-packages, defeating the purpose of our virtual environment isolation.

In an effort to avoid mistakenly Pip-installing a project-specific package into my global site-packages, I previously used easy_install for global packages and the virtualenv-bundled Pip for installing packages into virtual environments. That accomplished the isolation objective, since Pip was only available from within virtual environments, making it impossible for me to pip3 install foobar into my global site-packages by mistake. But easy_install has some deficiencies, such as the inability to uninstall a package, and I found myself wanting to use Pip for both global and virtual environment packages.

Thankfully, Pip has an undocumented setting (source) that tells it to bail out if there is no active virtual environment, which is exactly what I want. In fact, we’ve already set that above, via the require-virtualenv = true directive in Pip’s configuration file. For example, let’s see what happens when we try to install a package in the absence of an activated virtual environment:

$ pip3 install markdown
Could not find an activated virtualenv (required).

Perfect! But once that option is set, how do we install or upgrade a global package? We can temporarily turn off this restriction by defining a new function in ~/.bashrc:

gpip(){
   PIP_REQUIRE_VIRTUALENV="0" pip3 "$@"
}

(As usual, after adding the above you must run source ~/.bashrc for the change to take effect.)

If in the future we want to upgrade our global packages, the above function enables us to do so via:

gpip install --upgrade pip setuptools wheel virtualenv

You could achieve the same effect via env PIP_REQUIRE_VIRTUALENV="0" pip3 install --upgrade foobar, but that’s much more cumbersome to type.

Creating virtual environments

Let’s create a virtual environment for Pelican, a Python-based static site generator:

cd ~/Virtualenvs
virtualenv pelican

Change to the new environment and activate it via:

cd pelican
source bin/activate

To install Pelican into the virtual environment, we’ll use pip:

pip3 install pelican markdown

For more information about virtual environments, read the Virtualenv docs.

Dotfiles

These are obviously just the basic steps to getting a Python development environment configured. Feel free to also check out my dotfiles (GitHub mirror).

If you found this article to be useful, please follow me on Twitter. Also, if you are interested in server security monitoring, be sure to sign up for early access to Monitorial!

July 14, 2018 06:00 AM

July 13, 2018


Bhishan Bhandari

Copying mutable objects in Python

An assignment statement in Python does not create a copy of an object; it binds a name to the object. While working with mutable objects and/or collections of mutable objects, this can create inconsistencies, and hence it would be of interest to us to have ways to make real copies of objects. Essentially, we would require copies […]
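
A minimal sketch of the distinction the post covers (my own illustration):

import copy

original = [[1, 2], [3, 4]]

alias = original                 # no copy: a second name for the same list
shallow = copy.copy(original)    # new outer list, but inner lists are shared
deep = copy.deepcopy(original)   # fully independent copy

original[0].append(99)
print(alias[0])    # [1, 2, 99] (same object)
print(shallow[0])  # [1, 2, 99] (inner list is shared)
print(deep[0])     # [1, 2] (unaffected)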

The post Copying mutable objects in Python appeared first on The Tara Nights.

July 13, 2018 07:14 PM


Mike Driscoll

Guido Retires as BDFL

Guido van Rossum, the creator of Python and its Benevolent Dictator for Life (BDFL), has retired as BDFL with no successor named, as of July 12, 2018. See the following email from the Python Committers list for full details.

Basically, there was a lot of negativity over PEP 572 (Assignment Expressions), which appears to have driven the creator of Python into early retirement. While he will still be around to help and mentor, he will no longer be taking part in the community in quite the same way.

I love Python and its community so it makes me sad that Guido would need to step down in this way. However I wish him well and will continue to use and promote Python and civility in our community.

July 13, 2018 05:49 PM


EuroPython

EuroPython 2018: Late Bird Rates and Day Passes

We will be switching to the late bird rates for tickets on Monday next week (July 16), so this is your last chance to get tickets at the regular rate, which is about 30% less than the late bird rate.


EuroPython 2018 Tickets

Late Bird Tickets

We will have the following categories of late bird ticket prices for the conference tickets:

  • Business conference ticket: EUR 750.00 excl. VAT, EUR 900.00 incl. 20% UK VAT
    (for people using Python to make a living)
  • Personal conference ticket: EUR 500.00 incl. 20% UK VAT
    (for people enjoying Python from home)

Please note that we do not sell on-desk student tickets. Students who decide late will have to buy day passes or a personal ticket.

Day Passes

As in the past, we will also sell day passes for the conference. These allow attending the conference for a single day (Wednesday, Thursday or Friday; valid on the day you pick up the day pass):

  • Business conference day pass: EUR 375.00 excl. VAT, EUR 450.00 incl. 20% UK VAT
    (for people using Python to make a living)
  • Personal conference day pass: EUR 250.00 incl. 20% UK VAT
    (for people enjoying Python from home)
  • Student conference day pass: EUR 105.00 incl. 20% UK VAT
    (only available for pupils, students and postdoctoral researchers; please bring your student card or declaration from University, stating your affiliation, starting and end dates of your contract)

Please see the registration page for full details of what is included in the ticket price. Also note that neither late bird tickets, nor day passes are refundable.

Enjoy,

EuroPython 2018 Team
https://ep2018.europython.eu/
https://www.europython-society.org/

July 13, 2018 03:10 PM


Continuum Analytics Blog

Deep Learning with GPUs in Anaconda Enterprise

AI is a hot topic right now. While a lot of the conversation surrounding advanced AI techniques such as deep learning and machine learning can be chalked up to hype, the underlying tools have been proven to provide real value. Even better, the tools aren’t as hard to use as you might think. As Keras …
Read more →

The post Deep Learning with GPUs in Anaconda Enterprise appeared first on Anaconda.

July 13, 2018 02:18 PM




